Clinical Modelling of RVHF Using Pre-Operative Variables: A Direct and Inverse Feature Extraction Technique

Right ventricular heart failure (RVHF) mostly occurs due to failure of the left side of the heart. RVHF is a serious disease that leads to swelling of the abdomen, ankles, liver, kidneys, and gastrointestinal (GI) tract. A total of 506 heart-failure subjects from the Faculty of Medicine, Cardiovascular Surgery Department, Ege University, Turkey, who suffered from severe heart failure and are currently receiving support from a ventricular assistance device, were involved in the current study. The study explored the application of both direct and inverse modelling approaches, based on correlation-analysis feature extraction of various pre-operative variables of the subjects, for the prediction of RVHF. The study employs both single and hybrid paradigms for the prediction of RVHF using different pre-operative variables. The visualised and quantitative performance of the direct and inverse modelling approaches indicates the robust prediction performance of the hybrid paradigms over the single techniques in both the calibration and validation steps. The quantitative performance of the hybrid techniques, based on the Nash–Sutcliffe coefficient (NC) metric, shows their superiority over the single paradigms by up to 58.7%/75.5% and 80.3%/51% for the calibration/validation phases in the direct and inverse modelling approaches, respectively. Moreover, to the best knowledge of the authors, this is the first study to report the implementation of direct and inverse modelling on clinical data. The findings of the current study indicate the possibility of applying these novel hybridised paradigms for the prediction of RVHF using pre-operative variables.


Introduction
Right ventricular heart failure (RVHF) is generally due to problems arising from the left atrium [1]. This heart disease is a syndrome characterised by the inability of the cardiac output to match the body's metabolic demands, stemming from structural or functional impairment of ventricular filling or ejection. There are, however, two predominant conditions where the anatomic right ventricle (RV) lies in the systemic position and the anatomic left ventricle (LV) lies in the subpulmonary position, namely d-transposition of the great arteries (d-TGA), which is palliated with an atrial switch operation, and congenitally corrected transposition of the great arteries (cc-TGA) [2]. Most patients with symptoms and signs of heart failure have a left ventricular ejection fraction that is not markedly abnormal [3]. The recognition of the magnitude of the problem of heart failure with preserved ejection fraction in the past 20 years has spurred an explosion of clinical investigation and a growing intensity of informative outcome trials [4].
Although substantial progress has been made in understanding the various pathological conditions of RVHF, the need for effective and accurate medical assessment of these factors, as well as the benefits of diagnosing RVHF at an earlier stage, has led to a tremendous surge in the use of detectors [5][6][7]. Furthermore, Manca et al. [8] reported that Titin (TTN)-related dilated cardiomyopathy (DCM) has a higher likelihood of left ventricular reverse remodelling compared with other genetic etiologies. They noted that no data were previously available regarding the evolution of right ventricular dysfunction (RVD) according to genetic background. Their findings indicate that the evolution of RVD in DCM is heterogeneous across genetic backgrounds, and that TTN-related DCM is associated with a higher chance of RVD recovery compared with other genetic etiologies.
Therefore, understanding this medical condition using a simple, fast, and cost-effective technique is of paramount importance. For instance, the application of artificial intelligence (AI) and machine learning (ML) in locating, analysing, interpreting, forecasting, and classifying the medical information associated with RVHF can serve as a robust approach for understanding this important health condition, especially when the necessary medical information is available. Recently, ML and AI have been recommended as valuable methods to enhance illness prognosis, diagnosis, and prediction as well as progress management [5,9]. Several ML algorithms have been implemented for RVHF to improve the medical workflow and avoid the limitations of conventional methods. A recent example is the ML-based research by Jingjing et al. [10], which used AI-assisted auscultation to detect congenital cardiac disorders at a Shanghai children's medical center; they focused on the sensitivity, specificity, and accuracy of remote auscultation. A total of 1397 patients with CHD were enrolled, and data from 1362 of them (mean age 2.4 ± 3.1 years, 46% female) were evaluated. The study found that remote auscultation could detect abnormal heart sounds with 98% sensitivity, 91% specificity, 97% accuracy, and a kappa coefficient of 0.87.
Silvia et al. [11] reported that cardiovascular disease (CVD), despite the significant advances in its diagnosis and treatment, still represents the leading cause of morbidity and mortality worldwide. Therefore, to improve and optimise CVD outcomes, artificial intelligence techniques have the potential to radically change the way we practice cardiology, offering us novel tools to interpret data and make clinical decisions. AI techniques such as machine learning and deep learning can also improve medical knowledge due to the increase in the volume and complexity of the data, unlocking clinically relevant information.
The current study is, to our best knowledge, the first in the published technical literature to employ hybridised paradigms (ILR-ANFIS, ILR-GPR, and ILR-GRNN) for the clinical prediction of RVHF using pre-operative variables. In addition, based on the recent technical literature as well as a scan of the literature, as shown in Figure 1, this is, to our best knowledge, the first study that reports the feasibility of applying direct and inverse modelling for clinical prediction in health- and medical-related studies.
Therefore, the current study aims at understanding the connection between pre-operative variables and RVHF by using the correlational feature extraction method based on heart failure patients' data. Based on the correlational analysis results of the pre-operative variables and RVHF as the dependent variable, two novel approaches were developed, namely the inverse and direct approaches for modelling RVHF using single stand-alone models with improved novel hybridised paradigms.

Gaussian Process Regression (GPR)
Gaussian process regression (GPR) is a robust non-linear prediction model: a probabilistic, nonparametric, supervised learning method that generalises the non-linear and complex function mapping hidden in datasets. Recently, GPR has increasingly attracted the attention of researchers from different engineering fields [12,13]. GPR is capable of handling non-linear data due to its use of kernel functions. Moreover, one of the merits of a GPR model is that it can provide a reliable response to input data [14].
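The role of the kernel function can be illustrated with a minimal sketch of the GPR posterior mean. This is not the implementation used in the study: the squared-exponential kernel, the `length_scale` and `noise` values, and all function names are assumptions chosen for illustration, and the linear system is solved with plain Gaussian elimination to keep the example free of external libraries.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

def gpr_predict_mean(train_x, train_y, query_x, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP: k_*^T (K + noise * I)^-1 y."""
    n = len(train_x)
    gram = [
        [rbf_kernel(train_x[i], train_x[j], length_scale) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_linear_system(gram, train_y)
    return [
        sum(rbf_kernel(q, train_x[i], length_scale) * alpha[i] for i in range(n))
        for q in query_x
    ]
```

With a very small noise term the posterior mean nearly interpolates the training targets, and it falls back to the zero prior mean far away from the training inputs; a full GPR model would also return the predictive variance, which this sketch omits.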

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS is a learning system that combines neural-network training with fuzzy inference in order to identify and model problems in data. It originated from feed-forward, multilayer adaptive networks. ANFIS is made up of input variables and fuzzy rules that relate the dependent and independent variables according to Takagi-Sugeno-Kang (TSK) inference. The fuzzifier and defuzzifier are both contained in the fuzzy database. Using membership function parameters, fuzzy set theory converts the information into fuzzified values.
Nodes play a vital role as membership functions (MFs), which makes it possible to model the relation between the two parameters in a meaningful way. Triangular, trapezoidal, sigmoid, and Gaussian membership functions can be used [15]. Based on this theory, the rules in Equations (1) and (2) are created.
$$\text{Rule 1: if } \mu(x) \text{ is } A_1 \text{ and } \mu(y) \text{ is } B_1, \text{ then } f_1 = p_1 x + q_1 y + r_1 \quad (1)$$
$$\text{Rule 2: if } \mu(x) \text{ is } A_2 \text{ and } \mu(y) \text{ is } B_2, \text{ then } f_2 = p_2 x + q_2 y + r_2 \quad (2)$$
where $A_1$, $B_1$, $A_2$, and $B_2$ are the membership functions for the inputs $x$ and $y$, and $p_1$, $q_1$, $r_1$, $p_2$, $q_2$, and $r_2$ are the parameters of the output functions. The structure of ANFIS corresponds to a neural network with five layers. More details about ANFIS are given by Khademi et al. (2016).
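A minimal sketch of how the two rules in Equations (1) and (2) produce a single output is given below. It assumes Gaussian membership functions and the product operator for combining the two premises; the function names and all parameter values are hypothetical, and the training of the premise and consequent parameters is not shown.

```python
import math

def gaussian_mf(value, centre, width):
    """Gaussian membership function, one of the shapes named in the text."""
    return math.exp(-((value - centre) ** 2) / (2.0 * width ** 2))

def tsk_two_rule_output(x, y, premise, consequent):
    """Forward pass of a two-rule first-order Takagi-Sugeno-Kang system.

    premise[i]    = ((centre_A, width_A), (centre_B, width_B)) for rule i
    consequent[i] = (p, q, r) so that f_i = p * x + q * y + r
    """
    firing = []
    rule_outputs = []
    for (mf_a, mf_b), (p, q, r) in zip(premise, consequent):
        # Fuzzify each input and combine the two memberships by product.
        firing.append(gaussian_mf(x, *mf_a) * gaussian_mf(y, *mf_b))
        # Linear consequent of Equations (1) and (2).
        rule_outputs.append(p * x + q * y + r)
    total = sum(firing)  # assumed non-zero: the inputs lie near at least one rule
    # Normalise the firing strengths and return the weighted sum of rule outputs.
    return sum(w * f for w, f in zip(firing, rule_outputs)) / total
```

For example, with rule 1 centred at (0, 0), rule 2 centred at (2, 2), and the input (1, 1), both rules fire equally, so the output is the plain average of f 1 and f 2.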

Generalised Regression Neural Network (GRNN)
GRNN, also known as a lazy training method, was developed by Specht [16] to behave in the manner of a regression method, by generating a relationship between the independent variable (X) and the outcome variable (Y) with a non-linear regression estimation, even for a smaller group of data. The input layer is similar to that of a conventional neural network, in that its main purpose is to receive the input data, and the size of the input vectors is the main determinant of the number of neurons required for training. The model training begins immediately in the pattern layer, where a Gaussian kernel is applied to the stored input data. The smoothing parameter (σ) is used to calculate the weight of each neuron in this layer. This parameter is referred to as the "hyper-parameter of the GRNN model", and it contributes to the GRNN model's prediction accuracy [17]. Its general form, as given by Specht [16], is as follows.
$$Y(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-\dfrac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{n} \exp\left(-\dfrac{D_i^2}{2\sigma^2}\right)}, \qquad D_i^2 = (X - X_i)^{T}(X - X_i) \quad (3)$$
where $X$ equals the input data of the dataset to be tested, $X_i$ is the ith input of the training dataset, $Y_i$ is the corresponding training output, $n$ is the number of training samples, and $\sigma$ is the smoothing parameter.
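The effect of the smoothing parameter σ can be seen in a short sketch of the GRNN estimate, which is a Gaussian-weighted average of the training outputs. The function name, the default value of `sigma`, and the toy data in the usage note are assumptions for illustration only.

```python
import math

def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Specht-style GRNN estimate for one query vector."""
    weights = []
    for x_i in train_x:
        # Squared Euclidean distance between the query and a stored pattern.
        sq_dist = sum((q - v) ** 2 for q, v in zip(query, x_i))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training points for this sigma")
    # Weighted average of the training outputs.
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

With two training points at 0 and 1 carrying outputs 0 and 1, a query at 0.5 returns 0.5 for any σ, while a small σ makes the estimate at a training point collapse onto that point's own output. There is no iterative training: the only quantity to tune is σ.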
ILR (isometric log-ratio) is a type of regression that examines how the dependent (target) variable and one or more independent (predictor) variables are related [22]. S. I. Abba et al. [23] demonstrate that the multi-linear regression (MLR) concept is the most commonly used regression model. MLR remains easy to interpret even though it has a lower prediction performance than AI-based modelling techniques [24]. Generally, LR models can be expressed as:
$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_i x_i \quad (4)$$
where $y$ represents the target parameter, $x_i$ equals the value of the ith predictor, $b_0$ denotes the regression constant, and $b_i$ indicates the coefficient of the ith predictor.

Hybrid-Based Paradigms
Modellers hold different opinions regarding how models should be built, the methodology employed in supplying the input data, and the duration of modelling; all of these have a significant impact on the optimal performance of a model [25][26][27][28].
The limitations of single AI models can be overcome by recently developed hybrid techniques. Such a technique takes into consideration both the linear association between the input data and the predicted variable and the non-linear relationship between them.
AI models such as Gaussian process regression (GPR), general regression neural networks, and adaptive neuro-fuzzy inference systems have benefited from this approach.
According to the work of Marrero-Ponce et al. [29], ANN (artificial neural network) and MLR (multi-linear regression) models have been excellent in making remarkable predictions; this is a result of combining the linear and non-linear domains of these models and their applications [30][31][32][33]. The "no free lunch" theorem states that no single model performs best across all datasets. The optimal performance of a model depends on the kind of information it works with and on data features such as the degree of linearity, the size, and the completeness of the data.
Several researchers have reiterated that, for the same data and performance index, the results can vary once different model variants are used.
Moreover, once data-intelligence models are modified, they can be applied to a wide range of problems.
Therefore, in this study, four single models were employed to predict right ventricular heart failure: GPR (Gaussian process regression), GRNN (general regression neural network), ANFIS (adaptive neuro-fuzzy inference system), and ILR. An algorithm named "Hybrid Data Intelligence" is then suggested. It combines the linear ILR model with each of the AI-based models (GPR, GRNN, and ANFIS) in order to exploit the distinct characteristics and strengths of both, especially for predicting data trends of various types. The hybrid method has two stages. First, an ILR model is fitted to capture the linear characteristics of the data. Because ILR cannot model the non-linear features of the data, the residuals of the ILR model, which carry the information about the non-linear dynamics, are then passed to the AI models.
$$f(y_t) = q_t + r_t \quad (5)$$
where $q_t$ represents the linear phase, and $r_t$ represents the non-linear phase. Both components must be estimated from the data. Let $e_t$ represent the residual at time $t$ from the linear model, therefore:
$$e_t = y_t - \hat{q}_t$$
where $\hat{q}_t$ is the estimate of the linear phase. The residuals are then modelled as $e_t = f(\cdot) + \varepsilon_t$, where $f$ is the non-linear function determined by the AI models (GPR, GRNN, and ANFIS in the current study) and $\varepsilon_t$ is the random error. Moreover, Figure 2 depicts the methods used in the form of a block diagram.
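The two-stage idea behind Equation (5) can be sketched as follows. This is an illustrative simplification rather than the study's implementation: it assumes a single input variable, uses ordinary least squares as a stand-in for the linear stage, and uses a Gaussian kernel smoother (in the spirit of GRNN) on the input as a stand-in for the AI model that learns the residuals. All function names and the `sigma` value are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (the linear stage)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def kernel_smoother(train_x, train_e, query, sigma):
    """Gaussian-kernel estimate of the residual at `query` (non-linear stage)."""
    weights = [math.exp(-((query - x) ** 2) / (2.0 * sigma ** 2)) for x in train_x]
    return sum(w * e for w, e in zip(weights, train_e)) / sum(weights)

def fit_hybrid(xs, ys, sigma=0.3):
    """Fit the linear stage, then learn its residuals with the smoother."""
    b0, b1 = fit_line(xs, ys)
    # Residuals of the linear model: what the linear stage could not explain.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

    def predict(x):
        linear_part = b0 + b1 * x
        nonlinear_part = kernel_smoother(xs, residuals, x, sigma)
        return linear_part + nonlinear_part

    return (b0, b1), predict
```

On data with a linear trend plus a smooth non-linear component, the combined prediction has a lower calibration error than the linear stage alone, which is the behaviour the hybrid paradigms are designed to exploit. Validation on held-out data is still needed, since the residual model can overfit when `sigma` is small.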

Grading Metrics of the Models Employed in the Current Study
The performance of each model is assessed using several metrics that compare the computed values with the observed values for any category of dataset.
In these metrics, N, Y obsi , Y, and Y comi are the number of data points, the observed data, the overall average of the observed data, and the computed values, respectively.
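The four metrics named later in the study (PC, DC, RMSE, and MSE) can be sketched in their standard textbook forms. The function names are hypothetical, and the sketch assumes that DC takes the Nash-Sutcliffe form and that PC is the Pearson correlation coefficient; these assumptions should be checked against the study's own definitions.

```python
import math

def mse(observed, computed):
    """Mean squared error between observed and computed values."""
    return sum((o - c) ** 2 for o, c in zip(observed, computed)) / len(observed)

def rmse(observed, computed):
    """Root mean squared error."""
    return math.sqrt(mse(observed, computed))

def determination_coefficient(observed, computed):
    """Nash-Sutcliffe form: 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - c) ** 2 for o, c in zip(observed, computed))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def pearson_coefficient(observed, computed):
    """Pearson correlation between observed and computed values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(computed) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, computed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in computed)
    return cov / math.sqrt(var_o * var_c)
```

A DC of 1 means a perfect fit, while a DC of 0 means the model is no better than predicting the observed mean. PC only measures linear association, so a model whose outputs are twice the observed values has a PC of 1 but a poor DC.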

Hence, the current study applied both single and hybrid paradigms using different pre-operative variables for the prediction of RVHF. In addition, prior to the modelling step, a correlation-based feature extraction step was conducted in order to separate the variables based on their relation with the RVHF output variable. The direct modelling scenario uses the pre-operative variables mPAP, CVP, tpg, alt, ast, BUN, pre Bili, PT time, pre htc, pre-sodium, pre ty, ECMO, and preMV for the prediction of RVHF, while the inverse modelling scenario uses pcw, creatinine, pre INR, pre lvesd, pre lvef, pre my, pre ay, pre spap, pre tapse, and IABP for the clinical prediction of RVHF.
Furthermore, the fundamental aim of data-driven techniques is to model a collection of datasets, which then serves as a building block for the reliable prediction of unknown values. Several constraints, such as overfitting and underfitting, lead to poor training and testing results. Validation schemes include k-fold cross-validation, holdout, leave-one-out, and others. The 10-fold cross-validation procedure was used in the current study [34] for assessing and validating the datasets used. The data points were divided into 75% for the calibration (training) stage and 25% for the verification (testing) stage. Based on Table 1, the minimum value is 0, which denotes 'no', meaning normal patients; the maximum value is 1, meaning 'yes', which indicates patients suffering from RVHF. The complete dataset was obtained from a clinical study consisting of 506 heart-failure subjects from the Faculty of Medicine, Cardiovascular Surgery Department, Ege University, Turkey, who suffered from severe heart failure and are currently receiving support from a ventricular assistance device. The data were subsequently validated to check and control potential modelling problems such as overfitting and underfitting.
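The two partitioning schemes mentioned above can be sketched with index lists. The function names, the fixed random seed, and the rounding of the 75% cut-off are assumptions for illustration; the study does not state how its partitions were drawn.

```python
import random

def holdout_split(n_samples, train_fraction=0.75, seed=0):
    """Shuffle the sample indices and cut them into calibration and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * n_samples)  # truncates, so 506 samples give 379
    return indices[:cut], indices[cut:]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation
```

For the 506 subjects, `holdout_split(506)` gives 379 calibration and 127 testing indices, and `k_fold_indices(506)` yields ten partitions in which every subject appears in exactly one validation fold.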

Model Conceptualisation
Phase 2: Simulation using single models The stand-alone paradigms (GPR, GRNN, and ANFIS) together with the traditional linear regression ILR were all conducted using MATLAB 9.3 (R2020a).
Phase 3: Hybrid data-intelligence The data-intelligence techniques involve combining the properties of the classical linear ILR technique with AI-based techniques in order to improve the performance of the single models. Hence, the ILR-GPR, ILR-GRNN, and ILR-ANFIS models are developed.
Furthermore, the main purpose of developing these techniques is to understand the behaviour of different clinical data (consisting of the input and output variables) and to assign weights that can be used in predicting the target. For instance, in the current study, the pre-operative variables are used in determining whether a patient will suffer from RVHF or not. Moreover, this approach makes it possible to develop a mobile application to assist clinicians, patients, and policy makers in understanding the behaviour and risk factors of RVHF.

Results
Recently, computational techniques have been established in the medical field for the prediction of various diseases, their prevalence, and disorders. AI-based and machine learning techniques have come to dominate classical regression, owing to their robustness in handling highly chaotic datasets. Understanding a dataset prior to the employment of AI techniques for prediction is therefore of paramount importance. Different feature-extraction techniques are available, such as mutual information, sensitivity analysis, and correlation analysis. In the current study, an exploratory method based on correlation analysis is used in Table 2 to determine the input-output relation of the variables.
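The correlation-based separation of the inputs can be sketched as follows. The sketch assumes that variables with a positive Pearson correlation with RVHF form the direct set and variables with a negative correlation form the inverse set; how a correlation of exactly zero is assigned is an arbitrary choice here, and the function names and toy values are hypothetical rather than taken from the clinical dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def split_by_correlation(features, target):
    """Group variable names by the sign of their correlation with the target."""
    direct, inverse = [], []
    for name, values in features.items():
        r = pearson_r(values, target)
        # Assumption: a correlation of exactly zero is placed in the direct set.
        (inverse if r < 0.0 else direct).append(name)
    return direct, inverse
```

For example, with a binary target `[0, 0, 1, 1]`, a variable that rises across the four subjects lands in the direct set and one that falls lands in the inverse set. Pearson correlation only captures linear association, so weakly correlated variables may still carry non-linear information.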

Performance of the Single and Hybrid Paradigms for Modelling RVHF Using Pre-Operative Variables
The current section demonstrates the quantitative and visualised performance of both the single and hybrid paradigms for the direct and inverse prediction of RVHF using various pre-operative variables.

Discussion
The correlational analysis shown in Table 2 demonstrates both the direct and inverse relations between the inputs and the corresponding RVHF variable. Based on Table 2a, all the independent variables showed a weak relation with the output variable (RVHF), whereby pre sodium and PT time showed a stronger relation than the others, with R-values equal to 0.16 and 0.15, respectively. Moreover, CVP showed a correlation of 0 with RVHF; this is due to the fact that most of the patients are suffering from left ventricular heart failure but not RVHF. Furthermore, Table 2b indicates that all the input variables showed a weak negative correlation with RVHF as the corresponding output variable. Based on the inverse results, pre my, with an R-value equal to −0.24, showed the strongest relation with the output variable, whereas pre spap showed a correlation of 0 with the corresponding RVHF output variable.
Based on the correlational analysis shown in Table 2, four single models (GPR, GRNN, ANFIS, and ILR) and three novel hybrid paradigms (ILR-GPR, ILR-GRNN, and ILR-ANFIS) were used in predicting RVHF using various pre-operative variables in two different scenarios (the direct and inverse approaches). Table 3 presents the performance of the direct modelling scenario for the prediction of RVHF. The performance metrics used in this scenario (PC, DC, RMSE, and MSE) indicate the superiority of the hybrid techniques over the single approaches for the clinical modelling of RVHF in both the calibration and validation stages. Various modellers have reported that, for any prediction model or computation paradigm to be accepted, it should show a minimum DC-value of 0.8. All four single models (ILR, ANFIS, GRNN, and GPR) failed to meet this criterion in both the calibration and validation phases. In contrast, ILR-GPR and ILR-ANFIS, with DC-values equal to 0.862 and 0.813, respectively, fulfilled the minimum requirement in the calibration phase. In the validation step, however, ILR-ANFIS (0.696) failed and ILR-GPR (0.861) succeeded. The weak performance of the paradigms can be attributed to the low correlation between the pre-operative variables and RVHF as the corresponding output variable, as shown in Table 2a. In general, among the single and hybrid paradigms used in the direct modelling of RVHF based on the pre-operative variables, only the ILR-GPR hybrid technique was able to capture the highly non-linear and chaotic nature of the dataset. Hence, the results obtained in the current study are in line with the study reported by Konstantinos et al. [35] regarding the current and future state of the AI-enhanced electrocardiogram in detecting heart disease in at-risk populations. Their technique has made it possible for the electrocardiogram to be interpreted quickly.
Moreover, Muni et al. [36] used AI techniques to present the importance of passive and semi-sensors and of unique approaches for analysing heart failure. More studies on the implementation of ML and AI for cardiovascular diseases are required.
The performance of both the single and hybrid techniques can be compared visually using various graphical illustrations. For instance, the MSE and RMSE values are used to indicate the error performance of each model. The error performance of both the single and hybrid techniques developed using the direct modelling approach can be graphically compared in both the calibration and validation steps using column and bar charts (see Figure 3). Moreover, the fit of the direct models can be compared against the clinical RVHF values, which can be visualised using a time series plot (see Figure 4). Furthermore, Table 4 indicates the performance of inverse modelling for the prediction of RVHF in both the calibration and validation steps. Based on the quantitative performance of the single and hybrid techniques, it can be seen that all four single models, GPR, GRNN, ANFIS, and ILR, failed to predict RVHF as the dependent variable. In contrast, for the hybrid-based inverse modelling, ILR-GPR and ILR-ANFIS were able to predict the behaviour and properties of the complex RVHF dataset. Moreover, comparative analysis of all the inverse-based techniques indicated that ILR-ANFIS outperformed all six other techniques in both the training and validation stages. Furthermore, based on the DC metric, the ILR-ANFIS technique improved the prediction performance of the single paradigms by up to 81% and 51% in the calibration and validation stages, respectively. The techniques can also be compared graphically, based on the performance error in terms of RMSE and MSE (see Figure 5).
Moreover, the metrics PC and DC indicate the goodness of fit between the predicted and experimental values. Therefore, the time series response plot can be used to compare the performance of the hybrid techniques for the simulation of RVHF (see Figure 6).
Hence, the novelty of the current work can be shown in different ways: (1) This is the first study that reports the combined application of GPR, GRNN, and ANFIS AI-based techniques for the clinical prediction of RVHF using pre-operative variables. (2) It is, equally, the first study that employs the ILR regression method for the clinical modelling of RVHF; in fact, this is the first study that reports the implementation of this model in any clinical/health-related study. Ultimately, to the best knowledge of the authors, based on the recent technical literature as well as a scan of the literature, as shown in Figure 1, this is the first study that reports the feasibility of applying direct and inverse modelling for clinical prediction in health- and medical-related studies. Moreover, the quantitative performance of the hybrid technique based on the Nash-Sutcliffe coefficient (NC) metric shows its superiority over the single paradigms by up to 58.7%/75.5% and 80.3%/51% for the calibration/validation phases in the direct and inverse modelling approaches, respectively. However, one of the major limitations of the current study is the employment of a two-step technique, consisting of the single and hybrid approaches, whereby the single approach was unable to capture the RVHF dataset owing to the dataset's complex and chaotic nature.

Conclusions
Medical informatics deals with improving the management of clinical knowledge, patient data, population data, and information related to patient care. This emerging field is regarded as a promising tool that helps policy and decision makers in making critical decisions related to patients' care. The current study explored the application of both direct and inverse modelling using AI-based techniques and hybrid-based paradigms for the prediction of RVHF, whereby the hybrid techniques showed a higher performance compared with the single paradigms. Hence, the results of the current research recommend the application of various metaheuristic and computational approaches for improving the prediction of RVHF using various pre-operative variables. Furthermore, future work on identifying the complex behaviour of the data through non-linear feature extraction techniques, feature scaling, the normalisation of data, and standardisation is equally recommended.