A Novel Blunge Calibration Intelligent Feature Classification Model for the Prediction of Hypothyroid Disease

According to the Indian health line report, 12% of the population suffers from abnormal thyroid functioning. The major challenge with this disease is that hypothyroidism may not produce any noticeable symptoms in its early stages. However, delayed treatment may lead to several other health problems, such as fertility issues and obesity. Therefore, early treatment is essential for patient survival. The proposed technology could be used for the prediction of hypothyroid disease and its severity during its early stages. Though several classification and regression algorithms are available for the prediction of hypothyroidism from clinical information, there exists a gap in knowledge as to whether predicted outcomes may reach a higher accuracy or not. Therefore, the objective of this research is to predict the existence of hypothyroidism with higher accuracy by optimizing the estimator list of the PyCaret classifier model. With this overview, a blunge calibration intelligent feature classification model that supports the assessment of the presence of hypothyroidism with high accuracy is proposed. A hypothyroidism dataset containing 3163 patient details with 23 independent and one dependent feature from the University of California Irvine (UCI) machine-learning repository was used for this work. We undertook dataset preprocessing and determined its incomplete values. Exploratory data analysis was performed to analyze all the clinical parameters and the extent to which each feature supports the prediction of hypothyroidism. ANOVA was used to verify the F-statistic values of all attributes that might highly influence the target. Then, hypothyroidism was predicted using various classifier algorithms, and the performance metrics were analyzed. The original dataset was subjected to dimensionality reduction by using regressor and classifier feature-selection algorithms to determine the best subset components for predicting hypothyroidism.
The feature-selected subset of the clinical parameters was subjected to various classifier algorithms, and its performance was analyzed. The system was implemented with Python in the Spyder editor of the Anaconda Navigator IDE. Investigational results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% for the regressor feature-selection methods. The blunge calibration regression model (BCRM) was designed with naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization and with soft blending based on the sum of predicted probabilities of classifiers. The proposed BCRM showed 99.5% accuracy in predicting hypothyroidism. The implementation results show that the Kernel SVM, KNeighbor, and Ridge classifiers maintained an accuracy of 87.5% for the classifier feature-selection methods. The blunge calibration classifier model (BCCM) was developed with Kernel SVM, KNeighbor, and Ridge as the estimators, with accuracy optimization and with soft blending based on the sum of predicted probabilities of classifiers. The proposed BCCM showed 99.7% accuracy in predicting hypothyroidism. The main contribution of this research is the design of the BCCM and BCRM models, which were built with accuracy optimization and soft blending based on the sum of predicted probabilities of classifiers. The uniqueness of the BCRM and BCCM models is achieved by updating the estimator list with the effective classifiers and regressors that suit the application at runtime.


Introduction
For the accurate diagnosis of thyroid illness, functional data from the thyroid gland must be interpreted. Hypothyroidism is a condition where the thyroid gland in the body is unable to secrete thyroid hormone. Women are eight times as likely as men to suffer from a thyroid condition. The thyroid condition tends to worsen and persist with ageing and may affect adults differently compared with children. The thyroid gland mainly helps in controlling the body's metabolism. Globally, thyroid disorders have begun to become more prevalent. For instance, one in eight women in Romania suffers from thyroid cancer. Approximately 30% of Romanians have endemic goiters. A limited diet, the use of drugs, anxiety, sickness, trauma, pollutants, and other elements can all affect thyroid function. Predetermined data sets can be categorized using classification, a crucial supervised-learning data-mining approach.

Literature Review
One study examined, using machine learning, the thyroid data included in UC Irvine's knowledge discovery archive [1]. Thyroid disease has been classified as a common thyroid dysfunction in the general population. Its findings demonstrate the high accuracy of each of the examined classification models, with the decision tree model achieving the highest classification rate. The infrastructure for creating and evaluating the models was provided by the KNIME analytics platform and Weka, two data-mining applications [2]. Classification is commonly used in the healthcare sector to inform business choices, diagnose patients, and provide them with exceptional care [3].
The precise estimation of thyroid gland operational information is critical for thyroid diagnosis. The thyroid gland mainly aids in the control of an individual's metabolism. The types of thyroid disease are determined by the production of either too little or too much thyroid hormone. Various neural networks have been used in this study to aid in the analysis of thyroid disease [4]. These networks aimed to diagnose thyroid disease by using a new hybrid machine-learning method that includes our classification system. A method for solving this diagnosis problem via classification was obtained by hybridizing AIRS with an advanced fuzzy weighted pre-processing. A cross-validation analysis was used to determine the technique's soundness for sampling variability [5]. A novel hybrid machine learning approach that incorporates this classification system was used to identify thyroid illness. AIRS and sophisticated fuzzy weighted pre-processing were combined to create a strategy for categorizing this diagnostic issue. By using cross-validation analysis, the technique's robustness for sampling variability was evaluated [6]. The expansion of scientific knowledge and the massive production of data have resulted in an exponential growth in databases and repositories. One of these rich data domains is the biomedical domain. A large amount of biomedical data is available, ranging from clinical symptom details to various types of biochemical data and imaging device outputs. Mechanically retrieving biological information from images and reshaping them into machine-readable knowledge is a challenging task, because the biomedical domain is vast, dynamic, and complicated. Data mining can improve the quality of biomedical pattern extraction [7].
A backpropagation algorithm is an early method for the detection of thyroid disease. An advanced neural network (ANN) was created using backpropagation of error for prior disease diagnosis. Afterward, this ANN was trained using empirical values, and testing was performed using information that had not been used during the training process [8]. Data collection is an important methodological approach in the field of medical disciplines, because efficient techniques for analyzing and identifying disorders are required. Data mining applications are used in clinical governance, health information technology, and patient care systems. It is also important in determining disease resilience. The popular data mining techniques used to recognize the complex parameters of the nutrition data set are classification and clustering [9].
A novel approach was used for the detection of three types of anomalous red blood cells, known as poikilocytes, that were found in iron-deficient blood smears. The classification and counting of poikilocytes are critical steps in the rapid recognition of iron deficiency anemia disease. The three basic poikilocytes in IDA are dacrocyte, elliptocyte, and schistocyte [10]. High-dimensional biomedical datasets contain thousands of features that are used to diagnose genetic diseases, but their predictive accuracy is affected by numerous irrelevant or weak connection features. While minimizing computation complexity in data mining, feature-selection techniques enable classification models to precisely discover patterns in features and determine a feature vector from an initial set of features. An enhanced shuffled frog-leaping algorithm (ISFLA) is presented in this paper, and it explores the space for potential subsets to choose the set of attributes that maximizes prognostication while minimizing irrelevant attributes in high-dimensional biological data [11]. The latest ANN-based finite impulse response extreme learning machine (FIR-ELM) was used to further analyze the categorization of two binary bioinformatics datasets into leukemia and colon tumor. To investigate the hidden layer of the neural classifier's FIR-ELM for the smoothing capabilities of feature identification, we performed a time series analysis of the microarray samples. Afterward, we determined how linearly divergent the data patterns in the microarray datasets were [12].
For the optimal feature-selection problem, the authors describe a coherent analytical foundation that can retrofit successful heuristic criteria, indicating the approximations made by each method [13]. The outcome of a microstructure heart arrhythmia detection system based on electrocardiography (ECG) signal features was analyzed. These signals came from the MIT/BIH arrhythmia directory. Initially, Hermitian basis functions (HBF) were used to model the ECG beats. The width parameter (sigma) of the HBF was optimized in this step to minimize model error. The extracted features, which contain the model's boundary conditions, were used as input for the k-nearest neighbor (KNN) classifier to evaluate the model's effectiveness [14]. Approximately 90% of patients with Parkinson's disease are predicted to have vocal and speech issues. Vocal folds are often weakened by this infection, causing the patients to have an unnatural voice. In the present study, different samples from the auditory signal of patients with Parkinson's disease and healthy individuals were gathered. The data classification was then completed using the KNN classification approach based on varied amounts of optimized features after the optimized features that influenced the data classification process were determined using a genetic algorithm [15]. Although thrombolysis reduces impairment and increases survival rates in patients with ischemic stroke, some people continue to suffer detrimental effects. Consequently, it will be beneficial, when making health decisions, to predict how patients with myocardial infarction might react to regional rehabilitation [16].
Straightforward, mathematical assessment criteria need to be established to generate and quantify pragmatic forecasts in cerebral ischemia with data that are readily available post-surgery in the emergency unit. Regression was used to investigate the causes of inferior outcomes in the originating sample of formerly independent people with information systems. The covariate correlations from the computed holistic framework were used to build a scoring model based on integers for each correlation coefficient, and the average of the scores for the criterion was used to obtain the total result [17]. This process aims to offer a self-contained method for improving learning-argumentation frameworks that employ deformation key frames of MR images to aid in the rational frameworks of ischemic stroke diagnosis. Anthropological, physiological, and statistical approaches were gathered from the fragmented tumors to form a feature set that was then further defined using classification techniques. The results of the recommended approach, which accurately designates electromagnetic fields as vascular tumors with a 93.4% accuracy, are significantly superior to those of the classification model [18]. Among many other clinical and imaging parameters, ageing and the severity of a hemorrhage are immediate, precise indicators of the likelihood of SICH and the results of treatment following intravenous infusion therapy [19]. The use of aided technology for stroke could reduce the evaluation period, improve prediction performance, make it simpler to discriminate between different types of ICH, and reduce the chance of human error [20].
One study presents the improvements in learning methods and developments that are in line with the different varieties and manifestations of dyslexia. This study opens with a discussion of cosmic mythology and examines how learning environments that consider student's skills and requirements can be combined with the appropriate assistive technology to deliver effective e-learning experiences and reliable instructional resources. The Ontology Web Language, a data-handling framework that enables programmers to handle both the substance and the introduction of the data available on the web, was used to generate the metaphysics used in this evaluation [21]. The methodology was designed and implemented to help identify the fundamental problems that may affect students learning to read or write and problems that may then lead to further problems with memory cognition. This strategy was used to assist activists and parents in understanding the issue of dyslexia and to put children on the right path to academic success [22]. Participants, with and without dyslexia, used an online game with language-independent melodic and visual components to communicate in different languages. A total of 178 participants were involved. The analysis revealed nine game measures for Spanish children with and without dyslexia that had significant differences and which could be used in current projects as a justification for speech independent exploration [23]. Quantitative and artificial intelligence-based methods are recommended to instinctually seek innovative and complicated features that consider reliable credentials among dyslexic and control listeners and to support the hypothesis that the majority of differences between dyslexic and talented readers are located on the left side of the brain. Unexpectedly, these devices have also demonstrated how high pass signals carry vital information [24]. 
Their analysis revealed certain remarkable EEG patterns associated with autism, which is a learning disability with a neurological basis. Although EEG signals contain important information about mental processes, understanding these practices is typically indirect because of their intricate nature. This approach identifies the optimal EEG terminals and brain regions for order and the extraordinary EEG signals produced during writing and composition in adults with dyslexia [25]. The central idea is to begin creating code language for scripting matrices by using the Boolean algebra features of the codes and to present two decryption techniques that enable the identification and retrieval of potential faults or rejection [26]. Dynamic subsamples of ocean climate predictions of surface temperature anomalous outliers in the Tasman Sea were enhanced by the employment of reports of extreme sea-surface temperature that derived from the space station's geographical position system. The parameters of an extreme value distribution were predicted using regression analysis on the important marine meteorological data in a probabilistic conceptual structure [27]. Additional or standardized nuclear approaches can be employed to overcome the constraints of current investigations into the original sources of seafood. Cross luminescence and carbon isotope analysis have been used to pinpoint the production method and geographic origin of Asian freshwater fish [28].
Security concerns that develop during earthquake activity and during periods when the threat of earthquake activity is at its peak, should always be handled probabilistically [29]. For this study, the two quantifiable methods for estimating the likelihood of seismic behavior to affect important and relatively low-and mid-rise structures are presented. The non-linear and linear systems separately and simultaneously assess the injury concerns of an inclined plane exposed to uncontrollable shaking and atmospheric threats, respectively. These systems are divided into three parts: danger showcase; underpinning delicacy examination; and destruction likelihood processing [30].
Numerous well-known classification methods, such as decision tree, ANN, logistic regression, and naive Bayes, were examined for one study. Then, bagging and boosting procedures were created to increase the durability of these frameworks. Additionally, random forest was considered when the investigation was evaluated. The best-performing random-forest strategy for disease risk was employed for classification according to the outcomes. Subsequently, a web application for predicting future occurrences was created using this approach. People with a higher chance of developing diabetes were included in the diabetes risk class [31]. Heart rate variability (HRV) information derived from ECG signal data was used for a further investigation. Here, when CNN-LSTM was originally tested with the HRV data, the prediction accuracy was 90.9%. By using CNN-LSTM integration, the accuracy was improved to 95.1%, and by using five-fold cross-validation based on the same data, the efficiency was enhanced to 93.6%. This cross-validation efficiency is the highest currently available for the automatic identification of hypertension [32]. The information was subjected to several machine-learning approaches, and categorization was carried out using a range of strategies, in which regression analysis resulted in the highest accuracy of 96%. With a 98.8% accuracy rate, the AdaBoost classifier was the pipeline's most appropriate predictor. Two independent datasets were used to compare the accuracy of the machine-learning methods. The algorithm clearly enhanced the diabetes prediction accuracy and precision when utilizing this information compared to previous resources [33].
Additionally, the mellitus dataset was used to evaluate the effectiveness of various suggested deep neural networks and machine learning classification techniques. The other methods had an accuracy that is higher than 90%; for instance, the XGBoost classifier achieved a performance of approximately 100.0% [34]. Both cutting-edge methodologies and some well-known machine learning strategies were contrasted with the DNN algorithm. The suggested technique, which is dependent on the DNN technique, delivered impressive outcomes, with an accuracy of 99.75% and an F1-score of 99.66% [35]. Some papers have been published by authors that report the application of SVM, KNN, or other ML tools in biomedical applications [36][37][38][39][40]. Automated medical diagnostic systems can be easily accessed by the general public, especially by those who cannot afford quality medical care. This methodology essentially combines soft and harsh inputs. A wide range of typical symptoms, including fever, headaches, and cough, were considered soft inputs. Each chosen illness was associated with a range of universal symptoms. Images of the tongue were considered hard inputs because doctors frequently utilize them to identify a variety of illnesses. Hard input analysis was split into two stages: chromatic color analysis and statistical analysis based on texture. After being decoded from the hard and soft inputs, the feature vectors were supplied to a neural network to create a classification mode [41].

Research Methodology
A hypothyroidism dataset from the UCI machine-learning repository containing 3163 patient details with 23 independent features and one dependent feature (https://archive.ics.uci.edu/ml/datasets/thyroid+disease, accessed on 12 January 2023) was used, as shown in Equation (1), where HY represents the hypothyroid dataset. We undertook dataset preprocessing and determined its incomplete values. The incomplete data were computed for the hypothyroidism dataset by computing the mean of the input values for each attribute with Equation (2).
Equation (2) expresses the estimation of the null data information, and the attribute scaling of the hypothyroidism dataset is expressed with Equation (3).
where HY is the complete processed dataset without null values. The imputation deviation of features was measured using the average of the estimated variance within the hypothyroidism dataset as shown in Equation (4).
The imputed dataset was estimated with the interval value "Interval" of each feature by finding its variance, as estimated using Equation (5).
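The mean-imputation step described by Equations (2)-(5) can be sketched as follows. This is a minimal illustration with pandas on a toy frame; the column names (e.g. "TSH", "T4U") are placeholders, not taken from the actual UCI file.

```python
# Mean imputation of missing values, then per-feature variance of the
# imputed data. Columns and values here are illustrative placeholders.
import numpy as np
import pandas as pd

df = pd.DataFrame({"TSH": [1.3, np.nan, 2.1, 1.9],
                   "T4U": [0.9, 1.1, np.nan, 1.0]})

# Replace each missing entry with the mean of its column (Equation (2)).
df_imputed = df.fillna(df.mean(numeric_only=True))

# Variance of each imputed feature (Equations (4)-(5)).
feature_variance = df_imputed.var()

print(int(df_imputed.isna().sum().sum()))  # 0 -> no null values remain
```

The same effect can be obtained with scikit-learn's `SimpleImputer(strategy="mean")` when the pipeline is built with that library.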
The overall architecture of the work is shown in Figure 1. The following contributions are provided in this work.

The overall architecture of the work is shown in Figure 1. The following contributions are provided in this work. The complete processed data, including the imputed values, were estimated with the total variance using Equation (6). An exploratory data analysis was performed to analyze all the clinical parameters and the extent to which each feature supports the prediction of hypothyroidism. The number of parameters, the correlation of all variables, and the data type of the characteristics, as given in Equation (7), were evaluated by subjecting the dataset to exploratory prescriptive data analysis.
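The exploratory step above can be sketched with pandas; the tiny frame below stands in for the hypothyroid dataset, and the `target` column is a placeholder for the dependent feature.

```python
# Exploratory data analysis sketch: feature count, dtypes, and the pairwise
# correlation matrix (Equation (7)). The data is an illustrative stand-in.
import pandas as pd

df = pd.DataFrame({"age": [25, 40, 60, 35],
                   "TSH": [1.2, 3.4, 5.6, 2.1],
                   "target": [0, 0, 1, 0]})

n_features = df.shape[1]   # number of parameters
dtypes = df.dtypes         # data type of each characteristic
corr = df.corr()           # Pearson correlation of every variable pair

print(corr["target"])      # how strongly each feature tracks the label
```

In practice `df.corr()` on the full dataset is what a correlation heatmap such as Figure 6 is drawn from.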
As stated in Equations (8)-(10), the dataset was divided into training and testing data with an 80:20 ratio. Python script was used for the implementation by using the Spyder platform and Anaconda navigator.
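The 80:20 partition of Equations (8)-(10) is the standard scikit-learn split; `X` and `y` below are placeholders for the preprocessed features and target column.

```python
# 80:20 train/test split with a fixed seed for reproducibility.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # 100 samples, 2 features (placeholder)
y = np.array([0, 1] * 50)            # placeholder binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

print(len(X_train), len(X_test))     # 80 20
```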
An ANOVA test was carried out to verify the F-statistic values of all features with PR(>F) < 0.05 that highly influence the target. Then, hypothyroidism was predicted using various classifier algorithms, and the performance was analyzed. The original dataset was subjected to normalization in order to make it ready for application of the ANOVA test. This was achieved by using the Box-Cox method from the SciPy statistical package together with NumPy and pandas. The Box-Cox approach transforms and normalizes the data to handle non-normally distributed data. The results obtained from the Box-Cox method are shown in Figure 2.
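The Box-Cox normalization step can be sketched as below. Note that the transform itself lives in SciPy (`scipy.stats.boxcox`) rather than NumPy or pandas, and it requires strictly positive input; the lognormal sample merely stands in for a right-skewed clinical feature.

```python
# Box-Cox normalization of a skewed feature before the ANOVA test.
import numpy as np
from scipy.stats import boxcox, skew

skewed = np.random.default_rng(0).lognormal(size=500)  # right-skewed stand-in

# boxcox fits the lambda that best normalizes the data and applies it.
transformed, lam = boxcox(skewed)

# Skewness shrinks toward 0 after the transform.
print(abs(skew(transformed)) < abs(skew(skewed)))
```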
The original dataset was subjected to dimensionality reduction using the regressor and classifier feature-selection algorithms to determine the best subset components for predicting hypothyroidism. The feature-selected subset of the clinical parameters was subjected to various classifier algorithms, and the performance was analyzed using the specified metrics. The implementation was carried out with Python in the Spyder editor of the Anaconda Navigator IDE. Investigational results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% for the regressor feature-selection methods.
The blunge calibration regression model, as shown in Figure 3, was created with naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of predicted probabilities of classifiers, as shown in Equations (11)-(15).
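One way to realize this soft blend with standard tooling is scikit-learn's `VotingClassifier` with `voting="soft"`, which combines the estimators' predicted class probabilities (the class with the largest summed probability wins). `RidgeClassifier` exposes no `predict_proba`, so it is wrapped in `CalibratedClassifierCV`, which also echoes the calibration theme. This is a hedged sketch on synthetic data, not the authors' exact BCRM implementation.

```python
# Soft-blending sketch of the BCRM estimator trio on synthetic data.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

blend = VotingClassifier(
    estimators=[("gnb", GaussianNB()),
                ("ada", AdaBoostClassifier(random_state=1)),
                # Ridge lacks predict_proba; calibration supplies probabilities.
                ("ridge", CalibratedClassifierCV(RidgeClassifier()))],
    voting="soft")                     # blend by combined class probabilities
blend.fit(X_tr, y_tr)

print(round(blend.score(X_te, y_te), 3))
```

Swapping the estimator list for Kernel SVM (`SVC(probability=True)`), `KNeighborsClassifier`, and calibrated Ridge yields the analogous BCCM-style blend.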
Sensors 2023, 23, x FOR PEER REVIEW
The implementation results show that the Kernel SVM, KNeighbor, and Ridge classifiers maintained an accuracy of 87.5% for the classifier feature-selection methods. The blunge calibration classifier model, as shown in Figure 4, was created with Kernel SVM, KNeighbor, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of predicted probabilities of classifiers, as shown in Equations (16)-(19).

Implementation Setup
The hypothyroid dataset with 3163 rows and 24 feature components from UCI was used for data preprocessing. The dataset information is shown in Figure 5.

Implementation was undertaken with Python on an NVidia Tesla V100 GPU server with 30 training epochs and a batch size of 64. All clinical parameters were analyzed by determining the relationship between each feature and its correlation, as shown in Figure 6.

ANOVA Test Analysis
ANOVA was carried out to analyze those attributes of the dataset with PR(>F) < 0.05 that highly influence the target. ANOVA was applied to the dataset features, and the results show that the features thyroid surgery, pregnant, tumor, and lithium have values of PR(>F) > 0.05 and do not contribute to the target; the results are shown in Table 1.
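The per-feature ANOVA F-test can be sketched with scikit-learn's `f_classif`, which returns the F-statistic and p-value (the PR(>F) criterion) for each column against the class label. The two synthetic columns below stand in for an informative feature and an irrelevant one.

```python
# ANOVA F-test per feature: small p-value -> feature influences the target.
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)                         # binary target
informative = y + rng.normal(scale=0.5, size=300)   # tracks the target
noise = rng.normal(size=300)                        # unrelated feature
X = np.column_stack([informative, noise])

F, p = f_classif(X, y)
print(p[0] < 0.05)   # the informative feature passes the PR(>F) < 0.05 cut
```

Features whose p-value exceeds 0.05 (like thyroid surgery, pregnant, tumor, and lithium in Table 1) would be flagged as non-contributing by the same criterion.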


Results and Discussion
Hypothyroidism was predicted using various classifier algorithms before and after feature scaling, the performances were analyzed, and the results are shown in Tables 2 and 3. The raw dataset was subjected to dimensionality reduction by using the AdaBoost, gradient boosting, extra trees, and random forest regressor feature-selection methods, and the feature importance values of each attribute of the hypothyroidism dataset before and after feature scaling are shown in Tables 4 and 5. The raw dataset was subjected to dimensionality reduction using the AdaBoost, gradient boosting, extra trees, and random forest classifier feature-selection methods, and the feature importance values of each attribute of the hypothyroid dataset before and after scaling are shown in Tables 6 and 7. A feature importance index of all the regressor and classifier feature-selection methods of the hypothyroid dataset, before and after feature scaling, was also compared, and the results are shown in Table 8.
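The importance-based reduction step can be sketched as follows: a tree ensemble scores every column via `feature_importances_`, and the top-ranked columns form the reduced subset. Synthetic data stands in for the hypothyroid features; the choice of three kept features is illustrative.

```python
# Feature-importance ranking with an extra-trees ensemble, then keeping the
# top-3 columns as the reduced subset. Data and the cutoff are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
model = ExtraTreesClassifier(random_state=0).fit(X, y)

importances = model.feature_importances_          # sums to 1 across features
top3 = np.argsort(importances)[::-1][:3]          # indices of strongest features
X_reduced = X[:, top3]

print(X_reduced.shape)                            # (300, 3)
```

The same pattern applies with `AdaBoostClassifier`, `GradientBoostingClassifier`, `RandomForestClassifier`, or their regressor counterparts as the scoring model.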
The feature-selected subset of the AdaBoost regressor was applied to the classifiers, and the performance was analyzed. The results are shown in Tables 9 and 10. The feature-selected subset of the gradient boosting regressor was applied to the classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Tables 11 and 12. The feature-selected subset of the extra trees regressor was applied to the classifiers, the performances before and after scaling were analyzed, and the results are shown in Tables 13 and 14. The feature-selected subset of the random forest regressor was applied to the classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Tables 15 and 16. The performances of all classifiers after reduction with the feature importance of the AdaBoost, gradient boosting, extra trees, and random forest regressors before and after feature scaling are shown in Figures 7 and 8. The feature-selected subset of the AdaBoost classifier was applied to the other classifiers, the performances were analyzed, and the results are shown in Tables 17 and 18. The feature-selected subset of the gradient boosting classifier was applied to the classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Tables 19 and 20. The feature-selected subset of the extra trees classifier was applied to the other classifiers, the performances were analyzed, and the results are shown in Tables 21 and 22.
The feature-selected subset of the random forest classifier was applied to the other classifiers, the performances before and after feature scaling were analyzed, and the results are shown in Tables 23 and 24. The performances of all classifiers after reduction with the feature importance of the AdaBoost, gradient boosting, extra trees, and random forest classifiers before and after feature scaling are shown in Figures 9 and 10. The overall dataset was analyzed with the OLS indicators, such as the p value, R squared, adjusted R squared, parameter coefficient, significance, AIC, BIC, standard error, F-statistic, log-likelihood, residual MSE, model MSE, omnibus probability, and Jarque-Bera probability, for all 255 subset combinations of the features. The following subset includes highly significant features based on the p values, and the parameters are listed in Tables 25-28. Experimental results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% before and after feature scaling for the regressor feature-selection methods. The proposed BCRM was designed with Gaussian naive Bayes, AdaBoost, and Ridge as the estimators and with accuracy optimization using soft blending based on the sum of predicted probabilities of classifiers.
The proposed BCRM showed 99.5% accuracy in predicting hypothyroidism. The implementation results show that the kernel SVM, KNeighbors, and Ridge classifiers maintained an accuracy of 87.5% before and after feature scaling for the classifier feature-selection methods. The BCCM was created with kernel SVM, KNeighbors, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCCM showed 99.7% accuracy in predicting hypothyroidism. The performance of the proposed BCRM was compared with that of the existing classifiers, and the results are shown in Table 29 and Figure 11.
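The soft-blending step underlying the BCRM can be sketched with scikit-learn's soft voting, which takes the argmax over summed class probabilities. This is a minimal sketch on synthetic data, not the paper's exact pipeline; since `RidgeClassifier` has no `predict_proba`, it is wrapped in a probability calibrator here, and the paper's calibration details may differ.

```python
# Sketch: blend Gaussian NB, AdaBoost, and (calibrated) Ridge by
# summing predicted class probabilities and taking the argmax.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import RidgeClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

blend = VotingClassifier(
    estimators=[
        ("gnb", GaussianNB()),
        ("ada", AdaBoostClassifier(random_state=0)),
        # Ridge lacks predict_proba, so calibrate it to emit probabilities.
        ("ridge", CalibratedClassifierCV(RidgeClassifier())),
    ],
    voting="soft",  # argmax over the summed predicted probabilities
)
blend.fit(X_tr, y_tr)
acc = blend.score(X_te, y_te)
print(round(acc, 3))
```

The BCCM follows the same pattern with kernel SVM (with `probability=True`), KNeighbors, and calibrated Ridge as the estimator list.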


Conclusions
This paper aimed to predict the existence of hypothyroidism based on an analysis of the features required for classification. The ANOVA test was utilized to identify the significant features that predict the target variable. This paper also applied regressor and classifier feature-selection algorithms to reduce the dataset to its significant features. The dataset was also examined with OLS performance indicators to identify the best subset of features based on p values. The feature subset ['TSH_measured', 'T4U_measured'] has an R-squared value of 0.938, which is close to the ideal value. The implementation was carried out in Python in the Spyder editor with the Anaconda Navigator IDE. Experimental results show that the Gaussian naive Bayes, AdaBoost, and Ridge classifiers maintained an accuracy of 89.5% before and after feature scaling for the regressor feature-selection methods. The BCRM was developed with Gaussian naive Bayes, AdaBoost, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCRM showed 99.5% accuracy in predicting hypothyroidism. The implementation results show that the kernel SVM, KNeighbors, and Ridge classifiers maintained an accuracy of 87.5% before and after feature scaling for the classifier feature-selection methods. The BCCM was developed with kernel SVM, KNeighbors, and Ridge as the estimators, with accuracy optimization using soft blending based on the sum of the predicted probabilities of the classifiers. The proposed BCCM showed 99.7% accuracy in predicting hypothyroidism. As an overview of the novelty, the BCCM and BCRM models were built to optimize accuracy with soft blending based on the sum of the predicted probabilities of the classifiers.
The uniqueness of the BCRM and BCCM models is achieved by updating the estimator list at runtime with the effective classifiers and regressors that suit the application. Despite the outstanding performance of the BCRM and BCCM models, it remains difficult for researchers to tune the model hyperparameters by combining them with other optimizers and statistical loss functions.