A Novel Approach for Data Feature Weighting Using Correlation Coefﬁcients and Min–Max Normalization

Abstract: In the realm of data analysis and machine learning, achieving an optimal balance of feature importance, known as feature weighting, plays a pivotal role, especially when considering the nuanced interplay between the symmetry of the data distribution and the need to assign differential weights to individual features. Avoiding the dominance of large-scale features is also essential in data preparation, which makes choosing an effective normalization approach one of the most challenging aspects of machine learning. In addition to normalization, feature weighting is another strategy for handling the varying importance of features. One strategy to measure the dependency between features is the correlation coefficient, which expresses the strength of the relationship between them. The integration of normalization with feature weighting in data transformation for classification has not been extensively studied. The goal is to improve the accuracy of classification methods by striking a balance between the normalization step and assigning greater importance to features strongly related to the class feature. To achieve this, we combine Min-Max normalization with feature weighting, increasing feature values based on their correlation coefficients with the class feature. This paper presents the proposed Correlation Coefficient with Min-Max Weighted (CCMMW) approach, in which the normalized data depend on their correlation with the class feature. Logistic regression, support vector machine, k-nearest neighbor, neural network, and naive Bayes classifiers were used to evaluate the proposed method, together with twenty numerical datasets from the UCI Machine Learning Repository and Kaggle. The empirical results show that the proposed CCMMW significantly improves classification performance with the support vector machine, logistic regression, and neural network classifiers on most datasets.


Introduction
Data preprocessing is one of the most crucial steps in machine learning. Using appropriate methods to prepare raw data correctly positively affects the output and improves a model's performance [1,2]. The data preparation stage includes many tasks, such as removing outliers and noise, integrating data from diverse sources, dealing with missing data, and transforming data into a scale suitable for analysis. This stage aims to eliminate the impact of systematic sources of variation as much as is feasible [3,4]. While data scaling is considered a treatment for domination issues in most cases, it has the reverse effect when it gives the same contribution to a feature that is irrelevant to the target feature. In this case, using feature weighting to select essential features is one solution, and using it to balance a feature based on its importance is another. In many data-mining algorithms, such as neural networks, clustering, and distance-based algorithms, data rescaling is an essential stage prior to training, both to avoid misleading results and to speed up the learning process [5][6][7]. To prevent numeric features from dominating other numeric values, transforming all feature values into a standard range gives equal importance to all features [4,6]. Many research findings suggest that data normalization and standardization substantially impact classification and clustering results [8]. The effect of data normalization on the results of many machine learning (ML) methods has been studied, such as its effect on classification accuracy [9,10], neural network training [11,12], the clustering process [8], and outlier detection methods [13]. When data have been normalized, all features contribute equally to the ML results, but this does not mean that the features are equally important; in some cases, data contain many irrelevant and redundant features.
Ref. [14] proposed the Adaptive Distinct Feature Normalization Strategy, which enhances the fusion of sparse topics and deep features by adaptively normalizing them separately. This results in a clearer feature description and decreases confusion in complex scenes. The presence of undesired features makes learning difficult and increases the feature space. Generally, normalization gives all features the same contribution to a model, but removing any dominant features may reduce the model's performance. In this case, the behavior of the data plays a vital role in the performance [15].
In the same context, feature weighting is one of the essential stages, since it adjusts feature values based on their contribution to the model and the result [16]. Many variable importance measurements have been presented and categorized into sub-categories based on the techniques used. One of those techniques is the correlation coefficient, a parametric regression technique. A different strategy is the nonparametric one, which includes multiple techniques; one example is Random Forest [17]. Many researchers have used variable importance measurement strategies to enhance classifier performance, such as in [18], with naive Bayes text classifiers [19,20], with the fuzzy clustering method, with feature weighting for neural networks [21,22], and with SVMs [23]. Feature weighting has also been used as a feature selection strategy to determine the influence of features on the results and then exclude irrelevant, redundant features [24][25][26], as has the information gain attribute [27].
In contrast, the correlation coefficient is a measurement tool used to look for relationship patterns between different characteristics [28]. As a bivariate measure, it captures both the degree of association between two variables and the direction of the relationship [29]. In this context, the correlation coefficient can be considered a variable importance method among parametric regression techniques when used to indicate linear dependence [17]. Various correlation coefficient formulas have been presented, depending on the data type (numerical, ordinal, or nominal) and the type of correlation (linear or nonlinear relationship) [30]. Pearson's correlation coefficient measures the linearity of a correlation [31] for numerical data only. Although the correlation coefficient has mainly been used in statistical analysis to discover relationships among variables, there are many other uses for correlation metrics, especially in data mining. For example, correlation coefficients have been used for feature selection [32][33][34], missing data imputation methods [35][36][37], and feature quality measurement to find the best splitting features and points of decision trees [38].
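To make this concrete, Pearson's correlation coefficient for two numeric vectors can be computed directly from its definition; the following is a minimal sketch (the helper name pearson_r is ours, not from the paper):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# A perfectly linear increasing relationship yields r = 1.0,
# and a perfectly linear decreasing one yields r = -1.0.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # -> 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -> -1.0
```

The sign captures the direction of the relationship and the magnitude its strength, which is why the absolute value is what matters when the coefficient is reused as a feature weight.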
Normalization is a preprocessing step applied to data to give features equal importance and prevent the domination of a few features. The impact of normalization methods has been studied in many works [39][40][41]. The Min-Max normalization method is one of the normalization methods found to improve classifier performance [42,43]. Data standardization has also produced better outcomes in neural network training, though the benefit decreases with increasing network and sample sizes [11]. Choosing an appropriate normalization method is essential, as it affects the performance of supervised learning algorithms [44]. Ref. [40] examined the impact of four normalization techniques on forecasting and found that standardization calculations are highly sensitive, so their outcomes must be handled cautiously. The impact of normalization on the performance of different methods, such as outlier detection, violent video classification, backpropagation neural networks, and classification, has been studied, and it was found that the effect of normalization depends on both the data and the technique used [13,45]. Due to the importance of normalization, Ref. [46] proposed a new normalization method that deals only with integer values. The authors of [47] proposed two normalization methods, a "new approach to Min-Max" and a "new approach to Decimal Scaling," to reduce the impact of large feature values and represent them in a small range; these methods are based on the Min-Max and Decimal Scaling normalization techniques. The study performed well with the k-means clustering method, but only that method was used for evaluation.
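For reference, Min-Max normalization maps each feature linearly from its observed [min, max] into a target range [new_min, new_max]. A minimal sketch (our own helper, which also guards against constant columns) might look like:

```python
import numpy as np

def min_max_normalize(X, new_min=0.0, new_max=1.0):
    """Rescale each column of X linearly into [new_min, new_max]."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid divide-by-zero
    return (X - col_min) / span * (new_max - new_min) + new_min

# Columns on very different scales end up in the same (0, 1) range.
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])
print(min_max_normalize(X))  # -> [[0. 0.], [0.5 0.5], [1. 1.]]
```

After this transformation, both columns contribute on the same scale regardless of their original magnitudes, which is exactly the equal-contribution property discussed above.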
At the same time, feature weighting is another effective strategy for preparing data to improve a method's accuracy. Feature weighting methods are strategies for correlation analysis and dimension reduction, where the weight represents the contribution of each feature dimension to classification [48]. Ref. [49] proposed feature weighting methods based on similarity calculations using k-means, fuzzy c-means, and mean shift clustering. The weight of each feature is obtained by calculating the difference between the original data and the cluster centers, dividing by the mean of the differences, and multiplying by the feature value in the dataset.
Ref. [50] proposed a support vector machine (SVM) with an information gain approach to improve fraud detection accuracy in card transactions. The approach normalizes the data using Min-Max normalization and reduces the features through information gain-based feature selection; discretization is carried out before normalization. Ref. [51] proposed a two-stage strategy combining normalization and supervised feature weighting using the Pearson correlation coefficient (PCC) and Random Forest Feature Importance estimation. The first stage normalizes the data using standardization, while the second stage calculates feature weights through the PCC and Random Forest Feature Importance and multiplies each feature value by its weight. Ref. [52] proposed a dynamic feature-weighting algorithm that addresses multi-label classification by minimizing an objective function that brings together samples with the same label and separates samples with different labels; the weight function is used to evaluate the method with a multi-label classifier. Ref. [53] proposed a Correlation-Based Hierarchical k-Nearest Neighbor Imputation (CoHiKNN) method that weights the distance based on the correlation between the features and the label. The correlation coefficient is used to weight the distance and impute missing values using k-NN; the weighting strategy multiplies each difference between two points by the correlation coefficient value of the feature. Ref. [54] proposed a hybrid data-scaling method combining Min-Max and Box-Cox to improve fault classification performance in compressors. The Min-Max method rescales the data to one range, while Box-Cox transforms non-linear distributions into normal ones.
However, although it is essential to normalize the data and give all features the same contribution to counter any feature's dominance over the model's results, in many cases the importance of variables differs from one feature to another. Normalization also has limitations, such as destroying the original structure of the dataset [50]. In addition, using feature weights to select important features has the limitation that the selected features still contribute equally. Also, using a subset of the dataset can degrade the quality of the correlation coefficient due to the low number of instances, affecting the correctness of the weight assigned to each feature [53].
In this work, we balance the importance of normalization against the varying importance of features: we reduce feature domination through data standardization while maximizing each feature's contribution by increasing its values based on its correlation coefficient. A maximization limit parameter is proposed to control the new range of feature values. We investigate the influence of normalization on the classification methods in two phases. First, we use the un-normalized (raw) data and data normalized with one normalization method. In the second phase, the normalized data are weighted depending on the association between each feature and the class feature. This step calculates the correlation coefficient between the class feature and each of the remaining features and increases the contribution of each feature based on its relationship with the class: features with higher correlation experience a greater increase than features with lower correlation. This weighting strategy significantly influences model construction, directing more attention towards features exhibiting high correlation. Features with strong correlation can potentially increase the model's accuracy and carry more power in classification algorithms.

Materials and Methods
A new weighting strategy is proposed that uses the correlation between features for data representation; the new values are used to increase the contribution of the data. Each value is maximized depending on the feature's actual value and its Pearson correlation coefficient. Calculating the correlation between the class and each of the remaining features is the initial step of weighting the feature values, and the new maximum values are calculated from the actual values and the correlation coefficients. First, all features are normalized using Min-Max (0,1) to give the same contribution to all features; their values are then increased using their correlation coefficients. The ensuing maximization step derives new maximum values from both the original feature values and their corresponding correlation coefficients. This dual-factor approach ensures that the augmented values strike a balance between the inherent importance of features and their contextual significance, enhancing the robustness of the data representation: it maximizes the impact of each feature while capturing the relationships within the dataset.
Whereas the Min-Max (0,1) normalization method represents all features in the same range (0-1), the proposed method modifies this concept so that each feature has its own range. If we assume that f1 is a feature with a correlation of 0, all of its values become the same value, because it has no correlation with the class label. On the other hand, a feature with a correlation of 1 with the class label reaches the new maximum value. The maximum value is determined by the coefficient of change (C) and the correlation value R. The relationship between the coefficient of change and the maximum value is direct: as the coefficient increases, the maximum value also increases, and vice versa. The maximum possible value of each feature corresponds to a correlation value of 1 (strong correlation) for that feature; a strong correlation is the only case in which the new values can attain the maximum values.

Datasets and Experiments
The proposed method calculates the correlation among features using the Pearson correlation coefficient (PCC) (Equation (1)) and Min-Max normalization (MMN) (Equation (2)). new_min and new_max denote the updated range of feature values, with new_min set to 0 and new_max set to 1 in the context of Min-Max scaling (0,1). The new weighted value is created using Equation (3).
Symmetry 2023, 15, 2185
The symbols used are defined in Table 1. Algorithm 1 explains the steps of the proposed CCMMW method:
Algorithm 1 (CCMMW):
1. CCoT: calculate the correlation coefficient of each feature with the label feature, corr(Vj, VTarget), using the PCC method as in Equation (1).
2. Calculate No-Data using the MMN normalization method, Equation (2), where n is the number of features in the data.
3. Calculate CCMMW-Data using the proposed method as in Equation (3).
Assume a dataset (D) with six features (F1 to F6), where D is the Min-Max (0,1) normalized data. As a result of the MMN method, each feature's smallest value is represented as 0 and its maximum value as 1, as in the original normalized Min-Max (0,1) data in Table 2. The correlations over the complete dataset are shown in Table 3, where each correlation coefficient (CC) value represents the correlation between the feature and the class label feature. Table 2 shows three sets of values as the results of CCMMW: CCMMW(1) is produced with a C parameter value of 1, CCMMW(5) with a value of 5, and CCMMW(10) with a value of 10.
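Since Equations (1)-(3) themselves are not reproduced in this excerpt, the following is a minimal sketch of Algorithm 1 under one plausible reading of Equation (3): each Min-Max normalized value is scaled by C * |r|, so a zero-correlation feature collapses to a constant and a feature with |r| = 1 spans up to the new maximum C, matching the behavior described above. The function name and this exact weighting formula are our assumptions, not the paper's verbatim implementation.

```python
import numpy as np

def ccmmw_transform(X, y, C=5.0):
    """Sketch of CCMMW: Min-Max (0,1) normalize, then weight each feature by
    C * |r|, where r is its Pearson correlation with the class feature (assumed
    reading of Equation (3))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1 (CCoT): correlation of each feature with the label feature.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    r = np.nan_to_num(r)  # constant columns yield NaN correlations
    # Step 2 (No-Data): Min-Max (0,1) normalization.
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    X_norm = (X - col_min) / span
    # Step 3 (CCMMW-Data): stretch each feature towards C based on its correlation.
    return X_norm * (C * np.abs(r))

X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
# Each feature's new maximum is C * |r|, so no value ever exceeds C.
print(ccmmw_transform(X, y, C=5.0).max(axis=0))
```

Under this reading, strongly correlated features occupy wide per-feature ranges (up to C) while weakly correlated ones stay near zero, which is the per-feature-range behavior the method intends.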
As shown in Figure 1, the algorithm aims to improve the accuracy of classification methods by applying normalization and correlation-based weighting to the input data. It starts by calculating the correlation coefficient between each feature and the target feature using the PCC method. It then normalizes the data using the Min-Max method to scale the values between 0 and 1. Next, it calculates the correlation-weighted normalized data for each feature, incorporating the correlation coefficients obtained. The algorithm outputs the correlation coefficients with the target feature, the normalized data, and the correlation-weighted normalized data. By normalizing the data and incorporating feature-target correlations, the algorithm aims to enhance classification accuracy by giving more importance to relevant features in the classification process.
The general research methodology is presented in Figure 1; two strategies are used to transform the data prior to new data creation. One transforms the data using existing and new normalization methods. The second uses the CC values to weight the features when representing the data, and then uses the new data to improve the accuracy of classification approaches. The performance of the proposed method is then compared against un-normalized and normalized data with different classifiers.
Experiments were conducted on twenty datasets comprising both discrete and continuous features to assess the proposed method's consistency. These datasets were selected from reputable sources, the UCI Machine Learning Repository and the Kaggle repository, which are renowned for their diverse and well-curated datasets and are valuable resources for empirical studies in this field. Table 4 provides an overview of the datasets. Various data sizes are considered, with differing numbers of instances and features used in the study.
In this experiment, to analyze the effect of weighting features based on their CC on the classification results, comparisons are made between the un-normalized (RAW) data, the normalized Min-Max (0,1) (MM) method, the Two-Stage Min-Max with Pearson (2S-P) method, the Two-Stage Min-Max with RF Feature Importance (2S-RF) method, the New Approach Min-Max Normalization (NAMM) method, and the proposed CCMMW approach. All code was written in Python 3.7 using Jupyter (Anaconda 3), and the scikit-learn library was used to implement all methods in this study. Five classification methods were used to investigate the impact on performance and the improvement of the proposed method: the logistic regression (LR), SVM, k-nearest neighbor (k-NN), neural network (NN), and naive Bayes (NB) classifiers. All experiments were performed on numerical datasets. In the evaluation stage, the effect of using the CC to weight the normalized values was evaluated based on the type of algorithm applied to the data; for classification, the results were evaluated based on the accuracy rate. Ten-fold cross-validation was used, the experiment was repeated ten times, and the results were averaged for further analysis. For the k-NN classifier, k was set to 3 neighbors. For the SVM, RBF was used as the kernel, and all other settings used the default values.
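The evaluation loop described above can be sketched as follows with scikit-learn. The dataset here is a stand-in (the library's built-in breast cancer data, not one of the paper's twenty), the paper's ten repetitions are reduced to a single 10-fold run for brevity, and the CCMMW transformation step is omitted; classifier settings follow the text (k = 3 for k-NN, RBF kernel for the SVM, defaults elsewhere).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # stand-in numeric dataset

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),                  # RBF kernel, other settings default
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "NN": MLPClassifier(max_iter=500),
    "NB": GaussianNB(),
}

# 10-fold cross-validated accuracy per classifier (the paper repeats this
# ten times and averages; a single run is shown here).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accuracies = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    accuracies[name] = scores.mean()
    print(f"{name}: {accuracies[name]:.4f}")
```

In the paper's protocol, the same loop would be run once per data preparation method (RAW, MM, 2S-P, 2S-RF, NAMM, CCMMW) so that the accuracy columns of Tables 5-9 can be compared.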

The Evaluation of Different Classifiers Based on the Best Result of CCMMW
The best results of CCMMW among the 10 values available in the experiment are shown in Figure 2. The ten results correspond to the settings of the C parameter, which we used to adjust the new range of the data. The impact of the various values of C is discussed in a later section.
In the following results, the classification accuracy is presented. Bold values are the highest out of the six methods, and the CCMMW values are underlined when they are the highest after the raw values.

Logistic Regression (LR) Classifier
Table 5 shows that the CCMMW method obtained the highest accuracy in 12 out of the 20 datasets, and in an extra four datasets if the raw data are excluded. The best accuracy improvement using the CCMMW + LR classifier compared with the MM method was 5.46%, obtained on the Letter dataset: the accuracy of MM was 71.46%, whereas CCMMW gave 76.92%. Although the raw data gave the best accuracy in this case, the proposed method was still better than the other strategies. The accuracy decreased in three cases out of twenty, with the worst change being −1.24%.

As in Figure 2a, a comparison between the proposed method and the MM normalization method shows that the proposed method excelled in most cases: it outperformed MM on 17 datasets in this study, while MM was better in only three cases, with an advantage that did not exceed 1.24%. In addition, 2S-P outperformed in three cases while CCMMW outperformed in 17, with a large difference on the Letter dataset, where 2S-P reached 49.96% while CCMMW reached 76.92%. When comparing the results with 2S-RF, we find that, as with 2S-P, the proposed method outperformed 2S-RF on 17 datasets; 2S-RF was better only on the Breast Cancer 1, German, and Blood datasets. The last compared method is the NAMM, where the proposed method performed well on all datasets except German, where the NAMM result was 73.53% while that of CCMMW was 72.56%. Only three of the differences between the NAMM and CCMMW were below a 1% improvement, while all other improvements were between 1.27% and 26.78%, the latter obtained with Spam and Letter.

Support Vector Machine Classifier
Table 6 shows that CCMMW obtained the highest accuracy in 16 out of the 20 datasets, and in an extra four datasets when the raw data are excluded, when building models based on correlation-weighted features. When comparing CCMMW + SVM with the MM normalization method, as in Figure 2b, all of the CCMMW results outperformed the MM results. The minimum increase, 0.38%, was obtained with the German dataset, while the best result was with Breast Ca Coimbra, which increased by 19.64%. CCMMW also outperformed 2S-P on all datasets. Some large differences were observed, such as 76.22%, 50.15%, 43.50%, and 27.49% with the Letter, Vehicle, Ecoli, and Sonar datasets, due to the poor performance of 2S-P with the SVM. The same was observed with 2S-RF, where some large accuracy differences were obtained compared with CCMMW; due to the poor performance of 2S-RF with the SVM, the difference reached 77.12% with the Letter dataset, while the smallest difference with 2S-RF was 0.49%. Finally, all CCMMW results outperformed the NAMM results. Overall, CCMMW + SVM outperformed the other data preparation methods, increasing the accuracy compared with the rest of the normalization and feature weighting methods, and the results on all datasets showed an improvement in accuracy. Only the raw data and CCMMW produced the best results across all datasets. The performance of CCMMW with the SVM increased significantly in several cases: by 19.64%, 14.24%, 11.97%, and 11.18% with Breast Ca Coimbra, Vehicle, PARKINSON, and Monkey, respectively.

k-Nearest Neighbor (k-NN) Classifier
Table 7 shows that CCMMW obtained the highest accuracy in only two out of the 20 datasets, and in an extra two datasets if the raw data are excluded. As shown in Table 7 and Figure 2c, comparing the performance of CCMMW with the k-NN classifier against MM shows that CCMMW improved only on the QSAR, Liver, Breast Ca Coimbra, and Bupa datasets; on the other datasets, the accuracy obtained by CCMMW was lower than that of MM. Compared to 2S-P and 2S-RF, CCMMW outperformed on only six and eight datasets, respectively. The NAMM was the only method that CCMMW clearly outperformed, doing so in 14 out of 20 datasets. Overall, of the two best results of the CCMMW + k-NN classifier, one was with the Breast Ca Coimbra dataset, while the improvement with QSAR was 0.91%. Another two improvements compared with the other methods (except for the raw data) were on the Liver and Bupa datasets, reaching 3.72% and 3.63%, respectively. As seen in Figure 2c, the performance of CCMMW with the k-NN method showed no improvement in general, and the results obtained by CCMMW were among the worst in most comparisons.

Neural Network (NN) Classifier
The first thing we observe from Figure 2d is that the raw data did not produce any of the best values on any dataset. Table 8 shows that improvements were obtained with most of the datasets, with the accuracy increasing on 15 datasets. The best improvement when using CCMMW with the NN classifier was 9.91% with the Letter dataset, while the second-best improvement was with Breast Ca Coimbra. The worst decrease with the NN classifier was 7.30%, where the accuracy dropped from 84.91% to 77.61%. Compared with the 2S-P method, the results show that CCMMW outperformed 2S-P, producing higher accuracy on 16 datasets. As with 2S-P, the comparison between 2S-RF and CCMMW favored the proposed method, which again outperformed on most of the datasets. Although the NAMM outperformed CCMMW on three datasets (Monkey: 2.62% higher; Hearts Cleveland: 0.24% higher; and Blood: 0.32% higher), CCMMW performed as expected on the rest of the datasets. Some of the performance differences were large, such as 24% with Wholesale, 23.78% with Sonar, and 22.39% with Vehicle. Overall, we can conclude that CCMMW performed better when used with the NN classifier.

Naive Bayes Classifier
Table 9 shows that CCMMW obtained the highest accuracy in 10 out of the 20 datasets. Compared with the raw data and the other normalization methods, as in Figure 2e, CCMMW produced high accuracy, outperforming the others. Ten datasets gave high accuracy when using CCMMW with the NB classifier, while MM was better on only two datasets, Wine and Magic. The best accuracy increase for CCMMW with the NB classifier was 2.44% with Blood. Weighting features using 2S-P produced higher accuracy than CCMMW on eight datasets, while 2S-RF outperformed on eleven. The same happened with the NAMM, which obtained the two best accuracies overall: 97.86% with the Musk dataset and 79.86% with the Glass dataset.

Discussion
For an overall discussion, we calculated the average accuracy of the methods with the twenty datasets.
As shown in Figure 3, the average accuracy is presented as an indicator of the accuracy results across the various datasets, and Table 10 shows the average accuracy in numbers. With logistic regression, CCMMW outperformed the other preparation methods, giving the best results in 12 out of the 20 datasets, and in an extra four datasets when the raw data are excluded. With the second method, the SVM classifier, there is a clear improvement in accuracy when using the CCMMW strategy: 16 out of the 20 datasets gave the best accuracy with the SVM, with an extra four datasets giving the best accuracy when the raw data are excluded. For the k-NN classifier, the proposed method's accuracy was slightly lower than the others. For the NN classifier, the results still show a good improvement in the accuracy of CCMMW, which outperformed the other methods in 15 out of 20 datasets. Finally, although the averaged results place the proposed method fourth-best, this is in part due to the small differences between the results; its accuracy on 11 datasets outperforms all other methods.

The Effect of C Parameter on the CCMMW Results
As we see in Figures 4a-e and 5a-e, the primary role of the C parameter is to adjust the range of the data: since the relationship between C and the maximum value of the new range is direct, the range increases as the value of C increases.

As Figures 4a-e and 5a-e show, the primary role of parameter C is to adjust the range of the data: the maximum value of the new range is directly proportional to C, so the range widens as C increases.

As Figure 4a shows, the best results of CCMMW with the LR classifier were obtained with a high C value of 10. Most of the best results (around 80%) were obtained with C values greater than 5, and only three of the best results were obtained with C values below 5. This means that widening the range of values by increasing the C parameter positively impacts the results with the LR classifier.

As with LR, CCMMW with a high C value gave good results with the SVM classifier. As shown in Figure 4b, most of the best SVM results were obtained specifically with a C value of 10, accounting for 67% of the results, as shown in Figure 5b. Only on the German and Wholesale datasets was the best accuracy reached with a C value of 1, and in some cases multiple C values produced the same result. Overall, increasing the C parameter positively impacted the SVM, with accuracy rising across all results.

Although the results of CCMMW with the k-NN classifier did not outperform the other classifiers, its impact being the weakest among them, the best results with CCMMW were again obtained with a high C value of 10, which represents 80% of the top results, as seen in Figure 5c. All of the higher results were obtained with C values of 10, as seen in Figure 4c.

In contrast to the previous classifiers, most of the good results of the proposed method with the NN were obtained using C values of 5 or less, around 62% of the higher results, as seen in Figure 5d. Although some of the higher CCMMW accuracies were obtained with C values greater than 5, the best results of the other methods with the NN used C values below 5. We conclude that the effect of C on the NN is not clearly impactful, meaning that adjusting the range toward higher possible values is not the best approach here, as seen in Figure 4d.

Figure 5e illustrates that higher accuracy was mostly achieved with small C values. For the naive Bayes (NB) classifier, the optimal result was obtained at C = 1, accounting for 16% of the best results, whereas C values of 2, 3, 4, and 5 each accounted for 12%, a cumulative total of 64%.

According to the experimental results, data normalization has the potential to produce prediction models with the highest prediction accuracy. However, comparing models trained on normalized and non-normalized data shows that the improvement in accuracy depends on several factors, such as the normalization method used, the type of data, and the classification method.
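The direct relation between C and the maximum of the new range can be illustrated with a small sketch. The formula used here, new maximum = 1 + C·|CC|, is our assumption of how a correlation-weighted upper bound might behave; the paper's exact expression may differ.

```python
# Illustration only: assumed upper bound of a feature's range after
# correlation-weighted Min-Max rescaling, growing linearly with C.
def new_range_max(cc, C):
    """Assumed new maximum, 1 + C * |cc|, for a feature whose
    correlation coefficient with the class feature is cc."""
    return 1 + C * abs(cc)

# A feature strongly correlated with the class (cc = 0.8) receives a
# progressively wider range as C increases:
for C in (1, 5, 10):
    print(C, new_range_max(0.8, C))
```

Under this assumption, a feature with no correlation to the class keeps the plain Min-Max range [0, 1], while strongly correlated features are stretched the most.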

Conclusions
Our experiments examined the impact of maximizing the role of features based on their correlation. The effect of combining data normalization with feature weighting is apparent. Although normalization improves results by giving features an equal contribution, avoiding the elimination of features with high-range values, the weighted features outperform those methods by increasing the contribution of features according to a dependency measure such as the correlation coefficient (CC).

In this study, we presented a novel feature weighting with normalization method named the Correlation Coefficient Min-Max Weighted (CCMMW) method. The relation between correlation and normalization has received little attention in the literature. We used the CC values to give features with a strong association with the class a larger contribution to the learning step of ML methods, in order to improve their performance. We evaluated the effect of CCMMW using the LR, SVM, k-NN, NN, and NB classification methods. The performance improvement of the SVM, NN, and LR classifiers was clear, as most results showed increased accuracy. Only the k-NN classifier's accuracy results were unsatisfactory: the proposed method outperformed the Min-Max normalization method on only 40% of the datasets, and the other normalization and weighting methods outperform CCMMW in most cases. Also, adjusting the upper limit of a feature's maximum value plays the main role in reaching the best result of the proposed method.

In future work, various normalization methods could be explored, as each offers distinct advantages that may affect the outcome of data transformation. Additionally, incorporating alternative weight measurement methods may enhance the significance of features in constructing classification models, potentially leading to improved accuracy.

Algorithm 1. CCMMW: normalization and weighting of data using Min-Max [0, 1] and CC to improve the accuracy of classification methods.
Input: Un-Data (un-normalized data).
Output: CCoT (correlation coefficients with the label feature), normalized data (No-Data), correlation-weighted normalized data (CCMMW-Data).
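The input/output description above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact weighting rule is an assumption, here taken as Min-Max normalization to [0, 1] followed by stretching each feature's range by 1 + C·|CC|, where CC is the feature's Pearson correlation with the label.

```python
# Hedged sketch of a CCMMW-style transform: Min-Max normalize each
# feature, then rescale it in proportion to its absolute correlation
# with the label. The weighting formula 1 + C * |CC| is an assumption.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def ccmmw(columns, labels, C=10):
    """Transform each feature column: Min-Max to [0, 1], then stretch
    the range by the assumed weight 1 + C * |CC with the labels|."""
    out = []
    for col in columns:
        lo, hi = min(col), max(col)
        normed = [(v - lo) / (hi - lo) for v in col]  # Min-Max to [0, 1]
        w = 1 + C * abs(pearson(col, labels))         # assumed weighting
        out.append([v * w for v in normed])
    return out
```

With this rule, a feature perfectly correlated with the label and C = 10 ends up on the range [0, 11], while an uncorrelated feature stays on [0, 1], matching the intent that strongly associated features contribute more to the learning step.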

The Evaluation of Different Classifiers Based on the Best Result of CCMMW

The best results of CCMMW among the 10 available values in the experiment are shown in Figure 2. The ten results correspond to the settings of the C parameter, which we used to adjust the new range of the data. The impact of the various values of C is discussed later in this discussion.
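Selecting the best CCMMW result per classifier amounts to taking the maximum accuracy over the candidate C settings. A minimal sketch of that selection, where `evaluate` is a hypothetical stand-in for training and scoring a classifier on CCMMW-transformed data:

```python
# Sketch of picking the best C setting: score each candidate C with a
# user-supplied evaluation function and return the highest-scoring one.
def best_c(evaluate, c_values=range(1, 11)):
    """Return (best_C, best_accuracy) over the candidate C settings."""
    scores = {c: evaluate(c) for c in c_values}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy accuracy curve that improves with C, peaking at C = 10:
best, acc = best_c(lambda c: 0.7 + 0.01 * c)
```

In the actual experiments, `evaluate` would run cross-validated classification on the CCMMW-transformed dataset for each C; the toy lambda is only for illustration.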

Table 2. CC values among features and the label feature using the proposed weight feature.

Table 3. CC among features and the label feature.

Table 5. Performance of CCMMW on the logistic regression classifier.

Table 6. Performance of CCMMW on the SVM classifier.

Table 7. Performance of CCMMW on the k-NN classifier.

Table 8. Performance of CCMMW on the neural network classifier.

Table 9. Performance of CCMMW on the naive Bayes classifier.

Table 10. Average classification accuracy of each data preparation method.
