Kano Model Integration with Data Mining to Predict Customer Satisfaction

The Kano model is one of the models that help determine which features must be included in a product or service to improve customer satisfaction. The model is focused on highlighting the most relevant attributes of a product or service along with customers’ estimation of how the presence of these attributes can be used to predict satisfaction about specific services or products. This research aims to develop a method to integrate the Kano model and data mining approaches to select relevant attributes that drive customer satisfaction, with a specific focus on higher education. The significant contribution of this research is to solve the problem of selecting features that are not methodically correlated to customer satisfaction, which could reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. Questionnaire data were collected from 646 students from UAE University. The experiment suggests that XGBoost Regression and Decision Tree Regression produce best results for this kind of problem. Based on the integration between the Kano model and the feature selection method, the number of features used to predict customer satisfaction is minimized to four features. It was found that ANOVA features selection model’s integration with the Kano model gives higher Pearson correlation coefficients and higher R2 values.


Background
It is important for any corporation to maintain its existing customers, make a profitable share, and improve the profit margins. Corporations need to satisfy their customers' needs and go even beyond [1]. The client's satisfaction can be considered as one of the significant aspects that play a big part in the success or failure of any business [2]. Therefore, companies endeavor to meet and exceed customers' expectations to gain their loyalty. An unhappy customer is a critical and challenging problem that can adversely affect the business; he/she could lead to a 'churn' of the customer, which could result in the failure of the business [3]. It is much more rewarding to keep current customers loyal and happy than getting new ones. As a result, customer satisfaction prediction has become a very important concept in the business world. The concept is considerably attracting the interests of both academic researchers and businesses [4].
Integrating the Kano model with data mining techniques could improve the selection of relevant characteristics that drive customer satisfaction [5]. The Kano model can provide a precise classification of the requirements of customers, such as excitement, performance, basic, neutral, indifferent, or reverse factors [6]. On the other hand, data mining techniques do not only rank the attributes according to their importance, but they continue to use all possible variations of interaction patterns from all variables [7]. So, combining both approaches will take advantage of both.
A brief explanation of Kano's five categories is given here. The first category is excitement. It is the quality characteristics that make customers satisfied if present, but do not 2 of 18 make customers unsatisfied when absent, while the opposite situation is defined by mustbe quality characteristics [8]. On the other hand, one-dimensional quality characteristics cause customers to be satisfied when present, but dissatisfied when absent, while reverse quality characteristics have the opposite effects. Finally, apathetic quality characteristics do not affect customer satisfaction at all.
The Kano model could help managers better understand the requirements of customers [9]. The Kano model moves from a "more is always better" approach to a "less is more" approach, so adding one feature could be much better than adding many features, which could result in having an opposite effect on enhancing customers' satisfaction. On the other hand, clustering the customers to different segments using data mining techniques will allow the Kano model to improve the satisfaction for each segment. In addition, comparing both approaches could support selection decisions and avoid removing attributes that could cause information loss [10].

Aim
This research aims at developing a method to integrate the Kano model and data mining approaches to select relevant attributes that drive customer satisfaction with a specific focus on the higher education field. It also intends to apply data mining and feature selection techniques to predict customer satisfaction [11].

Method Statement
The main contribution of this research is to solve the problem of selecting features that are not methodically correlated to customer satisfaction. This could reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. This research studies the degree of correlation between customer satisfactions and attributes [12]; in the context of customer satisfaction, how can customer satisfaction be improved by integrating the Kano model with data mining techniques to select relevant attributes that drive customer satisfaction and reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. Figure 1 represents the research's problem and sub-problems [13].
must-be quality characteristics [8]. On the other hand, one-dimensional quality characteristics cause customers to be satisfied when present, but dissatisfied when absent, while reverse quality characteristics have the opposite effects. Finally, apathetic quality characteristics do not affect customer satisfaction at all.
The Kano model could help managers better understand the requirements of customers [9]. The Kano model moves from a "more is always better" approach to a "less is more" approach, so adding one feature could be much better than adding many features, which could result in having an opposite effect on enhancing customers' satisfaction. On the other hand, clustering the customers to different segments using data mining techniques will allow the Kano model to improve the satisfaction for each segment. In addition, comparing both approaches could support selection decisions and avoid removing attributes that could cause information loss [10].

Aim
This research aims at developing a method to integrate the Kano model and data mining approaches to select relevant attributes that drive customer satisfaction with a specific focus on the higher education field. It also intends to apply data mining and feature selection techniques to predict customer satisfaction [11].

Method Statement
The main contribution of this research is to solve the problem of selecting features that are not methodically correlated to customer satisfaction. This could reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. This research studies the degree of correlation between customer satisfactions and attributes [12]; in the context of customer satisfaction, how can customer satisfaction be improved by integrating the Kano model with data mining techniques to select relevant attributes that drive customer satisfaction and reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. Figure 1 represents the research's problem and sub-problems [13].
The Kano model has the advantage of classifying the customer requirement into different categories (excitement, performance, basic, neutral, or reverse factors) [14]. It could enhance the understanding of customer requirements. Therefore, integrating the Kano model with data mining techniques could enhance the process of selecting the aspects that are more significant for the contentment of the clients. Moreover, the process could reduce the resources required to produce a particular product or service, helping in efficient manufacturing.  The Kano model has the advantage of classifying the customer requirement into different categories (excitement, performance, basic, neutral, or reverse factors) [14]. It could enhance the understanding of customer requirements. Therefore, integrating the Kano model with data mining techniques could enhance the process of selecting the aspects that are more significant for the contentment of the clients. Moreover, the process could reduce the resources required to produce a particular product or service, helping in efficient manufacturing.

Literature Review
This research focuses on the notable contributions in the literature of customer satisfaction prediction to enhance customer satisfaction by selecting the most essential attributes [15]. Though data mining methods have made numerous advances in information processing and representation as compared to traditional techniques, this research will show why they still have not resolved the problem of feature categorization according to the Kano categorization.
According to the previous research, among all data collection techniques and surveys used, only those who used the Kano questionnaire were able to categorize the features according to Kano's five categories [16]. The main problem with existing data mining techniques is that the feature selected does not represent the real feature correlated to customer satisfaction. As mentioned earlier, the main contribution of this research is to solve the problem of selecting features that are not methodically correlated to customer satisfaction [17]. The new proposed model could reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction because it will exclude these features.
An earlier study deployed the Kano Model in Higher Education for Quality Improvement by comparing the current situation with the ideal situation of the indicators for the quality using a traditional survey [18]. The Kano Model was applied to categorize the requirements into five categories to know which attributes could increase customer satisfaction. Feature selection techniques have been applied to choose the most important attributes to minimize dimensionality. Moreover, studies exploring the Kano model have applied the model without any integration with feature selection techniques [19]. The only combination that ensued was to group clients to different clusters and then the Kano model was applied to draw out the users' requirements of each cluster. Nevertheless, to the author's knowledge, no studies to date have generated a model that combines the Kano model with feature selection techniques to select and rank the most prominent attributes related to customer satisfaction as presented in this proposal.
Scientists and scholars have admitted the fact that statistics has been the most prosperous information science [20]. In comparison, the emergence of data analysis, such as data mining, is fundamentally ascribed to the progress of technologies in computing and data storage [21]. In the following sub-section, an overview of customer satisfaction prediction using data mining techniques will be provided. Afterwards, features selection techniques will be discussed. The next sub-section will present the Kano Model. The last sub-section will illustrate how the integration between the Kano model and data mining could improve customer satisfaction.

Data Mining Approaches
For analyzing the survey data, we used a couple of ML models like Logistic Regression, Decision Tree Regression, Random Forest Regression, Adaboost Regression, XGBoost Regression, and Random Tree Regression. For tuning the model and improving the performance of the model, we have used some of the common feature selection methods like correlation-based feature selection, chi square-based feature selection, mutual information lasso feature selection, and ANOVA t-test based feature selection [22]. In this section, the justification along with a quick overview of the model has been presented. Moreover, the major advantages and limitations of each model have been discussed in this section.

Logistic Regression
In empirical research, logistic regression is a statistical technique that is often used to analyze categorical dependent variables [23]. An individual's class (or category) may be predicted using the statistical method of logistic regression, which is based on one or more factors (x). Since it is simple to implement a broad range of applications, it may serve as a performance basis for several systems [24]. As a result, each engineer should be acquainted with the ideas it contains [25]. Furthermore, while developing neural networks, the principles of logistic regression may be used in the development of deep learning by using neural network architecture [26]. If one has a binary result, which is a variable with just two potential values (0 and 1, yes and no, ill or well), one can use this symbol to represent it.

Decision Tree Regression
Decision trees are used to develop regression or classification models based on tree topologies. It progressively subdivides a dataset into smaller and smaller subgroups while simultaneously constructing a decision-making tree to represent the data [21]. As a result, a tree comprising leaf nodes and decision nodes is formed. A decision node is composed of two or more branches, each of which represents a value for the feature being checked [27]. The leaf node shows that a decision has been made about the numerical objective. The root node of a decision tree is the node at the top of the tree that corresponds to the best prediction. Numerous tests, including multicollinearity tests, VIF calculations, and IV calculations on variables, may be performed to narrow the field down to a small number of top variables. Therefore, performance is enhanced since all the undesirable factors have been eliminated [28].

Random Forest Regression
Random Forest Regression is a supervised learning technique that makes use of a regression learning methodology to get its results [29]. Using ensemble learning, one may build a forecast that is more accurate than a single model by combining predictions from multiple algorithms simultaneously [30].
The Random Forest is constructed wherein the trees run parallel to one another, but do not meet one another at all. Random Forests are used to train decision trees since they build multiple decision trees at once, and give the mean class for all the trees [31].

Adaboost Regression
Adaboost develops and assembles itself mostly via the efforts of succeeding members who have been trained to correctly predict the appearance of certain data events [32]. Each new predictor is provided with a training package that includes progressively difficult examples that may be weighted or resampled as they go through the training process [33]. It is a straightforward meta-estimator that begins by fitting an instance regressor to the original data set, and then fits further regressor copies to the same data set, but with the weights of the instances modified to account for the current prediction error [34]. Therefore, successive regressors lay emphasis on more complicated circumstances.

XGBoost Regression Random Tree
XGBoost is a highly successful regression technique for the development of controlled models that may be found in many applications [35]. It is possible that knowledge of its goal function (XGBoost), in addition to the basic learners, will aid in determining the veracity of this claim. In the purpose function, there is a loss function as well as a regularization term that must be considered. The distance between the actual values and the model's predictions is shown by this parameter, which is also known as the gap between the observed and expected values. The reg: linear and reg: logistics functions are the most often encountered sources of XGBoost regression problems [36].

Feature Selection Methods
In machine learning, attribute selection has been perceived to be a preferable technique for selecting a subset of relevant features from high-dimensional data. According to [37], the Feature Selection Model is essential for analyzing the variability and how common the product is amongst other products in an organization portfolio. It proposes to incorporate customer preference information into the model using sentiment analysis of user-generated product reviews [38].
Different features selection methods have been used to discover the most important attributes amongst all the attributes of various brand measures. Principle Component Analysis (PCA), Correlation-based Feature Subset Selection, and Relief method have been discussed as attribute selection methods [39]. Furthermore, feature selection algorithms such as Exploratory Factor Analysis (EFA) [40], feature-based transfer learning strategy, TFS supervised forward feature selection (SFFS) [41], and Filter-Wrapper [42] were used.
In addition to this, balanced iterative reducing and clustering using hierarchies (BIRCH) has been used for customer segmentation [43]. K-means algorithm clustering was based on the loyalty level [44]. Different feature selection techniques in text cate-Big Data Cogn. Comput. 2021, 5, 66 5 of 18 gorization have been discussed which include Information Gain (IG), Chi-square (CHI), Correlation Coefficient (CC), and Odds Ratio (OR). To compare different feature selection techniques, different performance metrics like the number of features selected, a list of features, Classifier accuracy, and elapsed time can be used. Feature selection could improve the performance of the prediction algorithms and reduce the memory storage requirements and computation time, which could reduce the computational costs for data analytics [45].
As mentioned earlier, the Kano model can categorize attributes into five different categories, which has made the Kano model very popular over the last three decades. Different approaches were utilized to assess different kinds of similarities between the mentioned models. According to [46], various methods have been used to classify quality attributes into Kano categories, which include the Penalty-Reward Contrast Analysis (PRCA), Importance Grid Analysis (IGA), Direct Classification Method, and qualitative data methods. The Kano questionnaire and the direct classification method seem to be the most capable way of characterization technique. However, it is very complicated, and not easy to be implemented.

The Kano Model
The Kano model represents one of the practical tools that managers can use to assess which characteristics of their company's products are considered most relevant for customer satisfaction [47]. Since the introduction of this model, it has gained the interest of both academia and practitioners. In theory, every characteristic of a product, qualitative or quantitative, can be classified into five categories [48].
These characteristics, also called basic requirements, can be considered as pre-requisite features that are taken for granted and affect satisfaction only when absent. The features from the category of one-dimensional quality, also called performance requirements, affect satisfaction both when present and absent. When present, they improve customer satisfaction, while their absence undermines customer satisfaction. On the contrary, reverse quality attributes improve customer satisfaction when absent and reduce it when present. Finally, indifferent quality characteristics do not have a relevant contribution to customer satisfaction. Table 1 shows Kano categories [49]. Various customer satisfaction models can be adopted in research, which can include analytical Kano (A-Kano) model using quantitative measures, fuzzy Kano approach, the Kano method, which is based on the classical conjoint analysis model [50], and CSTrust that combines the quality of service (QoS) and customer satisfaction prediction [51]. Ref. [52] identified the Kano model that uses quantitative and qualitative approaches, which could explain the association between customer satisfaction and customer requirement fulfillment [52]. Fuzzy Kano questionnaire was used to determine the most important factors in food quality.
Different scholars have used the Kano model to explain their viewpoints. Literature review reveals the product and service quality features and their impact on customer satisfaction as mentioned by researchers. Ref. [45] explained that complete awareness of customers' requirements, for example desires and anticipations, represents the critical and mandatory qualification for all those organizations that want to achieve customer satisfaction. Almost two decades ago, Noriaki Kano conceptualized and presented an extremely beneficial model called the Kano model to categorize the characteristics of a product or service, bearing in mind how any product or service can fulfill the demands of the users. The Kano model is deeply entrenched in social psychology. Therefore, the researchers were able to differentiate the aspects into three different kinds in relation to the expectations from the service. The contentment of the clients is found to be deeply impacted by the fulfillment of the mentioned aspects. The classification process might be advantageous for the innovative design guide as an outcome to a novelty element.

The Kano Model and Data Mining Integration
The existence and continued use of the Kano model over the past three decades may be indicative of the model's effectiveness in analyzing customer satisfaction. However, new approaches, such as data mining, have become popular. Thus, the following section will examine certain literature to elaborate on whether the use of data mining to complement the Kano model is a novel idea [53].
This paper reviews data mining integration with the Kano model in addition to wellknown statistical methodologies of customer satisfaction. The data mining model can predict customer satisfaction by employing a minimum number of customer attributes required with extremely accurate results. A correlation between the degree of these attributes and customer satisfaction can be analyzed [54]. Thanks to this methodological approach, company market shares and customer loyalty can be enhanced, and risk can be reduced by avoiding investment in those attributes that are not directly linked to customer satisfaction maximization. The integration of the Kano model with the data mining approach is expected to enhance the limitations of previous standalone methodologies. Furthermore, organizational performance transformation can be guaranteed and reinforced via effective customer satisfaction measurement [55].
The Kano model has employed various regression analyses to evaluate the model's non-linear and asymmetric relationships [56]. Other researchers have criticized the effectiveness of those models that were conceptualized to assess the repetitions in order to evaluate the model's reliability in assessing the aspects of quality. The Kano model would be employed to extract users' inherent needs from the derived clusters [57]. The resulting model customized a website's content per user cluster and provided an improved newsfeed ideal for each user [58].

Methodology
Predicting the behavior of customers' unstructured data is well-suited for AI-based algorithms, which search for hidden features and commonalities to link clusters of data that have specific properties [59]. Furthermore, these models are capable of forecasting price, weather conditions, and customer preferences. It is possible to create customer behavior predictions by segmenting consumers into artificial intelligence groups since customers with similar traits are more likely to buy the same item [60].
Marketers may enhance their service to potential and existing customers by detecting patterns in big data or data already collected by an organization. With the expansion of this industry, it is expected to grow much more in the years to come [61]. Using data mining technologies prone to have problems with customer satisfaction. The main idea here is to select the appropriate feature selection combination and ML model that predicts the maximum possible accuracy by using the minimum number of variables or features [62]. In this case, the developed questionnaire contains nearly 36 features related to student satisfaction in the university. So, if the college or university management wanted to increase the student satisfaction rate, it would need to concentrate on all 36 features. However, the problem is that it practically takes a lot of time as well as resources.
According to [63], performance attributes are also known as satisfiers. These attributes increase the customer's enjoyment of the product or service. They do not come under the basic requirements of the product. Ref. [64] revealed that excitement attributes, also known as surprising elements, offer the uniqueness from the products of rivals and the competitive edge to the product. According to [65], customers do not know whether they want this feature or not for the functioning of the product. However, these attributes increase customer satisfaction directly. Ref. [66] found that the basic features provide more satisfaction to the customers. However, along with the basic functioning features, the usage of delighters and one-dimensional attributes increase customer satisfaction because it makes the customers feel that they have the best product or service in hand. It also gives the feeling that they have something different from the common ones [19]. The category of attractive quality refers to characteristics of a product that can improve customer satisfaction if they are present but do not make customers dissatisfied when absent. These characteristics, also called excitement requirements, can be observed as minor bonuses that make customers more satisfied but are not expected by the customers [49]. On the other hand, the category of must-be quality refers to those characteristics which would not make customers satisfied when present but would make them dissatisfied when absent.

Data Collection
In this research, a survey was conducted involving students from the United Arab Emirates University. For this research, the sample was selected randomly from different colleges of United Arab Emirates University (UAE). It is found that nearly 14,387 students are studying in the UAE University. For ensuring the 95% confidence level and 5% margin of error, we need a minimum of 375 or more respondents. It is calculated using the below-given formula.
where N represents population size, e denotes Margin of error (percentage in decimal form), and z indicates z-score. So, we sent a survey questionnaire to nearly 1500 students. For reaching the respondents, we used online survey conducting tools. At the end of the data collection process, we collected responses of 646 students.

Scope of the Study
The proposed research helps find out the most important features that have a maximum impact on the student satisfaction rate. So, we can concentrate on these few parameters to improve the student's satisfaction rate [67]. For selecting the most important features that show a very higher impact on the student's satisfaction, this paper uses the combination of the Kano model as well as ML feature selection approaches [68]. The paper offers a method for determining students' happiness or satisfaction with the university based on the major features like lab facilities. The author especially intended to sort out the major features that affect the student's satisfaction with the university so that the universities can be able to focus on those areas to improve the student's satisfaction.
We utilized data mining to evaluate students' behavior based on several variables, such as their usage of laboratory facilities. Figure 2 clearly shows the proposed methodology. Different research papers discussed how one-dimensional and delight features are related to satisfaction. According to [63], performance attributes, also known as satisfiers, increase the customer's enjoyment of the product or service. These attributes do not come under the basic requirements of the product. Ref. [69] revealed that excitement attributes, also known as surprising elements, offer the uniqueness from the products of rivals and the competitive edge to the product. According to [65], customers do not know whether they want this feature or not for the functioning of the product. However, these attributes increase customer satisfaction directly. Ref. [66] found that the basic features do not provide more satisfaction to the customers. However, along with the basic functioning features, the usage of delighters and one-dimensional attributes increases customer satisfaction because it makes the customers feel that they have the best product or service in hand [19]. It gives the feeling that they have something different from the common ones [19]. The category of attractive quality refers to characteristics of a product that can improve customer satisfaction if they are present but do not make customers dissatisfied when absent [40]. On the other hand, the category of must-be quality refers to those characteristics which would not make customers satisfied when present but would make them unsatisfied when absent.
increase customer satisfaction directly. Ref. [66] found that the basic features do not provide more satisfaction to the customers. However, along with the basic functioning features, the usage of delighters and one-dimensional attributes increases customer satisfaction because it makes the customers feel that they have the best product or service in hand [19]. It gives the feeling that they have something different from the common ones [19]. The category of attractive quality refers to characteristics of a product that can improve customer satisfaction if they are present but do not make customers dissatisfied when absent [40]. On the other hand, the category of must-be quality refers to those characteristics which would not make customers satisfied when present but would make them unsatisfied when absent.

Research Analysis Criteria
Evaluation of the results will be done by various methods used to assess the performance, like Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-Square value [70]. These evaluation techniques are the most popular metrics for continuous variables similar to the present problem. The connection between the real values and the estimated values of Y can be determined with the help of the coefficient. If the coefficient is attributed to a greater value, the procedures are said to be effective.

Research Analysis Criteria
Evaluation of the results will be done by various methods used to assess the performance, like Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-Square value [70]. These evaluation techniques are the most popular metrics for continuous variables similar to the present problem. The connection between the real values and the estimated values of Y can be determined with the help of the coefficient. If the coefficient is attributed to a greater value, the procedures are said to be effective. The closeness of the prediction to the eventual outcomes can be assessed with the help of a unique error identification mechanism. Root Mean Square Error (RMSE) represents the sample standard deviation of the differences between predicted values (y') and observed values (y). The lower values of mean absolute error are often attributed to effectiveness in performance. In addition, the Pearson correlation will be used to find the overall correlation between the independent variables and dependent variables. The next measure used for evaluating the model is the R-square value. It is also one of the important measures for evaluating the regression model [71]. It exactly shows the percentage of dependent variables measured by the model. According to the authors, the R-square is one of the important measures for evaluating the regression models. R-square value is found to be in the range of 0 to 1. The higher the R-square value, the higher the models' performance.

Results for All Features
Results involving different prediction methods have been presented in Table 2 for both datasets: the satisfaction datasets. The best results were obtained with XGBoost Regression Model with depth = 15. The high correlation coefficient value is 0.986. The Root Mean Squared Error of the model is 0.168 [72]. The Mean Absolute Error of the model is 0.025. The observation has given rise to the assumption that the contentment of the students can be effectively assessed with the help of this model. Moreover, the coefficient is often found to have a value that is higher than 0.6, which also seems to exhibit the fact that the model is effective in the process of prediction [73]. From Table 2, the XGBoost Regression Algorithm provides very accurate results because this model has a higher fit [74]. The R2 value for this model was found to be 0.974, and the correlation value was found to be 0.987. It means that 97.4 percentage of the dependent variable is explained by the independent variables. The next step is to categorize features according to the Kano model. Based on the Kano questionnaire, Table 3 shows the Kano feature categorization into five different categories. The Information in the E-services (Grades, Schedules, Payment Reports, etc.)

One-Dimensional
Code26 Organizing socio-cultural activities Basic

Code27
Teaching quality One-Dimensional Code28 Curve grading system Basic Table 3. Cont.

Code29
The availability of internet in the campus Basic

Code30
Organizing some courses with certificate Basic

Code31
The libraries having got a rich data base One-Dimensional

Code32
The range of Academic Majors Basic Code33 The

Feature Selection Results
The Kano (Dimensional and delight features) are 7, 13, 25, 27, 35, 12, 23, 31, and 36, so the common features between Kano (dimensional and delight features) and other feature selection methods are shown in Table 4 below. The common features were observed and assessed in a more comprehensive way to enhance the effectiveness of the study. Table 4 (summary of the feature selection approach) shows the detailed features selected by each machine learning approach as well as the Kano model. Moreover, the common features between different ML models as well as the Kano model are presented in the table [46]. We can see that the first two common features in the satisfaction dataset are the Information Technology Services. Teaching quality features are considered as dimension features according to the Kano model categorization. When present, they improve student satisfaction, whereas their absence undermines satisfaction. The results of feature selection are shown in Table 4.

Prediction Results for Satisfaction Dataset with Selected Features
This section of the paper discusses the key results of various ML prediction models on the target variable. Here, the different ML model's results for different feature selection methods are given as tablets. In this section, a detailed comparison of different ML models for different feature selection models has been provided [75]. From the conducted feature selection process, five attributes have been selected. Here, these five attributes are common attributes between the ML-based feature selection as well as the Kano model feature selection. We also tried different imputations on the feature selection model integration with the Kano model like taking union attributes etc. However, the "Union Approach" increases the number of attributes. At the same time, taking common attributes for the analysis provides results nearly close to those with all variables [35]. Table 5 contains the key results of the Multiple Linear regression model. From the table given below, the lasso feature selection method performed very well with the multiple linear regression model [76]. It has given a higher R-square value as well as a higher Pearson correlation value. In addition, this combination gives a lower RMSE value and MAE value. Here, the R-square value is 0.32, and the Pearson correlation value is 0.57395. It means that 33 % of the variables found to be dependent on certain aspects can be examined using the opposite kind of variables [48]. Another main parameter is the RMSE value for different feature selection approaches. We found different RMSE values. Among them, for the lasso regularizationbased feature selection method, the RMSE value is lower and is equal to 0.85767. Here, the linear regression results are very poor as compared to other methods. Table 6 contains the key results of the XGBRegressor model with depth = 15. From the table given below, it is clear that the ANOVA based feature selection method performed very well with the XGBRegressor model. It has given a higher R-square value as well as a higher Pearson correlation value. In addition, this combination gives a lower RMSE value and MAE value [35]). Besides this, the R-square value is 0.69068, and the Pearson correlation value is 0.83106. It means that 69% of the variables found to be dependent on certain aspects can be examined using the opposite kind of variables. Here, the results show that the XGBRegressor model predicts the target variable very well than the multiple linear regression model [77]. Another main parameter is the RMSE value for different feature selection approaches. We found different RMSE values. Among them, the RMSE value for the ANOVA feature selection method is lower, namely, 0.58250. Table 7 presents the key results of the AdaBoost Regressor model. From the table given below, the ANOVA based feature selection method performed very well with the AdaBoost Regressor model [78]. It produced a higher R-square value as well as a higher Pearson correlation value. Moreover, this combination gives a lower RMSE value and MAE value [79]. The R-square value is 0.68051, and the Pearson correlation value is 0.82530. It means that 68% of the variables found to be dependent on certain aspects can be examined using the opposite kind of variables. The outcomes were able to express the fact that the AdaBoost Regressor model with depth = 15 predicts the target variable very well than the multiple linear regressor model. However, the performance of the AdaBoost Regressor model with the ANOVA feature selection is lower than the XGBRegressor model with ANOVA feature selection [78]. The RMSE value for the ANOVA feature selection method is lower, in particular, 0.592.  It provided a higher R-square value as well as a higher Pearson correlation value. Moreover, this combination produced a lower RMSE value and MAE value. The R-square value was 0.68664, and Pearson correlation value was 0.82883 [80]. This means that 68.7% of the variables were found to be dependent on certain aspects which can be examined using the opposite kind of variables. The outcomes were able to express the fact that the Random Forest Regressor model with a depth of 15 can predict the target variable better than the multiple linear regressor model [81]. However, the performance of this model was slightly lower than the performance of the AdaBoost Regressor model with the ANOVA feature selection and the XGBRegressor model with the ANOVA feature selection. The RMSE value for the ANOVA feature selection method was lower; 0.58630. Table 9 contains the key results of the Decision Tree Regressor model. From the table given above, it is clear that the ANOVA feature selection method performed very well with the Decision Tree Regressor model with depth = 15. It has produced a higher R-square value as well as a higher Pearson correlation value. Furthermore, this combination provides a lower RMSE value and MAE value. The R-square value is 0.69068, and the Pearson correlation value is 0.83107. The decision Tree Regressor model predicts the target variable very well than the multiple linear regressor model [82]. Besides this, the performance of the Decision Tree Regressor model with the ANOVA feature selection is similar to the XGBRegressor model with the ANOVA feature selection. The RMSE value for the ANOVA feature selection method is lower, specifically, 0.58251.

Discussion
The connection between the contentment of the clients and different aspects of the institutions has been found to be effectively determined with the help of regression techniques [83]. As shown in the result part, the main metrics for model evaluation used in this research are R-square value, RMSE, MAE, and Pearson correlation coefficient. The comparison clearly shows that the best model with all attributes (XgBoost Regression model with n = 15) has R-Square value, RMSE, MAE, and Pearson Correlation Coefficient of 0.974, 0.169, 0.025, and 0.987, respectively. One of the goals of the integration experiments was to find out the subset of attributes that can provide almost the same prediction accuracy as with all attributes besides knowing which attributes match between Kano and other feature selection methods. The results show that XGBRegressor with depth 15 and Decision Tree Regression with depth 15 have the best performance. Only four features have been used to predict the common features between ANOVA and Kano features located under one dimensional and delight categories. The R-Square value, RMSE, MAE, and Pearson Correlation Coefficient are 0.69, 0.58, 0.32, and 0.83, respectively, which are closer to the model with all attributes.
The outcomes were able to suggest that the techniques that are used to assess the repetitions were found to be capable of identifying the relationship between the contentment of the clients and different aspects of the services of the institution [69]. Different methods were found to be effective in determining the aspects that are more significant in contributing to the contentment of the clients. In addition, the process of obtaining similar aspects was able to enhance the precision in relation to the assessment of all different characteristics. With the help of this information, the administration team of the institution will be able to significantly improve the contentment of the clients. The outcomes were able to determine the major aspects that were found separately in various institutions. So, it will be a wise move to combine various university services for predicting customer satisfaction. Various university services seem to emphasize various aspects; therefore, the unification process will be able to substantially enhance the services of all the involved institutions. Moreover, the study can be equipped in many different situations to obtain effective outcomes.
The significance of every aspect in relation to the contentment of the clients has been carefully observed. From this research, it is clear that the maximum R-square value and Pearson correlation value are found to be 0.69068 and 0.83107, respectively, for Decision Tree Regressor as well as XGBoost Regressor. Moreover, the used feature selection approach is ANOVA Based Feature selection approach. Here, these results are derived using four different parameters like R-square value, RMSE, MAE, and Pearson correlation coefficient. The common attributes between the ANOVA features selection method and Kano's one dimensional and delight features produce the highest Pearson correlation coefficient value that is equal to 83%. It is nearest to the results with all the attributes with a 98% Pearson correlation coefficient. It was achieved with only four features which can be considered a very small number of features as compared to the full model which has 37 attributes. This shows that the ANOVA technique is effective in the identification of aspects of the students that are found to be effectively contributing to the contentment of the students. Moreover, the four similar characteristics between Kano and ANOVA feature selection can produce acceptable readings of performance if the information is adequate. The four mentioned characteristics are Food, Dining Hall, Services, and the Possibility of communicating with the administration. They are located under the Delight category. The other common services are the Information in the E-services (Grades, Schedules, Payment Reports, etc.) and teaching quality. They are located under the one-dimensional category. The teaching quality feature has been selected by all feature selection methods, which means that it is the most important attribute. The correlation coefficient between the features and the student satisfaction index is not less than 0.48 for all prediction methods.

Conclusions
The main objective of the experiment is the construction of an effective model that can determine student satisfaction at university based on the independent variables, like features and facilities, provided by the university. Integrating the Kano model with data mining techniques could improve the selection of relevant characteristics that drive customer satisfaction. Different kinds of regression techniques were equipped in the experiment for the purpose of determining the contentment of the students. According to the results of integration between the Kano model and ANOVA feature selection method, we found Food, Dining Hall, Services, the Possibility of communicating with the administration as the most important features related to satisfaction. The Information in the E-services (Grades, Schedules, Payment Reports, etc.) relates to teaching quality. The teaching quality feature has been selected by all feature selection methods, which means that it is the most important attribute. It was achieved with only these four features, which can be considered as a very small number of features as compared to the full model which has 37 attributes. This shows that the ANOVA technique is effective in the identification of aspects of the students that are found to be effectively contributing to the contentment of the students. The comparison clearly shows that the best model with all attributes (XGBoost Regression model with n = 15) has R-Square value, RMSE, MAE, and Pearson Correlation Coefficient of 0.974, 0.169, 0.025, and 0.987, respectively. From this research, it is clear that the Maximum R-Square value, RMSE, MAE, and Pearson Correlation Coefficient are 0.69, 0.58, 0.32, and 0.83, respectively, for the XGBoost Regressor as well as the Decision Tree Regressor, which are closer to the model with all attributes. Furthermore, the correlation coefficient between the features and the student satisfaction index is not less than 0.48 for all prediction methods.
Based on the results, the administration team of the institution will be able to effectively make use of the connections to determine the contentment of the students in relation to any changes that are to be made to the characteristics of the institution. There are 646 records that are attributed to small data, and this is one of the very few disadvantages of the experiment. Moreover, the outcomes could be only effective for the institutions that are present within the country.
This research work can be further improved by using different imputation approaches to feature selection. In this paper, the common features among the Kano model as well as the ML-based feature selection approach have been used. Here, higher importance was given to the feature selection approach and ML prediction algorithms. In the future, the model tuning processes should be conducted to improve the model efficiency and accuracy. Funding: This research was funded by United Arab Emirates University, grant number 31T076 and the APC was funded by United Arab Emirates University.