Analysis of University Students’ Behavior Based on a Fusion K-Means Clustering Algorithm



Introduction
With the continuous development of information technology, the data accumulated in the campus information environment has been expanding, and a complete campus big data environment has taken shape. Traditional campus management concepts and data analysis methods can no longer meet the growing data processing needs. How to effectively manage and share campus data, use big data mining to optimize student management, and provide clearer, more detailed data services for students is a problem faced by today's campus service systems.
At present, there is much research on university students' behavior, covering multiple aspects. Some researchers analyzed college students' physical activities (PA) to find PA patterns and their determinants, helping college students become aware of their health conditions [1]. Belingheri examined the prevalence of smoking, binge drinking, physical inactivity, and excessive body weight in a population of healthcare students attending an Italian university [2]. SY Park studied university students' behavioral intention to use mobile learning [3]. J Kormos investigated the influence of motivational factors and self-regulatory strategies on autonomous learning behavior [4]. Most of the above studies mainly used traditional mathematical statistical methods to obtain data information but did not further explore the laws behind the data. Moreover, these studies show that the learning behavior and life behavior of university students are the school's main concerns. Therefore, we mainly focus on the learning performance and living habits of university students and classify them accordingly.

The main contributions of this study are as follows:
1. We applied the relevant theories and knowledge of data mining and machine learning to the analysis of university students' behavior. This is an application innovation in the field of machine learning and education.
2. Compared with the traditional behavior analysis model of college students, this study used data for analysis, which reduces the subjectivity of human judgment and avoids prejudice caused by preconceptions. Therefore, the analysis results of this study are more objective.
3. Most current research used the K-Means algorithm, for which the number of student categories and the cluster centers are difficult to determine. The algorithm proposed in this study, which combines K-Means with clustering by fast search and find of density peaks (K-CFSFDP), can automatically determine the number of student behavior types and their typical representatives based on the data. Therefore, K-CFSFDP has high flexibility and wide applicability, and it also avoids human intervention in the clustering process.
4. The K-CFSFDP proposed in this research did not rely entirely on the CFSFDP framework; instead, it improved CFSFDP in several aspects. Therefore, the running time is shorter and the running efficiency is higher, which is an advantage in the environment of campus big data.

Behavior Analysis Indicators
The data was obtained from 4 universities in China. For convenience, S1, S2, S3, and S4 represent the 4 universities. We mainly described the university students' behavior in two aspects: living habits and learning performance.
The evaluation indexes of living habit are listed in Table 1, and the evaluation indexes of learning performance are listed in Table 2. As the dimensions of the various behavior indicators are not uniform, this study used maximum and minimum standardization to normalize the data. This is a linear transformation of the original data, which maps the data values onto the [0, 10] interval. The transformation function is as follows:

x' = 10 × (x - min) / (max - min)

where max is the maximum value of the data and min is the minimum value of the data. After normalization, the average values of the data were calculated. Then the evaluation indexes of living habit and learning performance were obtained, with values ranging from 0 to 10; the higher the value, the better the performance.
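As a sketch, the min-max transformation above can be implemented as follows. The indicator values used here are hypothetical, not the paper's data:

```python
# Min-max normalization onto the [0, 10] interval, as described above.
def normalize(values, new_max=10.0):
    """Linearly map a list of raw scores onto [0, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # guard: constant column, avoid division by zero
        return [0.0 for _ in values]
    return [new_max * (v - lo) / (hi - lo) for v in values]

raw_scores = [6.5, 8.0, 5.0, 7.25]     # hypothetical raw indicator values
scaled = normalize(raw_scores)          # smallest value maps to 0.0, largest to 10.0
```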
The data of the 4 universities is summarized in Table 3.

Data Visualization
In order to provide more details of the data, this study used violin plots, box plots, scatter plots, and scatter plot matrices to visualize the data. From these visualizations, we can observe the distribution pattern and the degree of aggregation of the data.
The violin plot can display the distribution status and probability density of multiple sets of data. The violin plot of the data is shown in Figure 1. Green represents the living performance score, blue represents the learning performance score, and the abscissa corresponds to the 4 universities. It can be seen from the violin plot that the distribution of student behavior data differs between schools, and the living data and learning data of students within the same school also differ. In addition, it can be seen intuitively that the student behavior data does not follow a normal distribution; instead, the data is concentrated near certain values, so clustering is plausible.
A box plot is a statistical chart that displays data dispersion. It shows the maximum, minimum, median, outliers, and upper and lower quartiles of a set of data, providing key information about the location and dispersion of the data, which is especially useful when comparing student behavior data across universities. The box plot of the data is shown in Figure 2. Green represents the living performance score, blue represents the learning performance score, and the abscissa corresponds to the 4 universities. The horizontal line in the middle of each box is the median of the data, and the upper and lower edges of the box correspond to the upper and lower quartiles. The horizontal line above the box is the maximum value, the horizontal line below the box is the minimum value, and the solid points are outliers. Overall, the learning scores of college students at most schools were lower than their living scores, and the center points of most data were between 4 and 7.
The scatter diagram of the data distribution is shown in Figure 3. It can be seen intuitively that the data points aggregate: some are very densely distributed, while others are relatively sparse. From the distribution map of the first university, the data points cluster into 7 categories. This shows that many students had similar living habits and learning performance and could be divided into a certain number of categories.
Furthermore, we drew scatter plot matrices to comprehensively show the distribution of the data and the shape of the aggregation. The scatter plot matrix reflects the correlation between learning scores and living scores. The matrix includes three types of charts: the upper triangle shows the scatter plots of the data, the diagonal shows the probability density distributions, and the lower triangle shows the contour maps (the denser the data, the more concentrated the lines and the brighter the color).
The scatter plot matrices of the four universities are shown in Figures 4-7 respectively. It can be seen from these scatter plot matrices that some data points were very densely distributed. This proves that some university students had similarities in living and study, and they gathered into a certain number of clusters. In addition, it can be seen from the figure that the number of distribution groups, the characteristics of study and life, and the degree of aggregation of student behavior in different universities were different. Therefore, the task of this research was to divide these categories using cluster analysis, so as to make a scientific and targeted classification of university student behavior. This helps school administrators to provide corresponding guidance for university students' improvement in living and learning according to different categories.


Data Analysis and Algorithm Tools
The above data analysis and visualization show that the categories and distribution of student behaviors in different schools were different, so it is impossible to use a unified baseline to classify student behaviors. That is, there is no universal student classification method to label each student's category. In other words, the existing classification methods based on expert evaluation and educational standards are not objective and accurate enough and do not consider the diversity of student categories. Therefore, this is an unsupervised machine learning problem. Clustering is an effective method for solving unsupervised learning problems. We mainly used the K-CFSFDP algorithm to analyze the students' behaviors and compared it with the K-Means and CFSFDP clustering algorithms.
The K-Means clustering algorithm is a classical algorithm widely used in fields such as data mining and knowledge discovery. Its principle is simple, it is efficient, and it is suited to big data sets. However, it has two major drawbacks. First, the k value needs to be set in advance; in most cases, the optimal number of categories for a dataset cannot be determined in advance, so it is difficult to choose a reasonable value. For example, in this study, the number of student groups in each school may not be the same. Second, the K-Means algorithm determines an initial partition based on the initial cluster centers and then optimizes that partition, and the choice of initial cluster centers has a great impact on the clustering result. If the initial values are chosen poorly, the algorithm may fail to obtain effective clustering results. For example, we cannot find a representative of each student category, especially when the number of student categories is unknown.
We proposed the K-CFSFDP algorithm to determine the k value and the cluster centers based on the characteristics of the data set.

Clustering by K-Means
The principle of the K-Means clustering algorithm is simple. Given a data set X_i (i = 1, 2, 3, ..., n), the k value and the cluster centers are specified. The sum of squared errors (SSE) is the objective function of the K-Means clustering algorithm:

SSE = Σ_{j=1}^{k} Σ_{x_i ∈ C_j} ||x_i - x̄_j||²

where C_j represents one class of the cluster result, k represents the number of categories, and x̄_j is the average of one class. When the objective function reaches its minimum value, the clustering effect is optimal.
The K-Means clustering algorithm proceeds in the following three steps:
Step 1: Assign each sample to its nearest center vector, which reduces the value of the objective function. The distance between points is measured by the Euclidean distance:

d(x_i, p_j) = sqrt( Σ_{m=1}^{d} (x_im - p_jm)² )

where p_j (j = 1, ..., k) are the k cluster centers and d is the number of attributes of x.
Step 2: Update each cluster average:

x̄_j = (1/|C_j|) Σ_{x_i ∈ C_j} x_i

Step 3: Recalculate the objective function. When the value of the objective function is lowest, the clustering effect is optimal.
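The three steps above can be sketched in a minimal pure-Python K-Means implementation. The sample points and initial centers are illustrative; this is not the authors' code:

```python
import math

def euclidean(p, q):
    # Euclidean distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, centers, iters=50):
    for _ in range(iters):
        # Step 1: assign each sample to its nearest center
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda c: euclidean(x, centers[c]))
            clusters[j].append(x)
        # Step 2: update each center to the mean of its assigned samples
        centers = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    # Step 3: the objective, sum of squared errors (SSE)
    sse = sum(euclidean(x, centers[j]) ** 2
              for j, c in enumerate(clusters) for x in c)
    return centers, sse

# two obvious groups; k = 2 with rough initial centers
pts = [(1, 1), (1.2, 0.8), (8, 8), (7.8, 8.2)]
final_centers, sse = k_means(pts, centers=[(0, 0), (9, 9)])
```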

Determining the K Value and Cluster Center
In order to correct the two drawbacks of the K-Means clustering algorithm, this paper uses the method of determining cluster centers proposed by Alex Rodriguez and Alessandro Laio in the CFSFDP clustering algorithm, which is novel, simple, and fast, and which can find the right number of classes and the globally optimal cluster centers according to the data. The core of the algorithm is its description of the cluster centers [31].
There are two basic assumptions in the clustering algorithm:
• Cluster centers are surrounded by neighbors with lower local density.
• Cluster centers are at a relatively large distance from any points with a higher local density.
This clustering algorithm can be divided into four steps, which are introduced as follows.
Step 1: Calculate the local density. The data set for clustering is S = {x_i}_{i=1}^{N}, and d_ij denotes the distance between points x_i and x_j. Using the cut-off kernel, the local density ρ_i of data point i is defined as follows:

ρ_i = Σ_{j ≠ i} χ(d_ij - d_c)

where χ(x) is defined as:

χ(x) = 1 if x < 0; χ(x) = 0 otherwise.

Additionally, d_c is a cutoff distance that needs to be specified in advance. Based on formula (6), ρ_i is the number of data points whose distance from x_i is less than d_c, not counting x_i itself. To some extent, the parameter d_c determines the effect of this clustering algorithm. If d_c is too large, the local density value of every data point will be large, resulting in low discrimination; in the extreme case where d_c exceeds the maximum distance between any two points, the algorithm puts all points into a single cluster. If d_c is too small, the same group may be split into multiple clusters; in the extreme case where d_c is smaller than the minimum distance between any two points, every point becomes its own cluster center. A reference method is to select d_c so that the average number of neighbors per data point is about 1-2% of the total number of data points.
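A minimal sketch of the cut-off-kernel density computation; the sample points and the value of d_c are illustrative:

```python
import math

def cutoff_density(points, d_c):
    """rho_i = number of neighbors of x_i within distance d_c (x_i itself excluded)."""
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < d_c:
                rho[i] += 1    # chi(d_ij - d_c) = 1 when d_ij < d_c
    return rho

pts = [(0, 0), (0.1, 0), (0.2, 0), (5, 5)]
densities = cutoff_density(pts, d_c=0.5)   # the isolated point gets density 0
```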
Step 2: Calculate the distance. A subscript sequence {q_i}_{i=1}^{N} is generated by sorting the local densities in descending order, i.e., ρ_{q_1} ≥ ρ_{q_2} ≥ ... ≥ ρ_{q_N}. The distance is defined as follows:

δ_{q_1} = max_j d(x_{q_1}, x_j);  δ_{q_i} = min_{j < i} d(x_{q_i}, x_{q_j}), for i ≥ 2

That is, for the point with the highest local density, δ is its distance to the farthest point in S; for every other point, δ is its distance to the nearest point (or points) among all points with a higher local density.
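The δ computation can be sketched as follows, assuming the densities ρ have already been computed (ties in density are broken by sort order, a common convention):

```python
import math

def deltas(points, rho):
    """delta_i: distance to the nearest higher-density point;
    for the highest-density point, distance to the farthest point."""
    n = len(points)
    delta = [0.0] * n
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)  # descending density
    top = order[0]
    delta[top] = max(math.dist(points[top], points[j]) for j in range(n) if j != top)
    for rank in range(1, n):
        i = order[rank]
        # nearest neighbor among points earlier in the order (higher density)
        delta[i] = min(math.dist(points[i], points[order[r]]) for r in range(rank))
    return delta
```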
Step 3: Select the cluster centers. So far, we have obtained the pair (ρ_i, δ_i), i ∈ I_S, for every data point. Considering both quantities together, the following formula is used to select the cluster centers:

γ_i = ρ_i δ_i, i ∈ I_S

For example, Figure 8 contains 20 data points, and the γ_i value of every data point can be obtained. Next, γ is sorted to select the cluster centers. Figure 9 shows the γ curve.
According to Figure 9, for the non-cluster centers the curve is smooth, while there is a clear jump between the cluster centers and the non-cluster centers.
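One simple heuristic for locating the jump in the sorted γ curve automatically is to take the largest drop between consecutive values. This is an illustrative sketch, not the paper's exact selection rule (the paper reads k from the decision graph):

```python
def pick_centers(rho, delta):
    """Return the indices of points before the largest jump in sorted gamma."""
    gamma = [r * d for r, d in zip(rho, delta)]
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    g = [gamma[i] for i in order]
    # k = position of the largest drop between consecutive sorted gamma values
    k = max(range(1, len(g)), key=lambda i: g[i - 1] - g[i])
    return order[:k]
```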
Step 4: Categorize the other data points. After the cluster centers are determined, the category labels of the remaining points are assigned according to the following principle: the category label of the current point equals the label of its nearest point with a higher local density. This step takes considerable time.

The CFSFDP algorithm proposed by Alex Rodriguez and Alessandro Laio has the following defects:
1. The density calculation adopts the cut-off kernel function, so the clustering result is highly dependent on d_c.
2. The authors did not provide a specific distance calculation formula. The distance measurement differs from problem to problem, and the specific distance formula should be determined according to the actual problem.
3. The method of categorizing the other data points is inefficient. After the cluster centers have been found, each remaining point is assigned to the same cluster as its nearest neighbor of higher density, which causes unnecessary iterations and repeated calculations. As the amount of data increases, the amount of computation grows sharply, resulting in a long running time that makes it difficult to meet the operational needs of campus big data. In addition, this classification method does not take full advantage of the already-determined number of clusters and cluster centers.
This study addressed the above problems in the K-CFSFDP algorithm, especially the data point classification problem, so as to improve on the operating efficiency of the CFSFDP algorithm.

K-CFSFDP Algorithm
Based on this way of determining the k value and the cluster centers, the K-CFSFDP algorithm was proposed, which mainly includes the following steps:
Step 1: Data preprocessing (normalization). We first used formula (1) to standardize the data. This step was implemented in the data preprocessing section.
Step 2: Calculate the density of each point. The clustering set is S = {x_i}_{i=1}^{N}. We adopted the Gaussian kernel function to calculate the density:

ρ_i = Σ_{j ≠ i} exp( -(d_ij / d_c)² )

where d_ij represents the Euclidean distance between points x_i and x_j.
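A sketch of the Gaussian-kernel density; the points and d_c are illustrative:

```python
import math

def gaussian_density(points, d_c):
    """rho_i = sum over j != i of exp(-(d_ij / d_c)^2); continuous-valued,
    so ties between different points are far less likely than with the cut-off kernel."""
    n = len(points)
    return [sum(math.exp(-(math.dist(points[i], points[j]) / d_c) ** 2)
                for j in range(n) if j != i)
            for i in range(n)]
```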
Step 3: Calculate the distance value δ_i for each point. The distances d_ij between points were calculated with the Euclidean distance as shown in formula (4), and δ_i was obtained as in Step 2 of CFSFDP.
Step 4: Calculate the γ value: γ_i = ρ_i δ_i. If ρ and δ differ too much in magnitude, it is necessary to normalize them first.
Step 5: Determine the k value and the cluster centers. We selected the number of clusters (the k value) and the cluster centers according to the decision graph, and initialized the classification tag {c_i}_{i=1}^{N} of each data point as follows:

c_i = j, if x_i is the cluster center of the j-th cluster (j ∈ [1, k]); c_i = -1, otherwise.

Step 6: Classify the other points by Euclidean distance (formula (4)). For each data point that is not a cluster center (c_i = -1), we calculated the Euclidean distance between the data point and each cluster center, selected the cluster center with the shortest distance, and assigned the data point to the category to which that cluster center belongs.
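Steps 5 and 6 can be sketched as a single assignment pass. Here `center_idx` (the indices of the chosen centers within the point list) is an assumed input produced by the decision-graph step:

```python
import math

def assign(points, center_idx):
    """Tag centers with their cluster labels, then assign every non-center point
    (tag -1) to the nearest center by Euclidean distance, in one pass."""
    labels = [-1] * len(points)
    for k, c in enumerate(center_idx):      # centers keep their own labels
        labels[c] = k
    for i, x in enumerate(points):
        if labels[i] == -1:                 # non-center: nearest center wins
            labels[i] = min(range(len(center_idx)),
                            key=lambda k: math.dist(x, points[center_idx[k]]))
    return labels
```

Because each non-center point only compares against the k centers (not against all higher-density points), the assignment is a single O(N·k) pass.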
Compared with CFSFDP, K-CFSFDP achieves improvements in the following aspects:
1. K-CFSFDP uses a Gaussian kernel instead of the original cut-off kernel in CFSFDP. The cut-off kernel produces discrete values while the Gaussian kernel produces continuous values, so the Gaussian kernel has a smaller probability of conflicts (i.e., different data points having the same local density value). In addition, the Gaussian kernel still satisfies the property that the more data points whose distance from x_i is less than d_c, the greater the value of ρ_i.
2. We clarified the measurement method of data point distance in K-CFSFDP.
3. Using the determined number of clusters and cluster centers, this study optimized the classification of the other data points. Each data point only needs to traverse the Euclidean distances to the cluster centers to find its nearest cluster, without additional calculations of the distances to other non-center data points. This greatly reduces the computational complexity of the algorithm: assigning each data point directly to its closest cluster center takes less time and improves efficiency.
Compared to the original K-Means algorithm, the advantage of the K-CFSFDP algorithm is that it can automatically select the appropriate number of classes and the initial cluster centers based on the characteristics of the data, which reduces human involvement in clustering.

Model Performance Metrics
In order to evaluate the performance of the clustering model, in addition to SSE and running time, we also adopted the following evaluation criteria: silhouette coefficient (SC) [48], Calinski-Harabasz index (CHI) [49], and Davies-Bouldin index (DBI) [50]. These are commonly used evaluation criteria for clustering performance measurement.

Silhouette Coefficient (SC)
For a good cluster, the distance between samples of the same category is very small, and the distance between samples of different categories is very large. The silhouette coefficient (SC) can evaluate both characteristics at the same time. A higher silhouette coefficient score relates to a model with better clusters.
The silhouette coefficient s for a single sample is given as:

s = (b - a) / max(a, b)

where a is the mean distance between the sample and all other points in the same class, and b is the mean distance between the sample and all points in the next nearest cluster. The silhouette coefficient for a set of samples is the mean of the silhouette coefficients of the individual samples [48].
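A sketch of the per-sample silhouette value, assuming `own_cluster` contains the other points of the sample's cluster (the sample itself excluded) and `other_clusters` lists the remaining clusters:

```python
import math

def silhouette_sample(x, own_cluster, other_clusters):
    """s = (b - a) / max(a, b): a is the mean intra-cluster distance,
    b the mean distance to the nearest other cluster."""
    a = sum(math.dist(x, p) for p in own_cluster) / len(own_cluster)
    b = min(sum(math.dist(x, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)
```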

Calinski-Harabasz Index (CHI)
A higher Calinski-Harabasz score relates to a model with better clusters. For a data set E of size n_E that has been clustered into k clusters, the Calinski-Harabasz score s is defined as the ratio of the between-cluster dispersion mean to the within-cluster dispersion [49]:

s = [tr(B_k) / tr(W_k)] × [(n_E - k) / (k - 1)]

where tr(B_k) is the trace of the between-group dispersion matrix and tr(W_k) is the trace of the within-cluster dispersion matrix, defined by:

W_k = Σ_{q=1}^{k} Σ_{x ∈ C_q} (x - c_q)(x - c_q)^T
B_k = Σ_{q=1}^{k} n_q (c_q - c_E)(c_q - c_E)^T

with C_q the set of points in cluster q, c_q the center of cluster q, c_E the center of E, and n_q the number of points in cluster q.
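The score can be sketched directly from these definitions: the traces reduce to sums of squared Euclidean distances, so no explicit matrices are needed. Illustrative only:

```python
import math

def mean(pts):
    return tuple(sum(v) / len(pts) for v in zip(*pts))

def calinski_harabasz(clusters):
    """s = [tr(B_k)/(k-1)] / [tr(W_k)/(n_E - k)] over a list of point clusters."""
    all_pts = [p for c in clusters for p in c]
    c_E, n, k = mean(all_pts), len(all_pts), len(clusters)
    tr_W = sum(math.dist(p, mean(c)) ** 2 for c in clusters for p in c)
    tr_B = sum(len(c) * math.dist(mean(c), c_E) ** 2 for c in clusters)
    return (tr_B / (k - 1)) / (tr_W / (n - k))
```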

Davies-Bouldin Index (DBI)
A lower Davies-Bouldin index relates to a model with better separation between the clusters. The index is defined as the average similarity between each cluster C_i, i = 1, ..., k, and its most similar cluster C_j [50]. In the context of this index, similarity is defined as a measure R_ij that trades off: s_i, the average distance between each point of cluster i and the centroid of that cluster (also known as the cluster diameter), and d_ij, the distance between cluster centroids i and j.
A simple choice to construct R_ij so that it is nonnegative and symmetric is:

R_ij = (s_i + s_j) / d_ij

Then the Davies-Bouldin index is defined as:

DB = (1/k) Σ_{i=1}^{k} max_{j ≠ i} R_ij
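A direct sketch of the index from the definitions above, over a list of point clusters (illustrative only):

```python
import math

def mean(pts):
    return tuple(sum(v) / len(pts) for v in zip(*pts))

def davies_bouldin(clusters):
    """DB = (1/k) * sum_i max_{j != i} (s_i + s_j) / d_ij, with s_i the mean
    distance of cluster i's points to its centroid."""
    cents = [mean(c) for c in clusters]
    s = [sum(math.dist(p, cents[i]) for p in c) / len(c)
         for i, c in enumerate(clusters)]
    k = len(clusters)
    return sum(max((s[i] + s[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k
```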

The Results of K-CFSFDP
First, we used the K-CFSFDP clustering algorithm to process the four data sets. According to this algorithm, the k value and the cluster centers can be determined. Figure 10 is the density-distance decision graph, where the colored points represent the selected cluster centers. The γ values are shown in Figure 11. The γ values of seven scattered points were relatively large, so each data set was divided into seven clusters. The cluster centers are shown in Table 4, and the clustering results are shown in Figure 12.

K-Means Clustering Algorithm
According to the traditional K-Means clustering algorithm, the k value needs to be specified in advance. We specified candidate values of k in turn, drew the sum of squares due to error (SSE) curve, and then determined the specific value of k.
Because the traditional K-Means algorithm may fail to find a solution when the cluster centers are initialized randomly, we ran 10 experiments for each value of k to ensure the validity of the results. We took the average of each cluster center over the 10 experiments as the final cluster center for each k, then calculated the SSE. The SSE curve is shown in Figure 13.
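The procedure above (several random initializations per k, with the SSE averaged) can be sketched as follows; this is an illustrative pure-Python Lloyd's algorithm on toy data, not the study's code:

```python
import math
import random

def kmeans_sse(points, k, seed):
    """One run of Lloyd's algorithm from a random initialization;
    returns the final SSE (sum of squared errors)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(100):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: math.dist(p, centers[i]))].append(p)
        # recompute centers (keep the old center if a cluster empties)
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def sse_curve(points, k_max, runs=10):
    """Average the SSE over several random initializations for each k,
    as described in the text, to draw the SSE curve."""
    return {k: sum(kmeans_sse(points, k, r) for r in range(runs)) / runs
            for k in range(1, k_max + 1)}

# Toy data: two obvious groups, so the curve drops sharply from k=1 to k=2
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
curve = sse_curve(pts, 3)
```

Averaging over several seeds dampens the effect of unlucky initializations, which is the motivation for the 10 repeated experiments per k.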

According to Figure 13, as the value of k increased, the SSE value decreased, reaching its minimum when k was 7; as k continued to increase, the SSE gradually increased, so the best number of clusters was 7. The clustering effect of the traditional K-Means algorithm is shown in Figure 14. Since the best value of k according to the SSE curve was 7, we used the traditional K-Means clustering algorithm with k = 7. Table 5 gives the mean value of the cluster centers over the repeated clustering runs.

CFSFDP Cluster Algorithm
In addition to the traditional K-Means clustering algorithm, we also used the CFSFDP algorithm for clustering. Since in this paper the improved K-Means algorithm (K-CFSFDP) takes its cluster centers from CFSFDP, the cluster centers of the two algorithms were the same.
The clustering results are shown in Figure 15.

Evaluation and Comparison of Three Algorithms
The comparison results of each model under the different evaluation criteria are shown in Tables 6 and 7, and the running time of the three algorithms is shown in Table 8. To reflect the comparison of the three models more clearly, we standardized the results in Tables 6-8.
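The text does not specify how the results of Tables 6-8 were standardized; min-max scaling is one common choice, sketched here with hypothetical metric values:

```python
def min_max(values):
    """Min-max standardize one set of metric values onto [0, 1]
    so the three algorithms can be compared on a common scale."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# Hypothetical SC values for (K-Means, CFSFDP, K-CFSFDP)
sc_scores = [0.42, 0.51, 0.58]
scaled = min_max(sc_scores)
```

After scaling, the best algorithm under a "higher is better" criterion sits at 1 and the worst at 0, regardless of the metric's original range.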



Discussion
We can now analyze university students' behavior based on the results shown in Table 4 and Figure 12. From Table 4, it can be found that the students' behaviors differed among the four universities. S1 is taken as an example. The center point of the first category was (6.0389, 5.47474), where 6.0389 is the living-habit score and 5.47474 is the learning-performance score; this center indicates that the living habits and learning performance of students in this category were moderate. The center point of the second category was (8.01667, 3.18542), which indicates that students in this category had good living habits but poor learning performance. The opposite held for the third category: its center point was (4.15417, 7.85827), which shows these students performed well in learning but poorly in living habits. Figure 12 shows that students in these four schools could be divided into seven categories, but the distribution of the behavior categories differed among schools. In Figure 12, most S1 students were located in the top right of the figure; compared with the other three universities, S1 students performed very differently between categories. Most of the S2 categories were in the lower right, indicating better learning performance but poorer living habits. S3 and S4 students performed moderately in both living habits and learning performance.
We measured and compared the effects of the three algorithms on the behavior clustering of university students by calculating SC, CHI, DBI, and SSE. From Table 6 and Figures 16a, 17a, 18a and 19a, the SC and CHI of K-CFSFDP were higher than those of K-Means and CFSFDP on all four university data sets. From Table 7 and Figures 16b, 17b, 18b and 19b, the DBI of K-CFSFDP was lower than that of K-Means and CFSFDP on the four data sets. According to the SSE values, the K-CFSFDP algorithm was clearly better than the traditional K-Means clustering algorithm, and the SSE of K-CFSFDP was partially lower than that of the CFSFDP algorithm. This shows that the clustering effect of K-CFSFDP was better than that of K-Means and CFSFDP, and that it could better cluster the behavior of university students. The experimental results in Tables 6 and 7 confirmed that K-CFSFDP has greater advantages because it can determine the number of student categories (number of clusters) and student representatives (cluster centers), and uses the Gaussian kernel function to calculate point density. Therefore, compared with the other two algorithms, K-CFSFDP could better gather students with similar learning behaviors and living habits.
We compared the running time of the three algorithms on the same data. From Table 8 and Figures 16b, 17b, 18b and 19b, the traditional K-Means algorithm ran faster than the other two algorithms; however, the time it spent in the preliminary stage, repeatedly running the algorithm to determine k from the SSE curve, was much longer than that of the other two. The running time of the K-CFSFDP algorithm was shorter than that of CFSFDP. Therefore, K-CFSFDP could perform clustering faster and reduce the computing time for large-scale campus data.
Compared with the current studies and methods on the behavior analysis of university students, the method proposed in this paper had considerable advantages in the following aspects: (1) Traditional data mining and supervised machine learning methods often set labels in advance when classifying student behaviors. University student behavior data is often only used to analyze the relationship between behavior characteristics and labels. This does not take into account the diversity of student behavior and the knowledge contained in the data itself, and the evaluation criteria for labels are often subjective. For example, in accordance with the traditional supervised machine learning algorithm (such as decision trees, random forests, etc.), students with a score greater than 60 can be labeled as "good students", and those with scores less than 60 can be labeled as "bad students". Student behavior data and label data are input into a supervised machine learning model for training and analysis. After the training is completed, when the behavior data of a new student is input, the model will output which category the student belongs to. Obviously, this method has two problems. First, it is difficult to quantify the evaluation criteria of the label. People can question: "Why is the label threshold of good students 60 instead of 50 or 70?" Therefore, the judgment of labels is often vague, and the classification results of students are not objective. Second, the environmental conditions of different schools are different. It is possible that the difficulty of the exam of school B is greater than that of school A, and a student with a score of 50 may be defined as a "good student". Therefore, the judgment of student labels is often more complicated and the model is difficult to adjust flexibly. 
The method proposed in this paper does not need to determine in advance how many types of university students there are, nor which type a given student belongs to. It uses unsupervised clustering to automatically classify students' behavior data based on similarity, so the result is more objective and accurate, and it can reflect the impact of each university's own characteristics on student behavior. (2) The traditional K-Means clustering algorithm can select the number of clusters according to the SSE value, but the number of possible classes must be estimated in advance. This is unrealistic for unfamiliar large data sets, because the number of student behavior categories cannot be determined beforehand. Additionally, the K-Means clustering algorithm may fail to find the cluster centers. As for CFSFDP, its computing time is relatively long, so it cannot be applied to large-scale campus data. The method proposed in this paper combines the advantages of the two algorithms: it can accurately determine the number of student behavior categories and the cluster centers, and it can process large-scale university student behavior data at a reasonable speed.

Conclusions
In this paper, the K-CFSFDP algorithm, based on K-Means and CFSFDP, was proposed to analyze the behaviors of students at different universities. We first introduced the relevant research on the behavior analysis of university students and clarified that educational data mining is the current development trend. We noticed that clustering is an effective tool for behavior analysis of university students. We found that the K-Means clustering algorithm has the disadvantage of requiring the k value to be specified in advance, and that its clustering effect depends on the initial clustering centers. Although the CFSFDP clustering algorithm produces good clustering results, its operational efficiency is low, so in a big data environment the CFSFDP algorithm is greatly restricted. Considering both the clustering effect and the running time, we constructed the K-CFSFDP algorithm as an effective tool to analyze the behavior of university students. To verify the effectiveness of the model, we collected data on the learning performance and living habits of 8000 students from four universities, and used K-Means, CFSFDP, and K-CFSFDP to cluster these data. We judged the clustering effect and operating efficiency by the values of SC, CHI, DBI, and SSE, and by the running time. Comparing and analyzing the experimental results, we drew the following conclusions:

1.
University students with similar learning performance and living habits in each university gathered into a certain number of sets.

2.
Clustering centers could reflect the behavioral characteristics of a certain category of students in the areas of learning performance and living habits.

3.
The distribution of behavior categories of university students in different schools was not the same.

4.
The K-CFSFDP algorithm could directly specify the appropriate k value and the optimal clustering centers. That is, the algorithm could determine the number of student behavior types and the behavior scores of each university.

5.
K-CFSFDP had a better clustering effect than the K-Means algorithm and a shorter running time than the CFSFDP algorithm, so it could be applied to the analysis of university students' behavior.
This study has practical significance for education and pedagogy: teachers and school administrators can better identify the categories and characteristics of student behavior. Its practical value is as follows. First, this study achieves a scientific and reasonable classification of university students' behavior and is simple to operate. University decision makers and student administrators do not need to judge in advance which category a student belongs to; they only need to collect students' behavioral data and input them into the model for clustering and analysis. This simplifies the management process and avoids preconceived judgments about student behavior types. Since the classification is based on data clustering, the result is more objective and accurate.
Second, this study can help school administrators provide targeted assistance to different student groups. The clustering results can show the differences between each type of student, which can help schools better understand student behavior. Schools can analyze and summarize the behavior characteristics of different types of students and take targeted measures to help each type develop good living habits and perform well in learning.
Third, the results of this research can provide feedback on the management effects of school administrators. For example, if a university has only a small number of student groups, the students' behavior is highly similar and concentrated, which may indicate that campus life is monotonous; school administrators should then take measures to enrich students' campus life. If a university's student clusters are concentrated around certain values and sparse around others, education administrators can reconsider what caused the imbalance in the student distribution and take targeted improvement measures. As another example, if one group contains a large number of students and another only a few, administrators can consider what caused this difference and provide corresponding assistance to the minority student groups, avoiding neglect of their needs.
Fourth, the results of this research can help educators to further analyze what factors can affect and determine the behavior characteristics of students, such as using correlation analysis methods to study the relationship between students' personal characteristics (such as gender, height, weight, etc.) and student behavior.
Fifth, this study provides a benchmark for the behavior classification of university students. Since K-CFSFDP determines the student behavior categories, it effectively provides a classification baseline for sample labels. Therefore, supervised machine learning methods can subsequently be used to analyze university student behavior.
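As an illustration of this idea, once the cluster centers are fixed they can serve as behavior archetypes for assigning new students; the center coordinates below echo the three S1 centers quoted in the Discussion, and the category names are our own shorthand, not labels from the study:

```python
import math

# Cluster centers as behavior archetypes: (living habit, learning performance).
centers = {
    "moderate": (6.0389, 5.47474),
    "good habits, weak learning": (8.01667, 3.18542),
    "weak habits, good learning": (4.15417, 7.85827),
}

def classify(student):
    """Assign a (living-habit, learning-performance) score pair
    to the nearest cluster center by Euclidean distance."""
    return min(centers, key=lambda name: math.dist(student, centers[name]))

label = classify((8.0, 3.0))
```

This nearest-centroid step is the simplest supervised use of the clustering output; the cluster assignments could equally serve as training labels for a richer classifier.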
This research was an application innovation of university student behavior analysis. There were application scope and applicable conditions. The application circumstances and data requirements of the algorithm proposed in this study were as follows:

1.
University student behavior data was structured data.
Structured data refers to data that can be represented and stored in a relational database, and is represented in a two-dimensional form. The general characteristics are: the unit of data is "row", a row of data represents the information of an entity, and the attributes of each row of data are the same. In this study, each row of the data represented each student, and each column was the student's behavioral attributes (such as learning habits and living habits).
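A minimal illustration of this row-per-student layout (the column names are hypothetical examples, not the study's actual indicators):

```python
# One row per student, identical attributes in every row.
students = [
    {"id": "S1-0001", "living_habit": 6.0, "learning_performance": 5.5},
    {"id": "S1-0002", "living_habit": 8.0, "learning_performance": 3.2},
]

# Because every row shares the same schema, the table maps directly onto a
# relational database or a two-dimensional array for clustering.
columns = sorted(students[0])
rows = [[s[c] for c in columns] for s in students]
```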

2.
The behavior classification of college students was an unsupervised learning problem.
Machine learning can be divided into supervised learning and unsupervised learning. There are data and labels in supervised learning, and machine learning can learn a function that maps data to labels. There are many forms of label definition, such as classifying students as "good students" or "bad students" based on the threshold of test scores. The data for unsupervised learning has no labels. In this study, we obtained student behavior data through statistics without any labels. That is, we did not define in advance which categories the students belong to, nor did we define the characteristics of each category, so it was an unsupervised learning problem. This means that from the distribution map of student behavior data (the abscissa is the learning score, the ordinate is the living score), we could not intuitively judge how many categories these data could be divided into, nor could we judge the typical representative data points of each category. We used K-CFSFDP to automatically classify students based on the similarity of data.

3.
The scale of college students' behavior data was relatively large.
First, the volume of big data to be collected, computed, and stored is huge. Second, the dimensionality of the data is high. Third, the data grows very fast, so acquisition and processing must also be fast. Fourth, the value density of the data is relatively low. In this study, the number of students in each university is often very large, reaching tens of thousands. This study used data from a total of 8000 students, which constitutes a certain scale. In addition, we collected data on each student's learning and living habits; there were eight evaluation indicators (as shown in Tables 1 and 2), so the data had a certain dimensionality.
The study in this paper suggests the following directions for further research. First, this paper analyzed student behavior from only two dimensions, living habits and learning performance. The behavior of university students has multiple dimensions, such as social behavior and network behavior; future research will expand the dimensions of student behavior and test the clustering effect of K-CFSFDP on high-dimensional student behavior data. Second, this study used data from only four universities, so the number of data sets was small. In future research, we will investigate more universities to expand the number of data sets, so that a statistical test can be used to analyze and compare the clustering results of different methods. Third, each clustering algorithm has its own distance metric function, and no single distance metric suits all data. The K-CFSFDP algorithm in this paper still uses Euclidean distance; different distance metrics should be adopted for different data characteristics. Fourth, the cutoff distance d_c in CFSFDP has a significant impact on the clustering results, and K-CFSFDP did not further optimize d_c. Future work will explore how to choose the best d_c in K-CFSFDP.