# Analysis of University Students’ Behavior Based on a Fusion K-Means Clustering Algorithm

## Abstract

## 1. Introduction

- We applied theories and techniques from data mining and machine learning to the analysis of university students’ behavior, an application innovation at the intersection of machine learning and education.
- Compared with traditional behavior analysis models for college students, this study relies on data-driven analysis, which reduces the subjectivity of human judgment and avoids bias caused by preconceptions. The analysis results of this study are therefore more objective.
- Most existing research uses the K-Means algorithm, for which the number of student categories and the cluster centers are difficult to determine. The fusion of K-Means with clustering by fast search and find of density peaks (K-CFSFDP) proposed in this study can automatically determine the number of student behavior types and their typical representatives from the data. K-CFSFDP therefore has high flexibility and wide applicability, and it avoids human intervention in the clustering process.
- The K-CFSFDP algorithm proposed in this research does not rely entirely on the CFSFDP framework; instead, it improves CFSFDP in several respects. Its running time is shorter and its efficiency higher, which is an advantage in a campus big-data environment.

## 2. Materials and Methods

#### 2.1. Students’ Behavior Data from Four Universities

#### 2.1.1. Behavior Analysis Indicators

#### 2.1.2. Data Normalization
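
As a minimal illustration of this step (assuming min-max scaling, a common choice for mixed-scale behavior indicators; the scheme and column layout here are illustrative, not taken verbatim from the paper), each indicator column is mapped to [0, 1] independently:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each indicator column of X to [0, 1] independently."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0  # guard against constant columns
    return (X - col_min) / span
```

Normalizing before clustering keeps indicators with large raw ranges (e.g., monthly study hours) from dominating the Euclidean distances used later.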

#### 2.1.3. Data Visualization

#### 2.1.4. Data Analysis and Algorithm Tools

#### 2.2. Clustering by K-Means

#### 2.3. Determining the K Value and Cluster Center

- Cluster centers are surrounded by neighbors with lower local density.
- Cluster centers are at a relatively large distance from any points with a higher local density.
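
The two criteria above can be made concrete with a small numpy sketch (illustrative only, using the cut-off kernel): for each point we compute the local density ${\rho}_{i}$ and the distance ${\delta}_{i}$ to the nearest point of higher density; cluster centers are the points where both values are large.

```python
import numpy as np

def cfsfdp_rho_delta(X, d_c):
    """Local density rho (cut-off kernel) and delta, the distance to
    the nearest point of higher density, per the two criteria above."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (D < d_c).sum(axis=1) - 1  # exclude the point itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        # the globally densest point gets the maximum distance by convention
        delta[i] = D[i].max() if denser.size == 0 else D[i, denser].min()
    return rho, delta
```

Points with high ${\rho}_{i}$ and anomalously high ${\delta}_{i}$ stand out on the decision graph and are chosen as centers.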

1. The density calculation adopts the cut-off kernel function, so the clustering result depends strongly on ${d}_{c}$.
2. The authors did not provide a specific distance formula. Distance measures differ across problems, so the specific formula should be chosen according to the actual problem.
3. The method of categorizing the remaining data points is inefficient. After the cluster centers have been found, each remaining point is assigned to the same cluster as its nearest neighbor of higher density, which causes unnecessary iterations and repeated calculations. As the amount of data increases, the computation grows sharply, leading to long running times that cannot meet the needs of campus big data. Moreover, this assignment method does not take full advantage of the already-determined number of clusters and cluster centers.

#### 2.4. K-CFSFDP Algorithm

${d}_{ij}=\mathrm{dist}({x}_{i},{x}_{j})$ represents the Euclidean distance between points ${x}_{i}$ and ${x}_{j}$.

- The distance between points adopts the Euclidean distance, as shown in formula (4). The ${d}_{ij}$ were calculated with ${d}_{ij}={d}_{ji}$, $i<j$, $i,j\in {I}_{S}$.
- According to ${\left\{{\rho}_{i}\right\}}_{i=1}^{N}$, we generated its descending-order subscripts ${\left\{{q}_{i}\right\}}_{i=1}^{N}$.
- We calculated the distance values ${\left\{{\delta}_{i}\right\}}_{i=1}^{N}$.
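
The steps above might be sketched as follows (an illustrative numpy reading, not the paper’s exact code): a Gaussian-kernel density, then ${\delta}_{i}$ computed along the descending-density subscripts ${q}_{i}$, which needs only one pass over the sorted order.

```python
import numpy as np

def gaussian_rho_delta(X, d_c):
    """Gaussian-kernel local density, then delta computed along the
    descending-density order q, as in the steps above."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0  # subtract self-term
    q = np.argsort(-rho)                  # descending-density subscripts
    delta = np.empty(len(X))
    delta[q[0]] = D[q[0]].max()           # densest point: max distance
    for i in range(1, len(X)):
        # nearest point among those already known to be denser
        delta[q[i]] = D[q[i], q[:i]].min()
    return rho, delta
```

Because the Gaussian kernel is continuous, ties in ${\rho}_{i}$ are far less likely than with the cut-off kernel, which keeps the ordering ${q}_{i}$ well defined.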

1. K-CFSFDP uses a Gaussian kernel instead of the original cut-off kernel in CFSFDP. The cut-off kernel is discrete whereas the Gaussian kernel is continuous, so the Gaussian kernel has a smaller probability of conflict (i.e., different data points sharing the same local density value). The Gaussian kernel still satisfies the property that the more data points lie within distance ${d}_{c}$ of ${x}_{i}$, the greater the value of ${\rho}_{i}$.
2. We clarified the distance measure between data points in K-CFSFDP.
3. Using the determined number of clusters and cluster centers, this study optimized the classification of the remaining data points. Each data point only needs to compute the Euclidean distance to each cluster center to find the nearest cluster, with no additional distance calculations to non-center points. This greatly reduces the computational complexity of the algorithm and shortens the assignment step.
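
The optimized assignment step can be sketched in a few lines (illustrative): each point is compared only against the $k$ cluster centers, so the cost is $O(Nk)$ rather than repeated searches among all $N$ points.

```python
import numpy as np

def assign_to_centers(X, centers):
    """Label each point by its nearest cluster center only; no
    distances between non-center points are ever computed."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # (N, k) matrix of point-to-center Euclidean distances
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return D.argmin(axis=1)
```

This is the same assignment rule K-Means uses in each iteration, but here it runs once, after the density-peak step has already fixed the centers.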

#### 2.5. Model Performance Metrics

#### 2.5.1. Silhouette Coefficient (SC)

#### 2.5.2. Calinski–Harabasz Index (CHI)

#### 2.5.3. Davies–Bouldin Index (DBI)

- ${s}_{i}$: the average distance between each point of cluster $i$ and the centroid of that cluster, also known as the cluster diameter.
- ${d}_{ij}$: the distance between cluster centroids $i$ and $j$.

## 3. Results

#### 3.1. The Results of K-CFSFDP

#### 3.2. K-Means Clustering Algorithm

#### 3.3. CFSFDP Cluster Algorithm

#### 3.4. Evaluation and Comparison of Three Algorithms

## 4. Discussion

1. Traditional data mining and supervised machine learning methods often set labels in advance when classifying student behaviors, and university student behavior data is then used only to analyze the relationship between behavior characteristics and those labels. This neglects the diversity of student behavior and the knowledge contained in the data itself, and the evaluation criteria behind the labels are often subjective. For example, with a traditional supervised machine learning algorithm (such as a decision tree or random forest), students with a score greater than 60 can be labeled “good students” and those with scores below 60 “bad students”. Student behavior data and label data are fed into a supervised model for training; once training is complete, the model outputs the category of any new student whose behavior data is input. This method has two problems. First, the evaluation criteria of the labels are difficult to quantify: one may ask why the threshold for a good student is 60 rather than 50 or 70. The judgment of labels is therefore vague, and the resulting classification of students is not objective. Second, environmental conditions differ between schools. If the exams of school B are harder than those of school A, a student at school B with a score of 50 might reasonably be called a “good student”. Judging student labels is therefore complicated, and such models are difficult to adjust flexibly. The method proposed in this paper needs neither the number of university student types nor each student’s type to be determined in advance. It uses unsupervised clustering to classify students’ behavior data automatically based on similarity, so the result is more objective and accurate, and it can reflect the impact of each university’s own characteristics on students’ behavior.
2. The traditional K-Means clustering algorithm can select the number of clusters according to the SSE value, but the number of possible classes must be estimated in advance, which is unrealistic for unfamiliar large data sets because the number of student behavior categories cannot be determined beforehand. In addition, the K-Means algorithm may fail to find suitable cluster centers. As for CFSFDP, its computing time is relatively long, so it cannot be applied to large-scale campus data. The method proposed in this paper combines the advantages of the two algorithms: it can accurately determine the number of student behavior categories and the cluster centers, and it can also process large-scale university student behavior data at a reasonable speed.
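
The SSE-based choice of $k$ mentioned above can be illustrated with a minimal, self-contained sketch (not the paper’s implementation): a small Lloyd’s K-Means whose returned SSE is scanned over candidate $k$ values to draw the elbow curve.

```python
import numpy as np

def kmeans_sse(X, k, iters=50, seed=0):
    """Minimal Lloyd's K-Means returning the SSE used by the elbow
    heuristic; centers are seeded by sampling k distinct data points."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
    return float(((X - centers[labels]) ** 2).sum())
```

Running `kmeans_sse(X, k)` for $k = 1, 2, \dots$ and plotting the results gives the SSE curve; the “elbow” marks a plausible $k$, which is exactly the estimate K-Means requires in advance and K-CFSFDP does not.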

## 5. Conclusions

- University students with similar learning performance and living habits in each university gathered into a certain number of sets.
- Clustering centers could reflect the behavioral characteristics of a certain category of students in the areas of learning performance and living habits.
- The distribution of behavior categories of university students in different schools was not the same.
- The K-CFSFDP algorithm could directly determine the appropriate k value and the optimal cluster centers; that is, it could determine the number of student behavior types and the behavior scores for each university.
- K-CFSFDP had a better clustering effect than the K-Means algorithm and a shorter running time than the CFSFDP algorithm, so it could be applied to the analysis of university students’ behavior.

- 1.
- University student behavior data was structured data.

- 2.
- The behavior classification of college students was an unsupervised learning problem.

- 3.
- The scale of college students’ behavior data was relatively large.

## Author Contributions

## Funding

## Conflicts of Interest

## References


**Figure 8.** Example and schematic: (**A**) distribution of random points and (**B**) the $\rho$ and $\delta$ values of each point.

**Figure 13.** Sum of squares due to error (SSE) curve of the K-Means algorithm from 4 universities: (**a**) S1; (**b**) S2; (**c**) S3; and (**d**) S4.

**Figure 16.** Clustering performances of three models under different evaluation criteria in dataset S1: (**a**) the higher the value, the better and (**b**) the lower the value, the better.

**Figure 17.** Clustering performances of three models under different evaluation criteria in dataset S2: (**a**) the higher the value, the better and (**b**) the lower the value, the better.

**Figure 18.** Clustering performances of three models under different evaluation criteria in dataset S3: (**a**) the higher the value, the better and (**b**) the lower the value, the better.

**Figure 19.** Clustering performances of three models under different evaluation criteria in dataset S4: (**a**) the higher the value, the better and (**b**) the lower the value, the better.

Index | Type | Note
---|---|---
Regular diet | Numerical value | The number of days per month
Physical exercise | Numerical value | The number of days per month
Regular rest | Numerical value | The number of days per month
Normal consumption | Numerical value | The number of days per month

Index | Type | Note
---|---|---
Average score | Numerical value | The number of scores per month
Attendance rate | Numerical value | The number of attendances per month
Study time | Numerical value | The hours of studying per month
Book reading | Numerical value | The number of books per month

Number | S1 | S1 | S2 | S2 | S3 | S3 | S4 | S4
---|---|---|---|---|---|---|---|---
1 | 6.07368 | 5.55742 | 7.66723 | 6.00785 | 7.55621 | 7.50524 | 6.67336 | 7.32494
2 | 6.14057 | 5.66159 | 6.55764 | 1.74391 | 8.17891 | 7.65729 | 6.00577 | 6.74123
3 | 7.22919 | 5.99426 | 5.68571 | 3.41523 | 8.11347 | 8.21814 | 5.87413 | 6.55677
4 | 6.12385 | 5.81432 | 5.55817 | 2.78966 | 7.60752 | 8.22783 | 6.97421 | 7.36017
… | … | … | … | … | … | … | … | …
1997 | 8.69287 | 1.67829 | 6.11535 | 3.34144 | 9.66132 | 4.19352 | 4.55479 | 5.19746
1998 | 7.48125 | 2.52936 | 5.78211 | 2.66147 | 7.88631 | 4.75178 | 8.56978 | 3.67127
1999 | 8.36386 | 1.64275 | 5.66985 | 2.40725 | 7.25812 | 4.99851 | 3.33214 | 4.87413
2000 | 8.45574 | 4.23727 | 7.61459 | 3.42789 | 7.33275 | 4.63247 | 6.11267 | 3.39741

**Table 4.** The cluster centers of K-Means and clustering by fast search and find of density peaks (K-CFSFDP).

Center | S1 | S1 | S2 | S2 | S3 | S3 | S4 | S4
---|---|---|---|---|---|---|---|---
1 | 6.03289 | 5.74414 | 8.32161 | 6.38682 | 5.16315 | 6.04157 | 6.48546 | 7.47637
2 | 8.01669 | 3.18551 | 5.64966 | 2.47307 | 5.33499 | 2.75543 | 3.03581 | 6.96716
3 | 4.15422 | 7.85831 | 2.56964 | 7.40917 | 2.28359 | 3.47640 | 7.31236 | 4.29811
4 | 8.21340 | 7.33396 | 5.42544 | 2.32415 | 4.40999 | 4.06838 | 6.68811 | 7.70098
5 | 8.54662 | 1.66175 | 8.11425 | 2.41816 | 7.64964 | 7.92441 | 5.33691 | 5.16572
6 | 3.37340 | 5.61049 | 4.44462 | 6.11261 | 3.62245 | 5.93139 | 6.67577 | 2.67338
7 | 1.66517 | 3.49263 | 7.41792 | 4.91789 | 7.59799 | 4.57699 | 3.72421 | 3.66076

Center | S1 | S1 | S2 | S2 | S3 | S3 | S4 | S4
---|---|---|---|---|---|---|---|---
1 | 8.27400 | 2.39930 | 2.59326 | 7.40172 | 4.46283 | 4.08395 | 7.54576 | 4.89052
2 | 4.13315 | 7.68235 | 5.44932 | 4.33228 | 5.46431 | 2.86967 | 6.65832 | 2.67032
3 | 7.77590 | 7.66245 | 8.36084 | 6.38125 | 7.68175 | 4.50673 | 5.54568 | 5.14083
4 | 1.71074 | 3.49110 | 5.67678 | 2.42658 | 7.68820 | 7.87882 | 4.08892 | 5.44782
5 | 3.37812 | 5.62245 | 4.44752 | 6.11293 | 3.49562 | 5.95816 | 3.76442 | 3.64857
6 | 6.06610 | 5.73412 | 7.41603 | 4.88391 | 2.30457 | 3.41682 | 2.82903 | 7.02875
7 | 8.03260 | 3.21768 | 8.08883 | 2.36514 | 5.15896 | 6.10679 | 6.45891 | 7.35572

**Table 6.** Silhouette coefficient (SC) and Calinski–Harabasz index (CHI) values of the three algorithms.

University | SC: K-Means | SC: CFSFDP | SC: K-CFSFDP | CHI: K-Means | CHI: CFSFDP | CHI: K-CFSFDP
---|---|---|---|---|---|---
S1 | 0.720163 | 0.773615 | 0.78262114 | 18654.66 | 17171.17 | 20351.42166
S2 | 0.629851 | 0.653892 | 0.67294505 | 8548.027 | 7939.654 | 9330.545438
S3 | 0.511018 | 0.551514 | 0.56356586 | 4042.819 | 4053.911 | 4316.641802
S4 | 0.516445 | 0.526225 | 0.56055753 | 3389.293 | 3115.909 | 3572.915619

**Table 7.** Davies–Bouldin index (DBI) and sum of the squared errors (SSE) values of the three algorithms.

University | DBI: K-Means | DBI: CFSFDP | DBI: K-CFSFDP | SSE: K-Means | SSE: CFSFDP | SSE: K-CFSFDP
---|---|---|---|---|---|---
S1 | 0.47214 | 0.283085 | 0.27428561 | 168.6821 | 169.2103 | 168.5100431
S2 | 0.573574 | 0.427371 | 0.41893048 | 159.0415 | 157.6216 | 158.5206654
S3 | 0.723378 | 0.581062 | 0.56537985 | 149.0415 | 148.9562 | 142.2402269
S4 | 0.751687 | 0.611766 | 0.56924453 | 165.5415 | 156.224 | 163.5586647

Algorithm | S1 | S2 | S3 | S4
---|---|---|---|---
K-CFSFDP | 8.667326 | 7.011259 | 9.043949 | 8.7854173
K-Means | 0.435833 | 1.091194 | 0.831946 | 0.823445
CFSFDP | 12.740592 | 10.269540 | 13.293498 | 12.880990

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chang, W.; Ji, X.; Liu, Y.; Xiao, Y.; Chen, B.; Liu, H.; Zhou, S.
Analysis of University Students’ Behavior Based on a Fusion K-Means Clustering Algorithm. *Appl. Sci.* **2020**, *10*, 6566.
https://doi.org/10.3390/app10186566
