Analysis of College Students’ Consumption Behavior Data Based on Fractional-Order Firefly Optimization Clustering Algorithm

Meng, Xiang; He, Qi; Dong, Yanhua; Sun, Hongyu

doi:10.3390/app15147723

¹ College of Mathematics and Computer, Jilin Normal University, Siping 136000, China

² Jilin Earthquake Agency, Siping 136000, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(14), 7723; https://doi.org/10.3390/app15147723

Submission received: 17 May 2025 / Revised: 5 July 2025 / Accepted: 6 July 2025 / Published: 10 July 2025

Abstract

Data mining-based analysis of student consumption behavior is an important part of smart campus construction, as it can reveal students’ eating patterns and consumption levels. It has therefore become a hot topic in both research and industry. Traditional data mining algorithms, however, are poorly suited to ever-growing volumes of data. Clustering is becoming increasingly important in data mining, but traditional clustering algorithms do not balance clustering efficiency against clustering quality. This paper proposes FFA-k-means, an algorithm that uses the fractional-order firefly algorithm to optimize the cluster centers of k-means, and applies it to cluster college students. Experiments show that the proposed algorithm produces better clustering results than the traditional k-means algorithm. The analysis also reveals several problems: a group of students who make too few purchases, a low number of purchases across the three daily meals, and an excessively low proportion of spending on living and diet. The causes and characteristics of these problems can serve as a reference for colleges to take timely corresponding measures.

Keywords: smart campus; fractional-order firefly algorithm; k-means; consumer behavior analysis; clustering

1. Introduction

With the development of information technology, many higher education institutions have established information management and monitoring systems [1]. Timely and effective mining of students’ behavioral characteristics has therefore become particularly important. Student consumption data is a vital component of university student datasets. As a branch of behavioral research in higher education, the analysis of student consumption behavior not only explores students’ dietary patterns and consumption levels but also provides a reference for evaluating financial aid. However, traditional campus management concepts and data analysis methods can no longer meet the growing demand for data processing. How to manage and share campus data effectively, optimize student management using data mining approaches, and provide students with clearer and more detailed data services have become challenges facing today’s campus service systems.

There have been preliminary studies on student consumption using data mining technology both domestically and internationally. S. Fan employed the k-means clustering algorithm to analyze student consumption data based on two indicators, total consumption amount and average consumption amount, and also explored the association between student behavior and academic performance [2]. T. Jiang disaggregated the data, mined student consumption patterns, and analyzed the current status of canteen operations [3]. J. A. Cook profiled the behavior of college students using various university data, describing students from multiple dimensions, and then applied cluster analysis to these behavioral profiles to uncover potential patterns in students’ academic and social lives [4]. Chai Zheng analyzed students’ “One Card” consumption records to assess their spending levels and then used neural network-based data mining methods to identify at-risk students [5]. X. Jiang used the forward and backward sequential pattern mining algorithm “NegI-NSP” to analyze students’ consumption data and examine the relationship between consumption patterns and academic performance [6]. C. Y. Yang proposed an early warning system for college student behavior using the Hadoop open-source platform [7]. Traditional clustering analysis, specifically the k-means algorithm, has also been used to mine large campus datasets and study students’ behavioral characteristics and patterns [8,9,10,11].

Most of the aforementioned studies have focused on analyzing student consumption behavior through clustering techniques [12,13,14]. Clustering algorithms are a crucial aspect of data mining [15]. The goal of clustering is to group objects in such a way that those within the same class exhibit greater similarity, while objects in different classes show greater dissimilarity. Common clustering methods include density-based [16], model-based [17], hierarchical [18], and partitioning clustering [19]. These algorithms have applications across various fields. For instance, evolutionary clustering algorithms like iECA* have proven effective for grouping medical diseases [20]. To demonstrate how rider expertise affects postural coordination, Iman proposed a framework for automatically analyzing athlete behavior using cluster analysis [21]. Hussain introduced a centrality-clustering method called UICPC, enhancing its effectiveness in identifying cellular groupings [22]. Kamal developed a new version of spectral clustering named text-associated DeepWalk-Spectral Clustering (TADW-SC) for attributed networks, ensuring structural cohesiveness and attribute homogeneity among identified protein complexes [23]. Chen proposed a novel method for predicting and classifying ventricular arrhythmias (VA), particularly fatal VA, before they occur [24]. Some of the studies referenced above utilized supervised machine learning methods. However, since the number of student types is often unknown beforehand, unsupervised machine learning methods are necessary for clustering.

Among the various clustering algorithms, the k-means algorithm [25,26,27] is widely used for analyzing student consumption behavior due to its speed, ease of understanding, and efficiency [28]. However, traditional k-means has limitations, including a lack of flexibility with fixed weights, dependency on initial cluster centers, and the requirement for a pre-specified number of clusters (K). These limitations restrict the algorithm’s applicability. In recent years, various meta-heuristic algorithms have been proposed to optimize clustering algorithms, overcoming the shortcomings of k-means and achieving better results. Some of these meta-heuristics include the monarch butterfly optimization (MBO) algorithm [29], the slime mold algorithm (SMA) based on the feeding behavior of slime molds [30], the moth swarm algorithm (MSA) inspired by moth navigation toward moonlight [31,32], and the Harris hawks optimization (HHO) algorithm that simulates predatory behavior [33]. The firefly algorithm, as a meta-heuristic clustering intelligence algorithm [34], has garnered increasing attention from researchers since its inception. It boasts advantages such as a simple principle, clear process, and ease of implementation, and has been successfully applied in fields like image processing, computer networks, and engineering design [35,36,37]. Lin proposed a fuzzy clustering method based on the firefly algorithm to enhance clustering accuracy [38]. Zhou applied the firefly algorithm to determine the insulation levels of DC transmission lines, improving the insulation of UHV DC lines [39]. Wang developed a traffic signal timing optimization method based on the firefly algorithm, effectively reducing emissions at intersections [40]. However, the firefly algorithm typically relies on previous memory during its search process. To address the issue of local optima, the introduction of fractional-order memory—based on the properties of memory and genetic processes—has been suggested. 
This paper employs a fractional-order firefly-optimized k-means clustering algorithm to enhance clustering performance effectively.

The main contributions and innovations of this paper are as follows:

(1) The proposed methodology involves the utilization of a clustering algorithm for the analysis of students’ spending patterns. In comparison with the conventional statistical behavior analysis model, this clustering-based approach has the capacity to reduce the impact of human factors and thereby facilitate the more objective identification of students in need of financial aid.

(2) We propose three models: the overall consumption model, the living diet model, and the average consumption model for breakfast, lunch, and dinner. These models evaluate students’ consumption levels using data from the university’s digital system, thereby offering a more comprehensive representation of their consumption patterns.

(3) In order to address the issues inherent to the k-means algorithm, namely the challenges associated with determining the optimal number of student clusters and the random allocation of cluster centers, a methodology is proposed that utilizes the elbow rule and the Calinski–Harabasz index to determine the number of clusters. Concurrently, the fractional-order firefly algorithm is employed to select the initial cluster centers. This approach has been demonstrated to be effective in addressing the issue of the algorithm’s sensitivity to the initial value and its susceptibility to local optimization. In comparison with the conventional k-means algorithm, the proposed method facilitates more rational clustering.

(4) Based on the aforementioned models, we conduct a comprehensive analysis of students’ consumption characteristics. The experimental findings have significant practical implications, assisting college administrative personnel in enhancing their operational efficiency. Furthermore, these results provide reliable data for colleges to accurately assess students’ economic situations and inform decisions regarding grant allocations. This approach elevates the scientific rigor and precision of targeted poverty alleviation initiatives within universities, ensuring that resources are effectively distributed to those in genuine need.

The remainder of this paper is organized as follows: In Section 2, we provide an overview of related work, outline the proposed general objectives, and present the framework and flowchart. Section 3 details the specific process of analyzing students’ consumption behavior, offering step-by-step instructions for the analysis. In Section 4, we present and discuss the experimental results, examining them from multiple perspectives. Finally, we summarize the work performed, highlight the main findings and contributions, and propose potential directions for future research.

2. Materials and Methods

2.1. The Goal of Student Consumption Behavior Analysis

Most studies on students’ consumption behavior rely heavily on administrators’ personal experiences and often lack individualized insights into the students themselves. The individual differences and complexities among students present significant challenges for administrators, making it difficult for them to quickly and accurately understand the current situation. Additionally, students exhibiting abnormal behaviors may intentionally provide misleading information in an effort to appear “normal.” Due to constraints in time and energy, capturing the true state of these students becomes exceedingly challenging. To address these issues, administrators are expected to implement efficient data management and sharing practices, transitioning from a reactive to a proactive approach. By leveraging the principles of big data mining, they can conduct in-depth and effective analyses of diverse datasets. This approach not only aims to optimize student management strategies but also prioritizes the safeguarding of students’ privacy. Furthermore, it facilitates the provision of clearer and more comprehensive data services tailored to student needs, ultimately enhancing the overall quality and effectiveness of educational administration. The goals of student behavior analysis are depicted in Figure 1.

As shown in Figure 1, initially, data should be collected at regular intervals, with the corresponding database synchronized periodically to ensure data integrity and timeliness. The collected data must then undergo processes of storage, transformation, and processing, followed by standardization for effective management. Next, an analytical model based on the student behavior model and a dedicated data analysis system for visual presentation should be established. Ultimately, these efforts will lead to the achievement of behavioral analysis and management objectives.

2.2. Behavior Analysis Framework

The framework diagram of student consumption behavior is shown in Figure 2. The framework consists of six parts: student data collection, data preprocessing, statistical analysis, modeling, clustering the built model, and analyzing the clustering results.

2.3. Application of Clustering Algorithm in Student Consumption Behavior Analysis

The clustering algorithm operates without the need for labeled information from students, making it a valuable tool for uncovering latent patterns in their behavioral tendencies. In the context of analyzing students’ consumption behavior, this algorithm effectively reveals hidden structures and trends. The step-by-step process of applying the clustering algorithm to analyze students’ consumption behavior is illustrated in Figure 3.

As shown in Figure 3, the first and crucial step is to gather student behavior data. This data collection serves as an essential prerequisite for subsequent analysis. The more comprehensive and rich the collected data is, the better it facilitates the construction of accurate models and in-depth analyses. Next, the raw data often contains a significant amount of noise, necessitating a preprocessing step to clean and prepare the data for analysis. Following this, statistical analysis provides a preliminary understanding of consumption characteristics. Subsequently, three models are established, and clustering indicators are selected. The clustering process is then executed using an improved k-means algorithm, resulting in the acquisition of clustering outcomes. These results form the foundation for more in-depth exploration. Finally, a thorough analysis is conducted to examine the consumption characteristics and behavioral traits of each cluster based on the obtained clustering results.

3. Analysis of Students’ Consumption Behavior

3.1. Dataset

The dataset utilized in this paper originates from a college’s digital system database, primarily gathered from four categories: student ID information, card consumption records, access control data, and classroom information, as illustrated in Figure 4. Analyzing the student data revealed that card data and student ID information exhibited a higher level of activity. Consequently, this paper conducts an in-depth exploration and analysis of these two data categories. The extracted raw data comprises a sample of 4341 students, which serves as the experimental dataset. The dataset is divided into a training set and a test set at a ratio of 7:3.

The original data contain several irregularities. Specifically, the data exported from the system include incomplete entries, inconsistent information, duplicate records, and noisy elements. Therefore, data preprocessing is essential before modeling. The first step is to remove duplicate entries: card transactions that are marked as “spent” and have identical spending amounts are treated as duplicate expenditures and eliminated. An examination of the student consumption records for missing values found that two columns contained a significant proportion of missing data, and these columns were removed from the dataset. Next, the dataset was analyzed for outliers in consumption timestamps, and only records belonging to consumption categories were retained. Because the college canteens operate from 6:00 to 24:00, any consumption records between 0:00 and 6:00 were identified as outliers and removed. Additionally, an analysis of the distribution of the “CardCount” feature revealed several records with excessively high values that fell outside the normal range. Finally, the processed card data and student ID information were merged using the campus card number as the primary key.

3.2. Statistical Analysis

To accurately identify clustering indicators and comprehensively assess student consumption levels, we conducted a multidimensional analysis of student consumption behaviors and cafeteria operations from three distinct perspectives: the number of meals attended at each canteen during the morning, afternoon, and evening; meal times on weekdays and weekends; and the Pearson linear correlation coefficient of student consumption. The dataset was partitioned to facilitate subsequent data processing and feature extraction.

3.2.1. Meal Times at Breakfast, Lunch, and Dinner

We segment the time into three meal periods: breakfast, lunch, and dinner. Consumption records containing the term “canteen” were filtered, and these records were categorized into the corresponding meal periods. Any records that did not fit within these three timeframes were excluded from the analysis. The operation times for the three meals are summarized in Table 1.

In general, when students are having meals, they tend to use their cards multiple times to purchase various types of food. Thus, defining the number of meals based solely on the number of card swipes is not feasible. Therefore, we introduce the following definition for meal occurrences: if the time interval between two consecutive card-swipes of a student is less than 10 min (600 s), these two swipes are counted as one meal.
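The meal definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming each student's swipes are available as a list of `datetime` values; the function name `count_meals` and the sample timestamps are hypothetical and do not come from the paper's dataset.

```python
from datetime import datetime, timedelta

# Threshold from the text: swipes less than 10 min (600 s) apart are one meal.
MEAL_GAP = timedelta(minutes=10)

def count_meals(swipe_times):
    """Count meals for one student from a list of card-swipe datetimes."""
    meals = 0
    previous = None
    for t in sorted(swipe_times):
        # A new meal starts at the first swipe, or when the gap to the
        # previous consecutive swipe reaches the threshold.
        if previous is None or t - previous >= MEAL_GAP:
            meals += 1
        previous = t
    return meals

# Hypothetical swipes for one student on one day.
swipes = [
    datetime(2024, 3, 4, 7, 25, 0),
    datetime(2024, 3, 4, 7, 28, 30),  # 3.5 min after the previous swipe: same meal
    datetime(2024, 3, 4, 11, 50, 0),
    datetime(2024, 3, 4, 11, 58, 0),  # 8 min after the previous swipe: same meal
    datetime(2024, 3, 4, 12, 9, 0),   # 11 min after the previous swipe: new meal
]
print(count_meals(swipes))  # 3
```

Because the gap is measured between consecutive swipes, a long run of closely spaced swipes still counts as a single meal, which matches the wording of the definition.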

3.2.2. Distribution of Meal Times on Working and Non-Working Days

First, we tally the number of meal occurrences within each ten-minute interval. If no meals occur during a specific interval, that interval is recorded with a value of 0. Because non-working days include statutory holidays and adjusted rest days in addition to regular weekends, the calendar requires further detailed processing when classifying each date. Finally, a graph is constructed with time on the horizontal axis and the average number of meals on the vertical axis to present the data visually.
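The ten-minute tally can be illustrated as follows. This is a sketch for a single day, assuming meal start times are given as `datetime` values and using the 6:00 to 24:00 canteen hours stated in the dataset description; the function name `tally_by_ten_minutes` is hypothetical, and averaging over working or non-working days would be a further step.

```python
from collections import Counter
from datetime import datetime

def tally_by_ten_minutes(meal_times, start_hour=6, end_hour=24):
    """Return {"HH:MM": count} for every ten-minute slot in [start_hour, end_hour)."""
    counts = Counter()
    for t in meal_times:
        slot_minute = (t.minute // 10) * 10  # floor to the start of the slot
        counts[(t.hour, slot_minute)] += 1
    # Emit every slot so that empty intervals appear explicitly with 0.
    return {
        f"{h:02d}:{m:02d}": counts.get((h, m), 0)
        for h in range(start_hour, end_hour)
        for m in range(0, 60, 10)
    }

# Hypothetical meal start times.
meals = [
    datetime(2024, 3, 4, 7, 25),
    datetime(2024, 3, 4, 7, 29),
    datetime(2024, 3, 4, 11, 50),
]
tally = tally_by_ten_minutes(meals)
print(tally["07:20"], tally["11:50"], tally["09:30"])  # 2 1 0
```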

3.2.3. Pearson’s Linear Correlation Coefficient Analysis of Student Consumption

The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient [41,42], is a statistical measure of the degree of linear correlation between two variables. It is denoted by r, or by $ρ_{X, Y}$ for two variables X and Y, and describes the strength of the linear relationship between them; a higher absolute value indicates a stronger correlation. It is calculated using Equation (1). In this study, we extract information such as consumption locations and consumption data categorized by morning, noon, and evening periods, including the average consumption amount and the number of consumption instances. The Pearson correlation coefficient is then used to measure the degree of linear correlation between the average consumption amount and the number of consumption instances.

ρ_{X, Y} = \frac{\mathrm{cov}(X, Y)}{σ_{X} σ_{Y}} = \frac{E((X - μ_{X})(Y - μ_{Y}))}{σ_{X} σ_{Y}} = \frac{E(XY) - E(X) E(Y)}{\sqrt{E(X^{2}) - E^{2}(X)} \sqrt{E(Y^{2}) - E^{2}(Y)}}

(1)

Equation (1) defines the Pearson correlation coefficient $ρ_{X, Y}$ of two continuous variables (X, Y) as their covariance $\mathrm{cov}(X, Y)$ divided by the product of their standard deviations $σ_{X} σ_{Y}$. The coefficient always takes values between −1.0 and 1.0: values close to 0 indicate that the variables are linearly uncorrelated, and values close to 1 or −1 indicate a strong correlation.
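As a small worked illustration of Equation (1), the sketch below computes the coefficient from the moment form $E(XY) - E(X)E(Y)$ over the product of the standard deviations. The function name `pearson` and the sample values are hypothetical; they are chosen only to show a perfectly negative linear relationship.

```python
import math

def pearson(xs, ys):
    """Pearson correlation via the moment form of Equation (1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n
    e_x2 = sum(x * x for x in xs) / n
    e_y2 = sum(y * y for y in ys) / n
    sd_x = math.sqrt(e_x2 - mean_x ** 2)
    sd_y = math.sqrt(e_y2 - mean_y ** 2)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return (e_xy - mean_x * mean_y) / (sd_x * sd_y)

# Hypothetical data: number of meals versus average spend per meal.
meal_counts = [1, 2, 3, 4]
avg_spend = [8, 6, 4, 2]
r = pearson(meal_counts, avg_spend)
print(round(r, 6))  # -1.0
```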

3.2.4. Statistical Result Visualization

According to the statistical analysis of the consumption data, breakfast, lunch, and dinner times at the dining canteens are illustrated in Table 2 and Figure 5.

In terms of breakfast, the second canteen has the highest number of diners, accounting for 50.3% of the total. This indicates that breakfast provided by the second canteen is the most popular choice among students. Following closely are the fifth and first canteens, with proportions of diners at 31.3% and 17.3%, respectively. The third and fourth canteens see almost negligible participation during breakfast. For lunch, the distribution of diners is more balanced. The fourth canteen leads with 27.6% of diners, followed by the fifth canteen at 22.6%. The second and third canteens account for 20.1% and 18.5%, respectively, while the first canteen lags behind with only 11.1%. The dinner attendance mirrors that of lunch, though there is a slightly more pronounced disparity in the distribution of diners. The fourth canteen again has the highest attendance, with 28.5%, followed closely by the second canteen at 21.9% and the fifth canteen at 21.2%. In contrast, the first and third canteens show less encouraging numbers, with market shares of just 11.2% and 17.2%, respectively.

Based on this analysis, we offer the following suggestions for the canteens: First, the third and fourth canteens have minimal presence in the breakfast market. Therefore, it may be prudent to consider either reducing to a single service window or eliminating breakfast altogether to cut costs. Second, the first canteen is underperforming across breakfast, lunch, and dinner services. To improve its competitiveness, strategies such as reducing prices, diversifying the menu, and enhancing the dining environment could be beneficial. Finally, since the second and fifth canteens are popular, expanding their operations and adding more service windows could help alleviate the current dining pressure.

Statistical analysis of the consumption data reveals the canteens’ dining patterns on working days and non-working days, as shown in Figure 6 and Figure 7, respectively. On working days, there are three distinct peak periods corresponding to breakfast, lunch, and dinner. The breakfast peak runs from 7:20 to 7:50, with its maximum at 7:25. The lunch peak is concentrated at 11:50, with a large number of diners in the 20 min before and after it. The dinner peak is relatively smooth, extending about 1.5 h before and after its maximum at 18:00.

Besides these three major peaks, the graph shows a small peak in the second canteen at 9:35. At 16:20, both the second and the fifth canteens show a relatively pronounced minor peak.

On rest days, breakfast service extends over a longer period, but the peak is less crowded. The peak times vary among the three most frequented canteens: the first canteen peaks at 8:50, the fifth at 7:35, and the second canteen consistently peaks between 7:30 and 9:10. Lunchtime peaks across all canteens occur between 11:30 and 12:00, with a secondary peak at the first canteen around 12:10. In contrast, the peak period for dinner is less intense, occurring between 17:00 and 18:30. Generally, the differences in peak dining times for lunch and dinner between working and rest days are not significant; however, the breakfast peak on rest days is noticeably prolonged.

Based on this analysis, we recommend the following: First, increase food supply during peak meal times on working days, while reducing supply during breakfast on non-working days to minimize waste. Second, given that peak dining times on working days align closely with class schedules, the college should consider implementing staggered arrival and departure times for students. Additionally, food preparation could begin one to two hours before peak times to improve efficiency. Finally, it may be advisable to keep only the second and fifth canteens open on weekends, as their capacity is sufficient to meet dining needs while also reducing operational costs.

The results of Pearson’s linear correlation coefficients for the students’ three meals are shown in Table 3.

The Pearson’s linear correlation coefficient table for student consumption revealed a negative linear relationship between the average expenditure on three meals and the number of meals consumed. The correlation is strongest for breakfast and weakest for dinner.

3.3. Behavior Analysis Model

In college activities, students exhibit diverse consumption behaviors. To accurately assess their economic status, it is crucial to utilize suitable tools. Drawing on the statistical findings and analyses from the previous stage, three distinct models have been developed.

3.3.1. Overall Consumption Model

First, a general consumption model, the DFM model, is developed to portray students’ economic situation. It extends and adapts the classical RFM model [43,44], an important tool for measuring customers’ consumption behavior. Because the subjects of this paper are college students, the recency indicator (the most recent consumption) mainly reflects daily on-campus consumption: students often make purchases while at university, so this indicator offers little reference value. The “Deposit” indicator is therefore introduced in its place. The DFM model has three indicators, whose meanings are presented in Table 4.

In this model, indicator D assesses whether a student can sustain their spending, making it an essential component for evaluating overall spending behavior. A higher F indicator indicates a poorer economic situation for students. Conversely, lower consumption frequency suggests that students’ economic conditions exert more significant constraints on their spending behavior. The M indicator serves as the foundation for all consumption behaviors, directly reflecting students’ consumption capacities.

3.3.2. Living Diet Model

To further investigate students’ consumption patterns, we segmented overall monthly consumption into specific categories, thereby constructing a model for daily dietary and living expenses. This model also features three indicators, with their meanings outlined in Table 5.

3.3.3. Average Consumption Model for Breakfast, Lunch and Dinner

Additionally, we meticulously segmented data on living and dietary consumption based on students’ daily dietary needs. An average consumption model encompassing breakfast, lunch, and dinner has been established, including four indicators, as shown in Table 6.

3.4. Clustering Process

Before conducting the clustering analysis, the data were subjected to standardization processing. This approach helps reflect the true clustering characteristics of the data, thereby providing a reliable basis for selecting an appropriate k-value for subsequent clustering algorithms.

3.4.1. Determining the K Value

The Elbow Method [45] is used to explain and verify the consistency of the cluster analysis to help find the right number of clusters in the dataset [46]. The most important metric of the Elbow Method is the SSE (sum of squared errors).

J = \sum_{k = 1}^{K} \sum_{Y_{n} \in C_{k}} {\| Y_{n} - c_{k} \|}^{2}

(2)

where $C_{k}$ is the k-th cluster, $Y_{n}$ is a sample point in $C_{k}$, $c_{k}$ is the centroid of $C_{k}$ (the mean of all samples in $C_{k}$), and $J$ is the clustering error over all samples.

The sum of squared errors (SSE) is the total of the squared distances between each cluster’s centroid and the sample points within that cluster, also known as the degree of distortion. A lower distortion indicates that cluster members are tightly grouped, while a higher distortion suggests a looser cluster structure. As the number of clusters increases, the distortion decreases. For data with a certain level of differentiation, the distortion drops sharply until a critical point is reached and decreases only gradually beyond it. This critical point is regarded as the point of best clustering performance. The curve of SSE against the number of clusters (K) therefore forms an elbow shape, and the K value at the elbow is taken as the optimal number of clusters for the data [47].
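The elbow procedure can be sketched by running k-means for several values of K and recording the SSE of Equation (2) each time. The code below is only an illustration: it uses a plain k-means with random initial centers (not the FFA-k-means proposed in this paper), and the toy points and all function names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means with random initial centers (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Each centroid becomes the mean of its members; an empty cluster
        # keeps its previous center.
        updated = [mean_point(m) if m else centroids[j] for j, m in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, c)
               for c, members in zip(centroids, clusters) for p in members)

# Hypothetical 2-D data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

sse_by_k = {}
for k in range(1, 6):
    centroids, clusters = kmeans(points, k)
    sse_by_k[k] = sse(centroids, clusters)

for k, value in sse_by_k.items():
    print(k, round(value, 2))
```

Plotting `sse_by_k` against K and looking for the bend gives the elbow. A single run from random initial centers may settle in a local optimum, which is the sensitivity to initialization that motivates optimizing the initial centers.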

The Calinski–Harabasz (CH) index [48] is used to evaluate candidate numbers of clusters. It first computes the compactness, which is the sum of the squared distances between each point and the center of its own cluster, and then the degree of separation, which is the sum of the squared distances between the center of each cluster and the center of all the data, weighted by cluster size. Clustering is better when points are closer to the other points in their own cluster and farther from the points in other clusters. The CH index is the ratio of the separation to the compactness, scaled by the degrees of freedom (Equation (3)), so a larger CH index indicates better clustering performance.

CH(K) = \frac{SSB}{SSW} \times \frac{n - K}{K - 1}

(3)

where n is the number of data points, K is the number of clusters, SSB is the trace of the between-cluster dispersion matrix, and SSW is the trace of the within-cluster dispersion matrix. SSB and SSW are calculated as shown in Equations (4) to (7).

SSB = \mathrm{tr}(B_{K})

(4)

B_{K} = \sum_{q = 1}^{K} n_{q} (c_{q} - c_{E}) {(c_{q} - c_{E})}^{T}

(5)

SSW = \mathrm{tr}(W_{K})

(6)

W_{K} = \sum_{q = 1}^{K} \sum_{x \in C_{q}} (x - c_{q}) {(x - c_{q})}^{T}

(7)

where $C_{q}$ is the set of all data in class q, $c_{q}$ is the centroid of class q, $c_{E}$ is the centroid of all data, and $n_{q}$ is the total number of data points in class q.
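To make Equations (3) to (7) concrete, the sketch below computes the CH index directly from a given partition. It relies on the fact that the trace of each dispersion matrix equals a sum of squared Euclidean distances, so no matrices need to be built. The function names and the four sample points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def calinski_harabasz(clusters):
    """CH index for a partition given as a list of non-empty clusters."""
    k = len(clusters)
    all_points = [p for members in clusters for p in members]
    n = len(all_points)
    if k < 2 or n <= k or any(len(m) == 0 for m in clusters):
        raise ValueError("need at least 2 non-empty clusters and n > K")
    overall = mean_point(all_points)          # centroid of all data
    ssb = 0.0
    ssw = 0.0
    for members in clusters:
        center = mean_point(members)          # centroid of this cluster
        ssb += len(members) * squared_distance(center, overall)  # tr(B_K) term
        ssw += sum(squared_distance(x, center) for x in members)  # tr(W_K) term
    if ssw == 0:
        raise ValueError("within-cluster dispersion is zero")
    return (ssb / ssw) * (n - k) / (k - 1)

# Hypothetical partition: two clusters of two points each.
clusters = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
print(calinski_harabasz(clusters))  # 8.0
```

Here SSB = 2·4 + 2·4 = 16 and SSW = 4, so the index is (16/4) × (4 − 2)/(2 − 1) = 8.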

3.4.2. Determining Clustering Center

A notable shortcoming of the conventional k-means algorithm is its considerable sensitivity to the selection of initial clustering centers. The selection of different centers has the potential to yield significantly divergent clustering outcomes. The proposed solution to this problem is an innovative approach that employs the fractional-order firefly algorithm (FFA) to optimize the clustering center search process in the k-means algorithm. This fusion effectively mitigates the inherent defects of the k-means algorithm and improves its clustering performance.

The firefly algorithm mimics the flashing behavior that fireflies use to exchange information: flashes attract mates and potential prey and can also serve as a protective warning mechanism. The algorithm idealizes this behavior with the following three rules: (1) All fireflies are unisex, meaning one firefly can attract another regardless of gender. (2) Attractiveness is directly proportional to brightness; thus, a dimmer firefly will move towards a brighter one. (3) A firefly’s brightness is determined by the objective function.

In the firefly algorithm, brighter fireflies attract dimmer ones, and each firefly flies toward brighter fireflies in search of a better position. The brightness of a firefly determines the strength of its attraction to other fireflies. At the same time, the attraction between fireflies decreases with spatial distance: the farther apart two fireflies are, the weaker the attraction between them.

Let the position of each firefly be $X = (x_{1}, x_{2}, \dots, x_{D})$. The relative luminous intensity of a firefly is given by Equation (8).

I = I_{0} e^{-γ r_{ij}}

(8)

where $I_{0}$ is the maximum brightness of the firefly, that is, its own fluorescence brightness at $r = 0$; γ is the light absorption coefficient, which reflects the fact that the fluorescence gradually weakens with distance due to the transmission medium; and $r_{ij}$ is the distance between firefly i and firefly j.

The relative attraction between fireflies is given by Equation (9):

β (r) = β_{0} e^{-γ r_{ij}^{2}}

(9)

In Equation (9), $β_{0}$ is the initial attraction, that is, the attraction when the distance between two fireflies is 0. While the algorithm runs, each firefly moves toward every firefly that is brighter than itself, and its position is updated according to Equation (10).

X_{i} = X_{i} + β_{0} e^{-γ r_{ij}^{2}} (X_{j} - X_{i}) + α (\mathrm{rand} - 0.5)

(10)

where $X_{j}$ is the position of a firefly that is brighter than the ith individual, α is the step factor of the random perturbation, and rand is a random number drawn uniformly from [0, 1].

The basic process of the firefly algorithm is as follows: Compare the brightness of fireflies. Fireflies with lower brightness will be attracted by those with higher brightness and move towards them. Their positions are updated according to Equation (10), and this process is repeated in a loop until all individuals converge to the brightest firefly, indicating the optimal position [49]. The main principle of the Firefly Algorithm is shown in Algorithm 1.

Algorithm 1. Main Process of the Firefly Algorithm
Input	Objective function $f(x)$, initial population $x_{i} (i = 1, 2, \dots, n)$, light absorption coefficient $γ$, MaxGeneration
Output	Best solution $x_{best}$, optimized objective value $f(x_{best})$
1	Begin
2	Generate initial population of fireflies $x_{i} (i = 1, 2, \dots, n)$
3	Evaluate light intensity $I_{i}$ for each firefly using $f(x_{i})$
4	while $t < MaxGeneration$ do
5	for each firefly i from 1 to n do
6	for each firefly j from 1 to i do
7	if $I_{j} > I_{i}$ then
8	Move firefly i towards j in d-dimensional space
9	Update attractiveness using $\exp[-γ r]$
10	End if
11	End for
12	Evaluate new solutions and update $I_{i}$
13	End for
14	Rank fireflies and find current best $x_{best}$
15	Increment t
16	End while
17	Postprocess results and visualization
18	Return $x_{best}$ and $f(x_{best})$
19	End
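The loop in Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `firefly_minimize`, the parameter defaults, the clipping to a search box, and the choice to treat a lower objective value as a brighter firefly (minimization) are all assumptions. For simplicity, the sketch compares each firefly with every other firefly, not only with those of lower index.

```python
import numpy as np

def firefly_minimize(f, lower, upper, dim, n=20, max_gen=100,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize f over the box [lower, upper]^dim with a basic firefly loop."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n, dim))    # initial population
    cost = np.array([f(x) for x in X])              # lower cost = brighter
    for _ in range(max_gen):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:               # firefly j is brighter than i
                    r2 = np.sum((X[i] - X[j]) ** 2)             # squared distance
                    beta = beta0 * np.exp(-gamma * r2)          # Equation (9)
                    step = alpha * (rng.random(dim) - 0.5)      # random perturbation
                    # Equation (10), clipped to the search box (an assumption)
                    X[i] = np.clip(X[i] + beta * (X[j] - X[i]) + step, lower, upper)
                    cost[i] = f(X[i])
    best = int(np.argmin(cost))
    return X[best].copy(), float(cost[best])
```

A call such as `firefly_minimize(lambda x: float(np.sum(x ** 2)), -5.0, 5.0, dim=2)` returns the best position found and its objective value. Since the brightest firefly never moves, the best objective value cannot get worse from one generation to the next.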

Fractional-order calculus generalizes the classical theory of differentiation by introducing integrals and derivatives of real or complex order [38,50]. It dates back to the beginnings of calculus, but only in recent decades has it been found to play an important role in modeling physical, biological, and social phenomena [51]. A fractional-order term is introduced into the standard firefly position update by using the recent position memory of each firefly during the search, which helps the algorithm avoid becoming trapped in local optima. In this paper, the fractional-order derivative is defined through the Gamma function, which extends the factorial from integers to non-integer arguments and thereby extends the order of differentiation from integer to fractional. Equation (10) can be rewritten as Equation (11):

X_{i} (t + 1) - X_{i} (t) = β_{r_{ij}} (X_{j} - X_{i}) + α (\mathrm{rand} - 0.5)

(11)

Replacing the first-order difference on the left-hand side of Equation (11) with a fractional-order difference of order v and keeping only its first r = 4 terms gives the position update in Equation (12):

X_{i} (t + 1) = v X_{i} (t) + \frac{1}{2} v (1 - v) X_{i} (t - 1) + \frac{1}{6} v (1 - v) (2 - v) X_{i} (t - 2) + \frac{1}{24} v (1 - v) (2 - v) (3 - v) X_{i} (t - 3) + β_{r_{ij}} (X_{j} - X_{i}) + α (\mathrm{rand} - 0.5)

(12)
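The memory weights in the fractional-order update can be checked numerically. The sketch below assumes a fractional order 0 < v ≤ 1 and the standard truncated Grünwald–Letnikov expansion, in which all four memory terms enter with a positive sign. The helper names `gl_memory_weights` and `fractional_update` are hypothetical, and the attraction value `beta` is passed in directly; it is not recomputed from Equation (9).

```python
import math
import numpy as np

def gl_memory_weights(v, r=4):
    """Weights applied to X(t), X(t-1), ..., X(t-r+1) for order 0 < v <= 1."""
    weights = []
    for k in range(1, r + 1):
        # |v (v-1) ... (v-k+1)| / k!, written as a product so that the
        # Gamma function is never evaluated at a non-positive integer
        prod = 1.0
        for m in range(k):
            prod *= (v - m)
        weights.append(abs(prod) / math.factorial(k))
    return weights

def fractional_update(history, x_j, beta, alpha, v, rng):
    """history = [X_i(t), X_i(t-1), X_i(t-2), X_i(t-3)] as NumPy arrays."""
    w = gl_memory_weights(v, r=len(history))
    memory = sum(wk * xk for wk, xk in zip(w, history))    # fractional memory term
    x_i = history[0]
    noise = alpha * (rng.random(x_i.shape) - 0.5)          # random perturbation
    return memory + beta * (x_j - x_i) + noise
```

For v = 0.5 the weights are 0.5, 0.125, 0.0625, and 0.0390625, so older positions contribute progressively less. For v = 1 they reduce to 1, 0, 0, 0, which recovers the integer-order update of Equation (10).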

The main procedure of the fractional-order firefly k-means clustering algorithm (FFA-k-means) is shown in Algorithm 2. The algorithm is implemented in the following steps:

Step 1. Utilize the Elbow Method and the Calinski–Harabasz index to determine the optimal number of clusters, K.

Step 2. Implement the fractional-order firefly algorithm (FFA).

Step 3. Use the cluster centers obtained from the FFA as the initial centroids for the k-means algorithm, serving as the starting points for the clustering process.

Step 4. Compute the Euclidean distance between the remaining sample points and the cluster centers.

Step 5. Assign each sample to the cluster that is most similar to it based on the calculated Euclidean distances.

Step 6. Determine the average value for each cluster to establish the new center of that cluster.

Step 7. Repeat Steps 4, 5, and 6 until the cluster centers no longer change.

Algorithm 2. Main Process of FFA-k-Means
Input	Dataset $X = \{x_{1}, x_{2}, \dots, x_{n}\}$, maximum K value $K_{max}$, FFA parameters
Output	Final cluster centroids $C = \{c_{1}, c_{2}, \dots, c_{K}\}$, cluster assignments $L = \{l_{1}, l_{2}, \dots, l_{n}\}$
1	Begin
2	Determine Optimal K: Apply Elbow Method by calculating WCSS
3	for k = 1 to $K_{max}$:
4	a. Compute Calinski–Harabasz index for each k.
5	b. Select K with the best trade-off
6	End for
7	Fractional-Order FFA: Initialize firefly population with random positions
8	For each iteration:
9	a. Calculate brightness
10	b. Update firefly positions using fractional-order attraction rules.
11	c. Apply Lévy flight for exploration.
12	d. Return best solutions as initial centroids $C_{FFA} = \{c_{1}^{*}, c_{2}^{*}, \dots, c_{K}^{*}\}$.
13	End for
14	K-means Initialization: Set $C \leftarrow C_{FFA}$ as initial centroids for K-means.
15	For each sample $x_{i} \in X$:
16	a. Compute Euclidean distance $d(x_{i}, c_{j})$ to all centroids $c_{j} \in C$.
17	b. Assign label $l_{i} = \arg\min_{j} d(x_{i}, c_{j})$.
18	End for
19	Cluster Assignment: Update L with new labels from Step 17.
20	For each cluster j:
21	Recompute centroid $c_{j}$ as mean of all samples $x_{i}$ where $l_{i} = j$.
22	End for
23	If C did not change from previous iteration:
24	Return C and L.
25	Else
26	Repeat Steps 15–22.
27	End
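The k-means stage of the procedure (Steps 3 to 7) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kmeans_from_centroids` is hypothetical, and `init_centroids` stands in for the centers that the FFA stage is assumed to supply. Keeping the previous center when a cluster becomes empty is also an assumption, since the text does not discuss that case.

```python
import numpy as np

def kmeans_from_centroids(X, init_centroids, max_iter=100):
    """Run k-means starting from externally supplied initial centroids."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # Steps 4-5: Euclidean distance to every centroid, then nearest label
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 6: each new centroid is the mean of its assigned samples
        new_C = C.copy()
        for j in range(C.shape[0]):
            members = X[labels == j]
            if members.size:                # keep the old centroid if empty
                new_C[j] = members.mean(axis=0)
        # Step 7: stop once the centroids no longer change
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, labels
```

Passing the initial centers as an argument keeps the optimizer and the clustering loop separate, so the same loop can be started from random, PSO-based, GA-based, or FFA-based centers for comparison.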

3.4.3. Evaluation Methodology

The Davies–Bouldin Index (DBI) [52] is a metric used to evaluate the effectiveness of clustering algorithms. For each pair of clusters i and j, it compares the scatter within the two clusters with the distance between their centers. The DBI increases with intra-cluster scatter and decreases as the inter-cluster distance grows [53], so a lower DBI value indicates a more effective clustering outcome. The formula for the Davies–Bouldin index is provided in Equation (13).

DBI = \frac{1}{K} \sum_{i = 1}^{K} \max_{j \neq i} \left( \frac{S_{i} + S_{j}}{d_{i, j}} \right)

(13)

where $S_{i}$ is the measure of scatter within cluster i, calculated as in Equation (14):

S_{i} = \frac{1}{|C_{i}|} \sum_{x_{j} \in C_{i}} ‖x_{j} - V_{i}‖

(14)

where $x_{j}$ is an n-dimensional feature vector assigned to cluster i, $V_{i}$ is the center of cluster i, $C_{i}$ denotes cluster i, and $d_{i, j}$ is the Euclidean distance between the centers of clusters i and j, calculated as shown in Equation (15):

d_{i, j} = ‖V_{i} - V_{j}‖

(15)
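Equations (13) to (15) translate directly into code. The sketch below is illustrative only; the function name `davies_bouldin` is hypothetical, and the function assumes at least two clusters with distinct centers.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index following Equations (13)-(15)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # V_i: center of each cluster
    V = np.array([X[labels == c].mean(axis=0) for c in classes])
    # S_i: mean distance of the members of cluster i to its center, Equation (14)
    S = np.array([np.linalg.norm(X[labels == c] - V[i], axis=1).mean()
                  for i, c in enumerate(classes)])
    k = len(classes)
    total = 0.0
    for i in range(k):
        # (S_i + S_j) / d_ij for every other cluster j, with d_ij from Equation (15)
        ratios = [(S[i] + S[j]) / np.linalg.norm(V[i] - V[j])
                  for j in range(k) if j != i]
        total += max(ratios)                # worst-case neighbor of cluster i
    return total / k                        # Equation (13)
```

For two clusters whose members each lie at distance 1 from centers that are 10 apart, the index is (1 + 1) / 10 = 0.2; tighter or better separated clusters give smaller values.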

4. Experiment Result and Analysis

4.1. Optimal Number of Clusters

Using the Elbow Method and the Calinski–Harabasz index, the number of student types for the overall consumption model, the living diet model, and the average consumption model for breakfast, lunch, and dinner was determined to be 4, 3, and 3, respectively, as shown in Figure 8. Figure 8a,c,e show how the K value is determined from the Sum of Squared Errors (SSE), the key metric of the Elbow Method, whereas Figure 8b,d,f show how the K value is identified from the Calinski–Harabasz index. In these figures, the corresponding values are highlighted by red vertical lines.

4.2. Comparison

The three models were subjected to clustering analysis using a k-means algorithm optimized by a fractional-order firefly algorithm. The Davies–Bouldin Index (DBI) was employed to evaluate the clustering performance. Combining the previously established optimal k-values, this study compared the traditional k-means algorithm with PSO-k-means, GA-k-means, and FFA-k-means algorithms. The results of the experimental analysis are presented in Table 7 and Figure 9.

The findings indicate that the FFA-k-means algorithm achieves the lowest DBI score on all three models (Table 7) and therefore the best clustering performance among the compared algorithms, making it more effective at identifying latent groups of college students with similar consumption patterns.

4.3. Analysis of Clustering Results

Based on the improved k-means clustering algorithm outlined in this paper, a clustering analysis was performed on the overall consumption model, with the clustering centers displayed in Table 8.

The analysis reveals distinct consumption patterns associated with each category of students:

(1) Type 1: This group is relatively small and may represent an extreme consumption cohort.

(2) Type 2: Characterized by a high average monthly balance, total consumption, and consumption frequency, this group engages in numerous campus activities.

(3) Type 3: These individuals show a concentrated distribution with three notable characteristics: a low average monthly balance, minimal total consumption, and infrequent consumption patterns, indicating limited engagement in campus activities.

(4) Type 4: This group is more dispersed, with consumption levels that fall within an intermediate range.

Table 9 presents the results of the clustering centers for the living diet model derived from the improved k-means clustering algorithm.

(1) Type 1 students tend to have lower living consumption and lower other consumption. They usually prefer to eat off campus.

For such students, university administrators should prioritize food safety and personal health. One recommended approach is for universities to collaborate with off-campus food regulatory authorities to create a “whitelist for off-campus dining.” This initiative would involve auditing the hygiene standards of nearby restaurants and distributing safety guidelines for off-campus dining to students, thereby reducing food safety risks.

(2) Type 2 students are characterized by high living consumption but low other consumption and a small card surplus, indicating a lower overall consumption level. These students often eat on campus, and their consumption patterns are stable, suggesting that their families may face economic challenges and that they practice frugality.

In this situation, universities should provide extra support and resources for students from low-income backgrounds and pay closer attention to their living conditions. For instance, universities could establish “dedicated service windows for economically disadvantaged students” in campus dining facilities, offering low-cost, nutritious meal packages. Consumption data from campus cards can be used to identify these students automatically and to deliver subsidies discreetly. Universities can also work with enterprises that donate daily necessities, which are then distributed through a point-based redemption system to avoid the psychological pressure caused by direct financial assistance.

(3) Type 3 students have the highest other expenses and the largest card surpluses, along with elevated living consumption, which indicates that they belong to a high-consumption group. Universities should help these students understand the harm of irrational consumption and provide systematic guidance on rational consumption.

Based on the improved k-means clustering algorithm in this paper, the average consumption patterns for breakfast, lunch, and dinner were clustered and analyzed, and the students were classified into three categories. According to Table 10, the characteristics of students in each category are as follows:

(1) Students classified as Type 1 and Type 2 exhibit infrequent dining habits on campus. Type 1 students display a significantly elevated average consumption per meal during lunch, while Type 2 students report the highest average consumption per meal during dinner. It is recommended to set up special food windows in the campus canteen and update the menu regularly to attract students to dine on campus.

(2) Type 3 students demonstrate lower average expenditure per meal for breakfast, lunch, and dinner while maintaining a higher monthly dining frequency. This suggests that they tend to spend less on individual meals but dine on campus more often, reflecting frugality that may indicate economic pressures on their families. Therefore, when allocating resources for economically disadvantaged students, prioritizing this group is advised. In addition to regular scholarships, recommendations for on-campus work-study positions can be provided to ease their financial burdens effectively.

(3) Type 4 students maintain relatively stable average consumption per meal for breakfast, lunch, and dinner, as well as consistent meal frequency.

5. Conclusions

This paper proposes the FFA-k-means algorithm, which combines the k-means algorithm with the fractional-order firefly algorithm, in order to analyze the characteristics of students’ consumption behavior. Time-series data were first collected and preprocessed to extract relevant features. Three models were then constructed to provide a preliminary analysis of students’ spending power and behavioral patterns. The clustering results obtained with the improved k-means algorithm can give campus management useful information about the consumption characteristics of different types of students. For instance, educational institutions may consider offering students reduced catering costs in exchange for work-study participation, and may also give them priority in the selection of scholarships. It is also important to closely monitor the health and safety of students who show markedly low food and beverage consumption. Future research will refine this study by integrating multidimensional data sources (including, but not limited to, travel records, self-service laundry usage data, and bathing records) to gain a more comprehensive understanding of students’ consumption behavior. The psychological factors affecting students with low consumption patterns will also be explored in greater depth by incorporating psychological research findings. In addition, the correlation between students’ consumption characteristics and their academic performance, as well as their eventual field of employment, will be examined.

Author Contributions

Conceptualization, X.M.; methodology, X.M.; software, Q.H.; validation, X.M. and Q.H.; formal analysis, X.M.; investigation, Q.H.; resources, Y.D.; data curation, Y.D.; writing—original draft preparation, X.M.; writing—review and editing, H.S.; visualization, X.M. and Q.H.; supervision, H.S.; project administration, Y.D.; funding acquisition, Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China University Industry-University Research Innovation Fund New Generation Information Technology Innovation Project (no. 2020ITA05017); the Science and Technology Research Project of Jilin Provincial Education Department (nos. JJKH20210456KJ, JJKH20210457KJ); and the Science and Natural Science Foundation of Jilin Province (no. 20210101176JC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The release of original data containing detailed behavioral records is restricted to safeguard individual privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Nie, M.; Yang, L.; Sun, J. Advanced forecasting of career choices for college students based on campus big data. Front. Comput. Sci. 2018, 12, 494–503.
Fan, S.; Li, P.; Liu, T.; Chen, Y. Population behavior analysis of chinese university students via digital campus cards. In Proceedings of the IEEE International Conference on Data Mining Workshops (ICDMW), Atlantic, NJ, USA, 14–17 November 2015; pp. 72–77.
Jiang, T.; Cao, J.; Su, D. Analysis and data mining of students’ consumption behavior based on a campus card system. In Proceedings of the 2017 International Conference on Smart City and Systems Engineering, Washington, DC, USA, 11–12 November 2017; pp. 58–60.
Dong, X.; Hu, Y.; Chen, Y. Research and analysis of college students’ behavior portrait based on campus data. Comput. Digit. Eng. 2018, 46, 1200–1204.
Zheng, C.; Lili, Q.; Vip, P. Neural network model of precise financial support for poor students in colleges and universities. Pract. Underst. Math. 2018, 48, 85–91.
Jiang, X.; Xu, T.; Dong, X. Campus data analysis based on positive and negative sequential patterns. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1959016.
Yang, C.Y.; Liu, J.Y.; Huang, S. Research on early warning system of college students’ behavior based on big data environment. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 42, 659–665.
Quintiliani, L.M.; Allen, J.; Marino, M.; Kelly-Weeder, S.; Li, Y. Multiple health behavior clusters among female college students. Patient Educ. Couns. 2010, 79, 134–137.
Head, M.; Ziolkowski, N. Understanding student attitudes of mobile phone features: Rethinking adoption through conjoint, cluster and SEM analyses. Comput. Hum. Behav. 2012, 28, 2331–2339.
Patton, G.; Bond, L.; Carlin, J.B.; Thomas, L.; Butler, H. Promoting Social Inclusion at colleges: A Group-Randomized Trial of Effects on Student Health Risk Behavior and Well-Being. Am. J. Public Health 2006, 96, 1582–1587.
Yao, H.; Lian, D.; Cao, Y.; Wu, Y.; Zhou, T. Predicting academic performance for college students: A campus behavior perspective. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–21.
Miller, C.H.; Sacchet, M.D.; Gotlib, I.H. Support vector machines and affective science. Emot. Rev. 2020, 12, 297–308.
Kim, S.; Kim, C. Influence diagnostics in support vector machines. J. Korean Stat. Soc. 2020, 49, 757–778.
Cook, J.A.; Siddiqui, S. Random forests and selected samples. Bull. Econ. Res. 2020, 72, 272–287.
Ma, C.; Wang, B.; Jooste, K.; Zhang, Z.; Ping, Y. Practical privacy-preserving frequent itemset mining on supermarket transactions. IEEE Syst. J. 2020, 14, 1992–2002.
Laohakiat, S.; Saing, V. An incremental density-based clustering. Inf. Sci. 2021, 547, 404–426.
Meng, W.; Zhang, P.; Qingguo, F.; Fei, G. Modified micro-mechanics based multiscale model for progressive failure prediction of 2D twill woven composites. Chin. J. Aeronaut. 2020, 33, 2070–2087.
Mirzaei, A.; Rahmati, M. A Novel Hierarchical-Clustering-Combination Scheme Based on Fuzzy-Similarity Relations. IEEE Trans. Fuzzy Syst. 2009, 18, 27–39.
Sabo, K.; Scitovski, R. An approach to cluster separability in a partition. Inf. Sci. 2015, 305, 208–218.
Hassan, B.A.; Rashid, T.A.; Hamarashid, H.K. A Novel Cluster Detection of COVID-19 Patients and Medical Disease Conditions Using Improved Evolutionary Clustering Algorithm Star. Comput. Biol. Med. 2021, 138, 104866.
Trabelsi, I.; Hérault, R.; Baillet, H.; Thouvarecq, R.; Seifert, L.; Gasso, G. Identifying patterns in trunk/head/elbow changes of riders and non-riders: A cluster analysis approach. Comput. Biol. Med. 2022, 143, 105193.
Chowdhury, H.A.; Bhattacharyya, D.K.; Kalita, J.K. UICPC: Centrality-based Clustering for scRNA-seq Data Analysis without User Input. Comput. Biol. Med. 2021, 137, 104820.
Berahmand, K.; Nasiri, E.; Mohammadiani, R.P.; Li, Y. Spectral clustering on protein-protein interaction networks via constructing affinity matrix using attributed graph embedding. Comput. Biol. Med. 2021, 138, 104933.
Chen, H.; Das, S.; Morgan, J.M.; Maharatna, K. Prediction and classification of ventricular arrhythmia based on phase-space reconstruction and fuzzy c-means clustering. Comput. Biol. Med. 2022, 142, 105180.
Saha, J.; Mukherjee, J. Cluster number assisted K-means. Pattern Recognit. 2021, 110, 107625.
Zhang, Z.; Feng, Q.; Huang, J.; Guo, Y.; Xu, J.; Wang, J. A local search algorithm for k-means with outliers. Neurocomputing 2021, 450, 230–241.
Rezaee, M.J.; Eshkevari, M.; Saberi, M.; Hussain, O. GBK-means clustering algorithm: An improvement to the K-means algorithm based on the bargaining game. Knowl. Based Syst. 2021, 213, 106672.
Peng, K.; Leung, V. Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data. IEEE Access 2018, 6, 11897–11906.
Feng, Y.; Deb, S.; Wang, G.G.; Alavi, A.H. Monarch Butterfly Optimization: A Comprehensive Review. Expert Syst. Appl. 2020, 168, 114418.
Li, S.; Chen, H.; Wang, M.; Heidari, A.A.; Mirjalili, S. Slime mould algorithm: A new method for stochastic optimization. Future Gener. Comput. Syst. 2020, 111, 300–323.
Wang, G.-G. Moth search algorithm: A bio-inspired metaheuristic algorithm for global optimization problems. Memetic Comput. 2018, 10, 151–164.
Feng, Y.-H.; Wang, G.-G. Binary moth search algorithm for discounted 0–1 knapsack problem. IEEE Access 2018, 6, 10708–10719.
Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
Yang, X.S. Nature-Inspired Metaheuristic Algorithms; Luniver Press: Bristol, UK, 2008; pp. 83–96.
Wang, Y.; Zhou, H. Hybridized firefly algorithm based RFID network multi-objective planning. Appl. Res. Comput. 2018, 35, 3003–3006.
Zhang, C.; Zhang, H.; Yan, Y.; Su, N. Remote sensing image fusion based on adaptive pulse coupled neural network (PCNN) in firefly optimization. J. Harbin Eng. Univ. 2019, 40, 501–508.
Long, W.; Cai, S.H.; Jiao, J.; Chen, Y.; Huang, Y. Firefly algorithm for solving constrained optimization problems and engineering applications. J. Cent. South Univ. (Sci. Technol.) 2015, 46, 1260–1267.
Lin, M.; Liu, F.; Tong, X. Fuzzy clustering algorithm based on firefly algorithm. Comput. Eng. Appl. 2014, 50, 35–38.
Zhou, L.; Wang, Y. Firefly algorithm-based insulation fit for extra-high voltage DC transmission lines. J. Xiangtan Univ. (Nat. Sci. Ed.) 2021, 43, 70–78.
Wang, Y.S.; Zhu, Y.X.; Zhao, C.Y.; Zhang, X.W.; Wei, X.X. Research on intersection traffic light timing optimization based on firefly algorithm. J. Wuhan Univ. Technol. 2021, 45, 699–703.
Edelmann, D.; Móri, T.F.; Székely, G.J. On relationships between the Pearson and the distance correlation coefficients. Stat. Probab. Lett. 2021, 169, 108960.
Liu, Y.; Mu, Y.; Chen, K.; Li, Y.; Guo, J. Daily activity feature selection in smart homes based on pearson correlation coefficient. Neural Process. Lett. 2020, 51, 1771–1787.
Pu, X.C.; Huang, J.L.; Qi, N.; Song, C.S. Application of K-Means algorithm based on density information entropy in customer segmentation. J. Jilin Univ. (Sci. Ed.) 2021, 59, 1246–1251.
Ling, Z. Digital cluster user classification method based on improved RFM model. Appl. Res. Comput. 2020, 37, 2821–2826.
Li, S.; Man, Z. K-means clustering algorithm with adaptive feature weights. Comput. Technol. Dev. 2013, 23, 98–105.
Chen, I.; Bojin, T.; Yunzhu, P. Joint elbow method and expectation maximization of Gaussian hybrid clustering power system customer binning algorithm. Comput. Appl. 2020, 40, 123–129.
Thorndike, R.L. Who belongs in the family. Psychometrika 1953, 18, 267–276.
Liu, F.; Deng, Y. Determine the number of unknown targets in Open World based on Elbow method. IEEE Trans. Fuzzy Syst. 2020, 29, 986–995.
Crozier, S.N.; Falconer, D.D. Least sum of squared errors. IEE Proc. F-Radar Signal Process. 1991, 138, 371–378.
Ramazan, U.; Petros, X. Estimating the number of clusters in a dataset via consensus clustering. Expert Syst. Appl. 2019, 125, 33–39.
Ortigueira, M.D. Fractional-order Calculus for Scientists and Engineers; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2011.
Machado, J.A.T.; Mainardi, F. Fractional-order calculus: Quo vadimus? (where are we going?). Fract. Calc. Appl. Anal. 2015, 18, 495–526.
Lopes, A.M.; Machado, J.T. Application of fractional-order techniques in the analysis of forest fires. Int. J. Nonlinear Sci. Numer. Simul. 2016, 17, 381–390.

Figure 1. Student behavior analysis goals.

Figure 2. Student behavior analysis framework diagram.

Figure 3. Flow chart of analysis process.

Figure 4. Student personal big data analysis model.

Figure 5. Pie chart of the proportion of dining locations in the meals.

Figure 6. The curve of dining hours in the cafeteria on working days.

Figure 7. Non-working day cafeteria dining hours curve.

Figure 8. Folding line of SSE-K, Calinski–Harabasz-K relationship.

Figure 9. Comparison curve of clustering results.

Table 1. Operating hours for each meal.

Meal	Operating Hours
Breakfast	06:00–09:00
Lunch	11:00–13:00
Dinner	17:00–20:00

Table 2. Meal times in each canteen in meals.

Meal	First	Second	Third	Fourth	Fifth
Breakfast	11,285	33,075	534	124	20,594
Lunch	11,232	20,336	18,663	27,879	22,824
Dinner	8703	17,025	13,370	22,096	16,448

Table 3. Pearson correlation coefficients for each meal.

Meal	Pearson’s Linear Correlation Coefficient
Breakfast	−0.903313
Lunch	−0.715571
Dinner	−0.346169

Table 4. Meaning of DFM model indicators.

Indicator	Indicator Representativeness	Indicator Implication
D (Deposit)	Average monthly balance	consumption base
F (Frequency)	Frequency of consumption	consumption enthusiasm
M (Monetary)	The amount spent during the month	consumption power

Table 5. Meaning of model indicators of living diet.

Indicator	Implication
Life consumption	Canteen consumption
Other expense	Consumption other than living (canteen) consumption
Surplus	The amount deposited into the campus card

Table 6. Meaning of indicators of the average consumption model for meals.

Indicator	Implication
Morning data	The average amount spent per breakfast meal
Noon data	The average amount spent per meal at lunch
Dinner data	The average amount spent per meal at dinner
Month count	Number of meals per month

Table 7. Comparison of Davies–Bouldin Index scores.

Model	k-Means	PSO-k-Means	GA-k-Means	FFA-k-Means
DFM	0.89859	0.53351	0.56867	0.53165
Living	0.78934	0.38183	0.24605	0.16071
Meals	0.84562	0.46144	0.60607	0.44335

Table 8. Clustering results of DFM model.

Number	Deposit	Frequency	Monetary
1	194.84	22.00	2331.90
2	97.75	77.52	699.47
3	56.08	24.91	137.48
4	78.06	57.57	360.70

Table 9. Clustering results of the living model.

Number	Life Consumption	Other Expense	Surplus
1	99.21	8.51	10.30
2	500.86	42.56	41.65
3	622.72	227.71	189.54

Table 10. Clustering results of the meals.

Number	Morning Data	Noon Data	Dinner Data	Month Count
1	1.83	93.50	6.00	26.00
2	6.98	4.67	138.33	19.00
3	3.13	7.18	6.88	66.27
4	3.71	8.09	7.58	32.03

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
