Energy Consumption Patterns and Characteristics of College Dormitory Buildings Based on Unsupervised Data Mining Method

Abstract: The college building is a large energy consumer with a high density of energy consumption. However, little attention has been paid to college buildings, particularly college dormitory buildings. Based on one year of historical data collected from 20 college dormitory buildings located in Wuhan, China, this study proposes a three-stage strategy to identify and analyze the energy consumption patterns and characteristics of college dormitories in detail, including determining energy consumption patterns, analyzing key characteristics based on four indexes, and examining three influencing factors (occupants’ gender and the floor and orientation location of rooms). The results show that the heavy energy users (around 10% of all occupants) consume around 20% of the total energy and have the narrowest comfort temperature range. However, the light energy users, 42% of total occupants, consume only approximately 27% of total energy. Their different tolerance to coldness is the main reason for the different energy consumption. Male dormitories and rooms located on the top floor or at a corner tend to consume significantly more energy in hot weather. This study would help campus facilities to understand the energy use behavior of occupants and formulate adequate policies so as to improve the energy management of campuses.


Introduction
The building sector is one of the major energy-consuming sectors, responsible for over one-third of global final energy consumption and nearly 40% of total direct and indirect CO2 emissions [1]. As the world's largest energy consumer, China accounts for 21% of the world's total final energy consumption [2]. In particular, China's buildings in operation account for approximately 22% of domestic total energy consumption and 20% of domestic total CO2 emissions [3]. China aims to reach its CO2 emissions peak (an inflection point) before 2030 and achieve carbon neutrality (net-zero carbon dioxide emissions) before 2060 [4]. The goal of sustainable development and the global energy crisis have made building energy-related work and research increasingly important.
Buildings 2023, 13, 666
The energy conservation of college buildings has received much attention recently. One reason is that educational levels have improved continuously and universities have kept expanding: over 44 million students were enrolled in higher education institutions in China in 2021 [5]. Another reason is that research indicates that the consumption per student is four times the national average [6], and the energy consumption density of university dormitories is 5-10 times higher than that of ordinary residential buildings [7]. Gui et al. [8] analyzed data from 122 buildings at Griffith University and concluded that teaching buildings have the highest energy consumption (over 50% of the total), whereas research buildings have the highest energy utilization intensity, over three times that of teaching buildings. Wadud et al. [9] examined 144 universities in the UK and indicated that research universities were more energy intensive, especially in the science and medical areas. Ge et al. [10] investigated laboratories, teachers' offices, and students' offices of a university in Hangzhou, China. They pointed out that occupants' behaviors varied between the different functional types of rooms in research buildings. Room temperature setback contributed to a sharp decrease in cooling load in the summer but had limited influence in winter. In a case study of a teaching building at Seoul National University, Korea, Song et al. [11] found that optimizing the course timetable in terms of energy use contributed to 4% energy saving during air-conditioning seasons (summer and winter). Furthermore, relaxing the hard constraints helped to achieve up to 5% energy saving during the space cooling or heating periods. Wang [12] drew attention to schools in Taiwan, from elementary schools to universities, emphasizing that universities consumed much more energy than other schools, with electricity as the main energy source. Research-based universities consumed more energy than coursework-based ones did. Sekki et al. [13] analyzed daycare centers, schools, and university buildings located in southern Finland. They reported that the variation in energy consumption and primary energy consumption between the buildings was high when the age and type of buildings were not considered. Escobedo et al. [14] considered the buildings of the National Autonomous University of Mexico and found that the main energy uses were lighting (28%) and refrigeration (14%). Vosoughkhosravi et al. investigated several residential college buildings and indicated that the LEED (Leadership in Energy and Environmental Design)-certified buildings had more energy consumption than other non-LEED-certified buildings [15]. In a case study of Tianjin University in China, Yin [16] highlighted that the total energy saving of the campus (14.9%) mainly resulted from dormitory buildings (6.49%).
With the advance of data collection tools and devices such as smart meters, the huge amount of available data makes data mining techniques more important than ever before. As a method of data mining, clustering is the process of grouping physical or abstract objects into classes of similar objects, also called unsupervised classification [17]. Through clustering, energy consumers with similar energy consumption patterns are partitioned into the same group. Westermann et al. [18] compared different combinations of clustering algorithm (k-means clustering and agglomerative hierarchical clustering) and distance metric (Euclidean distance and dynamic time warping) and developed a smart-meter-based strategy to automatically retrieve thermal building characteristics (heating system type and building type), which was essential to enable a large-scale, accurate building retrofit analysis. Li et al. [19] presented a clustering-based strategy with two levels of clustering to identify typical daily electricity usage profiles of multiple buildings. The intra-building clustering used Gaussian-mixture-model-based clustering to identify the profiles of each individual building. The inter-building clustering used agglomerative hierarchical clustering to identify the profiles of multiple buildings. The strategy could discover useful information related to building electricity usage, including typical patterns and periodical variation in daily electricity usage. McLoughlin et al. [20] evaluated three of the most widely used unsupervised clustering methods, k-means, k-medoid, and self-organizing maps (SOM), against a Davies-Bouldin validity index for segmenting the data into disparate patterns of electricity use within homes. A multi-nominal logistic regression was then used to link profile classes to dwelling, occupant, and appliance characteristics. As a result, it was possible to classify customers and the manner in which they used electricity based on their individual characteristics and without prior knowledge of household electricity consumption. Liu et al. [21] proposed a general data-mining-based framework for mining real-time building electricity consumption data. They used the density-based spatial clustering of applications with noise (DBSCAN) algorithm to detect outliers, and grouped similar daily electricity load profiles by means of the k-means algorithm to extract typical electricity load patterns. A classification and regression tree (CART) algorithm was employed to discover insightful knowledge on typical electricity load patterns and to improve the interpretability of clustering results. Gianniou et al. [22] used the k-means algorithm to segment heating consumption intensity and load pattern groups. The results showed that a constant load profile could represent most district heating customers, and that calendar context affected load patterns and consumer behaviors. Customers with high energy consumption had lower variability. Wang et al. [23] proposed new methods based on three pattern features to cluster district heating users, including extracting representative pattern features from heat usage record data and clustering district heating users. They found that the ambient temperature had a strong impact on the heat demand of almost all users, and that it could conceal the influence of individual users' behaviors. The results could reveal typical daily consumption patterns once the consumption linearly related to ambient temperature was removed.
Influencing factors of building energy consumption can be divided into two categories: technical and physical (objective) factors and human-influenced (subjective) factors [24]. Meteorological parameters have strong effects on building energy consumption [25]. The building envelope, shape factor, floor, orientation, etc., have a joint effect on energy consumption [26][27][28]. Occupant behavior has a huge impact on building energy consumption, with a gap between predicted and actual consumption of up to 300% [29]. Although gender is a human-related factor, it is treated as an objective factor in this study because it cannot be changed by human behavior, and it has a significant impact on energy consumption in both the cooling season and the heating season [30,31].
The above literature review shows that a number of studies have investigated building energy consumption. However, several gaps still exist in the current research. Numerous studies pay attention to commercial or residential buildings, whereas much less research focuses on college buildings, particularly college dormitory buildings, which contribute to the large energy consumption of campuses. In fact, the characteristics of dormitory building energy consumption differ considerably from those of ordinary residential buildings due to differences in occupant type, occupancy duration, appliances involved, and so on. In addition, statistical analysis, particularly averaging or aggregation, may lose some characteristic information, while data mining can effectively discover and extract potential patterns and characteristics from actual monitored data, which could help managers to know their occupants well and yield further insights for campus energy conservation.
This paper aims to propose a three-stage strategy to identify and analyze the energy consumption patterns and characteristics of college dormitories, which can help campus facilities to understand the energy use behavior of occupants, optimize building operations, and formulate adequate policies to improve the energy efficiency of the campus. Firstly, the k-means clustering algorithm was run with two distance metrics and five validation indexes to adequately determine energy consumption patterns. Key energy consumption characteristics were then quantitatively analyzed through four indexes: the energy signature, energy flow, energy performance, and comfort temperature range. Finally, multi-nominal logistic regression was carried out to identify the relevance of the occupants' gender, floor location, and orientation of rooms to the energy consumption pattern.
The remainder of this paper is organized as follows. The methods of the three stages (clustering, energy consumption characteristics, and influencing factor characteristics) are introduced in Section 2. The room information and monitored data are then presented in Section 3. The results of the three-stage analyses are provided in Section 4. Some additional statements are discussed in Section 5. The conclusions are summarized in Section 6.

Methodology
The methods for data analysis are introduced in this section, as shown in Figure 1.

Data Cleaning
Data cleaning refers to finding, removing, and replacing bad data (outliers), and is one of the most important procedures in data mining. The quality of data has a significant influence on the final result [33]; hence, we use several methods to perform a detailed data cleaning. In this study, outliers mainly include missing data, always-zero data, and abnormal data. Missing data may be due to the uncertainty of electricity meters. Taking always-zero data into consideration would interfere with the analysis, as some dormitories may be unoccupied on some days. In order to capture energy consumption characteristics better, we set the unoccupied threshold to 0.1 kWh rather than the usual 0 kWh.
The third type of outlier is abnormal data, which has two parts: abnormal data of a certain room on some day and abnormal data of all rooms on a certain day. For the first condition, we apply the distance-based outlier detection algorithm proposed by E.M. Knorr et al. [34] and improved by S. Ramaswamy et al. [35] and S.D. Bay et al. [36]. An object O in a dataset T is a DB(p, D)-outlier if at least a fraction p of the objects in T lies at a distance greater than D from O [34]. In other words, an object O is an outlier if it has fewer than M neighbors within distance D, where M = n × (1−p) and n is the number of data objects. In this research, we tried different parameter combinations and finally set M to 1 and D to 4. For the second condition, the energy data of all rooms may be wrong on some days due to defects of the electricity meters or management systems. The daily minimum consumption is used to detect whether all data on a certain day are wrong. As shown in Figure 2, 95.49% of the daily minimum consumption values are below 0.5 kWh. Therefore, we set the threshold to 0.5 kWh, which means that the data of a certain day are considered valid if the minimum consumption on that day is smaller than 0.5 kWh.
The above cleaning methods identify abnormal energy consumption values of each room. However, a room with too many outliers leads to an inaccurate analysis; therefore, such rooms need to be removed from the dataset. The number of outliers of each room (each room has 266 data points in total) is shown in Figure 3, where a box plot is used to identify the outlier number threshold. Rooms with more than 44 outliers (beyond 1.5 box lengths) are considered abnormal and removed from the dataset. In total, 434 dormitories are taken into consideration in the following analysis.
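The neighbor-count form of the DB(p, D) rule can be illustrated with a minimal Python sketch. It assumes, since the text does not specify it, that the distance is the absolute difference between daily consumption values of one room in kWh; the function name `db_outliers` and the sample readings are hypothetical.

```python
def db_outliers(values, D=4.0, M=1):
    """Return indices of readings with fewer than M neighbours within distance D.

    Assumption: distance is the absolute difference between two daily
    consumption values (kWh); the paper does not state the feature space.
    """
    flagged = []
    for i, v in enumerate(values):
        neighbours = sum(
            1 for j, w in enumerate(values) if j != i and abs(v - w) <= D
        )
        if neighbours < M:
            flagged.append(i)
    return flagged


# Hypothetical daily consumption (kWh) of one room; 25.0 is an isolated spike.
daily_kwh = [2.1, 2.4, 3.0, 2.8, 25.0, 3.3, 2.9]
outlier_idx = db_outliers(daily_kwh, D=4.0, M=1)
print(outlier_idx)
```

With M = 1, a reading is flagged only when no other reading lies within D of it, which matches the parameter choice quoted above.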

Data Filling
For the 434 available dormitories, we removed the three types of outliers. The resulting missing values are filled through a polynomial-regression-based missing value filling algorithm. In MATLAB R2019b, polyfit [37] uses x to form the Vandermonde matrix V with n + 1 columns and m = length(x) rows, resulting in the linear system in Equation (1), which polyfit solves with p = V\y.
$$\begin{pmatrix} x_1^n & x_1^{n-1} & \cdots & 1 \\ x_2^n & x_2^{n-1} & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ x_m^n & x_m^{n-1} & \cdots & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_{n+1} \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \tag{1}$$
Since the columns of the Vandermonde matrix are powers of the vector x, the condition number of V is often large for high-order fits, resulting in a singular coefficient matrix. In those cases, centering and scaling can improve the numerical properties of the system to produce a more reliable fit.
Only the available data of each room are used to obtain a polynomial p(x) of degree n, as shown in Equation (2) [37], which is the best fit in a least-squares sense for the data in y.
$$p(x) = p_1 x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1} \tag{2}$$
After several trial fits, we found that degree 4 is the best option: degree 3 underfits in the middle of the plot and degree 5 overfits at the tail. The result of pre-processing the consumption data of typical rooms is shown in Figure 4.
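A minimal pure-Python sketch of such a degree-4 fill is given below. It is not the MATLAB routine: it solves the normal equations after centering and scaling x rather than using `V\y`, and it assumes (the text does not say) that x is the outdoor air temperature. All names and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_poly(xs, ys, degree=4):
    """Least-squares polynomial fit on centred and scaled x; returns a predictor."""
    mu, sigma = mean(xs), pstdev(xs)
    zs = [(x - mu) / sigma for x in xs]
    V = [[z ** k for k in range(degree + 1)] for z in zs]  # Vandermonde rows
    A = [[sum(r[i] * r[j] for r in V) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [sum(r[i] * y for r, y in zip(V, ys)) for i in range(degree + 1)]
    coeffs = solve(A, b)  # normal equations V^T V c = V^T y
    return lambda x: sum(c * ((x - mu) / sigma) ** k for k, c in enumerate(coeffs))


def fill_missing(xs, ys, degree=4):
    """Replace None entries in ys with values predicted from the known points."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    predict = fit_poly([x for x, _ in known], [y for _, y in known], degree)
    return [y if y is not None else predict(x) for x, y in zip(xs, ys)]


# Hypothetical data: a U-shaped consumption curve with one missing day.
temps = [0, 4, 8, 12, 16, 20, 24, 28, 30]
true_kwh = [0.05 * (t - 16) ** 2 + 1 for t in temps]
observed = list(true_kwh)
observed[3] = None
filled = fill_missing(temps, observed)
print(round(filled[3], 3))
```

Centering and scaling x before building the Vandermonde rows mirrors the conditioning remark above; a production version would more likely rely on a QR-based solver.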

Data Integration
The daily energy consumption data of each dormitory are integrated and tagged with the date, outdoor air temperature, occupants' gender, and the dormitory's floor and orientation location. Ordered by time or by temperature, the daily energy consumption data form a time series or a temperature series, respectively. In this study, data normalization or standardization, such as min-max or z-score scaling [38], is not applied to the energy consumption data and temperature data. One reason is that the raw energy consumption data only have a one-dimensional parameter in the energy signature, which is outdoor air temperature. Another reason is that the data scales of energy consumption and outdoor air temperature are similar. Using raw data rather than normalized or standardized data reflects the real circumstances in the following analysis.

K-Means clustering
K-means clustering [39] is a widely used and well-known unsupervised learning method for cluster analysis in data mining because of its high efficiency and simplicity. It aims to partition data objects $X_p \in X$, $X = \{X_1, X_2, \ldots, X_p, \ldots, X_n\}$, into k clusters $C_q \in C$, $C = \{C_1, C_2, \ldots, C_q, \ldots, C_k\}$, such that more similar data objects are in the same cluster, by minimizing the within-cluster sum of squared distances. The algorithm is described as follows:
(1) Randomly select k data objects $X_p$ (k, the number of clusters, is determined in advance) as the initial cluster centroids $A_q \in A$, $A = \{A_1, A_2, \ldots, A_q, \ldots, A_k\}$.
(2) Assign each data object $X_p$ to the closest cluster $C_q$ by calculating the distance between the object $X_p$ and the cluster centroids $A_q$.
(3) Update the centroid $A_q$ of each cluster as the mean of the data objects assigned to it.
(4) Iterate steps (2) and (3) until the cluster centroids $A_q$ no longer change.
The limitations [40,41] of the k-means clustering algorithm mainly include that (1) it needs an outlier detection algorithm to deal with outliers, (2) it handles non-spherical distributions and data of different densities poorly, and (3) clustering results vary between runs due to the random initialization of centroids. Because random initialization of the cluster centroids leads to different clustering results after each run, we fixed the initialization of the cluster centroids as 1 to k in order to compare the performance of different approaches.
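The four steps, together with a fixed initialization, can be sketched in plain Python as follows. The sketch reads "1 to k" as "the first k data objects become the initial centroids", which is an assumption about the authors' setting; the function names and the toy points are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans(points, k, max_iter=100):
    """Plain k-means with deterministic initialisation (first k objects)."""
    centroids = [list(p) for p in points[:k]]  # assumed reading of "1 to k"
    labels = None
    for _ in range(max_iter):
        # Step (2): assign every object to its closest centroid.
        new_labels = [
            min(range(k), key=lambda q: euclidean(p, centroids[q]))
            for p in points
        ]
        if new_labels == labels:  # Step (4): stop when nothing changes.
            break
        labels = new_labels
        # Step (3): move each centroid to the mean of its members.
        for q in range(k):
            members = [p for p, lab in zip(points, labels) if lab == q]
            if members:
                centroids[q] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-dimensional toy data with two obvious groups.
points = [[1.0, 1.2], [8.0, 8.1], [1.1, 0.9], [7.9, 8.3], [0.9, 1.0], [8.2, 7.9]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because the initialization is fixed, repeated runs give identical labels, which is what makes different distance metrics and values of k comparable.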

Distance Metrics
The Euclidean distance [17] is the most well-known and commonly used distance metric in clustering algorithms, defined as Equation (3).
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2} \tag{3}$$
where $X = \{X_1, X_2, \ldots, X_n\}$ and $Y = \{Y_1, Y_2, \ldots, Y_n\}$ are n-dimensional vectors. Dynamic time warping (DTW) [42] aims to find the minimum distance (usually based on the Euclidean distance) between two sequences, which improves alignment based on shape [43]. It calculates the pairwise distances between the points of the two sequences and accumulates them along the shortest warping path to obtain the minimum summed distance. Hence, some points in one sequence are matched to the same point in the other sequence, as shown in Figure 5.
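The difference between the two metrics can be seen in a short sketch that uses the textbook dynamic-programming form of DTW with the absolute difference as the local distance (an assumption; the paper does not list its DTW settings). The two toy sequences are hypothetical and have the same shape shifted by one step.

```python
def euclidean_distance(s, t):
    """Point-by-point Euclidean distance between equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(s, t)) ** 0.5


def dtw_distance(s, t):
    """Classic O(len(s) * len(t)) dynamic time warping distance."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical load shapes: identical peak, shifted by one time step.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
d_euclid = euclidean_distance(a, b)
d_dtw = dtw_distance(a, b)
print(d_euclid, d_dtw)
```

The Euclidean distance penalizes the shift, while DTW aligns the two peaks by matching several points of one sequence to a single point of the other.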

Validation Indexes
Silhouette index [44]. Take an object i in the dataset and denote by A the cluster to which it has been assigned; a(i) is the average distance of i to all other objects of A, and b(i) is the minimum, over all other clusters, of the average distance of i to the objects of that cluster. The silhouette of this object, s(i), is defined as Equation (4).
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{4}$$
The silhouette index assesses the effect of clustering by combining cohesion and separation. The value of s(i) lies in [−1, 1], and a value closer to 1 represents relatively good cohesion and separation. A value larger than 0.5 strongly supports an underlying structure inside the data. The silhouette index is the average silhouette of all n objects, as shown in Equation (5).
$$SI = \frac{1}{n} \sum_{i=1}^{n} s(i) \tag{5}$$
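As a small illustration of the silhouette computation, the sketch below evaluates a(i), b(i), and s(i) for a toy one-dimensional dataset. The function name, the zero score for singleton clusters (a common convention, not stated in the text), and the data are assumptions.

```python
def silhouette_index(points, labels, dist):
    """Average silhouette over all objects; needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical one-dimensional data with two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
si = silhouette_index(points, labels, dist=lambda x, y: abs(x - y))
print(round(si, 3))
```

For these well-separated groups the index is close to 1, in line with the interpretation given above.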
The Dunn index [45] is the minimum distance between any two clusters (between clusters) divided by the maximum distance within any cluster (within cluster), defined as Equation (6). A higher Dunn value represents a greater distance between clusters and, at the same time, a smaller distance within clusters, which indicates a better clustering result.
$$DI = \frac{\min_{i \neq j} dist(C_i, C_j)}{\max_{l} diam(C_l)} \tag{6}$$
where $dist(C_i, C_j)$ represents the minimum distance between cluster $C_i$ and cluster $C_j$, defined as $dist(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y)$, and $diam(C_l)$ represents the maximum distance within cluster $C_l$, defined as $diam(C_l) = \max_{x, y \in C_l} d(x, y)$.
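The two ingredients of the Dunn index, the smallest between-cluster distance and the largest cluster diameter, translate directly into code. The following is a brute-force sketch with hypothetical names and toy data.

```python
def dunn_index(points, labels, dist):
    """Minimum between-cluster distance divided by maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    min_between = min(
        dist(x, y)
        for a in range(len(groups))
        for b in range(a + 1, len(groups))
        for x in groups[a]
        for y in groups[b]
    )
    max_diameter = max(dist(x, y) for g in groups for x in g for y in g)
    return min_between / max_diameter


# Hypothetical one-dimensional data with two compact, distant groups.
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
di = dunn_index(points, labels, dist=lambda x, y: abs(x - y))
print(di)
```

The brute-force pairwise loops are quadratic in the number of objects, which is acceptable for a few hundred rooms but not for very large datasets.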
The Calinski-Harabasz index [46] calculates a score from the ratio of the variance between clusters to the variance within clusters, defined as Equation (7).
$$CH = \frac{SSB}{SSW} \times \frac{n - k}{k - 1} \tag{7}$$
where SSB is the inter-cluster variance, SSW is the within-cluster variance, k is the number of clusters, and n is the number of all members. A higher value of the CH index indicates a better clustering result. However, the factor (n−k)/(k−1) decreases as k increases, so the index tends to select a clustering result with fewer clusters (such as k = 2). Therefore, locally optimal results can sometimes be acceptable.
The Davies-Bouldin index [47] measures the average, over all clusters, of the maximum similarity of each cluster to any other cluster, through the ratio of the distance within clusters to the distance between clusters, defined as Equation (8). A lower value indicates a better clustering result.
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}} \tag{8}$$
where $S_i$ ($S_j$) represents the dispersion of cluster i (j), calculated from the distances between the centroid of the cluster and the data within the cluster and defined as $S_i = \left( \frac{1}{T_i} \sum_{m=1}^{T_i} |X_m - A_i|^q \right)^{1/q}$; $T_i$ is the number of vectors in cluster i, $X_m$ is a member of the cluster, $A_i$ is the centroid of cluster i, and $S_i$ is the average Euclidean distance when q = 1. $M_{ij}$ represents the distance between clusters i and j, calculated between the centroids of the two clusters and defined as $M_{ij} = \left( \sum_{m=1}^{N} |a_{mi} - a_{mj}|^p \right)^{1/p}$; $a_{mi}$ ($a_{mj}$) is the m-th component of the N-dimensional centroid of cluster i (j), and $M_{ij}$ is the Euclidean distance between centroids when p = 2.
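Both indexes can be sketched in a few lines of Python. The sketch assumes the common definitions SSB = Σ nᵢ‖Aᵢ − Ā‖² and SSW = Σ‖X − Aᵢ‖² for the Calinski-Harabasz index, and q = 1, p = 2 for the Davies-Bouldin index; function names and the data are hypothetical.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def centroid(group):
    return [sum(col) / len(group) for col in zip(*group)]


def split(points, labels):
    """Group points by cluster label, preserving first-seen label order."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return list(groups.values())


def calinski_harabasz(points, labels):
    groups = split(points, labels)
    n, k = len(points), len(groups)
    overall = centroid(points)
    ssb = sum(len(g) * euclidean(centroid(g), overall) ** 2 for g in groups)
    ssw = sum(euclidean(x, centroid(g)) ** 2 for g in groups for x in g)
    return (ssb / ssw) * (n - k) / (k - 1)


def davies_bouldin(points, labels):
    groups = split(points, labels)
    cents = [centroid(g) for g in groups]
    # Dispersion S_i: mean distance of the members to their centroid (q = 1).
    S = [sum(euclidean(x, c) for x in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    worst = [
        max((S[i] + S[j]) / euclidean(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


# Hypothetical one-dimensional data stored as 1-element vectors.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
ch = calinski_harabasz(points, labels)
db = davies_bouldin(points, labels)
print(ch, db)
```

A high CH value together with a low DB value points to the same conclusion here: the two groups are compact and far apart.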

The sum of squared errors (SSE) [48] calculates the within-cluster sum of squared errors and is defined as Equation (9).
$$SSE = \sum_{i=1}^{k} \sum_{X \in C_i} |X - A_i|^2 \tag{9}$$
where k is the number of clusters, X is a member of cluster $C_i$, and $A_i$ is the centroid of $C_i$. As the number of clusters k increases, SSE becomes smaller, because the sample is partitioned more finely and each cluster becomes more cohesive. When k is less than the real number of clusters, increasing k greatly increases the cohesion of each cluster, which results in a large decline in SSE. In contrast, when k is larger than the real number of clusters, increasing k only slightly increases the cohesion of each cluster, so the decline in SSE flattens. The plot of SSE against k therefore looks like an elbow, and the k value at the elbow (an obvious turning point) is taken as the real number of clusters. This method is also called the Elbow Method.
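The elbow behavior can be reproduced on a toy example. The sketch below computes SSE for hand-assigned partitions with k = 1, 2, and 3; the partitions and the data are hypothetical and only serve to show the sharp drop followed by a flat tail.

```python
def sse(points, labels):
    """Within-cluster sum of squared errors for a given partition."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        c = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - y) ** 2 for x, y in zip(p, c)) for p in members)
    return total


# Hypothetical data with two real groups, stored as 1-element vectors.
points = [[0.0], [1.0], [10.0], [11.0]]
partitions = {
    1: [0, 0, 0, 0],
    2: [0, 0, 1, 1],
    3: [0, 0, 1, 2],
}
curve = {k: sse(points, labs) for k, labs in partitions.items()}
print(curve)
```

SSE falls steeply from k = 1 to k = 2 and barely changes afterwards, so the elbow sits at k = 2, the real number of groups in this toy data.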

Stage 2-Energy Consumption Characteristics
Based on the clustering result, the two-dimensional temperature-consumption scatter plot, also called the energy signature, of several energy consumption patterns can be displayed to intuitively demonstrate the temperature-consumption relationship in each consumption pattern.
According to the selection of seasonal typical months in the previous study [31], the outdoor air temperature range of [−4, 8) °C represents the cold weather condition (heating seasons), that of [8, 23) °C represents the comfort weather condition (transition seasons), and that of [23, 30] °C represents the hot weather condition (cooling seasons) in the following analysis. Based on this partition, we can quantify the energy flow from each cluster to each weather condition; throughout this paper, energy flow refers to the energy consumption flow between the clusters and the weather conditions. Additionally, the proportions of the number of rooms and of the energy consumption in each cluster can be obtained in order to compare their corresponding relationships, as can the proportions of the number of days and of the energy consumption in each weather condition. Furthermore, the average daily energy consumption of each cluster in the different weather conditions can be determined.
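The temperature partition and the resulting energy flow can be sketched as follows. The record layout (cluster, outdoor temperature, daily kWh) and the sample values are hypothetical; only the three temperature ranges come from the text.

```python
def weather_condition(temp_c):
    """Map a daily outdoor air temperature (°C) to its weather condition."""
    if -4 <= temp_c < 8:
        return "cold"
    if 8 <= temp_c < 23:
        return "comfort"
    if 23 <= temp_c <= 30:
        return "hot"
    raise ValueError(f"temperature {temp_c} °C is outside the studied range")


def energy_flow(records):
    """Share of total energy flowing from each cluster to each condition."""
    total = sum(kwh for _, _, kwh in records)
    flow = {}
    for cluster, temp_c, kwh in records:
        key = (cluster, weather_condition(temp_c))
        flow[key] = flow.get(key, 0.0) + kwh / total
    return flow


# Hypothetical records: (cluster, outdoor temperature in °C, daily kWh).
records = [(1, 5.0, 4.0), (1, 15.0, 2.0), (4, 5.0, 12.0), (4, 26.0, 2.0)]
flow = energy_flow(records)
print(flow)
```

Each entry of `flow` corresponds to one link in a Sankey-style diagram between a cluster and a weather condition.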
The temperature is divided into several intervals, in which x °C of outdoor air temperature on the X-axis represents the range [x − 0.5, x + 0.5) °C, and y kWh represents the average energy consumption of each dormitory in that temperature range. This energy performance can be used to determine and compare the energy consumption of the clusters at a certain temperature.
When the thermal environment is comfortable, energy consumption is lower, as air conditioners are not needed. The comfort temperature range differs between the consumers of different clusters. In order to quantify this comfort range, we select the energy consumption data that are less than 2 kWh (based on the average consumption, which is obviously lower in a certain temperature range, such as 14-20 °C), and take the 5% to 95% percentiles of the temperatures corresponding to those energy consumption data as the comfort range.
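A minimal sketch of the comfort range calculation is shown below. The linear-interpolation percentile is an assumption, since the paper does not state which percentile definition was used, and the U-shaped consumption curve is hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] (assumed definition)."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def comfort_range(temps, kwh, threshold=2.0):
    """5th to 95th percentile of temperatures on days below the kWh threshold."""
    low_use = sorted(t for t, e in zip(temps, kwh) if e < threshold)
    return percentile(low_use, 0.05), percentile(low_use, 0.95)


# Hypothetical U-shaped daily consumption over outdoor temperatures 0..30 °C.
temps = list(range(0, 31))
kwh = [0.02 * (t - 17) ** 2 + 0.5 for t in temps]
low_end, high_end = comfort_range(temps, kwh)
print(round(low_end, 1), round(high_end, 1))
```

Trimming 5% at each end keeps a few unusually low readings on very cold or very hot days from stretching the range.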

Stage 3-Influencing Factor Characteristics
Three influencing factors, occupants' gender, room floor location, and room orientation location, are quantified in each cluster.The multi-nominal logistic regressions [49] are conducted in this stage, and introduced as follows.
When non-continuous variables exist in a regression analysis, ordinary regression methods such as multiple linear regression are not suitable, because the variables do not meet the requirements on variable type and the premise assumptions of the regression model are violated. Multi-nominal logistic regression deals with models in which a multi-category response variable Y depends on several explanatory variables X (whether continuous or non-continuous), and it analyzes each category in comparison with the reference category of the response variable. The model is defined as Equation (10).
$$\ln\left(\frac{P_j}{P_J}\right) = \beta_0 + \sum_{i} \beta_i X_i, \qquad \mathrm{Exp}(B) = e^{\beta_i} \tag{10}$$
where Exp(B) describes the odds ratio, that is, the ratio of the probability that something happens to the probability that it does not happen. $P_j$ is the probability that the response variable belongs to category j, which is compared with the probability $P_J$ that the response variable belongs to the reference category J (appointed in advance). $\beta_0$ is the intercept of the model, which is a constant. $\beta_i$ is the regression coefficient of explanatory variable $X_i$, which describes the increase in the ln-odds for a one-unit increase in $X_i$.
In the multi-nominal logistic regression, we can quantify the impact of the influencing factors and obtain further details of the model, including Exp(B) and the standard error, which represent the strength of association of each explanatory variable with each cluster and the variation within the explanatory variable, respectively. In addition, the statistical significance level p shows how significant the result is. As mentioned above, one of the clusters needs to be set as the reference cluster, which means that the reference cluster itself cannot be analyzed and explained through the logistic regression model.
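To make the role of the reference category and of Exp(B) concrete, the sketch below converts a set of coefficients into category probabilities and odds ratios. The coefficient values, the category names, and the single explanatory variable (1 = male room) are hypothetical and are not results from this study.

```python
import math


def category_probabilities(coeffs, x, reference):
    """Probabilities of all categories given ln-odds relative to the reference.

    coeffs maps each non-reference category to [b0, b1, ...].
    """
    log_odds = {
        cat: b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
        for cat, b in coeffs.items()
    }
    denom = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {cat: math.exp(v) / denom for cat, v in log_odds.items()}
    probs[reference] = 1.0 / denom  # the reference category has ln-odds 0
    return probs


def odds_ratios(coeffs):
    """Exp(B) of every explanatory variable (the intercept is skipped)."""
    return {cat: [math.exp(bi) for bi in b[1:]] for cat, b in coeffs.items()}


# Hypothetical model: one explanatory variable, two categories plus reference.
coeffs = {"cluster 2": [0.0, math.log(2.0)], "cluster 3": [0.0, 0.0]}
probs = category_probabilities(coeffs, x=[1], reference="cluster 1")
exp_b = odds_ratios(coeffs)
print(probs, exp_b)
```

An Exp(B) of 2 for "cluster 2" means that, for x = 1, the odds of belonging to that cluster rather than to the reference cluster are doubled; the reference cluster has no coefficients of its own, which is why it cannot be interpreted through the model.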

Energy Consumption Data of a Campus in China
A university located in Wuhan, in the hot summer and cold winter zone of China, was selected for the energy consumption analysis. In total, 480 rooms of 20 buildings were considered, including 16 male and 4 female dormitory buildings, as shown in Figure 6. These buildings were built in 2003 and have similar layouts. Each building has six floors, north-south exposure, and east-west corridors, with rooms distributed on the north or south side of the corridor. A split air conditioner is installed in each dormitory with a cooling capacity of 3360 W, an energy efficiency ratio (EER) of 3.49, and a heating capacity of 3700 W + 1000 W (an electric auxiliary heater). More information about the layout and equipment is detailed in our previous study [31]. Raw energy consumption data of a typical dormitory and the daily average outdoor air temperature are illustrated in Figure 7. The monitored period began on 3 July 2018 and ended on 2 July 2019, including two semesters and two vacations. Only the two semesters were taken into consideration, as few or even no students stayed in dormitories during vacations, which would interfere with the analysis of normal school time. During this one-year period, the daily average outdoor dry-bulb temperature fluctuated from −3.

Results
The following contents in this section demonstrate results for each stage of the strategy introduced in Section 2.

Clustering
The k-means clustering with Euclidean/DTW distance metrics and time/temperature-series data was conducted in order to find the best-performing of these four approaches. Using time-series or temperature-series data makes no difference to the clustering results for k-means clustering with the Euclidean distance metric, because the pairwise alignment between two given points does not change; three distinct approaches therefore remain. As shown in Figure 8, for these three clustering approaches, the best number of clusters (supported by most indexes) is five, four, and four, respectively. The approach of DTW and temperature series performs worse than the others based on the indexes. Although the performances of the two remaining approaches are hard to tell apart based on the indexes, the approach of DTW and time series underperforms in identifying low-energy-consumption dormitories according to a manual comparison of the clustering results. As a result, the approach of k-means with the Euclidean distance metric was eventually adopted, and the data are segmented into four clusters in the following analysis.

Energy Consumption Characteristics
The clustering result can be intuitively demonstrated by the energy signature, as shown in Figure 9. A U-shaped relationship between consumption and temperature can be observed in each cluster. In total, 183 rooms are included in cluster 1, while only 43 rooms are in cluster 4. Here, cluster 1 is considered as light energy users, while cluster 4 is considered as heavy energy users. Clusters 2 and 3 are neutral energy users. The corresponding relationships between the gender proportion, number of rooms, energy consumption, and outdoor air temperature are illustrated in a composite Sankey diagram, as shown in Figure 10. The distribution of occupants' gender in each cluster is analyzed in Section 4.3. The proportions of the number of rooms in each cluster and of the total energy consumption are illustrated in the middle part of Figure 10. The dormitories in cluster 1 account for 42.17% of the total rooms but consume merely 26.78% of the total energy. In contrast, the dormitories in cluster 4 account for merely 9.91% of the total rooms but consume 19.51% of the total energy. The lower part of Figure 10 demonstrates the energy flow. It indicates that cluster 1 has a relatively uniform distribution of energy consumption across the three weather conditions. Clusters 2, 3, and 4 consume over 40% of their energy in the cold, hot, and cold conditions, respectively. In the cold condition ([−4, 8) °C), over 40% of the energy is contributed by cluster 2, which is obviously larger than its total energy proportion of 31.32%. Similarly, in the hot weather condition, almost 30% of the energy is contributed by cluster 3, which is obviously larger than its total energy proportion of 22.40%. The proportions of the number of days in each weather condition and of their energy consumption are illustrated in the bottom part of Figure 10. The cold weather condition accounts for the fewest days (19.92%) but contributes the most energy consumption (37.46%). This reveals a huge demand for heating. Furthermore, over 70% of the energy consumption is contributed by the cold and hot weather, which can be attributed to the operation of air conditioners. In fact, this proportion corresponds to the energy consumption proportion of air conditioners (75-80%) concluded in the previous study [31].
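The room and energy shares quoted above can be combined into a simple per-room intensity ratio (energy share divided by room share). The short sketch below only re-uses the percentages stated in the text for clusters 1 and 4; the ratio itself is an illustrative derived quantity, not an index used by the authors.

```python
# Shares quoted in the text (percent of rooms, percent of total energy).
shares = {
    1: {"rooms": 42.17, "energy": 26.78},  # light energy users
    4: {"rooms": 9.91, "energy": 19.51},   # heavy energy users
}

# Energy share per unit of room share; 1.0 would mean an average room.
intensity = {c: s["energy"] / s["rooms"] for c, s in shares.items()}
heavy_vs_light = intensity[4] / intensity[1]

for cluster, value in intensity.items():
    print(f"cluster {cluster}: {value:.2f} x the average room")
print(f"cluster 4 vs cluster 1: {heavy_vs_light:.1f} x")
```

On these figures a room in cluster 4 uses roughly twice the energy of an average room, and about three times that of a room in cluster 1.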

The average daily energy consumption is shown in Figure 11. Cluster 4 is a high-energy-consumption group with an average daily energy consumption of 7.74 kWh, which mainly results from its performance in cold winter, when it consumes 4.5 times more than cluster 1 does. Interestingly, cluster 4 consumes over 3 times more than cluster 1 does at [8, 23) °C (transition seasons), which may result from more electric devices and longer durations of staying inside. Although the energy consumptions of clusters 2 and 3 are almost identical over the entire temperature range (4.31 kWh and 4.55 kWh), cluster 2 consumes much more energy (10.41 kWh) than cluster 3 does (6.47 kWh) at [−4, 8) °C; on the contrary, cluster 3 consumes more energy (8.94 kWh) than cluster 2 does (4.87 kWh) at [23, 30] °C.
The energy performance at smaller temperature intervals provides further details, as illustrated in Figure 12. The energy consumption of cluster 4 increases markedly once the outdoor air temperature falls below 13 °C. This partly explains its average daily energy consumption of 4.28 kWh at [8, 23) °C, which is clearly higher than that of the other clusters, as shown in Figure 11. In addition, Figure 12 shows that the energy consumption of cluster 3 is even slightly greater than that of cluster 4 in hot weather. However, when the temperature is below 13 °C, the energy consumption of cluster 4 increases much more rapidly than that of any other cluster as the temperature continues to fall. Its occupants consume 16.48 kWh daily when the outdoor air temperature is lower than 8 °C, which is 1.58 times the 10.41 kWh of cluster 2, as shown in Figure 11. Therefore, it is necessary to further investigate these identified high-energy-consumption dormitories and to adopt well-directed measures or policies. Furthermore, in the cold weather, the energy consumptions of the four clusters differ greatly from one another, as illustrated in Figure 12, while in the hot weather, the four clusters show only two energy consumption trends. As the temperature increases, a similar rapid growth trend of energy consumption can be seen in clusters 3 and 4, indicating that the occupants in both clusters tend to consume more energy than those in clusters 1 and 2 in the hot weather.

The comfort outdoor air temperature range is described in Figure 13. The upper ends of the comfort range in different clusters are similar, within 23 ± 1.1 °C; in contrast, the lower ends vary greatly, from 3.2 to 9.4 °C. We can conclude that occupants' tolerance to coldness differs more than their tolerance to hotness in this area. Furthermore, the range of cluster 1 is clearly wider than that of the others, while cluster 4 has the narrowest comfort temperature range.

Influencing Factors
Three influencing factors, occupants' gender, room floor location, and room orientation location, are quantified in this section through multinomial logistic regression. Clusters 1-4 together are used as the reference cluster.
Occupants' gender. Table 1 illustrates that males are more likely to appear in clusters 3 and 4, particularly in cluster 3 at a 1% statistical significance level. The composite Sankey diagram of Figure 10 similarly illustrates that 95.24% of the rooms in cluster 3 and 86.05% of the rooms in cluster 4 belong to males, proportions that are clearly higher than the overall male proportion of 79.49%. This difference reveals that males generally consume more energy, especially in hot weather. In fact, this difference is also reflected in the cooling comfort temperature of cluster 3 shown in Figure 13.

The floor location of room has an obvious influence on energy consumption, which is significant for clusters 2 and 3 at a less than 5% statistical significance level. The ground floor is significantly dominant in cluster 2 (a cold-sensitive cluster), as shown in Table 1, indicating that the ground floor's occupants tend to consume more in cold environments. In contrast, the top floor is significantly dominant in cluster 3 (a hot-sensitive cluster), with a very large Exp(B) value of 20.276, showing that top floor rooms consume significantly more energy. We can conclude that the ground floor consumes more energy in the cold weather, whereas the top floor does so in the hot weather.

The orientation location of room. The rooms in the corner, including northeast, southeast, southwest, and northwest, generally show a higher likelihood for clusters 2-4. Particularly, the northwest orientation has an Exp(B) of 2.715 in cluster 4. Considering its minor Exp(B) of 0.933, we can conclude that the occupants living in the northwest corner dormitories are more easily influenced by cold environments. In contrast, the middle orientations (north and south) tend to appear in cluster 1, indicating a lower consumption.
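For readers less familiar with logistic regression output, Exp(B) is the exponentiated regression coefficient and is read as an odds ratio relative to the reference category: values above 1 raise the odds of belonging to the cluster, values below 1 lower them. The short sketch below converts the Exp(B) values quoted in this section back to coefficients and to percent changes in odds. The helper names are hypothetical, and no values other than those quoted above are used.

```python
import math


def coefficient_from_odds_ratio(exp_b):
    """Return the regression coefficient B for a reported Exp(B)."""
    return math.log(exp_b)


def percent_change_in_odds(exp_b):
    """Percent change in the odds relative to the reference category."""
    return (exp_b - 1.0) * 100.0


# Exp(B) values quoted in the text.
reported = {
    "top floor, cluster 3": 20.276,
    "northwest corner, cluster 4": 2.715,
    "northwest corner, smaller value (cluster not stated here)": 0.933,
}

for label, exp_b in reported.items():
    b = coefficient_from_odds_ratio(exp_b)
    change = percent_change_in_odds(exp_b)
    print(f"{label}: B = {b:.3f}, odds change = {change:+.1f}%")
```

An Exp(B) of 0.933 therefore corresponds to a slightly negative coefficient, which is why it is described as minor next to the value of 2.715.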

Discussion
This study discovers four typical energy consumption patterns in college dormitories. Cluster 1 represents an energy-saving pattern with an average energy consumption of 2.50 kWh/day per room, clearly lower than the others. With the widest comfort range, its occupants tend to use air conditioning less and save more energy. The middle floor and north- and south-facing rooms tend to have this pattern. Cluster 2 represents an energy-neutral and cold-sensitive cluster with an average energy consumption of 4.31 kWh/day per room. The cold weather contributes 48.09% of the energy in this cluster. Female and ground floor rooms are more likely to follow this pattern. No particular orientation is clearly dominant in this cluster. Cluster 3 is an energy-neutral and hot-sensitive cluster with an average energy consumption of 4.55 kWh/day per room, 44.30% of which is attributed to the hot weather. Male and top floor dormitories show a strong tendency toward this pattern. Cluster 4 is an energy-consuming cluster with an average energy consumption of 7.74 kWh/day per room, which makes it essential to identify and pay special attention to. With the narrowest comfort range, its occupants are extremely sensitive to weather changes. In this cluster, 42.41% of the energy was consumed in cold winter, which is the main reason for it being the high-energy-consumption cluster. Male rooms, top and ground floor rooms, and corner rooms (particularly the northwest) are more likely to follow this pattern.
Figure 10 shows that the highest proportion (37.46%) of total energy is consumed in the cold weather (only 19.92% of total days), which seems to indicate a heating-dominant area. However, this study only investigated the school time, without involving vacations. The hottest period occurs in the summer vacation, whereas the coldest period is during school time. Therefore, this result does not contradict Wuhan being a cooling-dominant city.
It is worth noting that the energy consumption of each cluster reaches a local peak at 0 °C, as shown in Figure 12. Particularly, the energy consumption of cluster 1 declines clearly as the outdoor air temperature falls further below 0 °C, which contradicts the general trend. This difference may imply that human thermal comfort tends to change at around 0 °C, which is also the freezing point of water. More research needs to be undertaken to verify or explain this phenomenon.
Interestingly, the comfort outdoor air temperature range demonstrated in Figure 13 has a similar upper end at approximately 23 °C, which is just the upper end of the 90% acceptable indoor air temperature range of 13-23 °C suggested by Hu et al. [50]. In contrast, the lower end of the comfort ranges in this study is substantially lower than 13 °C, even down to 3.2 °C. Why the comfort ranges of outdoor temperature and indoor temperature have similar upper ends but completely different lower ends still needs further research to explain.
A data-driven approach based on daily energy consumption and outdoor air temperature is used in this study to characterize dormitory energy use. Compared to the physical model, it has some drawbacks, such as over-reliance on historical data and a lack of explanation. However, it makes it easy to obtain the energy consumption characteristics without requiring detailed and comprehensive information on buildings, such as geometric dimensions or building materials, and it is flexible for real-time monitoring and optimizing, as there is no need to solve a number of equations as in physical models [51]. A group of two-dimensional data is enough to identify and classify rooms, which would be helpful for energy saving or building retrofitting [18]. In addition, as daily rather than hourly energy consumption data is often what is available in existing buildings in China, this method is applicable to these buildings.
Compared to other types of buildings, college dormitory buildings have an unfixed occupancy schedule, resulting from various grades, courses, personal arrangements, etc. Furthermore, unlike college public buildings such as teaching or office buildings, dormitories have fewer advanced smart meters installed, resulting in less detailed information and less accurate analysis. In the future, energy managers and building designers should pay more attention to dormitory buildings and install more high-resolution (such as hourly or by the minute) smart meters in them, as the dormitory is a group living space with a high density of energy consumption.
Currently, the students in this university can easily access their historical daily energy consumption through mobile phones. However, there is still a lack of specific indicators or analysis results (such as the comparison of peer usage [52,53]) derived from historical data to provide rational and efficient feedback in order to (1) increase the awareness of occupants about energy consumption; (2) involve the inhabitants in contributing to the energy efficiency strategies; and (3) create a tool of communication with the customers which can help them learn to control and conserve energy [54]. In fact, a user-centered approach, which allows architects or managers to exchange information and knowledge with communities, has been proved to be a practical way to not only improve energy efficiency and environmental sustainability, but also help with building retrofitting and refurbishment [55]. In addition to energy feedback to consumers, the influencing factor analysis also implies that energy managers should consider those factors and make differentiated policies or pricing strategies.

Conclusions
In this study, a three-stage strategy is proposed to identify and analyze the energy consumption patterns and characteristics of college dormitories in detail, including clustering, energy consumption characteristics, and influencing factor characteristics. The following conclusions can be summarized from the results:
1. The heavy energy use dormitories, accounting for 10% of total dormitories, consume approximately 20% of total energy; in contrast, the light energy use dormitories, 42% of total dormitories, consume only approximately 27% of total energy. Over 71% of total energy is consumed in air-conditioning seasons that account for less than 43% of total days.
2. In the cold weather ([−4, 8) °C), the occupants in the four clusters have four completely different energy consumption patterns, whereas in the hot weather ([23, 30] °C), these four clusters only have two energy consumption trends.
3. The deviation in different occupants' tolerance to coldness is clearly larger than that to hotness, which is the main reason contributing to the energy consumption difference in this area.
4. All influencing factors, namely the occupants' gender and the floor and orientation location of rooms, have impacts on energy consumption. Generally, males tend to use more energy, particularly in the hot weather. The middle floor dormitories are most likely to consume in an energy-saving pattern. The top floor dormitories are significantly dominant in the energy-consuming pattern in hot weather, whereas the ground floor dormitories are dominant in cold weather. The dormitories in the corner (northeast, southeast, southwest, and northwest) tend to consume more energy, particularly in the hot weather.

This study is based on the case of 20 college dormitory buildings located in Wuhan, China, which belongs to the hot summer and cold winter climate zone of China. The results could also be adopted and generalized to other building stocks in similar climate zones for improving energy efficiency and sustainable development.

2.1. Stage 1-Clustering
2.1.1. Data and Information Collection
Historical daily energy consumption data of the monitored dormitories were collected through electricity meters within a one-year period. Room information including the occupants' gender, floor location, and orientation location was recorded according to on-site investigations. The daily outdoor air temperature was downloaded from the China Meteorological Data Service Center [32].

Figure 1. Steps of the proposed strategy.

Figure 2. Pareto chart of daily minimum energy consumption.

Figure 3. Box plot of outlier number of each room.

Figure 4. Results of data cleaning and filling in two typical rooms.

Figure 5. Pairwise matching using DTW.

Figure 7. Raw energy consumption data of a typical dormitory and outdoor air temperature.

Figure 8. Clustering validation indexes using different distance metrics and data series.

Figure 9. Energy signature of each cluster.

Figure 10. A composite Sankey diagram demonstrating (from top to bottom) (1) the gender proportion and its corresponding distribution in each cluster; (2) the corresponding energy consumption proportion of each cluster; (3) the cluster energy consumption distribution in each weather condition; and (4) the corresponding relation between number of days and energy consumption in each weather condition.

Figure 11. Average daily energy consumption of each dormitory in different outdoor air temperature ranges.

Author Contributions: Conceptualization, Y.Y.; methodology, Y.Y. and W.G.; software, Y.Y.; validation, Y.Y. and J.Y.; formal analysis, Y.Y. and J.Y.; investigation, Y.Y. and Z.Z.; data curation, Y.Y.; writing-original draft preparation, Y.Y.; writing-review and editing, W.G.; visualization, Y.Y. and Z.Z.; resources, C.T.; supervision, W.G. All authors have read and agreed to the published version of the manuscript.

Funding: Science and technology project of Ministry of Housing and Urban-Rural Development (Research and application of key technologies for the optimization and intelligent operation of energy supply systems in urban building clusters and areas, No. 2021-K-003).