Gradient Boosting Approach to Predict Energy-Saving Awareness of Households in Kitakyushu

Singh, Nitin Kumar; Fukushima, Takuya; Nagahara, Masaaki

doi:10.3390/en16165998


by Nitin Kumar Singh ¹,*, Takuya Fukushima ² and Masaaki Nagahara ¹

¹ Graduate School of Advanced Science and Engineering, Hiroshima University, Higashihiroshima 739-8527, Japan

² Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma 630-0192, Japan

* Author to whom correspondence should be addressed.

Energies 2023, 16(16), 5998; https://doi.org/10.3390/en16165998

Submission received: 14 June 2023 / Revised: 3 August 2023 / Accepted: 9 August 2023 / Published: 16 August 2023

(This article belongs to the Special Issue Factors Influencing Households’ Energy Consumption)


Abstract

This paper aims to develop a machine-learning model based on a gradient-boosting algorithm to predict the energy-saving awareness of households using a questionnaire survey and 11-month energy data collected from more than 200 smart houses in Kitakyushu, Japan. We use the LightGBM (light gradient boosting machine) classifier to perform feature selection for the prediction. With this approach, we show that the key features are the standard deviations of electricity purchased between 8 a.m. and 9 a.m. and electricity consumed between 7 p.m. and 9 p.m. Next, using k-means clustering, we split the households into three groups based on the obtained features. Finally, using statistical hypothesis testing, we show that these three groups have statistically distinct levels of energy-saving awareness. This model enables us to detect eco-friendly households from their energy data, which may support energy policymaking.

Keywords:

gradient boosting; LightGBM; k-means clustering; time-series data; questionnaire survey; home energy management systems; zero energy houses

1. Introduction

Electricity demand has been increasing globally in recent years, driven mainly by population growth, economic development, and urbanization [1,2]. Fossil fuels such as oil and coal remain the dominant sources of electricity generation worldwide [3]. Consequently, fossil-fuel-based power generation has significant environmental impacts, especially the emission of greenhouse gases such as CO₂, which has become a critical worldwide concern because of global warming [4]. To address this challenge and uphold the commitments made in the Paris Agreement (2015), several countries have set energy policies aimed at lowering annual greenhouse gas emissions and holding the increase in global average temperature well below 2 °C while pursuing efforts to limit it to 1.5 °C [5,6].

The residential sector accounts for a significant portion of global electricity consumption [7]. Consequently, several countries and organizations are developing energy policies that promote the widespread adoption of renewable energy sources at home and lower domestic electricity consumption. To address these problems, home energy management systems (HEMS) have received considerable research attention in recent years [8,9]. HEMS refer to the strategies or tools that households can use to manage their energy usage at home in order to achieve the status of nearly zero energy houses (ZEH), and they help governments and policymakers frame strategic energy plans for an eco-friendly environment [10,11]. HEMS therefore have the potential to reduce households’ electricity consumption substantially and can play a crucial role in realizing nearly zero energy houses.

Households’ electricity-saving awareness is part of HEMS, and residents’ electricity-saving awareness has been observed to be a crucial factor that can reduce domestic power consumption drastically. Consequently, there is considerable room for HEMS to support energy policies based on behavioral intervention strategies.

However, research in this field remains limited, and only a few papers predict households’ energy conservation awareness from domestic electricity consumption profiles and questionnaire survey data. Previous attempts to develop behavioral intervention strategies based on households’ energy conservation awareness for reducing domestic power consumption have remained at a basic level. Moreover, when behavioral intervention strategies are inefficient, it is difficult to obtain the desired results and to translate these strategies into the realization of zero energy houses [12,13].

Some researchers have investigated the role of occupant behavior in reducing domestic electricity consumption [14,15]. However, most work in this area has relied on simple statistics and conventional machine learning techniques for analyzing household energy data and occupant behavior [16,17]. Moreover, predicting household energy-saving awareness and developing energy-saving policies is challenging because of the uncertainty arising from occupant behavior, insufficient data, inefficient data analysis methods, house characteristics, and other factors [18].

Zhao et al. present a study on the energy-saving behaviors of rural households in the Loess hilly region of China. The study surveyed households using the participatory farmer assessment method and analyzed the data using mathematical statistics and the Lorenz curve [14].

Nilsson et al. present a Swedish field study on the impact of real-time feedback on household energy consumption behavior, using a consumer-oriented multidisciplinary approach that combines quantitative and qualitative methods [19].

Shen et al. explore the relationship between personality traits and energy-saving potential in households, using a support vector regression model and the Monte Carlo method to predict electricity consumption and design optimal intervention strategies [18].

Sun et al. use C-vine copulas to categorize consumers based on smart meter data and develop a decision tree classification module that assigns new consumers to these categories based on their smart meter data [20].

Estiri used structural equation modeling to examine the effects of household and building characteristics on residential energy consumption [21]. Ahmed Gassar et al. used random forest (RF), multiple regression (MLR), multilayer neural network (MNN), and gradient boosting (GB) algorithms to predict the electricity and gas consumption of residential buildings [22].

Zhou and Yang examined household energy data using an interdisciplinary research framework that incorporates energy, social, and information science, and discussed the “4V” characteristics of energy data [23].

The work reported above shows that research in this field is still evolving. To devise impactful behavioral intervention strategies, the choice of an appropriate machine learning algorithm and the nature of the dataset are crucial. This paper focuses on predicting the energy-saving awareness of households using a machine learning model (LightGBM) and a novel dataset, which should be useful in designing effective intervention strategies that change households’ behavior to lower domestic power consumption.

We use household data collected from 237 households over 11 months and adopt the LightGBM (light gradient boosting machine) method to predict households’ energy-saving awareness. LightGBM is a decision tree ensemble method based on the gradient boosting framework and has proven effective [24,25]. During training, LightGBM keeps the data instances with larger gradients and randomly samples those with smaller gradients (gradient-based one-side sampling), which drastically reduces computation and memory usage and makes the algorithm faster while retaining good predictive performance. Because it uses less memory, LightGBM performs well on large datasets [26,27,28]. We use $\ell^1$ regularization, based on the idea of compressed sensing, to sparsify the coefficients in the model [29].
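LightGBM’s gradient-based one-side sampling (GOSS) keeps the training instances with large gradients and subsamples the rest. The NumPy sketch below follows the description in Ke et al. [25]; it is an illustration, not LightGBM’s internal code. The function name `goss_sample` is hypothetical, and `a` (share of largest-gradient instances kept) and `b` (share of instances sampled from the remainder) correspond to what LightGBM exposes as its `top_rate` and `other_rate` parameters.

```python
import numpy as np


def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Gradient-based one-side sampling (illustrative, after Ke et al.).

    Keeps the top a*100% of instances by |gradient|, randomly samples
    b*100% of all instances from the remainder, and up-weights the sampled
    small-gradient instances by (1 - a) / b.
    Returns (selected indices, per-instance weights).
    """
    rng = np.random.default_rng(rng)
    g = np.abs(np.asarray(gradients, dtype=float))
    n = len(g)
    top_n, rand_n = int(a * n), int(b * n)
    order = np.argsort(-g)                     # largest |gradient| first
    top_idx = order[:top_n]
    rand_idx = rng.choice(order[top_n:], size=rand_n, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    weights[top_n:] = (1.0 - a) / b            # compensate for subsampling small gradients
    return idx, weights
```

The constant (1 − a)/b re-weights the sampled small-gradient instances so that the estimated split gains are not biased towards the large-gradient instances that were all kept.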

By using the LightGBM classifier, we extract important features from a questionnaire survey and 11-month energy data collected from 237 households in Kitakyushu, Japan. We demonstrate that the key features are:

The standard deviation of the electricity purchased between 8 a.m. and 9 a.m.,
The standard deviation of the electricity consumed between 8 p.m. and 9 p.m., and
The standard deviation of the electricity consumed between 7 p.m. and 8 p.m.

Based on these key features, we split the households into three distinct groups using k-means clustering. Using statistical hypothesis testing, we show that these groups have distinct levels of energy-saving awareness. The resulting three-group model can be used to predict the energy-saving awareness of households that did not participate in the questionnaire survey.

This paper contributes to LightGBM-based analysis of household electricity data for developing efficient behavioral intervention strategies that reduce domestic power consumption. The findings can assist policymakers, energy companies, and governments in formulating effective energy policies and designing targeted marketing strategies. Ultimately, this research has the potential to promote sustainable energy practices, enhance households’ energy-saving behaviors, and move towards nearly zero energy houses.

2. Materials and Methods

2.1. Data Collection and Description

This paper analyzes household energy data collected by smart meters together with questionnaire survey results. The household energy data include the four items listed in Table 1, recorded every 30 min for 11 months, from 1 April 2021 to 28 February 2022, in 493 smart houses in Kitakyushu, Japan. These smart houses are also equipped with solar panels that produce their own electricity, so the data cover the electricity purchased, sold, and produced by each household as well as its net consumption. The floor area of these smart houses ranges from less than 50 m² to more than 200 m².

Survey Question

In addition, we use the results of a questionnaire survey. The survey question and its options are given as follows:

Survey question: How has your energy-saving awareness changed compared with before you moved into the smart house?

Please choose the most suitable answer from the following options:

1. Very high.
2. A little bit higher.
3. It is almost the same (I have been aware of it for a long time).
4. It is almost the same (no awareness as before).
5. I am not as aware of it as I used to be.

The household energy data were collected in CSV format. In the survey CSV file, each answer was encoded as a numeric value from 1 to 5 corresponding to the number of the chosen option.
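A minimal sketch of this encoding, assuming the answers are available as the option texts listed above; the names `OPTIONS`, `CODE`, and `encode_answer` are hypothetical.

```python
# Option texts from the survey, in order; position + 1 is the code stored in the CSV.
OPTIONS = [
    "Very high.",
    "A little bit higher.",
    "It is almost the same (I have been aware of it for a long time).",
    "It is almost the same (no awareness as before).",
    "I am not as aware of it as I used to be.",
]
CODE = {text: number for number, text in enumerate(OPTIONS, start=1)}


def encode_answer(text):
    """Return the option number (1-5), or None for a missing or unrecognized answer."""
    if text is None:
        return None
    return CODE.get(text.strip())
```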

2.2. Data Pre-Processing

Data pre-processing is an important preliminary step in any machine learning problem, and considerable effort is needed before a machine learning algorithm can be applied to raw data. The data were collected from an energy-saving awareness intervention experiment conducted on 493 households in Kitakyushu city. Of these 493 households, 151 contained critical incorrect data and were removed from the dataset. Of the remaining 342 households, 237 answered the survey question. Although some values are still missing in the energy data of these 237 households, the LightGBM algorithm handles missing values natively, so we use the data as they are.
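These filtering steps can be sketched with pandas, assuming one row of features per household indexed by a household ID. How the households with critical incorrect data were identified is not described, so `invalid_ids` is a hypothetical input, and `select_households` is a hypothetical helper. Missing values are deliberately left as NaN because LightGBM treats them as missing.

```python
import pandas as pd


def select_households(features, invalid_ids, answers):
    """Apply the household filtering steps.

    features: DataFrame with one row of energy features per household (index = household ID).
    invalid_ids: IDs flagged as containing critical incorrect data (hypothetical input).
    answers: Series of survey answers (1-5) indexed by household ID; NaN if not answered.
    Returns (valid, X_labelled, y): all valid households, those with an answer, and the answers.
    """
    valid = features.loc[~features.index.isin(invalid_ids)]   # e.g. 493 -> 342 households
    answered = answers.dropna()
    X_labelled = valid.loc[valid.index.isin(answered.index)]  # e.g. 342 -> 237 households
    y = answered.loc[X_labelled.index].astype(int)
    # Remaining NaNs in X_labelled are kept on purpose: LightGBM handles missing values.
    return valid, X_labelled, y
```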

2.3. Model Development

We develop a machine learning model using LightGBM to predict the survey answers from time-series household energy data. Then, using a few important features obtained by LightGBM, we divide the households into three clusters with the k-means clustering algorithm such that each cluster exhibits a different level of energy-saving awareness. These clusters can be used to predict the energy-saving awareness of households that did not participate in the survey. Finally, we conduct statistical hypothesis testing with the $\chi^2$ test to verify our model statistically. The details of the machine learning-based model for analyzing household energy data are discussed below.

2.3.1. Light Gradient Boosting Machine

We apply LightGBM to the energy data of 237 households (comprising 617 family members) to identify the features that are effective in predicting households’ energy conservation awareness, using the following variables:

Explanatory variables: means and standard deviations of household energy data (hourly and monthly)
Target variable: the change in energy conservation awareness, i.e., the option (1–5) chosen in answer to the survey question
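One way to compute the hourly mean and standard deviation features from the 30-min records is sketched below with pandas. It assumes a hypothetical long-format table with columns `household`, `timestamp`, `purchased`, and `consumed` (Wh per 30-min interval), and assumes that the statistics for an hour slot are taken across days for each household; this aggregation is not spelled out in the text. Monthly features could be built the same way by grouping on the month instead of the hour.

```python
import pandas as pd


def hourly_features(readings):
    """Per-household mean and STD of hourly energy for each clock hour.

    readings: long-format DataFrame with columns 'household', 'timestamp'
    (datetime64), 'purchased' and 'consumed' (Wh per 30-min interval).
    Returns one row per household with columns such as 'purchased_std_08'
    (hour 08 = 8 a.m. to 9 a.m.).
    """
    df = readings.copy()
    df["date"] = df["timestamp"].dt.date
    df["hour"] = df["timestamp"].dt.hour
    # Add the two 30-min readings of each clock hour (min_count=1 keeps all-NaN hours as NaN).
    hourly = df.groupby(["household", "date", "hour"])[["purchased", "consumed"]].sum(min_count=1)
    # Mean and sample STD (ddof=1) across days, for each household and hour slot.
    stats = hourly.groupby(level=["household", "hour"]).agg(["mean", "std"])
    wide = stats.unstack("hour")  # columns become (quantity, statistic, hour)
    wide.columns = [f"{q}_{s}_{h:02d}" for q, s, h in wide.columns]
    return wide
```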

We use $\ell^1$ regularization to sparsify the coefficients in the model. $\ell^1$ regularization has a feature selection property: it forces some of the features to have zero weight, which effectively drops unimportant features from the model. The regularization parameter is tuned manually to obtain the best performance. We use k-fold cross-validation to validate our LightGBM-based model. In our model, the default cost function for LightGBM’s regression is the mean squared error.

Finally, we extract a few important features to predict the values of the answers (1–5) to the survey question, as shown in Figure 1.
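Given averaged importance values from a fitted model, selecting the few most important features is a simple ranking. The helper `top_features` below is hypothetical.

```python
import numpy as np


def top_features(names, importances, k=3):
    """Return the k feature names with the largest importance values."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]  # indices sorted by importance, descending
    return [names[i] for i in order]
```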

2.3.2. k-Means Clustering

We then run k-means clustering with the three most important features extracted by LightGBM to split the 342 households (including those that did not respond to the survey question) into three groups representing different levels of energy-saving awareness. These clusters can be used to predict the energy-saving awareness of households that did not respond to the survey question.
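A minimal scikit-learn sketch of this clustering step, assuming the three selected features form an array with one row per household and no missing values (scikit-learn’s `KMeans` does not accept NaN). Standardizing the features first is an assumption, since the text does not state whether the features were scaled; `cluster_households` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_households(X3, k=3, seed=0, standardize=True):
    """Split households into k groups using three selected features.

    X3: (n_households, 3) array without missing values.
    Returns cluster labels (0..k-1) and the number of households per cluster.
    """
    X3 = np.asarray(X3, dtype=float)
    Z = StandardScaler().fit_transform(X3) if standardize else X3  # scaling is an assumption
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(Z)
    return labels, np.bincount(labels, minlength=k)
```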

2.3.3. Statistical Hypothesis Testing

To analyze the three clusters statistically and to verify that they correspond to different levels of energy-saving awareness, we apply statistical hypothesis testing to the 237 households that responded to the questionnaire survey, using the null and alternative hypothesis framework and the $\chi^2$ test, as explained below:

Null hypothesis: the change in energy conservation awareness and the cluster assignment are mutually independent.
Alternative hypothesis: the change in energy conservation awareness and the cluster assignment are dependent.
$\chi^2$ test: we conduct a $\chi^2$ test of independence to calculate the p-value, which determines whether the null hypothesis is rejected.

The null hypothesis is rejected if the p-value is less than the significance level of 0.05; in that case, we conclude that the alternative hypothesis holds and that the three clusters, namely 0, 1, and 2, represent three different levels of energy-saving awareness.
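The test can be sketched with SciPy’s `chi2_contingency`, applied to a contingency table of cluster labels against survey answers built with `pandas.crosstab`. The helper `cluster_answer_test` is hypothetical, and the labels used to exercise it are synthetic.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def cluster_answer_test(clusters, answers, alpha=0.05):
    """Chi-squared test of independence between cluster labels and survey answers.

    Returns (statistic, p_value, degrees_of_freedom, reject_null).
    """
    # Contingency table: rows = clusters, columns = answer options actually observed.
    table = pd.crosstab(pd.Series(clusters, name="cluster"), pd.Series(answers, name="answer"))
    stat, p, dof, _expected = chi2_contingency(table.to_numpy())
    return stat, p, dof, bool(p < alpha)
```

SciPy applies Yates’ continuity correction only when the table has one degree of freedom, so it does not affect a table with three clusters and several answer options.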

3. Results and Discussion

Energy economics is a vast field that mainly involves energy consumption, the economic impact of energy policies, and the economic and environmental implications of different energy sources. Our model, which combines LightGBM, k-means clustering, and statistical hypothesis testing, can provide insights relevant to these areas. It can also assist policymakers by offering predictions and decision support based on household energy data that can help reduce domestic power consumption, develop efficient energy policies, and promote sustainable energy practices while balancing economic, environmental, and social goals. Detailed results are discussed below.

Figure 1 shows the importance values of the statistical features extracted by LightGBM. These statistical features are the standard deviations and mean values calculated from the time-series energy data.

We use stratified k-fold cross-validation so that the entire dataset is used, and we evaluate the model’s performance with the F1 score in each fold, as listed in Table 2.

We also calculated the F1 score on a held-out 20% test sample; it was 0.05.

This result suggests that (1) the standard deviation (STD) of electricity purchased from 8 a.m. to 9 a.m., (2) the STD of electricity consumption from 8 p.m. to 9 p.m., and (3) the STD of electricity consumption from 7 p.m. to 8 p.m. are the three most important features for predicting household energy conservation awareness (i.e., the answers to the survey question).

Here, the STD indicates how much the electricity purchased or consumed deviates from its mean value within a given time interval. A higher STD indicates larger fluctuations in the electricity consumed or purchased during that time frame.
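Assuming each STD feature is computed per household across the days of the observation period (the exact aggregation is not stated), the feature for household $i$ and hour slot $h$ can be written as follows, where $x_{i,d,h}$ is the energy (Wh) purchased or consumed by household $i$ during hour $h$ of day $d$, and $D_i$ is the set of days with valid readings for that household:

```latex
\bar{x}_{i,h} = \frac{1}{|D_i|} \sum_{d \in D_i} x_{i,d,h},
\qquad
\sigma_{i,h} = \sqrt{\frac{1}{|D_i| - 1} \sum_{d \in D_i} \left( x_{i,d,h} - \bar{x}_{i,h} \right)^2}
```

For example, the feature for 8 a.m. to 9 a.m. corresponds to $h = 8$. The $|D_i| - 1$ denominator is the sample-STD convention; dividing by $|D_i|$ instead changes the values only slightly over an 11-month period.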

From the above results, the STD of electricity consumption from 7 p.m. to 9 p.m. and the STD of electricity purchased from 8 a.m. to 9 a.m. are effective for predicting households’ energy conservation awareness and for designing effective intervention strategies, for the following reasons:

Most occupants are likely to be at home during these time intervals, as suggested by the comparatively large fluctuations in energy data during the morning and evening hours, so it is effective to conduct a questionnaire survey or to compare households’ energy-saving awareness during the morning (until 9 a.m.) or evening hours (until 9 p.m.).
Home appliances used for space heating and space cooling (which consume high amounts of energy) may be strongly associated with energy-saving awareness.

Based on these results, a majority of households are likely to be at home during the evening and morning hours. Therefore, policymakers who aim to implement effective energy policies for reducing household energy consumption should consider conducting surveys during these hours, as doing so reaches a broader range of households. Furthermore, the results indicate that electricity consumption is comparatively high in some households during the evening. It is therefore recommended that policymakers encourage households to use energy-efficient equipment for heating, cooling, and cooking to minimize electricity usage during the evening.

As explained above, we used LightGBM to extract important features from households based on their characteristics and behaviors. Energy-based companies can use this information to develop targeted marketing strategies and design pricing and tariffs to encourage electricity saving, promote energy-saving appliances, or incentivize renewable energy sources. Additionally, the aforementioned information extracted by LightGBM can help governments analyze households’ energy consumption patterns and develop efficient energy-conservation policies for enhancing households’ energy-saving behavior.

Then, we apply the k-means clustering algorithm to the energy data of all 342 households based on the three features (1), (2), and (3) mentioned above. We set the number of clusters to $k = 3$ and obtain the result shown in Figure 2.

The numbers of households in Groups 0, 1, and 2 are 60, 164, and 118, respectively; Figure 3 shows the resulting clusters of the 342 households in the 3D feature space.

The results of the k-means clustering can be used to identify different groups of households with distinct energy consumption patterns and behaviors. This information can be used to design energy policies that can help to reduce household energy consumption and evaluate the economic impact of those policies on different groups of households while promoting sustainable energy practices.

We then check whether the clustering splits the households according to their energy-saving awareness. For this, we evaluate the statistical significance of the clusters for the 237 households that responded to the questionnaire survey using the $\chi^2$ test.

The null hypothesis assumes that the change in energy-saving awareness and the cluster assignment are independent. The $\chi^2$ value and the p-value are 25.26 and 0.001, respectively. Since the p-value is lower than the significance level of 0.05, the null hypothesis is rejected, and we conclude that the clusters reflect the changes in households’ energy conservation awareness.
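As a consistency check, the p-value can be recomputed from the reported statistic of 25.26. Assuming a 3 × 5 contingency table (three clusters and five answer options), the test has (3 − 1)(5 − 1) = 8 degrees of freedom. For an even number of degrees of freedom the $\chi^2$ upper-tail probability has the closed form $e^{-x/2}\sum_{j=0}^{\nu/2-1} (x/2)^j / j!$, so only the standard library is needed; `chi2_sf_even_df` is a hypothetical helper.

```python
import math


def chi2_sf_even_df(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-squared variable with even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form requires a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


dof = (3 - 1) * (5 - 1)           # 3 clusters x 5 answer options (assumed table shape)
p = chi2_sf_even_df(25.26, dof)
print(f"dof={dof}, p={p:.4f}")    # prints: dof=8, p=0.0014
```

The result, about 0.0014, is consistent with the reported 0.001 after rounding and lies well below the 0.05 significance level.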

Table 3 shows that the clusters have different mean values of the change in energy conservation awareness. Cluster 0 has the highest mean; because option 1 in the questionnaire survey corresponds to households with the highest level of energy conservation awareness, cluster 0 represents the lowest level of awareness. Cluster 1 has the lowest mean and therefore represents the highest level of energy conservation awareness. The mean for cluster 2 lies between those of cluster 1 and cluster 0. Hence, households in cluster 2 show a greater increase in energy-saving awareness than households in cluster 0 and a smaller increase than households in cluster 1.
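The ordering argument can be made explicit with the counts and means from Table 3. Since option 1 denotes the largest increase in awareness, a lower mean answer indicates higher awareness; the name `TABLE3` is only illustrative.

```python
# (count, mean answer) per cluster, copied from Table 3.
TABLE3 = {0: (39, 3.02), 1: (117, 2.30), 2: (81, 2.45)}

# Lower mean answer = higher energy-saving awareness (option 1 is "Very high").
ranking = sorted(TABLE3, key=lambda c: TABLE3[c][1])
n_respondents = sum(count for count, _ in TABLE3.values())
print("clusters from highest to lowest awareness:", ranking)  # [1, 2, 0]
print("respondents covered:", n_respondents)                  # 237
```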

The relationship between the level of energy conservation awareness and the clusters can be seen in the violin plot in Figure 4. A violin plot combines a box plot and a kernel density plot, depicting both summary statistics and the density of each variable. Figure 4 shows the probability density of the change in energy conservation awareness (the option number chosen in the survey) for each of the three clusters.

The black center line divides each violin into two mirrored halves of the kernel density estimate, showing the data distribution. Wider sections of a violin indicate a higher probability density of the corresponding change in energy-saving awareness, and narrower sections a lower density. The white dot, the thick line, and the thin line in the center of each violin represent the median, the interquartile range, and the rest of the distribution of the change in energy-saving awareness for each cluster, respectively. Because they are based on only three features, these clusters should play an important role in preparing effective behavioral intervention policies to encourage household energy-saving awareness.

As explained above, we used statistical hypothesis testing to determine the statistical significance of the results obtained by analyzing the energy data with LightGBM and k-means clustering. This assures policymakers that the results are statistically verified and can inform business strategies and decisions, product development, targeted marketing strategies, and energy policies based on behavioral intervention and household energy profiles.

The results of our model have the potential to help the government implement different energy policies and to help energy companies analyze the economic impact of those policies and provide better services to households. At the same time, these results can benefit households financially by lowering their monthly utility bills.

4. Conclusions

In this paper, we propose to use the LightGBM classifier to extract important features by using energy data and questionnaire survey results collected from households in Kitakyushu, Japan. We have revealed that key features are the standard deviations of the electricity purchased between 8 a.m. and 9 a.m. and the power consumed between 7 p.m. and 9 p.m.

We used the k-means clustering algorithm to split the household electricity data into three groups based on the important features and, with the help of statistical hypothesis testing, showed that these three groups have distinct levels of energy-saving awareness. These three clusters can be used to predict the energy-saving awareness of other households, including those that did not participate in the questionnaire survey. This article contributes to the development of machine learning-based models for predicting and influencing energy-saving awareness. The results reported in this article are highly encouraging and can be used to design effective intervention strategies for enhancing households’ energy-saving behaviors, which can be a significant step towards nearly zero energy houses.

Author Contributions

Conceptualization, N.K.S. and T.F.; methodology, N.K.S. and T.F.; software, N.K.S. and T.F.; validation, N.K.S. and T.F.; formal analysis, N.K.S. and T.F.; investigation, N.K.S. and T.F.; resources, M.N.; data curation, N.K.S. and T.F.; writing—original draft preparation, N.K.S. and T.F.; writing—review and editing, M.N.; visualization, N.K.S. and T.F.; supervision, M.N.; project administration, M.N.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by JSPS KAKENHI Grant Nos. 23H01436, 22H00512, 22H01653, 22KK0155, and also the Japanese Ministry of Environment.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is unavailable due to privacy restrictions.

Acknowledgments

We would like to express our gratitude to Yoshiaki Ushifusa for his generous provision of data and invaluable support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GBM	Gradient-boosting machine
HEMS	Home energy management systems
ZEH	Zero energy house
STD	Standard deviation
RF	Random forest
MLR	Multiple regression
MNN	Multilayer neural network

References

1. Geng, Y.; Chen, W.; Liu, Z.; Chiu, A.S.; Han, W.; Liu, Z.; Zhong, S.; Quia, Y.; Wei, Y.; Cui, X. A bibliometric review: Energy consumption and greenhouse gas emissions in the residential sector. J. Clean. Prod. 2017, 159, 301–316.
2. Sheng, P.; He, Y.; Guo, X. The impact of urbanization on energy consumption and efficiency. Energy Environ. 2017, 28, 673–686.
3. Dorian, J.P.; Franssen, H.T.; Simbeck, D.R. Global challenges in energy. Energy Policy 2006, 34, 1984–1991.
4. Dong, K.; Hochman, G.; Timilsina, G.R. Do drivers of CO₂ emission growth alter overtime and by the stage of economic development? Energy Policy 2020, 140, 111420.
5. Schleussner, C.-F.; Rogelj, J.; Schaeffer, M.; Lissner, T.; Licker, R.; Fischer, E.M.; Knutti, R.; Levermann, A.; Frieler, K.; Hare, W. Science and policy characteristics of the Paris Agreement temperature goal. Nat. Clim. Chang. 2016, 6, 827–835.
6. Guzović, Z.; Duić, N.; Piacentino, A.; Markovska, N.; Mathiesen, B.V.; Lund, H. Paving the way for the Paris Agreement: Contributions of SDEWES science. Energy 2022, 263, 125617.
7. Nejat, P.; Jomehzadeh, F.; Taheri, M.M.; Gohari, M.; Abd Majid, M.Z. A global review of energy consumption, CO₂ emissions and policy in the residential sector (with an overview of the top ten CO₂ emitting countries). Renew. Sustain. Energy Rev. 2015, 43, 843–862.
8. Park, E.; Kim, B.; Park, S.; Kim, D. Analysis of the Effects of the Home Energy Management System from an Open Innovation Perspective. J. Open Innov. 2018, 4, 31.
9. Zhou, B.; Li, W.; Chan, K.W.; Cao, Y.; Kuang, Y.; Liu, X.; Wang, X. Smart home energy management systems: Concept, configurations, and scheduling strategies. Renew. Sustain. Energy Rev. 2016, 61, 30–40.
10. Beaudin, M.; Zareipour, H. Home energy management systems: A review of modelling and complexity. Renew. Sustain. Energy Rev. 2015, 45, 318–335.
11. AlFaris, F.; Juaidi, A.; Manzano-Agugliaro, F. Intelligent homes’ technologies to optimize the energy performance for the net zero energy home. Energy Build. 2017, 153, 262–274.
12. Iweka, O.; Liu, S.; Shukla, A.; Yan, D. Energy and behaviour at home: A review of intervention methods and practices. Energy Res. Soc. Sci. 2019, 57, 101238.
13. Słupik, S.; Kos-Łabędowicz, J.; Trzęsiok, J. Energy-Related Behaviour of Consumers from the Silesia Province (Poland)—Towards a Low-Carbon Economy. Energies 2021, 14, 2218.
14. Zhao, X.; Cheng, H.; Zhao, H.; Jiang, L.; Xue, B. Survey on the households’ energy-saving behaviors and influencing factors in the rural loess hilly region of China. J. Clean. Prod. 2019, 230, 547–556.
15. Zhang, Y.; Bai, X.; Mills, F.P.; Pezzey, J.C.V. Rethinking the role of occupant behavior in building energy performance: A review. Energy Build. 2018, 172, 279–294.
16. Ali, U.; Buccella, C.; Cecati, C. Households electricity consumption analysis with data mining techniques. In Proceedings of the IECON 2016—42nd Annual Conference of the IEEE Industrial Electronics Society, Florence, Italy, 23–26 October 2016.
17. Olu-Ajayi, R.; Alaka, H.; Owolabi, H.; Akanbi, L.; Ganiyu, S. Data-Driven Tools for Building Energy Consumption Prediction: A Review. Energies 2023, 16, 2574.
18. Shen, M.; Lu, Y.; Wei, K.H.; Cui, Q. Prediction of household electricity consumption and effectiveness of concerted intervention strategies based on occupant behaviour and personality traits. Renew. Sustain. Energy Rev. 2020, 127, 109839.
19. Nilsson, A.; Wester, M.; Lazarevic, D.; Brandt, N. Smart homes, home energy management systems and real-time feedback: Lessons for influencing household energy consumption from a Swedish field study. Energy Build. 2018, 179, 15–25.
20. Sun, M.; Konstantelos, I.; Strbac, G. C-Vine Copula Mixture Model for Clustering of Residential Electrical Load Pattern Data. IEEE Trans. Power Syst. 2017, 32, 2382–2393.
21. Estiri, H. Building and household X-factors and energy consumption at the residential sector. Energy Econ. 2014, 43, 178–184.
22. Ahmed Gassar, A.A.; Yun, G.Y.; Kim, S. Data-driven approach to prediction of residential energy consumption at urban scales in London. Energy 2019, 187, 115973.
23. Zhou, K.; Yang, S. Understanding household energy consumption behavior: The contribution of energy big data analytics. Renew. Sustain. Energy Rev. 2016, 56, 810–819.
24. Liu, X.; Tang, H.; Ding, Y.; Yan, D. Investigating the performance of machine learning models combined with different feature selection methods to estimate the energy consumption of buildings. Energy Build. 2022, 273, 112408.
25. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
26. Bassi, A.; Shenoy, A.; Sharma, A.; Sigurdson, H.; Glossop, C.; Chan, J.H. Building Energy Consumption Forecasting: A Comparison of Gradient Boosting Models. In Proceedings of the IAIT2021: The 12th International Conference on Advances in Information Technology, Bangkok, Thailand, 29 June–1 July 2021; ACM: New York, NY, USA, 2021.
27. Zhang, Y.; Teoh, B.K.; Wu, M.; Chen, J.; Zhang, L. Data-driven estimation of building energy consumption and GHG emissions using explainable artificial intelligence. Energy 2023, 262, 125468.
28. Emami Javanmard, M.; Ghaderi, S.F.; Hoseinzadeh, M. Data mining with 12 machine learning algorithms for predict costs and carbon dioxide emission in integrated energy-water optimization model in buildings. Energy Convers. Manag. 2021, 238, 114153.
29. Nagahara, M. Sparsity Methods for Systems and Control; Now Publishers: Norwell, MA, USA, 2020.

Figure 1. Feature importance using the LightGBM algorithm.

Figure 2. Clusters based on changes in energy-saving awareness of households (using k-means clustering).

Figure 3. Clustering result in the 3D feature space.

Figure 4. Violin plot depicting the statistical characteristics and density of changes in energy conservation awareness for all clusters.

Table 1. Household energy data description.

Energy Type	Unit
Electricity purchased	Watt-hour
Electricity sold	Watt-hour
Electricity produced by solar panels	Watt-hour
Net domestic electricity consumption	Watt-hour

Table 2. F1 score in each fold.

Fold Number	F1 Score
1	0.05
2	0.05
3	0.10
4	0.04
5	0.13
6	0.04

Table 3. Statistical characteristics of clusters based on changes in energy-saving awareness.

Cluster	Count	Mean	Std	Min	25%	50%	75%	Max
0	39	3.02	1.13	1	2	3	4	5
1	117	2.30	0.87	1	2	2	3	4
2	81	2.45	0.93	1	2	2	3	5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
