Monthly Electric Load Forecasting Using Transfer Learning for Smart Cities

Monthly electric load forecasting is essential to efficiently operate urban power grids. Although diverse forecasting models based on artificial intelligence techniques have been proposed with good performance, they require sufficient datasets for training. In the case of monthly forecasting, because just one data point is generated per month, it is not easy to collect sufficient data to construct models. This lack of data can be alleviated using transfer learning techniques. In this paper, we propose a novel monthly electric load forecasting scheme for a city or district based on transfer learning using similar data from other cities or districts. To do this, we collected the monthly electric load data from 25 districts in Seoul for five categories and various external data, such as calendar, population, and weather data. Then, based on the available data of the target city or district, we selected similar data from the collected datasets by calculating the Pearson correlation coefficient and constructed a forecasting model using the selected data. Lastly, we fine-tuned the model using the target data. To demonstrate the effectiveness of our model, we conducted an extensive comparison with other popular machine-learning techniques through various experiments. We report some of the results.


Introduction
With the recent increase in the use of fossil fuels to cope with the explosive demand for energy, diverse global problems, such as greenhouse gas emissions and energy resource depletion, have attracted much attention. For instance, many efforts have been made to reduce greenhouse gas emissions [1,2]. One of the representative efforts of the local government is the transition to a smart city. A smart city can be defined as a developed urban area that employs big data and advanced information and communication technologies (ICTs) to improve sustainability through planning [3,4]. Smart cities can reduce greenhouse gas emissions by reducing traffic congestion and energy consumption via big data analysis and then introducing alternatives, such as electric vehicles and renewable energy [5]. The transition to a smart city has been accelerated and refined by various artificial intelligence (AI) technologies, which have already enhanced the performance of various ICT areas, such as communications [6,7], applications [8,9], content [10,11], and digital commerce [12,13].
The infrastructure for electricity forms the basis for maintaining the fundamental survival of members of society [14]; hence, it is imperative to supply electricity stably and efficiently in an environmentally friendly manner [15]. The current electric grid can respond to different contextual purposes in diverse environments using various subsystems and components [16]. However, because traditional power systems operate based on old technology, it is difficult to meet new requirements, such as optimal power generation [17]. Improving the productivity and quality of the electricity supply is one of these requirements. The main contributions of this paper are as follows:
1. We use the transfer learning technique to perform accurate hierarchical monthly electric load forecasting for metropolitan cities using public datasets.
2. By calculating the Pearson correlation coefficients (PCCs), we select relevant source domains that can improve the effectiveness of transfer learning.
3. We demonstrate that our proposed model can exhibit higher performance than popular statistical and ensemble methods.
This paper is organized as follows. In Section 2, we introduce several related studies on electric load forecasting. In Section 3, we describe our forecasting model. In Section 4, we present the experiments conducted to evaluate the performance of our forecasting model. Finally, in Section 5, we conclude the paper.

Related Work
Thus far, many studies have been conducted to construct electric load forecasting models using various techniques, such as statistics and machine learning. In particular, because deep learning algorithms are used to construct such models, it has become critical to acquire sufficient data for training. When the available data for training a model are extremely limited, transfer learning can be considered to solve the data shortage problem. Table 1 summarizes the various models for load forecasting, including a SARIMA model for Thailand's electric utility with a one-month horizon using time factors [36], an SVM model for four buildings in Singapore using weather information and economic factors [37], an MLR model for large-scale public buildings in Xi'an using historical load, time factors, and weather information [38], and deep learning models for about one million customers from Bexar [39].

Artificial Neural Network-Based Models
Hossen et al. [30] proposed a deep neural network (DNN)-based load forecasting model. To train the model, they collected ninety days of hourly electric load data. When constructing the forecasting model, they considered temperature, wind speed, solar irradiance, and the prior load data as input. They investigated various combinations of activation functions for short-term load forecasting and compared the results.
Kuo and Huang [31] developed an hourly electric load forecasting model based on a convolutional neural network (CNN). To build the forecasting model, they considered the electric load data from the past seven days for input variables and the next three days for output variables. Through an extensive comparison with other models based on support vector machine (SVM), random forest (RF), decision tree, and other methods, they demonstrated that the proposed model achieved the best prediction performance.
In addition, Chitsaz et al. [32] proposed a self-recurrent wavelet neural network (SRWNN)-based electric load forecasting model. The SRWNN is a modified wavelet neural network model that includes the properties of the dynamics of a recurrent neural network (RNN). Their forecasting model predicted the hourly electric load of one building and two power systems. Moreover, Hosein et al. [33] built short-term load forecasting models using the DNN and popular machine-learning techniques such as weighted moving average, linear regression, regression trees, and support vector regression, and compared their performance. They demonstrated that the DNN-based models outperform the machine-learning-based models.

Mid-Term Load Forecasting Models
Damrongkulkamjorn et al. [36] constructed a monthly electric energy forecasting model using the seasonal autoregressive integrated moving average (SARIMA). They presented the idea of forecasting the trend-cycle portion of the decomposition method using the autoregressive integrated moving average (ARIMA) method. The seasonal component was estimated using an averaging method. Then, they compared their forecasting model with standard approaches and estimated the trend-cycle using a best-fit mathematical function called the S-curve.
Furthermore, Dong et al. [37] proposed a forecasting model using the SVM to predict the monthly electricity consumption of four buildings in a tropical region. They trained the model using data collected over the previous three years, and then the trained model was applied to one year of data. They demonstrated that their forecasting model exhibits effective performance.
Ma et al. [38] integrated multiple linear regression and self-regression methods to predict monthly electricity consumption for large-scale public buildings. They eliminated the error caused by the self-selection of variables through the self-regression analysis.
In addition, Berriel et al. [39] developed several forecasting models based on deep learning approaches for monthly energy consumption prediction. They used more than 10 million samples from almost one million customers to build forecasting models based on the DNN, CNN, and long short-term memory (LSTM). They confirmed that the LSTM model has the highest prediction performance.

Transfer Learning-Based Models
Mocanu et al. [40] constructed a building energy prediction model using reinforcement learning. They presented a deep belief network (DBN) for feature extraction and then extended two reinforcement learning algorithms to perform knowledge transfer between a reference building and a target building. They found that reinforcement learning using DBNs for continuous-state estimation can successfully perform energy prediction. Moreover, they found that the proposed method can be used to apply trained models to other buildings.
Ribeiro et al. [41] proposed a transfer learning method called Hephaestus for cross-building energy forecasting, based on time-series multi-feature regression with seasonal and trend adjustment, and applied it to forecast the energy consumption of educational buildings. They collected energy consumption data from similar buildings with different energy magnitudes and merged them. The data were processed using the Hephaestus steps. They confirmed that the proposed approach could improve energy prediction for a university by using additional data from other universities.
Moreover, Hooshmand and Sharma [42] proposed a transfer learning method based on a CNN and demonstrated their approach to the use case of daily electric load forecasting. They confirmed that the proposed transfer learning strategy on the CNN model demonstrates superior prediction performance compared to other forecasting methods, such as SARIMA and the basic CNN model.
Diverse machine-learning and deep learning techniques have been proposed for monthly load forecasting. However, few studies have investigated deep learning models that use transfer learning. To the best of our knowledge, this is the first study that uses transfer learning based on the similarity between data for mid-term load forecasting (MTLF).

Proposed Model
In this section, we first describe the dataset we used to construct our monthly load forecasting model. The dataset, which is provided by Seoul, contains the monthly electric load of 25 districts in Seoul from January 2005 to December 2018. The dataset can be divided into five categories: household, public, industrial, service, and total [43]. Figure 1 illustrates the details of the dataset. The household dataset comprises the monthly electricity consumption of purely residential customers. The public dataset contains the monthly electricity consumption for public purposes, such as social infrastructure and government organizations. Likewise, the industrial dataset contains the monthly electricity consumption for mining, manufacturing, construction, and so on. The service dataset includes the monthly electricity consumption for water/wastewater treatment, fast transit systems, commercial offices, and retail buildings. Finally, the total dataset represents the sum of the electricity consumption of these four categories. As a result, the dataset comprises 125 monthly load sequences (25 districts × five categories) spanning 14 years, which are used as source domains for transfer learning.

Calendar Data
In general, the amount of electricity consumption is determined by diverse factors. Among them, we considered three major factors. The first major factor is calendar data, such as the year, month, season, number of days, number of weekends, and number of holidays. The data are summarized with a brief description of the data representation in Table 2.
Months have periodic properties [44]. For example, because December and January are temporally adjacent to each other, they have similar characteristics in terms of temperature and season. However, if we represent them numerically using 12 and 1, the difference in the categorical format becomes 11. To reflect this periodicity, we transform the month data into continuous data using Equations (1) and (2) [45].
Additionally, electricity consumption is usually reduced on holidays. Hence, the numbers of days and holidays in a month are related to the electricity consumption of that month [46]. In particular, because the number of holidays on weekdays may significantly affect the model, we also considered the number of holidays on weekdays as an input variable. In addition, an extended holiday season is likely to generate a different electricity demand pattern than usual. Thus, the maximum holiday length of the month was also considered. We collected the holiday data from the website Time and Date (https://www.timeanddate.com). Consequently, we chose 24 input variables from the calendar data.
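Equations (1) and (2), referenced above, presumably take the standard form of this cyclical encoding; assuming m ∈ {1, ..., 12} denotes the month index, each month is mapped onto the unit circle:

$\text{month}_{\sin} = \sin\left(2\pi m / 12\right)$  (1)
$\text{month}_{\cos} = \cos\left(2\pi m / 12\right)$  (2)

With this representation, temporally adjacent months such as December (m = 12) and January (m = 1) are mapped to nearby points, so their numerical distance reflects their seasonal similarity.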

Population Data
The second factor that we considered for the model construction was the population data for a district [47]. The population data for the districts in Seoul are available from the Seoul Open Data Plaza (http://data.seoul.go.kr/). Table 3 presents the input variables representing the population data, which include the population, area of the district, population density, in-migration, out-migration, and net migration. These data are determined in March each year based on the ID card information. The population data are of two types: annual and monthly. The annual population data contain the population, district area, and population density, whereas the monthly population data contain in-migration, out-migration, and net migration. Although the population of a district is highly correlated with the electricity consumption of that district, we also considered the monthly population data for a more detailed analysis.

Weather Data
The last factor is the weather data, such as temperature, humidity, and precipitation [48]. We collected the monthly weather data from the Korea Meteorological Administration (KMA). The KMA provides diverse types of forecasts, such as three-day ahead, mid-term, and long-range forecasts, depending on the time resolution. The long-range forecast consists of one-month and three-month outlooks. Because the one-month outlook provides a monthly weather forecast, we used it for MTLF. The one-month outlook includes the average temperature, average maximum temperature, average minimum temperature, and total precipitation for one month, which we used as the weather data.

Model Construction
In this paper, we aim to construct a forecasting model for a specific category of a specific district. Hence, the target domain is represented by the district name and electricity category, and the other combinations of districts and categories become the source domains. As mentioned, we have a total of 125 combinations from 25 districts and five categories. We first constructed a monthly electric load forecasting model using similar electric load data selected from the remaining 124 load sequences, which are the source domains. To do this, we first divided each source dataset and the target dataset into a training set and a test set. To find similar load data among the source domains, we calculated the PCCs between the training sets of the source data and the target data. Then, using the selected source data, we constructed a DNN-based forecasting model. Finally, we fine-tuned the forecasting model using the training set of the target data and evaluated it using the test set of the target data. The overall steps for constructing our model are illustrated in Figure 2.
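As a minimal illustrative sketch of this selection step (assuming, hypothetically, that the 125 monthly sequences are stored as columns of a pandas DataFrame indexed by month; the column names and helper function below are not from the paper), the source domains most similar to a target series can be ranked by the absolute PCC computed over the training period:

import pandas as pd

def select_source_domains(loads, target, train_end="2016-12", top_k=10):
    """Rank candidate source domains by |PCC| with the target over the training period.

    loads:     DataFrame of monthly loads, one column per district-category combination
    target:    column name of the target domain (e.g., "Jongno_total", illustrative)
    train_end: last month of the training period (January 2005 to December 2016 in the paper)
    top_k:     number of most similar source domains to keep (10, 20, or 30 in the paper)
    """
    train = loads.loc[:train_end]                 # restrict to the training period
    candidates = train.drop(columns=[target])     # the remaining 124 source sequences
    pcc = candidates.corrwith(train[target])      # Pearson correlation per column
    return pcc.abs().sort_values(ascending=False).head(top_k).index.tolist()

The selected sequences would then be concatenated to form the pre-training set, as described in the following subsections.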

Deep Neural Network
The artificial neural network (ANN) is a machine-learning algorithm that imitates the human brain and consists of three types of layers: an input layer, one or more hidden layers, and an output layer [49,50]. Each layer consists of several nodes called perceptrons. Each node takes values from the nodes in the previous layer and determines an activation for the next nodes through an activation function, such as the sigmoid function, hyperbolic tangent function, rectified linear unit (ReLU), exponential linear unit (ELU), or scaled exponential linear unit (SELU). This process is repeated, and the activation of the nodes in the final layer (the output layer) produces the desired outputs. If the number of hidden layers in the ANN is at least two, the network is called a DNN [50]. Figure 3 illustrates a typical DNN structure for MTLF. The more hidden layers in the DNN, the more complex the network becomes. Increasing the network complexity may improve the performance of the network. However, if the structure of the DNN is more complex than necessary, the model could exhibit poor performance for unseen data, which is called overfitting [51]. Consequently, selecting the proper number of hidden layers is essential.
Various hyper-parameters must be considered to construct a DNN-based MTLF model, such as the number of hidden layers, the number of nodes in each hidden layer, and the activation function. To determine the hyper-parameters of the DNN model, we performed several comparative experiments. We set the number of hidden layers to 3, 4, 5, and 6, and considered SELU, ReLU, and ELU as activation functions. Table 4 shows the results. Based on them, the number of hidden layers was set to five, and SELU was used as the activation function [49]. Compared with the ReLU and ELU functions, which are typically used in DNNs, SELU increases the convergence speed of stochastic gradient descent. The SELU is defined by Equation (3), where α and λ are fixed constants and x is the input value. In the node configuration of the DNN model, the input layer comprises 33 nodes, and each hidden layer comprises 22 nodes, obtained by applying two-thirds of the number of nodes of the input layer [52,53]. The output layer has only one node. Furthermore, we set the rest of the hyper-parameters as follows: batch size to 12, number of epochs to 1000, learning rate to 0.0001, and optimizer to adaptive moment estimation (Adam). We use the mean squared error (MSE) as the loss function, which is defined in Equation (4), where n is the number of observations and y_i and ŷ_i are the actual and predicted values at time i.
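For reference, the standard definitions to which Equations (3) and (4) presumably correspond are:

$\mathrm{SELU}(x) = \lambda x$ for $x > 0$, and $\mathrm{SELU}(x) = \lambda \alpha \left(e^{x} - 1\right)$ for $x \le 0$  (3)
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$  (4)

where the fixed constants of SELU are commonly taken as α ≈ 1.6733 and λ ≈ 1.0507.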

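As a minimal sketch of the described architecture (using tf.keras, which is available in TensorFlow 1.13.1; the authors' exact implementation may differ, and the data variable names in the usage example are illustrative), the model could be built as follows:

import tensorflow as tf

def build_mtlf_dnn(n_inputs=33, n_hidden_layers=5, n_hidden_nodes=22):
    """DNN for MTLF: 33 inputs, five SELU hidden layers of 22 nodes, one output node."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(n_hidden_nodes, activation='selu', input_shape=(n_inputs,)))
    for _ in range(n_hidden_layers - 1):
        model.add(tf.keras.layers.Dense(n_hidden_nodes, activation='selu'))
    model.add(tf.keras.layers.Dense(1))                      # single-node output: next month's load
    model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.0001), loss='mse')
    return model

# Usage with the hyper-parameters stated in the text (X_train, y_train are illustrative names):
# model = build_mtlf_dnn()
# model.fit(X_train, y_train, batch_size=12, epochs=1000)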

Transfer Learning
When limited data are available from the target domain for training, adaptation techniques, such as transfer learning, can be employed to improve the performance of a forecasting model [34,54]. Figure 4 illustrates a typical structure for transfer learning, which reuses the weights of a neural network trained on a large dataset similar to the target domain dataset. This process is called pre-training. Then, the pre-trained network is trained again using the smaller target domain dataset. This process is called fine-tuning. When the data in the target domain and the source domain are similar, the resulting model could exhibit satisfactory performance. Thus, to find similar source domain data, correlation analysis methods, such as the PCC analysis, can be used to determine the similarity between two domains [34,55]. The PCC is defined by Equation (5), where R_XY denotes the PCC between X and Y, n is the number of observations, and X_i and Y_i are the values at time i. Moreover, X̄ and Ȳ are the mean values of X and Y, respectively.
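With these symbols, Equation (5) presumably takes the standard form of the Pearson correlation coefficient:

$R_{XY} = \dfrac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^{2}}}$  (5)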
The PCC measures the linear correlation between two variables X and Y and has a value between +1 and −1. Whereas a value of +1 (−1) indicates a total positive (negative) linear correlation between them, 0 indicates no linear correlation. Figure 5 illustrates an example of constructing a transfer learning-based DNN model, where X_i,S and X_i,T are the inputs of the source domain and the target domain at time i, respectively, and Y_i,S and Y_i,T are the corresponding outputs. When selecting the source domains, to consider both positive and negative trends, we selected domains with an absolute PCC value close to 1. Using the selected source domains as a training set, we pre-trained the DNN model and then fine-tuned it using the training set of the target domain.
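A minimal sketch of this pre-train/fine-tune procedure, continuing the tf.keras sketch above (the variable names X_source, y_source, X_target_train, y_target_train, and X_target_test are illustrative, and training settings beyond those stated in the text are assumptions):

# Pre-train on the concatenated top-k source domains (selected by |PCC|),
# then fine-tune the same weights on the target domain's training set.
model = build_mtlf_dnn()                                               # architecture sketched above
model.fit(X_source, y_source, batch_size=12, epochs=1000, verbose=0)   # pre-training on source domains
model.fit(X_target_train, y_target_train, batch_size=12, epochs=1000, verbose=0)  # fine-tuning on target
y_pred = model.predict(X_target_test)                                  # forecasts for the target test set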

Experimental Results
To evaluate the effectiveness of our model, we performed extensive experiments. For the experiments, we collected five categories of monthly electric load data for 25 districts in Seoul from January 2005 to December 2018. Among them, we used the data from January 2005 to December 2016 as a training set and the data from January 2017 to December 2018 as a testing set. In the experiments, we considered every combination of district and category as a target domain and the other data as source domains. All experiments were performed in Python 3.7.6, and the models were constructed using TensorFlow 1.13.1. Figure 6 shows part of a table containing the PCC values calculated for the 125 domains. Each cell is marked with a distinct color according to the PCC value. The figure schematically shows the data characteristics by district and category. For instance, Jongno's total is very similar to most categories of other districts because they have high PCC values. In contrast, Yongsan's industrial is very different from most categories of other districts because they show low PCC values. This indicates that Jongno's total data are suitable for transfer learning, while Yongsan's industrial data are not.
Meanwhile, Jongno's industrial is not similar to Jung's industrial, even though they belong to the same category. Rather, it is more similar to Jung's service even though they belong to different categories. As a result, electric data from various domains can be effectively used for transfer learning if the similarity between data is well considered.
In addition, to observe the effect of the amount of data used for training the forecasting model, we constructed three different DNN models by selecting the top 10 (DNN_T10), 20 (DNN_T20), and 30 (DNN_T30) most similar domains in terms of the PCC. Table 5 lists the average of the highest 10, 20, and 30 PCC values in the five categories for each district.
For the performance comparison, we considered the mean absolute percentage error (MAPE) and the normalized root mean squared error (NRMSE) [20,50]. The MAPE and NRMSE are defined by Equations (6) and (7), where n is the number of observations, y_i and ŷ_i are the actual and predicted values at time i, respectively, and ȳ is the mean of the actual values.
In addition to these three models, we considered four more forecasting models for comparison [56]: multiple linear regression (MLR), RF, extreme gradient boosting (XGB), and a baseline DNN. We calculated the MAPE and NRMSE values for all categories and then calculated the average value for each district. Tables 6 and 7 present the average MAPEs and NRMSEs of the five datasets for each district. The most commonly used method for tuning hyper-parameters is a grid search, which tries all possible combinations of the hyper-parameters of interest. Therefore, we selected the optimal hyper-parameters for the RF and XGB models using a grid search with cross-validation. Hyper-parameter tuning that divides the data into training, validation, and test sets can cause overfitting to the validation set. To prevent this problem, we divided the data into training and test sets and then used 5-fold cross-validation on the training data.
Figures 7 and 8 illustrate the box plots of the MAPE and NRMSE for the forecasting models. Our three transfer learning-based MTLF models exhibit better prediction performance than the baseline DNN and the other machine-learning-based models. Figure 9 illustrates this trend intuitively by presenting the MAPE reduction rate against the PCC values for DNN_T10, DNN_T20, and DNN_T30. The x-axis represents the PCC, and the y-axis represents the MAPE reduction rate, with each point representing the MAPE reduction rate for each PCC of the proposed models. The red line in the middle represents the trend line of the MAPE values. Because the trend line in Figure 9 has a positive slope, an increase in the PCC value used for transfer learning is closely related to an improvement in performance.
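For reference, the standard definitions consistent with the symbols above, to which Equations (6) and (7) presumably correspond (the normalization of the NRMSE by ȳ is assumed from the stated definition of ȳ as the mean of the actual values), are:

$\mathrm{MAPE} = \dfrac{100}{n}\sum_{i=1}^{n}\left|\dfrac{y_i - \hat{y}_i}{y_i}\right|$  (6)
$\mathrm{NRMSE} = \dfrac{1}{\bar{y}}\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$  (7)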

Conclusions
Because collecting sufficient monthly electricity consumption data is challenging due to the long recording period, it is difficult to build a sophisticated forecasting model based on an AI technique. To address this issue, in this study, we developed transfer learning-based DNN models for monthly load forecasting. We used monthly electric load data collected over 14 years from Seoul public data. We configured 33 input variables of three types and constructed DNN-based MTLF models with 22 nodes in each of five hidden layers. We used the PCC to determine the similarity between two domains. Then, we selected the top 10, 20, and 30 domains with the highest PCC values as the source domains and concatenated them to develop a training set for pre-training the DNN model. The pre-trained DNN model was fine-tuned using a training set for the target domain. We compared the performance of the proposed models with that of machine-learning-based models, such as MLR, RF, and XGB, and a basic DNN model. We adopted the MAPE and NRMSE, which are the most popular performance metrics, to compare the prediction performance. We demonstrated that our model outperforms the existing machine-learning-based models and the basic DNN model. Consequently, we conclude that the prediction performance improved when using transfer learning compared to the basic DNN.
In future studies, we plan to collect additional datasets from other regions and then verify that our model is applicable to different datasets. Furthermore, by examining other correlation coefficients, we can determine which correlation measures are useful for transfer learning.