1. Introduction
Electricity is a cornerstone of modern living, assisting essential activities such as cooking, heating, lighting, and powering countless electronic devices. The demand for electricity is addressed through contracts of varying voltage types, determined by the volume of electricity required and the nature of the equipment used. These contracts are broadly categorized into Low Voltage (LV), typically for residential buildings, small offices, and shops, and High Voltage (HV), for larger establishments such as malls and factories.
The COVID-19 pandemic underscored the profound impact of human behavior on electricity demand. Lockdowns led to significant reductions in industrial sector demand [1], while residential demand surged, exhibiting substantial temporal variations [2]. The continuous presence of individuals at home altered daily routines, leading to shifts in consumption peaks, changes in magnitude, and ultimately reshaped consumption profiles [3]. Although patterns began reverting to pre-pandemic levels once restrictions eased, some areas witnessed historic increases in electricity demand, such as Europe in 2021, driven by a robust rebound effect. Moreover, the rise of both telework and hybrid work styles has diversified consumption profiles, making accurate forecasting increasingly challenging. With demand rising across both residential and industrial sectors, precise Electricity Load Demand (ELD) forecasts are more crucial than ever for balancing consumption with generation, optimizing the energy mix, and facilitating effective energy planning.
ELD forecasting, indexed over time, falls within the domain of Time Series Forecasting (TSF). Techniques for addressing TSF range from traditional decomposition methods [4] and statistical models [5] to modern Machine Learning (ML) approaches like Recurrent Neural Network (RNN) [6]. Recently, Transformer-based models [7,8] have demonstrated superior performance in capturing temporal dependencies and predicting long-term sequences. When electricity consumption is forecasted across multiple geographic locations, the problem transitions into spatiotemporal forecasting. Established methods such as Convolutional Neural Network (CNN) [9], Graph Neural Network (GNN) [10], and hybrid approaches [11] have shown promise for such tasks. In addition, architectures applying attention mechanisms in both spatial and temporal domains [12] further enhance forecasting capabilities. Moreover, treating spatiotemporal forecasting as a multivariate TSF problem aligns particularly well with Transformer-based models, further improving efficiency [13,14]. However, without incorporating external data that reflects human behavior, these solutions will struggle to fully capture the spatiotemporal patterns of ELD.
This study hypothesizes that integrating HV consumption data along with LV data can improve the understanding of human behavior’s influence on electricity demand. LV consumption patterns, reflective of residential activity, exhibit distinct peaks and drops that align with daily routines, such as waking up, leaving for work, or returning home. In contrast, HV consumption patterns, representative of industrial and office activities, predominantly peak during working hours. We postulate that comparing fluctuations in the volume and timing of LV and HV consumption yields valuable insights into human mobility, such as estimates of commuting times based on the temporal differences between LV drops and HV increases. We posit that incorporating both LV and HV data into Deep-Learning (DL) models will enable them to perform these comparisons, thereby enhancing their understanding of human routines and ultimately improving LV consumption predictions. Similarly, the increasing popularity of wearable devices and GPS-enabled technologies has made mobility data, referred to as Human Dynamics (HD) in this study, more accessible. These location-based data, which capture human activity, have already demonstrated their utility in improving ELD forecasts, particularly at broader geographic scales [15]. We presume that such HD data, when applied at a localized level, offer even greater potential for capturing the diverse patterns of daily routines, especially those influencing LV consumption.
This study aims to predict spatiotemporal LV electricity consumption at the mesh level (500 m grid cells) by leveraging features that represent or capture human behavior: HV consumption and HD. Through various experimental scenarios, we evaluate the accuracy improvements of two-day-ahead LV consumption forecasting. The objectives of this work are twofold: (1) to validate whether integrating HV data enhances the model’s understanding of LV consumption patterns; and (2) to determine whether HV or HD data serve as the more effective feature for achieving precise spatiotemporal LV predictions.
The key contributions of this study are as follows:
Demonstrated the benefit of combining HV with LV data to improve the accuracy of LV consumption predictions.
Assessed the importance of HD data in predicting mesh-level spatiotemporal LV consumption, particularly on large regions with significant human mobility.
Examined the challenges of selecting between HV and HD features, considering area characteristics (e.g., Land Use (LU) distribution) and forecasting objectives.
2. Datasets
This study focuses on Utsunomiya City in Tochigi Prefecture, Japan, which was geographically divided into 500 m square meshes (following the division principle of https://nlftp.mlit.go.jp/ (accessed on 15 July 2025)), totaling 1758 meshes.
2.1. Electricity Load Demand
ELD for Utsunomiya City was accessed via GridDataBank Lab (GDBL) (https://www.gdbl.jp/ (accessed on 15 July 2025)), originally provisioned by TEPCO Power Grid, Inc., Tokyo, Japan (TEPCO PG) for demonstration purposes. This dataset includes hourly Low Voltage (LV) and, where applicable, High Voltage (HV) consumption per 500 m mesh from 1 January 2017 to 31 December 2020.
2.2. Human Dynamics
KDDI Corporation (https://k-locationdata.kddi.com/ (accessed on 15 July 2025)) provided smartphone location data from its cellular users who consented to share their positions. The data were anonymized and statistically processed to derive Human Dynamics (HD). The dataset provides half-hourly counts of people staying within a mesh (referred to as Human Dynamics Stay (HDS)) and people moving within a mesh (Human Dynamics Move (HDM)) from 1 October 2017 to 31 October 2020. Further details on the derivation of these metrics can be found in [16].
2.3. Study Period
As the datasets differ in terms of temporal coverage, the study focuses on their common timeframe. To exclude anomalies caused by COVID-19 mobility restrictions, the study period ends on 31 March 2020, just before the first wave of the pandemic in Japan. The data are partitioned as follows: training uses data from 1 October 2017 to 30 September 2018; validation uses data from 1 October 2018 to 30 September 2019; and evaluation is performed on data from 1 October 2019 to 31 March 2020.
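The partition above can be expressed as three half-open date ranges. The sketch below is only an illustration of that chronological split, using the Python standard library; the names `PERIODS` and `assign_period` are hypothetical and not taken from the study's code.

```python
from datetime import datetime
from typing import Optional

# Half-open intervals [start, end) matching the dates given in the text.
PERIODS = {
    "train": (datetime(2017, 10, 1), datetime(2018, 10, 1)),
    "validation": (datetime(2018, 10, 1), datetime(2019, 10, 1)),
    "evaluation": (datetime(2019, 10, 1), datetime(2020, 4, 1)),
}


def assign_period(timestamp: datetime) -> Optional[str]:
    """Return the partition a timestamp belongs to, or None if it is outside the study period."""
    for name, (start, end) in PERIODS.items():
        if start <= timestamp < end:
            return name
    return None
```

Half-open intervals keep the split unambiguous for both the hourly ELD records and the half-hourly HD records, since the last record of each day is covered regardless of sampling rate.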
2.4. Study Coverage
The study considers four features (LV, HV, HDM, and HDS), resulting in eight feature combinations (scenarios). Interpolating missing values without external information is often impractical, especially for prolonged periods, and typically requires extensive domain knowledge. Therefore, to ensure data quality, meshes with over 10% missing data for any of the considered features during either the training or study period were excluded. This initial filtering resulted in 775 meshes.
Next, meshes lacking LV data were excluded, as this study focuses on predicting LV consumption. This step removed one mesh with both HV and HD data and 161 meshes that only provided HD data, corresponding to areas primarily covering Agricultural and Forest lands. As a result, the final set of meshes considered in this study was reduced to 613, referred to as set W.
To account for the scarcity of meshes with HV data and to explore the impact of HV and HD on LV prediction, two additional sets of meshes were defined. Set U includes only meshes with all three types of data (15 meshes in total). Set V expands on this by including meshes with at least two types of data, adding 498 meshes to set U for a total of 513. Thus, the sets follow a nested structure: $U \subset V \subset W$. A detailed breakdown of data availability by feature for these sets is provided in Table 1.
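The construction of the three sets can be sketched as follows, assuming each mesh is described by the data types (LV, HV, HD) that survived the missing-data filter. The function name `build_sets` and the toy mesh identifiers are hypothetical; this illustrates the set logic described above and is not the study's actual pipeline.

```python
def build_sets(availability):
    """Build sets U, V, W from a mapping mesh_id -> set of available data types.

    Data types are drawn from {"LV", "HV", "HD"}. Meshes without LV are dropped,
    since LV is the prediction target.
    """
    with_lv = {mesh: types for mesh, types in availability.items() if "LV" in types}
    set_w = set(with_lv)                                             # every mesh with LV
    set_v = {mesh for mesh, types in with_lv.items() if len(types) >= 2}  # LV plus at least one other type
    set_u = {mesh for mesh, types in with_lv.items() if len(types) == 3}  # LV, HV and HD
    return set_u, set_v, set_w


# Toy example with made-up mesh identifiers.
toy_availability = {
    "a": {"LV", "HV", "HD"},
    "b": {"LV", "HD"},
    "c": {"LV"},
    "d": {"HD"},        # no LV: excluded
    "e": {"HV", "HD"},  # no LV: excluded
}
set_u, set_v, set_w = build_sets(toy_availability)
```

By construction every mesh of U is in V and every mesh of V is in W, which mirrors the nested structure of the study sets.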
2.5. Land-Use
Land Use data, publicly available from the Japanese government (https://nlftp.mlit.go.jp/ (accessed on 15 July 2025)), maps 100 m land meshes to various LU types.
Figure 1 illustrates the LU distribution for Utsunomiya City and the study sets. As shown in Figure 1a, Utsunomiya City features a dense urban center characterized by High Building and Low Dense Building (dark blue), along with Transportation (black) meshes. Surrounding this area is a large residential zone (Low Building, gray) with Public Facilities (pink) concentrated in the south and significant industrial areas (Factory, turquoise) located in the east. The outer regions are predominantly agricultural lands (yellow), rivers (cyan), and forests (dark green).
Figure 1b reveals that set U primarily comprises dense urban meshes, with a high concentration of buildings and transportation areas. Set V, as shown in Figure 1c, expands this coverage to include a broader city center, incorporating several agricultural lands. Finally, Figure 1d illustrates that set W extends coverage even further, encompassing more Agricultural and Forest lands.
2.6. Limitations
Unless explicitly stated otherwise, the datasets used in this study are not publicly available and were accessed exclusively for this research. Moreover, access to the ELD data was limited to a one-month period, constraining the scope for extensive experimentation, such as testing various experimental settings and models. Finally, the data’s geographical coverage was restricted to Utsunomiya City, which may limit the generalizability of the findings to other cities.
3. Methodology
3.1. Goal
This study aims to assess the extent to which Human Dynamics (staying and/or moving) and High Voltage data, either individually or combined, can improve DL models’ ability to account for mobility behavior and enhance LV consumption predictions. To achieve this goal, eight distinct scenarios involving various combinations of LV, HV, HDM, and HDS features are tested across different mesh sets to evaluate their impact on spatiotemporal prediction performance. These experiments provide insights into the relationships between geographical coverage, LU distribution, feature availability, and predictive performance.
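The eight scenarios correspond to LV combined with every subset of the three additional features. The sketch below lists them using the numbering implied by the results discussion (Scenario 1 is LV-only, Scenario 2 adds HV, Scenarios 3 to 5 add HD features, Scenarios 6 to 8 combine HV with HD). The content of Scenario 7 is not stated explicitly in the text and is inferred here by elimination, so it should be read as an assumption.

```python
from itertools import combinations

# Scenario numbering as implied by the text; Scenario 7 is inferred by elimination.
SCENARIOS = {
    1: ("LV",),
    2: ("LV", "HV"),
    3: ("LV", "HDM"),
    4: ("LV", "HDS"),
    5: ("LV", "HDM", "HDS"),
    6: ("LV", "HV", "HDM"),
    7: ("LV", "HV", "HDS"),  # assumption: the only combination left unassigned
    8: ("LV", "HV", "HDM", "HDS"),
}


def all_feature_subsets(extras=("HV", "HDM", "HDS")):
    """Return every combination of LV with a subset of the extra features."""
    subsets = set()
    for size in range(len(extras) + 1):
        for combo in combinations(extras, size):
            subsets.add(frozenset(("LV",) + combo))
    return subsets
```

Comparing `SCENARIOS` against `all_feature_subsets()` confirms that the listing covers each of the 2^3 = 8 possible combinations exactly once.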
3.2. Problem Setting
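For a given feature $f$, let $x^{f}_{t,m}$ denote its value at time $t$ for mesh $m$. Let $X_{t,m}$ represent the list of all available observed feature values at time $t$ for mesh $m$. For example, for a mesh $m_1$ which includes all features (i.e., belongs to the set U in scenario 8) and a mesh $m_2$ which includes only LV data (i.e., belongs to the set W), $X_{t,m_1}$ and $X_{t,m_2}$ are defined as:
$$X_{t,m_1} = \left[ x^{\mathrm{LV}}_{t,m_1},\; x^{\mathrm{HV}}_{t,m_1},\; x^{\mathrm{HDM}}_{t,m_1},\; x^{\mathrm{HDS}}_{t,m_1} \right], \qquad X_{t,m_2} = \left[ x^{\mathrm{LV}}_{t,m_2} \right]$$
Let $O$ denote the length of the observation window (i.e., historical data), and $M_e$ represent the set of meshes considered in experiment $e$. The input of the model, $I^{e}_{t_0}$, at the forecast origin $t_0$, is defined as:
$$I^{e}_{t_0} = \left\{ X_{t,m} \;\middle|\; t \in [t_0 - O + 1,\; t_0],\; m \in M_e \right\}$$
Let $H$ denote the forecast horizon (i.e., prediction length), $R$ the number of repetitions of experiment $e$, and $r \in \{1, \dots, R\}$ one of the iterations. The ground truth, $Y^{e}_{t_0}$, and the model’s predicted output, $\hat{Y}^{e,r}_{t_0}$, from the forecast origin $t_0$, are expressed as:
$$Y^{e}_{t_0} = \left\{ x^{\mathrm{LV}}_{t,m} \;\middle|\; t \in [t_0 + 1,\; t_0 + H],\; m \in M_e \right\}, \qquad \hat{Y}^{e,r}_{t_0} = \left\{ \hat{x}^{\mathrm{LV},r}_{t,m} \;\middle|\; t \in [t_0 + 1,\; t_0 + H],\; m \in M_e \right\}$$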
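To make the problem setting concrete, the sketch below slices one input/target pair at a given forecast origin. It assumes the forecast origin is the index of the last observed step, that all features share one time grid, and that data are stored as nested dictionaries; the function name `make_window` and this layout are hypothetical and chosen for readability rather than efficiency.

```python
def make_window(series, t0, obs_len, horizon):
    """Slice one (input, target) pair at forecast origin t0.

    series  : dict mesh_id -> dict feature_name -> list of values over time
    t0      : index of the last observed time step
    obs_len : observation window length O
    horizon : forecast horizon H
    """
    start = t0 - obs_len + 1
    end = t0 + horizon
    n_steps = min(len(features["LV"]) for features in series.values())
    if start < 0 or end >= n_steps:
        raise ValueError("window does not fit inside the available data")
    # Input: every available feature of every mesh over the observation window.
    inputs = {
        mesh: {name: values[start:t0 + 1] for name, values in features.items()}
        for mesh, features in series.items()
    }
    # Target: LV only, over the forecast horizon.
    targets = {mesh: features["LV"][t0 + 1:end + 1] for mesh, features in series.items()}
    return inputs, targets


# Toy data: one mesh with LV and HV, one mesh with LV only (placeholder values).
toy_series = {
    "m1": {"LV": list(range(10)), "HV": list(range(100, 110))},
    "m2": {"LV": list(range(20, 30))},
}
toy_inputs, toy_targets = make_window(toy_series, t0=5, obs_len=4, horizon=2)
```

Meshes with fewer features simply contribute fewer variables to the input, which matches the definition of the per-mesh feature list above.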
3.3. Metrics
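The prediction error is evaluated using the Mean Squared Error (MSE). The error of experiment $e$, at the forecast origin $t_0$, for run $r$, and mesh $m$ is defined by
$$\varepsilon^{e,r}_{t_0,m} = \frac{1}{H} \sum_{h=1}^{H} \left( x^{\mathrm{LV}}_{t_0+h,m} - \hat{x}^{\mathrm{LV},r}_{t_0+h,m} \right)^{2},$$
where $x^{\mathrm{LV}}_{t,m}$ and $\hat{x}^{\mathrm{LV},r}_{t,m}$ are the observed and predicted LV values. Considering that there are $T$ samples over the evaluation period, the temporal MSE for experiment $e$ and mesh $m$ is calculated as:
$$\mathrm{MSE}^{e}_{m} = \frac{1}{R\,T} \sum_{r=1}^{R} \sum_{t_0} \varepsilon^{e,r}_{t_0,m}$$
A per-mesh analysis of temporal performance is impractical and suboptimal for understanding the spatial impact of HV and HD features. Instead, we evaluate the spatiotemporal prediction performance by comparing the spatiotemporal MSE and the corresponding standard deviation computed across all meshes of a given scenario.
The spatiotemporal prediction error (STPE) for an experiment, $\mathrm{STPE}^{e}$, represents the average MSE across the meshes $M_e$ of experiment $e$, while the spatiotemporal standard deviation (STSD), $\mathrm{STSD}^{e}$, measures the variability in MSE. These metrics are defined as follows:
$$\mathrm{STPE}^{e} = \frac{1}{|M_e|} \sum_{m \in M_e} \mathrm{MSE}^{e}_{m}, \qquad \mathrm{STSD}^{e} = \sqrt{ \frac{1}{|M_e|} \sum_{m \in M_e} \left( \mathrm{MSE}^{e}_{m} - \mathrm{STPE}^{e} \right)^{2} }$$
A scenario is considered to improve prediction performance if it outperforms the baseline (i.e., LV-only) in both STPE (lower average error) and STSD (lower variability).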
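The metrics described above can be sketched with the standard library as follows. Two points are assumptions rather than details confirmed by the text: errors are averaged with equal weight over runs, evaluation windows, and horizon steps, and the standard deviation across meshes is the population form. All function names are hypothetical.

```python
from statistics import fmean, pstdev


def temporal_mse(y_true, y_pred):
    """Temporal MSE of one mesh.

    y_true : list of T windows, each a list of H observed LV values
    y_pred : list of R runs, each a list of T windows of H predicted values
    """
    squared = [
        (pred - true) ** 2
        for run in y_pred
        for pred_window, true_window in zip(run, y_true)
        for pred, true in zip(pred_window, true_window)
    ]
    return fmean(squared)


def stpe_stsd(mse_per_mesh):
    """Return (STPE, STSD): mean and population standard deviation of per-mesh MSE."""
    values = list(mse_per_mesh.values())
    return fmean(values), pstdev(values)


def improves_on_baseline(candidate, baseline):
    """True if a (STPE, STSD) pair is strictly lower than the baseline on both metrics."""
    return candidate[0] < baseline[0] and candidate[1] < baseline[1]
```

For instance, two meshes with temporal MSE values of 1.0 and 3.0 give an STPE of 2.0 and an STSD of 1.0 under these assumptions.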
3.4. Forecasting Task
The forecasting task of this study is to predict LV consumption for the next two days (forecast horizon $H$) using observations from the past four days (observation window $O$).
While transformer models are well-suited for long-sequence predictions, this study focuses on mid-term forecasts to emphasize the impact of feature combinations on mesh-level predictions while reducing computational demands.
Weekly patterns in human behavior (e.g., differences between weekdays and weekends) suggest that longer observation windows could enhance prediction accuracy. However, the computational requirements of transformer models grow significantly with an increasing number of variables and extended observation periods due to their inherent point-wise connections. To address these computational constraints and ensure all experiments could be conducted, the observation period was limited to four days. Although this may slightly reduce prediction accuracy, it still provides valuable insights into feature combinations and their effects on LV consumption forecasts, paving the way for future research with extended observation windows.
3.5. Reference Model
The Informer model was selected as the reference model for its efficiency and scalability in multivariate time series forecasting. Due to the high computational demands of certain experiments, more recent models, such as PatchTST [13], were not feasible to implement. While state-of-the-art transformer-based models [14] may outperform Informer, resource constraints precluded their use in this study. Moreover, the goal of this study is not to benchmark models but to evaluate the impact of incorporating mobility data on LV consumption predictions. The findings of this study are expected to generalize to other transformer-based models with a similar architecture.
4. Experimental Results
Our study consists of 24 experiments (8 scenarios × 3 sets), each run three times ($R = 3$). We conducted them on an AWS instance equipped with 16 vCPUs, 64 GB of memory, and 1 GPU. Except for sequence and prediction lengths, default parameters from Informer’s public implementation (https://github.com/zhouhaoyi/Informer2020 (accessed on 15 July 2025)) were used for each experiment. To account for spatial influences in the forecasting task, the model was configured in a multivariate-to-multivariate setting.
Table 2 summarizes the STPE and STSD values for all experiments, according to the definitions of Section 3.
Analyzing results across the sets (U, V, and W), we observed that adding more features (HD or HV) generally outperformed Scenario 1 in terms of STPE, with the exception of Scenario 3 in set U. Moreover, except for set V, we observed a nested improvement pattern: on average, the more features incorporated, the better the STPE performance. These results underscore the importance of including additional features, such as HD and HV, for improving spatiotemporal LV predictions. However, adding more features also increased STSD, indicating that improvements are no longer uniform across all meshes. This observation suggests some specialization towards certain meshes, as illustrated in Figure 2. More specifically, for set U, incorporating HV achieved the smallest STPE, while HDM resulted in the lowest STSD. Scenario 6, which combined HV and HDM, provided a balanced performance across both metrics. In set V, combining HV and HDS yielded the best STPE, while Scenario 2 (LV and HV) produced the smallest STSD. Scenario 3 (with HDM) offered a good balance between STPE and STSD. Finally, for set W, using HDS achieved the best STPE, while HDM resulted in the smallest STSD. Scenarios 3 and 4 provided favorable trade-offs between STPE and STSD for this set. Several key insights emerge from these results.
4.1. Impact of HV Feature
A comparison of Scenario 1 and Scenario 2 across all sets reveals that adding historical HV data reduced STPE by an average of % and STSD by %. This suggests that combining LV and HV patterns helps the model capture spatiotemporal correlations more effectively, even for sets with a limited number of HV meshes (e.g., V and W).
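Average reductions of this kind can be computed as relative changes with respect to the baseline scenario. The sketch below assumes the average is taken over the per-set relative reductions, which is one plausible reading and not confirmed by the text; the function names are hypothetical and the numbers used to exercise it are placeholders, not results from Table 2.

```python
from statistics import fmean


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline` (negative if it is higher)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - value) / baseline


def mean_reduction_across_sets(baseline_by_set, scenario_by_set):
    """Average the per-set relative reductions (assumed averaging scheme)."""
    return fmean(
        relative_reduction(baseline_by_set[name], scenario_by_set[name])
        for name in baseline_by_set
    )


# Placeholder metric values for two sets, used only to illustrate the calculation.
example = mean_reduction_across_sets({"U": 2.0, "V": 4.0}, {"U": 1.0, "V": 3.0})
```

The same helper applies to STPE and STSD alike, since both are "lower is better" quantities.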
4.2. Impact of HD Features
Integrating HD features (HDM, HDS, or both) improved STPE and STSD in most cases, except for Scenario 3 in set U. Specifically, Scenarios 3 to 5 showed an average reduction of % in STPE and % in STSD compared to Scenario 1. These results highlight the ability of HD features to enhance spatiotemporal correlations.
4.3. HV vs. HD
The relative benefits of HD and HV features vary with spatial coverage. For set W, HD features improved STPE by % and STSD by % on average, nearly times better than the improvements from HV. For set V, HD features slightly outperformed HV ( times better), while for set U, HD improvements were smaller, with STPE improving by only %, times worse than HV. These findings suggest a threshold in spatial coverage beyond which increasing meshes with HD data yields better results than with a fixed number of meshes with HV data.
4.4. Impact of HV and HD Features
Combining HV with HD features consistently reduced STPE but mostly did not lower STSD. In fact, adding more features often increased STSD, particularly for larger sets. On average, Scenarios 6 to 8 reduced STPE by % compared to Scenario 1, but their impact on STSD was more variable. When compared to Scenario 2, adding HD features to HV reduced STPE for set W but never decreased STSD. For set V, STPE improved only in Scenarios 6 and 7, but STSD increased significantly. Conversely, for set U, these scenarios did not outperform Scenario 2 in terms of STPE, though they reduced STSD in Scenarios 6 and 7.
In summary, combining LV and HV consistently improves performance across all sets, while HD features appear more effective for larger sets of meshes. In the following section, we attempt to explain these observations.
5. Results Interpretation
Our numerical results reveal a strong correlation between the area covered, the number and type of features used (and thus variables), and the model’s performance in predicting LV consumption two days ahead.
5.1. Dependence on Area Covered
The impact of HD features is particularly pronounced when considering spatial coverage. Both HDM and HDS individually outperform HV for the largest set, W, likely due to the nature of HD features. HDM represents the volume of people in movement within a mesh, while HDS represents the volume of people staying at a given location. However, HD features do not account for the flow of people between meshes (i.e., how many enter or exit a given mesh).
Although 30% of its meshes are agricultural or forest land with limited human flow, set W encompasses most of Utsunomiya’s areas with high human activity. As a result, the total volume of people in set W remains relatively stable over time, with minimal influence from external flows (e.g., unconsidered meshes or neighboring cities). In contrast, for smaller sets like U, which represents less than 1% of the meshes in Utsunomiya, the limitation of the HD definition becomes significant. The number of people entering or exiting set U fluctuates considerably throughout the day, especially around working hours, leading to instability in the total volume of people. These fluctuations can mislead DL models, which lack contextual understanding of the numbers they process, making it harder to identify patterns and correlations accurately. This instability likely explains why HD features perform worse than the LV-only scenario in set U but perform better in larger sets like W.
5.2. Dependence on the Number of Features
The number of features primarily affects the STSD. An increase in STSD indicates greater variability in temporal MSE across meshes, which may not always be problematic. If the model achieves high efficiency for a subset of meshes without significantly increasing the MSE for others, it can be considered effective but specialized. However, if certain meshes show substantial improvements while others exhibit significant degradation in performance, this probably indicates overfitting on specific meshes.
Figure 2 illustrates the spatial distribution of temporal prediction errors for different scenarios in sets V (top row) and W (bottom row). Temporal MSE changes, relative to Scenario 1 (using historical LV data only), are represented using a gradient color scheme: yellow indicates no significant change, red shades show error reductions, and blue shades indicate error increases. Darker colors represent greater deviations. The visualizations confirm that the model’s focus varies across scenarios and meshes. This behavior likely stems from how neural networks optimize their loss function, here favoring meshes where error reduction is more achievable. For instance, Scenario 4 in set W achieves the best STPE. However, Figure 2l shows that while many meshes see improved performance (red), several others experience degradation (blue), indicating uneven improvements and possible overfitting. Conversely, Scenarios 7 and 8, shown in Figure 2o,p, exhibit slightly higher STPE but deliver more uniform improvements across meshes. Therefore, selecting the best set of features depends on the intended use of the predictions, requiring a trade-off between STPE improvement and STSD uniformity.
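The per-mesh comparison behind such a map can be sketched as a relative change of temporal MSE with respect to Scenario 1, followed by a three-way classification. The 5% tolerance used for "no significant change" is a hypothetical value chosen for illustration, as the text does not state the threshold, and the function name is likewise an assumption.

```python
def classify_changes(baseline_mse, scenario_mse, tolerance=0.05):
    """Label each mesh by its relative MSE change versus the LV-only baseline.

    baseline_mse, scenario_mse : dict mesh_id -> temporal MSE (baseline values must be non-zero)
    tolerance                  : relative change treated as "unchanged" (hypothetical threshold)
    """
    labels = {}
    for mesh, base in baseline_mse.items():
        change = (scenario_mse[mesh] - base) / base
        if abs(change) <= tolerance:
            labels[mesh] = "unchanged"   # yellow in the figure
        elif change < 0:
            labels[mesh] = "reduced"     # red shades: lower error
        else:
            labels[mesh] = "increased"   # blue shades: higher error
    return labels


# Placeholder values for three meshes.
labels = classify_changes(
    {"a": 1.0, "b": 1.0, "c": 1.0},
    {"a": 0.5, "b": 1.02, "c": 1.5},
)
```

Counting the "reduced" and "increased" labels gives a quick indication of whether a lower STPE comes from broad improvement or from gains concentrated on a subset of meshes.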
Figure 2. Visualization of temporal MSE changes for set V (top row) and set W (bottom row), relative to the LV-only scenario [(a) for set V and (i) for set W]. Panels (b–h) and (j–p) correspond, in order, to Scenarios 2 through 8. A gradient scheme is used to color meshes: yellow indicates no significant change, red shades show error reductions, and blue shades indicate error increases.
In conclusion, recommending a universally superior feature or feature combination for improving LV consumption predictions is challenging. The optimal choice depends on data availability, the characteristics of the target city, and the dynamics of human activity and flow with neighboring areas.
6. Conclusions
This study investigated the efficacy of High Voltage (HV) and Human Dynamics (HD) data in predicting Low Voltage (LV) Electricity Load Demand (ELD). Both data types demonstrated their potential to reduce prediction errors across a city; however, their effectiveness across different cities would require further experimentation. HV data proved particularly beneficial for urban regions with strong business activity, offering consistent performance across various area sizes. Conversely, HD data showed greater promise in widespread regions characterized by significant but stable human movement. These findings offer valuable insights into the influence of human behavior on ELD, with practical implications for optimizing energy planning and resource allocation. As working habits and movement patterns continue to evolve, accurately describing human mobility is becoming increasingly critical for precise ELD forecasting.
Future research will prioritize automating mesh labeling, refining the definition and application of HD data, and exploring advanced modeling techniques to further enhance spatiotemporal forecasting of data influenced by human behavior.