The Economy and Policy Incorporated Computing System for Social Energy and Power Consumption Analysis

Human activities, such as energy consumption and economic development, significantly affect the natural environment, while changes in the natural environment in turn affect the sustainability of human society. Studying changes in the energy consumption of human society and forecasting medium- and long-term electricity demand will help achieve sustainable energy development in future society. However, current medium- and long-term electricity consumption forecasts suffer from insufficient data samples and cannot take policy impacts into account. Here, we develop an Economy and Policy Incorporated Computing System (EPICS), which uses artificial intelligence to automatically extract summaries of energy policy texts and calculate an importance index for energy policy. It can also process economic data of different lengths, effectively expanding the samples available for medium- and long-term electricity consumption forecasting. A forecasting method that considers policy factors and mixed-frequency economic data is introduced to estimate future social energy and power consumption. This method showed good forecasting ability in a 27-month forecasting task. The effectiveness of EPICS is demonstrated by predicting medium- and long-term electricity demand.


Introduction
Global warming is one of the main threats to human society, and reducing carbon emissions has therefore become a consensus among countries. The energy and power industries account for a large proportion of carbon emissions and are the main force in energy conservation and emission reduction.
To reduce carbon emissions in the power industry, low-carbon power technologies have emerged. These mainly include power system carbon emission analysis, low-carbon power planning, power system low-carbon transportation, etc. Among them, medium- and long-term electricity consumption forecasting is the basis for low-carbon power planning and evaluation, and it helps the power system achieve economic and low-carbon goals. Statistics in References [1][2][3][4] show that every 1% reduction in electricity consumption forecast error reduces the annual operating cost of the power system by 10 million pounds. Thus, improving the accuracy of power consumption prediction has long been a popular research topic.
Medium- and long-term electricity consumption is significantly affected by many randomly varying factors, such as macro-control policy, the economy, and weather [5]. Accurately quantifying energy policy, so as to capture the impact of economic changes and energy policies on the power market, is critical for improving the accuracy of medium- and long-term electricity consumption forecasts. However, current research on the effect of … build the feature multi-input fusion model. In the 27-month consumption prediction task in three provinces in China, EPICS achieved high forecasting accuracy of 1.386, 0.985, and 1.683, which is 2.415 higher than traditional methods, on average. This demonstrates the effectiveness of the EPICS framework. The list of symbols used in this paper is shown in Appendix A.

EPICS Framework
The EPICS framework has two types of input data: policy text data and mixed-frequency economic data. Policy text data are unstructured, whereas economic data are structured. To process these two kinds of data jointly, the EPICS framework is designed with two main modules: a policy quantification module and a mixed-frequency economic data processing module. The structure is shown in Figure 1. The policy quantification module uses BERT-based automatic text summarization to refine large numbers of power policy texts. The mixed-frequency economic data processing module mainly uses an LSTM network to extract features from the mixed data automatically and constructs a multi-input feature fusion model.

• First, the policy text data are used as the input of the policy quantification module, which extracts summaries from a large number of power policy texts using BERT-based automatic text summarization. The power policy summaries are then quantified with the Policy Modeling Consistency Index (PMC-Index) [23]. This approach summarizes the main content of policy measures and improves the efficiency of policy quantification.
• Second, the output of the policy quantification module (PMC-Index) is integrated with the mixed-frequency economic data. This step implements the joint processing of structured and unstructured data.
• Third, the fused data are used as the input of the mixed-frequency economic data processing module. This module mainly uses the Keras Masking layer to cover and filter the vacant time steps in the data. The Masking layer masks a sequence by using a mask value to skip time steps: for each time step in the input tensor, if all values at that time step are equal to the mask value, the time step is masked (skipped) in all downstream layers that support masking. In addition, the module uses an LSTM network to extract features from the mixed data automatically and constructs a multi-input feature fusion model, which addresses the problem that medium- and long-term power consumption data are too scarce to train a deep network on their own.
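The masking rule described in the third step can be reproduced without TensorFlow. The NumPy sketch below mirrors the documented behaviour of `tf.keras.layers.Masking(mask_value=...)`. The sentinel value -1.0 and the toy data are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

MASK_VALUE = -1.0  # assumption: the paper does not state which mask value it used


def keras_style_mask(batch: np.ndarray, mask_value: float = MASK_VALUE) -> np.ndarray:
    """Return a (batch, timesteps) boolean mask; True = kept, False = skipped.

    Same rule as the Keras Masking layer: a time step is skipped only if
    *every* feature at that time step equals mask_value.
    """
    return np.any(batch != mask_value, axis=-1)


# One sample, four monthly time steps, three features.
x = np.array([[
    [0.2, 0.5, 0.1],     # fully observed month -> kept
    [-1.0, -1.0, -1.0],  # every feature vacant -> skipped
    [0.3, -1.0, -1.0],   # only some features vacant -> still kept
    [0.4, 0.6, 0.9],     # fully observed month -> kept
]])
mask = keras_style_mask(x)
print(mask)  # [[ True False  True  True]]
```

Because a time step is skipped only when all of its features equal the mask value, a month in which only the quarterly or annual indicators are vacant is not skipped; the sentinel values for those indicators still reach the LSTM in that month.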

EPICS Framework: Policy Quantification Module
Before quantifying policies, a sufficient amount of policy data must be obtained and pre-processed. Electricity policy data usually exist as text, mainly news, policy reports, etc. These data can be collected from government websites and electric power portals using web crawlers. However, they contain a great deal of irrelevant, interfering text, and it is challenging to extract useful information from such a huge amount of text manually. Therefore, automatic text summarization is needed to extract and refine the massive volume of power policy texts and summarize the main content of policy measures.
To pre-process the policy texts, we propose a BERT-based abstract extraction model, whose structure is illustrated in Figure 2. The input text is delimited by two special symbols, [CLS] and [SEP]. [CLS] is placed at the beginning of the text, and its output feature is used by classification models. [SEP] is a separator symbol that splits two sentences in the input corpus.
The processed input text is mapped to three kinds of embeddings: token embeddings, segment embeddings, and position embeddings. Token embeddings represent the meaning of each token, segment embeddings distinguish the two sentences, and position embeddings encode the position of each token in the sequence. The input vector of BERT is the element-wise sum of these three embeddings.
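The element-wise sum of the three embeddings can be sketched with lookup tables. This is a toy NumPy illustration under stated assumptions: the table sizes, the random initialisation, and the token ids (1 for [CLS], 2 for [SEP]) are hypothetical, and a real BERT implementation additionally applies layer normalization and dropout to the sum.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, HIDDEN = 100, 16, 8  # toy sizes, far smaller than BERT's

token_table = rng.normal(size=(VOCAB, HIDDEN))       # one vector per token id
segment_table = rng.normal(size=(2, HIDDEN))         # sentence A / sentence B
position_table = rng.normal(size=(MAX_LEN, HIDDEN))  # one vector per position


def bert_input_embedding(token_ids, segment_ids):
    """Element-wise sum of token, segment, and position embeddings."""
    token_ids = np.asarray(token_ids)
    segment_ids = np.asarray(segment_ids)
    positions = np.arange(len(token_ids))
    return token_table[token_ids] + segment_table[segment_ids] + position_table[positions]


# Hypothetical ids: 1 = [CLS], 2 = [SEP]; 10-11 and 20-21 are words of two sentences.
ids = [1, 10, 11, 2, 20, 21, 2]
segs = [0, 0, 0, 0, 1, 1, 1]
emb = bert_input_embedding(ids, segs)
print(emb.shape)  # (7, 8)
```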
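Summarization layers process each sentence vector T_i output by BERT and predict a score Ŷ_i indicating whether sentence sent_i should be included in the summary. The loss of the whole model is the binary cross-entropy between the prediction Ŷ_i and the gold label Y_i [24]. Ŷ_i is calculated by stacking Transformer layers over the sentence vectors:

$$\tilde{h}^{l} = \mathrm{LN}\left(h^{l-1} + \mathrm{MHAtt}\left(h^{l-1}\right)\right)$$

$$h^{l} = \mathrm{LN}\left(\tilde{h}^{l} + \mathrm{FFN}\left(\tilde{h}^{l}\right)\right)$$

where h^0 = PosEmb(T), and T = [T_1, T_2, …] are the sentence vectors output by BERT. PosEmb is the function that adds positional embeddings to the sentence vectors. MHAtt is the multi-head attention operation [25], FFN is the position-wise feed-forward network, and LN is the layer normalization operation. The superscript l indicates the depth of the stacked layer. The final output layer is a sigmoid classifier:

$$\hat{Y}_i = \sigma\left(W_o h_i^{L} + b_o\right)$$

where h_i^L is the vector for sent_i from the L-th (top) layer of the Transformer, W_o is the weight, and b_o is the bias.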
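The summarization layers can be sketched in NumPy to show how the sentence scores and the training loss fit together. This is a simplified illustration, not the authors' implementation. Single-head attention stands in for MHAtt, LN has no learned gain or bias, the weights are random and shared across layers, and the sentence vectors, positional embeddings, and gold labels are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (assumption; BERT-base uses 768)


def layer_norm(h, eps=1e-6):
    """LN without learned gain/bias, applied per sentence vector."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product attention (simplified stand-in for MHAtt)."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def ffn(h, w1, w2):
    """Position-wise feed-forward network with a ReLU."""
    return np.maximum(0.0, h @ w1) @ w2


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def summarization_scores(sent_vecs, pos_emb, params, num_layers=2):
    h = sent_vecs + pos_emb  # h^0 = PosEmb(T)
    for _ in range(num_layers):  # weights shared across layers only to keep the sketch short
        h_tilde = layer_norm(h + self_attention(h, *params["att"]))
        h = layer_norm(h_tilde + ffn(h_tilde, *params["ffn"]))
    return sigmoid(h @ params["w_o"] + params["b_o"])  # Y_hat_i for every sentence


def binary_cross_entropy(y_hat, y, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))


n_sent = 5
params = {
    "att": [rng.normal(scale=0.3, size=(D, D)) for _ in range(3)],
    "ffn": [rng.normal(scale=0.3, size=(D, 2 * D)), rng.normal(scale=0.3, size=(2 * D, D))],
    "w_o": rng.normal(scale=0.3, size=D),
    "b_o": 0.0,
}
T = rng.normal(size=(n_sent, D))               # stand-in for BERT sentence vectors T_i
pos = rng.normal(scale=0.1, size=(n_sent, D))  # stand-in positional embeddings
y_hat = summarization_scores(T, pos, params)
gold = np.array([1, 0, 1, 0, 0])               # hypothetical gold labels Y_i
loss = binary_cross_entropy(y_hat, gold)
print(y_hat.round(3), round(loss, 4))
```

At inference time, the sentences with the highest Ŷ_i would be selected to form the extractive summary.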
After pre-processing the policy texts through the above steps, we obtain the power policy summaries. Next, a suitable model is needed to evaluate and quantify them. We propose a power policy quantification model based on the PMC-Index, with 9 first-level variables and 33 second-level variables designed around the specific characteristics of China's power policy. The detailed variable design is shown in Table 1. See the Methods section for the other calculation steps. The value of each first-level variable is given by Equation (6):

$$X_t = \sum_{j=1}^{T(X_{tj})} \frac{X_{tj}}{T(X_{tj})} \quad (6)$$

where X_t represents the t-th first-level variable, X_{tj} represents the j-th second-level variable under X_t, and T(X_{tj}) represents the number of second-level variables under X_t.
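Equation (6) and the final PMC index can be sketched in a few lines. The sketch makes two assumptions: that second-level variables are scored 0 or 1, and that the PMC index is the sum of the first-level scores, as in Estrada's formulation. The variable names and scores below are hypothetical and do not come from Table 1.

```python
def first_level_score(second_level_values):
    """X_t = sum_j X_tj / T(X_tj), i.e. the mean of the binary second-level scores."""
    if not second_level_values:
        raise ValueError("a first-level variable needs at least one second-level variable")
    if any(v not in (0, 1) for v in second_level_values):
        raise ValueError("second-level variables are scored 0 or 1")
    return sum(second_level_values) / len(second_level_values)


def pmc_index(policy):
    """PMC index: sum of the first-level variable scores."""
    return sum(first_level_score(values) for values in policy.values())


# Hypothetical scores for three first-level variables (the real model has 9, with 33 in total).
example_policy = {
    "X1": [1, 0, 1, 1],
    "X2": [1, 1],
    "X3": [0, 0, 1],
}
print(round(pmc_index(example_policy), 4))  # 2.0833
```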

EPICS Framework: Mixed-Frequency Economic Data Processing Module
The mixed-frequency economic data processing module of EPICS consists of several steps: mixed input, masking and filtering, feature extraction, feature merging, and a multi-layer perceptron. The mixed data, comprising the mixed-frequency economic data, the historical power consumption data, and the quantified policy data, pass through the masking layer, which masks missing data points, and are then fed into the deep neural network for feature extraction. Finally, the feature merging layer and the perceptron layer predict the monthly electricity consumption.
Macroeconomic indicators of different frequencies carry different information. High-frequency monthly economic data can, to a certain extent, reflect short-term fluctuations in the economic market. Low-frequency quarterly and annual data are less timely than monthly data because of their long accounting periods. However, they are valuable for accurately describing the long-term trend and overall state of regional economic operations. Making extensive use of both long-term and short-term regional economic data in power consumption forecasting helps establish a more comprehensive model of the relationship between economic factors and power consumption.
Low-frequency data within mixed-frequency data can be regarded as high-frequency data with missing values. These missing values, caused by frequency mixing, are an inherent characteristic of macroeconomic data and cannot be filled directly by traditional interpolation methods. Traditional econometric models often convert high-frequency data into low-frequency data through accumulation, or low-frequency data into high-frequency data through interpolation, and then model the economic data at a single frequency. However, this kind of processing alters the original data, resulting in the loss of important information.
Following the multi-input fusion network in [26], we pass the mixed data through a Masking layer to cover the vacant time steps. The data then enter the deep learning network, which automatically extracts features from the economic data of different frequencies. Finally, the data pass through the feature merging layer and the perceptron layer to predict monthly electricity consumption. To couple the information in multiple time series, all variables at each time step are concatenated into a vector representation to form a new time series, as shown in the following formula.

$$X = \left[x_1, x_2, \ldots, x_T\right]^{\top} = \left[x^{1}, x^{2}, \ldots, x^{n}\right], \qquad x_t = \left[x_t^{1}, x_t^{2}, \ldots, x_t^{n}\right], \qquad x^{k} = \left[x_1^{k}, x_2^{k}, \ldots, x_T^{k}\right]^{\top}$$

Sustainability 2021, 13, 10473 8 of 18

In the above formula, T is the time window size; n is the number of influencing factor variables (n = 53 in this article); x^k is the numerical sequence of the k-th variable over the time window of length T; and x_t is the set of n variable values at time t.
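Placing indicators of different frequencies on one monthly grid, so that row t is x_t and column k is x^k, can be sketched as follows. Several details here are assumptions not stated in the paper: a quarterly value is stored in the last month of its quarter, an annual value is stored in December, vacant cells receive a hypothetical sentinel of -1.0, and the data are toy values.

```python
import numpy as np

MASK_VALUE = -1.0  # hypothetical sentinel; must lie outside the range of real values


def align_to_monthly(monthly, quarterly, annual, n_months):
    """Place monthly, quarterly and annual indicators on one monthly grid.

    Assumptions: n_months is a whole number of years starting in January,
    a quarterly value is stored in the last month of its quarter, and an
    annual value in December. All other cells receive MASK_VALUE.
    Row t of the result is x_t; column k is the sequence x^k.
    """
    if n_months % 12:
        raise ValueError("n_months must cover whole years")
    m = np.asarray(monthly, dtype=float).reshape(n_months, -1)
    q = np.asarray(quarterly, dtype=float).reshape(n_months // 3, -1)
    a = np.asarray(annual, dtype=float).reshape(n_months // 12, -1)

    q_grid = np.full((n_months, q.shape[1]), MASK_VALUE)
    q_grid[2::3] = q      # March, June, September, December
    a_grid = np.full((n_months, a.shape[1]), MASK_VALUE)
    a_grid[11::12] = a    # December of each year
    return np.hstack([m, q_grid, a_grid])


monthly = np.arange(24, dtype=float).reshape(12, 2)  # 2 monthly indicators, 1 year
quarterly = [10.0, 11.0, 12.0, 13.0]                 # 1 quarterly indicator
annual = [500.0]                                     # 1 annual indicator
X = align_to_monthly(monthly, quarterly, annual, n_months=12)
presence = (X != MASK_VALUE).astype(int)  # 1 = observed, 0 = vacant
print(presence)
```

The `presence` matrix has the same 0/1 layout that the text describes for Table 5: 1 where a value exists and 0 where the lower-frequency indicator is vacant.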

Electricity Consumption Forecasting Methods Considering Economic and Policy Factors
The mixed-frequency economic data processing module contains a medium- and long-term electricity consumption forecasting model, whose construction we introduce next. The basic idea of a medium- and long-term load forecasting model that considers economic and policy factors is to use the PMC index and mixed-frequency economic data as inputs to an LSTM electricity consumption forecasting model. At present, LSTM models are mainly used for short-term forecasting, where data are relatively plentiful, and have achieved high accuracy there. However, they are less often applied to medium- and long-term forecasting because of insufficient data [27]. To address the shortage of historical data for medium- and long-term electricity consumption forecasting, we used the electricity and economic data of 30 provinces in China for data augmentation.
Medium- and long-term electricity consumption is affected by many factors. To couple the information in multiple time series, we concatenate the variables at each time step into a vector representation.
Because economic and policy factors affect medium- and long-term electricity consumption with a delay, we use the historical economic and policy data X and the electricity consumption data Y of the previous 24 months as feature vectors to predict the next month's electricity consumption Y_{T+1}. This is a multivariate forecasting problem, expressed mathematically as:

$$Y_{T+1} = F\left(X_1, X_2, \ldots, X_T;\ Y_1, Y_2, \ldots, Y_T\right)$$

In the above formula, T is the time window size (24 months here), and F is the model mapping function, i.e., the nonlinear mapping relationship to be learned in this article.
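Turning the monthly series into training samples for this mapping can be sketched as a sliding window. Each sample holds 24 consecutive months of features together with past consumption, and its label is the following month's consumption. The helper name and the toy data are hypothetical. In the paper's setting, samples from the 30 provinces would be pooled, but how that pooling is done is not specified.

```python
import numpy as np


def make_windows(features, target, window=24):
    """Turn monthly series into (X, y) samples for next-month forecasting.

    Each sample holds `window` consecutive months of [X_t, Y_t]; its label is
    the consumption of the following month, Y_{T+1}.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float).reshape(-1, 1)
    data = np.hstack([features, target])
    xs, ys = [], []
    for end in range(window, len(data)):
        xs.append(data[end - window:end])  # months end-window .. end-1
        ys.append(target[end, 0])          # the month right after the window
    return np.stack(xs), np.array(ys)


features = np.arange(36 * 3, dtype=float).reshape(36, 3)  # 36 months, 3 indicators (toy)
target = np.arange(100.0, 136.0)                         # toy monthly consumption
X, y = make_windows(features, target)
print(X.shape, y.shape)  # (12, 24, 4) (12,)
```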
To reduce the error that the Spring Festival introduces into electricity consumption forecasts, we first isolate the Spring Festival effect component in the electricity consumption series. We then use the X-12-ARIMA seasonal adjustment algorithm to decompose the remainder into long-term trend, seasonal, and irregular components. Finally, we use the proposed model to predict each component separately and sum the predictions to obtain the final result. This procedure can further improve the learning effect of the model.
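The additive decompose-then-sum idea can be illustrated with a much simpler method. The sketch below uses classical additive decomposition: a centred 2×12 moving average for the trend, monthly means for the seasonal component, and the remainder as the irregular component. This is plainly a swapped-in technique. It is not X-12-ARIMA and has no Spring Festival adjustment. The synthetic series is made up.

```python
import numpy as np


def additive_decompose(y, period=12):
    """Classical additive decomposition y = trend + seasonal + irregular.

    A simple stand-in for illustration only; it is NOT X-12-ARIMA and has no
    Spring Festival adjustment. Needs an even period and at least two full periods.
    """
    if period % 2:
        raise ValueError("this 2 x period moving average assumes an even period")
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centred 2x12 moving average for the trend (undefined at the ends).
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    half = period // 2
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal: average detrended value of each calendar month, centred on zero.
    detrended = y - trend
    month_means = np.array([np.nanmean(detrended[m::period]) for m in range(period)])
    month_means -= month_means.mean()
    seasonal = np.resize(month_means, n)
    irregular = y - trend - seasonal
    return trend, seasonal, irregular


t = np.arange(48)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)  # synthetic series, 4 years
trend, seasonal, irregular = additive_decompose(y)
# The text forecasts each component separately and sums the forecasts;
# here the components simply add back up to the original series.
print(np.allclose((trend + seasonal + irregular)[6:-6], y[6:-6]))  # True
```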

EPICS Policy Quantitative Results
The policy text data used in this article were obtained from the official websites of the State Grid Corporation of China and the China Electricity Council using web crawlers. We labeled the original data manually to obtain the power policy data set. To convert the original policy texts into a data set suitable for the BERT-based abstract extraction model, we use CoreNLP [28] to segment sentences and pre-process the data set following the method of See et al. [29].
To evaluate the effectiveness of EPICS on the task of extracting power policy summaries, this article uses ROUGE [30], the standard metric in automatic text summarization, to evaluate summary quality automatically. ROUGE counts the basic units that overlap between the summary generated by the model and a human-written summary, and thus evaluates the generated summary objectively. For comparison, this experiment constructed a Transformer model without pre-training, a LEAD model, and a REFRESH model as baselines. The Transformer baseline has six layers, a hidden size of 512, and a feed-forward filter size of 2048. It uses the same architecture as the BERT-based model but has fewer parameters and is randomly initialized. LEAD is a simple extractive model that uses the first three sentences of a document as the summary. REFRESH [31] is an extractive summarization system trained by globally optimizing the ROUGE metric through reinforcement learning. In this article, the unigram-overlap metric ROUGE-1 is used to evaluate the power policy summaries, and the extraction results on the power policy data set are shown in Table 2. The results show that the proposed BERT-based abstract extraction model has clear advantages over the Transformer, LEAD, and REFRESH models, improving ROUGE-1 on the power policy summary extraction task by 1.5 to 3.4 points. This provides a reasonable basis for the subsequent policy quantification.
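ROUGE-1 and the LEAD baseline are simple enough to sketch directly. The version below counts clipped unigram overlap on lower-cased whitespace tokens. It is a simplification: the official ROUGE toolkit has stemming and stop-word options, and Chinese policy texts would first need word segmentation. The example sentences are made up.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each shared word counted min(count_cand, count_ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def lead_3(sentences):
    """LEAD baseline: the first three sentences of the document form the summary."""
    return sentences[:3]


scores = rouge_1("the grid expands renewable capacity",
                 "the grid expands wind and solar capacity")
print(scores)
```

Here 4 of the 5 candidate words and 4 of the 7 reference words overlap, so precision is 0.8, recall is 4/7, and F1 is 2/3.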
After obtaining the policy text summaries, we use the word frequency analysis software ROSTCM6 to mine the policy texts. The software retrieves keywords related to the second-level variables in the multi-input-output table. The second-level variables are then assigned values according to the retrieval results, and, finally, the PMC index value of each policy text is calculated.
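The keyword-retrieval step can be approximated by a simple presence check. This does not reproduce ROSTCM6, whose internals are not described here. It only shows the idea of turning keyword hits into 0/1 second-level scores. The variable names and keyword lists are hypothetical, and matching is naive substring search on English text.

```python
# Hypothetical second-level variables and keyword lists (not the paper's Table 1).
SECOND_LEVEL_KEYWORDS = {
    "X1.1": ["subsidy", "tariff"],
    "X1.2": ["renewable", "wind", "solar"],
    "X2.1": ["long-term", "five-year"],
}


def assign_second_level(summary: str, keywords=SECOND_LEVEL_KEYWORDS) -> dict:
    """Score 1 if any keyword of a second-level variable occurs in the summary, else 0."""
    text = summary.lower()
    return {var: int(any(word in text for word in words)) for var, words in keywords.items()}


values = assign_second_level("Provinces shall raise solar capacity under the long-term plan.")
print(values)  # {'X1.1': 0, 'X1.2': 1, 'X2.1': 1}
```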
In addition, we divided the PMC index values into four grades according to the evaluation criteria proposed by Estrada [23]. A PMC index between 9 and 10 is rated "perfect", between 7 and 8.99 "good", between 5 and 6.99 "acceptable", and between 0 and 4.99 "bad". Table 3 shows the PMC index calculation results for three policy texts, graded according to these criteria. As shown in Table 3, the PMC index of Paper-1 is 4.72, rated "Bad". Its low scores on X7 and X8 indicate that the policy's focus is unclear and its content is not rational. The PMC index of Paper-2 is 5.43, rated "Acceptable". Its scores on X7 and X8 are high, while those on X1 and X2 are low, indicating that the policy is reasonable but ineffective. The PMC index of Paper-3 is 5.21, rated "Acceptable". Its scores on X2 and X5 are low, while the other first-level variable scores are moderate, indicating that the policy is less effective.
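The grading rule can be written as a small lookup. The sketch uses lower thresholds, so a score such as 8.995, which falls between the listed bands, is treated as "Good". That tie-breaking choice is an assumption.

```python
def pmc_grade(score: float) -> str:
    """Map a PMC index to Estrada's four grades, as listed in the text."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("PMC index expected in [0, 10]")
    if score >= 9.0:
        return "Perfect"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Acceptable"
    return "Bad"


for name, pmc in [("Paper-1", 4.72), ("Paper-2", 5.43), ("Paper-3", 5.21)]:
    print(name, pmc, pmc_grade(pmc))
```

Applied to the three reported indices, it reproduces the grades in Table 3: Bad, Acceptable, and Acceptable.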

Mixed-Frequency Economic Data Processing Results
The economic data used in this article are the economic and power data of 30 provinces and cities in China from January 2007 to December 2019. Because the sample data span a long period and a wide spatial area, we set the ratio of the training set to the test set to 8:2, taking both temporal and spatial factors into account. As statistical data collection has become increasingly automated, the types, quantity, and credibility of economic data have improved and become more standardized. Because medium- and long-term electricity consumption forecasting is a complex, multi-dimensional, and nonlinear problem, the selected economic indicators should be comprehensive and broad. At the same time, given that indicators are replaced over time and differ between regions, the selected economic data should be continuous in time and statistically adequate.
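The paper does not say exactly how the 8:2 split accounts for temporal and spatial factors. One plausible reading is a chronological split within each province, sketched below. In this version, the earliest 80% of each province's months are used for training and the remainder for testing, so no test month precedes a training month. The split rule and the province keys are assumptions. The row count matches the 4680 rows described for the data set.

```python
import numpy as np

N_PROVINCES, N_MONTHS = 30, 13 * 12    # Jan 2007 - Dec 2019 = 156 months
assert N_PROVINCES * N_MONTHS == 4680  # agrees with the 4680 rows of the data set


def chronological_split(series_by_province, train_ratio=0.8):
    """Per province, the earliest 80% of months train the model and the rest test it."""
    train, test = {}, {}
    for province, series in series_by_province.items():
        cut = int(round(len(series) * train_ratio))
        train[province], test[province] = series[:cut], series[cut:]
    return train, test


series = {f"P{i:02d}": np.arange(N_MONTHS, dtype=float) for i in range(N_PROVINCES)}
train, test = chronological_split(series)
print(len(train["P00"]), len(test["P00"]))  # 125 31
```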
We introduce three meteorological factors, temperature, air pressure, and humidity, to characterize the significant influence of weather on electricity consumption. We obtained a wide range of socio-economic data with various structures from open sources such as the websites of national and local statistical bureaus. We then built a database of macroeconomic and meteorological indicators related to electricity demand based on the above principles. The database contains 52 indicators: 19 monthly, 10 quarterly, and 23 annual indicators. See Table 4 for the specific indicators.

Table 4. Indicators in the database.
Monthly data (19 indicators): year; month; province information; consumer price index; commodity retail price index; power generation; real estate development investment; real estate development enterprise housing construction area; real estate development enterprise housing new construction area; real estate development enterprise housing completion area; general public budget income; foreign currency deposit balance of financial institutions; foreign currency loan balance of financial institutions; value of imports; value of exports; total value of imports and exports; average temperature; average pressure; average relative humidity.
Quarterly data (10 indicators): GDP; regional GDP index; total output value of construction industry; completed output value of construction industry; construction area of housing construction; newly started area; labor productivity calculated by total output value; per capita completed output value; completed area of housing construction; fixed asset investment price index.
Annual data (23 indicators): GDP; GDP real growth index; per capita GDP; per capita GDP real growth index; added value of primary industry; added value of secondary industry; added value of tertiary industry; real growth index of added value of primary industry; real growth index of added value of secondary industry; real growth index of added value of tertiary industry; industrial added value; consumption level of residents; consumption level of urban residents; consumption level of rural residents; consumption level comparison between urban and rural areas; completed investment in fixed assets of the whole society; investment in fixed assets (excluding farmers); investment in fixed assets (excluding farmers); total retail value of social consumer goods; added value of construction enterprises; resident population; natural growth rate of resident population; total electricity consumption.

A simple list of mixed-frequency economic data is shown in Table 5.
Here, we show only how the data are organized for one region in one year. Because of the large data volume, we do not show the specific values of all 4680 rows and 52 columns, and we abbreviate the 19 columns of monthly data, 10 columns of quarterly data, and 23 columns of annual data. In Table 5, 0s indicate that data are absent at those positions, and 1s indicate that data are present.

Electricity Consumption Forecast Results
To further verify the effectiveness of the EPICS framework, we input economic and policy data into the LSTM power consumption prediction network and conduct controlled experiments in 30 provinces of China.
In the experiment, the proposed EPICS method uses all the data: mixed-frequency economic data, quantified policy data, and electricity consumption data. LSTM1 uses mixed-frequency economic data and electricity consumption data, LSTM2 uses only mixed-frequency economic data, LSTM3 uses only monthly economic data, and LSTM4 uses only electricity consumption data. We also include two traditional load forecasting methods for comparison. The input of the Gated Recurrent Unit (GRU) [32,33] network is electricity consumption data. The Autoregressive Integrated Moving Average (ARIMA) model predicts the components of power consumption and then sums them to obtain the total power consumption; its input is historical power consumption data.
To evaluate the models, we use the mean absolute percentage error (MAPE) and the root mean square error (RMSE), given by Formulas (12) and (13):

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \quad (12)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \quad (13)$$

where n is the number of samples, y_i is the actual value of the i-th sample, and ŷ_i is its predicted value. Table 6 shows the prediction errors of the proposed method and the control methods for medium- and long-term electricity consumption in 30 provinces and municipalities in China. EPICS ranked first in accuracy in 24 provinces and second in the remaining 6. In these 6 provinces, the gap between EPICS and the first-ranked model was very small: the MAPE difference stayed within 0.2 to 0.3%, and the RMSE difference within 0.3 to 0.6. Overall, the average prediction error of EPICS across the 30 provinces was much lower than that of the other models, with an average MAPE of 2.16% and an average RMSE of 3.98. To show the generality of the EPICS algorithm in more detail, Figure 3 plots the prediction errors of the 7 algorithms in the 30 provinces. The overall error of EPICS is low, which shows that the method is effective and performs better in most regions.

Among the control models LSTM1 to LSTM4, LSTM3 reduces MAPE by 0.69% and RMSE by 1.92 on average relative to LSTM4, which uses only electricity consumption as input. Because LSTM3 adds monthly economic data as input, this shows that economic data substantially improve electricity consumption forecasts.
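Formulas (12) and (13) translate directly into NumPy. The sketch below is a plain implementation of the two metrics. The three actual and predicted values are made up for illustration. MAPE is undefined when an actual value is zero, which is not a concern for monthly provincial consumption.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, Formula (12), in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def rmse(y_true, y_pred):
    """Root mean square error, Formula (13), in the units of the data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


actual = [100.0, 200.0, 400.0]  # made-up monthly consumption values
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free and comparable across provinces of different sizes, while RMSE is in the units of consumption and weights large misses more heavily. Reporting both, as Table 6 does, captures both aspects.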
Compared with LSTM3, which uses only monthly economic data, LSTM2 reduces MAPE by 0.37% and RMSE by 0.52 on average. Because LSTM2 adds quarterly and annual data as input, this indicates that mixed-frequency economic data help improve the accuracy of electricity consumption forecasts.
Compared with GRU, EPICS reduces MAPE by 2.84% and RMSE by 2.61 on average. Compared with ARIMA, EPICS reduces MAPE by 4.59% and RMSE by 6.44 on average. This reflects the advantages of EPICS in learning complex nonlinear relationships and long-term dependencies. Figure 4 compares the predicted and true values of all methods in Jiangsu, Henan, and Hubei provinces. The traditional methods ARIMA and GRU fit the electricity consumption trend in the three provinces less well than the other models, with errors reaching 4 billion kWh at some points. EPICS and LSTM1 to LSTM4 fit the trend very closely, but large deviations appear at certain time points, such as points 14 and 21 in Jiangsu Province, points 4 and 25 in Henan Province, and points 11 and 24 in Hubei Province. At these points, EPICS maintains high prediction accuracy, which shows that policy data and mixed-frequency economic data help improve the robustness of the system.

Conclusion
Most research on policy evaluation is qualitative, and few studies address policy quantification [34]. Objective evaluation of the implementation effect of energy industry policy is of great value for scientific policy-making [35]. Quantifying complex policy texts into specific influence indexes is an important way to reduce the subjectivity of policy evaluation, but its scientific validity and rationality have long been questioned. This article tests the method experimentally. We use the PMC model to obtain the influence index of each policy text and then apply it to medium- to long-term electricity consumption forecasting to observe its effect on the model. The experimental results show that the PMC index helps improve the accuracy of electricity consumption forecasting and, to a certain extent, objectively reflects the influence of the policy text.
Our study also shows that abundant mixed-frequency economic data can not only alleviate the lack of historical data for medium- and long-term electricity consumption forecasts but also effectively improve forecast accuracy. Traditional econometric models often convert high-frequency data into low-frequency data through accumulation, or low-frequency data into high-frequency data through interpolation and filling, and then model single-frequency economic data. However, such processing alters the original information in the data. It either loses important information or adds spurious artificial information, which increases the prediction error and reduces accuracy [36][37][38][39][40][41][42]. Our proposed economic data fusion input framework can explore the complex relationship between electricity and the economy more fully. This offers some reference value for predicting the development trend of electricity consumption in a complex economic situation.
In the future, researchers can use more powerful semantic extraction models to generate abstractive text summaries and uncover more policy elements. This may improve the comprehensiveness of summary extraction and the accuracy of policy quantification. At the same time, introducing more policy factors is expected to further enhance the model's ability to capture the impact of policy on social activities such as power consumption, and to improve its information processing performance.


Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. List of symbols used in this article.