Research on the Root Cause Tracing Method of the Change in Access to Electricity Index Based on Data Mining

Luo, Hongshan; Zhou, Xu; Zheng, Weiqi; He, Yuling

doi:10.3390/en18092275

¹ Shenzhen Power Supply Bureau Co., Ltd., Shenzhen 518048, China

² Department of Mechanical Engineering, North China Electric Power University, Baoding 071003, China

³ Hebei Engineering Research Center for Advanced Manufacturing & Intelligent Operation and Maintenance of Electric Power Machinery, North China Electric Power University, Baoding 071003, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(9), 2275; https://doi.org/10.3390/en18092275

Submission received: 2 April 2025 / Revised: 27 April 2025 / Accepted: 28 April 2025 / Published: 29 April 2025

Abstract

Superior electricity-optimized business ecosystems (EOBEs) have become pivotal determinants of industrial and commercial prosperity. The access to electricity index (AEI) is a valid instrument for assessing the EOBE. To evaluate the EOBE accurately and trace the root causes of its changes, this paper constructs a new analytical model for evaluating and monitoring changes in EOBE. First, this paper constructs a new evaluation model of EOBE based on the Business Ready (B-READY) evaluation system, considering three factors: the power regulatory quality, the public service level, and the enterprises’ gain power efficiency. Then, the model uses the raw data collected to calculate an AEI score, enabling an accurate assessment of EOBE. Next, this paper uses the Apriori algorithm to extract the coupling features of indicators and combines them with time series features and policy features to construct the feature matrix. Finally, the feature contributions are analyzed using support vector regression (SVR) and Shapley additive explanations (SHAP) values. The experiment shows that the factors affecting the change in AEI are, in decreasing order of importance, time series features, policy features, and coupling features. This study provides reference cases and improvement ideas for the assessment and optimization of EOBE.

Keywords: access to electricity index (AEI); root cause tracing; Apriori algorithm; support vector regression (SVR); Shapley additive explanations (SHAP) value

1. Introduction

In today’s international context, the business environment (BE), as the basis of an active economy, affects the operation of enterprises and the development of the world economy through many aspects, such as electricity [1], taxation [2], and the protection of property rights [3]. As electricity is one of the main energy sources in modern society, the electricity-optimized business ecosystem (EOBE) has become an important part of BE and is closely related to the international economy [4,5]. The access to electricity index (AEI) is an important criterion with which to evaluate EOBE, and its accurate application is an effective means of optimizing EOBE. It has become a research hotspot in this field, allowing the level of EOBE to be evaluated from multiple dimensions and the reasons for its changes to be analyzed.

Numerous scholars have conducted extensive research on the improvement in AEI. In 2001, the World Bank published the Doing Business Evaluation Indicator System, the first set of indicators to measure and evaluate the business environment provided to companies in a region or country. Since electricity has become one of the necessary energy sources for enterprises’ daily operations, the World Bank incorporated the index of “access to electricity” into its Index System of Doing Business in 2010, mainly considering the efficiency of enterprises’ access to electricity supply and the stability of power supply [6,7,8]. Maryam Doroodi et al. pointed out that, in addition to the dimension of power acquisition efficiency at the enterprise level, the environmental impact caused by power supply should also be considered [9]. Qiang Christine Zhenwei et al. also deeply influenced the development of EOBE by assessing the quality of power regulations [10]. In 2024, the World Bank launched Business Ready (B-READY), based on the Doing Business indicator system, taking into account three dimensions: regulatory framework, public services, and overall efficiency [11].

To study the nature of AEI changes, many scholars have carried out extensive research on the root cause tracing method of target variables. Zhu et al. conducted an in-depth analysis of the dominant influences on good production by calculating gray correlation, maximum mutual information, feature importance, and Shapley’s additive interpretation (SHAP) value. This provides a promising method for the prediction and contribution analysis of multifactorial influences [12]. In their study of shale brittleness assessment methods, Wang et al. introduced SHAP value analysis for quantitatively assessing factor impacts, providing a ranking of characteristic contributions and visualizing contribution trends [13]. Wang et al. found that the generation and reliability of electricity supply are affected by climatic conditions by studying the flow of electricity, from generation to the power system [14]. Liang Weihui et al. simulated the experimental results under different feature combinations by controlling feature variables and accurately quantified the contribution degree of each feature [15]. All of the above literature combines a prediction model and contribution analysis to analyze the reasons for the change in target value. In practical application, the accuracy of contribution degree analysis is affected by many complicated factors.

Developing a suitable prediction model is the key to ensuring the accuracy of prediction and contribution analysis. Emily Royal et al. used neural networks for the time series forecasting of electricity loads and found out that electricity loads are growing at a high rate and are affected by seasonal factors [16]. Li et al. used the Extreme Gradient Boosting (XGBoost) model optimized by the Northern Goshawk Optimization (NGO) algorithm to forecast electricity load, which significantly reduced the mean absolute percentage error and improved the coefficient of determination [17]. Fan et al. developed a novel short-term load forecasting model that mixes several machine learning methods such as support vector regression (SVR), grey catastrophe, and random forest (RF) modeling. This improves the forecasting accuracy and reduces the forecasting randomness [18]. Wu et al. proposed a Bayesian network-based blast furnace gas generation prediction method, which unfolds the prediction for the training event set. It is constructed in two dimensions—interval generation and interval time—and effectively improves the prediction accuracy [19]. The above literature studies methods used to improve the prediction accuracy, which can increase the accuracy of the contribution analysis results. However, in practical applications, contribution analysis should not only consider the influence of prediction accuracy.

Using an effective feature extraction method also plays a crucial role in the results of contribution analysis. Kan et al. extracted the wind speed distribution features of actual wind farms through data-mining methods and improved the accuracy of identifying the effective operating conditions of wind turbines under different wind speed intervals [20]. Wang et al. used multiscale data mining for three-stage feature screening and later increased the accuracy of prediction by combining the features through convolutional neural network (CNN) modeling to carry out prediction [21]. Yu et al. investigated a fast control method of distributed photovoltaic countercurrent prevention based on multi-dimensional data mining, enabling the idea of extracting features using data mining [22]. Most of the above literature screens features using data mining, which can reflect the influence of experimental data on prediction results well. However, due to different application scenarios, the specific method of feature extraction and application objectives should be adapted.

In general, the literature reviewed above improved the AEI to adapt it to local EOBE in various ways and selected appropriate contribution quantification methods to conduct root cause tracing research on target variables. To further promote the development of EOBE, more factors affecting EOBE need to be considered, and appropriate evaluation criteria should be specified. In addition, optimizing the prediction model and the feature screening method improves the accuracy of contribution quantification and yields a reasonable contribution analysis model, which has a positive impact on the development of root cause tracing research. It is also worth noting that the current AEI has certain limitations when evaluating EOBE in different regions: the level of local EOBE cannot be evaluated in a targeted manner, which makes the results of subsequent root cause tracing of AEI changes inaccurate.

Inspired by this, and starting from the traditional AEI, which only focuses on the enterprises’ gain power efficiency and power reliability, this paper adds considerations of the soundness and rationality of the regulatory framework and of the level of power public services. Further, it proposes an EOBE evaluation model considering three dimensions: power regulatory quality, public service level, and enterprises’ gain power efficiency. In addition, a feature extraction method targeting the influencing factors behind AEI changes is proposed, and the main causes of EOBE changes are studied. This provides a specific case and theoretical framework for EOBE optimization. The contributions of this paper can be summarized as follows:

1. The EOBE evaluation model is constructed, considering three dimensions: power regulatory quality, public service level, and enterprises’ gain power efficiency. This model includes the subjective evaluation of the soundness of the regulatory framework and the level of power public services in EOBE, breaking through the limitation of the traditional AEI that only assesses EOBE from an objective perspective. This model can measure and evaluate the local EOBE level simultaneously from both subjective and objective aspects.
2. This EOBE evaluation model also takes into account the correlation between EOBE and the environment. It has corresponding indicators to assess whether the government considers the sustainable development of the environment in the development of EOBE. In this way, EOBE can be accurately evaluated from the three perspectives of society, environment, and enterprise.
3. A root cause tracing method is proposed for AEI changes. This is based on data mining to collect the multi-dimensional data of sample materials, extract multi-dimensional features, and analyze the influence and mechanism of each feature on AEI changes by the SVR model and SHAP values. Thus, the method can trace the root causes of AEI changes.

The organizational structure of this paper is as follows: Section 2 constructs a new EOBE evaluation model. Section 3 mainly introduces the theoretical aspects and main process of the AEI change root cause tracing model. Section 4 investigates the root causes of AEI changes. Finally, Section 5 presents a summary of the main findings.

2. EOBE Evaluation Model

2.1. Components of the EOBE Evaluation Model

The EOBE evaluation model is embodied in the AEI, which reflects how difficult it is for enterprises to obtain electricity supply. In its traditional form, the AEI is composed of four secondary indicators: time, links, cost, and reliability and tariff transparency. It fully reflects the level of access to electricity at the enterprise level, but lacks consideration of the quality of electricity regulations and the level of public service. Since many countries are now focusing on the performance of EOBE in terms of social benefits and enterprise satisfaction, this paper improves the EOBE evaluation model from multiple dimensions. To accurately reflect the level of access to electricity in the region, this paper improves the AEI in terms of power regulatory quality, public service level, and enterprises’ gain power efficiency, taking into account the current development of the international EOBE, and sets nine AEI indicators, as shown in Table 1 (the improvement ideas of “power regulatory quality” and “public service level” respectively refer to references [9,10]; “enterprises’ gain power efficiency” refers to reference [11]).

2.2. AEI Calculation Method

2.2.1. Power Regulatory Quality

The power regulatory quality indicator is mainly used to secure the completeness of the relevant systems of electricity supply companies and their regulators and consists of the following three secondary indicators.

1. Joint planning and construction $Y_{1}$

The joint planning and construction indicator is scored based on the fulfillment of the following criteria:

a. Regulations require joint planning and construction (e.g., utility poles, overhead or underground cables, telephone lines), including provisions for joint excavation permits, joint excavation, and “one-excavation” policies.
b. The statute stipulates the time frame for an authorization decision or the issuance of consent by the body involved in the electrical connection.

The index calculation formula is shown in Equation (1):

$$Y_{1} = 50\,(\alpha_{1a} + \alpha_{1b}) \tag{1}$$

where $Y_{1}$ is the score of the joint planning and construction indicator, and $\alpha_{1a}, \alpha_{1b} \in \{0, 1\}$ indicate the fulfillment of criteria a and b in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

2. Inspection system for electrical installations $Y_{2}$

The inspection system for electrical installations indicator consists of two parts: the internal installation works and the external installation works. The internal installation works component is scored based on the fulfillment of the following criteria:

a. There is a legal requirement for internal electrical installation to be carried out by a licensed professional/company and for that professional or company to inspect and certify the quality of the installation.
b. The law provides for a final inspection by a third party to ensure the quality of internal electrical installations.

This part of the scoring formula is shown in Equation (2):

$$Y_{21} = \begin{cases} 50, & \alpha_{2a} + \alpha_{2b} \geq 1 \\ 0, & \alpha_{2a} + \alpha_{2b} < 1 \end{cases} \tag{2}$$

where $Y_{21}$ is the score of the internal installation works component of the inspection system for electrical installations indicator, and $\alpha_{2a}, \alpha_{2b} \in \{0, 1\}$ indicate the fulfillment of criteria a and b in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

The external installation works component is scored based on the fulfillment of the following criteria:

c. The law requires that external electrical installations be carried out by a licensed professional/firm and that the professional or firm inspect and certify the quality of the installation.
d. There is a legal requirement for a final inspection by a third party to ensure the quality of external electrical installations.

This part of the scoring formula is shown in Equation (3):

$$Y_{22} = \begin{cases} 50, & \alpha_{2c} + \alpha_{2d} \geq 1 \\ 0, & \alpha_{2c} + \alpha_{2d} < 1 \end{cases} \tag{3}$$

where $Y_{22}$ is the score of the external installation works component of the inspection system for electrical installations indicator, and $\alpha_{2c}, \alpha_{2d} \in \{0, 1\}$ indicate the fulfillment of criteria c and d in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

The index score calculation method of the inspection system for electrical installations is shown in Equation (4):

$$Y_{2} = Y_{21} + Y_{22} \tag{4}$$

3. Environmental sustainability of electricity supply $Y_{3}$

The environmental sustainability of electricity supply indicator consists of two components: the generation environment and the transmission and distribution environment. The generation component is scored based on the fulfillment of the following criteria:

a. The regulatory framework includes provisions for applicable environmental standards for power generation (e.g., energy efficiency requirements for power plants; requirements to reduce local emissions of fossil fuel air pollutants (NOₓ, SO₂, particulate matter)).
b. The regulatory framework provides for deterrence or enforcement mechanisms to ensure compliance with environmental standards for power generation (e.g., penalties or fines for violations of standards by the relevant units; requirements for generating utilities to report to the regulator on the fulfillment of energy efficiency and environmental requirements).

This part of the scoring formula is shown in Equation (5):

$$Y_{31} = 25\,(\alpha_{3a} + \alpha_{3b}) \tag{5}$$

where $Y_{31}$ is the score of the power generation component of the environmental sustainability indicator, and $\alpha_{3a}, \alpha_{3b} \in \{0, 1\}$ indicate the fulfillment of criteria a and b in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

The transmission and distribution component is scored based on the fulfillment of the following criteria:

c. Relevant environmental standards for transmission and distribution are set in the regulatory framework (e.g., energy efficiency requirements imposed on transmission and distribution utilities; legal requirements to provide smart meters free of charge to commercial customers; development of “smart grids”).
d. The regulatory framework provides for deterrence or enforcement mechanisms to ensure compliance with environmental standards for transmission and distribution (e.g., penalties or fines for violations of standards by relevant units).

This part of the scoring formula is shown in Equation (6):

$$Y_{32} = 25\,(\alpha_{3c} + \alpha_{3d}) \tag{6}$$

where $Y_{32}$ is the score of the transmission and distribution component of the environmental sustainability indicator, and $\alpha_{3c}, \alpha_{3d} \in \{0, 1\}$ indicate the fulfillment of criteria c and d in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

The index score calculation method of the environmental sustainability of electricity supply is shown in Equation (7):

$$Y_{3} = Y_{31} + Y_{32} \tag{7}$$
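As an illustration only, Equations (1)–(7) can be sketched in a few lines of Python. The function names and the boolean encoding of the criteria are assumptions made for this sketch and are not part of the paper's method description.

```python
def joint_planning_score(a: bool, b: bool) -> int:
    """Equation (1): Y1 = 50 * (alpha_1a + alpha_1b)."""
    return 50 * (int(a) + int(b))


def inspection_score(a: bool, b: bool, c: bool, d: bool) -> int:
    """Equations (2)-(4): each part earns 50 if at least one criterion holds."""
    y21 = 50 if int(a) + int(b) >= 1 else 0  # internal installation works
    y22 = 50 if int(c) + int(d) >= 1 else 0  # external installation works
    return y21 + y22


def sustainability_score(a: bool, b: bool, c: bool, d: bool) -> int:
    """Equations (5)-(7): each fulfilled criterion earns 25 points."""
    y31 = 25 * (int(a) + int(b))  # generation
    y32 = 25 * (int(c) + int(d))  # transmission and distribution
    return y31 + y32
```

Note the difference between the two four-criterion indicators: $Y_{2}$ saturates once one criterion per part is met, whereas $Y_{3}$ rewards every fulfilled criterion.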

2.2.2. Public Service Level

The public service level indicator is mainly used to reflect the quality of service level of governance and the transparency of the public service facilities of electricity supply institutions and enterprises. It consists of the following three secondary indicators.

1. KPI for monitoring the reliability and sustainability of service delivery $Y_{4}$

The KPI for monitoring the reliability and sustainability of service delivery indicator is scored based on the fulfillment of the following criteria:

a. Existence of key performance indicators on average duration of power interruptions and average frequency of interruptions to monitor reliability of power supply.
b. Existence of key performance indicators to monitor environmental sustainability of electricity supply (e.g., renewable energy utilization, energy efficiency of the city’s power plants, electricity volume savings, etc.).

The index score calculation method of the KPI for monitoring the reliability and sustainability of service delivery is shown in Equation (8):

$$Y_{4} = \frac{100\,(2\alpha_{4a} + \alpha_{4b})}{3} \tag{8}$$

where $Y_{4}$ is the score of the KPI for monitoring the reliability and sustainability of service delivery indicator, and $\alpha_{4a}, \alpha_{4b} \in \{0, 1\}$ indicate the fulfillment of criteria a and b in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

2. Transparency of electricity tariffs and tariff setting $Y_{5}$

The transparency of electricity tariffs and tariff setting indicator is scored based on the fulfillment of the following criteria:

a. Current electricity prices in the region are available on the official website of the utility or regulator.
b. The public is notified of changes in electricity rates at least one billing cycle in advance (forms of notification include, but are not limited to, letters, bills, emails, and SMS; channels of notification include, but are not limited to, postings in the media, regulations, or websites).
c. The formula for calculating a customer’s final electricity rate level is publicly available (for example, posted online or printed on customer bills).

The index score calculation method of the transparency of electricity tariffs and tariff setting is shown in Equation (9):

$$Y_{5} = \frac{100\,(\alpha_{5a} + \alpha_{5b} + \alpha_{5c})}{3} \tag{9}$$

where $Y_{5}$ is the score of the transparency of electricity tariffs and tariff setting indicator, and $\alpha_{5a}, \alpha_{5b}, \alpha_{5c} \in \{0, 1\}$ indicate the fulfillment of criteria a, b, and c in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.

3. Electronic applications for power connections $Y_{6}$

The electronic applications for power connections indicator is scored based on the fulfillment of the following criteria:

a. New commercial electricity connections can be applied for electronically.
b. The status of electrical connection applications can be tracked online.

The index score calculation method of the electronic applications for power connections is shown in Equation (10):

$$Y_{6} = 50\,(\alpha_{6a} + \alpha_{6b}) \tag{10}$$

where $Y_{6}$ is the score of the electronic applications for power connections indicator, and $\alpha_{6a}, \alpha_{6b} \in \{0, 1\}$ indicate the fulfillment of criteria a and b in the scoring target area, respectively: the value is 1 when a criterion is fulfilled and 0 when it is not.
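The three public service scores in Equations (8)–(10) differ only in how the binary criteria are weighted. A minimal Python sketch follows; the function names are hypothetical and chosen only for readability.

```python
def kpi_monitoring_score(a: bool, b: bool) -> float:
    """Equation (8): the reliability KPI (criterion a) counts twice as much as b."""
    return 100 * (2 * int(a) + int(b)) / 3


def tariff_transparency_score(a: bool, b: bool, c: bool) -> float:
    """Equation (9): three equally weighted criteria."""
    return 100 * (int(a) + int(b) + int(c)) / 3


def electronic_application_score(a: bool, b: bool) -> int:
    """Equation (10): 50 points per fulfilled criterion."""
    return 50 * (int(a) + int(b))
```

For example, a region that publishes reliability KPIs but no sustainability KPIs would receive two thirds of the available points for $Y_{4}$.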

2.2.3. Enterprises’ Gain Power Efficiency

The enterprises’ gain power efficiency indicator is mainly used to reflect the reliability of enterprises’ access to electricity supply and the cost of maintaining electricity supply in practice. It consists of the following three secondary indicators. Since the enterprises’ gain power efficiency is measured by numerical data, the scores of these indicators are calculated by the frontier distance method [23], as shown in Equation (11):

$$S = \frac{y_{worst} - y}{y_{worst} - y_{best}} \times 100 \tag{11}$$

where $S$ is the score obtained for the indicator, set to 100 when the computed value is greater than 100 and to 0 when it is less than 0; $y_{worst}$ is the worst value of the indicator; $y_{best}$ is the best value of the indicator; and $y$ is the actual value of the indicator.
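The frontier distance score of Equation (11), including the clipping to the range 0 to 100, can be sketched as follows. The function name `frontier_score` is an assumption of this sketch.

```python
def frontier_score(y: float, worst: float, best: float) -> float:
    """Equation (11): distance of y from the worst value, scaled to [0, 100]."""
    if worst == best:
        raise ValueError("worst and best values must differ")
    s = (worst - y) / (worst - best) * 100
    # Values better than 'best' or worse than 'worst' are clipped.
    return max(0.0, min(100.0, s))
```

With the bounds used later for the cost indicator (worst 50, best 10), an actual value of 30 lies halfway between the two and therefore scores 50.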

1. Average customer outage time $Y_{7}$

The calculation formula of $Y_{7}$ is shown in Equation (12):

$$Y_{7} = \frac{\mathrm{SAIDI}_{worst} - \mathrm{SAIDI}}{\mathrm{SAIDI}_{worst} - \mathrm{SAIDI}_{best}} \times 100 \tag{12}$$

where $Y_{7}$ is the score of the average customer outage time indicator; $\mathrm{SAIDI}_{worst} = 0.31$ is the worst value of the indicator; $\mathrm{SAIDI}_{best} = 0$ is the best value of the indicator; and $\mathrm{SAIDI}$ is the actual value of the indicator, i.e., the average duration of customer outages during the month (hours/household). It is calculated as shown in Equation (13):

$$\mathrm{SAIDI} = \frac{\sum (\mathrm{Time} \times \mathrm{Number}_{\mathrm{SAIDI}})}{\mathrm{AllNumber}} \tag{13}$$

where $\mathrm{Time}$ is the outage time; $\mathrm{Number}_{\mathrm{SAIDI}}$ is the number of customers without power; and $\mathrm{AllNumber}$ is the total number of electricity users.

2. Average frequency of customer outages $Y_{8}$

The calculation formula of $Y_{8}$ is shown in Equation (14):

$$Y_{8} = \frac{\mathrm{SAIFI}_{worst} - \mathrm{SAIFI}}{\mathrm{SAIFI}_{worst} - \mathrm{SAIFI}_{best}} \times 100 \tag{14}$$

where $Y_{8}$ is the score of the average frequency of customer outages indicator; $\mathrm{SAIFI}_{worst} = 1.66$ is the worst value of the indicator; $\mathrm{SAIFI}_{best} = 0$ is the best value of the indicator; and $\mathrm{SAIFI}$ is the actual value of the indicator, i.e., the average frequency of customer outages during the month (times/household). It is calculated as shown in Equation (15):

$$\mathrm{SAIFI} = \frac{\sum \mathrm{Number}_{\mathrm{SAIFI}}}{\mathrm{AllNumber}} \tag{15}$$

where $\mathrm{Number}_{\mathrm{SAIFI}}$ is the number of customers affected by each outage.
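To make Equations (12)–(15) concrete, the sketch below computes SAIDI and SAIFI from a list of outage events and converts them to scores. The event representation (duration in hours, customers affected) and all function names are assumptions of this sketch; the worst and best values are those stated above.

```python
def saidi(outages, total_customers):
    """Equation (13): customer-weighted outage duration per customer."""
    return sum(hours * affected for hours, affected in outages) / total_customers


def saifi(outages, total_customers):
    """Equation (15): customer interruptions per customer."""
    return sum(affected for _, affected in outages) / total_customers


def frontier_score(y, worst, best=0.0):
    """Equation (11), clipped to the range 0 to 100."""
    return max(0.0, min(100.0, (worst - y) / (worst - best) * 100))


# Hypothetical month: two outages in a region with 10,000 customers.
events = [(2.0, 100), (0.5, 400)]   # (duration in hours, customers affected)
y7 = frontier_score(saidi(events, 10_000), worst=0.31)   # Equation (12)
y8 = frontier_score(saifi(events, 10_000), worst=1.66)   # Equation (14)
```

Both indicators use the same event list: SAIDI weights each outage by its duration, while SAIFI only counts how many customers were interrupted.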

3. Average cost of electric service to customers $Y_{9}$

The calculation formula of $Y_{9}$ is shown in Equation (16):

$$Y_{9} = \frac{F_{worst} - F}{F_{worst} - F_{best}} \times 100 \tag{16}$$

where $Y_{9}$ is the score of the average cost of electric service to customers indicator; $F_{worst} = 50$ is the worst value of the indicator; $F_{best} = 10$ is the best value of the indicator; and $F$ is the actual value of the indicator, i.e., the coefficient of the cost of electricity connection service, which indicates how favorable the cost of electricity connection service is in the region. It is calculated as shown in Equation (17):

$$F = \frac{E}{R} \tag{17}$$

where $E$ is the total monthly electricity charge for agency-purchased electricity for an industrial and commercial user at a voltage of 10 kV and a load capacity of 180 kVA, and $R$ is the monthly per capita disposable income of the region.

2.2.4. Index Weight Division

Since the targets assessed by the indicators in the EOBE evaluation model all have corresponding social and environmental benefits, the nine indicators are considered to be of equal importance. Therefore, each indicator is assigned an equal weight of 1/9 in the AEI.
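With equal weights, the AEI reduces to the arithmetic mean of the nine indicator scores $Y_{1}, \dots, Y_{9}$. The sketch below assumes that reading; the function name and the sample scores are hypothetical.

```python
def aei(scores):
    """Equal-weight AEI: each of the nine indicator scores has weight 1/9."""
    if len(scores) != 9:
        raise ValueError("expected the nine indicator scores Y1..Y9")
    return sum(scores) / 9


# Hypothetical scores for Y1..Y9, each on the 0-100 scale.
example = [100, 50, 75, 100, 100, 50, 90, 90, 65]
```

Because every indicator is scaled to the range 0 to 100, the resulting AEI lies in the same range.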

3. Root Cause Tracing Model of AEI Change

3.1. Apriori Algorithm

In order to establish the factors affecting AEI changes and the mechanism by which they influence AEI changes, this paper conducts root cause tracing experiments on AEI changes. To increase the accuracy of these experiments, this paper uses association rule mining, a data-mining method, to extract the coupling features between AEI indicators. The association rule algorithm is introduced below.

The association rule algorithm is a data-mining technique. The purpose of association rules is to discover the correlation between the frequency of the occurrence of certain items and the frequency of the occurrence of other items through the relationships between items in a data set.

Suppose that there is a transaction set $T = \{t_{1}, t_{2}, \dots, t_{n}\}$, where each $t_{i}$ denotes a transaction in $T$ consisting of several items. If $A$ and $B$ are itemsets drawn from the items in $T$, an association rule is usually expressed in the form $A \to B$. When mining association rules using association rule algorithms, three metrics, support, confidence, and lift, are usually used to evaluate the effectiveness of the rules [24,25].

The formula for calculating support is shown in Equation (18):

$$\mathrm{support}(A \to B) = P(A \cap B) \tag{18}$$

where $\mathrm{support}(A \to B)$ is the probability of the simultaneous occurrence of $A$ and $B$, represented as the ratio of the number of transactions in which both itemsets $A$ and $B$ appear to the total number of transactions in the transaction set $T$.

The formula for calculating confidence is shown in Equation (19):

$$\mathrm{confidence}(A \to B) = P(B \mid A) \tag{19}$$

where $\mathrm{confidence}(A \to B)$ is the probability that $B$ occurs conditional on the occurrence of $A$. It is represented as the ratio of the probability of $A$ and $B$ appearing together to the probability of $A$ appearing alone.

The formula for calculating lift is shown in Equation (20):

$$\mathrm{lift}(A \to B) = \frac{\mathrm{confidence}(A \to B)}{\mathrm{support}(B)} \tag{20}$$

where lift is the ratio of $\mathrm{confidence}(A \to B)$ to $\mathrm{support}(B)$. When lift > 1, $A$ and $B$ are positively correlated and $A \to B$ is a strong association rule; when lift = 1, $A$ and $B$ are independent of each other; when lift < 1, $A$ and $B$ are negatively correlated (mutually exclusive in the extreme case).
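The three metrics in Equations (18)–(20) can be computed directly from a list of transactions. The sketch below is illustrative: the function names and the sample transactions (labels such as "Y7_up") are hypothetical and do not come from the experimental data.

```python
def support(transactions, itemset):
    """Equation (18): fraction of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """Equation (19): P(B | A) = support(A and B together) / support(A)."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


def lift(transactions, a, b):
    """Equation (20): confidence(A -> B) / support(B)."""
    return confidence(transactions, a, b) / support(transactions, b)


# Hypothetical transactions: which indicator scores rose in a given month.
months = [
    {"Y7_up", "Y8_up"},
    {"Y7_up", "Y8_up", "Y9_up"},
    {"Y7_up"},
    {"Y9_up"},
]
```

Here the rule {"Y7_up"} → {"Y8_up"} has support 2/4, confidence (2/4)/(3/4) = 2/3, and lift (2/3)/(2/4) = 4/3, so the two events are positively correlated in this toy data.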

Among the existing association rule mining algorithms, the Apriori algorithm is selected in this paper to mine all strong association rules in the AEI scores of the experimental region and thereby extract the coupling features. The Apriori association rule algorithm is divided into two parts [26,27]:

Mining frequent itemsets in the transaction set using minimum support (min_support).
Generating strong association rules from frequent itemsets based on minimum confidence (min_confidence) and lift.

The steps of the Apriori algorithm are as follows:

Step 1: Read the data, scan the data set, and collect all the items in the data set to generate the candidate 1-itemset $C_{1}$.

Step 2: Prune $C_{1}$ and keep the items that meet the min_support condition to generate the frequent 1-itemset $L_{1}$.

Step 3: According to the property that all subsets of a frequent itemset must be frequent and all supersets of an infrequent itemset must be infrequent, combine the items in the frequent (k-1)-itemset $L_{k-1}$ to generate the candidate k-itemset $C_{k}$.

Step 4: Prune $C_{k}$ and keep the items that meet the min_support condition to generate the frequent k-itemset $L_{k}$.

Step 5: Repeat steps 3 and 4 until the generated frequent itemset is the empty set.

Step 6: Mine association rules from the frequent itemsets based on the min_confidence and lift requirements.
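The six steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the experiments; the function names, the `min_lift` threshold, and the sample transactions are assumptions of this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support} following Steps 1-5."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Steps 1-2: candidate 1-itemsets C1, pruned to frequent 1-itemsets L1.
    c1 = {frozenset([item]) for t in transactions for item in t}
    current = {c: sup(c) for c in c1 if sup(c) >= min_support}
    frequent, k = {}, 2
    while current:  # Step 5: stop when no frequent itemset is generated
        frequent.update(current)
        prev = set(current)
        # Step 3: join L(k-1) with itself, keeping only candidates whose
        # (k-1)-subsets are all frequent.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Step 4: prune Ck by min_support to obtain Lk.
        current = {c: sup(c) for c in candidates if sup(c) >= min_support}
        k += 1
    return frequent


def strong_rules(frequent, min_confidence, min_lift=1.0):
    """Step 6: rules A -> B with confidence >= min_confidence and lift > min_lift."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for a in map(frozenset, combinations(itemset, r)):
                b = itemset - a
                conf = s / frequent[a]      # Equation (19)
                lft = conf / frequent[b]    # Equation (20)
                if conf >= min_confidence and lft > min_lift:
                    found.append((a, b, conf, lft))
    return found


# Hypothetical transactions: indicators whose score changed together.
sample = [{"Y7", "Y8"}, {"Y7", "Y8"}, {"Y7", "Y8", "Y9"}, {"Y9"}, {"Y9"}]
```

Calling `apriori(sample, 0.4)` and then `strong_rules(..., 0.8)` on this toy data yields the two rules {"Y7"} → {"Y8"} and {"Y8"} → {"Y7"}. The lookups `frequent[a]` and `frequent[b]` always succeed because every subset of a frequent itemset is itself frequent.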

3.2. Support Vector Regression Prediction Model

High-precision prediction models are necessary for accurately quantifying the contribution of features. The support vector regression (SVR) model uses kernel functions to capture nonlinear relationships in the sample data while limiting overfitting, and it handles high-dimensional, small-sample data well [28]. Because the experimental data set is small, the SVR model is well suited to this study. The decision tree (DT) model has a low requirement for sample size and achieves high robustness by handling missing values through alternative splits [29]. The DT model therefore copes well with volatile data, which helps ensure the stability and prediction accuracy of the model. The feedforward neural network (FNN) model automatically learns complex nonlinear relationships through multiple layers of neurons and activation functions [30]. Theoretically, it can approximate any continuous function and achieve optimal accuracy given sufficient data. Because the data in this study are limited, the FNN model cannot fully reach its potential accuracy. However, the FNN learns feature interactions automatically, so it can determine the importance of features independently and adjust the feature weights according to trends in the data, thereby improving the prediction accuracy of the model.

Since SVR, DT, and FNN are all well suited to this study, this paper applies all three prediction models in order to maximize prediction accuracy. The accuracies of the three models are compared after Bayesian optimization. Finally, according to the comparison of prediction accuracy, the SVR model is selected to carry out the AEI root cause tracing research.

SVR is a regression analysis method based on Support Vector Machine that aims to predict the value of a continuous variable by finding an optimal hyperplane. The optimal hyperplane calculation formula is shown in Equation (21):

f (x) = ω ϕ (x) + b

(21)

where f(x) is the regression hyperplane, ω is the weight parameter vector of the regression hyperplane, ϕ(x) is the mapping function, and b is the bias term.

To find a function f(x) that predicts the target value while keeping the error between the predictions and all training data points within a certain tolerance, a parameter ε (called the “tolerance error”) is introduced; that is, an error within ε is considered acceptable.
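The role of the tolerance ε can be illustrated with the ε-insensitive loss used in standard SVR formulations. The source does not write out its loss function, so this form is an assumption, and the function name and sample scores below are hypothetical. The sketch shows that errors no larger than ε cost nothing, while larger errors are charged only for the excess.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return the per-sample epsilon-insensitive loss.

    Errors whose magnitude is at most epsilon cost nothing; larger errors
    are charged only for the part that exceeds epsilon.
    """
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical scores on a 0-100 scale (not data from the study).
y_true = [95.0, 90.0, 85.0]
y_pred = [94.0, 93.0, 85.5]
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0)
print(losses)  # [0.0, 2.0, 0.0]
```

Only the second sample, whose error of 3 points exceeds the tolerance of 1 point, contributes to the loss.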

The prediction accuracy of the model is measured by the mean square error (MSE) and the mean absolute error (MAE) [31]. The calculation formula of MSE is shown in Equation (22):

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}

(22)

The calculation formula of MAE is shown in Equation (23):

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y_{i}} |

(23)

In Equations (22) and (23), n is the total number of samples, y_{i} is the true value, and \hat{y_{i}} is the predicted value.
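Equations (22) and (23) can be checked with a few lines of code. The following is a minimal sketch in plain Python; the function names and the sample values are hypothetical and are not taken from the study's data.

```python
def mean_squared_error(y_true, y_pred):
    """Equation (22): average of the squared prediction errors."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def mean_absolute_error(y_true, y_pred):
    """Equation (23): average of the absolute prediction errors."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical true and predicted indicator scores.
y_true = [90.0, 92.0, 95.0, 97.0]
y_pred = [91.0, 90.0, 95.0, 96.0]

mse = mean_squared_error(y_true, y_pred)   # (1 + 4 + 0 + 1) / 4 = 1.5
mae = mean_absolute_error(y_true, y_pred)  # (1 + 2 + 0 + 1) / 4 = 1.0
print(mse, mae)
```

Because MSE squares each error, the single 2-point miss dominates it, whereas MAE weights all errors linearly.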

3.3. SHAP Value

To trace the root causes of AEI changes, this paper combines SHAP values with the prediction model to analyze the contribution of the extracted features and thereby identify the main factors influencing AEI changes.

The SHAP value interprets the predictions of a machine learning model using the Shapley value concept from cooperative game theory. It assigns each feature a value that represents its contribution to the prediction, which makes the model’s output explainable [32]. The core idea of the SHAP value calculation is to consider all possible combinations of features and evaluate the contribution of each feature across all of these combinations.

Suppose that the i-th sample of the problem under study is x_{i}, the j-th feature of the i-th sample is x_{ij}, the model’s predicted value for the sample is y_{i}, and the base value of the entire model (usually the mean value of the target variable over all samples) is y_{base}. Then, the SHAP values satisfy Equation (24):

y_{i} = y_{base} + f(x_{i1}) + f(x_{i2}) + \dots + f(x_{ik})

(24)

where f(x_{ij}) is the SHAP value of x_{ij}.

The SHAP value of a feature is its weighted average marginal contribution, where each marginal contribution is the difference between the model prediction after the feature is added to a feature subset and the prediction before it is added. The SHAP value is calculated as shown in Equation (25):

\phi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! (M - |S| - 1)!}{M!} [f_{x}(S \cup \{i\}) - f_{x}(S)]

(25)

where N = \{1, 2, \dots, M\} is the set of feature indices in the data set, M is the total number of feature variables, S is a subset of the set \{1, 2, \dots, i - 1, i + 1, \dots, M\}, |S| is the number of elements in S, f_{x}(S \cup \{i\}) is the model prediction when only the features in S \cup \{i\} are included, and f_{x}(S) is the model prediction when only the features in S are included. Their difference is the marginal contribution of the i-th feature variable under subset S.
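For a small number of features, Equation (25) can be evaluated exactly by enumerating every subset. The sketch below does this for a toy model and then checks the additivity property of Equation (24). Several parts are assumptions made for illustration: the toy `model`, the sample `x`, and the convention that a feature outside S is replaced by a fixed baseline value (practical SHAP implementations typically average over background data instead). Feature indices are 0-based in the code.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets, as in Equation (25)."""
    m = n_features
    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):  # |S| ranges over 0 .. M-1
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S.
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi.append(total)
    return phi


# Hypothetical sample and baseline (not data from the study).
x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]


def model(z):
    """Toy model with one interaction term between features 0 and 2."""
    return 10.0 + 2.0 * z[0] + 1.0 * z[1] + z[0] * z[2]


def value_fn(subset):
    """Model output when only the features in `subset` take their real values."""
    z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(z)


phi = shapley_values(value_fn, len(x))
y_base = value_fn(frozenset())
y_i = value_fn(frozenset(range(len(x))))
print(phi, y_base, y_i)
```

In this toy case the linear terms are attributed entirely to their own features and the interaction term is split evenly between features 0 and 2, so the values come out as approximately 5, 3, and 1. Their sum equals the prediction minus the base value, as Equation (24) requires. The enumeration grows exponentially with the number of features, which is why this exact form is only practical for small feature sets.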

3.4. The Model Overall Operation Flow

The overall flow of the root cause tracing model for AEI changes is shown in Figure 1. The “a priori” section follows the literature [27], while the rest is original. The output of the “EOBE evaluation model” section is the score of each indicator. The data in the “Feature” section are derived from Section 4.2, “Eigenmatrix Construction”. First, the AEI score is calculated using the constructed EOBE evaluation model. Then, the Apriori algorithm is used to mine the association rules among indicators and extract the coupling features, and the feature matrix is constructed by combining them with the time series features and policy features. Next, the feature values and target values are fed into the SVR, DT, and FNN models, and their prediction accuracies are compared after Bayesian optimization. Finally, the optimal prediction model is used to calculate the SHAP values and analyze the feature contributions.

4. Case Analysis

4.1. Calculation of the Access to Electricity Index Score

4.1.1. Data Sources

The original data used in this study are those needed to calculate the score of the EOBE evaluation model; they were obtained through the official websites of the relevant departments. The raw data for the AEI calculation come from the Guangdong Energy Bureau, the National Bureau of Statistics, and the official website of China Southern Power Grid, and cover the period 2021–2024 for a region of Guangdong Province, as required by the improved AEI system of this paper. The types of raw data required for the calculation of the AEI are summarized in Table 2. The parameter definitions in the table all follow Section 2.2, “AEI calculation Method”. The data for the first six indicator parameters are derived from the results of the distributed questionnaire survey, while the data for the last three indicator parameters come from the business environment module of the online platform of China Southern Power Grid. The raw data fall into two overall types: expert judgment and actual values. The data required for indicators Y_{1}–Y_{6} are of the expert judgment type: experts assess the information on the official website of the data source and assign values to the original data. The data required for indicators Y_{7}–Y_{9} are of the actual value type: official, publicly released real data collected from the official website of the data source.

4.1.2. Calculation and Analysis of Indicator Scores

Following the improved AEI formula, the indicator score calculation model is built in MATLAB R2021a, and the raw data collected and organized for the indicator calculation are used as inputs to obtain the AEI score of a region in Guangdong Province, as shown in Figure 2, Figure 3 and Figure 4 (the data in Figure 2, Figure 3 and Figure 4 are all calculated from the data in Table 2 using the calculation method in Section 2.2).

As can be seen from Figure 2, during 2021–2024 the “access to electricity” score of the experimental area fluctuates around 95 points, with a maximum fluctuation range of no more than 5 points. Among the primary indicators, the scores of “power regulatory quality” and “public service level” remained at 100 points over the four years, while the score of the primary indicator “Enterprises’ gain power efficiency” was lower than the total score. This indicates that “power regulatory quality” and “public service level” in the experimental area were always at the forefront of the EOBE, whereas there is still considerable room for improvement in the efficiency with which enterprises gain power. As can be seen from Figure 3, the score curves of the two primary indicator groups, the quality of electric power regulations and the level of public service, both stay at 100 points, indicating that both are superior and stable in the experimental area. As can be seen from Figure 4, the scores of the secondary indicators in the “Enterprises’ gain power efficiency” group fluctuated somewhat over the four years but showed an overall upward trend. A possible cause is that the objects evaluated by these indicators are affected by changes in the market economy, climate instability, and other factors, which results in the relatively large fluctuation in the “Enterprises’ gain power efficiency” scores.

4.2. Eigenmatrix Construction

The level of access to electricity is usually affected by national policies, market economic development trends, climate change, and the internal coupling factors of the AEI. Therefore, to trace the root causes of changes in the level of access to electricity, the features of AEI changes are divided into time series features, coupling features, and policy features. The time series features reflect the influence of market economic development trends, climate change, and other factors on the level of access to electricity, and the influence of the year differs from that of the month. The coupling features reflect the influence of the internal coupling factors of the AEI. The policy features reflect the influence of national policy, national development plans, and other factors. Finally, the feature matrix of the experimental data set is constructed from the time series features, coupling features, and policy features.

4.2.1. Time Series Feature Extraction

In this paper, the AEI values are evaluated on a monthly cycle. The indicator scores of the same month form one group of data, which reflects the impact of time on the level of access to electricity. The year and month of each group are extracted as its time series features.

4.2.2. Coupling Feature Extraction

In this paper, the Apriori algorithm is used to mine the experimental data, and the coupling features of each group of data are extracted according to the mined association rules. First, the experimental data were categorized. Since the full score of each indicator is 100 points, the categorization interval was set to 5 points, so the scores of each indicator fall into 20 categories, each of which is labeled by combining the indicator type with the category number. For example, the label A12 indicates that the score of indicator A at that point in time is between 60 and 65.
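The categorization step can be sketched as a simple binning function. The numbering convention here is an assumption inferred from the A12 example: categories are taken to be 0-based (0 for scores from 0 up to 5, 19 for scores from 95 up to 100), which is the reading under which A12 corresponds to 60–65. The treatment of interval boundaries, including placing a score of exactly 100 in the top category, is also an assumption, and the function names are hypothetical.

```python
def score_to_category(score, width=5, n_categories=20):
    """Map a 0-100 score to a 0-based category index (assumed convention)."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    # A score of exactly 100 is placed in the top category (assumption).
    return min(int(score // width), n_categories - 1)


def encode_item(indicator, score):
    """Combine the indicator label with the category number, e.g. 'A12'."""
    return f"{indicator}{score_to_category(score)}"


print(encode_item("A", 62.5))  # A12
print(encode_item("A", 100))   # A19
```

Each month's record then becomes a set of such labels, one per indicator, which is the transaction format that association rule mining expects.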

Since the data sample used in this paper is small, the support and confidence obtained in trial mining runs can be used to estimate the optimal support and confidence thresholds for this data set. After three trial runs, 10% min_support and 50% min_confidence were found to be the best combination for mining two-item association rules, so min_support is set to 10% and min_confidence to 50%. Using the Apriori algorithm, the strong association rules that satisfy the minimum support and minimum confidence are obtained from the database of electric power index scores: Combination 1 (Outage Time 14 and Outage Frequency 19), Combination 2 (Outage Time 14 and Electricity Service Cost 17), Combination 3 (Outage Time 16 and Outage Frequency 19), Combination 4 (Outage Time 16 and Electricity Service Cost 17), Combination 5 (Outage Time 17 and Outage Frequency 19), Combination 6 (Outage Time 17 and Electricity Service Cost 18), Combination 7 (Outage Time 19 and Outage Frequency 19), Combination 8 (Outage Frequency 18 and Electricity Service Cost 17), and Combination 9 (Outage Frequency 19 and Electricity Service Cost 18). The strong association rules (support ≥ 10%, confidence ≥ 50%) that describe the internal coupling between pairs of access-to-electricity indicator scores are shown in Table 3 (the data in Table 3 are calculated from the data in Figure 4 through the model in the “a priori” section of Figure 1).
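The support and confidence thresholds can be made concrete with a small sketch restricted to two-item rules. It follows the Apriori idea of building candidate pairs only from items that are frequent on their own, but it is not the authors' implementation and it omits the lift check. The function name, the item labels (T, F, and C standing for outage time, outage frequency, and electricity service cost), and the four monthly records are all hypothetical.

```python
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.10, min_confidence=0.50):
    """Return (antecedent, consequent, support, confidence) for two-item rules."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: keep only items that are frequent on their own.
    items = sorted({item for t in sets for item in t})
    frequent = [i for i in items if support(frozenset([i])) >= min_support]

    rules = []
    # Level 2: candidate pairs are built only from frequent single items.
    for a, b in combinations(frequent, 2):
        pair_support = support(frozenset([a, b]))
        if pair_support < min_support:
            continue
        # Test both rule directions for each frequent pair.
        for ante, cons in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset([ante]))
            if confidence >= min_confidence:
                rules.append((ante, cons, pair_support, confidence))
    return rules


# Hypothetical monthly records, one category label per indicator.
months = [
    {"T14", "F19", "C17"},
    {"T14", "F19", "C18"},
    {"T16", "F19", "C17"},
    {"T17", "F18", "C18"},
]
rules = mine_pair_rules(months)
for rule in rules:
    print(rule)
```

In this toy data, T14 and F19 occur together in two of the four months (support 0.5), and every month containing T14 also contains F19 (confidence 1.0), so the rule from T14 to F19 passes both thresholds.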

The cause and result items of each strong association rule are used as the judgment condition: if, in the access-to-electricity indicator scores of a given month, the cause item of a strong association rule meets the condition and the score of the result item is less than or equal to the judgment condition, that month is regarded as conforming to the rule. Using this criterion, each group of data in the AEI score data set is checked in turn against the strong association rules in Table 3, and the number of rules that the group conforms to is taken as its coupling feature. Together, these counts form the coupling features of the experimental data set.
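The counting procedure can be sketched as follows. The interpretation coded here is an assumption: a month conforms to a rule when its category for the cause indicator equals the rule's cause category and its category for the result indicator is less than or equal to the rule's result category. The rule direction is also assumed (the first item of each combination is treated as the cause), and the function name and the sample month are hypothetical.

```python
def count_rule_matches(month_categories, rules):
    """Count how many rules a month's category record satisfies.

    month_categories maps an indicator name to its category index.
    Each rule is ((cause_indicator, cause_category),
                  (result_indicator, result_category)).
    """
    count = 0
    for (cause_ind, cause_cat), (result_ind, result_cat) in rules:
        # Cause item: the month's category must equal the rule's (assumption).
        cause_ok = month_categories.get(cause_ind) == cause_cat
        # Result item: the month's category must not exceed the rule's.
        result_val = month_categories.get(result_ind)
        result_ok = result_val is not None and result_val <= result_cat
        if cause_ok and result_ok:
            count += 1
    return count


# Two of the reported combinations, with an assumed cause -> result direction.
rules = [
    (("Outage Time", 14), ("Outage Frequency", 19)),
    (("Outage Time", 14), ("Electricity Service Cost", 17)),
]

# Hypothetical month: the category index of each indicator.
month = {"Outage Time": 14, "Outage Frequency": 19, "Electricity Service Cost": 18}
coupling_feature = count_rule_matches(month, rules)
print(coupling_feature)  # 1
```

The sample month satisfies the first rule but not the second, because its electricity service cost category of 18 exceeds the rule's 17, so its coupling feature is 1.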

4.2.3. Policy Feature Extraction

According to the evaluation content and objectives of each indicator, expert suggestions were collected to select the search keywords for each indicator. Keyword searches were then carried out on the official website of the National Energy Administration to collect and organize the policy documents related to each indicator over the past four years. After correlation testing and sorting, the number of government information items related to each indicator that were released per month during 2021–2024 is taken as the policy feature of this data set.

4.3. Comparison of Predictive Models

The indicator scores in the dimension of enterprises’ access to electricity efficiency are highly volatile and dynamic, so the scores of its three secondary indicators, Y_{7}, Y_{8}, and Y_{9}, in the region are predicted with a prediction model. To ensure accuracy, three different kinds of prediction models, namely SVR, DT, and FNN, are compared. First, Bayesian optimization is used to obtain the best hyperparameter combination for each model. The optimized models are then trained and used to predict the scores of the indicators average customer outage time, average customer outage frequency, and average customer electricity service cost. Finally, the prediction accuracies of the models are compared to select the best prediction model for the root cause tracing experiment of each indicator.

The training set accounts for 75% of the data and the test set for 25%. The prediction models were trained on the training set with minimization of the root MSE and the MAE as the objective, and Bayesian optimization was used to tune the model hyperparameters. The resulting optimal hyperparameter combinations for each model are shown in Table 4 (the parameters in Table 4 are the model parameters of the SVR, DT, and FNN models after Bayesian optimization).

The comparative results of the prediction experiments using the different prediction models for the three indicators of average customer outage time, average customer outage frequency, and average customer electricity service cost are shown in Figure 5 and Figure 6 (Figure 5 and Figure 6 compare the prediction results and prediction accuracy of the SVR, DT, and FNN models built with the parameters in Table 4, using the data in Figure 4 as the data set). The experimental results show that SVR is the best prediction model for all three indicators. A likely explanation is that SVR captures complex data patterns by finding the best regression hyperplane in a high-dimensional feature space. SVR generalizes better on high-dimensional, small data sets and captures the trends in the training data without overfitting to them. In contrast, DT and FNN overfit when predicting on small data sets, which lowers their prediction accuracy.

4.4. Contribution Analysis

Using the SHAP method, the SVR model is built with the best hyperparameter combination found by Bayesian optimization, the feature contributions for the three secondary indicators Y_{7}, Y_{8}, and Y_{9} are calculated, and the distributions of the influence of the features on the target value are visualized, as shown in Figure 7, Figure 8 and Figure 9 (Figure 7, Figure 8 and Figure 9 show, respectively, the contribution of each feature for indicators Y_{7}, Y_{8}, and Y_{9}, calculated by the contribution analysis model built with the parameters in Table 4 and the method of Section 3.3, “SHAP Value”). In general, the influences on AEI changes are, in descending order, the time series features, the policy features, and the coupling features.

The comparison of the SHAP values of each feature in Figure 7 shows that the features influencing indicator Y_{7} are, in descending order, the temporal feature (month), the temporal feature (year), the policy feature, and the coupling feature. The reasons are as follows. First, the average customer outage time is mainly affected by the professional level of maintenance personnel and the level of the power supply infrastructure, both of which change over time. Moreover, the time span of the year is larger than that of the month, and its influence on these time-dependent factors is also larger, so the influence of the time series feature (year) is slightly greater than that of the time series feature (month). Second, as the country is committed to improving the professionalism of maintenance personnel and upgrading power equipment, the many relevant policies that have been introduced also have a non-negligible impact on the average customer outage time. Finally, outage time is also affected by outage frequency, but this coupling is less influential than the other features.

The influence mechanism of each feature on Y_{7} was analyzed in depth. First, the influence of the year on Y_{7} is almost always positive, but it gradually weakens as the years pass, which may indicate that the incentive effect of time tends toward saturation. Second, the month has a positive effect in the first half of the year and a negative effect in the second half, which may indicate that the climate and other conditions in the first half of the year lead to shorter outage times. Third, policies have an overall promoting effect on Y_{7}, and the greater the policy intensity, the more obvious the promotion. Finally, when the coupling among indicators is strong, it promotes Y_{7}; when it is weak, it has almost no effect or even a slightly negative effect.

The comparison of the SHAP values of each feature in Figure 8 shows that the features influencing indicator Y_{8} are, in descending order, the temporal feature (month), the policy feature, the temporal feature (year), and the coupling feature. The reasons are as follows. First, air humidity and temperature vary with the month and season. At high temperatures, both the power load and the probability of power failures increase, which significantly affects the frequency of power outages. Second, the state attaches great importance to the reliability of power supply and, in response to the current situation of the industry, releases relevant policies and development and investment plans in a timely manner to promote power grid construction and optimize the operation mode of the power system, which significantly reduces the frequency of power outages. At the same time, the power system optimization and infrastructure renewal that accumulate over the years also affect the frequency of power outages. Finally, the effect of the coupling feature, which arises from the correlation between outage time and outage frequency, is smaller than that of the other features.

The influence mechanism of each feature on Y_{8} was analyzed in depth. First, over the course of January to December, the impact of the month on Y_{8} gradually decreases in the second half of the year, which may indicate that the power load is larger in the second half of the year and that the climate is then more likely to cause power outages. Second, the influences of policy and year on Y_{8} are almost entirely positive incentives. The difference is that the incentive effect of policy grows with the number of relevant policies, while the year has a relatively uniform positive incentive effect.

The comparison of the SHAP values of each feature in Figure 9 shows that the features influencing indicator Y_{9} are, in descending order, the time series feature (year), the policy feature, the time series feature (month), and the coupling feature, with the influence of the coupling feature being negligible. The reason is that the cost of electricity service is mainly influenced by market economic development trends and national policy regulation. In addition, the average customer electricity service cost is largely independent of the other indicators, so it is minimally affected by the coupling feature.

The influence mechanism of each feature on Y_{9} was analyzed in depth. First, as the years pass, the impact of the year on Y_{9} gradually changes from negative to positive and then grows. This may indicate that, over time, electricity service cost is gradually decreasing as a proportion of people’s disposable income and that people’s consumption level is steadily rising. Second, the more relevant policies there are, the stronger the incentive effect on Y_{9}; if there are too few policies, Y_{9} may be negatively affected. This may indicate that Y_{9} is significantly affected by macro-control policies. Finally, over the course of a year, the effect of the month on Y_{9} gradually shifts from negative to positive from January to December. This may indicate that the demand for electricity in the first half of the year is large, so electricity charges account for a relatively large proportion of residents’ disposable income.

4.5. Comparative Discussion of Similar Studies

4.5.1. Similar Studies

Reference [33] studied the electricity demand in Jiangsu Province and the seasonal and temporal influences on electricity prices using data-mining technology. The study shows that electricity demand fluctuates greatly in summer and winter, when electricity consumption is also relatively large. As a result, electricity charges in these two seasons are also higher.

Reference [34] predicted the electricity demand from 2010 to 2050 by mining and analyzing the electricity demand data before 2010, and analyzed the changes in electricity demand. The results show that the global demand for electricity is constantly increasing while fossil energy is constantly decreasing. The introduction of renewable energy power generation technology can meet global electricity demand. However, distributed power generation is strongly affected by the environment and has weak stability, which may reduce the reliability of power supply.

4.5.2. Comparative Discussions

This study and reference [33] are both based on power data from 2024, but the research locations differ. For winter electricity charges, the results of the two studies are consistent: both show higher electricity charges in winter. Both studies also indicate that electricity charges vary seasonally while showing an overall upward trend within one year. The difference is that the summer electricity bill found in this paper is higher than that in reference [33]. Analysis indicates that this is caused by a difference in electricity consumption due to the temperature difference between the two places.

This study and reference [34] are based on power data from 2024 and 2010, respectively. Both explain the continuous growth in electricity demand and, correspondingly, the increase in electricity charges and the growing difficulty of maintaining a reliable power supply. These results indicate that the efficiency with which enterprises obtain electricity is affected by temporal features.

This paper and the two similar studies above reveal the changing characteristics and trends of the indicator group of enterprises’ acquisition of power efficiency. Based on these results, reasonable predictions can be made for electricity charges, power outage time, and power outage frequency, so that decisions on possible future EOBE changes can be made in advance and the sound development of the EOBE can be promoted.

4.5.3. Policy Action Recommendations

In response to the research results of this article, we have put forward the following policy action suggestions:

The government can adjust electricity prices according to the seasonality of electricity consumption changes. This will prevent users’ monthly electricity bills from taking up too large a proportion of their income and optimize the local EOBE.
Under the background of the integration of new energy, the government can plan a reasonable energy storage configuration, better utilize new energy for power generation, and ensure that power enterprises can meet the electricity demands of power users as much as possible. This can minimize power outages caused by insufficient public power supply to the greatest extent.

5. Conclusions

This paper studies the improved AEI system and puts forward the root cause tracing method based on data mining and contribution analysis. The main conclusions are summarized as follows:

Based on the World Bank’s B-READY system, the AEI system has been improved by taking into account the quality of electric power regulations, the level of public service, and the efficiency of enterprises’ access to electricity, which expands the idea of constructing indicators for evaluating the EOBE.
Applying the Apriori algorithm to mine the association rules and visualizing the results helps to reveal the coupling relationships among the indicators and their influence mechanisms.
By comparing the prediction accuracy of the SVR, DT, and FNN models, SVR is selected as the optimal prediction model, and contribution analysis is performed with it. The main factors affecting AEI changes are analyzed; their influence on different indicators and their influence mechanisms differ. In general, ranked by influence from high to low, the features are the time series features, the policy features, and the coupling features. The study of the influence mechanism of each feature shows that the change in AEI is positively related to the year and to the number of related policies, and that the indicator of average customer outage time is positively incentivized by the coupling feature. As the year progresses from January to December, the average customer outage time decreases and the average customer outage frequency decreases, while the cost of electricity service increases.

The proposed improved AEI system provides a more scientific and reasonable basis for the accurate evaluation of the EOBE. The results of the root cause tracing experiment provide direction and targets for EOBE optimization.

This paper improves the AEI so that it evaluates the EOBE more accurately, and the root cause tracing of AEI changes has also achieved good results. However, given constantly changing social demands, policy reforms, and technological development, the improved AEI and the root cause tracing method proposed in this study still have certain limitations. Although the improved AEI adds a subjective assessment of the EOBE, the corresponding indicators rely on enterprise questionnaires, are vulnerable to the cognitive bias of respondents, and lack objective data support. In addition, the evaluation criteria of the objective indicators are overly static, making it difficult to reflect the impact of policy adjustments or emergencies dynamically. Meanwhile, as the factors become more complex and the number of samples grows, the root cause tracing method also needs continuous optimization of its feature extraction approach and a finer subdivision of the influencing factors.

In view of these limitations, we will continue to optimize the research on the AEI. The subjective evaluation indicators should be supported by evaluation criteria based on objective data. More assessment details should be added to the objective indicators, such as how the assessment standards should be adjusted in the face of policy adjustments and unexpected events. In this way, the AEI can evaluate the EOBE more comprehensively and accurately. Based on data mining and SHAP values, we will also explore more rigorous feature extraction and root cause tracing methods to improve the accuracy with which the factors influencing AEI changes and their influence mechanisms are mined and analyzed, and to provide a more reliable case basis and reference suggestions for EOBE optimization.

Author Contributions

Conceptualization, H.L. and W.Z.; methodology, X.Z.; software, Y.H.; validation, H.L. and Y.H.; formal analysis, W.Z.; investigation, W.Z.; resources, X.Z.; data curation, X.Z.; writing—original draft preparation, Y.H.; writing—review and editing, Y.H.; visualization, H.L.; supervision, Y.H.; project administration, H.L.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The authors declare that this study received funding from the China Southern Power Grid Company Limited Science and Technology Project, titled “Research and Application of Key Technologies for Smart Monitoring of the Electricity Business Environment Based on Multi-source Data Fusion and Privacy Computing” (Project No. 090000KC23030099). The funder had the following involvement with the study: Topic selection, data preparation, model construction, and experimental simulation.

Data Availability Statement

The data used in this research have been clearly stated in Section 4.

Acknowledgments

We thank Shenzhen Power Supply Bureau Co., Ltd. for the support of the project “Research and Application of Key Technologies for Smart Monitoring of the Electricity Business Environment Based on Multi-source Data Fusion and Privacy Computing”. We thank the academic editors and anonymous reviewers for their kind suggestions and valuable comments.

Conflicts of Interest

Authors Hongshan Luo, Xu Zhou and Weiqi Zheng were employed by the company Shenzhen Power Supply Bureau Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BE	Business environment
EOBE	electricity-optimized business ecosystem
AEI	access to electricity index
XGBoost	Extreme Gradient Boosting
NGO	Northern Goshawk Optimization
SVR	support vector regression
RF	random forest
CNN	convolutional neural network
min_support	minimum support
min_confidence	minimum confidence
DT	decision tree regression
B-READY	Business Ready
FNN	feedforward neural network
MSE	mean square error
MAE	mean absolute error
SHAP	SHapley Additive exPlanations

Figure 1. A flow chart of the root cause tracing model of AEI change.

Figure 2. Scores of the primary indicators of the AEI and the total AEI score.

Figure 3. Scores of secondary indicators.

Figure 4. Scores of the three secondary indicators in the “Enterprises’ gain power efficiency” indicator group.

Figure 5. Multi-model prediction results for the three secondary indicators in the “Enterprises’ gain power efficiency” indicator group: (a) Average customer outage time; (b) Average frequency of customer outages; (c) Average cost of electric service to customers.

Figure 6. Comparison of multi-model prediction accuracy for the three secondary indicators in the “Enterprises’ gain power efficiency” indicator group: (a) Average customer outage time; (b) Average frequency of customer outages; (c) Average cost of electric service to customers.

Figure 7. Comparison of SHAP values of each feature for the indicator “Average customer outage time”: (a) SHAP value; (b) mean absolute SHAP value.

Figure 8. Comparison of SHAP values of each feature for the indicator “Average frequency of customer outages”: (a) SHAP value; (b) mean absolute SHAP value.

Figure 9. Comparison of SHAP values of each feature for the indicator “Average cost of electric service to customers”: (a) SHAP value; (b) mean absolute SHAP value.

Table 1. Components of the AEI.

Primary Indicator	Secondary Indicator
power regulatory quality $A$	Joint planning and construction $Y_{1}$
	Inspection system for electrical installations $Y_{2}$
	Environmental sustainability of electricity supply $Y_{3}$
public service level $B$	Key performance indicator (KPI) for monitoring the reliability and sustainability of service delivery $Y_{4}$
	Transparency of electricity tariffs and tariff setting $Y_{5}$
	Electronic applications for power connections $Y_{6}$
enterprises’ gain power efficiency $C$	Average customer outage time $Y_{7}$
	Average frequency of customer outages $Y_{8}$
	Average cost of electric service to customers $Y_{9}$

Table 2. Summary of the raw data types used to calculate the AEI.

Indicator Name	Raw Data
Joint planning and construction $Y_{1}$	$α_{1a}, α_{1b}$
Inspection system for electrical installations $Y_{2}$	$α_{2a}, α_{2b}, α_{2c}, α_{2d}$
Environmental sustainability of electricity supply $Y_{3}$	$α_{3a}, α_{3b}, α_{3c}, α_{3d}$
KPI for monitoring the reliability and sustainability of service delivery $Y_{4}$	$α_{4a}, α_{4b}$
Transparency of electricity tariffs and tariff setting $Y_{5}$	$α_{5a}, α_{5b}, α_{5c}$
Electronic applications for power connections $Y_{6}$	$α_{6a}, α_{6b}$
Average customer outage time $Y_{7}$	SAIDI
Average frequency of customer outages $Y_{8}$	SAIFI
Average cost of electric service to customers $Y_{9}$	$E$, $R$

Table 3. Strong association rules for pairwise internal coupling among AEI indicator scores (support ≥ 10%, confidence ≥ 50%).

Number	Support (%)	Confidence (%)	Association Rule Description
1	12.5	85.71	Outage time 65~70 → Outage frequency 90~95
2	12.5	85.71	Outage time 65~70 → Electricity Service Cost 80~85
3	16.67	100	Outage time 75~80 → Outage frequency 90~95
4	14.58	87.5	Outage time 75~80 → Electricity Service Cost 80~85
5	27.08	100	Outage time 80~85 → Outage frequency 90~95
6	16.67	61.54	Outage time 80~85 → Electricity Service Cost 85~90
7	18.75	100	Outage time 90~95 → Outage frequency 90~95
8	14.58	77.78	Outage frequency 85~90 → Electricity Service Cost 80~85
9	31.25	88.24	Outage frequency 90~95 → Electricity Service Cost 85~90
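
The support and confidence columns of Table 3 can be read with a small worked example. The sketch below is illustrative only: the transactions are synthetic, the item names are hypothetical, and the counts (48 samples, 7 containing the antecedent, 6 containing both items) are an assumption chosen because they reproduce the percentages of rule 1. The actual sample size is not stated in this section.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of all transactions containing both items
    confidence = share of antecedent transactions that also contain the consequent
    """
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical item labels for the score bins of rule 1 in Table 3.
A = "outage_time_65_70"
B = "outage_frequency_90_95"

# Synthetic data (assumption): 6 samples with both bins, 1 with only the
# antecedent bin, and 41 unrelated samples, 48 in total.
transactions = [{A, B}] * 6 + [{A}] * 1 + [{"other"}] * 41

support, confidence = rule_metrics(transactions, A, B)
print(f"support = {support:.2%}, confidence = {confidence:.2%}")
# prints: support = 12.50%, confidence = 85.71%

# The rule is kept only if it passes both thresholds from the table caption.
is_strong = support >= 0.10 and confidence >= 0.50
```

Under these assumed counts, the rule passes both thresholds in the caption, so it would be retained as a strong rule.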

Table 4. Optimal combination of hyperparameters for multiple models.

Model	Hyperparameter	$Y_{7}$	$Y_{8}$	$Y_{9}$
SVR	C	14.3743464	107.8995	999.6546
	Gamma	2.8551	11.6181	33.5656
	Epsilon	0.2104	0.0022	0.8591
DT	minLS	1	1	1
	minParentSize	1	1	1
	maxNumSplits	47	50	50
	numPTS	2	3	1
FNN	HiddenLayerSize	77	43	13
	LearningRate	0.0482	0.0258	0.0701
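
To make the roles of the three SVR hyperparameters in Table 4 concrete, the following sketch evaluates them on made-up numbers. It assumes the common conventions of an RBF kernel of the form exp(-gamma * ||x - z||^2) and an epsilon-insensitive loss weighted by C. The kernel form, the software used, and the input values are assumptions; they are not taken from this section. Only C, Gamma, and Epsilon for $Y_{7}$ come from the table.

```python
import math

# Values copied from the SVR rows of Table 4, column Y_7.
C = 14.3743464
GAMMA = 2.8551
EPSILON = 0.2104


def rbf_kernel(x, z, gamma):
    """RBF kernel under the assumed convention exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def penalty(y_true, y_pred, c, epsilon):
    """Contribution of one sample to the C-weighted error term of the SVR objective."""
    return c * epsilon_insensitive_loss(y_true, y_pred, epsilon)


# Hypothetical inputs, for illustration only.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], GAMMA))      # identical points give 1.0
print(epsilon_insensitive_loss(80.0, 80.1, EPSILON))  # error inside the tube gives 0.0
print(penalty(80.0, 81.0, C, EPSILON))                # error outside the tube is penalized
```

A larger Gamma makes the kernel decay faster with distance, a larger Epsilon widens the band of errors that go unpenalized, and a larger C increases the cost of errors outside that band.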

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
