Defense Strategy Against False Data Injection Attacks on Cyber–Physical System for Vehicle–Grid Based on KNN-GAE

Li, Qiuyan; Song, Dawei; Wang, Yuanyuan; Wang, Di; Tao, Weijian; Ai, Qian

doi:10.3390/en18195215


by Qiuyan Li ¹, Dawei Song ¹, Yuanyuan Wang ¹, Di Wang ²,*, Weijian Tao ² and Qian Ai ²

¹ State Grid Henan Electric Power Company Economic and Technological Research Institute, Zhengzhou 450052, China

² School of Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(19), 5215; https://doi.org/10.3390/en18195215

Submission received: 31 August 2025 / Revised: 24 September 2025 / Accepted: 28 September 2025 / Published: 30 September 2025

(This article belongs to the Section E: Electric Vehicles)


Abstract

With the in-depth integration of electric vehicles (EVs) and smart grids, the Cyber–Physical System for Vehicle–Grid (CPSVG) has become a crucial component of power systems. However, its inherent characteristic of deep cyber–physical coupling also renders it vulnerable to cyberattacks, particularly False Data Injection Attacks (FDIAs), which pose a severe threat to the safe and stable operation of the system. To address this challenge, this paper proposes an FDIA defense method based on K-Nearest Neighbor (KNN) and Graph Autoencoder (GAE). The method first employs the KNN algorithm to locate abnormal data in the system and identify the attacked nodes. Subsequently, Graph Autoencoder is utilized to reconstruct the tampered and contaminated data with high fidelity, restoring the accuracy and integrity of the data. Simulation verification was conducted in a typical vehicle–grid interaction system scenario. The results demonstrate that, compared with various scenarios such as no defense, traditional detection mechanisms, and only location-based data elimination, the proposed KNN-GAE method can more accurately identify and repair all attacked data. It provides reliable data input that is closest to the true values for subsequent state estimation, thereby significantly enhancing the system’s state awareness capability and operational stability after an attack. This study offers new insights and effective technical means for ensuring the security defense of the Vehicle–Grid Interaction Cyber–Physical System.

Keywords: cyber–physical system for vehicle–grid; false data injection attack; k-nearest neighbors; graph autoencoder; state estimation

1. Introduction

In recent years, with the transformation of power systems toward intelligence and decarbonization, the Cyber–Physical System for Vehicle–Grid (CPSVG) has emerged as a bidirectional interaction platform between electric vehicles and power grids, gradually becoming a critical component of smart grids [1]. CPSVG constitutes a multidimensional complex system integrating computation, networking, and physical environments. Through the organic fusion and deep collaboration of 3C (Computing, Communication, Control) technologies, it enables real-time perception, dynamic control, and information services in vehicle–grid interactions [2]. Within CPSVG, real-time data exchange among vehicles, charging piles, and power grids supports functions such as demand response, peak–valley tariff scheduling, and load forecasting. However, its inherent complexity and interconnectedness also render it a potential target for cyberattacks.

Within CPSVG, massive heterogeneous vehicle terminals, charging piles, and grid nodes engage in high-frequency data exchanges via diverse wired and wireless networks, involving intricate authentication, authorization, encryption, and privacy protection mechanisms [3]. Tehrani [4] proposed a Smart Cyber–Physical Energy Systems (SCPES) model for electric vehicles, which provides a key reference for the intelligent management and control of on-board energy. With a CPS (Cyber–Physical System) as its core, the model achieves in-depth synergy between the cyber domain (data transmission, decision-making) and the physical domain (energy storage, conversion) through embedded computing and feedback networks, serving as a core support for enhancing vehicle autonomy and energy efficiency. When extended to the vehicle–grid interaction scenario (CPSVG), the model indicates that false data injection attacks are easily concealed by the dynamic characteristics of energy, so targeted cross-domain defense strategies are needed. Attackers may exploit any vulnerable link to infiltrate the cyber domain, thereby causing substantial damage to the physical domain, including grid stability and vehicle safety [5]. Traditional single-layered, static defense strategies exhibit limited efficacy in such highly dynamic and complex adversarial environments. Notably, FDIAs [6], characterized by strong concealment, severe harm, and difficulty of traceability, have emerged as a critical threat to the secure operation of vehicle-to-grid (V2G) systems and have garnered significant attention in recent years. By tampering with sensor data or fabricating control commands, attackers deceive the system into processing falsified information, thereby disrupting critical processes such as power load forecasting [7], electric vehicle charging/discharging scheduling [8], and distributed energy management [9]. This not only risks triggering erroneous responses from CPSVG, such as accommodating fabricated load demands or misguiding charging/discharging behaviors, but may also lead to large-scale power resource misallocation, economic losses, and safety hazards [10].

In addressing FDIAs, traditional approaches primarily rely on detection methods based on state estimation and data prediction. Hu [11] proposed a state estimation method employing equivalent measurement transformation, which utilizes weighted residuals to analyze attack data. Manandhar [12] adopted a Kalman estimation framework combined with a Euclidean detector, validating the method under three attack scenarios: random attacks, Denial of Service (DoS) attacks, and FDIAs. Data prediction-based methods compare predicted data with real-time measurements to determine the presence of FDIAs, with detection techniques including consistency test algorithms [13,14] and generalized likelihood ratio [15,16]. While these methods can identify conspicuous anomalous behaviors to some extent, their effectiveness diminishes as attack techniques evolve. Simple statistical and rule-matching approaches are increasingly inadequate against sophisticated attackers who disguise malicious data as legitimate. In recent years, machine learning-based defense methods have gained prominence due to their ability to extract features from vast historical datasets and flexibly identify concealed anomalous patterns. Huang [17] partitioned a centralized system into multiple subsystems and proposed a distributed FDIA detection strategy using a CNN-LSTM (Convolutional Neural Networks-Long Short-Term Memory) model for feature extraction and predictive classification. He [18] introduced a detection mechanism combining state estimation with deep learning identification, with functional validation conducted on large-scale test systems IEEE 118 and IEEE 300. Yang [19] integrated photovoltaic and electric vehicle loads into the traditional IEEE 33-node system to simulate a new energy internet, constructing dual Markov chain models based on measurement data mapped to two distinct spaces for FDIA detection.

However, existing machine learning methods exhibit limited effectiveness in processing highly dynamic and nonlinear data, particularly in vehicle–grid interaction scenarios where the high dynamism and complexity of power loads and vehicle behaviors significantly increase detection challenges. Consequently, there is an urgent need to develop more effective detection methods to address complex and evolving FDIAs. Generative Adversarial Networks (GANs), as a class of generative models, have recently achieved widespread applications in fields such as image generation and anomaly detection. Variant models like the Reweighted Generative Adversarial Network (REWGAN) have been employed for generating and detecting anomalous data, offering new opportunities for identifying sophisticated attacks. Xie [20] proposed a novel technique based on Bayesian Generative Adversarial Networks (GANs), which successfully distinguishes between secure and compromised measurement data even under severe data imbalance conditions. Wang [21] addressed False Data Injection (FDI) attacks during the interaction between physical and cyber layers in DC microgrids by introducing a GCNN-CGAN-based proactive defense method, transforming FDIA defense into a dual problem of attack detection and measurement data recovery. Liu [22] tackled FDIAs targeting State of Charge (SoC) estimation in Battery Energy Storage Systems (BESS) by proposing a TSCW-GAN defense framework.

Although GAN-based anomaly detection methods can be applied to FDIA (False Data Injection Attack) detection to a certain extent, they still have significant limitations when facing complex Vehicle–Grid Integration Cyber–Physical Systems (VGI-CPSs). This is first due to the dynamic nature of data in VGI-CPSs. Secondly, traditional machine learning-based defense strategies either focus on single data features or insufficiently model the dynamic coupling topology of the vehicle–grid system. As a result, they struggle to adapt to the topological changes caused by the integration of charging clusters into the distribution network, leading to poor recognition robustness. Graph Autoencoders (GAEs), leveraging their inherent advantage in modeling network topological structures, have demonstrated outstanding performance in data reconstruction tasks in recent years. Particularly in the processing of network topology-related data, GAE can effectively fuse node features with topological correlations to achieve accurate reconstruction, thereby providing reliable support for anomaly data detection. As Yan [23] stated, GAE operates based on a hybrid similarity graph that integrates normalized Euclidean distance and Pearson correlation coefficient. During the data reconstruction process, it can effectively retain the topological structure information between cells, facilitating data dimensionality reduction and feature mining. As Hamilton [24] stated, GAE is used to encode and reconstruct the vehicular social network; by detecting reconstruction errors, abnormal nodes can be accurately identified. This demonstrates GAE’s strong ability to preserve the integrity of network structures and the validity of data during data reconstruction.

Given these challenges, developing efficient and reliable FDIA defense mechanisms holds substantial theoretical significance and practical urgency for ensuring the security and stability of future energy transportation systems. The main contributions of this paper are as follows:

(1): Proposing an FDIA defense framework tailored for CPSVG, aligning system characteristics with attack defense requirements. The framework addresses the core features of the vehicle–grid interactive cyber–physical system (“deep coupling of information and physics,” “dynamic heterogeneity of data,” and “topology changes with charging cluster access”) as well as the characteristics of FDIA, such as strong concealment and ease of disguise within data fluctuations. On this basis, a two-stage defense framework of “attack localization–data reconstruction” is constructed. This framework overcomes the poor adaptability of traditional single defense strategies and achieves end-to-end precise defense against FDIAs.
(2): Designing a KNN-based attack localization mechanism leveraging spatiotemporal features to enhance attack detection accuracy. Based on the spatiotemporal correlation of CPSVG measurement data, a feature vector integrating “node historical time-series data + grid topological neighborhood data” is constructed. The non-parametric advantage of the KNN algorithm is utilized to adapt to the multi-modal distribution characteristics of the data. This mechanism effectively distinguishes normal load fluctuations from malicious attack signals, addressing the high missed detection rate of traditional BDD methods for concealed attacks and achieving precise localization of attacked nodes.
(3): Constructing a topology-aware GAE data reconstruction model to ensure high-fidelity data recovery. The CPSVG measurement system is modeled as a “node–edge feature” attribute graph, leveraging the inherent advantage of Graph Autoencoders (GAEs) in integrating node features with grid topology. The model is trained on normal operational data to learn the inherent physical constraints of the system. For the contaminated nodes localized by KNN, a mask-based reconstruction strategy is employed to achieve high-fidelity data repair, avoiding the information loss problem caused by “mere removal without reconstruction” and providing complete and reliable data input for state estimation.

2. Problem Statement

This section first models the overall architecture and operational mechanism of the CPSVG. It then models state estimation in the CPSVG and identifies the main targets of FDIA. Finally, it models the attack behavior of FDIA and describes its attack principle and the harm it causes to the system.

2.1. Modeling of the CPSVG

A typical CPSVG is a hierarchical architecture that deeply integrates physical energy facilities and multi-layer information networks. As shown in Figure 1, this architecture can be macroscopically divided into three functional layers with distinct roles: the terminal perception layer, the network aggregation layer, and the power grid regulation layer.

The terminal perception layer serves as the physical foundation of the entire system, consisting of a large number of geographically distributed electric vehicles (EVs) and their connected charging stations (CS). This layer not only acts as the terminal for executing physical processes such as charging and discharging, but also serves as the source of the system’s raw data. It is responsible for real-time sensing and collection of underlying operational data of vehicles, including the state of charge (SoC) of vehicle batteries and power status. The network aggregation layer is a key hub connecting the underlying terminals and the upper-level control, and is usually operated by an aggregator. It aggregates the data uploaded by hundreds or thousands of terminals within its jurisdiction, conducts regional resource optimization and scheduling management, and packages scattered vehicle resources into a uniformly dispatchable resource aggregate (e.g., a virtual power plant (VPP)). Meanwhile, it transmits aggregated information upward and issues electricity price signals and regional control instructions downward. At the top of the architecture is the power grid regulation layer, i.e., the power grid control center, which monitors the operating status of the entire power system from a global perspective. This layer is responsible for formulating top-level control strategies, such as system-level economic dispatch and safety verification, and issues global control instructions and guidance signals to the network aggregation layer according to the real-time needs of the power grid.

In this three-layer architecture, energy flows interact bidirectionally between the terminal perception layer and the physical power grid, while multi-directional information flows—including EV state data, charging power, control instructions, and electricity price signals—run through the three layers. This realizes the tight coupling and efficient collaboration between the information space and the physical space.

2.2. State Estimation of the CPSVG

State Estimation (SE) is a core technology for accurate monitoring and effective control of the CPSVG operation status, and it is also the primary penetration target of FDIA. By utilizing a large amount of redundant measurement data deployed in CPSVG, state estimation abstracts complex physical processes and information interactions into a refined mathematical model, and conducts a credible and optimal estimation of the current operation status of CPSVG. This provides comprehensive, reliable, and accurate data for power system economic dispatch, optimal power flow calculation, and other applications. Different from traditional power systems, the operation status of CPSVG—especially on the distribution network side that bears a large number of Electric Vehicle (EV) charging and discharging loads—exhibits high dynamics, randomness, and time-variability. Therefore, it is necessary to establish a state estimation model that can accurately describe this dynamic system.

Specifically, the operation status of CPSVG can be completely described by a state vector $x$. This vector is usually composed of the voltage amplitude $V$ and phase angle $\theta$ of $N$ key nodes (buses) in the distribution network. In practical modeling, one node is generally selected as the reference node ($\theta_{1} = 0$), so the state vector $x \in \mathbb{R}^{(2N - 1) \times 1}$ can be expressed as:

$$x = [\theta_{2}, \dots, \theta_{N}, V_{1}, \dots, V_{N}]^{T} \tag{1}$$

To solve this state vector, the system control center needs to collect real-time measurement data from each key node (including distribution network nodes connected to a large number of charging piles), and these data form a measurement vector $z$. This vector is a “physical–information” hybrid dataset, which not only includes grid physical quantities such as line power flow and bus power injection collected by the traditional SCADA system, but more importantly, it also contains a large amount of Vehicle-to-Grid (V2G)-specific data from the information domain. These data mainly include the total charging and discharging power of the managed charging station clusters, as well as the voltage quality monitoring data of key charging nodes, which are packaged and uploaded by the aggregator. Therefore, the measurement vector $z$ is a high-dimensional dataset that integrates multiple sources and has distinct Cyber–Physical System (CPS) characteristics. In a normally operating system, the number of measurements $m$ is much larger than the number of state variables $2N - 1$ to ensure the observability and estimation accuracy of the system.

The relationship between the state vector $x$ and the measurement vector $z$ is determined by the physical laws of the system (i.e., the power flow equations of the network). During state estimation, the DC power flow model is usually adopted, or the AC power flow equations are linearized near the current operating point, thus obtaining the following linearized measurement model:

$$z = H x + e \tag{2}$$

In this model, $H \in \mathbb{R}^{m \times (2N - 1)}$ is the measurement Jacobian matrix, whose structure and element values are uniquely determined by the topology and line parameters of the distribution network. The vector $e \in \mathbb{R}^{m \times 1}$ represents measurement noise, which originates from the inherent measurement error of sensors and communication interference. It is usually modeled as a random vector following a zero-mean Gaussian distribution, i.e., $e \sim \mathcal{N}(0, R)$, where $R$ is the covariance matrix of measurement errors.
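The measurement model in Equation (2) can be illustrated with a small numerical sketch. The matrix `H`, the state values, and the noise level below are hypothetical toy values, and the weighted least-squares estimator is an assumption made for illustration (a common choice for linear state estimation), not something specified by the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system: 3 state variables, 6 redundant measurements (m > n).
H = np.array([
    [1.0,  0.0,  0.0],
    [0.0,  1.0,  0.0],
    [0.0,  0.0,  1.0],
    [1.0, -1.0,  0.0],
    [0.0,  1.0, -1.0],
    [1.0,  0.0, -1.0],
])
x_true = np.array([0.02, -0.01, 1.00])   # illustrative state values
sigma = 0.01                              # assumed sensor noise standard deviation
R = sigma**2 * np.eye(H.shape[0])         # measurement error covariance

e = rng.normal(scale=sigma, size=H.shape[0])  # e ~ N(0, R)
z = H @ x_true + e                            # Equation (2)


def wls_estimate(H, z, R):
    """Weighted least squares: minimize (z - Hx)^T R^{-1} (z - Hx)."""
    W = np.linalg.inv(R)
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ z)


x_hat = wls_estimate(H, z, R)
residual = np.linalg.norm(z - H @ x_hat)
print("estimate:", x_hat, "residual norm:", residual)
```

Because the measurements are redundant, the estimate stays close to the true state and the residual norm is of the same order as the noise.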

The measurement vector $z$ in the equation is the core target of the False Data Injection Attack studied in this paper. For CPSVG, the measurement vector $z$ is not merely a collection of physical sensor readings; its security boundary has been greatly expanded. Attackers are no longer limited to attacking the physical equipment of substations, but can shift their targets to more vulnerable information links. For example, they can infiltrate the aggregator’s database through network attacks to tamper with the power data uploaded to the power grid, or they can crack communication protocols to inject a large amount of forged vehicle charging status information into the system. This attack method uses the inherent data volatility of the V2G system as a cover, making it extremely difficult to distinguish maliciously injected data from fluctuations caused by normal vehicle charging and discharging behaviors. A successful FDIA will directly contaminate the data source of state estimation, leading the power grid control center to make fatal misjudgments about the bearing capacity of the distribution network, and then issue incorrect dispatch instructions. This may cause economic losses in mild cases or trigger voltage collapse of the local power grid or line overload in severe cases, seriously threatening the safe and stable operation of CPSVG.

2.3. Modeling of FDIA in the CPSVG

Once an attacker successfully infiltrates the information collection link of the system, they inject an attack vector $a$ into the real measurement vector $z$. At this point, what the power grid control center receives is no longer the real measurement data, but a maliciously tampered data vector $z_{a}$, whose mathematical expression is:

$$z_{a} = z + a = H x + e + a \tag{3}$$

Here, the non-zero elements of the attack vector $a \in \mathbb{R}^{m \times 1}$ correspond to the attacked measurement channels. In the CPSVG scenario, these channels may be a compromised aggregator data server or a group of smart charging piles infected with malicious firmware. Unlike the random communication noise $e$, the attack vector $a$ is deliberately constructed by the attacker to achieve specific purposes, and its design directly determines the stealthiness and harmfulness of the attack.

FDIA is characterized by “stealthiness”, meaning it can effectively evade the traditional Bad Data Detection (BDD) mechanism based on state estimation residuals [25]. The traditional BDD detects anomalies by calculating the measurement residual $r = \| z - H \hat{x} \|$ and comparing it with a preset threshold $\tau$. If $r > \tau$, it is determined that bad data exists. An attacker with sufficient knowledge of the power grid topology can construct an attack that this mechanism cannot detect. Specifically, the attacker can construct an attack vector $a$ such that it lies entirely within the space spanned by the column vectors of matrix $H$. The most typical construction method is:

$$a = H c \tag{4}$$

where $c \in \mathbb{R}^{(2N - 1) \times 1}$ is a non-zero vector arbitrarily chosen by the attacker, representing the offset they intend to induce in the system state estimation results. When this attack vector is injected into the system, the contaminated measurement vector becomes:

$$\begin{aligned} z_{a} &= z + H c \\ &= H (x + c) + e \end{aligned} \tag{5}$$

The control center performs state estimation based on $z_{a}$, and the resulting estimate will be $\hat{x}_{a} \approx \hat{x} + c$. At this point, taking $\hat{x} \approx x$, the calculated measurement residual is:

$$\begin{aligned} r_{a} &= \| z_{a} - H \hat{x}_{a} \| \\ &\approx \| H (x + c) + e - H (x + c) \| \\ &= \| e \| \end{aligned} \tag{6}$$

It can be seen that the residual after the attack is almost identical to the noise level under normal operating conditions, so $r_{a}$ will be much smaller than the detection threshold $\tau$.
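The stealth property in Equations (4) to (6) can be checked numerically. The following is a minimal sketch on a hypothetical toy system: `H`, the state, the offset `c`, and the single-channel "naive" attack are illustrative assumptions, and an ordinary least-squares estimator (equal noise variances) stands in for the control center's state estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy measurement matrix (6 measurements, 3 states).
H = np.array([
    [1.0,  0.0,  0.0],
    [0.0,  1.0,  0.0],
    [0.0,  0.0,  1.0],
    [1.0, -1.0,  0.0],
    [0.0,  1.0, -1.0],
    [1.0,  0.0, -1.0],
])
x_true = np.array([0.02, -0.01, 1.00])
z = H @ x_true + rng.normal(scale=0.01, size=H.shape[0])  # clean measurements


def estimate(H, z):
    """Least-squares state estimate (equal-variance noise assumed)."""
    return np.linalg.lstsq(H, z, rcond=None)[0]


def residual(H, z):
    """BDD residual r = ||z - H x_hat||."""
    return np.linalg.norm(z - H @ estimate(H, z))


c = np.array([0.05, 0.0, -0.03])   # offset the attacker wants in the estimate
a_stealth = H @ c                   # Equation (4): attack lies in the column space of H
a_naive = np.zeros(H.shape[0])
a_naive[3] = 0.5                    # tampering with one channel only

r_clean = residual(H, z)
r_stealth = residual(H, z + a_stealth)
r_naive = residual(H, z + a_naive)
shift = estimate(H, z + a_stealth) - estimate(H, z)

print("clean residual:  ", r_clean)
print("stealth residual:", r_stealth)   # matches the clean residual
print("naive residual:  ", r_naive)     # noticeably larger
print("estimate shift:  ", shift)       # equals c up to rounding
```

In this linear setting the stealthy attack shifts the estimate by exactly `c` while leaving the residual unchanged, whereas the single-channel attack raises the residual and would be visible to a threshold test.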

A successful stealthy FDIA will cause serious consequences to the CPSVG system, with impacts far exceeding mere data errors. Since the state estimation result is offset by $c$ ($\hat{x}_{a} = \hat{x} + c$), the control center will perceive and make decisions based on the erroneous state. For example, an attacker can design the vector $c$ to artificially create false line congestion or voltage violation issues. Based on this erroneous system state, the control center may believe that the distribution network in a certain area cannot bear more loads due to the excessive concentration of EV charging, thereby issuing instructions to the aggregator in that area to reduce charging power or even stop V2G services. This will not only cause economic losses to aggregators and EV users, but may also result in a lack of adjustable resources when the power grid actually needs V2G resources for peak shaving and frequency modulation, threatening the stable operation of the power grid. Conversely, attackers can also use attacks to cover up real system problems, allowing potential overload or voltage stability issues to deteriorate continuously, ultimately leading to physical equipment damage or power outages.

3. Defense Strategy Based on K-Nearest Neighbor-Graph Autoencoder

3.1. Overall Design of the Defense Framework

There are two core problems faced by the CPSVG system after suffering from False Data Injection Attack (FDIA): first, how to quickly and accurately locate the attacked measurement data; second, how to perform high-fidelity recovery of the contaminated data after location. This section proposes an FDIA defense strategy based on K-Nearest Neighbor–Graph Autoencoder (KNN-GAE), and the specific technical path and data processing flow of this defense strategy are shown in Figure 2.

Step 1: Data Input. The starting point of the defense process is that the data acquisition system in the control center receives the real-time measurement vector $z_{a}$ contaminated by False Data Injection Attack (FDIA). This vector has a high dimension and integrates multi-source heterogeneous data, including power, voltage, and other parameters from traditional power grid sensors and various Vehicle-to-Grid (V2G) aggregators.

Step 2: Attack Localization. The received vector $z_{a}$ is first sent to the attack localization module based on the K-Nearest Neighbors (KNN) algorithm. This module leverages the multivariate coupling pattern of the V2G system learned from massive historical normal operation data to quickly classify each measurement dimension in the vector $z_{a}$, and outputs an index set $I_{a}$ marking the positions of all suspicious data points.

Step 3: Data Reconstruction. Subsequently, the original contaminated vector $z_{a}$ and the index set $I_{a}$ output by the localization module are jointly transmitted to the hybrid data reconstruction module based on the Graph Autoencoder (GAE). This module takes the “healthy” data in the vector $z_{a}$ that is not marked by the index set $I_{a}$ as a reliable anchor for the current system state. It maps the measurement data to the real power grid topology graph and uses the powerful spatial information reasoning capability of graph neural networks to repair the contaminated node data.

Step 4: Data Output. The final output of the framework is a reconstructed trustworthy measurement vector $z_{rec}$. This vector is ultimately sent to the system’s state estimator and other advanced application modules, ensuring that subsequent key decisions such as situational awareness, economic dispatch, and safety control are based on accurate and reliable data, thereby safeguarding the safe and efficient operation of the entire CPSVG.
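The data flow of the four steps can be sketched as a small pipeline. Only the flow itself is taken from the description above. The function names are hypothetical, and the two stand-in components (a fixed range check and a mean of healthy values) are deliberately simplistic placeholders, not the KNN classifier or the GAE.

```python
from statistics import mean


def defend(z_a, localize, reconstruct):
    """Run the localization -> reconstruction flow on one measurement vector.

    localize(z_a) returns the index set I_a of suspicious entries;
    reconstruct(z_a, I_a) returns replacement values keyed by index.
    Entries outside I_a are passed through unchanged.
    """
    I_a = set(localize(z_a))
    repaired = reconstruct(z_a, I_a)
    z_rec = [repaired[i] if i in I_a else v for i, v in enumerate(z_a)]
    return z_rec, I_a


# Stand-ins for illustration only: NOT the KNN classifier or the GAE.
def toy_localize(z_a, lo=0.9, hi=1.1):
    """Flag entries outside an assumed plausible range."""
    return {i for i, v in enumerate(z_a) if not lo <= v <= hi}


def toy_reconstruct(z_a, I_a):
    """Replace flagged entries with the mean of the unflagged ones."""
    healthy = [v for i, v in enumerate(z_a) if i not in I_a]
    return {i: mean(healthy) for i in I_a}


z_a = [1.0, 1.02, 1.6, 0.98]          # hypothetical per-unit voltage readings
z_rec, I_a = defend(z_a, toy_localize, toy_reconstruct)
print(I_a, z_rec)
```

The point of the structure is that the healthy entries anchor the repair: only indices in `I_a` are overwritten, and everything else reaches the state estimator untouched.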

3.2. Rapid Localization of Attacked Data via KNN

3.2.1. Principle of the KNN Algorithm and Its Applicability in CPSVG

This section uses the KNN classification algorithm to conduct rapid and efficient preliminary screening and localization of real-time measurement data contaminated by FDIA. Due to the random charging behavior of a large number of electric vehicle users in the CPSVG system, its measurement data often does not follow a simple Gaussian distribution, and presents complex nonlinear and multimodal characteristics. As a non-parametric model, KNN does not require any prior assumptions about data distribution and can better adapt to such complex data patterns.

The KNN algorithm is a common classification algorithm that predicts the category of an input sample from the samples closest to it in the feature space: if most of the k nearest neighbors of a sample belong to a certain category, the sample is assigned to that category as well.

3.2.2. Feature Construction and Training for Attack Localization

To enable KNN to effectively distinguish between normal data fluctuations and malicious attack injections, it is essential to design feature vectors that can fully reflect the inherent correlation of data. Using only the measured values themselves as features can easily misjudge normal load mutations as attacks. Therefore, this section constructs a feature vector that can capture the spatiotemporal correlation of data simultaneously. For any measured value $z_{i}(t)$ collected at time $t$, its feature vector $f_{i}(t)$ is constructed as follows:

Time dimension: It includes the measured value $z_{i}(t)$ of the data point at the current moment and the historical values $\{z_{i}(t - 1), z_{i}(t - 2), \dots, z_{i}(t - \Delta t)\}$ of the past $\Delta t$ time steps. This part of the features aims to describe the temporal evolution law of a single measuring point itself.

Spatial dimension: It includes the measured values $\{z_{j}(t) \mid j \in \Omega_{i}\}$ of the set of neighbor nodes $\Omega_{i}$ directly connected to node $i$ in the power grid topology at time $t$. This part of the features utilizes the physical law of the power grid that “the states of adjacent nodes are similar” and introduces spatial constraints.

The feature vector makes the discrimination of any data point depend on its own historical trends and the current states of surrounding nodes, enhancing the sensitivity to covert attacks.
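The construction of the spatiotemporal feature vector can be sketched as follows. The data layout (`history[k][s]` holding the measurement of node `k` at step `s`), the function name, and the toy values are assumptions made for illustration.

```python
def build_feature(history, neighbors, i, t, window):
    """Spatiotemporal feature vector f_i(t) for node i at time step t.

    history[k][s] is the measurement of node k at step s (assumed layout);
    neighbors[i] is the set of nodes directly connected to node i;
    window is the number of past steps (Delta t in the text).
    """
    # Time dimension: z_i(t), z_i(t-1), ..., z_i(t - window)
    temporal = [history[i][t - lag] for lag in range(window + 1)]
    # Spatial dimension: z_j(t) for every neighbor j, in a fixed order
    spatial = [history[j][t] for j in sorted(neighbors[i])]
    return temporal + spatial


# Toy data: three nodes, four time steps (values are illustrative).
history = {
    0: [1.00, 1.01, 1.02, 1.03],
    1: [0.98, 0.99, 1.00, 1.01],
    2: [0.97, 0.97, 0.98, 0.99],
}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}

f0 = build_feature(history, neighbors, i=0, t=3, window=2)
print(f0)
```

One practical caveat of this sketch: nodes with different numbers of neighbors produce feature vectors of different lengths, so a shared classifier would need padding, neighbor aggregation, or one classifier per node. The text above does not say which option is used.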

The training dataset of the model consists of two parts. Normal samples (labeled 0) are directly extracted from the cleaned and verified historical normal operation database of the CPSVG system and constructed into the above spatiotemporal feature vectors. Abnormal samples (labeled 1) are generated by superimposing covert attack vectors $a = H c$ on normal historical data for simulation. By injecting a large number of attack vectors with different amplitudes and positions, we can build a diverse attack sample library, thereby training a robust KNN classifier.

3.2.3. Online Attack Localization Process

During the online operation phase of the defense framework, when a new, potentially contaminated measurement vector $z_{a}$ arrives at time $t$, the system will iterate through each measurement component $z_{a, i}(t)$ in this vector and construct the corresponding spatiotemporal feature vector $f_{i}(t)$ in real time. Subsequently, $f_{i}(t)$ is fed into the KNN classifier to calculate the Euclidean distance between $f_{i}(t)$ and all training samples. Specifically, for the feature vector $f_{i}(t)$ to be classified and the feature vector of a certain training sample $f_{j}^{train}(t)$, their Euclidean distance $d$ is calculated as follows:

$$d\left(f_{i}(t), f_{j}^{train}(t)\right) = \sqrt{\sum_{k = 1}^{D} \left( (f_{i}(t))_{k} - (f_{j}^{train}(t))_{k} \right)^{2}} \tag{7}$$

where $D$ represents the total dimension of the feature vector.

The classifier identifies the $K$ nearest neighbors and conducts voting based on the labels of these $K$ neighbors. If the voting result determines that the category is “abnormal” (labeled 1), the index $i$ of this data point will be added to an abnormal index set $I_{a}$. After iterating through all measurement components, the finally formed index set $I_{a}$ will be transmitted to the graph autoencoder module, providing it with accurate targets that require data reconstruction.
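The distance computation in Equation (7) and the majority vote can be sketched in plain Python. The function names, the two-dimensional toy features, and the choice of `k = 3` are illustrative assumptions. A production system would more likely use an indexed nearest-neighbor search than this brute-force scan.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Equation (7): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(f, train_X, train_y, k=3):
    """Label f by majority vote among its k nearest training samples."""
    ranked = sorted((euclidean(f, x), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def localize(features, train_X, train_y, k=3):
    """Return the index set I_a of components classified as abnormal (label 1)."""
    return {i for i, f in features.items()
            if knn_predict(f, train_X, train_y, k) == 1}


# Toy training set: label 0 = normal, label 1 = attacked (values illustrative).
train_X = [[1.00, 1.00], [1.01, 0.99], [0.99, 1.00],
           [1.50, 1.00], [1.60, 1.10], [1.40, 0.90]]
train_y = [0, 0, 0, 1, 1, 1]

# Feature vectors of two incoming measurement components, keyed by index i.
features = {0: [1.00, 1.01], 1: [1.55, 1.00]}
I_a = localize(features, train_X, train_y, k=3)
print(I_a)
```

An odd `k` avoids tied votes in this two-class setting.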

3.3. Topology Aware Data Reconstruction Based on Graph Autoencoder

The goal of this phase is to achieve high-precision and physically interpretable recovery of contaminated data. The core is to utilize the Graph Autoencoder (GAE), which incorporates the inherent physical topology of the power grid as strong prior knowledge to guide and constrain the data reconstruction process.

3.3.1. Graph Modeling of CPSVG Measurement System

To analyze the CPSVG system using graph neural networks, the measurement network of the system is first formally defined as an attribute graph $G = (U, E, X)$, whose components are defined as follows:

(1): Nodes. Nodes ( $U$ ) are defined as the set of all key measurement entities in the CPSVG system. Each node $u_{i} \in U$ uniquely corresponds to a physical location, such as a grid bus, the grid connection point of a V2G charging station, or the data aggregation point of a vehicle aggregator.
(2): Edges. Edges ( $E$ ) are defined as the set of physical and electrical connections between nodes. If there is a direct power line between nodes $u_{i}$ and $u_{j}$ , then edge $(u_{i}, u_{j}) \in E$ . These connection relationships are encoded into a static adjacency matrix $A \in ℝ^{N \times N}$ , which serves as the structured input of the graph, where if nodes $i$ and $j$ are connected, $A_{i j} = 1$ , otherwise 0.
(3): Node Features. Node Features ( $X$ ) are defined as the set of measurement values of all nodes at a specific time $t$ . It is a dynamic feature matrix $X (t) \in ℝ^{N \times 1}$ mapped from the system measurement vector $z_{a}$ at that time, where the values in the $i$ -th row are the real-time measurement values of node $u_{i}$ .
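As a concrete illustration of the three components, the following sketch builds $A$ and $X(t)$ for a small graph. The four-node topology, the measurement values, and the helper names `build_adjacency` and `build_features` are assumptions made for the example only.

```python
def build_adjacency(num_nodes, lines):
    """Static 0/1 adjacency matrix A from a list of undirected lines (i, j)."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in lines:
        A[i][j] = 1
        A[j][i] = 1  # a power line connects both ends, so A is symmetric
    return A

def build_features(z_a):
    """Feature matrix X(t) of shape N x 1: one measurement value per node."""
    return [[value] for value in z_a]

# Hypothetical 4-node feeder: chain 0-1-2 with node 3 branching off node 1.
lines = [(0, 1), (1, 2), (1, 3)]
A = build_adjacency(4, lines)
X = build_features([1.00, 0.98, 0.97, 0.96])
```

Because the topology is static, $A$ is built once, while $X(t)$ is rebuilt from each incoming measurement vector.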

3.3.2. Principle and Training of Graph Autoencoder (GAE)

Graph Autoencoder (GAE) is an unsupervised deep learning model specifically designed for graph-structured data. Its core goal is to learn low-dimensional latent representations of nodes in the graph while preserving the topological structure information and node feature information of the graph. In the defense framework of this paper, GAE is used to learn the inherent patterns that the whole-grid measurement data should follow under physical topology constraints when the CPSVG system is in normal operation. It has a symmetric structure, consisting of a graph encoder and a graph decoder.

(1): Graph Encoder

The function of the encoder is to compress the input high-dimensional graph signals into a low-dimensional, information-dense latent space. A Graph Convolutional Network (GCN) is used as the core building block of the encoder. GCN updates node representations by aggregating neighborhood information layer by layer. For a standard GCN layer, the node feature propagation rule from layer $l$ to layer $l + 1$ can be expressed as:

H^{(l + 1)} = σ ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} H^{(l)} W^{(l)})

(8)

where $H^{(l)}$ is the feature matrix of the $N$ nodes in layer $l$, and $H^{(0)}$ is the original input node feature matrix $X$. $\tilde{A} = A + I_{N}$ is the adjacency matrix with self-loops added, which enables nodes to retain their own feature information during information aggregation. $\tilde{D}$ is the degree matrix of $\tilde{A}$, which is a diagonal matrix. ${\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}}$ is the symmetric normalization of the adjacency matrix, which keeps the scale of the aggregated features from growing with node degree and thereby makes model training more stable. $W^{(l)}$ is the weight matrix of layer $l$, which is a trainable parameter of the neural network. $σ$ is a nonlinear activation function, such as ReLU (Rectified Linear Unit).
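To make the propagation rule in Equation (8) concrete, the sketch below evaluates a single GCN layer in plain Python. It is an illustration only: the three-node chain, the fixed weight matrix `W0`, and the helper names are assumptions, and a real encoder would stack several such layers with learned weights.

```python
import math

def matmul(P, Q):
    """Product of two list-of-lists matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def normalized_adjacency(A):
    """Compute D~^{-1/2} (A + I_N) D~^{-1/2}."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # diagonal of D~
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(A, H, W):
    """One propagation step of Eq. (8), with ReLU as the activation."""
    Z = matmul(matmul(normalized_adjacency(A), H), W)
    return [[max(0.0, v) for v in row] for row in Z]

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # 3-node chain (assumed topology)
H0 = [[1.0], [2.0], [3.0]]             # H^(0) = X, one feature per node
W0 = [[1.0, -1.0]]                     # illustrative weights, 1 -> 2 features
H1 = gcn_layer(A, H0, W0)
```

With these weights the second output feature is the negative of the first before activation, so ReLU zeroes it, which shows where the nonlinearity acts.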

By stacking several GCN layers, the encoder ${GCN}_{enc}$ can take the original node feature matrix $X$ and adjacency matrix $A$ as inputs and finally output a low-dimensional latent representation matrix $Z \in ℝ^{N \times d}$ (where $d ≪ N$), i.e., $Z = {GCN}_{enc} (X, A)$.

(2): Graph Decoder

The task of the decoder is to take the latent representation $Z$ generated by the encoder and reconstruct the original node features from it as faithfully as possible. The decoder ${GCN}_{dec}$ is designed as a graph convolutional network symmetric to the encoder structure, whose goal is to map the low-dimensional $Z$ back to the original feature space to obtain the reconstructed node feature matrix $\hat{X}$:

\hat{X} = {GCN}_{dec} (Z, A)

(9)

If the input data is healthy and free of attacks, the reconstructed $\hat{X}$ should be highly similar to the original input $X$.

(3): Unsupervised Training of GAE

The training process of GAE is offline and unsupervised. We use a large-scale graph signal dataset $\{X_{1}, X_{2}, \dots, X_{T}\}$ containing only historical normal operation data of the CPSVG system for training. The optimization goal of the model is to minimize the difference between the original node features $X_{t}$ and the decoder-reconstructed features ${\hat{X}}_{t}$. The Mean Squared Error is used as the loss function $L$, which is obtained by calculating the square of the Frobenius norm of the difference between the two matrices:

L = \frac{1}{T} \sum_{t = 1}^{T} {‖X_{t} - {\hat{X}}_{t}‖}_{F}^{2} = \frac{1}{T} \sum_{t = 1}^{T} \sum_{i = 1}^{N} {({(X_{t})}_{i} - {({\hat{X}}_{t})}_{i})}^{2}

(10)

Through optimization based on backpropagation, GAE learns an encoding–decoding mapping that minimizes this loss. This training enables the model to capture the inherent correlations between measurement data under the given grid topology constraints. A fully trained GAE model can therefore recognize data points that deviate from this pattern and replace them with values consistent with it, which gives it the ability to reconstruct FDIA-contaminated data.
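The loss in Equation (10) is simply the squared Frobenius error averaged over the $T$ training snapshots. A minimal sketch of that computation follows; the function names and the two toy snapshots are assumptions, and in practice this quantity would be evaluated by a deep learning framework during backpropagation rather than by hand.

```python
def frobenius_sq(X, X_hat):
    """Squared Frobenius norm of X - X_hat for list-of-lists matrices."""
    return sum((x - y) ** 2
               for row, row_hat in zip(X, X_hat)
               for x, y in zip(row, row_hat))

def reconstruction_loss(snapshots, reconstructions):
    """Eq. (10): squared Frobenius error averaged over T graph signals."""
    T = len(snapshots)
    return sum(frobenius_sq(X, X_hat)
               for X, X_hat in zip(snapshots, reconstructions)) / T

# Two toy N x 1 snapshots and their (assumed) reconstructions.
X1, X1_hat = [[1.0], [2.0], [3.0]], [[1.0], [2.5], [3.0]]
X2, X2_hat = [[0.0], [1.0], [0.0]], [[0.5], [1.0], [0.0]]
loss = reconstruction_loss([X1, X2], [X1_hat, X2_hat])
print(loss)  # (0.25 + 0.25) / 2 = 0.25
```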

3.3.3. Online Topology-Aware Reconstruction Process

In the online operation phase of the defense strategy, after receiving the contaminated measurement vector and the abnormal index set output by the KNN module, the pre-trained GAE model will execute the following data reconstruction process:

(1): Graph Signal Construction: First, map the real-time measurement vector $z_{a}$ to the graph node feature matrix $X_{a}$ at the current moment.

(2): Node Masking: According to the abnormal index set $I_{a}$ , locate all nodes determined to be abnormal in the feature matrix $X_{a}$ . The system sets the feature values of these nodes to 0, forming a feature matrix $X_{a}^{'}$ containing unknown information.
(3): Topology-Aware Reconstruction: Input the graph signal $(X_{a}^{'}, A)$ containing unknown information into GAE. Although the features of some nodes are missing, the encoder can still infer the macroscopic operating state of the power grid from a large number of healthy nodes and their neighbor relationships and generate a reasonable latent representation $Z$ . Subsequently, the decoder uses this latent representation $Z$ to infer values that conform to global physical laws and neighborhood states for those masked nodes, outputting a complete, reconstructed feature matrix ${\hat{X}}_{a}$ .
(4): Data Output: Finally, the system replaces the values at the positions corresponding to the index set $I_{a}$ in the original contaminated vector $z_{a}$ with the corresponding values from the reconstructed matrix ${\hat{X}}_{a}$ , while keeping the remaining healthy data unchanged.
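The four steps can be sketched end to end as follows. Note the loud assumption: the trained GAE is not available here, so `neighbor_mean_reconstruct` is a deliberately simple stand-in that fills each masked node with the mean of its unmasked neighbors. It is not the paper's model and only serves to show the data flow; all names and values are hypothetical. The final step writes reconstructed values into the positions in $I_{a}$ only and passes every other measurement through as received.

```python
def neighbor_mean_reconstruct(X_masked, A, masked):
    """Stand-in for the trained GAE (NOT the paper's model): fill each
    masked node with the mean of its unmasked neighbours."""
    X_hat = [row[:] for row in X_masked]
    for i in masked:
        vals = [X_masked[j][0] for j in range(len(A))
                if A[i][j] == 1 and j not in masked]
        X_hat[i][0] = sum(vals) / len(vals) if vals else 0.0
    return X_hat

def reconstruct_online(z_a, A, I_a, reconstruct=neighbor_mean_reconstruct):
    masked = set(I_a)
    # (1) Graph signal construction: measurement vector -> N x 1 matrix.
    X_a = [[v] for v in z_a]
    # (2) Node masking: zero the features of nodes flagged by KNN.
    X_masked = [[0.0] if i in masked else row[:]
                for i, row in enumerate(X_a)]
    # (3) Topology-aware reconstruction (placeholder model here).
    X_hat = reconstruct(X_masked, A, masked)
    # (4) Data output: reconstructed values only at flagged positions.
    return [X_hat[i][0] if i in masked else z_a[i]
            for i in range(len(z_a))]

A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # 4-node chain
z_a = [1.00, 1.50, 0.98, 0.97]  # node 1 carries a tampered value
z_rec = reconstruct_online(z_a, A, I_a=[1])
```

Passing the reconstructor as a parameter keeps the masking and output logic independent of the model, so a trained GAE forward pass could be substituted without touching the other steps.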

In summary, the flow of the False Data Injection Attack defense strategy based on KNN-GAE is shown in Figure 3.

4. Case Study Analysis

4.1. Case Setup

4.1.1. Simulation Test System

To verify the effectiveness of the proposed KNN-GAE algorithm in the complex environment of the CPSVG, a simulation test system is designed in this section. The system takes the standard IEEE 33-bus distribution network as the main grid of the physical layer, and four different types of Electric Vehicle (EV) aggregators are connected to it. Each aggregator is modeled as a cyber–physical subsystem with an internal network topology and monitoring nodes, as shown in Figure 4.

The four EV aggregators include two public charging-type aggregators and two residential V2G-type aggregators. The specific connection locations, types, and capacity parameters of these aggregators are set as shown in Table 1, and the internal topology of each aggregator is illustrated in Figure 5.

Aggregators A1 and A3 simulate large public charging station operators. Their internal sub-nodes are individual charging islands, and each charging island is equipped with several charging piles. Aggregators A2 and A4 simulate community charging pile operators. Their internal sub-nodes are community charging clusters, which represent collections of V2G resources in a residential community or a building.

Each sub-node (charging island/community charging cluster) within an aggregator is connected to a large number of terminal devices (EV terminals and charging piles). For example, Aggregator A1 manages 200 EVs, and these 200 EVs are distributed across 5 charging islands under A1. Data is collected from the underlying EV terminals to the sub-nodes (charging islands/community charging clusters), then uploaded from the sub-nodes to the aggregator, and finally the aggregator uploads the integrated information to the power grid dispatching layer.

The measurement nodes in this paper include 34 internal sub-nodes of the 4 EV aggregators and 33 nodes of the distribution network, totaling 67 monitoring nodes.

4.1.2. Dataset Generation

First, power flow calculation is performed based on the CPSVG test system described in Section 4.1.1. The specific steps of the power flow calculation are as follows:

(1): The Monte Carlo method [26] is used to simulate the random behavior of individual EVs according to statistical rules (such as travel chains, charging habits, etc.). The power demands of all EVs within the same aggregator and belonging to the same sub-node are arithmetically summed to obtain the total active power and reactive power demands of each sub-node at the current time interval (5 min interval).
(2): Based on the active power (P) and reactive power (Q) values of all sub-nodes generated in step (1), AC power flow calculation [27,28] is conducted for the internal networks of the four aggregators to solve for the states of the sub-nodes (such as voltage and phase angle) and the exchange power between each aggregator and the distribution network at the grid-connection point.
(3): Based on the exchange power at the grid-connection point between aggregators and the distribution network solved in step (2), power flow calculation is performed for the IEEE 33-bus distribution network to solve for the global state of the entire CPSVG system at the current time. This includes the voltage and phase angle of all main network nodes and the power flow of all lines.
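Step (1) can be illustrated with a small sampling routine. This is a simplified sketch under strong assumptions: each EV is modeled as charging at rated power with a fixed probability, which stands in for the travel-chain and charging-habit statistics used in the paper; the parameter values, the even split of EVs across sub-nodes, and the function name `subnode_demand` are all hypothetical. Steps (2) and (3) require an AC power flow solver and are not sketched.

```python
import math
import random

def subnode_demand(ev_subnode, p_charge, p_rated_kw, power_factor, rng):
    """Sum sampled EV demands per sub-node for one 5 min interval.

    ev_subnode[e] is the sub-node index of EV e. Each EV independently
    charges at rated power with probability p_charge (an assumed rule).
    Returns {subnode: [P_kW, Q_kvar]}.
    """
    tan_phi = math.tan(math.acos(power_factor))
    totals = {}
    for node in ev_subnode:
        p = p_rated_kw if rng.random() < p_charge else 0.0
        entry = totals.setdefault(node, [0.0, 0.0])
        entry[0] += p            # active power, arithmetic sum
        entry[1] += p * tan_phi  # reactive power from assumed power factor
    return totals

# 200 EVs over 5 charging islands, as for Aggregator A1; the even split
# and all numeric parameters below are assumptions for illustration.
ev_subnode = [e % 5 for e in range(200)]
demand = subnode_demand(ev_subnode, p_charge=0.3, p_rated_kw=7.0,
                        power_factor=0.95, rng=random.Random(0))
```

Repeating the call for each of the 288 intervals of a day would yield the per-sub-node P and Q series that feed the power flow calculations in steps (2) and (3).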

In this section, we simulated and generated one year of EV charging and discharging data. The exchange power between aggregators and the distribution network at the grid-connection point is shown in Figure 6.

Subsequently, we obtain the historical operation dataset of the CPSVG, which includes measurement vectors and state vectors. The number of data samples is 365 × 288, as shown in Table 2.

Subsequently, in accordance with the method described in Section 2.3, 20% of the samples were randomly selected from the generated normal operation dataset to inject simulated attacks. Three attack modes were designed: (a) single-point attack on the distribution network measurement vector; (b) single-point attack on the aggregator sub-node measurement vector; (c) coordinated attack on both the distribution network measurement vector and the aggregator sub-node measurement vector. Among them, the offset vector $c$ in the attack vector is randomly generated within the range of [−0.05, 0.05], which represents the offset that the attacker intends to impose on the system states (node voltage magnitude and phase angle), so as to simulate diverse attack scenarios.
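The injection step can be sketched as drawing $c$ uniformly from the stated range and adding $a = Hc$ to a normal measurement vector. The small matrix `H`, the measurement values, and the helper names below are assumptions for illustration; in the paper $H$ is the system's measurement matrix, and which states are offset depends on the attack mode.

```python
import random

def matvec(H, c):
    """Matrix-vector product H c for a list-of-lists H."""
    return [sum(h * x for h, x in zip(row, c)) for row in H]

def inject_fdia(z, H, attacked_states, bound=0.05, rng=None):
    """Return (z_a, a, c) with a = H c and z_a = z + a.

    Only the states listed in attacked_states receive a nonzero offset,
    drawn uniformly from [-bound, bound].
    """
    rng = rng or random.Random()
    n_states = len(H[0])
    c = [rng.uniform(-bound, bound) if s in attacked_states else 0.0
         for s in range(n_states)]
    a = matvec(H, c)
    z_a = [zi + ai for zi, ai in zip(z, a)]
    return z_a, a, c

# Toy 3-measurement, 2-state system (assumed values).
H = [[1.0, 0.0], [-1.0, 1.0], [0.0, 1.0]]
z = [1.00, -0.02, 0.98]
z_a, a, c = inject_fdia(z, H, attacked_states={1}, rng=random.Random(42))
```

Measurements whose rows of `H` do not involve an attacked state are left untouched, which is why an offset on a single state can still alter several measurements at once.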

Finally, the original normal samples were labeled as “0”, and the newly generated attack samples were labeled as “1” according to the attack modes. This formed a complete dataset containing both normal and abnormal data, which can fully reflect the dynamic characteristics of the CPSVG. The dataset was partitioned into training, validation, and test sets in an 8:1:1 ratio for subsequent training and performance evaluation of the defense model.

4.1.3. Evaluation Metrics

For the two tasks of attack localization and data reconstruction, three metrics were selected to quantitatively evaluate the comprehensive performance of the defense strategy.

(1): Detection Rate (DR)

The detection rate is defined as the ratio of the number of measurement data points correctly identified as attacks to the total number of actually attacked measurement data points. Its calculation formula is as follows:

DR = \frac{N_{true}}{N_{all\_attack}}

(11)

In the formula, $N_{true}$ represents the number of attack data points successfully identified by the model, and $N_{all\_attack}$ represents the total number of actually attacked measurement data points.

(2): Root Mean Square Error (RMSE)

The root mean square error reflects the reconstruction quality by calculating the deviation between the reconstructed data and the real data before the attack. Its calculation formula is as follows:

RMSE = \sqrt{\frac{1}{N_{all\_attack}} \sum_{i \in S_{a}} {(z_{i} - {\hat{z}}_{i})}^{2}}

(12)

In the formula, $S_{a}$ is the index set of attacked data points, $z_{i}$ is the real measurement value before the attack, and ${\hat{z}}_{i}$ is the reconstructed measurement value.

(3): F1-Score

The F1-score is the harmonic mean of precision and recall:

F_{1} = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(13)

Precision = \frac{TP}{TP + FP}

(14)

Recall = \frac{TP}{TP + FN}

(15)

$TP$ (True Positive) represents the number of samples that are attacked and correctly detected. $FN$ (False Negative) represents the number of samples that are attacked but missed by detection. $FP$ (False Positive) represents the number of normal samples that are incorrectly reported as attacks.
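The three metrics in Equations (11)–(15) can be computed directly from the set of actually attacked indices and the set of flagged indices. The sketch below uses made-up index sets and measurement values purely for illustration; the function names are hypothetical.

```python
import math

def detection_rate(attacked, flagged):
    """Eq. (11): share of actually attacked points that were flagged."""
    return len(attacked & flagged) / len(attacked)

def rmse(z_true, z_hat, attacked):
    """Eq. (12): RMSE over the attacked index set S_a."""
    return math.sqrt(sum((z_true[i] - z_hat[i]) ** 2 for i in attacked)
                     / len(attacked))

def f1_score(attacked, flagged):
    """Eqs. (13)-(15) from sets of attacked and flagged indices."""
    tp = len(attacked & flagged)
    fp = len(flagged - attacked)
    fn = len(attacked - flagged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Assumed toy case: 4 attacked points, 3 found, plus 1 false alarm.
attacked = {2, 5, 7, 9}
flagged = {2, 5, 7, 11}
z_true = {2: 1.0, 5: 0.5, 7: 0.75, 9: 1.0}
z_hat = {2: 1.0, 5: 0.5, 7: 0.75, 9: 0.5}

dr = detection_rate(attacked, flagged)  # 3 / 4
f1 = f1_score(attacked, flagged)        # precision = recall = 0.75
err = rmse(z_true, z_hat, attacked)     # sqrt(0.25 / 4)
```

Note that the detection rate in Equation (11) coincides with recall in Equation (15); the F1-score adds the penalty for false alarms that the detection rate alone does not capture.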

4.1.4. Scenario Design

In this section, four different scenarios are set up for comparative analysis, aiming to verify the effectiveness and advancement of the proposed KNN-GAE defense strategy.

(1): Scenario 1: No Defense

The system does not deploy any detection or defense mechanisms. The state estimator of the Vehicle–Grid Interaction Cyber–Physical System (CPSVG) directly uses the measurement vector contaminated by False Data Injection Attack (FDIA) for calculation. This scenario demonstrates the maximum harm that FDIA can cause to the system’s state estimation.

(2): Scenario 2: BDD Detection Mechanism

The traditional Bad Data Detection (BDD) method in power systems, which is based on state estimation residuals, is used to determine whether abnormal data exists.

(3): Scenario 3: Localization and Elimination Only, No Reconstruction

The system uses the KNN module to locate the attacked measurement data. The localization system eliminates the identified abnormal data from the measurement vector and uses the remaining measurement data for state estimation.

(4): Scenario 4: KNN-GAE Method

First, the KNN-based localization module is used to locate the contaminated measurement data. Then, the GAE-based reconstruction module reconstructs the contaminated data using power grid topology information. Finally, the reconstructed measurement vector is sent to the state estimator.

4.2. Analysis of Simulation Results

4.2.1. Simulation Results and Comparison

A coordinated attack scenario is simulated for analysis in this section. The attack is set to occur during the 19:00–19:05 time period of a day. To create the illusion of power grid overload in a specific area, the attacker launches a false data injection attack on the measured value of injected power at Node 21 in the IEEE 33-bus system (where Aggregator A3, a large public charging station operator, is connected) and the measured values of aggregated power at two charging islands (A3-2 and A3-5) within Aggregator A3. The attacker maliciously inflates the power injection readings of these three nodes, attempting to mislead the power grid control center into making incorrect load reduction decisions. Table 3 and Table 4 show the results of attack localization and data reconstruction under the four scenarios.

Table 3 demonstrates the attack localization capabilities of different defense strategies. It can be seen that due to the limitation of its detection threshold, the traditional BDD method (Scenario 2) only identifies the abnormal injected power at Node 21 (where the attack magnitude is the largest), while failing to detect the attacks on two aggregator sub-nodes (A3-2 and A3-5), which have relatively smaller attack magnitudes and higher concealment. In contrast, the KNN-based localization method adopted in this paper (shared by Scenarios 3 and 4) successfully and accurately identifies all three attacked nodes, demonstrating its excellent sensitivity and accuracy in dealing with complex attack modes. The comparison of F1-scores demonstrates that the KNN localization mechanism proposed in this paper (F1 = 1.0) exhibits significantly superior comprehensive performance over the traditional BDD method (F1 = 0.5). While maintaining 100% precision, the KNN mechanism achieves 100% recall, accomplishing the ideal effect of precise localization.
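The reported F1-scores can be checked against Equations (13)–(15). The worked calculation below assumes that BDD raised no false alarms in this case ($FP = 0$); the text does not state this explicitly, but it is the value consistent with the reported $F_{1} = 0.5$ when one of three attacked nodes is found.

```latex
\begin{aligned}
\text{BDD:}\quad & TP = 1,\; FN = 2,\; FP = 0 \\
& Recall = \frac{1}{1 + 2} = \frac{1}{3}, \qquad Precision = \frac{1}{1 + 0} = 1 \\
& F_{1} = 2 \times \frac{1 \times \frac{1}{3}}{1 + \frac{1}{3}} = \frac{2/3}{4/3} = 0.5 \\
\text{KNN:}\quad & TP = 3,\; FN = 0,\; FP = 0 \;\Rightarrow\; Precision = Recall = 1,\; F_{1} = 1
\end{aligned}
```

Under the same assumption, the detection rate of BDD in this case is $1/3$, against $1$ for the KNN localization.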

Table 4 further compares the specific processing methods of contaminated data in each scenario. Scenario 1 (No Defense) directly uses the tampered attack values. Scenario 2 (BDD) locates and eliminates the anomaly at Node 21, but retains the attack values of the other two nodes due to missed detection, resulting in distorted data sent to the state estimator. Scenario 3 (Localization and Elimination Only) identifies and eliminates all contaminated data, but causes the loss of key information. Scenario 4 (Proposed KNN-GAE Method) performs high-fidelity reconstruction on all contaminated data after accurate localization; the reconstructed values are extremely close to the actual values, providing the most accurate and complete data input for subsequent state estimation.

Figure 7 shows the voltage magnitude state of Node 21 estimated under the four scenarios.

Figure 7 presents the state estimation results for the voltage magnitude of Node 21 under the four scenarios after the coordinated attack occurs at 19:00. In the figure, the real value curve (used as a benchmark) fluctuates stably within the normal range. In Scenario 1 (No Defense), the state estimator is affected by the injected false data, and the calculated voltage value drops rapidly after the attack, showing a large deviation from the real trajectory. In Scenario 2 (BDD Detection Mechanism), the attacked values at the two undetected sub-nodes remain in the measurement vector, so the state estimation results still deviate from the real values. In Scenario 3 (Localization and Elimination Only), although the KNN module successfully locates the attack and eliminates the contaminated data, the state estimator lacks key measurement information, so its estimation results are improved but still show an obvious deviation from the real values. In contrast, the estimation curve of Scenario 4 (Proposed KNN-GAE Method) is the closest to the real values during the post-attack period. This indicates that the proposed method can not only accurately locate attacks, but can also restore the complete information of the system through high-fidelity data reconstruction, minimizing the interference of attacks on state estimation and verifying the superiority of the proposed method.

4.2.2. Robustness Analysis of the Model

To further verify the stability and reliability of the proposed KNN-GAE defense framework under different attack conditions, this section conducts robustness analysis from two dimensions—attack intensity and attack coverage—and compares its performance with that of the traditional BDD method.

(1): Performance Analysis Under Different Attack Intensities

This analysis aims to verify the model’s performance when attackers increase their attack intensity. Different attack intensities are simulated by adjusting the value range of the offset vector $c$ described in Section 2.3.

Figure 8 shows the performance changes of various defense methods under different attack intensities. It can be observed that the detection rate (DR) of the proposed method in this paper remains at an extremely high level of over 97%, barely affected by the increase in attack intensity. In contrast, the traditional BDD method performs poorly in the CPSVG system: although its detection rate increases slightly with the rise of attack intensity, it does not exceed 70% even at the highest intensity, revealing its fundamental flaw in dealing with stealthy attacks. Meanwhile, the RMSE of data reconstruction of the proposed method rises slightly as attack intensity increases, but its absolute value remains at a low level (<0.04). This indicates that the GAE reconstruction module has good robustness and can achieve high-fidelity recovery even when facing large data deviations.

(2): Performance Analysis Under Different Attack Coverages

This analysis aims to verify the model’s detection rate when confronting attacks of different scales. Different attack coverages are simulated by changing the number of nodes attacked simultaneously.

Table 5 presents the performance of the proposed KNN-GAE method in coping with different attack coverages. The experimental results show that whether facing a precise attack on a single node or a large-scale coordinated attack on up to 10 nodes, the detection rate of this method remains above 90%. At the same time, the RMSE of data reconstruction rises slightly as the number of attacked nodes increases, because more data must be reconstructed from fewer healthy measurements and the uncertainty is therefore higher. The increase is gradual, however, and the absolute value remains low. This indicates that the GAE module can effectively use the topology information of the entire network and the data of healthy nodes to reliably recover large-scale contaminated data. Under larger-scale attacks (5-node and 10-node scenarios), the model faces increased difficulty in maintaining high precision, resulting in a small number of false positives (FP) and a decrease in precision from 1.0. Nevertheless, even in the most challenging 10-node coordinated attack scenario, the F1-score remains relatively high (>0.8), outperforming the traditional BDD method’s score of 0.5. This demonstrates the robustness of the proposed defense framework in complex attack scenarios.

4.2.3. Comparison with Other Methods

To further evaluate the advancement of the KNN-GAE framework proposed in this paper, this section compares it with two typical machine learning methods recently applied in FDIA defense: a CNN-LSTM model and a GAN model.

According to the comparative results in Table 6, it can be observed that the KNN-GAE method achieves the highest detection rate (DR) and F1-score across all scenarios, demonstrating its comprehensive advantage in achieving extremely high detection rates while maintaining very low false positive rates. Although the GAN model exhibits a high detection rate, its F1-score is constrained by persistent false positives (FPs), indicating that a single reconstruction error threshold approach is prone to misjudgment. The CNN-LSTM model shows the lowest DR and F1-score, suggesting that it suffers from significant missed detections and false alarms, making it difficult to cope with sophisticated attacks. In terms of data reconstruction, the proposed method maintains a consistently low RMSE (<0.05), proving the ability of the GAE reconstruction module to achieve high-fidelity data recovery. In contrast, the other compared methods fail to effectively restore data. The consistently superior F1-score of the KNN-GAE framework reflects its optimal overall performance. The GAN model performs moderately, while the CNN-LSTM model struggles to meet the high-security demands of CPSVG.

5. Conclusions and Outlook

5.1. Conclusions

This paper conducts in-depth research on the security and stability issues of the Vehicle–Grid Interaction Cyber–Physical System (CPSVG) when facing FDIA. The main work and conclusions of this paper are summarized as follows:

(1): This paper first analyzes the structural characteristics of the CPSVG system and its vulnerability to cyberattacks, and points out that FDIA, due to its high concealment and strong destructiveness, has become a key factor threatening system security.
(2): To effectively defend against FDIA, this paper proposes a two-stage defense strategy. The strategy first uses the KNN algorithm to analyze the system measurement data, realizing accurate localization of the attack data source. Subsequently, for the located contaminated data, a Graph Autoencoder (GAE) model is used for data reconstruction to restore the key information distorted by the attack.
(3): Through simulation experiments, this paper compares the proposed method with three scenarios: no defense, traditional Bad Data Detection (BDD), and only localization and elimination of abnormal data. The experimental results show that the proposed KNN-GAE method exhibits advantages in both the accuracy of attack localization and the fidelity of data reconstruction. In the voltage state estimation of key nodes, the data processed by this method makes the state estimation results highly consistent with the real values, effectively resisting the interference of FDIA on the system and ensuring the safe and stable operation of the power grid.

In summary, the KNN-GAE-based defense method proposed in this paper can effectively cope with FDIA in CPSVG. While accurately identifying attacks, it ensures the accuracy of system state estimation through high-quality data reconstruction, providing a theoretical basis and practical reference for improving the cyberattack defense capability of complex cyber–physical systems.

5.2. Outlook

Although the method proposed in this paper has achieved certain results in defending against FDIAs, there are still some research directions that can be further deepened and expanded:

(1): Optimization of the algorithm and improvement of generalization ability: In the future, more advanced machine learning or deep learning algorithms can be explored, for example, combining the Attention Mechanism with GAE to further improve the accuracy of data reconstruction.
(2): Defense strategies for hybrid attack scenarios: The current research mainly focuses on FDIA. However, in practical applications, the system may face multiple types of cyberattacks simultaneously, such as Denial of Service (DoS) attacks and replay attacks. Future research can focus on designing a comprehensive defense framework that can defend against multiple hybrid attacks at the same time.
(3): Lightweight deployment and engineering practice: Considering the limitations of computing resources in actual systems, future research can explore lightweight versions of the model. While preserving defense performance, such versions would reduce the computational complexity and time cost of the algorithm, so as to facilitate deployment on edge computing nodes and promote the transformation of research results into engineering practice.

Author Contributions

Conceptualization, Q.L., D.S. and Y.W.; methodology, Q.L., D.S., D.W. and Y.W.; validation, D.S., W.T. and Q.A.; formal analysis, Q.L. and D.W.; investigation, D.W. and W.T.; resources, W.T. and Q.A.; data curation, Q.L. and Q.A.; writing—original draft preparation, D.W., W.T. and Q.A.; writing—review and editing, Q.L., D.S. and Y.W.; visualization, W.T. and Q.A.; supervision, Q.A.; project administration, Q.A.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Science and Technology Project of State Grid Henan Electric Power Company (SGAAYJ00NDJS2400018).

Data Availability Statement

The data can be obtained from the paper.

Conflicts of Interest

Authors Q.L., D.S. and Y.W. were employed by the State Grid Henan Electric Power Company Economic and Technological Research Institute. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Sovacool, B.K.; Noel, L.; Axsen, J.; Kempton, W. The neglected social dimensions to a vehicle-to-grid (V2G) transition: A critical and systematic review. Environ. Res. Lett. 2018, 13, 013001.
Dibaji, S.M.; Pirani, M.; Flamholz, D.B.; Annaswamy, A.M.; Johansson, K.H.; Chakrabortty, A. A systems and control perspective of CPS security. Annu. Rev. Control 2019, 47, 394–411.
Guille, C.; Gross, G. A conceptual framework for the vehicle-to-grid (V2G) implementation. Energy Policy 2009, 37, 4379–4390.
Tehrani, K. A smart cyber physical multi-source energy system for an electric vehicle prototype. J. Syst. Archit. 2020, 111, 101804.
Han, W.; Xiao, Y. Privacy preservation for V2G networks in smart grid: A survey. Comput. Commun. 2016, 91, 17–28.
Ahmed, M.; Pathan, A.S.K. False data injection attack (FDIA): An overview and new metrics for fair evaluation of its countermeasure. Complex Adapt. Syst. Model. 2020, 8, 4.
Xydas, E.S.; Marmaras, C.E.; Cipcigan, L.M.; Hassan, A.S.; Jenkins, N. Electric vehicle load forecasting using data mining methods. In Proceedings of the IET Hybrid and Electric Vehicles Conference 2013 (HEVC 2013), London, UK, 6–7 November 2013; IET: Stevenage, UK, 2013; pp. 1–6.
Liu, H.; Qi, J.; Wang, J.; Li, P.; Li, C.; Wei, H. EV dispatch control for supplementary frequency regulation considering the expectation of EV owners. IEEE Trans. Smart Grid 2016, 9, 3763–3772.
Alnowibet, K.; Annuk, A.; Dampage, U.; Mohamed, M.A. Effective energy management via false data detection scheme for the interconnected smart energy hub–microgrid system under stochastic framework. Sustainability 2021, 13, 11836.
Liang, G.; Zhao, J.; Luo, F.; Weller, S.R.; Dong, Z.Y. A review of false data injection attacks against modern power systems. IEEE Trans. Smart Grid 2016, 8, 1630–1638.
Hu, Z.; Wang, Y.; Tian, X.; Yang, X.; Meng, D.; Fan, R. False data injection attacks identification for smart grids. In Proceedings of the 2015 Third International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), Beirut, Lebanon, 29 April–1 May 2015; IEEE: New York, NY, USA, 2015; pp. 139–143.
Manandhar, K.; Cao, X.; Hu, F.; Liu, Y. Detection of faults and attacks including false data injection attack in smart grid using Kalman filter. IEEE Trans. Control. Netw. Syst. 2014, 1, 370–379.
Zhao, J.; Zhang, G.; La Scala, M.; Dong, Z.Y.; Chen, C.; Wang, J. Short-term state forecasting-aided method for detection of smart grid general false data injection attacks. IEEE Trans. Smart Grid 2015, 8, 1580–1590.
Shi, W.; Wang, Y.; Jin, Q.; Ma, J. PDL: An efficient prediction-based false data injection attack detection and location in smart grid. In Proceedings of the 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, Japan, 23–27 July 2018; IEEE: New York, NY, USA, 2018; Volume 2, pp. 676–681.
Li, S.; Yılmaz, Y.; Wang, X. Quickest detection of false data injection attack in wide-area smart grids. IEEE Trans. Smart Grid 2014, 6, 2725–2735.
Zhao, J.; Mili, L.; Wang, M. A generalized false data injection attacks against power system nonlinear state estimator and countermeasures. IEEE Trans. Power Syst. 2018, 33, 4868–4877.
Huang, D.; He, L.; Sun, J.; Hu, A. Distributed Detection Method for Power Grid False Data Attacks Based on Edge Computing. Power Syst. Prot. Control 2021, 49, 1–9.
He, Y.; Mendis, G.J.; Wei, J. Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism. IEEE Trans. Smart Grid 2017, 8, 2505–2516.
Yang, S.; Tan, B.; Guo, J. Detection of false data injection attacks in a novel energy Internet based on dual Markov chains. Electr. Power Autom. Equip. 2021, 41, 131–137.
Xie, J.; Rahman, A.; Sun, W. Bayesian GAN-based false data injection attack detection in active distribution grids with DERs. IEEE Trans. Smart Grid 2023, 15, 3223–3234.
Wang, Y.; Han, X.; Zhang, G.; Jia, K. Mitigation of false data injection attack in DC microgrid based on conditional GAN. In Proceedings of the 2024 3rd International Conference on Energy and Electrical Power Systems (ICEEPS), Guangzhou, China, 14–16 July 2024; IEEE: New York, NY, USA, 2024; pp. 744–748.
Liu, Z.; Li, Y.; Wang, Q.; Li, J. TSCW-GAN based FDIAs defense for state-of-charge estimation of battery energy storage systems in smart distribution networks. IEEE Trans. Ind. Inform. 2023, 20, 5048–5059.
Yan, X. scADGH: scRNA-seq clustering utilizing on attention-based DAE and hybrid similarity GAE. In Proceedings of the 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI), Changchun, China, 24–26 May 2024; IEEE: New York, NY, USA, 2024; pp. 1448–1453.
Hamilton, A.; Khan, M.S.; Silvestri, S.; Scott, C. Big-Data Driven Anomaly Detection in Vehicular Social Networks Using Graph Autoencoders. In Proceedings of the 2024 27th International Symposium on Wireless Personal Multimedia Communications (WPMC), Greater Noida, India, 17–20 November 2024; IEEE: New York, NY, USA, 2024.
Crawford, K.; Baran, M.E. Topology Error Monitoring Using Bad Data Detection Methods. IEEE Trans. Ind. Appl. 2023, 60, 1476–1483.
Nogueira, T.; Magano, J.; Sousa, E.; Alves, G.R. The impacts of battery electric vehicles on the power grid: A Monte Carlo method approach. Energies 2021, 14, 8102.
Liu, C.; Zhang, B.; Hou, Y.; Wu, F.F.; Liu, Y. An improved approach for AC-DC power flow calculation with multi-infeed DC systems. IEEE Trans. Power Syst. 2010, 26, 862–869. [Google Scholar] [CrossRef]
Baradar, M.; Ghandhari, M.; Van Hertem, D.; Kargarian, A. Power flow calculation of hybrid AC/DC power systems. In Proceedings of the 2012 IEEE Power and Energy Society General Meeting, San Diego, CA, USA, 22–26 July 2012; IEEE: New York, NY, USA, 2012; pp. 1–6. [Google Scholar]

Figure 1. Three-layer Operational Architecture of CPSVG.

Figure 2. Flow Chart of the Defense Strategy Based on KNN-GAE.

Figure 3. Training and Real-Time Defense Overall Process of the FDIA Defense Strategy Based on KNN-GAE.

Figure 4. Overall Architecture of the Simulation Test System.

Figure 5. Internal Topology Diagram of EV Aggregators.

Figure 6. Simulation Data of Charging/Discharging Operational Power for Four Different Aggregators.

Figure 7. Comparison of Voltage Magnitude States of Node 21 Under Four Scenarios.

Figure 8. Performance Comparison of Different Methods Under Different Attack Intensities.

Table 1. EV aggregator topology information.

Aggregator	Type	Connected Distribution Network Bus	Number of Sub-Nodes	Managed EV Scale	Maximum Aggregated Power
A1	Large public charging station operator	30	5	200 units	2.0 MW
A2	Community charging pile operator	32	12	350 units	1.5 MW
A3	Large public charging station operator	21	8	250 units	2.5 MW
A4	Community charging pile operator	17	9	300 units	1.2 MW

Table 2. Historical operation data set information.

Vector Type	Composition	Data Source (Source Node)	Dimension
State Vector	Node voltage magnitude	33 nodes of the distribution network (obtained via power flow calculation)	33
State Vector	Node voltage phase angle	32 non-reference nodes of the distribution network (obtained via power flow calculation)	32
Measurement Vector	Node injected power (P, Q)	33 nodes of the distribution network (obtained from known data + power flow calculation)	66
Measurement Vector	Sub-node interactive power (P, Q)	34 internal nodes of 4 EV aggregators (directly obtained via Monte Carlo simulation)	68
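
The dimensions in Table 2 can be cross-checked against the sub-node counts in Table 1. The short sketch below does that arithmetic. It assumes that the state and measurement vectors are plain concatenations of the rows listed above; the totals it prints (65 and 134) are derived here for illustration and are not quoted from the paper, and the variable names are hypothetical.

```python
# Sub-node counts per aggregator, taken from Table 1.
sub_nodes = {"A1": 5, "A2": 12, "A3": 8, "A4": 9}

N_BUSES = 33  # distribution network nodes (Table 2)

# State vector: voltage magnitude at every node plus phase angle
# at every non-reference node (assumed simple concatenation).
state_dim = N_BUSES + (N_BUSES - 1)

# Measurement vector: (P, Q) injection at every bus plus
# (P, Q) interactive power at every aggregator sub-node.
n_sub = sum(sub_nodes.values())
meas_dim = 2 * N_BUSES + 2 * n_sub

print("aggregator sub-nodes:", n_sub)        # 34, matching Table 2
print("state dimension:", state_dim)          # 33 + 32 = 65
print("measurement dimension:", meas_dim)     # 66 + 68 = 134
```

Under this assumption the measurement vector is roughly twice as long as the state vector, which is the kind of redundancy that state estimation relies on.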

Table 3. Comparison of Attack Localization Results of Different Methods Under Four Scenarios.

Target Node	Actual State	Localization Result of Scenario 2 (BDD)	Localization Result of Scenarios 3 & 4 (KNN)
Injected Power at Node 21	Attacked	Abnormal	Abnormal
Aggregated Power at A3-2	Attacked	Normal	Abnormal
Aggregated Power at A3-5	Attacked	Normal	Abnormal
... (Other Nodes)	Normal	Normal	Normal
DR	/	33.33%	100%
F1-Score	/	0.5	1

Table 4. Comparison of Contaminated Data Processing and Reconstruction Results Under Four Scenarios (Unit: MW).

Measurement Node	Actual Value	Processing Result of Scenario 1	Processing Result of Scenario 2	Processing Result of Scenario 3	Processing Result of Scenario 4
Injected Power at Node 21	1.85	2.50	Eliminated	Eliminated	1.86
Aggregated Power at A3-2	0.92	1.60	1.60 (Undetected)	Eliminated	0.94
Aggregated Power at A3-5	0.76	1.40	1.40 (Undetected)	Eliminated	0.77
RMSE	/	0.657	0.360	/	0.014
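
As an illustration of how the RMSE row relates to the values above it, the following sketch recomputes the Scenario 1 and Scenario 4 figures. It assumes the standard root-mean-square error taken over the three attacked measurements; the function name `rmse` and the list names are hypothetical. How an eliminated measurement enters the Scenario 2 figure is not stated in this table, so that column is not reproduced here.

```python
import math

def rmse(actual, processed):
    """Root-mean-square error over paired values (same unit as the inputs, MW here)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, processed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Rows of Table 4: Node 21, A3-2, A3-5 (MW).
actual     = [1.85, 0.92, 0.76]
scenario_1 = [2.50, 1.60, 1.40]   # no defense: tampered values used as-is
scenario_4 = [1.86, 0.94, 0.77]   # KNN-GAE: reconstructed values

print(round(rmse(actual, scenario_1), 3))  # 0.657
print(round(rmse(actual, scenario_4), 3))  # 0.014
```

Both results agree with the RMSE row of Table 4 to three decimal places.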

Table 5. Performance of the Proposed Method Under Different Attack Coverages.

Number of Simultaneously Attacked Nodes	DR	RMSE	F1-Score
1	100%	0.011	1
3 (Typical Scenario)	100%	0.014	1
5	100%	0.018	0.83
10	90%	0.023	0.82

Table 6. Performance Comparison of Various Defense Methods Under Different Attack Scales.

Method	Attacked Nodes	TP	FP	FN	DR	RMSE	F1-Score
CNN-LSTM	5	4	3	1	80.0%	0.135	0.667
CNN-LSTM	10	7	4	3	70.0%	0.228	0.667
CNN-LSTM	15	10	5	5	66.7%	0.315	0.667
CNN-LSTM	20	12	6	8	60.0%	0.410	0.632
GAN	5	5	2	0	100.0%	/	0.833
GAN	10	9	3	1	90.0%	/	0.818
GAN	15	13	4	2	86.7%	/	0.812
GAN	20	16	5	4	80.0%	/	0.781
KNN-GAE	5	5	1	0	100.0%	0.018	0.909
KNN-GAE	10	10	1	0	100.0%	0.023	0.952
KNN-GAE	15	14	2	1	93.3%	0.035	0.903
KNN-GAE	20	18	2	2	90.0%	0.048	0.900
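
The DR and F1-Score columns of Table 6 follow from the TP, FP and FN counts. The sketch below shows the arithmetic for three sample rows, assuming the usual definitions DR = TP / (TP + FN) and F1 = 2·TP / (2·TP + FP + FN); these definitions are an assumption here, chosen because they reproduce the tabulated values. The function names are hypothetical.

```python
def detection_rate(tp, fn):
    """Share of truly attacked nodes that were flagged (recall)."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)

# (method, attacked nodes) -> (TP, FP, FN), copied from Table 6.
rows = {
    ("CNN-LSTM", 5): (4, 3, 1),
    ("GAN", 10): (9, 3, 1),
    ("KNN-GAE", 20): (18, 2, 2),
}

for (method, attacked), (tp, fp, fn) in rows.items():
    dr = detection_rate(tp, fn)
    f1 = f1_score(tp, fp, fn)
    print(f"{method:8s} n={attacked:2d}  DR={dr:.1%}  F1={f1:.3f}")
```

The same two functions can be applied to any other row of the table by substituting its counts.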

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

