1. Introduction
The integration of photovoltaic (PV) systems into modern smart grids has fundamentally transformed the energy landscape, enabling distributed generation and enhanced grid flexibility. However, this transformation has introduced unprecedented cybersecurity challenges. Unlike traditional centralized power systems, PV installations rely heavily on smart inverters equipped with communication interfaces, creating multiple attack vectors that threaten both system integrity and grid stability [1]. When adversaries gain control over distributed energy resource (DER) systems, they can deliberately alter power generation patterns, resulting in persistent grid oscillations that compromise overall system stability [2]. The vulnerability increases when attackers control multiple DER units, especially energy storage systems; such attacks can push voltage levels beyond safe limits in distribution networks [3]. Electric vehicle infrastructure creates further attack vectors, and researchers have studied how cyberattacks on charging systems can affect the broader grid [4]. Adversaries can also exploit weaknesses in communication protocols and individual DER devices to significantly disrupt power system operations [5].
Despite growing awareness of these vulnerabilities, the cybersecurity community lacks standardized datasets and evaluation frameworks for developing and comparing detection algorithms specific to PV systems. Existing cybersecurity datasets focus primarily on network traffic or generic industrial control systems, failing to capture the unique physics-based relationships inherent in PV operations. This gap has hindered the development of effective, physics-informed anomaly detection algorithms tailored to PV cybersecurity applications. To address this limitation, ref. [6] introduces Photo-Set, an open-source dataset capturing various cyberattack scenarios in PV systems through high-fidelity MATLAB/Simulink simulations [7]. Building upon this foundation, this paper presents the first comprehensive benchmark evaluation of anomaly detection algorithms for PV cybersecurity applications. Our contributions extend beyond dataset provision to include the following:
A systematic evaluation of three state-of-the-art anomaly detection algorithms across 12 distinct attack scenarios plus realistic environmental conditions, establishing performance baselines for future research comparisons;
A quantified attack detectability hierarchy that categorizes threats based on algorithmic detection difficulty, providing insights into fundamental detection challenges;
Comprehensive performance characterization including computational requirements, scalability analysis, and real-time processing capabilities for industrial deployment;
Evidence-based implementation guidelines that translate research findings into practical deployment strategies for utility and industrial applications;
Failure mode analysis that identifies algorithm limitations and suggests future research directions for improving detection coverage.
The paper is structured as follows:
Section 2 introduces physics-based anomaly detection algorithms and reviews related work on public cybersecurity datasets.
Section 3 details the use case, including the system architecture and executed attacks.
Section 4 presents the dataset in its entirety.
Section 5 suggests potential dataset applications.
Finally, Section 7 concludes this paper.
4. Dataset Description
Building on the previously defined attack taxonomy, we developed a series of simulated scenarios involving both cyberattacks and operational faults within a photovoltaic (PV) system. These scenarios were used to generate a dataset composed of 12 distinct subsets, each corresponding to a specific simulation instance. The contents of these subsets (file names, dataset dimensions, and brief descriptions) are summarized in Table 4, which outlines the structure of the publicly available repository.
Each attack scenario covers 90–120 s and can be placed at any point during normal operation. This enables algorithm evaluation under varying conditions: peak solar generation, cloud intermittency, and low-light periods. This flexible timing allows researchers to customize evaluation scenarios to their specific research requirements and operational contexts.
Each dataset captures system behavior over time, with a sampling rate of one second, meaning each row reflects a discrete second of PV system operation. The training dataset is provided without labels, making it suitable for unsupervised or semi-supervised learning approaches. In contrast, all evaluation datasets include a label column: a value of 1 indicates normal operation, while −1 denotes anomalous behavior.
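The labeling convention above can be sketched as a small helper that separates the label column from the measurements. This is a minimal illustration only: the function name, the assumption that the label is the last column, and the three demo rows are hypothetical and are not taken from the repository.

```python
import numpy as np

def split_labels(rows, label_index=-1):
    """Separate features from the label column and flag anomalous rows.

    rows: 2-D array-like, one row per second of PV system operation.
    label_index: position of the label column (assumed last here).
    Returns (features, is_anomaly), where is_anomaly is True for label -1.
    """
    data = np.asarray(rows, dtype=float)
    labels = data[:, label_index]
    # The evaluation subsets use 1 for normal and -1 for anomalous samples.
    if not np.all(np.isin(labels, (1, -1))):
        raise ValueError("labels must be 1 (normal) or -1 (anomalous)")
    features = np.delete(data, label_index, axis=1)
    return features, labels == -1

# Hypothetical evaluation rows: [P_kW, Q_kVAR, label]
demo = [[37.0, 0.0, 1], [30.0, 0.0, -1], [0.1, 0.0, -1]]
X, y = split_labels(demo)
```

The same 1/−1 convention is used by the outlier detectors in scikit-learn, so predictions from such models can be compared with the label column without remapping.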
The next subsection provides a detailed description of each dataset subset, including the type of anomaly simulated, the control or measurement parameters affected, and the context of the abnormal behavior.
4.1. Normal Functioning
To establish a comprehensive baseline for comparison, we simulated the normal operation of the photovoltaic system over three full days, representing different seasonal and weather conditions. This contributes to our training dataset. Each simulation day corresponds to a different season with varying meteorological patterns to capture the full spectrum of realistic operational behaviors. The three simulated scenarios include the following:
Day 1 (Summer): High irradiance with typical variable weather conditions.
Day 2 (Spring): Clear conditions with moderate irradiance levels and gradual temperature variations.
Day 3 (Winter): Lower irradiance levels with reduced daylight hours.
Panel temperatures range from 25 °C during early morning hours to peak values of 65–70 °C during high irradiance periods, accounting for ambient temperature effects, solar heating, and thermal inertia. The thermal model incorporates a thermal lag, where τ = 300 s represents the thermal time constant, accounting for the slower response of temperature changes compared to irradiance variations.
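The role of the thermal time constant can be illustrated with a minimal sketch. It assumes a first-order lag, dT/dt = (T_target − T)/τ, integrated with a forward Euler step at the 1 s sampling interval. This form, the function name, and the step input are assumptions made for illustration, not the model used to generate the dataset.

```python
def simulate_panel_temperature(target_temps_c, t0_c, tau_s=300.0, dt_s=1.0):
    """Forward-Euler integration of an assumed first-order thermal lag.

    target_temps_c: steady-state temperature the panel would reach at each
        step (a hypothetical input derived from irradiance and ambient).
    t0_c: initial panel temperature in degrees Celsius.
    tau_s: thermal time constant in seconds (300 s in the text).
    """
    temps = []
    temp = t0_c
    for target in target_temps_c:
        # The panel moves a fraction dt/tau of the remaining gap each step.
        temp += (dt_s / tau_s) * (target - temp)
        temps.append(temp)
    return temps

# Hypothetical step: target jumps from 25 to 65 degrees C and holds for one tau.
trace = simulate_panel_temperature([65.0] * 300, t0_c=25.0)
```

Under this assumption, the panel covers roughly 63% of a step change after one time constant, which is why temperature trails rapid cloud-induced irradiance changes.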
The overall trend of solar irradiance across the three days is illustrated in Figure 2, Figure 3 and Figure 4, highlighting the natural variability that arises from seasonal changes and serving as a reference for interpreting system performance in the absence of faults or attacks. In the figures, t = 0 corresponds to midnight. The training dataset captures 107,260 operational samples representing this diverse range of normal operating conditions, providing a robust foundation for anomaly detection algorithm training. The temporal sampling rate of 1 Hz ensures adequate resolution for capturing both gradual environmental changes and rapid cloud-induced irradiance variations.
4.2. Bad Data Injection—P Reduction
In this attack scenario, the adversary targets the active power (P) setpoint transmitted to the inverter, artificially lowering it without any legitimate operational justification. Under normal conditions and given a certain level of solar irradiance, the system initially operates at a power output of approximately 37 kW. However, despite unchanged irradiance levels, the attacker abruptly reduces the power setpoint—first dropping it to around 30 kW and then progressively down to 0.1 kW, amounting to a 99.7% reduction from the expected output.
This manipulation results in a significant underutilization of the available solar energy and could potentially destabilize local power flows or trigger alarms in supervisory systems. The overall progression of this manipulated power output is depicted in Figure 5, illustrating the drastic deviation from normal operational behavior.
4.3. Bad Data Injection—Q Exceeds Limits
In this attack scenario, the adversary alters the reactive power (Q) setpoint communicated to the inverter, deliberately forcing it beyond the bounds typically observed during normal operations. Under standard conditions—and with constant irradiance—the system maintains a reactive power output of approximately 0 kVAR, reflecting a balanced power factor.
The attack begins by abruptly increasing the reactive power setpoint to 10 kVAR, followed by a gradual escalation to nearly 38 kVAR. This manipulation pushes the inverter to operate outside of its expected reactive power range, which may lead to voltage instability or increased stress on grid components, particularly in sensitive distribution networks.
The evolution of the reactive power output during this attack is presented in Figure 6, highlighting the deviation from nominal behavior and the potential operational risks posed by such a control-level compromise.
4.4. Bad Data Injection—P and Q Oscillations
These two scenarios simulate the effects of partial Man-in-the-Middle (MitM) attacks, in which an adversary intermittently manipulates both the active (P) and reactive (Q) power setpoints sent to the inverter. As a result of the attack, the setpoints exhibit continuous fluctuations, causing the inverter’s output to oscillate between the legitimate and malicious commands.
Such behavior illustrates a vulnerability inherent in many industrial communication protocols, where the SCADA system periodically transmits control setpoints to field devices—such as inverters—without authentication or integrity checks. When an attacker injects unauthorized packets into the communication stream, both the valid and falsified setpoints compete, leading the inverter to alternate erratically between them.
In the simulated attack, the rogue setpoint follows a ramped sinusoidal pattern, effectively creating oscillations with increasing amplitude over time. This kind of disturbance not only degrades system stability but also stresses the inverter’s control and switching mechanisms.
The dynamic trends of active power (P) and reactive power (Q) under this oscillatory attack scenario are illustrated in Figure 7 and Figure 8, respectively.
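The competition between legitimate and injected setpoints described in this subsection can be sketched as follows. The amplitude ramp rate, oscillation period, strict one-to-one alternation, and function names are all hypothetical choices made for illustration; the source does not specify these parameters.

```python
import math

def rogue_setpoint(t_s, base_kw, ramp_kw_per_s, period_s):
    """Sinusoid around base_kw whose amplitude grows linearly with time."""
    amplitude = ramp_kw_per_s * t_s
    return base_kw + amplitude * math.sin(2.0 * math.pi * t_s / period_s)

def received_setpoints(duration_s, legit_kw, base_kw, ramp_kw_per_s, period_s):
    """Setpoints seen by the inverter, one per second.

    Even seconds carry the legitimate SCADA setpoint and odd seconds carry
    the injected one (an assumed alternation pattern).
    """
    series = []
    for t in range(duration_s):
        if t % 2 == 0:
            series.append(legit_kw)
        else:
            series.append(rogue_setpoint(t, base_kw, ramp_kw_per_s, period_s))
    return series

# Hypothetical 120 s scenario around a 37 kW legitimate setpoint.
series = received_setpoints(120, legit_kw=37.0, base_kw=37.0,
                            ramp_kw_per_s=0.1, period_s=20.0)
```

Because the inverter receives both streams, its setpoint jumps between the constant legitimate value and an oscillation whose swing widens over the attack window.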
4.5. False Data Injection—P Tampering
In this attack scenario, the adversary executes a Man-in-the-Middle (MitM) attack to tamper with the power injection measurements transmitted from the inverter to the SCADA system. Specifically, the attacker targets and alters only the active power readings while leaving other measurements such as voltage and current unchanged. This selective manipulation leads to data inconsistency, as the reported active power no longer corresponds to the expected value derived from the product of voltage and current—thereby violating basic electrical relationships.
The simulation was conducted over a 120 s window under stable irradiance conditions, which under normal circumstances would yield a power setpoint of 55 kW. At the 30 s mark, the attack is initiated, causing the active power measurement reported to SCADA to diverge from the actual inverter output. This kind of inconsistency could mislead the SCADA system into issuing erroneous commands or overlooking a fault condition.
The evolution of the falsified active power readings during the attack is depicted in Figure 9, showing the disruption introduced by the manipulated data stream.
Anomaly detection algorithms are designed to rapidly identify such attacks, as the manipulated measurement vectors produced by the attacker violate fundamental physical laws and thus represent physically implausible data. This inconsistency between the altered active power readings and the corresponding voltage and current measurements serves as a strong indicator of anomalous behavior, enabling timely detection and response.
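The consistency check implied here can be sketched as a residual test between the reported active power and the product of voltage and current. The simple relation P = V·I, the 5% tolerance, and the sample values are assumptions for illustration; a three-phase AC measurement would also need the appropriate phase and power-factor terms.

```python
def power_residual_flags(p_kw, v_v, i_a, rel_tol=0.05):
    """Flag samples whose reported P deviates from V*I by more than rel_tol.

    p_kw: reported active power in kW.
    v_v, i_a: reported voltage (V) and current (A) for the same samples.
    rel_tol: hypothetical relative tolerance for measurement noise.
    """
    flags = []
    for p, v, i in zip(p_kw, v_v, i_a):
        expected_kw = v * i / 1000.0
        # Guard against division by zero at night or during shutdown.
        denom = max(abs(expected_kw), 1e-9)
        flags.append(abs(p - expected_kw) / denom > rel_tol)
    return flags

# Hypothetical readings: 500 V and 110 A imply 55 kW; the last P is tampered.
flags = power_residual_flags([55.0, 55.5, 40.0], [500.0] * 3, [110.0] * 3)
```

A rule of this kind only covers one physical relationship; the learned detectors evaluated in this paper operate on the full measurement vector.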
4.6. False Data Injection—Temperature Tampering
In this attack scenario, the adversary exploits a Man-in-the-Middle (MitM) attack to manipulate the temperature measurements transmitted from the inverter to the SCADA system. The attacker selectively alters only the panel temperature reading while leaving other sensor data—such as irradiance and active power—unchanged. This selective modification results in inconsistent data, as the reported temperature no longer corresponds logically with the environmental and electrical measurements.
The simulation spans 120 s under constant irradiance conditions, which normally correspond to a panel temperature of approximately 14 °C. Such a panel temperature is plausible at low ambient temperatures, for example on a winter day or an overcast day with low solar intensity.
At the 30 s mark, the attack commences, abruptly increasing the reported temperature to 36 °C, after which the falsified temperature values continue to rise progressively up to 70 °C. This creates a physically implausible scenario where the panel temperature suggests extreme overheating without corresponding changes in irradiance levels or other thermal indicators.
The manipulated temperature trend is illustrated in Figure 10, clearly showing the divergence from normal operating conditions. Given that the altered measurements violate the expected physical relationships between temperature, irradiance, and power output, anomaly detection algorithms are anticipated to promptly detect this attack due to the presence of physically implausible data within the measurement vector.
4.7. False Data Injection—Irradiance Tampering
In this attack scenario, the adversary conducts a Man-in-the-Middle (MitM) attack to compromise the irradiance measurements transmitted from the inverter to the SCADA system. The attacker selectively modifies only the irradiance values, while other sensor readings, such as active power, remain unaltered. This results in inconsistent data, as the reported irradiance no longer corresponds with the other measured parameters, disrupting the expected physical relationships. The simulation covers a 120 s period with an initial irradiance setpoint of 700 W/m². At the 30 s mark, the attacker initiates the manipulation by increasing the irradiance measurement by 10%, after which the falsified irradiance values continue to rise gradually.
The distorted irradiance profile is shown in Figure 11, which highlights the deviation from true operating conditions. Anomaly detection algorithms are expected to detect this type of attack promptly, as the manipulated irradiance values create inconsistencies within the overall measurement vector. Specifically, the altered irradiance no longer aligns with other physical quantities, such as active power output and panel temperature, violating fundamental physical correlations. These discrepancies result in physically implausible system states, which well-designed detection models can recognize as anomalous behavior.
4.8. Firmware Modifications—Harmonics Tampering
In this attack scenario, the adversary compromises the internal operation of the inverter by altering the waveform it produces—specifically through the injection of an additional harmonic component into the sinusoidal output signal. The magnitude of this injected harmonic is increased in three distinct stages, progressively degrading the quality of the voltage waveform.
From the first stage onward, the total harmonic distortion (THD) exceeds the acceptable thresholds defined by the IEEE 519-2022 standard for power quality [28], thereby breaching operational safety and regulatory compliance. This form of manipulation introduces electrical noise into the system and can have disruptive effects on the stability and performance of the local grid, especially in sensitive or tightly regulated environments.
The staged escalation of the harmonic component, and the resulting THD variation, is depicted in Figure 12.
Given the severity and breadth of its impact, this attack is expected to be promptly detected by anomaly detection algorithms, which can identify the abnormal rise in THD and associated fluctuations across all three voltage phases. Such deviations strongly diverge from normal inverter behavior and serve as reliable indicators of tampering or internal malfunction.
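For reference, THD is the ratio of the root-sum-square of the harmonic components to the fundamental. The sketch below applies this standard definition to a single injected harmonic raised in three stages. The stage magnitudes and the 230 V fundamental are hypothetical values, not those used in the simulation.

```python
import math

def total_harmonic_distortion(fundamental_rms, harmonic_rms):
    """THD as sqrt(sum of squared harmonic RMS values) / fundamental RMS."""
    if fundamental_rms <= 0:
        raise ValueError("fundamental RMS must be positive")
    return math.sqrt(sum(h * h for h in harmonic_rms)) / fundamental_rms

# Hypothetical staged escalation of one injected harmonic (volts RMS)
# on a 230 V fundamental.
stages_v = [5.0, 12.0, 25.0]
thd_per_stage = [total_harmonic_distortion(230.0, [h]) for h in stages_v]
```

With a single injected component, THD grows linearly with the harmonic magnitude, so each stage produces a clear step in the THD trace.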
4.9. Firmware Modifications—MPPT Tampering
In this attack scenario, the intruder compromises the internal control logic of the DC/DC converter within the inverter. By manipulating the converter’s behavior, the attacker disrupts the regulation of power flow between the photovoltaic modules and the DC link. This type of interference is particularly hazardous, as it can drive the system into unsafe operating conditions, potentially resulting in equipment damage or reduced system lifespan.
The simulated manipulation is introduced in three escalating stages, each causing progressively more severe deviations in system behavior. The evolution of this disturbance is illustrated in Figure 13, showing the increasing impact on system dynamics.
The attack leads to notable fluctuations in critical electrical parameters, especially the battery voltage and the DC/DC link voltage—both of which are essential for maintaining energy balance and ensuring safe converter operation.
Due to the magnitude and nature of these anomalies, anomaly detection algorithms are expected to detect this attack rapidly. The physical implausibility and volatility introduced by the altered control signals generate clear indicators of abnormal behavior, making the attack readily identifiable within a well-designed monitoring framework.
4.10. Fault—Short Circuited Cells
A common fault that can arise in photovoltaic (PV) panels is the formation of hot spots, typically caused by short-circuited cells. In this condition, one or more cells cease to function properly and begin dissipating energy as heat instead of converting it into electrical power. These faulty cells effectively bypass the current flow, resulting in localized overheating and potentially leading to permanent damage or a reduced lifespan of the panel.
One of the most immediate effects of this fault is a notable drop in terminal voltage, as the short-circuited cells no longer contribute to the total voltage output. The voltage reduction is approximately proportional to the number of affected cells, and this directly translates into a significant decline in power output, ultimately degrading the performance and efficiency of the solar energy system.
In our simulation, the PV panel initially operates under normal conditions with a voltage of approximately 282 V. At a specific point, a fault is introduced by simulating short-circuited cells, resulting in a drop to 226 V. The fault is further escalated with additional disruptions occurring at 60 s and 90 s, each causing further reductions in panel voltage.
The full progression of the panel voltage under this simulated fault condition is depicted in Figure 14, illustrating the gradual degradation of system performance due to cumulative cell failures.
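Taking the approximately proportional relationship between voltage drop and the number of shorted cells at face value, the first fault stage can be turned into a rough estimate of the affected fraction. This is a back-of-the-envelope sketch under that proportionality assumption; the function name is hypothetical, and the actual number of cells shorted in the simulation is not stated in the text.

```python
def shorted_cell_fraction(nominal_v, faulted_v):
    """Estimate the fraction of shorted cells from the terminal voltage drop.

    Assumes every healthy cell contributes equally to the string voltage,
    so the relative voltage loss equals the relative loss of cells.
    """
    if nominal_v <= 0:
        raise ValueError("nominal voltage must be positive")
    return 1.0 - faulted_v / nominal_v

# First fault stage reported in the text: 282 V drops to 226 V.
fraction = shorted_cell_fraction(282.0, 226.0)
```

Under this assumption, the first stage corresponds to roughly one fifth of the series-connected cells no longer contributing voltage.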
4.11. Fault—Dust
A frequent fault affecting photovoltaic (PV) panels is the accumulation of dust on the panel’s surface, which results in reduced light absorption. Dust particles obstruct and scatter incoming solar radiation, preventing it from efficiently reaching the photovoltaic cells. This reduction in effective irradiance leads to a decline in the panel’s conversion efficiency and, consequently, a significant drop in power output.
The presence of dust diminishes the amount of solar energy converted into electricity, directly impacting the energy yield of the PV system. The severity of the performance loss depends on the extent and density of the dust coverage, which can vary with environmental conditions and maintenance frequency. Over time, such degradation can noticeably lower the system’s overall energy production and operational efficiency.
In the simulated scenario, the PV system initially operates under normal conditions, generating approximately 75 kW of active power. At the 30 s mark, the effect of dust accumulation is introduced, resulting in a reduction in power output to approximately 56 kW. The fault is further replicated at 60 s and 90 s, with each event causing additional degradation in power generation.
The progressive impact of dust accumulation on the system’s active power output is shown in Figure 15, which indicates the declining trend in P over time, assuming that irradiance remains constant throughout the simulation.
4.12. Realistic Environmental Conditions
To evaluate algorithm performance under realistic operational conditions, we include a comprehensive cloudy day dataset spanning natural weather variations. Unlike the controlled attack scenarios, this dataset captures the inherent variability present in actual PV system operations, including gradual irradiance changes, cloud transients, and temperature fluctuations that occur during normal cloudy weather patterns. This profile is illustrated in Figure 16.
The dataset contains 43,201 samples of normal PV system operation under variable cloudy conditions, representing the type of environmental challenges that anomaly detection algorithms must handle in real-world deployments. All samples are labeled as normal operation (label = 1), as no attacks or faults occur during this period.
The inclusion of this dataset addresses a significant gap in cybersecurity research, where algorithms often perform well under controlled conditions but fail when deployed in variable real-world environments.
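Because every sample in this subset is labeled normal, any sample a detector flags is a false positive, and the subset directly measures the false positive rate. A minimal sketch, assuming predictions follow the same 1/−1 convention as the labels (the function name and the example predictions are hypothetical):

```python
def false_positive_rate(predictions):
    """Fraction of samples flagged anomalous (-1) on an all-normal dataset."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    flagged = sum(1 for p in predictions if p == -1)
    return flagged / len(predictions)

# Hypothetical detector output on four normal samples, one wrongly flagged.
rate = false_positive_rate([1, 1, -1, 1])
```

Precision and recall are less informative on this subset, since it contains no true anomalies.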