Photo-Set: A Proposed Dataset and Benchmark for Physics-Based Cybersecurity Monitoring in Photovoltaic Systems

Mokarim, Afroz; Gaggero, Giovanni Battista; Ferro, Giulio; Robba, Michela; Girdinio, Paola; Marchese, Mario

doi:10.3390/en18195318


Photo-Set: A Proposed Dataset and Benchmark for Physics-Based Cybersecurity Monitoring in Photovoltaic Systems^†

by Afroz Mokarim ¹, Giovanni Battista Gaggero ¹, Giulio Ferro ², Michela Robba ², Paola Girdinio ¹,* and Mario Marchese ¹

¹ Department of Electrical, Electronic and Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Via all’Opera Pia 11A, 16145 Genoa, Italy

² Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Via all’Opera Pia 13, 16145 Genoa, Italy

* Author to whom correspondence should be addressed.

† This paper is an extended version of our paper titled “Photo-Set: A Dataset for Physics-Based Cybersecurity Monitoring in Photovoltaic Systems” presented in 2025 IFAC Workshop on Smart Energy Systems for Efficient and Sustainable Smart Grids and Smart Cities (SENSYS 2025), Bari, Italy, 18–20 June 2025.

Energies 2025, 18(19), 5318; https://doi.org/10.3390/en18195318 (registering DOI)

Submission received: 31 July 2025 / Revised: 14 September 2025 / Accepted: 30 September 2025 / Published: 9 October 2025

(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)


Abstract

Modern photovoltaic (PV) systems face increasing cybersecurity threats due to their integration with smart grid infrastructure. While previous research has identified vulnerabilities, the lack of standardized datasets has hindered the development and evaluation of detection algorithms. Building upon our previously introduced Photo-Set dataset, this paper presents a benchmark evaluation of anomaly detection algorithms for PV cybersecurity applications. We evaluate three state-of-the-art algorithms (One-Class SVM, Isolation Forest, and Local Outlier Factor) across 12 attack scenarios, establishing performance baselines and identifying algorithm-specific strengths and limitations. Our experimental results reveal a clear detectability hierarchy. This work proposes a standardized benchmark for PV cybersecurity research and provides the industry with evidence-based guidance for security system deployment.

Keywords: dataset; cybersecurity; anomaly detection; smart grid; photovoltaic; distributed energy resources

1. Introduction

The integration of photovoltaic (PV) systems into modern smart grids has fundamentally transformed the energy landscape, enabling distributed generation and enhanced grid flexibility. However, this transformation has introduced unprecedented cybersecurity challenges. Unlike traditional centralized power systems, PV installations rely heavily on smart inverters equipped with communication interfaces, creating multiple attack vectors that threaten both system integrity and grid stability [1]. When adversaries gain control over distributed energy resource (DER) systems, they can deliberately alter power generation patterns, resulting in persistent grid oscillations that compromise overall system stability [2]. The risk grows when attackers control multiple DER units, especially energy storage systems, because such attacks can push voltage levels beyond safe limits in distribution networks [3]. Electric vehicle infrastructure adds further attack vectors, and the effect of cyberattacks on charging systems on the broader grid is studied in [4]. Adversaries can also exploit weaknesses in communication protocols and individual DER devices to significantly disrupt power system operations [5]. Despite growing awareness of these vulnerabilities, the cybersecurity community lacks standardized datasets and evaluation frameworks for developing and comparing detection algorithms specific to PV systems. Existing cybersecurity datasets focus primarily on network traffic or generic industrial control systems, failing to capture the unique physics-based relationships inherent in PV operations. This gap has hindered the development of effective, physics-informed anomaly detection algorithms tailored to PV cybersecurity applications. To address this limitation, ref. [6] introduces Photo-Set, an open-source dataset capturing various cyberattack scenarios in PV systems through high-fidelity MATLAB/Simulink simulations [7].
Building upon this foundation, this paper presents the first comprehensive benchmark evaluation of anomaly detection algorithms for PV cybersecurity applications. Our contributions extend beyond dataset provision to include the following:

A systematic evaluation of three state-of-the-art anomaly detection algorithms across 12 distinct attack scenarios plus realistic environmental conditions, establishing performance baselines for future research comparisons;
A quantified attack detectability hierarchy that categorizes threats based on algorithmic detection difficulty, providing insights into fundamental detection challenges;
Comprehensive performance characterization including computational requirements, scalability analysis, and real-time processing capabilities for industrial deployment;
Evidence-based implementation guidelines that translate research findings into practical deployment strategies for utility and industrial applications;
Failure mode analysis that identifies algorithm limitations and suggests future research directions for improving detection coverage.

The paper is structured as follows: Section 2 introduces physics-based anomaly detection algorithms and reviews related work on public cybersecurity datasets. Section 3 details the use case, including the system architecture and executed attacks. Section 4 presents the dataset in its entirety. Section 5 suggests potential dataset applications. Finally, Section 7 concludes this paper.

2. Background

2.1. Physics-Based Anomaly Detection

Anomaly detection plays a pivotal role across various domains, notably in cybersecurity, where irregular patterns often signal critical incidents such as security breaches or equipment malfunctions. Conventional cybersecurity anomaly detection techniques primarily analyze data such as network traffic and system logs produced by applications on networked devices. In contrast, physics-based anomaly detection represents an emerging paradigm that integrates domain-specific physical principles to improve detection accuracy. This approach recognizes that many engineering systems operate under well-defined physical laws. By embedding these laws into detection frameworks, we can better model the expected system behavior and identify anomalous deviations.

Physics-based models offer distinct advantages over traditional data-driven methods and can effectively augment them to enhance visibility into system processes. A comprehensive review of the literature on this topic is provided in [8], which surveys research spanning control theory, information security, and power systems, thereby fostering interdisciplinary collaboration. The review also highlights the rapid expansion of this field. Recent developments have seen the increasing integration of deep learning methods, which address challenges such as large-scale data and the requirement for specialized domain knowledge [9]. Several applications of physics-based anomaly detection can already be found in power systems. For instance, ref. [10] proposes a contextual anomaly detection method leveraging artificial neural networks to detect malicious voltage control activities in low-voltage distribution grids. Similarly, ref. [11] presents a high-dimensional, data-driven technique that utilizes electric waveform measurements from distribution networks to identify cyber-physical attacks. In another study, ref. [12] introduces a machine learning-based anomaly detection algorithm capable of identifying various attacks on photovoltaic (PV) systems—such as PV disconnection, power curtailment, Volt-var manipulations, and reverse power flow—particularly in distribution networks with a high penetration of distributed energy resources (DERs).

Progress in this area is further propelled by innovations in neural network architectures, notably “physics-informed neural networks,” which integrate physical constraints directly into the learning process [13]. Given these advancements, the adoption of physics-based anomaly detection for cybersecurity monitoring within industrial control systems (ICSs) is expected to increase significantly [14]. Within this context, the proposed Photo-Set offers potential as a valuable resource for advancing cybersecurity research.

2.2. Related Works

Open-access datasets are indispensable for fostering innovation in cybersecurity and machine learning, especially by enabling practitioners without in-depth domain expertise to test and refine their models effectively. In this context, ref. [15] introduces a structured methodology for the systematic creation of anomaly detection datasets tailored to industrial control systems (ICSs), demonstrating its application through a dataset focused on electric traction substations in the railway sector. Complementing this, ref. [16] provides a comprehensive review of existing ICS-related datasets and testbeds, concentrating on those that capture network communication and protocol-level interactions. However, the review also reveals a notable deficiency in datasets that encompass the physical behavior of underlying processes.

Despite growing interest in physics-aware cybersecurity, there is a critical gap. Few datasets incorporate physical system data for distributed energy resources, especially photovoltaic systems. A systematic review of existing datasets reveals several key limitations that Photo-Set addresses.

The overwhelming majority of publicly available ICS datasets are centered around network traffic data, which poses limitations for researchers seeking to develop physics-based anomaly detection techniques. For instance, the dataset presented in [17] supports the development of intrusion detection systems (IDSs) for water distribution infrastructures using artificial intelligence and machine learning. It was derived from a hardware-in-the-loop Water Distribution Testbed designed by the authors. Meanwhile, ref. [18] investigates typical electrical substations at the distribution level and simulates a range of critical protection events and cyberattack scenarios, resulting in a dataset that includes diverse trace data intended to support cybersecurity analysis.

Recent advances in photovoltaic system monitoring have introduced sophisticated anomaly detection approaches. The authors in [19] present a threat model for the cybersecurity of PV farms based solely on data integrity attacks. The assessment of global and meteorological datasets for PV system reliability is carried out in [20]. The progressive deformable transformer for photovoltaic panel defect segmentation (PDeT) presented in [21] demonstrates significant improvements in visual defect detection, but it does not address cybersecurity-specific anomalies or provide datasets for cyber-physical attacks. Similarly, cross-modal learning approaches for anomaly detection in complex industrial processes [22] offer methodological advances but have not been applied to photovoltaic-specific attack scenarios. Table 1 compares the existing datasets for cybersecurity in industrial and energy systems.

Photo-Set provides a dataset that captures the complex interactions between environmental conditions, electrical behavior, and cybersecurity threats in photovoltaic systems. The dataset includes 22 electrical and environmental parameters that reflect the operational reality of grid-connected PV installations, enabling the development and validation of physics-informed anomaly detection algorithms specifically tailored to renewable energy cybersecurity applications.

3. Simulation Environment

3.1. PV System

Photovoltaic (PV) systems consist of a range of electrical, electronic, and communication components that work together to enable efficient energy generation and integration. In this study, we examine a representative scenario involving a PV system coupled with an energy storage unit and connected to a microgrid managed by a SCADA (Supervisory Control and Data Acquisition) system. The main electrical elements in this configuration include the following:

Photovoltaic (PV) Cell Modules: Individual PV cells are interconnected in series and/or parallel arrangements to achieve the required direct current (DC) voltage and desired peak power output. The current-voltage relationship is given by the following:

$I = I_{ph} - I_{0} \left[ \exp\left( \frac{q (V + I \cdot R_{s})}{n \cdot k \cdot T} \right) - 1 \right] - \frac{V + I \cdot R_{s}}{R_{sh}}$

(1)

where
$I_{ph}$ is the photocurrent (A);
$I_{0}$ is the reverse saturation current (A);
$q$ is the elementary charge (1.602 × $10^{-19}$ C);
$V$ is the cell voltage (V);
$R_{s}$ is the series resistance (Ω);
$R_{sh}$ is the shunt resistance (Ω);
$n$ is the ideality factor (dimensionless);
$k$ is the Boltzmann constant (1.381 × $10^{-23}$ J/K);
$T$ is the cell temperature (K).
The photocurrent is temperature- and irradiance-dependent:

$I_{ph} = \left[ I_{sc,ref} + K_{i} (T - T_{ref}) \right] \cdot \frac{G}{G_{ref}}$

(2)

where $I_{sc,ref}$ is the short-circuit current at reference conditions, $K_{i}$ is the temperature coefficient of current, $G$ is the irradiance, and the subscript “ref” denotes reference conditions (25 °C, 1000 W/m²).
DC/DC Converter: This electronic device regulates the power flow by implementing control strategies that enable maximum power point tracking (MPPT), thereby optimizing the energy harvested from the PV modules. The converter dynamics are governed by the following:

$L \frac{d i_{L}}{d t} = V_{p v} - (1 - D) \cdot V_{d c}$

(3)

$C_{dc} \frac{d V_{dc}}{d t} = (1 - D) \cdot i_{L} - i_{out}$

(4)

where $D$ is the duty cycle, $L$ is the inductor value, $C_{dc}$ is the DC-link capacitance, $i_{L}$ is the inductor current, and $i_{out}$ is the output current.
Power Inverter: The inverter converts the DC output of the PV system into three-phase alternating current (AC) using space vector modulation, which allows the generated power to be fed into the microgrid or utility network.
AC voltage generation:

$\begin{bmatrix} v_{a} \\ v_{b} \\ v_{c} \end{bmatrix} = \frac{V_{dc}}{3} \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \begin{bmatrix} S_{a} \\ S_{b} \\ S_{c} \end{bmatrix}$

(5)

Here, $S_{a}$, $S_{b}$, and $S_{c}$ are the switching states.

Power calculations:

$P = \frac{3}{2} (v_{d} i_{d} + v_{q} i_{q})$

(6)

$Q = \frac{3}{2} (v_{q} i_{d} - v_{d} i_{q})$

(7)

Here, $v_{d}$, $v_{q}$, $i_{d}$, and $i_{q}$ are the d-q axis components of the voltage and current.

Control system: The inverter implements dual-loop control with outer voltage control and inner current control:

$i_{d}^{*} = K_{p,v} (V_{dc}^{*} - V_{dc}) + K_{i,v} \int (V_{dc}^{*} - V_{dc}) \, dt$

(8)

$i_{q}^{*} = \frac{Q^{*}}{1.5 \cdot V_{q}}$

(9)
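Equation (1) in the component list above is implicit in the cell current, since $I$ appears on both sides, so evaluating the cell model requires a numerical solve. The following is a minimal Python sketch of one way to do this by bisection, combined with the photocurrent of Equation (2). All cell parameters (`i_sc_ref`, `k_i`, `i_0`, `r_s`, `r_sh`, `n`) are hypothetical placeholder values chosen for illustration; they are not the Photo-Set parameters, and the sketch is not the Simscape model used to generate the dataset.

```python
import math

Q_E = 1.602e-19   # elementary charge (C), value given with Eq. (1)
K_B = 1.381e-23   # Boltzmann constant (J/K), value given with Eq. (1)


def photocurrent(g, t_cell, i_sc_ref=8.0, k_i=0.003, g_ref=1000.0, t_ref=298.15):
    """Eq. (2): photocurrent (A) for irradiance g (W/m^2) and cell temperature t_cell (K).

    i_sc_ref and k_i are hypothetical cell parameters, not Photo-Set values.
    """
    return (i_sc_ref + k_i * (t_cell - t_ref)) * g / g_ref


def cell_current(v, g, t_cell, i_0=1e-9, r_s=0.005, r_sh=100.0, n=1.3):
    """Solve the implicit Eq. (1) for I (A) at cell voltage v (V) by bisection.

    i_0, r_s, r_sh and n are hypothetical cell parameters.
    """
    i_ph = photocurrent(g, t_cell)
    v_t = n * K_B * t_cell / Q_E  # n*k*T/q, the scaling inside the exponential

    def residual(i):
        # Right-hand side of Eq. (1) minus I; the root is the cell current.
        v_d = v + i * r_s
        return i_ph - i_0 * (math.exp(v_d / v_t) - 1.0) - v_d / r_sh - i

    # residual() is strictly decreasing in i, so widen a bracket around its root.
    lo, hi = -1.0, i_ph + 1.0
    while residual(lo) < 0.0:
        lo *= 2.0
    while residual(hi) > 0.0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Sweeping `v` from zero upward with fixed `g` and `t_cell` traces the I-V curve of a single cell under these assumed parameters; the current stays close to the photocurrent at low voltage and becomes negative beyond the open-circuit voltage.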

The system response to environmental and operational changes is characterized by different time constants:

Electrical Parameters (voltage, current): 10–100 milliseconds;
MPPT Tracking Response: 0.1–1 s;
Irradiance Response: 0.1–1 s;
Temperature Response: 5–10 min (thermal inertia).

These time constants are crucial for understanding the system’s ability to respond to both legitimate operational changes and malicious cyberattacks, as they determine the detection windows available for anomaly detection algorithms. A detailed simulation model of a photovoltaic (PV) system has been developed, incorporating multiple PV cell arrays, a DC-DC converter, and a power inverter—each equipped with its own dedicated control mechanism. The system is configured with the following parameters: single-phase AC voltage (VAC) of 230 V, a total power capacity (S) of 100 kVA, DC voltage (VDC) of 500 V, and a grid frequency (f) of 50 Hz. The Active Front End (AFE) establishes a three-phase connection to the main electrical grid. The simulation is based on an electromagnetic model, which allows for an accurate representation of dynamic electrical behavior.

The control architecture integrates a Maximum Power Point Tracking (MPPT) algorithm for the DC-DC converter to ensure optimal energy extraction from the PV cells under varying irradiance conditions. The inverter operates by regulating the DC link voltage: it increases the active power injected into the grid when the DC voltage rises and decreases it when the voltage drops, thereby maintaining voltage stability.
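As a worked illustration of how the duty cycle links the PV-side and DC-link voltages (our addition, not an output of the simulation model), setting the derivatives in Equations (3) and (4) to zero gives the steady-state relations of the averaged converter model. The numerical example uses the 500 V DC voltage stated above together with a purely hypothetical PV-side voltage of 350 V.

```latex
\frac{d i_{L}}{dt} = 0 \;\Rightarrow\; V_{dc} = \frac{V_{pv}}{1 - D},
\qquad
\frac{d V_{dc}}{dt} = 0 \;\Rightarrow\; i_{out} = (1 - D)\, i_{L}

D = 1 - \frac{V_{pv}}{V_{dc}} = 1 - \frac{350}{500} = 0.3
```

Under these relations, an MPPT algorithm that adjusts $D$ while the inverter holds $V_{dc}$ constant is effectively selecting the operating voltage $V_{pv}$ of the PV array.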

This entire PV system is modeled and simulated using MATLAB/Simulink, specifically utilizing components from the Simscape library [26]. Figure 1 presents the full Simulink schematic of the system’s architecture. To mimic a realistic monitoring scenario, a range of operational parameters are recorded from the inverter—representative of those typically communicated to a SCADA system. The specific set of measurements is summarized in Table 2.

3.2. Details of the Attacks

We adopted a streamlined classification of cyberattacks targeting smart inverters in the smart grid, drawing upon the framework presented in [27]. The attacks are grouped into three principal categories:

Bad Data Injection: This attack involves the manipulation of control commands issued by the supervisory controller to the inverter. These commands often relate to parameters such as active power, reactive power, or power factor setpoints. A common execution method is a Man-in-the-Middle (MitM) attack, where an adversary intercepts and alters the control messages during transmission. The lack of authentication in widely adopted industrial communication protocols—such as Modbus and IEC 61850—facilitates the injection of malicious packets, thereby compromising the integrity of commands.
False Data Injection: In this case, the attacker tampers with measurement data sent from the inverter to the supervisory system (e.g., a SCADA platform). This is often achieved through MitM attacks or packet spoofing that exploit protocol-level vulnerabilities. The objective is to mislead the controller by falsifying operational data, potentially causing it to issue inappropriate or unsafe control actions based on inaccurate information.
Firmware Modification: This form of attack involves altering the inverter’s embedded software to gain persistent, low-level control over its operation. The attacker may exploit remote vulnerabilities in web interfaces or gain direct physical access to inject malicious firmware. Such modifications can affect critical functionalities—including the shape of the output waveform or protection settings—potentially destabilizing the local distribution grid and compromising safety.

A summary of this attack taxonomy, along with the specific parameters potentially targeted and their consequences, is provided in Table 3.

Each of these attack types has been emulated within the previously described simulation environment. The Bad Data Injection scenario was implemented by modifying setpoint values at the inverter’s control interface. The False Data Injection scenario was replicated by artificially altering measurement outputs recorded from the simulation. Lastly, the Firmware Modification attack was simulated by altering the underlying control logic within the model of the power converter.

4. Dataset Description

Building on the previously defined attack taxonomy, we developed a series of simulated scenarios involving both cyberattacks and operational faults within a photovoltaic (PV) system. These scenarios were used to generate a dataset composed of 12 distinct subsets, each corresponding to a specific simulation instance. The contents of these subsets—including file names, dataset dimensions, and brief descriptions—are summarized in Table 4, which outlines the structure of the publicly available repository.

Each attack scenario covers 90–120 s and can be placed at any point during normal operation. This enables algorithm evaluation under varying conditions, such as peak solar generation, cloud intermittency, and low-light periods, and allows researchers to customize evaluation scenarios to their specific research requirements and operational contexts.

Each dataset captures system behavior over time with a sampling period of one second (1 Hz), so each row reflects one second of PV system operation. The training dataset is provided without labels, making it suitable for unsupervised or semi-supervised learning approaches. In contrast, all evaluation datasets include a label column: a value of 1 indicates normal operation, while −1 denotes anomalous behavior.
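The 1/−1 label convention matches the output of the `predict` methods of common one-class detectors (for example, those in scikit-learn), so predictions can be scored against the label column directly. The sketch below is a minimal, dependency-free way to compute precision, recall, and F1 with the anomalous class (−1) treated as the positive class; the function name is ours, and the choice of these three metrics is an assumption for illustration.

```python
def anomaly_metrics(y_true, y_pred):
    """Precision, recall and F1 with -1 (anomalous) as the positive class.

    y_true and y_pred are equal-length sequences of 1 (normal) and -1 (anomalous).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

For example, with labels `[1, 1, -1, -1, -1]` and predictions `[1, -1, -1, -1, 1]` there are two true positives, one false alarm, and one missed anomaly, so all three metrics equal 2/3.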

The following subsections provide a detailed description of each dataset subset, including the type of anomaly simulated, the control or measurement parameters affected, and the context of the abnormal behavior.

4.1. Normal Functioning

To establish a comprehensive baseline for comparison, we simulated the normal operation of the photovoltaic system over three full days, representing different seasonal and weather conditions. This contributes to our training dataset. Each simulation day corresponds to a different season with varying meteorological patterns to capture the full spectrum of realistic operational behaviors. The three simulated scenarios include the following:

Day 1 (Summer): High irradiance with typical variable weather conditions.
Day 2 (Spring): Clear conditions with moderate irradiance levels and gradual temperature variations.
Day 3 (Winter): Lower irradiance levels with reduced daylight hours.

Panel temperatures range from 25 °C during early morning hours to peak values of 65–70 °C during high irradiance periods, accounting for ambient temperature effects, solar heating, and thermal inertia. The thermal model incorporates the following:

$T_{panel}(t) = T_{ambient}(t) + G(t) \cdot \frac{\mathrm{NOCT} - 20}{800} + \tau_{thermal} \frac{dT}{dt}$

(10)

where NOCT is the nominal operating cell temperature (°C) and $\tau_{thermal}$ = 300 s represents the thermal time constant, accounting for the slower response of temperature changes compared to irradiance variations.
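To make the role of the thermal time constant concrete, the following Python sketch integrates a first-order lag toward the steady-state temperature given by the first two terms of Equation (10). Two assumptions are ours: the derivative term is interpreted as a first-order lag, that is, $\tau_{thermal} \, dT/dt = T_{ss} - T$, and the NOCT value of 45 °C is a hypothetical placeholder. The actual thermal model used in the simulations may differ.

```python
def steady_state_temp(t_ambient, g, noct=45.0):
    """Steady-state panel temperature (deg C) from the static terms of Eq. (10).

    noct=45.0 is a hypothetical value, not taken from Photo-Set.
    """
    return t_ambient + g * (noct - 20.0) / 800.0


def simulate_panel_temp(t_ambient, irradiance, t_init, tau=300.0, dt=1.0, noct=45.0):
    """Forward-Euler integration of tau * dT/dt = T_ss - T (assumed lag form).

    t_ambient and irradiance are equal-length sequences sampled every dt seconds.
    Returns one panel temperature per input sample, starting at t_init.
    """
    temps = [t_init]
    for t_amb, g in zip(t_ambient[1:], irradiance[1:]):
        t_ss = steady_state_temp(t_amb, g, noct)
        temps.append(temps[-1] + dt / tau * (t_ss - temps[-1]))
    return temps
```

With a constant ambient temperature of 20 °C and a constant irradiance of 800 W/m², the panel in this sketch covers roughly 63% of the gap to its 45 °C steady state after 300 s, which is the expected behavior of a first-order system after one time constant.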

The overall trend of solar irradiance across the three days is illustrated in Figure 2, Figure 3 and Figure 4, highlighting the natural variability that arises from seasonal changes and serving as a reference for interpreting system performance in the absence of faults or attacks. In the figures, t = 0 represents midnight. The training dataset captures 107,260 operational samples representing this diverse range of normal operating conditions, providing a robust foundation for anomaly detection algorithm training. The temporal sampling rate of 1 Hz ensures adequate resolution for capturing both gradual environmental changes and rapid cloud-induced irradiance variations.

4.2. Bad Data Injection—P Reduction

In this attack scenario, the adversary targets the active power (P) setpoint transmitted to the inverter, artificially lowering it without any legitimate operational justification. Under normal conditions and given a certain level of solar irradiance, the system initially operates at a power output of approximately 37 kW. However, despite unchanged irradiance levels, the attacker abruptly reduces the power setpoint—first dropping it to around 30 kW and then progressively down to 0.1 kW, amounting to a 99.7% reduction from the expected output.

This manipulation results in a significant underutilization of the available solar energy and could potentially destabilize local power flows or trigger alarms in supervisory systems. The overall progression of this manipulated power output is depicted in Figure 5, illustrating the drastic deviation from normal operational behavior.
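The quoted reduction follows directly from the power levels given above. As a quick check (our arithmetic, using only the 37 kW, 30 kW, and 0.1 kW figures):

```latex
\frac{37 - 30}{37} \approx 0.189 \quad (\text{first step, about } 18.9\%),
\qquad
\frac{37 - 0.1}{37} = \frac{36.9}{37} \approx 0.997 \quad (\text{final level, about } 99.7\%)
```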

4.3. Bad Data Injection—Q Exceeds Limits

In this attack scenario, the adversary alters the reactive power (Q) setpoint communicated to the inverter, deliberately forcing it beyond the bounds typically observed during normal operations. Under standard conditions—and with constant irradiance—the system maintains a reactive power output of approximately 0 kVAR, reflecting a balanced power factor.

The attack begins by abruptly increasing the reactive power setpoint to 10 kVAR, followed by a gradual escalation to nearly 38 kVAR. This manipulation pushes the inverter to operate outside of its expected reactive power range, which may lead to voltage instability or increased stress on grid components, particularly in sensitive distribution networks.

The evolution of the reactive power output during this attack is presented in Figure 6, highlighting the deviation from nominal behavior and the potential operational risks posed by such a control-level compromise.

4.4. Bad Data Injection—P and Q Oscillations

These two scenarios simulate the effects of partial Man-in-the-Middle (MitM) attacks, in which an adversary intermittently manipulates both the active (P) and reactive (Q) power setpoints sent to the inverter. As a result of the attack, the setpoints exhibit continuous fluctuations, causing the inverter’s output to oscillate between the legitimate and malicious commands.

Such behavior illustrates a vulnerability inherent in many industrial communication protocols, where the SCADA system periodically transmits control setpoints to field devices—such as inverters—without authentication or integrity checks. When an attacker injects unauthorized packets into the communication stream, both the valid and falsified setpoints compete, leading the inverter to alternate erratically between them.

In the simulated attack, the rogue setpoint follows a ramped sinusoidal pattern, effectively creating oscillations with increasing amplitude over time. This kind of disturbance not only degrades system stability but also stresses the inverter’s control and switching mechanisms.

The dynamic trends of active power (P) and reactive power (Q) under this oscillatory attack scenario are illustrated in Figure 7 and Figure 8, respectively.
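The competition between legitimate and rogue setpoints described above can be sketched as follows. This minimal Python example assumes one setpoint per second, a rogue value that follows a sinusoid whose amplitude ramps up linearly, and an attacker who overwrites every second message. The function name, the period, the maximum relative amplitude, and the injection pattern are all hypothetical choices for illustration and are not the parameters used to generate Photo-Set.

```python
import math


def oscillating_setpoint(legit, n_samples, attack_start,
                         period=10.0, max_rel_amp=0.5, inject_every=2):
    """Setpoint sequence seen by the inverter under a partial MitM attack.

    Before attack_start every sample equals the legitimate setpoint. Afterwards,
    every inject_every-th sample is replaced by a rogue value following a
    ramped sinusoid; the remaining samples keep the legitimate value.
    """
    span = max(1, n_samples - 1 - attack_start)
    out = []
    for k in range(n_samples):
        j = k - attack_start
        if j >= 0 and j % inject_every == 0:
            ramp = j / span  # grows linearly from 0 to 1 over the attack
            rogue = legit * (1.0 + max_rel_amp * ramp * math.sin(2.0 * math.pi * j / period))
            out.append(rogue)
        else:
            out.append(legit)
    return out
```

Calling `oscillating_setpoint(37.0, 120, 30)` yields a sequence that stays at 37 for the first 30 samples and then alternates between the legitimate value and an oscillation of growing amplitude, which is qualitatively the pattern this scenario describes.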

4.5. False Data Injection—P Tampering

In this attack scenario, the adversary executes a Man-in-the-Middle (MitM) attack to tamper with the power injection measurements transmitted from the inverter to the SCADA system. Specifically, the attacker targets and alters only the active power readings while leaving other measurements such as voltage and current unchanged. This selective manipulation leads to data inconsistency, as the reported active power no longer corresponds to the expected value derived from the product of voltage and current—thereby violating basic electrical relationships.

The simulation was conducted over a 120 s window under stable irradiance conditions, which under normal circumstances would yield a power setpoint of 55 kW. At the 30 s mark, the attack is initiated, causing the active power measurement reported to SCADA to diverge from the actual inverter output. This kind of inconsistency could mislead the SCADA system into issuing erroneous commands or overlooking a fault condition.

The evolution of the falsified active power readings during the attack is depicted in Figure 9, showing the disruption introduced by the manipulated data stream.

Anomaly detection algorithms are expected to identify such attacks rapidly, because the manipulated measurement vectors violate fundamental physical laws and thus represent physically implausible data. The inconsistency between the altered active power readings and the corresponding voltage and current measurements serves as a strong indicator of anomalous behavior, enabling timely detection and response.
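A simple rule-based version of this consistency check can be sketched as follows. The example assumes that d-q voltage and current components are available so that the active power can be recomputed with Equation (6); the actual measurement set is the one listed in Table 2, and the function names, argument names, and 5% tolerance here are hypothetical. It is meant as an illustration of the physical relationship, not as one of the benchmarked algorithms.

```python
def dq_active_power(v_d, v_q, i_d, i_q):
    """Eq. (6): active power from d-q voltage and current components."""
    return 1.5 * (v_d * i_d + v_q * i_q)


def flag_power_mismatch(p_reported, p_physics, rel_tol=0.05):
    """Label each sample 1 (consistent) or -1 (reported P disagrees with physics).

    p_reported: active power values as received by the supervisory system.
    p_physics:  active power recomputed from voltage and current measurements.
    rel_tol:    hypothetical relative tolerance for measurement noise.
    """
    labels = []
    for p_rep, p_phy in zip(p_reported, p_physics):
        scale = max(abs(p_phy), 1e-9)  # avoid division by zero at night
        labels.append(-1 if abs(p_rep - p_phy) / scale > rel_tol else 1)
    return labels
```

The returned labels follow the dataset convention, so they can be compared directly with the label column of the evaluation files.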

4.6. False Data Injection—$T_{pv}$ Tampering

In this attack scenario, the adversary exploits a Man-in-the-Middle (MitM) attack to manipulate the temperature measurements transmitted from the inverter to the SCADA system. The attacker selectively alters only the panel temperature reading while leaving other sensor data—such as irradiance and active power—unchanged. This selective modification results in inconsistent data, as the reported temperature no longer corresponds logically with the environmental and electrical measurements.

The simulation spans 120 s under constant irradiance conditions, which normally correspond to a panel temperature of approximately 14 °C. Such a panel temperature is plausible under cold operating conditions, for example on a winter day or an overcast day with low irradiance.

At the 30 s mark, the attack commences, abruptly increasing the reported temperature to 36 °C, after which the falsified temperature values continue to rise progressively up to 70 °C. This creates a physically implausible scenario where the panel temperature suggests extreme overheating without corresponding changes in irradiance levels or other thermal indicators.

The manipulated temperature trend is illustrated in Figure 10, clearly showing the divergence from normal operating conditions. Given that the altered measurements violate the expected physical relationships between temperature, irradiance, and power output, anomaly detection algorithms are anticipated to promptly detect this attack due to the presence of physically implausible data within the measurement vector.
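One physical constraint that this scenario violates is the limited rate at which panel temperature can change. The sketch below flags samples whose step from the previous sample exceeds what a first-order thermal lag with the 300 s time constant of Equation (10) would allow. The bound `max_gap` (the largest plausible distance between the current and the steady-state temperature) is a hypothetical value, and the rule is our illustration rather than one of the benchmarked algorithms.

```python
def flag_temperature_jumps(temps, tau=300.0, max_gap=60.0, dt=1.0):
    """Label each sample 1 (plausible) or -1 (temperature changed too fast).

    For a first-order lag, |dT/dt| cannot exceed max_gap / tau, so the largest
    plausible change between consecutive samples is max_gap / tau * dt.
    max_gap=60.0 K is a hypothetical bound. The first sample is labeled 1.
    """
    if not temps:
        return []
    max_step = max_gap / tau * dt
    labels = [1]
    for prev, cur in zip(temps, temps[1:]):
        labels.append(-1 if abs(cur - prev) > max_step else 1)
    return labels
```

With the default values the allowed step is 0.2 °C per second, so an abrupt jump from 14 °C to 36 °C between two consecutive samples is flagged immediately. A slow enough falsified ramp would pass this particular check, which is one reason to combine it with cross-checks against irradiance and power.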

4.7. False Data Injection—Irradiance Tampering

In this attack scenario, the adversary conducts a Man-in-the-Middle (MitM) attack to compromise the irradiance measurements transmitted from the inverter to the SCADA system. The attacker selectively modifies only the irradiance values, while other sensor readings, such as active power, remain unaltered. This results in inconsistent data, as the reported irradiance no longer corresponds with the other measured parameters, disrupting the expected physical relationships. The simulation covers a 120 s period with an initial irradiance setpoint of 700 W/m². At the 30 s mark, the attacker initiates the manipulation by increasing the irradiance measurement by 10%, after which the falsified irradiance values continue to rise gradually.

The distorted irradiance profile is shown in Figure 11, which highlights the deviation from true operating conditions.

Anomaly detection algorithms are anticipated to detect this type of attack promptly, as the manipulated irradiance values create inconsistencies within the overall measurement vector. Specifically, the altered irradiance no longer aligns with other physical quantities—such as active power output and panel temperature—violating fundamental physical correlations. These discrepancies result in physically implausible system states, which well-designed detection models can recognize as anomalous behavior.
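The correlation between irradiance and active power can be turned into a simple check. The sketch below compares the ratio of power to irradiance against a baseline taken from the first samples of a recording. It assumes that the system operates near its maximum power point at roughly constant temperature, so that the ratio is approximately constant, and that the first `n_baseline` samples are attack-free (the scenario above starts the attack at the 30 s mark). The function name, tolerance, and baseline length are hypothetical.

```python
from statistics import median


def flag_irradiance_mismatch(power, irradiance, n_baseline=30, rel_tol=0.05):
    """Label each sample 1 (consistent) or -1 (power/irradiance ratio drifted).

    power and irradiance are equal-length sequences with irradiance > 0.
    The baseline ratio is the median over the first n_baseline samples.
    """
    ratios = [p / g for p, g in zip(power, irradiance)]
    base = median(ratios[:n_baseline])
    return [-1 if abs(r - base) / base > rel_tol else 1 for r in ratios]
```

A 10% increase in reported irradiance with unchanged power lowers the ratio by about 9%, which exceeds the 5% tolerance assumed here.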

4.8. Firmware Modifications—Harmonics Tampering

In this attack scenario, the adversary compromises the internal operation of the inverter by altering the waveform it produces—specifically through the injection of an additional harmonic component into the sinusoidal output signal. The magnitude of this injected harmonic is increased in three distinct stages, progressively degrading the quality of the voltage waveform.

From the first stage onward, the total harmonic distortion (THD) exceeds the acceptable thresholds defined by the IEEE 519-2022 standard for power quality [28], thereby breaching operational safety and regulatory compliance. This form of manipulation introduces electrical noise into the system and can have disruptive effects on the stability and performance of the local grid, especially in sensitive or tightly regulated environments.

The staged escalation of the harmonic component—and the resulting THD variation—is depicted in Figure 12.

Given the severity and breadth of its impact, this attack is expected to be promptly detected by anomaly detection algorithms, which can identify the abnormal rise in THD and associated fluctuations across all three voltage phases. Such deviations strongly diverge from normal inverter behavior and serve as reliable indicators of tampering or internal malfunction.
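To make the monitored quantity concrete, the sketch below computes THD as the ratio between the RMS of the harmonic components and the fundamental, using a single-bin DFT per harmonic. The 50 Hz fundamental, the sampling rate, the choice of the 5th harmonic, and the three injected magnitudes are assumptions made for illustration; the text does not specify which harmonic the simulated firmware injects.

```python
import cmath
import math


def harmonic_amplitude(samples, fs, freq):
    """Amplitude of the sinusoidal component at `freq` via a single-bin DFT."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / fs) for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


def thd(samples, fs, f0, max_harmonic=20):
    """THD: root-sum-square of harmonics 2..max_harmonic over the fundamental."""
    v1 = harmonic_amplitude(samples, fs, f0)
    harmonics = [harmonic_amplitude(samples, fs, h * f0) for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(v * v for v in harmonics)) / v1


FS, F0 = 10_000, 50  # sampling rate and fundamental frequency (assumed values)
N = 2_000            # exactly 10 fundamental cycles, which avoids spectral leakage


def waveform(h5_amplitude):
    """Unit sinusoid plus a 5th harmonic of the given relative amplitude."""
    return [math.sin(2 * math.pi * F0 * i / FS)
            + h5_amplitude * math.sin(2 * math.pi * 5 * F0 * i / FS)
            for i in range(N)]


# Three escalating stages of injected 5th harmonic (illustrative magnitudes).
stage_thd = [thd(waveform(a), FS, F0) for a in (0.10, 0.20, 0.30)]
```

Because only one harmonic is injected, the computed THD equals its relative amplitude at each stage; the result can then be compared with whichever limit from [28] applies to the installation's voltage level.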

4.9. Firmware Modifications—MPPT Tampering

In this attack scenario, the intruder compromises the internal control logic of the DC/DC converter within the inverter. By manipulating the converter’s behavior, the attacker disrupts the regulation of power flow between the photovoltaic modules and the DC link. This type of interference is particularly hazardous, as it can drive the system into unsafe operating conditions, potentially resulting in equipment damage or reduced system lifespan.

The simulated manipulation is introduced in three escalating stages, each causing progressively more severe deviations in system behavior. The evolution of this disturbance is illustrated in Figure 13, showing the increasing impact on system dynamics.

The attack leads to notable fluctuations in critical electrical parameters, especially the battery voltage and the DC/DC link voltage—both of which are essential for maintaining energy balance and ensuring safe converter operation.

Due to the magnitude and nature of these anomalies, anomaly detection algorithms are expected to detect this attack rapidly. The physical implausibility and volatility introduced by the altered control signals generate clear indicators of abnormal behavior, making the attack readily identifiable within a well-designed monitoring framework.

4.10. Fault—Short Circuited Cells

A common fault that can arise in photovoltaic (PV) panels is the formation of hot spots, typically caused by short-circuited cells. In this condition, one or more cells cease to function properly and begin dissipating energy as heat instead of converting it into electrical power. These faulty cells effectively bypass the current flow, resulting in localized overheating and potentially leading to permanent damage or a reduced lifespan of the panel.

One of the most immediate effects of this fault is a notable drop in terminal voltage, as the short-circuited cells no longer contribute to the total voltage output. The voltage reduction is approximately proportional to the number of affected cells, and this directly translates into a significant decline in power output, ultimately degrading the performance and efficiency of the solar energy system.

In our simulation, the PV panel initially operates under normal conditions with a voltage of approximately 282 V. At a specific point, a fault is introduced by simulating short-circuited cells, resulting in a drop to 226 V. The fault is further escalated with additional disruptions occurring at 60 s and 90 s, each causing further reductions in panel voltage.

The full progression of the panel voltage under this simulated fault condition is depicted in Figure 14, illustrating the gradual degradation of system performance due to cumulative cell failures.
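The proportionality between voltage drop and the number of affected cells allows a quick back-of-the-envelope estimate. The sketch below assumes identical series-connected cells whose voltages add, so that terminal voltage scales with the number of healthy cells; the function name is hypothetical, and only the two voltages come from the simulation described above.

```python
def shorted_fraction(v_nominal, v_faulty):
    """Estimated fraction of cells lost, assuming voltage scales with healthy cell count."""
    return 1.0 - v_faulty / v_nominal


# Panel voltage before and after the first fault stage, as reported in the text.
frac = shorted_fraction(282.0, 226.0)
print(f"Estimated fraction of short-circuited cells: {frac:.1%}")
```

Under this simplified model, the first stage corresponds to roughly one fifth of the cells no longer contributing to the terminal voltage.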

4.11. Fault Dust

A frequent fault affecting photovoltaic (PV) panels is the accumulation of dust on the panel’s surface, which results in reduced light absorption. Dust particles obstruct and scatter incoming solar radiation, preventing it from efficiently reaching the photovoltaic cells. This reduction in effective irradiance leads to a decline in the panel’s conversion efficiency and, consequently, a significant drop in power output.

The presence of dust diminishes the amount of solar energy converted into electricity, directly impacting the energy yield of the PV system. The severity of the performance loss depends on the extent and density of the dust coverage, which can vary with environmental conditions and maintenance frequency. Over time, such degradation can noticeably lower the system’s overall energy production and operational efficiency.

In the simulated scenario, the PV system initially operates under normal conditions, generating approximately 75 kW of active power. At the 30 s mark, the effect of dust accumulation is introduced, resulting in a reduction in power output to approximately 56 kW. The fault is further replicated at 60 s and 90 s, with each event causing additional degradation in power generation.

The progressive impact of dust accumulation on the system’s active power output is shown in Figure 15, which indicates the declining trend in P over time, assuming that irradiance remains constant throughout the simulation.
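As a rough check on the size of the first step, using only the two power levels quoted above and assuming constant irradiance, the relative power loss is

$\frac{P_{clean} - P_{dust}}{P_{clean}} = \frac{75 - 56}{75} \approx 0.253$

that is, a reduction of about 25%. The symbols $P_{clean}$ and $P_{dust}$ are introduced here only for illustration and do not appear in the dataset.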

4.12. Realistic Environmental Conditions

To evaluate algorithm performance under realistic operational conditions, we include a comprehensive cloudy day dataset spanning natural weather variations. Unlike the controlled attack scenarios, this dataset captures the inherent variability present in actual PV system operations, including gradual irradiance changes, cloud transients, and temperature fluctuations that occur during normal cloudy weather patterns. This profile is illustrated in Figure 16.

The dataset contains 43,201 samples of normal PV system operation under variable cloudy conditions, representing the type of environmental challenges that anomaly detection algorithms must handle in real-world deployments. All samples are labeled as normal operation (label = 1), as no attacks or faults occur during this period.

The inclusion of this dataset addresses a significant gap in cybersecurity research, where algorithms often perform well under controlled conditions but fail when deployed in variable real-world environments.

5. Dataset Applications and Validation

5.1. Usage of the Dataset

The dataset emphasizes how cyberattacks can influence the electrical behavior of a medium-scale grid-connected photovoltaic (PV) system. As such, it supports a range of valuable applications:

Cyberattack Risk Assessment: This dataset can play a vital role in evaluating the risks posed by cyberthreats in PV environments. While existing research has extensively documented smart inverter vulnerabilities, quantifying the real-world impact of such threats remains complex. The dataset bridges this gap by providing time-series trends for key electrical variables under each attack scenario described in Section 3.2. Additionally, the included simulation model allows users to tailor and extend experiments based on their own configurations or threat models.
Design of Cyber-Responsive Electrical Protections: Emerging research focuses on leveraging traditional electrical protection schemes as part of the cyberincident response toolkit [29]. Engineers can implement or tune relay protections to ensure system stability in the face of malicious interference. The dataset offers quantitative insights into how attacks influence system dynamics, providing a foundation for configuring effective protection mechanisms based on measurable thresholds.
Development of Physics-Aware Anomaly Detection Systems: A core objective of this dataset is to foster the creation of anomaly detection algorithms that can identify a wide range of attack types in PV systems. Physics-based approaches, which rely on the underlying physical laws governing system behavior, show strong promise, as noted in Section 2 and demonstrated in [30]. The dataset offers labeled instances of various anomalies, allowing data scientists and machine learning practitioners—even those without deep power systems expertise—to experiment with detection strategies. In this way, it promotes interdisciplinary research in monitoring and securing industrial control systems.

The variables selected for inclusion, as detailed in Table 2, are based on a survey of commercial inverter communication module manuals. They reflect the most widely exchanged operational parameters and control signals. Consequently, Photo-Set provides a flexible testbed for validating algorithms across diverse inverter brands. If a specific product reports a narrower set of metrics, irrelevant variables can simply be excluded—preserving only the applicable features for targeted analysis.

5.2. Validation

To demonstrate the practical utility of the Photo-Set dataset, we evaluated the performance of three widely used anomaly detection algorithms on the collected data. This validation serves multiple purposes:

Establishing baseline performance metrics for future research.
Identifying which attack types are most challenging to detect.
Providing guidance for real-world PV system monitoring implementations.

5.2.1. Methodology

We implemented three unsupervised anomaly detection algorithms commonly used in cybersecurity applications:

One-Class Support Vector Machine (OC-SVM): OC-SVM learns a decision boundary that encapsulates the normal data distribution by solving the optimization problem:

$\min_{w, ξ, ρ} \frac{1}{2} {∥ w ∥}^{2} + \frac{1}{ν n} \sum_{i = 1}^{n} ξ_{i} - ρ$

(11)

which is subject to

$w^{T} ϕ (x_{i}) \geq ρ - ξ_{i}, ξ_{i} \geq 0$

(12)

where $ϕ (x)$ maps input data to a higher-dimensional space using a kernel function, $ν ∈ (0, 1]$ is an upper bound on the fraction of training samples treated as outliers, and $ρ$ is the offset that determines the decision boundary. The decision function is as follows:

$f (x) = \operatorname{sign} (w^{T} ϕ (x) - ρ)$

(13)
Isolation Forest: Isolation forest exploits the principle that anomalies are few and different, requiring fewer random splits to isolate them in feature space. The algorithm constructs t isolation trees, each built by recursively selecting random features and split values. The anomaly score for point x is as follows:

$s (x, n) = 2^{- \frac{E (h (x))}{c (n)}}$

(14)

where $E (h (x))$ is the average path length over all trees, and $c (n) = 2 H (n - 1) - \frac{2 (n - 1)}{n}$, with $H$ being the harmonic number. The algorithm’s time complexity is $O (t \cdot ψ \cdot \log ψ)$ for training and $O (t \cdot \log ψ)$ for prediction, where $ψ$ is the subsample size.
Local Outlier Factor (LOF): LOF measures the local density deviation of a data point with respect to its neighbors. The algorithm computes the following:
– K-Distance:

$d_{k} (A) = \text{distance from } A \text{ to its } k\text{-th nearest neighbor}$

– Reachability Distance:

$\text{reach-dist}_{k} (A, B) = \max \{ d_{k} (B), d (A, B) \}$

– Local Reachability Density:

$\text{lrd}_{k} (A) = \frac{| N_{k} (A) |}{\sum_{B \in N_{k} (A)} \text{reach-dist}_{k} (A, B)}$

– Local Outlier Factor:

$\text{LOF}_{k} (A) = \frac{\sum_{B \in N_{k} (A)} \frac{\text{lrd}_{k} (B)}{\text{lrd}_{k} (A)}}{| N_{k} (A) |}$
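The scoring formulas above can be sketched in a few lines of plain Python. This is an illustrative transcription, not the implementation used for the benchmark: the Isolation Forest part only evaluates the score of Equation (14) for a given mean path length (no trees are built), the LOF part takes exactly k neighbors and ignores distance ties, and the toy data are invented.

```python
import math


def c(n):
    """Path-length normaliser c(n) = 2 H(n-1) - 2 (n-1)/n, for n >= 2."""
    harmonic = sum(1.0 / i for i in range(1, n))  # harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def iforest_score(mean_path_length, n):
    """Anomaly score s(x, n) = 2 ** (-E[h(x)] / c(n)); values above 0.5 look anomalous."""
    return 2.0 ** (-mean_path_length / c(n))


def lof_scores(points, k):
    """Local Outlier Factor of every point, following the four definitions above."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance
    neighbours, k_dist = [], []
    for a in range(n):
        order = sorted((b for b in range(n) if b != a), key=lambda b: dist[a][b])
        neighbours.append(order[:k])
        k_dist.append(dist[a][order[k - 1]])

    def reach_dist(a, b):
        return max(k_dist[b], dist[a][b])

    # local reachability density: inverse of the mean reachability distance
    lrd = [len(neighbours[a]) / sum(reach_dist(a, b) for b in neighbours[a])
           for a in range(n)]

    # LOF: mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[b] / lrd[a] for b in neighbours[a]) / len(neighbours[a])
            for a in range(n)]


# Toy 2-D data: a tight cluster plus one isolated point (illustrative only).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = lof_scores(data, k=3)
```

In this toy example the clustered points obtain LOF values close to 1, while the isolated point obtains a much larger value; similarly, a mean path length equal to $c (n)$ yields an Isolation Forest score of exactly 0.5, and shorter paths push the score toward 1.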

Normal operation data (training.csv) was used to train all models, while each attack scenario was evaluated separately to assess detection performance for specific threat types.
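A hedged sketch of this train-on-normal, test-per-scenario protocol with scikit-learn follows. The data are synthetic stand-ins for training.csv and for one attack file, and every hyperparameter shown (`nu`, `n_estimators`, `n_neighbors`) is a placeholder rather than a setting reported for the benchmark.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-in for training.csv: 500 normal samples with 22 features (synthetic).
X_train = rng.normal(size=(500, 22))
# Stand-in for an attack file: 5 normal-looking rows and 5 grossly shifted rows.
X_test = np.vstack([rng.normal(size=(5, 22)), rng.normal(loc=10.0, size=(5, 22))])

# Scaling statistics are estimated on normal data only and reused for test data.
scaler = StandardScaler().fit(X_train)
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "OC-SVM": OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"),
    "Isolation Forest": IsolationForest(n_estimators=100, random_state=0),
    # novelty=True is required to call predict() on unseen data
    "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True),
}

predictions = {}
for name, model in models.items():
    model.fit(Z_train)
    # scikit-learn returns +1 for inliers and -1 for outliers
    predictions[name] = model.predict(Z_test)
```

Note that scikit-learn encodes normal samples as +1 and anomalies as -1, so the predictions must be mapped to the dataset's own labels before computing any metric.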

5.2.2. Performance Metrics

We employed standard binary classification metrics adapted for anomaly detection. Consistent with the confusion matrices reported in Tables 5–7, attack and fault samples are treated as the positive class and normal operation samples as the negative class.

Accuracy: Overall correctness of predictions across all samples.

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

(15)

Sensitivity: Fraction of actual attacks successfully detected.

$\text{Sensitivity} = \frac{TP}{TP + FN}$

(16)

Specificity: The model’s ability to accurately identify normal cases and avoid false alarms.

$\text{Specificity} = \frac{TN}{TN + FP}$

(17)

where $TP$ denotes true positives, $TN$ denotes true negatives, $FP$ denotes false positives, and $FN$ denotes false negatives.
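The three metrics can be reproduced directly from the confusion matrices in Tables 5–7. The helper below is a small sketch (the function name is ours) that takes attack samples as the positive class, as those confusion matrices imply, and returns `None` when a denominator is zero; this happens for sensitivity on Cloudy_Test, which contains no attack samples and is reported as 0% in the tables.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, and specificity; None where the denominator is zero."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return accuracy, sensitivity, specificity


# Isolation Forest on BDI_Q_increment (values from Table 6)
acc, sens, spec = metrics(tp=58, fp=0, fn=3, tn=29)

# Isolation Forest on Cloudy_Test (values from Table 6): no attack samples
cloudy = metrics(tp=0, fp=7284, fn=0, tn=35917)
```

With these inputs the helper gives 96.67% accuracy, 95.08% sensitivity, and 100% specificity for BDI_Q_increment, and 83.14% specificity for Cloudy_Test, matching the tabulated values.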

Performance metrics serve as baselines for future algorithm development, enabling an objective comparison of alternative approaches. The comprehensive evaluation protocol provides a standardized framework for reproducible research in this domain. This experimental framework establishes a proposed benchmark for anomaly detection in PV cybersecurity applications. The following section presents performance results across all algorithm–attack combinations, providing evidence-based guidance for practical deployment decisions.

5.3. Benchmark Results

The performance evaluation was conducted using the Photo-Set dataset, comprising normal photovoltaic system operations alongside various cyberattack and fault scenarios. The dataset includes 107,260 normal operational samples with 22 features, representing three days of typical PV system behavior under varying environmental conditions. Test scenarios encompass 12 distinct attack/fault types: Bad Data Injection (BDI) attacks targeting power setpoints, False Data Injection (FDI) attacks manipulating sensor measurements, firmware modification attacks, and physical system faults.

All algorithms were trained exclusively on normal operational data, with hyperparameters optimized for the characteristics of the photovoltaic system. All features were standardized using z-score normalization to ensure equal contribution from different measurement scales. The 22 electrical parameters detailed in Table 2 were used as input features.
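For reference, z-score normalization of a value $x_{j}$ of feature $X_{j}$ is conventionally written as

$z_{j} = \frac{x_{j} - μ_{j}}{σ_{j}}$

where $μ_{j}$ and $σ_{j}$ are the mean and standard deviation of that feature. We assume here that both statistics are estimated on the normal training data only and then reused unchanged for every test file; this is the usual practice for one-class training, but it is an assumption on our part for this illustration.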

The performance metrics of One-Class SVM are presented in Table 5. Similarly, for Isolation Forest and Local Outlier Factor, the metrics are shown in Table 6 and Table 7, respectively. Each attack scenario was evaluated independently to isolate algorithm performance for specific threat types. This approach enables attack-specific performance characterization and identifies algorithm strengths and limitations without cross-contamination between scenarios.

The evaluation revealed distinct performance characteristics among the three anomaly detection algorithms. Isolation Forest demonstrated effective detection capabilities across multiple attack scenarios, while One-Class SVM and Local Outlier Factor exhibited a systematic classification issue: in every attack and fault scenario they flagged all samples as anomalous, giving 100% sensitivity but 0% specificity (Table 5 and Table 7), which calls for further parameter optimization. These differences are also visible in Figure 17 and Figure 18. Isolation Forest achieved good performance on firmware-level attacks and reactive power manipulation scenarios, demonstrating its effectiveness for detecting structured anomalies in photovoltaic systems.

6. Discussion

The results suggest that detection failures stem from fundamental algorithmic limitations rather than solely from suboptimal hyperparameters. One-Class SVM and Local Outlier Factor showed poor performance across all scenarios, while Isolation Forest maintained its superiority but could not overcome the inherent challenges of detecting gradual, physics-compliant anomalies.

The Cloudy_Test results reveal a critical limitation in current anomaly detection approaches for PV systems. Isolation Forest demonstrated the highest specificity (83.14%) among the three algorithms, suggesting better discrimination of normal operational patterns, while Local Outlier Factor performed worst (3.68% specificity), flagging nearly all cloudy conditions as anomalous. This finding underscores the critical need for physics-informed approaches that can distinguish between legitimate environmental variations and actual cyberthreats, representing a fundamental limitation that must be addressed for real-world PV cybersecurity monitoring systems.

Three attack categories proved consistently undetectable using standard anomaly detection approaches:

Physical degradation faults that manifest as gradual efficiency losses indistinguishable from environmental variations;
Subtle power oscillations that fall within normal operational fluctuation ranges;
Attacks that maintain physical system constraints while manipulating individual parameters.

Sensitivity analysis from the Isolation Forest (Table 6) demonstrates that attack detection coverage varies significantly by attack type: harmonics tampering, P oscillation, and both reactive power attacks reach sensitivities above 90%, MPPT tampering reaches 55%, the short-circuited cells fault only 12.22%, and the three FDI attacks remain completely undetected. Hence, this work identifies fundamental algorithmic limitations and provides a roadmap for improvement, establishing that the raw application of classical anomaly detection methods is insufficient for high-dimensional cyberphysical systems without domain-specific preprocessing and parameter optimization. The persistent detection challenges point to promising research directions:

Physics-informed detection methods that incorporate domain-specific relationships and physical laws;
Hybrid approaches combining anomaly detection with specialized fault diagnosis techniques;
Time-series analysis methods optimized for gradual degradation patterns;
Ensemble approaches that integrate multiple detection paradigms for comprehensive coverage.

7. Conclusions

This study presents Photo-Set, a publicly available dataset designed to capture the behavior of photovoltaic (PV) systems under various cyberattack scenarios and realistic environmental conditions. Alongside the dataset, an openly shared Simulink model is provided to enable reproducibility and customization [6]. The experimental evaluation reveals significant performance disparities across algorithms. It also establishes a performance benchmark for anomaly detection in PV cybersecurity applications, demonstrating that while perfect detection remains elusive, strategic algorithm deployment can provide meaningful protection against critical threats. Utility-scale solar installations face unique cybersecurity challenges due to their critical role in grid stability and the high-value targets they represent. The dataset’s comprehensive coverage of coordinated attack scenarios directly addresses threats identified in operational deployments. Quantitative analysis could demonstrate the economic benefits of implementing Photo-Set-derived cybersecurity solutions. The clear performance hierarchy and attack-specific insights enable evidence-based security system design for operational PV installations.

Author Contributions

Conceptualization, G.B.G. and G.F.; methodology, G.B.G.; software, A.M.; investigation, A.M.; data curation, A.M.; writing—original draft, A.M.; writing—review and editing, G.B.G. and G.F.; supervision, M.M. and M.R.; project administration, M.M.; funding acquisition, M.M. and P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union-NextGenerationEU.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Paul, B.; Sarker, A.; Abhi, S.H.; Das, S.K.; Ali, M.F.; Islam, M.M.; Islam, M.R.; Moyeen, S.I.; Rahman Badal, M.F.; Ahamed, M.H.; et al. Potential smart grid vulnerabilities to cyber attacks: Current threats and existing mitigation strategies. Heliyon 2024, 10, e37980.
Tuttle, M.; Poshtan, M.; Taufik, T.; Callenes, J. Impact of cyber-attacks on power grids with distributed energy storage systems. In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 21–23 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
Linnartz, P.; Winkens, A.; Simon, S. A method for assessing the impact of cyber attacks manipulating distributed energy resources on stable power system operation. In Proceedings of the 2021 IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe), Espoo, Finland, 18–21 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
Mokarim, A.; Gaggero, G.B.; Marchese, M. Evaluation of the Impact of Cyber-Attacks Against Electric Vehicle Charging Stations in a Low Voltage Distribution Grid. In Proceedings of the 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Glasgow, UK, 6 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
Zografopoulos, I.; Hatziargyriou, N.D.; Konstantinou, C. Distributed energy resources cybersecurity outlook: Vulnerabilities, attacks, impacts, and mitigations. IEEE Syst. J. 2023, 17, 6695–6709.
Mokarim, A. Afroz Mokarim’s Datasets on Kaggle. Available online: https://www.kaggle.com/afrozmokarim/datasets (accessed on 27 July 2025).
Mokarim, A.; Gaggero, G.B.; Ferro, G.; Robba, M.; Marchese, M. Photo-Set: A Dataset for Physics-Based Cybersecurity Monitoring in Photovoltaic Systems. In Proceedings of the 2025 IFAC Workshop on Smart Energy Systems for Efficient and Sustainable Smart Grids and Smart Cities (SENSYS 2025), IFAC, Bari, Italy, 18–20 June 2025.
Giraldo, J.; Urbina, D.; Cardenas, A.; Valente, J.; Faisal, M.; Ruths, J.; Tippenhauer, N.O.; Sandberg, H.; Candell, R. A survey of physics-based attack detection in cyber-physical systems. ACM Comput. Surv. 2018, 51, 76.
Luo, Y.; Xiao, Y.; Cheng, L.; Peng, G.; Yao, D. Deep learning-based anomaly detection in cyber-physical systems: Progress and opportunities. ACM Comput. Surv. 2021, 54, 106.
Kosek, A.M. Contextual anomaly detection for cyber-physical security in smart grids based on an artificial neural network model. In Proceedings of the 2016 Joint Workshop on Cyber-Physical Security and Resilience in Smart Grids (CPSR-SG), Vienna, Austria, 12 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
Li, F.; Xie, R.; Yang, B.; Guo, L.; Ma, P.; Shi, J.; Ye, J.; Song, W. Detection and identification of cyber and physical attacks on distribution power grids with pvs: An online high-dimensional data-driven approach. IEEE J. Emerg. Sel. Top. Power Electron. 2019, 10, 1282–1291.
Shilay, D.M.; Lorey, K.G.; Weiz, T.; Lovetty, T.; Cheng, Y. Catching Anomalous Distributed Photovoltaics: An Edge-based Multi-modal Anomaly Detection. arXiv 2017, arXiv:1709.08830.
Zideh, M.J.; Chatterjee, P.; Srivastava, A.K. Physics-informed machine learning for data anomaly detection, classification, localization, and mitigation: A review, challenges, and path forward. IEEE Access 2023, 12, 4597–4617.
Urbina, D.I.; Giraldo, J.; Cardenas, A.A.; Valente, J.; Faisal, M.; Tippenhauer, N.O.; Ruths, J.; Candell, R.; Sandberg, H. Survey and New Directions for Physics-Based Attack Detection in Control Systems; US Department of Commerce, National Institute of Standards and Technology: Gaithersburg, MD, USA, 2016.
Gómez, Á.L.P.; Maimó, L.F.; Celdrán, A.H.; Clemente, F.J.G.; Sarmiento, C.C.; Masa, C.J.D.C.; Nistal, R.M. On the generation of anomaly detection datasets in industrial control systems. IEEE Access 2019, 7, 177460–177473.
Conti, M.; Donadel, D.; Turrin, F. A survey on industrial control system testbeds and datasets for security research. IEEE Commun. Surv. Tutor. 2021, 23, 2248–2294.
Faramondi, L.; Flammini, F.; Guarino, S.; Setola, R. A hardware-in-the-loop water distribution testbed dataset for cyber-physical security testing. IEEE Access 2021, 9, 122385–122396.
Biswas, P.P.; Tan, H.C.; Zhu, Q.; Li, Y.; Mashima, D.; Chen, B. A synthesized dataset for cybersecurity study of IEC 61850 based substation. In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 21–23 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–7.
Hassan, G.F.; Ahmed, O.A.; Sallal, M. Cyber Security in PV Farm: Threat Modelling and Dataset Generation. J. Adv. Res. Appl. Sci. Eng. Technol. 2025, 63, 67–79.
Chen, X.; Li, B.; Braid, J.L.; Byford, B.; Colvin, D.J.; Glaws, A.; Jost, N.; Pierce, B.; Rabade, S.; Springer, M.; et al. Open data sets for assessing photovoltaic system reliability. Appl. Energy 2025, 395, 126132.
Zhou, P.; Fang, H.; Wu, G. PDeT: A Progressive Deformable Transformer for Photovoltaic Panel Defect Segmentation. Sensors 2024, 24, 6908.
Wu, G.; Zhang, Y.; Deng, L.; Zhang, J.; Chai, T. Cross-Modal Learning for Anomaly Detection in Complex Industrial Process: Methodology and Benchmark. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 2632–2645.
Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater), Vienna, Austria, 11 April 2016; pp. 31–36.
Shin, H.K.; Lee, W.; Yun, J.H.; Kim, H. HAI 1.0: HIL-based Augmented ICS Security Dataset. In Proceedings of the 13th USENIX Workshop on Cyber Security Experimentation and Test (CSET 20), Boston, MA, USA, 12–14 August 2020; USENIX Association: Berkeley, CA, USA, 2020.
Sharafaldin, I.; Habibi Lashkari, A.; Ghorbani, A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy ICISSP, Madeira, Portugal, 22–24 January 2018; pp. 108–116.
The MathWorks, Inc. MATLAB, Version 9.13.0 (R2022b); The MathWorks, Inc.: Natick, MA, USA, 2022; Available online: https://www.mathworks.com/products/matlab.html (accessed on 26 September 2025).
Li, Y.; Yan, J. Cybersecurity of smart inverters in the smart grid: A survey. IEEE Trans. Power Electron. 2022, 38, 2364–2383.
IEEE Std 519-2022; IEEE Standard for Harmonic Control in Electric Power Systems. IEEE: New York, NY, USA, 2022.
Gaggero, G.B.; Mokarim, A.; Girdinio, P.; Marchese, M. Should We Include Cyberdefense Functionalities in Electrical Power System Protections?: A Proposed Approach. IEEE Ind. Electron. Mag. 2024, 19, 10–16.
Gaggero, G.B.; Rossi, M.; Girdinio, P.; Marchese, M. Detecting system fault/cyberattack within a photovoltaic system connected to the grid: A neural network-based solution. J. Sens. Actuator Netw. 2020, 9, 20.

Figure 1. Complete Simulink model of the photovoltaic system.

Figure 2. Summer day with high peak irradiance.

Figure 3. Spring day with moderate irradiance and occasional cloud variations.

Figure 4. Winter day with lower peak irradiance.

Figure 5. P reduction.

Figure 6. Q increment.

Figure 7. P oscillation.

Figure 8. Q oscillation.

Figure 9. P tampering.

Figure 10. Tampering of the panel temperature.

Figure 11. Irradiance tampering.

Figure 12. Tampering of harmonics.

Figure 13. Effects of tampering the MPPT.

Figure 14. Short circuiting of cells in the PV panel.

Figure 15. Fault: dust on panels.

Figure 16. Realistic cloudy day profile.

Figure 17. Accuracy comparison of all three algorithms.

Figure 18. Overall performance summary.

Table 1. Comparison of existing cybersecurity datasets for distributed control systems.

Dataset	Year	System Type	Attack Types	Physical Parameters
Water Distribution Testbed [17]	2021	Water Systems	Network attacks	Pressure, flow
ICS Substation Dataset [18]	2019	Electrical Substation	Protocol attacks	Voltage, current
SWaT Dataset [23]	2016	Water Treatment	Network intrusion	Process variables
HAI Dataset [24]	2020	Industrial Control	Multiple attack types	Generic sensors
CICIDS-2017 [25]	2017	Network Systems	Network attacks	Network traffic
PV Farm Dataset [19]	2025	Photovoltaic Systems	Attacks	22 PV-specific parameters
Photo-Set [6]	2025	Photovoltaic Systems	Cyber-physical attacks	22 PV-specific parameters

Table 2. Features of the dataset. Reprinted from [7].

Feature	Symbol	Description
$X_{1}$	$Irr$	Solar irradiance hitting the panel
$X_{2}$	$T_{air}$	Temperature of the air
$X_{3}$	$T_{PV}$	Temperature of the solar panel
$X_{4}$	$V_{cells}$	Voltage measured at the terminals
$X_{5}$	$I_{cells}$	Current emitted by cells array
$X_{6}$	$V_{dc}$	Average voltage in the DC link
$X_{7}$	$V_{a}$	Voltage of phase a (AC side)
$X_{8}$	$V_{b}$	Voltage of phase b (AC side)
$X_{9}$	$V_{c}$	Voltage of phase c (AC side)
$X_{10}$	$I_{a}$	Current of phase a
$X_{11}$	$I_{b}$	Current of phase b
$X_{12}$	$I_{c}$	Current of phase c
$X_{13}$	$f_{a}$	Frequency of phase a
$X_{14}$	$f_{b}$	Frequency of phase b
$X_{15}$	$f_{c}$	Frequency of phase c
$X_{16}$	$THD_{a}$	Total harmonic distortion of voltage on phase a
$X_{17}$	$THD_{b}$	Total harmonic distortion of voltage on phase b
$X_{18}$	$THD_{c}$	Total harmonic distortion of voltage on phase c
$X_{19}$	$P$	Active power emitted by the inverter
$X_{20}$	$P_{set}$	Last active power setpoint sent by the SCADA controller
$X_{21}$	$Q$	Reactive power emitted by the inverter
$X_{22}$	$Q_{set}$	Last reactive power setpoint sent by the SCADA controller

Table 3. Taxonomy of the attacks on DERs.

Attack	Description	Physical Parameters Involved	Effects
BDI	Attacker sends malicious commands to the actuators by manipulating the communication channel	Power on/off; Active and/or reactive power; Islanding mode	Economic damage; Overload/excess of generation; Oscillations; False protection trip
FDI	Attacker sends fake measures to the higher-level controller by manipulating the communication channel	All measures sent to the controller	Make the controller make bad decisions
Malware/FM	Attacker can modify the firmware by physical attacks or exploiting other web services	Different parameters, including the following: Voltages; Frequencies; Waveform; …	Technical damages to the grid, including the following: Voltage/frequency constraint violations; False protection trip; Hiding real faults in protection; Damage to other devices; …

Table 4. Summary of .csv dataset files. Reprinted from [7].

File Name	Dimension	Description
training.csv	107,260 × 22	Normal behavior (3 days)
BDI_P_reduction.csv	90 × 23	Bad data injection: P reduced without reason
BDI_Q_increment.csv	90 × 23	Bad data injection: Q becomes negative (inductive power)
BDP_P_oscillation.csv	90 × 23	Bad data injection: P oscillates
BDP_Q_oscillation.csv	90 × 23	Bad data injection: Q oscillates
FDI_P.csv	120 × 23	False data injection: P tampering
FDI_T_panel.csv	120 × 23	False data injection: $T_{P V}$ tampering
FDI_Irr.csv	120 × 23	False data injection: irradiance tampering
Firmware_THD.csv	120 × 23	Firmware modifications: harmonics tampering
Firmware_MPPT_modification.csv	120 × 23	Firmware modifications: MPPT tampering
Fault_ShortCircuitedCells.csv	120 × 23	Fault: short-circuited cells
Fault_Dust.csv	120 × 23	Fault: dust on the panels
Cloudy_Test.csv	43,201 × 23	Normal operation under realistic cloudy conditions

Table 5. One-Class SVM results.

Dataset	Accuracy	Sensitivity	Specificity	Confusion Matrix
BDI_P_reduction	66.67%	100%	0%	TP = 60	FP = 30
				FN = 0	TN = 0
BDP_P_oscillation	66.67%	100%	0%	TP = 60	FP = 30
				FN = 0	TN = 0
BDI_Q_increment	67.78%	100%	0%	TP = 61	FP = 29
				FN = 0	TN = 0
BDP_Q_oscillation	66.67%	100%	0%	TP = 60	FP = 30
				FN = 0	TN = 0
FDI_P	75%	100%	0%	TP = 90	FP = 30
				FN = 0	TN = 0
FDI_T_panel	75%	100%	0%	TP = 90	FP = 30
				FN = 0	TN = 0
FDI_Irr	75%	100%	0%	TP = 90	FP = 30
				FN = 0	TN = 0
Firmware_THD	75%	100%	0%	TP = 90	FP = 30
				FN = 0	TN = 0
Firmware_MPPT_modification	66.67%	100%	0%	TP = 60	FP = 30
				FN = 0	TN = 0
Fault_ShortCircuitedCells	75%	100%	0%	TP = 90	FP = 30
				FN = 0	TN = 0
Fault_Dust	75%	100%	0%	TP = 90	FP = 30
				FN = 0	TN = 0
Cloudy_Test	49.42%	0%	49.42%	TP = 0	FP = 21,851
				FN = 0	TN = 21,350

Table 6. Isolation Forest results.

Dataset	Accuracy	Sensitivity	Specificity	TP	FP	FN	TN
BDI_P_reduction	57.78%	36.67%	100%	22	0	38	30
BDP_P_oscillation	100%	100%	100%	60	0	0	30
BDI_Q_increment	96.67%	95.08%	100%	58	0	3	29
BDP_Q_oscillation	95.56%	93.33%	100%	56	0	4	30
FDI_P	25%	0%	100%	0	0	90	30
FDI_T_panel	25%	0%	100%	0	0	90	30
FDI_Irr	25%	0%	100%	0	0	90	30
Firmware_THD	100%	100%	100%	90	0	0	30
Firmware_MPPT_modification	70%	55%	100%	33	0	27	30
Fault_ShortCircuitedCells	9.17%	12.22%	0%	11	30	79	0
Cloudy_Test	83.14%	0%	83.14%	0	7284	0	35,917

Table 7. Local Outlier Factor results.

Dataset	Accuracy	Sensitivity	Specificity	TP	FP	FN	TN
BDI_P_reduction	66.67%	100%	0%	60	30	0	0
BDP_P_oscillation	66.67%	100%	0%	60	30	0	0
BDI_Q_increment	67.78%	100%	0%	61	29	0	0
BDP_Q_oscillation	66.67%	100%	0%	60	30	0	0
FDI_P	75%	100%	0%	90	30	0	0
FDI_T_panel	75%	100%	0%	90	30	0	0
FDI_Irr	75%	100%	0%	90	30	0	0
Firmware_THD	75%	100%	0%	90	30	0	0
Firmware_MPPT_modification	66.67%	100%	0%	60	30	0	0
Fault_ShortCircuitedCells	75%	100%	0%	90	30	0	0
Fault_Dust	75%	100%	0%	90	30	0	0
Cloudy_Test	3.68%	0%	3.68%	0	41,612	0	1589
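
Because Cloudy_Test contains only normal operation, every sample flagged in it is a false alarm, and the false-alarm rate FP/(FP + TN) is the complement of the reported specificity. The sketch below compares the three detectors on that basis, using the Cloudy_Test counts reported in Tables 5 to 7. The dictionary and function names are illustrative and not part of the dataset or the authors' code.

```python
# Cloudy_Test counts (FP, TN) as reported in Tables 5, 6 and 7
cloudy = {
    "One-Class SVM": (21851, 21350),
    "Isolation Forest": (7284, 35917),
    "Local Outlier Factor": (41612, 1589),
}

def false_alarm_rate(fp, tn):
    """Percentage of normal samples wrongly flagged as anomalous."""
    return round(100.0 * fp / (fp + tn), 2)

rates = {name: false_alarm_rate(fp, tn) for name, (fp, tn) in cloudy.items()}

# List the detectors from fewest to most false alarms
for name, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate}% false alarms")
```

On these counts, Isolation Forest raises the fewest false alarms under cloudy conditions and Local Outlier Factor the most.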

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
