1. Introduction
The global transition toward renewable energy sources has accelerated in recent years, driven by the urgent need to mitigate climate change and enhance energy security [1,2]. Among various renewable options, solar power has emerged as a particularly attractive choice due to its declining costs, scalability, and widespread resource availability [3]. Nevertheless, the integration of solar power into conventional grids presents significant challenges related to its inherent variability and intermittency. Fluctuations in solar irradiance due to weather conditions, seasonal patterns, and diurnal cycles can introduce substantial uncertainty into power generation [4,5].
Accurate forecasting of solar power output is crucial for minimizing the economic and technical issues associated with this variability. When forecasts are inaccurate, energy providers often face imbalance costs as they must rely on expensive backup power or balancing services to meet contractual obligations [6]. This challenge becomes more pressing as the share of solar power in the energy mix grows, increasing the grid’s vulnerability to forecast errors [7,8].
Despite widespread smart meter deployment, operational PV data remain fragmented across balance groups: balance responsible parties (BRPs) do not routinely access meter-level production from other BRPs’ portfolios due to confidentiality, competition, and governance constraints embedded in current market roles and data-access regimes [9,10,11,12,13]. Public EU data services, such as the ENTSO-E Transparency Platform, focus on aggregated market information rather than the cross-party, meter-level operational streams needed for localized spatio-temporal PV nowcasting [14].
A promising solution to address data fragmentation and improve solar forecasting accuracy lies in decentralized physical infrastructure networks (DePINs), which offer a fundamentally different, user-driven approach to decentralized data collection [15]. Rather than relying on top-down systemic changes or limited regional data sources, a DePIN empowers individual photovoltaic (PV) owners (prosumers) to voluntarily share localized production data via tokenized incentive mechanisms and blockchain-based platforms [16,17,18,19,20,21]. This creates a bottom-up, privacy-preserving data commons that bypasses traditional institutional barriers [20,22,23], generating cross-balance-group operational streams needed for accurate nowcasting while maintaining commercial confidentiality. By fostering a dense, geographically distributed array of solar nodes, the DePIN provides granular insights into fluctuating weather conditions and real-time nowcasting, thereby enhancing prediction models with advanced machine learning integration [5,24]. This facilitates direct participation in peer-to-peer data and energy markets, with automated mechanisms incentivizing contributors, unlocking novel economic opportunities in the solar sector, and aligning with evolving EU data-sharing frameworks [9,10].
Moreover, although decentralized energy systems have garnered attention for facilitating peer-to-peer trading and grid flexibility, the existing literature often examines market design, transaction mechanisms, or blockchain frameworks without systematically quantifying the direct relationship between network density, forecast accuracy, and economic returns [20]. Consequently, there is a pressing need for an integrated study that links these three components (network density, solar forecasting accuracy, and financial viability) within a decentralized physical infrastructure network (DePIN) setting.
This study complements three major strands of existing research. First, most prior work in spatial PV forecasting has relied on satellite-derived irradiance fields, reanalysis datasets, or mesoscale numerical weather prediction models. Studies using dense networks of physical PV installations are far less common, and analyses quantifying how forecast accuracy scales with the number of proximate sensors are particularly limited. By using real production measurements from closely spaced PV systems in a single, well-instrumented region, spatial correlation and network-density effects were examined in this study at a micro-climatological scale that is rarely addressed in the previous literature. Second, while sensor-density research in temperature, precipitation, and air-quality monitoring consistently shows diminishing returns beyond several kilometers, PV forecasting is governed by much shorter decorrelation lengths due to small-scale cloud dynamics. Our results indicate a higher density threshold (approximately 10–15 neighbors) for saturation, aligning with findings from high-resolution cloud-motion studies but applied here to operational PV systems rather than atmospheric proxies. Third, our contribution is empirical and operational relative to the DePIN literature, which has predominantly focused on token design, governance structures, incentive mechanisms, and interoperability. We do not introduce new blockchain mechanisms; instead, we evaluate how decentralized data availability influences forecasting accuracy and imbalance-cost proxies. Thus, the work positions the DePIN primarily as a data-access layer that enables improved spatial forecasting, rather than as a conceptual advance in decentralized infrastructure design itself.
We provide three key advancements:
Network density vs. forecast accuracy: An empirical, machine-learning-based analysis that quantifies how forecast performance scales with the number of neighboring PV installations in a single, well-instrumented region.
Cost-saving analysis: A simplified imbalance-cost proxy that maps observed MAE reductions into monetary terms using real portfolio energy volumes and a representative Dutch imbalance price, providing an interpretable link between forecasting improvements and expected financial impact for a BRP.
DePIN-oriented value allocation: A marginal-benefit framework that illustrates how the resulting imbalance-cost reductions could, in principle, fund data-sharing incentives in DePIN-like architectures, without specifying a concrete tokenomics or governance design.
By bridging the technical and economic dimensions of solar forecasting, our study provides valuable insights into how dense, decentralized sensor networks can facilitate more accurate prediction, reduce operational costs, and create new revenue streams in emerging energy markets.
The remainder of this paper is organized as follows:
Section 2 reviews the relevant literature on solar forecasting methods, the DePIN paradigm, and the economic implications of accurate predictions.
Section 3 outlines our dataset, forecasting methodology, and the economic model for DePIN profitability.
Section 4 presents the results, highlighting improvements in forecast accuracy and associated cost savings.
Section 5 discusses key findings and practical implications, followed by Section 6, which concludes with a summary of contributions and future research directions.
4. Results
This section presents the findings from our experiments, focusing on the impact of participation levels in the DePIN on forecasting accuracy and the resulting economic benefits. We evaluate performance across the three forecasting levels outlined in the methodology: Level 1 (clear-sky baseline with no data sharing), Level 2 (solo forecasting using historical data from individual PV systems), and Level 3 (networked forecasting incorporating data from neighboring PV systems). Forecasting accuracy is assessed using the Mean Absolute Error (MAE), as defined in the methodology, to quantify prediction performance. Economic benefits are evaluated through total imbalance costs ($C$), approximated as $C \approx \bar{p} \cdot \mathrm{MAE} \cdot E$, where $\bar{p}$ is the average imbalance price and $E$ is the total energy volume, enabling direct comparisons of cost savings ($\Delta C$) across levels. Several forecasting methods are compared, namely, a clear-sky model (CS), baseline averaging (AVG) model, Support Vector Regression (SVR), Random Forests (RFs), Extreme Gradient Boosting (XGB), and a Multilayer Perceptron (MLP), to provide a comprehensive assessment.
4.1. Forecasting Accuracy
4.1.1. Forecasting Accuracy at Level 1
At Level 1, no data is shared, and forecasting relies solely on a clear-sky model, which estimates PV production based on astronomical parameters without historical or real-time data. Specifically, we employ the Haurwitz clear-sky Global Horizontal Irradiance (GHI) model, which calculates solar irradiance using only astronomical inputs such as solar zenith angle [73].
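As a rough illustration of a zenith-only clear-sky baseline, the sketch below evaluates the Haurwitz GHI expression. The coefficients 1098 W/m² and 0.059 are the values commonly quoted for this model (some references give 0.057 instead); they, the function name, and the clipping to zero when the sun is below the horizon are assumptions rather than details taken from this paper.

```python
import math

def haurwitz_ghi(zenith_deg, scale=1098.0, attenuation=0.059):
    """Clear-sky GHI (W/m^2) from the solar zenith angle in degrees.

    scale and attenuation are commonly quoted Haurwitz coefficients;
    they are assumptions here and should be checked against reference [73].
    """
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        # Sun at or below the horizon: no clear-sky irradiance.
        return 0.0
    return scale * cos_z * math.exp(-attenuation / cos_z)
```

Converting the resulting GHI into a PV power estimate requires further steps (for example, scaling by system capacity) that are not covered by this sketch.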
The performance metrics for this level were calculated across the 47 PV installations in the dataset, comparing the model’s predictions against actual power output. The distribution statistics of these metrics are summarized in Table 2. The clear-sky model exhibits relatively high errors, with a mean MAE of 0.0817 ± 0.0045, a mean Root Mean Square Error (RMSE) of 0.1488 ± 0.0075, and a mean Coefficient of Determination ($R^2$) of 0.6313 ± 0.0407. These values highlight the limitations of non-data-driven approaches, particularly in accounting for cloud cover, atmospheric conditions, and other localized variabilities that affect solar output.
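The three metrics reported here can be computed per installation as in the minimal sketch below. It uses the standard textbook definitions and assumes the actual and predicted series are equal-length sequences of normalized power values; the function name is illustrative.

```python
import math

def forecast_metrics(actual, predicted):
    """Return (MAE, RMSE, R^2) for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined (division by zero) if `actual` is constant.
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

Averaging each metric over the 47 installations, and taking the spread across them, would then give summary statistics of the kind reported in the table.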
4.1.2. Forecasting Accuracy at Level 2
At Level 2, forecasting relies solely on historical data from the individual PV installation, without incorporating spatial information from neighboring systems. This represents a solo forecasting approach that leverages temporal patterns from the system’s own production history.
We compared the performance of several forecasting methods in this configuration: a baseline averaging model (AVG), which uses simple day-profile averaging based on historical data without machine learning, and advanced machine learning models, namely, Support Vector Regression (SVR), Random Forests (RFs), Extreme Gradient Boosting (XGB), and a Multilayer Perceptron (MLP). This evaluation quantifies the benefits of machine learning over the simple history-based AVG approach for solo forecasting.
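One plausible reading of the AVG baseline is a per-hour mean over past days, sketched below. The hourly resolution, the input format, and the fallback value of zero for unseen hours are assumptions made for illustration; the exact averaging scheme is defined in the methodology, not here.

```python
from collections import defaultdict

def day_profile_average(history):
    """Build a day profile from past observations.

    history: iterable of (hour_of_day, power) pairs.
    Returns a dict mapping hour_of_day to mean historical power.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, power in history:
        sums[hour] += power
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}

def avg_forecast(profile, hour):
    """Forecast for a given hour; 0.0 if that hour was never observed."""
    return profile.get(hour, 0.0)
```

Because the profile ignores the current day's weather, it can track the typical diurnal shape but not cloud-driven deviations from it.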
The results are summarized numerically in Table 3. Three common forecasting accuracy metrics were employed: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the Coefficient of Determination ($R^2$).
The results demonstrate that most Level 2 models, including the simple AVG baseline, achieve comparable or slightly better performance than the Level 1 clear-sky model, underscoring the value of incorporating historical data from individual PV systems. Notably, the machine learning models (except SVR) outperform the AVG baseline, highlighting the advantages of data-driven approaches in capturing complex temporal patterns over both the non-data-driven Level 1 baseline and the history-only AVG method. Among them, the Random Forests model achieved the lowest MAE (0.0676 ± 0.0026) and the highest $R^2$ (0.741 ± 0.014), while the MLP achieved the lowest RMSE (0.1213 ± 0.005), indicating that these models are well suited to capturing non-linear relationships in historical data. XGB also showed substantial improvements over AVG and Level 1, further validating the value of machine learning for solo forecasting. Interestingly, the AVG model achieved a lower MAE than SVR, whose MAE (0.1046 ± 0.0060) exceeds even the Level 1 value. This suggests that simple averaging can be effective for minimizing average absolute errors, though it underperforms in explaining variance and handling variability compared with the better machine learning methods and offers only marginal gains over Level 1.
4.1.3. Forecasting Accuracy at Level 3: Impact of Network Density
At Level 3, forecasting incorporates historical data from both the target PV installation and a varying number of neighboring installations, leveraging spatial correlations to enhance predictions. This networked approach builds directly on the solo machine learning models from Level 2 (equivalent to 0 neighbors), allowing us to quantify the incremental benefits of increasing network density.
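The sketch below shows one way the networked input could be assembled: pick the k geographically nearest installations and append their latest readings to the target's own. The feature layout, the use of great-circle distance, and all names are assumptions for illustration; the features actually used are described in the methodology.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_neighbors(target, sites, k):
    """IDs of the k sites closest to `target`; k = 0 gives the Level 2 case.

    sites: dict mapping site ID to a (lat, lon) tuple.
    """
    others = [s for s in sites if s != target]
    others.sort(key=lambda s: haversine_km(sites[target], sites[s]))
    return others[:k]

def build_features(target, sites, last_power, k):
    """Feature vector: the target's latest reading, then its neighbors'."""
    ids = [target] + nearest_neighbors(target, sites, k)
    return [last_power[i] for i in ids]
```

With this layout, sweeping k from 0 to the full network size reproduces the kind of density experiment described above, with k = 0 reducing to the solo configuration.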
Figure 3, Figure 4 and Figure 5 present the evolution of the forecasting accuracy metrics as the number of neighboring installations increases. The figures illustrate the distribution of forecasting results across PV units for each level of network density, with the first box (0 neighbors) representing the Level 2 solo performance for the machine learning models.
The results show that MAE decreases sharply when the first few neighboring installations are added, with most of the improvement occurring within the first 10–15 neighbors. Beyond this point, the curve flattens, indicating diminishing marginal forecasting benefits as spatial redundancy increases.
RMSE follows the same pattern as MAE: the accuracy improves quickly with initial increases in network density, and the gains saturate once roughly 10–15 neighbors are incorporated. This reinforces that short-term irradiance dynamics are primarily informed by geographically close systems.
The $R^2$ metric increases significantly as nearby neighbors are included, confirming that spatial information enhances the explanatory power of the model. The improvement tapers off with larger neighbor counts, consistent with the saturation behavior observed in MAE and RMSE.
Across all forecasting models (RF, XGB, SVR, and MLP), the results consistently demonstrate improvements in forecasting accuracy as additional neighboring installations are included, extending the gains observed at Level 2 over the non-data-driven Level 1 baseline. MAE and RMSE decrease with network density, while $R^2$ increases, indicating closer alignment between predicted and actual production compared with solo forecasting.
We applied a paired Wilcoxon signed-rank test to the Random Forests MAE values across all 47 PV installations to confirm that the observed improvement from Level 2 to Level 3 is statistically significant. The test indicated a highly significant reduction in error when using the 10 nearest neighbors. This shows that the gain from network-based forecasting is not due to random variation but is statistically robust.
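For readers unfamiliar with the test, the sketch below implements a one-sided paired Wilcoxon signed-rank test using the large-sample normal approximation. It is a simplified illustration (no tie or continuity correction, zero differences dropped) and is not the authors' code; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
import math

def wilcoxon_signed_rank(x, y):
    """One-sided paired test of whether x tends to exceed y.

    Returns (W_plus, p_value) from the normal approximation.
    Requires at least one non-zero paired difference.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank for tied magnitudes
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return w_plus, p_value
```

Here `x` would hold the per-installation Level 2 MAE values and `y` the matching Level 3 values, so a small p-value supports a reduction in error.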
Model comparisons reveal distinctive sensitivities to network density. To quantify these observations, we computed Spearman correlation coefficients between the number of neighboring PV installations and the three forecasting metrics. These coefficients, annotated in Figure 3, Figure 4 and Figure 5, confirm the visual trends:
XGB consistently achieves the strongest correlations (absolute values ), underlining its robustness to spatial data integration.
RF follows closely, with correlation values around 0.7–0.8, also indicating high sensitivity to network density.
SVR and the MLP show weaker coefficients (absolute values generally ), reflecting limited responsiveness to neighboring installations.
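The Spearman coefficient used for this comparison is the Pearson correlation of the rank-transformed series. A minimal sketch follows, with tied values sharing their average rank; the function names are illustrative, and a library routine such as `scipy.stats.spearmanr` would be the usual choice in practice.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A coefficient near -1 between neighbor count and MAE means the error falls almost monotonically as density grows, regardless of whether the decline is linear.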
In summary, the analysis establishes a robust link between forecasting accuracy and network density at Level 3, with clear thresholds (about ten neighbors), beyond which additional data provide diminishing returns. It highlights the superior adaptability of XGB and RF in networked settings, reinforces the critical role of spatial correlation in decentralized solar forecasting, and demonstrates substantial gains over the solo approaches at Levels 1 and 2.
4.2. Economic Impact
4.2.1. Imbalance Costs at Level 1
The imbalance price used in this study was determined based on market data from the Dutch electricity sector. Specifically, we adopted the average price for shortages in 2023, which was 198 EUR/MWh according to a Rabobank report [78]. This value serves as the representative average imbalance price ($\bar{p}$) in our simplified economic model, reflecting the costs incurred by Balance Responsible Parties (BRPs) for deviations in the balancing market. Given that our forecasting model provides predictions 1 h ahead, this aligns with intraday market dynamics, where imbalance costs often increase due to the urgency of short-term grid adjustments. This estimate helps quantify the financial implications of forecast errors, emphasizing the potential for cost reductions through improved accuracy in DePIN frameworks. At Level 1, where no data is shared and forecasting relies on the clear-sky model, the imbalance costs represent the baseline economic burden for the BRP. The economic burden of Level 1 participants is quantified through the imbalance costs they impose, as shown in Equation (9):
$$C_{L1} = \bar{p} \cdot \mathrm{MAE}_{L1} \cdot E \qquad (9)$$
where $C_{L1}$ represents the imbalance costs attributable to Level 1 forecasting inaccuracy, $\bar{p} = 198$ EUR/MWh is the average imbalance price, $\mathrm{MAE}_{L1}$ is the forecasting error using the clear-sky model, and $E = 100.0789$ MWh is the total energy volume for the portfolio of 47 PV systems over the 9-month period (January to September 2016, computed from the de-normalized energy production data totaling 100,078.90 kWh). This level highlights the financial inefficiencies associated with non-data-driven forecasting, where the full imbalance costs are borne by the BRP without any mitigating contributions from participants.
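Reading Equation (9) as the product of the average imbalance price, the MAE, and the total energy volume (a reading that reproduces the roughly EUR 1339 Level 2 figure reported in Section 4.2.3), the proxy can be evaluated as in the sketch below. The function and parameter names are illustrative; the default values are the price and energy volume quoted above.

```python
def imbalance_cost(mae, price_eur_per_mwh=198.0, energy_mwh=100.0789):
    """Imbalance-cost proxy in EUR: average price x MAE x total energy volume."""
    return price_eur_per_mwh * mae * energy_mwh

# Clear-sky (Level 1) mean MAE from Section 4.1.1.
level1_cost = imbalance_cost(0.0817)
```

Because the proxy is a plain product, any percentage change in MAE carries over one-to-one to the estimated cost.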
4.2.2. Imbalance Costs at Level 2
At Level 2, where forecasting incorporates historical data from individual PV systems without cross-sharing, the imbalance costs are reduced compared with Level 1 due to improved accuracy from local temporal patterns. We evaluated the economic burden using the Mean Absolute Error (MAE) from various models, as detailed in the methodology.
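The imbalance costs for each model are quantified using Equation (9), with $\bar{p} = 198$ EUR/MWh and $E = 100.0789$ MWh. The cost for the Random Forests (RFs) model, which achieves the lowest MAE and highest relative savings, is calculated as shown in Equation (12):
$$C_{L2}^{\mathrm{RF}} = \bar{p} \cdot \mathrm{MAE}_{L2}^{\mathrm{RF}} \cdot E = 198 \times 0.0676 \times 100.0789 \approx 1339.5\ \mathrm{EUR} \qquad (12)$$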
Similar calculations apply to the other models (Table 4). The 17.24% relative savings with the RF model represent the portfolio-wide economic benefit if the historical data from all PV systems, covering the full energy production of the portfolio, are shared within the DePIN, enabling these accuracy improvements across the entire network.
This level demonstrates the economic benefits of utilizing historical data within a DePIN framework, where participants contribute their own data to enable these accuracy gains, potentially receiving tokenized rewards from the resulting cost savings borne by the BRP. Note that some models, such as SVR, may yield higher costs than Level 1, highlighting the importance of selecting appropriate forecasting methods.
4.2.3. Imbalance Costs at Level 3
At Level 3, where forecasting incorporates historical data from both the target PV installation and neighboring installations, the economic benefits are substantially enhanced through the network effect. The relationship between network density and economic costs follows a similar pattern to the forecasting accuracy improvements observed in Figure 3, Figure 4 and Figure 5.
Figure 6 illustrates the reduction in imbalance costs as the network density increases, using the Random Forests model as representative. The economic costs show a clear decreasing trend with the increase in the network density, mirroring the improvements in forecasting accuracy. The most significant cost reductions occur within the first 10–15 neighboring installations, with diminishing returns beyond this threshold.
To quantify these improvements, the imbalance costs for key network densities using the Random Forests model are reported in Table 5. The Level 2 configuration (0 neighbors) results in an imbalance cost of EUR 1339, whereas incorporating only five neighbors reduces this to EUR 1002. Increasing to ten neighbors yields EUR 953, while the full-network scenario (46 neighbors) achieves the lowest cost of EUR 884.
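The relative savings implied by these four figures can be checked with a few lines of arithmetic. The sketch below uses only the rounded costs quoted in this paragraph, so its results may differ in the last digit from values computed with unrounded inputs.

```python
# Imbalance costs (EUR) by number of neighbors, as quoted in the text.
costs_by_neighbors = {0: 1339, 5: 1002, 10: 953, 46: 884}

def relative_saving(baseline, cost):
    """Fractional cost reduction relative to a baseline cost."""
    return (baseline - cost) / baseline

savings = {k: relative_saving(costs_by_neighbors[0], c)
           for k, c in costs_by_neighbors.items()}
```

Most of the total saving is already reached with five neighbors, which matches the saturation pattern seen in the accuracy metrics.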
4.2.4. Sensitivity to Imbalance Price Variability
We performed a sensitivity analysis by varying the representative imbalance price by
around the Dutch 2023 average (198 EUR/MWh) to evaluate the robustness of the economic conclusions with respect to real-world imbalance price volatility.
Table 6 reports the resulting imbalance costs for key network densities.
Because the imbalance-cost model is linear in the imbalance price (Equation (9)), absolute costs scale proportionally with the price level. Importantly, however, the relative savings achieved with networked forecasting remain unchanged. For example, the cost reduction from Level 2 to Level 3 with full-network access remains approximately 34% across all three price scenarios, and the reduction from Level 2 to Level 3 with ten neighbors remains 28.9%. This demonstrates that the economic benefits of network-based forecasting are robust to realistic fluctuations in imbalance pricing.
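The price invariance follows directly from the multiplicative form of the cost proxy. Writing the Level 2 and Level 3 costs as the product of an average price $\bar{p}$, the respective MAE, and the energy volume $E$ (the symbols are chosen here for illustration), the price and energy terms cancel in the relative saving:

```latex
\frac{C_{L2} - C_{L3}}{C_{L2}}
  = \frac{\bar{p}\,E\,\bigl(\mathrm{MAE}_{L2} - \mathrm{MAE}_{L3}\bigr)}
         {\bar{p}\,E\,\mathrm{MAE}_{L2}}
  = 1 - \frac{\mathrm{MAE}_{L3}}{\mathrm{MAE}_{L2}}
```

The relative saving therefore depends only on the ratio of the two MAE values. This would no longer hold under asymmetric or time-varying imbalance prices, where the price cannot be factored out.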
4.2.5. Marginal Benefits and Spatial Analysis
The analysis evaluates the marginal benefit (MB), defined as the reduction in total imbalance cost per additional participating PV system.
Figure 7 illustrates the relationship between marginal benefit and the number of participants, revealing a strong trend of diminishing marginal returns. As the collaborative forecasting pool scales, total costs decrease from EUR 1340 (with 0 participants) to EUR 885 (with 46 participants), demonstrating significant improvements in cost efficiency through network expansion.
The marginal benefit analysis shows that the initial participants provide the most substantial value, with the first additional system yielding EUR 155.28 in cost savings. Beyond approximately 20 participants, the marginal benefit stabilizes at low levels, generally remaining below EUR 3 per additional system.
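Under the definition above, the marginal benefit of the n-th participant is the difference between the total cost with n-1 and with n participants. The sketch below illustrates the calculation; the cost curve in it is a made-up placeholder chosen only to show diminishing returns and is not data from this study.

```python
def marginal_benefits(costs):
    """Marginal benefit of each additional participant.

    costs[n] is the total imbalance cost (EUR) with n participants.
    Returns a list whose entry n-1 is costs[n-1] - costs[n].
    """
    return [costs[n - 1] - costs[n] for n in range(1, len(costs))]

# Placeholder cost curve for illustration only, NOT values from the study.
example_costs = [1000.0, 900.0, 850.0, 830.0, 825.0]
example_mb = marginal_benefits(example_costs)
```

Summing the marginal benefits recovers the total cost reduction between the first and last entries of the curve.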
Table 7 and Figure 8 detail how system interconnections evolve as the geographical radius increases, revealing three distinct phases of network development.
During the rapid local clustering phase (0–7.5 km), connectivity grows steeply, with a 483% increase in the mean number of neighbors from 2.5 to 7.5 km, forming dense local clusters with strong spatial correlation. The transition to regional network phase (7.5–15 km) shows continued growth at a reduced rate (77% increase), integrating local clusters into a regional network. Finally, the saturation and completion phase (15–30 km) exhibits markedly slower growth (48% increase over 15 km), approaching full network integration.
Intermediate radii (5–15 km) show a median exceeding the mean, indicating above-average connectivity for most systems. Large radii (≥20 km) exhibit convergence of mean and median values, indicating uniform connectivity as the network approaches completeness. The standard deviation peaks at the 15 km radius, where connectivity variability is maximized, and then decreases as the network becomes more uniform.
5. Discussion
This study provides the first integrated analysis of how network density in decentralized physical infrastructure networks (DePINs) affects both solar forecasting accuracy and economic viability. The following discussion interprets the results presented above, synthesizing the key mechanisms behind the observed performance patterns and highlighting their broader implications.
5.1. Technical Implications for Solar Forecasting
Across all machine learning models, the forecasting accuracy improved consistently with the inclusion of neighboring PV installations, but these gains followed a clear saturation pattern. The strongest improvements occurred within the first 10–15 neighbors, after which additional installations contributed increasingly redundant information. This behavior reflects the underlying spatial correlation structure: nearby PV systems experience highly similar irradiance dynamics due to shared cloud movements and micro-local weather patterns, while installations beyond roughly 10–15 km (corresponding to typical cloud field scales in the Utrecht region) exhibit markedly weaker correlations. Consequently, the diminishing returns observed in the MAE, RMSE, and $R^2$ metrics directly stem from the reduced informational value of more distant nodes.
The model-specific behavior further elucidates the mechanisms of networked forecasting. Random Forests and XGBoost achieved the largest accuracy gains, confirming their capacity to exploit fine-grained non-linear relationships embedded in spatially distributed data. In contrast, the SVR and MLP displayed flatter improvements and higher variability across installations, suggesting weaker sensitivity to spatial features and greater dependence on hyperparameters. The strong negative Spearman correlations between the network density and both the MAE and RMSE, and the corresponding positive correlations with $R^2$, reinforce that increased density systematically enhances predictive power up to the saturation threshold.
Overall, the technical findings indicate that DePIN-based forecasting benefits substantially from local clustering, with approximately 10 neighbors providing a practical upper bound for meaningful improvements in short-term solar predictions.
5.2. Economic Viability and Market Transformation
The economic analysis showed that network-driven gains in accuracy translate directly into substantial reductions in imbalance costs. The relative cost savings remained stable across different imbalance price scenarios because the economic model is linear in price; thus, improvements in MAE retain their proportional value under both high- and low-price conditions. This robustness strengthens the practical relevance of the observed cost reductions.
Interpreting the cost curve reveals the same saturation behavior visible in the forecasting metrics. The steepest decrease in imbalance costs occurs when moving from zero to approximately ten neighbors, consistent with the region of highest spatial correlation. Beyond this range, marginal cost reductions taper off sharply. The marginal benefit analysis further illustrates this phenomenon: the first few additional neighbors offer large reductions in imbalance costs, but the marginal value rapidly declines until stabilizing at very low levels once the network becomes fully interconnected.
These diminishing marginal returns have important implications for DePIN economics. Incentive structures should not reward unlimited expansion of data connectivity but rather prioritize reaching an optimal local density. High rewards are justified for early contributors who help establish the first few meaningful data links, while subsequent connections should receive smaller rewards to avoid over-incentivizing redundant data.
5.3. DePIN as a Solution to Data Fragmentation
Beyond the forecasting performance, the results underscore the role of the DePIN as a mechanism for overcoming entrenched data fragmentation in electricity markets. Traditional BRP frameworks prevent cross-portfolio access to operational PV measurements due to confidentiality and governance constraints. The DePIN enables prosumers to voluntarily and securely share data with privacy-preserving incentives, thereby unlocking spatial correlations that conventional market structures cannot access.
The two-layer benefit structure—improvements from solo historical data (Level 2) and additional improvements from spatial data (Level 3)—shows that DePINs can be deployed incrementally. Participants receive immediate value from sharing their own data, with additional value unlocked as local clusters densify. This gradual adoption pathway is particularly relevant for real-world deployment, where network density develops unevenly over time.
5.4. Practical Implementation Considerations
The spatial analysis revealed three distinct phases in the growth of neighbor connectivity: rapid local clustering within roughly 7.5 km, a transition phase up to 15 km, and a saturation phase beyond 15 km where nearly all systems become interconnected. These phases closely match the marginal benefit behavior observed in the economic analysis, suggesting that spatial topology directly drives the forecasting value.
The alignment between geographical distance and forecasting benefit suggests that DePIN deployment strategies should aim to achieve dense local clusters before attempting broad regional coverage. Urban and suburban environments—where distances between installations are naturally small—represent ideal early deployment regions. Nevertheless, the strong performance of the forecasting models even with only 47 installations demonstrates that meaningful value can be created in medium-density regions as well.
Regulatory considerations further support DePIN implementation. The emerging EU Data Act and interoperability frameworks emphasize user consent, privacy preservation, and data portability, all of which align with the DePIN architecture. As a result, decentralized data sharing through tokenized participation can operate within existing regulatory constraints while enhancing grid resilience.
5.5. Limitations and Research Boundaries
Several limitations should be considered when interpreting these results. First, the analysis is restricted to a single geographical region (Utrecht), whose relatively homogeneous climate and dense urban layout may not reflect conditions in other areas. Spatial correlation patterns, cloud dynamics, and irradiance variability can differ substantially across climates. In addition, the study focuses exclusively on residential-scale PV systems; utility-scale installations may exhibit different spatial and temporal behavior, and the extent to which our results generalize to those settings remains uncertain.
Second, the forecasting horizon examined here is limited to one hour ahead, which is relevant for intraday market decisions but represents only one segment of the forecasting landscape. Very short-term horizons (minutes ahead) or longer-term forecasts (day-ahead) are governed by different physical processes and may exhibit different relationships between network density and accuracy. As a result, the density–accuracy patterns identified in this study should not be assumed to hold uniformly across other horizons.
Finally, the economic model adopts a simplified imbalance pricing scheme based on a single representative average price. While this abstraction helps isolate the effect of forecasting accuracy, real-world settlement systems often include asymmetric pricing, time-dependent penalties, and ancillary services that can amplify or diminish the financial impact of forecast errors. Moreover, the 198 EUR/MWh reference value reflects conditions in the Dutch market during 2023 and may differ substantially in other regulatory or market contexts. Consequently, the economic results should be interpreted as illustrative rather than universally transferable.
5.6. Broader Implications for Energy Transition
The collective results point to broader implications for the future of renewable energy systems. By enabling privacy-preserving, incentive-aligned data sharing, DePINs effectively transform distributed PV assets into a collaborative sensing network. This bottom-up architecture complements traditional centralized grid monitoring and forecasting, offering resilience, redundancy, and cost-effective scalability.
The demonstrated success of ensemble machine learning models highlights the growing importance of AI techniques in managing distributed energy resources. As renewable penetration increases, accurate short-term forecasting will play a critical role in mitigating volatility and reducing infrastructure stress.
Finally, the principles demonstrated here—network-density-driven accuracy improvements, diminishing marginal returns, and decentralized incentive structures—extend naturally to other distributed resources, such as wind turbines, batteries, electric vehicles, and demand-responsive loads. DePIN architectures thus offer a foundational framework for the next generation of decentralized, user-centric, and economically sustainable energy systems.
6. Conclusions
This study presents a systematic analysis of how decentralized data sharing in decentralized physical infrastructure networks (DePINs) can improve both the technical and economic performance of solar power forecasting. Using real-world data from 47 PV systems in Utrecht, we developed a hierarchical forecasting framework that isolates the contributions of historical data (Level 2) and spatial neighbor information (Level 3). The results demonstrate that increasing network density substantially improves forecasting accuracy, with the most pronounced gains occurring when approximately 10–15 neighboring installations are included. These technical improvements translate directly into economic benefits: networked forecasting reduces imbalance costs by up to 45% relative to the non-data-driven Level 1 baseline and by 34% compared with Level 2 solo forecasting. These reductions reflect the strong spatial correlations in PV production within 5–10 km radii, where local cloud dynamics dominate short-term irradiance variability.
The main contributions of this work are threefold. First, we provide an empirical characterization of the relationship between the network density and forecasting accuracy, identifying a clear diminishing-returns pattern beyond 10–15 neighbors. Second, we quantify how the resulting accuracy gains reduce imbalance costs, including a sensitivity analysis demonstrating that the relative financial benefits persist across realistic price-volatility scenarios. Third, we show that DePIN-style data sharing could help overcome operational data fragmentation by enabling privacy-preserving access to distributed PV measurements.
While these findings highlight the potential of DePIN-enabled forecasting, they should not be interpreted as universally transferable. The analysis focuses on a single region, a one-hour forecasting horizon, and residential-scale PV systems. Future work should evaluate the density–accuracy relationship across diverse climates, geographies, and system scales and investigate how advanced spatio-temporal models (e.g., graph neural networks) interact with network density. In addition, real-world DePIN deployments will require careful design of token-based incentives and privacy safeguards, which remain open research directions.
Overall, our results suggest that decentralized data-sharing architectures hold promise for improving short-term forecasting and reducing imbalance costs. Rather than definitive claims of scalability, these findings should be viewed as a foundation for future exploration of DePIN-based forecasting in larger and more heterogeneous energy systems.