1. Introduction
In recent years, the proliferation of smart agricultural equipment has significantly heightened the need for diverse energy sources in rural areas [1], encompassing greenhouse heating, refrigeration houses and the heating and cooling of residential buildings [2]. This substantial surge in energy demand has precipitated a series of safety concerns within agricultural microgrids [3,4]. For instance, prolonged overloading of the generating units in Vietnam’s coal power plants has led to the malfunction of certain units, resulting in broiler deaths on farms and inadequate farmland irrigation, and hence in significant financial losses for farmers. In a village in Henan Province, China, a surge in summer electricity consumption, driven by high power requirements and extensive irrigated farmland, overloaded the grid and caused frequent local power outages, which ultimately paralyzed the town’s power supply. Evidently, the intricate and diverse energy demands within agricultural microgrids pose a formidable challenge for power scheduling [5]. Furthermore, the frailty of grid structures in rural areas, coupled with high line resistance, has led to increased transmission losses and diminished voltage [6,7]. Adding distributed energy resource (DER) generation at random locations may lead to voltage excursions [8]. Consequently, a microgrid planning method that can enhance the security of microgrids and optimize the utilization of DERs is worth further exploration.
Numerous scholars have studied the supply, planning and optimal scheduling of electrical energy in agricultural microgrids. Ref. [9] proposes a hierarchical distributed model predictive control framework based on the alternating direction method of multipliers, which aims to provide appropriate conditions for the growth of greenhouse crops and to limit the total amount of electricity exchanged with the main grid. Ref. [10] takes the PV greenhouse rural energy system as its research object, establishing an agrometeorological model and an energy meteorological model, with a focus on the actual situation of rural energy systems in northern China. A regulation flexibility assessment method and a greenhouse load optimization aggregation strategy for modern agricultural microgrids are proposed in [11], which can mitigate grid pressure by flexibly regulating greenhouse load demand. Ref. [12] presents a two-stage stochastic operation method for the multi-energy microgrid, which allows for the optimal scheduling of energy generation, conversion and storage devices under constraints. This approach can handle various uncertainties in renewable energy generation, electricity price and load demand while controlling the indoor temperature to ensure thermal comfort for the customers. Planning methods for integrated energy systems have also been proposed in several articles [13,14]. Ref. [15] puts forth a novel multilevel extension planning framework for active distribution networks.
For loads such as greenhouses and refrigeration houses, which require precise regulation of the internal temperature, traditional methods relying solely on the physical model of the building may yield inaccurate results, which can compromise the overall planning scheme. Therefore, employing a data mechanism fusion-driven method to train and modify the model indicators is essential for obtaining more accurate and reliable results. Ref. [16] introduces a novel scheme based on the integration of the essential components, in which artificial neural networks further enhance the performance of the prediction and optimization components. Ref. [17] introduces an innovative centralized control scheme designed for a smart network of greenhouses integrated with a microgrid, thus constituting a smart small-scale power grid within the framework of smart grid technology. Some studies have compared the effectiveness of various types of neural network algorithms in greenhouse temperature prediction, and the results show that the generalized regression neural network (GRNN) [18] algorithm has a greater advantage in terms of accuracy and generalization ability [19].
According to the above-mentioned articles, current studies on planning for agricultural microgrids focus on how to rationally dispatch various types of renewable energy generation [20], improve the efficiency of integrated energy use and ensure the reliability of power supply [21,22,23]. However, planning for large-scale DER integration that considers the energy demands of multiple buildings while taking grid security into account is currently rare. In addition, since agricultural microgrids are mostly self-built and lack unified planning, further uncoordinated expansion of this fragile grid will lead to escalating security threats. While the current study primarily focuses on the integration of distributed energy resources (DERs) and multi-type building loads to enhance voltage stability and energy utilization efficiency, the inclusion of carbon cycle dynamics could provide a more comprehensive environmental assessment, which constitutes a valuable direction for future research. Similarly, although the proposed data mechanism fusion method relies on accurate data acquisition and processing, the current model does not explicitly incorporate robustness mechanisms against communication failures or data interruptions. Such considerations, particularly those pertaining to the resilience of control and communication networks, are critical for real-world implementation and will be addressed in subsequent work to further strengthen the operational security of agricultural microgrids.
Therefore, a unified and integrated planning approach is needed to address the above issues. This paper proposes a data mechanism fusion-driven microgrid planning method framework that takes into account enhancing the security of microgrids and optimizing the utilization of DERs. The major contributions of this paper can be summarized as follows:
- (1) A site selection and capacity determination planning methodology for microgrids is proposed, taking into account the characteristics of user demands. This planning method can satisfy the intricate constraints of various types of buildings.
- (2) Refined load models for greenhouses, refrigeration houses and residential buildings are developed, taking into account the effect of external temperature changes on the internal temperature of buildings. A data mechanism fusion-driven temperature field prediction method for greenhouses is proposed, which ensures the basic requirements for the growth of plants inside.
- (3) By harnessing the regulatory capabilities of diverse building loads and energy storage systems, voltage excursions caused by DER generation in microgrids are mitigated.
The rest of this paper is arranged as follows: Section 2 shows the structural framework. Section 3 describes the comprehensive agricultural microgrid model that incorporates intricate constraints of various types of buildings. Section 4 presents the data mechanism fusion-driven planning methodology. Case studies are presented in Section 5. Section 6 concludes this paper.
4. Temperature Field Prediction Driven by Data Mechanism Fusion
Developing accurate internal temperature predictions, particularly in greenhouses, requires more than relying solely on heat transfer models. Therefore, refinement of model parameters is essential through the utilization of extensive datasets and neural network training. This approach ensures enhanced accuracy in temperature forecasting, thereby aligning agricultural microgrid planning results more closely with practical requirements. From the previous literature review, the GRNN algorithm has a greater advantage in terms of accuracy and generalization ability.
4.1. The Concept of Data Mechanism Fusion
In this study, data-mechanism fusion is defined as a methodology that integrates data-driven algorithms with physical mechanism-based models. Its core lies in utilizing data-driven algorithms (such as GRNN) and clustering algorithms (such as k-means) to learn from historical data, cluster scenarios, and correct key parameters in the mechanism model that are difficult to precisely determine. This process results in a hybrid model that combines physical interpretability with high predictive accuracy. This approach overcomes the limitations of pure mechanism-based models, such as insufficient accuracy, as well as the drawbacks of pure data-driven models, such as poor interpretability and weak extrapolation capability. Additionally, it addresses the issue of an excessive number of typical scenarios. This methodology is particularly suitable for applications such as microgrid planning, where high model reliability and interpretability are critical.
4.2. GRNN Algorithm
To implement the data-driven component of our proposed data-mechanism fusion approach, we employ the Generalized Regression Neural Network (GRNN). GRNN is a radial basis function neural network algorithm based on a non-parametric kernel regression approach [18]. The algorithm uses non-parametric density estimation to establish the relationship between independent and dependent variables in training samples. It then calculates the regression value of the dependent variable based on the independent variable. Unlike traditional neural network algorithms, the GRNN algorithm does not require the structure of the network to be defined in advance, as only the smoothing factor needs to be set. Due to its strong nonlinear mapping capability and high fault tolerance, the algorithm is suitable for analyzing the correlation between environmental variables and internal temperature points within a building, such as a greenhouse. The theoretical foundation of GRNN resides in nonlinear kernel regression analysis, wherein the regression relationship between the dependent variable $y$ and the independent variable $x$ is not delineated by a prescriptive formula, but rather inferred from an estimate of their probability density function. This approach facilitates the determination of the value of $y$ with maximum probability density under the given conditions of the independent variable.
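In this approach, the joint probability density function of the variables $x$ and $y$ is defined as $f(x, y)$, and the regression of the dependent variable on a given value $X$ of the independent variable is calculated, i.e., the conditional mean $\hat{Y}$ of this probability density function:

$$\hat{Y} = E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy} \qquad (43)$$

The probability density function $\hat{f}(X, y)$ can be obtained from the training samples by nonparametric estimation:

$$\hat{f}(X, y) = \frac{1}{n\,(2\pi)^{(p+1)/2}\,\delta^{\,p+1}} \sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\delta^{2}}\right] \exp\left[-\frac{(y - Y_i)^{2}}{2\delta^{2}}\right]$$

where $n$ is the number of samples, and $p$ is the dimension of the independent variable. $X_i$ and $Y_i$ are the sample observations of the random variables $x$ and $y$. $\delta$ is the smoothing factor.
Replacing $f(x, y)$ with the probability density estimate $\hat{f}(X, y)$ in (43), the output of the neural network is obtained by simplifying the integration operation shown below:

$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left[-\dfrac{(X - X_i)^{T}(X - X_i)}{2\delta^{2}}\right]}{\sum_{i=1}^{n} \exp\left[-\dfrac{(X - X_i)^{T}(X - X_i)}{2\delta^{2}}\right]}$$

where $\hat{Y}(X)$ is the neural network output value.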
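The simplification from the conditional mean to the closed-form network output can be made explicit. The following is a brief sketch, assuming the Gaussian-kernel density estimate that is standard for GRNN [18], with $X_i$ and $Y_i$ the training samples and $\delta$ the smoothing factor; the shorthand $K_i(X)$ is introduced here only for illustration and is not notation from the paper.

```latex
% Shorthand for the input-space kernel (illustrative notation)
K_i(X) = \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\delta^{2}}\right]

% The two Gaussian integrals over y that appear in the numerator and denominator
\int_{-\infty}^{\infty} \exp\left[-\frac{(y - Y_i)^{2}}{2\delta^{2}}\right] dy = \sqrt{2\pi}\,\delta,
\qquad
\int_{-\infty}^{\infty} y \exp\left[-\frac{(y - Y_i)^{2}}{2\delta^{2}}\right] dy = \sqrt{2\pi}\,\delta\, Y_i

% The common factors cancel, leaving a kernel-weighted average of the Y_i
\hat{Y}(X)
= \frac{\sum_{i=1}^{n} K_i(X)\, \sqrt{2\pi}\,\delta\, Y_i}{\sum_{i=1}^{n} K_i(X)\, \sqrt{2\pi}\,\delta}
= \frac{\sum_{i=1}^{n} Y_i\, K_i(X)}{\sum_{i=1}^{n} K_i(X)}
```

The normalization constant of the density estimate also cancels between numerator and denominator, which is why only the smoothing factor remains as a tunable parameter.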
4.3. GRNN Structure
GRNN comprises four distinct layers: the input layer, pattern layer, summation layer and output layer [18]. The structure is shown in Figure 2.
The input layer is the independent variable set in the study. The main external factors affecting the internal temperature of the greenhouse are illumination intensity, external temperature and time series, so the number of neurons in the input layer is three.
The pattern layer is a hidden regression layer trained on the samples; its outputs are obtained from the input layer data through a Gaussian transfer function. The number of neurons in the pattern layer is equal to the number of learning samples $n$, with each neuron corresponding to one sample. The transfer function for information transfer between neurons is as follows:

$$P_i = \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\delta^{2}}\right], \quad i = 1, 2, \ldots, n$$

where $X$ is the input variable of the neural network; $X_i$ is the learning sample of the $i$-th neuron; and $\delta$ is the smoothing factor.
The summation layer uses two different types of neurons to weight and sum the neuron data in the pattern layer. The weighting methods of the summation layer are divided into direct summation and weighted summation. Direct summation adds up the pattern layer outputs $P_i$ with a weight of 1:

$$S_D = \sum_{i=1}^{n} P_i$$

where $S_D$ is the output value of direct summing.
Weighted summation is the process of weighting and summing the pattern layer data by taking $Y_i$, the elements of the output sample $Y$, as weights:

$$S_N = \sum_{i=1}^{n} Y_i P_i$$

where $S_N$ is the weighted sum output value.
The result of the final operation is passed to the output layer. Then the output layer divides the two results of the summation layer to obtain the estimation result $\hat{Y}$:

$$\hat{Y} = \frac{S_N}{S_D}$$
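The four-layer structure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function name `grnn_predict`, the sample values and the assumption that the three inputs (illumination intensity, external temperature, time) have already been scaled to [0, 1] are all hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, delta):
    """Estimate the output at query point x from the training samples."""
    # Pattern layer: one Gaussian neuron per training sample.
    pattern = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * delta ** 2))
        for xi in train_x
    ]
    # Summation layer: direct sum (weights of 1) and sum weighted by Y_i.
    s_d = sum(pattern)
    s_n = sum(yi * p for yi, p in zip(train_y, pattern))
    # Output layer: ratio of the two sums. If delta is tiny and x is far from
    # every sample, s_d underflows to zero and this division fails.
    return s_n / s_d


# Hypothetical, pre-scaled samples: (illumination, external temperature, time)
train_x = [(0.0, 0.1, 0.29), (0.8, 0.6, 0.5), (0.2, 0.4, 0.71)]
train_y = [16.0, 27.0, 21.0]  # hypothetical internal temperatures in °C

estimate = grnn_predict((0.5, 0.5, 0.5), train_x, train_y, delta=0.3)
```

A small smoothing factor makes the estimate follow the nearest sample closely, while a large one pulls it toward the mean of all samples, so in practice the factor would be chosen by validation on held-out data.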
4.4. Parameter Modification of Building Model Based on GRNN
To assess the reliability of the prediction results more effectively, it is necessary to conduct a comprehensive fitting of the greenhouse temperature field following the acquisition of the predicted data. This involves the fitting of a two-dimensional temperature field incorporating known temperature values at specific sampling points, with particular emphasis on ensuring the continuity of the temperature field. The cubic spline interpolation method is well-suited for achieving such a fit. The temperature at any given point along the cross-section of the temperature sampling site can be derived through cubic spline interpolation fitting methodology, from which the greenhouse temperature field can be visualized by combining the relevant functions within MATLAB R2023a software.
Modifying the parameters of the building model based on GRNN presents a viable avenue for mitigating the shortcomings of traditional greenhouse models in precisely forecasting indoor temperatures. This lays the groundwork for the data mechanism fusion-driven microgrid planning method.
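To illustrate the interpolation step, the sketch below fits a natural cubic spline through temperatures at a few sampling points along a single line of the cross-section. It is a one-dimensional illustration under stated assumptions: the study itself uses MATLAB, the boundary condition (natural, i.e., zero second derivative at both ends) is an assumption, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two points.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Right-hand side of the tridiagonal system for the second-order coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward elimination.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution for the per-interval polynomial coefficients.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the interval containing x (clamped to the first/last interval).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate


# Hypothetical sampling heights (m) and predicted temperatures (°C).
heights = [0.0, 1.0, 2.0, 3.0, 4.5]
temps = [18.0, 20.5, 22.0, 23.0, 23.5]
temperature_at = natural_cubic_spline(heights, temps)
crop_level_temp = temperature_at(0.5)
```

A two-dimensional field can be built from such one-dimensional fits by interpolating along one axis and then the other; the spline passes exactly through the sampled values and has continuous first and second derivatives, which is the continuity property the fitting step relies on.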
4.5. Clustering of Typical Scenarios Based on k-Means
To address the variability and uncertainty of renewable energy generation, this study employs the k-means clustering method to extract typical operational scenarios from historical data. The k-means algorithm partitions the data into k clusters by minimizing the within-cluster variance, with the centroid of each cluster representing a typical scenario. This approach ensures robustness in planning while maintaining computational efficiency.
The clustering objective function is as follows:

$$J = \sum_{k=1}^{K} \sum_{x \in C_k} \left\| x - \mu_k \right\|^{2}$$

where $K$ denotes the number of clusters; $C_k$ represents the $k$-th cluster; $x$ is the data point in feature space; $\mu_k$ denotes the centroid of cluster $C_k$.
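The alternating assignment and update steps that minimize the within-cluster sum of squares can be sketched as follows. This is a minimal illustration, not the configuration used in this study: the deterministic farthest-first initialization is an assumption (the initialization is not specified here), the function names are hypothetical, and the six (wind speed, irradiance) points are invented for the example. Features are left unscaled for brevity; in practice they would normally be standardized first.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    # Assumed initialization: start from the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Within-cluster sum of squared distances (the clustering objective).
    objective = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, objective


# Hypothetical (wind speed in m/s, irradiance in W/m²) observations.
points = [(8.0, 840.0), (8.4, 860.0), (3.0, 170.0), (3.2, 190.0), (5.5, 440.0), (5.9, 460.0)]
centroids, labels, objective = kmeans(points, k=3)
```

Each centroid returned here plays the role of one typical scenario, and the share of points carrying each label gives the corresponding scenario probability.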
The silhouette coefficient serves as an evaluation metric for assessing the performance of k-means clustering. By combining both cohesion (intra-cluster similarity) and separation (inter-cluster dissimilarity), it can be used to compare different clustering algorithms or different parameter settings on the same dataset. The magnitude of the silhouette value reflects how well each sample fits within its assigned cluster. A higher silhouette value indicates stronger cluster membership. The formula for calculating the silhouette coefficient is as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

where $a(i)$ represents the standardized distance between the $i$-th data point and other points within the same cluster, and $b(i)$ denotes the standardized distance between the $i$-th point and points in the nearest neighboring cluster. The silhouette value $s(i)$ lies in the range [−1, 1]. A higher $s(i)$ indicates better clustering quality. When $s(i) < 0$, it suggests the point is likely assigned to an incorrect cluster.
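The silhouette calculation can be sketched in Python as below. This is an illustrative version under stated assumptions: it uses the mean Euclidean distance on features assumed to be standardized beforehand, assigns a value of 0 to single-point clusters by convention, and requires at least two clusters; the function name and the sample points are hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value of every point for a given cluster labelling."""
    values = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            values.append(0.0)  # convention for a cluster with a single point
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values


# Hypothetical standardized points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])
mean_silhouette = sum(good) / len(good)
```

Averaging the per-point values over the dataset gives the score that is compared across candidate cluster counts.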
The scenario selection methodology employs k-means clustering (k = 3) to identify representative operational scenarios from historical renewable generation data, with the optimal cluster count determined by maximizing the average silhouette coefficient (S = 0.62 ± 0.08). The clustering objective function yields three distinct scenario centroids: (1) High Renewable Output Scenario ($\mu_1$ = [8.2 m/s wind, 850 W/m² irradiance]), characterizing conditions with abundant resources where generation exceeds 120% of nominal capacity; (2) Low Renewable Output Scenario ($\mu_2$ = [3.1 m/s, 180 W/m²]), representing critical deficit periods below 40% capacity; and (3) Fluctuating Output Scenario ($\mu_3$ = [5.7 ± 2.3 m/s, 450 ± 210 W/m²]), capturing high-variability states with coefficient of variation CV > 0.35. These scenarios are integrated into a chance-constrained optimization framework through probability weights derived from cluster populations, ensuring robust microgrid design while reducing the computational complexity relative to using the full set of historical observations.
5. Case Studies
The case study is based on a city in northern China. The area is under the influence of high westerly circulation for most of the year, with an average wind speed of 2.16 m/s (force 2). The maximum wind speed of 17 m/s (force 8) occurs on average 9 times per year, with a maximum of 25 times in a single year. The local average monthly temperatures for 2023–2024 are shown in Figure 3. The average winter temperature is −7 °C to 3 °C, with good sunshine and many sunny days.
The crop grown in the greenhouses of this agricultural microgrid is tomatoes, which are planted in September and October and harvested from November to June. Tomatoes grow best at temperatures ranging from 15 °C to 30 °C, with a temperature range of 20 °C to 25 °C being optimal. The greenhouse uses LED fill-in lights with adjustable power, assuming the same light intensity per square meter inside the greenhouse.
The microgrid contains only refrigeration houses and no freezers. Each cold room stores 5000 kg of apples, which must be kept at a temperature of no less than 0 °C and no more than 2 °C. Every 0.25 kg of apples generates 2 J of respiratory heat per hour.
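As a quick check of scale, the respiratory heat load of one cold room follows directly from these figures. The following is a back-of-envelope sketch that takes the stated values at face value; the symbol $Q_{\mathrm{resp}}$ is introduced here only for illustration.

```latex
Q_{\mathrm{resp}}
= \frac{5000\ \mathrm{kg}}{0.25\ \mathrm{kg}} \times 2\ \mathrm{J/h}
= 20\,000 \times 2\ \mathrm{J/h}
= 40\,000\ \mathrm{J/h}
\approx 11.1\ \mathrm{W}
```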
In this paper, an IEEE 33-bus system model is used, which is shown in Figure 4. There are 10 buildings of three different types in the system, including 4 greenhouses, 4 refrigeration houses and 2 residential houses. There are 11 wind turbines (WTs), 11 photovoltaic units (PVs) and 14 battery energy storage systems (BESSs) as candidates. The above planning model is solved in MATLAB using the YALMIP toolbox and the CPLEX solver.
The outputs of the WTs and PVs under typical day scenarios in winter are shown in Figure 5.
The planning results of the microgrid are shown in Table 1. √ means the equipment is planned at that node, while × means that the equipment is not planned. Wind turbines are planned at 11 locations and photovoltaic units at 6 locations, so wind power is the mainstay of the electricity supply that meets the low-carbon energy needs of agriculture. Energy storage units are installed at 10 locations to utilize their regulatory capabilities.
To meet the heat energy demands of different buildings in the agricultural microgrid, four groups of EBs and heat storage units are included. They convert excess power into heat and store it in the buildings, which improves energy utilization and mitigates voltage excursions caused by DER generation.
To demonstrate the economic advantages of the proposed data mechanism fusion-driven planning method, a traditional benchmark scheme is established for comparison. In this benchmark, all building loads (including greenhouses, refrigeration houses, and residences) are treated as fixed, non-adjustable loads, neglecting their inherent thermal inertia and regulatory potential. The planning model for the benchmark case only considers the installation of additional energy storage (BESS) and conversion devices (EB, HS) to mitigate voltage excursions caused by DERs, without leveraging the flexibility of building loads.
The average daily investment and operating cost is shown in Table 2. The percentage of investment in each component is shown in Figure 6.
The node voltages of the microgrid for a typical day scenario in winter before and after optimization are shown in Figure 7. The buildings in Figure 7a are not adjustable, whereas the buildings in Figure 7b can be adjusted upward and downward. Comparing Figure 7a,b shows that the midday voltage in Figure 7b is significantly lower, i.e., the red part is reduced. This indicates that without flexible building consumption, the voltage is lifted by the increase in DER generation. Analyzing Figure 7b in conjunction with the planning results in Table 1, it can be seen that node 18 has only a WT in operation, with low morning loads and relatively high node voltages. In contrast, since nodes 22, 24 and 25 have both wind turbines and photovoltaic units in operation, the high output of the photovoltaic units from 10:00 to 17:00 makes the voltage in the middle of the day significantly higher than in other periods. However, the presence of buildings reduces these voltages to some extent. Especially at nodes 22 and 25, where greenhouses, refrigeration houses and residential buildings have diverse energy demands, the microgrid consumes energy through the different buildings, thus mitigating voltage excursions. The voltage fluctuation rate comparison is listed in Appendix A.
When considering greenhouse temperatures, only the heat field conditions near the bottom crop need to be controlled. Therefore, the model indicators need to be trained and refined by data mechanism fusion.
Temperature data from critical points along the greenhouse cross-section are gathered under standard conditions. MATLAB software is employed to model the comprehensive temperature distribution across the entire cross-sectional area. The solar greenhouse’s structural specifications are as follows: length 50 m × width (inner span) 10 m, ridge height 4.5 m, oriented north–south, with a hot-dip galvanized double-arch welded frame, front roof angle to the ground of 62°, daylighting angle of 15° and rear roof tilt angle of 38°. The rear and hill-facing walls feature a composite structure comprising a 37 cm thick solid red brick shale layer combined with a 5 cm thick phenolic board, ensuring structural integrity and insulation efficiency.
Under this planning scenario, greenhouse and refrigeration house temperatures are controlled within the required limits. The electricity/cooling/heating requirements of the residential houses are also met.
Figure 8a, Figure 8b, Figure 8c and Figure 8d show the predicted temperature fields of the greenhouse at 7:00, 12:00, 17:00, and 22:00, respectively. Combined with the prediction of the greenhouse temperature field distribution, the microgrid planning is optimized to enhance the security of microgrids and optimize the utilization of DERs. The optimization results indicate that the planning scheme remains unaltered.
To quantitatively evaluate the superiority of the proposed data mechanism fusion model over the traditional physical model, we compared their prediction performance against actual measurement data collected over 30 days. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were adopted as evaluation metrics, as defined in Equations (54) and (55):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( T_i - \hat{T}_i \right)^{2}} \qquad (54)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| T_i - \hat{T}_i \right| \qquad (55)$$

where $T_i$ is the actual measured temperature, $\hat{T}_i$ is the predicted temperature, and $N$ is the number of data points.
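Both metrics are straightforward to compute from paired measured and predicted temperatures. The sketch below is a minimal illustration; the function names and the three sample values are hypothetical and are not data from this study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between measured and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Hypothetical measured and predicted greenhouse temperatures in °C.
measured = [20.0, 22.0, 24.0]
predicted = [21.0, 22.0, 22.0]
error_rmse = rmse(measured, predicted)
error_mae = mae(measured, predicted)
```

Because RMSE squares the residuals before averaging, it penalizes occasional large deviations more heavily than MAE, so reporting both gives a fuller picture of the prediction error.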
The results, summarized in Table 3, clearly indicate that our proposed model significantly outperforms the traditional physical model, with errors reduced by approximately 74%. This high prediction accuracy is the foundation for the reliability of the subsequent microgrid planning results.