Configuration Optimization of Temperature–Humidity Sensors Based on Weighted Hilbert–Schmidt Independence Criterion in Chinese Solar Greenhouses

Abstract: For cost-sensitive Chinese solar greenhouses (CSGs) with an uneven spatial distribution of temperature and humidity, there is a lack of effective sensor configuration strategies that can reduce sensor usage while monitoring the microclimate precisely. A configuration strategy for integrated temperature–humidity sensors (THSs) based on the improved weighted Hilbert–Schmidt independence criterion (HSIC) is proposed in this paper. The data independence of the THSs at different sites was analyzed based on the improved HSIC, and the selection priority of the THSs was ranked based on the weighted independence of temperature and humidity. Then, according to different cost constraints and monitoring requirements, suitable THSs could be selected sequentially to constitute the monitoring solution. Compared with the original monitoring solution containing twenty-two THSs, the optimized solution used only four THSs (S6, S9 and H6, H5) under strict cost constraints, with a maximum RMSE of the temperature and relative humidity of 0.6 °C and 2.30%, as well as a maximum information gain rate (IGR) of 9.47% and 10.0%. If higher monitoring precision is required, the THS usage can be increased with a greater budget. The optimized solution with six THSs (S6, S9, S8 and H6, H5, H2) could further reduce the maximum RMSE of the temperature and relative humidity to 0.33 °C and 1.10% and the IGR to 6.9% and 8.7%. This indicates that the proposed strategy can use far fewer THSs to achieve accurate and comprehensive monitoring, which would provide efficient and low-cost solutions for CSG microclimate monitoring.


Introduction
As of 2022, Chinese solar greenhouses (CSGs) account for 29% of China's facility agriculture cultivation area. During cold winters, they are an important means of providing fresh vegetables and fruits in the Huang-Huai-Hai Plain and the northeast and northwest regions of China [1]. Proper temperature and relative humidity are two key microclimate factors for guaranteeing crop yield and quality in CSGs [2]. The accurate and comprehensive monitoring of temperature and humidity is a prerequisite for the precise regulation of the CSG microclimate and proper field management.
Nowadays, there are three main ways to monitor these factors: single-point, uniform multi-point, and physically topological-point monitoring. Benefitting from low cost and easy operation, the single-point method, with one offline thermo-hygrometer placed in the middle of the greenhouse, is widely used in practical production. However, it can only use single-point data to represent the mean temperature and relative humidity of the greenhouse and cannot reflect their spatial distribution. Recent research revealed that there are large spatial variations in temperature and humidity inside the greenhouse, which significantly influence crop physiological activity [3]. In the absence of rational microclimate regulation, the temperature and humidity heterogeneity could cause obvious differences in crop yield and quality between different regions of the greenhouse [4], and low local temperatures could cause irreversible crop damage or even death [5,6]. Therefore, in order to regulate the greenhouse microclimate purposefully, not only the mean value but also the spatial distribution of temperature and humidity should be monitored. To fully reflect the spatial information, uniform multi-point [7,8] and physically topological-point methods [9,10] were adopted in some studies. What these two methods have in common is that a large number of sensors, up to 900, were deployed. Thus, these two methods are suitable for cost-insensitive research scenarios but not for cost-sensitive practical greenhouses. It is essential to optimize the sensor configuration so that temperature and humidity can be monitored accurately and comprehensively with as few sensors as possible, which would meet the dual requirements of precise monitoring and low investment.
Many meaningful works on optimizing sensor configuration have been carried out in both facility agriculture scenarios and similar industrial scenarios. Based on the analysis of the temperature spatial distribution from a computational fluid dynamics (CFD) simulation, temperature sensors were suggested to be installed at the locations with the lowest gradient of temperature and airflow velocity [11,12]. With the objective of maximizing the monitoring coverage area, the deployments of temperature sensors and air pollutant sensors were optimized, respectively, which helped to fully exploit the monitoring capabilities of the sensor unit and reduced the sensor quantity [13–16]. With the rapid development of machine learning and artificial intelligence, the configurations of temperature sensors and CO2 sensors were optimized, respectively, in large stadiums and other buildings with the use of different algorithms [17–19]. Furthermore, information entropy theory is also a feasible and promising method to optimize sensor configuration and reduce sensor consumption [20].
Because integrated temperature-humidity sensors (THSs) are widely used in CSGs under strict cost constraints, the above-mentioned methods for single-point sensors are not applicable to THS configuration optimization. Therefore, to achieve the accurate and comprehensive monitoring of a CSG microclimate with a limited THS quantity, we propose a THS configuration strategy based on the improved weighted Hilbert-Schmidt independence criterion (HSIC). The independence of different sensors from different areas was evaluated based on the HSIC. Then, the selection priority of the THSs was determined according to the independence ranking. Furthermore, a dual threshold, calculated according to the RMSE and IGR, was proposed and used to determine the number of sensors deployed. Finally, the effectiveness of the proposed configuration strategy was verified with test data.

Introduction of the Test CSG
The implementation and validation of the proposed configuration strategy need temperature and humidity data, so a test CSG was used for data collection in this work. It is located at the South Campus of Shandong Agricultural University, Tai'an, Shandong Province, China (117.16° E, 36.16° N), where the outdoor temperature is about −15 to 10 °C and the indoor temperature is about 5 to 30 °C in winter. The CSG is planted with tomatoes and has an east-west length of 70 m, a north-south span of 10 m, a ridge height of 5 m, and a north wall height of 3.5 m. The walls are made of brick and concrete, the arches are made of steel, and the covering layer is polyvinyl chloride film, which is covered with an insulation blanket at night for heat preservation. The CSG has two vents, one at the top and one at the bottom; the lower vent is generally closed in winter, and only part of the upper vent is opened for short-term ventilation in the morning when there is sufficient sunlight. The experiment was carried out during a typical cold period in winter (9 December 2020 to 6 January 2021), and 28-day temperature and humidity data were collected for THS configuration optimization.

Temperature and Humidity Data Collection in the CSG
To collect temperature and humidity data, an IoT system was developed and deployed in the experimental CSG. The system included 22 THSs (DB171-10, Dalian North Measurement and Control; temperature accuracy: ±0.3 °C, range: −20 °C to 80 °C; relative humidity accuracy: 2%, range: 0–100%), 3 data transfer units (DTUs) (H7710D, Shenzhen Hongdian, Shenzhen, China), and a data monitoring platform, as shown in Figure 1a. At the same time, outdoor meteorological-environment-monitoring equipment was installed outside the CSG.
Through the preliminary analysis of the measured data, we found that the temperature and humidity gradient in the north-south vertical direction of the CSG is larger than that in the east-west horizontal direction. Therefore, inspired by the uniform grid configuration, the iso-gradient mode is used in the initial THS configuration. A north-south vertical plane and an east-west horizontal plane were then selected for sensor deployment. The irregular vertical plane is divided by 3 lines along the north-south direction and 6 lines along the vertical direction, giving 14 intersection points, each of which has a THS. With respect to the horizontal plane, considering the structural symmetry of the solar greenhouse and the small temperature and humidity gradient, 8 THSs were equally spaced along the east-west direction. Considering the average height of the crop canopy during the monitoring period, the mounting height of the THSs in the east-west direction was set at 1.2 m, as shown in Figure 1b. Temperature and humidity data were collected continuously every 5 min for 28 days × 24 h. After data cleaning and pre-processing, 7880 × 22 sets of temperature and humidity data were obtained.

Optimization Strategy for THS Configuration
The aim of the THS configuration optimization is to achieve high monitoring accuracy and information abundance of temperature and humidity while reducing the THS usage. Firstly, the weighted HSIC algorithm is proposed to evaluate the priority of all the THSs and to rank them. Secondly, the number of THSs required is determined according to the monitoring accuracy and information abundance requirements. Thirdly, the specific THSs are selected based on the ranking and the required number. Finally, the appropriate THS configurations are constructed to meet different cost constraints and monitoring accuracy requirements.

Monitoring Independence of the THSs
The CSG usually has a large interior space (2400 m³ for our test CSG) surrounded by several boundaries with different thermal characteristics. Meanwhile, all sub-areas of the CSG are interconnected and interact with each other. The temperature and humidity information from different sites therefore has a high spatial correlation. The resulting information redundancy among THSs provides potential for optimizing the THS configuration and reducing sensor usage. To handle the complex nonlinear characteristics of the CSG microclimate, we propose an improved weighted HSIC strategy to optimize the THS configuration.
The HSIC is a kernel-based independence measure with the advantages of fast convergence, robustness, and low sensitivity to outliers; it is simpler than other kernel-based independence measures and does not require regularization [21]. It can better reflect the nonlinear correlation between variables [8]. The basic principle is to compute the Hilbert-Schmidt cross-covariance operator on the reproducing kernel Hilbert space (RKHS); the independence criterion is obtained by empirically estimating the norm of this operator. The higher the calculated HSIC value, the lower the independence of the two variables, i.e., the greater their dependence [22]. The HSIC can therefore be used as a basic assessment of the independence of THSs at different sites in the CSG.
Suppose that X and Y are two sets of observable variables with x ∈ X and y ∈ Y; f(x) and g(y) are two functions that take elements of X and Y as inputs, respectively, and map them onto the set of real numbers R; and p(x, y) is the joint probability density function of x and y. Then, for any such f(x) and g(y), the covariance operator C[f, g] is defined as in Equation (1). If p(x, y) = p(x)p(y), then C[f, g] = 0, and x and y are independent of each other. Conversely, if p(x, y) ≠ p(x)p(y), then C[f, g] ≠ 0, and x and y are correlated. The loss function, defined in Equation (2), reflects the degree of mutual independence between the two variables x and y. To ensure the traversal of f(x) and g(y), all possible f(x) compose the vector space F and all possible g(y) compose the vector space G, where F and G belong to the RKHS; the nonlinear mappings φ(x) ∈ F and ψ(y) ∈ G are defined for x ∈ X and y ∈ Y, and the corresponding kernel functions are denoted K_X and K_Y, respectively [23].
Here, φ_1, φ_2, ... are all the eigenfunctions of the kernel function K_X, which form a set of orthogonal bases of the space F, and α_1, α_2, ... are all the eigenvalues of K_X, which are positive. Similarly, ψ_1, ψ_2, ... are all the eigenfunctions of the kernel function K_Y, which form a set of orthogonal bases of the space G, and β_1, β_2, ... are all the eigenvalues of K_Y, which are positive. Therefore, f(x) and g(y) can be expressed as linear combinations of the orthogonal bases, and the loss function in Equation (2) can be expressed in terms of these bases. Since the eigenvalues are all positive, the loss function is modified by using the eigenvalues as weights, as in Equation (6). Combining Equation (6) with the eigen-expansion yields Equation (7). Substituting the kernel functions into Equation (7), the independence measure coefficient HSIC between the variables X and Y is defined as in Equation (8). Equation (8) indicates that the correlation between X and Y can be calculated using the sampled data. Since the kernel functions K_X and K_Y are differentiable, Equation (8) can be regarded as a direct and clear indicator. In the actual calculation, among the many available kernel functions, the Gaussian kernel function is commonly used and well adapted. Under a finite sample, the first term of Equation (8) is estimated as in Equation (10), where K_X(x_i, x_j) and K_Y(y_i, y_j) form symmetric n × n matrices, so that Equation (10) can be written in terms of Tr(K_X K_Y), where Tr(·) denotes the matrix trace.
Similarly, the matrix forms of the second and third terms of Equation (8) can be rewritten as trace functions, and Equation (8) is rewritten as Equation (12). Replacing the biased estimate in Equation (12) with the unbiased estimate gives Equation (13). Define Z as the data set consisting of two temperature (or humidity) variables; the measure of the degree of independence between the two variables is then given by Equation (15). Equation (15) provides a quantitative basis for prioritizing the THSs at different sites.
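As a minimal sketch of how the empirical HSIC between two sensor series could be computed, the Python snippet below uses a Gaussian kernel and the trace form Tr(K_X J K_Y J)/(n − 1)², where J = I − (1/n)11ᵀ is the centering matrix. This is the standard empirical estimator from the HSIC literature and is only assumed to correspond to the trace form described above; the function names and the median-heuristic bandwidth are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, sigma=None):
    """Gaussian kernel matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dist = (x - x.T) ** 2
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption; the source
        # does not state how sigma was chosen).
        positive = sq_dist[sq_dist > 0]
        sigma = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def hsic(x, y, sigma_x=None, sigma_y=None):
    """Empirical HSIC of two equally long 1-D series (larger = more dependent)."""
    n = len(x)
    k_x = gaussian_kernel(x, sigma_x)
    k_y = gaussian_kernel(y, sigma_y)
    centering = np.eye(n) - np.ones((n, n)) / n  # J = I - (1/n) 1 1^T
    return np.trace(k_x @ centering @ k_y @ centering) / (n - 1) ** 2


# Synthetic illustration (not greenhouse data): a dependent pair should
# score higher than an independent pair.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)   # strongly dependent on a
c = rng.normal(size=200)             # independent of a
print(hsic(a, b), hsic(a, c))
```

Note that the kernel matrices are n × n, so a full series of 7880 samples needs roughly 0.5 GB per matrix in double precision; subsampling the series before calling `hsic` is one possible workaround.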

Selection Priority of the THSs
As the temperature- and humidity-monitoring functions are integrated in one sensor unit, the optimization of the THS configuration should consider the independence of the temperature sets and the humidity sets simultaneously. According to the HSIC algorithm, the HSIC matrix of the temperature data and the HSIC matrix of the humidity data of the different THSs can be calculated separately. Then, the integrated independence coefficient (IIC) is proposed to represent the temperature and humidity independence of each THS with respect to all the others. The IIC of a THS, r(s, S), is defined in Equation (16), where s is an individual THS, S is the set of all the others, |S| is the number of THSs in the set, h_T is the HSIC between two temperature variables, h_H is the HSIC between two humidity variables, S_T is all the temperature sensors, s_Ti is the ith temperature sensor, s_T is all temperature sensors except the ith temperature sensor, S_H is all the humidity sensors, s_Hi is the ith humidity sensor, s_H is all humidity sensors except the ith humidity sensor, ω_T is the temperature weighting factor, and ω_H is the humidity weighting factor. Both temperature and humidity are critical to the growth and development of the crop in the CSG, so ω_T and ω_H are equally weighted as 0.5.
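To make the IIC concrete, the sketch below computes one coefficient per THS from precomputed temperature and humidity HSIC matrices. It assumes, based only on the symbol definitions given for r(s, S), that the IIC is the weighted sum of the mean pairwise HSIC between a sensor and all the others; the function name and the toy matrices are hypothetical.

```python
import numpy as np


def integrated_independence(h_temp, h_hum, w_t=0.5, w_h=0.5):
    """IIC of every sensor with respect to all the other sensors.

    h_temp[i, j] / h_hum[i, j]: HSIC between sensors i and j for
    temperature / humidity. Assumed form (not confirmed by the source):
    r(s, S) = w_t * mean_j h_T(s, s_j) + w_h * mean_j h_H(s, s_j), j != s.
    """
    h_temp = np.asarray(h_temp, dtype=float)
    h_hum = np.asarray(h_hum, dtype=float)
    n = h_temp.shape[0]
    others = ~np.eye(n, dtype=bool)          # mask that drops self-pairs
    mean_t = (h_temp * others).sum(axis=1) / (n - 1)
    mean_h = (h_hum * others).sum(axis=1) / (n - 1)
    return w_t * mean_t + w_h * mean_h


# Toy 3-sensor example with made-up HSIC values.
h = np.array([[1.0, 0.2, 0.4],
              [0.2, 1.0, 0.6],
              [0.4, 0.6, 1.0]])
print(integrated_independence(h, h))
```

With equal weights of 0.5, a sensor whose temperature series is redundant but whose humidity series is distinctive ends up with a mid-range coefficient, which is the intended effect of weighting the two quantities jointly.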
According to the definition of r(s, S), r(s, S) = 0 means that the temperature and humidity data of THS s are independent of those of the THS set S, so s provides the most additional information beyond the set S. In other words, the larger r(s, S) is, the lower the independence between s and the set S, i.e., the stronger the connection between s and the set S. If only one THS is selected to characterize the information of the whole CSG, the THS with the highest IIC value should be selected. If more information about the spatial distribution of temperature and humidity is required, the next THS should be the one with the lowest IIC value. Therefore, in order to accurately and comprehensively monitor both the overall trend and the spatial distribution of the CSG microclimate, the following ranking algorithm is designed to prioritize the selection of THSs.
(1) Definition of basic parameters: The set of all THSs is defined as S_tot, containing n THSs; the set of sorted THSs is defined as U_sort, and the set of unsorted THSs is defined as U_unsort.
(2) Selection of the first THS: The first THS should be able to represent the overall trend of temperature and humidity, i.e., convey the maximum information of the CSG microclimate. Therefore, the THS with the highest IIC value should be selected. As shown in Equations (17) and (18), the first element in the set U_sort is s_1.
(3) Selection of the ith (1 < i ≤ n) THS: After selecting the first one, the spatial distribution information of temperature and humidity should be mainly considered in the selection of the following ones. Then, the THS s_i in U_unsort with the smallest IIC value with respect to U_sort is selected, which maximizes the information abundance of the temperature and humidity spatial distribution, as shown in Equation (19). The sets U_sort and U_unsort are then updated as shown in Equations (20) and (21).
(4) If i < n, repeat step (3) and move the THSs from the set U_unsort to U_sort one by one.
(5) If i = n, it means that priority sorting for all THSs has been completed, and then
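The ranking steps above can be sketched as a greedy loop. The snippet assumes the IIC of a candidate with respect to a set is the weighted mean of its pairwise HSIC values against that set (an assumption, as the defining equations are not reproduced here); `iic_to_set` and `rank_sensors` are hypothetical names, and the 4 × 4 matrix is made-up data.

```python
import numpy as np


def iic_to_set(i, members, h_temp, h_hum, w_t=0.5, w_h=0.5):
    """Assumed IIC of sensor i with respect to the sensors in `members`."""
    members = list(members)
    return w_t * h_temp[i, members].mean() + w_h * h_hum[i, members].mean()


def rank_sensors(h_temp, h_hum, w_t=0.5, w_h=0.5):
    """Return sensor indices in selection-priority order (U_sort)."""
    h_temp = np.asarray(h_temp, dtype=float)
    h_hum = np.asarray(h_hum, dtype=float)
    n = h_temp.shape[0]
    unsorted_ids = list(range(n))                      # U_unsort

    # Step (2): the first THS has the highest IIC against all the others.
    first = max(
        unsorted_ids,
        key=lambda i: iic_to_set(
            i, [j for j in range(n) if j != i], h_temp, h_hum, w_t, w_h
        ),
    )
    sorted_ids = [first]                               # U_sort
    unsorted_ids.remove(first)

    # Steps (3)-(4): repeatedly move the THS with the lowest IIC against
    # the already sorted set from U_unsort to U_sort.
    while unsorted_ids:
        nxt = min(
            unsorted_ids,
            key=lambda i: iic_to_set(i, sorted_ids, h_temp, h_hum, w_t, w_h),
        )
        sorted_ids.append(nxt)
        unsorted_ids.remove(nxt)
    return sorted_ids


# Made-up symmetric HSIC matrix for four sensors (same for T and RH).
h = np.array([[1.0, 0.8, 0.3, 0.1],
              [0.8, 1.0, 0.5, 0.2],
              [0.3, 0.5, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])
print(rank_sensors(h, h))  # -> [1, 3, 0, 2]
```

In this toy case, sensor 1 is most strongly tied to the rest and is picked first; sensor 3 follows because it shares the least with sensor 1.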

Determination of the Required THS Quantity
Considering the cost sensitivity of CSG monitoring, we should use as few THSs as possible while meeting the monitoring requirements. To quantify the different requirements of monitoring accuracy and information abundance, the root mean squared error (RMSE) and the information gain rate (IGR) are used as two indicators to determine the required THS quantity n_select. Based on n_select and the THS selection priority, the specific THSs are selected from the set U_sort and constitute the suitable monitoring solution. The set of selected THSs is defined as U_select, whose element quantity is denoted as |U_select|.

(1) Monitoring accuracy
The RMSE of the monitoring value of the selected THSs with respect to that of all the THSs is used as the monitoring accuracy evaluation index. The RMSE threshold is denoted err, and its value is set according to the monitoring requirements. Once the RMSE of the selected THSs falls below the threshold err, the selection stops. The numbers of temperature sensors and humidity sensors are determined separately, and the larger of the two is the required number of THSs, as in Equation (22).
In Equation (22), n_T1 is the number of temperature sensors, n_RH1 is the number of humidity sensors, n_1 is the number of THSs selected according to the monitoring accuracy requirements, n is the number of all sensors, T_ave−n_T1 is the average temperature of n_T1 sensors (°C), T_ave−tot is the average temperature of all sensors (°C), RH_ave−n_RH1 is the average relative humidity of n_RH1 sensors (%), RH_ave−tot is the average relative humidity of all sensors (%), err_T is the temperature-monitoring accuracy threshold (°C), and err_RH is the humidity-monitoring accuracy threshold (%).
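The accuracy criterion can be illustrated with a short sketch: sensors are added in priority order until the RMSE between the mean of the selected sensors and the mean of all sensors drops below the threshold, and n_1 is the larger of the temperature and humidity counts. The function name, array layout (rows are time steps, columns are sensors), and toy readings are assumptions made for illustration.

```python
import numpy as np


def count_by_rmse(readings, ranking, err):
    """Smallest number of top-ranked sensors whose mean tracks the
    all-sensor mean with RMSE below `err`.

    readings: array of shape (time steps, sensors)
    ranking:  sensor column indices in selection-priority order
    """
    readings = np.asarray(readings, dtype=float)
    reference = readings.mean(axis=1)            # average of all sensors
    for k in range(1, len(ranking) + 1):
        estimate = readings[:, ranking[:k]].mean(axis=1)
        rmse = np.sqrt(np.mean((estimate - reference) ** 2))
        if rmse < err:
            return k
    return len(ranking)


# Toy readings (made up): three sensors, three time steps.
temp = np.array([[10.0, 12.0, 14.0],
                 [11.0, 13.0, 15.0],
                 [12.0, 14.0, 16.0]])
hum = np.array([[60.0, 62.0, 64.0],
                [61.0, 63.0, 65.0],
                [62.0, 64.0, 66.0]])
ranking = [0, 1, 2]

n_t1 = count_by_rmse(temp, ranking, err=1.5)   # temperature count
n_rh1 = count_by_rmse(hum, ranking, err=2.5)   # humidity count
n_1 = max(n_t1, n_rh1)                         # required THS count
print(n_t1, n_rh1, n_1)  # -> 2 1 2
```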

(2) Information abundance based on information entropy
To comprehensively reflect the spatial distribution of temperature and humidity in the CSG, the selected THSs should provide sufficient information abundance.IGR is used to characterize the abundance based on the information theory of Shannon.
The entropy of the discrete variable X is defined as the mean value of the self-information and is calculated using Equation (23) [24], where p(x_i) is the probability of the value x_i. The conditional entropy between two variables X and Y is defined as the mean of the conditional self-information, as in Equation (24). From Equations (23) and (24), the information gain obtained by the additional variable Y compared to the original variable X is given by Equation (25). The IGR is used to measure the information gain contributed by the newly added data. Using temperature as an example, the IGR of the new data set Y relative to the original temperature data set X can be expressed as in Equation (26) [25–27]. The above analysis quantitatively indicates that the temperature and humidity information introduced by new sensors gradually decreases, i.e., the marginal utility of newly added sensors is very low once the IGR falls below the threshold. Therefore, if the IGR of the newly added sensor s_(n_T2+1) is less than the threshold IGR_T^threshold, it is not necessary to add this sensor, and the required number of temperature sensors is n_T2, as shown in Equation (27). Similarly, the number of humidity sensors n_RH2 can be determined using Equation (28), and the number of THSs n_2 should be the greater of n_T2 and n_RH2, as in Equation (29).
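A small sketch may help to fix the ideas in this item. It estimates the Shannon entropy and the conditional entropy H(Y|X) = H(X, Y) − H(X) from histograms, and applies the stopping rule that selection ends once the IGR of the next sensor falls below the threshold. The bin count, the function names, and the IGR values in the example are assumptions; the exact IGR normalization of the source is not reproduced, so the IGR values are taken as inputs.

```python
import numpy as np


def shannon_entropy(values, bins=10):
    """Entropy (bits) of a series after histogram discretization."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def conditional_entropy(y, x, bins=10):
    """H(Y|X) = H(X, Y) - H(X), estimated on the same histogram grid."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint[joint > 0] / joint.sum()
    h_xy = float(-np.sum(p_xy * np.log2(p_xy)))
    return h_xy - shannon_entropy(x, bins=bins)


def count_by_igr(igr_of_next, threshold):
    """Number of sensors to keep under the IGR stopping rule.

    igr_of_next[k - 1] is the IGR of adding the (k + 1)-th ranked sensor
    to the first k sensors; stop at the first k where it is below threshold.
    """
    for k, gain in enumerate(igr_of_next, start=1):
        if gain < threshold:
            return k
    return len(igr_of_next) + 1


# Hypothetical IGR sequences (fractions, not measured values).
n_t2 = count_by_igr([0.15, 0.055, 0.03], threshold=0.10)
n_rh2 = count_by_igr([0.12, 0.075, 0.04], threshold=0.10)
n_2 = max(n_t2, n_rh2)
print(n_t2, n_rh2, n_2)  # -> 2 2 2
```

The final sensor count then follows the rule stated in the text: the larger of the accuracy-based count and this information-based count.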
(3) Construction of monitoring solution
In order to meet the requirements of monitoring accuracy and information abundance, the final number of THSs required, n_select, should be the greater of n_1 and n_2. The independence calculation, priority ranking, and THS selection form the complete optimization strategy for the CSG THS configuration, as in Figure 2.

Through the analysis of 14 × 7880 sets of temperature and humidity data measured by 14 THSs in the north-south vertical direction, it was found that the temperature and humidity at different sites had the same trend with time but showed obvious spatial differences. The maximum temperature difference at the same time could be more than 6 °C, and the standard deviation of temperature data could be more than 2 °C, as shown in Figure 3a. Similarly, the maximum difference and standard deviation of relative humidity data could be more than 15% and 7%, respectively, as in Figure 3c. To quantitatively represent the spatial heterogeneity of temperature and humidity, the coefficient of variation (CV) is used as a characteristic index. When the CV tends towards 0, the uniformity becomes better; when it tends towards 1, the uniformity becomes worse.
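For reference, the CV at each time step can be computed as the standard deviation across sensors divided by their mean. The sketch below assumes the population standard deviation and a (time steps × sensors) array layout; neither detail is stated in the source, and the readings are made up.

```python
import numpy as np


def coefficient_of_variation(readings):
    """Per-time-step CV across sensors: std / mean along the sensor axis.

    readings: array of shape (time steps, sensors). Uses the population
    standard deviation (ddof=0), which is an assumption.
    """
    readings = np.asarray(readings, dtype=float)
    return readings.std(axis=1) / readings.mean(axis=1)


# Made-up example: a uniform instant and a heterogeneous instant.
sample = np.array([[10.0, 10.0, 10.0],
                   [8.0, 10.0, 12.0]])
cv = coefficient_of_variation(sample)
print(cv)          # first entry is 0 (perfectly uniform)
print(cv.max())    # the kind of maximum CV reported for the time series
```

Because the CV divides by the mean, temperature readings close to 0 °C inflate it; this should be kept in mind when comparing temperature and humidity CVs.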

Temperature and Humidity Heterogeneity in the CSG
The temperature and humidity inside the CSG vary under the influence of solar radiation, ventilation, physiological activities of the crop, etc. Therefore, the heterogeneity of temperature and humidity also varies with time. Based on statistical data, the mean and median temperature CVs are both over 8.2%, and the mean and median relative humidity CVs are both over 9.8%. Meanwhile, during the dynamics of the CSG microclimate, the maximum CVs of temperature and relative humidity could reach 20.7% and 29.5%, respectively, indicating that there is obvious spatial heterogeneity of temperature and humidity along the vertical direction. This is because warm air heated by solar radiation is driven vertically upwards by thermal buoyancy, resulting in higher temperature in the upper part of the CSG. At the same time, crop transpiration and soil evaporation generate large amounts of water vapor near the ground surface, resulting in higher humidity in the lower part.
In addition, from Figure 3b,d, the acquired temperature and humidity data contained many outliers; the temperature outliers are all high-end values that appeared at noon, while the humidity outliers are all low-end values at the same time, indicating that the microclimate fluctuates more when the solar radiation is high.

Temperature and Humidity Heterogeneity in the East-West Horizontal Direction
Through the analysis of 8 × 7880 sets of temperature and humidity data measured by eight THSs in the east-west horizontal direction, it was found that the temperature and humidity in the horizontal direction had temporal periodicity and spatial variability similar to those in the vertical direction. The maximum CVs of temperature and relative humidity could reach 12.9% and 17.1%, respectively, and the maximum temperature and relative humidity differences could reach 3.3 °C and 15%, respectively, as in Figure 4a,c. In addition, similar to Figure 3b,d, many outliers can also be found in Figure 4b,d; they likewise occur during the period of highest solar radiation intensity, suggesting that temperature and humidity fluctuate more dramatically around noon. Therefore, if we want to comprehensively monitor the CSG microclimate, the horizontal heterogeneity should not be ignored either.
The above analysis revealed the spatial heterogeneity of the CSG microclimate, so it is necessary to use multiple THSs to comprehensively reflect the spatial distribution of temperature and humidity. Meanwhile, considering the spatial correlation of temperature and humidity in neighboring locations, there is information redundancy in neighboring THSs. This redundancy provides optimization potential for the THS configuration, with the aim of reducing the number of THSs and saving costs.

Priority Ranking for the THSs
Based on the priority ranking strategy in Section 2.3.2, the priority ranking results of the 14 THSs in the vertical direction and the 8 THSs in the horizontal direction can be determined, respectively, as shown in Tables 1 and 2.

Determination of THS Quantity Required
As shown in Figure 5, the RMSE showed a decreasing trend as the number of THSs selected increased. At the same time, the IGR introduced by new THSs also showed a decreasing trend, as shown in Figure 6, which indicates that the information gain contributed by newly added THSs is very limited when the number of THSs exceeds a certain quantity. The specific THSs could then be selected according to different cost budgets and accuracy requirements. Two configuration scenarios are presented here as examples: a low monitoring requirement with a low cost budget, and a high monitoring requirement with a high cost budget.
(1) Low monitoring requirement scenario under low cost budget: For illustration, the RMSE thresholds for temperature and relative humidity are set to 1 °C and 10%, respectively, and the IGR threshold is set to 10%. As in Figure 5a, the RMSEs of temperature and relative humidity are 0.64 °C and 1.9%, respectively, with the first ranked THS S6 in the vertical direction, which meets the temperature- and humidity-monitoring requirement. However, THS S6 cannot meet the IGR requirement by itself, so the second ranked THS S9 is also needed to improve the information abundance. From Figure 6a, the temperature IGR and relative humidity IGR of the next THS are 5.5% and 7.5%, respectively, both of which meet the IGR threshold of 10%. Therefore, in this scenario, two THSs, S6 and S9, are selected in the vertical direction to meet the dual requirements of monitoring accuracy and information abundance, i.e., n_vertical = 2.
Similarly, according to the RMSE threshold, the first ranked THS H6 in the horizontal direction could meet the monitoring accuracy requirements, with RMSEs of temperature and relative humidity of 0.62 °C and 2.4%, respectively. However, THS H6 alone cannot meet the IGR requirement, so the second ranked THS H5 must also be selected. Then, the temperature IGR and relative humidity IGR of the next THS are 7.3% and 7.0%, respectively, meeting the IGR threshold of 10%. Therefore, in this scenario, two THSs, H6 and H5, should be selected in the horizontal direction, i.e., n_horizontal = 2.
(2) Scenario with high monitoring requirement under high cost budget: When the monitoring requirements for temperature and humidity are high, the cost budget should be raised accordingly. Here, the RMSE thresholds of temperature and relative humidity are set at 0.5 °C and 5%, respectively, and the IGR threshold is set at 7%. In this scenario, as in Figure 5a, when the THS number is increased to three in the vertical direction, the RMSEs of temperature and relative humidity are 0.34 °C and 0.92%, respectively, which both meet the RMSE thresholds. Meanwhile, the temperature IGR and the relative humidity IGR of the next THS are 4.4% and 4.3%, respectively, both below the threshold value of 7%. In summary, to meet the strict requirements of monitoring accuracy and information abundance, the top three THSs in the vertical direction should be selected, i.e., n_vertical = 3.
Similarly, when the number of THSs in the horizontal direction is increased to three, the RMSEs of temperature and relative humidity are 0.30 °C and 2.4%, respectively, and the IGRs of temperature and relative humidity are 5.9% and 4.1%, respectively, all of which meet the monitoring requirement. Therefore, to meet the high requirements of monitoring accuracy and information abundance, the top three THSs in the horizontal direction should be selected, i.e., n_horizontal = 3.
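The threshold check used in the two scenarios can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the authors' implementation: the function name `min_sensor_count` and its arguments are hypothetical, and it assumes that the IGR value checked when settling on n THSs is stored at the same index as the RMSE for n THSs. The RMSE and IGR values quoted in the text are used where available; the remaining entries are labeled placeholders, because the full curves in Figures 5a and 6a are not tabulated here.

```python
from typing import Optional, Sequence


def min_sensor_count(
    rmse_t: Sequence[float],
    rmse_rh: Sequence[float],
    igr_t: Sequence[float],
    igr_rh: Sequence[float],
    rmse_t_max: float,
    rmse_rh_max: float,
    igr_max: float,
) -> Optional[int]:
    """Return the smallest number of ranked THSs that meets every threshold.

    Index 0 of each sequence corresponds to one THS, index 1 to two, and so on.
    Returns None if no candidate count satisfies all thresholds.
    """
    curves = zip(rmse_t, rmse_rh, igr_t, igr_rh)
    for n, (rt, rh, gt, gh) in enumerate(curves, start=1):
        accurate = rt <= rmse_t_max and rh <= rmse_rh_max
        informative = gt <= igr_max and gh <= igr_max
        if accurate and informative:
            return n
    return None


# Vertical direction, n = 1..3. Values marked "placeholder" are NOT from the
# paper; they stand in for points that would be read off Figures 5a and 6a.
rmse_t = [0.64, 0.45, 0.34]   # deg C; 0.45 is a placeholder
rmse_rh = [1.9, 1.4, 0.92]    # %;     1.4 is a placeholder
igr_t = [15.0, 5.5, 4.4]      # %;     15.0 is a placeholder (above 10%)
igr_rh = [15.0, 7.5, 4.3]     # %;     15.0 is a placeholder (above 10%)

# Scenario (1): RMSE <= 1 deg C and 10%, IGR <= 10%.
n_low = min_sensor_count(rmse_t, rmse_rh, igr_t, igr_rh, 1.0, 10.0, 10.0)
# Scenario (2): RMSE <= 0.5 deg C and 5%, IGR <= 7%.
n_high = min_sensor_count(rmse_t, rmse_rh, igr_t, igr_rh, 0.5, 5.0, 7.0)
print(n_low, n_high)
```

With these inputs the sketch selects two THSs for scenario (1) and three for scenario (2), in line with n_vertical = 2 and n_vertical = 3 above. The same function could be applied to the horizontal curves from Figures 5b and 6b.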

Optimal Configuration of the THSs Required
Based on the above analysis, in the case of a low cost budget, the number of THSs is set to two in both the vertical and the horizontal directions. According to the priority ranking results, THSs S6 and S9 in the vertical direction and H6 and H5 in the horizontal direction could meet the dual objectives of cost constraint and monitoring requirement. In the case of a high monitoring requirement, THSs S8 and H2 should be added in the vertical and horizontal directions, respectively, to improve the monitoring accuracy and information abundance of the CSG monitoring. The two configurations are summarized in Table 3.
Table 3. Optimal THS configuration in the CSG.

Validation of the Optimal Configuration Strategy
(1) Characterization performance
To verify the effectiveness of the proposed strategy, the monitoring RMSE and IGR of the THS configuration produced according to the strategy were compared with those of all the THSs, as shown in Figure 7 and Table 4. For both the THS configuration with low cost budget and that with high monitoring requirement, the temperature and humidity curves of the selected THSs are in good agreement with those of all THSs, as in Figure 7. For the THS configuration with low cost budget, the RMSE and IGR of temperature and relative humidity are all below the thresholds (RMSE: 1 °C and 10%; IGR: both 10%), thus satisfying the dual requirements of monitoring accuracy and information abundance. If the monitoring requirements are higher, additional THSs S8 and H2 will be added to the monitoring configuration, further reducing the RMSE and IGR. The comparison showed that the optimized monitoring configuration with only four or six THSs could accurately and comprehensively monitor the CSG microclimate with good characterization performance. The proposed strategy could significantly reduce THS usage and save monitoring investment.

(2) Comparative advantage
To validate the comparative advantage of the optimized monitoring configurations, the RMSE and IGR of eight randomly composed THS configurations were calculated and are also listed in Table 4. Each random configuration contains the same number of THSs as the corresponding optimized configuration. Of the 16 monitoring RMSE values of the random configurations, 11 were larger than those of the optimized configurations, indicating that the optimized configurations generally have better monitoring accuracy. Furthermore, the IGR values of the random configurations are about twice those of the optimized ones, showing that the optimized configurations could monitor the CSG microclimate much more comprehensively.
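A comparison of this kind can be sketched as follows. The sketch rests on an assumption that is not confirmed in this section: that the monitoring RMSE of a configuration is the root-mean-square difference between the mean reading of the selected THSs and the mean reading of all THSs over time. The function names `subset_rmse` and `share_of_worse_random`, the sensor labels, and the toy readings are all hypothetical and carry no measured data.

```python
import math
import random
from typing import Dict, List, Sequence


def subset_rmse(readings: Dict[str, List[float]], subset: Sequence[str]) -> float:
    """RMSE between the mean of `subset` and the mean of all sensors over time.

    Assumed definition (see the lead-in); every series must have equal length.
    """
    ids = list(readings)
    n_steps = len(readings[ids[0]])
    squared_error = 0.0
    for t in range(n_steps):
        full_mean = sum(readings[s][t] for s in ids) / len(ids)
        sub_mean = sum(readings[s][t] for s in subset) / len(subset)
        squared_error += (sub_mean - full_mean) ** 2
    return math.sqrt(squared_error / n_steps)


def share_of_worse_random(
    readings: Dict[str, List[float]],
    chosen: Sequence[str],
    trials: int = 8,
    seed: int = 0,
) -> float:
    """Fraction of equally sized random subsets with a larger RMSE than `chosen`."""
    rng = random.Random(seed)
    ids = list(readings)
    baseline = subset_rmse(readings, chosen)
    worse = 0
    for _ in range(trials):
        candidate = rng.sample(ids, len(chosen))
        if subset_rmse(readings, candidate) > baseline:
            worse += 1
    return worse / trials


# Toy readings for three hypothetical sensors (not measured data).
toy = {"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0], "C": [2.0, 3.0, 4.0]}
print(subset_rmse(toy, ["C"]), subset_rmse(toy, ["A"]))
print(share_of_worse_random(toy, ["C"]))
```

In the toy data, sensor C coincides with the all-sensor mean, so its RMSE is zero, whereas sensor A is offset by one unit at every time step.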

Discussion
Compared to the single-point monitoring method, the proposed configuration strategy could reflect the spatial heterogeneity of the CSG microclimate with a few THSs. It is cost-effective, achieving accurate and comprehensive microclimate monitoring with a small increase in investment. Compared with the uniform multi-point monitoring method [7,8], the proposed strategy could significantly reduce the use of THSs and save investment, while maintaining monitoring accuracy and comprehensiveness within acceptable error. Taking the test CSG in this paper as an example, we used only four or six THSs to achieve accurate and comprehensive monitoring. Compared to the original monitoring configuration with 22 THSs, the proposed strategy could achieve a 72.7~81.8% reduction in THS usage. This indicates that the proposed strategy has the potential to provide efficient and cost-effective monitoring solutions for cost-sensitive CSGs.
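The reduction range quoted above follows directly from the sensor counts, as this short check shows. The helper name `usage_reduction` is hypothetical and used only for illustration.

```python
def usage_reduction(total: int, kept: int) -> float:
    """Percentage reduction in sensor count when `kept` of `total` are retained."""
    return 100.0 * (total - kept) / total


# 22 THSs in the original layout; 6 or 4 THSs in the optimized configurations.
print(round(usage_reduction(22, 6), 1), round(usage_reduction(22, 4), 1))
```

Retaining six of the 22 THSs gives the lower bound of the range, and retaining four gives the upper bound.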
Moreover, further work is needed to improve the effectiveness and universality of the proposed strategy. The optimal configuration strategy was constructed based on the spatial heterogeneity of the CSG microclimate, without considering its temporal variation. In practice, temporal variation interacts with the spatial distribution. As shown in Section 3.1, the spatial heterogeneity during the night is much less than that during the daytime. To further improve the adaptability and effectiveness of the proposed strategy, it is essential to integrate the effects of the spatial distribution and temporal variation of the microclimate. In addition, the proposed strategy has only been validated with test data collected from a tomato CSG. Crops differ in their morphological characteristics and physiological activities, so the tomato crop does not adequately represent the effects of different crops on the CSG microclimate. We plan to verify and improve the universality of the strategy using microclimate data sampled from CSGs with other crops, such as cucumber, eggplant, and lettuce.

Conclusions
In order to accurately monitor the CSG microclimate under cost constraints, an optimal THS configuration strategy was proposed based on the improved weighted HSIC. The selection priority of all the THSs was ranked based on the relative independence values, which were calculated according to the improved HSIC matrix. To simultaneously meet the monitoring requirement and the cost constraint, the appropriate number of THSs is selected sequentially to form a suitable monitoring solution. Under strict cost constraints, the monitoring solution with four THSs (S6, S9 and H6, H5) was constructed based on the proposed strategy, with a maximum RMSE of 0.6 °C and 2.3% and a maximum IGR of 9.47% and 10% for temperature and relative humidity, respectively. For a higher monitoring requirement, additional THSs S8 and H2 should be added to the solution, further reducing the monitoring RMSE and IGR. The proposed strategy could significantly reduce THS usage and save investment, while maintaining the monitoring accuracy and comprehensiveness. It could provide a theoretical reference and cost-effective solution for CSG microclimate monitoring.

Figure 1. Temperature- and humidity-monitoring system in the CSG. (a) Temperature and humidity data collected with IoT platform; (b) THS spatial layout diagram. Note: S1~S14 and H1~H8 represent the sensors located in north-south vertical direction and in east-west horizontal direction, respectively.

Figure 2. Flow chart of THS configuration optimization.

Figure 3. Temperature and humidity in the north-south vertical direction (9 December-6 January). (a) Temperature in the vertical direction; (b) Box chart of temperature in the vertical direction; (c) Relative humidity in the vertical direction; (d) Box chart of relative humidity in the vertical direction.

Figure 4. Temperature and humidity in the east-west horizontal direction (9 December-6 January). (a) Temperature in the horizontal direction; (b) Box chart of temperature in the horizontal direction; (c) Relative humidity in the horizontal direction; (d) Box chart of relative humidity in the horizontal direction.

Figure 5. The RMSE curve of temperature and humidity with the increase in THS quantity. (a) RMSE variation with the increase in THS quantity in vertical direction; (b) RMSE variation with the increase in THS quantity in horizontal direction.

Figure 6. The IGR curve of temperature and humidity with the increase in THS quantity. (a) IGR variation with the increase in THS quantity in vertical direction; (b) IGR variation with the increase in THS quantity in horizontal direction.

Figure 7. The temperature and relative humidity curves with different THS combinations (12 January). (a) The temperature curves in the vertical direction; (b) The relative humidity curves in the vertical direction; (c) The temperature curves in the horizontal direction; (d) The relative humidity curves in the horizontal direction.

Table 1. THS ranking in the vertical direction.

Table 2. THS ranking in the horizontal direction.

Table 4. Validation results of THS configurations.

Note: # represents the optimized configuration; * represents the randomly composed ones.