Article

Two-Level Distributed Multi-Source Information Fusion Model for Aphid Monitoring and Forecasting in the Greenhouse

1
College of Mechanical Engineering, Yangzhou University, Yangzhou 225127, China
2
Vegetable (Root Vegetable) Fully Mechanized Research Base, Ministry of Agriculture and Rural Affairs, Yangzhou 225009, China
3
Jiangsu Lixiahe Agricultural Science Research Institute, Yangzhou 225008, China
4
Yangzhou Agricultural Technology Comprehensive Service Center, Yangzhou 225101, China
5
Nanjing Research Institute for Agricultural Mechanization of National Ministry of Agriculture, Nanjing 210014, China
*
Author to whom correspondence should be addressed.
Agronomy 2025, 15(5), 1044; https://doi.org/10.3390/agronomy15051044
Submission received: 28 March 2025 / Revised: 24 April 2025 / Accepted: 25 April 2025 / Published: 26 April 2025
(This article belongs to the Section Agroecology Innovation: Achieving System Resilience)

Abstract

Aphids are among the main agricultural pests affecting the quality and yield of greenhouse peppers. Efficient early prediction of aphid occurrence is important for the development of digitization and information technology in intelligent agriculture, and forecasting accuracy can be improved by incorporating feature interactions into pest forecasting. This study integrates multiple environmental factors to predict the number of aphids and the aphid strain rate in the greenhouse. We propose a two-level distributed multi-source information fusion approach that integrates a one-dimensional convolutional neural network (1D CNN) and Long Short-Term Memory (LSTM). In the first level of fusion, a weighted average algorithm is applied to environmental sensor data to enhance the accuracy of regional environmental parameters. In the second level of fusion, a heterogeneous sensor fusion algorithm integrates multi-source data to model the connection between environmental factors and aphid dynamics. Finally, the improved 1D CNN-LSTM fusion model was tested against other models to verify its effectiveness and robustness. The experimental results show that the total root mean square error of the proposed model is 1.503, which is lower than that of the other networks. On the test set, the total root mean square errors of the model for predicting the aphid number and strain rate are 1.378 and 0.337, respectively, which compare favorably with existing network models such as 1D CNN, LSTM, and back propagation (BP). These results indicate that the proposed model has clear advantages for predicting the aphid number and strain rate. It provides a promising step forward in pest management, offering precise, environmentally friendly solutions that enhance crop yield and quality.

1. Introduction

Pepper plants are susceptible to pest infestations, with aphid damage being particularly prominent [1,2]. According to the Food and Agriculture Organization (FAO), pests are among the greatest threats to food security, generating broad economic, social, and environmental impacts [3]. The primary aphid species affecting pepper cultivation is Myzus persicae (Sulzer) (Hemiptera: Aphididae) (commonly called the green peach aphid). Green peach aphids feed on the sap of stems and leaves of fruits and vegetables through piercing and sucking, and they also transmit viruses. They are one of the primary factors affecting the quality and production of fruits and vegetables. If pest alerts are not issued on time, it can easily lead to rash decision making, resulting in pesticide residues and a decline in the yields of fruits and vegetables [4,5,6].
The existing pest monitoring methods can be divided into three types: expert observation, machine learning approaches, and deep learning-based models. The first method estimates the occurrence of insect pests from experience through expert field observation [7]. Zhang et al. suggested that climate change can significantly influence the population dynamics and abundance of three major pests of fruit and nut crops by quantifying changes in the lifecycle length and the number of generations of these pests [8]. A well-established growing-degree days (GDD) model is used to investigate these pests in orchards; however, due to the complex non-linear relationship between plants, insect pests, and the environment, the actual early prediction performance of such experience-based judgment is not ideal. Abiotic factors, like meteorological conditions, play an essential role in pest population dynamics. In particular, meteorological variables such as temperature, humidity, carbon dioxide (CO2), wind speed, and rainfall significantly affect pest populations. Sokame et al. analyzed the effects of air temperature and carbon dioxide concentration on insect pests and concluded that increases in air temperature and carbon dioxide concentration greatly affect the physiology of insect pests, accelerate their metabolic consumption, and eventually cause an increase in population density and other losses [9]. Singh et al. pointed out that the mung bean pest is highly affected by the weather; the highest temperature observed was highly, positively correlated with the pest; the lowest temperature, wind speed, and light were negatively correlated with it; and the highest humidity, lowest temperature, and rainfall were non-significantly correlated with it [10]. Mutamiswa et al. analyzed the relationship between environmental factors and the growth in moth numbers and concluded that the average temperature and highest temperature were positively correlated with the moth growth number, whereas the lowest temperature, minimum relative humidity, and maximum relative humidity were negatively correlated with it; the resulting correlation function can provide a reference for pest prediction [11]. To explore the relationship between weevils and the environment, Madeira et al. combined laboratory and field sampling methods to determine the effects of different temperatures and humidities on the growth of the weevil population, and their experiments showed that high temperature and low relative humidity caused a low survival rate, long development time, and slow growth of eggs [12]. Thus, climate change has a direct influence on the development, distribution, and performance of pests [13,14,15,16,17,18].
The second method is to use a machine learning algorithm combined with digital images and sensor networks to achieve pest monitoring. Traditional machine learning algorithms, such as support vector regression (SVR), artificial neural networks (ANNs), random forest (RF), and decision trees (DTs), have been successfully used to forecast pest outbreaks. Xiao et al. realized a multidimensional big data collection of soil and environmental effects on insect pests based on correlation analysis, and a pest recognition model based on a back propagation (BP) neural network was constructed [19]. Manrique-Silupú et al. developed a framework for facilitating the prediction of pest incidence; the results showed that the support vector regression (SVR) algorithm proved to have good performance for predicting the degree of pest occurrence in terms of meteorological information in organic banana crops [20]. Ibrahim et al. used fuzzy neural networks (FNN) as predictive tools to model the population dynamics of fruit fly pests by studying climatic variables and avocado plant physiology stages in different orchards [21]. Lafont et al. studied Artificial Neural Networks (ANNs) and Adaptive Neuro-Fuzzy Inference System (ANFIS) models to forecast pest risk levels in rose greenhouses, depending on the internal temperature and humidity, human intervention, and current pest risk levels [22]. Yang et al. built a prediction model for the population occurrence of paddy stem borer by applying principal components analysis (PCA) and back propagation (BP) artificial Neural Network (ANN) methods. The experimental results showed that there exists a non-linear relationship between the main meteorological factors and the population occurrence of the paddy stem borer [23].
With the development of deep learning, many novel methods have been proposed for the early forecasting of pest occurrence. The third method is therefore the deep learning-based model, which predicts pest population dynamics by leveraging environmental parameters. Heryadi et al. presented a Bidirectional Long Short-Term Memory (LSTM) model for forecasting crop pest attacks using multivariate time-series inputs of weather data [24]. Wang and Zhang proposed the Attention-based Long Short-Term Memory Interaction Convolutional Neural Network (ALIC) model to extract intricate inter-relationships between pest outbreaks and meteorological variables and demonstrated the validity of the ALIC model in pest forecasting [25]. He et al. pointed out that meteorological conditions significantly affected rice planthopper populations and utilized an attention-based LSTM encoder–decoder network to forecast population dynamics [26]. Chen et al. established a prediction model to investigate the correlation between the population dynamics of pests and environmental factors using long short-term memory (LSTM) networks and machine learning (ML) [27].
Table 1 summarizes the key contributions and key findings of the related works on these three methods. The above research shows that there is a certain connection between the environment and insect pests. However, there is no qualitative description, and the occurrence of insect pests cannot be accurately predicted. Because pests are scattered across leaves, roots, and other positions, it is difficult to capture targeted images of every pest location, and it is difficult to use the resulting conclusions to guide the regulation of the environment. Furthermore, feature interactions between meteorological and pest data have not been considered in depth. It is therefore critical to improve forecasting accuracy by incorporating feature interactions into pest forecasting.
The interaction between “environment and pests” is complex. When aphids invade peppers, their physiological and biochemical mechanisms and development continuously adjust. Additionally, there are intricate relationships between different environmental factors, making a single type of environmental factor insufficient to support the establishment of pest monitoring models. Multi-source information fusion is a technique that integrates diverse information from multiple sensors and sources at various levels and dimensions. It is a powerful method that enhances decision making by combining diverse data from multiple sources. This approach not only improves the reliability and precision of conclusions, but also mitigates uncertainties and errors in judgment. It can be employed across various fields, such as environmental monitoring and smart agriculture. Therefore, this paper integrates multiple environmental factors to efficiently predict the number of aphids and the aphid strain rate in the greenhouse. A two-level distributed multi-source information fusion approach, which integrates a one-dimensional convolutional neural network (1D CNN) and Long Short-Term Memory (LSTM), is employed to investigate the number of aphids and aphid strain rate in the pepper planting process. At the first level, a homogeneous sensor information fusion algorithm is presented to achieve precise and consistent representation of the multi-sensor data. Using a weighted average fusion algorithm, unified measurements of temperature, humidity, CO₂ concentration, and light intensity within a fixed area of the greenhouse are obtained. Three features—the maximum, minimum, and average values per unit time for temperature, humidity, CO₂ concentration, and light intensity—are extracted to establish a multi-source information fusion model. This reduces the data volume while enhancing the system’s early prediction performance. At the second level, a heterogeneous sensor fusion algorithm is adopted. 
By analyzing the impact of different environmental variables on aphid population and infestation rates in the greenhouse, an improved 1D CNN-LSTM multi-source information fusion model is developed to construct an aphid early prediction system. This ultimately enables the precise prediction of pest outbreaks.
In summary, this study introduces the two-level distributed multi-source information fusion model architecture. The model integrates the 1D CNN and LSTM. The main contributions of this study are as follows:
(1)
We propose a two-level distributed multi-source information fusion framework that combines the 1D CNN with LSTM. To enhance the accuracy of regional environmental parameters, a weighted average algorithm employs environmental sensor data in the first level of fusion. The second fusion level, a heterogeneous sensor fusion algorithm, allows for the integration of multi-source data to model the connection between environmental factors and aphid dynamics.
(2)
We employ grey correlation analysis to identify the correlation between environmental factors and aphids and the degree of correlation, indicating that the light intensity is most closely related to the number of aphids and the aphid strain rate.
(3)
We conduct multi-source information fusion experiments and compare the proposed model with existing network models such as 1D CNN, LSTM, and BP. The results show that the proposed model has obvious advantages for predicting the aphid number and strain rate.
The remainder of the paper is organized as follows: Section 2 describes the two-level distributed multi-source information fusion framework, together with the dataset acquisition and the proposed approach. In Section 3, fusion experiments are conducted to investigate the prediction performance of the model. Section 4 discusses and evaluates the two-level distributed multi-source information fusion model. Finally, the conclusion is presented in Section 5.

2. Materials and Methods

2.1. Two-Level Distributed Multi-Source Information Fusion Framework

In this section, we introduce the overall architecture of an aphid early prediction system. As shown in Figure 1, the system consists of the perception layer, the transmission layer, and the application layer. The entire framework is centered on a cloud platform for environmental data collection, analysis, and modeling. To facilitate a more accurate acquisition of regional environmental factors, we employ a Microcontroller Unit (MCU) and LoRa technology in the perception layer. The perception layer consists of multiple sensor nodes. In the transmission layer, we use a WiFi module to collect environmental data, which is then uploaded to the OneNet cloud platform. In the application layer, a two-level distributed multi-source information fusion model is proposed to predict the number of aphids and aphid strain rate in the greenhouse.
Figure 2 illustrates the framework of a two-level distributed multi-source information fusion model. It includes two levels of fusion: The first level of fusion is for the homogeneous sensor, and at the data level, the consistent expression of the multi-node for the whole pepper planting area is realized through the weighted average fusion algorithm. In primary fusion, selecting the maximum, minimum, and average values for key environmental factors provides a robust framework for analyzing greenhouse conditions. It enables a deeper understanding of the relationships between environmental factors and pest behavior. Thus, the maximum, minimum, and mean features of temperature and humidity, light intensity, and carbon dioxide concentration were extracted and used to construct a multi-source heterogeneous sensor information fusion model to form the characteristic level fusion. Finally, the accurate prediction of the number of aphids and aphid strain rate in the greenhouse was obtained.

2.2. Data Acquisition

The experiments were carried out in the controlled-environment greenhouse at Yangzhou University, where vegetables were cultivated using a multi-layer, three-dimensional cultivation device. The research focused on pepper, and the environmental factor information in the greenhouse was collected with the purpose-built environmental information collection system. Three hundred pepper plants were grown indoors from emergence until harvest, from September 2023 to June 2024. Every three days, the number of aphids and the aphid strain rate were recorded in the indoor planting area. Inspections were standardized, results were cross-verified, and the methodology was aligned with the requirements of the hybrid model to maintain rigor and reproducibility. In addition, data on temperature, relative humidity, light intensity, and carbon dioxide concentration were regularly collected. Upon maturation, the pepper plants were removed, and the greenhouse was disinfected before a new batch of pepper plants was introduced for cultivation.

2.3. Two-Level Distributed Multi-Source Information Fusion Model

To efficiently predict the number of aphids and the aphid strain rate in the greenhouse, we propose a two-level distributed multi-source information fusion framework that combines the 1D CNN with the LSTM. It consists of two levels of fusion: primary fusion and secondary fusion. In the first level, a weighted average algorithm is applied to environmental sensor data to enhance the accuracy of regional environmental parameters. In the second level, an approach that integrates the 1D CNN and the LSTM network is adopted: the 1D CNN extracts the features of environmental factors, while the LSTM performs reasoning over contextual time sequence information. This heterogeneous sensor fusion algorithm integrates multi-source data to model the connection between environmental factors and aphid dynamics. The proposed model leverages key information from the greenhouse and possesses long-term memory functionality, ultimately enhancing the prediction accuracy for the number of aphids and the aphid strain rate in a climate-controlled greenhouse.

2.3.1. Primary Fusion

Weighted-Average Fusion Algorithm

The homogeneous sensors, positioned in various locations, typically exhibit unique errors due to the manufacturing process, and their arrangements can be significantly influenced by the surrounding environment. This variability makes it challenging to establish a consistent representation of regional environmental parameters. As illustrated in Figure 3, we employ a weighted average fusion algorithm to address these issues. This algorithm processes the sensing data collected from nodes across different locations within the pepper planting area. By applying the weighted average method, we can effectively correct sensor data errors, allowing us to aggregate the information into a cohesive representation of the regional environment. The weighted average fusion algorithm works by assigning different weights to the readings of each sensor based on their reliability and proximity to the target parameters. Consequently, this approach mitigates the impact of sensor inaccuracies and environmental disturbances, leading to a more accurate and reliable fusion of regional environmental information.
The steps of homogeneous sensor information fusion based on the weighted fusion algorithm are as follows:
Step 1: Set the information collection interval of sensor nodes, and integrate the received data of temperature and humidity, carbon dioxide concentration, and light intensity collected by each node through the Cloud platform.
Step 2: Clean the data and eliminate outliers. Commonly used outlier rejection methods include the 3σ principle and the Grubbs criterion. Both assume that a set of data contains only random errors: the standard deviation is calculated, and an interval is derived according to the chosen criterion. Any value falling outside this interval is judged to have a large error and is eliminated. Although the 3σ principle is convenient to use, it is not effective for data from a small number of homogeneous sensors. The Grubbs criterion mainly detects outliers in a data set with a normal distribution. Compared with the 3σ principle, it is more sensitive to outliers and can still guarantee a certain accuracy with fewer homogeneous sensors. Using the Grubbs criterion, we first calculate the average value $\bar{x}$ and the standard deviation $\sigma$ of each group of data and then calculate the value of $G_i$, as shown in Formula (1):
$$G_i = \frac{\left| x_i - \bar{x} \right|}{\sigma} \tag{1}$$
Step 3: For the data collected by each sensor node at a given time, calculate the variance of the data collected by the homogeneous sensors at the different nodes at that time, and from it the weight allocated to each homogeneous sensor. The weight allocation is shown in Formula (2):
$$w_n = \frac{1 / D_{x_n}}{\sum_{n=1}^{N} 1 / D_{x_n}} \tag{2}$$
where $w_n$ refers to the weight of the $n$th sensor, $x_n$ refers to the preprocessed measured value of the sensor, $D_{x_n}$ refers to the variance of the preprocessed measured value of the homogeneous sensor, and $N$ refers to the number of homogeneous sensors remaining after the abnormal values have been eliminated by the data cleaning in Step 2.
Step 4: Multiply the sensor data collected by each node by the corresponding weight of each sensor and sum the results to obtain the optimal estimate of the environmental factor information in each planting area.
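Steps 2 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`grubbs_mask`, `inverse_variance_weights`, `fuse_readings`) are hypothetical, the Grubbs test is applied in a single pass, and each sensor's variance is assumed to be estimated from a short window of repeated readings, which the text does not specify.

```python
import numpy as np

# Critical value quoted in the paper for N = 7 sensors and P = 95%.
G0_N7_P95 = 1.938


def grubbs_mask(values, g0):
    """Return True for readings kept by a single-pass Grubbs check (Formula (1))."""
    values = np.asarray(values, dtype=float)
    std = values.std(ddof=1)  # sample standard deviation (assumption)
    if std == 0.0:
        return np.ones(values.shape, dtype=bool)
    g = np.abs(values - values.mean()) / std
    return g <= g0


def inverse_variance_weights(variances):
    """Weights proportional to 1 / variance, normalized to sum to 1 (Formula (2))."""
    inv = 1.0 / np.maximum(np.asarray(variances, dtype=float), 1e-12)
    return inv / inv.sum()


def fuse_readings(readings, g0=G0_N7_P95):
    """Fuse the latest row of `readings`, shaped (T samples, N sensors), T >= 2."""
    readings = np.asarray(readings, dtype=float)
    latest = readings[-1]
    keep = grubbs_mask(latest, g0)                      # Step 2: drop outliers
    variances = readings[:, keep].var(axis=0, ddof=1)   # Step 3: per-sensor variance
    weights = inverse_variance_weights(variances)
    fused = float(np.dot(weights, latest[keep]))        # Step 4: weighted sum
    return fused, keep, weights


# Illustrative temperatures only (not data from the paper); sensor 4 is faulty.
example = [
    [25.0, 24.8, 25.1, 34.0, 25.3, 24.7, 25.0],
    [25.2, 25.0, 24.9, 36.0, 25.1, 24.9, 25.1],
    [25.1, 24.9, 25.0, 35.0, 25.2, 24.8, 25.0],
]
fused_value, kept, sensor_weights = fuse_readings(example)
```

Sensors with a smaller variance receive a larger weight, so a noisy node contributes less to the regional estimate than a stable one.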

Analysis of Influencing Factors of Aphids Based on Grey Correlation Degree

Grey correlation degree analysis is a multi-factor statistical analysis method based on the sample data of each factor used to describe the strength, size, and order of the relationship between the factors. The grey correlation analysis can determine the factors affecting the occurrence of aphids. The analysis process for the number of aphids and aphid strain rate based on grey correlation analysis is shown in Figure 4.
The specific process of grey correlation degree analysis regarding the number of aphids and aphid strain rate in the greenhouse is as follows:
Step 1: Select the reference sequences and the comparison sequences. Set the target parameters, that is, the number of aphids and the aphid strain rate, as the reference sequences, denoted by $y_j(k)$, where $k = 1, 2, \ldots, 20$ indexes the data groups and $j = 1, 2$ corresponds to the number of aphids and the aphid strain rate in the greenhouse within the cycle, respectively. The reference sequences take the following form:
$$y_j(k) = \begin{bmatrix} y_1(1), & y_1(2), & \cdots, & y_1(k) \\ \vdots & \vdots & & \vdots \\ y_j(1), & y_j(2), & \cdots, & y_j(k) \end{bmatrix}$$
Take the maximum temperature, minimum temperature, average temperature, maximum relative humidity, minimum relative humidity, average relative humidity, maximum light intensity, minimum light intensity, average light intensity, maximum carbon dioxide concentration, minimum carbon dioxide concentration, and average carbon dioxide concentration of the greenhouse in each week as the comparison data series, and record them as $x_i(k)$, where $i = 1, 2, \ldots, 12$ corresponds to the different parameters in the comparison data series. The comparison data series take the following form:
$$x_i(k) = \begin{bmatrix} x_1(1), & x_1(2), & \cdots, & x_1(k) \\ x_2(1), & x_2(2), & \cdots, & x_2(k) \\ \vdots & \vdots & & \vdots \\ x_i(1), & x_i(2), & \cdots, & x_i(k) \end{bmatrix}$$
Step 2: Perform dimensionless transformation of a sequence. For the reference columns and contrast columns of different units, the initial value transformation is used to eliminate dimensions and improve the reliability of the analysis results.
Step 3: Calculate the absolute value of the difference $\Delta_{ij}(k)$, with the mathematical expression as follows:
$$\Delta_{ij}(k) = \left| y_j(k) - x_i(k) \right|$$
Step 4: Calculate the maximum difference $M_j$ and the minimum difference $m_j$:
$$M_j = \max_i \max_k \Delta_{ij}(k)$$
$$m_j = \min_i \min_k \Delta_{ij}(k)$$
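Step 5: Calculate the correlation coefficient $\gamma_j(x_0(k), x_i(k))$ between each influencing factor and the number of aphids or the aphid strain rate:
$$\gamma_j(x_0(k), x_i(k)) = \frac{m + \xi \times M}{\left| x_0(k) - x_i(k) \right| + \xi \times M}$$
where $x_0(k)$ denotes the reference sequence $y_j(k)$, so that $\left| x_0(k) - x_i(k) \right| = \Delta_{ij}(k)$; $m$ and $M$ are the minimum and maximum differences $m_j$ and $M_j$ from Step 4; and $\xi$ refers to the resolution coefficient, with a value range of $(0, 1)$. The resolution coefficient is related to the discrimination ability and is usually set to 0.5.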
Step 6: Calculate the grey correlation degree $\gamma_j(x_0, x_i)$ for the number of aphids and the aphid strain rate:
$$\gamma_j(x_0, x_i) = \frac{1}{n} \sum_{k=1}^{n} \gamma_j(x_0(k), x_i(k))$$
where $n$ is the number of data points in each sequence.
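The six steps can be condensed into a short NumPy routine. The sketch below is illustrative only: the function name `grey_relational_degree` is hypothetical, the initial value transformation is implemented as division by the first element (which assumes that element is non-zero), and the sample sequences are made up rather than taken from the experiment.

```python
import numpy as np


def grey_relational_degree(reference, comparisons, xi=0.5):
    """Grey correlation degree of each comparison sequence with one reference.

    reference:   shape (n,), e.g. aphid counts over n observation periods
    comparisons: shape (m, n), one row per environmental feature
    xi:          resolution coefficient in (0, 1), usually 0.5
    """
    y = np.asarray(reference, dtype=float)
    x = np.asarray(comparisons, dtype=float)
    if y[0] == 0.0 or np.any(x[:, 0] == 0.0):
        raise ValueError("initial value transformation needs non-zero first values")

    # Step 2: dimensionless (initial value) transformation.
    y0 = y / y[0]
    x0 = x / x[:, [0]]

    # Steps 3 and 4: absolute differences and their global extremes.
    delta = np.abs(y0 - x0)
    big_m, small_m = delta.max(), delta.min()
    if big_m == 0.0:
        return np.ones(x.shape[0])  # every sequence coincides with the reference

    # Step 5: correlation coefficients; Step 6: average over k.
    coeff = (small_m + xi * big_m) / (delta + xi * big_m)
    return coeff.mean(axis=1)


# Made-up sequences: the first feature is proportional to the reference.
degrees = grey_relational_degree(
    reference=[1.0, 2.0, 3.0, 4.0],
    comparisons=[[2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0]],
)
ranking = np.argsort(-degrees)  # feature indices, most related first
```

Sorting the resulting degrees gives the ordering of the environmental features by their closeness to the aphid number or strain rate.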

2.3.2. Secondary Fusion

1D CNN

The working principle of the one-dimensional convolutional neural network is similar to that of two-dimensional convolution in image recognition; its basic components are the convolution layer, pooling layer, fully connected layer, and activation function. Convolution is computed with shared weights to extract target features. The difference is that the 1D CNN performs the convolution operation along one dimension, which can effectively extract temporal information at different depths, and it is widely used in the classification and prediction of time series, signals, and natural language.
The feature extraction process applied by the 1D CNN to the input greenhouse environment information $x_i(k)$ is as follows:
Step 1: Input the aphid multi-source information. In the greenhouse environmental information, $x_1(1), x_2(1), \ldots, x_i(1)$ is taken as a group of inputs, corresponding to the number of aphids $y_1(1)$ and the aphid strain rate of pepper $y_2(1)$ in the greenhouse. To reduce the error caused by the different units of the data, the input data are normalized to the range $[0, 1]$. The mathematical expression is as follows, where $x_{\max}$ and $x_{\min}$ correspond to the maximum and minimum values of each data series, and $x^{*}$ represents the normalized data:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
Step 2: Extract the key features related to the number of aphids and the aphid strain rate through the convolution and activation function. The mathematical expression of the convolution process is as follows:
$$Q_{\alpha} = \zeta\left( \sum_{\beta} d_{\alpha\beta} \ast s_{\beta} + b_{\alpha} \right)$$
where $Q_{\alpha}$ represents the output of layer $\alpha$, $\zeta$ represents the activation function, $s_{\beta}$ represents the input of layer $\beta$, $\ast$ represents the convolution operation, $d_{\alpha\beta}$ corresponds to the convolution kernel, and $b_{\alpha}$ represents the offset of layer $\alpha$.
Step 3: Reduce the feature output dimension by a pooling layer. The pooling layer can reduce the operation amount and improve the robustness of the model. Pooling usually adopts maximum pooling or average pooling. The maximum pooling and average pooling take the local maximum and average values, respectively. Compared with the advantages of average pooling for overall data, the maximum pooling can better reflect the local information and realize the effective extraction of sensitive environmental factors of aphids, as shown in Figure 5. Figure 5a,b corresponds to the maximum pooling and average pooling, respectively.
Step 4: Realize the regression calculation of the number of aphids and aphid strain rate through full connections, and obtain the results for the number of aphids and the aphid strain rate in the greenhouse.
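Steps 1 to 3 can be illustrated with plain NumPy on a single channel. This is a sketch under stated assumptions: the helper names are hypothetical, ReLU stands in for the unspecified activation function, the convolution uses no padding, and the kernel values are arbitrary rather than learned.

```python
import numpy as np


def min_max_normalize(x):
    """Scale each column to [0, 1] using x* = (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    span = x.max(axis=0) - x_min
    span[span == 0.0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (x - x_min) / span


def conv1d_relu(signal, kernel, bias=0.0):
    """Slide the kernel along the signal (no padding), add the bias, apply ReLU."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # np.correlate computes the sliding dot product that deep learning
    # frameworks call "convolution".
    response = np.correlate(signal, kernel, mode="valid") + bias
    return np.maximum(response, 0.0)


def max_pool1d(signal, size=2):
    """Keep the maximum of each non-overlapping window of length `size`."""
    signal = np.asarray(signal, dtype=float)
    usable = (signal.size // size) * size  # drop a trailing partial window
    return signal[:usable].reshape(-1, size).max(axis=1)


normalized = min_max_normalize([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
features = conv1d_relu([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], kernel=[-1.0, 0.0, 1.0])
pooled = max_pool1d(features, size=2)
```

With a kernel of length 3 and no padding, an input of length 6 yields 4 feature values, and pooling with a window of 2 halves that to 2.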

Aphid Monitoring Model Based on 1D CNN-LSTM

As a specialized recurrent neural network, the Long Short-Term Memory (LSTM) network can effectively mitigate the vanishing and exploding gradient problems that arise when training on time series data. This capability enables the preservation of long-term dependency information within sequences. The network structure diagram of the LSTM is shown in Figure 6.
The information extraction process of different gates in LSTM is as follows:
Step 1: The forget gate $f_t$ determines, through the sigmoid activation function ($\sigma$), how much of the historical state information is retained. The mathematical expression is as follows, where $h_{t-1}$ is the previous output of the cell, $x_t$ corresponds to the current input, and $w$ and $b$ correspond to the weight coefficient and offset:
$$f_t = \sigma\left( w_f \left[ h_{t-1}, x_t \right] + b_f \right)$$
Step 2: The input gate $e_t$ and the candidate state information $g_t$ of the current cell jointly determine the updated state $c_t$ of the current cell. The mathematical expressions are as follows:
$$e_t = \sigma\left( w_i \left[ h_{t-1}, x_t \right] + b_i \right)$$
$$g_t = \tanh\left( w_c \left[ h_{t-1}, x_t \right] + b_c \right)$$
$$c_t = f_t \odot c_{t-1} + e_t \odot g_t$$
Step 3: The output gate $o_t$ and the updated state $c_t$ of the current cell control the final output $h_t$ of the cell under the action of the activation function $\tanh$. The mathematical expressions are as follows:
$$o_t = \sigma\left( w_o \left[ h_{t-1}, x_t \right] + b_o \right)$$
$$h_t = o_t \odot \tanh\left( c_t \right)$$
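The gate equations translate directly into one time step of an LSTM cell. The NumPy sketch below is for illustration only: the function names, the parameter dictionary layout, and the sizes (12 inputs, 4 hidden units) are assumptions, and the weights are zero so that the result can be checked by hand.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the forget, input, and output gate equations."""
    z = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    f_t = sigmoid(params["w_f"] @ z + params["b_f"])   # forget gate
    e_t = sigmoid(params["w_i"] @ z + params["b_i"])   # input gate
    g_t = np.tanh(params["w_c"] @ z + params["b_c"])   # candidate state
    c_t = f_t * c_prev + e_t * g_t                     # element-wise state update
    o_t = sigmoid(params["w_o"] @ z + params["b_o"])   # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def zero_params(input_size, hidden_size):
    """All-zero weights and offsets: every gate outputs 0.5 and g_t is 0."""
    shape = (hidden_size, hidden_size + input_size)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"w_{gate}"] = np.zeros(shape)
        params[f"b_{gate}"] = np.zeros(hidden_size)
    return params


params = zero_params(input_size=12, hidden_size=4)
h_t, c_t = lstm_step(
    x_t=np.ones(12), h_prev=np.zeros(4), c_prev=np.ones(4), params=params
)
```

With zero weights the forget gate keeps half of the previous cell state, so `c_t` is 0.5 in every unit and `h_t` is `0.5 * tanh(0.5)`.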
The structure of the 1D CNN-LSTM is shown in Figure 7. The two convolution layers in the 1D CNN both adopt convolution kernels of length 3, with 16 and 32 convolution kernels, respectively, and 2 × 2 maximum pooling is adopted for the pooling layer. After the 1D CNN performs the preliminary extraction of environmental factor features, the LSTM network reasons over the contextual time sequence information to capture the complex multi-source key information features of 'aphid-environmental factors'.
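A network with this layout could be written as follows. This is a hedged sketch, not the authors' code: the paper does not name its deep learning framework, so PyTorch is an assumption, as are the class name `CnnLstmSketch`, the ReLU activations, the padding, the single pooling layer with window 2, the hidden size of 64, and the input window of 8 time steps. Only the kernel length (3), the filter counts (16 and 32), the 12 input features, and the 2 outputs come from the text.

```python
import torch
from torch import nn


class CnnLstmSketch(nn.Module):
    """Two Conv1d layers (16 and 32 kernels of length 3), max pooling, then an LSTM."""

    def __init__(self, n_features=12, hidden_size=64, n_targets=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_features, 16, kernel_size=3, padding=1),
            nn.ReLU(),                      # activation is an assumption
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),    # halves the time dimension
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size, batch_first=True)
        # Two regression outputs: aphid number and aphid strain rate.
        self.head = nn.Linear(hidden_size, n_targets)

    def forward(self, x):
        # x: (batch, time_steps, n_features); Conv1d expects channels first.
        z = self.features(x.transpose(1, 2))    # (batch, 32, time_steps // 2)
        out, _ = self.lstm(z.transpose(1, 2))   # (batch, time_steps // 2, hidden)
        return self.head(out[:, -1, :])         # prediction from the last step


model = CnnLstmSketch()
optimizer = torch.optim.Adam(model.parameters())    # Adam, as stated in Section 2.4
prediction = model(torch.zeros(4, 8, 12))           # 4 windows of 8 time steps
```

The convolution runs along the time axis with the 12 environmental features as channels; treating the features as the convolved axis instead would be an equally plausible reading of the description.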
The above approach builds a greenhouse aphid early prediction model based on 1D CNN-LSTM. The model training process is shown in Figure 8. In the training set, the 1D CNN network is used for the feature extraction of aphid multi-source information. On this basis, the LSTM network is used to strengthen the ability of the 1D CNN network to process time series during training on the aphid multi-source information. When the root mean square error satisfies $RMSE < 0.2$ and the loss converges toward 0, the model is considered to have achieved the expected training effect. The trained model parameters are then saved for model testing, and the RMSE, the mean absolute error (MAE), and R-squared ($R^2$) are used as evaluation indicators to further investigate the prediction performance of the 1D CNN-LSTM model for the number of aphids and the aphid strain rate in the greenhouse. If the requirements are not met, the activation function in the 1D CNN and the optimizer used for model training are readjusted.
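The three evaluation indicators have standard definitions, sketched here in NumPy. The function names are hypothetical, the sample values are made up to keep the arithmetic easy to follow, and the 0.2 threshold is the training criterion quoted above.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
scores = {
    "rmse": rmse(observed, predicted),
    "mae": mae(observed, predicted),
    "r2": r_squared(observed, predicted),
}
training_done = scores["rmse"] < 0.2  # stopping criterion from the text
```

A single large error dominates the RMSE more than the MAE, which is why the two are reported together.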

2.4. Experimental Setting

The experiments described in this study were conducted on a Windows 10 operating system, utilizing an Intel(R) Core(TM) i5-9300H CPU running at 2.40 GHz and a GeForce GTX 1650 graphics card. For GPU (Graphics Processing Unit) acceleration, we employed CUDA (Compute Unified Device Architecture) version 10.2, which enhances the performance of graphics computing tasks. Adam was used as the optimizer, and the training effect of the model was evaluated through the RMSE output and the training loss for the number of aphids and the aphid strain rate. In addition, the experimental hyperparameter settings can be found in Table 2.

3. Results

3.1. Analysis of Primary Fusion Experiments

To illustrate the principle of the weighted fusion algorithm, this paper takes the temperature value from the temperature and humidity sensors as an example. Table 3 shows the temperatures collected at different times by the temperature and humidity sensors (labeled A) in some planting areas. As can be seen from the table, there is a certain gap between the temperature collected at each node and the given value, and the absolute error between the measured value and the given value varies from 0 to 11.8.
According to the aforementioned Grubbs criterion, the critical coefficient G0 is determined by the number of sensors N and the confidence level P. The confidence level P reflects how strictly data are eliminated and can be adjusted according to the situation. Here, P = 95% and N = 7, which gives G0 = 1.938. A reading is judged to be abnormal, and is eliminated, if Gi > G0. The Gi values of the homogeneous sensors at each position for the eight groups of data are shown in Table 4.
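The Grubbs check described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `grubbs_statistics` is hypothetical, and the use of the population standard deviation is an assumption, chosen because it reproduces the first row of Table 4 from the first row of Table 3.

```python
from statistics import mean, pstdev

G0 = 1.938  # critical value quoted in the text for N = 7, P = 95%


def grubbs_statistics(readings):
    """Return G_i = |x_i - mean| / std for every reading in one group."""
    centre = mean(readings)
    # Population standard deviation (assumption, see the lead-in).
    spread = pstdev(readings)
    return [abs(x - centre) / spread for x in readings]


# Group 1 of Table 3: readings of sensors A1..A7 at the same time point.
group_1 = [24.7, 25.3, 25.4, 20.4, 25.6, 25.8, 26.3]

g_values = grubbs_statistics(group_1)
# Flag every sensor whose statistic exceeds the critical value.
outliers = [f"A{i + 1}" for i, g in enumerate(g_values) if g > G0]

print([round(g, 3) for g in g_values])
print(outliers)
```

For this group only the reading at position A4 exceeds G0, so only that reading would be discarded before fusion.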
According to Table 4, the Gi value of the sensor at position A4 exceeds G0 in all eight groups, indicating that this sensor may be damaged or abnormal, so its data are removed. Based on the remaining data, we computed the variance of the data collected by each homogeneous sensor. The variance was used to determine the weight assigned to each homogeneous sensor according to Formula (2). The resulting variances and weights of the different sensors are shown in Table 5. By multiplying the data collected at each node by the corresponding sensor weight and summing the results, we obtained the optimal estimated values of the environmental factors in the planting areas. The fused values obtained after the weighted fusion process are shown in Table 6.
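The weighting and fusion step can be illustrated as follows. Formula (2) is not reproduced in this section, so the sketch assumes inverse-variance weighting, $w_n = (1/D_{x_n}) / \sum_j (1/D_{x_j})$, which matches the weights in Table 5 to four decimal places. The function names are hypothetical, and the variances are taken from Table 5 as given rather than recomputed.

```python
# Variances of the six retained sensors, copied from Table 5 (A4 was removed).
variances = {
    "A1": 3.7365, "A2": 3.8135, "A3": 3.7527,
    "A5": 3.7560, "A6": 3.9272, "A7": 3.6390,
}


def inverse_variance_weights(variance_by_node):
    """Weight each node by 1/variance, normalised so the weights sum to 1."""
    inverse = {node: 1.0 / var for node, var in variance_by_node.items()}
    total = sum(inverse.values())
    return {node: value / total for node, value in inverse.items()}


def fuse(readings, weights):
    """Weighted sum of the readings of all retained nodes."""
    return sum(weights[node] * readings[node] for node in weights)


# Group 1 of Table 3 without the rejected A4 reading.
group_1 = {"A1": 24.7, "A2": 25.3, "A3": 25.4, "A5": 25.6, "A6": 25.8, "A7": 26.3}

weights = inverse_variance_weights(variances)
fused = fuse(group_1, weights)

print({node: round(w, 4) for node, w in weights.items()})
print(round(fused, 1))
```

Under this assumption the fused value for group 1 rounds to 25.5, which agrees with the first row of Table 6.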
According to Table 6, the absolute error between the fused value and the given value is within 0.1. To illustrate the algorithm, this paper uses only the temperature values measured at eight different times as a reference. As the number of actual measurements increases, the absolute error approaches zero. Thus, the weighted fusion algorithm yields more accurate estimates of the environmental characteristics of the fusion area.
In addition, the number of aphids and the aphid strain rate are influenced by the above-mentioned environmental factors. Grey correlation analysis can identify the factors that affect the occurrence of aphids. The data used to analyze the correlation with the number of aphids and the aphid strain rate in the greenhouse are shown in Table 7.
From the data in Table 7, the correlation coefficients $\gamma_1[x_0(k), x_i(k)]$ for the number of aphids and $\gamma_2[x_0(k), x_i(k)]$ for the aphid strain rate are obtained, respectively.
The correlation degrees for the number of aphids and the aphid strain rate are shown in Figure 9. Among the different factors, light intensity is most closely related to the number of aphids and the aphid strain rate, and the overall correlation degree is above 0.6, indicating that the selected multi-source environmental data are suitable for modeling the number of aphids and the aphid strain rate in the greenhouse.
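$$\gamma_1[x_0(k), x_i(k)] = \begin{bmatrix}
1.0000 & 0.8885 & 0.7690 & 0.7369 & 0.7072 & \cdots & 0.3983 & 0.3336 \\
1.0000 & 0.8898 & 0.7681 & 0.7344 & 0.7032 & \cdots & 0.3991 & 0.3341 \\
1.0000 & 0.8917 & 0.7683 & 0.7346 & 0.7036 & \cdots & 0.3989 & 0.3338 \\
1.0000 & 0.8915 & 0.7680 & 0.7353 & 0.7055 & \cdots & 0.3984 & 0.3338 \\
1.0000 & 0.8892 & 0.7673 & 0.7344 & 0.7041 & \cdots & 0.3985 & 0.3333 \\
1.0000 & 0.8899 & 0.7675 & 0.7340 & 0.7037 & \cdots & 0.3980 & 0.3334 \\
1.0000 & 0.9362 & 0.8177 & 0.8011 & 0.7793 & \cdots & 0.4258 & 0.3526 \\
1.0000 & 0.9437 & 0.8211 & 0.8265 & 0.7884 & \cdots & 0.4339 & 0.3575 \\
1.0000 & 0.9442 & 0.8180 & 0.8152 & 0.7868 & \cdots & 0.4315 & 0.3563 \\
1.0000 & 0.9288 & 0.7949 & 0.7566 & 0.7318 & \cdots & 0.4049 & 0.3396 \\
1.0000 & 0.9250 & 0.7934 & 0.7558 & 0.7347 & \cdots & 0.4054 & 0.3400 \\
1.0000 & 0.9248 & 0.7960 & 0.7565 & 0.7340 & \cdots & 0.4049 & 0.3399
\end{bmatrix}$$

$$\gamma_2[x_0(k), x_i(k)] = \begin{bmatrix}
1.0000 & 0.8389 & 0.7184 & 0.6076 & 0.5691 & \cdots & 0.3689 & 0.3339 \\
1.0000 & 0.8417 & 0.7166 & 0.6036 & 0.5630 & \cdots & 0.3705 & 0.3351 \\
1.0000 & 0.8457 & 0.7170 & 0.6039 & 0.5637 & \cdots & 0.3700 & 0.3344 \\
1.0000 & 0.8453 & 0.7164 & 0.6050 & 0.5665 & \cdots & 0.3690 & 0.3344 \\
1.0000 & 0.8405 & 0.7151 & 0.6036 & 0.5644 & \cdots & 0.3692 & 0.3333 \\
1.0000 & 0.8418 & 0.7155 & 0.6029 & 0.5637 & \cdots & 0.3683 & 0.3334 \\
1.0000 & 0.9458 & 0.8267 & 0.7192 & 0.6897 & \cdots & 0.4292 & 0.3824 \\
1.0000 & 0.9640 & 0.8348 & 0.7691 & 0.7067 & \cdots & 0.4489 & 0.3963 \\
1.0000 & 0.9653 & 0.8272 & 0.7464 & 0.7036 & \cdots & 0.4430 & 0.3927 \\
1.0000 & 0.9284 & 0.7738 & 0.6399 & 0.6076 & \cdots & 0.3825 & 0.3485 \\
1.0000 & 0.9196 & 0.7706 & 0.6384 & 0.6125 & \cdots & 0.3834 & 0.3494 \\
1.0000 & 0.9189 & 0.7763 & 0.6396 & 0.6112 & \cdots & 0.3824 & 0.3491
\end{bmatrix}$$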
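A grey correlation computation of this kind can be sketched as below. The exact formula is not given in this section, so the sketch assumes the common form $\gamma = (\Delta_{\min} + \rho\,\Delta_{\max}) / (\Delta_i(k) + \rho\,\Delta_{\max})$ with initial-value normalization and resolution coefficient $\rho = 0.5$. These choices are assumptions, although they are consistent with the leading 1.0000 column and the minimum value of 0.3333 in the coefficient matrices. The function names are hypothetical, and only the first five time points of Table 7 are used because the remaining columns are elided.

```python
def grey_relational_coefficients(reference, factors, rho=0.5):
    """Return one list of coefficients per factor sequence."""
    # Initial-value normalisation (assumed): divide each sequence by its first element.
    ref = [value / reference[0] for value in reference]
    normed = [[value / seq[0] for value in seq] for seq in factors]
    # Absolute difference between each factor and the reference at every k.
    deltas = [[abs(r - f) for r, f in zip(ref, seq)] for seq in normed]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row] for row in deltas]


def grey_relational_grades(coefficients):
    """Correlation degree of each factor: the mean of its coefficients."""
    return [sum(row) / len(row) for row in coefficients]


# First five time points from Table 7; y1 is the number of aphids.
y1 = [14, 39, 87, 103, 119]
factors = {
    "x1": [35.8, 15.3, 20.4, 23.3, 25.9],
    "x4": [64.3, 32.1, 34.6, 38.2, 42.3],
    "x7": [4923, 7410, 9980, 13255, 15653],
}

coeffs = grey_relational_coefficients(y1, list(factors.values()))
grades = dict(zip(factors, grey_relational_grades(coeffs)))
print(grades)
```

Because only a subset of the data is available here, the printed degrees are illustrative and are not expected to match Figure 9.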

3.2. Analysis of Secondary Fusion Experiments

To further investigate the prediction performance of the 1D CNN-LSTM model for the number of aphids and the aphid strain rate in the greenhouse, the evaluation indicators RMSE, MAE, and $R^2$ are used.
The experimental hyperparameters are as follows: the number of iterations is 100, the batch size is 128, the initial learning rate is 0.001, the number of LSTM neurons is 128, and Adam is used as the optimizer. The training effect of the model is evaluated through the RMSE and the training loss for the number of aphids and the aphid strain rate. The mathematical expressions of RMSE, MAE, and $R^2$ are as follows, where $y_\lambda$ represents the true value of the $\lambda$th sample, $\hat{y}_\lambda$ represents the predicted value of the $\lambda$th sample, $\bar{y}$ represents the mean of the true values, and $m$ represents the total number of samples in the test set. MAE is the mean absolute error between the predicted and true values, RMSE is the square root of the mean squared deviation between the true and predicted values, and $R^2$ reflects the goodness of fit of the model. Smaller MAE and RMSE values and a larger $R^2$ value indicate stronger predictive ability.
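$$RMSE = \sqrt{\frac{1}{m}\sum_{\lambda=1}^{m}\left(y_\lambda - \hat{y}_\lambda\right)^2}$$
$$MAE = \frac{1}{m}\sum_{\lambda=1}^{m}\left|y_\lambda - \hat{y}_\lambda\right|$$
$$R^2 = 1 - \frac{\sum_{\lambda=1}^{m}\left(\hat{y}_\lambda - y_\lambda\right)^2}{\sum_{\lambda=1}^{m}\left(\bar{y} - y_\lambda\right)^2}$$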
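The three indicators can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the function names and the small sample vectors are illustrative and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over m samples."""
    m = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m)


def mae(y_true, y_pred):
    """Mean absolute error over m samples."""
    m = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / m


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((mean_true - y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

print(rmse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

In this toy case a single error of 1 on four samples gives an RMSE of 0.5, an MAE of 0.25, and an $R^2$ of 0.8.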
To investigate the impact of different activation functions on the 1D CNN-LSTM multi-source information early prediction model, RMSE and loss experiments were run with the LeakyReLu, clippedReLu, swish, and ReLu activation functions. The model training results under the different activation functions are shown in Table 8.
According to Table 8, with ReLu as the activation function the model achieved the lowest RMSE (1.503) and loss (1.130) in predicting the number of aphids and the aphid strain rate in different climates, and it converged better. Therefore, ReLu was adopted as the activation function for predicting the number of aphids and the aphid strain rate in different climates. The training curve of the 1D CNN-LSTM aphid multi-source information fusion model with ReLu as the activation function is shown in Figure 10.
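For reference, the four activation functions compared above can be written as scalar functions. This is only a sketch of their standard definitions: the leaky slope `scale=0.01` and the clipping `ceiling=10.0` are hypothetical values, since the paper does not state the parameters it used.

```python
import math


def relu(x):
    """ReLu: pass positive inputs, zero out negative ones."""
    return max(0.0, x)


def leaky_relu(x, scale=0.01):
    """LeakyReLu: negative inputs are scaled instead of zeroed (scale is assumed)."""
    return x if x >= 0 else scale * x


def clipped_relu(x, ceiling=10.0):
    """clippedReLu: ReLu with an upper bound (ceiling is assumed)."""
    return min(max(0.0, x), ceiling)


def swish(x):
    """swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


for value in (-2.0, 0.0, 2.0, 25.0):
    print(value, relu(value), leaky_relu(value), clipped_relu(value), swish(value))
```

The functions differ only in how they treat negative and very large inputs, which is where differences in convergence such as those in Table 8 would originate.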
To demonstrate the advantage of 1D CNN-LSTM as a multi-source aphid information prediction model, the 1D CNN-LSTM network with ReLu as the activation function was compared with the standalone 1D CNN and LSTM networks and with the traditional BP neural network model. The experimental results are shown in Table 9.
As can be seen from Table 9, the aphid multi-source information prediction model using 1D CNN-LSTM performs best in predicting the number of aphids and the aphid strain rate. Its overall RMSE is lower than that of the 1D CNN, LSTM, and BP neural networks by 6.891, 7.513, and 33.980, respectively, indicating that the aphid multi-source information prediction method based on 1D CNN-LSTM has clear advantages.
To assess the prediction performance of the model in the actual environment, the saved, trained 1D CNN-LSTM network is used to predict the number of aphids and the aphid strain rate in the test set, and the predicted number of aphids is rounded. The prediction performance for the number of aphids and the aphid strain rate in the greenhouse is evaluated using RMSE, MAE, and $R^2$. The prediction performance results are shown in Table 10.
It can be seen from Table 10 that the traditional BP neural network has the worst prediction performance for the number of aphids and the aphid strain rate and cannot meet actual application requirements. The prediction performance of the 1D CNN-LSTM multi-source information early prediction model on the test set is consistent with that on the training set. For the number of aphids, its RMSE is lower than that of the 1D CNN and LSTM models by 5.777 and 10.253, and its MAE is lower by 4.900 and 7.800, respectively. For the aphid strain rate, its RMSE is lower than that of the 1D CNN and LSTM models by 3.030 and 2.206, and its MAE is lower by 2.390 and 1.700, respectively. With an $R^2$ of 0.999 for both targets, the model fits well and shows a certain degree of robustness, so it can be used for indoor pest prediction in actual scenarios.
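The differences quoted above follow directly from Table 10 and can be rechecked with a few lines of arithmetic. The sketch below copies the RMSE and MAE values from the table; the dictionary layout and the helper name `reductions` are illustrative choices.

```python
# (RMSE, MAE) per model on the test set, copied from Table 10.
aphid_number = {
    "1D CNN-LSTM": (1.378, 0.900),
    "1D CNN": (7.155, 5.800),
    "LSTM": (11.631, 8.700),
}
strain_rate = {
    "1D CNN-LSTM": (0.337, 0.260),
    "1D CNN": (3.367, 2.650),
    "LSTM": (2.543, 1.960),
}


def reductions(table, proposed="1D CNN-LSTM"):
    """Baseline error minus the proposed model's error, for RMSE and MAE."""
    base_rmse, base_mae = table[proposed]
    return {
        model: (round(model_rmse - base_rmse, 3), round(model_mae - base_mae, 3))
        for model, (model_rmse, model_mae) in table.items()
        if model != proposed
    }


number_gap = reductions(aphid_number)
rate_gap = reductions(strain_rate)
print(number_gap)
print(rate_gap)
```

Each entry is the amount by which the proposed model's error is lower than the corresponding baseline.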

4. Discussion

Multi-source information fusion combines multi-level and multi-faceted information from multiple sensors in a coordinated way to arrive at consistent conclusions. Compared with decision-making methods based on a single information source, it offers greater reliability and precision and overcomes the ambiguity and one-sidedness of single information sources. Multi-source information fusion allows the value of data to be explored further, enabling more precise decision making based on scientific data analysis and modeling and thereby reducing the likelihood of decision-making errors.
This paper presents a pest multi-source information prediction method that integrates the 1D CNN-LSTM architecture to monitor environmental factors in a greenhouse setting for pepper cultivation. The 1D CNN model extracts the features of environmental factors, while LSTM conducts the reasoning of contextual time sequence information. In our evaluation, we compare the proposed model against established network architectures, including the 1D CNN, LSTM, and BP networks. Through grey correlation analysis, we examine the relationship between environmental factors and aphid populations and determine the degree of correlation. Using data from various sensors to extract relevant environmental factors, the results indicate that our model outperforms these existing methods in predicting aphid numbers and strain rates, with lower RMSE and MAE on the test set. Our model can accurately predict short-term aphid numbers and the aphid strain rate of pepper aphids in the greenhouse, thereby informing pest prevention strategies and demonstrating significant value for plant protection. The ability to effectively fuse multi-source information therefore enhances the model's predictive power, making it a valuable tool for growers aiming to optimize pest control measures. It is worth noting that the proposed system has certain limitations. For instance, its performance is highly dependent on the quality and quantity of the input data collected from sensors. Additionally, the model may require extensive training data to generalize effectively across different greenhouse environments. Addressing these limitations in future research could further enhance the robustness and applicability of the pest prediction system.
Furthermore, predicting the number and aphid strain rate of pepper aphids in the greenhouse has a specific guiding role in the plant protection and control of pepper. The occurrence of aphids is influenced by multiple factors, not solely environmental conditions such as temperature, humidity, and light. While these factors play a significant role in aphid development and reproduction rates, the impact of natural enemies, such as predators and parasitoids, should not be overlooked. These natural enemies can significantly reduce aphid populations, and their presence in the greenhouse can alter the dynamics of aphid occurrence and survival. To gain deeper insights into aphid population dynamics, the next task is to analyze the interactions between aphids, environmental factors, and natural enemies. By studying these interactions in a controlled greenhouse setting, researchers can assess how varying conditions influence aphid populations and the effectiveness of their natural enemies in controlling these pests.

5. Conclusions

In this study, we investigated a two-level distributed multi-source information fusion approach that integrates the 1D CNN and LSTM for predicting aphid populations and strain rates in greenhouse environments. By employing a weighted average fusion algorithm to refine environmental parameter accuracy and a heterogeneous sensor fusion algorithm to integrate diverse data sources, this study establishes a framework for modeling the relationship between environmental factors and aphid dynamics. Notably, grey correlation analysis identified light intensity as the variable most closely related to aphid populations, which can inform targeted pest management strategies. The proposed model achieves a total root mean square error of 1.503 in training, lower than that of the 1D CNN, LSTM, and BP networks. On the test set, the root mean square error is 1.378 for aphid number prediction and 0.337 for strain rate prediction. The practical implications of this research extend to the development of intelligent agricultural systems capable of real-time pest monitoring and management. Future directions include optimizing the model for varying greenhouse conditions, exploring additional environmental factors, and integrating further artificial intelligence techniques to refine the predictive capabilities of the system. This research could thus pave the way for more resilient and sustainable agricultural practices.

Author Contributions

Conceptualization, X.L., L.W. and M.D.; methodology, X.L. and L.W.; software, L.W.; validation, X.L., L.W. and W.S.; formal analysis, M.W.; investigation, W.S. and Y.Z.; resources, H.M.; data curation, X.L.; writing—original draft preparation, X.L. and L.W.; writing—review and editing, M.D., M.W. and H.M.; visualization, Y.Z.; supervision, W.S., Y.Z. and H.M.; project administration, H.M.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Key Research and Development Program of China (No. 2024YFD2000800), the Jiangsu Province Modern Agricultural Machinery Equipment and Technology Promotion Project (No. NJ2024-22), and the Jiangsu Provincial Key Research and Development Program Project (Modern Agriculture) (No. BE2021330).

Data Availability Statement

All datasets used in this study are included in the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chen, Q.; Liang, X.; Wu, C. Trait inheritance in pepper (Capsicum spp.) cultivars identified as resistant to green peach aphid (Myzus persicae). Plant Breed. 2020, 139, 996–1002.
2. Sun, M.; Voorrips, R.E.; Steenhuis-Broers, G.; van’t Westende, W.; Vosman, B. Reduced phloem uptake of Myzus persicae on an aphid resistant pepper accession. BMC Plant Biol. 2018, 18, 138.
3. Calicioglu, O.; Flammini, A.; Bracco, S.; Bellu, L.; Sims, R. The Future Challenges of Food and Agriculture: An Integrated Analysis of Trends and Solutions. Sustainability 2019, 11, 222.
4. Arcanjo, L.d.P.; da Silva, E.M.; de Araujo, T.A.; Barreto Crespo, A.L.; Santana Junior, P.A.; Oliveira Gomes, G.B.; Picanco, M.C. Decision-making systems for management of the invasive pest Neoleucinodes elegantalis (Guenee) (Lepidoptera: Crambidae) in commercial tomato crops according to insecticide spray method and plant stage. Crop Prot. 2021, 140, 105408.
5. Helps, J.C.; van den Bosch, F.; Paveley, N.; Jorgensen, L.N.; Holst, N.; Milne, A.E. A framework for evaluating the value of agricultural pest management decision support systems. Eur. J. Plant Pathol. 2024, 169, 887–902.
6. Soubeyrand, S.; Estoup, A.; Cruaud, A.; Malembic-Maher, S.; Meynard, C.; Ravigne, V.; Barbier, M.; Barres, B.; Berthier, K.; Boitard, S.; et al. Building integrated plant health surveillance: A proactive research agenda for anticipating and mitigating disease and pest emergence. CABI Agric. Biosci. 2024, 5, 72.
7. Rustia, D.J.A.; Lin, C.E.; Chung, J.-Y.; Zhuang, Y.-J.; Hsu, J.-C.; Lin, T.-T. Application of an image and environmental sensor network for automated greenhouse insect pest monitoring. J. Asia-Pac. Entomol. 2020, 23, 17–28.
8. Jha, P.K.; Zhang, N.; Rijal, J.P.; Parker, L.E.; Ostoja, S.; Pathak, T.B. Climate change impacts on insect pests for high value specialty crops in California. Sci. Total Environ. 2024, 906, 167605.
9. Tonnang, H.E.; Sokame, B.M.; Abdel-Rahman, E.M.; Dubois, T. Measuring and modelling crop yield losses due to invasive insect pests under climate change. Curr. Opin. Insect Sci. 2022, 50, 100873.
10. Bairwa, B.; Singh, P.S.; Meena, R.S. Impact of weather factors on population abundance of major insect pest on mungbean Vigna radiata (L.) Wilczek in Gangetic plains. J. Exp. Zool. India 2016, 19, 285–288.
11. Machekano, H.; Mutamiswa, R.; Mvumi, B.M.; Nyabako, T.; Shaw, S.; Nyamukondiwa, C. Disentangling factors limiting diamondback moth, Plutella xylostella (L.), spatio-temporal population abundance: A tool for pest forecasting. J. Appl. Entomol. 2019, 143, 670–682.
12. Levi-Mourao, A.; Madeira, F.; Meseguer, R.; Garcia, A.; Pons, X. Effects of Temperature and Relative Humidity on the Embryonic Development of Hypera postica Gyllenhal (Col.: Curculionidae). Insects 2021, 12, 250.
13. Barton, M.; Parry, H.; Ward, S.; Hoffmann, A.A.; Umina, P.A.; van Helden, M.; Macfadyen, S. Forecasting impacts of biological control under future climates: Mechanistic modelling of an aphid pest and a parasitic wasp. Ecol. Model. 2021, 457, 109679.
14. Eigenbrode, S.D.; Adhikari, S. Climate change and managing insect pests and beneficials in agricultural systems. Agron. J. 2023, 115, 2194–2215.
15. Li, C.; Zhong, H.; Ning, W.; Hu, G.; Wu, M.; Liu, Y.; Yan, B.; Ren, H.; Sonne, C. Integrating climate-pest interactions into crop projections for sustainable agriculture. Nat. Food 2024, 5, 447–450.
16. Ma, G.; Ma, C.-S. Potential distribution of invasive crop pests under climate change: Incorporating mitigation responses of insects into models. Curr. Opin. Insect Sci. 2022, 49, 15–21.
17. Seidel, P. Climate Change as a New Challenge for Plant and Pest Modeling—A Critical Review. Gesunde Pflanz. 2017, 69, 1–14.
18. Skendzic, S.; Zovko, M.; Zivkovic, I.P.; Lesic, V.; Lemic, D. The Impact of Climate Change on Agricultural Insect Pests. Insects 2021, 12, 440.
19. Cai, J.; Xiao, D.; Lv, L.; Ye, Y. An early warning model for vegetable pests based on multidimensional data. Comput. Electron. Agric. 2019, 156, 217–226.
20. Campos, J.C.; Manrique-Silupú, J.; Dorneanu, B.; Ipanaqué, W.; Arellano-García, H. A smart decision framework for the prediction of thrips incidence in organic banana crops. Ecol. Model. 2022, 473, 110147.
21. Ibrahim, E.A.; Salifu, D.; Mwalili, S.; Dubois, T.; Collins, R.; Tonnang, H.E.Z. An expert system for insect pest population dynamics prediction. Comput. Electron. Agric. 2022, 198, 107124.
22. Tay, A.; Lafont, F.; Balmat, J.-F. Forecasting pest risk level in roses greenhouse: Adaptive neuro-fuzzy inference system vs artificial neural networks. Inf. Process. Agric. 2021, 8, 386–397.
23. Yang, L.-n.; Peng, L.; Zhang, L.-m.; Zhang, L.-l.; Yang, S.-s. A prediction model for population occurrence of paddy stem borer (Scirpophaga incertulas), based on Back Propagation Artificial Neural Network and Principal Components Analysis. Comput. Electron. Agric. 2009, 68, 200–206.
24. Wahyono, T.; Heryadi, Y.; Soeparno, H.; Abbas, B.S. Enhanced LSTM Multivariate Time Series Forecasting for Crop Pest prediction. ICIC Express Lett. 2020, 14, 943–949.
25. Wang, J.; Zhang, D. Intelligent pest forecasting with meteorological data: An explainable deep learning approach. Expert Syst. Appl. 2024, 252, 124137.
26. Zhang, H.; He, B.; Xing, J.; Lu, M. Deep spatial and temporal graph convolutional network for rice planthopper population dynamic forecasting. Comput. Electron. Agric. 2023, 210, 107868.
27. Chen, C.-J.; Li, Y.-S.; Tai, C.-Y.; Chen, Y.-C.; Huang, Y.-M. Pest incidence forecasting based on Internet of Things and Long Short-Term Memory Network. Appl. Soft Comput. 2022, 124, 108895.
Figure 1. The overall architecture of an aphid early prediction system. It is composed of the perception layer, the transmission layer, and the application layer. The perception layer is responsible for the environmental data acquisition of multiple sensors in the greenhouse; the transmission layer is responsible for data transmission to the cloud; the application layer is responsible for aphid monitoring and forecasting in the greenhouse.
Figure 2. The framework of a two-level distributed multi-source information fusion model. It is composed of the primary fusion and the secondary fusion. The former is responsible for fusing sensing data such as temperature, humidity, light intensity, and carbon dioxide concentration, while the latter is responsible for extracting and integrating early-stage feature information.
Figure 3. A homogeneous sensor fusion model using a weighted fusion algorithm. For a certain type of sensor, such as a temperature sensor, when placed at various positions, the corresponding temperature xk (k = 1, 2, …, n) at each location can be acquired. By assigning specific weights wk (k = 1, 2, …, n), accurate predictions of the indoor environmental parameter x in the greenhouse can be achieved.
Figure 4. The flow of aphid impact factors based on grey correlation degree analysis. Select the number of aphids and the rate of aphids as the reference number sequence and calculate the correlation coefficient of each influence factor.
Figure 5. Pooling operation: (a) maximum pooling; (b) average pooling.
Figure 6. Structure diagram of the LSTM network. A single neuron is composed of an input gate, output gate, and forget gate in the basic structure of LSTM.
Figure 7. Multi-source information early prediction model of aphids based on 1D CNN-LSTM. One-dimensional time series data regarding multi-source information are used as the input, features are extracted through convolution and pooling operations, and then the LSTM network is employed to enhance the processing ability of multi-source information features.
Figure 8. The training flow chart of the early prediction model of aphids based on 1D CNN-LSTM: 1D CNN extracts the features of environmental factors while LSTM conducts the reasoning of contextual time sequence information.
Figure 9. Correlation analysis results for multi-source data of aphid information. By conducting the calculation of the grey correlation degree, the correlation degree between the multi-source information of different sensors for environmental factors of aphids and the number of aphids and the aphid strain rate was obtained.
Figure 10. Training curve of the 1D CNN-LSTM model with ReLu as the activation function.
Table 1. The existing pest monitoring methods.
| Author(s) | Year | Method Type | Key Contributions | Key Findings |
|---|---|---|---|---|
| Zhang et al. [8] | 2024 | Expert Observation | Analyzed climate change effects on pest dynamics. | Climate change significantly influences pest populations and life cycles. |
| Sokame et al. [9] | 2022 | Expert Observation | Studied temperature and CO2 effects on pests. | Increased temperature and CO2 accelerate pest metabolism and increase populations. |
| Singh et al. [10] | 2016 | Expert Observation | Investigated weather impacts on mung bean pests. | Highest temperature positively correlated, while lowest temperature and wind negatively correlated. |
| Mutamiswa et al. [11] | 2019 | Expert Observation | Analyzed environmental factors affecting moth populations. | Positive correlation with average and highest temperatures; negative with humidity. |
| Madeira et al. [12] | 2021 | Expert Observation | Combined lab and field methods to study the weevil’s response to temperature and humidity. | High temperature and low humidity reduce survival and growth rates. |
| Xiao et al. [19] | 2019 | Machine Learning | Developed pest recognition model with BP neural network. | Enhanced multidimensional data collection for pest forecasting. |
| Manrique-Silupú et al. [20] | 2022 | Machine Learning | Created a framework using SVR for pest incidence prediction. | SVR demonstrated strong performance in predicting pest occurrences in organic banana crops. |
| Ibrahim et al. [21] | 2022 | Machine Learning | Used fuzzy neural networks to model fruit fly dynamics. | Climatic variables significantly affect pest population dynamics. |
| Lafont et al. [22] | 2021 | Machine Learning | Studied ANNs and ANFIS for pest risk forecasting in greenhouses. | Internal temperature and humidity are critical for pest risk levels. |
| Yang et al. [23] | 2009 | Machine Learning | Developed an ANN model for predicting paddy stem borer populations. | Non-linear relationships were identified between meteorological factors and pest populations. |
| Heryadi et al. [24] | 2020 | Deep Learning | Addressed pest attack forecasting using a Bidirectional LSTM model. | Effective forecasting of crop pest attacks with time-series weather data. |
| Wang and Zhang [25] | 2024 | Deep Learning | Proposed ALIC model to extract relationships between outbreaks and meteorological variables. | Validated the model’s effectiveness in pest forecasting. |
| He et al. [26] | 2023 | Deep Learning | Used attention-based LSTM for rice plant hopper population dynamics. | Significant meteorological impacts observed in pest populations. |
| Chen et al. [27] | 2022 | Deep Learning | Established LSTM and ML models to study pest–environment correlations. | Found significant correlations between pest dynamics and environmental factors. |
Table 2. The experimental settings for algorithm parameters and their values.
| Parameter | Value |
|---|---|
| Iterations | 100 |
| Batch size | 128 |
| Learning rate | 0.001 |
| Number of LSTM neurons | 128 |
Table 3. Temperature collection of sensors at different positions in planting region A.
| No. | Given Value | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|---|
| 1 | 25.6 | 24.7 | 25.3 | 25.4 | 20.4 | 25.6 | 25.8 | 26.3 |
| 2 | 27.5 | 26.6 | 26.9 | 27.5 | 21.2 | 27.9 | 27.5 | 27.8 |
| 3 | 28.8 | 28.3 | 28.4 | 28.7 | 35.3 | 28.9 | 28.9 | 28.7 |
| 4 | 29.2 | 29.5 | 28.9 | 29.1 | 18.1 | 30.2 | 29.2 | 29.5 |
| 5 | 29.9 | 29.3 | 29.7 | 29.7 | 19.9 | 30.1 | 30.6 | 29.7 |
| 6 | 32.4 | 31.8 | 32.2 | 32.3 | 42.4 | 32.6 | 32.5 | 32.3 |
| 7 | 34.9 | 33.9 | 34.6 | 34.8 | 23.1 | 35.1 | 35.5 | 35.3 |
| 8 | 36.7 | 36.0 | 36.4 | 36.5 | 27.5 | 36.9 | 37.2 | 36.6 |
Table 4. Gi value measured by the homogeneous sensor.
| No. | G1 | G2 | G3 | G4 | G5 | G6 | G7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.046 | 0.279 | 0.333 | 2.375 | 0.441 | 0.549 | 0.820 |
| 2 | 0.052 | 0.188 | 0.461 | 2.402 | 0.643 | 0.461 | 0.597 |
| 3 | 0.556 | 0.514 | 0.385 | 2.439 | 0.300 | 0.300 | 0.385 |
| 4 | 0.431 | 0.280 | 0.331 | 2.438 | 0.608 | 0.356 | 0.431 |
| 5 | 0.249 | 0.363 | 0.363 | 2.435 | 0.477 | 0.620 | 0.363 |
| 6 | 0.544 | 0.431 | 0.403 | 2.444 | 0.318 | 0.346 | 0.403 |
| 7 | 0.172 | 0.341 | 0.389 | 2.433 | 0.462 | 0.558 | 0.510 |
| 8 | 0.219 | 0.339 | 0.370 | 2.437 | 0.494 | 0.588 | 0.401 |
Table 5. Variance and weight assignment of different temperature and humidity sensor measurements in area A.
| Node | Variance $D_{x_n}$ | Weight $w_n$ |
|---|---|---|
| A1 | 3.7365 | 0.1681 |
| A2 | 3.8135 | 0.1647 |
| A3 | 3.7527 | 0.1674 |
| A5 | 3.7560 | 0.1672 |
| A6 | 3.9272 | 0.1600 |
| A7 | 3.6390 | 0.1726 |
Table 6. The fusion results obtained by using the weighted fusion algorithm.
| No. | Given Value | Fused Value | Absolute Error |
|---|---|---|---|
| 1 | 25.6 | 25.5 | 0.1 |
| 2 | 27.5 | 27.4 | 0.1 |
| 3 | 28.8 | 28.7 | 0.1 |
| 4 | 29.2 | 29.3 | 0.1 |
| 5 | 29.9 | 29.9 | 0 |
| 6 | 32.4 | 32.3 | 0.1 |
| 7 | 34.9 | 34.9 | 0 |
| 8 | 36.7 | 36.6 | 0.1 |
Table 7. The data on the correlation between the number of aphids and the aphid strain rate in the climate chamber.
| Variable | 1 | 2 | 3 | 4 | 5 | ... | k − 1 | k |
|---|---|---|---|---|---|---|---|---|
| y1 | 14 | 39 | 87 | 103 | 119 | ... | 408 | 537 |
| y2 | 2.7 | 5.3 | 10.0 | 15.7 | 18.3 | ... | 39.0 | 45.3 |
| x1 | 35.8 | 15.3 | 20.4 | 23.3 | 25.9 | ... | 27.3 | 29.5 |
| x2 | 30.3 | 13.9 | 16.4 | 17.1 | 17.3 | ... | 26.1 | 27.6 |
| x3 | 32.5 | 16.4 | 17.8 | 18.5 | 19.1 | ... | 27.0 | 28.1 |
| x4 | 64.3 | 32.1 | 34.6 | 38.2 | 42.3 | ... | 49.7 | 55.3 |
| x5 | 60.2 | 26.8 | 31.2 | 33.9 | 36.5 | ... | 47.2 | 47.3 |
| x6 | 63.2 | 29.1 | 33.1 | 34.7 | 37.2 | ... | 46.0 | 49.9 |
| x7 | 4923 | 7410 | 9980 | 13,255 | 15,653 | ... | 18,760 | 19,030 |
| x8 | 3785 | 6300 | 8030 | 12,925 | 13,092 | ... | 17,523 | 17,400 |
| x9 | 4200 | 7037 | 8541 | 13,010 | 14,322 | ... | 18,440 | 18,535 |
| x10 | 251 | 338 | 343 | 330 | 405 | ... | 386 | 459 |
| x11 | 220 | 278 | 291 | 283 | 378 | ... | 349 | 415 |
| x12 | 233 | 293 | 326 | 305 | 394 | ... | 356 | 436 |
Table 8. Results of model training under different activation functions.
| Activation Function | RMSE | Loss |
|---|---|---|
| LeakyReLU | 2.042 | 2.085 |
| clippedReLU | 1.547 | 1.197 |
| swish | 1.627 | 1.324 |
| ReLU | 1.503 | 1.130 |
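For reference, the four activation functions compared in Table 8 can be written out in their commonly used forms. This is a minimal sketch, not the authors' implementation: the leaky slope (`scale=0.01`) and the clipping ceiling (`ceiling=6.0`) are assumed values, since this section does not report the settings used in the experiments.

```python
import math


def relu(x):
    """ReLU: pass positive inputs, zero out negative ones."""
    return max(0.0, x)


def leaky_relu(x, scale=0.01):
    """Leaky ReLU: negative inputs are scaled instead of zeroed.

    scale=0.01 is an assumed value, not taken from the article.
    """
    return x if x >= 0 else scale * x


def clipped_relu(x, ceiling=6.0):
    """Clipped ReLU: ReLU with an upper bound on the output.

    ceiling=6.0 is an assumed value, not taken from the article.
    """
    return min(max(0.0, x), ceiling)


def swish(x):
    """Swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 2.0, 8.0):
    print(x, relu(x), leaky_relu(x), clipped_relu(x), round(swish(x), 4))
```

The functions differ only in how they treat negative inputs and, for clipped ReLU, large positive inputs, which is the behaviour that the RMSE and loss comparison in Table 8 is probing.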
Table 9. Training results of different models for aphid prediction.
| Model | RMSE |
|---|---|
| 1D CNN-LSTM | 1.503 |
| 1D CNN | 8.394 |
| LSTM | 9.016 |
| BP | 35.483 |
Table 10. The predicted performance results of different models on the test set.
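| Model | RMSE (number of aphids) | MAE (number of aphids) | R² (number of aphids) | RMSE (aphid strain rate) | MAE (aphid strain rate) | R² (aphid strain rate) |
|---|---|---|---|---|---|---|
| 1D CNN-LSTM | 1.378 | 0.900 | 0.999 | 0.337 | 0.260 | 0.999 |
| 1D CNN | 7.155 | 5.800 | 0.994 | 3.367 | 2.650 | 0.931 |
| LSTM | 11.631 | 8.700 | 0.993 | 2.543 | 1.960 | 0.961 |
| BP | 22.576 | 17.900 | 0.979 | 10.915 | 8.30 | 0.1745 |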
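The three metrics reported in Table 10 follow standard definitions, sketched below in plain Python. The observed values are the first five aphid counts (y1) from Table 7; the predictions are hypothetical numbers chosen only to exercise the functions, not outputs of any model in the article.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Observed aphid counts: first five y1 values from Table 7
observed = [14, 39, 87, 103, 119]
# Hypothetical predictions, for illustration only
predicted = [15, 38, 88, 102, 120]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"R^2  = {r2(observed, predicted):.4f}")
```

RMSE penalizes large individual errors more heavily than MAE, so a model whose RMSE is much larger than its MAE is making a few large misses. R² relates the residual error to the spread of the observed values, which is why it can stay high for aphid counts that span a wide range.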
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Li, X.; Wang, L.; Dai, M.; Zhang, Y.; Su, W.; Wang, M.; Miao, H. Two-Level Distributed Multi-Source Information Fusion Model for Aphid Monitoring and Forecasting in the Greenhouse. Agronomy 2025, 15, 1044. https://doi.org/10.3390/agronomy15051044