Enhancing Corn Yield Prediction in Iowa: A Concatenate-Based 2D-CNN-BILSTM Model with Integration of Sentinel-1/2 and SoilGRIDs Data

Ensuring food security in precision agriculture demands early prediction of corn yield in the USA at international, regional, and local levels. Accurate corn yield estimation can play a crucial role in averting famine by offering insights into food availability during the growing season. To address this, we propose a Concatenate-based 2D-CNN-BILSTM model that integrates Sentinel-1, Sentinel-2, and Soil-GRIDs (global gridded soil information) data for corn yield estimation in the state of Iowa from 2018 to 2021. This approach utilizes Sentinel-2 features, including spectral bands (Blue, Green, Red, Red Edge 1/2/3, NIR, n-NIR, and SWIR 1/2) and vegetation indices (NDVI, LSWI, DVI, RVI, WDRVI, SAVI, VARIGREEN, and GNDVI), alongside Sentinel-1 features (VV, VH, the difference between VV and VH, and RVI) and soil data (Silt, Clay, Sand, CEC, and pH) as initial inputs. To extract high-level features from this data each month, a dedicated 2D-CNN was designed. This 2D-CNN concatenates high-level features from the previous month with low-level features of the subsequent month, serving as input features for the model. Additionally, to incorporate single-time soil data features, another 2D-CNN was implemented. Finally, high-level features from soil, Sentinel-1, and Sentinel-2 data were concatenated and fed into a BILSTM layer for accurate corn yield prediction. Comparative analysis against random forest (RF), Concatenate-based 2D-CNN, and 2D-CNN models, using metrics like RMSE, MAE, MAPE, and the Index of Agreement, revealed the superiority of our model. It achieved an Index of Agreement of 84.67% with an RMSE of 0.698 t/ha. The Concatenate-based 2D-CNN model also performed well with an RMSE of 0.799 t/ha and an Index of Agreement of 72.71%. The 2D-CNN model followed closely with an RMSE of 0.834 t/ha and an Index of Agreement of 69.90%. In contrast, the RF model lagged with an RMSE of 1.073 t/ha and an Index of Agreement of 69.60%.
Integration of Sentinel-1/2 and Soil-GRIDs data with the Concatenate-based 2D-CNN-BILSTM model significantly improved accuracy. Combining soil data with Sentinel-1/2 features reduced RMSE by 16 kg and increased the Index of Agreement by 2.59%. This study highlighted the potential of advanced Machine Learning (ML)/Deep Learning (DL) models in achieving precise and reliable predictions, which could support sustainable agricultural practices and food security initiatives.


Introduction
Corn is a highly significant crop in the United States (U.S.) due to its abundance of protein and oil and its high water consumption [1,2]. As the largest corn producer globally, the U.S. recorded a corn production of 15.1 billion bushels in 2021 (https://www.nass.usda.gov/Newsroom/2022/01-12-2022.php, accessed on 12 January 2022). With the rapid increase in population, the use of Remote Sensing (RS) technology in agriculture has become of paramount economic importance. By using multisensor RS images together with soil and weather data, researchers can accurately predict crop yield. Optical RS data, obtained from satellites such as Sentinel-2 and Landsat-8, provide high- to moderate-resolution imagery in visible and near-infrared bands. This enables precise assessment of vegetation health through indices like NDVI, while also offering detailed information on land cover for accurate identification of crop types, including corn [3]. Additionally, optical RS data allow the monitoring of phenological changes, facilitating the tracking of crop growth stages and overall health [4]. On the other hand, SAR (Synthetic Aperture Radar) images facilitate the structural analysis of vegetation by using different polarizations and microwave frequencies. Additionally, SAR signals penetrate cloud cover, providing valuable insights into the physical structure of crops [5]. The applications of RS in agriculture are diverse and encompass product and irrigation management, predicting crop performance, disease and fertilizer management, as well as crop classification, among other factors [3]. However, the effectiveness of these applications hinges on various factors, including temperature, rainfall, growth indicators, soil type, genotype structure, management practices, and nutrient elements [6]. Additionally, radiometric distortions have the potential to adversely affect the spectral bands of optical RS images [7]. To mitigate these challenges and enhance the accuracy of yield predictions, a multi-faceted approach is recommended. This involves integrating RS data with advanced machine learning (ML) models and employing data fusion techniques.
For example, Ma et al. suggested using a Bayesian neural network to estimate corn yield from MODIS images, the GLDAS dataset, the PRISM dataset, and SSURGO at the county level in the United States between 2005 and 2019 [8]. Desloires et al. introduced a stack of machine learning techniques, namely RF, SVR, XGBoost, and MLP, to predict corn yield based on Sentinel-2 images captured at field scale in Iowa and Nebraska from 2017 to 2021 [9]. Khaki et al. proposed the Deep-Corn network for enhancing crop yield estimation at the field scale by counting corn kernels, which used a shortened VGG-16 for feature extraction at different scales [10]. Shah-Hosseini et al. developed the Stacked LASSO method for predicting corn yield in Illinois, Indiana, and Iowa between 2000 and 2018 using observed corn yield, management data, plant population, planting date, and environmental features (weather and soil) [11]. Shah-Hosseini et al. also proposed a new CNN-DNN method for estimating corn yield using historical management, environmental, and yield data in the United States from 1980 to 2019 [12]. San et al. suggested using the CNN-RNN method for predicting corn yield using MODIS images, weather data, and soil features to extract multi-level spatiotemporal features at the county level from 2013 to 2016 [13]. Dhaliwal et al. proposed the Random Forest model for predicting corn yield using crop management data, weather data, and field-level data in the United States between 1992 and 2018 [14]. Shah-Hosseini et al. suggested combining the APSIM model with machine learning methods using plant population, planting date, and weather data for estimating corn yield in the United States between 1984 and 2018 [15].
Recent studies have demonstrated satisfactory outcomes in estimating crop yield [11,16–19]. However, they have given less consideration to the combination of radar and optical images, along with soil data, for corn yield prediction. Moreover, most of the studies used CNN-LSTM for feature extraction, and they have not fully explored the benefits of combining high-level features from the previous month with low-level features from the subsequent month to improve corn yield prediction [20]. Also, they have mostly employed Long Short-Term Memory (LSTM) networks for yield prediction, overlooking the potential of Bidirectional LSTM (Bi-LSTM) networks, which can integrate both past and future information to enhance corn yield forecasting [21].
In response to these limitations, we introduce a novel Concatenate-based 2D-CNN-BiLSTM model for corn yield estimation at the county level in Iowa. Leveraging Sentinel-1 and Sentinel-2 images along with Soil GRIDS, which provide global gridded soil information, our model aims to enhance performance during the growing season. This model offers an innovative approach to feature integration, effectively capturing short-term fluctuations and long-term trends in corn growth patterns. Additionally, the incorporation of Soil GRIDS data provides crucial insights into soil characteristics, augmenting the model's capacity to account for diverse soil conditions.

Study Area
The study area was located in the state of Iowa in the U.S. (see Figure 1). The research was conducted on corn during the years 2018 to 2021. Corn is planted in Iowa when the soil is warm enough for the seeds to grow, but not so early that frost damage becomes a risk. The timing varies depending on the location, with southern counties planting as early as April and northern counties waiting until several weeks later. Farmers in Iowa typically begin harvesting corn in mid-September, with the majority of the harvest taking place in October. However, in cooler years, the harvest may not take place until November (https://www.iowacorn.org/education/faqs, accessed on 15 January 2022).
Environ. Sci. Proc. 2024, 29, 2

Methodology
The aim of the proposed method is to improve the accuracy of county-level corn yield prediction in Iowa during the growing season, in August, prior to the harvest. As displayed in Figure 2, our proposed method includes two main steps: (1) extracting features derived from Sentinel-1, Sentinel-2, and Soil GRIDS in the Google Earth Engine (GEE) system, and (2) using the proposed Concatenate-based 2D-CNN-BiLSTM model to predict corn yield. The details of each step are briefly explained in the following subsections.
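As an illustration of the first step, the Sentinel-2 vegetation indices named in the abstract can be computed from surface reflectance bands as sketched below. This is a minimal sketch and not the authors' GEE code: the function name `vegetation_indices`, the choice of SWIR 1 for LSWI, the SAVI soil factor of 0.5, and the WDRVI weighting coefficient of 0.2 are assumptions based on the commonly used definitions of these indices. The RVI here is the optical ratio vegetation index; the Sentinel-1 RVI is a separate radar index and is not covered.

```python
def vegetation_indices(blue, green, red, nir, swir1,
                       soil_factor=0.5, wdrvi_alpha=0.2):
    """Compute common optical vegetation indices from reflectance values.

    Inputs are reflectances (floats, or NumPy arrays of equal shape).
    soil_factor and wdrvi_alpha are assumed defaults, not values from the paper.
    """
    return {
        # Normalized difference of NIR and red
        "NDVI": (nir - red) / (nir + red),
        # Same form as NDVI, with green in place of red
        "GNDVI": (nir - green) / (nir + green),
        # Land surface water index (assumes the SWIR 1 band)
        "LSWI": (nir - swir1) / (nir + swir1),
        # Difference vegetation index
        "DVI": nir - red,
        # Ratio vegetation index (optical)
        "RVI": nir / red,
        # Wide dynamic range vegetation index: NIR down-weighted by alpha
        "WDRVI": (wdrvi_alpha * nir - red) / (wdrvi_alpha * nir + red),
        # Soil-adjusted vegetation index
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
        # Visible atmospherically resistant index (green)
        "VARIGREEN": (green - red) / (green + red - blue),
    }


# Hypothetical reflectances for a densely vegetated pixel
indices = vegetation_indices(blue=0.04, green=0.08, red=0.06, nir=0.45, swir1=0.20)
```

Because the function only uses arithmetic operators, the same code works element-wise on whole image arrays.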

Corn Yields Prediction Using the Concatenate-Based 2D-CNN-BiLSTM Model
Because corn yield prediction is challenging, developing advanced deep learning models that predict it accurately is important. To this end, we propose the Concatenate-based 2D-CNN-BiLSTM model, which has two main parts (see Figure 3): feature extraction using a 2D-CNN network, and corn yield prediction using a Bi-LSTM network. The 2D-CNN network extracts high-level spatial features from input data and concatenates them with low-level features from subsequent months [35]. Additionally, a separate 2D-CNN network was created to incorporate single-time soil data features. Finally, the high-level features from soil, Sentinel-1, and Sentinel-2 data were concatenated and fed into a Bi-LSTM layer to accurately predict corn yield. The Bi-LSTM layer is able to overcome significant time lags between inputs across any time period and enhance its ability to represent temporal patterns at different frequencies using backward and forward information [21]. This makes it particularly advantageous for analyzing crop growth cycles of varying durations.

The Monthly Block consists of Conv2D-1 > Linear activation function > Concatenate layer > Conv2D-2 > Linear activation function. Monthly composites (X_Monthly) pass through the Monthly Block, and monthly features (F_Monthly) are extracted. In addition, the Soil Block consists of Conv2D-1 > Linear activation function. Soil data pass through the Soil Block, and soil features (S) are then extracted. F_Monthly and S are then concatenated and fed into a Bi-LSTM layer with a ReLU activation function to predict corn yield. Finally, the output of the Bi-LSTM layer passes through a dense layer with a linear activation function to obtain yield values.
Overall, our Concatenate-based 2D-CNN-BiLSTM model is a promising approach for accurately predicting corn yield by incorporating various data sources and effectively capturing temporal patterns.
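The month-to-month concatenation in the Monthly Block can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: a 1 × 1 convolution with linear activation is written as a per-pixel linear map over channels, weights are random rather than trained, each month gets its own weights, and the first month (which has no previous month) skips the concatenation. The names `conv1x1` and `monthly_features`, the patch size, and the input channel count are hypothetical; the filter counts of 16 and 22 follow the values reported in the Results section.

```python
import numpy as np


def conv1x1(x, weights, bias):
    """1x1 convolution with linear activation: a per-pixel linear map.

    x: (H, W, C_in), weights: (C_in, C_out), bias: (C_out,)
    """
    return x @ weights + bias


def monthly_features(monthly_composites, n_low=16, n_high=22, seed=0):
    """Chain the Monthly Block over months with feature concatenation.

    For each month: Conv2D-1 gives low-level features, these are
    concatenated with the previous month's high-level features, and
    Conv2D-2 gives the new high-level features (F_Monthly).
    Weights are random placeholders (assumption: untrained sketch).
    """
    rng = np.random.default_rng(seed)
    prev_high = None
    outputs = []
    for x in monthly_composites:
        w1 = rng.normal(scale=0.1, size=(x.shape[-1], n_low))
        low = conv1x1(x, w1, np.zeros(n_low))
        if prev_high is None:
            merged = low  # assumption: first month has nothing to concatenate
        else:
            merged = np.concatenate([prev_high, low], axis=-1)
        w2 = rng.normal(scale=0.1, size=(merged.shape[-1], n_high))
        high = conv1x1(merged, w2, np.zeros(n_high))
        outputs.append(high)
        prev_high = high
    return np.stack(outputs)  # (months, H, W, n_high)


# Hypothetical input: 4 monthly composites of 8 x 8 pixels with 22 channels
composites = [np.random.default_rng(m).random((8, 8, 22)) for m in range(4)]
f_monthly = monthly_features(composites)
```

The resulting sequence of monthly feature maps is what would be combined with the soil features and passed to the Bi-LSTM layer; that recurrent part is omitted here.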

Results and Discussion
For this study, a total of 250, 27, and 83 samples were selected for training, validation, and testing of the Concatenate-based 2D-CNN-BiLSTM model, respectively. The Conv2D-1 and Conv2D-2 layers were set to have 16 and 22 filters, respectively, with a kernel size of 1 × 1. The Bi-LSTM layer had 16 units. The model was trained using the Adam optimizer for 30 epochs with a batch size of 10. The best weights were selected based on the minimum validation loss. The performance of the proposed model was compared with Concatenate-based 2D-CNN, 2D-CNN, and RF in two scenarios: (1) using Sentinel-1 and -2 data, and (2) using both Sentinel-1 and -2 data along with Soil GRIDS. Table 2 displays the performance of the proposed model and the compared models, measured in terms of RMSE, MAPE, MAE, RRMSE, and Index of Agreement (D).

Table 2 reveals that the Concatenate-based 2D-CNN-BiLSTM model outperforms the Concatenate-based 2D-CNN, 2D-CNN, and RF methods significantly. The best performance of the Concatenate-based 2D-CNN-BiLSTM model is achieved when combining Sentinel-1 and -2 and Soil GRIDS, with an RMSE of 0.698 t/ha, MAPE of 4.47%, MAE of 0.556 t/ha, RRMSE of 5.55%, and D of 84.67%. Our proposed model improves D by 14.77% compared to the 2D-CNN.
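For reference, the five evaluation metrics can be computed as in the sketch below. The paper does not spell out its formulas, so the definitions here are assumptions: RRMSE is taken as RMSE divided by the mean observed yield, and the Index of Agreement is taken as Willmott's d, both expressed as percentages. The function name `yield_metrics` and the sample values are hypothetical.

```python
import math


def yield_metrics(observed, predicted):
    """Return RMSE, MAE, MAPE (%), RRMSE (%), and Index of Agreement D (%).

    Assumed definitions: RRMSE = RMSE / mean(observed), and D is
    Willmott's index of agreement.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    squared_error_sum = sum(e * e for e in errors)

    rmse = math.sqrt(squared_error_sum / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    rrmse = 100.0 * rmse / mean_obs

    # Potential error: spread of predictions and observations around the observed mean
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    d = 100.0 * (1.0 - squared_error_sum / potential_error)

    return {"rmse": rmse, "mae": mae, "mape": mape, "rrmse": rrmse, "d": d}


# Hypothetical county yields in t/ha
metrics = yield_metrics(observed=[10.0, 12.0, 14.0], predicted=[11.0, 12.0, 13.0])
```

Under these definitions, the reported RMSE of 0.698 t/ha and RRMSE of 5.55% would correspond to a mean observed yield of roughly 12.6 t/ha in the test set.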
Figure 4 depicts the scatter plots of predicted yield versus observed yield for our proposed method and the compared methods in 2021. The scatter plots demonstrate that the fit line is close to the diagonal line in the Concatenate-based 2D-CNN model and far away from it in the RF model. Additionally, Figure 5 illustrates that the proposed model outperforms the compared models, producing lower errors and a brighter error map. This confirms the efficacy of utilizing Soil GRIDS data for yield estimation. A visual representation of the distribution of corn yield values is presented in Figure 6, which compares the USDA yield with the yield predicted by our proposed method. The results displayed in Figure 6 indicate a high level of agreement between the observed and predicted corn yield, thereby reinforcing the reliability and accuracy of our proposed method's predictions.

Conclusions
Forecasting corn yield is a crucial aspect of agriculture management in Iowa. Recent studies have demonstrated that remote sensing, soil data, and deep learning methods are effective for estimating corn yield. To predict corn yield accurately, it is important to consider both temporal and spatial features. To achieve this, we propose a novel Concatenate-based 2D-CNN-BiLSTM model that extracts both spatial and temporal features. The CNNs extract spatial features while the Bi-LSTM extracts temporal features. The inputs for our model include remote sensing data (Sentinel-1 and -2) and Soil GRIDS data. We conducted experiments with the proposed model on Iowa corn from 2018 to 2021 at the county level. Our results demonstrate the effectiveness and advantages of our approach compared to other methods. By considering both spatial and temporal features, our model is able to accurately forecast corn yield, which can aid in making informed decisions for agriculture management in Iowa.

Figure 2. Flowchart of the proposed method.

Figure 4. Scatter plots of predicted yield versus observed yield for our proposed method and the compared methods in 2021.

Figure 5. Error maps generated by the proposed and compared models in 2021.

Figure 6. Map of USDA corn yield and predicted corn yield in 2021.

Table 1. Sample plot yield statistics per year in the study area.

Table 2. Comparison of the performance of the proposed Concatenate-based 2D-CNN-BiLSTM model versus the other considered methods for corn yield prediction.
