Prediction of Wheat Grain Protein by Coupling Multisource Remote Sensing Imagery and ECMWF Data

: Industrialization production with high quality and e ﬀ ect on winter is an important measure for accelerating the shift from increasing agricultural production to improving quality in terms of grain protein content (GPC). Remote sensing technology achieved the GPC prediction. However, large deviations in interannual expansion and regional transfer still exist. The present experiment was carried out in wheat producing areas of Beijing (BJ), Renqiu (RQ), Quzhou, and Jinzhou in Hebei Province. First, the spectral consistency of Landsat 8 Operational Land Imager (LS8) and RapidEye (RE) was compared with Sentinel-2 (S2) satellites at the same ground point in the same period. The GPC prediction model was constructed by coupling the vegetation index with the meteorological data obtained by the European Center for Medium-range Weather Forecasts using hierarchical linear model (HLM) method. The prediction and spatial expansion of regional GPC were validated. Results were as follows: (1) Spectral information calculated from S2 imagery were highly consistent with LS8 (R 2 = 1.00) and RE (R 2 = 0.99) imagery, which could be jointly used for GPC modeling. (2) The predicted GPC by using the HLM method (R 2 = 0.524) demonstrated higher accuracy than the empirical linear model (R 2 = 0.286) and showed higher improvements across inter-annual and regional scales. (3) The GPC prediction results of the veriﬁcation samples in RQ, BJ, Xiaotangshan (XTS) in 2018, and XTS in 2019 were ideal with root mean square errors of 0.61%, 1.13%, 0.91%, and 0.38%, and relative root mean square error of 4.11%, 6.83%, 6.41%, and 2.58%, respectively. This study has great application potential for regional and inter-annual quality prediction.


Introduction
The market demand for wheat quality is receiving increasing attention because of the continuous improvement of the living standards of people. Grain protein content (GPC) is an important standard of wheat quality and directly restricts the utilization and commercial value of wheat [1]. Monitoring and forecasting GPC are of great significance for guiding farmers and enterprises in wheat harvesting and cultivation [2]. The laboratory detection and analytical method are accurate; however, their destructive multipoint sampling is time consuming and labor intensive, their detection cost is high, and their representativeness is poor [3]. The remote sensing (RS) technology with certain advantages, such as real-time, fast, and non-destructive, had been widely used in the acquisition of crop production information. Researchers had realized the instantaneous and large-scale monitoring of wheat growth status and environment through RS data.
Many studies on RS monitoring of GPC have been demonstrated in the past decade and directly or indirectly constructed a series of models [4,5]. In the early stages of the GPC monitoring development, researchers attempted to construct an empirical model between RS information and GPC directly. Hansen et al. [6] screened sensitive bands to construct a partial least square model for yield and protein content, which could estimate yield effectively. Such authors proved that determining the protein content through spectral reflectance was difficult. Li et al. [7] and Xue et al. [3] pointed out that the canopy ratio vegetation index (RVI) at flowering stage could predict the GPC of wheat at maturity. Rodrigues et al. [8] constructed normalized difference spectral index and ratio spectral index based on any combination of two wavebands, and successfully predicted GPC. Zhou et al. [9] showed that GPC monitoring model based on normalized difference vegetation index (NDVI) and difference vegetation index (DVI) at booting stage of rice had better accuracy. Pettersson et al. [10] used three variables, namely, apparent soil conductivity, canopy vegetation index (transformed chlorophyll absorption in reflectance index (TCARI)/optimized soil adjusted vegetation index), and thermal stress during the filling stage, to establish a multivariate linear model of barley yield and quality with good accuracy. Zhang et al. [11] established the GPC monitoring model of winter wheat by using the continuous projection algorithm + support vector machine, with good accuracy and root mean square error (RMSE) of 0.3587. Another way was to find intermediate variables to indirectly predict GPC. The researchers found that nitrogen is closely related to protein. It was a feasible and effective way to predict GPC indirectly by retrieving nitrogen from remote sensing information. Zhao et al. [12] built a prediction model of GPC based on the RVI of winter wheat flowering period and plant nitrogen content as the intermediate bridge, and optimized the model by using the reinforcement coefficient. Cox et al. [13] and Liu et al. [14] indirectly constructed a GPC monitoring model by searching for nitrogen-related intermediate variables, such as nitrogen nutrient index (NNI) and leaf nitrogen content (LNC), associated with the close relationship between GPC and nitrogen uptake and transport in the early crop stage. Li et al. [7] established the GPC model by coupling good spectral parameters with LNC and began the grain protein prediction. Huang et al. [15] used total nitrogen content as a link parameter between spectral characteristics and GPC. Chen et al. [16] introduced NNI, which reflected the nitrogen nutrition status and estimated the protein content of crops based on NNI and RS parameters. Although these models and methods had successfully monitored GPC via RS, most of them were mainly empirical and semi-empirical models, and the application was difficult to appropriate for new regions or inter-annual. In recent years, researchers had realized GPC monitoring by coupling RS and crop growth model, which has strong mechanisms. Orlando et al. [17] realized GPC monitoring at field scale by using RS data and CERES-wheat model. However, grain protein exhibited an extremely complex trait, which was affected by both environmental conditions and genotype [18]. Limitations in GPC monitoring at inter-annual expansion and spatial transfer by only using RS information are observed.
Various researchers attempted to correct the GPC monitoring model by considering environmental factors to solve the problem of poor inter-annual expansion and spatial transfer of GPC. Guasconi et al. [19] and Orlandini et al. [20] proved the feasibility of using effective temperature and precipitation from November to June and NDVI to monitor large-scale grain protein. They emphasized that GPC is inversely proportional to precipitation and directly proportional to temperature. Li et al. [21] established a GPC prediction model by combining the effects of temperature and soil conditions on the formation of GPC during wheat grain filling. Li et al. [22] considered the effect of temperature on grain nitrogen transport in the simplified model of nitrogen transport mechanism coupled with the remote sensing inversion model of LNC and realized GPC prediction. These models and methods considered the impact of environmental factors on GPC, and improved the problem of poor interannual expansion and spatial transfer. However, the model was relatively simple and was constructed by the independent variables of a multilinear regression model along with several environmental factors.
Wheat quality is mainly the result of the cross action of genotype, environmental factors, and management conditions [19]. With regard to environmental data selection, the European Center for Medium-range Weather Forecasts (ECMWF) was developed rapidly and was the main reference tool for front-line weather forecasters. Researchers had conducted extensive work on the research and analysis of ECMWF [23,24]. Alpert et al. [25] used the ECMWF data to conduct climate analysis of Mediterranean cyclones. Wang et al. [26] proved the reliability of ECMWF for rainfall forecast in Guang'An area of China. However, the application of ECMWF in the field of RS quality prediction was rare. In the selection of key growth stages, many models were based on the data of flowering or filling stages [5,27]. Li et al. [28] screened standardized leaf area index determining index and Modified chlorophyll absorption ratio index/Modification of the triangle vegetation index 2 combined with meteorological data at flowering stage and successfully predicted yield and GPC. Pettersson et al. [29] used TCARI to predict GPC at seeding time in Sweden. Song et al. [30] analyzed the correlation between GPC and green normalized difference vegetation index (GNDVI) calculated from QuickBird data of the flag leaf stage of winter wheat. These authors also constructed a GNDVI model for GPC monitoring. At present, the selection of high-quality wheat varieties in foreign countries is concentrated on high-gluten and low-gluten wheat. By contrast, domestic high-quality wheat improvement had better characteristics and more advantages in medium-strong gluten. Researchers conducted substantial research on gluten in agriculture. Li et al. [31] analyzed the relationship between glutenin macropolyer contents and high-molecular weight glutenin subunits with gluten content in wheat for different practical uses. Guo et al. [32] discussed the differences of qualitative characteristics of three gluten wheat cultivars under different environments. Around the application of agricultural RS, the feasibility study on inversing GPC by correlating RS and gluten data needs further exploration.
The above-mentioned conventional linear regression model could only analyze problems that involve one layer of data; moreover, the hierarchical linear model (HLM) could effectively solve the problem wherein the traditional linear model has certain limitations in the analysis of multilayer data [33]. HLM had been widely used in many fields with nesting problems (e.g., society, life, and earth science). Wallach et al. [34] developed HLM based on nonlinear response functions to optimize the fertilizer strategy. Li et al. [1] estimated GPC by using the HLM that considered the environmental variations yielded higher accuracy (R 2 = 0.58 and root mean square error (RMSE) = 1.21%) than the linear model (R 2 = 0.13 and RMSE = 1.73%).
The effective utilization of multisource RS data is a current research focus. Shan et al. [35] realized the dynamic monitoring of the ecological environmental quality of land consolidation using multisource RS data. Ma et al. [36] completed the monitoring and evaluation of the cultivated land quality supported by multisource RS. Satellites used in agricultural RS included Landsat, RapidEye, and Sentinel imageries. These satellites could complement one another to a certain extent and improve the spatial and temporal continuity of data. Researchers had successfully applied multiple satellite sensors in the monitoring leaf area index, plant nitrogen, and yield at different locations [37,38]; however, they had hardly tested the consistency of these data. Consequently, whether the pre-processed multisource satellite data could be used for the simultaneous inversion of agronomic parameters remains to be further verified.
In summary, considering the interaction between genotype and environment, few multilayer nested models that integrate multisource RS data for GPC interaction were available. This study focused on the following issues based on the ground and multisource satellite data of different years and regions: (1) The consistency of data from different satellite sensors was discussed; (2) wheat GPC prediction based on HLM was constructed by considering the interaction among environmental, genotype, and RS monitoring; and (3) the wheat quality classification with GPC was mapped at regional scale.

Research Area
The research areas are located in Xiaotangshan National Precision Agriculture Demonstration base in Changping District and suburbs of Beijing (BJ: 40 • 10 31"-40 • (Figure 1). These areas are located in the north of the North China Plain, with a typical warm temperate monsoon climate, with obvious changes in the four seasons. There is strong evapotranspiration in spring, high temperature and frequent waterlogging in summer, and dry, windy, and cold winter. The annual average temperature and rainfall in BJ, RQ, JZ, and QZ are 11.8 • C and 550.3 mm, 12.1 • C and 557.4 mm, 12.5 • C and 554.6 mm, and 13.1 • C and 556.2 mm, respectively.
Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 21 genotype, and RS monitoring; and (3) the wheat quality classification with GPC was mapped at regional scale.

Experimental Design
A total of experiments was carried out at different regions and years. The experimental (Exp.) variables included different cultivars, irrigation rates, and application levels of nitrogen fertilizer (Table 1). Other field practices include weed and pest management, phosphate and potassium fertilizer, which were followed with local management procedures.
Exp. 1 was designed in Xiaotangshan (XTS) from 2013-2014 as an orthogonal design with two wheat cultivars (J9843 and ZM175), three irrigation rates (0, 146, and 292 mm), and four nitrogen fertilizer application rates (0, 90, 180, and 360 kg/hm 2 ). A total of 16 plots were designed for modeling in this experiment (Figure 2a). The image data used in wheat flowering period were RapidEye (RE) satellite data.
Exp. 2 was designed with 26 samples for modeling in XTS during 2014-2015. The wheat variety was Jingdong22, and the amount of fertilizer applied was 90 kg/hm 2 . The image data used in wheat flowering period were Landsat 8 Operational Land Imager (LS8) satellite data.
Exp. 3 was a completely randomized experiment designed in XTS during 2016-2017 for modeling with two wheat varieties (LX167 and JD18), four nitrogen fertilizer applications (0, 90, 180, and 270 kg/hm 2 ), and three repetitions ( Figure 2b). The image data used in wheat flowering period was RE satellite data.
Exp. 4 was a completely randomized experiment designed in XTS during 2017-2018 for modeling with two wheat varieties (LX167 and JD18), four nitrogen fertilizer applications (0, 90, 180, and 270 kg/hm 2 ), and four repetitions (Figure 2c). The image data for the flowering period of wheat were the RE satellite data, which was used for consistency testing in addition to modeling.
Exp. 5 and 6 were designed in XTS during 2017-2018 and 2018-2019, respectively. The above-mentioned experiments were designed with 10 and 15 sample collection areas, and the image data were all Sentinel-2 (S2). S2 image data of both experiments were used for verification, and Exp. 5 was also used for consistency test with RE image data of Exp. 4. Exp. 7-10 was designed during 2017-2018 in BJ, JZ, QZ, and RQ. The above-mentioned experiments were designed with 9, 6, 6, and 10 sample collection areas, and the image data were all S2. Exp. 8 and 9 were used for modeling, Exp. 7 and 10 were used for validation.

Satellite Image Preprocessing
Three RS satellite imageries, including S2, LS8, and RE, were used in this study ( Table 2). A total of 12 images were collected, and a total of 10 images were used in Exp.1-10. The acquisition time was consistent with the antithesis in Table 1. Two more are S2 and LS8 on 28 th April, 2018 for conformance testing, and the spectral reflectance of the images for modeling and verification was extracted on the basis of the ground GPS coordinate information (Table 1). We collected the same day imaging data of different satellites in the same area, and extract the spectral reflectance of ground objects after geographical registration to test the spectral consistency of different satellites. In this study, S2 and LS8 images of XTS were acquired on 28 th April, 2018, and S2 and RE images of XTS were acquired on 8 th May, 2018. We obtained the wheat planting areas of BJ (11,270 ha) and RQ City (22,000 ha) from the National Bureau of Statistics to verify the accuracy of the wheat classification map.
S2 satellite is a multispectral RS imaging mission in the Global Environment and Safety Monitoring System (GMES) launched on 23rd June 2015 and 7th March 2017, respectively. The singlestar revisit period is 10 days, and the A/B double-star shortened the revisit period to 5 days. S2 images in this study were L1C level products downloaded from the ESA website (https://scihub.copernicus.eu/). Radiation calibration and atmospheric correction were performed using Sen2cor (http://link.zhihu.com/?target=http%3A//step.esa.int/main/third-party-plugins-2/sen2cor/), which is a L2A level data processing plug-in released by the ESA.
The LS8 satellite was successfully launched by NASA on 11th February 2013. LS8 data in this study were obtained from the USGS website (http://glovis.usgs.gov/). The imagery data were LIT products and geometrically corrected by terrain data. The image data were radiometrically calibrated and atmospherically corrected using ENVI 5.3 (Harris Geospatial, Broomfield, Colorado, USA) to use the spectral information.
The German RE satellite was launched on 29th August 2008 and consists of five Earth observation satellites. The five satellites are evenly distributed in a sun-synchronous orbit. RE 3A products were downloaded in this study. These products have been orthorectified, and the geographical accuracy of these products can reach 1-2 pixels. Image pre-processing was only atmospherically corrected in ENVI 5.3.
Supervised classification of winter wheat in BJ and RQ City was carried out using the 2017 Winter S2 image. The preprocessed images of the wheat flowering period in the two cities were masked with the classified images. Subsequently, the classified S2 and meteorological data for spatial verification were stacked. The above-mentioned operations were all implemented by ENVI 5.3.

Satellite Image Preprocessing
Three RS satellite imageries, including S2, LS8, and RE, were used in this study (Table 2). A total of 12 images were collected, and a total of 10 images were used in Exp.1-10. The acquisition time was consistent with the antithesis in Table 1. Two more are S2 and LS8 on 28 April 2018 for conformance testing, and the spectral reflectance of the images for modeling and verification was extracted on the basis of the ground GPS coordinate information (Table 1). We collected the same day imaging data of different satellites in the same area, and extract the spectral reflectance of ground objects after geographical registration to test the spectral consistency of different satellites. In this study, S2 and LS8 images of XTS were acquired on 28 April 2018, and S2 and RE images of XTS were acquired on 8 May 2018. We obtained the wheat planting areas of BJ (11,270 ha) and RQ City (22,000 ha) from the National Bureau of Statistics to verify the accuracy of the wheat classification map.
S2 satellite is a multispectral RS imaging mission in the Global Environment and Safety Monitoring System (GMES) launched on 23 June 2015 and 7 March 2017, respectively. The single-star revisit period is 10 days, and the A/B double-star shortened the revisit period to 5 days. S2 images in this study were L1C level products downloaded from the ESA website (https://scihub.copernicus.eu/). Radiation calibration and atmospheric correction were performed using Sen2cor (http://link.zhihu. com/?target=http%3A//step.esa.int/main/third-party-plugins-2/sen2cor/), which is a L2A level data processing plug-in released by the ESA.
The LS8 satellite was successfully launched by NASA on 11 February 2013. LS8 data in this study were obtained from the USGS website (http://glovis.usgs.gov/). The imagery data were LIT products and geometrically corrected by terrain data. The image data were radiometrically calibrated and atmospherically corrected using ENVI 5.3 (Harris Geospatial, Broomfield, CO, USA) to use the spectral information.
The German RE satellite was launched on 29 August 2008 and consists of five Earth observation satellites. The five satellites are evenly distributed in a sun-synchronous orbit. RE 3A products were downloaded in this study. These products have been orthorectified, and the geographical accuracy of these products can reach 1-2 pixels. Image pre-processing was only atmospherically corrected in ENVI 5.3.
Supervised classification of winter wheat in BJ and RQ City was carried out using the 2017 Winter S2 image. The preprocessed images of the wheat flowering period in the two cities were masked with Remote Sens. 2020, 12, 1349 7 of 21 the classified images. Subsequently, the classified S2 and meteorological data for spatial verification were stacked. The above-mentioned operations were all implemented by ENVI 5.3.

Meteorological Data
The meteorological data of each experimental area were obtained from the ECMWF (http: //www.ecmwf.int/). ECMWF provides global scale raster meteorological data with resolutions of 1 • , 0.5 • , 0.25 • , and 0.125 • . The ERA-Interim data at 0.125 • resolution of the meteorological data was used in this work. Such data mainly included the daily rainfall, solar radiation, and minimum and maximum temperatures. The duration of the downloaded meteorological data is one month (30 days) before the imaging time of the image data selected for the flowering period of each growing season. For example, the imaging time of the S2 image used in the flowering period in 2018 is 8th May, and the time period of meteorological data is 8 April to 8 May 2018. The meteorological data were applied to the model in the form of total rainfall, total solar radiation, and accumulated temperature over this period. The main reason for choosing this period is that the time from stem growth to flowering of wheat in the study area is about one month (regardless of year), and the time of one month is practical in order to acquire meteorological data for regional applications. In this study, Python 3.7 (Python Software Foundation, Inc., Portland, USA) was used to download the meteorological data. MATLAB 2014 (MathWorks, Inc., Natick, USA) was used to read and standardized such data. ArcGIS 10.2 (Esri, Inc., Redlands, CA, USA) was used to add coordinate information and projections to raster meteorological data and for resampling.

Grain Protein Content (GPC)
At harvest, 1 m 2 of the winter wheat was collected using the diagonal sampling method to measure the yield and grain quality. Each sample was at least 2 m away from the field edge to avoid the edge effect. After weighing by electronic, the GPC was measured with an Infratec TM 1241 near-infrared grain analyzer (FOSS A/S, Hillerød, Denmark).

Method
The illustrated methodology of the GPC prediction is general in Figure 3. The detailed steps are presented as follows: (1) Data pre-processing: The LS8, RE, and S2 RS image data were preprocessed (e.g., radiation calibration and atmospheric correction). The spectral information that corresponds to the ground point was extracted. Accumulated temperature, cumulative rainfall, and light intensity were calculated from the meteorological data. (2) Consistency analysis: The consistency of the spectral information of the preprocessed LS8 and RE with S2 was compared. (3) Vegetation index (VI) selection: The sensitive VI with GPC was calculated and screened from the RS images as the first input parameter of HLM. (4) GPC prediction by HLM method: The selected VI and gluten type were used as the first input parameters, and the meteorological data were used as the second input parameter to construct the HLM for predicting GPC; (5) GPC map and validation: The validation set was used to validate the HLM in space. The GPC in BJ, XTS, and RQ City in 2018 was predicted, and the thematic map of wheat quality prediction was created.
presented as follows: 1) Data pre-processing: The LS8, RE, and S2 RS image data were preprocessed (e.g., radiation calibration and atmospheric correction). The spectral information that corresponds to the ground point was extracted. Accumulated temperature, cumulative rainfall, and light intensity were calculated from the meteorological data. 2) Consistency analysis: The consistency of the spectral information of the preprocessed LS8 and RE with S2 was compared. 3) Vegetation index (VI) selection: The sensitive VI with GPC was calculated and screened from the RS images as the first input parameter of HLM. 4) GPC prediction by HLM method: The selected VI and gluten type were used as the first input parameters, and the meteorological data were used as the second input parameter to construct the HLM for predicting GPC; 5) GPC map and validation: The validation set was used to validate the HLM in space. The GPC in BJ, XTS, and RQ City in 2018 was predicted, and the thematic map of wheat quality prediction was created.

Vegetation index (VI)
Previous studies indicated that 10 vegetation indices with high correlation with protein were selected ( Table 3). The correlation between each VI and GPC was analyzed. Subsequently, the optimal VI was selected to create the GPC model. The HLM is a least square regression analysis that considers data nested structures (e.g., the students are embedded in classes). However, the HLM can layer and comprehensively analyze the non-independent data, as well as the relationships between the data inside and outside the layer compared with the general least squares regression analysis, within a layer. At present, HLM has been extensive used in the field of education [49], economics [50], and agriculture [1].
The influence of environmental factors (layer 2 model) between regions and growth years on the grain quality will cause difference between RS information and grain quality model (layer 1 model, L1) considering the construction of regional and interannual GPC model. Therefore, HLM is feasible for the prediction and modeling of grain quality. In this study, the layer 1 model is constructed on the basis of the GPC, VI, and grain gluten type. The layer 2 model (L2) is constructed on the basis of the model coefficients in the L1 and the meteorological data, such as temperature (T), precipitation (Pre), radiation quantity (Rad) as follows: where GPC ij is the GPC; Glu and VI ij represent the gluten type of winter wheat (the value of strong, medium strong, and weak gluten were 1, 2, and 3 in this study, respectively), and VI, respectively; β 0j , β 1j , and β 2j correspond to the intercept, the coefficients of gluten type, and VI, respectively; r ij represents the random error; and i is the individual sample size.
L2 : β pj = γ p0 +γ p1 ×Rad j +γ p2 ×Pre j +γ p3 ×T j +µ pj (2) where β pj represents the intercept and the coefficients of gluten type and VI in the first layer; γ p0 is the intercept of the second layer model; γ p1 , γ p2 , and γ p3 are the model coefficients of total radiation, total precipitation, and accumulated temperature in this model, respectively; µ pj represents the random error; p is the number of coefficient terms of L1, corresponding to 0, 1, and 2 in this study; and j is the individual sample size. The models of the winter wheat grain protein were developed by HLM 7.03 Student (www.ssicentral.com/hlm/student.html) software.

Classification Standard of Gluten
The national standard for high quality strong-gluten wheat (GB/T 17892-1999) (https://gb-standards. com/GB_standard/GB-T%2017892-1999.html) is adopted in this study. This standard specifies the definition, classification, quality index, inspection method, inspection rules, packaging, transportation, and storage requirements of wheat. The gluten types were classified as weak gluten (GPC ≤ 12%), medium gluten (12% < GPC ≤ 14%), strong gluten of the 2nd grade (14% < GPC ≤ 15%), and strong gluten of the 1st grade (15% < GPC) according to the standard of GPC in this study.

Statistical Analysis
In this study, the coefficient (R 2 ), RMSE and relative root mean square error (rRMSE) were determined to verify the reliability and accuracy of the model [51]. The optimal VI based model is also established. Confusion matrix [52] and the accuracy indexes (ACC) are used to evaluate the effectiveness of GPC classification results.
where n is the number of observations; Y i and Y' i are the ith measured and simulated data, respectively; ACC is the accuracy, it can directly reflect the correct proportion of classification, and the calculation is simple; a ii is the sum of correctly classified samples; n is the sample size; the degree of model fit is high when R 2 value is close to 1; the model prediction ability is strong, the stability is good, and the reliability is high when the RMSE and rRMSE value is small.

Results and Analysis
3.1. Spectral Consistency among Sentinel-2 (S2), Landsat 8 (LS8), and RapidEye (RE) The spectral reflectance of different satellites in the same area and on the same day is extracted to verify the feasibility of multi-source remote sensing satellite complementation. The S2 and LS8 data, and the S2 and RE data were compared, respectively, to validate the data consistently among different sensors (Figure 4). Based on S2 imagery on 28 April 2018, LS8 imagery was georegistered. The S2 and LS8 imagery were resampled to 30 m resolution, the band reflectivity of 447 pixels was extracted for comparison, and the R 2 value was 0.99. Based on S2 imagery collected from 8 May 2018, RE imagery was georegistered. The S2 and RE imagery was resampled to 10 m resolution. The band reflectivity of 4165 pixels was extracted for comparison, and the R 2 value was 1.00. The average reflectivity of the four bands of the two groups of images was calculated and compared individually. The results show minimal difference between the two groups. The high consistency between S2 and LS8/RE demonstrated that these multisources could be used to construct a GPC model directly.

Screening of VI
The results showed that the selected VI and GPC were positively correlated ( Table 4). The correlation reached a significant level (p < 0.01). EVI (r = 0.48) has the highest correlation, followed by DVI (r = 0.46), OSAVI (r = 0.44), VARI (r = 0.39). VARI (r = 0.39) has the worst correlation. Therefore, EVI was selected as an independent variable to participate in the construction of the winter wheat grain protein model.

Screening of VI
The results showed that the selected VI and GPC were positively correlated ( Table 4). The correlation reached a significant level (p < 0.01). EVI (r = 0.48) has the highest correlation, followed by DVI (r = 0.46), OSAVI (r = 0.44), VARI (r = 0.39). VARI (r = 0.39) has the worst correlation. Therefore, EVI was selected as an independent variable to participate in the construction of the winter wheat grain protein model.

GPC Predicting Model based on HLM
The accuracy of the use of EVI to construct linear regression model (LR) for GPC was poor, with R 2 of 0.286, RMSE values of 1.11%, and rRMSE values of 8.63% (Figure 5b). Serious overestimates of the predicted GPC in winter wheat were shown during the 2014 and 2015 growing seasons. By contrast, the predicted GPC in the 2017 and 2018 growth seasons was underestimated. The model was insufficiently stable, and its performance greatly varied among different years and regions. The selected EVI and gluten data were considered the first layer information in HLM for GPC. Meteorological factors were considered the second layer information in HLM for GPC ( Table 5). The parameters in Table 5 correspond to the parameters in Eqs.(1) and (2), and γp1, γp2, and γp3 explain the contribution of Rad, Pre, and T, respectively. β0j, β1j, and β2j explain the influence of meteorological data on slope and intercept of gluten and VI. The GPC model based on the HLM method had better performance, with R 2 of 0.524, RMSE of 1.00%, and rRMSE of 8.12% (Figure 5a), than GPC model by LR method. Minimal difference was observed between the measured and the predicted values. The predicted results were reliable, especially the corrected accuracy interannual and regional scales.

GPC Predicting Model Based on HLM
The accuracy of the use of EVI to construct linear regression model (LR) for GPC was poor, with R 2 of 0.286, RMSE values of 1.11%, and rRMSE values of 8.63% (Figure 5b). Serious overestimates of the predicted GPC in winter wheat were shown during the 2014 and 2015 growing seasons. By contrast, the predicted GPC in the 2017 and 2018 growth seasons was underestimated. The model was insufficiently stable, and its performance greatly varied among different years and regions. The selected EVI and gluten data were considered the first layer information in HLM for GPC. Meteorological factors were considered the second layer information in HLM for GPC ( Table 5). The parameters in Table 5 correspond to the parameters in Equations (1) and (2), and γ p1 , γ p2 , and γ p3 explain the contribution of Rad, Pre, and T, respectively. β 0j , β 1j , and β 2j explain the influence of meteorological data on slope and intercept of gluten and VI. The GPC model based on the HLM method had better performance, with R 2 of 0.524, RMSE of 1.00%, and rRMSE of 8.12% (Figure 5a), than GPC model by LR method. Minimal difference was observed between the measured and the predicted values. The predicted results were reliable, especially the corrected accuracy interannual and regional scales.     (Figure 6b). These data were similar to those provided by the National Bureau of Statistics (11,270 and 22,000 ha). This finding showed that our classification was reliable. The figures and statistical data revealed that the wheat planting area in BJ was relatively small, mainly concentrated in Shunyi, Changping, Daxing, and Pinggu District, a small number of areas in Yanqing, Fangshan, and Tongzhou, and in the southern areas of Huairou and Miyun. The wheat planting in RQ City is relatively uniform. This work aims to integrate meteorological data for achieving regional GPC inversion.    (Figure 6b). These data were similar to those provided by the National Bureau of Statistics (11,270 and 22,000 ha). This finding showed that our classification was reliable. The figures and statistical data revealed that the wheat planting area in BJ was relatively small, mainly concentrated in Shunyi, Changping, Daxing, and Pinggu District, a small number of areas in Yanqing, Fangshan, and Tongzhou, and in the southern areas of Huairou and Miyun. The wheat planting in RQ City is relatively uniform. This work aims to integrate meteorological data for achieving regional GPC inversion.

GPC Validation and Map
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 21    (Figure 6b). These data were similar to those provided by the National Bureau of Statistics (11,270 and 22,000 ha). This finding showed that our classification was reliable. The figures and statistical data revealed that the wheat planting area in BJ was relatively small, mainly concentrated in Shunyi, Changping, Daxing, and Pinggu District, a small number of areas in Yanqing, Fangshan, and Tongzhou, and in the southern areas of Huairou and Miyun. The wheat planting in RQ City is relatively uniform. This work aims to integrate meteorological data for achieving regional GPC inversion.  The GPC at the flowering stage in 2018 and 2019 was predicted by using the constructed GPC monitoring model by HLM according to the fusion data of S2 satellite and raster meteorological data of winter wheat producing areas (Figures 7 and 8). The gluten types were classified as weak gluten (GPC ≤ 12%), medium gluten (12% < GPC ≤ 14%), strong gluten of the 2nd grade (14% < GPC ≤ 15%), and strong gluten of the 1st grade (15% < GPC) according to the national standard for high quality strong-gluten wheat (GB/T 17892-1999). The image exhibits that the medium and strong gluten were dominant in both places. The area ratio of different gluten wheat in the image was counted (Table 6), and relatively small amount of weak gluten wheat was found in both places. The area ratio of medium gluten wheat in BJ was 46.35%, followed by strong gluten of the 2nd grade and 1st grade occupying an area of 34.47% and 18.47%, respectively. During 2017-2018, XTS, as the experimental base, was dominated by high quality strong gluten of the 1st grade wheat with an area ratio of 77.12%, followed by strong gluten of 2nd grade and medium gluten with an area of 19.48% and 3.40%, respectively. During 2018-2019, XTS was dominated by high quality strong gluten of the 1st grade wheat with an area ratio of 61.48%, followed by strong gluten of 2nd grade and medium gluten with an area of 25.20% and 12.60%, respectively. RQ City was dominated by strong gluten of 2nd grade wheat, occupying an area of 72.56%, followed by medium gluten occupying an area of 19.62%. The strong gluten level-1 was less, with an area of 7.83%.

GPC Validation and Map
The GPC at the flowering stage in 2018 and 2019 was predicted by using the constructed GPC monitoring model by HLM according to the fusion data of S2 satellite and raster meteorological data of winter wheat producing areas (Figures 7 and 8). The gluten types were classified as weak gluten (GPC ≤ 12%), medium gluten (12% < GPC ≤ 14%), strong gluten of the 2 nd grade (14% < GPC ≤ 15%), and strong gluten of the 1 st grade (15% < GPC) according to the national standard for high quality strong-gluten wheat (GB/T 17892-1999). The image exhibits that the medium and strong gluten were dominant in both places. The area ratio of different gluten wheat in the image was counted (Table 6), and relatively small amount of weak gluten wheat was found in both places. The area ratio of medium gluten wheat in BJ was 46.35%, followed by strong gluten of the 2 nd grade and 1 st grade occupying an area of 34.47% and 18.47%, respectively. During 2017-2018, XTS, as the experimental base, was dominated by high quality strong gluten of the 1 st grade wheat with an area ratio of 77.12%, followed by strong gluten of 2 nd grade and medium gluten with an area of 19.48% and 3.40%, respectively. During 2018-2019, XTS was dominated by high quality strong gluten of the 1 st grade wheat with an area ratio of 61.48%, followed by strong gluten of 2 nd grade and medium gluten with an area of 25.20% and 12.60%, respectively. RQ City was dominated by strong gluten of 2 nd grade wheat, occupying an area of 72.56%, followed by medium gluten occupying an area of 19.62%. The strong gluten level-1 was less, with an area of 7.83%.

Discussion
Due to the first launch of S2 satellite in June 2015, the data of this study in 2015 cannot be used. Even after 2016, there would be insufficient data sources due to the long revisit cycle of a single satellite. For example, in a certain year, the flowering period in the suburb of Beijing is in the early May, and the S2 satellite in this area has no image in this period. Therefore, we tried to make up for this problem by complementing multiple satellites. Another advantage of introducing multi-source satellites was the ability to verify that our model can be unified for different sensors. If different satellite spectra were consistent, it was proved that our models can be used universally among different satellites. If the spectra were not consistent, new models needed to be established according to applied satellites. Due to different satellite sensors and different space scales, we must consider the problem of spectral consistency. We had explored the consistency test of multi-source satellite imageries, S2, LS8, and RE to enrich the RS data sources. Li et al. [53] attempted to study the potential of multispectral data for estimating biomass of non-photosynthetic vegetation. LS8 and S2 images were simultaneously used, but they were not tested for consistency. The study results showed that the differences between the aforementioned were small, and their spectral reflectance was highly consistent. This finding proved that the image data of the three imageries could participate in the model construction without correction processing. However, due to the cost, we did not analyze whether the band information between other satellites or sensors (e.g., high score satellite series and resource satellite series) was consistent. Further research will consider the consistency among other satellites, aviation platforms, and terrestrial platforms. By doing so, the RS image data will be further enriched, and a sufficient range of data sources in the application of regional GPC prediction will be ensured.
Researchers have carried out countless research in the field scale on GPC monitoring and proposed a series of methods and models. Hansen et al. [6] and Wang et al. [54] selected sensitive bands with high correlation with GPC to construct the model in the early stage. The model performance was poor because of the uncertainty of single band reflectance. Song et al. [55] showed that the best model for GPC prediction was index model, but R 2 was only 0.426 and 0.494. The follow-up researchers mostly used the VI to create models for effectively avoiding this uncertainty and predict GPC [9,56]. Feng et al. used multiple vegetation index to build GPC prediction model, in which the linear model constructed by modified normalized difference705 (mND705) has the highest precision with R 2 and RMSE values of 0.616 and 1.019%. However, this was based on the high-resolution ground hyperspectral data in precision agriculture. Similar accuracy based on satellite and field data was achieved in this study. The use of RS data only was insufficient for GPC prediction and would lead to the above-mentioned problem. This phenomenon was because of genotypes accounting for 95% and environmental factors accounting for 5% of the grain protein contribution [19]. The inherent basis of wheat quality characteristics depended on genotypes, which had different connotations, evaluation indicators, and suitable ecological regions with various product uses [57]. The genotype and environmental factors, namely variety information and meteorological data, were fully considered in this study. The GPC monitoring model was constructed by correlating RS data, species, and meteorological data with HLM. The HLM can effectively improve the GPC prediction accuracy by comparing the HLM and LR model constructed from RS data only. This conclusion was consistent with that of Wang et al. [2]. The GPC empirical model using RS data and environmental factors was more precise than the use of RS data or environmental factors alone. The HLM also effectively corrected inter-annual and inter-regional deviations. This outcome is attributed to this study regarding RS data, species, and meteorological data as two-layer nested structure data and decomposition of the total variance of GPC data by using the HLM. In this way, the interaction of different varieties and environments on GPC was considered, which played a key role in inter-annual and inter-regional deviation correction. Li et al. [28] also considered the influence of environment and gluten on GPC prediction, but mainly focused on the field scale and confirmed the HLM method for quality prediction interannual scales. This study was demonstrated at regional scales with some different results in the following aspects: (i) remote sensing data sources are different. We used the multi-source remote sensing satellite data (not the ground hyperspectral data), and analyzed the consistency of different remote sensing satellites to achieve the regional prediction of GPC; (ii) meteorological data are different. We used the grid meteorological data (not the data of a point), which provided the basis for large-scale GPC prediction; (iii) different experiments. We not only designed the plot experiment, but also conducted the random sampling experiment in the field, so it is more popularized. (iv) quality map is more useful to joint practical production.
However, there were still cases of gluten misclassification, e.g., weak gluten grains with high GPC value may be predicted as strong gluten of 2nd grade, and vice versa. We must point that GPC prediction during growing period is a rough classification to guide harvest. According to the standard of rough classification, we combined the strong gluten of 2nd grade and strong gluten of 1st grade into the strong gluten, and used the confusion matrix (Table 7) to evaluate the accuracy. The overall classification accuracy (ACC = 0.75) shows that the model can achieve positive classification results. However, due to the imbalance of the data set, there were limited samples of weak gluten in the modeling set, resulting in a large number of samples of weak gluten that were misclassified. A further study to balance the data set will improve this situation. Compared to the traditional non-classified harvest, GPC prediction and classification could improve the harvest grain quality, and even the misclassification has little percentage on the large proportion in one gluten type. Due to the sample size and GPC classification threshold (only 1% of the strong gluten of 2nd grade and strong gluten of 1st grade), a small number of misclassifications in the classification process caused a large deviation, which made it difficult to classify the strong gluten of 2nd grade and strong gluten of 1st grade. In addition, this study was based on the prediction of flowering stage, and the influence of other factors in the filling stage will also cause classification deviation. In spite of this, it can still have an important guiding significance for harvest classification. Sampling inspection during storage was necessary. Only the type of gluten needed to be determined at the fuzzy boundary of strong gluten wheat, which greatly reduced the workload of random inspection. Feng et al. [58] indicated that the optimal prediction period was the flowering stage. The meteorological data only used the empirically selected accumulated temperature, solar radiation, and rainfall one month before the flowering period. Although the data achieved good results, it may be insufficient. This issue is attributed to the impact of HLM supported by various information and meteorological data on the prediction of the optimal growth stage and the VI in different GPC growth stages. The meteorological data not only included temperature, solar radiation, and rainfall. The following work will also consider other growth periods as the origin and analyze the types and time bucket sensitivity of the accumulated meteorological data before and after the origin. Such work will consider the phenological factors to establish a full-time GPC monitoring model. In addition, latitude, topography, and soil information need to be considered.
In terms of variety information, RQ and BJ, which were commonly planted medium-gluten varieties, were compared. The monitoring results in RQ City were mainly strong-gluten wheat (Figure 10a). The monitoring results in BJ show that medium and strong gluten were distributed. The northern mountain area of BJ was dominated by medium gluten (Figure 10a). The middle and strong gluten evenly distributed in the gentle zone in the middle (Figure 10a), and the strong gluten was mainly distributed in the south (Figure 10a). This difference was due to the varying meteorological factors in the four regions. This conclusion once again verified the accuracy of the starting point of our study. Such notion indicated that the response of varieties to diverse environments was vary. We calculated the meteorological data for the four regions (Figure 10b-d). The difference of accumulated temperature was the greatest in the four regions, and those of precipitation and radiation quantity were relatively stable. Figure 10a shows that the wheat GPC was high when the temperature was also high. This finding was consistent with the agronomic conclusions of Guasconi et al. [19]. Figure 10c shows that the precipitation and GPC were high when the latitude was also high. However, previous studies [20] showed that GPC was inversely proportional to precipitation, which was inconsistent with our statistics. This similarity may be due to the small precipitation gradient difference in each region. Moreover, the interaction between temperature and precipitation affects the results. Therefore, the relationship between single climatic factors and GPC should be investigated, and the adverse effects between various factors cannot be ignored. Figure 10d shows that the radiation quantity of the three areas in BJ is the same, while that of RQ is low. The effect of radiation quantity on the difference of GPC cannot be verified.
Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 21 ( Figure 10a). The monitoring results in BJ show that medium and strong gluten were distributed. The northern mountain area of BJ was dominated by medium gluten (Figure 10a). The middle and strong gluten evenly distributed in the gentle zone in the middle (Figure 10a), and the strong gluten was mainly distributed in the south (Figure 10a). This difference was due to the varying meteorological factors in the four regions. This conclusion once again verified the accuracy of the starting point of our study. Such notion indicated that the response of varieties to diverse environments was vary. We calculated the meteorological data for the four regions (Figure 10b-d). The difference of accumulated temperature was the greatest in the four regions, and those of precipitation and radiation quantity were relatively stable. Figure 10a shows that the wheat GPC was high when the temperature was also high. This finding was consistent with the agronomic conclusions of Guasconi et al. [19]. Figure 10c shows that the precipitation and GPC were high when the latitude was also high. However, previous studies [20] showed that GPC was inversely proportional to precipitation, which was inconsistent with our statistics. This similarity may be due to the small precipitation gradient difference in each region. Moreover, the interaction between temperature and precipitation affects the results. Therefore, the relationship between single climatic factors and GPC should be investigated, and the adverse effects between various factors cannot be ignored. Figure 10d shows that the radiation quantity of the three areas in BJ is the same, while that of RQ is low. The effect of radiation quantity on the difference of GPC cannot be verified. The genotype and environmental interaction should be fully considered in various selection and field management. Various promoters should extend the different genotypes to their suitable ecological areas in the future. The varieties used in this study were mainly strong gluten and medium-strong gluten, and the weak gluten was seldom used. The modeling set was insufficiently diversified in the gluten. Additional weak gluten wheat varieties will be considered to participate in the modeling in the future. The wheat gluten in our model was derived from the known variety information. However, the variety information was difficult to obtain in the actual regional prediction. The genotype and environmental interaction should be fully considered in various selection and field management. Various promoters should extend the different genotypes to their suitable ecological areas in the future. The varieties used in this study were mainly strong gluten and medium-strong gluten, and the weak gluten was seldom used. The modeling set was insufficiently diversified in the gluten. Additional weak gluten wheat varieties will be considered to participate in the modeling in the future. The wheat gluten in our model was derived from the known variety information. However, the variety information was difficult to obtain in the actual regional prediction. Therefore, the variety information from the seed departments and their extension stations should be obtained to create the wheat variety flow direction and distribution map in the regional application.

Conclusions
In this study, a GPC monitoring model based on HLM was established by combining gluten, meteorological data, and multisource RS images. The poor interannual expansion and regional transfer of the traditional models were corrected effectively. The major study conclusions are presented as follows: (1) LS8, RE, and S2 had high consistency in spectral information and could be combined for RS modeling; (2) The GPC prediction model based on HLM (R 2 = 0.524, RMSE = 1.00% and rRMSE = 8.12%) was better than the traditional linear model (R 2 = 0.286, RMSE = 1.11% and rRMSE = 8.63%). In combination with the meteorological data and gluten, the problem of spatiotemporal deviation was effectively corrected. (3) The GPC regional prediction of wheat in BJ and RQ was realized by using the model and S2 data. These results show that the potential of HLM in predicting GPC of winter wheat on the interannual and regional scales can provide technical reference for GPC prediction in global or regional scale.