Novel Vegetation Indices for Cotton Boll Opening Status Estimation Using Sentinel-2 Data

: The application of chemical harvest aids to defoliate leaves and ripen bolls plays a significant role in the once-over machine harvest of cotton ( Gossypium hirsutum L.) fields. The boll opening rate (BOR) is a key indicator for the determination of harvest aid spraying times. However, the most commonly used method to determine BOR is manual investigation, which is subjective and cannot have a holistic judgment of the entire area. Remote sensing can be employed to overcome these limitations, due to a wide field of vision, acceptably spatial and temporal resolution, and rich spectral information beyond the perception of the human eye. The reflectance of open cotton bolls is relatively high in the visible and near-infrared bands. High reflectance of open bolls has a great influence on the reflectance of the mixed pixels on remote sensing imagery. Therefore, it is an effective method to detect boll opening status by constructing vegetation indices with the sensitive spectral bands of imagery. In this study, we proposed two new vegetation indices based on Sentinel-2 remote sensing data, namely, the boll area ratio index (BARI) and the boll opening rate index (BORI), in order to estimate the boll opening status on a regional scale. The proposed indices were strongly correlated with the boll area ratio (BAR) and BOR. In particular, BARI exhibited the most accurate and robust performance with BAR in the prediction (R 2 = 0.754, RMSE = 2.56%) and validation (R 2 = 0.706, RMSE = 5.00%) among all the indices, including published indices we chose. Furthermore, when comparing to all other indices, BORI demonstrated the best and satisfactory estimation with BOR in the prediction (R 2 = 0.675, RMSE = 7.96%) and validation (R 2 = 0.616, RMSE = 2.79%). Meanwhile, an exponential growth relationship between BOR and BAR was identified, and the underlying mechanisms behind this phenomenon were discussed. Overall, through our study, we provided convenient and accurate vegetation indices for the investigation of boll opening status in a cotton-producing area by accessible and free Sentinel-2 imagery.


Introduction
As a leading cash crop and the most important fiber producing crop for the textile industry, cotton (Gossypium hirsutum L.) is grown in many countries across the globe [1]. Cotton fibers covering seeds of the cotton plant are the starting point of the textile production chain [2]. Cotton production plays an important role in international trade, national security, and is an important source of income for cotton farmers [3]. Around the world, China, India and the United States are the prominent cotton-producing countries [1]. Today, cotton-producing countries worldwide tend to plant on flat and vast areas for mechanizing cotton harvesting. The adoption of mechanization has improved the efficiency of cotton harvesting and reduced the labor burden of farmers, as well as cotton production costs. An integral component of mechanized cotton harvesting is the spraying of chemical harvest aids [3,4]. Harvest aids are employed to promote mechanized cotton harvesting and protect fiber and seed quality by allowing for earlier harvesting, thereby reducing field weathering losses and minimizing trash content and lint pollution [4]. Thus, the timing of harvest aids is crucial for cotton yield and quality, with too early or too late applications possibly resulting in yield reductions and poor fiber quality [5]. Boll opening rate (BOR) and regional climatic characteristics are two important factors to guide the timing of harvest aids. The boll opening rate indicates the status of cotton reproductive growth. Only when BOR reaches a certain degree can harvest aids accelerate the process of boll opening [6,7]. Otherwise, when the BOR is low, the improper use of harvest aids may result in reduction of production. Also, in terms of regional climatic characteristics, due to the changes of seasons, rain and low temperature can cause the decline of fiber quality or even extinction in the late growth period of some cotton-producing areas. It is necessary to complete the whole process of boll opening before the arrival of low temperature or rainy season. Therefore, cotton planting areas threatened by low temperature in the late growth period should spray harvest aids at a proper BOR, instead of waiting for a high BOR.
As a prominent cotton-producing country, China's cotton planting area in 2019 reached 3.339 million hectares, while cotton production was recorded as 5.889 million tons. The Xinjiang Uyghur Autonomous Region in northwest China is the principle cotton-producing area of the country, accounting for 76.1% of its cotton planting area [8]. In particular, in 2019 almost 40% of the planting area in Xinjiang was machine-picked [8,9]. However, in the cotton planting area of Xinjiang, due to low temperatures in the late growth period, harvest aids can be sprayed without the BOR reaching a high level. Generally, the spraying of harvest aids to assist mechanical harvesting is performed when BOR reaches 30-40% [7]. Therefore, the investigation of BOR is very important to guide the spraying of harvest aids.
Traditional BOR measurements involve the random sampling of five points via a "Z" pattern across the experimental area to determine the number of total and open bolls [7,10]. At present, such manual measurements are the most common method used to determine BOR values. However, they tend to be subjective, unrepresentative and non-dynamic, and can also be destructive, thus preventing the accurate estimation of BOR in the study area. In addition, the high density of cotton in Xinjiang causes the central regions of the field to be difficult to reach by investigators. Thus, an effective and accurate measurement method is urgently required to overcome the limitations of the currently used techniques.
Remote sensing can effectively and accurately obtain information on crops in a dynamic, nondestructive and rapid manner and can be applied in numerous fields. For example, the assessment of pigment content [11][12][13], the acquisition of leaf area index (LAI) [14][15][16], the estimation of nitrogen content [17], crop pest and disease detection [18][19][20][21], and the quantitative identification of crop stress [22]. Over the last two decades, many cotton researches have been performed employing remote sensing for the growth monitoring and detection [23][24][25], yield estimation [26,27], pest and disease monitoring [28][29][30] and other aspects [31][32][33][34]. In the discrimination and monitoring of cotton, Bai et al. [23] developed the linear models using vegetation indices to provide an effective and rapid way to monitor plant density of cotton, which is significant for operational application at a regional scale. Zhang et al. [24] investigated the spectral features of four crops (cotton, corn, soybean, and sorghum) across different growing periods and soil conditions and was able to distinguish cotton from other crop types via hyperspectral data. Zhang et al. [25] discriminated crop types to detect cotton plants by fusing ground measurements and airborne data. In the yield estimation of cotton, Zhuang et al. [26] proposed a yield estimation model based on hyperspectral data for the monitoring of growth vigor and production forecasting of cotton. Attia et al. [27] investigated the responses of LAI, the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI), and the crop water stress index (CWSI) and their relationship with lint yield for four cotton cultivars grown under different irrigation regimes across two growing seasons. In the pest and disease monitoring of cotton, Fitzgerald et al. [28] discriminated between a healthy cotton field and an adjacent mite-damaged cotton field using hyperspectral imagery and spectral mixture analysis, and finally determined spatially explicit maps of mite damage severity. Jing et al. [29] analyzed the correlation of cotton leaf verticillium wilt severity level with raw, first derivative hyperspectral reflectance and hyperspectral characteristic parameters in order to build a monitoring model. Wu et al. [30] employed an improved spatial and temporal data fusion technique to generate an NDVI time series with a relatively high spatial resolution for the monitoring of cotton root rot. Additional researches on cotton includes the development of a method using a convolutional neural network (CNN) and dense point clouds to count cotton blooms using aerial RGB imagery by Xu et al. [31]. Westbrook et al. [32] obtained the spectral characteristics at different growth stages of regrowth cotton in order to detect the smallest regrowth via the linear spectral unmixing of airborne multispectral images.
Several studies have focused on the monitoring of cotton bolls and the usage of agricultural chemicals on cotton. Yang et al. [35] assessed the effectiveness of different defoliation strategies on cotton for harvesting by airborne multispectral imagery. Yang et al. [36] employed the green, red, and near-infrared bands of airborne color-infrared (CIR) digital imagery to derive four vegetation indices to evaluate the impact of different herbicide treatments for cotton regrowth control. Yeom et al. [37] proposed a novel cotton yield estimation framework that included a cotton boll extraction algorithm using super high resolution unmanned aerial vehicle (UAV) images to extract the spectral and spatial features of open bolls.
The monitoring of the boll opening process not only plays an important role in yield estimation, but also has a guiding significance in the use of harvest aids. Nevertheless, research on the boll opening using remote sensing is limited. Satellite imagery is the most commonly used remote sensing data type, and is associated with several advantages, such as a large coverage, the rich data volume and plenty of freely accessible data sources. Although there exists weather and resolution limitations, satellite remote sensing is still a popular and convenient approach in the crop investigation. Among the numerous satellites, the multispectral sensors of the Sentinel-2 satellite pair include relatively refined spatial, spectral, and temporal resolutions, and in particular, the red-edge bands can provide important information for agriculture research [38].
During the boll opening stage, there are plant leaves, open bolls and soil background in cottonproducing areas in a Sentinel-2 image. Each Sentinel-2 mixed pixel contains different proportions of three end members (plant leaves, open bolls and soil). Boll opening stage is the late growth stage of cotton. With the increase of boll opening rate, senescence of cotton plant occurs and the cotton leaves fall. With the increase of open bolls, the area occupied by cotton leaves decreased and the area of open bolls and exposed soil increase in Sentinel-2 imagery. Each object owns its characteristic spectrum. Therefore, there's a series of changes on mixed pixels spectral reflectance. Especially, the increasing open bolls have a great influence on the reflectance of pixels, which owns high reflectance on visible and near-infrared bands. Finally, the vegetation indices constructed by the sensitive bands can reflect the difference clearly and obviously.
The aim of this study is to develop effective and robust vegetation indices to estimate the boll opening status using Sentinel-2 imagery. The specific objectives are as follows: 1) To establish two new indices, namely the boll area ratio index (BARI) and the boll opening rate index (BORI), to estimate the boll opening status; 2) to evaluate the capability and robustness of the newly proposed BARI and BORI; and 3) to explain the relationship between agricultural manual measurements and remote sensing observations on the cotton boll opening.

Study Site
As shown in Figure 1a, b, the field experiment was carried out in Beiquan Town, Shihezi City, Xinjiang Uygur Autonomous Region, China (44°23'N, 85°58'E). The cotton cultivar was Jinmian 18#. The planting date was 29 April, 2018. The plant density in the experimental sites was 195,000 plants/ha. Cotton has been planted in the experimental fields for many years.
The region has a temperate continental climate with 8.7℃ average temperature and 203.4 mm annual rainfall. Summer is short and hot, with an average temperature of 24.4℃; autumn is short, with an average temperature of 8.5℃. A huge temperature difference between summer and autumn may lead to the decrease of cotton production or even the death of cotton. Therefore, it is necessary to spray harvest aids to accelerate the process of boll opening.
Cotton plants were planted according to the planting requirements of mechanical harvesting, the big row space and small row space were 66 and 10 cm, respectively. The distance between plants is 10.5 cm. The soil type is dark brown meadow loam.
The area of the whole study area was approximately 35.3 ha. The experimental site was divided into 5 sub-research areas, as shown in Figure 1b. The entire area of each sub-research area was divided into several similar-sized small plots, as shown in Figure

Boll Opening Rate Measurements
Ground-based measurements were conducted to determine the boll opening rate on September 6, 2018. The air temperature is 26.7℃, the air humidity is 30.95% and the wind speed is 1.8m/s in the experiment site on that day. Measurements were taken under sunny hours between 10:00-14:00 in Beijing time. More specifically, for each small plot, five sampling points were determined using a "Z" pattern, with six cotton plants chosen at each sampling point. The total bolls and open bolls of each plant were counted [9,10]. Based on the field measurements, the BOR is calculated according to the following equation: where BOR is the boll opening rate, NOB is the number of open bolls and NTB is the total number of bolls of the marked cotton plants in the tested zone. A total of 30 BOR data was collected for each plot. The average of the 30 BOR data was taken and used as the average plot BOR value for each plot. We investigated the relationship between VIs and BOR, from the data collected on sub-research areas No.1, No.2 and No.5, consisting of 48 small plots in total. The dataset was divided randomly into two subsets: 70.83% as the training dataset (34 small plots) and 29.17% as the validation dataset (14 small plots).

Boll Area Ratio Measurements
The Phantom 4 UAV (DJI, China) with the Sequoia integrated RGB sensor (Parrot, France) was used for the data acquisition of the experimental field plots on 6 September 2018. The experiment was done in sunny, cloudless weather between 11:00 and 13:30 in Beijing time. The raw data was recorded as a total of 544 raw images collected over the study area. All images were obtained at a flight height of 159.2m, with a spatial resolution close to 15 cm per pixel. We employed Pix4D (Pix4D SA) to generate an orthomosaic image from the raw images.
The random forest method was adopted for the classification of the UAV imagery. The overall classification accuracy of the image was 95.5%, and the user's accuracy of open cotton bolls was 96.3%. In this study, the concept of the boll area ratio (BAR) was defined, which is the ratio of the open bolls' area in one small plot and the area of that small plot in the UAV image. The BAR of each plot was determined based on the UAV image and the experimental design in Figure 1 as follows: where BAR is the boll area ratio, AOB is the total area of the open bolls in one small plot calculated by the UAV image, and ASP is the area of the single small plot. One BAR value for a single small plot was determined in the test zone. There was a total of 70 BAR data in the entire experimental site from the No.1-5 sub-research areas.
We used the data from all sub-research areas (No.1 to No.5) to determine the relationship between the VIs and BAR. The BAR values of all plots were divided randomly into two subsets: 70% as the training dataset (49 small plots) and 30% as the validation dataset (21 small plots).

Sentinel-2 Imagery
Sentinel-2 is an Earth observation mission, forming part of the Copernicus Programme developed by the European Space Agency [15]. Sentinel-2A was launched on 23 June 2015 and 2B on 7 March 2017 [38]. The revisit frequency of the Sentinel-2 satellite pair is 5 days, with 10 days as the temporal resolution of each satellite. Each Sentinel-2 satellite carries a sensor with 13 spectral channels located in the visible/near infrared (VNIR) and short wave infrared spectral range (SWIR). The spatial resolution is 10, 20 and 60 m. The three red-edge bands are centered at 704, 740 and 780 nm with a spatial resolution of 20 m. These three red-edge bands provide more significant information for the agricultural applications [15,16,21].
In this study, the Sentinel-2 multispectral image was acquired on 6 September 2018 from https://scihub.copernicus.eu/. Atmospheric correction of Sentinel-2 imagery was performed using the Sen2cor atmospheric correction toolbox, which is provided by the sentinel official website (http://step.esa.int/main/third-party-plugins-2/sen2cor). Furthermore, the spatial resolution of all bands was resampled to 10 m via the Sentinel Application Platform (SNAP, version 4.0) software [38].

Proposed Vegetation Indices
The average spectral reflectance of a cotton plant (with no open boll), cotton fiber of open bolls and bare soil is distinct. More specifically, the reflectance of the bare soil is low across all wavelengths, while that of the opening cotton boll is very high. The cotton leaves exhibit a spectrum typical to plants with a green peak and high near infrared reflectance. The spectral reflectance value of each band in each small plot was calculated using sentinel-2 data. A significance test was conducted between spectral reflectance values and BAR (BOR). As Table 1 showed, p-values were used to evaluate the significance level. The coastal aerosol (B1), green (B3), red edge1 (B5), water vapor (B9), and cirrus (B10) bands were removed from the study due to not passing the significant test. The SWIR1 (B11) and SWIR2 (B12) were also removed, owing to their general performance in the significant test. The remaining six bands were chosen for the construction of the new indices with p-values lower than 0.001 in the significant test.

Spectral Band p-value (BAR) p-value (BOR) B1
Coastal aerosol Not significant Not significant B2 Blue(B) *** *** B3 Green(G) Not significant Not significant B4 Red(R) *** *** B5 Red-edge1(Re1) Not significant Not significant B6 Red-edge2(Re2) *** *** B7 Red-edge3(Re3) *** *** B8 Near infrared (NIR) *** *** B8a Near infrared narrow (NIRn) *** *** B9 Water vapor Not significant Not significant B10 Shortwave infrared/Cirrus Not significant Not significant B11 Shortwave infrared1(SWIR1) ** * B12 Shortwave infrared (SWIR2) *** * * Significance at 0.05 level, ** Significance at 0.01 level, *** Significance at 0.001 level Polylines connecting the multispectral reflectance of small plots with different BOR are shown in Figure 2a, and that with different BAR are shown in Figure 2b. Increase in BOR and BAR values are associated with the increase in the red and blue bands reflectance, and the fall in the reflectance of the Re2, Re3 and near infrared bands, overall. At the same time, the reflectance change law of the green band is not obvious, and the reflectance of red-edge 1 band is basically unchanged. As shown in Figure 2, the change of the red-edge 3 band is more stable and regular than that of near-infrared bands. Therefore, in order to obtain abundant and stable information of cotton, we can combine the two visible bands (blue band and red band) and red-edge 3 bands to analyze. As shown in Figure 2a, the three vertices of triangle are the reflectance of band 2 (Blue), band 4 (Red) and band 7 (Red-edge3) from Sentinel-2 imagery, respectively. With the increase of BOR, the areas of triangles decrease. Based on the above, the formula of the new index, namely the boll opening rate index (BORI), is the triangle area and given as follows: The ratio of the triangle area to the reflectance of the red band is also sensitive to the estimation of the boll opening via remote sensing data. Thus, an additional index, namely the boll area ratio index (BARI), was developed as follows: In Equations (3) and (4), where S△ represents the area of the triangle, Ri indicates the reflectance of the i band (i represents blue, red, and Re3), and the values of 493, 665, 783 represent the central wavelength (nm) of the blue, red, and Re3 band, respectively. As in Equation (4), with the increase of the boll opening status, BORI, which is the triangle area, decreased, and the reflectance of the red band increased. Therefore, the decrease of BARI is more obvious.

Published vegetation indices used in this study
We selected 10 vegetation indices (Table 2) commonly used to routinely monitor vegetation growth for the comparisons with our proposed indices.

Analysis Method and Software
ArcGIS (version 10.3, ESRI, Redlands, CA, USA) was employed for the division of small plots, as well as the calculation of the VIs of each small plot. The raw UAV images were processed using Pix4D (Pix4D, Lausanne, Switzerland) in order to create an orthomosaic image. The orthomosaic image was imported using ENVI 5.3 (version 5.3 Harris Geospatial Solutions, Broomfield, CO, USA) to obtain the BAR data. Significant tests were performed using SPSS (version 20.0, IBM Corporation, New York, NY, USA). The regression model was established and validated via MATLAB (version 2016a, MathWorks, Inc., Natick, MA, USA). The coefficient of determination (R 2 ) and root-meansquare error (RMSE) selected to judge the performance of the established models were calculated in MATLAB. More details on the R 2 and RMSE are available in Richter et al. [49].

Relationship between VIs and BAR
The results of the BAR assessment with the published and newly proposed spectral indices are presented in Figure 3. In particular, the BAR values are distributed within 0-30%, and mainly concentrated in the range of 0-10%. High BAR values are quite rare as the boll opening rate did not reach high levels on 6th September, which is due to the little pixels that were occupied by the open bolls on the image in each plot.
For the prediction step of the experiment, multiple fitting models were used, with the optimal models obtained for different indices. In general, there was not much variation in the correlation between the VIs and BAR, with most correlations observed to be high, apart from RBRI, TCARI/OSAVI and MCARI/OSAVI. During the cotton boll opening process, the carpel walls separate along the sutures, making the white fiber visible. Moreover, senescence occurs and the cotton leaves fall, exposing more bare soil. This phenomenon can alter the reflectance of the visible, red edge and near infrared bands. The selected published VIs are commonly used to monitor growth changes and can thus accurately detect growth stage information. The proposed indices exhibited a significant exponential trend. In particular, BARI demonstrated the highest correlation with BAR (R 2 = 0.754, RMSE = 2.56%) among all indices, with BORI also correlating strongly with BAR (R 2 = 0.725, RMSE = 2.71%). In addition, NDVI ranked first among the published indices (R 2 = 0.745 RMSE = 2.60%). NDVI was followed by MSR (R 2 = 0.738, RMSE = 2.64%) and SR (R 2 = 0.730, RMSE = 2.68%). The R 2 values of these three VIs are larger than 0.7. MTVI1 (R 2 = 0.690, RMSE = 2.87%), DVI (R 2 = 0.673, RMSE = 2.95%) and MSAVI (R 2 = 0.672, RMSE = 2.96%) exhibited a moderate relationship with BAR, ranking 6th, 7th, and 8th respectively, among all indices. MTCI (R 2 = 0.616, RMSE = 3.20%) and RBRI (R 2 = 0.513, RMSE = 3.60%) demonstrated lower correlations, yet nevertheless, their scatterplots still exhibited an exponential trend. TCARI/OSAVI exhibited a weak correlation with BAR (R 2 = 0.239, RMSE = 4.50%). Unlike the other indices, RBRI and TCARI/OSAVI exhibited a positive correlation with BAR. Lastly, the very weak observed correlation for MCARI/OSAVI (R 2 = 0.118, RMSE = 4.85%) indicates that it is unlikely to effectively predict BAR values.
For the validation step of the experiment (Table 3), the newly proposed BARI demonstrated the highest accuracy in the BAR assessment, suggesting that BARI can accurately and robustly estimate BAR. Compared with the prediction results above, BORI (ranked 6th) showed slightly weaker results in the validation, yet still passed the significant test, with a p-value lower than 0.001. In addition, distinct from its prediction performance, MTCI exhibited the strongest relationship with BAR among all published VIs. Similarly, in contrast to its successful prediction performance, NDVI demonstrated slightly poorer results (ranked 5th). SR and MSR exhibited similar performances in both the prediction and validation, with SR slightly surpassing MSR. MTVI1 and DVI also performed poor. MSAVI, RBRI, and MCARI/OSAVI showed no significant correlation with BAR, indicating their possible insensitivity to BAR.

Relationship between VIs and BOR
BOR is a familiar term for farmers and agricultural technicians, and is also a common indicator used to guide the use of harvest aids. The BOR data derived from our field measurements on September 6, 2018 were within 0-60%. The BOR estimation results using the two newly proposed and 10 published indices are shown in Figure 4. A linear relationship between the indices and BOR can be observed. Similar to the BAR results, all indices, with the exception of RBRI and TCARI/OSAVI, were negatively correlated with BOR. The newly proposed BORI yielded the highest correlation with BOR (R 2 = 0.675, RMSE = 7.96%), while BARI ranked 5th among all VIs. MTVI1 was the best performing published index followed by DVI. In addition, MSR and SR ranked 6th and 7th, respectively. Unexpectedly, NDVI did not perform as well as the BAR retrieval, ranking 8th among all indices. As with the BAR retrieval, the performance of RBRI was unsuccessful, ranking 9th. The R 2 value of MTCI only reached 0.451, performing just as poor as in the BAR estimation. The scatterplots of MCARI/OSAVI and TCARI/OSAVI versus BOR were highly dispersive, which is in agreement with the BAR results. In the validation step, BORI and BARI both performed successfully. As shown in Table 4, BORI demonstrated the best accuracy in the validation of BOR (R 2 = 0.616, RMSE = 2.79%), with BARI ranking 3rd. Although MCARI/OSAVI and RBRI exhibited not bad R 2 and low RMSE values, they do not pass the significant test. NDVI performed moderately, with a better ranking than that for the BOR prediction. MSR and SR exhibited relatively weak correlations with BOR. This indicates their similar ability to estimate the boll opening status. Although MSAVI and MTVI1 were able to predict BOR successfully, their validation results were poor. In addition, MSAVI did not pass the significant test. Therefore, these two indices cannot be used to accurately retrieve BOR values. As with the BOR predictions, the performance of DVI in the validation step was very poor. TCARI/OSAVI demonstrated a weak correlation with BOR, and thus its poor performance in both the BOR and BAR assessment indicates that it is not correlated with the boll opening status. MTCI exhibited the most unsuccessful estimation in the assessment of BOR. Scatterplots of measured BOR versus predicted BOR of the four indices with the best performance are shown in Figure 5. BORI showed the best correlation with BOR, and the scatterplot of BORI is very clustered. However, it existed as somehow overestimated. BARI showed good estimation of BOR. The fitted line of its scatterplot was close to the 1:1 line. In addition, the phenomena of low value overestimation and high value underestimation is obvious. Both RBRI and MCARI/OSAVI have obvious overestimation. The estimation accuracy of MCARI/OSAVI is better than RBRI, but the overestimation is more obvious.

Relationship between BOR and BAR
In the results from Section 3.1 and 3.2, the relationship between the majority of the VIs and BAR (BOR) was observed as exponential (linear). Therefore, we conclude that BAR and BOR have an exponential relationship. Figure 6 demonstrates the scatter plot of BAR and BOR with a fitted exponential curve (R 2 = 0.612, RMSE = 3.52%), which is the same as the inference. The underlying mechanism behind this phenomenon is based on the process of cotton boll opening, which initiates from the bottom region of the cotton plant and subsequently moves to the upper parts. In this study, BAR was defined as the ratio of open bolls area in one small plot and the area of that small plot in the high-resolution UAV image (RGB image). In RGE images, the information from the upper layers of the crop is richer than lower layers, since the middle and bottom layers of the crop are covered due to the vegetation structure, especially after dense planting. In the early stage of boll opening, the boll opening begins to happen in the lower layers and BAR is small. As the BOR values increase, more opportunities of open bolls appear on the upper layers of the plant, and finally, the increasing speed of BAR keeps accelerating, as demonstrated in Figure 6.

Discussion
Boll opening status is a crucial indicator for the guidance of spraying harvest aids, as the timing and dosage of harvest aids directly affect cotton yield and quality. Remote sensing is an effective and accurate information acquisition approach to investigate boll opening status in a large scale. In this study, using Sentinel-2 imagery, two effective and robust indices BARI and BORI were developed to estimate the boll opening status.
Aiming at the widely recognized monitoring indicator BOR, the newly proposed index BORI was proposed. BORI performed the best both in the BOR prediction ( Figure 4) and validation (Table  4). In previous studies, the most commonly used method is manual investigation, in which five sampling points are randomly selected in each tested zone according to the method of "Z" pattern investigation, and five plants per sampling point are marked to count total bolls and open bolls [10]. However, manual investigation is difficult to adapt to large-scale and real-time monitoring due to it being time consuming, toilsome and subjective [50]. However, remote sensing can solve the problems that manual investigation caused. In the study of F. Aubrey et al., the BOR and vegetation indices chose were demonstrated a linear relationship [51]. In addition, visible atmospherically resistant index (VARI) with the best effect showed the best performance in all indices, in which R 2 is 0.62 and 0.3 in two datasets, respectively. In the present study, BORI showed better accuracy and robustness both in the prediction and validation. Compared to this study, the research of F. Aubrey et al. [51] was not robust and the models of two datasets were not uniform. The results of this study can achieve better results due to the most suitable bands in the Sentinel-2 MSI (i.e., B2, B4, and B7) being selected for the monitoring of BOR. The canopy spectra of cotton fields is mainly contributed by cotton plants (without open bolls), open cotton bolls and soil [28]. With the process of boll opening, the cotton canopy spectra will be affected by the soil and open bolls to a greater extent. The spectra in this study ( Figure 2) is basically consistent with Zhuang's study [26]. In addition, in Zhuang's research on the yield estimation, the best index is VARI_700, in which band choices are blue band, red band and the red edge band. Although the wavelengths are somewhat different, the study can prove that this study's choice of bands has a certain significance. Furthermore, the construction of the triangular index can combine the information of three bands, and make the changes of the total area of the triangle more obvious [16].
In this study, we first proposed the concept of BAR, which is a new indicator that can be obtained directly from UAV image data without extensive human investigation. At the same time, the BARI index based on the BAR indicator was also proposed to achieve the best performance both in the prediction step (R 2 = 0.754, RMSE = 2.56%) and the validation (R 2 = 0.706, RMSE = 0.500%). BARI exhibited its capability mainly on the account of the bands' selection and the index structure. The advantages of band selection are the same as that of the BORI mentioned above. Moreover, the construction of BARI is the ratio of the area of the triangle (BORI) to the value of the red band as Equation (4). The changes of the triangle area is opposite to the changes of the value of the red band, which increase the sensitivity of the index [21]. In previous studies, the status investigation of boll opening always used the BOR agronomic concept [10]. However, this agronomic indicator is conducive to on-site testing, limited to spot surveys, and not suitable for the investigation on a large scale. Thanks to the BARI firstly proposed, we can directly obtain the situation of boll opening status by UAV images. Also, we can calculate BARI using Sentinel-2 satellite images to infer the value of BAR, and finally obtain the boll opening status on a large scale.
In addition to the newly proposed indices mentioned above, a series of published indices were also selected, which have different performances on BOR and BAR. NDVI, SR and MSR performed not bad in the BAR and BOR estimations. Similarly, the indices construction all contain a red band and NIR band, which performed well in the significance test with BOR and BAR. The ratio form of three indices can partly avoid the influence of irradiance changes from the solar elevation angle, shadow, and negative effects of the atmosphere [16]. This is why DVI with the same band selection cannot obtain ideal results. MSAVI developed by Qi et al. [41] exhibited an average performance with BAR and BOR, which can eliminate the effects of the soil background. However, the elimination of soil leads to a poor performance because soil plays an important role in the reflectance changes in this study. In the estimation using three-band indices, MTVI1, MTCI, RBRI all contain bands that have not passed the significant test, as shown in Table 1. Even though the indices with three bands own richer band information, the bands that did not pass the significance test hindered the ability of the estimation. Red 1 band was used in the index construction of MTCI, and green band was used in MTVI1. Both Re1 band and green band were used in the construction of RBRI. Even if MTVI used a triangular construct similar to our newly proposed index BORI, it still cannot improve the estimation accuracy. Two combined vegetation indices, MCARI/OSAVI and TCARI/OSAVI, are used to reduce the effects of soil background and leaf area index in chlorophyll content estimation studies [47,48]. However, the leaf area and soil background in this study are two important factors. In addition, these two indices include the green and Re1 bands. Therefore, its poor correlations with BAR and BOR were observed.

Conclusions
We proposed and tested the ability of two new indices to reflect the cotton boll opening status with respect to two indicators; the newly proposed remote-sensing derived BAR and the commonlyused BOR. The two newly proposed indices and 10 frequently used spectral indices were selected to assess the behavior of BAR and BOR. In general, the two new indices exhibited accurate detection results, with their robustness much higher than other published spectral indices. The prediction and validation of BARI versus BAR was highly successful. BORI demonstrated optimal results for the BOR retrieval, both in the prediction and validation. At the same time, BARI achieved reasonable results in the prediction and validation of BOR, although slightly worse than BORI. The relationship between BAR and BOR was also determined, and the underlying mechanism behind this phenomenon was also discussed. The results of this study demonstrated the exponential growth relationship between BOR and BAR. This is consistent with the growth pattern of cotton. The process of cotton boll opening is always from the lower layer to the upper layer of the cotton plant [52]. The cracked boll is the first visual sign of boll opening, when the carpel walls separate along sutures. This usually occurs on the mature bolls on the lower layer of the cotton plants [53]. Following this, senescence accelerates and the cotton leaves fall. According to the definition of BAR, the boll opening information is difficult to obtain in the early stage of boll opening, especially in the case of dense planting. However, with the increase of BOR, open bolls gradually appeared on the upper layers. So, BAR would increase accordingly, and the rate of increase was increasing. This is consistent with the research results of the exponential growth relationship of BOR and BAR. Although the research on the cotton boll opening status obtained ideal results, some issues have not been taken into consideration. First, our indices do not consider the changes in leaf reflectance due to senescence. Second, the vertical structure of the cotton can also impact the spectral detection. These factors could be considered in future research.
In summary, BARI and BORI demonstrated their superiority in reflecting the cotton bolls opening status compared to published VIs. We proposed that BARI was suitable for BAR retrievals, while BORI was the optimal index for BOR retrievals.