Bootstrap Assessment of Crop Area Estimates Using Satellite Pixels Counting

: Crop area estimates based on counting pixels over classiﬁed satellite images are a promising application of remote sensing to agriculture. However, such area estimates are biased, and their variance is a function of the error rates of the classiﬁcation rule. To redress the bias, estimators (direct and inverse) relying on the so-called confusion matrix have been proposed, but analytic estimators for variances can be tricky to derive. This article proposes a bootstrap method for assessing statistical properties of such estimators based on information from a sample confusion matrix. The proposed method can be applied to any other type of estimator that is built upon confusion matrix information. The resampling procedure is illustrated in a small study to assess the biases and variances of estimates using purely pixel counting and estimates provided by both direct and inverse estimators. The method has the advantage of being simple to implement even when the sample confusion matrix is generated under unequal probability sample design. The results show the limitations of estimates based solely on pixel counting as well as respective advantages and drawbacks of the direct and inverse estimators with respect to their feasibility, unbiasedness, and variance.


Introduction
Official agricultural statistics usually rely on field surveys based on well-designed sampling plans. Unfortunately, developing countries face financial and organizational problems in conducting such periodic inventories [1], and the current free access to satellite imagery offers an attractive complementary or even alternative solution. For many years, remote sensing has been advocated for boosting the precision of such censuses [2], using the so-called "regression" estimator to reduce the sampling variance of the field surveys. Unfortunately, the relative efficiency of the approach remained limited [3].
More recently, entire image classification at country level provided cheap crop maps, and temptation became high to just proceed to crop classes pixel counting to obtain crop areas [4]. Unfortunately, image classification is subject to errors (omissions and commissions [5] so that the results obtained are generally biased although exempt from sampling variance.
Instead of conceiving the use of imagery as a way to reduce the sampling variance of estimates derived from agricultural surveys, the idea proposed in this paper is to use the ground survey data firstly to correct the bias of an exhaustive image classification and secondly to propagate the errors of the classification rule to derive precision for the crop area estimates.
In 1982, the direct estimator based on image classification was proposed [6]. Later, in 1988, the inverse estimator was introduced [7], leading to a discussion about how direct and inverse estimators should be chosen [8]. Both can be used to redress the biased "pixel counting" estimators by using the so-called confusion matrix of the classification rule. However, the problem of assessing their bias level as well as the variance of their estimates has not been addressed yet. In practice, no attempt to report variances is made because no variance formula is provided by [6][7][8]. The fact that the choice of the appropriate estimator depends on the sampling approach to the ground data collection has also not been discussed yet, representing another literature gap. In this paper, direct and inverse estimators are reviewed, and a discussion about how their feasibility in practice depends on the sampling strategy to collect data on the ground is provided. In addition, bootstrap is proposed as a statistical resampling method useful to assess both bias and variance of the direct and inverse estimators. A bootstrap algorithm based on information from a sample confusion matrix is introduced so as to properly consider the sampling strategy used to generate the sample data used by the estimators. The proposed method can be applied to any other type of estimator that is built upon confusion matrix information.
This paper is structured in six sections, including this introduction. Section 2 introduces types of errors and the direct and inverse calibration estimators. Section 3 discusses the feasibility of using the considered estimators in practice depending on how ground data are collected. Section 4 introduces a brief review on the bootstrap method and presents a bootstrap algorithm for assessing crop area estimates produced via confusion matrices. Section 5 illustrates the application of the proposed resampling method to assess statistical properties of estimates based only on pixel counting and estimates provided by the direct and inverse estimators. Section 6 includes concluding remarks.

Remote Sensing Estimates
Estimation of crop areas using pixel counting is subject to at least two sources of errors: mixed-borders pixels and misclassification of pure pixels. Considering the territory of interest is completely covered by satellite imagery, and assuming the effect of mixed border pixels can be neglected, the bias due to misclassification of pure pixel counting on crop area estimates can be defined with respect to an error matrix.
Consider the classification of images of the whole territory of interest leading to the identification of M classes of land covering so that R = (A +1 , A +2 , . . . , A +M ) is an M × 1 column vector with the total area of pixels classified in each type of class. The actual areas are represented by T = (A 1+ , A 2+ , . . . , A M+ ) , an M × 1 column vector with the truth total area of classes found on the ground. The error matrix, also called confusion matrix, is an M × M matrix A so that its elements are areas classified according to remote sensing image and ground truth: The elements of the error matrix A, denoted by A gc , are areas of class g (ground), also classified as class c by pixel counting. All these components can be arranged into a table with the same structure as the partitioned matrix Q, which is given by: . Table 1 illustrates an example of such a matrix in tabular form, with only three classes of land cover: wheat, corn, and soy. Such structure assumes the image classification algorithm is capable of correctly identifying the classes on the ground (absence of clouds).
In Table 1, A 11 , A 22 , and A 33 represent area estimates of wheat, corn, and soy, respectively, using pixels correctly classified as the corresponding ground truth data. In addition, A 21 and A 31 represent area estimates of wheat based on pixel counting that are indeed areas of corn and soy, respectively. The summation (A 21 + A 31 ) = Φ 1 is called the commission error related to the wheat area estimate. On the other hand, A 12 and A 13 are wheat ground area that were mistakenly estimated as corn and soy areas, respectively, using pixel counting. The summation (A 12 + A 13 ) = Ψ 1 is called the omission error related to wheat estimate. The vectors R and T are, respectively, the row and the column marginals of the Table 1, while A ++ represents the total number of pixels classified over the considered territory.

Total Area and Bias Estimation
Error matrix A and column vector T are unknown in practice. T is the parameter of interest. Estimating the total crop areas of T based only on R is subject to a bias given by the difference between the commission and the omission errors. Let R c represent the crop c area estimator based solely on pixel counting and T c be the ground truth area of the same crop. Define B c = B c (R c , T c ) as the bias of the estimator R c for the total area T c . Let Φ c = A +c − A cc and Ψ c = A c+ − A cc be the commission and the omission error related to such estimate, respectively. Then, Denoting by B = (B 1 , B 2 , . . . , B M ) , the column vector of biases for each of the M classes' estimates, one can write: Define D R = diag(A +1 , A +2 , . . . , A +M ) as the diagonal matrix with the R information in the main diagonal. Let the following relative error matrix be defined: E g|c is a matrix with conditional probabilities that a pixel is over the ground truth class g given the pixel is classified as c. Its column vectors are the relative columns frequencies of the table based on matrix Q. Hence, Based on expression (3), ifÊ g|c denotes an unbiased estimator for E g|c , then one can usê as an unbiased estimator of T. Following a similar reasoning, let D T = diag(A 1+ , A 2+ , . . . , A M+ ) be the diagonal matrix with the T information in the main diagonal and E c|g the relative error matrix given by: E c|g is a matrix with conditional probabilities that a pixel is classified as class c given the ground truth class is g. Its column vectors are relative row frequencies of the table based on matrix Q. Therefore, If E c|g is non-singular, then it is possible to write IfÊ −1 c|g is an unbiased estimator for E −1 c|g , one can usê as an unbiased estimator for T. Approximately unbiased estimatorsÊ g|c andÊ −1 c|g can be defined depending on the availability of further data. Suppose ground information can be observed for a sample of n test points, following Gallego's recommendations [5] (p. 252) so that a sample M × M error matrix a is available: The elements a gc of the sample error matrix a are areas of class g (ground) classified as class c by pixel counting over a set of sampling testing points. a provides information that can be used to estimate A.
Let r = (a +1 , a +2 , . . . , a +M ) be an M × 1 column vector with the total area of sample test pixels classified in each type of class. Let t = (a 1+ , a 2+ , . . . , a M+ ) be an M × 1 column vector with the truth total areas of classes found on the sample ground points. All these components can be arranged into a table with the same structure as the partitioned matrix q, which is given by: with a ++ = n. Define the following diagonal matrices based on q: D r = diag(a +1 , a +2 , . . . , a +M ) and D t = diag(a 1+ , a 2+ , . . . , a M+ ). Then, one can defineÊ g|c = e g|c , andÊ −1 c|g = e −1 c|g so that: The two estimators previously written asT 1 andT 2 assume the form of the known direct and inverse calibration type estimators, respectively, defined by: The direct calibration estimator: The inverse calibration estimator: for nonsingular e c|g . The bias B for using only R as crop area estimates for T can then be assessed depending on the calibration estimator used:B orB T Direct andT Inverse are nonlinear estimators. Analytical expressions for their variances depend on how the sample of test points that generated the matrix q was selected and may not be simple to derive. Although Taylor linearization technique could be applied to assess the direct estimator, assessment of the inverse estimator variance is not a simple task. In this paper, a bootstrap algorithm to estimate the variances of both estimators is introduced, which takes into account unequal probability sample designs. The same algorithm idea can be adapted to assess any estimator built upon confusion matrices' information.

Feasibility of Estimators in Practice
Direct and inverse calibration estimators rely on estimates of conditional probabilitieŝ E g|c = e g|c , andÊ −1 c|g = e −1 c|g , respectively. Let P(G = i, C = j), i = 1, 2, . . . , M, j = 1, 2, . . . , M be the joint probability of a pixel being on the ground of crop i and being classified as crop j. Let P(G = i) be the probability that a pixel is over a ground truth crop i and P(C = j) be the probability that a pixel is classified as crop j. Then, it is possible to write: Stats 2022, 5

427
The direct calibration estimator defined in (8) relies on the relationship between the joint probabilities and the conditional probabilities expressed by (12), while the inverse calibration estimator defined in (9) relies on the relationship described in Equation (13).
The need to reach the joint distribution P(G = i, C = j) is addressed through the conditional probabilities P(G = i|C = j) and P(C = j|G = i), estimated by a sample of test points. In practice, the following strategies to collect testing points could be considered: Strategy 1. Bivariate classification of points (Bivariate): To randomly select a set of geographical coordinates in the region of interest and then to observe their category in the image (image classification) and over the field (ground truth).
The Bivariate strategy of classification provides direct information of the joint probabilities P(G = i, C = j), allowing the choice of using eitherT Direct orT Inverse . However, field work costs involved with this procedure may lead to its non-feasibility in practice. The RS strategy of classification can provide information about the conditional probabilities P(G = i|C = j) , allowing the use of the direct estimatorT Direct . However, such strategy may also represent a challenge to be implemented in practice due to field costs. If this strategy is used, it is not possible to estimate P(C = i|G = j) , and so the inverse estimator has no theoretical basis to be chosen. The G strategy of classification provides information about the conditional probabilities P(C = j|G = i) so that the inverse calibration estimatorT Inverse can be used. If this strategy is implemented, it is not possible to estimate P(G = i|C = j) , so there is no theoretical justification to choose the direct estimator even though considering organizing the field work this way tends to be more cost-effective. Therefore, the choice of the G strategy of classification seems to be the one with most appeal to be used in practice.
It is possible that both direct and inverse estimators could suffer instability when the number of pixels observed is not large enough [8]. Agreement between the interpretation of the classification rules and the surveyed ground classes is essential. Inverse estimator is not feasible if nearly singular sample confusion matrices are observed. This can be a consequence, for example, of a poor classification algorithm. Inversibility of the sample confusion matrix is ensured when the classification rule is deemed minimally acceptable in practice [9] so that: Further, caution is also needed when the image classification imposes the use of more categories (i.e., a class corresponding to cloudy areas) than the reality seen on the ground, leading to rectangular sample confusion matrices. Inverse estimators based on completing such matrices with zeroes are not guaranteed to work. In addition, an extra source of care is needed concerning the sample design used to select the testing points. Strategies 1 to 3 mention the use of random samples in the sense that probability sample designs are employed. Such designs may range from equal probability sampling to more complex selection methods. The bootstrap method proposed in this paper considers the possibility that an unequal probability sample design is used to generate the sample confusion matrix. In case of unequal probability sampling, the direct and inverse estimators also have to be weighted in function of inclusion probabilities. In this case, the sample confusion matrix a = â gc must be composed by design-consistent estimators using sampling weights defined as the inverse of inclusion probabilities π k .

Bootstrap Resampling
Bootstrap is a computer-intensive statistical methodology introduced by Efron in 1979 [10], which replaces complex analytical procedures by computer intensive empirical analysis. It relies on Monte Carlo method, where several random resamples are drawn from a given original sample. The bootstrap method has been applied in a variety of situations (e.g., [11][12][13][14]). Several authors provide comprehensive discussion of the bootstrap method. Beaumont [15], Efron and Tibshirani [16], Hersterber [17], and Shao and Tu [18] are some of them.
Consider a random sample y = (y 1 , . . . , y n ) T of size n, where each element is a random draw from the random variable Y, which has the distribution function F = F(θ), and θ is the parameter that indexes the distribution. Here, θ is viewed as a functional of F, i.e., θ = T(F). Letθ be an estimator of θ based on y so that it is possible to writeθ = S(y). The application of the bootstrap method consists in obtaining a large number of pseudosamples y * = y * 1 , . . . , y * n T from the original sample y and then extracting information from these pseudo-samples to improve inference. In principle, there are two different ways of obtaining and evaluating bootstrap estimates: non-parametric bootstrap, which does not assume any distribution of the population, and parametric bootstrap, which assumes a particular distribution for the sample at hand [19]. In the parametric version, the bootstrap samples are obtained from F θ , which is expressed here as Fθ, whereas in the nonparametric version, they are obtained from the empirical distribution functionF through sampling with replacement. The nonparametric bootstrap does not entail parametric assumptions.
Let B F θ , θ be the bias of the estimatorθ = S(y); that is, where the subscript F indicates that expectation is taken with respect to F. The bootstrap estimators of the bias in the parametric and nonparametric versions are obtained by replacing the true distribution F, which generated the original sample, with Fθ andF, respectively, in (13). Therefore, the parametric and nonparametric estimates of the bias are given, respectively, by: and If B bootstrap samples y * 1 , y * 2 , . . . , y * B are generated independently from the original sample y, and the respective bootstrap replications (θ * 1 ,θ * 2 , . . . ,θ * B ) are calculated Therefore, the bootstrap bias estimates based on B replications ofθ are: for the parametric and nonparametric versions, respectively. By using the two bootstrap bias estimators above, it is possible to obtain estimates that are bias-corrected up to order O n −1 . In addition, bootstrap variances can be assessed using:

A Bootstrap Algorithm for Crop Area Estimates' Assessment
Monte Carlo simulation based on artificial data has been used to compare the performances of crop area estimators [20]. Bootstrap has been applied to assess accuracy of agricultural land classifications relying on resampling points on the entire territory of interest [21]. In this paper, the interest is to explore the potential for using bootstrap methods to assess crop estimates generated by estimators that are function of confusion (error) matrices, such as the direct and the inverse estimators. Hence, a bootstrap algorithm is proposed, built upon the use of confusion matrix sample information.
Define U as the set of all N pixel points needed to cover the entire territory of interest. The elements (pixel points) of such set need not be identifiable, but they are rather deemed to be used to build the matrix Q.
Direct and inverse estimators presented depend upon data provided by a sample S ⊂ U of n testing pixel points to compose a sample confusion matrix a. Consider sample S is selected from U using one of the three strategies listed on the last section by a probability sampling design p(.). Define I k = 1 if pixel k ∈ S and I k = 0 if pixel k / ∈ S so that π k = P(I k = 1) is the first order inclusion probability for k ∈ U.
LetQ(S) = q 0 be the sample matrix estimate for the matrix Q based on the probability sampling design p(.) so that: a 0 = â gc is the sample confusion matrix built upon the classification of all n pixel points in S. Let γ k (g, c) = 1 if pixel k is classified as crop g on the ground and class c by the satellite image; otherwise, γ k (g, c) = 0,. Then, each element of a 0 can be written as: In 2020, Conti et al. provided the asymptotic theory needed to support selecting bootstrap samples in two phases where, in the first phase, a pseudo-population U * is predicted and calibrated to the size of the original population U, and in the second phase, bootstrap resamples are selected conditionally from U * based on the original sampling design p(.) [22]. The following are proposed algorithms adapted from them to fit the crop area estimation scenario: Bootsrtrap Algorithm to be Used with a Bivariate Strategy of Point Classification: Step 1. Build a multinomial pseudo-population: Let i = 1, 2, . . . , N be a sequence of independent trials where for each i, a pixel k ∈ S is selected for the pseudo-population with probability p k = π k / ∑ j∈S π j . If pixel k is selected for the pseudo-population, information regarding the two-way classification of pixel k is also retained. Let δ ik = 1 if pixel k ∈ S is selected at the i-th trial; otherwise, δ ik = 0. Recall that γ k (g, c) = 1 if pixel k is classified as crop g on the ground and class c by the satellite image; otherwise, γ k (g, c) = 0. Then, the retained two-way classification of a pixel i ∈ U * , γ i (g, c) can be written as: The pseudo-population U * is then composed by the set of N pixels selected from S and their respective classification γ i (g, c), for i = 1, 2, . . . N.
Step 2. Select the b-th bootstrap sample: Select a probability sample of n pixels from the pseudo-population S b ⊂ U * using the same sample design p(.) used to generate the original sample S ⊂ U. This means to select S b with inclusion probability π i = ∑ k∈S δ ik π k . Keep the values γ i (g, c) for those i ∈ S b .
Step 3. Build the b-th bootstrap sample confusion matrix a b = a and calculate: and a (b) gc .
Step 5. Calculate the b-th bootstrap estimates and keep their values: Let R 0 be the observed crop area estimates for the territory of interest based solely on pixel counting from satellite image. R 0 is the only component of Q that is known: Calculate the b-th bootstrap estimates using: andT Step 6. Repeat steps 2 to 5 for b = 1, 2, . . . B, where B is the desired number of bootstrap replicated samples.
Step 7. Calculate the bootstrap estimates and respective variances: T Bootstrap Inverse One should note that the possibility of calculating the two bootstrap estimators, direct and inverse, relies on the fact that a Bivariate strategy of classification of testing points is used, as described in Section 3. However, as discussed in the same section, a G strategy of classification of points is the one that is feasible in practice. If the G strategy is adopted, the bootstrap algorithm steps 1 and 2 must be implemented for each ground category g = 1, 2, . . . , M independently. The modified steps 1 and 2 can be written as: Modified Bootstrap Algorithm Steps for Using with G strategy of Point Classification: Step 1. Build a multinomial-product pseudo-population for a G strategy: Let S g be the sample of n g = n/M test points selected using the G strategy to compose the sample confusion matrix a for g = 1, . . . , M based on the probability sample design p g (.). Let i = 1, 2, . . . , N/M be a sequence of independent trials, where for each i, a pixel k ∈ S g is selected for the pseudo-population stratum U g with probability p k = π k / ∑ j∈S g π j . If pixel k is selected for the pseudo-population stratum U g , information regarding the two-way classification of pixel k is also retained. Let δ ik = 1 if pixel k ∈ S g is selected at the i-th trial, and δ ik = 0 otherwise. Recall that γ k (g, c) = 1 if pixel k is classified as crop g on the ground and class c by the satellite image; and γ k (g, c) = 0 otherwise. Then, the retained two-way classification of a pixel i ∈ U * g , γ i (g, c) can be written as: The pseudo-population stratum U * g is then composed by the set of N/M pixels selected from S g and their respective classification γ i (g, c) for i = 1, 2, . . . N/M. The pseudopopulation is so that U * = ∪ N/M g=1 U * g , with all two-way classification information retained.

Step 2. Select the b-th bootstrap sample under the G strategy:
Let U g and U * g be the set of pixels in the population stratum g and pseudo-population stratum g, respectively. For each g = 1, 2, . . . , M, select a probability sample of n/M pixels S (b) g ⊂ U * g using the same sample design p g (.) used to generate the original sample S g ⊂ U g . This means to select S (b) g from U * g with inclusion probability and keep the values γ i (g, c) for those i ∈ S (b) g . In this case, only the bootstrap inverse estimator makes sense to be calculated. Therefore, in step 4, only Equation (24) should be calculated, and in step 5, only the inverse estimator defined by Equation (26) should be calculated. In step 7, only Equations (29) and (30) apply.
If the RS strategy is used instead, then the bootstrap algorithm steps 1 and 2 must be implemented for each class category c = 1, 2, . . . , M independently. The modified steps 1 and 2 are given by: Modified Bootstrap Algorithm Steps for Using with RS strategy of Point Classification: Step 1. Build a multinomial-product pseudo-population for an RS strategy: Let S c be the sample of n c = n/M test points selected using the RS strategy to compose the sample confusion matrix a for c = 1, . . . , M based on the probability sample design p c (.). Let i = 1, 2, . . . , N/M be a sequence of independent trials, where for each i, a pixel k ∈ S c is selected for the pseudo-population stratum U c with probability p k = π k / ∑ j∈S c π j . If pixel k is selected for the pseudo-population stratum U c , information regarding the two-way classification of pixel k is also retained. Let δ ik = 1 if pixel k ∈ S c is selected at the i-th trial, and δ ik = 0 otherwise. Recall that γ k (g, c) = 1 if pixel k is classified as crop c on the ground Stats 2022, 5 432 and class c by the satellite image; and γ k (g, c) = 0 otherwise. Then, the retained two-way classification of a pixel i ∈ U * c , γ i (g, c) can be written as: The pseudo-population stratum U * c is then composed by the set of N/M pixels selected from S c and their respective classification γ i (g, c) for i = 1, 2, . . . N/M. The pseudopopulation is so that U * = ∪ N/M c=1 U * c , with all two-way classification information retained.
Keep the values c . In this case, only the bootstrap direct estimator makes sense to be calculated. Therefore, in step 4, only Equation (23) should be calculated, and in step 5, only the inverse estimator defined by Equation (25) should be calculated. In step 7, only Equations (27) and (28) apply.

Application and Results
The proposed method is illustrated using data built upon the joint probabilities described in Table 2. The numbers correspond to a scenario where the crop areas of wheat, rapeseed, corn, sugar beet, and others are present in a given territory in the proportions of 25%, 5%, 10%, 20%, and 40%, respectively. The classification rule is such that 0.2/0.25 = 0.8 of the area with wheat is correctly classified. The proportion of correct classification for rapeseed, corn, sugar beet, and others are 0.6, 0.8, 0.6, and 0.6, respectively, all satisfying the condition to be practically minimally acceptable [9]. It is assumed the area of the territory of interest is covered by 1 million pixels (N) that are classified generating an estimate of areas by pixel counting so that the proportion of area cultivated by wheat is 31.6%, by rapeseed 9.5%, by corn 13.5%, by sugar beet 16%, and by other crops (others) 29.4%. Such estimates reveal the classification rule overestimates wheat, rapeseed, and corn and underestimates sugar beet and others. The set of 1 million pixels is the population U, and the estimates based purely on pixel counting for this population is the R vector that corresponds to the marginal column of Table 2 multiplied by 1 million.
Assessment of purely pixel counting estimates and estimates generated by direct and inverse estimators was done for the three strategies of collecting test points, as described in Section 3.
Bivariate strategy was first investigated using simple random sampling of a thousand pixels that are classified based on the ground and based on the satellite image, generating Table 3.  Table 3 information is used to generate a sample confusion matrix a with crop area estimates. For example, the confusion matrix area for the classification g = 1, c = 1, is provided by:â 11 = 1,000,000 × ∑ i∈S γ i (1, 1) 1000 = 1000 × 201 = 201,000.
The elements of the confusion matrix can also be represented by the estimated proportion of each crop area classification. In such a case, The confusion matrix is then used to provide input for the direct and inverse calibration estimators.
The proposed bootstrap algorithm was used to first generate a pseudo-population U * of size 1 million, built upon the set of test points classified in Table 3 and then, at each replicate b, to select a new set of 1000 points using strategy 1 and simple random sampling to generate a b and to compose the direct and inverse estimates.
RS strategy of classification of selected test points was then investigated using independent simple random sampling of two hundred pixels within each class of crop classified by remote sensing. This corresponds to use the columns of Table 2 as strata to select the test points. The selected points are then classified based on the ground, generating Table 4. Table 4 information can only be used to generate a sample confusion matrix a with crop area estimates based on probabilities conditioned to the remote sensing classification (columns). For example, the confusion matrix proportion area for the classification g = 1, given a pixel is classified as c = 1, is provided by: Therefore, it only makes sense to use such information to calculate estimates by the direct estimator. The proposed bootstrap algorithm, with modified steps 1 and 2 to strategy 2, was used to first generate a pseudo-population U * of size 1 million, built upon the set of test points classified in Table 4 and then, at each replicate b, to select independently a new set of 200 points for each class of remote sensing using simple random sampling. The data generate a b , and estimates are calculated using the direct estimator. Although inappropriate, the inverse estimator was also calculated for each bootstrap replicate for the sake of illustration.
G strategy was studied using independent simple random sampling of two hundred pixels within each class of crop classified on the ground. This corresponds to use the rows of Table 2 as strata to select the test points. The selected points are then classified based on remote sensing, generating Table 5. Table 5. Test points classification using G strategy, with 200 pixels selected by simple random sampling from each set of ground class independently. Wheat  163  14  3  6  14  200  Rapeseed  37  124  0  0  39  200  Corn  3  25  150  15  7  200  Sugar beet  7  15  37  117  24  200   Ground  truth  classes  Others  57  15  3  12  113  200   Table 5 information can only be used to generate a sample confusion matrix a with crop area estimates based on probabilities conditioned to the ground classification (rows). For example, the confusion matrix proportion area for the classification c = 1, given a pixel is on the ground class g = 1, is provided by:

Remote Sensing Classification Total Wheat Rapeseed Corn Sugar Beet Others
Therefore, it only makes sense to use such information to calculate estimates by the inverse estimator.
The proposed bootstrap algorithm, with steps 1 and 2 modified to strategy 3, was used to first generate a pseudo-population U * of size 1 million, built upon the set of test points classified in Table 5 and then, at each replicate b, to select independently a new set of 200 points for each ground class using simple random sampling. The data generate a b , and estimates are calculated using the inverse estimator. Although inappropriate, the direct estimator was also calculated for each bootstrap replicate for the sake of illustration.
All the strategies were evaluated using 1000 bootstrap replicates. Figure 1 summarizes the results for each one, describing the bootstrap distribution of the direct and inverse estimators for each crop. points classified in Table 5 and then, at each replicate , to select independently a new set of 200 points for each ground class using simple random sampling. The data generate , and estimates are calculated using the inverse estimator. Although inappropriate, the direct estimator was also calculated for each bootstrap replicate for the sake of illustration.
All the strategies were evaluated using 1000 bootstrap replicates. Figure 1 summarizes the results for each one, describing the bootstrap distribution of the direct and inverse estimators for each crop. In Figure 1, the dotted line represents the parameter of the population (ground truth, ), and the solid line represents the estimate based purely on pixel counting ( ). Strategy 1 uses a multinomial selection to compose the confusion matrix and hence allows for the use of either direct or inverse estimators. Focusing on the results of strategy 1, it is possible to see the bias of the estimates provided by , based on pixel counting. Clearly, using direct or inverse estimates results in improvement over the ones from . Considering crop areas of sugar beet, both estimators (direct and inverse) provide practically unbiased estimates, with the direct estimator showing smaller variance. For corn, a small bias is noted for both direct and inverse estimators, with direct estimator providing smaller variance. For rapeseed, wheat, and others, one can see that the direct estimator performs better than the inverse, showing smaller variance and bias.
The analysis of the results under the RS strategy must consider the fact that the confusion matrix was built based on a multinomial product selection of points per image classification. Hence, when analyzing the results in Figure 1, only the direct estimator makes sense to be calculated. For all crops, the estimates provided by the direct estimator represent an improvement over estimates provided by , based on pixel counting in the sense that it redresses considerably the bias. For wheat, rapeseed, and others, the direct estimator shows smaller bias than for corn and sugar beet. The use of the inverse estimator In Figure 1, the dotted line represents the parameter of the population (ground truth, T), and the solid line represents the estimate based purely on pixel counting (R). Strategy 1 uses a multinomial selection to compose the confusion matrix and hence allows for the use of either direct or inverse estimators. Focusing on the results of strategy 1, it is possible to see the bias of the estimates provided by R, based on pixel counting. Clearly, using direct or inverse estimates results in improvement over the ones from R. Considering crop areas of sugar beet, both estimators (direct and inverse) provide practically unbiased estimates, with the direct estimator showing smaller variance. For corn, a small bias is noted for both direct and inverse estimators, with direct estimator providing smaller variance. For rapeseed, wheat, and others, one can see that the direct estimator performs better than the inverse, showing smaller variance and bias.
The analysis of the results under the RS strategy must consider the fact that the confusion matrix was built based on a multinomial product selection of points per image classification. Hence, when analyzing the results in Figure 1, only the direct estimator makes sense to be calculated. For all crops, the estimates provided by the direct estimator represent an improvement over estimates provided by R, based on pixel counting in the sense that it redresses considerably the bias. For wheat, rapeseed, and others, the direct estimator shows smaller bias than for corn and sugar beet. The use of the inverse estimator is not appropriate in this scenario, and one can see that its performance does not represent improvement over the R estimates.
The last analysis refers to the G strategy to classify selected testing points. Under this strategy, confusion matrix is built based on a multinomial product selection of points per ground classification. Therefore, only the inverse estimator makes sense to be calculated. Focusing on the results for the G strategy in Figure 1, one can see that for all the crops, using the inverse estimator represents improvements over the use of R in the sense of redressing the bias of using only pixel counting estimates. The use of the direct estimator is not appropriate in this case, and one can see that for rapeseed, corn, and others, the direct estimator is not acting to diminishing bias. Although for sugar beet and wheat, the direct estimator has shown a similar effect of bias reduction as the inverse estimator and with smaller variance, its composition has no support under this scenario and should be avoided. Figure 2 allows for an analysis emphasizing the effect of the different strategies for selecting test points over the crop area estimates of corn. In this case, when the Bivariate strategy is adopted, direct and inverse estimators offer improved estimates in the sense that they show smaller bias than the estimate based on pixel counting alone. The use of either estimator is theoretically justified, and the direct estimator shows smaller variance than the inverse one. direct estimator is not acting to diminishing bias. Although for sugar beet and wheat, the direct estimator has shown a similar effect of bias reduction as the inverse estimator and with smaller variance, its composition has no support under this scenario and should be avoided. Figure 2 allows for an analysis emphasizing the effect of the different strategies for selecting test points over the crop area estimates of corn. In this case, when the Bivariate strategy is adopted, direct and inverse estimators offer improved estimates in the sense that they show smaller bias than the estimate based on pixel counting alone. The use of either estimator is theoretically justified, and the direct estimator shows smaller variance than the inverse one. Continuing to analyze Figure 2, if the RS strategy is adopted, then only the direct estimator is theoretically justified. Indeed, the bootstrap distributions show that only the direct estimator was able to redress considerably the bias for corn area estimates compared to the estimate based solely on pixel counting ( ). On the other hand, if the G strategy is adopted, only the use of the inverse estimator can be justified. Indeed, one can see that for the G strategy, the performance of the estimators shows that only the inverse estimator was able to diminish bias for corn area estimates. In such case, the inappropriate use of the direct estimator leads to an increase in bias. It should be noted, however, that only the G strategy has an appeal to be used in practice, as argued in Section 3. Continuing to analyze Figure 2, if the RS strategy is adopted, then only the direct estimator is theoretically justified. Indeed, the bootstrap distributions show that only the direct estimator was able to redress considerably the bias for corn area estimates compared to the estimate based solely on pixel counting (R). On the other hand, if the G strategy is adopted, only the use of the inverse estimator can be justified. Indeed, one can see that for the G strategy, the performance of the estimators shows that only the inverse estimator was able to diminish bias for corn area estimates. In such case, the inappropriate use of the direct estimator leads to an increase in bias. It should be noted, however, that only the G strategy has an appeal to be used in practice, as argued in Section 3.
The type of analysis presented for corn in Figure 2 can also be done for the remaining considered crops. Although they are omitted due to space constraints, they are available through a GitHub directory. Tables with the summary of statistical properties for each estimator under each strategy are presented in the Appendix A. When analyzing the summaries of Tables A1-A3, one should keep in mind that offices of official statistics look for CVs around 5-10% at the regional level (10,000 km 2 ). Considering that 1 million pixels of Sentinel 1 (10 × 10 m 2 ) correspond roughly to only 100 km 2 and the fact that, very often, the obtained CVs are below 10%, it is possible to conclude that the obtained precision is well in line with the needs.

Concluding Remarks
Remote sensing has several potential uses in agricultural statistics [23][24][25], including estimating crop areas based on pixel counting over satellite images. Such estimates, based purely on pixel counting, are known to be biased [5,20], and their variance depends upon the error rates of the classification rule in use even though several studies still use them with no further assessment of their statistical properties. Keeping the accuracy of the estimates under some control [26] may depend on factors such as landscape and image resolution. Direct and inverse calibration estimators reviewed in this paper are built upon sample confusion matrix information and intend to redress the bias of estimates based on pixel counting. Several studies comparing both estimators describe the direct estimator as presenting the best performance [26,27]. There are instances where the inverse estimator shows smaller variance [20]. In this paper, we emphasized the fact that the feasibility of each estimator in practice depends upon the chosen strategy to collect and classify testing points on the field. Three strategies were discussed in this paper: the Bivariate, the RS and the G strategy. The G strategy is presented as the one with more practical appeal. If the G strategy is carried out in practice, the only estimator that is theoretically supported is the inverse estimator. Even considering the appropriate sampling strategy, assessing their variance may not be a simple task, as it also depends on the complexity of the sample design used to build the confusion matrix. In order to cope with this problem, a bootstrap algorithm was introduced for each sampling strategy, based on information provided by confusion matrices, that considers unequal inclusion probabilities. A small simulation study was presented where the statistical properties of the considered estimators were assessed based on the proposed bootstrap algorithm. The results illustrate the effectiveness of the bootstrap resampling method to assess direct and inverse calibration estimators under appropriate strategies. The performances shown are in line with the theoretical expectations and with the results from other studies [20,21,[26][27][28]. The codes and main results of the simulation can be found in [29].

Acknowledgments:
The authors would like to thank the editors and three anonymous reviewers for their comments. They helped to improve the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Tables A1-A3 show the summaries of the performances of each estimator of crop area, assessed by the proposed bootstrap algorithm, with 1000 replicates. Each Table  corresponds to a strategy of sampling testing points in the ground. Estimates of area need to be multiplied by 1000 to represent the estimated total area in hectares. Estimators are assessed with respect to standard deviation (Std.Dev.), CV expressed in percentage, and absolute bias (Bias).