Validation of Land Cover Products Using Reliability Evaluation Methods

OPEN ACCESS. Remote Sens. 2015, 7.

Validation of land cover products is a fundamental task prior to data application. Current validation schemes and methods are, however, suited only to assessing classification accuracy and disregard the reliability of land cover products. The reliability of land cover products should be evaluated to provide reliable land cover information. In addition, the lack of high-quality reference data often constrains validation and affects the reliability results of land cover products. This study proposes a validation schema to evaluate the reliability of land cover products that comprises two methods, namely, result reliability evaluation and process reliability evaluation. Result reliability evaluation computes the reliability of land cover products using seven reliability indicators. Process reliability evaluation analyzes the reliability propagation in the data production process to obtain the reliability of land cover products. Fuzzy fault tree analysis is introduced and improved for the reliability analysis of the data production process. Research results show that the proposed reliability evaluation scheme is reasonable and can be applied to validate land cover products. Through the analysis of the seven indicators of result reliability evaluation, more information on land cover can be obtained for strategic decision-making and planning, compared with traditional accuracy assessment methods. Process reliability evaluation, which does not need reference data, can facilitate the validation and reflect the change trends of reliabilities to some extent.


Introduction
Land cover is one of the most critical variables of the earth system and affects various parts of the human and physical environments. It has a major role in the exchanges of energy, heat, and momentum between continents and the atmosphere [1]. Therefore, information on the state of land cover and land cover dynamics is much needed on a regular basis to support major scientific and policy applications. Land cover products that focus on characterizing different vegetation types have been developed via remotely sensed datasets. To provide reliable and accurate land cover information, validating land cover data should be a fundamental task prior to data application.
Efforts to improve the validation methods and schemes of land cover products have focused on sampling design, response design, and analysis [2]. Sampling designs differ in their suitability to achieve different objectives in the validation of land cover [3]. Mayaux et al. [4] used a two-stage stratified clustered sampling to validate the Global Land Cover 2000 Map. Stehman et al. [5] constructed a stratified sampling design incorporating class-level stratification to estimate the accuracy of land cover maps. Considerable research on response design has focused on reducing the effect of error in ground reference data to enable more accurate estimation in the validation of land cover products. A virtual field reference database was designed to support detailed assessments of land cover products by providing a robust database that characterizes representative cover types [6]. Foody et al. [7] used two sources of volunteered data to illustrate the potential of amateur activity in validating the forest cover representation provided by the GlobCover map of the European Space Agency. Foody et al. [8] also studied the impact of ground reference data error on the accuracy of estimates of the extent of change and on the accuracy of change detection. Wulder et al. [9] used a collection of airborne videos to validate sustainable development of forests' land cover maps. Estimation and analysis are commonly based on a confusion matrix that summarizes the key information obtained from sampling and response designs. The confusion matrix is mainly used to provide a basic description of land cover product accuracy and to compare accuracies [10]. Indices such as overall, users' and producers' accuracies derived from the confusion matrix are used to assess the accuracy in validating land cover products [4,11,12].
However, these validations only provide information on the classification accuracy of land cover data as a whole; they cannot be used to characterize the reliability of the data, which substantially limits the value of the data in use. In the field of system engineering, reliability is defined as the probability that an item can perform its specified function for a certain interval under given conditions [13]. Reliability of land cover products refers to the ability of the data to reflect the actual land cover conditions correctly, effectively, and completely under certain spatio-temporal conditions. Reliability reflects more characteristics of land cover data than classification accuracy does, and thus it has numerous impacts on the application of the data and affects strategic decisions and planning. Reliability evaluation is widely applied in various fields, such as power systems, networks, software, and pipelines [14-17]. In the field of geographical information science and remote sensing, Zhang et al. [18] proposed a fusion approach of several typical change-detection algorithms based on the reliability of individual change-detection algorithms to generate reliable land cover change information. Based on the research of uncertainties in spatial analysis [19], Shi et al. [20] introduced reliability theory into spatial analysis to obtain more accurate and reliable spatial analysis results. Considerable attention should be directed toward the study of reliability evaluation of land cover products.
Validation is also often constrained by the lack of high-quality ground reference data [7]. Furthermore, the reference data are expensive and logistically challenging to collect for large area land cover products [21]. Reliability analysis on the data production process to obtain the reliability of data products can solve this problem to some extent because this analysis does not require the reference data, and the reliability propagation in the data production process determines the reliability of land cover products.
Thus, this paper proposes a new validation schema to evaluate the reliability of land cover data. The schema includes two validation methods, namely, result reliability evaluation and process reliability evaluation. Result reliability evaluation addresses a few reliability indicators to evaluate the reliability of land cover products. Process reliability evaluation obtains the reliability of land cover data through the reliability analysis on the process of data production without the need for reference data. Section 2 describes the proposed validation schema. Section 3 tests the validity of the proposed schema by providing applications in six typical study areas and gives the validation results. Discussions and conclusions are presented in the last section.

Methodology
The general framework of our validation schema is shown in Figure 1. The validation schema includes two methods: result reliability evaluation and process reliability evaluation. Sampling design should be conducted when evaluating the result reliabilities of a group of land cover maps or a certain land cover type in the group. The reliability of the sample unit (one map or a certain land cover type in the map) can then be obtained using seven reliability indicators and a weighted computation method. Process reliability evaluation is based on the process reliability analysis model to evaluate the reliability of one map. Given that the analysis does not need reference data, evaluating all maps in the group is relatively easy without the need for sampling design.

Result Reliability Evaluation
Result reliability evaluation uses reliability indicators to obtain the reliability of land cover data. Shi et al. [20] presented reliability indicators such as correctness, integrity, consistency, robustness, and applicability in reliable spatial analysis. Errors affecting the reliability of land cover data can be caused by confusion between land cover types (wrong label or missing types) or can be spatial errors (wrong position of the boundary between types or disappearance of small patches). Other factors such as acquisition scale, human factors, and acquisition time can also affect data reliability. Based on these factors and the characteristics of land cover data, we propose seven quantized indicators (Figure 1) and a weighted computation method to evaluate the reliability of land cover data. In the hierarchical classification system, the multilevel reliabilities of one map or a certain land cover type are supported. These reliability indicators are defined as follows:
(1) Classification correctness (C) refers to the proportion of correctly classified area in the land cover data. When computing multilevel reliabilities of the data, the error level should be judged according to the feature type and the actual land cover type to compute the incorrect area. Let $A_{c\_error}$ be the incorrectly classified area of the data, and $A_{total}$ be the total area of the data. The following formula can thus be obtained:
$$C = 1 - \frac{A_{c\_error}}{A_{total}} \tag{1}$$
(2) Scale reasonableness (S) represents the effect of scale on the reliability of the data. Let $A_{current\_scale}$ be the area of the data at the current acquisition scale, and $A$ be the actual area of the data. Thus, scale reasonableness can be computed by
$$S = \frac{A_{current\_scale}}{A} \tag{2}$$
(3) Integrity (I) reflects the effect of missing or extra features and feature types on the reliability of the land cover data. I can be defined as
$$I = 1 - w_1 \frac{N_{t\_error}}{N_{t\_total}} - w_2 \frac{A_{a\_error}}{A_{a\_total}} \tag{3}$$
where $w_1, w_2 \geq 0$ are the weighting parameters, $N_{t\_total}$ is the total number of feature types, $N_{t\_error}$ is the number of missing or extra feature types, $A_{a\_total}$ is the total area of the data, and $A_{a\_error}$ is the area of missing or extra features.
(4) Robustness (B) refers to the stability with which the land cover data maintain their reliability when evaluated by different evaluation professionals, who can be affected by several factors, such as evaluating experience, evaluating conditions, mental state, and psychological quality. Based on the variance of the evaluation results obtained by different experts, B is expressed as
$$B = 1 - \delta \sigma^2 \tag{4}$$
where $\sigma^2$ is the variance of the evaluation results and $\delta$ is a constant normally set to 10. If B is less than zero, then we assign zero to B.
(5) Consistency (K) refers to the level of congruence between the land cover data and the actual land cover condition. This paper adopts Cohen's kappa coefficient for the consistency measurement. Cohen's kappa coefficient is often applied as an index of classification accuracy to evaluate the consistency between the classification result and the imagery data [22]. Derived from a confusion matrix, K is expressed as
$$K = \frac{N \sum_{i=1}^{r} N_{ii} - \sum_{i=1}^{r} N_{i+} N_{+i}}{N^2 - \sum_{i=1}^{r} N_{i+} N_{+i}} \tag{5}$$
where $r$ denotes the number of land cover types, $N$ is the total number of features, $N_{ii}$ is the ith element on the main diagonal of the matrix, $N_{i+}$ is the sum of the ith row of the matrix, and $N_{+i}$ is the sum of the ith column of the matrix.
(6) Currency (T) measures the extent to which land cover data acquired earlier can reflect the actual land cover condition at the current time. Given that the actual land cover condition changes over time, the reliability of the data will decrease. Let $\alpha$ be the change ratio of the land cover condition from the data acquisition time to the data evaluation time. A large value of $\alpha$ represents a low reliability of the data. Thus, T is defined as
$$T = 1 - \alpha \tag{6}$$
(7) Position precision (P) represents the position offsets of the features in the land cover data relative to the actual land cover condition. P can be computed from the features with geometry displacement error and overedge error, i.e.,
$$P = 1 - w_1 \frac{N_{g\_error}}{N_{total}} - w_2 \frac{N_{o\_error}}{N_{total}} \tag{7}$$
where $w_1, w_2 \geq 0$ are the weighting parameters, $N_{g\_error}$ and $N_{o\_error}$ represent the number of features with geometry displacement error and overedge error, respectively, and $N_{total}$ denotes the total number of features of the data.
Finally, the reliability of the sample unit can be computed by
$$R = \sum_{i=1}^{7} w_i X_i \tag{8}$$
where $R$ is the reliability of the sample unit, $X_i$ is the ith aforementioned reliability indicator, and $w_i \geq 0$ with $\sum_{i=1}^{7} w_i = 1$ are the weights of the indicators, determined by the data application and professional knowledge.
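As a minimal sketch of the computation described above, the following Python code derives Cohen's kappa from a confusion matrix and combines seven indicator values into a single reliability by a weighted sum. The function names, the example confusion matrix, the indicator values, and the equal weights are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def cohen_kappa(matrix: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa from a square confusion matrix of feature counts."""
    n = sum(sum(row) for row in matrix)                    # total number of features
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    return (n * diagonal - chance) / (n * n - chance)


def weighted_reliability(indicators: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted sum of indicator values; weights are non-negative and sum to 1."""
    if len(indicators) != len(weights):
        raise ValueError("one weight is needed per indicator")
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * x for w, x in zip(weights, indicators))


# Hypothetical two-class confusion matrix (illustrative counts only).
kappa = cohen_kappa([[45, 5], [10, 40]])

# Hypothetical indicator values in the order C, S, I, B, K, T, P.
values = [0.99, 0.98, 1.0, 0.97, kappa, 0.95, 0.99]
reliability = weighted_reliability(values, [1 / 7] * 7)
print(round(kappa, 3), round(reliability, 3))
```

In practice the weights would be chosen according to the data application, so equal weights are only a placeholder here.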

Process Reliability Evaluation
Process reliability evaluation obtains the reliability of land cover products through the analysis of reliability propagation in the data production process without the need for reference data. High uncertainty and variability associated with the process of data production pose major challenges in the analysis of reliability propagation. In this paper, we introduce fuzzy fault tree analysis (FFTA) to analyze the reliability propagation in the production process of land cover data.
Fault tree analysis (FTA) is a logical and diagrammatic approach to represent the sequences and combinations of possible events occurring in a system that lead to the top undesired event [23]. This model is widely used to evaluate the reliability of complex systems in various fields, such as pipelines, aerospace, and the petrochemical industry [24]. FFTA incorporates fuzzy set theory into the FTA model for uncertainty analysis [25]. The probabilities of events are treated as fuzzy numbers, which can be obtained by fuzzy set theory [26]. In the process reliability analysis model of land cover data, FFTA is used to describe the relationships among various events in the data production system, and the probability of the top event is regarded as a function of the reliability probability of the system.

Construction of Process Reliability Analysis Model
Two methods are commonly used to produce land cover products [27]. One is visual interpretation based on professional knowledge, with the foundation geographical data and interpretation symbols as auxiliary information. The reliability of visual interpretation is affected by the operation staff and the data sources, including digital orthophoto images and foundation geographical data. The other method, computer classification, is applied to certain land cover types that have homogeneous color and texture, such as water bodies, built-up land, and sandy land. The reliabilities of the classification algorithms and the digital orthophoto images affect the reliability of the computer classification. Furthermore, the reliability of the digital orthophoto image is affected by the image source (i.e., spatial resolution, spectral type, and currency) and the image pre-processing (i.e., plane precision and overedge precision). Field survey is an auxiliary process of data production and is performed to identify the land cover type in regions where the type cannot be determined by visual interpretation or computer classification. Therefore, land cover data are derived from machine interpretation, visual interpretation, and field survey, and their reliability depends on the reliabilities of the three processes.
Based on the production process of land cover data and the FFTA, the process reliability analysis model was constructed to evaluate the reliability of land cover products (Figure 2). In the figure, circular frames denote the basic events and square frames denote the intermediate and top events. The evaluation model involves nine basic events, five intermediate events, and one top event. The reliability of land cover products was defined as the top event in this model. The reliabilities of basic events can be obtained by professional knowledge or quantitative methods. Through the reliabilities of basic events and the relationships among events, the reliability of intermediate events and the top event can be computed.

Reliabilities of Basic Events
To evaluate the reliability of the top event in the FFTA model, probabilities of the basic events, as the parameters of the model, must be known in advance [24]. Normally, probabilities of the basic events can be obtained according to expert knowledge and experience [28]. Based on research on the data production process in the production department, the following preliminary quantitative methods are presented to compute these probabilities:
(1) Image Spectral Type (R1) includes panchromatic and multispectral. A multispectral image contains more spectral information than a panchromatic image. Thus, a multispectral image is in theory more reliable than a panchromatic image. In this paper, R1 is set to 0.7 and 0.9 when the type is panchromatic and multispectral, respectively.
(2) Image Spatial Resolution (R2). A high-resolution image will theoretically provide more adequate and reliable land cover information. The reliability of an image with a spatial resolution of 2 m is set to 0.7, and an image with a spatial resolution coarser than 10 m is considered unreliable. On this basis, the possibility of Image Spatial Resolution is computed from r, the resolution of the image.
(3) Image Currency (R3) measures the extent to which an image acquired earlier can reflect the current land cover condition, considering land cover changes. The reliability of an image acquired at the earliest data-acquired time demanded by the design of the data production is set to 0.6, and the reliability of the image is assumed to decrease linearly as time passes. On this basis, the possibility of Image Currency is computed from t and t0, where t is the month span between the actual data-acquired time and the earliest data-acquired time demanded by the design of the data production, and t0 is the month span between the actual data-acquired time and the data-evaluated time. If the image is acquired before the earliest data-acquired time demanded by the design of the data production, then t is less than 0 and R3 is set to 0.
(4) Image Plane Precision (R4) influences the reliability of the image pre-processing and is defined in terms of m, the mean square error of the image plane precision, and a constant denoting the limit value of that mean square error.
(5) Image Overedge (R5) influences the reliability of the image pre-processing and is defined in terms of the mean square error of the image overedge and a constant denoting the limit value of that mean square error.
(6) Machine Interpretation Algorithm (R6). Different classification algorithms will yield different classification results. Let $p_j$ be the maximum posterior possibility of the jth pixel or image object, and $n$ be the total number of pixels or image objects. Thus, R6 can be denoted as
$$R_6 = \frac{1}{n} \sum_{j=1}^{n} p_j$$
(7) Foundation Geography Information Datum (R7), the auxiliary datum in artificial interpretation, is one of the data sources of artificial interpretation. R7 can refer to the previous accuracy assessment result of the data.
(8) Operation Staff (R8) reflects the ability of the data production operator in artificial interpretation. R8 can refer to the previous average accuracy of the land cover data produced by the operator.
(9) Field Survey (R9). The reliability of land cover data will increase after field survey. Thus, R9 can be set to a high possibility value according to expert experience.
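Two of the basic events above can be sketched directly from their descriptions. The Python code below looks up R1 from the spectral type, using the 0.7 and 0.9 values stated in item (1), and computes R6 under the assumption that it is the mean of the maximum posterior possibility over all pixels or image objects. The function names and the example posteriors are hypothetical.

```python
from typing import Sequence

# Values stated in the text for Image Spectral Type (R1).
SPECTRAL_TYPE_RELIABILITY = {"panchromatic": 0.7, "multispectral": 0.9}


def spectral_type_reliability(spectral_type: str) -> float:
    """Return R1 for a given spectral type (case-insensitive)."""
    return SPECTRAL_TYPE_RELIABILITY[spectral_type.lower()]


def algorithm_reliability(posteriors: Sequence[Sequence[float]]) -> float:
    """Assumed form of R6: mean of the per-unit maximum posterior possibility.

    Each inner sequence holds the class posteriors of one pixel or image object.
    """
    if not posteriors:
        raise ValueError("at least one pixel or image object is required")
    return sum(max(p) for p in posteriors) / len(posteriors)


# Hypothetical posteriors for three image objects and two classes.
r1 = spectral_type_reliability("Multispectral")
r6 = algorithm_reliability([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
print(r1, round(r6, 3))
```

The remaining basic events depend either on expert judgment (R7, R8, R9) or on formulas whose exact form is not sketched here.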

Reliability Interval of Top Event
Given all the probabilities of the basic events, the reliability of the top event can be obtained through the relationships among various events, which are normally represented by logical AND and OR gates [29]. In the process reliability analysis model, we improved the reliability computation methods of intermediate events and the top event according to the relationships among the events in the production process of land cover data. In addition, we use possibility intervals instead of exact possibility values for the intermediate events and the top event to obtain more flexible and reliable results.
Image source (R10) is affected by image spectral type (R1), image spatial resolution (R2), and image currency (R3). Thus, the left value of the possibility interval can be set to the minimum value of the three event possibilities, and the right value of the interval can be set to the maximum value as follows:
$$R_{10} = [\min(R_1, R_2, R_3), \max(R_1, R_2, R_3)]$$
where the operators min( ) and max( ) find the minimum and maximum values of the input values, respectively.
Given that the accuracy of image pre-processing (R11) depends on image plane precision (R4) and image overedge (R5), and that plane position errors and overedge errors can occur at the same places, the reliability possibility interval of R11 is computed from R4 and R5.
Digital orthophoto image (R12) used to produce land cover data depends on image source (R10) and image pre-processing (R11), two events that are independent of each other. Thus, the reliability possibility interval can be obtained by
$$R_{12} = [\mathrm{left}(R_{10}) \cdot \mathrm{left}(R_{11}), \ \mathrm{right}(R_{10}) \cdot \mathrm{right}(R_{11})]$$
where the operators left( ) and right( ) find the left value and right value of the input possibility interval, respectively.
Data source of artificial interpretation (R13), which contains the foundation geography information datum (R7) and the digital orthophoto image (R12), can be computed as the average reliability possibility of the two data:
$$R_{13} = \left[\frac{R_7 + \mathrm{left}(R_{12})}{2}, \ \frac{R_7 + \mathrm{right}(R_{12})}{2}\right]$$
Machine interpretation (R14) is affected by the reliability of the image and the classification algorithm applied to the image. Machine interpretation algorithm (R6) and digital orthophoto image (R12) are also independent of each other. Thus, machine interpretation can be computed as
$$R_{14} = [R_6 \cdot \mathrm{left}(R_{12}), \ R_6 \cdot \mathrm{right}(R_{12})]$$
Similarly, artificial interpretation (R15) is affected by operation staff (R8) and data source of artificial interpretation (R13), which are independent of each other. We can thus obtain the artificial interpretation (R15) by
$$R_{15} = [R_8 \cdot \mathrm{left}(R_{13}), \ R_8 \cdot \mathrm{right}(R_{13})]$$
Considering that land cover products are derived from machine interpretation, visual interpretation, and field survey, the reliability interval of land cover products can be calculated as
$$R = [w_1 R_9 + w_2\,\mathrm{left}(R_{14}) + w_3\,\mathrm{left}(R_{15}), \ w_1 R_9 + w_2\,\mathrm{right}(R_{14}) + w_3\,\mathrm{right}(R_{15})]$$
where $w_1, w_2, w_3 > 0$ with $\sum_{i=1}^{3} w_i = 1$ are the weighting parameters. These parameters are the proportions of land cover data acquired through field survey, machine interpretation, and artificial interpretation.
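The propagation from basic events to the top event can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: independent events are combined by multiplying matching interval ends (the usual AND-gate rule), the interval of image pre-processing (R11) is passed in as an input because its combination rule is not sketched here, and the labels R12, R13, and R14 for the unnamed intermediate events are assumed from the numbering sequence. All function names and example values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]


def image_source(r1: float, r2: float, r3: float) -> Interval:
    """R10: interval from the minimum to the maximum of R1, R2, and R3."""
    values = (r1, r2, r3)
    return (min(values), max(values))


def combine_independent(a: Interval, b: Interval) -> Interval:
    """Assumed AND-gate rule for independent events: multiply matching ends."""
    return (a[0] * b[0], a[1] * b[1])


def scale_interval(value: float, a: Interval) -> Interval:
    """Combine an exact reliability with an interval by the same product rule."""
    return (value * a[0], value * a[1])


def average_with(value: float, a: Interval) -> Interval:
    """Average of an exact reliability and an interval (data source event)."""
    return ((value + a[0]) / 2, (value + a[1]) / 2)


def top_event(r9: float, machine: Interval, manual: Interval,
              weights: Sequence[float]) -> Interval:
    """Weighted sum of field survey, machine, and artificial interpretation."""
    if min(weights) <= 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    w_field, w_machine, w_manual = weights
    left = w_field * r9 + w_machine * machine[0] + w_manual * manual[0]
    right = w_field * r9 + w_machine * machine[1] + w_manual * manual[1]
    return (left, right)


def process_reliability(basic: Dict[str, float], r11: Interval,
                        weights: Sequence[float]) -> Interval:
    r10 = image_source(basic["R1"], basic["R2"], basic["R3"])
    r12 = combine_independent(r10, r11)        # digital orthophoto image
    r13 = average_with(basic["R7"], r12)       # data source of artificial interpretation
    r14 = scale_interval(basic["R6"], r12)     # machine interpretation
    r15 = scale_interval(basic["R8"], r13)     # artificial interpretation
    return top_event(basic["R9"], r14, r15, weights)


# Hypothetical basic-event reliabilities and production proportions.
example = {"R1": 0.9, "R2": 0.8, "R3": 0.7, "R6": 0.8,
           "R7": 0.9, "R8": 0.95, "R9": 1.0}
interval = process_reliability(example, r11=(0.9, 1.0), weights=(0.1, 0.2, 0.7))
print(interval)
```

Keeping each gate as a small function makes it straightforward to swap in a different combination rule for any event without touching the rest of the tree.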

Area and Data
In this paper, we quantitatively analyze the applications of the proposed validation schema using land cover data generated from the project of the National Geographical State Monitoring (NGSM). The project, launched by the State Council of China, was initiated in 2011. It completely mapped the national land cover in China through remote sensing and geographic information system techniques.
The land cover data of NGSM were generated from multisource remote sensing images, such as aerial images and images acquired by ZY-3, WorldView-2, and QuickBird. NGSM applied a hierarchical classification system with 10 first-level types, 46 second-level types, and 77 third-level types. The hierarchical nature of the classification scheme enables generalization and reporting at high levels of the hierarchy.
We chose land cover data acquired in six typical counties in July 2014 to apply the proposed validation schema: County 1 and County 2 of Shaanxi Province, County 3 and County 4 of Jiangxi Province, and County 5 and County 6 of Hainan Province. The three provinces of Shaanxi, Jiangxi, and Hainan, which have different landforms and climates, are located in the north, middle, and south of China, respectively. Table 1 displays detailed information on the six study areas. The locations of the six counties are shown as red points in Figure 3. These data are subdivided into map sheets at 1:10,000 scale (in Jiangxi and Hainan Province) or 1:25,000 scale (in Shaanxi Province). The corresponding remote sensing image of one typical land cover map in County 3 is illustrated in Figure 3b.

Reliability Evaluation Results
We established a stratified sampling for result reliability evaluation. The stratification was based on the landscape complexity and the production department. The sampling was applied in each study county and, based on the sample strategy and the total land cover maps in each county, a total of 42 samples were selected. Reference data were collected by professionals from the field survey and the high-resolution images used for the data production. For process reliability evaluation, we evaluated the total land cover maps without the sampling design.
Figure 4 shows the result reliabilities and process reliabilities of the 42 samples. Considering that NGSM applied a hierarchical classification system with three levels, the reliability of each level of the data could be obtained, as shown in Figure 4a. Most data have reliabilities higher than 0.98 because these data were produced by professional production departments and satisfied the accuracy requirements of the NGSM. For example, the percentage of features with first-level classification error in one map should be less than 0.3% according to the accuracy requirements of the NGSM. The reliability of the first level is higher than that of the other two levels because the first-level types are more easily recognized.
Table 2 shows the overall result reliabilities of the six study areas. The reliabilities of County 1, County 2, and County 3 are lower than those of County 4, County 5, and County 6 because of different production departments and landscape complexities. Experienced professionals and high-reliability images in the production department can normally facilitate the data production. In contrast, complex landscapes in the county can increase the difficulty of the data production.
The reliabilities obtained from the process reliability analysis are presented as reliability intervals, as shown in Figure 4b. Systematic bias in the process reliability analysis makes the process reliabilities lower than the result reliabilities. The differences in process reliabilities of land cover data are mainly caused by the reliabilities of basic events (e.g., operation staff, image spatial resolution, and image currency) and the proportions of the land cover data acquired through field survey, machine interpretation, and artificial interpretation.

Classification Correctness Analysis
Figure 5 shows the classification correctness of first-level land cover types in the six counties. If the reliabilities of the three levels of a type are equal, then the error features of this type are classified to the wrong first-level types. If the second-level and third-level reliabilities of a type are lower than the first-level reliability, then a few error features of this type are classified to the wrong second-level or third-level types, but the first-level type of the error features is correct. The classification correctness of woodland, buildings, and structures is lower than that of other types in County 1 (Figure 5a). In County 2 (Figure 5b), the second-level and third-level classification correctness of desert and bare surface is lower, whereas the first-level classification correctness is normal. This result is attributed to the difficulty of distinguishing between the second-level and third-level types of desert and bare surface in County 2, which is covered by loess tableland. The situation is similar for the third-level types of structures. In County 4 (Figure 5d), the third-level classification correctness of gardens is low because it is difficult to distinguish between orchard, tea garden, nursery, and mulberry. Several features of desert and bare surface were wrongly classified as cultivated land, gardens, and artificial piling and digging land in County 4 and County 3 (Figure 5c). The classification correctness of structures in County 5 (Figure 5e) is low because different types of hardening land are difficult to distinguish. The overall classification correctness of first-level types in County 6 (Figure 5f) is high. The analysis of the classification correctness can serve as a reference for the routes of field surveys in the production process of land cover data.

Scale Reasonableness Analysis
The acquisition scale of the study data is 400 m², which means that features smaller than 400 m² were discarded during data production. Therefore, we analyze the scale reasonableness of the study data at the scales of 200 and 100 m², which are represented as Scale 1 and Scale 2 in Figure 6, respectively, to obtain further details on the land cover information. The figure shows the scale reasonableness of first-level types in the six study areas. The scale reasonableness of buildings is lower than that of other land cover types in all six counties because several separate buildings with an area of less than 400 m² exist on the land surface, and these buildings were discarded during data production. The scale reasonableness of water bodies in County 3 (Figure 6c) and County 4 (Figure 6d) is lower than in other counties because more small, fragmented water bodies exist in the two counties of Jiangxi than in the counties of other provinces. The analysis of the scale reasonableness can serve as a reference for further data production to obtain more reliable land cover data. For example, a smaller acquisition scale for buildings, and for water bodies in regions containing many small, fragmented water bodies, can be defined to obtain more details of actual land cover conditions and more reliable land cover information.

Consistency Analysis between Result Reliability and Process Reliability
The comparison between result reliabilities and process reliabilities is illustrated in Figure 7. In the figure, red points denote the first-level result reliabilities of the samples and blue points represent the medians of process reliability intervals. The gradients of reliabilities between two samples were computed to compare the two groups of reliabilities. The sign (positive or negative) of the gradient reflects the change trend of the reliabilities of two samples, while the gradient value reflects the change magnitude between the reliabilities of two samples.
The comparison shows that the reliability trends of 34 samples are consistent between the two groups. Only eight of the 42 samples are inconsistent, shown as the points with black circles in Figure 7. The inconsistency happens when the data in complex landscapes have numerous errors caused by visual interpretation or computer classification, leading to low result reliability, while the weight of field survey in the data is too high, resulting in high process reliability. The opposite case can happen in homogeneous landscapes. The gradient values of process reliabilities in 37 samples are larger than those of the result reliabilities, which means that most change magnitudes between the reliabilities of the samples differ between the two groups. The comparison indicates that process reliability evaluation is reasonable and can reflect the change trend of reliabilities to some extent. The comparison between the second-level or third-level result reliabilities and the process reliabilities reaches a similar conclusion.
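The trend comparison can be sketched in Python as follows. The sketch assumes that the gradient is the difference between the reliabilities of adjacent samples in sample order and that the median of a process reliability interval is its midpoint. The function names and the example values are hypothetical and do not reproduce the study data.

```python
from typing import List, Sequence, Tuple


def sign(x: float) -> int:
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def gradients(values: Sequence[float]) -> List[float]:
    """Differences between adjacent samples (assumed definition of gradient)."""
    return [b - a for a, b in zip(values, values[1:])]


def trend_consistency(result: Sequence[float],
                      process: Sequence[Tuple[float, float]]) -> List[bool]:
    """Flag, for each adjacent pair, whether both groups change in the same direction."""
    medians = [(left + right) / 2 for left, right in process]
    return [sign(g) == sign(h)
            for g, h in zip(gradients(result), gradients(medians))]


# Hypothetical first-level result reliabilities and process reliability intervals.
result_reliabilities = [0.98, 0.99, 0.97, 0.99]
process_intervals = [(0.75, 0.85), (0.80, 0.90), (0.82, 0.90), (0.85, 0.95)]

flags = trend_consistency(result_reliabilities, process_intervals)
print(flags, sum(flags), "of", len(flags), "trends consistent")
```

Comparing only the sign of the gradient checks the direction of change, whereas the gradient values themselves would be needed to compare change magnitudes.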

Discussion
Experimental results show that the proposed reliability evaluation scheme is applicable in the validation of land cover products. Through the analysis of the seven result reliability indicators, multilevel and multitype reliabilities of land cover data can be obtained for different applications and strategic decisions. For example, the routes of field surveys can be established based on the land cover types with low classification correctness, and the analysis of the scale reasonableness can serve as a reference for the further data acquisition scale of the land cover types with low scale reasonableness. In addition, process reliability evaluation, which needs no reference data, can save costs, facilitate the validation, and to some extent solve the problem that validation is often constrained by the lack of high-quality ground reference data. The consistency analysis between result reliability and process reliability shows that process reliability evaluation can, to some extent, reflect change trends of reliabilities that are consistent with the result evaluation method.
However, limitations exist in the process reliability analysis model. As the production of land cover data is a complicated process with high uncertainty and the process reliability analysis model simplifies this process, system biases exist in the results of process reliability evaluation. Therefore, process reliability evaluation cannot take the place of result reliability evaluation for the validation of land cover products. The construction of the process reliability analysis model, including the definitions of the events and the relationships among these events, should be further studied.

Conclusions
The reliability of land cover products has numerous impacts on the applications of the data and affects strategic decisions and planning. The proposed reliability evaluation schema includes two reliability evaluation methods, namely, result reliability evaluation and process reliability evaluation. Result reliability evaluation computes the reliability of land cover data by using seven reliability indicators. Process reliability evaluation obtains the reliability through the reliability analysis of the data production process without the need for reference data. Experimental results using land cover data in six typical counties show that the proposed reliability evaluation scheme is reasonable and can be applied in the validation of land cover products. Compared with traditional accuracy assessment methods, more reliability information on land cover can be obtained for strategic decisions and planning through the analysis of the seven indicators of result reliability evaluation. Process reliability evaluation, which saves costs and does not need reference data compared with result reliability evaluation, can facilitate the validation, reflect the change trends of reliabilities, and to some extent solve the problem that validation is often constrained by the lack of high-quality ground reference data. However, the process reliability analysis model should be improved in further studies.

Figure 1. General framework of the reliability evaluation of land cover data.

Figure 2. The process reliability analysis model.

Figure 4. Reliabilities of the samples of six study areas. (a) Result reliabilities; (b) Process reliabilities.

Figure 7. Comparison between first-level result reliabilities and process reliabilities.

Table 1. Information of the six study areas.
County 1 | Midwest of Shaanxi Province | 720 | Rich types of landforms, such as hills, river terraces, and loess tableland

Table 2. Overall result reliabilities of the six study areas.