Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data in an Agricultural Watershed

Dash, Padmanava; Sanders, Scott L.; Parajuli, Prem; Ouyang, Ying

doi:10.3390/rs15164020

Open Access · Editor’s Choice · Article


¹ Department of Geosciences, Mississippi State University, Mississippi State, MS 39762, USA
² Department of Agricultural and Biological Engineering, Mississippi State University, Mississippi State, MS 39762, USA
³ USDA Forest Service, Center for Bottomland Hardwoods Research, 775 Stone Blvd., Thompson Hall, Room 309, Mississippi State, MS 39762, USA
* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(16), 4020; https://doi.org/10.3390/rs15164020

Submission received: 9 July 2023 / Revised: 6 August 2023 / Accepted: 10 August 2023 / Published: 14 August 2023

(This article belongs to the Special Issue Remote Sensing Applications in Land Use and Land Cover Monitoring)

Abstract

Classification of remotely sensed imagery for reliable land use and land cover (LULC) mapping remains a challenge in areas where spectrally similar LULC features occur. For example, the bare soils of harvested crop fields in agricultural watersheds have spectral characteristics similar to those of high-intensity developed regions, and this impedes accurate classification. The goal of this study is to improve the accuracy of LULC classification of satellite imagery for the Big Sunflower River Watershed, Mississippi. To do so, it uses ancillary data, multiple classification methods, and a post-classification correction (PCC). To determine the best approach, the methodology was applied to Landsat 8 Operational Land Imager (OLI) imagery from the growing season and from the post-harvest period. The growing-season image was acquired on 25 August 2015, and the post-harvest image was acquired on 7 January 2018. Three classification methods were applied: maximum likelihood (ML), support vector machine (SVM), and random forest (RF). The imagery was classified into seven LULC classes: open water, woody wetlands, harvested crop, rangeland, cultivated crop, high-intensity developed, and mid-low intensity developed. Several ancillary data sources were used to determine the training dataset. These were the normalized difference vegetation index (NDVI), thematic maps of urban areas, river networks, transportation networks, high-resolution National Agriculture Imagery Program (NAIP) imagery, Google Earth time-series data, and phenology. Initially, none of the three classification methods performed adequately. PCC was therefore implemented by masking and applying a majority filter using thematic maps of urban areas. Once PCC was implemented, the accuracy of each classification method increased significantly. The SVM method performed best in both seasons. For the post-harvest imagery it reached an overall accuracy of 93.5% with a Kappa statistic of 0.88, and for the growing-season imagery it reached an overall accuracy of 84% with a Kappa statistic of 0.789. SVM was thus the best classification method, and PCC proved to be an effective strategy for dealing with spectrally similar LULC features. Using SVM together with PCC increased the reliability of the extracted information. Strategies from this study can help to evaluate LULC in agricultural and other watersheds.

Keywords:

land use and land cover; classification; agricultural watershed; maximum likelihood; support vector machine; random forest

1. Introduction

Land cover mapping and assessment are a core area of remote sensing data application [1]. Land cover is an underlying variable that affects and links numerous components of the human and physical environments [2]. Land use and land cover (LULC) maps provide base information for decision-making in watershed management applications, provided the maps are reliable and up to date [3]. LULC change is also an important measure for evaluating the effect of applied watershed management measures. It is regarded as one of the most important variables of global change affecting ecological systems [1,3]. However, numerous factors limit land cover change estimates from remotely sensed data. These factors affect the accuracy and success of classifications and hinder the creation of a functional thematic map. They include the complexity of the study area's landscape, inadequate resolution of the selected remotely sensed data, and the difficulty of finding the image processing and classification approach best suited to a particular study area [4].

To address these issues, ancillary data are integrated with remote sensing data to improve the accuracy of land cover classification [3,4,5,6,7,8,9,10,11,12]. The most common approach is to incorporate ancillary data before classification. This adds spatial or nonspatial information that may be of value in the classification process, such as elevation, slope, aspect, geology, soils, phenology, hydrology, transportation networks, political boundaries, and/or vegetation maps [13]. Post-classification corrections that use ancillary data are sometimes implemented as well to improve accuracy. Most image classifications rely on remotely sensed spectral responses, and because biophysical environments are complex, spectral confusion among land cover classes is common [4]. Some studies have successfully used masking to deal with spectral confusion [3,14]. Masking removes a spectrally similar class from the image before classification and then restores that class afterward.

In addition to incorporating ancillary data, it is important to determine the appropriate classification technique for a given situation. Numerous classification algorithms have been developed, and a review of the methods and techniques can be found in Lu and Weng [4]. Broadly, classification methods can be grouped as common or advanced. Maximum likelihood (ML), minimum distance, and K-means are considered common classification methods [1]. Advanced classification methods include artificial neural networks, support vector machine (SVM), decision trees, and random forest (RF) [1]. This study had two main objectives. The first was to explore the capabilities of three pixel-based classification methods, ML, SVM, and RF, applied to Landsat 8 Operational Land Imager (OLI) imagery. The second was to assess the benefits of infusing ancillary data to create accurate LULC maps of the agriculture-dominated Big Sunflower River Watershed in Mississippi, United States.

2. Literature Review

2.1. Use of Remote Sensing in Agriculture

Remotely sensed data have been used in agricultural applications for decades [15,16,17,18]. In agricultural settings, an appropriate methodology is important for accurate land cover classification because phenology varies. Each crop has specific planting and harvesting times, its own leaf structure, and distinct biophysical and biochemical properties. Soil moisture, soil organic matter content, and soil signatures also affect the remote sensing spectra. A review of remote sensing in agriculture can be found in Mulla [19]. Applications of remote sensing in agriculture are typically based on measuring the radiation reflected from soil or plant material. Plant pigments such as chlorophyll absorb radiation strongly in the visible spectrum, especially at blue and red wavelengths. Near-infrared radiation, in contrast, is strongly reflected because of leaf density and canopy structure [19,20]. The Normalized Difference Vegetation Index (NDVI) combines pigment absorption in the red (~660 nm) region with reflectance in the near-infrared (~860 nm) region of the electromagnetic spectrum [21] as an indicator of vegetation biomass. NDVI can be used to estimate a number of plant properties, such as leaf mass, chlorophyll (pigment) concentration, water content, and the fraction of absorbed photosynthetically active radiation [21]. When examining reflectance data, it is important to consider bare soils and their moisture and organic matter content, because these soils vary in their spectral reflectance signatures [22]. Both bare soil and crop canopy are present in a remotely sensed image, so the mixture of the two spectral signatures often confounds the interpretation of reflectance data [19].
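The NDVI described above is the normalized ratio (NIR − Red)/(NIR + Red). The minimal NumPy sketch below computes it pixel by pixel. It assumes the red and NIR arrays are already on a common reflectance scale (for Landsat 8 OLI these are band 4 and band 5). The reflectance values in the demo are made up for illustration.

```python
import numpy as np


def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), computed pixel by pixel.

    Pixels where NIR + Red == 0 are returned as NaN instead of dividing by zero.
    """
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


if __name__ == "__main__":
    # Made-up reflectances: dense crop canopy, senescing crop/bare soil, water, empty fill pixel
    red = np.array([0.05, 0.20, 0.04, 0.0])  # OLI band 4 (red)
    nir = np.array([0.45, 0.25, 0.02, 0.0])  # OLI band 5 (near-infrared)
    print(np.round(ndvi(red, nir), 3))
```

Dense canopy gives a value near 0.8, while water comes out negative, which is why NDVI helps separate vegetated, bare, and water pixels.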

2.2. Classification Methods

Maximum likelihood (ML) is the most extensively used parametric classification algorithm [13]. This is due to its robustness and to its availability in almost every image-processing software package [4]. ML is based on Bayes' Theorem and assumes that each input class follows a multivariate normal distribution. The minimum distance method assigns a pixel to the nearest class mean. ML instead assigns the pixel to the class with the largest posterior probability [23]. ML classification in ESRI ArcMap uses a probability density function rather than the probability distribution of Bayes' Theorem. It examines the variances and covariances of the training data as it assigns each cell to the appropriate class. Bayes' Theorem is explained in detail in [24]. The parametric approach has several drawbacks. The imagery of a study area can be complex and violate the assumption of a normal spectral distribution [4]. This is evident for classes with significant within-class variance, as in global and continent-wide land cover mapping [12]. In addition, integrating spectral and ancillary data is especially challenging with parametric classifiers such as ML [4].
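To make the decision rule concrete, here is a minimal sketch of a textbook Gaussian maximum likelihood classifier in NumPy. It is not ArcMap's Maximum Likelihood Classification tool. It assumes equal class priors unless priors are given and at least two bands per pixel. The training pixels are synthetic and the class names are hypothetical.

```python
import numpy as np


class GaussianMaximumLikelihood:
    """Textbook Gaussian maximum likelihood classifier (sketch).

    Each class is modeled as a multivariate normal with its own mean vector and
    covariance matrix estimated from training pixels; each pixel is assigned to
    the class with the largest posterior probability.
    """

    def fit(self, X, y, priors=None):
        self.classes_ = np.unique(y)  # sorted class labels; priors follow this order
        k = len(self.classes_)
        priors = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, dtype=float)
        self.log_priors_ = np.log(priors)
        self.means_, self.inv_covs_, self.log_dets_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False)  # band-by-band covariance (needs >= 2 bands)
            self.means_.append(Xc.mean(axis=0))
            self.inv_covs_.append(np.linalg.inv(cov))
            self.log_dets_.append(np.linalg.slogdet(cov)[1])
        return self

    def discriminants(self, X):
        """g_c(x) = ln P(c) - 0.5 ln|S_c| - 0.5 (x - m_c)^T S_c^-1 (x - m_c).

        This equals the log posterior up to terms shared by all classes, so the
        class with the largest g_c also has the largest posterior probability.
        """
        scores = []
        for m, inv_cov, log_det, log_prior in zip(
            self.means_, self.inv_covs_, self.log_dets_, self.log_priors_
        ):
            d = X - m
            mahalanobis_sq = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            scores.append(log_prior - 0.5 * log_det - 0.5 * mahalanobis_sq)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.discriminants(X), axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-band (red, NIR) training pixels for two hypothetical classes
    water = rng.normal([0.03, 0.02], 0.01, size=(50, 2))
    crop = rng.normal([0.05, 0.40], 0.03, size=(50, 2))
    X = np.vstack([water, crop])
    y = np.array(["open_water"] * 50 + ["cultivated_crop"] * 50)
    clf = GaussianMaximumLikelihood().fit(X, y)
    print(clf.predict(np.array([[0.03, 0.03], [0.06, 0.35]])))
```

The constant term shared by all classes is dropped because it does not change which class wins. For that reason the sketch compares discriminant scores rather than normalized posteriors.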

Non-parametric classifiers, such as support vector machine (SVM) and random forest (RF), have grown in popularity for numerous reasons. They make no assumptions about the data distribution and do not require statistical parameters to separate classes. This makes it easier to incorporate non-spectral data into a classification procedure [4]. SVM is based on statistical learning theory and aims to determine the optimal separation of classes [25]. SVM has been recognized as giving higher classification accuracies than traditional methods such as ML [13]. It also performs well on imagery with heterogeneous classes and when training samples are limited [13,26]. Experiments have demonstrated that SVM can interpret hyperspectral data effectively in high-dimensional feature space without any feature reduction procedures [27,28]. SVM relies on the concept of margin maximization [26]. The margin is the sum of the distances from the hyperplane to the closest points of the two classes it separates [25]. The basic premise of margin maximization is to find the optimal separating hyperplane between two classes by maximizing the margin between the classes' closest training samples. The training samples on the margin are termed support vectors, and the hyperplane that maximizes the margin is known as the optimal separating hyperplane. If no linear separator exists, SVM can use kernel techniques to project the points into a higher-dimensional space and find a linear separator there [13]. Margin maximization and SVM are explained in greater statistical detail in Premalatha et al. [29], Gualtieri and Cromp [30], and Chang and Lin [31].
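A small linear SVM can illustrate the margin idea. The sketch below is an illustration, not the SVM implementation used in the study. It trains a two-class soft-margin linear SVM by batch subgradient descent on the regularized hinge loss. The regularization weight `lam`, the step schedule, and the synthetic clusters are illustrative choices, and kernels and multi-class handling are omitted.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.1):
    """Soft-margin linear SVM trained by batch subgradient descent (sketch).

    Minimizes  lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w . x_i + b)))  for labels y_i in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(1, epochs + 1):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or on the wrong side
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)  # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def margin_width(w):
    """Distance between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / np.linalg.norm(w)


def approximate_support_vectors(X, y, w, b, tol=0.05):
    """Training points lying on or inside the margin (functional margin <= 1 + tol)."""
    return X[y * (X @ w + b) <= 1 + tol]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
    y = np.array([-1] * 40 + [1] * 40)
    w, b = train_linear_svm(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
    print("margin width:", margin_width(w))
    print("approx. support vectors:", len(approximate_support_vectors(X, y, w, b)))
```

In practice a dedicated solver such as LIBSVM would be used. It would typically run with a nonlinear kernel and a multi-class scheme to handle the seven LULC classes.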

A decision tree is a classification procedure that uses a recursive strategy to partition a dataset into smaller subsets. It does so by running the data through tests defined at each branch of the tree [12,13]. A decision tree can be broken down into three parts: root, split, and leaf. The root is formed from all of the data and is where the tests begin [13]. The split (also termed a branch or node) is the next stop. Here, decision rules are applied as splitting tests that continually divide the data into smaller groups. The split can be defined as

\sum_{i=1}^{n} a_{i} x_{i} \leq c

for multivariate decision trees and as

x_{i} > c

for univariate decision trees. Here x_i is the measurement of the i-th of the n selected features, a = (a_1, …, a_n) is the vector of linear discriminant coefficients, and c is the decision threshold [1]. The leaves hold the class labels assigned [12].
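The two split tests can be written directly, and the sketch below also searches for the best univariate threshold. The source does not name a splitting criterion, so Gini impurity, a common choice, is assumed here. Candidate thresholds are placed at midpoints between consecutive distinct feature values.

```python
import numpy as np


def univariate_split(x, i, c):
    """Univariate test x_i > c; True sends a sample down the 'greater' branch."""
    return x[:, i] > c


def multivariate_split(x, a, c):
    """Multivariate (oblique) test sum_i a_i * x_i <= c."""
    return x @ a <= c


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_univariate_split(x, y):
    """Feature index, threshold, and weighted Gini impurity of the best x_i > c test."""
    best = (None, None, np.inf)
    n = len(y)
    for i in range(x.shape[1]):
        values = np.unique(x[:, i])
        # Candidate thresholds: midpoints between consecutive distinct values
        for c in (values[:-1] + values[1:]) / 2:
            greater = univariate_split(x, i, c)
            impurity = (greater.sum() * gini(y[greater])
                        + (~greater).sum() * gini(y[~greater])) / n
            if impurity < best[2]:
                best = (i, c, impurity)
    return best


if __name__ == "__main__":
    # Two hypothetical bands; only the first separates the two classes
    x = np.array([[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]])
    y = np.array([0, 0, 1, 1])
    print(best_univariate_split(x, y))
```

A full tree applies this search recursively to each resulting subset until the leaves are pure or a depth limit is reached.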

A random forest (RF) classifier is a nonparametric machine learning algorithm that uses multiple decision trees [13]. Each decision tree is generated from a different sample and subset of the training data. The dataset is classified a number of times, each time based on a random sub-selection of training pixels, which creates numerous decision trees. Each pixel's final class is decided by a majority vote of the trees. To create variation among trees, the training data are projected into a randomly chosen subspace before being fitted to each tree. A randomized procedure is also introduced when optimizing the decision at each node.

2.3. Land Use & Land Cover Classes (LULC)

An adapted version of the Anderson [32] classification scheme, as used for the National Land Cover Database (NLCD), is considered the standard scheme of LULC classes for agricultural watersheds [13]. The NLCD was created by the Multi-Resolution Land Characteristics (MRLC) consortium, a group of federal agencies that includes the United States Geological Survey [13]. The NLCD is the definitive Landsat-based, 30 m resolution land cover database for the United States [33,34]. The NLCD classification levels were therefore chosen to represent the LULC classes in this paper, to maintain compatibility with the majority of the literature.

2.4. Post-Classification Correction (PCC)

The complexity of biophysical environments may lead to spectral confusion among LULC classes, so ancillary data are needed to 'clean up' or improve classified maps [4]. Ancillary data used in image classification are any type of spatial or nonspatial information that is potentially valuable in the image classification process. Examples include transportation networks, soils, hydrology, political boundaries, phenology, vegetation maps, geology, slope, and aspect [13]. Studies have found that masking a class and then returning it after classification is especially beneficial for increasing thematic map accuracy, because it removes spectrally similar classes from the classification [3,14]. Masks can be created in several ways. Thakkar et al. [3] generated masks based on a 3 × 3 variance texture derived from the NIR band. They also used the normalized difference water index (NDWI) to develop a water body mask [3]. Mesev [14] used census data to further classify urban areas.
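Below is a minimal sketch of mask-based PCC on label arrays. It assumes a Boolean urban mask aligned with the image grid and a hypothetical integer code for the developed class. As one plausible ordering, a 3 × 3 majority (mode) filter is applied before the masked class is restored. The filter is a plain mode filter and does not reproduce the replacement rules of ArcGIS's Majority Filter tool.

```python
import numpy as np


def unmasked_pixels(image, mask):
    """(n_pixels, n_bands) array of pixels outside the mask; image is (bands, rows, cols)."""
    return image[:, ~mask].T


def restore_masked_class(classified, mask, class_value):
    """Return the masked class after classification: masked pixels get class_value."""
    corrected = classified.copy()
    corrected[mask] = class_value
    return corrected


def majority_filter(labels, size=3):
    """Replace each pixel with the most frequent label in its size x size window.

    Borders are handled by replicating edge pixels; ties go to the smallest label.
    """
    pad = size // 2
    padded = np.pad(labels, pad, mode="edge")
    out = np.empty_like(labels)
    for r in range(labels.shape[0]):
        for c in range(labels.shape[1]):
            window = padded[r:r + size, c:c + size].ravel()
            values, counts = np.unique(window, return_counts=True)
            out[r, c] = values[np.argmax(counts)]
    return out


if __name__ == "__main__":
    URBAN = 6  # hypothetical class code for developed land
    urban_mask = np.zeros((5, 5), dtype=bool)
    urban_mask[0:2, 0:2] = True
    image = np.random.default_rng(0).random((7, 5, 5))  # 7 bands, like OLI bands 1-7
    pixels = unmasked_pixels(image, urban_mask)  # these would go to ML/SVM/RF
    classified = np.full((5, 5), 4)  # stand-in for a classifier's output
    classified[3, 3] = 2  # an isolated, probably spurious pixel
    corrected = restore_masked_class(majority_filter(classified), urban_mask, URBAN)
    print(pixels.shape)
    print(corrected)
```

Restoring the urban class after filtering keeps the mapped developed areas intact, while the filter removes isolated speckle elsewhere.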

2.5. Classification Accuracy Assessment

A confusion matrix presents classification accuracy as a simple cross-tabulation of mapped class labels against those observed in the ground or reference data. It is the basis for describing classification accuracy and characterizing errors [2,35,36]. Overall accuracy is the percentage of cases correctly allocated [2]. The accuracy of individual classes is examined through the confusion matrix from two viewpoints: the user's accuracy and the producer's accuracy. Each is obtained by relating the number of cases correctly allocated to a class to the total number of cases of that class. The two differ only in whether that total is taken from the matrix's row or column marginals [2]. User's accuracy is the complement of the error of commission (inclusion), and producer's accuracy is the complement of the error of omission (exclusion). Cohen's Kappa coefficient is a standard measure in accuracy assessment that accounts for chance agreement, that is, the allocation of the correct class by chance [2]. Many studies have recommended Cohen's Kappa coefficient as the standard measure of classification accuracy [37,38,39,40]. Stehman [36], however, argues that overall, user's, and producer's accuracy are more applicable. These measures can be interpreted directly as probabilities that describe the quality of a specific map. Stehman therefore recommends reporting all summary measures, because each measure alone obscures potentially important details.
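For reference, the standard textbook form of Cohen's Kappa is given below; the source does not state the formula itself. It applies to an r-class confusion matrix with N samples, diagonal counts n_ii, row totals n_i+, and column totals n_+i. Substituting the Table 2 counts (diagonal sum 310, N = 507, and Σ n_i+ n_+i = 51,082) reproduces the Kappa of 0.515 reported there.

\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \frac{1}{N}\sum_{i=1}^{r} n_{ii}, \qquad p_e = \frac{1}{N^{2}}\sum_{i=1}^{r} n_{i+}\, n_{+i}

p_o = \frac{310}{507} \approx 0.6114, \qquad p_e = \frac{51082}{507^{2}} \approx 0.1987, \qquad \kappa \approx \frac{0.6114 - 0.1987}{1 - 0.1987} \approx 0.515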

This method of accuracy assessment comes with inherent assumptions and limitations. Generally, it implies that each pixel belongs solely to one class in a defined set of mutually exclusive classes [2]. It has been argued that the Kappa coefficient is not always suitable, because it overestimates chance agreement and therefore underestimates classification accuracy [2,41]. Each measure assesses a different component of accuracy and thus makes different assumptions about the data [42]. There is no widely accepted measure of accuracy, only a variety of indices, each sensitive to different features. Hence, there is no all-purpose measure of classification accuracy [2].

Sampling design is very important, because without an appropriate design the confusion matrix cannot be properly interpreted. A basic sampling design, such as simple random sampling, is suitable only if the sample size is large enough to guarantee that all classes are adequately represented [2]. All constraints of a particular study must be considered when designing an accuracy assessment. The design should also be practical, so as not to diminish the credibility of the derived accuracy statement [43].

3. Materials and Methods

3.1. Study Area

The Big Sunflower River watershed (BSRW) is part of the Yazoo Basin, and the Big Sunflower River is one of the main tributaries of the Yazoo River in Mississippi. The Yazoo Basin in northwestern Mississippi covers an area of around 19,684 km², making it the largest basin in the Mississippi alluvial valley [44]. The interior of the basin drains through complex, sluggish streams. These eventually connect to the Big Sunflower River, the Bogue Phalia, or Deer Creek, which flow into the Yazoo River and ultimately into the Mississippi River [44]. The BSRW lies in the humid subtropical climate region, which is characterized by temperate winters, long hot summers, and rainfall fairly evenly distributed throughout the year. The BSRW is a crop-dominated watershed that encompasses a substantial portion of Mississippi's agriculture-heavy region, commonly termed the Mississippi Delta (Figure 1). It lies within eleven Mississippi Delta counties (Bolivar, Coahoma, Humphreys, Issaquena, Leflore, Sharkey, Sunflower, Tallahatchie, Warren, Washington, and Yazoo) and has a total surface area of 7660 km² [45]. The terrain ranges from nearly flat to gently undulating, with elevations of around 49 to 200 feet above sea level [46]. The BSRW is ideal for agriculture because of nutrient-rich alluvial soils, deposited over years of seasonal flooding by the Mississippi River and its surrounding tributaries. The soils vary extensively in structure, texture, frequency, and depth [46]. Agriculture has been a linchpin of the area's economy, with cotton, soybean, rice, corn, and wheat as the major crops. Typical planting and harvesting dates in the Mississippi Delta are listed in Table 1 [47]. The area is economically important, and land cover affects both human and physical environments. Accurate LULC maps with higher temporal resolution are therefore required to provide base information for watershed management applications.

As an agricultural watershed, the BSRW experiences both seasonal and non-seasonal temporal variations in LULC. The US Department of Agriculture (USDA) currently provides only one cropland data layer (CDL) map annually. The US Geological Survey (USGS) provides a National Land Cover Database (NLCD) map roughly every five years for the continental United States. This temporal resolution is insufficient for watershed modeling and management applications, since LULC often changes more rapidly. CDL and NLCD products are suitable for national-scale analyses. Local-scale studies such as this one, however, require more detailed classifications to improve accuracy. This study aims to improve upon the annually available USDA LULC compilations by using Landsat 8 Operational Land Imager (OLI) data to create an LULC database for the BSRW with higher temporal resolution.

3.2. Remote Sensing Data and Processing

Landsat 8 OLI Collection 1 Level-2 imagery was downloaded through USGS Earth Explorer for two dates: 25 August 2015 (growing season) and 7 January 2018 (post-harvest). Bands 1–7 were layer-stacked, and the data were processed in QGIS 2.18 and ArcMap 10.4 prior to analysis. Once the imagery was mosaicked, the study area was subset from the rest of the image. A normalized difference vegetation index (NDVI) image was generated from the original data for analysis. NDVI provides a measure of the absence or presence of vegetation and is useful for assessing vegetation health. Higher NDVI values indicate healthy vegetation, and lower values indicate stressed vegetation [1]. Figure 2 shows a flowchart of the remote sensing data, processing, and methodology used in this study. First, satellite data were selected based on percent cloud cover, image quality, and radiometric and geometric correction to obtain the best possible image quality. Second, class separability and band separability were assessed in Erdas Imagine to determine the number of predictors (i.e., band combinations). This used transformed divergence statistics computed from the selected training data. Next, each scheme (ML, RF, and SVM) was implemented. RF was optimized by varying the tree depth in increments of 10 (e.g., 40, 50, 60) and the number of trees in increments of 100 (e.g., 100, 200, 300) until performance leveled off. An accuracy assessment was then performed using a stratified random sampling method. Next, PCC was applied with the prior training data for each scheme (ML, RF, and SVM). Lastly, a final accuracy assessment was performed for all schemes with PCC implemented. The same numbers of training samples for classification and testing samples for accuracy assessment were used, as reported in Tables 2–13.
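The transformed divergence statistic used for class and band separability can be sketched as follows. The formula is the commonly published definition for classes modeled as multivariate normals. The code itself is an independent illustration, not the Erdas Imagine implementation, and its training samples are synthetic and hypothetically named.

```python
from itertools import combinations

import numpy as np


def transformed_divergence(mean_i, cov_i, mean_j, cov_j):
    """Transformed divergence between two classes modeled as multivariate normals.

    D  = 0.5 tr[(Ci - Cj)(Cj^-1 - Ci^-1)] + 0.5 tr[(Ci^-1 + Cj^-1)(mi - mj)(mi - mj)^T]
    TD = 2000 * (1 - exp(-D / 8)): 0 for identical classes, approaching 2000 as they separate.
    """
    cov_i = np.atleast_2d(cov_i)
    cov_j = np.atleast_2d(cov_j)
    inv_i = np.linalg.inv(cov_i)
    inv_j = np.linalg.inv(cov_j)
    dm = (np.asarray(mean_i, dtype=float) - np.asarray(mean_j, dtype=float)).reshape(-1, 1)
    divergence = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i)) + 0.5 * np.trace(
        (inv_i + inv_j) @ dm @ dm.T
    )
    return 2000.0 * (1.0 - np.exp(-divergence / 8.0))


def class_stats(samples):
    """Mean vector and covariance matrix of an (n_pixels, n_bands) training sample."""
    return samples.mean(axis=0), np.atleast_2d(np.cov(samples, rowvar=False))


def min_pairwise_td(class_samples, bands):
    """Worst-case (smallest) TD over all class pairs using only the listed band indices."""
    stats = [class_stats(s[:, bands]) for s in class_samples.values()]
    return min(transformed_divergence(*a, *b) for a, b in combinations(stats, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical three-band training samples for three classes
    samples = {
        "water": rng.normal([0.05, 0.03, 0.02], 0.01, size=(60, 3)),
        "crop": rng.normal([0.05, 0.04, 0.40], 0.03, size=(60, 3)),
        "bare_soil": rng.normal([0.15, 0.20, 0.25], 0.03, size=(60, 3)),
    }
    for bands in ([0], [0, 1], [0, 1, 2]):
        print(bands, round(min_pairwise_td(samples, bands), 1))
```

Comparing the worst-case TD across candidate band subsets is one way to decide how many bands are needed before adding more stops improving separability.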

3.3. Ancillary Data

In this study, a city mask was generated from three sources: National Agriculture Imagery Program (NAIP) high-resolution aerial imagery, Landsat 8 OLI data, and geographic information system (GIS) ancillary data. The GIS data included thematic maps of urban areas, transportation networks, and shapefiles of developed areas. NAIP imagery was acquired through the United States Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) Geospatial Data Gateway [48]. NAIP acquires aerial imagery during the agricultural growing seasons in the U.S., providing high-resolution data for the study [48]. Shapefiles of developed areas were obtained from the Mississippi Automated Resource Information System (MARIS). Other ancillary data used to determine the training dataset included the NDVI, river networks, Google Earth time-series data, and crop phenology.

3.4. Mask Generation

Shapefiles of all incorporated cities within the BSRW were downloaded from the Mississippi Automated Resource Information System (MARIS). The incorporated cities shapefile was last updated in 2010. Each shapefile was therefore edited to represent the boundaries of high- to low-intensity developed areas more accurately, while keeping permeable surfaces outside the shapefile. To accomplish this, NAIP imagery and Landsat 8 OLI imagery were used in combination to ensure an accurate edit of each incorporated city's shapefile. The updated cities shapefile was then used to mask the BSRW shapefile (Figure 3), creating a second shapefile with the developed areas within the BSRW removed. Finally, this second shapefile was used with the Landsat 8 OLI imagery to mask all developed areas contained within the imagery.
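Converting an edited city boundary into a pixel mask amounts to testing which pixel centers fall inside the polygon. The NumPy sketch below uses the even-odd ray-casting rule, with vertices already in pixel coordinates, and the boundary is hypothetical. In a real workflow a GIS tool would handle map projections and geotransforms. This study used ArcMap, and a library function such as `rasterio.features.rasterize` would also work.

```python
import numpy as np


def rasterize_polygon(vertices, rows, cols):
    """Boolean mask of pixels whose centers fall inside a polygon (even-odd rule).

    vertices: sequence of (x, y) pairs in pixel coordinates (x = column, y = row).
    """
    ys, xs = np.mgrid[0:rows, 0:cols] + 0.5  # pixel-center coordinates
    inside = np.zeros((rows, cols), dtype=bool)
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        # A horizontal ray cast from each center crosses this edge if the edge
        # straddles the center's y and the crossing point lies to its right.
        straddles = (y1 > ys) != (y2 > ys)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        inside ^= straddles & (xs < x_cross)  # each crossing flips inside/outside
    return inside


if __name__ == "__main__":
    # Hypothetical edited city boundary on a 6 x 6 pixel grid
    city = [(1, 1), (5, 1), (5, 4), (1, 4)]
    developed = rasterize_polygon(city, 6, 6)
    band = np.random.default_rng(0).random((6, 6))
    # Developed pixels are set to NaN so they are left out of classification
    masked_band = np.where(developed, np.nan, band)
    print(developed.sum(), np.isnan(masked_band).sum())
```

Horizontal edges never straddle a pixel center, so their division-by-zero results are discarded by the `straddles` test.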

3.5. LULC Classification and Post-Classification Correction

Accuracy assessment was carried out for each classification method before and after PCC, using the same accuracy assessment workflow for every classified thematic map. For each map, five hundred assessment points were generated using a stratified random sampling strategy. This strategy allocates points to each class in proportion to its mapped area. Each point was examined against the original satellite imagery, NAIP imagery, the NDVI image, and Google Earth time-series data to determine its actual class (ground truth). The classified points and ground truth data were then compiled into a confusion matrix. From this matrix, the user's accuracy (Equation (1)), producer's accuracy (Equation (2)), and overall accuracy (Equation (3)) were computed. User's accuracy indicates how often a class assigned on the map is actually present on the ground. Producer's accuracy indicates how often real features on the ground are correctly shown on the classified map. The Kappa statistic was computed to measure agreement between the classified map and the reference data beyond that expected by chance.
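Proportional allocation of the 500 assessment points can be sketched as below. The largest-remainder rounding and the random pixel selection within each class are assumptions, because the text does not say how fractional allocations were rounded. The sketch also assumes every class has at least as many pixels as the points allocated to it.

```python
import numpy as np


def proportional_allocation(class_counts, total_points=500):
    """Split total_points among classes in proportion to their pixel counts.

    Largest-remainder rounding makes the allocations sum exactly to total_points.
    """
    counts = np.asarray(class_counts, dtype=float)
    exact = total_points * counts / counts.sum()
    alloc = np.floor(exact).astype(int)
    leftover = int(total_points - alloc.sum())
    # Hand the remaining points to the classes with the largest fractional parts
    order = np.argsort(-(exact - alloc), kind="stable")
    alloc[order[:leftover]] += 1
    return alloc


def stratified_random_sample(class_map, total_points=500, seed=0):
    """Draw (row, col) assessment locations per class from a classified map."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(class_map, return_counts=True)
    alloc = proportional_allocation(counts, total_points)
    samples = {}
    for cls, n in zip(classes, alloc):
        rows, cols = np.nonzero(class_map == cls)
        pick = rng.choice(len(rows), size=n, replace=False)  # no pixel drawn twice
        samples[cls] = np.column_stack([rows[pick], cols[pick]])
    return samples


if __name__ == "__main__":
    # Hypothetical 100 x 100 classified map: 60% class 0, 30% class 1, 10% class 2
    class_map = np.repeat(np.array([0, 1, 2]), [6000, 3000, 1000]).reshape(100, 100)
    points = stratified_random_sample(class_map)
    print({int(k): len(v) for k, v in points.items()})
```

Under proportional allocation, rare classes receive few points. This is why the sample must be large enough for every class to be adequately represented.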

\text{User's Accuracy} = \frac{\text{Number of correctly classified pixels in each category}}{\text{Total number of classified pixels in that category (row total)}} \times 100 \tag{1}

\text{Producer's Accuracy} = \frac{\text{Number of correctly classified pixels in each category}}{\text{Total number of reference pixels in that category (column total)}} \times 100 \tag{2}

\text{Overall Accuracy} = \frac{\text{Total number of correctly classified pixels (diagonal)}}{\text{Total number of reference pixels}} \times 100 \tag{3}
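Equations (1)–(3) and the Kappa statistic can be computed from a confusion matrix in a few lines. The sketch returns proportions rather than percentages, so multiply by 100 to match the equations. It is checked here against the counts in Table 3. For a class that was never mapped, such as mid-low intensity developed in Table 3, it returns NaN where the table lists 0.

```python
import numpy as np

# Table 3 counts (rows = classified, columns = reference), classes in table order
TABLE_3 = [
    [10, 0, 0, 0, 0, 4, 0],
    [0, 13, 0, 5, 2, 0, 0],
    [0, 0, 22, 0, 0, 0, 3],
    [0, 0, 0, 10, 1, 0, 4],
    [0, 0, 0, 10, 23, 0, 4],
    [0, 0, 0, 0, 0, 6, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def accuracy_report(cm):
    """User's, producer's, and overall accuracy plus Cohen's Kappa (as proportions)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    row_totals = cm.sum(axis=1)  # classified pixels per class
    col_totals = cm.sum(axis=0)  # reference pixels per class
    with np.errstate(divide="ignore", invalid="ignore"):
        users = diag / row_totals      # Equation (1) / 100; NaN for a class never mapped
        producers = diag / col_totals  # Equation (2) / 100
    overall = diag.sum() / n           # Equation (3) / 100
    chance = np.sum(row_totals * col_totals) / n**2  # expected chance agreement
    kappa = (overall - chance) / (1 - chance)
    return {"users": users, "producers": producers, "overall": overall, "kappa": kappa}


if __name__ == "__main__":
    report = accuracy_report(TABLE_3)
    print(f"overall = {report['overall']:.3f}, kappa = {report['kappa']:.3f}")
```

Running it prints an overall accuracy of 0.718 and a Kappa of 0.660, consistent with the 72% and 0.66 reported for Table 3.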

4. Results

4.1. Growing Season

The growing season presented a substantial variety of vegetation types because of the large variety of crops present. Transformed divergence statistics showed that seven bands were necessary to separate each class effectively. In terms of user's accuracy, all three schemes struggled with the same three classes before PCC: rangeland, high-intensity developed, and medium-low intensity developed. These classes can be hard to separate because their spectral signatures, calculated from the respective class samples, are similar (Figure 4). In terms of producer's accuracy, errors varied more among classes, but all schemes struggled with the rangeland and cultivated crop classes. This can be attributed to the low user's accuracy of classes such as rangeland and the urban classes. Pixels wrongly assigned to these classes were excluded from their true classes.

4.1.1. Classification of the Growing Season Imagery before Post-Classification Correction

The ML scheme was the least successful prior to PCC, with an overall accuracy of 61% (Table 2). RF (Table 3) and SVM (Table 4) both performed better than ML prior to PCC, with overall accuracies of 72% and 68%, respectively. Figure 5A displays the original Landsat 8 OLI imagery. Figure 5B–D show the classified imagery for the same area with each applied scheme (ML, SVM, and RF, respectively) before the application of PCC.

The ML accuracy assessment resulted in a Kappa statistic of 0.515 (Table 2). ML had trouble differentiating between cultivated crops and rangeland, which lowered its accuracy. Some of this inaccuracy was also due to the woody wetlands class, which typically occurs on the boundaries of forests or in areas of less dense tree cover. Another issue came from the two urban classes, high-intensity developed and medium-low intensity developed. The bare soils of harvested fields affected both, because components of these soils have high reflectance values similar to those of developed areas (Figure 4). The medium-low intensity developed class contains vegetation, so it mixed with the rangeland, harvested crop, and cultivated crop classes. In terms of user's accuracy, ML did well with open water, woody wetlands, harvested crops, and cultivated crops. However, errors in the other classes (i.e., rangeland, high-intensity developed, and mid-low intensity developed) meant that the producer's accuracy of these four classes showed significant exclusion.

The RF accuracy assessment shows an overall accuracy of 72% and a Kappa statistic of 0.66 (Table 3). A tree depth of 50 and a total of 200 trees produced the best RF results before PCC. Before this optimization, the overall accuracy was 62%, with a Kappa statistic of 0.53. The RF scheme had issues similar to those of ML and SVM. Of the three schemes, RF differentiated rangeland and cultivated crops most successfully, as indicated by the user's accuracy. However, in terms of producer's accuracy, RF excluded more of the rangeland class than ML and SVM did. In terms of user's accuracy, RF performed well in separating the high-intensity developed class, but the low user's accuracy of the mid-low intensity developed class posed a problem. This low accuracy in the urban class signifies exclusion in other classes, since a large number of pixels were misclassified as medium-low or high-intensity developed.

SVM had similar problems differentiating between the rangeland and cultivated crop classes. Of the 105 assessment pixels classified as rangeland, 48 should have been cultivated crops. This issue, along with the medium-low intensity developed class, contributed to a low producer's accuracy for the cultivated crop class. As with ML, SVM could not separate the two urban classes from soils and vegetation. Areas of woody wetlands were misclassified as rangeland, which affected the producer's accuracy of the woody wetlands class the most. Similarly, areas of rangeland misclassified as medium-low intensity developed affected the producer's accuracy of the rangeland class. Throughout the map, parts of cultivated crop fields were evidently misclassified as rangeland and medium-low intensity developed. In terms of user's accuracy, SVM did well with open water, woody wetlands, harvested crops, and cultivated crops. However, errors in the other classes caused some exclusion from these same classes, which resulted in lower producer's accuracy. The accuracy assessment for the SVM classification showed an overall accuracy of 68% and a Kappa statistic of 0.595 (Table 4). None of the schemes produced sufficient results before PCC, but RF performed best, with an overall accuracy of 72% and a Kappa of 0.66 (Table 3).

Table 2. Accuracy assessment values for growing season imagery using maximum likelihood classification before post-classification correction.

Classified \ Reference	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	High Int. Dev.	Mid-Low Int. Dev.	Total	User’s Accuracy
Open Water	10	0	0	0	0	0	0	10	1
Woody Wetlands	0	55	0	0	0	0	0	55	1
Harvested Crop	0	0	90	0	0	0	0	90	1
Rangeland	0	19	1	37	74	0	1	132	0.28
Cultivated Crop	0	12	0	1	111	0	0	124	0.895
High Int. Developed	2	0	3	1	0	3	1	10	0.3
Mid-Low Int. Developed	0	4	29	29	20	0	4	86	0.047
Total	12	90	123	68	205	3	6	507
Producer’s Accuracy	0.83	0.61	0.73	0.54	0.54	1	0.667
Overall accuracy = 0.611; Kappa = 0.515

Table 3. Accuracy assessment for growing season imagery using random forest classification before post-classification correction. Number of trees = 200 and tree depth = 50.

Classified \ Reference	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	High Int. Dev.	Mid-Low Int. Dev.	Total	User’s Accuracy
Open Water	10	0	0	0	0	4	0	14	0.71
Woody Wetlands	0	13	0	5	2	0	0	20	0.65
Harvested Crop	0	0	22	0	0	0	3	25	0.88
Rangeland	0	0	0	10	1	0	4	15	0.67
Cultivated Crop	0	0	0	10	23	0	4	37	0.62
High Int. Developed	0	0	0	0	0	6	0	6	1
Mid-Low Int. Developed	0	0	0	0	0	0	0	0	0
Total	10	13	22	25	26	10	11	117
Producer’s Accuracy	1	1	1	0.4	0.88	0.6	0
Overall accuracy = 0.72; Kappa = 0.66

4.1.2. Classification of the Growing Season Imagery after Post-Classification Correction

Table 5, Table 6 and Table 7 show the accuracy assessments for the growing season using ML, RF, and SVM, respectively, after PCC. Figure 6A displays the original Landsat 8 OLI imagery, and Figure 6B–D represent the same portion of the map classified with ML, SVM, and RF, respectively, after the application of PCC.

The ML scheme produced an overall accuracy of 75% with a Kappa statistic of 0.67 (Table 5). The main misclassifications involved the rangeland class. The misclassification of woody wetlands as open water was a mixed-pixel issue. A flooded field next to a forest produced an NDVI value too high to be representative of water, so the pixel belongs to the woody wetlands class. A similar issue arose with harvested crops classified as open water. A small road between fishponds produced a pixel that mixed open water and harvested crop; however, its NDVI value indicated that it was closer to harvested crop. Woody wetlands were misclassified into three other classes (open water, rangeland, and cultivated crop), resulting in a producer’s accuracy of 63%, with rangeland the dominant source of error. Several factors caused this misclassification. For example, the resolution of Landsat 8 imagery is 30 m, so any strip of forest between fields or other land cover types may be misclassified. Furthermore, the density of trees within a pixel can cause misclassification. The producer’s accuracy of the rangeland class was satisfactory; however, its user’s accuracy was low. The most substantial problem was cultivated crop pixels classified as rangeland. Typically, this occurred in crop fields with lower NDVI values or where the soil signal influenced the scheme’s decision. Additionally, late August is the beginning of harvest, so many crops had reached maturity and were at the end of their life cycle, with leaves turning brown and yellow. Moreover, land cover boundaries, particularly those with bare soil, created mixed pixels that led the scheme to choose rangeland. Because portions of the cultivated crop class were classified as rangeland, this problem lowered the producer’s accuracy of cultivated crops.
The other classes performed well in terms of user’s accuracy, and where their producer’s accuracy was lower, the errors typically involved rangeland. Finally, ML appears to “overcompensate” for some classes. For example, a pixel adjacent to a different land cover class may be assigned to that class even though its spectral values are not representative of it.
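The mixed-pixel reasoning above can be illustrated numerically. NDVI is (NIR - Red)/(NIR + Red), and for Landsat 8 OLI these are bands 5 and 4. The sketch below assumes a simple linear mixture of two endmembers. The water and forest reflectances are hypothetical round numbers chosen for illustration, not measurements from the study imagery. Under these assumptions, a pixel that is half water and half forest still has a strongly positive NDVI, far from the negative values typical of open water.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and NIR reflectance
    (Landsat 8 OLI band 4 and band 5, respectively)."""
    return (nir - red) / (nir + red)

def linear_mix(fraction_a, refl_a, refl_b):
    """Reflectance of a pixel covered by fraction_a of endmember A and
    (1 - fraction_a) of endmember B, assuming linear mixing.
    Each reflectance is a (red, nir) tuple."""
    return tuple(fraction_a * a + (1 - fraction_a) * b for a, b in zip(refl_a, refl_b))

# Hypothetical endmember reflectances (red, nir), for illustration only
WATER = (0.05, 0.02)
FOREST = (0.04, 0.35)

for water_fraction in (1.0, 0.75, 0.5, 0.25, 0.0):
    red, nir = linear_mix(water_fraction, WATER, FOREST)
    print(f"water fraction {water_fraction:.2f}: NDVI = {ndvi(red, nir):+.2f}")
```

NDVI is a ratio, so it does not change linearly with the water fraction. With these example values, even a pixel that is mostly water has a positive NDVI because the remaining cover is dense vegetation.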

Table 4. Accuracy assessment for growing season imagery using support vector machine classification before post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	High Int. Dev	Mid-Low Int. Dev	Total	User’s Accuracy	Kappa
Open Water	9	0	0	1	0	0	0	10	0.9	–
Woody Wetlands	1	63	0	2	1	0	0	67	0.94	–
Harvested Crop	0	0	100	1	0	0	1	102	0.98	–
Rangeland	0	27	3	26	48	0	1	105	0.248	–
Cultivated Crop	0	5	0	3	136	0	0	144	0.94	–
High Int. Developed	2	0	2	0	0	5	1	10	0.5	–
Mid-Low Int. Developed	0	3	21	23	17	0	9	73	0.12	–
Total	12	98	126	56	202	5	12	511	–	–
Producer’s Accuracy	0.75	0.64	0.79	0.46	0.67	1	0.75	–	0.68	–
Kappa	–	–	–	–	–	–	–	–	–	0.595

Table 5. Accuracy assessment values for growing season imagery using maximum likelihood classification after post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Total	User’s Accuracy	Kappa
Open Water	8	1	1	0	0	10	0.8	–
Woody Wetlands	0	56	0	0	0	56	1	–
Harvested Crop	0	0	106	1	0	107	0.99	–
Rangeland	0	27	9	88	79	203	0.434	–
Cultivated Crop	0	4	0	2	119	125	0.952	–
Total	8	88	116	91	198	501	–	–
Producer’s Accuracy	1	0.636	0.913	0.967	0.60	–	0.753	–
Kappa	–	–	–	–	–	–	–	0.674

Table 6. Accuracy assessment for growing season imagery using random forest classification after post-classification correction. Number of trees = 100 and tree depth = 40.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Total	User’s Accuracy	Kappa
Open Water	9	0	0	0	0	9	1	–
Woody Wetlands	1	13	0	3	0	17	0.76	–
Harvested Crop	0	0	24	4	0	28	0.86	–
Rangeland	0	0	0	11	1	12	0.92	–
Cultivated Crop	0	0	0	15	27	42	0.64	–
Total	10	13	24	33	28	108	–	–
Producer’s Accuracy	0.9	1	1	0.33	0.96	–	0.78	–
Kappa	–	–	–	–	–	–	–	0.72

Table 7. Accuracy assessment for growing season imagery using support vector machine with post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Total	User’s Accuracy	Kappa
Open Water	10	0	0	0	0	10	1	–
Woody Wetlands	0	64	0	0	4	68	0.94	–
Harvested Crop	1	0	114	5	1	121	0.94	–
Rangeland	0	24	3	78	33	138	0.565	–
Cultivated Crop	0	5	1	2	158	166	0.95	–
Total	11	93	118	85	196	503	–	–
Producer’s Accuracy	0.909	0.688	0.966	0.917	0.806	–	0.84	–
Kappa	–	–	–	–	–	–	–	0.789

RF performed better than ML and improved with the implementation of PCC (Table 6). A tree depth of 40 and a total of 100 trees were found to produce the best results for RF. The issues that affected SVM and ML explain some, but not all, of the misclassifications in RF, because RF’s choice of class for a pixel could not always be explained. RF assigns a class by majority vote across many decision trees. Its decisions are therefore hard to trace, and using a large number of rules to choose among a relatively small number of classes may complicate decision-making.

The SVM scheme produced an overall accuracy of 84% and a Kappa statistic of 0.789 (Table 7). It had issues similar to those of ML but handled them considerably better. For instance, the user’s accuracy of the rangeland class was higher (0.565 versus 0.434 for ML). As a result, rangeland commission errors reduced the producer’s accuracy of cultivated crops less than with ML (0.806 versus 0.60). Nonetheless, SVM, like ML, had a low producer’s accuracy for woody wetlands (0.688), mainly because of confusion with rangeland. Examination of the imagery showed that most of the errors were caused by the spatial resolution of Landsat 8 imagery, tree density, and mixed pixels at land cover class boundaries. Overall, SVM did well with all classes, except for the user’s accuracy of rangeland and the producer’s accuracy of woody wetlands. SVM separated classes comparatively well, especially along boundaries between different land cover classes.

In summary, results improved substantially with PCC, that is, masking urban areas and applying a majority filter. SVM returned the best results, with an overall accuracy of 84% and a Kappa statistic of 0.789. ML and RF did not perform as well, and all schemes had difficulty with the rangeland class. ML and SVM had low user’s accuracies for rangeland, and RF had a low producer’s accuracy. Additionally, woody wetlands were frequently misclassified as rangeland, most notably by ML and SVM. However, many misclassifications can be attributed to mixed land cover types within a pixel or to boundary areas between classes.

4.2. Post-Harvest

In the post-harvest imagery, the variety and extent of vegetation cover dropped substantially because crops had been harvested and because of winter conditions. This resulted in better overall results for the ML (Table 8), RF (Table 9), and SVM (Table 10) schemes. However, the rangeland, high-intensity developed, and mid-low intensity developed classes still caused substantial inaccuracy. To examine class separability, spectral signatures were plotted using the samples of each class acquired from the post-harvest imagery (Figure 7). Transformed divergence analysis indicated that seven bands were necessary to separate the classes effectively.
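Transformed divergence (TD) is a pairwise separability measure computed from class mean vectors and covariance matrices. The sketch below uses the common textbook formulation (see, e.g., Jensen [13]), scaled to the range 0–2000. The study does not report its TD values or software, so the function names and the example class statistics here are hypothetical. A frequently cited rule of thumb treats TD values of about 1900 or more as good separability. To mirror the band selection described above, `min_pairwise_td` reports the worst-separated class pair for a chosen subset of bands.

```python
import numpy as np
from itertools import combinations

def divergence(mean_i, cov_i, mean_j, cov_j):
    """Divergence between two Gaussian classes:
    0.5 tr[(Ci - Cj)(Cj^-1 - Ci^-1)] + 0.5 tr[(Ci^-1 + Cj^-1)(mi - mj)(mi - mj)^T]."""
    inv_i = np.linalg.inv(cov_i)
    inv_j = np.linalg.inv(cov_j)
    dm = (np.asarray(mean_i, float) - np.asarray(mean_j, float)).reshape(-1, 1)
    cov_term = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i))
    mean_term = 0.5 * np.trace((inv_i + inv_j) @ dm @ dm.T)
    return cov_term + mean_term

def transformed_divergence(mean_i, cov_i, mean_j, cov_j):
    """TD scaled to 0..2000; values near 2000 mean the pair is fully separable."""
    d = divergence(mean_i, cov_i, mean_j, cov_j)
    return 2000.0 * (1.0 - np.exp(-d / 8.0))

def min_pairwise_td(class_stats, bands):
    """Smallest TD over all class pairs, using only the given band indices.
    class_stats maps class name -> (mean vector, covariance matrix)."""
    idx = np.asarray(bands)
    worst = np.inf
    for a, b in combinations(class_stats, 2):
        (m_a, c_a), (m_b, c_b) = class_stats[a], class_stats[b]
        td = transformed_divergence(m_a[idx], c_a[np.ix_(idx, idx)],
                                    m_b[idx], c_b[np.ix_(idx, idx)])
        worst = min(worst, td)
    return worst

# Hypothetical two-band (red, NIR) class statistics, for illustration only
stats = {
    "water": (np.array([0.05, 0.02]), np.diag([1e-4, 1e-4])),
    "forest": (np.array([0.04, 0.35]), np.diag([1e-4, 4e-4])),
    "soil": (np.array([0.15, 0.25]), np.diag([4e-4, 4e-4])),
}
print(f"red only:  {min_pairwise_td(stats, [0]):.1f}")
print(f"red + NIR: {min_pairwise_td(stats, [0, 1]):.1f}")
```

With these made-up statistics, the red band alone cannot separate water from forest, but adding NIR pushes every pair close to 2000. The same idea applies when deciding how many of the OLI bands are needed.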

4.2.1. Classification of the Post-Harvest Imagery before Post-Classification Correction

Table 8, Table 9 and Table 10 exhibit the accuracy assessments for the post-harvest imagery using ML, RF, and SVM, respectively, before PCC. Figure 8A displays the original Landsat 8 post-harvest imagery, and Figure 8B–D represent the same portion of the imagery classified with ML, SVM, and RF before the application of PCC. All three schemes had trouble differentiating between the rangeland and cultivated crop land cover classes. Another issue was the misclassification of other land cover types as high-intensity developed and mid-low intensity developed.

Among all the classifications of the post-harvest imagery before PCC, ML yielded the best results (Table 8), with an overall accuracy of 77% and a Kappa statistic of 0.66. The ML scheme had considerable trouble with the urban land cover classes, high-intensity developed and mid-low intensity developed. This in turn affected the producer’s accuracy of other classes. The post-harvest imagery contains significantly more bare soil than the growing season imagery, and a number of these soils have reflectance values similar to those of developed areas, which caused misclassification. Open water and woody wetlands were misclassified as developed because pixels of either developed class can contain a mixture of woody wetland and open water cover. The misclassification of woody wetlands as rangeland is attributed to tree density, land cover boundaries, and other resolution-related issues. Throughout the classified imagery, harvested crop land cover was misclassified as mid-low intensity developed, and the same occurred with the rangeland class. Since mid-low intensity developed is a class containing both vegetation and impervious surfaces, this is somewhat expected. Other than the issues related to the developed classes, ML performed well.

Table 8. Accuracy assessment values for post-harvest imagery using maximum likelihood classification before post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Mid-Low Int. Dev	Total	User’s Accuracy	Kappa
Open Water	11	0	0	1	0	0	12	0.92	–
Woody Wetlands	0	73	0	1	0	0	74	0.986	–
Harvested Crop	0	1	222	1	0	0	224	0.99	–
Rangeland	0	7	9	74	2	2	94	0.787	–
Cultivated Crop	0	0	0	2	8	0	10	0.8	–
High Int. Developed	7	4	2	2	0	0	15	0	–
Mid-Low Int. Developed	0	8	49	20	0	4	81	0.049	–
Total	18	93	282	101	10	6	510	–	–
Producer’s Accuracy	0.61	0.785	0.787	0.73	0.8	0.667	–	0.768	–
Kappa	–	–	–	–	–	–	–	–	0.665

The RF accuracy assessment shows an overall accuracy of 72.4% with a Kappa statistic of 0.593 (Table 9). Compared with ML, user’s accuracy dropped for most of the non-urban classes. RF misclassified several classes, resulting in poor producer’s accuracies, most notably for rangeland (0.44). For the rangeland pixels misclassified as woody wetlands, it is difficult to determine RF’s reasoning, because the pixels are neither mixed nor close to a land cover boundary. However, some errors in the classification are related to resolution issues causing mixed pixels. The urban classes, high-intensity developed and mid-low intensity developed, continued to be problematic. All classes except cultivated crops had pixels misclassified as one of the urban classes.

The SVM accuracy assessment resulted in an overall accuracy of 75% with a Kappa statistic of 0.641 (Table 10). SVM also had difficulty with the urban land cover classes. Using a different scheme did not resolve the misclassification of woody wetlands and open water as high-intensity developed. A considerable amount of harvested crop land cover and rangeland was also misclassified as mid-low intensity developed. As mentioned earlier, this misclassification was caused by the increase in bare soil cover and the similarity of its signature to that of developed areas. Additionally, because of confusion with the developed classes, the producer’s accuracy of the rangeland class was markedly lower than with ML (0.60 versus 0.73). Overall, SVM results were not as sound as those of the ML scheme.

Table 9. Accuracy assessment for post-harvest imagery using random forest classification before post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	High Int. Dev	Mid-Low Int. Dev	Total	User’s Accuracy	Kappa
Open Water	14	0	2	0	0	0	0	16	0.875	–
Woody Wetlands	0	76	3	7	0	0	1	87	0.874	–
Harvested Crop	0	4	225	11	0	0	0	240	0.94	–
Rangeland	0	8	1	34	4	0	2	49	0.69	–
Cultivated Crop	0	0	0	1	9	0	0	10	0.9	–
High Int. Developed	2	1	2	1	0	2	2	10	0.2	–
Mid-Low Int. Developed	0	10	55	23	0	0	8	96	0.083	–
Total	16	99	288	77	13	2	13	508	–	–
Producer’s Accuracy	0.875	0.767	0.78	0.44	0.69	1	0.62	–	0.72	–
Kappa	–	–	–	–	–	–	–	–	–	0.59

Table 10. Accuracy assessment for post-harvest imagery using support vector machine classification before post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	High Int. Dev	Mid-Low Int. Dev	Total	User’s Accuracy	Kappa
Open Water	15	0	0	0	0	0	0	15	1	–
Woody Wetlands	0	78	0	3	0	0	0	81	0.96	–
Harvested Crop	0	2	228	0	0	0	0	230	0.99	–
Rangeland	0	8	0	50	2	0	5	65	0.769	–
Cultivated Crop	0	0	0	3	7	0	0	10	0.7	–
High Int. Developed	5	2	1	0	0	2	0	10	0.2	–
Mid-Low Int. Developed	0	12	56	27	0	0	3	98	0.03	–
Total	20	102	285	83	9	2	8	509	–	–
Producer’s Accuracy	0.75	0.76	0.8	0.6	0.778	1	0.375	–	0.75	–
Kappa	–	–	–	–	–	–	–	–	–	0.64

4.2.2. Classification of the Post-Harvest Imagery after Post-Classification Correction

Implementing PCC into the classification methodology produced significantly better results for all classification schemes. Table 11, Table 12 and Table 13 exhibit the results for post-harvest using ML, RF, and SVM, respectively, after PCC. Additionally, Figure 9 exhibits the ability of PCC to clean up image classification for each scheme, ML, SVM, and RF, respectively.

ML produced user’s accuracies above 90% for all classes except rangeland (Table 11). The low user’s accuracy of rangeland reflects portions of woody wetlands, harvested crops, and cultivated crops that were misclassified as rangeland, which lowered the producer’s accuracy of those classes. All classes except cultivated crops contributed to lowering the producer’s accuracy of woody wetlands. With the urban areas masked, rangeland is the only class with a significant impact on classification accuracy.

RF (Table 12) did not perform nearly as well as either SVM or ML. No class had a user’s accuracy above 88%, whereas SVM and ML produced user’s accuracies of at least 90% for all classes, except for ML with rangeland. The producer’s accuracy of woody wetlands was lower still with RF, and every other class contributed to its omission errors. While SVM and ML had issues only with rangeland, RF also had difficulty with cultivated crops (user’s accuracy of 0.5). As noted for the growing season classification, RF’s choice of class for a pixel could not always be explained.

SVM (Table 13) handled the issue with rangeland significantly better than ML and RF. However, the producer’s accuracy of rangeland was lower than with ML. Some woody wetland pixels were also still omitted, mainly because they were classified as harvested crop and, to a lesser extent, rangeland. This problem was typical along class boundaries where mixed pixels occurred. Overall, the higher accuracy of SVM likely reflects its ability to separate pixels containing mixed land cover classes. All in all, SVM with PCC outperformed ML and RF, with an overall accuracy of 93.5% and a Kappa statistic of 0.88. ML with PCC had an overall accuracy of 88.8% with a Kappa statistic of 0.82. Finally, RF had an overall accuracy of 84.6% with a Kappa statistic of 0.72.

Table 11. Accuracy assessment for post-harvest imagery using maximum likelihood classification after post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Total	User’s Accuracy	Kappa
Open Water	17	1	0	0	0	18	0.94	–
Woody Wetlands	0	77	0	1	0	78	0.98	–
Harvested Crop	2	4	251	3	0	260	0.96	–
Rangeland	0	13	31	97	2	143	0.67	–
Cultivated Crop	0	0	0	0	10	10	1	–
Total	19	95	282	101	12	509	–	–
Producer’s Accuracy	0.89	0.81	0.89	0.96	0.83	–	0.888	–
Kappa	–	–	–	–	–	–	–	0.82

Table 12. Accuracy assessment for post-harvest imagery using random forest classification after post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Total	User’s Accuracy	Kappa
Open Water	17	1	1	1	0	20	0.85	–
Woody Wetlands	0	81	2	10	0	93	0.87	–
Harvested Crop	1	12	281	26	0	320	0.87	–
Rangeland	0	5	14	45	0	64	0.70	–
Cultivated Crop	0	2	0	3	5	10	0.5	–
Total	18	101	298	85	5	507	–	–
Producer’s Accuracy	0.94	0.80	0.94	0.53	1	–	0.846	–
Kappa	–	–	–	–	–	–	–	0.72

Table 13. Accuracy assessment for post-harvest imagery using support vector machine with post-classification correction.

LULC Class	Open Water	Woody Wetlands	Harvested Crop	Rangeland	Cultivated Crop	Total	User’s Accuracy	Kappa
Open Water	19	0	0	0	0	19	1	–
Woody Wetlands	0	80	0	3	0	83	0.96	–
Harvested Crop	3	11	292	12	0	318	0.91	–
Rangeland	0	2	0	75	1	78	0.96	–
Cultivated Crop	0	0	0	1	9	10	0.9	–
Total	22	93	292	91	10	508	–	–
Producer’s Accuracy	0.86	0.86	1	0.82	0.9	–	0.935	–
Kappa	–	–	–	–	–	–	–	0.88

5. Discussion

SVM was found to be the most robust of the three schemes when implemented with post-classification correction. This finding suggests that traditional parametric classifiers such as ML may be less suitable for agricultural settings, because their assumption of normally distributed spectral data is often violated [4]. Pal and Mather [49] reported similar results in their study comparing SVM with ML and artificial neural networks.
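The parametric assumption mentioned above is explicit in the maximum likelihood decision rule. Each class is modeled as a multivariate normal distribution, and a pixel is assigned to the class with the highest log-likelihood. The sketch below is a minimal version of this rule. It assumes equal prior probabilities (the study does not state its priors or software), and it uses hypothetical two-band training samples. When a class's pixels are multimodal or skewed, as with mixtures of senescent crops and soil, a single mean and covariance per class describe them poorly. This can be one reason nonparametric methods such as SVM do better.

```python
import numpy as np

def fit_gaussian_classes(samples_by_class):
    """Estimate a mean vector and covariance matrix per class from training
    pixels (each array has shape [n_pixels, n_bands])."""
    stats = {}
    for label, pixels in samples_by_class.items():
        pixels = np.asarray(pixels, dtype=float)
        stats[label] = (pixels.mean(axis=0), np.cov(pixels, rowvar=False))
    return stats

def ml_classify(pixel, stats):
    """Assign the class with the largest Gaussian log-likelihood, assuming
    equal priors; the constant term shared by all classes is dropped."""
    pixel = np.asarray(pixel, dtype=float)
    best_label, best_score = None, -np.inf
    for label, (mean, cov) in stats.items():
        diff = pixel - mean
        _, logdet = np.linalg.slogdet(cov)
        mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
        score = -0.5 * logdet - 0.5 * mahalanobis_sq
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical (red, NIR) training reflectances, for illustration only
training = {
    "open water": [[0.05, 0.02], [0.06, 0.03], [0.04, 0.02], [0.05, 0.01]],
    "woody wetlands": [[0.04, 0.35], [0.05, 0.36], [0.03, 0.35], [0.04, 0.34]],
}
stats = fit_gaussian_classes(training)
print(ml_classify([0.05, 0.03], stats))
print(ml_classify([0.04, 0.33], stats))
```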

No scheme implemented was able to effectively separate urban classes from bare soils and the various vegetation classes. The BSRW presents a unique challenge because its soils vary widely in texture, structure, depth, and frequency [46]. Additionally, water absorption in soil strongly influences reflectance in the near-infrared and shortwave infrared regions [19,21]. These factors cause spectral confusion within the classifiers. Herold et al. [50] and Mesev [14] found bare soil surfaces to have spectral similarities to urban material types. Furthermore, the spatial resolution of remote sensing imagery makes mixed pixels common [4], which directly affects the separation of the mid-low intensity developed class from the vegetation classes.

Masking as a PCC method is straightforward to implement and increases classification accuracy significantly. Thakkar et al. [3] and Mesev [14] found similar results. Thakkar et al. [3] applied three masks (forest, water body, and drainage network) to produce sound results for the Arjuni watershed in Gujarat, India. Mesev [14] applied census data in urban areas to create accurate areal estimations. PCC is a reliable way to address the limitations of a given spatial resolution and spectral similarities between classes.
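Below is a minimal sketch of the two PCC operations: masking urban areas with a thematic map and smoothing with a 3 × 3 majority filter. It is not the authors' implementation, and several choices in it are assumptions made for illustration. These are the no-data code, the order of operations (mask first, then filter), and the rule of relabeling a pixel only when its window has a single most frequent class. GIS packages implement majority filters with their own neighborhood and tie-breaking rules.

```python
import numpy as np

NODATA = 0  # hypothetical code for masked (urban) pixels; class codes are >= 1

def mask_urban(classes, urban_mask, nodata=NODATA):
    """Set every pixel flagged in the boolean urban_mask (e.g., rasterized
    from a thematic map of urban areas) to the no-data code."""
    out = classes.copy()
    out[urban_mask] = nodata
    return out

def majority_filter_3x3(classes, nodata=NODATA):
    """Relabel each pixel with the most frequent class in its 3 x 3 window.
    Masked pixels and pixels outside the image are ignored in the vote, and a
    pixel keeps its label when the vote is tied."""
    rows, cols = classes.shape
    padded = np.pad(classes, 1, mode="constant", constant_values=nodata)
    out = classes.copy()
    for r in range(rows):
        for c in range(cols):
            if classes[r, c] == nodata:
                continue  # masked pixels stay masked
            window = padded[r:r + 3, c:c + 3].ravel()
            window = window[window != nodata]
            values, counts = np.unique(window, return_counts=True)
            winners = values[counts == counts.max()]
            if winners.size == 1:
                out[r, c] = winners[0]
    return out

classes = np.array([
    [1, 1, 1, 5],
    [1, 2, 1, 1],
    [1, 1, 1, 1],
])
urban = np.zeros(classes.shape, dtype=bool)
urban[0, 3] = True  # pretend the thematic map flags this pixel as urban
cleaned = majority_filter_3x3(mask_urban(classes, urban))
print(cleaned)
```

In this toy example, the isolated class-2 pixel is absorbed by its neighbors, and the masked pixel is excluded from both the vote and the output classes. This is the kind of salt-and-pepper cleanup that a majority filter provides.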

Confusion between natural vegetation, such as rangeland, and agriculture is a difficult problem to solve and is a major source of error in remote sensing-based global land cover maps [12]. Moreover, bare soil and crop canopies are often both present in a remotely sensed image, and this mixture of two spectral signatures often confounds the interpretation of reflectance data, leading to possible misclassification [19]. In the growing season imagery, a number of crop fields had reached maturity and were ready for harvest. Upon reaching maturity, some crops, such as wheat, corn, milo, and soybeans, turn yellow or brown, and soybeans lose their leaves. This allows the soil signal to influence the reflectance of vegetated pixels. The spectral signatures of soils and senescent crop residues are highly similar, and traditional classification schemes have not proven robust enough to differentiate the two successfully [51]. This research found a number of fields that would have appeared to be bare soil if not for a faint NDVI signal or scattered patches of crops that were not fully mature. In some instances, these fields were misclassified as rangeland.

There are inherent limitations involved with classification accuracy assessments. Pixels are implicitly assumed to belong fully to one of a defined set of mutually exclusive classes [2]. However, due to the complexity of biophysical environments and imagery resolution, this assumption is not always met. Mixed pixels have been identified as the most important cause of misclassification and a prime contributor to the underestimation of land cover change [2,4]. In this research, mixing at class boundaries was a significant problem. Regarding measures of accuracy, it has been argued that the Kappa coefficient overestimates chance agreement and therefore underestimates classification accuracy [2,41]. Sampling design also has major implications. This research used a stratified random design, in which points are randomly distributed within each class and each class receives a number of points proportional to its relative area. Thus, classes covering larger areas of the map were assessed with more points than smaller classes. Within an accuracy assessment, all errors are weighted equally. However, some errors are more critical or damaging than others, and in many instances the errors observed are between relatively similar classes [2].
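The stratified random design described above allocates sample sizes in proportion to class area. The sketch below shows one way to do this. The largest-remainder rounding, the fixed random seed, and the function names are assumptions for illustration, since the study does not describe how its allocation was rounded. The toy map also shows the drawback noted above: a class that covers a small share of the map can receive few or no assessment points.

```python
import numpy as np

def proportional_allocation(pixel_counts, n_total):
    """Split n_total sample points among classes in proportion to their pixel
    counts, using largest-remainder rounding so the parts sum to n_total."""
    total = sum(pixel_counts.values())
    exact = {k: n_total * v / total for k, v in pixel_counts.items()}
    alloc = {k: int(x) for k, x in exact.items()}  # floor, since values are >= 0
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

def stratified_random_sample(class_map, n_total, seed=0):
    """Return {class: array of (row, col) sample locations} drawn at random,
    without replacement, within each class."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(class_map, return_counts=True)
    alloc = proportional_allocation(dict(zip(labels.tolist(), counts.tolist())), n_total)
    flat = class_map.ravel()
    samples = {}
    for label, n in alloc.items():
        candidates = np.flatnonzero(flat == label)
        chosen = rng.choice(candidates, size=min(n, candidates.size), replace=False)
        samples[label] = np.column_stack(np.unravel_index(chosen, class_map.shape))
    return samples

# Toy 10 x 10 map: class 1 covers 78 pixels, class 2 covers 20, class 3 covers 2
class_map = np.ones((10, 10), dtype=int)
class_map[8:, :] = 2
class_map[0, :2] = 3
samples = stratified_random_sample(class_map, 20)
print({label: len(points) for label, points in samples.items()})
```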

The lack of ground truth data is an important limitation of this research. NAIP imagery and ancillary data such as NDVI were used to bridge this gap. NAIP imagery has a resolution of 0.6 m and was particularly helpful in determining land cover and other landscape components. Agricultural areas are highly dynamic and change constantly with crop growth and harvest. Thus, additional Landsat 8 imagery between the growing season and harvest would improve knowledge of the land cover and of which scheme and methodology are most suitable. Additionally, many other classification techniques and methods could increase the accuracy of land cover maps for an agricultural watershed.

6. Conclusions

Although all three classifiers have proven robust in previous research, none performed well enough on its own to achieve the desired classification accuracy for a heterogeneous agricultural landscape. In the present study, it was possible to improve the accuracy of all classifiers by incorporating PCC and ancillary data. Additionally, the high-resolution NAIP imagery and a 3 × 3 majority filter further reduced misclassification. The overall accuracies of 84.3% (August 2015) and 93.5% (January 2018) achieved with SVM demonstrate that integrating PCC and ancillary data with remote sensing imagery is an effective method for improving classification accuracy.

Author Contributions

Conceptualization, P.D. and P.P.; methodology, P.D. and S.L.S.; software, S.L.S.; writing—original draft preparation, S.L.S.; writing—review and editing, P.D., P.P. and Y.O.; visualization, S.L.S.; supervision, P.D.; project administration, P.P.; funding acquisition, P.P., P.D. and Y.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by USDA NIFA/AFRI competitive grant award # 2017-67020-26375.

Data Availability Statement

All data generated or analyzed during this study are included in this published article in the form of figures and tables. Additional information about the dataset or the dataset in a different format than what is presented in this article can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest. The statements, findings, conclusions, and recommendations are those of the author(s) and do not necessarily reflect the views of the U.S. Department of Agriculture. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

References

1. Otukei, J.R.; Blaschke, T. Land Cover Change Assessment Using Decision Trees, Support Vector Machines and Maximum Likelihood Classification Algorithms. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 27–31.
2. Foody, G.M. Status of Land Cover Classification Accuracy Assessment. Remote Sens. Environ. 2002, 80, 185–201.
3. Thakkar, A.K.; Desai, V.R.; Patel, A.; Potdar, M.B. Post-Classification Corrections in Improving the Classification of Land Use/Land Cover of Arid Region Using RS and GIS: The Case of Arjuni Watershed, Gujarat, India. Egypt. J. Remote Sens. Space Sci. 2017, 20, 79–89.
4. Lu, D.; Weng, Q. A Survey of Image Classification Methods and Techniques for Improving Classification Performance. Int. J. Remote Sens. 2007, 28, 823–870.
5. Stefanov, W.L.; Ramsey, M.S.; Christensen, P.R. Monitoring Urban Land Cover Change: An Expert System Approach to Land Cover Classification of Semiarid to Arid Urban Centers. Remote Sens. Environ. 2001, 77, 173–185.
6. Xiuwan, C. Using Remote Sensing and GIS to Analyse Land Cover Change and Its Impacts on Regional Sustainable Development. Int. J. Remote Sens. 2002, 23, 107–124.
7. Currit, N. Development of a Remotely Sensed, Historical Land-Cover Change Database for Rural Chihuahua, Mexico. Int. J. Appl. Earth Obs. Geoinf. 2005, 7, 232–247.
8. Yuan, F.; Sawaya, K.E.; Loeffelholz, B.C.; Bauer, M.E. Land Cover Classification and Change Analysis of the Twin Cities (Minnesota) Metropolitan Area by Multitemporal Landsat Remote Sensing. Remote Sens. Environ. 2005, 98, 317–328.
9. Qian, Y.; Zhang, K.; Qiu, F. Spatial Contextual Noise Removal for Post Classification Smoothing of Remotely Sensed Images. Proc. ACM Symp. Appl. Comput. 2005, 1, 524–528.
10. Judex, M.; Thamm, H.; Menz, G. Improving Land-Cover Classification with a Knowledge-Based Approach and Ancillary Data. In Proceedings of the 2nd Workshop of the EARSeL SIG on Land Use and Land Cover, Bonn, Germany, 28–30 September 2006; pp. 184–191.
11. Manandhar, R.; Odeh, I.O.A.; Ancev, T. Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data Using Post-Classification Enhancement. Remote Sens. 2009, 1, 330–344.
12. McIver, D.K.; Friedl, M.A. Using Prior Probabilities in Decision-Tree Classification of Remotely Sensed Data. Remote Sens. Environ. 2002, 81, 253–261.
13. Jensen, J.R. Introductory Digital Image Processing: A Remote Sensing Perspective; Pearson Education, Inc.: Glenview, IL, USA, 2015.
14. Mesev, V. The Use of Census Data in Urban Image Classification. Photogramm. Eng. Remote Sens. 1998, 64, 431–438.
15. Tucker, C.J.; Holben, B.N.; Elgin, J.H.; McMurtrey, J.E. Remote Sensing of Total Dry-Matter Accumulation in Winter Wheat. Remote Sens. Environ. 1980, 11, 171–189.
16. Moran, M.S.; Clarke, T.R.; Inoue, Y.; Vidal, A. Estimating Crop Water Deficit Using the Relation between Surface-Air Temperature and Spectral Vegetation Index. Remote Sens. Environ. 1994, 49, 246–263.
17. Wardlow, B.D.; Egbert, S.L. Large-Area Crop Mapping Using Time-Series MODIS 250 m NDVI Data: An Assessment for the U.S. Central Great Plains. Remote Sens. Environ. 2008, 112, 1096–1116.
18. Bolton, D.K.; Friedl, M.A. Forecasting Crop Yield Using Remotely Sensed Vegetation Indices and Crop Phenology Metrics. Agric. For. Meteorol. 2013, 173, 74–84.
19. Mulla, D.J. Twenty Five Years of Remote Sensing in Precision Agriculture: Key Advances and Remaining Knowledge Gaps. Biosyst. Eng. 2013, 114, 358–371.
20. Pinter, P.J.; Hatfield, J.L.; Schepers, J.S.; Barnes, E.M.; Moran, M.S.; Daughtry, C.S.T.; Upchurch, D.R. Remote Sensing for Crop Management; American Society for Photogrammetry and Remote Sensing: Baton Rouge, LA, USA, 2003; Volume 69.
21. Cheng, Y.B.; Ustin, S.L.; Riaño, D.; Vanderbilt, V.C. Water Content Estimation from Hyperspectral Images and MODIS Indexes in Southeastern Arizona. Remote Sens. Environ. 2008, 112, 363–374.
22. Ben-Dor, E.; Chabrillat, S.; Demattê, J.A.M. Characterization of Soil Properties Using Reflectance Spectroscopy. In Fundamentals, Sensor Systems, Spectral Libraries, and Data Mining for Vegetation, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2018; pp. 187–247.
23. Atkinson, P.M.; Lewis, P. Geostatistical Classification for Remote Sensing: An Introduction. Comput. Geosci. 2000, 26, 361–371.
24. Aplin, P.; Atkinson, P.M.; Curran, P.J. Fine Spatial Resolution Simulated Satellite Sensor Imagery for Land Cover Mapping in the United Kingdom. Remote Sens. Environ. 1999, 68, 206–216.
25. Cortes, C.; Vapnik, V. Support Vector Networks. Mach. Learn. 1995, 20, 273–297.
26. Melgani, F.; Bruzzone, L. Classification of Hyperspectral Remote Sensing Images with Support Vector Machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
27. Gualtieri, J.A.; Cromp, R.F. Support Vector Machine for Hyperspectral Remote Sensing Classification. Adv. Comput. Assist. Recognit. 1998, 3584, 221–232.
28. Gualtieri, J.A.; Chettri, S.R.; Cromp, R.F.; Johnson, L.F. Support Vector Machine Classifiers as Applied to AVIRIS Data. In Proceedings of the Eighth JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 10–11 February 1999.
29. Premalatha, M.; Lakshmi, C.V. SVM Trade-off between Maximize the Margin and Minimize the Variables Used for Regression. Int. J. Pure Appl. Math. 2013, 87, 741–750.
30. Gualtieri, J.A.; Cromp, R. SVM for Hyperspectral Remote Sensing Classification. Proc. SPIE 1998, 3584, 221–232.
31. Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
32. Anderson, J.R.; Hardy, E.E.; Roach, J.T.; Witmer, R.E. Land Use and Land Cover Classification System for Use with Remote Sensor Data; U.S. Government Printing Office: Washington, DC, USA, 1976.
33. Homer, C.; Fry, J. The National Land Cover Database. US Geol. Surv. Fact Sheet 2012, 3020, 1–4.
34. Homer, C.; Dewitz, J.; Yang, L.; Jin, S.; Danielson, P.; Xian, G.; Coulston, J.; Herold, N.; Wickham, J.; Megown, K. Completion of the 2011 National Land Cover Database for the Conterminous United States—Representing a Decade of Land Cover Change Information. Photogramm. Eng. Remote Sens. 2015, 81, 345–354.
35. Canters, F. Evaluating the Uncertainty of Area Estimates Derived from Fuzzy Land-Cover Classification. Photogramm. Eng. Remote Sens. 1997, 63, 403–414.
36. Stehman, S.V. Selecting and Interpreting Measures of Thematic Classification Accuracy. Remote Sens. Environ. 1997, 62, 77–89.
37. Smits, P.C.; Dellepiane, S.G.; Schowengerdt, R.A. Quality Assessment of Image Classification Algorithms for Land-Cover Mapping: A Review and a Proposal for a Cost-Based Approach. Int. J. Remote Sens. 1999, 20, 1461–1486.
38. Rosenfield, G.H.; Fitzpatrick-Lins, K. A Coefficient of Agreement as a Measure of Thematic Classification Accuracy. Photogramm. Eng. Remote Sens. 1986, 52, 223–227.
39. Fitzgerald, R.W.; Lees, B.G. Assessing the Classification Accuracy of Multisource Remote Sensing Data. Remote Sens. Environ. 1994, 47, 362–368.
40. Fung, T.; Ledrew, E. The Determination of Optimal Threshold Levels for Change Detection Using Various Accuracy Indices. Photogramm. Eng. Remote Sens. 1988, 54, 1449–1454.
41. Ma, Z.; Redmond, R.L. Tau Coefficients for Accuracy Assessment of Classification of Remote Sensing Data. Photogramm. Eng. Remote Sens. 1994, 61, 435–439.
42. Stehman, S.V. Comparing Thematic Maps Based on Map Value. Int. J. Remote Sens. 1999, 20, 2347–2366.
43. Stehman, S.V.; Czaplewski, R.L. Design and Analysis for Thematic Map Accuracy Assessment: Fundamental Principles. Remote Sens. Environ. 1998, 64, 331–344.
44. Saucier, R.T. Quaternary Geology of the Lower Mississippi Valley; Arkansas Archeological Survey: Fayetteville, AR, USA, 1994; Volume I.
45. Risal, A.; Parajuli, P.B.; Dash, P.; Ouyang, Y.; Linhoss, A. Sensitivity of Hydrology and Water Quality to Variation in Land Use and Land Cover Data. Agric. Water Manag. 2020, 241, 106366.
46. Snipes, C.E.; Evans, L.P.; Poston, D.H.; Nichols, S.P. Agricultural Practices of the Mississippi Delta; ACS Publications: Washington, DC, USA, 2004; pp. 43–60.
47. National Agricultural Statistics Service. Field Crops Usual Planting and Harvesting Dates. In Agricultural Handbook; NASS: Burr Ridge, IL, USA, 2010; pp. 1–51.
48. USDA. Data Gateway; USDA: Washington, DC, USA, 2018.
49. Pal, M.; Mather, P.M. Support Vector Machines for Classification in Remote Sensing. Int. J. Remote Sens. 2005, 26, 1007–1011.
50. Herold, M.; Roberts, D.A.; Gardner, M.E.; Dennison, P.E. Spectrometry for Urban Area Remote Sensing—Development and Analysis of a Spectral Library from 350 to 2400 nm. Remote Sens. Environ. 2004, 91, 304–319.
51. South, S.; Qi, J.; Lusch, D.P. Optimal Classification Methods for Mapping Agricultural Tillage Practices. Remote Sens. Environ. 2004, 91, 90–97.

Figure 1. Big Sunflower River Watershed and the surrounding Yazoo Basin.

Figure 2. Methodology flowchart.

Figure 3. Mask generation for post-classification correction.
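The post-classification correction summarized in Figure 3 and in the abstract combines a mask derived from thematic maps of urban areas with a majority filter. The sketch below illustrates one plausible form of that step; it is not the authors' code. It assumes the urban thematic map has already been rasterized to the Landsat grid as a boolean mask, and it assumes (hypothetically) that the suspect pixels are those labeled as developed but lying outside the urban mask, such as bare harvested fields, each of which is replaced by the most frequent non-developed label in its 3 × 3 neighbourhood. The class codes and the function name `majority_pcc` are illustrative.

```python
from collections import Counter
from typing import List

# Hypothetical integer class codes; the article's actual raster codes are not given here.
HIGH_DEV = 6
MIDLOW_DEV = 7
DEVELOPED = {HIGH_DEV, MIDLOW_DEV}


def majority_pcc(labels: List[List[int]],
                 urban_mask: List[List[bool]],
                 radius: int = 1) -> List[List[int]]:
    """Majority-filter correction of developed pixels outside an urban mask (sketch).

    A pixel is changed only if it is labeled developed AND falls outside the mask.
    Its new label is the most common non-developed label among its neighbours in a
    (2*radius+1) x (2*radius+1) window. Pixels with no non-developed neighbours,
    and all pixels inside the mask, keep their original label.
    """
    rows, cols = len(labels), len(labels[0])
    # Write into a copy and always read from the original classification,
    # so the result does not depend on the scan order.
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if urban_mask[r][c] or labels[r][c] not in DEVELOPED:
                continue
            votes: Counter = Counter()
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    # Skip the centre pixel and anything outside the image.
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        neighbour = labels[rr][cc]
                        if neighbour not in DEVELOPED:
                            votes[neighbour] += 1
            if votes:
                # Ties resolve to the label encountered first in row-major order.
                out[r][c] = votes.most_common(1)[0][0]
    return out
```

A larger `radius` smooths more aggressively, but it can also erase small developed patches that genuinely lie outside the mapped urban areas.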

Figure 4. Spectral signature of each class during the growing season, from the Landsat 8 OLI imagery of 25 August 2015.

Figure 5. Growing season (25 August 2015) before PCC; (A) Landsat 8 OLI imagery, (B) support vector machine classification, (C) maximum likelihood classification, (D) random forest classification.

Figure 6. Growing season (25 August 2015) after PCC; (A) Landsat 8 OLI imagery, (B) support vector machine classification, (C) maximum likelihood classification, (D) random forest classification.

Figure 7. Spectral signature of each class post-harvest, from the Landsat 8 OLI imagery of 7 January 2018.

Figure 8. Post-harvest (7 January 2018) before PCC; (A) Landsat 8 OLI imagery, (B) support vector machine classification, (C) maximum likelihood classification, and (D) random forest classification.

Figure 9. Post-harvest (7 January 2018) after PCC; (A) Landsat 8 OLI imagery, (B) support vector machine classification, (C) maximum likelihood classification, (D) random forest classification.

Table 1. Usual Planting and Harvesting Dates (USDA, 2010) [47].

Crop	Planting: Begin	Planting: Most Active	Planting: End	Harvesting: Begin	Harvesting: Most Active	Harvesting: End
Barley	n/a	n/a	n/a	n/a	n/a	n/a
Corn	17 March	24 March–27 April	4 May	11 August	23 August–23 September	7 October
Cotton	20 April	27 April–19 May	29 May	15 September	27 September–29 October	12 November
Potatoes, Sweet	4 May	7 June–23 June	7 July	20 August	2 September–28 October	7 November
Hay, other	n/a	n/a	n/a	10 April	n/a	26 September
Oats	n/a	n/a	n/a	n/a	n/a	n/a
Peanuts	25 April	6 May–31 May	15 June	20 September	29 September–31 October	10 November
Rice	6 April	18 April–16 May	24 May	29 August	5 September–6 October	20 October
Rye	n/a	n/a	n/a	n/a	n/a	n/a
Sorghum	8 April	14 April–21 May	3 June	19 August	29 August–27 September	2 October
Soybeans	19 April	26 April–31 May	17 June	10 September	13 September–31 October	9 November
Sugarbeets	n/a	n/a	n/a	n/a	n/a	n/a
Tobacco	n/a	n/a	n/a	n/a	n/a	n/a
Wheat (Winter)	24 September	10 October–18 November	30 November	28 May	2 June–21 June	1 July
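Table 1 can be used to reason about the two image dates. The short sketch below encodes a few summer crops from the table and reports, for a calendar date, whether a field would usually be outside its planting-to-harvest span ("fallow"), growing ("in season"), or inside the usual harvesting window. The dictionary name, the function `field_status`, and the three-state rule are illustrative assumptions; the rule compares month and day only, so it applies to crops planted and harvested within one calendar year and ignores winter wheat and double cropping.

```python
from typing import Dict, Tuple

MonthDay = Tuple[int, int]  # (month, day); tuples compare in calendar order

# Usual dates copied from Table 1: (planting begin, harvesting begin, harvesting end).
CROP_CALENDAR: Dict[str, Tuple[MonthDay, MonthDay, MonthDay]] = {
    "Corn": ((3, 17), (8, 11), (10, 7)),
    "Cotton": ((4, 20), (9, 15), (11, 12)),
    "Rice": ((4, 6), (8, 29), (10, 20)),
    "Sorghum": ((4, 8), (8, 19), (10, 2)),
    "Soybeans": ((4, 19), (9, 10), (11, 9)),
}


def field_status(crop: str, month: int, day: int) -> str:
    """Classify a date relative to a crop's usual calendar (simplified, hypothetical rule)."""
    plant_begin, harvest_begin, harvest_end = CROP_CALENDAR[crop]
    md = (month, day)
    if md < plant_begin or md > harvest_end:
        return "fallow"          # before usual planting or after usual harvest end
    if md < harvest_begin:
        return "in season"       # planted, harvest not yet usually begun
    return "harvest window"      # between usual harvest begin and end


# Compare the growing-season (25 August) and post-harvest (7 January) image dates.
for crop in CROP_CALENDAR:
    print(crop, field_status(crop, 8, 25), field_status(crop, 1, 7), sep=" | ")
```

Under this simplified rule, 25 August falls in the usual harvesting window for corn and sorghum while cotton, rice, and soybeans are still in season, and 7 January lies outside the planting-to-harvest span of every listed crop.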

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dash, P.; Sanders, S.L.; Parajuli, P.; Ouyang, Y. Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data in an Agricultural Watershed. Remote Sens. 2023, 15, 4020. https://doi.org/10.3390/rs15164020

AMA Style

Dash P, Sanders SL, Parajuli P, Ouyang Y. Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data in an Agricultural Watershed. Remote Sensing. 2023; 15(16):4020. https://doi.org/10.3390/rs15164020

Chicago/Turabian Style

Dash, Padmanava, Scott L. Sanders, Prem Parajuli, and Ying Ouyang. 2023. "Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data in an Agricultural Watershed" Remote Sensing 15, no. 16: 4020. https://doi.org/10.3390/rs15164020

APA Style

Dash, P., Sanders, S. L., Parajuli, P., & Ouyang, Y. (2023). Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data in an Agricultural Watershed. Remote Sensing, 15(16), 4020. https://doi.org/10.3390/rs15164020

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
