The Classification of Noise-Afflicted Remotely Sensed Data Using Three Machine-Learning Techniques: Effect of Different Levels and Types of Noise on Accuracy

Remotely sensed data are often adversely affected by many types of noise, which influences the classification result. Supervised machine-learning (ML) classifiers such as random forest (RF), support vector machine (SVM), and back-propagation neural network (BPNN) are broadly reported to improve robustness against noise. However, only a few comparative studies that may help investigate this robustness have been reported. An important contribution, going beyond previous studies, is that we perform the analyses by employing the most well-known and broadly implemented packages of the three classifiers and control their settings to represent users’ actual applications. This facilitates an understanding of the extent to which the noise types and levels in remotely sensed data impact classification accuracy using ML classifiers. By using those implementations, we classified the land cover data from a satellite image that was separately afflicted by seven levels of zero-mean Gaussian, salt–pepper, and speckle noise. The modeling data and features were strictly controlled. Finally, we discussed how each noise type affects the accuracy obtained from each classifier and the robustness of the classifiers to noise in the data. This may enhance our understanding of the relationship between noise, the supervised ML classifiers, and remotely sensed data.


Introduction
Remotely sensed data, especially satellite images, are used to estimate information about the Earth, its various objects, phenomena, and processes. These images have been widely used for applications of Earth surface monitoring such as land cover classification and change detection, crop yield estimation, and geographic information extraction. Improving the accuracy of the classifications is thus a fundamental research topic in the field of geographic information sciences [1]. However, it can be a difficult task depending on the complexity of the landscape, the spatial and spectral resolution of the imagery being used, and the amount of noise included.
Noise may be added to remotely sensed data during different stages of data acquisition or processing, from the moment the data are captured by the sensor until their atmospheric and topographic correction, orthorectification, or co-registration [2]. In satellite image classification, the data obtained from the satellite sensors may be affected by atmospheric noise resulting from the obstruction of the light reflected by targets on the Earth's surface by a variety of phenomena, including aerosols and clouds in the atmosphere as well as changing illumination patterns and the angle at which the satellite views the ground at any given time [3]. The basic source for image pixels, such as those acquired by satellite sensors, has limited radiometric and geometric resolution. This effect leads to a mixture of classes within one pixel, thereby resulting in the representation of the partial degree membership of at least two classes in the pixel, known as a mixed pixel, which normally increases the classification complexity [4]. Moreover, the image generation process may add noise to the data. This is widely found in most synthetic-aperture radar (SAR) images [5][6][7]. These data have to be compressed to reduce their requirements for archiving and data transmission, and this may cause artifacts and ambiguities in the final images [8].
Zero-mean Gaussian noise refers to normally distributed values that occur at random locations and merge with the original values in the satellite image [9]. In optical remote sensing multispectral imagery, the noise is typically independent of the data and is generally additive in nature. The noise may reduce the performance of important image processing techniques such as detection, segmentation, and classification. Most natural images are assumed to have additive white Gaussian noise [10,11].
Salt and pepper noise are the highest and lowest global values, respectively, that replace an original pixel at a random location. This is generally caused by errors in data transmission [5,12]. A report of data loss on the Landsat Missions website also refers to an artifact called the Christmas Tree anomaly. The lost data appear as bright red, green, and blue artifacts, often next to or overlapping each other. This artifact is usually caused by telemetry data erroneously being included in the satellite imagery [13]. The missing data may be replaced by null values (salt noise) or filled with a designated fill pattern by the ground processing systems. Charoenjit et al. [14] stated that one of the problems associated with biomass estimation from very high-resolution (VHR) data is salt-pepper noise. In an attempt to solve the problem caused by the noise, these authors performed object-based image analysis on a Para rubber plantation and showed that the segmentation method can reduce the effect of constant salt-pepper noise.
Multiplicative noise or speckle is widely found in most SAR images [5][6][7]. These data have to be compressed to reduce their requirements for archiving and data transmission, and this may cause artifacts and ambiguities in the final images [8,15]. Speckle noise gives a grainy appearance to radar imagery. It reduces the image contrast, which has a direct negative effect on texture-based analysis of the imagery [16,17]. For example, Nobrega et al. [18] showed a LiDAR intensity image that was affected by signal eccentricities caused by sensor scanning patterns. By filtering the noise, they could improve the efficiency of the image's segmentation. Moreover, Mulder et al. [19] reviewed the use of remotely sensed data for soil and terrain mapping. They concluded that the accuracy of retrieving soil information by using proximal sensing combined with a remote sensing method decreases because of several types of noise, such as that caused by a mixture of soil properties and atmospheric, topographic, and sensor noise. Soil moisture retrieval from SAR images is always affected by speckle noise and uncertainties associated with soil parameters, which negatively impact the accuracy of soil moisture estimates. Barber et al. [7] proposed a Bayesian soil moisture estimator for polarimetric SAR images to address these issues. The results indicated that the model enlarges the validity region of the minimization-based procedure and that the speckle effects can be reduced by the multilooking method included in the model scheme.
Consequently, the effect of noise within the data can be either mitigated or ignored by selecting appropriate classifiers that are robust to noise, especially supervised machine-learning classifiers, which have a mechanism for handling noise [20][21][22][23]. Continuous developments in storage capacity and the processing speed of computers have made advanced machine learning (ML) available for land cover classification. Supervised machine-learning algorithms require external assistance in the form of training [24]. Their classification accuracy obviously depends on the quality of the training data [20].
The input dataset is normally divided into training and testing datasets. The training dataset has an output variable that needs to be predicted or classified. Most algorithms learn some kind of pattern from the training dataset and apply it to the test dataset for prediction or classification [25,26].
Algorithms based on random forests (RF) [21] utilize decision trees. These algorithms are probably among the most efficient ML algorithms in terms of prediction accuracy [27]. RF can rapidly process databases ranging from small to very large, and is easy to interpret and visualize [28]. Moreover, RF has been reported to be robust to noise [29][30][31]. However, the algorithm can slow down as the number of trees increases [28]. In recent decades, many scientists have used RF as a classifier. In the original paper describing RF [21], the performance of RF was compared against Adaboost algorithms [32] by classifying 20 different datasets. That study showed that RF is more robust with respect to noise. Moreover, Crisci et al. [27] reviewed a number of supervised ML algorithms to model mortality events in benthic communities in rocky coastal areas in the northwestern Mediterranean Sea. The RF model yielded the lowest misclassification rates for such ecological data.
Support vector machine (SVM) is a supervised non-parametric statistical learning technique; therefore, no assumption is made about the underlying data distribution [33]. The SVM learning algorithm aims to find a hyperplane that separates the dataset into a discrete predefined number of classes in a fashion consistent with the training examples. SVM outperforms other ML models, particularly when only a small dataset is available for training [34,35]. However, in comparison to other methods, SVM requires more training time and its performance depends on parameter adjustment [23]. For classification purposes, Dalponte et al. [36] used SVM and the fusion of hyperspectral and LiDAR data, in which speckle noise usually existed. They pointed out that SVM outperformed the maximum likelihood and k-nearest neighbor (k-NN) techniques. Moreover, the incorporation of LiDAR variables generally improved the classification performance, and the first return data was the highest contributing factor. Insom et al. [34] improved the accuracy of standard SVM by applying a particle filter (PF) to automatically update the SVM training model parameters to values that were more appropriate for their flood dataset. The performance of the method was superior to that of standard SVM. Senf et al. [37] performed land cover classification of complex Mediterranean landscapes using an SVM classifier. The work showed that combining phenological profiles of the land covers and human interpretation can significantly improve the generalization power of the SVM models, as previously stated [38]. Moreover, SVM was also reported to have good generalization capabilities under different noise levels, types of noise, and sample sizes when it was optimized [39].
A neural network (NN) is an algorithm that simulates the neuronal structure, processing method, and learning ability of the human brain, but on a much smaller scale. This technique is applicable to problems in which the relationships may be nonlinear or quite dynamic [40]. The most common type of NN is based on the back-propagation learning algorithm and is known as a back-propagation neural network (BPNN) [41]. BPNN has been reported to be the best among the multi-layer perceptron (MLP) algorithms [22]. However, BPNN tends to be slower to train than other types of NNs, which can be problematic in very large networks with a large amount of data [28]. BPNNs in remotely sensed image classification applications have been widely reviewed [41][42][43]. They were also reported to be very effective for noise reduction [44,45] and robust to noise when trained with noisy data [46]. These researchers concluded that the BPNN approach is feasible for the classification of remote sensing imagery.
In recent decades, many experiments have been designed to observe the robustness of a variety of classifiers in relation to noise content and type. For example, Petrou et al. [2] proposed fuzzy-rule-based classifiers to overcome uncertainty in habitat mapping. The robustness of these classifiers was evaluated by drawing additive homogeneous noise from a zero-mean Gaussian probability density function. The noise was added to the original images and to the thresholds assigned by an expert, at standard deviations ranging from 5 to 20% of the image mean values and from 5 to 50% of the rule thresholds, respectively. The results showed that the classifiers remained mostly unaffected by noise in the data that were used because the classification was object based. In contrast, the additive noise in the rule threshold, which acted as inaccurate expert rules, directly affected the performance. However, the overall accuracies remained over 75%. Celik and Ma [9] investigated the performance of their automatic change-detection method by using different types and levels of added noise. Different levels of zero-mean Gaussian noise ranging from 35 to 50 dB (peak signal-to-noise ratio (PSNR)) were used in the experiments. Moreover, the authors contaminated the test images with different levels of speckle noise to illustrate the noise that commonly occurs in SAR data. The method proved to be highly robust (accuracies of approximately 95-100%) to both types of noise when the classifier was assigned an appropriate parameter value. Moreover, the noise test could assist the authors in optimizing the parameter used in the classification with different levels of noise. Dierking and Dall [6] used complex scattering matrix data of the Electromagnetics Institute's Synthetic Aperture Radar (EMISAR) system in order to simulate image products with different pixel dimensions and noise levels of −35, −30, −25, and −20 dB. Ice deformation maps were generated by using a selected threshold, such that image pixels with intensities equal to or larger than the highest 2% of the ice intensity distribution level were classified as deformed ice. The results indicated that if the level of the sensor noise is equal to or larger than the average backscattering intensity of the level ice, the retrieved values for the deformation parameters change. Moreover, an L-band SAR system at like-polarization, with an incidence angle larger than 35° and a noise level of −25 dB or lower, is desirable for mapping the deformation state of sea ice cover.
In this work, we employed three supervised ML classifiers, i.e., random forest (RF), support vector machine (SVM), and back-propagation neural network (BPNN), to classify the land covers in satellite images afflicted by three classes of noise, namely zero-mean Gaussian, salt-pepper, and speckle noise. The classification was first carried out on an image corrected for absolute atmospheric interference, termed the reference image, to obtain the standard accuracies. Subsequently, noise was applied to the reference image in different concentrations. The normal reflectance image (without absolute atmospheric correction) was also classified to illustrate the real effects of mixing different kinds of noise. In summary, we experimented with the performance of the classifiers under the following noise conditions:
1. Fully processed images with absolute atmospheric correction, termed the reference images;
2. Normal reflectance images without absolute atmospheric correction;
3. Reference images with zero-mean Gaussian noise added;
4. Reference images with salt-pepper noise added;
5. Reference images with speckle noise added (multiplicative noise).
Discussions are presented to describe and examine the performance of the different classifiers for each noise type and level, in addition to assessing the research questions: (1) To what extent do the different types of noise affect the classification accuracies? (2) How do the classifiers behave toward noise-afflicted image classification? Additionally, the effect of the different types of noise and the behavior of the classifiers on remotely sensed data are considered. Note that the results and conclusions of this study are mainly aimed at providing an understanding of the core mechanisms of the ML classifiers toward noise in remotely sensed data rather than comparing their performances.

Experimental Design
Figure 1 shows the overall experimental flow of the current study. A Landsat 8 OLI scene was acquired and processed to be a reference image (refer to Section 2.2 for the image processing). We added different levels of three types of noise (zero-mean Gaussian noise, salt-pepper noise, and speckle noise) to the reference image to produce noise-afflicted images. Twenty-three images (one reference image, one image without absolute atmospheric correction, and 21 images to which artificial noise was added) were then classified by BPNN, SVM, and RF to obtain the classification accuracies. The classifications were scoped by the following conditions. First, the training and validating samples were extracted using the same shape file (i.e., samples were acquired at the same image locations). Second, the parameters used in each of the algorithms were optimized. Please refer to Section 2.4 for a detailed explanation. In addition, we observed the robustness of a classifier by the differential accuracies of the reference image and the images to which noise was added.
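The robustness measure described above (the difference between the accuracy on the reference image and on a noise-afflicted image) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `overall_accuracy` and `accuracy_drop` and the six-pixel label lists are hypothetical, and overall accuracy is assumed to be the fraction of correctly classified validation samples.

```python
def overall_accuracy(predicted, actual):
    """Fraction of validation samples whose predicted class equals the reference class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def accuracy_drop(reference_accuracy, noisy_accuracy):
    """Differential accuracy: a positive value means the noise reduced accuracy."""
    return reference_accuracy - noisy_accuracy


# Hypothetical labels for six validation pixels (illustration only).
truth = ["forest", "forest", "water", "bare", "agri", "forest"]
pred_reference = ["forest", "forest", "water", "bare", "agri", "forest"]
pred_noisy = ["forest", "agri", "water", "bare", "forest", "forest"]

acc_ref = overall_accuracy(pred_reference, truth)
acc_noisy = overall_accuracy(pred_noisy, truth)
drop = accuracy_drop(acc_ref, acc_noisy)
print(f"reference={acc_ref:.3f} noisy={acc_noisy:.3f} drop={drop:.3f}")
```

A smaller drop at a given noise level indicates a classifier that is less sensitive to that noise type.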
ISPRS Int. J. Geo-Inf. 2018, 7

Remotely Sensed Data
The satellite image is a Landsat-8 Operational Land Imager (OLI) image acquired on December 26, 2014 (Path 130, Row 50), shown in Figure 2. The image was acquired under clear sky conditions, with a very low content of homogeneous aerosol and in the absence of cloud objects. The scene includes Landsat 8 surface reflectance data generated from the Landsat Surface Reflectance Code (LaSRC). The LaSRC makes use of the coastal aerosol band to perform aerosol inversion tests, uses auxiliary climate data from the Moderate Resolution Imaging Spectroradiometer (MODIS), and uses a unique radiative transfer model. The model artifacts or blockiness present in the surface reflectance data products were excluded from the image and any further calculation by employing the quality assessment (QA) band. We termed this image the reference image.
The study area, which covered the southeastern part of the Srinakarin dam, an embankment dam on the Khwae Yai River in the Si Sawat District of the Kanchanaburi Province of Thailand, included five land cover classes: agriculture, bare land, construction, water body, and forest. Descriptions of the classes are provided in Table 1. The sample pixel locations were randomly selected from Thailand's 2014 land cover ground survey map obtained from the Geo-Informatics and Space Technology Development Agency (GISTDA). The phenology of plants or land cover may change after the referencing date (the time lapse is about two months). Thus, we incorporated human interpretation and Google Earth images with an acquisition date near that of the Landsat image to confirm that the samples had not changed category during the time lapse.

Sampling Design
The sampling framework utilized in this paper can be explained as follows.
(1) A total of 4550 random samples were generated from Thailand's 2014 land cover ground survey map via Generate Random Sample-Using Ground Truth Classification Image in ENVI classic version 5.1 (Harris Geospatial Solutions, Inc., Melbourne, Australia). The random samples include all classes except the construction class because it rarely existed in the study area; therefore, we manually selected 100 samples of the construction class by human interpretation. The initial 4550 random samples include 2600 samples of the forest class, 650 samples of the agriculture class, 650 samples of the bare land class, and 650 samples of the water body class. Note that the forest class accounts for more than 50% of the land cover of the study area, so we provided a sample size three times larger than that of the other classes to capture the forest diversity. On the other hand, we maintained the same proportion for the water body class as it has less complexity. As a result, 4650 samples were obtained in this step.
(2) Samples that lay close to the edge of the class vectors, or whose classes changed during the time lapse between the data acquisition and the survey map, were excluded. The criterion for removing edge pixels is a distance from the vector line of less than three pixels, or 90 m. The remaining pixels were compared one by one with the Google Earth images using human interpretation to detect any category changes.
(3) Finally, 4270 samples remained in the experiment. An 80/20 ratio of training and validation samples was randomly drawn without replacement in MATLAB. The final sample distribution is shown in Table 1.

To ensure that the accuracy of the classifiers was comparable and less biased, 80% of the data points were used to train the various classification algorithms and 20% were used for comparison. This means that all of the training elements required for generating a classification model (the training set, testing set, or validating set, if needed) were included in the 80% training data. Note that each classifier used these data in different ways based on its training algorithm. The management of the training data of each classifier can be found in Section 2.5. Finally, we set aside 854 validating samples that were sparsely distributed in the image (20% of the reference data) to evaluate the classification accuracies. Note that only the Coastal, Blue, Green, Red, Near-Infrared (NIR), and Shortwave-Infrared 1 and 2 (SWIR1 and SWIR2, respectively) bands of the Landsat 8 OLI dataset were used as spectral features in this study.
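As a rough illustration of the 80/20 draw without replacement, the sketch below shuffles sample indices and cuts the list once. The study performed this step in MATLAB; the Python function `split_without_replacement`, the seed, and the use of a plain (non-stratified) shuffle are assumptions made only for illustration.

```python
import random


def split_without_replacement(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices once and cut them into training and validation parts."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 4270 samples remained after screening; 20% of them are held out for validation.
train_idx, valid_idx = split_without_replacement(4270)
print(len(train_idx), len(valid_idx))
```

Because every index appears exactly once in the shuffled list, no sample can fall into both subsets, which is what keeps the held-out accuracy estimate independent of the training data.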

Noise Afflictions
We investigated the highest case of noise affliction in the satellite image by simulating the types of noise described below in the MATLAB environment and applying them to the reference image at different levels of noise. The Gaussian distribution noise can be expressed by:

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \quad (1)$$

where P(x) is the Gaussian distribution noise in an image; µ and σ are the mean and standard deviation of the noise process, respectively. Salt and pepper noise are the highest and lowest global values, respectively, that replace an original pixel at a random location. To simulate salt and pepper noise, the following conditions were applied to the noise-free image:

$$p(x) = \begin{cases} p_1, & x = A \\ p_2, & x = B \\ 0, & \text{otherwise} \end{cases} \quad (2)$$

where p1 and p2 are the values of the probability density function (pdf); p(x) is the distribution of salt and pepper noise in the image; A and B are the possible minimum and maximum values, respectively (in this paper they are 0 and 1, respectively). Speckle noise is known as multiplicative noise, i.e., it is produced by the coherent superposition of spatially random multiple scattering sources within the resolution volume of the sensor [47]. The speckle noise distribution can be expressed as:

$$J = I + n \cdot I \quad (3)$$

where J is the speckle-noise-afflicted image; I is the input image; and n is a uniform noise image with a specific mean and variance. The noise level is quantitatively defined in decibels (dB) in terms of the PSNR. Given an input image I and its noisy image H, the PSNR between the two images can be expressed as:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\max(I)^{2}}{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I(i,j) - H(i,j)\right]^{2}}\right) \quad (4)$$

where i and j are pixel locations in rows and columns, m and n are the maximum number of image rows and columns, respectively, and max(I) denotes the peak pixel value. Note that a smaller PSNR value corresponds to a higher noise intensity. The noisy images were produced by using different PSNR values. Finally, the sample images used in this study comprise a total of 23 datasets with the following descriptions:
1. Reference images (absolute atmospheric correction and noise removal applied);
2. Level 1 Landsat 8 images without absolute atmospheric correction (only converted from digital number to reflectance values);
3. Reference images + zero-mean Gaussian noise from 10 to 40 dB (in steps of 5 dB);
4. Reference images + salt and pepper noise from 10 to 40 dB (in steps of 5 dB);
5. Reference images + speckle noise from 10 to 40 dB (in steps of 5 dB).
Note that each pixel value of the reference image was rescaled to values between 0 and 1 using the minimum and maximum values of its band (as shown in Equation (5)) before noise generation. If the image data have i rows, v columns, and u bands, the rescaling equation in this work is expressed as:

$$I(i, v, u) = \frac{Q(i, v, u) - \min(u)}{\max(u) - \min(u)} \quad (5)$$

where I(i, v, u) is the rescaled value of the original value Q(i, v, u). The terms min(u) and max(u) denote the minimum and maximum values selected from band u, respectively.
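The noise models and the PSNR measure described in this section can be sketched for a single band as follows. This is a hedged illustration in Python rather than the MATLAB routines used in the study: the function names, the noise parameters (`sigma`, `density`, `variance`), the synthetic band, and the decision not to clip noisy values back to [0, 1] are all assumptions. The speckle term draws zero-mean uniform noise, whose half-width a satisfies variance = a²/3.

```python
import math
import random


def rescale(band):
    """Min-max rescale one band to [0, 1], as in the rescaling step described above."""
    lo, hi = min(band), max(band)
    if hi == lo:
        raise ValueError("a constant band cannot be rescaled")
    return [(q - lo) / (hi - lo) for q in band]


def add_gaussian(band, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in band]


def add_salt_pepper(band, density, rng, low=0.0, high=1.0):
    """Replace a fraction `density` of pixels by the lowest (pepper) or highest (salt) value."""
    noisy = []
    for x in band:
        r = rng.random()
        if r < density / 2:
            noisy.append(low)    # pepper
        elif r < density:
            noisy.append(high)   # salt
        else:
            noisy.append(x)      # pixel left untouched
    return noisy


def add_speckle(band, variance, rng):
    """Multiplicative noise J = I + n * I with zero-mean uniform n of the given variance."""
    half_width = math.sqrt(3.0 * variance)
    return [x + rng.uniform(-half_width, half_width) * x for x in band]


def psnr(original, noisy, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the two bands are identical."""
    mse = sum((a - b) ** 2 for a, b in zip(original, noisy)) / len(original)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


rng = random.Random(42)
raw_band = [rng.uniform(200.0, 3000.0) for _ in range(10_000)]  # synthetic values
band = rescale(raw_band)

gaussian = add_gaussian(band, sigma=0.1, rng=rng)
salt_pepper = add_salt_pepper(band, density=0.05, rng=rng)
speckle = add_speckle(band, variance=0.04, rng=rng)

for name, noisy in (("gaussian", gaussian), ("salt-pepper", salt_pepper), ("speckle", speckle)):
    print(f"{name}: PSNR = {psnr(band, noisy):.2f} dB")
```

For the additive Gaussian case the mean-square error is close to sigma squared, so sigma = 0.1 on a [0, 1] band gives a PSNR near 20 dB.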

Classifiers-Implementation Packages
The classifiers are described in terms of their key concepts, operational details, and the parameter settings used in this study. Further theoretical and mathematical details can be found in the cited works (RF: [21,48,49], SVM: [50,51], BPNN: [22,42]).

Random Forests (RF)
Random Forests build multiple decision trees based on random bootstrapped samples of the training data. In contrast to other classifiers, RF neither causes overfitting nor requires a long training time [21]. Each tree is built using a subset that differs from the original training data, containing approximately two-thirds of the cases, and the nodes are split using the best split variable among a subset of m randomly selected variables [52]. The trees are created by drawing a subset of training samples with replacement (the bagging method). Each decision tree is independently produced without any pruning and each node is split using a number of variables (m) defined by the user. By growing the forest up to the number of trees (k), the algorithm creates trees with high variance and low bias [21]. The final classification decision is taken by averaging the class assignment probabilities calculated by all trees. Therefore, two parameters are required to construct an RF framework: the number of trees (ntree) in the ensemble and the number of variables used to split the nodes (mtry). The classifications were performed using the well-known "randomForest" package for R [43].
All of the considered parameters were investigated: ntree was varied from 500 to 2000, and mtry was varied from 2 to 5. The final model was the most accurate one obtained by running the model generation procedure 30 times. Table 2 shows the parameter tuning of all classifiers in this study. The optimized models are different for each observed image.
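Two of the mechanisms named above, bootstrap sampling and averaging of per-tree class probabilities, can be illustrated without a full RF implementation. The sketch below is not the "randomForest" package; the helper names and the three hypothetical per-tree probability dictionaries are assumptions. It shows that a bootstrap sample of size n contains roughly 1 − 1/e ≈ 63% distinct cases, which is the "approximately two-thirds" mentioned in the text.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement (one bagging sample)."""
    return [rng.randrange(n) for _ in range(n)]


def unique_fraction(n, rng):
    """Share of distinct cases that end up in one bootstrap sample."""
    return len(set(bootstrap_indices(n, rng))) / n


def average_vote(tree_probabilities):
    """Average class probabilities over all trees and return the winning class."""
    classes = tree_probabilities[0].keys()
    n_trees = len(tree_probabilities)
    mean = {c: sum(p[c] for p in tree_probabilities) / n_trees for c in classes}
    return max(mean, key=mean.get), mean


rng = random.Random(1)
frac = unique_fraction(3416, rng)  # 3416 = size of the 80% training set
print(f"distinct cases in one bootstrap sample: {frac:.3f}")

# Hypothetical outputs of three trees for a single pixel.
trees = [
    {"forest": 0.6, "agriculture": 0.4},
    {"forest": 0.3, "agriculture": 0.7},
    {"forest": 0.8, "agriculture": 0.2},
]
label, mean_probs = average_vote(trees)
print(label, mean_probs)
```

The cases left out of each bootstrap sample are what RF implementations typically use for out-of-bag error estimates.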

Support Vector Machines (SVM)
The SVM algorithm aims to find a decision boundary, termed a hyperplane, that separates the dataset into a discrete predefined number of classes in a fashion consistent with the training examples. In this study, we employed the well-known "LIBSVM" package [53] implemented in the MATLAB environment. A brief mathematical description of the SVM package is provided below.
Given the input training set of $l$ examples $x_i$ with labels $y_i$, i.e., $(x_i, y_i)$, $i = 1, 2, 3, \ldots, l$, where $x_i \in R^{N}$ and $y \in \{1, -1\}^{l}$, the SVM requires the solution of the following optimization problem [53], as revised from the soft margin hyperplane [51]:

$$\min_{w,\, b,\, \xi}\ \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i\left(w^{T}\phi(x_i) + b\right) \geq 1 - \xi_i,\ \ \xi_i \geq 0 \quad (6)$$

where $w$, $b$, $T$, and $\xi$ are the weight vector, bias, transpose operator, and non-negative slack variables, respectively. The training vectors $x_i$ are mapped into a higher dimensional space by the function $\phi$. SVM finds a linear separating hyperplane with the maximal margin in this higher dimensional space. $C > 0$ denotes the penalty parameter of the error term. Furthermore, $K(x_i, x_j) \equiv \phi(x_i)^{T}\phi(x_j)$ is termed the kernel function. In this work, the radial basis function (RBF) kernel is used. It is expressed as:

$$K(x_i, x_j) = \exp\left(-\gamma \left\lVert x_i - x_j \right\rVert^{2}\right),\quad \gamma > 0 \quad (7)$$

where $\gamma$ is the kernel parameter. Therefore, only two parameters (the penalty parameter $C$ and the kernel parameter $\gamma$) are required when using the method.
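To make the RBF kernel concrete, the following sketch evaluates it for two seven-band feature vectors. It is a plain Python illustration, not LIBSVM code; the function name `rbf_kernel`, the value of gamma, and the reflectance vectors are hypothetical.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel exp(-gamma * ||x_i - x_j||^2) for two equal-length feature vectors."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


# Hypothetical rescaled reflectances in seven bands for two pixels.
pixel_a = [0.05, 0.06, 0.09, 0.08, 0.45, 0.22, 0.11]
pixel_b = [0.04, 0.05, 0.07, 0.05, 0.02, 0.01, 0.01]

k_same = rbf_kernel(pixel_a, pixel_a, gamma=0.5)
k_diff = rbf_kernel(pixel_a, pixel_b, gamma=0.5)
print(k_same, k_diff)
```

The kernel equals 1 for identical vectors and decays toward 0 as the spectral distance grows; a larger gamma makes that decay faster, which is why gamma has to be tuned together with C.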
To attain reasonable accuracy in each of the classifications, we implemented the algorithm in a step-by-step manner. First, the training data were converted to a suitable form for the SVM package. The labels were changed from category attributes into binary numbers 0 and 1. For example, {water body, constructions, agriculture} were represented as (1,0,0), (0,1,0), and (0,0,1). The straightforward grid-search cross-validation approach [53], which is provided in the package, was performed to find the best model parameters, since the values of C and γ directly affect the performance of the SVM model [34]. The cross-validation was replicated 30 times (Table 2). The final SVM model was then generated by training on the entire training set with the obtained parameters, after which we exported the model to classify the 20% validating data.
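The label conversion and the parameter grid can be sketched as follows. The encoding reproduces the example given in the text. The grid values are an assumption: they are the exponentially spaced ranges suggested in the LIBSVM practical guide, and the ranges actually searched in this study are not stated here. The name `one_hot` is hypothetical.

```python
import itertools


def one_hot(labels, classes):
    """Encode each categorical label as a tuple of 0/1 flags, one flag per class."""
    index = {c: k for k, c in enumerate(classes)}
    return [tuple(1 if index[label] == k else 0 for k in range(len(classes)))
            for label in labels]


classes = ["water body", "constructions", "agriculture"]
encoded = one_hot(["water body", "constructions", "agriculture"], classes)
print(encoded)

# Assumed search grid: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.
c_values = [2.0 ** e for e in range(-5, 16, 2)]
gamma_values = [2.0 ** e for e in range(-15, 4, 2)]
grid = list(itertools.product(c_values, gamma_values))
print(len(grid), "candidate (C, gamma) pairs")
```

In a grid search, each (C, gamma) pair would be scored by cross-validation on the training data, and the pair with the best cross-validated accuracy would be used to train the final model.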

Back-Propagation Neural Network (BPNN)
The goal of BPNN is to compute the partial derivative, or gradient, of a loss function with respect to any weight in the network. The loss function calculates the difference between the network output for an input training example and its expected output, after the example has been propagated through the network [36]. The basic concern with an NN algorithm is that it cannot perform accurately beyond the range of the trained inputs; that is, the classification accuracies strongly depend on the data used to train the networks. The modeling data were labeled in binary form and then randomly separated into training, testing, and validating sets in the ratios of 70%, 15%, and 15%, respectively. The number of hidden layers was varied from 5 to 60. The BPNN model was automatically trained until the network converged by using the trainscg function (scaled conjugate gradient, or SCG), the details of which have been described by Moller [49], in the MATLAB environment. The SCG training is controlled by the parameters sigma (σ), which determines the change in weight for the second derivative approximation, and lambda (λ), which regulates the indefiniteness of the Hessian. The values of σ and λ were taken as 5 × 10^−5 and 5 × 10^−7, respectively. The training was repeated 30 times and the best model was then employed to classify the given validating data. The mean and standard deviation of the model accuracies were reported.
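The random 70%/15%/15% partition described above can be sketched in Python as follows. This is only an illustration of the splitting step, not the MATLAB routine used in the study; the function name `split_indices`, the sample count of 1000, and the seed are hypothetical.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, testing, and validating sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(fractions[0] * n_samples)
    n_test = round(fractions[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validate = indices[n_train + n_test:]  # whatever remains
    return train, test, validate


train, test, validate = split_indices(1000, seed=1)
print(len(train), len(test), len(validate))  # 700 150 150
```

Repeating the split with a different seed for each of the 30 replications would give a different random partition each time.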

Image with Added Noise
Each noisy image was constructed by considering the PSNR values. An automatic noise-adding function was used to apply random noise to each satellite image band. In the case of salt-pepper noise, a pixel location may be disturbed by the noise (irrespective of whether it is salt or pepper) more than once. For example, the first location (row 1, column 1) of the seven-band satellite image data consists of seven pixels, one per band, and the noise may be added to none, one, some, or all of them. In the case of additive noise (zero-mean Gaussian noise) and multiplicative noise (speckle noise), all pixels in each image band were directly afflicted by adding or multiplying by random values. Note that, to produce the intended PSNR, a certain noise parameter is associated with each image band depending on the type of noise. In other words, a noise layer was prepared by using specific parameters such as the variance, mean, and standard deviation, after which the PSNR was calculated from the mean-square error between the original image and the noise layer combined with the original image. This requires the parameters to be assigned a certain value before they are used to target a certain PSNR. The noisy images are shown in Figure 3.
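The relationship between the noise parameters and the target PSNR can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not stated in the text: band values are scaled to [0, 1] with a peak value of 1, noisy values are not clipped, the speckle multiplier is drawn from a zero-mean Gaussian distribution, and salt and pepper are equally likely. The function names and the synthetic band are hypothetical and do not reproduce the noise-adding function used in the study.

```python
import math
import random


def psnr(original, noisy, peak=1.0):
    """PSNR in dB, computed from the mean-square error of two bands."""
    mse = sum((a - b) ** 2 for a, b in zip(original, noisy)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def add_gaussian(band, target_db, rng):
    # Additive noise: the expected MSE equals the noise variance.
    sigma = math.sqrt(10.0 ** (-target_db / 10.0))
    return [v + rng.gauss(0.0, sigma) for v in band]


def add_speckle(band, target_db, rng):
    # Multiplicative noise v + v * n: the expected MSE is var(n) * mean(v^2).
    mean_sq = sum(v * v for v in band) / len(band)
    sigma = math.sqrt(10.0 ** (-target_db / 10.0) / mean_sq)
    return [v + v * rng.gauss(0.0, sigma) for v in band]


def add_salt_pepper(band, target_db, rng):
    # Replacement noise: expected MSE is density * mean(0.5 * v^2 + 0.5 * (1 - v)^2).
    per_pixel = sum(0.5 * v * v + 0.5 * (1.0 - v) ** 2 for v in band) / len(band)
    density = 10.0 ** (-target_db / 10.0) / per_pixel
    if density > 1.0:
        raise ValueError("target PSNR is too low for this band")
    noisy = []
    for v in band:
        if rng.random() < density:
            noisy.append(1.0 if rng.random() < 0.5 else 0.0)  # salt or pepper
        else:
            noisy.append(v)
    return noisy


rng = random.Random(42)
band = [rng.random() for _ in range(20000)]  # synthetic stand-in for one image band
for name, fn in (("gaussian", add_gaussian), ("speckle", add_speckle),
                 ("salt-pepper", add_salt_pepper)):
    for target in (10, 20, 30):
        print(name, target, round(psnr(band, fn(band, target, rng)), 2))
```

Because the noise is random, the measured PSNR only approximates the target; clipping the noisy values to the valid range would raise the measured PSNR slightly.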

BPNN
The mean model accuracy (OA_m1) and mean validation accuracy (OA_V1) obtained by classifying the reference image with the neural network model (30 replications) are 99.1% and 96.7% (K = 0.94), respectively. In the case of images without atmospheric correction, the algorithm produced accuracies of 99.0% and 96.4% (K = 0.94) for OA_m1 and OA_V1, respectively. The validation accuracy is clearly low when the algorithm encounters very high noise content. The best validation accuracy (OA_V0) of the images to which zero-mean Gaussian noise was added is 65.9% (K = 0.31) at 10 dB, and the classifier was completely unable to classify the construction class until a PSNR of 30 dB was reached. The accuracy rapidly increased from this percentage to over 90% at 25 dB, then increased slightly to 98.5% (K = 0.97) at 40 dB.
The OA_V0 for the classification of images with added speckle noise at 10 dB is 73.8% (K = 0.52), which differs considerably from the lowest accuracy obtained for the zero-mean Gaussian noise; however, the construction class still cannot be classified until a PSNR of 30 dB is reached. Above 25 dB, the classification accuracies are higher for images to which zero-mean Gaussian noise was added than for those with speckle noise. In contrast, the classification accuracies obtained for images with added salt-pepper noise are higher than those for the other types of noise: the mean validation accuracies (OA_V1) exceed 85% (K ≥ 0.75) at 10 dB and are over 90% (K ≥ 0.85) for the remaining PSNR levels. The classification accuracies show that BPNN is affected by all types of noise as the noise intensity increases. Details of the classification accuracies of the BPNN classifiers are presented in Table 3.

SVM
The mean model accuracy (OA_m1) and mean validation accuracy (OA_V1) obtained from classifying the reference image with the SVM model are as high as the values obtained from non-atmospheric-corrected images, i.e., approximately 99% (K ≥ 0.98). The OA_V0 of the images to which zero-mean Gaussian noise was added is 64.5% (K = 0.23) at 10 dB, and the classifier was completely incapable of classifying the agriculture and construction classes until PSNRs of 15 dB and 25 dB, respectively, were reached. In most cases, the accuracy increases from this percentage to 99.2% (K = 0.99) at 40 dB. The OA_V1 for the classification of images to which speckle noise was added at 10 dB is 74.5% (K = 0.54), which differs by 10 percentage points from the value obtained for zero-mean Gaussian noise; however, it is still not possible to classify the construction class until a PSNR of 30 dB is reached. At high noise content (10-20 dB), the classifications of images to which speckle noise was added yielded higher accuracies than those of the datasets afflicted by zero-mean Gaussian noise. In contrast, the mean validation accuracies are over 90% (K ≥ 0.83) for all classification tasks afflicted by salt-pepper noise. Moreover, the SVM algorithm could model all classes to which this noise was added. Details of the classification accuracies of the SVM classifiers are provided in Table 4.

RF
In the case of RF classifiers, the mean model accuracy (OA_m1) and mean validation accuracy (OA_V1) obtained by classifying the reference image are 99.1% and 98.8% (K = 0.98), respectively. The algorithm produced similar OA_V1 values for the reference image and the image without atmospheric correction, which is consistent with the results obtained from the BPNN and SVM. The increasing trends observed for OA_m1 and OA_V1 for the classification of images to which zero-mean Gaussian and speckle noise were added are similar to those of SVM and BPNN, but the RF classifiers are more effective at high noise intensity. Unexpectedly, the mean validation accuracies reached 93.8% (K ≥ 0.89) for the classification of images afflicted by 10 dB salt-pepper noise. Nevertheless, classifying the construction class, which is also problematic with both BPNN and SVM, remains impossible until PSNRs of 25, 15, and 30 dB are reached with zero-mean Gaussian, salt-pepper, and speckle noise, respectively. Details of the classification accuracies of the RF classifiers are provided in Table 5.

Performance of the Classifiers vs. the Reference and the Non-Atmospheric-Corrected Image
The mean kappa coefficients of the same classifier for the classification of the reference image and the image without atmospheric correction are very similar. For example, the BPNN produced K = 0.94 for both the reference and the non-atmospheric-corrected image, which indicates that the difference between the classification results of the two images is insignificant. This suggests that classifications using the three ML classifiers on data from a single-date Landsat 8 OLI image may not require absolute atmospheric correction. In other words, the Level 1 Landsat products can be used for land cover classification after digital conversion to reflectance. However, absolute atmospheric correction is still required in some applications, such as time-series analysis and the use of multi-sensor image data.
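For readers who wish to reproduce this kind of comparison, the overall accuracy and the kappa coefficient (K) can be derived from a confusion matrix. The following Python sketch shows the standard Cohen's kappa calculation; the function name and the 2 × 2 matrix are hypothetical and are not taken from the results of this study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i assigned to class j.
    """
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (illustrative counts only).
oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(round(oa, 2), round(kappa, 2))  # 0.85 0.7
```

Kappa discounts the agreement expected by chance, which is why a classifier that ignores a small class can still show a fairly high overall accuracy but a much lower K.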

Analysis of Noise and Classifiers
Zero-mean Gaussian noise was used to illustrate the effect of adding random values to the image pixels. At high noise intensity, it has a significant impact on the classification performance of all classifiers and causes the mean validation accuracies to fall below 66% (K ≤ 0.28) (Figure 4). The three classifiers were unable to classify the construction class unless the noise content was decreased to 25 dB (RF and SVM) and 30 dB (BPNN). Moreover, a noise intensity of 10 dB also caused SVM to fail in the classification of the agriculture class, for which BPNN and RF also produced very low accuracies. These phenomena have their origins in the high variance of the noise that was added to the training data. This variance causes additional overlapping of the ranges of values among the categories. Another explanation may be that the addition of noise changed the samples significantly. In other words, increasing the additive (zero-mean Gaussian) noise decreases the representativeness of each category by changing the pattern of the samples in some or all of the layers.

We simulated the occurrence of speckle noise on the reference image by multiplying the pixel values by random values at different PSNRs. It should be noted that, in real situations, speckle noise either exists infrequently or occurs at very low radiance in Landsat 8 OLI data. The effect of speckle noise is quite similar to that of zero-mean Gaussian noise, in that both types of noise decrease the representativeness of the training model of each class. According to the results, the three algorithms can overcome high-intensity speckle noise more effectively than high-intensity zero-mean Gaussian noise. When considering the UA and PA of each class, the commission and omission errors are lower than those obtained for images to which zero-mean Gaussian noise was added. This is attributed to the small change in the product after multiplication by the noise when the original pixel value is low, in contrast to images with zero-mean Gaussian noise, especially in the case of dark objects such as water bodies and mountain shadows (Figure 3).
Salt and pepper noise takes the minimum and maximum of the image pixel values (here, 0 and 1) and randomly replaces the original pixels with them. The replaced pixels can be seen as data loss or missing data in the case of salt, and as black dots in the case of pepper, which can sometimes be removed from the image by using a mask. However, if the reference data points were located among the pixels removed by the mask, this would be problematic, because valuable ground truth data points would be lost. In this study, the selected ML algorithms proved to be robust to this kind of noise. Figure 4 clearly shows that the mean validation accuracies (OA_V1) of all classifiers are higher than 85% (K ≥ 0.76). RF outperforms the other classifiers and has an OA_V1 value of 93.8% (K = 0.89) at the highest noise intensity used in this study (10 dB).
The question may arise as to why salt-pepper noise had a lesser effect than the other types of noise in the classification using these classifiers. This can be explained by the characteristics of the noise. The noise is not applied to all image pixels; instead, it randomly "replaces" the target pixel with certain values. If the set of samples for a training class is not completely replaced by the noise, the remaining unchanged values can still be used to represent the pattern of the class. In other words, we have seven layers for the training samples, and if only two of these are changed (whether to the same value or to different values), the remaining five layers, which are strongly representative of the class, are still available for classification. This is a unique characteristic of salt-pepper noise and distinguishes it from speckle and zero-mean Gaussian noise.
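The argument above can be illustrated with a small Python simulation that counts how many of the seven band values of a pixel survive salt-pepper replacement. This is a sketch only; the noise density of 0.3, the pixel values, and the function names are hypothetical and are not the settings used in this study.

```python
import random


def salt_pepper_pixel(bands, density, rng):
    """Replace each band value independently with 0 or 1 with probability `density`."""
    noisy = []
    for value in bands:
        if rng.random() < density:
            noisy.append(1.0 if rng.random() < 0.5 else 0.0)  # salt or pepper
        else:
            noisy.append(value)  # band value left untouched
    return noisy


def mean_unchanged_bands(bands, density, trials, rng):
    """Average number of band values left untouched over many noisy draws."""
    total = 0
    for _ in range(trials):
        noisy = salt_pepper_pixel(bands, density, rng)
        total += sum(1 for a, b in zip(bands, noisy) if a == b)
    return total / trials


# Hypothetical seven-band pixel (values strictly between 0 and 1).
pixel = [0.05, 0.06, 0.08, 0.07, 0.30, 0.20, 0.10]
rng = random.Random(0)
# Expected to be close to 7 * (1 - 0.3) = 4.9 unchanged bands.
print(mean_unchanged_bands(pixel, density=0.3, trials=10000, rng=rng))
```

With additive or multiplicative noise, by contrast, every band value of every pixel is altered, so no band retains its original value.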
The algorithm used to train the classifiers also plays an important role in the classification. The black box of the basic BPNN returns a final model that consists of two main sets of parameters: biases and weights. In this sense, the final model truly represents all of the training samples. This is why the BPNN models trained on noisy data provided lower accuracies than the others. Normally, we can improve BPNN accuracy by reducing the noise in the data or by including more representative training samples in the BPNN learning process. SVM separates binary classes using the optimal hyperplane. The final hyperplane is generated by a few support vectors near the boundary area of the two classes. In this investigation of the effect of noise, the SVM is significantly more robust, because the SVM's parameter selection determines only the optimal support vectors used to construct the final hyperplanes separating each of the categories. This means that the outliers or noisy data were mostly excluded from hyperplane construction. Moreover, for the construction class, which has a very small training size disturbed by high noise intensity, SVM seems to be more effective than the other classifiers. In this study, RF outperformed all the classifiers for the classification of images to which salt-pepper noise was added. This is because the RF randomly generates a large number of decision trees from the training data to produce a forest, after which the unknown test samples are classified by a vote of all trees in the final step. This means that anomalous values such as noise may not gain the majority vote of the trees.
In a classification using supervised ML algorithms, the representativeness and the number of images used as training data should be of greater concern when the satellite image is afflicted by noise. Considering the training data in Table 1, the reference samples of each class are imbalanced because the percentage of each class in the image is not equal. A rough count of the study area, shown in Figure 2, indicates that the water body and forest classes account for approximately 60% of the image and that the remaining 40% consists of pixels in the bare land, agriculture, and construction classes. An examination of the classification results revealed that the construction class always has the lowest accuracy for every classifier. This may come from one or both of the following: (case 1) the construction class contains too few samples to represent the class characteristics in the presence of noise; (case 2) the remotely sensed data are limited in their ability to represent the class at the given pixel resolution. In this study, based on the results, we can strongly assume that the misclassified samples cannot be explained by case 2, because the accuracy of the reference image classification confirms that the construction class can be properly classified. The classification accuracy of the water body and forest classes was mostly high despite the high noise content, because these classes have sufficient training samples to expose the class characteristics even when they are afflicted by noise.
In our experience, supervised classifiers, especially ML classifiers, are sensitive to the structure or pattern of the training samples because they learn from these data. For example, two equally sized agricultural areas of a crop with different planting densities (one sparse, the other dense) can generally be seen as different classes in medium-resolution images such as a Landsat image, due to the effect of the soil background. The user needs to use this information to train the classifier to produce a reasonable and accurate result. The case of noise-afflicted data is similar. If it is not possible to prevent or eliminate noise in the data, a more effective approach would be to teach the classifier to understand the noise patterns by increasing the number of training samples or by using training data dispersed across the area.
The experiment in this study was carried out on the assumption that the same noise variance occurred in every layer, whereas real noise varies: it may be found only in some bands or as a mixture of different types and intensities of noise. We bridged this gap by expressing the simulated noise in terms of PSNR, a measure by which real noise can also be quantified. However, the intensity of real noise is normally not as high as the noise levels we investigated. The experimental design aimed to determine the effect of noise and to test the performance of the ML algorithms. Consequently, a wide range of noise levels was used to capture the behavior and effects of noise.

Advanced Extension of the MLs
Recent decades have seen various extensions of the BPNN (e.g., [54-56]), SVM (e.g., [56,57]), and RF (e.g., [56,58]) algorithms that were developed to overcome their drawbacks and that, we believe, can perform more effectively than the basic versions of these algorithms. We suggest that readers who require very high accuracy use an extended version or a noise filter [59-61]. In general, the application of advanced MLs depends greatly on the related theory or technique. For example, fuzzy set concepts [62] are usually attached to basic MLs to make the decision space in categorization more flexible (fuzzy random forest [63]; fuzzy nonlinear proximal SVM [64]; fuzzy-NN [65]). Markov-systems-based techniques also belong to the leading trend of increasing the classification accuracy (Markov-random-field-based SVM [57]; MLP-Markov chain models [66]). Besides the abovementioned extensions, various combinable algorithms exist in the classification field, such as the particle filter [35] and the genetic algorithm.
Although the robustness of the basic BPNN towards noise observed in this study was lower than that of RF and SVM, the new-generation neural networks, especially deep learning, are outstanding among advanced learning algorithms [67]. Deep learning algorithms automatically extract features and abstractions from the underlying data. Numerous reports in the literature have demonstrated that data representations obtained from a deep learning model often yield better results, e.g., improved classification modeling. For example, Hu and Yi (2016) obtained a reasonable result for ground point extraction (for digital terrain modeling) by employing deep convolutional neural networks using 17 million labeled airborne laser scanning (ALS) points [55]. Moreover, Chen et al. (2014) developed a hybrid framework of principal component analysis (PCA), deep learning architecture, and logistic regression for accurate hyperspectral data classification that yielded higher accuracy than RBF-SVM [68]. These examples demonstrate the performance of deep learning on complicated tasks when a large number of training samples is available.
Nevertheless, the majority of users still use the original algorithms without any complicated extensions. Our work is intended not only to assist private users with the selection of appropriate classifiers for their work, but also to transfer an understanding of remote sensing data to algorithm developers. Future work on other ML classifiers, especially on new algorithms invented in the last decade, would be necessary to compare them with the existing algorithms.

Conclusions
In this study, we investigated the effects of adding simulated zero-mean Gaussian, salt-pepper, and speckle noise on image classification performance. Three well-known machine-learning classifiers, i.e., back-propagation neural network (BPNN), support vector machine (SVM), and random forest (RF), with the most basic parameter settings, were used for the classifications. A Landsat 8 OLI scene of Srinakarin dam, Kanchanaburi Province, Thailand, acquired under cloud-free conditions with absolute atmospheric correction, was used as reference data for noise affliction. We specified the same number of usable training data for all classifiers. The experiment started by classifying the reference image with the classifiers, after which the results were set as the baseline accuracy for interpretation. The reference image was then contaminated by different levels of noise at peak signal-to-noise ratios (PSNRs) ranging from 10 to 40 dB (in increments of 5 dB). Finally, 21 noisy images were obtained and separately classified by the classifiers. As expected, the lower the PSNR (i.e., the higher the noise intensity), the lower the accuracy. All classifiers provided accuracies over 96% for the reference image and the image without atmospheric correction. This means that absolute atmospheric correction in single-date classification is unnecessary for images acquired under clear weather conditions. Our experiment suggests that increasing the number of training samples and collecting samples dispersed in various patterns across the study area improves the representativeness of the class when the data are afflicted by high-intensity noise. More specifically, in terms of the performance of the classifiers, the BPNN models provided lower accuracies than the others at very high noise intensity, because the algorithm strongly depends on the values of the training samples. According to the results, SVM often provides a high accuracy because SVM parameter selection can be used to ensure that only optimal support vectors are found to construct the final hyperplanes separating each of the categories, and the minor fluctuations caused by noise were mostly excluded from model construction. RF outperformed all of the classifiers in the classification of images with added salt-pepper noise. However, although in this study the three ML classifiers proved to be very robust to low-to-medium noise intensity (PSNR > 20 dB) without using any filter or extension, we suggest taking advantage of a noise filter or the advanced extensions of these ML classifiers, especially for cases requiring very high accuracy (almost 100%).

Figure 2. True color image of the study area.
ISPRS Int. J. Geo-Inf. 2018, 7.

Figure 4. Mean validation accuracy trends of (a) neural network, (b) support vector machine, and (c) random forest classifier in relation to different noise levels and types.

Table 1. Number of samples of modeling and application testing datasets.
* Modeling data (M) and application testing (A) data.

Table 2. Parameter tuning of all classifiers.

Table 3. Classification accuracies of the BPNN classifiers.

Table 4. Classification accuracies of the SVM classifiers.

Table 5. Classification accuracies of the RF classifiers.