A Noise-Resilient Online Learning Algorithm for Scene Classification

Abstract: The proliferation of remote sensing imagery motivates a surge of research interest in image processing tasks such as feature extraction and scene recognition. Among them, scene recognition (classification) is a typical learning task that focuses on exploiting annotated images to infer the category of an unlabeled image. Existing scene classification algorithms predominantly focus on static data and are designed to learn discriminant information from clean data. They, however, suffer from two major shortcomings: noisy labels may negatively affect the learning procedure, and learning from scratch may incur a huge computational burden. Thus, they are not able to handle large-scale remote sensing images, in terms of either recognition accuracy or computational cost. To address this problem, in this paper, we propose a noise-resilient online classification algorithm that is scalable and robust to noisy labels. Specifically, the ramp loss is employed as the loss function to alleviate the negative effect of noisy labels, and we iteratively optimize the decision function in a Reproducing Kernel Hilbert Space under the framework of Online Gradient Descent (OGD). Experiments on both synthetic and real-world data sets demonstrate that the proposed noise-resilient online classification algorithm is more robust and sparser than state-of-the-art online classification algorithms.


Introduction
Due to the rapid development of sensor and aerospace technology, more and more high-resolution images are available [1][2][3][4][5][6]. Remote sensing images enable us to measure the Earth's surface with detailed structure and have been extensively used in many applications such as military reconnaissance, agriculture, and environmental monitoring [7]. Hence, remote sensing images are an important kind of data source [8]. The proliferation of remote sensing imagery motivates numerous image learning tasks such as representation learning [9][10][11] and, further, scene recognition (classification) [1,[12][13][14][15][16]. Among these tasks, scene classification aims to automatically assign a semantic label to each image so as to determine which category it belongs to. As scene classification can provide a relatively high-level interpretation of images, it has received growing attention, and much exciting progress has been reported in recent years. However, two major challenges seriously limit the development of scene classification.
- Lacking Noise-Resilient Scene Classification Algorithms: images' categories are often annotated by human beings, and it is natural for us to make some incorrect annotations, especially when we are provided with massive numbers of images. In addition, an image may cover several semantics. For example, the images in Figure 1 can be annotated with the scene of river or forest, but, under the framework of multi-classification, only one category is assigned to each image. Thus, noisy labels are often inevitable in scene classification, and it is necessary to devise a scene classification algorithm that is robust to them.
- Lacking Online Scene Classification Algorithms: a vast majority of existing scene classification algorithms focus on the static setting and require access to the whole image data set. However, with the constant improvement of satellite and aerospace technology, a large number of images become available continuously in a streaming fashion. The requirement to have all of the training data prior to training poses a serious constraint on the application of traditional scene classification algorithms based on batch learning techniques. To this end, it is necessary and of vital importance to perform online scene classification that adapts to the streaming data accordingly.
To tackle the above challenges, in this paper, we propose a noise-resilient online multi-classification algorithm to promote scene classification of remote sensing images. Specifically, we generalize the ramp loss designed for batch learning algorithms, e.g., the ramp Support Vector Machine (ramp-SVM), to the online learning setting and employ the Online Gradient Descent (OGD) algorithm to optimize the decision function in a Reproducing Kernel Hilbert Space. To effectively reduce the impact of noise, an adjustment strategy is given in the proposed algorithm to dynamically control the threshold parameter s in the ramp loss. Large-scale examples are assumed to arrive consecutively, one by one, without any pre-labeled training set for the initialization of the classifier. In the online learning procedure, as shown in Figure 2, the parameters of the predictor (classifier) are updated in an iterative manner with sequentially incorporated examples. The noise-resilient online multi-classification algorithm we propose in this paper has two major merits:

- Noise-Resilient: with the dynamic setting of the threshold parameter s, noise that would lead to a large loss (larger than the threshold parameter) is identified and won't be incorporated into the Support Vector (SV) set.
- Sparsity: as can be seen from Figure 3, only a fraction of examples (with the loss between s and 1) serve as Support Vectors (SVs). This is designed to reduce the computational cost and enjoys an excellent scalability property.
The remainder of the paper is organized as follows: Section 2 reviews related work on scene classification and online learning. Section 3 introduces the proposed noise-resilient online learning algorithm for scene classification. Section 4 presents experimental results with discussions. Section 5 concludes the paper.

Related Work
In this section, we review the related work from two aspects: scene classification and online learning. Scene classification is a fundamental task in the remote sensing image analysis field; its core aim is to identify the land-cover categories of remotely sensed image patches. Numerous feature learning algorithms have been presented for scene classification. In earlier years, feature extraction focused on hand-crafted features such as GIST (which represents the dominant spatial structure of a scene by a set of perceptual dimensions based on the spatial envelope model) [17] and the BoVW (bag-of-visual-words) model [11], while more recent approaches such as VGGNet [18] learn features with deep convolutional neural networks. Lately, data-driven features have been developed via unsupervised feature learning algorithms [1,12,19]. For example, Zhang et al. propose a saliency-guided unsupervised feature learning approach based on an auto-encoder. Romero et al. introduce the highly efficient enforcing lifetime and population sparsity (EPLS) algorithm into the auto-encoder to improve classification performance [19]. A multiple-feature-based remote sensing image retrieval approach was proposed in [12] by combining hand-crafted features and data-driven features obtained via unsupervised feature learning. In addition, an incremental Bayesian approach has also been presented in image processing to learn generative visual models from a few training examples [20].
Instead of training the classifier again from scratch on the combined training set, an online learning algorithm incrementally updates the classifier to incorporate new examples into the learning model [21]. In this way, online learning can significantly save computation costs and is more suitable for large scale problems. In recent years, online learning has been extensively studied in the machine learning community. For example, Song et al. propose an incremental online algorithm to dynamically update the LS-SVM model when a new chunk of samples is incorporated into the SV set [22]. Hu et al. use an incremental online variant of the nearest class mean classifier and update the class means incrementally [23]. A novel online universal classifier capable of performing multi-classification is proposed in [24]. In order to solve cost-sensitive classification tasks on the fly, some novel online learning algorithms have been proposed [25] to directly optimize different cost-sensitive metrics.

Method
In this section, we propose a noise-resilient online learning algorithm for scene classification of remote sensing images. A vast majority of existing online classification algorithms are mainly designed to learn discriminant information from clean data. However, in the scenario of scene classification, the labels of some images could be noisy or even erroneous, mainly because of the imperfect human labeling process and the inherently multi-label nature of images (as shown in Figure 1). To enable online classification on streaming remote sensing images and to alleviate the negative impact of noisy labels, we generalize the ramp loss designed for the batch learning setting, i.e., ramp-SVM, to the online learning setting. We then propose a novel strategy to dynamically adjust the ramp loss parameter s.

Ramp Loss
In the case of pattern recognition, one argument is that the misclassification rate is poorly approximated by convex losses such as the hinge loss or the least squares loss. Researchers have therefore proposed non-convex alternatives, such as the hard-margin loss, the Laplace error penalty [26], the normalized sigmoid loss [27], the ψ-learning loss [28], and the ramp loss [29]. Among these non-convex losses, the ramp loss, also called the truncated hinge loss [30], is an attractive one. The merits of the ramp loss proposed by Collobert et al. are twofold: scalability and noise resilience [29].
Steinwart shows that in classical SVMs and their online version Pegasos, the number of SVs, i.e., n, increases linearly with the number of training examples N [31]. More specifically, n/N → 2B_Φ, where B_Φ is the best possible error achievable in the chosen feature space Φ(·). Since SVM training and recognition times grow quickly with the number of SVs, it is clear that SVMs cannot deal with large scale data. The curse can be exorcised by replacing the classical hinge loss with a non-convex loss function, e.g., the ramp loss. As shown in Figure 3, replacing the hinge loss h(w) by the ramp loss ℓ(w) guarantees that examples with score w < s won't be selected as SVs. The increased sparsity leads to better scaling properties for ramp-SVMs. Using the ramp loss, Collobert et al. obtained the ramp loss support vector machine (ramp-SVM) [29].
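The ramp loss can be written as the difference of two hinge losses, H_1(z) − H_s(z) with H_a(z) = max(0, a − z). A minimal sketch of this relationship (our own illustration, not the authors' code) showing how the ramp loss agrees with the hinge loss near the margin but caps the penalty of outliers:

```python
import numpy as np

def hinge_loss(z):
    # Classical hinge loss h(z) = max(0, 1 - z): unbounded, so a badly
    # mislabeled example (very negative margin z) incurs a huge penalty.
    return np.maximum(0.0, 1.0 - z)

def ramp_loss(z, s=-1.0):
    # Ramp loss = H_1(z) - H_s(z): identical to hinge on [s, 1] but
    # capped at 1 - s for z < s, so outliers have bounded loss and
    # zero gradient there (they never become SVs).
    return np.minimum(1.0 - s, np.maximum(0.0, 1.0 - z))

margins = np.array([2.0, 0.5, -1.0, -5.0])  # z = y * f(x)
print(hinge_loss(margins))  # the last entry grows without bound
print(ramp_loss(margins))   # the last entry is capped at 1 - s = 2
```

Only examples whose margin falls in the sloped region [s, 1] contribute a nonzero gradient, which is exactly the sparsity mechanism described above.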
In addition, robustness to noise is always an important issue in classification methodologies. The effect of noisy samples can be significantly large because the penalty the hinge loss gives to outliers is huge; in fact, any convex loss is unbounded. Under the ramp loss, the loss of any example has an upper bound, so the effect of noisy samples is controlled and the influence of noise is removed. The plots of the hinge loss and the ramp loss in Figure 3 illustrate the robustness (noise resilience) of the ramp loss.
These sparsity and noise-resilience arguments motivate using the ramp loss as the loss function and building on it an online learning algorithm for large scale scene classification problems.

Online Learning Algorithm
One of the most common and well-studied tasks in data mining and knowledge discovery is classification. Over the past decades, a great deal of research has been performed on inductive learning methods for classification, such as decision trees, artificial neural networks, and support vector machines. All of these techniques have been successfully applied to a great number of real-world problems. However, their standard application requires the availability of all of the training data at once, making their use problematic for large-scale data mining applications and mining tasks on streaming data [32].
In recent years, a great deal of attention in the machine learning community has been directed toward online learning methods (shown in Figure 4), such as the Forgetron [33], the online Passive-Aggressive algorithm [34], the Projectron [35], the bounded online gradient descent algorithm [36], and the online soft-margin kernel learning algorithm [37], to name a few. However, these online learning algorithms assume clean data; there are comparably few studies on online learning from noisy examples and, in particular, from noisy labels. Online learning is performed in a sequence of consecutive rounds. At each round t, the online learner picks a predictor f to make the prediction f(x_t). When the true label y_t is revealed, the online learner suffers an instantaneous loss ℓ(f; x_t, y_t) and updates the predictor for the next prediction.
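The round-based protocol just described can be sketched with a minimal linear learner standing in for the predictor (the perceptron-style update here is only an illustrative placeholder, not the kernel update developed later in this paper):

```python
import numpy as np

class LinearPredictor:
    """Minimal linear predictor used only to illustrate the protocol."""
    def __init__(self, d):
        self.w = np.zeros(d)

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

def online_learn(stream, learner):
    """One pass over the stream: at each round the learner first commits
    to a prediction, then the true label y_t is revealed, an instantaneous
    (here 0-1) loss is suffered, and the predictor is updated for the
    next round."""
    mistakes = 0
    for x_t, y_t in stream:
        y_hat = learner.predict(x_t)   # predict before seeing y_t
        if y_hat != y_t:               # suffer the 0-1 loss
            mistakes += 1
            learner.w += y_t * x_t     # perceptron-style update
    return learner, mistakes
```

On the toy two-point stream [([1,1], +1), ([-1,-1], −1)] the learner makes a single mistake before its weights separate the two classes.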

In this work, we extend online learning to the scenario of examples with noisy labels, which are not uncommon in scene categorization. Based on the kernel trick and the ramp loss, a sparse and noise-resilient multi-classification algorithm is proposed for the scene categorization problem.

Noise-Resilient Online Multi-Classification Algorithm
In this subsection, we introduce a sparse and robust online learning algorithm to perform the scene classification task when images' labels are noisy or even erroneous. Specifically, we first introduce the proposed online learning algorithm for the binary classification problem, and then present the general formulation to tackle multi-classification problems.
For simplicity, we begin with the binary classification problem. In this scenario, the goal is to learn a series of nonlinear mapping functions f^(t): R^d → R based on a sequence of examples {(x_i, y_i)}_{i=1}^t, where t is the current time stamp, d stands for the number of features, x_i ∈ R^d denotes the feature vector of a remote sensing image, and y_i ∈ {+1, −1} is the scene category of the image. Suppose that images arrive continuously in a streaming fashion and the online classification algorithm makes predictions in a sequential way. Specifically, each time an image arrives, we first apply a feature representation algorithm, e.g., Vector of Locally Aggregated Descriptors (VLAD) [38], Visual Geometry Group descriptors (VGG) [18], Scale Invariant Feature Transform (SIFT) [39], or the Spatial Envelope Model (GIST) [17], to obtain its representation vector x_t, and then predict its label as ŷ_t = sign(f^(t−1)(x_t)) using the latest decision function f^(t−1). After the true label y_t is revealed, the algorithm suffers an instantaneous loss, with the loss function specified as the ramp loss. In addition, the online classification algorithm updates the classifier by incorporating the new sample (x_t, y_t) for the next round. Here, we assume that the nonlinear mapping f belongs to a Reproducing Kernel Hilbert Space (RKHS). (Given a nonempty set X and a Hilbert space H of functions f: X → R, H is an RKHS [40] endowed with a kernel function k: X × X → R if k has the reproducing property f(x) = ⟨f, k(x, ·)⟩_H for all f ∈ H and x ∈ X, and k is called the reproducing kernel for H.)
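Concretely, a function in the RKHS maintained by such an online algorithm is a kernel expansion over the stored support vectors. A sketch (function and variable names are our own, for illustration only):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # A reproducing kernel: k(x, z) = exp(-gamma * ||x - z||^2).
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def decision_function(coefs, svs, x, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * k(x_i, x): by the reproducing property,
    evaluating f at x is the inner product <f, k(x, .)> in the RKHS,
    so only the coefficients and support vectors need to be stored."""
    return sum(a * kernel(sv, x) for a, sv in zip(coefs, svs))
```

The fewer support vectors are kept, the cheaper each evaluation of f(x) becomes, which is why the sparsity of the SV set governs the scalability of kernel-based online learners.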
Similar to standard SVMs, our algorithm tries to find the optimal decision function f^(t) by optimizing the regularized loss function of the examples {(x_i, y_i)}_{i=1}^t, given in Equation (2). Note that Equation (2) is not a convex optimization problem, but it can be formulated as a Difference of Convex (DC) program, and the Concave-Convex Procedure (CCCP) [41] may be applied to obtain the optimal solution. However, CCCP falls into the category of batch learning algorithms and cannot meet the real-time requirement when dealing with streaming data. In the current work, we employ the well-known Online Gradient Descent (OGD) [42] framework, Equation (3), to find a near-optimal solution; this is a trade-off between accuracy and scalability. In Equation (3), ∂ℓ_t stands for the Gâteaux derivative of the ramp loss ℓ_t, from which we can deduce z_t as in Equation (4). Substituting the gradient of Equation (4) into Equation (3), we deduce the update rule for f^(t), Equation (5).

Now, we extend the sparse and noise-resilient online classification algorithm to the multi-classification problem. Assume that there is a sequence of examples {(x_i, y_i)}_{i=1}^t, where x_i ∈ R^d is the feature representation of the i-th image and y_i is the corresponding label, belonging to a label set Y = {1, ⋯, c}. Similar to the multi-class SVM formulation proposed by Crammer and Singer [43], the multi-class model is defined in Equation (6), where f_k is the predictor associated with the k-th class and f is the c-dimensional vector with f_k as its k-th component, i.e., f = [f_1, ⋯, f_c]. Similar to the aforementioned binary classification problem, the multi-class online learning algorithm receives examples in sequential order and updates f continuously. In particular, when we receive a new image x_t, our algorithm predicts the label ŷ_t according to Equation (6). After the prediction, our algorithm receives the true label y_t. The instantaneous loss, specified by the ramp loss in the multi-class scenario, is defined in Equation (7), with the notation r = arg max_{k ∈ Y, k ≠ y_t} f_k(x_t). Given f^(t−1) and (x_t, y_t), we list the update rule for the decision function f according to the deduced OGD framework, Equation (5), as Equation (8). When the margin falls on the flat part of the ramp loss, the instantaneous loss ℓ_t is constant, the gradient is zero, and the decision function is not updated; otherwise, substituting the gradient of Equation (9) into Equation (8), we get the update rule for f^(t), Equation (10). One should note that the update rule in the binary classification case can also be formulated within the multi-classification framework by an appropriate replacement of f.

As shown in Equation (4), there is a noise-resilience parameter s, ranging over (−∞, 1], in the proposed noise-resilient online learning algorithm. The smaller the parameter s, the closer the proposed algorithm is to the classical Pegasos algorithm proposed in [44]. Meanwhile, when the parameter is set to 1, the proposed algorithm won't learn from any example and never updates the classifier. It is therefore essential to provide a parameter setting strategy that adjusts the ramp loss parameter s adaptively. In the current work, we set the parameter as in Equation (11), illustrated in Figure 5; in Equation (11), c stands for the number of categories and n is an estimate of the number of examples. We summarize the proposed noise-resilient online multi-classification algorithm in Algorithm 1.
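To make the update concrete, here is a compact sketch of the binary version of the algorithm: a Pegasos-style kernel OGD step whose gradient is gated by the ramp loss. This is our own reconstruction with a FIXED parameter s (the adaptive rule of Equation (11) is not reproduced), and all names are ours rather than the authors' code:

```python
import numpy as np

def rbf(x, z, gamma):
    return np.exp(-gamma * np.sum((x - z) ** 2))

class RampOnlineBinary:
    """Sketch of a binary noise-resilient online learner with ramp loss."""
    def __init__(self, gamma=1.0, lam=1e-4, s=-1.0):
        self.gamma, self.lam, self.s = gamma, lam, s
        self.svs, self.coefs = [], []   # support vectors and their weights
        self.t = 0

    def f(self, x):
        # Kernel expansion over the stored SVs.
        return sum(c * rbf(sv, x, self.gamma)
                   for sv, c in zip(self.svs, self.coefs))

    def fit_one(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)   # standard decaying OGD step size
        z = y * self.f(x)                 # margin of the incoming example
        # Regularization gradient: shrink all existing coefficients.
        self.coefs = [(1 - eta * self.lam) * c for c in self.coefs]
        if self.s <= z < 1:               # sloped region of the ramp loss
            self.svs.append(np.asarray(x, dtype=float))
            self.coefs.append(eta * y)    # the example becomes an SV
        # z < s: flat region -> likely a noisy label, no update;
        # z >= 1: zero loss -> no update.

    def predict(self, x):
        return 1 if self.f(x) >= 0 else -1
```

On a toy separable stream the learner stabilizes after a handful of rounds, and a grossly mislabeled example (margin below s) is ignored rather than stored as a support vector, which is the noise-resilience mechanism described above.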

Experiments
In this section, we conduct experiments to evaluate the performance of the proposed noise-resilient online multi-classification algorithm. All experiments are performed in a MATLAB 7.14 environment on a PC with a 3.4 GHz Intel Core i5 processor and 8 GB RAM running the Windows 10 operating system. The source code of the proposed algorithm will be made available upon acceptance of the manuscript. First, we perform a parameter sensitivity study to show how the ramp loss parameter s affects the classification results. Then, we present an experiment on synthetic data sets to show the efficacy and efficiency of the proposed method under noisy labels. Finally, we conduct extensive experiments to evaluate the performance of the proposed algorithm on different remote sensing image classification tasks.

Parameter Sensitivity Study
There is an important hyperparameter in the proposed online classification algorithm: the ramp loss parameter s. The parameter s controls the sparsity and noise-resilience level of the proposed model. The bigger the parameter s, the sparser the proposed algorithm is, and the fewer noisy examples will be incorporated into the learning model. However, a bigger parameter s will also discard more informative examples and thereby hurt classification efficacy. To study how this parameter affects the classification result, we conduct an experiment on synthetic data sets. Specifically, we derive a set of synthetic data sets from a real-world data set, i.e., Adult (http://archive.ics.uci.edu/ml/datasets/Adult), which consists of 7579 negative samples and 2372 positive samples, by adding some random noise to the labels. To simulate the case of noisy labels, we randomly change some entries in the label vector y. The percentage of changed labels is varied among {5%, 10%, 15%, 20%}. In this way, we generate synthetic data sets with signal-to-noise ratios (SNR) of 95:5, 90:10, 85:15, and 80:20, respectively. In this study, we tune the parameter s over {−0.5, −1, −1.5, −2, −2.5, −3} and draw a 2D performance variation (Average Classification Accuracy, ACA %) figure w.r.t. the different settings of s in Figure 6. We make the following observations from Figure 6:

- At the beginning of the online learning process, a bigger s always outperforms smaller ones.
- On the whole, the higher the noise level is, the worse the performance of the algorithm will be. At a fixed noise level, e.g., SNR 90:10, a smaller s will incorporate more SVs into the classifier; some of them are useful examples and the others are noisy ones. Thus, a proper setting of s is the key problem for the proposed noise-resilient online classification algorithm.
- The proposed algorithm is sensitive to the ramp loss parameter s. In this study, s = −1.5 gives the overall best performance, and s = −3 the worst. No fixed setting of s outperforms the others in all four situations.
Regarding this, we propose an adaptive parameter setting strategy in Equation (11) to adjust s dynamically and investigate its performance in the next subsection.
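The noisy synthetic data sets used above can be generated by flipping a fixed percentage of binary labels; a short sketch (the helper name and seeding choice are ours):

```python
import numpy as np

def flip_labels(y, pct, seed=0):
    """Simulate noisy annotations: flip `pct` (a fraction, e.g. 0.10 for
    SNR 90:10) of the +1/-1 labels in y, chosen uniformly at random
    without replacement."""
    rng = np.random.default_rng(seed)
    y_noisy = np.array(y, dtype=int, copy=True)
    n_flip = int(round(pct * len(y_noisy)))
    idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[idx] = -y_noisy[idx]
    return y_noisy
```

Because exactly pct·N entries are flipped, the resulting label vector has the stated signal-to-noise ratio by construction.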

Synthesis Data Sets
We investigate the proposed noise-resilient online classification algorithm on synthetic data sets when label information is noisy. Specifically, we attempt to answer the following two questions:

- Sparsity: How sparse is the proposed online classification algorithm for streaming data?
- Noise-Resilience: How effective is the proposed online classification algorithm for data with noisy labels?
Our algorithm incorporating the adaptive parameter s is denoted as Ramp_adaptive; meanwhile, the algorithm with a fixed parameter setting is denoted as Ramp. We compare the proposed Ramp_adaptive and Ramp with the following widely used online learning algorithms:

1. OSELM: Online Sequential ELM (OSELM) is an online version of the ELM algorithm [45]. Using the Sherman-Morrison-Woodbury (SMW) formula, OSELM can update the prediction model extremely fast [46].

2. Pegasos: It is an online multi-class SVM algorithm based on stochastic gradient descent (SGD) [44].
3. Perceptron: It is a typical online learning algorithm that belongs to the Perceptron algorithm family [47].
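The fast SMW-based update that makes OSELM (item 1 above) efficient can be sketched as a recursive least squares step on the hidden-layer outputs. This is the standard OS-ELM recursion, written in our own names as a sketch rather than the code used in [45,46]:

```python
import numpy as np

def oselm_update(P, beta, H, T):
    """One OS-ELM chunk update via the Sherman-Morrison-Woodbury formula:
    given the current inverse covariance P = (H_old^T H_old)^-1 and output
    weights beta, fold in a new chunk of hidden-layer outputs H with
    targets T without refactorizing from scratch."""
    S = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)  # chunk-sized system
    P_new = P - P @ H.T @ S @ H @ P                      # SMW identity
    beta_new = beta + P_new @ H.T @ (T - H @ beta)       # correct by residual
    return P_new, beta_new
```

After each chunk, beta coincides with the batch least squares solution on all data seen so far, while the matrix inverted per step is only chunk-sized, which is why OSELM updates are so cheap.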
For OSELM, we specify the sigmoid function G(a, b, x) = 1/(1 + exp(−(a^T x + b))) as the activation function and set the number of hidden neurons to 50. We use the default parameter settings for Pegasos and Perceptron: the regularization parameter λ is set to 10^−4 for Pegasos, and the learning rate η is set to 1 for Perceptron. In Ramp, the parameter s is set to −1. In the current experiments, the RBF kernel k(x, z) = exp(−γ‖x − z‖^2) is selected as the kernel function for the kernel based learning algorithms, with the kernel parameter γ set to 1/d, where d is the number of features. To simulate a large scale scenario, we repeat the Adult data five times. In addition, we randomly change some entries in the label vector with percentages of {5%, 10%, 15%, 20%}. The classification performances of the different algorithms are shown in Figure 7. Figure 7a,b show that the average classification performance of Ramp_adaptive is comparable to OSELM, and both outperform the other algorithms. In Figure 7c,d, Ramp_adaptive outperforms all of the other algorithms. This indicates that Ramp_adaptive is indeed a noise-resilient algorithm that is able to mine discriminative information when the labels contain explicit noise.
To further investigate the superiority of the proposed algorithm in terms of sparsity and efficiency, we compare the number of SVs and the speedup rate of the proposed online learning algorithm against the state-of-the-art online learning algorithms, i.e., Pegasos and Perceptron. One should note that OSELM incrementally incorporates examples to update the learning model; as all of the learning examples serve as SVs, OSELM does not belong to the family of sparse learning algorithms, so we did not investigate its sparsity and efficiency here. In Table 1, we show the number of support vectors (SVs) and the running time each method needs to perform online classification. Between the two proposed noise-resilient online classification algorithms, Ramp_adaptive uses fewer SVs than Ramp and costs only half of the running time of Ramp on the four data sets. The proposed Ramp_adaptive achieves about 2.7×, 3.3×, 3.8×, and 5.7× sparsity in the cases of SNR 95:5, SNR 90:10, SNR 85:15, and SNR 80:20, respectively. As for running time, Ramp_adaptive achieves about 3.5×, 5.2×, 5.8×, and 7.5× speedup in the same cases. In a nutshell, the proposed online classification algorithm Ramp_adaptive is the most suitable to scale up among the online kernel learning algorithms.

In the following experiments on real-world remote sensing data, we randomly select 50% of the images from each class to form the training set, and the remaining 50% of the images are used for testing. This procedure is repeated five times and the average performance is reported. We use the GIST descriptor (http://people.csail.mit.edu/torralba/code/spatialenvelope/) to transform each image into a feature vector of 512 dimensions. For OSELM, we conduct comparison experiments on the AID7 and Outdoor Scene data sets to check whether different settings for OSELM change the results significantly. We specify the activation function to be sigmoid, sin, rbf, and hardlim, respectively, and set the number of hidden nodes to 50, 100, 200, and 300, respectively. These comparison experiments show that the sigmoid function with 200 hidden neurons is a good candidate for OSELM without prior knowledge; thus, without loss of generality, we specify the sigmoid function as the activation function and set the number of hidden neurons to 200. For the kernel based learning algorithms, the polynomial kernel k(x, z) = (γx^T z + c_0)^p is selected as the kernel function since it is extensively used for image processing. Here, we set γ to 1/d, where d is the number of features, c_0 to 0, and the polynomial order p to 1.
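With the stated settings (γ = 1/d, c_0 = 0, p = 1), the polynomial kernel reduces to a scaled linear kernel; a minimal sketch (helper name is ours):

```python
import numpy as np

def poly_kernel(x, z, c0=0.0, p=1):
    """Polynomial kernel k(x, z) = (gamma * x^T z + c0)^p with
    gamma = 1/d as in the experiments (d = number of features).
    With c0 = 0 and p = 1 this is just x^T z / d."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    gamma = 1.0 / x.shape[0]
    return (gamma * np.dot(x, z) + c0) ** p
```

Scaling by 1/d keeps kernel values in a comparable range regardless of the feature dimensionality, which avoids retuning the kernel for descriptors of different lengths.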
First of all, we show the behavior of the algorithms over time. Figure 9 shows the average online classification accuracy, i.e., the fraction of correctly classified samples among all samples seen so far, as a function of the number of samples. From Figure 9, we can draw the following conclusions: (1) the kernel based online learning algorithms consistently outperform OSELM, which validates that the polynomial kernel is a good candidate for the image classification problem; (2) Ramp_adaptive always beats the other kernel based online learning algorithms, i.e., Pegasos and Perceptron; and (3) in Figure 9c, the proposed Ramp_adaptive shows a clear advantage over the state-of-the-art online learning methods. For a comprehensive comparison, Table 2 summarizes the frequently used criteria, Overall Accuracy (%), Average Accuracy (%), Kappa, and running time, for the different online learning algorithms. It can be observed from Table 2 that the kernel based online learning algorithms significantly improve the performance (Overall Accuracy, Average Accuracy, and Kappa) compared with OSELM. The three kernel based online learning algorithms achieve similar performance, and the proposed Ramp_adaptive slightly outperforms the others on the four data sets. The Time (s) column shows that OSELM is extremely efficient. The proposed Ramp_adaptive costs noticeably more running time on the small scale data sets AID7, Outdoor Scene, and UC Merced (around 1000 testing samples), which seems to conflict with the observation in Table 1.
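The average online classification accuracy plotted in Figure 9 is simply the running fraction of correct predictions; a short sketch (function name is ours):

```python
import numpy as np

def average_online_accuracy(correct_flags):
    """Running accuracy after each round t: (# correct in rounds 1..t) / t.
    `correct_flags` is a 0/1 sequence marking whether round t's prediction
    matched the revealed label."""
    correct = np.cumsum(np.asarray(correct_flags, dtype=float))
    rounds = np.arange(1, len(correct) + 1)
    return correct / rounds
```

The resulting curve starts noisy (few samples) and smooths out as t grows, which is why the left edges of the panels in Figure 9 fluctuate most.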
The reason lies in the extra computation of the ramp loss parameter s (Equation (11)) per iteration in Ramp_adaptive. On small scale data sets, the number of SVs in the different online learning algorithms is comparable, and so is the iteration count; in this case, the proposed Ramp_adaptive costs more running time than Pegasos and Perceptron. As time goes by, more and more learning samples will be misclassified. All of the misclassified samples are selected as SVs and further used to update the learning models of Pegasos and Perceptron; in contrast, only a small fraction of the misclassified samples are selected as SVs for the model update of Ramp_adaptive. Thus, the efficiency and sparsity advantages of Ramp_adaptive are fully demonstrated when dealing with large-scale problems. Figure 10 shows the confusion matrices of the online learning algorithms OSELM, Pegasos, Perceptron, and the proposed Ramp_adaptive on the AID7 data set. From the figure, we observe that accuracies above 97% are obtained for all seven classes with the kernel based online learning approaches. Among the three kernel based online learning algorithms, our proposed Ramp_adaptive outperforms Pegasos and Perceptron on the classes "Grass", "Field", "Industry", "RiverLake", and "Parking", while its performance is slightly lower than Pegasos and Perceptron on "Forest" and "Resident".
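The per-class accuracies reported in Figure 10 come from a confusion matrix; a minimal sketch (labels assumed to be 0..c−1, names are ours):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class
    is j; the diagonal divided by the row sums gives per-class accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Off-diagonal mass shows which class pairs (e.g., "Forest" vs. "RiverLake" in Figure 1's ambiguous scenes) the classifier confuses most.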

Conclusions
For a variety of reasons, such as multiple labels and human error, noisy labels are inevitable in large scale scene classification problems. In this paper, we studied the novel problem of performing online scene classification of remote sensing images and provided a noise-resilient online classification algorithm to incrementally predict the scene category of new images. Because fewer examples are incorporated into the SV set during the learning procedure, the proposed method enjoys better sparsity and hence a much faster learning speed. These merits make it a good candidate for large scale scene classification of remote sensing images. We conducted extensive experiments on both synthetic and real-world data sets to validate the efficiency and efficacy of the proposed algorithm. Although the experimental studies show the potential of the proposed online learning algorithm, a deep theoretical analysis has not yet been carried out and will be a focus of our future investigation. In addition, as the number of SVs increases, the computational efficiency of updating the learning model gradually decreases; incorporating a budget strategy could further improve the efficiency of the proposed online learning algorithm and will be another focus of our investigation.

Figure 1.
Figure 1. Image may be associated with more than one semantic category. The three images listed here can be annotated with the scene of rivers or forests.

Figure 2.
Figure 2. Illustration of the online scene classification framework. As time goes by, unlabeled images are assumed to arrive consecutively. A predictor is applied to annotate the images that have arrived. When the true label is revealed, the online learner updates the predictor for the next prediction.

Figure 3.
Figure 3. Plots of the hinge loss h(w) and the ramp loss ℓ(w).

Figure 4.
Figure 4. An illustration schematic of the online learning algorithm. Online learning is performed in a sequence of consecutive rounds. At each round t, the online learner picks a predictor f to make the prediction f(x_t). When the true label y_t is revealed, the online learner suffers an instantaneous loss ℓ(f; x_t, y_t) and updates the predictor for the next prediction.

Figure 5.
Figure 5. An illustration of the adaptive parameter setting for s.

Figure 8.
Figure 8. Some sample images from the UC Merced Landuse data set.

Figure 9.
Figure 9. Average classification accuracy for different algorithms on (a) AID7; (b) Outdoor Scene; (c) UC Merced; and (d) AID30 as a function of the number of learning samples.

Table 1.
The #Support Vectors (SVs) and running time of kernel based online learning algorithms.

In this section, we conduct extensive experiments to evaluate the performance of the proposed algorithm on different remote sensing image analysis tasks, using the Outdoor Scene categories data set (http://people.csail.mit.edu/torralba/code/spatialenvelope/), the UC Merced Landuse data set (http://weegee.vision.ucmerced.edu/datasets/landuse.html), and the Aerial Image Data (AID) set (www.lmars.whu.edu.cn/xia/AID-project.html).

- AID7 data set: AID is a large-scale aerial image data set collected from Google Earth imagery. AID7 is made up of the following seven aerial scene types: grass, field, industry, river lake, forest, resident, and parking. The AID7 data set contains 2800 images within seven classes, and each class contains 400 samples of size 600×600 pixels.
- Outdoor Scene categories data set: this data set contains eight outdoor scene categories, i.e., coast, mountain, forest, open country, street, inside city, tall buildings, and highways. There are 2600 color images of 256×256 pixels. All of the objects and regions in this data set have been fully labeled; there are more than 29,000 objects.
- UC Merced Landuse data set: the images in the UC Merced Landuse data set were manually extracted from large images in the USGS (United States Geological Survey) National Map Urban Area Imagery collection for various urban areas around the country. The pixel resolution of this public domain imagery is one foot. The UC Merced data set contains 2100 images in total, each measuring 256×256 pixels. There are 100 images for each of the following 21 classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. Some sample images from this data set are shown in Figure 8.
- AID30 data set: similar to the AID7 data set, this data set is made up of the following 30 aerial scene types: airport, bareland, baseballfield, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. In total, the AID30 data set has 10,000 images within 30 classes, and each class contains about 200 to 400 samples of size 600×600 pixels.

Table 2.
Performance comparison of different algorithms.