Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection

The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, the online least squares one-class support vector machine (online LS-OC-SVM), together with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM first learns a training set with a limited number of samples to provide a basic normal model, then updates the model using the remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is applied to the abnormal event detection problem: each frame of the video is characterized by a covariance matrix descriptor encoding the motion information and is then classified as a normal or an abnormal frame. Experiments on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset demonstrate the promising results of the proposed online LS-OC-SVM method.


Introduction
Visual surveillance is one of the major research areas in computer vision. After events are recorded by a visual sensor, such as a camera, obtaining detailed information about individual or crowd behavior is a challenging objective in this area; automatic abnormal event detection is required to provide convenience, safety and an efficient lifestyle for humanity [1]. An abnormal event is defined as behavior deviating from what one expects. A typical example is a pedestrian panic in a public area: people run across a plaza where they usually stroll. Figure 1a illustrates a normal scene in which people are walking; in Figure 1b, the people suddenly run in different directions, and this scene is considered abnormal. If a system can detect such an event, which may imply a safety risk, security staff can be alerted to take emergency response procedures. Abnormal event detection is a content-based video analysis problem that involves two components: a feature representation of the event and an abnormal event detection approach. In [2][3][4], abnormal detection approaches based on behavioral models were introduced: behavior patterns described by optical flow or pixel change history (PCH) were represented by Bayesian models. In [5], motion features, including position, direction and velocity, were modeled by latent Dirichlet allocation. In [6], abnormal vehicle behavior at intersections was detected via a stochastic graph model based on a Markovian approach; a behavior was labeled abnormal when the current motion pattern could not be recognized as any state of the system or when a particular sequence of states could not be parsed by the stochastic model. These works relied on an explicit statistical signal model, and abnormal events were interpreted as abrupt changes of that model, detected with maximum likelihood or Bayesian estimation theory [7].
Signal models combined with probabilistic assumptions are usually very powerful as long as an accurate model exists, and these methods are effective in several scenarios. However, in many situations a robust and tractable model cannot be obtained, which raises the need for model-free methods.
Other approaches employ low-level motion features. In [8], the authors presented an algorithm that monitors optical flow at a set of fixed spatial positions; the similarity to the model was computed to detect abnormal patches. In [9], irregular behavior in images and videos was detected by comparing the likelihood of patches via a probabilistic graphical model. Because these methods operate on separate patches, they benefit from partial knowledge of the image but do not exploit the global information of the frame.
Trajectory information has also been used to detect abnormal events. In [10,11], the authors presented a method for anomalous event detection by means of trajectory analysis: trajectories were subsampled to a fixed-dimension vector representation and clustered with a one-class support vector machine (SVM). In [12], traffic alarms were detected from the parameters of moving objects and their trajectories by using semantic reasoning and ontologies. In [13], abnormal events for vision-based home healthcare systems were detected by using shape feature variation and 3D trajectories. However, tracking-based algorithms are likely to fail in crowded scenes.
We consider a model-free approach, which does not require an explicit statistical model; specifically, this paper relies on the support vector machine (SVM) classification method. Inspired by the satisfactory performance of the covariance feature descriptor for representing objects in tracking problems, we use a covariance descriptor to characterize the motion information of the whole frame. In tracking, the covariance descriptor is built from blob intensity or color for template matching; in this paper, it encodes the optical flow of the global frame.
The rest of the paper is organized as follows. In Section 2, related works are briefly reviewed. In Section 3, the online least squares one-class support vector machine (online LS-OC-SVM) classification method is derived. In Section 4, a covariance matrix descriptor is described to provide feature vectors for the classification algorithm. In Section 5, we propose abnormal detection methods based on the online LS-OC-SVM. In Section 6, we present results on synthetic data and real-world video scenes. Finally, Section 7 concludes the paper.

Related Work
SVM is usually trained in batch mode, i.e., all training data are given a priori and learned together. If additional training data arrive afterward, the SVM must be retrained from scratch [14]. In abnormal event detection for video surveillance, the normal training sequence may last for a long time, so it is impractical to train on the large set of normal samples as one batch. Moreover, adding a new datum to a large training set will likely have only a minimal effect on the previous decision surface, so re-solving the problem from scratch is computationally wasteful. For these two reasons, we adopt an online strategy to meet the computational and memory requirements.
Some online learning algorithms for SVM were derived by analyzing the change in the Karush-Kuhn-Tucker (KKT) conditions while updating the classifier. In [15], newly arriving data, together with the data violating the KKT conditions and the support vectors from the previous iteration, were used as a new training set to train the classifier at the current step; the iteration stops when all data satisfy the KKT conditions. In [16,17], the authors analyzed the change in the KKT conditions when one datum is added to, or removed from, the training set; a so-called bookkeeping step then computes the new coefficients of the classifier, achieving an online update for a two-class SVM. Useful implementation issues of incremental SVM were presented in [18].
In [7], it was argued that the binary classification algorithm in [16] cannot be directly applied to a one-class problem. In [7,19], the authors considered the change of the normal model over time and identified outliers online using previous data vectors in a sliding time window: two one-class SVM classifiers, trained on the windows preceding and following the present instant, were compared, and a change in the statistics of the time series was likely to have occurred when the resulting machines differed. The sliding time window approach was also considered in [20], with an application to wireless sensor networks. Because it requires repeated batch training of new machines, the sliding window formulation is not inherently online.
In [21], an online one-class SVM was presented following the idea of [22]: an exponential window was applied to the data to suit an adaptive scenario in which the solution tracks changes in the data distribution and forgets old patterns. This algorithm assumes that the data distribution varies slowly.
Some online one-class SVM classification methods were proposed based on support vector data description (SVDD) [23,24], the hypersphere formulation of one-class SVM. In [25], an online one-class classification method was proposed in which a least squares optimization problem is solved and the model complexity is controlled by the coherence criterion. In [26], a method was proposed to reduce space and time complexity by removing, during training, the data that have a high probability of becoming non-support vectors.
To sidestep the difficulty inherent in the constrained quadratic optimization problem, we derive an online version of the hyperplane one-class SVM [27] based on least squares regularization. In the least squares SVM, the solution is found by solving a linear system instead of a quadratic programming problem; this advantage comes from the use of equality instead of inequality constraints in the problem formulation [28]. Least squares one-class SVM (LS-OC-SVM) was proposed in [29] without considering the sparsity of the hyperplane representation, which makes it unsuitable for online abnormal event detection. In the following, we derive an online version of the least squares one-class SVM, then propose a sparse representation of the detector.

Classification
In this section, we derive the proposed online least squares one-class support vector machine (online LS-OC-SVM). In abnormal detection problems, it is assumed that only samples from the positive class are available. A density exists only if the underlying probability measure possesses an absolutely continuous distribution function, and the general problem of estimating the measure for a large class of sets is not solvable [27,30]. The one-class SVM framework is therefore suited to abnormal event detection, where only normal scene data are available. The support vector machine (SVM) was initially proposed by Vapnik and Lerner [31] to find a compromise between minimizing the empirical risk and preventing overfitting. By applying the kernel trick, SVM can handle nonlinear classification problems [10,32-34]. Building on the theoretical foundation of SVM and the soft-margin trick [35,36], one-class SVM addresses the problem where only samples of one category (the positive class), with a few outliers, are available. In this section, after a brief review of one-class SVM and least squares one-class SVM in batch mode, an online training algorithm is proposed; a sparsified version of the algorithm is then provided to meet stricter online requirements.

One-Class SVM
One-class SVM (OC-SVM) aims to determine a suitable region in the input data space, $\mathcal X$, that includes most of the samples drawn from an unknown probability distribution, $P$; it detects objects that resemble the training samples. The hypersphere one-class SVM, proposed in [23,24], identifies outliers by fitting a hypersphere with a minimal radius. The hyperplane one-class SVM extends the original SVM to one-class problems [27,36]; it identifies outliers by fitting a hyperplane that separates the data from the origin. In our work, we adopt the hyperplane one-class SVM, which is formulated as the constrained minimization problem:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i - \rho \quad \text{subject to} \quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \ldots, n, \tag{1}$$

where $\{x_i,\ i = 1, 2, \ldots, n\}$ are $n$ training samples in the input data space, $\mathcal X$, and $\xi_i$ is the slack variable penalizing the outliers. The hyperparameter, $C$, weights the slack variables; it tunes the number of acceptable outliers and thus enables the analysis of noisy data. $\|\cdot\|$ denotes the Euclidean norm of a vector. The decision hyperplane is given by:

$$\langle w, \Phi(x)\rangle - \rho = 0. \tag{2}$$

The nonlinear function, $\Phi: \mathcal X \to \mathcal H$, maps datum $x_i$ from the input space, $\mathcal X$, into the feature space, $\mathcal H$, which allows a nonlinear classification problem to be solved by designing a linear classifier in the feature space; $w$ defines a hyperplane in the feature space separating the projections of the training data from the origin. A positive definite kernel function, $\kappa$, is defined as $\kappa(x, x') = \langle \Phi(x), \Phi(x')\rangle$, which implicitly maps the training or testing data, $x$, into a higher (possibly infinite) dimensional feature space. Introducing the Lagrange multipliers, $\alpha_i$, the decision function in the input data space, $\mathcal X$, is:

$$f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{n}\alpha_i\,\kappa(x_i, x) - \rho\Big). \tag{3}$$

If $f(x) = -1$, the datum, $x$, is classified as abnormal; otherwise, $x$ is classified as normal.

Least Squares One-Class SVM
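Least squares SVM (LS-SVM) was proposed by Suykens in [37,38]. By using a quadratic loss function, Choi proposed the least squares one-class SVM (LS-OC-SVM) [29]. LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. It can be written as the following objective function:

$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^2 - \rho + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 \quad \text{subject to} \quad \langle w, \Phi(x_i)\rangle = \rho - \xi_i,\ \ i = 1, 2, \ldots, n. \tag{4}$$

The condition on the slack variables in OC-SVM, $\xi_i \ge 0$, is no longer needed. The variable, $\xi_i$, represents the error caused by a training object, $x_i$, with respect to the hyperplane. The other parameters in Equation (4) are defined as in OC-SVM. The associated Lagrangian is:

$$L(w, \xi, \rho, \alpha) = \frac{1}{2}\|w\|^2 - \rho + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 - \sum_{i=1}^{n}\alpha_i\big(\langle w, \Phi(x_i)\rangle + \xi_i - \rho\big). \tag{5}$$

Setting the derivatives of Equation (5) with respect to $w$, $\xi_i$, $\rho$ and $\alpha_i$ to zero gives the stationarity conditions:

$$w = \sum_{i=1}^{n}\alpha_i\,\Phi(x_i), \tag{6}$$

$$\alpha_i = C\,\xi_i, \tag{7}$$

$$\sum_{i=1}^{n}\alpha_i = 1, \tag{8}$$

$$\langle w, \Phi(x_i)\rangle + \xi_i - \rho = 0. \tag{9}$$

Substituting Equations (6) and (7) into (9) yields:

$$\sum_{j=1}^{n}\alpha_j\,\kappa(x_j, x_i) + \frac{\alpha_i}{C} - \rho = 0. \tag{10}$$

For all $i = 1, 2, \ldots, n$, and together with Equation (8), Equation (10) can be rewritten in matrix form as:

$$\begin{bmatrix} K + C^{-1}I & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}\begin{bmatrix} \alpha \\ -\rho \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}, \tag{11}$$

where $K$ is the Gram matrix with $(i, j)$-th entry $\kappa(x_i, x_j)$, $I$ is the identity matrix with the same dimension as $K$, and $\alpha$ is the column vector with $i$-th entry $\alpha_i$ for training sample $x_i$; $\mathbf 1$ and $\mathbf 0$ are all-one and all-zero column vectors of compatible lengths. The parameters, $\alpha$ and $\rho$, are obtained by:

$$\begin{bmatrix} \alpha \\ -\rho \end{bmatrix} = \begin{bmatrix} K + C^{-1}I & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}. \tag{12}$$

The hyperplane is then described by:

$$\sum_{i=1}^{n}\alpha_i\,\kappa(x_i, x) - \rho = 0. \tag{13}$$

The distance, $\mathrm{dis}(x)$, of a datum, $x$, to the hyperplane is calculated by:

$$\mathrm{dis}(x) = \frac{\big|\sum_{i=1}^{n}\alpha_i\,\kappa(x_i, x) - \rho\big|}{\|\alpha\|}, \tag{14}$$

where $x_i$ is a training sample and $\|\alpha\|$ is the two-norm of vector $\alpha$. An object with a low $\mathrm{dis}(x)$ value lies close to the hyperplane and thus resembles the training set better than objects with high $\mathrm{dis}(x)$ values. The distance, $\mathrm{dis}(x)$, is used as a proximity measure to assign data to the normal or abnormal class [29].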
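The batch LS-OC-SVM of Equations (11), (12) and (14) reduces to a single linear solve. The following is a minimal NumPy sketch, assuming a Gaussian kernel with bandwidth σ on vector data; the function names `gaussian_kernel`, `ls_ocsvm_fit` and `ls_ocsvm_distance` are hypothetical and not taken from the authors' code.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gram matrix with entries exp(-||a - b||^2 / (2 sigma^2)) for rows of A and B."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def ls_ocsvm_fit(X, C, sigma):
    """Solve Equation (11): [[K + I/C, 1], [1^T, 0]] [alpha; -rho] = [0; 1]."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = K + np.eye(n) / C
    M[:n, n] = 1.0
    M[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    sol = np.linalg.solve(M, rhs)
    return sol[:n], -sol[n]  # alpha, rho


def ls_ocsvm_distance(X_train, alpha, rho, X, sigma):
    """Equation (14): |sum_i alpha_i k(x_i, x) - rho| / ||alpha|| for each row x of X."""
    scores = gaussian_kernel(X, X_train, sigma) @ alpha - rho
    return np.abs(scores) / np.linalg.norm(alpha)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
alpha, rho = ls_ocsvm_fit(X, C=10.0, sigma=1.0)
far = ls_ocsvm_distance(X, alpha, rho, np.array([[20.0, 20.0]]), sigma=1.0)
```

Solving the system directly with `np.linalg.solve` avoids forming the explicit inverse of Equation (12). For a point far from every training sample the kernel terms vanish, so its distance tends to ρ/‖α‖.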

Online Least Squares One-Class SVM
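In an online learning scheme, the training data arrive continuously. We thus need to tune the hyperparameters in the objective function and the hypothesis class in an online manner [17]. Let $\alpha_n$, $K_n$ and $I_n$ denote the coefficient vector, Gram matrix and identity matrix at time step $n$, respectively. The parameters of LS-OC-SVM, $[\alpha_n\ \ -\rho_n]^\top$, at time step $n$ are calculated as:

$$\begin{bmatrix} \alpha_n \\ -\rho_n \end{bmatrix} = \begin{bmatrix} K_n + C^{-1}I_n & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}. \tag{15}$$

To proceed, recall the matrix inverse identity for matrices $A$, $B$, $C$ and $D$ of suitable sizes [39] (here $C$ denotes a generic block, not the hyperparameter):

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B\,S^{-1}C\,A^{-1} & -A^{-1}B\,S^{-1} \\ -S^{-1}C\,A^{-1} & S^{-1} \end{bmatrix}, \qquad S = D - C\,A^{-1}B. \tag{16}$$

The matrix $K_n$ with diagonal loading $C^{-1}I_n$ can be inverted recursively with respect to the time step $n$ by:

$$\left(K_{n+1} + C^{-1}I_{n+1}\right)^{-1} = \begin{bmatrix} K_n + C^{-1}I_n & \boldsymbol{\kappa}_{n+1} \\ \boldsymbol{\kappa}_{n+1}^\top & \kappa_{n+1} + C^{-1} \end{bmatrix}^{-1}, \tag{17}$$

whose right-hand side is evaluated with Equation (16). Here $\boldsymbol{\kappa}_{n+1}$ is the column vector with $i$-th entry $\kappa(x_i, x_{n+1})$, $i \in \{1, 2, \ldots, n\}$, and $\kappa_{n+1} = \kappa(x_{n+1}, x_{n+1})$; since the Schur complement $S$ is then a scalar, no new matrix inversion is required. Based on Equations (15) and (17), we arrive at an online implementation of LS-OC-SVM.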
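Equations (15)-(17) can be sketched by keeping only $(K_n + C^{-1}I_n)^{-1}$ up to date. Solving Equation (15) gives $\alpha_n = \rho_n (K_n + C^{-1}I_n)^{-1}\mathbf 1$ with $\rho_n = 1/(\mathbf 1^\top (K_n + C^{-1}I_n)^{-1}\mathbf 1)$, so each new sample costs one bordered update through the scalar Schur complement of Equation (16). The class name `OnlineLSOCSVM` and the Gaussian kernel are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def rbf(a, b, sigma):
    """Gaussian kernel value exp(-||a - b||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma**2)))


class OnlineLSOCSVM:
    """Hypothetical online LS-OC-SVM that tracks M_n^{-1} = (K_n + I_n / C)^{-1}."""

    def __init__(self, x0, C, sigma):
        self.C, self.sigma = C, sigma
        self.X = [np.asarray(x0, dtype=float)]
        k00 = rbf(self.X[0], self.X[0], sigma)
        self.Minv = np.array([[1.0 / (k00 + 1.0 / C)]])

    def update(self, x):
        """Border M_n with the new sample (Equation (17)) and invert via Equation (16)."""
        x = np.asarray(x, dtype=float)
        k_vec = np.array([rbf(xi, x, self.sigma) for xi in self.X])  # kappa_{n+1}
        k_diag = rbf(x, x, self.sigma) + 1.0 / self.C
        b = self.Minv @ k_vec
        s = k_diag - k_vec @ b  # scalar Schur complement, positive
        n = len(self.X)
        Minv = np.empty((n + 1, n + 1))
        Minv[:n, :n] = self.Minv + np.outer(b, b) / s
        Minv[:n, n] = -b / s
        Minv[n, :n] = -b / s
        Minv[n, n] = 1.0 / s
        self.Minv = Minv
        self.X.append(x)

    def coefficients(self):
        """alpha and rho from Equation (15): alpha = rho * M^{-1} 1 and 1^T alpha = 1."""
        u = self.Minv.sum(axis=1)  # row sums, i.e. M^{-1} 1
        rho = 1.0 / u.sum()
        return rho * u, rho


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
model = OnlineLSOCSVM(X[0], C=5.0, sigma=0.8)
for x in X[1:]:
    model.update(x)
alpha, rho = model.coefficients()
```

Each update costs O(n²) operations instead of the O(n³) of re-solving Equation (12) from scratch, but the stored inverse still grows with n, which is what motivates the sparse version.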

Sparse Online Least Squares One-Class SVM
The procedures in Section 3.3 for calculating the parameters, $\alpha$ and $\rho$, of LS-OC-SVM lose sparseness due to the quadratic loss function in the objective function of Equation (4): in general, every training sample receives a nonzero coefficient. This formulation is inappropriate for large-scale data and unsuitable for online learning, as the number of training samples grows without bound [25]. We therefore propose a sparse solution to provide a robust formulation, adopting a dictionary to address the sparse approximation problem [40].
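Instead of Equation (6), where $w$ is expressed with all available data, we approximate $w$ sparsely by adopting a dictionary. Consider a dictionary, $x_{\mathcal D}$, indexed by $\mathcal D \subset \{1, 2, \ldots, n\}$, with $D$ elements $x_{w_j}$, $j \in \mathcal D$. We approximate $w$ with these $D$ dictionary elements:

$$w = \sum_{j=1}^{D}\beta_j\,\Phi(x_{w_j}). \tag{20}$$

The hyperplane becomes:

$$\sum_{j=1}^{D}\beta_j\,\kappa(x_{w_j}, x) - \rho = 0. \tag{21}$$

In sparse online LS-OC-SVM, the distance, $\mathrm{dis}_{\mathcal D}(x)$, of a datum, $x$, to the hyperplane is:

$$\mathrm{dis}_{\mathcal D}(x) = \frac{\big|\sum_{j=1}^{D}\beta_j\,\kappa(x_{w_j}, x) - \rho\big|}{\|\beta\|}, \tag{22}$$

where $x_{w_j}$ is a dictionary element and $\beta$ is the column vector with entries $\beta_j$. Substituting Expression (20) into the Lagrangian (5), we have:

$$L = \frac{1}{2}\beta^\top K_{\mathcal D}\,\beta - \rho + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 - \sum_{i=1}^{n}\alpha_i\Big(\sum_{j=1}^{D}\beta_j\,\kappa(x_i, x_{w_j}) + \xi_i - \rho\Big), \tag{23}$$

where $K_{\mathcal D}$ is the Gram matrix of the dictionary, with entries $\kappa(x_{w_i}, x_{w_j})$. Let $K_{\mathcal D}(x)$ denote the $n \times D$ matrix with $(i, j)$-th entry $\kappa(x_i, x_{w_j})$. Taking the derivatives of Function (23) with respect to $\beta$, $\xi_i$, $\rho$ and $\alpha_i$ yields:

$$\beta = K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top\alpha, \tag{24}$$

$$\alpha_i = C\,\xi_i, \tag{25}$$

$$\sum_{i=1}^{n}\alpha_i = 1, \tag{26}$$

$$\sum_{j=1}^{D}\beta_j\,\kappa(x_i, x_{w_j}) + \xi_i - \rho = 0. \tag{27}$$

Condition (27) is written in matrix form as:

$$K_{\mathcal D}(x)\,\beta + \xi - \rho\,\mathbf{1} = \mathbf{0}, \tag{28}$$

where $\xi$ is the column vector with entries $\xi_i$. Substituting Conditions (24) and (25) into (28) leads to:

$$\left(K_{\mathcal D}(x)K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top + C^{-1}I\right)\alpha - \rho\,\mathbf{1} = \mathbf{0}. \tag{29}$$

Combining Equations (26) and (29), the coefficients $[\alpha\ \ -\rho]^\top$ are computed from:

$$\begin{bmatrix} K_{\mathcal D}(x)K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top + C^{-1}I & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}\begin{bmatrix} \alpha \\ -\rho \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}. \tag{30}$$

Having established these relations for a given dictionary, we now discuss the dictionary construction. The coherence criterion is adopted to characterize a dictionary in sparse approximation problems; it provides an elegant model reduction criterion with a computationally inexpensive procedure [25,40,41]. For a unit-norm kernel ($\kappa(x, x) = 1$, as for the Gaussian kernel), the coherence of a dictionary is defined as the largest correlation between its elements:

$$\mu = \max_{i \ne j}\ \big|\kappa(x_{w_i}, x_{w_j})\big|. \tag{31}$$

In the online case, the coherence between a new datum, $x_t$, and the current dictionary is calculated by:

$$\epsilon_t = \max_{j \in \mathcal D}\ \big|\kappa(x_t, x_{w_j})\big|, \tag{32}$$

where $x_{w_j}$ is an element of the dictionary, $x_{\mathcal D}$. Given a preset threshold, $\mu_0$, the sample $x_t$ arriving at time step $t$ is tested with the coherence criterion to decide whether the dictionary remains unchanged or is augmented with the new element. For $n$ training samples, a subset of $m$ ($1 \le m \ll n$) samples is taken as the initial dictionary. Each remaining sample is then tested with Equation (32) against the current dictionary; if $\epsilon_t \le \mu_0$, it is included in the dictionary.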
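Once a dictionary is fixed, Equations (30), (24) and (22) can be exercised directly. The sketch below assumes a Gaussian kernel and passes the dictionary as a list of row indices, `dict_idx` (a hypothetical argument); it builds $K_{\mathcal D}$ and the $n \times D$ matrix $K_{\mathcal D}(x)$ with entries $\kappa(x_i, x_{w_j})$, solves Equation (30) in batch for clarity, recovers β from Equation (24) and evaluates $\mathrm{dis}_{\mathcal D}$. All function names are illustrative.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gram matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def sparse_ls_ocsvm_fit(X, dict_idx, C, sigma):
    """Solve Equation (30) for alpha, rho; then beta = K_D^{-1} K_D(x)^T alpha (24)."""
    XD = X[dict_idx]
    KD = gaussian_kernel(XD, XD, sigma)   # D x D dictionary Gram matrix
    KDx = gaussian_kernel(X, XD, sigma)   # n x D, entries k(x_i, x_wj)
    n = X.shape[0]
    P = KDx @ np.linalg.solve(KD, KDx.T) + np.eye(n) / C
    M = np.block([[P, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    sol = np.linalg.solve(M, rhs)
    alpha, rho = sol[:n], -sol[n]
    beta = np.linalg.solve(KD, KDx.T @ alpha)
    return XD, alpha, beta, rho


def sparse_distance(XD, beta, rho, X, sigma):
    """Equation (22): |sum_j beta_j k(x_wj, x) - rho| / ||beta||."""
    return np.abs(gaussian_kernel(X, XD, sigma) @ beta - rho) / np.linalg.norm(beta)


rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
XD, alpha, beta, rho = sparse_ls_ocsvm_fit(X, dict_idx=[0, 5, 10, 15, 20], C=10.0, sigma=1.0)
d = sparse_distance(XD, beta, rho, X, sigma=1.0)
```

At test time only the D dictionary elements and β are needed, which is the point of the sparse representation. If the dictionary contains every training sample, $K_{\mathcal D}(x) = K$ and the solution coincides with that of Equation (12).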
Concretely, the algorithm is performed with two cases described herein below.
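First case: $\epsilon_t > \mu_0$. In this case, at time step $n + 1$, the new datum, $x_{n+1}$, is not included in the dictionary, and the dictionary Gram matrix, $K_{\mathcal D}$, with entries $\kappa(x_{w_i}, x_{w_j})$, $i, j \in \{1, 2, \ldots, D\}$, is unchanged. Only a new row is appended to $K_{\mathcal D}(x)$, so when the new sample arrives we need to compute:

$$\left(K_{\mathcal D}(x)K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top + C^{-1}I_{n+1}\right)^{-1} = \begin{bmatrix} K_{\mathcal D}(x)K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top + C^{-1}I_n & K_{\mathcal D}(x)K_{\mathcal D}^{-1}\boldsymbol{\kappa} \\ \boldsymbol{\kappa}^\top K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top & \boldsymbol{\kappa}^\top K_{\mathcal D}^{-1}\boldsymbol{\kappa} + C^{-1} \end{bmatrix}^{-1}, \tag{33}$$

which is evaluated with Equation (16). On the right-hand side, $K_{\mathcal D}(x)$ is the $n \times D$ matrix of time step $n$, and at time step $n + 1$, $\boldsymbol{\kappa}$ is the column vector with entries $\kappa(x_{n+1}, x_{w_j})$, $j \in \{1, 2, \ldots, D\}$.

Second case: $\epsilon_t \le \mu_0$. In this case, the new datum, $x_{n+1}$, is added to the dictionary, $x_{\mathcal D}$, and the Gram matrix becomes:

$$\tilde K_{\mathcal D} = \begin{bmatrix} K_{\mathcal D} & \mathbf{d} \\ \mathbf{d}^\top & d \end{bmatrix}, \tag{34}$$

where $\tilde K_{\mathcal D}$ is the Gram matrix of the dictionary including the newly arrived sample, $x_{n+1}$, and $K_{\mathcal D}$ is the Gram matrix of the dictionary at the previous time step, $n$. Let $x_{\mathcal D} = \{x_{w_1}, x_{w_2}, \ldots, x_{w_D}\}$ denote the dictionary at time step $n$; $\mathbf d$ is the column vector with entries $d_j = \kappa(x_{n+1}, x_{w_j})$, $j \in \{1, 2, \ldots, D\}$, and $d = \kappa(x_{n+1}, x_{n+1})$. Using the matrix inverse identity, Equation (16), we have:

$$\tilde K_{\mathcal D}^{-1} = \begin{bmatrix} K_{\mathcal D}^{-1} + s^{-1}\mathbf{b}\mathbf{b}^\top & -s^{-1}\mathbf{b} \\ -s^{-1}\mathbf{b}^\top & s^{-1} \end{bmatrix}, \tag{35}$$

where:

$$\mathbf{b} = K_{\mathcal D}^{-1}\mathbf{d}, \tag{36}$$

$$s = d - \mathbf{d}^\top\mathbf{b}. \tag{37}$$

Because the dictionary changes, $K_{\mathcal D}(x)$, and therefore $K_{\mathcal D}(x)K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top + C^{-1}I$, also change at time step $n + 1$. With

$$\tilde K_{\mathcal D}(x) = \begin{bmatrix} K_{\mathcal D}(x) & \mathbf{q} \end{bmatrix}, \tag{38}$$

we need

$$\left(\tilde K_{\mathcal D}(x)\,\tilde K_{\mathcal D}^{-1}\,\tilde K_{\mathcal D}(x)^\top + C^{-1}I\right)^{-1}, \tag{39}$$

where, by Equations (35) and (38),

$$\tilde K_{\mathcal D}(x)\tilde K_{\mathcal D}^{-1}\tilde K_{\mathcal D}(x)^\top = K_{\mathcal D}(x)K_{\mathcal D}^{-1}K_{\mathcal D}(x)^\top + s^{-1}\big(K_{\mathcal D}(x)\mathbf{b}\mathbf{b}^\top K_{\mathcal D}(x)^\top - K_{\mathcal D}(x)\mathbf{b}\mathbf{q}^\top - \mathbf{q}\mathbf{b}^\top K_{\mathcal D}(x)^\top + \mathbf{q}\mathbf{q}^\top\big). \tag{40}$$

Here, at time step $n + 1$, $\mathbf q$ is the column vector with entries $q_i = \kappa(x_i, x_{w_{D+1}})$, $i \in \{1, 2, \ldots, n\}$, and $x_{w_{D+1}}$ is the newly arrived datum, $x_{n+1}$, which is included in the dictionary. The matrix inverse in Equation (39) can be calculated by applying the Woodbury identity four times, once for each rank-one term in Equation (40):

$$(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}, \tag{41}$$

with proper choices of the matrices $A$, $U$, $C$ and $V$: $U$ and $V$ are chosen as two vectors (a column and a row), so that $C$ is a scalar. The inverse $(C^{-1} + VA^{-1}U)^{-1}$ is then the reciprocal of a scalar, and Equation (39) can be computed very efficiently. For instance, for the term $K_{\mathcal D}(x)\mathbf{b}\mathbf{q}^\top$, we regard the two vectors $K_{\mathcal D}(x)\mathbf{b}$ and $\mathbf{q}^\top$ as $U$ and $V$, respectively, while $C$ in Equation (41) is one (the factor $-s^{-1}$ being absorbed into $U$).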
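The second case can be checked numerically. The sketch below, under the same Gaussian-kernel assumption, implements the enlarged dictionary inverse through $\mathbf b = K_{\mathcal D}^{-1}\mathbf d$ and $s = d - \mathbf d^\top\mathbf b$, plus a Woodbury step with a scalar middle term. One deliberate deviation, stated plainly: the four rank-one terms that enter the inverse of Equation (39) sum to $s^{-1}(K_{\mathcal D}(x)\mathbf b - \mathbf q)(K_{\mathcal D}(x)\mathbf b - \mathbf q)^\top$, so the sketch applies the Woodbury identity once to that combined term instead of four times; in exact arithmetic both give the same inverse. Like Equation (39), it only covers the n existing training rows, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gram matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def grow_dictionary_inverse(KD_inv, d_vec, d_scalar):
    """Inverse of [[K_D, d], [d^T, d]] from K_D^{-1} via the block identity."""
    b = KD_inv @ d_vec
    s = d_scalar - d_vec @ b  # positive for a positive definite Gram matrix
    D = KD_inv.shape[0]
    out = np.empty((D + 1, D + 1))
    out[:D, :D] = KD_inv + np.outer(b, b) / s
    out[:D, D] = -b / s
    out[D, :D] = -b / s
    out[D, D] = 1.0 / s
    return out, b, s


def woodbury_scalar(A_inv, u, v, c):
    """(A + u c v^T)^{-1} = A^{-1} - A^{-1} u (1/c + v^T A^{-1} u)^{-1} v^T A^{-1}."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 / c + v @ Au)


def update_after_growth(P_inv, KDx, b, s, q):
    """New (K_D(x) K_D^{-1} K_D(x)^T + I/C)^{-1} after one dictionary element is added."""
    w = KDx @ b - q  # the four rank-one terms equal (1/s) w w^T
    return woodbury_scalar(P_inv, w, w, 1.0 / s)
```

Because the combined update is symmetric with a positive scalar $1/s$, its denominator $s + \mathbf w^\top P^{-1}\mathbf w$ stays positive, which is why this sketch prefers it over four separate, non-symmetric rank-one steps.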

Covariance Descriptor of Frame Behavior
Optical flow is chosen as the basic low-level feature to represent the direction and amplitude of movement; in this paper, we apply the Horn-Schunck (HS) method to compute it. Optical flow provides important information about the spatial arrangement of objects and the rate of change of this arrangement [42]. The optical flow of a gray image is formulated as the minimization of the following global energy functional:

$$E = \iint \Big[(I_x u + I_y v + I_t)^2 + \gamma\big(\|\nabla u\|^2 + \|\nabla v\|^2\big)\Big]\,dx\,dy,$$

where $I$ is the image intensity; $I_x$, $I_y$ and $I_t$ are the derivatives of the image intensity along the $x$, $y$ and time $t$ dimensions; $u$ and $v$ are the horizontal and vertical components of the optical flow; and $\gamma$ is the weight of the regularization term. The covariance feature descriptor was originally proposed by Tuzel [43] for pattern matching in target tracking. Owing to its good performance, we introduce a covariance descriptor encoding the optical flow to represent the global movement of the frame. A feature is defined as:

$$F(x, y) = \phi(I, x, y) = \big[\phi_1(I, x, y), \ldots, \phi_d(I, x, y)\big],$$

where $I$ is the image (which could be gray, red-green-blue (RGB), hue-saturation-value (HSV), hue-lightness-saturation (HLS), etc.), $\phi_i$ is a mapping relating the image to the $i$-th feature, $F$ is the $W \times H \times d$ dimensional feature, $W$ is the image width, $H$ is the image height and $d$ is the number of chosen features. For each frame, the feature, $F$, can be represented by the $d \times d$ covariance matrix:

$$C = \frac{1}{n-1}\sum_{k=1}^{n}(z_k - \mu)(z_k - \mu)^\top,$$

where $n$ is the number of pixels sampled in the frame, $\mu$ is the mean of the $n$ feature vectors of the selected points and $z_k$ is the feature vector of the $k$-th point. $C$ is the covariance matrix of the feature vector, $F$. The covariance matrix descriptor thus provides a natural way to fuse multiple features.
Different choices of feature vectors are shown in Table 1, where $u$ and $v$ are the horizontal and vertical components of optical flow; $u_x$ and $v_x$ are the first derivatives of the horizontal and vertical optical flow in the $x$ direction, respectively; $u_y$ and $v_y$ are the corresponding first derivatives in the $y$ direction; $u_{xx}$ and $v_{xx}$ are the second derivatives in the $x$ direction; and $u_{yy}$ and $v_{yy}$ are the second derivatives in the $y$ direction. The flowchart of the covariance matrix descriptor computation is shown in Figure 2. The optical flow and its partial derivatives characterize the inter-frame information and can be regarded as movement information.

Table 1. Different choices of feature F to construct the covariance descriptor.
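The frame descriptor can be sketched in a few lines of NumPy, assuming the flow fields u and v have already been computed (for example with a Horn-Schunck solver, not shown). Because the contents of Table 1 are not reproduced here, the feature vector [u, v, u_x, u_y, v_x, v_y] used below is only one hypothetical choice built from the quantities listed above; `np.gradient` supplies finite-difference derivatives and `np.cov` applies the 1/(n − 1) normalization of the covariance formula.

```python
import numpy as np


def flow_covariance_descriptor(u, v, step=1):
    """d x d covariance of per-pixel features [u, v, u_x, u_y, v_x, v_y].

    The feature set is a hypothetical choice; u and v are H x W optical flow fields.
    """
    u_y, u_x = np.gradient(u)  # derivative along axis 0 (y) first, then axis 1 (x)
    v_y, v_x = np.gradient(v)
    features = [u, v, u_x, u_y, v_x, v_y]
    # Sample every `step`-th pixel and stack the feature vectors z_k as rows (n x d).
    Z = np.stack([f[::step, ::step].ravel() for f in features], axis=1)
    return np.cov(Z, rowvar=False)  # (1 / (n - 1)) sum_k (z_k - mu)(z_k - mu)^T
```

Since every entry is a sum over pixels, the integral-image technique of [43] can evaluate the same descriptor for any rectangular subimage; this sketch computes it for the whole frame only.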

If proper parameters are given, classical kernels, such as Gaussian, polynomial and sigmoidal kernels, have similar performance [44]. In our work, the Gaussian kernel $\kappa(x_i, x_j) = \exp\big(-\|x_i - x_j\|^2 / (2\sigma^2)\big)$ is used. However, the covariance matrix is an element of a Lie group, so the Gaussian kernel for Euclidean spaces is not suitable. The Gaussian kernel on the Lie group is defined as [45,46]:

$$\kappa(X_i, X_j) = \exp\left(-\frac{\big\|\log\big(X_i^{-1}X_j\big)\big\|^2}{2\sigma^2}\right),$$

where $X_i$ and $X_j$ are matrices in the Lie group $G$, and the parameter $\sigma$ determines the scale at which the data are probed.
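A possible evaluation of the Lie group Gaussian kernel for symmetric positive definite covariance descriptors is sketched below. It assumes that the norm of $\log(X_i^{-1}X_j)$ is computed from the eigenvalues $\lambda_k$ of $X_i^{-1}X_j$ as $\sqrt{\sum_k \ln^2\lambda_k}$, the form of the distance used with covariance descriptors by Tuzel [43]; the eigenvalues are obtained from the symmetric, similar matrix $L^{-1}X_jL^{-\top}$, where $X_i = LL^\top$ is a Cholesky factorization. Function names are hypothetical.

```python
import numpy as np


def log_eig_distance(Xi, Xj):
    """sqrt(sum_k ln^2 lambda_k), lambda_k the eigenvalues of Xi^{-1} Xj (SPD inputs)."""
    L = np.linalg.cholesky(Xi)
    L_inv = np.linalg.inv(L)
    S = L_inv @ Xj @ L_inv.T  # symmetric and similar to Xi^{-1} Xj
    lam = np.linalg.eigvalsh(S)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def lie_gaussian_kernel(Xi, Xj, sigma):
    """Gaussian kernel exp(-d(Xi, Xj)^2 / (2 sigma^2)) on the log-eigenvalue distance."""
    return float(np.exp(-log_eig_distance(Xi, Xj) ** 2 / (2.0 * sigma**2)))


A = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.1], [0.0, 0.1, 0.5]])
B = 2.0 * A
```

For example, scaling a descriptor by 2 multiplies every eigenvalue by 2, so the distance between A and B above is √3 · ln 2. Descriptors of frames with constant flow can be singular; adding a small multiple of the identity before calling the kernel is a common safeguard, but it is an assumption here, not a step stated in the text.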

Abnormal Event Detection
In the abnormal event detection problem, it is assumed that a set of training frames, $\{I_1, I_2, \ldots, I_n\}$ (the positive class), describing normal behavior is available. The abnormal detection strategies based on the online algorithms proposed in Sections 3.3 and 3.4 are introduced below.

Online LS-OC-SVM Strategy
The general architecture of the abnormal event detection method via online least squares one-class SVM (online LS-OC-SVM) proposed in Section 3.3 is summarized in Algorithm 1; the flowchart is shown in Figure 3 and explained below.
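Algorithm 1. Abnormal event detection via online LS-OC-SVM and sparse online LS-OC-SVM.
1. Compute the covariance matrix descriptors, C_1, ..., C_n, of the training frames.
2. (a) Online strategy: apply LS-OC-SVM to the first m samples offline.
   (b) Sparse online strategy: apply LS-OC-SVM to train the initial dictionary, C_D, offline.
3. (a) Online strategy: apply online LS-OC-SVM to the remaining samples to calculate the coefficient matrix.
   (b) Sparse online strategy: apply sparse online LS-OC-SVM to the remaining samples to calculate the coefficient matrix and to update the dictionary.
4. Classify each frame, C_{n+l}, via LS-OC-SVM.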

Step 1: The first step consists of calculating the covariance matrix descriptors of the training frames; the features can be chosen as any of the forms shown in Table 1. This step can be summarized as:

$$\{OP_1, OP_2, \ldots, OP_n\} \rightarrow \{C_1, C_2, \ldots, C_n\},$$

where $\{OP_1, OP_2, \ldots, OP_n\}$ are the optical flows of the 1st to $n$-th frames and $\{C_1, C_2, \ldots, C_n\}$ are the corresponding covariance matrix descriptors.
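Step 2: The second step applies LS-OC-SVM to a small subset of the training samples to calculate the coefficient parameters, $\alpha$ and $\rho$, in Equation (11). Consider a subset $\{C_i\}_{i=1}^{m}$, $1 \le m \ll n$, selected from the training set $\{C_i\}_{i=1}^{n}$; without loss of generality, assume that the first $m$ frames are chosen. These $m$ samples are trained offline:

$$\begin{bmatrix} \alpha \\ -\rho \end{bmatrix} = \begin{bmatrix} K + C^{-1}I & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}, \qquad K_{ij} = \kappa(C_i, C_j),\ \ i, j \in \{1, \ldots, m\},$$

where $K$ and $[\alpha\ \ -\rho]^\top$ are defined as in Equation (11).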
Step 3: After learning the first $m$ samples, the matrices $K$ and $[\alpha\ \ -\rho]^\top$ are obtained. The online LS-OC-SVM method (Section 3.3) is then applied to learn the remaining $n - m$ samples, $\{C_{m+1}, C_{m+2}, \ldots, C_n\}$, one at a time. This step can be expressed as:

$$\left(K_{k-1},\ [\alpha_{k-1}\ \ -\rho_{k-1}]^\top\right) \xrightarrow{\ C_k,\ \text{Equations (15)-(17)}\ } \left(K_k,\ [\alpha_k\ \ -\rho_k]^\top\right), \qquad k = m+1, \ldots, n.$$

Step 4: Based on the coefficients, $[\alpha\ \ -\rho]^\top$, the distances of the training samples and of the incoming test sample, $C_{n+l}$, to the decision hyperplane are computed. An abnormal event is detected by comparing these distances:

$$f(C_{n+l}) = \begin{cases} 1, & \mathrm{dis}(C_{n+l}) > T_{dis} \\ -1, & \text{otherwise} \end{cases}, \qquad T_{dis} = \max_{i \in \{1, \ldots, n\}} \mathrm{dis}(C_i),$$

where $C_{n+l}$ is the covariance matrix descriptor of the $(n + l)$-th frame to be classified and $C_i$ is a training sample. "1" corresponds to an abnormal frame and "−1" to a normal frame. $T_{dis}$ is the distance threshold, set to the maximum distance of the training samples to the hyperplane.
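Step 4 only needs kernel values, so it can be written against precomputed Gram matrices, which also covers the Lie group kernel of Section 4. The sketch below uses hypothetical helper names and, for brevity, a batch fit with Equation (11) instead of the online update of Step 3; it sets T_dis to the largest training distance and labels test frames 1 (abnormal) or −1 (normal). The 1-D Gaussian Gram matrix at the end is illustration data only.

```python
import numpy as np


def fit_from_gram(K, C):
    """LS-OC-SVM coefficients from a precomputed Gram matrix (Equation (11))."""
    n = K.shape[0]
    M = np.block([[K + np.eye(n) / C, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    sol = np.linalg.solve(M, rhs)
    return sol[:n], -sol[n]  # alpha, rho


def distances(K_cross, alpha, rho):
    """dis(.) of Equation (14); K_cross[l, i] = kappa(sample_l, training sample_i)."""
    return np.abs(K_cross @ alpha - rho) / np.linalg.norm(alpha)


def detect(K_train, K_test, C):
    """Step 4: label 1 (abnormal) if dis > T_dis = max training distance, else -1."""
    alpha, rho = fit_from_gram(K_train, C)
    T_dis = distances(K_train, alpha, rho).max()
    labels = np.where(distances(K_test, alpha, rho) > T_dis, 1, -1)
    return labels, T_dis


def gram(a, b, sigma=0.3):
    """Gaussian Gram matrix for 1-D samples (illustration only)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))


x = np.linspace(0.0, 1.0, 20)
labels, T_dis = detect(gram(x, x), gram(np.array([x[10], 100.0]), x), C=10.0)
```

Any training sample is, by construction, never farther than T_dis, so it is labeled normal; a sample far from all training data has distance ρ/‖α‖ and is flagged abnormal here.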

Sparse Online LS-OC-SVM Strategy
The abnormal event detection via sparse online least squares one-class SVM (sparse online LS-OC-SVM) is introduced below. A subset of the samples is chosen to form the dictionary, C_D, which provides a sparse representation of the training data. The initial dictionary, C_D, is learned offline; each remaining training sample is then learned online, one by one, and checked for inclusion in the dictionary. The test datum is classified based on the dictionary. The feature extraction step (Step 1) and the detection step (Step 4) are the same as those presented in Section 5.1; only the training steps differ because of the dictionary.
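Step 2-sparse: The second step applies LS-OC-SVM to train the initial dictionary offline. The first $m$ samples form the initial dictionary, denoted $C_{\mathcal D}$. Since the training set and the dictionary coincide at this stage, $K_{\mathcal D}(x) = K_{\mathcal D}$ and Equation (30) reduces to:

$$\begin{bmatrix} \alpha \\ -\rho \end{bmatrix} = \begin{bmatrix} K_{\mathcal D} + C^{-1}I & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}, \qquad C_{\mathcal D} = \{C_1, \ldots, C_m\}.$$

Step 3-sparse: After learning the initial dictionary, $C_{\mathcal D}$, consisting of the first $m$ ($1 \le m \ll n$) samples, the remaining training samples, $\{C_{m+1}, C_{m+2}, \ldots, C_n\}$, are learned via the sparse online LS-OC-SVM described in Section 3.4. For each new sample, $C_k$:

$$\epsilon_k = \max_{C_{w_j} \in C_{\mathcal D}} \big|\kappa(C_k, C_{w_j})\big|, \qquad C_{\mathcal D} \leftarrow \begin{cases} C_{\mathcal D} \cup \{C_k\}, & \epsilon_k \le \mu_0, \\ C_{\mathcal D}, & \epsilon_k > \mu_0, \end{cases}$$

where $C_{\mathcal D}$ is the dictionary and $C_k$ is a new incoming sample from the remaining training data. According to the coherence criterion introduced in Section 3.4, if the new sample, $C_k$, satisfies the dictionary update condition, it is included in the dictionary, $C_{\mathcal D}$.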
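The dictionary test of Step 3-sparse can be sketched independently of the coefficient updates, which are omitted here. In the sketch below, `build_dictionary` and `coherence` are hypothetical helpers: the first m samples seed the dictionary, and every later sample joins only if its coherence with the current dictionary, Equation (32), does not exceed μ_0. The kernel is passed in as a function and is assumed to be unit-norm (κ(x, x) = 1), like the Gaussian kernels used in this paper.

```python
import numpy as np


def coherence(x, dictionary, kernel):
    """epsilon = max_j |kappa(x, x_wj)| over the current dictionary (Equation (32))."""
    return max(abs(kernel(x, xw)) for xw in dictionary)


def build_dictionary(samples, kernel, mu0, m=1):
    """Seed with the first m samples; add a later sample only if its coherence <= mu0."""
    dictionary = list(samples[:m])
    for x in samples[m:]:
        if coherence(x, dictionary, kernel) <= mu0:
            dictionary.append(x)
    return dictionary


def gauss(a, b):
    """Unit-norm Gaussian kernel on scalars with sigma = 1 (illustration only)."""
    return float(np.exp(-(a - b) ** 2 / 2.0))


D = build_dictionary([0.0, 0.01, 5.0, 5.01, 10.0], gauss, mu0=0.5)
# D == [0.0, 5.0, 10.0]: samples nearly identical to a dictionary element are skipped
```

By construction, every pair of dictionary elements has a kernel value of at most μ_0, so the dictionary coherence of Equation (31) never exceeds the threshold; a smaller μ_0 yields a sparser dictionary.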

Abnormal Event Detection Results
This section presents the results of experiments conducted to illustrate the performance of the two proposed classification algorithms, online least squares one-class SVM (online LS-OC-SVM) and sparse online least squares one-class SVM (sparse online LS-OC-SVM). The two-dimensional synthetic distribution dataset and the University of Minnesota (UMN) [47] dataset are used.

Synthetic Dataset via Online LS-OC-SVM and Sparse Online LS-OC-SVM
Two synthetic datasets, "square" and "ring-line-square" [48], are used. The "square" dataset consists of four lines, each 2.2 in length and 0.2 in width, in which 400 points are randomly dispersed with a uniform distribution. The "ring-line-square" distribution is composed of three parts: a ring with an inner diameter of 1.0 and an outer diameter of 2.0, a line 1.6 in length and 0.2 in width, and a square identical to the "square" dataset introduced above; a total of 850 points are randomly dispersed with a uniform distribution. These two datasets are shown in Figure 4. The first sample is used to initialize the online LS-OC-SVM proposed in Section 3.3; the remaining 399 samples of "square" and 849 samples of "ring-line-square" are learned in an online manner.
With the sparse online LS-OC-SVM method proposed in Section 3.4, the first sample is trained offline and taken as the initial dictionary. Then, each of the remaining 399 samples of "square" and 849 samples of "ring-line-square" is checked with the coherence criterion to determine whether the dictionary should be kept unchanged or updated by including the new element.
The distances are shown as contours illustrating the boundary; the contours for "square" and "ring-line-square" are shown in Figures 5 and 6, respectively. A Gaussian kernel with bandwidth σ = 0.065 was used for both datasets, and the threshold of the coherence criterion was preset to µ0 = 0.08. The detection results obtained by the two online training algorithms are the same as those obtained when the training data are learned in batch mode.

Figure 8 and Figure 9, respectively. A Gaussian kernel for covariance matrices in the Lie group is used. Various values of the kernel bandwidth, σ, of the Gaussian function and of the penalty factor, C, are chosen to form the receiver operating characteristic (ROC) curve. In the indoor scene, time lags in the frame labels lead to a lower area under the ROC curve (AUC). In the last few frames of the abnormal sequences, which are labeled abnormal, there are no people, while in the training samples there are no people in the upper half of the image. The covariance of such a training frame is therefore similar to the covariance of an abnormal frame without people, and our covariance feature descriptor-based classification method cannot distinguish between the two situations. This issue can, however, be resolved by utilizing foreground information: for example, if there are no moving objects in the frame, the frame is immediately classified as abnormal. The results for the three scenes show that the covariance descriptor can distinguish between normal and abnormal events, and that the performance of online LS-OC-SVM is almost the same as that of the offline method.

Abnormal Visual Event Detection via Sparse Online LS-OC-SVM
UMN dataset abnormal event detection results via the sparse online LS-OC-SVM proposed in Section 3.4 are presented here. Taking the lawn scene as an example, the first normal covariance matrix descriptor from the training samples is first included in the dictionary; the remaining training covariance descriptors are then learned online by the sparse online LS-OC-SVM method. The ROC curves of the detection results for the lawn scene, the indoor scene and the plaza scene are shown in Figure 10a.

The performance of the covariance matrix descriptor-based online least squares one-class SVM method and of state-of-the-art methods is shown in Table 3. The covariance matrix-based online abnormal frame detection method obtains competitive performance. In general, our sparse online LS-OC-SVM method performs better than the other methods, except sparse reconstruction cost (SRC) [49]. In that paper, a multi-scale histogram of optical flow (HOF) was taken as the feature, and a testing sample was classified by its sparse reconstruction cost, computed through a weighted linear reconstruction over an over-complete normal basis set. However, computing the HOF takes more time than computing the covariance. Moreover, by adopting the integral image [43], the covariance matrix descriptor of any subimage can be computed conveniently, so the covariance descriptor is well suited to analyzing movement in parts of the image. In [49], the whole training dataset was stored in memory in advance, and the dictionary was then chosen as an optimal subset for reconstruction. In contrast, our sparse online LS-OC-SVM strategy trains the classifier on sequential inputs. This property makes the proposed method well suited to large volumes of training data, whereas the method in [49] may fail when the training set does not fit in memory.

Table 3. Comparison of our proposed method with state-of-the-art methods for abnormal event detection on the UMN dataset. NN, nearest neighbor; SRC, sparse reconstruction cost; STCOG, spatial-temporal co-occurrence Gaussian mixture models.

Conclusions
In this paper, we proposed a method to detect abnormal events via online least squares one-class SVM (online LS-OC-SVM) and sparse online least squares one-class SVM (sparse online LS-OC-SVM). Online LS-OC-SVM learns training samples sequentially; sparse online LS-OC-SVM incorporates the coherence criterion to form the dictionary for a sparse representation of the detector. The covariance matrix descriptor encodes the movement feature of the frame to distinguish between normal and abnormal events. The proposed detection algorithms have been tested on a synthetic dataset and a real-world video dataset yielding successful results in detecting abnormal events.

Conflicts of Interest
The authors declare no conflicts of interest.