Algorithms 2010, 3(1), 120; https://doi.org/10.3390/a3010001
Article
A Clinical Decision Support Framework for Incremental Polyps Classification in Virtual Colonoscopy
¹ Electrical and Computer Engineering Department, American University of Beirut, PO Box 11 0236, Riad El Solh, Beirut 1107 2020, Lebanon
² School of Engineering, Virginia Commonwealth University, Richmond, VA, USA
³ Department of Radiology, Harvard Medical School and Massachusetts General Hospital, Boston, MA 02114, USA
* Author to whom correspondence should be addressed.
Received: 28 September 2009 / Accepted: 6 October 2009 / Published: 4 January 2010
Abstract
We present in this paper a novel dynamic learning method for classifying polyp candidate detections in Computed Tomographic Colonography (CTC) using an adaptation of the Least Square Support Vector Machine (LSSVM). The proposed technique, called Weighted Proximal Support Vector Machines (WPSVM), extends the offline capabilities of the SVM scheme to address practical CTC applications. Incremental data are incorporated in the WPSVM as a weighted vector space, and the only storage requirements are the hyperplane parameters. WPSVM performance, evaluated on 169 clinical CTC cases using a 3D computer-aided diagnosis (CAD) scheme for feature reduction, compared favorably with previously published CTC CAD studies, which, however, involved only binary and offline classification schemes. The experimental results obtained from iteratively applying WPSVM to improve detection sensitivity demonstrate its viability for incremental learning, thereby motivating further follow-on research to address a wider range of true positive subclasses, such as pedunculated, sessile, and flat polyps, and a wider range of false positive subclasses, such as folds, stool, and tagged materials.
Keywords:
support vector machine; machine learning; medical image analysis; computer-aided detection; dynamic multiclassification and unbalanced data sets

1. Introduction
Due to the recent advancements in Computed Tomography (CT) technology, and with more than 57,000 colon cancer deaths per year in the United States, CT Colonography (CTC), also known as virtual colonoscopy, is becoming a promising tool for early diagnosis of colon cancer. CTC is a minimally invasive technique that detects colorectal polyps and masses based on CT scans of the distended colon [1]. One of the major obstacles to CTC becoming an effective tool for detecting polyps is that radiologists’ expertise is required for analyzing the CTC images, in particular for the detection of small polyps. Because diagnostic interpretation is a complicated task and any erroneous decision may lead to painful consequences for patients, computer-aided detection (CAD) of polyps can provide clinical decision support that benefits patients by reducing the variability of detection accuracy among radiologists [1]. Such a CAD system typically employs a shape-based method for the initial detection of polyp candidates, followed by a machine learning (ML) method for the classification of polyps versus non-polyps (normal colonic structures). The CAD system then generates the final list of polyps that is provided to the radiologist as a “second opinion” [2]. Typically, the input to a CAD system is a large number of CT images, ranging from 300 to 3,000 images per patient.
The large amount of data is one of the major obstacles to training the ML method in any CAD system. Moreover, to update the CAD system, the ML method needs to be retrained whenever new CTC patient image data become available. Therefore, the need to scale up inductive learning algorithms in CAD systems is increasing drastically in order to extract valid and novel patterns from incremental data without a major ML retraining. Dynamic, incremental, or online learning refers, in this context, to the situation where a training image data set is not fully available at the beginning of the learning process. The data can arrive at different time intervals and need to be incorporated into the training data to preserve the class concept. Thus, constructing an ML method capable of incremental classification, as opposed to batch-mode learning, is very attractive and will become a strategic necessity for CTC for two main reasons. First, the training period is the most resource-intensive element in ML. Second, CTC data form a continuous, large, and unbalanced stream by nature, which makes them an ideal candidate for online learning.
To the best of our knowledge, no prior work has addressed incremental multiclassification of polyps using a support vector machine (SVM) approach within the framework of dynamic learning. The novel method we present in this work is called Weighted Proximal Support Vector Machine (WPSVM), and it extends traditional SVM beyond its existing static learning context to handle dynamic and multiple classifications of unbalanced data sets of polyps. The selection of SVM as a machine learning tool for assisting in CTC clinical decisions stems from several of its main advantages: SVM has its roots in statistical learning theory, which ensures strong learning and generalization capabilities; it is computationally efficient in learning a hyperplane that correctly classifies the high-dimensional feature space; and it is highly resistant to noisy data [3].
The remainder of this paper is organized as follows: Section 2 presents an overview of multiclassification Least Square SVM (LSSVM) principles. Section 3 covers our proposed multiclassification WPSVM technique for unbalanced data sets. Section 4 validates the effectiveness of WPSVM in terms of detection performance, computation time, and storage requirements. Finally, Section 5 concludes this work with remarks and an outline of future research.
2. Multiclassification LSSVM Survey
The SVM technique, as invented by Boser, Guyon, and Vapnik, was first introduced at the Computational Learning Theory (COLT) conference of 1992 [4], and since then it has established itself as one of the leading approaches in the pattern recognition and machine learning areas, as demonstrated by the results obtained in a broad range of practical applications and recent research work [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. In terms of structural or representational capacity, SVM behaves like a neural network, but it differs in the learning technique. SVM solves a quadratic programming (QP) problem and finds a computationally efficient way of learning a hyperplane that correctly classifies the high-dimensional feature space after using a linear combination of kernel functions, which have to be positive definite [28]. With kernel functions centered on the training input data, SVM minimizes the confidence interval and keeps the training error e fixed while retaining only the support vectors (SV) from the input data.
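As a concrete illustration of the positive-definiteness requirement on kernels (our own sketch, not part of the paper's method), the following Python fragment builds an RBF Gram matrix on invented data and checks that it is positive semidefinite; the data and the gamma value are arbitrary:

```python
import numpy as np

# Illustration only: a valid SVM kernel must yield a positive
# (semi-)definite Gram matrix. We sketch this with the RBF kernel
# k(x, z) = exp(-gamma * ||x - z||^2) on a few random points.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 invented points, 5 features
gamma = 0.5

# Pairwise squared distances via broadcasting
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-gamma * sq_dists)         # RBF Gram matrix

eigvals = np.linalg.eigvalsh(K)       # K is symmetric, so eigvalsh applies
print(eigvals.min() >= -1e-8)         # True: no genuinely negative eigenvalue
```

A kernel failing this check would make the QP above non-convex, which is why the positive-definiteness condition is stated as a requirement.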
For instance, given a classical linearly separable multiclassification task with attributes or feature sets $\langle f_1, f_2, \dots, f_f \rangle$ defined as ${\left\{x_i, y_i\right\}}_{i=1}^{N}$, where $x_i \in R^f$ represents the i^{th} input image and $y_i$ the output class, an LSSVM as introduced in [8] optimizes the objective function

$$\mathrm{Objective\ function:}\quad \frac{1}{2}\sum_{m=1}^{c} w_m^T\cdot w_m + \frac{\lambda}{c}\sum_{i=1}^{N}\sum_{m\ne y_i}^{c}\left(e_i^m\right)^2$$

where λ is a suitable positive penalty parameter that controls the trade-off between the classification error e of the c different classes and the margin maximization during the training phase. The error term e, often referred to as the slack variable, accounts for the non-separable data points. Hsu and Lin [29] showed that SVM accuracy rates are in general influenced by the selection of λ, which varies depending on the problem under investigation. The selection of λ can be made heuristically or by a grid search.
Depending on the decomposition strategy for converting the multiclassification problem into a set of binary ones, the ML constraint can be formulated as a multiclassification objective function, one-versus-rest, pairwise, or error-correcting output code [9]. The multiclassification objective function has probably the most compact form, as it optimizes the problem in a single step. It constructs c two-class rules, where each classifier separates training vectors of class y_i from the other m classes using the constraint on the hyperplanes defined by their slopes w and intercepts b:
$$\mathrm{Constraint:}\quad w_{y_i}^T x_i + b_{y_i} \ge w_m^T x_i + b_m + 2 - e_i^m$$
LSSVM classifiers reduce the optimization problem from a QP to a linear one and optimize the Lagrangian as:

$$L_p(w,b,e,\alpha)=\frac{1}{2}w^T w+\frac{\lambda}{c}\sum_{i=1}^{N}\sum_{m\ne y_i}^{c}\left(e_i^m\right)^2-\sum_{i=1}^{N}\alpha_i\sum_{m\ne y_i}^{c}\left((w_{y_i}-w_m)x_i+(b_{y_i}-b_m)-2+e_i^m\right)$$

where α_i represent the Lagrange multipliers, which can be either positive or negative. These parameters are derived from the Karush-Kuhn-Tucker (KKT) conditions, which are valid as long as the objective function and conditions stay convex [28]. The LSSVM solution can be rewritten as a linear system of equations in matrix format as:

$$\left[\begin{array}{cc}0& Y^T\\ Y& ZZ^T+\lambda^{-1}I/c\end{array}\right]\left[\begin{array}{c}b\\ \alpha\end{array}\right]=\left[\begin{array}{c}0\\ I\end{array}\right]$$

where $Z=(x_1^T y_1;\dots;x_N^T y_N)$, $Y=[y_1;\dots;y_N]$, $I=[1;\dots;1]$, and $\alpha=[\alpha_1;\dots;\alpha_N]$.
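The linear system above can be sketched numerically. The following toy Python example (our own, binary case with a linear kernel, invented data, and an arbitrary λ) solves the LSSVM system for b and α and recovers the primal hyperplane:

```python
import numpy as np

# Toy sketch of LSSVM training as a single linear solve (binary case):
# [[0, Y^T], [Y, Z Z^T + I/(lambda*c)]] [b; alpha] = [0; 1]
# All data and parameter values below are invented for illustration.
rng = np.random.default_rng(1)
N, f, c, lam = 40, 2, 2, 10.0

# Two well-separated clusters labeled -1 / +1
X = np.vstack([rng.normal(-2, 0.5, (N // 2, f)),
               rng.normal(+2, 0.5, (N // 2, f))])
y = np.hstack([-np.ones(N // 2), np.ones(N // 2)])

Z = X * y[:, None]                    # rows are y_i * x_i^T
A = np.zeros((N + 1, N + 1))
A[0, 1:] = y                          # Y^T
A[1:, 0] = y                          # Y
A[1:, 1:] = Z @ Z.T + np.eye(N) / (lam * c)
rhs = np.hstack([0.0, np.ones(N)])

sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

# Recover the primal hyperplane w = sum_i alpha_i y_i x_i and classify
w = (alpha * y) @ X
preds = np.sign(X @ w + b)
print((preds == y).all())             # expect True on this separable toy set
```

Note that, unlike a standard SVM QP, every α_i is generically nonzero here, which is precisely the storage issue the WPSVM of Section 3 sets out to avoid.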
3. Proposed Multiclassification WPSVM
3.1. Proposed Multiclassification
We propose several novel modifications to the standard multiclassification LSSVM as highlighted in Section 2. First, we modify the objective function represented by Eq. (1) by adding the plane intercept b in order to uniquely define the hyperplane by its slope and intercept. Second, because the input data can be unbalanced with respect to the class distribution, and because the penalty parameter of Eq. (1) could be biased towards controlling the overall error term e at the expense of specific classes, we include a new controlling parameter ζ which acts as a local penalty variable for each class. The proposed new objective function is
$$\frac{1}{2}{\displaystyle \sum _{m=1}^{c}({w}_{m}^{T}\cdot {w}_{m}}+{b}_{m}\cdot {b}_{m})+\frac{\lambda}{2c}{{\displaystyle \sum _{i=1}^{N}{\displaystyle \sum _{m\ne {y}_{i}}^{c}({\zeta}^{m}{e}_{i}^{m})}}}^{2}$$
We also modify the constraint relationship between different classes in the multiclassification objective function of Eq. (2) to be equality instead of inequality:
$$(w_{y_i}^T\cdot x_i)+b_{y_i}=(w_m^T\cdot x_i)+b_m+2-e_i^m$$
Furthermore, instead of incorporating the constraint function into the objective function as proposed by LSSVM, we use Eq. (6) to find an expression for the slack variable in terms of the hyperplane parameters, which is substituted into Eq. (5) in a manner similar to [19]. The optimization problem represented in Eq. (7) now becomes fundamentally different from the standard SVM.
$$L(w,b)=\frac{1}{2}\sum_{m=1}^{c}\left(w_m\cdot w_m+b_m\cdot b_m\right)+\frac{\lambda}{2c}\sum_{i=1}^{N}\sum_{m\ne y_i}^{c}\zeta^m\left((w_{y_i}-w_m)x_i+(b_{y_i}-b_m)-2\right)^2$$
With the proposed changes, and by dropping the Lagrange multipliers, the problem reduces to an unconstrained optimization whose solution is obtained by setting the rate of change of the objective function to zero. This makes it faster than the standard LSSVM, which is known to converge more slowly than neural networks for a given generalization performance. In a traditional SVM, nonzero Lagrange multipliers correspond to the SVs that summarize the training data set. By storing only the SVs and discarding the training set after the classifier model has been established, the storage requirement is reduced relative to storing the complete training set. However, depending on the classification task, the number of SVs can still be large, and the order of operations needed for N training points, with f the dimension of the feature space and N_s the total number of SVs, ranges from $(N_s^3+N_s^2 l+N_s f N)$ to $(N_s^2+N_s f N)$ depending on the SV locations with respect to the hyperplanes [4]. WPSVM does not numerically solve for the support vectors and their nonzero Lagrange multipliers. Instead, it classifies points by assigning them to the closest of the parallel planes without explicitly calculating the SVs. The hyperplanes are still pushed apart by a maximum margin w, but the data are clustered around the planes rather than lying on them. The uniqueness of the global solution remains valid because it is a property of the Hessian being positive definite or semidefinite [28].
Figure 1a represents a standard multiclass SVM with the SVs lying on the hyperplanes, whereas Figure 1b illustrates the proposed WPSVM, where the data points are instead clustered around the hyperplanes.
The mathematical steps for the optimization start by solving for the partial derivatives of L(w,b) with respect to both w and b.
$$\frac{\partial L(w,b)}{\partial {w}_{n}}=0,\frac{\partial L(w,b)}{\partial {b}_{n}}=0$$
Defining the indicator variable $a_i$ ($a_i=1$ when $y_i=n$ and $a_i=0$ otherwise), Eq. (8) becomes:
$$\left\{\begin{array}{l}\dfrac{c\,w_n}{\lambda}-\displaystyle\sum_{i=1}^{N}\Big[\big(x_i x_i^T(w_{y_i}-w_n)+x_i(b_{y_i}-b_n)-2x_i\big)(1-a_i)-\sum_{m\ne y_i}^{c}\zeta^m\big(x_i x_i^T(w_{y_i}-w_m)+x_i(b_{y_i}-b_m)-2x_i\big)a_i\Big]=0\\[8pt] \dfrac{c\,b_n}{\lambda}-\displaystyle\sum_{i=1}^{N}\Big[\big(x_i^T(w_{y_i}-w_n)+(b_{y_i}-b_n)-2\big)(1-a_i)-\sum_{m\ne y_i}^{c}\zeta^m\big(x_i^T(w_{y_i}-w_m)+(b_{y_i}-b_m)-2\big)a_i\Big]=0\end{array}\right.$$
Using another definition,

$$S_w:=\sum_{i=1}^{N}\Big[(w_{y_i}-w_n)x_i x_i^T(1-a_i)-\sum_{m\ne y_i}^{c}\zeta^m(w_{y_i}-w_m)x_i x_i^T a_i\Big]$$

and letting q(n) denote the size of class n, we can rewrite S_w as:

$$S_w=\sum_{i=1}^{N}(w_{y_i}-w_n)x_i x_i^T-\sum_{p=1}^{q(n)}x_{i_p}x_{i_p}^T\sum_{m=1}^{c}\zeta^m(w_n-w_m)$$
A similar argument shows that:
$$\begin{array}{l}S_b:=\displaystyle\sum_{i=1}^{N}\Big[(b_{y_i}-b_n)x_i(1-a_i)-\sum_{m\ne y_i}^{c}\zeta^m(b_{y_i}-b_m)x_i a_i\Big]\\[6pt] \Rightarrow S_b=\displaystyle\sum_{i=1}^{N}(b_{y_i}-b_n)x_i-\sum_{p=1}^{q(n)}x_{i_p}\sum_{m=1}^{c}\zeta^m(b_n-b_m)\\[6pt] \mathrm{and}\quad S_2:=\displaystyle\sum_{i=1}^{N}\Big[2x_i(1-a_i)-\sum_{m\ne y_i}^{c}2\zeta^m x_i a_i\Big]\\[6pt] \Rightarrow S_2=\displaystyle\sum_{i=1}^{N}2x_i-\sum_{p=1}^{q(n)}2x_{i_p}-\sum_{p=1}^{q(n)}\sum_{m=1}^{c}2\zeta^m x_{i_p}=2\sum_{i=1}^{N}x_i-2(1+c)\sum_{p=1}^{q(n)}\zeta^m x_{i_p}\end{array}$$
Applying a similar reasoning for b, we can rearrange Eq. (9) to obtain:
$$\left\{\begin{array}{l}\Big(\dfrac{c\,I}{\lambda}+\displaystyle\sum_{i=1}^{N}x_i x_i^T+c\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}x_{i_p}^T\Big)w_n+b_n\Big(\sum_{i=1}^{N}x_i+c\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}\Big)=\\ \quad\displaystyle\sum_{i=1}^{N}x_i x_i^T w_{y_i}+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}x_{i_p}^T\sum_{m=1}^{c}w_m+\sum_{i=1}^{N}x_i b_{y_i}+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}\sum_{m=1}^{c}b_m+2\sum_{i=1}^{N}x_i-2(1+c)\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}\\[8pt] \Big(\displaystyle\sum_{i=1}^{N}x_i^T+c\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}^T\Big)w_n+b_n\Big(\dfrac{c}{\lambda}+N+c\,q(n)\Big)=\\ \quad\displaystyle\sum_{i=1}^{N}x_i^T w_{y_i}+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}^T\sum_{m=1}^{c}w_m+\sum_{i=1}^{N}b_{y_i}+q(n)\sum_{m=1}^{c}b_m-2\big(N-c\,q(n)\big)\end{array}\right.$$
To rewrite Eq. (10) in a matrix form, we use the series of definitions as described in Table 1.
Matrix Symbol  Matrix Element 
C  Diagonal matrix of size (f*c) by (f*c), the diagonal elements are composed of the square matrix c_{n} which is of size f: ${c}_{n}=\frac{c.I}{\lambda}+{\displaystyle \sum _{i=1}^{N}{x}_{i}{x}_{i}^{T}}+c{\displaystyle \sum _{p=1}^{q(n)}{\zeta}^{ip}{x}_{ip}{x}_{{i}_{p}}^{T}}$ 
D  Diagonal matrix of size (f*c) by c, the diagonal elements are the column vector d_{n} of length f ${d}_{n}={\displaystyle \sum _{i=1}^{N}{x}_{i}+}c{\displaystyle \sum _{p=1}^{q(n)}{\zeta}^{ip}{x}_{{i}_{p}}}$ 
E  Column vector of size c made from ${e}_{n}=2\sum_{i=1}^{N}x_i-2(1+c)\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}$ 
H  Matrix of size (f*c) by c. The row vector is h_n of length c and of the form ${h}_{n}=\left[\sum_{p=1}^{q(1)}\zeta^{1}x_{i_p}+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}\quad \sum_{p=1}^{q(2)}\zeta^{2}x_{i_p}+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}\quad \dots\quad \sum_{p=1}^{q(c)}\zeta^{c}x_{i_p}+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}\right]$ 
G  Square matrix of size (f*c) by (f*c), composed of matrix g_n of size f by c such that ${g}_{n}=\left[\Big(\sum_{p=1}^{q(1)}\zeta^{1}x_{i_p}x_{i_p}^T+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}x_{i_p}^T\Big)\quad \dots\quad \Big(\sum_{p=1}^{q(c)}\zeta^{c}x_{i_p}x_{i_p}^T+\sum_{p=1}^{q(n)}\zeta^{i_p}x_{i_p}x_{i_p}^T\Big)\right]$ 
Q  Square matrix of size c, made from the row vector q_{n} of length c q_{n} = [(q(1)+q(n)) ... (q(c)+q(n))] 
U  Column vector of size c, made from u_{n} such that u_{n} = −2(N −cq(n)) 
R  Square diagonal matrix of size c, the diagonal elements r_n are as follows: ${r}_{n}=\frac{c}{\lambda}+N+c\,q(n)$ 
The above definitions allow us to manipulate Eq. (10) and rewrite it as a system of equations:
$$\left\{\begin{array}{l}(C-G)W+(D-H)B=E\\ (D-H)^T W+(R-Q)B=U\end{array}\right.$$
Solving these equations for W and B, we obtain:
$$\left[\begin{array}{c}W\\ B\end{array}\right]={\left[\begin{array}{cc}(C-G)& (D-H)\\ (D-H)^T& (R-Q)\end{array}\right]}^{-1}\left[\begin{array}{c}E\\ U\end{array}\right]$$
We define matrix A to be:

$$A=\left[\begin{array}{cc}(C-G)& (D-H)\\ (D-H)^T& (R-Q)\end{array}\right]$$

and L to be:

$$L=\left[\begin{array}{c}E\\ U\end{array}\right]$$
These definitions allow us to rewrite Eq. (12) in a very compact form:
$$\left[\begin{array}{c}W\\ B\end{array}\right]={A}^{1}L$$
Eq. (15) provides the separating hyperplane slopes and the intercept values for the c different classes. The hyperplane parameters are uniquely defined by the matrices A and L, and do not depend on the SVs or the Lagrange multipliers.
A data point is tested against the decision function shown in Eq. (16) and is assigned to the class that shows the highest output value:
$$\mathrm{Class\; of\; x}\equiv \mathrm{arg}{\mathrm{max}}_{i=1,\cdots ,c}({w}_{i}^{T}.x+{b}_{i})$$
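The decision rule of Eq. (16) amounts to a single argmax over the class hyperplane outputs. A minimal sketch, with made-up W and B standing in for the solution of Eq. (15):

```python
import numpy as np

# Sketch of the decision rule of Eq. (16): assign x to the class whose
# hyperplane output w_i^T x + b_i is largest. W and B are invented
# stand-ins for parameters obtained from [W; B] = A^{-1} L.
W = np.array([[ 1.0,  0.0],   # class 1 slope
              [ 0.0,  1.0],   # class 2 slope
              [-1.0, -1.0]])  # class 3 slope
B = np.array([0.1, 0.0, -0.2])

def classify(x, W, B):
    """Return the 1-based index of argmax_i (w_i^T x + b_i)."""
    return int(np.argmax(W @ x + B)) + 1

print(classify(np.array([3.0, 0.5]), W, B))  # class 1 wins: 3.0 + 0.1 = 3.1
```

Once W and B are stored, classification costs only c dot products per test point, with no kernel evaluations against stored training data.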
3.2. Proposed WPSVM
Once the hyperplane slopes have been defined, incorporating recently acquired image data into a traditional LSSVM model necessitates a full retraining of the system in order to calculate the new model parameters:
$${\left[\begin{array}{c}W\\ B\end{array}\right]}_{new}={\left[\begin{array}{cc}(C_{new}-G_{new})& (D_{new}-H_{new})\\ (D_{new}-H_{new})^T& (R_{new}-Q_{new})\end{array}\right]}^{-1}\left[\begin{array}{c}E_{new}\\ U_{new}\end{array}\right]$$
For large data sets, such retraining is not efficient: it is expensive in terms of memory and computation time. To maintain an acceptable balance among storage, accuracy, and computation time, we propose the WPSVM, a dynamic Weighted Proximal SVM approach. Whenever the model needs to be updated, each incremental sequence alters the matrices C, G, D, H, E, R, Q, and U as defined in Eqs. (13) and (14) by the amounts ΔC, ΔG, ΔD, ΔH, ΔE, ΔR, ΔQ, and ΔU, respectively. As an example, let us consider a recently acquired data set x_{N+1} that belongs to class t. Eq. (17) then becomes:
$${\left[\begin{array}{c}W\\ B\end{array}\right]}_{new}={\left[\begin{array}{cc}(C+\mathsf{\Delta}C)-(G+\mathsf{\Delta}G)& (D+\mathsf{\Delta}D)-(H+\mathsf{\Delta}H)\\ \big((D+\mathsf{\Delta}D)-(H+\mathsf{\Delta}H)\big)^T& (R+\mathsf{\Delta}R)-(Q+\mathsf{\Delta}Q)\end{array}\right]}^{-1}\left[\begin{array}{c}E+\mathsf{\Delta}E\\ U+\mathsf{\Delta}U\end{array}\right]$$
In order to adequately capture the effect of the newly acquired sequences, and to ensure that their impact on the hyperplane orientations W and B is accounted for despite the unbalanced classes, we scale the incremental changes ΔC, ΔG, ΔD, ΔH, ΔE, ΔR, ΔQ, and ΔU by weight factors (Ψ). The basic idea of the WPSVM is to assign an entropy measure to each incremental data point. These weight factors are determined by the misclassification rate, the relative importance of a dynamic data point with respect to its class, and its variance with respect to the other classes. The proposed weight factors are defined as:
$${\psi}_{fc}=(\nu .\mathrm{log}\frac{N}{\nu})/\sqrt{\mathrm{min}\mathrm{arg}({s}_{fc}^{2}.{\zeta}^{c})}$$
We define ν as the frequency of the incremental data sequence acquired, and N as the total number of sequence data used to determine the initial hyperplane parameters of the model. The ${s}_{fc}^{2}$ factor is the Mahalanobis distance between an incremental data feature f and the hyperplane parameters for class c, scaled by ζ^c, which represents the error rate observed in class c before the introduction of the new incremental data. Eq. (18) ensures that different data points have different impacts on the classifier parameters, and that data points which have a low probability of occurrence, but which are nevertheless important with respect to the hyperplane position, are not outnumbered and neglected in the dynamic model update process.
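As a hedged numerical sketch of the weight factor of Eq. (18), the fragment below uses invented values throughout; the per-class `s2` and `zeta` lists stand in for the Mahalanobis distances and pre-update error rates described above:

```python
import math

# Sketch of Eq. (18): psi = (nu * log(N / nu)) / sqrt(min(s2 * zeta)).
# nu, N, s2_by_class, and zeta_by_class are all invented example values,
# not data from the paper.
def weight_factor(nu, N, s2_by_class, zeta_by_class):
    """nu: frequency of the incremental sequence; N: size of the initial
    training stream; s2/zeta: per-class distance and error rate."""
    scaled = [s2 * z for s2, z in zip(s2_by_class, zeta_by_class)]
    return (nu * math.log(N / nu)) / math.sqrt(min(scaled))

psi = weight_factor(nu=5, N=1000,
                    s2_by_class=[4.0, 9.0, 16.0],
                    zeta_by_class=[0.10, 0.05, 0.02])

# Rare sequences (small nu) that sit close to a low-error class
# (small min(s2 * zeta)) receive a large weight.
print(round(psi, 3))
```

The log term grows as the sequence becomes rarer relative to N, which is how low-probability but informative points avoid being drowned out during the update.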
1) Dynamic Processing for Sequential Data
Sequential data refers to incremental data being acquired and processed serially as they are acquired. To assist in the mathematical manipulation, we define the following matrices:
$${I}_{c}=\left[\begin{array}{cccccc}1& 0& 0& 0& \cdots& 0\\ 0& 1& 0& 0& \cdots& 0\\ \vdots& & & & & \vdots\\ 0& 0& 0& 1+c& \cdots& 0\\ \vdots& & & & & \vdots\\ 0& 0& 0& 0& \cdots& 1\end{array}\right];\quad {I}_{t}=\left[\begin{array}{cccccc}0& 0& \cdots& 1& \cdots& 0\\ 0& 0& \cdots& 1& \cdots& 0\\ \vdots& & & & & \vdots\\ 1& 1& \cdots& 2& \cdots& 1\\ \vdots& & & & & \vdots\\ 0& 0& \cdots& 1& \cdots& 0\end{array}\right];\quad {I}_{e}=\left[\begin{array}{c}1\\ 1\\ \vdots\\ 1-c\\ \vdots\\ 1\end{array}\right]$$
We can then rewrite the incremental change as follows:
$$\mathsf{\Delta}C=\mathsf{\Psi}({x}_{N+1}{x}_{N+1}^{T}){I}_{c};\text{}\mathsf{\Delta}G=\mathsf{\Psi}({x}_{N+1}{x}_{N+1}^{T}){I}_{t}$$
$$\mathsf{\Delta}D=\mathsf{\Psi}{x}_{N+1}{I}_{c};\text{}\mathsf{\Delta}H=\mathsf{\Psi}{x}_{N+1}^{T}{I}_{t};$$
$$\mathsf{\Delta}E=2\mathsf{\Psi}{x}_{N+1}{I}_{e};\text{}\mathsf{\Delta}R={I}_{c};$$
$$\mathsf{\Delta}Q={I}_{t};\text{}\mathsf{\Delta}U=2{I}_{e}.$$
The dynamic model parameters now become:
$${\left[\begin{array}{c}W\\ B\end{array}\right]}_{new}={\left[A+\left[\begin{array}{cc}\mathsf{\Psi}({x}_{N+1}{x}_{N+1}^{T})({I}_{c}-{I}_{t})& \mathsf{\Psi}{x}_{N+1}^{T}({I}_{c}-{I}_{t})\\ \mathsf{\Psi}{x}_{N+1}^{T}({I}_{c}-{I}_{t})& ({I}_{c}-{I}_{t})\end{array}\right]\right]}^{-1}\left[L+\left[\begin{array}{c}2\mathsf{\Psi}{x}_{N+1}{I}_{e}\\ 2{I}_{e}\end{array}\right]\right]$$
$$\mathrm{Let}\quad \mathsf{\Delta}A=\left[\begin{array}{cc}\mathsf{\Psi}({x}_{N+1}{x}_{N+1}^{T})({I}_{c}-{I}_{t})& \mathsf{\Psi}{x}_{N+1}^{T}({I}_{c}-{I}_{t})\\ \mathsf{\Psi}{x}_{N+1}^{T}({I}_{c}-{I}_{t})& ({I}_{c}-{I}_{t})\end{array}\right]\quad \mathrm{and}\quad \mathsf{\Delta}L=\left[\begin{array}{c}2\mathsf{\Psi}{x}_{N+1}{I}_{e}\\ 2{I}_{e}\end{array}\right]$$
We thus can rewrite Eq. (15) to reflect incremental learning:
$${\left[\begin{array}{c}W\\ B\end{array}\right]}_{new}={(A+\mathsf{\Delta}A)}^{1}(L+\mathsf{\Delta}L)$$
Eq. (19) shows that the separating hyperplane slopes and intercepts of Eq. (15) for the c different classes can be efficiently updated using the old model parameters. The incremental change introduced by the recently acquired data stream is incorporated as a weighted ‘perturbation’ of the initially established system parameters. Any changes in ΔA are absorbed by the changes in ΔL, and vice versa. L(w,b) remains convex, and the proposed solution still satisfies the KKT conditions.
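The update of Eq. (19) can be sketched as follows. Here A, L, and the perturbations are random stand-ins of our own (not the structured matrices of Table 1); the point of the sketch is that the stored (A, L) pair, not the raw training set, is all the update needs:

```python
import numpy as np

# Sketch of the incremental update of Eq. (19): instead of rebuilding the
# model from all data, perturb the stored system (A, L) and re-solve.
# A, L, dA, dL are invented stand-ins for the matrices defined in the text.
rng = np.random.default_rng(2)
n = 6
A = np.eye(n) * 5 + rng.normal(size=(n, n)) * 0.1  # well-conditioned "old" system
L = rng.normal(size=n)

params_old = np.linalg.solve(A, L)                 # [W; B] from Eq. (15)

# A newly acquired, weighted sample contributes small perturbations
dA = rng.normal(size=(n, n)) * 0.01
dL = rng.normal(size=n) * 0.01

params_new = np.linalg.solve(A + dA, L + dL)       # Eq. (19)

# Small, weighted perturbations nudge rather than overturn the hyperplanes
print(np.linalg.norm(params_new - params_old) < 0.5)
```

Only (A, L) and the per-sample deltas are kept in memory, which matches the paper's claim that the hyperplane parameters are the sole storage requirement.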
2) Dynamic Processing for Chunk Data
For incremental chunk processing, the data are still acquired incrementally, but they are stored in a buffer awaiting batch processing. To update the model after capturing k sequences, the recently acquired data can be processed and the model updated as described in Eq. (19). Alternatively, we can use the Sherman-Morrison-Woodbury (SMW) [30] generalization formula to account for the perturbation introduced by matrices M and L, provided that ${(I+{M}^{T}{A}^{-1}L)}^{-1}$ exists. In this case, the SMW generalization formula is

$${(A+L{M}^{T})}^{-1}={A}^{-1}-{A}^{-1}L{(I+{M}^{T}{A}^{-1}L)}^{-1}{M}^{T}{A}^{-1}$$

where

$$M=\psi \left[\begin{array}{c}{x}_{N+1}({I}_{c}-{I}_{t})\\ ({I}_{c}-{I}_{t})\end{array}\right];\quad L=\psi {\left[\begin{array}{c}{x}_{N+1}\\ I\end{array}\right]}^{T}$$
Using Eqs. (17) and (20), the new model can represent the incrementally acquired sequences as follows:
$${\left[\begin{array}{c}W\\ B\end{array}\right]}_{new}={\left[\begin{array}{c}W\\ B\end{array}\right]}_{old}+\left[\begin{array}{c}\mathsf{\Delta}E\\ \mathsf{\Delta}U\end{array}\right]+\left[\left[\begin{array}{c}\mathsf{\Delta}E\\ \mathsf{\Delta}U\end{array}\right]-{\left[\begin{array}{c}W\\ B\end{array}\right]}_{old}\right]\left[I-{A}^{-1}M{(I+{M}^{T}{A}^{-1}L)}^{-1}{M}^{T}{A}^{-1}\right]$$
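The SMW identity invoked above can be checked numerically. In the sketch below, A, L, and M are random stand-ins of our own (k plays the role of the chunk size); the point is that only a k × k matrix must be inverted per update:

```python
import numpy as np

# Numerical check of the Sherman-Morrison-Woodbury identity:
# (A + L M^T)^{-1} = A^{-1} - A^{-1} L (I + M^T A^{-1} L)^{-1} M^T A^{-1}
# A, Lm, M are invented; k stands in for the size of the buffered chunk.
rng = np.random.default_rng(3)
n, k = 8, 3
A = np.eye(n) * 4 + rng.normal(size=(n, n)) * 0.1
Lm = rng.normal(size=(n, k))
M = rng.normal(size=(n, k))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + Lm @ M.T)                  # direct n x n inverse
rhs = Ainv - Ainv @ Lm @ np.linalg.inv(np.eye(k) + M.T @ Ainv @ Lm) @ M.T @ Ainv

print(np.allclose(lhs, rhs))  # True: only a k x k inverse is needed per update
```

When A^{-1} is cached from the initial training, each chunk update therefore costs an inverse of size k rather than of the full (f·c + c)-dimensional system.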
Eq. (21) shows the influence of the incremental data on calculating the new separating hyperplane slopes and intercept values for the c different classes. The proposed WPSVM meets all the main requirements for online learning and uses the learned knowledge to incorporate new ‘experiences’ in a computationally efficient manner. The left subfigure (Figure 2a) represents the plane orientation before the acquisition of x_{N+1}, whereas the right subfigure (Figure 2b) shows the effect of x_{N+1} in shifting the plane orientations whenever an update is necessary.
Table 2 depicts the workflow of the WPSVM classifier.
Step #  Algorithm 
Step 1  Train the initial model using TrainSet, which consists of N patient data sets, each having f features. $\left[\begin{array}{c}W\\ B\end{array}\right]={A}^{-1}L;\quad A=\left[\begin{array}{cc}(C-G)& (D-H)\\ (D-H)^T& (R-Q)\end{array}\right];\quad L=\left[\begin{array}{c}E\\ U\end{array}\right]$ Store only W and B as Initial_Model. Discard TrainSet. 
Step 2  Acquire incremental data IncSet. 
Step 3  Validate the generalization performance using the decision function of Initial_Model with the independent TestSet: $f(x)=\mathrm{arg}\underset{m}{\mathrm{max}}(({w}_{m}^{T}\cdot x)+{b}_{m}),\quad m=1\dots c$

4. Experimental Results
4.1. Data Set Details and Feature Selection
To assess the classification accuracy of WPSVM, we used volumes of interest (VOIs) representing lesion candidates in clinical CTC data sets. The VOIs were labeled as true polyps (TP) and false positives (FP) by expert radiologists. The CTC data used were acquired with helical single-slice and multi-slice CT scanners (GE HiSpeed CTi, LightSpeed QX/I, and LightSpeed Ultra; GE Medical Systems, Milwaukee, WI). The patients’ colons were prepared with standard laxative pre-colonoscopy cleansing and scanned in supine and prone positions with collimations of 1.25–5.0 mm, reconstruction intervals of 1.0–5.0 mm, X-ray tube currents of 50–260 mAs with 120–140 kVp, in-plane voxel sizes of 0.51–0.94 mm, and a CT image matrix size of 512 × 512. Two CTC scan positions (supine and prone) are generally used for each patient to improve the specificity of polyp detection through improved differentiation of mobile residual stool from polypoid lesions [31,32]. We further divided the VOIs in the TP class into two categories: medium-size polyps of 6–9 mm (hereafter, TP1), and large polyps ≥10 mm (hereafter, TP2). This partition was determined by correlating the CTC data with colonoscopy reports. The motivation for this size-based partitioning is that in colorectal screening, large polyps are considered to require polypectomy, whereas for smaller polyps a follow-up surveillance may suffice. A total of 61 colonoscopy-confirmed polyps measured 6 mm or larger: 28 polyps were identified as TP1, and 33 polyps as TP2. The number of entries in the TP class is higher than the number of actual polyps because a lesion may be seen in both supine and prone positions, and because some large lesions could be represented by more than one detection.
Symbol  Name  Count 
DB1  Database 1  Class 1 (FP) = 8008; Class 2 (TP1) = 43; Class 3 (TP2) = 84 
VOI = 16 × 16 × 16 = 4096 features 
FP: False Positive; TP1: True Positive (medium-size polyps: 6–9 mm); TP2: True Positive (polyps ≥10 mm); VOI: Volume of Interest
To compare the classification performance of WPSVM with previously published CAD results, and to confine the variability to the classifier method itself, we used the technique proposed in our earlier work [2]. As for the feature extraction technique, we adopted the 3D CAD scheme, also developed earlier in [2], that extracts a thick region encompassing the entire colonic wall in an isotropic CTC volume. Discriminative geometric features (shape index, curvedness, CT value, gradient, gradient concentration, and directional gradient concentration, each of which is characterized by nine statistics) identify polyps at each voxel of the extracted colon and are used for detecting polyp candidates. Figure 3(a) represents an axial CT slice, where the white box indicates a region of interest with a polyp; Figure 3(b) is a magnification of the region of interest, with the polyp indicated by a white arrow. Folds are shown in light gray and the colonic wall in dark gray. Suspicious regions identified by connected components are further segmented by use of hysteresis thresholding followed by fuzzy clustering to distinguish true polyps from non-polyps.
4.2. Performance and Validation Criteria for WPSVM
Because ML algorithms involve a trade-off between the classification accuracy on training data and the generalization accuracy on novel data, and because FP occurrences are much more frequent than those of TP1 and TP2, we calculated four performance measurements: the confusion rate (Mis_Err), the True Positive Ratio, the True Negative Ratio, and the False Positive Ratio. These can be derived from the entries s_ij of the confusion matrix CM:
$$CM=\left[\begin{array}{ccc}{s}_{11}& {s}_{12}& {s}_{13}\\ {s}_{21}& {s}_{22}& {s}_{23}\\ {s}_{31}& {s}_{32}& {s}_{33}\end{array}\right]$$
Index i represents the correct class and index j the predicted class. Thus, s_{ij} represents the number of data belonging to class i that WPSVM classified as belonging to class j. The True Positive Ratio (TPR), also known as sensitivity, reflects how sensitive WPSVM is in detecting polyps, whereas the True Negative Ratio (TNR), also referred to as specificity, represents how accurately the classifier identifies false positives. The False Positive Ratio (FPR) is simply the complement of TNR, and Mis_Err is the overall misclassification rate.
$$Mis\_Err=\frac{{\displaystyle \sum _{i\ne j}{s}_{ij}}}{{\displaystyle \sum _{i,j}{s}_{ij}}},\quad TP{R}_{i}=\frac{{s}_{ii}}{{\displaystyle \sum _{j=1}^{3}{s}_{ij}}},\quad TN{R}_{j}=\frac{{s}_{jj}}{{\displaystyle \sum _{i=1}^{3}{s}_{ij}}},\quad \mathrm{and}\quad FPR=1-TNR$$
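These four measurements follow mechanically from the confusion matrix. A minimal sketch of the computation (the 3 × 3 matrix below is hypothetical; our reading takes TPR over rows, i.e., actual classes, and TNR over columns, i.e., predicted classes):

```python
import numpy as np

def confusion_metrics(cm):
    """Derive Mis_Err, per-class TPR, TNR, and FPR from a c-by-c confusion
    matrix whose entry s_ij counts class-i data that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    mis_err = (total - np.trace(cm)) / total   # off-diagonal mass = misclassifications
    tpr = np.diag(cm) / cm.sum(axis=1)         # s_ii over row sum (actual class i)
    tnr = np.diag(cm) / cm.sum(axis=0)         # s_jj over column sum (predicted class j)
    fpr = 1.0 - tnr
    return mis_err, tpr, tnr, fpr

# Hypothetical 3-class matrix (rows = ground truth: FP, TP1, TP2)
cm = [[90, 6, 4],
      [3, 40, 2],
      [2, 1, 52]]
mis_err, tpr, tnr, fpr = confusion_metrics(cm)   # mis_err = 18/200 = 0.09
```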
4.3. WPSVM Performance in Processing Chunk versus Sequential Data
To characterize the detection performance of WPSVM, we divided DB1 into three independent sets: a training set (hereafter, TrainSet), a testing set (hereafter, TestSet), and an incremental set (hereafter, IncSet), in a manner that preserves all data belonging to one patient within a single set. This validation technique ensures that no criterion optimized during the model training phase can optimistically bias the generalization performance measured in the validation step. We compared the CTC classification performance obtained when the ML model was retrained (hereafter, Retrain_Model) with that obtained when incremental learning using WPSVM was applied. In the latter case, the dynamic data were processed either in chunks (hereafter, Inc_Model) or by incorporating the data sequentially into the classifier (hereafter, Inc_Seq_Model). We also compared the confusion rate of a simple incremental SVM against the Retrain_Model. Table 4 summarizes the average results of 20 different experiments, as well as CPU requirements normalized to the Retrain_Model baseline; timings were measured with Matlab’s etime routine to keep the analysis independent of machine specifics.
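The patient-level split described above can be sketched as follows. The VOI representation, set fractions, and helper names are hypothetical; the essential property is that every VOI of a given patient lands in exactly one of the three sets:

```python
import random
from collections import defaultdict

def patient_level_split(vois, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (patient_id, voi) pairs into TrainSet/TestSet/IncSet such that
    no patient straddles two sets. Fractions apply to patients, not VOIs."""
    by_patient = defaultdict(list)
    for pid, voi in vois:
        by_patient[pid].append(voi)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)          # reproducible shuffle
    n = len(patients)
    cut1 = round(fractions[0] * n)
    cut2 = round((fractions[0] + fractions[1]) * n)
    groups = (patients[:cut1], patients[cut1:cut2], patients[cut2:])
    return [[v for p in g for v in by_patient[p]] for g in groups]

# Toy usage: 5 patients with 2 VOIs each
data = [(p, f"voi_{p}_{k}") for p in range(5) for k in range(2)]
train, test, inc = patient_level_split(data)
```

Splitting by patient rather than by VOI is what prevents the supine and prone detections of one lesion from appearing on both sides of the train/validation boundary.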
Table 4.
Normalized Confusion Rates and CPU Requirements with respect to Retrain_Model for Inc_Model, Inc_Seq_Model, and Incremental SVM.
 | Inc_Model | Inc_Seq_Model | Incremental SVM
Confusion Rate | 1.2 | 1.07 | 1.24
CPU Time | 0.62 | 0.675 | 0.687
The ratio of the confusion rate of Inc_Model to that of Retrain_Model was 1.2 on average. For sequential processing, the ratio of the confusion rate of WPSVM to that of Retrain_Model improved to 1.07 on average, which represents almost a 16% improvement over the chunk-processing scenario. Table 4 also shows the CPU usage times for the models, normalized with respect to the Retrain_Model CPU requirements. On average, the ratio of the CPU time of Inc_Model to that of Retrain_Model was 0.62. We also observed a marginal degradation in the CPU time for Inc_Seq_Model: the ratio of its CPU time to that of Retrain_Model was 0.675. Thus, the improvement in the sequential classifier’s accuracy came at the cost of increased CPU usage relative to chunk processing; however, this is a reasonable price to pay for enhancing polyp detection by almost 16%.
Online learning has the further advantage that incremental training can improve classifier accuracy by incorporating more data into the model than the baseline retrain method. To illustrate this, we started with an intentionally poor-performing Initial_Model to which WPSVM was applied iteratively in order to test model convergence.
As shown in Figure 4, the WPSVM convergence rate for TP2 toward an acceptable sensitivity level is mainly influenced by the size of IncSet. With larger IncSet sizes successively applied to Initial_Model, WPSVM adjusted the hyperplane positions faster and gradually learned to classify the CTC data without consuming resources for retraining, thus validating the viability of incremental learning for improving the model parameters after some iterative training.
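The retrain-free update at the core of this experiment can be sketched with the closed-form least-squares machinery underlying proximal SVMs [19]: the solution depends only on accumulated sufficient statistics, so each new chunk updates those accumulators and can then be discarded. This is a minimal binary sketch under stated assumptions; it omits the class weighting of WPSVM, and the penalty `nu` and the toy data are hypothetical:

```python
import numpy as np

class IncrementalPSVM:
    """Incremental least-squares/proximal SVM sketch. The classifier needs only
    the accumulators M = E'E and q = E'd (E is the bias-augmented data matrix,
    d the +/-1 labels), whose sizes are independent of the number of samples."""

    def __init__(self, n_features, nu=1.0):
        f = n_features + 1                   # +1 for the bias column
        self.M = np.zeros((f, f))            # accumulated E'E
        self.q = np.zeros(f)                 # accumulated E'd
        self.nu = nu

    def partial_fit(self, X, y):
        E = np.hstack([X, -np.ones((len(X), 1))])
        self.M += E.T @ E                    # fold the new chunk in, then forget it
        self.q += E.T @ y
        f = len(self.q)
        self.wb = np.linalg.solve(np.eye(f) / self.nu + self.M, self.q)
        return self

    def predict(self, X):
        E = np.hstack([X, -np.ones((len(X), 1))])
        return np.sign(E @ self.wb)

# Toy usage: two linearly separable chunks processed one after the other
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 0.1, (20, 2)) + [1.0, 1.0]      # class +1 chunk
X2 = rng.normal(0.0, 0.1, (20, 2)) + [-1.0, -1.0]    # class -1 chunk
clf = IncrementalPSVM(2).partial_fit(X1, np.ones(20)).partial_fit(X2, -np.ones(20))
```

Because only `M`, `q`, and the hyperplane parameters persist between chunks, storage stays constant regardless of how many CTC cases have been processed.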
4.4. WPSVM Specificity and Storage Requirements
Because CTC CAD data are often highly unbalanced with respect to the sizes of the classes TP1, TP2, and FP, the confusion rate alone does not fully demonstrate the effectiveness of WPSVM. We therefore investigated the sensitivity of WPSVM in detecting polyps while the penalty factors ζ and λ were varied. The main loss in classification accuracy occurred between classes TP1 and TP2: WPSVM identified the FP class correctly but failed to reach 100% detection sensitivity for TP1 and TP2. This is not surprising considering that these classes are not completely linearly separable and that kernel functions were not used to map the input feature space in the SVM procedure. Figure 5 compares the performance of Inc_Model and Retrain_Model in terms of receiver operating characteristic (ROC) curves. The curves depict the trade-off between the TPR and FPR for TP1, TP2, and FP.
Figure 5 indicates that as TPR increases, FPR increases as well; what makes WPSVM a good decision method is a setting of ζ that simultaneously yields reasonably high detection sensitivity and specificity. WPSVM reached sensitivities of 91% and 96% for TP1 and TP2, with specificities of 90.3% and 90%, respectively, and an average of 3.2 false positives per patient. Because the area under the ROC curve of WPSVM is greater than the area under the diagonal line that would represent a random guess, we can conclude that the obtained WPSVM ROC curves are informative and that WPSVM is a promising online learning algorithm for detecting polyps.
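The ROC curves of Figure 5 are traced by sweeping a decision threshold over the classifier scores for one TP class against the FP class. A minimal sketch (the scores and labels below are hypothetical, and the area-under-curve helper uses the trapezoidal rule):

```python
import numpy as np

def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by sweeping a threshold over classifier scores.
    labels: 1 for the polyp (TP) class, 0 for the non-polyp (FP) class."""
    order = np.argsort(-np.asarray(scores))   # descending score = loosening threshold
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)                   # true positives accepted so far
    fps = np.cumsum(1 - labels)               # false positives accepted so far
    return fps / (1 - labels).sum(), tps / labels.sum()

def auc(fpr, tpr):
    # Trapezoidal area under the ROC curve; > 0.5 beats the random-guess diagonal.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)         # auc(fpr, tpr) is 8/9 here
```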
Reference | Results | Settings
[33] | 95%, average of 1.5 false positives per patient | 72 patients, 144 data sets, 21 polyps ≥5 mm in 14 patients
[34] | 90.5%, average of 2.4 false positives per patient | 121 patients, 242 data sets, 42 polyps ≥5 mm in 28 patients
[35] | 80%, average of 8.2 false positives per patient | 18 patients, 15 polyps ≥5 mm in 9 patients
[36] | 100%, average of 7 false positives per patient | 8 patients, 7 polyps ≥10 mm in 4 patients
[36] | 50%, average of 7 false positives per patient | 8 patients, 11 polyps measuring 5–9 mm in 3 patients
[37] | 90%, average of 15.7 false positives per patient | 40 patients, 80 data sets, 39 polyps ≥3 mm in 20 patients
WPSVM | 93.4%, average of 3.2 false positives per patient | 169 patients, 28 polyps measuring 6–9 mm and 33 polyps ≥10 mm
On average, the detection performance reported by the CAD schemes used for binary classification in a non-dynamic setting, as shown in Table 5, has varied between 50% and 100% with 1.5 to 15.7 false positives per patient. The WPSVM results for false-positive findings per patient compare favorably with these published results, especially considering that WPSVM is applied as an online multiclassifier on a larger database. Note that multiclassification accuracy is expected to be lower than binary classification accuracy. The parametric model of the SVM allows for adjustments when constructing the discriminant function; however, for multiclass problems these parameters do not always fit well across the entire data set. This is partly explained by the fact that the VC dimension h, which impacts the generalization error, is bounded in terms of the hyperplane normal w and the radius R of the smallest sphere containing all the training points according to [28]: $h\le {R}^{2}{\Vert w\Vert}^{2}$
Finally, Table 6 compares the storage requirements of the Retrain_Model with those of the Inc_Model when WPSVM is applied. Over time, as the polyp database grows, the Retrain_Model requires memory space proportional to the number of CTC scans acquired, because the model must be retrained each time a new CTC scan arrives. For the Inc_Model, the memory space is reduced to the number of features f multiplied by the number of classes c.
Classifier Type | Data Structure Size
Retrain_Model | (1) permanent storage of size (N + incnum) × f that is always increasing
Inc_Model | (1) f × c for the classifier parameters; (2) temporary memory of size incnum × f for dynamic data while the classifier is not yet updated
incnum = number of dynamic data acquired
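The two storage formulas in Table 6 can be made concrete with the sizes of DB1 (8008 + 43 + 84 = 8135 VOIs, f = 4096 features, c = 3 classes); the incnum value below is a hypothetical pending chunk:

```python
def retrain_storage(N, incnum, f):
    # Retrain_Model keeps every acquired feature vector: grows with the archive.
    return (N + incnum) * f

def incremental_storage(f, c, incnum=0):
    # Inc_Model keeps only the f-by-c hyperplane parameters, plus a transient
    # incnum-by-f buffer while a pending chunk awaits incorporation.
    return f * c + incnum * f

N, f, c = 8135, 4096, 3
permanent = retrain_storage(N, incnum=100, f=f)   # keeps growing with each scan
constant = incremental_storage(f, c)              # fixed at f * c = 12288 entries
```

The first quantity grows without bound as CTC scans accumulate, while the second stays fixed, which is the storage advantage claimed for WPSVM.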
5. Conclusions
We presented a novel extension to LSSVM that provides a dynamic multiclassification framework for CTC classification. The ratio of the confusion rates of Inc_Seq_Model and Retrain_Model was 1.07 on average, and the CPU requirements of the WPSVM were 0.675 times those of the Retrain_Model. The accuracy of the proposed model was more constrained by the initial model accuracy when chunk learning rather than iterative learning was applied. Performance evaluation based on 169 clinical CTC cases, using a 3D computer-aided diagnosis (CAD) scheme for feature reduction, showed polyp detection sensitivities of 91% and 96% for 6–9 mm and ≥10 mm polyps, with specificities of 90.3% and 90%, respectively. We also showed that the storage requirements of WPSVM are drastically reduced compared to standard classification, because only the hyperplane parameters are required for updating the classifier. The experimental results demonstrate the capability of WPSVM in detecting polyps and motivate further work to improve performance accuracy and specificity measures, as well as to validate the detection rates on a larger TP database. Further developments will include the application of kernel methods to WPSVM and an adaptation of SVM as an image preprocessing technique for feature extraction. Future work will also involve validating WPSVM over a wider range of TP subclasses such as pedunculated, sessile, and flat polyps, and over a wider range of FP subclasses such as folds, stool, and tagged materials.
Acknowledgements
This work was partially supported by the University Research Grant provided by the American University of Beirut and by the Dean’s Office of the School of Engineering at Virginia Commonwealth University.
References and Notes
 Macari, M.; Bini, E.J. CT Colonography: Where Have We Been and Where Are We Going? Radiology 2005, 237, 819–833. [Google Scholar]
 Yoshida, H.; Näppi, J. Three-Dimensional Computer-Aided Diagnosis Scheme for Detection of Colonic Polyps. IEEE T. Med. Imaging 2001, 20, 1261–1274. [Google Scholar]
 Duda, R.; Hart, P.; Stork, D. Pattern Classification, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2001. [Google Scholar]
 Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: New York, NY, USA, 2000; pp. 64–87. [Google Scholar]
 Basu, S.; Bilenko, M.; Banerjee, A.; Mooney, R. Probabilistic Semi-Supervised Clustering with Constraints. In Semi-Supervised Learning; Chapelle, O., Scholkopf, B., Zien, A., Eds.; The MIT Press: New York, NY, USA, 2006; p. 72. [Google Scholar]
 Zou, A.; Wu, F.X.; Ding, J.R.; Poirier, G.G. Quality Assessment Of Tandem Mass Spectra Using Support Vector Machine. BMC Bioinformatics 2009, 10 Suppl. 1. [Google Scholar]
 Isa, D.; Lee, L.H.; Kallimani, V.P.; RajKumar, R. Text Document Preprocessing with the Bayes Formula for Classification Using the Support Vector Machine. IEEE T. Knowl. Data En. 2008, 20, 1264–1272. [Google Scholar] [CrossRef]
 Zhang, L.; Wei, Y.; Wang, Z. Prediction on Ecological Water Demand Based on Support Vector Machine. International Conference on Computer Science and Software Engineering 2008, 5, 1032–1035. [Google Scholar]
 Chen, S.H. A Support Vector Machine Approach for Detecting Gene-Gene Interaction. Genet. Epidemiol. 2007, 32, 152–167. [Google Scholar] [CrossRef] [PubMed]
 Yao, X.; Tham, L.G.; Dai, F.C. Landslide Susceptibility Mapping Based on Support Vector Machine: A Case Study On Natural Slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582. [Google Scholar] [CrossRef]
 Cheng, J.; Baldi, P. Improved Residue Contact Prediction Using Support Vector Machines And A Large Feature Set. BMC Bioinformatics 2007, 8, 113. [Google Scholar] [CrossRef] [PubMed]
 Ribeiro, B. Support Vector Machines For Quality Monitoring In A Plastic Injection Molding Process. IEEE T. Syst. Man Cy. C 2005, 35, 401–410. [Google Scholar] [CrossRef]
 Valentini, G. An Experimental Bias-Variance Analysis of SVM Ensembles Based on Resampling Techniques. IEEE T. Syst. Man Cy. B 2005, 35, 1252–1271. [Google Scholar] [CrossRef]
 Waring, C.; Liu, X. Face Detection Using Spectral Histograms and SVMs. IEEE T. Syst. Man Cy. B 2005, 35, 467–476. [Google Scholar] [CrossRef]
 Chakrabartty, S.; Cauwenberghs, G. Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation. Adv. Neural Information Processing Systems (NIPS'2004) 2005, 17. [Google Scholar]
 Dacheng, T.; Tang, X.; Li, X.; Wu, X. Asymmetric Bagging and Random Subspace for Support Vector Machines-Based Relevance Feedback in Image Retrieval. IEEE T. Pattern Anal. 2006, 28, 1088–1099. [Google Scholar] [CrossRef] [PubMed]
 Dong, J.X.; Krzyzak, A.; Suen, C.Y. Fast SVM Training Algorithm With Decomposition On Very Large Data Sets. IEEE T. Pattern Anal. 2005, 27, 1088–1099. [Google Scholar]
 Mao, K. Feature Subset Selection For Support Vector Machines Through Discriminative Function Pruning Analysis. IEEE T. Syst. Man Cy. B 2004, 34, 60–67. [Google Scholar] [CrossRef]
 Fung, G.; Mangasarian, O. Proximal Support Vector Machine Classifiers. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 26–29, 2001; pp. 77–86.
 Song, Q.; Hu, W.; Xie, W. Robust Support Vector Machine With Bullet Hole Image Classification. IEEE T. Syst. Man Cy. C 2002, 32, 440–448. [Google Scholar] [CrossRef]
 Hua, S.; Sun, Z. A Novel Method of Protein Secondary Structure Prediction With Light Segment Overlap Measure: Support Vector Machine Approach. J. Mol. Biol. 2001, 308, 397–407. [Google Scholar] [CrossRef] [PubMed]
 Matas, J.; Li, Y. P.; Kittler, J.; Jonsson, K. Support Vector Machines For Face Authentication. Image Vis. Comput. 2002, 20, 369–375. [Google Scholar]
 Chiu, D.Y.; Chen, P.J. Dynamically Exploring Internal Mechanism of Stock Market by Fuzzy-Based Support Vector Machines with High Dimension Input Space and Genetic Algorithm. IEEE Expert 2009, 36, 1240–1248. [Google Scholar] [CrossRef]
 Guo, X.; Yuan, Z.; Tian, B. Supplier Selection Based on Hierarchical Potential Support Vector Machine. IEEE Expert 2009, 36, 6978–6985. [Google Scholar]
 Yu, L.; Chen, H.; Wang, S.; Lai, K.K. Evolving Least Squares Support Vector Machines for Stock Market Trend Mining. IEEE T. Evolut. Comput. 2009, 13, 87–102. [Google Scholar]
 Gao, Z.; Lu, G.; Gu, D. A Novel P2P Traffic Identification Scheme Based on Support Vector Machine Fuzzy Network. Knowledge Discovery and Data Mining 2009, 909–912. [Google Scholar]
 Diehl, C.; Cauwenberghs, G. SVM Incremental Learning, Adaptation and Optimization. Proceedings of the International Joint Conference on Neural Networks 2003, 4, 2685–2690. [Google Scholar]
 Vapnik, V.N. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 2000. [Google Scholar]
 Hsu, C.; Lin, C. A Comparison of Methods for Multi-Class Support Vector Machines. IEEE T. Neural Networ. 2002, 13, 415–425. [Google Scholar]
 Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: London, UK, 1996. [Google Scholar]
 Chen, S. C.; Lu, D.S.; Hecht, J. R. CT Colonography: Value of Scanning in Both the Supine and Prone Positions. AJR 1999, 172, 595–599. [Google Scholar] [CrossRef] [PubMed]
 Nappi, J.; Okamura, A.; Frimmel, H.; Dachman, A.H.; Yoshida, H. Region-Based Supine-Prone Correspondence for the Reduction of False-Positive CAD Polyp Candidates in CT Colonography. Acad. Radiol. 2005, 12, 695–707. [Google Scholar] [CrossRef] [PubMed]
 Nappi, J.; Yoshida, H. Feature-Guided Analysis for Reduction of False Positives in CAD of Polyps for Computed Tomographic Colonography. Med. Phys. 2003, 30, 1592–1601. [Google Scholar] [CrossRef] [PubMed]
 Kiss, G.; Cleynenbreugel, J.; Thomeer, M.; Suetens, P.; Marchal, G. Computer-Aided Diagnosis in Virtual Colonography via Combination of Surface Normal and Sphere Fitting Methods. Eur. Radiol. 2002, 12, 77–81. [Google Scholar] [CrossRef] [PubMed]
 Paik, D.S.; Beaulieu, C.F.; Rubin, G.D.; Acar, B.; Jeffrey, R.B., Jr.; Yee, J.; Dey, J.; Napel, S. Surface Normal Overlap: a Computer Aided Detection Algorithm with Application to Colonic Polyps and Lung Nodules in Helical CT. IEEE Trans. Med. Imaging 2004, 23, 661–675. [Google Scholar] [CrossRef] [PubMed]
 Jerebko, A.K.; Summers, R.M.; Malley, J.D.; Franaszek, M.; Johnson, C.D. Computer Assisted Detection of Colonic Polyps with CT Colonography Using Neural Networks and Binary Classification Trees. Med. Phys. 2003, 30, 52–60. [Google Scholar] [CrossRef] [PubMed]
 Masutani, Y.; Yoshida, H.; MacEneaney, P.; Dachman, A. Automated Segmentation of Colonic Walls for Computerized Detection of Polyps in CT Colonography. J. Comput. Assist. Tomogr. 2001, 25, 629–638. [Google Scholar] [CrossRef]
© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).