MCMS-STM: An Extension of Support Tensor Machine for Multiclass Multiscale Object Recognition in Remote Sensing Images

Abstract: The support tensor machine (STM), extended from the support vector machine (SVM), can maintain the inherent information of a remote sensing image (RSI) represented as a tensor and obtain effective recognition results using a few training samples. However, the conventional STM is binary and fails to handle multiclass classification directly. In addition, the existing STMs cannot process objects of different sizes represented as multiscale tensors and have to resize object slices to a fixed size, causing excessive background interference or loss of the object's scale information. Therefore, the multiclass multiscale support tensor machine (MCMS-STM) is proposed to effectively recognize multiclass objects of different sizes in RSIs. To achieve multiclass classification, by embedding one-versus-rest and one-versus-one mechanisms, multiple hyperplanes described by rank-R tensors are built simultaneously instead of the single hyperplane described by a rank-1 tensor in STM to separate inputs of different classes. To handle multiscale objects, multiple slices of different sizes are extracted to cover the object with an unknown class and expressed as multiscale tensors. Then, M-dimensional hyperplanes are established to project the input of multiscale tensors into class space. To ensure efficient training of MCMS-STM, a decomposition algorithm is presented to break the complex dual problem of MCMS-STM into a series of analytic sub-optimizations. Using publicly available RSIs, the experimental results demonstrate that the MCMS-STM achieves 89.5% and 91.4% accuracy for classifying airplanes and ships of different classes and sizes, which outperforms typical SVM and STM methods.


Introduction

The diverse types and sizes of objects bring challenges to object recognition in remote sensing images (RSIs). Recently, owing mainly to their powerful feature abstraction ability, various deep learning technologies have achieved impressive success in different object recognition tasks, such as Fast R-CNN [1], Faster R-CNN [2], local attention based CNN [3], CNN [4], SSD [5] and YOLO [6]. In addition to object recognition, deep learning-based methods are widely used for a wide variety of classification tasks based on remote sensing data acquired by different sensors, such as graph convolutional neural network [7] based hyperspectral image classification [8] and multimodal deep learning-based multisource image classification [9]. Despite the recent advances, deep learning-based methods rely heavily on massive labeled samples. In comparison, machine learning-based object recognition methods can obtain effective results using a small number of samples.

For object recognition in RSIs, the general procedure consists of two steps, i.e., extracting an object slice using an object detection method and recognizing the type of object contained in the slice using a trained classifier. Since the slices extracted by object detection methods present different sizes, they conventionally have to be resized to a fixed size, which either enlarges the slice to contain more background interference or loses the object's scale information. Examples of the two types of resizing operations are given in Figure 1. It is seen that the second type of resizing operation loses the scale information of the ships and thus cannot identify ships according to their size feature, while the first type of resizing operation generates slices of a large size, so that the slice containing the ship of class 1 covers excessive background interference. To deal with slices of different sizes, a deep learning method, i.e., the scale-free convolutional neural network, has been built to utilize global average pooling to map feature maps to a unified size [32]. However, for representative machine learning methods, e.g., STM, there is no related work that can process input slices of different sizes. In comparison to image resizing, objects of different sizes should be contained by slices of proper sizes to reduce the impact of background interference and maintain the inherent scale information of the contained object, and these slices can be naturally represented as tensors of different dimensions, denoted as multiscale tensors in this paper. The existing STMs, however, can only process tensors of the same size and cannot handle slices of different sizes represented as multiscale tensors.
Motivated by the abovementioned issues, the multiclass multiscale support tensor machine (MCMS-STM) is proposed in this paper. To deal with multiclass classification, by integrating OVR and OVO strategies into the optimization problems, a new multiclass classification mechanism is constructed that uses multiple hyperplanes defined by rank-R tensors instead of the single hyperplane defined by a rank-1 tensor in STM, where each hyperplane is needed to separate samples of specific classes. Furthermore, to classify objects of different sizes, according to the positions of objects obtained from detection results, it is necessary to extract object slices of proper sizes rather than a fixed size to reduce the impact of background interference and maintain the inherent scale information of the contained object. These slices of different sizes can be naturally represented as tensors of different dimensions, denoted as multiscale tensors in this paper. Note that the existing STM methods can only process tensors of the same size and cannot process slices of different sizes represented as multiscale tensors. To deal with input of multiscale tensors, instead of the fixed-dimensional hyperplane used in STM, M-dimensional hyperplanes are built to separate input of multiscale tensors, and the resulting projection value is used to predict the class label of the input to achieve cross-scale object recognition. In addition, to train the OVO version of MCMS-STM efficiently, a decomposition algorithm is proposed to split the dual problem of the MCMS-STM into a series of sub-optimizations to accelerate the training.

The remainder of this paper is organized as follows. Section 2 consists of some preliminaries, such as the basic definitions and notions, and the classical SVM and STM methods. In Section 3, the OVR and OVO versions of MCMS-STM and the corresponding solving methods are presented. Then, the decomposition algorithm is constructed to accelerate the training of the OVO version of MCMS-STM, and the relationship between the multiclass classification mechanism used in MCMS-STM and the existing methods is discussed. In Section 4, experiments are conducted on publicly available RSIs to analyze the parameter setting and the impact of the image resizing operation and to evaluate the performance of the MCMS-STM. Our conclusion is given in Section 5.

Preliminaries and Related Work
Before presenting the MCMS-STM, the notations, abbreviations, the basic tensor algebra used throughout this paper, and the traditional SVM and STM are introduced briefly as follows.

Notations, Abbreviations, and Tensor Operation
A tensor is the extension of vectors and matrices to higher dimensions, and tensor algebra, also known as multilinear algebra, is the extension of linear algebra to multiway data. To distinguish tensors, matrices, and vectors, following the convention in [33], the symbols used throughout this paper are summarized in Table 1.

Table 1. The symbols and their corresponding descriptions used in this paper.

Symbol                                   Description
lowercase letters (e.g., x, y)           scalar
lowercase boldface letters (e.g., x, y)  vector
uppercase boldface letters (e.g., M)     matrix
calligraphic letters (e.g., X)           tensor

Then, we summarize all the notations and abbreviations used throughout this paper in Table 2.
Definition 1. Inner product of tensors: The inner product between tensors A ∈ R^{I_1×...×I_M} and B ∈ R^{I_1×...×I_M} is defined as the sum of the products of their corresponding entries, i.e.,

\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_M=1}^{I_M} \mathcal{A}(i_1, \ldots, i_M)\, \mathcal{B}(i_1, \ldots, i_M).

Definition 2. Mode-k product of tensor: Given tensor A ∈ R^{I_1×...×I_M} and matrix B ∈ R^{I_k×I_k}, the mode-k product between A and B is denoted as A ×_k B, whose result is the tensor C ∈ R^{I_1×...×I_k×...×I_M}, as calculated by Equation (3):

\mathcal{C}(i_1, \ldots, j, \ldots, i_M) = \sum_{i_k=1}^{I_k} \mathcal{A}(i_1, \ldots, i_k, \ldots, i_M)\, \mathbf{B}(j, i_k).
Definition 3. Outer product: The outer product of M vectors a_m ∈ R^{I_m}, m = 1, ..., M, is denoted as A = a_1 ∘ ... ∘ a_M, and the result is an M-order tensor A ∈ R^{I_1×...×I_M}, whose entry with coordinate (i_1, ..., i_M) is calculated by

\mathcal{A}(i_1, \ldots, i_M) = \mathbf{a}_1(i_1)\, \mathbf{a}_2(i_2) \cdots \mathbf{a}_M(i_M).
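For concreteness, Definitions 1–3 can be sketched in NumPy as follows; the function names are illustrative and not part of the paper's formulation.

```python
import numpy as np

def inner_product(A, B):
    """Definition 1: sum of products of corresponding entries."""
    return float(np.sum(A * B))

def mode_k_product(A, B, k):
    """Definition 2: mode-k product A x_k B, contracting mode k of
    tensor A with the columns of matrix B."""
    Ak = np.moveaxis(A, k, 0)                  # bring mode k to the front
    Ck = np.tensordot(B, Ak, axes=([1], [0]))  # contract over mode k
    return np.moveaxis(Ck, 0, k)               # restore the axis order

def outer_product(vectors):
    """Definition 3: outer product of M vectors -> M-order rank-1 tensor."""
    A = vectors[0]
    for v in vectors[1:]:
        A = np.multiply.outer(A, v)
    return A
```

For example, the mode-k product with an identity matrix leaves the tensor unchanged, and the outer product of two vectors reduces to the ordinary matrix outer product.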

Classical Binary and Multiclass Support Vector Machine
Consider a binary classification task with a training set containing N samples, i.e., {x_i, y_i}|_{i=1}^N, where x_i ∈ R^{I_1} and y_i ∈ {1, −1} denote the input vector of the ith sample and the corresponding label, respectively. SVM aims at learning the parameters of the classification hyperplane with the largest classification margin from the training set, which can be drawn using the following optimization problem:

\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \ y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i = 1, \ldots, N,
where w, b, ξ_i and C denote the normal vector of the classification hyperplane, the bias, the slack variable, and the regularization parameter, respectively. To obtain the optimal w, the dual problem can be drawn as follows (the detailed derivation can be found in [17]):

\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \boldsymbol{\alpha}^T \mathbf{Q} \boldsymbol{\alpha} \quad \text{s.t.} \ \sum_{i=1}^{N} \alpha_i y_i = 0, \ 0 \le \alpha_i \le C,

where α denotes the Lagrangian multipliers and Q is a positive semidefinite matrix with Q(i, j) = y_i y_j x_i^T x_j. After solving for the optimal α, w can be calculated by Equation (7):

\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i.

Then, the label of a test sample x can be predicted by Equation (8):

y = \text{sign}(\mathbf{w}^T \mathbf{x} + b),
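Equations (7) and (8) can be illustrated with a small NumPy sketch. The dual variables below are worked out by hand for a two-point toy problem and are purely illustrative; in practice they come from solving the dual problem.

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])   # training inputs x_i
y = np.array([1.0, -1.0])                  # labels y_i
alpha = np.array([0.25, 0.25])             # optimal dual variables (toy values)
b = 0.0                                    # bias from the KKT conditions

w = (alpha * y) @ X                        # Equation (7): w = sum_i a_i y_i x_i

def predict(x):
    return int(np.sign(w @ x + b))         # Equation (8)

print(predict(np.array([2.0, 3.0])))       # -> 1
print(predict(np.array([-1.0, -2.0])))     # -> -1
```

Here w recovers (0.5, 0.5), the maximum-margin separator for this toy set.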
where sign(·) denotes the sign function. The standard SVM can only deal with the bi-category classification problem, while most practical applications can be regarded as multiclass classification problems. Facing this situation, the multiclass SVM, i.e., an extension of SVM in multiclass classification problems, is established to identify samples with different classes.
Given the training set {x_i, y_i}|_{i=1}^N, where x_i and y_i ∈ {1, ..., M} denote the input vector of the ith sample and the corresponding class label, the multiclass SVM aims at learning M classification hyperplanes simultaneously by solving the following optimization problem:

\min_{\mathbf{w}_m, b_m, \boldsymbol{\xi}} \ \frac{1}{2} \sum_{m=1}^{M} \|\mathbf{w}_m\|^2 + C \sum_{i=1}^{N} \sum_{m \neq y_i} \xi_i^m \quad \text{s.t.} \ \mathbf{w}_{y_i}^T \mathbf{x}_i + b_{y_i} \ge \mathbf{w}_m^T \mathbf{x}_i + b_m + 2 - \xi_i^m, \ \xi_i^m \ge 0,

where w_m|_{m=1}^M, b_m and ξ_i^m denote the normal orientations of the M classification hyperplanes, the mth bias, and the slack variables, respectively. The exact solving method can be found in [22]. According to the resulting w_m|_{m=1}^M and b_m, the class label of a test sample x can be predicted using the following decision function:

y = \arg\max_{m} \ \left( \mathbf{w}_m^T \mathbf{x} + b_m \right).
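The decision function above simply assigns the class whose hyperplane yields the largest projection value. A minimal sketch, with illustrative hyperplane parameters standing in for trained ones:

```python
import numpy as np

# Illustrative hyperplane parameters, as if returned by multiclass SVM training.
W = np.array([[ 1.0,  0.0],    # w_1
              [-1.0,  0.0],    # w_2
              [ 0.0,  1.0]])   # w_3
b = np.zeros(3)

def predict(x):
    scores = W @ x + b                  # projection onto each hyperplane
    return int(np.argmax(scores)) + 1   # class labels are 1..M

print(predict(np.array([2.0, 0.5])))    # -> 1
print(predict(np.array([0.1, 3.0])))    # -> 3
```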

Support Tensor Machine
To better use the structural information of input data represented as a tensor, the STM is extended from SVM to directly separate input in tensor space.

Given a training set containing N samples {X_i, y_i}|_{i=1}^N, where X_i ∈ R^{I_1×...×I_M} denotes the input tensor and y_i ∈ {1, −1} denotes the corresponding label. Compared to SVM, which separates input in vector space, the STM utilizes the mode-k product to process input in tensor space. The corresponding optimization problem of STM is given in Equation (11):

\min_{\mathbf{w}_j, b, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}_1 \circ \ldots \circ \mathbf{w}_M\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \ y_i\left(\mathcal{X}_i \times_1 \mathbf{w}_1 \ldots \times_M \mathbf{w}_M + b\right) \ge 1 - \xi_i, \ \xi_i \ge 0,
where w_j ∈ R^{I_j} denotes the projection vector along the jth mode of the input tensor, and W = w_1 ∘ ... ∘ w_M ∈ R^{I_1×...×I_M} denotes the projection tensor used to indicate the normal orientation of the classification hyperplane. Since the parameter w_1 ∘ ... ∘ w_M of STM is an M-order tensor, it is difficult to solve directly. Therefore, an alternating optimization scheme is introduced to solve the above optimization, i.e., each w_{i_k} is optimized in turn while the rest w_i|_{i≠i_k} are fixed in the kth iteration, where i_k = mod(k, M) + 1 denotes the index of the mode to be optimized. The optimization problem in the kth iteration is equivalent to that of SVM (see Equation (5)), so it can be solved using an SVM solver. After the alternating optimization procedure terminates, the optimal parameters w_j|_{j=1}^M and b can be utilized to predict the label of a test sample X, as shown below:

y = \text{sign}\left(\mathcal{X} \times_1 \mathbf{w}_1 \ldots \times_M \mathbf{w}_M + b\right).
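The reduction at the heart of the alternating scheme—fixing all projection vectors except the one for mode j so that each input tensor contracts to an ordinary vector—can be sketched in NumPy as follows. The SVM solver that would then be applied is abstracted away, and the tensor values are illustrative.

```python
import numpy as np

def contract_except(X, ws, j):
    """Contract tensor X with every fixed projection vector w_k except
    mode j, leaving a vector of length I_j that an SVM solver can use."""
    out = X
    # Contract the higher modes first so remaining axis indices stay valid.
    for k in sorted([k for k in range(len(ws)) if k != j], reverse=True):
        out = np.tensordot(out, ws[k], axes=([k], [0]))
    return out

X = np.arange(24.0).reshape(2, 3, 4)            # a toy 3-order input tensor
ws = [np.ones(2), np.ones(3), np.ones(4)]       # current projection vectors
v = contract_except(X, ws, 1)                   # vector in R^3 for mode j = 1
print(v.shape)                                  # -> (3,)
```

After this contraction, optimizing w_j is exactly the vector SVM problem of Equation (5).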

Multiclass Multiscale Support Tensor Machine
As discussed above, the standard STM fails to deal with multiclass classification, and it requires resizing preprocessing to generate slices of the fixed size to comply with its input requirements, causing loss of scale information or increase of background interferences. To address multiclass multiscale object recognition in RSIs, the MCMS-STM is proposed by solving simultaneously hyperplanes of multiple dimensions defined by multiscale rank-R tensors to directly classify objects with different classes and sizes, as illustrated in Figure 2.

Construction of Multiclass Multiscale Support Tensor Machine
Starting from the standard STM, the classifier is extended gradually from two aspects, i.e., multiclass and multiscale, to form the proposed MCMS-STM.

Extend STM to Deal with Multiclass Classification
Consider the training set {X_i, y_i}|_{i=1}^N with N samples of M classes, where X_i and y_i ∈ {1, 2, ..., M} denote the L-order tensor representation of the ith image slice and the corresponding class label, respectively. Note that the standard STM can process input represented as a tensor but cannot handle multiclass classification directly. In contrast, the multiclass SVM can deal with multiclass classification but cannot process input represented as a tensor. To achieve multiclass classification in tensor space, a straightforward idea is to construct the following optimization problem to learn the parameters of multiple hyperplanes in tensor space simultaneously by integrating the merits of STM (see Equation (11)) and multiclass SVM (see Equation (9)).
where w_m^1 ∘ ... ∘ w_m^L ∈ R^{I_1×I_2×...×I_L} denotes the mth projection tensor used to determine the normal orientation of the mth hyperplane. Compared with the optimization problem of multiclass SVM (see Equation (9)), the optimization problem in Equation (13) allows the tensor X_i as input by constructing a projection tensor W_m with the same dimensions as X_i, while further modifications are essential to improve the performance of multiclass classification.
According to the definition of tensor rank, e.g., the CANDECOMP/PARAFAC (CP) rank [34], the tensor W_m is a rank-1 tensor. Referring to the studies in [28,29], which show that a single rank-1 tensor cannot describe the classification hyperplane accurately, multiple rank-1 tensors are utilized to improve the classification effect, as shown in Equation (14).
where R denotes the number of rank-1 projection tensors. In Equation (14), the hyperplane is defined by the sum of R rank-1 projection tensors (i.e., the rank-R projection tensor W_m). The illustration of hyperplanes defined by a rank-1 projection tensor and by a rank-R projection tensor is shown in Figure 3. It is seen that the linear combination of rank-1 projection tensors can produce hyperplanes with more orientations, which indicates that the rank-R projection tensor is likely to yield a more effective hyperplane.
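The construction in Equation (14)—a projection tensor formed as the sum of R rank-1 outer products—can be sketched as follows for a 2-order case; the factor vectors are illustrative.

```python
import numpy as np

def rank_r_tensor(factors):
    """Sum of R rank-1 outer products (Equation (14) style).
    factors: list of R tuples (w^1, ..., w^L) of factor vectors."""
    W = None
    for vecs in factors:
        T = vecs[0]
        for v in vecs[1:]:
            T = np.multiply.outer(T, v)   # build one rank-1 term
        W = T if W is None else W + T     # accumulate the R terms
    return W

# Two rank-1 terms in R^{2x2}; their sum has rank 2, so it can describe
# hyperplane orientations that a single rank-1 tensor cannot.
W = rank_r_tensor([(np.array([1.0, 0.0]), np.array([1.0, 0.0])),
                   (np.array([0.0, 1.0]), np.array([0.0, 1.0]))])
print(np.linalg.matrix_rank(W))   # -> 2
```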
Figure 3. Illustration of the classification hyperplane defined by a rank-1 projection tensor and a rank-R projection tensor. The blue solid line denotes the hyperplane defined by a rank-1 or rank-R tensor, and the blue dotted line denotes the hyperplane defined by a rank-R tensor.
Note that the multiclass classification mechanism of Equation (14) is equivalent to that of multiclass SVM (see Equation (9)). In other words, the multiclass classification mechanism of Equation (14) can be described as an OVR strategy that solves M hyperplanes simultaneously, of which the mth hyperplane, defined by W_m, is used to separate class m from the others. Considering the related research showing that OVO strategy based SVM often achieves better results than OVR strategy based SVM, the OVO strategy is also embedded into tensor space, in which the hyperplane defined by projection tensor W_{m,m'} is used to separate samples of class m from samples of class m'. In this way, the optimization problem in Equation (14) can be changed to the optimization problem in Equation (15), where W_{m,m'} and b_{m,m'} denote the normal orientation of the hyperplane used to separate class m from class m' and the corresponding bias, respectively. Since the OVR and OVO strategies have their own advantages, these two classification mechanisms are used as different versions of MCMS-STM to cope with different classification tasks.

Classify the Multiscale Objects without Image Resizing Preprocessing
For objects to be classified that present different sizes, the conventional manner is to adjust slices of different sizes to a fixed size, whereas this operation easily leads to the loss of scale information or an increase in background interference.
To maintain the scale information of objects and reduce the impact of background interference, slices of proper sizes need to be extracted to cover objects of each specific class, as shown in Figure 4. Assuming that each class of object presents a specific size, it is necessary to extract slices of M different sizes. Even for the same object, M slices of different sizes still need to be extracted, because the category of the test sample is unknown (see Figure 4b). Therefore, different from the conventional manner in which each object is described as a fixed-size slice, for each object, M slices of different sizes are extracted and represented as multiscale tensors. In this way, a multiscale sample set is formed, where the projection tensor with a specific dimension is used to separate the input slice of the corresponding scale. The classification model in Equation (15) can thus be further extended to identify multiscale objects without the image resizing operation, and the corresponding optimization problem is shown in Equation (16).
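The extraction of M slices of different sizes around a detected object can be sketched as follows; the image, center coordinates, and candidate sizes are illustrative placeholders, not values from the paper.

```python
import numpy as np

def extract_multiscale_slices(image, cy, cx, sizes):
    """Return one square slice per candidate scale, each centered on
    the detected object position (cy, cx)."""
    slices = []
    for s in sizes:
        h = s // 2
        top, left = max(cy - h, 0), max(cx - h, 0)  # clamp to image border
        slices.append(image[top:top + s, left:left + s])
    return slices

img = np.zeros((200, 200))                      # toy single-band image
multiscale = extract_multiscale_slices(img, 100, 100, sizes=[32, 64, 96])
print([s.shape for s in multiscale])            # -> [(32, 32), (64, 64), (96, 96)]
```

Each slice in the returned list corresponds to one scale of the multiscale tensor representation.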
Equation (16) is the final optimization problem of OVO version of MCMS-STM. For OVR version of MCMS-STM, the final optimization problem is given directly as follows.
Note that the OVR version of MCMS-STM only uses M hyperplanes defined by W_m to separate samples.
On the one hand, the proposed MCMS-STM can use multiscale tensors to capture the category and scale information of objects and utilize grouped rank-R tensors to improve the effectiveness of multiclass classification. On the other hand, the optimization problem of MCMS-STM is more complex than that of the binary STM, which requires a specific solving method to obtain the optimal parameters.

Solving of the Optimization Problem for MCMS-STM
Since the procedures of training OVO and OVR versions of MCMS-STM are similar, we only give the solving methods for the OVO version of the MCMS-STM in this section and put the solving methods for the OVR version of the MCMS-STM in the Appendix A.
To solve the optimization problem in Equation (16) effectively, an alternating optimization scheme [25] is adopted for model training, i.e., the projection vectors along each mode are optimized in turn while the others are fixed, and all projection vectors are initialized using uniformly distributed random numbers within 0 to 1. Then, we can obtain the Lagrangian function for the alternating optimization in the kth iteration as follows.
where α_i^m and β_i^m denote the dual variables, and α_i^{y_i} and β_i^{y_i} denote the dummy dual variables. Letting the partial derivatives of L with respect to the primal variables be zero yields Equations (19) and (20).

Substituting Equations (19) and (20) into Equation (18) yields the following Lagrangian dual problem. For convenience, the dual problem is rewritten in matrix form as follows.
It is found from Equation (23) that the dual problem of MCMS-STM in the kth iteration is a quadratic optimization in terms of a, which can be solved by a classical quadratic programming method (e.g., the interior-point method [35]). According to the resulting a, the w_{m,m'}^{(r,l_iter)} can be updated using Equation (19).
Note that the values of the objective function (see Equation (16)) in each iteration form a bounded and nonincreasing sequence, which therefore has a finite limit. To stop iterating at the proper time, the termination criteria are constructed in Equation (28).
According to the Karush–Kuhn–Tucker (KKT) conditions [36], including complementary slackness, dual feasibility, and the equivalent constraints in Equations (19) and (20), we obtain Equation (29). Combined with Equation (29), the equations and inequalities in terms of b can be drawn as follows. When the optimal solution is reached, the above equations and inequalities hold. Therefore, b can be calculated by the following linear programming problem.
where u denotes the slack variables used to maintain the feasibility of the linear programming. Using the classical simplex method [37], Equation (31) can be solved to obtain the optimal b. At this point, the training procedure of MCMS-STM is complete.

After training MCMS-STM, for any test sample {X^m}|_{m=1}^M, the decision function in Equation (32) is used to predict the corresponding label, where the operator #{·} denotes the number of elements in a set. Note that there may be multiple candidate classes that maximize Equation (32). In this situation, as in [20], we simply select the candidate class with the smallest label as the predicted class.
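The OVO-style counting rule behind Equation (32)—each pairwise hyperplane casts a vote, and the class with the most votes wins, with ties broken toward the smallest label—can be sketched as follows. The pairwise score table is illustrative, standing in for the projection values of the trained hyperplanes.

```python
import numpy as np

def ovo_predict(score, M):
    """score(m, mp) > 0 means the (m, mp) hyperplane votes for class m
    (classes indexed 0..M-1 internally, labels reported as 1..M)."""
    votes = np.zeros(M, dtype=int)
    for m in range(M):
        for mp in range(m + 1, M):
            if score(m, mp) > 0:
                votes[m] += 1
            else:
                votes[mp] += 1
    # np.argmax returns the first maximum, so ties go to the smallest label.
    return int(np.argmax(votes)) + 1

# Toy pairwise scores where class 2 beats both other classes.
table = {(0, 1): -1.0, (0, 2): 1.0, (1, 2): 1.0}
print(ovo_predict(lambda m, mp: table[(m, mp)], M=3))   # -> 2
```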

Acceleration of Training of OVO Version of MCMS-STM
In this section, combining the existing decomposition algorithm [38], a MCMS-STM-oriented decomposition algorithm is presented to break the complex quadratic programming (QP) for dual problem of MCMS-STM (see Equation (23)) into a series of simple analytic QP problems to train the OVO version of MCMS-STM efficiently.
To solve the QP in Equation (23) efficiently, based on the idea of a general decomposition algorithm, the dual variables a are split into working set and non-working set, where variables in the working set are updated, and those in the non-working set are fixed in each iteration. To select the proper variables as the working set, a graph-based constraints model is illustrated in Figure 5 to demonstrate the constraints in Equation (23) clearly.

where y_k^{(j,i)} = 1 and y_k^{(j,i')} = −1. It is noticed that the equivalent constraints in Equation (33) are equal to those in C-SVM (see Equation (6)). Therefore, similar to the variable selection strategy in C-SVM [38], the two variables that maximize the objective function in Equation (34) are selected as the working set. If no pair of variables can decrease f(a), it indicates that there are no proper working variables and the iteration stops; otherwise, the selected variables and the corresponding direction are taken as the working set and the updating direction, respectively. Then, the following optimization problem is solved to determine the optimal step size for updating.
where d_step, a_B and a_N denote the step size used for updating variables, the variables in the working set, and the variables in the non-working set, respectively. The Q_BB, Q_BN, Q_NB, Q_NN are the partitions of Q according to the selected working set. To obtain the optimal d_step, let the derivative of the objective function be zero, which yields Equation (36), where d*_step denotes the optimal step size. To avoid the updated a_B lying outside the range from 0 to C, the final step size is determined by Equation (37). Use a_B = a_B + d_step × [d_1, d_2] to update the variables in the working set. Then, use Equation (34) to re-select the working set for updating in the next iteration.
This process is performed iteratively until the termination criterion is met. Once the termination criterion is satisfied, the current a is optimal. The decomposition algorithm for solving the dual problem of the OVO version of the MCMS-STM is summarized in Algorithm 1.

Algorithm 1. Decomposition algorithm for solving the dual problem of the OVO version of the MCMS-STM.
Input: the Q of the dual problem of MCMS-STM and the labels y_i, 1 ≤ i ≤ N.
Output: optimal a for each vertex pair (i, j), i ≠ j.
Step 1: Select the working set using Equation (34).
Step 2: Calculate the step size using Equations (36) and (37), and update the variables in the working set.
Step 3: If the termination criterion is met, output a as the optimal solution and return; otherwise, go to Step 1.
end
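The working-set selection and clipped analytic update above follow the same pattern as SMO-style solvers for C-SVM. As an illustration only, the following sketch applies a one-variable working set to a simplified box-constrained QP, min 0.5 aᵀQa − eᵀa with 0 ≤ a_i ≤ C; the full MCMS-STM dual additionally carries equality constraints and a two-variable working set, which are omitted here for brevity:

```python
# Minimal sketch of a decomposition (working-set) solver for the box QP
#   min f(a) = 0.5 * a^T Q a - e^T a,  subject to 0 <= a_i <= C.
# A single-variable working set is used for brevity; the paper's algorithm
# selects two variables and also honors the equality constraints.

def solve_box_qp(Q, C, tol=1e-8, max_iter=10_000):
    n = len(Q)
    a = [0.0] * n
    g = [-1.0] * n               # gradient of f at a = 0 is Q a - e = -e
    for _ in range(max_iter):
        # Select the most KKT-violating variable as the working set.
        best, best_v = -1, tol
        for i in range(n):
            v = 0.0
            if a[i] < C and g[i] < 0:    # f decreases by increasing a_i
                v = -g[i]
            elif a[i] > 0 and g[i] > 0:  # f decreases by decreasing a_i
                v = g[i]
            if v > best_v:
                best, best_v = i, v
        if best < 0:                      # termination criterion met
            break
        i = best
        # Analytic step along coordinate i, then clip to [0, C]
        # (the clipping mirrors the final step-size rule in Equation (37)).
        new_ai = min(max(a[i] - g[i] / Q[i][i], 0.0), C)
        delta = new_ai - a[i]
        a[i] = new_ai
        for j in range(n):                # incremental gradient update
            g[j] += Q[j][i] * delta
    return a
```

Each iteration updates only the selected variable analytically and then refreshes the gradient incrementally, which is what makes the subproblems cheap compared with solving the whole QP at once with an interior-point method.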

Discussion of the Multiclass Classification Mechanism Used in MCMS-STM
To analyze the multiclass classification mechanism used in MCMS-STM in comparison with the existing OVO and OVR strategies, a detailed discussion is carried out as follows.

Discussion of OVR Version of MCMS-STM Compared with OVR Strategy Based STM
Considering the hard-margin (i.e., C = ∞) C-STM [25] with the OVR strategy, it is required to construct M classifiers, where the mth classifier separates class m from the other M−1 classes, as shown in the following optimization problem.
where w_m denotes the normal orientation of the mth hyperplane. Then, letting R = 1 and C = ∞ in the OVR version of MCMS-STM (see Equation (17)), the corresponding optimization problem can be converted to Equation (39).
By adding the two inequalities in Equation (38), it is easy to see that any feasible solution of Equation (38) is also a feasible solution of Equation (39). Thus, the solution of Equation (38) is a feasible, but not necessarily optimal, solution of Equation (39). This means that MCMS-STM may obtain an optimal solution with a larger classification margin than C-STM using the OVR strategy, i.e., better generalization ability.

Discussion of OVO Version of MCMS-STM Compared with OVO Strategy Based STM
When using the OVO strategy [20], the corresponding optimization problem is given in Equation (40). According to Equation (42), the constraints in Equation (41) can be converted to the same form as the constraints in Equation (40). Note that the feasible solutions of the different methods can be transformed into each other using simple operations. Therefore, the MCMS-STM has the OVO interpretation.
Through the above analysis, the MCMS-STM presents both OVR and OVO interpretations under specific parameter settings. More importantly, compared with the OVR and OVO strategies, which learn multiple classification hyperplanes separately, a remarkable advantage of the MCMS-STM is that it learns the multiple classification hyperplanes simultaneously and thus mines the correlation between classes.

Experiments and Analysis
To demonstrate the superiority of the MCMS-STM for multiclass multiscale object recognition, two datasets containing image slices of multiclass multiscale objects are used to evaluate its performance. The detailed information of the datasets is introduced as follows.
(1) Dataset 1: To verify the performance of MCMS-STM for multiclass multiscale airplane classification, RSIs containing 218 airplanes of five types are collected from the Google Earth service, with a spatial resolution of 0.5 m and R, G, and B spectral bands. Then, using two image resizing operations, these 218 airplanes are cut separately from the RSIs to build two slice sets. For slice set 1, generated by image resizing operation 1, the slices are cut according to the type of the contained object and then resized to a fixed size using bilinear interpolation. For slice set 2, generated by image resizing operation 2, the slices are cut at a size large enough (i.e., 120 × 120) that objects of all types are contained completely in the corresponding slice. These slices contain various backgrounds, and the contained multiclass airplanes present different orientations and sizes. Some representative slices from the two slice sets of dataset 1 are displayed in Figure 6. (2) Dataset 2: The HRSC-2016 [39] dataset contains 1070 harbor RSIs with R, G, and B spectral bands collected from the Google Earth service. To evaluate the performance of MCMS-STM for multiscale object recognition, 342 ships of five types, at a spatial resolution of 1.07 m, are sliced from HRSC-2016. Similar to dataset 1, these slices are cut by two image resizing operations to form two slice sets. For slice set 1, generated by image resizing operation 1, the slices are cut according to the type of the contained object and then resized to a fixed size using bilinear interpolation. For slice set 2, generated by image resizing operation 2, the slices are cut at a size large enough (i.e., 770 × 170) to contain ships of all types. Some image slices with different types of ships in the two slice sets of dataset 2 are shown in Figure 7.
The experiments are composed of five parts. In Section 5.1, the impact of the parameter setting is analyzed. In Section 5.2, the effect of image resizing preprocessing on the recognition results is examined. In Section 5.3, the efficiency of the proposed decomposition algorithm is verified in comparison with the typical interior-point and active-set methods. In Section 5.4, the recognition accuracy of MCMS-STM is evaluated against typical SVM and STM methods using dataset 1 and dataset 2. In Section 5.5, the performance of MCMS-STM is further evaluated against typical deep learning methods. All the simulations are run on a computer with the Windows 10 operating system and an Intel i7-7700 CPU at 3.6 GHz.

Analysis of the Impact of Parameter Setting on Classification Performance
The main parameters in MCMS-STM are R and C, which denote the CP rank of the projection tensor and the regularization parameter, respectively. To demonstrate the impact of the parameter setting, N-fold cross-validation is adopted: slice set 1 of dataset 1 is partitioned into N subsets, and the recognition accuracy is tested on one subset using the classifier trained on the remaining N−1 subsets. This process is repeated N times, with each of the N subsets used exactly once as the test set. By adjusting R over {1, 2, 3, . . . , 10} and C over {10^0, 10^1, 10^2}, the classification accuracies obtained by the different versions of MCMS-STM are shown in Figure 8.
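The cross-validation protocol just described can be sketched as follows; `train_and_score` is a hypothetical placeholder for training an MCMS-STM with a given (R, C) and returning the held-out accuracy:

```python
# Sketch of the N-fold cross-validation grid search over R and C.
# `train_and_score(train_idx, test_idx, R, C)` is a hypothetical stand-in
# for training an MCMS-STM with CP rank R and regularizer C, then
# returning the accuracy on the held-out fold.

def k_fold_indices(n_samples, n_folds):
    """Partition indices 0..n_samples-1 into n_folds contiguous folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(samples, n_folds, grid_R, grid_C, train_and_score):
    """Return {(R, C): mean held-out accuracy over the N folds}."""
    results = {}
    folds = k_fold_indices(len(samples), n_folds)
    for R in grid_R:
        for C in grid_C:
            accs = []
            for i, test_idx in enumerate(folds):
                train_idx = [j for k, f in enumerate(folds) if k != i
                             for j in f]
                accs.append(train_and_score(train_idx, test_idx, R, C))
            results[(R, C)] = sum(accs) / len(accs)
    return results
```

The best setting is then `max(results, key=results.get)`, i.e., the (R, C) pair whose mean held-out accuracy is largest.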
From Figure 8, it is observed that the recognition accuracy of MCMS-STM is affected by the parameter setting. In detail, increasing C may have a positive or negative effect on recognition accuracy depending on the value of R; therefore, it is difficult to set an effective C in advance. In addition, for the same C, MCMS-STM with R < 3 obtains lower recognition accuracy than with R ≥ 3, because a projection tensor with a small R has difficulty defining an accurate classification hyperplane. It is also found that the best recognition accuracy is obtained by the OVO version of the MCMS-STM, probably because the OVO version may obtain a larger classification margin than the OVR version.

Analysis of the Impact of Image Resizing on Classification Performance
One of the significant advantages of the MCMS-STM method is that it can use multiscale projection tensors to effectively classify objects with different sizes, avoiding the loss of objects' scale information caused by image resizing. To highlight this superiority, the two slice sets obtained by the different image resizing methods are utilized to examine the recognition accuracy of the MCMS-STM with multiscale projection tensors and with single-scale projection tensors. The experiments are implemented in three control groups. In detail, for control group 1, the MCMS-STM with single-scale projection tensors and slice set 1 is utilized to obtain the classification results. For control group 2, the MCMS-STM with single-scale projection tensors and slice set 2 is utilized. For control group 3, the MCMS-STM with multiscale projection tensors and slice set 1 is utilized. To use multiscale projection tensors, the sizes of the multiscale tensors must be decided. In the experiments, the slice sizes for the five types of ships are set to 770 × 170, 360 × 50, 400 × 70, 550 × 140, and 290 × 70, respectively, according to the average object sizes of the different types. The obtained classification results for MCMS-STM under the different conditions are plotted in Figure 9. The accuracy in Figure 9 denotes the largest accuracy with parameters C and R selected from C = {10^0, 10^1, 10^2} and R = {2, 4, 6, 8, 10} under the different versions of MCMS-STM. For MCMS-STM, the OVO version indicates a larger classification margin, but at the same time it can also cause misclassification because its decision function (see Equation (32)) may generate multiple results. Therefore, for dataset 2, it is observed that the recognition accuracy of the OVR version of MCMS-STM is higher than that of the OVO version.
In addition, note that the resizing operation used to generate slice set 1 leads to the loss of the object's scale information, and the resizing operation used to generate slice set 2 brings excessive background interference, i.e., the two resizing operations have their own disadvantages. In comparison, the multiscale projection tensors used in MCMS-STM can maintain the scale information of objects while avoiding excessive background interference. Accordingly, it is observed that the recognition accuracy of MCMS-STM without resizing operations is better than that of MCMS-STM with resizing operations. The above analysis concludes that the MCMS-STM with multiscale projection tensors is effective for multiscale object recognition tasks.
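The idea of covering an object with per-class slices at their native scales, rather than resizing everything to one size, can be sketched as follows; `CLASS_SIZES` reuses the ship slice sizes quoted above, and the crop-centering logic is an illustrative assumption:

```python
# Sketch of extracting multiscale, object-centered slices instead of
# resizing to a fixed size. CLASS_SIZES mirrors the per-type ship slice
# sizes quoted in the text; an image is a nested H x W list (each entry
# may itself hold the spectral bands).

CLASS_SIZES = {1: (770, 170), 2: (360, 50), 3: (400, 70),
               4: (550, 140), 5: (290, 70)}

def crop_centered(image, center_rc, size_hw):
    """Crop a size_hw slice centered at center_rc, clamped to the image."""
    H, W = len(image), len(image[0])
    h, w = size_hw
    r0 = min(max(center_rc[0] - h // 2, 0), max(H - h, 0))
    c0 = min(max(center_rc[1] - w // 2, 0), max(W - w, 0))
    return [row[c0:c0 + w] for row in image[r0:r0 + h]]

def multiscale_slices(image, center_rc, class_sizes=CLASS_SIZES):
    """One slice per candidate class, each kept at its native scale."""
    return {cls: crop_centered(image, center_rc, hw)
            for cls, hw in class_sizes.items()}
```

Each candidate class thus receives a slice at its own scale, which is what the multiscale projection tensors consume; no interpolation is performed, so neither scale information is lost nor extra background pulled in beyond the class-specific window.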

Evaluation of the Performance of the Decomposition Algorithm for Training MCMS-STM
To verify the efficiency of the proposed decomposition algorithm for training the OVO version of MCMS-STM, experiments are conducted on dataset 1 to evaluate the time consumption of training MCMS-STM and classical multiclass SVM using different optimization algorithms, including the proposed decomposition algorithm, the interior-point method [35], and the active-set method [40], under different numbers of training samples, as shown in Figure 10. The time consumption in Figure 10 represents the time taken to solve the dual problem of the MCMS-STM or multiclass SVM once. From Figure 10, when using the same optimization algorithm, i.e., the interior-point or active-set algorithm, the time consumption of solving the dual problem of MCMS-STM is similar to that of multiclass SVM. In addition, the time consumption of the interior-point algorithm is less than that of the active-set algorithm for all sizes of the training set. Remarkably, the proposed decomposition algorithm significantly reduces the time consumption compared with the interior-point and active-set methods for training MCMS-STM, and its advantage grows with the size of the training set, indicating the efficiency of the proposed decomposition algorithm.
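For reference, the per-solve time comparison can be sketched with a small wall-clock harness; the solver callables here are hypothetical stand-ins for the three optimization algorithms:

```python
# Sketch of how per-solve training time might be measured: each solver is
# a callable taking the problem data; the names are hypothetical stand-ins
# for the decomposition, interior-point, and active-set solvers.
import time

def time_solver(solver, problem, repeats=3):
    """Return the best-of-`repeats` wall-clock time for one dual solve."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        solver(problem)
        best = min(best, time.perf_counter() - t0)
    return best
```

Taking the best of a few repeats reduces the influence of transient system load on the reported per-solve time.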

Evaluation of the Performance of MCMS-STM Compared with Existing SVM and STM Methods
In this section, experiments are conducted on dataset 1 and dataset 2 to evaluate the performance of MCMS-STM for multiclass multiscale airplane and ship classification in comparison with typical SVM and STM methods. To prevent the classification results from being affected by different feature extraction algorithms, the image slices represented as tensors are used directly as input for all classification methods. In detail, according to the sizes of the different objects, the dimensions of the multiscale tensors for the five types of airplanes are set to 100 × 100 × 3, 80 × 80 × 3, 120 × 120 × 3, 70 × 70 × 3, and 80 × 80 × 3, respectively, and those for the five types of ships are set to 770 × 170 × 3, 360 × 50 × 3, 400 × 70 × 3, 550 × 140 × 3, and 290 × 70 × 3, respectively. The first, second, and third orders of the multiscale tensors denote the horizontal spatial order, vertical spatial order, and spectral order, respectively. For the comparison methods, the two slice sets from dataset 1 and dataset 2, obtained by the different slice cropping methods, are utilized to obtain the classification results under the two types of image resizing operations, respectively. In addition, note that the SVM and STM methods can only deal with binary classification problems; therefore, the OVO and OVR strategies are introduced to perform multiclass classification for the binary SVM and STM methods indirectly. Considering that the SVM method cannot process tensors directly, a vectorization operation is utilized to convert each image slice to a vector to comply with its input requirement. Then, all methods select the parameters from C = {10^0, 10^1, 10^2}, ν = {0.1, 0.2, 0.3, 0.4}, and R = {2, 4, 6, 8, 10} that correspond to the best classification results to obtain the final recognition accuracy.
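The vectorization step applied to the SVM inputs can be sketched as follows (pure-Python nested lists stand in for the image tensor):

```python
# Sketch of the vectorization step used to feed image-slice tensors to SVM:
# an H x W x B nested-list tensor is flattened to a single feature vector.
# Tensor methods (STM, MCMS-STM) skip this step and keep the slice as-is,
# which is how they preserve the spatial structure of the slice.

def vectorize(slice_tensor):
    """Flatten an H x W x B tensor to a length H*W*B vector (row-major)."""
    return [v for row in slice_tensor for pixel in row for v in pixel]
```

For a 100 × 100 × 3 airplane slice this yields a 30,000-dimensional input vector, which illustrates why the vectorized SVM has many more parameters than the tensor-based classifiers.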
Using N-fold cross-validation, the accuracies of MCMS-STM for the different classification tasks under different conditions are displayed in Tables 3–6, respectively.
Table 3. The accuracies of different methods for multiscale airplane recognition using 5-fold cross-validation.

Table 6. The accuracies of different methods for multiscale ship recognition using 10-fold cross-validation.
In these tables, accuracies in bold indicate the best results, and the notation C-SVM (OVO)_1 denotes the OVO-strategy-based C-SVM under slice set 1. Overall, the classification results for 10-fold cross-validation are better than those for 5-fold cross-validation, mainly because 10-fold cross-validation means that more samples are used for training. From these tables, it is observed that the OVO-strategy-based comparison methods present better classification results than the OVR-strategy-based ones, because OVO is a competitive multiclass classification strategy compared with OVR. For N-fold cross-validation, by comparing the classification accuracies between N = 5 and N = 10, it can be found that there is a minor difference between the different N for the STM methods and a significant difference for the SVM methods. The reason is that the SVM method involves many more parameters, and thus its performance improves readily as the number of samples increases. In addition, the recognition accuracy of multiclass SVM is better than that of most comparison methods under the OVR or OVO strategy. The two resizing operations have their own advantages and disadvantages: the first type of resizing operation reduces the impact of background interference but loses the scale information of the object, while the second type maintains the scale information of objects but brings much more interference for objects of small size. Accordingly, using slice set 1 obtains the largest accuracy for ship recognition under 10-fold cross-validation (see Table 6), and using slice set 2 obtains the largest accuracy for airplane recognition under 10-fold cross-validation (see Table 4).
This means the two image resizing operations present comparable results. Moreover, the optimal R corresponding to the largest accuracy lies in the range from 4 to 8, as a projection tensor with a small R has difficulty describing an effective classification hyperplane, and a projection tensor with a large R easily over-fits the training samples. It is worth observing that the MCMS-STM obtains the best classification results among all methods for the different classification tasks. In detail, MCMS-STM achieves 89.5% and 91.4% recognition accuracy for airplane recognition and ship recognition, respectively, while the largest accuracies of the comparison methods are 88.6% and 89.9%, respectively. Therefore, it is concluded that the MCMS-STM is more effective than the comparison methods for multiclass multiscale object recognition in remote sensing images.

In addition, to verify whether the MCMS-STM makes improvements from a statistical point of view, a statistical test is applied to compare the performance of the MCMS-STM against the SVM and STM methods.
For convenience, a commonly applied test, the right-tailed paired significance t-test [41], is used to test the null hypothesis H0 that the difference in accuracy between MCMS-STM and the competitor is equal to zero, against the alternative hypothesis H1 that the difference in accuracy is greater than zero.
To obtain effective results, half of the samples in dataset 1 are used to train a classifier, and the remaining samples are used to evaluate the recognition accuracy of the trained classifier for the five classes of airplanes. The same processing is performed for dataset 2. In this way, ten paired observations are obtained for the test. Using these ten paired observations, the paired-sample right-tailed significance t-test [41] is applied, and the resulting p-values for MCMS-STM against the other methods are shown in Table 7. Table 7. The results of the right-tailed paired t-test for MCMS-STM against the comparison methods.
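A sketch of the paired right-tailed t statistic on such accuracy pairs follows; only the statistic is computed here, and the p-value then follows from the Student t distribution with n − 1 degrees of freedom:

```python
# Sketch of the paired right-tailed t-test on per-class accuracy pairs.
# Only the t statistic is computed; the p-value is obtained from the
# Student t CDF with n - 1 degrees of freedom (e.g., via statistical
# tables or a library routine).
import math

def paired_t_statistic(acc_a, acc_b):
    """t statistic for H1: mean(acc_a - acc_b) > 0, paired observations."""
    d = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # unbiased variance
    return mean / math.sqrt(var / n)
```

A large positive t (relative to the critical value at the 0.05 level) rejects H0 in favor of the proposed method; the accuracy values in the call below are made up for illustration.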

In Table 7, positive, negative, ties, and p-value denote the number of times the MCMS-STM outperforms the comparison method, the number of times the comparison method outperforms the MCMS-STM, the number of times both methods obtain the same result, and the probability of the observation under H0, respectively. From Table 7, all the p-values are less than 0.05; therefore, the null hypothesis can be rejected at the 0.05 significance level for all comparison methods. This indicates that the proposed MCMS-STM outperforms the existing SVM and STM methods from a statistical point of view.

Evaluation of the Performance of MCMS-STM Compared with Deep Learning Methods
Compared with deep learning methods, which rely on a large number of training samples, the proposed MCMS-STM can recognize multiclass multiscale objects effectively using a small number of training samples. To verify this advantage, experiments are conducted on datasets 1 and 2 to compare the recognition accuracy of the MCMS-STM with that of two deep learning methods, GhostNet [42] and ResNeXt [43]. GhostNet utilizes the Ghost module to improve the feature representation power of the conventional convolutional layer, and ResNeXt, which improves on deep residual networks, exploits aggregated residual transformations to mine effective features and improve the recognition results. When training GhostNet and ResNeXt, the learning rate and the batch size are set to 0.001 and 128, respectively. Since these two deep learning methods can only deal with slices of a fixed size, slice set 1 from each of the two datasets is used to examine their performance. For MCMS-STM, the multiscale projection tensors are exploited, and their detailed sizes can be found in Section 5.4. To compare the recognition results of the proposed OVO version of MCMS-STM with those of the deep learning methods under different sizes of the training set, different proportions of samples in each dataset are selected as training samples, with the rest as test samples, and the obtained recognition results are shown in Tables 8 and 9, where the notation p denotes the proportion of the dataset used for training. From these tables, it is observed that recognition accuracy generally improves with the size of the training set, because more training samples ensure sufficient training of the classifier. In addition, GhostNet obtains better results than ResNeXt under the same p, indicating that GhostNet can extract more effective features using its Ghost module.
Since the deep learning methods rely on a large number of training samples, they obtain worse results for small p. Remarkably, the proposed MCMS-STM obtains significantly better results than the deep learning methods, especially when the number of training samples is small. This finding implies the superiority of the proposed MCMS-STM over the deep learning methods in the small-sample case.

Conclusions
To classify multiclass multiscale objects in RSIs effectively, the MCMS-STM is proposed, incorporating multiple hyperplanes defined by multiscale projection tensors to map object slices of different sizes into the multiclass class space, getting rid of the conventional image resizing operation. The main contributions of this paper can be summarized as follows.
(1) To achieve multiclass classification for objects in RSIs, the MCMS-STM is proposed to learn multiple hyperplanes defined by rank-R projection tensors simultaneously to map input represented as tensors into the class space. This new multiclass classification mechanism makes it easy to construct the corresponding decomposition algorithm to accelerate the training of the MCMS-STM and enables the classifier to present both OVO and OVR interpretations, ensuring that the MCMS-STM can deal with different classification tasks effectively. (2) To identify multiscale objects in RSIs, instead of the conventional image resizing operation, multiple slices of different sizes are extracted, according to the object position obtained from detection results, to describe the contained object with an unknown class, and multidimensional classification hyperplanes are established to separate the input of multiple slices with different sizes, achieving cross-scale object recognition. This multiscale classification mechanism avoids the loss of scale information and reduces the impact of background interference caused by conventional image resizing preprocessing.

Solving the Optimization Problem for MCMS-STM
Since the procedures for training the OVO and OVR versions of MCMS-STM are similar, we only give the solving method for the OVO version of the MCMS-STM here and put the solving method for the OVR version in the Appendix. To solve the optimization problem in Equation (16) effectively, an iterative optimization scheme [25] is adopted for model training. Substituting Equations (19) and (20) into Equation (18) yields Equation (A5). Using a quadratic programming method (e.g., the interior-point method [35]), the optimization problem in Equation (A4) can be solved. Then, according to the resulting α, the w^(r,l_iter)_mh can be updated using Equation (A2). To stop iterating at the proper time, the termination criteria are constructed in Equation (A6).
where w^(r,l) denotes the resulting projection vectors. Then, the equations and inequalities in terms of b can be written as follows, where p_m^i = Σ_r X_i^(y_i) ∏_{l=1}^{L} ×_l w^(r,l). When the optimal solution is reached, the above equations and inequalities hold. Therefore, b can be calculated by the following linear program using the classical simplex method [37].
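The projection p_m^i above, a rank-R sum of mode products between a third-order slice tensor and the projection vectors, can be sketched as follows (nested lists stand in for the tensor; this is an illustrative reading of the formula, not the authors' implementation):

```python
# Sketch of the rank-R projection sum_r X x_1 w^(r,1) x_2 w^(r,2) x_3 w^(r,3)
# that maps a third-order slice tensor X to a scalar score. X is a nested
# H x W x B list; factors[r] = (w1, w2, w3) with lengths H, W, and B.

def rank_r_projection(X, factors):
    score = 0.0
    for w1, w2, w3 in factors:           # one term per CP rank component
        for h, plane in enumerate(X):
            for w, fiber in enumerate(plane):
                for b, x in enumerate(fiber):
                    score += x * w1[h] * w2[w] * w3[b]
    return score
```

Because each rank-1 component stores only H + W + B weights instead of H × W × B, a rank-R projection tensor needs far fewer parameters than a vectorized hyperplane of the same input size, which is the parameter economy exploited by the tensor classifiers.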
After training the MCMS-STM, for any test sample {X_m}, 1 ≤ m ≤ M, the decision function in Equation (A10) is used to predict the corresponding label.

Introduction
The diverse types and sizes of objects pose a challenge for object recognition in remote sensing images (RSIs). Recently, owing mainly to their powerful feature abstraction ability, various deep learning technologies have achieved impressive success in different object recognition tasks, such as Fast R-CNN [1], Faster R-CNN [2], local attention based CNN [3], 2 CNN [4], SSD [5], and YOLO [6]. In addition to object recognition, deep learning-based methods are widely used for a wide variety of classification tasks based on remote sensing data acquired by different sensors, such as graph convolutional neural network [7] based hyperspectral image classification [8] and multimodal deep learning-based multisource image classification [9]. Despite these recent advances, deep learning-based methods rely heavily on massive amounts of labeled samples. In comparison, machine learning-based object recognition methods can obtain effective results using a small number of samples.