1. Introduction
In machine learning, data classification plays a very important role. Up to now, a large number of data classification methods have emerged and powered the development of machine learning and its practical applications in different domains [1], such as image detection [2], speech recognition [3], text understanding [4], disease diagnosis [5,6], and financial prediction [7].
Currently, popular data classification methods include the Support Vector Machine (SVM) [8,9], Decision Tree (DT) [10,11], Naive Bayes (NB) [12], K-Nearest Neighbors (KNN) [13], Random Forest (RF) [14], Deep Learning (DL) [15], and Deep Reinforcement Learning (DRL) [16]. SVM is based on optimization theory [17]. DL is implemented through a multilayer neural network trained with optimization techniques, such as the stochastic gradient descent algorithm [18]. DRL combines DL with Reinforcement Learning, and it is effective in real-time scenarios [16]. The remaining methods fall under the heading of statistical methods [19,20].
Many comparative studies have evaluated these classification methods by analyzing their accuracy, time cost, stability, and sensitivity, as well as their advantages and disadvantages [21,22,23]. SVM is efficient when there is a clear margin of separation between the classes, but the choice of its kernel function is difficult, and it does not work well with noisy datasets [23,24]. DL is developing rapidly, but its training is very time-consuming because a large number of parameters must be optimized through the stochastic gradient descent algorithm. In addition, some hyperparameters in DL are set empirically, such as the number of layers in the neural network, the number of nodes in each layer, and the learning rate, so its performance is highly sensitive to the hyperparameters and the specific problem [25,26]. KNN and DT are easy to apply. However, KNN requires the calculation of the Euclidean distance between all pairs of points, leading to a high computation cost. DT is unsuitable for continuous variables and is prone to overfitting [23]. Other classical methods have also achieved great success [27,28]. To improve classification accuracy, an ensemble learning scheme, such as AdaBoost [29,30], Bagging [31], Stacking [32], or Gradient Boosting [33], is usually adopted to solve an intricate or large-scale problem [34,35].
Inspired by the ability of our brain to recognize the musical notes played by any musical instrument in a noisy environment, this paper proposes an optimization method for constructing feature coordinates for data classification by simulating a non-uniform membrane structure model. No matter how complex a musical instrument's structure is, or how different its vibration patterns are, when we listen to a piece of music played by an instrument, our brain can extract the fundamental tone of its vibration at every moment and recognize the melody as time goes by. Mathematically, this can be clearly explained: the vibration of the musical instrument at every moment is adaptively expanded on its own eigenfunction system, and our brain grasps the lowest eigenvalue and the eigenfunction components corresponding to the musical notes at every moment, enjoying the melody over time. In order to extract the data features from complex samples, we simulate the adaptive generation process of the eigenfunction coordinate system of a musical instrument and build a mapping from data features to the low-frequency subspace of the eigenfunction system. Through analyzing the solution space and the eigenfunctions of the partial differential equations describing the vibration of a non-uniform membrane, which is a simple musical instrument, the mutual-energy inner product is defined and used to extract data features. The introduction of the mutual-energy inner product not only avoids generating an eigenfunction system, reducing the computational complexity, but also enhances the feature information and filters out data noise; furthermore, it simplifies the training of the data classifier.
The full paper is divided into seven sections.
Section 1 briefly introduces popular data classification methods and the research background.
Section 2 analyzes the solution space of the partial differential equations describing a non-uniform membrane, and defines the concept of the mutual-energy inner product.
Section 3, by making use of the eigenvalues and the eigenfunctions of the non-uniform membrane vibration equations, expresses the mutual-energy inner product as a series of eigenfunctions, and points out its potential in data classification for enhancing feature information and filtering out data noise.
Section 4 builds a mutual-energy inner product optimization model and discusses the convexity and concavity properties of its objective function.
Section 5 designs a sequential linearization algorithm to solve the optimization model by combining it with the finite element method (FEM).
Section 6 applies the mutual-energy inner product optimization method for constructing feature coordinates to a 2-D image classification problem, and gives numerical examples in combination with Gaussian classifiers and the handwritten digit MNIST dataset.
Section 7 summarizes the full paper and introduces the future scope of the work.
2. Mutual-Energy Inner Product
Consider the linear partial differential equations $Lu = f$ on $\Omega$ with $Bu = 0$ on $\Gamma$ (Expression (1)), where $L$ is a homogeneous linear self-adjoint differential operator; $f$ is a piecewise continuous function; $\Omega$ is the domain of definition, with a boundary $\Gamma$; and $B$ is a homogeneous linear differential operator on the boundary $\Gamma$, describing the Robin boundary condition.
Expression (2) can be regarded as the static equilibrium equations of a simple elastic structure, such as a 1-D string or a 2-D membrane, and can be expanded to an n-dimensional problem. For a 2-dimensional problem, $\Omega$ is a domain occupied by a membrane with its boundary $\Gamma$; $p$ and $q$ stand for the elastic modulus and the distributed support elastic coefficient of the membrane, respectively; $\sigma$ is the support elastic coefficient on the boundary; $f$ is an external force acting on the membrane; $u$ is the deformation of the membrane due to $f$, and has a piecewise continuous first-order derivative; $\partial u/\partial n$ is the derivative of the deformation in the outward-pointing normal direction of $\Gamma$. In this research, it is required that $p$, $q$, $\sigma$ are piecewise continuous functions, and $p > 0$, $q \ge 0$, $\sigma \ge 0$.
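The display equations themselves did not survive extraction, but the prose above pins down a standard form. The following is a hedged LaTeX sketch of the 2-D membrane equilibrium problem consistent with the symbols just defined; the paper's exact Expression (2) may differ in notation.

```latex
% Assumed form of the non-uniform membrane equilibrium problem:
% p: elastic modulus, q: distributed support coefficient,
% sigma: boundary support coefficient, f: external force, u: deformation.
\begin{aligned}
  -\nabla\cdot\bigl(p\,\nabla u\bigr) + q\,u &= f  && \text{in } \Omega,\\
  p\,\frac{\partial u}{\partial n} + \sigma\,u &= 0 && \text{on } \Gamma .
\end{aligned}
```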
A structure subjected to an external force $f_1$ will generate the deformation $u_1$, and its deformation energy $E(u_1)$ can be expressed as Expression (3). If the structure is simultaneously subjected to another external force $f_2$, then it will generate an additional deformation $u_2$. The total deformation $u_1 + u_2$ satisfies the superposition principle due to the linearity of Expression (1). The deformation $u_2$ causes additional work to be performed by $f_1$. Generally, this additional deformation energy is called the mutual energy between $u_1$ and $u_2$, or the mutual work between $f_1$ and $f_2$. The mutual energy describes the correlation of the two external forces, and can be expressed as Expression (4).
Substituting Expression (1) into Expression (4), by integrating by parts, we obtain Expression (5), which is a bilinear functional. Comparing Expressions (3) and (4), we have Expression (6).
Due to $p > 0$, $q \ge 0$, $\sigma \ge 0$, according to Expressions (5) and (6), the mutual energy satisfies Expression (7).
Expression (7) describes a simple physical phenomenon: when the elastic modulus of the structural material is positive, if the structure deforms, deformation energy is generated; otherwise, the deformation energy is zero.
Expression (5) also shows that the mutual energy is symmetrical and satisfies the commutative law. Combined with Expression (7), it can be inferred that the mutual energy satisfies the Cauchy–Schwarz inequality (8).
The Expressions (7) and (8) show that the mutual energy can be regarded as an inner product of the structural deformation functions. For simplicity, we use $\langle \cdot, \cdot \rangle_E$ and $\langle \cdot, \cdot \rangle$ to represent the mutual-energy inner product and the Euclidean inner product, respectively. We define $\| \cdot \|_E$ as the norm derived from $\langle \cdot, \cdot \rangle_E$, and $\| \cdot \|$ as the norm derived from $\langle \cdot, \cdot \rangle$. Based on Expression (6), $\| u \|_E$ is proportional to the square root of the deformation energy, and is therefore also the energy norm. According to the Cauchy–Schwarz inequality (8), $\| \cdot \|_E$ satisfies the triangle inequality.
Based on Expression (1), when a structure is subjected to a piecewise continuous external force, its deformation function has piecewise continuous first-order derivatives on the domain $\Omega$ and satisfies the boundary condition $Bu = 0$. The set of these deformation functions can span a space $U$, which can be equipped either with the Euclidean inner product $\langle \cdot, \cdot \rangle$ or with the mutual-energy inner product $\langle \cdot, \cdot \rangle_E$.
In addition, applying the variational principle, Expression (1) can also be rewritten as the minimum energy principle, Expression (12). Here, the feasible domain of the minimization consists of functions with piecewise continuous first-order derivatives on $\Omega$, which do not need to satisfy homogeneous boundary conditions.
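Since the displays are missing from the source, the following sketch records the standard forms that the deformation energy (3), the mutual energy (4)-(5), and the minimum energy principle (12) take under the membrane model assumed above; signs and factors follow the usual convention, not necessarily the paper's.

```latex
% Deformation energy, mutual energy after integration by parts, and the
% minimum energy principle (sketch under the assumed membrane model).
E(u_1) = \tfrac{1}{2}\int_\Omega f_1 u_1\,d\Omega, \qquad
\langle u_1, u_2\rangle_E = \int_\Omega f_1 u_2\,d\Omega
  = \int_\Omega \bigl(p\,\nabla u_1\cdot\nabla u_2 + q\,u_1 u_2\bigr)\,d\Omega
  + \int_\Gamma \sigma\,u_1 u_2\,d\Gamma ,

u_1 = \arg\min_{u}\; \tfrac{1}{2}\langle u, u\rangle_E - \int_\Omega f_1 u\,d\Omega .
```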
3. Signal Processing Property of Mutual-Energy Inner Product
The eigenequation of $L$ can be written as Expression (13). Its non-zero solutions $\varphi$ and the corresponding coefficients $\lambda$ are called eigenfunctions and eigenvalues, respectively. These eigenfunctions and eigenvalues have the following properties due to $p > 0$, $q \ge 0$, $\sigma \ge 0$ [36].
- (1)
Expression (13) has infinitely many eigenvalues $\lambda_i$ and eigenfunctions $\varphi_i$, $i = 1, 2, \dots$. If all the eigenvalues are ranked like $\lambda_1 \le \lambda_2 \le \cdots$, then they satisfy $\lambda_1 > 0$ and $\lambda_i \to \infty$ as $i \to \infty$. Meanwhile, each $\lambda_i$ depends continuously on $p$, $q$, and $\sigma$, and will increase with the increase in $p$, $q$, and $\sigma$.
- (2)
Normalized eigenfunctions $\varphi_i$ satisfy the orthogonality condition (14), and can form a set of orthogonal and complete basis functions to span the deformation function space $U$.
Therefore, the solutions of Expression (1) can be expressed by the $\varphi_i$. For any $u$ in $U$, $u$ can be presented as a series of eigenfunctions satisfying absolute and uniform convergence, i.e., Expression (15).
Expression (15) has a profound physical meaning. $\lambda_i$ and $\varphi_i$ correspond to the $i$-th-order structural natural frequency and the $i$-th-order vibration mode. If $u$ is regarded as a vibration amplitude function, it can be decomposed into a superposition of the vibration modes at each order of natural frequency, where the expansion coefficient is the vibration magnitude at the corresponding mode. This is equivalent to spectral decomposition. Imagine such a scene: when we enjoy a piece of music, our brains constantly decompose the instantaneous vibration amplitude according to Expression (15) and, meanwhile, perceive the vibration coefficients and mark them with the corresponding frequencies. For a musical instrument, $\lambda_1$ corresponds to its fundamental frequency (tone) and the remaining eigenvalues to overtones. Different musical instruments have different vibration patterns, and their eigenfunctions are also different. However, after tuning the different musical instruments, the fundamental frequency of each note is consistent.
The eigenfunctions and eigenvalues satisfy Expression (13), so we have Expression (16). Multiplying both sides of Expression (16) by $\varphi_j$ and integrating by parts, we can yield Expression (17), which shows that the eigenfunctions also satisfy the orthogonality condition with respect to the mutual-energy inner product. So, these eigenfunctions can also be used as basis functions to span the mutual-energy inner product space.
Substituting Expression (15) into Expression (5) and applying Expression (17), we have Expression (18). If $u$ satisfies the normalization condition $\| u \| = 1$, then, based on Expressions (14) and (18), the lowest eigenvalue $\lambda_1$ satisfies the Rayleigh-quotient bound of Expression (19), where the optimal solution is the eigenfunction $\varphi_1$.
Similarly, the deformation $u_f$ caused by an external force $f$ can be expressed as Expression (20), where $b_i$ is the amplitude coefficient and can be interpreted as the component of $u_f$ at the $i$-th vibration mode $\varphi_i$. Substituting Expression (20) into Expression (12) and using the orthogonality condition (17), we have Expression (21), where the coefficient $a_i$ is the projection of $f$ on $\varphi_i$ with respect to the Euclidean inner product (Expression (22)).
Setting the derivative of the objective in Expression (21) with respect to each $b_i$ to zero, we have Expression (23). According to the series representation in Expression (15), if the deformation is caused by a second external force $g$, its coefficients satisfy Expression (24). Substituting the Expressions (15), (20), (23) and (24) into Expression (5) and using the orthogonality condition (17), we have Expression (25).
Generally speaking, the external force $f$ does not belong to $U$, because $f$ does not satisfy the homogeneous boundary conditions, i.e., $Bf \ne 0$ on $\Gamma$. In this case, the series expansion is equal to the projection of $f$ on $U$, i.e., the optimal approximation of $f$ in $U$. Of course, in order to make $f$ belong to $U$, we may expand the design domain and simplify the boundary condition. For example, after expanding the design domain, we can set a fixed boundary and let $u = 0$ on $\Gamma$, or set a mirror boundary and let $\partial u/\partial n = 0$ on $\Gamma$. In these cases, $f$ and $g$ belong to $U$. Then, applying the orthogonality condition (14) yields Expression (26).
After $f_1$ and $f_2$ are expressed as a superposition of the eigenfunctions of the operator $L$, through comparing the mutual-energy inner product in Expression (25) and the Euclidean inner product in Expression (26), it can be found that the mutual-energy inner product has the advantage of enhancing the low-frequency coordinate components (small $\lambda_i$) and suppressing the high-frequency coordinate components (large $\lambda_i$). In other words, if $f_1$ and $f_2$ are regarded as signals, the mutual-energy inner product can augment the low-frequency eigenfunction components and filter out the high-frequency eigenfunction components of the signals, with the help of a structural model.
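To make the filtering property concrete, the following sketch compares the two inner products. It assumes Euclidean-normalized eigenfunctions $\langle \varphi_i, \varphi_j \rangle = \delta_{ij}$, eigenvalues $\lambda_i$ as above, and forces $f_1 = \sum_i a_i \varphi_i$ and $f_2 = \sum_i b_i \varphi_i$; this is a reconstruction of the content of Expressions (25) and (26), not their verbatim form.

```latex
% Mutual-energy vs. Euclidean inner product of two forces (sketch):
% with L u_k = f_k, one has u_k = sum_i (coefficient / lambda_i) phi_i, so
\langle u_1, u_2 \rangle_E = \sum_{i} \frac{a_i b_i}{\lambda_i},
\qquad
\langle f_1, f_2 \rangle = \sum_{i} a_i b_i .
```

The mutual-energy inner product thus weights each eigenfunction component by $1/\lambda_i$: low-frequency components (small $\lambda_i$) are amplified relative to high-frequency ones, which is exactly the claimed noise-filtering behavior.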
4. Mutual-Energy Inner Product Optimization Model for Feature Extraction
Assume that $X$ is a training dataset with $m$ samples, and each sample is represented as $x^{(k)}$, while $y^{(k)}$ represents the class label. For example, if the samples are divided into two classes, $X$ includes two subsets $X_0$ and $X_1$, where $X = X_0 \cup X_1$. Generally, the samples in each class are assumed to be independent random variables with an identical distribution.
We hope to find an appropriate feature coordinate system to represent $X$ and to use fewer coordinate components to classify the samples. If there is no further information, we may select the means of the probability distributions of $X_0$ and $X_1$ as reference features. In order to design a feature extraction model, two points should be considered: one is to enhance the feature information, and the other is to suppress the effect of random noise. We resort to a structural model and use the mutual-energy inner product to extract the features. The main idea is to map the data features to a low-frequency eigenfunction subspace of the structural model.
If $f_0$ and $f_1$ are used to represent the means of the probability distributions of $X_0$ and $X_1$, respectively, their unbiased estimates can be written as Expression (27).
We regard $f_0$, $f_1$, and the samples themselves as external forces acting on the structural model, and use $u_0$, $u_1$, and $u_x$ to represent their corresponding deformations, respectively. If we represent the selected reference feature of $X$ as $f_r$, with deformation $u_r$, we can use the mutual-energy inner product $\langle u_r, u_x \rangle_E$ to extract the feature coordinate component of a sample. In order to construct the feature extraction optimization model, we first select $f_1$ as the reference feature and try to explore the physical meaning of the structural model when the mutual-energy inner product is at its maximum, at its minimum, or equal to zero.
In order to enhance the feature information of the samples in $X_1$, a high statistical mean value of the feature components should be achieved (Expression (28)), with the primary objective given by Expression (29).
In Expression (29), the mutual-energy inner product and the deformations are functions of the design variables $p$ and $q$, and its physical meaning is not intuitive. So, next, we will conduct a quantitative analysis to reveal the structural characteristics hidden in Expression (29).
According to the minimum energy principle (12), if an optimal solution $u_1$ is obtained, the derivative of the energy objective at the optimal solution in any direction is zero, satisfying Expression (30). Through calculating Expression (30), we obtain the relationship between $u_1$ and $f_1$, Expression (31). Expression (31) is a structural static equilibrium equation, and is also a constraint on $u_1$ in the optimization problem (29). In Expression (31), letting the test function equal $u_1$ yields Expression (32). Substituting Expression (32) into Expression (12) yields the optimal value of the objective, Expression (33). Through substituting Expressions (12) and (33) into the optimization problem (29), Expression (29) is transformed into an unconstrained optimization problem, Expression (34).
If $p$ and $q$ are given, the objective in Expression (34) is a quadratic and concave functional with respect to the deformation, due to $p > 0$ and $q \ge 0$. If the deformation is given, the objective is a linear function with respect to $p$ and $q$. When using the Univariate Search Method to solve Expression (34): if $p$ and $q$ are given, the maximum value can be found by solving Expression (31); if the deformation is given, the maximum value will be reached on the lower bounds of $p$ and $q$. So, the lower bounds of $p$ and $q$ must be larger than zero to ensure that Expression (29) has a finite optimal solution. In addition, the upper bounds of $p$ and $q$ should also be constrained to avoid a trivial solution. Therefore, when the optimization objective is to maximize the mutual-energy inner product, as shown in Expression (29), the optimal structural model is the minimum-stiffness structure, and the selected feature belongs to a low-frequency eigenfunction subspace. On the contrary, if the optimization objective is to minimize the mutual-energy inner product, the optimal structural model has the maximum stiffness, and the selected feature is mapped to a high-frequency eigenfunction subspace.
In addition, when using the mutual-energy inner product to extract the feature information of the samples in $X_1$, the feature information of the samples in $X_0$ should be suppressed. So, a small statistical mean value is prescribed, Expression (35).
Here, we may set this mean value to be zero or even negative, and impose the corresponding constraints on the structural model. In Expression (31), setting the test function equal to $u_0$ yields the mutual energy between the two class deformations. Replacing $f_1$ with $f_0$ and exchanging the roles of $u_0$ and $u_1$, we have Expressions (36) and (37).
If Expression (36) is required to equal zero, then $u_0$ and $u_1$ must be orthogonal with respect to the mutual-energy inner product. Although the means of the two classes of samples are generally not orthogonal in the continuous function space, i.e., $\langle f_0, f_1 \rangle \ne 0$, the orthogonality of $u_0$ and $u_1$ can easily be realized according to Expression (37). For example, dividing the domain $\Omega$ into two sub-regions according to the same or opposite signs of $u_0$ and $u_1$, we can adjust the design variables in the two sub-regions and control the positive and negative work performed by the external forces on the deformations, so as to make the total work in Expression (37) zero. According to Expression (25), this can also be understood as designing a structural model and adjusting its eigenfunctions and eigenvalues, so as to use these eigenvalues as weights to achieve the weighted orthogonality of $f_0$ and $f_1$. Further, the inequality version can be regarded as a relaxation of the orthogonality constraint on the mutual-energy inner product, which can be realized by adjusting $p$ and $q$ to make $\langle u_0, u_1 \rangle_E \le 0$. Geometrically, this means that the angle between $u_0$ and $u_1$ in the mutual-energy inner product space is not acute. If the mutual-energy inner product is instead required to be minimal, we obtain the optimization problem (38);
based on Expression (12) and similar to the discussion on Expression (29), the optimization problem (38) can be transformed into an unconstrained form, Expression (39), where a slack deformation is introduced to relax the constraint, namely the static equilibrium equation describing the structural deformation caused by $f_0$ and $f_1$ acting on the structure simultaneously. The objective can be expressed as Expression (40).
Obviously, if $p$ and $q$ are given, the objective (40) is a quadratic functional of the deformations: it is convex with respect to the deformations being minimized, and concave with respect to the deformation being maximized. If the deformations are given, the objective is linear with respect to $p$ and $q$.
In order to design a feature coordinate to classify the samples in $X$, the first objective is to maximize the separation of the class means. By combining the Expressions (28) and (35), the optimization objective can be expressed as Expression (41).
Then, to improve the classification accuracy, the distributions of the samples in $X_0$ and $X_1$ along the feature coordinate should also be considered, and their variances should be small. The variances are high-order functions of the deformations, so putting them into the optimization objective function (41) would destroy its low-order characteristics.
In order to improve the computational efficiency, the sum of the absolute values of the sample deviations from the mean is used to replace the variances, and only some samples in $X_0$ and $X_1$ are selected for the calculation. In the subset $X_1$, we only select the samples whose components on the reference coordinate are less than the class mean, and calculate their mean absolute deviation. In the subset $X_0$, we only select the samples whose components on the reference coordinate are larger than the class mean, and calculate their mean absolute deviation. The two mean absolute deviations can be expressed as Expression (42).
Through using Expressions (41) and (42), and considering both the means and the mean absolute deviations of the samples, the optimization objective can be written as Expression (43), where a weight variable between zero and one balances the mean term and the deviation term. To simplify Expression (42), an auxiliary deformation function is defined in Expression (44), where the corresponding external force satisfies Expression (45).
By substituting Expressions (41), (42), (44) and (45) into Expression (43), the optimization objective is simplified as Expression (46). Here, the combined deformation function can be expressed as Expression (47).
In order to improve the generalization of the data classifier, regularizers should be added to the optimization model. Here, the 1-norms of $p$ and $q$ are used as regularizers to avoid increasing the order of the optimization model. Meanwhile, these regularizers are treated as two constraints by directly setting the values of the 1-norms. Due to $p > 0$ and $q \ge 0$, the 1-norms can be simply written as Expression (48).
It should be noted that objective (46) is built by taking the mean $f_1$ of $X_1$ as the reference feature and selecting the deformation $u_1$ as the reference feature coordinate axis. If other deformation functions are selected as the reference feature coordinate axis, the results are similar. For example, the reference feature can be set as $f_0$, $f_1 - f_0$, $f_1 + f_0$, or others. Through setting the corresponding deformation $u_r$ as the reference feature coordinate axis, the optimization model can be summarized as Expression (49). Here, the admissible deformations are arbitrary continuous functions on $\Omega$; the design variables $p$ and $q$ have positive lower bounds, and their totals are set to two constants; the mean and deviation quantities are given in Expressions (27), (45) and (47). The reference deformation $u_r$ and the combined deformation should be determined according to the reference feature coordinate axis, and can be rewritten as Expression (50).
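Collecting the pieces above, a plausible compact statement of model (49) reads as follows. This is a hypothetical summary: the weight $\alpha$, the combination defining the deformation $u_c$, and the sign conventions are reconstructions from the surrounding prose, not the paper's verbatim formulation.

```latex
% Hypothetical summary of the feature-coordinate optimization model (49):
\begin{aligned}
 \max_{p,\,q}\;& \langle u_r,\, u_c \rangle_E ,
   \qquad u_c = (1-\alpha)\,(u_1 - u_0) - \alpha\, u_d ,\\
 \text{s.t.}\;& \textstyle\sum_e p_e = c_p, \quad \sum_e q_e = c_q ,
   \qquad p_e \ge p_{\min} > 0, \quad q_e \ge q_{\min} > 0 ,
\end{aligned}
```

where $u_0$, $u_1$ are the deformations of the class means, $u_d$ is the deviation deformation of Expression (44), and $c_p$, $c_q$ are the prescribed 1-norm totals acting as regularizers.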
5. Mutual-Energy Inner Product Feature Coordinate Optimization Algorithm
The FEM is used to solve the differential Equation (1), realizing the mapping from the external forces $f_0$, $f_1$, and $f_d$ to the deformations $u_0$, $u_1$, and $u_d$ in the optimization model (49). We divide the domain $\Omega$ into $N_e$ elements $\Omega_e$, and assume that each element has $n$ nodes. For the $i$-th node in an element, its global coordinate, deformation value, and interpolation basis function are denoted as $x_i$, $u_i$, and $N_i(\xi)$, respectively, where $\xi$ is the local coordinate of the element. In this way, for an element, the global and local coordinate relationship and the element deformation function can be expressed as in Expression (51) [37].
It is assumed that $N(\xi)$ is an $n$-dimensional row vector with the $i$-th component $N_i(\xi)$; the derivative matrix of the basis functions collects the partial derivatives of the $N_i$ with respect to the components of the local coordinate $\xi$; and the node coordinate matrix collects the components of the element node coordinates. Applying Expression (51), the Jacobi matrix for the transformation between the global and local coordinates, the deformation function, and its gradient vector can be expressed in the concise and compact form of Expression (52), where $u^e$ is the vector of the deformation values at the nodes of the $e$-th element. In the optimization model (49), the design variables are $p$ and $q$. We assume that $p$ and $q$ are constant within each element, with values $p_e$ and $q_e$ in element $\Omega_e$. So, the design variables can be expressed as the vectors $p = (p_1, \dots, p_{N_e})$ and $q = (q_1, \dots, q_{N_e})$.
Substituting Expression (52) into the mutual-energy expressions (5) and (9) yields Expression (53). Here, $K_e$ is an $n \times n$ element stiffness matrix, which is a positive semidefinite symmetric matrix and can be expressed as Expression (54). In Expression (54), $K_e$ is a linear function of $p_e$ and $q_e$; $A_e$ and $B_e$ are the corresponding coefficient matrices; and $S_e$ is the contribution of the boundary constraint to the element stiffness matrix. If the element boundary does not overlap with the design domain boundary, then $S_e = 0$. The matrices $A_e$ and $B_e$ can be calculated by Expression (55). In Expression (53), $F_e$ is the equivalent node input vector, resulting from the equivalent action between the force distributed on the element and the forces on the nodes, and satisfies Expression (56).
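As an illustration, the following sketch computes coefficient matrices $A$ and $B$ so that $K_e = p_e A + q_e B$ (plus the boundary term $S_e$, omitted here) for the unit-square pixel elements used later in Section 6. The node ordering and 2x2 Gauss quadrature are standard FEM conventions, assumed rather than taken from the paper.

```python
import numpy as np

def quad_element_matrices():
    """Coefficient matrices A (gradient/stiffness part) and B (support/mass
    part) for a bilinear quadrilateral element on a unit-square pixel, so that
    the element stiffness matrix is K_e = p_e*A + q_e*B. A sketch under common
    conventions, not necessarily the paper's exact Expression (55)."""
    # Corner coordinates of the reference square [-1,1]^2, counter-clockwise,
    # starting from (-1,-1) as described in Section 6.1.
    xi_n = np.array([-1.0, 1.0, 1.0, -1.0])
    eta_n = np.array([-1.0, -1.0, 1.0, 1.0])
    gp = np.array([-1.0, 1.0]) / np.sqrt(3.0)   # 2x2 Gauss points, weights = 1
    A = np.zeros((4, 4))
    B = np.zeros((4, 4))
    for xi in gp:
        for eta in gp:
            N = 0.25 * (1 + xi_n * xi) * (1 + eta_n * eta)   # shape functions
            dN_dxi = 0.25 * xi_n * (1 + eta_n * eta)
            dN_deta = 0.25 * eta_n * (1 + xi_n * xi)
            # For a pixel of side length 1, the Jacobian is diag(1/2, 1/2):
            # physical gradient = 2 * reference gradient, det J = 1/4.
            G = 2.0 * np.vstack([dN_dxi, dN_deta])
            A += (G.T @ G) * 0.25
            B += np.outer(N, N) * 0.25
    return A, B
```

Because every pixel element is the same square, this computation is done once before the optimization loop, as noted in Algorithm 1 below.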
It is assumed that the design domain comprises $N_d$ element nodes. We number these nodes globally, and use two $N_d$-dimensional vectors $U$ and $F$ to denote the values of the deformation and the equivalent nodal force at all the nodes, where the subscript of each component is the global node number. The components of $F$ can be calculated through Expression (56): Expression (56) is evaluated for each element adjacent to a global node, and the global component is the superposition of the element node contributions corresponding to that global node.
Based on the relationship between the local and global node numbers, Expression (53) can be rewritten as Expression (57), where $K$ is the global stiffness matrix, an $N_d \times N_d$ positive definite symmetric matrix. Substituting Expression (57) into Expression (12) yields Expression (58). Based on Expression (58), the solution of the differential Equation (1) satisfies the linear system of Expression (59).
Similarly, assume that the input of Expression (1) is a second force $g$ with solution $v$; let $V$ be the global node vector of $v$ and $G$ the equivalent node input vector corresponding to $g$. We have Expression (60). Similarly to the derivation of Expression (57), through using Expressions (59) and (60), the mutual-energy expression of $u$ and $v$ can be derived, Expression (61). In Expression (61), the first equation is used for model optimization, and the second equation is used for data classifier training and prediction, avoiding the need to solve Expressions (59) and (60).
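A sketch of the discrete identity behind this remark, using the vectors just defined (a reconstruction of the content of Expression (61), not its exact form): with $KU = F$ and $KV = G$,

```latex
\langle u, v \rangle_E \;=\; U^{\mathsf T} K\, V \;=\; U^{\mathsf T} G \;=\; F^{\mathsf T} V .
```

During optimization, the bilinear form $U^{\mathsf T} K V$ exposes the design variables inside $K$; at prediction time, once the reference node vector has been solved for, the feature component of a new sample reduces to a Euclidean dot product with the sample's equivalent node vector, so no additional linear solve is needed.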
After discretizing the design domain by finite elements, the differential Equation (1) is converted into a system of linear equations, and the mutual-energy definition (5) can be expressed as a matrix–vector product. In this way, the optimization model (49) can be rewritten in the vector form of Expression (62). Here, $U_r$ is the finite element node vector corresponding to the selected reference feature coordinate, and can be built from the statistical features of the sample sets or their combination; for example, as in Expression (63). Meanwhile, $U_0$, $U_1$, and $U_d$ are the finite element node vectors corresponding to the means and deviation of the samples, and $U_c$ is the temporary node vector generated by the mean and deviation. Expression (47) can be rewritten as Expression (64).
The significant advantage of the optimization model (62) is that $K$ is a positive definite symmetric matrix and is linear with respect to the design variables $p$ and $q$; meanwhile, the coefficient matrices corresponding to the components of the design variables are positive semidefinite, which is convenient for the algorithm design. The intermediate node vectors $U_0$, $U_1$, and $U_d$ are functions of the design variables and can be calculated by solving linear equations, so the optimization model (62) can be solved by a sequential linearization algorithm. The objective and the constraint are nonlinear, and their derivatives with respect to the design variables need to be calculated. The derivative of the objective with respect to an element design variable is given by Expression (65), whose ingredients are determined by taking the derivatives of Expressions (59) and (60) with respect to that design variable, Expression (66). Substituting Expression (66) into Expression (44) yields Expression (67). Substituting Expression (54) into Expression (67) yields Expression (68). Similarly, the derivative with respect to the other design variable can also be computed.
The Expressions (63) and (64) show that $U_r$ and $U_c$ are linear combinations of $U_0$, $U_1$, and $U_d$. According to the superposition principle, $U_r$ and $U_c$ also satisfy equations similar to Expression (66), and have exactly the same derivation as Expression (68). So, we obtain Expression (69).
Optimization Algorithm 1: Mutual-energy inner product feature coordinate optimization algorithm
Based on Expressions (68) and (69), the optimization model (62) can be solved by the sequential linearization algorithm. The algorithm steps are summarized as follows:
- (1)
Use vectors to represent the sample data.
Convert the sample data in the training subsets $X_0$ and $X_1$ into finite element node vectors. Based on Expression (70), first calculate the element node vectors, and then use them to assemble the global node vectors.
- (2)
Set the optimization constants and the initial values of the design variables.
- ①
Set the optimization constants.
Set the weight of the mean and deviation terms, which must lie between zero and one; set the total amounts and the lower bounds of the design variables; set the moving limit of the design variables for the linear programming; and set the minimum increment of the design variables and the minimum increment of the objective function, which are used to determine whether the optimization ends.
- ②
Set the initial values of the design variables.
Generally, initialize $p$ and $q$ uniformly, so that each element receives an equal share of the prescribed totals.
- (3)
Calculate the current value of the objective function.
- ①
Calculate the element stiffness matrices and assemble the global stiffness matrix.
Based on Expressions (54) and (55), calculate the element stiffness matrices $K_e$. Each element stiffness matrix is linear with respect to $p_e$ and $q_e$, and the coefficient matrices are determined only by the element interpolation basis functions, so they can be computed prior to the optimization to speed up the optimization process. Then, assemble the global stiffness matrix $K$ according to the node numbers. Since $K$ is a positive definite symmetric matrix, a Cholesky decomposition can be performed on it, yielding a lower triangular factor for the subsequent linear solves.
- ②
Compute the mean vectors and select the reference feature coordinate axis (Expression (71)), where the mean vectors represent the means of the sample data in $X_0$ and $X_1$, computed from the numbers of samples in the two subsets; the reference vector $U_r$ can be selected and calculated by Expression (63).
- ③
Compute the deviation vector and the intermediate vector (Expression (72)), where the deviation vector is computed only from the samples selected as prescribed by Expression (42), using the projections of the class means on the reference axis as the selection thresholds. After the mean and deviation vectors are obtained, $U_c$ can be obtained by Expression (64).
- ④
Calculate the current values of the objective function and the constraint.
Based on the optimization model (62), the current values of the objective and the constraint can be calculated by Expression (73).
- (4)
Calculate the gradient vectors of the objective function and the constraint.
Apply Expressions (68) and (69) to calculate the element-wise derivatives, and then express them as compact gradient vectors with respect to $p$ and $q$. In Expressions (68) and (69), the coefficient matrices $A_e$ and $B_e$ are determined only by the element interpolation basis functions and are constant matrices independent of the design variables. So, they can be calculated prior to the optimization, and the gradient vectors of the objective and the constraint can be obtained through the mapping relationship between the local and global node numbers.
- (5)
Obtain the increments of the design variables by solving the sequential linearization optimization model.
- ①
Construct the sequential linearization optimization model (74), where the design variables are the increments of $p$ and $q$, limited by the moving limits, and the linearized objective and constraint coefficients are calculated from the current gradient vectors.
- ②
Solve the sequential linearization optimization model (74) to obtain the increments of $p$ and $q$.
When solving Expression (74), slack variables are added to facilitate the construction of an initial feasible solution.
- (6)
Determine whether to end the optimization iteration.
- ①
Store the design variables, the objective function value, and the constraint function value of the previous step of the sequential linearization optimization.
- ②
Update the design variables and the objective function value.
Add the increments to $p$ and $q$, then execute step (3) to update the objective function value.
- ③
Determine whether to end the iteration.
If the increments of the design variables or of the objective function are smaller than the prescribed minimum increments, end the iteration. Otherwise, if the objective function improves, go to step (4) to continue the iteration; if it does not improve, reduce the moving limits of the design variables, and go to step (5) to recompute the increments of the design variables.
6. Algorithm Implementation and Image Classifier
Image classification is used to determine if an image has certain given features and can be realized by algorithms for extracting the feature information of the image. Applying the mutual-energy inner product to extract the image features has the advantage of enhancing the feature information and suppressing other high-frequency noise. If we select multiple features of an image, we can design multiple mutual-energy inner products, and each mutual-energy inner product can be regarded as one feature coordinate of the image. Using multiple mutual-energy inner products to characterize an image is equivalent to using multiple feature coordinates to describe the image, or equivalent to representing the high-dimensional image in a low-dimensional space, reducing the dimensionality of image data.
This part discusses the implementation of Optimization Algorithm 1 and its application to 2-D grayscale image classification. Assume that each sample in the training datasets $X_0$ and $X_1$ is a 2-D grayscale image; the domain $\Omega$ occupied by the image is rectangular; each image is expressed by $n_1 \times n_2$ pixels; and each pixel is a square with a side length of 1. In this case, the finite element mesh can be chosen to coincide with the pixel grid.
6.1. Vectorized Implementation of Optimization Algorithm 1
While using FEM to discretize the design domain, we regard each pixel as a finite element and divide the domain $\Omega$ into $N_e = n_1 n_2$ quadrilateral elements. In $\Omega$, the global element numbering uses column priority, where the upper left corner element is numbered 1 and the lower right corner element is numbered $N_e$. A planar quadrilateral element is used to interpolate the deformation functions. Each element has four nodes, so the total number of nodes is $N_d = (n_1 + 1)(n_2 + 1)$, and the total number of boundary nodes is $2(n_1 + n_2)$. The global node numbering also uses column priority, where the upper left corner node is numbered 1 and the lower right corner node is numbered $N_d$. The interpolation basis functions of the quadrilateral element are given by Expression (75), where the domain of definition is the square $[-1, 1] \times [-1, 1]$. The element nodes are the four corner points of the quadrilateral. The node with the coordinate $(-1, -1)$ is numbered 1 and, in counter-clockwise order, the other nodes with the coordinates $(1, -1)$, $(1, 1)$, $(-1, 1)$ are numbered 2, 3, and 4, respectively. The interpolation basis function $N_i$ corresponds to the $i$-th node. The mapping relationship between the element node numbers and the global node numbers can be described by an $N_e \times 4$ matrix $T$, whose $e$-th row corresponds to the $e$-th element. If $T_{ei}$ denotes its entry in the $e$-th row and the $i$-th column, then $T_{e1}$, $T_{e2}$, $T_{e3}$, $T_{e4}$ are the global node numbers corresponding to the element node numbers 1, 2, 3 and 4 of the $e$-th element. So, we have Expression (76),
where $\mathrm{mod}(a, b)$ is the remainder when $a$ is divided by $b$. Since all the elements are identical squares, the isoparametric transformation in Expression (51) is simply a scaling transformation. Through substituting it into Expressions (52) and (55), we can find that the coefficient matrices $A_e$ and $B_e$ are independent of the element node numbers. So, we use $A$ and $B$ to denote them for all elements, and calculate them directly by Expression (77).
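For illustration, a minimal sketch of the element-to-node map described above (0-based indexing; the counter-clockwise corner assignment is one plausible convention under the image's column-priority numbering, assumed rather than taken from Expression (76)):

```python
import numpy as np

def build_connectivity(n1, n2):
    # Element-to-global node map T (0-based), column-priority numbering:
    # n1 pixel rows, n2 pixel columns; nodes are numbered down each column
    # of the (n1+1) x (n2+1) node grid. Adding 1 to the entries recovers
    # the paper's 1-based numbering.
    e = np.arange(n1 * n2)
    c, r = e // n1, e % n1              # pixel column and row of element e
    tl = c * (n1 + 1) + r               # top-left node of the pixel
    T = np.stack([tl + 1,               # local node 1: bottom-left corner
                  tl + (n1 + 1) + 1,    # local node 2: bottom-right corner
                  tl + (n1 + 1),        # local node 3: top-right corner
                  tl],                  # local node 4: top-left corner
                 axis=1)
    return T
```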
When a side of an element overlaps with the boundary of the domain $\Omega$, the influence of the boundary conditions in Expression (1) on the element stiffness matrix should be considered, so a $4 \times 4$ boundary matrix $S_e$ should be calculated. Assume that a side of the element overlaps with the boundary of $\Omega$; then the non-zero entries of $S_e$ can be calculated by Expression (78).
In Expression (78), the subscripts stand for the starting and end points of the side of the element, where the starting point is one element node of the side and the end point is determined along the side in counterclockwise order; the boundary coefficient is a constant, equal to the approximate value of $\sigma$ on the side. In this paper, we handle the influence of the boundary on the stiffness while assembling the global stiffness matrix: we simply replace the subscripts in Expression (78) with global node numbers, then directly use the entries to assemble the global stiffness matrix.
Because each element corresponds to a pixel, we can assume that its grayscale value is a constant $g_e$. In this way, a sample image can be expressed as the vector $g = (g_1, \dots, g_{N_e})$. Through substituting Expression (75) into Expression (70), the relationship between the element node vectors and the image grayscale values can be obtained, Expression (79), whose coefficients can be regarded as mapping coefficients from the image grayscale to the element node vector.
While using element stiffness matrices and element node vectors to assemble the global stiffness matrix and the global node vector , the functions for generating a sparse matrix in MATLAB R2020a or the Python 2.7 SciPy module can be used, and the input arguments include the row index vector, the column index vector, and the values of the non-zero entries. More importantly, these sparse matrix generation functions can sum the non-zero entries with the same indexes, which is consistent with the process of assembling and .
In order to convert the image grayscale vector $g$ to the global node vector, a $4 \times N_e$ matrix should first be calculated, whose $e$-th column corresponds to the element node vector of the $e$-th element. Then, this matrix is converted to a $4 N_e$-dimensional column vector in column-major order. Obviously, if we divide the components of this vector into groups of four in sequence, the $e$-th group corresponds to the element node vector of the $e$-th element. The vector can be calculated by Expression (80), where the reshape function converts the dimensions of a matrix while keeping the total number of entries unchanged.
Through the mapping matrix $T$, the position indexes of these components in the global node vector can be obtained. We transpose the $N_e \times 4$ matrix $T$ into a $4 \times N_e$ matrix, whose $e$-th column contains the global node numbers of the $e$-th element, and then convert it to a $4 N_e$-dimensional column vector in column-major order, Expression (81). This vector is the row index vector for generating the global node vector by a sparse matrix generation function. Since the global node vector has only one column, we use a $4 N_e$-dimensional column index vector with all components set to 1. Through substituting the values, the row indexes, and the column indexes into the sparse matrix generation function, we can yield the global node vector.
Similarly, the global stiffness matrix $K$ can be assembled by using the sparse matrix generation function. A value vector related to the element stiffness matrices should first be calculated by Expression (82), where the operator $\otimes$ denotes the Kronecker product of matrices, and $p$ and $q$ are the design variables. If the resulting array is divided into multiple blocks from left to right and each block is a $4 \times 4$ matrix, the $e$-th block is the calculation result of the first two terms of $K_e$ in Expression (54), without including $S_e$. Therefore, if the value vector is divided into blocks of 16 components in sequence, the $e$-th block corresponds to the 1-dimensional vector converted from the $e$-th element stiffness matrix in column priority. We then use row and column index vectors to locate the entries in the global stiffness matrix, which can be calculated by Expression (83).
As mentioned above, the constraint on the design boundary can generate additional stiffness for the adjacent elements. If we regard an element side overlapping with the boundary as a 2-node line element, then its stiffness matrix is a $2 \times 2$ matrix, which can be figured out by Expression (78). These line element stiffness matrices can likewise be assembled into the global stiffness matrix. While designing an image classifier based on the mutual-energy inner products, we set a fixed boundary for Expression (1), i.e., $u = 0$ on $\Gamma$. This boundary condition can be handled by adding a relatively large number to the diagonal entries of $K$ corresponding to the boundary node numbers. The sparse matrix generation function is used to implement this boundary condition. First, we set the dimension of the penalty value vector to the total number of the boundary nodes, and set all its components to the large number. Meanwhile, we let the row and column index vectors be identical, with their components set to the boundary node numbers. Finally, we concatenate the value vectors, the row index vectors, and the column index vectors, respectively, and input them into the sparse matrix generation function to obtain $K$.
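The following sketch implements this assembly with SciPy's `coo_matrix`, whose duplicate-summing behavior matches the superposition described above. The helper names and array layouts (`T`, `Ke_blocks`, `Fe_blocks`) are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.sparse import coo_matrix

def assemble_global(T, Ke_blocks, Fe_blocks, n_nodes, boundary_nodes,
                    penalty=1e12):
    """Assemble the global stiffness matrix K and load vector F.
    T: (Ne, 4) element-to-global node map (0-based); Ke_blocks: (Ne, 4, 4)
    element stiffness matrices; Fe_blocks: (Ne, 4) element node vectors;
    boundary_nodes: indexes receiving the large diagonal penalty."""
    # Row/column indexes of every element-stiffness entry; the pair order
    # (T[e,i], T[e,j]) matches Ke_blocks[e, i, j] raveled row-major.
    rows = np.repeat(T, 4, axis=1).ravel()
    cols = np.tile(T, (1, 4)).ravel()
    vals = Ke_blocks.ravel()
    # Fixed boundary: add a relatively large number on boundary diagonals.
    rows = np.concatenate([rows, boundary_nodes])
    cols = np.concatenate([cols, boundary_nodes])
    vals = np.concatenate([vals, np.full(len(boundary_nodes), penalty)])
    # coo_matrix sums duplicate (row, col) entries -- exactly the assembly.
    K = coo_matrix((vals, (rows, cols)), shape=(n_nodes, n_nodes)).tocsc()
    # Global node vector: duplicated node indexes are summed the same way.
    F = coo_matrix((Fe_blocks.ravel(), (T.ravel(), np.zeros(T.size, int))),
                   shape=(n_nodes, 1)).toarray().ravel()
    return K, F
```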
Based on Expressions (68) and (69), the gradients of the objective and the constraint can be efficiently obtained by using vectorized element-wise operations. For example, if we have two $N_d$-dimensional global node vectors $U$ and $V$, we can adopt fancy indexing to generate two $N_e \times 4$ matrices $W_U$ and $W_V$ whose $e$-th rows correspond to the node values of the $e$-th element. According to Expression (69), the objective function gradients with respect to $p$ and $q$ can be calculated by Expression (84), in which one operation multiplies the corresponding entries of two matrices and another sums the rows of a matrix to obtain a column vector. Mathematically, Expression (84) can be written as $\mathrm{diag}(W_U A W_V^{\mathsf T})$ and $\mathrm{diag}(W_U B W_V^{\mathsf T})$, where $\mathrm{diag}(\cdot)$ extracts the main diagonal entries from a square matrix. Similarly, the constraint function gradients can be calculated by replacing $U$ and $V$ with the node vectors appearing in the constraint.
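A compact sketch of this gradient computation (the function name is hypothetical; `T`, `A`, `B` are the connectivity and coefficient matrices defined above):

```python
import numpy as np

def mutual_energy_gradients(U, V, T, A, B):
    """Per-element gradients of the discrete mutual energy U^T K V with
    respect to the element design variables p_e and q_e, i.e. the diagonals
    of W_U A W_V^T and W_U B W_V^T, computed without forming those products."""
    WU, WV = U[T], V[T]                            # fancy indexing, Ne x 4
    grad_p = np.einsum('ei,ij,ej->e', WU, A, WV)   # diag(WU @ A @ WV.T)
    grad_q = np.einsum('ei,ij,ej->e', WU, B, WV)   # diag(WU @ B @ WV.T)
    return grad_p, grad_q
```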
6.2. Image Classifier
For a given training dataset $X$, in order to use Optimization Algorithm 1 to construct the mutual-energy inner product coordinate axes, we select a subset of $X$ as the reference training set and select the mean of the samples of the class “0” or the class “1” in the subset, or a combination of these means, as the reference feature. The subsets are generated gradually as the coordinates are generated. Prior to generating the $k$-th coordinate, $k - 1$ mutual-energy inner product coordinate axes have been generated and there are $k$ subsets; one of them is selected to generate the $k$-th coordinate. In order to explain how the generation of new axes works, we use a managed set to hold the generated subsets. Among them, the subset taken as the reference training sample set to generate the next coordinate is the one whose numbers of class-“0” and class-“1” samples satisfy Expression (85).
After determining the subset and the reference feature, the new coordinate axis can be obtained by Optimization Algorithm 1. Next, we divide the selected subset into two subsets. First, for each sample in the subset, we calculate its coordinate component on the new axis through the mutual-energy inner product; we calculate the means of the components of the class-“0” and class-“1” samples and set a threshold between them. Second, according to the threshold, we divide the subset into two subsets, one with components below the threshold and one with components above it. Finally, we add the two new subsets into the managed set, and delete the original subset from it. At this time, the managed set contains one more training sample subset than before, and one of its members will be selected to calculate the next coordinate axis.
The following summarizes the detailed steps of generating the mutual-energy inner product feature coordinates; a code sketch follows the list.
Algorithm 2: Mutual-energy inner product feature coordinate generation
- (1)
Initialize the managed set with the full training dataset and set the coordinate index to 1;
- (2)
According to Expression (85), select a subset from the managed set to generate the next coordinate axis, and delete it from the managed set;
- (3)
Adopt Optimization Algorithm 1 to calculate the coordinate axis based on the selected reference subset and the selected reference feature;
- (4)
For each sample in the selected subset, calculate its coordinate component on the new axis, the means of the components of the class “0” and the class “1”, as well as the threshold;
- (5)
According to the threshold, divide the selected subset into two subsets, and add them into the managed set;
- (6)
If the required number of coordinates has not been reached, increase the coordinate index and go to Step (2); otherwise, stop.
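A minimal skeleton of this loop. The helper names (`pick_subset` implementing Expression (85), `run_algorithm1` for Optimization Algorithm 1, `mutual_energy_coordinate` for the axis projection) and the midpoint threshold are illustrative assumptions.

```python
import numpy as np

def generate_feature_axes(X, y, n_axes):
    """Algorithm 2 sketch: grow mutual-energy feature axes by repeatedly
    optimizing on the chosen subset and splitting it at a threshold."""
    subsets = [(X, y)]                        # the managed set of subsets
    axes = []
    for _ in range(n_axes):
        j = pick_subset(subsets)              # subset selected per Eq. (85)
        Xs, ys = subsets.pop(j)
        axis = run_algorithm1(Xs, ys)         # Optimization Algorithm 1
        axes.append(axis)
        t = mutual_energy_coordinate(axis, Xs)      # components on new axis
        thresh = 0.5 * (t[ys == 0].mean() + t[ys == 1].mean())
        below, above = t <= thresh, t > thresh
        subsets.append((Xs[below], ys[below]))      # two new subsets join
        subsets.append((Xs[above], ys[above]))      # the managed set
    return axes
```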
After generating the mutual-energy inner product coordinate axes by Algorithm 2, the coordinate components of each sample in $X$ can be calculated and represented by a feature vector $z$. Based on $z$, a simple Gaussian classifier is used to classify the images. We use a training dataset comprising labeled samples, where the label is the class index of each sample. A Gaussian classifier can be used to classify the samples into multiple classes; we use $c$ to indicate the class of a sample and denote the total number of classes. The prior probability of the class $c$ is given by Expression (86). Furthermore, it is assumed that, for the samples in the same class, their feature vectors $z$ follow the Gaussian distribution of Expression (87), where the mean vector and the covariance matrix correspond to the class $c$. Using the training sample dataset, their maximum likelihood estimates can be calculated by Expression (88) [38]. Based on Expressions (86) and (87), when the feature vector of a sample is given, the posterior probability of the sample belonging to the class $c$ is given by Expression (89). Finally, the class of the sample is determined by maximizing the posterior probability.
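A sketch of this classifier in code, using the standard maximum-likelihood estimates and log-posterior scoring; the small ridge term `reg` is an illustrative numerical safeguard, not part of the paper.

```python
import numpy as np

def fit_gaussian_classifier(Z, y, n_classes, reg=1e-6):
    """Fit class priors, means, and covariances on feature vectors Z (m x d)
    with integer labels y, as in Expressions (86)-(88)."""
    priors, means, covs = [], [], []
    for c in range(n_classes):
        Zc = Z[y == c]
        priors.append(len(Zc) / len(Z))        # class prior
        means.append(Zc.mean(axis=0))          # ML mean
        d = Zc - means[-1]
        covs.append(d.T @ d / len(Zc) + reg * np.eye(Z.shape[1]))  # ML cov
    return np.array(priors), np.array(means), np.array(covs)

def predict(Z, priors, means, covs):
    """Assign each sample to the class with the largest posterior,
    computed via log densities for numerical stability."""
    scores = []
    for pi, mu, S in zip(priors, means, covs):
        d = Z - mu
        _, logdet = np.linalg.slogdet(S)
        maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
        scores.append(np.log(pi) - 0.5 * (logdet + maha))
    return np.argmax(np.array(scores), axis=0)
```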
6.3. Numerical Examples
The MNIST dataset has become one of the benchmark datasets in machine learning. It comprises 60,000 sample images in the training set and 10,000 sample images in the test set, and each one is a 28-by-28-pixel grayscale image of the handwritten digits 0–9. In this section, we will use the MNIST to design Gaussian image classifiers based on Optimization Algorithm 1.
Before designing the Gaussian image classifiers, image preprocessing is conducted to align the image centroids and normalize the sample images. In Optimization Algorithm 1, the optimization constants (the weight of the mean and deviation terms, the totals and lower bounds of the design variables, the moving limits, and the minimum increments) are set to fixed values.
6.3.1. Binary Gaussian Classifier: Identify Digits “0” and “1”
The MNIST training set comprises 6742 samples of “1” and 5923 samples of “0”. We select the difference between the means of samples “1” and “0” as the reference feature. Optimization Algorithm 1 converges after 166 iterations. The means of samples “1” and “0”, the design variables, and the reference feature coordinate are visualized in Figure 1, Figure 2 and Figure 3. Due to obvious differences in the mean feature, digits “0” and “1” can be identified using only one mutual-energy inner product coordinate.
Figure 4a shows the training sample distribution in accordance with the components on the feature coordinate.
Figure 5a gives the Confusion Matrix of the classification results, where the horizontal and vertical axes correspond to the target class and the output class of the classifier, respectively. In the Confusion Matrix, the column on the far right shows the precision over all the examples predicted to belong to each class, and the row at the bottom shows the recall over all the examples belonging to each class; the entry in the bottom right shows the overall accuracy; the diagonal entries are the numbers of correctly classified digits “0” and “1”, and the off-diagonal entries correspond to the wrong classifications. This binary Gaussian classifier achieves a very high overall accuracy of 99.66% on the training set, shown at the bottom right of the Confusion Matrix.
The binary Gaussian classifier is tested on the MNIST test set, which comprises 1135 samples of “1” and 980 samples of “0”. The test results are visualized in Figure 4b and Figure 5b. The overall accuracy reaches 99.91%, higher than that on the training set.
6.3.2. Binary Gaussian Classifier: Identify Digits “0” and “2”
The MNIST training set comprises 5958 samples of “2” and 5923 samples of “0”, and the MNIST test set comprises 1032 samples of “2” and 980 samples of “0”. Similarly to the previous classifier, the reference feature is selected as the difference between the class means. The difference in the mean features of digits “2” and “0” is not as significant as that of digits “1” and “0”. If only one mutual-energy inner product coordinate is used for classification, the accuracy is only 96.72% on the training set and 97.81% on the test set. In order to improve the classification accuracy, we use Algorithm 2 to generate 60 mutual-energy inner product coordinates based on the training sample set and its subsets, and construct a 60-dimensional Gaussian classifier. The Confusion Matrices of the classification results are given in Figure 6a,b, showing an overall accuracy of 99.55% on the training set and a higher overall accuracy of 99.85% on the test set.
6.3.3. Binary Gaussian Classifier: Identify Digits “3” and “4”
The MNIST training set comprises 6131 samples of “3” and 5842 samples of “4”, and the MNIST test set comprises 1010 samples of “3” and 982 samples of “4”. Here, we select the means of samples “3” and “4” as two reference features, and then use Algorithm 2 to generate 50 mutual-energy inner product coordinates for each, finally forming 100 classification coordinates. Because these coordinates are not linearly independent, we use matrix singular value decomposition to construct a 50-dimensional Gaussian classifier.
Figure 6c,d shows the Confusion Matrices, with an overall accuracy of 99.67% on the training set and a higher overall accuracy of 99.80% on the test set.
6.3.4. Multiclass Gaussian Classifier: Identify Digits “0”, “1”, “2”, “3” and “4”
In the training set, we select one digit from the samples “0”, “1”, “2”, “3” and “4” as the first class and the remaining training samples of the five digits as the second class, and we take the two classes as a training sample set. Then, we select the difference between the means of the samples in the two classes as the reference feature, and use Algorithm 2 to generate 120 mutual-energy inner product coordinates. In this way, we construct 5 training sample sets and finally generate 600 coordinates. However, many of them are linearly dependent; in order to identify the digits “0”, “1”, “2”, “3” and “4”, we use matrix singular value decomposition to reduce the dimensionality from 600 to 60 and construct a 60-dimensional multiclass Gaussian classifier.
Figure 7 shows an overall accuracy of 98.22% on the training set and a higher overall accuracy of 98.83% on the test set.
7. Discussion
Based on the solution space of the partial differential equations describing the vibration of a non-uniform membrane, the concept of the mutual-energy inner product is defined. By expanding the mutual-energy inner product as a superposition of the eigenfunctions of the partial differential equations, an important property is found: compared to the Euclidean inner product, the mutual-energy inner product has the significant advantage of enhancing the low-frequency eigenfunction components and suppressing the high-frequency eigenfunction components.
In data classification, if the reference data features of the samples belong to a low-frequency subspace of the set of the eigenfunctions, these data features can be extracted through the mutual-energy inner product, which can not only enhance feature information but also filter out high-frequency data noise. As a result, a mutual-energy inner product optimization model is built to extract the feature coordinates of the samples, which can enhance the data features, reduce the sample deviations, and regularize the design variables. We make use of the minimum energy principle to eliminate the constraints of the partial differential equations in the optimization model and obtain an unconstrained optimization objective function. The objective function is a quadratic functional, which is convex with respect to the variables that minimize the objective function, is concave with respect to the variables that maximize the objective function, and is linear with respect to the design variables. These properties facilitate the design of optimization algorithms.
FEM is used to discretize the design domain, and the design variables of each element are set as constants. Based on these finite elements, the gradients of the mutual-energy inner product with respect to the element design variables are analyzed, and a sequential linearization algorithm is constructed to solve the mutual-energy inner product optimization model. The algorithm implementation only involves solving equations with a positive definite symmetric matrix when calculating the intermediate variables, and it only needs to handle a few constraints in the nested linear optimization module, guaranteeing the stability and effectiveness of the algorithm.
The mutual-energy inner product optimization model is applied to extract the feature coordinates of the sample images and construct a low-dimensional coordinate system to represent the sample images. Multiclass Gaussian classifiers are trained and tested to classify the 2-D images. Here, only the means of the training sample set and its subsets are selected as reference features in Optimization Algorithm 1, and the vectorized implementation of Optimization Algorithm 1 is discussed. Generating mutual-energy inner product coordinates via the optimization model and training or testing Gaussian classifiers are two independent steps. In training or testing Gaussian classifiers, calculating mutual-energy inner products can be converted into calculating the Euclidean inner products between the reference feature coordinates and the sample data, not adding computational complexity to the Gaussian classifiers.
On the MNIST dataset, the mutual-energy inner product feature coordinate extraction method is used to train a 1-dimensional two-class Gaussian classifier, a 50-dimensional two-class Gaussian classifier, a 60-dimensional two-class Gaussian classifier, and a 60-dimensional five-class Gaussian classifier, and good prediction results are achieved. The feature coordinate extraction method achieves a higher overall accuracy on the test set than on the training set, indicating that the classification model is underfitting. This suggests a large, not yet explored potential in the achievable accuracy of this method.
From the viewpoint of theory and algorithms, this feature extraction method is obviously different from the existing techniques in machine learning. Its limitation is the need for reference features given in advance. In this paper, only the mean features of a sample dataset and its subsets are selected as the reference features to construct Gaussian classifiers. In the future, convolution operations can be adopted to construct other image reference features, such as image edge features, local features, textures [39], and multi-scale features, and these image features can be combined to generate a mutual-energy inner product feature coordinate system. In addition, other ensemble classifiers, such as Bagging and AdaBoost, can be introduced to improve the performance of the image classifiers. Meanwhile, the feasibility of applying the mutual-energy inner product optimization method to neural networks will also be explored.
39], and multi-scale features, and these image features can be combined to generate a mutual-energy inner product feature coordinate system. In addition, other ensemble classifiers, such as Bagging and AdaBoost, can be introduced to improve the performances of the image classifiers. Meanwhile, the feasibility of applying the mutual-energy inner product optimization method to the neural network will also be explored.