1. Introduction
In machine learning, data classification plays a very important role. Up to now, a large number of data classification methods have emerged and powered the development of machine learning and its practical applications in different domains [1], such as image detection [2], speech recognition [3], text understanding [4], disease diagnosis [5,6], and financial prediction [7].
Currently, popular data classification methods include the Support Vector Machine (SVM) [8,9], Decision Tree (DT) [10,11], Naive Bayes (NB) [12], K-Nearest Neighbors (KNN) [13], Random Forest (RF) [14], Deep Learning (DL) [15], and Deep Reinforcement Learning (DRL) [16]. SVM is based on optimization theory [17]. DL is implemented through a multilayer neural network under the guidance of optimization techniques, such as the stochastic gradient descent algorithm [18]. DRL combines DL with Reinforcement Learning, and it is effective in real-time scenarios [16]. The others fall under the heading of statistical methods [19,20].
Many comparative studies have evaluated these classification methods by analyzing their accuracy, time cost, stability, and sensitivity, as well as their advantages and disadvantages [21,22,23]. SVM is efficient when there is a clear margin of separation between the classes, but the choice of its kernel function is difficult, and it does not work well with noisy datasets [23,24]. DL is developing rapidly, but its training is a very time-consuming process because a large number of parameters need to be optimized through the stochastic gradient descent algorithm. In addition, some hyperparameters in DL are set empirically, such as the number of layers in the neural network, the number of nodes in each layer, and the learning rate, so the performance is highly sensitive to the hyperparameters and the specific problem [25,26]. KNN and DT are easy to apply. However, KNN requires calculating the Euclidean distance from a query point to all the training points, leading to a high computation cost. DT is unsuitable for continuous variables, and it has a problem of overfitting [23]. Other classical methods have also achieved great success [27,28]. In order to improve classification accuracy, an ensemble learning scheme, such as AdaBoost [29,30], Bagging [31], Stacking [32] or Gradient Boosting [33], is usually adopted to solve an intricate or large-scale problem [34,35].
Inspired by the ability of our brain to recognize the musical notes played by any musical instrument in a noisy environment, this paper proposes an optimization method for constructing feature coordinates for data classification by simulating a non-uniform membrane structure model. No matter how complex a musical instrument's structure is, or how different its vibration patterns are, when we listen to a piece of music played by an instrument, our brain can extract the fundamental tone of its vibration at every moment, and can recognize the melody as time goes by. Mathematically, this can be clearly explained: the vibration of the musical instrument at every moment is adaptively expanded on its own eigenfunction system, and our brain grasps the lowest eigenvalue and its eigenfunction components, corresponding to the musical notes, at every moment, enjoying the melody over time. In order to extract the data features from complex samples, we simulate the adaptive generation of the eigenfunction coordinate system of a musical instrument and build a mapping from data features to the low-frequency subspace of the eigenfunction system. Through analyzing the solution space and the eigenfunctions of the partial differential equations describing the vibration of a non-uniform membrane, which is a simple musical instrument, the mutual-energy inner product is defined and used to extract data features. The introduction of the mutual-energy inner product not only avoids generating an eigenfunction system, reducing the computational complexity, but also enhances the feature information and filters out data noise; furthermore, it simplifies the training of the data classifier.
The full paper is divided into seven sections. Section 1 briefly introduces popular data classification methods and the research background. Section 2 analyzes the solution space of the partial differential equations describing a non-uniform membrane, and defines the concept of the mutual-energy inner product. In Section 3, by making use of the eigenvalues and the eigenfunctions of the non-uniform membrane vibration equations, the mutual-energy inner product is expressed as a series of eigenfunctions, and its potential in data classification for enhancing feature information and filtering out data noise is pointed out. Section 4 builds a mutual-energy inner product optimization model and discusses the convexity and concavity properties of its objective function. Section 5 designs a sequential linearization algorithm to solve the optimization model by combining it with the finite element method (FEM). In Section 6, the mutual-energy inner product optimization method for constructing feature coordinates is applied to a 2-D image classification problem, and numerical examples are given in combination with Gaussian classifiers and the handwritten digit MNIST dataset. In Section 7, we summarize the full paper and introduce the future scope of the work.
  2. Mutual-Energy Inner Product
Consider the linear partial differential equations

$$ L u = f \ \text{in } \Omega, \qquad B u = 0 \ \text{on } \Gamma \qquad (1) $$

where $L$ is a homogeneous linear self-adjoint differential operator; $f$ is a piecewise continuous function; $\Omega$ is the domain of definition, with a boundary $\Gamma$; and $B$ is a homogeneous linear differential operator on the boundary $\Gamma$, describing the Robin boundary condition.

$$ -\nabla \cdot \left( p \nabla u \right) + q u = f \ \text{in } \Omega, \qquad p \frac{\partial u}{\partial n} + \alpha u = 0 \ \text{on } \Gamma \qquad (2) $$

Expression (2) can be regarded as the static equilibrium equations of a simple elastic structure, such as a 1-D string or a 2-D membrane, and can be extended to an n-dimensional problem. For a 2-dimensional problem, $\Omega$ is the domain occupied by a membrane with boundary $\Gamma$; $p$ and $q$ stand for the elastic modulus and the distributed support elastic coefficient of the membrane, respectively; $\alpha$ is the support elastic coefficient on the boundary; $f$ is an external force acting on the membrane; $u$ is the deformation of the membrane due to $f$, and has piecewise continuous first-order derivatives; $\partial u / \partial n$ is the derivative of the deformation in the outward-pointing normal direction of $\Gamma$. In this research, it is required that $p$, $q$, $\alpha$ are piecewise continuous functions, and $p > 0$, $q \ge 0$, $\alpha \ge 0$.
A structure subjected to an external force $f_1$ will generate the deformation $u_1$, and its deformation energy $E(u_1)$ can be expressed as

$$ E(u_1) = \frac{1}{2} \int_\Omega f_1 u_1 \, d\Omega \qquad (3) $$

If the structure is simultaneously subjected to another external force $f_2$, then it will generate an additional deformation $u_2$. The total deformation $u_1 + u_2$ satisfies the superposition principle due to the linearity of Expression (1). The deformation $u_2$ causes additional work to be performed by $f_1$. Generally, this additional deformation energy $E(u_1, u_2)$ is called the mutual energy between $u_1$ and $u_2$, or the mutual work between $f_1$ and $f_2$. The mutual energy describes the correlation of the two external forces, and can be expressed as

$$ E(u_1, u_2) = \int_\Omega f_1 u_2 \, d\Omega \qquad (4) $$

Substituting Expression (1) into Expression (4) and integrating by parts, we obtain

$$ E(u_1, u_2) = \int_\Omega \left( p \, \nabla u_1 \cdot \nabla u_2 + q \, u_1 u_2 \right) d\Omega + \int_\Gamma \alpha \, u_1 u_2 \, d\Gamma \qquad (5) $$

Expression (5) is a bilinear functional. Comparing Expressions (3) and (4), we have

$$ E(u_1) = \frac{1}{2} E(u_1, u_1) \qquad (6) $$
Due to $p > 0$, $q \ge 0$, $\alpha \ge 0$, according to Expressions (5) and (6), the mutual energy satisfies

$$ E(u_1, u_1) \ge 0, \qquad E(u_1, u_1) = 0 \iff u_1 = 0 \qquad (7) $$

Expression (7) describes a simple physical phenomenon: when the elastic modulus of the structural material is positive, if the structure deforms, deformation energy is generated; otherwise, the deformation energy is zero.
Expression (5) also shows that the mutual energy is symmetric and satisfies the commutative law, $E(u_1, u_2) = E(u_2, u_1)$. Combined with Expression (7), it can be inferred that the mutual energy satisfies the Cauchy–Schwarz inequality

$$ E(u_1, u_2)^2 \le E(u_1, u_1) \, E(u_2, u_2) \qquad (8) $$

Expressions (7) and (8) show that the mutual energy can be regarded as an inner product of the structural deformation functions. For simplicity, we use $\langle u_1, u_2 \rangle_E$ and $\langle u_1, u_2 \rangle$ to represent the mutual-energy inner product and the Euclidean inner product, respectively; that is,

$$ \langle u_1, u_2 \rangle_E = E(u_1, u_2), \qquad \langle u_1, u_2 \rangle = \int_\Omega u_1 u_2 \, d\Omega \qquad (9) $$
We define $\| u \|_E = \sqrt{\langle u, u \rangle_E}$ as the norm derived from the mutual-energy inner product, and $\| u \| = \sqrt{\langle u, u \rangle}$ as the norm derived from the Euclidean inner product. Based on Expression (6), $\| u \|_E$ satisfies

$$ \| u \|_E = \sqrt{2 E(u)} \qquad (10) $$

$\| u \|_E$ is proportional to the square root of the deformation energy, and is also called the energy norm. According to the Cauchy–Schwarz inequality (8), $\| u \|_E$ satisfies the triangle inequality

$$ \| u_1 + u_2 \|_E \le \| u_1 \|_E + \| u_2 \|_E \qquad (11) $$
Based on Expression (1), when a structure is subjected to a piecewise continuous external force, its deformation function has piecewise continuous first-order derivatives on the domain $\Omega$ and satisfies the boundary condition $Bu = 0$. The set of these deformation functions spans a space $H$, which can be equipped either with the Euclidean inner product $\langle \cdot, \cdot \rangle$ or with the mutual-energy inner product $\langle \cdot, \cdot \rangle_E$.
In addition, applying the variational principle, Expression (1) can also be rewritten as the minimum energy principle expression

$$ u = \arg\min_{v} \Pi(v), \qquad \Pi(v) = \frac{1}{2} \int_\Omega \left( p |\nabla v|^2 + q v^2 \right) d\Omega + \frac{1}{2} \int_\Gamma \alpha v^2 \, d\Gamma - \int_\Omega f v \, d\Omega \qquad (12) $$

Here, the feasible domain of $v$ consists of functions with piecewise continuous first-order derivatives on $\Omega$, which do not need to satisfy homogeneous boundary conditions.
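To make the mutual-energy machinery tangible, the following minimal sketch (not from the original paper; the 1-D string setting, the finite-difference discretization, and all names are illustrative assumptions) builds a discrete analogue of Expressions (1)–(7) and checks the symmetry and positivity that justify treating the mutual energy as an inner product.

```python
import numpy as np

n, h = 99, 1.0 / 100
p = np.ones(n + 1)               # elastic modulus p > 0 (half-grid values)
q = 0.5 * np.ones(n)             # distributed support coefficient q >= 0

# Tridiagonal stiffness matrix of -(p u')' + q u = f with fixed ends (u = 0).
K = np.zeros((n, n))
for i in range(n):
    K[i, i] = (p[i] + p[i + 1]) / h + q[i] * h
    if i + 1 < n:
        K[i, i + 1] = K[i + 1, i] = -p[i + 1] / h

f1, f2 = np.random.rand(n), np.random.rand(n)
F1, F2 = h * f1, h * f2          # lumped load vectors
u1 = np.linalg.solve(K, F1)      # deformation caused by f1
u2 = np.linalg.solve(K, F2)      # deformation caused by f2

E12, E21 = F1 @ u2, F2 @ u1      # discrete mutual energy, Expression (4)
assert np.isclose(E12, E21)      # symmetry, as in Expression (5)
assert F1 @ u1 > 0               # positivity, as in Expression (7)
```

Because the discrete stiffness matrix is symmetric positive definite, the mutual energy $F_1^T K^{-1} F_2$ inherits exactly the inner product properties derived above.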
  3. Signal Processing Property of Mutual-Energy Inner Product
The eigenequation of $L$ can be written as

$$ L \varphi = \lambda \varphi \ \text{in } \Omega, \qquad B \varphi = 0 \ \text{on } \Gamma \qquad (13) $$

For Expression (13), its non-zero solutions $\varphi$ and the corresponding coefficients $\lambda$ are called eigenfunctions and eigenvalues, respectively. These eigenfunctions and eigenvalues have the following properties, due to $p > 0$, $q \ge 0$, $\alpha \ge 0$ [36].
- (1) Expression (13) has infinitely many eigenvalues $\lambda_k$ and eigenfunctions $\varphi_k$, $k = 1, 2, \ldots$. If all the eigenvalues are ranked as $\lambda_1 \le \lambda_2 \le \cdots$, then they satisfy $\lambda_k > 0$ and $\lambda_k \to \infty$ as $k \to \infty$. Meanwhile, $\lambda_k$ depends continuously on $p$, $q$ and $\alpha$, and increases with the increase in $p$, $q$ and $\alpha$.
- (2) The normalized eigenfunctions $\varphi_k$ satisfy the orthogonality condition (14), and can form a set of orthogonal and complete basis functions to span the deformation function space $H$.

$$ \langle \varphi_i, \varphi_j \rangle = \int_\Omega \varphi_i \varphi_j \, d\Omega = \delta_{ij} \qquad (14) $$

Therefore, the solutions of Expression (1) can be expressed by the eigenfunctions. For $u_1 \in H$, $u_1$ can be presented as a series of eigenfunctions satisfying absolute and uniform convergence, i.e.,

$$ u_1 = \sum_{k=1}^{\infty} a_k \varphi_k, \qquad a_k = \langle u_1, \varphi_k \rangle \qquad (15) $$

Expression (15) has a profound physical meaning. $\sqrt{\lambda_k}$ and $\varphi_k$ are the $k$-th order structural natural frequency and the $k$-th order vibration mode. If $u_1$ is regarded as a vibration amplitude function, it can be decomposed into a superposition of the vibration modes at each order of natural frequency, where the coefficient $a_k$ is the vibration magnitude at $\sqrt{\lambda_k}$. This is equivalent to spectral decomposition. Imagine such a scene: when we enjoy a piece of music, our brain constantly decomposes the instantaneous vibration amplitude according to Expression (15), and meanwhile perceives the vibration coefficients $a_k$ and marks them with $\sqrt{\lambda_k}$. For a musical instrument, $\lambda_1$ corresponds to its fundamental frequency (tone) and the remaining eigenvalues to overtones. Different musical instruments have different vibration patterns, and their eigenfunctions $\varphi_k$ are also different. However, after tuning the different musical instruments, the fundamental frequency of each note is consistent.
The eigenfunctions and eigenvalues satisfy Expression (13), so we have

$$ L \varphi_k = \lambda_k \varphi_k \qquad (16) $$

Multiplying both sides of Expression (16) by $\varphi_j$ and integrating by parts, we can yield

$$ \langle \varphi_k, \varphi_j \rangle_E = \lambda_k \langle \varphi_k, \varphi_j \rangle = \lambda_k \delta_{kj} \qquad (17) $$

Expression (17) shows that the eigenfunctions also satisfy the orthogonality condition with respect to the mutual-energy inner product. So, these eigenfunctions $\varphi_k$ can also be used as basis functions to span the mutual-energy inner product space.
Substituting Expression (15) into Expression (5) and applying Expression (17), we have

$$ \langle u_1, u_2 \rangle_E = \sum_{k=1}^{\infty} \lambda_k a_k b_k \qquad (18) $$

where $b_k$ is the expansion coefficient of $u_2$ (see Expression (20) below).
If $u_1$ satisfies the normalization condition $\| u_1 \| = 1$ or $\langle u_1, u_1 \rangle = 1$, then, based on Expressions (14) and (18), the eigenvalue $\lambda_1$ satisfies

$$ \lambda_1 = \min_{\langle u_1, u_1 \rangle = 1} \langle u_1, u_1 \rangle_E \qquad (19) $$

where the optimal solution of $u_1$ is the eigenfunction $\varphi_1$.
Similarly, the deformation $u_2$ caused by $f_2$ can be expressed as

$$ u_2 = \sum_{k=1}^{\infty} b_k \varphi_k \qquad (20) $$

where $b_k$ is the amplitude coefficient and can be interpreted as the component of $u_2$ at the $k$-th vibration mode $\varphi_k$. Substituting Expression (20) into Expression (12) and using the orthogonality condition (17), we have

$$ \Pi(u_2) = \frac{1}{2} \sum_{k=1}^{\infty} \lambda_k b_k^2 - \sum_{k=1}^{\infty} c_k b_k \qquad (21) $$

where the coefficient $c_k$ is the projection of $f_2$ on $\varphi_k$ with respect to the Euclidean inner product

$$ c_k = \langle f_2, \varphi_k \rangle \qquad (22) $$

Setting the derivative of $\Pi(u_2)$ in Expression (21) with respect to $b_k$ to zero, we have

$$ b_k = \frac{c_k}{\lambda_k} = \frac{\langle f_2, \varphi_k \rangle}{\lambda_k} \qquad (23) $$

According to the series representation of $u_1$ in Expression (15), if $u_1$ is the deformation caused by $f_1$, the coefficient $a_k$ satisfies

$$ a_k = \frac{\langle f_1, \varphi_k \rangle}{\lambda_k} \qquad (24) $$

Substituting Expressions (15), (20), (23) and (24) into Expression (5) and using the orthogonality condition (17), we have

$$ \langle u_1, u_2 \rangle_E = \sum_{k=1}^{\infty} \lambda_k a_k b_k = \sum_{k=1}^{\infty} \frac{\langle f_1, \varphi_k \rangle \langle f_2, \varphi_k \rangle}{\lambda_k} \qquad (25) $$
Generally speaking, the external force $f \notin H$, because $f$ does not satisfy the homogeneous boundary conditions, i.e., $Bf \ne 0$ on $\Gamma$. In this case, $\sum_k \langle f, \varphi_k \rangle \varphi_k$ is equal to the projection of $f$ on $H$, or the optimal approximation of $f$ in $H$. Of course, in order to make $f \in H$, we may expand the design domain and simplify the boundary condition. For example, after expanding the design domain, we can set a fixed boundary and let $u|_\Gamma = 0$, or set a mirror boundary and let $\partial u / \partial n |_\Gamma = 0$. In these cases, $f \in H$ and $f = \sum_k \langle f, \varphi_k \rangle \varphi_k$. Then, applying the orthogonality condition (14) yields

$$ \langle f_1, f_2 \rangle = \sum_{k=1}^{\infty} \langle f_1, \varphi_k \rangle \langle f_2, \varphi_k \rangle \qquad (26) $$
After $f_1$ and $f_2$ are expressed as superpositions of the eigenfunctions of the operator $L$, comparing the mutual-energy inner product $\langle u_1, u_2 \rangle_E$ in Expression (25) with the Euclidean inner product $\langle f_1, f_2 \rangle$ in Expression (26), it can be found that the mutual-energy inner product has the advantage of enhancing the low-frequency coordinate components (those with small $\lambda_k$) and suppressing the high-frequency coordinate components (those with large $\lambda_k$), since each modal product is weighted by $1/\lambda_k$. In other words, if $f_1$ and $f_2$ are regarded as signals, the mutual-energy inner product can augment the low-frequency eigenfunction components and filter out the high-frequency eigenfunction components of the signals, with the help of a structural model.
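A quick numerical illustration of this filtering property (added here as a sketch; the 1-D stiffness matrix and all names are assumptions, not the paper's setup): expanding two discrete "forces" in the eigenvectors of a stiffness matrix shows that the mutual-energy inner product of Expression (25) weights each modal product by $1/\lambda_k$, unlike the Euclidean inner product of Expression (26).

```python
import numpy as np

n, h = 99, 1.0 / 100
main = 2.0 / h + 0.5 * h              # p = 1, q = 0.5, fixed ends
K = (np.diag(np.full(n, main)) +
     np.diag(np.full(n - 1, -1.0 / h), 1) +
     np.diag(np.full(n - 1, -1.0 / h), -1))

lam, Phi = np.linalg.eigh(K)          # discrete analogue of Expression (13)
f1, f2 = np.random.rand(n), np.random.rand(n)
c1, c2 = Phi.T @ f1, Phi.T @ f2       # modal coefficients <f, phi_k>

euclid = f1 @ f2                      # Expression (26): sum of c1_k * c2_k
mutual = float(np.sum(c1 * c2 / lam)) # Expression (25): weighted by 1/lambda_k
# High-frequency modes (large lambda_k) barely contribute to `mutual`,
# while low-frequency modes dominate -- the low-pass filtering property.
```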
  4. Mutual-Energy Inner Product Optimization Model for Feature Extraction
Assume that $X$ is a training dataset with $M$ samples, each sample represented as $x$, while $y$ represents the class label. For example, if the samples are divided into two classes, $X$ includes two subsets $X_1$ and $X_2$, where $X = X_1 \cup X_2$. Generally, the samples in different classes are assumed to be random variables that are independent and identically distributed within each class.
We hope to find an appropriate feature coordinate system to represent $X$ and use fewer coordinate components to classify the samples. If there is no further information, we may select the means of the probability distributions of $X_1$ and $X_2$ as reference features. In order to design a feature extraction model, two points should be considered: one is to enhance the feature information, and the other is to suppress the effect of random noise. We resort to a structural model and use the mutual-energy inner product to extract the features. Its main idea is to map the data features to a low-frequency eigenfunction subspace of the structural model.
If $\bar f_1$ and $\bar f_2$ are used to represent the means of the probability distributions of $X_1$ and $X_2$, respectively, their unbiased estimates can be written as

$$ \bar f_1 = \frac{1}{M_1} \sum_{x \in X_1} x, \qquad \bar f_2 = \frac{1}{M_2} \sum_{x \in X_2} x \qquad (27) $$

We regard the samples $x$, $\bar f_1$ and $\bar f_2$ as external forces acting on the structural model, and use $u_x$, $\bar u_1$ and $\bar u_2$ to represent their corresponding deformations, respectively. If we represent the selected reference feature as $f_r$ and its deformation as $u_r$, we can use the mutual-energy inner product $\langle u_x, u_r \rangle_E$ to extract the feature coordinate component of a sample $x$. In order to construct the feature extraction optimization model, we first select $\bar f_1$ as the reference feature $f_r$ and try to explore the physical meaning of the structural model when $\langle \bar u_1, u_r \rangle_E$ is at its maximum, at its minimum, or equal to zero.
In order to enhance the feature information of the samples in $X_1$, a high statistical mean value of the extracted feature components should be given

$$ \frac{1}{M_1} \sum_{x \in X_1} \langle u_x, u_r \rangle_E = \langle \bar u_1, u_r \rangle_E \qquad (28) $$

with a primary objective

$$ \max_{p,\,q} \ \langle \bar u_1, u_r \rangle_E \qquad (29) $$

In Expression (29), the mutual-energy inner product and the deformations are functions of $p$ and $q$, and its physical meaning is not intuitive. So, next, we conduct a quantitative analysis to reveal the structural characteristics hidden in Expression (29).
According to the minimum energy principle (12), if the optimal solution $\bar u_1$ is obtained, the derivative of the objective at the optimal solution in any direction $\delta u$ is zero, satisfying

$$ \left. \frac{d}{d\varepsilon} \Pi(\bar u_1 + \varepsilon \, \delta u) \right|_{\varepsilon = 0} = 0 \qquad (30) $$

Through calculating Expression (30), we obtain the relationship between $\bar u_1$ and $\bar f_1$:

$$ \int_\Omega \left( p \, \nabla \bar u_1 \cdot \nabla \delta u + q \, \bar u_1 \, \delta u \right) d\Omega + \int_\Gamma \alpha \, \bar u_1 \, \delta u \, d\Gamma = \int_\Omega \bar f_1 \, \delta u \, d\Omega \qquad (31) $$

Expression (31) is a structural static equilibrium equation, and is also a constraint on $\bar u_1$ in the optimization problem (29). In Expression (31), letting $\delta u = \bar u_1$ yields

$$ \langle \bar u_1, \bar u_1 \rangle_E = \int_\Omega \bar f_1 \, \bar u_1 \, d\Omega \qquad (32) $$

Substituting Expression (32) into Expression (12) yields the optimal value $\Pi(\bar u_1)$ of the objective

$$ \Pi(\bar u_1) = -\frac{1}{2} \langle \bar u_1, \bar u_1 \rangle_E \qquad (33) $$

Through substituting Expressions (12) and (33) into the optimization problem (29), Expression (29) is transformed into an unconstrained optimization problem

$$ \max_{p,\,q} \ \max_{u} \ \left( 2 \int_\Omega \bar f_1 u \, d\Omega - \langle u, u \rangle_E \right) \qquad (34) $$
If $p$ and $q$ are given, the objective in Expression (34) is a quadratic and concave functional with respect to $u$, due to $p > 0$ and $q \ge 0$. If $u$ is given, it is a linear function with respect to $p$ and $q$. Using the univariate search method to solve Expression (34): if $p$ and $q$ are given, the maximum value can be found by solving Expression (31) for $u$; and if $u$ is given, the maximum value is reached on the lower bounds of $p$ and $q$. So, the lower bounds of $p$ and $q$ must be larger than zero to ensure that Expression (29) has a finite optimal solution. In addition, the upper bounds of $p$ and $q$ should also be constrained to avoid the trivial solution $u \equiv 0$. Therefore, when the optimization objective is to maximize the mutual-energy inner product, as in Expression (29), the optimal structural model is the minimum-stiffness structure, and the selected feature belongs to a low-frequency eigenfunction subspace. On the contrary, if the optimization objective is to minimize the mutual-energy inner product, the optimal structural model has maximum stiffness, and the selected feature is mapped to a high-frequency eigenfunction subspace.
In addition, when using the mutual-energy inner product to extract the feature information $\langle u_x, u_r \rangle_E$ of the samples in $X_1$, the feature information of the samples in $X_2$ should be suppressed. So, a small statistical mean value is given

$$ \frac{1}{M_2} \sum_{x \in X_2} \langle u_x, u_r \rangle_E = \langle \bar u_2, u_r \rangle_E \qquad (35) $$

Here, we may set a bound $\varepsilon_0$ to be zero or even negative, and impose the constraint on the structural model

$$ \langle \bar u_2, u_r \rangle_E \le \varepsilon_0 \qquad (36) $$

In Expression (31), setting the direction to $\bar u_2$ yields $\langle \bar u_1, \bar u_2 \rangle_E = \int_\Omega \bar f_1 \bar u_2 \, d\Omega$. Replacing $\bar f_1$ with $\bar f_2$, and exchanging $\bar u_1$, $\bar u_2$, we have

$$ \langle \bar u_2, \bar u_1 \rangle_E = \int_\Omega \bar f_2 \, \bar u_1 \, d\Omega = \int_\Omega \bar f_1 \, \bar u_2 \, d\Omega \qquad (37) $$
If Expression (36) is required with $\varepsilon_0 = 0$, then $\bar u_2$ and $u_r$ are required to be orthogonal with respect to the mutual-energy inner product. Although the means of the two classes of samples are generally not orthogonal in the continuous function space, i.e., $\langle \bar f_1, \bar f_2 \rangle \ne 0$, the orthogonality of $\bar u_2$ and $u_r$ can be easily realized according to Expression (37). For example, if we set $u_r = \bar u_1$ and divide the domain $\Omega$ into two sub-regions according to the same or opposite signs of $\bar f_2$ and $\bar u_1$, we can adjust $p$ and $q$ in the two sub-regions and control the positive and negative work performed by the external force $\bar f_2$ on the deformation $\bar u_1$, so as to make the total work $\int_\Omega \bar f_2 \bar u_1 \, d\Omega$ in Expression (37) zero. According to Expression (25), this can also be understood as designing a structural model and adjusting its eigenfunctions and eigenvalues, so as to use these eigenvalues as weights to achieve the weighted orthogonality of $\bar f_1$ and $\bar f_2$. Further, $\varepsilon_0 < 0$ can be regarded as a relaxation of the orthogonality constraint on the mutual-energy inner product, which can be realized by adjusting $p$ and $q$ to make $\langle \bar u_2, u_r \rangle_E$ negative. Geometrically, this means that the angle between $\bar u_2$ and $u_r$ in the mutual-energy inner product space is not an acute angle. If $\langle \bar u_2, u_r \rangle_E$ is required to be minimal,

$$ \min_{p,\,q} \ \langle \bar u_2, u_r \rangle_E \qquad (38) $$

then, based on Expression (12), and similarly to the discussion on Expression (29), the optimization problem (38) can be transformed into an unconstrained form

where a slack variable is introduced to relax the constraint, namely the constraint of the static equilibrium equation describing the structural deformation due to $\bar f_1$ and $\bar f_2$ acting on the structure simultaneously. The objective can be expressed as

Obviously, if $p$ and $q$ are given, the objective is a quadratic functional of the deformations and the slack variable: it is convex with respect to the variables being minimized, and concave with respect to the variable being maximized. If the deformations and the slack variable are given, the objective is linear with respect to $p$ and $q$.
In order to design a feature coordinate to classify the samples in $X$, the objective is to maximize the separation of the two class means first. Combining Expressions (28) and (35), the optimization objective can be expressed as

$$ \max_{p,\,q} \ \left( \langle \bar u_1, u_r \rangle_E - \langle \bar u_2, u_r \rangle_E \right) \qquad (41) $$

Then, to improve the classification accuracy, the distributions of the samples in $X_1$ and $X_2$ along the feature coordinate should also be considered, and their variances should be small. The variances are high-order functions of $p$, $q$ and the deformations, so putting them into the optimization objective function (41) would destroy its low-order characteristics.
In order to improve the computational efficiency, the sum of the absolute values of the sample deviations from the mean is used to replace the variances, and only some samples in $X_1$ and $X_2$ are selected for the calculation. In the subset $X_1$, we only select the $m_1$ samples whose components on $u_r$ are less than the mean component $\langle \bar u_1, u_r \rangle_E$, and calculate their mean absolute deviation $d_1$. In the subset $X_2$, we only select the $m_2$ samples whose components on $u_r$ are larger than the mean component $\langle \bar u_2, u_r \rangle_E$, and calculate their mean absolute deviation $d_2$. $d_1$ and $d_2$ can be expressed as

$$ d_1 = \frac{1}{m_1} \sum \left( \langle \bar u_1, u_r \rangle_E - \langle u_x, u_r \rangle_E \right), \qquad d_2 = \frac{1}{m_2} \sum \left( \langle u_x, u_r \rangle_E - \langle \bar u_2, u_r \rangle_E \right) \qquad (42) $$
Through using Expressions (41) and (42), and considering both the means and the mean absolute deviations of the samples, the optimization objective can be written as

$$ \max_{p,\,q} \ (1 - w) \left( \langle \bar u_1, u_r \rangle_E - \langle \bar u_2, u_r \rangle_E \right) - w \left( d_1 + d_2 \right) \qquad (43) $$

where $w$ is a weight variable, satisfying $0 \le w \le 1$. To simplify Expression (42), the auxiliary deformation function $u_d$ is defined as

where $f_d$ can be regarded as an external force corresponding to $u_d$, satisfying

By substituting Expressions (41), (42), (44) and (45) into Expression (43), the optimization objective is simplified as

Here, $u_c$ is a combination of the deformation functions, and can be expressed as

In order to improve the generalization of the data classifier, regularizers should be added to the optimization model. Here, $\| p \|_1$ and $\| q \|_1$ stand for the 1-norms of $p$ and $q$, respectively, and are used as regularizers to avoid increasing the order of the optimization model. Meanwhile, these regularizers are treated as two constraints by directly setting the values of $\| p \|_1$ and $\| q \|_1$. Due to $p > 0$ and $q \ge 0$, $\| p \|_1$ and $\| q \|_1$ can be simply written as

$$ \| p \|_1 = \int_\Omega p \, d\Omega, \qquad \| q \|_1 = \int_\Omega q \, d\Omega \qquad (48) $$
It should be noted that objective (46) is built by taking the mean $\bar f_1$ of $X_1$ as the reference feature and selecting the deformation $\bar u_1$ as the reference feature coordinate axis. If other deformation functions are selected as the reference feature coordinate axis, the results are similar. For example, $u_r$ can be set as $\bar u_1$, $\bar u_2$, $\bar u_1 - \bar u_2$, or others. Setting $u_r$ as the reference feature coordinate axis, the optimization model can be summarized as

Here, the admissible deformation functions are arbitrary continuous functions on $\Omega$ with piecewise continuous first-order derivatives; $p_{\min}$ and $q_{\min}$ are lower bounds of $p$ and $q$; the total amounts $\| p \|_1$ and $\| q \|_1$ are two given constants; $\bar f_1$, $\bar f_2$, $f_d$ and $u_c$ are given in Expressions (27), (45) and (47). $f_d$ and $u_c$ should be determined according to the reference feature coordinate axis, and can be rewritten as

  5. Mutual-Energy Inner Product Feature Coordinate Optimization Algorithm
The FEM is used to solve the differential Equation (1), realizing the mapping from the external forces $f$, $\bar f_1$, and $\bar f_2$ to the deformations $u$, $\bar u_1$, and $\bar u_2$ in the optimization model (49). We divide the domain $\Omega$ into $N_e$ elements $\Omega_e$, $e = 1, 2, \ldots, N_e$, and assume the $e$-th element $\Omega_e$ has $n_e$ nodes. For the $i$-th node in $\Omega_e$, its global coordinate in $\Omega$, deformation value, and interpolation basis function are denoted as $x_i$, $u_i$, and $N_i(\xi)$, respectively, where $\xi$ is the local coordinate of the element $\Omega_e$. In this way, for an element, its global and local coordinate relationship $x(\xi)$ and the element deformation function $u(\xi)$ can be expressed as [37]

$$ x(\xi) = \sum_{i=1}^{n_e} N_i(\xi) \, x_i, \qquad u(\xi) = \sum_{i=1}^{n_e} N_i(\xi) \, u_i \qquad (51) $$
It is assumed that $N(\xi)$ is an $n_e$-dimensional row vector with the $i$-th component $N_i(\xi)$; $D(\xi)$ is an $n \times n_e$ matrix with the entry $D_{ji} = \partial N_i / \partial \xi_j$, where $\xi_j$ is the $j$-th component of the local coordinate $\xi$; and $X_e$ is an $n_e \times n$ matrix whose entry in the $i$-th row and $j$-th column is the $j$-th component of the element node coordinate $x_i$. Applying Expression (51), the $n \times n$ Jacobi matrix $J$ for the transformation between the global and local coordinates, the deformation function $u$, and its $n$-dimensional gradient vector $\nabla u$ can be expressed in the concise and compact form

$$ J(\xi) = D(\xi) \, X_e, \qquad u(\xi) = N(\xi) \, U_e, \qquad \nabla u = J^{-1} D(\xi) \, U_e \qquad (52) $$

where $U_e$ is a vector whose $i$-th component $u_i$ is the deformation value of the $i$-th node in the $e$-th element, and $J^{-1} D$ is an $n \times n_e$ matrix. In the optimization model (49), the design variables are $p$ and $q$. We assume $p$ and $q$ in each element are constants $p_e$ and $q_e$. So, the design variables can be expressed as the vectors $p = (p_1, \ldots, p_{N_e})$ and $q = (q_1, \ldots, q_{N_e})$ over the elements.
Substituting Expression (52) into the mutual-energy expressions (5) and (9) yields

$$ \langle u_1, u_2 \rangle_E = \sum_{e=1}^{N_e} U_{1e}^T K_e U_{2e}, \qquad \int_\Omega f u \, d\Omega = \sum_{e=1}^{N_e} F_e^T U_e \qquad (53) $$

Here, $K_e$ is an $n_e \times n_e$ element stiffness matrix, which is a positive semidefinite symmetric matrix and can be expressed as

$$ K_e = p_e A_e + q_e B_e + K_{\Gamma e} \qquad (54) $$

In Expression (54), $K_e$ is a linear function of $p_e$ and $q_e$; $A_e$ and $B_e$ are the corresponding coefficient matrices; and $K_{\Gamma e}$ is the contribution of the boundary constraint to the element stiffness matrix. If the element boundary does not overlap with the design domain boundary, then $K_{\Gamma e} = 0$. Here, $A_e$, $B_e$, $K_{\Gamma e}$ can be calculated by

$$ A_e = \int \left( J^{-1} D \right)^T \left( J^{-1} D \right) |J| \, d\xi, \qquad B_e = \int N^T N \, |J| \, d\xi, \qquad K_{\Gamma e} = \int_{\Gamma_e} \alpha \, N^T N \, d\Gamma \qquad (55) $$

In Expression (53), $F_e$ is the equivalent node input vector, resulting from the equivalent action between the force $f$ on the element and the forces on the nodes, and satisfies

$$ F_e = \int N^T(\xi) \, f(x(\xi)) \, |J(\xi)| \, d\xi \qquad (56) $$
It is assumed that the design domain $\Omega$ comprises $N_d$ element nodes. We number these nodes globally, and use two $N_d$-dimensional vectors $U$ and $F$ to denote the deformation values and the equivalent node inputs at all the nodes. The components of $U$ and $F$ are indexed by the global node number. Each component of $F$ can be calculated through Expression (56): Expression (56) is evaluated for each element adjacent to the given global node, and the component is the superposition of the element node values corresponding to that global node.
Based on the relationship between the local and global node numbers, Expression (53) can be rewritten as

$$ \langle u_1, u_2 \rangle_E = U_1^T K U_2 \qquad (57) $$

where $K$ is the global stiffness matrix, an $N_d \times N_d$ positive definite symmetric matrix. Substituting Expression (57) into Expression (12) yields

$$ \Pi(U) = \frac{1}{2} U^T K U - F^T U \qquad (58) $$

Based on Expression (58), the solution of the differential Equation (1) satisfies

$$ K U = F \qquad (59) $$
Similarly, assume that the input of Expression (1) is $f_2$ and the corresponding solution is $u_2$; $U_2$ is the global node vector corresponding to $u_2$ on $\Omega$, $U_{2e}$ is the element node vector corresponding to $u_2$ on $\Omega_e$, and $F_2$ is the equivalent node input vector corresponding to $f_2$. We have

$$ K U_2 = F_2 \qquad (60) $$

Similarly to the derivation of Expression (57), using Expressions (59) and (60), the mutual-energy expression of $u_1$ and $u_2$ can be derived:

$$ \langle u_1, u_2 \rangle_E = U_1^T K U_2 = F_1^T U_2 \qquad (61) $$

In Expression (61), the first equality is used for model optimization, and the second is used for data classifier training and prediction, avoiding the need to solve both Expressions (59) and (60).
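The computational consequence of Expression (61) can be sketched as follows (illustrative code with assumed names; `K` is any assembled sparse positive definite global stiffness matrix): model optimization uses $U_1^T K U_2$, while classifier training and prediction only need one solve for the reference deformation followed by cheap dot products per sample.

```python
import numpy as np
from scipy.sparse.linalg import spsolve   # sparse direct solver

def mutual_energy(K, F1, F2):
    """<u1, u2>_E via Expression (61): one solve K U2 = F2, then F1 . U2."""
    U2 = spsolve(K.tocsc(), F2)
    return F1 @ U2

def feature_coordinates(K, F_r, F_samples):
    """For classification: solve once for the reference deformation U_r,
    then each sample row of F_samples needs only a dot product with U_r."""
    U_r = spsolve(K.tocsc(), F_r)
    return F_samples @ U_r                # one feature coordinate per sample
```

This is why, as noted in the text, evaluating feature coordinates for many samples adds essentially no cost beyond Euclidean inner products.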
After discretizing the design domain by finite elements, the differential Equation (1) is converted into a system of linear equations, and the mutual-energy definition (5) can be expressed by matrix and vector products. In this way, the optimization model (49) can be rewritten in the vector form

Here, $U_r$ is the finite element node vector corresponding to the selected reference feature coordinate, and can be built from the statistical features of the sample sets or their combination; for example,

Meanwhile, the finite element node vectors corresponding to the means and deviations of the samples, together with the temporary node vector generated from them, enter the objective, and Expression (47) can be rewritten as

The significant advantage of the optimization model (62) is that the global stiffness matrix $K$ is a positive definite symmetric matrix and is linear with respect to the design variables $p_e$ and $q_e$; meanwhile, the coefficient matrices corresponding to the components of the design variables are positive semidefinite, which is convenient for the algorithm design. The intermediate node vectors are functions of the design variables and can be calculated by solving linear systems of equations, so the optimization model (62) can be solved by a sequential linearization algorithm. The objective and the constraint are nonlinear, and their derivatives with respect to the design variables need to be calculated. The derivative of the mutual-energy inner product with respect to a design variable $\theta \in \{ p_e, q_e \}$ is

$$ \frac{\partial \langle u_1, u_2 \rangle_E}{\partial \theta} = F_1^T \frac{\partial U_2}{\partial \theta} \qquad (65) $$

where the derivatives of the node vectors are determined by taking the derivative of the equilibrium Equations (59) and (60) with respect to $\theta$:

$$ K \frac{\partial U_2}{\partial \theta} = -\frac{\partial K}{\partial \theta} U_2 \qquad (66) $$

Substituting Expression (66) into Expression (44) yields

$$ \frac{\partial \langle u_1, u_2 \rangle_E}{\partial \theta} = -U_1^T \frac{\partial K}{\partial \theta} U_2 \qquad (67) $$

Substituting Expression (54) into Expression (67) yields $\partial K / \partial p_e = A_e$ and $\partial K / \partial q_e = B_e$, assembled at the element's global node numbers. Similarly, the element-level derivatives can also be computed

$$ \frac{\partial \langle u_1, u_2 \rangle_E}{\partial p_e} = -U_{1e}^T A_e U_{2e}, \qquad \frac{\partial \langle u_1, u_2 \rangle_E}{\partial q_e} = -U_{1e}^T B_e U_{2e} \qquad (68) $$
Expressions (63) and (64) show that $U_r$ and the intermediate vectors are linear combinations of the mean, deviation, and sample node vectors. According to the superposition principle, their corresponding deformations also satisfy equations similar to Expression (66), and have exactly the same derivation as Expression (68). So, we obtain, for the objective $J$,

$$ \frac{\partial J}{\partial p_e} = -U_{ce}^T A_e U_{re}, \qquad \frac{\partial J}{\partial q_e} = -U_{ce}^T B_e U_{re} \qquad (69) $$

where $U_{ce}$ and $U_{re}$ are the element node vectors of the composite deformation and the reference feature coordinate axis, respectively.
Optimization Algorithm 1: Mutual-energy inner product feature coordinate optimization algorithm
Based on Expressions (68) and (69), the optimization model (62) can be solved by the sequential linearization algorithm. The algorithm steps are summarized as follows (a compact code sketch is given after the list):
- (1) Use vectors to represent the sample data. Convert the sample data in the training subsets $X_1$ and $X_2$ into finite element node vectors: based on Expression (70), first calculate the element node vectors, and then use them to assemble the global node vectors.
- (2) Set the optimization constants and the initial values of the design variables.
	  - ① Set the optimization constants. Set $w$, the weight of the mean and deviation, with the requirement $0 \le w \le 1$; set the total amounts $\| p \|_1$, $\| q \|_1$ and the lower bounds $p_{\min}$, $q_{\min}$ of the design variables; set the moving limit of the design variables for the linear programming; set the design variable minimum increment and the objective function minimum increment, which are used to determine whether the optimization ends.
	  - ② Set the initial values of the design variables $p_e$ and $q_e$. Generally, uniform initial values satisfying the total-amount constraints are used.
- (3) Calculate the current value of the objective function.
	  - ① Calculate the element stiffness matrices and assemble the global stiffness matrix. Based on Expressions (54) and (55), calculate the element stiffness matrices $K_e$. The element stiffness matrix is linear with respect to $p_e$ and $q_e$, and the coefficient matrices are determined only by the element interpolation basis functions, so this calculation can be performed prior to the optimization to speed up the optimization process. Then, assemble the global stiffness matrix $K$ according to the node numbers. Since $K$ is a positive definite symmetric matrix, performing a Cholesky decomposition on it gives $K = L_K L_K^T$, where $L_K$ is a lower triangular matrix.
	  - ② Compute the mean vectors $\bar F_1$ and $\bar F_2$ of the sample data in $X_1$ and $X_2$, from the $M_1$ and $M_2$ samples, respectively, and select the reference feature coordinate axis $U_r$, which can be selected and calculated by Expression (63).
	  - ③ Compute the deviation vector and the intermediate vector. The deviation of the sample data is calculated only for the selected samples in $X_1$ and $X_2$, using the projections of the means of the sample data on $U_r$; afterwards, the intermediate vector can be obtained by Expression (64).
	  - ④ Calculate the current values of the objective function and the constraint, based on the optimization model (62).
- (4) Calculate the gradient vectors of the objective function and the constraint. Apply Expressions (68) and (69) to calculate the element-level derivatives, and then express them as compact gradient vectors over all the elements. In Expressions (68) and (69), $A_e$ and $B_e$ are determined only by the element interpolation basis functions and are constant matrices independent of the design variables. So, they can be calculated prior to the optimization, and the gradient vectors can be assembled through the mapping relationship between the local and global node numbers.
- (5) Obtain the increments of the design variables by solving the sequential linearization optimization model.
	  - ① Construct the sequential linearization optimization model (74), whose design variables are the increments $\Delta p_e$ and $\Delta q_e$ of $p_e$ and $q_e$, bounded by the moving limits; its coefficients are the gradient vectors calculated in step (4).
	  - ② Solve the sequential linearization optimization model (74) to obtain $\Delta p_e$ and $\Delta q_e$. When solving Expression (74), slack variables are added to facilitate the construction of an initial feasible solution.
- (6) Determine whether to end the optimization iteration.
	  - ① Store the design variables, the objective function value, and the constraint function value of the previous step of the sequential linearization optimization.
	  - ② Update the design variables by $p_e \leftarrow p_e + \Delta p_e$ and $q_e \leftarrow q_e + \Delta q_e$, then execute step (3) to update the objective function value.
	  - ③ Determine whether to end the iteration. If the largest design variable increment or the objective function increment falls below its minimum threshold, end the iteration. Otherwise, if the objective function has improved, go to step (4) to continue the iteration; if it has not, reduce the moving limits of the design variables by a factor less than one, then go to step (5) to recompute the design variable increments $\Delta p_e$ and $\Delta q_e$.
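The following sketch condenses steps (3)–(6) into a loop. It is an illustration under stated assumptions: `eval_fg` is a hypothetical helper returning the objective, the constraint, and their gradients; the total-amount constraints are omitted for brevity; and `scipy.optimize.linprog` stands in for the nested linear programming module of Expression (74).

```python
import numpy as np
from scipy.optimize import linprog

def sequential_linearization(p, q, eval_fg, move=0.1, shrink=0.7,
                             eps_x=1e-4, eps_f=1e-6, max_iter=200):
    """Compact sketch of steps (3)-(6) of Optimization Algorithm 1.

    eval_fg(p, q) -> (J, g, dJ, dg): objective value, constraint value,
    and their gradients w.r.t. the stacked design variables [p, q]
    (an assumed helper built from Expressions (62), (68) and (69)).
    """
    n = len(p)
    J, g, dJ, dg = eval_fg(p, q)
    for _ in range(max_iter):
        # Linearized subproblem (Expression (74)): maximize dJ . dx subject
        # to the linearized constraint and moving limits on the increments.
        res = linprog(c=-dJ, A_ub=dg[None, :], b_ub=np.array([-g]),
                      bounds=[(-move, move)] * (2 * n), method="highs")
        if not res.success:
            break
        dx = res.x
        p_new = np.maximum(p + dx[:n], 1e-6)    # keep lower bound p_e > 0
        q_new = np.maximum(q + dx[n:], 0.0)     # keep lower bound q_e >= 0
        J_new, g, dJ, dg = eval_fg(p_new, q_new)
        if np.max(np.abs(dx)) < eps_x or abs(J_new - J) < eps_f:
            p, q, J = p_new, q_new, J_new       # converged: accept and stop
            break
        if J_new > J:
            p, q, J = p_new, q_new, J_new       # improved: accept, re-linearize
        else:
            move *= shrink                      # worse: shrink moving limits
    return p, q, J
```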
  6. Algorithm Implementation and Image Classifier
Image classification is used to determine if an image has certain given features and can be realized by algorithms for extracting the feature information of the image. Applying the mutual-energy inner product to extract the image features has the advantage of enhancing the feature information and suppressing other high-frequency noise. If we select multiple features of an image, we can design multiple mutual-energy inner products, and each mutual-energy inner product can be regarded as one feature coordinate of the image. Using multiple mutual-energy inner products to characterize an image is equivalent to using multiple feature coordinates to describe the image, or equivalent to representing the high-dimensional image in a low-dimensional space, reducing the dimensionality of image data.
This part discusses the implementation of Optimization Algorithm 1 and its application to 2-D grayscale image classification. Assume that each sample in the training datasets is a 2-D grayscale image; the domain $\Omega$ occupied by the image is rectangular; each image is expressed by $N \times N$ pixels; and each pixel is a square with a side length of 1. In this case, for the MNIST images used below, $N = 28$.
  6.1. Vectorized Implementation of Optimization Algorithm 1
While using FEM to discretize the design domain, we regard each pixel as a finite element and divide the domain $\Omega$ into $N \times N$ quadrilateral elements, i.e., $N_e = N^2$. In $\Omega$, the global element numbering uses column priority, where the upper-left corner element is numbered 1 and the lower-right corner element is numbered $N^2$. A planar quadrilateral element is used to interpolate the deformation functions. Each element has four nodes, so the total number of nodes is $(N+1)^2$, and the total number of boundary nodes is $4N$. The global node numbering also uses column priority, where the upper-left corner node is numbered 1 and the lower-right corner node is numbered $(N+1)^2$. The interpolation basis functions of the quadrilateral element are

$$ N_i(\xi, \eta) = \frac{1}{4} \left( 1 + \xi_i \xi \right) \left( 1 + \eta_i \eta \right), \qquad i = 1, 2, 3, 4 \qquad (75) $$

where the domain of definition is the square $-1 \le \xi, \eta \le 1$. The element nodes are the four corner points of the quadrilateral. The node with the coordinate $(-1, -1)$ is numbered 1, and, in counter-clockwise order, the other nodes with the coordinates $(1, -1)$, $(1, 1)$, $(-1, 1)$ are numbered 2, 3, and 4, respectively. The interpolation basis function $N_i$ corresponds to the $i$-th node, where $(\xi_i, \eta_i)$ is the corresponding node coordinate. The mapping relationship between the element node numbers and the global node numbers can be described by an $N^2 \times 4$ matrix $T$, whose $e$-th row corresponds to the $e$-th element. If $T_{ei}$ denotes its entry at the $e$-th row and the $i$-th column, then $T_{e1}$, $T_{e2}$, $T_{e3}$, $T_{e4}$ are the global node numbers corresponding to the element node numbers 1, 2, 3 and 4 of the $e$-th element. So, we have

where the modulus operation (the remainder when the element number is divided by $N$) locates each element within its column. Since all the elements are identical squares, the isoparametric transformation $x(\xi)$ in Expression (51) is actually a scaling transformation. Substituting Expression (75) into Expressions (52) and (55), we find that the coefficient matrices $A_e$ and $B_e$ are independent of the element node numbers. So, we use $A_0$ and $B_0$ to express $A_e$ and $B_e$, and calculate them directly by

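Since every pixel element is an identical unit square, the two coefficient matrices of Expression (77) can be computed once by 2×2 Gauss quadrature and reused for all elements. The sketch below (the names `A0`, `B0` and the quadrature choice are assumptions consistent with Expression (55)) shows one way to obtain them.

```python
import numpy as np

# Bilinear basis on [-1, 1]^2, nodes ordered counter-clockwise from (-1, -1).
XI = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]])

def shape(xi, eta):
    N = 0.25 * (1 + xi * XI[:, 0]) * (1 + eta * XI[:, 1])     # N_i, (4,)
    D = 0.25 * np.vstack([XI[:, 0] * (1 + eta * XI[:, 1]),    # dN/dxi
                          XI[:, 1] * (1 + xi * XI[:, 0])])    # dN/deta
    return N, D

g = 1.0 / np.sqrt(3.0)            # 2x2 Gauss points, weights equal to 1
A0 = np.zeros((4, 4))             # gradient (stiffness-type) matrix
B0 = np.zeros((4, 4))             # product (mass-type) matrix
detJ = 0.25                       # Jacobian of [-1,1]^2 -> unit pixel
for xi in (-g, g):
    for eta in (-g, g):
        N, D = shape(xi, eta)
        Dx = D / 0.5              # d/dx = 2 d/dxi for a side length of 1
        A0 += Dx.T @ Dx * detJ
        B0 += np.outer(N, N) * detJ
# Diagonal checks: A0[i, i] = 2/3 and B0[i, i] = 1/9 for the unit square.
```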
When a side of an element overlaps with the boundary of the domain $\Omega$, the influence of the boundary condition in Expression (1) on the element stiffness matrix should be considered, so a $4 \times 4$ matrix $K_{\Gamma e}$ should be calculated. Assume that one side of the element overlaps with the boundary of $\Omega$, with endpoints at the element nodes numbered $s$ and $t$. Then, the non-zero entries in $K_{\Gamma e}$ can be calculated by

$$ K_{\Gamma e}[s, s] = K_{\Gamma e}[t, t] = \frac{\alpha_e}{3}, \qquad K_{\Gamma e}[s, t] = K_{\Gamma e}[t, s] = \frac{\alpha_e}{6} \qquad (78) $$

In Expression (78), the subscripts $s$ and $t$ stand for the starting and end points of the side, where the starting point is the element node numbered $s$ and the end point is determined along the side in counter-clockwise order; $\alpha_e$ is a constant, equal to the approximate value of $\alpha$ on the side. In this paper, we handle the influence of the boundary on the element stiffness matrix while assembling the global stiffness matrix: we simply replace the subscripts $s$ and $t$ in Expression (78) with global node numbers, then directly use them to assemble the global stiffness matrix.
Because each element corresponds to a pixel, we can assume that its grayscale value is a constant $x_e$. In this way, a sample image can be expressed as $x = (x_1, x_2, \ldots, x_{N^2})$. Through substituting Expression (75) into Expression (70), the relationship between element node vectors and image grayscale values can be obtained

$$ F_e = x_e \int N^T(\xi) \, |J(\xi)| \, d\xi = \frac{x_e}{4} \, (1, 1, 1, 1)^T \qquad (79) $$

where the coefficient $1/4$ can be regarded as the mapping coefficient from the image grayscale to the element node vector.
While using the element stiffness matrices $K_e$ and the element node vectors $F_e$ to assemble the global stiffness matrix $K$ and the global node vector $F$, the functions for generating a sparse matrix in MATLAB R2020a or the Python 2.7 SciPy module can be used, whose input arguments include the row index vector, the column index vector, and the values of the non-zero entries. More importantly, these sparse matrix generation functions sum the non-zero entries with the same indexes, which is consistent with the process of assembling $K$ and $F$.
In order to convert the image grayscale vector $x$ to the global node vector $F$, a $4 \times N^2$ matrix $\hat F$ should first be calculated, whose $e$-th column corresponds to the element node vector $F_e$. Then, $\hat F$ is converted to a $4N^2$-dimensional column vector $\hat f$ in column-major order. Obviously, if we divide the components of $\hat f$ into multiple groups in sequence and each group includes four components, then the $e$-th group corresponds to the element node vector $F_e$. $\hat f$ can be calculated by

$$ \hat F = \frac{1}{4} (1, 1, 1, 1)^T x^T, \qquad \hat f = \mathrm{reshape}(\hat F, \, 4N^2, \, 1) \qquad (80) $$

where the function $\mathrm{reshape}(A, m, n)$ converts the dimension of the matrix $A$ into $m \times n$ while keeping the total number of the entries unchanged.
Through the mapping matrix $T$, the position indexes of the components of $\hat f$ in the global node vector $F$ can be obtained. We transpose the $N^2 \times 4$ matrix $T$ to the $4 \times N^2$ matrix $T^T$, whose $e$-th column corresponds to the global node numbers of the $e$-th element, and then convert $T^T$ to a $4N^2$-dimensional column vector $t$ in column-major order. $t$ can be figured out by

$$ t = \mathrm{reshape}(T^T, \, 4N^2, \, 1) \qquad (81) $$

$t$ is the row index vector for generating $F$ by a sparse matrix generation function. Since $F$ has only one column, we use $c$ to denote a $4N^2$-dimensional column index vector and set all the components of $c$ to 1. Through substituting $t$, $c$, $\hat f$ into the sparse matrix generation function, we can yield $F$.
Similarly, the global stiffness matrix $K$ can be assembled by using the sparse matrix generation function. A vector $\hat k$ related to the element stiffness matrices should first be calculated by

$$ \hat k = \mathrm{reshape}\left( p^T \otimes A_0 + q^T \otimes B_0, \; 16N^2, \; 1 \right) \qquad (82) $$

where the operator $\otimes$ denotes the Kronecker product of the matrices; $p^T \otimes A_0 + q^T \otimes B_0$ is a $4 \times 4N^2$ matrix; $p$ and $q$ are the design variables. If this matrix is divided into multiple blocks from left to right and each block is a $4 \times 4$ matrix, the $e$-th block is the calculation result of the first two terms of $K_e$ in Expression (54), without including $K_{\Gamma e}$. Therefore, if the components of $\hat k$ are divided into multiple blocks in sequence and each block includes 16 components, the $e$-th block corresponds to a 1-dimensional vector converted from the $e$-th element stiffness matrix in column priority. We set $\mathbf{1}_4 = (1, 1, 1, 1)^T$, and use $r$, $s$ to denote the row indexes and column indexes of the entries in the global stiffness matrix. Then, $r$, $s$ corresponding to the components of $\hat k$ can be calculated by

$$ r = \mathrm{reshape}\left( \mathbf{1}_4 \otimes T^T, \; 16N^2, \; 1 \right), \qquad s = \mathrm{reshape}\left( T^T \otimes \mathbf{1}_4, \; 16N^2, \; 1 \right) \qquad (83) $$
As mentioned above, the constraint on the design boundary $\Gamma$ can generate additional stiffness $K_{\Gamma e}$ for the adjacent elements. If we regard an element side overlapping with $\Gamma$ as a 2-node line element, then its stiffness matrix will be a $2 \times 2$ matrix, which can be figured out by Expression (78). Similarly, these line element stiffness matrices can be assembled into the global stiffness matrix. While designing an image classifier based on the mutual-energy inner products, we set a fixed boundary for Expression (1), i.e., $u|_\Gamma = 0$. This boundary condition can be handled by adding a relatively large number to the diagonal entries of $K$ corresponding to the boundary node numbers. The sparse matrix generation function is used to implement this boundary condition. First, we set the dimension of the value vector to $4N$, the total number of the boundary nodes, and set all of its components to the large number. Meanwhile, we let the $4N$-dimensional row and column index vectors be the same, and set their components to be the boundary node numbers. Finally, we combine the two value vectors, the two row index vectors, and the two column index vectors, respectively, and input them into the sparse matrix generation function to obtain $K$.
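A compact sketch of this assembly process (illustrative names; `coo_matrix` is the SciPy sparse generation function mentioned above, which sums duplicate indices exactly as required):

```python
import numpy as np
import scipy.sparse as sp

def assemble_K(T, p, q, A0, B0, boundary_nodes, n_nodes, big=1e8):
    """Assemble the global stiffness matrix K (a sketch with assumed names).

    T: (Ne, 4) int array of 0-based global node numbers per element;
    p, q: (Ne,) element design variables; A0, B0: the 4x4 matrices above;
    boundary_nodes: int array of boundary node numbers.
    coo_matrix sums duplicate (row, col) pairs, which is exactly the
    element superposition described in the text; the fixed boundary is
    imposed by adding a large number to the boundary diagonal entries.
    """
    Ke = p[:, None, None] * A0 + q[:, None, None] * B0    # (Ne, 4, 4)
    rows = np.repeat(T, 4, axis=1).ravel()                # T[e, i] per entry
    cols = np.tile(T, (1, 4)).ravel()                     # T[e, j] per entry
    vals = Ke.ravel()                                     # K_e[i, j] values
    rows = np.concatenate([rows, boundary_nodes])
    cols = np.concatenate([cols, boundary_nodes])
    vals = np.concatenate([vals, np.full(len(boundary_nodes), big)])
    return sp.coo_matrix((vals, (rows, cols)),
                         shape=(n_nodes, n_nodes)).tocsc()
```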
Based on Expressions (68) and (69), the gradients of the objective and the constraint can be efficiently obtained by using the mapping matrix $T$. For example, if we have two $N_d$-dimensional global node vectors $U_1$ and $U_2$, we can adopt fancy indexing to generate two $N^2 \times 4$ matrices $W_1$ and $W_2$ whose $e$-th rows correspond to the node vectors of the $e$-th element. According to Expression (69), the objective function gradients with respect to $p$ and $q$ can be calculated by

$$ \frac{\partial \langle u_1, u_2 \rangle_E}{\partial p} = -\mathrm{rowsum}\left( (W_1 A_0) \circ W_2 \right), \qquad \frac{\partial \langle u_1, u_2 \rangle_E}{\partial q} = -\mathrm{rowsum}\left( (W_1 B_0) \circ W_2 \right) \qquad (84) $$

where $\circ$ stands for multiplying the corresponding entries of the matrices, and $\mathrm{rowsum}(\cdot)$ sums the rows of a matrix to obtain a column vector. Mathematically, Expression (84) can be written (up to sign) as $\mathrm{diag}(W_1 A_0 W_2^T)$ and $\mathrm{diag}(W_1 B_0 W_2^T)$, where the function $\mathrm{diag}(\cdot)$ extracts the main diagonal entries from a square matrix. Similarly, the constraint function gradients can be calculated by replacing $U_1$, $U_2$ with the node vectors appearing in the constraint.
  6.2. Image Classifier
For a given training dataset $X$, in order to use Optimization Algorithm 1 to construct the mutual-energy inner product coordinate axes, we select a subset of $X$ as the reference training set, and select the mean of the samples of the class “0” or the class “1” in the subset, or a combination of these means, as the reference feature. The subsets are generated gradually as the coordinates are generated. Prior to generating the $k$-th coordinate, $k - 1$ mutual-energy inner product coordinate axes have been generated and there are $k$ subsets. One of the $k$ subsets is selected to generate the $k$-th coordinate. In order to explain how the generation of new axes works, we use a set $G$ to manage the generated subsets. If the $j$-th subset in $G$ has $M_{j0}$ samples of the class “0” and $M_{j1}$ samples of the class “1”, the subset taken as the reference training sample set to generate the new axis is the one whose index $j^*$ satisfies

After determining the subset and the reference feature, the coordinate axis can be obtained by Optimization Algorithm 1. Next, we divide the selected subset into two subsets. First, for each sample in the subset, we calculate its coordinate component on the new axis via the mutual-energy inner product; we calculate the means of the components of the samples of the class “0” and the class “1”, and set a threshold between the two means. Second, according to the threshold, we divide the subset into a low-component subset and a high-component subset. Finally, we add the two new subsets into $G$, and delete the parent subset from $G$. At this time, $G$ contains $k + 1$ training sample subsets, and one of them will be selected to calculate the next coordinate axis.
The following summarizes the detailed steps of generating the mutual-energy inner product feature coordinates; a compact code sketch of the loop follows the list.
Algorithm 2: Mutual-energy inner product feature coordinates generation
- (1) Let $k = 1$ and $G = \{ X \}$;
- (2) According to Expression (85), select the subset $X^{(k)}$ in $G$ to generate the coordinate axis, and delete $X^{(k)}$ from $G$;
- (3) Adopt Optimization Algorithm 1 to calculate the axis $U_r^{(k)}$ based on the determined reference subset $X^{(k)}$ and the selected reference feature;
- (4) For each sample in $X^{(k)}$, calculate its coordinate component on the axis $U_r^{(k)}$, the means of the components of the class “0” and the class “1”, as well as the threshold between them;
- (5) According to the threshold, divide $X^{(k)}$ into two subsets and add them into $G$;
- (6) If more coordinates are required, set $k = k + 1$ and go to Step (2); otherwise, stop.
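A sketch of this generation loop (the helpers `make_axis` and `project`, and the subset selection rule used here, are assumptions standing in for Optimization Algorithm 1 and Expression (85)):

```python
import numpy as np

def generate_axes(X, y, n_axes, make_axis, project):
    """Sketch of Algorithm 2 with assumed helpers.

    make_axis(Xs, ys) runs Optimization Algorithm 1 on a subset and returns
    an axis; project(axis, Xs) returns the coordinate components of the
    subset's samples on that axis (their mutual-energy inner products).
    """
    pool = [(X, y)]                            # the set G of subsets
    axes = []
    while len(axes) < n_axes and pool:
        # Subset selection in the spirit of Expression (85): here the most
        # class-balanced subset is chosen (an assumption for illustration).
        sizes = [min(np.sum(ys == 0), np.sum(ys == 1)) for _, ys in pool]
        j = int(np.argmax(sizes))
        if sizes[j] == 0:
            break                              # only single-class subsets left
        Xs, ys = pool.pop(j)
        axis = make_axis(Xs, ys)
        axes.append(axis)
        c = project(axis, Xs)
        thr = 0.5 * (c[ys == 0].mean() + c[ys == 1].mean())  # threshold
        for m in (c <= thr, c > thr):          # split the parent subset
            if m.any():
                pool.append((Xs[m], ys[m]))
    return axes
```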
After generating the mutual-energy inner product coordinate axes by Algorithm 2, the coordinate components of each sample can be calculated and represented by a feature vector $z$. Based on $z$, a simple Gaussian classifier is used to classify the images. We use $X$ to represent a training dataset comprising $M$ samples, where each sample carries a class index. A Gaussian classifier can be used to classify the samples into multiple classes. We use $c$ to indicate the class of a sample and use $C$ to denote the total number of classes. In $X$, the probability of the class $c$ is

$$ P(c) = \frac{M_c}{M}, \qquad c = 1, 2, \ldots, C \qquad (86) $$

Furthermore, it is assumed that, for the samples in the same class, their feature vectors $z$ follow the Gaussian distribution

$$ p(z \mid c) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_c|^{1/2}} \exp\left( -\frac{1}{2} (z - \mu_c)^T \Sigma_c^{-1} (z - \mu_c) \right) \qquad (87) $$

where $\mu_c$ is the mean of $z$; $\Sigma_c$ is the covariance matrix of $z$; the subscript $c$ corresponds to the class $c$; and $d$ is the feature dimension. Using the training sample dataset, their maximum likelihood estimates can be calculated by [38]

$$ \mu_c = \frac{1}{M_c} \sum_{y = c} z, \qquad \Sigma_c = \frac{1}{M_c} \sum_{y = c} (z - \mu_c)(z - \mu_c)^T \qquad (88) $$

Here, $M_c$ is the number of training samples in the class $c$. Based on Expressions (86) and (87), when given the feature vector of a sample, the posterior probability of the sample belonging to the class $c$ is

$$ P(c \mid z) = \frac{P(c) \, p(z \mid c)}{\sum_{c'=1}^{C} P(c') \, p(z \mid c')} \qquad (89) $$

where $P(c \mid z)$ is the posterior probability, and its logarithm can be expressed, up to a constant shared by all the classes, as

$$ \ln P(c \mid z) \;\propto\; \ln P(c) - \frac{1}{2} \ln |\Sigma_c| - \frac{1}{2} (z - \mu_c)^T \Sigma_c^{-1} (z - \mu_c) \qquad (90) $$

Finally, the class of the sample is determined based on the posterior probability

$$ c^* = \arg\max_{c} \ P(c \mid z) \qquad (91) $$
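For reference, a minimal implementation of this Gaussian classifier (a sketch; the class and method names are assumptions, and the feature dimension is assumed to be at least 1):

```python
import numpy as np

class GaussianClassifier:
    """Multiclass Gaussian classifier following Expressions (86)-(91)."""

    def fit(self, Z, y):
        self.classes = np.unique(y)
        self.prior, self.mu, self.cov = [], [], []
        for c in self.classes:
            Zc = Z[y == c]
            self.prior.append(len(Zc) / len(Z))              # Expression (86)
            self.mu.append(Zc.mean(axis=0))                  # Expression (88)
            self.cov.append(np.atleast_2d(np.cov(Zc.T, bias=True)))
        return self

    def predict(self, Z):
        scores = []
        for P, mu, S in zip(self.prior, self.mu, self.cov):
            d = Z - mu
            _, logdet = np.linalg.slogdet(S)
            maha = np.sum(d @ np.linalg.inv(S) * d, axis=1)  # Mahalanobis
            # Log posterior up to a shared constant, Expression (90)
            scores.append(np.log(P) - 0.5 * logdet - 0.5 * maha)
        return self.classes[np.argmax(scores, axis=0)]       # Expression (91)
```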
  6.3. Numerical Examples
The MNIST dataset has become one of the benchmark datasets in machine learning. It comprises 60,000 sample images in the training set and 10,000 sample images in the test set, and each one is a 28-by-28-pixel grayscale image of a handwritten digit 0–9. In this section, we use MNIST to design Gaussian image classifiers based on Optimization Algorithm 1.
Before designing the Gaussian image classifiers, image preprocessing is conducted to align the image centroids and normalize the sample images. In Optimization Algorithm 1, fixed values are selected for the weight $w$, the total amounts and lower bounds of the design variables, the moving limit, and the minimum increments.
  6.3.1. Binary Gaussian Classifier: Identify Digits “0” and “1”
The MNIST training set comprises 6742 samples of “1” and 5923 samples of “0”. We select the difference between the means of the samples “1” and “0” as the reference feature. Optimization Algorithm 1 converges after 166 iterations. The means of the samples “1” and “0”, the design variables, and the reference feature coordinate are visualized in Figure 1, Figure 2 and Figure 3. Due to obvious differences in the mean feature, digits “0” and “1” can be identified using only one mutual-energy inner product coordinate. Figure 4a shows the training sample distribution in accordance with the components on this coordinate. Figure 5a gives the confusion matrix of the classification results, where the horizontal and vertical axes correspond to the target class and the output class of the classifier, respectively. In the confusion matrix, the column on the far right shows the precision over all the examples predicted to belong to each class, and the row at the bottom shows the recall over all the examples belonging to each class; the entry at the bottom right shows the overall accuracy; the diagonal entries are the numbers of correctly classified digits “0” and “1”, and the off-diagonal entries correspond to misclassifications. This binary Gaussian classifier achieves a very high overall accuracy of 99.66% on the training set, shown at the bottom right of the confusion matrix.
The binary Gaussian classifier is tested on the MNIST test set, which comprises 1135 samples of “1” and 980 samples of “0”. The test results are visualized in Figure 4b and Figure 5b. Its overall accuracy reaches 99.91%, higher than that on the training set.
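The confusion-matrix quantities described above can be computed as follows (a sketch with assumed names; rows index the output class and columns the target class, matching the layout described for Figure 5):

```python
import numpy as np

def confusion_summary(y_true, y_pred, n_classes):
    """Confusion matrix with the precision column, recall row, and overall
    accuracy described in the text (rows: output class, cols: target class)."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[p, t] += 1
    precision = np.diag(C) / np.maximum(C.sum(axis=1), 1)   # far-right column
    recall = np.diag(C) / np.maximum(C.sum(axis=0), 1)      # bottom row
    accuracy = np.trace(C) / C.sum()                        # bottom-right cell
    return C, precision, recall, accuracy
```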
  6.3.2. Binary Gaussian Classifier: Identify Digits “0” and “2”
The MNIST training set comprises 5958 samples of “2” and 5923 samples of “0”, and the MNIST test set comprises 1032 samples of “2” and 980 samples of “0”. Similarly to the previous classifier, the reference feature is also selected as the difference between the two class means. The difference in the mean features of digits “2” and “0” is not as significant as that of digits “1” and “0”. If only one mutual-energy inner product coordinate is used for classification, the accuracy is only 96.72% on the training set and 97.81% on the test set. In order to improve the classification accuracy, we use Algorithm 2 to generate 60 mutual-energy inner product coordinates based on the training sample set and its subsets, and construct a 60-dimensional Gaussian classifier. The confusion matrices of the classification results are given in Figure 6a,b, showing an overall accuracy of 99.55% on the training set and a higher overall accuracy of 99.85% on the test set.
  6.3.3. Binary Gaussian Classifier: Identify Digits “3” and “4”
The MNIST training set comprises 6131 samples of “3” and 5842 samples of “4”, and the MNIST test set comprises 1010 samples of “3” and 982 samples of “4”. Here, we select the means of the samples “3” and “4” as reference features, and then use Algorithm 2 to generate 50 mutual-energy inner product coordinates for each reference feature, finally forming 100 classification coordinates. Because these coordinates are not linearly independent, we use matrix singular value decomposition to construct a 50-dimensional Gaussian classifier. Figure 6c,d shows the confusion matrices, with an overall accuracy of 99.67% on the training set and a higher overall accuracy of 99.80% on the test set.
  6.3.4. Multiclass Gaussian Classifier: Identify Digits “0”, “1”, “2”, “3” and “4”
In the training set, we select one digit from the samples “0”, “1”, “2”, “3” and “4” as the first class and the remaining training samples of the five digits as the second class, and we take the two classes as a training sample set. Then, we select the difference between the means of the samples in the two classes as the reference feature, and use Algorithm 2 to generate 120 mutual-energy inner product coordinates. In this way, we construct 5 training sample sets and finally generate 600 coordinates. However, many of them are linearly dependent; in order to identify the digits “0”, “1”, “2”, “3” and “4”, we use matrix singular value decomposition to reduce the dimension from 600 to 60 and construct a 60-dimensional multiclass Gaussian classifier. Figure 7 shows an overall accuracy of 98.22% on the training set and a higher overall accuracy of 98.83% on the test set.
  7. Discussion
Based on the solution space of the partial differential equations describing the vibration of a non-uniform membrane, the concept of the mutual-energy inner product is defined. By expanding the mutual-energy inner product as a superposition of the eigenfunctions of the partial differential equations, an important property is found: compared with the Euclidean inner product, the mutual-energy inner product has the significant advantage of enhancing the low-frequency eigenfunction components and suppressing the high-frequency eigenfunction components.
In data classification, if the reference data features of the samples belong to a low-frequency subspace of the set of the eigenfunctions, these data features can be extracted through the mutual-energy inner product, which can not only enhance feature information but also filter out high-frequency data noise. As a result, a mutual-energy inner product optimization model is built to extract the feature coordinates of the samples, which can enhance the data features, reduce the sample deviations, and regularize the design variables. We make use of the minimum energy principle to eliminate the constraints of the partial differential equations in the optimization model and obtain an unconstrained optimization objective function. The objective function is a quadratic functional, which is convex with respect to the variables that minimize the objective function, is concave with respect to the variables that maximize the objective function, and is linear with respect to the design variables. These properties facilitate the design of optimization algorithms.
The FEM is used to discretize the design domain, and the design variables of each element are set as constants. Based on these finite elements, the gradients of the mutual-energy inner product with respect to the element design variables are analyzed, and a sequential linearization algorithm is constructed to solve the mutual-energy inner product optimization model. The algorithm implementation only involves solving linear systems with a positive definite symmetric matrix when calculating the intermediate variables, and only needs to handle a few constraints in the nested linear optimization module, guaranteeing the stability and effectiveness of the algorithm.
The mutual-energy inner product optimization model is applied to extract the feature coordinates of the sample images and construct a low-dimensional coordinate system to represent the sample images. Multiclass Gaussian classifiers are trained and tested to classify the 2-D images. Here, only the means of the training sample set and its subsets are selected as reference features in Optimization Algorithm 1, and the vectorized implementation of Optimization Algorithm 1 is discussed. Generating mutual-energy inner product coordinates via the optimization model and training or testing Gaussian classifiers are two independent steps. In training or testing Gaussian classifiers, calculating mutual-energy inner products can be converted into calculating the Euclidean inner products between the reference feature coordinates and the sample data, not adding computational complexity to the Gaussian classifiers.
On the MNIST dataset, the mutual-energy inner product feature coordinate extraction method is used to train a 1-dimensional two-class Gaussian classifier, a 50-dimensional two-class Gaussian classifier, a 60-dimensional two-class Gaussian classifier, and a 60-dimensional five-class Gaussian classifier, and good prediction results are achieved. The feature coordinate extraction method achieves a higher overall accuracy on the test set than on the training set, indicating that the classification model is underfitting. This suggests that the achievable accuracy of this method has not yet been fully explored.
From the viewpoint of theory and algorithm, this feature extraction method is clearly different from the existing techniques in machine learning. Its limitation is that reference features need to be given in advance. In this paper, only the mean features of a sample dataset and its subsets are selected as the reference features to construct Gaussian classifiers. In the future, convolution operations can be adopted to construct other image reference features, such as image edge features, local features, textures [39], and multi-scale features, and these image features can be combined to generate a mutual-energy inner product feature coordinate system. In addition, other ensemble classifiers, such as Bagging and AdaBoost, can be introduced to improve the performance of the image classifiers. Meanwhile, the feasibility of applying the mutual-energy inner product optimization method to neural networks will also be explored.