Reduced Dilation-Erosion Perceptron for Binary Classification

Dilation and erosion are two elementary operations from mathematical morphology, a non-linear lattice computing methodology widely used for image processing and analysis. The dilation-erosion perceptron (DEP) is a morphological neural network obtained by a convex combination of a dilation and an erosion followed by the application of a hard-limiter function for binary classification tasks. A DEP classifier can be trained using a convex-concave procedure along with the minimization of the hinge loss function. As a lattice computing model, the DEP classifier assumes the feature and class spaces are partially ordered sets. In many practical situations, however, there is no natural ordering for the feature patterns. Using concepts from multi-valued mathematical morphology, this paper introduces the reduced dilation-erosion perceptron (r-DEP) classifier. An r-DEP classifier is obtained by endowing the feature space with an appropriate reduced ordering. Such a reduced ordering can be determined using two approaches: one based on an ensemble of support vector classifiers (SVCs) with different kernels and the other based on a bagging of similar SVCs trained using different samples of the training set. Using several binary classification datasets from the OpenML repository, the ensemble and bagging r-DEP classifiers yielded higher mean balanced accuracy scores than the linear, polynomial, and radial basis function (RBF) SVCs as well as their ensemble and a bagging of RBF SVCs.


Introduction
Cyber-physical systems (CPS) is a broad interdisciplinary area that combines computational and physical devices in an integrated manner [1][2][3][4]. The Internet of things (IoT), for instance, can be viewed as an important class of CPS in which physical objects are interconnected in a network with identified addresses [5]. Besides Industry 4.0 and related technologies [6], applications of CPS include social robots for educational purposes [7], medical services and healthcare [8], and many other fields.
Although modeling a CPS comprises both physical and computational processes [4], this paper focuses only on the latter. Specifically, we model the computational process using the lattice computing paradigm. Lattice computing (LC) comprises the many techniques and mathematical modeling methodologies based on lattice theory [9,10]. Lattice theory is concerned with a mathematical structure obtained by enriching a non-empty set with an ordering scheme with well-defined extrema operations. Specifically, a lattice is a partially ordered set in which any finite set has both an infimum and a supremum [11]. One of the main advantages of LC is its capability to process ordered data, which include logic values, sets and, more generally, fuzzy sets, images, graphs, and many other types of information granules [9,10,12,13]. Mathematical morphology and morphological neural networks are examples of successful LC modeling methodologies. Let us briefly address these two LC methodologies in the following paragraphs.
Mathematical morphology (MM) is a non-linear theory widely used for image processing and analysis [14][15][16][17]. MM was originally conceived for processing binary images in the 1960s. Subsequently, it was extended to gray-scale images using the notions of umbra, level sets, and fuzzy set theory [18][19][20]. The complete lattice is a key concept for extending MM from binary to more general contexts [21,22]. Specifically, MM is a theory mainly concerned with mappings between complete lattices [23,24]. Dilations and erosions, which are defined algebraically as mappings that commute respectively with the supremum and infimum operations, are elementary operations of MM. Many other operators from MM are defined by combining dilations and erosions [17,23,25]. Although complete lattices provide an appropriate mathematical background for binary and gray-scale MM, defining morphological operators for multi-valued images is not straightforward because there is no universal ordering for vector-valued spaces [26,27]. In fact, the development of appropriate ordering schemes for multi-valued MM is an active area of research [28][29][30][31][32]. In this paper, we make use of the supervised reduced ordering proposed by Velasco-Forero and Angulo [33]. In a few words, a supervised reduced ordering is defined using a training set of negative (or background) and positive (or foreground) values. As a consequence, the resulting multi-valued morphological operators can be interpreted in terms of positive and negative training values. One contribution of this paper is the definition of morphological neural networks based on supervised reduced orderings (see Section 4).
Morphological neural networks (MNNs) refer to the broad class of neural networks whose processing units perform an operation from MM, possibly followed by the application of an activation function [34]. The single-layer morphological perceptron, introduced by Ritter and Sussner in the mid-1990s, is one of the earliest MNNs, along with the morphological associative memories [35]. Briefly, the morphological perceptron performs either a dilation or an erosion from gray-scale MM followed by a hard-limiter activation function. The original morphological perceptron has subsequently been investigated and generalized by many prominent researchers [34,36,37,38]. For example, Sussner addressed the multilayer morphological perceptron and introduced a supervised learning algorithm for binary classification problems [36]. In a few words, the learning algorithm proposed by Sussner is an incremental algorithm which adds hidden morphological neurons until the whole training set is correctly classified. By taking into account the relevance of dendrites in biological neurons, Ritter and Urcid proposed a morphological neuron with dendritic structure [37]. Apart from the biological motivation, the morphological perceptron with dendritic structure is similar to the multilayer morphological perceptron investigated by Sussner. In fact, like the learning algorithm of the multilayer morphological perceptron, the morphological perceptron with dendritic structure grows as it learns until there are no mis-classified training samples. Furthermore, the decision surface of both the multilayer morphological perceptron and the morphological perceptron with dendritic structure depends on the order in which the training samples are presented to the network. The morphological perceptron with competitive layer (MPC) introduced by Sussner and Esmi does not depend on the order in which the training samples are presented to the network [34]. Like the previous models, however, training a morphological perceptron with competitive layer finishes only when all training patterns are correctly classified. Thus, the network may end up overfitting the training data.
In contrast to the greedy methods described in the previous paragraph, many researchers formulated the training of MNNs and hybrid models as an optimization problem. For example, Pessoa and Maragos used pulse functions to circumvent the non-differentiability of lattice-based operations in a steepest descent method designed for training a hybrid morphological/rank/linear network [39]. Based on the ideas of Pessoa and Maragos, Araújo proposed a hybrid morphological/linear network called dilation-erosion perceptron (DEP), which is trained using a steepest descent method [40]. Steepest descent methods are also used by Hernández et al. for training hybrid two-layer neural networks, where one layer is morphological and the other is linear [41]. In a similar fashion, Mondal et al. proposed a hybrid morphological/linear model, called a dense morphological network, which was trained using a stochastic gradient descent method such as Adam [42]. Apparently unaware of the aforementioned works on MNNs, Franchi et al. recently integrated morphological operators with a deep learning framework to introduce the so-called deep morphological network, which was also trained using a steepest descent algorithm [43]. In contrast to steepest descent methods, Arce et al. trained MNNs using differential evolution [44]. Moreover, Sussner and Campiotti proposed a hybrid morphological/linear extreme learning machine which has a hidden layer of morphological units and a linear output layer trained by regularized least squares [45]. Recently, Charisopoulos and Maragos formulated the training of a single morphological perceptron as the solution of a convex-concave optimization problem [46,47]. Apart from the elegant formulation, the convex-concave procedure outperformed gradient descent methods in terms of accuracy and robustness in some computational experiments.
In this paper, we investigate the convex-concave procedure for training the DEP classifier. As a lattice-based model, DEP requires a partial ordering on both feature and class label spaces. Furthermore, the traditional approach assumes the feature space is equipped with the component-wise ordering induced by the natural ordering of real numbers. The component-wise ordering, however, may be inappropriate in the feature space. Based on ideas from multi-valued MM, we make use of supervised reduced orderings for the feature space of the DEP model. The resulting model is referred to as reduced dilation-erosion perceptron (r-DEP). The performance of the new r-DEP model is evaluated by considering 30 binary classification problems from the OpenML repository [48,49], most of which are also available at the well-known UCI Machine Learning Repository [50].
The paper is organized as follows. The next section presents a brief review of the basic concepts from lattice theory and MM, including the supervised reduced ordering-based approach to multi-valued MM. Traditional MNNs, including the DEP classifier and the convex-concave procedure, are discussed in Section 3. Section 4 presents the main contribution of this paper: the reduced DEP classifier. In Section 5, we compare the performance of the r-DEP classifier with other traditional machine learning approaches from the literature. The paper finishes with some concluding remarks in Section 6.

Basic Concepts from Lattice Theory and Mathematical Morphology
Let us begin by recalling some basic concepts from lattice theory and mathematical morphology (MM). Precisely, we shall only present the concepts necessary for understanding morphological perceptron models. Furthermore, we will focus on elementary concepts without going deep into these rich theories. The reader interested in lattice theory is invited to consult [11]. A detailed account of MM and its applications can be found in [14,17,23,25]. The reader familiar with lattice theory and MM may skip Section 2.1.

Lattice Theory and Mathematical Morphology
First of all, a non-empty set L equipped with a binary relation "≤" is a partially ordered set (poset) if the following conditions hold true:
P1: For all x ∈ L, we have x ≤ x. (Reflexivity)
P2: If x ≤ y and y ≤ z, then x ≤ z. (Transitivity)
P3: If x ≤ y and y ≤ x, then x = y. (Antisymmetry)
In this case, the binary relation "≤" is called a partial order. We speak of a pre-ordered set if L is equipped with a binary relation which satisfies the properties P1 and P2.
A partially ordered set L is a complete lattice if any subset X ⊆ L has a supremum (least upper bound) and an infimum (greatest lower bound), denoted respectively by $\bigvee X$ and $\bigwedge X$. When $X = \{x_1, \ldots, x_n\}$ is finite, we write $\bigvee X = \bigvee_{i=1}^{n} x_i$ and $\bigwedge X = \bigwedge_{i=1}^{n} x_i$.
Example 1. The extended real numbers $\bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$ with the natural ordering is an example of a complete lattice. The Cartesian product $\bar{\mathbb{R}}^n$ of the extended real numbers is also a complete lattice with the partial order defined as follows in a component-wise manner: $x \le y \iff x_i \le y_i, \ \forall i = 1, \ldots, n$. (1) In this case, the infimum and the supremum of a set $X \subseteq \bar{\mathbb{R}}^n$ are also determined in a component-wise manner by $\bigwedge X = \left(\bigwedge X_1, \bigwedge X_2, \ldots, \bigwedge X_n\right)$ and $\bigvee X = \left(\bigvee X_1, \bigvee X_2, \ldots, \bigvee X_n\right)$, where $X_i = \{x_i : (x_1, \ldots, x_i, \ldots, x_n) \in X\} \subseteq \bar{\mathbb{R}}$ is the set of the ith components of all the vectors in X, for i = 1, . . ., n.
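As a small illustration of Example 1, the following Python sketch implements the component-wise order together with the component-wise infimum and supremum of a finite set of vectors. The function names `leq`, `infimum`, and `supremum` and the two sample vectors are chosen here for illustration only.

```python
import numpy as np

def leq(x, y):
    """Component-wise partial order: x <= y iff x_i <= y_i for every i."""
    return bool(np.all(np.asarray(x) <= np.asarray(y)))

def infimum(X):
    """Component-wise infimum of a finite set of vectors (rows of X)."""
    return np.min(np.asarray(X, dtype=float), axis=0)

def supremum(X):
    """Component-wise supremum of a finite set of vectors (rows of X)."""
    return np.max(np.asarray(X, dtype=float), axis=0)

X = np.array([[1.0, 4.0],
              [3.0, 2.0]])

lo, hi = infimum(X), supremum(X)  # (1, 2) and (3, 4)

# The two vectors are not comparable, which is why the order is only partial.
comparable = leq(X[0], X[1]) or leq(X[1], X[0])
```

Note that neither the infimum nor the supremum belongs to the set in this case, a point that becomes relevant for the "false color" problem discussed in connection with multi-valued MM.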
Mathematical morphology is a non-linear theory widely used for image processing and analysis [24,25]. From the mathematical point of view, MM can be viewed as a theory of mappings between complete lattices. In fact, the elementary operations of MM are mappings that distribute over either infimum or supremum operations. Precisely, two elementary operations of MM are defined as follows [15,23]:
Definition 1 (Erosion and Dilation). Let L and M be complete lattices. A mapping ε : L → M is an erosion and a mapping δ : L → M is a dilation if the following identities hold true for any X ⊆ L: $\varepsilon\left(\bigwedge X\right) = \bigwedge_{x \in X} \varepsilon(x)$ and $\delta\left(\bigvee X\right) = \bigvee_{x \in X} \delta(x)$.
From Theorem 3 in [34], we have the following example of dilations and erosions:
Example 2. The operators $\varepsilon_m : \bar{\mathbb{R}}^n \to \bar{\mathbb{R}}$ and $\delta_w : \bar{\mathbb{R}}^n \to \bar{\mathbb{R}}$ given by $\varepsilon_m(x) = \bigwedge_{j=1}^{n} (m_j + x_j)$ and $\delta_w(x) = \bigvee_{j=1}^{n} (w_j + x_j)$, (4) for all $x = (x_1, x_2, \ldots, x_n) \in \bar{\mathbb{R}}^n$, are respectively an erosion and a dilation.
Remark 1. The mappings $\varepsilon_m$ and $\delta_w$ given by (4) can be extended for $m, w \in \bar{\mathbb{R}}^n$ by appropriately dealing with indeterminacies such as (+∞) + (−∞) [34]. In this paper, however, we only consider finite-valued vectors $m, w \in \mathbb{R}^n$.
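The erosion and dilation of (4) reduce to a minimum and a maximum of shifted components when all values are finite. The sketch below, with illustrative weights and sample points chosen here, checks numerically that the erosion commutes with the component-wise infimum and the dilation with the component-wise supremum on a small finite set.

```python
import numpy as np

def erosion(x, m):
    """epsilon_m(x) = min_j (m_j + x_j) for finite-valued vectors."""
    return float(np.min(np.asarray(m) + np.asarray(x)))

def dilation(x, w):
    """delta_w(x) = max_j (w_j + x_j) for finite-valued vectors."""
    return float(np.max(np.asarray(w) + np.asarray(x)))

m = np.array([1.0, 2.25])
w = np.array([2.0, 1.0])
X = np.array([[0.0, -3.0],
              [-1.5, 0.5],
              [2.0, 1.0]])

# Erosion of the component-wise infimum versus infimum of the erosions.
ero_of_inf = erosion(X.min(axis=0), m)
inf_of_ero = min(erosion(x, m) for x in X)

# Dilation of the component-wise supremum versus supremum of the dilations.
dil_of_sup = dilation(X.max(axis=0), w)
sup_of_dil = max(dilation(x, w) for x in X)
```

Both pairs of values coincide for this sample, in agreement with Definition 1; the check is numerical and is not a substitute for the proof in [34].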
Definition 2 (Increasing Operators). Let L and M be two complete lattices. An operator ψ : L → M is isotone or increasing if x ≤ y implies ψ(x) ≤ ψ(y).
Despite the rich theory on morphological operators and their many successful applications, the concepts presented above are sufficient for this paper. Let us now turn our attention to some concepts from multi-valued MM.

Multi-valued Mathematical Morphology
Although MM can be very well defined on complete lattices (see Definition 1), there is no unambiguous ordering for vector-valued sets. For example, although $\bar{\mathbb{R}}^n$ equipped with the component-wise ordering is a complete lattice, the partial order given by (1) does not take into account possible relationships between the vector components. Furthermore, the component-wise order given by (1) results in the so-called "false color" problem in multi-valued MM [52]. As a consequence, a great deal of effort has been devoted to finding appropriate ordering schemes for vector-valued data [26,27,29,31,53]. Among the many approaches to multi-valued MM, those based on reduced orderings are particularly interesting and computationally cheap [32,33,53].
In a reduced ordering, also referred to as an r-ordering, the elements of a vector-valued non-empty set V are ranked according to a surjective mapping ρ : V → L, where L is a complete lattice. Precisely, an r-ordering is defined as follows using the mapping ρ : V → L: $x \le_\rho y \iff \rho(x) \le \rho(y), \ \forall x, y \in V$. (5) In analogy to Definition 2, r-increasing operators are defined as follows using r-orderings:
Definition 3 (r-Increasing Operator). Let ρ : V → L and σ : W → M be surjective mappings from non-empty sets V and W to complete lattices L and M. An operator ψ : V → W is r-increasing if $x \le_\rho y$ implies $\psi(x) \le_\sigma \psi(y)$.
Although reflexive and transitive, an r-ordering is in principle only a pre-ordering because it may fail to be antisymmetric. Notwithstanding, morphological operators can be defined as follows using reduced orderings [53]:
Definition 4 (r-Increasing Morphological Operator). Let V and W be non-empty sets, L and M be complete lattices, and ρ : V → L and σ : W → M be surjective mappings. A mapping $\psi_r : V \to W$ is an r-increasing morphological operator if there exists an increasing morphological operator ψ : L → M such that $\sigma\psi_r = \psi\rho$, that is, $\sigma(\psi_r(x)) = \psi(\rho(x))$ for all x ∈ V.
In words, the mapping σ : W → M applied to an r-increasing morphological operator $\psi_r : V \to W$ corresponds to the morphological operator ψ : L → M applied to the output of the mapping ρ : V → L. For example, an operator $\varepsilon_r : V \to W$ is an r-increasing erosion, or simply an r-erosion, if there exists an erosion ε : L → M such that $\sigma\varepsilon_r = \varepsilon\rho$. Dually, a mapping $\delta_r : V \to W$ is an r-increasing dilation, or simply an r-dilation, if there exists a dilation δ : L → M such that $\sigma\delta_r = \delta\rho$.
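The r-ordering of (5) can be illustrated with a toy mapping. In the sketch below, the mapping `rho` (the sum of the components, with values in the real line viewed as a subset of the extended reals) is a hypothetical choice made only for this example. It shows why an r-ordering is a pre-order: two distinct vectors can precede each other.

```python
import numpy as np

def rho(x):
    """Hypothetical mapping from R^2 to the real line: the sum of the components."""
    return float(np.sum(x))

def r_leq(x, y):
    """Reduced ordering: x <=_rho y iff rho(x) <= rho(y)."""
    return rho(x) <= rho(y)

x = np.array([1.0, 2.0])
y = np.array([2.0, 1.0])

# Both relations hold although x and y are different vectors,
# so antisymmetry fails and the r-ordering is only a pre-order.
both_directions = r_leq(x, y) and r_leq(y, x)
distinct = not np.array_equal(x, y)

# Ranking a finite set of vectors according to rho.
V = np.array([[0.0, 5.0], [1.0, 1.0], [3.0, 0.5]])
ranking = np.argsort([rho(v) for v in V])
```

Any total order on the values of `rho` therefore ranks the vectors, at the price of treating vectors with equal `rho` values as equivalent.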
Remark 2. Definition 4, which is motivated by Proposition 2.5 in [53], generalizes the notions of r-erosion and r-dilation as well as r-opening and r-closing from Goutsias et al. Precisely, if W = V and M = L, then one can consider σ = ρ, where ρ : V → L is a surjective mapping. Thus, an operator $\varepsilon_r : V \to V$ is an r-erosion if and only if there exists an erosion ε : L → L such that $\rho\varepsilon_r = \varepsilon\rho$, which is exactly the definition of r-erosion introduced by Goutsias et al. (shortly after Proposition 2.11 in [53]).

Example 3.
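A general approach to multi-valued MM based on supervised reduced orderings has been proposed by Velasco-Forero and Angulo [33]. Briefly, in a supervised reduced ordering the surjective mapping ρ : V → L is determined using training sets P and N of positive (foreground) and negative (background) values, respectively. Furthermore, the mapping ρ is expected to satisfy $\rho(f) = \top$ for all $f \in P$ and $\rho(b) = \bot$ for all $b \in N$, where $\top = \bigvee L$ and $\bot = \bigwedge L$ denote respectively the largest (top) and the least (bottom) elements of L. As a consequence, a supervised r-ordering is interpretable with respect to the training sets P and N [32]. Assuming $V \subset \mathbb{R}^n$ and $L \subseteq \bar{\mathbb{R}}$, Velasco-Forero and Angulo proposed to determine ρ using a support vector machine [54][55][56]. As usual, let us first combine the positive and negative sets into a single training set $T = \{(x_i, d_i) : i = 1, \ldots, M\}$, where $d_i = +1$ if $x_i \in P$ and $d_i = -1$ if $x_i \in N$. Given a suitable kernel function κ, the supervised reduced ordering mapping ρ corresponds to the decision function of a support vector classifier (SVC) given by $\rho(x) = \sum_{i=1}^{M} \alpha_i d_i \kappa(x_i, x)$, (8) where $\alpha = (\alpha_1, \ldots, \alpha_M) \in \mathbb{R}^M$ is the solution of the quadratic programming problem $\max_{\alpha} \ \sum_{i=1}^{M} \alpha_i - \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j d_i d_j \kappa(x_i, x_j)$ subject to $0 \le \alpha_i \le C$ for all i = 1, . . ., M and $\sum_{i=1}^{M} \alpha_i d_i = 0$, where C > 0 is a user-specified parameter which controls the trade-off between minimizing the training error and maximizing the separation margin [55,56]. Examples of kernels include
• Gaussian kernel: $\kappa(x, y) = \exp\left(-\gamma \|x - y\|_2^2\right)$ for some γ > 0. The Gaussian kernel yields a radial-basis function (RBF) SVC.
• Polynomial kernel: $\kappa(x, y) = \left(\gamma \langle x, y \rangle + c_0\right)^{p}$ for some γ > 0, $c_0 \ge 0$, and positive integer p. The polynomial kernel yields a polynomial SVC.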
We would like to point out that the intercept term is not relevant for a reduced ordering scheme and, thus, we refrained from including it in (8). In addition, we would like to remark that the decision function ρ maps the multi-valued set V to a totally ordered subset of R, which allows for efficient implementation of multi-valued morphological operators using look-up tables and the usual gray-scale operators (see Algorithm 1 in [32] for details).
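A supervised reduced ordering of this kind can be sketched with scikit-learn. The snippet below is a minimal illustration under stated assumptions: the synthetic sets `N` and `P`, the kernel width `gamma = 0.5`, and `C = 1.0` are chosen here and do not come from the paper. The mapping `rho` subtracts the fitted intercept from `SVC.decision_function`, so it matches a kernel expansion without intercept; `rho_manual` recomputes the same values from `support_vectors_` and `dual_coef_`.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical background (negative) and foreground (positive) training values.
N = rng.normal(loc=-1.0, scale=0.5, size=(30, 2))
P = rng.normal(loc=+1.0, scale=0.5, size=(30, 2))
X = np.vstack([N, P])
d = np.hstack([-np.ones(len(N)), np.ones(len(P))])

gamma = 0.5  # assumed width of the Gaussian kernel
svc = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, d)

def rho(Z):
    """SVC decision function without the intercept term."""
    return svc.decision_function(Z) - svc.intercept_[0]

def rho_manual(Z):
    """Same values from the kernel expansion over the support vectors."""
    sv = svc.support_vectors_
    sq_dist = ((Z[:, None, :] - sv[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist) @ svc.dual_coef_[0]

Z = rng.normal(size=(5, 2))
order = np.argsort(rho(Z))  # indices of Z ranked from lowest to highest rho
```

Since `rho` takes values in a totally ordered subset of the real line, sorting by `rho` is all that is needed to rank vector-valued data in this scheme.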
Recall that a classifier is a mapping φ : V → C, where V and C are respectively the sets of features and classes. In a binary classification problem, the set $C = \{c_1, c_2\}$ of classes can be identified with {−1, +1} by means of a one-to-one mapping σ : C → {−1, +1}. In this paper, we say that the elements of V associated with the class labels −1 and +1 belong respectively to the negative and positive classes. In a supervised binary classification task, the classifier φ :

Morphological and Dilation-Erosion Perceptron Models
The morphological perceptron was introduced by Ritter and Sussner in the mid-1990s for binary classification problems [35]. In analogy to Rosenblatt's perceptron, Ritter and Sussner define a morphological perceptron by either one of the two equations $y = f(\varepsilon_m(x))$ and $y = f(\delta_w(x))$, (14) where f denotes a hard-limiter activation function. To simplify the exposition, we consider in this paper f ≡ sgn with the convention sgn(0) = +1. By adopting the sign function, the morphological perceptrons given by (14) can be used for binary classification problems whose labels are −1 and +1.
Note that a morphological perceptron is given by either the composition $f\varepsilon_m$ or the composition $f\delta_w$, where $\varepsilon_m : \bar{\mathbb{R}}^n \to \bar{\mathbb{R}}$ and $\delta_w : \bar{\mathbb{R}}^n \to \bar{\mathbb{R}}$ denote respectively the erosion and the dilation given by (4) for $m, w \in \mathbb{R}^n$. Therefore, we refer to the models in (14) as erosion-based and dilation-based morphological perceptrons, respectively. Furthermore, $\varepsilon_m$ and $\delta_w$ are respectively the decision functions of the erosion-based and dilation-based morphological perceptrons.
Let us now briefly address the geometry of the morphological perceptrons with f ≡ sgn. Given a weight vector $m = (m_1, \ldots, m_n) \in \mathbb{R}^n$, let $E(m) = \{x \in \mathbb{R}^n : \varepsilon_m(x) \ge 0\}$ be the set of all points such that $\varepsilon_m(x) \ge 0$, that is, $y = \mathrm{sgn}\,\varepsilon_m(x) = +1$. Since $\varepsilon_m(x) = \bigwedge_{j=1}^{n} (m_j + x_j) \ge 0$ if and only if $m_j + x_j \ge 0$ for all j = 1, . . ., n, we conclude that E(m) is equivalently given by $E(m) = \{x \in \mathbb{R}^n : x_j \ge -m_j, \ \forall j = 1, \ldots, n\}$. The decision boundary of an erosion-based morphological perceptron $\mathrm{sgn}\,\varepsilon_m$ corresponds to the boundary of the set E(m): the class label +1 is assigned to all patterns in E(m) while the class label −1 is given to all patterns outside E(m). In view of this remark, we may say that an erosion-based morphological perceptron focuses on the positive class, whose label is +1. Dually, a dilation-based morphological perceptron focuses on the negative class, whose label is −1. Specifically, given a weight vector $w = (w_1, \ldots, w_n) \in \mathbb{R}^n$, the set D(w) of all points $x \in \mathbb{R}^n$ such that $\delta_w(x) < 0$, that is, $y = \mathrm{sgn}\,\delta_w(x) = -1$, satisfies $D(w) = \{x \in \mathbb{R}^n : x_j < -w_j, \ \forall j = 1, \ldots, n\}$. The decision boundary of a dilation-based morphological perceptron corresponds to the boundary of D(w), that is, patterns inside D(w) are classified as negative while patterns outside D(w) are classified as positive. For illustrative purposes, Figure 1 shows the sets E(1, 2.25) (yellow region) and D(2, 1) (purple region), obtained by considering respectively $m = (1, 2.25) \in \mathbb{R}^2$ and $w = (2, 1) \in \mathbb{R}^2$. Figure 1 also shows the decision boundary of the dilation-erosion perceptron described below.
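The geometric description above can be checked numerically. The following sketch uses the weights m = (1, 2.25) and w = (2, 1) of Figure 1 and random test points drawn here for illustration; it compares the labels produced by the erosion-based and dilation-based perceptrons with membership in the sets E(m) and D(w), written as component-wise inequalities.

```python
import numpy as np

def sgn(t):
    """Hard limiter with the convention sgn(0) = +1."""
    return np.where(np.asarray(t) >= 0, 1, -1)

def erosion(X, m):
    """Row-wise erosion: min_j (m_j + x_j)."""
    return np.min(X + m, axis=1)

def dilation(X, w):
    """Row-wise dilation: max_j (w_j + x_j)."""
    return np.max(X + w, axis=1)

m = np.array([1.0, 2.25])
w = np.array([2.0, 1.0])

rng = np.random.default_rng(1)
X = rng.uniform(-4.0, 4.0, size=(1000, 2))

in_E = np.all(X >= -m, axis=1)  # x_j >= -m_j for every j
in_D = np.all(X < -w, axis=1)   # x_j < -w_j for every j

ero_labels = sgn(erosion(X, m))   # +1 exactly on E(m)
dil_labels = sgn(dilation(X, w))  # -1 exactly on D(w)
```

In both cases the decision region is an axis-aligned orthant, which is the reason a single morphological perceptron favors one of the two classes.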
In the previous paragraph, we pointed out that erosion-based and dilation-based morphological perceptrons focus respectively on the positive and the negative classes. The dilation-erosion perceptron (DEP) proposed by Araújo allows a graceful balance between the two classes [40]. The dilation-erosion perceptron is simply a convex combination of an erosion-based and a dilation-based morphological perceptron. In mathematical terms, given $m, w \in \mathbb{R}^n$ and 0 ≤ β ≤ 1, the decision function of a DEP classifier is defined by $\tau(x) = \beta \varepsilon_m(x) + (1 - \beta) \delta_w(x)$. (18) The binary DEP classifier $\varphi : \mathbb{R}^n \to \{-1, +1\}$ is defined by the composition $\varphi = \mathrm{sgn}\,\tau$. (19) In other words, given $m, w \in \mathbb{R}^n$ and β ∈ [0, 1], the class of an unknown pattern $x \in \mathbb{R}^n$ is determined by evaluating $\varphi(x) = \mathrm{sgn}\,\tau(x)$.
Note that τ given by (18) corresponds respectively to $\delta_w$ and $\varepsilon_m$ when β = 0 and β = 1. More generally, the parameter β controls the trade-off between the dilation-based and the erosion-based morphological perceptrons, which focus on negative and positive classes, respectively. For illustrative purposes, the decision boundary of a DEP classifier obtained by considering β = 0.2, m = (1, 2.25) and w = (2, 1) is depicted in Figure 1. In this case, the decision boundary of the DEP classifier φ is closer to the decision boundary of $\mathrm{sgn}\,\varepsilon_m$ than that of $\mathrm{sgn}\,\delta_w$. We address a good choice of the parameter β of a DEP classifier in the following subsection. In the following subsection we also review the elegant convex-concave procedure proposed recently by Charisopoulos and Maragos to train the morphological perceptrons $\varepsilon_m$ and $\delta_w$ [46].
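A DEP decision function of the form (18) and the classifier (19) take only a few lines of NumPy. The sketch below is illustrative: the function names `dep_decision` and `dep_predict` and the three test patterns are chosen here, while m, w, and β are the values quoted for Figure 1.

```python
import numpy as np

def dep_decision(X, m, w, beta):
    """tau(x) = beta * erosion_m(x) + (1 - beta) * dilation_w(x), row-wise."""
    X = np.atleast_2d(X)
    ero = np.min(X + m, axis=1)
    dil = np.max(X + w, axis=1)
    return beta * ero + (1.0 - beta) * dil

def dep_predict(X, m, w, beta):
    """phi(x) = sgn(tau(x)) with the convention sgn(0) = +1."""
    return np.where(dep_decision(X, m, w, beta) >= 0, 1, -1)

m = np.array([1.0, 2.25])
w = np.array([2.0, 1.0])
X = np.array([[0.0, 0.0],
              [-3.0, -3.0],
              [-1.5, -1.5]])

tau = dep_decision(X, m, w, beta=0.2)
labels = dep_predict(X, m, w, beta=0.2)
```

Setting `beta=0.0` or `beta=1.0` in the same functions recovers the dilation-based and the erosion-based perceptrons, respectively.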

Convex-Concave Procedure for Training Morphological Perceptron
In analogy to the soft-margin support vector classifier, the weights of a morphological perceptron can be determined by solving a convex-concave optimization problem [46,47]. Precisely, consider a training set $T = \{(x_i, d_i) : i = 1, \ldots, m\}$, where $x_i \in \mathbb{R}^n$ is a training pattern and $d_i \in \{-1, +1\}$ is its binary class label for i = 1, . . ., m. To simplify the exposition, let N and P denote respectively the sets of negative and positive training patterns, that is, $N = \{x_i : d_i = -1\}$ and $P = \{x_i : d_i = +1\}$. In addition, let $\psi_u$ be the decision function of either an erosion-based or a dilation-based morphological perceptron. In other words, let $\psi_u = \varepsilon_m$ with u = m or $\psi_u = \delta_w$ with u = w. The vector $u \in \mathbb{R}^n$ of either $\psi_u = \varepsilon_m$ or $\psi_u = \delta_w$ is defined as the solution of the following convex-concave optimization problem: $\min_{u, \xi} \ C \|u - r\|_1 + \frac{1}{|N|} \sum_{x_i \in N} \nu_i^- \max\{0, \xi_i^-\} + \frac{1}{|P|} \sum_{x_i \in P} \nu_i^+ \max\{0, \xi_i^+\}$ (21) subject to $\psi_u(x_i) \le \xi_i^-, \ \forall x_i \in N$, (22) and $\psi_u(x_i) \ge -\xi_i^+, \ \forall x_i \in P$, (23) where C is a regularization parameter, r is a reference value for u, $\xi_i^-$ and $\xi_i^+$ are slack variables, and $\nu_i^- \ge 0$ and $\nu_i^+ \ge 0$ are their penalty weights. As usual, |N| and |P| denote the cardinality of N and P, respectively. Different from the procedure proposed by Charisopoulos and Maragos [46], the convex-concave optimization problem proposed in this paper includes the regularization term $C \|u - r\|_1$ in the objective function.
The slack variables $\xi_i^-$ and $\xi_i^+$ measure the classification error of negative and positive training patterns weighted by $\nu_i^-$ and $\nu_i^+$, respectively. Indeed, the objective function is minimized when all slack variables are non-positive, that is, $\xi_i^- \le 0$ and $\xi_i^+ \le 0$ for all indices i. On the one hand, a negative training pattern $x_i \in N$ is mis-classified if $\psi_u(x_i) > 0$. From (22), however, we have $0 < \psi_u(x_i) \le \xi_i^-$ and, therefore, the objective function is not minimized. On the other hand, if a positive training pattern $x_i \in P$ is mis-classified, then $0 > \psi_u(x_i) \ge -\xi_i^+$. Equivalently, $\xi_i^+ > 0$ and, again, the objective is not minimized.
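The role of the slack variables can be made concrete by evaluating the objective for a fixed weight vector u. The sketch below is not the convex-concave solver; it assumes the objective has the form C‖u − r‖₁ plus the weighted class averages of max{0, ξ}, and it uses the tightest slack values allowed by the constraints, namely ξ⁻ = ψ_u(x) for negative patterns and ξ⁺ = −ψ_u(x) for positive ones. The function name `dep_objective` and the toy data are chosen here for illustration.

```python
import numpy as np

def dep_objective(u, X, d, r, C=1e-2, nu=None, kind="erosion"):
    """Objective value at a fixed u, using the tightest feasible slacks."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d)
    nu = np.ones(len(X)) if nu is None else np.asarray(nu, dtype=float)
    # psi_u is an erosion (min) or a dilation (max) of the shifted components.
    psi = np.min(X + u, axis=1) if kind == "erosion" else np.max(X + u, axis=1)
    neg, pos = d == -1, d == 1
    xi_neg = psi[neg]    # psi_u(x_i) <= xi_i^- for negative patterns
    xi_pos = -psi[pos]   # psi_u(x_i) >= -xi_i^+ for positive patterns
    loss_neg = np.mean(nu[neg] * np.maximum(0.0, xi_neg))
    loss_pos = np.mean(nu[pos] * np.maximum(0.0, xi_pos))
    return C * np.sum(np.abs(u - r)) + loss_neg + loss_pos

X = np.array([[-2.0, -2.0], [-1.0, -3.0], [1.0, 2.0], [2.0, 1.0]])
d = np.array([-1, -1, 1, 1])
r = -X[d == 1].min(axis=0)  # minus the infimum of the positive patterns

value_at_r = dep_objective(r, X, d, r)
value_shifted = dep_objective(np.array([-3.0, -3.0]), X, d, r)
```

For this separable toy set the objective vanishes at u = r, while shifting u away from r mis-classifies both positive patterns and raises the objective.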
The slack variable penalty weights $\nu_i^-$ and $\nu_i^+$ have been introduced to deal with the presence of outliers. The following presents a simple weighting scheme proposed by Charisopoulos and Maragos to penalize training patterns with greater chances of being outliers [46]. Let $\mu^-$ and $\mu^+$ be the means of the negative and positive training patterns, that is, $\mu^- = \frac{1}{|N|} \sum_{x_i \in N} x_i$ and $\mu^+ = \frac{1}{|P|} \sum_{x_i \in P} x_i$. In addition, let $\lambda_i^-$ and $\lambda_i^+$ be the reciprocal of the distance between $x_i$ and either the mean $\mu^-$ or $\mu^+$. In mathematical terms, define $\lambda_i^- = 1 / \|x_i - \mu^-\|$ for $x_i \in N$ and $\lambda_i^+ = 1 / \|x_i - \mu^+\|$ for $x_i \in P$. Finally, the slack variable weights $\nu_i^-$ and $\nu_i^+$ are obtained by scaling $\lambda_i^-$ and $\lambda_i^+$ to the interval (0, 1] as follows for all indices i: $\nu_i^- = \lambda_i^- / \max_{j : x_j \in N} \lambda_j^-$ and $\nu_i^+ = \lambda_i^+ / \max_{j : x_j \in P} \lambda_j^+$.
As to the reference, we recommend respectively $r = -\bigvee N$ and $r = -\bigwedge P$ for the synaptic weights w and m. In this case, $\delta_w$ and $\varepsilon_m$ classify correctly the largest possible number of negative and positive training patterns, respectively. In addition, we recommend a small regularization parameter C so that the objective is dominated by the classification error measured by the slack variables. Although in our computational implementation we adopted $C = 10^{-2}$, we recommend fine-tuning this hyper-parameter using, for example, exhaustive search or a randomized parameter optimization strategy [63].
Finally, we propose to train a DEP classifier using a greedy algorithm. Intuitively, the greedy algorithm first finds the best erosion-based and the best dilation-based morphological perceptrons and then seeks their best convex combination. Formally, we first solve two independent convex-concave optimization problems formulated using (21)-(23), one to determine the synaptic weight m of the erosion-based morphological perceptron $\varepsilon_m$ and the other to compute w of the dilation-based morphological perceptron $\delta_w$. Subsequently, we determine the parameter β by minimizing the average hinge loss. In mathematical terms, β is obtained by solving the constrained convex problem: $\min_{\beta} \ \frac{1}{m} \sum_{i=1}^{m} \max\{0, 1 - d_i \tau(x_i)\}$ subject to 0 ≤ β ≤ 1, (27) where $\tau(x_i) = \beta \varepsilon_m(x_i) + (1 - \beta) \delta_w(x_i)$.
Remark 3. In our computational experiments, we solved the optimization problem (21)-(23) using the CVXPY Python package with the DCCP extension for convex-concave programming [47] and the MOSEK solver. Further information on the MOSEK software package can be obtained at www.mosek.com. The source code of the DEP classifier, trained using convex-concave programming and compatible with the scikit-learn API, is available at https://github.com/mevalle/r-DEP-Classifier.
Despite the DEP classifier yielding satisfactory accuracy scores in test sets from the two previous examples, this classifier has a serious drawback: as a lattice-based classifier, the DEP classifier presupposes a partial ordering on the feature space as well as on the set of classes. From (14), the component-wise ordering given by (1) is adopted in the feature space while the usual total ordering of real numbers is used to rank the class labels. Most importantly, the DEP classifier $\varphi : \mathbb{R}^n \to \{-1, +1\}$ defined by the composition (19) is an increasing operator because both sgn and τ are increasing operators. Note that τ is increasing because it is the convex combination of increasing operators $\varepsilon_m$ and $\delta_w$.
As a consequence, the patterns from the positive class must be in general greater than the patterns from the negative class. In many practical situations, however, the component-wise ordering of the feature space is not in agreement with the natural ordering of the class labels. For example, if we invert the class labels on the synthetic dataset of Ripley, the accuracy score of the DEP classifier decreases to 0.33 and 0.31 for the training and test data, respectively. Similarly, the accuracy score of the DEP classifier decreases respectively to 0.66 and 0.65 on the training and test sets if we invert the class labels in the double-moon classification problem. Fortunately, we can circumvent this drawback through the use of dendrite computations [37], morphological competitive units [34], or hybrid morphological/linear neural networks [41,45]. Alternatively, we can avoid the inconsistency between the partial orderings of the feature and class spaces by making use of multi-valued mathematical morphology.
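Returning to the greedy training step of this section, the outlier weights and the choice of β can be sketched in NumPy. This is a simplified illustration under stated assumptions: the Euclidean norm is used for the distance to the class mean, the hinge loss is taken in its standard form max{0, 1 − d τ(x)}, and β is selected by a grid search rather than by a convex solver. The function names and the toy values are hypothetical.

```python
import numpy as np

def outlier_weights(X):
    """nu_i = lambda_i / max_j lambda_j, with lambda_i = 1 / ||x_i - mean||."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    lam = 1.0 / np.maximum(dist, 1e-12)  # guard against a zero distance
    return lam / lam.max()

def fit_beta(ero, dil, d, grid=None):
    """Pick beta in [0, 1] minimizing the average hinge loss on a grid."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else np.asarray(grid)
    losses = [np.mean(np.maximum(0.0, 1.0 - d * (b * ero + (1.0 - b) * dil)))
              for b in grid]
    return float(grid[int(np.argmin(losses))])

# Weights for one class: patterns far from the class mean get small weights.
P = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0]])
nu = outlier_weights(P)

# Erosion and dilation outputs of two already trained perceptrons (toy values).
ero = np.array([2.0, -3.0])
dil = np.array([3.0, 0.5])
d = np.array([1, -1])
beta = fit_beta(ero, dil, d)
```

In this toy case the dilation alone mis-classifies the negative pattern, so the search moves β toward the erosion until the hinge loss vanishes.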

Reduced Dilation-Erosion Perceptron
As pointed out in the previous section, the DEP classifier is an increasing operator $\varphi : \mathbb{R}^n \to \{-1, +1\}$, where the feature space $\mathbb{R}^n$ is equipped with the component-wise ordering given by (1) while the set of classes {−1, +1} inherits the natural ordering of real numbers. In many practical situations, however, the component-wise ordering is not appropriate for the feature space. Motivated by the developments on multi-valued MM, we propose to circumvent this drawback using reduced orderings. Precisely, we introduce the so-called reduced dilation-erosion perceptron (r-DEP), which is a reduced morphological operator derived from (19).
Formally, let us assume the feature space is a vector-valued non-empty set V and let $C = \{c_1, c_2\}$ be the set of classes. In practice, the feature space V is usually a subset of $\mathbb{R}^n$, but we may consider more abstract feature sets. In addition, let $L = \bar{\mathbb{R}}^r$ and M = {−1, +1} be complete lattices with the component-wise ordering and the natural ordering of real numbers, respectively. Consider the DEP classifier φ : L → M defined by (19) for some $w, m \in \mathbb{R}^r$ and 0 ≤ β ≤ 1. Given a one-to-one mapping σ : C → {−1, +1} and a surjective mapping ρ : V → L, from Definition 4, the mapping $\varphi_r : V \to C$ given by $\varphi_r(x) = \sigma^{-1}\left(\varphi(\rho(x))\right), \ \forall x \in V$, (28) is an r-increasing morphological operator because φ : L → M is increasing and the identity $\sigma\varphi_r = \varphi\rho$ holds true. Most importantly, (28) defines a binary classifier $\varphi_r : V \to C$ called reduced dilation-erosion perceptron (r-DEP). The decision function of the r-DEP classifier is the mapping $\tau_r : V \to \mathbb{R}$ given by $\tau_r(x) = \tau(\rho(x)), \ \forall x \in V$. Note that the r-DEP classifier $\varphi_r$ is obtained from its decision function $\tau_r$ by means of the identity $\varphi_r = \sigma^{-1}\,\mathrm{sgn}\,\tau_r$. Simply put, the decision function $\tau_r$ of an r-DEP is obtained by composing the surjective mapping ρ : V → L and the decision function τ of a DEP, that is, $\tau_r = \tau\rho$. In other words, $\tau_r$ is obtained by applying sequentially the transformations ρ and τ. Thus, given a training set $T = \{(x_i, d_i) : i = 1, \ldots, m\} \subset V \times C$, we simply train a DEP classifier using the transformed training data $T_r = \{(\rho(x_i), \sigma(d_i)) : i = 1, \ldots, m\} \subset L \times M$. (30) Then, the classification of an unknown pattern x ∈ V is achieved by computing $\varphi_r(x) = \sigma^{-1}\,\mathrm{sgn}\,\tau\rho(x)$.
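The composition that defines the r-DEP can be written directly in code. In the sketch below, `rho` is a hypothetical hand-made mapping (not a trained SVC), the weights m, w, and β are arbitrary, and the class names `"neg"` and `"pos"` stand for c₁ and c₂ with σ(c₁) = −1 and σ(c₂) = +1; all of these are assumptions made for illustration.

```python
import numpy as np

def dep_decision(Z, m, w, beta):
    """DEP decision function tau applied row-wise to the transformed data Z."""
    Z = np.atleast_2d(Z)
    return beta * np.min(Z + m, axis=1) + (1.0 - beta) * np.max(Z + w, axis=1)

def rdep_predict(X, rho, m, w, beta, classes):
    """r-DEP: apply rho, then tau, then sgn, then map signs back to classes."""
    signs = np.where(dep_decision(rho(X), m, w, beta) >= 0, 1, -1)
    return np.where(signs == 1, classes[1], classes[0])

def rho(X):
    """Hypothetical mapping from R^2 to R^2 that reverses the first feature."""
    X = np.atleast_2d(X)
    return np.column_stack([-X[:, 0], X[:, 0] + X[:, 1]])

m = np.zeros(2)
w = np.zeros(2)
X = np.array([[-1.0, 3.0],
              [2.0, -3.0]])

predicted = rdep_predict(X, rho, m, w, beta=0.5, classes=("neg", "pos"))
```

Training follows the same pattern: the DEP is fitted on `rho(X)` with labels in {−1, +1}, and only the prediction step needs the inverse label mapping.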
The major challenge for the design of a successful r-DEP classifier is how to determine the surjective mapping $\rho : V \to \mathbb{R}^r$. Intuitively, the mapping ρ performs a kind of dimensionality reduction which takes into account the lattice structure of patterns and labels. In this paper, we propose to determine $\rho : V \to \mathbb{R}^r$ in a supervised manner. Specifically, based on the successful supervised reduced orderings proposed by Velasco-Forero and Angulo [33], we define ρ : V → L using the decision functions of support vector classifiers.
Formally, consider a training set $T = \{(x_i, d_i) : i = 1, \ldots, m\}$. The mapping $\rho : \mathbb{R}^n \to \mathbb{R}^r$ is defined in a component-wise manner by means of the equation $\rho(x) = (\rho_1(x), \rho_2(x), \ldots, \rho_r(x))$, where $\rho_1, \rho_2, \ldots, \rho_r : \mathbb{R}^n \to \mathbb{R}$ are the decision functions of distinct support vector classifiers. Recall that the decision function of a support vector classifier is given by (8). Moreover, the distinct support vector classifiers can be determined using either one of the following approaches, referred to as ensemble and bagging:
• Ensemble: The support vector classifiers are determined using the whole training set T but they have different kernels.
• Bagging: The support vector classifiers have the same kernel and parameters but they are trained using different samples of the training set T .
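The two ways of building ρ can be sketched with scikit-learn as follows. This is a hedged illustration under stated assumptions: the helper names `rho_ensemble` and `rho_bagging` are hypothetical, the data come from `make_moons` rather than from the datasets used in the paper, and r = 2 is chosen only to keep the transformed space two-dimensional.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the paper's datasets).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

def rho_ensemble(X_train, y_train, kernels=("rbf", "linear")):
    """Ensemble: one SVC per kernel, all trained on the whole training set."""
    svcs = [SVC(kernel=k).fit(X_train, y_train) for k in kernels]
    return lambda Z: np.column_stack([s.decision_function(Z) for s in svcs])

def rho_bagging(X_train, y_train, n_estimators=2, random_state=0):
    """Bagging: same kernel and parameters, different samples of the training set."""
    bag = BaggingClassifier(SVC(kernel="rbf"), n_estimators=n_estimators,
                            random_state=random_state).fit(X_train, y_train)
    # Each fitted base SVC contributes one component of rho.
    return lambda Z: np.column_stack([
        est.decision_function(Z[:, feats])
        for est, feats in zip(bag.estimators_, bag.estimators_features_)
    ])

Z_ens = rho_ensemble(X, y)(X)   # transformed patterns, shape (n_samples, 2)
Z_bag = rho_bagging(X, y)(X)    # transformed patterns, shape (n_samples, 2)
```

In both cases each column of the output is the decision function of one support vector classifier, so a DEP classifier can then be trained on the rows of `Z_ens` or `Z_bag`.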
The following examples, based on Ripley's and double-moon datasets, illustrate the transformation provided by these two approaches and address the performance of the r-DEP classifier.
Example 6 (Ripley's Dataset). Consider the synthetic dataset of Ripley [64]. Using a Gaussian radial basis function SVC (RBF SVC) and a linear SVC (Linear SVC), both with the default parameters of Python's scikit-learn API, we determined the reduced mapping ρ from the training data. Figure 3a shows the scatter plot of the transformed training set T_r given by (30). Figure 3a also shows the regions D(−0.59, −1.28) and E(1.00, 0.57) and the decision boundary (black dashed line) of the DEP classifier on the transformed space. In this example, the convex-concave optimization problem given by (21)-(23) and the minimization of the hinge loss (27) yielded m = (1.00, 0.57), w = (−0.59, −1.28), and β = 0.54. Figure 3b shows the decision boundary of the ensemble r-DEP classifier (black) on the original space together with the scatter plot of the original test set. For comparison purposes, Figure 3b also shows the decision boundaries of the RBF SVC (blue), the linear SVC (green), and the hard-voting classifier (red) obtained using the RBF and linear SVCs. Table 1 contains the accuracy score (between 0 and 1) of each of the classifiers on both training and test sets. Note that the greatest accuracy scores on the test set have been achieved by the r-DEP and the RBF SVC classifiers. In particular, the r-DEP classifier outperformed the hard-voting ensemble classifier in this example.
Similarly, we determined the mapping ρ using a bagging of two distinct RBF SVCs trained with different samplings of the original training set. Precisely, we used the default parameters of the bagging classifier (BaggingClassifier) of scikit-learn but with only two estimators (n_estimators=2) for a visual interpretation of the transformed data. Figure 3c shows the scatter plot of the training data along with the regions D(−0.8, −0.5) and E(1.09, 0.90). In this example, the optimization problem (27) yielded β = 0.76. Figure 3d shows the scatter plot of the original data and the decision boundaries of the classifiers: bagging r-DEP (black), RBF SVC 1 (blue), RBF SVC 2 (green), and the bagging of the two RBF SVCs (red). Table 1 contains the accuracy score of these four classifiers on both training and test data. Although the RBF SVC 1 yielded the greatest accuracy score on the test set, the bagging r-DEP produced the largest accuracy on the training set. In general, however, the four classifiers are competitive.

Example 7 (Double-Moon). In analogy to the previous example, we also evaluated the performance of the r-DEP classifier on the double-moon problem presented in Example 5. Figure 4a,c show the transformed training set obtained from the mappings determined using the ensemble and bagging strategies, respectively. We considered again a Gaussian RBF and a linear SVC in the ensemble strategy and two Gaussian RBF SVCs for the bagging. In addition, we adopted the default parameters of Python's scikit-learn API except for the number of estimators in the bagging strategy, which we set to two (n_estimators=2) for a visual interpretation of the transformed data. Figure 4b shows the scatter plot of the original test set with the decision boundaries of the ensemble r-DEP (black), RBF SVC (blue), linear SVC (green), and the hard-voting ensemble classifier (red).
Similarly, Figure 4d shows the test data with the decision boundaries of the bagging r-DEP (black), RBF SVC 1 (blue), RBF SVC 2 (green), and the bagging classifier (red). Table 2 lists the accuracy score (between 0 and 1) of all the classifiers on both training and test sets of the double-moon problem. As expected, the linear SVC yielded the worst performance. The largest scores have been achieved by both ensemble and bagging r-DEP as well as the Gaussian RBF SVCs and their bagging. We would like to point out that, in contrast to the original DEP classifier, the performance of the r-DEP model remains high if we change the pattern labels. In the following section we provide more conclusive computational experiments concerning the performance of r-DEP for binary classification.
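The baseline classifiers compared against the r-DEP in Examples 6 and 7 can be set up along the following lines. This is only a sketch under assumptions: the data are generated with scikit-learn's `make_moons`, which may differ from the double-moon problem of Example 5, the linear SVC is taken to be `SVC(kernel="linear")`, and the split sizes and random seeds are arbitrary.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in two-moons data (assumption: not necessarily the paper's generator).
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

baselines = {
    "RBF SVC": SVC(kernel="rbf"),
    "Linear SVC": SVC(kernel="linear"),
    # Hard voting: majority vote over the predicted labels of the two SVCs.
    "Hard-voting ensemble": VotingClassifier(
        [("rbf", SVC(kernel="rbf")), ("linear", SVC(kernel="linear"))],
        voting="hard"),
    # Two estimators only, mirroring n_estimators=2 in the examples.
    "Bagging of RBF SVCs": BaggingClassifier(
        SVC(kernel="rbf"), n_estimators=2, random_state=0),
}

# Accuracy score (between 0 and 1) of each baseline on the test set.
scores = {name: clf.fit(X_train, y_train).score(X_test, y_test)
          for name, clf in baselines.items()}
```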

Computational Experiments
Let us now provide extensive computational experiments to evaluate the performance of the ensemble and bagging r-DEP classifiers. In the ensemble strategy, the mapping ρ is obtained by considering an RBF SVC, a linear SVC, and a polynomial SVC. The bagging strategy consists of 10 RBF SVCs where each base estimator has been trained using a sampling of the original training set with replacement. Let us also compare the new r-DEP classifiers with the original DEP classifier, the linear SVC, the RBF SVC, and the polynomial SVC (poly SVC), as well as an ensemble of the three SVCs and a bagging of RBF SVCs. We would like to point out that we used the default parameters of Python's scikit-learn API in our computational experiments [65,66].
We considered a total of 30 binary classification problems from the OpenML repository available at https://www.openml.org/ [48]. We would like to point out that most datasets we considered are also available at the well-known UCI machine learning repository [50]. We used the OpenML repository because all the datasets can be accessed by means of the command fetch_openml from Python's scikit-learn [65]. Moreover, we handled missing data using the SimpleImputer command, also from scikit-learn. Table 3 lists the 30 datasets considered. Table 3 also includes the number of instances (#instances), the number of features (#features), the percentage of the negative and positive patterns, denoted by the pair (N%, P%), and the OpenML name/version.
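As a rough illustration of the data access and imputation steps, the sketch below wraps `fetch_openml` in a hypothetical helper `load_openml_dataset` (not called here, since it needs network access) and applies `SimpleImputer` to a tiny hand-made array. The imputation strategy is an assumption: the text above does not state which one was used.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer

def load_openml_dataset(name, version):
    """Hypothetical helper: fetch a dataset by its OpenML name and version."""
    return fetch_openml(name=name, version=version,
                        return_X_y=True, as_frame=False)

# Offline illustration of the imputation step on an array with one missing value.
X_raw = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, 30.0]])
imputer = SimpleImputer(strategy="mean")   # assumed strategy (scikit-learn default)
X_filled = imputer.fit_transform(X_raw)    # missing entry replaced by the column mean
```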
Note that the number of samples ranges from 200 (Arcene) to 14,980 (Egg-Eye-State) while the number of features varies from 2 (Banana) to 10,000 (Arcene). Furthermore, some datasets such as Sick and Thoracic Surgery are extremely unbalanced. Therefore, we used the balanced accuracy score, which ranges from 0 to 1, to measure the performance of a classifier [67]. Table 4 contains the mean and standard deviation of the balanced accuracy score obtained using a stratified 10-fold cross-validation. The largest mean score for each dataset is typeset in boldface.
We would like to point out that, to avoid biases, we used the same training and test partition for all the classifiers. In addition, we pre-processed the data using the command StandardScaler from scikit-learn, that is, we computed the mean and the standard deviation of each feature on the training set and normalized both training and test sets using the obtained values. The StandardScaler transformation has also been applied to the output of the ρ mapping. The source code of the computational experiment is available at https://github.com/mevalle/r-DEP-Classifier.
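The evaluation protocol described above (standardization fitted on the training folds only, stratified 10-fold cross-validation, balanced accuracy) can be sketched with scikit-learn as follows. This is an assumption-laden illustration: the data are synthetic and imbalanced by construction, an RBF SVC stands in for the classifiers under comparison, and the shuffling seed is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced binary problem (assumption: stand-in for an OpenML dataset).
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

# A fixed splitter gives every classifier the same training and test partitions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Inside the pipeline, the scaler is fitted on each training fold only
# and then applied to the corresponding test fold.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
mean_score, std_score = scores.mean(), scores.std()
```

Reusing the same `cv` object for each classifier is one simple way to keep the partitions identical across methods.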
From Table 4, the largest averages of the balanced accuracy scores have been achieved by the ensemble and bagging r-DEP classifiers. Using paired Student's t-test with confidence level at 99%, we confirmed that the ensemble and bagging r-DEP, in general, performed better than the other classifiers. In fact, Figure 5 shows the Hasse diagram of the outcome of paired hypothesis tests [68,69]. Specifically, an edge in this diagram means that the hypothesis test discarded the null hypothesis that the classifier on the top yielded a balanced accuracy score less than or equal to that of the classifier on the bottom. For example, Student's t-test discarded the null hypothesis that the ensemble r-DEP classifier performs as well as or worse than the hard-voting ensemble of SVCs. In other words, the ensemble r-DEP statistically outperformed the ensemble of SVCs. In summary, in Figure 5, the method on the top of an edge statistically outperformed the method on the bottom. The outcome of the computational experiment is also summarized in the boxplot shown in Figure 6. The boxplot confirms that the ensemble and bagging r-DEP classifiers yielded, in general, the largest balanced accuracy scores. This boxplot also reveals the poor performance of the DEP classifier, which presupposes that the positive samples are, in general, greater than or equal to the negative samples according to the component-wise ordering. In particular, the three points above the box of the DEP classifier correspond to the average balanced accuracy score values 0.90, 0.88, and 0.77 obtained from the datasets Ionosphere, Breast Cancer Wisconsin, and Internet Advertisement, respectively. It turns out, however, that the ensemble and bagging r-DEP classifiers outperformed the original DEP model even on these three datasets. This remark confirms the important role of the transformations ρ and σ for successful applications of increasing lattice-based models.
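A one-sided paired t-test of this kind can be sketched with SciPy as shown below. The scores are synthetic placeholders generated only to make the example runnable, not results from Table 4, and pairing the tests over 30 per-dataset mean scores is an assumption about the setup. A confidence level of 99% corresponds to rejecting the null hypothesis when the p-value is below 0.01.

```python
import numpy as np
from scipy.stats import ttest_rel

# Synthetic placeholder scores for two classifiers on 30 datasets
# (illustrative only; these are not the values reported in Table 4).
rng = np.random.default_rng(0)
scores_b = rng.uniform(0.6, 0.9, size=30)
scores_a = scores_b + 0.03 + rng.normal(0.0, 0.01, size=30)

# Null hypothesis: classifier A scores less than or equal to classifier B.
# alternative="greater" tests whether the mean paired difference A - B is positive.
result = ttest_rel(scores_a, scores_b, alternative="greater")

# Reject the null hypothesis at the 99% confidence level.
reject_null = result.pvalue < 0.01
```

Each rejected null hypothesis would correspond to one edge of a Hasse diagram such as the one in Figure 5, with classifier A drawn above classifier B.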

Concluding Remarks
In analogy to Rosenblatt's perceptron, the morphological perceptron introduced by Ritter and Sussner can be applied for binary classification [35]. In contrast to the traditional perceptron, however, the usual algebra is replaced by lattice-based operations in the morphological perceptron models. Specifically, the erosion-based and the dilation-based morphological perceptrons compute respectively an erosion ε_m and a dilation δ_w given by (3) followed by the application of the sign function. The erosion-based and dilation-based morphological perceptrons focus respectively on the positive and negative classes. A graceful balance between the two morphological perceptrons is provided by the dilation-erosion perceptron (DEP) classifier, whose decision function given by (18) is nothing but a convex combination of an erosion ε_m and a dilation δ_w [40].
In this paper, we propose to train a DEP classifier in two steps. First, based on the work of Charisopoulos and Maragos [46], the synaptic weights m and w of ε_m and δ_w are determined by solving two independent convex-concave optimization problems given by (21)-(23) [47]. Subsequently, the parameter β is determined by minimizing the hinge loss given by (27).
Despite its elegant formulation, as a lattice-based model the DEP classifier presupposes that both feature and class spaces are partially ordered sets. The feature patterns, in particular, are ranked according to the component-wise ordering given by (1). Furthermore, the DEP classifier is an increasing operator. Therefore, it implicitly assumes a relationship between the orderings of features and classes. In many practical situations, however, the component-wise ordering is not appropriate for ranking features. Using results from multi-valued mathematical morphology, in this paper we introduced the reduced dilation-erosion perceptron (r-DEP) classifier. The r-DEP classifier corresponds to the r-increasing morphological operator derived from the DEP classifier φ by means of (28) using a one-to-one correspondence σ between the set of classes C and {−1, +1} and a surjective mapping ρ from the feature space V to R̄^r. Finding an appropriate transformation mapping ρ is the major challenge in the design of an r-DEP classifier.
Inspired by the supervised reduced ordering proposed by Velasco-Forero and Angulo [33], we defined the transformation mapping ρ using the decision functions of either an ensemble of SVCs with different kernels or a bagging of a base SVC trained using different samples of the original training set. The source code of the ensemble and bagging r-DEP classifiers is available at https://github.com/mevalle/r-DEP-Classifier. Both ensemble and bagging r-DEP classifiers yielded the highest average of the balanced accuracy score among SVCs, their ensemble, and bagging of RBF SVCs, on 30 binary classification problems from the OpenML repository. Furthermore, paired Student's t-test with confidence level at 99% confirmed that the bagging r-DEP classifier outperformed the individual SVCs as well as their ensemble in our computational experiment. The outcome of the computational experiment shows the potential application of the ensemble and bagging r-DEP classifiers to practical pattern recognition problems including, but not limited to, credit card fraud detection and medical diagnosis. Moreover, although we only focused on binary classification, multi-class problems can be addressed using one-against-one or one-against-all strategies available, for instance, in the scikit-learn API.
In the future, we plan to investigate further the approaches used to determine the mapping ρ. We also intend to study in detail the optimization problem used to train an r-DEP classifier.

Figure 1. The purple and the yellow regions correspond respectively to the sets E(1, 2.25) and D(2, 1). The piece-wise linear curve corresponds to the decision boundary of the DEP classifier with β = 0.2.

Figure 2. Performance of the dilation-erosion perceptron (DEP) classifier on Ripley's and double-moon datasets described in Examples 4 and 5. Scatter plot of test data and the decision boundary of the DEP classifier. The purple and the yellow regions correspond respectively to the sets D(w) and E(m).

Figure 3. Performance of ensemble and bagging reduced dilation-erosion perceptron (r-DEP) classifiers on Ripley's dataset (see Example 6). (a) and (c) show the scatter plot of transformed training data, the regions D(w) and E(m), and the decision boundary of the DEP classifier. (b) and (d) depict the scatter plot of the original test data and the decision boundary of r-DEP (black) and other binary classifiers.

Figure 4. Performance of the ensemble and bagging r-DEP classifiers on the double-moon dataset described in Example 7. (a) and (c) show the scatter plot of transformed training data, the regions D(w) and E(m), and the decision boundary of the DEP classifier. (b) and (d) depict the scatter plot of the original test data and the decision boundary of r-DEP (black) and other binary classifiers.

Figure 6. Boxplot summarizing the average balanced accuracy scores provided in Table 4. In general, the ensemble and bagging r-DEP classifiers yielded the largest balanced accuracy scores.

Table 1. Accuracy score of the classifiers considered in Example 6 on Ripley's dataset.

Table 2. Accuracy score of the classifiers considered in Example 7 on the double-moon problem.

Table 3. Information on the considered datasets.
Figure 5. Hasse diagram of paired Student's t-test with confidence level at 99%.