Abstract
This paper describes a region-oriented method of pattern classification based on the Cartesian system model (CSM), a mathematical model that allows manipulating mixed feature-type symbolic data. We use supervised hierarchical conceptual clustering to generate class regions for each pattern class, based on evaluating, at each clustering step, the generality of the regions and their separability from the other classes. The method easily finds robustly informative features that describe each pattern class against the other pattern classes. Some examples show the effectiveness of the proposed method.
1. Introduction
Feature selection has been extensively developed in pattern recognition, data mining [1,2,3], and, more generally, symbolic data analysis [4]. Feature selection has two main problems: how to evaluate the importance of the given features and how to find an important feature subset from the given set of features. As summarized in [1,2,3,5], many feature selection methods have been developed by combining various feature evaluation measures (distance measure, information measure, dependency measure, consistency measure, and classification error rate) and search strategies (heuristic, complete, and random).
The authors reported an unsupervised feature selection method using the hierarchical conceptual clustering for histogram-valued symbolic data [6]. In this method, each object is described by a series of quantile vectors, and the concept size, called the compactness, is used as the similarity measure between objects and/or clusters represented by the quantile vectors. It should be noted that the compactness also plays the roles of cluster quality criterion and feature-effectiveness criterion. This fact simplifies the process of unsupervised feature selection.
As a pattern classifier for mixed feature-type symbolic data, the authors proposed a region-oriented method based on the Cartesian system model (CSM) [7], where they pointed out the trade-off between the "separability" of classes and the "generality" of class descriptions and asserted the importance of feature selection.
The purpose of this paper is to realize a feature selection method for our region-oriented classifier using the supervised hierarchical conceptual clustering.
In Section 2, the CSM is defined by the triplet (D(d), ⊞, ⊠), where D(d) is the feature space composed of d mixed-type features. We define the Cartesian join operator ⊞ and the Cartesian meet operator ⊠ on the feature space. The Cartesian join generates a generalized description of the given two descriptions in the feature space. On the other hand, the Cartesian meet extracts a common description from the given two descriptions in the feature space. We define the measure of compactness with respect to the Cartesian join in order to evaluate the generality of cluster descriptions in each clustering step. Subsection 2.2 describes the algorithm of our unsupervised hierarchical conceptual clustering using the compactness. Subsection 2.3 describes our classification model based on the supervised hierarchical conceptual clustering. This clustering algorithm checks, in each clustering step, not only the generality of the class description but also the separability of the class description against other pattern classes using the Cartesian meet operator.
2. Cartesian System Model (CSM) and Feature Selection Using Hierarchical Conceptual Clustering
In this section, we describe the Cartesian system model, a mathematical model for mixed feature-type symbolic data. We define a measure to evaluate the compactness of the generated concepts, and we describe an algorithm of the unsupervised hierarchical conceptual clustering. Then, we describe the classification model based on the supervised hierarchical conceptual clustering.
2.1. The Cartesian System Model (CSM)
Let Dk be the domain of feature Fk, k =1, 2, …, d. Then, the feature space is defined by
D(d) = D1 × D2 × ⋯ × Dd. (1)
Since we permit the simultaneous use of various feature types, we use the notation D(d) for the feature space in order to distinguish it from usual d-dimensional Euclidean space Dd. Each element of D(d) is represented by
E = E1 × E2 × ⋯ × Ed or E = (E1, E2, …, Ed), (2)
where Ek is the feature value taken by the feature Fk, k = 1, 2, …, d. We are able to use the following feature types:
(1) Continuous quantitative feature (e.g., height and weight);
(2) Discrete quantitative feature (e.g., the number of family members);
(3) Ordinal qualitative feature (e.g., academic career, where there is some kind of ordered relationship between values);
(4) Nominal qualitative feature (e.g., gender and blood type).
When we use feature types (1), (2), and (3), we permit interval values of the form Ek = [a, b], and in the case of feature type (4), we allow for finite sets as feature values. The Cartesian product (2) described in terms of feature types (1)–(4) is called an event in the feature space D(d).
2.1.1. The Cartesian Join Operator
The Cartesian join, A⊞B, of a pair of events A = (A1, A2, …, Ad) and B = (B1, B2, …, Bd) in the feature space D(d), is defined by
A⊞B = [A1⊞B1] × [A2⊞B2] × ⋯ × [Ad⊞Bd], (3)
where [Ak⊞Bk] is the Cartesian join of feature values Ak and Bk for feature Fk and is defined as follows.
When Fk is a quantitative or an ordinal qualitative feature, let Ak = [AkL, AkU] and Bk = [BkL, BkU], then [Ak⊞Bk] is the closed interval given by
[Ak⊞Bk] = [min(AkL, BkL), max(AkU, BkU)]. (4)
When Fk is a nominal feature, [Ak⊞Bk] is the union:
[Ak⊞Bk] = Ak ∪ Bk. (5)
Figure 1a illustrates the Cartesian join of two interval-valued events A and B in the Euclidean plane.
Figure 1.
The Cartesian join and the Cartesian meet in the Euclidean plane.
2.1.2. The Cartesian Meet Operator
The Cartesian meet A⊠B of a pair of events A = (A1, A2, …, Ad) and B = (B1, B2, …, Bd) in the feature space D(d) is defined by
A⊠B = [A1⊠B1] × [A2⊠B2] × ⋯ × [Ad⊠Bd], (6)
where [Ak⊠Bk] is the Cartesian meet of feature values Ak and Bk for feature Fk defined by the intersection:
[Ak⊠Bk] = Ak ∩ Bk. (7)
When the intersection (7) takes the empty value ∅ for at least one feature, the events A and B have no common part. We denote this fact by
A⊠B = ∅, (8)
and we say that A and B are completely distinguishable. Figure 1b illustrates the Cartesian meet of two interval-valued events, A and B. We call the triplet (D(d), ⊞, ⊠) the Cartesian System Model (CSM).
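As an illustration only, the following Python sketch implements the feature-wise Cartesian join and meet for mixed-type events; the dict-based encoding (intervals as tuples, nominal values as frozensets) and the example values are our own assumptions, not part of the model definition.

```python
def join_feature(a, b):
    """Cartesian join of two feature values: interval hull or set union."""
    if isinstance(a, tuple):                       # quantitative / ordinal feature
        return (min(a[0], b[0]), max(a[1], b[1]))
    return frozenset(a) | frozenset(b)             # nominal feature

def meet_feature(a, b):
    """Cartesian meet of two feature values; None stands for the empty value."""
    if isinstance(a, tuple):
        lo, hi = max(a[0], b[0]), min(a[1], b[1])
        return (lo, hi) if lo <= hi else None
    common = frozenset(a) & frozenset(b)
    return common if common else None

def cartesian_join(A, B):
    return {f: join_feature(A[f], B[f]) for f in A}

def cartesian_meet(A, B):
    """A and B are completely distinguishable (A ⊠ B = ∅) if any feature-wise meet is empty."""
    out = {f: meet_feature(A[f], B[f]) for f in A}
    return None if any(v is None for v in out.values()) else out

# Two illustrative interval-valued events in the plane (values are our own).
A = {"F1": (0.0, 2.0), "F2": (1.0, 3.0)}
B = {"F1": (3.0, 5.0), "F2": (0.0, 2.0)}
print(cartesian_join(A, B))   # {'F1': (0.0, 5.0), 'F2': (0.0, 3.0)}
print(cartesian_meet(A, B))   # None: the intervals on F1 do not overlap, so A ⊠ B = ∅
```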
2.1.3. Concept Size
Let U = {ω1, ω2, …, ωN} be the given set of sample objects without class labels, and let each sample object ωk be described by an event Ek = Ek1 × Ek2 × ⋯ × Ekd in D(d). We define the concept size P(Ekj) of ωk in terms of feature Fj as the probability:
P(Ekj) = |Ekj|/|Dj|, j = 1, 2, …, d; k = 1, 2, …, N, and (9)
0 ≤ P(Ekj) ≤ 1, j = 1, 2, …, d; k = 1, 2, …, N, (10)
where |Dj| is the length of the interval Dj, if feature Fj is type (1), (2), or (3) in Section 2.1, and is the number of elements in the set Dj if feature Fj is type (4). Then, we define the concept size P(Ek) of ωk in the feature space D(d) by the arithmetic mean:
P(Ek) = {P(Ek1) + P(Ek2) + ⋯ + P(Ekd)}/d, and (11)
0 ≤ P(Ek) ≤ 1, k = 1, 2, …, N. (12)
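A minimal sketch of the concept size (9)–(11), using the same dict-based event encoding as in the sketch above; the feature domains and example values below are illustrative assumptions.

```python
def concept_size(E, domains):
    """Concept size P(E): arithmetic mean over features of |Ej| / |Dj|."""
    sizes = []
    for f, value in E.items():
        dom = domains[f]
        if isinstance(value, tuple):               # interval-valued feature
            sizes.append((value[1] - value[0]) / (dom[1] - dom[0]))
        else:                                      # nominal feature (set-valued)
            sizes.append(len(value) / len(dom))
    return sum(sizes) / len(sizes)

# Illustrative domains: one interval feature and one nominal feature.
domains = {"F1": (0.0, 10.0), "F4": {"A", "B", "O", "AB"}}
E = {"F1": (2.0, 4.0), "F4": frozenset({"A", "O"})}
print(concept_size(E, domains))   # (0.2 + 0.5) / 2 = 0.35
```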
2.2. Measure of Compactness and Unsupervised Hierarchical Conceptual Clustering
2.2.1. Measure of Compactness
In the CSM, each sample object ωp is described as an event Ep which is equivalent to a conjunctive logical expression and is regarded as a minimum unit of concept described by d features. Now, we define the measure of compactness for the concept generated by two sample objects ωp and ωq, written C(ωp, ωq), as follows:
C(ωp, ωq) = P(Ep⊞Eq) = {P(Ep1⊞Eq1) + P(Ep2⊞Eq2) + ⋯ + P(Epd⊞Eqd)}/d. (13)
Since the Cartesian join Ep⊞Eq generates the smallest description spanned by the given two events Ep and Eq, the compactness C(ωp, ωq) evaluates the quantitative size of the generated “concept”. The compactness satisfies the following properties:
(1) 0 ≤ C(ωp, ωq) ≤ 1;
(2) C(ωp, ωp) = P(Ep) ≥ 0;
(3) C(ωp, ωq) = C(ωq, ωp);
(4) C(ωp, ωq) = 0 iff Ep ≡ Eq and the event has null concept size (P(Ep) = 0);
(5) C(ωp, ωp), C(ωq, ωq) ≤ C(ωp, ωq);
(6)
Figure 2. A counterexample of property (6).
2.2.2. Unsupervised Hierarchical Conceptual Clustering
Let U = {ω1, ω2, …, ωN} be the given set of sample objects, and let each ωk be described by an event Ek = Ek1 × Ek2 × ⋯ × Ekd in the feature space D(d). By using the measure of compactness defined in (13), we can construct the following algorithm [6].
Algorithm of the unsupervised hierarchical conceptual clustering:
Step (1): For each pair of sample objects ω and ω’ in U, calculate the compactness C(ω, ω’) in (13) and find the pair ωp and ωq that has the minimum compactness.
Step (2): Generate the merged concept ωpq of ωp and ωq in U, and replace ωp and ωq in U by the new concept ωpq, where ωpq is described by the Cartesian join Epq = Ep⊞Eq in the feature space D(d).
Step (3): Repeat Steps (1) and (2) until U includes only one concept (i.e., the whole concept generated by the given N sample objects). (End of Algorithm)
In the above algorithm, Step 1 finds the most similar sample objects and/or sub-concepts that should be merged into a single concept, and then Step 2 characterizes a new extensional concept by the Cartesian join region, which is equivalent to a conjunctive logical expression. Since minimizing the concept size of the Cartesian join region means to maximize the dissimilarity of the join region from the whole region D(d), the compactness plays the role of the cluster quality criterion. We should also note the fact that through the steps of agglomeration process, we can know not only the sizes of sub-concepts but also which features create the differences between sub-concepts. Therefore, the compactness also plays the role of the feature-effectiveness criterion.
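For concreteness, the following sketch mirrors the algorithm above for interval-valued events only; the helper functions, toy objects, and domains are illustrative assumptions rather than the fats and oils data.

```python
from itertools import combinations

def join(A, B):
    """Feature-wise interval hull (Cartesian join for interval features)."""
    return {f: (min(A[f][0], B[f][0]), max(A[f][1], B[f][1])) for f in A}

def size(E, dom):
    """Concept size of an interval-valued event."""
    return sum((hi - lo) / (dom[f][1] - dom[f][0]) for f, (lo, hi) in E.items()) / len(E)

def hcc(objects, dom):
    """Merge, at every step, the pair whose Cartesian join has the smallest concept size."""
    clusters = dict(objects)
    history = []
    while len(clusters) > 1:
        (p, q), c = min((((p, q), size(join(clusters[p], clusters[q]), dom))
                         for p, q in combinations(clusters, 2)),
                        key=lambda item: item[1])
        clusters[(p, q)] = join(clusters.pop(p), clusters.pop(q))
        history.append(((p, q), round(c, 3)))
    return history

# Tiny illustrative data set (not the fats and oils data).
dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
objs = {"a": {"F1": (0.10, 0.20), "F2": (0.10, 0.20)},
        "b": {"F1": (0.15, 0.25), "F2": (0.20, 0.30)},
        "c": {"F1": (0.80, 0.90), "F2": (0.70, 0.90)}}
print(hcc(objs, dom))   # [(('a', 'b'), 0.175), ((('a', 'b'), 'c'), 0.8)]
```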
As an illustrative example, we apply our unsupervised hierarchical conceptual clustering to the data in Table 1, where each sample is described by five features:
(1) Specific gravity,
(2) Freezing point (°C),
(3) Iodine value,
(4) Saponification value,
(5) Major acids.
Figure 3 shows the results for two selected informative features: iodine value and specific gravity. We have three explicit clusters: linseed and perilla; cotton, sesame, olive, and camellia; and beef and hog. These three clusters take similar concept-size values for the selected two features.
Figure 3.
The results of the fats and oils data for iodine value (I) and specific gravity (S).
Table 1.
Fats and oils data [6].
| Sample | Specific G. | Freezing P. | Iodine V. | Saponification V. | Major Acids |
|---|---|---|---|---|---|
| Linseed | [0.930, 0.935] | [−27, −18] | [170, 204] | [118, 196] | [1.75, 4.81] |
| Perilla | [0.930, 0.937] | [−5, −4] | [192, 208] | [188, 197] | [0.77, 4.85] |
| Cotton | [0.916, 0.918] | [−6, −1] | [99, 113] | [189, 198] | [0.42, 3.84] |
| Sesame | [0.920, 0.926] | [−6,−4] | [104, 116] | [187, 193] | [0.91, 3.77] |
| Camellia | [0.916, 0.917] | [−21, −15] | [80, 82] | [189, 193] | [2.00, 2.98] |
| Olive | [0.914, 0.919] | [0, 6] | [79, 90] | [187, 196] | [0.83, 4.02] |
| Beef | [0.860, 0.870] | [30, 38] | [40, 48] | [190, 199] | [0.31, 2.89] |
| Hog | [0.858, 0.864] | [22, 32] | [53, 77] | [190, 202] | [0.37, 3.65] |
2.3. Classification Model Based on the Supervised Conceptual Clustering
In this subsection, we describe a classification model, and we use the term “sample pattern” when referring to a sample object.
2.3.1. Classification Model
Let C1, C2, …, CM be M pattern classes and, for each pattern class Ck, let
Uk = {ωk1, ωk2, …, ωkNk}, k = 1, 2, …, M, (14)
be the given set of Nk sample patterns for class Ck. Let each sample pattern ωkj in class Ck be represented in the d-dimensional feature space D(d) as follows:
Ekj = Ekj1 × Ekj2 × ⋯ × Ekjd, j = 1, 2, …, Nk; k = 1, 2, …, M. (15)
From the consistency viewpoint, we assume that, for any pair of p and q (≠ p), Up and Uq have no common sample patterns:
Up ∩ Uq = ∅. (16)
This condition, however, does not mean that two pattern classes share the disjoint regions in the feature space D(d). For example, in Figure 4, two pattern classes share an overlapped region O, but the training sets do not have any common sample pattern.
Figure 4.
Illustration for condition (16).
We use the term subclass for a subset of sample patterns from a pattern class. A subclass of a pattern class is described by a closed region in the feature space D(d). A closed region for a subclass is reduced to a hyper box when the feature space D(d) is a Euclidean space. Moreover, a sample pattern given as a point in the feature space D(d) is also regarded as a closed region that describes a subclass.
Now, let Ωkp and Rkp, p =1, 2, …, mk (≤Nk), be mk subsets of the training set Uk, i.e., subclasses of class Ck, and their representation in the feature space D(d), respectively, such that
Ekp ⊆ (Rk1 ∪ Rk2 ∪ ⋯ ∪ Rkmk), p = 1, 2, …, Nk, k = 1, 2, …, M; and (17)
Ekp⊠Rjq = ∅, p = 1, 2, …, Nk, q = 1, 2, …, mj, j = 1, 2, …, M (j ≠ k), (18)
where ∅ denotes that two descriptions Ekp and Rjq are completely distinguishable. The existence of subclasses satisfying (17) and (18) is guaranteed by the consistency condition (16). In fact, if we set Ωkp = {ωkp} and Rkp = Ekp, p = 1, 2, …, Nk, for each pattern class Ck, these reduced subclasses satisfy conditions (17) and (18).
In the fats and oils dataset, we assume two pattern classes: plant oils and fats. If we take the Cartesian join of all plant oils and fats, respectively, with respect to each feature, we have the result in Table 2. The two pattern classes are completely disjoint with respect to the first three features: specific gravity, freezing point, and iodine value. In unsupervised feature selection, the most informative features throughout the clustering process are specific gravity and iodine value. However, the results of Table 2 suggest that freezing point is more informative than iodine value in the classification between plant oils and fats. This is why we need to check the separability of a subclass from other pattern classes, in addition to checking its generality at each clustering step.
Table 2.
Descriptions of two pattern classes for the fats and oils data.
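The Table 2 computation can be reproduced with a short sketch: take the feature-wise Cartesian join over each class and test which features make the classes disjoint. The split of the Table 1 samples into "plant oils" and "fats" follows the sample names and is our assumption; only the first three features are encoded here.

```python
def join_all(rows):
    """Feature-wise interval hull over a list of interval-valued events."""
    return {f: (min(r[f][0] for r in rows), max(r[f][1] for r in rows)) for f in rows[0]}

def disjoint_features(A, B):
    """Features on which the intervals of A and B do not intersect (empty meet)."""
    return [f for f in A if max(A[f][0], B[f][0]) > min(A[f][1], B[f][1])]

# Table 1, first three features: specific gravity (SG), freezing point (FP), iodine value (IV).
plant_oils = [
    {"SG": (0.930, 0.935), "FP": (-27, -18), "IV": (170, 204)},   # linseed
    {"SG": (0.930, 0.937), "FP": (-5, -4),   "IV": (192, 208)},   # perilla
    {"SG": (0.916, 0.918), "FP": (-6, -1),   "IV": (99, 113)},    # cotton
    {"SG": (0.920, 0.926), "FP": (-6, -4),   "IV": (104, 116)},   # sesame
    {"SG": (0.916, 0.917), "FP": (-21, -15), "IV": (80, 82)},     # camellia
    {"SG": (0.914, 0.919), "FP": (0, 6),     "IV": (79, 90)},     # olive
]
fats = [
    {"SG": (0.860, 0.870), "FP": (30, 38), "IV": (40, 48)},       # beef
    {"SG": (0.858, 0.864), "FP": (22, 32), "IV": (53, 77)},       # hog
]
print(disjoint_features(join_all(plant_oils), join_all(fats)))
# ['SG', 'FP', 'IV']: the two classes are separated on all three features
```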
We define an asymmetric similarity measure from a pattern ω described by E = E1 × E2 × ⋯ × Ed in D(d) to a subclass Ωkq described by Rkq= Rkq1 × Rkq2 × ⋯ × Rkqd in D(d) as follows:
S(ω→Ωkq) = {P(Rkq1)/P(E1⊞Rkq1) + P(Rkq2)/P(E2⊞Rkq2) + ⋯ + P(Rkqd)/P(Ed⊞Rkqd)}/d, (19)
where P(*) is the concept size defined in (11).
This asymmetric similarity measure satisfies the following properties:
(1) 0 ≤ S(ω→Ωkq) ≤ 1;
(2) S(ω→Ωkq) = S(Ωkq→ω) iff ω = Ωkq;
(3) E ⊆ Rkq implies S(ω→Ωkq) = 1; and
(4) S(Ωkp→Ωkq) ≤ S(Ωkq→Ωkp) iff P(Rkq) ≤ P(Rkp).
If the description E in Figure 5a is completely included in the region Rkq, Property (3) becomes valid. Figure 5b illustrates Property (4).
Figure 5.
Illustrations for the properties of the asymmetric similarity measure.
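A sketch of the asymmetric similarity (19), restricted to interval features for brevity; the domains and the two regions below are illustrative assumptions.

```python
def similarity(E, R, dom):
    """S(ω→Ω) of (19): average over features of P(Rj) / P(Ej ⊞ Rj), interval features only."""
    total = 0.0
    for f in E:
        d = dom[f][1] - dom[f][0]
        p_r = (R[f][1] - R[f][0]) / d                                  # P(Rj)
        p_join = (max(E[f][1], R[f][1]) - min(E[f][0], R[f][0])) / d   # P(Ej ⊞ Rj)
        total += p_r / p_join
    return total / len(E)

dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
small = {"F1": (0.30, 0.40), "F2": (0.40, 0.50)}
large = {"F1": (0.20, 0.60), "F2": (0.30, 0.70)}    # small is contained in large
print(similarity(small, large, dom))   # 1.0: Property (3), E ⊆ R gives similarity one
print(similarity(large, small, dom))   # 0.25: the measure is asymmetric (cf. Property (4))
```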
2.3.2. Classification Rule
Let a given pattern ω be described by E in the feature space D(d). The given pattern ω is determined to come from class Ck if there exists a sub-concept Ωkq for which the similarity S(ω→Ωkq) is the largest.
It is clear that our basic problem is how to generate a sufficiently small number of subclasses that satisfy conditions (17) and (18) for each pattern class. However, it should be noted that, from Property (4) of our similarity measure, a description R with a larger concept size in the feature space asserts a greater similarity to the description E of the given pattern ω (see Figure 6), while the “generality” of sub-concept Ω (i.e., the “number of sample patterns” included in Ω) corresponding to R is not directly related to the concept size of R. For this reason, we use the “compactness” to evaluate the description of a subclass in the feature space. We also use the “mutual neighborhood concept” [7] in order to assure the separability between pattern classes.
Figure 6.
Illustration for the relation S(ω→Ωkp) ≤ S(ω→Ωkq).
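The classification rule can be sketched as an argmax of the similarity over all subclasses of all classes. The subclass regions below imitate an X-or layout and, like the restated similarity helper, are illustrative assumptions.

```python
def similarity(E, R, dom):
    """Average over features of P(Rj) / P(Ej ⊞ Rj), interval features only."""
    ratios = []
    for f in E:
        d = dom[f][1] - dom[f][0]
        p_r = (R[f][1] - R[f][0]) / d
        p_join = (max(E[f][1], R[f][1]) - min(E[f][0], R[f][0])) / d
        ratios.append(p_r / p_join)
    return sum(ratios) / len(ratios)

def classify(E, subclasses, dom):
    """Assign E to the class owning the subclass with the largest similarity."""
    best = max(subclasses, key=lambda key: similarity(E, subclasses[key], dom))
    return best[0]

dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
subclasses = {("C1", 1): {"F1": (0.0, 0.4), "F2": (0.5, 1.0)},
              ("C1", 2): {"F1": (0.5, 1.0), "F2": (0.0, 0.4)},
              ("C2", 1): {"F1": (0.0, 0.4), "F2": (0.0, 0.4)},
              ("C2", 2): {"F1": (0.5, 1.0), "F2": (0.5, 1.0)}}
omega = {"F1": (0.1, 0.15), "F2": (0.6, 0.65)}   # falls inside subclass ("C1", 1)
print(classify(omega, subclasses, dom))          # C1
```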
2.3.3. Generation of Subclasses by the Supervised Hierarchical Conceptual Clustering
Sample patterns ωkp, ωkq ∈ Uk of pattern class Ck are called mutual neighbors [7] against other classes Cj, j = 1, 2, …, M (j ≠ k), if any sample ωjr ∈ Uj, r = 1, 2, …, Nj; j = 1, 2, …, M (j ≠ k), is separated from the Cartesian join of ωkp and ωkq:
(Ekp⊞Ekq)⊠Ejr = ∅, r = 1, 2, …, Nj; j = 1, 2, …, M (j ≠ k). (20)
For each pattern class Ck, let Cck denote all other pattern classes except Ck, and let Uck be the union of the training sets Uj, j = 1, 2, …, M (j ≠ k). Then, the subclass generation based on the supervised hierarchical conceptual clustering (SHCC) is summarized as follows.
Algorithm SHCC for generating subclasses (for pattern class Ck):
We assume that each of the Nk sample patterns in Uk is itself a subclass of Ck.
Step (1): For each pair of subclasses ω and ω’ in Uk, calculate the compactness C(ω, ω’) = P(E⊞E’) and find the pair ωp and ωq in Uk that minimizes C(ωp, ωq) = P(Ep⊞Eq) and also satisfies the mutual neighborhood condition in (20) against Uck.
Step (2): Define the new subclass ωpq by the set {ωp, ωq}. Then, delete ωp and ωq from Uk and put the new subclass ωpq into Uk, where the subclass ωpq is described by the Cartesian join Epq = Ep⊞Eq in the feature space D(d).
Step (3): Repeat Step (1) and Step (2) until the set Uk is unchanged.
Step (4): Define the subclasses Ωk1, Ωk2, …, Ωkm and their descriptions Rk1, Rk2, …, Rkm in the feature space from the generated subclasses in Uk, according to the cardinality of the subclasses from the largest to the smallest. (End of Algorithm)
Using this algorithm, we can reduce the given set of Nk sample patterns to m subclasses. It should be noted that there is a possibility that some original sample patterns remain as subclasses depending on the interclass structure between Ck and Cck (see Figure 4).
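A compact sketch of Algorithm SHCC for interval-valued events: a pair is merged only if its Cartesian join has the smallest compactness and meets no sample of the opposing classes (condition (20)). The helper functions and the toy configuration are illustrative assumptions.

```python
from itertools import combinations

def join(A, B):
    """Feature-wise interval hull (Cartesian join for interval features)."""
    return {f: (min(A[f][0], B[f][0]), max(A[f][1], B[f][1])) for f in A}

def size(E, dom):
    """Concept size of an interval-valued event."""
    return sum((hi - lo) / (dom[f][1] - dom[f][0]) for f, (lo, hi) in E.items()) / len(E)

def meets(A, B):
    """True if A ⊠ B is non-empty, i.e., the intervals overlap on every feature."""
    return all(max(A[f][0], B[f][0]) <= min(A[f][1], B[f][1]) for f in A)

def shcc(own, others, dom):
    """own: dict name -> event of class Ck; others: list of events of the other classes."""
    subclasses = dict(own)
    changed = True
    while changed:
        changed = False
        candidates = []
        for p, q in combinations(subclasses, 2):
            merged = join(subclasses[p], subclasses[q])
            if not any(meets(merged, e) for e in others):    # mutual neighborhood (20)
                candidates.append((size(merged, dom), p, q, merged))
        if candidates:
            _, p, q, merged = min(candidates, key=lambda c: c[0])
            subclasses.pop(p)
            subclasses.pop(q)
            subclasses[(p, q)] = merged
            changed = True
    return subclasses

# Toy configuration: two nearby samples of Ck and one far sample, with one
# opposing-class sample lying between the far sample and the others.
dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
own = {"1": {"F1": (0.0, 0.1), "F2": (0.8, 0.9)},
       "2": {"F1": (0.2, 0.3), "F2": (0.6, 0.7)},
       "5": {"F1": (0.6, 0.7), "F2": (0.1, 0.2)}}
others = [{"F1": (0.45, 0.5), "F2": (0.45, 0.5)}]
print(list(shcc(own, others, dom)))   # ['5', ('1', '2')]: 1 and 2 merge; 5 stays a singleton
```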
3. Experimental Results
3.1. Artificial Data
In this dataset, each of the two pattern classes has two subclasses, and together they form the X-or configuration in the plane spanned by F1 and F2, as shown in Figure 7. We use the zero-one normalized data in Table 3 to describe the clustering steps. In this table, F3, F4, and F5 are randomly generated useless features; class C1 is composed of the samples numbered 1 to 8, and the other class, C2, is composed of samples A to H.
Figure 7.
A two-dimensional X-or problem.
Table 3.
The X-or data.
Table 4 and Table 5 summarize clustering steps to generate subclasses for each pattern class. In Table 4, the compactness of subclass (7, 8), for example, is calculated as follows:
C(7, 8) = ((0.985 − 0.530) + (0.254 − 0.030) + (0.564 − 0.221) + (0.666 − 0.259) + (0.600 − 0.319))/5 = 0.342.
Table 4.
Subclass generation for class C1 by the SHCC.
| Steps | Subclass | C(p, q) | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|---|---|
| 1 | (7, 8) | 0.420 | E, F, G, H | A, B, C, D | F | C, E | C, E, F |
| 2 | (1, 3) | 0.446 | A, B, C, D | E, F, G, H | C, D, G | B, E, F | |
| 3 | (2, 4) | 0.547 | A, B, C, D | E, F, G, H | E | F | |
| 4 | ((7, 8), 6) | 0.580 | E, F, G, H | A, B, C, D | E | F | |
| 5 | ((1, 3), (2, 4)) | 0.668 | A, B, C, D | E, F, G, H | |||
| 6 | (((7, 8), 6), 5) | 0.694 | E, F, G, H | A, B, C, D | F | ||
| C11 | (1, 2, 3, 4) | 0.429 | [0.000, 0.379] | [0.507, 0.985] | |||
| C12 | (5, 6, 7, 8) | 0.429 | [0.485, 1.000] | [0.030, 0.373] |
Table 5.
Subclass generation for class C2 by the SHCC.
| Steps | Subclass | C(p, q) | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|---|---|
| 1 | (A, C) | 0.387 | 1, 2, 3, 4 | 5, 6, 7, 8 | 6 | 1 | 8 |
| 2 | (G, H) | 0.414 | 5, 6, 7, 8 | 1, 2, 3, 4 | 4, 6 | 1, 6 | |
| 3 | ((A, C), D) | 0.538 | 1, 2, 3, 4 | 5, 6, 7, 8 | 4, 6 | 1 | |
| 4 | ((G, H), F) | 0.566 | 5, 6, 7, 8 | 1, 2, 3, 4 | 6 | 1 | |
| 5 | (((G, H), F), E) | 0.628 | 5, 6, 7, 8 | 1, 2, 3, 4 | 6 | ||
| 6 | (((A, C), D), B) | 0.661 | 1, 2, 3, 4 | 5, 6, 7, 8 | 6 | 1 | |
| C21 | (E, F, G, H) | 0.628 | [0.000, 0.394] | [0.000, 0.448] | |||
| C22 | (A, B, C, D) | 0.661 | [0.485, 0.955] | [0.433, 1.000] |
These tables summarize how each feature separates samples from the opposite class. As a result, we obtain the subclass pairs (C11, C12) and (C21, C22), which compose the X-or structure in the plane spanned by the selected features F1 and F2.
3.2. The Hardwood Data
The data is selected from the US Geological Survey (Climate—Vegetation Atlas of North America) [8]. The following eight features describe ten selected hardwoods:
F1: Annual temperature (ANNT) (°C);
F2: January temperature (JANT) (°C);
F3: July temperature (JULT) (°C);
F4: Annual precipitation (ANNP) (mm);
F5: January precipitation (JANP) (mm);
F6: July precipitation (JULP) (mm);
F7: Growing degree days on 5 °C base ×1000 (GDC5);
F8: Moisture index (MITM).
Table 6 shows the min-max descriptions for the ten selected hardwoods. The original data are represented by seven quantile values at 0, 10, 25, 50, 75, 90, and 100 (%). We select here only the 0% and 100% quantile values, since these values describe well the differences between clusters in the unsupervised hierarchical conceptual clustering [6]. In this example, class C1 and class C2 are composed of the east hardwoods {E1, E2, E3, E4, E5} and the west hardwoods {W1, W2, W3, W4, W5}, respectively.
Table 6.
The hardwoods data [8].
Table 7 and Table 8 summarize the clustering steps to generate subclasses for each class by the SHCC. From Table 7, the east hardwoods are reduced to the single class description C1 = (E1, E2, E3, E4, E5) and are separated from the west hardwoods with respect to F1max, F4max, F5min, and F6min.
Table 7.
Subclass generation for east hardwoods by the SHCC.
Table 8.
Subclass generation for west hardwoods by the SHCC.
On the other hand, west hardwoods are separated into subclasses C21 = (W1, W2, W3) and C22 = (W1, W2). Subclass C21 is separated from class C1 with respect to F1max, F2min, F5min, F6min, F8min, and F8max, while C22 is separated from class C1 with respect to F1max, F2max, F4min, F4max, F5min, F5max, F6min, and F7max.
Since F6min has null concept size for C21 and C22, we use the representation in Table 9 for each class.
Table 9.
Three subclasses obtained for the hardwood data.
Figure 8 shows the scatter diagram of the given hardwoods. In this figure, the maximum vertices of the west hardwoods of C21 surround the maximum vertices of the east hardwoods.
Figure 8.
The scatter diagram of the hardwood data.
Table 10 contains test data used to check the behavior of our classification model. Table 11 summarizes the similarity of each test sample to the subclasses. BETURA is classified into C22 with a similarity value of one. CARYA and TILIA are classified into C21 and C1, respectively, with high similarity values. Both CASTANEA and ULMUS are classified into class C1 with slightly lower similarity values than CARYA and TILIA. Figure 9 summarizes the mutual positions of the test samples and the ten hardwoods initially given. This figure shows that there is a difficulty in distinguishing between class C1 and subclass C21.
Table 10.
A test dataset.
Table 11.
The similarity values of the test hardwoods to each pattern class.
Figure 9.
The scatter diagram of the test data.
3.3. Golf Data [9]
The Golf data in Table 12 is described by four features having the following possible values or ranges:
F1: Outlook = {overcast, rain, sunny}, F2: Temperature = [64, 85],
F3: Humidity = [65, 96], and F4: Windy = {TRUE, FALSE}.
Therefore, from (9), we have
|D1| = 3, |D2| = 85 − 64 = 21, |D3| = 96 − 65 = 31, and |D4| = 2.
The compactness (13) of samples 1 and 2 in class C1, for example, is calculated as follows:
C(1, 2) = {1/3 + (83 − 72)/21 + (90 − 78)/31 + 2/2}/4 = 0.561.
Table 12.
Golf data.
| Class | Sample | F1: Outlook | F2: Temp. (℉) | F3: Humidity (%) | F4: Windy |
|---|---|---|---|---|---|
| C1 (Play) | 1 | Overcast | 72 | 90 | TRUE |
| | 2 | Overcast | 83 | 78 | FALSE |
| | 3 | Rainy | 75 | 80 | FALSE |
| | 4 | Overcast | 64 | 65 | TRUE |
| | 5 | Sunny | 75 | 70 | TRUE |
| | 6 | Overcast | 81 | 75 | FALSE |
| | 7 | Rainy | 68 | 80 | FALSE |
| | 8 | Rainy | 70 | 96 | FALSE |
| | 9 | Sunny | 69 | 70 | FALSE |
| C2 (Don’t play) | A | Rainy | 71 | 80 | TRUE |
| | B | Rainy | 65 | 70 | TRUE |
| | C | Sunny | 80 | 90 | TRUE |
| | D | Sunny | 85 | 85 | FALSE |
| | E | Sunny | 72 | 95 | FALSE |
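The C(1, 2) computation above can be checked with a short sketch of the compactness for mixed feature types, using the domain sizes |Dj| from (9); the dict-based encoding of samples 1 and 2 is our assumption.

```python
# Golf data domains: |D1| = 3, |D2| = 21, |D3| = 31, |D4| = 2.
domains = {"Outlook": {"overcast", "rain", "sunny"},
           "Temperature": (64, 85), "Humidity": (65, 96),
           "Windy": {True, False}}

def compactness(A, B, dom):
    """Average over features of the concept size of the feature-wise Cartesian join."""
    total = 0.0
    for f in A:
        if isinstance(A[f], tuple):                         # interval join
            width = max(A[f][1], B[f][1]) - min(A[f][0], B[f][0])
            total += width / (dom[f][1] - dom[f][0])
        else:                                               # nominal join (set union)
            total += len(A[f] | B[f]) / len(dom[f])
    return total / len(A)

sample1 = {"Outlook": {"overcast"}, "Temperature": (72, 72),
           "Humidity": (90, 90), "Windy": {True}}
sample2 = {"Outlook": {"overcast"}, "Temperature": (83, 83),
           "Humidity": (78, 78), "Windy": {False}}
print(round(compactness(sample1, sample2, domains), 3))     # 0.561
```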
In this example, class C1 and class C2 are composed of the set {1, 2, 3, 4, 5, 6, 7, 8, 9} and the set {A, B, C, D, E}, respectively. Table 13 summarizes the result of the SHCC for class C1 (play). We have three subclasses: C11 = (1, 2, 4, 6), C12 = (3, 7, 8), and C13 = (5, 9). Subclass C11 is described by the single feature value “Outlook = Overcast”, and the compactness is 0.333. Subclass C12 is described as “Outlook = Rainy and Windy = FALSE”, and the compactness is 0.417. Subclass C13 is described as “Outlook = Sunny and Humidity = 70”, and the compactness is 0.167.
Table 13.
Subclass generation for class C1 (Play) by the SHCC.
On the other hand, from Table 14, we have C21 = (C, D, E) and C22 = (A, B) with the compactness 0.328 and 0.417, respectively. Subclass C21 is described as “Outlook = Sunny and Humidity = [85, 95]”, and C22 as “Outlook = Rainy and Windy = TRUE”. These class descriptions are easily summarized as the decision tree in Figure 10, where Humidity = 77.5 is the midpoint value between [70, 70] and [85, 95].
Table 14.
Subclass generation for class C2 (Don’t play) by the SHCC.
Figure 10.
Decision tree for the golf data.
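As a further illustration, the following sketch rebuilds the golf subclass regions of Tables 13 and 14 as Cartesian joins of their member samples (memberships as stated above) and classifies a hypothetical new day by the rule of Section 2.3.2; the event encoding and the test pattern are illustrative assumptions.

```python
from functools import reduce

# Domain sizes |Dj| for the golf data.
DOM = {"Outlook": 3, "Temp": 85 - 64, "Humidity": 96 - 65, "Windy": 2}

def ev(outlook, temp, humid, windy):
    """Encode one golf sample as an event (singleton sets / degenerate intervals)."""
    return {"Outlook": frozenset({outlook}), "Temp": (temp, temp),
            "Humidity": (humid, humid), "Windy": frozenset({windy})}

DATA = {"1": ev("Overcast", 72, 90, True),  "2": ev("Overcast", 83, 78, False),
        "3": ev("Rainy", 75, 80, False),    "4": ev("Overcast", 64, 65, True),
        "5": ev("Sunny", 75, 70, True),     "6": ev("Overcast", 81, 75, False),
        "7": ev("Rainy", 68, 80, False),    "8": ev("Rainy", 70, 96, False),
        "9": ev("Sunny", 69, 70, False),
        "A": ev("Rainy", 71, 80, True),     "B": ev("Rainy", 65, 70, True),
        "C": ev("Sunny", 80, 90, True),     "D": ev("Sunny", 85, 85, False),
        "E": ev("Sunny", 72, 95, False)}

def join(A, B):
    """Cartesian join: set union for nominal features, interval hull otherwise."""
    return {f: A[f] | B[f] if isinstance(A[f], frozenset)
            else (min(A[f][0], B[f][0]), max(A[f][1], B[f][1])) for f in A}

def psize(value, f):
    """Concept size of a single feature value."""
    width = len(value) if isinstance(value, frozenset) else value[1] - value[0]
    return width / DOM[f]

def similarity(E, R):
    """Asymmetric similarity (19) from a pattern E to a subclass region R."""
    J = join(E, R)
    return sum(psize(R[f], f) / psize(J[f], f) for f in E) / len(E)

# Subclass regions as Cartesian joins of their member samples (Tables 13 and 14).
MEMBERS = {("C1", "C11"): "1246", ("C1", "C12"): "378", ("C1", "C13"): "59",
           ("C2", "C21"): "CDE", ("C2", "C22"): "AB"}
REGIONS = {k: reduce(join, (DATA[s] for s in v)) for k, v in MEMBERS.items()}

new_day = ev("Overcast", 70, 85, False)                 # hypothetical test pattern
best = max(REGIONS, key=lambda k: similarity(new_day, REGIONS[k]))
print(best)   # ('C1', 'C11'): the day falls inside the "Outlook = Overcast" subclass
```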
4. Discussion
This paper presented a region-oriented pattern classification method for mixed feature-type data based on the Cartesian system model (D(d), ⊞, ⊠). We defined the notion of the concept size and then the compactness for objects and/or clusters in the feature space D(d) as the similarity measure based on the Cartesian join operator ⊞. In unsupervised hierarchical conceptual clustering, the compactness played not only the role of the similarity measure but also the roles of the cluster quality measure and the feature-effectiveness criterion. By minimizing the compactness in each clustering step, we could find meaningful structures, e.g., a functional structure and a cluster structure, with respect to the obtained robustly informative features.
On the other hand, in the supervised hierarchical conceptual clustering of a given pattern class, we used the mutual neighborhood condition based on the Cartesian meet ⊠ to assure the separability of the class from other pattern classes, in addition to the evaluation of the compactness in each clustering step.
In the X-or problem, the two pattern classes mutually overlap in each of the given five features. By the supervised hierarchical conceptual clustering, the two subclasses of each pattern class that organize the X-or form are exactly detected with respect to the selected features F1 and F2.
In the hardwood data, the SHCC generated a single class description C1 for the east hardwoods and two subclasses C21 and C22 for the west hardwoods. For the selected informative features, the scatter diagram and the classification results of the test samples show the difficulty of clearly separating class C1 from subclass C21.
In the golf data, the SHCC generated three subclasses, C11 = {Outlook = Overcast}, C12 = {Outlook = Rainy and Windy = FALSE}, and C13 = {Outlook = Sunny and Humidity = 70}, for class C1 (Play), and two subclasses, C21 = {Outlook = Sunny and Humidity = [85, 95]} and C22 = {Outlook = Rainy and Windy = TRUE}, for class C2 (Don’t play), without using any feature evaluation criterion of the kind employed in, for example, ID3/C4.5. These results easily lead to the decision tree in Figure 10.
Finally, we should note that we could obtain subclass descriptions with informative features by applying the supervised hierarchical conceptual clustering to each pattern class against the other pattern classes. This approach largely simplifies the process of designing a pattern classifier for mixed feature-type symbolic data.
Author Contributions
Conceptualization and methodology, M.I. and H.Y.; original draft preparation, M.I. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by JSPS KAKENHI (Grants-in-Aid for Scientific Research) Grant Number 25330268.
Data Availability Statement
The original data presented in the study are openly available from the U.S. Geological Survey at http://pubs.usgs.gov/pp/p1650-b/, reference number [8] (accessed on 7 August 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Liu, H.; Motoda, H. Computational Methods of Feature Selection; CRC Press: London, UK, 2007.
2. Miao, J.; Niu, L. A survey on feature selection. Procedia Comput. Sci. 2016, 91, 919–926.
3. Solorio-Fernández, S.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2020, 53, 907–948.
4. Billard, L.; Diday, E. Symbolic Data Analysis: Conceptual Statistics and Data Mining; Wiley: Chichester, UK, 2007.
5. Huang, H.S. Supervised feature selection: A tutorial. Artif. Intell. Res. 2015, 4, 22–37.
6. Ichino, M.; Umbleja, K.; Yaguchi, H. Unsupervised feature selection for histogram-valued symbolic data using hierarchical conceptual clustering. Stats 2021, 4, 359–384.
7. Ichino, M.; Yaguchi, H. Symbolic pattern classifiers based on the Cartesian system model. In Data Science, Classification, and Related Methods; Hayashi, C., Yajima, K., Bock, H.-H., Ohsumi, N., Tanaka, Y., Baba, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 1998.
8. Histogram Data by the U.S. Geological Survey, Climate-Vegetation Atlas of North America. Available online: http://pubs.usgs.gov/pp/p1650-b/ (accessed on 20 November 2010).
9. Wu, X.; Kumar, V. (Eds.) The Top Ten Algorithms in Data Mining; Chapman and Hall/CRC: New York, NY, USA, 2009.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).