Abstract
This paper describes a region-oriented method of pattern classification based on the Cartesian system model (CSM), a mathematical model that allows manipulating mixed feature-type symbolic data. We use supervised hierarchical conceptual clustering to generate class regions for each pattern class, based on evaluating, at each clustering step, the generality of the regions and their separability from the other classes. The method easily finds robustly informative features that describe each pattern class against the other pattern classes. Some examples show the effectiveness of the proposed method.
1. Introduction
Feature selection has been extensively developed in pattern recognition, data mining [1,2,3], and, more generally, symbolic data analysis [4]. Feature selection has two main problems: how to evaluate the importance of the given features and how to find an important feature subset from the given set of features. As summarized in [1,2,3,5], many feature selection methods have been developed by combining various feature evaluation measures (distance measure, information measure, dependency measure, consistency measure, and classification error rate) and search strategies (heuristic, complete, and random).
The authors reported an unsupervised feature selection method using the hierarchical conceptual clustering for histogram-valued symbolic data [6]. In this method, each object is described by a series of quantile vectors, and the concept size, called the compactness, is used as the similarity measure between objects and/or clusters represented by the quantile vectors. It should be noted that the compactness also plays the roles of cluster quality criterion and feature-effectiveness criterion. This fact simplifies the process of unsupervised feature selection.
As a pattern classifier for mixed feature-type symbolic data, the authors proposed a region-oriented method based on the Cartesian system model (CSM) [7], where they pointed out the trade-off between the "separability" of classes and the "generality" of class descriptions and asserted the importance of feature selection.
The purpose of this paper is to realize a feature selection method for our region-oriented classifier using the supervised hierarchical conceptual clustering.
In Section 2, the CSM is defined by the triplet (D(d), ⊞, ⊠), where D(d) is the feature space composed of d mixed-type features. We define the Cartesian join operator ⊞ and the Cartesian meet operator ⊠ on the feature space. The Cartesian join generates a generalized description of the given two descriptions in the feature space. On the other hand, the Cartesian meet extracts a common description from the given two descriptions in the feature space. We define the measure of compactness with respect to the Cartesian join in order to evaluate the generality of cluster descriptions in each clustering step. Subsection 2.2 describes the algorithm of our unsupervised hierarchical conceptual clustering using the compactness. Subsection 2.3 describes our classification model based on the supervised hierarchical conceptual clustering. This clustering algorithm checks, in each clustering step, not only the generality of the class description but also the separability of the class description against other pattern classes using the Cartesian meet operator.
2. Cartesian System Model (CSM) and Feature Selection Using Hierarchical Conceptual Clustering
In this section, we describe the Cartesian system model, a mathematical model for mixed feature-type symbolic data. We define a measure to evaluate the compactness of the generated concepts, and we describe an algorithm of the unsupervised hierarchical conceptual clustering. Then, we describe the classification model based on the supervised hierarchical conceptual clustering.
2.1. The Cartesian System Model (CSM)
Let Dk be the domain of feature Fk, k =1, 2, …, d. Then, the feature space is defined by
D(d) = D1 × D2 × ⋯ × Dd. (1)
Since we permit the simultaneous use of various feature types, we use the notation D(d) for the feature space in order to distinguish it from usual d-dimensional Euclidean space Dd. Each element of D(d) is represented by
E = E1 × E2 × ⋯ × Ed or E = (E1, E2, …, Ed), (2)
where Ek is the feature value taken by the feature Fk, k = 1, 2, …, d. We are able to use the following feature types:
(1) Continuous quantitative feature (e.g., height and weight);
(2) Discrete quantitative feature (e.g., the number of family members);
(3) Ordinal qualitative feature (e.g., academic career, where there is some kind of ordered relationship between values);
(4) Nominal qualitative feature (e.g., gender and blood type).
When we use feature types (1), (2), and (3), we permit interval values of the form Ek = [a, b], and in the case of feature type (4), we allow for finite sets as feature values. The Cartesian product (2) described in terms of feature types (1)–(4) is called an event in the feature space D(d).
2.1.1. The Cartesian Join Operator
The Cartesian join, A⊞B, of a pair of events A = (A1, A2, …, Ad) and B = (B1, B2, …, Bd) in the feature space D(d), is defined by
A⊞B = [A1⊞B1] × [A2⊞B2] × ⋯ × [Ad⊞Bd], (3)
where [Ak⊞Bk] is the Cartesian join of feature values Ak and Bk for feature Fk and is defined as follows.
When Fk is a quantitative or an ordinal qualitative feature, let Ak = [AkL, AkU] and Bk = [BkL, BkU], then [Ak⊞Bk] is the closed interval given by
[Ak⊞Bk] = [min(AkL, BkL), max(AkU, BkU)]. (4)
When Fk is a nominal feature, [Ak⊞Bk] is the union:
[Ak⊞Bk] = Ak ∪ Bk. (5)
Figure 1a illustrates the Cartesian join of two interval-valued events A and B in the Euclidean plane.
Figure 1.
The Cartesian join and the Cartesian meet in the Euclidean plane.
2.1.2. The Cartesian Meet Operator
The Cartesian meet A⊠B of a pair of events A = (A1, A2, …, Ad) and B = (B1, B2, …, Bd) in the feature space D(d) is defined by
A⊠B = [A1⊠B1] × [A2⊠B2] × ⋯ × [Ad⊠Bd], (6)
where [Ak⊠Bk] is the Cartesian meet of feature values Ak and Bk for feature Fk defined by the intersection:
[Ak⊠Bk] = Ak ∩ Bk. (7)
When the intersection (7) takes the empty value ∅ for at least one feature, the events A and B have no common part. We denote this fact by
A⊠B = ∅, (8)
and we say that A and B are completely distinguishable. Figure 1b illustrates the Cartesian meet of two interval-valued events, A and B. We call the triplet (D(d), ⊞, ⊠) the Cartesian System Model (CSM).
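As an illustration only, the following Python sketch implements the feature-wise Cartesian join and meet for mixed-type events; the dict-based encoding (intervals as tuples, nominal values as frozensets) and the example values are our own assumptions, not part of the model definition.

```python
def join_feature(a, b):
    """Cartesian join of two feature values: interval hull or set union."""
    if isinstance(a, tuple):                       # quantitative / ordinal feature
        return (min(a[0], b[0]), max(a[1], b[1]))
    return frozenset(a) | frozenset(b)             # nominal feature

def meet_feature(a, b):
    """Cartesian meet of two feature values; None stands for the empty value."""
    if isinstance(a, tuple):
        lo, hi = max(a[0], b[0]), min(a[1], b[1])
        return (lo, hi) if lo <= hi else None
    common = frozenset(a) & frozenset(b)
    return common if common else None

def cartesian_join(A, B):
    return {f: join_feature(A[f], B[f]) for f in A}

def cartesian_meet(A, B):
    """A and B are completely distinguishable (A ⊠ B = ∅) if any feature-wise meet is empty."""
    out = {f: meet_feature(A[f], B[f]) for f in A}
    return None if any(v is None for v in out.values()) else out

# Two illustrative interval-valued events in the plane (values are our own).
A = {"F1": (0.0, 2.0), "F2": (1.0, 3.0)}
B = {"F1": (3.0, 5.0), "F2": (0.0, 2.0)}
print(cartesian_join(A, B))   # {'F1': (0.0, 5.0), 'F2': (0.0, 3.0)}
print(cartesian_meet(A, B))   # None: the intervals on F1 do not overlap, so A ⊠ B = ∅
```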
2.1.3. Concept Size
Let U = {ω1, ω2, …, ωN} be the given set of sample objects without class labels, and let each sample object ωk be described by an event Ek = Ek1 × Ek2 × ⋯ × Ekd in D(d). We define the concept size P(Ekj) of ωk in terms of feature Fj as the probability:
P(Ekj) = |Ekj|/|Dj|, j = 1, 2, …, d; k = 1, 2, …, N, and (9)
0 ≤ P(Ekj) ≤ 1, j = 1, 2, …, d; k = 1, 2, …, N, (10)
where |Dj| is the length of the interval Dj, if feature Fj is type (1), (2), or (3) in Section 2.1, and is the number of elements in the set Dj if feature Fj is type (4). Then, we define the concept size P(Ek) of ωk in the feature space D(d) by the arithmetic mean:
P(Ek) = {P(Ek1) + P(Ek2) + ⋯ + P(Ekd)}/d, and (11)
0 ≤ P(Ek) ≤ 1, k = 1, 2, …, N. (12)
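A minimal sketch of the concept size (9)–(11), using the same dict-based event encoding as in the sketch above; the feature domains and example values below are illustrative assumptions.

```python
def concept_size(E, domains):
    """Concept size P(E): arithmetic mean over features of |Ej| / |Dj|."""
    sizes = []
    for f, value in E.items():
        dom = domains[f]
        if isinstance(value, tuple):               # interval-valued feature
            sizes.append((value[1] - value[0]) / (dom[1] - dom[0]))
        else:                                      # nominal feature (set-valued)
            sizes.append(len(value) / len(dom))
    return sum(sizes) / len(sizes)

# Illustrative domains: one interval feature and one nominal feature.
domains = {"F1": (0.0, 10.0), "F4": {"A", "B", "O", "AB"}}
E = {"F1": (2.0, 4.0), "F4": frozenset({"A", "O"})}
print(concept_size(E, domains))   # (0.2 + 0.5) / 2 = 0.35
```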
2.2. Measure of Compactness and Unsupervised Hierarchical Conceptual Clustering
2.2.1. Measure of Compactness
In the CSM, each sample object ωp is described as an event Ep which is equivalent to a conjunctive logical expression and is regarded as a minimum unit of concept described by d features. Now, we define the measure of compactness for the concept generated by two sample objects ωp and ωq, written C(ωp, ωq), as follows:
C(ωp, ωq) = P(Ep⊞Eq) = {P(Ep1⊞Eq1) + P(Ep2⊞Eq2) + ⋯ + P(Epd⊞Eqd)}/d. (13)
Since the Cartesian join Ep⊞Eq generates the smallest description spanned by the given two events Ep and Eq, the compactness C(ωp, ωq) evaluates the quantitative size of the generated “concept”. The compactness satisfies the following properties:
(1) 0 ≤ C(ωp, ωq) ≤ 1;
(2) C(ωp, ωp) = P(Ep) ≥ 0;
(3) C(ωp, ωq) = C(ωq, ωp);
(4) C(ωp, ωq) = 0 iff Ep ≡ Eq and the event has null concept size (P(Ep) = 0);
(5) C(ωp, ωp), C(ωq, ωq) ≤ C(ωp, ωq);
(6)
Figure 2. A counterexample of property (6).
2.2.2. Unsupervised Hierarchical Conceptual Clustering
Let U = {ω1, ω2, …, ωN} be the given set of sample objects, and let each ωk be described by an event Ek = Ek1 × Ek2 × ⋯ × Ekd in the feature space D(d). By using the measure of compactness defined in (13), we can construct the following algorithm [6].
Algorithm of the unsupervised hierarchical conceptual clustering:
Step (1): For each pair of sample objects ω and ω’ in U, calculate the compactness C(ω, ω’) in (13) and find the pair ωp and ωq that has the minimum compactness.
Step (2): Generate the merged concept ωpq of ωp and ωq in U, and replace ωp and ωq in U by the new concept ωpq, where ωpq is described by the Cartesian join Epq = Ep⊞Eq in the feature space D(d).
Step (3): Repeat Steps (1) and (2) until U includes only one concept (i.e., the whole concept generated by the given N sample objects). (End of Algorithm)
In the above algorithm, Step 1 finds the most similar sample objects and/or sub-concepts that should be merged into a single concept, and then Step 2 characterizes a new extensional concept by the Cartesian join region, which is equivalent to a conjunctive logical expression. Since minimizing the concept size of the Cartesian join region means to maximize the dissimilarity of the join region from the whole region D(d), the compactness plays the role of the cluster quality criterion. We should also note the fact that through the steps of agglomeration process, we can know not only the sizes of sub-concepts but also which features create the differences between sub-concepts. Therefore, the compactness also plays the role of the feature-effectiveness criterion.
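For concreteness, the following sketch mirrors the algorithm above for interval-valued events only; the helper functions, toy objects, and domains are illustrative assumptions rather than the fats and oils data.

```python
from itertools import combinations

def join(A, B):
    """Feature-wise interval hull (Cartesian join for interval features)."""
    return {f: (min(A[f][0], B[f][0]), max(A[f][1], B[f][1])) for f in A}

def size(E, dom):
    """Concept size of an interval-valued event."""
    return sum((hi - lo) / (dom[f][1] - dom[f][0]) for f, (lo, hi) in E.items()) / len(E)

def hcc(objects, dom):
    """Merge, at every step, the pair whose Cartesian join has the smallest concept size."""
    clusters = dict(objects)
    history = []
    while len(clusters) > 1:
        (p, q), c = min((((p, q), size(join(clusters[p], clusters[q]), dom))
                         for p, q in combinations(clusters, 2)),
                        key=lambda item: item[1])
        clusters[(p, q)] = join(clusters.pop(p), clusters.pop(q))
        history.append(((p, q), round(c, 3)))
    return history

# Tiny illustrative data set (not the fats and oils data).
dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
objs = {"a": {"F1": (0.10, 0.20), "F2": (0.10, 0.20)},
        "b": {"F1": (0.15, 0.25), "F2": (0.20, 0.30)},
        "c": {"F1": (0.80, 0.90), "F2": (0.70, 0.90)}}
print(hcc(objs, dom))   # [(('a', 'b'), 0.175), ((('a', 'b'), 'c'), 0.8)]
```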
As an illustrative example, we apply our unsupervised hierarchical conceptual clustering to the data in Table 1, where each sample is described by five features:
(1) Specific gravity,
(2) Freezing point (°C),
(3) Iodine value,
(4) Saponification value,
(5) Major acids.
Figure 3 shows the results for two selected informative features: iodine value and specific gravity. We have three explicit clusters: linseed and perilla; cotton, sesame, olive, and camellia; and beef and hog. These three clusters take similar concept-size values for the selected two features.
Figure 3.
The results of the fats and oils data for iodine value (I) and specific gravity (S).
Table 1.
Fats and oils data [6].
| Sample | Specific G. | Freezing P. | Iodine V. | Saponification V. | Major Acids |
|---|---|---|---|---|---|
| Linseed | [0.930, 0.935] | [−27, −18] | [170, 204] | [118, 196] | [1.75, 4.81] |
| Perilla | [0.930, 0.937] | [−5, −4] | [192, 208] | [188, 197] | [0.77, 4.85] |
| Cotton | [0.916, 0.918] | [−6, −1] | [99, 113] | [189, 198] | [0.42, 3.84] |
| Sesame | [0.920, 0.926] | [−6,−4] | [104, 116] | [187, 193] | [0.91, 3.77] |
| Camellia | [0.916, 0.917] | [−21, −15] | [80, 82] | [189, 193] | [2.00, 2.98] |
| Olive | [0.914, 0.919] | [0, 6] | [79, 90] | [187, 196] | [0.83, 4.02] |
| Beef | [0.860, 0.870] | [30, 38] | [40, 48] | [190, 199] | [0.31, 2.89] |
| Hog | [0.858, 0.864] | [22, 32] | [53, 77] | [190, 202] | [0.37, 3.65] |
2.3. Classification Model Based on the Supervised Conceptual Clustering
In this subsection, we describe a classification model, and we use the term “sample pattern” when referring to a sample object.
2.3.1. Classification Model
Let C1, C2, …, CM be M pattern classes and, for each pattern class Ck, let
Uk = {ωk1, ωk2, …, ωkNk}, k = 1, 2, …, M, (14)
be the given set of Nk sample patterns for class Ck. Let each sample pattern ωkj in class Ck be represented in the d-dimensional feature space D(d) as follows:
Ekj = Ekj1 × Ekj2 × ⋯ × Ekjd, j = 1, 2, …, Nk; k = 1, 2, …, M. (15)
From the consistency viewpoint, we assume that, for any pair of p and q (≠ p), Up and Uq have no common sample patterns:
Up ∩ Uq = ∅. (16)
This condition, however, does not mean that two pattern classes share the disjoint regions in the feature space D(d). For example, in Figure 4, two pattern classes share an overlapped region O, but the training sets do not have any common sample pattern.
Figure 4.
Illustration for condition (16).
We use the term subclass for a subset of sample patterns from a pattern class. A subclass of a pattern class is described by a closed region in the feature space D(d). A closed region for a subclass is reduced to a hyper box when the feature space D(d) is a Euclidean space. Moreover, a sample pattern given as a point in the feature space D(d) is also regarded as a closed region that describes a subclass.
Now, let Ωkp and Rkp, p =1, 2, …, mk (≤Nk), be mk subsets of the training set Uk, i.e., subclasses of class Ck, and their representation in the feature space D(d), respectively, such that
Ekp ⊆ (Rk1 ∪ Rk2 ∪ ⋯ ∪ Rkmk), p = 1, 2, …, Nk, k = 1, 2, …, M; and (17)
Ekp⊠Rjq = ∅, p = 1, 2, …, Nk, q = 1, 2, …, mj, j = 1, 2, …, M (j ≠ k), (18)
where ∅ denotes that two descriptions Ekp and Rjq are completely distinguishable. The existence of subclasses satisfying (17) and (18) is guaranteed by the consistency condition (16). In fact, if we set Ωkp = {ωkp} and Rkp = Ekp, p = 1, 2, …, Nk, for each pattern class Ck, these reduced subclasses satisfy conditions (17) and (18).
In the fats and oils dataset, we assume two pattern classes: plant oils and fats. If we take the Cartesian join of all plant oils and fats, respectively, with respect to each feature, we have the result in Table 2. The two pattern classes are completely disjoint with respect to the first three features: specific gravity, freezing point, and iodine value. In unsupervised feature selection, the most informative features throughout the clustering process are specific gravity and iodine value. However, the results of Table 2 suggest that freezing point is more informative than iodine value in the classification between plant oils and fats. This is why we need to check the separability of a subclass from other pattern classes, in addition to checking its generality at each clustering step.
Table 2.
Descriptions of two pattern classes for the fats and oils data.
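The Table 2 computation can be reproduced with a short sketch: take the feature-wise Cartesian join over each class and test which features make the classes disjoint. The split of the Table 1 samples into "plant oils" and "fats" follows the sample names and is our assumption; only the first three features are encoded here.

```python
def join_all(rows):
    """Feature-wise interval hull over a list of interval-valued events."""
    return {f: (min(r[f][0] for r in rows), max(r[f][1] for r in rows)) for f in rows[0]}

def disjoint_features(A, B):
    """Features on which the intervals of A and B do not intersect (empty meet)."""
    return [f for f in A if max(A[f][0], B[f][0]) > min(A[f][1], B[f][1])]

# Table 1, first three features: specific gravity (SG), freezing point (FP), iodine value (IV).
plant_oils = [
    {"SG": (0.930, 0.935), "FP": (-27, -18), "IV": (170, 204)},   # linseed
    {"SG": (0.930, 0.937), "FP": (-5, -4),   "IV": (192, 208)},   # perilla
    {"SG": (0.916, 0.918), "FP": (-6, -1),   "IV": (99, 113)},    # cotton
    {"SG": (0.920, 0.926), "FP": (-6, -4),   "IV": (104, 116)},   # sesame
    {"SG": (0.916, 0.917), "FP": (-21, -15), "IV": (80, 82)},     # camellia
    {"SG": (0.914, 0.919), "FP": (0, 6),     "IV": (79, 90)},     # olive
]
fats = [
    {"SG": (0.860, 0.870), "FP": (30, 38), "IV": (40, 48)},       # beef
    {"SG": (0.858, 0.864), "FP": (22, 32), "IV": (53, 77)},       # hog
]
print(disjoint_features(join_all(plant_oils), join_all(fats)))
# ['SG', 'FP', 'IV']: the two classes are separated on all three features
```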
We define an asymmetric similarity measure from a pattern ω described by E = E1 × E2 × ⋯ × Ed in D(d) to a subclass Ωkq described by Rkq= Rkq1 × Rkq2 × ⋯ × Rkqd in D(d) as follows:
S(ω→Ωkq) = {P(Rkq1)/P(E1⊞Rkq1) + P(Rkq2)/P(E2⊞Rkq2) + ⋯ + P(Rkqd)/P(Ed⊞Rkqd)}/d, (19)
where P(*) is the concept size defined in (11).
This asymmetric similarity measure satisfies the following properties:
(1) 0 ≤ S(ω→Ωkq) ≤ 1;
(2) S(ω→Ωkq) = S(Ωkq→ω) iff ω = Ωkq;
(3) E ⊆ Rkq implies S(ω→Ωkq) = 1; and
(4) S(Ωkp→Ωkq) ≤ S(Ωkq→Ωkp) iff P(Rkq) ≤ P(Rkp).
If the description E in Figure 5a is completely included in the region Rkq, Property (3) becomes valid. Figure 5b illustrates Property (4).
Figure 5.
Illustrations for the properties of the asymmetric similarity measure.
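A sketch of the asymmetric similarity (19), restricted to interval features for brevity; the domains and the two regions below are illustrative assumptions.

```python
def similarity(E, R, dom):
    """S(ω→Ω) of (19): average over features of P(Rj) / P(Ej ⊞ Rj), interval features only."""
    total = 0.0
    for f in E:
        d = dom[f][1] - dom[f][0]
        p_r = (R[f][1] - R[f][0]) / d                                  # P(Rj)
        p_join = (max(E[f][1], R[f][1]) - min(E[f][0], R[f][0])) / d   # P(Ej ⊞ Rj)
        total += p_r / p_join
    return total / len(E)

dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
small = {"F1": (0.30, 0.40), "F2": (0.40, 0.50)}
large = {"F1": (0.20, 0.60), "F2": (0.30, 0.70)}    # small is contained in large
print(similarity(small, large, dom))   # 1.0: Property (3), E ⊆ R gives similarity one
print(similarity(large, small, dom))   # 0.25: the measure is asymmetric (cf. Property (4))
```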
2.3.2. Classification Rule
Let a given pattern ω be described by E in the feature space D(d). The given pattern ω is determined to come from class Ck if there exists a sub-concept Ωkq for which the similarity S(ω→Ωkq) is the largest.
It is clear that our basic problem is how to generate a sufficiently small number of subclasses that satisfy conditions (17) and (18) for each pattern class. However, it should be noted that, from Property (4) of our similarity measure, a description R with a larger concept size in the feature space asserts a greater similarity to the description E of the given pattern ω (see Figure 6), while the “generality” of sub-concept Ω (i.e., the “number of sample patterns” included in Ω) corresponding to R is not directly related to the concept size of R. For this reason, we use the “compactness” to evaluate the description of a subclass in the feature space. We also use the “mutual neighborhood concept” [7] in order to assure the separability between pattern classes.
Figure 6.
Illustration for the relation S(ω→Ωkp) ≤ S(ω→Ωkq).
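The classification rule can be sketched as an argmax of the similarity over all subclasses of all classes. The subclass regions below imitate an X-or layout and, like the restated similarity helper, are illustrative assumptions.

```python
def similarity(E, R, dom):
    """Average over features of P(Rj) / P(Ej ⊞ Rj), interval features only."""
    ratios = []
    for f in E:
        d = dom[f][1] - dom[f][0]
        p_r = (R[f][1] - R[f][0]) / d
        p_join = (max(E[f][1], R[f][1]) - min(E[f][0], R[f][0])) / d
        ratios.append(p_r / p_join)
    return sum(ratios) / len(ratios)

def classify(E, subclasses, dom):
    """Assign E to the class owning the subclass with the largest similarity."""
    best = max(subclasses, key=lambda key: similarity(E, subclasses[key], dom))
    return best[0]

dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
subclasses = {("C1", 1): {"F1": (0.0, 0.4), "F2": (0.5, 1.0)},
              ("C1", 2): {"F1": (0.5, 1.0), "F2": (0.0, 0.4)},
              ("C2", 1): {"F1": (0.0, 0.4), "F2": (0.0, 0.4)},
              ("C2", 2): {"F1": (0.5, 1.0), "F2": (0.5, 1.0)}}
omega = {"F1": (0.1, 0.15), "F2": (0.6, 0.65)}   # falls inside subclass ("C1", 1)
print(classify(omega, subclasses, dom))          # C1
```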
2.3.3. Generation of Subclasses by the Supervised Hierarchical Conceptual Clustering
Sample patterns ωkp, ωkq ∈ Uk of pattern class Ck are called mutual neighbors [7] against other classes Cj, j = 1, 2, …, M (j ≠ k), if any sample ωjr ∈ Uj, r = 1, 2, …, Nj; j = 1, 2, …, M (j ≠ k), is separated from the Cartesian join of ωkp and ωkq:
(Ekp⊞Ekq)⊠Ejr = ∅, r = 1, 2, …, Nj; j = 1, 2, …, M (j ≠ k). (20)
For each pattern class Ck, let Cck denote all other pattern classes except Ck, and let Uck be the union of the training sets Uj, j = 1, 2, …, M (j ≠ k). Then, the subclass generation based on the supervised hierarchical conceptual clustering (SHCC) is summarized as follows.
Algorithm SHCC for generating subclasses (for pattern class Ck):
We assume that each of the Nk sample patterns in Uk is itself a subclass of Ck.
Step (1): For each pair of subclasses ω and ω’ in Uk, calculate the compactness C(ω, ω’) = P(E⊞E’) and find the pair ωp and ωq in Uk that minimizes C(ωp, ωq) = P(Ep⊞Eq) and also satisfies the mutual neighborhood condition in (20) against Uck.
Step (2): Define the new subclass ωpq by the set {ωp, ωq}. Then, delete ωp and ωq from Uk and put the new subclass ωpq into Uk, where the subclass ωpq is described by the Cartesian join Epq = Ep⊞Eq in the feature space D(d).
Step (3): Repeat Step (1) and Step (2) until the set Uk is unchanged.
Step (4): Define the subclasses Ωk1, Ωk2, …, Ωkm and their descriptions Rk1, Rk2, …, Rkm in the feature space from the generated subclasses in Uk, according to the cardinality of the subclasses from the largest to the smallest. (End of Algorithm)
Using this algorithm, we can reduce the given set of Nk sample patterns to m subclasses. It should be noted that there is a possibility that some original sample patterns remain as subclasses depending on the interclass structure between Ck and Cck (see Figure 4).
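A compact sketch of Algorithm SHCC for interval-valued events: a pair is merged only if its Cartesian join has the smallest compactness and meets no sample of the opposing classes (condition (20)). The helper functions and the toy configuration are illustrative assumptions.

```python
from itertools import combinations

def join(A, B):
    """Feature-wise interval hull (Cartesian join for interval features)."""
    return {f: (min(A[f][0], B[f][0]), max(A[f][1], B[f][1])) for f in A}

def size(E, dom):
    """Concept size of an interval-valued event."""
    return sum((hi - lo) / (dom[f][1] - dom[f][0]) for f, (lo, hi) in E.items()) / len(E)

def meets(A, B):
    """True if A ⊠ B is non-empty, i.e., the intervals overlap on every feature."""
    return all(max(A[f][0], B[f][0]) <= min(A[f][1], B[f][1]) for f in A)

def shcc(own, others, dom):
    """own: dict name -> event of class Ck; others: list of events of the other classes."""
    subclasses = dict(own)
    changed = True
    while changed:
        changed = False
        candidates = []
        for p, q in combinations(subclasses, 2):
            merged = join(subclasses[p], subclasses[q])
            if not any(meets(merged, e) for e in others):    # mutual neighborhood (20)
                candidates.append((size(merged, dom), p, q, merged))
        if candidates:
            _, p, q, merged = min(candidates, key=lambda c: c[0])
            subclasses.pop(p)
            subclasses.pop(q)
            subclasses[(p, q)] = merged
            changed = True
    return subclasses

# Toy configuration: two nearby samples of Ck and one far sample, with one
# opposing-class sample lying between the far sample and the others.
dom = {"F1": (0.0, 1.0), "F2": (0.0, 1.0)}
own = {"1": {"F1": (0.0, 0.1), "F2": (0.8, 0.9)},
       "2": {"F1": (0.2, 0.3), "F2": (0.6, 0.7)},
       "5": {"F1": (0.6, 0.7), "F2": (0.1, 0.2)}}
others = [{"F1": (0.45, 0.5), "F2": (0.45, 0.5)}]
print(list(shcc(own, others, dom)))   # ['5', ('1', '2')]: 1 and 2 merge; 5 stays a singleton
```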
3. Experimental Results
3.1. Artificial Data
In this dataset, each of the two pattern classes has two subclasses, and together they form the X-or configuration in the plane spanned by F1 and F2, as shown in Figure 7. We use the zero-one normalized data in Table 3 to describe the clustering steps. In this table, F3, F4, and F5 are randomly generated useless features; class C1 is composed of the samples numbered 1 to 8, and the other class, C2, is composed of samples A to H.
Figure 7.
A two-dimensional X-or problem.
Table 3.
The X-or data.
Table 4 and Table 5 summarize clustering steps to generate subclasses for each pattern class. In Table 4, the compactness of subclass (7, 8), for example, is calculated as follows:
C(7, 8) = ((0.985 − 0.530) + (0.254 − 0.030) + (0.564 − 0.221) + (0.666 − 0.259) + (0.600 − 0.319))/5 = 0.342.
Table 4.
Subclass generation for class C1 by the SHCC.
| Steps | Subclass | C(p, q) | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|---|---|
| 1 | (7, 8) | 0.420 | E, F, G, H | A, B, C, D | F | C, E | C, E, F |
| 2 | (1, 3) | 0.446 | A, B, C, D | E, F, G, H | C, D, G | B, E, F | |
| 3 | (2, 4) | 0.547 | A, B, C, D | E, F, G, H | E | F | |
| 4 | ((7, 8), 6) | 0.580 | E, F, G, H | A, B, C, D | E | F | |
| 5 | ((1, 3), (2, 4)) | 0.668 | A, B, C, D | E, F, G, H | |||
| 6 | (((7, 8), 6), 5) | 0.694 | E, F, G, H | A, B, C, D | F | ||
| C11 | (1, 2, 3, 4) | 0.429 | [0.000, 0.379] | [0.507, 0.985] | |||
| C12 | (5, 6, 7, 8) | 0.429 | [0.485, 1.000] | [0.030, 0.373] |
Table 5.
Subclass generation for class C2 by the SHCC.
| Steps | Subclass | C(p, q) | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|---|---|
| 1 | (A, C) | 0.387 | 1, 2, 3, 4 | 5, 6, 7, 8 | 6 | 1 | 8 |
| 2 | (G, H) | 0.414 | 5, 6, 7, 8 | 1, 2, 3, 4 | 4, 6 | 1, 6 | |
| 3 | ((A, C), D) | 0.538 | 1, 2, 3, 4 | 5, 6, 7, 8 | 4, 6 | 1 | |
| 4 | ((G, H), F) | 0.566 | 5, 6, 7, 8 | 1, 2, 3, 4 | 6 | 1 | |
| 5 | (((G, H), F), E) | 0.628 | 5, 6, 7, 8 | 1, 2, 3, 4 | 6 | ||
| 6 | (((A, C), D), B) | 0.661 | 1, 2, 3, 4 | 5, 6, 7, 8 | 6 | 1 | |
| C21 | (E, F, G, H) | 0.628 | [0.000, 0.394] | [0.000, 0.448] | |||
| C22 | (A, B, C, D) | 0.661 | [0.485, 0.955] | [0.433, 1.000] |
These tables summarize how each feature separates samples from the opposite class. As a result, we obtain the subclass pairs (C11, C12) and (C21, C22), which compose the X-or structure in the plane spanned by the selected features F1 and F2.
3.2. The Hardwood Data
The data is selected from the US Geological Survey (Climate—Vegetation Atlas of North America) [8]. The following eight features describe ten selected hardwoods:
F1: Annual temperature (ANNT) (°C);
F2: January temperature (JANT) (°C);
F3: July temperature (JULT) (°C);
F4: Annual precipitation (ANNP) (mm);
F5: January precipitation (JANP) (mm);
F6: July precipitation (JULP) (mm);
F7: Growing degree days on 5 °C base ×1000 (GDC5);
F8: Moisture index (MITM).
Table 6 shows the min-max descriptions for the ten selected hardwoods. The original data are represented by seven quantile values at 0, 10, 25, 50, 75, 90, and 100 (%). We select here only the 0% and 100% quantile values, since these values describe well the differences between clusters in the unsupervised hierarchical conceptual clustering [6]. In this example, class C1 and class C2 are composed of the east hardwoods {E1, E2, E3, E4, E5} and the west hardwoods {W1, W2, W3, W4, W5}, respectively.
Table 6.
The hardwoods data [8].
Table 7 and Table 8 summarize the clustering steps to generate subclasses for each class by the SHCC. From Table 7, the east hardwoods are reduced to the single class description C1 = (E1, E2, E3, E4, E5) and are separated from the west hardwoods with respect to F1max, F4max, F5min, and F6min.
Table 7.
Subclass generation for east hardwoods by the SHCC.
Table 8.
Subclass generation for west hardwoods by the SHCC.
On the other hand, west hardwoods are separated into subclasses C21 = (W1, W2, W3) and C22 = (W1, W2). Subclass C21 is separated from class C1 with respect to F1max, F2min, F5min, F6min, F8min, and F8max, while C22 is separated from class C1 with respect to F1max, F2max, F4min, F4max, F5min, F5max, F6min, and F7max.
Since F6min has null concept size for C21 and C22, we use the representation in Table 9 for each class.
Table 9.
Three subclasses obtained for the hardwood data.
Figure 8 shows the scatter diagram of the given hardwoods. In this figure, the maximum vertices of the west hardwoods of C21 surround the maximum vertices of the east hardwoods.
Figure 8.
The scatter diagram of the hardwood data.
Table 10 contains test data used to check the behavior of our classification model. Table 11 summarizes the similarity of each test sample to the subclasses. BETURA is classified into C22 with a similarity value of one. CARYA and TILIA are classified into C21 and C1, respectively, with high similarity values. Both CASTANEA and ULMUS are classified into class C1 with slightly lower similarity values than CARYA and TILIA. Figure 9 summarizes the mutual positions of the test samples and the ten hardwoods initially given. This figure shows that there is a difficulty in distinguishing between class C1 and subclass C21.
Table 10.
A test dataset.
Table 11.
The similarity values of the test hardwoods to each pattern class.
Figure 9.
The scatter diagram of the test data.
3.3. Golf Data [9]
The Golf data in Table 12 is described by four features having the following possible values or ranges:
F1: Outlook = {overcast, rain, sunny}, F2: Temperature = [64, 85],
F3: Humidity = [65, 96], and F4: Windy = {TRUE, FALSE}.
Therefore, from (9), we have
|D1| = 3, |D2| = 85 − 64 = 21, |D3| = 96 − 65 = 31, and |D4| = 2.
The compactness (13) of samples 1 and 2 in class C1, for example, is calculated as follows:
C(1, 2) = {1/3 + (83 − 72)/21 + (90 − 78)/31 + 2/2}/4 = 0.561.
Table 12.
Golf data.
| Class | Sample | F1: Outlook | F2: Temp. (℉) | F3: Humidity (%) | F4: Windy |
|---|---|---|---|---|---|
| C1 (Play) | 1 | Overcast | 72 | 90 | TRUE |
| | 2 | Overcast | 83 | 78 | FALSE |
| | 3 | Rainy | 75 | 80 | FALSE |
| | 4 | Overcast | 64 | 65 | TRUE |
| | 5 | Sunny | 75 | 70 | TRUE |
| | 6 | Overcast | 81 | 75 | FALSE |
| | 7 | Rainy | 68 | 80 | FALSE |
| | 8 | Rainy | 70 | 96 | FALSE |
| | 9 | Sunny | 69 | 70 | FALSE |
| C2 (Don’t play) | A | Rainy | 71 | 80 | TRUE |
| | B | Rainy | 65 | 70 | TRUE |
| | C | Sunny | 80 | 90 | TRUE |
| | D | Sunny | 85 | 85 | FALSE |
| | E | Sunny | 72 | 95 | FALSE |
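The C(1, 2) computation above can be checked with a short sketch of the compactness for mixed feature types, using the domain sizes |Dj| from (9); the dict-based encoding of samples 1 and 2 is our assumption.

```python
# Golf data domains: |D1| = 3, |D2| = 21, |D3| = 31, |D4| = 2.
domains = {"Outlook": {"overcast", "rain", "sunny"},
           "Temperature": (64, 85), "Humidity": (65, 96),
           "Windy": {True, False}}

def compactness(A, B, dom):
    """Average over features of the concept size of the feature-wise Cartesian join."""
    total = 0.0
    for f in A:
        if isinstance(A[f], tuple):                         # interval join
            width = max(A[f][1], B[f][1]) - min(A[f][0], B[f][0])
            total += width / (dom[f][1] - dom[f][0])
        else:                                               # nominal join (set union)
            total += len(A[f] | B[f]) / len(dom[f])
    return total / len(A)

sample1 = {"Outlook": {"overcast"}, "Temperature": (72, 72),
           "Humidity": (90, 90), "Windy": {True}}
sample2 = {"Outlook": {"overcast"}, "Temperature": (83, 83),
           "Humidity": (78, 78), "Windy": {False}}
print(round(compactness(sample1, sample2, domains), 3))     # 0.561
```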
In this example, class C1 and class C2 are composed of the set {1, 2, 3, 4, 5, 6, 7, 8, 9} and the set {A, B, C, D, E}, respectively. Table 13 summarizes the result of the SHCC for class C1 (play). We have three subclasses: C11 = (1, 2, 4, 6), C12 = (3, 7, 8), and C13 = (5, 9). Subclass C11 is described by the single feature value “Outlook = Overcast”, and the compactness is 0.333. Subclass C12 is described as “Outlook = Rainy and Windy = FALSE”, and the compactness is 0.417. Subclass C13 is described as “Outlook = Sunny and Humidity = 70”, and the compactness is 0.167.
Table 13.
Subclass generation for class C1 (Play) by the SHCC.
On the other hand, from Table 14, we have C21 = (C, D, E) and C22 = (A, B) with the compactness 0.328 and 0.417, respectively. Subclass C21 is described as “Outlook = Sunny and Humidity = [85, 95]”, and C22 as “Outlook = Rainy and Windy = TRUE”. These class descriptions are easily summarized as the decision tree in Figure 10, where Humidity = 77.5 is the midpoint value between [70, 70] and [85, 95].
Table 14.
Subclass generation for class C2 (Don’t play) by the SHCC.
Figure 10.
Decision tree for the golf data.
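As a further illustration, the following sketch rebuilds the golf subclass regions of Tables 13 and 14 as Cartesian joins of their member samples (memberships as stated above) and classifies a hypothetical new day by the rule of Section 2.3.2; the event encoding and the test pattern are illustrative assumptions.

```python
from functools import reduce

# Domain sizes |Dj| for the golf data.
DOM = {"Outlook": 3, "Temp": 85 - 64, "Humidity": 96 - 65, "Windy": 2}

def ev(outlook, temp, humid, windy):
    """Encode one golf sample as an event (singleton sets / degenerate intervals)."""
    return {"Outlook": frozenset({outlook}), "Temp": (temp, temp),
            "Humidity": (humid, humid), "Windy": frozenset({windy})}

DATA = {"1": ev("Overcast", 72, 90, True),  "2": ev("Overcast", 83, 78, False),
        "3": ev("Rainy", 75, 80, False),    "4": ev("Overcast", 64, 65, True),
        "5": ev("Sunny", 75, 70, True),     "6": ev("Overcast", 81, 75, False),
        "7": ev("Rainy", 68, 80, False),    "8": ev("Rainy", 70, 96, False),
        "9": ev("Sunny", 69, 70, False),
        "A": ev("Rainy", 71, 80, True),     "B": ev("Rainy", 65, 70, True),
        "C": ev("Sunny", 80, 90, True),     "D": ev("Sunny", 85, 85, False),
        "E": ev("Sunny", 72, 95, False)}

def join(A, B):
    """Cartesian join: set union for nominal features, interval hull otherwise."""
    return {f: A[f] | B[f] if isinstance(A[f], frozenset)
            else (min(A[f][0], B[f][0]), max(A[f][1], B[f][1])) for f in A}

def psize(value, f):
    """Concept size of a single feature value."""
    width = len(value) if isinstance(value, frozenset) else value[1] - value[0]
    return width / DOM[f]

def similarity(E, R):
    """Asymmetric similarity (19) from a pattern E to a subclass region R."""
    J = join(E, R)
    return sum(psize(R[f], f) / psize(J[f], f) for f in E) / len(E)

# Subclass regions as Cartesian joins of their member samples (Tables 13 and 14).
MEMBERS = {("C1", "C11"): "1246", ("C1", "C12"): "378", ("C1", "C13"): "59",
           ("C2", "C21"): "CDE", ("C2", "C22"): "AB"}
REGIONS = {k: reduce(join, (DATA[s] for s in v)) for k, v in MEMBERS.items()}

new_day = ev("Overcast", 70, 85, False)                 # hypothetical test pattern
best = max(REGIONS, key=lambda k: similarity(new_day, REGIONS[k]))
print(best)   # ('C1', 'C11'): the day falls inside the "Outlook = Overcast" subclass
```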
4. Discussion
This paper presented a region-oriented pattern classification method for mixed feature-type data based on the Cartesian system model (D(d), ⊞, ⊠). We defined the notion of the concept size and then the compactness for objects and/or clusters in the feature space D(d) as the similarity measure based on the Cartesian join operator ⊞. In unsupervised hierarchical conceptual clustering, the compactness played not only the role of the similarity measure but also the roles of the cluster quality measure and the feature-effectiveness criterion. By minimizing the compactness in each clustering step, we could find meaningful structures, e.g., a functional structure and a cluster structure, with respect to the obtained robustly informative features.
On the other hand, in the supervised hierarchical conceptual clustering of a given pattern class, we used the mutual neighborhood condition based on the Cartesian meet ⊠ to assure the separability of the class from other pattern classes, in addition to the evaluation of the compactness in each clustering step.
In the X-or problem, the two pattern classes mutually overlap in each of the given five features. By the supervised hierarchical conceptual clustering, the two subclasses of each pattern class that organize the X-or form are exactly detected with respect to the selected features F1 and F2.
In the hardwood data, the SHCC generated a single class description C1 for the east hardwoods and two subclasses C21 and C22 for the west hardwoods. For the selected informative features, the scatter diagram and the classification results of the test samples show the difficulty of clearly separating class C1 from subclass C21.
In the golf data, the SHCC generated three subclasses, C11 = {Outlook = Overcast}, C12 = {Outlook = Rainy and Windy = FALSE}, and C13 = {Outlook = Sunny and Humidity = 70}, for class C1 (Play), and two subclasses, C21 = {Outlook = Sunny and Humidity = [85, 95]} and C22 = {Outlook = Rainy and Windy = TRUE}, for class C2 (Don’t play), without using any feature evaluation criterion of the kind employed in, for example, ID3/C4.5. These results easily lead to the decision tree in Figure 10.
Finally, we should note that we could obtain subclass descriptions with informative features by applying the supervised hierarchical conceptual clustering to each pattern class against the other pattern classes. This approach largely simplifies the process of designing a pattern classifier for mixed feature-type symbolic data.
Author Contributions
Conceptualization and methodology, M.I. and H.Y.; original draft preparation, M.I. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by JSPS KAKENHI (Grants-in-Aid for Scientific Research) Grant Number 25330268.
Data Availability Statement
The original data presented in the study are openly available from the U.S. Geological Survey at http://pubs.usgs.gov/pp/p1650-b/, reference number [8] (accessed on 7 August 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Liu, H.; Motoda, H. Computational Methods of Feature Selection; CRC Press: London, UK, 2007.
2. Miao, J.; Niu, L. A survey on feature selection. Procedia Comput. Sci. 2016, 91, 919–926.
3. Solorio-Fernández, S.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2020, 53, 907–948.
4. Billard, L.; Diday, E. Symbolic Data Analysis: Conceptual Statistics and Data Mining; Wiley: Chichester, UK, 2007.
5. Huang, H.S. Supervised feature selection: A tutorial. Artif. Intell. Res. 2015, 4, 22–37.
6. Ichino, M.; Umbleja, K.; Yaguchi, H. Unsupervised feature selection for histogram-valued symbolic data using hierarchical conceptual clustering. Stats 2021, 4, 359–384.
7. Ichino, M.; Yaguchi, H. Symbolic pattern classifiers based on the Cartesian system model. In Data Science, Classification, and Related Methods; Hayashi, C., Yajima, K., Bock, H.-H., Ohsumi, N., Tanaka, Y., Baba, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 1998.
8. Histogram Data by the U.S. Geological Survey, Climate-Vegetation Atlas of North America. Available online: http://pubs.usgs.gov/pp/p1650-b/ (accessed on 20 November 2010).
9. Wu, X.; Kumar, V. (Eds.) The Top Ten Algorithms in Data Mining; Chapman and Hall/CRC: New York, NY, USA, 2009.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).