
The Lookup Table Regression Model for Histogram-Valued Symbolic Data

School of Science and Engineering, Tokyo Denki University, Hatoyama, Saitama 350-0394, Japan
Stats 2022, 5(4), 1271-1293; https://doi.org/10.3390/stats5040077
Submission received: 2 November 2022 / Revised: 23 November 2022 / Accepted: 26 November 2022 / Published: 4 December 2022

Abstract

This paper presents the Lookup Table Regression Model (LTRM) for histogram-valued symbolic data. We first transform the given symbolic data into a numerical data table by the quantile method. Then, under the selected response variable, we apply the Monotone Blocks Segmentation (MBS) to the obtained numerical data table. If the selected response variable and some remaining explanatory variable(s) organize a monotone structure, the MBS generates a Lookup Table composed of interval values. For a given object, we search for the nearest value of an explanatory variable, and the corresponding value of the response variable becomes the estimated value. If the response variable and the explanatory variable(s) are covariate but follow a non-monotonic structure, we need to divide the given data into several monotone substructures. For this purpose, we apply hierarchical conceptual clustering to the given data, and we obtain Multiple Lookup Tables by applying the MBS to each of the substructures. We show the usefulness of the proposed method using an artificial data set and real data sets.

1. Introduction

Regression models have been extended to various complex data types. Bock and Diday [1], Billard and Diday [2,3], and Diday [4] present various methods of Symbolic Data Analysis (SDA) for complex data types, including regression models for histogram-valued data. Additional linear regression models for histogram-valued variables were developed by Irpino and Verde [5,6], Lima Neto and De Carvalho [7,8,9], and Dias and Brito [10,11]. In these studies, several types of functional forms between the response variable and the explanatory variable(s) have been proposed under appropriately defined optimality criteria.
The author and colleagues previously proposed a generalized Minkowski metric to analyze mixed feature-type data [12] and a feature selection method to detect geometrically thin covariate structures embedded in a given data set [13]. This paper describes the Lookup Table Regression Model (LTRM) for histogram-valued symbolic data (Ichino [14]). We apply the quantile method (Ichino [15]) to the given symbolic data table of size (N objects) × (p variables), and we represent each object by (m + 1) p-dimensional numerical vectors, called the quantile vectors, for a preselected integer m. The integer m controls the granularity of the descriptions of symbolic objects. To the new numerical data table of size {N × (m + 1)} × p, we apply the algorithm called the Monotone Blocks Segmentation (MBS). The MBS interchanges the N × (m + 1) rows according to the values of the selected response variable, from the smallest to the largest. For each of the remaining p − 1 explanatory variables, i.e., columns, the MBS segments the feature values into blocks so that the generated blocks, i.e., interval values, satisfy the monotone property. The MBS discards columns that have only a single block. Therefore, the MBS detects monotone covariate relations existing between the response variable and the explanatory variable(s). Finally, the MBS obtains the Lookup Table of size N’ × p’, where N’ < N × (m + 1) and p’ < p. Each element of the table is an interval value corresponding to a segmented block. We realize the interval-valued estimation rule for the response variable by searching for the “nearest element” in the Lookup Table.
The structure of this paper is as follows: Section 2 describes the quantile method to represent multi-dimensional histogram-valued data. Section 3 proposes the basic procedure of the MBS and the LTRM using the Fats and Oils data [12,14,15,16]. In Section 4, we illustrate that the MBS does not work well for non-monotonic data structures using artificially generated data, and we introduce hierarchical conceptual clustering to obtain the Multi-Lookup Table Regression Model (M-LTRM); we then show the usefulness of the M-LTRM on a real data set. Section 5 discusses the obtained results, and Section 6 summarizes concluding remarks.

2. Representation of Objects by Quantile Vectors and Bin Rectangles

Let U = {ωi, i = 1, 2, ..., N} be the set of given objects, and let the feature variables Fj, j = 1, 2, ..., p, describe each object. Let Dj be the domain of feature Fj, j = 1, 2, ..., p. Then, the feature space is defined by
D(p)= D1 × D2 × ··· × Dp.
Each element of D(p) is represented by
E = E1 × E2 × ··· × Ep,
where Ej is the feature value of Fj, j = 1, 2, ..., p.

2.1. Histogram-Valued Feature

For each object ωi, let each feature Fj be represented by a histogram value:
Eij = {[aijk, aij(k+1)), pijk; k = 1, 2, ..., nij},
where pij1 + pij2 + … + pijnij = 1 and nij is the number of bins that compose the histogram Eij.
Therefore, the Cartesian product of p histogram values represents object ωi:
Ei = Ei1 × Ei2 × ··· × Eip.
Since the interval-valued feature is a special case of the histogram feature with nij = 1 and pij1 = 1, the representation (3) reduces to an interval:
Eij = [aij1, aij2).
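As a running representation for the sketches in this paper, the following minimal Python fragment fixes one way to hold a histogram value Eij in code; the class name HistogramValue and its fields are illustrative choices, not notation from the paper.

```python
from dataclasses import dataclass

@dataclass
class HistogramValue:
    """One histogram feature value E_ij: bins [a_k, a_{k+1}) with probabilities p_k."""
    edges: list[float]   # n + 1 increasing bin edges a_1, ..., a_{n+1}
    probs: list[float]   # n bin probabilities summing to 1

    def __post_init__(self):
        assert len(self.edges) == len(self.probs) + 1
        assert abs(sum(self.probs) - 1.0) < 1e-9

# An interval value is the one-bin special case, e.g. the Iodine value of Linseed:
iodine_linseed = HistogramValue(edges=[170.0, 204.0], probs=[1.0])
```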

2.2. Representation of Histograms by Common Number of Quantiles

Let ωi ∈ U be the given object, and let Eij be a histogram value in (3) for a feature Fj. Then, under the assumption that the nij bins have uniform distributions, we define the cumulative distribution function Fij(x) of the histogram (3) as:
Fij(x) = 0 for x ≤ aij1;
Fij(x) = pij1(x − aij1)/(aij2 − aij1) for aij1 ≤ x < aij2;
Fij(x) = Fij(aij2) + pij2(x − aij2)/(aij3 − aij2) for aij2 ≤ x < aij3;
······
Fij(x) = Fij(aijnij) + pijnij(x − aijnij)/(aij(nij+1) − aijnij) for aijnij ≤ x < aij(nij+1);
Fij(x) = 1 for aij(nij+1) ≤ x.
Figure 1 illustrates a cumulative distribution function for a histogram feature value, where c1, c2, and c3 are cut points for the case m = 4, and q1, q2, and q3 are the corresponding quantile values.
Our general procedure to obtain a common representation for histogram-valued data is as follows.
(1) We choose a common number m of quantiles.
(2) Let c1, c2, ..., cm−1 be preselected cut points dividing the range of the distribution function Fij(x) into continuous intervals, i.e., bins, with the probabilities preselected for the cut points.
(3) For the given cut points c1, c2, ..., cm−1, we obtain the corresponding quantiles by solving the following equations:
Fij(xij0) = 0, (i.e., xij0 = aij1)
Fij(xij1) = c1, Fij(xij2) = c2, ..., Fij(xij(m−1)) = cm−1, and
Fij(xijm) = 1, (i.e., xijm = aijnij+1).
Therefore, we describe each object ωi ∈ U for each feature Fj using an (m + 1)-tuple:
(xij0, xij1, xij2, …, xij(m−1), xijm), j = 1, 2, …, p,
and the corresponding histogram using:
Eij = {[xijk, xij(k+1)), (ck+1ck); k = 0, 1, ..., m−1}, j = 1, 2, ..., p,
where we assume that c0 = 0 and cm = 1. In (7), (ck+1ck), k = 0, 1, ..., m−1, denote the bin probabilities given by the preselected cut point probabilities c1, c2, ..., cm−1. In the quartile case again, m = 4 and c1 = 1/4, c2 = 2/4, and c3 = 3/4, and the four bins, [xij0, xij1), [xij1, xij2), [xij2, xij3), and [xij3, xij4), have the same probability 1/4.
It should be noted that the numbers of bins of the given histograms are mutually different in general. However, we can obtain (m + 1)-tuples as a common representation for all histograms by selecting an integer m and a set of cut points.
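The quantile computation above amounts to inverting the piecewise-linear CDF. A minimal sketch, reusing the illustrative HistogramValue class from Section 2.1:

```python
def histogram_quantiles(h: HistogramValue, cuts: list[float]) -> list[float]:
    """Return the (m + 1)-tuple (x_0, x_1, ..., x_m) for cut points
    0 = c_0 < c_1 < ... < c_{m-1} < c_m = 1, inverting the piecewise-linear
    CDF built from the bins (uniform distribution assumed inside each bin)."""
    quantiles = []
    for c in [0.0] + cuts + [1.0]:
        cum = 0.0
        x = h.edges[-1]                      # c = 1 maps to the last edge
        for k, p in enumerate(h.probs):
            if cum + p >= c:                 # the quantile falls in bin k
                frac = 0.0 if p == 0 else (c - cum) / p
                x = h.edges[k] + frac * (h.edges[k + 1] - h.edges[k])
                break
            cum += p
        quantiles.append(x)
    return quantiles

# Quartile representation (m = 4): c_1 = 1/4, c_2 = 2/4, c_3 = 3/4
print(histogram_quantiles(iodine_linseed, [0.25, 0.5, 0.75]))
# -> [170.0, 178.5, 187.0, 195.5, 204.0]
```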

2.3. Quantile Vectors and Bin-Rectangles

For each object ωi ∈ U, we define (m + 1) p-dimensional numerical vectors, called the quantile vectors, as follows:
xik = (xi1k, xi2k, …, xipk), k = 0, 1, ..., m.
We call xi0 and xim the minimum quantile vector and the maximum quantile vector, respectively. Therefore, the m + 1 quantile vectors {xi0, xi1, ..., xim} in Rp, together with the cut point probabilities, describe each object ωi ∈ U.
The components of the m + 1 quantile vectors in (8) for object ωi ∈ U satisfy the inequalities:
xij0 ≤ xij1 ≤ xij2 ≤ ··· ≤ xij(m−1) ≤ xijm, j = 1, 2, …, p.
Therefore, the m + 1 quantile vectors in (8) for object ωi ∈ U satisfy the monotone property:
xi0 ≤ xi1 ≤ ··· ≤ xim.
For the series of quantile vectors xi0, xi1, ..., xim of object ωi ∈ U, we define a series of m p-dimensional rectangles, called bin-rectangles, spanned by the adjacent quantile vectors xik and xi(k+1), k = 0, 1, ..., m−1, as follows:
B(xik, xi(k+1)) = [xi1k, xi1(k+1)] × [xi2k, xi2(k+1)] × ··· × [xipk, xip(k+1)], k = 0, 1, ..., m−1.
Figure 2 illustrates two objects represented by two-dimensional bin-rectangles in the quartile case. Since a p-dimensional rectangle in Rp is equivalent to a conjunctive logical expression, we also use the term concept for a rectangular expression in the space Rp.
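Building the quantile vectors and bin-rectangles from the per-feature quantiles is a transpose-and-pair operation; a short sketch under the same assumptions as above:

```python
def quantile_vectors(obj: list[HistogramValue], cuts: list[float]) -> list[list[float]]:
    """Stack per-feature quantiles into m + 1 p-dimensional quantile vectors
    x_i0 <= x_i1 <= ... <= x_im."""
    per_feature = [histogram_quantiles(h, cuts) for h in obj]   # p rows of m + 1 values
    return [list(col) for col in zip(*per_feature)]             # transpose -> m + 1 vectors

def bin_rectangles(qvecs: list[list[float]]) -> list:
    """The k-th bin-rectangle B(x_ik, x_i(k+1)) is the box spanned coordinate-wise
    by the adjacent quantile vectors x_ik and x_i(k+1)."""
    return [list(zip(qvecs[k], qvecs[k + 1])) for k in range(len(qvecs) - 1)]
```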

3. Monotone Blocks Segmentation (MBS) and Lookup Table Regression Model (LTRM)

We use the Fats and Oils data in Table 1 to describe the basic ideas of the MBS and the LTRM. Each object is described by four interval-valued features, F1, F2, F3, and F4. For each object, the Major acids feature, F5, takes a set of acids from nine possible acids. We assigned a rank value to each acid according to its occurrence frequency over the eight objects as follows.
{Lu: 1, A: 2, C: 2, Ln: 4, M: 5, S: 6, P: 7, L: 8, O: 9}
For each object, we assign an interval value given by the minimum rank value and the maximum rank value according to (12). For example, Linseed takes the minimum rank value 4 and the maximum rank value 9. Hence, the interval value is [4, 9].
Since an interval is a special histogram composed of one bin, we split each object into two sub-objects, the minimum sub-object and the maximum sub-object, described by five-dimensional quantile vectors, i.e., the minimum quantile vector and the maximum quantile vector. Table 2 shows the obtained quantile representation of our numerical data of size {(8 objects) × 2} × (5 features).
In this example, we select the Iodine value as the response variable, and the remaining four features are the explanatory variables. In Table 3, we interchanged the given sixteen quantile vectors according to the Iodine value, from the minimum value 40 to the maximum value 208. Then, we segment each column into blocks so that the generated blocks satisfy the monotone property. We use colors to show different blocks. A sketch of this segmentation step follows.
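The exact block-forming rule of the MBS is given in prose above; the following minimal Python sketch implements one consistent reading of it, under which a column may be segmented in either the increasing or the decreasing direction (the Freezing point column of Table 3, for instance, decreases as the Iodine value grows). The function names are ours.

```python
import numpy as np

def monotone_blocks(column: np.ndarray) -> list:
    """Segment a column (rows already ordered by the response) at every index
    where all earlier values lie strictly below all later values (increasing
    direction) or strictly above them (decreasing direction); the direction
    giving more blocks is kept."""
    n = len(column)
    pre_max = np.maximum.accumulate(column)
    pre_min = np.minimum.accumulate(column)
    suf_min = np.minimum.accumulate(column[::-1])[::-1]
    suf_max = np.maximum.accumulate(column[::-1])[::-1]
    up = [i + 1 for i in range(n - 1) if pre_max[i] < suf_min[i + 1]]
    down = [i + 1 for i in range(n - 1) if pre_min[i] > suf_max[i + 1]]
    cuts = up if len(up) >= len(down) else down
    bounds = [0] + cuts + [n]
    return list(zip(bounds[:-1], bounds[1:]))

def mbs(table: np.ndarray, response: int) -> dict:
    """Monotone Blocks Segmentation: sort the quantile rows by the response
    column, segment every other column into monotone blocks, and drop columns
    that yield only a single block. Each kept block is stored as
    (explanatory min, explanatory max, response min, response max)."""
    rows = table[np.argsort(table[:, response])]
    lookup = {}
    for j in range(table.shape[1]):
        if j == response:
            continue
        blocks = monotone_blocks(rows[:, j])
        if len(blocks) > 1:          # single-block columns carry no information
            lookup[j] = [(rows[a:b, j].min(), rows[a:b, j].max(),
                          rows[a:b, response].min(), rows[a:b, response].max())
                         for a, b in blocks]
    return lookup
```

On the Fats and Oils rows, this reading reproduces the Specific gravity blocks of Table 4 (e.g., [0.858, 0.870] for Iodine value [40, 77]) and the two decreasing Freezing point blocks.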
The feature most covariate with the response variable is the Specific gravity. The Saponification value and Major acids consist of single blocks, so we omit these features from the explanatory variables. Figure 3 is the scatter plot of the eight interval-valued objects for the most covariate feature variables: Iodine value and Specific gravity. The given eight objects are clearly placed along a monotonic curve.
Table 4 is the obtained Lookup Table, where several elements are reduced (merged) interval values. Based on this Lookup Table, we can estimate the Iodine value for each object by using the Specific gravity and the Freezing point. For example, the Specific gravity of Linseed is [0.930, 0.935]. The minimum value 0.930 suggests the minimum value 170 of the response value [170, 192]. On the other hand, the maximum value 0.935 suggests the maximum value 204 of the response value [204, 204]. As a result, the estimated Iodine value is [170, 204]. The Freezing point of Linseed is [−27, −18] and is included in the interval [−27, 6]. Therefore, the Lookup Table suggests the estimated value [79, 208].
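The nearest-element estimation rule can be sketched as follows, on the lookup entries produced by the mbs sketch above; the tie-breaking inside min is an implementation choice:

```python
def estimate(lookup_col: list, x: float) -> tuple:
    """Interval estimate of the response from one quantile value x of an
    explanatory variable: take the block containing x, or the nearest block,
    and return that block's response interval."""
    def distance(block):
        lo, hi, _, _ = block
        return 0.0 if lo <= x <= hi else min(abs(x - lo), abs(x - hi))
    _, _, rlo, rhi = min(lookup_col, key=distance)
    return rlo, rhi

# Linseed's Specific gravity [0.930, 0.935] against Table 4:
# estimate(col, 0.930) -> (170, 192) and estimate(col, 0.935) -> (204, 204),
# which combine to the interval estimate [170, 204] described in the text.
```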
Table 5 shows the estimated values for the Fats and Oils data. We should note that most of the estimated interval values include the given Iodine values; only the estimated interval for Sesame is shorter than the actual value. The estimation by Specific gravity yields finer results than the estimation by Freezing point.
In order to check the ability of our LTRM, we use the test data of further fats and oils in Table 6. Table 7 is the estimated result based on our Lookup Table in Table 4. The estimation accuracy for plant oils is better than that for animal fats. These results suggest the following possibilities.
  • Our Fats and Oils data is composed of six plant oils and only two animal fats. By increasing the number of sample objects, we may obtain a more accurate Lookup Table.
  • On the other hand, increasing the number of sample objects may complicate the covariate relations between the response variable and the explanatory variables. For example, if we separate the plant oils and the animal fats, we may have better Lookup Tables for the respective categories.
Based on these possibilities, we discuss the Multi-Lookup Table Regression Model (M-LTRM) using hierarchical conceptual clustering in the next section.

4. Multi-Lookup Table Regression Model (M-LTRM)

4.1. Illustration by Oval Data

We use the artificial data in Figure 4 and Table 8. This data set was used to check the capability of the unsupervised feature selection method using hierarchical conceptual clustering in [16]. Sixteen small rectangles organize an oval structure in the first two features, F1 and F2, as shown in Figure 4. In this figure, the oval structure is organized by four differently colored monotone substructures. For each of the sixteen objects, we transform the feature values of F1 and F2 to 0–1 normalized interval values. Then, as the feature values of F3, F4, and F5, we assign three randomly generated interval values in the unit interval [0, 1] to each of the sixteen objects. Table 8 summarizes the sixteen objects described by five interval-valued features.
We rewrote Table 8 in the form {(16 objects) × (2 quantile values)} × (5 features), and then applied the MBS to the data table of size (32 quantile values) × (5 features) under the response variable F1. Table 9 shows the result of our MBS for the Oval data, where each explanatory variable has only one block. This result suggests dividing the oval structure into several monotone segments. For this purpose, we use hierarchical conceptual clustering in the next subsection.

4.2. Hierarchical Conceptual Clustering

As the measure of similarity between objects and/or clusters described by histogram-valued features, we use the compactness proposed in [16]. Under the assumption of a common number of quantiles m and equal bin probabilities, the compactness C(ωi, ωl) defines the concept size spanned by two objects ωi and ωl in the feature space Rp and has the following properties:
(1) 0 ≤ C(ωi, ωl) ≤ 1;
(2) C(ωi, ωi), C(ωl, ωl) ≤ C(ωi, ωl);
(3) C(ωi, ωl) = C(ωl, ωi);
(4) C(ωi, ωr) ≤ C(ωi, ωl) + C(ωl, ωr) may not hold in general.
In the Oval data in Figure 4, C(1, 1) defines the concept size of rectangle 1, and C(1, 2) defines the concept size of the minimum rectangle including rectangles 1 and 2. Therefore, the monotone property (2) is clear.
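For interval-valued descriptions, one simple realization of the compactness is the 0–1 normalized average side length of the Cartesian join rectangle. The exact definition in [16] works bin-wise on the quantile representation, so the sketch below should be read as an assumption-laden simplification, not the definitive formula:

```python
def cartesian_join(e1: list, e2: list) -> list:
    """Smallest interval (per feature) covering both descriptions: the join concept."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(e1, e2)]

def compactness(e1: list, e2: list, domains: list) -> float:
    """Concept size of the join of two interval descriptions, realized here as
    the average side length of the join rectangle, each side normalized by its
    feature domain. Properties (1)-(3) hold; the triangle inequality (4) may
    fail, as noted above."""
    join = cartesian_join(e1, e2)
    spans = [(b - a) / (hi - lo) for (a, b), (lo, hi) in zip(join, domains)]
    return sum(spans) / len(spans)
```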
Let U = {ω1, ω2, ..., ωN} be the given set of objects, and let each object ωi be described using p histograms in the feature space Rp as Ei = Ei1 × Ei2 × ··· × Eip. We assume that all histogram values for all objects have the same number of quantiles m and the same bin probabilities (Algorithm 1).
Algorithm 1 (Hierarchical Conceptual Clustering (HCC) [16])
Step 1: For each pair of objects ωi and ωl in U, evaluate the compactness C(ωi, ωl) and find the pair ωq and ωr that minimizes the compactness.
Step 2: Add the merged concept ωqr = {ωq, ωr} to U and delete ωq and ωr from U. The merged concept ωqr is again described using p histograms as Eqr = Eqr1 × Eqr2 × ··· × Eqrp by the Cartesian join operation defined in [16], under the assumption of m quantiles and equal bin probabilities.
Step 3: Repeat Step 1 and Step 2 until U includes only one concept, i.e., the whole concept.
It should be noted that minimizing the compactness between objects is equivalent to maximizing the dissimilarity between the merged concept of the objects and the whole concept. In this sense, the compactness plays not only the role of a similarity measure between objects and/or clusters but also the role of a cluster quality measure.
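Algorithm 1 then reduces to a short agglomerative loop over the compactness sketched above; again a sketch under the interval-valued simplification:

```python
def hcc(objects: list, domains: list) -> list:
    """Algorithm 1 as a sketch: repeatedly merge the pair of current concepts
    with the smallest compactness; returns the merge history as
    (member set, concept size) pairs, from which a dendrogram can be drawn."""
    concepts = [({i + 1}, e) for i, e in enumerate(objects)]
    history = []
    while len(concepts) > 1:
        pairs = [(i, j) for i in range(len(concepts))
                 for j in range(i + 1, len(concepts))]
        i, j = min(pairs, key=lambda ij: compactness(concepts[ij[0]][1],
                                                     concepts[ij[1]][1], domains))
        members = concepts[i][0] | concepts[j][0]
        merged = cartesian_join(concepts[i][1], concepts[j][1])
        history.append((members, compactness(merged, merged, domains)))
        concepts = [c for k, c in enumerate(concepts) if k not in (i, j)] + [(members, merged)]
    return history
```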
The Oval data is a special histogram-valued data set in which each feature value is an interval taken with probability one. By applying the HCC to the Oval data, we obtain the dendrogram in Figure 5. We cut the dendrogram at the concept size 0.8 to find the least number of monotone substructures. As a result, we obtained four clusters: (a) = (1, 2, 3), (b) = (4, 5, 6), (c) = (7, 8, 9, 10, 11), and (d) = (12, 13, 14, 15, 16). Table 10 shows the results of the MBS for these four clusters, and Table 11 shows the final Lookup Tables for the Oval data. In this example, the resolution of the Lookup Tables for clusters (b) and (d) is better than that for (a) and (c). We can estimate the value of the response variable F1 for each object by finding the nearest values of the explanatory variable F2 in these four Lookup Tables, as sketched below.
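Cutting the merge history at a concept size (0.8 here) and running the MBS per cluster yields the Multiple Lookup Tables; a sketch building on the functions above:

```python
def cut(history: list, n_objects: int, threshold: float = 0.8) -> list:
    """Read the clusters off the HCC merge history: every merge whose concept
    size stays below the threshold is kept; singletons that never took part in
    a kept merge remain clusters of their own."""
    clusters = [{i + 1} for i in range(n_objects)]
    for members, size in history:
        if size > threshold:
            break
        clusters = [c for c in clusters if not c <= members] + [members]
    return clusters

def multi_lookup_tables(cluster_rows: dict, table: np.ndarray, response: int) -> dict:
    """M-LTRM: one Lookup Table per monotone substructure; `cluster_rows` maps
    a cluster label to the row indices of its quantile vectors in `table`."""
    return {name: mbs(table[rows], response) for name, rows in cluster_rows.items()}
```

For the Oval data, cut(history, 16, 0.8) would return the four member sets (a)–(d), after which multi_lookup_tables produces the four Lookup Tables of Table 11 (each object contributing its two quantile rows).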
From the results for the Fats and Oils data and the Oval artificial data, we observe the following facts.
  • The MBS is able to detect a covariate relation between the response variable and explanatory variable(s) if the covariate relation has a monotone structure. In other words, the MBS has a feature selection capability when the target covariate relation has a monotone structure.
  • On the other hand, the unsupervised feature selection method using hierarchical conceptual clustering in [16] can detect “geometrically thin structures” embedded in the given histogram-valued data. The covariation of F1 and F2 is found by evaluating the compactness for each of the five features in each step of clustering [16]. Therefore, the compactness also plays the role of a feature effectiveness criterion.

4.3. M-LTRM for the Hardwood Data

The data is selected from the US Geological Survey (Climate-Vegetation Atlas of North America) [17]. The number of objects is ten and the number of features is eight.
Table 12 shows the quantile values for the selected ten hardwoods under the feature (Mean) Annual Temperature (ANNT). We selected the following eight features to describe the objects (hardwoods). The data formats for the other features F2–F8 are the same as in Table 12.
  • F1: Annual Temperature (ANNT) (°C);
  • F2: January Temperature (JANT) (°C);
  • F3: July Temperature (JULT) (°C);
  • F4: Annual Precipitation (ANNP) (mm);
  • F5: January Precipitation (JANP) (mm);
  • F6: July Precipitation (JULP) (mm);
  • F7: Growing Degree Days on 5 °C base ×1000 (GDC5);
  • F8: Moisture Index (MITM).
Our hardwoods data is a numerical data table of size {(10 objects) × (7 quantile values)} × (8 features). We first apply the quantile method of principal component analysis (PCA) in [15] to our data. Table 13 shows the obtained first two principal components, and Figure 6 shows the mutual positions of the eight features by two eigenvectors. We have two groups, {ANNP, JANP, JULP, and MITM} and {ANNT, JANT, JULT, and GDC5}. Figure 7 shows the mutual positions of the ten objects in the first factor plane. Each hardwood is represented by six line segments connecting the minimum quantile vector to the maximum quantile vector.
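The quantile method of PCA itself is developed in [15]; the following is only a schematic reconstruction under our reading, assuming the stacked quantile-vector matrix and a Spearman rank correlation, to show how the factor-plane plots of Figures 6 and 7 arise:

```python
import numpy as np

def quantile_pca(data: np.ndarray):
    """Sketch: rank-transform each column (Spearman), eigendecompose the rank
    correlation matrix, and project every quantile vector onto the first
    factor plane. `data` is the {N x (m + 1)} x p matrix of quantile vectors."""
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    z = (ranks - ranks.mean(axis=0)) / ranks.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)          # Spearman correlation of the data
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = z @ eigvec[:, :2]                   # coordinates in the first factor plane
    return scores, eigvec[:, :2], eigval[:2] / eigval.sum()

# Plotting the m + 1 score points of one object in order and joining consecutive
# points gives the connected line segments of Figures 7 and 9 (six segments, m = 6).
```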
We should note the following facts for the results of PCA.
  • The first principal component is the size factor and the second is the shape factor, and the sum of their contribution ratios is very high.
  • East hardwoods show similar line graphs, and their maximum quantile vectors occupy mutually near positions.
  • West hardwoods are separated into two groups: {ACER WEST and ALNUS WEST} and {FRAXINUS WEST, JUGLANS WEST, and QUERCUS WEST}. The last line segments are very long, especially for ACER WEST and ALNUS WEST.
We exchange the roles of “objects” and “features” in our hardwoods data. Then, we apply the quantile method of PCA to the dual data of the form {(8 features) × (7 quantile values)} × (10 objects). Table 14 shows the first two principal components for the dual data, and Figure 8 shows the mutual positions of the ten hardwoods by two eigenvectors. West hardwoods are again separated into two different groups. Figure 9 shows the mutual positions of the eight features in the first factor plane. Each feature is represented by a series of six line segments connecting the minimum quantile vector to the maximum quantile vector.
We should note the following facts for the results of our dual-PCA.
  • The first principal component is the size factor and the second is the shape factor, and the sum of their contribution ratios is very high.
  • We have two groups, {ANNP, JANP, JULP, and MITM} and {ANNT, JANT, JULT, and GDC5}. MITM and GDC5 have very long line graphs compared to the other members of each group.
We assume that GDC5 is the response variable and the seven other variables are explanatory variables. Then, we applied the MBS to the data of size (10 × 7 quantile values) × (8 features), and we obtained the result in Table 15. We used colors to show different blocks. The MBS selected only ANNT, JANT, and JULT as explanatory variables.
Table 16 shows the obtained Lookup Table for the Hardwood data. In this table, ANNT shows the strongest connection to the response variable GDC5. We use the test data in Table 17 to check the estimation ability of our Lookup Table. Table 18 summarizes the estimated results for our test data. In the range [0.1, 2.5] of GDC5, the result requires further improvement; the PCA result in Figure 7 suggests the use of clustering.
Under the assumption of quartiles, we applied the HCC to the Hardwood data and obtained the result in Figure 10. By cutting the dendrogram at the concept size 0.8, we have four clusters: (FW, JW, QW), (AcE, AlE, FE, JE, QE), (AcW), and (AlW). The concept size for the East Hardwood cluster is 0.671. By the addition of AcW to the East Hardwood cluster, the concept size increases substantially, to 0.847. However, the further addition of AlW shows only a small increase, to 0.935. This fact suggests that AcW and AlW are mutually similar. The PCA result in Figure 7 also supports the cluster (AcW, AlW). Therefore, we suppose three clusters, C1 = (AcW, AlW), C2 = (AcE, AlE, FE, JE, QE), and C3 = (FW, JW, QW), in the following discussions.
From the viewpoint of the unsupervised feature selection in [16], the most informative features are ANNP and then JULP in clustering steps 1–7, and the feature JANT is important to separate the cluster (FW, JW, QW) from the other large cluster in step 8. Figure 11 shows the mutual positions of the ten hardwoods in the plane of ANNP and JANT, and it is very similar to the result in Figure 7.
We applied the MBS to each of three clusters C1 = (AcW, AlW), C2 = (AcE, AlE, FE, JE, QE) and C3 = (FW, JW, QW). Table 19, Table 20 and Table 21 are the Lookup Tables for these three clusters. In Table 19, JULT contributes in the range [0.1, 1.1] of GDC5. On the other hand, in Table 20 and Table 21, ANNT is strongly connected to the whole range of GDC5.
Figure 12 shows the scatter diagram of the Hardwood data for ANNT and GDC5, where all hardwoods lie in a narrow region. We use the estimation of GDC5 by ANNT for cluster C2, since the Lookup Table for C2 covers the widest range of ANNT among the three Lookup Tables. Figure 13 shows the graph of GDC5 against ANNT under cluster C2, and Table 22 shows the estimation result for the test data. We obtained a better estimation result than that in Table 18.

5. Discussion

The quantile method is a unified quantification method for histogram-valued symbolic data. When each of N objects is described by p histogram-valued features, we select a common integer m and represent each of the p histogram-valued features by (m + 1) quantile values. As a result, we have a numerical data table of size {N × (m + 1) quantiles} × (p features). Based on this type of numerical data table, we proposed the Lookup Table Regression Model (LTRM) using the Monotone Blocks Segmentation (MBS). Then, we extended the LTRM to the Multi-Lookup Table Regression Model (M-LTRM) using the hierarchical conceptual clustering in [16]. In the following, we discuss our results.
  • As a mixed feature-type symbolic data set, we used the Fats and Oils data. This data set consists of eight objects, two fats and six plant oils, described by four interval-valued features and one multinomial feature. By the quantile method, we transformed it into a numerical data table of size (8 × 2 quantiles) × (5 features). Then, we applied our MBS to this data under the assumption that the response variable is the Iodine value. The MBS selected Specific gravity as the feature most covariate with the Iodine value, followed by the Freezing point. Using the obtained Lookup Table, we checked the estimation of the Iodine value of each given object using Specific gravity and Freezing point. The estimated results are reasonable for the given fats and oils. We also checked our Lookup Table with a set of independent fats and oils. The result for the test samples suggests the use of clustering and the Multi-Lookup Table Regression Model (M-LTRM) to improve the estimation accuracy.
  • The MBS works well to generate a meaningful Lookup Table when the response variable and the explanatory variable(s) follow a monotone structure. Therefore, if the response variable and the explanatory variable(s) follow a non-monotonic data structure, we have to divide the given data structure into several monotone substructures. We applied the hierarchical conceptual clustering in [16] to the Oval artificial data, and we could obtain four monotone substructures and the corresponding Lookup Tables.
  • As a general histogram-valued data set, we used the Hardwood data of size {(10 objects) × (7 quantiles)} × (8 features). We applied the quantile method of Spearman PCA to this data. As a monotone structure, the first factor plane draws three streams, C1 = (AcW, AlW), C2 = (AcE, AlE, FE, JE, QE), and C3 = (FW, JW, QW), with a very high contribution ratio. We also applied the Spearman PCA to the dual data of size {(8 features) × (7 quantiles)} × (10 objects), and we obtained a monotone structure composed of two groups, (ANNP, JANP, JULP, MITM) and (ANNT, JANT, JULT, GDC5), in the first factor plane with a very high contribution ratio. We applied the MBS to the Hardwood data under the assumption that GDC5 is the response variable, and we obtained the Lookup Table with the explanatory variables ANNT, JANT, and JULT. Therefore, our MBS has the ability of supervised feature selection.
  • For a further improvement of the Lookup Table, we applied the hierarchical conceptual clustering to the Hardwood data and obtained again the three clusters C1, C2, and C3. From the viewpoint of unsupervised feature selection by the hierarchical conceptual clustering, the features ANNP and then JULP are informative during clustering steps 1–7, and JANT is important to separate cluster C3 from clusters C1 and C2. In fact, the scatter plot of the ten hardwoods in the plane of ANNP and JANT is very similar to the result of the PCA for the Hardwood data. We applied the MBS again to each of the clusters C1, C2, and C3, and we obtained three different Lookup Tables. As a result, the Lookup Table for C2 has the highest resolution for estimating GDC5 using ANNT, and it achieves a better estimation result for our test data.

6. Conclusions

This paper proposed the Lookup Table Regression Model (LTRM) and Multi-Lookup Table Regression Model (M-LTRM) for histogram-valued symbolic data. The proposed models are very different from the traditional functional models developed for histogram-valued symbolic data.
The Monotone Blocks Segmentation (MBS) is simple but effective for detecting covariate explanatory variable(s) for the selected response variable and for obtaining the Lookup Table. For a given object, the LTRM estimates each quantile value of the selected response variable by finding the nearest quantile value of an explanatory variable in the Lookup Table. We also showed that the quantile method of symbolic PCA is useful for detecting monotone structures embedded in multidimensional histogram-valued symbolic data. Furthermore, the dual-PCA is useful for detecting covariate explanatory variables for the selected response variable.
When the quantile method of symbolic PCA does not work well for the given histogram-valued symbolic data, the MBS also fails to produce a useful Lookup Table. In such general cases, hierarchical conceptual clustering (HCC) becomes useful. By the HCC, we may divide the given data set into several sub-data sets that have monotone structures. Then, by applying the MBS to each monotone substructure, we obtain the Multi-Lookup Table Regression Model, which is more flexible than the LTRM.

Funding

This work was supported by JSPS KAKENHI (Grants-in-Aid for Scientific Research) Grant Number 25330268.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

Not Applicable.

Acknowledgments

The author thanks Kadri Umbleja for her collaboration.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Bock, H.-H.; Diday, E. Analysis of Symbolic Data; Springer: Berlin/Heidelberg, Germany, 2000.
  2. Billard, L.; Diday, E. Symbolic Data Analysis: Conceptual Statistics and Data Mining; Wiley: Chichester, UK, 2007.
  3. Billard, L.; Diday, E. Regression analysis for interval-valued data. In Data Analysis, Classification and Related Methods, Proceedings of the Conference of the International Federation of Classification Societies (IFCS’00); Springer: Berlin/Heidelberg, Germany, 2000; pp. 347–369.
  4. Diday, E. Thinking by classes in data science: The symbolic data analysis paradigm. WIREs Comput. Stat. 2016, 8, 172–205.
  5. Verde, R.; Irpino, A. Ordinary least squares for histogram data based on Wasserstein distance. In Proceedings of COMPSTAT’2010, Paris, France, 22–27 August 2010; Lechevallier, Y., Saporta, G., Eds.; Physica-Verlag: Heidelberg, Germany, 2010; pp. 581–589.
  6. Irpino, A.; Verde, R. Linear regression for numeric symbolic variables: Ordinary least squares approach based on Wasserstein distance. Adv. Data Anal. Classif. 2015, 9, 81–106.
  7. Lima Neto, E.A.; De Carvalho, F.A.T. Center and range method for fitting a linear regression model for symbolic interval data. Comput. Stat. Data Anal. 2008, 52, 1500–1515.
  8. Lima Neto, E.A.; De Carvalho, F.A.T. Constrained linear regression models for symbolic interval-valued variables. Comput. Stat. Data Anal. 2010, 54, 333–347.
  9. Lima Neto, E.A.; Cordeiro, M.; De Carvalho, F.A.T. Bivariate symbolic regression models for interval-valued variables. J. Stat. Comput. Simul. 2011, 81, 1727–1744.
  10. Dias, S.; Brito, P. Linear regression model with histogram-valued variables. Stat. Anal. Data Min. 2015, 8, 75–113.
  11. Dias, S.; Brito, P. (Eds.) Analysis of Distributional Data; CRC Press: Boca Raton, FL, USA, 2022.
  12. Ichino, M.; Yaguchi, H. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. Syst. Man Cybern. 1994, 24, 698–708.
  13. Ono, Y.; Ichino, M. A new feature selection method based on geometrical thickness. In Proceedings of KESDA’98, Luxembourg, 27–28 April 1998; Volume 1, pp. 19–38.
  14. Ichino, M. The lookup table regression model for symbolic data. In Proceedings of the Data Sciences Workshop, Paris-Dauphine University, Paris, France, 12–13 November 2015.
  15. Ichino, M. The quantile method of symbolic principal component analysis. Stat. Anal. Data Min. 2011, 4, 184–198.
  16. Ichino, M.; Umbleja, K.; Yaguchi, H. Unsupervised feature selection for histogram-valued symbolic data using hierarchical conceptual clustering. Stats 2021, 4, 359–384.
  17. Histogram Data by the U.S. Geological Survey, Climate-Vegetation Atlas of North America. Available online: http://pubs.usgs.gov/pp/p1650-b/ (accessed on 11 November 2010).
Figure 1. Cumulative distribution function and cut point probabilities.
Figure 2. Representation of objects by bin-rectangle in the quartile case.
Figure 3. Scatter plot of eight objects for the most covariate variables.
Figure 4. Oval data.
Figure 5. Dendrogram using HCC for Oval artificial data in Table 8.
Figure 6. Scatter plot of eight features by two eigen vectors.
Figure 7. Results of PCA for hardwoods data.
Figure 8. Scatter plot of ten hardwoods by two eigen vectors.
Figure 9. Results of dual-PCA for hardwoods data.
Figure 10. Dendrogram using HCC for Hardwood data.
Figure 11. Scatter diagram of Hardwood data by ANNP and JANT.
Figure 12. Scatter diagram of Hardwood data for ANNT and GDC5.
Figure 13. Estimation of GDC5 by ANNT for cluster C2.
Table 1. Fats and Oils data (Ichino and Yaguchi, 1994 [12]).
Fats and Oils | Specific Gravity (g/cm³): F1 | Freezing Point (°C): F2 | Iodine Value: F3 | Saponification Value: F4 | Major Acids: F5
Linseed | [0.930, 0.935] | [−27, −18] | [170, 204] | [118, 196] | L, Ln, O, P, M
Perilla | [0.930, 0.937] | [−5, −4] | [192, 208] | [188, 197] | L, Ln, O, P, S
Cotton | [0.916, 0.918] | [−6, −1] | [99, 113] | [189, 198] | L, O, P, M, S
Sesame | [0.920, 0.926] | [−6, −4] | [104, 116] | [187, 196] | L, O, P, S, A
Camellia | [0.916, 0.917] | [−21, −15] | [80, 82] | [189, 193] | L, O
Olive | [0.914, 0.919] | [0, 6] | [79, 90] | [190, 199] | L, O, P, S
Beef | [0.860, 0.870] | [30, 38] | [40, 48] | [190, 199] | O, P, M, S, C
Hog | [0.858, 0.864] | [22, 32] | [53, 77] | [190, 202] | L, O, P, M, S, Lu
L: Linoleic acid; Ln: Linolenic acid; O: Oleic acid; P: Palmitic acid; M: Myristic acid; S: Stearic acid; A: Arachidic acid; C: Capric acid; Lu: Lauric acid.
Table 2. Fats and Oils data by quantile vectors.
Fats and Oils | Specific Gravity | Freezing Point | Iodine Value | Saponification Value | Major Acids
Linseed 1 | 0.930 | −27 | 170 | 118 | 4
Linseed 2 | 0.935 | −18 | 204 | 196 | 9
Perilla 1 | 0.930 | −5 | 192 | 188 | 4
Perilla 2 | 0.937 | −4 | 208 | 197 | 9
Cotton 1 | 0.916 | −6 | 99 | 189 | 5
Cotton 2 | 0.918 | −1 | 113 | 198 | 9
Sesame 1 | 0.920 | −6 | 104 | 187 | 2
Sesame 2 | 0.926 | −4 | 116 | 193 | 9
Camellia 1 | 0.916 | −21 | 80 | 189 | 8
Camellia 2 | 0.917 | −15 | 82 | 193 | 9
Olive 1 | 0.914 | 0 | 79 | 187 | 6
Olive 2 | 0.919 | 6 | 90 | 196 | 9
Beef 1 | 0.860 | 30 | 40 | 190 | 2
Beef 2 | 0.870 | 38 | 48 | 199 | 9
Hog 1 | 0.858 | 22 | 53 | 190 | 1
Hog 2 | 0.864 | 32 | 77 | 202 | 9
Table 3. Monotone Blocks Segmentation (MBS) for Fats and Oils data.
Fats and Oils | Iodine Value | Specific Gravity | Freezing Point | Saponification Value | Major Acids
Beef 1 | 40 | 0.860 | 30 | 190 | 3
Beef 2 | 48 | 0.870 | 38 | 199 | 9
Hog 1 | 53 | 0.858 | 22 | 190 | 1
Hog 2 | 77 | 0.864 | 32 | 202 | 9
Olive 1 | 79 | 0.914 | 0 | 187 | 6
Camellia 1 | 80 | 0.916 | −21 | 189 | 8
Camellia 2 | 82 | 0.917 | −15 | 193 | 9
Olive 2 | 90 | 0.919 | 6 | 196 | 9
Cotton 1 | 99 | 0.916 | −6 | 189 | 5
Sesame 1 | 104 | 0.920 | −6 | 187 | 2
Cotton 2 | 113 | 0.918 | −1 | 198 | 9
Sesame 2 | 116 | 0.926 | −4 | 193 | 9
Linseed 1 | 170 | 0.930 | −27 | 118 | 4
Perilla 1 | 192 | 0.930 | −5 | 188 | 4
Linseed 2 | 204 | 0.935 | −18 | 196 | 9
Perilla 2 | 208 | 0.937 | −4 | 197 | 9
Table 4. The Lookup Table for the Fats and Oils data.
Iodine Value | Specific Gravity | Freezing Point
[40, 77] | [0.858, 0.870] | [22, 38]
[79, 79] | [0.914, 0.914] |
[79, 208] | | [−27, 6]
[80, 113] | [0.916, 0.920] |
[116, 116] | [0.926, 0.926] |
[170, 192] | [0.930, 0.930] |
[204, 204] | [0.935, 0.935] |
[208, 208] | [0.937, 0.937] |
Table 5. Estimated result by the LTRM for the Fats and Oils data.
Fats and Oils | Estimated by Specific Gravity | Estimated by Freezing Point | Actual Value
Linseed | [170, 204] | [79, 208] | [170, 204]
Perilla | [170, 208] | [79, 208] | [188, 197]
Cotton | [80, 113] | [79, 208] | [99, 113]
Sesame | [113, 116] | [79, 208] | [104, 116]
Camellia | [80, 113] | [79, 208] | [80, 82]
Olive | [79, 113] | [79, 208] | [79, 90]
Beef | [40, 77] | [40, 77] | [40, 48]
Hog | [40, 77] | [40, 77] | [55, 77]
Table 6. Test data for the LTRM.
Fats and Oils | Specific Gravity | Freezing Point | Iodine Value
Corn | [0.920, 0.928] | [−18, −10] | [88, 147]
Soybean | [0.922, 0.934] | [−8, −7] | [114, 138]
Rice bran | [0.916, 0.922] | [−10, −5] | [92, 115]
Horse fat | [0.90, 0.95] | [30, 35] | [65, 95]
Sheep tallow | [0.89, 0.90] | [30, 35] | [35, 46]
Chicken fat | [0.91, 0.92] | [30, 32] | [76, 80]
Table 7. Estimated result for the Test data.
Fats and Oils | Estimated by Specific Gravity | Estimated by Freezing Point | Actual Value
Corn | [113, 170] | [79, 208] | [88, 147]
Soybean | [113, 204] | [79, 208] | [114, 138]
Rice bran | [80, 113] | [79, 208] | [92, 115]
Horse fat | [79, 208] | [40, 77] | [65, 95]
Sheep tallow | [77, 79] | [40, 77] | [35, 46]
Chicken fat | [79, 113] | [40, 77] | [76, 80]
Table 8. Oval artificial data.
Object | F1 | F2 | F3 | F4 | F5
1 | [0.629, 0.798] | [0.905, 0.986] | [0.000, 0.982] | [0.002, 0.883] | [0.360, 0.380]
2 | [0.854, 0.955] | [0.797, 0.905] | [0.002, 0.421] | [0.573, 1.000] | [0.754, 0.761]
3 | [0.921, 1.000] | [0.527, 0.716] | [0.193, 0.934] | [0.035, 0.477] | [0.406, 0.587]
4 | [0.865, 0.933] | [0.378, 0.500] | [0.452, 0.854] | [0.213, 0.604] | [0.000, 0.074]
5 | [0.775, 0.876] | [0.257, 0.338] | [0.300, 0.614] | [0.425, 0.979] | [0.217, 0.568]
6 | [0.663, 0.764] | [0.135, 0.216] | [0.712, 1.000] | [0.904, 0.968] | [0.103, 0.950]
7 | [0.494, 0.596] | [0.041, 0.122] | [0.293, 0.470] | [0.023, 0.086] | [0.765, 0.902]
8 | [0.225, 0.427] | [0.000, 0.081] | [0.633, 0.872] | [0.000, 0.582] | [0.719, 0.852]
9 | [0.112, 0.213] | [0.041, 0.149] | [0.167, 0.802] | [0.056, 0.129] | [0.124, 0.642]
10 | [0.022, 0.112] | [0.162, 0.270] | [0.026, 0.718] | [0.418, 0.851] | [0.549, 0.853]
11 | [0.000, 0.090] | [0.297, 0.392] | [0.096, 0.759] | [0.438, 0.938] | [0.495, 0.760]
12 | [0.045, 0.112] | [0.446, 0.554] | [0.826, 0.962] | [0.230, 0.755] | [0.104, 0.189]
13 | [0.101, 0.202] | [0.608, 0.676] | [0.367, 0.570] | [0.236, 0.684] | [0.683, 0.930]
14 | [0.213, 0.292] | [0.676, 0.811] | [0.371, 0.381] | [0.086, 0.305] | [0.009, 1.000]
15 | [0.315, 0.438] | [0.811, 0.919] | [0.049, 0.585] | [0.056, 0.891] | [0.528, 0.881]
16 | [0.483, 0.562] | [0.878, 1.000] | [0.402, 0.609] | [0.150, 0.769] | [0.207, 0.732]
Table 9. The result of MBS for the Oval data.
Object | F1 | F2 | F3 | F4 | F5
11 1 | 0.000 | 0.297 | 0.096 | 0.438 | 0.495
10 1 | 0.022 | 0.162 | 0.026 | 0.418 | 0.549
12 1 | 0.045 | 0.446 | 0.826 | 0.230 | 0.104
11 2 | 0.090 | 0.392 | 0.759 | 0.938 | 0.760
13 1 | 0.101 | 0.608 | 0.367 | 0.236 | 0.683
9 1 | 0.112 | 0.041 | 0.167 | 0.056 | 0.124
10 2 | 0.112 | 0.270 | 0.718 | 0.851 | 0.853
12 2 | 0.112 | 0.554 | 0.962 | 0.755 | 0.189
13 2 | 0.202 | 0.676 | 0.570 | 0.684 | 0.930
9 2 | 0.213 | 0.149 | 0.802 | 0.129 | 0.642
14 1 | 0.213 | 0.676 | 0.371 | 0.086 | 0.009
8 1 | 0.225 | 0.000 | 0.633 | 0.000 | 0.719
14 2 | 0.292 | 0.811 | 0.381 | 0.305 | 1.000
15 1 | 0.315 | 0.811 | 0.049 | 0.056 | 0.528
8 2 | 0.427 | 0.081 | 0.872 | 0.582 | 0.852
15 2 | 0.438 | 0.919 | 0.585 | 0.891 | 0.881
16 1 | 0.483 | 0.878 | 0.402 | 0.150 | 0.207
7 1 | 0.494 | 0.041 | 0.293 | 0.023 | 0.765
16 2 | 0.562 | 1.000 | 0.609 | 0.769 | 0.732
7 2 | 0.596 | 0.122 | 0.470 | 0.086 | 0.902
1 1 | 0.629 | 0.905 | 0.000 | 0.002 | 0.360
6 1 | 0.663 | 0.135 | 0.712 | 0.904 | 0.103
6 2 | 0.764 | 0.216 | 1.000 | 0.968 | 0.950
5 1 | 0.775 | 0.257 | 0.300 | 0.425 | 0.217
1 2 | 0.798 | 0.986 | 0.982 | 0.883 | 0.380
2 1 | 0.854 | 0.797 | 0.002 | 0.673 | 0.754
4 1 | 0.865 | 0.378 | 0.452 | 0.213 | 0.000
5 2 | 0.876 | 0.338 | 0.614 | 0.979 | 0.568
3 1 | 0.921 | 0.527 | 0.193 | 0.035 | 0.406
4 2 | 0.933 | 0.500 | 0.854 | 0.604 | 0.074
2 2 | 0.955 | 0.905 | 0.421 | 1.000 | 0.761
3 2 | 1.000 | 0.716 | 0.934 | 0.477 | 0.587
Table 10. The results of MBS for four clusters.
(a)
Object | F1 | F2
1 1 | 0.629 | 0.905
1 2 | 0.798 | 0.986
2 1 | 0.854 | 0.797
3 1 | 0.921 | 0.527
2 2 | 0.955 | 0.905
3 2 | 1.000 | 0.716
(b)
Object | F1 | F2
6 1 | 0.663 | 0.135
6 2 | 0.764 | 0.216
5 1 | 0.775 | 0.257
4 1 | 0.865 | 0.378
5 2 | 0.876 | 0.338
4 2 | 0.933 | 0.500
(c)
Object | F1 | F2
11 1 | 0.000 | 0.297
10 1 | 0.022 | 0.162
11 2 | 0.090 | 0.392
10 2 | 0.112 | 0.270
9 1 | 0.112 | 0.041
9 2 | 0.213 | 0.149
8 1 | 0.225 | 0.000
8 2 | 0.427 | 0.081
7 1 | 0.494 | 0.041
7 2 | 0.596 | 0.122
(d)
Object | F1 | F2
12 1 | 0.045 | 0.446
13 1 | 0.101 | 0.608
12 2 | 0.112 | 0.554
13 2 | 0.202 | 0.676
14 1 | 0.213 | 0.676
14 2 | 0.292 | 0.811
15 1 | 0.315 | 0.811
15 2 | 0.438 | 0.919
16 1 | 0.483 | 0.878
16 2 | 0.562 | 1.000
Table 11. Look up tables for the Oval data.
(a)
F1 | F2
[0.629, 0.798] | [0.905, 0.986]
[0.854, 1.000] | [0.527, 0.905]
(b)
F1 | F2
[0.663, 0.663] | [0.135, 0.135]
[0.764, 0.764] | [0.216, 0.216]
[0.775, 0.775] | [0.257, 0.257]
[0.865, 0.876] | [0.338, 0.378]
[0.933, 0.933] | [0.500, 0.500]
(c)
F1 | F2
[0.000, 0.112] | [0.162, 0.392]
[0.112, 0.596] | [0.000, 0.149]
(d)
F1 | F2
[0.045, 0.112] | [0.446, 0.608]
[0.202, 0.213] | [0.676, 0.676]
[0.292, 0.315] | [0.811, 0.811]
[0.438, 0.483] | [0.878, 0.919]
[0.562, 0.562] | [1.000, 1.000]
Table 12. The quantile values for ANNT.
Taxon Name | Mean Annual Temperature (°C): 0% | 10% | 25% | 50% | 75% | 90% | 100%
ACER EAST | −2.3 | 0.6 | 3.8 | 9.2 | 14.4 | 17.9 | 24
ACER WEST | −3.9 | 0.2 | 1.9 | 4.2 | 7.5 | 10.3 | 21
ALNUS EAST | −10 | −4.4 | −2.3 | 0.6 | 6.1 | 15.0 | 21
ALNUS WEST | −12 | −4.6 | −3.0 | 0.3 | 3.2 | 7.6 | 19
FRAXINUS EAST | −2.3 | 1.4 | 4.3 | 8.6 | 14.1 | 17.9 | 23
FRAXINUS WEST | 2.6 | 9.4 | 11.5 | 17.2 | 21.2 | 22.7 | 24
JUGLANS EAST | 1.3 | 6.9 | 9.1 | 12.4 | 15.5 | 17.6 | 21
JUGLANS WEST | 7.3 | 12.6 | 14.1 | 16.3 | 19.4 | 22.7 | 27
QUERCUS EAST | −1.5 | 3.4 | 6.3 | 11.2 | 16.4 | 19.1 | 24
QUERCUS WEST | −1.5 | 6.0 | 9.5 | 14.6 | 17.9 | 19.9 | 27
Table 13. The first two principal components for hardwood data.
Spearman | Pc1 | Pc2
Eigen values | 6.691 | 0.909
Contribution (%) | 83.635 | 11.357
Eigen vector | Pc1 | Pc2
ANNT | 0.362 | −0.363
JANT | 0.346 | −0.427
JULT | 0.372 | −0.208
ANNP | 0.359 | 0.369
JANP | 0.337 | 0.365
JULP | 0.352 | 0.170
GDC5 | 0.365 | −0.331
MITM | 0.335 | 0.484
Table 14. The first two principal components for hardwoods data (dual).
Spearman | Pc1 | Pc2
Eigen values | 8.79 | 0.54
Contribution (%) | 87.89 | 5.40
Eigen vectors | Pc1 | Pc2
AcE | 0.323 | 0.156
AcW | 0.305 | 0.308
AlE | 0.317 | 0.354
AlW | 0.303 | 0.496
FE | 0.331 | 0.008
FW | 0.305 | −0.436
JE | 0.318 | −0.071
JW | 0.309 | −0.497
QE | 0.331 | −0.056
QW | 0.320 | −0.253
Table 15. The result of MBS for Hardwood data.
Q.V. | GDC5 | ANNT | JANT | JULT | ANNP | JANP | JULP | MITM
AcW10.1−3.9−23.87.1105500.14
AlE10.1−10.2−30.97.12209280.22
AlW10.1−12.2−30.57.1170400.22
QuW10.3−1.5−129.785100.08
AcE10.5−2.3−24.611.541510560.62
AcW20.50.2−11.811.33802880.49
AlW20.5−4.6−25.711.533518210.49
AlE20.6−4.4−26.513.238019580.53
AcW30.71.9−10.112.850554230.61
AlW30.7−3−21.612.841023410.59
AlE30.8−2.3−22.714.847523740.69
FrE10.8−2.3−23.813.52706180.39
QuE10.8−1.5−22.713.52407320.21
AlW40.90.3−15.114.451037570.72
FrW10.92.6−7.412.585500.09
JuE111.3−14.615.25259410.63
AcW41.14.2−6.914.975092380.75
AlE41.10.6−18.116.577046910.93
AlW51.13.2−7.615.679093740.87
AcE21.20.6−18.316.672023770.89
FrE21.31.4−1817.441012540.6
QuW21.46−5.416.22951020.35
AcE31.53.8−12.318.283540890.94
QuE21.53.4−14.518.450514560.66
AcW51.67.5−1.317.61175176520.91
AlW61.67.6−0.817.51385199870.97
FrE31.64.3−13.11965521740.83
JuW11.67.3−1.317.1235100.2
AlE51.96.1-819.81060801080.99
FrW229.4−0.2182551220.27
JuE226.9−9.120.378522770.88
QuE326.3−9.720.574525770.88
QuW329.50.218.938513190.48
AcW62.210.33.319.91860267710.98
FrE42.48.6−622.291055940.95
AcE42.59.2−5.122.21010691000.97
JuE32.59.1−5.422.189040910.93
FrW32.711.53.521.236019120.38
QuE42.911.2−2.823.996061970.95
JuW2312.63.3203559510.42
JuE43.112.4−124.71030711010.96
FrE53.514.11.725.71130851080.98
JuW33.514.15.620.944511760.57
AcE53.614.42.325.81200961130.99
QuW43.614.66.821.154025540.63
AlE63.7153.725.712351061260.99
JuE53.915.53.826.41190961120.97
QuE54.216.4526.91175901100.98
JuW44.316.38.822.7625171600.69
FrW44.517.29.124.348528430.49
JuE64.717.6727.713501271240.99
AcE64.817.97.927.313551271350.99
AlW74.818.710.828.346856674521
FrE64.817.97.527.413201181270.99
QuW54.817.911.324.2815631500.77
QuE65.219.19.52813451221330.99
JuW55.419.412.525.3790242000.78
QuW65.519.915.327.411601632010.88
AcW75.620.61129.243706161601
AlE75.920.914.129.116501662121
FrW55.921.21328.970577600.64
JuE7621.412.429.415601502041
FrW66.522.714.730.41155217850.78
JuW66.522.718.427.7905352240.89
FrE76.723.218.129.516301662181
AcE76.823.818.928.816301662221
FrW76.924.416.933.125554142060.97
QuE7724.219.631.816301612221
JuW78.526.626.231.312451663280.94
QuW78.527.226.233.825554003500.99
Table 16. Lookup table of Hardwood data.
GDC5 | ANNT | JANT | JULT
[0.1, 0.1] | | | [7.1, 7.1]
[0.1, 2.5] | [−12.2, 10.3] | |
[0.1, 4.2] | | [−30.9, 6.8] |
[0.3, 0.5] | | | [9.7, 11.5]
[0.6, 0.9] | | | [12.5, 14.8]
[1.0, 1.1] | | | [14.9, 15.2]
[1.0, 6.8] | | | [15.6, 30.4]
[2.7, 3.1] | [11.2, 12.6] | |
[3.5, 3.6] | [14.1, 14.6] | |
[3.7, 4.8] | [15.0, 18.7] | |
[4.3, 6.5] | | [7.0, 15.3] |
[4.5, 4.8] | [17.2, 18.7] | |
[5.2, 5.5] | [19.1, 19.9] | |
[5.6, 5.9] | [20.6, 21.2] | |
[6.0, 6.5] | [21.4, 22.7] | |
[6.5, 6.9] | | [16.9, 18.9] |
[6.7, 7.0] | [23.2, 24.4] | |
[6.9, 8.5] | | | [31.3, 33.8]
[7.0, 7.0] | | [19.6, 19.6] |
[8.5, 8.5] | [26.6, 27.2] | [26.2, 26.2] |
Table 17. Test data for the Lookup table of Hardwood data.
Taxon Name | Quantiles (%) | 0 | 10 | 25 | 50 | 75 | 90 | 100
BETULA | GDC5 | 0.0 | 0.3 | 0.6 | 0.9 | 1.5 | 3.2 | 5.7
BETULA | ANNT | −13.4 | −8.4 | −5.1 | −1.0 | 3.9 | 12.6 | 20.3
CARYA | GDC5 | 1.4 | 2.1 | 2.6 | 3.4 | 4.5 | 5.2 | 6.7
CARYA | ANNT | 3.6 | 7.5 | 10.0 | 13.6 | 17.2 | 19.4 | 23.5
CASTANEA | GDC5 | 1.4 | 2.2 | 2.8 | 3.7 | 4.6 | 5.2 | 6
CASTANEA | ANNT | 4.4 | 8.6 | 11.3 | 14.9 | 17.5 | 19.2 | 21.5
CARPINUS | GDC5 | 1 | 1.6 | 2 | 2.9 | 4.1 | 5.2 | 8.6
CARPINUS | ANNT | 1.2 | 4.4 | 7 | 11.4 | 16 | 19.2 | 28
TILIA | GDC5 | 1.0 | 1.6 | 1.9 | 2.4 | 3.0 | 3.6 | 5.4
TILIA | ANNT | 1.1 | 3.8 | 5.8 | 8.8 | 12.0 | 14.4 | 19.9
ULMUS | GDC5 | 0.8 | 1.3 | 1.7 | 2.6 | 3.9 | 5 | 6.8
ULMUS | ANNT | −2.3 | 1.7 | 4.9 | 9.7 | 15.3 | 18.6 | 23.8
Table 18. Estimated result for the test data.
Taxon Name | Quantiles (%) | 0 | 10 | 25 | 50 | 75 | 90 | 100
BETULA | GDC5 | 0.0 | 0.3 | 0.6 | 0.9 | 1.5 | 3.2 | 5.7
 | Estimated | <0.1 | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | 3.1 | 5.6
CARYA | GDC5 | 1.4 | 2.1 | 2.6 | 3.4 | 4.5 | 5.2 | 6.7
 | Estimated | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [3.1, 3.6] | 4.5 | [5.2, 5.5] | [6.7, 7.0]
CASTANEA | GDC5 | 1.4 | 2.2 | 2.8 | 3.7 | 4.6 | 5.2 | 6
 | Estimated | [0.1, 2.5] | [0.1, 2.5] | [2.7, 3.1] | 3.7 | [4.5, 4.8] | [5.2, 5.5] | [6.0, 6.5]
CARPINUS | GDC5 | 1 | 1.6 | 2 | 2.9 | 4.1 | 5.2 | 8.6
 | Estimated | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [2.7, 3.1] | [3.7, 4.3] | [5.2, 5.5] | 8.5<
TILIA | GDC5 | 1.0 | 1.6 | 1.9 | 2.4 | 3.0 | 3.6 | 5.4
 | Estimated | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [2.7, 3.1] | [3.5, 3.6] | [5.2, 5.5]
ULMUS | GDC5 | 0.8 | 1.3 | 1.7 | 2.6 | 3.9 | 5 | 6.8
 | Estimated | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [0.1, 2.5] | [3.7, 4.3] | [4.5, 4.8] | [6.7, 7.0]
Table 19. Lookup Table for Cluster C1 = (AcW, AlW).
GDC5 | ANNT | JANT | JULT
[0.1, 0.1] | | | [7.1, 7.1]
[0.1, 0.9] | [−12.2, 1.9] | [−30.5, −10.1] |
[0.5, 0.5] | | | [11.3, 11.5]
[0.7, 0.7] | | | [11.8, 12.8]
[0.9, 1.1] | | | [14.4, 15.6]
[1.1, 1.1] | [3.2, 4.2] | [−7.6, −6.9] |
[1.6, 1.6] | [7.5, 7.6] | [−1.3, −0.8] | [17.5, 17.6]
[2.2, 2.2] | [10.3, 10.3] | [3.3, 3.3] | [19.9, 19.9]
[4.8, 4.8] | [18.7, 18.7] | [10.8, 10.8] | [28.3, 28.3]
[5.6, 5.6] | [20.5, 20.6] | [11.0, 11.0] | [29.2, 29.2]
Table 20. Lookup Table for Cluster C2 = (AcE, AlE, FE, JE, QE).
GDC5 | ANNT | JANT | JULT
[0.1, 0.1] | [−10.2, −10.2] | | [7.1, 7.1]
[0.1, 0.6] | | [−30.9, −24.6] |
[0.5, 0.5] | | | [11.5, 11.5]
[0.5, 0.8] | [−4.4, −1.5] | |
[0.6, 0.6] | | | [13.2, 13.2]
[0.8, 0.8] | | [−23.8, −22.7] | [13.5, 14.8]
[1.0, 1.0] | | | [15.2, 15.2]
[1.0, 1.2] | [0.6, 1.3] | |
[1.0, 1.3] | | [−18.3, −14.6] |
[1.1, 1.1] | | | [16.5, 16.5]
[1.2, 1.2] | | | [16.6, 16.6]
[1.3, 1.3] | [1.4, 1.4] | | [17.4, 17.4]
[1.5, 1.5] | [3.4, 3.8] | | [18.2, 18.4]
[1.5, 1.6] | | [−14.5, −12.3] |
[1.6, 1.6] | [4.3, 4.3] | | [19.0, 19.0]
[1.9, 1.9] | [6.1, 6.1] | | [19.8, 19.8]
[1.9, 2.0] | | [−9.7, −8.0] |
[2.0, 2.0] | [6.3, 6.9] | | [20.3, 20.5]
[2.4, 2.4] | [8.6, 8.6] | [−6.0, −6.0] |
[2.4, 2.5] | | | [22.1, 22.2]
[2.5, 2.5] | [9.1, 9.2] | [−5.4, −5.1] |
[2.9, 2.9] | [11.2, 11.2] | [−2.8, −2.8] | [23.9, 23.9]
[3.1, 3.1] | [12.4, 12.4] | [−1.0, −1.0] | [24.7, 24.7]
[3.5, 3.5] | [14.1, 14.1] | [1.7, 1.7] |
[3.5, 3.7] | | | [25.7, 25.8]
[3.6, 3.6] | [14.4, 14.4] | [2.3, 2.3] |
[3.7, 3.7] | [15.0, 15.0] | [3.7, 3.7] |
[3.9, 3.9] | [15.5, 15.5] | [3.8, 3.8] | [26.4, 26.4]
[4.2, 4.2] | [16.4, 16.4] | [5.0, 5.0] | [26.9, 26.9]
[4.7, 4.7] | [17.6, 17.6] | [7.0, 7.0] |
[4.7, 4.8] | | | [27.3, 27.7]
[4.8, 4.8] | [17.9, 17.9] | [7.5, 7.9] |
[5.2, 5.2] | [19.1, 19.1] | [9.5, 9.5] |
[5.2, 6.8] | | | [28.0, 29.5]
[5.9, 5.9] | [20.9, 20.9] | |
[5.9, 6.0] | | [12.4, 14.1] |
[6.0, 6.0] | [21.4, 21.4] | |
[6.7, 6.7] | [23.2, 23.2] | [18.1, 18.1] |
[6.8, 6.8] | [23.8, 23.8] | [18.9, 18.9] |
[7.0, 7.0] | [24.2, 24.2] | [19.6, 19.6] | [31.8, 31.8]
Table 21. Lookup Table for Cluster C3 = (FW, JW, QW).
GDC5 | ANNT | JANT | JULT
[0.3, 0.3] | [−1.5, −1.5] | [−12.0, −12.0] | [9.7, 9.7]
[0.9, 0.9] | [2.6, 2.6] | [−7.4, −7.4] | [12.5, 12.5]
[1.4, 1.4] | [6.0, 6.0] | [−5.4, −5.4] | [16.2, 16.2]
[1.6, 1.6] | [7.3, 7.3] | [−1.3, −1.3] | [17.1, 17.1]
[2.0, 2.0] | [9.4, 9.5] | [−0.2, 0.2] | [18.0, 18.9]
[2.7, 2.7] | [11.5, 11.5] | |
[2.7, 3.0] | | [3.3, 3.5] |
[2.7, 3.6] | | | [20.0, 21.2]
[3.0, 3.0] | [12.6, 12.6] | |
[3.5, 3.5] | [14.1, 14.1] | [5.6, 5.6] |
[3.6, 3.6] | [14.6, 14.6] | [6.8, 6.8] |
[4.3, 4.3] | [16.3, 16.3] | [8.8, 8.8] | [22.7, 22.7]
[4.5, 4.5] | [17.2, 17.2] | [9.1, 9.1] |
[4.5, 4.8] | | | [24.2, 24.3]
[4.8, 4.8] | [17.9, 17.9] | [11.3, 11.3] |
[5.4, 5.4] | [19.4, 19.4] | [12.5, 12.5] | [25.3, 25.3]
[5.5, 5.5] | [19.9, 19.9] | |
[5.5, 6.5] | | [14.7, 15.3] | [27.4, 30.4]
[6.5, 6.5] | [22.7, 22.7] | [18.4, 18.4] |
[8.5, 8.5] | [26.6, 27.2] | [26.2, 26.2] | [31.3, 33.8]
Table 22. Estimated result for the test data by Lookup Table for cluster C2.
Taxon Name | Quantiles (%) | 0 | 10 | 25 | 50 | 75 | 90 | 100
BETULA | GDC5 | 0.0 | 0.3 | 0.6 | 0.9 | 1.5 | 3.2 | 5.7
 | Estimated | <0.1 | [0.1, 0.5] | 0.5 | [0.8, 1.0] | [1.5, 1.6] | [3.1, 3.5] | [5.2, 5.9]
CARYA | GDC5 | 1.4 | 2.1 | 2.6 | 3.4 | 4.5 | 5.2 | 6.7
 | Estimated | 1.5 | [2.0, 2.4] | [2.5, 2.9] | [3.1, 3.5] | [4.2, 4.7] | [5.2, 5.9] | [6.7, 6.8]
CASTANEA | GDC5 | 1.4 | 2.2 | 2.8 | 3.7 | 4.6 | 5.2 | 6.0
 | Estimated | 1.6 | 2.4 | 2.9 | 3.7 | 4.7 | 5.2 | 6.0
CARPINUS | GDC5 | 1.0 | 1.6 | 2.0 | 2.9 | 4.1 | 5.2 | 8.6
 | Estimated | [1.0, 1.2] | 1.6 | 2.0 | 2.9 | [3.9, 4.2] | 5.2 | 7.0<
TILIA | GDC5 | 1.0 | 1.6 | 1.9 | 2.4 | 3.0 | 3.6 | 5.4
 | Estimated | [1.0, 1.2] | 1.5 | 1.9 | 2.4 | 3.1 | 3.6 | [5.2, 5.9]
ULMUS | GDC5 | 0.8 | 1.3 | 1.7 | 2.6 | 3.9 | 5 | 6.8
 | Estimated | [0.5, 0.8] | [1.3, 1.5] | [1.6, 1.9] | [2.5, 2.9] | [3.7, 3.9] | [4.8, 5.2] | 6.8
