1. Introduction
Symbolic Data Analysis (SDA), first introduced by [1], focuses on analyzing complex data types that can encompass multiple values, distributions, or more generalized forms rather than traditional single-valued types. Unlike classical random variables, which take on a single, specific value, symbolic random variables can represent a range of values, leading to an internal distribution that captures uncertainty or variability within the data. The foundation of SDA relies on the concept that exploratory analyses and statistical inferences are often more meaningful at the group level rather than examining individual values in isolation [2].
SDA emphasizes the importance of group-level summaries—known as symbols—as the primary units of statistical analysis. These symbols, often represented as intervals, histograms, or other distributional forms that occupy a hypercube in a multidimensional space (as opposed to the single values considered with single-valued data), serve as the foundation of statistical analysis for symbolic data [3,4].
Interval-valued data, a specific type of symbolic data, provide a structured way to represent information that naturally falls within ranges rather than precise point values. This approach has gained significant attention recently due to its ability to manage uncertainty and variability more effectively than traditional point-based methods. As discussed by [2,3,5], interval-valued data offer a versatile framework accommodating both univariate and multivariate forms, making them highly suitable for a wide range of real-world applications in various applied sciences.
The attractiveness of interval-valued data lies in their simplicity, flexibility, and broad applicability, driving extensive research on their statistical analysis. Recent studies have investigated both frequentist and Bayesian methods for constructing estimators tailored to interval-valued data (e.g., [5,6]). Moreover, innovative statistical techniques have been developed to explore relationships within interval-valued datasets. For example, ref. [7] introduced a Principal Component Analysis (PCA) method designed to represent intervals, while other methods have refined regression techniques to handle intervals directly. Some researchers have focused on extracting numerical characteristics from intervals, such as the midpoint [8], the midpoint and range [9], or the lower and upper limits [10], and then applying these attributes in standard regression models. Alternatively, other researchers have chosen to work with intervals directly, treating them as the core unit of analysis without converting them into numerical summaries (e.g., [11,12]).
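To illustrate the midpoint–range idea mentioned above, the following sketch converts hypothetical interval observations into midpoints and ranges and fits ordinary least-squares regressions to these summaries. The simulated data and variable names are illustrative only and are not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interval-valued predictor and response: lower/upper bounds.
x_low = rng.uniform(0, 5, size=50)
x_up = x_low + rng.uniform(0.5, 2.0, size=50)
y_low = 2.0 + 1.5 * x_low + rng.normal(0, 0.3, size=50)
y_up = y_low + rng.uniform(0.5, 2.0, size=50)

# Numerical summaries of each interval: midpoint and range.
x_mid, x_rng = (x_low + x_up) / 2, x_up - x_low
y_mid, y_rng = (y_low + y_up) / 2, y_up - y_low

# Ordinary least squares on the midpoints and, analogously, on the ranges.
X_mid = np.column_stack([np.ones_like(x_mid), x_mid])
X_rng = np.column_stack([np.ones_like(x_rng), x_rng])
beta_mid, *_ = np.linalg.lstsq(X_mid, y_mid, rcond=None)
beta_rng, *_ = np.linalg.lstsq(X_rng, y_rng, rcond=None)
print("midpoint model coefficients:", beta_mid)
print("range model coefficients:", beta_rng)
```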
Complex data can also be studied in the context of random matrices, which extend the principles of univariate and multivariate analysis to matrix-based data structures. Unlike conventional multivariate methods that treat data as vectors, matrix-variate approaches are particularly well-suited for the analysis of data that naturally take the form of matrices, which are common in multidimensional settings. These methods are effective for the modeling of complex dependencies where multiple variables are inter-related and organized into rows and columns, capturing two-dimensional structures. Applications of matrix-variate random variables are found in various fields, such as econometrics, multivariate analysis, and machine learning, where they offer sophisticated approaches for the handling of relational data (see, e.g., [13], and references therein).
Matrix-variate methodologies offer significant advantages for the analysis of complex data, as they effectively capture correlations both within rows and across columns of a matrix. This ability allows for the simultaneous analysis of multiple variables while accounting for their interdependencies. Methods such as the use of matrix-variate normal distributions and matrix regression models provide robust frameworks for investigating relationships at both the row and column levels. This dual-level analysis is especially valuable in practical applications involving relational data, such as the joint analysis of multiple financial indicators in econometrics or understanding covariance structures among feature matrices in machine learning. For more information, see [14].
With the increasing complexity and dimensionality of modern datasets, both matrix-variate random variables and interval-valued data provide flexible frameworks to tackle these challenges. Interval-valued data handle uncertainty and variability by representing values as intervals rather than fixed points, effectively accommodating the imprecision common in real-world data. In contrast, matrix-variate methods are adept at modeling complex dependencies within matrix-structured data. Combining these two approaches—matrix-variate methods and interval-valued data—offers a powerful strategy for the analysis of large, complex datasets, allowing for a more comprehensive modeling of uncertainties and dependencies. This integration enhances the flexibility and robustness of statistical models, expanding their applicability to more intricate, high-dimensional scenarios.
This paper aims to explore the integration of matrix-variate and interval-valued data approaches through both frequentist and Bayesian methods. By examining how these methodologies can be combined, it seeks to develop a comprehensive strategy that enhances the analysis of complex data structures, accommodating both uncertainty and multidimensional dependencies. The focus is on leveraging the strengths of both methods—frequentist methods for their precision and objectivity and Bayesian methods for their flexibility and incorporation of prior knowledge—to create robust inferential frameworks. This dual approach is designed to be highly applicable across a range of scientific and practical fields, such as econometrics, machine learning, and bioinformatics, where complex and high-dimensional data are commonly encountered. Ultimately, this study aims to provide new insights and tools for researchers and practitioners to better understand, model, and interpret intricate datasets.
The rest of this paper is organized as follows. We begin with some preliminaries and definitions regarding matrix-variate and tensor distributions in Section 2. We continue by defining likelihood functions and Maximum Likelihood (ML) estimators, along with their asymptotic distributions. Section 3 is dedicated to defining priors, posteriors, and Bayes estimators under the Frobenius norm loss function, as well as the corresponding dominance results. In Section 4, we present simulations to demonstrate the superiority of the proposed estimators and to validate the theoretical results. Section 5 involves an analysis of temperature variations across different seasons using a dataset comprising interval-valued temperature matrices. Finally, concluding remarks are provided in Section 6.
2. Likelihood Function of Interval-Valued Matrices and ML Estimators
Before defining the likelihood function for interval-valued random variables, we introduce the tensor Wishart and inverse tensor Wishart distributions, which generalize the classical Wishart distribution. For a more detailed discussion, see [13,15].
Definition 1 (Tensor Wishart Distribution). Consider a positive definite random tensor, that is, one whose matrix slices are each positive definite for every fixed slice index. Such a tensor is said to follow a tensor Wishart distribution with m degrees of freedom and a given scale tensor. Its probability density function (PDF), given in (1), is written in terms of the determinants and traces of the individual matrix slices, with the generalized multivariate gamma function for tensors serving as the normalizing constant. The degrees of freedom must be large enough to ensure that each matrix slice is invertible, and the expectation of the tensor equals m times the scale tensor.
Definition 2 (Inverse Tensor Wishart Distribution). Let a tensor Wishart random tensor be inverted slice-wise, i.e., each matrix slice is replaced by its inverse. The resulting tensor follows an inverse tensor Wishart distribution with a corresponding scale tensor, and its PDF is again written in terms of the determinants and traces of the individual matrix slices. Its expectation exists provided the degrees of freedom are sufficiently large. When the tensor reduces to a single matrix slice, the distribution in Definition 2 reduces to the classical inverse Wishart distribution, whose PDF is given in (2) together with the corresponding normalizing constant and condition on the degrees of freedom.
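To make Definitions 1 and 2 concrete, the sketch below draws a tensor Wishart and an inverse tensor Wishart variate slice by slice, under the working assumption that the tensor distributions factorize into independent Wishart (respectively, inverse Wishart) matrix slices. The slice dimensions, scales, and degrees of freedom are illustrative placeholders, and the exact construction in [13,15] may differ.

```python
import numpy as np
from scipy.stats import wishart, invwishart

rng = np.random.default_rng(1)

p, n, m = 3, 4, 10                                   # slices, slice dimension, degrees of freedom
scales = [np.eye(n) * (i + 1) for i in range(p)]     # illustrative slice-wise scale matrices

# Tensor Wishart draw: one Wishart matrix per slice (slice-wise independence assumed).
S = np.stack([wishart(df=m, scale=V).rvs(random_state=rng) for V in scales])

# Inverse tensor Wishart: invert slice-wise, or sample inverse Wishart slices directly.
S_inv = np.stack([np.linalg.inv(S[i]) for i in range(p)])
T = np.stack([invwishart(df=m, scale=V).rvs(random_state=rng) for V in scales])

print("S has shape", S.shape, "and T has shape", T.shape)
print("each slice of S is positive definite:",
      all(np.all(np.linalg.eigvalsh(S[i]) > 0) for i in range(p)))
```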
To establish the structure of our model, we now consider the interval-valued random matrix given in (3), in which each element forms an interval whose lower limit does not exceed its upper limit. This setup generalizes the concept of an interval-valued random vector, which represents a hyper-rectangle (or box) in the corresponding Euclidean space. When the object of interest is a matrix whose elements are intervals, the geometric interpretation becomes more complex. In this case, each row of the matrix can be viewed as a vector whose elements are intervals on the real line. Consequently, each row represents a hyper-rectangle, and the entire matrix can be interpreted as a collection, or product, of p such hyper-rectangles.
Overall, the matrix represents a generalized hyper-rectangle (or hyper-parallelepiped) in a higher-dimensional space, where each interval contributes to the overall dimensionality of the object. This matrix structure introduces additional complexity by capturing variability across both rows and columns, resulting in a richer and more intricate geometric representation compared to the vector case.
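As a simple illustration of this structure, an interval-valued matrix can be stored as a pair of equal-sized arrays of lower and upper bounds; the 2 x 3 example below is hypothetical.

```python
import numpy as np

# Hypothetical 2 x 3 interval-valued matrix: each element is [lower, upper].
lower = np.array([[1.0, 2.5, 0.0],
                  [3.0, 1.0, 2.0]])
upper = np.array([[2.0, 4.0, 1.5],
                  [5.0, 1.5, 2.5]])
assert np.all(lower <= upper)

# Each row is a box (product of intervals); its hyper-volume is the product of widths.
widths = upper - lower
row_volumes = widths.prod(axis=1)
print("interval widths:\n", widths)
print("hyper-rectangle volume of each row:", row_volumes)
```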
Given that the matrix in (3) contains aggregated observed values over uniformly distributed intervals, with each lower limit not exceeding the corresponding upper limit, we extend the results reported in [5,16,17] regarding the existence of a one-to-one correspondence between the interval-valued matrix and the pair consisting of the mean matrix and the variance–covariance tensor of its internal distribution. These two quantities are expressed element-wise in Equations (4) and (5) in terms of the interval endpoints.
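Since each element's internal distribution is uniform over its interval, the element-wise mean and variance take the familiar closed forms (a + b) / 2 and (b - a)^2 / 12. The sketch below computes these summaries for an interval-valued matrix stored as lower/upper-bound arrays; treating the variance–covariance tensor as element-wise (one diagonal covariance slice per row) is a simplifying assumption made for this illustration, not necessarily the exact form of Equations (4) and (5).

```python
import numpy as np

def internal_moments(lower, upper):
    """Mean matrix and covariance tensor of element-wise uniform internal distributions."""
    mean_matrix = (lower + upper) / 2.0          # E[U(a, b)] = (a + b) / 2
    var_matrix = (upper - lower) ** 2 / 12.0     # Var[U(a, b)] = (b - a)^2 / 12
    # One diagonal covariance slice per row (independence across columns assumed).
    cov_tensor = np.stack([np.diag(v) for v in var_matrix])
    return mean_matrix, cov_tensor

lower = np.array([[1.0, 2.5], [3.0, 1.0]])
upper = np.array([[2.0, 4.0], [5.0, 1.5]])
M, Sigma = internal_moments(lower, upper)
print("mean matrix:\n", M)
print("covariance tensor shape:", Sigma.shape)
```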
Let each observation in a sample of interval-valued matrices be a random matrix with a given PDF. The associated parameters, namely the mean matrix and the variance–covariance tensor defined in Equations (4) and (5), vary from observation to observation. The distributions of these parameters are given in models (6) and (7): the mean matrix follows a matrix normal distribution, characterized by a mean matrix, a row covariance matrix, and a column covariance matrix, with the PDF in (8), while the variance–covariance tensor follows the tensor Wishart distribution with m degrees of freedom and a scale tensor, as in (1).
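The matrix normal distribution appearing in model (6) can be simulated with the standard construction X = M + A Z B^T, where Z has independent standard normal entries and A, B are Cholesky factors of the row and column covariance matrices; equivalently, scipy.stats.matrix_normal can be used. The parameter values below are placeholders, not quantities from the paper.

```python
import numpy as np
from scipy.stats import matrix_normal

rng = np.random.default_rng(2)

p, q = 4, 4
M = np.zeros((p, q))                      # mean matrix (placeholder)
U = 0.5 * np.eye(p) + 0.5                 # row covariance: unit diagonal, 0.5 off-diagonal
V = np.eye(q)                             # column covariance (placeholder)

# Construction via Cholesky factors: X = M + A Z B^T with U = A A^T and V = B B^T.
A, B = np.linalg.cholesky(U), np.linalg.cholesky(V)
Z = rng.standard_normal((p, q))
X = M + A @ Z @ B.T

# Equivalent draw with scipy.
X2 = matrix_normal(mean=M, rowcov=U, colcov=V).rvs(random_state=rng)
print(X.shape, X2.shape)
```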
2.1. ML Estimators and Related Properties
For a given tensor that can be decomposed into a product of matrix slices, we introduce the determinantal product, defined in (9) as the product of the determinants of these matrices. Assuming that the mean matrices and the variance–covariance tensors are independent, the likelihood factorizes into the two components given in (10) and (11). The ML estimators of the unknown mean matrix, the row and column covariance matrices, and the scale tensor are presented in the following theorem.
Theorem 1. The maximum likelihood (ML) estimators of the mean matrix, the row and column covariance matrices, and the tensor parameter are given by Equations (12)–(15).

Proof. To find the ML estimator of the mean matrix, we differentiate the log-likelihood function with respect to the mean matrix, set the derivative to zero, and solve, which yields (12). For the row and column covariance matrices, we differentiate the log-likelihood with respect to each of them and solve the resulting equations, yielding (13) and (14), respectively. For the tensor parameter, we differentiate the log-likelihood with respect to each matrix slice of the tensor, set the derivative to zero, multiply through, rearrange terms, and solve; this gives the ML estimator of the tensor parameter stated in (15). □
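As a numerical companion to Theorem 1, the sketch below computes the sample-mean estimator of the mean matrix and, for the tensor parameter, the Wishart scale MLE applied slice by slice (for a Wishart sample with known degrees of freedom m, the log-likelihood is maximized at the sample mean divided by m). The data-generating process is deliberately simplified, and whether these expressions coincide exactly with (12) and (15) depends on model details not reproduced here, so this should be read as an illustration under our slice-wise reading.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
N, p, n, m = 200, 3, 4, 12

# Simulated mean matrices and slice-wise Wishart tensors with known parameters.
true_M = np.arange(p * n, dtype=float).reshape(p, n)
true_scales = np.stack([np.eye(n) * (i + 1) for i in range(p)])
means = true_M + rng.standard_normal((N, p, n))          # simplified noise around true_M
tensors = np.stack([
    np.stack([wishart(df=m, scale=true_scales[i]).rvs(random_state=rng) for i in range(p)])
    for _ in range(N)
])

M_hat = means.mean(axis=0)                 # ML estimator of the mean matrix: sample mean
scale_hat = tensors.mean(axis=0) / m       # slice-wise Wishart scale MLE: sample mean / m

print("max abs error in mean matrix:", np.abs(M_hat - true_M).max().round(3))
print("max abs error in scale tensor:", np.abs(scale_hat - true_scales).max().round(3))
```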
2.2. Asymptotic Properties of ML Estimators
Theorem 2. Consider the sets of real symmetric matrices of the relevant row and column dimensions, and let the corresponding subsets of symmetric positive definite matrices form convex regular cones. Stacking the parameters into a single vector, the ML estimators of Theorem 1 are asymptotically normal, with asymptotic covariance given by the inverse of the Fisher information matrix, whose elements are expressed in terms of the row and column covariance matrices.
Proof. The Fisher information matrix for matrix-variate distributions (see, e.g., [13]) is characterized by the covariance of the score, i.e., the gradient of the log-likelihood function. Specifically, for the matrix normal distribution in (8), the Fisher information matrix is the expected outer product of the score, where the expectation is taken with respect to the distribution evaluated at the true parameter values. Note that this covariance is represented as a tensor product, emphasizing that the covariance structure involves interactions between the row and column covariance matrices. Following the theory presented in [18,19], the Fisher information matrix for the matrix normal distribution can be broken down using the Kronecker product properties of the row and column covariances. For matrix-variate distributions parameterized by an m-dimensional vector collecting the mean matrix together with the row and column covariance matrices, the components of the Fisher information matrix follow from matrix differentiation results that account for the second-order derivatives and trace operations needed to properly capture the covariance interactions.
To verify that the stationary point corresponds to a maximum of the log-likelihood function, we conduct a second-derivative test by examining the Hessian matrix of the log-likelihood function at the stationary point; if this Hessian is negative definite, the point is a maximum. Given that the Fisher information matrix equals the expected negative Hessian and thus reflects the curvature of the log-likelihood function, its positive definiteness guarantees that our stationary point is, indeed, a local maximum.
Therefore, the Fisher information matrix for the matrix normal distribution respects the Kronecker product structure and tensor properties of the distribution, as detailed by [13,19]. This completes the proof. □
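The Kronecker structure invoked in the proof can be checked numerically for the mean block: if vec(X) is normal with covariance V ⊗ U, the Fisher information with respect to the vectorized mean is (V ⊗ U)^{-1} = V^{-1} ⊗ U^{-1}, and it equals the covariance of the score. The sketch below verifies this by Monte Carlo for placeholder covariance matrices; it covers only the mean block, not the full parameter vector of Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, N = 3, 2, 200_000

U = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])   # row covariance
V = np.array([[2.0, 0.5], [0.5, 1.0]])                               # column covariance

# vec(X) ~ N(vec(M), V kron U); the score w.r.t. vec(M) is (V kron U)^{-1} (vec(X) - vec(M)).
cov = np.kron(V, U)
precision = np.linalg.inv(cov)
L = np.linalg.cholesky(cov)
centered = (L @ rng.standard_normal((p * q, N))).T      # samples of vec(X) - vec(M)
scores = centered @ precision                           # one score vector per sample

fisher_mc = np.cov(scores, rowvar=False)                # Monte Carlo covariance of the score
fisher_kron = np.kron(np.linalg.inv(V), np.linalg.inv(U))
print("max abs deviation from V^{-1} kron U^{-1}:",
      np.abs(fisher_mc - fisher_kron).max().round(3))
```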
3. Bayesian Setup
In a Bayesian setup, we begin by defining independent prior distributions for the unknown matrix and tensor parameters in models (6) and (7). Using the determinantal product defined in (9), the joint prior distribution of the mean matrix, the row and column covariance matrices, and the tensor of interest is given in (16). Given the independent likelihood functions in (10) and (11), the joint posterior distribution of the matrix parameters and the tensor can be written as in (17), in terms of the scatter matrices for the row and column covariances.
The following lemma provides the full conditional posterior distributions associated with the posterior distribution in (17).
Lemma 1. The full conditional distributions associated with the posterior distribution in (17) are given by (18)–(21), where the full conditional of the mean matrix is a matrix normal distribution, those of the row and column covariance matrices are inverse Wishart distributions, and that of the tensor is an inverse tensor Wishart distribution.

Proof. From the joint posterior distribution in (17), it can easily be seen that the terms involving the mean matrix form a matrix normal kernel; thus, the full conditional distribution of the mean matrix is the matrix normal distribution in (18), centered at the sample mean. Next, the full conditional for the row covariance matrix comes from the part of the posterior involving that matrix, which depends on the corresponding scatter matrix; this corresponds to the inverse Wishart distribution in (2), yielding (19). Similarly, the full conditional for the column covariance matrix is derived from its scatter matrix, giving the inverse Wishart distribution in (20). Finally, the full conditional for the tensor is derived from the likelihood involving the tensor in model (7), which is proportional to an inverse tensor Wishart distribution (Definition 2); this leads to the full conditional in (21). □
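A Gibbs sampler cycles through the full conditionals named in Lemma 1. The skeleton below shows the sweep structure with matrix normal and inverse Wishart updates, using the standard conjugate forms for a matrix normal model; the hyperparameters (nu0, Psi0_U, Psi0_V) and the conditional scale matrices are placeholders and are only assumed to correspond to (18)–(20), whose exact expressions are not reproduced here. The tensor update of (21) would be an analogous slice-wise inverse Wishart draw.

```python
import numpy as np
from scipy.stats import matrix_normal, invwishart

rng = np.random.default_rng(5)

# Simulated data from a matrix normal model (placeholder truth).
N, p, q = 100, 3, 2
U_true = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.2], [0.0, 0.2, 1.0]])
V_true = np.array([[1.5, 0.3], [0.3, 1.0]])
M_true = np.ones((p, q))
X = np.stack([matrix_normal(mean=M_true, rowcov=U_true, colcov=V_true).rvs(random_state=rng)
              for _ in range(N)])

# Hypothetical hyperparameters: flat prior on the mean, weak inverse Wishart priors.
nu0, Psi0_U, Psi0_V = 5.0, np.eye(p), np.eye(q)

# Gibbs sweep over the full conditionals (standard conjugate forms assumed).
M, U, V = X.mean(axis=0), np.eye(p), np.eye(q)
draws = []
for it in range(500):
    # Mean matrix | rest: matrix normal centered at the sample mean.
    M = matrix_normal(mean=X.mean(axis=0), rowcov=U / N, colcov=V).rvs(random_state=rng)
    R = X - M
    # Row covariance | rest: inverse Wishart with its scatter matrix.
    S_U = sum(R[k] @ np.linalg.inv(V) @ R[k].T for k in range(N))
    U = invwishart(df=nu0 + N * q, scale=Psi0_U + S_U).rvs(random_state=rng)
    # Column covariance | rest: inverse Wishart with its scatter matrix.
    S_V = sum(R[k].T @ np.linalg.inv(U) @ R[k] for k in range(N))
    V = invwishart(df=nu0 + N * p, scale=Psi0_V + S_V).rvs(random_state=rng)
    draws.append((M, U, V))

M_post = np.mean([d[0] for d in draws[100:]], axis=0)   # posterior mean after burn-in
print("posterior mean of the mean matrix:\n", M_post.round(2))
```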
3.1. Loss Functions
One of the most common loss functions for estimating one matrix by another is based on the Frobenius norm and is given in (22). For a given matrix, the Frobenius norm is defined as the square root of the sum of the squares of all its entries; it corresponds to the Euclidean norm when the matrix is flattened into a vector. For more information, see [20,21].
For the tensor-valued case, the tensor Frobenius norm generalizes the matrix Frobenius norm. If the estimator and the target are tensors of the same dimensions, the loss function is given in (23), where the tensor Frobenius norm of a tensor is defined as the square root of the sum of the squares of all the tensor elements, in direct analogy with the matrix case. This generalization is crucial for extending matrix-based methods to higher-order data structures such as tensors.
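For reference, the matrix and tensor Frobenius norms, and the squared-error losses built from them (here taken as squared norms, matching the squared Frobenius losses used in the proofs below), can be computed directly; the matrices and tensors below are arbitrary examples.

```python
import numpy as np

def frobenius_loss(est, target):
    """Squared Frobenius norm of the estimation error; works for matrices and tensors."""
    return np.sum((np.asarray(est) - np.asarray(target)) ** 2)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 1.5], [2.5, 4.0]])
T1 = np.arange(8.0).reshape(2, 2, 2)
T2 = T1 + 0.1

print(np.linalg.norm(A - B))          # matrix Frobenius norm of the error
print(frobenius_loss(A, B))           # squared matrix Frobenius loss
print(frobenius_loss(T1, T2))         # squared tensor Frobenius loss
```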
3.2. Bayes Estimators
Theorem 3. Consider models (6) and (7) and prior (16). The Bayes estimators of the matrix parameters (with respect to the Frobenius norm loss in (22)) and of the tensor parameter (with respect to the tensor Frobenius norm loss in (23)) are given by Equations (24)–(27).

Proof. The Bayes estimators are derived as the expectations of the posterior marginal distributions of the parameters. Based on the posterior distribution in (18), one can easily show that the posterior marginal distribution of the mean matrix is a matrix T distribution, characterized by a mean matrix, a row covariance matrix, a column covariance matrix, and degrees of freedom, with the PDF given in (29). The expectation of the matrix T distribution in (29) is its mean matrix, yielding (24).
Next, the posterior marginal distribution of the row covariance matrix is an inverse Wishart distribution, and taking its expectation (the scale matrix divided by the degrees of freedom minus the dimension minus one) proves (25). Similarly, the posterior marginal distribution of the column covariance matrix is also inverse Wishart, and its expectation completes the proof of (26).
To derive the Bayes estimator of the tensor parameter, note that its posterior distribution is an inverse tensor Wishart distribution; taking the expectation of this distribution completes the proof of (27). □
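The expectations invoked in the proof can be sanity-checked numerically. For instance, the mean of an inverse Wishart distribution with scale Psi and df degrees of freedom is Psi / (df - p - 1) whenever df > p + 1; the check below uses scipy with arbitrary placeholder values.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(6)
p, df = 3, 10
Psi = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])

samples = invwishart(df=df, scale=Psi).rvs(size=50_000, random_state=rng)
mc_mean = samples.mean(axis=0)                 # Monte Carlo estimate of the mean
closed_form = Psi / (df - p - 1)               # closed-form inverse Wishart mean
print("Monte Carlo mean:\n", mc_mean.round(3))
print("closed form Psi / (df - p - 1):\n", closed_form.round(3))
```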
In statistical estimation, evaluating the efficacy of Bayes estimators compared to ML estimators is crucial, especially under non-informative priors. The next theorem investigates the performance of Bayes estimators under a proposed prior distribution for matrix and tensor parameters compared to ML estimators. Our goal is to demonstrate that, under the Frobenius norm loss function, Bayes estimators exhibit dominance over their ML counterparts.
Theorem 4. Under the assumptions of Theorem 3, the Bayes estimators of the covariance and tensor parameters given in Equations (25)–(27) dominate the corresponding ML estimators obtained in Theorem 1.
Proof. To prove that the Bayes estimators dominate the ML estimators under the Frobenius norm loss, we compute the differences in risk between the Bayes and ML estimators for the two covariance matrices and the tensor parameter, where the risk is the expected Frobenius norm loss.

For the row covariance matrix, the ML and Bayes estimators are given by (13) and (25), respectively. Comparing the expected loss (under the squared Frobenius norm) of the ML estimator with that of the Bayes estimator shows that the difference in risk is nonnegative, so the Bayes estimator dominates. Similarly, comparing the ML and Bayes estimators of the column covariance matrix given by (14) and (26), the difference in risk is again nonnegative, and the Bayes estimator dominates. For the tensor parameter, the ML and Bayes estimators are given by (15) and (27), respectively; comparing their expected losses shows that the corresponding difference in risk is nonnegative as well.

Therefore, the Bayes estimators of the row covariance matrix, the column covariance matrix, and the tensor parameter dominate the ML estimators under the Frobenius norm loss. □
4. Simulation Results
In this section, we present simulation results to evaluate the performance of the ML estimators (Equations (12)–(15)) and Bayesian estimators (Equations (24)–(27)) for interval-valued matrix data under models (6) and (7). The parameters of interest include the row covariance matrix, the column covariance matrix, and the variance–covariance tensor. To assess the performance of these estimators, we compute the expected loss (also known as risk) under the Frobenius norm loss functions given by (22) and (23) for matrix and tensor estimators, respectively.
The simulation setup involves generating repeated samples of interval-valued matrices, with sample sizes of up to 250, from known matrix and tensor parameters. The ML and Bayesian estimates of the parameters, with the degrees of freedom m fixed, are provided for each sample size in Table 1, Table 2 and Table 3.
Table 4 illustrates the risk functions of the ML and Bayes estimators under the Frobenius norm loss for the different sample sizes. In Table 4, we observe that for small sample sizes, the Bayesian estimator generally outperforms the maximum likelihood (ML) estimator, particularly for the row and column covariance matrices. As the sample size increases, the risk values of both the ML and Bayes estimators decrease, indicating improved performance. Additionally, the difference between the risks of the two estimators narrows, showing that their performances become more similar. This convergence is due to the consistency property of ML estimators, which ensures asymptotically good performance. As more data become available, the influence of prior information in the Bayesian approach diminishes, leading both estimators to yield similar results.
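The risk comparison reported in Table 4 can be reproduced in outline with a Monte Carlo loop of the following form: simulate data from known parameters, compute the ML and Bayes estimates, and average the squared Frobenius losses over replications. The sketch below does this for a single covariance matrix with placeholder true values and a simple conjugate Bayes estimator, as an illustration of the procedure rather than a reproduction of the paper's exact models.

```python
import numpy as np

rng = np.random.default_rng(7)
p, reps = 3, 2000
Sigma_true = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
L = np.linalg.cholesky(Sigma_true)
nu0, Psi0 = p + 2.0, np.eye(p)                       # weak inverse Wishart prior (placeholder)

def risks(N):
    r_ml, r_bayes = 0.0, 0.0
    for _ in range(reps):
        X = rng.standard_normal((N, p)) @ L.T        # sample from N(0, Sigma_true)
        S = X.T @ X                                  # scatter matrix
        sigma_ml = S / N                             # ML estimator (known zero mean)
        sigma_bayes = (Psi0 + S) / (nu0 + N - p - 1) # posterior mean under the IW prior
        r_ml += np.sum((sigma_ml - Sigma_true) ** 2)
        r_bayes += np.sum((sigma_bayes - Sigma_true) ** 2)
    return r_ml / reps, r_bayes / reps

for N in (10, 50, 250):
    print("N =", N, "(ML risk, Bayes risk):", [round(r, 4) for r in risks(N)])
```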
5. Real Application Example
The dataset consists of interval-valued (minimum and maximum) temperature matrices representing temperature variations across the four seasons—winter, spring, summer, and fall—over four multi-year time periods, namely 1950–1969, 1970–1989, 1990–2009, and 2010–2023, for Asian countries. In other words, for each country, the data are arranged in a matrix format in which the rows correspond to seasons and the columns correspond to the time periods. Each matrix element is an interval representing the minimum and maximum temperatures recorded during the corresponding season and period. The data are sourced from the Berkeley Earth Surface Temperature Study and can be accessed at Kaggle.
Table 5 illustrates the dataset's structure. Table 6 presents the ML and Bayesian estimates of the matrix parameters, and Table 7, Table 8, Table 9 and Table 10 provide estimates of the tensor parameter, with each table showing one season's component of the tensor. Note that m is set to a fixed value.
While we observed the superiority of the Bayes estimators over the ML estimators for the matrix and tensor parameters, we further assessed the difference in the average length of the corresponding 95% intervals for each element of the parameters. Specifically, we calculated the average length of the 95% confidence intervals obtained from the ML estimation and, similarly, the average length of the 95% credible intervals from the Bayesian estimation, where in each case the length of an element's interval is the difference between its upper and lower limits and the average is taken over all elements of the parameter matrix. The results indicate that the average length of the credible intervals from the Bayesian estimation is consistently shorter than that of the ML confidence intervals, confirming that Bayesian estimators provide narrower intervals. This finding is consistent with our previous simulation results, which demonstrated the superior performance of Bayesian estimation.
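The interval-length comparison described above amounts to averaging the element-wise differences between upper and lower limits of the two sets of intervals; with the limits stored as arrays, it is a one-liner. The arrays below are hypothetical placeholders, not the estimates from Tables 6–10.

```python
import numpy as np

# Hypothetical 95% interval limits for one 4 x 4 parameter matrix.
ml_lower, ml_upper = np.full((4, 4), -1.2), np.full((4, 4), 1.3)
bayes_lower, bayes_upper = np.full((4, 4), -0.9), np.full((4, 4), 1.0)

avg_len_ml = np.mean(ml_upper - ml_lower)            # average 95% confidence interval length
avg_len_bayes = np.mean(bayes_upper - bayes_lower)   # average 95% credible interval length
print("average lengths (ML, Bayes):", avg_len_ml, avg_len_bayes)
```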
6. Concluding Remarks
In this study, we introduced the concept of interval-valued random matrices, representing a significant advancement at the intersection of symbolic data analysis and matrix theory. This novel approach offers a robust framework for statistical modeling, particularly in contexts characterized by complex and large datasets where traditional methods may fall short.
One of the strengths of our model is its ability to leverage both frequentist and Bayesian approaches for statistical inference. We have demonstrated that, especially when the sample size is small, Bayesian estimators dominate maximum likelihood estimators under the Frobenius norm loss function. Moreover, the asymptotic distribution of the ML estimators was established. This work paves the way for further research and application across various fields, encouraging the exploration of similar frameworks to handle symbolic data. The use of non-informative priors enhances decision making in complex analytical scenarios, ultimately improving the quality of statistical modeling in diverse disciplines.
A critical application explored in this paper involves analyzing temperature variations across different seasons using a dataset of interval-valued temperature matrices. Specifically, we examined seasonal temperature ranges (winter, spring, summer, and fall) across four distinct time periods spanning 1950 to 2023. This practical example underscores the versatility of interval-valued random matrices, demonstrating their capacity to capture and analyze variability in temperature data.