Finite Mixture at Quantiles and Expectiles

Furno, Marilena

doi:10.3390/jrfm18040177

Open AccessArticle

Finite Mixture at Quantiles and Expectiles

by

Marilena Furno

Department of Agricultural Sciences, University of Naples Federico II, 80055 Napoli, Italy

J. Risk Financial Manag. 2025, 18(4), 177; https://doi.org/10.3390/jrfm18040177

Submission received: 3 February 2025 / Revised: 17 March 2025 / Accepted: 25 March 2025 / Published: 27 March 2025

(This article belongs to the Special Issue Machine Learning-Based Risk Management in Finance and Insurance)

Download

Browse Figures

Versions Notes

Abstract

Finite mixture regression identifies homogeneous groups within a sample and computes the regression coefficients in each group. Groups and group coefficients are jointly estimated using an iterative approach. This work extends the finite mixture estimator to the tails of the distribution, by incorporating quantiles and expectiles and relaxing the constraint of constant group probability adopted in previous analysis. The probability of each group depends on the selected location: an observation can be allocated in the best-performing group if we look at low values of the dependent variable, while at higher values it may be assigned to the poorly performing class. We explore two case studies: school data from a PISA math proficiency test and asset returns from the Center for Research in Security Prices. In these real data examples, group classifications change based on the selected location of the dependent variable, and this has an impact on the regression estimates due to the joint computation of class probabilities and class regressions coefficients. A Monte Carlo experiment is conducted to compare the performances of the discussed estimators with results of previous research.

Keywords:

finite mixture model; quantile; expectile

1. Introduction

The work extends the finite mixture regression estimator (FM), which is generally computed at the conditional mean of the dependent variable. The goal is to estimate it in the tails of the conditional distribution, focusing on both low and high values of the dependent variable. This approach provides a powerful tool for uncovering tail behaviors, such as skewness and heterogeneity in response.

The FM regression approach (McLachlan & Peel, 2004) assumes that the population comprises many homogeneous subgroups. For each subgroup, a probability density function describes the data generating process, with each cluster characterized by a set of regression parameters specific to that group. The impact of the predictors varies among classes and these differences help determine the class membership. The data are assumed to be drawn from a population divided into clusters sharing similar characteristics. FM identifies homogeneous groups within the sample while computing the regression coefficients for each group. The class-specific coefficients are estimated without any prior knowledge or assumption of the clustering. FM is a model-based clustering approach designed to account for between-groups heterogeneity, where the population of interest is generated by homogeneous sub-populations. The resulting FM regressions are more flexible than the standard regressions that typically pool the subsamples. FM is commonly used as a flexible method to account for heteroskedasticity (Compiani & Kitamura, 2016). Neglecting group heterogeneity leads to misleading results since (i) the estimated coefficients would average the true values of each group without mirroring any of them, and (ii) the estimated error variance would be larger due to group heterogeneity, thus causing flawed inference (Van Horn et al., 2013).

The standard FM model computes class assignment and class-specific regressions in each group at the conditional mean of the dependent variable. Due to data heterogeneity, the sample may form several distinct clusters, and FM adaptively models multiple regressions, each one responsible for one subset/cluster of the data. These models have been widely used in market segmentation studies or patients’ disease progression subtyping (Städler et al. (2010), Khalili (2010), Khalili and Chen (2007)). A market segmentation approach is implemented by Betrail and Callavet (2008) for fruit and vegetable consumption, as well as by Caudill and Mixon (2016). Liang et al. (2018) extend the FM approach to improve robustness. Alfò et al. (2017) extend FM to quantile and M-quantile (Breckling & Chambers, 1988) models in a random coefficient setting. In this work, we extend FM by computing it away from the mean following the approach in Furno (2023) and Furno and Caracciolo (2024). They impose a constraint on the class probabilities fixing the group probability at the mean estimated values and shifting only the class regressions to the quantiles. In contrast, here, we relax this constraint, and show in a few case studies and in Monte Carlo simulations that relaxing the constraint provides additional information and more accurate results. By implementing tail estimators like expectiles (Newey & Powell, 1987) and quantiles (Koenker, 2005), we define the FM-expectile and the FM-quantile. These estimators differ from one another—expectiles are pure location devices used to move away from the mean regression. The advantages of expectiles over quantile regressions are (i) in their computational ease and (ii) in preventing regression crossing. The latter can occur in quantile regressions and are hard to explain. However, expectiles lack the robustness of quantiles, just as OLS is less robust than the median regression.

Compared to previous research (Furno, 2023; Furno & Caracciolo, 2024), the approach here considered allows for the flexibility of class membership to change at different locations, relaxing the constraint of stable class composition across locations. The probability of each group depends on the selected location—an observation can be in the best-performing group if we look at low values of the dependent variable, but the same observation can have a low probability of belonging to the best class at the right tail. The case studies here analyzed show that classes do change depending on the selected location of the dependent variable—whether we focus on low-, medium-, or high-performing observations—and this influences the class regression estimates as well. Previous research focused on the tail behavior of the sole group regressions while keeping group probability constant. This approach excludes by assumption any heterogeneity in the probability of belonging to a group. Here we relax this constraint. By allowing both group probability and group regression to change across locations, we avoid unnecessary constraints and gain a deeper understanding of the data. The probability does change across quantiles, and the novelty is in modeling the changing group probability together with the changing explanatory power of the covariates at differing locations, by implementing expectiles or quantiles estimators. The changing probability has an impact on the class regression estimates as well.

A Monte Carlo experiment compares the performance of the changing probability estimators defined in this study with that of the existing constant probability estimators. The relative bias is generally smaller in the current approach, yielding an improved performance of the FM tail estimators once the constraint of constant group probability is removed.

We examine two case studies: school data from the OECD-PISA math proficiency test and asset returns form the Center for Research in Security Prices. Both analyses highlight the utility of the proposed changing probability approach.

The school data set is considered to compare our findings with the Furno (2023) analysis, using the same data set of students’ scores in the PISA math test. There, the group probability is estimated once and for all at the mean, and kept constant when moving to the tails. In contrast, by lifting the constraint of group probability invariance, we show that class membership changes across quartiles. For instance, when examining the right tail to model the high-proficiency students, their math test scores are split into excellent versus just-above-average, and the difference turns out to be significant but rather small. In the left tail, looking at low-proficiency students, the groups collect just-below-average versus very poor performance, and their difference is nearly three times larger than the difference between the high-scoring groups. There is no reason to assume that the low-proficiency groups should have the same proportions/probabilities as the high-proficiency groups. Keeping the groups invariant is an approximation that we are now able to relax.

With financial data, we analyze asset performance both at the conditional mean and in the tails, at low, medium, and high returns. FM selects, without any ad hoc procedure, two classes of homogeneous assets defining two differing portfolios based on the response to the explanatory variables. This example shows once again the relevance of allowing composition to change across locations. At low returns, the groups show different proportions with respect to groups at high returns. At the lower tail the group difference is not statistically significant; although each group responds in its own way to the explanatory variables, the overall group difference is not statistically relevant, and the two portfolios exhibit comparable performance. However, at the mean and top quartile, one group significantly improves upon the other. Class 1 assets turn out to be preferable at and above the mean, while class 2 collects more risky assets. This highlights the validity of grouping data in more homogenous classes not just at the mean, but also—as in this case—at the upper tail, without imposing the same group composition at differing locations of the conditional distribution of the dependent variable. The group composition does change in moving from the left to the right tail, and the probability of belonging to the best portfolio increases as the location goes up at higher returns.

In summary, in the student proficiency case, the two groups of highly performing students show a minimal difference between one another, while a significant disparity exists among the groups at and below the mean for the lower-scoring students. In contrast, in the financial case, the two groups of lower-performing assets are quite similar, whereas the difference between groups becomes more pronounced at the top quartile. These examples show the presence of differing processes operating away from the mean. Uncovering differing behaviors enhances the understanding of the variables being investigated and can be a valuable insight for policymakers.

Relaxing the constant group probability constraint impacts not only the class probability, but also the estimated coefficients within each class. Indeed, the FM components, class probability and class regressions, are jointly estimated through iterations. Alternative approaches typically address these components separately, one at the time, first estimating the class composition and then separately computing the within-class regressions, possibly inducing econometric issues impairing the validity of the results.

Section 2 and Section 3 describe the FM estimator at the mean and in the tails, respectively. Section 4 and Section 5 report the Monte Carlo experiments and their results. Section 6 discusses two case studies, one on student proficiency and the other on portfolio choice. The final section concludes the analysis, while Appendix A reports the codes used to compute FM in the tails using both expectiles and quantiles.

2. Methods: The Finite Mixture Estimator

In the linear regression model, y_i is the dependent variable, x_i collects the ith observation of the k explanatory variables, u_i is the error term and β is the vector of regression coefficients. The standard finite mixture regression estimator relates the dependent variable y_i to x_i within each subgroup while a latent variable z_i denotes the subgroup to which each unit belongs. For the ith unit in the kth group, with k = 1, …, G, the linear regression model is

y_i = x_i′β_k + u_i

(1)

The latent variable z_i defines the probability of a generic unit belonging to the kth component, and it defines the π_k probabilities.

π(z_i) = exp(z_i)/Ʃ_k=1,G exp(z_i)

(2)

For each observation, the probability of inclusion in a group, π_k, is the function of the unobservable latent variable z_i. The latter captures population heterogeneity and assumes values 0 or 1 to define the exclusion or the inclusion of the ith observation in the kth group.

The likelihood combines the conditional likelihood of each class weighted by the associated probability.

\sum_{k = 1, G} π_{k} \prod_{i = 1, n} f (y_{i} | x_{i}, β_{k})

(3)

Unfortunately, we do not know which group an observation belongs to, and the observed sample is incomplete since the latent variable z_i is not observable. However, z_i can be approximated by a subset of group-specific explanatory variables x_iq, with q

\leq

p, and this subset is used to model the prior probability of component membership. A multinomial logit model computes π_k as

π_{k} = \exp (g (x_{iq})) / Ʃ_{k} \exp (g (x_{iq}))

(4)

where g(x_iq) is the function relating the probability of being in the kth class to the characteristics of the x_iq data set.1 It is a model-based clustering approach where the mixture proportions are defined by a logistic model, and the mixture components are linear regressions. The outcome variable distribution depends on both the covariate x_i and the latent cluster membership variable z_i, as approximated by x_iq.

The estimation relies on an iterative process. The Expectation-Maximization (EM) algorithm (Dempster et al., 1977) computes the probability of belonging to a given group by implementing Equation (4), which defines the probability weights. Next, the weighted regression parameters are estimated in (3), and the iterations update weights and regression coefficients until convergence is reached.

3. Method: The FM Estimator in the Tails

The purpose is to implement the finite mixture not only on average, at the conditional mean, but also in the tails of both components of the FM estimator. The FM group analysis reduces the heterogeneity of the data, selecting more homogeneous subsets. The tail analysis within FM considers the impacts of the explanatory variables away from the mean, at low/high values of the dependent variable within each class. In Furno (2023) and in Furno and Caracciolo (2024), the heterogeneity is modeled by splitting the sample into homogeneous sub-groups at the conditional mean. Once the clusters are defined, quantile regressions are estimated in the tails within each class, and the grouping does not vary across quantiles. This approach excludes by assumption any heterogeneity in the probability of belonging to a group. The probability may change as well according to the selected location. For instance, in one of the following case studies on students’ performance, the probability of being in the best group is higher in the left tail, at low scores, while it is lower in the right tail of the distribution. The probability undergoes changes, and the novelty is in modeling the changing group probability together with the changing explanatory power of the covariates at the differing locations, i.e., moving both probability and regression estimates away from the mean implementing expectiles or quantiles estimators.

3.1. FM-Expectiles

To move away from the average Equation (4), we consider expectiles (Newey & Powell, 1987) that provide a shifting weight to move the logistic model at various locations of the conditional distribution of the dependent variable. Equation (4) is modified to include an asymmetric weighting system w_i that moves the equation up or down toward the tails,

π_{k} = P (w_{i} z_{i} = 1 | X) = w_{i} \frac{\exp (x_{i}^{'} β_{k})}{{Σ \exp (x_{i}^{'} β_{k})}}

(5)

where the asymmetric weighting system used to define location is

w_{i} = {\begin{matrix} θ i f u_{i} > 0 \\ 1 - θ e l s e w h e r e \end{matrix}

,

θ

is the chosen location, and u is the regression error. For instance, to compute the

θ

= 25th expectile,

w_{i}

assigns weights 0.75 to those observations below the regression to attract the estimated equation toward the lower tail, and it assigns weights of 0.25 to the observations above it.2

The regression model within each group is computed away from the mean as well, and in the regression

\sum_{k} π_{k} f (y_{i} | x_{i}, β_{k}), π_{k} is approximated by its estimated values \hat{π_{k}} i n (5), \sum_{k} \hat{π_{k}} f (y_{i} | x_{i}, β_{k}),

and the standard linear regression OLS estimator is replaced by expectiles, with the following objective function

\sum_{y_{i} > {x_{i}}^{'} β_{k}} θ {(y_{i} - {x_{i}}^{'} β_{k})}^{2} + \sum_{y_{i} < {x_{i}}^{'} β_{k}} (1 - θ) {(y_{i} - {x_{i}}^{'} β_{k})}^{2}

(6)

The FM-expectile estimator iterates between Equations (5) and (6) until convergence, at the given location

θ

, thus computing both class probability and class regression coefficients in the tail. The location θ away from the mean affects both terms of the finite mixture estimator, the logit and the regression model. Equations (5) and (6) provide the FM-expectile estimator.3

3.2. FM-Quantile

Besides expectiles, it is possible to implement finite mixtures at the quantiles, which provide more robust results. Instead of Equation (6), the regression step is computed as

\sum_{y_{i} > {x_{i}}^{'} β_{k}} θ | y_{i} - {x_{i}}^{'} β_{k} | + \sum_{y_{i} < {x_{i}}^{'} β_{k}} (1 - θ) | y_{i} - {x_{i}}^{'} β_{k} |

(7)

Koenker (2005) replaced the L2 norm with the L1 norm, and the weights generated by this equation pre-multiply the data when implementing the FM routine.4 The class probability equation in (4) at the quantiles becomes

Q_{y} (θ) = [\exp ({x_{i}}^{'} β) y_{\max} + y_{\min}] / [1 + \exp ({x_{i}}^{'} β)]

(8)

to define the logistic quantile regression for outcomes that are bounded within y_max and y_min, where y_max and y_min are not the maximum and the minimum but the bounds of the interval of the outcome variable (Orsini & Bottai, 2011).5 The FM-quantile iterates between (7) and (8) until convergence.

4. Simulations

To better analyze the FM performance in the tails, the same simulation scheme as in Furno (2023) and in Alfò et al. (2017) is implemented.6 The model is a random coefficients equation, defined as

y_ik = (β₁ + b_1i) + (β₂ + b_2i) x_2ik + β₃ x_3ik + ε_ik, i = 1, …, n k = 1, …, G

(9)

with β₃ being the sole non-random term. The error term is in turn a standard normal or a Student t with 3 degrees of freedom, t₃; the β vector assumes values (β₁, β₂, β₃)′ = (100, 2, 1)′; b_1i and b_2i are individual specific random parameters distributed in turn as a standard normal or a Student t with 3 degrees of freedom; x_2ik is defined as the sum of two independent standard normal variables; x_3ik is a binomial B(100;0.5); the sample size is n = 100 and there are 1000 iterations for each experiment.

The first set of experiments consider the standard normal distribution for b_1i, b_2i and the error term. The second set of experiments considers Student t distributions for the errors ε_ik. The second set of experiments looks at the performance in cases of heavy tail distributions.

The estimator considered is the finite mixture model at the quantiles, together with its variants at the expectiles. The FM-quantile and FM-expectile estimators are computed at locations θ = 0.25 and 0.75, respectively at the first and third quartiles.

To compute the two finite mixture estimators of Equations (5)–(8), the data set is divided into k = 2 sub-groups; there is an estimated β vector for each sub-group, and the results are reported in Table 1. The relative bias and the average Akaike are reported together with their dispersion in the 1000 iterations.

Next, the standard normal error distribution is replaced by a 10% contaminated normal, with a contaminating distribution N(50, 100), while the Student t error distribution is replaced with a 10% contaminated Student t with a t₆ central distribution and a t₃ contaminating distribution. This set of experiments looks at the performances of the finite mixture estimators when there are two distributions generating the sample. This set of experiments is particularly suitable for the finite mixture estimators, with the central distribution describing one group and the contaminating distribution describing the other group of tail observations. Bartolucci and Scaccia (2005) specifically suggest implementing finite mixture regressions to model mixtures of normal errors. Of the two contaminating schemes selected, the contaminated normal defines a bimodal distribution with very different variances, while the Student t contamination defines a unimodal distribution.

5. Simulation Results

Table 1 presents the estimates computed in two homogeneous subgroups with two estimated β vectors, one for each group. The estimated coefficients do change, both across quartiles and between groups. The relative bias is generally small throughout the table, in both the FM-quantile and FM-expectile estimators, and when the errors are normal or follow a Student t distribution. Comparing these results with those reported in Furno (2023), Table 2, the constant term and the β₃ coefficients present much smaller relative biases; the β₂ relative bias is slightly larger, although comparable in many experiments, and its dispersion is generally smaller; the Akaike values are now notably smaller. Figure 1 reports the empirical distributions of the slope coefficients of Equation (9), in the experiment with normal distributions in the top graphs and in the case of Student t distributions. The β₂ + b_2i distributions are more dispersed than the β₃ distributions since the latter is not a random coefficient. The Student t experiments are more dispersed than the standard normal ones. In all the experiments of this table, at the first quartile, group 1 overestimates the true value and group 2 underestimates it; the opposite occurs at the top quartile, as can be seen in Figure 1.

In Table 2, the errors are generated by contaminated distributions. Here, the classes mirror the contamination. The sample is divided in two sub-groups yielding two vectors of estimated coefficients: one vector for the group generated by the central distribution and the other for the data generated by the contaminating distribution. The relative bias in this table is sizably smaller throughout than in Table 4 in Furno (2023) with the sole exception of the β₂ expectile results, which are slightly larger. Figure 2 reports the empirical distributions of the slope coefficients from the experiments with contaminated distributions. In this figure, the experiments with standard normal distributions exhibit the greatest dispersion, and the FM-expectile estimates are more dispersed than the FM-quantile results.

In all the experiments, the distributions of the non-random coefficient b3 = β₃ are less dispersed than the b2 = β₂ + b_2i distributions.

6. Case Studies

6.1. School Data

To look at the difference between this approach and that in the previous research by Furno (2023), the same data set is analyzed. The math scores of 15-year-old Italian students taking the OECD-PISA test on math proficiency in 2009 are considered. The math score (math in the table) is explained by school characteristics to assess the link between performance and school structures. We select only a small number of explanatory variables—field (academic and technical), school size (schlsize in the table), student–teacher ratio (stratio), and teacher shortage (tcshort)—to streamline the discussion of the results. The field is defined by the dummy variables academic and technical to separate students enrolled in vocational fields; school size collects the number of students in the school; student–teacher ratio is the ratio between number of students and teachers, and tcshort is an index of staff shortage. The sample comprises n = 26,240 observations and Table 3 reports the summary statistics. Table 4 collects the FM-quantile results at 0.25 and 0.75, together with the standard FM results at the center, at the conditional mean 0.50. The quantile weights all equal to 0.5 to define the median regressions, do not modify the FM results at the conditional mean.7

Students are grouped in two classes, with one group improving upon the other in math proficiency. The first line of the table reports the estimated group differences. Looking at the high-proficiency students, their math scores are split into excellent versus just-above-average, and the difference turns out to be significant but rather small, with class 2 improving upon class 1 by 0.2014, with z = 1.87. In the left tail, looking at low-proficiency students, the groups show just-below-average versus very poor performance, and their difference is almost three times as large as the above difference between the high-scoring students’ groups. Group 2 shows a worse performance x with respect to group 1 by −0.5899 (z = −6.80) at the first quartile; and group 2 significantly worsen with respect to class 1 by −1.39 (z = −2.47) at the mean. For the top math scores, these results are reversed: group 2 performs better than group 1, although by a small amount. The opposite occurs at and below the mean, with group 1 outperforming group 2 by a larger margin. The last two rows of this table report the latent class probability, i.e., the groups’ compositions’. These probabilities do change across locations, assuming an inverse u-shaped pattern in class 1, which is higher at the mean, coupled with a u-shaped pattern in class 2. At the mean, the class probabilities are 80% and 20%, just as in Furno (2023), but away from the mean the probabilities between the two classes are more balanced: respectively, 65% and 35% in the left tail that turns to 45% and 55% at the upper tail. The groups’ proportions do change across locations, and are more balanced in the tails than at the mean. This shows the relevance of shifting both components in FM, both the logit—defining the class probabilities—and the class regressions. In contrast to previous work, here, the location changes in the two FM components, shifting both the logit model and the regression model at the same location of the conditional distribution of the dependent variables.

The changing group probability impacts the final FM results. Group difference is statistically significant throughout, assuming great values at and below the mean, while declining at the top quartile: the worse performance of class 2 at and below the mean is reversed at the top quartile, with class 2 collecting the best-performing students.

In the table, school size is not significant for group 2 at and above the mean. Between groups, divergence due to curricula is quite sizable; the academic and technical tracks improve with respect to vocational schools, particularly for group 2. The widest group discrepancy occurs at the mean.

6.2. Financial Data

Data from for the Center for Research in Security Prices in year 2022 are here analyzed. Table 5 reports the summary statistics of the variables in the model. The yearly average of monthly total return (return in the table) in our model is related to its lagged value (return1)—its recent past; the yearly average of monthly market capitalization (capit)—its market value; earnings before interest (earning)— profitability; the yearly average of monthly return on the Standard and Poor 500 index (spreturn)—a benchmark of market performance, and book-to-market value (bookmkt) defined as the difference between total assets (asset) minus total liability (liability) divided by the yearly average of monthly market capitalization, bookmkt = (asset−liability)/capit. The analysis focuses on the ordinary shares, NS type, in the CRSP data set. In addition, the observations with very low market capitalization, those below the 5th percentile, are excluded from the sample. The usual approach in cross-section studies consists in grouping similar assets in a first stage to define portfolios. Here, the selection of assets into homogeneous classes/portfolios is model-based and performed by the FM estimator, avoiding hand-made ad hoc selections. Table 5 collects the summary statistics.

Table 6 collects the estimated regression coefficients computed, respectively, by OLS, median (Koenker, 2005) and robust regression (Huber, 1981). The three estimators present comparable values, not statistically different from one another, with book-to-market not statistically significant throughout. The sole exception is given by the capitalization coefficient, the market value, which is not significant in OLS and is significant elsewhere. This suggests the need for further scrutiny, and possibly some trimming. Instead of defining a subjective trimming criterion, the latter is defined by the robust regression estimator. The latter excludes 58 observations from the sample, anomalous values that are either too small or too large with respect to the bulk of data. The robust regression induces a 1.9% trimming, and the final sample size is n = 2912. Figure 3 compares the box plot of returns before and after trimming, and the impact of trimming is quite evident in the graphs.

Table 7 reports the expectile regression estimates at locations 0.25, 0.50 and 0.75 for low/mean/high returns, respectively. In this table, most coefficients are statistically significant and present a pattern across the various locations. Book-to-market and earnings increase their impact in moving to the 0.75th location, while spreturn and lagged return both decrease at and above the mean. Market capitalization shows a statistically significant inverse U pattern.

Recalling that expectiles are just a location device, and that they are not robust and may still be attracted by high and low observations, Table 8 reports the quantile results, which are generally more robust. The table shows a non-significant impact of book-to-market, while the other estimated coefficients are significant and generally decline across quartiles. The spreturn coefficient is smaller than 1 and can be interpreted as low dependence on market conditions, while a negative constant term turning to positive at the top quartile signals returns smaller than a safe asset but at 0.75. Earning and market capitalization have a small and more stable impact across quartiles.

Table 7 and Table 8 show that the estimates do change across expectiles and quantiles, with estimates statistically different across locations. They display a generally diminishing impact in moving from the left to the right tail.

Next, we look for the presence of more homogeneous subsets in the data by implementing FM, since Table 6, Table 7 and Table 8 do not consider grouping the data into classes, analyzing all the observations together. Furthermore, by saving the weights generated in the expectile/quantile regressions of Equations (6) or (7), it is possible to move FM to the tails. The tail results are obtained by weighing all the variables by the location terms generated when computing the regression away from the conditional mean.8 Thus, besides analyzing the sample in more homogeneous subsets, as in the standard FM approach, the location weights allow for moving both logit and regression components to the tails, yielding the FM-quantile and FM-expectile estimators.

Table 9 collects FM-quantile results at the 0.25 and 0.75 locations, to be compared with the standard FM results at 0.50 reported in the last columns of this table.9

We select two classes of homogeneous assets, defining two differing portfolios depending on their response to the explanatory variables, and we analyze them at various locations, looking at low, medium and high returns. If needed, it is possible to further refine the analysis by increasing the number of classes. Classes are model-based and are not forced to remain constant across locations, and this is the novelty of our approach. At low returns, we sort poor from just-below-average assets. Analogously, at high returns, FM-quantile sorts excellent from just-above-average assets. The top line of the table reports the total difference between classes/portfolios. While the class difference is not significant at the lower tail, 0.239 (z = 1.16), class 2 collects the worse-performing assets at and above the mean, with an increasing difference equal to −0.361 (z = −3.74) at the mean as computed by the standard FM, and to −0.679 (z = −5.18) at 0.75 in the FM-quantile case. Class 1 is characterized by a better performance at θ = 0.50 and 0.75, where the class difference is statistically significant and increases at the top quartile. The significance of the total difference between the two groups shows that analyzing data all together disregards relevant group discrepancies.

At the bottom of the table, in the last two rows, the probability of belonging to each class is reported. At θ = 0.25, class 2 shows higher probability, 56%. At the other locations the group probabilities are instead higher in class 1, reaching 59% at the mean and 66% at the top quartile for the best-performing group. In this table, the group probability is a function of the same explanatory variables of the regression model. Later, we will introduce other variables to define the class probability as differing from the regression explanatory variables, their lagged values.

So far, there are two factors confirming the validity of a group analysis at various locations, as follows: differing group probabilities across location, which show the importance of shifting the logit together with the regression component in FM; significant between-class differences at and above the mean.

Moving to the estimated coefficients, the standard FM results are reported on the right-hand side of this table. On average, the best-performing class 1 assets are characterized by a positive impact of lagged returns and earnings, negative market capitalization, and a negative constant. The spreturn and book-to-market estimates are not significant in this group. Class 2 shows a large positive spreturn, signaling a high response to market performance, and a positive constant term suggesting returns higher than a safe asset. These are typical features of riskier shares. All the other coefficients do not statistically differ from zero.

At the lower tail, the two groups behave similarly, and their overall difference is not significant. However, their estimated coefficients do differ. Class 1 is positively related to lagged returns and earnings, while class 2 is driven by earnings and by negative market capitalization.

At the top quartile, once again, class 1 collects the best-performing assets. The negative impact of capitalization is statistically significant, together with the positive impact of lagged returns and earnings. Class 2 shows positive book-to-market and spreturn values, while lagged returns and earnings have a negative impact.

Finally, the introduction of lagged book-to-market, lagged spreturn, and lagged market capitalization in the definition of the group probability, the x_iq set of variables in Equation (4), can further refine these findings.10 The impacts of these variables on the group probabilities in the logit model are reported in Table 10. The constant term, here representing the difference in class 2 with respect to class 1, is negative at and above the mean, while it is not significant at the lower quartile, confirming the findings in the top row of Table 9. When selecting lagged variables to explain FM class probability, these variables are all significant at the mean. Lagged capitalization is the sole significant factor at the first quartile. Past book-to-market and past capitalization are relevant at the top quartile. Lagged spreturn is not significant in defining class probability in the tails.

7. Conclusions

A finite mixture model is considered to divide data into homogeneous subsets without ad hoc intervention, employing a model-based clustering approach. FM assumes that the population consists of homogeneous subgroups, with a probability density function defining groups. Each of them is characterized by group-specific sets of regressions. The model involves two iteratively estimated components: one defining the group probability and the other computing the regression within each class. The impact of predictors varies between classes, and these differences determine the class membership.

The class analysis of FM is extended to the tails to define the FM-quantile and the FM-expectile estimators. The novelty of this proposal lies in allowing both class probability and class-specific regressions to vary across quantiles or expectiles. Previous research defines the FM-quantile by constraining the groups to have the same average class probabilities at all locations. By relaxing this constraint, both class probability and class regressions coefficients shift at the selected quantile or expectile. In the case studies analyzed, class probabilities do vary across locations, and the class composition has an impact on the class regression estimates as well, due to the iterative Expectation-Maximization algorithm.

A Monte Carlo experiment compares results with those from Furno’s (2023) constant probability analysis. The relative biases and the average Akaike Information Criterion in the changing probability experiments are notably smaller than in the previous research based on constant probability.

The case studies analyze data on students’ proficiency in a math test and data on assets returns and portfolio selection. In both studies, the class composition changes across locations. In the students’ performance case, on average, most students are gathered in the best group—80%. In the tails, the best group remains the majority, but the class probabilities are more balanced, accounting for 64% in the left tail and 55% in the right tail. At higher scores, the two groups show only slight differences, but the gap between groups widens at and below the mean, particularly among the low-scoring students.

In the asset portfolios analysis, class composition does change as well. For the best-performing class 1, the probability increases from 44% in the left tail to 59% at the mean and 66% in the right tail, showing a rising trend across quartiles. Changing the class proportions does change the class regressions estimates since the two FM components, class probability and class regressions, are jointly estimated in the FM approach. In this case study, the group difference is not significant at low returns, while it gets larger and more significant at, and more so above, the mean. The class 2 portfolio turns out to collect more risky assets.

In the student proficiency case, the two groups of high-performing students show minimal differences, while a significant group disparity occurs at and below the mean. In the financial case, the groups of lower-performing assets are quite similar, while the difference between groups becomes more pronounced at the top quartile. Uncovering these differing behaviors enhances the understanding of the variables being investigated and can offer valuable insights for policymakers.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be available on request.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Stata Codes

reg y x1 x2
predict resols, resid
*FM at the mean
fmm 2: regress y x1 x2
*Akaike and log likelihood
estat ic
*probability to belong to a group
estat lcprob
*expectile regression and definition of weights	*first quartile regression weights
reg y x1 x2 +	qreg y x1 x2, q(0.25) +++
gen eweight = 1	predict qres, resid
scalar tau = 0.25	gen qweight = 0.25 if qres > 0 & qres!=.
gen wy = eweight*y	replace qweight = 0.75 if qres < 0 & qres!=.
gen wx1 = eweight*x1
gen wx2 = eweight*x2
forvalues i = 1(1)8 {
predict resid, resid
replace eweight = 2*tau if resid >= 0
replace eweight = 2*(1-tau) if resid < 0
replace wy = eweight*y
replace wx1 = eweight*x1
replace wx2 = eweight*x2
reg wy wx1 wx2 ++
drop resid
}
*expectile FM	*quantile FM
fmm 2: regress y x1 x2 [pw = eweight]	fmm 2: regress y x1 x2 [pw = qweight]
estat ic	estat ic
estat lcprob	estat lcprob
+ can be replaced by logit d x1 x2
++ can be replaced by logit wd wx1 wx2	+++ can be replaced by lqreg qd qx1 qx2
where d is a location dummy assuming value equal to 1 at the chosen expectile/quantile, zero otherwise.
To select the variables defining the group probabilities the FM code becomes
fmm 2, lcprob (z1 z2): regress y x1 x2 [pw = weight]
where z1 and z2 are the explanatory variables defining the group probability. They can or cannot differ from the variables of the regression model. The weight within square brackets is qweight or eweight depending on the selected estimator, FM-quantile or FM-expectile.

Notes

1	The variables defining the group probability could differ from the variables defining the regression model. In the case studies, the explanatory variables of $π_{k}$ and $y_{i}$ coincide. However, the financial model has been re-estimated by setting lagged variables as predictors of group probability in x_iq.
2	The stata code to compute the weights in the expectile logistic regression of Equation (5) is logit eweight-depvar eweight-indepvar within the iterative loop that computes the location expectile weights; eweight are defined by $w_{i}$ = $θ$ or $w_{i}$ = 1 − $θ$ as in (5) and they modify the regression variables into eweight-depvar and eweight-indepvar, as in (6) (see the Appendix A).
3	In the expectile case, the regression code to compute (6) is reg eweight-depvar eweight-indepvar within the iterative loop that computes the location weights (see the Appendix A).
4	The stata code in the quantile regression case of Equation (7) at the $θ$ quantile is qreg depvar indepvar, q( $θ$ ), and the residuals allow us to define the quantile weights (see the Appendix A).
5	In the quantile case, the stata code at the θ quantile is lqreg depvar indepvar, q(θ) (Orsini & Bottai, 2011), and the residuals allow us to define the weights (see the Appendix A).
6	The simulations and all the empirical analyses are implemented in Stata, version 15.
7	We compute only FM-quantile for comparability’s sake.
8	The weights can be determined starting from the quantile/expectile logit model as well.
9	We present the FM-quantile due to its robustness. The FM-expectile results are available on request.
10	The full set of results is available on request.

References

Alfò, M., Salvati, N., & Ranalli, M. G. (2017). Finite mixtures of quantile and M-quantile regression models. Statistics and Computing, 27, 547–570. [Google Scholar] [CrossRef]
Bartolucci, F., & Scaccia, L. (2005). The use of mixtures for dealing with non-normal regression errors. Computational Statistics and Data Analysis, 48, 821–834. [Google Scholar] [CrossRef]
Betrail, P., & Callavet, F. (2008). Fruit and vegetable consumption: A segmentation approach. American Journal of Agricultural Economics, 90, 827–842. [Google Scholar]
Breckling, J., & Chambers, R. (1988). M-quantiles. Biometrika, 75, 761–771. [Google Scholar] [CrossRef]
Caudill, S. B., & Mixon, F. G., Jr. (2016). Estimating class-specific parametric models using finite mixtures: An application to a hedonic model of wine prices. Journal of Applied Statistics, 43(7), 1253–1261. [Google Scholar] [CrossRef]
Compiani, C., & Kitamura, Y. (2016). Using mixtures in econometric models: A brief review and some new results. The Econometrics Journal, 19, C95–C127. [Google Scholar] [CrossRef]
Dempster, P., Laird, N., & Rubin, D. (1977). Maximum likelihood for incomplete data via the EM algorithm. Journal of the Royal Statistical Society (B), 39, 1–38. [Google Scholar] [CrossRef]
Furno, M. (2023). Computing finite mixture estimators in the tails. Journal of Classification, 40, 267–297. [Google Scholar] [CrossRef]
Furno, M., & Caracciolo, F. (2024). Finite mixture model for the tails of distribution: Monte Carlo experiment and empirical applications. Statistical Analysis and Data Mining, 17, 1–15. [Google Scholar] [CrossRef]
Huber, P. (1981). Robust statistics. Wiley. [Google Scholar]
Khalili, A. (2010). New estimation and feature selection methods in mixture-of-experts models. The Canadian Journal of Statistics, 38, 519–539. [Google Scholar] [CrossRef]
Khalili, A., & Chen, J. (2007). Variable selection in finite mixture of regression models. Journal of the American Statistical Association, 102, 1025–1038. [Google Scholar] [CrossRef]
Koenker, R. (2005). Quantile regression. Cambridge University Press. [Google Scholar]
Liang, J., Chen, K., Lin, M., Zhang, C., & Wang, F. (2018). Robust finite mixture regression for heterogeneous targets. Data Mining and Knowledge Discovery, 32, 1509–1560. [Google Scholar] [CrossRef]
McLachlan, G., & Peel, D. (2004). Finite mixture models. Wiley Series in Probability and Statistics. John Wiley & Sons. [Google Scholar]
Newey, W., & Powell, J. (1987). Asymmetric least squares estimation and testing. Econometrica, 55, 819–847. [Google Scholar] [CrossRef]
Orsini, N., & Bottai, M. (2011). Logistic quantile regression in stata. The Stata Journal, 11, 327–344. [Google Scholar] [CrossRef]
Städler, N., Bühlmann, P., & van de Geer, S. (2010). L1-penalization for mixture regression models. Test, 19, 209–256. [Google Scholar] [CrossRef]
Van Horn, L., Smith, J., Fagan, A., Jaki, T., Feaster, D., Hawkins, D., & Howe, G. (2013). Not quite normal: Consequences of violating the assumption of normality in regression mixture models. Structural Equation Modeling, 19, 227–249. [Google Scholar] [CrossRef]

Figure 1. Empirical distributions in 1000 iterations of the slope regression coefficients within each group. The b2 distributions, with b2 = β₂ + b_2i, are more dispersed than the distributions of the non-random b3, where b3 = β₃, and the Student t experiments are more dispersed than the standard normal ones.

Figure 2. Empirical distributions in 1000 iterations of the regression slope coefficients within each group. The contaminated normal distributions are highly dispersed, significantly more than in the Student t experiments. The b2 distributions, with b2 = β₂ + b_2i, are more dispersed than the distributions of the non-random b3, where b3 = β₃.

Figure 3. Total returns box plot before and after 1.9% trimming driven by the robust regression.

Table 1. Standard normal and Student t, t(3); two groups; n = 100; 1000 replicates.

Standard Normal Errors
quartiles	0.25				0.75
	group1		group2		group1		group2
FM-quantile
	relative bias std. d.		relative bias std. d.		relative bias std. d.		relative bias std. d.
beta1	−0.0126	0.063	0.0038	0.061	0.0066	0.055	0.0120	0.062
beta2	0.1178	0.461	−0.1258	0.432	−0.0847	0.435	0.0837	0.457
beta3	0.0021	0.125	−0.0111	0.119	−0.0101	0.109	−0.0006	0.124
Akaike	167.3 7.90				171.2 7.86
FM-expectile
beta1	−0.0146	0.063	0.0041	0.061	0.0063	0.052	0.0085	0.059
beta2	0.0871	0.437	−0.0684	0.407	−0.0969	0.411	0.0807	0.419
beta3	0.0058	0.128	−0.0124	0.122	−0.0093	0.105	0.0058	0.117
Akaike	344.4 16.6				343.6 17.1
Student t distributions
quartiles	0.25				0.75
	group1		group2		group1		group2
FM-quantile
	relative bias std. d.		relative bias std. d.		relative bias std. d.		relative bias std. d.
beta1	−0.0197	0.164	0.0041	0.133	0.0137	0.150	−0.0089	0.261
beta2	0.0739	0.594	−0.0599	0.571	−0.0737	0.520	0.0584	0.693
beta3	0.0100	0.333	−0.0098	0.271	−0.0259	0.308	0.0498	0.515
Akaike	180.1 9.21				184.4 9.61
FM-expectile
beta1	−0.0225	0.148	0.0124	0.139	0.0078	0.151	0.0148	0.310
beta2	0.0873	0.560	−0.0749	0.551	−0.0786	0.490	0.0563	1.05
beta3	0.0161	0.296	−0.0271	0.278	−0.0120	0.296	0.0002	0.649
Akaike	368.7 20.0				369.4 18.5

Table 2. Contaminated error distributions; two groups; n = 100; 1000 replicates.

10% Normal Contamination, N(0, 1) in the Center and N(50, 100) in the Tails
quartiles	0.25				0.75
	group1		group2		group1		group2
FM-quantile
	relative bias std. d.		relative bias std. d.		relative bias std. d.		relative bias std. d.
beta1	0.0007	0.162	0.0101	0.076	0.0052	0.061	0.0075	0.067
beta2	−0.1422	0.803	0.0865	0.457	−0.0919	0.447	0.0752	0.503
beta3	0.0009	0.290	0.0024	0.151	−0.0067	0.123	0.0101	0.175
Akaike	171.8 8.32				171.5 8.64
FM-expectile
beta1	−0.3055	3.70	0.6246	3.95	0.4283	1.95	0.1866	4.06
beta2	0.8358	9.41	−1.112	13.8	−0.7338	5.07	1.288	13.3
beta3	0.5214	7.35	−0.7367	7.85	−0.6463	3.62	0.9026	8.07
Akaike	502.07 83.40		83.40		406.7 42.6		42.6
10% contaminated student t distributions, t(6) in the Center and t(3) in the Tails;
quartiles	0.25				0.75
	group1		group2		group1		group2
FM-quantile
	relative bias std. d.		relative bias std. d		relative bias std. d.		relative bias std. d.
beta1	0.0034	0.057	0.0103	0.060	0.0085	0.057	0.0118	0.063
beta2	−0.0999	0.441	0.0988	0.436	−0.0881	0.432	0.0863	0.437
beta3	−0.0032	0.113	0.0023	0.120	−0.0133	0.114	−0.0002	0.123
Akaike	171.6 7.94				171.8 8.08
FM-expectile
beta1	−0.0162	0.158	0.0024	0.093	−0.0004	0.112	0.0028	0.071
beta2	0.0417	0.578	−0.0818	0.464	−0.0859	0.403	0.0700	0.397
beta3	−0.0031	0.318	−0.0192	0.185	−0.0072	0.231	0.0117	0.139
Akaike	351.7 19.0				354.4 18.4

Table 3. Summary statistics, n = 26,240.

	Mean	Std. Dev.
math	493.768	84.7263
academic	0.4654867	0.4988168
technical	0.3281504	0.4695487
stratio	9.121897	2.767687
schlsize	682.1716	371.274

Table 4. Finite mixture at the quantiles, n = 26,240.

Location	0.25		0.50 = Mean		0.75
	Coef.	z	Coef.	z	Coef.	z
ΔClass 2	−0.5899	−6.80	−1.397	−2.47	0.2014	1.87
Class 1
math	Coef.	z	Coef.	z	Coef.	z
academic	64.45824	17.75	47.84108	2.91	68.32732	15.92
technical	51.78727	20.66	31.22054	2.31	54.06143	18.37
stratio	3.789255	6.42	3.169064	4.05	3.558821	4.98
schlsize	0.0187885	6.09	0.0278323	6.20	0.0209033	5.72
constant	314.1871	66.01	396.9153	70.34	358.4312	53.76
Class 2
math	Coef.	z	Coef.	z	Coef.	z
academic	86.71918	23.99	204.0552	13.34	86.39819	27.73
technical	63.84786	17.24	175.2696	9.45	63.24565	18.47
stratio	2.606343	4.89	2.250182	1.50	2.579255	5.98
schlsize	0.0090863	2.07	−0.0287043	−1.22	0.0062238	1.59
constant	445.1051	70.89	375.9389	33.63	484.1338	107.46
	log likelihood	Akaike	log likelihood	Akaike	log likelihood	Akaike
	−308,776.3	617,578.7	−820,835.8	1,641,698	−308,039.4	616,104.9
Latent class probability
Class 1	0.64335		0.80175		0.44982
Class 2	0.35665		0.19825		0.55018

Table 5. Summary statistics, year 2022, n = 2970.

	Mean	Std. Dev.
return	−0.02873	0.07589
asset total	12,578.51	111,076.1
liability total	9851.449	100,613.1
capit	1.02 × 10⁷	7.23 × 10⁷
bookmkt	0.000662	0.001517
earning	829.399	4983.197
spreturn	−0.015748	0.00923

Table 6. Estimated coefficients, n = 2970.

	Robust Regression		OLS		Median
return	Coef.	t	Coef.	t	Coef.	t
bookmkt	0.0599	0.10	0.95327	1.05	0.47756	0.71
capit	−5.42 × 10⁻¹¹	−2.47	−4.84 × 10⁻¹¹	−1.51	−6.50 × 10⁻¹¹	−2.77
return1	0.05523	4.41	0.06891	3.77	0.06097	4.55
earning	1.31 × 10⁻⁶	5.62	1.36 × 10⁻⁶	2.92	1.55 × 10⁻⁶	4.55
spreturn	0.57730	5.62	0.78418	5.24	0.56197	5.12
constant	−0.01267	−6.42	−0.01567	−5.45	−0.00890	−4.22
R²/Pseudo R²			0.0174		0.0077