Article

Bayesian Panel Variable Selection Under Model Uncertainty for High-Dimensional Data

Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand
* Author to whom correspondence should be addressed.
Econometrics 2026, 14(1), 3; https://doi.org/10.3390/econometrics14010003
Submission received: 23 October 2025 / Revised: 25 December 2025 / Accepted: 30 December 2025 / Published: 4 January 2026

Abstract

Selecting the relevant covariates in high-dimensional panel data remains a central challenge in applied econometrics. Conventional fixed effects and random effects models are not designed for systematic variable selection under model uncertainty. In addition, many existing models such as LASSO in machine learning or Bayesian approaches like model averaging, Bayesian Additive Regression Trees, and Bayesian Variable Selection with Shrinking and Diffusing Priors have been primarily developed for time series analysis. This paper develops and applies Bayesian Panel Variable Selection (BPVS) models to simulation and empirical applications. These models are designed to assist researchers in identifying which input covariates matter most, while also determining whether their effects should be treated as fixed or random through Bayesian hierarchical modeling and posterior inference, which jointly accounts for variable importance ranking. Both the simulation studies and the empirical application to socioeconomic determinants of subjective well-being show that Bayesian panel models outperform classical models, especially in terms of convergence stability, predictive accuracy, and reliable variable selection. Classical panel models, in contrast, remain attractive for their computational efficiency and simplicity. The Hausman test is used as a robustness check. The study adds an econometric approach for dealing with model uncertainty in high-dimensional panel analysis and offers open-source R 4.5.1 code to support future applications.

1. Introduction

Selecting the most relevant covariates when specifying panel models has become increasingly important in applied econometrics. With the rapid growth of large-scale financial, socioeconomic, and behavioral datasets, academic researchers and data scientists are confronted with high-dimensional panels in which the number of independent variables may be large, their importance uncertain, and their associations with the dependent variable potentially heterogeneous across units and time (Baltagi, 2021; Hsiao, 2014). In such cases, conventional modeling strategies such as pooled ordinary least squares (OLS), fixed effects (FE), and random effects (RE) models have two main limitations. First, they do not provide systematic procedures for variable selection. Second, they are sensitive to model uncertainty. As a result, empirical findings can be fragile, depending heavily on the researcher's prior choice of covariates (Wooldridge, 2010). This underlines the need for a variable selection step in dynamic panel models that goes beyond theoretical considerations. For example, including too many irrelevant regressors can lead to overfitting and unreliable estimation, while excluding important covariates may generate omitted variable bias. Lagged dependent variables introduce additional sources of persistence and endogeneity while expanding the dimensionality of the model. Therefore, the key econometric task is to identify covariates that are both statistically and economically meaningful while appropriately accounting for dynamic dependence and unobserved heterogeneity (Baltagi, 2021).
Several variable selection approaches have been developed outside panel modeling. Machine learning techniques such as the Least Absolute Shrinkage and Selection Operator (LASSO; Tibshirani, 1996) and its extensions shrink coefficients and remove irrelevant predictors. In Bayesian econometrics, methods such as Bayesian model averaging (BMA; Hoeting et al., 1999), Bayesian Additive Regression Trees (BART; Chipman et al., 2010), and Bayesian Variable Selection with Shrinking and Diffusing Priors (BASAD; Narisetty & He, 2014) handle model uncertainty in time series data. However, their ability to handle panel-specific features such as unit heterogeneity and time dependence is limited, and they do not account for hierarchical structures or the distinction between fixed and random effects. In empirical practice, researchers are therefore forced to choose between two extremes. Conventional FE and RE models are computationally efficient and widely applied but lack systematic tools for ranking the importance of covariates or selecting the important variables, whereas the selection methods above rank covariates but ignore the panel structure. This gap demonstrates the need for new methods, and we introduce Bayesian panel variable selection (BPVS) models in this paper to address it. Specifically, the BPVS framework systematically identifies important covariates and determines whether their effects should be treated as fixed or random through flexible prior specifications and posterior inference, thereby explicitly accounting for model uncertainty. These models are designed to help researchers (i) determine the important covariates influencing the dependent variable of the study, (ii) evaluate whether effects should be treated as fixed or random, and (iii) define an appropriate model averaging strategy under model uncertainty in high-dimensional panel data.
This study makes two main contributions to the field of applied econometrics. First, it provides the first Bayesian panel variable selection methods, which make it possible to address model uncertainty in a way that fits the hierarchical and dynamic structure of panel data. Second, the proposed approach effectively identifies important covariates, which is essential not only for correctly interpreting heterogeneity across firms or countries but also for defining the most appropriate panel model averaging procedure. In addition, we show how BPVS models can be applied in practice to uncover the important socioeconomic determinants of subjective well-being across 89 countries. The remainder of this paper is organized as follows. Section 2 introduces the proposed BPVS models. Section 3 presents the simulation studies. Section 4 presents an empirical application to subjective well-being data. Section 5 discusses practical implications, and Section 6 concludes the paper with directions for future research.

2. Bayesian Panel Variable Selection (BPVS) Models

Panel data, which consist of repeated observations for multiple cross-sectional units over time, provide a good basis for analyzing dynamic relationships while accounting for unobserved heterogeneity. In many cases, however, not all covariates have a significant effect on the response, and including irrelevant variables may lead to biased estimates. Panel variable selection is therefore an essential step in empirical model construction: it identifies a subset of the most relevant covariates while properly accounting for both the temporal and cross-sectional dimensions of the data.

2.1. Classical Panel Estimation Framework

Consider a panel dataset of $N$ individuals observed over $T$ periods. Let $y_{it}$ denote the outcome variable for individual $i = 1, \dots, N$ at time $t = 1, \dots, T$, and let $x_{it} = (x_{it,1}, \dots, x_{it,K})$ be the $K$-dimensional vector of explanatory variables. Panel variable selection aims to determine the most relevant covariates affecting the outcome variable $y_{it}$.
The general panel model can be expressed as:
$$y_{it} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{it,k} + \mu_i + \varepsilon_{it}$$
where $y_{it}$ denotes the dependent variable for unit $i$ at time $t$, $x_{it,k}$ represents the $k$-th explanatory variable, $\beta_k$ is the slope parameter, $\mu_i$ is unobserved unit-specific heterogeneity, and $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$ is an error term.
Under the fixed effects (FE) model, $\mu_i$ is treated as a parameter to be estimated, which controls for all time-invariant heterogeneity. Under the random effects (RE) model, it is assumed that
$$\mu_i \sim N(0, \sigma_\mu^2), \quad \mathrm{Cov}(\mu_i, x_{it}) = 0$$
The FE model is estimated using the within transformation, which demeans each variable by its individual-specific mean and then applies the OLS estimator to the demeaned data. Pooled OLS applies the same estimator to the untransformed data; it is computationally simple but fails to account for individual-specific effects. The RE model is estimated with the Generalized Least Squares (GLS) estimator. The choice between FE and RE is generally determined by the Hausman test (Hausman, 1978). More generally, the linear mixed effects (LME) model can accommodate multiple levels of random effects and hierarchical modeling.
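To make the difference between pooled OLS and the within (FE) estimator concrete, the following is a minimal sketch in plain Python. The single-regressor setup, the true slope of 1.5, the correlation between the regressor and the unit effect, and the helper names `ols_slope` and `within_demean` are all illustrative assumptions and are not taken from the paper's code.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def within_demean(values, unit_ids):
    """Within transformation: subtract each unit's own mean."""
    sums, counts = {}, {}
    for v, u in zip(values, unit_ids):
        sums[u] = sums.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - sums[u] / counts[u] for v, u in zip(values, unit_ids)]


# Hypothetical panel: the regressor is correlated with the unit effect mu_i,
# which violates the RE assumption Cov(mu_i, x_it) = 0.
rng = random.Random(0)
N, T, true_beta = 50, 10, 1.5
unit_ids, x, y = [], [], []
for i in range(N):
    mu_i = rng.gauss(0.0, 2.0)
    for _ in range(T):
        x_it = 0.8 * mu_i + rng.gauss(0.0, 1.0)
        y_it = true_beta * x_it + mu_i + rng.gauss(0.0, 1.0)
        unit_ids.append(i)
        x.append(x_it)
        y.append(y_it)

pooled_beta = ols_slope(x, y)  # ignores mu_i, so it absorbs the correlation
fe_beta = ols_slope(within_demean(x, unit_ids), within_demean(y, unit_ids))
print(f"pooled OLS slope: {pooled_beta:.3f}, within (FE) slope: {fe_beta:.3f}")
```

In this sketch the pooled slope is pulled away from the true value because the unit effect is left in the error term, while demeaning removes it before OLS is applied.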

2.2. Bayesian Panel Estimation Framework

While classical panel models offer robust point estimates, the Bayesian approach provides full probabilistic inference on the model parameters. In Bayesian modeling, the coefficients are assigned priors that enable variable selection and shrinkage. Extending the panel model in Equation (1) to a Bayesian framework, we specify:
$$y_{it} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{it,k} + \mu_i + \varepsilon_{it}$$
with normal prior distributions for the regression coefficients:
$$\beta_k \sim N(0, \tau_k^2), \quad \varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$$
where $\tau_k^2$ controls the degree of shrinkage on each coefficient $\beta_k$ and can be tuned to reflect prior beliefs about parameter relevance.
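The role of the prior variance can be illustrated with the textbook conjugate result for a single coefficient. The sketch below assumes a one-regressor model without intercept or unit effects and a known error variance, so that the posterior of the coefficient is normal with precision equal to the data precision plus the prior precision. The function name `posterior_beta` and all numerical values are hypothetical; the paper's models are fitted by MCMC and do not use this closed form.

```python
import random


def posterior_beta(x, y, sigma2, tau2):
    """Posterior mean and variance of beta in y = beta * x + eps,
    with eps ~ N(0, sigma2) known and prior beta ~ N(0, tau2)."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    precision = sxx / sigma2 + 1.0 / tau2  # data precision + prior precision
    mean = (sxy / sigma2) / precision
    return mean, 1.0 / precision


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(30)]
y = [2.0 * a + rng.gauss(0.0, 1.0) for a in x]  # hypothetical true slope of 2

ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
loose_mean, loose_var = posterior_beta(x, y, sigma2=1.0, tau2=100.0)
tight_mean, tight_var = posterior_beta(x, y, sigma2=1.0, tau2=0.01)
print(f"OLS: {ols:.3f}, loose prior: {loose_mean:.3f}, tight prior: {tight_mean:.3f}")
```

A large prior variance leaves the estimate close to OLS, while a small one pulls it toward zero, which is the shrinkage behavior described above.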
The joint posterior distribution is obtained via Bayes' theorem:
$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$$
where $\theta = (\beta_0, \beta_k, \mu, \sigma_\mu^2, \sigma_\varepsilon^2)$ and $D$ denotes the observed data. Posterior inference is carried out using Markov Chain Monte Carlo (MCMC) simulation.
In this study, the posterior predictive distribution of the outcome $y_{it}$ is obtained as:
$$p(y_{it} \mid D) = \int p(y_{it} \mid \theta)\, p(\theta \mid D)\, d\theta$$
It is sampled through posterior draws $\tilde{y}_{it}^{(s)} \sim p(y_{it} \mid \theta^{(s)})$ for $s = 1, \dots, S$.
Predictive performance was evaluated by the Root Mean Squared Error (RMSE) and the Log Pointwise Predictive Density (LPPD), defined as follows:
$$L_0^{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( y_{it} - \hat{y}_{it}^{med} \right)^2}$$
$$L_0^{LPPD} = \sum_{i=1}^{N} \sum_{t=1}^{T} \log \left( \frac{1}{S} \sum_{s=1}^{S} p\left( y_{it} \mid \tilde{y}_{it}^{(s)}, \hat{\sigma} \right) \right)$$
where $\hat{y}_{it}^{med} = \mathrm{median}_s\left( \tilde{y}_{it}^{(s)} \right)$ is the posterior predictive median, $p( y_{it} \mid \tilde{y}_{it}^{(s)}, \hat{\sigma})$ is the predictive density from the posterior draw using the Poisson distribution, and $\hat{\sigma}$ is the residual standard deviation obtained from the model fit.
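The two metrics can be computed directly from a matrix of posterior predictive draws. The sketch below is one possible reading, not the authors' implementation: it assumes a Gaussian density kernel centered on each draw with standard deviation equal to the residual standard deviation (an assumption made here because the outcome is continuous; substitute whichever density the fitted model implies). The toy data and the names `rmse_from_draws` and `lppd_from_draws` are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(y, mean, sd):
    """Gaussian density; the kernel choice is an assumption of this sketch."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def rmse_from_draws(y, draws):
    """draws[s][j] is posterior predictive draw s for observation j."""
    n = len(y)
    medians = [statistics.median(d[j] for d in draws) for j in range(n)]
    return math.sqrt(sum((y[j] - medians[j]) ** 2 for j in range(n)) / n)


def lppd_from_draws(y, draws, sigma_hat):
    """Sum over observations of the log of the draw-averaged density."""
    total = 0.0
    for j in range(len(y)):
        avg = sum(normal_pdf(y[j], d[j], sigma_hat) for d in draws) / len(draws)
        total += math.log(avg)
    return total


rng = random.Random(2)
y_obs = [1.0, 2.0, 3.0, 4.0]
# Toy draws: one set centered on the data, one set centered on zero.
good_draws = [[v + rng.gauss(0.0, 0.3) for v in y_obs] for _ in range(200)]
bad_draws = [[rng.gauss(0.0, 0.3) for _ in y_obs] for _ in range(200)]

rmse_good = rmse_from_draws(y_obs, good_draws)
rmse_bad = rmse_from_draws(y_obs, bad_draws)
lppd_good = lppd_from_draws(y_obs, good_draws, sigma_hat=1.0)
lppd_bad = lppd_from_draws(y_obs, bad_draws, sigma_hat=1.0)
print(rmse_good, rmse_bad, lppd_good, lppd_bad)
```

Lower RMSE and higher LPPD both indicate better predictions, which is why the two metrics enter the importance score with opposite signs.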

2.3. Computing Variable Importance for Variable $x_j$

To estimate the importance of variable $x_j$, we compute how much predictive performance deteriorates when $x_j$ is randomly permuted, which breaks its relationship with $y_{it}$.
Let $X_{\pi_j}$ denote the dataset where the $j$-th variable is permuted across observations:
$$X_{\pi_j} = (x_1, \dots, x_{j,\pi}, \dots, x_p)$$
where $\pi$ is a random permutation of the indices $1, \dots, n$. We then compute posterior predictions under the permuted data:
$$\tilde{y}_{it,\pi_j}^{(s)} = f\left( X_{\pi_j, it}, \theta^{(s)} \right)$$
We evaluate the permuted predictive performance $L_{\pi_j}$ using the same metric, as shown in Equations (9) and (10).
The permutation procedure breaks the relationship between covariate $x_j$ and the outcome $y_{it}$ by randomly reshuffling the values of $x_j$ across observations while keeping all other covariates fixed. This ensures that any predictive contribution of $x_j$ is removed, so that the resulting deterioration in predictive performance can be attributed solely to the loss of information from $x_j$. When predictors are highly correlated, unconditional permutation may distort the joint dependence structure among covariates and lead to biased importance measures. To address this issue, we implement a correlation-based conditional permutation scheme. Specifically, each covariate $x_j$ is permuted only within strata defined by highly correlated variables $C_j = \{ x_k : \rho_{jk} > \tau \}$. This two-level stratification approximately preserves dependence within correlated covariate groups, thereby reducing instability and bias arising from multicollinearity.
The importance score for variable $x_j$ is defined as the change in predictive loss before and after permutation:
$$\mathrm{Imp}(x_j) = \begin{cases} L_0^{LPPD} - L_{\pi_j}^{LPPD}, & \text{if the LPPD metric is used} \\ L_{\pi_j}^{RMSE} - L_0^{RMSE}, & \text{if the RMSE metric is used} \end{cases}$$
To reduce randomness, the permutation is repeated $B$ times:
$$\widehat{\mathrm{Imp}}(x_j) = \frac{1}{B} \sum_{b=1}^{B} \mathrm{Imp}^{(b)}(x_j), \quad \sigma_{\widehat{\mathrm{Imp}}}(x_j) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left( \mathrm{Imp}^{(b)}(x_j) - \widehat{\mathrm{Imp}}(x_j) \right)^2}$$
Hence, the distribution of LPPD permutation scores can be approximated as:
$$\widehat{\mathrm{Imp}}(x_j) \sim N\left( \mu_{\mathrm{Imp}}(x_j), \sigma_{\mathrm{Imp}}^2(x_j) \right)$$
where $\mu_{\mathrm{Imp}}(x_j) = E_b\left[ LPPD_0 - LPPD_{\pi_j}^{(b)} \right]$ and $\sigma_{\mathrm{Imp}}^2(x_j) = \mathrm{Var}_b\left[ LPPD_0 - LPPD_{\pi_j}^{(b)} \right]$.
For the linear mixed effects model, we separate this variance into fixed and random components:
$$\sigma_{\mathrm{Imp}}^2(x_j) = \left( \hat{\sigma}_{\mathrm{Imp}}^{fixed}(x_j) \right)^2 + \left( \hat{\sigma}_{\mathrm{Imp}}^{random}(x_j) \right)^2$$
The empirical 90 percent confidence interval was used:
$$CI_{0.05, 0.95}(x_j) = \left[ Q_{0.05}\left( \mathrm{Imp}^{(1:B)}(x_j) \right), Q_{0.95}\left( \mathrm{Imp}^{(1:B)}(x_j) \right) \right]$$
where $Q_p(\cdot)$ indicates the $p$-th quantile.
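The repeated permutation procedure with the RMSE metric can be sketched as follows. This is a simplified illustration under stated assumptions: a fixed prediction function stands in for the posterior median prediction of a fitted model, the data have no unit effects, and the permutation is unconditional. The names `predict` and `permutation_importance`, the sample size, and the seed are hypothetical.

```python
import math
import random
import statistics


def predict(row):
    """Stand-in for a fitted model's point prediction (assumed known here)."""
    x1, x2, _x3 = row
    return 3.0 * math.sin(x1) + 2.0 * x2 ** 2


def rmse(X, y):
    return math.sqrt(sum((yi - predict(r)) ** 2 for r, yi in zip(X, y)) / len(y))


def permutation_importance(X, y, j, B, rng):
    """Mean, sample SD, and empirical 5%/95% quantiles of the RMSE increase
    over B random permutations of column j."""
    baseline = rmse(X, y)
    scores = []
    for _ in range(B):
        column = [r[j] for r in X]
        rng.shuffle(column)
        X_perm = [r[:j] + (c,) + r[j + 1:] for r, c in zip(X, column)]
        scores.append(rmse(X_perm, y) - baseline)
    cuts = statistics.quantiles(scores, n=20, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),
        "q05": cuts[0],
        "q95": cuts[-1],
    }


rng = random.Random(3)
X = [(rng.gauss(0, 1), rng.uniform(-1, 1), rng.gauss(5, 2)) for _ in range(400)]
y = [predict(r) + rng.gauss(0, 1) for r in X]

names = ["x1", "x2", "x3"]
importance = {
    name: permutation_importance(X, y, j, B=20, rng=rng)
    for j, name in enumerate(names)
}
for name in names:
    print(name, importance[name])
```

Because the stand-in predictor ignores the third column, permuting it leaves the RMSE unchanged, so its importance is zero by construction in this toy setting.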
Then, the variable importance for observation $(i, t)$ was computed based on the RMSE and LPPD metrics, as shown in Equations (17) and (18).
$$\mathrm{Imp}_{it}^{(b), RMSE}(x_j) = \left( y_{it} - \tilde{y}_{it,\pi_j}^{med,(b)} \right)^2 - \left( y_{it} - \tilde{y}_{it}^{med} \right)^2$$
$$\mathrm{Imp}_{it}^{(b), LPPD}(x_j) = \log \hat{p}_{it}^{(0)} - \log\left( \hat{p}_{it,\pi_j}^{(0)} \right)$$
where $\hat{p}_{it}^{(0)} = \frac{1}{S} \sum_{s} p\left( y_{it} \mid \tilde{y}_{it}^{(s)}, \hat{\sigma} \right)$. When predictors are correlated, unconditional shuffling may break their joint dependence. We therefore implemented a conditional permutation by permuting $x_j$ within bands defined by the correlated variables $C_j = \{ x_k : \rho_{jk} > \tau \}$, which form a two-level stratification:
$$s_i = \mathbb{1}\left( \mathrm{rowMean}\left( \{ x_{i,k} : k \in C_j \} \right) > \text{median of rowMeans} \right)$$
$$x_{j,\pi \mid C_j} = \mathrm{permute}\left( x_j \mid s_i \in \{ low, high \} \right)$$
Here $x_j$ is permuted within each stratum $s_i$, so that dependencies within $C_j$ are approximately preserved. This can lessen the potential bias from multicollinearity.
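The two-level stratified permutation can be sketched in a few lines. In this illustration the set of correlated columns is taken as given rather than selected by a correlation threshold, and the function name `conditional_permute` and the toy data are hypothetical.

```python
import random
import statistics


def conditional_permute(xj, correlated_cols, rng):
    """Permute xj only within the low/high strata defined by the row means
    of the correlated columns (split at the median of those row means)."""
    n = len(xj)
    k = len(correlated_cols)
    row_means = [sum(col[i] for col in correlated_cols) / k for i in range(n)]
    cutoff = statistics.median(row_means)
    strata = [rm > cutoff for rm in row_means]  # False = low, True = high
    permuted = list(xj)
    for level in (False, True):
        positions = [i for i in range(n) if strata[i] == level]
        sources = positions[:]
        rng.shuffle(sources)
        for src, dst in zip(sources, positions):
            permuted[dst] = xj[src]  # values move only within the stratum
    return permuted, strata


rng = random.Random(4)
z = [rng.gauss(0.0, 1.0) for _ in range(100)]
xj = [0.9 * v + rng.gauss(0.0, 0.3) for v in z]  # xj is correlated with z
xj_perm, strata = conditional_permute(xj, [z], rng)
```

Because values of the permuted covariate never cross the median split, observations with high values of the correlated group keep receiving values drawn from the same group, which is how the scheme retains part of the dependence structure.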
Once all $\widehat{\mathrm{Imp}}(x_j)$ are computed, the predictors are ranked by their relative importance:
$$\mathrm{Rank}(x_j) = \mathrm{rank}\left( \widehat{\mathrm{Imp}}(x_j) \right)$$

3. Simulation Studies

3.1. Data Generation and Model Specification

The primary objective of the simulation experiment is to evaluate the performance of the Bayesian fixed effects (BFE), Bayesian random effects (BRE), classical fixed effects (FE), classical random effects (RE), classical pooled ordinary least squares (OLS), and linear mixed effects (LME) models, and thereby to identify the models best suited to panel variable selection. We simulate panel datasets with $N = 100$ and $200$ units observed over $T = 10$ and $20$ time periods. Three covariates are generated as $x_1 \sim N(0, 1)$, $x_2 \sim \mathrm{Uniform}(-1, 1)$, and $x_3 \sim N(5, 2)$, with unit-level random intercepts $\mu_i \sim N(0, \sigma_\mu^2)$, where $\sigma_\mu = 2$. The response variable $y_{it}$ follows a nonlinear data-generating process:
$$y_{it} = 3 \sin\left( x_{1,it} \right) + 2 x_{2,it}^2 + \mu_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, 1)$$
This specification is designed to test the ability of panel variable selection to perform under both nonlinear covariate effects and cross-sectional dependence, mimicking realistic scenarios. Because $x_{3,it}$ does not enter the data-generating process, it also lets us check whether the models wrongly attribute an effect on the simulated $y$ to $x_3$.
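The data-generating process can be reproduced with a short simulation. The sketch below rests on two readings that are assumptions: the second covariate is taken to be uniform on (-1, 1), and the second parameter of the third covariate's distribution is taken to be a standard deviation. The function name `simulate_panel` and the seed are hypothetical.

```python
import math
import random


def simulate_panel(N, T, sigma_mu=2.0, seed=0):
    """Simulate y_it = 3 sin(x1) + 2 x2^2 + mu_i + eps_it as a list of rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(N):
        mu_i = rng.gauss(0.0, sigma_mu)  # unit-level random intercept
        for t in range(T):
            x1 = rng.gauss(0.0, 1.0)
            x2 = rng.uniform(-1.0, 1.0)  # assumption: Uniform(-1, 1)
            x3 = rng.gauss(5.0, 2.0)     # assumption: 2 is the standard deviation
            eps = rng.gauss(0.0, 1.0)
            y = 3.0 * math.sin(x1) + 2.0 * x2 ** 2 + mu_i + eps  # x3 is excluded
            rows.append({"unit": i, "time": t, "x1": x1, "x2": x2, "x3": x3, "y": y})
    return rows


panel_small = simulate_panel(N=100, T=10)
panel_large = simulate_panel(N=200, T=20, seed=1)
print(len(panel_small), len(panel_large))
```

The third covariate is generated but never used in the outcome equation, so any importance a method assigns to it is spurious.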
We considered both classical and Bayesian hierarchical models. The classical panel methods comprised pooled ordinary least squares (pooled OLS), within (fixed effects), and random effects models implemented using the “plm” R package, while the linear mixed effects model was estimated using the “lme4” R package. The Bayesian hierarchical panel models were implemented via the “brms” R package with the “cmdstanr” R package as backend. These Bayesian models incorporated both fixed and random effects to allow a direct comparison with the classical FE and RE models. In addition, Bayesian FE and RE provide flexible modeling of nonlinear relationships through the inclusion of important covariates. Weakly informative priors were applied to let the data govern inference, and posterior distributions were sampled using two Markov chains of 1000 iterations each, with 500 warm-up iterations.
The performance of the classical and Bayesian models was systematically evaluated along multiple dimensions. Goodness-of-fit was computed using $R^2$. Parameter estimates were evaluated by comparing estimated means against the true values used in the simulation. The stability of the Bayesian models was assessed across repeated simulation runs, and diagnostic checks were conducted to verify that model assumptions were adequately satisfied. Residual analyses focused on normality and homoscedasticity to detect potential misspecifications or violations that could affect inference. In addition, several criteria, namely prediction accuracy, residual normality, residual homoscedasticity, model fit quality, convergence, computational efficiency, and interpretability, were used to assess the practical feasibility of each modeling approach.
In the simulations, variable importance was measured through a permutation-based method adapted for Bayesian modeling. This approach permutes the values of each input predictor while preserving correlations among covariates exceeding a threshold of 0.3, thereby isolating the contribution of individual covariates to predictive performance. To ensure stability, each permutation was repeated 20 times, and results were summarized using means, standard deviations, and 90% credible intervals. Local importance metrics at the observation level further facilitated the detection of unit-specific contributions.

3.2. Simulation Results

This subsection presents the estimation results for two simulated panels. Table 1 reports the estimation results from the classical panel models, including the pooled OLS, fixed effects (FE), random effects (RE), and linear mixed effects (LME) estimators, for simulated datasets with $N = 100$, $T = 10$ and $N = 200$, $T = 20$. In the smaller sample, the covariate $x_1$ consistently exhibits a strong positive and significant effect on $y$ across all classical models. The nonlinear term $I(x_2^2)$ is significant in every model, which recovers the intended quadratic relationship. In contrast, $x_3$ shows a negative and insignificant influence on $y$. Goodness-of-fit measures show that pooled OLS and classical RE perform best compared to the FE and LME models. For the second panel (simulation 2), the results remain robust and consistent. The estimated effect of $x_1$ increases marginally to approximately 1.84 across all models, while $I(x_2^2)$ remains statistically significant with coefficients near 1.95. The covariate $x_3$ continues to exhibit a weak and insignificant effect on $y$. The model fit improves slightly for the larger dataset, showing stable performance as the sample size rises. Table 2 shows the posterior estimates from the Bayesian fixed and random effects models for the same simulated datasets. The results are remarkably consistent with those obtained from the classical estimators. For both sample sizes, the posterior means of $x_1$ and $I(x_2^2)$ are statistically significant and closely approximate their true values, with small posterior standard errors. The posterior distributions offer full uncertainty quantification, and the convergence diagnostics and effective sample sizes indicate stable and reliable Bayesian inference.
Model diagnostics and the predictive performance of each model are provided in Table A1 (see Appendix A) and Table 3. The results indicate that the Bayesian models achieve excellent predictive accuracy, while computational efficiency favors the classical models. Overall, the comparison highlights the strength of Bayesian panel modeling, so we carry out panel variable selection using the Bayesian fixed effects and random effects models. While the BFE model also performs well, BRE is the best-performing model for the simulated panels. Figure 1 and Figure 2 present the simulated variable importance rankings and corresponding posterior mean importance scores under the Bayesian fixed effects and Bayesian random effects models. The results show that $x_1$ and $x_2$ consistently exhibit substantially higher importance than $x_3$, particularly under the random effects specification. This pattern is fully consistent with the data-generating process, in which $x_1$ and $x_2$ enter the outcome equation, while $x_3$ has no causal effect. Figure A1 and Figure A2, provided in Appendix B, compare the parameter estimates for fixed effects and random effects across the two simulated panels. They confirm that $x_3$ has no significant causal effect on $y$ in our models. Next, we estimate Bayesian PVS models to check whether the simulated variable importance is recovered, and the results confirm that only $x_1$ and $x_2$ are important covariates. To further validate the choice of the Bayesian random effects model, we conducted the Hausman test as a robustness check (see Table 4). For both panels, the chi-squared statistics are small and provide only weak evidence against the null hypothesis that the random effects estimator is consistent. Thus, these results point to the BRE model as the optimal choice for panel variable selection. Methodologically, the simulation studies confirm that Bayesian panel models provide reliable inference under unobserved heterogeneity.
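For intuition on the Hausman robustness check, the sketch below computes the statistic for a single coefficient, where it reduces to the squared difference between the FE and RE estimates divided by the difference of their variances and is compared with a chi-squared distribution with one degree of freedom. The input numbers are made up for illustration and are not taken from Table 4; the function name `hausman_scalar` is hypothetical.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient.

    H = (b_fe - b_re)^2 / (var_fe - var_re), compared with chi-squared(1).
    For one degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    """
    var_diff = var_fe - var_re
    if var_diff <= 0.0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates: FE and RE slopes that are nearly identical.
h_stat, p_val = hausman_scalar(b_fe=1.00, b_re=0.99, var_fe=0.0020, var_re=0.0018)
print(f"H = {h_stat:.3f}, p-value = {p_val:.4f}")
```

A small statistic with a large p-value, as in this toy case, means the FE and RE estimates do not differ systematically, so the more efficient RE specification is not rejected.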

4. Empirical Study

This section presents the empirical application of the proposed Bayesian Panel Variable Selection (BPVS) models to examine the effects of socioeconomic factors on subjective well-being across 89 countries. The analysis focuses on identifying the most influential determinants of subjective well-being, also referred to as temporary happiness in line with Pastpipatkul and Ko (2025), while accounting for cross-country heterogeneity, time variation, and model uncertainty. To validate the findings of the simulation studies, we first compared the performance of the classical and Bayesian estimators through model diagnostics before proceeding to the identification of important variables using the Bayesian fixed effects and random effects models.

4.1. Data and Preliminary Analysis

In this study, we used secondary data obtained from the World Happiness Report and the World Bank. Table 5 provides the variable symbols, their measurement units, and data sources for full transparency. The dependent variable, subjective well-being, measures individual-level life satisfaction on a point scale and is widely accepted in cross-country happiness research. The independent variables were selected to reflect both economic and social dimensions of well-being. Economic indicators include per capita GDP (INCOME) and the real GDP growth rate (GDPR), which capture the dynamics of individual and national wealth. Social indicators include healthy life expectancy at birth (HEALTH), freedom to make life choices (FREEDOM), social support (SUPPORT), generosity (GENEROSITY), and perception of corruption (CORRUPTION). Macroeconomic control variables are consumer price inflation (INFLATION) and unemployment (UNEMPLOYMENT). The initial inclusion of these explanatory variables is justified by theory and prior empirical research. The existing empirical literature has frequently stated that subjective well-being is multidimensional: it is influenced not only by income and GDP growth but also by health, social, governance, and household conditions.
Table A3 (see Appendix A) shows the correlation matrix of the covariates. INCOME and GDPR are highly negatively correlated (−0.988), which may cause multicollinearity. SUPPORT and HEALTH show moderate positive correlation (0.692). Most other correlations are low, suggesting limited linear dependence. Multicollinearity can reduce the precision of coefficient estimates. The Bayesian Panel Variable Selection (BPVS) model helps address this issue. By selecting the most important variables, it reduces redundancy and improves model stability. This ensures more reliable estimation of the effects of socioeconomic factors on subjective well-being. Table A4 (see Appendix A) presents the estimation results from classical panel models, including pooled OLS, fixed effects, random effects, and linear mixed effects models. Across these models, FREEDOM, HEALTH, and CORRUPTION consistently show strong associations with subjective well-being. INCOME, GDPR, SUPPORT, GENEROSITY, INFLATION, and UNEMPLOYMENT exhibit varying significance depending on the model specification. Table A5 (see Appendix A) reports the results from Bayesian fixed effects (BFE) and Bayesian random effects (BRE) models. Bayesian estimates provide posterior means, credible intervals, and convergence diagnostics. FREEDOM and CORRUPTION remain the most influential factors for SWB. HEALTH and SUPPORT also show positive effects, while INFLATION and UNEMPLOYMENT have negative associations.
Table A2 (see Appendix A) summarizes the comparative diagnostics of the models. Bayesian models (BFE/BRE) exhibit excellent predictive accuracy, stable convergence, and high interpretability. Classical models show moderate predictive performance and some issues with residual homoscedasticity. Computational efficiency is higher for classical models, but Bayesian models provide richer uncertainty quantification and better handling of cross-country heterogeneity. Overall, Bayesian models outperform classical estimators in terms of predictive accuracy, parameter uncertainty quantification, and model reliability. These results demonstrate the value of Bayesian variable selection for identifying the most important determinants of SWB.

4.2. Empirical Results

Table 6 presents the variable importance ranking from the Bayesian fixed effects (BFE) and Bayesian random effects (BRE) models. HEALTH and FREEDOM are consistently identified as the top two determinants of subjective well-being (SWB) in both models. SUPPORT ranks third in BFE but drops to ninth in BRE, highlighting the effect of model specification on variable importance. CORRUPTION, INCOME, INFLATION, UNEMPLOYMENT, GDPR, and GENEROSITY are ranked lower, which indicates relatively smaller effects on SWB. These differences illustrate the value of BPVS in accounting for model uncertainty, as it identifies which variables are the most influential factors driving SWB.
Next, we conducted a Hausman test (Hausman, 1978), which is widely used to choose between the fixed effects and random effects specifications. Table 7 reports the test results. The chi-squared statistic is 21.28 with 9 degrees of freedom and a p-value of 0.01146. The corresponding minimum Bayes factor (mBF) is 0.1392, which indicates weak Bayesian evidence against the null hypothesis. The test therefore supports the use of fixed effects for this dataset. However, the Hausman test only evaluates the overall choice between FE and RE, and it may be unreliable if effects vary across countries. BPVS does not require pre-selecting FE or RE. It determines for each variable whether a fixed or random effect is more appropriate and ranks variable importance. The BPVS results show that HEALTH and FREEDOM have strong effects across countries, whereas the effects of INCOME, UNEMPLOYMENT, and GDPR vary more across countries. This highlights which factors are universally important and which are context-specific, so BPVS captures both robust and heterogeneous effects. Following the Hausman test results, we nevertheless estimate the effects of the top four independent variables on subjective well-being with fixed effects. Table A6 (see Appendix A) shows the Bayesian fixed effects estimation results.
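The reported minimum Bayes factor can be related to the p-value through one common calibration, the bound mBF = -e · p · ln(p), valid for p < 1/e. Whether the authors used exactly this calibration is an assumption; the sketch simply shows that it reproduces the reported value from the reported p-value. The function name `minimum_bayes_factor` is hypothetical.

```python
import math


def minimum_bayes_factor(p_value):
    """Lower bound on the Bayes factor in favor of the null: -e * p * ln(p).

    The bound is only defined for 0 < p < 1/e.
    """
    if not 0.0 < p_value < 1.0 / math.e:
        raise ValueError("the -e * p * ln(p) bound requires 0 < p < 1/e")
    return -math.e * p_value * math.log(p_value)


mbf = minimum_bayes_factor(0.01146)  # p-value reported for the Hausman test
print(f"minimum Bayes factor: {mbf:.4f}")
```

Read this way, the data are at most about seven times more likely under the alternative than under the null (1 / 0.1392), which is why a p-value near 0.01 translates into fairly modest evidence on the Bayes factor scale.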
We found that FREEDOM is the strongest positive and significant factor. This indicates that the ability to make life choices greatly improves temporary happiness. Policies promoting personal autonomy may therefore enhance the overall well-being of populations. HEALTH also has a positive and significant effect. Good physical and mental health supports daily functioning and life satisfaction, so investments in healthcare and preventive programs can strengthen SWB. SUPPORT contributes positively and significantly. Good social networks improve resilience and life satisfaction, and strengthening social safety nets and community programs can enhance SWB. CORRUPTION has a negative and insignificant effect. High perceived corruption undermines trust and reduces life satisfaction, so anti-corruption policies and transparent governance are important for maintaining SWB. These findings are also consistent with prior work by Pastpipatkul and Ko (2025), who find that social factors are as important as money for improving overall self-reported life satisfaction across economies at different income levels.

5. Discussion

The Bayesian Panel Variable Selection (BPVS) models offer a practical solution for high-dimensional and nonlinear data analysis. They are designed to help researchers identify the most important covariates while accounting for unobserved heterogeneity and model uncertainty. Unlike classical fixed effects or random effects models, BPVS systematically ranks variables and quantifies their relevance using posterior inclusion probabilities and permutation-based posterior predictive importance measures. This makes it easier to distinguish strong predictors from irrelevant covariates. The method also accommodates both fixed and random effects, which provides the flexibility to model unit-specific differences across individuals, firms, or countries. Both the simulation studies and the empirical study show that Bayesian models achieve stable convergence and accurate parameter estimates. They also maintain strong predictive performance even under nonlinear covariate relationships and cross-sectional dependence. The empirical results for subjective well-being further demonstrate the practical value of the model. Social and economic factors such as freedom, health, and social support emerge as the top determinants, which highlights the importance of social factors for subjective well-being. The approach handles multicollinearity by reducing redundancy among correlated independent variables, which improves the robustness of model inference. For applied researchers, the method is an efficient tool because it integrates nonlinear variable selection, hierarchical modeling, and uncertainty quantification. In sum, this variable importance ranking method has great potential to bridge a gap in applied econometrics for Bayesian inference, model averaging, and variable selection in high-dimensional panel data.

6. Concluding Remarks

This paper develops Bayesian Panel Variable Selection (BPVS) models for high-dimensional and nonlinear panel datasets in order to address model uncertainty and multicollinearity. The approach determines the most relevant covariates and classifies whether their effects should be fixed or random. The study thereby adds a new method to applied econometrics that addresses model uncertainty, hierarchical structure, and cross-sectional heterogeneity simultaneously in panel studies. We encourage econometricians and applied researchers to extend this work in several directions. Researchers could apply these models in policy analysis, behavioral economics, or health economics, where high-dimensional panels are common. Data and R code are provided in the supplementary files. We suggest developing open-source R packages to make the BPVS models widely accessible. Such a package could integrate ready-to-use functions for Bayesian variable selection, permutation-based variable importance, and model diagnostics, which would streamline applied research. In addition, future studies can combine BPVS with other machine learning techniques, nonparametric methods, or maximum likelihood estimation to handle nonlinearity more flexibly. These extensions could establish Bayesian panel variable selection as a standard tool in econometric modeling.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/econometrics14010003/s1.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, visualization, writing—original draft, writing—review & editing, P.P. and H.K.; resources, supervision, project administration, funding acquisition, P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Faculty of Economics, Chiang Mai University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors used only secondary data obtained from publicly available sources.

Acknowledgments

The corresponding author gratefully acknowledges the Faculty of Economics, Chiang Mai University, for the full support during the doctoral course in economics under the CMU Presidential Scholarships. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The diagnostics of models from simulation studies.

N = 100; T = 10

| Test | BFE | BRE | FE | RE | LME | Best Model |
|---|---|---|---|---|---|---|
| Prediction Accuracy | Excellent | Excellent | Good | Very Good | Good | BFE/BRE |
| Residual Normality | Good | Better | Moderate | Good | Good | BRE |
| Residual Homoscedasticity | Moderate | Good | Some Issues | Good | Good | Mixed |
| Model Fit Quality | High | Very High | Good | Excellent | High | RE |
| Convergence | Stable | Stable | Stable | Stable | Stable | Mixed |
| Computational Efficiency | Moderate | Good | Excellent | Excellent | Excellent | FE/RE |
| Interpretability | High | Moderate | Very High | High | Moderate | FE |

N = 200; T = 20

| Test | BFE | BRE | FE | RE | LME | Best Model |
|---|---|---|---|---|---|---|
| Prediction Accuracy | Excellent | Excellent | Good | Very Good | Good | BFE/BRE |
| Residual Normality | Good | Better | Moderate | Good | Good | BRE |
| Residual Homoscedasticity | Moderate | Good | Some Issues | Good | Good | Mixed |
| Model Fit Quality | High | Very High | Good | Excellent | High | RE |
| Convergence | Stable | Stable | Stable | Stable | Stable | Mixed |
| Computational Efficiency | Moderate | Good | Excellent | Excellent | Excellent | FE/RE |
| Interpretability | High | Moderate | Very High | High | Moderate | FE |
Notes: “Excellent”, “Very Good”, “Very High”, “Good”, “High”, “Moderate”, “Stable”, and “Better” classifications indicate comparative diagnostic performance of the model. Bayesian models show excellent predictive precision.
Table A2. The diagnostics of models from empirical study.

| Test | BFE | BRE | FE | RE | Best Model |
|---|---|---|---|---|---|
| Prediction Accuracy | Excellent | Excellent | Good | Very Good | BFE/BRE |
| Residual Normality | Good | Better | Moderate | Good | BRE |
| Residual Homoscedasticity | Moderate | Good | Some Issues | Good | Mixed |
| Model Fit Quality | High | Very High | Good | Excellent | RE |
| Convergence | Stable | Stable | Stable | Stable | Mixed |
| Computational Efficiency | Moderate | Good | Excellent | Excellent | FE/RE |
| Interpretability | High | Moderate | Very High | High | FE |
Notes: “Excellent”, “Very Good”, “Very High”, “Good”, “High”, “Moderate”, “Stable”, and “Better” classifications indicate comparative diagnostic performance of the model. Bayesian models show excellent predictive precision.
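Table A3. The correlation matrix among covariates.

| | Intercept | Income | Gdpr | Health | Freedom | Support | Generosity | Corruption | Inflation |
|---|---|---|---|---|---|---|---|---|---|
| Income | 0.115 | | | | | | | | |
| Gdpr | −0.135 | −0.988 | | | | | | | |
| Health | −0.904 | −0.34 | 0.050 | | | | | | |
| Freedom | −0.036 | −0.067 | 0.065 | −0.262 | | | | | |
| Support | −0.636 | −0.022 | 0.037 | 0.692 | −0.182 | | | | |
| Generosity | 0.269 | −0.026 | 0.022 | −0.251 | −0.092 | −0.180 | | | |
| Corruption | −0.242 | −0.017 | 0.020 | 0.100 | 0.034 | 0.071 | −0.051 | | |
| Inflation | 0.061 | 0.009 | 0.006 | −0.083 | 0.023 | −0.052 | −0.008 | −0.009 | |
| Unemployment | −0.114 | −0.201 | 0.224 | −0.049 | 0.247 | −0.023 | −0.005 | 0.021 | 0.21 |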
Notes: Values represent Pearson correlation coefficients between covariates. A high correlation (absolute value above 0.7) may indicate potential multicollinearity. INCOME and GDPR show a strong negative correlation. Other correlations are generally low to moderate. BPVS models can reduce multicollinearity by selecting important covariates.
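The 0.7 screening rule in the notes can be applied mechanically. The following Python sketch is only an illustration: the values are transcribed from Table A3, and the variable names (`columns`, `rows`, `flagged`) are hypothetical.

```python
# Lower-triangular entries transcribed from Table A3
# (row label -> values in the order of `columns`).
columns = ["Intercept", "Income", "Gdpr", "Health", "Freedom",
           "Support", "Generosity", "Corruption", "Inflation"]
rows = {
    "Income": [0.115],
    "Gdpr": [-0.135, -0.988],
    "Health": [-0.904, -0.34, 0.050],
    "Freedom": [-0.036, -0.067, 0.065, -0.262],
    "Support": [-0.636, -0.022, 0.037, 0.692, -0.182],
    "Generosity": [0.269, -0.026, 0.022, -0.251, -0.092, -0.180],
    "Corruption": [-0.242, -0.017, 0.020, 0.100, 0.034, 0.071, -0.051],
    "Inflation": [0.061, 0.009, 0.006, -0.083, 0.023, -0.052, -0.008, -0.009],
    "Unemployment": [-0.114, -0.201, 0.224, -0.049, 0.247, -0.023, -0.005, 0.021, 0.21],
}

THRESHOLD = 0.7  # screening threshold stated in the table notes

# Collect every pair whose absolute correlation exceeds the threshold.
flagged = [
    (row, columns[j], value)
    for row, values in rows.items()
    for j, value in enumerate(values)
    if abs(value) > THRESHOLD
]
for row, col, value in flagged:
    print(f"{row} vs {col}: {value}")
```

Among the covariate pairs, only Gdpr and Income exceed the threshold; the remaining flagged entry involves the intercept, and the Support and Health pair (0.692) falls just below the cutoff.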
Table A4. The estimation results from classical panel models for the socioeconomic factors on subjective well-being.

| Variable | Pooled OLS | Classical Fixed Effects | Classical Random Effects | Linear Mixed Effects |
|---|---|---|---|---|
| | Estimate (Est. Error) | Estimate (Est. Error) | Estimate (Est. Error) | Estimate (Est. Error) |
| Intercept | −1.53842355 (0.26522153) *** | −1.53842355 (0.26522153) *** | 2.86311937 (0.32283410) *** | 4.2099028 (0.3435049) [12.256] |
| Income | −0.01065499 (0.01682268) | −0.03616362 (0.01590007) * | −0.00626134 (0.01604104) | −0.0242188 (0.0156817) [−1.544] |
| Gdpr | −0.00130581 (0.01680997) | 0.03728959 (0.01582592) * | 0.00833363 (0.01599315) | 0.0257590 (0.0156181) [1.649] |
| Health | 0.09898864 (0.00348756) *** | 0.00142637 (0.00529746) | 0.03509723 (0.00479972) *** | 0.0139637 (0.0050377) [2.772] |
| Freedom | 1.77289128 (0.15008940) *** | 1.29447149 (0.11818652) *** | 1.25948822 (0.12262170) *** | 1.2757611 (0.1176892) [10.840] |
| Support | 0.09277490 (0.00877809) *** | −0.00441317 (0.00667276) | 0.02670812 (0.00655309) *** | 0.0071415 (0.0065105) [1.097] |
| Generosity | 0.20853481 (0.10625875) * | 0.18932375 (0.09979860) | 0.04338223 (0.10112106) | 0.1291903 (0.0985866) [1.310] |
| Corruption | −0.66627108 (0.07936454) *** | −0.23095567 (0.06228039) *** | −0.27885220 (0.06464032) *** | −0.2452482 (0.0620348) [−3.953] |
| Inflation | −0.00355258 (0.00090031) *** | −0.00237520 (0.00056376) *** | −0.00292763 (0.00059292) *** | −0.0025793 (0.0005637) [−4.576] |
| Unemployment | −0.01183866 (0.00368910) ** | −0.03866514 (0.00415799) *** | −0.03420293 (0.00417719) *** | −0.0371079 (0.0041007) [−9.049] |
| $R^2$ | 0.6255 | 0.19389 | 0.21092 | 0.1321 |
Notes: Standard errors are in parentheses. Significance codes *** p < 0.001; ** p < 0.01; * p < 0.05.
Table A5. Bayesian fixed effects and Bayesian random effects estimation results.

Bayesian Fixed Effects

| Variable | Posterior Estimate (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | −1.54 (0.25) | −1.99–(−1.05) | 1.00 | 707 | 564 |
| Income | −0.01 (0.02) | −0.04–0.02 | 1.00 | 555 | 539 |
| Gdpr | −0.00 (0.02) | −0.03–0.03 | 1.00 | 548 | 494 |
| Health | 0.10 (0.00) | 0.09–0.11 | 1.00 | 824 | 910 |
| Freedom | 1.78 (0.15) | 1.49–2.07 | 1.00 | 799 | 562 |
| Support | 0.09 (0.01) | 0.08–0.11 | 1.00 | 1077 | 627 |
| Generosity | 0.21 (0.10) | −0.00–0.40 | 1.01 | 925 | 701 |
| Corruption | −0.67 (0.08) | −0.82–(−0.51) | 1.00 | 908 | 461 |
| Inflation | −0.00 (0.00) | −0.01–(−0.00) | 1.01 | 906 | 634 |
| Unemployment | −0.01 (0.00) | −0.02–(−0.00) | 1.00 | 992 | 793 |
| $\sigma$ | 0.66 (0.01) | 0.63–0.68 | 1.00 | 1052 | 724 |

Bayesian Random Effects

| Variable | Posterior Estimate (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | 4.16 (0.38) | 3.47–4.94 | 1.01 | 157 | 321 |
| Income | −0.02 (0.02) | −0.05–0.01 | 1.00 | 277 | 466 |
| Gdpr | 0.02 (0.02) | −0.01–0.06 | 1.00 | 285 | 498 |
| Health | 0.01 (0.01) | 0.00–0.02 | 1.01 | 158 | 336 |
| Freedom | 1.27 (0.12) | 1.05–1.50 | 1.00 | 454 | 550 |
| Support | 0.01 (0.01) | −0.01–0.02 | 1.00 | 229 | 360 |
| Generosity | 0.13 (0.10) | −0.06–0.32 | 1.00 | 235 | 518 |
| Corruption | −0.24 (0.06) | −0.36–(−0.11) | 1.01 | 469 | 580 |
| Inflation | −0.00 (0.00) | −0.00–(−0.00) | 1.00 | 1050 | 771 |
| Unemployment | −0.04 (0.00) | −0.05–(−0.03) | 1.01 | 445 | 630 |
| $\sigma$ | 0.36 (0.01) | 0.35–0.38 | 1.00 | 578 | 578 |
Notes: Standard errors are in parentheses. The model was estimated using 1000 iterations, 500 warm-up iterations, 2 chains, and a thinning rate of 1. The 95 percent credible intervals are reported for all parameter estimates to show the stability of the parameters and the uncertainty of the estimates. R-hat values close to 1 indicate stable model convergence. Bulk ESS and Tail ESS measure the effective sample sizes of the posterior distribution.
Table A6. Bayesian fixed effects estimation results for the top 4 important variables affecting subjective well-being.

| Variable | Estimate (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | −1.73 (0.21) | −2.15–(−1.31) | 1.00 | 837 | 926 |
| Health | 0.10 (0.00) | 0.09–0.10 | 1.01 | 1052 | 923 |
| Freedom | 2.03 (0.14) | 1.76–2.32 | 1.00 | 694 | 699 |
| Support | 0.09 (0.01) | 0.08–0.11 | 1.00 | 991 | 735 |
| Corruption | −0.70 (0.08) | −0.86–(−0.54) | 1.00 | 806 | 789 |
| $\sigma$ | 0.70 (0.01) | 0.68–0.73 | 1.00 | 1013 | 815 |
Notes: The model was estimated using 1000 iterations, 500 warm-up iterations, 2 chains, and a thinning rate of 1. The 95 percent credible intervals are reported for all parameter estimates to show the stability of the parameters and the uncertainty of the estimates. R-hat values close to 1 indicate stable model convergence. Bulk ESS and Tail ESS measure the effective sample sizes of the posterior distribution.
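To make the R-hat diagnostic in these notes concrete, the following Python sketch computes a basic split R-hat for two chains of 500 post-warm-up draws each, mirroring the sampler settings stated above. It is a simplified version (no rank normalization) and is not the routine used to produce the tables; the function name `split_rhat` and the simulated chains are assumptions for illustration.

```python
import random
import statistics

def split_rhat(chains):
    """Basic split R-hat (no rank normalization) for equal-length chains."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])           # first half of the chain
        halves.append(chain[half:2 * half])   # second half of the chain
    n = len(halves[0])
    # Within-sequence variance: average of the per-half sample variances.
    within = statistics.fmean([statistics.variance(h) for h in halves])
    # Between-sequence variance: n times the variance of the per-half means.
    between = n * statistics.variance([statistics.fmean(h) for h in halves])
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5

rng = random.Random(42)
# Two well-mixed chains drawn from the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(2)]
# Two chains stuck around different values (a convergence failure).
stuck = [[rng.gauss(0.0, 1.0) for _ in range(500)],
         [rng.gauss(3.0, 1.0) for _ in range(500)]]

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
print(round(rhat_mixed, 3), round(rhat_stuck, 3))
```

When the chains explore the same distribution, the between-chain variance is small relative to the within-chain variance and the ratio stays near 1; when the chains disagree, the ratio rises well above 1.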

Appendix B

Figure A1. Parameter estimates comparison for simulation 1.
Figure A2. Parameter estimates comparison for simulation 2.

References

  1. Baltagi, B. H. (2021). Econometric analysis of panel data (6th ed.). Springer.
  2. Chipman, H., George, E., & McCulloch, R. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266–298.
  3. Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.
  4. Helliwell, J. F., Layard, R., Sachs, J. D., De Neve, J.-E., Aknin, L. B., & Wang, S. (Eds.). (2024). World happiness report 2024. University of Oxford.
  5. Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–401.
  6. Hsiao, C. (2014). Analysis of panel data (3rd ed.). Cambridge University Press.
  7. Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford University Press.
  8. Narisetty, N. N., & He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. The Annals of Statistics, 42(2), 789–817.
  9. Pastpipatkul, P., & Ko, H. (2025). Buddhist thought on happiness and income growth relations across varying income countries. Journal of Happiness Studies, 26, 91.
  10. Sellke, T., Bayarri, M., & Berger, J. (2001). Calibration of p-values for testing precise null hypotheses. The American Statistician, 55, 62–71.
  11. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
  12. Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data (2nd ed.). MIT Press.
Figure 1. The variable ranking results from Bayesian fixed effects and Bayesian random effects for simulation 1.
Figure 2. The variable ranking results from Bayesian fixed effects and Bayesian random effects for simulation 2.
Table 1. The estimated results from classical panel models.

N = 100; T = 10

| Simulated Variable | Pooled OLS | Classical Fixed Effects | Classical Random Effects | Linear Mixed Effects |
|---|---|---|---|---|
| | Estimate (Est. Error) | Estimate (Est. Error) | Estimate (Est. Error) | Estimate (Est. Error) |
| Intercept | 0.1047976 (0.2164648) | - | 0.1047976 (0.2164648) | 0.104798 (0.216465) [0.484] |
| $x_1$ | 1.7421047 (0.0730332) *** | 1.7265712 (0.0785378) *** | 1.7421047 (0.0730332) *** | 1.742105 (0.073033) [23.854] |
| $I(x_2)^2$ | 2.3226677 (0.2448289) *** | 2.2104658 (0.2694868) *** | 2.3226677 (0.2448289) *** | 2.322668 (0.244829) [9.487] |
| $x_3$ | −0.0046189 (0.0367853) | −0.0085616 (0.0394481) | −0.0046189 (0.0367853) | −0.004619 (0.036785) [−0.126] |
| $R^2$ | 0.39806 | 0.37855 | 0.39806 | 0.397404 |

N = 200; T = 20

| Simulated Variable | Pooled OLS | Classical Fixed Effects | Classical Random Effects | Linear Mixed Effects |
|---|---|---|---|---|
| | Estimate (Est. Error) | Estimate (Est. Error) | Estimate (Est. Error) | Estimate (Est. Error) |
| Intercept | −0.095822 (0.102455) | - | −0.095822 (0.102455) | −0.09582 (0.10245) |
| $x_1$ | 1.842230 (0.035690) *** | 1.838079 (0.037045) *** | 1.842230 (0.035690) *** | 1.84223 (0.03569) |
| $I(x_2)^2$ | 1.948087 (0.119837) *** | 1.903118 (0.123985) *** | 1.948087 (0.119837) *** | 1.94809 (0.11984) |
| $x_3$ | 0.018729 (0.017621) | 0.021081 (0.018226) | 0.018729 (0.017621) | 0.01873 |
| $R^2$ | 0.42341 | 0.41535 | 0.42341 | 0.42322 |
Notes: Standard errors are in parentheses. Significance codes *** p < 0.001.
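Table 2. The estimated results from Bayesian panel models.

N = 100; T = 10, Bayesian Fixed Effects

| Simulated Variable | Posterior Means (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | 0.09 (0.21) | −0.32–0.53 | 1.00 | 1074 | 724 |
| $x_1$ | 1.74 (0.07) | 1.61–1.88 | 1.00 | 1244 | 886 |
| $I(x_2)^2$ | 2.33 (0.25) | 1.83–2.86 | 1.01 | 1171 | 846 |
| $x_3$ | −0.00 (0.04) | −0.07–0.07 | 1.00 | 1074 | 698 |
| $\sigma$ | 2.29 (0.05) | 2.19–2.40 | 1.00 | 1570 | 814 |

N = 100; T = 10, Bayesian Random Effects

| Simulated Variable | Posterior Means (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | 0.10 (0.20) | −0.31–0.51 | 1.00 | 1948 | 667 |
| $x_1$ | 1.74 (0.07) | 1.60–1.87 | 1.00 | 1905 | 725 |
| $I(x_2)^2$ | 2.33 (0.25) | 1.79–2.85 | 1.00 | 2627 | 634 |
| $x_3$ | −0.00 (0.04) | −0.07–0.07 | 1.00 | 1774 | 792 |
| $\sigma$ | 2.29 (0.05) | 2.19–2.39 | 1.00 | 1450 | 636 |

N = 200; T = 20, Bayesian Fixed Effects

| Simulated Variable | Posterior Means (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | −0.09 (0.11) | −0.29–0.11 | 1.01 | 2374 | 665 |
| $x_1$ | 1.84 (0.04) | 1.77–1.92 | 1.00 | 2049 | 566 |
| $I(x_2)^2$ | 1.95 (0.11) | 1.72–2.17 | 1.00 | 1922 | 740 |
| $x_3$ | 0.02 (0.02) | −0.02–0.05 | 1.01 | 2499 | 618 |
| $\sigma$ | 2.24 (0.03) | 2.19–2.29 | 1.00 | 1915 | 625 |

N = 200; T = 20, Bayesian Random Effects

| Simulated Variable | Posterior Means (Est. Error) | 95% Credible Interval | $\hat{R}$ | Bulk-ESS | Tail-ESS |
|---|---|---|---|---|---|
| Intercept | −0.10 (0.11) | −0.31–0.11 | 1.00 | 1382 | 773 |
| $x_1$ | 1.84 (0.04) | 1.77–1.91 | 1.00 | 1112 | 810 |
| $I(x_2)^2$ | 1.95 (0.12) | 1.72–2.18 | 1.00 | 1153 | 712 |
| $x_3$ | 0.02 (0.02) | −0.02–0.05 | 1.00 | 1224 | 734 |
| $\sigma$ | 2.24 (0.02) | 2.19–2.29 | 1.00 | 1067 | 713 |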
Notes: Standard errors are in parentheses. Credible intervals are reported as lower–upper bounds.
Table 3. Summary of predictive performance of models.

| Model | Simulation 1 (N = 100; T = 10) $R^2$ | Best-Fit Ranking | Simulation 2 (N = 200; T = 20) $R^2$ | Best-Fit Ranking |
|---|---|---|---|---|
| Bayesian Fixed Effects (BFE) | 0.398 | 1 | 0.423 | 2 |
| Bayesian Random Effects (BRE) | 0.398 | 1 | 0.424 | 1 |
| Classical Fixed Effects (FE) | 0.379 | 3 | 0.423 | 2 |
| Classical Random Effects (RE) | 0.398 | 1 | 0.423 | 2 |
| Pooled OLS | 0.398 | 1 | 0.423 | 2 |
| Linear Mixed Effects (LME) | 0.397 | 2 | 0.415 | 3 |
Notes: Higher R 2 values indicate better model fit. The Bayesian RE model shows the best fit.
Table 4. Choosing between fixed effects and random effects for the robustness check.

| Test | Chi-Squared | Degrees of Freedom | p-Value | mBF(p) | Conclusion |
|---|---|---|---|---|---|
| N = 100; T = 10 (Simulation 1) | | | | | |
| Hausman Test | 1.2205 | 3 | 0.7481 | 1 | RE is consistent and efficient (FE is consistent but inefficient) |
| N = 200; T = 20 (Simulation 2) | | | | | |
| Hausman Test | 2.2675 | 3 | 0.5188 | 1 | RE is consistent and efficient (FE is consistent but inefficient) |

Notes: The minimum Bayes factor (mBF) is computed from the corresponding p-values using the formula $mBF(p) = -e \cdot p \cdot \ln p$ for $p < 1/e$ and $mBF(p) = 1$ for $p \geq 1/e$, from Sellke et al. (2001). The interpretation of mBF values follows Jeffreys (1961): an mBF < 0.01 indicates very strong evidence against the null hypothesis (H0), an mBF between 0.01 and 0.10 indicates strong evidence against H0, an mBF between 0.10 and 0.50 indicates moderate evidence against H0, and an mBF > 0.50 indicates weak evidence against H0. Since both mBFs equal 1, there is at most weak evidence against the null hypothesis. Thus, the Hausman test supports the random effects model in both simulations.
Table 5. Symbols, units of measurement, and data sources of variables used in this study.

| Variable | Symbols | Units | Sources |
|---|---|---|---|
| Subjective well-being or life ladder or temporary happiness | SWB | Point | Helliwell et al. (2024) |
| Per capita Gross Domestic Product (PCGDP) | INCOME | log PCGDP | World Bank |
| Real Gross Domestic Product, Annual Growth Rates | GDPR | % | World Bank |
| Healthy Life Expectancy at Birth | HEALTH | Point | Helliwell et al. (2024) |
| Freedom to make life choices | FREEDOM | Point | Helliwell et al. (2024) |
| Social Support | SUPPORT | Point | Helliwell et al. (2024) |
| Generosity | GENEROSITY | Point | Helliwell et al. (2024) |
| Perception of Corruption | CORRUPTION | Point | Helliwell et al. (2024) |
| Consumer Price Inflation Rates | INFLATION | % | World Bank |
| Unemployment % of Total Labor Force | UNEMPLOYMENT | % | World Bank |

Notes: Helliwell et al. (2024) provide open data publicly available at https://worldhappiness.report/data (accessed on 5 July 2025), and the World Bank provides open data publicly available at https://data.worldbank.org.
Table 6. Variable importance ranking using Bayesian fixed effects and Bayesian random effects models.

| Variable | BFE Estimates | BRE Estimates | BFE Variable Ranking | BRE Variable Ranking |
|---|---|---|---|---|
| HEALTH | −34.858 | −43.796 | 1 | 1 |
| FREEDOM | −28.212 | −29.476 | 2 | 2 |
| SUPPORT | 24.326 | 1.091 | 3 | 9 |
| CORRUPTION | −17.817 | −12.809 | 4 | 6 |
| INCOME | 2.773 | 16.720 | 5 | 3 |
| INFLATION | −2.256 | −6.794 | 6 | 7 |
| UNEMPLOYMENT | −1.613 | 16.593 | 7 | 4 |
| GDPR | −1.150 | 16.071 | 8 | 5 |
| GENEROSITY | −0.394 | −3.256 | 9 | 8 |
Notes: The importance of covariates was determined at a threshold of 0.3, and the procedure was replicated 20 times for robustness.
Table 7. Choosing between fixed effects and random effects model using a Hausman test.

| Test | Chi-Squared | Degrees of Freedom | p-Value | mBF(p) | Conclusion |
|---|---|---|---|---|---|
| Hausman Test | 21.2799 | 9 | 0.01146 | 0.1392 | FE is consistent (RE is inconsistent) |

Notes: The minimum Bayes factor (mBF) is computed from the corresponding p-values using the formula $mBF(p) = -e \cdot p \cdot \ln p$ for $p < 1/e$ and $mBF(p) = 1$ for $p \geq 1/e$, from Sellke et al. (2001). The interpretation of mBF values follows Jeffreys (1961): an mBF < 0.01 indicates very strong evidence against the null hypothesis (H0), an mBF between 0.01 and 0.10 indicates strong evidence against H0, an mBF between 0.10 and 0.50 indicates moderate evidence against H0, and an mBF > 0.50 indicates weak evidence against H0. Although the mBF of 0.1392 indicates only moderate Bayesian evidence against the null hypothesis of random effects, the overall result supports the fixed effects model as an appropriate model for this dataset.
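The Hausman p-values and minimum Bayes factors reported in Tables 4 and 7 can be checked with a short calculation. The following Python sketch is an illustration rather than the authors' R code: the function names `chi2_sf` and `min_bayes_factor` are hypothetical, the chi-squared tail probability is computed from the standard incomplete-gamma recurrence for integer degrees of freedom, and the calibration is taken as $-e \cdot p \cdot \ln p$ for $p < 1/e$ and 1 otherwise (Sellke et al., 2001). The inputs are the chi-squared statistics and degrees of freedom given in the two tables.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    z = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(z)), 0.5   # Q(1/2, z) = erfc(sqrt(z))
    else:
        q, a = math.exp(-z), 1.0              # Q(1, z) = exp(-z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def min_bayes_factor(p):
    """Minimum Bayes factor calibration of a p-value."""
    return -math.e * p * math.log(p) if p < 1.0 / math.e else 1.0

# (chi-squared statistic, degrees of freedom) as reported in Tables 4 and 7.
hausman = {
    "Simulation 1": (1.2205, 3),
    "Simulation 2": (2.2675, 3),
    "Empirical study": (21.2799, 9),
}

results = {}
for label, (chi2, df) in hausman.items():
    p = chi2_sf(chi2, df)
    results[label] = (p, min_bayes_factor(p))
    print(f"{label}: p = {p:.5f}, mBF = {results[label][1]:.4f}")
```

Up to rounding, the recomputed values should agree with the reported p-values (0.7481, 0.5188, and 0.01146) and minimum Bayes factors (1, 1, and 0.1392). Both simulation p-values exceed 1/e, so their mBF is capped at 1.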

Share and Cite

MDPI and ACS Style

Pastpipatkul, P.; Ko, H. Bayesian Panel Variable Selection Under Model Uncertainty for High-Dimensional Data. Econometrics 2026, 14, 3. https://doi.org/10.3390/econometrics14010003
