1. Introduction
Poverty is increasingly recognized as a multidimensional phenomenon rather than a purely income-based condition. In Malaysia, the official Multidimensional Poverty Index (MPI) was introduced in 2015 under the Eleventh Malaysia Plan (MP11) as a complement to traditional income-based measures [1]. This framework covers three dimensions, namely education, health, and living-standard deprivations [2,3], emphasizing that poverty is far more complex than low income alone [4,5]. International studies have found that some individuals are multidimensionally deprived even when they are above the income poverty line [6], which underscores the importance of assessing vulnerability across multiple dimensions.
Economists have long argued that household consumption expenditure often provides a more accurate picture of household living standards than income. In particular, households tend to smooth consumption using savings, whereas income may vary seasonally or be under-reported [7,8]. Several local studies have analyzed household poverty and inequality from a consumption expenditure perspective, but mainly at an aggregate level [9,10]. For instance, Ref. [9] analyzed inequality in overall per-capita consumption expenditure and its sociodemographic drivers, while Ref. [10] applied both linear and machine learning models to estimate overall consumption expenditure and its sociodemographic determinants, finding that variables such as age exhibit an inverted U-shaped nonlinear relationship with consumption expenditure.
A more informative analysis examines consumption expenditure at a disaggregated level to better understand household living standards. This represents a monetary view of living standards, where the analysis considers not only purchasing power but also whether households meet essential expenditure needs such as food and housing [11]. Aggregate consumption measures may conceal vulnerable households; for example, a household may be classified as non-poor overall but still be deprived in key areas if a large share of its budget is absorbed by housing costs, constraining expenditure on food. In this study, we refer to this disaggregated composition of household consumption expenditure as the household spending pattern. In addition, multidimensional vulnerability refers to household deprivation assessed jointly across household spending patterns (monetary) and socioeconomic characteristics (non-monetary), aligning closely with the three core dimensions of the United Nations (UN) MPI (standard of living, health, and education). The former captures household expenditure allocations across essential and non-essential categories, including health and education spending, while the latter incorporates income, education level, household size, age, and residential strata as proxies for the standard of living dimension.
Beyond the monetary perspective, recent studies have used sociodemographic variables as predictors in regression models to estimate their effects on consumption expenditure [12,13,14,15]. Nevertheless, a more holistic approach is to use these variables to explain variation in spending behavior. Studies examining the relationship between socioeconomic drivers and spending patterns have gained increasing international attention in recent years [16,17], yet remain limited in the local literature. Existing Malaysian studies either treat overall consumption expenditure as a single dependent variable [9,10] or focus on a single Classification of Individual Consumption According to Purpose (COICOP) category at a time [13,14].
Most of the local and international studies discussed above rely on regression techniques, treating income, consumption expenditure, or related poverty measures as dependent variables and household socioeconomic factors as determinants, with relatively little attention given to clustering approaches [18]. The clustering methods that have been used are predominantly classical parametric techniques such as K-means or hierarchical clustering, which require the number of clusters to be pre-specified or chosen with heuristics such as the elbow criterion [18]. In contrast, Bayesian nonparametric approaches allow the data to determine the number of clusters. A common model in this class is the Dirichlet Process Mixture Model (DPMM), which can be implemented via a stick-breaking construction. Moreover, Bayesian methods provide probabilistic household-to-cluster assignments, capturing uncertainty that frequentist approaches typically overlook. In this context, a DPMM can capture the joint distribution of households’ disaggregated consumption expenditures and group households with similar spending profiles together.
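The stick-breaking construction mentioned above can be illustrated with a minimal sketch. It assumes a truncation to a fixed number of components and a concentration parameter `alpha`; both values, and the function name `stick_breaking_weights`, are illustrative choices and are not taken from this study.

```python
import random


def stick_breaking_weights(alpha, num_components, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process prior.

    Each fraction v_k ~ Beta(1, alpha) breaks a piece off the remaining stick.
    The last component receives whatever is left, so the weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_components - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: assign the leftover mass
    return weights


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, num_components=10, rng=rng)
print([round(w, 3) for w in weights])
```

Smaller values of `alpha` concentrate the mass on the first few components, which is how the prior lets the data favor a small number of active clusters without fixing that number in advance.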
To incorporate socioeconomic variables, we adopt a covariate-dependent DPMM, specifically the single-atom-dependent formulation introduced by [19]. In this framework, the clusters are defined by mixture components (atoms) that form a common set of spending profiles shared by all households, while household assignment probabilities vary with socioeconomic characteristics through a covariate-dependent weight function. Socioeconomic variables are first mapped into a Random Tree embedding layer that captures nonlinear relationships and produces a high-dimensional sparse matrix representation of the covariates. This representation is then transformed into dense vectors and fed into a logistic stick-breaking process that defines the covariate-dependent mixture weights within the DPMM. We refer to this specification as the Random Tree-dependent Dirichlet Process Mixture Model (RT-DPMM). This model enables the segmentation of households into latent spending clusters, addressing two key questions as follows:
The proposed model is novel in combining nonparametric clustering over disaggregated spending variables with socioeconomic covariates integrated through an embedding layer, all within a coherent Bayesian framework. To the best of our knowledge, no prior Malaysian study has employed such a mixture model to link household socioeconomic characteristics with spending patterns. Furthermore, the use of forest-based machine learning models to construct a nonlinear weight function within a unified Bayesian mixture modeling framework remains relatively unexplored in the context of household expenditure pattern analysis.
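To make the covariate-dependent weight function concrete, the following sketch computes logistic stick-breaking weights for a single household from a dense covariate embedding. It is an illustration under stated assumptions rather than the fitted model: the linear predictor, the coefficient values, and the names `lsbp_weights`, `coefs`, and `intercepts` are hypothetical.

```python
import math


def sigmoid(t):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def lsbp_weights(z, coefs, intercepts):
    """Mixture weights from a logistic stick-breaking process.

    z          : dense embedding of one household's covariates
    coefs      : K-1 coefficient vectors (assumed linear predictor per stick)
    intercepts : K-1 intercepts
    Returns K weights that are non-negative and sum to one.
    """
    weights = []
    remaining = 1.0
    for beta, b0 in zip(coefs, intercepts):
        eta = b0 + sum(b * zi for b, zi in zip(beta, z))
        v = sigmoid(eta)  # covariate-dependent break fraction
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # last cluster takes the leftover mass
    return weights


# Hypothetical coefficients and two hypothetical household embeddings.
coefs = [[1.0, 0.0], [0.0, 1.0]]
intercepts = [0.0, 0.0]
weights_a = lsbp_weights([0.0, 0.0], coefs, intercepts)
weights_b = lsbp_weights([1.0, -1.0], coefs, intercepts)
print(weights_a, weights_b)
```

The atoms (cluster spending profiles) are the same for both households; only the weights change with the embedding, which is the sense in which assignment probabilities vary with socioeconomic characteristics.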
In summary, several gaps remain in the literature motivating this research. First, Malaysian poverty studies predominantly rely on aggregate income or consumption measures to assess household vulnerability. Although some studies conceptualize poverty as a multidimensional phenomenon, there is limited work examining disaggregated consumption expenditure to identify multidimensionally deprived households based on spending patterns. Second, Bayesian nonparametric clustering frameworks remain underutilized in this domain, despite their suitability for uncovering latent household groups that correspond to multidimensional vulnerability. This study aims to identify associations between household socioeconomic factors and spending patterns. By integrating disaggregated consumption expenditure features and modeling socioeconomic factors as covariates within the proposed RT-DPMM, this research seeks to fill these gaps and provide interpretable clustering results that can inform targeted policy interventions based on household spending behavior and socioeconomic characteristics.
5. Conclusions
This study develops a novel hybrid machine learning and Bayesian mixture model, RT-DPMM, to analyze the heterogeneity of household spending patterns and examine how socioeconomic characteristics drive these patterns in Malaysia. Unlike traditional parametric clustering methods that rely on a prespecified number of clusters, the proposed nonparametric approach automatically identifies the number of clusters that best represents the spending patterns in the household sample, based on disaggregated spending features and socioeconomic covariates. By interpreting the RT-DPMM through a Random Forest regressor and SHAP surrogate model, the study provides a more transparent, visual account of how complex socioeconomic factors drive the probabilities of households falling into different spending pattern clusters.
The empirical findings reveal that unidimensional income measures are inadequate for assessing household vulnerability. Specifically, households with similar per-capita income exhibit fundamentally different spending patterns and vulnerabilities. Both Balanced Budget Households (Cluster 1) and Basic Essentials-Focused Households (Cluster 3) are vulnerable, low-income clusters. Cluster 1 households are typically larger families with a balanced budget allocation across essential and non-essential items, while Cluster 3 households are smaller families with elderly household heads and a higher budget allocation to essential spending such as Food and Beverages and Clothing and Footwear. Likewise, Luxury Households (Cluster 4) and Mobility and Home-Support Households (Cluster 2), which have better financial status, show distinct spending behaviors. Cluster 4 is characterized by younger household heads and smaller household sizes, with budget allocation focused on luxury spending, while Cluster 2 consists of middle-aged household heads in larger households with spending priorities in Personal Development and Mobility and Connectivity.
In summary, this paper contributes to the existing socioeconomic literature by demonstrating that household vulnerability and standard of living are multidimensional phenomena that cannot be fully captured by income alone. The four distinct spending pattern clusters identified here suggest that policy design must move beyond income-based measures toward diversified social assistance programs. We recommend that both vulnerable clusters (Clusters 1 and 3) receive direct and non-direct cash transfers to support a balanced budget share and thus improve their standard of living. Moreover, targeted healthcare and food security interventions are recommended for the elderly in Cluster 3, as they are mostly low-income earners living in rural areas where health resources are limited. For the financially better-off clusters (Clusters 2 and 4), policy should focus on human capital investment and financial literacy to prevent younger low-income household heads from debt-financed overconsumption.
Methodologically, this paper contributes to the existing literature on Bayesian mixture models by demonstrating that the proposed RT-DPMM is applicable to real-world problems. In the empirical analysis, the RT-DPMM models the nine spending features (which sum to one) and lets socioeconomic covariates drive the posterior probability of household assignment to each spending cluster. As a result, households with similar socioeconomic characteristics are expected to exhibit similar spending patterns. Using an iterative refinement strategy, the RT-DPMM achieves convergence among the stable households.
The findings of this study should be interpreted in light of several limitations. First, from a methodological point of view, although the proposed RT-DPMM effectively segments households into different spending clusters using an iterative refinement strategy, the unstable households that were filtered out remain subject to further investigation. Second, this study used only a single year of household data, HES 2022, and therefore cannot track spending patterns and cluster memberships over time. Additionally, the RF-SHAP surrogate model quantifies covariates’ associations with cluster assignment probabilities rather than causal effects, and it reflects an approximate mapping rather than the RT-DPMM itself. From a socioeconomic view, the four spending pattern clusters identified and their socioeconomic drivers reflect the household structure in HES 2022. Lastly, socioeconomic differences across countries, such as cultural spending norms and urbanization rates, mean that the four identified clusters and the specific policy instruments suggested in this study should not be directly transferred to other settings without further contextual validation.
For future work, the proposed RT-DPMM is not inherently Malaysia-specific and may be extended to other developing countries. Researchers may use similar disaggregated consumption expenditure data on a compositional or proportional scale, where the spending features sum to one. Future studies may also consider integrating other household socioeconomic factors or even macroeconomic indicators as covariates to capture dynamic determinants of spending patterns. In addition, they may incorporate expert knowledge as informative priors on base cluster spending patterns or explore alternative embedding techniques for the LSBP of the RT-DPMM to capture even more complex socioeconomic nonlinearities.
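For researchers preparing such data, the compositional scale referred to above can be obtained by dividing each expenditure category by the household total. The sketch below assumes a dictionary of monthly amounts per category; the amounts are invented for illustration and the helper name `budget_shares` is hypothetical.

```python
def budget_shares(expenditures):
    """Convert category-level spending amounts into shares that sum to one."""
    total = sum(expenditures.values())
    if total <= 0:
        raise ValueError("total expenditure must be positive")
    return {category: amount / total for category, amount in expenditures.items()}


# Hypothetical monthly amounts (RM) for one household across nine groups.
household = {
    "Food and Beverages": 300.0,
    "Clothing and Footwear": 45.0,
    "Housing, Water, Electricity, Gas and Other Fuels": 345.0,
    "Health": 45.0,
    "Insurance and Financial Services": 45.0,
    "Household Operations": 150.0,
    "Mobility and Connectivity": 270.0,
    "Discretionary Spending": 240.0,
    "Personal Development": 60.0,
}
shares = budget_shares(household)
print(round(shares["Food and Beverages"], 3))  # 0.2, since 300 / 1500
```

Working with shares rather than amounts means each household is a point on the simplex, which is the support of the Dirichlet component distributions used for the cluster spending profiles.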
Author Contributions
Conceptualization, E.L. and T.S.O.; methodology, E.L. and T.S.O.; software, E.L.; validation, E.L., T.S.O. and Y.L.; data curation, T.S.O. and Y.L.; writing—original draft preparation, E.L.; writing—review and editing, T.S.O. and Y.L.; visualization, E.L.; supervision, T.S.O. and Y.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Telekom Malaysia Research & Development under Grant RDTC/241111.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used in this study are derived from the Household Expenditure Survey (HES) 2022 administered by the Department of Statistics Malaysia (DOSM) and are not publicly available; access to the data is subject to a formal request to DOSM.
Acknowledgments
The authors would like to thank the Department of Statistics Malaysia (DOSM) for providing the HES 2022 data under the Memorandum of Understanding between DOSM and Multimedia University, which was essential in supporting this research.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| MPI | Multidimensional Poverty Index |
| UN | United Nations |
| RT-DPMM | Random Tree-dependent Dirichlet Process Mixture Model |
| DPMM | Dirichlet Process Mixture Model |
| DDP | Dependent Dirichlet Process |
| LSBP | Logit Stick-Breaking Process |
| NUTS | No-U-Turn Sampler |
| MCMC | Markov Chain Monte Carlo |
| HDI | Highest Density Interval |
| RTE | Random Tree Embedding |
| SVD | Singular Value Decomposition |
| ESS | Effective Sample Size |
| RF | Random Forest |
| SHAP | SHapley Additive exPlanations |
| PDP | Partial Dependence Plot |
| RMSE | Root Mean Squared Error |
| HES | Household Expenditure Survey |
| M-COICOP | Malaysian Classification of Individual Consumption According to Purpose |
| COICOP | Classification of Individual Consumption According to Purpose |
| DOSM | Department of Statistics Malaysia |
| OECD | Organisation for Economic Co-operation and Development |
| PPC | Posterior Predictive Checks |
| STR | Rahmah Cash Contribution |
| SARA | Rahmah Basic Contribution |
| BWS | The Elderly Assistance |
| MP11 | Eleventh Malaysia Plan |
| OLS | Ordinary Least Squares |
References
1. Economic Planning Unit. Eleventh Malaysia Plan, 2016–2020: Anchoring Growth on People; Kementerian Ekonomi: Putrajaya, Malaysia, 2015.
2. Usamah, W.A.W. Deepening Malaysia’s Understanding of Poverty; Khazanah Research Institute: Kuala Lumpur, Malaysia, 2024.
3. World Bank. Multidimensional Poverty in Malaysia: Improving Measurement and Policies in the 2020s; The World Bank: Washington, DC, USA, 2021.
4. OECD. How’s Life? 2020; Organisation for Economic Co-Operation and Development: Paris, France, 2020.
5. UNDP. Unpacking deprivation bundles to reduce multidimensional poverty. In Human Development Perspectives; UNDP: New York, NY, USA, 2022.
6. Dhongde, S.; Haveman, R. A Decade-Long View of Multidimensional Deprivation in the United States; IRP Discussion Paper No. 1440-19; Institute for Research on Poverty, University of Wisconsin–Madison: Madison, WI, USA, 2019.
7. Deaton, A.; Grosh, M. Consumption. In Designing Household Survey Questionnaires; Oxford University Press: New York, NY, USA, 2000.
8. Meyer, B.D.; Sullivan, J.X. Identifying the disadvantaged: Official poverty, consumption poverty, and the new supplemental poverty measure. J. Econ. Perspect. 2012, 26, 111–136.
9. Ayyash, M.; Sek, S.K. Decomposing Inequality in Household Consumption Expenditure in Malaysia. Economies 2020, 8, 83.
10. Lee, E.; Ong, T.S.; Lee, Y. Evaluating household consumption patterns: OLS and random forest regression models. HighTech Innov. J. 2024, 5, 489–507.
11. World Bank. Beyond monetary poverty. In Poverty and Shared Prosperity 2018: Piecing Together the Poverty Puzzle; World Bank: Washington, DC, USA, 2018; pp. 87–120.
12. Cheah, Y.K.; Su, T.T.; Adzis, A.A. Cross-Sectional Analysis of Expenditure on Fruits and Vegetables. Int. J. Inst. Econ. 2024, 16, 53–78.
13. Ismail, N.A.; Daud, L.; Mohd, S.; Samat, N.; Ridzuan, A.R. Consumption pattern determinants of low-income household: Evidence from Malaysia. J. Ekon. Malays. 2023, 57, 31–45.
14. Applanaidu, S.D.; Abdul-Adzis, A.; Jan, S.J.; Abidin, N.Z. Socio-Economics Factors Affecting B40 Households Food Expenditure in Malaysia. J. Posit. Sch. Psychol. 2022, 6.
15. Agyepong, L.; Kuuwill, A.; Kimengsi, J.N.; Darfor, K.N.; Ampomah, S.; Evans, K.; Gbogbolu, A.; Attado, G.N.; Charles, A.K. Household Consumption Expenditure Determinants Across Poverty Subgroups in Sub-Sahara Africa. J. Poverty 2024, 30, 26–51.
16. Piekut, M.; Knapkova, M. Patterns and convergence in household spending. Amfiteatru Econ. 2025, 27, 180.
17. Yüksel, E.; Başar, D. Household Consumption Expenditures in Türkiye: Socio-Economic Determinants, Spending Patterns, and Policy Perspectives. J. Res. Econ. Polit. Financ. 2025, 10, 467–483.
18. Abdul Rahman, M.; Sani, N.S.; Hamdan, R.; Ali Othman, Z.; Abu Bakar, A. A clustering approach to identify multidimensional poverty indicators for the bottom 40 percent group. PLoS ONE 2021, 16, e0255312.
19. Denti, F.; Camerlenghi, F.; Guindani, M.; Mira, A. A common atoms model for the Bayesian nonparametric analysis of nested data. J. Am. Stat. Assoc. 2023, 118, 405–416.
20. Boehmke, B.; Greenwell, B. Hands-On Machine Learning with R, 1st ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2020.
21. United Nations. Classification of Individual Consumption According to Purpose (COICOP); United Nations Statistics Division: New York, NY, USA, 2026; Available online: https://unstats.un.org/unsd/classifications/coicop (accessed on 4 May 2026).
22. OECD. The OECD List of Social Indicators; OECD Publishing: Paris, France, 1982.
23. Carroll, C.; Slacalek, J.; Tokuoka, K.; White, M.N. The distribution of wealth and the marginal propensity to consume. Quant. Econ. 2017, 8, 977–1020.
24. Almås, I.; Beatty, T.K.M.; Crossley, T.F. Lost in Translation: What Do Engel Curves Tell Us about the Cost of Living? SSRN Electron. J. 2018, 1–57.
25. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
26. Ren, L.; Du, L.; Dunson, D.B. Logistic stick-breaking process. J. Mach. Learn. Res. 2011, 12, 713–739.
27. Sethuraman, J. A Constructive Definition of Dirichlet Priors; Technical Report; Florida State University: Tallahassee, FL, USA, 1991.
28. MacEachern, S.N. Dependent Nonparametric Processes; American Statistical Association: Alexandria, VA, USA, 2000.
29. Stephens, M. Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B 2000, 62, 795–809.
30. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97.
31. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
32. Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2013.
33. Vehtari, A.; Gelman, A.; Simpson, D.; Carpenter, B.; Bürkner, P.C. Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with discussion). Bayesian Anal. 2020, 16, 667–718.
34. Ishwaran, H.; James, L.F. Gibbs Sampling Methods for Stick-Breaking Priors. J. Am. Stat. Assoc. 2001, 96, 161–173.
35. Müller, P.; Quintana, F. Random partition models with regression on covariates. J. Stat. Plan. Inference 2010, 140, 2801–2808.
36. Wade, S.; Inácio, V. Bayesian Dependent Mixture Models: A Predictive Comparison and Survey. Stat. Sci. 2025, 40, 81–108.
37. Houthakker, H.S. An international comparison of household expenditure patterns. Econometrica 1957, 25, 532.
38. Deaton, A.; Muellbauer, J. Economics and Consumer Behaviour; Cambridge University Press: Cambridge, UK, 1980.
39. Banks, J.; Blundell, R.; Lewbel, A. Quadratic Engel Curves and Consumer Demand. Rev. Econ. Stat. 1997, 79, 527–539.
40. Barca, V.; Brook, S.; Holland, J.; Otulana, M.; Pozarny, P. Qualitative Research and Analyses of the Economic Impacts of Cash Transfer Programmes in Sub-Saharan Africa: Synthesis Report; PtoP Project Report; Food and Agriculture Organization of the United Nations (FAO): Rome, Italy, 2015; Available online: https://www.fao.org/3/i3616e/i3616e.pdf (accessed on 4 May 2026).
41. Daidone, S.; Davis, B.; Handa, S.; Winters, P. The Household and Individual-Level Productive Impacts of Cash Transfer Programs in Sub-Saharan Africa. Am. J. Agric. Econ. 2019, 101, 1401–1431.
42. Dauda Goni, M.; Aroyehun, A.B.; Abdul Razak, S.; Drammeh, W.; Abbas, M.A. Food insecurity in Malaysia: Assessing the impact of movement control order during COVID-19. Nutr. Food Sci. 2024, 54, 1202–1218.
43. Banerjee, A.; Hanna, R.; Olken, B.A.; Satriawan, E.; Sumarto, S. Electronic Food Vouchers: Evidence from an At-Scale Experiment in Indonesia. Am. Econ. Rev. 2023, 113, 514–547.
44. Jih, J.; Stijacic-Cenzer, I.; Seligman, H.K.; Boscardin, W.J.; Nguyen, T.T.; Ritchie, C.S. Chronic disease burden predicts food insecurity among older adults. Public Health Nutr. 2018, 21, 1737–1742.
45. Gajda, R.; Jeżewska-Zychowicz, M. The importance of social financial support in reducing food insecurity among elderly people. Food Secur. 2021, 13, 717–727.
46. Arsenijevic, J.; Pavlova, M.; Rechel, B.; Groot, W. Catastrophic Health Care Expenditure among Older People with Chronic Diseases in 15 European Countries. PLoS ONE 2016, 11, e0157765.
47. Wan, Y.S.; Cheng, N.F.L. Social Assistance in Malaysia: Who Benefits, and Who Misses Out; World Bank: Washington, DC, USA, 2026.
48. Hamid, T.A. Population Ageing in Malaysia: A Mosaic of Issues, Challenges and Prospects; Universiti Putra Malaysia Press: Serdang, Malaysia, 2015.
49. Cheah, Y.K.; Meltzer, D. Ethnic Differences in Participation in Medical Check-ups Among the Elderly. J. Gen. Intern. Med. 2020, 35, 2680–2686.
50. Park, A.; Sawada, Y. Human Capital Investment and Economic Growth; Asian Development Bank: Mandaluyong City, Philippines, 2018.
51. Shafee, N.B.; Mohamed, Z.S.S.; Suhaimi, S.; Hashim, H.; Mohd, S.N.H. Credit Card and Compulsive Buying Behavior Among the Generation Z (Gen Z) in Malaysia. In Technology and Business Model Innovation: Challenges and Opportunities; Alareeni, B., Elgedawy, I., Eds.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2024; Volume 926, pp. 213–222.
52. Sabri, M.F.; Wahab, R.; Mahdzan, N.S.; Magli, A.S.; Rahim, H.A. Mediating Effect of Financial Behaviour on the Relationship Between Perceived Financial Wellbeing and Its Factors Among Low-Income Young Adults in Malaysia. Front. Psychol. 2022, 13, 858630.
53. Kaiser, T.; Lusardi, A.; Menkhoff, L.; Urban, C. Financial education affects financial knowledge and downstream behaviors. J. Financ. Econ. 2021, 145, 255–272.
Figure 1.
Overview of the two-phase RT-DPMM framework: Phase 1 performs initial clustering and filters unstable households; Phase 2 refits the model on stable households and interprets cluster profiles via posterior analysis and SHAP.
Figure 2.
Schematic diagram of the RT-DPMM specification. Arrows denote data flow and probabilistic dependencies. The socioeconomic covariates are used to compute the RTE-SVD embedding and thus the LSBP mixture weights, while Dirichlet component priors define cluster distributions; both pathways become inputs into the mixture likelihood.
Figure 3.
Cumulative explained variance plot of SVD components.
Figure 4.
Bar chart showing the number of households assigned with 95% HDI per cluster.
Figure 5.
The posterior predictive plot.
Figure 6.
Radar charts illustrating the distinct spending patterns of the four active clusters. The solid lines represent the posterior mean while the shaded regions indicate the 95% HDI.
Figure 7.
Parallel coordinate plot comparing the spending features across four active clusters. Each colored path tracks the component centroids of a cluster across the nine spending features, with transparent bands denoting the 95% HDI to quantify uncertainty.
Figure 8.
Feature importance bar plots (mean absolute SHAP values) for four active clusters, indicating the relative importance of each socioeconomic factor.
Figure 9.
SHAP partial dependence plots (PDPs) for four active clusters across four socioeconomic covariates: (i) per capita income, (ii) household size, (iii) education level, and (iv) age. Panels (a–d) correspond to Clusters 1–4 and visualize the nonlinear effect of household covariates on cluster assignment probabilities. (a) Cluster 1: low-income households; SHAP values decline with income and increase with household size. (b) Cluster 2: mid-range income (RM3000–9000) and medium household size drive positive assignment, peaking at ages 30–40. (c) Cluster 3: strongly associated with very low income, small household size and higher education. (d) Cluster 4: higher income and smaller households positively predict assignment.
Table 1.
Descriptive statistics of initial selected household heads (N = 14,268).
| Features | Mean (RM) | Std. (RM) | Zero Values (%) |
|---|---|---|---|
| Food and Beverages | 324.54 | 175.14 | 0.01 |
| Alcoholic Beverages and Tobacco | 418.52 | 357.89 | 33.94 |
| Clothing and Footwear | 48.45 | 41.86 | 0.08 |
| Housing, Water, Electricity, Gas and Other Fuels | 418.52 | 357.89 | 0.00 |
| Furnishing, Household Equipment, Routine Household Maintenance | 74.74 | 91.31 | 0.18 |
| Health | 48.38 | 68.47 | 2.80 |
| Transport | 182.99 | 151.44 | 0.27 |
| Information and Communication | 116.82 | 100.20 | 0.40 |
| Recreation, Sport and Culture | 45.97 | 90.04 | 11.16 |
| Education | 16.12 | 43.36 | 49.76 |
| Restaurant and Accommodation Services | 260.85 | 241.94 | 0.22 |
| Insurance and Financial Services | 57.25 | 101.82 | 13.09 |
| Personal Care, Social Protection and Miscellaneous Goods and Services | 105.07 | 114.17 | 0.09 |
| Per Capita Income | 2837.95 | 2104.27 | 0.00 |
| Education Level | 3.73 | 1.74 | 0.00 |
| Household Size | 3.87 | 1.93 | 0.00 |
| Age | 47.07 | 13.67 | 0.00 |
| Strata (Urban reference) | 0.71 | 0.45 | 0.00 |
Table 2.
Descriptive statistics of final selected household heads (N = 11,374).
| Features | Mean | Std. |
|---|---|---|
| M-COICOP (per capita scale) (RM) |
| Food and Beverages | 0.203 | 0.087 |
| Clothing and Footwear | 0.031 | 0.019 |
| Housing, Water, Electricity, Gas and Other Fuels | 0.226 | 0.090 |
| Health | 0.027 | 0.028 |
| Insurance and Financial Services | 0.032 | 0.036 |
| Household Operations | 0.104 | 0.056 |
| Mobility and Connectivity | 0.176 | 0.062 |
| Discretionary Spending | 0.167 | 0.084 |
| Personal Development | 0.037 | 0.042 |
| Socioeconomic |
| Income (per capita scale) | ∼0.000 | ∼1.00 |
| Education Level | ∼0.000 | ∼1.00 |
| Household Size | ∼0.000 | ∼1.00 |
| Age | ∼0.000 | ∼1.00 |
| Strata (Urban reference) | 0.710 | 0.454 |
Table 3.
Parameter settings used in RTE and SVD.
| Embedding | Parameter | Value | Number of Output Features |
|---|---|---|---|
| RTE | Number of Trees | 200 | 5356 |
| RTE | Maximum Number of Leaf Nodes (max_leaf_nodes) | 32 | 5356 |
| SVD | Number of Components | 50 | 50 |
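The embedding step summarized in Table 3 can be sketched with scikit-learn. This is a minimal illustration under stated assumptions: the source does not name the library, so the use of `RandomTreesEmbedding` and `TruncatedSVD` is an assumption prompted by the `max_leaf_nodes` parameter name, and the synthetic covariate matrix `X` is a stand-in for the real standardized covariates.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomTreesEmbedding

# Synthetic stand-in for five socioeconomic covariates (not the HES data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))

# Random Tree Embedding: each household is encoded by the leaf it reaches
# in every tree, giving a sparse one-hot matrix with one column per leaf.
rte = RandomTreesEmbedding(n_estimators=200, max_leaf_nodes=32, random_state=0)
leaves = rte.fit_transform(X)

# Truncated SVD compresses the sparse leaf indicators into dense vectors.
svd = TruncatedSVD(n_components=50, random_state=0)
Z = svd.fit_transform(leaves)

# Cumulative explained variance, the quantity a plot like Figure 3 reports.
cumulative = np.cumsum(svd.explained_variance_ratio_)
print(leaves.shape, Z.shape)
```

The number of sparse columns depends on how many leaves the trees actually grow, so it will generally differ from the 5356 reported for the real data; with 200 trees and at most 32 leaves each it cannot exceed 6400.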
Table 4.
Summary statistics for the component centroids of the active clusters (k = 1, 2, 3, 4).
| Features | Mean | Std. | HDI (2.5%) | HDI (97.5%) | ESS (Bulk) | ESS (Tail) | R-Hat |
|---|---|---|---|---|---|---|---|
| Cluster 1 |
| Food and Beverages | 0.234 | 0.002 | 0.230 | 0.238 | 4369 | 5832 | 1.00 |
| Clothing and Footwear | 0.185 | 0.002 | 0.181 | 0.188 | 5384 | 6479 | 1.00 |
| Housing, Water, Electricity, Gas and Other Fuels | 0.042 | 0.001 | 0.041 | 0.043 | 7628 | 6420 | 1.00 |
| Health | 0.025 | 0.000 | 0.024 | 0.026 | 7314 | 6102 | 1.00 |
| Insurance and Financial Services | 0.028 | 0.000 | 0.027 | 0.029 | 6250 | 6770 | 1.00 |
| Household Operations | 0.103 | 0.001 | 0.101 | 0.105 | 7119 | 6131 | 1.00 |
| Mobility and Connectivity | 0.180 | 0.001 | 0.177 | 0.182 | 6985 | 6074 | 1.00 |
| Discretionary Spending | 0.175 | 0.002 | 0.171 | 0.179 | 6081 | 6345 | 1.00 |
| Personal Development | 0.029 | 0.001 | 0.028 | 0.030 | 4361 | 5817 | 1.00 |
| Cluster 2 |
| Food and Beverages | 0.147 | 0.003 | 0.141 | 0.152 | 5832 | 6435 | 1.00 |
| Clothing and Footwear | 0.194 | 0.004 | 0.187 | 0.201 | 5942 | 7549 | 1.00 |
| Housing, Water, Electricity, Gas and Other Fuels | 0.039 | 0.001 | 0.037 | 0.041 | 8092 | 6998 | 1.00 |
| Health | 0.031 | 0.001 | 0.029 | 0.033 | 7403 | 5838 | 1.00 |
| Insurance and Financial Services | 0.044 | 0.002 | 0.042 | 0.047 | 6390 | 6522 | 1.00 |
| Household Operations | 0.116 | 0.003 | 0.112 | 0.121 | 7474 | 5961 | 1.00 |
| Mobility and Connectivity | 0.186 | 0.003 | 0.181 | 0.191 | 7950 | 6785 | 1.00 |
| Discretionary Spending | 0.174 | 0.004 | 0.166 | 0.181 | 6193 | 6692 | 1.00 |
| Personal Development | 0.068 | 0.003 | 0.062 | 0.074 | 4673 | 6647 | 1.00 |
| Cluster 3 |
| Food and Beverages | 0.280 | 0.003 | 0.274 | 0.286 | 7117 | 5873 | 1.00 |
| Clothing and Footwear | 0.262 | 0.004 | 0.256 | 0.269 | 5546 | 6916 | 1.00 |
| Housing, Water, Electricity, Gas and Other Fuels | 0.037 | 0.001 | 0.035 | 0.038 | 7862 | 6200 | 1.00 |
| Health | 0.026 | 0.001 | 0.025 | 0.027 | 7905 | 6089 | 1.00 |
| Insurance and Financial Services | 0.025 | 0.001 | 0.024 | 0.026 | 7831 | 6677 | 1.00 |
| Household Operations | 0.091 | 0.002 | 0.088 | 0.094 | 7980 | 6482 | 1.00 |
| Mobility and Connectivity | 0.158 | 0.002 | 0.154 | 0.162 | 6065 | 5485 | 1.00 |
| Discretionary Spending | 0.098 | 0.002 | 0.094 | 0.103 | 5644 | 7167 | 1.00 |
| Personal Development | 0.022 | 0.001 | 0.021 | 0.023 | 7091 | 4907 | 1.00 |
| Cluster 4 |
| Food and Beverages | 0.115 | 0.003 | 0.110 | 0.120 | 7970 | 7108 | 1.00 |
| Clothing and Footwear | 0.297 | 0.006 | 0.286 | 0.309 | 4630 | 5822 | 1.00 |
| Housing, Water, Electricity, Gas and Other Fuels | 0.032 | 0.001 | 0.030 | 0.034 | 6907 | 6327 | 1.00 |
| Health | 0.031 | 0.001 | 0.029 | 0.034 | 7401 | 6190 | 1.00 |
| Insurance and Financial Services | 0.046 | 0.001 | 0.043 | 0.048 | 6996 | 6888 | 1.00 |
| Household Operations | 0.088 | 0.002 | 0.083 | 0.092 | 6955 | 6647 | 1.00 |
| Mobility and Connectivity | 0.176 | 0.003 | 0.170 | 0.182 | 8063 | 7352 | 1.00 |
| Discretionary Spending | 0.185 | 0.004 | 0.177 | 0.192 | 6862 | 6617 | 1.00 |
| Personal Development | 0.030 | 0.001 | 0.027 | 0.032 | 6023 | 6681 | 1.00 |
Table 5.
Summary of descriptive statistics of five socioeconomic factors of active clusters (k = 1, 2, 3, 4).
| Features | Median | Mean | Std |
|---|---|---|---|
| Cluster 1 (N = 2883) |
| Age | 46.70 | 46.69 | 0.24 |
| Education Level | 3.27 | 3.27 | 0.03 |
| Household Size | 5.28 | 5.28 | 0.04 |
| Per-capita Income | 1718.49 | 1721.09 | 25.95 |
| Strata (Urban ref.) | 0.50 | 0.50 | 0.01 |
| Cluster 2 (N = 642) |
| Age | 45.08 | 45.09 | 0.48 |
| Education Level | 4.38 | 4.38 | 0.07 |
| Household Size | 5.02 | 5.01 | 0.09 |
| Per-capita Income | 3961.85 | 3973.26 | 133.32 |
| Strata (Urban ref.) | 0.92 | 0.92 | 0.02 |
| Cluster 3 (N = 977) |
| Age | 50.77 | 50.78 | 0.44 |
| Education Level | 3.31 | 3.32 | 0.06 |
| Household Size | 4.01 | 4.01 | 0.07 |
| Per-capita Income | 1527.83 | 1529.11 | 32.02 |
| Strata (Urban ref.) | 0.39 | 0.39 | 0.02 |
| Cluster 4 (N = 628) |
| Age | 41.27 | 41.28 | 0.53 |
| Education Level | 4.77 | 4.76 | 0.06 |
| Household Size | 2.36 | 2.36 | 0.10 |
| Per-capita Income | 6249.73 | 6255.19 | 147.36 |
| Strata (Urban ref.) | 0.98 | 0.98 | 0.01 |
Table 6.
Five-fold and average performance of the four active clusters (k = 1, 2, 3, 4) using R² and RMSE.
| Cluster | Five-Fold R² | Average R² | Five-Fold RMSE | Average RMSE |
|---|---|---|---|---|
| Cluster 1 | [0.9694, 0.9784, 0.9736, 0.9768, 0.9803] | 0.9757 | [0.0543, 0.0461, 0.0508, 0.0465, 0.0442] | 0.0484 |
| Cluster 2 | [0.9565, 0.9656, 0.9680, 0.9697, 0.9693] | 0.9658 | [0.0536, 0.0464, 0.0468, 0.0443, 0.0448] | 0.0472 |
| Cluster 3 | [0.9780, 0.9900, 0.9857, 0.9854, 0.9882] | 0.9854 | [0.0365, 0.0246, 0.0297, 0.0282, 0.0269] | 0.0292 |
| Cluster 4 | [0.9820, 0.9901, 0.9873, 0.9876, 0.9899] | 0.9874 | [0.0340, 0.0253, 0.0292, 0.0279, 0.0255] | 0.0284 |
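For reference, the two metrics in Table 6 and the way the fold values aggregate can be sketched as follows. The metric functions use the standard definitions of R² and RMSE; treating the reported averages as simple arithmetic means over the five folds is an assumption, which the Cluster 1 row is used to check.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Fold-level values for Cluster 1, copied from Table 6.
fold_r2 = [0.9694, 0.9784, 0.9736, 0.9768, 0.9803]
fold_rmse = [0.0543, 0.0461, 0.0508, 0.0465, 0.0442]

avg_r2 = sum(fold_r2) / len(fold_r2)
avg_rmse = sum(fold_rmse) / len(fold_rmse)
print(round(avg_r2, 4), round(avg_rmse, 4))  # 0.9757 0.0484
```

The averages reproduce the Cluster 1 entries in the table, which is consistent with a plain mean across folds. Here the targets are cluster assignment probabilities, so an RMSE near 0.05 is on a 0 to 1 scale.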