Equity Market Structure and Trading Diversification: Insights from Panel Data, Clustering, and Machine Learning

Leogrande, Angelo; Anobile, Fabio; Costantiello, Alberto; Drago, Carlo; Arnone, Massimo

doi:10.3390/ijfs14060150


by Angelo Leogrande ¹,*, Fabio Anobile ¹, Alberto Costantiello ¹, Carlo Drago ² and Massimo Arnone ³

¹ Dipartimento di Management, Finanza e Tecnologia, LUM University Giuseppe Degennaro, 70010 Casamassima, Italy

² Dipartimento di Scienze Economiche, Psicologiche, della Comunicazione, della Formazione e Motorie, Niccolò Cusano University, 00166 Roma, Italy

³ Dipartimento di Scienze Politiche e Sociali, University of Catania, 95131 Catania, Italy

* Author to whom correspondence should be addressed.

Int. J. Financial Stud. 2026, 14(6), 150; https://doi.org/10.3390/ijfs14060150

Submission received: 17 March 2026 / Revised: 1 May 2026 / Accepted: 6 May 2026 / Published: 4 June 2026

Abstract

This paper studies a topic that has so far received little attention: the internal diversification of trading. Rather than looking at aggregate measures of financial development such as market capitalization and liquidity, the study focuses on trading diversification, defined as the portion of trading volume attributed to firms other than the ten most actively traded (VTX). The empirical analysis is based on the World Bank’s Global Financial Development database and covers a cross-country dataset for 2004–2021. Due to limited data availability, the estimation sample is smaller than the full dataset and forms an unbalanced panel. The four main independent variables in the core regression specification relate to financial structure (bank deposits), financial integration (remittances, international public debt), and equity market structure (market capitalization excluding the ten largest firms). A broad range of control variables is introduced into the model to account for macroeconomic conditions, financial development, market size, liquidity, and participation. Lagged regressors are introduced to address persistence, delays, and potential endogeneity issues. The methodology relies on panel data econometrics, hierarchical clustering, and machine learning. The findings show that market structure and remittances are positively associated with trading diversification, whereas bank dominance and international public debt are associated with greater concentration. The results persist across alternative specifications and robustness tests. The country-level analysis shows a core–periphery pattern, while machine learning demonstrates the critical importance of market structure.

Keywords:

stock market diversification; market structure; remittances; financial development; machine learning

JEL Classification:

G12; G15; C23; C38; O16

1. Introduction

Over the past several decades, financial markets have undergone major structural changes driven by globalization, technological development, and the growing intertwining of the real and financial sectors. This led scholars and policymakers to pay less attention to traditional indicators of market development, such as market size and liquidity, and more to market structure, specifically the distribution of trading across firms. In this context, the structure of financial markets—depth, breadth, and concentration—became an important element of financial development, which affects their efficiency and resource allocation (Philippon, 2019; Stulz, 2019). Notwithstanding this change in perspective, the great majority of studies on stock market development still focus on aggregate measures such as market capitalization, value traded, and turnover rates, while paying little attention to the dispersion of trading across firms. However, stock markets tend to be highly concentrated, with a few firms accounting for most trading volume, while smaller firms have little influence on the trading process (Kahraman & Tookes, 2017). At the same time, more inclusive financial systems, associated with greater participation and lower concentration, provide greater resilience and better welfare outcomes (Demirgüç-Kunt et al., 2021). Despite widespread recognition that financial development is multifaceted and influenced by financial structure, market dynamics, and financial innovations, the literature still lacks a coherent framework to explain the relationships among these variables and the distribution of trading activity across firms. Therefore, the paper’s key problem is to identify the structural and financial determinants of trading diversification and how they operate. Trading diversification is measured using a new indicator that estimates the proportion of trading performed by firms other than the top 10 largest by trade volume (VTX). 
It indicates whether trading is concentrated or diversified. The main question addressed in the paper is therefore as follows: what are the key determinants of trading diversification among the dimensions of financial systems? To answer this question, the paper uses an analytical framework based on four important dimensions of financial systems: the relative size of deposit-taking banks (DBS), market capitalization excluding the top ten firms (MCX), remittance inflows (REM), and international public debt stock (IPU). DBS indicates the nature of the financial system (market-based versus bank-based), MCX indicates market dynamics and concentration, REM indicates the importance of financial inclusion, and IPU indicates dependence on international financing. The paper makes three key contributions. First, it suggests a new approach to measuring and analyzing stock market trading diversification, thereby addressing an important yet understudied aspect of financial market development. Second, it contributes to knowledge about the determinants of stock market trading diversification by providing new information about its relationship with financial structure, financial flows, and market dynamics. Third, the paper combines panel econometrics, clustering, and machine learning approaches. In terms of policy, understanding stock market trading diversification is critically important for ensuring efficient, stable, and inclusive financial systems because high levels of trading diversification are associated with greater liquidity and informational efficiency, as well as better welfare and systemic performance.

The remainder of the article is structured as follows. Section 2 reviews the relevant literature on financial structure and trading diversification. Section 3 presents the methodology and data. Section 4 reports the panel regression results and provides the economic interpretation of trading diversification. Section 4.1 discusses robustness checks based on alternative standard errors and inference stability, while Section 4.2 extends the specification by incorporating additional controls and dynamic effects. Section 5 presents the results of the hierarchical clustering analysis and the segmentation of financial structures. Section 6 evaluates machine learning performance and analyzes variable importance. Section 7 integrates the empirical evidence to provide a comprehensive assessment of the determinants of stock market trading diversification. Section 8 discusses the policy implications for promoting broader and more inclusive equity markets. Finally, Section 9 concludes the paper.

2. Literature Review

Financial development and economic performance are two interrelated topics that have been widely researched in the economic literature, and considerable consensus has emerged on the importance of financial structures in contributing to economic growth, efficiency, and the efficient allocation of resources. Fundamental literature notes that financial development positively influences economic performance through improving capital allocation, diversifying risks, and overcoming problems associated with information asymmetry (Levine, 1997, 2005). Moreover, properly functioning financial systems promote savings and allocate funds towards efficient investments, leading to long-run growth and development (Levine et al., 2000; Beck et al., 2000). Such seminal works constitute the theoretical basis for studying financial development and provide the necessary references for the current paper. Accordingly, a fundamental issue in financial systems concerns the financial architecture, which is usually considered along the bank-based versus market-based continuum. The main point is that different financial architectures can have various effects, because they differ in how they organize financial intermediation (Allen & Gale, 2000; Demirgüç-Kunt & Levine, 2001). Bank-based financial systems rely primarily on intermediated finance, whereas market-based systems focus on the role of financial markets in allocating resources. However, another important aspect is that there should not be a single dominant financial system, but rather proper organization and complementarity among the different elements of financial systems. Another stream of financial development research studies its effects on inequality and access to opportunities in the economy. Financial inclusion reduces income inequality and expands opportunities available for both households and firms (Beck et al., 2007). 
This is an important consideration for financial market participation, as broader access to financial services increases the number of participating traders. Despite these developments in the literature, financial development has traditionally been measured by size, using aggregate measures such as credit to the private sector, market capitalization, and other indicators of liquidity. The internal structure of financial markets, including the allocation of capital within them, has not been fully addressed so far. Wurgler (2000) demonstrates that financial development improves capital allocation across firms and industries in financial markets, suggesting that the internal structure of financial markets matters for their performance. Market microstructure theory likewise points to the significance of how activity is distributed within markets, showing that trading is influenced by factors such as liquidity, transaction costs, and informational imperfections. The bid-ask spread measures market liquidity and affects trading volume (Amihud & Mendelson, 1986), whereas informational imperfections make it impossible for markets to achieve full efficiency (Grossman & Stiglitz, 1980). As a result, trading activity can be unequally distributed across assets, with trading concentrated around several large, liquid assets. There is some evidence on this matter, as large firms affect aggregate dynamics in the economy, according to Gabaix (2011). Therefore, trading activity in equity markets might be concentrated among a few assets and firms. In addition, the firm-level financial structure is an important determinant of economic performance. Financial dependence positively affects growth in countries with highly developed financial systems (Rajan & Zingales, 1996), suggesting that financial structure affects both aggregate economic performance and its distribution. 
Moreover, financial liberalization and international financial integration can increase access to financing and broaden market participation (Bekaert et al., 2005), although results vary across countries. The behavior of individual investors further reinforces the concentration of trading activity. Barber and Odean (2000) show that trading activity is not spread across many assets but is instead restricted to a very limited number. Taken together, these developments imply that financial development needs to be studied not only with regard to the size of markets but also their internal structure in terms of trading diversification. In this respect, the composition and distribution of trading among firms in equity markets constitute an important, yet under-researched topic. Recent research extends this framework by analyzing the role of different aspects of financial development (financial market structure, openness, and institutions) and their impact on market performance. However, this research still focuses on aggregate financial markets or specific markets, without addressing the distribution of trading activity. The research gap therefore concerns the absence of a theoretical framework linking financial system structure, market composition, and external financial flows to trading diversification. This study fills the gap by building on a well-established, broadly recognized literature on financial development rather than relying solely on recent findings. First, the novelty of the study lies in introducing trading diversification as a measure of the market’s internal structure, which constitutes a distinct feature of financial development. Second, this study contributes to the literature by providing a more granular perspective on financial development by examining the market composition and distribution of trading activity across firms. 
Finally, the current paper draws on theoretical knowledge from financial-structure theory, market microstructure theory, and the capital allocation literature. See Table 1.

3. Methodology and Data

To achieve the stated research objective, the present study relies on the Global Financial Development Database (GFDD) produced by the World Bank, one of the most common sources in empirical research due to its broad coverage of financial structure, depth, efficiency, and access indicators (Patalano & Roulet, 2020). The selected database provides data on 38 countries for the period 2004–2021, offering up to 684 country-year observations with complete coverage (18 annual observations per country). However, data availability varies significantly, resulting in a large number of missing values and an unbalanced panel dataset. Although all countries are observed over the same reference period, 2004–2021, the effective sample size is considerably reduced by missing data for key variables. Econometric estimations are performed on complete cases (listwise deletion), so that only observations with values for all variables are used. As a result, the baseline models (OLS, Fixed Effects, and Random Effects) include 266 country-year observations with full data. Adding control variables and lags reduces the effective sample further. With the introduction of additional control variables, the sample falls to 244 observations due to missing values in these variables. Dynamic specifications that include lagged variables are estimated on 216 country-year observations, reflecting both data availability problems and the loss of initial periods necessary for lag construction. Missing data are not imputed or interpolated; only country-year observations that are complete in all variables are considered, at the expense of a considerable reduction in the sample size (from the theoretical maximum of 684 observations to only 216 in the most demanding specification). 
The dependent variable VTX is calculated as the value traded by all firms, excluding the top 10 most traded firms, relative to the total value traded by all firms (Yakubu et al., 2023). This measure captures the degree of trading diversification by gauging how much trading activity occurs outside the largest firms. The empirical model comprises four independent variables representing various dimensions of financial structure and external financial integration. Variable selection is based on theoretical considerations within the paper’s framework, which analyzes the relationship between financial structure, market composition, and financial flows on the one hand, and trading diversification on the other. These variables have been chosen because they correspond to specific, non-overlapping aspects of the financial system that are theoretically linked to the process under examination. The empirical specification aims to represent four mutually complementary mechanisms. First, DBS is used to capture the structure of financial intermediation, distinguishing between bank-based and market-based financial systems. Second, MCX is used to measure the internal composition of equity markets. Third, REM is intended to reveal the role of external private financial flows, represented by remittances, in influencing liquidity and participation in the financial market. Fourth, IPU is used to assess the impact of external public finance, reflected in dependence on international borrowing and the possible crowding-out effect. Additional variables included in the model serve to control for macroeconomic conditions (GDPG), financial development (DCP), market size and liquidity (SMC, TOR), as well as alternative channels of participation and resource allocation. 
Specifically, Internet usage (INTU) is used to control for the impact of improved access on participation in the financial markets; the number of listed companies (NLC) reflects market breadth and opportunities for investing in it; and government credit (GOV) reflects public financing and possible crowding-out effects domestically. All of these variables are important because they aim to isolate the theoretical channels, avoiding redundancy and multicollinearity in the models rather than maximizing fit through exhaustive specification searching. The empirical results confirm the rationality of this approach, as the main associations persist across all specifications and robustness checks. Therefore, the empirical specification can be viewed as a direct application of the theoretical model in practice, ensuring its consistency.
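The effect of listwise deletion and lag construction on the sample size can be illustrated with a minimal sketch. The rows below are invented toy values, not GFDD data, and the helper names `complete_cases` and `add_lag` are hypothetical. The sketch also assumes that a lagged regressor replaces its contemporaneous value, which the text does not specify. It uses only the Python standard library to stay self-contained.

```python
# Toy panel: None marks a missing value (illustrative numbers only).
rows = [
    {"country": "A", "year": 2004, "VTX": 40.0, "MCX": 55.0},
    {"country": "A", "year": 2005, "VTX": None, "MCX": 56.0},
    {"country": "A", "year": 2006, "VTX": 42.0, "MCX": 57.0},
    {"country": "B", "year": 2004, "VTX": 30.0, "MCX": 45.0},
    {"country": "B", "year": 2005, "VTX": 31.0, "MCX": 46.0},
    {"country": "B", "year": 2006, "VTX": 33.0, "MCX": None},
]

def complete_cases(rows, cols):
    """Listwise deletion: keep rows with no missing value in any of `cols`."""
    return [r for r in rows if all(r.get(c) is not None for c in cols)]

def add_lag(rows, col, lag=1):
    """Attach the value of `col` observed `lag` years earlier in the same country."""
    by_key = {(r["country"], r["year"]): r for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r["country"], r["year"] - lag))
        new = dict(r)
        new[f"{col}_lag{lag}"] = prev[col] if prev is not None else None
        out.append(new)
    return out

# Static specification: both VTX and MCX must be observed in the same year.
baseline = complete_cases(rows, ["VTX", "MCX"])

# Dynamic specification (assumed form): VTX regressed on lagged MCX.
# The first year of every country has no lag and is dropped.
lagged = complete_cases(add_lag(rows, "MCX"), ["VTX", "MCX_lag1"])

print(len(rows), len(baseline), len(lagged))
```

In this toy panel the six potential observations shrink to four complete cases, and to three once a one-year lag is required. This is the same mechanism that reduces the sample in the paper from 684 potential observations to 266 and then 216.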

The variables used in the paper are standard GFDD indicators, renamed for ease of reference. Definitions follow the GFDD documentation, and the indicators are used as provided in GFDD (as ratios relative to GDP or to totals), without further transformation. Specifically, VTX is value traded excluding the top ten traded companies relative to total value traded; DBS is deposit-taking bank assets relative to assets of deposit-taking banks and central banks; REM is personal remittances received relative to GDP; MCX is market capitalization excluding the top ten companies relative to total market capitalization; and IPU is outstanding international public debt securities relative to GDP. See Table 2.
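To make the definition of VTX concrete, the following sketch computes the share of trading outside the ten most traded firms from firm-level values. This is for illustration only: the paper takes VTX directly from GFDD and does not compute it from firm data. The function name `share_outside_top`, the expression of the result in percent, and the firm values are assumptions.

```python
def share_outside_top(values, top_n=10):
    """Percent of the total accounted for by firms outside the `top_n` largest."""
    total = sum(values)
    if total <= 0:
        raise ValueError("total must be positive")
    top = sum(sorted(values, reverse=True)[:top_n])
    return 100.0 * (total - top) / total

# Hypothetical value traded for 15 listed firms (arbitrary currency units).
value_traded = [50, 40, 30, 20, 10, 10, 10, 10, 10, 10, 5, 5, 5, 3, 2]

vtx = share_outside_top(value_traded)
print(round(vtx, 2))  # the ten largest firms trade 200 of 220, so about 9.09
```

Applying the same function to firm-level market capitalization instead of value traded would give the analogue of MCX.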

In terms of variable selection, the approach remains theoretically guided and parsimonious, including only a few independent variables that represent distinct features of the financial system to avoid overlapping and multicollinearity. In place of selecting a large number of potentially overlapping indicators, the analysis focuses on identifying important structural mechanisms, while the other dimensions, such as financial depth, institutional quality, and macroeconomic fundamentals, are included as covariates. Most importantly, the choice of variables is neither random nor intuitive, but is based on strong theoretical justification for the link between financial structure and external flows to participation in trading. Bank-based financial structures, represented by DBS, are likely to crowd out equity market development and, thus, lower the trading diversification. Increased household income from remittances (REM) is expected to boost investors’ participation in markets, leading to a more even distribution of trading across participants (Bettin et al., 2017; Imran et al., 2019). Market concentration (MCX) represents a measure of trade-off between market concentration and breadth, and, hence, should be considered a relevant and important determinant of trading dispersion. Lastly, reliance on external public debt (IPU) is used as a proxy for external financing, which may hinder domestic financial development.

The initial dataset includes 38 OECD countries. However, due to data availability constraints and the adoption of a listwise deletion approach, the final estimation sample is reduced to 22 countries. The countries included in the econometric analysis are: Australia, Austria, Canada, Chile, Colombia, Germany, Greece, Hungary, Ireland, Israel, Italy, Japan, Korea (Rep.), Luxembourg, New Zealand, Poland, Slovenia, Spain, Switzerland, Turkey, United Kingdom, and the United States. See Figure 1.

Table 3 summarizes the full sample structure, including country coverage, time span, and missing data patterns.

The dataset comprises all 38 countries, which would in principle allow a balanced panel with 18 annual observations per country over the period from 2004 to 2021. However, as discussed above, the actual panel is unbalanced due to unequal data availability, particularly for the two key market-structure variables VTX and MCX (Baltagi, 2021). The first pattern is that missing values for VTX and MCX are common across countries. For example, these variables are not available for several advanced European economies, such as Belgium, France, Finland, and the Netherlands. Similarly, no data on these variables appear for some less developed economies, such as Latvia, Lithuania, and Estonia, indicating a lack of structural data. This shows that measures of trading diversification are missing for some countries, especially those with less developed financial markets. The second pattern refers to the uneven distribution of the missing values over time. In many countries, including Australia, Austria, Italy, and Slovenia, gaps became more visible after 2014. This may indicate changes in accounting practices or other problems in data acquisition (Hughes, 2025). By contrast, countries such as Chile, Colombia, and Germany have fewer missing values over time. Among the other variables, DBS is missing in the later years for Canada and Mexico, while IPU is incomplete for Nordic countries, including Norway and Denmark. Moreover, Luxembourg and Switzerland have missing data across different variables, further reducing the number of valid observations. Overall, the table shows considerable cross-country variation in data availability. As a result, the use of an unbalanced panel is justified, and the listwise deletion method can be considered reasonable (Espinas et al., 2026).

This approach ensures internal consistency and comparability of the empirical analysis while explicitly accounting for the unbalanced nature of the panel.

4. Panel Regression Results and Economic Interpretation of Trading Diversification

Specifically, we have estimated the following equation:

$$VTX_{it} = \alpha + \beta_{1} DBS_{it} + \beta_{2} REM_{it} + \beta_{3} MCX_{it} + \beta_{4} IPU_{it}$$

where $i = 1, \dots, 23$ and $t \in \{2004, \dots, 2021\}$, reflecting the unbalanced panel structure.

This analysis does not aim to establish strict causal relationships, but rather to identify robust empirical associations consistent with theoretical mechanisms. The use of fixed effects controls for time-invariant unobserved heterogeneity across countries, while the inclusion of lagged independent variables helps mitigate potential simultaneity concerns. Although these approaches do not fully eliminate endogeneity, they improve the credibility of the estimated relationships.
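The way fixed effects remove time-invariant country heterogeneity can be sketched with the within transformation: each variable is demeaned by country before the slope is estimated. The example below uses a single regressor and invented numbers, whereas the paper's specification has four regressors. The function names are hypothetical, and the code is a minimal illustration rather than the estimation routine used by the authors.

```python
from collections import defaultdict

def within_demean(groups, values):
    """Subtract the group (country) mean from every observation."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]

def fe_slope(groups, x, y):
    """Fixed-effects (within) slope for one regressor."""
    xd, yd = within_demean(groups, x), within_demean(groups, y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

def pooled_slope(x, y):
    """Pooled OLS slope, ignoring the panel structure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

# Two hypothetical countries: B has both a higher x and a higher level of y,
# but within each country y falls as x rises.
country = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [10.0, 9.0, 8.0, 20.0, 19.0, 18.0]

b_pooled = pooled_slope(x, y)
b_fe = fe_slope(country, x, y)
print(b_pooled, b_fe)
```

In this toy case the pooled slope is positive while the within slope is negative, because the level difference between the two countries dominates the pooled estimate. This is why pooled OLS and FE estimates can differ in sign or magnitude when country effects are correlated with the regressors.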

The results reported in Table 4 provide a comprehensive comparison across three econometric specifications—OLS, Fixed Effects (FE), and Random Effects (RE)—to identify the determinants of stock market trading diversification (VTX). This comparison is not merely methodological but carries important economic implications, as it allows us to distinguish between spurious correlations driven by unobserved heterogeneity and structurally robust relationships. See Table 4.

One of the initial observations concerns the clear distinction between ordinary least squares (OLS) and panel data specifications. Specifically, the panel estimates show a substantially lower R-squared (0.199 compared to 0.731 in OLS). However, this comparison should be approached with caution, as OLS estimation does not account for unobserved country-specific heterogeneity (Cameron & Trivedi, 2010; Kahane, 2024). This can be problematic for the current research question, as structural characteristics (institutional quality, financial development, regulations, etc.) affect both the dependent variable and the regressors. One example concerns the DBS variable, which appears statistically insignificant and close to zero in OLS, while being negative and economically meaningful in all panel estimates. The comparison between the fixed-effects (FE) and random-effects (RE) estimates is central to the analysis. The Hausman test (χ²(4) = 28.39; p < 0.01) rejects the null hypothesis of no systematic difference between the two estimators and therefore favors FE estimation. The rejection indicates that the unobserved country effects are correlated with the regressors. In that case the RE estimator is inconsistent, making the FE model preferable for further economic analysis. Turning to the DBS variable, which represents the relative size of deposit-taking banks, its effect is economically meaningful. The FE estimate shows a negative coefficient (−0.269) that is significant at the 10% level. This result is in line with the theoretical background of the current paper: financial systems based on bank intermediation have lower levels of trading diversification. This is due to lower stock market development and a higher share of bank financing used by companies, as well as the corresponding concentration of trading in a limited set of firms (Demirgüç-Kunt et al., 2013). 
The REM variable, which represents remittance inflows, is another significant factor affecting the degree of trading diversification. In particular, the FE estimate is positive and highly significant (9.486; p < 0.01). This result underscores the importance of external private financial flows in increasing the share of total trading volume accounted for by firms outside the top ten. More specifically, remittances allow households to increase their income and savings and to participate more actively in financial transactions (Combes et al., 2014). The third important variable in the current analysis is MCX. It shows high statistical significance and large coefficient values across all estimated models. For example, in the FE model, it equals 0.499 and is highly significant (p < 0.01). This result supports the hypothesis that the internal equity market structure is significant for trading diversification. It also aligns with the relevant literature on market structure (Bekaert et al., 2013) and with the machine learning results reported in the paper. A high share of market capitalization held by firms other than the top ten companies is associated with a more diversified trading process. The last important explanatory variable in the analysis is the share of international public debt (IPU). The FE estimate of the variable is negative and statistically significant (−1.062; p < 0.01). This indicates a negative relationship between external financing and trading diversification. The economic interpretation of this estimate concerns the lower development of equity markets and the weaker ability of countries relying on external finance to attract private investment (Broner et al., 2014; Eberhardt & Presbitero, 2015; Kose et al., 2021). It should be noted that the signs of the estimates are the same in the FE and RE models. However, there are notable differences in the values of these estimates for the REM and MCX variables. 
These differences are consistent with the Hausman test results and demonstrate the impact of neglecting the correlation between regressors and unobserved country-specific factors on the reliability of estimates (Cameron & Trivedi, 2010). As for the model fit, the within R-squared is 0.185, which is substantially lower than the OLS R-squared (0.731). However, this result is a common property of panel data models with fixed effects when much of the variation is cross-sectional (Kahane, 2024). Thus, approximately 18.5% of the within-country variation in trading diversification is explained by the regressors included in the current model. While this is modest, it is a reasonable result considering the complexity of the topic. Another point to mention is the presence of missing data, which occurs frequently for VTX and MCX. As a result, the estimation is based on 266 observations using the complete-case method. Although this can lead to sample selection bias, the results indicate the robustness of the identified relationships. Overall, the estimates presented in the paper support the main hypotheses regarding the driving factors of trading diversification in stock markets. They include the internal equity market structure represented by the MCX variable, external private financial flows measured by REM, and two factors that work against diversification: DBS and IPU. In this regard, the FE estimation appears to be the most suitable one.
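As a quick check on the Hausman result reported above, the p-value implied by χ²(4) = 28.39 can be computed directly. For an even number of degrees of freedom the chi-square survival function has a closed form, so no statistical library is needed. The function name `chi2_sf_even_df` is a hypothetical helper introduced for this sketch.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hausman statistic and degrees of freedom as reported in the text.
p_value = chi2_sf_even_df(28.39, 4)
print(p_value)  # on the order of 1e-5, well below 0.01
```

The implied p-value is on the order of 10⁻⁵, consistent with the reported p < 0.01 and with rejecting the RE specification in favor of FE.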

4.1. Robustness Checks: Alternative Standard Errors and Inference Stability

The robustness checks reported in Table 5 play a crucial role in validating the empirical findings by addressing potential violations of standard panel data assumptions, namely heteroskedasticity, serial correlation, and cross-sectional dependence.

While the FE baseline model provides consistent estimates, inference depends on whether the standard errors are correctly calculated (Baltagi, 2021). For this reason, Table 5 evaluates the baseline FE model along with two additional variations: FE with country-clustered standard errors and FE with Driscoll–Kraay standard errors, which are robust to heteroskedasticity, serial correlation, and cross-sectional dependence (Driscoll & Kraay, 1998; Hoechle, 2007). First of all, it should be noted that all estimated coefficients are identical across the considered models, with no changes in size or sign, because only the standard errors differ. The estimated economic relationships therefore do not depend on a particular assumption about the error structure, although their statistical significance does. For instance, the estimated DBS coefficient is always negative, consistent with the inverse relationship between bank dominance and trading diversification. According to the baseline FE model, the variable is statistically significant at the 10% level. With clustered standard errors the coefficient is not significant, because the standard errors are larger. This loss of significance likely reflects the conservative nature of clustered standard errors, since applying Driscoll–Kraay standard errors renders the variable statistically significant at the 5% level. Overall, the results indicate that more bank-oriented financial systems have more concentrated trading structures. The coefficient of REM also does not change its sign and remains positively related to diversification regardless of the standard errors used. At the same time, it remains economically significant. Specifically, the coefficient is highly significant according to the baseline and Driscoll–Kraay models but is non-significant under country clustering. 
As previously mentioned, the choice of standard errors affects the statistical significance of certain coefficients, but not their magnitudes. The stability of the coefficient size across all considered models suggests that the contribution of remittances to trading diversification is economically relevant. Most importantly, the MCX coefficient remains positive and economically significant across all specifications. It appears to have the largest influence on trading diversification and remains statistically significant in all models. It can therefore be regarded as the main driver of diversification: the greater the weight of firms outside the top ten in market capitalization, the more diversified the trading structure of stock exchanges. Overall, this finding demonstrates once again that the internal composition of stock markets affects their trading structure. The coefficient of IPU is always negative, regardless of the standard errors applied. It is insignificant under clustering but weakly significant under the Driscoll–Kraay approach. Since the significance of international public debt depends on the standard errors used, cross-country and temporal dependence appear to be especially important in determining the statistical significance of this coefficient. However, regardless of the chosen specification, the variable’s effect remains negative. One further point is that the R-squared is identical across all three specifications. This is expected because only the standard error calculation changes, not the estimator. The choice of standard errors does not affect the model’s explanatory power and is solely related to significance and confidence levels. 
Driscoll–Kraay standard errors account for heteroskedasticity, cross-sectional dependence, and serial correlation (De Hoyos & Sarafidis, 2006; Driscoll & Kraay, 1998). As shown above, the main results of the study do not change materially under this estimator. Coefficient signs and magnitudes are the same across specifications, and most coefficients retain their statistical significance, which supports the reliability of the results. In summary, the robustness checks indicate that the main results cannot be explained by violations of the classical assumptions about the error term. Moreover, the coefficients that lose statistical significance under some estimators keep their signs, so the general interpretation of the results is unchanged (Thompson, 2011).
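The comparison in Table 5 can be illustrated with a minimal sketch on synthetic data. The function names, the simulated panel, and the lag length `max_lag` below are illustrative assumptions, not the estimation code behind the paper, and no small-sample corrections are applied. The sketch shows that the within (FE) coefficients and the within R-squared are computed once, while only the "meat" of the sandwich covariance differs between conventional, country-clustered, and Driscoll–Kraay standard errors.

```python
import numpy as np

def within_transform(a, groups):
    """Subtract group (country) means: the fixed-effects 'within' transformation."""
    out = a.astype(float).copy()
    for g in np.unique(groups):
        m = groups == g
        out[m] -= out[m].mean(axis=0)
    return out

def fe_with_three_se(y, X, country, year, max_lag=2):
    """Return FE coefficients, within R^2, and three sets of standard errors."""
    yd, Xd = within_transform(y, country), within_transform(X, country)
    n, k = Xd.shape
    n_c = len(np.unique(country))
    bread = np.linalg.inv(Xd.T @ Xd)
    beta = bread @ Xd.T @ yd
    e = yd - Xd @ beta
    r2_within = 1.0 - (e @ e) / (yd @ yd)

    # 1) Conventional: homoskedastic, independent errors.
    se_conv = np.sqrt(np.diag(bread) * (e @ e) / (n - n_c - k))

    # 2) Country-clustered: sum the scores within each country first.
    scores = Xd * e[:, None]
    meat_cl = np.zeros((k, k))
    for g in np.unique(country):
        s = scores[country == g].sum(axis=0)
        meat_cl += np.outer(s, s)
    se_cl = np.sqrt(np.diag(bread @ meat_cl @ bread))

    # 3) Driscoll-Kraay: Bartlett-kernel (Newey-West) sum over year-level scores.
    #    Assumes the sorted years are consecutive.
    h = np.array([scores[year == t].sum(axis=0) for t in np.unique(year)])
    meat_dk = h.T @ h
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)
        gamma = h[lag:].T @ h[:-lag]
        meat_dk += w * (gamma + gamma.T)
    se_dk = np.sqrt(np.diag(bread @ meat_dk @ bread))
    return beta, r2_within, se_conv, se_cl, se_dk

# Synthetic balanced panel: 20 countries, 15 years, with a common yearly shock.
rng = np.random.default_rng(0)
n_c, n_t = 20, 15
country = np.repeat(np.arange(n_c), n_t)
year = np.tile(np.arange(2004, 2004 + n_t), n_c)
X = rng.normal(size=(n_c * n_t, 2))
common = rng.normal(size=n_t)[year - 2004]
alpha = rng.normal(size=n_c)[country]
y = alpha + 0.5 * X[:, 0] - 0.3 * X[:, 1] + common + rng.normal(size=n_c * n_t)

beta, r2, se_conv, se_cl, se_dk = fe_with_three_se(y, X, country, year)
print("beta:", beta, "within R2:", r2)
print("SE conventional:", se_conv)
print("SE clustered:   ", se_cl)
print("SE Driscoll-Kraay:", se_dk)
```

Because `beta` and `r2` are computed before any covariance estimator is chosen, they cannot differ between columns; only the three standard-error vectors do.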

4.2. Extended Specification with Controls and Dynamic Effects

Control variables are included so that the estimated effects of the main regressors are not biased by other relevant factors. As trading activity in financial markets depends not only on financial structure but also on macroeconomic conditions, the level of financial development, market size and liquidity, and accessibility and participation by agents, the specification includes controls that separate the effect of the primary independent variables from other factors related to trading (Levine, 2005; Beck et al., 2000). First, macroeconomic conditions enter the specification through the GDP growth rate (GDPG), which serves as a proxy for business cycle dynamics. In theory, fluctuations in macroeconomic conditions can affect investor behavior and, consequently, the extent of trading activity, regardless of the financial system’s structure (Levine, 2005). Second, the level of financial development is captured by domestic credit to the private sector (DCP). This proxy accounts for differences in the efficiency of financial systems and, accordingly, in investors’ participation in financial activities. Furthermore, stock market features are captured by stock market capitalization (SMC) and the stock market turnover ratio (TOR). These variables are needed to distinguish the intensity and volume of trading from its diversification, which is the focal point of the analysis: a large, liquid market may generate high trading volumes without necessarily being diversified (Levine, 2005). Alternative sources of financial resources are also considered, in order to control for households’ participation in the financial market and to account for domestic public financing channels.
Thus, remittances (REM) can be used as a measure of household income and liquidity, as they are related to consumption, saving, and investment. Moreover, internet usage (INTU) serves as an indicator of participation opportunities resulting from increased access to financial services (Ozili, 2021; Demirgüç-Kunt et al., 2021). The number of listed companies (NLC) represents the size of the opportunity set in stock markets; a larger set is expected to reduce market concentration and foster trading diversification. Finally, credit to government and state-owned enterprises (GOV) is included as a proxy for domestic public debt and its effect on trading (Sahay et al., 2015; Kose et al., 2021). Together, these control variables account for other channels through which trading diversification can arise, including private and public financial flows and household financial inclusion in the stock market. Notably, adding these variables does not affect the main conclusions drawn from the analysis; rather, it improves their identification. To reduce potential simultaneity or reverse causality, lagged versions of the main regressors and control variables are introduced. This is necessary because financial and macroeconomic variables tend to be highly persistent and may be jointly determined with trading activity. Using lagged values of the main regressors (L.DBS, L.REM, L.MCX, L.IPU) helps to avoid problems associated with contemporaneous correlation. In turn, lagged control variables (L.GDPG, L.DCP, L.SMC, L.TOR, L.INTU, L.NLC, L.GOV) control for the dynamics of macroeconomic conditions, financial development, stock market features, and participation. More generally, the joint use of lagged and contemporaneous variables is justified by the persistence of financial processes (Sahay et al., 2015). See Table 6.
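The construction of lagged regressors in an unbalanced panel can be sketched as follows. The helper `add_lags`, the column names, and the toy values are hypothetical and serve only to illustrate the idea; in particular, the rule that a lag is left missing when the previous calendar year is absent is an assumption about how gaps could be handled, not a description of the paper's procedure.

```python
import pandas as pd

def add_lags(df, cols, id_col="country", time_col="year", lag=1):
    """Add L.<col> columns holding each variable's value `lag` years earlier."""
    out = df.sort_values([id_col, time_col]).reset_index(drop=True)
    # Distance in years to the row `lag` positions earlier within the same country.
    gap = out[time_col] - out.groupby(id_col)[time_col].shift(lag)
    for c in cols:
        shifted = out.groupby(id_col)[c].shift(lag)
        # Keep the lag only when the earlier row is exactly `lag` years back;
        # otherwise the panel has a hole there and the lag stays missing.
        out[f"L.{c}"] = shifted.where(gap == lag)
    return out

# Toy unbalanced panel: country B has no observation for 2005.
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year":    [2004, 2005, 2006, 2004, 2006, 2007],
    "REM":     [1.0, 1.2, 1.4, 3.0, 3.5, 3.6],
    "MCX":     [40.0, 42.0, 41.0, 20.0, 25.0, 26.0],
})
lagged = add_lags(panel, ["REM", "MCX"])
print(lagged)
```

Shifting within each country prevents the last observation of one country from leaking into the first observation of the next, and the gap check keeps a two-year-old value from being treated as a one-year lag.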

The results reported in Table 7 provide a rich and structured assessment of the determinants of trading diversification (VTX), allowing for a direct comparison between the baseline specification, the inclusion of additional controls, and dynamic specifications with lagged variables. Overall, the findings strongly reinforce the theoretical framework of the article, which emphasizes the role of financial structure, external financial flows, and market composition in shaping the distribution of trading activity. See Table 7.

In the baseline fixed-effects specification, the estimates confirm the basic theoretical claims. The coefficient of DBS is negative and statistically significant. Substantively, a greater weight of deposit-taking banks in the country’s financial sector is associated with lower trading diversification. This aligns with theoretical considerations suggesting that a financial system dominated by banks hinders the development of equity markets, thereby limiting the number of participants (Didier et al., 2021; Aghion et al., 2005). Meanwhile, REM has a significantly positive coefficient, suggesting that remittances increase household liquidity and promote financial participation. Remittances have been found to facilitate broader inclusivity and encourage engagement in trading activities (Barajas et al., 2020). MCX has a large coefficient, suggesting that trading diversification depends primarily on market features. The coefficient for IPU is negative and significant, consistent with the view that external public debt can increase market concentration through crowding-out effects (Bonizzi, 2013; Raddatz et al., 2017).

Several conclusions may be drawn from the regression with control variables included. The effect of DBS becomes smaller and insignificant, implying that part of the effect previously observed is due to the country’s broader macro-financial development. However, the variable still has the expected negative sign. The coefficients for REM and MCX are highly significant and large, confirming their relevance for the phenomenon under study. The contribution of remittances to trading diversification therefore does not appear to be driven by the factors captured by the added controls (Didier et al., 2021).

The inclusion of controls yields the following theoretical implications. First, GDPG does not contribute significantly to VTX, implying that macroeconomic growth plays a minor role in promoting trading diversification. This supports the idea that VTX measures a structural dimension of trading activity. Second, DCP does not affect VTX significantly either, suggesting that financial depth alone is not sufficient; what matters is the structure and allocation of finance (Aghion et al., 2005).

Further, two key variables emerge in the extended model. First, the SMC coefficient is significantly positive, confirming the contribution of market size to trading diversification (Didier et al., 2021). TOR also has a positive coefficient, though without statistical significance; this implies that liquidity may support diversification but is not the main factor. The remaining control variables add further substance to the theoretical assumptions. Internet usage (INTU) has the expected sign but is statistically insignificant, possibly because of cross-country heterogeneity (Saraf & Kayal, 2022; Barajas et al., 2020). NLC is highly significant, with a large coefficient, suggesting that a larger number of listed companies facilitates trading diversification. GOV has a negative and significant coefficient, consistent with the crowding-out theory (Bonizzi, 2013).

When dynamic models with a lag structure are estimated, REM and MCX remain positive and significant, indicating a persistent contribution of these factors. The results align with theoretical expectations regarding the persistence of remittance and market structure effects on financial participation (Barajas et al., 2020). IPU remains negative but is insignificant, which might mean that the contribution of international public debt operates with a delay. Similarly, DBS is insignificant in the dynamic models. The comparison of standard errors across specifications shows stability; only GDPG fails to reach statistical significance under clustered standard errors.

To summarize, the findings support the basic hypothesis that trading diversification is shaped by the structure of the stock market and by financial flows. The effect of DBS is less clear-cut: its sign is stable, but its significance depends on the specification, which motivates a closer look at how the banking variable is constructed. The additional control variables also help to highlight other dimensions of the phenomenon (Idroes et al., 2024; Beck et al., 2015).

4.3. Alternative Banking Measure: Structure vs. Size Effects

To check the robustness of the baseline results and keep the scaling of all explanatory variables consistent, we estimate another model that varies the banking system indicator. In the baseline model, the variable DBS reflects the importance of deposit-taking banks within the total banking sector (Beck et al., 2009; Čihák et al., 2012). Unlike the other regressors, however, DBS is not expressed as a share of GDP, which can lead to inconsistencies in the scaling of regressors. To address this concern, we introduce the variable DBG, the ratio of deposit money banks’ assets to GDP. This step serves two complementary functions. First, it makes the scale of the financial sector indicator consistent with that of the other regressors, in line with their macro-financial orientation (Čihák et al., 2012). Second, it provides a robustness check on whether the results depend on the specific financial-sector indicator used (Beck et al., 2009). It thereby makes it possible to separate the effects associated with the structure of the financial sector from those associated with its size in the economy. The results are presented in Table 8, which reports the baseline fixed-effects specification together with estimations using the alternative banking measure, with and without clustered standard errors. The model using DBG has more observations (286) than the baseline model (266) because data for this variable are more widely available, which yields a larger complete-case sample.

In the baseline specification, DBS captures the internal structure of the financial system, measured by the share of deposit-taking banks in the assets of all banks. Most of the other explanatory variables, however, are expressed as shares of GDP. To test whether this difference in scaling affects the results, we consider an alternative measure of the banking sector, DBG, the ratio of deposit money banks’ assets to GDP (Langfield & Pagano, 2016). There is a clear theoretical distinction between the two measures. While DBS characterizes the internal structure of the financial system and indicates the share of commercial banks in banks’ assets, DBG characterizes the scale of the banking sector relative to the economy. Hence, one measure captures how financial intermediation takes place, while the other captures how large the banking sector is relative to the economy. Both dimensions have implications for financial development, so it is worth checking whether each plays a role (Brei et al., 2023; Cournède et al., 2015). We compare the two to see which has the stronger influence on trading diversification. The estimates show that the DBG coefficient is insignificant whether or not standard errors are clustered. The coefficient remains negative and carries the same interpretation (a bigger banking sector is associated with less diversified trading), but its effect is no longer statistically significant. Thus, although the direction of the result holds, its magnitude and significance depend on which measure of the banking sector is chosen. Given the conceptual differences between the measures, the economically sensible conclusion is that the internal structure of the financial system plays a role, whereas its scale does not (Kharroubi, 2015).
In other words, it is not the existence of a large banking sector as such that crowds out the equity market. What matters is that financial intermediation occurs through the banking system rather than through other channels: DBS captures this channel, whereas DBG captures only scale. Our baseline regression therefore uses DBS as the more economically meaningful measure. At the same time, the remaining variables are highly stable across specifications, which reinforces the robustness of the results. First, the coefficient on REM is again positive, implying that remittances facilitate financial participation. Second, MCX remains significant and has a similar magnitude. Finally, the coefficient of IPU remains negative and insignificant. Overall, this robustness check indicates that the results for the remaining regressors are not affected by the particular measure of the banking sector used. It also gives additional support to the idea that structural characteristics of the financial system matter for trading diversification (Brei et al., 2023). Although DBG ensures consistent scaling with the other regressors, we use DBS in the baseline regression because it captures the internal structure of the financial system rather than its magnitude. The robustness analysis shows that the main results are not driven by this choice and suggests that internal structure is the economically relevant characteristic (Cournède et al., 2015).
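The difference between the two banking measures can be made concrete with a small arithmetic sketch. It assumes that "the assets of all banks" in the definition of DBS means deposit money bank assets plus central bank assets, as the wording of the cluster discussion suggests; the function name and all numbers are hypothetical.

```python
def banking_measures(deposit_bank_assets, central_bank_assets, gdp):
    """Return (DBS, DBG) in percent for one country-year.

    DBS: deposit money bank assets as a share of deposit money bank plus
         central bank assets (structure; assumed definition).
    DBG: deposit money bank assets relative to GDP (size).
    """
    dbs = 100.0 * deposit_bank_assets / (deposit_bank_assets + central_bank_assets)
    dbg = 100.0 * deposit_bank_assets / gdp
    return dbs, dbg

# Two hypothetical economies with the same GDP (illustrative numbers only).
small_sector = banking_measures(deposit_bank_assets=18.0, central_bank_assets=2.0, gdp=100.0)
large_sector = banking_measures(deposit_bank_assets=90.0, central_bank_assets=10.0, gdp=100.0)
print("small banking sector: DBS=%.1f, DBG=%.1f" % small_sector)
print("large banking sector: DBS=%.1f, DBG=%.1f" % large_sector)
```

Both economies have the same internal structure (DBS of 90) but very different banking sector sizes relative to GDP (DBG of 18 versus 90), which is why the two variables can carry different information in the regression.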

5. Hierarchical Clustering Results and Financial Structure Segmentation

The results obtained for the normalized indicators clearly demonstrate the trade-off between compactness, separation, and structure for the considered algorithms, as widely discussed in the literature on clustering validity indices (Arbelaitz et al., 2013; Vendramin et al., 2010). Density-based clustering has high values for maximum diameter, minimum separation, and the Dunn index, which means good cluster compactness and separation, while it has poor results for Pearson’s gamma, entropy, and the Calinski–Harabasz index, which indicate poor global structure and balance in cluster partitions, consistent with the idea that different indices capture different aspects of clustering quality (Liu et al., 2010).

Hierarchical clustering has the best results for Pearson’s gamma and reasonably good results for entropy and the Dunn index, which indicate good global ordering and acceptable internal validity, though it does not dominate in separation and compactness results. This reflects the known strength of hierarchical methods in preserving global structure rather than optimizing a single objective function (Saxena et al., 2017).

k-Means has the best results for the Calinski–Harabasz index and high values for Pearson’s gamma, which indicate good global variance separation, while it has poor results for the Dunn index and minimum separation, which indicate poor cluster separation in the data space, in line with evidence that k-means tends to favor variance-based criteria over distance-based separation (Arbelaitz et al., 2013; Vendramin et al., 2010).

Model-based and Random Forest clustering methods demonstrate balanced results, which are neither good nor poor for any of the considered indicators, though they are close to the best results for entropy and moderate results for separation and compactness, supporting the view that some clustering approaches provide more stable but less extreme performance across validation metrics (Saxena et al., 2017).

Fuzzy C-Means has poor results for all separation and compactness indicators, despite good results for entropy, which makes it difficult to evaluate its quality, highlighting the sensitivity of fuzzy clustering methods to the choice of validity indices (Liu et al., 2010).

Considering all indicators together, Hierarchical clustering demonstrates the best balance between high values for Pearson’s gamma and good results for other indicators, avoiding poor results observed in other methods, which makes it the best compromise for clustering validity, consistent with comparative studies emphasizing the importance of multi-criteria evaluation (Arbelaitz et al., 2013; Vendramin et al., 2010). See Table 9.
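Two of the validity indices compared above can be sketched directly from their textbook definitions. The code below is a minimal illustration on synthetic data, not the software used for Table 9; it covers only the Dunn index and the Calinski–Harabasz index, omits the normalization applied in the paper, and assumes at least one cluster has more than one member (otherwise the maximum diameter is zero).

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

def dunn_index(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    ids = np.unique(labels)
    max_diam = max(pairwise_dist(X[labels == c], X[labels == c]).max() for c in ids)
    min_sep = min(
        pairwise_dist(X[labels == a], X[labels == b]).min()
        for i, a in enumerate(ids) for b in ids[i + 1:]
    )
    return min_sep / max_diam

def calinski_harabasz(X, labels):
    """Between-cluster dispersion over within-cluster dispersion, each per degree of freedom."""
    ids = np.unique(labels)
    n, k = len(X), len(ids)
    grand = X.mean(axis=0)
    between = sum(
        (labels == c).sum() * ((X[labels == c].mean(axis=0) - grand) ** 2).sum()
        for c in ids
    )
    within = sum(
        ((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum() for c in ids
    )
    return (between / (k - 1)) / (within / (n - k))

# Three well-separated synthetic groups, scored under a sensible and a shuffled partition.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
good = np.repeat([0, 1, 2], 30)
bad = rng.permutation(good)
scores = {
    name: (dunn_index(X, lab), calinski_harabasz(X, lab))
    for name, lab in [("good", good), ("bad", bad)]
}
print(scores)
```

The Dunn index reacts to the single worst diameter and the single smallest gap, whereas Calinski–Harabasz averages variance over all points, which is one reason the two can rank algorithms differently.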

As indicated by the hierarchical clustering results, the cluster structure is markedly uneven, with one principal cluster substantially larger than the rest. This type of unevenness is typical of empirical clustering applications in which the data show dominant group patterns (Essary et al., 2022; Xu & Tian, 2015). Specifically, cluster 1 contains 186 observations and accounts for the majority of the dataset, whereas the other clusters contain fewer observations, and two of them (clusters 5 and 10) contain a single observation each. Thus, the majority of units in the multi-dimensional space defined by VTX, DBS, REM, MCX, and IPU have a similar configuration, whereas a few units represent unique combinations of these variables and therefore form separate clusters. This explanation aligns with the identification of outliers and specialized clusters in clustering analyses (Hennig, 2015; Campello et al., 2015). It is also supported by the within-cluster heterogeneity results. Cluster 1 accounts for about 69.5% of total within-cluster heterogeneity. Clusters 2, 3, and 4 each account for between 7% and 13%, whereas the remaining clusters account for negligible shares. The data are thus characterized by a small number of substantive clusters, with cluster 1 being the largest and representing “typical” cases with respect to stock market breadth (VTX), the importance of deposit money banks relative to the central bank (DBS), remittances (REM), market capitalization outside the top ten firms (MCX), and international public debt (IPU). This is consistent with clustering results in which a few clusters dominate and a small number of special clusters appear (Xu & Tian, 2015).
Furthermore, one can consider the within-cluster sum of squares. It is highest for cluster 1, which is partly a consequence of the cluster’s large size but also indicates considerable heterogeneity within it. That is, although the observations in cluster 1 are closer to one another than to observations in other clusters, they still exhibit considerable diversity in financial structure. Conversely, the other clusters have lower within-cluster sums of squares, some of them equal to zero, which reflects either an extremely homogeneous structure or single membership (Essary et al., 2022). Silhouette analysis provides another criterion for evaluating clustering quality. Cluster 1 has a rather low silhouette value of 0.271, indicating that many of its observations lie close to the cluster boundary and to observations in adjacent clusters. This is consistent with relatively high heterogeneity in this large cluster and suggests gradual transitions between it and other clusters (Hennig, 2015). Clusters 2, 3, and 4 show somewhat higher silhouette values, implying reasonably good, though not strong, separation. Finally, the smallest clusters (6, 7, 8, and particularly 9) exhibit exceptionally high silhouette values (0.905), indicating strong separation from other clusters. Overall, the hierarchical clustering points to a core–periphery structure in the data. The majority of observations constitute the core, defined by relatively diffuse patterns in the multi-dimensional space of VTX, DBS, REM, MCX, and IPU, whereas a few observations make up the periphery, characterized by sharp separation and distinctive profiles.
From the viewpoint of economic and financial analysis, one may conclude that there are many units or countries that demonstrate similar financial structures or market developments; however, there are outliers characterized by peculiarities in terms of market concentration, remittances, banking structure, or international public debt, thus requiring isolation into separate clusters (Xu & Tian, 2015). See Table 10.
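The per-cluster diagnostics discussed above (size, share of within-cluster heterogeneity, and silhouette) can be sketched as follows. The data are synthetic and the function is a hypothetical helper, not the routine behind Table 10. One assumption deserves emphasis: a single-member cluster is assigned a silhouette of zero here, which is a common convention but may differ from the software used in the paper.

```python
import numpy as np

def cluster_diagnostics(X, labels):
    """Per-cluster size, share of total within-cluster sum of squares, mean silhouette."""
    ids = np.unique(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    sil = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # assumed convention: a singleton's silhouette is 0
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        sil[i] = (b - a) / max(a, b)
    wss = np.array(
        [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum() for c in ids]
    )
    return {
        int(c): {
            "size": int((labels == c).sum()),
            "wss_share": float(w / wss.sum()),
            "silhouette": float(sil[labels == c].mean()),
        }
        for c, w in zip(ids, wss)
    }

# A large diffuse core, a small compact group, and one isolated observation.
rng = np.random.default_rng(4)
core = rng.normal(scale=1.5, size=(100, 2))
compact = np.array([6.0, 6.0]) + rng.normal(scale=0.2, size=(10, 2))
isolated = np.array([[-8.0, 8.0]])
X = np.vstack([core, compact, isolated])
labels = np.array([1] * 100 + [2] * 10 + [3])
diag = cluster_diagnostics(X, labels)
for c, stats in diag.items():
    print(c, stats)
```

In this toy setting the large cluster carries most of the within-cluster sum of squares and has the lower silhouette, while the singleton contributes exactly zero heterogeneity, mirroring the core–periphery reading given in the text.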

The cluster centroids show how differentiated groups of observations form across stock market structure, banking system composition, external financial flows, market capitalization outside the top ten firms, and dependence on international public debt. This approach aligns well with modern clustering analysis in finance and economics (Petrescu & Krishen, 2023; Huang & Yang, 2025). Since the standardized values represent deviations from the sample average, the clusters can be compared with respect to the relative positions of the variables. Cluster 1, which includes the majority of the sample, shows moderately positive VTX and REM, slightly negative DBS and MCX, and IPU close to zero. This cluster therefore characterizes countries where stock market trading extends somewhat beyond the largest companies. The importance of deposit-taking banks relative to the central bank and the market capitalization outside the top ten firms are both slightly below the sample average, and there is no marked reliance on international public debt. Cluster 1 can therefore be described as the group of countries with an average financial system. In contrast, Cluster 2 is characterized by a highly positive DBS value but negative REM and IPU values. VTX is rather small, and MCX is slightly positive. This cluster represents countries with relatively less diversified stock exchanges, banking systems in which deposit money banks are important relative to the central bank, and low levels of international public debt and remittances. Thus, Cluster 2 comprises bank-dominated countries with low remittance levels and low dependence on international public debt. Cluster 3 has negative VTX, DBS, and MCX but positive REM and IPU. This group of observations therefore refers to countries with rather concentrated stock exchanges and poorly developed banking.
However, they have substantial remittance flows and rely heavily on international public debt markets. Accordingly, Cluster 3 is associated with a regime in which remittances and international public debt are of key importance in the economy, a pattern also documented for some economic systems (Huang & Yang, 2025). Cluster 4 has high MCX, moderately positive DBS and VTX, and negative REM and IPU. This cluster can therefore be interpreted as one with a stock market that is highly diversified beyond the largest companies, a relatively developed banking sector, and a comparatively small role for remittances and international public debt markets. Clusters 5, 8, and 10 have rather extreme profiles. Cluster 5 is characterized by high MCX and IPU, positive DBS, and negative REM, meaning high levels of market orientation and banking, reliance on international public debt, and minimal significance of remittances. Cluster 8 is characterized by very high MCX, negative REM, moderate VTX, and positive DBS, implying highly market-based financial systems with low remittance dependence. Cluster 10 has high DBS and REM but a negative IPU. Finally, Clusters 6 and 9 represent the most extreme profiles. Cluster 9 has extremely negative VTX but positive REM and IPU, while Cluster 6 is similar but has moderate VTX. Economies in these clusters have a very narrow stock market structure, weak and less dominant banking, and a high dependence on external financial flows from the public and private sectors. This approach to identifying outliers is supported by modern clustering research (Hennig, 2015).
To conclude, the cluster means show that countries fall into different types of financial system development and external financial integration, ranging from more diversified, market-oriented structures to more concentrated and more externally dependent ones. See Table 11.
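The reading of centroids as deviations from the sample average can be sketched in a few lines. The data and cluster labels below are random placeholders, and the use of the population standard deviation (`ddof=0`) is an assumption, since the paper does not state which variant was applied.

```python
import numpy as np
import pandas as pd

def standardized_centroids(df, cols, label_col="cluster"):
    """Z-score each variable over the full sample, then average within clusters."""
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    return z.groupby(df[label_col]).mean()

# Placeholder data: 12 observations, five variables, three clusters of unequal size.
rng = np.random.default_rng(3)
cols = ["VTX", "DBS", "REM", "MCX", "IPU"]
df = pd.DataFrame(rng.normal(size=(12, 5)), columns=cols)
df["cluster"] = [1] * 7 + [2] * 4 + [3]

centroids = standardized_centroids(df, cols)
sizes = df["cluster"].value_counts().sort_index()
print(centroids.round(2))
```

Because every variable has mean zero after standardization, the size-weighted average of the centroids is zero for each variable. A very large cluster is therefore pulled toward zero on every dimension, while small clusters can sit several standard deviations away, which is the pattern described for Cluster 1 versus the single-member clusters.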

Figure 2 presents a comprehensive visual interpretation of the results obtained from the hierarchical clustering for the model with the inclusion of VTX, DBS, REM, MCX, and IPU, bringing together the information obtained for cluster selection, cluster structure, and economic interpretation, consistent with recent advances in hierarchical clustering visualization and interpretability (Cabezas et al., 2023; Randriamihamison et al., 2021). See Figure 2.

Panel A presents the evolution of the information criteria and the within-cluster sum of squares as a function of the number of clusters in the partition. The downward trend reflects the improvement in goodness of fit as clusters are added, while the highlighted minimum marks the optimal number of clusters, near ten, where the model balances goodness of fit against parsimony. This result supports the selection of a rich cluster structure, capable of capturing the heterogeneity present in the data set, in line with model selection approaches in hierarchical clustering (Barratt & Plucinski, 2023; Rebafka, 2024).

Panel B presents the clustered observations in a reduced dimensional space, where different colors are used to differentiate the clusters. The clear differentiation between some of the clusters supports the evidence that the hierarchical algorithm does not partition the sample mechanically but recognizes patterns in the data set. Some clusters are more compact and well differentiated, while others show more overlapping patterns, consistent with the coexistence of dense and diffuse structures in hierarchical clustering outputs (Malzer & Baum, 2021).

Panel C presents the dendrogram, where the hierarchical structure of the clustering algorithm is presented, showing the merging of the observations and the different groups obtained during the process, where broad branches are associated with more general patterns, while smaller branches are associated with more idiosyncratic patterns. The level at which the branches merge represents the distance between the groups, while the existence of some long vertical jumps suggests that the clusters are genuinely different in terms of the underlying financial and market characteristics, as widely discussed in dendrogram-based interpretations of hierarchical clustering (Tamarit et al., 2020; Randriamihamison et al., 2021).
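The selection logic of Panel A and the merge heights read from the dendrogram in Panel C can be sketched with SciPy on synthetic data. Several elements are assumptions made for illustration: Ward linkage (the paper does not state the linkage method), the penalty form `log(n) * d * k` added to the within-cluster sum of squares, and the helper names. The sketch is not the procedure that produced Figure 2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def wss(X, labels):
    """Total within-cluster sum of squares for a given partition."""
    return sum(
        ((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
        for c in np.unique(labels)
    )

def scan_cluster_counts(X, k_max=10, method="ward"):
    """Build the merge tree once, then score every cut from 2 to k_max clusters."""
    Z = linkage(X, method=method)  # (n-1) x 4 merge table; column 2 holds merge heights
    n, d = X.shape
    rows = []
    for k in range(2, k_max + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")
        fit = wss(X, labels)
        bic_like = fit + np.log(n) * d * k  # fit plus complexity penalty (assumed form)
        rows.append((k, fit, bic_like))
    # Gaps between successive merge heights: a long jump separates distinct groups.
    jumps = np.diff(Z[:, 2])
    return Z, rows, jumps

# Four well-separated synthetic groups of 25 observations each.
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])

Z, rows, jumps = scan_cluster_counts(X, k_max=8)
best_k = min(rows, key=lambda r: r[2])[0]
k_from_jump = len(X) - (int(np.argmax(jumps)) + 1)
print("k minimizing the penalized criterion:", best_k)
print("k implied by the largest dendrogram jump:", k_from_jump)
```

The within-cluster sum of squares alone always falls as clusters are added, so it is the penalty term that produces an interior minimum; the largest jump in merge heights offers an independent, purely geometric check on the same choice.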

Panel D displays the standardized cluster means, which underpin the economic interpretation of the clusters. The clusters differ visibly in stock market breadth (VTX), the relative importance of deposit money banks (DBS), remittance inflows (REM), market capitalization outside the top ten firms (MCX), and the relative importance of international public debt (IPU). Some clusters are characterized by diversified structures, with high values of MCX and DBS and low values of REM, while others show the opposite: narrow structures, with low values of MCX and DBS and high values of REM and IPU, reflecting the ability of hierarchical clustering to capture heterogeneous structural patterns in multivariate data (Ichino et al., 2021).

The extreme positive or negative values in the clusters indicate the existence of very specific financial structures, as suggested by the small, well-separated groups in the other panels. The figure suggests that the methodology of hierarchical clustering detects a rich segmentation in the data, highlighting the coexistence of a large group of similar observations, as well as a few smaller, more differentiated clusters, driven by different financial development, market structures, and external financial integration, as captured by the underlying five variables, consistent with recent applications of hierarchical clustering in complex multivariate systems (Rebafka, 2024; Cabezas et al., 2023).

The resulting scatter plot matrix in Figure 3 provides a detailed view of the relationships between these five standardized financial variables, VTX, DBS, REM, MCX, and IPU, with the data points colored according to their clusters determined using hierarchical clustering. This form of visualization is useful not only for evaluating the internal consistency of these clusters but also for understanding the ways in which distinct financial dimensions interact with each other, as emphasized in recent studies on financial development and multivariate relationships (Ikpesu, 2024; Alhassan et al., 2025). See Figure 3.

Several distinct financial relationships can be determined. For instance, there is a strong positive relationship between MCX and VTX, indicating that an expansion of stock exchange trading activities beyond the largest firms is related to a less concentrated and more diversified financial system. This is consistent with evidence linking market development and diversification to broader financial deepening (Cuestas et al., 2020). This pattern holds across all clusters, with some clusters residing at more extreme levels.

In terms of DBS, there is a more nuanced relationship. Clusters with high levels of DBS, indicating a stronger financial system dominated by deposit money banks relative to the central bank, are typically found at moderate to high levels of MCX and low levels of REM. This indicates a stronger domestic capital market and less reliance upon remittance flows, consistent with findings that more developed banking systems reduce dependence on external financial inflows (Alhassan et al., 2025).

By contrast, clusters with low DBS levels, indicating a less important role played by domestic banking, are more frequently associated with high levels of REM and, at times, high levels of IPU. This reflects the role of remittances as alternative financial resources in less developed financial systems (Ikpesu, 2024).

From the REM panels, there is a clear distinction between clusters with high remittance dependence and those with relatively lower remittances. Clusters with high REM values tend to be located in areas with lower values of VTX and MCX, indicating relatively narrower and more concentrated market activity. They also tend to have higher IPU values, indicating greater dependence on international public debt. Again, this suggests that remittance-dependent countries tend to have less developed domestic financial markets and more developed links with external sources of finance, consistent with empirical evidence on remittance-finance interactions (Ikpesu, 2024; Alhassan et al., 2025).

The IPU relationships also support this view of segmentation into remittance-dependent and less remittance-dependent countries. Clusters with high IPU values tend to group together with high REM values and lower values of VTX and MCX, while clusters with low IPU values tend more often to group together with stronger values of domestic market activity and DBS. This is consistent with literature linking financial structure, banking development, and external debt reliance (Aldomy et al., 2020; Budhathoki et al., 2024).

The colored point clouds of each of the variables also suggest that the hierarchical clustering has successfully captured significant multivariate structure and not simply arbitrary groupings of countries. Each of the clusters is located in relatively distinct areas of the variable space, with considerable overlap in the central or more “average” areas of each distribution.

In summary, the scatterplot matrix of the data set visually confirms that the hierarchical clustering has successfully captured significant and coherent financial and market structure, distinguishing between market-oriented systems that have well-developed banking systems and relatively well-developed equity markets and systems that tend to be more externally driven and have relatively higher remittances, relatively higher values of international public debt, and relatively lower values of domestic market activity, in line with recent empirical findings on financial system heterogeneity (Cuestas et al., 2020; Budhathoki et al., 2024).
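As an illustration of the kind of procedure described in this section, the following minimal sketch standardizes five variables and applies agglomerative clustering. It is not the authors’ implementation: the six observations are hypothetical values, and the use of Euclidean distance, average linkage, and two clusters is assumed purely for illustration.

```python
from statistics import mean, pstdev

# Columns are ordered as VTX, DBS, REM, MCX, IPU.
# Hypothetical observations (not taken from the paper's dataset).
rows = [
    (60.0, 95.0, 1.0, 55.0, 2.0),
    (58.0, 93.0, 2.0, 52.0, 3.0),
    (62.0, 96.0, 1.5, 57.0, 1.0),
    (20.0, 70.0, 12.0, 18.0, 15.0),
    (22.0, 72.0, 10.0, 20.0, 14.0),
    (18.0, 68.0, 13.0, 16.0, 17.0),
]


def standardize(data):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    columns = list(zip(*data))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in data]


def euclidean(a, b):
    """Euclidean distance between two standardized observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(points, ca, cb):
    """Mean pairwise distance between the members of two clusters."""
    return mean(euclidean(points[i], points[j]) for i in ca for j in cb)


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: average_linkage(
            points, clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters.pop(b)
    return clusters


clusters = agglomerate(standardize(rows), n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

In this toy example, the first three observations (high VTX, DBS, and MCX; low REM and IPU) separate from the last three, mirroring the market-oriented versus externally driven contrast discussed above.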

6. Machine Learning Performance and Variable Importance Analysis

The normalized performance metrics provide a clear and consistent basis upon which all the algorithms can be compared along various dimensions of predictive accuracy and goodness of fit, as emphasized in recent machine learning evaluation frameworks (Sagi & Rokach, 2018). All error-based performance metrics are standardized such that higher values indicate better performance, and the same is true of the R² metric. It is, therefore, possible to evaluate all algorithms along a common framework with regard to their predictive accuracy and goodness of fit.
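The text does not specify the exact normalization, so the following is only a sketch of one common choice: min-max scaling across models, with error metrics inverted so that 1 is always the best score and 0 the worst. The model names and raw values are hypothetical.

```python
def normalize_metric(values, higher_is_better):
    """Min-max scale a {model: raw_metric} dict onto [0, 1], where 1 is best."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All models tie on this metric.
        return {model: 1.0 for model in values}
    scaled = {model: (v - lo) / (hi - lo) for model, v in values.items()}
    if not higher_is_better:
        # For error metrics, the smallest raw value must map to 1.
        scaled = {model: 1.0 - s for model, s in scaled.items()}
    return scaled


# Hypothetical raw metrics (not the values behind Table 12).
raw_mse = {"model_a": 120.0, "model_b": 80.0, "model_c": 200.0}
raw_r2 = {"model_a": 0.55, "model_b": 0.70, "model_c": 0.10}

mse_score = normalize_metric(raw_mse, higher_is_better=False)
r2_score = normalize_metric(raw_r2, higher_is_better=True)
print(mse_score)
print(r2_score)
```

Under this kind of scaling, a score of zero on a metric means only that a model is the worst of the compared set on that metric, not that its raw error is unbounded.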

It is possible to immediately rule out the neural networks based on their performance metrics, as they indicate a score of zero across all metrics, implying comparatively inferior performance compared to all other algorithms. The boosting regression model is found to possess moderate performance, with all error-based metrics indicating values closer to the midpoint, as well as a low value of R², indicating that it does not perform better than basic models, consistent with evidence that ensemble methods can vary significantly depending on tuning and data characteristics.

The decision trees indicate comparatively better performance along dimensions of MSE, RMSE, and MAPE, but with a low value of R². However, the relevant comparison here is with KNN, Random Forest, and SVM, as they are the ones that are ranked the highest. KNN achieves the maximum or near-maximum score for a number of error metrics, such as MAE and MAPE, and has exceptional scores for MSE, RMSE, and R², implying local prediction accuracy with minimal average error, in line with findings on instance-based learning methods.

Similarly, the Random Forest achieves the maximum score for scaled MSE and R², and near-maximum scores for RMSE and MSE, implying high overall prediction accuracy. This supports extensive evidence that Random Forest models often outperform other algorithms in terms of predictive accuracy and robustness across different datasets (Biau & Scornet, 2016; Probst et al., 2019).

Although the SVM achieves high scores for MSE and RMSE, along with high R², its performance for MAPE is extremely poor, with a zero score, implying significant problems in terms of percentage error. This reflects the sensitivity of SVM models to specific loss functions and data scaling issues in regression tasks.

Thus, when all the metrics are considered together, the best-performing algorithm overall is the Random Forest: it avoids a zero score on every error metric, ranks near the top on each of them, and achieves the maximum score for R², implying the largest proportion of variance explained by the model. Although KNN may have a slight edge over the Random Forest on local error metrics such as MAE and MAPE, the Random Forest has the most balanced profile of the three leading models and does not share the SVM’s weakness on percentage error. It can therefore be considered the best-performing model for problems in which overall prediction accuracy and R² both matter, consistent with comparative machine learning studies (Sagi & Rokach, 2018). See Table 12.

The feature importance metrics show the contribution of the four explanatory variables, namely MCX, REM, IPU, and DBS, to the predictive capability of the model. These metrics are of the kind reported for tree-based ensemble models, such as a Random Forest, where variable importance plays a crucial role in interpreting predictive mechanisms (Bénard et al., 2021; Williamson et al., 2023). Each metric captures a different aspect of a variable’s contribution to predicting the dependent variable.

The Mean Decrease in Accuracy (MDA) measures the decrease in predictive accuracy when the values of a particular variable are randomly permuted. The higher the value, the more important the variable is. This approach to permutation-based importance is widely used in modern machine learning interpretation frameworks (Fisher et al., 2019; Molnar et al., 2020). MCX has a much higher value of 495.125, more than three times higher than REM and IPU, and nearly nine times higher than DBS. This indicates that market capitalization excluding the top ten companies contributes substantially to the predictive accuracy of the model: when the values of MCX are randomly permuted, the ability of the model to predict the dependent variable deteriorates more than for any other variable.
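The permutation idea can be sketched as follows. This is a generic, model-agnostic illustration rather than the routine used in the study: the fixed function `toy_predict` stands in for a fitted model, the data are simulated, and importance is measured as the raw increase in mean squared error, which may differ from the scaling behind the reported MDA values.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_features, rng):
    """Increase in MSE after shuffling one feature column at a time."""
    baseline = mse(y, [predict(r) for r in rows])
    importance = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and y
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importance.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return importance


def toy_predict(row):
    """Hypothetical stand-in for a fitted model; ignores the third feature."""
    return 3.0 * row[0] + 0.5 * row[1]


rng = random.Random(0)
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
y = [toy_predict(r) for r in rows]

imp = permutation_importance(toy_predict, rows, y, n_features=3, rng=rng)
print(imp)
```

The feature with the largest weight in the stand-in model shows the largest loss of accuracy when shuffled, and the feature the model ignores shows none, which is the logic behind reading a high MDA as high importance.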

The Mean Decrease in Accuracy of DBS is 55.195, indicating that DBS is the least important of the four explanatory variables and, therefore, plays a relatively less decisive role. The Total Increase in Node Purity (TINP) is usually computed as the decrease in residual sum of squares, summed over all regression tree splits on a particular variable. MCX again records the highest value, 24,459.853, confirming its significance. REM and IPU record values of 11,351.301 and 10,534.754, respectively: the contribution of remittance flows and international public debt to splitting the trees into more homogeneous nodes is high, but not as high as that of MCX. DBS records a much lower TINP of 6851.254, indicating a relatively minor role played by deposit money banks, consistent with interpretations of impurity-based importance in tree ensembles (Janitza et al., 2018).
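To make the node-purity measure concrete, the sketch below computes the decrease in residual sum of squares for a single split of a single node. The data and the threshold are hypothetical. In a tree ensemble, gains of this kind would be accumulated over every split made on a given variable.

```python
def rss(values):
    """Residual sum of squares around the mean of a node."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def split_gain(x, y, threshold):
    """Decrease in RSS obtained by splitting a node at x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return rss(y) - (rss(left) + rss(right))


# Hypothetical node: one predictor x and the response y.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 12.0, 20.0, 22.0]

gain = split_gain(x, y, threshold=2.5)
print(gain)  # 100.0: parent RSS of 104 minus child RSS of 2 + 2
```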

The Mean Dropout Loss, which is computed from the root mean squared error when the variable is dropped from the model, is another measure of variable importance. The higher the dropout loss, the more significant is the effect of dropping that variable from the model. This approach aligns with recent “leave-one-feature-out” or removal-based explanation frameworks (Covert et al., 2021). Again, MCX has recorded the highest importance value of 22.182, which reconfirms its importance in the model. REM and IPU have recorded importance values of 12.734 and 12.111, respectively, indicating their relatively comparable importance. DBS has recorded the lowest importance value of 10.703.

These three importance measures provide a consistent overall picture of variable importance, clearly indicating that MCX is the most important variable in influencing the model’s predictions, while REM and IPU have intermediate importance values, reflecting their role as external financial flows. DBS has the lowest importance among all variables, although it remains non-negligible. The convergence of these importance measures increases confidence in their interpretation and suggests that the ranking of the variables is not an artifact of a single metric, an issue often highlighted in recent discussions on model interpretability and robustness (Hooker et al., 2021; Molnar et al., 2020). See Table 13.

Table 14 reports a decomposition of the predicted values of the dependent variable into a baseline component and the individual contributions of the explanatory variables DBS, REM, MCX, and IPU for five representative cases. The baseline value is constant across all cases at 42.517, which can be interpreted as the reference prediction in the absence of deviations in the explanatory variables, while the final predicted value is obtained by adding the variable-specific contributions to this base, consistent with additive explanation frameworks in interpretable machine learning (Lundberg et al., 2020; Covert et al., 2021).

Case 1 illustrates a situation in which all variables contribute positively to the prediction. Starting from the baseline of 42.517, the combined positive effects of DBS (0.681), REM (1.081), MCX (7.497), and IPU (8.518) raise the predicted value to 60.295. The largest contributions come from MCX and IPU, confirming the strong influence of market structure and international public debt in shaping the outcome. This case represents a profile where both equity market diversification and external financing conditions significantly boost the predicted level of the dependent variable.

Cases 2 and 3 show the opposite pattern, with predicted values well below the baseline. In Case 2, negative contributions from all variables, especially MCX (−11.650) and REM (−4.830), drive the prediction down to 19.734. A similar configuration appears in Case 3, where large negative effects from REM (−5.938), MCX (−9.263), and IPU (−5.183) reduce the predicted value to 20.671. These two cases highlight how unfavorable positions in market diversification and external flows can substantially depress the outcome relative to the baseline level, consistent with the interpretation of feature contributions as marginal effects on predictions (Janzing et al., 2020).

Case 4 presents the highest predicted value, 66.774, driven by very strong positive contributions from REM (5.970) and especially MCX (17.398), alongside a positive effect from DBS (1.092). The small negative contribution from IPU (−0.202) is negligible in comparison. This case underscores the dominant role of equity market diversification in lifting the prediction far above the baseline, with remittances also playing an important supporting role, in line with Shapley-based interpretations of feature importance (Frye et al., 2020).

Finally, Case 5 shows a more mixed configuration. Although DBS contributes negatively (−0.735) and IPU also exerts a downward effect (−3.285), positive contributions from REM (1.168) and MCX (7.075) are sufficient to keep the predicted value slightly above the baseline at 46.740.

Overall, the decomposition confirms that MCX and REM are the most influential drivers of variation around the baseline, while DBS and IPU play more secondary but still meaningful roles in shaping the final predictions. The consistency of these additive contributions across cases strengthens the interpretability of the model, although caution is warranted given known limitations of model-agnostic explanation techniques (Molnar et al., 2020). See Table 14.
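The additive structure can be checked directly from the figures quoted in the text. The short sketch below sums the baseline and the four contributions for the cases in which all four contributions are stated (Cases 1, 4, and 5) and compares the result with the reported prediction. The helper name `reconstruct` is introduced here for illustration only.

```python
BASELINE = 42.517

# (contributions, reported prediction) as quoted in the text.
cases = {
    1: ({"DBS": 0.681, "REM": 1.081, "MCX": 7.497, "IPU": 8.518}, 60.295),
    4: ({"DBS": 1.092, "REM": 5.970, "MCX": 17.398, "IPU": -0.202}, 66.774),
    5: ({"DBS": -0.735, "REM": 1.168, "MCX": 7.075, "IPU": -3.285}, 46.740),
}


def reconstruct(baseline, contributions):
    """Additive prediction: baseline plus the sum of all contributions."""
    return baseline + sum(contributions.values())


gaps = {}
for case, (contributions, reported) in cases.items():
    predicted = reconstruct(BASELINE, contributions)
    gaps[case] = predicted - reported
    print(case, round(predicted, 3), reported)
```

The reconstructed values match the reported predictions to within about 0.001, a gap that is consistent with rounding of the displayed contributions.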

Figure 4 presents the main results obtained from the random forest model and also offers additional information regarding the model’s performance in terms of prediction and the relative importance of the explanatory variables used in the model, consistent with established analyses of random forest behavior and performance (Biau & Scornet, 2016; Probst et al., 2019).

Panel A in the figure demonstrates the relationship between observed and predicted test values. The dispersion of the points around the 45-degree line in the graph suggests that the model’s performance in reproducing observed values is reasonably high. Panel B in the figure presents the evolution of the out-of-bag mean squared error for increasing numbers of trees in the random forest model for both the training and the validation sets.

It is clear from the graph in Panel B that initially, the error is high and fluctuates significantly for a small number of trees in the model, but it decreases sharply and stabilizes gradually for increasing numbers of trees in the model, which is consistent with the random forest model’s performance and its ability to benefit from increasing the ensemble size (Oshiro et al., 2012). Moreover, the close proximity of the curves for the validation and training sets in the graph in Panel B suggests good generalization performance and little overfitting in the model, which is a well-known strength of random forest methods (Biau & Scornet, 2016). However, beyond a certain point, the rate of improvement slows, indicating that the model has reached a level of stability and reliability, in line with findings on hyperparameter tuning and diminishing returns from increasing the number of trees (Probst et al., 2019).
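One standard way to see why the error curve flattens, offered here as background rather than as a result of this study, is the variance of an average of identically distributed trees. The symbols are introduced for illustration: B is the number of trees, T_b(x) is the prediction of tree b, σ² is the variance of a single tree’s prediction, and ρ is the pairwise correlation between trees.

```latex
\[
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
\]
```

As B grows, the second term shrinks toward zero while the first term remains, so additional trees yield diminishing returns once (1 − ρ)σ²/B is small relative to ρσ².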

The variable importance plots, as represented in Panels C and D, provide a similar, albeit distinct, reading. Both plots show MCX as having the greatest influence, with a significant lead over the other predictors. This again points towards the fact that the structure and diversification of the equity market are significant contributors towards explaining the dependent variable. REM and IPU follow with a fair degree of distance, indicating that external private flows as well as international public debt contribute towards explaining the dependent variable. DBS has the least influence, indicating that, although relevant, the relative size of deposit-taking banks has a lesser impact. These findings are consistent with established approaches to variable importance in random forests (Speiser et al., 2019; Janitza et al., 2018).

The four plots provide a cohesive picture. The random forest model shows satisfactory predictive capability and stability, improving as the ensemble size increases while maintaining only a small gap between training and validation error. It also provides an economically relevant reading, with MCX having the greatest influence, indicating the relevance of the structure and diversification of the equity market. The secondary level of influence of external financial flows is also relevant. The random forest approach can thus be considered a reliable method for analyzing the dependent variable, consistent with the broader literature on ensemble learning and predictive modeling (Biau & Scornet, 2016; Probst et al., 2019). See Figure 4.

7. Integrated Evidence on the Determinants of Stock Market Trading Diversification

The empirical results provide compelling insights, indicating that stock market trading diversification (VTX) is essentially a function of equity market structure. The combination of panel econometrics, hierarchical clustering, and machine learning yields consistent and mutually reinforcing results, in line with current research on financial development and structural diversity (Demirgüç-Kunt et al., 2021; Sahay et al., 2015). Across all econometric specifications employed, including OLS, fixed effects, random effects, extended models with controls, and dynamic models with lagged regressors, the results are remarkably stable. Specifically, market capitalization excluding the top ten companies (MCX) turns out to be the most important factor in determining trading diversification and displays a significant, positive, and long-lasting effect. Likewise, remittance inflows (REM) have a substantial positive effect, providing evidence of the role of external private financial flows in facilitating participation and trading diversification. Conversely, bank dominance (DBS) and international public debt (IPU) are correlated with more concentrated market structures, which is in line with the developing theory of financial structure and its evolution from bank preeminence towards market domination (Demirgüç-Kunt et al., 2021). From the perspective of hierarchical clustering, financial systems exhibit significant structural diversity and a distinct core–periphery structure. While the largest number of countries fall into clusters with typical financial structures, smaller clusters exhibit very distinctive structures, in accordance with the clustering evidence (Jiang et al., 2020). It should be emphasized that there is much heterogeneity across countries regarding market structure, financial intermediation, and external private inflows.
As for the predictions made using machine learning algorithms, the results support the conclusions made above. Indeed, the Random Forest approach appears superior among all the models tested, confirming the superiority of ensemble techniques (Biau & Scornet, 2016; Probst et al., 2019). As regards variable importance measures, they consistently indicate that MCX is the leading variable affecting VTX, with REM and IPU ranking second and third, respectively, followed by DBS. In conclusion, it can be said that trading diversification should not be treated as an artifact resulting from financial development, but rather as a feature defined by market structure, facilitated by external private financing, and constrained by bank dominance and the use of international public debt. See Table 15.

The results yield a clearly structured picture of the determinants of stock market trading diversification and several key economic insights. In the baseline, extended, and dynamic versions of the econometric analysis alike, the internal structure of equity markets is a leading factor driving trading diversification, in line with previous studies on market breadth (Gurley & Shaw, 1967; Gbadebo, 2024). Specifically, market breadth and composition (MCX) have a strong, highly statistically significant positive effect. Similarly, the positive effect of remittance inflows (REM) is strong and persistent across specifications. This suggests that external financial flows expand access to trading by increasing liquidity and the investment capacity of households, a finding that aligns with recent research on how remittances influence financial inclusion. The impact of the other factors is less clear-cut. The coefficient on international public debt (IPU) is negative and statistically significant in almost all versions considered, indicating that an increase in external public debt is associated with greater market concentration. By contrast, the effect of deposit-taking banks (DBS) loses both its magnitude and its statistical significance once control variables are introduced. This suggests that financial structure operates through macro-financial conditions rather than affecting trading directly. The introduction of control variables provides further support for the results. SMC and NLC are significant in almost all cases, so market size and breadth appear to be important structural factors driving trading diversification. GDP growth and credit depth, in turn, appear to have a smaller effect, suggesting that the trading pattern is structural rather than cyclical.
The results were confirmed by the analysis based on dynamic specifications, which showed that the effects of MCX and REM were highly significant and persistent. At the same time, the effects of DBS and IPU remained statistically insignificant. The clustering analysis yields several insights into differences among countries in their financial structures and levels of development. Specifically, it appears that there are different types of financial development, with no evidence of convergence to a single type. The presence of a core–periphery structure also supports this conclusion. In this regard, financial systems with a large MCX feature more diverse and broad trading activities, whereas bank-oriented and externally dependent systems usually exhibit more concentrated trading. Machine learning results further reinforce these conclusions. The best-performing Random Forest model shows MCX as the most prominent of the predictors, which indicates that market structure is the key determinant of trading diversification. Remittance inflows are another important variable, alongside IPU, while DBS is not critical. See Figure 5.
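Because the dynamic specifications rely on one-period lags in an unbalanced panel, it may help to see how such lags can be constructed. The sketch below is an assumption about the mechanics, not a description of the authors’ code: the lag of a variable is taken from the same country in the previous calendar year and is left missing when that year is not observed. The countries, years, and values are hypothetical.

```python
def add_lag(panel, var):
    """Attach a one-period lag 'L.<var>' to each (country, year) observation.

    panel maps (country, year) -> {variable: value}. The lag is None when
    the previous year is not observed for that country.
    """
    lagged = {}
    for (country, year), obs in panel.items():
        previous = panel.get((country, year - 1))
        row = dict(obs)
        row["L." + var] = previous[var] if previous is not None else None
        lagged[(country, year)] = row
    return lagged


# Hypothetical unbalanced panel: country A has no observation for 2006.
panel = {
    ("A", 2004): {"MCX": 40.0},
    ("A", 2005): {"MCX": 42.0},
    ("A", 2007): {"MCX": 45.0},
    ("B", 2005): {"MCX": 30.0},
}

lagged = add_lag(panel, "MCX")
usable = {key: row for key, row in lagged.items() if row["L.MCX"] is not None}
print(sorted(usable))
```

Only observations with a valid lag can enter a lagged regression, which is one reason an unbalanced panel loses observations when lagged regressors are introduced.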

8. Policy Implications for Promoting Broader and More Inclusive Equity Markets

The findings have substantial policy implications. The internal composition of the stock market is shown to be as significant as its total size, which is a crucial policy consideration, especially given that the market capitalization effect persists even after the largest companies are excluded. This means that policies aimed at market expansion and growth in trading volume may not suffice if trading activity remains highly concentrated among a few large companies, which is consistent with research on firm-size distribution and capital-market financing (Didier et al., 2014). Thus, it would be reasonable to pay increased attention to policies that promote the listing and visibility of small and medium-sized enterprises, such as lower listing fees, appropriate increases in transparency measures, and incentives for market-makers to support a more even distribution of trading activity. The positive association between remittance inflows and market participation is another valuable result, as it shows that external financial flows may support wider market participation when incorporated into the local financial system. Accordingly, policies aimed at lowering transaction fees for remittances, channeling them through local financial institutions, and encouraging the use of formal financial products by recipient households may facilitate broader market participation. In addition, the connection between remittance flows and market participation suggests that these flows can serve as triggers for further deepening and diversifying financial systems. For instance, dedicated savings or investment tools could be developed that incorporate remittance flows and encourage people to participate in the stock market. Such measures have been successfully implemented in other countries, including Pakistan and Mexico (Ofosu-Mensah Ababio et al., 2023).
The negative associations between trading diversification and both bank dominance and international public debt highlight an intrinsic trade-off in financial system design. Although banks and public debt are integral parts of financial systems, their high presence may restrain stock market development and participation. This is in line with evidence that financial structure and external flows can adversely affect economic performance (Coleman & Feler, 2015; Qwader, 2021). Hence, it is reasonable to implement policies that ensure a balanced development of all segments of the financial system: banking, bonds, and stocks. These goals can be achieved by regulating the financial sector to prevent the disproportionate growth of the banking sector and by implementing effective debt management strategies. Finally, it is vital to note the heterogeneity among countries in their levels of financial development, as revealed by the clustering analysis. It implies that each country develops its financial system along its own trajectory, which once again demonstrates the importance of institutional and structural factors in shaping the evolution of financial systems (Aghion et al., 2005; Deng et al., 2023). Accordingly, a single general strategy to improve trading diversification is unlikely to be appropriate. Some countries may need to focus on developing domestic equity markets, while others may prioritize increasing the share of foreign capital in domestic financial systems. The changing nature of banking relationships over time also implies that balancing the development of financial intermediaries is another significant policy consideration. Overall, policymakers should leverage the benefits of econometric, clustering, and machine learning techniques to identify the fundamental structural factors that affect stock market performance (Didier et al., 2014; Ofosu-Mensah Ababio et al., 2023).

9. Conclusions

This paper provides a structured contribution to the literature on financial development by examining the under-researched area of the internal organization of trading activity on equity markets. Through this focus, enabled by the development of the trading diversification indicator (VTX), a new structural view of the operation of financial systems emerges. All empirical estimations presented in this study, including baseline models, extended models, dynamic models with lagged variables, and robustness check regressions, provide consistent conclusions regarding the role of internal structure in trading diversification. First, market capitalization excluding the ten largest firms (MCX) emerges as the most important and consistent variable. Hence, the internal composition of the market is the main driver of trading diversification: broader markets with greater participation from a larger number of firms exhibit lower levels of trading concentration. External private financial flows, measured by remittance volume (REM), are also important for trading diversification. In turn, external flows via public debt (IPU) are a strong inhibitor of diversification. At the same time, while the role of the banking structure (DBS) is significant in the baseline models, this variable becomes less relevant when control and lagged variables are added. Thus, the evidence suggests that banking structure acts through financial conditions rather than being directly related to trading diversification. The results obtained by adding control variables and dynamic models indicate that trading diversification is a structural process that is largely independent of the current state of the economy. Furthermore, market size (SMC) and the number of listed companies (NLC) serve as additional confirmation of the structural character of the process under consideration, alongside digital access (INTU) and the public finance channel (GOV).
Cluster analysis provides evidence of the core–periphery structure of the financial system and confirms significant heterogeneity across countries, indicating that they do not converge but follow different developmental trajectories. This conclusion is supported by machine learning analysis, which identifies MCX as the main determinant of trading diversification, followed by REM and IPU, with DBS ranking last. Thus, the convergence of conclusions from econometrics, cluster analysis, and machine learning provides confidence that the results are not biased by any specific model choice but instead reveal fundamental regularities. It can be concluded that trading diversification depends less on financial system size and liquidity than on the internal market structure and the openness of the participation channel. In particular, private external flows support diversification, while dependence on public funds limits diversification opportunities. Policy recommendations follow from these results. To build a diversified and resilient financial system, emphasis should be placed on the internal structure of equity markets, their broadening, and the creation of additional channels of participation. Future studies could explore the issue by using firm-level data, examining causality, and expanding the range of countries analyzed.

Author Contributions

Conceptualization, A.L., F.A., A.C., C.D. and M.A.; Methodology, A.L., F.A., A.C., C.D. and M.A.; Software, A.L., F.A., A.C., C.D. and M.A.; Validation, A.L., F.A., A.C., C.D. and M.A.; Formal analysis, A.L., F.A., A.C., C.D. and M.A.; Investigation, A.L., F.A., A.C., C.D. and M.A.; Resources, A.L., F.A., A.C., C.D. and M.A.; Data curation, A.L., F.A., A.C., C.D. and M.A.; Writing—original draft, A.L., F.A., A.C., C.D. and M.A.; Writing—review and editing, A.L., F.A., A.C., C.D. and M.A.; Visualization, A.L., F.A., A.C., C.D. and M.A.; Supervision, A.L., F.A., A.C., C.D. and M.A.; Project administration, A.L., F.A., A.C., C.D. and M.A.; Funding acquisition, A.L., F.A., A.C., C.D. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the Global Financial Development Database at https://www.worldbank.org/en/publication/gfdr/data/global-financial-development-database (accessed on 2 January 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Acronym	Definition
MSE	Mean Squared Error
MSE (scaled)	Scaled Mean Squared Error
RMSE	Root Mean Squared Error
MAE/MAD	Mean Absolute Error/Mean Absolute Deviation
MAPE	Mean Absolute Percentage Error
R²	Coefficient of Determination
OLS	Ordinary Least Squares
FE	Fixed Effects
RE	Random Effects
KNN	K-Nearest Neighbors
SVM	Support Vector Machine
VTX	Value traded excluding top 10 traded companies to total value traded (%)
DBS	Deposit money bank assets to deposit money bank and central bank assets (%)
REM	Remittance inflows to GDP (%)
MCX	Market capitalization excluding top 10 companies to total market capitalization (%)
IPU	Outstanding international public debt securities to GDP (%)
GDPG	Annual GDP growth rate, capturing macroeconomic conditions and business cycle dynamics.
DCP	Domestic credit to the private sector (% of GDP), proxy for financial development and credit depth.
SMC	Stock market capitalization (% of GDP), measuring overall size of the equity market.
TOR	Stock market turnover ratio, capturing market liquidity and trading activity intensity.
INTU	Individuals using the Internet (% of population), proxy for digital access and financial participation.
NLC	Number of listed companies per 1,000,000 people.
GOV	Credit to government and state-owned enterprises (% of GDP).
L.DBS	One-period lag of deposit-taking bank assets relative to total bank and central bank assets.
L.REM	One-period lag of remittance inflows (% of GDP).
L.MCX	One-period lag of market capitalization excluding top ten firms (%).
L.IPU	One-period lag of outstanding international public debt securities (% of GDP).
L.GDPG	One-period lag of GDP growth.
L.DCP	One-period lag of domestic credit to the private sector.
L.SMC	One-period lag of stock market capitalization.
L.TOR	One-period lag of stock market turnover ratio.
L.INTU	One-period lag of individuals using the Internet (% of population).
L.NLC	One-period lag of the number of listed companies per 1,000,000 people.
L.GOV	One-period lag of credit to government and state-owned enterprises (% of GDP).
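
For the error metrics listed among the abbreviations (MSE, RMSE, MAE, MAPE, R²), the following minimal sketch shows the textbook definitions. The function name and the sample values are hypothetical and are not taken from the paper; the scaled MSE is omitted because its scaling is not defined in this section.

```python
import math

def regression_metrics(observed, predicted):
    """Return MSE, RMSE, MAE, MAPE (%) and R2 for paired lists of numbers."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

# Hypothetical VTX values (%), for illustration only.
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 63.0, 69.0]
metrics = regression_metrics(observed, predicted)
```

MSE and RMSE penalize large errors more heavily than MAE, while MAPE expresses errors relative to the observed level.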

References

Aghion, P., Howitt, P., & Mayer-Foulkes, D. (2005). The effect of financial development on convergence: Theory and evidence. The Quarterly Journal of Economics, 120(1), 173–222.
Aldomy, R. F., Thim, C. K., Lan, N. T. P., & Norhashim, M. B. (2020). Bank concentration and financial risk in Jordan. Montenegrin Journal of Economics, 16(3), 31–44.
Alhassan, U., Maswana, J. C., & Inaba, K. (2025). Leveraging remittances for entrepreneurship: The role of financial development in developing economies. Journal of the Knowledge Economy, 1–40.
Allen, F., & Gale, D. (2000). Comparing financial systems. MIT Press.
Amihud, Y., & Mendelson, H. (1986). Asset pricing and the bid-ask spread. Journal of Financial Economics, 17(2), 223–249.
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J. M., & Perona, I. (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243–256.
Baltagi, B. H. (2021). Nonstationary panels. In Econometric analysis of panel data (pp. 337–389). Springer International Publishing.
Barajas, A., Naceur, M. S. B., Beck, T., & Belhaj, M. (2020). Financial inclusion: What have we learned so far? What do we have to learn? (Vol. 157). International Monetary Fund.
Barber, B. M., & Odean, T. (2000). Trading is hazardous to your wealth: The common stock investment performance of individual investors. The Journal of Finance, 55(2), 773–806.
Barratt, J. L., & Plucinski, M. M. (2023). Epidemiologic utility of a framework for partition number selection when dissecting hierarchically clustered genetic data evaluated on the intestinal parasite Cyclospora cayetanensis. American Journal of Epidemiology, 192(5), 772–781.
Beck, T., Demirgüç-Kunt, A., & Levine, R. (2007). Finance, inequality and the poor. Journal of Economic Growth, 12(1), 27–49.
Beck, T., Demirgüç-Kunt, A., & Levine, R. (2009). Financial institutions and markets across countries and over time-data and analysis. World Bank Policy Research Working Paper No. 4943. World Bank Group.
Beck, T., Levine, R., & Loayza, N. (2000). Finance and the sources of growth. Journal of Financial Economics, 58(1–2), 261–300.
Beck, T., Senbet, L., & Simbanegavi, W. (2015). Financial inclusion and innovation in Africa: An overview. Journal of African Economies, 24(Suppl. S1), i3–i11.
Bekaert, G., Harvey, C. R., & Lundblad, C. (2005). Does financial liberalization spur growth? Journal of Financial Economics, 77(1), 3–55.
Bekaert, G., Harvey, C. R., Lundblad, C. T., & Siegel, S. (2013). The European Union, the Euro, and equity market integration. Journal of Financial Economics, 109(3), 583–603.
Bettin, G., Presbitero, A. F., & Spatafora, N. L. (2017). Remittances and vulnerability in developing countries. The World Bank Economic Review, 31(1), 1–23.
Bénard, C., Biau, G., Da Veiga, S., & Scornet, E. (2021). Interpretable random forests via rule extraction. In International conference on artificial intelligence and statistics (pp. 937–945). PMLR.
Biau, G., & Scornet, E. (2016). A random forest guided tour. Test, 25(2), 197–227.
Bonizzi, B. (2013, October 24–26). Capital flows to emerging markets: Institutional investors and the post-crisis environment. 17th FMM Conference of the Research Network Macroeconomics and Macroeconomic Policies, Berlin, Germany.
Box, T., & Davis, R. (2026). Illuminating OTC markets: The impact of public disclosures on trading dynamics. Journal of Financial Research.
Brei, M., Ferri, G., & Gambacorta, L. (2023). Financial structure and income inequality. Journal of International Money and Finance, 131, 102807.
Broner, F., Erce, A., Martin, A., & Ventura, J. (2014). Sovereign debt markets in turbulent times: Creditor discrimination and crowding-out effects. Journal of Monetary Economics, 61, 114–142.
Budhathoki, P. B., Bhattarai, G., Aryal, N. P., & Ghimire, S. R. (2024). The bank concentration and risk exposure: Empirical insights from Asian countries. Nepal Journal of Multidisciplinary Research, 7(2), 12–29.
Cabezas, L. M., Izbicki, R., & Stern, R. B. (2023). Hierarchical clustering: Visualization, feature importance and model selection. Applied Soft Computing, 141, 110303.
Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata (Vol. 2). Stata Press.
Campello, R. J., Moulavi, D., Zimek, A., & Sander, J. (2015). Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(1), 1–51.
Coleman, N., & Feler, L. (2015). Bank ownership, lending, and local economic performance during the 2008–2009 financial crisis. Journal of Monetary Economics, 71, 50–66.
Combes, J. L., Ebeke, C. H., Etoundi, S. M. N., & Yogo, T. U. (2014). Are remittances and foreign aid a hedge against food price shocks in developing countries? World Development, 54, 81–98.
Cournède, B., Denk, O., & Hoeller, P. (2015). Finance and inclusive growth. OECD Economic Policy Papers No. 14, 0_2. OECD Publishing.
Covert, I., Lundberg, S., & Lee, S. I. (2021). Explaining by removing: A unified framework for model explanation. Journal of Machine Learning Research, 22(209), 1–90.
Cuestas, J. C., Lucotte, Y., & Reigl, N. (2020). Banking sector concentration, competition and financial stability: The case of the Baltic countries. Post-Communist Economies, 32(2), 215–249.
Čihák, M., Demirgüç-Kunt, A., Feyen, E., & Levine, R. (2012). Benchmarking financial systems around the world. World Bank Policy Research Working Paper No. 6175. World Bank Group.
De Hoyos, R. E., & Sarafidis, V. (2006). Testing for cross-sectional dependence in panel-data models. The Stata Journal, 6(4), 482–496.
Demirgüç-Kunt, A., Feyen, E., & Levine, R. (2013). The evolving importance of banks and securities markets. The World Bank Economic Review, 27(3), 476–490.
Demirgüç-Kunt, A., Klapper, L., Singer, D., & Ansar, S. (2021). Financial inclusion, digital payments, and resilience in the age of COVID-19. World Bank Report. World Bank Group.
Demirgüç-Kunt, A., & Levine, R. (Eds.). (2001). Financial structure and economic growth: A cross-country comparison of banks, markets, and development. MIT Press.
Deng, Y., Dong, K., Zhang, X., & Taghizadeh Hesary, F. (2023). How does environmental concern affect the corporate resilience? Evidence from China’s energy companies.
Didier, T., Levine, R., Montanes, R. L., & Schmukler, S. L. (2021). Capital market financing and firm growth. Journal of International Money and Finance, 118, 102459.
Didier, T., Levine, R., & Schmukler, S. L. (2014). Capital market financing, firm growth, firm size distribution. No. w20336. National Bureau of Economic Research.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4), 549–560.
Eberhardt, M., & Presbitero, A. F. (2015). Public debt and growth: Heterogeneity and non-linearity. Journal of International Economics, 97(1), 45–58.
Espinas, D. R., Swanz, A., Hanson, A. L., & Logan, J. A. (2026). Missing data reporting and handling in special education group intervention research. Exceptional Children.
Essary, C. R., Fischer, L. M., & Irlbeck, E. (2022). A Statistical Approach to Classification: A guide to hierarchical cluster analysis in agricultural communications research. Journal of Applied Communications, 106(3), 3.
Fisher, A., Rudin, C., & Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81.
Frye, C., de Mijolla, D., Begley, T., Cowton, L., Stanley, M., & Feige, I. (2020). Shapley explainability on the data manifold. arXiv, arXiv:2006.01272.
Gabaix, X. (2011). The granular origins of aggregate fluctuations. Econometrica, 79(3), 733–772.
Gbadebo, A. D. (2024). Theories of financial intermediation: Evaluation and empirical relevance. Journal of Law and Sustainable Development, 12(9), e3950.
Grossman, S. J., & Stiglitz, J. E. (1980). On the impossibility of informationally efficient markets. The American Economic Review, 70(3), 393–408.
Gurley, J. G., & Shaw, E. S. (1967). Financial structure and economic development. Economic Development and Cultural Change, 15(3), 257–268.
Hennig, C. (2015). What are the true clusters? Pattern Recognition Letters, 64, 53–62.
Hoechle, D. (2007). Robust standard errors for panel regressions with cross-sectional dependence. The Stata Journal, 7(3), 281–312.
Hooker, G., Mentch, L., & Zhou, S. (2021). Unrestricted permutation forces extrapolation: Variable importance requires at least one more model, or there is no free variable importance. Statistics and Computing, 31(6), 82.
Huang, H., & Yang, X. (2025). Spatiotemporal evolution mechanism and dynamic simulation of the urban resilience system in the Chengdu–Chongqing economic circle. Sustainability, 17(8), 3448.
Hughes, B. B. (2025). Analysis of integrated global SDG pursuit: Challenges and progress. Sustainability, 17(15), 6672.
Ichino, M., Umbleja, K., & Yaguchi, H. (2021). Unsupervised feature selection for histogram-valued symbolic data using hierarchical conceptual clustering. Stats, 4(2), 359–384.
Idroes, G. M., Maulidar, P., Marsellindo, R., Afjal, M., & Hardi, I. (2024). The impact of credit access on economic growth in SEA countries. Indatu Journal of Management and Accounting, 2(2), 96–104.
Ikpesu, O. A. (2024). Interactive effect of migrant remittances and financial market development on growth in Sub-Saharan Africa. International Journal of Professional Business Review, 9(3), e04346.
Imran, K., Devadason, E. S., & Kee Cheok, C. (2019). Developmental impacts of remittances on migrant-sending households: Micro-level evidence from Punjab, Pakistan. Journal of South Asian Development, 14(3), 338–366.
Janitza, S., Celik, E., & Boulesteix, A. L. (2018). A computationally fast variable importance test for random forests for high-dimensional data. Advances in Data Analysis and Classification, 12(4), 885–915.
Janzing, D., Minorics, L., & Blöbaum, P. (2020). Feature relevance quantification in explainable AI: A causal problem. In International conference on artificial intelligence and statistics (pp. 2907–2916). PMLR.
Jiang, T., Levine, R., Lin, C., & Wei, L. (2020). Bank deregulation and corporate risk. Journal of Corporate Finance, 60, 101520.
Kahane, L. H. (2024). Regression basics: A student’s guide to quantitative methods and statistical analysis. Routledge.
Kahraman, B., & Tookes, H. E. (2017). Trader leverage and liquidity. The Journal of Finance, 72(4), 1567–1610.
Kharroubi, S. G. (2015). Why does financial sector growth crowd out real economic growth. Bank for International Settlements. Available online: https://www.bis.org/publ/work490.pdf (accessed on 31 March 2025).
Kose, M. A., Nagle, P., Ohnsorge, F., & Sugawara, N. (2021). Global waves of debt: Causes and consequences. World Bank Publications.
Kwabi, F., Wonu, C., Ezeani, E., Owusu, A., & Leone, V. (2025). Impacts of cross-border equity portfolio flow and central bank transparency on financial development: The role of economic freedom and international bonds. International Journal of Finance and Economics, 30(2), 1319–1347.
Langfield, S., & Pagano, M. (2016). Bank bias in Europe: Effects on systemic risk and growth. Economic Policy, 31(85), 51–106.
Levine, R. (1997). Financial development and economic growth: Views and agenda. Journal of Economic Literature, 35(2), 688–726.
Levine, R. (2005). Finance and growth: Theory and evidence. Handbook of Economic Growth, 1, 865–934.
Levine, R., Loayza, N., & Beck, T. (2000). Financial intermediation and growth: Causality and causes. Journal of Monetary Economics, 46(1), 31–77.
Liu, Y., Li, Z., Xiong, H., Gao, X., & Wu, J. (2010, December 13–17). Understanding of internal clustering validation measures. 2010 IEEE International Conference on Data Mining (pp. 911–916), Sydney, Australia.
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67.
Malzer, C., & Baum, M. (2021). Constraint-based hierarchical cluster selection in automotive radar data. Sensors, 21(10), 3410.
Molnar, C., König, G., Herbinger, J., Freiesleben, T., Dandl, S., Scholbeck, C. A., Casalicchio, G., Grosse-Wentrup, M., & Bischl, B. (2020, July). General pitfalls of model-agnostic interpretation methods for machine learning models. In International workshop on extending explainable AI beyond deep models and classifiers (pp. 39–68). Springer International Publishing.
Ofosu-Mensah Ababio, J., Yiadom, E. B., Sarpong-Kumankoma, E., & Boadi, I. (2023). Financial inclusion: A catalyst for financial system development in emerging and frontier markets. Journal of Financial Economic Policy, 15(6), 530–550.
Oshiro, T. M., Perez, P. S., & Baranauskas, J. A. (2012, July). How many trees in a random forest? In International workshop on machine learning and data mining in pattern recognition (pp. 154–168). Springer.
Ozili, P. K. (2021). Financial inclusion research around the world: A review. Forum for Social Economics, 50(4), 457–479.
Patalano, R., & Roulet, C. (2020). Structural developments in global financial intermediation: The rise of debt and non-bank credit intermediation (pp. 1–92). OECD Working Papers on Finance, Insurance and Private Pensions No. 44. OECD.
Petrescu, M., & Krishen, A. S. (2023). A decade of marketing analytics and more to come: JMA insights. Journal of Marketing Analytics, 11(2), 117–129.
Philippon, T. (2019). On fintech and financial inclusion. No. w26330. National Bureau of Economic Research.
Pradhan, R. P., Arvin, M. B., Nair, M. S., Hall, J. H., & Bennett, S. E. (2026). Financial market structures, financial market openness, and the innovation-growth nexus? Evidence from developing countries. Asia Pacific Financial Markets, 33(1), 171–211.
Probst, P., Wright, M. N., & Boulesteix, A. L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3), e1301.
Qwader, A. S. (2021). International financial flows its effects on economic growth in Jordan. International Journal of Academic Research in Accounting Finance and Management Sciences, 11(1), 37–56.
Raddatz, C., Schmukler, S. L., & Williams, T. (2017). International asset allocations and capital flows: The benchmark effect. Journal of International Economics, 108, 413–430.
Rajan, R. G., & Zingales, L. (1996). Financial dependence and growth. NBER Working Paper No. w5758. National Bureau of Economic Research, Inc.
Randriamihamison, N., Vialaneix, N., & Neuvial, P. (2021). Applicability and interpretability of Ward’s hierarchical agglomerative clustering with or without contiguity constraints. Journal of Classification, 38(2), 363–389.
Rebafka, T. (2024). Model-based clustering of multiple networks with a hierarchical algorithm. Statistics and Computing, 34(1), 32.
Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249.
Sahay, M. R., Cihak, M., N’Diaye, M. P., Barajas, M. A., Mitra, M. S., Kyobe, M. A., Mooi, M., & Yousefi, M. R. (2015). Financial inclusion: Can it meet multiple macroeconomic goals? International Monetary Fund.
Saraf, M., & Kayal, P. (2022). Role of digital financial inclusion in promoting economic growth and freedom. In Digitalization and the future of financial services: Innovation and impact of digital finance (pp. 163–180). Springer International Publishing.
Saxena, A., Prasad, M., Gupta, A., Bharill, N., Patel, O. P., Tiwari, A., Er, M. J., Ding, W., & Lin, C. T. (2017). A review of clustering techniques and developments. Neurocomputing, 267, 664–681.
Speiser, J. L., Miller, M. E., Tooze, J., & Ip, E. (2019). A comparison of random forest variable selection methods for classification prediction modeling. Expert Systems with Applications, 134, 93–101.
Srivastava, N., Tripe, D., Haq, M., & Yuen, M. K. (2026). Financial market development and bank deposits. Journal of International Financial Markets, Institutions and Money, 107, 102278.
Stulz, R. M. (2019). Fintech, bigtech, and the future of banks. Journal of Applied Corporate Finance, 31(4), 86–97.
Tamarit, I., Pereda, M., & Cuesta, J. A. (2020). Hierarchical clustering of bipartite data sets based on the statistical significance of coincidences. Physical Review E, 102(4), 042304.
Thompson, S. B. (2011). Simple formulas for standard errors that cluster by both firm and time. Journal of Financial Economics, 99(1), 1–10.
Vendramin, L., Campello, R. J., & Hruschka, E. R. (2010). Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal, 3(4), 209–235.
Williamson, B. D., Gilbert, P. B., Simon, N. R., & Carone, M. (2023). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association, 118(543), 1645–1658.
Wurgler, J. (2000). Financial markets and the allocation of capital. Journal of Financial Economics, 58(1–2), 187–214.
Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165–193.
Yakubu, S., Muritala, T. A., Abubakar, H. L., Bakare, A. A., Yusuf, W. A., & Afolabi, H. O. (2023). How does pension funds impact stock market development? An empirical analysis from Nigeria using ARDL technique. Open Journal of Social Sciences, 11(9), 575–600.

Figure 1. Initial OECD Sample (38 Countries) and Final Estimation Sample (22 Countries). The map shows OECD countries in red and the final sample in blue. The final sample is a subset determined by data availability in the econometric analysis and is concentrated in Europe, North America, and Asia-Pacific.

Figure 2. Hierarchical Segmentation of Financial Structures. Note. Panel (A) shows model selection criteria (BIC, AIC, WSS), suggesting an optimal cluster range around 9–10. Panel (B) illustrates well-separated groups alongside a dense core, indicating both dominant and niche financial structures. Panel (C)’s dendrogram confirms hierarchical relationships and distinct subgroups. Panel (D) interprets clusters economically: differences in market breadth (VTX), banking importance (DBS), remittances (REM), concentration (MCX), and public debt (IPU) reveal diversified versus externally dependent systems. Extreme values indicate highly specialized structures. Overall, the figure highlights strong heterogeneity, capturing both large homogeneous groups and smaller differentiated clusters driven by financial development and global integration.
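
The selection criteria in Panel (A) can be illustrated with a minimal sketch. The within-cluster sum of squares (WSS) follows its standard definition; the AIC and BIC penalties use one common k-means-style convention, which is an assumption here and may differ from the software used for the figure. The function names and data points are hypothetical.

```python
import math

def wss(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        total += sum((m[j] - centroid[j]) ** 2 for m in members for j in range(dim))
    return total

def penalized_criteria(points, labels):
    """AIC/BIC-style scores: WSS plus a penalty on clusters x dimensions (assumed form)."""
    n, dim, k = len(points), len(points[0]), len(set(labels))
    w = wss(points, labels)
    return {"WSS": w, "AIC": w + 2 * dim * k, "BIC": w + math.log(n) * dim * k}

# Hypothetical standardized two-indicator observations, not the paper's data.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = penalized_criteria(points, [0, 0, 0, 0])
two_clusters = penalized_criteria(points, [0, 0, 1, 1])
```

WSS always falls as clusters are added, so the penalized scores are what allow a comparison across different numbers of clusters.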

Figure 3. Pairwise Relationships of Financial Indicators Across Clusters. The scatter plots show pairwise relationships among VTX, DBS, REM, MCX, and IPU by cluster. Distinct patterns and correlations highlight heterogeneous financial structures, revealing differences in market development, banking importance, and external dependence across clusters.

Figure 4. Random Forest Performance and Variable Importance Analysis. Note. Panel (A) shows a strong alignment between observed and predicted test values, indicating good predictive accuracy, although some dispersion suggests residual error. The red line represents the ideal prediction line, where predicted values perfectly match observed values. Panel (B) reports the out-of-bag error: it decreases rapidly with more trees and stabilizes, with slightly higher validation error, indicating limited overfitting. Panel (C) presents variable importance (mean decrease in accuracy), identifying MCX as the dominant predictor, followed by IPU and REM, while DBS has a marginal role. Panel (D) confirms this ranking using total increase in node purity, again highlighting MCX as the most influential variable, reinforcing its central role in explaining the model’s predictions.
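
The idea behind the permutation-based importance in Panel (C) can be sketched as follows: shuffle one predictor, keep everything else fixed, and record how much the prediction error rises. This is a minimal illustration, not the authors' implementation; the fitted random forest is replaced by a hypothetical toy function, and the data, function names, and number of repeats are assumptions.

```python
import random

def mse(model, X, y):
    """Mean squared error of model predictions over rows of X."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, column, repeats=30, seed=0):
    """Average increase in MSE after shuffling one column of X."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    increases = []
    for _ in range(repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - base)
    return sum(increases) / repeats

# Hypothetical stand-in for a fitted model: it uses feature 0 and ignores feature 1.
def toy_model(row):
    return 2.0 * row[0]

X = [[float(i), float(i % 3)] for i in range(8)]
y = [toy_model(row) for row in X]
importance = [permutation_importance(toy_model, X, y, c) for c in range(2)]
```

A predictor the model never uses receives an importance of zero, which is the sense in which a marginal score for DBS indicates a small contribution to predictions rather than an absence of any economic relationship.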

Figure 5. Financial Market Diversification: Drivers and Structural Constraints. The figure summarizes the main drivers of and barriers to trading diversification. Market composition (MCX) is the main positive factor, supported by remittances (REM) and broader participation. In contrast, bank dominance (DBS) and international public debt (IPU) are associated with more concentrated trading. A core-periphery pattern reflects structural heterogeneity across countries. Source: Authors’ elaboration using Notebook ML.

Table 1. Integration of Financial Theories and Literature Streams on Trading Diversification and Market Structure.

Literature Stream	Key References	Theoretical Foundation	Main Limitation vs. This Study
Financial development and structure (core theory)	Levine (1997, 2005); Levine et al. (2000); Beck et al. (2000, 2007); Allen and Gale (2000); Demirgüç-Kunt and Levine (2001)	Financial systems promote growth through efficient capital allocation, risk diversification, and institutional structure	Focus on aggregate indicators; do not consider distribution of trading across firms
Market efficiency, allocation and firm dynamics	Wurgler (2000); Rajan and Zingales (1996); Gabaix (2011)	Capital allocation and firm size distribution shape economic outcomes and market concentration	Do not explicitly analyze trading diversification in equity markets
Market microstructure and investor behavior	Amihud and Mendelson (1986); Grossman and Stiglitz (1980); Barber and Odean (2000)	Trading depends on liquidity, information asymmetries, and behavioral biases	Focus on trading mechanisms or individuals, not cross-firm distribution
Recent empirical extensions	Pradhan et al. (2026); Srivastava et al. (2026); Box and Davis (2026); Kwabi et al. (2025)	Extend classical theory to financial openness, structure, and flows	Remain focused on aggregate or context-specific dynamics; neglect trading distribution (VTX)

Note: The table synthesizes key literature streams, highlighting theoretical foundations and limitations. It emphasizes the study’s contribution in introducing trading diversification as a novel indicator capturing the distribution of trading activity across firms.

Table 2. Definition of Variables and Theoretical Expectations for Trading Diversification Analysis.

Variable Name	Acronym	Description	Expected Effect on VTX	Theoretical Motivation for Variable Selection
Value traded excluding top 10 traded companies to total value traded (%)	VTX	Measures stock market trading activity excluding the ten largest firms, capturing market breadth and liquidity dispersion. Higher values indicate more diversified trading, lower concentration, and a greater role of smaller and mid-cap companies in market turnover.	—	Captures the dependent concept of trading diversification, reflecting how evenly trading activity is distributed across firms. It represents a key dimension of market inclusiveness and financial development beyond size and liquidity.
Deposit money bank assets to deposit money bank and central bank assets (%)	DBS	Indicates the relative size of deposit money banks within the overall banking system. Higher values suggest greater financial intermediation by commercial banks.	Negative (−)	Proxies the structure of financial intermediation (bank-based vs. market-based systems). A stronger banking sector may crowd out equity market activity, reducing participation and leading to more concentrated trading structures.
Remittance inflows to GDP (%)	REM	Measures the importance of workers’ remittances relative to GDP, capturing external private financial inflows.	Positive (+)	Represents external private financial flows that increase household income and liquidity. These flows may enhance financial inclusion and broaden participation in equity markets, fostering more diversified trading activity.
Market capitalization excluding top 10 companies to total market capitalization (%)	MCX	Captures stock market concentration by excluding the ten largest firms from total capitalization. Higher values indicate a more diversified equity market structure.	Positive (+)	Direct proxy for equity market structure and breadth. A less concentrated market provides more investment opportunities and promotes trading across a wider set of firms, directly increasing diversification.
Outstanding international public debt securities to GDP (%)	IPU	Measures the stock of public sector debt issued on international markets relative to GDP.	Negative (−)	Captures external public financial dependence. Greater reliance on international debt may weaken domestic financial development and crowd out local capital markets, leading to more concentrated trading structures.

Note: Variables follow GFDD definitions and are expressed as ratios to ensure comparability. The selection reflects a parsimonious, theory-driven approach capturing distinct financial system dimensions while minimizing redundancy and multicollinearity in empirical analysis.
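
As a hedged illustration of how the two "excluding top 10" ratios in the table are constructed, the sketch below computes the share of a total that is not accounted for by the ten largest entries. The function name and the firm-level figures are hypothetical; the GFDD publishes these ratios directly, so this only shows the arithmetic behind the definition.

```python
def share_excluding_top(values, top_n=10):
    """Percentage of the total accounted for by all but the top_n largest entries."""
    total = sum(values)
    if total <= 0:
        raise ValueError("total must be positive")
    top = sum(sorted(values, reverse=True)[:top_n])
    return 100.0 * (total - top) / total

# Hypothetical market: 10 heavily traded firms and 40 smaller ones.
value_traded = [60.0] * 10 + [10.0] * 40
vtx = share_excluding_top(value_traded)
```

Applied to firm-level value traded, the function gives a VTX-type ratio; applied to firm-level market capitalization, it gives an MCX-type ratio. In both cases a higher value means less weight on the ten largest firms.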

Table 3. Sample Composition and Missing Data Overview.

Rank	Country/Region	Years	Raw Obs.	Missing-Data Summary	Rank	Country/Region	Years	Raw Obs.	Missing-Data Summary
1	Australia	2004–2021	18	Later gaps in VTX/MCX/IPU	20	Japan	2004–2021	18	VTX/MCX missing in 2021
2	Austria	2004–2021	18	VTX/MCX gaps after 2014	21	Korea (Rep.)	2004–2021	18	Selected VTX/MCX gaps
3	Belgium	2004–2021	18	VTX/MCX largely missing	22	Latvia	2004–2021	18	VTX/MCX largely missing; some DBS gaps
4	Canada	2004–2021	18	DBS gaps after 2008; VTX/MCX after 2016	23	Lithuania	2004–2021	18	VTX/MCX largely missing; limited DBS gaps
5	Chile	2004–2021	18	Limited VTX/MCX gaps	24	Luxembourg	2004–2021	18	Several gaps across DBS/VTX/MCX/IPU
6	Colombia	2004–2021	18	Limited VTX/MCX gaps	25	Mexico	2004–2021	18	DBS missing; VTX/MCX later gaps
7	Costa Rica	2004–2021	18	VTX/MCX largely missing	26	Netherlands	2004–2021	18	VTX/MCX largely missing
8	Czech Republic	2004–2021	18	VTX/MCX largely missing	27	New Zealand	2004–2021	18	Selected VTX/DBS/MCX gaps
9	Denmark	2004–2021	18	VTX/MCX largely missing; some IPU gaps	28	Norway	2004–2021	18	IPU missing; VTX/MCX later gaps
10	Estonia	2004–2021	18	VTX/MCX largely missing	29	Poland	2004–2021	18	Selected VTX/MCX gaps
11	Finland	2004–2021	18	VTX/MCX largely missing	30	Portugal	2004–2021	18	VTX/MCX largely missing
12	France	2004–2021	18	VTX/MCX largely missing	31	Slovak Republic	2004–2021	18	VTX/MCX largely missing
13	Germany	2004–2021	18	Limited VTX/MCX gaps	32	Slovenia	2004–2021	18	VTX/MCX gaps after 2014
14	Greece	2004–2021	18	VTX/MCX missing in 2021	33	Spain	2004–2021	18	Selected VTX/MCX gaps
15	Hungary	2004–2021	18	VTX/MCX gaps after 2014	34	Sweden	2004–2021	18	VTX/MCX largely missing; some DBS gaps
16	Iceland	2004–2021	18	VTX/MCX largely missing	35	Switzerland	2004–2021	18	Later gaps across DBS/VTX/MCX/IPU
17	Ireland	2004–2021	18	Selected VTX/MCX gaps	36	Turkey	2004–2021	18	VTX/MCX missing in 2021
18	Israel	2004–2021	18	VTX/MCX gaps after 2018	37	United Kingdom	2004–2021	18	Selected VTX/MCX gaps
19	Italy	2004–2021	18	VTX/MCX gaps after 2009	38	United States	2004–2021	18	VTX/MCX gaps after 2016

Note: Each country has 18 potential annual observations over 2004–2021. The panel is unbalanced because variable availability differs across country–year pairs. No imputation or interpolation is applied; the empirical analysis relies on complete-case observations only. As a result, the baseline econometric models are estimated on 266 observations, while specifications including additional controls and lagged variables are based on a reduced sample of 216 observations, due to missing values and lag construction requirements.
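
The sample reduction described in the note can be sketched as follows: complete-case filtering drops any country-year with a missing variable, and constructing one-period lags removes further observations whose previous year is unavailable. This is an illustrative sketch on a hypothetical toy panel, not the authors' data pipeline; the function names are assumptions.

```python
def complete_cases(rows, variables):
    """Keep only rows where every listed variable is observed (not None)."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]

def add_lags(rows, variables):
    """Attach one-period lags (L.<var>) within each country; the lag is None
    when the previous calendar year is absent for that country."""
    by_key = {(r["country"], r["year"]): r for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r["country"], r["year"] - 1))
        lagged = dict(r)
        for v in variables:
            lagged["L." + v] = prev.get(v) if prev else None
        out.append(lagged)
    return out

# Hypothetical toy panel, not GFDD data.
panel = [
    {"country": "A", "year": 2004, "VTX": 30.0, "MCX": 40.0},
    {"country": "A", "year": 2005, "VTX": 32.0, "MCX": 41.0},
    {"country": "A", "year": 2006, "VTX": None, "MCX": 43.0},
    {"country": "A", "year": 2007, "VTX": 35.0, "MCX": 44.0},
    {"country": "B", "year": 2004, "VTX": 50.0, "MCX": 55.0},
    {"country": "B", "year": 2005, "VTX": 51.0, "MCX": None},
]
baseline = complete_cases(panel, ["VTX", "MCX"])
lagged = complete_cases(add_lags(panel, ["MCX"]), ["VTX", "L.MCX"])
```

In the toy panel the first year of each country is always lost once lags are required, which mirrors why the lagged specifications rest on fewer observations than the baseline.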

Table 4. Comparative Panel Regression Results for the Determinants of Trading Diversification (VTX).

Variable/Statistic	OLS (Robust)	Fixed Effects (FE)	Random Effects (RE)
DBS	0.005 (0.089)	−0.269 (0.141)	−0.209 (0.138)
REM	5.928 *** (2.006)	9.486 *** (2.374)	5.846 *** (1.970)
MCX	0.936 *** (0.065)	0.499 *** (0.090)	0.738 *** (0.066)
IPU	−0.941 *** (0.199)	−1.062 *** (0.278)	−1.170 *** (0.205)
Constant	1.529 (10.266)	47.397 *** (14.574)	32.345 ** (14.300)
Observations	266	266	266
R-squared	0.731	0.185 (within)	0.719 (overall)
F/Wald	F = 380.65	F = 13.63	Chi2 = 169.27
Countries	22	22	22
Years	2004–2021	2004–2021	2004–2021

Notes: The Hausman test strongly rejects the null hypothesis of no systematic difference between estimators (χ²(4) = 28.39; p < 0.01), indicating that the fixed effects specification is preferred over random effects. Missing data are non-negligible and unevenly distributed across variables: VTX (369 missing observations), MCX (364), DBS (74), REM (38), and IPU (72). The final estimation sample is based on complete cases only. Robust standard errors are reported in parentheses. Statistical significance is denoted as follows: ** p < 0.05, *** p < 0.01.
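
The reported Hausman result can be checked with a short calculation. For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form, so the p-value for χ²(4) = 28.39 can be computed without a statistics library. The function name is hypothetical; the test statistic and degrees of freedom are taken from the note above.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(28.39, 4)
```

The resulting p-value is on the order of 1e-5, consistent with the rejection at the 1% level stated in the note.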

Table 5. Robustness Check with Alternative Standard Errors.

Variable	FE Baseline	FE Clustered SE	FE Driscoll-Kraay SE
DBS	−0.269 * (0.141)	−0.269 (0.195)	−0.269 ** (0.0974)
REM	9.486 *** (2.374)	9.486 (5.819)	9.486 *** (2.563)
MCX	0.499 *** (0.0904)	0.499 *** (0.100)	0.499 *** (0.167)
IPU	−1.062 *** (0.278)	−1.062 (0.742)	−1.062 * (0.580)
Constant	47.40 *** (14.57)	47.40 ** (18.57)	47.40 *** (12.10)
Observations	266	266	266
Within R-squared	0.185	0.185	0.185

Notes: Standard errors are reported in parentheses. The dependent variable is VTX. The table compares the baseline fixed-effects (FE) model with the same model re-estimated using country-clustered standard errors and Driscoll–Kraay standard errors; only the standard errors change, so the coefficients are identical across columns. Clustered standard errors are computed at the country level, while Driscoll–Kraay standard errors are robust to heteroskedasticity, serial correlation, and cross-sectional dependence. Statistical significance is indicated as follows: * p < 0.10, ** p < 0.05, *** p < 0.01. MCX is positive and statistically significant at the 1% level in all three columns. With Driscoll–Kraay standard errors, DBS, REM, and MCX remain statistically significant, while IPU retains a negative coefficient and is weakly significant at the 10% level. With country-clustered standard errors, DBS, REM, and IPU are no longer significant at conventional levels. Overall, these findings support the robustness of the MCX result, while inference on DBS, REM, and IPU depends on the assumed error structure.
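
Because the three columns of Table 5 share the same coefficients, any change in inference comes entirely from the standard errors. A minimal sketch, using only the figures printed in the table, computes the ratio of each coefficient to its standard error. The significance stars themselves depend on the reference distribution and degrees of freedom applied by the estimation software, which are not reported here, so the sketch stops at the ratios.

```python
# Coefficients and standard errors copied from Table 5.
coef = {"DBS": -0.269, "REM": 9.486, "MCX": 0.499, "IPU": -1.062}
se = {
    "baseline":       {"DBS": 0.141,  "REM": 2.374, "MCX": 0.0904, "IPU": 0.278},
    "clustered":      {"DBS": 0.195,  "REM": 5.819, "MCX": 0.100,  "IPU": 0.742},
    "driscoll_kraay": {"DBS": 0.0974, "REM": 2.563, "MCX": 0.167,  "IPU": 0.580},
}

# Ratio of coefficient to standard error under each error structure.
t_ratio = {
    kind: {var: coef[var] / errors[var] for var in coef}
    for kind, errors in se.items()
}

for kind, ratios in t_ratio.items():
    print(kind, {var: round(t, 2) for var, t in ratios.items()})
```

For REM, for example, the ratio falls from about 4.0 under the baseline standard errors to about 1.6 under country clustering, even though the point estimate is unchanged.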

Table 6. Extended Control Variables and Lagged Specification for Trading Diversification Analysis.

Variable	Description	Role as Control Variable	Theoretical Motivation
GDPG	Annual GDP growth rate, capturing macroeconomic conditions and business cycle dynamics.	Controls for macroeconomic fluctuations that may affect trading activity independently of financial structure.	Economic cycles influence investment behavior and market participation, potentially affecting trading activity regardless of financial structure.
DCP	Domestic credit to the private sector (% of GDP), proxy for financial development and credit depth.	Controls for overall financial system development and credit availability.	Financial development determines access to funding and investment opportunities, shaping participation in equity markets.
SMC	Stock market capitalization (% of GDP), measuring overall size of the equity market.	Controls for market size, separating scale effects from internal market structure (MCX).	Larger markets may naturally exhibit higher trading activity; controlling for size isolates structural effects on diversification.
TOR	Stock market turnover ratio, capturing market liquidity and trading activity intensity.	Controls for market liquidity, isolating diversification effects from trading intensity.	Higher liquidity may increase trading but does not necessarily imply diversification; this control separates volume from distribution effects.
INTU	Individuals using the Internet (% of population), proxy for digital access and financial participation.	Controls for household access to financial markets and participation capacity.	Digital access facilitates retail investor participation and trading, expanding market inclusiveness and potentially affecting trading dispersion.
NLC	Number of listed companies per 1,000,000 people.	Controls for market breadth and availability of investment opportunities.	A larger number of listed firms increases diversification opportunities and reduces concentration in trading activity.
GOV	Credit to government and state-owned enterprises (% of GDP).	Controls for domestic public financing as an alternative use of financial resources.	Government borrowing may crowd out private investment and equity market activity, affecting trading structure.
L.DBS	One-period lag of deposit-taking bank assets relative to total bank and central bank assets.	Mitigates simultaneity by capturing delayed effects of bank-based financial structure.	Financial structure evolves slowly; lagging accounts for persistence and reduces reverse causality.
L.REM	One-period lag of remittance inflows (% of GDP).	Accounts for delayed impact of external private financial flows and reduces reverse causality.	Remittances influence savings and investment decisions with a time lag.
L.MCX	One-period lag of market capitalization excluding top ten firms (%).	Captures persistence in equity market structure and reduces contemporaneous bias.	Market concentration is highly persistent over time.
L.IPU	One-period lag of outstanding international public debt securities (% of GDP).	Controls for lagged effects of external public financing and financial exposure.	External debt influences financial conditions with delayed effects.
L.GDPG	One-period lag of GDP growth.	Controls for dynamic macroeconomic effects and persistence in economic conditions.	Macroeconomic trends exhibit persistence and influence financial behavior over time.
L.DCP	One-period lag of domestic credit to the private sector.	Captures persistence in financial development and credit cycles.	Credit markets evolve gradually and affect investment dynamics over time.
L.SMC	One-period lag of stock market capitalization.	Accounts for inertia in market size and structural evolution over time.	Market size changes slowly and influences trading patterns persistently.
L.TOR	One-period lag of stock market turnover ratio.	Controls for persistence in liquidity conditions and trading behavior.	Liquidity dynamics are persistent and influence future trading activity.
L.INTU	One-period lag of individuals using the Internet (% of population).	Captures persistence in digital access and delayed effects on financial participation.	Digital inclusion evolves gradually and influences market participation over time, affecting trading dispersion with a lag.
L.NLC	One-period lag of the number of listed companies per 1,000,000 people.	Controls for persistence in market breadth and availability of investment opportunities.	The number of listed firms changes slowly and shapes long-term diversification opportunities and trading structure.
L.GOV	One-period lag of credit to government and state-owned enterprises (% of GDP).	Accounts for delayed effects of domestic public financing and crowding-out dynamics.	Government borrowing affects financial resource allocation over time, potentially crowding out private investment and influencing trading concentration with a lag.

Note: The table presents contemporaneous and lagged control variables capturing macroeconomic conditions, financial development, market structure, and participation. Lagged terms mitigate endogeneity and reflect persistence, improving identification and robustness of trading diversification estimates.
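
The lag construction behind the L. variables can be illustrated as follows. This is a sketch under stated assumptions: the helper `add_lag` and the records are hypothetical, and the lag is taken from the same country in the preceding calendar year, so that a gap in the unbalanced panel yields a missing lag rather than a value borrowed from an earlier year.

```python
def add_lag(records, var):
    """Return copies of the records with a one-period lag of `var` stored as 'L.<var>'.

    The lag is the value for the same country in year - 1. If that
    country-year is absent from the panel, the lag is None.
    """
    lookup = {(r["country"], r["year"]): r[var] for r in records}
    lagged_records = []
    for r in records:
        lagged = dict(r)
        lagged[f"L.{var}"] = lookup.get((r["country"], r["year"] - 1))
        lagged_records.append(lagged)
    return lagged_records


# Hypothetical unbalanced panel: country A has no observation for 2006.
panel = [
    {"country": "A", "year": 2004, "MCX": 35.0},
    {"country": "A", "year": 2005, "MCX": 36.2},
    {"country": "A", "year": 2007, "MCX": 38.9},
    {"country": "B", "year": 2005, "MCX": 18.4},
]

with_lag = add_lag(panel, "MCX")
usable = [r for r in with_lag if r["L.MCX"] is not None]

print(len(panel), len(usable))
```

Each country loses its first observed year and any year that follows a gap, which is one reason lagged specifications are estimated on fewer observations than contemporaneous ones.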

Table 7. Fixed Effects Estimates of Trading Diversification with Controls and Dynamic Specifications.

Variable	FE Baseline	FE + Controls	FE + Controls + Lagged Variables	FE + Controls + Lagged Variables (Clustered SE)
Dependent variable	VTX	VTX	VTX	VTX
DBS	−0.269 * (0.141)	−0.0187 (0.192)
REM	9.486 *** (2.374)	9.503 *** (2.346)
MCX	0.499 *** (0.0904)	0.320 *** (0.0947)
IPU	−1.062 *** (0.278)	−0.733 ** (0.296)
GDPG		−0.176 (0.225)
DCP		−0.0508 (0.0537)
SMC		0.115 *** (0.0399)
TOR		0.0263 (0.0178)
INTU		0.0907 (0.0566)
NLC		0.463 *** (0.119)
GOV		−0.272 * (0.156)
L.DBS			−0.0146 (0.232)	−0.0146 (0.342)
L.REM			7.163 ** (3.084)	7.163 (4.958)
L.MCX			0.381 *** (0.137)	0.381 * (0.185)
L.IPU			−0.667 * (0.375)	−0.667 ** (0.292)
L.GDPG			−0.101 (0.280)	−0.101 (0.329)
L.DCP			−0.119 * (0.0657)	−0.119 (0.104)
L.SMC			0.0309 (0.0428)	0.0309 (0.0382)
L.TOR			−0.0306 (0.0202)	−0.0306 * (0.0161)
L.INTU			0.106 (0.0655)	0.106 (0.113)
L.NLC			0.474 *** (0.147)	0.474 (0.346)
L.GOV			−0.127 (0.181)	−0.127 (0.215)
Constant	47.40 *** (14.57)	10.01 (19.49)	21.02 (24.45)	21.02 (36.64)
Observations	266	244	216	216
R-squared	0.185	0.274	0.232	0.232

Note: The table reports fixed effects regressions for the baseline model, the model with extended controls, and the model with lagged variables. Clustered standard errors account for heteroskedasticity and autocorrelation. The number of observations falls from 266 to 244 and then to 216 as controls and lagged variables are added. Results highlight robustness across specifications, emphasizing dynamic effects and persistence in trading diversification determinants. Standard errors are in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.

Table 8. Baseline vs. Alternative Banking Variable (DBG).

Variable	FE (DBS)	FE (DBG)	FE (DBG, Clustered)
DBS	−0.269 * (0.141)
DBG		−0.0376 (0.0359)	−0.0376 (0.0528)
REM	9.486 *** (2.374)	7.198 *** (2.290)	7.198 (5.190)
MCX	0.499 *** (0.0904)	0.536 *** (0.0872)	0.536 *** (0.120)
IPU	−1.062 *** (0.278)	−0.944 *** (0.274)	−0.944 (0.751)
Constant	47.40 *** (14.57)	22.81 *** (5.185)	22.81 ** (9.592)
Observations	266	286	286
R-squared	0.185	0.163	0.163

Notes: All models are estimated using fixed effects. Standard errors in parentheses. Column (3) reports standard errors clustered at the country level. * p < 0.10, ** p < 0.05, *** p < 0.01.

Table 9. Comparative Clustering Validity Metrics Across Alternative Algorithms.

Indicator	Density Based	Fuzzy C-Means	Hierarchical	Model Based	k-Means	Random Forest
Maximum diameter	1.000	0.135	0.099	0.500	0.000	0.360
Minimum separation	1.000	0.000	0.192	0.051	0.036	0.049
Pearson’s γ	0.000	0.383	1.000	0.512	0.710	0.235
Dunn index	1.000	0.000	0.392	0.059	0.088	0.065
Entropy	0.000	1.000	0.616	0.984	0.911	0.984
Calinski–Harabasz	0.000	0.505	0.479	0.578	1.000	0.421

Note: The table reports multiple clustering validity indicators across methods. Hierarchical clustering provides the best overall balance, particularly in Pearson’s gamma, supporting its selection based on multi-criteria evaluation and robustness across diverse performance metrics.
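
Three of the indicators in Table 9 are directly related: the Dunn index is the ratio of the minimum separation between clusters to the maximum cluster diameter. The sketch below computes it on hypothetical two-dimensional points with Euclidean distance. It returns the raw index, whereas the table reports values rescaled across methods, so the numbers are not comparable with the table.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index: minimum between-cluster separation / maximum cluster diameter."""
    # Diameter of a cluster = largest distance between two of its members.
    max_diameter = max(
        (math.dist(p, q) for pts in clusters for p, q in combinations(pts, 2)),
        default=0.0,
    )
    # Separation of two clusters = smallest distance between their members.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are singletons; the diameter is zero")
    return min_separation / max_diameter


# Hypothetical clusters, not the paper's data.
toy = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(4.0, 0.0), (4.0, 1.0), (5.0, 0.0)],
]

print(round(dunn_index(toy), 3))
```

Higher values indicate compact clusters that lie far apart, which is why a method can score well on the Dunn index while scoring poorly on indicators that reward balanced cluster sizes.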

Table 10. Cluster Composition and Internal Validity Measures from Hierarchical Clustering.

Cluster	1	2	3	4	5	6	7	8	9	10
Size	186	28	42	24	1	6	9	5	5	1
Explained proportion of within-cluster heterogeneity	0.695	0.071	0.134	0.072	0.000	0.008	0.009	0.010	6.498 × 10⁻⁴	0.000
Within sum of squares	331.713	33.816	64.125	34.549	0.000	3.664	4.416	4.755	0.310	0.000
Silhouette score	0.271	0.407	0.310	0.339	0.000	0.646	0.607	0.530	0.905	0.000

Note: The table reports cluster sizes, within-cluster heterogeneity, dispersion, and silhouette scores. Results indicate uneven cluster structures, with a dominant cluster and several small, distinct groups, reflecting heterogeneity and well-separated observations in multivariate data.
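
The silhouette scores in Table 10 can be read with the standard definition in mind: for each observation, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette width is (b - a) / max(a, b). Singleton clusters receive a width of 0 by convention, which matches the 0.000 reported for the two clusters of size 1. The sketch below applies this definition to hypothetical one-dimensional data; the function name and the data are assumptions for illustration.

```python
import math


def silhouette_by_cluster(clusters):
    """Mean silhouette width per cluster (requires at least two clusters).

    Singleton clusters are assigned a width of 0 by convention.
    """
    scores = []
    for i, members in enumerate(clusters):
        if len(members) == 1:
            scores.append(0.0)
            continue
        widths = []
        for k, p in enumerate(members):
            # a: mean distance from p to the other members of its own cluster
            a = sum(
                math.dist(p, q) for m, q in enumerate(members) if m != k
            ) / (len(members) - 1)
            # b: smallest mean distance from p to the members of another cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            denom = max(a, b)
            widths.append(0.0 if denom == 0 else (b - a) / denom)
        scores.append(sum(widths) / len(widths))
    return scores


# Hypothetical data: two tight pairs and one isolated observation.
toy_clusters = [[(0.0,), (1.0,)], [(10.0,), (11.0,)], [(30.0,)]]
scores = silhouette_by_cluster(toy_clusters)

print([round(s, 3) for s in scores])
```

Small, tight clusters far from the rest obtain widths close to 1, while a large cluster that absorbs most observations tends to score lower, in line with the pattern in the table.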

Table 11. Cluster Centroids by Financial Development, Market Structure, and External Integration Indicators.

Cluster	VTX	DBS	REM	MCX	IPU
1	0.405	−0.360	0.085	−0.315	0.107
2	0.003	2.275	−0.479	0.249	−0.777
3	−1.328	−0.729	0.977	−0.631	0.835
4	0.490	0.545	−0.473	2.251	−0.379
5	0.128	0.957	−1.201	3.353	2.360
6	−2.357	0.145	−1.964	−0.310	−1.604
7	0.387	1.469	−2.187	0.713	−1.723
8	0.469	1.375	−2.244	3.791	−1.749
9	−4.729	−0.889	1.938	−0.655	1.751
10	0.495	3.412	2.168	0.412	−1.358

Note: The table reports standardized cluster centroids across key financial indicators. Results reveal heterogeneous financial system profiles, ranging from diversified market-based structures to externally dependent and concentrated systems, highlighting structural differences in development and integration patterns.
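
Standardized centroids are expressed in standard deviations from the sample mean, so a value such as −4.729 for VTX in cluster 9 places that cluster almost five standard deviations below the average. A minimal sketch of the computation for a single variable follows. The data and labels are hypothetical, and the use of the sample standard deviation is an assumption, since the standardization formula is not stated here.

```python
from statistics import mean, stdev


def standardized_centroids(values, labels):
    """z-score one variable over all observations, then average by cluster label."""
    mu, sigma = mean(values), stdev(values)  # sample standard deviation (assumed)
    z_scores = [(v - mu) / sigma for v in values]
    groups = {}
    for z, label in zip(z_scores, labels):
        groups.setdefault(label, []).append(z)
    return {label: mean(zs) for label, zs in groups.items()}


# Hypothetical VTX values and cluster labels.
vtx = [60.0, 55.0, 58.0, 20.0, 22.0]
labels = [1, 1, 1, 2, 2]
centroids = standardized_centroids(vtx, labels)

print({label: round(c, 3) for label, c in centroids.items()})
```

Since z-scores sum to zero over the full sample, the size-weighted average of the centroids is zero, which is why the dominant cluster sits close to the origin on most variables.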

Table 12. Normalized Performance Comparison of Machine Learning Models Across Evaluation Metrics.

Metric (Normalized)	Boosting	Decision Tree	KNN	Random Forest	Linear Reg.	Neural Net	Lasso	SVM
MSE	0.496	0.838	0.970	0.944	0.848	0.000	0.851	1.000
MSE (scaled)	0.447	0.705	0.958	1.000	0.733	0.000	0.838	0.951
RMSE	0.503	0.834	0.974	0.947	0.850	0.000	0.853	1.000
MAE/MAD	0.486	0.726	1.000	0.794	0.798	0.000	0.760	0.918
MAPE	0.000	0.995	1.000	0.000	0.000	0.928	0.969	0.000
R²	0.355	0.631	0.941	1.000	0.657	0.000	0.783	0.931

Note: The table presents normalized performance metrics across multiple machine learning models. Results highlight relative predictive accuracy and robustness, emphasizing trade-offs across error measures and goodness-of-fit indicators, consistent with comparative evaluation practices in machine learning research.
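
Every row of Table 12 spans the interval from 0 to 1, which is what min-max rescaling across models would produce. The exact normalization and its direction are not stated in the note, so the following is a sketch of min-max rescaling under that assumption, applied to hypothetical raw values with placeholder model names.

```python
def min_max(values):
    """Rescale a metric across models so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All models tie on this metric; no spread to rescale.
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


# Hypothetical raw RMSE values for three placeholder models.
raw_rmse = {"model_a": 8.0, "model_b": 10.0, "model_c": 12.0}
normalized = min_max(raw_rmse)

print(normalized)
```

Under this rescaling only the ordering and relative spacing within a row are preserved. A normalized value cannot be compared across rows without knowing whether the raw metric is an error measure, where lower is better, or a goodness-of-fit measure, where higher is better.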

Table 13. Variable Importance Measures Across Multiple Metrics in the Predictive Model.

Variable	Mean Decrease in Accuracy	Total Increase in Node Purity	Mean Dropout Loss
MCX	495.125	24,459.853	22.182
REM	142.043	11,351.301	12.111
IPU	144.458	10,534.754	12.734
DBS	55.195	6851.254	10.703

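Note: The table reports variable importance using three metrics. MCX is the dominant predictor on every metric, and DBS ranks last on every metric. REM and IPU are close and swap places: IPU is marginally ahead on mean decrease in accuracy and mean dropout loss, while REM is ahead on total increase in node purity. The agreement on the top and bottom ranks enhances the interpretability and robustness of model insights.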
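
Of the three measures, mean dropout loss is commonly obtained by permutation: one predictor is scrambled and the resulting prediction loss is recorded. The sketch below illustrates the idea under explicit assumptions. `toy_model` is a hypothetical stand-in for a fitted model, the data are invented, and a fixed rotation replaces the random shuffles (normally averaged over many repetitions) so that the result is reproducible. It is not the authors' implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def dropout_loss(model, X, y, feature):
    """RMSE after breaking the link between one feature and the target."""
    column = [row[feature] for row in X]
    rotated = column[1:] + column[:1]  # deterministic stand-in for a random shuffle
    X_perm = [dict(row, **{feature: v}) for row, v in zip(X, rotated)]
    return rmse(y, [model(row) for row in X_perm])


def toy_model(row):
    # Hypothetical fitted model: strong dependence on "mcx", weak on "dbs".
    return 40.0 + 0.5 * row["mcx"] - 0.05 * row["dbs"]


X = [{"mcx": float(m), "dbs": float(d)} for m, d in [(10, 80), (30, 95), (50, 85), (70, 90)]]
y = [toy_model(row) for row in X]  # targets generated by the model, so baseline RMSE is 0

losses = {feature: dropout_loss(toy_model, X, y, feature) for feature in ("mcx", "dbs")}
print({feature: round(loss, 3) for feature, loss in losses.items()})
```

A predictor the model relies on heavily produces a large loss when scrambled, while a weakly used predictor leaves the loss close to its baseline.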

Table 14. Decomposition of Predicted Values into Baseline and Variable Contributions.

Case	Predicted	Base	DBS	REM	MCX	IPU
1	60.295	42.517	0.681	1.081	7.497	8.518
2	19.734	42.517	−0.966	−4.830	−11.650	−5.338
3	20.671	42.517	−1.462	−5.938	−9.263	−5.183
4	66.774	42.517	1.092	5.970	17.398	−0.202
5	46.740	42.517	−0.735	1.168	7.075	−3.285

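Note: The table presents an additive decomposition of predictions: each predicted value equals the base value plus the four variable contributions, up to rounding. MCX makes the largest absolute contribution in four of the five cases (IPU is largest in case 1), the relative size of the REM and IPU contributions varies by case, and DBS contributions are small throughout. The decomposition supports model interpretability despite limitations of explanation techniques.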
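
The additive structure of Table 14 can be verified directly from the printed figures: for each case, the base value plus the four contributions should reproduce the predicted value. The sketch below performs this check and also records which variable has the largest absolute contribution in each case. All numbers are copied from the table; the variable names in the code are introduced for illustration.

```python
FEATURES = ("DBS", "REM", "MCX", "IPU")

# case: (predicted, base, contributions in the order of FEATURES), from Table 14
table_14 = {
    1: (60.295, 42.517, (0.681, 1.081, 7.497, 8.518)),
    2: (19.734, 42.517, (-0.966, -4.830, -11.650, -5.338)),
    3: (20.671, 42.517, (-1.462, -5.938, -9.263, -5.183)),
    4: (66.774, 42.517, (1.092, 5.970, 17.398, -0.202)),
    5: (46.740, 42.517, (-0.735, 1.168, 7.075, -3.285)),
}

# Gap between the reported prediction and base + sum of contributions.
residual = {
    case: predicted - (base + sum(contrib))
    for case, (predicted, base, contrib) in table_14.items()
}

# Variable with the largest absolute contribution in each case.
largest = {
    case: max(zip(FEATURES, contrib), key=lambda fc: abs(fc[1]))[0]
    for case, (_, _, contrib) in table_14.items()
}

print({case: round(r, 3) for case, r in residual.items()})
print(largest)
```

The residuals do not exceed 0.001 in absolute value, which is consistent with rounding to three decimals.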

Table 15. Integrated Evidence on the Determinants of Trading Diversification Across Empirical Approaches.

Method	Main Findings	Economic Interpretation
Panel Econometrics (OLS, FE, RE, Controls, Dynamic Models)	Across all specifications (baseline, with controls, and dynamic), MCX remains the strongest and most robust positive determinant of VTX, while REM is consistently positive and significant. IPU is systematically negative, and DBS loses significance once controls and dynamics are introduced, indicating a weaker and indirect role. Results are stable under alternative standard errors (clustered and Driscoll–Kraay).	Trading diversification is structural, not cyclical: it is primarily driven by equity market composition. External private flows (REM) expand participation, while public debt (IPU) constrains market development. Banking structure matters, but mainly through interaction with broader financial conditions rather than as a standalone driver.
Hierarchical Clustering	Clear core–periphery structure: one dominant cluster with average characteristics and several small clusters with extreme financial configurations (high MCX, high REM/IPU, or high DBS). Strong heterogeneity across countries.	Financial systems follow distinct structural regimes: market-based, bank-dominated, or externally driven. Trading diversification is not uniform but depends on the structural configuration of the financial system.
Machine Learning (Random Forest, KNN, SVM)	Random Forest delivers the most robust and consistent performance. Variable importance confirms: MCX ≫ REM ≈ IPU > DBS. Results are stable across models and evaluation metrics.	Equity market structure is unambiguously the dominant predictor of trading diversification. External financial flows play a supporting role, while banking structure is second-order and context-dependent.
Prediction Decomposition/Feature Contribution	MCX and REM explain the largest deviations in predicted VTX, both in static and dynamic settings. IPU has a consistent negative contribution, while DBS contributes marginally once controls are included.	Confirms that trading diversification is driven by market structure and participation channels, not by aggregate financial size. External flows amplify diversification, while public debt and banking dominance constrain it.

Note: The table synthesizes findings from econometric, clustering, and machine learning approaches. Results consistently highlight equity market structure as the main driver of trading diversification, supported by external flows and constrained by public debt and banking dominance.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

