1. Introduction
Over the past several decades, financial markets have undergone major structural changes driven by globalization, technological development, and the growing intertwining of the real and financial sectors. This led scholars and policymakers to pay less attention to traditional indicators of market development, such as market size and liquidity, and more to market structure, specifically the distribution of trading across firms. In this context, the structure of financial markets—depth, breadth, and concentration—became an important element of financial development, which affects their efficiency and resource allocation (
Philippon, 2019;
Stulz, 2019). Notwithstanding this change in perspective, the great majority of studies on stock market development still focus on aggregate measures such as market capitalization, value traded, and turnover rates, while paying little attention to the dispersion of trading across firms. However, stock markets tend to be highly concentrated, with a few firms accounting for most trading volume, while smaller firms have little influence on the trading process (
Kahraman & Tookes, 2017). At the same time, more inclusive financial systems, associated with greater participation and lower concentration, provide greater resilience and better welfare outcomes (
Demirgüç-Kunt et al., 2021). Despite widespread recognition that financial development is multifaceted and influenced by financial structure, market dynamics, and financial innovations, the literature still lacks a coherent framework to explain the relationships among these variables and the distribution of trading activity across firms. Therefore, the paper’s key problem is to identify the structural and financial determinants of trading diversification and how they operate. Trading diversification is measured using a new indicator (VTX) that captures the proportion of trading accounted for by firms other than the ten most traded firms. It indicates whether trading is concentrated or diversified. The main question addressed in the paper is therefore as follows: what are the key determinants of trading diversification among the dimensions of financial systems? To address this question, the paper uses an analytical framework based on four important dimensions of financial systems: the relative size of deposit-taking banks (DBS), market capitalization excluding the top ten firms (MCX), remittance inflows (REM), and international public debt stock (IPU). DBS indicates the nature of financial systems (market-based versus bank-based), MCX indicates market dynamics and concentration, REM indicates the importance of financial inclusion, and IPU indicates the dependence on international financing. The paper makes three key contributions. First, it suggests a new approach to measuring and analyzing stock market trading diversification, thereby addressing an important yet understudied aspect of financial market development. Second, it contributes to knowledge about the determinants of stock market trading diversification by providing new information about its relationship with financial structure, financial flows, and market dynamics. Third, the paper combines panel econometrics, clustering, and machine learning approaches.
In terms of policies, understanding stock market trading diversification is critically important for ensuring efficient, stable, and inclusive financial systems because high levels of trading diversification are associated with greater liquidity and informational efficiency, as well as better welfare and systemic performance.
The remainder of the article is structured as follows.
Section 2 reviews the relevant literature on financial structure and trading diversification.
Section 3 presents the methodology and data.
Section 4 reports the panel regression results and provides the economic interpretation of trading diversification.
Section 4.1 discusses robustness checks based on alternative standard errors and inference stability, while
Section 4.2 extends the specification by incorporating additional controls and dynamic effects.
Section 5 presents the results of the hierarchical clustering analysis and the segmentation of financial structures.
Section 6 evaluates machine learning performance and analyzes variable importance.
Section 7 integrates the empirical evidence to provide a comprehensive assessment of the determinants of stock market trading diversification.
Section 8 discusses the policy implications for promoting broader and more inclusive equity markets. Finally,
Section 9 concludes the paper.
2. Literature Review
Financial development and economic performance are two interrelated topics that have been widely researched in the economic literature, and considerable consensus has emerged on the importance of financial structures in contributing to economic growth, efficiency, and resource allocation. The foundational literature notes that financial development positively influences economic performance by improving capital allocation, diversifying risks, and mitigating problems associated with information asymmetry (
Levine, 1997,
2005). Moreover, properly functioning financial systems promote savings and allocate funds towards efficient investments, leading to long-run growth and development (
Levine et al., 2000;
Beck et al., 2000). Such seminal works constitute the theoretical basis for studying financial development and provide the necessary references for the current paper. Accordingly, a fundamental issue in financial systems concerns the financial architecture, which is usually considered along the bank-based versus market-based continuum. The main point is that different financial architectures can have various effects, because they differ in how they organize financial intermediation (
Allen & Gale, 2000;
Demirgüç-Kunt & Levine, 2001). Bank-based financial systems rely primarily on intermediated finance, whereas market-based systems focus on the role of financial markets in allocating resources. However, another important aspect is that there should not be a single dominant financial system, but rather proper organization and complementarity among the different elements of financial systems. Another stream of financial development research studies its effects on inequality and access to opportunities in the economy. Financial inclusion reduces income inequality and expands opportunities available for both households and firms (
Beck et al., 2007). This is an important consideration for financial market participation, as broader access to financial services increases the number of participating traders. Despite these developments in the literature, financial development has traditionally been measured by size, using aggregate measures such as credit to the private sector, market capitalization, and other indicators of liquidity. The internal structure of financial markets, including the allocation of capital within them, has not been fully addressed so far. Indeed,
Wurgler (
2000) demonstrates that financial development improves capital allocation across firms and industries, suggesting that the internal structure of financial markets matters for their performance. Market microstructure theory likewise points to the significance of capital allocation within markets: it shows that trading is influenced by factors such as liquidity, transaction costs, and informational imperfections. The bid-ask spread reflects market liquidity and affects trading volume (
Amihud & Mendelson, 1986), whereas informational imperfections make it impossible for markets to achieve efficiency (
Grossman & Stiglitz, 1980). As a result, there would be an unequal distribution of trading activity across assets, with trading concentrated around several large, liquid assets. There is some evidence on this matter, as large firms affect aggregate dynamics in the economy, according to
Gabaix (
2011). Therefore, trading activity in equity markets might be concentrated among a few assets and firms. In addition, the firm-level financial structure is an important determinant of economic performance. Financial dependence positively affects growth in countries with highly developed financial systems (
Rajan & Zingales, 1996), suggesting that financial structure affects both aggregate economic performance and its distribution. Moreover, financial liberalization and international financial integration can increase access to financing and broaden market participation (
Bekaert et al., 2005), although results vary across countries. Moreover, the behavior of individual investors further accentuates the uneven distribution of trading activity.
Barber and Odean (
2000) show that trading activity is not optimally spread across many assets but rather restricted to a very limited number. Taken together, these findings imply that financial development needs to be studied not only with regard to the size of markets but also with regard to their internal structure in terms of trading diversification. In this respect, the composition and distribution of trading among firms in equity markets constitute an important, yet under-researched topic. Recent research extends this framework by analyzing the role of different aspects of financial development (financial market structure, openness, and institutions) and their impact on market performance. However, this research still focuses on aggregate financial markets or specific markets, without addressing the distribution of trading activity. The resulting research gap is the absence of a theoretical framework linking financial system structure, market composition, and external financial flows to trading diversification. This study addresses that gap by building on the well-established literature on financial development rather than relying solely on recent findings. First, it introduces trading diversification as a measure of the market’s internal structure, a distinct dimension of financial development. Second, it provides a more granular perspective on financial development by examining market composition and the distribution of trading activity across firms. Finally, it draws on financial-structure theory, market microstructure theory, and the capital allocation literature. See
Table 1.
3. Methodology and Data
In order to achieve the stated research objective, the present study relies on the Global Financial Development Database (GFDD) produced by the World Bank, one of the most common sources in empirical research due to its broad coverage in terms of various financial structure, depth, efficiency, and access indicators (
Patalano & Roulet, 2020). The selected database provides data on 38 countries for the period 2004–2021, offering up to 684 country-year observations with complete coverage (38 countries × 18 annual observations per country). However, there are significant variations in data availability, resulting in a large number of missing values and an unbalanced panel dataset. Although all countries are observed over the same reference period, 2004–2021, the effective sample size is considerably reduced due to missing data for key variables. Econometric estimations are performed on complete cases (listwise deletion), so that only observations with values for all variables are used. Missing data are not imputed or interpolated. As a result, baseline models (OLS, Fixed Effects, and Random Effects) include 266 country-year observations with full data. With the introduction of additional control variables, the sample is reduced to 244 observations due to missing values in these variables. Dynamic specifications that include lagged variables are estimated on 216 country-year observations, which reflects both data availability problems and the loss of initial periods necessary for lag construction. The complete-case approach therefore comes at the expense of a considerable reduction in the sample size (from the theoretical maximum of 684 observations to as few as 216). The dependent variable VTX is calculated as the value traded by all firms, excluding the top 10 most traded firms, relative to the total value traded by all firms (
Yakubu et al., 2023). This measure captures the degree of trading diversification by gauging how much trading activity occurs outside the largest firms. The empirical model comprises four independent variables representing various dimensions of financial structure and external financial integration. Variable selection is based on theoretical considerations within the paper’s framework, which analyzes the relationship between financial structure, market composition, and financial flows on the one hand, and trading diversification on the other. These variables have been chosen because they correspond to specific, non-overlapping aspects of the financial system that are theoretically linked to the process under examination. The empirical specification aims to represent four mutually complementary mechanisms. First, DBS captures the financial intermediation structure, distinguishing between bank-based and market-based financial systems. Second, MCX measures the internal composition of equity markets. Third, REM captures the role of external private financial flows, represented by remittances, in influencing liquidity and participation in the financial market. Fourth, IPU captures the impact of external public finance, reflected in dependence on international borrowing and a possible crowding-out effect. Additional variables control for macroeconomic conditions (GDPG), financial development (DCP), market size and liquidity (SMC, TOR), as well as alternative channels of participation and resource allocation. Specifically, Internet usage (INTU) controls for the impact of improved access on participation in financial markets; the number of listed companies (NLC) reflects market breadth and the available investment opportunities; and government credit (GOV) reflects public financing and possible domestic crowding-out effects.
All of these variables are important because they aim to isolate the theoretical channels, avoiding redundancy and multicollinearity in the models rather than maximizing fit through exhaustive specification searching. The empirical results confirm the rationality of this approach, as the main associations persist across all specifications and robustness checks. Therefore, the empirical specification can be viewed as a direct application of the theoretical model in practice, ensuring its consistency.
The variables used in the paper are taken from standard GFDD indicators and are renamed for ease of reference. Definitions of all variables follow the GFDD documentation, and the indicators are used as reported in GFDD (as ratios relative to GDP or to totals), without further transformation. Specifically, VTX is value traded excluding the top ten traded companies relative to total value traded; DBS is deposit-taking bank assets relative to the combined assets of deposit-taking banks and central banks; REM is personal remittances received relative to GDP; MCX is market capitalization excluding the top ten companies relative to total market capitalization; and IPU is outstanding international public debt securities relative to GDP. See
Table 2.
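To make the definition of VTX (and, analogously, MCX) concrete, the following is a minimal sketch of how a share excluding the top ten firms could be computed from firm-level figures. The paper takes both indicators directly from GFDD, so the function name `share_excluding_top` and the firm-level numbers are purely illustrative assumptions.

```python
def share_excluding_top(values, top_n=10):
    """Share of the total accounted for by all firms except the top_n largest."""
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("the total must be positive")
    # Sum of the top_n largest entries (all entries if fewer than top_n firms).
    top = sum(sorted(values, reverse=True)[:top_n])
    return (total - top) / total


# Hypothetical market: 10 large firms trading 8 units each and
# 40 small firms trading 0.5 units each (total = 100 units).
value_traded = [8.0] * 10 + [0.5] * 40
vtx = share_excluding_top(value_traded)

# A fully concentrated market: only ten firms trade at all.
vtx_concentrated = share_excluding_top([5.0] * 10)
```

In this toy market the firms outside the top ten account for 20 of the 100 units traded, so the share is 0.2; multiplying by 100 gives the percentage form in which such indicators are usually reported. Applying the same function to market capitalizations would give an MCX-type measure.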
In terms of variable selection, the approach remains theoretically guided and parsimonious, including only a few independent variables that represent distinct features of the financial system to avoid overlapping and multicollinearity. In place of selecting a large number of potentially overlapping indicators, the analysis focuses on identifying important structural mechanisms, while the other dimensions, such as financial depth, institutional quality, and macroeconomic fundamentals, are included as covariates. Most importantly, the choice of variables is neither random nor intuitive, but is based on strong theoretical justification for the link between financial structure and external flows to participation in trading. Bank-based financial structures, represented by DBS, are likely to crowd out equity market development and, thus, lower the trading diversification. Increased household income from remittances (REM) is expected to boost investors’ participation in markets, leading to a more even distribution of trading across participants (
Bettin et al., 2017;
Imran et al., 2019). MCX, the share of market capitalization held by firms outside the top ten, is an inverse measure of equity market concentration and a direct measure of market breadth; it should therefore be considered a relevant determinant of trading dispersion. Lastly, reliance on external public debt (IPU) is used as a proxy for external financing, which may hinder domestic financial development.
The initial dataset includes 38 OECD countries. However, due to data availability constraints and the adoption of a listwise deletion approach, the final estimation sample is reduced to 22 countries. The countries included in the econometric analysis are: Australia, Austria, Canada, Chile, Colombia, Germany, Greece, Hungary, Ireland, Israel, Italy, Japan, Korea (Rep.), Luxembourg, New Zealand, Poland, Slovenia, Spain, Switzerland, Turkey, United Kingdom, and the United States. See
Figure 1.
Table 3 summarizes the full sample structure, including country coverage, time span, and missing data patterns.
The dataset covers all 38 countries over the period from 2004 to 2021, which would yield a balanced panel with 18 annual observations per country if all data were available. However, as discussed above, the actual panel is unbalanced due to unequal data availability, particularly for the two important market-structure variables VTX and MCX (
Baltagi, 2021). The first pattern is that missing values are common across countries for VTX and MCX. For example, these variables are not available for several advanced European economies, such as Belgium, France, Finland, and the Netherlands. Likewise, for smaller economies such as Latvia, Lithuania, and Estonia, no data on these variables appear to be available. Measurements of trading diversification are thus missing for a number of countries, including some with less developed financial markets. The second pattern is the uneven distribution of missing values over time. In many countries, including Australia, Austria, Italy, and Slovenia, gaps become more visible after 2014. This may indicate changes in accounting practices or other problems in data collection (
Hughes, 2025). By contrast, countries such as Chile, Colombia, and Germany have fewer missing values over time. Among the other variables, DBS is missing in the later years for Canada and Mexico, while IPU is incomplete for Nordic countries, including Norway and Denmark. Moreover, Luxembourg and Switzerland have missing data across different variables, further reducing the number of valid observations. Overall, the table shows considerable cross-country variation in data availability. As a result, an unbalanced panel is unavoidable, and listwise deletion is a reasonable way to handle the missing observations (
Espinas et al., 2026).
This approach ensures internal consistency and comparability of the empirical analysis while explicitly accounting for the unbalanced nature of the panel.
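The complete-case (listwise deletion) rule described in this section can be sketched as follows. The rows, country labels, and values are hypothetical placeholders rather than GFDD data; only the variable names and the 38 × 18 panel dimensions come from the text.

```python
from collections import Counter

REQUIRED = ("VTX", "DBS", "REM", "MCX", "IPU")


def complete_cases(rows, required=REQUIRED):
    """Keep only country-year rows with a value for every required variable."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def make_row(country, year, **overrides):
    """Build a hypothetical country-year record; overrides can blank out values."""
    row = {"country": country, "year": year,
           "VTX": 35.0, "DBS": 97.0, "REM": 0.4, "MCX": 40.0, "IPU": 12.0}
    row.update(overrides)
    return row


# Theoretical maximum: 38 countries observed annually over 2004-2021.
n_countries, n_years = 38, 2021 - 2004 + 1
max_obs = n_countries * n_years

rows = [
    make_row("A", 2004),
    make_row("A", 2005, VTX=None),   # dropped: VTX missing
    make_row("B", 2004),
    make_row("B", 2005, MCX=None),   # dropped: MCX missing
    make_row("B", 2006),
]
kept = complete_cases(rows)
obs_per_country = Counter(r["country"] for r in kept)
```

Here `max_obs` reproduces the 684 country-year ceiling, and `obs_per_country` shows how dropping incomplete rows leaves different numbers of observations per country, which is what makes the estimation panel unbalanced.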
4. Panel Regression Results and Economic Interpretation of Trading Diversification
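Specifically, we estimate the following equation:
$$VTX_{it} = \alpha_i + \beta_1 DBS_{it} + \beta_2 REM_{it} + \beta_3 MCX_{it} + \beta_4 IPU_{it} + \varepsilon_{it},$$
where $i = 1, \dots, N$ indexes countries, $t = 1, \dots, T_i$ indexes years, $\alpha_i$ is the country-specific effect (a common intercept in the pooled OLS case), and $\varepsilon_{it}$ is the error term; the number of periods $T_i$ varies across countries, reflecting the unbalanced panel structure.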
This analysis does not aim to establish strict causal relationships, but rather to identify robust empirical associations consistent with theoretical mechanisms. The use of fixed effects controls for time-invariant unobserved heterogeneity across countries, while the inclusion of lagged independent variables helps mitigate potential simultaneity concerns. Although these approaches do not fully eliminate endogeneity, they improve the credibility of the estimated relationships.
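As an illustration of how the fixed-effects estimator removes time-invariant country heterogeneity, the sketch below applies the within transformation (demeaning by country) followed by OLS on a synthetic unbalanced panel. The data, the coefficient values in `beta_true`, and the function name `within_estimator` are assumptions made for illustration; they are not the paper's data or estimates.

```python
import numpy as np


def within_estimator(y, X, groups):
    """Fixed-effects (within) slopes: demean y and X by group, then run OLS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    resid = y_dm - X_dm @ beta
    r2_within = 1.0 - (resid @ resid) / (y_dm @ y_dm)
    return beta, r2_within


# Synthetic unbalanced panel: 6 countries with 8 to 13 yearly observations.
rng = np.random.default_rng(0)
periods = [8, 9, 10, 11, 12, 13]
country = np.repeat(np.arange(len(periods)), periods)
alpha = 5.0 * rng.normal(size=len(periods))         # country effects
# Regressors are shifted by the country effect, so pooled OLS would be biased.
X = rng.normal(size=(country.size, 4)) + alpha[country][:, None]
beta_true = np.array([-0.5, 2.0, 0.5, -1.0])        # arbitrary illustrative slopes
y = alpha[country] + X @ beta_true + 0.01 * rng.normal(size=country.size)

beta_hat, r2_within = within_estimator(y, X, country)
```

Because the country effects drop out after demeaning, `beta_hat` recovers the slopes even though the regressors are correlated with `alpha`. The within R-squared returned here measures only the share of within-country variation explained, which is why it is not directly comparable to a pooled OLS R-squared.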
The results reported in
Table 4 provide a comprehensive comparison across three econometric specifications—OLS, Fixed Effects (FE), and Random Effects (RE)—to identify the determinants of stock market trading diversification (VTX). This comparison is not merely methodological but carries important economic implications, as it allows us to distinguish between spurious correlations driven by unobserved heterogeneity and structurally robust relationships. See
Table 4.
One of the initial observations concerns the clear distinction between ordinary least squares (OLS) and panel data specifications. Specifically, the latter show a substantially lower R-squared (0.199 compared to 0.731 in OLS). However, this comparison should be interpreted with caution, as OLS estimation does not account for unobserved country-specific heterogeneity (
Cameron & Trivedi, 2010;
Kahane, 2024). In particular, this can be problematic for the current research question, as structural characteristics (institutional quality, financial development, regulations, etc.) affect both the dependent variable and the regressors. One example concerns the DBS variable, which appears statistically insignificant and close to zero in OLS, while being negative and economically meaningful in all panel estimates. The choice between the fixed-effects (FE) and random-effects (RE) estimators is therefore central to the analysis. Specifically, the Hausman test results (χ²(4) = 28.39;
p < 0.01) support the FE estimation: the test rejects the null hypothesis of no systematic difference between the two estimators. This indicates that the unobserved country effects are correlated with the regressors, in which case the RE estimator is inconsistent, making the FE model preferable for further economic analysis. Turning to the DBS variable, which represents the relative size of deposit-taking banks, its effect is economically and statistically meaningful. The FE estimate shows a negative coefficient (−0.269) that is significant at the 10% level. This result is in line with the theoretical background of the current paper: financial systems based on bank intermediation tend to have lower levels of trading diversification. This is due to lower stock market development and a higher share of bank financing used by companies, as well as the corresponding concentration of trading in a small number of firms (
Demirgüç-Kunt et al., 2013). The REM variable, which represents remittance inflows, is another significant factor affecting the degree of trading diversification. In particular, the FE estimate is positive and highly significant (9.486;
p < 0.01). This result underscores the importance of external private financial flows in increasing the share of trading firms in total trading volumes. More specifically, remittances allow households to increase their income and savings and to participate more actively in financial transactions (
Combes et al., 2014). The third important variable in the current analysis is MCX. In particular, it shows high statistical significance and large coefficient values across all estimated models. For example, in the FE model, it equals 0.499 and is highly significant (
p < 0.01). This result clearly confirms the hypothesis that the internal equity market structure is significant for trading diversification. Indeed, this result aligns with the relevant literature on market structure (
Bekaert et al., 2013) and with the machine learning results reported in the paper. The high share of market capitalization held by firms other than the top ten companies leads to a more diversified trading process. The last important explanatory variable in the analysis is the share of international public debt (IPU). The FE estimate of the variable is negative and statistically significant (−1.062;
p < 0.01). This variable represents a negative relationship between external financing and stock market structure. The economic implications of this estimate concern the lower development of equity markets and the weaker ability of countries relying on external finance to attract private investment (
Broner et al., 2014;
Eberhardt & Presbitero, 2015;
Kose et al., 2021). It should be noted that the signs of the estimates are similar for FE and RE models. However, there are notable differences in the values of these estimates for the REM and MCX variables. They correspond to the Hausman test results and demonstrate the impact of neglecting the correlation between regressors and unobserved country-specific factors on the reliability of estimates (
Cameron & Trivedi, 2010). As for the model fit, the within R-squared is 0.185, which is significantly lower than the OLS indicator (0.731). However, this result corresponds to the common property of panel data models with fixed effects when much of the variation comes from cross-sections (
Kahane, 2024). Thus, approximately 18.5% of the within-country variation in trading diversification is explained by the regressors included in the current model. While this share is modest, it is a reasonable result considering the complexity of the topic. Another point to mention is the presence of missing data, which occurs frequently for VTX and MCX. As a result, the estimation is based on 266 observations using the complete-case method. Although this can lead to sample selection bias, the main relationships remain stable across specifications. Overall, the estimates presented in the paper support the main hypotheses regarding the driving factors of trading diversification in stock markets. These include the internal equity market structure represented by the MCX variable, external private financial flows measured by REM, and two factors that discourage diversification, namely DBS and IPU. In this regard, the FE estimation appears to be the most suitable one.
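The Hausman statistic reported above (28.39 with 4 degrees of freedom) can be checked against the chi-square distribution with a few lines of code. The sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom; the function name `chi2_sf_even_df` is an illustrative choice, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("the closed form requires a positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# Hausman statistic and degrees of freedom as reported in the text.
p_value = chi2_sf_even_df(28.39, 4)
reject_re_at_1pct = p_value < 0.01
```

The resulting p-value is on the order of 1e-5, well below the 1% threshold, which is consistent with the reported rejection of the random-effects specification.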
4.1. Robustness Checks: Alternative Standard Errors and Inference Stability
The robustness checks reported in
Table 5 play a crucial role in validating the empirical findings by addressing potential violations of standard panel data assumptions, namely heteroskedasticity, serial correlation, and cross-sectional dependence.
However, while the FE baseline model provides consistent estimates, inference relies heavily on whether the standard errors were correctly calculated (
Baltagi, 2021). For this reason,
Table 5 evaluates the baseline FE model along with two additional variations: FE with country-clustered standard errors and FE with Driscoll-Kraay standard errors, which are robust to heteroskedasticity, serial correlation, and cross-sectional dependence (
Driscoll & Kraay, 1998;
Hoechle, 2007). First of all, it should be noted that all estimated coefficients are stable across the considered models, with no changes in size or sign. This is expected, because the three variants differ only in how the standard errors are computed; the point estimates are identical by construction. For instance, the estimated DBS coefficient is always negative, consistent with the inverse relationship between bank dominance and trading diversification. According to the baseline FE model, the variable is statistically significant at the 10% level. With country-clustered standard errors it is no longer significant, because the clustered standard errors are larger. With Driscoll-Kraay standard errors it is significant at the 5% level, which suggests that the loss of significance under clustering reflects the more conservative standard error estimates rather than a weaker relationship. Overall, the results are consistent with the view that more bank-oriented financial systems have more concentrated trading structures. The coefficient of REM likewise does not change its sign and remains positively related to diversification regardless of the standard errors used. Specifically, the coefficient is highly significant according to the baseline and Driscoll-Kraay models but is non-significant under country clustering. The choice of standard errors thus affects the statistical significance of certain coefficients, but not their magnitudes. The size of the REM coefficient suggests that the contribution of remittances to trading diversification is economically relevant. Most importantly, the MCX coefficient remains positive and economically significant across all specifications. It seems to have the largest influence on trading diversification and remains statistically significant in all models.
Therefore, this variable appears to be the main driving force behind diversification. Specifically, the greater weight of firms outside the top ten shapes the trading structure of stock exchanges. This finding again indicates that the internal composition of stock markets affects the distribution of trading. The coefficient of IPU is always negative, regardless of the standard errors applied. It is insignificant under clustering but weakly significant under the Driscoll-Kraay approach. This sensitivity suggests that cross-sectional and temporal dependence are especially important in determining the statistical significance of this coefficient. However, regardless of the chosen specification, the variable’s effect remains negative. One further point is that the R-squared is identical across all three specifications. This is expected because only the standard error calculation changes, not the estimator. The choice of standard errors therefore does not affect the model’s explanatory power and matters only for significance and confidence intervals. Driscoll-Kraay standard errors account for heteroskedasticity, cross-sectional dependence, and serial correlation (
De Hoyos & Sarafidis, 2006;
Driscoll & Kraay, 1998). As shown above, the main results of the study do not change materially. MCX remains significant in all specifications, whereas DBS, REM, and IPU lose significance under country clustering but are significant under Driscoll-Kraay standard errors. In summary, the robustness checks show that the main results are not driven by violations of classical assumptions about the error structure. Most coefficients retain their statistical significance under alternative standard errors, and the signs of those that become statistically insignificant remain the same, indicating that the general interpretation of the results holds (
Thompson, 2011).
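The point that alternative standard errors change inference but not the point estimates can be illustrated with a small sketch. The code below computes conventional and cluster-robust (by country) standard errors for the same OLS fit on synthetic data. It is a simplified assumption-laden illustration: it omits small-sample corrections, does not implement the Driscoll-Kraay estimator, and the data and function name `ols_standard_errors` are hypothetical.

```python
import numpy as np


def ols_standard_errors(X, y, clusters):
    """OLS coefficients with conventional and cluster-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta

    # Conventional covariance: homoskedastic, independent errors.
    sigma2 = (resid @ resid) / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * xtx_inv))

    # Cluster-robust sandwich: sum the scores within each cluster first,
    # which allows arbitrary error correlation inside a cluster.
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    se_clustered = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_conventional, se_clustered


# Synthetic panel (hypothetical numbers): 20 countries, 12 years, with a
# country-level component in both the regressor and the error term.
rng = np.random.default_rng(1)
n_countries, n_years = 20, 12
country = np.repeat(np.arange(n_countries), n_years)
x = rng.normal(size=country.size) + rng.normal(size=n_countries)[country]
error = rng.normal(size=country.size) + rng.normal(size=n_countries)[country]
X = np.column_stack([np.ones(country.size), x])
y = 1.0 + 0.5 * x + error

beta, se_conventional, se_clustered = ols_standard_errors(X, y, country)
t_conventional = beta / se_conventional
t_clustered = beta / se_clustered
```

Both sets of t-statistics are built from the same `beta`, so any difference in significance comes entirely from the standard errors, which mirrors the pattern described for the baseline, clustered, and Driscoll-Kraay variants.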
4.2. Extended Specification with Controls and Dynamic Effects
The inclusion of control variables helps ensure that the estimates of interest are not biased by the omission of other relevant factors. As trading activity in financial markets depends not only on financial structure but also on macroeconomic conditions, the level of financial development, market size and liquidity, and accessibility and participation by agents, the empirical analysis includes control variables that can separate the effect of the primary independent variables from other factors related to trading (
Levine, 2005;
Beck et al., 2000). Firstly, macroeconomic conditions enter the specification through the GDP growth rate (GDPG), which serves as a proxy for business cycle dynamics. In theory, fluctuations in macroeconomic conditions can affect investor behavior and, consequently, the extent of trading activity, regardless of the financial system’s structure (
Levine, 2005). Secondly, the level of financial development is captured by domestic credit to the private sector (DCP). This proxy accounts for differences in the efficiency of financial systems and, accordingly, in investors’ participation in financial activities. Furthermore, stock market features are captured by stock market capitalization (SMC) and the stock market turnover ratio (TOR). These variables are needed to distinguish the volume and intensity of trading from its diversification, which is the focus of the analysis: a large, liquid market may generate high trading volumes without being diversified (
Levine, 2005). Alternative sources of financial resources are considered in order to control for households’ participation in the financial market and to account for domestic public financing channels. Remittances (REM) can be used as a measure of household income and liquidity, as they are related to consumption, saving, and investment. Moreover, internet usage (INTU) serves as an indicator of participation opportunities resulting from increased access to financial services (
Ozili, 2021;
Demirgüç-Kunt et al., 2021). A further control is the number of listed companies (NLC). It represents the size of the opportunity set in stock markets; a larger set is expected to reduce market concentration and foster trading diversification. Finally, credit to government and state-owned enterprises (GOV) is included as a proxy for domestic public debt and its effect on trading (
Sahay et al., 2015;
Kose et al., 2021). Hence, this set of control variables accounts for other channels through which diversification can arise, including private and public financial flows and household financial inclusion in the stock market. Notably, the addition of these variables does not affect the main conclusions drawn from the analysis; rather, it improves their identification. To reduce potential simultaneity or reverse causality, lagged versions of the main regressors and control variables are introduced. This is necessary because financial and macroeconomic variables tend to be highly persistent and may be jointly determined with trading activity. Using lagged values of the main regressors (L.DBS, L.REM, L.MCX, L.IPU) helps to avoid problems associated with contemporaneous correlation. In turn, lagged control variables (L.GDPG, L.DCP, L.SMC, L.TOR, L.INTU, L.NLC, L.GOV) control for the dynamics of macroeconomic conditions, financial development, stock market features, and participation. More generally, the combined use of lagged and contemporaneous variables can be justified by the persistence of financial processes (
Sahay et al., 2015). See
Table 6.
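To make the lag construction concrete, the following sketch builds a within-country one-year lag such as L.DBS from a long-format panel. The helper `add_lag`, the column names, and the numbers are hypothetical; the key point is that a lag is taken only from the same country in the previous year, so the first observation of each country (and any year following a gap) has no lagged value and drops out of the dynamic specification.

```python
def add_lag(rows, var, unit="country", time="year"):
    """Return copies of rows with 'L.<var>' set to last year's value in the same unit."""
    lookup = {(r[unit], r[time]): r[var] for r in rows}
    out = []
    for r in rows:
        lagged = dict(r)
        # None when the same unit has no observation in the previous year.
        lagged["L." + var] = lookup.get((r[unit], r[time] - 1))
        out.append(lagged)
    return out

# Hypothetical unbalanced panel.
panel = [
    {"country": "A", "year": 2018, "DBS": 90.0},
    {"country": "A", "year": 2019, "DBS": 91.5},
    {"country": "A", "year": 2020, "DBS": 92.0},
    {"country": "B", "year": 2019, "DBS": 70.0},
    {"country": "B", "year": 2020, "DBS": 71.0},
]
lagged = add_lag(panel, "DBS")

# Only rows with an available lag can enter the lagged regression.
usable = [r for r in lagged if r["L.DBS"] is not None]
```

In this toy panel, five observations shrink to three usable ones, which is why lagged specifications typically report fewer observations than contemporaneous ones.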
The results reported in
Table 7 provide a rich and structured assessment of the determinants of trading diversification (VTX), allowing for a direct comparison between the baseline specification, the inclusion of additional controls, and dynamic specifications with lagged variables. Overall, the findings strongly reinforce the theoretical framework of the article, which emphasizes the role of financial structure, external financial flows, and market composition in shaping the distribution of trading activity. See
Table 7.
In the baseline fixed-effects specification, the estimates confirm the basic theoretical claims. The coefficient of DBS is negative and statistically significant. In substantive terms, a larger relative size of deposit-taking banks in a country’s financial sector is associated with lower trading diversification. This is in line with theoretical considerations suggesting that a financial system dominated by banks hinders the development of equity markets, thereby limiting the number of participants (
Didier et al., 2021;
Aghion et al., 2005). Meanwhile, REM has a significantly positive coefficient, suggesting that remittances increase household liquidity and promote financial participation. Remittances have been found to facilitate broader inclusivity and encourage engagement in trading activities (
Barajas et al., 2020). MCX is an important variable with a substantial coefficient, suggesting that trading diversification primarily depends on market features. The coefficient for IPU is negative and significant, confirming that external public debt can lead to increased market concentration through crowding-out effects (
Bonizzi, 2013;
Raddatz et al., 2017).
Several conclusions may be drawn from the regression with control variables included. First, the effect of DBS becomes smaller and insignificant, implying that part of the effect previously observed reflects the country’s broader macro-financial development. However, the coefficient retains its expected negative sign. The coefficients for REM and MCX remain large and highly significant, confirming their relevance for interpreting the phenomenon. The contribution of remittances to trading diversification therefore does not appear to be driven by the factors captured by the added controls (
Didier et al., 2021).
The inclusion of controls yields the following theoretical implications. First, GDPG does not contribute significantly to VTX, implying that macroeconomic growth plays a minor role in promoting trading diversification. This supports the idea that VTX measures a structural dimension of trading activity. Second, DCP does not affect VTX significantly either, suggesting that financial development alone is not sufficient; it is the structure and allocation of finance that matter (
Aghion et al., 2005).
Further, two crucial variables have been identified in the extended theoretical model. First, the SMC coefficient is significantly positive, confirming the contribution of market size to trading diversification (
Didier et al., 2021). TOR also contributes to VTX, though without statistical significance; this implies that liquidity may support the diversification process but is not its main driver. The additional control variables add substance to the theoretical assumptions. Internet usage (INTU) has the expected sign but is statistically insignificant, possibly because of cross-country heterogeneity in how access translates into financial participation (
Saraf & Kayal, 2022;
Barajas et al., 2020). NLC is highly significant, with a large coefficient, suggesting that a larger number of listed companies facilitates trading diversification. GOV has a significantly negative coefficient, consistent with the crowding-out theory (
Bonizzi, 2013).
When dynamic models with a lag structure are estimated, REM and MCX remain positive and significant, indicating a persistent contribution of these factors. The results align with theoretical expectations regarding the persistence of remittance and market structure impacts on financial participation (
Barajas et al., 2020). IPU continues to show a negative but insignificant coefficient, which might mean that the contribution of international public debt operates on a delayed basis. DBS is likewise insignificant in the dynamic models. The comparison of standard errors across specifications shows stability; only GDPG fails to demonstrate statistical significance under standard clustering.
To summarize, the findings support the basic hypothesis that trading diversification is shaped by the structure of the stock market and financial flows. The effect of DBS is more complex and sensitive to specification, which warrants closer consideration of how the banking sector is measured. Also, the addition of new control variables helps highlight other dimensions of the discussed phenomenon (
Idroes et al., 2024;
Beck et al., 2015).
4.3. Alternative Banking Measure: Structure vs. Size Effects
To check the robustness of the baseline results and to keep the scaling of all explanatory variables consistent, we estimate an alternative model that differs in the banking system indicator. In the baseline model, the variable DBS reflects the importance of deposit-taking banks within the total banking sector (
Beck et al., 2009;
Čihák et al., 2012). At the same time, unlike the other regressors, DBS is not expressed as a share of GDP, which can lead to inconsistencies in the scaling of regressors. To address this concern, an alternative variable, DBG, is introduced, defined as the ratio of deposit money banks’ assets to GDP. This step serves two complementary functions. First, it makes the scale of the financial sector indicator consistent with that of the other regressors, in line with their macro-financial orientation (
Čihák et al., 2012). Second, it allows for a robustness check and for assessing whether the results obtained depend on the specific financial-sector indicator used (
Beck et al., 2009). In this context, it is possible to separate the effects associated with the structure of the financial sector from those associated with its size in the economy. The results of the aforementioned procedure are presented in
Table 8, where, in addition to the baseline fixed-effects specification, results from estimations using the alternative measure of the banking sector and clustering are shown. The number of observations is larger in the model using DBG (286) than in the baseline model (266) because of greater data availability for this variable, which yields a larger complete-case sample.
In the baseline specification, DBS captures the internal structure of the financial system, measured as the share of deposit-taking banks in the assets of all banks. However, most of the other explanatory variables are expressed as shares of GDP. To test whether this difference in scaling affects the results, we consider an alternative measure of the banking sector, DBG, defined as the ratio of deposit money banks’ assets to GDP (
Langfield & Pagano, 2016). There is a clear theoretical reason for distinguishing the two measures. While DBS characterizes the internal structure of the financial system, that is, the share of deposit-taking banks in total bank assets, DBG characterizes the scale of the banking sector relative to the economy. Hence, one measure captures how financial intermediation takes place, while the other captures how large the banking sector is relative to the economy. Both dimensions have implications for financial development, and it is therefore worth checking whether each plays a role (
Brei et al., 2023;
Cournède et al., 2015). We compare the two measures to see which of them is more closely related to trading diversification. The estimates show that the coefficient on DBG is insignificant regardless of whether standard errors are clustered. The coefficient remains negative and implies the same interpretation, namely that a bigger banking sector is associated with less diversified trading, but its effect is no longer statistically significant. Thus, although the sign of the result holds, its magnitude and significance depend on which measure of the banking sector is chosen. The conceptual differences between the measures discussed above support an economically sensible conclusion: the internal structure of the financial system plays a role, whereas its scale does not (
Kharroubi, 2015). In other words, it is not the sheer size of the banking sector that crowds out the equity market. What matters is that financial intermediation occurs through the banking system rather than through other channels: DBS captures this channel, whereas DBG captures only scale. Our baseline regression therefore uses DBS as the more economically meaningful measure. At the same time, the remaining variables are highly stable across specifications, which reinforces the robustness of the results. First, the coefficient on REM is again positive, implying that remittances facilitate financial participation. Second, MCX remains significant with a similar magnitude. Finally, the coefficient of IPU remains negative and insignificant. All in all, this robustness check confirms that the results for the other regressors do not depend on the particular measure of the banking sector used. It also gives additional support to the idea that structural characteristics of the financial system matter for trading diversification (
Brei et al., 2023). Although DBG ensures consistency with the other regressors in terms of scaling, we use DBS in the baseline regression because it captures the internal structure of the financial system rather than its magnitude. The robustness analysis shows that the main results are not driven by this choice and suggests that internal structure is the economically relevant characteristic (
Cournède et al., 2015).
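The distinction between structure and size can be illustrated with a short worked example. The sketch assumes that DBS is computed as deposit money bank assets divided by the sum of deposit money bank and central bank assets, which is one reading of the description in the text, and that DBG is deposit money bank assets divided by GDP. The two economies and all figures are hypothetical.

```python
def dbs(dmb_assets, cb_assets):
    """Structure: deposit money bank assets as % of deposit money plus central bank assets."""
    return 100.0 * dmb_assets / (dmb_assets + cb_assets)

def dbg(dmb_assets, gdp):
    """Size: deposit money bank assets as % of GDP."""
    return 100.0 * dmb_assets / gdp

# Two hypothetical economies with the same GDP but banking sectors of different size.
small = {"dmb": 45.0, "cb": 5.0, "gdp": 150.0}
large = {"dmb": 180.0, "cb": 20.0, "gdp": 150.0}

structure = {name: dbs(e["dmb"], e["cb"])
             for name, e in (("small", small), ("large", large))}
size = {name: dbg(e["dmb"], e["gdp"])
        for name, e in (("small", small), ("large", large))}
```

Both economies have the same DBS (90%) but very different DBG values (30% versus 120% of GDP), so the two indicators can move independently and need not yield the same regression coefficient.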
5. Hierarchical Clustering Results and Financial Structure Segmentation
The results obtained for the normalized indicators clearly demonstrate the trade-off between compactness, separation, and structure for the considered algorithms, as widely discussed in the literature on clustering validity indices (
Arbelaitz et al., 2013;
Vendramin et al., 2010). Density-based clustering has high values for maximum diameter, minimum separation, and the Dunn index, which means good cluster compactness and separation, while it has poor results for Pearson’s gamma, entropy, and the Calinski–Harabasz index, which indicate poor global structure and balance in cluster partitions, consistent with the idea that different indices capture different aspects of clustering quality (
Liu et al., 2010).
Hierarchical clustering has the best results for Pearson’s gamma and reasonably good results for entropy and the Dunn index, which indicate good global ordering and acceptable internal validity, though it does not dominate in separation and compactness results. This reflects the known strength of hierarchical methods in preserving global structure rather than optimizing a single objective function (
Saxena et al., 2017).
k-Means has the best results for the Calinski–Harabasz index and high values for Pearson’s gamma, which indicate good global variance separation, while it has poor results for the Dunn index and minimum separation, which indicate poor cluster separation in the data space, in line with evidence that k-means tends to favor variance-based criteria over distance-based separation (
Arbelaitz et al., 2013;
Vendramin et al., 2010).
Model-based and Random Forest clustering methods demonstrate balanced results: they are neither the best nor the worst on most of the considered indicators, although they come close to the best results for entropy and show moderate results for separation and compactness, supporting the view that some clustering approaches provide more stable but less extreme performance across validation metrics (
Saxena et al., 2017).
Fuzzy C-Means has poor results for all separation and compactness indicators, despite good results for entropy, which makes it difficult to evaluate its quality, highlighting the sensitivity of fuzzy clustering methods to the choice of validity indices (
Liu et al., 2010).
Considering all indicators together, Hierarchical clustering demonstrates the best balance between high values for Pearson’s gamma and good results for other indicators, avoiding poor results observed in other methods, which makes it the best compromise for clustering validity, consistent with comparative studies emphasizing the importance of multi-criteria evaluation (
Arbelaitz et al., 2013;
Vendramin et al., 2010). See
Table 9.
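For readers less familiar with these validity indices, the sketch below computes the Dunn index, defined as the smallest distance between points in different clusters divided by the largest within-cluster diameter. The function name and the two toy partitions are hypothetical; the example only shows why the index rewards compact, well-separated clusters.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Minimum between-cluster distance divided by maximum cluster diameter."""
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Same separation (5 units), different compactness.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 4.0)], [(5.0, 0.0), (5.0, 4.0)]]

dunn_tight = dunn_index(tight)
dunn_loose = dunn_index(loose)
```

The tighter partition scores 5.0 against 1.25 for the looser one. The sketch divides by the maximum diameter, so it is not defined when every cluster is a singleton.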
As indicated by the hierarchical clustering results, the cluster structure shows considerable unevenness, namely, a principal cluster that is substantially larger than the rest. This type of unevenness is typical in empirical clustering cases in which the data demonstrate dominant group patterns (
Essary et al., 2022;
Xu & Tian, 2015). Specifically, cluster 1 contains 186 observations and accounts for the majority of the dataset, whereas the other clusters contain far fewer observations, with two of them (clusters 5 and 10) having just one observation each. One may therefore infer that most units share a similar configuration in the multi-dimensional space defined by VTX, DBS, REM, MCX, and IPU, whereas a few units represent unique combinations of these variables and thus form separate clusters. This explanation aligns with the identification of outliers and specialized clusters in clustering analyses (
Hennig, 2015;
Campello et al., 2015). This inference is supported by the within-cluster heterogeneity results. Cluster 1 accounts for about 69.5% of all within-cluster heterogeneity. Clusters 2, 3, and 4 account for rather small shares, ranging between 7% and 13%, whereas the other clusters account for negligible values. The data are thus structured around a few clusters, with cluster 1 being the most numerous and representing “typical” cases with respect to stock market breadth (VTX), the size of deposit-taking banks relative to the central bank (DBS), remittances (REM), market capitalization outside the top ten firms (MCX), and international public debt (IPU). This inference also aligns with clustering models that suggest the dominance of a few clusters and the presence of special clusters (
Xu & Tian, 2015). Furthermore, one can consider the within-cluster sum of squares. Cluster 1 exhibits the highest within-cluster sum of squares, which is partly a consequence of its large size but also indicates considerable heterogeneity within this cluster. That is, although the observations in cluster 1 share a broadly similar configuration, they still exhibit considerable diversity in financial structure. Conversely, the other clusters have lower within-cluster sums of squares, with some reaching zero, which may reflect either extremely homogeneous structures or simply the fact that a cluster contains a single observation (
Essary et al., 2022). In addition, silhouette analysis provides another criterion for evaluating clustering quality. Cluster 1 has a rather low silhouette value of 0.271, indicating that many observations lie close to the cluster boundary and to observations in adjacent clusters. This is consistent with the relatively high heterogeneity of this large cluster and suggests gradual transitions from it to other clusters (
Hennig, 2015). Meanwhile, clusters 2, 3, and 4 show somewhat higher silhouette values, implying reasonably good, though not strong, separation. Finally, the small clusters 6, 7, 8, and particularly 9 exhibit exceptionally high silhouette values (0.905), indicating strong separation from other clusters. Overall, hierarchical clustering points to a core-periphery structure in the data. The majority of observations constitute the core, defined by relatively diffuse patterns in the multi-dimensional space of VTX, DBS, REM, MCX, and IPU, whereas a few observations make up the periphery, characterized by sharp separation and specific profiles. From the viewpoint of economic and financial analysis, many units or countries demonstrate similar financial structures or market developments; however, there are outliers with peculiarities in market concentration, remittances, banking structure, or international public debt, which are therefore isolated in separate clusters (
Xu & Tian, 2015). See
Table 10.
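The two diagnostics used here can be sketched on one-dimensional toy data. The within-cluster sum of squares adds up squared deviations from the cluster mean, and the silhouette of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b) as (b - a) / max(a, b). The data and helper names are hypothetical, and assigning a silhouette of zero to single-member clusters is a common convention assumed here, not necessarily the one used for the reported values.

```python
from statistics import fmean

def wss(cluster):
    """Within-cluster sum of squared deviations from the cluster mean (1-D)."""
    m = fmean(cluster)
    return sum((v - m) ** 2 for v in cluster)

def silhouette(point, own, others):
    """Silhouette of one point: (b - a) / max(a, b)."""
    rest = list(own)
    rest.remove(point)  # exclude the point itself
    if not rest:
        return 0.0  # assumed convention for single-member clusters
    a = fmean(abs(point - v) for v in rest)
    b = min(fmean(abs(point - v) for v in c) for c in others)
    return (b - a) / max(a, b)

core = [0.0, 1.0, 2.0, 3.0, 4.0]  # large, diffuse cluster
edge = [6.0, 6.5]                 # small, compact cluster
solo = [20.0]                     # single-member cluster

total_wss = wss(core) + wss(edge) + wss(solo)
core_share = wss(core) / total_wss

s_core_boundary = silhouette(4.0, core, [edge, solo])
s_edge = silhouette(6.5, edge, [core, solo])
s_solo = silhouette(20.0, solo, [core, edge])
```

The large cluster carries almost all of the within-cluster sum of squares and its boundary point has a slightly negative silhouette, the compact cluster scores close to 0.89, and the singleton has a sum of squares of exactly zero. This mirrors the core-periphery pattern described in the text.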
Thus, the obtained cluster centroids provide an extensive explanation of how differentiated groups of observations can be formed based on their features across the mentioned aspects of stock market structure, banking system composition, external financial flows, market concentration, and international public debt dependence. This approach aligns well with modern clustering analysis in finance and economics (
Petrescu & Krishen, 2023;
Huang & Yang, 2025). Since the standardized values represent deviations from the sample average, the clusters can be compared with respect to the relative positions of the variables. Cluster 1, which includes the majority of the sample, shows moderately positive VTX and REM, slightly negative DBS and MCX, and IPU close to zero. This cluster therefore characterizes countries in which trading extends somewhat beyond the largest companies, deposit-taking banks carry slightly below-average weight relative to the central bank, market capitalization outside the top ten firms is slightly below average, and there is no significant reliance on international public debt. Cluster 1 can thus be described as comprising countries with an average financial system. In contrast, Cluster 2 is characterized by a highly positive DBS value but negative REM and IPU values; VTX is rather small, and MCX is slightly positive. This cluster represents countries with relatively less diversified stock exchanges, banking systems in which deposit-taking banks are large relative to the central bank, and low levels of international public debt and remittances. Thus, Cluster 2 comprises countries with low remittance levels and low international public debt dependence. Cluster 3 has negative VTX, DBS, and MCX but positive REM and IPU. This group refers to countries with rather concentrated stock exchanges and a banking sector in which deposit-taking banks carry less weight, but which receive substantial remittance flows and rely heavily on international public debt markets. Accordingly, Cluster 3 is associated with a regime in which remittances and international public debt are of key importance to the economy. Similar configurations have been documented for some economic systems (
Huang & Yang, 2025). Cluster 4 has a high MCX, moderately positive DBS and VTX, and negative REM and IPU. This cluster can therefore be interpreted as one with a highly diversified stock market beyond the largest companies, a relatively developed banking sector, and a comparatively small role for remittances and international public debt. Clusters 5, 8, and 10 have rather extreme profiles. Cluster 5 is characterized by high MCX and IPU, positive DBS, and negative REM, meaning high levels of market orientation and banking, reliance on international public debt, and minimal significance of remittances. Cluster 8 is characterized by very high MCX, moderate VTX, positive DBS, and negative REM, implying highly market-based financial systems with low remittance dependence. Cluster 10 has high DBS and REM but negative IPU. Finally, Clusters 6 and 9 represent the most extreme profiles. Cluster 9 has extremely negative VTX but positive REM and IPU, while Cluster 6 is similar but has moderate VTX. Economies in these clusters have a very narrow stock market structure, a weak and less dominant banking sector, and a high dependence on external financial flows from the public and private sectors. This approach to identifying outliers is supported by modern clustering research (
Hennig, 2015). To conclude, the cluster means show that countries fall into different types of financial system development and external financial integration, ranging from more diversified, market-oriented structures to more concentrated and more externally dependent ones. See
Table 11.
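Because the interpretation rests on standardized centroids, a small sketch of how such values arise may help. Each variable is converted to z-scores over the whole sample and then averaged within clusters, so a positive centroid means "above the sample average" and a negative one "below". The function name, the use of the population standard deviation, and the toy remittance values are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def standardized_cluster_means(values, labels):
    """z-score one variable over the full sample, then average within each cluster."""
    mu, sigma = fmean(values), pstdev(values)
    z = [(v - mu) / sigma for v in values]
    by_cluster = defaultdict(list)
    for zi, lab in zip(z, labels):
        by_cluster[lab].append(zi)
    return {lab: fmean(zs) for lab, zs in by_cluster.items()}

# Hypothetical remittance values (REM) for four observations in two clusters.
rem = [2.0, 4.0, 6.0, 8.0]
labels = [1, 1, 2, 2]
centroids = standardized_cluster_means(rem, labels)
```

Here cluster 1 has a negative REM centroid and cluster 2 a positive one, and the size-weighted average of the centroids is zero by construction. A large cluster therefore necessarily sits close to zero on most variables, which is consistent with the "average" profile of Cluster 1.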
Figure 2 presents a comprehensive visual interpretation of the results obtained from the hierarchical clustering for the model with the inclusion of VTX, DBS, REM, MCX, and IPU, bringing together the information obtained for cluster selection, cluster structure, and economic interpretation, consistent with recent advances in hierarchical clustering visualization and interpretability (
Cabezas et al., 2023;
Randriamihamison et al., 2021). See
Figure 2.
Panel A presents the evolution of the information criteria and the within-cluster sum of squares as a function of the number of clusters in the partition. The downward trend reflects the improvement in goodness of fit as clusters are added, while the highlighted minimum marks the optimal number of clusters, near ten, where the criterion balances goodness of fit against parsimony. This result supports the selection of a rich cluster structure, capable of capturing the heterogeneity present in the data set, in line with model selection approaches in hierarchical clustering (
Barratt & Plucinski, 2023;
Rebafka, 2024).
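The logic of Panel A can be sketched as follows. The within-cluster sum of squares always falls as clusters are added, so a penalty for complexity is needed to obtain an interior optimum. The penalized form used below, the within-cluster sum of squares plus ln(n) times the number of variables times the number of clusters, is one common BIC-style choice and is an assumption here; the exact criterion behind the figure is not stated in the text, and all numbers are hypothetical.

```python
from math import log

def bic(wss, n_obs, n_vars, k):
    """Penalized fit: within-cluster sum of squares plus a complexity term."""
    return wss + log(n_obs) * n_vars * k

# Hypothetical within-cluster sums of squares for k = 1..6 clusters.
wss_by_k = {1: 500.0, 2: 300.0, 3: 180.0, 4: 150.0, 5: 140.0, 6: 135.0}
n_obs, n_vars = 100, 5

scores = {k: bic(w, n_obs, n_vars, k) for k, w in wss_by_k.items()}
best_k = min(scores, key=scores.get)
```

In this toy case the sum of squares keeps declining up to six clusters, but the penalized criterion reaches its minimum at four, after which the gain in fit no longer offsets the added complexity.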
Panel B presents the clustered observations in a reduced dimensional space, where different colors are used to differentiate the clusters. The clear differentiation between some of the clusters supports the evidence that the hierarchical algorithm does not partition the sample mechanically but recognizes patterns in the data set. Some clusters are more compact and well differentiated, while others show more overlapping patterns, consistent with the coexistence of dense and diffuse structures in hierarchical clustering outputs (
Malzer & Baum, 2021).
Panel C presents the dendrogram, which shows the hierarchical structure produced by the clustering algorithm: how observations merge and which groups form during the process. Broad branches are associated with more general patterns, while smaller branches are associated with more idiosyncratic patterns. The level at which branches merge represents the distance between the groups, and the existence of some long vertical jumps suggests that the clusters are genuinely different in terms of the underlying financial and market characteristics, as widely discussed in dendrogram-based interpretations of hierarchical clustering (
Tamarit et al., 2020;
Randriamihamison et al., 2021).
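The reading of merge heights can be made concrete with a small agglomerative sketch on one-dimensional data. Single linkage (the distance between two groups is the distance between their closest members) is used purely for simplicity and is an assumption; the linkage rule behind the dendrogram is not specified in the text. The function name and the data are hypothetical.

```python
def single_linkage_merges(points):
    """Agglomerate 1-D points; return the distance at which each merge happens."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return heights

heights = single_linkage_merges([0.0, 0.5, 1.0, 8.0, 8.4])
# Jump between consecutive merge heights; a large jump marks a natural cut.
jumps = [b - a for a, b in zip(heights, heights[1:])]
```

The first three merges occur at heights of about 0.4 to 0.5, and the last one at 7.0. That long vertical jump is what signals two genuinely different groups in a dendrogram.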
Panel D displays the standardized cluster means, which provide the economic interpretation of the clusters. The differences between the clusters are evident in terms of stock market breadth (VTX), the relative importance of deposit money banks (DBS), remittance inflows (REM), market capitalization outside the top ten firms (MCX), and the relative importance of international public debt (IPU). Some clusters are characterized by diversified structures, with high values of MCX and DBS and low values of REM, while other clusters show the opposite: narrow structures, with low values of MCX and DBS and high values of REM and IPU. This reflects the ability of hierarchical clustering to capture heterogeneous structural patterns in multivariate data (
Ichino et al., 2021).
The extreme positive or negative values in the clusters indicate the existence of very specific financial structures, as suggested by the small, well-separated groups in the other panels. The figure suggests that the methodology of hierarchical clustering detects a rich segmentation in the data, highlighting the coexistence of a large group of similar observations, as well as a few smaller, more differentiated clusters, driven by different financial development, market structures, and external financial integration, as captured by the underlying five variables, consistent with recent applications of hierarchical clustering in complex multivariate systems (
Rebafka, 2024;
Cabezas et al., 2023).
The resulting scatter plot matrix in
Figure 3 provides a detailed view of the relationships between these five standardized financial variables, VTX, DBS, REM, MCX, and IPU, with the data points colored according to their clusters determined using hierarchical clustering. This form of visualization is useful not only for evaluating the internal consistency of these clusters but also for understanding the ways in which distinct financial dimensions interact with each other, as emphasized in recent studies on financial development and multivariate relationships (
Ikpesu, 2024;
Alhassan et al., 2025). See
Figure 3.
Several distinct financial relationships can be determined. For instance, there is a strong positive relationship between MCX and VTX, indicating that an expansion of stock exchange trading activities beyond the largest firms is related to a less concentrated and more diversified financial system. This is consistent with evidence linking market development and diversification to broader financial deepening (
Cuestas et al., 2020). This pattern holds across all clusters, with some clusters residing at more extreme levels.
In terms of DBS, there is a more nuanced relationship. Clusters with high levels of DBS, indicating a stronger financial system dominated by deposit money banks relative to the central bank, are typically found at moderate to high levels of MCX and low levels of REM. This indicates a stronger domestic capital market and less reliance upon remittance flows, consistent with findings that more developed banking systems reduce dependence on external financial inflows (
Alhassan et al., 2025).
By contrast, clusters with low DBS levels, indicating a less important role played by domestic banking, are more frequently associated with high levels of REM and, at times, high levels of IPU. This reflects the role of remittances as alternative financial resources in less developed financial systems (
Ikpesu, 2024).
From the REM panels, there is a clear distinction between clusters with high remittance dependence and those with relatively lower remittances. Clusters with high REM values tend to be located in areas with lower values of VTX and MCX, indicating relatively narrower and more concentrated market activity, and they also tend to have higher IPU values, indicating greater dependence on international public debt. Again, this suggests that remittance-dependent countries tend to have less developed domestic financial markets and stronger links with external sources of finance, consistent with empirical evidence on remittance-finance interactions (
Ikpesu, 2024;
Alhassan et al., 2025).
The IPU relationships also support this view of segmentation into remittance-dependent and less remittance-dependent countries. Clusters with high IPU values tend to group together with high REM values and lower values of VTX and MCX, while clusters with low IPU values tend more often to group together with stronger values of domestic market activity and DBS. This is consistent with literature linking financial structure, banking development, and external debt reliance (
Aldomy et al., 2020;
Budhathoki et al., 2024).
The colored point clouds for each pair of variables also suggest that the hierarchical clustering has captured meaningful multivariate structure rather than arbitrary groupings of countries. The clusters occupy relatively distinct areas of the variable space, although there is considerable overlap in the central, more “average” areas of each distribution.
In summary, the scatterplot matrix visually confirms that the hierarchical clustering has captured a coherent financial and market structure. It distinguishes market-oriented systems, which have well-developed banking systems and relatively well-developed equity markets, from systems that are more externally driven, with relatively higher remittances, relatively higher international public debt, and relatively lower domestic market activity, in line with recent empirical findings on financial system heterogeneity (
Cuestas et al., 2020;
Budhathoki et al., 2024).
6. Machine Learning Performance and Variable Importance Analysis
The normalized performance metrics provide a clear and consistent basis upon which all the algorithms can be compared along various dimensions of predictive accuracy and goodness of fit, as emphasized in recent machine learning evaluation frameworks (
Sagi & Rokach, 2018). All error-based performance metrics are rescaled so that higher values indicate better performance, which puts them on the same footing as the R² metric. All algorithms can therefore be evaluated within a common framework with regard to their predictive accuracy and goodness of fit.
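The rescaling described above can be illustrated with a minimal sketch. It assumes min-max scaling across models with error metrics inverted; the text does not state the exact formula used, and the model names and raw values below are hypothetical rather than taken from Table 12.

```python
def scale_higher_is_better(scores, lower_is_better):
    """Min-max scale one metric across models to [0, 1].

    Error metrics (lower_is_better=True) are flipped so that the model
    with the smallest error receives 1 and the largest error receives 0.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    scaled = {}
    for model, value in scores.items():
        if span == 0:
            scaled[model] = 1.0  # every model ties on this metric
        elif lower_is_better:
            scaled[model] = (hi - value) / span
        else:
            scaled[model] = (value - lo) / span
    return scaled


# Hypothetical raw values for three placeholder models (assumption).
raw_rmse = {"model_a": 12.0, "model_b": 8.0, "model_c": 10.0}
raw_r2 = {"model_a": 0.40, "model_b": 0.80, "model_c": 0.60}

scaled_rmse = scale_higher_is_better(raw_rmse, lower_is_better=True)
scaled_r2 = scale_higher_is_better(raw_r2, lower_is_better=False)
```

Under this assumed scheme, a score of zero means only that a model is the weakest of those compared on that metric, not that its raw error is unbounded.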
The neural network can be ruled out immediately: it scores zero on every metric, which marks it as the weakest of the algorithms compared. The boosting regression model performs moderately, with error-based metrics near the midpoint of the scale and a low R², indicating that it does not outperform simpler models, consistent with evidence that ensemble methods can vary significantly depending on tuning and data characteristics.
The decision tree performs comparatively better on MSE, RMSE, and MAPE, but has a low R². The relevant comparison, however, is among KNN, Random Forest, and SVM, which rank highest. KNN achieves the maximum or near-maximum score on several error metrics, such as MAE and MAPE, and scores very highly on MSE, RMSE, and R², implying accurate local prediction with low average error, in line with findings on instance-based learning methods.
Similarly, the Random Forest achieves the maximum score for scaled MSE and R², and near-maximum scores for RMSE and MSE, implying high overall prediction accuracy. This supports extensive evidence that Random Forest models often outperform other algorithms in terms of predictive accuracy and robustness across different datasets (
Biau & Scornet, 2016;
Probst et al., 2019).
Although the SVM achieves high scores for MSE and RMSE, along with a high R², its performance for MAPE is extremely poor, with a zero score, implying significant problems in terms of percentage error. This reflects the sensitivity of SVM models to specific loss functions and data scaling issues in regression tasks.
Thus, when all metrics are considered together, the best-performing algorithm overall is the Random Forest: it avoids a zero score on any error metric, is near the top on all error metrics, and achieves the maximum score for R², that is, the largest proportion of variance explained by the model. Although KNN may have a slight edge over the Random Forest on local error metrics such as MAE and MAPE, the overall performance of the Random Forest is better than that of the other two, with no pronounced weakness on any error metric. It can therefore be considered the best-performing model for problems where overall prediction accuracy and R² both matter, consistent with comparative machine learning studies (
Sagi & Rokach, 2018). See
Table 12.
The feature importance metrics offer an informative view of the contribution of the four explanatory variables, namely MCX, REM, IPU, and DBS, to the predictive capability of the model. The model in question is a tree-based ensemble, the Random Forest, for which variable importance plays a crucial role in interpreting predictive mechanisms (
Bénard et al., 2021;
Williamson et al., 2023). The metrics offer a nuanced view of the contribution of each variable towards the dependent variable.
The Mean Decrease in Accuracy (MDA) is a measure of the decrease in predictive accuracy when the values of a particular variable are randomly permuted. The higher the value, the more important the variable is. This approach to permutation-based importance is widely used in modern machine learning interpretation frameworks (
Fisher et al., 2019;
Molnar et al., 2020). MCX has by far the highest value, 495.125, more than three times that of REM and IPU and nearly nine times that of DBS. This indicates that market capitalization excluding the top ten companies contributes strongly to the predictive accuracy of the model: when the values of MCX are randomly permuted, the model’s ability to predict the dependent variable deteriorates markedly.
The Mean Decrease in Accuracy of DBS is 55.195, the lowest of the four explanatory variables, indicating that DBS plays a relatively less decisive role. The Total Increase in Node Purity (TINP) is usually computed as the total decrease in the residual sum of squares over all regression tree splits on a particular variable. MCX again records the highest value, 24,459.853. REM and IPU record values of 11,351.301 and 10,534.754, respectively: remittance flows and international public debt contribute substantially to splitting the trees into more homogeneous nodes, but less than MCX. DBS records a much lower TINP of 6,851.254, indicating a relatively minor role for the relative size of deposit-taking banks, consistent with interpretations of impurity-based importance in tree ensembles (
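The permutation logic behind this measure can be sketched in a few lines. The example below is illustrative only: `toy_predict` is a hypothetical stand-in for a fitted model, the data are synthetic, and the importance is expressed as the raw increase in mean squared error, which may differ from the scaling applied by the software used in the paper.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one column of `rows`."""
    rng = random.Random(seed)
    base_error = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in the chosen column.
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(mse(y, [predict(r) for r in permuted]) - base_error)
    return sum(increases) / n_repeats


def toy_predict(row):
    """Hypothetical model: relies on feature 0 and ignores feature 1."""
    return 3.0 * row[0]


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * r[0] for r in rows]

importance_used = permutation_importance(toy_predict, rows, y, column=0)
importance_ignored = permutation_importance(toy_predict, rows, y, column=1)
```

Shuffling a feature the model relies on raises the error, whereas shuffling an ignored feature leaves it unchanged, which is the contrast the measure is designed to capture.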
Janitza et al., 2018).
The Mean Dropout Loss, which is computed from the root mean squared error when the variable is dropped from the model, is another measure of variable importance. The higher the dropout loss, the more significant is the effect of dropping that variable from the model. This approach aligns with recent “leave-one-feature-out” or removal-based explanation frameworks (
Covert et al., 2021). Again, MCX has recorded the highest importance value of 22.182, which reconfirms its importance in the model. REM and IPU have recorded importance values of 12.734 and 12.111, respectively, indicating their relatively comparable importance. DBS has recorded the lowest importance value of 10.703.
These three importance measures provide a consistent overall picture: MCX is clearly the most important variable for the model’s predictions, while REM and IPU have intermediate importance, reflecting their role as external financial flows. DBS has the lowest importance among all variables, although it remains non-negligible. The convergence of these measures increases confidence in their interpretation and suggests that the ranking is not an artifact of a single importance metric, an issue often highlighted in recent discussions on model interpretability and robustness (
Hooker et al., 2021;
Molnar et al., 2020). See
Table 13.
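As a quick consistency check, the values quoted in the text can be ranked directly. The sketch below uses only the figures reported above; the Mean Decrease in Accuracy values for REM and IPU are not quoted in the text, so only the MCX-to-DBS ratio is computed for that measure. The helper names are illustrative.

```python
# Importance values as quoted in the text.
tinp = {"MCX": 24459.853, "REM": 11351.301, "IPU": 10534.754, "DBS": 6851.254}
dropout_loss = {"MCX": 22.182, "REM": 12.734, "IPU": 12.111, "DBS": 10.703}
# Only MCX and DBS are quoted for the Mean Decrease in Accuracy.
mda_reported = {"MCX": 495.125, "DBS": 55.195}


def ranking(scores):
    """Variable names ordered from most to least important."""
    return sorted(scores, key=scores.get, reverse=True)


def shares(scores):
    """Each variable's share of the total importance for one measure."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}


tinp_rank = ranking(tinp)
dropout_rank = ranking(dropout_loss)
mda_ratio = mda_reported["MCX"] / mda_reported["DBS"]
tinp_share = shares(tinp)

print("TINP ranking:        ", tinp_rank)
print("Dropout-loss ranking:", dropout_rank)
print("MDA ratio MCX/DBS:   ", round(mda_ratio, 2))
print("MCX share of TINP:   ", round(tinp_share["MCX"], 3))
```

Both fully reported measures give the same ordering (MCX, REM, IPU, DBS), and the MDA ratio matches the "nearly nine times" comparison made in the text.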
The table reports a decomposition of the predicted values of the dependent variable into a baseline component and the individual contributions of the explanatory variables DBS, REM, MCX, and IPU for five representative cases. The baseline value is constant across all cases at 42.517, which can be interpreted as the reference prediction in the absence of deviations in the explanatory variables, while the final predicted value is obtained by adding the variable-specific contributions to this base, consistent with additive explanation frameworks in interpretable machine learning (
Lundberg et al., 2020;
Covert et al., 2021).
Case 1 illustrates a situation in which all variables contribute positively to the prediction. Starting from the baseline of 42.517, the combined positive effects of DBS (0.681), REM (1.081), MCX (7.497), and IPU (8.518) raise the predicted value to 60.295. The largest contributions come from MCX and IPU, confirming the strong influence of market structure and international public debt in shaping the outcome. This case represents a profile where both equity market diversification and external financing conditions significantly boost the predicted level of the dependent variable.
Cases 2 and 3 show the opposite pattern, with predicted values well below the baseline. In Case 2, negative contributions from all variables, especially MCX (−11.650) and REM (−4.830), drive the prediction down to 19.734. A similar configuration appears in Case 3, where large negative effects from REM (−5.938), MCX (−9.263), and IPU (−5.183) reduce the predicted value to 20.671. These two cases highlight how unfavorable positions in market diversification and external flows can substantially depress the outcome relative to the baseline level, consistent with the interpretation of feature contributions as marginal effects on predictions (
Janzing et al., 2020).
Case 4 presents the highest predicted value, 66.774, driven by very strong positive contributions from REM (5.970) and especially MCX (17.398), alongside a positive effect from DBS (1.092). The small negative contribution from IPU (−0.202) is negligible in comparison. This case underscores the dominant role of equity market diversification in lifting the prediction far above the baseline, with remittances also playing an important supporting role, in line with Shapley-based interpretations of feature importance (
Frye et al., 2020).
Finally, Case 5 shows a more mixed configuration. Although DBS contributes negatively (−0.735) and IPU also exerts a downward effect (−3.285), positive contributions from REM (1.168) and MCX (7.075) are sufficient to keep the predicted value slightly above the baseline at 46.740.
Overall, the decomposition confirms that MCX and REM are the most influential drivers of variation around the baseline, while DBS and IPU play more secondary but still meaningful roles in shaping the final predictions. The consistency of these additive contributions across cases strengthens the interpretability of the model, although caution is warranted given known limitations of model-agnostic explanation techniques (
Molnar et al., 2020). See
Table 14.
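The additive structure of the decomposition can be verified from the reported numbers. The sketch below rebuilds the predictions for Cases 1, 4, and 5, the cases for which all four contributions are quoted in the text; Cases 2 and 3 are omitted because not every contribution is reported. The function and variable names are illustrative.

```python
BASELINE = 42.517  # baseline prediction reported for all cases

# (contributions, reported prediction) for the fully reported cases.
cases = {
    1: ({"DBS": 0.681, "REM": 1.081, "MCX": 7.497, "IPU": 8.518}, 60.295),
    4: ({"DBS": 1.092, "REM": 5.970, "MCX": 17.398, "IPU": -0.202}, 66.774),
    5: ({"DBS": -0.735, "REM": 1.168, "MCX": 7.075, "IPU": -3.285}, 46.740),
}


def reconstruct(contributions, baseline=BASELINE):
    """Prediction implied by the baseline plus variable contributions."""
    return baseline + sum(contributions.values())


# Difference between the reconstructed and the reported prediction.
gaps = {
    case: round(reconstruct(contribs) - reported, 3)
    for case, (contribs, reported) in cases.items()
}

# Variable with the largest absolute contribution in each case.
dominant = {
    case: max(contribs, key=lambda name: abs(contribs[name]))
    for case, (contribs, _) in cases.items()
}

print(gaps)
print(dominant)
```

The reconstructed values agree with the reported predictions up to rounding in the third decimal place, which is what an additive explanation of this kind implies.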
Figure 4 presents the main results obtained from the random forest model and also offers additional information regarding the model’s performance in terms of prediction and the relative importance of the explanatory variables used in the model, consistent with established analyses of random forest behavior and performance (
Biau & Scornet, 2016;
Probst et al., 2019).
Panel A in the figure shows the relationship between observed and predicted test values. The points lie close to the 45-degree line, which suggests that the model reproduces the observed values reasonably well. Panel B presents the evolution of the out-of-bag mean squared error as the number of trees in the random forest increases, for both the training and the validation sets.
Panel B shows that the error is initially high and fluctuates markedly when the forest contains few trees, then falls sharply and gradually stabilizes as trees are added, which is consistent with the ability of random forests to benefit from a larger ensemble (
Oshiro et al., 2012). Moreover, the close proximity of the curves for the validation and training sets in the graph in Panel B suggests good generalization performance and little overfitting in the model, which is a well-known strength of random forest methods (
Biau & Scornet, 2016). However, beyond a certain point, the rate of improvement slows, indicating that the model has reached a level of stability and reliability, in line with findings on hyperparameter tuning and diminishing returns from increasing the number of trees (
Probst et al., 2019).
The variable importance plots in Panels C and D provide a consistent reading. Both show MCX as having the greatest influence, with a significant lead over the other predictors. This again indicates that the structure and diversification of the equity market contribute substantially to explaining the dependent variable. REM and IPU follow at some distance, indicating that external private flows and international public debt also contribute. DBS has the least influence, indicating that, although relevant, the relative size of deposit-taking banks has a lesser impact. These findings are consistent with established approaches to variable importance in random forests (
Speiser et al., 2019;
Janitza et al., 2018).
The four plots provide a cohesive picture. The random forest model shows satisfactory predictive capability and stability, improving as the ensemble grows while keeping the gap between training and validation error small. It also yields an economically meaningful reading, with MCX having the greatest influence, which underlines the relevance of the structure and diversification of the equity market. The secondary influence of external financial flows is also relevant. The random forest approach can thus be considered a reliable method for analyzing the dependent variable, consistent with the broader literature on ensemble learning and predictive modeling (
Biau & Scornet, 2016;
Probst et al., 2019). See
Figure 4.
7. Integrated Evidence on the Determinants of Stock Market Trading Diversification
The empirical results indicate that stock market trading diversification (VTX) is essentially a function of equity market structure. The combination of panel econometrics, hierarchical clustering, and machine learning yields consistent and mutually reinforcing results, in line with current research on financial development and structural diversity (
Demirgüç-Kunt et al., 2021;
Sahay et al., 2015). Across all econometric specifications employed, including OLS, fixed effects, random effects, extended models with controls, and dynamic models with lagged regressors, the results are remarkably stable. Specifically, the market capitalization index, excluding the top ten companies (MCX), turns out to be the most important factor in determining trading diversification and displays a significant, positive, and long-lasting effect. Likewise, remittance inflows (REM) have a substantial positive effect, providing evidence of the role of external private financial flows in facilitating participation and trading diversification. Conversely, bank dominance (DBS) and international public debt (IPU) are correlated with more concentrated market structures, which is perfectly in line with the currently developing theory about financial structure and its evolution from banks’ preeminence towards market domination (
Demirgüç-Kunt et al., 2021). From the perspective of hierarchical clustering, financial systems exhibit significant structural diversity and a distinct core-periphery structure. While the largest number of countries fall into clusters with typical financial structures, smaller clusters exhibit very distinctive structures, in accordance with the clustering evidence (
Jiang et al., 2020). It should be emphasized that there is much heterogeneity across countries regarding market structure, financial intermediation, and external private inflows. As for the predictions made using machine learning algorithms, the results support the conclusions made above. Indeed, the Random Forest approach appears superior among all the models tested, confirming the superiority of ensemble techniques (
Biau & Scornet, 2016;
Probst et al., 2019). As regards variable importance measures, they consistently indicate that MCX is the leading variable affecting VTX, with REM and IPU ranking second and third, respectively, followed by DBS. In conclusion, it can be said that trading diversification should not be treated as an artifact resulting from financial development, but rather as a feature defined by market structure, facilitated by external private financing, and constrained by bank dominance and the use of international public debt. See
Table 15.
The results yield a clearly structured picture of the determinants of stock market trading diversification and several key economic insights. Across the baseline, extended, and dynamic versions of the econometric analysis, the internal structure of equity markets is the leading factor driving trading diversification, in line with previous studies on market breadth (
Gurley & Shaw, 1967;
Gbadebo, 2024). Specifically, market breadth and composition (MCX) have a strong, highly statistically significant positive effect. The positive effect of remittance inflows (REM) is likewise strong and persistent across specifications. This implies that external financial flows expand access to trading by increasing liquidity and the investment capacity of households, a finding that aligns with recent research on how remittances influence financial inclusion. The impact of the other factors is less clear-cut. International public debt (IPU) is negative and statistically significant in almost all specifications considered, indicating that an increase in external public debt is associated with greater market concentration. By contrast, the effect of deposit-taking banks (DBS) loses its significance and magnitude once control variables are introduced. This suggests that financial structure operates through macro-financial conditions rather than affecting trading directly. The control variables provide further support for the results. Market size (SMC) and the number of listed companies (NLC) are significant in almost all cases, so market size and breadth can be regarded as important structural factors driving trading diversification. GDP growth and credit depth appear to have a smaller effect, suggesting that the trading pattern is structural rather than cyclical. The dynamic specifications confirm these results: the effects of MCX and REM are highly significant and persistent, while the effects of DBS and IPU remain statistically insignificant. The clustering analysis yields several insights into differences among countries in their financial structures and levels of development. Specifically, there appear to be different types of financial development, with no evidence of convergence toward a single type.
The presence of a core-periphery structure also supports this conclusion. Financial systems with a large MCX feature broader and more diverse trading activity, whereas bank-oriented and externally dependent systems are usually more concentrated. The machine learning results reinforce these conclusions. The best-performing model, the Random Forest, identifies MCX as the most important predictor, which means that market structure is the key determinant of trading diversification. Remittance inflows and IPU are also important, while DBS is not critical. See
Figure 5.
9. Conclusions
This paper provides a structured contribution to the literature on financial development by examining the under-researched area of the internal organization of trading activity on equity markets. Through this focus, enabled by the development of the trading diversification indicator (VTX), a new structural view of the operation of financial systems emerges. All empirical estimations presented in this study, including baseline models, extended models, dynamic models with lagged variables, and robustness check regressions, provide consistent conclusions regarding the role of internal structure in trading diversification. First, market capitalization excluding the ten largest firms (MCX) emerges as the most important and consistent variable. Hence, it can be stated that the internal composition of the market is the main driver of trading diversification and, consequently, broader markets with greater participation from a larger number of firms exhibit lower levels of trading concentration. External private financial flows, measured by remittance volume (REM), are also important for trading diversification. In turn, external flows via public debt (IPU) are a strong inhibitor of diversification. At the same time, while the role of the banking structure (DBS) is significant in the baseline models, this variable becomes less relevant when control and lagged variables are added. Thus, the evidence shows that banking structure acts through financial conditions as a determinant, but is not itself directly related to trading diversification. The results obtained by adding control variables and dynamic models indicate that trading diversification is a structural process that is not influenced by the current state of the economy. Furthermore, market size (SMC) and the number of listed companies (NLC) serve as additional confirmation of the structural character of the process under consideration, alongside digital access (DIF) and public finance channel (PUB). 
Cluster analysis provides evidence of a core-periphery structure of financial systems and confirms significant heterogeneity across countries, indicating that they do not converge but follow different developmental trajectories. This conclusion is supported by the machine learning analysis, which identifies MCX as the main determinant of trading diversification, followed by REM and IPU, with DBS ranking last. Thus, the convergence of conclusions from econometrics, cluster analysis, and machine learning provides confidence that the results are not biased by any specific model choice but instead reveal fundamental regularities. It can be concluded that trading diversification depends not so much on financial system size and liquidity as on the internal market structure and the openness of the participation channel. In particular, private external flows support diversification, while dependence on public funds limits diversification opportunities. Policy recommendations follow from these results. To build a diversified and resilient financial system, emphasis should be placed on the internal structure of equity markets, their broadening, and the creation of additional channels of participation. Future studies could explore the issue by using firm-level data, examining causality, and expanding the range of countries analyzed.