1. Introduction
Population mobility has become a central topic of academic research in the international literature, given its profound impact on economic growth, social stability, and security within contemporary societies. A substantial body of research has examined the multifaceted determinants of migration, encompassing economic factors such as gross value added (GVA) and gross fixed capital formation (GFCF), as well as social dimensions including access to essential services, public infrastructure, and the quality of governance [
1,
2,
3]. Foundational models such as Lee’s push-pull framework [
1] and de Haas’s migration transition theory [
4] have conceptualised migration as an emergent outcome of interactions between origin and destination contexts, shaped by economic disparities and opportunities for social mobility. Moreover, recent studies have expanded the scope of analysis to include environmental drivers—most notably greenhouse gas emissions (GHGE)—as significant push factors, contributing to the development of the “climate migration” paradigm [
5,
6,
7].
Recent literature has also emphasised that migration is not solely determined by economic and environmental factors but is profoundly shaped by socio-political conditions, such as the quality of democratic governance, the degree of minority inclusion, and the existence of active integration policies for migrants [
8,
9,
10,
11,
12,
13,
14]. These socio-political dimensions influence not only the attractiveness of destination regions but also the resilience and adaptability of migrants within host societies, thereby interacting with economic and environmental variables in complex and context-specific ways.
Within the European context, scholarly attention has increasingly focused on Romanian migration flows, particularly in relation to labour market integration, welfare access, and the broader implications for social cohesion within host societies [
15,
16,
17,
18]. However, despite the abundance of studies addressing individual facets of migration—be it economic, environmental, or social—there is a paucity of research adopting a holistic, systems-oriented approach that integrates these dimensions into a unified analytical framework. Moreover, the extant literature has largely overlooked the potential of advanced analytical techniques, such as machine learning and intelligent information systems, in capturing the complex and non-linear interdependencies that characterise migration phenomena in the European Union.
This study aims to bridge this gap by developing a comprehensive analysis that synthesises economic, environmental, and service access variables, employing state-of-the-art machine learning methodologies—including K-Means, Expectation-Maximisation (EM), and M5P decision tree regressors—to uncover latent patterns and predictive relationships between gross value added (GVA), gross fixed capital formation (GFCF), greenhouse gas emissions (GHGE), and migration flows in EU Member States. By integrating these advanced analytical techniques within an intelligent information system framework, the study contributes to the international literature by: introducing a systemic perspective on migration that captures the synergistic effects of economic and environmental drivers; applying modern clustering and regression techniques to reveal hidden heterogeneity across countries and regions; and proposing policy-relevant insights for reducing regional disparities and enhancing socio-economic resilience in the European Union.
Approaching this topic from a multidimensional perspective—economic, social, and security—allows for a comprehensive analysis of the factors driving migration and offers solutions to increase the economic and social resilience of the European Union. The research has the following specific objectives:
O1. Analyse the relationship between economic indicators and migration to identify specific patterns that determine migration flows;
O2. Assess the impact of greenhouse gas emissions on population mobility;
O3. Determine the influence of access to essential services on economic and social stability;
O4. Identify clusters of countries and regions by economic and migration profile;
O5. Develop a predictive model to assess migration trends;
O6. Assess the implications of modelling for improving economic and social security;
O7. Integrate decision tree models into a decision information system for analysing and forecasting migration flows as a function of socio-economic and climatic variables;
O8. Analyse the potential of intelligent information systems (IIS) in the formulation of public policies on migration and equitable access to services within the European Union.
To operationalise these contributions, the study addresses the following research questions:
Q1: What are the systemic relationships between economic indicators (GVA, GFCF), environmental factors (GHGE), and migration flows in EU Member States?
Q2: How does unequal access to essential services, including healthcare and digital infrastructure, shape migration decisions across different socio-economic contexts?
Q3: To what extent can machine learning algorithms identify clusters of countries with similar economic and migration profiles, and how might these clusters inform differentiated policy responses aimed at enhancing social and economic security?
The study contributes to the literature by integrating a multidisciplinary approach that links economic factors, migration, and social security through advanced data modelling methods. Unlike previous research that analyses these variables independently or within limited theoretical frameworks, the present study: (i) takes an integrated approach to economic and environmental factors in migration analysis; (ii) uses advanced machine learning techniques to identify migration patterns; (iii) analyses economic and social security from the perspective of regional imbalances; and (iv) develops a predictive model, designed as a component of an intelligent information system (IIS), to assess the impact of economic factors on migration. The research combines macroeconomic indicators such as gross value added (GVA) and gross fixed capital formation (GFCF) with environmental factors such as greenhouse gas emissions (GHGE) to assess the cumulative impact on population mobility. The study applies clustering methods (K-Means and Expectation-Maximisation) and models based on the M5P regression tree, which allow a deeper analysis of the complex relationships between variables and overcome the linearity constraints of traditional econometric models.
The research proposes an interpretation of migration not only as a demographic phenomenon, but also as an indicator of economic and social vulnerabilities, emphasising how inequalities in access to services (health, measured by SHA11; digital infrastructure, measured by HIAC) can amplify the risks of instability. At the same time, the study introduces a framework for predicting population mobility based on economic and environmental variables which, in combination with an intelligent information system, provides policy makers with a useful tool for formulating proactive policies to manage migration flows.
Through these contributions, the study not only complements the specialised literature, but also showcases the potential of the components of the integrated information system to provide an innovative and applied perspective on the interdependencies between economy, migration, and security, generating solid premises for informing public policies at national and European level.
2. Literature Review
Research on the relationships between economic factors, social sustainability, and internal or international migration has grown significantly over the last decade, as population mobility has become a sensitive indicator of structural imbalances in contemporary societies. Recent studies increasingly focus on the combined effects of economic development, investment in social infrastructure, environmental degradation and access to essential services on migration decisions. This section critically reviews the relevant academic contributions, focusing on the complexity of the interactions between macroeconomic indicators and migration behaviour in the European Union.
2.1. The Correlation Between Economic Growth and Migration
The literature on the relationship between economic growth and migration traditionally starts from the push–pull model developed by Everett Lee [
1], according to which the migration decision is simultaneously influenced by factors that push people out of their areas of origin and factors that pull them towards more favourable destinations. Janicki and Ledwith [
19] argued that economic development initially provides the population with the resources needed for mobility, such as financial and information capital, which may lead to a temporary increase in emigration before flows stabilise. De Haas [
4] confirmed this hypothesis with the ‘migration transition’ model, according to which economic development does not immediately reduce migration but may, on the contrary, stimulate it in the early stages, especially in regions where frustration about the lack of real opportunities accumulates despite rising macroeconomic indicators.
In the economic literature, economic development is frequently analysed as a determinant of migration, as a structural element influencing people’s mobility decisions. Studies underline the ambivalent nature of this impact: in destination countries, economic growth is associated with employment opportunities, stability, and favourable integration prospects, which contribute to attracting migrants [
20,
21]; instead, in regions of origin with unequal distribution of the benefits of development, economic expansion can accentuate internal disparities and stimulate migration, especially where growth is concentrated in favoured urban areas or sectors at the expense of peripheral or underdeveloped areas [
3,
22].
In the European context, recent research shows that economic development, analysed in isolation by global indicators, does not provide a complete picture of the causes of migration [
23]. In this context, variables such as the level of income inequality, the unemployment rate, and the territorial distribution of investment need to be integrated to fully understand the dynamics of population mobility [
24]. A strictly quantitative perspective on economic growth risks ignoring the phenomena of inequitable development, where economic progress is not accompanied by a commensurate improvement in living conditions for all social groups, thus generating collective frustrations and strong migratory pressures. Recent research suggests that the economic impact on migration needs to be analysed in conjunction with institutional and policy factors. Some studies emphasise that reforms in public infrastructure, investment policies and support for entrepreneurship can alleviate migratory pressures even in the context of modest economic performance [
25,
26], while the absence of proactive policies can lead to massive emigration from countries that are converging nominally but lack social inclusion [
27].
The relationship between economic development and migration does not follow a linear path, being profoundly influenced by the structural context, the level of institutional development and the capacity of the formal economy to integrate and sustain population mobility. Thus, the recent literature argues in favour of a theoretical reconfiguration of the push-pull model towards a multilateral paradigm in which migration is understood as the emergent outcome of complex interactions between the economic, social and territorial dimensions of development.
2.2. The Impact of Greenhouse Gas Emissions and Climate Change on Population Mobility
Recent studies emphasise that environmental degradation and intensifying climate change are contributing to population migration, leading to long-term relocations and significant changes in the territorial distribution of the working population. The literature conceptualises this phenomenon as “climate migration” or “adaptive migration”, underlining that mobility forced by environmental factors is no longer a theoretical hypothesis but an empirically documented reality, especially in regions exposed to ecosystem degradation, over-industrialisation and climate risk intensification [
5,
28,
29,
30,
31].
Authors Ochi and Saidi [
32], Gyamfi et al. [
33], and Deveci et al. [
34] have shown in their studies that greenhouse gas emissions (GHGE) are both a symptom of pressure on natural resources and an indirect driver of population mobility, especially in densely populated and industrialised urban areas. According to other studies [
35,
36,
37], adaptive migration is driven by the progressive degradation of the natural environment, affecting livelihoods, agriculture, food security, and access to basic resources such as water. This form of migration is often internal but can take transnational dimensions in contexts of systemic vulnerability.
Without proactive interventions, climate change-induced migration risks amplifying territorial inequalities and exacerbating social tensions. From this perspective, mainstreaming the territorial dimension and social justice into climate policies becomes an imperative for the long-term sustainability of the European Union.
2.3. Access to Essential Services and Economic and Social Stability
Unequal access to essential public services is a major determinant of migration decisions, as it directly influences quality of life and perceptions of social security. This line of research focuses in particular on health services and digital infrastructure, two fundamental pillars of modern socio-economic inclusion. Studies [
38,
39,
40,
41,
42] argue that systemic deficits in access to these services not only affect individual well-being, but also act as push factors, driving population mobility from disadvantaged regions to those better equipped in institutional and technological terms. Studies by Jia [
43], Peng and Ling [
44], Renzaho et al. [
45], and Bossavie and Özden [
46] emphasise that migration is significantly influenced by the quality of basic services available in the countries of origin and destination, especially in health and education.
According to OECD analyses [
47,
48], public spending on health is a proxy indicator of the stability of the social system, and the lack of investment in this area increases feelings of insecurity and social exclusion, stimulating intentions to migrate.
Access to digital infrastructure, as measured by indicators such as the share of households with an internet connection, is increasingly associated with economic participation in the digitised society [
49,
50,
51].
Regional inequalities in the distribution of these services generate risks of exclusion, deepen economic and social disparities and accentuate territorial polarisation within the EU. For this reason, public interventions should not only aim at increasing investment, but also at tailoring service provision to the specific needs of each region, with a view to strengthening social cohesion and reducing migration due to lack of basic opportunities.
A growing body of research has focused on Romanian migration to Western Europe, highlighting the complex interplay between structural conditions, social networks, and individual agency. Vlase [
52], for instance, examined the gendered dimensions of migration between Vulturu (Romania) and Rome, Italy, revealing how migration is shaped not only by economic necessity but also by kinship networks and gendered expectations. Similarly, Anghel and Horváth [
53] have contributed significantly to the theorisation of Romanian migration through case studies that emphasise the importance of social capital and transnational practices in sustaining mobility and facilitating adaptation in host countries. These perspectives underscore the need to approach migration not as a unidirectional economic process but as a socially embedded phenomenon with enduring cross-border ties. Recent studies have further documented how Romanian migrants experience integration in Western Europe in ways that are often mediated by cultural stereotypes, institutional discrimination, and uneven access to public services. Dragan et al. [
54], in a nationwide survey on public perceptions of immigrant integration, found that both host societies and migrant communities navigate integration through a complex mix of expectations, informal norms, and structural constraints. This literature points to the need for more grounded and context-specific analyses that take into account the lived experiences of Romanian migrants, particularly in relation to healthcare, housing, and digital inclusion. Moreover, as Cosciug [
55] have argued, Romanian migration has increasingly taken the form of circular or pendular mobility, facilitated by EU citizenship rights and evolving labour market dynamics. Their findings highlight the blurred boundary between temporary and permanent migration, as well as the growing importance of governance frameworks that can accommodate transnational lives. Integrating these perspectives into broader migration modelling approaches contributes to a more accurate and policy-relevant understanding of intra-European mobility, particularly in light of Romania’s dual status as both a sending and a transitional country.
2.4. Migration Analysis and Prediction Models
The literature on migration modelling has evolved significantly in recent decades, reflecting a transition from classical econometric techniques to modern machine learning and artificial intelligence methods. Earlier studies relied predominantly on linear or logistic regression models to estimate the impact of economic and social factors on migration flows; these models provide a robust causal interpretation but are often limited by the assumptions of linear relationships and independent explanatory variables.
Recent contributions in the field of migration modelling have underscored the need to move beyond purely economic or environmental determinants by incorporating socio-political dimensions that more accurately reflect the multifaceted nature of population mobility. The integration of variables such as political stability, cultural norms, and migration governance frameworks has been highlighted as essential for capturing the institutional and normative contexts in which migration decisions are made. As emphasised by Meyers [
56], the inclusion of governance and policy-related factors enables a more nuanced understanding of the systemic drivers underlying population movements, thereby strengthening the analytical depth of empirical studies. In parallel, extending the temporal dimension through the use of longitudinal data is increasingly recognised as a critical methodological enhancement. Elhorst [
57] illustrates that panel and time-series approaches provide a more robust framework for analysing dynamic migration patterns, revealing how economic and policy shifts unfold over time and influence mobility in both the short and long term. These insights suggest that combining socio-political indicators with temporal depth can significantly improve both the explanatory power and policy relevance of migration models. In addition to refining variable selection and temporal design, the literature has also stressed the importance of methodological rigour in the validation of machine learning models employed in migration studies. James et al. [
58] advocate for systematic validation procedures such as cross-validation and sensitivity analysis, which are indispensable for ensuring the stability, predictive accuracy, and generalisability of model outputs. Without such techniques, empirical results may risk overfitting or lack reproducibility in diverse policy environments. Moreover, the translation of analytical findings into concrete policy recommendations represents a critical dimension of applied migration research. Carling and Collins [
59] argue that connecting empirical typologies—such as those derived from clustering and predictive models—to differentiated intervention strategies is fundamental for informing targeted responses at regional, national, or supranational levels. This entails not only identifying systemic relationships and structural profiles but also operationalising them into actionable public policy. The relevance of this integrative approach has been further demonstrated in recent work by Chattoraj and Ullah [
60], who analysed the interplay between labour mobility and foreign direct investment in India during the COVID-19 pandemic, highlighting how crisis conditions reshape migration patterns through both institutional and economic channels. Altogether, the literature points toward an integrated research agenda that combines socio-political contextualisation, longitudinal design, rigorous model validation, and policy translation to advance both the methodological sophistication and practical utility of migration modelling in contemporary settings.
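The validation procedure mentioned above can be illustrated with a minimal k-fold cross-validation sketch. It is not taken from the study: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `predict_mean`), the toy data, and the mean-predictor baseline are all hypothetical, and the contiguous, unshuffled folds are a simplifying assumption.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, fit, predict, k=5):
    """Return the mean absolute error measured on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training part, then score on the unseen fold.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [abs(ys[i] - predict(model, xs[i])) for i in test_idx]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical baseline model: always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


# Illustrative toy data, not Eurostat values.
xs = list(range(10))
ys = [2.0 * x for x in xs]
scores = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
print(scores)
```

Any model exposing a `fit`/`predict` pair can be passed in place of the baseline; a large spread between fold scores is one sign of the instability that such validation is meant to detect.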
Specialised studies highlight the advantages of non-linear predictive models, such as decision-tree ensembles (Random Forest) [
61,
62,
63,
64], support vector machines (SVM) [
65,
66], and M5P regressors [
67,
68,
69], which can capture non-linear relationships and complex interdependencies between explanatory variables. These studies report that such models estimate migration flows more accurately and more flexibly than classical techniques. On the other hand, the literature on clustering algorithms, such as K-Means and Expectation-Maximisation (EM), highlights their ability to identify latent patterns in the structure of socio-economic data. Dai et al. [
70] have shown that these methods allow for a more refined segmentation of the populations analysed, contributing to a more precise regionalisation of public policy interventions in the field of migration. This approach is particularly useful in the context of spatial analysis and differentiated assessment of push and pull factors at the regional level.
Machine learning models have been successfully applied in recent studies on delayed migration, seasonal migration, and circular migration. Recent studies in the literature [
71,
72,
73,
74] emphasise that machine learning models, especially those based on decision trees such as Random Forest, provide a superior framework for migration analysis through their ability to capture complex, seasonal and non-linear relationships between economic, demographic, climatic, and institutional factors. These methods are particularly effective in identifying regional and temporal variations in migration behaviour and are more adaptable than traditional deterministic models, which assume stable and linear relationships between variables. In parallel, recent applications of artificial neural networks and advanced architectures, such as LSTM (Long Short-Term Memory), contribute to modelling migration as a sequential and context-dependent process, especially in situations characterised by climate uncertainty, economic instability or recurrent socio-political pressures [
75,
76,
77,
78]. These networks are able to learn from extended time series and generate robust forecasts of future migration trends, including circular or seasonal migration. The use of such models also allows the integration of heterogeneous and unstructured data sources such as satellite imagery, mobility data, meteorological data, or information from social networks, thus enhancing a multidimensional and predictive approach to migration. Machine learning techniques allow the integration of large volumes of unstructured and multi-source data, including spatial, demographic, environmental, and socio-economic data, creating a holistic analytical framework for anticipating migration trends [
79,
80,
81]. This methodological flexibility is essential in the context of multiple crises and the accelerated dynamics of contemporary migration, where traditional causal relationships are often disrupted by unpredictable and non-linear factors.
The literature review highlights that modern classification, clustering and regression techniques, such as M5P, ANN, Random Forest, or Bayesian models, offer valuable insights into migration through their ability to capture the deterministic and evolutionary complexity of this phenomenon. The integration of these methods into empirical studies is becoming indispensable for the formulation of evidence-based public policies capable of anticipating and managing migration phenomena more effectively in the European area.
2.5. Implications of Migration on Economic and Social Security
Contemporary literature on the implications of migration for economic and social security has expanded considerably, reflecting the complexity of the direct and indirect effects of population mobility on systemic stability. This line of research examines in an integrated way the influence of migration on the labour market, social cohesion, the risks of exclusion and radicalization, and the effectiveness of economic integration strategies within the European Union. According to recent studies [
82,
83,
84], migration can generate imbalances in the structure of the labour market, especially when it occurs in an uncoordinated manner or in the absence of active integration policies. Massive and rapid migration can put pressure on social protection systems, on already low wages, and on the employment of domestic workers in vulnerable sectors [
85,
86]. At the same time, difficulties in integrating the migrant population into the formal sector of the economy, caused by administrative barriers, lack of recognition of qualifications, or institutional discrimination, often encourage employment in the informal economy [
87,
88,
89]. This situation increases job insecurity, the economic vulnerability of migrants and contributes to social tensions in destination communities.
Several recent empirical studies [
90,
91,
92,
93] have confirmed that the economic effects of migration are profoundly mediated by national institutional policies and the capacity of labour markets to absorb and integrate newcomers efficiently. Other research [
94,
95,
96] has also shown that long-term labour market effects depend crucially on the compatibility between the profile of migrants and the structure of domestic demand for skills, which calls for active policies for vocational training and skills recognition.
From a social point of view, the literature emphasises that migration influences the level of cohesion in communities, especially when it takes place under conditions of residential segregation, limited access to public services or polarising public discourses [
97,
98,
99]. Other literature [
100,
101,
102] has argued that migration needs to be conceptualised as a dimension of systemic sustainability, as its mismanagement can undermine long-term trust in public institutions, sense of belonging and social solidarity. In addition, authors Hossain [
103] and Letsch [
104] have emphasised the importance of intercultural dialogue and community interventions in mitigating perceptions of competition and preventing social conflict. Other studies [
105,
106,
107] have emphasised local population perceptions of resource redistribution and the impact on public services, identifying these dimensions as crucial factors in the acceptance or rejection of migrants. The literature argues in favour of a more reflexive and integrative migration governance capable of responding to the real dynamics of the migration phenomenon [
108,
109,
110]. This governance should be adaptive, include participatory processes, and focus on alleviating inequalities and building inclusive capacities at local, national and European levels.
A review of the literature shows a reconceptualization of migration from a systemic perspective, in which human mobility is seen not as a singular threat, but as a complex phenomenon with the potential for positive socio-economic transformation, if managed through proactive, inclusive, and coordinated policies at European level. This type of approach not only fosters economic and social security but also strengthens cohesion in the context of cultural pluralism and global interdependence.
2.6. The Implications of Intelligent Information Systems for Enhancing Social and Economic Security
Recent studies in the field of digital transformation increasingly emphasise that information systems are a key enabler for enhancing economic and social security, especially in the context of intensifying global challenges and increasing pressure on public infrastructures [
111,
112,
113,
114]. According to [
111], investment in social security, when supported by a coherent and integrated information system, facilitates human capital accumulation and fosters sustainable economic growth. Along similar lines, the authors of [
112] draw attention to how intelligent information technologies can support the achievement of sustainable development goals, while also pointing out the ethical risks associated with their use in socially sensitive areas. In addition, the digital business literature argues that accelerated digitisation, coupled with the strategic use of information systems, has a positive impact on organisational performance and sustainability [
113,
115]. Applied supply chain studies indicate that technology integration supports economic resilience and responsiveness to external shocks [
113,
116]. Research by Clark et al. [
117] also highlights the need to include digital connectivity in the design of sustainable development policies, recognising information technology as a balancing element in the socio-economic architecture.
Overall, the literature converges on the idea that information systems can no longer be seen exclusively as operational tools, but must be regarded as strategic pillars of modern governance. They provide the capacity to collect and analyse large volumes of data, to inform evidence-based public decisions and to build adaptive policies, essential for ensuring economic and social stability in an environment characterised by uncertainty and complexity [
113,
116,
117].
3. Materials and Methods
Given the relevance of information systems in strengthening economic and social security, as highlighted in the literature [
111,
112,
114], this paper proposes an integrated analytical approach to investigate the relationships between information infrastructure, socio-economic policies and institutional resilience in the European context. The methodological rationale is based on the premise that information systems not only reflect socio-economic realities but can also generate predictive models useful in decision making, thus contributing to reducing structural vulnerabilities. The study proposes components of an intelligent information system (IIS) designed to analyse and predict migration patterns in the European Union based on socio-economic and environmental determinants. In the first stage, the data pre-processing module of the system ensured data consistency and accuracy by analysing the distribution of attributes and handling missing values. Key variables—gross value added (GVA), gross fixed capital formation (GFCF), health expenditure (SHA11), greenhouse gas emissions (GHGE), internet access (HIAC), and migration rate (MIGR)—were examined to identify possible biases that could affect the analytical results.
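As an illustration of the pre-processing stage described above, the following minimal sketch checks the share of missing values, fills the gaps, and standardises each indicator. The text does not state which imputation rule was used, so mean imputation is an assumption here, as are the function names and the toy records (which are not Eurostat values).

```python
from statistics import mean, pstdev

# Hypothetical toy records; None marks a missing observation.
rows = [
    {"GVA": 100.0, "GHGE": 8.0, "MIGR": 2.0},
    {"GVA": 140.0, "GHGE": None, "MIGR": 4.0},
    {"GVA": None, "GHGE": 12.0, "MIGR": 6.0},
]


def missing_share(rows, column):
    """Fraction of records in which the column is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_mean(rows, column):
    """Replace missing entries by the mean of the observed values (assumed rule)."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [r[column] if r[column] is not None else fill for r in rows]


def standardise(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


columns = ("GVA", "GHGE", "MIGR")
clean = {c: impute_mean(rows, c) for c in columns}
scaled = {c: standardise(v) for c, v in clean.items()}

for c in columns:
    print(c, round(missing_share(rows, c), 2), clean[c])
```

Standardising before clustering is a common precaution when indicators are measured on very different scales, since distance-based methods would otherwise be dominated by the largest-valued variable.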
The exploratory system analysis module revealed non-linear interdependencies between variables. Within the machine learning module, clustering techniques (K-Means and Expectation-Maximisation) were applied to segment the data into relevant subgroups, supporting the identification of vulnerable regions and migration typologies (O4). The use of the M5P regression tree in the decision module allowed the extraction of interpretable rules and complex relationships between economic factors and population mobility, significantly improving the predictive capacity of the system (O5, O6). By integrating the tree model into a digital decision architecture (O7), the system demonstrated the ability to simulate migration as a function of socio-economic and climatic conditions, providing early warnings of potential social instabilities. The results emphasise the strategic role of intelligent information systems in public policy making informed by the relationships identified (O8), particularly in the design of targeted interventions to reduce regional disparities and promote equitable access to services. This research shows how the architecture of intelligent information systems can strengthen the analytical capacity of institutions in addressing complex transnational phenomena such as migration, providing a solid, data-driven foundation for sustainable governance in the European Union. To this end, the research employs a combination of quantitative and qualitative methods, with a focus on the analysis of data from official Eurostat sources, coupled with machine learning applications to identify patterns in the evolution of relevant indicators. The applied component is underpinned by the possibility of integrating these data into an information system to support the simulation of public policy scenarios and the assessment of their impact on social cohesion and economic stability.
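The idea behind a model tree such as M5P, namely a tree whose leaves hold linear models rather than constants, can be sketched with a single split. The code below is a deliberately simplified illustration and not the M5P implementation used in the study: it has one predictor and one split, it chooses the split by the summed squared error of the two leaf regressions (an assumption made for brevity), and it omits pruning and smoothing. All names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def sse(xs, ys, model):
    """Sum of squared residuals of a fitted line."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_model_stump(xs, ys, min_leaf=2):
    """Find the threshold whose two leaf regressions give the lowest total error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        lx, ly = zip(*pairs[:i])
        rx, ry = zip(*pairs[i:])
        left, right = fit_line(lx, ly), fit_line(rx, ry)
        err = sse(lx, ly, left) + sse(rx, ry, right)
        if best is None or err < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, threshold, left, right)
    if best is None:
        raise ValueError("not enough distinct points to split")
    return best[1:]


def predict(stump, x):
    """Route x to a leaf, then apply that leaf's linear model."""
    threshold, left, right = stump
    a, b = left if x <= threshold else right
    return a + b * x


# Illustrative piecewise-linear toy data (not Eurostat values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 16.0, 15.0, 14.0]
stump = fit_model_stump(xs, ys)
print(stump[0], predict(stump, 2.5), predict(stump, 5.5))
```

Each leaf can be read as a rule of the form "if the predictor is below the threshold, apply this linear equation", which is the kind of interpretable output the decision module relies on.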
The integrated analytical approach rests on rigorous processing of the input data, i.e., macroeconomic indicators collected from the Eurostat platform for the period 2010–2023 (Table 1), followed by the identification of hidden patterns and the modelling of the relationships between economic variables and migration using advanced machine learning techniques.
As a first step, the data were pre-processed to ensure their accuracy and consistency by analysing the distribution of attributes and handling missing values. Particular attention was paid to the characterisation of each individual parameter, namely gross value added (GVA), gross fixed capital formation (GFCF), health expenditure (SHA11), greenhouse gas emissions (GHGE), internet access (HIAC), and migration (MIGR), in order to highlight possible imbalances or inconsistencies that could influence the results. After this preliminary stage, the relationships between the variables were examined by means of a scatter plot, which showed that no strictly linear relationships between the indicators could be identified. This motivated the use of more flexible machine learning techniques capable of capturing the complexity of interactions between variables. As the distribution of the data points suggested the existence of distinct structures, we opted for classification algorithms, in particular those based on decision trees. To identify hidden patterns in the data, two clustering methods were compared, namely K-Means and Expectation-Maximisation (EM). The K-Means algorithm grouped the instances into two clusters, which provided a first segmentation of the data according to the similarities between the variables. However, because this method tends to constrain clusters to spherical shapes, the EM method was also applied; it detected finer structures and identified four distinct clusters, characterised by differentiated values of GVA, GHGE and MIGR. The comparison indicates that the EM method is better suited to capturing the complexity of the data, since it identifies subgroups that were not clearly delineated in the K-Means results.
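The pre-processing step described above can be illustrated with a minimal sketch. The text does not state which imputation or scaling rule was used, so mean imputation and min-max scaling are assumptions here, and the function names and toy values are hypothetical (they are not Eurostat figures).

```python
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed rule)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical toy column with one missing value, for illustration only.
gva = [10.0, None, 30.0, 20.0]
clean = min_max(impute_mean(gva))
```

Applying the same two functions to each indicator column would yield the kind of normalised, gap-free table that the clustering and regression steps expect.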
Since the data distribution did not present clear separations, it was necessary to discretize the data, an essential step to optimise the training of classification models. In this direction, the M5P method was used, a decision tree model for regression, which allows the segmentation of the data into homogeneous subsets and the fitting of linear relationships specific to each subset. By applying this model to the relationship between GVA, GHGE and MIGR, several distinct regression functions were generated, each corresponding to a category of data identified during the learning process. The M5P model proved suitable as it captured non-linear relationships across the dataset while maintaining high interpretability through the regression equations specific to each data subset. The use of the EM technique showed a superior clustering ability compared to K-Means, while the M5P model allowed a more accurate prediction of migration trends in relation to the economic and ecological indicators considered. By applying these methods, the research provides a complex perspective on the interactions between economic factors and migration, demonstrating the need to use advanced data analysis techniques to capture non-linear correlations between variables. The proposed methodology aims not only to investigate the empirical relationships between the variables analysed, but also to harness the potential of information systems as tools for adaptive governance in a global landscape marked by rapid and uneven transformations and to support a deeper understanding of migration from an economic and security perspective with the aim of reducing regional disparities in the European Union.
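The idea behind a model tree, splitting the data and fitting a separate linear function in each subset, can be sketched as follows. This is a deliberately simplified, single-split illustration on one predictor and is not the M5P algorithm itself, which grows deeper trees and adds pruning and smoothing; the split criterion used here (total squared error of the leaf models) and all names and toy data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def sse(xs, ys, line):
    """Sum of squared residuals of a fitted line."""
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def fit_single_split_model_tree(xs, ys, min_leaf=2):
    """Try every split point and keep the one whose two leaf lines fit best."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        left, right = pairs[:i], pairs[i:]
        if left[-1][0] == right[0][0]:
            continue  # cannot split between identical predictor values
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        left_line, right_line = fit_line(lx, ly), fit_line(rx, ry)
        err = sse(lx, ly, left_line) + sse(rx, ry, right_line)
        if best is None or err < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (err, threshold, left_line, right_line)
    if best is None:
        raise ValueError("not enough distinct values to split")
    return best[1:]  # (threshold, left_line, right_line)

def predict_single_split(tree, x):
    """Route x to its leaf and evaluate that leaf's linear model."""
    threshold, left_line, right_line = tree
    a, b = left_line if x <= threshold else right_line
    return a + b * x

# Toy piecewise-linear data (hypothetical): rising up to x = 2, falling afterwards.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 2.0, 4.0, 2.0, 0.0]
tree = fit_single_split_model_tree(xs, ys)
```

On this toy data a single global line would fit poorly, whereas the two leaf equations reproduce both regimes, which is the property the text attributes to the model-tree approach.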
The dataset employed in this study combines harmonised indicators from Eurostat, ensuring a high degree of reliability, cross-national comparability, and methodological coherence across EU Member States. The inclusion of variables such as Gross Value Added (GVA), Gross Fixed Capital Formation (GFCF), greenhouse gas emissions (GHGE), healthcare expenditure (SHA11), household internet access (HIAC), and net migration balances provides a multidimensional framework that captures both structural and infrastructural determinants of mobility. One key advantage of the dataset lies in its standardisation, which facilitates robust statistical modelling and mitigates the risk of inconsistencies stemming from national reporting biases. Moreover, the use of recent data (2010–2023) allows the study to reflect post-crisis recovery dynamics and the early effects of digital and green transitions within the EU. At the methodological level, the use of the M5P regression tree model and the Expectation-Maximisation (EM) clustering algorithm offers several benefits. The M5P model allows for the identification of non-linear interactions and provides interpretable rule-based structures, making it suitable for policy-oriented analysis. Meanwhile, the EM algorithm is capable of probabilistically assigning countries to latent clusters based on continuous distributions, offering greater granularity in capturing heterogeneous national profiles. However, the study is not without limitations. The cross-sectional structure of the dataset restricts the temporal dimension of the analysis, thereby limiting the capacity to model dynamic changes or delayed policy effects. While the model identifies robust patterns, it does not account for causality or endogeneity between variables. Furthermore, the exclusion of socio-political variables (such as institutional quality, governance indicators, or migration policy frameworks) due to data unavailability narrows the scope of explanatory power.
Similarly, the reliance on secondary data, while methodologically justified, may omit qualitative dimensions of migrant experiences that are critical for understanding integration processes and subjective motivations. Despite these limitations, the model provides a statistically sound and policy-relevant foundation for understanding intra-European migration patterns, which can be further refined in future research through longitudinal and mixed-method approaches.
4. Results
The period analysed (2010–2023) was marked by multiple crises (sovereign debt crisis, 2015 refugee crisis, COVID-19 pandemic, post-2022 energy crisis), which accentuated the existing structural inequalities in the EU. The distribution analysed confirms the persistence of an uneven development pattern (
Figure 1), where European cohesion policies have not yet succeeded in ensuring real convergence between the western regions and the eastern or southern ones. Moreover, the green and digital transition is adding additional pressure on countries with outdated infrastructures and insufficiently adapted human capital, boosting economic migration and exacerbating social tensions.
For the economic and social variables, the distribution shows specific features. The “GVA” indicator (gross value added) is strongly asymmetric: most values are concentrated in the lower ranges, while a small number of dominant economies, namely the western core of the EU (Germany, France, the Netherlands), record significantly higher values and dominate the European economic space. The same trend is observed for “GFCF” (gross fixed capital formation), where most values are located in the lower part of the distribution, indicating large discrepancies between the economies analysed. The variable “SHA11” (health expenditure) shows a heterogeneous distribution, with a large number of values concentrated at the extremes, suggesting significant differences between countries in terms of budget allocations to this sector. This variation accentuates social inequalities and contributes to migration in search of better living conditions and social protection. Similarly, “HIAC” (household internet access) shows an upward distribution, indicating a steady increase in internet penetration in most countries. Differences between countries indicate persistent challenges in digital cohesion, which indirectly influence migration decisions, especially in the context of the transition to a digital economy. The “GHGE” indicator (greenhouse gas emissions) shows an uneven distribution, with high values for a small number of industrialised countries. This reflects a concentration of polluting activities in countries with heavy industries and heavily used energy infrastructure. At the same time, these emissions are an indirect indicator of sustainability pressures, being correlated with risks of climate migration or a shift in public policy towards greening. The “MIGR” (migration) variable has an unbalanced distribution, with only a few countries recording large migration flows, either as destination or source.
This suggests a concentration of migratory pressures in developed regions, reflecting complex push–pull dynamics fuelled by differences in economic, social, and quality of life opportunities between EU countries.
The scatter plot of attributes for the migration class shown in
Figure 2 highlights the lack of strictly linear relationships between the explanatory variables (such as GVA, GHGE, SHA11) and the MIGR target class. This absence of linearity implies that the economic and social relationships driving migration are much more complex than a classical linear regression model would allow.
The non-linear observations suggest the existence of contextual and non-linear interactions between economic indicators and migration, probably influenced by structural factors such as regional inequalities, the absorptive capacity of public services or environmental pressures. At the same time, the visible emergence of distinct clusters in
Figure 2 indicates the formation of groups of countries with similar migration profiles and close economic characteristics, which justifies the use of classification algorithms such as decision trees or hybrid models (M5P) capable of capturing non-linear relationships and allowing logical segmentation of the data. Thus, these patterns support the proposed systemic approach, where migration is seen not as an isolated effect of a single indicator, but as the result of complex interdependencies between economic development, social sustainability and structural disparities between EU Member States.
The application of the K-Means algorithm aimed to identify hidden patterns in the economic and social data of the EU Member States over the period 2010–2023, focusing on the relationship between gross value added (GVA), greenhouse gas emissions (GHGE) and migration (MIGR). This approach supports the systemic approach, where migration is analysed as an emergent outcome of complex relationships between economic development, environmental pressures and social dynamics. The unsupervised machine learning technique (based on the K-Means algorithm) was applied on a set of 378 observations, containing the normalised variables GVA, GHGE, and MIGR (
Table 2). The objective was to automatically cluster countries according to the similarities between these attributes, without assuming linear relationships or predefined patterns. The algorithm determined two optimised clusters based on Euclidean distance and internal variance minimization, which allowed data segmentation according to the proximity of values.
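The assignment and update steps that K-Means alternates between (nearest centroid by Euclidean distance, then recomputing each centroid as the mean of its members) can be sketched as below. This is a generic illustration of the algorithm, not the configuration used in the study; the initialisation by random sampling, the seed, and the toy three-dimensional points standing in for normalised (GVA, GHGE, MIGR) triples are assumptions.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-Means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical normalised (GVA, GHGE, MIGR) triples, for illustration only.
points = [
    (0.10, 0.10, 0.10), (0.20, 0.10, 0.15), (0.15, 0.20, 0.10),
    (0.90, 0.80, 0.90), (0.85, 0.90, 0.80),
]
labels, centroids = kmeans(points, k=2)
```

Because every point receives exactly one label, the result is a hard partition, which is the limitation the later comparison with EM addresses.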
According to the data presented in Table 2, the results define two clusters with distinct economic and social profiles, relevant for the interpretation of migration in the EU. Cluster 0 comprises 77% of the observations and includes the countries of the EU economic core (Germany, the Netherlands, France, Belgium, etc.), characterised by strong economies, investment in green technologies, and migration absorption capacity. They reflect a favourable combination of economic development and social sustainability. Cluster 1 comprises 23% of the observations and includes peripheral economies with polluting industrial infrastructure and economic performance below the European average, consisting of Eastern and South-Eastern European countries (Romania, Bulgaria, Greece). Within this cluster the migration indicator is low, which indicates dominant emigration and low attractiveness for immigrants. During the period under analysis, the EU has faced multiple challenges that have accentuated structural differences between Member States, and the K-Means analysis captures the emerging economic binary between the prosperous “core” and the vulnerable “periphery”. Cluster 0 represents the core of resilience, while Cluster 1 signals areas of systemic fragility. This segmentation highlights that migration in the EU cannot be understood in isolation, but only in interdependence with economic performance and environmental impacts (Figure 3).
Additional tests were performed using 10-fold cross-validation applied to the M5P regression model on the complete dataset of 378 observations. The results confirm the model’s strong predictive performance, with a correlation coefficient of 0.878 and relative errors (RAE and RRSE) below 19% (
Table 3).
The mean absolute error (3.11) and root mean squared error (6.12) remain within acceptable bounds, reflecting both precision and stability in the prediction of migration flows. These validation outcomes demonstrate a high degree of generalizability and reinforce the robustness of the model across the dataset. The consistency of performance metrics supports the conclusion that the M5P model is well suited for capturing non-linear relationships between macroeconomic indicators and population mobility within the European Union.
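For readers unfamiliar with the reported metrics, the following sketch computes them from paired actual and predicted values using their standard textbook definitions: RAE and RRSE compare the model's errors with those of a baseline that always predicts the mean of the actual values. Whether the software used in the study takes that mean from the training or the test fold is not stated, so that detail is an assumption; the function name and toy numbers are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Correlation, MAE, RMSE, RAE and RRSE for paired value lists."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "r": cov / math.sqrt(var_a * var_p),          # Pearson correlation
        "mae": abs_err / n,                           # mean absolute error
        "rmse": math.sqrt(sq_err / n),                # root mean squared error
        "rae": abs_err / sum(abs(a - mean_a) for a in actual),  # relative absolute error
        "rrse": math.sqrt(sq_err / var_a),            # root relative squared error
    }

# Toy example (hypothetical values): every prediction is off by exactly +1.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

Values of RAE and RRSE below 1 (or below 100% when expressed as percentages) mean the model outperforms the mean-only baseline, which is how figures such as "below 19%" should be read.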
The use of K-Means helps overcome the limitations of traditional models and provides a framework for differentiated policies: for Cluster 1 countries, policies should aim at investing in green infrastructure, attracting human capital and reducing emissions, while for Cluster 0, the focus can be on integrating migrants and maintaining ecological balance. The K-Means results outline a conceptual map of economic and social fragmentation in the EU, indicating that social sustainability cannot be achieved without balancing economic development and environmental impacts. These results, which relate to the study’s objective O1, capture the complexity of the relationships between economy, environment and society.
The application of the Expectation-Maximisation (EM) clustering method represents the second step in uncovering hidden patterns between macroeconomic indicators and the migration phenomenon (Table 4), in line with O4, in a European context marked by economic and social tensions and by the transition towards sustainability that characterised the period under study. EM is a probabilistic soft clustering algorithm. Unlike K-Means, which relies on distances and favours spherical clusters, EM groups the data according to probability distributions and can therefore accommodate non-spherical and overlapping clusters. This makes it applicable to the analysis of social and economic phenomena, which do not conform to rigid boundaries or uniform distributions.
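The contrast with K-Means can be made concrete with a minimal sketch of EM for a one-dimensional Gaussian mixture. It is only an illustration of the E-step (soft responsibilities) and M-step (weighted parameter updates); the study worked with several indicators, and the initial means, unit initial variances, variance floor, and toy data below are all assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, init_means, iters=50):
    """Fit a 1-D Gaussian mixture by Expectation-Maximisation."""
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k  # assumed starting variance
    resp = []
    for _ in range(iters):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from those probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances, resp

# Hypothetical toy data with two well-separated groups.
xs = [0.0, 0.2, 0.1, 5.0, 5.2, 5.1]
fit_weights, fit_means, fit_vars, resp = em_gmm_1d(xs, init_means=[0.0, 5.0])
```

Each row of `resp` sums to one, so an observation near a boundary can be shared between clusters instead of being forced into a single one.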
The application of the EM algorithm allowed the identification of four distinct clusters, each reflecting specific configurations of the interplay between economic development, environmental impacts and migration. Cluster 0 (20%) consists of developed, migration-attractive states with strong economies and active environmental policies (e.g., Germany, the Netherlands, Sweden). Cluster 1 (24%) associates countries with a weak economy, but also with a low environmental footprint due to a lack of industrial activity (such as Bulgaria and Romania), countries that are economically vulnerable but not necessarily environmentally vulnerable. Cluster 2 (10%) is made up of countries that are economically underdeveloped but heavily polluting due to outdated industries and lack of environmental policies. They may represent critical areas for social and environmental sustainability. Cluster 3 (46%) is an “intermediate” cluster, bringing together countries in transition or partial development with mixed economic and social performance (e.g., Poland, Hungary, Portugal). The application of the EM algorithm (
Figure 4) allowed not only a statistical segmentation, but also a modelling of systemic vulnerabilities (some countries being exposed for economic reasons, others for environmental reasons or both).
Comparative analysis of the results provided by the K-Means (
Figure 3) and Expectation-Maximisation (EM) clustering algorithms reveals key methodological and epistemological differences in the ability of each to capture the complexity of the relationships between economic indicators and migration in the European Union (
Figure 5).
We proceeded with the training of the M5P model tree in order to build an explicit and predictive representation of the relationship between macroeconomic indicators and migration in the European Union in the period 2010–2023. In line with O5, the M5P regression tree was trained on 378 instances, each containing information on GVA (gross value added), SHA11 (health expenditure), GHGE (greenhouse gas emissions) and MIGR (the target variable representing migration). The M5P model was preferred because it is able to handle non-linear relationships between macroeconomic variables and migration, while providing locally fitted linear models for each data subset (terminal nodes). More specifically, the model identified 18 significant subsets (leaves of the tree), for each of which a specific linear regression equation was constructed, highlighting the particular way in which combinations of GVA, SHA11, and GHGE influence the level of migration (Table 5).
The results highlighted in Table 4 demonstrate a high predictive ability of the model. The determined correlation coefficient is 0.9075, which indicates a very strong correlation between the predicted and actual migration values. The Mean Absolute Error (MAE) is 2.37%, indicating a low level of deviation between predicted and actual values. The Root Mean Squared Error (RMSE) is 5.08, signalling robust model accuracy. Of the 378 instances, the model directly utilised 280, ignoring 98 instances where the class was unknown. The 18 equations obtained show significant variations in the coefficients, reflecting the diversity in the structural relationships between economic variables and migration in the different economic and social contexts of the EU (Figure 6).
The M5P model (
Figure 6) provides a granular structuring of the migration phenomenon, which cannot be reduced to a global linear function, but must be analysed as a succession of local behaviours, adapted to the specificities of each category of countries. It validates and reinforces the central thesis of the article that migration in the European Union cannot be explained by a univocal linear causality, but must be understood as a systemic phenomenon, resulting from the complex and non-linear interaction between economic performance, environmental pressures and social investments, in a deeply fragmented European context specific to the period 2010–2023.
5. Discussion
Analysing the coefficients in the 18 local regression equations highlights the heterogeneity of the determinants of migration, which differs substantially from one subset of data to another. This variation is not random but reflects specific contextual patterns of Member States or European regions. The constant negative coefficient on SHA11 (health expenditure) in all equations suggests that the low level of investment in health has a push effect on the population, inducing individuals to migrate to countries with better performing and better funded health systems. This reinforces the link between migration and the social pillar of sustainability, in the sense that deficiencies in essential public services (such as health) amplify forced mobility, even within the European Union. The GHGE (greenhouse gas emissions) coefficient is positive in most of the equations (in some cases even with high values, e.g., +10.19), indicating that migration is partly driven by environmental factors, especially in regions where pollution and climate risks affect quality of life. This finding is consistent with the recent literature on environmental migration but provides an empirical nuance specific to the EU context, where climate policies are becoming increasingly important in modelling population mobility. The GVA (gross value added) coefficient, which ranges from close to zero to 0.613, indicates a dependent relationship between economic development and migration. In some subsets, GVA has a marginal influence (e.g., coefficient ≈ 0.001–0.015), suggesting that economic development below the European average creates barriers to migration flows. In other cases, GVA has a significant influence (e.g., coefficient > 0.6), signalling that migration may be attracted to regions with dynamic economies and real opportunities for labour market integration. Thus, GVA acts either as a pull factor or as a context-neutral variable, depending on its interaction with the other determinants.
The training of the M5P model and the interpretation of the resulting equations make it possible to move from a conventional view of migration, as a reaction to income differentials, to a holistic, multi-sectoral, and interdependent approach in which economy, environment, and public services together constitute structural determinants. The model supports the argument that the social sustainability of the European Union depends on the ability to understand and manage the systemic relationships that influence population mobility directly and indirectly.
Building upon the results of the Expectation-Maximisation algorithm, which revealed four distinct clusters of EU Member States characterised by heterogeneous combinations of economic strength, environmental vulnerability, and migration intensity, this section outlines a set of policy directions tailored to the specific needs and structural conditions of each group. The proposed measures aim not only to address the immediate socio-economic disparities, but also to lay the foundation for an integrated and sustainable approach to migration governance at both national and European levels.
For countries included in Cluster 0, generally characterised by high gross value added, low greenhouse gas emissions and significant inward migration, policy efforts should concentrate on reinforcing institutional mechanisms for migrant integration. These include investments in inclusive education and health systems, the development of targeted labour market integration programmes, and the deployment of digital infrastructure in underserved regions to reduce intra-national disparities and demographic pressure on urban centres. At the same time, the increasing burden borne by these countries in managing migration flows calls for the consolidation of European burden-sharing mechanisms that can equitably distribute responsibilities among Member States.
Cluster 1, encompassing economically weaker countries with low levels of industrial emissions, such as Romania or Bulgaria, reveals a configuration marked by structural underdevelopment and moderate emigration. In these contexts, national policies should prioritise the absorption of cohesion funds through simplified administrative procedures, the strengthening of public infrastructure and health systems, and the creation of employment opportunities through regional development programmes. Additionally, supporting circular migration schemes and incentivising the return of skilled diaspora members may help counteract the permanent outflow of human capital and foster local resilience. Cluster 2 includes states where economic stagnation coexists with high environmental stress due to outdated industrial structures and insufficient decarbonisation policies. These countries require targeted interventions through the Just Transition Mechanism of the European Union, focusing on the greening of industries, the requalification of the labour force, and the development of adaptive social protection systems. Moreover, the establishment of regional early warning systems to monitor environmental health risks and their potential to trigger climate-induced displacement is essential to anticipate and mitigate future migration shocks. Cluster 3 comprises transitional economies with partial convergence and moderate socio-economic vulnerabilities, such as Portugal, Hungary, or Poland. In these cases, policies should promote a more balanced regional development model, decentralising infrastructure investment and fostering public–private partnerships aimed at strengthening social cohesion in disadvantaged areas. Labour market integration strategies tailored to intra-European migration flows, as well as the harmonisation of digital and healthcare service access between rural and urban zones, are critical to preventing new patterns of territorial fragmentation.
The differentiated vulnerabilities revealed by the clustering analysis call for a shift from uniform policy prescriptions to a more nuanced governance model, grounded in empirical evidence and responsive to regional specificities. Translating the identified macro-structural patterns into actionable policy pathways strengthens the analytical contribution of the study and enhances its practical utility. In doing so, the research advances not only academic understanding but also the policy capacity to respond to the challenges posed by economic inequality, climate pressure, and social fragmentation in a post-crisis European Union.
The use of a regression tree decision model, implemented within the analytical component of the information system, facilitates the structured segmentation of complex economic data according to relevant decision thresholds. This branching allows the identification of subtle conditional relationships between explanatory variables and population mobility dynamics, leading to the formulation of robust predictive models. As the model progresses along its branches, each internal node reflects a categorisation decision based on an economic attribute, and the leaves become inference points, where local linear regressions provide quantifiable estimates of the economic impact.
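The branching logic described above, internal nodes that test one attribute against a threshold and leaves that hold a local linear regression, can be represented with a small data structure. The sketch below is illustrative only: the `Node` class, the single split on GVA, the leaf labels, and every threshold and coefficient are made-up placeholders, not values from the fitted model.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None   # split attribute (internal nodes only)
    threshold: float = 0.0            # go left when value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: str = ""                    # leaf label, e.g. "LM1"
    intercept: float = 0.0            # leaf regression intercept
    coefs: Optional[Dict[str, float]] = None  # leaf coefficients; None on internal nodes

def tree_predict(node, row):
    """Follow the splits down to a leaf, then evaluate its linear model."""
    while node.coefs is None:
        node = node.left if row[node.attribute] <= node.threshold else node.right
    return node.intercept + sum(c * row[k] for k, c in node.coefs.items())

def export_rules(node, path=()):
    """Flatten the tree into readable IF ... THEN rules, one per leaf."""
    if node.coefs is not None:
        terms = " ".join(f"{c:+g}*{k}" for k, c in node.coefs.items())
        cond = " AND ".join(path) if path else "TRUE"
        return [f"IF {cond} THEN {node.name}: MIGR = {node.intercept:g} {terms}"]
    a, t = node.attribute, node.threshold
    return (export_rules(node.left, path + (f"{a} <= {t:g}",))
            + export_rules(node.right, path + (f"{a} > {t:g}",)))

# Placeholder tree with invented numbers, for illustration only.
toy_tree = Node(
    attribute="GVA", threshold=0.5,
    left=Node(name="LM1", intercept=1.0, coefs={"SHA11": -2.0, "GHGE": 3.0}),
    right=Node(name="LM2", intercept=0.5, coefs={"GVA": 0.4}),
)
rules = export_rules(toy_tree)
estimate = tree_predict(toy_tree, {"GVA": 0.2, "SHA11": 0.5, "GHGE": 1.0})
```

Exporting the tree as plain rules is one way the leaf equations could be handed to a reporting module or dashboard without exposing the underlying model code.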
In addressing the first research question (Q1), the study reveals a set of complex non-linear relationships between macroeconomic indicators—specifically Gross Value Added (GVA) and Gross Fixed Capital Formation (GFCF)—and migration flows within the European Union, with greenhouse gas emissions (GHGE) serving as both a contextual modifier and a latent driver of population mobility. The findings suggest that economic growth alone does not uniformly reduce migratory pressures; instead, its impact is mediated by structural disparities, environmental constraints, and regional development asymmetries. The M5P regression model highlights the heterogeneity of these relationships across different subsets of countries, demonstrating that GVA may act as either a pull or a neutral factor depending on local configurations, while GHGE emerges as a significant positive predictor of out-migration, particularly in regions facing environmental degradation and low institutional adaptability. With regard to the second research question (Q2), the results confirm that disparities in access to essential services, such as healthcare expenditure (SHA11) and household internet access (HIAC), exert a measurable influence on migration decisions. The systematic underinvestment in health systems, particularly in peripheral Member States, is associated with strong push effects, as individuals seek both socio-economic stability and quality of life abroad. Similarly, gaps in digital infrastructure deepen territorial inequalities and constrain individual capabilities for socio-economic integration within their country of origin, thereby reinforcing migration incentives among the younger, digitally literate population segments. These findings underscore the need to conceptualise access to services not merely as passive background variables, but as active structural determinants in shaping mobility patterns and social resilience.
In response to the third research question (Q3), the application of clustering algorithms (K-Means and Expectation-Maximisation) has proven effective in identifying latent structures among EU Member States, resulting in typologies that distinguish between economically robust, environmentally sustainable cores and structurally vulnerable peripheries. The EM algorithm, in particular, captured a more nuanced segmentation into four clusters, each reflecting a distinct interplay between economic performance, ecological pressure, and migration intensity. These differentiated profiles provide a valuable empirical foundation for designing targeted policy interventions. Rather than adopting a uniform approach to migration governance, policymakers are encouraged to align their strategies with the specific structural configurations of each cluster, thereby enhancing the effectiveness, equity, and contextual relevance of public policy. The study, thus, advances not only the empirical understanding of migration dynamics, but also their translation into differentiated governance frameworks responsive to the Union’s internal heterogeneity.
The analytical results thus obtained are integrated into the decision support component of the information system through a mechanism of automatic updating of knowledge bases and decision rules. In line with O7, this is achieved either by linking the model directly to an interactive dashboard or by exporting the decision rules and predictions to a standardised reporting module. In both cases, the tree logic is translated into a format accessible to decision makers, either in the form of risk scores or as policy recommendations differentiated by socio-economic categories. Thus, visual-analytical components (e.g., risk maps, scenario graphs) become an integral part of the system interface, facilitating interactive simulations and prospective scenarios. Through this integration, the information system acquires an increased capacity for contextual adaptation and anticipatory response, providing a coherent, empirically validated framework for the development of effective, evidence-based public policies in line with the dynamics of the contemporary economic and social environment. A technological architecture for an IIS integrating the M5P model should be conceived as a modular, flexible, and scalable structure, capable of handling large volumes of heterogeneous data and providing advanced analytical support for the formulation of public policies in the field of migration and equitable access to services. According to O8, this architecture is organised into four key functional layers: data collection and integration, data storage and processing, predictive analytics, and interactive decision support.
At its core, the system is fuelled by a data acquisition layer, comprising administrative sources (such as population registers, public service record systems, tax databases), statistical sources (Eurostat, OECD, NSI), and alternative sources such as geospatial data or information generated through civic participation. These data are automatically retrieved through APIs or secure channels and are subject to a rigorous cleansing, validation and transformation process in an ETL (Extract-Transform-Load) environment integrated with data quality control rules.
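A minimal sketch of the transform and validation stages of such an ETL step is given below. The field names, the year-range rule (taken from the 2010–2023 study period), and the sample records are hypothetical; a production pipeline would add source-specific extraction and a real load target.

```python
REQUIRED = ("country", "year", "GVA", "GHGE", "MIGR")  # hypothetical schema

def transform(raw):
    """Trim strings, turn blanks into None, and cast numeric fields."""
    rec = {}
    for key, value in raw.items():
        key = key.strip()
        if isinstance(value, str):
            value = value.strip() or None
        if value is not None and key != "country":
            try:
                value = float(value)
            except (TypeError, ValueError):
                value = None  # unparseable entries are treated as missing
        rec[key] = value
    if rec.get("year") is not None:
        rec["year"] = int(rec["year"])
    return rec

def validate(rec):
    """Return a list of data-quality problems; empty means the record passes."""
    problems = [f"missing {field}" for field in REQUIRED if rec.get(field) is None]
    year = rec.get("year")
    if year is not None and not 2010 <= year <= 2023:
        problems.append("year outside 2010-2023")
    return problems

def run_etl(raw_records):
    """Split incoming records into loadable rows and rejected rows with reasons."""
    loaded, rejected = [], []
    for raw in raw_records:
        rec = transform(raw)
        problems = validate(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            loaded.append(rec)
    return loaded, rejected

# Invented sample records, for illustration only.
raw = [
    {"country": "RO", "year": "2015", "GVA": "1.5", "GHGE": "0.7", "MIGR": "-2.1"},
    {"country": "BG", "year": "2009", "GVA": "", "GHGE": "0.4", "MIGR": "n/a"},
]
loaded, rejected = run_etl(raw)
```

Keeping the rejected records together with their reasons, instead of silently dropping them, supports the data quality control the architecture calls for.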
The processed data are stored in a central data warehouse or, in complex cases, in a hybrid data lake optimised for analytical queries. This storage layer is directly connected to the analytical layer, where the M5P model is implemented within a machine learning engine, capable of running iterative training, validation, and model tuning cycles. The M5P model analyses the relationships between economic, social and demographic factors, structuring the information into regression trees with linear functions in leaves, so that the results are not only accurate but also interpretable by decision makers.
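To make the idea of "regression trees with linear functions in leaves" concrete, the following sketch shows how a model tree produces a prediction: internal nodes route an observation by a threshold test, and the leaf reached applies its own linear equation. It illustrates only the prediction structure, not the M5P training procedure, and every threshold and coefficient below is hypothetical rather than taken from the fitted model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf holding a linear model: intercept + sum(coef * feature)."""
    intercept: float
    coefs: dict[str, float]

    def predict(self, x: dict[str, float]) -> float:
        return self.intercept + sum(c * x[name] for name, c in self.coefs.items())


@dataclass
class Split:
    """Internal node: go left if x[feature] <= threshold, otherwise right."""
    feature: str
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x: dict[str, float]) -> float:
        branch = self.left if x[self.feature] <= self.threshold else self.right
        return branch.predict(x)


Node = Union[Leaf, Split]

# Hypothetical two-leaf tree; all numbers are illustrative only.
tree = Split(
    feature="gva",
    threshold=100.0,
    left=Leaf(intercept=5.0, coefs={"sha11": -0.5, "ghge": 0.02}),
    right=Leaf(intercept=1.0, coefs={"gva": 0.01, "sha11": -0.1}),
)

low = tree.predict({"gva": 80.0, "sha11": 4.0, "ghge": 50.0})
high = tree.predict({"gva": 200.0, "sha11": 8.0, "ghge": 50.0})
print(low, high)
```

Because each leaf is an ordinary linear equation, a decision maker can read off which indicators matter in a given regime and in which direction, which is the interpretability property the text emphasises.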
The findings of this study align with and extend several strands of the international migration literature. Consistent with previous research on structural determinants of intra-European mobility [
124,
125,
126,
127], our results confirm the persistent influence of macroeconomic disparities—particularly in terms of GVA and investment levels—on East–West migration patterns. However, the study also diverges from some earlier models by revealing the significant role of environmental stressors, such as greenhouse gas emissions (GHGE), as push factors—a dimension less emphasised in traditional economic migration models. Moreover, the identification of distinct national clusters through the EM algorithm echoes the typological approaches of De Haas [
4], but adds methodological refinement by applying unsupervised learning to capture latent structures based on a multidimensional set of indicators.
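The way an EM algorithm recovers latent groups can be sketched on a deliberately simplified case. The example below fits a two-component Gaussian mixture to synthetic one-dimensional values; the study itself works with a multidimensional set of indicators and four clusters, so this is an illustration of the mechanism under stated simplifications, not a reproduction of the analysis.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    # Crude initialisation: place the two means at the extremes of the data.
    means = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: soft membership of each point in each component.
        resp = []
        for x in data:
            dens = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the memberships.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor guards against a collapsed component
    return means, variances, weights, resp


# Synthetic values forming two visibly separated groups.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
means, variances, weights, resp = em_two_gaussians(data)
# Hard labels are obtained by taking the most probable component per point.
labels = [0 if r[0] >= r[1] else 1 for r in resp]
print(means, weights, labels)
```

Unlike K-Means, which assigns each observation to exactly one cluster, EM yields membership probabilities, which is what allows it to capture borderline or hybrid profiles.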
Notably, while many existing studies have addressed Romanian migration in qualitative or country-specific terms [
52,
53,
55], our research contributes by offering a comparative and data-driven framework that situates Romania within broader EU-wide patterns. The segmentation of EU Member States into four clusters provides empirical evidence that supports—but also nuances—the often-assumed East–West dichotomy in European migration. This typology reveals the existence of hybrid profiles, where countries with similar economic output may differ substantially in terms of service access or environmental performance, thereby influencing migration in more complex ways than previously described. Thus, our results not only reinforce established findings, but also advance the field by proposing a replicable methodological approach for integrating economic, infrastructural, and environmental dimensions into the analysis of contemporary mobility dynamics.
The results generated by the model are forwarded to the decision support layer, where they are rendered by an advanced visualisation module as interactive dashboards, thematic maps, customised reports, and policy simulators. The user interface is designed to give access not only to descriptive results but also to prospective scenario testing, in which users modify input variables and observe the estimated impact on indicators of social inclusion or access to services. The whole system is orchestrated through a microservices-oriented architecture, which allows easy integration of new analytical modules or data sources and is secured according to European standards on personal data protection. The M5P model, being periodically retrained on updated data, supports the adaptive and self-learning nature of the system, ensuring its continued relevance in a changing socio-economic landscape, according to O7. The proposed technological architecture enables the transition from retrospective to predictive and prescriptive analysis, in which decisions on migration and social equity policies are scientifically grounded, transparent, and delivered within a reasonable time horizon for intervention. Such an architecture needs to be underpinned by a data-driven governance system capable of anticipating migratory developments and responding in real time to changes in the economic and social context. The integration of predictive models, such as M5P trees, into decision-making processes would allow better calibration of public interventions, and the establishment of a European Observatory on Systemic Migration would contribute to the continuous monitoring and analysis of the interdependencies between economy, environment and mobility.
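The scenario-testing capability described above amounts to re-running a prediction with selected inputs changed and reporting the difference. The sketch below assumes a hypothetical stand-in model: the signs of its coefficients mirror those reported in the study (negative for SHA11, positive for GHGE), but the magnitudes, the function names, and the baseline values are invented for illustration.

```python
from typing import Callable

Features = dict[str, float]


def baseline_model(x: Features) -> float:
    """Stand-in for the trained model; coefficient magnitudes are hypothetical."""
    return 2.0 - 0.4 * x["sha11"] + 0.03 * x["ghge"]


def simulate(
    model: Callable[[Features], float],
    base: Features,
    changes: Features,
) -> dict[str, float]:
    """Compare the prediction for a baseline with that for a modified scenario."""
    scenario = {**base, **changes}  # changed inputs override the baseline values
    before = model(base)
    after = model(scenario)
    return {"baseline": before, "scenario": after, "delta": after - before}


# Invented baseline; the scenario raises health expenditure by one unit.
base = {"sha11": 5.0, "ghge": 100.0}
result = simulate(baseline_model, base, {"sha11": 6.0})
print(result)
```

Passing the model as a callable keeps the simulator independent of how the prediction is produced, so a retrained model could be swapped in without changing the interface.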
We consider that the adoption of this set of differentiated policies, adapted to the fragmented structure of the European Union and supported by an advanced analytical framework, is essential for the management of migration not only as a mobility phenomenon, but also as an expression of the structural (in)sustainability of the European development model.
6. Conclusions
The research fully achieved its seven objectives by applying a rigorous methodology based on the intelligent processing of socio-economic data and the use of advanced machine learning technologies. The relationships between economic indicators and migration were quantified, highlighting non-linear influences and contextual variables, in particular the impact of GVA, GFCF, and SHA11 on population mobility. The effects of greenhouse gas emissions (GHGE) on migration were clearly delineated, confirming the role of environmental pressures in driving mobility, while unequal access to essential services was shown to be a major structural driver of social and economic imbalance. Clusters of countries were identified according to their economic-migration profiles using the K-Means and EM algorithms, revealing distinct configurations of systemic vulnerability in the EU space. The predictive model built with the M5P model tree achieved a high level of accuracy, providing detailed explanations of the interaction between macroeconomic indicators and migration trends. The integration of these results into a decision support information system has allowed the simulation of migration scenarios and the substantiation of differentiated public policies, contributing to the strengthening of economic and social security in the European Union.
The study makes a significant contribution to the understanding of migration in the European Union, approaching the phenomenon not as a linear consequence of isolated variables, but as the emergent result of a complex system of interactions between economic indicators, environmental factors and elements of social infrastructure. Through the integration of advanced machine learning methods, the research has overcome the limitations of traditional econometric modelling, providing a detailed and differentiated picture of the mechanisms that determine population mobility in the EU area. The application of the K-Means and Expectation-Maximisation clustering techniques led to the identification of significant structures in the analysed data. While K-Means revealed a binary polarisation between the economic “core” of the EU and the vulnerable “periphery”, the EM algorithm allowed a finer segmentation into four clusters, capturing relevant nuances of the differences between Member States, ranging from developed economies with a high capacity to absorb migrants to weak and highly polluting economies with a high risk of emigration. This categorisation highlights the existence of a deep economic and social fragmentation within the Union, which cannot be ignored in the formulation of European public policies. Moreover, training the M5P model allowed the construction of a robust predictive framework capable of capturing local variation in the relationships between explanatory variables and migration. The 18 regression equations generated clearly show that migration is differentially influenced by the structural context of each state or group of states. The negative coefficients associated with health expenditure (SHA11) confirm that underfunding of social systems generates push pressures on migration, while the positive influence of greenhouse gas emissions (GHGE) supports the hypothesis of the emergence of environmental migration within the EU.
Gross Value Added (GVA), with a variable coefficient across subsets, shows that economic development can be, depending on the context, both a pull factor and a neutral or even inhibiting factor for mobility. The methodological and empirical findings of the study support the need to reconceptualise migration as an indicator of social sustainability. Population mobility reflects not only differences in development between states, but also the degree of access to essential public services and the capacity of states to ensure a stable economic and ecological environment. In this respect, the models applied in the research provide not only a rigorous interpretation of the phenomenon, but also an applicable tool for the foundation of differentiated policies capable of responding to the structural fragmentation of the Union.
By using an advanced computing architecture, centred on an information system integrating the M5P model, the study overcomes the limitations of traditional econometric models, providing an analytical framework capable of capturing contextual variability and non-linear relationships between socio-economic indicators and population mobility. The M5P model, integrated in the analytical layer of the IIS, allowed the construction of a set of regression equations tailored to the specificities of each data subset, thus highlighting the differential way in which factors such as Gross Value Added (GVA), health expenditure (SHA11) and greenhouse gas emissions (GHGE) influence migration flows. Empirical findings suggest that GVA does not have a unidirectional effect, but acts according to the structural context, sometimes as a pull factor, sometimes as a deterrent to migration. Similarly, low health care expenditure acts as a push factor, while GHGE points to the emergence of environmental migration as a complementary dimension of mobility in the EU space.
The integration of the K-Means and Expectation-Maximisation clustering techniques has enabled a robust segmentation of Member States according to vulnerabilities and absorption capacities, identifying not only traditional centre-periphery divides, but also areas of economic transition and latent social fragility. This segmentation translates directly into the DSS architecture, where the results can be visualised through interactive dashboards and fed back to policy makers in the form of risk scores or differentiated policy intervention scenarios. Methodologically, the study demonstrates the added value of using the M5P model as part of a modular and scalable decision-making platform, which offers not only advanced analytical capability, but also the possibility to continuously update predictions as new data become available. This adaptive nature of the system is essential in a European context characterised by economic volatility, social pressures and accelerated environmental transition.
At the same time, the analysis highlights the current limitations of static modelling, such as the absence of the time dimension and the lack of completeness of some datasets. These limitations will be addressed in future research by extending to the regional level (NUTS 2) and integrating time-series machine learning models capable of capturing the lagged effects and causal dynamics of public policies. A key limitation of the present study lies in the restricted scope of variables employed, which primarily encompass economic, infrastructural, and environmental determinants of migration. While these factors offer significant explanatory power, they do not capture the full complexity of migration dynamics, particularly the socio-political and cultural dimensions that shape individual and collective decisions to migrate. The exclusion of variables such as political stability, institutional quality, cultural proximity, and migration governance frameworks limits the study’s capacity to reflect the broader systemic forces at play. Recognising this limitation, future research will seek to expand the analytical framework by incorporating indicators related to political regimes, policy responsiveness, and social cohesion. Also, while the current model provides a robust snapshot of migration determinants, its integration with longitudinal data remains a key avenue for future research. Such an enhancement would allow for the modelling of dynamic trends, temporal shocks, and policy lags, offering greater insight into the evolving interplay between migration, economic structures, and institutional responses across the EU.
The results obtained provide a solid basis for the development of operational tools, such as an early warning system or a structural migration scoreboard, useful for national and European policy makers in anticipating and calibrating cohesion, integration and sustainability policies. Thus, the study proposes a new conceptual and operational framework for re-conceptualising migration as a strategic indicator of the performance of socio-economic systems, contributing to the foundation of fair, proactive and evidence-based public policies in a European Union undergoing profound transformation.
Beyond its immediate empirical contribution, this study offers broader international implications for understanding migration dynamics in the European Union. By integrating economic, infrastructural, and environmental indicators into a data-driven clustering framework, the research moves beyond binary distinctions such as East–West or North–South migration and instead proposes a multidimensional typology applicable to other regional contexts facing similar disparities. This approach holds relevance for countries both within and outside the EU where migration is shaped by structural inequalities, uneven access to services, and environmental stressors. The use of machine learning techniques such as M5P and Expectation-Maximisation further demonstrates the potential for combining computational methods with policy-relevant inquiry, thereby contributing to the growing body of literature at the intersection of migration studies and data science. The study also offers a foundation for follow-up research. Future authors may build on these results by incorporating longitudinal data to trace temporal dynamics and migration shocks more precisely, or by extending the variable set to include socio-political dimensions such as institutional quality, policy responsiveness, or indicators of social cohesion. Furthermore, qualitative or mixed-method studies could complement the current model by capturing subjective experiences of migrants—especially in relation to service access, digital exclusion, and perceived inequality—thereby enriching the explanatory framework and enhancing its applicability to real-world governance challenges. Nevertheless, the study is not without limitations. First, the cross-sectional nature of the data constrains the capacity to capture temporal shifts, causal mechanisms, or policy effects that unfold over time. 
Second, the availability of harmonised data restricted the inclusion of variables related to political stability, cultural integration, or migration governance, which are known to significantly influence mobility decisions. Third, while the clustering approach provides a useful macro-level typology, it does not fully account for intra-country variation or sub-national disparities, which remain critical in understanding the geography of migration pressures. Recognising these limitations, the study should be viewed as a conceptual and methodological starting point—one that invites further empirical refinement and interdisciplinary dialogue across migration research, public policy, and computational modelling.