Dataset of Multi-aspect Integrated Migration Indicators

Nowadays, new branches of research are proposing the use of non-traditional data sources for the study of migration trends in order to find an original methodology to answer open questions about the human mobility framework. In this context we present the Multi-aspect Integrated Migration Indicators (MIMI) dataset, a new dataset of migration drivers resulting from the acquisition, transformation and merging of both official data about international flows and stocks and original indicators not typically used in migration studies, such as online social networks. This work describes the process of gathering, embedding and merging traditional and novel features, resulting in a new multidisciplinary dataset that we believe could significantly contribute to nowcasting and forecasting both present and future bilateral migration trends.


Introduction
In recent years the pursuit of original drivers and measures has become an increasing requirement in migration studies, considering the new methods and technologies used to characterize and understand the human migration phenomenon. Many researchers [35,7,2,10] have proposed employing non-traditional data sources to study migration trends, including so-called social Big Data such as online social networks. The usefulness of exploiting unconventional data sources for better understanding migration patterns, as well as the benefits of merging knowledge from both traditional and novel datasets, have already been proven [35]. This unconventional approach is intended to find an alternative methodology to ultimately answer open questions about the human mobility framework (i.e. nowcasting flows and stocks, studying integration of multiple sources and knowledge, and investigating migration drivers). Nevertheless, in this context of meaningful combination of the conventional and the original, many types of data exist, still very scattered and heterogeneous: given this variety, integration is not straightforward.
For this purpose we propose a tool to be exploited in migration studies as a concrete example of this new integration-oriented approach: the Multi-aspect Integrated Migration Indicators (MIMI) dataset. It combines official data about bidirectional human mobility (traditional flow and stock data) with multidisciplinary features and original indicators, including the Facebook Social Connectedness Index (SCI), which measures the relative probability that two individuals across two countries are friends with each other on Facebook. The inclusion of SCI in the dataset enables it to be exploited as a non-traditional way to describe, understand and nowcast international migration. The combination of this index with socioeconomic variables measuring the similarity of two locations (such as per capita income, religiosity and language) already appeared in [4,5], where it was shown that pairs of locations that are more similar on these dimensions share more friendship links. Nevertheless, a similar approach at country level is still missing; moreover, such observations and conclusions about SCI have never been exploited in migration studies. For this reason, our aim is to use this "homophily" concept (defined as the empirical regularity with which individuals are more likely to be associated with other individuals of similar characteristics) [28], which the literature has already linked to Facebook social connectedness [4,5], to present a new dataset useful for better understanding country-to-country human mobility trends.

Motivation
MIMI is an open dataset that provides multidimensional information about several traditional and non-traditional aspects of the human mobility phenomenon. Thanks to this variety of knowledge, experts from several research fields (demographers, sociologists, economists) could exploit MIMI to investigate the behavior of many drivers and relate it to migration trends, so as to build a comprehensive overview and understanding of them.
As an example, it would be possible to assess existing correlations between original sources of data and traditional migration measures, explore and investigate them and try to identify any possible causal relationship. Moreover, it would be possible to develop complex models able to assess the human mobility framework by evaluating related interdisciplinary drivers, as well as models able to nowcast and predict traditional migration indicators in accordance with original features, such as the strength of social connectivity. By means of these algorithms, companies and researchers could find an alternative methodology to answer open questions about emerging mobility trends.
Human migration is a complex phenomenon characterized by several related factors. It is also as ancient as human history, and it has been widely studied, explored and described over time. However, the technological advancements and the rapid and drastic changes that society has faced in the 21st century have had an impact on the human mobility phenomenon, which has consequently undergone radical modifications. We believe that taking into account this same information about societal changes and technological progress (such as economic, cultural and social big data) can be an effective strategy nowadays to detect new trends in bilateral migration and to better understand and nowcast it. The motivations for building and releasing the MIMI dataset lie precisely in this need for new perspectives, methods and analyses that can no longer disregard a variety of new factors. The heterogeneous and multidimensional sets of data present in MIMI offer an all-encompassing overview of the characteristics of international human mobility, enabling a better understanding and an original potential exploration of the relationships between migration and non-traditional sources of data.

Data description
The MIMI [21] dataset version 1 (March 15, 2022) was released under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0) and is publicly available on Zenodo (10.5281/zenodo.6360651). It consists of a single file containing more than 28,000 entries (records) and 480 different features. In this section we provide all the dataset specifications and describe the structure of the CSV file in detail, as well as how each feature was built.

Data files and format
The MIMI dataset is made up of one single CSV file that includes 28,725 rows and 485 columns. The index consists of uniquely identified pairs of countries, built by joining the two ISO-3166 alpha-2 codes of the origin and destination country respectively. As its main features, the dataset contains country-to-country bilateral migration flows and stocks, together with the strength of Facebook connectedness of each pair.
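The pair index can be illustrated with a minimal sketch. It assumes the hyphen-separated form shown later in the feature description (e.g. AL-FI); the helper names are hypothetical and are not part of the dataset or of any library.

```python
def make_pair_index(origin_iso2, destination_iso2):
    """Join two ISO-3166 alpha-2 codes into an 'ORIGIN-DESTINATION' key."""
    return f"{origin_iso2.upper()}-{destination_iso2.upper()}"


def split_pair_index(index):
    """Recover (origin, destination) from a pair key such as 'AL-FI'."""
    origin, destination = index.split("-")
    return origin, destination


def is_returner_pair(index):
    """True when origin and destination coincide (the 'returners' records)."""
    origin, destination = split_pair_index(index)
    return origin == destination


albania_to_finland = make_pair_index("al", "fi")
```

Keeping the two codes recoverable from the key makes it straightforward to join country-level attributes back onto each pair.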

Geographical coverage
The dataset comprises migration features and the strength of Facebook social connectedness for 254 different countries belonging to the following macro-areas: North America, South America, Europe, Asia, Africa, Oceania, Antarctica.

Temporal coverage
Since our work does not focus on the study of the migration phenomenon per se but on its possible relationship with social networks, in particular with the use of Facebook, the time range was chosen accordingly. Therefore, the initial decision was not to select migration data prior to 2004. However, our intention was to make available a tool that could also be useful for studying the differences between contemporary and past trends (e.g. alterations of some phenomena, consistent changes of values compared to the past, consequences of previous data on the last few years, etc.): for this reason some features have been selected starting from 2000. Of course, data selection according to predetermined temporal ranges always depends on the availability of sources: for example, during our data collection phase, Eurostat was not providing information about the population density of countries before 2008. Table 1 provides the detailed temporal coverage of each time-related feature, apart from SCI, for which we included the only version made available (the latest, which refers to October 13, 2021, updated on December 15, 2021).

Features definition
In this section we list all the indicators included in the MIMI dataset; we describe them in detail in the following section. Table 2 contains a complete declaration of all drivers, grouped and categorized by context ("feature area"). The column "Name" contains the identifier of each feature: since it would not be possible to list all features, a more compact replacement rule is presented in order to include them all in the table. From this simple rule it is possible to derive the exact name of each single indicator. The column "Name" should be read as follows: the invariant part of the identifier is static, while the interchangeable part must be substituted as explained below in order to obtain the exact name of the feature.
• country should be replaced with origin or destination.
• year and start-end should be replaced, respectively, with the reference year (in case of annual feature) or reference year range (for NET migration and NET migration rate features). Substituted values should be consistent with the temporal coverage available for each indicator, which can be found in Table 1.
• source allows UN and ESTAT as replacement values.
• sex should be substituted with F, M or T (respectively, female, male or both).
• age allows only T as replacement value for data obtained from UN (both flows and stocks), while it can take four different values for ESTAT flows: T (total), <15 (less than 15 years), 15-64 (from 15 to 64 years), >65 (65 years or over).
Some examples are provided in Table 2 footnotes.
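The replacement rule can be sketched as a small template expansion. The templates below are hypothetical placeholders (the real identifiers are those listed in Table 2) and the years are chosen only for illustration; the sex and age values are the ones quoted in the rule above.

```python
from itertools import product

# Replacement values quoted in the rule above.
COUNTRY_VALUES = ["origin", "destination"]
SOURCE_VALUES = ["UN", "ESTAT"]
SEX_VALUES = ["F", "M", "T"]
AGE_VALUES = {"UN": ["T"], "ESTAT": ["T", "<15", "15-64", ">65"]}


def expand_names(template, **choices):
    """Return every concrete name obtained by filling the placeholders."""
    keys = list(choices)
    combos = product(*(choices[key] for key in keys))
    return [template.format(**dict(zip(keys, combo))) for combo in combos]


# Hypothetical templates, used only to show the mechanics of the rule.
country_names = expand_names(
    "{country}_example_{year}", country=COUNTRY_VALUES, year=[2019, 2020]
)
estat_names = expand_names(
    "ESTAT_example_{sex}_{age}", sex=SEX_VALUES, age=AGE_VALUES["ESTAT"]
)
```

In practice the year values should be restricted to the temporal coverage reported in Table 1 for each indicator.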

Features description and sources
In this section we are going to describe in detail each single feature listed in the previous section, also reporting all the data sources: some indicators may have multiple sources since they were necessary to better integrate missing values.
As stated in Section 2, the purpose of the integration of all these different drivers in the MIMI dataset is to allow the exploration of any of their possible connections with the international migration phenomenon, and eventually exploit them to better understand and nowcast it.
• Index (feature 1 in Table 2). The index consists of uniquely identified pairs of countries, built as follows: ISO2 code of origin country - ISO2 code of destination country (e.g. the AL-FI index indicates records related to migration from Albania to Finland). Pairs having the same country code for origin and destination indicate the so-called "returners" (e.g. the BE-BE record represents people who were born in or are citizens of Belgium and moved their residence to Belgium in the reference year).
• Facebook data (feature 2 in Table 2). This indicator represents one of the most non-traditional features (i.e. social media data) that we included within the context of migration studies. It consists of the so-called Facebook Social Connectedness Index (bit.ly/Facebook_SCI), publicly provided by the "Data for Good at Meta" organisation on the "Humanitarian Data Exchange, Data for Good" platform. Country-to-country values of SCI are available in TSV format for more than 34,000 pairs, updated to December 2021 [32].
This indicator uses anonymized insights of active Facebook users and their friendship networks to measure the intensity of connectedness between locations [4]. In this way, the resulting formulation in Equation 1 is a measure of the social connectedness between the two locations i and j, that is representative of the relative probability that two individuals across the two locations are friends with each other on Facebook: if SocialConnectednessIndex_{i,j} is twice as large, a Facebook user in country i is about twice as likely to be connected with a given Facebook user in country j.
Specifically, in this work the concept of "locations" coincides with NUTS0 areas, since our dataset only focuses on country-to-country bilateral migration. Nevertheless, SCI is also provided at finer geographical granularities (e.g. NUTS2, NUTS3): we do not exclude future works focused on the study of migration trends at a finer resolution (country-to-county, or county-to-county).
The SCI has a symmetric structure by definition of the concept of "friendship" and has been re-scaled to have a maximum value of 1,000,000,000 and a minimum value of 1. In our dataset, the minimum possible value was originally 0 (indicating pairs of countries for which the index was not available), subsequently replaced with an arbitrarily small value (chosen as half of the minimum available) in order to fix problems when computing the Pearson correlation of the logarithmic SCI.
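The zero-replacement step described for the SCI can be sketched as follows. This is only an illustration under stated assumptions: the values are made up, the helper names are hypothetical, "minimum available" is read as the smallest positive value, and the Pearson coefficient is written out by hand rather than taken from the library actually used by the authors.

```python
import math


def replace_zero_sci(values):
    """Replace zeros with half of the smallest positive value available."""
    floor = min(v for v in values if v > 0) / 2
    return [v if v > 0 else floor for v in values]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up SCI values (0 marks an unavailable pair) and made-up flows.
sci = [0, 1, 250, 40_000, 1_000_000_000]
flows = [3, 10, 120, 900, 15_000]

cleaned_sci = replace_zero_sci(sci)
log_sci = [math.log10(v) for v in cleaned_sci]  # defined only because no zeros remain
log_flows = [math.log10(v) for v in flows]
correlation = pearson(log_sci, log_flows)
```

Without the replacement, the logarithm of a zero entry would be undefined and the correlation could not be computed on the full set of pairs.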
• Geographic features (features 3-15 in Table 2). These features portray and contextualize both origin and destination countries at geographical level, providing all the necessary information to describe them, from the official codes and names to their land extent and how far apart they are. Specifically:
features 5, 6, 7, 8 are ISO-3166 standard nomenclatures for country identification, retrieved from the PyCountry Python module and ISAN (International Standard Audiovisual Number) [23].
feature 3 consists of the pair code of origin continent - code of destination continent. Its usefulness can be fully appreciated in the chord diagrams of Section 5.
features 11, 12, 13 locate the centroids of both origin and destination countries in a classic geographic coordinate system. They are gathered and integrated from Google DSPL [11] and from the latlng() method of the CountryInfo Python library, and then merged together in a tuple (feature 13) built as a specific GeoPandas data structure called "geometry array".
feature 4 is the measure of distance between origin and destination, computed starting from the tuple in feature 13 of both countries and using the geodesic formulation [27] provided by the GeoPy Python library. It has already been observed in [4] that, at county level, much of the estimated effect of distance on migration might come from the relationship between distance and social connectedness: therefore the use of the SCI indicator could explain the variation of migration flows better than geographic distance alone.
feature 14 consists of the list of countries that share a border with the given country. The utility of this feature is to find out whether the two countries of origin and destination share a border, using a straightforward function to check whether a country name (feature 6) is contained in the list of neighbors of the other, and vice versa. An additional binary feature (e.g. "neighbors", having value True or False) could be derived from this method. Countries having an empty list are islands. The corresponding sources for this feature are the following: the GitHub repository in [20], the borders() method of the CountryInfo Python module and Wikipedia [46].
feature 15 is the measure of the area of the country in square kilometers. It is gathered from The World Bank [37] and integrated with the area() method of the CountryInfo Python module.
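Two of the geographic features above lend themselves to a short sketch: the centroid-to-centroid distance and the derived "neighbors" flag. To stay dependency-free, the distance below uses the haversine (spherical) formula as a simpler stand-in for the ellipsoidal geodesic provided by GeoPy, so its values differ slightly from those in the dataset. The (latitude, longitude) tuple order, the coordinates and the border lists are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(origin, destination):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def are_neighbors(name_a, borders_a, name_b, borders_b):
    """True if either country appears in the other's list of bordering countries."""
    return name_b in borders_a or name_a in borders_b


# Illustrative inputs only, not taken from the dataset.
half_equator_km = haversine_km((0.0, 0.0), (0.0, 180.0))
share_border = are_neighbors("Italy", ["France", "Austria"], "France", ["Italy", "Spain"])
island_case = are_neighbors("Iceland", [], "Norway", ["Sweden", "Finland"])
```

Checking both lists makes the flag robust to sources in which a border is recorded for only one of the two countries.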
• Interdisciplinary indicators (features 16-25 in Table 2).Some of these drivers are considered non-traditional data in the context of migration studies since their use in migration understanding and nowcasting is poorly documented in literature.Despite this, most of the available studies consider these features as relevant in such context, as they are related to the behavior of international migration trends.
feature 17 is an indicator that provides per capita annual values for the gross domestic product (GDP) of a country, expressed in current international dollars and converted by the purchasing power parity (PPP) conversion factor. Data is retrieved from The World Bank [36]. The gross domestic product is one of the "Development Indicators", already widely used in the literature in combination with global migration.
features 16, 18 correspond to two lists containing, respectively, the most practiced religions, and the most spoken languages in the country (both including official ones and minorities).
The benefit of including these columns would be to discover if the two countries of origin and destination share some languages or religions (or both), since this could favor a migratory exchange between the two.Rare languages and religions used only in one country and not shared with any other have been removed as meaningless for our purposes.
Languages have been gathered from Wikipedia [47], while religions come from DataHub [8] and have been integrated with Wikipedia data [48].
features 19, 20 indicate the quantity (respectively, as absolute number and as percentage of the total population) of Facebook users that a given country has. The source is World Population Review [50], which refers to the latest available measure for each country (the oldest dates back to December 2020).
features 21-25 represent Cultural Indices of a location, intended as dimensions along which the cultural values of that location can be analyzed [26]. Their origin dates back to the work of [22] although, over the decades, independent research branches led to the creation and addition of new ones [49]. Our work includes five of these indicators, of which we provide a brief individual description. They have had several applications in the literature (e.g. cross-cultural studies using Twitter data [3]), but the purpose of their inclusion in the MIMI dataset is to use them in an original way: our intention is to explore and understand their possible relation with international migration trends. Data about cultural indicators are available at different NUTS levels, but in our work they only appear at NUTS0 (country) level since it is the only one that fits our geographic viewpoint.
Features 21-25 are the result of the integration of two different datasets [24,9]. Unfortunately, they are provided for only 66 of the more than 250 available countries but, despite this, most of them have already been shown to be strongly involved in migration trends (see the behavior of their correlation values with the absolute number of migrants of a country, in Section 5.1).
Starting from the cultural dimensions of both countries of origin and destination, a new feature about cultural distance could be obtained: datasets with this configuration already exist [26,25] although, at the moment, data is available only for a third of the countries (22 in total).
* feature 21 is Power distance indicator (PDI), which is defined as "the extent to which the less powerful members of organizations and institutions (like the family) accept and expect that power is distributed unequally" [49]. This index describes the extent to which hierarchical relations and unequal distribution of power in organisations and societal institutions are accepted in a culture.
* feature 22: Individualism indicator (IDV) (as opposed to collectivism) explores the "degree to which people in a society are integrated into groups" [49]: it reflects the extent to which people prefer to act as individuals rather than as members of a community.
* feature 23 is Masculinity indicator (MAS), defined as "a preference in society for achievement, heroism, assertiveness and material rewards for success" [49]: as opposed to femininity, this dimension reveals to what degree traditionally masculine societal values, such as orientation towards accomplishment, prevail over values such as modesty, solidarity or tolerance.
* feature 24 is Uncertainty avoidance indicator (UAI) defined as "a society's tolerance for ambiguity", in which people embrace or avert an event of something unexpected, unknown, or away from the status quo [49].
* feature 25: Long-term orientation indicator (LTO) associates the connection of the past with the current and future actions/challenges. A lower degree of this index (short-term orientation) indicates that traditions are honored and kept [49].
• Demographic features (features 26-33 in Table 2). These features correspond to traditional migration and population measures obtained from official statistics, either from national censuses or from the population registries.
feature 26: annual population stocks, defined as the number of persons having their usual residence in a country in a given year, are gathered both from UN Population Division [45] (from which only records with the "Zero migration" variant were selected) and EUROSTAT [19]: these two sources often refer to different groups of countries, so their mutual integration made it possible to cover most of the countries of the dataset. Where both measurements were available for the same country, both were reported. The two sources follow different methodologies, since the annual total population measurement is performed on July 1st by UN and on January 1st by EUROSTAT. However, their ∼1 correlation value shows that the two measures, related to the same year, are compatible and almost interchangeable: indeed, missing values related to the former have been replaced with the latter, and vice versa.
Dataset of Multi-aspect Integrated Migration Indicators D. Goglia et al.
feature 27 represents annual population density, defined as the ratio between the annual average population and the land area. Therefore, its unit of measure corresponds to "persons per square kilometre". Data has been retrieved from ESTAT [18].
features 28, 29: absolute number of migrants (respectively, immigrants and emigrants) per country. Data was taken from ESTAT [14,12] and from UN datasets on flows (see feature 32 below), selecting from the latter the records having "Total" as country (respectively, origin and destination country).
features 30, 31 indicate quinquennial NET migration and NET migration rate of each country.
The former is the difference between the number of immigrants and the number of emigrants in a given area during the reference period, while the latter is defined as the NET migration per 1,000 persons and so it indicates the contribution of migration to the overall level of population change. A positive value for them indicates that there are more migrants entering than leaving a country (NET immigration), while a negative one means that emigrants are more than immigrants (NET emigration).
Values have been taken from UN Population Division [41,40]: note that they also apply to EUROSTAT countries, and they have been widely used in the literature in combination with them, even if the NET migration rate calculation is based on midyear population (as required by the standard UN methodology).
feature 32: yearly migration flows for each pair of countries are defined as the number of people who have moved from one country to the other (i.e. who changed residence). Unlike a static stock measure, flow data are dynamic, summarising movements over a defined period, and consequently allow for a better understanding of past patterns and the prediction of future trends [1].
Both EUROSTAT and UN divide migration flows into three categories: by residence [44,42,13,17], by citizenship [43,15] and by country of birth [38,16]. This is true in EUROSTAT for both inflows and outflows, while in UN only for inflows, as UN outflows exist only by residence. For our purposes, however, we selected EUROSTAT outflows only by residence, since the ones by citizenship and by country of birth cannot properly be defined as "flows", their destination country being missing.
feature 33: quinquennial migration stocks for each pair of countries consist of the absolute number of migrants residing in the destination country at a given time. Data is obtained from UN [39] and includes stocks by sex and age.
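The definitions of NET migration and NET migration rate given for features 30 and 31 reduce to simple arithmetic, sketched below with made-up numbers. The function names are hypothetical, and the population argument is assumed to be the reference (midyear) population mentioned above; this is not the UN computation itself.

```python
def net_migration(immigrants, emigrants):
    """Immigrants minus emigrants over the reference period."""
    return immigrants - emigrants


def net_migration_rate(immigrants, emigrants, population):
    """NET migration per 1,000 persons of the reference population."""
    return 1000 * net_migration(immigrants, emigrants) / population


# Made-up figures: positive values mean NET immigration, negative NET emigration.
net_in = net_migration(1200, 800)
rate_in = net_migration_rate(1200, 800, 100_000)
rate_out = net_migration_rate(500, 900, 100_000)
```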

Methods
The entire work was performed in Python 3.8, with the aid of Jupyter.
The initial phase consisted of data collection and acquisition, starting from the exploration of open source portals and proceeding with data selection and download. Initially, only migration flows data were imported.
Then a pre-processing phase started, where we carried out data understanding, cleaning and preparation. This has been managed by defining some functions that automatically clean and prepare source datasets.
Here our data was subjected to various standard computational processes (such as outlier detection, duplicate handling, notation standardization, etc.). Some of the operations performed at this level included the selection of task-relevant data (detection of valid country-to-country records, removal of aggregations, and elimination of non-bilateral flows).
The data transformation phase was fundamental to reshape the data into the final structure (previously established by our design choices), so as to obtain a large matrix with pairs of countries as rows.
Concretely, this meant converting, grouping, and unstacking records of source datasets in order to transform them into features (columns). We continued shaping this framework by working on indexing: to obtain the dataset index described in Section 3.2.2, duplicates of pairs of countries were not admissible. For this reason, specifically with respect to EUROSTAT flows, we established a priority for the selection of pairs: the union of keys (pairs) was taken by first selecting migration by citizenship, then by residence, and lastly by country of birth.
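The reshaping and indexing steps can be sketched with pandas. The records, column names and feature names below are made up, and the priority union is only one possible reading of the selection rule described above; it is not the authors' code.

```python
import pandas as pd

# Made-up long-format records: one row per (origin, destination, year).
records = pd.DataFrame({
    "origin": ["AL", "AL", "IT", "IT"],
    "destination": ["FI", "FI", "DE", "DE"],
    "year": [2018, 2019, 2018, 2019],
    "flow": [10, 12, 300, 320],
})
records["pair"] = records["origin"] + "-" + records["destination"]

# Unstack the year level so that every year becomes a column (feature)
# and every pair of countries becomes a single row.
wide = records.set_index(["pair", "year"])["flow"].unstack("year")
wide.columns = [f"flow_{year}" for year in wide.columns]


def union_with_priority(*key_lists):
    """Union of pair keys without duplicates, keeping the first occurrence
    in priority order (e.g. citizenship, then residence, then birth)."""
    return list(dict.fromkeys(key for keys in key_lists for key in keys))


pairs = union_with_priority(
    ["AL-FI", "IT-DE"],   # hypothetical keys from flows by citizenship
    ["IT-DE", "FR-ES"],   # hypothetical keys from flows by residence
    ["AL-FI", "BE-BE"],   # hypothetical keys from flows by country of birth
)
```

The resulting list contains each pair exactly once, which is what allows it to serve as a unique index.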
The following step was data integration, where we collected, included and computed all other indicators. Geographic and interdisciplinary features related to single countries (5-25 in Table 2) have been processed in a separate dataset which, containing neither demographic data nor information about pairs of countries, can be reused in different contexts where needed. This countries.csv dataset has undergone the same pre-processing pipeline, but not the transformation one, since it has its own structure and design: it was then merged with the MIMI prototype previously obtained (already structured according to our needs) by matching both countries of origin and destination.
Finally, the last features (2-4 and demographic 26-33, in Table 2) were integrated by computing them or following the previously described merging process, matching single countries or pairs when needed. Once integration was completed, it was helpful to check the semantics and statistics of the resulting dataset and make some random inspections in order to verify the need for a further cleaning step.
The final data quality assessment phase was one of the longest and most delicate, since many values were missing and this could have had a negative impact on the quality of the desired resulting knowledge. Missing values have been integrated from the additional sources reported, for each feature, in Section 3.2.2.
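Matching a country-level table against both the origin and the destination of each pair can be sketched as two left merges in pandas. The column names (iso2, continent), the sample rows and the helper function are assumptions made for illustration; the actual merge used by the authors may differ.

```python
import pandas as pd

# Hypothetical country-level table (one row per country, unique key).
countries = pd.DataFrame({
    "iso2": ["AL", "FI", "JP"],
    "continent": ["EU", "EU", "AS"],
})

# Hypothetical pair-level prototype indexed by 'ORIGIN-DESTINATION'.
pairs = pd.DataFrame({"origin": ["AL", "JP"], "destination": ["FI", "AL"]})
pairs.index = pairs["origin"] + "-" + pairs["destination"]


def attach_country_features(pairs, countries, key="iso2"):
    """Left-merge the country table twice: once on origin, once on destination."""
    merged = pairs
    for role in ("origin", "destination"):
        prefixed = countries.add_prefix(f"{role}_")
        merged = merged.merge(
            prefixed, how="left", left_on=role, right_on=f"{role}_{key}"
        ).drop(columns=f"{role}_{key}")
    # A left merge on a unique key keeps row order and count, so the
    # original pair index can be restored afterwards.
    merged.index = pairs.index
    return merged


enriched = attach_country_features(pairs, countries)
```

Prefixing the columns with the role keeps origin and destination attributes distinguishable in the final wide table.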

Usage notes
In this section our focus is on documenting and describing salient patterns in distributions and correlations of data. We do not seek to provide causal analyses, nor do we want to imply causal relationships at this stage: however, we believe it can be useful to analyze the obtained numerical results since they may guide possible future research and lead to some interesting progress in human mobility studies.
Unless otherwise specified, correlation values have been computed as simple Pearson's correlation [34], measuring the linear relationship between two variables: values of -1 or +1 imply an exact linear relationship, while 0 implies no correlation. P-values have been computed in order to confirm or refute the relevance of each correlation value: results are indicated in heatmaps with a number of asterisks proportional to the relevance obtained.
Asterisks       Relevance           p-value
no asterisks    no relevance        p-value ≥ 0.5
*               little relevance    0.1 ≤ p-value < 0.5
**              medium relevance    0.01 ≤ p-value < 0.1
***             high relevance      p-value < 0.01
When no asterisks are reported for all the values in the matrix, all the correlations computed are highly relevant, meaning p-values always below the threshold of 0.01.
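The asterisk notation can be expressed as a small helper that maps a p-value to its label, using the thresholds listed above. The function name is hypothetical.

```python
def relevance_stars(p_value):
    """Map a p-value to the asterisk label used in the heatmaps."""
    if p_value < 0.01:
        return "***"  # high relevance
    if p_value < 0.1:
        return "**"   # medium relevance
    if p_value < 0.5:
        return "*"    # little relevance
    return ""         # no relevance


labels = [relevance_stars(p) for p in (0.001, 0.05, 0.3, 0.7)]
```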

Data statistics, distributions and correlations.
In this section we provide some practical examples of how to explore the data. Despite the impact of the COVID-19 pandemic on international human mobility, mostly related to travel restrictions and "stay-at-home" measures which reduced internal movements within a country [31], Figures 9 and 12 confirm that the numbers in migration flow statistics did not suffer. However, a consistent flow of returners can be noticed for Thailand, probably due to COVID-19 itself, since in 2020 the pandemic prompted the return of hundreds of thousands of migrants to their countries of origin [33].
Regarding migration stocks in Figure 10, the impact of COVID-19 on the global population of international migrants is difficult to assess, since the latest available data refers to mid-2020, fairly early in the pandemic. However, it is estimated that the pandemic may have reduced the growth in the stock of international migrants by around two million [31,29].
Moreover, almost all these pairs of countries are included in the "top 20 international migration country-to-country corridors, 2020" list in the World Migration Report 2022 [31] (e.g. Mexico - United States, Syria - Turkey, India - Saudi Arabia, United Arab Emirates and United States, Afghanistan - Iran, Myanmar - Thailand), meaning that the largest communities of permanently residing migrants in a host country have developed over the years for safety reasons.
Boxplots in Figures 11, 12 and 13 display the statistical distribution of migration flow and stock values over the years, divided by sex. Increasing trends and regular patterns over time are clearly recognizable from the time series plotted, as is the statistical evidence that male migration shows larger numbers than female migration (on the gender dimensions of human mobility, refer to [30]).
Heatmap in Figure 14 shows correlations between the computed ratio of total migrants and total population of a country and its cultural indicators, while Figures 15 and 16 correspond to the outcome of the division of that heatmap in immigration and emigration with the mapping of annual correlation values in a bidimensional plane.
Values of almost all indicators seem to initially lie mostly in the upper zone of the plane, showing a quite strong positive correlation with emigration, until some breakpoint years occur and the correlation value becomes highly negative thereafter. This radical change in trend cannot yet be supported and explained by a causal relation, so we limit ourselves to reporting its behavior. Correlations related to immigration lie in the middle region of the plot, quite far from the upper and lower extremities, therefore assuming less polarized values. Moreover, there is no trend reversal for them.
Correlation between NET migration rate and GDP of a country shown in Figure 17 confirms the existing relation, well documented in the literature, between these two variables. Correlation is always positive, meaning that countries with high GDP face a NET immigration trend and so confirming that high per capita income is conducive to mobility [6]. Specifically, human mobility is influenced by GDP values up to more than 10 years back.
Heatmaps in Figure 18 illustrate the trends in Spearman correlations over the years between EUROSTAT migration flows and UN migration stocks. Although the existing correlation between stocks at a given time t and flows relative to previous years is self-evident (as those same flows will be included in the total counting of stocks), it is interesting to notice that quite strong positive correlations also propagate forward in time: this could mean that the higher the stock count at a given time t, the more migration flows will be shared by the pair of countries.
Finally, Figure 19 explores the changes in trend of NET migration rate for a small sample of countries.
Figures 1, 2, 3 and 4 show some interesting insights about the distribution of values and the top coverage on global scale of Social Connectedness Index.

Figure 2: Density plot of SCI with a logarithmic x axis. It shows a strongly right-skewed distribution, meaning that the smallest values of the indicator are the most frequent.

Figure 3: Sample of the highest values of SCI, above the 99th percentile. It displays the country pairs with the strongest Facebook connectivity.

Figure 4: Facebook strength of connectedness among continents (SCI averaged over each pair of countries in the continent). Very high connectivity can be noticed in Oceania, Africa and South America. Intra-continental connections are much stronger than inter-continental ones, confirming that the intensity of friendship links declines strongly with geographic distance [4].

Figure 5: Inter-continental migration flows from UN (left) and EUROSTAT (right) in 2010. We point out strong intra-continental mobility for Asia and Europe, but also relevant immigration trends in Oceania and Europe. In contrast, Asia and South America are subject almost only to emigration.

Figure 6: Inter-continental migration flows from UN (left) and EUROSTAT (right) in 2014. Asia and South America remain continents with strong emigration, bound for the same destinations as in previous years, which even appears to increase in inter-continental trends.

Figure 7: Inter-continental migration flows from UN (left) and EUROSTAT (right) in 2018. Migration is declining with respect to previous years: Oceania receives far fewer incoming migrants, and Europe sends far fewer outgoing ones.

Figure 8: EUROSTAT bilateral migration flows in the most recent year available (2019): pairs of countries sharing the highest numbers of migrants.

Figure 9: UN bilateral migration flows in the most recent year available (2020): pairs of countries sharing the highest numbers of migrants.

Figure 10: UN bilateral migration stocks in the most recent year available (2020): pairs of countries with the highest numbers of permanently residing migrants.

Figure 11: Distribution of migration flows from EUROSTAT. Male migration is higher than female migration in every annual measurement, while the general trend over time is a slight increase in migration. Two drops in the progressive growth of values can be identified, corresponding to the triennium 2012-2014 and to 2017.

Figure 12: Distribution of migration flows from UN. The increasing trend seen in the previous chart is not present in these distributions; instead, a regular pattern over time can be noticed: a gradual decline lasting a few years (whose end coincides with the drops in the previous plot) is followed by a sudden peak. The discrepancy between male and female migration is sharper.

Figure 13: Distribution of migration stocks. The five-year measurement interval prevents as detailed a look as was possible for the flows; nevertheless, an increase in the general trend over the years is quite evident.

Figure 14: Correlation between immigrants / emigrants over the years and the cultural indicators of a country. Absolute numbers of total migrants have been divided by the annual total population of the country. Strong positive values are shown in red and strong negative values in blue. Asterisks indicate the significance of the p-values obtained, as described in Section 5.

Figure 15: Distribution of correlation between total immigration and cultural indicators. Immigrants for each year are expressed as a ratio with respect to the total population of the country for the same year.

Figure 16: Distribution of correlation between total emigration and cultural indicators. Emigrants for each year are expressed as a ratio with respect to the total population of the country for the same year.

Figure 17: Correlation matrix between annual GDP per capita and five-year NET migration rate of a country.

Figure 18: Spearman correlation between migration flows and stocks, divided by sex.

Figure 19: Evolution of five-year NET migration rate over time, for a sample of countries.

Table 1: Temporal coverage of each time-related feature. "End" always refers to the latest available measure. For all the abbreviations refer to Section 5.1.

Table 2: Features list. The exact name of each single indicator can be retrieved by following the rule explained in Section 3.2.1.