Anatomy of a Misfit: International Migration Statistics

Migration is one of the key aspects of the Sustainable Development Goals (SDGs). To understand global migration patterns, develop scenarios, design effective policies, focus on the population’s needs, and identify how these needs change over time, we need accurate, reliable and timely data. Gaps in international migration data have persisted ever since international organizations began collecting such data. To close these gaps, we need to conceptualize the types of gaps and pinpoint where they occur within international data systems. To that end, the objective of this paper is twofold: (i) to review and categorize the gaps identified in the literature and (ii) to assess the main statistical data sources, i.e., the United Nations Department of Economic and Social Affairs (UN DESA), the Organisation for Economic Co-operation and Development (OECD), the International Organization for Migration (IOM), Eurostat, and the United Nations High Commissioner for Refugees (UNHCR). Our results demonstrate that the gaps can be categorized under (1) definitions and measures, (2) drivers or reasons behind migration, (3) geographic coverage, (4) demographic characteristics and (5) the time lag in the availability of data. The reviewed sources suffer from these gaps, which are not mutually exclusive but interlinked: the quality and availability of both migration flow and stock data vary across regions and countries, and migration statistics rely heavily on data on immigrants’ arrival.


Introduction
Migration deserves its place at the top of the scientific and socio-political agendas because it has been a constant feature of every era. Not surprisingly, migration is considered one of the key aspects of achieving the Sustainable Development Goals (SDGs) [1]. The 2030 Agenda for Sustainable Development, adopted in 2015, is the first international development framework to include and recognize migration as a dimension of development. Eleven of the 17 SDGs contain targets directly related to migrants, migration and mobility, and measuring progress towards these numerically specified targets requires data [2]. The new global development framework has therefore posed enormous challenges for national statistical offices, which must review existing concepts of migration, explore possible sources of information, generate migration-relevant indicators, and report them regularly and in a timely fashion. Most recently, the crisis associated with several large movements of asylum seekers and irregular migrants into Europe pushed migration further up the global agenda. The New York Declaration for Refugees and Migrants recognizes that there are many gaps in our knowledge about migration due to the lack of data on the subject [3]. Although many nation states, international organizations, e.g., the United Nations (UN), the International Organization for Migration (IOM) and the Organisation for Economic Co-operation and Development (OECD), and NGOs have been collecting data on international migration, the gaps in data have persisted for many decades. In recent years, there have been some improvements in the availability, quality and comparability of data on international migration [4]. The UN, for instance, has compiled estimates of migrant stocks disaggregated by age, sex, origin and destination for over 230 countries and areas, covering the period from 1990 to 2019 [5].
Additionally, in 2005 the European Commission and the Council agreed on an action plan with measures to improve the common analysis of migratory phenomena in all their aspects, including by reinforcing the collection, provision, exchange and efficient use of up-to-date information and data [6].
The Global Compact for Safe, Orderly and Regular Migration, an intergovernmentally negotiated agreement that aims to cover all dimensions of international migration in a holistic and comprehensive manner, comprises 23 objectives for better management of migration at the local, national, regional and global levels; the first objective (https://www.un.org/en/ga/pdf/guidelines_submit_draft_proposals.pdf, accessed on 15 July 2020) aims at improving data collection and analysis. International organizations, NGOs, national authorities, researchers and policymakers have all called for better-quality international migration data. Inadequate data not only hinder decision-makers around the world from developing effective policies but also lead to miscalculations and make the field difficult to navigate. Timely, comparable, reliable and effective data on migration are therefore essential for producing better estimates and indicators and for advising policymakers in developing evidence-based policies and action strategies on the migration aspects of the SDGs. The UN Secretary-General's 2016 report, for example, highlights the insufficiency of data on migration and explicitly stresses the need to bridge the gaps [3]. Shortcomings in migration data leave much of the existing data unusable for national governments and international organizations seeking to understand current migration dynamics and design relevant migration policies [5,7].

Long-Lasting Challenges with Migration Statistics
Despite the efforts of national governments and international organizations, improvements in international migration data have fallen short. Notable gaps in migration data have been discussed, albeit in a scattered fashion, in the work of almost every scholar and practitioner in the field. Although these efforts have, to some extent, drawn nation states' attention to the importance of statistical data on migration, they have failed to tackle the incompatibility of the data: countries have kept their national definitions, which are most often not compatible with the UN's 1998 recommendations [8].
A significant share of the statistics collected by national governments is not compatible with the UN recommended standards, which makes it difficult to produce rigorous information on migration. The incompatibility can stem from various aspects of the data. Gathering data under inconsistent definitions and measures undermines the comparability and harmonization of the data [5,9–19]. Differences in population totals, and in which populations are included based on varying demographic characteristics, are another reason for the incompatibility of the existing data [10,16,19,20]. Varying data collection methodologies and coverage at the national and regional levels are also a major source of incompatibility [7,10,19,21].
Data access and dissemination are further obstacles to adequate migration statistics. The UN recommendations encourage states and statistical institutions to use advances in information technology to make data publicly available [19]. Moreover, accessibility is not merely a matter of withholding data but also of an unwillingness to collect timely data for the purposes of research and policy [20]. Additionally, countries' data protection regulations partially restrict access to micro-data and administrative data [10]. In many instances, the data are not disseminated or tabulated with useful detail [5,13,14].
The differences in data collection at the regional and global levels are immense. Politically and economically challenged countries, especially in Africa and Asia, are unable to gather statistics on immigration and emigration [22]. These imbalances in data collection across regions complicate the calculation and measurement of cross-regional and interregional migration [23]. However, to date, no systematic study has investigated the international migration data literature and statistical data sources in order to conceptualize the gaps and examine the data sources directly for them. Thus, the objective of this paper is twofold: first, to conduct a systematic review of the most relevant literature on migration statistics to identify and conceptualize the gaps, and second, to review the most relevant international migration statistical data sources.
The next section describes the methodology used for this research, the third section categorizes the gaps based on the existing literature, the fourth section systematically reviews the data sources based on the gaps identified in the third section, and the fifth section discusses the gaps and challenges in the data and provides concluding remarks.

Materials and Methods
The purpose of this paper is to identify and categorize the gaps in the most relevant migration literature and statistical data sources. To that end, we conduct a systematic review of the existing literature and data sources. Systematic reviews are widely used for knowledge synthesis in fields such as medicine [24], and in recent years they have also been used to harmonize research evidence in the social and political sciences [25–28]. For the purpose of this paper, we use the definition of a systematic review by Pham et al. [28], Cooper et al. [9] and Gough et al. [29], as presented in Berrang-Ford et al. [30] (p. 756): "a systematic review refers to a focused review of the literature that seeks to answer a specific research question using predefined eligibility criteria for documents and explicitly outlined and reproducible methods". Since the paper's objective is twofold, first to identify and conceptualize the gaps in the literature and second to review the existing data based on the gaps identified in the first step, we conduct a two-step systematic review. The first step follows the PRISMA statement guidelines of Moher et al. [31].
The second step of our research examines the statistical data sources directly, evaluating the data's quality against the gaps identified in the literature in the first step. However, existing review methodologies do not offer a particular modus operandi for reviewing statistical data sources and data portals. We therefore extend the systematic review methodology in this dimension, not only for the purpose of this study but also for use in future research. We specify that reviewing the gaps in statistical data sources and portals requires (i) defining and specifying the gaps, (ii) identifying the specific data sources and (iii) establishing a repository of the metadata of the chosen datasets. Defining and specifying the gaps provides clear objectives for the review, and the identified gaps can then serve as tools to investigate the data sources. In this paper, the first step of the research defines and specifies the gaps. Predetermining the specific statistical data sources prevents overlaps in the process and clarifies the scope of the review for readers. In statistical data, metadata are key to understanding and navigating the data; therefore, reviewing the data sources requires reviewing their metadata as well.
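Step (iii), the metadata repository, can be pictured as one structured record per reviewed dataset, with a slot for each of the five gap categories. The sketch below is a minimal illustration under that assumption; the class name `DatasetMetadata`, its fields and the `gap_matrix` helper are hypothetical and do not describe the repository actually built for this review, and "Source A"/"Source B" are placeholders rather than real findings.

```python
from dataclasses import dataclass, field

# The five gap categories defined in the first step of the review.
GAP_CATEGORIES = (
    "definitions_and_measures",
    "drivers",
    "geographic_coverage",
    "demographic_characteristics",
    "timeliness",
)


@dataclass
class DatasetMetadata:
    """One hypothetical repository entry for a reviewed dataset."""
    source: str                 # data-providing organization
    dataset: str                # dataset title as published by the source
    definition_of_migrant: str  # definition stated in the metadata
    reference_period: str
    # Gap category -> reviewer's note; a missing key means "no gap recorded".
    gaps: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject notes filed under a category outside the agreed five.
        unknown = set(self.gaps) - set(GAP_CATEGORIES)
        if unknown:
            raise ValueError(f"unknown gap categories: {sorted(unknown)}")


def gap_matrix(entries):
    """Return {source: {category: flagged?}} for a summary table."""
    matrix = {}
    for entry in entries:
        row = matrix.setdefault(entry.source, {c: False for c in GAP_CATEGORIES})
        for category in entry.gaps:
            row[category] = True
    return matrix


entries = [
    DatasetMetadata("Source A", "Migrant stocks", "foreign-born population",
                    "1990-2019", gaps={"timeliness": "published with a lag"}),
    DatasetMetadata("Source B", "Migration flows", "national definition",
                    "2010-2018", gaps={"definitions_and_measures": "varies by country",
                                       "drivers": "no breakdown by reason"}),
]
for source, row in gap_matrix(entries).items():
    print(source, [c for c, flagged in row.items() if flagged])
# Source A ['timeliness']
# Source B ['definitions_and_measures', 'drivers']
```

Fixing the category list up front mirrors the point that defining the gaps first gives the review clear objectives: every metadata note must be filed under one of the agreed categories.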

Criteria for Selecting Literature and Data-Sources
The literature-selection criteria follow the instructions provided by Davidson and Carlin [32], whereby publications must be relevant, original and scientifically valid. To evaluate the gaps, we assembled as many academic articles and official reports by international organizations as possible, i.e., UN reports and declarations, IOM reports and research publications, International Labour Organization (ILO) papers, and books published by practitioners. The main selection criterion was the relevance of a publication's discussion of gaps in migration data. Original and scientifically sound literature that discussed the gaps and problems in the data was thoroughly reviewed to enable the categorization of the gaps. In the second step, the criterion for data sources was that they must present data on international migration, and both the organizations presenting the data and the sources from which the data were collected had to be officially recognized. Hence, the data sources and portals chosen for the review are those of the UN portals and agencies, i.e., UN DESA, IOM and UNHCR, as well as the OECD and Eurostat. These sources were selected because they are the main formal institutions that provide data and they fit the scope of the study, which covers international and regional data-providing organizations.

Scope of the Study
In systematic reviews, the scope defines the limits, boundaries and timeframe of the review within the literature [26]. To concentrate on the research's main objectives, we selected literature from the period 1920 to July 2020. The year 1920 marks the very first resolution on migration statistics by the International Labour Conference, submitted by the ILO's commission on migration statistics. The resolution and recommendations of the conference particularly emphasized collecting international immigration and emigration data based on unified definitions and methodology, and communicating information among ILO member countries [22]. The period 1920–2020 may seem very long for a review. However, for most of this period there was very little focus on improving migration data, and the importance of collecting data on migration was long neglected by international organizations, national authorities and academia alike; the issue has attracted attention only in the last few decades. Therefore, despite the long period covered, the literature on gaps in the data is scarce. In the context of this research, we limit our focus to sources and literature at the regional and global levels, not at the national and sub-national levels.

Search Strategy
The first step of the review followed the PRISMA guideline on gathering and classifying the literature (Figure 1). The main databases used for the search were Google Scholar, the IMISCOE Migration Research Hub and Web of Science. A total of 942 articles, books, book chapters, official documents, policy papers and reports were gathered. After screening the records, removing duplicates and assessing the full texts for eligibility based on the criteria described in the previous section, 76 items were selected for the full review. The qualitative analysis software Atlas.ti (version 8.4.4) was then used to compare and categorize the gaps. The main codes used to identify the gaps within the selected literature included "gap", "shortcoming", "incompatibility", "compatibility", "missing", "inadequate", "adequate", "challenge", "issue", "problem", "coverage", "consistency", "inconsistency", "comparability", "harmonization", "deficiency", "limitation", "constraint", "sufficiency", "insufficiency", "liability", "incomplete", "scarce", "incapable", "disparity", "existing" and "inexistence". These codes were chosen because they were often used to refer to a gap in the data or to missing data, e.g., "inadequate data on demographic characteristics of migrants", "data limitation on drivers of migration" and "inconsistency in definitions". These examples illustrate the typical syntax the texts use to refer to a gap; hence, the gaps discussed in the texts are generally expressed through one of these terms. Such combinations have been used in other fields as well to refer to data gaps [33,34]. The results were drawn after analyzing the full context of the sentences and paragraphs detected by the software.
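The coding step above relied on Atlas.ti. As a rough, tool-independent illustration of the same idea, the sketch below flags sentences containing any of the codes so that their full context can then be read manually. It is an assumption-laden approximation, not a reproduction of the Atlas.ti workflow: it uses only an illustrative subset of the codes, a naive sentence splitter and Python's standard `re` module, and the `flag_sentences` helper is hypothetical.

```python
import re

# Illustrative subset of the codes listed above (the full list is longer).
CODES = ["gap", "shortcoming", "incompatibility", "compatibility", "missing",
         "inadequate", "adequate", "limitation", "inconsistency", "scarce"]

# Word boundaries (\b) stop "adequate" from matching inside "inadequate";
# the optional "s" also catches simple plurals such as "gaps".
PATTERNS = {code: re.compile(rf"\b{re.escape(code)}s?\b", re.IGNORECASE)
            for code in CODES}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_sentences(text: str) -> list[tuple[str, list[str]]]:
    """Return (sentence, matched codes) pairs for manual contextual review."""
    hits = []
    for sentence in split_sentences(text):
        matched = [code for code, pattern in PATTERNS.items()
                   if pattern.search(sentence)]
        if matched:
            hits.append((sentence, matched))
    return hits


sample = ("There is inadequate data on demographic characteristics of migrants. "
          "Stocks are published annually. "
          "Data limitation on drivers of migration persists.")
for sentence, codes in flag_sentences(sample):
    print(codes, "->", sentence)
# ['inadequate'] -> There is inadequate data on demographic characteristics of migrants.
# ['limitation'] -> Data limitation on drivers of migration persists.
```

Broad terms such as "existing" or "issue" would flag many irrelevant sentences, which is why keyword matching can only shortlist passages; as described above, the results still depend on reading each flagged sentence in context.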
The second step of the review, which aimed at reviewing the statistical data sources, followed the guidelines for a systematic review of statistical data sources developed in the previous section. First, a wide range of private, governmental and international data sources (websites and data portals) that provide statistics on migration were identified, and five of them, namely UN DESA, UNHCR, IOM, OECD and Eurostat, were selected for the review. These sources were selected because they make their data publicly available, are authorized institutions and receive their data from official sources, such as national and local governments. The data sources and the metadata provided for each dataset were evaluated based on the gaps identified in the first step of the review.

Results and Gap Assessment
We reviewed the selected studies to investigate what scholars and practitioners describe as gaps, where data are lacking and what types of data are needed for future research. Scholars and institutional officials discussed the data gaps on an ad hoc basis, according to their disciplinary focus and institutional work processes. As a result, the gaps in data are discussed in a scattered fashion across the literature of various disciplines and fields of study. We analyzed all criticisms of the data within the studies and sought to understand the reasoning behind them.
We coded 916 gaps within the 76 studies and then sorted them. Each code was evaluated according to the aspect of the data it referred to. For instance, the codes "incompatibility", "incomparability" and "harmonization" were often used to refer to definitions and measurements, the codes "scarce", "inexistence" and "limitation" to geographical coverage, and the codes "inadequate", "insufficient", "deficiency" and "limitation" to the demographic characteristics of the data. Figure 2 shows that of the 916 gaps discussed in the literature, 611 referred to definitions and measures, 113 to demographic characteristics, 101 to drivers and reasons, 48 to geographic coverage and 43 to the timeliness of the data. This allowed us to conceptualize the gaps under specific categories. For instance, if the studies discussed the incompatibility of data collected under differing definitions, the issue was classified as a gap in definitions and measures, and if scholars discussed the lack of data for specific regions or the geographical coverage of the data, the problem was classified as a gap in geographical coverage.
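The category counts reported above (Figure 2) can be checked and expressed as shares with a few lines of standard-library Python. This is only an arithmetic sketch of the reported figures; the category labels are shortened paraphrases chosen for the example.

```python
from collections import Counter

# Category counts as reported in the text (Figure 2).
counts = Counter({
    "definitions and measures": 611,
    "demographic characteristics": 113,
    "drivers and reasons": 101,
    "geographic coverage": 48,
    "timeliness": 43,
})

total = sum(counts.values())
assert total == 916  # the five categories account for every coded gap

for category, n in counts.most_common():
    print(f"{category:<30}{n:>5}{n / total:>8.1%}")
# definitions and measures        611   66.7%
# demographic characteristics     113   12.3%
# drivers and reasons             101   11.0%
# geographic coverage              48    5.2%
# timeliness                       43    4.7%
```

Expressed this way, roughly two thirds of all coded gaps concern definitions and measures, which is consistent with treating that category as the main issue in the next subsection.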
Following the software-based results, we reviewed the literature manually to check the accuracy of the software-based analysis. Table 1 shows the results of the manual review, which are in line with the previous results. We classified the studies as scientific articles, UN and other official reports, books and book chapters, conference proceedings, and methodological and expert workshop reports. We sorted the gaps into five categories: (i) definitions and measurements, (ii) drivers or reasons behind migration, (iii) geographic coverage of the data, (iv) demographic characteristics and (v) the time lag in the availability of data. Below is a brief explanation of these five categories, followed by the second step of this review, which evaluates the statistical data sources.

Definitions and Measurement
According to our analysis in the previous section, the main issue in the data is the differences in definitions and measures. Countries report data to international migration data sources under different definitions and measures [42]. This problem has existed ever since international organizations began gathering data on international migration, and almost every report and article on international migration data has emphasized gaps in definitions. The 1922 conference on migration data was organized by the International Labour Organization to provide recommendations for improving the definitions and comparability of the data. It was followed by the 1932 conference on defining long-term and short-term migrants [22]. These recommendations were followed by many additional UN recommendations for improving migration statistics, issued in 1949, 1978, 2002, 2017 and 2019. However, the recommendations have not been fully implemented across countries.
There have been several attempts by the UN, ILO, OECD, the European Union and other international organizations and NGOs to resolve the gaps in definitions and measures of international migration data [7], but with little success. In 2007, for example, the European Parliament adopted a regulation on migration statistics, which provides relatively clear definitions of immigration and emigration and specifies the categories under which the data must be reported to Eurostat, the statistical office of the European Union. Nevertheless, the regulation does not restrict how member states produce the required data, including their estimation methods [7,53]. The EU regulation of 2007 defines an international migrant as 'a person who moves to a country other than that of his or her usual residence for a period of at least a year', which complies with the United Nations' 1998 definition [7].
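Why differing definitions undermine comparability can be shown with a toy example. The sketch below applies the "at least a year" duration criterion quoted above to a handful of moves, then recounts the same moves under shorter thresholds that stand in for hypothetical national definitions. The `Move` records and the 6- and 3-month thresholds are invented for illustration and are not taken from any country's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    person_id: str
    months_of_stay: int  # actual or intended duration in the destination country


def is_migrant(move: Move, min_months: int = 12) -> bool:
    """Count a move as migration if the stay lasts at least `min_months`.

    The default of 12 months mirrors the 'at least a year' criterion;
    other thresholds stand in for hypothetical national definitions.
    """
    return move.months_of_stay >= min_months


# Hypothetical moves of four people with different lengths of stay.
moves = [Move("a", 4), Move("b", 9), Move("c", 12), Move("d", 30)]

# The same four moves, counted under three definitions.
for threshold in (12, 6, 3):
    print(threshold, sum(is_migrant(m, threshold) for m in moves))
# 12 2
# 6 3
# 3 4
```

Identical underlying movements yield migrant counts of 2, 3 or 4 depending only on the definition, which is why figures reported under national definitions cannot simply be added or compared.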
Another attempt at harmonizing international migration data is the creation of the Global Migration Data Portal in 2016. This initiative was taken by IOM in cooperation with several other organizations and agencies to address the timeliness, comprehensiveness and reliability of migration statistics. The portal brings together data from different sources that are otherwise spread across different organizations and agencies. While this initiative is considered a great advance towards the harmonization of migration data, more work is still required to harmonize the definitions.

Demographics
Demographics are the most important aspect of the migration phenomenon, yet they are particularly difficult to measure [52]. Conventionally, statistical data on international migration have been collected with a focus on labor mobility and border crossings [35]. The available datasets largely neglect the demographic characteristics of global migrants [18,50]. The missing demographic information includes, e.g., details on irregular migrants, information on visas, intra-EU mobility, travelers and other disadvantaged groups [53]. Kupiszewski and Kupiszewska [74], in a book chapter on the demographic consequences of migration, explicitly state that many demographic characteristics of migrants are not available in the data, which makes understanding, forecasting and projecting future migration difficult. In 2007, the UN's expert group meeting on the use of censuses and surveys to measure international migration recommended that data be collected on sex, age group or single year of birth, country of citizenship, country of birth, country of previous or future residence, marital status, educational attainment, purpose and duration of stay abroad, occupation, status of employment and industry of employer in the previous country of residence and in the receiving country, type and duration of validity of permit, and occupation, characteristics [39]. However, the existing datasets have not yet fully adopted these recommendations. Moreover, the existing formal data sources do not make sufficient effort to estimate or present counts of so-called "illegal migrants" or hidden populations [75]. The term "hidden populations" often refers to drug users and locals in specific circumstances [76], but undocumented migrants, the Romani population and travelers are other hidden populations that are understudied in academia and neglected in databases.

Drivers (Reasons)
A third longstanding problem with the existing data on international migration is the missing information on the reasons behind immigration and emigration [5,8,10,20]. The data sources do not always cover the reasons for departure or return. In some countries, censuses record these reasons in considerable detail, but in the majority of countries such data are missing or not reported [22]. Migration theories are built on empirical evidence on the economic, political, social, cultural, religious, psychological, emotional and environmental motivations for individuals' movement. Nevertheless, the existing data hardly correspond to these reasons [17]. Because of a high level of aggregation, many data sources do not break the data down by reason, with the exception of Eurostat, which covers family, education, work and other reasons [8,17]. The category "other reasons" itself accounts for a large part of the data and is not explained in the metadata.
Additionally, the issue with the (mainly non-European) data is that the officially stated reason for the stay does not always describe the immigrant's actual intentions [10]. This problem arises in particular when an immigrant is admitted for family reasons and family members are also allowed to work without having to change their status [10,22]. Furthermore, even when data on reasons are collected in surveys, they are not readily available for research [53]. Another problem with the "reason" gap is that the system can register several motives for the same person but cannot distinguish between the primary and other reasons [10].
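The last point, several registered motives and no primary one, has a direct consequence for tabulation. The minimal sketch below uses three invented records and the four reason categories mentioned above (family, education, work, other). Counting every motive yields more reason counts than persons, whereas a primary-reason field, assumed here purely for illustration since the text notes such systems lack one, would count each person once.

```python
from collections import Counter

# Hypothetical records: each person may have several registered motives,
# with no indication of which one is primary.
records = {
    "p1": {"family", "work"},
    "p2": {"education"},
    "p3": {"family", "work", "other"},
}

# Tabulating every registered motive counts some people more than once.
by_reason = Counter(reason for reasons in records.values() for reason in reasons)
print(sum(by_reason.values()), "reason counts for", len(records), "persons")
# 6 reason counts for 3 persons

# A hypothetical primary-reason field would give one count per person.
primary = {"p1": "family", "p2": "education", "p3": "work"}
by_primary = Counter(primary.values())
print(sum(by_primary.values()), "reason counts for", len(records), "persons")
# 3 reason counts for 3 persons
```

Without the primary field, the only options are to overcount persons or to pick one motive by an arbitrary rule, and neither matches the person-level totals that stock and flow statistics report.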

Geographical Coverage
The fourth most emphasized gap in the existing data on international migration is the uneven geographical coverage of the data across regions [7,8,12,13,20,55]. Coverage varies across databases [40]. Biases in data coverage arise because some countries are able to collect high-quality data, some are obliged by law to collect data, and developing countries are considerably less well documented than developed countries [19]. Coverage gaps are not confined to developing countries; developed countries also have populations that remain uncovered [7]. Non-traditional drivers of migration are also, to some extent, linked to geographic coverage, as data on migrant stocks do not include populations migrating, for instance, as a result of environmental change or lifestyle. Moreover, for countries for which data do not exist, the UN and other organizations present estimates of migrant stocks and flows. However, the problem with such estimates lies not only in their accuracy and the methods used, but also in their limited ability to provide extended demographic detail [6].
IOM's World Migration Report 2020 recommends developing further technical capacity in countries that are not yet able to gather data on migration, in order to build a more comprehensive global picture of key aspects of migration [77]. With the development of technology and the expansion of means of transport for migration, e.g., air transport, there are new ways of collecting data on migration flows. However, these methods are not yet sufficiently advanced, owing to national governments' lack of interest and to national regulations on the ethics and ownership of data.

Timeliness
Failure to supply data in a timely manner is another prominent challenge in the existing data on international migration. Publication often lags behind, and the lag varies by region and data source. Timely data help researchers and policymakers develop relevant and accurate migration policies. The need for timely data has been emphasized by scholars as well as by the UN and other international organizations [5,36,40,44]. In 2014, the United Nations Secretary-General's independent advisory group on "Data Revolution for Sustainable Development" called for, among other things, timely and up-to-date data on international migration [5,73]. Certain issues of timeliness are directly related to the harmonization of data on internal migration: the data are collected at lower levels and then published by national governments and international institutions [8,12,15,55]. Data collection is very time-consuming, and its duration differs from country to country. For instance, some countries publish data annually, while others release them every two, three or five years. Merging these data into a single harmonized international migration dataset therefore degrades their quality significantly.
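The effect of mixed release cycles can be made concrete with a small sketch. Assuming three hypothetical countries that publish on one-, two- and five-year cycles with publication lags of one, two and three years (illustrative figures, not taken from any data source), the code finds each country's latest published reference year and the latest reference year that all three can supply for a harmonized cross-section.

```python
def available_years(first_year: int, cycle: int, lag: int, as_of: int) -> set[int]:
    """Reference years published by `as_of`, given a release cycle and a lag.

    A reference year y counts as published once y + lag <= as_of.
    """
    return set(range(first_year, as_of - lag + 1, cycle))


# Hypothetical countries: (first reference year, cycle in years, lag in years).
countries = {
    "A": (2000, 1, 1),
    "B": (2000, 2, 2),
    "C": (2000, 5, 3),
}

as_of = 2020
published = {c: available_years(*params, as_of) for c, params in countries.items()}
latest = {c: max(years) for c, years in published.items()}
common = set.intersection(*published.values())

print(latest)       # {'A': 2019, 'B': 2018, 'C': 2015}
print(max(common))  # 2010
```

In this toy setup, country A already has data for 2019, yet the most recent reference year shared by all three countries is 2010; a harmonized dataset must either fall back to that year or mix reference years, which is one way harmonization can cost timeliness and quality.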

Combing through the International Statistical Data Sources for the Gaps
The previous section systematically reviewed the literature on international migration statistics to detect and conceptualize the gaps in data. The results show that the gaps can largely be categorized under five major groups: (i) inconsistent definitions and measurements, which hinder the combination and comparison of data gathered from different countries; (ii) gaps in recording the reasons behind migration; (iii) gaps in data coverage across geographic regions; (iv) gaps arising from data collection that pays little attention to migrants' demographic characteristics; and (v) gaps in the timely availability of data. This section systematically reviews the international statistical data sources on migration against these five categories to investigate the extent to which the gaps are evident in them. The review follows the criteria and strategy elaborated in the methodology section.

Sources for International Migration Statistics
Several international and regional organizations gather empirical data on international migrants. The United Nations Department of Economic and Social Affairs (UN DESA), the OECD, Eurostat, the World Bank, UNICEF, McKinsey & Company, the Economist Intelligence Unit and other UN agencies are among the most prominent data providers. In 2015, the IOM established the Global Migration Data Analysis Centre (GMDAC). The GMDAC works closely with other agencies that collect data on migration, such as the European Commission's Knowledge Centre on Migration and Demography (KCMD), UN DESA, the World Bank, UNICEF, McKinsey & Company, the Economist Intelligence Unit and the OECD, to improve the collection, analysis and use of migration data to inform policies and programs. The GMDAC hosts the "Migration Data Portal", which aims to gather data that are scattered across different organizations and agencies. The portal does not collect data itself but serves as a single access point to timely, comprehensive and reliable migration statistics for policymakers, national statistical officers, journalists and the general public [5]. While the GMDAC's Migration Data Portal is considered a major step towards data harmonization, the gaps in international migration data remain highly problematic [44]. Table 2 provides a summary of the review results for the given data sources, and the subsequent sections explain the gaps in detail.

UN DESA
The UN DESA data are derived from the population censuses, population registers and nationally representative surveys of UN member states. For countries that are not able to record data on migration, the UN presents estimates of total international migrant stock by age, sex, origin and destination.

Table 2. Summary of the review results for the data sources.
Columns: Data Sources; Definitions; Drivers (Reasons); Geography; Demographics; Timeliness.

OECD
Definitions: The data are provided by member states and are not necessarily based on common definitions and measures; the OECD therefore generally describes migrants as the foreign population.
Drivers (Reasons): Reasons for migration receive little attention in the OECD database. Only the worksheets on permanent migration inflows include drivers, such as family, accompanying family of workers, work, free movement, humanitarian and other reasons.
Geography: The data cover all OECD member countries and Russia.
Demographics: The data cover employment, unemployment and participation rates and, to some extent, the sex, country of birth and citizenship of immigrants across countries. Other demographic characteristics of migrants are not included.
Timeliness: The data are available from 1995 to 2016 and are updated annually; however, major gaps exist in the statistics provided by some countries for some years.

Eurostat
Definitions: Definitions and practices for producing statistics differ between countries.
Drivers (Reasons): The only statistics on drivers (reasons for migrating) concern the reasons for issuing first residence permits: family, education or employment.
Geography: Eurostat statistics are collected for the EU Member States and EFTA countries.
Demographics: Eurostat provides breakdowns by various characteristics, usually age, sex, country of birth and citizenship. For many topics (e.g., "Demography and migration" and "Migrant integration"), Eurostat also allows comparison with non-migrants. Data on minors are also presented, especially under "Asylum and managed migration".
Timeliness: The data are presented for each year and are meant to be published within one year; in practice, new data usually take about two years to become available.

IOM
Definitions: Definitions of the collected data may differ between countries, which affects the comparability and reliability of the statistics.
Drivers (Reasons): Data are collected on the repatriation, resettlement and return of refugees, victims of trafficking, stranded transit migrants, internally displaced persons, unsuccessful asylum seekers and soldiers who participated in demobilization programs.
Geography: The organization publishes data gathered through its operational missions and projects in over 133 countries.
Demographics: The data cover 66 variables on the sociodemographic profile of trafficking victims (e.g., gender or level of education), the trafficking process and the type of exploitation.
Timeliness: Data collected routinely in over 133 countries as part of IOM's operational missions are published online on an irregular basis, with a lag of at least one year.

UNHCR
Definitions: The figures suffer from limited comparability, because each UNHCR member state understands and defines the population types differently.
Drivers (Reasons): UNHCR has collected data on refugees, asylum seekers, internally displaced persons (IDPs), returned refugees, returned IDPs, stateless persons and others since 1951. The category "others" includes individuals who do not fall directly into any other group but to whom UNHCR extends protection and/or assistance on humanitarian grounds.
Geography: The data are collected in the areas where the agency is active and from UNHCR member states.
Demographics: The data are broken down by sex and age and, whenever available, by location within the country of residence.
Timeliness: Data are published on an irregular basis.
The UN DESA distinguishes between developed regions, less developed regions and least developed countries. The developed regions comprise Northern America, Japan, New Zealand, Australia and Europe; the less developed regions comprise most countries in Africa, Asia (excluding Japan), Latin America and the smaller island regions. The least developed countries are the 47 nations designated as such by the UN General Assembly. The UN migrant stock data are publicly available in five tables with demographic characteristics, such as age group and sex, and three tables by country of origin and destination. The data are accompanied by a brief metadata note prepared by the UN statistical team, which does not explain the methods by which the data were arranged, compiled and estimated.

Definitions and Measures
We investigated the gaps in definitions and measures in the data provided to the UN DESA datasets by member states. The data presented on the UN DESA portal are drawn from the censuses and population registers of UN member states where available, and the rest are estimates based on the foreign-born population reported in other data sources. However, compiling data from multiple sources and origins into accurate, harmonized data on international migration is an extremely complicated task. The UN DESA datasets use rather broad definitions to overcome this problem. For instance, if census data provide explicit figures on the stock of refugees, migrant workers or family reunification migrants, UN DESA presents them under the foreign-born and foreign population categories. These categories are, however, very broad descriptions of international migrant stocks.
Another challenge for UN DESA is the variation in national regulations and administrative data collection procedures. Country of citizenship can be used as a basis for identifying international migrants, but the population it captures differs from country to country. Some countries grant citizenship to migrants' children at birth, while others confer citizenship according to the nationality or ethnic background of the parents, even when they live abroad. A major issue with the UN DESA data is that the metadata provide insufficient information on the definitions used by source countries and on the methods of data compilation.

Drivers
The UN DESA data mainly report the stock of migrants, with limited focus on the reasons and drivers of migration. To some extent, UN DESA distinguishes refugees and forced migrants from all other categories of migrants. Estimates are presented for stocks, with variables such as sex, age, and countries of destination and origin. However, the reasons behind migration are yet to be included in the data, except for refugees. Some data collected during UNICEF operations and contributed by UNICEF member states are also published on the UN DESA website. These data are more detailed in terms of demographic and social variables, but the reasons behind those migrations are not covered, or at least not published online.

Geographic Coverage
UN DESA covers a broad range of countries, states and regions around the world. Empirical and estimated data are available for about 232 countries and areas. However, as explained previously, the countries and regions are divided into developed, less developed and least developed regions, and the quality of coverage differs significantly across these divisions. Data for the less developed and least developed regions are of lower quality and rely mainly on estimates, whereas developed countries, such as members of the EU, EFTA and OECD, are obliged to provide statistics on their emigration and immigration. This inequality in data coverage is a highly problematic issue for producing international migration statistics. In sub-Saharan Africa alone, over 14 percent of countries had been unable to update their total number of international migrants since the 2000 round of population censuses [19].

Demographic Characteristics
The International Migrant Stock 2019 datasets of UN DESA release estimates by sex, age group, and country of origin and destination for all migrants and for refugees. These are highly important characteristics. However, owing to variations in data sources and the UN's dependency on member states, providing data with more specific demographic characteristics remains a challenge for UN DESA. For data with more precise variables on characteristics, UN DESA refers users to local data-providing institutions. Data from other UN agencies, however, are available with richer characteristics; the demographic characteristics of UNHCR data, for instance, are discussed in the UNHCR section below.

Timeliness
Presenting timely data is complex for several reasons. First, a high degree of coordination is needed to compile data from all areas and regions. Second, many countries and areas are not able to collect data, so estimates are required for them, which is a relatively time-consuming process. Third, for many countries, timely data on migration is not a priority, and receiving data from them on a regular basis takes a long time. Finally, for many countries and regions the data come from national censuses, which are not conducted on the same schedule across countries. Currently, UN DESA estimates are available for the years 1990, 1995, 2000, 2005, 2010, 2015 and 2019. There have been some improvements in the timely publication of data by the UN: for instance, the previous round of estimates was released in 2017 and updated in 2019.

OECD
The OECD manages three major databases on migration, namely the OECD International Migration Database, the Database on Immigrants in OECD Countries and the Indicators of Immigrant Integration. The International Migration Database presents data on annual flows, stocks and acquisition of nationality by the foreign-born and foreigners across OECD member states. The Database on Immigrants in OECD Countries provides comparative information on a relatively broad range of demographic and labor characteristics of immigrants living in OECD countries and in countries cooperating with the OECD. The Indicators of Immigrant Integration portal provides a set of indicators of immigrant integration in the fields of employment, education, skills, social inclusion, civic engagement and social cohesion. Below, we present the results of the review of OECD data based on the five gap categories.

Definitions and Measures
Data for the OECD databases are acquired from member states' national correspondents. In addition to the OECD member states, the Russian Federation also contributes to the OECD data as part of its partnership with the OECD. The data provided to the OECD by contributing countries do not necessarily follow unified definitions and measures, as they are collected under different definitions in each member state. The OECD broadly defines migrants as the foreign-born population and foreigners (foreign population). These broad categories result from the lack of unified definitions among data providers. In the datasets, 'foreign population' is used to describe all types of migrants, and the terms inflow and outflow are used for the entry and exit of all types of migrants.
The OECD data are derived from population registers, residence permits, labor force surveys and censuses. Many countries provide statistics from population registers on the number of residence permits and on the stocks and flows of migrants. Other countries use censuses to produce data with the same specifications. Using national registers and permit data to show stocks and flows can leave some migrant populations out of the records. For instance, people, including migrants' family members, who can move without a permit under free-movement regimes are left out of the official statistics. In addition, many countries are not able to record emigrants.
Census records are usually more accurate and have many advantages, but not many countries collect census data. In recent years, the OECD has used labor force survey data, which are relatively better in terms of definitions and measurements, particularly on questions of nationality, place of birth, labor market activity and so forth. Compared with other data providers, the OECD data also cover, to some extent, underrepresented populations, such as undocumented migrants.

Drivers
Like UN DESA, the OECD database focuses on migration stocks rather than migration flows. Flows are covered within the databases, but some important variables that show the drivers of migration and the reasons behind it are yet to be included. The datasets on permanent migration flows include some variables on reasons for migration, such as family migrants, workers, family members of migrants, migration under free movement, migration permits on humanitarian grounds, and a category of "other migrants". However, these variables are not always explained in the metadata and statistical annexes. Moreover, most of the OECD data are complemented by data from UN agencies and Eurostat.

Geographic Coverage
The OECD provides data at the international and OECD member state levels. Stock and flow data generally come from OECD members and partner countries, while emigration data are global. Data for Russia are also included, and data on the Golan Heights, East Jerusalem and Israeli settlements in the West Bank are recorded by Israel. Data for Cyprus are not unified: only southern Cyprus, which is controlled by the government of the Republic of Cyprus, provides data to the OECD.

Demographic Characteristics
Depending on the database and the types of migration in the datasets, the OECD provides data on different demographic characteristics. The main Database on Immigrants in OECD Countries contains data on immigrants by citizenship, age, sex, occupation (including detailed occupation), duration of stay, field of study, labor force status and sector. The OECD International Migration Database provides data on inflows of foreign populations by nationality, outflows of foreign populations by nationality, inflows of asylum seekers by nationality, stock of foreign-born populations by country of birth, stock of foreign population by nationality, acquisition of nationality by country of former nationality, stock of foreign-born labor by country of birth and stock of foreign labor by nationality. Other datasets provide employment, unemployment and participation rates by place of birth and sex, as well as employment rates by place of birth and educational attainment for people aged 25-64.
The OECD and the European Commission implemented a joint project to create a Migration Demography Database. The project monitors the demographic impact of migration and mobility on labor force dynamics and investigates the role of migrants in labor markets, occupations and skills development. This project is expected to have a considerable impact on understanding the demographics of migration across OECD and EU countries.

Timeliness
The timeliness of the OECD data differs across databases. The data in the Database on Immigrants in OECD Countries are all drawn from the 2000 census round. The OECD International Migration Database provides annual data for the years 2000 to 2017. The datasets on employment and unemployment provide annual data for 2000 to 2018, and the data on employment rates cover 2000 to 2015. This implies time lags of between two and 20 years. It is worth mentioning that, since EU member states are legally required to provide data to Eurostat, data for EU countries in the OECD databases are more up to date than those for other regions.

IOM
IOM works with other UN agencies and with governmental and non-governmental organizations to support migrants crossing borders, and sometimes takes action in emergencies. In the course of its migrant support missions, IOM collects data on border crossings and other operational activities. Moreover, IOM hosts the GMDAC Migration Data Portal, discussed above. Our review covers only the data contributed by IOM as an organization, not the GMDAC portal.

Definitions and Measures
The IOM data are specific to its operational missions, and because IOM is a single international organization that supports migrants in crossing borders legally and records missing migrants, the definitions are unified at the organizational level. However, local authorities in different countries may have their own interpretations and definitions of certain types of migrants.

Drivers
The IOM presents data for over 133 countries and regions on the repatriation, resettlement and return of refugees, victims of trafficking, stranded transit migrants, internally displaced persons, unsuccessful asylum seekers, and soldiers who participated in demobilization programs [38].

Geographic Coverage
IOM provides data for the countries and regions where it is active and has operational missions. Currently, IOM has more than 480 Country Offices and Sub-offices worldwide.

Demographic Characteristics
The organization provides data based on 66 sociodemographic variables on the profiles of trafficking victims, including education, sex, trafficking region and type of exploitation, among others. Since the data concern vulnerable populations, the information remains confidential.

Timeliness
Although the IOM data are gathered at the institutional level, the information is usually updated with a considerable time lag. Statistics on assisted voluntary returns exist for the years 2012 to 2018, and information on missing migrants is available up to 2020.

Eurostat
Eurostat, the statistical office of the European Union, provides rather comprehensive data on migration compared with other international migration statistical sources. Eurostat's data on migration include annual statistics on immigration and emigration flows with various breakdowns, including country of birth, citizenship and regional level, as well as demographic indicators such as total fertility rates, life expectancy, median age and naturalization rate. Eurostat also provides annual data on the demography of migration and population projections. Moreover, Eurostat provides data on asylum and managed migration, covering the number of asylum applicants and decisions on applications, residence permits issued, the enforcement of immigration legislation and children in migration. Additionally, Eurostat provides data on migrant integration, which describe the integration of migrants in their host country through rates of employment, education, health, social inclusion and active citizenship.

Definitions and Measures
The data are recorded by member states on an annual basis and are supplied to Eurostat by the national statistical authorities of the EU-27 Member States. Eurostat data come from administrative sources, mirror statistics, sample surveys and migrant population statistics based on member states' estimates [78]. The administrative data come from population registers, registers of foreigners, registers of residence or work permits, health insurance registers and tax registers. The EU regulation on migration statistics and the EU institutions working on migration continually call for improving the comparability of migration data through unified statistics from member state sources, but providing data under a unified definition is highly complex. The complexity stems from differences in methods and definitions across the administrative systems of the 27 member states and from their varying methods of estimation.

Drivers
Eurostat statistics provide data on the reasons and drivers of migration. In comparison with other data-providing organizations, Eurostat's data on flows and stocks cover the reasons behind migration more comprehensively. They are classified by regular and irregular border crossings, type of visa and permit issued, family reasons and migration on humanitarian grounds. Data on other reasons, such as environmental change, lifestyle migration and ideological beliefs, are not recorded.

Geographic Coverage
Eurostat data are available for the EU Member States, EFTA countries and sometimes for EU candidate countries such as Turkey and other eastern European countries. The immigration data include migration from all corners of the world to the EU. The Eurostat database presents data at different geographic levels: the country level, NUTS-1 (the macro-regional level), NUTS-2 (the regional level) and NUTS-3 (the subregional level, including provinces and metropolitan areas, where applicable). The review shows that Eurostat's data coverage is more complete than that of other databases.

Demographic Characteristics
Age group, sex, country of birth, country of citizenship and sometimes country of previous residence are the main demographic characteristics of migrants in the Eurostat datasets. Eurostat considers that the existing statistics should go beyond these limited demographic characteristics and cover more inclusive socioeconomic aspects of the migratory movements of migrants and their descendants [79]. A prominent problem with current Eurostat data is that, for a large proportion of the population, no demographic characteristics or reasons for migration are recorded, and these records are grouped under the category "others".

Timeliness
Eurostat data cover the period since 1990, with some disruptions due to regulatory and methodological changes. Annual data on migration have been available since 2009. However, the data are not published in a timely manner, and there is usually a gap of one to five years before the datasets are fully updated.

UNHCR
The United Nations High Commissioner for Refugees (UNHCR) is one of the major UN agencies working on refugee issues around the world. UNHCR collects data on the populations it works with, which include internally displaced persons, asylum seekers, people with refugee status, returned migrants and returned internally displaced persons, among others. Additionally, UNHCR collects data on vulnerable people who do not fall directly under the above-mentioned categories. The data mainly cover the general composition of the population covered, such as country or area of residence, origin and displacement.

Definitions and Measures, Reasons, Geographic Coverage and Timeliness
UNHCR data are collected as part of the work of UNHCR country and regional offices. Therefore, definitions and comparability are not an issue for such data. However, perceptions and understanding of these issues at the country and local levels where the data are collected might differ from one country or region to another. For instance, in Costa Rica, a stateless person is someone with an undetermined nationality, while in Haiti, the term refers to an individual without a nationality who was born in the Dominican Republic before January 2010. Compared with other data-providing organizations, UNHCR publishes its data in a much more timely manner. However, there is still a time lag of at least one year before the data are available online.

Demographic Characteristics
UNHCR statistics are available from 2000 onward, and the breakdowns include sex, age and often location within the country of residence.

Discussion
Considering the complexity of migration and its links to the SDGs, we conducted a two-phase study from the perspective of international migration statistics. In the first phase, we conducted a systematic review of over 940 articles, books, book chapters, official documents, policy papers and reports to identify the evidenced gaps in international migration data. Our results demonstrate that the significant gaps can be categorized under (1) definitions and measures, (2) drivers or reasons behind migration, (3) geographic coverage of the data, (4) gaps in demographic characteristics and (5) the time lag in the availability of data.
The second phase of the study concentrated on diagnosing these gaps within the major international organizations collecting migration statistics. Our main findings indicate that these gaps are not mutually exclusive but interlinked. We also argue that the quality and availability of both migration flow and stock data vary across regions and countries. Although developing and disadvantaged countries are the main sending countries, migration statistics rely on immigrants' arrival; hence, host (developed) countries collect more indicators and more detailed and recent data. This, in turn, deepens the shortcomings of emigration statistics.
In addition, timeliness is directly associated with the harmonization of data. The harmonization challenge is rooted in bureaucratic processing times and differences between institutes and organizations but, more importantly, in differences in definitions. Even with substantial UN recommendations for progressive harmonization, there are still no internationally well-accepted definitions of migrants and of the different types of migration. Comparability of migration data (including migration types) across countries is only possible when the legislative and regulatory definitions that determine national data collection means and methods are harmonized. Until such harmonization is achieved, the metadata should at least be adapted to clarify the indicators.
Furthermore, every data source for international migration relies on census and register data for indicators of migration stocks, and both have shortcomings for making data available in a timely manner. Census data have their own pitfalls, such as long intervals between waves (censuses are usually held every 10 years), a lack of detailed information on migration drivers (reasons), and varying treatment of refugees and asylum seekers. To compensate for these, register data are usually deployed. However, national administrative registers are mostly not suitable for cross-country comparisons and fail to cover irregular migrants, internally displaced people and the homeless. Across different regions, especially in the European Union, this lack of knowledge has caused ambiguity in the policies of the Union and its member states, affecting the Union's constitutional goals, such as the right to free movement and social security [80]. In addition, passive policies can have disastrous effects on national and regional development and competitiveness [81].
Finally, our findings illustrate that more data exist on convention-based international migrants, migrant stocks, labor migrants, family migrants and students, whereas fewer data are gathered on irregular migration, smuggling, missing populations, migration policies, return migration and migration flows. Additionally, measuring progress towards the SDGs requires comprehensive disaggregation of data. Under-coverage of migrants in the data, or overlooking the intersectional features of migration drivers, impedes not only the estimation of migration indicators but also the recognition of migrants' needs. Given the SDGs' fundamental priority of "leaving no one behind", the drawbacks in migration statistics cast doubt on the achievability of the migration-related targets, since migrants and refugees are often marginalized and the existing statistics systematically miss these groups.