Appendix A. Stata Code Used to Create Total and Sex-Specific Intermunicipal and Interstate Origin-Destination Matrices from IPUMS International Publicly-Available Data
* ==============================================================================
* File: MigrationStreamCreation.do
* Created by: Daniel H. Simon & Fernando Riosmena (University of Colorado)
* Date: November 30, 2018
* Last Updated: January 20, 2019
* Description: DO FILE FOR CREATION OF INTERNAL MIGRATION STREAMS
* ==============================================================================
* Code for data submission, "Estimating internal migration in contemporary Mexico:
* The missing link in understanding spatial distributions of population"
* Data used: Mexican census data (2000 & 2010) obtained through the IPUMS-International extract system, which is public use
* ==============================================================================
* LOADING AND CLEANING IPUMS DATA
* ==============================================================================
clear //clear existing data
clear matrix //clear matrices stored in Stata memory
set more off, permanent
/* Replace with file path for IPUMS extract, or place IPUMS extract in same directory as do file */
cd "."
/* Replace with name of dataset with raw census data */
use "IPUMSMX2000-2010.dta"
/* As per the IPUMS International user agreement, the original raw data must be downloaded directly from IPUMS International by any interested user */
/* Here, we present the code needed to aggregate flows, which can also be adapted to many other countries with minor adjustments (e.g., geocodes), */
/* while sharing the final aggregated output data */
* 5-YEAR MIGRATION QUESTION N/A FOR CHILDREN UNDER 5
drop if age<5
* STATE AND MUNICIPALITY OF ORIGIN/DESTINATION
gen STATE_dest=. // Generate state of destination variables for each census year
replace STATE_dest=GEOLEV1 //Harmonized, first administrative division boundary
gen STATE_orig=. // Creating state of origin from variable that identifies "State or country of residence in 1995 for 2000 Census & 2005 for 2010 Census"
replace STATE_orig=mx2000a_resent if year==2000 & mx2000a_resent>=1 & mx2000a_resent<=32
replace STATE_orig=0 if year==2000 & mx2000a_resent>32 & mx2000a_resent<999
replace STATE_orig=mx2010a_migstat5 if year==2010 & mx2010a_migstat5>=1 & mx2010a_migstat5<=32 // Same procedure for 2005
replace STATE_orig=0 if year==2010 & mx2010a_migstat5==97
/* Note that this variable is NOT harmonized across IPUMS, so users need to download the one specific to each country-period of interest */
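* Illustrative check (an addition, not part of the original workflow): by construction,
* STATE_orig should now be missing, 0 (abroad), or a state code from 1 to 32.
assert inrange(STATE_orig, 0, 32) | missing(STATE_orig)
tabulate year if missing(STATE_orig) // respondents with no usable prior state of residence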
gen MUNID_dest=.
replace MUNID_dest=GEOLEV2 //Harmonized, second administrative division boundary (Stata names are case-sensitive; match the spelling in your extract)
gen MUNID_orig=.
replace MUNID_orig=STATE_orig*1000+mx2000a_resmun if year==2000 & STATE_orig>0 & STATE_orig<33
replace MUNID_orig=mx2010a_migmuni5 if year==2010 & mx2010a_migmuni5>=1001 & mx2010a_migmuni5<=32999
replace MUNID_orig=0 if STATE_orig==0
/* Note that this variable is NOT harmonized across IPUMS, so users need to download the one specific to each country-period of interest */
// 999 CODE (STATE CODE x 1000 + 999) IDENTIFIES INDIVIDUALS WITH UNKNOWN MUNICIPALITY OF ORIGIN
replace MUNID_orig=. if MUNID_orig==99998
forvalues s=1/32 {
replace MUNID_orig=. if MUNID_orig==`s'*1000+999
}
* IDENTIFY MIGRANTS FOR STREAMS OFF HARMONIZED VARIABLE
// 10: Same major admin. unit; 11: Same major, same minor; 12: Same major, different minor; 20: Different major; 30: Abroad; 99: Missing
gen migrant=0
replace migrant=1 if migrate5==10
replace migrant=1 if migrate5==12
replace migrant=1 if migrate5==20
replace migrant=1 if migrate5==30
replace migrant=. if migrate5==99
// Create identifier for inter-state migrants only
gen interstatemig = 0
replace interstatemig=1 if migrate5==20
replace interstatemig=. if migrate5==99
// Create migrant sex identifiers for flows by sex
// User can modify to stratify flows by e.g., age, schooling levels or the like
// Beware, however, that these measurements are *after* migration,
// so either need to assume no changes in e.g., schooling,
// or adjust for the time change (in e.g., age)
gen migrantmale=0
replace migrantmale=1 if sex==1 & migrant==1
replace migrantmale=. if migrant==. | sex==.
gen migrantfem=0
replace migrantfem=1 if sex==2 & migrant==1
replace migrantfem=. if migrant==. | sex==.
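* Illustrative sketch (an addition, not the authors' code) of the stratification mentioned above:
* flag migrants aged 20-34 at the census, i.e., roughly 15-29 at the start of the 5-year window.
* The age band and the name migrant2034 are hypothetical; the migrant prefix keeps the variable
* through the keep statement below. This assumes the extract codes unknown ages outside 20-34.
gen migrant2034=0
replace migrant2034=1 if migrant==1 & inrange(age,20,34)
replace migrant2034=. if migrant==. | age==.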
keep MUNID_* STATE_* year migrant* interstatemig perwt
// Saving dataset with basic info to recall for different calculations
save "IPUMS_DatatoCollapse.dta", replace
* ==============================================================================
* CREATE MUNICIPAL MIGRATION STREAMS FROM INDIVIDUAL IPUMS DATA
* ==============================================================================
* TO AGGREGATE DATA, COLLAPSE MIGRANTS BY MUNICIPALITIES TO GET MIGRANT COUNTS
// All migrants for total flows
clear all
use "IPUMS_DatatoCollapse.dta"
collapse (count) migrants=migrant ///
[pweight=perwt] if migrant==1, by(STATE_dest MUNID_dest STATE_orig MUNID_orig year)
// Technically, there is no need to include both STATE and MUNID identifiers if the second-level geography code is unique within each country
// However, beware if this is not the case
// Also, if you are using more than one country, you will need to add a country variable
// to every by() option from here on out.
// We do not want to count people moving from municipalities that were created in 2005-2010 to the municipalities they were created from
drop if MUNID_orig==MUNID_dest
replace STATE_orig=0 if MUNID_orig==0
sort MUNID_orig MUNID_dest year
save "IntermunicipalFlows.dta", replace
//Aggregating flows by sex
// Male Migrant Flows
clear all
use "IPUMS_DatatoCollapse.dta"
collapse (count) malemigrants=migrantmale ///
[pweight=perwt] if migrantmale==1, by(STATE_dest MUNID_dest STATE_orig MUNID_orig year)
drop if MUNID_orig==MUNID_dest
replace STATE_orig=0 if MUNID_orig==0
sort MUNID_orig MUNID_dest year
save "IntermunicipalMaleFlows.dta", replace
// Female Migrant Flows
clear all
use "IPUMS_DatatoCollapse.dta"
collapse (count) femalemigrants=migrantfem ///
[pweight=perwt] if migrantfem==1, by(STATE_dest MUNID_dest STATE_orig MUNID_orig year)
// This gets rid of people moving from municipalities that were created in 2005-2010 to the municipalities they were created from...
drop if MUNID_orig==MUNID_dest
replace STATE_orig=0 if MUNID_orig==0
sort MUNID_orig MUNID_dest year
save "IntermunicipalFemaleFlows.dta", replace
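* Illustrative alternative (an addition, not the authors' code): the three collapses above can
* also be done in one pass by summing the 0/1 indicators, since a weighted sum of an indicator
* equals the weighted count of its ones. The output file name is hypothetical.
clear all
use "IPUMS_DatatoCollapse.dta"
collapse (sum) migrants=migrant malemigrants=migrantmale femalemigrants=migrantfem ///
[pweight=perwt] if migrant==1, by(STATE_dest MUNID_dest STATE_orig MUNID_orig year)
drop if MUNID_orig==MUNID_dest
sort MUNID_orig MUNID_dest year
save "IntermunicipalFlows_onepass_sketch.dta", replace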
* ==============================================================================
* MERGING MUNICIPAL FLOW DATA
* ==============================================================================
// Note we also merge the data onto a "frame" (MuniOriginDestFrame.dta) with all
// possible combinations of intermunicipal flows.
// This is helpful in order to assign zeroes to any origin-destination
// dyad without any data.
// Otherwise, the output dataset would not contain as many origin-destination
// dyads as theoretically possible.
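* Illustrative sketch (an addition, not the authors' code): the frame file used below is not
* built in this do-file. One way to build such a frame is to pair every municipality observed
* as a destination with every other one, within census year. This assumes origin municipalities
* share the destination codes; the output file name is hypothetical.
clear
use "IPUMS_DatatoCollapse.dta"
keep STATE_dest MUNID_dest year
duplicates drop
tempfile destinations
save `destinations'
rename (STATE_dest MUNID_dest) (STATE_orig MUNID_orig)
joinby year using `destinations' // all origin-destination pairs within each year
drop if MUNID_orig==MUNID_dest
sort MUNID_orig MUNID_dest year
save "MuniOriginDestFrame_sketch.dta", replace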
clear
use "IntermunicipalFlows.dta"
sort MUNID_orig MUNID_dest year
// The flow files are unique on the five collapse keys, so match 1:1 on all of them;
// matching on fewer keys can misalign rows where MUNID_orig is missing
merge 1:1 STATE_orig MUNID_orig STATE_dest MUNID_dest year using "IntermunicipalMaleFlows.dta", keepusing(malemigrants)
rename _merge _mergeMALEmig
replace malemigrants=0 if malemigrants==. & migrants>=0
merge 1:1 STATE_orig MUNID_orig STATE_dest MUNID_dest year using "IntermunicipalFemaleFlows.dta", keepusing(femalemigrants)
rename _merge _mergeFEMALEmig
replace femalemigrants=0 if femalemigrants==. & migrants>=0
//Only keeping internal migration in this file (0 is code for international origins/destinations)
drop if MUNID_orig==0 | MUNID_dest==0
merge m:m MUNID_orig MUNID_dest year using "MuniOriginDestFrame.dta", keepusing(MUNID_orig STATE_orig MUNID_dest STATE_dest)
// Dataset has entries for unknown origin for many destinations, including those with known state of prior residence but unknown muni...
// Assigning all dyads in which no one in Census
// reported moving to and from as 0
foreach var of varlist *migrants* {
replace `var'=0 if _merge==2
}
rename _merge _mergeFRAME
keep year STATE_orig MUNID_orig STATE_dest MUNID_dest migrants malemig* female*
save "CompleteIntermunicipalFlows.dta", replace
outsheet using "CompleteIntermunicipalFlows.csv", comma replace
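* Illustrative check (an addition, not the authors' code): male and female flows should add up
* to the total flow for each dyad, apart from rounding and respondents of unknown sex.
count if abs(migrants-malemigrants-femalemigrants)>0.5 & !missing(migrants)
display "Dyads where male + female flows differ from the total: " r(N)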
* ==============================================================================
* CREATE INTERSTATE MIGRATION STREAMS FROM INDIVIDUAL IPUMS DATA (separate file)
* ==============================================================================
clear all
use "CompleteIntermunicipalFlows.dta"
drop if MUNID_orig==0 | MUNID_dest==0
collapse (sum) interstatemigrants=migrants, by(STATE_dest STATE_orig year)
drop if STATE_orig==STATE_dest // could keep if interested in no. of intrastate moves
save "InterstateMigrationFlows.dta", replace
// Interstate flows by sex
// Male flows
clear
use "CompleteIntermunicipalFlows.dta"
drop if MUNID_orig==0 | MUNID_dest==0
collapse (sum) interstatemalemig=malemigrants, by(STATE_dest STATE_orig year)
drop if STATE_orig==STATE_dest // could keep if interested in no. of intrastate moves
save "InterstateMaleFlows.dta", replace
// Female flows
clear
use "CompleteIntermunicipalFlows.dta"
drop if MUNID_orig==0 | MUNID_dest==0
collapse (sum) interstatefemalemig=femalemigrants, by(STATE_dest STATE_orig year)
drop if STATE_orig==STATE_dest // could keep if interested in no. of intrastate moves
save "InterstateFemaleFlows.dta", replace
* ==============================================================================
* MERGING INTERSTATE FLOW DATA (no need for data frame as state-to-state flows are complete in data)
* ==============================================================================
clear
use "InterstateMigrationFlows.dta"
// Each interstate file is unique on destination state, origin state, and year, so a 1:1 merge applies
merge 1:1 STATE_dest STATE_orig year using "InterstateMaleFlows.dta", keepusing(interstatemalemig)
rename _merge _mergeMALEmig
replace interstatemalemig=0 if interstatemalemig==. & interstatemigrants>=0
merge 1:1 STATE_dest STATE_orig year using "InterstateFemaleFlows.dta", keepusing(interstatefemalemig)
rename _merge _mergeFEMALEmig
replace interstatefemalemig=0 if interstatefemalemig==. & interstatemigrants>=0
//Keeping only domestic destinations
drop if STATE_dest==0 | STATE_orig==0
sort STATE_dest STATE_orig year
save "CompleteInterstateFlows.dta", replace
outsheet using "CompleteInterstateFlows.csv", comma replace
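* Illustrative sketch (an addition, not the authors' code): the long file can be reshaped into a
* square origin-by-destination matrix per census year, with one row per origin state and one
* column per destination state. Diagonal cells are missing because intrastate moves were dropped.
* The output file name is hypothetical.
preserve
keep STATE_orig STATE_dest year interstatemigrants
drop if missing(STATE_orig, STATE_dest)
reshape wide interstatemigrants, i(year STATE_orig) j(STATE_dest)
outsheet using "InterstateMatrix_sketch.csv", comma replace
restore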