Abstract
This study presents a decision-tree-based approach to classifying rural net migration across Colombian departments using sociodemographic and economic variables. In the model formulation, immigration is the movement of people into a destination area to settle there, while emigration is the movement of people out of that area to other places. The target variable is binary: positive net migration (immigration exceeds emigration) or negative net migration. Four classification models were trained and evaluated: Decision Tree, Random Forest, AdaBoost, and XGBoost. The data were preprocessed through cleaning and categorical variable encoding, and class balance was assessed. Model performance was evaluated using accuracy, precision, sensitivity, F1 score, and the area under the ROC curve. The results show that Random Forest achieves the highest accuracy, precision, sensitivity, and F1 score in the 10-variable and 15-variable settings, while XGBoost is competitive but not dominant. Furthermore, feature importance was analyzed to identify the key factors influencing migration patterns. This approach supports a more precise understanding of regional migration dynamics in Colombia and can serve as a basis for designing informed public policies.
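As a minimal sketch of the target definition and the threshold-based metrics named above, the following Python fragment labels net migration as positive or negative and computes accuracy, precision, sensitivity, and F1 score from a set of predictions. The function names, the flow counts, and the predictions are hypothetical and invented for illustration only; they are not the study's data or code. Assigning a tie (immigration equal to emigration) to the negative class is also an assumption, since the text defines only the strictly positive case.

```python
def net_migration_label(immigrants, emigrants):
    """Return 1 for positive net migration (immigration > emigration), else 0.

    Assumption: a tie is assigned to the negative class (0).
    """
    return 1 if immigrants > emigrants else 0


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, sensitivity (recall), and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical (immigrants, emigrants) counts per area, invented for illustration.
flows = [(120, 90), (40, 75), (60, 60), (200, 150), (30, 80), (95, 70)]
y_true = [net_migration_label(i, e) for i, e in flows]

# Hypothetical classifier output for the same six areas.
y_pred = [1, 0, 1, 1, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The area under the ROC curve is omitted here because it requires predicted scores or probabilities rather than hard class labels.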