Abstract:
The decision to migrate is complex and is often shaped by a combination of economic, social, political, and environmental pressures. Household surveys can capture detailed information about migration histories and their contexts, but standard statistical methods, such as regression analysis, make it difficult to identify the important predictors among a large number of covariates. Machine learning techniques are well suited to pattern identification and can pick out important covariates in large datasets. We report on the application of machine learning approaches to two large surveys covering a total of more than 2800 households in southwestern Bangladesh. We applied random forest classification and regression models to identify the covariates with the greatest predictive power for household migration decisions. The results show that random forest models can identify nuanced differences in the predictors of different types of migration and of migration in different communities. Random forests also outperform logistic regression and support vector machines in predicting migration in all cases analyzed. Random forest models and other machine learning methods can therefore be useful for improving the predictive accuracy of migration models and for identifying patterns in complex social datasets. Future work should continue to explore the potential of machine learning techniques for questions of environmental migration.
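To make the workflow summarized above concrete, the following is a minimal sketch of comparing a random forest with logistic regression and a support vector machine, and of ranking covariates by importance. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for survey responses; the sample size, number of covariates, hyperparameters, and the use of impurity-based importance are illustrative assumptions, not the settings or measures used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a household survey (NOT the Bangladesh data):
# 500 hypothetical households, 40 covariates, only 5 of which carry signal.
X, y = make_classification(
    n_samples=500,
    n_features=40,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# Candidate classifiers; hyperparameters are illustrative assumptions.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated accuracy for each model.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Fit a forest on all rows and rank covariates by impurity-based importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
top_covariates = ranking[:5]

for name, acc in cv_accuracy.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
print("Top covariate column indices:", top_covariates.tolist())
```

Impurity-based importances can favor covariates with many distinct values, so with real survey data a permutation-based measure may be a more cautious choice. Which model scores highest on synthetic data says nothing about the survey results reported here.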