Gender-Based Violence (GBV) is a serious problem that societies and governments must address using all applicable resources. This requires adequate planning in order to optimize both resources and budget, which demands a thorough understanding of the magnitude of the problem, as well as analysis of its past impact in order to infer future incidence. On the other hand, for years, the rise of Machine Learning techniques and Big Data has led different countries to collect information on both GBV and other general social variables that in one way or another can affect violence levels. In this work, in order to forecast GBV, firstly, a database of features related to more than a decade’s worth of GBV is compiled and prepared from official sources available due to Spain’s open access. Then, secondly, a methodology is proposed that involves testing different methods of features selection so that, with each of the subsets generated, four techniques of predictive algorithms are applied and compared. The tests conducted indicate that it is possible to predict the number of GBV complaints presented to a court at a predictive horizon of six months with an accuracy (Root Median Squared Error) of 0.1686 complaints to the courts per 10,000 inhabitants—throughout the whole Spanish territory—with a Multi-Objective Evolutionary Search Strategy for the selection of variables, and with Random Forest as the predictive algorithm. The proposed methodology has also been successfully applied to three specific Spanish territories of different populations (large, medium, and small), pointing to the presented method’s possible use elsewhere in the world.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.