Modeling and Forecasting Gender-Based Violence through Machine Learning Techniques

Abstract: Gender-Based Violence (GBV) is a serious problem that societies and governments must address using all applicable resources. This requires adequate planning in order to optimize both resources and budget, which in turn demands a thorough understanding of the magnitude of the problem, as well as an analysis of its past impact in order to infer future incidence. Meanwhile, the rise of Machine Learning (ML) techniques and Big Data has led different countries to collect information on both GBV and other general social variables that in one way or another can affect violence levels. In this work, in order to forecast GBV, firstly, a database of features related to more than a decade's worth of GBV is compiled and prepared from official sources that are openly accessible in Spain. Secondly, a methodology is proposed that involves testing different feature selection methods so that, with each of the subsets generated, four predictive algorithms are applied and compared. The tests conducted indicate that it is possible to predict the number of GBV complaints presented to a court at a predictive horizon of six months with an accuracy (Root Mean Squared Error) of 0.1686 complaints to the courts per 10,000 inhabitants for the whole Spanish territory, using a Multi-Objective Evolutionary Search Strategy for the selection of variables and Random Forest as the predictive algorithm. The proposed methodology has also been successfully applied to three specific Spanish territories of different populations (large, medium, and small), pointing to the presented method's possible use elsewhere in the world.


Introduction
Right now, Intimate Partner Violence (IPV) is a significant issue for a large number of women around the globe. Its impact incorporates physical, sexual, and psychological harm by a current or previous partner, in any form or by any means. As per UN (United Nations) reports, nearly 35% of women around the globe have encountered some sort of physical or sexual violence [1], while related statistics find that some 75% of women face physical and sexual hostility. It is important, therefore, to bear in mind the intersectional nature of GBV. Many aspects can influence the course of violence, including poverty or health status [26]. Thus, although social wealth and the family's economic situation are drawn upon as the main characteristics involved, the wide variety of additional factors makes a multidisciplinary approach necessary, incorporating environmental factors, education, safety and security, and health, and, correlatively, the interaction of professionals in each sector [27]. Many of these variables are already taken into account when planning policies related to violence management [28].
Some previous papers have focused on the specific field of forecasting GBV by applying ML. In 2017, Thornton [29] tried to forecast domestic homicides and serious violence by using a database reflecting these situations in the county of Dorset (United Kingdom) and evaluating the police protocol. In doing so, he found that predicting deadly domestic violence based on intelligence from earlier police contacts did not seem possible at present, given the discovery that fewer than 50% of these cases involved earlier police contact and that, when contact occurred, the cases were evaluated by the protocol as not being high risk 89% of the time.
Chalkley and Strang [30] reproduced the methods used by Thornton and found false-negative risk assessments in 67% of the deadly violence cases that had prior contact with the police but were not classified by the existing protocol as high risk. They proposed that alternative predictors regarding sex, health, and other descriptors could improve prediction performance, but such data need to be collected over a long period in order to obtain knowledge. Delgadillo-Alemán et al. [31] used data provided by the Mexican Women's Institute, combined with data from other local organizations, and developed a deterministic mathematical model that took into account variables such as a violence index, violence in childhood, the acceptance of machismo, and external factors, among others. Using mathematical models, the authors showed their model's capability for diagnosing GBV risk in a given couple. Although an interesting approach, its focus was on differential equations, so it does not explore the whole of society in a given territory.
Spain, as with many other countries, has been gathering compelling and interesting data for decades relating to many aspects of society. In this country, we can find the previously mentioned INE which, in its current form, was founded in 1945, although its predecessor, the Kingdom's Statistical Commission, dates back to 1856. On its webpage, a range of time-series data is freely available and ready to be downloaded. Some authors have taken advantage of this availability in order to study the GBV phenomenon, using this database combined with other sources. De la Poza, Jódar, and Barreda [32], for example, proposed a mathematical model to infer hidden GBV incidence. Their work includes factors such as the social awareness of men, age, drug consumption, and statistics on murdered women, all of which went into building a deterministic model and estimating the hidden population of aggressors. Unfortunately, however, this hidden population is not quantified by official statistics against which the accuracy of the model could be checked.
In conclusion, this review of the existing literature allows us to determine that the possibilities of ML have been widely proven as useful in making decisions related to social management, considering that these techniques are able to predict incidences of some public problems. We can also assess that the utility of ML in GBV is beyond doubt and, in this sense, some remarkable works have been identified. Despite this, however, we feel that the potential of ML in domestic violence forecasting for society as a whole is still unexplored and that the power of collected data is still insufficiently exploited. We think that, as previously shown in other disciplines, ML can be utilized to make useful GBV predictions for a certain territory, thereby optimizing the use of public resources. In this sense, to the best of the authors' knowledge, no ML-based study for the specific forecasting of GBV has been previously published that analyzes the features that most influence its appearance in a social group, that carries out a fair comparison between predictive ML algorithms applied to the same extensive database, and that considers differently populated territories. In any case, we believe that many studies have not made a deep comparison of different ML methods, both for selecting the most important variables and for prediction. In this work, we therefore go beyond the works presented and check not only whether it is possible to predict the incidence of gender violence, but also which technique is more appropriate, by means of a comparison. Regarding these deficiencies, in this work, the issues mentioned are addressed through the use of a vast Spanish GBV database disaggregated by territories, which should allow for the proposed methodology to be applied to any other country/region/city.

Feature Selection Techniques
Feature selection (FS) is the process of choosing the most relevant and pertinent features from the set of features in a given dataset. For a dataset with d input features, the feature selection process yields k features such that k < d, where k is the smallest set of critical and applicable features [33]. This results in quicker ML algorithm training, reduced model complexity (making the model simpler to interpret), better forecasting power, and decreased overfitting by choosing the correct set of features, among other benefits.
There are three types of feature selection procedures [34]:
-Wrapper methods
-Filter methods
-Embedded methods
Wrapper methods use factor combinations to determine forecasting power. Common wrapper strategies include Subset Selection, Forward Stepwise Selection, and Backward Stepwise Selection (Recursive Feature Elimination, RFE) [35]. The wrapper technique locates the best combination of features by testing each candidate subset against models it builds with them and assessing the outcomes [36]. Of the three strategies, this is the most computationally demanding. In the Subset Selection strategy, we fit the model with every potential combination of the N features [37]. With Forward Stepwise Selection, however, we begin with a null model, i.e., a model with no variables. Features are then added one at a time, with the best model picked according to a metric (i.e., an estimate of the error) [36]. In this strategy, once a predictor is chosen, it is never dropped in a later step. This continues until the best subset of features is obtained, following a stopping criterion that establishes when the feature selection process must finish. Backward Stepwise Selection (or Recursive Feature Elimination) works in the opposite direction, eliminating features: it begins with all predictors and then drops one feature at a time before selecting the best model. As these stepwise methods are not run on every combination of features, they are considerably less computationally intensive than exhaustive Subset Selection [38], and the computational effort of Backward Selection is essentially the same as that of Forward Selection. Filter and wrapper strategies have been used and compared in several studies [39].
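As an illustration, the forward stepwise procedure described above can be sketched in a few lines. This is a hypothetical scikit-learn example on synthetic data (the paper itself uses WEKA); `SequentialFeatureSelector` greedily adds one feature at a time, keeping the one that most improves cross-validated error:

```python
# Illustrative sketch of Forward Stepwise Selection (scikit-learn stand-in
# for the WEKA routines used in the paper; all data here is synthetic).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       random_state=0)

# Start from an empty model and greedily add the feature that most improves
# the cross-validated error, stopping once k = 4 features are selected.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=4,
    direction="forward",   # "backward" would drop features instead
    scoring="neg_root_mean_squared_error",
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the chosen features
```

Setting `direction="backward"` yields the elimination-style counterpart, starting from all predictors and dropping one at a time.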
Filter methods are likewise considered as a Single Factor Analysis. By utilizing this technique, the predictive power of each individual variable (feature) is assessed, while different statistical methods can be utilized to decide predictive force [40]. One such pathway is achieved by correlating the feature with the objective (i.e., what we are foreseeing), with the features with the highest correlation being the most effective.
In contrast, the Embedded Method (Shrinkage) is an inbuilt variable selection strategy within which features are not explicitly chosen or dismissed. With this approach, parameter controls (weights) are applied, a notable example being LASSO (Least Absolute Shrinkage and Selection Operator) Regression. With this technique, regularization is carried out and some coefficients of a regression tend to zero [41]. Therefore, as a portion of the coefficients becomes equal to zero, we can drop or reject such variables. Another example is that of Ridge Regression (Tikhonov regularization), which includes a penalty equal to the square of the magnitude of the coefficients [42]. All coefficients are shrunk by the same factor (so no single predictor is eliminated). Some of these techniques will be applied in our work, in which we use a Multi-Objective Evolutionary Search Strategy [43] and also a Ranker Strategy [44], minimizing a metric, which could be the Root Mean Squared Error (RMSE), while also reducing the feature set. The two different types of approaches in these groups are univariate and multivariate. Univariate methods are faster and easily scalable but ignore variable dependencies. On the other hand, multivariate techniques are able to model feature dependencies but are slower and less scalable than univariate ones [45]. The chosen techniques will be described in detail in the Methodology Section. By minimizing the metric, it is possible to improve the forecasting stage.
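The LASSO behavior described above, with some coefficients shrunk exactly to zero, can be seen in a short, hypothetical scikit-learn example (the `alpha` value and the data are arbitrary choices for illustration):

```python
# Embedded (shrinkage) selection sketch with LASSO: coefficients driven
# exactly to zero identify features that can be dropped. Synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=12, n_informative=3,
                       noise=1.0, random_state=0)

model = Lasso(alpha=5.0).fit(X, y)   # larger alpha -> stronger shrinkage
kept = np.flatnonzero(model.coef_)   # indices of surviving features
print(kept)
```

Ridge Regression, by contrast, would shrink all twelve coefficients toward zero by the same factor without eliminating any of them.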

Forecasting
After the FS is complete, the forecasting task in time series can be deployed. In 1996, Wolpert [46] stated that, without deep information about the underlying model, there is no certain model that will always achieve better performance than any other. As a result, a proper approach can be made by trying out various techniques, then determining which model operates better. Consequently, we have compiled linear and nonlinear techniques, with a focus on the most promising algorithms.
Linear Regression is one of the easiest approaches. This family of models attempts to find an estimation of the model parameters so that the sum of the squared errors is minimized [47]. Some modifications include partial least squares and penalized models such as Ridge Regression or LASSO.
A significant advantage of these models is that they are highly interpretable. The coefficients indicate relationships and they are usually easy to compute, so the use of several features is affordable. On the other hand, they can be limited in their performance [48]. They achieve good results when the relationship between the predictors and their response falls along a hyperplane. However, if there are relations of a higher order, such as quadratic or cubic, then the nonlinear relationships may not be properly captured by these models and other approaches are required [49].
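The least-squares fit just described can be reproduced in a couple of lines; this is a generic illustration on made-up numbers, not the paper's data:

```python
# Ordinary least squares sketch: find the coefficients that minimize the
# sum of squared errors, and read them off directly (interpretability).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])          # roughly y = 2x
lr = LinearRegression().fit(X, y)
print(round(lr.coef_[0], 2), round(lr.intercept_, 2))  # -> 1.94 0.15
```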
Some other models are capable of understanding nonlinear trends and, fortunately, the exact form of the nonlinearity is not required to be known before building the model. Support Vector Machines (SVM) are one of the most popular examples in this category. These are dual learning algorithms that process data merely by computing their dot-products [50], and these dot-products between variable arrays can be properly computed by a kernel function [51]. Given this function, the SVM learner attempts to find a hyperplane that separates the examples while maximizing the separation (margin) between them. SVMs are well known to be resilient to over-fitting and to keep a good generalization performance due to the max-margin criterion used in the optimization process. In addition, while other solutions may only provide a local optimum, SVMs are guaranteed to converge to a global optimum because of the corresponding convex optimization formulation [52].
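As a toy illustration of a kernel SVM capturing a nonlinear relationship, the regression variant (SVR) with an RBF kernel can fit a sine curve that a linear model cannot. This is a hedged scikit-learn sketch, not the WEKA implementation used in the paper, and all parameters are illustrative:

```python
# SVR with an RBF kernel fitting a nonlinear (sine) relationship;
# synthetic data, illustrative hyperparameters.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=80)  # noisy sine

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
print(round(svr.score(X, y), 2))  # R^2 of the fit
```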
Besides this, Regression Trees make up a family of modeling algorithms that has been receiving a lot of attention in recent years. Tree-based models use one or more 'if-then' statements on the predictors that subsequently partition the data. Within these subsets, a model is used to forecast the outcome [53]. From a statistical point of view, reducing correlation among predictors can be achieved by adding randomness to the tree construction process, which is the basis of the Random Forest (RF) technique [54]. Each model in the ensemble is then used to build a prediction for a new dataset, with these predictions then being averaged to provide the final forecast.
An RF model performs a variance reduction by selecting complex and strong learners that exhibit low bias. This leads to an improvement in error rates and, in addition, RF is robust to a noisy response [55].
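The ensemble-averaging idea behind RF can be sketched as follows; this is an illustrative scikit-learn example on synthetic data, standing in for the WEKA implementation used later in the paper:

```python
# Random Forest regression sketch: each tree is grown on a bootstrap
# sample with random feature subsets, and predictions are averaged.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=2.0,
                       random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(rf.predict(X[:1]))  # forecast averaged over the 200 trees
```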
Other comparable strategies, such as Gaussian Processes (GPs) with Radial Basis Function (RBF) kernels [56], which permit overall consistency and an unbounded number of basis functions, are infrequently utilized, although a few previous approaches have used this strategy with promising conclusions [57].
GPs represent a nonparametric methodology focused on modeling observable responses at different training data points (function values) as multivariate normal random variables [58]. A prior distribution is assumed for these function values, which ensures the function's smoothness properties. To be explicit, two function values will be highly correlated when their corresponding input vectors are close (in the sense of Euclidean distance), with the correlation decaying as they diverge. The distribution of unobserved function values can then be calculated from the assumed distribution by simple probability manipulation.
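The distance-based correlation structure just described can be seen in a minimal scikit-learn sketch (synthetic data; kernel settings are illustrative assumptions, not the paper's configuration):

```python
# Gaussian Process regression with an RBF kernel: nearby inputs (in
# Euclidean distance) yield highly correlated function values, so the
# posterior interpolates smoothly between training points.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 10, 30).reshape(-1, 1)
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean, std = gp.predict(np.array([[5.0]]), return_std=True)
print(mean, std)  # posterior mean and uncertainty at a new input
```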

Database, Available Features, and Target to Be Forecasted
In order to study the relationship between different time series with data regarding GBV and other variables, we have accessed the database of the Spanish INE (www.ine.es), where data from several decades related to demography, economy, the employment market, education, energy, and so on are freely available. The data series are usually grouped by population groups (always disaggregated by sex) and also by territorial units. The frequency of data reporting can be monthly, quarterly, or annual. For our purposes, we have assembled the data by provinces and also for the country as a whole. Spain has 50 provinces and 2 autonomous cities. We decided to select some examples as study cases and then compare them with the evolution of the whole country. With regard to timescales, we decided to organize the data monthly as, on the one hand, this is the usual way of presenting the data in our database and, on the other hand, it offers sufficient granularity to show each variable's evolution. In Table 1, a brief description of the variables and the units utilized can be observed. All variables have been referred to population units (per capita) in order to make a fair comparison between territories. Although the Spanish Government has been collecting data on GBV victims since 2003, we begin our database in 2009 in order to obtain a complete overview by avoiding gaps in the early data of some variables. We also seek to reflect the changes introduced by the Organic Law on Measures Regarding Comprehensive Protection against Gender-Based Violence (LO 1/2004, December 28, 2004), some of whose measures were not fully implemented until some years later. Likewise, the most recent data may be incorporated with a delay and be subject to revision, so full series data up to March 2020 have been used. With this approach, we have studied more than a decade of data.
The chosen features (among the available ones) are related to:
-Territorial: We study the time-series data for the entire country but also some provinces as examples, in order to test the validity of our proposal.
-Date and season: We will explore the evolution of GBV within years, month by month. We will also include the quarter to evaluate the influence of the season, as indicated by previous works [59].
-Demography and population: Considering population can offer insights into the influence of large population areas, but changes in demography can also help explain the course of couples [60]. In this manner, marriages, separations, and births are included, as well as the proportion of men vs. women.
-Specific variables related to GBV. In this sense, there are some interesting variables available, such as:
Calls to the special number 016: This is a phone number dedicated to providing information to survivors, but also to managing assistance (urgent or not).
Complaints: In particular, we will study the number of complaints presented to a court as the dependent variable to be modeled and forecasted. Ultimately, we feel that complaints express the incidence of the worst cases.
Security devices for tracking offenders: This kind of device is ordered by a judge in high-risk cases.
Protection orders: Also ordered by a judge in cases of high risk.
Level of risk of aggression for the survivor: After a police evaluation, the cases are classified as unappreciated, low, medium, high, or extremely high.
Fatalities: Murdered victims of GBV.
-Wealth and employment: The level of wealth in a region can be related to the levels of crime and violence. Similarly, levels of unemployment (male and female) can give an idea of the level of economic stability [61]. We differentiate between the inactive population (retired, disabled) and the employed and unemployed population.
-Education level: The relationship of illiteracy (male and female) and other educational levels (primary, secondary, university) with violence will also be studied, as indicated in previous literature [62].
With this, we have built a database of almost 250,000 data examples, taking into account all the months from January 2009 to March 2020 and the 52 Spanish territories plus the whole country. Using all this data, we will carry out a feature selection and then forecast future GBV complaints.

Territories under Study
As previously stated, a large amount of data is under consideration. The purpose of this work is to study the possibility of forecasting GBV complaints so as to provide reliable information to optimize public resources and to schedule actions in advance, thereby being useful for other countries, with the necessary adjustments. In this sense, instead of testing our methodology in each and every Spanish province, we will test the proposed procedure in some particular cases: the whole country (Spain) and the three representative provinces of Madrid, Alicante, and Segovia, representing large, medium, and small populations, respectively. In addition, each of them has its own differentiated characteristics in terms of location and economy, as well as idiosyncrasies and cultural aspects.

The Waikato Environment for Knowledge Analysis (WEKA)
The Waikato Environment for Knowledge Analysis (WEKA v.3.8) is free software developed at the University of Waikato, New Zealand (https://waikato.github.io/weka-wiki/) and licensed under the GNU (GNU's Not Unix) General Public License. WEKA contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces that offer easy access to these functions. The software also supports several standard data-mining tasks-more specifically, data preprocessing, clustering, classification, regression, visualization, feature selection, modeling, and forecasting.
The use of WEKA facilitates data entry, algorithm execution, and visual context in the management of the entire process. This software has been successfully applied many times before and is still widely used today.

Computer Hardware
Due to the computational demands of the ML algorithms, they have been executed using a computer equipped with an AMD Ryzen 7 1700X processor, operating at 3.8 GHz with 32 GB DDR4 RAM at 2666 MHz CL19 and a Solid-State Disk Samsung 970 Evo Plus M.2 1000 GB PCI-E 3.0.

Data Cleaning, Regularization, and Lagged Variables
The above-mentioned database must be transformed to provide proper inputs for the feature selection algorithms. The values are cleaned, and some gaps have been filled. In order to later make a fair comparison between the whole country and the selected provinces (the study cases), all the features were divided by population units (hundreds, thousands, or tens of thousands of inhabitants).
As some features could have a delayed influence, they were given six lagged values (which means taking into account the last six months), except for the categorized and date features. The TimeSeriesLagManager routine of WEKA allows for easily creating as many lagged variables as required.
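The lagging step can be illustrated outside WEKA with a short pandas sketch on a toy monthly series (the series name and values are invented for illustration):

```python
# Creating six lagged copies of a monthly feature so that models can
# capture delayed influence (pandas stand-in for WEKA's lag creation).
import pandas as pd

s = pd.Series(range(1, 13), name="calls_016")   # toy series: months 1..12

lagged = pd.concat(
    {f"calls_016_lag{k}": s.shift(k) for k in range(1, 7)}, axis=1
)
print(lagged.tail(1))  # at month 12, lags 1..6 hold months 11..6
```

The first six rows naturally contain missing values, which is why cleaning and gap handling precede this step.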

Features Selection
WEKA offers an intuitive graphical environment for carrying out a feature selection. The module AttributeSelection allows for specifying different Search Methods and Attribute Evaluators, with some combinations being tested and then evaluated at the forecasting stage. The features set that provides a more accurate prediction will become the chosen features. A short introduction to FS methods was presented in Section 3.1.

Search Methods
As stated, we use two searching methods: The Multi-Objective Evolutionary Search Strategy and also a Ranker Strategy.

Attribute Evaluators
From the feature selection methods offered by WEKA, we will choose the two most popularly used:
-Wrapper methods. The WrapperSubsetEval routine implemented in WEKA will allow us to evaluate some approaches via multivariate techniques. For univariate ones, we instead need to use the ClassifierAttributeEval procedure. We will execute the following predictors:
Linear Regression: This offers fast computation, fixing the coefficients for each feature.
Random Forest [68]: As stated earlier, this is a well-known tree-based algorithm.
Instance-Based K-nearest neighbor algorithm (IBk) [69]: A K-nearest neighbors classifier, this algorithm allows for selecting an appropriate value of K based on cross-validation and is also able to carry out distance weighting.
-Filter method. On the side of the univariate methods, we will use the Ranker operation with the predictors below:
Relief Attribute (Rlf) [70]: Relief feature selection scores features by identifying feature value differences between nearest-neighbor instance pairs.
Principal Component Analysis (PCA) [71]: With this technique, a new set of orthogonal coordinate axes is introduced such that the sample data variance is maximized along them. Directions in which the variance is minor are less important and, hence, can be removed from the dataset. PCA offers a very effective way of transforming the data into a lower dimensionality, while also being able to reveal simplified patterns that often underlie the data.

Generated Subsets
With the exposed techniques, combined as indicated in Table 2, we can generate seven subsets of reduced data that will be under evaluation in the forecasting task. In all the exposed cases of FS, the metric to be optimized is the RMSE. In addition, we will study the prediction strategies for the original dataset, which are exposed in the following subsection. Table 3 compiles the different commands used in WEKA, presenting the used parameters.

Data Modeling and Forecasting
Once the seven reduced data subsets have been achieved, plus the original dataset, we will try to make a prediction of future values, taking into account the past time series collected in each dataset. We will attempt to forecast the complaints regarding GBV for a predictive horizon of six months but, in order to have the real data to evaluate/validate the prediction, we will apply a Cross-Validation (CV) method designed for time series [72]. We will train with a subset and then forecast the next six months/steps (for which the data is available but not included in the training dataset).
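The train-then-forecast-six-steps scheme described above can be sketched with scikit-learn's time-series splitter; this is an illustrative stand-in for the WEKA-based procedure, with an invented month count matching the study period:

```python
# Time-series cross-validation sketch: train only on the past, then
# evaluate a fixed six-month test window in each fold.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

months = np.arange(1, 136)  # Jan 2009 .. Mar 2020 is 135 monthly steps

tscv = TimeSeriesSplit(n_splits=5, test_size=6)
for train_idx, test_idx in tscv.split(months):
    # every training index precedes the 6-month test window
    assert train_idx.max() < test_idx.min()
print(len(test_idx))  # six-step horizon in every fold
```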
For each dataset, we use the following approaches as forecasting algorithms, always indicating the accuracy in terms of RMSE. A description of each method can be found in Section 3.2. Table 4 expresses the WEKA commands and the parameters of each technique.

Results and Discussion: Forecasting Performance
The FS procedure generates seven subsets which, together with the original complete dataset, will be tested using four forecasting techniques. In this way, we will execute 32 prediction tests for the whole country (Spain), forecasting six months of the time series of GBV complaints. In each experiment, the predictive algorithm will first be trained with a subset of the data and then a predictive horizon of the next six months/steps will be forecast, executing a CV. An example of this first phase, resulting in a trained model, is shown in Figure 1 for the specific case of the subset MOES-RF using RF as the predictive technique. At a glance, it can be seen that there is a cyclic stationary behavior combined with a certain tendency.

The results of the forecasting task can be found in Table 5. With each predictive algorithm, and for each subset of data, we calculate the accuracy of the next six months/steps of the GBV complaints series. Using the CV technique, we can obtain the RMSE for each future step and then, as a measure of performance, average the six RMSE values regarding each FS technique (the mean RMSE). The standard deviation is also estimated for each forecasted series in order to infer the accuracy's variability. A Shapiro-Wilk test was used to determine whether the data presented a normal distribution for each six-step prediction. The results indicated that the data were normally distributed (p-values > 0.05).
To compare the similarity of the eight forecasted series per prediction technique, we performed the parametric Welch's t-test. The results indicated that the six-step evolution of the original dataset differed significantly from the other results (p-value < α = 0.05) in all four cases (LR, RF, SVM, and GP).
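The two statistical checks above can be reproduced with SciPy as follows. The error series here are invented placeholders for illustration, not the paper's measured values:

```python
# Shapiro-Wilk normality check followed by Welch's t-test (equal_var=False)
# on two hypothetical six-step error series; numbers are placeholders.
from scipy import stats

errors_original = [0.41, 0.44, 0.39, 0.46, 0.43, 0.40]   # hypothetical
errors_moes_rf  = [0.16, 0.18, 0.15, 0.19, 0.17, 0.16]   # hypothetical

for series in (errors_original, errors_moes_rf):
    w, p_sw = stats.shapiro(series)
    print(f"Shapiro-Wilk p = {p_sw:.3f}")  # p > 0.05: cannot reject normality

t, p = stats.ttest_ind(errors_original, errors_moes_rf, equal_var=False)
print(f"Welch's t-test p = {p:.4f}")       # p < 0.05: the series differ
```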
As can be seen in Table 5, the lowest RMSE averaged between steps is obtained using RF as the forecasting algorithm with the MOES-RF dataset (mean RMSE = 0.1686 u/10,000 pop). Other FS approaches are also promising. Figure 2 shows different predictions using RF on all of the datasets. Rnk-RF also provides accurate results in most of the predictive situations, which may indicate that the use of RF as the predictor in each attribute evaluator is an interesting choice. MOES-LR closely follows this performance in the particular situation under analysis.
But, as stated in the results, MOES-RF seems to be the best dataset to use. According to Table 5, with all four predictive techniques, the best accuracy is achieved with this subset. When using this FS combination, the best predictive algorithm is RF, as depicted in Figure 3, where it can be seen that RF offers the best average accuracy thanks to the low standard deviation of the RMSE at each step, resulting in performance stability. SVM, like LR, is able to achieve better performance in short-term prediction, but both soon accumulate errors in future steps, as can be inferred from their larger standard deviations.
The better forecasting results with RF can also be inferred by averaging each technique's step-averaged RMSE over all datasets (denoted RMSE) and then comparing the prediction methods. RF then offers RMSE = 0.2046 u/10,000 pop, followed by GP with RMSE = 0.2680 u/10,000 pop. SVM shows variable performance (as seen in Figure 3), averaging 0.2963 u/10,000 pop, while LR cannot come close to the accuracy of the other algorithms.
We must bear in mind that, for a population of 47,329,981 inhabitants, the RMSE achieved with RF corresponds to 968.37 complaints per month across the whole country. Taking into account that the monthly average of complaints in Spain in 2019 was 14,014, the error is around 7%, which is good enough for our purposes.
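The conversion from a per-capita RMSE to absolute monthly complaints can be checked with the figures given above (population, RF's RMSE of 0.2046 u/10,000 pop, and the 2019 monthly average):

```python
# Back-of-the-envelope check: convert an RMSE expressed in complaints per
# 10,000 inhabitants into absolute monthly complaints, then relative error.
population = 47_329_981       # Spain's population, as stated in the text
rmse_per_10k = 0.2046         # RF's averaged RMSE (u/10,000 pop)
monthly_avg_2019 = 14_014     # average monthly complaints in Spain, 2019

absolute_rmse = rmse_per_10k * population / 10_000
relative_error = absolute_rmse / monthly_avg_2019
print(f"{absolute_rmse:.2f} complaints/month ({relative_error:.1%})")
# → 968.37 complaints/month (6.9%)
```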
Taking these results into account, we can make a practical check of the prediction's evolution over the months from October 2019 to March 2020, using MOES-RF with RF as the forecasting technique and comparing the results with the real evolution. As can be observed, the prediction is accurate enough to identify trends, while logically deviating further from the real curve as the predictive horizon expands (Figure 4).
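A hedged sketch of multi-step forecasting with Random Forest on lagged monthly values, in the spirit of the procedure above. The synthetic seasonal series, the 12-month lag window, and the direct multi-horizon strategy (one model per step) are illustrative assumptions, not the paper's exact pipeline, which additionally uses the MOES-RF-selected social features.

```python
# Direct multi-horizon forecasting sketch: one RandomForestRegressor per
# prediction step (1..6 months ahead), trained on the 12 previous values.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
months = 129  # e.g., Jan 2009 - Sep 2019, monthly complaints per 10,000 pop
y = 1.2 + 0.2 * np.sin(np.arange(months) / 6) + rng.normal(0, 0.05, months)

LAGS, HORIZON = 12, 6
# Design matrix: each row holds the 12 values preceding the targets.
X = np.stack([y[i : i + LAGS] for i in range(months - LAGS - HORIZON + 1)])

forecasts = []
for h in range(1, HORIZON + 1):          # one model per horizon step
    target = y[LAGS + h - 1 : months - HORIZON + h]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, target)
    # Predict step h from the most recent 12 observed months.
    forecasts.append(float(model.predict(y[-LAGS:].reshape(1, -1))[0]))

print(forecasts)  # 6-month-ahead predictions
```

A direct strategy avoids feeding each step's prediction back as input, which keeps per-step errors from compounding recursively, consistent with evaluating the RMSE separately at each of the six steps.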
Following the same procedure with the selected Spanish provinces of different populations (Madrid, Alicante, and Segovia), we can confirm that similar results are found. Table 6 summarizes the RMSE of the 6-step predictions carried out with the four predictive techniques and each of the eight data subsets. Valuable conclusions can be drawn from Table 6: MOES-RF again appears to be the best FS technique, with RF as the best forecasting algorithm. This consistency is to be expected, since the time-series data from each province make up subsets of the whole country or, seen another way, the data analyzed for Spain are the sum of the provinces.
Nevertheless, we have to highlight that the RMSE is higher, for each predictive technique, the less populated the province. This can be explained because, when considering the whole of Spain or Madrid (millions of people), occasional, one-off situations are diluted by averaging over a large population, producing a smoother evolution of the social variables. With a low population, on the other hand, every single fluctuation stands out more, resulting in more variability in the prediction and, hence, a higher error and a larger standard deviation. This can be readily appreciated in Table 6, where the RMSE of each technique is considerably larger for Segovia than for the whole country: 0.2963 versus 0.7190 u/10,000 pop (Spain and Segovia, respectively) when using SVM, or 0.2680 versus 0.3852 u/10,000 pop when using GP. To reinforce this idea, we once more average the RMSE of each technique by territory (referred to as RMSE in Table 6), which shows the trend clearly, growing through 0.2826, 0.4076, 0.5839, and 0.6389 u/10,000 pop (Spain, Madrid, Alicante, and Segovia, respectively).
Although, for the sake of simplicity, predictions are not detailed for every step of the provinces' comparison, Figure 5 shows the evolution of the RMSE in each territory when forecasting with MOES-RF (chosen as the FS) and RF. As can be seen, all of the forecasting steps show the stability of RF as a predictive algorithm, although higher values and a more oscillating prediction can be observed in the case of Segovia. Considering these results and the discussion, some important conclusions are drawn in the next section.

Conclusions and Future Works
GBV is one of the great unresolved problems of our time and requires urgent attention. Allocating resources of all kinds is essential for tackling these situations before they occur, allowing authorities to anticipate and act before aggressions take place.
It is necessary, therefore, to consider the extent to which we can predict the incidence of this violence in order to optimize resources and allocate the necessary means in the most appropriate manner, both in time and space. Until now, achieving such a forecast has proven complex, but the periodic collection of data that reflects the state and evolution of society, together with the increased knowledge and applicability of ML algorithms, provides a new avenue for addressing the challenge of predicting the temporal evolution of gender violence.
In this work, the possibility of predicting reports of gender violence with acceptable accuracy has been demonstrated, and the most appropriate technique for selecting variables and the best-performing predictive algorithm have been discussed. After testing eight subsets with four well-known predictive techniques, it has been found that the most appropriate approach combines MOES-RF for variable selection with RF to predict future values. This conclusion was obtained using the data corresponding to the whole of Spain from January 2009 to September 2019, and it has been corroborated by comparison with provinces of differing populations, namely Madrid, Alicante, and Segovia, each with its own particular characteristics.
Given the difference in population of the provinces studied, as well as their different geographical situation, an adequate prediction per province allows for a correct distribution of the available state resources, so that awareness campaigns, police intervention, as well as economic resources and other social policies are distributed over time in a more efficient manner. Although it can be inferred from the study that there is seasonality in general, the maximum incidence from one province to another may differ, which, thanks to the results of this work, will allow for more adjusted planning in the provincial distribution of resources.
Other combinations of FS and predictive algorithms are also promising and may prove useful. Although the behavior of the ML techniques is consistent across territories, it has been shown that both the error and its dispersion (greater variation) increase as the population decreases, suggesting that the larger the population, the greater the consistency of the collected data, which then reflects not particular circumstances in time but underlying circumstances with predictable cause-effect relationships. In a smaller study population, isolated circumstances mark the oscillation of certain variables more significantly, a dynamic that is attenuated in larger populations.
In any case, rather than showing concrete results for a specific period of time in Spain and some of its provinces, this work intends to present a specific methodology and to study its viability. With the conclusions drawn, we aim to provide a basis for similar studies in other countries and territories, with comparable (or other) variables taken into account. In this sense, other public databases could be used to validate the proposed methodology; the European Institute for Gender Equality, in its Gender Statistics Database (https://eige.europa.eu/gender-statistics/dgs), provides several datasets that can serve for validation purposes.
For this reason, future work should look to test other combinations of attribute selection and prediction, as well as replicating our method to address other social issues involving a large number of people (migration, education, consumption, economy, etc.), and continuing to check the performance of the work described here.