Applied Sciences
  • Article
  • Open Access

20 November 2020

Modeling and Forecasting Gender-Based Violence through Machine Learning Techniques

1 Departamento de Ingeniería de Comunicaciones, ATIC Research Group, Universidad de Málaga, 29071 Málaga, Spain
2 Instituto Universitario de Investigación de Estudios de Género, Universitat d’Alacant, 03080 Alicante, Spain
3 Departamento de Tecnologías de la Información y las Comunicaciones, Universidad Politécnica de Cartagena, 30202 Cartagena, Spain
4 Departamento de Ciencias Sociales y Humanas, Universidad Miguel Hernández de Elche, 03202 Elche, Spain

Abstract

Gender-Based Violence (GBV) is a serious problem that societies and governments must address using all applicable resources. This requires adequate planning to optimize both resources and budget, which in turn demands a thorough understanding of the magnitude of the problem and an analysis of its past impact in order to infer its future incidence. At the same time, the rise of Machine Learning techniques and Big Data has led many countries to collect information, over many years, on both GBV and other general social variables that may affect violence levels. In this work, in order to forecast GBV, a database of features covering more than a decade of GBV data is first compiled and prepared from official sources made available through Spain's open data policy. Second, a methodology is proposed that tests different feature selection methods and, for each of the resulting subsets, applies and compares four predictive algorithms. The tests conducted indicate that it is possible to predict the number of GBV complaints presented to a court, over a predictive horizon of six months and throughout the whole Spanish territory, with a Root Mean Squared Error (RMSE) of 0.1686 complaints per 10,000 inhabitants, using a Multi-Objective Evolutionary Search Strategy for variable selection and Random Forest as the predictive algorithm. The proposed methodology has also been successfully applied to three Spanish provinces with large, medium, and small populations, which suggests that the method could be used elsewhere in the world.

1. Introduction

Currently, Intimate Partner Violence (IPV) is a significant issue for a large number of women around the globe. It includes physical, sexual, and psychological harm inflicted by a current or former partner, by any form or means. According to United Nations (UN) reports, almost 35% of women worldwide have experienced some form of physical or sexual violence [1], while similar studies find that some 75% of women face physical and sexual hostility. This paper seeks to address this issue. In 2017, nearly 87,000 women were intentionally killed worldwide, of whom 58% (50,000) were murdered by their intimate partners or other family members (https://www.unodc.org/).
Recently, a great deal of research has focused on IPV and its association with many related issues. This research covers the resources used by victims (perhaps better referred to as ‘survivors’) [2], the barriers they face, and the public policies aimed at preventing and managing Gender-Based Violence (GBV).
The Council of Europe Convention on preventing and combating violence against women, including domestic violence (opened for signature in 2011 and in force since 1 August 2014), better known as the Istanbul Convention, states in Article 17 that countries shall encourage, among others, the Information and Communication Technologies (ICT) sector to participate in policies to prevent violence against women.
By 2020, ICT and Machine Learning (ML) had clearly advanced, and their impact is felt across all areas of society worldwide. In the mid-1990s, Haraway [3] anticipated the social changes that ICT would bring, and their impact on gender-related issues in particular.
Fortunately, both ICT and ML strategies offer additional opportunities for preventing and dealing with these kinds of violence. Technological advances, software, and new concepts, such as the Internet of Things (IoT) and cloud computing, offer a wide range of opportunities for tackling violence against women [4], especially when they are effectively integrated with other fields such as e-health [5]. Advanced data-processing systems, with ML and Big Data [6] as two key examples, can likewise be used to combat gender violence.
In recent years, governments have recognized the power of data analysis and its potential for policy planning, which has led to efforts to systematically collect information on a wide range of topics spanning many decades. Spain has followed this trend: since 2003, it has collected valuable data through the National Institute of Statistics (Instituto Nacional de Estadística, INE), structured as time series related to violence against women, which now make it possible to analyze GBV and its relationship to other variables.
This work applies ML techniques to model and forecast the incidence of GBV over a six-month predictive horizon, by selecting the variables with the highest influence on such violence from a total of more than 30 features extracted from a Spanish national database. In addition, the feasibility of forecasting GBV is analyzed with four predictive algorithms, so that governments can improve their policy planning on this issue and optimize their strategies and resources.
To fulfill these objectives, the paper is organized as follows. Section 2 describes previous contributions that apply ML to improve public policies and actions against GBV. Section 3 reviews techniques for feature selection and time-series forecasting. Section 4 describes the collected database, which is analyzed with the methodology proposed in Section 5. Section 6 presents the results, including the performance of the modeling stage under different approaches and the accuracy of the GBV forecasts obtained with each method. Finally, Section 7 draws conclusions and suggests future work.

3. Feature Selection and Forecasting Time Series

3.1. Feature Selection Techniques

Feature selection (FS) is the process of choosing the most relevant features from the set of features in a given dataset. For a dataset with d input features, feature selection yields k features, with k < d, where the k features form the smallest set of relevant features [33]. Choosing the right set of features speeds up ML training, reduces model complexity so that the model is easier to interpret, improves forecasting power, and reduces overfitting, among other benefits.
There are three types of feature selection procedures [34]:
-
Wrapper methods
-
Filter methods
-
Embedded methods
Wrapper methods evaluate combinations of features to determine their predictive power. Common wrapper strategies include Subset Selection, Forward Stepwise Selection, and Backward Stepwise Selection (Recursive Feature Elimination, RFE) [35]. A wrapper searches for the best combination of features by building models with candidate feature subsets and assessing their results [36]; of the three families of methods, it is the most computationally demanding. In the Subset Selection strategy, the model is fitted with every possible combination of the N features ($2^N$ candidate subsets) [37]. Forward Stepwise Selection, in contrast, starts from a null model, i.e., a model with no predictors. Features are then added one at a time, and at each step the best model is chosen according to a metric (e.g., an error measure) [36]. Once a predictor has been added, it is never removed in later steps. The process continues until a stopping criterion indicates that feature selection should end. Backward Stepwise Selection (or Recursive Feature Elimination) works the opposite way, eliminating features: it is essentially the inverse of Forward Stepwise Selection, starting with all predictors and removing one feature at a time before selecting the best model. Because stepwise methods do not evaluate every combination of features, they are considerably less computationally intensive than exhaustive Subset Selection [38], and the computational effort of backward selection is comparable to that of forward selection. Filter and wrapper strategies have been used and compared in several studies [39].
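To make the forward stepwise procedure concrete, the following is a minimal pure-Python sketch. It assumes ordinary least squares as the wrapped model, RMSE on a chronological hold-out block as the selection metric, and a small tolerance as the stopping criterion; the helper names (`forward_stepwise`, `fit_ols`, and so on) and the toy data are hypothetical, and this is not WEKA's WrapperSubsetEval implementation.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations; returns [intercept, coef_1, ...]."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def forward_stepwise(data, y, n_train, tol=1e-6):
    """Greedy forward selection scored by RMSE on the hold-out rows after n_train."""
    n_features = len(data[0])
    y_train, y_test = y[:n_train], y[n_train:]
    # Null model (no predictors): forecast the training mean.
    best_score = rmse(y_test, [sum(y_train) / n_train] * len(y_test))
    selected = []
    while len(selected) < n_features:
        trials = []
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            sub = [[row[c] for c in cols] for row in data]
            coef = fit_ols(sub[:n_train], y_train)
            trials.append((rmse(y_test, predict(coef, sub[n_train:])), j))
        score, j = min(trials)
        if best_score - score < tol:  # stopping criterion: no meaningful improvement
            break
        selected.append(j)  # once added, a feature is never removed
        best_score = score
    return selected, best_score


# Toy data: y depends on features 0 and 1 only; feature 2 is irrelevant.
data = [[i, (7 * i) % 11, (3 * i) % 5] for i in range(30)]
y = [2.0 * a + 3.0 * b for a, b, _ in data]
selected, score = forward_stepwise(data, y, n_train=20)
print(selected, score)  # expected: [0, 1] and a hold-out RMSE that is numerically zero
```

Each step refits the wrapped model once per remaining candidate, so the total number of fits grows roughly with the square of the number of features rather than exponentially, which is the saving over exhaustive Subset Selection.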
Filter methods are also known as single-factor analysis. With this technique, the predictive power of each individual variable (feature) is assessed separately, and various statistical methods can be used to quantify it [40]. One approach is to compute the correlation between each feature and the target (i.e., the variable being forecast) and consider the features with the highest (absolute) correlation to be the most effective.
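A correlation-based filter can be sketched in a few lines. This is an illustrative example under assumed inputs: the feature names and values are hypothetical (not INE data), and Pearson correlation is only one of the statistics a filter may use.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(features, target, k):
    """Score each feature independently by |r| with the target and keep the k best."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical monthly series.
target = [10, 12, 15, 18, 20, 24]
features = {
    "calls_016": [100, 118, 152, 181, 199, 240],
    "unemployment_rate": [5, 4, 6, 5, 4, 6],
    "noise": [3, 1, 4, 1, 5, 9],
}
selected, scores = correlation_filter(features, target, k=2)
print(selected)  # ['calls_016', 'noise']
```

With so few observations even a meaningless series ("noise") scores fairly high, which is one reason why subsets produced by a filter still need to be validated at the forecasting stage.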
Embedded methods (shrinkage), in contrast, build variable selection into model fitting: features are not explicitly chosen or discarded beforehand; instead, a penalty controls the magnitude of the model's coefficients (weights). LASSO (Least Absolute Shrinkage and Selection Operator) regression is one example: its regularization drives some regression coefficients to exactly zero [41], so the corresponding variables can be dropped. Another example is Ridge Regression (Tikhonov regularization), which adds a penalty equal to the sum of the squared magnitudes of the coefficients [42]. Ridge shrinks all coefficients toward zero but does not set any of them exactly to zero, so no predictor is eliminated.
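For reference, the penalized least-squares problems behind LASSO and Ridge can be written as follows, using standard textbook notation that is ours rather than the paper's: $n$ observations, $p$ predictors, intercept $\beta_0$, and a tuning parameter $\lambda \ge 0$.

```latex
\hat{\boldsymbol{\beta}}^{\text{LASSO}}
  = \arg\min_{\beta_0,\,\boldsymbol{\beta}}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad
\hat{\boldsymbol{\beta}}^{\text{Ridge}}
  = \arg\min_{\beta_0,\,\boldsymbol{\beta}}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
    + \lambda \sum_{j=1}^{p} \beta_j^{2}
```

The absolute-value (L1) penalty can set some $\beta_j$ exactly to zero, which is what allows LASSO to discard variables; the squared (L2) penalty only shrinks them.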
Some of these techniques are applied in our work, in which we use a Multi-Objective Evolutionary Search Strategy [43] and a Ranker Strategy [44], minimizing an error metric, here the Root Mean Squared Error (RMSE), while also reducing the size of the feature set. Within these groups, approaches can be univariate or multivariate. Univariate methods are faster and easily scalable but ignore dependencies between variables, whereas multivariate techniques can model feature dependencies but are slower and less scalable [45]. The chosen techniques are described in detail in the Methodology Section. Minimizing the error metric during feature selection is intended to improve the subsequent forecasting stage.

3.2. Forecasting

Once FS is complete, the time-series forecasting task can be carried out. In 1996, Wolpert [46] established that, without prior knowledge of the underlying problem, no single model can be guaranteed to perform better than all others. A sound approach is therefore to try several techniques and determine which one performs best. Accordingly, we consider both linear and nonlinear techniques, focusing on the most promising algorithms.
Linear Regression is one of the simplest approaches. This family of models estimates the model parameters so that the sum of the squared errors is minimized [47]. Variants include partial least squares and penalized models such as Ridge Regression or LASSO.
A significant advantage of these models is that they are highly interpretable: the coefficients indicate the relationships between predictors and response, and they are usually easy to compute, so using many features is affordable. On the other hand, their performance can be limited [48]. They achieve good results when the relationship between the predictors and the response lies along a hyperplane. However, if there are higher-order relationships, such as quadratic or cubic ones, these models may not capture them properly and other approaches are required [49].
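The following pure-Python sketch illustrates both points for the one-predictor case: the closed-form least-squares fit recovers a linear relationship exactly but leaves residual error on a quadratic one. The data are toy values chosen for illustration.

```python
def simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x, minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals of the fitted line."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


x = [0, 1, 2, 3, 4]
linear = [1 + 2 * v for v in x]  # lies exactly on a line
quadratic = [v * v for v in x]   # curved relationship
fit_lin, fit_quad = simple_ols(x, linear), simple_ols(x, quadratic)
print(fit_lin, sse(x, linear, *fit_lin))        # (1.0, 2.0) 0.0
print(fit_quad, sse(x, quadratic, *fit_quad))   # (-2.0, 4.0) 14.0
```

The coefficients are directly readable (intercept and slope), which is the interpretability advantage, while the nonzero error on the quadratic series is the limitation that motivates nonlinear models.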
Other models can capture nonlinear trends without requiring the exact form of the nonlinearity to be known before the model is built. Support Vector Machines (SVM) are one of the most popular examples in this category. In their dual formulation, these learning algorithms access the data only through dot products [50], and these dot products between feature vectors can be computed by a kernel function [51]. Given this function, the SVM learner attempts to find a hyperplane that separates the examples while maximizing the separation (margin) between them. SVMs are known to be resistant to overfitting and to generalize well because of the maximum-margin criterion used in the optimization process. In addition, whereas other methods may only reach a local optimum, SVMs are guaranteed to converge to a global optimum because their optimization problem is convex [52].
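The kernel idea can be checked numerically. In this sketch (our own illustration, not the kernel configuration used in the paper), the degree-2 polynomial kernel $k(x, z) = (x \cdot z)^2$ returns the same value as an explicit dot product in a three-dimensional feature space, so an SVM can work in that space without ever constructing it.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit degree-2 feature map for 2-D inputs; dot(phi(x), phi(z)) equals poly2_kernel(x, z)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = [1.0, 2.0], [3.0, -1.0]
print(poly2_kernel(x, z))    # 1.0, since (1*3 + 2*(-1))^2 = 1
print(dot(phi(x), phi(z)))   # 1.0 up to floating-point rounding
```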
In addition, Regression Trees are a family of modeling algorithms that has received considerable attention in recent years. Tree-based models use one or more ‘if-then’ statements on the predictors to partition the data, and within each resulting subset a model is used to forecast the outcome [53]. From a statistical point of view, the correlation among the trees of an ensemble can be reduced by adding randomness to the tree construction process, which is the basis of the Random Forest (RF) technique [54]: each tree is grown on a bootstrap sample of the data, and only a random subset of the predictors is considered at each split. Each tree in the ensemble then produces a prediction for new data, and these predictions are averaged to provide the final forecast.
An RF model reduces variance by averaging complex, strong learners that exhibit low bias. This leads to lower error rates; in addition, RF is robust to a noisy response [55].
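The two sources of randomness described above (bootstrap sampling and random predictor subsets) and the final averaging can be sketched in pure Python. To keep the example short, each tree is only a depth-1 regression tree (a stump) rather than the deep, low-bias trees of a real Random Forest, so this is a simplified, hypothetical illustration and not WEKA's RandomForest; the data are synthetic, with one informative feature and one noise feature.

```python
import random


def fit_stump(X, y, feature):
    """Depth-1 regression tree on one feature: (sse, feature, threshold, left_mean, right_mean)."""
    pairs = sorted(zip((row[feature] for row in X), y))
    mean_all = sum(y) / len(y)
    best = (sum((t - mean_all) ** 2 for t in y), feature, float("inf"), mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, feature, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    return best


def fit_forest(X, y, n_trees=200, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (with replacement)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        features = rng.sample(range(d), max_features)   # random subset of predictors
        forest.append(min((fit_stump(Xb, yb, f) for f in features), key=lambda s: s[0]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    preds = [ml if x[f] <= thr else mr for _, f, thr, ml, mr in forest]
    return sum(preds) / len(preds)


data_rng = random.Random(1)
X = [[i / 40, data_rng.random()] for i in range(40)]  # feature 0 informative, feature 1 noise
y = [0.0 if row[0] < 0.5 else 1.0 for row in X]
forest = fit_forest(X, y)
low, high = predict_forest(forest, [0.1, 0.5]), predict_forest(forest, [0.9, 0.5])
print(round(low, 3), round(high, 3))
```

Roughly half of the stumps draw the noise feature and contribute the same value to both queries, while the stumps on the informative feature separate them, so the averaged forecast still distinguishes the two regions.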
Other alternative strategies, such as Gaussian Processes (GPs) with Radial Basis Function (RBF) kernels [56], which yield smooth functions and correspond to an unlimited number of basis functions, are used infrequently, although a few previous approaches have applied them with promising results [57].
GPs are a nonparametric approach that models the observed responses at the training data points (function values) as jointly multivariate normal random variables [58]. A prior distribution is assumed for these function values, which determines the smoothness properties of the function. Specifically, two function values are highly correlated when the corresponding input vectors are close (in the sense of Euclidean distance), and this correlation decays as the inputs move apart. The distribution of the function values at new, unobserved inputs can then be calculated from the assumed distribution by simple probability manipulations (conditioning on the observed data).
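The conditioning step described above has a closed form. The sketch below assumes a zero-mean GP with an RBF kernel (length scale and variance fixed at 1) and a small noise term; these settings and the toy data are assumptions for illustration, not the configuration of the GP used in the paper. The posterior mean is $k_*^\top (K + \sigma^2 I)^{-1} y$ and the posterior variance is $k(x_*, x_*) - k_*^\top (K + \sigma^2 I)^{-1} k_*$.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance: high for nearby inputs, decaying with Euclidean distance."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return variance * math.exp(-0.5 * d2 / length_scale ** 2)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(X, y, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_new, conditioned on (X, y)."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    k_star = [rbf(xi, x_new) for xi in X]
    alpha = solve(K, y)  # (K + noise * I)^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)  # (K + noise * I)^-1 k_star
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)


X = [[0.0], [1.0], [2.0], [3.0]]
y = [math.sin(x[0]) for x in X]
mean_near, var_near = gp_predict(X, y, [1.0])   # at a training input: mean near sin(1), tiny variance
mean_far, var_far = gp_predict(X, y, [10.0])    # far from the data: mean near 0, variance near 1
print(mean_near, var_near, mean_far, var_far)
```

Far from the training inputs the kernel correlations vanish, so the prediction reverts to the prior, which is how the smoothness assumption expresses uncertainty.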

4. Database, Available Features, and Target to Be Forecasted

In order to study the relationship between different time series with data on GBV and other variables, we accessed the database of the Spanish INE (www.ine.es), which freely provides data spanning several decades on demography, the economy, the labor market, education, energy, and other topics. The data series are usually grouped by population groups (always disaggregated by sex) and by territorial units, and they are reported monthly, quarterly, or annually. For our purposes, we assembled the data by province and for the country as a whole. Spain has 50 provinces and 2 autonomous cities; we selected some of them as case studies and compared them with the evolution of the whole country. Regarding the time scale, we organized the data monthly because, on the one hand, this is the usual way the data are presented in the database and, on the other hand, it offers sufficient granularity to show the evolution of the variables. Table 1 gives a brief description of the variables and the units used. All variables have been expressed per capita (relative to population) to allow a fair comparison between territories. Although the Spanish Government has been collecting data on GBV victims since 2003, our database begins in 2009 in order to obtain a complete overview and avoid gaps in the early data of some variables. We also seek to reflect the changes introduced by the Organic Law on Measures Regarding Comprehensive Protection against Gender-Based Violence (LO 1/2004, December 28, 2004), some of whose measures were not fully implemented until several years later. Likewise, because the most recent data may be incorporated with a delay and be subject to revision, complete series up to March 2020 have been used. In this way, we have studied more than a decade of data.
Table 1. Description of the features.
The chosen features (among the available ones) are related to:
-
Territorial: We study the time-series data for the entire country as well as for some provinces as examples, in order to validate our approach.
-
Date and season: We explore the month-by-month evolution of GBV within each year, and we also include the quarter to evaluate the influence of the season, as indicated by previous works [59].
-
Demography and population: Population size can offer insights into the influence of large population centers, while demographic changes can also help explain how couples' relationships evolve [60]. Accordingly, marriages, separations, and births are included, as well as the ratio of men to women.
-
Specific variables related to GBV: Several relevant variables are available, such as:
Calls to the special number 016: A phone number dedicated to providing information to survivors and to managing assistance (urgent or not).
Complaints: In particular, we study the number of complaints presented to a court as the dependent (target) variable to be modeled and forecast. We consider that complaints reflect the incidence of the most serious cases.
Security devices for tracking offenders: This kind of device is proposed by a judge in high-risk cases.
Protection orders: Also ordered by a judge in cases of high risk.
Level of risk of aggression for the survivor: After a police evaluation, cases are classified as presenting no appreciable risk or a low, medium, high, or extreme risk.
Fatalities: Murdered victims of GBV.
-
Wealth and employment: The level of wealth in a region can be related to its levels of crime and violence. Similarly, unemployment levels (male and female) can give an idea of economic stability [61]. We distinguish between the inactive population (e.g., retired or disabled people) and the employed and unemployed population.
-
Education level: The relationship of illiteracy (male and female) and other educational levels (primary, secondary, university) with violence is also studied, as suggested by previous literature [62].
With this, we built a database of almost 250,000 data points, covering all the months from January 2009 to March 2020 for the 52 Spanish territories plus the whole country. Using these data, we carry out feature selection and then forecast future GBV complaints.

5. Methodology

5.1. Territories under Study

As previously stated, a large amount of data is under consideration. The purpose of this work is to study the feasibility of forecasting GBV complaints in order to provide reliable information for optimizing public resources and scheduling actions in advance, in a way that could also be useful for other countries with the necessary adjustments. Accordingly, instead of testing our methodology in each and every Spanish province, we test the proposed procedure in several particular cases: the whole country (Spain) and three representative provinces, Madrid, Alicante, and Segovia, which have large, medium, and small populations, respectively. In addition, each of them has distinct characteristics in terms of location and economy, as well as its own idiosyncrasies and cultural aspects.
-
Spain: A Mediterranean country and member of the European Union, with a total population of 47,329,981 people.
-
Madrid: Home to the homonymous capital city of Spain, with a population of 6,661,949 people; it lies at the center of the country and has a dynamic economy.
-
Alicante: Located in the east of Spain, with a population of 1,858,683 people. It has a markedly open, Mediterranean character, a population of intermediate average age, and a flourishing economy.
-
Segovia: An inland province in central Spain, northwest of Madrid, with a population of only 153,342 people and an aging population.

5.2. The Waikato Environment for Knowledge Analysis (WEKA)

The Waikato Environment for Knowledge Analysis (WEKA v.3.8) is free software developed at the University of Waikato, New Zealand (https://waikato.github.io/weka-wiki/), and licensed under the GNU (GNU’s Not Unix) General Public License. WEKA contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces that provide easy access to these functions. The software also supports several standard data-mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization, feature selection, modeling, and forecasting.
The use of WEKA facilitates data entry, algorithm execution, and visualization throughout the entire process. The software has been applied successfully many times and continues to be used in recent literature: for example, Hussain et al. used it in 2018 to study educational aspects with data mining techniques [63], and Kiranmai et al. used it to classify electrical power problems [64]. WEKA is under active development, and new modules appear every year, such as the one presented by Lang et al. in their 2019 work on deep learning [65].

5.3. Computer Hardware

Because of the computational demands of the ML algorithms, they were executed on a computer equipped with an AMD Ryzen 7 1700X processor operating at 3.8 GHz, 32 GB of DDR4 RAM at 2666 MHz (CL19), and a Samsung 970 Evo Plus M.2 1000 GB PCI-E 3.0 solid-state drive.

5.4. Data Cleaning, Regularization, and Lagged Variables

The database described above had to be transformed to provide suitable inputs for the feature selection algorithms. The values were cleaned, and some gaps were filled. To allow a fair comparison between the whole country and the provinces used as case studies, all the features were normalized by population (per hundred, thousand, or ten thousand inhabitants).
As some features could have a delayed influence, six lagged values (covering the previous six months) were created for every feature except the categorical and date features. WEKA's TimeSeriesLagManager routine makes it easy to create as many lagged variables as required.
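The two transformations in this subsection, population normalization and lagging, can be sketched as follows. The function names, the per-10,000 scale, and the toy values are hypothetical; in the paper the lags were produced inside WEKA.

```python
def per_capita(counts, population, per=10_000):
    """Express raw counts per `per` inhabitants (e.g., complaints per 10,000 people)."""
    return [c * per / p for c, p in zip(counts, population)]


def add_lags(series, n_lags=6):
    """Rows [x_t, x_{t-1}, ..., x_{t-n_lags}]; the first n_lags periods lack full history and are dropped."""
    return [[series[t - k] for k in range(n_lags + 1)] for t in range(n_lags, len(series))]


# Hypothetical monthly complaint counts for a territory of 1.5 million inhabitants.
complaints = [150, 165, 180, 135, 150, 195, 210, 180, 165]
population = [1_500_000] * len(complaints)
rates = per_capita(complaints, population)  # 150 complaints -> 1.0 per 10,000 inhabitants
rows = add_lags(rates)
print(len(rows), rows[0])  # 3 rows; the first pairs month 7 with its six predecessors
```

Dropping the first six months, rather than padding them, avoids feeding the models placeholder values for lags that do not exist.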

5.5. Features Selection

WEKA offers an intuitive graphical environment for carrying out feature selection. Its AttributeSelection module allows different Search Methods and Attribute Evaluators to be specified; several combinations are tested and then evaluated at the forecasting stage, and the feature set that provides the most accurate prediction is chosen. A short introduction to FS methods was presented in Section 3.1.

5.5.1. Search Methods

As stated, we use two search methods: the Multi-Objective Evolutionary Search Strategy and a Ranker Strategy.
-
Multi-Objective Evolutionary Search Strategy (MOES): Specifically, we run the multi-objective evolutionary algorithm known as the Evolutionary NOn-dominated Radial slots-based Algorithm (ENORA) as the selection strategy of a random search method that minimizes both the number of selected features and the RMSE [66].
-
Ranker: This search strategy ranks features individually according to their evaluations [67].
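ENORA itself is a full evolutionary algorithm, but the notion it optimizes, Pareto dominance between candidate feature subsets scored on (number of features, RMSE), can be illustrated compactly. The sketch below is not ENORA; the subset labels and scores are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of non-dominated candidates; objectives = (number of features, RMSE), both minimized."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj) for o, other in candidates.items() if o != name)]


# Hypothetical feature subsets and scores (illustrative values only).
candidates = {
    "A": (3, 0.30),
    "B": (5, 0.20),
    "C": (6, 0.25),  # dominated by B: more features and higher RMSE
    "D": (2, 0.40),
    "E": (5, 0.22),  # dominated by B: same size, higher RMSE
}
print(pareto_front(candidates))  # ['A', 'B', 'D']
```

A multi-objective search returns such a front of trade-offs rather than a single subset; a final choice can then be made, for example, by the forecasting accuracy of each front member.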

5.5.2. Attribute Evaluators

Among the feature selection methods offered by WEKA, we chose the two most widely used:
-
Wrapper methods. The WrapperSubsetEval routine implemented in WEKA allows us to evaluate approaches based on multivariate techniques, whereas for univariate ones the ClassifierAttributeEval procedure must be used instead. We use the following predictors:
Linear Regression: Fast to compute; it fits a coefficient for each feature.
Random Forest [68]: As described earlier, a tree-based ensemble algorithm widely used for both classification and regression.
Instance-Based K-nearest neighbor algorithm (IBk) [69]: A K-nearest neighbors learner that can select an appropriate value of K through cross-validation and can also apply distance weighting.
-
Filter methods. For the univariate filter methods, we use the Ranker search with the following attribute evaluators:
Relief Attribute (Rlf) [70]: Relief feature selection scores each feature according to the differences in its values between pairs of nearest-neighbor instances.
Principal Component Analysis (PCA) [71]: This technique introduces a new set of orthogonal coordinate axes chosen so that the variance of the sample data along them is maximized. Directions with smaller variance are considered less important and can therefore be removed from the dataset. PCA offers a very effective way of transforming the data into a lower-dimensional representation, while also revealing simplified patterns that often underlie the data.
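The variance-maximizing axis that PCA looks for is the leading eigenvector of the covariance matrix. The following sketch finds it by power iteration on toy 2-D data; it is a simplified illustration (one component only, not WEKA's implementation) and assumes the starting vector is not orthogonal to that eigenvector.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]


def leading_component(cov, iters=500):
    """Power iteration: returns (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.05], [4.0, 4.0], [5.0, 4.95]]  # points close to y = x
cov = covariance_matrix(rows)
v, lam = leading_component(cov)
explained = lam / sum(cov[i][i] for i in range(len(cov)))  # share of total variance on this axis
print([round(x, 3) for x in v], round(explained, 4))
```

Because almost all the variance lies along the diagonal, keeping only this first component would discard very little information, which is the dimensionality reduction the text describes.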

5.5.3. Generated Subsets

By combining these techniques as indicated in Table 2, we generate seven reduced data subsets that are evaluated in the forecasting task. In all FS cases, the metric to be optimized is the RMSE. The prediction strategies described in the following subsection are also applied to the original (complete) dataset. Table 3 lists the WEKA commands used, together with their parameters.
Table 2. Applied Features Selection techniques.
Table 3. WEKA commands for Feature Selection.

5.6. Data Modeling and Forecasting

Once the seven reduced data subsets have been obtained, we forecast future values from the past time series contained in each of them and in the original dataset. We forecast GBV complaints over a predictive horizon of six months; to have real data available to evaluate and validate the predictions, we apply a Cross-Validation (CV) method designed for time series [72]. The model is trained on a subset and then forecasts the next six months (steps), for which data are available but were not included in the training dataset.
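The paper does not detail the fold layout of its time-series CV, so the following is only a sketch of one common scheme, rolling-origin evaluation, in which each fold trains on all months before an origin and tests on the next six. The fold count (`n_folds`) and minimum training length (`min_train`) are hypothetical parameters. With 135 monthly observations (January 2009 to March 2020), the last test window covers indices 129 to 134, i.e., October 2019 to March 2020.

```python
def rolling_origin_splits(n_obs, horizon=6, n_folds=3, min_train=24):
    """Yield (train_idx, test_idx) pairs; the training window always precedes the test window."""
    last_origin = n_obs - horizon
    first_origin = max(min_train, last_origin - (n_folds - 1) * horizon)
    for origin in range(first_origin, last_origin + 1, horizon):
        yield list(range(origin)), list(range(origin, origin + horizon))


splits = list(rolling_origin_splits(135))
for train, test in splits:
    print(len(train), test[0], test[-1])  # training size, first and last test index
```

Unlike shuffled k-fold CV, no future observation ever enters a training window, which keeps the evaluation faithful to how the forecasts would be used.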
For this purpose, we use the time series Forecasting module (http://wiki.pentaho.com/display/DATAMINING/Time+Series+Analysis+and+Forecasting+with+Weka) of WEKA (v. 1.027). For each dataset, we use the following forecasting algorithms and always report the accuracy in terms of RMSE.
-
Linear Regression (LR).
-
Support Vector Machines (SVM).
-
Random Forest (RF).
-
Gaussian Process (GP).
A description of each method can be found in Section 3.2. Table 4 lists the WEKA commands and the parameters of each technique.
Table 4. WEKA commands for forecasting.

6. Results and Discussion: Forecasting Performance

The FS procedure generates seven subsets which, together with the original complete dataset, are tested with four forecasting techniques. In this way, we execute 32 prediction tests (8 datasets × 4 algorithms) for the whole country (Spain), forecasting six months of the GBV complaints time series. In each experiment, the predictive algorithm is first trained with a subset of the data and then forecasts a predictive horizon of the next six months (steps), following the CV scheme. An example of this first phase, resulting in a trained model, is shown in Figure 1 for the MOES-RF subset with RF as the predictive technique. At a glance, the series shows a cyclical, seasonal pattern combined with a trend.
Figure 1. Training phase with RF algorithm of the subset MOES-RF for GBV complaints data series.
The results of the forecasting task are shown in Table 5. For each predictive algorithm and each data subset, we calculate the accuracy over the next six months (steps) of the GBV complaints series. Using the CV technique, we obtain the RMSE for each future step and then, as a performance measure, average the six RMSE values for each FS technique, denoted $\overline{\mathrm{RMSE}}$. The standard deviation of each forecasted series is also estimated to assess the variability of the accuracy.
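The per-step RMSE and its average and spread can be computed as in the sketch below. It assumes that the errors of all CV folds at a given step are pooled into that step's RMSE and that the standard deviation is the sample standard deviation over the six steps; the paper does not state either choice, and the numbers are hypothetical.

```python
import math
import statistics


def per_step_rmse(actual, predicted):
    """RMSE at each forecast step h, pooling the errors of all CV folds at that step."""
    horizon = len(actual[0])
    return [math.sqrt(sum((a[h] - p[h]) ** 2 for a, p in zip(actual, predicted)) / len(actual))
            for h in range(horizon)]


# Hypothetical values for two CV folds and a six-step horizon.
actual = [[1.0] * 6, [2.0] * 6]
predicted = [[1.0, 1.1, 1.2, 1.3, 1.4, 1.5], [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]]
steps = per_step_rmse(actual, predicted)
rmse_bar = statistics.mean(steps)  # step-averaged RMSE
rmse_sd = statistics.stdev(steps)  # sample standard deviation across the six steps
print([round(s, 3) for s in steps], round(rmse_bar, 3), round(rmse_sd, 3))
```

Averaging `rmse_bar` once more over the data subsets, and then over the techniques, gives the double- and triple-averaged values reported later for comparing algorithms and territories.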
Table 5. RMSE to 6-step GBV complaints forecasting in Spain.
A Shapiro–Wilk test was used to check whether the data followed a normal distribution for each 6-step prediction; the results were consistent with normality (p-values > 0.05). To compare the eight forecasted series for each prediction technique, we performed the parametric Welch's t-test. The results indicated that the 6-step evolution obtained with the original dataset differed significantly from the other results (p-value < α = 0.05) in all four cases (LR, RF, SVM, and GP).
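For reference, Welch's statistic and its Welch-Satterthwaite degrees of freedom can be computed with the standard library, as sketched below with hypothetical samples. The p-value then comes from a t distribution with those degrees of freedom; for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` returns both, and `scipy.stats.shapiro` performs the normality check.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (no equal-variance assumption)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(a, b)
print(round(t, 4), round(df, 3))  # -1.8974 5.882
```

Welch's version is appropriate here because the error series of different feature subsets need not share the same variance.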
As can be seen in Table 5, the lowest step-averaged RMSE ($\overline{\mathrm{RMSE}}$) is obtained using RF as the forecasting algorithm with the MOES-RF dataset ($\overline{\mathrm{RMSE}}$ = 0.1686 u/10,000 pop). Other FS approaches are also promising. Figure 2 shows the predictions obtained with RF for all the datasets. Rnk-RF also provides accurate results in most of the predictive situations, which suggests that using RF as the predictor within the attribute evaluator is a good choice. MOES-LR performs nearly as well in this particular case.
Figure 2. Comparison of the RMSE evolution for GBV complaints obtained with the RF predictive algorithm and different FS techniques.
Overall, as the results show, MOES-RF appears to be the best subset to use. According to Table 5, the best accuracy is achieved with this subset for all four predictive techniques. With this FS combination, the best predictive algorithm is RF, as depicted in Figure 3: RF offers the best average accuracy together with a low standard deviation of the RMSE across steps, which indicates stable performance. SVM and LR achieve better performance for the shortest horizons, but their errors soon increase in later steps, as reflected in their larger standard deviations.
Figure 3. Comparison of the RMSE evolution for GBV complaints obtained with the MOES-RF feature selection technique and different predictive algorithms.
The better forecasting results of RF can also be seen by averaging $\overline{\mathrm{RMSE}}$ over all the data subsets, denoted $\overline{\overline{\mathrm{RMSE}}}$, and comparing the prediction methods. RF then offers $\overline{\overline{\mathrm{RMSE}}}$ = 0.2046 u/10,000 pop, followed by GP with $\overline{\overline{\mathrm{RMSE}}}$ = 0.2680 u/10,000 pop. SVM shows variable performance (as seen in Figure 3), averaging 0.2963 u/10,000 pop. LR falls well short of the accuracy of the other algorithms.
Bear in mind that, for a population of 47,329,981 inhabitants, the $\overline{\overline{\mathrm{RMSE}}}$ achieved by RF corresponds to an RMSE of 968.37 complaints per month for the whole country. Given that the average number of complaints per month in Spain in 2019 was 14,014, the error is around 7%, which is good enough for our purposes.
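The conversion from the per-capita figure to absolute monthly complaints can be checked directly from the numbers given above:

```latex
0.2046 \times \frac{47{,}329{,}981}{10{,}000}
  = 0.2046 \times 4{,}732.9981 \approx 968.37 \ \text{complaints per month},
\qquad
\frac{968.37}{14{,}014} \approx 0.069 \approx 7\%
```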
In light of these results, we can check in practice the predictions for October 2019 to March 2020 obtained with MOES-RF and RF as the forecasting technique by comparing them with the observed data. As Figure 4 shows, the prediction is accurate enough to identify trends, although, as expected, it diverges further from the real curve as the predictive horizon increases.
Figure 4. Predicted and real GBV complaints from October 2019 to March 2020. Feature selection: MOES-RF. Forecasting technique: RF.
Following the same procedure with the selected Spanish provinces of different populations (Madrid, Alicante, and Segovia), we obtain similar results. Table 6 summarizes the $\overline{\mathrm{RMSE}}$ of the 6-step predictions carried out with the four predictive techniques and each of the eight data subsets.
Table 6. Average RMSE of 6-step GBV complaint forecasts in different territories.
Several conclusions can be drawn from the results in Table 6. MOES-RF appears to be the best FS technique, and RF the best forecasting algorithm. This consistency is to be expected, since the time series of each province are subsets of the national data or, seen the other way round, the data analyzed for Spain are the sum of the provinces. Nevertheless, we must highlight that $\overline{\overline{\mathrm{RMSE}}}$ is higher for every predictive technique when the province is less populated. This can be explained as follows: for the whole of Spain or for Madrid (millions of people), occasional, isolated events are averaged out over a large population, so the social variables evolve more smoothly. With a small population, by contrast, every single fluctuation stands out more, which results in more variable predictions and, hence, a higher error and a larger standard deviation. Segovia illustrates this clearly: its $\overline{\overline{\mathrm{RMSE}}}$ is considerably larger than that of the whole country, rising from 0.2963 to 0.7190 u/10,000 pop (Spain and Segovia, respectively) with SVM, about 2.4 times higher, and from 0.2680 to 0.3852 u/10,000 pop with GP, about 1.4 times higher. To examine this further, we average the $\overline{\overline{\mathrm{RMSE}}}$ of the four techniques once more for each territory (denoted $\overline{\overline{\overline{\mathrm{RMSE}}}}$ in Table 6), which shows the trend clearly: 0.2826, 0.4076, 0.5839, and 0.6389 u/10,000 pop for Spain, Madrid, Alicante, and Segovia, respectively.
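The ratios quoted for Segovia and the territory-level average can be checked with a few lines of code. The sketch uses only the SVM and GP values quoted in the text; the four-technique dictionary used to illustrate the triple average is hypothetical, and the function names are illustrative rather than part of the authors' code.

```python
from statistics import fmean


def territory_mean(rmse_bar_bar):
    """Triple-bar RMSE: mean of one territory's per-technique RMSE-double-bar."""
    return fmean(rmse_bar_bar.values())


def relative_increase(small, large):
    """Ratio of each technique's error in a small territory to a large one."""
    return {alg: small[alg] / large[alg] for alg in small if alg in large}


# Values quoted in the text (complaints per 10,000 inhabitants).
spain = {"SVM": 0.2963, "GP": 0.2680}
segovia = {"SVM": 0.7190, "GP": 0.3852}
for alg, ratio in relative_increase(segovia, spain).items():
    print(f"{alg}: Segovia/Spain = {ratio:.2f}")
# → SVM: Segovia/Spain = 2.43
# → GP: Segovia/Spain = 1.44

# Hypothetical four-technique values, only to show the triple average.
example = {"RF": 0.30, "GP": 0.40, "SVM": 0.50, "LR": 0.60}
print(f"{territory_mean(example):.2f}")  # → 0.45
```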
For the sake of simplicity, the step-by-step predictions are not detailed for each province. Instead, Figure 5 shows the evolution of the RMSE in each territory when forecasting with MOES-RF (as FS) and RF. All of the forecasting steps show the stability of RF as a predictive algorithm, although higher and more oscillating values can be observed in the case of Segovia.
Figure 5. Evolution of the RMSE in 6-step GBV complaint forecasting for different territories, using the MOES-RF feature subset and RF as the predictive technique.
These results and the discussion above lead to the conclusions presented in the next section.

7. Conclusions and Future Work

GBV is one of the great unresolved problems of our time and requires urgent attention. Allocating resources of all kinds is essential for tackling these situations, allowing authorities to anticipate and act before aggressions take place.
It is necessary, therefore, to consider the extent to which we can predict the incidence of this violence in order to optimize resources and allocate the necessary means in the most appropriate manner, both in time and space. Until now, achieving such a forecast has proven complex, but the periodic collection of data that reflects the state and evolution of society, together with the increased knowledge and applicability of ML algorithms, provides a new avenue for addressing the challenge of predicting the temporal evolution of gender violence.
This work has shown that reports of gender violence can be predicted with acceptable accuracy, and it has discussed the most appropriate technique for selecting variables and the best-performing predictive algorithm. After testing eight feature subsets with four well-known predictive techniques, the most appropriate approach was found to combine MOES-RF for variable selection with RF for predicting future values. This conclusion was obtained using data for the whole of Spain from January 2009 to September 2019 and was corroborated with provinces of differing populations, namely Madrid, Alicante, and Segovia, each with its own particular circumstances.
Given the differences in population and geographical situation among the provinces studied, an adequate prediction per province allows for a correct distribution of the available state resources, so that awareness campaigns, police intervention, economic resources, and other social policies are distributed over time more efficiently. Although the study suggests a general seasonal pattern, peak incidence may differ from one province to another; the results of this work can therefore support more finely adjusted planning in the provincial distribution of resources.
Other combinations of FS and predictive algorithms are also promising and may prove useful. Although the ML techniques behave consistently across territories, the errors, as well as their dispersion, have been shown to increase as the population decreases. This suggests that the larger the population, the more consistent the collected data, which then reflect not a particular circumstance at a given time but underlying circumstances with predictable cause–effect relationships. In a smaller population, isolated circumstances affect the oscillation of certain variables more significantly, a dynamic that is attenuated in larger populations.
In any case, rather than presenting concrete results for a specific period in Spain and some of its provinces, this work aims to present a specific methodology and to study its viability. We hope that our conclusions will serve as a basis for similar studies in other countries or territories, using comparable (or other) variables. In this sense, other public databases could be used to validate the proposed methodology; for example, the Gender Statistics Database of the European Institute for Gender Equality (https://eige.europa.eu/gender-statistics/dgs) provides data that can be used for validation purposes.
For this reason, future work should look to test other combinations of attribute selection and prediction, as well as replicating our method to address other social issues involving a large number of people (migration, education, consumption, economy, etc.), and continuing to check the performance of the work described here.

Author Contributions

Conceptualization, I.R.-R., J.-V.R., P.H.-G. and D.-J.P.-Q.; methodology, I.R.-R., P.H.-G., J.-V.R. and D.-J.P.-Q.; software, I.R.-R. and D.-J.P.-Q.; validation, I.R.-R., J.-V.R., P.H.-G. and D.-J.P.-Q.; formal analysis, I.R.-R., J.-V.R. and P.H.-G.; investigation, I.R.-R., J.-V.R., P.H.-G. and D.-J.P.-Q.; resources, J.-V.R. and I.C.; data curation, I.R.-R., J.-V.R. and D.-J.P.-Q.; writing—original draft preparation, I.R.-R., P.H.-G. and D.-J.P.-Q.; writing—review and editing, I.R.-R., P.H.-G., J.-V.R. and I.C.; visualization, I.R.-R., J.-V.R. and D.-J.P.-Q.; supervision, J.-V.R. and I.C.; project administration, J.-V.R. and I.C.; funding acquisition, I.C. All authors have read and agreed to the published version of the manuscript.

Funding

Ignacio Rodríguez-Rodríguez would like to thank the support of Programa Operativo FEDER Andalucía 2014–2020 under Project No. UMA18-FEDERJA-023 and Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.

Acknowledgments

The authors would like to thank the Instituto Nacional de Estadística (INE, Spain) for making the data available and the Instituto Universitario de Investigación de Estudios de Género of the Universidad de Alicante (Spain) for its support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Devries, K.M.; Mak, J.Y.; Garcia-Moreno, C.; Petzold, M.; Child, J.C.; Falder, G.; Pallitto, C. The global prevalence of intimate partner violence against women. Science 2013, 340, 1527–1528.
  2. Hyman, I.; Forte, T.; Mont, J.D.; Romans, S.; Cohen, M.M. Help-seeking rates for intimate partner violence (IPV) among Canadian immigrant women. Health Care Women Int. 2006, 27, 682–694.
  3. Haraway, D. A manifesto for cyborgs: Science, technology, and socialist feminism in the 1980s. In Feminism/Postmodernism; Routledge: New York, NY, USA, 1990; pp. 190–233.
  4. Rodríguez-Rodríguez, I.; Rodríguez, J.V.; Elizondo-Moreno, A.; Heras-González, P.; Gentili, M. Towards a Holistic ICT Platform for Protecting Intimate Partner Violence Survivors Based on the IoT Paradigm. Symmetry 2020, 12, 37.
  5. Rodríguez-Rodríguez, I.; Zamora-Izquierdo, M.Á.; Rodríguez, J.V. Towards an ICT-based platform for type 1 diabetes mellitus management. Appl. Sci. 2018, 8, 511.
  6. Bryant, R.; Katz, R.H.; Lazowska, E.D. Big-data Computing: Creating Revolutionary Breakthroughs in Commerce, Science and Society. In Computing Research Initiatives for the 21st Century, Computing Research Association; Version 8; Washington, DC, USA, 2008; Available online: http://www.cra.org/ccc/docs/init/Big_Data.pdf (accessed on 11 August 2020).
  7. Islam, A.; Akter, A.; Hossain, B.A. HomeGuard: A Smart System to Deal with the Emergency Response of Domestic Violence Victims. arXiv 2018, arXiv:1803.09401.
  8. Hegde, N.; Bries, M.; Swibas, T.; Melanson, E.; Sazonov, E. Automatic recognition of activities of daily living utilizing insole-based and wrist-worn wearable sensors. IEEE J. Biomed. Health Inform. 2017, 22, 979–988.
  9. Glaeser, E.L.; Hillis, A.; Kominers, S.D.; Luca, M. Crowdsourcing city government: Using tournaments to improve inspection accuracy. Am. Econ. Rev. 2016, 106, 114–118.
  10. Cranmer, S.J.; Desmarais, B.A. What Can We Learn from Predictive Modeling? Political Anal. 2017, 25, 145–166.
  11. Molina, M.; Garip, F. Machine learning for sociology. Ann. Rev. Sociol. 2019, 45, 27–45.
  12. Kleinberg, J.; Ludwig, J.; Mullainathan, S.; Obermeyer, Z. Prediction policy problems. Am. Econ. Rev. 2015, 105, 491–495.
  13. Cederman, L.E.; Weidmann, N.B. Predicting armed conflict: Time to adjust our expectations? Science 2017, 355, 474–476.
  14. Beck, N.; King, G.; Zeng, L. Improving quantitative studies of international conflict: A conjecture. Am. Political Sci. Rev. 2000, 94, 21–35.
  15. Brandt, P.T.; Freeman, J.R.; Schrodt, P.A. Real time, time series forecasting of inter-and intra-state political conflict. Confl. Manag. Peace Sci. 2011, 28, 41–64.
  16. Perry, C. Machine learning and conflict prediction: A use case. Stab. Int. J. Secur. Dev. 2013, 2, 56.
  17. Kleinberg, J.; Liang, A.; Mullainathan, S. The Theory is Predictive, But is it Complete? An Application to Human Perception of Randomness. In Proceedings of the 2017 ACM Conference on Economics and Computation, Cambridge, MA, USA, 26–30 June 2017; pp. 125–126.
  18. Coglianese, C.; Lehr, D. Regulating by robot: Administrative decision making in the machine-learning era. Geo LJ 2016, 105, 1147.
  19. Lawrenz, F.; Lembo, J.F.; Schade, T. Time series analysis of the effect of a domestic violence directive on the number of arrests per day. J. Crim. Justice 1988, 16, 493–498.
  20. Ozkan, T. Predicting Recidivism through Machine Learning. Doctoral Dissertation, University of Texas, Dallas, TX, USA, 2017.
  21. Ward-Lasher, A.; Sheridan, D.J.; Glass, N.E.; Messing, J.T. Prediction of Interpersonal Violence: An Introduction. Assess. Danger. 2017, 1, 1–23.
  22. Berk, R.A.; Sorenson, S.B.; Barnes, G. Forecasting domestic violence: A machine learning approach to help inform arraignment decisions. J. Empir. Leg. Stud. 2016, 13, 94–115.
  23. Holcomb, J.P.; Sharpe, N.R. Forecasting police calls during peak times for the city of Cleveland. Case Stud. Bus. Ind. Gov. Stat. 2006, 1, 47–53.
  24. Sherman, L.W. Policing domestic violence 1967–2017. Criminol. Public Policy 2018, 17, 453–465.
  25. Cohn, E.G. The prediction of police calls for service: The influence of weather and temporal variables on rape and domestic violence. J. Environ. Psychol. 1993, 13, 71–83.
  26. Goodman, L.A.; Smyth, K.F.; Borges, A.M.; Singer, R. When crises collide: How intimate partner violence and poverty intersect to shape women’s mental health and coping? Trauma Violence Abus. 2009, 10, 306–329.
  27. Hilton, N.Z.; Eke, A.W. Assessing risk of intimate partner violence. Assess. Danger. 2017, 207, 139–178.
  28. Heras-González, P.; Nardi-Rodríguez, A. Respuesta institucional a la Violencia de Género en la Comunidad Valenciana (España). Institutional response to Gender-based Violence in the Valencian Community (Spain). General. Valencia. Serv. Publ. 2020, 1, 1–30.
  29. Thornton, S. Police Attempts to Predict Domestic Murder and Serious Assaults: Is Early Warning Possible Yet? Camb. J. Evid.-Based Policy 2017, 1, 64–80.
  30. Chalkley, R.; Strang, H. Predicting domestic homicides and serious violence in Dorset: A replication of Thornton’s Thames Valley analysis. Camb. J. Evid.-Based Policy 2017, 1, 81–92.
  31. Delgadillo-Aleman, S.; Ku-Carrillo, R.; Perez-Amezcua, B.; Chen-Charpentier, B. A mathematical model for intimate partner violence. Math. Comput. Appl. 2019, 24, 29.
  32. Poza, E.; Jódar, L.; Barreda, S. Mathematical Modeling of Hidden Intimate Partner Violence in Spain: A Quantitative and Qualitative Approach. In Abstract and Applied Analysis; Hindawi: New York, NY, USA, 2016; Volume 2016.
  33. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  34. Sheikhpour, R.; Sarram, M.A.; Gharaghani, S.; Chahooki, M.A.Z. A survey on semi-supervised feature selection methods. Pattern Recognit. 2017, 64, 141–158.
  35. Hastie, T.; Tibshirani, R.; Tibshirani, R.J. Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv 2017, arXiv:1707.08692.
  36. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324.
  37. Karegowda, A.G.; Jayaram, M.A.; Manjunath, A.S. Feature subset selection problem using wrapper approach in supervised learning. Int. J. Comput. Appl. 2010, 1, 13–17.
  38. Yang, K.; Yoon, H.; Shahabi, C. A supervised feature subset selection technique for multivariate time series. In Proceedings of the Workshop on Feature Selection for Data Mining: Interfacing Machine Learning with Statistics, Newport Beach, CA, USA, 23 April 2005; pp. 92–101.
  39. Crone, S.F.; Kourentzes, N. Feature selection for time series prediction—A combined filter and wrapper approach for neural networks. Neurocomputing 2010, 73, 1923–1936.
  40. Sánchez-Maroño, N.; Alonso-Betanzos, A.; Tombilla-Sanromán, M. Filter Methods for Feature Selection—A Comparative Study. In International Conference on Intelligent Data Engineering and Automated Learning; Springer: Berlin/Heidelberg, Germany, 2007; pp. 178–187.
  41. Fonti, V.; Belitser, E. Feature selection using lasso. VU Amst. Res. Pap. Bus. Anal. 2017, 30, 1–25.
  42. Zhang, H.; Zhang, R.; Nie, F.; Li, X. A Generalized Uncorrelated Ridge Regression with Nonnegative Labels for Unsupervised Feature Selection. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2781–2785.
  43. Zitzler, E.; Thiele, L. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comput. 1999, 3, 257–271.
  44. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 2013, 34, 483–519.
  45. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Distributed feature selection: An application to microarray data classification. Appl. Soft Comput. 2015, 30, 136–150.
  46. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82.
  47. Brockwell, P.J.; Davis, R.A.; Calder, M.V. Introduction to Time Series and Forecasting; Springer: New York, NY, USA, 2002; Volume 2, pp. 3118–3121.
  48. Faloutsos, C.; Gasthaus, J.; Januschowski, T.; Wang, Y. Forecasting big time series: Old and new. Proc. Vldb Endow. 2018, 11, 2102–2105.
  49. Kalekar, P.S. Time series forecasting using holt-winters exponential smoothing. Kanwal Rekhi Sch. Inf. Technol. 2004, 4329008, 1–13.
  50. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin, Germany, 2013.
  51. Schölkopf, B.; Smola, A.J. A Short Introduction to Learning with Kernels. In Advanced Lectures on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2003; pp. 41–64.
  52. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013; Volume 26.
  53. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
  54. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
  55. Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in A Random Forest? In International Workshop on Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2012; pp. 154–168.
  56. Williams, C.K.; Barber, D. Bayesian classification with gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1342–1351.
  57. Ortmann, L.; Shi, D.; Dassau, E.; Doyle, F.J.; Leonhardt, S.; Misgeld, B.J. Gaussian process-based model predictive control of blood glucose for patients with type 1 diabetes mellitus. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast, QLD, Australia, 17–20 December 2017.
  58. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2, p. 4.
  59. Landau, S.F.; Fridman, D. The seasonality of violent crime: The case of robbery and homicide in Israel. J. Res. Crime Delinq. 1993, 30, 163–191.
  60. Bowlus, A.J.; Seitz, S. Domestic violence, employment, and divorce. Int. Econ. Rev. 2006, 47, 1113–1149.
  61. Anderberg, D.; Rainer, H.; Wadsworth, J.; Wilson, T. Unemployment and domestic violence: Theory and evidence. Econ. J. 2016, 126, 1947–1979.
  62. Brahmapurkar, K.P. Gender equality in India hit by illiteracy, child marriages and violence: A hurdle for sustainable development. Pan Afr. Med. J. 2017, 28, 178.
  63. Hussain, S.; Dahan, N.A.; Ba-Alwib, F.M.; Ribata, N. Educational data mining and analysis of students’ academic performance using WEKA. Indones. J. Electr. Eng. Comput. Sci. 2018, 9, 447–459.
  64. Kiranmai, S.A.; Laxmi, A.J. Data mining for classification of power quality problems using WEKA and the effect of attributes on classification accuracy. Prot. Control Mod. Power Syst. 2018, 3, 29.
  65. Lang, S.; Bravo-Marquez, F.; Beckham, C.; Hall, M.; Frank, E. Wekadeeplearning4j: A deep learning package for weka based on deeplearning4j. Knowl.-Based Syst. 2019, 178, 48–50.
  66. Jiménez, F.; Sánchez, G.; García, J.M.; Sciavicco, G.; Miralles, L. Multi-objective evolutionary feature selection for online sales forecasting. Neurocomputing 2017, 234, 75–92.
  67. Novaković, J. Toward optimal feature selection using ranking methods and classification algorithms. Yugoslav J. Oper. Res. 2016, 21, 119–135.
  68. Nicodemus, K.K. Letter to the editor: On the stability and ranking of predictors from random forest variable importance measures. Brief. Bioinform. 2011, 12, 369–373.
  69. Aha, D.W.; Kibler, D.; Albert, M.K. Instance-based learning algorithms. Mach. Learn. 1991, 6, 37–66.
  70. Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182.
  71. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
  72. Bergmeir, C.; Benítez, J.M. On the use of cross-validation for time series predictor evaluation. Inf. Sci. 2012, 191, 192–213.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
