Applied Machine Learning in Social Sciences: Neural Networks and Crime Prediction

Abstract: This study proposes a crime prediction model by commune (the areas or districts into which the city of Buenos Aires is divided). The Python programming language is used because of its versatility and the wide availability of libraries oriented to Machine Learning. The reported crimes (period 2016–2019) that occurred in the city of Buenos Aires and were selected to test the model are homicides, thefts, injuries, and robberies. The model is built following the SEMMA (Sample, Explore, Modify, Model, and Assess) methodology: after data manipulation, standardization, and cleaning, clustering is performed using K-means and the neural network is subsequently generated. To obtain a prediction, the model must be provided with the values of the predictive characteristics, which in the developed neural network model are: year, month, day, time band, commune, and type of crime.


Introduction
The Autonomous City of Buenos Aires, the capital of the Argentine Republic, has a relevant crime rate (Mancini 2020), as is generally true of the large metropolises throughout Latin America (Kliksberg 2002; Sain 2012; Lozano et al. 2018). This rate has increased throughout the country, mainly due to the socio-economic crisis that has hit it (Cid Ferreira et al. 2017; La Ruffa 2019). In this context, the government of the City of Buenos Aires has among its main objectives the implementation of various preventive measures to reduce the crime rate, working in fields such as: Nowadays it is an irrefutable fact that digitalization is changing the traditional methods and balances of the current economic and social organization model (Jorge-Vázquez et al. 2019). The implementation of these measures represents a significant share of the city's annual budget, around 15%. Therefore, it is necessary to carry out an exhaustive analysis of the data on the crimes that have occurred, so that the lines of preventive action are directed towards the most affected areas. In this way, based on the open data published in the portal of the Autonomous City of Buenos Aires corresponding to the years 2016 to 2019 (Ministerio de Justicia y Seguridad. Policía de la

Materials and Methodology
There are different methodologies that are generally referenced within Data Mining, of which the most widely adopted is SEMMA, an acronym corresponding to the five basic phases of the process (Sample, Explore, Modify, Model, Assess) (Rodríguez et al. 2003; Gómez-Palacios et al. 2017). The SAS Institute, developer of this methodology, defines it as "the process of selecting, exploring and modeling large amounts of data to discover unknown business patterns" (SAS: Analytics, Artificial Intelligence and Data Management 2020). This method has subsequently been analyzed by different authors for its practical application in different sectors (Arumugam et al. 2020). The steps are shown in Figure 1.
1. Sample: The objective of this phase is to select a representative sample of the problem under study. The SEMMA methodology establishes that each sample considered for the analysis must be associated with its level of confidence.
2. Explore: Once a sample has been determined, the available information must be explored, with a view to simplifying and subsequently optimizing the model to be created. Visualization tools or statistical techniques are usually used to help reveal relationships between variables and thus facilitate their selection.
3. Modify: This phase consists of manipulating the data to achieve the appropriate format for feeding the model to be built.
4. Model: The objective of this phase is to establish a relationship between the explanatory variables and the variables under study, which will make it possible to infer the value of the latter with a given level of confidence. The techniques used include traditional statistical methods (such as discriminant analysis, clustering methods, and regression analysis), as well as data-based techniques such as neural networks, decision trees, etc.
5. Assess: This phase consists of evaluating the results by analyzing the goodness of the model or models, contrasted with other statistical methods or with new sample populations.
The choice of methodology is based mainly on the simplicity and dynamism of its application, and SEMMA is considered the most appropriate. The predictive model of crimes presents a low complexity, characterized mainly by the following (SAS: Analytics, Artificial Intelligence and Data Management 2020):
-It uses public data whose structure requires little modification for its use.
-It does not require the participation or advice of specialists to generate an acceptable predictive model.
-The results of the model can be evaluated with basic knowledge.
-Interdisciplinary work is not required for the project; therefore, no effort is needed to coordinate and standardize work processes.

Sample
Based on the working problem, in this case the generation of a predictive model of crimes in the City of Buenos Aires, the objective in this phase is to obtain a representative sample of the universe of crimes that occurred in the defined geographical area, which allows the behavior of the population to be inferred. The Autonomous City of Buenos Aires (CABA) is organized into 15 communes (Ministerio de Justicia y Seguridad. Policía de la Ciudad 2020). The concept of crime is very broad, since it can include everything from drug trafficking to money laundering. However, when it is restricted to a geographical area and associated with the preventive measures that can be taken, the crimes considered must fall within that area's competence. In the Argentine Republic, as in most countries, there are different areas of management and responsibility, and the crimes that can be prevented, investigated, and prosecuted by the authorities dependent on the City of Buenos Aires are reduced to the following:
-Homicide: Crime consisting of killing someone without the circumstances of malice, price or overkill (Rodríguez Collao 2011).
-Theft: Crime consisting in taking for profit other people's personal belongings against the owner's will, without the circumstances that characterize the crime of robbery (Olave Albertini 2018).
-Robbery: A crime committed by taking possession of another's property for profit, using violence or intimidation against people, or force on things (Ramos Pérez 2015).
The Autonomous City of Buenos Aires, as part of its transparency policy and following the global trend towards publishing data on events and management, maintains an open data portal with 380 datasets (Ministerio de Justicia y Seguridad. Policía de la Ciudad 2020), organized under different themes. In this portal, the crimes are under the category "Security". The portal indicates that information is available from 2016 to 2019, with the same record structure, on four types of crimes and with a level of detail at the daily and commune levels. In accordance with the preceding sections, the required data sample should include:
-Classification of crimes: including homicide, theft, and robbery.
-Geographic area: corresponding to the Autonomous City of Buenos Aires, detailed by commune.
-Timing: a level of detail that allows conclusions to be drawn appropriately, for example the daily level, as opposed to annual totals that would make it difficult to properly exploit the data.
In this way, the crime datasets published on the website of the City of Buenos Aires, corresponding to the years 2016 to 2019, will be considered. Although it is understood that the number of actual crimes is considerably higher, mainly with regard to thefts and minor offenses, the information contained in the datasets corresponds to the crimes that have been reported, constituting an appropriate sample for the work to be carried out. In principle, this source is considered sufficient for the generation of the predictive model. The data, imported separately, need to be consolidated so that they can be treated jointly. This process may present different challenges, which must be properly managed in order to obtain quality data. In this case no problems arose, since the dataframes have a homogeneous structure and similar data; the resulting dataset, shown in Appendix A, Figure A1, forms the basis of all the work.
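The consolidation step described above can be sketched with pandas as follows. This is a minimal illustration, not the authors' code: the function name `consolidate`, the column names, and the two tiny in-memory tables standing in for the yearly files are all assumptions.

```python
import pandas as pd

def consolidate(frames):
    """Stack yearly crime tables that share the same columns into one DataFrame."""
    columns = list(frames[0].columns)
    for frame in frames[1:]:
        # Refuse to merge tables whose structure differs from the first one.
        if list(frame.columns) != columns:
            raise ValueError("datasets do not share the same column structure")
    return pd.concat(frames, ignore_index=True)

# Tiny stand-ins for the yearly files (hypothetical columns and values);
# the real data would typically be loaded with pd.read_csv.
crimes_2016 = pd.DataFrame(
    {"date": ["2016-01-01"], "commune": [1.0], "crime_type": ["Theft"]}
)
crimes_2017 = pd.DataFrame(
    {"date": ["2017-03-15"], "commune": [4.0], "crime_type": ["Robbery"]}
)
crimes = consolidate([crimes_2016, crimes_2017])
```

Checking the column structure before concatenating makes a mismatch between yearly files fail loudly instead of silently producing columns filled with nulls.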

Explore
In the exploration phase, the data from the selected information sources are analyzed in order to understand their structure and their link to the problem to be solved, and to select the aspects that are most closely linked to it (Abdul . In order to address this phase, the following tasks are carried out:
-Exploration of each column, whose result appears in Table A1 of Appendix A.
-Analysis of quantities, which are shown in Figures 2 and 3.
-Column selection, which is shown in Table 1.
As part of the exploration of the information, it is very useful to generate graphs that show totals of records by different characteristics, such as time band, type of crime, commune, etc. In this type of analysis, columns with unique values, such as the "id" column, should not be considered. When using the methodology, it is frequently necessary to alter the order of the phases, as in this case, where it is useful to bring forward the transformation of the date field, dividing it into day, month, and year columns. For some characteristics, such as "year", "month", and "day", the data present a uniform distribution, or one very close to it.
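The transformation of the date field and the totals by characteristic mentioned above could look roughly like this in pandas. The column name `date` and the helper `split_date` are assumptions made for illustration.

```python
import pandas as pd

def split_date(df, column="date"):
    """Return a copy of df with year, month and day columns derived from a date column."""
    out = df.copy()
    parsed = pd.to_datetime(out[column])
    out["year"] = parsed.dt.year
    out["month"] = parsed.dt.month
    out["day"] = parsed.dt.day
    return out

# Two made-up records, only to exercise the function.
sample = pd.DataFrame({"date": ["2016-01-31", "2019-11-05"]})
sample = split_date(sample)

# Records per year and month, the kind of total that is plotted during exploration.
counts = sample.groupby(["year", "month"]).size()
```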
From the graph on the left, certain trends can already be seen, such as that of January, in which crimes increased over the years, and that of November, in which crimes decreased over the years. Crimes increased during six months, mainly those at the beginning of the year (January, February, March, April, June, and July), while they decreased during the remaining six months, mainly in the second half of the year (May, August, September, October, November, and December).
With respect to the time band, no clear pattern of continuous growth of cases over the years is identified. In the last year there was a decrease in crimes in the ranges of 1:00 h to 7:00 h and 13:00 h to 23:00 h, and an increase at 0:00 h and in the range of 8:00 h to 12:00 h.
From the graph on the right, it is observed that over the years crimes decreased in three communes (6, 10, and 11). No clear pattern of continuous growth of cases over the years is identified. In the last year there was a decrease in crimes in seven communes (46.67%): 5, 6, 7, 9, 10, 11, and 15; and an increase in eight communes (53.33%): 1, 2, 3, 4, 8, 12, 13, and 14.
After analyzing the information provided by the various graphs, and in view of the objectives of the model to be built, the following columns will be discarded:
-Id: fields that are unique for each record are essential for transactional processes; however, they do not provide any information to predictive models.
-Date: temporal information in date format does not provide substantial information to predictive models, although information that can be extracted from it, such as the number of the month of the year, does.
-Neighborhood: although this data could be very useful, it was defined in advance that the highest level of detail would be the commune; therefore, this column, which provides a higher level of granularity, will be discarded.
-Lat and Long: both columns provide the geographic coordinates of the place where the reported crime occurred, a level of detail even higher than that of the neighborhood; since it was decided to limit the level of detail to the commune, it is appropriate to exclude them.
-Sub-type of crime: when defining and applying preventive measures for the different crimes, it is very important to have this level of detail. However, since more than 85% of the values in this column are null, it is excluded.
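A minimal sketch of this column selection is given below. The column names in `DISCARDED` are hypothetical English renderings of the columns named in the text, and `null_share` is an illustrative helper for spotting sparsely filled columns such as the sub-type of crime.

```python
import pandas as pd

# Hypothetical names for the columns discarded in the text.
DISCARDED = ("id", "date", "neighborhood", "lat", "long", "crime_subtype")

def null_share(df):
    """Fraction of null values per column."""
    return df.isna().mean()

def select_columns(df, discarded=DISCARDED):
    """Drop the discarded columns, ignoring any that are not present."""
    return df.drop(columns=list(discarded), errors="ignore")

# Made-up records, only to exercise the helpers.
raw = pd.DataFrame({
    "id": [1, 2],
    "date": ["2016-01-31", "2019-11-05"],
    "year": [2016, 2019],
    "commune": [1.0, 4.0],
    "crime_subtype": [None, None],
})
shares = null_share(raw)
selected = select_columns(raw)
```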
Thus, at this point, the columns or characteristics that make up the data set are those listed in Table 1.

Modify
This phase is aimed at manipulating the available data to adapt them to the form that allows adequate results to be obtained from the Machine Learning models (Krishnan et al. 2016; Gudivada et al. 2017). Accordingly, this phase is addressed according to the following structure: data cleaning, coding, information consolidation, and standardization. Regarding data cleaning, the crime data have no outliers and take a limited range of values, and therefore do not require special analysis in this regard. Only the value 'S/D' will be removed from the "time_band" field, which will allow the data type to be converted to integer (int). With respect to null data, the column with a high percentage of nulls, the sub-type of crime, was already discarded in the selection of attributes. This leaves only the 'commune' attribute, where there are 8324 rows with a null value (1.72 percent). Given that the commune is highly relevant to the analysis to be undertaken, in this case it is appropriate to eliminate those rows (Krishnan et al. 2016).
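The two cleaning rules above can be sketched as follows. It is assumed here that "time_band" is read as text, which is what would happen if the column mixes hours with the marker 'S/D'; the function name `clean` and the sample rows are illustrative.

```python
import pandas as pd

def clean(df):
    """Remove rows without a usable time band or commune."""
    out = df[df["time_band"] != "S/D"]      # drop the 'S/D' marker rows
    out = out.dropna(subset=["commune"])    # drop rows whose commune is null
    return out.reset_index(drop=True)

# Made-up records: one 'S/D' row and one row without commune.
raw = pd.DataFrame({
    "time_band": ["8", "S/D", "23", "14"],
    "commune": [1.0, 2.0, None, 15.0],
})
cleaned = clean(raw)
```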
With respect to coding, analyzing the data types of the columns of the data set (see Figure A2 in Appendix A) shows that the data types of 'time_band' and 'crime_type' are not correct, since they are indicated as 'object' when they should be of integer type (int). The field 'time_band' is not interpreted as numerical because it previously contained the value 'S/D'; since the records with this value have been eliminated, the conversion can be made without problems. The case of 'crime_type' is different: it has non-numerical values, which must be transformed before being incorporated into the models. This change can be made manually or using libraries, such as "LabelEncoder" from Sklearn.
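As an illustration of this coding step, the sketch below converts "time_band" to integers and encodes "crime_type" with scikit-learn's `LabelEncoder`. The sample values are made up; `LabelEncoder` assigns integer codes following the sorted order of the class names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up records after cleaning: both columns still have the 'object' type.
df = pd.DataFrame({
    "time_band": ["8", "14", "23"],
    "crime_type": ["Theft", "Homicide", "Theft"],
})

# 'time_band' only contains digits once 'S/D' has been removed.
df["time_band"] = df["time_band"].astype(int)

# Replace each crime name by an integer code.
encoder = LabelEncoder()
df["crime_type"] = encoder.fit_transform(df["crime_type"])
```

The fitted `encoder` keeps the mapping in `encoder.classes_`, so the original labels can be recovered later with `encoder.inverse_transform`.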
As for the consolidation of information, the column selection eliminated characteristics that provided a higher level of granularity, such as the neighborhood. As a result, the dataset is expected to contain repeated rows and, depending on the model to be created, aggregation techniques must be applied to consolidate the records. All this is shown in Figure A3 of Appendix A. The resulting dataset is more compact, with 123,681 fewer rows, equivalent to a 25% reduction. Although in most cases only one crime has been recorded, there are cases with multiple crimes, reaching up to 60.
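One way to perform this aggregation, sketched under the assumption that repeated feature rows are simply counted into a "quantity_registered" column, is a pandas `groupby`. The helper name and the sample rows are illustrative.

```python
import pandas as pd

FEATURES = ["year", "month", "day", "time_band", "commune", "crime_type"]

def consolidate_records(df):
    """Collapse identical feature rows into one row with a count column."""
    return (
        df.groupby(FEATURES)
          .size()
          .reset_index(name="quantity_registered")
    )

# Three made-up records, two of which share all six feature values.
records = pd.DataFrame({
    "year": [2019, 2019, 2019],
    "month": [1, 1, 1],
    "day": [5, 5, 6],
    "time_band": [20, 20, 20],
    "commune": [1, 1, 1],
    "crime_type": [2, 2, 2],
})
grouped = consolidate_records(records)
```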
Finally, with respect to standardization, once the data cleaning process has been completed, and given the variation in magnitudes, introduced especially by the "year" column, the data must be standardized (Ilyas and Chu 2019), a process by which each column is transformed to have a mean of 0 and a standard deviation of 1. In this case standardization is applied to the set X, while it is not considered necessary to apply it to Y. The sets are formed as follows:
-X (independent variables): "year", "month", "day", "time_band", "commune", "crime_type".
-Y (dependent variable): "quantity_registered".
For the standardization, the class sklearn.preprocessing.StandardScaler is used, which, for each column, replaces each value x of the set by z, resulting from the equation:
$$z = \frac{x - \mu}{\sigma}$$
where µ is the mean and σ is the standard deviation of the column analyzed.
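The following sketch applies `StandardScaler` to a small made-up matrix and compares it with the formula computed directly in NumPy. The two columns stand in for "year" and "month"; the values are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values for two columns with very different magnitudes.
X = np.array([
    [2016.0, 1.0],
    [2017.0, 6.0],
    [2018.0, 12.0],
    [2019.0, 7.0],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Same result computed directly from z = (x - mean) / std,
# using the population standard deviation (NumPy's default, ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)
```

After the transformation each column has mean 0 and standard deviation 1, so individual values are not confined to a fixed interval.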

Model
The objective of this phase is to establish a relationship between the explanatory variables and the variables under study, which will make it possible to infer their value with a given level of confidence (Nowozin and Wright 2012). The techniques used include traditional statistical methods such as clustering methods and regression analysis, as well as data-based techniques such as neural networks, decision trees, etc. (Al-Jarrah et al. 2015).

Clustering
K-means is a grouping method that aims at partitioning a set of n observations into k groups, in which each observation belongs to the group whose mean value is closest. It is a method used in data mining (Morissette and Chartier 2013). To perform the grouping, the algorithm requires the parameter k to be defined; for this the elbow method is used, which consists of evaluating the error obtained with different values of k and selecting the value at the point where there is an abrupt fall (Bholowalia and Kumar 2014; Marutho et al. 2018). Figure 4 shows that with K = 4 there is a decrease in the errors.
Soc. Sci. 2021, 10, 4
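The elbow method can be sketched with scikit-learn as shown below. The error measure used here is `inertia_`, the within-cluster sum of squared distances reported by `KMeans`; whether the study used exactly this measure is an assumption, and the data are synthetic, with four well-separated groups chosen only to make the drop visible.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_curve(X, k_values, seed=0):
    """Return the K-means within-cluster sum of squares (inertia) for each k."""
    errors = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        errors.append(model.inertia_)
    return errors

# Synthetic data with four well-separated groups, used only to exercise the function.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [0, 10], [10, 0], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

k_values = list(range(1, 8))
errors = elbow_curve(X, k_values)
```

Plotting `errors` against `k_values` gives the curve in which the elbow is sought.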
In the exploration phase, the application of unsupervised models is very useful, since they allow the identification of relationships between the different variables of the data set that are not evident a priori. Such is the case of K-means, which divides a set of n observations into k groups, assigning each observation to the group whose mean value is the closest.
Once the partition is made, the findings are generally exploited visually (in 2 or 3 dimensions), seeking the representation of a subset of variables in which a separation (ideally linear) between the data can be clearly identified. Thus, for example, in this case the graph of the data set by time band and day shows a linear separation between the four groups formed.
Although with K = 3 there is also a decrease in errors, K = 4 is finally selected. The reason is that, in other combinations of variables, it is not possible to clearly identify the 4 groups; as well as for reasons of ease and speed of processing.
Plotting the data set for the selected K by the predictive variables "day" and "time_band" makes it possible to clearly visualize the 4 groups, which are delimited by the quadrants defined by day 16 and 12 h, as shown in Figure 5.
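A minimal sketch of the partition for the selected K is shown below. The grid of (day, time_band) pairs is synthetic and only stands in for the real records, and whether the study clustered on these two variables alone or on the full feature set is not stated, so this is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic grid of (day, time_band) pairs standing in for the real records.
days, hours = np.meshgrid(np.arange(1, 32), np.arange(0, 24))
points = np.column_stack([days.ravel(), hours.ravel()]).astype(float)

# Partition into the four groups selected in the text.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(points)
labels = model.labels_            # group assigned to each point
centers = model.cluster_centers_  # mean (day, time_band) of each group
```

A scatter plot of `points` colored by `labels` is the kind of two-dimensional view in which the separation between groups can be inspected.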

Neural Networks
Within the scope of neural networks, and given the regression problem to be solved, a multilayer perceptron will be employed. This type of network is formed by multiple layers, which allow it to solve problems that are not linearly separable. It can be fully or locally connected: in the first case, each output of a neuron of layer "i" is an input of all the neurons of layer "i + 1", while in the second case each neuron of layer "i" is an input of a series of neurons (region) of layer "i + 1". The multilayer perceptron has been used in different fields, such as predicting solar radiation as a function of time, wind speed (Madhiarasan and Deepa 2016), landslides (Pham et al. 2017), or even rain (Esteves et al. 2018). In this study it is applied to predict the commission of one of the crimes under study.
The layers can be classified into three types, as shown in Figure 6:
-Input layer: Made up of those neurons that introduce the input patterns into the network. No processing takes place in these neurons.
-Hidden layers: Formed by those neurons whose inputs come from earlier layers and whose outputs pass to neurons in later layers.
-Output layer: Neurons whose output values correspond to the outputs of the entire network.
In this case, a 3-layer model with the following characteristics will be used, which is also shown in Figure 7:
1. Input layer: 16 neurons, all fully connected, with "ReLU" (rectified linear unit) activation (Kutyniok 2020), which takes the value 0 for negative values and is the identity function for positive values.
2. Hidden layer: 1 hidden layer with 8 neurons, also with ReLU activation.
3. Output layer: 1 output layer with 1 neuron and "linear" activation.
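To make the 16-8-1 architecture concrete, the sketch below implements only its forward pass in NumPy. It is not the authors' implementation: it assumes that the "input layer" of 16 neurons is a fully connected layer that receives the six predictive features, and the weights are random placeholders rather than trained values.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: 0 for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, x)

def init_params(sizes, seed=0):
    """Create small random weights and zero biases for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, X):
    """Apply ReLU after every layer except the last, which stays linear."""
    out = X
    for W, b in params[:-1]:
        out = relu(out @ W + b)
    W, b = params[-1]
    return out @ W + b

# 6 predictive features -> 16 neurons (ReLU) -> 8 neurons (ReLU) -> 1 output (linear)
params = init_params([6, 16, 8, 1])
X = np.zeros((5, 6))              # five placeholder records
predictions = forward(params, X)  # one value per record
```

In practice the weights would be fitted by a training procedure, which this sketch leaves out.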

Evaluation
In this phase, the results are evaluated by analyzing the goodness of the models, contrasting them with other statistical methods or with new sample populations. The best scenario has been provided by the multilayer perceptron; therefore, in this phase the various parameters of that neural network are fine-tuned. The number of layers was varied from 2 to 4, as shown in Figure 8.

The tests were carried out over 30 epochs. The results obtained on the training data (train) and the validation data (val) are shown in Figure 9 in terms of the mean squared error (MSE). For the training data, the lowest MSE was obtained by the 4-layer option, while for the validation data (val) it was obtained by the 3-layer architecture. The 3-layer option is considered the most appropriate because it behaves more stably than the other configurations.
Based on the 3-layer perceptron architecture, the alternatives considered vary the number of neurons in both the input and the hidden layers, while the output layer remains fixed at 1 neuron because it is determined by the output of the model. Five tests were carried out, with the diagrams shown in Figure A4 of Appendix A.
These tests were also run over 30 epochs. Figure 10 shows the results obtained on the training data (train) and the validation data (val) for the mean squared error (MSE). In the training data, the lowest MSE was obtained by option 32-16-1, followed by 16-8-1 and 32-8-1. For the validation data (val), the lowest error corresponds to the 32-8-1 architecture, followed by 16-8-1. The architecture considered most appropriate is the one with 32 neurons in the input layer, 8 in the hidden layer, and 1 in the output layer (32-8-1).
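A comparison of this kind could be sketched with scikit-learn's `MLPRegressor`, as below. Several points are assumptions: the data are synthetic, only the three architectures named in the text are included, the first two numbers of each architecture are mapped to `hidden_layer_sizes`, and `MLPRegressor` always trains on squared error, so it does not reproduce a network trained with MAE as its loss.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def compare_architectures(X, y, candidates, epochs=30, seed=0):
    """Train one network per candidate and return its (train MSE, validation MSE)."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    results = {}
    for name, hidden in candidates.items():
        model = MLPRegressor(hidden_layer_sizes=hidden, activation="relu",
                             max_iter=epochs, random_state=seed)
        with warnings.catch_warnings():
            # Stopping after a fixed number of epochs triggers a convergence warning.
            warnings.simplefilter("ignore", ConvergenceWarning)
            model.fit(X_train, y_train)
        results[name] = (
            mean_squared_error(y_train, model.predict(X_train)),
            mean_squared_error(y_val, model.predict(X_val)),
        )
    return results

# Synthetic regression data with six features, used only to exercise the function.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

candidates = {"16-8-1": (16, 8), "32-8-1": (32, 8), "32-16-1": (32, 16)}
results = compare_architectures(X, y, candidates)
best = min(results, key=lambda name: results[name][1])  # lowest validation MSE
```

Selecting by validation error rather than training error mirrors the criterion used in the text to prefer one architecture over another.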
The loss function considered in the original neural network is the mean absolute error (MAE) which is a measure of the difference between two continuous variables and serves to quantify the accuracy of a predictive technique. In this neural network, the mean square error (MSE) was defined as a metric that measures the average of the squared errors, that is, the difference between the estimator and what is estimated (Tamhane 1999). Sinceŷ is a predictor of n predictions, and the vector of n true values, metrics are defined as: Basically, the original neural network architecture was contrasted with the exchange between loss and metrics, which gave the result shown in Figure 11.
In both the training and validation datasets, the smallest error was produced by using MAE as the loss function and MSE as the metric, which is the original setting.

In an artificial neural network, each neuron computes a linear combination of its weights and inputs and then applies the activation function to that result to obtain its output value. In this model the prediction must be a positive integer value, unlike cases in which the probability of belonging to a certain class must be predicted. For this reason, among the available activation functions, the tests compared linear, which is the identity function, and rectified linear (ReLU), which is the identity function for positive values and 0 for negative values. The tests kept the same activation function for the input layer and the hidden layer, and varied the function of the output layer separately.
As the results show, the three options considered give similar mean squared errors (MSE) on both the training and validation datasets. However, using ReLU activation in all layers shows more stable behavior and is therefore considered the best alternative. This is shown in Figure 12.
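The difference between the two activation functions compared above can be seen on a single neuron. The following sketch uses arbitrary weights, inputs, and bias chosen only for illustration.

```python
def linear(z):
    """Identity activation: returns the weighted sum unchanged."""
    return z


def relu(z):
    """Rectified linear activation: identity for positive values, 0 otherwise."""
    return max(0.0, z)


def neuron_output(weights, inputs, bias, activation):
    """Linear combination of weights and inputs, followed by the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Arbitrary illustrative values; the weighted sum is 1.0 - 3.0 + 1.0 + 0.5.
weights = [0.5, -1.0, 0.25]
inputs = [2.0, 3.0, 4.0]
bias = 0.5

out_linear = neuron_output(weights, inputs, bias, linear)
out_relu = neuron_output(weights, inputs, bias, relu)

print(out_linear, out_relu)
```

With a linear output the neuron can return a negative number, which has no meaning as a count of crimes, while ReLU clips such values to zero.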
Mathematical optimization is the selection of the best element (with respect to some criterion) from a set of available elements (Lan 2019). In neural networks, the optimizer determines the strategy used to minimize the loss function. The options considered were:
1. SGD (Stochastic Gradient Descent): the basic optimization method, in which the step size for updating the weights (α) is fixed and does not change during training (Mandt et al. 2017).
2. RMSProp (Root Mean Square Propagation): its formula includes a parameter d, called the decay rate, which makes previous values less influential than new ones (Lu 2018).
3. Adam (Adaptive Moment Estimation): similar to RMSProp but with momentum, that is, it considers not only the gradient of the current step but also accumulates the gradients of previous steps (Kingma and Ba 2014; Tato and Nkambou 2018).
On both the training and validation data, the lowest error was achieved by SGD. However, the optimizer with the most uniform behavior across both datasets is considered more appropriate, and this is observed only with Adam, which is the original network configuration. The results are shown in Figure 13.
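The update rules of the three optimizers can be sketched on a one-dimensional toy problem, minimizing f(w) = (w - 3)^2. The learning rates, decay values, and step counts below are common illustrative defaults chosen for this example and are assumptions; the study does not report the values it used.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def run_sgd(w, lr=0.1, steps=200):
    """Plain gradient descent: the step size lr stays fixed."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_rmsprop(w, lr=0.01, decay=0.9, eps=1e-8, steps=2000):
    """RMSProp: scales each step by a decaying average of squared gradients."""
    avg_sq = 0.0
    for _ in range(steps):
        g = grad(w)
        # Older squared gradients are weighted down by the decay rate.
        avg_sq = decay * avg_sq + (1.0 - decay) * g * g
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w


def run_adam(w, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam: RMSProp-style scaling plus a momentum term on the gradient."""
    m = 0.0  # decaying average of gradients (momentum)
    v = 0.0  # decaying average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_sgd = run_sgd(0.0)
w_rmsprop = run_rmsprop(0.0)
w_adam = run_adam(0.0)

print(w_sgd, w_rmsprop, w_adam)
```

All three runs approach the minimum at w = 3. On a real network the loss surface is far less regular, which is why the comparison in the text relies on the observed training and validation curves rather than on a toy problem of this kind.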

Selection and Use of the Predictive Model
Based on the tests carried out, the best artificial neural network for the crime prediction model is the one with the characteristics shown in Figures 14 and 15. To use the predictive model, the following information is required: year, month, day, time zone, commune, and type of crime. This information must be scaled before it is entered as input to the model; the predict option is then used to obtain the expected number of recorded crimes.
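The scaling step that precedes prediction can be sketched as follows. The example assumes z-score standardization fitted on the training data and integer codes for time zone, commune, and type of crime; the study does not specify the scaler or the encoding, and the rows, names, and codes below are hypothetical.

```python
from statistics import mean, pstdev

FEATURES = ("year", "month", "day", "time_zone", "commune", "crime_type")


def fit_scaler(rows):
    """Compute the per-column mean and standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(row, means, stds):
    """Standardize one row with statistics learned from the training rows."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


# Hypothetical training rows, one value per entry in FEATURES.
training_rows = [
    (2016, 1, 5, 0, 1, 0),
    (2017, 6, 15, 1, 8, 1),
    (2018, 9, 20, 2, 12, 2),
    (2019, 12, 28, 3, 15, 3),
]

means, stds = fit_scaler(training_rows)

# Hypothetical query: a date, time zone, commune, and crime type to predict for.
query = (2019, 7, 16, 2, 4, 1)
scaled_query = scale(query, means, stds)

# scaled_query is what would be handed to the trained network's predict option.
print(dict(zip(FEATURES, scaled_query)))
```

The important design point is that the query is scaled with the statistics of the training data, not with statistics computed from the query itself, so that the network sees inputs on the same scale it was trained on.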



Results
Using the crime data published by the Government of the Autonomous City of Buenos Aires, a predictive regression model of the number of crimes expected for a given date, time, commune and type of crime was defined and trained. The available data covered 2016 to 2019 for four types of reported crime: theft, robbery, injuries, and homicide. In the data exploration phase, a close relationship was identified between the day and the time zone, revealing four groupings that correspond to four quadrants delimited by the 16th day and the 12th hour. For the construction of the model, a multilayer artificial neural network of the perceptron type was selected, and multiple tests on it identified the hyperparameters that achieve the best results. The resulting multilayer perceptron has the architecture shown in Figure 16.

Prediction therefore requires the information corresponding to the predictive characteristics (predict), namely year, month, day, time zone, commune and type of crime, which are then scaled to feed the model. The crime prediction model for the city of Buenos Aires, implemented as a multilayer perceptron, provides information about the future not only on possible crimes but also at the level of detail needed to define preventive and detection measures in the different communes into which the city is organized. The information provided by the model is also useful in the decision-making process for distributing the budget allocated to Security.
From the initial data set, after the cleaning, transformation and grouping processes, a total of 361,184 records were obtained. This data set was divided into two groups: the training group, with 80%, equivalent to 288,947 records, and the validation group, with 20%, equivalent to 72,237 records.
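The sizes of the two groups follow directly from the 80/20 proportion, as the sketch below shows. It assumes a random shuffle with a fixed seed before splitting; the study does not state whether the records were shuffled, and the function name and seed are illustrative.

```python
import random


def train_validation_split(n_records, train_percent=80, seed=42):
    """Shuffle record indices and split them into training and validation sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    n_train = n_records * train_percent // 100
    return indices[:n_train], indices[n_train:]


N_RECORDS = 361_184  # records remaining after cleaning, as reported in the text

train_idx, val_idx = train_validation_split(N_RECORDS)

print(len(train_idx), len(val_idx))
```

Integer arithmetic is used for the split point so that the two group sizes always add up to the total number of records.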
Since this is a regression problem rather than a classification problem, the predictive quality of the model is not evaluated with "accuracy" (the number of correct versus incorrect predictions). Other metrics are used instead, such as the mean of the absolute differences between the estimated value and the correct value, called MAE (mean absolute error). The final model yielded an MAE of 0.4095, so an average error of less than 0.5 cases is expected. For example, if 3 cases were actually recorded, the prediction would typically fall between about 2.6 and 3.4, which is appropriate for the possible modes of use of the model.

Conclusions
The crime prediction model for the city of Buenos Aires, implemented as a neural network of the multilayer perceptron type, provides information about the future not only on possible crimes but also at the level of detail needed to define preventive and detection measures in the different communes into which the city is organized. The information provided by the model can therefore also be useful in the decision-making process for distributing the budget allocated to Security, all the more so in a context of crisis in which government resources are scarce and optimizing public spending becomes even more necessary.

When using the model, it should be considered that the government of the Autonomous City of Buenos Aires publishes crime information within three months after the end of the year. The model has been trained with data for the years 2016 to 2019. The information for 2020 is therefore expected to be available in March 2021, and it would be advisable to update the model with it. However, 2020 is an atypical case: because of the COVID-19 pandemic, restrictive mobility measures were taken, among which home confinement predominated between March and October. As a result, a scenario very different from the usual one described by the 2016 to 2019 data is to be expected. The incorporation of the 2020 information will therefore have to be properly tested and evaluated, considering scenarios that range from transforming the data to discarding it altogether.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Figure A1. Initial data frame. Source: Own elaboration based on data extracted from https://data.buenosaires.gob.ar/dataset/delitos.
Figure A2. Result of the execution of df.info(). Source: Own elaboration.
Figure A3. Data frame before and after the consolidation process. Source: Own elaboration.
Figure A4. Tests performed on the variation of neurons. Source: Own elaboration.