Spatiotemporal Predictive Geo-Visualization of Criminal Activity for Application to Real-Time Systems for Crime Deterrence, Prevention and Control

This article presents the development of a geo-visualization tool, which provides police officers or any other type of law enforcement officer with the ability to conduct the spatiotemporal predictive geo-visualization of criminal activities in short and continuous time horizons, according to the real events that are happening: that is, for those geographical areas, time slots, and dates that are of interest to users, with the ability to consider individual events or groups of events. This work used real data collected by the Colombian National Police (PONAL); it constitutes a tool that is especially effective when applied to Real-Time Systems for crime deterrence, prevention, and control. For its creation, the spatial and temporal correlation of the events is carried out and the following deep learning techniques are employed: CNN-1D (Convolutional Neural Network-1D), MLP (multilayer perceptron), LSTM (long short-term memory), and the classical technique of VAR (vector autoregression), due to its appropriate performance in the multi-step and multi-parallel forecasting of multivariate time series with sparse data. This tool was developed with Open-Source Software (OSS) as it is implemented in the Python programming language with the corresponding machine learning libraries. It can be implemented with any geographic information system (GIS) and used in relation to other types of activities, such as natural disasters or terrorist activities.


Introduction
The high costs associated with delinquency and criminality are, to a greater or lesser extent, known to every society [1,2]. As such, considering the constant work that law enforcement, public security agencies, and investigators throughout the world carry out, it is clear that efforts should be directed towards the creation of tools, policies, and strategies for the deterrence, prevention, and control of this type of activity. Such strategies also make it possible to detect their displacement patterns; this is perhaps one of the best ways to address these issues, save lives, and avoid consequences of all kinds. This is the case for tools that allow for the prevention or prediction of criminal activity, which can be applied to real-time scenarios and situations.
The article shows the development of a tool based on this concept. It offers police units or any other type of law enforcement officer a spatiotemporal predictive geo-visualization of criminal activities with a short time horizon using feedback from real events that are happening; these are used to perform the retraining of the predictive model and the following forecasts. Thus, forecasts are made continuously, in real-time, and with a reduced error rate. It is applicable to geographical areas, time slots, and dates of interest, either according to individual event codes or to groups of event codes.

Works Related to the Concept of Spatiotemporal Predictive Geo-Visualization
The prediction of when and to what extent all kinds of events and activities could happen, whether these events are human, natural, or otherwise, requires that this information be represented graphically. The interaction of these events constitutes a wide source of information that allows observers, analysts, investigators, and security and control agents to achieve greater accuracy in future projections and to undertake more efficient decision-making in relation to crime deterrence, prevention, and control, allowing for enhanced situational awareness.
As a consequence, the capabilities of spatiotemporal predictive geo-visualization have been investigated in relation to various applications, from observations for the prevention and early warning of natural disasters, such as floods [15], to predictive crime surveillance [16,17]. Therefore, development approaches depend largely on the specific needs to be satisfied. Such is the case for the spatial visualization of conflict hotspots using statistical tools [18][19][20][21], mapping by street segments [22], using contrast patterns to predict spatiotemporal events [23], using multimodal data for event prediction with deep learning tools [24][25][26][27][28], applying temporal geospatial analysis to event classification [29,30] and near-repeat and risk terrain modeling [31].
However, none of these development areas integrate tools that allow for the geo-visualization of forecasts of criminal activity by geographical areas, time slots, and dates, making use of the large amount of real data now provided by security agencies, correlated in time and space, and in a useful timeframe, such as real time. Moreover, the data are sparse, and this sparsity increases further when they are filtered by individual event codes or groups of event codes.
As a consequence, the work presented in this paper constitutes an architectural proposal for a real-time system that fills the aforementioned research gap and satisfies the needs detailed above.

Development of an Effective Tool for the Spatiotemporal Predictive Geo-Visualization of Activity for Real-Time Systems
The Colombian National Police (PONAL) divides cities and towns into quadrants, which are variable areas of extended land that are divided in a non-physical way according to the ratio of population density directly proportional to the armed forces or number of police officers at their service [66]. The quadrants are served by police stations and CAIs (immediate action commands) [66].
Considering this strategic division of police control and the fact that, like Colombian cities and towns, any geographic area can be divided into hypothetical quadrants, the intervention described here is generic, since it can be used for any other place. This grouping of terrain into grids that represent the division into quadrants of any police jurisdiction is the first step in creating a tool for spatiotemporal predictive geo-visualization, since this grouping of areas into sub-areas allows for the correlation of the events that have occurred within each sub-area relative to the others.
The entire process described in this section was undertaken using the open-source programming language Python, which has a large number of mature, community-proven libraries providing visualization, numerical calculations, data analysis, and prediction modules for machine learning and deep learning, as well as classical techniques for multivariate time series.

Sources of Information and Pre-Processing
Through confidentiality agreements, the PONAL has provided years of data regarding criminal incidents associated with the area of jurisdiction of the city of Santiago de Cali. The first preprocessing step is carried out on the PONAL database system, where data cleaning is performed to generate a sub-database, eliminating unconfirmed or erroneous events that could have been captured, and extracting the necessary fields, including the timestamp (date and time), latitude, longitude, and event code (case code). The next step consists of a data cleaning and data wrangling process, which adapts the geographic coordinate format and the timestamp format, and organizes the records by timestamp [67]. This process results in a database with the format shown in Figure 1. Once this process is established, databases can be queried by any time range (date and time) and by any extension of the geographical area of the jurisdiction of the PONAL (latitude and longitude). Analysts and commanders, or any other user, can request from the system the data of smaller or larger areas and, within the chosen area, the date and time range, according to their interests. At this point, the dataset is ready to be used.

Geographic Spatial Grouping of the Observation Area for Its Correlation
Once the observation area has been defined by its geographic coordinates of latitude and longitude and its event data are ready to be used, a grid bounded by these geographic coordinates is created. This latitude and longitude grid system allows the observation area to be divided into sub-areas, so that, when one or more criminal activity events are detected within each sub-area, they are represented and counted within it as in a density map. In turn, this also facilitates the correlation of events between adjacent sub-areas when it is necessary to generate forecasts, as, following this approach, there is a univariate time series for each of the sub-areas and, therefore, a multivariate time series of the same size as the resolution of the entire grid. This is discussed further in the section titled "Data Forecasting and its Spatiotemporal Geo-Visualization".
The observation zone can be as wide as the analyst requires in terms of the geographic extension, and the grid division can have the resolution that the analyst chooses according to the observation needs: see Figure 2a. This grid should be placed as a layer over the geographical map of the observation area on the Geographic Information System (GIS) in which the spatiotemporal predictive geo-visualization tool will be used, as shown in Figure 2b. In other words, the concept described in this article constitutes a tool that acts as a plug-in that can be adapted to any GIS through its API (application programming interface). Here, the GIS used is the one integrated into the real-time systems of the PONAL [67]. For the case of PONAL-specific use, and since it refers to spherical coordinates on the terrestrial globe, we created a grid of 40 squares on the latitudinal axis (approximately 896,625 m each) and 25 squares on the longitudinal axis (approximately 1,087,344 m each at the northern latitude and approximately 1,087,704 m at the southern latitude). However, it is important to reiterate that these values are used for this case only; the grid resolution can be adjusted to the monitoring needs of any geographic territory as required by commanders, analysts, or users in general.
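The assignment of events to grid sub-areas can be illustrated with a minimal sketch. It assumes a simple equal-angle grid over a latitude/longitude bounding box; the function names, the bounding box, and the sample events are hypothetical and are not taken from the PONAL system.

```python
from collections import Counter

def cell_index(lat, lon, bounds, n_lat=40, n_lon=25):
    """Return the (row, col) grid cell containing a point, or None if outside.

    bounds = (lat_min, lat_max, lon_min, lon_max) of the observation area.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
        return None
    # Fractional position inside the bounding box, scaled to the grid size.
    row = int((lat - lat_min) / (lat_max - lat_min) * n_lat)
    col = int((lon - lon_min) / (lon_max - lon_min) * n_lon)
    # Points lying exactly on the upper edge belong to the last cell.
    return min(row, n_lat - 1), min(col, n_lon - 1)

def density_map(events, bounds, n_lat=40, n_lon=25):
    """Count events per cell; events is an iterable of (lat, lon) pairs."""
    counts = Counter()
    for lat, lon in events:
        cell = cell_index(lat, lon, bounds, n_lat, n_lon)
        if cell is not None:
            counts[cell] += 1
    return counts

# Hypothetical bounding box and events, for illustration only.
bounds = (3.30, 3.50, -76.60, -76.45)
events = [(3.312, -76.592), (3.312, -76.592), (3.50, -76.45), (4.00, -76.50)]
grid = density_map(events, bounds)
```

In this sketch, events outside the bounding box are simply ignored, and the 40 × 25 default mirrors the resolution quoted for the PONAL case; any other resolution can be passed through `n_lat` and `n_lon`.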

Temporal Grouping of the Criminal Events of the Observation Area for Correlation
Regardless of the size of the geographic segment chosen as the observation area, the analyst can also select the preferred time range (date and time) in which to analyze this area. Thus, for the same observation area, a different time range for analysis could be chosen (for example, one year, two years, or a few months), and the exact dates required can also be specified.
The temporal grouping of criminal events in the observation area within the chosen time range is then performed. This temporal grouping is achieved by reorganizing the dataset as a multivariate time series, as follows: the frequency of the multivariate time series is defined as how often the measurement of the cases that occurred will be sampled, and the amount of time within which events in each frame will be represented is defined. For the case considered here, the spatiotemporal predictive geo-visualizer shows the criminal events that occurred during a 30 min period, in frames that are updated every 10 min. Consequently, the frequency of the multivariate time series is 10 min [67].
The above point implies that, according to this choice of values, in each representation of a frame, there will be an overlap of cases of 20 min; that is, the representation of the second frame will again show the cases that occurred in the last 20 min of the previous frame [67]. It is designed in this way with the aim of generating continuity and ease in the forecasting algorithms.
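The overlapping frames described above can be sketched for a single sub-area as follows. This is a minimal illustration under stated assumptions: the function name `frame_counts` and the sample timestamps are hypothetical, and the half-open window convention (start included, end excluded) is an assumption rather than a detail given in the text.

```python
from datetime import datetime, timedelta

def frame_counts(timestamps, start, end, window_min=30, step_min=10):
    """Count events in sliding windows.

    Each frame covers the window_min minutes before its end time, and frame
    end times advance by step_min, so consecutive frames overlap by
    window_min - step_min minutes.
    Returns a list of (frame_end, count) pairs.
    """
    window = timedelta(minutes=window_min)
    step = timedelta(minutes=step_min)
    frames = []
    frame_end = start + window
    while frame_end <= end:
        frame_start = frame_end - window
        count = sum(1 for t in timestamps if frame_start <= t < frame_end)
        frames.append((frame_end, count))
        frame_end += step
    return frames

# Hypothetical events in one sub-area, for illustration only.
day = datetime(2020, 1, 1)
events = [day + timedelta(minutes=m) for m in (5, 25, 35)]
series = frame_counts(events, start=day, end=day + timedelta(hours=1))
counts = [c for _, c in series]
```

With a 30 min window and a 10 min step, the event at minute 25 is counted in three consecutive frames, which is the 20 min overlap between adjacent frames. Setting `window_min` equal to `step_min` yields frames without overlap.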
However, it must be clarified that these two parameters have been specified as shown for this case by initially selecting random values and adjusting them by trial and error until finding those that served the intended purpose. Therefore, it is clear that these parameters can also be adjusted according to the requirements of a given scenario, and there is even the possibility of using parameters that do not generate overlaps [67]. Figure 3a shows an example of a 3D multivariate time series, and Figure 3b shows an example of a 2D multivariate time series, once the dataset is rearranged as a multivariate time series the frequency of which will correspond to the value of 10 min.
In Figure 3a, each colored box represents a sub-area of the entire observation area, and each color represents a different density of events. In Figure 3b, each univariate time series, named "SerTemp_x", represents each of the sub-areas.
In Figure 4, an example is shown where the graphical representation of some frames was randomly taken to show how the tool provides visualization capabilities up to this stage. That is to say, it shows the density of criminal events by sub-areas for each time measured, frame by frame, within a complete observation area (in the example, this is the entire jurisdiction of the PONAL over the city of Santiago de Cali). This geo-visualization also helps users, analysts, and commanders to verify whether the chosen area, its resolution, and the analysis time range are adequate for their objectives or if it would be preferable to modify those parameters.

Data Forecasting and Its Spatiotemporal Geo-Visualization
A truly useful crime event prediction model for the objectives that this tool proposes must:
1. Make forecasts continuously, for short time horizons, and in a useful timeframe as close as possible to real time.
2. Be frequently retrained using the largest amount of data from real events in such a way that it not only enhances the forecasts' reliability but also allows commanders and analysts to observe possible trends in criminal activity more precisely.
3. Be as simple as possible, so that its computational cost does not become an obstacle to its proper functioning and model overfitting risks are avoided.
4. Provide the visualization of criminal activity forecasts on a map and with timelines.
In accordance with these requirements and as stated in the previous sections, by grouping the data on criminal activity events spatially and temporally, multivariate time series are obtained by the unification of all the univariate series that make up each of the sub-areas of the observation area. In other words, there is a multivariate time series of data that is made up of several univariate time series equal to the number of sub-areas that the resolution of the grid contains, since each univariate time series corresponds to the density of the criminal events in each sub-area. The frequency of the entire multivariate time series is 10 min for this case: see Figure 3.
Grouping the densities of criminal activity events, both spatially and temporally, in a multivariate time series, has the following advantages:
• Spatial correlation is achieved between the sub-areas of the observation area when forecasting future events, as each of the univariate time series is made up of the densities of criminal events in one of the sub-areas. Prediction algorithms take this spatial relationship into account, without the need to provide the location parameter (latitude and longitude) as a predictor variable.
• Temporal correlation between the sub-areas of the observation area is achieved when forecasting future events, as each of the univariate time series is made up of the densities of the criminal events of one of the sub-areas. Prediction algorithms take this temporal relationship into account, without the need to provide the timestamp parameter as a predictor variable.
This means that the prediction algorithms can be used for multivariate time series, with the predictor variables being the values of the densities of criminal events for each sub-area. This simplifies the model and helps to speed up forecast convergence, but without giving up the spatial and temporal correlation of these events, as these two correlations are essential for the forecast's reliability. That is, within an observation area, it is only possible to correctly forecast the shift in activity if the events of each of the sub-areas that make it up are correlated with each other.
To carry out prediction tests of criminal activity events, an observation area of 10 quadrants is considered (with the parameter values already chosen in Sections 3.2 and 3.3), which is the approximate area of jurisdiction of a police station in a city; the analysis considers a time frame of approximately two years, but, again, it must be stated that these parameters are adaptable to any other given scenario. Another important factor to consider is the percentage sparsity of data, which can be checked visually by users in the tool, since the choice of prediction algorithms largely depends on this.
Bearing in mind that, in the proposed use case, all the criminal activity data that occurred in the observation period are used, i.e., no filters are applied by case code, the results show that the data exhibit a sparsity of 0.9871357368590777. In this test, the closer the result is to 1, the higher the sparsity, so it can be concluded that, in this case, we are dealing with data that present a high percentage of zero values when making forecasts for the future. This calculation was performed in Python according to the following formula:
Sparsity = 1.0 − (count of non-zero values within the data) / (total size of the data)
This result is to be expected, as criminal activity fluctuates according to zones, time slots, and dates, even more so when providing the ability to filter criminal activity events by event codes; this is so that an analyst or commander can request, for example, only the fight cases, or the fight cases added to the property damage cases, and so on.
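The sparsity formula can be reproduced with a few lines of Python. The sketch below uses plain lists rather than any particular numerical library, and the small density series is hypothetical, chosen only to make the arithmetic easy to check.

```python
def sparsity(series):
    """Sparsity = 1.0 - (count of non-zero values) / (total size of the data).

    `series` is a list of rows: one row per time step, one column per sub-area.
    """
    total = sum(len(row) for row in series)
    non_zero = sum(1 for row in series for value in row if value != 0)
    return 1.0 - non_zero / total

# Hypothetical 4-step, 5-sub-area density series, for illustration only.
demo = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
value = sparsity(demo)  # 2 non-zero values out of 20
```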
There is, then, a sparse-type multivariate time series. Regarding the size of the resolution of the quadrants chosen for the observation area, the prediction variables are the integer values of the densities of the criminal activity in each of the sub-areas or in each univariate time series to which it conforms. According to the figures shown above, a scale from 0 to 10 was chosen, with 0 being the absence of events (transparency or the absence of color) and 10 being the maximum concentration of events (black color). It should be noted that the values of this scale can be readjusted if necessary.
The multivariate time series does not have exogenous variables, only endogenous ones; that is, forecasts are required for each univariate time series that makes up the multivariate series. In addition, it is a high-frequency multivariate time series, since its frequency is 10 min. It is also multi-parallel, because the forecast of one step (or period) is equivalent to the simultaneous forecast of one step of the event density in each of the sub-areas (quadrants) of the observation area. The prediction time horizon of a one-step forecast, with which this tool is designed, is therefore 10 min (see Figure 5).
Since the objective is to achieve continuous forecasts, with short time horizons and in a timeframe as useful as real-time, while the model benefits from using large amounts of real data to continuously train itself and to make new forecasts, Figure 6 shows the overall work loop and the requirements for achieving this capability.
According to the example, if the model were trained using the observations made up to 16:10 h, then it would start its forecast operation with the first step showing the forecast up to 16:20 h, and the second forecast step would show the forecast up to 16:30 h. These forecasts would be geo-visualized by commanders and analysts on the geographic map of the observation zone. However, time goes on and the real-time system continues to collect data from actual events within the time interval of the first forecast step (between 16:10 h and 16:20 h). Therefore, once 16:20 h is reached, the real-time system has the events up to that moment and can store them in the database of the tool (thus increasing it). The model can then be retrained with these data, and two forecast steps are generated with the new actual data added; the same happens successively for each step.
Meanwhile, the geo-visualization has not stopped showing the forecast between 16:10 h and 16:30 h, so, when the model retrains and performs a new two-step forecast (this time, the first step spans until 16:30 h and the second until 16:40 h), the system updates, overwriting the forecast until 16:30 h and displaying the results for the interval until 16:40 h, acting in this way for each of the forecast steps. This gives the observer the feeling of continuity and real time in the forecasts being shown; however, behind the scenes, it is in fact constantly being updated and is using as much real data as possible as it is gathered. This is a highly useful approach for the PONAL environment, as it would notably improve situational awareness and future projections and, therefore, agility and efficiency in police decision-making processes. This is because, among other things, it allows commanders and analysts to observe the trend of the displacement of criminal activity in certain places, either in general, for isolated crimes, or for groups of crimes.
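The retrain-and-overwrite loop described above can be sketched as follows. This is only an illustration of the control flow: the "model" is a hypothetical stand-in that repeats the last observation, not one of the CNN-1D, MLP, LSTM, or VAR models used in this work, and all function names are assumptions.

```python
def fit_stand_in(history):
    """Stand-in 'training': remember the last observed row (hypothetical model)."""
    return list(history[-1])

def forecast(model, steps=2):
    """Stand-in forecast: repeat the remembered row for each forecast step."""
    return [list(model) for _ in range(steps)]

def rolling_forecast(history, incoming, steps=2):
    """Retrain after every new observation and keep the latest forecast per step.

    history: rows observed so far (one row per 10 min step, one value per sub-area).
    incoming: rows that arrive later, one per step.
    Returns a dict mapping step index -> forecast row currently displayed.
    """
    displayed = {}
    history = [list(row) for row in history]
    for new_row in [None] + list(incoming):
        if new_row is not None:
            history.append(list(new_row))  # real data for the step just elapsed
        model = fit_stand_in(history)      # retrain with everything observed
        t = len(history)                   # index of the next unobserved step
        for k, row in enumerate(forecast(model, steps)):
            displayed[t + k] = row         # overwrite any older forecast of that step
    return displayed

history = [[0, 1], [0, 0]]  # observations up to, e.g., 16:10 h
incoming = [[2, 0]]         # observation that becomes available at 16:20 h
shown = rolling_forecast(history, incoming)
```

In this sketch, step 3 is first forecast as the second step of the initial run and is then overwritten by the first step of the retrained run, which is the behavior that gives the observer a continuous display.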
For this ability to be achieved by a system, a model must be able to:
• Be retrained and generate multi-parallel forecasts in a time shorter than the duration of one forecast step, that is, in a time shorter than the frequency of the multivariate time series, which, in this case, is 10 min.
• Have the capacity to generate reliable forecasts for a time horizon of at least two steps at a time, since, with single-step forecasts, continuity in geo-visualization cannot be realized. This is because the time slot will not be long enough to collect new information, retrain the model, and generate new forecasts.
• Be as simple and efficient as possible so that everything described above can be fulfilled.
As described up to this point, the conclusive guidelines for the choice of prediction algorithms are set, as shown, by the geo-visualization system's real-time requirements, and the restrictions of the real data under consideration. For example, not all prediction algorithms for multivariate time series that are able to forecast in multi-parallel will work properly when applied to high-sparsity datasets. The choice of predictive algorithms is also driven by two key features of time series data, namely:
• Seasonality: the multivariate time series discussed here does not present seasonality, especially as it is a multivariate high-frequency sparse-type time series.
• Stationarity: the multivariate time series discussed here turns out to be stationary, so there is no need to perform any type of transformation on the data to achieve it. Stationarity was tested using the Dickey-Fuller test.

Baseline Model
To start testing with the prediction algorithms that may be useful for the purposes of this work, a forecast or baseline model must be established to determine whether the classic models, machine learning, or deep learning are useful and contribute to the achievement of a truly reliable future forecast. In the case of multivariate time series, it is better to establish a naïve reference model, naïve implying that such a forecast will provide observations directly, without any processing; this is also called a persistence forecast due to the observations' persistence. There are also other possibilities for establishing these baseline models, such as, for example, making averages of some previous observations and using them as a forecast, for which the time series dataset must be transformed into one suitable for supervised learning problems, that is, in the form of inputs (X) and outputs (y). However, the main concern is to find and choose the baseline model that presents the least errors, to be compared with the performance of more elaborate models in order for its value to be assessed, since any model that performs worse than the baseline model should be discarded. On the contrary, if a model improves the forecast error of the baseline model, this is definitely a model that can be considered as a solution to the forecasting problem.
In this case, extensive tests were carried out to find the best baseline model. The naïve one-to-one persistence model, that is, the model that uses the observation of the immediately preceding time step of each univariate series to predict the observation of the next time step of that series, yielded the lowest error. It was therefore chosen as the baseline model; its error metric is shown in Figure 7.
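A one-to-one persistence baseline of this kind can be sketched as follows. The mean absolute error is used here purely as an assumed example metric, since the metric reported in Figure 7 is not stated in the text; the function names and the sample series are likewise hypothetical.

```python
def persistence_forecast(series):
    """One-to-one persistence: the forecast for step t is the observation at t-1.

    Returns (predictions, targets), i.e. the supervised-learning view of the
    series with inputs X = series[:-1] and outputs y = series[1:].
    """
    inputs = series[:-1]   # X: observations at step t-1
    targets = series[1:]   # y: observations at step t
    predictions = [list(row) for row in inputs]
    return predictions, targets

def mean_absolute_error(predictions, targets):
    """Average absolute error over all steps and all sub-areas."""
    errors = [abs(p - y)
              for p_row, y_row in zip(predictions, targets)
              for p, y in zip(p_row, y_row)]
    return sum(errors) / len(errors)

# Hypothetical 4-step series over 2 sub-areas, for illustration only.
demo = [[0, 1], [0, 0], [2, 0], [2, 0]]
preds, targets = persistence_forecast(demo)
mae = mean_absolute_error(preds, targets)
```

Any candidate model evaluated on the same targets with the same metric can then be compared directly against this baseline value.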

Baseline Model
To start testing with the prediction algorithms that may be useful for the purpose this work, a forecast or baseline model must be established to determine whether the c sic models, machine learning, or deep learning are useful and contribute to the achie ment of a truly reliable future forecast. In the case of multivariate time series, it is be to establish a naïve reference model, naïve implying that such a forecast will provide servations directly, without any processing; this is also called a persistence forecast d to the observations' persistence. There are also other possibilities for establishing th baseline models, such as, for example, making averages of some previous observati and using them as a forecast, for which the time series dataset must be transformed i one suitable for supervised learning problems, that is, in the form of inputs (X) and o puts (y). However, the main concern is to find and choose the baseline model that prese the least errors, to be compared with the performance of more elaborate models in or for its value to be assessed, since any model that performs worse than the baseline mo should be discarded. On the contrary, if a model improves the forecast error of the ba line model, this is definitely a model that can be considered as a solution to the forecast problem.
In this case, tests were carried out, with ample sufficiency, to find the best basel model; it was found that the naïve model of one (1) to one (1) persistence, that is, the mo that used the observations of the step of time just before each of the univariate series, w the best for predicting the observations of the next time step of each of the univariate ries. This was the model with the lowest error result, because of this, this was chosen the baseline model and its error metric was as follows, General information about the results of the models and the tests conducted: • Time series are a particular case of stochastic processes; therefore, their models also usually stochastic. This means that they present a certain randomness in th parameters, which is why, when training a model, the values of the error metrics m General information about the results of the models and the testsconducted: • Time series are a particular case of stochastic processes; therefore, their models are also usually stochastic. This means that they present a certain randomness in their parameters, which is why, when training a model, the values of the error metrics may vary. The implication here is that, when measuring the performance of one of these models, several tests are carried out and the error metrics are averaged. That is, the average is considered the performance value of the model. As a consequence, the values of the error metrics of the models shown in the following sections correspond to the average values of the operation of each model.

• Once the requirements of the system and those of the predictive models and their forecasts are clarified, the description of the models offered below is based on the methodology of taking all those that meet the initial requirements, according to the nature of the data, and gradually discarding those that do not exceed some thresholds based on the performance criteria. In other words, all possible models are tested, and those whose operation does not meet the needs of the data and of the system are filtered out, in order to continue working on adjusting only those that provide the expected functionalities.
• For each model, the walk-forward validation technique was used; in this way, not only was the error metric calculated, but the time spent by each model in retraining and generating new forecasts was also verified.
• A model can be considered useful if its two-time-step forecast error metric is less than the reference model's error metric.
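The one-to-one persistence baseline and the walk-forward validation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the small example series are hypothetical, and RMSE is assumed as the error metric because it is the metric reported later for Random Forest.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over all sub-areas and time steps."""
    errors = [(a - p) ** 2
              for row_a, row_p in zip(actual, predicted)
              for a, p in zip(row_a, row_p)]
    return math.sqrt(sum(errors) / len(errors))

def persistence_forecast(history):
    """Naive one-to-one forecast: repeat the last observed row."""
    return list(history[-1])

def walk_forward(series, n_test, forecast_fn):
    """Forecast each test step from the data before it, then reveal the true row."""
    history = [list(row) for row in series[:-n_test]]
    predictions = []
    for true_row in series[-n_test:]:
        predictions.append(forecast_fn(history))
        history.append(list(true_row))  # the real observation becomes available
    return rmse(series[-n_test:], predictions)

# Hypothetical sparse densities: rows are 10-min steps, columns are sub-areas.
series = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 2],
    [0, 0, 2],
]
baseline_rmse = walk_forward(series, n_test=3, forecast_fn=persistence_forecast)
```

Any candidate model can be passed as `forecast_fn` in place of `persistence_forecast`, so that its error is measured under the same walk-forward procedure and compared directly with the baseline.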

Classical Models for Forecasting Multivariate Time Series
Classical models for forecasting time series are those developed specifically for this purpose, namely the SARIMAX family. Since, in this case, the data are multivariate time series, the classical models to apply are the Vector-SARIMAX or V-SARIMAX models. Given that the resulting multivariate time series for this case is stationary and non-seasonal, and that all its variables are endogenous, it is possible to omit the integration "I", seasonality "S", and exogenous variable "X" components of the V-SARIMAX models. The classical methods tested in this case are therefore the VARMA models and their possible combinations (VAR, VMA, and VARMA).
However, it should be noted that we are dealing here with sparse-type data, which implies that, if a sub-area within the observation area is chosen where all the values of the densities of criminal activity are zero, throughout the analysis time range, none of these models will be functional. In other words, VARMA and its derivatives have the restriction that they work only if none of the univariate time series that make up the multivariate time series are completely zero.
VARMA (vector autoregressive moving average) and VMA (vector moving average) models: when performing simple initial tests on these two models (of order 1), the VARMA and VMA models were discarded since, when they were trained and forecasts were requested, their convergence took a long time and their error metric did not improve on that of the reference model. It should be recalled that the convergence time, covering both the training and the generation of two-time-step forecasts, is required to be a maximum of 10 min so that the system can work properly in the real-time systems of the PONAL, in this case.
Classical Model for Multivariate Time Series, SPARSE type: the classical model for sparse-type multivariate time series, VAR-SPARSE (vector autoregressive sparse), which is only available for the R programming language through its libraries "bigtime" [68] and "sparsevar" [69], was also tested. However, it was not functional because, even for a small multivariate time series, it took hours to converge. In addition, its consumption of RAM (random access memory) was too high and was not justified by the performance of the model.
VAR (vector autoregressive) model: on the other hand, the VAR model generated a result that improved on the error metric of the reference model, and the time for retraining and generating new forecasts was much shorter than 10 min. The result of the VAR model is shown in Figure 8.
The results were achieved using raw input data (without any type of transformation), without the support of the GPU (graphics processing unit and its powerful parallelization capabilities), and by allowing the model to automatically select the order that it considered best for its operations.
Therefore, it can be concluded that this model represents a solution to the problem addressed here, as it works better than the reference model even in a forecast with a time horizon of two steps. It can also be retrained and can generate at least this two-step forecast in a multi-parallel mode in less time than the frequency of the multivariate time series (10 min). This allows for the generation of the predictive continuity proposed by the system that defines this tool; however, it must be remembered that this model is linear. Therefore, it will only work well with stationary time series, and it does not work when some sub-area or univariate time series within the observation area presents a density of cases equal to zero.
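To make the VAR idea and its zero-series restriction concrete, the following sketch fits a first-order VAR by ordinary least squares with NumPy and iterates it for a two-step forecast. It is only an illustration under stated assumptions: the order is fixed to 1 (the article lets the model select its order automatically), the helper names are hypothetical, the data are simulated counts, and the authors' actual implementation and library are not shown in the text.

```python
import numpy as np

def has_all_zero_series(data):
    """True if any sub-area (column) is zero over the whole window."""
    return bool(np.any(np.all(data == 0, axis=0)))

def fit_var1(data):
    """Least-squares fit of y_t = c + A @ y_{t-1}; returns (c, A)."""
    X = np.hstack([np.ones((len(data) - 1, 1)), data[:-1]])  # intercept + lag 1
    Y = data[1:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef[0], coef[1:].T

def forecast_var1(c, A, last_obs, steps=2):
    """Iterate the fitted recursion to obtain a multi-step forecast."""
    out, y = [], np.asarray(last_obs, dtype=float)
    for _ in range(steps):
        y = c + A @ y
        out.append(y)
    return np.array(out)

# Simulated sparse counts (assumption): one day of 10-min steps, 4 sub-areas.
rng = np.random.default_rng(0)
data = rng.poisson(0.3, size=(144, 4)).astype(float)

two_step = None
if not has_all_zero_series(data):  # restriction noted in the text
    c, A = fit_var1(data)
    two_step = forecast_var1(c, A, data[-1], steps=2)  # one row per forecast step
```

The guard on all-zero columns mirrors the restriction stated above: a sub-area with no events in the whole window would have to be excluded or handled by another model before fitting.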

Machine Learning (Including Deep Learning) Models for the Forecast of Multivariate Time Series
Random Forest Model: the Random Forest algorithm allows for multi-parallel forecasting for multivariate time series of the sparse type. This is possible for one (1) to one (1) models (taking one previous observation to forecast one observation in the future), and for models that take several previous observations (X) to forecast several steps into the future (y). Its convergence (training time and forecast generation) takes approximately two (2) minutes without the support of the GPU (graphics processing unit). Raw data can also be used, and its only additional requirement is the transformation of the time series dataset into one suitable for supervised learning problems, that is, in the form of inputs (X) and outputs (y).
However, although Random Forest fulfills all the requirements to address the forecasting needs of this work, its error metric never improved on that of the reference model, even after all the possible changes were made to its parameters and all the possible combinations of inputs and outputs for the forecast were assessed. Moreover, with certain combinations of parameters, the consumption of RAM was too high, without results to justify it. With Random Forest, the best error metric was obtained with the four (4) to three (3) model, that is, four inputs (X) and three outputs (y), and its value was RMSE = 0.498. For this reason, the model was discarded.
Deep Learning Models for Multivariate Time Series Forecasting: for time series forecasting, deep learning techniques promise several outstanding benefits, such as:
• The automatic learning of linear and non-linear relationships.
• Learning temporal structures present in the data, such as trends and seasonality.
• The handling of long sequences and noisy data.
• The multi-parallel forecasting of several input and output steps without making assumptions about mapping functions.
• Operating with datasets with missing and sparse values, among others.
• Finally, although a stationary time series represents an advantage, it is not a mandatory requirement for their use.
This can be tested as a solution in the case that concerns us here, where the deep learning algorithms that specialize in handling sequences, and therefore time series, are CNN-1D (Convolutional Neural Network-1D), MLP (multilayer perceptron), and LSTM (long short-term memory), the latter with its different variants and combinations of variants: Vanilla-LSTM, Stacked-LSTM, Bidirectional-LSTM, CNN-LSTM, and ConvLSTM.
To carry out tests with these algorithms, the data were left raw (without any type of transformation); it was only necessary to transform the time series dataset into one suitable for supervised learning problems, that is, in the form of inputs (X) and outputs (y). The performance results of the models shown here were obtained with approximately 11% GPU usage.
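The transformation into inputs (X) and outputs (y) mentioned above is commonly done with a sliding window. The sketch below is one possible way to do it, assuming six previous observations as input and two future steps as output; the function name and the toy series are hypothetical.

```python
def to_supervised(series, n_in, n_out):
    """Split a multivariate series into (X, y) windows.

    Each X holds n_in consecutive rows; each y holds the next n_out rows.
    """
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y

# Toy series with two sub-areas; rows are consecutive time steps.
series = [[t, 10 * t] for t in range(10)]
X, y = to_supervised(series, n_in=6, n_out=2)
```

For an MLP, each window in `X` would typically be flattened into a single vector of length `n_in` times the number of sub-areas, whereas convolutional and recurrent models can take the window as a two-dimensional array of time steps by sub-areas.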

1. MLP (multilayer perceptron): this is a simple neural network model that offers an excellent solution for this prediction problem. Figure 9 shows the configuration, the network diagram, and the results of this neural network.
• One input layer of X = 6 previous observations times the number of subareas.
Therefore, it can be concluded that this neural network model is suitable as a solution for the system proposed here, given its simplicity, rapid convergence, efficiency, and error metric, which improves on the reference model even for a two-step forecast.

2. CNN-1D (Convolutional Neural Network-1D): in general, convolutional neural networks, whether 1D, 2D, or 3D, are designed to preserve spatial structures in raw input data; this is called representation learning. CNNs manage to extract the characteristics of the data regardless of how they are produced, since they remain invariant to the position of objects and the distortion of scenes. The CNN-1D, which retains these beneficial features, is well suited to time series forecasting, since time series are sequences of observations that can be treated as one-dimensional images from which the model can extract the main elements, mapping a sequence of earlier observations from the raw data as the input to one or more future observations as the output. Figure 10 shows the configuration, the network diagram, and the results of the CNN-1D model that offered the best solution for this prediction problem.
• Multi-channel with a first input hidden convolutional layer of X = 6 previous observations times the number of subareas.
Therefore, it can be concluded that this neural network model can be considered adequate as a solution for the system proposed here, given its 1D convolutional features and its error metric, which improves on the reference model even for a two-step forecast.

3. LSTM (Long Short-Term Memory): by their nature, LSTM neural networks read the sequence one time step at a time and build an internal state representation that is used as learned context when making forecasts. In other words, LSTM neural networks offer native support for sequences such as time series. The LSTM models that offer viable solutions for this prediction problem, according to their convergence (training time and forecast generation), the possibility of forecasting at least two steps, and a performance that improves on that of the reference model, with data in Float 32, are:
Univector output models: the ConvLSTM model. Figure 11 shows the diagrams of these neural network models and their results.
Encoder-Decoder type models for multivector output: Figure 12 shows the diagrams of these neural network models and their results.
These models are slightly more complex and, although their convergence time lies within the established limits, it is a little longer. However, it can be concluded that taking these models into account can be very useful, since, depending on the data, one model or another could fit better. Figure 13 summarizes the results of the LSTM models.

Forecast Geo-Visualization
A geo-visualizer shows a data forecast on the geographic information system (GIS) of the real-time systems of the PONAL, or any other GIS, in terms of the coordinates of latitude and longitude. However, to show the detail of the system, Figure 15 presents a geo-visualization of the comparison between the real data and the predicted data, scaled in parts of the grid since this parameter is also adjustable.

Figure 15. A random sample of the comparison between the geo-visualization of the real data and the predicted data (capture).

Results and Discussion
It transpires that the concept proposed here, of using the spatiotemporal predictive geo-visualization of criminal activity for real-time systems, is a highly useful tool; it can be used in any geographical location and is applicable to any kind of activity if the data are adapted to the format shown (e.g., terrorist activity and natural disaster activity). This concept allows commanders, analysts, or other relevant users to constantly geo-visualize, in space and time, the forecasts of the criminal influx in areas of interest, based on the desired dates and time ranges, as well as according to individual event codes or groups of such codes. These achievements were made possible by the methodology established in this paper, whereby generating multi-step forecasts (in this case, two-step forecasts) allows the system to update itself by overwriting the last forecast step, creating a forecasting environment that gives the user the perception of forecasting in real time. In addition, this forecast is made more reliable by the use of the largest amount of real data available and by the flexibility given to users to define their parameters. This also provides the possibility of analyzing the trends and the displacement of the activity considered.
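One way to read the update-by-overwriting mechanism described above is that each cycle stores forecasts for the next two steps, and the forecast made one cycle later replaces the previously stored value for the overlapping step. The sketch below illustrates this reading only; the store layout, the function name, and the numeric values are hypothetical and are not taken from the system described in the article.

```python
def update_forecast_store(store, current_step, forecasts):
    """Write forecasts for steps current_step+1 onwards, replacing older values."""
    for offset, values in enumerate(forecasts, start=1):
        store[current_step + offset] = values
    return store

store = {}
# Cycle at step 0: two-step forecast covers steps 1 and 2.
update_forecast_store(store, current_step=0, forecasts=[[0.1, 0.0], [0.2, 0.0]])
# Cycle at step 1: new forecast covers steps 2 and 3; step 2 is overwritten.
update_forecast_store(store, current_step=1, forecasts=[[0.3, 0.1], [0.4, 0.1]])
```

Under this reading, the map always holds a forecast for the upcoming steps, and the value displayed for a given step is the one produced from the most recent data.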
Artificial intelligence techniques were used to make these forecasts; this study is specifically framed within the field of deep learning and, in this work, all the possible techniques for forecasting sparse-type multivariate time series with multi-parallel forecasting were compiled for the solution of this problem. However, it must be noted that the research on the modeling and forecasting of multivariate time series remains an open field and there is still a long way to go. Therefore, the objective of this research was not to focus on these predictive techniques but to concentrate on the operating methodology that is required for spatiotemporal predictive real-time systems geo-visualization tools and to search for those algorithms, techniques, and models that allow for reliable forecasting under the operating parameters discussed in this paper.
The results of this work show that it is possible to develop a spatiotemporal predictive geo-visualization tool for criminal and terrorist activities that is aligned with the mission and strategic objectives of an entity in charge of ensuring security; up to now, this was not feasible in the context of the PONAL. However, there are technical challenges to its implementation that cannot go undiscussed.
In the context of the proposed work, it is necessary to make forecasts by areas with certain extensions in accordance with the tactical and strategic operation of the PONAL. However, in a larger-scale deployment, we must consider the operation of forecasting algorithms for sparse-type multivariate time series, which, according to their nature, will work better or worse depending on the size of the series. There must also be other developments and integrations with other sources of geo-information, such as the GIS used by other agencies that would use the tool. In addition, another challenge relates to the computational capacity required for the map, database, and predictive model processing that this tool entails. To overcome these challenges in the context of the PONAL, we may benefit from approaches such as cloud-type solutions or solutions with centralized processing in a local data processing centre (CPD) with parallel processing capacity, supported by GPUs compatible with CUDA® from NVIDIA®. Additionally, CPUs (central processing units) with adequate power must be considered because different algorithms will be executed, including some deep learning algorithms.
The work carried out here is a clear example of the application of the multi-step and multi-parallel forecasting of sparse-type multivariate time series, for real-time systems, which opens up the possibility of further study in this field of research. In Colombia, this type of work and its results demonstrate the viability and potential impact of creating tools like this one to improve and increase the functionalities of the real-time systems of the PONAL, improving the opportunities to allocate public resources for their implementation.

Conclusions
The development of new, better, and broader tools that can be applied to real-time systems to facilitate, expand, and improve the work of the law enforcement agencies in charge of crime control, terrorism, rescue, and natural disaster control contributes significantly to the construction of safe cities and, therefore, smart cities. This research goal is within the bounds of the international commitments of the Sustainable Development Goals (SDGs) of the United Nations [14]. This is the case of the tool presented in this article, which is both useful for the established purpose and novel, because no other tool provides these same benefits together. Moreover, it is also applicable to, for example, the command and control centers of any entity dedicated to crime control, terrorism, rescue, and natural disaster control, because it can be adapted to any geographic information system (GIS). It must be noted that these types of entities must have support systems that allow them to react in real time and even, on occasion, in critical real time, in order to control said activities.
Such tools are made possible by the methodology used to create this spatiotemporal predictive geo-visualization tool. With multiple inputs, and using geospatial and temporal groupings and a multi-parallel forecasting method in short and continuous time horizons, this tool generates a real-time forecasting environment. The reliable model retraining and the generation of new forecasts make use of the vast amount of real data that are continuously generated. At the same time, the design allows the models to converge very quickly and reduces the resources needed. In addition, the tool is flexible, as each parameter of the model can be determined according to the user's needs. The tool was created as open-source software (OSS) in the Python programming language.
According to the PONAL analysis, this type of tool contributes to the operational capabilities of an institution such as the Colombian police because it fits within the Command and Control Centers for Public Safety (C2S) [70] of the PONAL, specifically within Command and Control Information Systems (C2IS). The operational area of the C2S is responsible for processing and transmitting the necessary information to the police commanders in the strategic area. It is also responsible for interpreting that information and creating the strategic objectives to be acted upon when it corresponds. It then relays these objectives to tactical area commanders who must implement them, mostly in real time. This cycle is repeated once information about the follow-up of the actions is obtained. These actions may be, for example, the distribution and management of physical and human resources in the fight against crime.
Therefore, if the C2S is appropriate for the command process it supports (as in the case of the PONAL), it significantly improves strategic aspects such as situational awareness, future projection of the situation (situation understanding), decision-making support, and agility in the fulfilment of police missions, facilitating crime deterrence, prevention, and control.
Because this work aims to demonstrate the logic and the complete and detailed procedure used to create this tool, which has a range of characteristics that make it versatile and novel, this article provides sufficient information for the reader to understand the method, to create this tool, and to recreate it for any of its potential uses, either fully or partially, as desired. In addition, this work shows examples of the results that can be obtained from the tool.
Finally, it is concluded that, if this tool produced favourable results despite using sparse data, it could be even more effective in circumstances where the data were less sparse. This study can serve as a basis for future work, such as studies that handle different types of data or even larger datasets, where further processing is required to obtain reliable forecasts under the conditions proposed here, namely, for real-time systems. Other future work would concern large-scale deployments and research on new forecasting techniques for multivariate time series with multi-parallel forecasting, particularly if high rates of sparsity are presented, since this is a research field that still has a long way to go, whether we are talking about classical techniques or machine learning (including deep learning).