Multi-Scale Geo-Temporal Crime Embedding (MSG-TCE): A Hierarchical Spatiotemporal Framework for Crime Prediction with Hyperbolic Spatial Pooling and Periodic Transformers
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This study addresses the spatiotemporal prediction of urban crime, applying AI-related technical methodologies to crime prediction, which possesses practical value and social significance to some extent. The authors employ the MSG-TCE framework, integrating the Hierarchical Residual Temporal Encoder (HRTE), the Periodic Transformer Encoder (PTE), and the Hyperbolic Spatial Pooler (HSP), to conduct prediction experiments on crime data from three major cities (Chicago, Los Angeles, and New York City). The paper proposes a unified embedding framework tailored for crime prediction tasks, which offers certain heuristic value for spatiotemporal modeling research in the GeoAI domain. Overall, the paper has clear objectives and is relatively well written.
However, I believe the paper may still have the following issues, which I raise for the authors' consideration:
- The paper is overly focused on the introduction of technical methods while neglecting readability and practical applicability. I feel that some key points should not be glossed over. For instance, the authors state that "but these methods often fail to capture the hierarchical nature of criminal behavior patterns," "Moreover, the spatial distribution of crimes follows non-Euclidean patterns, with hotspots forming hierarchical structures that reflect urban geography and social networks—a characteristic that standard spatial encoding methods cannot adequately represent," and "These methods capture local spatial dependencies but ignore the hierarchical structure of urban environments." These claims actually require further elaboration, including but not limited to adding illustrative examples and supplementary references. In particular, the issue of hierarchical structures in crime deserves more explanation to help readers understand the underlying problems. Furthermore, as a study targeting the field of geo-information science, the paper does not provide any spatial distribution maps of predicted crime risk (such as thematic maps comparing predicted values against actual values, or raster maps of hotspot risk levels). The paper only reports that MSG-TCE demonstrates superior prediction results. Could the authors present the outputs or comparative analyses on actual maps of the three cities (e.g., Chicago) to further highlight the method's advantages? For example, supplementing spatial visualization maps that compare predictions against ground truth would be highly persuasive.
- From the perspective of environmental criminology, there are many theoretically and empirically validated associations between crime and covariates such as socioeconomic conditions, demographic structure, and the built environment. Yet the authors conduct crime prediction based solely on historical crime count data. Have the authors considered introducing these covariates? Whether incorporating such covariates would improve prediction accuracy is a meaningful question worth exploring. The absence of these variables directly limits the upper bound of the model's predictive capability, and the model essentially becomes merely an extrapolation of historical data.
- Some details require careful attention. For example, Table 1 is poorly designed, making the data comparison unclear. For the RMSE/P@20/DTW metrics, could the authors present them in a clearer tabular format (e.g. sub-tables with explicit notation)? Additionally, the authors claim that Figure 2 shows "MSG-TCE shows more uniform performance across different areas than baselines, particularly improving predictions in peripheral regions with complex spatial hierarchies." However, this conclusion is difficult to discern from Figure 2. Could the authors add annotations and provide more in-depth explanation?
Author Response
Please see the attached response to Reviewer 1.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The topic of the manuscript is very interesting and addresses topical issues facing modern cities. I have a few comments and suggestions regarding the manuscript that will help the authors improve the text and thereby increase its impact among the academic community.
The introduction explains the basic motivation, but the literature review appears selective and is not always convincingly linked to the current state of research.
As I have mentioned above, there is a lack of a more robust grounding in key literature on predictive policing, for example, the CPTED concept, where urban design and its elements directly influence crime levels (I recommend looking at CPTED in Matlovicova, K., Mocak, P., Kolesarova, J. 2016. Environment of estates and crime prevention through urban environment formation and modification. Geographica Pannonica 20(3), pp. 168–180, which also describes the application of the concept in practice), as well as GeoAI, spatial statistics (see also Michaek, A. 2022. An Aetiology of Crime in the Suburbs: The Case Study of Bratislava. Folia Geographica 64/1, pp. 90-111, where the correlations and links between selected types of crime and certain monitored indicators are clearly demonstrated, as are the spatial differences between the center and the suburb, showing that the suburban areas of the capital city of Bratislava are safer than the capital itself), and the ethical risks of algorithmic crime. Furthermore, some of the citations do not appear sufficiently convincing as support for the strong methodological claims found in the manuscript.
A clearer definition of the research gap is lacking.
The research design is ambitiously stated, but it appears more like a conceptual proposal than a fully validated empirical study. It is not sufficiently clear how the data, spatial units, and experimental setup were controlled.
The methods are described mathematically, but several implementation details remain unclear. In particular, data preprocessing, graph construction, and hyperparameter tuning require a more precise description.
The results are presented numerically, but without sufficient interpretation, statistical testing, and critical validation. The graphs appear more illustrative than as convincing empirical evidence.
I think the conclusions are stronger than the evidence presented. Claims of significant progress and practical applicability are not sufficiently supported by convincing validation.
Despite the minor shortcomings mentioned, which can be addressed, I consider the submitted study to be very interesting and valuable. I believe that its publication will contribute to the academic discussion on this topic and generate a positive response from readers.
Following revisions, I recommend publishing the manuscript.
Comments on the Quality of English Language
The English is generally understandable, but the text contains several awkward turns of phrase, overly generic academic phrases, and, in places, imprecise statements regarding methodology and results. The language is not the manuscript’s main problem, but editing it would enhance the clarity and coherence of the argument.
Author Response
Please see the attached response to Reviewer 2.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper proposes MSG-TCE (Multi-Scale Geo-Temporal Crime Embedding), a deep learning framework designed for spatiotemporal crime prediction. The architecture introduces three main components: a Hierarchical Residual Temporal Encoder (HRTE) for multi-scale temporal trends, a Periodic Transformer Encoder (PTE) for cyclical patterns, and a Hyperbolic Spatial Pooler (HSP) to capture spatial hierarchical structures using graph convolutions in hyperbolic space. These representations are fused via a gated cross-attention mechanism. The authors report that MSG-TCE outperforms several baselines on real-world datasets.
My primary concern is that the paper frames well-established criminological concepts as novel machine learning discoveries without contributing new domain insights. Phenomena such as temporal periodicity, spatial multi-scale structures, and spatiotemporal interactions have been exhaustively theorized and empirically validated for decades within environmental criminology through frameworks like Routine Activity Theory and Crime Pattern Theory. The authors need to explicitly clarify what new underlying mechanisms MSG-TCE uncovers that traditional spatial statistics or shallow models cannot. If the framework merely maps known empirical regularities into complex neural network blocks without advancing our understanding of crime behavior, the contribution remains purely an engineering exercise rather than a scientific advancement.
The proposed architecture is exceptionally heavy, simultaneously coupling dilated convolutions, Transformer self-attention, hyperbolic mappings, graph convolutions, and cross-attention mechanisms. Given that real-world crime data is notoriously sparse, noisy, and highly stochastic, such an over-engineered parameter space raises immediate red flags regarding overfitting. The marginal gains in metrics reported in the evaluation section might simply be the result of a high-capacity model memorizing dataset-specific noise rather than learning robust, generalizable laws. The authors must justify this heavy architectural overhead by providing a strict computational efficiency analysis and demonstrating that the model maintains its performance under severe data sparsity.
The manuscript claims that the Hyperbolic Spatial Pooler "better represents" the inherent hierarchical structure of crime hotspots, yet it fails to explain what this hierarchy actually means in the physical world. In deep learning, due to the non-linear coupling of modules, it is impossible to know whether the model is truly learning spatial laws or just capitalizing on statistical biases. To break open this black box, the authors must provide rigorous explainable AI (XAI) evidence. Specifically, I would like to see visualizations of the embeddings in the Poincaré disk to prove whether the discovered "hierarchy" aligns with actual urban geography, such as policing sectors or land-use clusters, alongside attention weight visualizations demonstrating that the PTE is capturing meaningful temporal rhythms rather than random distributions.
Author Response
Please see the attached response to Reviewer 3.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I recommend accepting the manuscript for publication in its current form.
Reviewer 3 Report
Comments and Suggestions for Authors
No

