Abstract
Constructing reusable accident-text corpora is hindered by anonymization, heterogeneous sources, and sparse labels, all of which complicate cross-document event linking. We propose a spatiotemporal lattice-constrained approach that encodes administrative hierarchies and temporal granularity, defines domain-informed consistency criteria, instantiates spatial and temporal relations via a subset of RCC-8 and Allen’s interval algebra, estimates anchor weights via smoothing with monotonic projection, and fuses signals using a constrained monotonic network with explicit probability calibration. An active-learning decision rule, which combines maximum probability with a probability-gap criterion, supports scalable automatic labeling, and controlled augmentation leverages instruction-tuned LLMs under lattice constraints. Experiments show competitive ranking (Hit@1 = 41.51%, Hit@5 = 77.33%) and discrimination (ROC-AUC = 87.34%), with the best F1 (62.46%). The method yields the lowest calibration errors (Brier = 0.14; ECE = 1.97%), maintains performance across sources, and exhibits the smallest F1 fluctuation across thresholds (Δ = 1.7%). In deployment-oriented analyses, it auto-labels 77.7% of cases with 97.51% accuracy among these high-confidence outputs and routes the remaining 22.3% to review, where the true-positive rate is 81.46%. These findings indicate that integrating structured constraints with calibrated probabilistic fusion enables accurate, auditable, and scalable event linking for accident-corpus construction.
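To make the decision rule concrete, the following is a minimal sketch of how a maximum-probability test can be combined with a probability-gap test to split cases between automatic labeling and review. The function name `route`, the thresholds `p_min` and `gap_min`, and their default values are illustrative assumptions, not the settings used in the experiments; the input is assumed to be a list of calibrated link probabilities, one per candidate event.

```python
from typing import Sequence


def route(probs: Sequence[float], p_min: float = 0.9, gap_min: float = 0.3) -> str:
    """Decide whether a case can be auto-labeled or should go to review.

    probs   : calibrated link probabilities, one per candidate event
    p_min   : hypothetical threshold on the top probability
    gap_min : hypothetical threshold on the gap between the top two candidates
    """
    if not probs:
        # No candidates at all: nothing to auto-label.
        return "review"

    ranked = sorted(probs, reverse=True)
    top = ranked[0]
    # With a single candidate, treat the runner-up probability as zero.
    runner_up = ranked[1] if len(ranked) > 1 else 0.0

    # Auto-label only when the best candidate is both confident
    # and clearly separated from the next-best candidate.
    if top >= p_min and (top - runner_up) >= gap_min:
        return "auto"
    return "review"
```

Under these assumed thresholds, a case with probabilities `[0.95, 0.20, 0.10]` is auto-labeled, while `[0.95, 0.90]` is routed to review because the two leading candidates are too close to separate. Requiring both conditions trades coverage for accuracy among auto-labeled cases.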