Estimating Urban Travel Intensity from Ambient Seismic Signals via a Hybrid CatBoost–LSTM Framework

Guo, Kai; Hou, Jianmin

doi:10.3390/app16073407

Open Access Article


by Kai Guo ¹,²,³,* and Jianmin Hou ¹

¹ China Earthquake Networks Center, Beijing 100045, China

² Computer Network Information Center, Chinese Academy of Sciences, Beijing 100090, China

³ University of Chinese Academy of Sciences, Beijing 100049, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(7), 3407; https://doi.org/10.3390/app16073407

Submission received: 14 February 2026 / Revised: 27 March 2026 / Accepted: 27 March 2026 / Published: 1 April 2026

(This article belongs to the Special Issue Machine Learning Applications in Seismology: 2nd Edition)


Abstract

Urban travel intensity is a practical proxy for human mobility, but direct mobility data are often costly, geographically restricted, and privacy sensitive. UTScan uses continuous ambient seismic data to estimate urban travel intensity in a passive, non-intrusive manner. Model development used 10 cities in Hubei Province during January–April 2020, and external validation used 84 non-Hubei cities that satisfied the study’s data-quality criteria. From each hourly power spectral density (PSD) curve, we extracted 13 features in the 2–20 Hz anthropogenic band, averaged them into daily features, applied a station-wise low-activity baseline subtraction, and then modeled daily travel intensity with a CatBoost–LSTM framework. Under the calendar-based forward-validation protocol, the final UTScan implementation (FusionB) achieved a mean RMSE of 0.537 ± 0.214 and a mean Pearson correlation of 0.768 ± 0.076 across the internal Hubei folds, and a mean RMSE of 0.789 ± 0.229 and a mean Pearson correlation of 0.605 ± 0.370 across the 84-city external validation set. Additional sensitivity analyses using alternative validation windows and light-touch outlier handling indicated that the main conclusions were stable, while single-station representativeness remained the principal limitation. Ambient seismic noise is therefore a useful passive proxy for estimating city-scale mobility dynamics, especially abrupt mobility disruptions, but its interpretation remains conditional on station siting, source mixture, and the proxy nature of the Baidu travel-intensity target.

Keywords:

urban travel intensity; ambient seismic noise; machine learning; CatBoost; LSTM; baseline subtraction; cross-city validation

1. Introduction

Sustainable Development Goal 13 (SDG 13) calls for timely, data-driven assessments of anthropogenic activity and its environmental consequences [1,2,3]. Urban travel is one of the most pervasive human activities, yet city-scale mobility estimates commonly rely on large proprietary datasets and can raise privacy concerns. Although platform-based mobility products such as those from Google and Baidu are useful for scientific analysis [4,5], they are not always publicly available, are not uniformly accessible across regions, and reflect the measurement assumptions of the underlying data provider. Continuous seismic observations provide a complementary pathway: they are passively collected, widely archived by earthquake monitoring networks, and do not directly identify individuals. In this context, seismic data offer a low-cost and privacy-preserving source for estimating aggregated mobility dynamics.

The traditional role of seismic monitoring networks is earthquake detection and Earth-structure research, but during periods without strong earthquakes much of the recorded high-frequency signal is generated by human activity near the surface, including pedestrians, road traffic, rail transport, and industrial operations [6,7,8,9]. Machine-learning approaches have also become common in seismic signal analysis, including source location, denoising, and traffic-related signal extraction [10,11,12,13,14,15]. Controlled-source and ambient-noise studies in urban settings further show that anthropogenic and structural signals can coexist in complex ways [16,17]. Recent urban seismology studies have nevertheless shown that high-frequency ambient noise can track changes in human activity and mobility during COVID-19 restrictions and other urban experiments [18,19,20,21,22,23,24,25]. Our contribution, therefore, is not to claim the first quantitative mobility-noise relation but to build and test a cross-city estimation framework under a more explicit quality-control and validation protocol.

At the same time, seismic amplitude is not uniquely controlled by urban travel. Site amplification, local geology, basin resonance, instrument installation environment, industrial sources, and transient non-mobility signals can all influence the observed power spectrum [9,26,27,28,29,30,31]. This physical ambiguity makes cross-city generalization challenging, especially when only one station represents a city. A successful multi-city model must therefore align station-dependent PSD distributions, isolate mobility-sensitive features as carefully as possible, and use a validation strategy that does not exploit temporal leakage.

The COVID-19 period in China provides a natural experiment for this purpose. Following the implementation of epidemic control measures in Hubei Province in January 2020, urban travel intensity dropped abruptly and then recovered progressively as restrictions changed [32,33]. The broader scientific attention drawn by COVID-related seismic quieting also underscored the potential of such observations as a passive complement to conventional mobility indicators [24,34]. These pronounced mobility swings supply a stringent test bed for evaluating whether seismic observations can recover meaningful travel-intensity variations.

In this study, we develop UTScan as a hybrid machine-learning framework for estimating urban travel intensity from ambient seismic noise. This study extends our earlier seismic-data-driven urban travel evaluation model [35] by preserving the original CatBoost-plus-LSTM route while evaluating it under a stricter protocol designed for temporally ordered model development and external cross-city testing. Station-wise baseline subtraction is used to align PSD features from different stations, model development is confined to 10 Hubei cities, and cross-city transfer is tested on 84 non-Hubei cities that remained usable after quality control. Figure 1 shows the travel-intensity trajectories for representative Hubei cities during the study period. The main contributions of this study are as follows:

1. We formalize station-wise baseline subtraction as a core preprocessing step for aligning PSD features from different seismic stations into a comparable feature space.

2. We use an explicit calendar-based forward-validation design for model development and evaluate cross-city transfer on 84 non-Hubei cities that satisfied the data-quality criteria.

3. We compare CatBoost-static, LSTM-only, FusionA, and FusionB under a unified protocol and identify FusionB as the most stable hybrid implementation for the final UTScan framework.

2. Materials and Methods

This study assembled continuous records from approximately 200 candidate broadband stations archived by the China National Earthquake Data Center and paired them with the Baidu urban travel-intensity index for January–April 2020. We excluded cities without a suitable station within 50 km of the mapped city center, as well as station–city pairs with unstable data continuity or clearly abnormal PSD behavior, and we further required usable Baidu labels and a valid station-specific baseline vector. After these criteria were applied, 94 city-level records remained: 10 Hubei cities for model development and 84 non-Hubei cities for external validation.

Data Splitting and Validation Strategy. Model development within the 10 Hubei cities used four calendar-based forward folds with 14-day validation blocks. Fold 1 trained on 1 January 2020–11 February 2020 (42 days) and validated on 12 February 2020–25 February 2020 (14 days); Fold 2 trained on 1 January 2020–25 February 2020 (56 days) and validated on 26 February 2020–10 March 2020 (14 days); Fold 3 trained on 1 January 2020–10 March 2020 (70 days) and validated on 11 March 2020–24 March 2020 (14 days); and Fold 4 trained on 1 January 2020–24 March 2020 (84 days) and validated on 25 March 2020–7 April 2020 (14 days). After model selection, the chosen configuration was refit on the full 1 January 2020–30 April 2020 Hubei interval and evaluated once on the 84 non-Hubei external cities. Because validation-block length can influence internal metrics, we additionally examined 7-day and 21-day alternatives for the final FusionB model as a sensitivity analysis.
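As a consistency check on the fold layout above, the expanding training windows and 14-day validation blocks can be generated programmatically. The following is a minimal standard-library sketch; the helper names `calendar_folds` and `n_days` are hypothetical and are not taken from the study's code. Because 2020 is a leap year, the second validation block runs from 26 February to 10 March.

```python
from datetime import date, timedelta


def calendar_folds(start, first_val_start, n_folds, block_days):
    """Return expanding-window ((train_first, train_last), (val_first, val_last)) pairs.

    Each fold trains on every day from `start` up to the day before its
    validation block; the next block starts right after the previous one ends.
    """
    folds = []
    val_start = first_val_start
    for _ in range(n_folds):
        train = (start, val_start - timedelta(days=1))
        val_end = val_start + timedelta(days=block_days - 1)
        folds.append((train, (val_start, val_end)))
        val_start = val_end + timedelta(days=1)
    return folds


def n_days(span):
    """Inclusive number of days in a (first, last) date pair."""
    return (span[1] - span[0]).days + 1


folds = calendar_folds(date(2020, 1, 1), date(2020, 2, 12), n_folds=4, block_days=14)
for i, (train, val) in enumerate(folds, start=1):
    print(f"Fold {i}: train {train[0]}..{train[1]} ({n_days(train)} d), "
          f"validate {val[0]}..{val[1]} ({n_days(val)} d)")
```

Changing `block_days` yields analogous layouts for the 7-day and 21-day sensitivity checks, but the exact fold boundaries used for those windows are not stated in the text, so any such layout is an assumption.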

Urban travel activities generate seismic signals with distinctive frequency characteristics [9,18,31]. We therefore used PSD-derived frequency-domain features instead of raw waveforms. The archived seismic-noise products provide hourly PSD curves for each mapped station. From each hourly PSD, we extracted the values nearest to 13 target periods of 0.050, 0.058, 0.068, 0.079, 0.100, 0.110, 0.117, 0.126, 0.155, 0.200, 0.230, 0.317, and 0.490 s (equivalent to 20.00, 17.24, 14.71, 12.66, 10.00, 9.09, 8.55, 7.94, 6.45, 5.00, 4.35, 3.15, and 2.04 Hz), and then averaged the available hourly values within each day to form the daily feature vector.

PSD amplitudes recorded by different stations differ markedly because of local geology, site response, installation environment, and instrument background. To reduce these station-dependent offsets, the daily feature vectors of each station were normalized by subtracting a precomputed 13-band low-activity baseline vector before model training. The baseline was computed separately for each station from an independently defined low-activity interval and then applied as a fixed station-wise correction; it was not recalculated inside each validation fold. To assess the sensitivity of the final results to this choice, we repeated the final external evaluation of FusionB with an alternative baseline defined as the 5th percentile of the available PSD values at each target period for each station. This comparison served only as a sensitivity check and did not affect model selection or the main protocol.

We intentionally adopted a light-touch preprocessing strategy because overly aggressive filtering could remove mobility-relevant variability along with unwanted transients. Robustness checks using alternative daily aggregation and simple spike screening did not materially change the main conclusions. The effect of station-wise baseline subtraction on PSD distributions and city-wise correlations is illustrated in Figure 2.
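The feature-extraction and normalization steps described above can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that every hourly PSD is sampled on a common period grid and expressed in decibels, so that baseline removal is a subtraction, and all function names are hypothetical. The `p5_baseline` helper mirrors the percentile-based sensitivity baseline. Whether that percentile was taken over hourly or daily values is not stated, so the helper simply operates on whatever stacked values it receives.

```python
import numpy as np

# Target periods (s) listed in the text; frequencies are their reciprocals.
TARGET_PERIODS = np.array([0.050, 0.058, 0.068, 0.079, 0.100, 0.110, 0.117,
                           0.126, 0.155, 0.200, 0.230, 0.317, 0.490])
TARGET_FREQS_HZ = 1.0 / TARGET_PERIODS


def nearest_period_values(psd_periods, psd_values):
    """Return the PSD value at the grid period nearest to each target period."""
    psd_periods = np.asarray(psd_periods, dtype=float)
    psd_values = np.asarray(psd_values, dtype=float)
    idx = np.abs(psd_periods[None, :] - TARGET_PERIODS[:, None]).argmin(axis=1)
    return psd_values[idx]


def daily_features(psd_periods, hourly_psds):
    """Average the 13 nearest-period values over the available hours of one day.

    Assumption: every hourly PSD shares the same period grid `psd_periods`.
    Missing hours may be passed as all-NaN rows; nanmean ignores them.
    """
    rows = np.vstack([nearest_period_values(psd_periods, p) for p in hourly_psds])
    return np.nanmean(rows, axis=0)


def subtract_baseline(daily_matrix, baseline):
    """Station-wise correction: subtract one fixed 13-band baseline vector
    from every day (row) of that station's feature matrix.

    Assumption: PSD values are in dB, so the offset is removed by subtraction.
    The baseline is precomputed and not re-estimated inside validation folds.
    """
    daily_matrix = np.asarray(daily_matrix, dtype=float)
    return daily_matrix - np.asarray(baseline, dtype=float)[None, :]


def p5_baseline(values):
    """Sensitivity-check baseline: 5th percentile of each band's values."""
    return np.nanpercentile(np.asarray(values, dtype=float), 5, axis=0)
```

Keeping the baseline fixed per station, rather than refitting it per fold, matches the protocol described above and prevents validation-period data from leaking into the normalization.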

Tree-Based and Sequence Models. Among gradient-boosting decision-tree methods [36,37,38], we retained CatBoost [38] as the tree-based backbone of the framework. CatBoost-static uses same-day baseline-subtracted PSD features to predict same-day travel intensity. LSTM-only uses seven-day sequences of daily seismic features to model temporal dependence directly. FusionA and FusionB both start from CatBoost preliminary predictions and then use LSTM modules to absorb temporal information, but they do so differently. FusionA is a residual-style route that combines historical preliminary predictions with historical travel values during training and predicts a correction term. FusionB is the deployment-consistent route that consumes only historical preliminary predictions and directly outputs the final travel-intensity estimate. The protocol is summarized in Table 1 and Figure 3.
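The practical difference between the two hybrid routes lies in how the stage-two sequence input is built. The sketch below assumes NumPy, a hypothetical 7-day window for both hybrids (the text specifies seven-day sequences only for LSTM-only), and hypothetical function names. Arrays are shaped (samples, window, channels), which matches the layout `torch.nn.LSTM` expects when constructed with `batch_first=True`. The LSTM itself and FusionA's correction-term target are omitted.

```python
import numpy as np


def sliding_windows(series, window):
    """Row i holds series[i : i + window], i.e. the window ending at day i + window - 1."""
    series = np.asarray(series, dtype=float)
    return np.stack([series[i:i + window] for i in range(len(series) - window + 1)])


def fusionb_inputs(prelim, window=7):
    """FusionB-style stage-two input: windows of CatBoost preliminary predictions
    only, constructed identically for training and for deployment."""
    return sliding_windows(prelim, window)[..., None]  # (samples, window, 1)


def fusiona_inputs(prelim, travel_history, window=7):
    """FusionA-style stage-two input for target day t: preliminary predictions
    for days t-window+1..t plus travel values for days t-window..t-1.

    During training `travel_history` is the observed label series; during
    deployment it has to be filled with the model's own earlier outputs.
    """
    p = sliding_windows(prelim, window)[1:]           # windows ending at day t
    h = sliding_windows(travel_history, window)[:-1]  # windows ending at day t-1
    return np.stack([p, h], axis=-1)                  # (samples, window, 2)
```

For FusionB, one function produces both the training and the deployment inputs. For FusionA, the second channel changes its meaning between training (observed values) and rollout (predicted values), which is the train–deployment mismatch that separates the two variants.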

Hyperparameter tuning for the CatBoost and LSTM components was performed only within the Hubei development cities under the calendar-based folds. For CatBoost we searched learning rates of 0.05 and 0.10, tree depths of 4, 6, and 8, and L2 regularization values of 3 and 5 with 800 fixed iterations. For the sequence models, we searched hidden sizes of 16 and 32, one or two recurrent layers, batch sizes of 32 and 64, a learning rate of 0.001, and dropout fixed at 0.0. Early stopping was applied to control overfitting. Data preprocessing, model development, statistical analysis, and visualization were performed in Python 3.9, using PyTorch 1.13.1+cu116 as the main deep-learning library.
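The search space described above is small enough to enumerate exhaustively: 12 CatBoost configurations and 8 sequence-model configurations. In the sketch below, the key names mirror real parameter names (`learning_rate`, `depth`, `l2_leaf_reg`, and `iterations` in CatBoost; `hidden_size`, `num_layers`, and `dropout` in `torch.nn.LSTM`), but the dictionaries and helpers are hypothetical. The early-stopping patience is not reported in the text, so the value used here is an arbitrary placeholder.

```python
from itertools import product

# Grids transcribed from the text; dictionary layout is illustrative only.
CATBOOST_GRID = {"learning_rate": [0.05, 0.10], "depth": [4, 6, 8],
                 "l2_leaf_reg": [3, 5], "iterations": [800]}
SEQ_GRID = {"hidden_size": [16, 32], "num_layers": [1, 2],
            "batch_size": [32, 64], "lr": [0.001], "dropout": [0.0]}


def expand(grid):
    """Expand a {name: [values]} grid into a list of configuration dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs.

    The default patience is a hypothetical choice; the text does not state it.
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means stop training


print(len(expand(CATBOOST_GRID)), len(expand(SEQ_GRID)))  # prints: 12 8
```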

Table 1 summarizes the calendar-based protocol. All model selection occurs within the Hubei development cities under strictly forward-ordered validation windows; the chosen configuration is then refit on the full 1 January–30 April 2020 Hubei interval and evaluated only on the separate 84-city external validation set. A supplementary sensitivity check compares 7-day, 14-day, and 21-day validation windows for the final FusionB model.

We compared four model families under a common protocol: CatBoost-static, LSTM-only, FusionA, and FusionB. Performance was summarized using RMSE, MAE, MAPE, R², and Pearson correlation, with internal metrics averaged across the four calendar-based Hubei folds and external metrics averaged across the 84-city external validation set. The resulting comparison is reported in Table 2, and the final selected model definitions and configurations are summarized in Table 3.
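For reference, the five summary metrics can be computed per fold or per city and then reported as mean ± standard deviation. This NumPy sketch assumes that MAPE is expressed in percent and that the reported spread is the sample standard deviation (`ddof=1`); neither convention is stated explicitly in the text, and the function names are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%), R^2 and Pearson r for one fold or one city."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),  # assumes y_true != 0
        "R2": float(1.0 - ss_res / ss_tot),
        "Pearson": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }


def summarize(per_unit):
    """Mean and sample standard deviation of each metric across folds or cities."""
    keys = per_unit[0].keys()
    return {k: (float(np.mean([m[k] for m in per_unit])),
                float(np.std([m[k] for m in per_unit], ddof=1)))
            for k in keys}
```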

Table 2 shows that temporal modeling improves upon the static same-day baseline and also clarifies that hybrid performance depends critically on the stage-two formulation. FusionB remains stable because the stage-two LSTM receives the same type of input at training and deployment, whereas FusionA degrades under rollout because it is trained with historical observed travel values but deployed with predicted history.

We therefore retained FusionB as the final UTScan implementation. It best preserves the intended CatBoost-plus-LSTM route while providing the most stable overall balance between strict internal validation and external cross-city evaluation under the present protocol. FusionA is retained as a diagnostic ablation rather than the final model.

UTScan Model. The final UTScan definition corresponds to FusionB. A CatBoost model first produces day-wise preliminary predictions from the baseline-subtracted seismic features, and an LSTM then processes a sequence of these preliminary predictions to account for temporal dependence and refine the final daily estimate. This direct-fusion formulation avoids the train–deployment mismatch observed in FusionA.

Table 3 makes the stage-one input, stage-two input, prediction target, and selected configuration explicit for each model family so that the final UTScan formulation can be distinguished clearly from the diagnostic FusionA variant.

3. Results

This section summarizes the main quantitative results under the strict forward-validation protocol and the 84-city external validation set.

Assessing the Generalization Capability of UTScan.

The final UTScan implementation was developed and evaluated using 94 city-level records derived from approximately 200 candidate broadband stations. After excluding unstable or clearly abnormal PSD records and retaining only station–city pairs with a usable baseline vector, a usable Baidu label series, and a station located within 50 km of the mapped city center, 94 cities remained. These comprised 10 Hubei cities for model development and 84 non-Hubei cities reserved for external validation. Figure 4 and Figure 5 summarize the resulting external performance and show that the model reproduces major lockdown-related changes in many unseen cities, while also revealing substantial heterogeneity across cities.

Across the 84-city external validation set, FusionB achieved mean external metrics of RMSE 0.789 ± 0.229, MAE 0.630 ± 0.210, MAPE 16.374 ± 6.170, and Pearson correlation 0.605 ± 0.370. Relative to the stage-one CatBoost predictions, FusionB reduced RMSE in 76 of 84 cities, reduced MAE in 66 of 84 cities, and improved Pearson correlation in 72 of 84 cities. The external predictions also showed a modest negative mean bias (−0.147 on the Baidu intensity scale) and moderate variance compression (prediction/actual standard-deviation ratio = 0.856). Under the present protocol, FusionB provided the most stable overall balance between temporally ordered internal validation and external city-to-city transfer and was therefore retained as the final UTScan implementation. We report city-wise R² for completeness but use RMSE, MAE, and Pearson correlation as the main cross-city summary metrics because R² becomes unstable when the observed Baidu proxy has a narrow dynamic range or extended low-variance plateaus.
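The calibration diagnostics reported above (mean bias and the prediction/actual standard-deviation ratio) and the per-city improvement counts reduce to a few array operations. In this sketch the function names are hypothetical, and the text does not specify whether the bias was pooled over all city-days or averaged over cities, so the functions simply operate on whatever arrays they are given.

```python
import numpy as np


def calibration_summary(y_true, y_pred):
    """Return (mean bias, prediction/actual SD ratio).

    Bias is mean(prediction - observation), so a negative value means
    under-prediction; an SD ratio below 1 indicates variance compression.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    bias = float(np.mean(y_pred - y_true))
    sd_ratio = float(np.std(y_pred) / np.std(y_true))
    return bias, sd_ratio


def count_improvements(stage1_scores, final_scores, lower_is_better=True):
    """Count cities where the final model beats stage one on one per-city metric.

    Returns (number_improved, number_of_cities).
    """
    s1 = np.asarray(stage1_scores, dtype=float)
    fin = np.asarray(final_scores, dtype=float)
    better = fin < s1 if lower_is_better else fin > s1
    return int(better.sum()), int(better.size)
```

Applied to per-city RMSE or MAE arrays (lower is better) or per-city Pearson correlations (`lower_is_better=False`), `count_improvements` produces counts of the form reported above.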

External performance was heterogeneous rather than uniform. Sixteen of 84 cities had RMSE below 0.6, fifty had RMSE below 0.8, and fifty-eight had Pearson correlation above 0.6. Thirty-three cities showed negative city-wise R² values. These cases were concentrated in cities with weaker temporal variability in the Baidu proxy: after stratifying the external set by proxy standard deviation, negative R² occurred in 19 of 28 low-dynamic-range cities, 13 of 28 medium-range cities, and 1 of 28 high-range cities. We therefore interpret the external results as evidence of city-to-city transfer under the present data conditions, rather than as evidence of uniformly stable performance in every city.
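Two points in this paragraph can be made concrete: why R² turns negative for low-variance targets, and how cities can be split into terciles of proxy standard deviation. The sketch assumes NumPy. The tercile split via `np.array_split` on the sorted order (which gives 28 cities per group for 84 cities) and the function names are assumptions, not the study's exact procedure. In the demonstration, both series receive the same constant 0.2 offset, so their RMSE is identical, yet R² is strongly negative for the flat series and close to 1 for the variable one.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def negative_r2_by_tercile(proxy_sd, city_r2):
    """Count cities with negative R^2 in low/medium/high terciles of proxy SD."""
    order = np.argsort(proxy_sd, kind="stable")
    groups = np.array_split(order, 3)
    city_r2 = np.asarray(city_r2, dtype=float)
    return {name: int((city_r2[idx] < 0).sum())
            for name, idx in zip(("low", "medium", "high"), groups)}


# Same constant 0.2 offset (hence the same RMSE) on a flat and a variable series.
t = np.linspace(0.0, 2.0 * np.pi, 50)
flat = 5.0 + 0.1 * np.sin(t)
varied = 5.0 + 2.0 * np.sin(t)
print("R2 flat:  ", round(float(r2(flat, flat + 0.2)), 2))
print("R2 varied:", round(float(r2(varied, varied + 0.2)), 2))
```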

To test the robustness of the main results, we carried out three targeted checks (Table 4). First, we compared 7-day, 14-day, and 21-day forward-validation windows within the 10 Hubei development cities. Second, we repeated the final FusionB external evaluation after replacing the file-based station-specific low-activity baseline with a percentile-based (P5) alternative. Third, we compared light-touch preprocessing variants for the 84-city external validation set. The 7-day window yielded the lowest mean internal RMSE, whereas the 14-day and 21-day windows gave slightly higher mean correlation and R² values; we therefore retain 14 days as the prespecified main protocol rather than presenting it as a universally optimal choice. Replacing the file-based baseline with the P5 baseline increased mean RMSE from 0.789 to 1.122 and mean MAE from 0.630 to 0.908, while making the mean bias more negative (from −0.147 to −0.639). Mean Pearson correlation changed less and even increased slightly (from 0.605 to 0.675), indicating that baseline definition affected calibration and absolute error more strongly than linear temporal association. Finally, under median daily aggregation, enabling or disabling hourly outlier filtering changed the external summary metrics only marginally, indicating that isolated short-duration spikes did not dominate the daily predictors.
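The light-touch screening compared in the third check can be illustrated with a median-based daily aggregator that optionally masks hourly spikes. The text does not specify the outlier filter, so the rule used here, masking values more than `k` scaled median absolute deviations (MADs) from the band median with `k = 3.5`, is a hypothetical stand-in, and `daily_aggregate` is a hypothetical name. Because the median is already robust to a single spike, toggling the filter barely changes the median aggregate, which is consistent with the behavior reported above. The mean, by contrast, is pulled toward the spike unless it is filtered.

```python
import numpy as np


def daily_aggregate(hourly, how="median", filter_spikes=False, k=3.5):
    """Aggregate one day of hourly values (hours x bands) into a daily vector.

    Hypothetical spike screen: when `filter_spikes` is True, hourly values more
    than `k` scaled MADs from their band median are masked before aggregation.
    Bands with zero MAD are left untouched.
    """
    x = np.array(hourly, dtype=float)  # copy, so the caller's array is not modified
    if filter_spikes:
        med = np.nanmedian(x, axis=0)
        mad = 1.4826 * np.nanmedian(np.abs(x - med), axis=0)
        mad = np.where(mad > 0, mad, np.inf)  # zero MAD -> no masking in that band
        x[np.abs(x - med) / mad > k] = np.nan
    aggregate = np.nanmedian if how == "median" else np.nanmean
    return aggregate(x, axis=0)


# Usage: 24 hourly values in two bands, with one short spike in band 0.
day = np.empty((24, 2))
day[:, 0] = -140.0 + 0.5 * (-1.0) ** np.arange(24)
day[:, 1] = -130.0
day[5, 0] = -100.0
print(daily_aggregate(day), daily_aggregate(day, filter_spikes=True))
print(daily_aggregate(day, how="mean"), daily_aggregate(day, how="mean", filter_spikes=True))
```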

To provide an additional qualitative illustration beyond the 2020 development period, we retain the Shanghai 2022 case shown in Figure 6. Because a directly comparable ground-truth travel-intensity label is unavailable for that period, this case study is presented as a qualitative out-of-sample illustration rather than as part of the formal quantitative validation. The broad consistency between the UTScan-derived temporal pattern and contemporaneous NO₂ variability suggests that the model captures major urban-activity disruptions, but the two variables should be regarded as complementary rather than interchangeable indicators.

4. Discussion

Estimating urban travel intensity is useful for monitoring the environmental and societal effects of human activity. The present results show that continuous seismic records can be used to estimate a Baidu-based proxy of travel intensity, but only under clearly stated conditions. Station-wise baseline subtraction remains necessary because raw PSD amplitudes differ strongly across stations. The forward-validation design also shows that temporal information improves predictive performance while making clear that hybrid performance depends on how the second-stage model is formulated. Under the present protocol, FusionB offered the most stable balance between internal forward validation and external city-to-city transfer.

The comparison across model families also clarifies why the two hybrid variants behave differently. FusionA uses historical observed travel values during training but must roll forward on predicted values during deployment, which likely explains its weaker external robustness. FusionB avoids that mismatch by using only CatBoost preliminary predictions as the sequence input at both training and inference. More broadly, this comparison helps distinguish deployment-consistent temporal refinement from hybrid formulations that appear effective in development settings but become less stable at deployment.

The physical interpretation must remain cautious. Seismic noise responds to travel activity, but it also contains contributions from broader urban activity, industrial sources, and local site effects. The 2–20 Hz band is therefore mobility-sensitive, not mobility-exclusive. Because the predictors are daily features aggregated from hourly PSD measurements across 13 frequency bands, and because the final model learns their temporal evolution jointly, isolated short-duration perturbations are unlikely to dominate the final estimates. This motivated the light-touch preprocessing strategy adopted here. Consistent with that rationale, median aggregation with and without a simple outlier filter produced nearly identical external summary metrics. A minimal baseline-sensitivity analysis further showed that replacing the file-based station-specific baseline with a percentile-based alternative worsened calibration and absolute error, even though mean Pearson correlation changed less. We therefore infer that baseline normalization mainly affects cross-city calibration rather than simple temporal co-variation. This does not imply that all non-mobility sources have been removed: strong local earthquakes, calibration pulses, or persistent industrial sources may still perturb individual days. Similarly, the Baidu migration index is an informative supervisory proxy but not a complete measurement of travel behavior. Accordingly, UTScan should be interpreted as an estimation framework for a mobility proxy rather than as a direct measurement of city-wide travel.

Single-station representativeness remains the largest limitation. This is especially important in large metropolitan areas, where one station cannot fully sample the mobility field of an entire city. Our results are nevertheless consistent with the possibility that a single station may sense an aggregate urban-activity footprint extending beyond its immediate surroundings; however, the effective footprint is site-dependent and was not quantified in this study. Future work should therefore investigate denser station coverage, explicit spatial fusion, longer multi-year label records, and more systematic quality-control procedures for station–city matching.

5. Conclusions

In this study, UTScan is formulated as a hybrid machine-learning framework for estimating urban travel intensity from ambient seismic noise. The final workflow combines station-wise baseline subtraction, calendar-based forward model development in 10 Hubei cities, and external evaluation on 84 non-Hubei cities that satisfied the data-quality criteria.

Under this protocol, FusionB provided the most stable overall balance between internal forward-validation performance (mean RMSE = 0.537; mean Pearson r = 0.768) and external validation performance (mean RMSE = 0.789; mean Pearson r = 0.605). FusionA, by contrast, illustrates how a train–deployment mismatch in the stage-two inputs can weaken hybrid generalization even when the overall architecture appears similar.

Taken together, these results support a restrained conclusion: ambient seismic noise can be used to estimate a Baidu-based proxy of urban travel intensity across cities when station-wise normalization, quality control, and temporally ordered validation are applied.

Rather than replacing platform-based mobility products, UTScan should be viewed as a passive complement under the present data conditions.

Future work should prioritize multi-station integration, stronger spatial quality control, and broader validation across additional regions and years. Particular priorities include quantifying the effective station footprint, disentangling mobility-related energy from other urban sources, and re-evaluating the framework with longer non-COVID label series.

Author Contributions

Conceptualization, K.G. and J.H.; methodology, K.G.; software, K.G.; validation, K.G. and J.H.; formal analysis, K.G.; investigation, K.G.; resources, J.H.; data curation, K.G.; writing—original draft preparation, K.G.; writing—review and editing, K.G. and J.H.; visualization, K.G.; supervision, J.H.; project administration, J.H.; funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanxi Provincial Key R&D Program of China, grant number 202202020101009, and the National Key R&D Program of China, grant number 2021YFC3000705. The APC was funded by the Shanxi Provincial Key R&D Program of China, grant number 202202020101009.

Data Availability Statement

Urban travel-intensity data were obtained from the Baidu Migration Big Data Platform for the periods analyzed in this study. Continuous waveforms from the China National Earthquake Data Center are too large to distribute online; they can be requested for offline access through the Center's official data-sharing portal.

Conflicts of Interest

The authors declare no conflict of interest.

References

Deng, Z.; Ciais, P.; Tzompa-Sosa, Z.A.; Saunois, M.; Qiu, C.J.; Tan, C.; Sun, T.C.; Ke, P.Y.; Cui, Y.N.; Tanaka, K.; et al. Comparing national greenhouse gas budgets reported in UNFCCC inventories against atmospheric inversions. Earth Syst. Sci. Data 2022, 14, 1639–1675. [Google Scholar] [CrossRef]
Kongboon, R.; Gheewala, S.H.; Sampattagul, S. Greenhouse gas emissions inventory data acquisition and analytics for low carbon cities. J. Clean. Prod. 2022, 343, 130711. [Google Scholar] [CrossRef]
Wang, J.; Azam, W. Natural resource scarcity, fossil fuel energy consumption, and total greenhouse gas emissions in top emitting countries. Geosci. Front. 2024, 15, 101757. [Google Scholar] [CrossRef]
Yangtianzheng, Z.; Ying, G. Spatial patterns and trends of inter-city population mobility in China—Based on Baidu migration big data. Cities 2024, 151, 105124. [Google Scholar] [CrossRef]
Alshahrani, R.; Babour, A. An Infodemiology and Infoveillance Study on COVID-19: Analysis of Twitter and Google Trends. Sustainability 2021, 13, 8528. [Google Scholar] [CrossRef]
Chen, Q.-F.; Li, L.; Li, G.; Chen, L.; Peng, W.-T.; Tang, Y.; Chen, Y.; Wang, F.-Y. Seismic features of vibration induced by train. Acta Seismol. Sin. 2004, 17, 715–724. [Google Scholar] [CrossRef]
Riahi, N.; Gerstoft, P. The seismic traffic footprint: Tracking trains, aircraft, and cars seismically. Geophys. Res. Lett. 2015, 42, 2674–2681. [Google Scholar] [CrossRef]
Scafetta, N.; Mazzarella, A. Cultural noise and the night-day asymmetry of the seismic activity recorded at the Bunker-East (BKE) Vesuvian Station. J. Volcanol. Geotherm. Res. 2018, 349, 117–127. [Google Scholar] [CrossRef]
Diaz, J.; Ruiz, M.; Sanchez-Pastor, P.S.; Romero, P. Urban seismology: On the origin of earth vibrations within a city. Sci. Rep. 2017, 7, 15296. [Google Scholar] [CrossRef]
Chen, Y.K.; Savvaidis, A.; Fomel, S.; Saad, O.M.; Chen, Y.F. RFloc3D: A machine-learning method for 3-D microseismic source location using P- and S-wave arrivals. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5901310. [Google Scholar] [CrossRef]
Elsayed, H.S.; Saad, O.M.; Soliman, M.S.; Chen, Y.K.; Youness, H.A. EQConvMixer: A deep-learning approach for earthquake location from single-station waveforms. IEEE Geosci. Remote Sens. Lett. 2023, 20, 7504905. [Google Scholar] [CrossRef]
Min, R.; Chen, Y.F.; Wang, H.; Chen, Y.K. DAS vehicle signal extraction using machine learning in urban traffic monitoring. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5908510. [Google Scholar] [CrossRef]
Mousavi, S.M.; Ellsworth, W.L.; Zhu, W.; Chuang, L.Y.; Beroza, G.C. Earthquake transformer-an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat. Commun. 2020, 11, 3952. [Google Scholar] [CrossRef]
Zhu, W.; Mousavi, S.M.; Beroza, G.C. Seismic Signal Denoising and Decomposition Using Deep Neural Networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9476–9488. [Google Scholar]
Yang, L.; Liu, X.; Zhu, W.; Zhao, L.; Beroza, G.C. Toward improved urban earthquake monitoring through deep-learning-based noise suppression. Sci. Adv. 2022, 8, eabl3564. [Google Scholar] [CrossRef]
Diaz, J.; DeFelipe, I.; Ruiz, M.; Andres, J.; Ayarza, P.; Carbonell, R. Identification of natural and anthropogenic signals in controlled source seismic experiments. Sci. Rep. 2022, 12, 3171. [Google Scholar] [CrossRef] [PubMed]
Diaz, J.; Ventosa, S.; Schimmel, M.; Ruiz, M.; Macau, A.; Gabas, A.; Marti, D.; Akin, O.; Verges, J. Mapping the basement of the Cerdanya Basin (eastern Pyrenees) using seismic ambient noise. Solid Earth 2023, 14, 499–514. [Google Scholar] [CrossRef]
Diaz, J.; Ruiz, M.; Jara, J.-A. Seismic monitoring of urban activity in Barcelona during the COVID-19 lockdown. Solid Earth 2021, 12, 725–739. [Google Scholar] [CrossRef]
Zhao, Y.; Li, Y.E.; Nilot, E.; Fang, G. Urban Running Activity Detected Using a Seismic Sensor during COVID-19 Pandemic. Seismol. Res. Lett. 2022, 93, 181–192. [Google Scholar] [CrossRef]
Dias, F.L.; Assumpção, M.; Peixoto, P.S.; Bianchi, M.B.; Collaço, B.; Calhau, J. Using seismic noise levels to monitor social isolation: An example from Rio de Janeiro, Brazil. Geophys. Res. Lett. 2020, 47, e2020GL088748. [Google Scholar] [CrossRef]
Grecu, B.; Borleanu, F.; Tiganescu, A.; Poiata, N.; Dinescu, R.; Tataru, D. The effect of 2020 COVID-19 lockdown measures on seismic noise recorded in Romania. Solid Earth 2021, 12, 2351–2368. [Google Scholar] [CrossRef]
Hayashida, T.; Yoshimi, M.; Suzuki, H.; Mori, S.; Kagawa, T.; Ichii, K.; Yamada, M. Tracking the effect of human activity on MeSO-net noise using seismic data traffic: Did seismic noise in Tokyo truly decrease during the COVID-19 state of emergency? Seismol. Res. Lett. 2023, 94, 2750–2764. [Google Scholar] [CrossRef]
Li, Y.E.; Nilot, E.A.; Zhao, Y.; Fang, G. Quantifying Urban Activities Using Nodal Seismometers in a Heterogeneous Urban Space. Sensors 2023, 23, 1322. [Google Scholar] [CrossRef]
Lecocq, T.; Hicks, S.P.; Van Noten, K.; van Wijk, K.; Koelemeijer, P.; De Plaen, R.S.M.; Massin, F.; Hillers, G.; Anthony, R.E.; Apoloner, M.T.; et al. Global quieting of high-frequency seismic noise due to COVID-19 pandemic lockdown measures. Science 2020, 369, 1338–1343. [Google Scholar] [CrossRef]
Nimiya, H.; Ikeda, T.; Tsuji, T. Temporal changes in anthropogenic seismic noise levels associated with economic and leisure activities during the COVID-19 pandemic. Sci. Rep. 2021, 11, 20439. [Google Scholar] [CrossRef]
Rahman, S.I.B.A.; Lythgoe, K.; Muktadir, M.G.; Akhter, S.H.; Hubbard, J. Characterization and spatiotemporal variations of ambient seismic noise in eastern Bangladesh. Front. Earth Sci. 2024, 12, 1334248. [Google Scholar] [CrossRef]
Wilson, D. Broadband seismic background noise at temporary seismic stations observed on a regional scale in the southwestern United States. Bull. Seismol. Soc. Am. 2002, 92, 3335–3342. [Google Scholar] [CrossRef]
Kotov, A.N.; Agibalov, A.O.; Sentsov, A.A. Low-Frequency Noise Pollution in the Northeastern Part of Mosrentgen (Moscow). Izv. Atmos. Ocean. Phys. 2023, 59, 959–970. [Google Scholar] [CrossRef]
Saadia, B.; Fotopoulos, G. Characterizing ambient seismic noise in an urban park environment. Sensors 2023, 23, 2446. [Google Scholar] [CrossRef] [PubMed]
Smith, K.; Tape, C. Seismic Noise in Central Alaska and Influences From Rivers, Wind, and Sedimentary Basins. J. Geophys. Res. Solid Earth 2019, 124, 11678–11704. [Google Scholar] [CrossRef]
McNamara, D.E. Ambient Noise Levels in the Continental United States. Bull. Seismol. Soc. Am. 2004, 94, 1517–1527. [Google Scholar] [CrossRef]
Shi, S.; Pain, K.; Chen, X. Looking into mobility in the COVID-19 ‘eye of the storm’: Simulating virus spread and urban resilience in the Wuhan city-region travel flow network. Cities 2022, 126, 103675.
Sohrabi, C.; Alsafi, Z.; O’Neill, N.; Khan, M.; Kerwan, A.; Al-Jabir, A.; Iosifidis, C.; Agha, R. World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19). Int. J. Surg. 2020, 76, 71–76.
Gibney, E. Coronavirus lockdowns have changed the way Earth moves. Nature 2020, 580, 176–177.
Guo, K.; Li, J.H.; Shi, L. Intelligent urban travel evaluation model driven by seismic data. High Technol. Lett. 2025, 35, 1300–1310.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Ke, G.L.; Meng, Q.; Finley, T.; Wang, T.F.; Chen, W.; Ma, W.D.; Ye, Q.W.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 3140–3148.
Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 2–8 December 2018.

Figure 1. Changes in urban travel intensity in multiple cities in Hubei Province, China, during the COVID-19 pandemic. The figure shows travel intensity from 1 January 2020 to 30 April 2020 for eight Hubei cities, including Wuhan, Huangshi, and Shiyan. Red dashed lines mark the dates on which pandemic control measures were announced; travel intensity fell sharply after these announcements, with Wuhan reaching the lowest values. Urban travel intensity is defined as the ratio of the traveling population to the total population of the city and is reported on a 0–10 scale, where higher values indicate a larger share of the population traveling.

Figure 2. Distribution of raw and baseline-normalized seismic noise and its correlation with urban travel intensity. The PSD archive provides hourly PSD curves for each station; from each hourly curve, we extracted 13 features in the 2–20 Hz band and then aggregated the available hourly values within each day to build the daily feature vector. (a) Raw multi-band noise distribution and the corresponding travel-intensity series for 10 Hubei cities, highlighting large station-to-station amplitude differences. (b) The same comparison after station-wise baseline subtraction, which improves cross-city comparability by reducing station-dependent amplitude offsets. (c,d) Pearson correlation coefficients between eight representative stations and the corresponding city travel-intensity series before and after baseline subtraction; warmer colors denote higher correlations and cooler colors lower ones. Differences in the spectral shape of the curves reflect varying mixtures of road traffic, broader urban and industrial activity, local site effects, and installation conditions; the seismic amplitude is therefore mobility-sensitive but not mobility-exclusive.
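The daily feature construction described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the NaN convention for missing hours, the robust z-score outlier rule, and the assumption that the PSD features are on a logarithmic (dB) scale, so that baseline subtraction is additive, are all ours. The station baseline is passed in as a given vector because its exact derivation is not specified here.

```python
import numpy as np


def daily_features(hourly, day_index, how="median", z_max=None):
    """Aggregate hourly PSD features into one vector per day.

    hourly    : (n_hours, n_features) array, NaN for missing hours
                (13 features in the 2-20 Hz band in the study).
    day_index : integer day label for each hourly row.
    how       : "median" or "mean" over the available hours of a day.
    z_max     : optional robust z-score cutoff; HYPOTHETICAL outlier rule.
    """
    if how not in ("median", "mean"):
        raise ValueError("how must be 'median' or 'mean'")
    hourly = np.asarray(hourly, dtype=float)
    day_index = np.asarray(day_index)
    days = np.unique(day_index)
    out = np.full((days.size, hourly.shape[1]), np.nan)
    agg = np.nanmedian if how == "median" else np.nanmean
    for i, d in enumerate(days):
        block = hourly[day_index == d].copy()
        if z_max is not None:
            # Robust z-score per feature: |x - median| / (1.4826 * MAD).
            med = np.nanmedian(block, axis=0)
            scale = 1.4826 * np.nanmedian(np.abs(block - med), axis=0)
            with np.errstate(divide="ignore", invalid="ignore"):
                z = np.abs(block - med) / scale
                block[z > z_max] = np.nan  # drop transient spikes
        out[i] = agg(block, axis=0)
    return days, out


def subtract_baseline(daily, baseline):
    """Station-wise baseline subtraction (assumes features are in dB)."""
    return np.asarray(daily, dtype=float) - np.asarray(baseline, dtype=float)


def pearson_r(x, y):
    """Pearson correlation over days where both series are available."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))
    return float(np.corrcoef(x[ok], y[ok])[0, 1])
```

With a single large hourly spike in an otherwise flat day, the median is unaffected while the unfiltered mean is pulled upward; this is the kind of difference the transient-outlier check in Table 4 probes.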

Figure 3. Calendar-based forward-validation design used in this study. Each fold used all earlier days for training and the immediately following 14-day block for validation: Fold 1 trained on 1 January 2020–11 February 2020 (42 days) and validated on 12 February 2020–25 February 2020 (14 days); Fold 2 trained on 1 January 2020–25 February 2020 (56 days) and validated on 26 February 2020–10 March 2020 (14 days); Fold 3 trained on 1 January 2020–10 March 2020 (70 days) and validated on 11 March 2020–24 March 2020 (14 days); and Fold 4 trained on 1 January 2020–24 March 2020 (84 days) and validated on 25 March 2020–7 April 2020 (14 days). After fold-based model selection, the final models were refit on the full Hubei development period (1 January 2020–30 April 2020) and evaluated once on the separate 84-city external set.
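The fold boundaries in Figure 3 follow an expanding-window rule that can be generated from three numbers. The sketch below assumes a 42-day initial training span, 14-day validation blocks, and four folds starting on 1 January 2020, which reproduces the dates listed in the caption (2020 is a leap year, so February has 29 days). The function name is hypothetical.

```python
from datetime import date, timedelta


def forward_folds(start, initial_train_days=42, val_days=14, n_folds=4):
    """Expanding-window calendar folds.

    Each fold trains on all days from `start` up to the fold cut-off and
    validates on the immediately following `val_days` block.
    Returns (train_start, train_end, val_start, val_end), inclusive dates.
    """
    folds = []
    for k in range(n_folds):
        train_end = start + timedelta(days=initial_train_days + k * val_days - 1)
        val_start = train_end + timedelta(days=1)
        val_end = val_start + timedelta(days=val_days - 1)
        folds.append((start, train_end, val_start, val_end))
    return folds


if __name__ == "__main__":
    for i, (ts, te, vs, ve) in enumerate(forward_folds(date(2020, 1, 1)), start=1):
        print(f"Fold {i}: train {ts}..{te} ({(te - ts).days + 1} d), val {vs}..{ve}")
```

The `val_days` parameter could express the 7- and 21-day windows used in the sensitivity analysis, but the number of folds and the initial training span used for those runs are not given here, so the defaults should not be read as those settings.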

Figure 4. City-wise prediction-error distribution across the 84-city external validation set for the final UTScan implementation (FusionB). The error metric is the root mean squared error (RMSE) between the predicted travel intensity and the observed Baidu travel-intensity proxy. The dashed vertical line marks the mean RMSE (0.789). Sixteen cities (19.0%) have RMSE below 0.6, and fifty cities (59.5%) have RMSE below 0.8. The right-skewed tail indicates that prediction error is concentrated in a limited subset of more difficult cities rather than being uniformly elevated across the external set.
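The percentages in the Figure 4 caption follow directly from the counts over the 84 external cities (16/84 ≈ 19.0% and 50/84 ≈ 59.5%). A minimal helper for this kind of summary is sketched below, assuming "below" means strictly less than the threshold; the function name and the example vector are illustrative and are not study data.

```python
import numpy as np


def share_below(rmse_per_city, threshold):
    """Number and percentage (1 decimal) of cities with RMSE strictly below threshold."""
    rmse = np.asarray(rmse_per_city, dtype=float)
    n = int(np.count_nonzero(rmse < threshold))
    return n, round(100.0 * n / rmse.size, 1)


# Illustrative vector only: 16 low-error, 34 mid-error, 34 high-error cities (84 total).
example = [0.5] * 16 + [0.7] * 34 + [1.0] * 34
print(share_below(example, 0.6))  # (16, 19.0)
print(share_below(example, 0.8))  # (50, 59.5)
```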

Figure 5. FusionB predictions versus the observed Baidu travel-intensity proxy in six representative external cities: Jining, Shantou, Zhongshan, Yangjiang, Chuxiong, and Baiyin. These examples were selected to span a range of performance levels, temporal patterns, and urban contexts, rather than to present only the best-performing cases.

Figure 6. Qualitative out-of-sample illustration of UTScan-derived urban travel intensity in Shanghai from January to early April 2022. The purple dashed line shows contemporaneous NO₂ concentrations. The first red dashed line marks the Chinese New Year period; the second marks the onset of pandemic control measures in Shanghai. This panel is retained as a qualitative case study rather than a formal quantitative validation.

Table 1. Data partitions and calendar-based validation design used in this study.

Stage	Cities	Calendar Span	Use in This Study
Fold 1	10 Hubei cities	Training: 1 January 2020–11 February 2020 (42 days); Validation: 12 February 2020–25 February 2020 (14 days)	Internal forward validation
Fold 2	10 Hubei cities	Training: 1 January 2020–25 February 2020 (56 days); Validation: 26 February 2020–10 March 2020 (14 days)	Internal forward validation
Fold 3	10 Hubei cities	Training: 1 January 2020–10 March 2020 (70 days); Validation: 11 March 2020–24 March 2020 (14 days)	Internal forward validation
Fold 4	10 Hubei cities	Training: 1 January 2020–24 March 2020 (84 days); Validation: 25 March 2020–7 April 2020 (14 days)	Internal forward validation
Final refit	10 Hubei cities	1 January 2020–30 April 2020	Full-period refit after fold selection
External validation	84 non-Hubei cities	Same calendar interval, after preprocessing and sequence warm-up	Completely unseen-city validation

Table 2. Comparative performance of the four model families under the common protocol. Internal metrics are mean ± standard deviation across the four Hubei forward folds; external metrics are mean ± standard deviation across the 84-city external validation set.

Model	Internal RMSE	Internal R²	Internal Pearson r	Internal MAE	External RMSE	External R²	External Pearson r	External MAE
CatBoost-static	0.982 ± 0.226	−2.569 ± 3.879	0.618 ± 0.123	0.708 ± 0.134	0.835 ± 0.216	−0.238 ± 1.281	0.572 ± 0.349	0.659 ± 0.206
LSTM-only	0.575 ± 0.143	0.371 ± 0.312	0.560 ± 0.402	0.426 ± 0.096	0.804 ± 0.222	−0.025 ± 0.706	0.701 ± 0.256	0.658 ± 0.215
FusionA	0.871 ± 0.171	−1.443 ± 2.651	0.676 ± 0.144	0.638 ± 0.082	0.865 ± 0.234	−0.283 ± 1.307	0.563 ± 0.359	0.682 ± 0.215
FusionB (final UTScan)	0.537 ± 0.214	0.533 ± 0.126	0.768 ± 0.076	0.403 ± 0.131	0.789 ± 0.229	−0.071 ± 1.128	0.605 ± 0.370	0.630 ± 0.210
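Table 2 reports each metric as mean ± standard deviation across units (four folds internally, 84 cities externally), so the metrics are computed per unit and then summarized. A minimal sketch of that computation follows; the function names are hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption because the table does not state it. The final line illustrates one way a negative R² can coexist with a positive Pearson r: predictions that are offset by a constant preserve rank agreement but inflate squared error.

```python
import numpy as np


def metrics(y_true, y_pred):
    """RMSE, R^2, Pearson r, and MAE for one validation fold or one city."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = p - y
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot  # < 0 when worse than predicting the fold/city mean
    r = float(np.corrcoef(y, p)[0, 1])
    return {"RMSE": rmse, "R2": r2, "Pearson r": r, "MAE": mae}


def summarize(per_unit, ddof=1):
    """Mean and standard deviation of each metric across folds or cities.

    ddof=1 (sample std) is an assumption; the table does not state it.
    """
    return {
        k: (float(np.mean([m[k] for m in per_unit])),
            float(np.std([m[k] for m in per_unit], ddof=ddof)))
        for k in per_unit[0]
    }


# An offset-only error: perfect rank agreement, yet negative R^2.
m = metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Here the offset of +1 yields RMSE = MAE = 1 and R² = −0.5 while r = 1. Because R² is normalized by each unit's own variance, a fold or city with little variation can produce a strongly negative value, which would also widen the reported standard deviation.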

Table 3. Final model definitions and selected configurations used in this comparison.

Model	Definition	Selected Configuration	Role in Manuscript
CatBoost-static	Same-day baseline-subtracted PSD bands → actual_t	CatBoost: tree depth 8; learning rate 0.05; L2 regularization 5; 800 iterations	Static comparator
LSTM-only	Seven-day PSD feature sequence → actual_t	LSTM: hidden size 16; 1 layer; batch size 64; learning rate 0.001	Sequence comparator
FusionA	Seven-day [baseline, historical actual] sequence → residual_t	CatBoost: tree depth 8; learning rate 0.05; L2 regularization 3. LSTM: hidden size 16; 1 layer; batch size 64; learning rate 0.001	Diagnostic hybrid
FusionB (UTScan)	Seven-day baseline-prediction sequence → actual_t	CatBoost: tree depth 8; learning rate 0.05; L2 regularization 3. LSTM: hidden size 16; 1 layer; batch size 32; learning rate 0.001	Final model
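Table 3 defines FusionB as mapping a seven-day sequence of baseline predictions to actual_t. The sketch below shows only the sequence-construction step that would sit between the two stages; it omits the CatBoost and LSTM models themselves. Reading "baseline prediction" as the stage-1 CatBoost output, assuming the window for day t ends at day t, and the function name are our assumptions. Under that windowing, the first six days of each series are consumed as sequence warm-up, consistent with the warm-up mentioned in Table 1.

```python
import numpy as np

WINDOW = 7  # seven-day sequences, per Table 3


def fusionb_sequences(baseline_pred, actual, window=WINDOW):
    """Stack sliding windows of stage-1 baseline predictions as LSTM inputs.

    ASSUMPTION: the window for day t covers days t-window+1 .. t, so the
    first window-1 days have no complete sequence (warm-up).
    Returns X with shape (n_samples, window, 1) and targets actual_t.
    """
    b = np.asarray(baseline_pred, dtype=float)
    y = np.asarray(actual, dtype=float)
    if b.shape != y.shape:
        raise ValueError("baseline_pred and actual must be aligned daily series")
    X = np.stack([b[t - window + 1:t + 1] for t in range(window - 1, b.size)])
    return X[:, :, None], y[window - 1:]
```

When building such stacked inputs for training, a common precaution is to use out-of-fold stage-1 predictions so that the sequence model does not learn from in-sample CatBoost fits; whether UTScan does this is not stated in this section.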

Table 4. Sensitivity and robustness checks for the final UTScan implementation (FusionB).

Analysis	Settings/Data	Key Quantitative Outcome	Interpretation
Validation-window sensitivity	7-, 14-, and 21-day forward-validation windows within the 10 Hubei development cities.	7 d: RMSE 0.339, Pearson 0.737; 14 d: RMSE 0.537, Pearson 0.768; 21 d: RMSE 0.583, Pearson 0.755.	The main conclusions do not depend on a single validation-window choice.
Baseline-normalization sensitivity	Final FusionB external evaluation on the 84-city external set; file-based station-specific low-activity baseline versus a percentile-based (P5) alternative.	File baseline: RMSE 0.789, MAE 0.630, Pearson 0.605, bias −0.147; P5 baseline: RMSE 1.122, MAE 0.908, Pearson 0.675, bias −0.639.	Baseline choice affects calibration and absolute error more strongly than rank-order association; the file-based baseline was therefore retained for the main analysis.
Transient-outlier robustness	Median aggregation with and without outlier filtering; mean aggregation with outlier filtering, on the 84-city external set.	Median with filter: RMSE 0.789, Pearson 0.605; median without filter: RMSE 0.798, Pearson 0.608; mean with filter: RMSE 0.748, Pearson 0.625.	Alternative light-touch preprocessing choices changed the external summary metrics only modestly, indicating that isolated spikes were not the dominant driver of the daily predictors.
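The percentile-based (P5) alternative in the baseline-normalization check can be sketched as a per-station, per-feature 5th percentile that is subtracted from that station's features, together with the signed bias used in Table 4. This is a minimal sketch under assumptions: the file-based station-specific baseline is not specified in this section and is not reproduced, computing the percentile over daily features is our choice, and the sign convention for bias (prediction minus observation, so negative values indicate underprediction) is assumed. Function names are hypothetical.

```python
import numpy as np


def percentile_baseline(daily_features, q=5.0):
    """Per-feature low-activity level for one station: the q-th percentile over days."""
    return np.nanpercentile(np.asarray(daily_features, dtype=float), q, axis=0)


def mean_bias(y_true, y_pred):
    """Mean signed error, prediction minus observation (assumed sign convention)."""
    return float(np.mean(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)))


# Toy station: 101 days x 2 features (not study data).
toy = np.column_stack([np.arange(101.0), np.arange(101.0) + 50.0])
base = percentile_baseline(toy)  # 5.0 and 55.0 for the two toy features
normalized = toy - base          # subtract the per-feature baseline
```

Because each choice shifts a station's inputs by a per-feature constant, one plausible reading of Table 4 is that the baseline mainly moves the absolute level of predictions (bias and RMSE) while leaving day-to-day ordering, and hence Pearson r, comparatively stable.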

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Guo, K.; Hou, J. Estimating Urban Travel Intensity from Ambient Seismic Signals via a Hybrid CatBoost–LSTM Framework. Appl. Sci. 2026, 16, 3407. https://doi.org/10.3390/app16073407

AMA Style

Guo K, Hou J. Estimating Urban Travel Intensity from Ambient Seismic Signals via a Hybrid CatBoost–LSTM Framework. Applied Sciences. 2026; 16(7):3407. https://doi.org/10.3390/app16073407

Chicago/Turabian Style

Guo, Kai, and Jianmin Hou. 2026. "Estimating Urban Travel Intensity from Ambient Seismic Signals via a Hybrid CatBoost–LSTM Framework" Applied Sciences 16, no. 7: 3407. https://doi.org/10.3390/app16073407

APA Style

Guo, K., & Hou, J. (2026). Estimating Urban Travel Intensity from Ambient Seismic Signals via a Hybrid CatBoost–LSTM Framework. Applied Sciences, 16(7), 3407. https://doi.org/10.3390/app16073407

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
