1. Introduction
Recent advances in mobile technology have enabled continuous location tracking, which has emerged as a promising approach to digital phenotyping: the passive, real-time measurement of behaviour using smartphone sensors [1]. In mental health contexts, location-based metrics derived from GPS data have been increasingly linked to symptoms of depression, anxiety, and other psychiatric conditions. For example, features such as the number of location clusters, location entropy, and travel distance have shown strong associations with depressive symptom severity: individuals with more severe depressive symptoms tend to exhibit reduced mobility, lower location entropy, and more time spent at home [2,3,4]. Features including homestay and location variance have consistently correlated with depression severity, highlighting the clinical potential of GPS-based behavioural metrics [3].
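The features named above are typically computed from per-epoch cluster assignments. The sketch below assumes the definitions commonly used in the GPS phenotyping literature: location entropy over the share of stay time spent in each cluster, normalised entropy as entropy divided by the log of the number of clusters, location variance as the log of the summed latitude and longitude variances, and homestay as the fraction of epochs assigned to the home cluster. The function names and the use of -1 for non-stationary epochs are illustrative assumptions, not this study's code.

```python
import math
from collections import Counter
from statistics import pvariance


def location_features(labels, home_label=None):
    """Mobility features from one cluster label per fixed-length epoch.

    Epochs labelled -1 (noise/transit) are excluded from entropy but still
    count towards the denominator of homestay (an assumed convention).
    """
    counts = Counter(lab for lab in labels if lab != -1)
    stay_total = sum(counts.values())
    # Location entropy: Shannon entropy of the share of stay time per cluster.
    entropy = -sum((c / stay_total) * math.log(c / stay_total) for c in counts.values())
    n_clusters = len(counts)
    # Normalised entropy: divide by the maximum possible entropy, log(n).
    normalised = entropy / math.log(n_clusters) if n_clusters > 1 else 0.0
    homestay = counts[home_label] / len(labels) if home_label is not None else None
    return {
        "n_clusters": n_clusters,
        "entropy": entropy,
        "normalised_entropy": normalised,
        "homestay": homestay,
    }


def location_variance(lats, lons):
    """Log of the summed population variances of latitude and longitude."""
    # A tiny constant avoids log(0) when every fix is identical.
    return math.log(pvariance(lats) + pvariance(lons) + 1e-12)
```

For example, equal time at two clusters gives an entropy of ln 2 and a normalised entropy of 1.0, the maximum for two locations.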
Despite the growing use of these features in mental health applications, raw GPS data remain problematic due to various sources of error, including erratic location points, unintended path tracking, and differences in data quality across mobile operating systems [5]. These technical inconsistencies can substantially diminish the reliability and interpretability of behavioural metrics, particularly when data are combined across devices. Recent sensor validation studies have identified systematic differences between Android and iOS platforms in GPS data completeness, noise levels, and sampling rates [6]. iOS devices, whilst less prone to noise, often exhibit data gaps due to stricter background sensing policies, whereas Android devices may collect more frequent but noisier data [7,8]. Therefore, ensuring accuracy and cross-platform consistency is essential for scalable digital phenotyping applications.
Moreover, the value of GPS tracking in mental health applications lies not in strict geographical precision, but in its ability to extract meaningful behavioural insights that capture patterns correlating with mental health states and provide actionable, clinically relevant information [9]. Accuracy in this context refers to whether GPS-derived data can detect behavioural regularities and deviations that correspond to mental health symptoms. Reliability denotes the consistency of behavioural feature extraction across different technological conditions, including devices, operating systems, and environmental contexts. From this functional perspective, minor geographic discrepancies are acceptable if the derived metrics remain stable and behaviourally informative.
Several studies have attempted to address GPS accuracy issues in mobile health applications. Algorithmic approaches including Kalman filtering [10], distance-based filtering [11], and multi-sensor fusion techniques [12] have been proposed to mitigate measurement errors. More recently, hybrid frameworks that combine physical modelling with AI-enhanced feature extraction have been shown to substantially improve monitoring accuracy in biomedical electronics [13]. In this context, the finite element method (FEM) offers a useful conceptual foundation: FEM is a numerical approach for solving physics-based governing equations (e.g., heat diffusion) by discretising a continuous domain into small elements and computing field variables under material and boundary constraints. When used to guide representation learning, FEM can provide mechanistically grounded priors or uncertainty estimates that complement data-driven inference. Although Pratticò et al. focus on thermographic monitoring using FEM-guided representations, their approach offers a valuable parallel for digital phenotyping by illustrating how mechanistic constraints can be incorporated into inference and how complementary signals can be integrated to improve robustness. Analogously, GPS-based phenotyping could benefit from physics-informed uncertainty modelling, together with multimodal sensor cues (e.g., bioelectrical or thermographic signals) and embedded intelligence, to better disambiguate true behavioural changes from measurement artefacts.
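As a point of reference for the filtering approaches cited above, the following is a minimal random-walk Kalman filter that smooths a stream of fixes, using each fix's reported horizontal accuracy (treated as a standard deviation in metres) as the measurement noise. The function name, the tuple layout, and the process-noise value `q_m2_per_s` are hypothetical choices for illustration; this is not the method evaluated in this study.

```python
def kalman_smooth(fixes, q_m2_per_s=3.0):
    """Smooth (t_seconds, lat, lon, accuracy_m) fixes with a random-walk Kalman filter.

    The state is the position; its variance (m^2) grows by q_m2_per_s per second
    between fixes, and each fix's variance is accuracy_m ** 2. The same gain is
    applied to latitude and longitude, which is adequate over short distances.
    """
    smoothed = []
    lat = lon = t_prev = None
    var = 0.0
    for t, fix_lat, fix_lon, acc in fixes:
        r = acc * acc  # measurement variance from reported accuracy
        if lat is None:
            # Initialise the state with the first fix.
            lat, lon, var = fix_lat, fix_lon, r
        else:
            var += q_m2_per_s * (t - t_prev)  # predict: uncertainty grows with elapsed time
            gain = var / (var + r)            # update: weight the new fix by relative certainty
            lat += gain * (fix_lat - lat)
            lon += gain * (fix_lon - lon)
            var *= 1.0 - gain
        t_prev = t
        smoothed.append((t, lat, lon))
    return smoothed
```

A fix reporting 500 m accuracy barely moves the estimate, whereas two simultaneous fixes of equal accuracy are averaged; this is the behaviour that makes such filters attractive for suppressing erratic points.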
However, most existing validation studies have focused either on technical GPS accuracy metrics (e.g., metres of spatial error) or on clinical correlations, with limited integration of the two perspectives. Furthermore, few studies have systematically evaluated preprocessing algorithms across both major mobile platforms under naturalistic conditions whilst maintaining a focus on behavioural feature validity for mental health applications.
This study addresses these research gaps by developing and validating a preprocessing algorithm specifically designed to enhance GPS-based location detection for mental health digital phenotyping. We hypothesise that:
A preprocessing algorithm can enable mobile GPS data to detect major activity centres with high behavioural accuracy (within a tolerable margin of spatial error) across both Android and iOS platforms.
Location-based behavioural metrics derived from preprocessed GPS data will demonstrate consistency across different mobile platforms and environmental contexts.
The preprocessing algorithm will yield behavioural patterns that are plausible and meaningful at the individual level in naturalistic settings.
To test these hypotheses, we employed a two-phase research design involving both controlled experiments and real-world observational tracking. In the first phase, participants followed predefined protocols at known activity locations, enabling direct comparison between algorithm-detected clusters and ground truth. In the second phase, passive GPS data were collected in naturalistic settings without explicit instructions, allowing validation of the methodology under ecologically valid conditions. This integrative approach enables comprehensive assessment of the robustness, sensitivity, and reliability of the proposed preprocessing algorithm for GPS-based behavioural tracking in mental health applications.
4. Discussion
4.1. Summary of Principal Findings
This study evaluated the accuracy and reliability of a GPS-based location detection algorithm for use in digital mental health applications. Through a two-phase validation approach—combining controlled action-based testing with real-world observations—we demonstrated that the proposed DBSCAN-based preprocessing methodology can accurately identify major activity hubs and capture behaviourally meaningful mobility patterns under both structured and naturalistic conditions.
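For readers unfamiliar with DBSCAN, the sketch below labels (lat, lon) fixes using great-circle (haversine) distance. `min_samples = 10` follows the setting reported in the Limitations section; `eps_m = 100` is an assumption suggested by the ≈100 m figure mentioned there, and the O(n²) neighbour search is chosen for clarity only. It is not the authors' implementation; in practice a library routine such as scikit-learn's `DBSCAN` with `metric="haversine"` (eps expressed in radians) would typically be used.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m=100.0, min_samples=10):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # Each point's eps-neighbourhood, including the point itself.
    neighbours = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                frontier.extend(neighbours[j])  # only core points expand the cluster
    return labels


def centroids(points, labels):
    """Arithmetic mean (lat, lon) of each cluster, ignoring noise."""
    groups = {}
    for (lat, lon), lab in zip(points, labels):
        if lab != -1:
            groups.setdefault(lab, []).append((lat, lon))
    return {lab: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for lab, g in groups.items()}
```

As in standard DBSCAN, a point counts itself as a neighbour, and border points join a cluster without expanding it further; isolated fixes (for example, points along a commute) remain labelled -1.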
In the controlled experimental setting (Phase 1), the algorithm achieved high spatial accuracy, with average distance errors below 50 metres across key activity types (Home, Work/School, Stationary) on both Android and iOS platforms. The overall mean spatial error was 30.63 m, well within the acceptable range for deriving location-based behavioural features. Additionally, the number of detected activity hubs closely matched ground truth values, particularly under high-accuracy GPS modes (Android’s PRIORITY_ACCURACY mode: N-error = 0.6; iOS’s BEST_FOR_NAVIGATION mode: N-error = 0.4), whilst battery-conserving modes introduced substantial detection errors (Android PRIORITY_LOW_POWER: N-error = 3.0; iOS REDUCED_ACCURACY: N-error = 1.8). Importantly, despite platform-specific variability in raw spatial accuracy, derived behavioural features (location entropy, normalised entropy, location variance) remained highly consistent across devices (ratios: 0.98–1.08), underscoring the method’s robustness for extracting clinically relevant metrics.
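The two accuracy metrics reported above can be sketched as follows. Spatial error is taken here as the haversine distance from each ground-truth location to its nearest detected centroid, and N-error as the mean absolute difference between detected and ground-truth hub counts across observation days; both operationalisations, and the helper names, are assumptions rather than the study's exact definitions.

```python
import math
from statistics import mean

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spatial_errors(detected_centroids, ground_truth):
    """Distance (m) from each named ground-truth location to its nearest detected centroid."""
    return {name: min(haversine_m(loc, c) for c in detected_centroids)
            for name, loc in ground_truth.items()}


def n_error(detected_counts, true_counts):
    """Mean absolute difference between detected and true hub counts (e.g., per day)."""
    return mean(abs(d - t) for d, t in zip(detected_counts, true_counts))
```

Under this reading, detecting one extra hub on one of five days gives an N-error of 0.2.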
Real-world case analyses (Phase 2) further substantiated the algorithm’s utility in capturing individual mobility routines and deviations. For instance, Participant ID 152 exhibited stable weekday routines that were reliably detected across multiple days, alongside meaningful deviations on weekends (e.g., long-distance trips, additional activity clusters, or missing habitual locations), demonstrating the method’s ecological validity and sensitivity to behavioural changes. The algorithm also performed robustly across different transportation modes, as evidenced by Participant ID 203’s consistent detection of origin–destination clusters regardless of whether a private vehicle or high-speed train was used. These results support the core hypothesis that GPS-derived data—when processed using an appropriate clustering framework—can yield reliable, interpretable behavioural indicators for mental health research.
4.2. Strengths and Implications
The findings highlight several key strengths of the proposed GPS-based location detection method, each with important implications for digital phenotyping in mental health applications.
Behavioural Accuracy Over Geographic Precision. The algorithm prioritises behaviourally meaningful accuracy over strict geographic precision, emphasising its ability to detect daily routines and clinically interpretable deviations—an approach aligned with the functional objectives of digital phenotyping. Rather than achieving centimetre-level positioning (which is unnecessary for mental health monitoring), the method focuses on identifying meaningful activity centres within acceptable spatial tolerance (~50 m), enabling reliable extraction of mobility features such as location entropy and homestay duration that have been linked to depression severity.
Cross-Platform Consistency. The method demonstrated strong platform consistency, a critical requirement for scalable deployment. Despite variability in raw GPS data quality across mobile operating systems and device settings, the algorithm yielded comparable behavioural features (within 8% agreement), supporting its applicability across diverse technological environments. This finding addresses a major barrier identified in previous digital phenotyping research, where platform differences have compromised data comparability and limited multi-site studies.
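A simple way to express the cross-platform agreement described above is the ratio of each feature computed on the two platforms, checked against a tolerance. In this hypothetical helper, the Android/iOS ordering of the ratio and the 8% default tolerance (taken from the reported agreement) are illustrative choices.

```python
def platform_ratios(android_features, ios_features):
    """Android/iOS ratio for each feature present (and non-zero) on both platforms."""
    return {name: android_features[name] / ios_features[name]
            for name in android_features
            if name in ios_features and ios_features[name] != 0}


def within_tolerance(ratios, tol=0.08):
    """True if every ratio lies within +/- tol of 1 (small slack for float rounding)."""
    return all(abs(r - 1.0) <= tol + 1e-9 for r in ratios.values())
```

The small slack term keeps a ratio such as 1.08 from failing the check purely because of floating-point rounding.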
Real-World Adaptability. The algorithm proved adaptable to real-world complexity, successfully capturing nuanced behavioural patterns such as long-distance travel, temporary activity modifications, and weekday–weekend variability without requiring user input or controlled conditions. This capability is particularly valuable for mental health applications, where naturalistic monitoring over extended periods is essential for detecting gradual changes or early warning signs of symptom exacerbation.
Sensitivity to Behavioural Deviations. The approach exhibited sensitivity to meaningful deviations in behaviour, offering potential utility for early detection of changes relevant to mental health monitoring. The ability to distinguish between stable routines and deviations (e.g., missing habitual activities, introducing new locations) may provide objective markers of behavioural activation, social withdrawal, or routine disruption—features that have been associated with mood disorders.
4.3. Comparison with Prior Work
The spatial accuracy achieved in this study (mean error: 30.63 m) should be interpreted alongside previous GPS validation research in health applications. A smartphone GPS accuracy study in urban environments reported overall average horizontal position accuracy of 7–13 m for iPhones, whilst a comparative study of GPS devices and mobile phones found that GPS-enabled smartphones are typically accurate to within a 4.9 m (16 ft) radius under open sky, with accuracy worsening near buildings, bridges, and trees. These point-level figures are not directly comparable with our error, which measures the distance between detected activity centres and ground-truth locations rather than the accuracy of individual fixes. Nevertheless, our findings are consistent with these reports and further demonstrate that, even with moderate spatial errors, behavioural features remain valid and consistent across platforms.
Regarding cluster detection accuracy, our results (N-error of 0.4–0.6 under optimal settings) suggest a substantial improvement over methods that rely on raw GPS coordinates without preprocessing. The need for robust preprocessing is further underscored by previous research analysing over 57 million GPS data points, which showed that prediction accuracy for depression dropped from approximately 80% in homogeneous student samples to approximately 60% in heterogeneous populations; such loss of generalisability highlights the importance of preprocessing algorithms, like the one developed in this study, that yield stable features across diverse users and devices.
The cross-platform consistency observed in our study (behavioural feature ratios: 0.98–1.08) addresses a critical gap identified in prior validation work. Previous studies have documented systematic differences between Android and iOS in GPS data completeness and quality, but few have demonstrated that these differences can be mitigated through appropriate algorithmic preprocessing whilst maintaining behavioural feature validity.
4.4. Limitations
Several limitations should be acknowledged when interpreting these findings.
Temporal Resolution Constraints. Whilst the clustering algorithm successfully identified major activity hubs, transient locations (e.g., brief stops under 40–45 min) were occasionally missed due to the temporal resolution threshold used in DBSCAN (min_samples = 10 points × 5 min intervals = 50 min minimum dwell time). For applications requiring detection of shorter activities, the min_samples parameter can be reduced (e.g., to 5 for a 25 min minimum duration), though this requires validation to balance sensitivity against false positive detection from GPS noise.
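The dwell-time arithmetic above can be made explicit. The sketch assumes regular sampling with no gaps and returns two readings of the threshold: the epoch-based duration used in the text (each fix standing for one 5 min epoch) and the elapsed time between the first and last of the required fixes, which is one interval shorter. The function name is hypothetical.

```python
def min_dwell_minutes(min_samples, interval_min=5):
    """Two readings of the DBSCAN dwell threshold under regular sampling.

    Returns (epoch_based, elapsed): epoch_based treats each fix as covering one
    sampling interval; elapsed is the time from the first to the last fix.
    """
    return min_samples * interval_min, (min_samples - 1) * interval_min
```

Here `min_dwell_minutes(10)` gives `(50, 45)` and `min_dwell_minutes(5)` gives `(25, 20)`; the 45 min elapsed span is consistent with stops shorter than about 40–45 min being missed. Because DBSCAN counts spatial neighbours rather than consecutive fixes, repeated short visits to the same place can also jointly reach min_samples.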
Limited Sample Size in Controlled Validation. The sample size for the controlled validation phase was limited (N = 5 participants over 5 days), which may constrain the statistical power and generalisability of the spatial accuracy findings. Although the real-world validation phase included a broader sample (N = 38 participants over 7 days), providing stronger evidence for ecological validity, future research should incorporate larger and more demographically and clinically diverse cohorts to further evaluate the robustness of the proposed method across different populations, age groups, geographic regions, and clinical conditions.
Absence of Naturalistic Ground Truth. A significant limitation of the real-world validation phase is the absence of concurrent ground truth behavioural assessments during naturalistic tracking. Without ecologically valid, time-aligned reference data, it remains difficult to definitively confirm whether all detected activity patterns correspond to actual meaningful behaviours or whether some meaningful activities were missed. The case analyses presented provide face validity through visual inspection and plausibility assessment, but lack quantitative ground truth verification. This limitation could be addressed in future studies by integrating multimodal sensing with physics-informed uncertainty modelling (e.g., FEM-AI-inspired approaches), thereby introducing contextual constraints from complementary sensors and improving discrimination between true behavioural patterns and GPS-related artefacts.
Transportation Mode Detection. Under the current algorithmic setting, prolonged low-speed congestion could, in principle, be misclassified as a stay if GPS points remain within ≈100 m for ≥50 min; although we did not observe comparable cases in our naturalistic dataset, this remains a limitation of GPS-only clustering when transportation context is unavailable. Whilst behaviourally trivial segments (e.g., motorway passages or railway transitions) were intentionally excluded from cluster detection, future applications may benefit from finer-grained transportation mode detection to differentiate passive transit from active engagement. Such information could provide additional behavioural context, for example, distinguishing between walking (potentially indicating exercise or social activity) and vehicular travel (more passive behaviour), and could also reduce false stay detection under traffic-related slow movement.
Device- and Environment-Dependent GPS Uncertainty. We note that environmental and device-related factors that could affect GPS quality were not explicitly controlled in this study. In dense urban areas, multipath and satellite-geometry effects (urban canyon conditions) might generate spurious point clouds that produce false micro-clusters and increase sensitivity to DBSCAN parameters (ε and min_samples). Additionally, device carrying position (e.g., pocket, bag, or hand) and device orientation can affect GPS signal reception via body shielding and antenna attenuation.
Furthermore, device-specific factors such as antenna quality, chipset capabilities, and operating system location service implementations may introduce systematic differences in GPS accuracy and noise characteristics. These contextual and device-related factors were not recorded in our study and could not be quantified. Although we indirectly mitigated such noise by requiring spatiotemporal persistence (min_samples = 10, or 50 min of sustained presence) and applying conservative clustering criteria, these measures may not completely eliminate false positives in extremely challenging environments.
More robust approaches that explicitly account for such uncertainties—for example, density-adaptive clustering methods (OPTICS, HDBSCAN), physics-informed uncertainty models that weight GPS points by estimated accuracy or device orientation, or multimodal validation using complementary sensors—remain important directions for future work.
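As one illustration of weighting GPS points by estimated accuracy, the sketch below computes an inverse-variance weighted centroid for a cluster, treating each fix's reported horizontal accuracy as a standard deviation in metres. This is a simplifying assumption (platforms define reported accuracy in their own ways), and the 1 m accuracy floor and the function name are hypothetical.

```python
def weighted_centroid(fixes, min_accuracy_m=1.0):
    """Inverse-variance weighted (lat, lon) of (lat, lon, accuracy_m) fixes.

    Reported accuracy is treated as a standard deviation, so weight = 1 / acc**2;
    accuracies below min_accuracy_m are floored to avoid huge or infinite weights.
    """
    weights = [1.0 / max(acc, min_accuracy_m) ** 2 for _, _, acc in fixes]
    total = sum(weights)
    lat = sum(w * f[0] for w, f in zip(weights, fixes)) / total
    lon = sum(w * f[1] for w, f in zip(weights, fixes)) / total
    return lat, lon
```

A single multipath outlier reporting 500 m accuracy then contributes roughly 1/10,000 of the weight of a 5 m fix, so it barely shifts the activity-centre estimate.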
Single-Country Context. This study was conducted entirely in South Korea, with a relatively homogeneous sample reflecting activity patterns typical of urban and suburban Korean environments. GPS performance, urban density, building structures, and daily routine patterns may differ substantially in other geographic and cultural contexts. Validation in diverse international settings would strengthen confidence in the method’s generalisability. In addition, future work could explore subgroups of users who share similar data-collection environments and investigate adaptive clustering strategies tailored to these contextual subtypes.
4.5. Clinical and Practical Implications
The validated GPS preprocessing algorithm has several important implications for digital phenotyping in mental health research and clinical practice.
Research Applications. The demonstrated reliability and cross-platform consistency enable researchers to deploy GPS-based digital phenotyping across diverse participant populations with less concern that platform differences will confound behavioural feature extraction. This facilitates large-scale, multi-site studies and may reduce the need for platform-specific calibration or separate analyses for Android and iOS users.
Clinical Monitoring. The algorithm’s ability to detect both stable routines and meaningful deviations suggests potential applications in clinical monitoring contexts. For example, detecting reduced location entropy, increased homestay, or missing habitual activity locations could serve as early warning indicators of depressive episodes, whilst detecting erratic mobility patterns or uncharacteristic long-distance travel might signal manic episodes in bipolar disorder. The passive, unobtrusive nature of GPS data collection makes it particularly suitable for longitudinal monitoring between clinical visits.
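A minimal way to turn such indicators into a per-person deviation signal is a z-score of today's value against the individual's own baseline window. The function below is a hypothetical sketch; the choice of baseline window and any alerting cut-off (for example |z| ≥ 2) would need clinical validation.

```python
from statistics import mean, stdev


def deviation_z(baseline_values, today_value):
    """z-score of today's feature value against a personal baseline (>= 2 days).

    Returns None when the baseline shows no variability, since z is undefined.
    """
    mu = mean(baseline_values)
    sd = stdev(baseline_values)
    return None if sd == 0 else (today_value - mu) / sd
```

For instance, a homestay proportion of 0.90 against a baseline hovering around 0.60 yields a z-score far above 3, whereas a value inside the usual range stays near 0.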
Practical Deployment Considerations. The findings regarding GPS accuracy modes have direct practical implications: researchers and clinicians should prioritise high-accuracy GPS settings (PRIORITY_ACCURACY on Android, BEST_FOR_NAVIGATION on iOS) when designing digital phenotyping studies, as battery-conserving modes substantially compromise detection accuracy. However, this must be balanced against participant burden, as high-accuracy modes increase battery drain and may reduce compliance over extended monitoring periods. Accordingly, future work should systematically examine the adherence-battery trade-off and develop energy-aware sensing strategies that preserve behavioural validity while minimising user burden for sustained longitudinal monitoring.
4.6. Future Directions
Building on these findings, several avenues for future research warrant attention.
Parameter Optimisation. Optimising temporal and spatial clustering parameters for different behavioural contexts may improve sensitivity to transient but clinically relevant activities. For example, context-aware thresholding (e.g., lower min_samples during daytime hours, higher thresholds during sleep periods) or location-specific settings (e.g., smaller ε in dense urban areas, larger ε in suburban settings) could enhance detection sensitivity whilst maintaining specificity. Although we employed fixed DBSCAN parameters to ensure interpretability and reproducibility, future work should explore adaptive or hybrid strategies in which ε and/or min_samples are dynamically adjusted using contextual or signal-quality cues (e.g., indoor/outdoor likelihood, motion state, GPS sampling density). Physics-informed and AI-enhanced frameworks (e.g., FEM-AI-inspired approaches) offer a useful conceptual foundation for such context-sensitive thresholds by explicitly modelling measurement uncertainty and learning cross-modal consistency patterns to guide clustering decisions.
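The context-aware thresholding idea above might look like the following rule-based sketch. The 08:00–20:00 daytime window, the density categories, and the scaling factors are purely hypothetical placeholders, not validated settings.

```python
def clustering_params(hour, density, base_eps_m=100.0, base_min_samples=10):
    """Hypothetical context-aware DBSCAN settings.

    hour: local hour of day (0-23); density: "dense", "suburban", or anything else.
    """
    # Smaller eps in dense urban areas, larger in suburban settings.
    eps_scale = {"dense": 0.5, "suburban": 1.5}.get(density, 1.0)
    daytime = 8 <= hour < 20
    # Lower persistence threshold by day to catch shorter stops; keep the base at night.
    min_samples = max(2, base_min_samples // 2) if daytime else base_min_samples
    return base_eps_m * eps_scale, min_samples
```

Because standard DBSCAN uses a single ε and min_samples per run, context-specific parameters would in practice mean clustering each context partition separately, or moving to a density-adaptive method such as HDBSCAN.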
Multimodal Data Integration. Integrating multimodal passive sensing data represents a promising avenue for enriching behavioural inference beyond GPS alone. Following recent advances in physics-informed AI approaches for biomedical monitoring [13], future implementations could incorporate physics-informed uncertainty modelling to better characterise GPS measurement error based on environmental context (e.g., urban density, satellite visibility, signal-to-noise ratios). We envision multimodal sensor integration serving two complementary purposes that address different limitations of GPS-based phenotyping:
(i) Validating GPS-detected locations (≥50 min threshold): Complementary sensors could provide cross-modal confirmation of detected activity centres. For example, extended presence at a location cluster could be validated against concordant evidence from physiological sensors (e.g., heart rate variability or electrodermal activity indicating quiescence), accelerometry (e.g., indicating sedentary behaviour), environmental sensors (e.g., ambient temperature, light, or noise levels indicating indoor context), or device usage patterns (e.g., screen time indicating stationary engagement). When multiple sensor modalities provide concordant evidence, confidence in the behavioural interpretation increases; conversely, discordant signals may indicate GPS measurement error or behavioural state transitions requiring closer examination.
(ii) Detecting brief events (<50 min threshold): High-temporal-resolution sensors could identify transient but clinically meaningful events that fall below the current GPS clustering threshold. For instance, accelerometry and step count can capture brief periods of physical activity or social movement; heart rate variability and electrodermal activity can detect acute stress or anxiety episodes; and screen time or application usage can indicate engagement patterns. These sensors provide behavioural context during periods when GPS spatial displacement is minimal or location stops are too brief for cluster formation.
This dual-purpose approach would enable more comprehensive behavioural phenotyping across temporal scales: GPS provides spatial context for longer-duration activities, whilst high-frequency sensors capture brief fluctuations and validate detected patterns. Additionally, sensor fusion could enable automated activity classification (e.g., distinguishing home rest from work activity, or differentiating social visits from solitary activities), enhancing the clinical interpretability of mobility patterns.
Missingness and Imputation. Although we did not observe substantial limitations at the intended 5 min sampling interval in this study, real-world deployments may still exhibit intermittent location-data gaps that affect feature stability. Future work should systematically investigate optimal imputation strategies for intermittent location-data gaps (e.g., bounded carry-forward, short-gap interpolation, or motion state-based imputation) and define appropriate tolerance thresholds to minimise spurious behavioural inference.
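The bounded carry-forward and short-gap interpolation strategies named above can be sketched for a regularly sampled series in which missing epochs are `None`. Gaps longer than `max_gap` epochs, and gaps with no preceding fix, are deliberately left unfilled; the parameter names and the default of three epochs (15 min at a 5 min interval) are hypothetical.

```python
def impute_gaps(series, max_gap=3, method="carry_forward"):
    """Fill short runs of None in a regularly sampled list of (lat, lon) fixes.

    Runs longer than max_gap epochs, or with no preceding fix, are left as None.
    method="interpolate" additionally needs a following fix.
    """
    if method not in ("carry_forward", "interpolate"):
        raise ValueError(f"unknown method: {method}")
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap = i - start  # length of this run of missing epochs
        prev = out[start - 1] if start > 0 else None
        nxt = out[i] if i < n else None
        if prev is None or gap > max_gap:
            continue
        if method == "interpolate" and nxt is None:
            continue
        for k in range(start, i):
            if method == "carry_forward":
                out[k] = prev
            else:
                # Linear interpolation between the fixes bounding the gap.
                frac = (k - start + 1) / (gap + 1)
                out[k] = (prev[0] + frac * (nxt[0] - prev[0]),
                          prev[1] + frac * (nxt[1] - prev[1]))
    return out
```

Leaving long gaps unfilled is the point of the bound: it prevents a stale fix from silently inflating dwell time at the last known location.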
Naturalistic Ground Truth Validation. Future studies should prioritise developing and implementing methods for capturing concurrent ground truth behavioural assessments during naturalistic tracking. Approaches such as GPS-triggered ecological momentary assessment (where brief surveys are automatically prompted when new locations are detected), participant-initiated activity logging through simple mobile interfaces, or integration with calendar data could provide time-aligned validation whilst minimising participant burden. Where feasible, pairing ecological momentary assessment data or participant-reported activity labels with complementary passive signals (e.g., wearable-derived activity state or physiological responses) may provide additional objective anchors for short-duration events that are difficult to validate using GPS alone. Such designs would also enable systematic quantitative evaluation of behavioural inference (e.g., agreement between detected clusters/events and self-reported or sensor-anchored ground truth).
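Agreement between detected clusters and participant-labelled places could, for instance, be summarised as precision and recall under a distance tolerance. The sketch below uses greedy one-to-one matching and a 50 m tolerance borrowed from the spatial tolerance discussed earlier; both the matching rule and the helper names are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_agreement(detected, labelled, tol_m=50.0):
    """Precision and recall of detected centroids against labelled places.

    Each detected centroid is greedily matched to its nearest still-unmatched
    labelled place; a match counts only if it lies within tol_m metres.
    """
    unmatched = list(labelled)
    hits = 0
    for d in detected:
        nearest = min(unmatched, key=lambda place: haversine_m(d, place), default=None)
        if nearest is not None and haversine_m(d, nearest) <= tol_m:
            unmatched.remove(nearest)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(labelled) if labelled else 0.0
    return precision, recall
```

Precision penalises spurious clusters (e.g., GPS artefacts), while recall penalises labelled places the algorithm missed, so the two map directly onto the false-positive and missed-activity concerns raised above.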
Clinical Outcome Validation. Longitudinal studies examining the relationship between GPS-derived mobility features and clinical outcomes (e.g., depression symptom trajectories, relapse events, treatment response) are needed to establish the clinical utility of these metrics. Such studies should examine both concurrent validity (association between mobility patterns and current symptom severity) and predictive validity (ability of mobility changes to forecast future symptom changes or clinical events). Future work should also assess longer-term structural changes in dominant locations (e.g., relocation), which are best inferred post hoc by tracking sustained shifts in centroids, visit frequency, or dwell-time patterns across weeks, potentially using drift or change-point detection to distinguish persistent relocation from short-lived anomalies [17,18].
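As one illustration of change-point detection for relocation, the sketch applies a one-sided CUSUM to a daily series of distances (in metres) between the day's dominant night-time location and the baseline home centroid, which is assumed to be computed upstream. Each day's contribution is capped so that a single long trip cannot raise an alarm on its own; the target, slack, cap, and threshold values are hypothetical.

```python
def relocation_alarm(daily_shift_m, target_m=20.0, slack_m=100.0,
                     cap_m=300.0, threshold_m=500.0):
    """Index of the first day a one-sided CUSUM flags a sustained shift, else None."""
    s = 0.0
    for day, x in enumerate(daily_shift_m):
        x = min(x, cap_m)  # cap so one long trip cannot trigger an alarm alone
        # Accumulate only the excess above the expected shift plus a slack allowance.
        s = max(0.0, s + x - target_m - slack_m)
        if s > threshold_m:
            return day
    return None
```

With these defaults, each day spent far from the baseline adds 180 to the statistic, so a move is flagged on the third consecutive day away, whereas one- or two-day trips decay back to zero once the person returns home.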
Algorithm Refinement. Future algorithm development could explore alternative clustering approaches (e.g., hierarchical clustering, Gaussian mixture models) or machine learning methods (e.g., recurrent neural networks for sequence modelling) that might capture more complex behavioural patterns. Additionally, developing automated methods to distinguish between different activity types at detected clusters (e.g., home vs. work vs. social vs. healthcare) would enhance clinical interpretability.
5. Conclusions
This study provides robust evidence that GPS data from personal smartphones, when processed using an appropriate preprocessing algorithm, can reliably detect major activity centres and yield behaviourally meaningful mobility patterns for digital phenotyping in mental health applications. The demonstrated spatial accuracy (<50 m error), cluster detection reliability (N-error <1 under optimal settings), and cross-platform consistency (behavioural feature agreement within 8%) establish a methodological foundation for deploying GPS-based digital phenotyping at scale.
Beyond methodological validation, the proposed framework enables the derivation of clinically interpretable behavioural metrics that may help objectify behavioural assessment. Examples include routine regularity (e.g., stability of dominant activity centres across days), mobility diversity (e.g., entropy of visited locations), time-at-home proportion, and deviation indices capturing departures from an individual’s typical spatiotemporal patterns. Such metrics align with clinically relevant constructs such as behavioural activation, social withdrawal, circadian regularity, and functional mobility, which are commonly implicated in mental health conditions.
The algorithm’s ability to capture both stable routines and meaningful deviations in naturalistic settings, combined with its robustness across different transportation modes and activity contexts, supports its utility for real-world mental health monitoring. Whilst limitations including temporal resolution constraints and the need for naturalistic ground truth validation remain, these findings represent an important step towards leveraging ubiquitous smartphone technology for objective, continuous behavioural assessment in mental health research and clinical care.