Article

Development and Validation of a GPS Error-Mitigation Algorithm for Mental Health Digital Phenotyping

1
Medical Research Team, Digital Medic Co., Ltd., Seoul 06159, Republic of Korea
2
Department of Psychiatry, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin 16995, Republic of Korea
3
Institute of Behavioral Science in Medicine, Yonsei University College of Medicine, Seoul 06159, Republic of Korea
4
Technical Development Team, Digital Medic Co., Ltd., Seoul 06159, Republic of Korea
5
Department of Human-Centered Artificial Intelligence, College of Intelligence Information Engineering, Sangmyung University, Seoul 06159, Republic of Korea
*
Authors to whom correspondence should be addressed.
Electronics 2026, 15(2), 272; https://doi.org/10.3390/electronics15020272
Submission received: 28 November 2025 / Revised: 29 December 2025 / Accepted: 6 January 2026 / Published: 7 January 2026
(This article belongs to the Section Electronic Materials, Devices and Applications)

Abstract

Mobile Global Positioning System (GPS) data offer a promising approach to inferring mental health status through behavioural analysis. Whilst previous research has explored location-based behavioural indicators including location clusters, entropy, and variance, persistent GPS measurement errors have compromised data reliability, limiting the practical deployment of smartphone-based digital phenotyping systems. This study develops and validates an algorithmic preprocessing method designed to mitigate inherent GPS measurement limitations in mobile health applications. We conducted comprehensive evaluation through controlled experimental protocols and naturalistic field assessments involving 38 participants over a seven-day period, capturing GPS data across diverse environmental contexts on both Android and iOS platforms. The proposed preprocessing algorithm demonstrated exceptional precision, consistently detecting major activity centres within an average 50-metre margin of error across both platforms. In naturalistic settings, the algorithm yielded robust location detection capabilities, producing spatial patterns that reflected plausible and behaviourally meaningful traits at the individual level. Cross-platform analysis revealed consistent performance regardless of operating system, with no significant differences in accuracy metrics between Android and iOS devices. These findings substantiate the potential of mobile GPS data as a reliable, objective source of behavioural information for mental health monitoring systems, contingent upon implementing sophisticated error-mitigation techniques. The validated algorithm addresses a critical technical barrier to the practical implementation of GPS-based digital phenotyping, enabling the more accurate assessment of mobility-related behavioural markers across diverse mental health conditions. 
This research contributes to the growing field of mobile health technology by providing a robust algorithmic framework for leveraging smartphone sensing capabilities in healthcare applications.

1. Introduction

Recent advances in mobile technology enable continuous location tracking, which has emerged as a promising approach to digital phenotyping—the passive, real-time measurement of behaviour using smartphone sensors [1]. In mental health contexts, location-based metrics derived from GPS data have been increasingly linked to symptoms of depression, anxiety, and other psychiatric conditions. For example, features such as the number of location clusters, location entropy, and travel distance demonstrated strong associations with depressive symptom severity. Individuals with higher depressive symptoms are likely to exhibit reduced mobility, lower location entropy, and increased time spent at home [2,3,4]. Features including homestay and location variance have consistently correlated with depression severity, highlighting the clinical potential of GPS-based behavioural metrics [3].
Despite the growing use of these features in mental health applications, raw GPS data remain problematic due to various sources of error, including erratic location points, unintended path tracking, and differences in data quality across mobile operating systems [5]. These technical inconsistencies could significantly diminish the reliability and interpretability of behavioural metrics, particularly when data are combined across devices. Recent sensor validation studies identified systematic differences between Android and iOS platforms in terms of GPS data completeness, noise levels, and sampling rates [6]. iOS devices, whilst less prone to noise, often experience data gaps due to stricter background sensing policies, whereas Android devices may collect more frequent but noisier data [7,8]. Therefore, ensuring accuracy and cross-platform consistency is essential for scalable digital phenotyping applications.
Moreover, the value of GPS tracking in mental health applications lies not in strict geographical precision, but in its ability to extract meaningful behavioural insights that capture patterns correlating with mental health states and provide actionable, clinically relevant information [9]. Accuracy in this context refers to whether GPS-derived data can detect behavioural regularities and deviations that correspond to mental health symptoms. Reliability denotes the consistency of behavioural feature extraction across different technological conditions, including devices, operating systems, and environmental contexts. From this functional perspective, minor geographic discrepancies are acceptable if the derived metrics remain stable and behaviourally informative.
Several studies attempted to address GPS accuracy issues in mobile health applications. Algorithmic approaches including Kalman filtering [10], distance-based filtering [11], and multi-sensor fusion techniques [12] have been proposed to mitigate measurement errors. More recently, hybrid frameworks that combine physical modelling with AI-enhanced feature extraction have been shown to substantially improve monitoring accuracy in biomedical electronics [13]. In this context, the finite element method (FEM) offers a useful conceptual foundation: FEM is a numerical approach for solving physics-based governing equations (e.g., heat diffusion) by discretising a continuous domain into small elements and computing field variables under material and boundary constraints. When used to guide representation learning, FEM can provide mechanistically grounded priors or uncertainty estimates that complement data-driven inference. Although Pratticò et al. focus on thermographic monitoring using FEM-guided representations, their approach offers a valuable parallel for digital phenotyping by illustrating how mechanistic constraints can be incorporated into inference and how complementary signals can be integrated to improve robustness. Analogously, GPS-based phenotyping could benefit from physics-informed uncertainty modelling, together with multimodal sensor cues (e.g., bioelectrical, thermographic, or embedded intelligence), to better disambiguate true behavioural changes from measurement artefacts.
However, most existing validation studies focused either on technical GPS accuracy metrics (e.g., metres of spatial error) or on clinical correlations, with limited integration of both perspectives. Furthermore, few studies systematically evaluated preprocessing algorithms across both major mobile platforms under naturalistic conditions whilst maintaining focus on behavioural feature validity for mental health applications.
This study addresses these research gaps by developing and validating a preprocessing algorithm specifically designed to enhance GPS-based location detection for mental health digital phenotyping. We hypothesise that:
  • A preprocessing algorithm can enable mobile GPS data to detect major activity centres with high behavioural accuracy (within a tolerable margin of spatial error) across both Android and iOS platforms.
  • Location-based behavioural metrics derived from preprocessed GPS data will demonstrate consistency across different mobile platforms and environmental contexts.
  • The preprocessing algorithm will yield behavioural patterns that are plausible and meaningful at the individual level in naturalistic settings.
To test these hypotheses, we employed a two-phase research design involving both controlled experiments and real-world observational tracking. In the first phase, participants followed predefined protocols at known activity locations, enabling direct comparison between algorithm-detected clusters and ground truth. In the second phase, passive GPS data were collected in naturalistic settings without explicit instructions, allowing validation of the methodology under ecologically valid conditions. This integrative approach enables comprehensive assessment of the robustness, sensitivity, and reliability of the proposed preprocessing algorithm for GPS-based behavioural tracking in mental health applications.

2. Materials and Methods

2.1. Research Design

This study employed a two-phase validation approach to evaluate the accuracy and reliability of GPS-based location detection for mental health digital phenotyping applications. Phase 1 consisted of controlled action-based experiments with predefined activities and ground truth validation, whilst Phase 2 involved naturalistic observational tracking in uncontrolled settings. Both phases collected GPS data across multiple devices and operating systems to assess cross-platform consistency and algorithm robustness under diverse conditions.

2.2. Phase 1: Controlled Action-Based Validation

2.2.1. Experiment Protocol

We implemented a controlled action-based testing protocol to evaluate the accuracy and reliability of GPS data under known conditions. Having defined accuracy as the capacity to detect major activity centres, we established ground truth data for essential daily behaviours. Participants completed pre-designed activities without restrictions on specific locations or temporal sequences. They maintained detailed activity logs documenting the temporal and spatial parameters of each activity, including start and end times, location addresses, and corresponding GPS coordinates.

2.2.2. Activity Categories

The experimental protocol comprised five activity categories designed to represent typical daily movement patterns:
  • Home Activity: Participants remained within or in close proximity to their residence for extended periods, establishing a primary stationary activity hub. This activity represented the baseline location where individuals typically spend the most time.
  • Work/School Activity: Participants travelled to and remained at their workplace or educational institution, creating a second distinct activity centre. This activity represented regular, recurring locations central to daily routines.
  • Stationary Activity: Participants visited at least one additional location (e.g., gymnasium, café, library) for a duration exceeding one hour, recorded as separate activity hubs. This component was designed to verify whether the algorithm could classify meaningful outdoor visits as major activities distinct from home and work locations.
  • Peripheral Activity: Participants conducted minor behaviours in proximity to their major activity hubs (home, work, or school). This component assessed whether brief, trivial activities would contaminate the detection of major hubs. Examples included visits to convenience stores near residences or brief outdoor excursions in the vicinity of primary locations.
  • Supplementary Activities (Optional): Additional activities beyond those predesignated were permitted and documented alongside required activities. When present, these were classified as major activities for validation purposes.

2.2.3. Multi-Device Data Collection

To assess the reliability of GPS-based tracking across different technological environments, participants carried multiple devices with different operating systems simultaneously throughout the experimental period. Each participant was equipped with four devices: three Android smartphones—Galaxy Z Flip3 (SM-F711N), Galaxy S23 (SM-S911N), and Galaxy A31 (SM-A315N) (Samsung Electronics Co., Ltd., Suwon, Republic of Korea)—and one iOS device—iPhone 12 mini (Apple Inc., Cupertino, CA, USA). All devices were configured to collect real-time GPS coordinates independently using the same data collection application (Big4+ v1.2.6; Digital Medic Co., Ltd., Seoul, Republic of Korea), installed separately on each device. Android devices ran Android 12 or later (Google LLC, Mountain View, CA, USA), and the iOS device ran iOS 16.0 or later (Apple Inc., Cupertino, CA, USA). This multi-device approach enabled direct comparison of GPS data quality and algorithm performance across platforms under identical movement conditions, controlling for individual behavioural variability and environmental factors.

2.2.4. Ground Truth Documentation

To characterise specific contexts of daily movement and activities, participants maintained comprehensive travel records throughout the experimental period. These records included addresses and GPS coordinates (latitude and longitude) of departure and arrival points for all movements between activity centres, as well as the transportation modalities utilised for each journey (including walking, public transport, private vehicle, or other modes). This documentation provided precise ground truth data for validation of algorithm-detected activity centres.

2.3. Phase 2: Naturalistic Observational Study

Beyond controlled scenarios, we examined how the detection algorithm performs with unpredictable real-world data through an observational study. Participants were instructed to maintain their personal mobile devices’ GPS tracking applications actively running throughout the entire study duration. No additional behavioural instructions or activity restrictions were imposed, allowing participants to follow their natural daily routines. This phase provided ecologically valid data for assessing algorithm performance under authentic conditions without artificial constraints.

2.4. Data Collection and Preprocessing

2.4.1. Raw GPS Data Collection

GPS data were collected using the Big4+ application (v1.2.6; Digital Medic Co., Ltd., Seoul, Republic of Korea), a mobile application designed to collect various passive sensing data including GPS coordinates, step count, and movement distance [8]. Data collection occurred at 5 min intervals during daily active hours (09:00 to 23:59) throughout the study period. Each location record captured critical spatial and temporal metadata, including precise latitude, longitude, and corresponding timestamps.
To assess the trade-off between location accuracy and battery consumption, device-specific accuracy modes were strategically employed across different devices. Higher accuracy modes typically consume more battery power but provide more precise location data [9,10,11]:
  • Android devices utilised three accuracy settings: PRIORITY_ACCURACY, BALANCED_POWER_ACCURACY, and PRIORITY_LOW_POWER.
  • iOS devices operated in two accuracy modes: BEST_FOR_NAVIGATION and REDUCED_ACCURACY.
This multi-mode approach enabled evaluation of the algorithm’s robustness across different data quality conditions and provided insights into optimal settings for real-world deployment.

2.4.2. Location Clustering Algorithm

Following data collection, we implemented location clustering using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to identify meaningful activity centres from raw GPS coordinates [12,14]. We selected DBSCAN for its ability to identify clusters of arbitrary shape and its robustness to noise, making it particularly suitable for GPS data that may contain erratic points.
The DBSCAN algorithm requires two core hyperparameters: epsilon (ε), which defines the maximum distance between neighbouring points within a cluster, and min_samples, which specifies the minimum number of points required to form a dense cluster. The parametric values used in this research are summarised in Table 1.
We set epsilon (ε) to 0.001 degrees. To contextualise this value in physical space, we converted degrees to metres using standard geographic approximations. The linear ground distance corresponding to the preset parametric value (ε) at mean latitude (φ) can be approximated by Equation (1) [15,16]:
$$d_{N\text{-}S} = \varepsilon \times 111{,}139\ \text{m} \quad \text{and} \quad d_{E\text{-}W} = \varepsilon \times 111{,}139\ \text{m} \times \cos\varphi \qquad (1)$$
where 111,139 m is the Earth's meridional length per degree, $d_{N\text{-}S}$ is the north–south geodistance, and $d_{E\text{-}W}$ is the east–west geodistance. Here, d represents distance in metres, ε is the epsilon parameter in degrees, and φ is the latitude in radians.
Since one degree of latitude corresponds to approximately 111 km, 0.001 degrees translates to approximately 111 m in the north–south direction. For longitude, the conversion depends on the cosine of the latitude. At South Korea’s typical latitudes (approximately 37–38° N), one degree of longitude equals approximately 87–88 km, meaning our clustering radius in the east–west direction was approximately 87–88 m.
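The conversion in Equation (1) can be checked numerically. The following is a minimal sketch, not the authors' code; the function name `epsilon_to_metres` and the example latitude of 37.5° N are assumptions chosen for illustration.

```python
import math

# Metres per degree of latitude, as used in Equation (1).
METRES_PER_DEGREE = 111_139


def epsilon_to_metres(eps_deg, lat_deg):
    """Return (north-south, east-west) ground distance in metres for eps_deg.

    eps_deg: clustering radius in degrees.
    lat_deg: latitude in degrees (converted to radians for the cosine).
    """
    d_ns = eps_deg * METRES_PER_DEGREE
    d_ew = eps_deg * METRES_PER_DEGREE * math.cos(math.radians(lat_deg))
    return d_ns, d_ew


# Hypothetical example: epsilon = 0.001 degrees at 37.5 degrees north.
d_ns, d_ew = epsilon_to_metres(0.001, 37.5)
print(f"N-S: {d_ns:.1f} m, E-W: {d_ew:.1f} m")
```

At this latitude the east–west extent comes out at roughly 88 m, in line with the range quoted above.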
We set the min_samples parameter to 10 to ensure that only locations where the participant remained within the given radius for at least 50 min (10 consecutive points × 5 min sampling interval) would qualify as a location cluster. This constraint helped filter out short, transient stops and isolated GPS points that did not represent meaningful activity centres.
Once clustering was completed, we calculated the geographic centre (centroid) of each identified cluster by averaging the latitude and longitude coordinates of all GPS points within that cluster. These centroid coordinates were treated as the representative locations of the detected activity centres for subsequent analyses.
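The clustering and centroid steps described above can be sketched as follows. This is a simplified, pure-Python illustration under stated assumptions, not the authors' implementation: it treats latitude and longitude as planar coordinates with a Euclidean ε in degrees, the function names `dbscan` and `cluster_centroids` are hypothetical, and the sample coordinates are invented. In practice a library implementation (for example, scikit-learn's `DBSCAN`) would normally be used. Standard DBSCAN counts neighbours within ε regardless of their order in time, so the 50 min interpretation holds only when the qualifying points are recorded at the regular 5 min interval.

```python
def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j in range(n)
                if (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2 <= eps ** 2]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


def cluster_centroids(points, labels):
    """Average latitude and longitude of the points in each cluster."""
    sums = {}
    for (lat, lon), label in zip(points, labels):
        if label == -1:
            continue  # noise points do not contribute to any centroid
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2]) for label, acc in sums.items()}


# Invented data: two stays of 12 samples each, separated by 3 transit points.
home = [(37.5000 + i * 0.00002, 127.0000) for i in range(12)]
transit = [(37.5030, 127.0060), (37.5050, 127.0100), (37.5070, 127.0140)]
work = [(37.5100 + i * 0.00002, 127.0200) for i in range(12)]
points = home + transit + work

labels = dbscan(points, eps=0.001, min_samples=10)
centres = cluster_centroids(points, labels)
print(labels)
print(centres)
```

With these inputs the two stays form separate clusters and the transit points are labelled as noise, which mirrors the filtering behaviour described in the text.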

2.5. Data Analysis

The data analysis employed a multi-step approach to comprehensively evaluate GPS tracking performance for mental health applications:

2.5.1. Cluster Detection Accuracy

GPS data points were grouped into activity hubs using the DBSCAN algorithm based on spatial proximity (within an approximately 100-metre radius) and temporal consistency (sustained presence at the location for a minimum duration of 50 min). The total number of detected clusters was compared with the reported number of ground truth activity centres documented in participants’ activity logs. This comparison assessed the algorithm’s accuracy in identifying the correct number of meaningful activity locations, which is a fundamental requirement for deriving valid mobility features in mental health research.

2.5.2. Spatial Error Assessment

In addition to detecting the overall number of activity centres, geographical alignment of each cluster’s centroid to its corresponding ground truth location was evaluated. Geodesic distance was calculated between the centroid of each detected activity hub and its corresponding ground truth coordinates using the Haversine formula, which accounts for the Earth’s curvature. This metric served as an indicator of spatial precision, which can significantly affect subsequent processing steps when extracting mobility features that depend on the spatial location of activity centres (e.g., homestay duration, location variance, entropy).
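For reference, the Haversine distance between a cluster centroid and a ground truth coordinate can be computed as in the sketch below. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are assumptions; the source does not state which radius was used.

```python
import math

# Assumed mean Earth radius in metres (a commonly used value; not from the source).
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical centroid and ground truth coordinates, 0.001 degrees apart in latitude.
error_m = haversine_m(37.5000, 127.0000, 37.5010, 127.0000)
print(f"distance error: {error_m:.1f} m")
```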

2.5.3. Cross-Platform Feature Consistency

Mobility-derived features commonly used in behavioural mental health research were extracted and compared across different mobile operating systems and devices, whilst controlling for identical ground truth behaviour. The evaluated features included:
  • Location Entropy: A measure of the variability in time spent across different locations, calculated as:
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i,$$
where $p_i$ represents the proportion of time spent at location i, and n is the total number of locations.
  • Normalised Location Entropy: Location entropy normalised by the logarithm of the number of locations, providing a scale-independent measure of location diversity.
  • Location Variance: A measure of the spatial spread of visited locations, calculated as the variance in latitude and longitude coordinates.
This comparison aimed to assess the consistency of feature outputs and ensure methodological reliability under varying technological environments, which is critical for multi-platform deployment in clinical settings.
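The three features listed above can be sketched in a few lines. This is an illustrative sketch under assumptions rather than the authors' code: the function names are hypothetical, time at each location is passed as a list of durations, and location variance is returned as separate population variances of latitude and longitude because the text does not specify how the two are combined.

```python
import math
import statistics


def location_entropy(durations):
    """Shannon entropy (base 2) of the proportion of time spent at each location."""
    total = sum(durations)
    probs = [d / total for d in durations if d > 0]
    return -sum(p * math.log2(p) for p in probs)


def normalised_location_entropy(durations):
    """Entropy divided by log2 of the number of visited locations."""
    n = sum(1 for d in durations if d > 0)
    if n < 2:
        return 0.0  # a single location carries no diversity; avoids dividing by zero
    return location_entropy(durations) / math.log2(n)


def location_variance(points):
    """Population variance of latitude and of longitude, returned separately.

    How the two variances are combined is not stated in the source, so this
    sketch leaves them uncombined (an assumption).
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return statistics.pvariance(lats), statistics.pvariance(lons)


# Hypothetical minutes spent at three locations.
minutes = [480, 240, 240]
print(location_entropy(minutes))             # 1.5
print(normalised_location_entropy(minutes))
print(location_variance([(37.50, 127.00), (37.51, 127.02), (37.52, 127.04)]))
```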

2.5.4. Naturalistic Case Analysis

A detailed case analysis was conducted using data from Phase 2 to evaluate the interpretability and behavioural plausibility of derived mobility patterns across naturally occurring conditions. This analysis examined diverse scenarios including weekday versus weekend routines, different transportation modes, varying travel distances, and diverse environmental contexts (e.g., urban versus suburban settings). The goal was to assess the algorithm’s capacity to produce mobility patterns that reflect meaningful individual behaviours in contextually rich, real-world settings without artificial experimental constraints.
This comprehensive analytical framework enabled robust evaluation of GPS tracking performance from a mental health research perspective, accounting for both technological variability and ecological validity in naturalistic conditions.

3. Results

3.1. Participant Characteristics

In Phase 1 (controlled action-based validation), we recruited five participants who demonstrated high compliance with the experimental protocol. The study was conducted over five consecutive days, from 1 to 5 April 2023, during which each participant carried the provided test devices for one full day. This design enabled systematic evaluation of algorithm performance under controlled conditions with known ground truth data.
Phase 2 (naturalistic observational study) expanded the validation to a broader real-world context, encompassing 38 participants, each monitored over a seven-day period from 1 to 7 December 2023. This phase provided ecologically valid data for assessing algorithm performance in uncontrolled, everyday settings.

3.2. Phase 1: Controlled Action-Based Validation

3.2.1. Cluster Detection Accuracy

Based on the documented activity logs, we established a ground truth dataset representing the primary activity hubs and their geographic locations for each participant. Using this reference, we determined the true number of location clusters—corresponding to the number of major activity centres—and compared them with cluster counts derived from the DBSCAN algorithm. To evaluate the precision of different mobile operating systems and accuracy modes, we computed the average location cluster count error (N-error) across participants, defined as the absolute difference between detected and ground truth cluster counts (Figure 1).
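As a small illustration of the N-error definition above, the following sketch averages the absolute difference between detected and ground truth cluster counts. The function name `mean_n_error` and the counts are hypothetical and are not study data.

```python
def mean_n_error(detected_counts, truth_counts):
    """Mean absolute difference between detected and true cluster counts."""
    if len(detected_counts) != len(truth_counts):
        raise ValueError("one detected count is needed per ground truth count")
    diffs = [abs(d - t) for d, t in zip(detected_counts, truth_counts)]
    return sum(diffs) / len(diffs)


# Hypothetical counts for five participants (not taken from the study).
detected = [3, 4, 2, 5, 3]
truth = [4, 4, 4, 4, 4]
n_error = mean_n_error(detected, truth)
print(n_error)  # 1.0
```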
Android Device Performance. For Android devices, the PRIORITY_ACCURACY mode yielded the lowest average location N-error (0.6), whilst the BALANCED_POWER_ACCURACY and PRIORITY_LOW_POWER modes showed considerably higher errors (1.8 and 3.0, respectively). As illustrated in Figure 1A, cluster counts under the PRIORITY_ACCURACY mode typically deviated by no more than one from the true number of activity hubs. In contrast, both BALANCED_POWER_ACCURACY and PRIORITY_LOW_POWER modes consistently underestimated the number of hubs, indicating reduced sensitivity in detecting individual behavioural patterns. The progressive increase in N-error with lower accuracy priority modes demonstrates a clear trade-off between battery conservation and cluster detection accuracy.
iOS Device Performance. For iOS devices, the BEST_FOR_NAVIGATION accuracy mode exhibited a low average N-error (0.4), comparable to Android’s PRIORITY_ACCURACY mode and representing the best performance across all tested configurations. As shown in Figure 1B, this mode missed at most one location in the majority of cases. Conversely, the REDUCED_ACCURACY mode presented a substantially higher average N-error (1.8), frequently omitting more than three hubs and, in some instances, overestimating the cluster count. This suggests that the REDUCED_ACCURACY mode may yield distorted representations of individual mobility behaviours, limiting its utility for mental health applications where accurate activity hub detection is critical.

3.2.2. Spatial Accuracy Assessment

Subsequently, distance error (measured in metres) was computed to evaluate the spatial accuracy of detected cluster centroids in relation to ground truth locations. This analysis focused on three predefined essential activity hubs—Home, Work/School, and Stationary activities—which were consistently present across all participants. Distance errors were calculated using the Haversine formula to account for Earth’s curvature, and comparisons were made between Android and iOS devices operating in their optimal accuracy modes (PRIORITY_ACCURACY and BEST_FOR_NAVIGATION, respectively).
Table 2 presents the average distance errors for each activity type across both platforms. For the Home activity, the average distance error was 44.67 m on Android devices and 37.83 m on iOS devices. For Work/School locations, the errors were 30.04 m and 19.98 m, respectively. In the case of Stationary activities, Android devices showed an average error of 30.92 m, whilst iOS devices exhibited a lower error of 20.32 m. Across all essential activity hubs, the average distance error remained below 50 m for both Android and iOS platforms.
These findings indicate that the algorithm is capable not only of accurately estimating the number of major activity hubs but also of precisely localising their positions in geographic space. The consistently low spatial errors (well below the DBSCAN clustering radius of ~100 m) validate the algorithm’s suitability for deriving location-based behavioural features in mental health research.

3.2.3. Cross-Platform Behavioural Feature Consistency

In addition to evaluating geographical precision, we examined behavioural features commonly derived from location clusters—namely, number of locations, location entropy, normalised location entropy, and location variance. These features are intended to characterise individual behavioural patterns and are frequently employed in digital phenotyping studies for mental health assessment. However, their validity may be compromised by environmental and technological factors, such as device type or operating system differences, which can introduce systematic bias. To assess the robustness of these behavioural metrics, we compared feature values between Android (PRIORITY_ACCURACY mode) and iOS (BEST_FOR_NAVIGATION mode) devices under identical ground truth conditions.
Table 3 presents the ratio of mean feature values across operating systems, where a value of 1.00 indicates perfect consistency. The ratios for number of locations, location entropy, normalised location entropy, and location variance were 1.08, 1.05, 0.98, and 1.02, respectively. These values, all within 8% of perfect agreement, suggest minimal discrepancies between operating systems when behavioural features are derived from preprocessed GPS data.
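The ratio computation can be sketched as below. The direction of the ratio (Android mean divided by iOS mean) is an assumption, since the text does not state it, and the function name `feature_ratios` and the feature values are hypothetical.

```python
import statistics


def feature_ratios(android, ios):
    """Ratio of mean feature values (Android / iOS, an assumed direction).

    android, ios: dicts mapping a feature name to a list of per-participant values.
    """
    return {
        name: statistics.mean(android[name]) / statistics.mean(ios[name])
        for name in android
    }


# Hypothetical per-participant feature values (not taken from the study).
android_values = {"n_locations": [4, 4, 4], "entropy": [1.4, 1.5, 1.6]}
ios_values = {"n_locations": [5, 5, 5], "entropy": [1.5, 1.5, 1.5]}
ratios = feature_ratios(android_values, ios_values)
print(ratios)
```

A ratio of 1.00 would indicate identical mean values on both platforms.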
These results support the notion that, whilst raw geographic precision may vary between platforms, the transformation of location data into behavioural metrics remains largely consistent—an essential requirement for reliable clinical application and cross-platform deployment.

3.3. Phase 2: Naturalistic Observational Study

To evaluate the utility of GPS data collected in uncontrolled, naturalistic environments for mental health applications, we conducted a series of detailed case analyses. We aimed to assess whether individual-level mobility patterns, as detected through our clustering-based GPS tracking methodology, could yield interpretable and behaviourally meaningful insights in real-world settings.
For visualisation purposes, detected cluster centroids were marked in red, whilst raw GPS data points—automatically aggregated for clarity—were displayed in blue with colour intensity encoding the frequency of observations. This representation enabled simultaneous examination of raw data distribution and algorithmically identified activity centres.

3.3.1. Case Analysis I: Weekday Routine Detection and Deviation Identification

We first performed a representative case analysis on the weekday activity data of Participant ID 152 to assess the algorithm’s capacity to detect both consistent routines and subtle deviations. The results demonstrated the method’s sensitivity and ecological validity in capturing real-world behavioural patterns (Figure 2).
Baseline Weekday Pattern. As illustrated in Figure 2A,B, the participant exhibited a regular weekday pattern of movement: transitioning from home to workplace, then to a fitness centre, and returning home. These three activity hubs appeared consistently across multiple weekdays with only minor spatial variations, indicating that the algorithm reliably captured the participant’s baseline mobility structure. The consistent detection of these hubs across different days demonstrates the algorithm’s reproducibility and stability.
Behavioural Deviations. Figure 2C,D highlight meaningful deviations from this baseline pattern. In Figure 2C, an additional cluster was detected corresponding to a visit to a cafeteria, suggesting an added location to the routine that was not present on typical weekdays. Conversely, Figure 2D shows the absence of the fitness centre hub, indicating a deviation from the usual activity pattern—possibly due to schedule changes or behavioural variation. These changes were effectively captured through the clustering approach, demonstrating the algorithm’s sensitivity to behavioural fluctuations that may be clinically relevant in mental health monitoring contexts.
Overall, these findings suggest that the proposed location detection algorithm is capable not only of identifying regular daily routines but also of capturing meaningful deviations, thereby offering a valuable tool for behavioural monitoring in real-world mental health research and applications.

3.3.2. Case Analysis II: Weekend Long-Distance Travel

Weekend activities of the same participant (ID 152) differed markedly from the regular weekday patterns, providing an opportunity to evaluate algorithm performance during non-routine, long-distance travel. On weekends, the participant travelled to more distant locations, as inferred from the detection of clusters corresponding to the participant’s home, motorway service areas, and accommodation sites (Figure 3A).
Transient Location Detection Limitations. Upon closer inspection, some transient locations—such as a stop near a petrol station (~45 min) and a visit to a tourist attraction (~40 min)—were not detected as distinct clusters. This outcome is consistent with the algorithm’s design parameters: given that one data point represents the average coordinates within a 5 min interval, and the DBSCAN algorithm requires a minimum of 10 consecutive points (50 min of sustained presence), brief stops below this threshold are intentionally filtered out to avoid spurious cluster detection.
Meaningful Activity Hub Detection. Nevertheless, the return to home was successfully captured, along with four meaningful clusters representing overnight accommodations, a large multi-purpose swimming facility, the participant’s home, and a brief stop at a fitness centre (Figure 3B). These observations demonstrate the algorithm’s capability to detect irregular or non-routine activities that diverge from the individual’s typical weekday structure. Despite some missed detections at transient locations, the method exhibited sufficient sensitivity to reconstruct a plausible outline of the participant’s overall mobility behaviour, even in complex and unfamiliar settings. This capability is particularly relevant for mental health applications, where changes in travel patterns or engagement in novel activities may signal behavioural changes associated with mood fluctuations or treatment responses.

3.3.3. Case Analysis III: Transportation Mode Robustness

Additionally, Participant ID 203—who exhibited consistent movement patterns between the same departure and destination points using different transportation modes—was analysed to examine whether transportation type influences algorithmic detection accuracy. The participant completed long-distance travel on separate occasions, sometimes using a private vehicle and at other times via high-speed rail.
Transportation-Invariant Detection. In both scenarios, transient transit segments (e.g., motorway passages or rail corridors) were not identified as distinct clusters; instead, only the departure and destination points were detected as meaningful activity hubs. This outcome is consistent with expectations, as there were no prolonged stopovers during transit, and such segments can be considered behaviourally trivial for the purposes of daily activity modelling. The algorithm appropriately filtered out these non-essential movement segments whilst preserving detection of meaningful endpoints.
Cross-Modal Consistency. Importantly, aside from differences in geographic routes between the two transportation modes, no significant differences in behavioural interpretation were observed (Figure 4). The same origin and destination clusters were consistently identified regardless of whether the participant travelled by car or train. This reinforces the robustness and efficiency of the proposed method, which appropriately filters out non-essential variability and maintains consistency in identifying meaningful activity hubs across different transportation contexts. Furthermore, the analysis demonstrates that long-distance movements can be accurately detected using the same algorithmic parameters (ε = 0.001°, min_samples = 10) without requiring mode-specific calibration, thereby enhancing the method’s generalisability and practical applicability.
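For orientation, the angular radius ε = 0.001° can be converted into approximate ground distances. The short sketch below is a worked calculation rather than part of the algorithm: the constant of roughly 111.32 km per degree of latitude and the example latitude of 37.5° N (approximately that of Seoul) are assumptions introduced here.

```python
from math import cos, radians

# Approximate length of one degree of latitude in metres (assumed constant;
# the true value varies slightly with latitude).
METRES_PER_DEG_LAT = 111_320.0


def eps_in_metres(eps_deg, latitude_deg):
    """Return (north-south, east-west) ground distance for an angular radius."""
    north_south = eps_deg * METRES_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * cos(radians(latitude_deg))
    return north_south, east_west


ns, ew = eps_in_metres(0.001, 37.5)
print(f"north-south: {ns:.1f} m, east-west: {ew:.1f} m")
```

At this latitude the radius is roughly 111 m north to south but only about 88 m east to west, so a fixed ε expressed in degrees is slightly anisotropic on the ground.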

4. Discussion

4.1. Summary of Principal Findings

This study evaluated the accuracy and reliability of a GPS-based location detection algorithm for use in digital mental health applications. Through a two-phase validation approach—combining controlled action-based testing with real-world observations—we demonstrated that the proposed DBSCAN-based preprocessing methodology can accurately identify major activity hubs and capture behaviourally meaningful mobility patterns under both structured and naturalistic conditions.
In the controlled experimental setting (Phase 1), the algorithm achieved high spatial accuracy, with average distance errors below 50 metres across key activity types (Home, Work/School, Stationary) on both Android and iOS platforms. The overall mean spatial error was 30.63 m, well within the acceptable range for deriving location-based behavioural features. Additionally, the number of detected activity hubs closely matched ground truth values, particularly under high-accuracy GPS modes (Android’s PRIORITY_ACCURACY mode: N-error = 0.6; iOS’s BEST_FOR_NAVIGATION mode: N-error = 0.4), whilst battery-conserving modes introduced substantial detection errors (Android PRIORITY_LOW_POWER: N-error = 3.0; iOS REDUCED_ACCURACY: N-error = 1.8). Importantly, despite platform-specific variability in raw spatial accuracy, derived behavioural features (location entropy, normalised entropy, location variance) remained highly consistent across devices (ratios: 0.98–1.08), underscoring the method’s robustness for extracting clinically relevant metrics.
Real-world case analyses (Phase 2) further substantiated the algorithm’s utility in capturing individual mobility routines and deviations. For instance, Participant ID 152 exhibited stable weekday routines that were reliably detected across multiple days, alongside meaningful deviations on weekends (e.g., long-distance trips, additional activity clusters, or missing habitual locations), demonstrating the method’s ecological validity and sensitivity to behavioural changes. The algorithm also performed robustly across different transportation modes, as evidenced by Participant ID 203’s consistent detection of origin–destination clusters regardless of whether a private vehicle or high-speed train was used. These results support the core hypothesis that GPS-derived data—when processed using an appropriate clustering framework—can yield reliable, interpretable behavioural indicators for mental health research.
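The behavioural features named in this summary can be sketched from cluster labels as follows. The formulas follow definitions commonly used in the digital phenotyping literature (Shannon entropy over the share of observations per cluster, entropy divided by the logarithm of the number of clusters, and the logarithm of summed latitude and longitude variances); treating these as the exact definitions used in this study is an assumption, and the function names are illustrative.

```python
from collections import Counter
from math import log
from statistics import pvariance


def location_entropy(labels):
    """Shannon entropy of time spent per cluster (noise label -1 ignored)."""
    visits = Counter(label for label in labels if label != -1)
    total = sum(visits.values())
    return -sum((c / total) * log(c / total) for c in visits.values())


def normalised_entropy(labels):
    """Entropy divided by log(number of clusters); 0.0 if fewer than two."""
    n_clusters = len({label for label in labels if label != -1})
    if n_clusters < 2:
        return 0.0
    return location_entropy(labels) / log(n_clusters)


def location_variance(points):
    """log(var(lat) + var(lon)); undefined if all points coincide."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return log(pvariance(lats) + pvariance(lons))


# Example: half of the observations at one hub, a quarter at each of two others.
example_labels = [0] * 6 + [1] * 3 + [2] * 3 + [-1] * 2
entropy = location_entropy(example_labels)
norm_entropy = normalised_entropy(example_labels)
```

A cross-platform ratio such as those reported above would then be the value of a feature computed from one device divided by the value computed from the other for the same participant and period.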

4.2. Strengths and Implications

The findings highlight several key strengths of the proposed GPS-based location detection method, each with important implications for digital phenotyping in mental health applications.
Behavioural Accuracy Over Geographic Precision. The algorithm prioritises behaviourally meaningful accuracy over strict geographic precision, emphasising its ability to detect daily routines and clinically interpretable deviations—an approach aligned with the functional objectives of digital phenotyping. Rather than achieving centimetre-level positioning (which is unnecessary for mental health monitoring), the method focuses on identifying meaningful activity centres within acceptable spatial tolerance (~50 m), enabling reliable extraction of mobility features such as location entropy and homestay duration that have been linked to depression severity.
Cross-Platform Consistency. The method demonstrated strong platform consistency, a critical requirement for scalable deployment. Despite variability in raw GPS data quality across mobile operating systems and device settings, the algorithm yielded comparable behavioural features (within 8% agreement), supporting its applicability across diverse technological environments. This finding addresses a major barrier identified in previous digital phenotyping research, where platform differences have compromised data comparability and limited multi-site studies.
Real-World Adaptability. The algorithm proved adaptable to real-world complexity, successfully capturing nuanced behavioural patterns such as long-distance travel, temporary activity modifications, and weekday–weekend variability without requiring user input or controlled conditions. This capability is particularly valuable for mental health applications, where naturalistic monitoring over extended periods is essential for detecting gradual changes or early warning signs of symptom exacerbation.
Sensitivity to Behavioural Deviations. The approach exhibited sensitivity to meaningful deviations in behaviour, offering potential utility for early detection of changes relevant to mental health monitoring. The ability to distinguish between stable routines and deviations (e.g., missing habitual activities, introducing new locations) may provide objective markers of behavioural activation, social withdrawal, or routine disruption—features that have been associated with mood disorders.

4.3. Comparison with Prior Work

The spatial accuracy achieved in this study (mean error: 30.63 m) is of the same order as, though somewhat larger than, the positioning errors reported in previous GPS validation research in health applications. A smartphone GPS accuracy study in urban environments reported overall average horizontal position accuracy of 7–13 m for iPhones, whilst a comparative study of GPS devices and mobile phones found that GPS-enabled smartphones are typically accurate to within a 4.9 m (16 ft) radius under open sky, with accuracy worsening near buildings, bridges, and trees. Although our hub-level errors exceed these figures, our results demonstrate that even with moderate spatial errors, behavioural features remain valid and consistent across platforms.
Regarding cluster detection accuracy, our results showing N-error of 0.4–0.6 under optimal settings represent a substantial improvement over methods that rely on raw GPS coordinates without preprocessing. Previous research analysing over 57 million GPS data points showed that prediction accuracy for depression dropped from approximately 80% in homogeneous student samples to approximately 60% in heterogeneous populations, highlighting the importance of robust preprocessing algorithms like the one developed in this study.
The cross-platform consistency observed in our study (behavioural feature ratios: 0.98–1.08) addresses a critical gap identified in prior validation work. Previous studies have documented systematic differences between Android and iOS in GPS data completeness and quality, but few have demonstrated that these differences can be mitigated through appropriate algorithmic preprocessing whilst maintaining behavioural feature validity.

4.4. Limitations

Several limitations should be acknowledged when interpreting these findings.
Temporal Resolution Constraints. Whilst the clustering algorithm successfully identified major activity hubs, transient locations (e.g., brief stops of approximately 40–45 min) were occasionally missed because they fall below the minimum dwell time implied by the DBSCAN settings (min_samples = 10 points × 5 min intervals = 50 min). For applications requiring detection of shorter activities, the min_samples parameter can be reduced (e.g., to 5 for a 25 min minimum duration), though this requires validation to balance sensitivity against false positive detection from GPS noise.
Limited Sample Size in Controlled Validation. The sample size for the controlled validation phase was limited (N = 5 participants over 5 days), which may constrain the statistical power and generalisability of the spatial accuracy findings. Although the real-world validation phase included a broader sample (N = 38 participants over 7 days), providing stronger evidence for ecological validity, future research should incorporate larger and more demographically and clinically diverse cohorts to further evaluate the robustness of the proposed method across different populations, age groups, geographic regions, and clinical conditions.
Absence of Naturalistic Ground Truth. A significant limitation of the real-world validation phase is the absence of concurrent ground truth behavioural assessments during naturalistic tracking. Without ecologically valid, time-aligned reference data, it remains difficult to definitively confirm whether all detected activity patterns correspond to actual meaningful behaviours or whether some meaningful activities were missed. The case analyses presented provide face validity through visual inspection and plausibility assessment, but lack quantitative ground truth verification. This limitation could be addressed in future studies by integrating multimodal sensing with physics-informed uncertainty modelling (e.g., FEM-AI-inspired approaches), thereby introducing contextual constraints from complementary sensors and improving discrimination between true behavioural patterns and GPS-related artefacts.
Transportation Mode Detection. Under the current algorithmic setting, prolonged low-speed congestion could, in principle, be misclassified as a stay if GPS points remain within ≈100 m for ≥50 min; although we did not observe comparable cases in our naturalistic dataset, this remains a limitation of GPS-only clustering when transportation context is unavailable. Whilst behaviourally trivial segments (e.g., motorway passages or railway transitions) were intentionally excluded from cluster detection, future applications may benefit from finer-grained transportation mode detection to differentiate passive transit from active engagement. Such information could provide additional behavioural context, for example, distinguishing between walking (potentially indicating exercise or social activity) and vehicular travel (more passive behaviour), and could also reduce false stay detection under traffic-related slow movement.
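To give a sense of how slow such congestion would have to be, the sketch below considers an idealised device creeping along a straight line at constant speed, with one averaged point every 5 min. A point on that line has enough neighbours within ε to count as a DBSCAN core point only if the spacing between successive points is at most ε divided by ceil((min_samples − 1)/2). The uniform-motion model, the function name, and the use of ε ≈ 100 m are assumptions for illustration; real stop-and-go traffic will behave differently.

```python
from math import ceil


def max_creep_speed_kmh(eps_m=100.0, min_samples=10, interval_s=300):
    """Highest constant speed at which evenly spaced points still form a cluster.

    Idealised model: points lie on a straight line, one per interval, and a
    point counts as its own neighbour (as in common DBSCAN implementations).
    """
    # Neighbours needed on each side of an interior point.
    per_side = ceil((min_samples - 1) / 2)
    max_spacing_m = eps_m / per_side
    return max_spacing_m / interval_s * 3.6  # m/s to km/h


speed = max_creep_speed_kmh()
print(f"{speed:.2f} km/h")
```

Under these assumptions the limit is about 0.24 km/h, i.e., near-standstill traffic sustained for the better part of an hour.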
Device- and Environment-Dependent GPS Uncertainty. We note that environmental and device-related factors that could affect GPS quality were not explicitly controlled in this study. In dense urban areas, multipath and satellite-geometry effects (urban canyon conditions) might generate spurious point clouds that produce false micro-clusters and increase sensitivity to DBSCAN parameters (ε and min_samples). Additionally, device carrying position (e.g., pocket, bag, or hand) and device orientation can affect GPS signal reception via body shielding and antenna attenuation.
Furthermore, device-specific factors such as antenna quality, chipset capabilities, and operating system location service implementations may introduce systematic differences in GPS accuracy and noise characteristics. These contextual and device-related factors were not recorded in our study and could not be quantified. Although we indirectly mitigated such noise by requiring spatiotemporal persistence (min_samples = 10, or 50 min of sustained presence) and applying conservative clustering criteria, these measures may not completely eliminate false positives in extremely challenging environments.
More robust approaches that explicitly account for such uncertainties—for example, density-adaptive clustering methods (OPTICS, HDBSCAN), physics-informed uncertainty models that weight GPS points by estimated accuracy or device orientation, or multimodal validation using complementary sensors—remain important directions for future work.
Single-Country Context. This study was conducted entirely in South Korea, with a relatively homogeneous sample reflecting activity patterns typical of urban and suburban Korean environments. GPS performance, urban density, building structures, and daily routine patterns may differ substantially in other geographic and cultural contexts. Validation in diverse international settings would strengthen confidence in the method’s generalisability. In addition, future work could explore subgroups of users who share similar data-collection environments and investigate adaptive clustering strategies tailored to these contextual subtypes.

4.5. Clinical and Practical Implications

The validated GPS preprocessing algorithm has several important implications for digital phenotyping in mental health research and clinical practice.
Research Applications. The demonstrated reliability and cross-platform consistency enable researchers to deploy GPS-based digital phenotyping across diverse participant populations without concern that platform differences will confound behavioural feature extraction. This facilitates large-scale, multi-site studies and reduces the need for platform-specific calibration or separate analyses for Android and iOS users.
Clinical Monitoring. The algorithm’s ability to detect both stable routines and meaningful deviations suggests potential applications in clinical monitoring contexts. For example, detecting reduced location entropy, increased homestay, or missing habitual activity locations could serve as early warning indicators of depressive episodes, whilst detecting erratic mobility patterns or uncharacteristic long-distance travel might signal manic episodes in bipolar disorder. The passive, unobtrusive nature of GPS data collection makes it particularly suitable for longitudinal monitoring between clinical visits.
Practical Deployment Considerations. The findings regarding GPS accuracy modes have direct practical implications: researchers and clinicians should prioritise high-accuracy GPS settings (PRIORITY_ACCURACY on Android, BEST_FOR_NAVIGATION on iOS) when designing digital phenotyping studies, as battery-conserving modes substantially compromise detection accuracy. However, this must be balanced against participant burden, as high-accuracy modes increase battery drain and may reduce compliance over extended monitoring periods. Accordingly, future work should systematically examine the adherence-battery trade-off and develop energy-aware sensing strategies that preserve behavioural validity while minimising user burden for sustained longitudinal monitoring.

4.6. Future Directions

Building on these findings, several avenues for future research warrant attention.
Parameter Optimisation. Optimising temporal and spatial clustering parameters for different behavioural contexts may improve sensitivity to transient but clinically relevant activities. For example, context-aware thresholding (e.g., lower min_samples during daytime hours, higher thresholds during sleep periods) or location-specific settings (e.g., smaller ε in dense urban areas, larger ε in suburban settings) could enhance detection sensitivity whilst maintaining specificity. Although we employed fixed DBSCAN parameters to ensure interpretability and reproducibility, future work should explore adaptive or hybrid strategies in which ε and/or min_samples are dynamically adjusted using contextual or signal-quality cues (e.g., indoor/outdoor likelihood, motion state, GPS sampling density). Physics-informed and AI-enhanced frameworks (e.g., FEM-AI-inspired approaches) offer a useful conceptual foundation for such context-sensitive thresholds by explicitly modelling measurement uncertainty and learning cross-modal consistency patterns to guide clustering decisions.
Multimodal Data Integration. Integrating multimodal passive sensing data represents a promising avenue for enriching behavioural inference beyond GPS alone. Following recent advances in physics-informed AI approaches for biomedical monitoring [13], future implementations could incorporate physics-informed uncertainty modelling to better characterise GPS measurement error based on environmental context (e.g., urban density, satellite visibility, signal-to-noise ratios). We envision multimodal sensor integration serving two complementary purposes that address different limitations of GPS-based phenotyping:
(i) Validating GPS-detected locations (>50 min threshold): Complementary sensors would be able to provide cross-modal confirmation of detected activity centres. For example, extended presence at a location cluster could be validated against concordant evidence from physiological sensors (e.g., heart rate variability or electrodermal activity indicating quiescence), accelerometry (e.g., indicating sedentary behaviour), environmental sensors (e.g., ambient temperature, light, or noise levels indicating indoor context), or device usage patterns (e.g., screen time indicating stationary engagement). When multiple sensor modalities provide concordant evidence, confidence in the behavioural interpretation increases; conversely, discordant signals may indicate GPS measurement error or behavioural state transitions requiring closer examination.
(ii) Detecting brief events (<50 min threshold): High-temporal-resolution sensors could identify transient but clinically meaningful events that fall below the current GPS clustering threshold. For instance, accelerometry and step count can capture brief periods of physical activity or social movement; heart rate variability and electrodermal activity can detect acute stress or anxiety episodes; and screen time or application usage can indicate engagement patterns. These sensors provide behavioural context during periods when GPS spatial displacement is minimal or location stops are too brief for cluster formation.
This dual-purpose approach would enable more comprehensive behavioural phenotyping across temporal scales: GPS provides spatial context for longer-duration activities, whilst high-frequency sensors capture brief fluctuations and validate detected patterns. Additionally, sensor fusion could enable automated activity classification (e.g., distinguishing home rest from work activity, or differentiating social visits from solitary activities), enhancing the clinical interpretability of mobility patterns.
Missingness and Imputation. Although we did not observe substantial limitations at the intended 5 min sampling interval in this study, real-world deployments may still exhibit intermittent location-data gaps that affect feature stability. Future work should systematically investigate optimal imputation strategies for intermittent location-data gaps (e.g., bounded carry-forward, short-gap interpolation, or motion state-based imputation) and define appropriate tolerance thresholds to minimise spurious behavioural inference.
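As one concrete reading of bounded carry-forward, the sketch below fills missing 5 min bins with the last observed position, but only for a limited number of consecutive bins. The function name and the bound of three bins (15 min) are hypothetical choices for illustration, not values evaluated in this study.

```python
def bounded_carry_forward(series, max_gap=3):
    """Fill None entries with the last observed value for up to max_gap bins.

    series: list with one (lat, lon) tuple or None per 5 min bin.
    Bins beyond max_gap consecutive missing values are left as None, so a
    long outage is not silently turned into an apparent stay.
    """
    filled = []
    last = None
    gap = 0
    for point in series:
        if point is not None:
            last, gap = point, 0
            filled.append(point)
        else:
            gap += 1
            filled.append(last if last is not None and gap <= max_gap else None)
    return filled


a, b = (37.50, 127.00), (37.51, 127.02)
observed = [a, None, None, None, None, b, None]
imputed = bounded_carry_forward(observed)
```

In this example the first three missing bins inherit position a, the fourth remains missing, and the final bin inherits position b. A stricter variant would leave any gap longer than the bound entirely unfilled.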
Naturalistic Ground Truth Validation. Future studies should prioritise developing and implementing methods for capturing concurrent ground truth behavioural assessments during naturalistic tracking. Approaches such as GPS-triggered ecological momentary assessment (where brief surveys are automatically prompted when new locations are detected), participant-initiated activity logging through simple mobile interfaces, or integration with calendar data could provide time-aligned validation whilst minimising participant burden. Where feasible, pairing ecological momentary assessment data or participant-reported activity labels with complementary passive signals (e.g., wearable-derived activity state or physiological responses) may provide additional objective anchors for short-duration events that are difficult to validate using GPS alone. Such designs would also enable systematic quantitative evaluation of behavioural inference (e.g., agreement between detected clusters/events and self-reported or sensor-anchored ground truth).
Clinical Outcome Validation. Longitudinal studies examining the relationship between GPS-derived mobility features and clinical outcomes (e.g., depression symptom trajectories, relapse events, treatment response) are needed to establish the clinical utility of these metrics. Such studies should examine both concurrent validity (association between mobility patterns and current symptom severity) and predictive validity (ability of mobility changes to forecast future symptom changes or clinical events). Future work should also assess longer-term structural changes in dominant locations (e.g., relocation), which are best inferred post hoc by tracking sustained shifts in centroids, visit frequency, or dwell-time patterns across weeks, potentially using drift or change-point detection to distinguish persistent relocation from short-lived anomalies [17,18].
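A very simple way to separate persistent relocation from short-lived anomalies is a persistence rule on the daily centroid of the dominant location, sketched below. This is a stand-in for, not an instance of, formal drift or change-point detection; the function name, the 0.01° shift threshold, and the 14 day persistence requirement are hypothetical values chosen for illustration.

```python
from math import hypot


def detect_relocation(daily_centroids, shift_deg=0.01, min_days=14):
    """Return the index of the first day of a sustained centroid shift, or None.

    daily_centroids: one (lat, lon) per day for the dominant location.
    The first day is taken as the baseline; a relocation is flagged only if
    the centroid stays farther than shift_deg from it for min_days in a row.
    """
    if not daily_centroids:
        return None
    base_lat, base_lon = daily_centroids[0]
    run_start = None
    for day, (lat, lon) in enumerate(daily_centroids):
        if hypot(lat - base_lat, lon - base_lon) > shift_deg:
            if run_start is None:
                run_start = day
            if day - run_start + 1 >= min_days:
                return run_start
        else:
            run_start = None  # returned to baseline: treat as a brief trip
    return None


home_a, trip, home_c = (37.50, 127.00), (35.10, 129.00), (37.60, 127.10)
history = [home_a] * 10 + [trip] * 3 + [home_a] * 5 + [home_c] * 14
moved_on = detect_relocation(history)
```

Here the three-day trip is ignored because the centroid returns to baseline, whereas the later 14 day shift is reported as starting on day index 18.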
Algorithm Refinement. Future algorithm development could explore alternative clustering approaches (e.g., hierarchical clustering, Gaussian mixture models) or machine learning methods (e.g., recurrent neural networks for sequence modelling) that might capture more complex behavioural patterns. Additionally, developing automated methods to distinguish between different activity types at detected clusters (e.g., home vs. work vs. social vs. healthcare) would enhance clinical interpretability.

5. Conclusions

This study provides robust evidence that GPS data from personal smartphones, when processed using an appropriate preprocessing algorithm, can reliably detect major activity centres and yield behaviourally meaningful mobility patterns for digital phenotyping in mental health applications. The demonstrated spatial accuracy (<50 m error), cluster detection reliability (N-error <1 under optimal settings), and cross-platform consistency (behavioural feature agreement within 8%) establish a methodological foundation for deploying GPS-based digital phenotyping at scale.
Beyond methodological validation, the proposed framework enables the derivation of clinically interpretable behavioural metrics that may help objectify behavioural assessment. Examples include routine regularity (e.g., stability of dominant activity centres across days), mobility diversity (e.g., entropy of visited locations), time-at-home proportion, and deviation indices capturing departures from an individual’s typical spatiotemporal patterns. Such metrics align with clinically relevant constructs such as behavioural activation, social withdrawal, circadian regularity, and functional mobility, which are commonly implicated in mental health conditions.
The algorithm’s ability to capture both stable routines and meaningful deviations in naturalistic settings, combined with its robustness across different transportation modes and activity contexts, supports its utility for real-world mental health monitoring. Whilst limitations including temporal resolution constraints and the need for naturalistic ground truth validation remain, these findings represent an important step towards leveraging ubiquitous smartphone technology for objective, continuous behavioural assessment in mental health research and clinical care.

Author Contributions

Conceptualization, J.H.L. (Joo Ho Lee) and G.H.D.; methodology, J.H.L. (Joo Ho Lee); software, S.J.L.; validation, J.H.L. (Joo Ho Lee), G.H.D. and J.H.L. (Jee Hang Lee); formal analysis, J.H.L. (Joo Ho Lee); investigation, J.H.L. (Joo Ho Lee); resources, G.H.D. and S.J.L.; data curation, G.H.D. and S.J.L.; writing—original draft preparation, J.H.L. (Joo Ho Lee); writing—review and editing, J.H.L. (Jee Hang Lee); visualisation, J.H.L. (Joo Ho Lee); supervision, J.H.L. (Jee Hang Lee) and J.Y.P.; project administration, J.Y.P. and S.H.P.; funding acquisition, J.Y.P. and G.H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: RS-2023-KH135442), and by a Korea Institute of Police Technology (KIPoT; Police Lab 2.0 programme) grant funded by MSIT (RS-2023-00281194; Development for a remotely operated testimony system for the children based on AI and Cloud technology).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Public Institutional Bioethics Committee designated by the Ministry of Health and Welfare of the Korean Government (MOHW; P01-202310-01-013) on 11 October 2023.

Informed Consent Statement

Informed consent for participation was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors due to the sensitive nature of behavioural data.

Acknowledgments

Digital Medic Co., Ltd., supported data collection for this research. During the preparation of this manuscript, the authors used ChatGPT (GPT 4o; OpenAI, San Francisco, CA, USA) for the purposes of language refinement. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Authors J.H.L. (Joo Ho Lee), S.H.P., and S.J.L. were employed by the company Digital Medic Co., Ltd.; G.H.D. is the CEO. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Onnela, J.P. Opportunities and challenges in the collection and analysis of digital phenotyping data. Neuropsychopharmacology 2021, 46, 45–54.
  2. Saeb, S.; Zhang, M.; Kwasny, M.; Karr, C.J.; Kording, K.; Mohr, D.C. The relationship between clinical, momentary, and sensor-based assessment of depression. In Proceedings of the 2015 9th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), Istanbul, Turkey, 20–23 May 2015; IEEE: New York, NY, USA, 2015; pp. 229–232.
  3. De Angel, V.; Lewis, S.; White, K.; Oetzmann, C.; Leightley, D.; Oprea, E.; Lavelle, G.; Matcham, F.; Pace, A.; Mohr, D.C.; et al. Digital health tools for the passive monitoring of depression: A systematic review of methods. NPJ Digit. Med. 2022, 5, 3.
  4. Rohani, D.A.; Faurholt-Jepsen, M.; Kessing, L.V.; Bardram, J.E. Correlations between objective behavioral features collected from mobile and wearable devices and depressive mood symptoms in patients with affective disorders: Systematic review. JMIR Mhealth Uhealth 2018, 6, e9691.
  5. Harari, G.M.; Müller, S.R.; Aung, M.S.; Rentfrow, P.J. Smartphone sensing methods for studying behavior in everyday life. Curr. Opin. Behav. Sci. 2017, 18, 83–90.
  6. Huckins, J.F.; DaSilva, A.W.; Wang, R.; Wang, W.; Hedlund, E.L.; Murphy, E.I.; Lopez, R.B.; Rogers, C.; Holtzheimer, P.E.; Kelley, W.M.; et al. Fusing mobile phone sensing and brain imaging to assess depression in college students. Front. Neurosci. 2019, 13, 248.
  7. Insel, T.R. Digital phenotyping: Technology for a new science of behavior. JAMA 2017, 318, 1215–1216.
  8. Ahn, J.S.; Jeong, I.; Park, S.; Lee, J.; Jeon, M.; Lee, S.; Do, G.; Jung, D.; Park, J.Y. App-Based Ecological Momentary Assessment of Problematic Smartphone Use During Examination Weeks in University Students: 6-Week Observational Study. J. Med. Internet Res. 2025, 27, e69320.
  9. Google. Change Location Settings: LocationRequest Priority Constants. Android Developers. 2025. Available online: https://developer.android.com/develop/sensors-and-location/location/change-location-settings (accessed on 20 March 2025).
  10. Apple Inc. kCLLocationAccuracyBest Constant. Apple Developer Documentation. 2025. Available online: https://developer.apple.com/documentation/corelocation/kcllocationaccuracybest (accessed on 20 March 2025).
  11. Apple Inc. kCLLocationAccuracyReduced Constant. Apple Developer Documentation. 2025. Available online: https://developer.apple.com/documentation/corelocation/kcllocationaccuracyreduced (accessed on 20 March 2025).
  12. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA, 2–4 August 1996; pp. 226–231.
  13. Pratticò, D.; Carlo, D.D.; Silipo, G.; Laganà, F. Hybrid FEM-AI Approach for Thermographic Monitoring of Biomedical Electronic Devices. Computers 2025, 14, 344.
  14. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. Available online: https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html (accessed on 20 March 2025).
  15. U.S. Geological Survey. How Much Distance Does a Degree, Minute, and Second Cover on Your Maps? USGS: Reston, VA, USA. Available online: https://www.usgs.gov/faqs/how-much-distance-does-a-degree-minute-and-second-cover-your-maps (accessed on 7 July 2025).
  16. Wikipedia contributors. Geographic coordinate system. In Wikipedia, The Free Encyclopedia; Wikimedia Foundation: San Francisco, CA, USA, 2025; Available online: https://en.wikipedia.org/wiki/Geographic_coordinate_system (accessed on 7 July 2025).
  17. Montoliu, R.; Gatica-Perez, D. Discovering human places of interest from multimodal mobile phone data. In Proceedings of the 9th International Conference on Mobile and Ubiquitous Multimedia, Limassol, Cyprus, 1–3 December 2010; pp. 1–10.
  18. Do, T.M.T.; Gatica-Perez, D. Where and what: Using smartphones to predict next locations and applications in daily life. Pervasive Mob. Comput. 2014, 12, 79–91.
Figure 1. Cluster detection accuracy across accuracy modes and platforms. (A) Distribution of location cluster count errors (N-error) for Android devices across three accuracy modes: PRIORITY_ACCURACY (red), BALANCED_POWER_ACCURACY (black), and PRIORITY_LOW_POWER (grey). The dashed line represents the ground truth. (B) Distribution of N-error for iOS devices across BEST_FOR_NAVIGATION (red) and REDUCED_ACCURACY (black) modes. The dashed line represents the ground truth. Lower N-error indicates better agreement with ground truth activity hub counts.
Figure 2. Weekday mobility patterns and deviations for Participant ID 152. Red markers indicate detected cluster centroids; blue points represent raw GPS observations with intensity encoding frequency. (A,B) Typical weekday pattern showing consistent three-hub structure: home, workplace, and fitness centre. (C) Deviation case 1: Additional cafeteria visit detected. (D) Deviation case 2: Fitness centre visit absent.
Figure 3. Weekend long-distance travel pattern for Participant ID 152. (A) Extended travel route including motorway service areas and distant accommodation. (B) Detected meaningful activity hubs: home, accommodations, swimming facility, and fitness centre. Transient stops < 50 min were appropriately filtered.
Figure 4. Transportation mode robustness analysis for Participant ID 203. Detection of identical origin-destination clusters despite different travel modes: (A) Private vehicle route. (B) High-speed rail route. Transit segments appropriately excluded from cluster detection.
Table 1. Parametric values and corresponding geographic meanings of the DBSCAN algorithm.
| Parameter | Parametric Value | Physical Meaning | Rationale |
|---|---|---|---|
| ε (latitude) | 0.001 deg | 111 m N-S | Captures movement within ~1 city block |
| ε (longitude) | 0.001 deg | 87 m E-W (φ = 37.5°) | Longitude scale shrinks with cos φ (note: E-W distance scales with latitude) |
| min_samples | 10 GPS points | ≥50 min dwell (10 pts × 5 min) | Filters transient stops |
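The physical meanings in Table 1 follow from simple arithmetic. The sketch below is illustrative only and is not the authors' implementation: it assumes a spherical-Earth constant of roughly 111.32 km per degree of latitude (`METRES_PER_DEGREE`), and the function names are hypothetical.

```python
import math

# Assumed spherical-Earth approximation: metres per degree of latitude.
METRES_PER_DEGREE = 111_320.0


def eps_in_metres(eps_deg, latitude_deg):
    """Convert an epsilon given in degrees to (north-south, east-west) metres.

    The east-west extent is scaled by cos(latitude) because meridians
    converge towards the poles.
    """
    north_south = eps_deg * METRES_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def min_dwell_minutes(min_samples, sampling_interval_min):
    """Shortest stay that can supply min_samples points at a fixed interval."""
    return min_samples * sampling_interval_min


ns, ew = eps_in_metres(0.001, 37.5)
print(f"epsilon = 0.001 deg -> N-S {ns:.1f} m, E-W {ew:.1f} m")
print(f"minimum dwell: {min_dwell_minutes(10, 5)} min")
```

Under this approximation the east-west extent at φ = 37.5° comes out near 88 m, within about a metre of the 87 m listed in Table 1; the exact figure depends on the Earth model and rounding used.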
Table 2. Average spatial distance errors (in metres) between detected cluster centroids and ground truth locations for essential activity hubs.
| Activity Type | Android (PRIORITY_ACCURACY) | iOS (BEST_FOR_NAVIGATION) | Overall Mean |
|---|---|---|---|
| Home | 44.67 | 37.83 | 41.25 |
| Work/School | 30.04 | 19.98 | 25.01 |
| Stationary | 30.92 | 20.32 | 25.62 |
| Overall Average | 35.21 | 26.04 | 30.63 |
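Table 2 reports distances in metres between detected cluster centroids and ground truth locations. This section does not state which distance formula was used; a common choice is the haversine great-circle distance, sketched below under that assumption. The function names, the Earth radius constant, and the coordinates are illustrative values introduced here, not study data.

```python
import math

# Assumed mean Earth radius in metres.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; adequate for clusters spanning ~100 m."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# Hypothetical cluster members and ground truth location.
cluster = [(37.5001, 127.0002), (37.5003, 127.0001), (37.5002, 127.0003)]
ground_truth = (37.5000, 127.0000)

c_lat, c_lon = centroid(cluster)
error_m = haversine_m(c_lat, c_lon, *ground_truth)
print(f"centroid error: {error_m:.1f} m")
```

Averaging such per-hub errors across participants would yield figures comparable in kind to those in Table 2.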
Table 3. Cross-platform consistency of mobility-derived behavioural features.
| Behavioural Feature | Ratio (Android/iOS) | Ratio Interpretation |
|---|---|---|
| Number of Locations | 1.08 | 8% difference |
| Location Entropy | 1.05 | 5% difference |
| Normalised Location Entropy | 0.98 | 2% difference |
| Location Variance | 1.02 | 2% difference |
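The ratio interpretation in Table 3 is the absolute deviation of the Android/iOS ratio from 1, expressed as a percentage. The sketch below shows that arithmetic alongside feature definitions commonly used in the digital phenotyping literature (Shannon entropy over the share of observations per cluster, entropy divided by the log of the cluster count, and the log of summed coordinate variances). These definitions are assumptions made for illustration; this section does not confirm the exact formulas the authors used, and all function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def location_entropy(cluster_labels):
    """Shannon entropy (natural log) of the share of observations per cluster."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def normalised_entropy(cluster_labels):
    """Entropy divided by log(number of clusters); 0.0 if fewer than two clusters."""
    n_clusters = len(set(cluster_labels))
    if n_clusters < 2:
        return 0.0
    return location_entropy(cluster_labels) / math.log(n_clusters)


def location_variance(lats, lons):
    """Log of the summed population variances of latitude and longitude."""
    total = statistics.pvariance(lats) + statistics.pvariance(lons)
    if total == 0:
        raise ValueError("all points identical; log-variance is undefined")
    return math.log(total)


def percent_difference(ratio):
    """Absolute deviation of a cross-platform ratio from 1, in percent."""
    return abs(ratio - 1.0) * 100.0


# Hypothetical per-sample cluster labels for one participant-day.
labels = ["home"] * 6 + ["work"] * 4 + ["gym"] * 2
print(f"entropy: {location_entropy(labels):.3f}")
print(f"normalised entropy: {normalised_entropy(labels):.3f}")
print(f"ratio 1.08 -> {percent_difference(1.08):.0f}% difference")
```

Computing each feature separately on the Android and iOS traces and dividing one by the other would give ratios of the kind listed in Table 3.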
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lee, J.H.; Park, J.Y.; Park, S.H.; Lee, S.J.; Do, G.H.; Lee, J.H. Development and Validation of a GPS Error-Mitigation Algorithm for Mental Health Digital Phenotyping. Electronics 2026, 15, 272. https://doi.org/10.3390/electronics15020272
