Cross-Domain Travel Mode Detection for Electric Micro-Mobility Using Semi-Supervised Learning

Lev-Ran, Eldar; Łukawska, Mirosława; Servizi, Valentino; Dalyot, Sagi

doi:10.3390/ijgi14090358


Mapping and Geo-Information Engineering, Civil and Environmental Engineering Faculty, Technion—Israel Institute of Technology, Haifa 3200003, Israel

Department of Technology, Management and Economics, Technical University of Denmark, Akademivej Bygning 358, 2800 Kongens Lyngby, Denmark

Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(9), 358; https://doi.org/10.3390/ijgi14090358

Submission received: 18 May 2025 / Revised: 1 September 2025 / Accepted: 2 September 2025 / Published: 17 September 2025

Abstract

Electric micro-mobility modes, such as e-scooters and e-bikes, are increasingly used in urban areas, posing challenges for accurate travel mode detection in mobility studies. Traditional supervised learning approaches require large labeled datasets, which are costly and time-consuming to generate. To address this, we propose xSeCA, a semi-supervised convolutional autoencoder that leverages both labeled and unlabeled trajectory data to detect electric micro-mobility travel modes. The model architecture integrates representation learning and classification in a compact and efficient manner, enabling accurate detection even with limited annotated samples. We evaluate xSeCA on multi-city datasets, including Copenhagen, Tel Aviv, Beijing and San Francisco, and benchmark it against supervised baselines such as XGBoost. Results demonstrate that xSeCA achieves high classification accuracy while exhibiting strong generalization capabilities across different urban contexts. In addition to validating model performance, we examine key travel properties relevant to micro-mobility behavior. This research highlights the benefits of semi-supervised learning for scalable and transferable travel mode detection, offering practical implications for urban planning and smart mobility systems.

Keywords:

travel-mode detection; deep learning; electric micro-mobility; neural networks

1. Introduction

Transportation systems significantly influence the economy, environment, public health, and societal well-being. In urban environments, informed transportation planning is crucial for ensuring sustainable and efficient road networks. Poorly managed networks result in traffic congestion, prolonged travel times, elevated emissions, and increased accident risks [1].

The rise of electric micro-mobility modes, particularly e-scooters and e-bikes, has reshaped short-distance travel in cities. These modes offer cost-effective and flexible mobility solutions but also introduce new challenges related to infrastructure, safety, and urban policy [2]. Detecting and classifying these modes from trajectory data is essential for understanding usage patterns and supporting data-driven urban planning. Micro-mobility modes are defined as transport options with a design speed of up to 45 kph and weight up to 350 kg [3]. This includes traditional human-powered options (e.g., bicycles, kick scooters) and their electric counterparts. Electrically powered micro-mobility modes—especially e-scooters and e-bikes—are rapidly gaining popularity, both in private ownership and through shared systems [4,5,6].

Analyzing mobility patterns using high-resolution travel data supports planning in areas such as mode choice and demand forecasting [7], behavioral modeling [8,9], and emission mitigation [10]. Historically, such analysis relied on travel surveys using diaries, interviews, or forms [11,12], which were prone to self-reporting biases and incomplete data. Efforts to automate these processes emerged in studies using GNSS devices linked to handheld computers [13]. However, these early datasets were limited in participant diversity. Technological advances over the past two decades—such as miniaturized GNSS, Bluetooth, GSM, and Wi-Fi-based tracking—have significantly improved the quality and scale of trajectory data [14,15,16]. Smartphones and wearables now enable large-scale, heterogeneous data collection.

Trajectory data alone, however, lack travel semantics and context. Semantics include trip duration, distance, mode, and purpose; context covers spatial, temporal, social, and environmental dimensions [17]. Automatically inferring these attributes remains difficult due to limitations in positioning accuracy, sampling frequency, and labeling. Additionally, the similarity of travel characteristics across modes—especially between electric and traditional micro-mobility—adds ambiguity. Urban-specific patterns also limit the transferability of existing models.

Addressing these challenges requires advanced classification methods capable of inferring travel modes from spatiotemporal signals. Traditional statistical approaches often struggle in dense environments where modes share features such as speed. Furthermore, the hybrid characteristics of electric micro-mobility—combining aspects of motorized and non-motorized travel—complicate detection. Despite increasing interest in machine learning (ML) for travel mode detection, most models are supervised and rely on extensive labeled datasets. This poses problems for scalability and generalization, especially with emerging modes [18,19,20,21]. Additionally, few existing solutions explicitly support electric micro-mobility modes, leading to poor performance and reduced utility in practice. Semi-supervised learning presents a promising alternative by leveraging both labeled and unlabeled data. To date, few studies have applied such techniques for electric micro-mobility mode detection, and fewer still have evaluated their effectiveness across multiple urban contexts. Further, interpretability remains limited, hindering adoption by transportation practitioners.

We hypothesize that a semi-supervised neural network architecture using minimal labeled data and contextual trajectory features can accurately detect electric micro-mobility travel modes and generalize across diverse urban environments. Specifically, we focus on detecting e-bikes and e-scooters—two rapidly expanding modes not fully addressed in current research. To that end, we propose a novel detection model xSeCA (extended semi-supervised convolutional autoencoder), capable of learning from labeled and unlabeled trajectory data. To our knowledge, this is the first study to jointly detect e-scooters and e-bikes using such an approach. Additionally, we implement Shapley value sampling to identify discriminative travel features and explain model decisions, enhancing interpretability. Our study contributes to the field in three primary ways: (1) we investigate and customize xSeCA—a novel semi-supervised model—for electric micro-mobility mode detection; (2) we assess its robustness and transferability using multi-city datasets; and (3) we analyze travel properties and provide interpretable outputs to support transportation policy and planning.

2. Related Research

2.1. Mode-Detection Models

Recent advancements in ML have significantly enhanced the ability to infer travel modes from trajectory data. Supervised approaches such as decision trees, random forests (RF), support vector machines (SVM), and deep neural networks (e.g., CNNs and LSTMs) have shown promising results in travel mode detection tasks. However, these methods often require extensive labeled datasets, which are challenging to obtain, particularly for emerging micro-mobility modes. This reliance limits their applicability in real-world scenarios and constrains their generalization across diverse cities and mobility cultures. In contrast, semi-supervised learning approaches have emerged as a viable solution to address data scarcity by leveraging large volumes of unlabeled data alongside small labeled subsets.

Mode-detection models often use a fixed sliding window, where the mode is calculated according to the travel parameters (i.e., features) that are associated with the trajectory data points, using statistical or learning-based classification algorithms [19,22]. Yet augmenting GNSS data could improve classification accuracy. For example, ref. [23] used RF classifications, showing that introducing GIS transportation networks increased the precision score of all modes, from 75% to 94%. An SVM method based on features derived from GNSS and a wrist accelerometer was developed in [24]. When data from both devices were combined, accuracy increased from 56–79% to 91%.

Another approach is to use varying lengths of trajectory subsegments, thereby adding another layer of complexity to the algorithm; the segmentation method (i.e., “find stop points”) could ensure that each segment only contains one single travel mode (e.g., [25]), as required for optimal travel behavior analysis. Preferred mode detection methods lean towards ensemble-learning methods, fuzzy logic, or other membership-based functions for identifying travel modes [20,26,27]. The literature also addresses rule-based methods for creating trajectory segmentations, as seen in [28,29]. The latter, for example, implemented rules for identifying adjacent segments that are not logical, such as a bicycle segment that directly changes into a motorized vehicle segment (without idling). They then implemented an SVM model that reached an overall testing accuracy of 93%, albeit on a relatively small dataset of just a few dozen trajectories. Additionally, [26] demonstrated the capability of achieving high travel-mode classification accuracies over single-mode trips using ensemble supervised learning methods (RF, gradient boosting decision tree, and XGBoost). Based on high-frequency GNSS data (1 to 0.2 Hz), XGBoost achieved the best overall accuracy score of 91%.

Given the exponential growth of computational power, a range of deep learning studies have emerged. Deep artificial neural networks for mode detection from logged sensor data were developed in [21,28]; while high validation accuracy was achieved, these were fully supervised models with no utilization of unlabeled data to improve robustness. A similar neural network in a semi-supervised manner was trained in [30], implementing an autoencoder that simultaneously trains using labeled and unlabeled trajectories. Using the Geolife dataset, they achieved an overall accuracy of 77% and an F1-score of 76%. A semi-supervised ensemble learning system was implemented in [31], training multiple models on labeled data while iteratively adding unlabeled data based on model agreement during validation. With 50% labeled training data, accuracy reached 90%; with only 1% labeled data, the model still achieved 85%, indicating robustness with sparsely labeled datasets.

Generative adversarial networks (GANs) have also been explored for generating artificial labeled samples [32]. Since unlabeled travel data is more widely available, such approaches may broaden applicability in domains like micro-mobility. However, many studies still rely on single-city datasets, limiting generalizability and producing context-specific findings.

2.2. Micro-Mobility Mode Detection

Increasing the mode share of active travel is a major transportation policy goal globally. Micro-mobility provides an attractive solution for travelers whose physical condition or commute length precludes other active options such as walking or cycling [33]. Promoting micro-mobility can help reduce emissions and congestion [34]. Still, empirical studies demonstrating their benefits and practical impacts remain limited [3].

Detecting and differentiating electric micro-mobility modes such as e-bikes and e-scooters remains a challenge. Some studies apply computer vision for mobility behavior analysis [35]. Bayesian networks to classify e-bikes from GNSS trajectories were implemented in [36], achieving recall and precision scores of 89% and 80%, respectively. Using RF, ref. [27] reported a relatively low F1-score of 73% for e-bikes based on GNSS data. In [25], subway trips were first identified with a rule-based method using GIS subway network data, followed by a Gaussian process classifier for detecting other modes. E-bike recall and precision scores reached 89% and 92%, respectively. Collecting GNSS and inertial measurement data from 34 participants, ref. [37] developed a deep CNN that achieved 85% precision and 93% recall. CNNs were applied in [38] to classify e-scooter and bicycle types, achieving an F1-score of 92%, although their scope excluded other similar travel modes, limiting applicability, mainly since travel modes commonly share similar travel characteristics (e.g., [39]).

Unlike prior models such as [30], which focused on fully supervised learning and context-specific datasets, SeCA introduces a semi-supervised convolutional autoencoder that learns from both labeled and unlabeled trajectory segments. Its architecture supports compactness, transferability, and minimal supervision. Moreover, SeCA enables cross-city deployment. This cross-domain adaptability aligns with recent work emphasizing generalization in AI-based mobility studies. Notably, ref. [40] proposed a cross-city neural adaptation framework to improve smart mobility applications. Similarly, ref. [41] showed that semantic segmentation of urban mobility patterns using sparse labels is most effective and adaptable when performed with hybrid models that combine neural networks with optimization algorithms, reinforcing the need for scalable models that perform well across spatial contexts.

In summary, although recent studies have experimented with semi-supervised learning, few have addressed electric micro-mobility specifically or validated model performance across diverse cities. Our research addresses this gap by developing and evaluating xSeCA, a compact semi-supervised architecture designed for cross-city generalization and interpretable classification of emerging electric micro-mobility modes.

3. Methodology

Mode detection models aim to classify the mode of travel from a predefined set based on a sequence of observations, each with a fixed set of features. For practical deployment, models must be robust across diverse geographic regions and behavioral contexts. Accordingly, our model was designed to (i) operate in a semi-supervised setting to mitigate the cost and errors of manual labeling; (ii) remain “lightweight” on resources to support inference on standard hardware, including mobile devices that can perform the classification task directly; (iii) support cross-domain transfer learning for rapid deployment in new geographic contexts; and (iv) adapt easily to emerging travel modes, such as electric micro-mobility, with minimal feature engineering.

3.1. Dataset Description

Our study employed an ensemble of labeled trajectory datasets, several of which included e-scooters and e-bikes, from diverse urban environments including Germany, Tel Aviv, Copenhagen, China, and San Francisco, capturing a wide range of infrastructures and travel cultures. Most datasets were previously validated in urban mobility studies [42,43,44,45,46]. The Geolife dataset [42] contains multimodal trajectories from Beijing, excluding electric micro-mobility modes. In contrast, the Bernitt dataset [43] from Copenhagen includes only e-bike trips. The Bird dataset from Tel Aviv consists exclusively of e-scooter trajectories. The Sultan dataset [45] from Germany and the Netherlands comprises bicycle trajectories sourced from AllTrails.com, while the PedFlow dataset [46] contains only pedestrian trajectories from Tel Aviv, collected for pedestrian flow modeling. Travel-mode overlap between datasets was limited; only the MobilityNet dataset [44] from San Francisco contained all relevant travel modes and was thus selected for cross-domain (transfer learning) evaluation. For consistency, all motorized vehicles were grouped into a single category (“other”), allowing the model to focus on differentiating active travel modes.

Together, the datasets encompass hundreds of thousands of hours of travel data, ensuring heterogeneity in both user behavior and travel contexts. Figure 1 visualizes the geographic spread of trajectories, and Table 1 summarizes dataset statistics. Unlabeled sequences accounted for approximately 44% of the total dataset. E-bike and e-scooter segments made up 43% of all data. It should be noted that to preserve users’ privacy, the Bird dataset trajectories were trimmed to avoid exposing user origin (trajectory head) and destination (trajectory tail). Moreover, all trajectories were temporally aggregated, such that on average, three trajectory waypoints (mean value of 3.22) had the same coordinate tuple value. Accordingly, stacked waypoint locations were linearly interpolated according to the provided timestamp and average velocity to reconstruct trajectories. Standard preprocessing steps were applied to all datasets before training, including trajectory segmentation, noise filtering (using a moving average filter), and temporal downsampling to 1 Hz. Labels were standardized across datasets, and unlabeled data were retained for unsupervised learning.

3.2. Definitions

For clarity and consistency, the following terms and definitions are used throughout this article:

Waypoint. A timestamped position over a measured GNSS trajectory.
GNSS trajectory. A sequence of waypoints measured by a single GNSS position log. A GNSS trajectory can store hours of positioning data, depicting a user’s series of trips and travel activities, and can include a manifold of travel modes.
Sequence. An array of waypoints ordered by their time stamp. We infer that consecutive waypoints in a sequence are separated by less than 30 s.
Trip. Part of a trajectory that may contain several segments.
Segment. Part of a trip that was traveled using a single travel mode.
Sliding window. Portion of a segment used to calculate the travel properties over a single waypoint, within the context of the trip. A sliding window is composed of the queried waypoint and its preceding and succeeding n neighbors.

3.3. Model Architecture

We adapt SeCA (semi-supervised convolutional autoencoder) [30], a compact neural architecture designed to classify travel modes using both labeled and unlabeled trajectory data. SeCA combines an unsupervised encoder–decoder branch that learns latent spatiotemporal features, with a supervised classification branch for mode detection. This dual-path design allows the model to generalize effectively across cities while minimizing reliance on labeled samples.

SeCA’s encoder consists of three convolutional blocks with ReLU activations, batch normalization, and max-pooling layers. Each block extracts hierarchical features from input tensors, with filter sizes set to 1 × 3 and strides of 1. Padding ensures spatial consistency throughout the network. The decoder mirrors the encoder, employing deconvolutional layers and MaxUnpool operations to reconstruct inputs from latent space, capturing spatial dependencies in the data. The supervised path flattens the encoder output and applies dropout (50%) before feeding it into a fully connected multilayer perceptron (MLP) classifier. The overall model workflow is illustrated in Figure 2.

Training is performed in PyTorch 2.2.0 using the AdamW optimizer [47] with the AMSGrad variant [48]. Two loss functions are minimized: (1) reconstruction loss on all input sequences, and (2) cross-entropy classification loss on labeled data. These losses are jointly optimized, either concurrently or sequentially.

xSeCA processes fixed-length trajectory sequences, each composed of 64 waypoints with seven features, i.e., a 64 × 7 input tensor (Figure 3). This represents an update over the original SeCA implementation in [30], which used 248 × 4 input tensors. xSeCA’s shorter sequence length and greater number of features enhance sensitivity to short trips and rapid transitions typical of electric micro-mobility modes; for example, ref. [49] found that most e-scooter trips span just a few hundred meters. To maintain continuity and address ambiguity at window edges, we apply a 56-waypoint overlap between consecutive windows, so that each window starts 8 waypoints after the previous one.
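The windowing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `sliding_windows`, the constant names, and the choice to drop a trailing partial window are assumptions; only the window length (64) and the overlap (56) come from the text.

```python
from typing import List, Sequence

WINDOW = 64                  # waypoints per window (from the text)
OVERLAP = 56                 # waypoints shared by consecutive windows (from the text)
STRIDE = WINDOW - OVERLAP    # 8 waypoints; derived from the two values above


def sliding_windows(segment: Sequence, window: int = WINDOW,
                    stride: int = STRIDE) -> List[Sequence]:
    """Return every full-length window of a segment.

    Assumption: a trailing part shorter than `window` is dropped.
    """
    return [segment[start:start + window]
            for start in range(0, len(segment) - window + 1, stride)]


# Usage: a hypothetical 100-waypoint segment, each waypoint a row of 7 features.
segment = [[float(i)] * 7 for i in range(100)]
windows = sliding_windows(segment)
```

With a stride of 8, a 100-waypoint segment yields five windows, and each window shares its last 56 waypoints with the first 56 of the next one.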

The architecture is designed to be lightweight (the technical specifications of the machine used for development and analysis of this work are Nvidia GeForce RTX 3060 GPU, Intel Core i7-7700k CPU and 16 GB of RAM) and efficient, enabling deployment on standard hardware, including mobile devices. This makes xSeCA suitable for real-time, on-device inference in scalable urban mobility applications.

3.4. Model Features

The model processes trajectory segments as multivariate time series of geographic and kinematic features. These features represent travel properties commonly used to distinguish mode behaviors.

Each sliding window is transformed into a representation built from higher-level kinematic features. This aids the model during training, compared to training over a sequence of raw coordinates. The following three features describe the basic movement characteristics: (a) relative distance (RD), depicted in Equation (1), which defines the metric distance between two consecutive waypoints (x_{i}, y_{i}) and (x_{i+1}, y_{i+1}); (b) time interval (∆t) in seconds, depicted in Equation (2), between the timestamps of two consecutive waypoints; and (c) bearing (B) of two consecutive waypoints, depicted in Equation (3), which represents the azimuth value from point i to point i + 1.

RD_{i+1} = \sqrt{(y_{i+1} - y_{i})^{2} + (x_{i+1} - x_{i})^{2}}

(1)

\Delta t_{i+1} = t_{i+1} - t_{i}

(2)

B_{i+1} = \operatorname{atan2}(y_{i+1} - y_{i}, x_{i+1} - x_{i})

(3)

Features a and b allow the calculation of the following: (d) speed (S), depicted in Equation (4), which measures the rate of distance change over time; (e) acceleration (A), depicted in Equation (5), which measures the rate of speed change over time; and (f) jerk (J), depicted in Equation (6), which measures the rate of acceleration change over time. Features b and c allow the calculation of the bearing rate (BR), depicted in Equation (7), which measures the rate of bearing change over time.

S_{i+1} = \frac{RD_{i+1}}{\Delta t_{i+1}}

(4)

A_{i+1} = \frac{S_{i+1} - S_{i}}{\Delta t_{i+1}}

(5)

J_{i+1} = \frac{A_{i+1} - A_{i}}{\Delta t_{i+1}}

(6)

BR_{i+1} = \frac{B_{i+1} - B_{i}}{\Delta t_{i+1}}

(7)
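As a worked illustration of Equations (1)–(7), the following sketch derives the kinematic features from a list of waypoints. It is not the authors' code: the `(x, y, t)` waypoint layout in projected metres and seconds, the function names, and the dictionary keys are assumptions. The sketch also assumes strictly increasing timestamps and, like Equation (7), does not unwrap bearing differences across ±π.

```python
import math
from typing import Dict, List, Tuple

Waypoint = Tuple[float, float, float]  # (x, y, t); assumed layout


def _rate(values: List[float], dts: List[float]) -> List[float]:
    """(v[k+1] - v[k]) / dt, using the time interval that ends at v[k+1]."""
    return [(b - a) / dt for a, b, dt in zip(values, values[1:], dts[1:])]


def kinematic_features(points: List[Waypoint]) -> Dict[str, List[float]]:
    """Compute RD, dt, B, S, A, J and BR between consecutive waypoints."""
    rd, dt, bearing = [], [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        rd.append(math.hypot(x1 - x0, y1 - y0))        # Equation (1)
        dt.append(t1 - t0)                              # Equation (2)
        bearing.append(math.atan2(y1 - y0, x1 - x0))    # Equation (3)
    speed = [d / t for d, t in zip(rd, dt)]             # Equation (4)
    accel = _rate(speed, dt)                            # Equation (5)
    jerk = _rate(accel, dt[1:])                         # Equation (6)
    bearing_rate = _rate(bearing, dt)                   # Equation (7)
    return {"RD": rd, "dt": dt, "B": bearing, "S": speed,
            "A": accel, "J": jerk, "BR": bearing_rate}


# Usage: a hypothetical track accelerating along the x axis at 1 Hz.
track = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (3.0, 0.0, 2.0), (6.0, 0.0, 3.0)]
feats = kinematic_features(track)
```

Each derived feature is one element shorter than the quantity it differentiates, so a real pipeline would need a padding or trimming rule to produce equal-length columns per window; that rule is not specified in the text.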

We also implemented a new feature that we hypothesized as differentiating between travel modes. This feature is based on the Ramer–Douglas–Peucker (RDP) smoothing algorithm [50], which is calculated over a fixed sequence. Given a polyline (in our case a trajectory) and a distance threshold ε, this algorithm attempts to find a simpler, more generalized representative polyline that is modelled according to a subset of the original polyline’s points. An example of the RDP algorithm implementation, with ε = 5 m, is depicted in Figure 4, where the output points (green) are a subset of the original sequence of trajectory waypoints (red) of a train (left) and e-bike (right).

The rationale for implementing the RDP smoothing algorithm is that electric micro-mobility modes should show a larger number of maneuvers and turns over distance for each trip when compared to motorized vehicles. This is evident in Figure 4: the e-bike (right) requires more waypoints over similar distances to describe its trajectory. The algorithm is adapted as a Boolean feature per waypoint over a sequence: True if the original waypoint is preserved after smoothing, otherwise False. We choose the value of ε = 0.5 m in the trajectory segmentation based on the maneuverability radius typical for electric micro-mobility vehicles, particularly e-scooters, which can make sharp turns within a sub-meter threshold [51].
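The Boolean RDP feature can be illustrated with a small, self-contained sketch. It uses the standard Ramer–Douglas–Peucker procedure in an iterative form; the function names and the assumption that coordinates are already projected to metres are illustrative and not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected (x, y) in metres; assumed layout


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (to a itself if a == b)."""
    if a == b:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    num = abs((b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1]))
    return num / math.hypot(b[0] - a[0], b[1] - a[1])


def rdp_mask(points: List[Point], epsilon: float = 0.5) -> List[bool]:
    """True for every waypoint that survives RDP simplification."""
    keep = [False] * len(points)
    if not points:
        return keep
    keep[0] = keep[-1] = True          # endpoints are always preserved
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        max_d, idx = 0.0, -1
        for i in range(first + 1, last):
            d = _perp_distance(points[i], points[first], points[last])
            if d > max_d:
                max_d, idx = d, i
        if max_d > epsilon:            # farthest point is kept; recurse on both halves
            keep[idx] = True
            stack.append((first, idx))
            stack.append((idx, last))
    return keep


# Usage: a hypothetical L-shaped path; only the corner and the endpoints survive.
path = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (10.0, 5.0), (10.0, 10.0)]
mask = rdp_mask(path, epsilon=0.5)     # [True, False, True, False, True]
```

A trajectory with many turns keeps a larger share of its waypoints, which is the property the feature is meant to expose.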

3.5. Model Evaluation

3.5.1. Training and Evaluation

In our proposed model, training is carried out simultaneously on the supervised and unsupervised data portions, where for each epoch: (1) a mixed batch of data goes through the encoder; (2) the encoded data go through the decoder; and (3) the data portion that contains the training labels goes through the MLP segment to receive label predictions. Using the AdamW optimizer with the AMSGrad variant, the cross-entropy loss and mean square error loss functions are minimized to tune the model parameters. This choice supports better convergence in a mini-batch training procedure at a learning rate of 0.001. With a predefined batch size of 180, a training epoch (iteration) optimizes the loss function (and thereby the weights) by going through the training partition and randomly drawing 180 sliding windows to be used as input for the neural network (NN). The imbalanced sampler balances the randomized draw, resulting in about 30 samples (unlabeled ones included) per label. This means that at each epoch, certain sliding windows of smaller label distributions may be drawn more than once and sliding windows of larger distributions may not be drawn at all, ensuring that each batch of data contains 50% unlabeled sliding windows and 50% labelled ones.
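The balanced draw can be sketched as follows. This is a simplified stand-in for the imbalanced sampler mentioned above, not the sampler the authors used: it draws exactly the same number of windows per label group (with replacement), whereas a weighted sampler would only balance the groups on average. The function name, the label strings, and the treatment of unlabeled windows as their own group are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def draw_balanced_batch(labels: Sequence[Hashable], batch_size: int = 180,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return window indices with an equal share of the batch per label group."""
    rng = rng or random.Random()
    by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    per_label = batch_size // len(by_label)
    batch: List[int] = []
    for indices in by_label.values():
        # Drawing with replacement: rare groups repeat, large groups are subsampled.
        batch.extend(rng.choices(indices, k=per_label))
    rng.shuffle(batch)
    return batch


# Usage: hypothetical, heavily imbalanced label list with six groups.
labels = (["walk"] * 500 + ["bike"] * 300 + ["e-bike"] * 40 +
          ["e-scooter"] * 20 + ["other"] * 900 + ["unlabeled"] * 1500)
batch = draw_balanced_batch(labels, rng=random.Random(0))
```

With six groups and a batch size of 180, each group contributes 30 windows, matching the figure of about 30 samples per label quoted in the text.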

3.5.2. Score Metrics

Given a confusion matrix of classification results per sequence, true positive (tp), false positive (fp), true negative (tn) and false negative (fn) prediction counts are extracted per label (travel mode). Score metrics, depicted in Equations (8)–(10), are calculated discretely for each label, to account for possible imbalances.

Recall = \frac{tp}{tp + fn}

(8)

Precision = \frac{tp}{tp + fp}

(9)

F1\_score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}

(10)
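The per-label metrics can be computed directly from a confusion matrix, as in the sketch below. It uses the standard definitions (precision = tp / (tp + fp), recall = tp / (tp + fn)); the function name, the row/column convention, and the zero fallback for empty denominators are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def per_label_scores(confusion: Sequence[Sequence[int]],
                     labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """confusion[i][j] counts sequences with true label i predicted as label j."""
    scores: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                   # true k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp    # predicted k, true otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Usage: a hypothetical two-label confusion matrix.
matrix: List[List[int]] = [[8, 2],
                           [1, 9]]
result = per_label_scores(matrix, ["walk", "bike"])
```

For the "walk" row in this toy matrix, tp = 8, fn = 2 and fp = 1, giving a recall of 0.8 and a precision of 8/9.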

The validation loss metrics of the detection model per training iteration provide information regarding the NN’s convergence. The mean square error loss is depicted in Equation (11), and the mean cross entropy loss is depicted in Equation (12).

L_{MSE}(x, \hat{x}) = \frac{\sum_{i=1}^{N} (x_{i} - \hat{x}_{i})^{2}}{N}

(11)

L_{CE}(x, y) = \sum_{n=1}^{N} \frac{1}{\sum_{n=1}^{N} w_{y_{n}}} \cdot \left( -w_{y_{n}} \cdot \log \frac{\exp(x_{n, y_{n}})}{\sum_{c=1}^{C} \exp(x_{n, c})} \right)

(12)

where x is an input observation (i.e., sequence of features), y is the classification output, N is the data batch size (amount of data sequences fed into the NN simultaneously), C is the number of classes, and w is the weight given to the observation’s loss. In case of unbalanced validation data, each weight will receive a value that is inverse to its label’s proportion in the dataset, as a means for balancing the overall metric score.
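To make Equations (11) and (12) concrete, the sketch below evaluates both losses in plain Python. It is an illustration of the formulas only, not the training code: the function names are hypothetical, the logits are given as nested lists, and the class weights are passed in explicitly.

```python
import math
from typing import Sequence


def mse_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean square error between an input and its reconstruction, Equation (11)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def weighted_cross_entropy(logits: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           class_weights: Sequence[float]) -> float:
    """Weighted mean cross entropy over a batch, Equation (12)."""
    total_weight = sum(class_weights[y] for y in targets)
    loss = 0.0
    for row, y in zip(logits, targets):
        # log-softmax of the target class, computed stably via the max trick
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        loss += -class_weights[y] * (row[y] - log_sum_exp)
    return loss / total_weight


# Usage: two hypothetical samples with uninformative (all-zero) logits.
ce = weighted_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1], [1.0, 3.0])
mse = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

With uniform logits over two classes, every sample contributes log 2 regardless of its weight, so the weighted mean is also log 2; the weights only change the result when per-sample losses differ.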

3.6. Travel Property Analysis

Shapley value sampling analysis allows us to interpret model predictions based on cooperative game theory [52]. This is achieved by estimating the contribution of each feature, given a baseline, by permuting the input. The approximate Shapley value formula that provides the attribution for feature i out of the n features in the set N is depicted in Equation (13).

\varphi_{i}(\Delta) = \frac{1}{n!} \cdot \sum_{O \in \pi(N)} \left( \Delta(Pre^{i}(O) \cup \{i\}) - \Delta(Pre^{i}(O)) \right), \quad i = 1, \dots, n

(13)

where π(N) is the set of all orderings of the features (as they appear in Section 3.4), Preⁱ(O) denotes the set of features that precede feature i in ordering O, and Preⁱ(O)∪{i} denotes the same set with feature i included. The ∆ function denotes the difference between two predictions: (1) a prediction that relies solely on a subset of known feature values; and (2) a prediction where no feature values are known. Since these attribution values show how strongly certain features influence a trained model’s prediction, and as these values can be calculated for each label (i.e., travel mode), it is also possible to see the value and contribution of each feature to the model. A distribution of feature values with noticeable positive contributions indicates that these feature values are unique properties of the given travel mode.

In practice, a fair number of data samples must be passed through the NN, each with several permutations per feature (meaning that the feature value is either kept or zeroed out in the sample). Many permutations can be performed, as each data sample consists of seven calculated features per waypoint for a sequence of 64 waypoints, resulting in 448 feature values per sample. However, if the position of a feature value in the sequence is neglected (with no attention being paid to whether a certain feature value appears at the start, middle, or end), far fewer permutations need to be tested, as all values of a certain feature either take part or do not. This results in a total of

\sum_{i=1}^{6} \binom{7}{i} = \sum_{i=1}^{6} \frac{7!}{i! \, (7 - i)!} = 126

possible permutations (considering no repetitions allowed and with no importance of order).
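The count of 126 and the sampling estimator of Equation (13) can both be illustrated with a short sketch. It is a generic Shapley value sampling routine under stated assumptions, not the analysis code used in the study: the function names are hypothetical, features are referred to only by index, and the toy value function is additive so that the exact attributions are known. In the setting described above, the value function would instead be the trained model's score for one travel mode with all absent features zeroed out.

```python
import math
import random
from typing import Callable, FrozenSet, List

N_FEATURES = 7  # features per waypoint (from the text)


def count_feature_subsets(n: int = N_FEATURES) -> int:
    """Non-empty, non-full feature subsets: sum of C(n, i) for i = 1..n-1."""
    return sum(math.comb(n, i) for i in range(1, n))


def shapley_sampling(value_fn: Callable[[FrozenSet[int]], float],
                     n_features: int = N_FEATURES,
                     n_orderings: int = 200, seed: int = 0) -> List[float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_orderings):
        order = list(range(n_features))
        rng.shuffle(order)
        present: set = set()
        previous = value_fn(frozenset(present))
        for i in order:
            present.add(i)                    # Pre^i(O) becomes Pre^i(O) ∪ {i}
            current = value_fn(frozenset(present))
            phi[i] += current - previous      # marginal contribution of feature i
            previous = current
    return [total / n_orderings for total in phi]


# Usage: a hypothetical additive value function with known per-feature weights.
weights = [0.5, 0.1, 0.0, 1.5, 0.3, 0.2, 0.4]
toy_value = lambda subset: sum(weights[i] for i in subset)
estimates = shapley_sampling(toy_value)
```

For an additive value function the estimate equals the weights for any number of sampled orderings, which makes it a convenient sanity check before applying the routine to a real model.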

4. Results

4.1. Descriptive Analysis

Statistical metrics of spatiotemporal features could provide a basic understanding as to which travel properties show correlations with a specific travel mode. The speed feature, for example, represents a somewhat good correlation in most metrics, as the histograms calculated per travel mode show peaks of higher probability relating to a specific mode, as depicted in Figure 5 (top). Other features, such as relative distance and RDP, show similar results (see Appendix A for analysis of all features). On the other hand, ambiguities are observed for the acceleration and jerk features. In terms of distribution of the acceleration metric, e-scooters and e-bikes seemingly differ from other modes, as depicted in Figure 5 (bottom). The different acceleration magnitude behaviors for the electric micro-mobility modes may stem from the fact that both quickly reach high acceleration values over short distances [53].

4.2. Training Results

Training and validation experiments were performed using the Bird, Bernitt, GeoLife, Sultan and PedFlow datasets. For each dataset, 70% of its sequences were randomly selected for training, while the remaining 30% were used for validation, thereby ensuring that no “leakage” of sliding windows between partitions occurred. Due to the substantial amount of training and validation data that was used (more than 3.3 million sliding windows for training, including ~1.5 million unlabeled; and 600,000 labelled sliding windows for validation), the possible cross-validation effect is considered negligible, so the partitions were selected randomly rather than through cross-validation. In total, five training experiments were performed. To handle the labelled data bias between travel modes, the imbalanced sampler was implemented, re-balancing every batch so that on average, there is an equal number of sliding-window samples per label (including unlabeled ones).
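The reason a sequence-level split prevents leakage is that overlapping sliding windows cut from the same sequence always end up in the same partition. A minimal sketch of such a split is given below; the function name, the fixed seed, and the use of integer sequence identifiers are assumptions for illustration.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_sequences(sequence_ids: Sequence[Hashable],
                    train_fraction: float = 0.7,
                    seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Randomly assign whole sequences to the training or validation partition."""
    ids = list(sequence_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


# Usage: ten hypothetical sequence identifiers from one dataset.
train_ids, val_ids = split_sequences(range(10))
# Sliding windows are generated only after this step, separately per partition.
```

Splitting at the window level instead would place near-duplicate windows (sharing 56 of 64 waypoints) on both sides of the split and inflate validation scores.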

The most noticeable pattern is the quick initial convergence of the NN. This is depicted in the loss metrics per training batch over the first epoch in Figure 6, indicating that the model converges toward a minimum loss. This is further reflected in Figure 7A–E, which presents a monotonic positive trend of the mean recall and precision scores for all labels. However, unlike the cross-entropy loss, the mean square loss does not show a monotonic convergence over epochs (Figure 7F). This may be due to the decoder’s difficulty in reconstructing some of the feature values.

4.3. Validation Results

The average scores of five validation implementations (one for each training result) were calculated over the 30% data partition, as depicted in Figure 8A–E. The loss scores (Figure 8F) are similar between epochs. This is due to the high convergence rate: on average, a single pass through the dataset allowed the NN to achieve high recall and precision scores (>60% mark for each label). It is also evident that by epoch 13, the NN had reached convergence, as no improvements are seen in the loss metrics from that point onwards. Moreover, the cross-entropy loss also barely decreased as the epochs increased. Finally, the slight increase of cross-entropy loss by epoch 14 signifies that the optimizer “overshot” its convergence step.

The high metric scores for e-scooter detection are associated with the overfitting hypothesis raised in Section 3.1. The unstable bike scores may be due to the kinematic features for the bike mode being less accurate per point-in-time because of the relatively low positioning frequency of the bike dataset (Sultan), which in turn adds ambiguity to the data. The low precision of the walk mode is probably due to travel properties shared with ambiguous sliding windows of other travel modes, caused by mislabeling errors or by sequences recorded at very low speeds. This can also be inferred from the fact that the speed feature added ambiguity to pedestrian sliding-window detection, which is further explained in Section 4.6.
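For reference, the per-label precision, recall and F1-score discussed in this section can be computed from predicted and true window labels as sketched below. The function name and the toy labels are illustrative assumptions; the study's own evaluation code is not shown in the text.

```python
def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for labels never predicted
        # or never present in the ground truth.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy example: one walk window is misclassified as bike.
example = per_label_scores(
    ["walk", "walk", "bike", "bike"],
    ["walk", "bike", "bike", "bike"],
)
```

In the toy example, walk has perfect precision but a recall of 0.5, while bike has perfect recall but a precision of 2/3, which is the same kind of asymmetry between precision and recall discussed above for the walk mode.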

4.4. xSeCA Variations and Comparison to XGBoost

We compared xSeCA and SeCA to XGBoost, a widely used gradient boosting classifier, chosen for its strong performance on tabular trajectory features [54] and because it outperformed other tree-based ensemble classifiers in previous studies [26,55]. To maintain comparability, we used the same input features for all models. Hyperparameters were first fixed to their default values with a learning rate of 0.1 and then optimized via grid search. The training partition of the dataset was used to train the classifier, while the validation partition was used for evaluation and early stopping. Due to the large data volume (~1.5 million labeled sliding windows for training) and the number of features per sample (7 × 64 = 448), 500 estimators were calculated per variation with an early stopping threshold of 50. The best variation was found to have a maximum depth of 6 per tree and a minimum child weight of 3. The analysis results are depicted in Table 2, where the xSeCA and SeCA results are the average validation scores of five validation attempts; the superscript index (i.e., 32 or 64) stands for the sliding window length (number of waypoints).
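The grid-search setup described above can be written down as a small configuration sketch. The fixed settings are those stated in the text; mapping them to the parameter names of XGBoost's scikit-learn interface (`n_estimators`, `early_stopping_rounds`) is an assumption, and the candidate values in `SEARCH_GRID` are hypothetical, since only the best combination is reported.

```python
from itertools import product

N_FEATURES = 7
WINDOW_LENGTH = 64
FLAT_INPUT_SIZE = N_FEATURES * WINDOW_LENGTH  # 448 values per window

# Settings stated in the text (parameter names are an assumption).
FIXED_PARAMS = {
    "learning_rate": 0.1,
    "n_estimators": 500,
    "early_stopping_rounds": 50,
}

# Hypothetical candidate values; the text reports only the best
# combination (max_depth=6, min_child_weight=3).
SEARCH_GRID = {
    "max_depth": [4, 6, 8],
    "min_child_weight": [1, 3, 5],
}


def grid_variations(fixed, grid):
    """Yield one parameter dict per combination of grid values."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        params = dict(fixed)
        params.update(zip(keys, values))
        yield params


variations = list(grid_variations(FIXED_PARAMS, SEARCH_GRID))
```

Each dictionary in `variations` could then be passed to a gradient boosting classifier trained on the flattened 448-value windows, with the validation partition supplied for early stopping.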

On average, the optimal XGBoost classifier shares similar performance scores with the semi-supervised autoencoder. Comparing xSeCA’s NN implementation to the original SeCA one with a sliding window size of 64 shows that the larger feature set contributes to detecting the travel mode over shorter sliding windows, mostly the active mobility ones; this can be seen in the slightly higher F1-scores (e.g., up by 1.33% for the bike label). It should be noted that in [30], the classification of the GeoLife dataset was less accurate than in this study, even though the original SeCA implementation did not classify electric micro-mobility modes, used a longer sliding window, and was trained on a single dataset from a single location. All of this points to the robustness of the newly developed xSeCA model.

Training both models over a smaller sliding window (32 waypoints) shows better scores in favor of the larger feature set used in xSeCA. Given the short sequence that was divided into overlapping sliding windows, the original SeCA implementation may not accurately detect the point-in-time of the travel mode change. The largest score differences between XGBoost and xSeCA appear in the bike detection rates, with a maximum difference in precision of 3.74% in favor of xSeCA. Moreover, the on-par scores between XGBoost and the xSeCA implementation suggest that the labeled data in the ensemble dataset is sufficient for detecting travel modes. However, further data processing or architecture modifications may be required to minimize the effect of ambiguities that cause lower performance scores (most evident for the bike travel mode).

While both classifiers share similar performance scores, xSeCA holds three advantages over XGBoost, especially when handling very large travel datasets: model adaptability and customization, model compactness, and model optimization. (1) Model adaptability and customization. To apply the XGBoost detection model to a new dataset (e.g., from other locations, see Section 4.5), all estimators would have to be recalculated to fit the new dataset. With the NN, on the other hand, only fine-tuning of the existing weights is needed, adapting between datasets via transfer learning. (2) Model compactness. The existing NN has a total of 49,804 stored weights (bias weights included). For travel-mode detection without semi-supervised training, the decoder portion of the NN can be discarded, nearly halving the number of weights and decreasing calculation complexity. In contrast, the XGBoost model has 252,689 tree nodes, of which 127,594 are leaves, approximately five times the number of weights in the NN, which adds to model complexity and training time. Moreover, the NN stores its parameters as tensors of continuous weights, unlike XGBoost, where different tree nodes and leaves store sporadic values. (3) Model optimization. XGBoost will most probably underperform, since its computation complexity will increase exponentially (as data volume increases), while the NN will maintain linearity.
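The "approximately five times" figure in the compactness argument follows directly from the counts given above, as the short check below shows. The constant names are illustrative; the numbers are those stated in the text.

```python
NN_WEIGHTS = 49_804    # stored NN weights, bias weights included
XGB_NODES = 252_689    # total tree nodes in the XGBoost model
XGB_LEAVES = 127_594   # tree nodes that are leaves

# Ratio of stored tree nodes to stored NN weights.
node_to_weight_ratio = XGB_NODES / NN_WEIGHTS

# Remaining nodes are internal (split) nodes.
xgb_internal_nodes = XGB_NODES - XGB_LEAVES

print(f"nodes per NN weight: {node_to_weight_ratio:.2f}")
print(f"internal nodes: {xgb_internal_nodes}")
```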

4.5. Cross-Domain Analysis

We conducted a cross-domain (transfer learning) evaluation to assess the robustness of the xSeCA model across diverse urban contexts. The model was trained using the combined labeled and unlabeled data (datasets A, B, C, E, and F) and tested on the MobilityNet dataset (D). This setup simulates deployment in a previously unseen domain. Table 3 shows that the larger the portion of data dedicated to transfer training, the higher the classification scores. “None” means the model was evaluated without transfer learning, i.e., tested directly without any additional training. Another expected result is the aforementioned overfitting: the e-scooter detection performance dropped to zero in the no-transfer experiment, compared to the near-100% scores during validation. This means that Bird’s e-scooter trajectories are vastly different from the MobilityNet ones; with small data transfer portions, similar accuracies are achieved.

4.6. Travel Property Analysis

A representative sample of the validation dataset (1000 random sliding windows per travel mode) is permuted by its input features and passed through the NN. The attribution of each permutation is calculated relative to the baseline of the input tensor (i.e., the zero tensor) and then summarized across all permutations. This estimates the feature’s attribution score, producing 2D heatmaps for depicting correlations (please also refer to Appendix B). The horizontal axis denotes the feature’s value, while the vertical axis denotes the attribution calculated for that value in a sliding window. The colors represent the occurrence of the calculated value, normalized to the data count, with darker colors representing higher concentrations and stronger correlations of feature values and attributions.
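The permutation-based attribution described above corresponds to a sampling estimate of Shapley values against a zero baseline. The sketch below illustrates the idea on a toy linear model; the function names, the number of permutations and the toy weights are assumptions, and the tooling actually used in the study is not stated in the text.

```python
import random


def shapley_sampling(model, x, n_permutations=50, seed=0):
    """Estimate per-feature Shapley attributions for input x.

    Features are revealed one at a time in a random order, starting
    from the zero baseline; each feature is credited with the change
    in model output it causes, averaged over all permutations.
    """
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = [0.0] * n              # zero baseline
        previous_output = model(current)
        for i in order:
            current[i] = x[i]            # reveal feature i
            output = model(current)
            totals[i] += output - previous_output
            previous_output = output
    return [total / n_permutations for total in totals]


def toy_model(features):
    """Hypothetical stand-in for the NN: a fixed linear score."""
    weights = [0.5, -2.0, 1.0]
    return sum(w * f for w, f in zip(weights, features))


attributions = shapley_sampling(toy_model, [2.0, 1.0, 3.0])
```

For a linear model the attribution of each feature equals its weight times its value, and the attributions sum to the difference between the model output at the input and at the baseline; for the NN, the same procedure yields one attribution per feature value, which is what the heatmaps aggregate.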

As opposed to the descriptive analysis results, speed only shows positive attributions for the detection of bikes and e-bikes, with bikes showing lower speeds than e-bikes (Figure 9, top). The maximum of the e-bikes’ heatmap (close to 10 m/s) is higher than that of the bikes (close to 5 m/s). This signifies that speed is a relatively distinct travel property, yet not by a large margin. Additionally, the speed attribution is mostly negative for walk and e-scooter sliding windows, meaning that the NN relies on other features for detecting these travel modes (see Table 4 and Appendix B). The acceleration (Figure 9, bottom) and jerk features show strong positive attributions for e-bikes and e-scooters, meaning that these features are unique indicators for the emerging electric travel modes; this is probably because both show large values of acceleration and change in acceleration before and after stops. It should be noted, however, that the extreme attribution of the e-scooter may be the result of the NN’s overfitting.

Additionally, Table 4 shows that the RDP (with ε = 0.5 m) primarily contributes to the detection of e-scooters. This may be due to the travel mode’s more dynamic behavior in urban spaces (e.g., rapid turns and the use of various paths), yet also due to the processing (interpolation of new locations) that was performed on the Bird dataset. Finally, the RD was also found to contribute to the detection of e-scooters, yet this could simply be due to the processing that was performed.

5. Discussion and Conclusions

This study introduced xSeCA, a semi-supervised convolutional autoencoder tailored to the detection of micro-mobility travel modes, including e-scooters and e-bikes. The customized xSeCA design implemented here addresses key challenges associated with urban mobility data—such as short trip durations and inconsistent sampling intervals—by leveraging a compact, transferable architecture that performs well even when labeled data are limited. The results affirm that deep neural networks, when trained in a semi-supervised setting, offer strong generalization capabilities across diverse geographic regions and behavioral contexts. Notably, xSeCA achieves high validation F1-scores ranging from ≥74.38% for bikes to ≤96.29% for e-bikes and ≤99.95% for e-scooters. These scores surpass the performance of benchmark models like XGBoost, particularly in cross-domain scenarios, where xSeCA’s deep feature representations enhance transferability. xSeCA’s F1-score and precision for e-bikes, for example, also surpass the results in work such as [36], which reported a precision of 80%, [27], which achieved an F1-score of only 73%, and [25], which reported a precision of 92%.

Beyond predictive accuracy, xSeCA offers practical advantages for real-world deployment. Its lightweight design enables efficient on-device inference, and its low dependence on labeled data makes it ideal for regions with limited annotation resources. Compared to traditional supervised approaches such as XGBoost, which perform well on known domains but degrade sharply in transfer settings, xSeCA maintains robust performance with significantly less manual intervention. This is particularly valuable in emerging markets or low-resource environments.

The study provided interpretability insights, including feature importance via Shapley analysis and embeddings of the latent space, which support transparency in decision-making. However, interpreting the full decision logic of deep neural models remains challenging. Future work should integrate explainability methods (such as attention maps, gradient-based attribution, or class activation mappings) to facilitate stakeholder trust, particularly for transportation planners and urban policy makers.

The analysis showed that acceleration and jerk, rather than speed alone, were more discriminative for electric micro-mobility detection, as supported in, e.g., [56,57]. Additionally, the RDP feature displayed high attribution for e-scooter classification, reinforcing the model’s qualitative strengths. These findings highlight the value of context-aware features in distinguishing travel behaviors.

Model performance was also shown to depend on sampling frequency, with detection accuracy decreasing for lower-frequency inputs—especially for bikes—likely due to noise in kinematic feature estimation. This suggests a trade-off between hardware constraints and detection granularity. Future models should consider sensor fusion approaches (e.g., accelerometers, gyroscopes) to enrich temporal resolution and improve robustness against data sparsity. Moreover, incorporating contextual and environmental data (e.g., weather conditions, land use, topography) can support behavioral modeling, while exploring crowdsourced labeling strategies and active learning pipelines can reduce labeling burdens and improve data quality. Investigating federated learning for privacy-aware training, which avoids transmitting sensitive location data [58,59], can also support future model advances. Future models should also control for socio-economic characteristics such as age and gender, which influence mobility patterns in distinct ways [60].

Importantly, the results demonstrate xSeCA’s cross-domain capabilities, where the model was evaluated on a new dataset from a region that was not included in the model training stage. This cross-geographic evaluation revealed consistent performance, validating the model’s generalizability. We recognize that ablation studies can offer deeper insights into domain-specific feature contributions. However, the transferability observed in our experiments, supported by diverse behavioral and infrastructural conditions, provides a sound basis for claiming robustness. Such cross-domain testing aligns with recent work emphasizing generalization in urban AI applications (e.g., [38]).

Combining datasets from varied regions introduces a risk of implicitly conflating regional context with travel mode characteristics—especially for modes exclusive to certain areas. While our dataset composition was carefully curated to ensure heterogeneity, quantifying this implicit bias remains difficult. Nonetheless, performance differences between regions reflect underlying differences in infrastructure and user behavior. For example, fragmented bike lanes in Tel Aviv may yield noisier trajectory data, while more structured networks in Copenhagen support clearer signal patterns. This further supports the need for context-aware modeling.

Finally, our work demonstrates that up to 95% of training data can remain unlabeled if tailored to a specific region, significantly reducing manual labeling efforts. xSeCA’s semi-supervised training scheme makes large-scale GNSS data mining feasible without costly surveys or questionnaires. Shapley-based feature importance analysis and latent space embedding further support interpretability and comparative analysis between regions. In this way, the model facilitates large-scale urban mobility analysis, enabling planners, policy makers and researchers to better understand and support the growth of micro-mobility in urban environments, contributing to better planning and design of urban transportation networks.

Author Contributions

Conceptualization, Eldar Lev-Ran, Valentino Servizi and Sagi Dalyot; methodology, Eldar Lev-Ran, Mirosława Łukawska, Valentino Servizi and Sagi Dalyot; software, Eldar Lev-Ran; validation, Eldar Lev-Ran and Mirosława Łukawska; formal analysis, Eldar Lev-Ran; investigation, Eldar Lev-Ran, Mirosława Łukawska, Valentino Servizi and Sagi Dalyot; resources, Sagi Dalyot; data curation, Sagi Dalyot; writing—original draft, Eldar Lev-Ran and Sagi Dalyot; writing—review and editing, Mirosława Łukawska, Valentino Servizi and Sagi Dalyot; visualization, Eldar Lev-Ran; supervision, Sagi Dalyot; project administration, Sagi Dalyot; funding acquisition, Sagi Dalyot. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the European Institute of Innovation and Technology (EIT) (grant number 21172).

Acknowledgments

Bird Global, for providing access to the Bird dataset.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Descriptive Analysis

Appendix A.1. Speed

The speed value distribution follows the expected order: pedestrians move at the slowest speeds, followed by bicycles, e-bikes, e-scooters, and then motorized vehicles at the far end. Still, there is an overlap in the mean and median speed histograms per trip. Density in urban areas, as well as speed limits on certain roads, commonly prevents motorized vehicles from constantly driving at speeds higher than those of e-scooters, e-bikes, bicycles and even pedestrians. As depicted in Figure A1, some speed values indicate relatively higher probabilities of a single travel mode (e.g., walk ~0.0013 km/h, e-scooter ~0.0048 km/h). Other speed quantile metrics show similar histograms.

Figure A1. Mean (left) and median (right) speed histograms per label.

Appendix A.2. Relative Distance

The relative distance histograms, depicted in Figure A2, show a relatively clear separation in the 90% quantile metric among the walk, e-bike, and e-scooter travel mode histograms. It should be noted that this feature is highly influenced by the time interval, and hence values might be inconsistent and less reliable for analysis.

Figure A2. The 90% quantile (left) and standard deviation (right) relative distance histograms per label.

Appendix A.3. Acceleration

All acceleration value histograms depicted in Figure A3 show near-complete overlap between all travel modes, other than e-scooter and e-bike. The two peaks in the e-bike histogram (right) might indicate two different trip types, i.e., urban and rural (e.g., Bernitt dataset in Copenhagen).

Figure A3. Mean (left) and 10% quantile (right) acceleration histograms per travel mode.

Appendix A.4. Jerk

Jerk shows behavior similar to that of acceleration over the various metrics, although all travel modes show a similar distribution for the mean jerk metric, as depicted in Figure A4.

Figure A4. Mean (left) and 90% quantile (right) jerk histograms per label.
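As a reminder of how the kinematic features in Appendices A.1, A.3 and A.4 relate to one another, the sketch below derives speed, acceleration and jerk from cumulative travelled distance by successive finite differences. This is an assumed formulation for illustration; the exact feature definitions are given in the main text and the function names here are hypothetical.

```python
def finite_difference(values, timestamps):
    """Rate of change between consecutive samples."""
    rates = []
    for i in range(1, len(values)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((values[i] - values[i - 1]) / dt)
    return rates


def kinematics(cumulative_distance_m, timestamps_s):
    """Return (speed, acceleration, jerk) lists for one trajectory.

    Each differentiation shortens the series by one sample, so the
    derived values are aligned with the later timestamp of each pair.
    """
    speed = finite_difference(cumulative_distance_m, timestamps_s)
    acceleration = finite_difference(speed, timestamps_s[1:])
    jerk = finite_difference(acceleration, timestamps_s[2:])
    return speed, acceleration, jerk


# Toy trajectory sampled at 1 Hz with steadily increasing speed.
speed, acceleration, jerk = kinematics([0, 1, 3, 6, 10], [0, 1, 2, 3, 4])
```

Because each derivative divides by the sampling interval, noise in low-frequency position data is amplified at every step, which is consistent with the lower reliability of kinematic features noted for sparsely sampled datasets.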

Appendix A.5. Bearing

Bearing, depicted in Figure A5 and Figure A6, shows relatively uniform mean and standard deviation value distributions, with noticeable probability peaks for e-scooter, e-bike, and bicycle travel modes in the quantile distributions. The peaks are likely due to the attributed road network alignments in certain cities, as each peak signifies a direction close to or exactly north, south, east, and west (e.g., PedFlow dataset in Tel-Aviv).

Figure A5. Standard deviation (left) and 10% quantile (right) bearing histograms per travel mode.

Figure A6. Median (left) and 75% quantile (right) bearing histograms per label.

Appendix A.6. Bearing Rate

Bearing rate metrics, depicted in Figure A7, show mostly uniform results with a significant peak for e-scooters, signifying that they hold unique mobility properties, in this case rapid and swift turns, compared to the rest.

Figure A7. 90% quantile (left) and mean (right) bearing rate histograms per label.
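The bearing and bearing rate features of Appendices A.5 and A.6 can be illustrated with the standard forward-azimuth formula between two latitude/longitude points and a wrapped angular difference. This is a generic sketch under the assumption of degrees clockwise from north; the function names are hypothetical and the study's exact computation is not shown here.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def bearing_rate(bearing_a, bearing_b, dt_s):
    """Signed change in bearing per second, wrapped to [-180, 180)."""
    diff = (bearing_b - bearing_a + 180.0) % 360.0 - 180.0
    return diff / dt_s
```

Wrapping the difference matters near north: a change from 350° to 10° over two seconds is a 10°/s right turn, not a 170°/s left turn. Sharp peaks in this quantity are what would indicate the rapid turns attributed to e-scooters above.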

Appendix A.7. Ramer–Douglas–Peucker

The RDP smoothing algorithm, depicted in Figure A8, shows a good separation between probability peaks for walk, bicycle, and e-scooter. The best separation is seen for small ε value, as increasing it gradually merges the probability distributions. It is assumed that the urban form influences this distribution, as navigation in denser urban areas requires frequent turns, such that micro mobility and walk travel mode trajectories are composed of numerous deviations; these travel modes are also less confined to street navigation restrictions and can move more freely over shorter distances.

Figure A8. Distance standard deviation between output points in trajectory segment after applying RDP at 0.5 [m] (left) and 4 [m] (right).
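The RDP feature can be illustrated with a minimal sketch of the Ramer–Douglas–Peucker simplification followed by the standard deviation of distances between the retained points. Planar coordinates in meters, the use of the population standard deviation, and the function names are assumptions made for this example.

```python
import math
import statistics


def _perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Split at the farthest point and simplify both halves.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]


def rdp_distance_std(points, epsilon):
    """Std. dev. of distances between consecutive simplified points."""
    simplified = rdp(points, epsilon)
    gaps = [math.dist(a, b) for a, b in zip(simplified, simplified[1:])]
    return statistics.pstdev(gaps) if len(gaps) > 1 else 0.0


track = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
simplified_fine = rdp(track, 0.5)     # keeps the sharp detour
simplified_coarse = rdp(track, 10.0)  # collapses to the endpoints
```

A small ε retains the frequent deviations typical of walk and micro-mobility trajectories, whereas a large ε collapses most windows to their endpoints, which is in line with the merging of the probability distributions observed as ε increases.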

Appendix B. Shapley Analysis over the Validation Dataset

Appendix B.1. Speed

The detection model is positively affected by the speed features of bikes and e-bikes, as depicted in Figure 9 (top) in the main article. In contrast, the walk and e-scooter heatmaps, depicted in Figure A9, mainly show negative attributions for all speed values, meaning that speed hinders the NN’s ability to detect these travel modes. This differs from the descriptive analysis, where a clear difference in speed distribution between walk and the rest was seen, from which it could have been assumed that the speed feature would contribute to the walk classification accuracy.

Figure A9. Normalized heatmap: attribution per speed value for walk (left) and e-scooter (right).

Appendix B.2. Relative Distance

This feature, depicted in Figure A10 and Figure A11, seems to mainly contribute to the detection of bikes and e-scooters. E-bikes, on the other hand, showed the largest negative attribution values for this feature. This contradicts the descriptive analysis, where major overlaps for all modes were seen, yet with a distinct peak for the e-bike distribution. Walk also showed positive, albeit smaller, attribution values (up to 8.5), while the ‘other’ mode showed small values centered around zero (−5 to 5). The largest concentration of positive bike-attribution values seems to be over distances that are smaller than those of e-scooters (~10 m and ~12.5 m, respectively).

Figure A10. Normalized heatmap: attribution per relative distance value for bicycle (left) and e-scooter (right).

Figure A11. Normalized heatmap: attribution per relative distance value for walk (left) and e-bike (right).

Appendix B.3. Acceleration and Jerk

Other than the attribution for e-bikes and e-scooters discussed in the main text, all other modes, depicted in Figure A12, show mainly zero if not negative attribution for all acceleration values. Jerk attribution shows almost identical trends to those of the acceleration attribution. These features may hold the most unique values of e-bike and e-scooter travel modes, having the largest acceleration range with positive attributions (see Table 4).

Figure A12. Normalized heatmap: attribution per acceleration value for other (left) and bicycle (right).

Appendix B.4. Bearing

The bearing heatmaps, depicted in Figure A13 and Figure A14, for bikes and e-scooters show close-to-uniform contributions from all bearing values. While no intuitive relation can be drawn between the calculated attributions and each travel mode’s bearing values, bearing may simply be a beneficial input from which the NN calculates high-level features that benefit label detection. Another influence might be the road network’s orientation, which concentrates bearing values around certain directions (such as e-scooters in Tel-Aviv). Thus, model detection in locations that show a distinct urban form might benefit from the use of this feature. The bearing feature shows one of the main differences from the descriptive analysis: some travel modes have positive attributions here yet show large bearing variance in the descriptive analysis.

Figure A13. Normalized heatmap: attribution per bearing value for bicycle (left) and e-scooter (right).

Figure A14. Normalized heatmap: attribution per bearing value for other (left) and e-bike (right).

Appendix B.5. Bearing Rate

In general, the bearing derivative, depicted in Figure A15 and Figure A16, shows positive attribution scores for bikes and negative scores for e-scooters. Walk and other modes mainly show positive attributions, yet with lower magnitudes (most attribution values do not exceed 5), with e-bike values showing even lower magnitudes. Similar to the bearing feature, this feature could provide input for the detection model by creating high level features.

Figure A15. Normalized heatmap: attribution per bearing rate value for bicycle (left) and e-scooter (right).

Figure A16. Normalized heatmap: attribution per bearing rate value for walk (left) and other (right).

Appendix B.6. Ramer–Douglas–Peucker

RDP with ε = 0.5 m, depicted in Figure A17, showed very high contribution scores for e-scooters. Excluding e-bikes, all other travel modes mainly showed small yet positive attribution values. These results are similar to the descriptive analysis (Section 4.2), mainly with distinct distance distributions of walk and e-scooters between consecutive RDP curve points.

Figure A17. Normalized heatmap: attribution per RDP distance value for e-scooter (left), walk (center) and e-bike (right).

References

Lindsey, R.; Santos, G. Addressing transportation and environmental externalities with economics: Are policy makers listening? Res. Transp. Econ. 2020, 82, 100872.
Zhang, C.; Du, B.; Zheng, Z.; Shen, J. Space sharing between pedestrians and micro-mobility vehicles: A systematic review. Transp. Res. Part D Transp. Environ. 2023, 116, 103629.
Abduljabbar, R.L.; Liyanage, S.; Dia, H. The role of micro-mobility in shaping sustainable cities: A systematic literature review. Transp. Res. Part D Transp. Environ. 2021, 92, 102734.
Schumann, H.H.; Haitao, H.; Quddus, M. Passively generated big data for micro-mobility: State-of-the-art and future research directions. Transp. Res. Part D Transp. Environ. 2023, 121, 103795.
Kaya, Ö. Designing green and safe micro mobility routes: An advanced geo-analytic decision system based approach to sustainable urban infrastructure. Eng. Sci. Technol. Int. J. 2025, 64, 102027.
Altintasi, O.; Yalcinkaya, S. Siting charging stations and identifying safe and convenient routes for environmentally sustainable e-scooter systems. Sustain. Cities Soc. 2022, 84, 104020.
Toole, J.L.; Colak, S.; Sturt, B.; Alexander, L.P.; Evsukoff, A.; González, M.C. The path most traveled: Travel demand estimation using big data resources. Transp. Res. Part C Emerg. Technol. 2015, 58, 162–177.
Klöckner, C.A.; Friedrichsmeier, T. A multi-level approach to travel mode choice—How person characteristics and situation specific aspects determine car use in a student sample. Transp. Res. Part F Traffic Psychol. Behav. 2011, 14, 261–277.
Tokey, A.I.; Shioma, S.A.; Jamal, S. Analysis of spatiotemporal dynamics of e-scooter usage in Minneapolis: Effects of the built and social environment. Multimodal Transp. 2022, 1, 100037.
Sui, Y.; Zhang, H.; Song, X.; Shao, F.; Yu, X.; Shibasaki, R.; Sun, R.; Yuan, M.; Wang, C.; Li, S.; et al. GPS data in urban online ride-hailing: A comparative analysis on fuel consumption and emissions. J. Clean. Prod. 2019, 227, 495–505.
Prelipcean, A.C.; Gidófalvi, G.; Susilo, Y.O. Transportation mode detection—An in-depth review of applicability and reliability. Transp. Rev. 2017, 37, 442–464.
Sarker, M.A.A.; Asgari, H.; Chowdhury, A.Z.; Jin, X. Exploring Micromobility Choice Behavior across Different Mode Users Using Machine Learning Methods. Multimodal Transp. 2024, 3, 100167.
Wolf, J.; Guensler, R.; Bachman, W. Elimination of the travel diary: Experiment to derive trip purpose from global positioning system travel data. Transp. Res. Rec. J. Transp. Res. Board 2001, 1768, 125–134.
Angel, A.; Cohen, A.; Dalyot, S.; Plaut, P. Estimating pedestrian traffic with Bluetooth sensor technology. Geo-Spat. Inf. Sci. 2023, 27, 1391–1404.
Mun, M.; Estrin, D.; Burke, J.; Hansen, M. Parsimonious mobility classification using GSM and WiFi traces. In Proceedings of the Fifth Workshop on Embedded Networked Sensors (HotEmNets), Charlottesville, VA, USA, 2–3 June 2008; pp. 1–5.
Argyros, D.; Jensen, A.F.; Rich, J.; Dalyot, S. Riding smooth: A cost-benefit assessment of surface quality on Copenhagen’s bicycle network. Sustain. Cities Soc. 2024, 108, 105473.
Bohte, W.; Maat, K. Deriving and validating trip purposes and travel modes for multi-day GPS-based travel surveys: A large-scale application in the Netherlands. Transp. Res. Part C Emerg. Technol. 2009, 17, 285–297.
Sadeghian, P.; Håkansson, J.; Zhao, X. Review and evaluation of methods in transport mode detection based on GPS tracking data. J. Traffic Transp. Eng. (Engl. Ed.) 2021, 8, 467–482.
Martin, B.D.; Addona, V.; Wolfson, J.; Adomavicius, G.; Fan, Y. Methods for real-time prediction of the mode of travel using smartphone-based GPS and accelerometer data. Sensors 2017, 17, 2058.
Yazdizadeh, A.; Patterson, Z.; Farooq, B. An automated approach from GPS traces to complete trip information. Int. J. Transp. Sci. Technol. 2019, 8, 82–100.
Zha, W.; Guo, Y.; Li, B.; Liu, D.; Zhang, X. Individual Travel Transportation Modes Identification Based on Deep Learning Algorithm of Attention Mechanism. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 3609–3614.
Zhou, X.; Yu, W.; Sullivan, W.C. Making pervasive sensing possible: Effective travel mode sensing based on smartphones. Comput. Environ. Urban Syst. 2016, 58, 52–59.
Stenneth, L.; Wolfson, O.; Yu, P.S.; Xu, B. Transportation mode detection using mobile phones and GIS information. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Chicago, IL, USA, 1–4 November 2011; pp. 54–63.
Roy, A.; Fuller, D.; Stanley, K.; Nelson, T. Classifying transport mode from global positioning systems and accelerometer data: A machine learning approach. Findings 2020, 1–8.
Xiao, G.; Cheng, Q.; Zhang, C. Detecting travel modes using rule-based classification system and Gaussian process classifier. IEEE Access 2019, 7, 116741–116752.
Xiao, Z.; Wang, Y.; Fu, K.; Wu, F. Identifying different transportation modes from trajectory data using tree-based ensemble classifiers. ISPRS Int. J. Geo-Inf. 2017, 6, 57.
Wang, B.; Gao, L.; Juan, Z. Travel mode detection using GPS data and socioeconomic attributes based on a random forest classifier. IEEE Trans. Intell. Transp. Syst. 2017, 19, 1547–1558.
Yazdizadeh, A.; Patterson, Z.; Farooq, B. Ensemble convolutional neural networks for mode inference in smartphone travel survey. IEEE Trans. Intell. Transp. Syst. 2019, 21, 2232–2239.
Zhang, L.; Dalyot, S.; Eggert, D.; Sester, M. Multi-stage approach to travel-mode segmentation and classification of GPS traces. Geospatial Data Infrastructure: From Data Acquisition and Updating to Smarter Services. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2011, W25, 87–93.
Dabiri, S.; Lu, C.-T.; Heaslip, K.; Reddy, C.K. Semi-supervised deep learning approach for transportation mode identification using GPS trajectory data. IEEE Trans. Knowl. Data Eng. 2019, 32, 1010–1023.
James, J.Q. Semi-supervised deep ensemble learning for travel mode identification. Transp. Res. Part C Emerg. Technol. 2020, 112, 120–135.
Li, L.; Zhu, J.; Zhang, H.; Tan, H.; Du, B.; Ran, B. Coupled application of generative adversarial networks and conventional neural networks for travel mode detection using GPS data. Transp. Res. Part A Policy Pract. 2020, 136, 282–292.
Wood, J.; Bradley, S.; Hamidi, S. Preparing for Progress: Establishing Guidelines for the Regulation, Safe Integration, and Equitable Usage of Dockless Electric Scooters in American Cities. 2019. Available online: https://rosap.ntl.bts.gov/view/dot/54415 (accessed on 23 May 2023).
Brunner, H.; Hirz, M.; Hirschberg, W.; Fallast, K. Evaluation of various means of transport for urban areas. Energy Sustain. Soc. 2018, 8, 9.
Gilroy, S.; Mullins, D.; Jones, E.; Parsi, A.; Glavin, M. E-Scooter Rider detection and classification in dense urban environments. Results Eng. 2022, 16, 100677.
Xiao, G.; Juan, Z.; Zhang, C. Travel mode detection based on GPS track data and Bayesian networks. Comput. Environ. Urban Syst. 2015, 54, 14–22.
Liu, L.; Li, Y.; Gruyer, D.; Tu, M. Non-linear relationship between built environment and active travel: A hybrid model considering spatial heterogeneity. Int. J. Transp. Sci. Technol. 2024; in press.
Fang, Z.; Wu, D.; Pan, L.; Chen, L.; Gao, Y. When Transfer Learning Meets Cross-City Urban Flow Prediction: Spatio-Temporal Adaptation Matters. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022; Volume 22, pp. 2030–2036.
Jain, Y.; Pandey, K. Transforming Urban Mobility: A Systematic Review of AI-Based Traffic Optimization Techniques. Arch. Comput. Methods Eng. 2025, 1–37.
Alaoui, F.T.; Fourati, H.; Kibangou, A.; Robu, B.; Vuillerme, N. Kick-scooters detection in sensor-based transportation mode classification methods. IFAC-Pap. 2021, 54, 81–86.
Matkovic, V.; Waltereit, M.; Zdankin, P.; Weis, T. Towards bike type and e-scooter classification with smartphone sensors. In Proceedings of the MobiQuitous 2020—17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Darmstadt, Germany, 7–9 December 2020; pp. 395–404.
Zheng, Y.; Xie, X.; Ma, W.Y. GeoLife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull. 2010, 33, 32–39.
Bernitt, C. The Choice Between E-Bike and Car—Economical Decision by Data Mining. Unpublished Bachelor’s Thesis, Technical University of Denmark, Lyngby, Denmark, 2017.
Shankari, K.; Fuerst, J.; Argerich, M.F.; Avramidis, E.; Zhang, J. MobilityNet: Towards a Public Dataset for Multi-modal Mobility Research. In Proceedings of the ICLR 2020 Workshop on Tackling Climate Change with Machine Learning, Addis Ababa, Ethiopia, 26 April 2020.
Sultan, J.; Ben-Haim, G.; Haunert, J.; Dalyot, S. Extracting spatial patterns in bicycle routes from crowdsourced data. Trans. GIS 2017, 21, 1321–1340.
Cohen, A.; Dalyot, S. Machine-learning prediction models for pedestrian traffic flow levels: Towards optimizing walking routes for blind pedestrians. Trans. GIS 2020, 24, 1264–1279.
Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
Reddi, S.J.; Kale, S.; Kumar, S. On the convergence of adam and beyond. arXiv 2019, arXiv:1904.09237.
Zou, Z.; Younes, H.; Erdoğan, S.; Wu, J. Exploratory analysis of real-time e-scooter trip data in Washington, DC. Transp. Res. Rec. J. Transp. Res. Board 2020, 2674, 285–299.
Douglas, D.H.; Peucker, T.K. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 1973, 10, 112–122. [Google Scholar]
Li, T.; Kovaceva, J.; Dozza, M. Modeling collision avoidance maneuvers for micromobility vehicles. J. Saf. Res. 2023, 87, 232–243. [Google Scholar] [CrossRef]
Strumbelj, E.; Kononenko, I. An efficient explanation of individual classifications using game theory. J. Mach. Learn. Res. 2010, 11, 1–18. [Google Scholar]
Cafiso, S.; Di Graziano, A.; Marchetta, V.; Pappalardo, G. Urban road pavements monitoring and assessment using bike and e-scooter as probe vehicles. Case Stud. Constr. Mater. 2022, 16, e00889. [Google Scholar] [CrossRef]
Zhang, Y.; Shi, X.; Zhang, S.; Abraham, A. A xgboost-based lane change prediction on time series data using feature engineering for autopilot vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19187–19200. [Google Scholar] [CrossRef]
Vlachogiannis, D.M.; Moura, S.; Macfarlane, J. Intersense: An XGBoost model for traffic regulator identification at intersections through crowdsourced GPS data. Transp. Res. Part C Emerg. Technol. 2023, 151, 104112. [Google Scholar] [CrossRef]
Guo, M.; Liang, S.; Zhao, L.; Wang, P. Transportation mode recognition with deep forest based on GPS data. IEEE Access 2020, 8, 150891–150901. [Google Scholar] [CrossRef]
Ma, Q.; Yang, H.; Mayhue, A.; Sun, Y.; Huang, Z.; Ma, Y. E-Scooter safety: The riding risk analysis based on mobile sensing data. Accid. Anal. Prev. 2021, 151, 105954. [Google Scholar] [CrossRef]
Reddi, S.; Charles, Z.; Zaheer, M.; Garrett, Z.; Rush, K.; Konečný, J.; Kumar, S.; McMahan, H.B. Adaptive federated optimization. arXiv 2020, arXiv:2003.00295. [Google Scholar]
Ezequiel, C.E.J.; Gjoreski, M.; Langheinrich, M. Federated learning for privacy-aware human mobility modeling. Front. Artif. Intell. 2022, 5, 867046. [Google Scholar] [CrossRef]
Roy, A.; Fuller, D.; Nelson, T.; Kedron, P. Assessing the role of geographic context in transportation mode detection from GPS data. J. Transp. Geogr. 2022, 100, 103330. [Google Scholar] [CrossRef]

Figure 1. Schematic locator maps providing geographic context for the GeoLife (A), Bernitt (B), Bird (C), MobilityNet (D), Sultan (Osnabrück data only, (E)), and PedFlow (F) travel datasets. Brighter colors indicate higher speeds. Background: Google Maps. Scale bars are 5 km, except (A,D), which use 20 km, and (F), which uses 1 km.

Figure 2. xSeCA mode detection model—implementation workflow (* xSeCA).

Figure 3. Implemented architecture: blue—encoder; green—decoder; red—classifier. Ovals—input and output vectors.

Figure 4. RDP implementation: train (left) and e-bike (right). Output RDP points (green) that are a subset of the original sequence of trajectory waypoints (red). Background: OpenStreetMap contributors, Beijing area.

Figure 5. The 75% quantile speed (top) and 90% quantile acceleration (bottom) histograms per label.

Figure 6. Convergence rate (Y-axis) per training batch (X-axis) in the first training epoch of one experiment.

Figure 7. Average training recall and precision scores (Y-axis) per epoch (X-axis) of all labels (A–E), with the NN loss scores (F).

Figure 8. Average validation recall and precision scores (Y-axis) per epoch (X-axis) of all labels (A–E), with the NN loss scores (F).

Figure 9. Normalized heatmaps for (top) attribution per speed value for e-bike (left) and bike (right), and (bottom) attribution per acceleration value for e-bike (left) and e-scooter (right).
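Figure 4 shows the RDP output points as a subset of the original trajectory waypoints. A minimal sketch of Ramer–Douglas–Peucker simplification with that property is given here for illustration. It assumes planar coordinates in metres (latitude/longitude would first need to be projected) and uses point-to-segment distance; the function names and the sample track are hypothetical, and this is not the authors' implementation. The 0.5 tolerance is chosen only to echo the "RDP (0.5 m)" column of Table 4.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to stay on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Return the waypoints kept by Ramer-Douglas-Peucker for tolerance epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        # Every interior point is within tolerance: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and recurse on both halves.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


# Hypothetical track in metres: small jitter around a sharp detour.
track = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 3.0),
         (4.0, 0.0), (5.0, 0.1), (6.0, 0.0)]
simplified = rdp(track, epsilon=0.5)
print(simplified)
```

With a small tolerance the jitter points are dropped while the detour is retained; with a tolerance larger than the detour, only the two endpoints survive.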

Table 1. Travel datasets—statistical summary.

Dataset	Location/Collection Date	Mode: Bike	Mode: e-Bike	Mode: e-Scooter	Mode: Walk	Mode: Other	Mode: Unlabeled	User Number	Trajectory Number	Total Hours	Total Kilometers	High Sampling Rate (0.2–1 Hz)
GeoLife (A) Microsoft Research Asia [42]	China (mainly Beijing)/April 2007 to August 2012	+			+	+	+	182	18,670	631,483 (99% unlabeled)	1,861,195 (93% unlabeled)	91.5%
Bernitt (B) Technical University of Denmark [43]	Denmark (mainly Copenhagen)/2014 to 2016		+					47	4339	2342	38,204	99.1%
Bird (C)	Israel (Tel Aviv)/June 2020			+				NA	39,431	8156	272,434	97.1%
MobilityNet (D) [44]	United States (San Francisco)/2020	+	+	+	+	+		NA	NA	466	27,748	91.2%
Sultan (E) GPSies (acquired by AllTrails) (https://www.alltrails.com/explore (accessed on 12 July 2025)) [45]	Germany (Osnabrück) and The Netherlands (Amsterdam)/June 2015	+						NA	140	830	25,672	84.1%
PedFlow (F) [46]	Israel (Tel Aviv)/July 2018				+			1	3	65	20	92.2%

Table 2. Comparative validation results (all values in %).

Method	Score	Bike	E-Bike	E-Scooter	Walk	Other
XGBoost⁶⁴	Recall	75.9	93.8	100.0	93.1	86.4
	Precision	67.4	97.3	100.0	70.0	95.3
	F1-score	71.4	95.5	100.0	79.9	90.6
xSeCA⁶⁴	Recall	77.7	94.9	100.0	92.4	86.5
	Precision	71.1	97.6	100.0	71.7	95.1
	F1-score	74.0	96.2	100.0	80.7	90.6
SeCA⁶⁴	Recall	75.8	94.9	99.9	92.7	87.4
	Precision	69.9	97.3	100.0	72.4	95.4
	F1-score	72.7	96.1	99.9	81.3	91.2
xSeCA³²	Recall	71.3	93.6	99.9	92.8	82.0
	Precision	67.6	96.7	99.9	65.9	94.9
	F1-score	69.1	95.1	99.9	77.0	88.0
SeCA³²	Recall	71.3	93.2	99.7	91.3	83.9
	Precision	65.7	96.6	99.9	66.2	94.6
	F1-score	68.3	94.9	99.8	76.8	88.9
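As a reading aid for Table 2, the sketch below uses the standard definition of the F1-score as the harmonic mean of precision and recall and applies it to three XGBoost rows copied from the table. The function name and the dictionary layout are illustrative assumptions, not part of the authors' code. The identity holds exactly for a single confusion matrix; scores that are averaged over several runs need not satisfy it row by row.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0  # convention when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs in %, copied from the XGBoost rows of Table 2.
xgboost_rows = {
    "Bike": (75.9, 67.4),
    "E-Bike": (93.8, 97.3),
    "Walk": (93.1, 70.0),
}

for label, (recall, precision) in xgboost_rows.items():
    # Rounded to one decimal place for comparison with the tabulated F1-score.
    print(label, round(f1_score(precision, recall), 1))
```

For these three rows the recomputed values agree with the tabulated F1-scores to one decimal place.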

Table 3. Testing results (in %).

% of Data for Training	Convergence Epoch	Score	Bike	E-Bike	E-Scooter	Walk	Other
None	N.A.	Recall	10.1	88.8	0.0	12.4	38.0
		Precision	20.6	8.2	0.0	51.7	98.6
		F1-score	13.5	15.0	0.0	20.0	54.9
5	7	Recall	80.9	61.0	68.9	97.3	90.5
		Precision	85.3	59.9	70.2	89.4	94.7
		F1-score	81.8	57.7	68.7	93.2	92.5
10	12	Recall	86.7	66.2	78.6	96.7	92.5
		Precision	83.6	70.7	68.4	93.2	94.9
		F1-score	84.9	67.9	72.4	94.9	93.7
20	16	Recall	88.5	74.5	82.7	97.2	93.6
		Precision	81.9	75.5	73.2	95.7	95.8
		F1-score	84.5	74.2	76.9	96.4	94.7

Table 4. Average attribution values, validation data. Color scheme depicts strong negative (yellow) to strong positive (green) attribution values.

Label	Relative Distance	Speed	Acceleration	Jerk	Bearing	Bearing Rate	RDP (0.5 m)
Bike	3.3	4.84	−1.13	−1.19	4.63	2.95	0.91
E-Bike	−7.2	3.69	3.74	5.7	−5.06	0.2	−0.5
E-Scooter	9.38	−14.11	6.14	3.63	4.86	−8.51	10.3
Other	1.32	7.23	−0.25	−0.32	1.73	1.59	1.81
Walk	1.17	−2.04	−0.18	−0.21	1.91	1.78	1.94
Average	1.6	−0.08	1.66	1.52	1.61	−0.4	2.89

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

