GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications

Joh, Woncheol; Park, Kyuyong; Shin, Donghwa; Kim, Jaemin

doi:10.3390/electronics15051014

Open AccessArticle

GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications

¹

Department of Electronics Engineering, Myongji University, Yongin 17058, Republic of Korea

²

dSPECTER, Seongnam 13453, Republic of Korea

³

School of AI Convergence, Soongsil University, Seoul 06978, Republic of Korea

^*

Authors to whom correspondence should be addressed.

Electronics 2026, 15(5), 1014; https://doi.org/10.3390/electronics15051014

Submission received: 13 January 2026 / Revised: 11 February 2026 / Accepted: 23 February 2026 / Published: 28 February 2026

(This article belongs to the Section Electrical and Autonomous Vehicles)

Download

Browse Figures

Versions Notes

Abstract

Accurate prediction of the state of charge (SoC) of batteries is essential for ensuring the safe, reliable, and uninterrupted operation of electric vehicles (EVs). The prediction fundamentally depends on the ability to accurately predict power consumption. This study investigates the use of GPS-derived information to support SoC prediction, with a particular focus on repeated loop routes such as campus shuttles and closed-circuit EV operations. Real-world driving data are collected using a self-built electric vehicle equipped with a custom battery management system (BMS). These data are used to train three deep learning models, namely gated recurrent unit (GRU), long short-term memory (LSTM), and Transformer, to predict the future SoC of the EV. Experimental results show that the GPS-assisted model consistently outperforms the non-GPS baseline, achieving up to a 23% improvement in prediction accuracy for one-minute-ahead predictions and up to a 76% improvement for ten-minute-ahead predictions. These results demonstrate that GPS-assisted SoC prediction can be effective for forward-looking energy management in practical electric mobility applications.

Keywords:

electric vehicles; GPS-assisted state of charge prediction; shuttle service; deep learning

1. Introduction

Prediction of the battery state of charge (SoC), which is a forward-looking process that forecasts future SoC based on currently available information, is crucial to ensuring the safety and reliability of electric vehicles (EVs) [1]. Online SoC estimation provides essential information about the current status of the battery. Such estimation is critical because it serves as baseline information to prevent unexpected energy shortages and enable safe operation. However, SoC estimation for EVs is not sufficient to provide accurate information, such as predicting the exact point when the vehicle becomes inoperable. For example, drivers often rely solely on their experience when deciding when to charge the vehicle. An anxious driver may interrupt their service to charge the vehicle even when the remaining SoC is sufficient to complete the route. On the other hand, an overly optimistic driver may face a shortage of SoC while on the road. SoC prediction involves forecasting future battery states over a specified time horizon. This process plays a pivotal role in enabling proactive energy strategies such as adaptive power control, optimal route planning, charging schedule optimization, and energy-aware driving assistance [2].

SoC prediction methods that rely solely on conventional estimation techniques often suffer from compounded errors. For example, estimation techniques, such as Coulomb counting, are widely used due to their simplicity and low computational requirements. However, they suffer from accumulated integration errors and drift caused by current sensor bias, making them less reliable over longer prediction horizons [1,2]. Model-based approaches, including equivalent circuit models and Kalman filter techniques, can partially address these limitations, but they often depend heavily on precise parameter identification and tend to degrade under variable and uncertain real-world conditions [2]. Moreover, temperature variations, fluctuating loads, differences in driving behavior, and changes in environmental conditions introduce nonlinearities that are difficult to model explicitly. As a result, an accurate SoC prediction in practical EV applications remains a challenging task.

Recent advances in deep learning offer new opportunities to overcome these challenges. Recurrent neural networks (RNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), and Transformer-based architectures have shown strong capabilities in modeling complex temporal dependencies without requiring explicit physical modeling [1]. These data-driven approaches are particularly well-suited for learning nonlinear relationships between multi-sensor signals and SoC dynamics. However, most prior work has focused primarily on battery-side measurements, such as voltage, current, temperature, and historical SoC, while ignoring external contextual factors that can significantly influence future power consumption [3].

This study focuses on electric vehicles used for regular-route shuttle services whose estimated SoCs are observable and exhibit consistent driving patterns. Unlike private vehicles, shuttle buses typically run fixed routes at set times. They stop and vary their mass load as passengers board and disembark at a specific point, which reinforces spatial regularity. Especially on certain shuttles, such as the in-campus shuttle, there are no traffic lights, which increases spatial regularity. The shuttle bus’s driving profile is also generally more stable than that of private vehicles, since it must adhere to the given schedule. Hence, locational data shows a strong correlation among acceleration, mass-load, and road slope, which are major factors in vehicle power consumption. On the other hand, temperature data, which affects battery discharge efficiency, is also collected from the battery management system (BMS). Therefore, integrating GPS and BMS data, such as voltage, current, temperature, and estimated SoC, may improve the accuracy of predicting a vehicle’s energy consumption, especially for shuttle service vehicles. It explores integrating GPS-derived geographic data, such as driving location and altitude, into deep learning frameworks for SoC prediction. GPS information provides valuable contextual cues about driving behavior, route topology, and road gradient, which directly affect vehicle power demand and thus battery discharge characteristics [3,4]. Incorporating such information may enable the predictive model to anticipate energy consumption patterns more effectively, leading to improved operationally relevant horizon SoC prediction performance. To evaluate this hypothesis, three neural network (NN) architectures are applied and compared, including LSTM, GRU, and Transformer models. Real-world driving data collected from a self-built electric vehicle and an in-house BMS are also used.

Although GPS-derived features have been less explored in conventional battery prediction models, recent studies in the field of energy-aware mobility suggest that location-aware features can significantly enhance forecasting accuracy [3,4]. By leveraging this insight, our work aims to systematically investigate the role of GPS information in improving SoC prediction.

The main contributions of this paper are as follows:

A context-aware deep learning framework integrating GPS-derived features for SoC prediction;
A comparative analysis of three neural architectures (LSTM, GRU, and Transformer) using real-world shuttle EV data;
Empirical demonstration that incorporating location-based features improves SoC prediction performance for operationally relevant prediction horizons.

The rest of the paper is organized as follows. Section 2 reviews related work, including approaches to SoC prediction and NNs. In Section 3, the paper introduces an overview of the process. Section 4 describes the target electric vehicle used for the experiment. The deep learning models used in this paper are introduced in Section 5. The experiment setup and results are presented in Section 6 and Section 7, respectively. Finally, the conclusion summarizes the work with limitations.

2. Related Works

Several studies have been conducted to enhance the accuracy of SoC prediction. The SoC estimation leverages past and present power consumption data to determine the current state. Both for estimation and prediction, a physics-based approach is heavily studied. Physics-based models explicitly incorporate traditional electrochemical mechanisms and design parameters, thereby providing high accuracy. However, they often entail high computational costs when capturing complex dynamics and uncertainties. In contrast, data-driven models can achieve fast, accurate predictive performance under specific operating conditions, even with limited observable data. However, when physical constraints are not explicitly incorporated, such models may exhibit a strong black-box behavior and suffer from limited generalization. Recently, hybrid modeling approaches that combine physical insight with data-driven methods have been actively researched to address these limitations [5,6]. However, predicting future SoC or EV power consumption inevitably suffers from uncertainty, unless the future driving trajectory is already planned.

One approach is to improve SoC prediction accuracy by considering external environmental factors, such as driving speed, road gradient, and ambient temperature, recognizing that future SoC cannot be reliably predicted from historical power signals alone. Madziel et al. [4] proposed a framework for SoC prediction in electric vehicles based on external driving variables. The model’s input consisted of vehicle speed, acceleration, road gradient, and ambient temperature to estimate the battery discharge behavior. This study aimed to reduce system complexity for traffic microsimulation by excluding direct voltage and current sensing. The results demonstrated that environmental factors have a significant influence on energy consumption during dynamic driving conditions. However, the model does not incorporate battery pack voltage or current, which could reflect fluctuating vehicle loads, such as the number of passengers. Liu et al. [7] demonstrated that variations in road slope can lead to substantial differences in energy usage, especially during urban stop-and-go operations, using a statistical approach. The study emphasized the necessity of incorporating environmental and topographical information for accurate range and SoC prediction. While these studies highlight the critical role of topography and environmental factors in energy consumption, their modeling approaches primarily focus on external driving conditions without considering real-time sensor data, such as battery voltage and current, which reflect varying passenger loads. Furthermore, these approaches provide valuable physical insights but lack a predictive structure suitable for real-time SoC prediction within constrained embedded systems.

Another major research direction focuses on NNs, which have been raised as an emerging technology to predict the battery SoC of EVs in recent years by learning the nonlinear relationships between battery signals and SoC under various operating and aging conditions. El Fallah et al. [8] developed a comprehensive deep-learning framework for lithium-ion battery SoC prediction under temperature and aging variations. The study compared multiple architectures, including deep neural networks (DNN), GRU, and LSTM networks, across diverse operating conditions. Their results showed that recurrent structures such as GRU and LSTM outperformed feed-forward models by effectively capturing temporal dependencies. The framework highlighted the potential of neural models to learn nonlinear electrochemical dynamics from historical current and voltage data. Nevertheless, these models are primarily “context-blind”; by focusing exclusively on internal battery signals (current, voltage), they do not account for upcoming external driving factors such as speed, altitude, or route information, which strongly affect the discharge profile. This highlights a significant gap, as real-world driving conditions have been shown to be critical for reliable SoC and energy-related predictions.

Ariche et al. [9] conducted a comparative study of machine learning approaches for SoC prediction under real driving conditions. They considered the impact of factors such as road topology and heating, ventilation, and air conditioning (HVAC) usage on battery discharge behavior, and evaluated multiple models, including linear regression, support vector regression, random forests, and NNs. Their findings underscored the importance of incorporating real-world driving conditions to achieve reliable SoC predictions. Although the approach improved prediction accuracy by accounting for realistic operating conditions, it relied primarily on vehicle-level signals and did not explicitly leverage detailed spatial or route-based information obtained from GPS. Consequently, the model’s ability to predict based on upcoming road characteristics remains limited. Another work has further demonstrated the role of route-related context for a vehicle’s energy consumption prediction, which is fundamental for reliable future SoC prediction. Petkevicius et al. [10] investigated electric vehicle (EV) energy consumption prediction using probabilistic deep learning. By modeling route information and speed profiles, they demonstrated that external driving factors play a crucial role in determining energy consumption, which is closely linked to the discharge profile. Their study highlighted the effectiveness of integrating probabilistic models with deep learning to capture uncertainty in prediction. Although their work did not focus on SoC directly, it nevertheless provided strong evidence that incorporating route-related information, which can be obtained from GPS, is essential for accurate predictions. Crucially, for such predictive insights to be of practical utility to the driver, the SoC estimation must be performed online and in-vehicle, necessitating a robust embedded implementation. Hong et al. [11] proposed a real driving cycle-based SoC prediction framework using deep learning models. In their study, driving data were collected from the vehicle interface, and GPS information of traffic lights was employed to segment driving cycles into meaningful intervals. They applied Dynamic Time Warping (DTW) to group similar patterns and developed a Temporal Attention LSTM (TA LSTM) model to predict vehicle speed, which was then fed into a Functional Mock-Up Interface (FMI) simulation to obtain SoC predictions. Their results showed improved accuracy, but they used GPS data only indirectly for cycle segmentation, not as direct model features.

Considering these findings, it is effective to use NN models combined with external environmental factors to predict SoC. Recent studies focus on this. For example, Tang et al. [12] introduced an Improved Genetic Algorithm (IGA) coupled with a GRU network to enhance lithium-ion battery SoC prediction accuracy. By integrating external temperature and current variations, the study partially addressed the effect of environmental conditions on battery behavior. The proposed IGA-GRU model achieved superior predictive accuracy compared to conventional GRU and back-propagation networks. Nevertheless, while temperature was treated as an external variable, other dynamic environmental and spatial factors, such as road topology and vehicle speed, were not explicitly modeled. On the other hand, Babu et al. [13] explored real-time driving telemetry—including current, voltage, ambient temperature, and driving cycle data—and applied various machine learning regression algorithms (including NNs) to predict battery SoC. Although their results demonstrated improved prediction performance by incorporating driving-condition variables, the model did not explicitly include detailed spatial or route-based features such as GPS latitude/longitude, altitude, or travel heading. De Cauwer et al. [14] proposed a data-driven regression framework for predicting electric-vehicle (EV) energy consumption under real-world driving conditions. Their approach used GPS-derived route information—such as road gradients, speed limits, and traffic patterns—to estimate energy consumption along candidate paths, demonstrating that external geographic factors substantially influence prediction accuracy. Although their method relied on conventional regression models rather than NNs, their findings highlight the strong predictive value of route-level contextual information. This finding supports the motivation of this study, in which GPS-based driving context is not only incorporated but also modeled using NNs to capture nonlinearities and temporal dependencies in SoC variations; the data-driven approach adopted in this study is not chosen merely to align with recent research trends. Specifically, this work does not aim to accurately estimate the internal battery state but rather to predict short-term future SoC in operational environments with repetitive driving routes, such as shuttle services.

A qualitative comparison with representative SoC prediction studies is summarized in Table 1. A direct quantitative comparison with external works is not directly feasible due to fundamental differences in target scenarios and experimental settings. Existing studies, such as [15,16], focus on private-vehicle scenarios and rely on general-purpose driving datasets collected from electric vehicles operating along diverse, largely unconstrained urban or highway routes. In contrast, our work focuses on an embedded model tailored for shuttle service scenarios, characterized by route-constrained operation, high spatial regularity, and repetitive stop-and-go patterns at predefined stations, which are dominant factors influencing energy consumption. It differs fundamentally from the operating conditions of private electric vehicles. In particular, unlike prior works that utilize GPS information indirectly (e.g., for driving cycle segmentation [11]), our approach directly incorporates latitude, longitude, and altitude as primary model inputs and adopts a ΔSoC prediction formulation. This enables the model to learn route-specific, location-dependent energy consumption patterns for repetitive shuttle operations.

In summary, prior studies have emphasized either the indirect use of GPS for segmentation, the integration of environmental conditions, or the impact of real-world auxiliary loads on SoC prediction. However, this work directly incorporates GPS data (latitude, longitude, altitude, speed, and heading) as input to deep learning models, in contrast to prior works that primarily use GPS information for segmentation or consider external factors in a limited manner. This approach not only enhances SoC prediction but also trains limited embedded models that meet the specific needs for shuttle-bus-like driving scenarios, where route patterns strongly correlate with power consumption.

3. Overview of the Process

The predicted SoC at time

t + τ

from the current time t is defined as

\hat{SoC} (t + τ) = SoC (t) + Δ \hat{SoC} (t + τ),

(1)

where

Δ \hat{SoC} (t + τ)

denotes the SoC variation at time t +

τ

, and

SoC (t)

represents the estimated SoC at the current time t. The SoC at time t (

SoC (t)

) can be estimated by various methods, such as models coupled with aging mechanisms for accurate estimation. While a more precise estimation method, such as electrochemical modeling, may improve SoC estimation accuracy, this paper focuses on enhancing short-term (1- or 10-min) SoC prediction. Although battery aging, consisting of both calendar and cycle aging, is a critical factor in long-term battery management, its impact is negligible within the 1-to-10-min prediction horizon addressed in this work, as aging is a degradation process that typically progresses over months or years. Instead, this paper enhances the accuracy of short-term SoC prediction by examining the effects of external factors, such as road conditions and driving patterns, on EV energy consumption. It is clarified that this study targets electric vehicles used for regular-route shuttle services, which exhibit consistent driving patterns. All predictions are only based on

SoC (t)

and external variables at time t so the error between

Δ SoC (t + τ)

and

Δ \hat{SoC} (t + τ)

does not accumulate over time, even if there exist estimation errors in

SoC (t)

or sensor measurements. In this paper, therefore,

SoC (t)

relies on Coulomb counting, which is a reasonable reference and a proper ground truth for deep learning models. In addition, the SoC level is recalibrated using the open-circuit voltage (OCV) and the manufacturer’s OCV table before each driving session begins to reset the accumulated bias. Similarly, commercial BMS, such as Orion BMS, estimate SoC by combining Coulomb counting with OCV calibration using an SoC-OCV lookup table.

An overview of the entire workflow of this work is presented in Figure 1. The workflow consists of data collection, preprocessing, model training, and prediction processes. The first step of the workflow is data collection to develop a NN model for SoC prediction. There are two methods for acquiring data for training the NN: one is from physically modeled simulations, and the other is from the real world. The simulation method is typically used for reinforcement learning. However, it is challenging to accurately replicate the real environment unless the dynamic characteristics of the EV and the geographical characteristics of the target course are accurately implemented. Instead of relying on theoretical dynamic models or simulations [11], real-world EV operation data is utilized to ensure the NN model captures complex and realistic driving patterns.

The next step is data preprocessing. The collected signals have different sampling frequencies. For instance, a typical GPS module provides positioning information every one to a few seconds, whereas vehicle speed and acceleration can vary much more rapidly. In contrast, ambient and battery temperatures change only slowly over time. Before training the NN, data with lower sampling rates are linearly interpolated to synchronize the time series. GPS information (latitude, longitude, altitude, speed, and heading), power-related variables (voltage and current), and environmental variables (temperature) were carefully synchronized to form a clean, structured time-series dataset suitable for model training. Additionally, all input features are normalized. The dataset was then divided into training, validation, and test sets to enable fair evaluation of the models’ performance [17]. The trained model targets the shuttle service vehicle, which has fixed, regular routes; it exploits spatial regularity and memorization. It is not a limitation but a practical solution, as this study aims to improve prediction performance under realistic operating conditions for shuttle service vehicles equipped with an embedded platform environment. The model is targeted for an embedded platform with a relatively small model size. Although it is true that the model must be trained for each shuttle service route, its small parameter count enables efficient, centrally controlled updates when the vehicle’s service route changes.

On the modeling side, two deep learning models were developed: one incorporating GPS information and the other excluding it. Both models shared identical data structures and training conditions, differing only in input variables. This allowed for a clear and direct assessment of the contribution of GPS information to SoC prediction performance. By isolating this factor, the workflow aims to rigorously validate the impact of GPS features on improving SoC prediction accuracy.

4. Electric Vehicle Implementation for Data Collection

A renewed lightweight EV, as shown in Figure 2 and previously used in [18], serves to collect data for this work. The EV was originally designed to exploit solar power as a partial power source for EVs. The battery pack, custom battery management system (BMS), and data collection system, including GPS, are redesigned and replaced to collect more accurate information, as described in Section 4.1 and Section 4.2. The EV is front-wheel driven. It is powered by two brushless DC (BLDC) motors, each rated at 1.5 kW (7.17 Nm torque) and a maximum power of 3.0 kW (14.43 Nm torque). The motor controllers are equipped with a regenerative braking system that enables partial recovery of kinetic energy during deceleration. Unlike the previous version, which harvested energy from solar power, the vehicle relies solely on battery power. The vehicle has no additional auxiliary loads such as heating, ventilation, and air conditioning (HVAC) or infotainment systems. This configuration ensures that all recorded energy consumption corresponds to traction power, simplifying the analysis and supporting the accurate training of the NN model. The EV accommodates up to 2 passengers, allowing for training the NN model under various load conditions. Its maximum speed is approximately 50 km/h, which is sufficient to replicate low- to medium-speed driving patterns similar to those of electric shuttle buses on campus. The custom BMS and data collection system are integrated to collect synchronized data on vehicle dynamics (speed, position, altitude) and battery states during operation.

4.1. Battery Pack

The battery pack for the vehicle is designed using newly manufactured cells to replace the aged original battery pack. The new battery pack consists of lithium iron phosphate (LiFePO₄) prismatic cells (LF80 [19]), chosen for their high thermal stability, long cycle life, and enhanced safety compared to other lithium-ion chemistries [20]. Each cell has a nominal capacity of 80 Ah. A total of 24 cells are connected in series, each with an operating voltage range of 2.65 V to 3.65 V, while its nominal voltage is 3.2 V. This configuration yields a nominal pack voltage of 76.8 V and a total energy capacity of approximately 6 kWh. The standard discharge rate of the cell is 1 C, while the maximum discharge rate of the cell is 3 C. The battery provides a maximum of 18 kW, which is sufficient to operate both motors of the EV, with a standard power of 3 kW and a maximum power of 6 kW, even under extreme driving conditions. Although no active cooling is employed, the relatively high thermal stability of LiFePO₄ cells and moderate discharge rates of the EV ensure safe operation without significant battery temperature concerns. The battery pack is newly built with fresh cells, so the state of health is nearly 100%. Additionally, battery aging is a long-term degradation process that typically progresses over months or years; its impact is negligible within the 1- or 10-min prediction horizon considered in this work. The battery pack is depicted in Figure 2.

4.2. Battery Management System (BMS)

The EV is equipped with an in-house BMS, as shown in Figure 2. A BMS typically monitors individual cell voltages, pack voltage, current, and temperature in real time, and provides protection against conditions such as cell-to-cell state-of-charge (SoC) imbalance, over-discharge, and over-charge [21]. In addition, it functions as a data acquisition unit and performs preliminary SoC estimation [22]. In this work, an in-house BMS is developed, since it is difficult to obtain long-term, detailed data, such as individual cell voltages, from commercial BMSs in real-time. Two analog front-end (AFE) LTC6804 devices for accurate cell-voltage measurement with a resolution of 0.1 mV [23], and a 32-bit microcontroller (MCU) STM32F405RGTX for data processing [24] are used to deal with 24 cells. The communication between the MCU and AFEs is performed via an isolated serial peripheral interface (ISOSPI) bus [25] with an electrical transformer. It enables reliable communication across multiple cells while maintaining galvanic isolation.

A secure digital (SD) card logging functionality is also integrated into the BMS, based on the generic FAT filesystem module (FATFS) library [26], enabling long-term data collection of voltage, current, temperature, and GPS information. For on-board SoC estimation, the BMS implements a conventional Coulomb counting method:

SoC (t) = SoC (t_{0}) - \frac{1}{C_{nom}} \int_{t_{0}}^{t} I (τ) d τ,

(2)

where

C_{nom}

is the nominal capacity (80 Ah) and

I (τ)

is the measured current.

In addition to SoC estimation, the BMS supports fundamental safety features including over-voltage, over-current, and over-temperature protection, as well as passive balancing to maintain cell-to-cell voltage consistency. For the safety functionality, mechanical relays with dry contacts are installed in the BMS. The relays can control the main contactor, which can handle the high voltage and high current of the pack. The integration of GPS data with battery parameters within the BMS enables synchronized recording of vehicle dynamics and battery states, providing the comprehensive dataset used for training and evaluation of the proposed prediction models. The SoC estimated by the BMS serves as the ground truth for

SoC (t)

in our deep learning models.

4.3. GPS Module

For GPS data collection, a u-blox NEO-6M GPS module [27] is used, which is a widely adopted high-performance solution for embedded systems. It is configured to 1 Hz for data logging and post-processing, although the module supports an update rate of up to 5 Hz. The manufacturer specifies accuracies of 2.5 m for horizontal positioning, 0.1 m/s for velocity, and 0.5° for heading. Due to the clear-sky conditions along the driving path, the module is expected to find the satellite in 27 s. Raw data is logged as the national marine electronics association 0183 (NMEA-0183) protocol [28]. Real-time kinematic (RTK) positioning is not employed in this study, as the GPS precision is enough for the scale of the driving path. The technical specifications of the module are summarized in Table 2.

5. Deep Learning Models

This study applies three representative deep learning architectures for time-series prediction of

Δ \hat{SoC} (t + τ)

: the GRU, the LSTM, and the Transformer. The GRU is a recurrent NN that learns temporal dependencies using gating mechanisms [29]. The GRU architecture was adopted due to its ability to model temporal dependencies while using fewer parameters than LSTM, enabling efficient sequence learning. The LSTM is designed to capture long-term dependencies through input, forget, and output gates [30]. The LSTM network, originally introduced to address the vanishing gradient problem and capture long-term dependencies, was used as a baseline recurrent model.

Transformer models address the limitations of recurrent architectures by capturing long-range dependencies more effectively through self-attention [31]. Unlike RNN-based models, they process the entire input sequence in parallel and do not rely on recurrence, enabling better scalability to long-horizon forecasting tasks. The input features are projected into an embedding space and combined with positional encodings to retain sequential information. The Transformer encoder, composed of multi-head self-attention layers and feed-forward networks, has shown strong performance in long-term time-series prediction [32].

The SoC prediction problem is formulated as a sequence-to-one regression task. The model

f_{θ} (X_{t}) = {\hat{y}}_{t + τ}

predicts the future SoC change

{\hat{y}}_{t + τ}

given an input window

X_{t} = [x_{t - L + 1}, \dots, x_{t}] \in R^{L \times d}

, where L,

τ

,

θ

, and d denote the input sequence length, prediction horizon, the model parameters, and number of total input features, respectively. The input features consist of SoC, each of the 24 cell voltages, temperature, current, and five GPS variables (latitude, longitude, altitude, speed, and heading). The target output is the future-ahead SoC change,

Δ \hat{SoC} (t + τ)

. In this work, the model is trained to predict one- and ten-minute horizons SoC with a 60-s input sequence length. Once

Δ \hat{SoC} (t + τ) = {\hat{y}}_{t + τ}

is predicted, future-ahead

\hat{SoC} (t + τ)

is calculated as declared in Equation (1).

All models shared the same preprocessing and training procedures. Feature-wise min–max normalization is applied using statistics from the training set and reused for the validation and test sets. The test set is constructed by selecting the longest driving log, not merely for its duration, but because it covers the widest SoC range and includes diverse driving patterns, such as repeated stop-and-go events and steady-speed segments. This enables a more realistic evaluation. Additionally, the dataset is partitioned at the operation level to avoid temporal leakage that could arise from random sample-level partitioning. The models are trained to minimize the mean squared error (MSE) between predicted and ground truth values, with early stopping applied to prevent overfitting. For each model, two configurations were trained and evaluated under identical conditions: (1) a GPS-enabled version using all 32 input features, and (2) a GPS-ablated version excluding the five GPS channels (

d = 27

). This comparison isolates the contribution of GPS information to short-term ΔSoC prediction accuracy. The hyperparameter settings are summarized in Table 3.

6. Experimental Setup

The driving conditions are carefully designed to prevent the model from overfitting to specific driving patterns by intentionally introducing a variety of driving scenarios. For this purpose, several parameters are systematically varied. The driving course, depicted in Figure 3, is a closed-loop circuit designed to emulate a shuttle vehicle operation. The course includes a balanced combination of uphill, downhill, and flat segments.

Each driving session continues under various battery SoC levels, allowing data to be collected over a wide range of operating conditions. The number of passengers is varied to simulate different load conditions, and the vehicle is occasionally guided to deviate from the designated route to diversify GPS trajectories. Intermittent stops are introduced during driving to emulate real-world shuttle driving behavior and sudden load variations. Furthermore, the experiments alternate between driving with and without charging to reflect irregular patterns of energy consumption. The distribution of variables such as different passenger numbers, route deviations, and parking duration mainly affects vehicle energy consumption by altering electrical behavior, such as current and voltage profiles, which are directly included as model inputs. Therefore, their influence is implicitly captured in the dataset. Even though the route was fixed due to the shuttle service scenario, the factors were not synchronized across drivings. Therefore, although the experimental design is not fully orthogonal, it provides sufficient practical diversity for evaluating the proposed model under realistic operating conditions. Consequently, these strategies ensure that the collected signals, including GPS, voltage, current, and SoC, capture realistic variations across multiple driving conditions. A total of 15 driving sessions are conducted for Course A, each lasting between 1 and 4 h, resulting in more than 35 h of real-world driving data. Additionally, a 30-min driving session is conducted for Course B to validate the effectiveness of the proposed process on a separate route.

For model training, the input sequence

X_{t}

is constructed using a sliding window of 60 s:

X_{t} = [x_{t - 59}, \dots, x_{t}] \in R^{60 \times d} .

(3)

This sequence provides temporal context over the past 60 s, enabling the model to learn the relationship between past operating conditions and future SoC changes. The corresponding label

y_{t + 60}

represents the change in SoC in the next 60 s. By leveraging temporal dependencies, the model can learn patterns such as load fluctuations, driving dynamics, and energy usage trends that are not captured by single time-point inputs.

Before model training, comprehensive data preprocessing is executed, especially from the GPS module. Algorithm 1 summarizes the preprocessing and alignment procedure used to construct time-synchronized multiple inputs. First, for all data sets, timestamps are converted to monotonic time to correct potential timestamp discontinuity between multiple driving sessions. Raw GPS data is first parsed and decoded to extract feature tuples consisting of latitude, longitude, altitude, speed, and heading. After gathering meaningful information from the raw data, interpolation and smoothing are conducted, since GPS signals contain intermittent missing values and spikes due to GPS limitations.

First, invalid points are masked based on speed and acceleration thresholds of the EV, and Hampel filtering [33] is used to remove spikes. Then, missing segments shorter than a predefined gap are linearly interpolated, while longer gaps remain unfilled to avoid artificial data. After that, a Savitzky–Golay filter [34] is then applied to smooth the trajectory. Finally, altitude, speed, and heading were interpolated using Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) [35] for smooth nonlinear interpolation and circular interpolation for heading angles.

Since each sensor had a different sampling rate (e.g., voltage: 220 ms/SoC, current, and temperature: 10 ms/GPS: 1 s), the GPS timestamp is chosen as the reference. Using the processed GPS sequence as the temporal reference, BMS signals including voltage, current, temperature, and SoC are aligned via nearest-neighbor matching within a predefined tolerance (

\pm 50

ms for voltage,

\pm 15

ms for SoC, and

\pm 15

ms for current, respectively). The moving windows (size 5) are applied to the acquired battery voltage and current data. This approach ensures data size and consistent time alignment across all features.

For Course A, each driving record is treated as an independent sequence source, and data splitting is performed at the file level to prevent temporal leakage. Because the length of each driving session varies, the number of raw samples (rows) differs across files. For a sequence length of 60 s, a file with N rows produces

N - 59

training samples. Table 4 shows the detailed information on dataset composition. For Course B, a single driving data set is included to augment the Course A data set and train a new model. Each input sequence consists of 60 s of data sampled at 1 Hz.

Algorithm 1: Preprocessing and alignment procedure to construct time-synchronized multi-sensor inputs

7. Experimental Results

Three model architectures—GRU, LSTM, and Transformer—are trained and evaluated under two configurations: (1) without GPS data (voltage, current, temperature, and SoC only), and (2) with GPS data (including latitude, longitude, altitude, speed, and heading). The models are implemented in PyTorch (v2.7.1) and optimized using the Adam optimizer with early stopping based on validation loss. Model performance is assessed using the Mean Absolute Error (MAE) for both one-minute and ten-minute SoC predictions.

For all models, the optimal number of layers and hidden sizes are determined to maximize model performance. The grid search is performed manually for the GRU and LSTM models. In contrast, the Transformer model employs Optuna for automatic hyperparameter optimization [36], where parameters such as

d_{model}

, number of attention heads, feedforward dimension, and dropout rate are tuned. Due to the large hyperparameter space of the Transformer architecture, exhaustive manual tuning (e.g., varying only the number of layers and hidden dimensions) is not practical. The reported Transformer result corresponds to the best-performing configuration selected via Optuna-based hyperparameter optimization after evaluating multiple candidate architectures. This procedure ensures that the Transformer baseline is not under-tuned or arbitrarily configured, but instead represents a competitive state-of-the-art sequence modeling approach. The optimal Transformer configuration selected by Optuna achieves

d_{model} = 128

,

n_{head} = 8

, and a feedforward dimension of 256 with a dropout rate of 0.17156254, as shown in Table 5.

7.1. One-Minute Prediction Results for Course A

Table 6 summarizes the performance comparison in terms of MAE and root mean squared error (RMSE) for one-minute SoC prediction for Course A. In the case of mean absolute percentage error (MAPE), it exhibits excessively large values due to near-zero values of

Δ \hat{S o C}

in the sample. Therefore, MAE (and RMSE) are used as the primary performance metrics. Across all architectures, the inclusion of GPS data consistently reduced prediction error. The best GRU performance is achieved with a single layer and a hidden size of 16, yielding a mean absolute error (MAE) of 0.3017 with GPS compared with 0.3918 without GPS, representing an improvement of approximately 23%. Similarly, the best LSTM configuration (two layers and a hidden size of 16) achieves an MAE of 0.2988 with GPS, outperforming the version without GPS (0.3456) by approximately 13.6%. The Transformer model, tuned using Optuna, achieves an MAE of 0.3592 with GPS, which outperforms the result without GPS (0.4312), showing a relative improvement of 16.7%. The time-series prediction

Δ \hat{SoC} (t + 60)

result is illustrated in Figure 4. As shown, the model with GPS information more closely follows the actual

Δ \hat{SoC} (t + 60)

compared with the model without GPS data. The one-minute performance comparison under the low-SoC condition (20–40%) for Course A is summarized in Table 7.

7.2. Ten-Minute Prediction Results for Course A

Ten-minute ahead SoC prediction for Course A results are summarized in Table 8. As the prediction horizon increases, overall accuracy also decreases. However, models trained with GPS information still maintain superior accuracy, indicating a substantial improvement in longer horizon prediction accuracy. For the GRU model, the best configuration with GPS information (two layers, hidden size 16) achieves a MAE of 0.6741, representing a 76% reduction compared with the model without GPS information (2.8065). The LSTM model demonstrates the most substantial improvement, with the best configuration (two layers, hidden size 16) achieving an MAE of 0.3887 with GPS information, compared with 1.2679 without GPS information, representing an improvement of approximately 69.3%. The Transformer also benefits from GPS data, reducing the MAE from 1.6327 to 0.7070 in the configuration with one layer and 16 hidden units, a 56.7% reduction in prediction error. These results demonstrate that incorporating GPS data improves ten-minute SoC prediction accuracy by roughly 56.7–76%, depending on the model architecture. In detail, Figure 5 and Figure 6 illustrate the prediction results over time. Furthermore, the absolute error distribution graph in Figure 7 illustrates that the probability density is significantly concentrated within the low-absolute error range, demonstrating the high predictive accuracy and reliability of the proposed model. Without GPS information, the model tends to be inaccurate, particularly when the remaining battery SoC is low. In contrast, the model with GPS information closely follows the actual SoC trajectory, resulting in a more accurate ten-minute prediction. The ten-minute prediction performance comparison under the low-SoC condition (20–40%) for Course A is summarized in Table 9.

7.3. Ablation Study on GPS Features

On the other hand, the ablation study is performed to investigate the contribution of individual GPS features to one-minute and ten-minute SoC prediction, as shown in Table 10. In each evaluation, a specific GPS component from the input features is removed. The resulting performance is compared with both the full GPS model and a configuration without any GPS information. For the one-minute prediction task using the LSTM (2,16) model, removing latitude and longitude increases the MAE from 0.2988 to 0.3290. Removing speed also leads to noticeable degradation. In contrast, removing altitude and heading results in only marginal performance changes. These results indicate that GPS information provides a moderate performance benefit for one-minute horizon SoC prediction, while no single GPS component is critically dominant. For the ten-minute prediction task using the GRU (2,16) model, the impact of GPS features becomes substantially more pronounced. Removing latitude and longitude results in the most severe degradation, increasing the MAE from 0.6741 to 1.7162. When all GPS features are removed, the MAE increases to 2.8065, highlighting the critical importance of GPS information for reliable long-term SoC prediction. Overall, the ablation results demonstrate that the contribution of GPS features strongly depends on the prediction horizon. While one-minute SoC prediction benefits moderately from GPS inputs, ten-minute horizon prediction relies more heavily on spatial information, particularly latitude and longitude, and degrades severely in the absence of GPS features.

7.4. Computational Costs of the Proposed Models

Table 11 summarizes the computational cost of the proposed models. The one-minute-ahead LSTM model exhibits a mean inference latency of 0.719 ms with 5425 parameters, while the 10-min-ahead GRU model requires 2.671 ms with 4081 parameters. Although the GRU model shows a slightly higher latency due to the longer prediction horizon, its model size remains extremely compact (15.6 KB). The latency was measured on a PC running an Intel i5-7500 CPU using a Python (v3.12.4) environment. Generally, the i5-7500 CPU is approximately 200 times faster than MCUs, such as the ARM Cortex-M7 (e.g., STM32F746ZGT). However, considering that C-compiled code typically executes at least ten times faster than Python, the estimated latency on an embedded platform would be no more than 20 times that measured on the PC. Typical battery management system (BMS) control cycles range from 100 ms to 1 s; the estimated embedded latency (approx. 14.4 ms to 53.4 ms) is well within the real-time constraints. Therefore, the proposed model is expected to operate within embedded BMS platforms.

7.5. Course B Results

On Course A, the model achieved an MAE of 0.2988 and an RMSE of 0.3568. When evaluated on Course B, the prediction errors increased even after retraining, reflecting the higher variability and complexity of the new driving environment. Specifically, the GPS-enabled model achieved an MAE of 0.3743 and an RMSE of 0.4754, whereas the GPS-disabled model yielded an MAE of 0.5316 and an RMSE of 0.6209 on Course B. Figure 8 shows that the model with GPS follows the ground truth more accurately compared to the model without GPS, incorporating GPS information reduced the MAE by approximately 29.6% and the RMSE by 23.4% on Course B. These results indicate that although the new track introduces more challenging driving patterns, GPS features consistently improve prediction accuracy by capturing driving context information, such as vehicle speed variations and road elevation changes. Consequently, GPS information enhances the SoC prediction model even when the training data are augmented with newly collected driving data.

Overall, the results show that integratGPS information reduced the MAing GPS-based contextual features consistently reduces prediction error for both one- and ten-minute SoC prediction horizons. Furthermore, models regardless of GPS input commonly exhibit an increased error rate toward the end of the time series. This suggests that the models tend to overfit to temporal stamps or SoC levels rather than capturing the specific characteristics of the driving course or external environment. The observed improvements are more evident in longer prediction horizons, particularly under dynamic driving conditions.

8. Conclusions

This study presents a deep learning-based framework for predicting the future State of Charge (SoC) in electric vehicles, especially for the shuttle service, with a particular focus on the impact of incorporating GPS-derived information. By collecting real-world data from a self-built electric vehicle equipped with a 24-cell LiFePO₄ battery pack and a custom-designed battery management system (BMS), it is demonstrated that contextual GPS features such as speed, location, and altitude can significantly enhance prediction accuracy.

Three representative neural architectures—GRU, LSTM, and Transformer—are trained and evaluated under both configurations with and without GPS data. Across all models, the inclusion of GPS data leads to notable MAE improvements in SoC prediction performance: up to 23% for one-minute and up to 76% for ten-minute predictions. These results indicate that environmental features, such as spatial and velocity cues derived from GPS, are correlated with vehicle power consumption, thereby contributing to improved forecasting performance of future battery states. The findings highlight the practical effectiveness of integrating GPS information into data-driven SoC prediction frameworks for electric shuttle buses and similar route-based applications. Beyond performance improvements, such models enhance the scope for predictive energy management strategies that support optimal route planning, adaptive power control, and proactive charging scheduling.

Although not the primary focus of this work, integrating physics-informed (e.g., electrochemical) or hybrid modeling approaches could provide a complementary solution for extending the proposed framework to non-repetitive driving scenarios. Furthermore, our approach focuses on a small-footprint model that fits each individual driving course rather than a generalized feature. Consequently, retraining is required when deploying the model to entirely new driving routes or new vehicles. Also, it does not predict the long-term future changes, such as battery capacity degradation (e.g., State of Health, SoH). Integrating such long-term degradation models with our short-term SoC prediction, as well as developing a generalized model that minimizes retraining, remains a promising direction for future research.

Author Contributions

Conceptualization, D.S. and J.K.; methodology, J.K.; software, W.J. and K.P.; validation, W.J. and J.K.; formal analysis, J.K.; investigation, W.J. and K.P.; resources, D.S. and J.K.; data curation, D.S. and J.K.; writing—original draft preparation, W.J., K.P. and J.K.; writing—review and editing, D.S. and J.K.; visualization, W.J. and K.P.; supervision, D.S. and J.K.; project administration, D.S. and J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2021R1F1A1060208 and RS-2023-00208046) and the Starting growth Technological R&D Program (TIPS Program, (No. RS-2023-00261771)) funded by the Ministry of SMEs and Startups (MSS, Korea) in 2026.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

This manuscript includes content that will also appear in the author’s graduate thesis.

Conflicts of Interest

Author Kyuyong Park was employed by the company dSPECTER. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Lin, S.L. Deep learning-based state of charge estimation for electric vehicle batteries: Overcoming technological bottlenecks. Heliyon 2024, 16, e35780. [Google Scholar] [CrossRef] [PubMed]
Rahman, M.A.; Holt, D.; Farajpour, Y.; Mammeri, A.; Khiabani, H. Predictive modeling of energy demands for battery electric buses using real-world data. Energy Inform. 2025, 8, 64. [Google Scholar] [CrossRef]
Zhu, Q.; Huang, Y.; Lee, C.F.; Liu, P.; Zhang, J.; Wik, T. Predicting Electric Vehicle Energy Consumption from Field Data Using Machine Learning. IEEE Trans. Transp. Electrif. 2025, 11, 2120. [Google Scholar] [CrossRef]
Mądziel, M. State of Charge Prediction for Li-Ion Batteries in EVs for Traffic Microsimulation. Energies 2025, 18, 4992. [Google Scholar] [CrossRef]
Huang, Y.; Wik, T.; Finegan, D.; Li, Y.; Zou, C. Learning chemical potentials and parameters from voltage data for multi-phase battery modeling. ChemRxiv 2025. [Google Scholar] [CrossRef]
Guo, W.; Vilsen, S.B.; Li, Y.; Verma, A.; Stroe, D.I.; Brandell, D. Uncovering the impact of battery design parameters on health and lifetime using short charging segments. Energy Environ. Sci. 2025, 18, 8462–8474. [Google Scholar] [CrossRef]
Liu, K.; Yamamoto, T.; Morikawa, T. Impact of road gradient on energy consumption of electric vehicles. Transp. Res. Part D Transp. Environ. 2017, 54, 74–83. [Google Scholar] [CrossRef]
El Fallah, S.; Kharbach, J.; Vanagas, J.; Vilkelytė, Ž.; Tolvaišienė, S.; Gudžius, S.; Kalvaitis, A.; Lehmam, O.; Masrour, R.; Hammouch, Z.; et al. Advanced State of Charge Estimation Using Deep Neural Network, Gated Recurrent Unit, and Long Short-Term Memory Models for Lithium-Ion Batteries under Aging and Temperature Conditions. Appl. Sci. 2024, 14, 6648. [Google Scholar] [CrossRef]
Ariche, S.; Boulghasoul, Z.; El Ouardi, A.; Elbacha, A.; Tajer, A.; Espié, S. A Comparative Study of Electric Vehicles Battery State of Charge Estimation Based on Machine Learning and Real Driving Data. JLPEA 2024, 14, 59. [Google Scholar] [CrossRef]
Petkevicius, L.; Saltenis, S.; Civilis, A.; Torp, K. Probabilistic Deep Learning for Electric-Vehicle Energy-Use Prediction. In Proceedings of the 17th International Symposium on Spatial and Temporal Databases (SSTD 2021); ACM: New York, NY, USA, 2021; pp. 85–95. [Google Scholar] [CrossRef]
Hong, S.; Hwang, H.; Kim, D.; Cui, S.; Joe, I. Real Driving Cycle-Based State of Charge Prediction for EV Batteries Using Deep Learning Methods. Appl. Sci. 2021, 11, 11285. [Google Scholar] [CrossRef]
Tang, P.; Jiang, M.; Xu, W.; Ding, Z.; Lv, M. Prediction of lithium-ion battery SOC based on IGA-GRU and the fusion of multi-head attention mechanism. Energy Inform. 2024, 7, 147. [Google Scholar] [CrossRef]
Obuli, P.D.; Preethem, S.B.; Indragandhi, V.; Vedhanayaki, S. Enhanced SOC estimation of lithium ion batteries with Real-Time data using machine learning algorithms. Sci. Rep. 2024, 14, 16036. [Google Scholar] [CrossRef]
De Cauwer, C.; Verbeke, W.; Coosemans, T.; Faid, S.; Van Mierlo, J. A Data-Driven Method for Energy Consumption Prediction and Energy-Efficient Routing of Electric Vehicles in Real-World Conditions. Energies 2017, 10, 608. [Google Scholar] [CrossRef]
NaitMalek, Y.; Najib, M.; Lahlou, A.; Bakhouya, M.; Gaber, J.; Essaaidi, M. A Hybrid Approach for State-of-Charge Forecasting in Battery-Powered Electric Vehicles. Sustainability 2022, 14, 9993. [Google Scholar] [CrossRef]
Liu, C.; Pei, L. Accurate Multi-Step State of Charge Prediction for Electric Vehicle Batteries Using the Wavelet-Guided Temporal Feature Enhanced Informer. Appl. Sci. 2025, 15, 1431. [Google Scholar] [CrossRef]
El-Sayed, E.I.; ElSayed, S.K.; Alsharef, M. Data-Driven Approaches for State-of-Charge Estimation in Battery Electric Vehicles Using Machine and Deep Learning Techniques. Sustainability 2024, 16, 9301. [Google Scholar] [CrossRef]
Kim, J.; Baek, D.; Hong, J.; Chang, N. Partially solar powered full electric vehicles. In Proceedings of the 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS); IEEE: Piscataway, NJ, USA, 2014; pp. 358–361. [Google Scholar] [CrossRef]
EVE Energy Co., Ltd. LF80 LiFePO₄ Prismatic Battery Cell (80 Ah) Specifications. Available online: https://www.energy-x.org/lifepo4-cell/eve-lf80-80ah-lifepo4-battery-cell.html (accessed on 8 February 2026).
Chen, T.; Li, M.; Bae, J. Recent Advances in Lithium Iron Phosphate Battery Technology: A Comprehensive Review. Batteries 2024, 10, 424. [Google Scholar] [CrossRef]
Habib, A.K.M.A.; Hasan, M.K.; Issa, G.F.; Singh, D.; Islam, S.; Ghazal, T.M. Lithium-Ion Battery Management System for Electric Vehicles: Constraints, Challenges, and Recommendations. Batteries 2023, 9, 152. [Google Scholar] [CrossRef]
Sarda, J.; Patel, H.; Popat, Y.; Hui, K.L.; Sain, M. Review of Management System and State-of-Charge Estimation Methods for Electric Vehicles. World Electr. Veh. J. 2023, 14, 325. [Google Scholar] [CrossRef]
Analog Devices, Inc. LTC6804-1/LTC6804-2 Multicell Battery Monitor. Available online: https://www.analog.com/en/products/ltc6804-1.html (accessed on 8 February 2026).
STMicroelectronics. STM32F405RG High-Performance ARM Cortex-M4 Microcontroller. Available online: https://www.st.com/en/microcontrollers-microprocessors/stm32f405rg.html (accessed on 8 February 2026).
Analog Devices, Inc. Isolated SPI Communication Made Easy (isoSPI). Available online: https://www.analog.com/en/resources/technical-articles/isolated-spi-communication-made-easy.html (accessed on 8 February 2026).
ChaN. FAT File System Module (FATFS). Available online: https://elm-chan.org/fsw/ff/ (accessed on 8 February 2026).
u-blox AG. NEO-6 GPS Modules Data Sheet. Available online: https://www.u-blox.com/sites/default/files/products/documents/NEO-6_DataSheet_%28GPS.G6-HW-09005%29.pdf (accessed on 8 February 2026).
National Marine Electronics Association (NMEA). NMEA 0183 Standard. Available online: https://www.nmea.org/nmea-0183.html (accessed on 8 February 2026).
Ahmadzadeh, E.; Kim, H.; Jeong, O.; Kim, N.; Moon, I. A Deep Bidirectional LSTM-GRU Network Model for Automated Ciphertext Classification. IEEE Access 2022, 10, 3228–3237. [Google Scholar] [CrossRef]
Krichen, M.; Mihoub, A. Long Short-Term Memory Networks: A Comprehensive Survey. AI 2025, 6, 215. [Google Scholar] [CrossRef]
Fan, J.; Wang, B.; Bian, D. TEDformer: Temporal Feature Enhanced Decomposed Transformer for Long-Term Series Forecasting. IEEE Access 2025, 13, 120821–120829. [Google Scholar] [CrossRef]
Wang, N.; Zhao, X. Enformer: Encoder-Based Sparse Periodic Self-Attention Time-Series Forecasting. IEEE Access 2023, 11, 112004–112014. [Google Scholar] [CrossRef]
Hampel, F.R. The Influence Curve and Its Role in Robust Estimation. J. Am. Stat. Assoc. 1974, 69, 383–393. [Google Scholar] [CrossRef]
Sadeghi, M.; Behnia, F.; Amiri, R. Window Selection of the Savitzky–Golay Filters for Signal Recovery from Noisy Measurements. IEEE Trans. Instrum. Meas. 2020, 69, 5418–5427. [Google Scholar] [CrossRef]
He, J.; Yuan, L.; Lei, H.; Wang, K.; Weng, Y.; Gao, H. A Novel Piecewise Cubic Hermite Interpolating Polynomial-Enhanced Convolutional Gated Recurrent Method under Multiple Sensor Feature Fusion for Tool Wear Prediction. Sensors 2024, 24, 1129. [Google Scholar] [CrossRef]
Kumar Sahu, P.; Fatma, T. Optimized Breast Cancer Classification Using PCA-LASSO Feature Selection and Ensemble Learning Strategies With Optuna Optimization. IEEE Access 2025, 13, 35645–35661. [Google Scholar] [CrossRef]

Figure 1. An overview of the proposed process of SoC prediction, including data collection, modeling, and analysis.

Figure 2. Electric vehicle for data collection: custom electric vehicle used to collect real-world data, battery pack for the EV, and in-house battery management system for the EV.

Figure 3. Driving courses A and B, used for data collection and evaluation in the experiment. To ensure route-specific optimization, a separate model is trained for each course.

Figure 4. One-minute look-forward

Δ \hat{SoC} (t + 60)

Course A prediction result of LSTM model (two-layer with 16 hidden size). Note that the Y-axis scales differ to emphasize the variation details at each horizon.

Figure 4. One-minute look-forward

Δ \hat{SoC} (t + 60)

Course A prediction result of LSTM model (two-layer with 16 hidden size). Note that the Y-axis scales differ to emphasize the variation details at each horizon.

Figure 5. Ten-minute look-forward

Δ \hat{SoC} (t + 600)

Course A prediction result of GRU model (two-layer with hidden size 16). Note that the Y-axis scales differ to emphasize the variation details at each horizon.

Figure 5. Ten-minute look-forward

Δ \hat{SoC} (t + 600)

Course A prediction result of GRU model (two-layer with hidden size 16). Note that the Y-axis scales differ to emphasize the variation details at each horizon.

Figure 6. Ten-minute look-forward

\hat{SoC} (t + 600)

prediction.

Figure 6. Ten-minute look-forward

\hat{SoC} (t + 600)

prediction.

Figure 7. The absolute-error distribution graphs of one- and ten- minute look-forward prediction

Δ \hat{SoC}

of Course A.

Figure 7. The absolute-error distribution graphs of one- and ten- minute look-forward prediction

Δ \hat{SoC}

of Course A.

Figure 8. One- and ten- minute look-forward

Δ \hat{SoC}

prediction results for Course B. The model trained with GPS data shows more accurate tracking of the ground truth (black line). Note that the Y-axis scales differ to emphasize the variation details at each horizon.

Figure 8. One- and ten- minute look-forward

Δ \hat{SoC}

prediction results for Course B. The model trained with GPS data shows more accurate tracking of the ground truth (black line). Note that the Y-axis scales differ to emphasize the variation details at each horizon.

Table 1. Qualitative comparison of representative SoC prediction studies, including our proposed method.

Study (Year)	Input Features	Target	Horizon	Scenario	Model
NaitMalek (2022) [15]	Speed, Accel, Road type,	Future SoC	Up to 180 s	Urban	Hybrid
NaitMalek (2022) [15]	GPS, Weather, Traffic		Up to 180 s	driving	(LSTM + Phys.)
Hong (2021) [11]	V, I, Temp, Predicted speed	Future SoC	Next cycle	Real cycle	TA-LSTM +
Hong (2021) [11]	(GPS used for segmentation)		Next cycle	Real cycle	FMI Simul.
Liu (2025) [16]	SoC, V, I, Driver behavior,	Multi-step	60 s/1200 s	Real-driving	WG-TFE-Informer
Liu (2025) [16]	Temp, Terrain, Weather	SoC	60 s/1200 s	(1000 EVs)	WG-TFE-Informer
Our Work	SoC, V, I, Temp,	Future SoC	60 s/600 s	Real-driving	GRU, LSTM,
Our Work	GPS (Directly)	(by ΔSoC)	60 s/600 s	shuttle	Transformer

Table 2. Technical specifications of the U-blox NEO-6M GPS module.

Parameter	Specification
Receiver type	50-channel u-blox 6 engine GPS L1 frequency
Navigation update rate	Up to 5 Hz
Horizontal position accuracy	2.5 m
Velocity accuracy	0.1 m/s
Heading accuracy	0.5°
Sensitivity (Tracking)	−161 dBm
Time-To-First-Fix (Cold Start)	27 s
Supply voltage	2.7 V to 3.6 V

Table 3. Hyperparameter settings used for the deep learning.

Hyperparameter	Value
Sequence Length (SEQ_LEN)	60
Epochs	300
Early Stopping Patience	30
Dropout	0.2
Learning Rate	$1 \times 10^{- 4}$
Batch Size	256
Optimizer	Adam
Loss Function	MSELoss
Validation Split	10%
Normalization	Min–Max
Device	Geforce RTX 5060ti
Random Seed	42

Table 4. Detailed dataset composition of Course A after preprocessing.

Split	Files Used	#Files	Rows per File	Total Rows	Time Step	Sequences
Training	3–15	13	784–8045	35,503	60 s	34,736
Validation	2	1	8259	8259	60 s	8200
Test	1	1	9441	9441	60 s	9382
Total	1–15	15	784–9441	53,203	–	52,318

Table 5. Hyperparameters and search space for Transformer optimized by Optuna.

Hyperparameter	Search Space/Distribution	Best Value
$d_model$	{32, 64, 128}	128
$n_head$	{2, 4, 8}	8
$num_layers$	{1, 2, 3, 4}	4
$\dim_feedforward$	{64, 128, 256}	256
dropout	Uniform [0.1, 0.5]	0.17156254
lr	LogUniform [ $10^{- 5}$ , $10^{- 3}$ ]	0.00024761
$batch_size$	{128, 256, 512}	256

Table 6. Accuracy comparison of one-minute SoC prediction for Course A. The results are MAE and RMSE, where lower values indicate better accuracy.

Layer	Hidden	GRU (MAE)		GRU (RMSE)		LSTM (MAE)		LSTM (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS
1	16	0.3017	0.3918	0.3801	0.4723	0.3607	0.3888	0.4405	0.4733
	32	0.3066	0.5177	0.3824	0.6460	0.3121	0.3820	0.3710	0.4601
	64	0.3142	0.3358	0.3935	0.3994	0.3182	0.4235	0.3821	0.5167
2	16	0.3200	0.3385	0.3921	0.4032	0.2988	0.3456	0.3568	0.4069
	32	0.3164	0.3385	0.3873	0.3982	0.3167	0.3432	0.3889	0.4068
	64	0.3330	0.3377	0.4043	0.4067	0.3152	0.3404	0.3870	0.4045
3	16	0.3440	0.3564	0.4209	0.4294	0.3121	0.3397	0.3891	0.4024
	32	0.3337	0.3481	0.4129	0.4185	0.3121	0.3397	0.3891	0.4027
	64	0.3195	0.3939	0.3755	0.4757	0.3121	0.3397	0.3891	0.4024
Layer	Hidden	Trans. (MAE)		Trans. (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS
4	–	0.3592	0.4312	0.4298	0.5271

Table 7. Accuracy comparison of one-minute SoC prediction in low-SoC state (20–40%) for Course A. The results are MAE and RMSE, where lower values indicate better accuracy.

Layer	Hidden	GRU (MAE)		GRU (RMSE)		LSTM (MAE)		LSTM (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS
1	16	0.4294	0.4082	0.5234	0.4944	0.4213	0.4945	0.5327	0.5975
	32	0.4264	0.8581	0.5280	0.9794	0.3152	0.4173	0.3682	0.5131
	64	0.4481	0.3649	0.5475	0.4335	0.3854	0.5341	0.4500	0.6407
2	16	0.3334	0.3497	0.4330	0.4169	0.3263	0.3563	0.3837	0.4117
	32	0.3726	0.3521	0.4614	0.4166	0.3798	0.3631	0.4541	0.4299
	64	0.3574	0.3445	0.4485	0.3998	0.3591	0.3605	0.4309	0.4318
3	16	0.4051	0.3721	0.4917	0.4532	0.3333	0.3442	0.4081	0.4058
	32	0.4091	0.3780	0.4979	0.4575	0.3337	0.3445	0.4088	0.4060
	64	0.3393	0.5399	0.3961	0.6370	0.3385	0.3499	0.4122	0.4121
Layer	Hidden	Trans. (MAE)		Trans. (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS
4	–	0.4529	0.7007	0.5426	0.7906

Table 8. MAE and RMSE results for Ten-minute SoC prediction for Course A. Incorporating GPS data consistently improves longer-horizon prediction performance.

Layer	Hidden	GRU (MAE)		GRU (RMSE)		LSTM (MAE)		LSTM (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS
1	16	0.7706	0.9005	1.0120	1.1378	0.5810	1.5490	0.8159	2.4291
	32	1.4106	1.5049	1.7867	1.9332	1.1080	1.1770	1.4145	1.5510
	64	1.5110	2.9054	1.9100	4.0237	1.3745	2.3171	1.7444	3.4544
2	16	0.6741	2.8065	0.9311	3.9276	0.3887	1.2679	0.5500	1.6161
	32	1.0403	0.7022	1.3285	0.9102	0.4605	1.0242	0.6118	1.3876
	64	0.6982	2.6268	0.9267	3.9125	0.5151	1.1278	0.6488	1.5192
3	16	1.3500	1.4122	1.8747	0.5822	0.4206	1.4023	0.5394	0.5780
	32	1.1038	1.3822	1.4115	0.5158	0.4485	1.8702	0.6000	2.4119
	64	1.4429	1.9351	1.8350	3.2006	1.3741	1.4381	2.0132	1.9285
Layer	Hidden	Trans. (MAE)		Trans. (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS
4	–	0.7070	1.6327	0.9323	2.241

Table 9. MAE and RMSE results for Ten-minute SoC prediction in low-SoC state (20–40%) for Course A. Incorporating GPS data consistently improves longer-horizon prediction performance.

Layer	Hidden	GRU (MAE)		GRU (RMSE)		LSTM (MAE)		LSTM (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS	w GPS	wo GPS
1	16	1.1052	1.5392	1.2902	1.6888	0.9506	5.3640	1.3913	5.4574
	32	2.7704	3.3020	2.8252	3.3792	2.0791	2.7610	2.2737	2.8311
	64	3.0423	7.7592	3.1200	7.8363	2.8629	7.3916	2.9458	7.5199
2	16	1.2493	7.8053	1.3889	7.9371	0.3764	2.5808	0.5102	2.6223
	32	2.0623	1.3537	2.1522	1.4431	0.4049	2.3879	0.5363	2.4338
	64	1.0202	8.4447	1.1926	8.5934	0.8248	2.8617	0.9216	2.9611
3	16	3.3324	0.3821	3.5413	0.5508	0.5363	0.3872	0.6031	0.5554
	32	2.2908	0.6318	2.3997	0.7234	0.4674	3.9896	0.6208	4.0198
	64	3.0668	7.5146	3.1376	7.7261	3.9490	3.5180	4.1272	3.5716
Layer	Hidden	Trans. (MAE)		Trans. (RMSE)
Layer	Hidden	w GPS	wo GPS	w GPS	wo GPS
4	–	1.2258	4.2406	4.2406	4.2950

Table 10. Ablation study results of GPS features for one-minute and ten-minute SoC prediction.

GPS Feature Removed	One-min. (LSTM 2,16)		Ten-min. (GRU 2,16)
GPS Feature Removed	MAE	RMSE	MAE	RMSE
All feature	0.2988	0.3568	0.6741	0.9311
without lat., lon.	0.3290	0.3882	1.7162	2.1464
without alt.	0.3031	0.3642	0.7220	1.0848
without speed	0.3242	0.3937	0.6804	0.8921
without heading	0.3012	0.3628	0.7481	1.0618
No feature	0.3456	0.4069	2.8065	3.9276

Table 11. Computational performance and resource requirements of the SoC prediction models using an i5-7500 CPU platform.

Model	Param. Count	Model Size (KB)	Mean Latency (ms)
LSTM (one-minute ahead)	5425	20.7	0.719
GRU (ten-minute ahead)	4081	15.6	2.671

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Joh, W.; Park, K.; Shin, D.; Kim, J. GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications. Electronics 2026, 15, 1014. https://doi.org/10.3390/electronics15051014

AMA Style

Joh W, Park K, Shin D, Kim J. GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications. Electronics. 2026; 15(5):1014. https://doi.org/10.3390/electronics15051014

Chicago/Turabian Style

Joh, Woncheol, Kyuyong Park, Donghwa Shin, and Jaemin Kim. 2026. "GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications" Electronics 15, no. 5: 1014. https://doi.org/10.3390/electronics15051014

APA Style

Joh, W., Park, K., Shin, D., & Kim, J. (2026). GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications. Electronics, 15(5), 1014. https://doi.org/10.3390/electronics15051014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

GPS-Assisted State-of-Charge Prediction for Electric Vehicles in Shuttle Service Applications

Abstract

1. Introduction

2. Related Works

3. Overview of the Process

4. Electric Vehicle Implementation for Data Collection

4.1. Battery Pack

4.2. Battery Management System (BMS)

4.3. GPS Module

5. Deep Learning Models

6. Experimental Setup

7. Experimental Results

7.1. One-Minute Prediction Results for Course A

7.2. Ten-Minute Prediction Results for Course A

7.3. Ablation Study on GPS Features

7.4. Computational Costs of the Proposed Models

7.5. Course B Results

8. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI