iEVEM: Big Data-Empowered Framework for Intelligent Electric Vehicle Energy Management

Guo, Siyan; Zhao, Cong

doi:10.3390/systems13020118

by Siyan Guo ¹,² and Cong Zhao ¹,²,*

¹ School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China

² National Engineering Laboratory for Big Data Analytics, Xi’an 710049, China

* Author to whom correspondence should be addressed.

Systems 2025, 13(2), 118; https://doi.org/10.3390/systems13020118

Submission received: 17 December 2024 / Revised: 31 January 2025 / Accepted: 11 February 2025 / Published: 13 February 2025

Abstract

Recent years have witnessed an unprecedented boom in Electric Vehicles (EVs). However, the further development of EVs faces critical bottlenecks due to EV Energy (EVE) issues such as battery hazards, range anxiety, and charging inefficiency. Emerging data-driven EVE Management (EVEM) is a promising solution but still faces fundamental challenges, especially in terms of reliability and efficiency. This article presents iEVEM, the first big data-empowered intelligent EVEM framework, providing systematic support for essential driver-, enterprise-, and social-level intelligent EVEM applications. In particular, a layered data architecture, spanning heterogeneous EVE data management to knowledge-enhanced intelligent solution design, is provided for reliable EVEM, and an edge–cloud collaborative architecture for the networked system is proposed for efficient EVEM. We conducted a proof-of-concept case study on a typical EVEM task (i.e., EV energy consumption outlier detection) using real driving data collected from 4000+ EVs over three months. The experimental results show that iEVEM achieves a significant boost in reliability and efficiency (i.e., up to 47.48% higher detection accuracy and at least 3.07× faster response speed compared with state-of-the-art approaches). As the first intelligent EVEM framework, iEVEM is expected to inspire more intelligent energy management applications exploiting skyrocketing EV big data.

Keywords:

energy system; electric vehicle energy management; big data; edge–cloud collaboration

1. Introduction

Revolutionary Electric Vehicles (EVs) [1,2] are attracting significant attention nowadays. In addition to their inherent advantages in coping with the global energy crisis and environmental pollution [3], EVs equipped with cutting-edge Information and Communication Technologies (ICTs) (e.g., on-site sensing, artificial intelligence, and 5G) are rapidly evolving into next-generation personal intelligent mobile terminals. These advancements are expected to profoundly impact people’s daily lives [4] in the near future. However, EV Energy (EVE) issues like battery safety, range anxiety, and charging economy have become the critical bottleneck hindering the increase in EV acceptance and penetration rates [5]. Specifically, spontaneous combustion caused by battery faults (e.g., thermal failures) seriously limits consumer trust in EVs [6,7]. Also, since EVs’ residual range is strongly affected by various factors like ambient temperature and traffic congestion, it is difficult to estimate accurately, which is a major concern for EV drivers [8,9]. As for charging economy [10,11], poor experiences, including inconvenient locations, prolonged queues, and climbing costs caused by poorly optimized siting of public charging facilities, are gradually eroding drivers’ preference for commuting with EVs.

To alleviate the aforementioned issues, both academia and industry are keen on constructing intelligent EVE Management (EVEM) solutions. On one hand, due to the inherently complex and highly non-linear electrochemical processes, accurate analytical modeling of real-world EVE status remains a significant challenge [8,12,13]. Benefiting from the significantly enhanced ICT capabilities of the EV industry, massive EVE data are being collected throughout the entire EVE lifecycle, spanning production, service, and retirement. A growing body of work therefore focuses on data-driven methods, which apply artificial intelligence techniques such as machine learning and deep learning to big data analytics and show promise in addressing a variety of EVE issues, such as battery anomaly detection [6,7], vehicle energy consumption prediction [8,9], and charging pile location recommendation [10,11]. On the other hand, networked EVEM systems comprise distributed and interconnected equipment originating from diverse stakeholders (e.g., vehicle-mounted devices from individuals and cloud servers from organizations). This distributed nature offers new opportunities for innovative computing schemes like edge intelligence [14,15,16], which leverages distributed computing resources to enable efficient and intelligent processing.

However, existing EVEM works are narrowly focused on specific data-related or system-specific problems and often lack generalizability and adaptability across scenarios. Such a non-generic approach not only results in redundant development efforts but also impedes the scalability and widespread adoption of existing EVEM solutions. This creates an urgent need for a unified framework that considers both data and system issues to cope with ever-increasing EVE big data and intelligent EVEM applications. Despite extensive studies on various big data frameworks, e.g., [17,18], directly applying them to achieve accurate and efficient EVEM is still challenging due to the following fundamental differences.

Particularity of Knowledge-Implied EVE Data. In addition to traditional big data characteristics [19], EVE big data imply underlying complex domain-specific mechanisms, e.g., EV energy recovery during braking. General data-driven approaches, which often neglect inherent domain knowledge, face significant challenges in accurately modeling EVE status [8,12,20]. Although some frameworks incorporate knowledge embedding, they cannot fully support knowledge embedding for EVEM because its knowledge is more complex and diverse (e.g., various non-linear and uncontrollable electrochemical reactions). Thus, developing a framework that facilitates the seamless embedding of subtle and domain-specific knowledge is of paramount importance for achieving precise EVEM.
Constraints of Resource-Limited EVEM Systems. Networked EVEM systems comprise heterogeneous devices with varying computational and communication capabilities. Traditional edge- and cloud-based schemes, while widely adopted, are often constrained by computational limitations (e.g., on-site EV devices with restricted processing power and memory) or communication bottlenecks (e.g., limited vehicle-to-everything bandwidth under dynamic network conditions). These inherent constraints significantly hinder their ability to ensure the prompt and reliable responses required for latency-sensitive applications [21], e.g., failing to promptly alert drivers to battery abnormalities can result in fatal consequences like spontaneous combustion or explosion. Many studies have addressed resource constraints, but only a few attempt to reduce end-to-end (E2E) latency under such constraints. Consequently, an efficient EVEM framework capable of operating within limited resources is indispensable for practical deployment.
Deficiencies of Distributed EVEM Systems and Isolated EVE Data. To protect the privacy [22,23,24] of different EV stakeholders like manufacturers, vendors, and consumers, EVEM systems are physically distributed and networked, and EVE data are strictly isolated and unassociated. While many works are dedicated to addressing data isolation issues (e.g., federated learning), they primarily focus on simple horizontal or vertical federated learning scenarios. EVEM systems demand richer scenarios with multi-party, multi-level, and multi-scale spatio-temporal joint analysis. This property critically affects the feasibility and efficiency of traditional big data frameworks. Therefore, addressing these challenges within the framework design is crucial to ensure comprehensive and practical EVEM.

Such differences between EVEM and traditional big data scenarios make it challenging to directly apply mainstream big data frameworks to EVEM, and there are no frameworks that can address these three issues simultaneously. To comprehensively address the above issues, this article presents iEVEM, the first systematic and scalable big data framework designed for intelligent EVEM. The framework aims to serve as a tutorial for constructing intelligent EVEM solutions from both data and system perspectives. The detailed key contributions of this work are summarized below.

(1): We conduct the first comprehensive investigation on intelligent EVEM, aiming to provide insight into this emerging promising field. Particularly, we introduce the background on EVE and clarify essential EVEM applications at the driver-, enterprise-, and social levels, effectively highlighting the practical significance of EVEM. Meanwhile, we systematically identify and extract the key challenges associated with designing and implementing a framework for intelligent EVEM.
(2): We propose a novel big data framework, termed iEVEM, to address the challenges mentioned above. Specifically, we construct a layered architecture for EVE data processing and analysis, starting from the physical layer, which manages heterogeneous and isolated EVE data for data collection. This is followed by the data layer and algorithm layer, which support the efficient design of knowledge-enhanced intelligent solutions and ultimately serve diverse intelligent EVEM applications in the application layer. Additionally, an edge–cloud collaborative system architecture is introduced to facilitate practical application deployment while effectively addressing the resource constraints of distributed systems. The framework outlines a standardized development process for intelligent EVEM solutions, which serves as a tutorial to guide practitioners in developing intelligent applications in practice.
(3): We conducted a proof-of-concept case study of iEVEM using real-world data to validate its effectiveness. For EV energy consumption outlier detection, the experimental results demonstrate that iEVEM achieves significant improvements in both detection accuracy, with gains of up to 47.48%, and response speed, being at least 3.07× faster than state-of-the-art methods. The case study demonstrates the effectiveness of our framework, enabling further innovation and progress in EVEM and even other fields like smart cities and intelligent industry. Furthermore, we highlight several important open issues and research directions for the further development and refinement of intelligent EVEM and relevant fields.

2. Background Knowledge

In this section, we first introduce the current status of EVE and analyze the factors that affect the development of EVE to illustrate the significance of EVEM.

2.1. The Current Status of EVE

According to the Global EV Outlook 2024 of the International Energy Agency (IEA) [25], the EV market has grown exponentially, with over 40 million EVs on roads globally by the end of 2023. EV sales neared 14 million in 2023, representing 18% of total new car sales worldwide (i.e., nearly one in five cars sold in 2023 was electric), 95% of which were in China, Europe, and the United States. China continues to be the leading market for EV sales, representing 55% of global sales, followed by Europe at 28% and the United States at 12%. In these markets, EVs accounted for a large share of car sales in 2023, e.g., over one in three new car registrations in China was electric, over one in five in Europe, and one in ten in the United States. Compared with an EV sales share of 12% in 2022 and 2% five years earlier, it is evident that the growth momentum remains strong as the EV market matures.

The rising sales of EVs are driving an increased demand for batteries, maintaining the upward trajectory seen in recent years. Lithium-ion batteries dominate the energy storage market, with 70% of the global battery production capacity dedicated to EVs, supporting an average battery energy density of 280 Wh/kg and capacities ranging from 60 kWh (passenger cars) to 1 MWh (heavy-duty trucks). In terms of electricity use, the global EV fleet consumed approximately 130 TWh, roughly equivalent to Norway’s total electricity consumption. In 2023, the demand for EV batteries exceeded 750 GWh, marking a 40% increase compared to 2022. Battery manufacturing capacity reached 2.5 TWh in 2023, an increase of 780 GWh compared to 2022. The capacity added in 2023 was over 25% higher than that added in 2022, indicating the continued growth of demand for batteries.

Charging infrastructure is rapidly expanding; more than four million public charging points had been installed globally by 2023, a 40% increase from 2022. According to projections, the global number of public charging points will exceed 15 million by 2030, nearly a four-fold increase compared to 2023. By 2035, this number is projected to reach almost 25 million, a six-fold increase relative to 2023. In addition, there are nearly ten times more private chargers than public ones, as most owners charge at home. While the abundance of private chargers is significant, the IEA highlights that public charging roll-out needs to keep pace with EV sales, and governments should strengthen support for public charging infrastructure.

2.2. The Factors Affecting the Development of EVE

Despite the rapid growth of EVE, it still faces numerous obstacles, with varying progress across different countries and regions. The main barriers involve three aspects, i.e., safety, convenience, and economy.

Safety. The safety of EVs is a primary concern for consumers. Recent incidents involving battery fires and explosions have raised public apprehension. Taking Tesla EVs as an example, more than 20 cars of the Model X/S series suffered battery thermal runaway accidents from 2018 to 2019 [26], highlighting the need for improvements in EV safety, which affects consumers’ willingness to purchase EVs.
Convenience. Convenience is significantly impacted by users’ range anxiety. This anxiety arises from two main issues. On one hand, the state-of-charge (i.e., battery level) is nonlinearly influenced by various factors, such as driving habits, road conditions, and temperature, making it difficult for users to estimate their remaining range reliably [8]. On the other hand, insufficient charging infrastructure creates concerns regarding the feasibility of charging [25], especially during long trips. For example, in the United States, there were approximately 120,000 charging stations for over 2 million EVs in 2021, a ratio of about 1 charging station per 17 vehicles. The inaccurate estimation of battery levels and the imbalance between vehicles and charging piles may hinder the continued growth and acceptance of EVs.
Economy. The economic viability of EVs is affected by multiple factors, including government policies, subsidy levels, and electricity pricing [27]. Many countries provide incentives, such as purchase subsidies and tax reductions, to encourage EV sales. In China, for example, subsidies can reach up to 20,000 RMB (approximately 3000 USD) per vehicle. However, as the subsidies decrease, the total cost of ownership may rise, affecting consumer purchase decisions.

Solutions to the above issues fall into two categories, i.e., policy and technology. For example, government policies and incentives, such as tax exemptions, toll exemptions, and free public charging infrastructure, can effectively reduce the usage costs of EVs, thereby promoting their penetration. Different technologies (e.g., timely anomaly alerts, accurate remaining range estimation, and rational charging strategies) can also alleviate users’ anxiety about EV usage. Our work aims to address these issues from a technical perspective, intending to promote the advancement of EVs.

3. Essential EVEM Applications

We first comprehensively classify EVEM applications into distinct user levels, i.e., driver, enterprise, and social levels, which are shown in Figure 1.

3.1. Driver-Level Applications

Driver-level applications serve individuals to enhance the general user experience. Leveraging data related to energy supply (i.e., the battery) and consumption (i.e., appliances), EVEM offers services to improve driving safety and energy economy.

3.1.1. Driving Safety

Such services guarantee driving safety by identifying and preventing vehicle failures that may cause hazardous, even fatal, consequences for drivers. For instance, Status Monitoring [28] perceives the current operating condition of EVE components (e.g., the battery) and raises an alert when an anomaly occurs (e.g., excessive cell temperature). Abnormal temperatures must be flagged in time; otherwise, thermal failure can result in combustion or an explosion that seriously endangers driving safety [6]. Hence, real-time data analysis is critical for such services. Another example is Predictive Maintenance, which predicts EVE status to prevent potential failures in advance. For example, the battery’s Remaining Useful Life (RUL) [29] indicates its capacity degradation, and a premature decline in RUL is a sign of hidden hazards. However, accurate RUL prediction is challenging since it is affected by multiple time-varying factors (e.g., long-term exposure to low temperature causes irreversible effects on RUL, while short-term exposure only results in temporary RUL fluctuations), which indicates the necessity of multi-scale spatiotemporal analysis for addressing the uncertainty of EVE status prediction.

3.1.2. Energy Economy

Energy economy [30] is devoted to efficient energy usage and helps drivers reduce driving costs through personalized recommendations. For example, Charging Strategy Optimization [31] offers drivers convenient and efficient charging opportunities. Developing such a strategy requires inputs from different participants, e.g., drivers’ driving habits, grid electricity prices, and nearby available chargers from map providers. This multi-party decision-making raises privacy concerns, which brings the challenge of joint analysis without sharing sensitive information (i.e., raw data). Another example is Energy Consumption Minimization, which helps drivers reduce energy consumption by offering advice, e.g., energy-efficient route planning [11]. The energy consumption is predicted based on various factors (e.g., road slope, traffic congestion, and driving behavior). However, due to the high dynamics of these factors, accurate energy consumption prediction remains a challenge.

3.2. Enterprise-Level Applications

Enterprise-level applications help relevant enterprises (e.g., battery manufacturers, vehicle companies, and repair factories) increase their economic value; here, EVEM provides tools for boosting profits through quality control and cost reduction.

3.2.1. Quality Control

Such efforts indirectly enhance corporate reputation and profits by ensuring product quality. For instance, Service Vehicle Monitoring [32] guarantees the quality of managed vehicles during operation. By remotely monitoring vehicle status, anomalies are detected in time for early maintenance. For enterprises managing large-scale fleets, delivering a prompt response under high concurrency is a challenge that must be addressed. Another example is Production Quality Improvement [33], which pursues higher-quality products. For instance, cell consistency significantly affects battery quality, while different aspects of consistency (e.g., voltage and capacity) have diverse impacts. To improve the overall consistency, understanding how each aspect influences quality requires expert cognition, showing the need for knowledge embedding.

3.2.2. Cost Reduction

These applications improve economic benefits directly by minimizing enterprise costs. Illustratively, Production Process Optimization [34] adjusts the production process (e.g., eliminating redundant work steps) to avoid unnecessary time and labor costs. In practice, identifying and streamlining inefficiencies in production processes requires consultation with experts to ensure feasibility, which illustrates the necessity of introducing expert knowledge into implementable production line optimization. Another example is Intelligent Fault Diagnosis [7]. It assists repair staff in improving diagnosis efficiency and reducing maintenance costs, and involves automatically detecting faults, identifying root causes, and making decisions. Because fault data are scarce, expert experience (e.g., prior failure records) plays a vital role in dealing with insufficient data and should be efficiently integrated into fault diagnosis.

3.3. Social-Level Applications

Social-level applications aid government decision-making in improving social benefits. Here, EVEM aims at mobilizing resources to build a sustainable and convenient society from the perspectives of environmental protection and public welfare.

3.3.1. Environmental Protection

Such tasks encourage the public to reduce their carbon footprint and promote recycling. For example, Carbon Assets Management [35] is designed to track, measure, and manage carbon emissions. A company managing a fleet of EVs, for instance, earns carbon credits by counting the fleet’s carbon emission reductions. For fairness, carbon assets must be calculated accurately, which is hard since they are affected by complicated factors. Another example is Battery Cascade Utilization [36], the practice of repurposing batteries withdrawn from EVs, which is eco-friendly because it extends the battery lifespan. The quality of second-life batteries needs to be guaranteed, preferably with access to historical data from different sources (e.g., production, service, and maintenance), where data privacy should be fully considered during cross-silo analysis.

3.3.2. Public Welfare

These applications reflect the government’s efforts to promote the well-being of all citizens. Illustratively, Energy Infrastructure Construction [37] covers the planning, design, and building of socially valuable energy facilities. For example, charging pile siting is a high-profile initiative that attempts to make chargers more accessible to drivers. Since it requires information from multiple parties (e.g., transport bureau, grid company, and land office), the privacy leakage issue among multiple participants needs to be properly addressed. Another example is Vehicle-to-Grid (V2G) [38]. It refers to the bi-directional flow of energy between EVs and grids, where EVs provide energy back to grids as a distributed power supply during peak demand to reduce the strain on grids. To balance benefits between grids and drivers (e.g., grids pay drivers for energy storage and drivers bear their own costs), multi-objective optimizations should be constructed to achieve a win-win V2G, where multifaceted limitations should be considered with the aid of expert advice.

4. Challenges to the EVEM Framework

To support the above applications, significant challenges arise from both data and system perspectives during framework design.

4.1. Data Challenges

Data challenges mainly affect the reliability of EVEM.

①: It is difficult to accurately model EVE using general methods due to underlying complex knowledge. On one hand, given the inherent complexity, nonlinearity, and uncontrollability of energy reactions, EVE status is hard to model formally, which brings great challenges for existing mechanism-driven methods. On the other hand, lacking effective ways to integrate inherent knowledge, purely data-driven methods struggle to model EVE accurately [8,12] and cannot support reliable EVEM.
②: Unassociated, fragmented EVE data pose challenges to multi-scale spatio-temporal correlation analysis. Since EVE status is impacted by a range of factors varying over space and time, multi-scale joint analysis is necessary for EVEM. However, EVE data are collected and held in a distributed manner and isolated among different owners (e.g., drivers, enterprises, and government agencies) without a way to associate them [39]. This seriously impedes the feasibility of joint analysis.

4.2. System Challenges

System challenges primarily hinder the efficiency of EVEM.

③: Rapid response is difficult to achieve with conventional schemes under limited system resources. In naturally distributed EVEM systems, low E2E latency is challenging given the limited computing capabilities of edge nodes (e.g., vehicle-mounted devices) and the limited communication resources between nodes (e.g., moving EVs) [21]. Specifically, predominant cloud-based methods, which require massive data uploads, suffer from prolonged communication time, whereas local methods, which process all data on the device, incur unacceptable computation time. Neither can support efficient EVEM.
④: Isolated EVEM systems pose challenges to multi-party joint analysis. Numerous EVEM applications inherently require multiple stakeholders (e.g., drivers, enterprises, and government agencies) to participate. However, with widespread and growing privacy concerns [23,24,40] of participants, all data are best kept locally to prevent privacy leakage. Hence, the strictly isolated systems severely hinder the feasibility of joint analysis across multiple parties.
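
The latency trade-off in challenge ③ can be illustrated with a back-of-the-envelope model. The following is a minimal sketch rather than part of iEVEM: it assumes E2E latency is simply upload time plus computation time, and every function name and number in it (data size, uplink bandwidth, workload, and processing rates) is a hypothetical placeholder chosen for illustration, not a measurement.

```python
def cloud_latency(data_bits, uplink_bps, work_ops, cloud_ops_per_s):
    """Cloud-based scheme: upload all raw data, then compute in the cloud."""
    return data_bits / uplink_bps + work_ops / cloud_ops_per_s


def local_latency(work_ops, edge_ops_per_s):
    """Local scheme: no upload, all computation on the vehicle-mounted device."""
    return work_ops / edge_ops_per_s


# Hypothetical values, for illustration only.
DATA_BITS = 8e6 * 8       # 8 MB of raw sensor data
UPLINK_BPS = 2e6          # 2 Mbit/s uplink from a moving EV
WORK_OPS = 5e9            # operations needed by the analysis task
CLOUD_OPS_PER_S = 1e11    # cloud server processing rate
EDGE_OPS_PER_S = 5e8      # on-board device processing rate

cloud_s = cloud_latency(DATA_BITS, UPLINK_BPS, WORK_OPS, CLOUD_OPS_PER_S)
local_s = local_latency(WORK_OPS, EDGE_OPS_PER_S)

print(f"cloud-based: {cloud_s:.2f} s (dominated by communication)")
print(f"local:       {local_s:.2f} s (dominated by computation)")
```

Under these assumed values, the cloud-based path is dominated by communication time and the local path by computation time, which is the tension the challenge describes.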

To address these issues, we propose iEVEM, explicitly presenting its data intelligence architecture and edge–cloud collaborative system architecture in the following sections.

5. Data Intelligence Architecture of iEVEM

To address data challenges, the data intelligence architecture, as demonstrated in Figure 2, is proposed for intelligent solution construction fully considering underlying knowledge and isolated data, comprising the physical, data, and algorithm layers.

5.1. The Physical Layer

The physical layer deals with the original data sources. As shown in Figure 2, EVE data are acquired from the EVE full life cycle (i.e., production, service, and maintenance) and domain experts (e.g., mechanical engineer and business manager).

Production Phase refers to the process from raw materials (e.g., electrolyte) to concrete power battery products (e.g., cell, module, and pack) [41]. Production data are collected by manufacturing equipment and mainly contain battery monitoring records (e.g., current, voltage, and resistance), which are generally structured in a predefined format like spreadsheets and acquired continuously in near real time following specific industrial standards (e.g., ISO 12405 [42] in the European Union).
Service Phase refers to the usage of finished products like EVs and charging piles, and service data record their operating information. In particular, EV service data are collected by on-board sensors [39] and mainly capture the status of the electric systems, i.e., the battery (e.g., temperature), motor (e.g., velocity), and controller (e.g., regenerative braking). Following national standards (e.g., GB/T 32960 [43] in China), service data are also collected in a structured form with a prescribed format and frequency.
Maintenance Phase covers the out-of-service stages, including repair and recycling. Maintenance data are collected by checkout equipment and include product testing information, e.g., faults in repair and RUL in recycling. Among them, repair data are usually semi-structured and vary across enterprises, while recycling data tend to be structured according to the required testing procedures, in line with the increasingly published recycling standards (e.g., UL 1974 [44] in America).
Domain Expert refers to EVE-domain specialists, and expert knowledge denotes the information distilled from prior experience accumulated in the aforementioned phases (e.g., working procedures in production, energy mechanisms in service, and repair logs in maintenance). Knowledge is usually stored as semi-structured (e.g., worksheets) and unstructured data (e.g., text). Because it serves as an additional input for intelligent solution construction, knowledge representation and embedding are crucial.

5.2. The Data Layer

Based on the data collected from the physical layer, the data layer is applied for data processing. As illustrated in Figure 2, iEVEM has special considerations in data association and knowledge-enhanced feature engineering.

5.2.1. Data Association

In addition to traditional data preprocessing (e.g., data cleansing), data association is required to support multi-scale spatiotemporal correlation analysis by aligning distributed and time-dependent EVE data. Considering the state transitions of EVE components (e.g., small-scale transitions across cells, modules, and packs and large-scale transitions across service, maintenance, and recycling), an identifier for data association is designed to address challenge ②, and it links any two adjacent stages. Specifically, the identifier needs to contain temporal information (e.g., precursors and successors, as in linked list structures) and spatial information (e.g., hierarchical inclusion relationships, as in tree structures). For any battery pack, the historical records (i.e., precursor links) of the cells it contains (i.e., inclusion relationships) can thus be traced back.
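
One way to picture such an identifier is a record node that carries both a tree link and a linked-list link. The following is a minimal sketch under assumptions of our own: the class `AssocId`, its field names, and the helper functions are hypothetical and are not taken from iEVEM. Each node stores a `parent` (spatial inclusion) and a `prev` (temporal precursor), which is enough to trace the history of every cell in a pack.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class AssocId:
    """Hypothetical association identifier attached to one EVE record."""
    entity: str                                            # e.g. "cell-1"
    phase: str                                             # e.g. "production"
    parent: Optional[AssocId] = field(default=None, repr=False)  # spatial link
    prev: Optional[AssocId] = field(default=None, repr=False)    # temporal link
    children: list[AssocId] = field(default_factory=list, repr=False)

    def __post_init__(self) -> None:
        # Register this node with its enclosing component (tree structure).
        if self.parent is not None:
            self.parent.children.append(self)


def history(node: Optional[AssocId]) -> list[AssocId]:
    """Follow precursor links from the newest record back to the oldest."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.prev
    return chain


def leaves(node: AssocId) -> list[AssocId]:
    """Collect the lowest-level components (cells) contained in a node."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def trace_pack(pack: AssocId) -> dict[str, list[str]]:
    """Map each cell in a pack to the phases of its historical records."""
    return {c.entity: [r.phase for r in history(c)] for c in leaves(pack)}


# Illustrative data: two cells produced separately, later serving in one pack.
cell1_prod = AssocId("cell-1", "production")
cell2_prod = AssocId("cell-2", "production")
pack = AssocId("pack-1", "service")
module = AssocId("module-1", "service", parent=pack)
cell1 = AssocId("cell-1", "service", parent=module, prev=cell1_prod)
cell2 = AssocId("cell-2", "service", parent=module, prev=cell2_prod)

print(trace_pack(pack))
```

In this sketch, walking `children` answers "which cells are in this pack", and walking `prev` answers "what happened to this cell before", so the two kinds of link stay independent of each other.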

5.2.2. Knowledge-Enhanced Feature Engineering

Feature engineering is the process of obtaining informative features from data. For high-quality features (i.e., meaningful, task-oriented, and quantity-appropriate), knowledge should be embedded into feature engineering with challenge ① being considered.

Step 1: Feature Extraction transforms raw data into sets of features with underlying patterns. Traditional feature extraction, relying on straightforward mathematics properties (e.g., mean and variance), ignores physical meaning with potentially critical information unexplored (e.g., the peak of the incremental capacity curve is a decisive factor for capacity estimation [45]). Knowledge embedding effectively alleviates the issue by forming feature candidates for each data dimension in advance, which facilitates extracting meaningful features by the feat of expert experience.
Step 2: Feature Selection intends to identify relevant features for given tasks from feature candidates, which is usually achieved by feature importance ranking. However, existing methods (e.g., decision tree) are prone to unstable ranking since they strongly rely on sample data. To enhance the reliability of task-oriented feature selection, expert knowledge is used to guide the identification of critical relevant features (e.g., expert knowledge can be utilized to assign feature weights in feature ranking) for given tasks. For example, the feature selection can be conducted in a stepwise manner. First, feature grouping is performed. Data-driven feature correlations (e.g., Pearson correlation coefficient) can be used to construct an undirected graph where nodes represent individual features and edges represent the correlation weights between features. Expert knowledge can help verify and adjust the correlation weights. Ultimately, feature grouping is achieved through subgraph partitioning, where edges with weights below a certain threshold are removed. Next, feature ranking is performed based on the feature groups. Specifically, representative features are randomly selected from the groups, and experts indicate their relevance (1 for relevant and 0 for not relevant). After multiple rounds of feature sample and group labeling, task-oriented group weights are generated. These weights are normalized and combined with data-based ranking to produce a stable feature ranking, which minimizes expert labeling costs by only requiring assessments of a few features within each group.
Step 3: Dimensionality Reduction refers to the process of reducing the number of features while preserving sufficient and necessary information, which significantly contributes to subsequent efficient data analysis and model performance. Traditional reduction methods (e.g., principal component analysis and autoencoders) reduce features by changing feature spaces, where transformed dimensions lack clear physical meanings. Domain knowledge helps obtain refined features with practical meaning retained in original feature spaces (e.g., reduce redundant features according to physical correlations or integrate multiple features into one with practical meaning). For example, we only select one feature in each feature group in step 2 to reduce redundancy, as features within the same group exhibit high correlation.
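The grouping and redundancy-reduction steps described above can be illustrated with a minimal sketch. It assumes that features are given as plain numeric columns, that the correlation graph is thresholded at a hypothetical value of 0.8, and that expert adjustments arrive as a dictionary of overriding edge weights; the function names (`group_features`, `pick_representatives`) and the rule of keeping the first feature of each group are illustrative choices, not part of iEVEM.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / sqrt(var_a * var_b)


def group_features(columns, threshold=0.8, expert_overrides=None):
    """Group features by thresholding a correlation graph.

    columns: dict mapping feature name -> list of values.
    expert_overrides: optional dict mapping frozenset({f1, f2}) -> weight,
        standing in for expert-adjusted correlation weights (assumption).
    Returns a list of sorted feature groups (connected components).
    """
    expert_overrides = expert_overrides or {}
    neighbors = {name: set() for name in columns}
    for f1, f2 in combinations(columns, 2):
        weight = abs(pearson(columns[f1], columns[f2]))
        weight = expert_overrides.get(frozenset((f1, f2)), weight)
        if weight >= threshold:  # edges below the threshold are removed
            neighbors[f1].add(f2)
            neighbors[f2].add(f1)

    groups, seen = [], set()
    for start in columns:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first search over the remaining edges
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(component))
    return groups


def pick_representatives(groups):
    """Keep one feature per group (here simply the first by name)."""
    return [group[0] for group in groups]
```

In practice, the representative of each group would more plausibly be chosen by the expert-weighted ranking of Step 2 rather than by name order.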

5.3. The Algorithm Layer

The algorithm layer is responsible for data analysis based on the features from the data layer, and its results are applied to support the application implementation.

5.3.1. General Model

Existing general models are classified into mechanism- and data-driven based on design principles.

Mechanism-driven Models are constructed based on the fundamental insights of underlying EVE mechanisms (e.g., physical laws and chemical reactions). They emphasize interpretability and physical fidelity, making them indispensable for EVEM. These models are mainly developed as formalized mathematical expressions representing the intrinsic principles (e.g., the electrochemical and thermal dynamics of batteries, the operational characteristics of motors, and the energy flow in powertrain systems). For example, equivalent circuit models [7] are widely used to describe battery behavior, leveraging electrical circuit analogies to represent processes like charge transfer and diffusion. While mechanism-driven models exhibit strong interpretability, they often face challenges in terms of adaptability to complex, nonlinear, and uncontrollable energy reactions and systems. Nonetheless, these models remain both a fallback and a cornerstone for EVEM.
Data-driven Models are constructed to uncover patterns, relationships, and decision-making rules directly from data, bypassing the need for explicit physical or mechanistic understanding. Such methods are primarily built on statistics, machine learning, and deep learning. Because patterns and relationships are learned from massive historical data, the solution is built automatically from the mined rules. In the context of EVEM, supervised learning algorithms [32], such as decision trees in machine learning and neural networks in deep learning [20], are commonly used to predict battery degradation and RUL based on historical usage patterns. As the other model basis of EVEM, the primary strength of data-driven models lies in their ability to automatically learn complex, nonlinear, and uncontrollable relationships from data without domain knowledge. However, these methods have notable drawbacks in stability and reliability and suffer from poor interpretability.

It is worth noting that mechanism-driven and data-driven models are fundamental components of the model pool in the algorithm layer. They can be integrated into mechanism-data dual-driven models, and various integration methods exist. For example, correction terms can be introduced into mechanism-driven models and fitted with data-driven techniques to improve model accuracy. Additionally, during the training of data-driven models, mechanism-driven models can serve as constraints or loss functions, guiding rapid training and enhancing overall performance.
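As a minimal sketch of the first integration method (a data-driven correction term on top of a mechanism-driven model), the example below fits a linear correction to the residuals of a toy mechanism model. The cubic drag model, its coefficient, and the choice of ambient temperature as the correction input are hypothetical and serve only to show the structure of a dual-driven model.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mechanism_power(speed, drag=0.4):
    """Hypothetical mechanism-driven model: power ~ drag * v^3."""
    return drag * speed ** 3


def fit_dual_driven(speeds, temps, measured):
    """Fit a data-driven correction term on the mechanism residuals.

    Returns a predictor combining the mechanism model with the
    learned correction (linear in temperature, an assumption).
    """
    residuals = [p - mechanism_power(v) for v, p in zip(speeds, measured)]
    slope, intercept = fit_linear(temps, residuals)

    def predict(speed, temp):
        return mechanism_power(speed) + slope * temp + intercept

    return predict
```

The mechanism model keeps the physically interpretable part of the prediction, while the fitted term absorbs only what the mechanism fails to explain.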

5.3.2. Knowledge-Enhanced Algorithm Construction

Considering challenge ①, the above general models have difficulty satisfying EVEM demands. The procedure for constructing task-specific algorithms (i.e., knowledge-enhanced algorithm construction) is shown in Figure 2.

Step 1: Problem Definition abstracts and models the target problem, including task types (e.g., classification or regression) and requirements (e.g., optimization objectives and constraint conditions) from real scenarios, which should be expressed explicitly with the aid of domain experts. For instance, expert knowledge in the text form can be transformed into optimization formulas through a large language model [46].
Step 2: Algorithm Development refers to the design of task-specific intelligent solutions. Depending on the task type and requirements from the problem definition, practicable general models are selected from the model pool (i.e., mechanism- or data-driven models), whose characteristics have been elaborated in advance by experts. After that, the algorithm is designed (e.g., construct a novel one or modify general models) with further consideration of available data and application demands under knowledge guidance (e.g., the optimum parameters are set by prior experience). Moreover, in a knowledge-enhanced way, in addition to expert-guided practicable general model selection and proper parameter setting (e.g., learning rate during training), knowledge representation and embedding are utilized for algorithm design to further improve performance. For example, the knowledge-enhanced mechanism-driven approach utilizes expert knowledge to set and adjust parameters such as the drag coefficient and frontal area within the mechanism-driven vehicle dynamic model during new vehicle design. These parameters vary among different vehicle types, allowing for an accurate simulation of real-world energy consumption during the development stage. The early identification of potential issues in energy consumption performance ultimately provides valuable feedback for product improvement. The knowledge-enhanced data-driven approach presents the correlation of EV energy components in the knowledge graph with expert help, where nodes represent components (e.g., battery, motor, and air conditioner) and edges capture their dependencies (e.g., energy flow). If a component fails, a data-driven GNN [47], leveraging feature propagation and aggregation between nodes and their neighbors, can trace the connections to identify the root cause, such as linking abnormal motor performance to upstream issues like battery instability or inverter faults.
Expert knowledge can be used to refine the construction of knowledge graphs by enhancing node attributes and edge properties within the energy domain [48]. Specifically, for attributes of each component node, knowledge-enhanced feature engineering helps select the most critical features for energy anomalies. Besides, with the help of experts, the properties of edges not only depict energy flow but also reflect the mechanistic influences and relational weights between components. A more pronounced example of integrating a mechanism-driven, data-driven, and knowledge-enhanced approach is charging optimization with cutting-edge multi-agent reinforcement learning. Specifically, each user can be viewed as an agent, and data-driven methods are applied to recommend optimal charging strategies for these agents based on user behavior data. The mechanism-driven models of calculating charging stations’ load balancing serve as the reward model. In a knowledge-enhanced way, user satisfaction degree is integrated into the learning process through Reinforcement Learning from Human Feedback [49], allowing for the embedding of expert knowledge (i.e., human evaluation) into the reward structure.
Step 3: Solution Validation is the feasibility evaluation of constructed solutions before application launch. However, practical challenges arise for traditional methods (e.g., cross-validation) due to the time and labor costs caused by the data availability (e.g., insufficient failure data make the verification of fault diagnosis difficult), label accessibility (e.g., limited labeled samples for cross-validation), and experiment producibility (e.g., battery degradation requiring years to manifest). Therefore, the validation design needs to rely on domain experts to fully consider actual situations (e.g., constructing a simulation environment by domain experts) to address this dilemma.

6. Edge–Cloud Collaborative System Architecture of iEVEM

To address system challenges, an edge–cloud collaborative system architecture, as shown in Figure 3, is adopted for the practical implementation of intelligent solutions in resource-constrained and device-isolated EVEM systems.

6.1. EVEM Systems

As shown in Figure 3a, EVEM systems are naturally distributed and hierarchical [17], i.e., the government is connected with multiple enterprises, each of which manages a large number of vehicles. The mapping of either vehicle–enterprise or enterprise–government relations is roughly abstracted as the edge–cloud architecture, i.e., a cloud is connected with multiple edges, as illustrated in Figure 3b. For EVEM systems, on the one hand, the available system resources are generally constrained. As shown in Figure 3b, the principal resources of edge–cloud EVEM systems are conceptually the computing capability of the edges and the cloud and the communication resource between them. First, the computing capacities of edges are limited. For example, vehicles are generally equipped with small chips (e.g., Qualcomm Snapdragon Automotive and NVIDIA DRIVE series), while enterprises are capable of applying powerful servers (e.g., NVIDIA GeForce RTX and AMD Radeon RX series) or even clusters. Then, edge–cloud communication is restricted, e.g., the most commonly used communication technology for vehicle–enterprise links (i.e., Long Term Evolution (LTE) [15]) may easily suffer bandwidth fluctuation, particularly for high-speed moving vehicles. On the other hand, the sensitive information of EV stakeholders (e.g., the driver’s personal information and the organization’s core technologies) raises ubiquitous privacy concerns in distributed EVEM systems. Therefore, the data of some participants in EVEM systems need to be strictly isolated.

6.2. Edge–Cloud Collaborative Solution

Considering the resource constraints and isolated manners of networked EVEM systems, an edge–cloud collaborative scheme is adopted for big data processing, including data storage and data computing, with challenges ③ and ④ addressed.

6.2.1. Edge–Cloud Collaborative Storage

Storage collaboration refers to a hybrid data storage architecture designed to balance local storage at edge devices and centralized storage on cloud servers, aiming to optimize efficiency, scalability, and privacy preservation in EVEM. All data generated from edges are initially stored locally. If there are no privacy concerns, the data can be uploaded to the cloud server for permanent storage (e.g., Hadoop Distributed File System). Otherwise, the data are kept local for privacy preservation. In such scenarios, privacy-preserving techniques, including differential privacy or encryption, can be applied to the data before selective sharing with the cloud.

6.2.2. Edge–Cloud Collaborative Computing

Computing collaboration offloads partial tasks from the cloud to the edges while fully respecting the edge–cloud resource imbalance, which achieves a rapid response by preventing massive data uploading and excessive local computing load. There are three distinctive and alternative approaches for edge–cloud collaboration (shown in Figure 3c), where edges generally work on local data and the cloud serves as an additional resource for massive data processing and analysis. Note that since most EVE data are collected in real time as streaming data, stream processing (e.g., Flink) is particularly necessary in addition to the commonly used batch processing (e.g., Spark) for big data processing on the cloud.

Model Lightweight involves deploying an entire small and efficient model directly on edge devices. In such scenarios, edge devices can independently accomplish tasks without relying on cloud resources, ensuring prompt and robust responses even under poor communication conditions (e.g., vehicles performing in-situ energy-efficient route planning while traveling through a tunnel with limited connectivity). To achieve such lightweight models, techniques such as model distillation, pruning, and quantization merit further exploration, as they reduce model complexity while maintaining sufficient accuracy for real-time applications.
Model Partition refers to the strategy of splitting parts of a large-scale model between the cloud and edge devices. For example, in energy component fault diagnosis using a GNN, the first few GNN layers are executed at vehicles for extracting shallow features (e.g., local anomalies in the voltage or current). The extracted features are then sent to the cloud, where the remaining layers of GNN are carried out to perform a deeper fault diagnosis, such as identifying root causes. Uploading features instead of massive raw data effectively reduces communication time and thus the response latency. The communication-efficient technologies like traffic compression (e.g., quantization and sampling) are crucial for further minimizing response latency.
Model Cascade refers to the synergy of functional models of various sizes at the edge device and the cloud server in a staged manner. Take EV fault diagnosis as an example: an EV can perform a quick self-check using a lightweight local model to detect potential anomalies and provide rapid alerts. If the local model identifies an ambiguous or complex fault, the cloud-based large model can be engaged for a more accurate and comprehensive diagnosis. Dynamic cascading (i.e., determining when to involve the cloud model based on the task) is conducive to the trade-off between latency and accuracy, adapting to real-time requirements and system constraints effectively.
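The model cascade described above can be sketched as a simple confidence-gated dispatch. The sketch assumes that both models return a label together with a confidence score and that a fixed confidence threshold decides when to escalate; the toy models, the 400 V nominal voltage, and all thresholds are hypothetical placeholders rather than components of iEVEM.

```python
def cascade_diagnose(sample, edge_model, cloud_model, confidence_threshold=0.9):
    """Run the edge model first; escalate to the cloud only if unsure.

    edge_model / cloud_model: callables returning (label, confidence).
    Returns (label, source), where source is "edge" or "cloud".
    """
    label, confidence = edge_model(sample)
    if confidence >= confidence_threshold:
        return label, "edge"  # fast path: no data leaves the vehicle
    cloud_label, _ = cloud_model(sample)
    return cloud_label, "cloud"


def toy_edge_model(sample):
    """Hypothetical lightweight self-check on bus voltage only."""
    deviation = abs(sample["voltage"] - 400.0)
    if deviation < 5.0:
        return "normal", 0.99
    if deviation > 50.0:
        return "fault", 0.95
    return "fault", 0.60  # ambiguous: let the cloud decide


def toy_cloud_model(sample):
    """Hypothetical larger model with access to more signals."""
    if sample.get("cell_imbalance", 0.0) > 0.1:
        return "battery_instability", 0.97
    return "normal", 0.90
```

A dynamic cascade would replace the fixed `confidence_threshold` with a value adapted to the current bandwidth and latency budget.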

Note that for joint analysis across multiple entities, distributed (e.g., federated learning [23,24]) and centralized (e.g., cloud-based) methods are applied with or without privacy concerns, respectively. Both of them are supported by the edge–cloud collaborative scheme.

7. Case Study: Outlier Detection of EV Energy Consumption

To demonstrate the effectiveness of iEVEM, we conducted a case study on EV energy consumption outlier detection.

7.1. Scenario

Energy consumption outlier vehicles are those with abnormal energy consumption caused by factors like damaged components or manual irregularities. To avoid potential safety risks and ensure operational reliability, accurate and rapid outlier detection is required. From a business perspective, once the actual energy consumption deviates from the rational range, the vehicle is identified as an outlier. Therefore, EV energy consumption outlier detection can be divided into two key steps, i.e., rational energy consumption estimation and outlier identification. Accordingly, there are three main obstacles to practical application implementation. First, cross-organizational analysis struggles with data silos in privacy-focused enterprise cloud networks. Second, the rational energy consumption is difficult to estimate accurately since it is affected by complex and dynamic driving conditions. Third, timely outlier detection is hard to achieve within resource-limited vehicle-enterprise networks. Without loss of generality, we focused on the EV component with the highest energy consumption ratio, the motor, as a representative example in the case study.

7.2. Experimental Setup

7.2.1. Dataset

We used real-world vehicle operation data from the southwestern region of China provided by our partner (a leading global EV manufacturer), covering over 4000 EVs of three different types over three months (from August to October 2021). Specifically, each vehicle collects 638 data dimensions per second, following the enterprise standard and the national standard GB/T 32960, including basic information (e.g., vehicle and battery version), vehicle operating status (e.g., velocity and acceleration), battery operating status (e.g., state-of-charge and state-of-health), appliance operating status (e.g., current and voltage), and external factors (e.g., temperature and altitude). Statistically, approximately 1% of vehicles are considered abnormal, with an energy consumption deviation of $5σ$ (i.e., five standard deviations from the mean of the normal data distribution).

7.2.2. Metrics

For evaluating the general performance of iEVEM, the Area Under the Curve (AUC) [50] is adopted as a primary indicator of reliability, which is a widely recognized metric to measure classification performance, particularly in scenarios involving an imbalance between positive and negative samples. Specifically, the AUC is calculated as follows:

$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR} (\mathrm{FPR}) \, d \mathrm{FPR},$

(1)

where the True Positive Rate (i.e., TPR), also known as Sensitivity or Recall, measures the proportion of actual positive cases that are correctly identified by the model, and the False Positive Rate (i.e., FPR), also referred to as the probability of false alarm, indicates the proportion of actual negative cases that are incorrectly identified as positive by the model. Note that the closer the AUC is to 1, the better the performance. Besides, the end-to-end (E2E) latency is used as a critical metric for efficiency; it denotes the response time from data generation to the results obtained, representing the system’s ability to process and respond in a timely manner.
For evaluating the effectiveness and necessity of iEVEM components, the Mean Absolute Percentage Error (MAPE) [51], indicating the energy consumption estimation precision, is used for reflecting reliability. Specifically,

$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{Y_{i} - {\hat{Y}}_{i}}{Y_{i}} \right| \times 100\%,$

(2)

where $Y_{i}$ is the ground truth, ${\hat{Y}}_{i}$ is the predicted value, and n is the number of samples. A lower MAPE value reflects higher estimation accuracy, which is critical for ensuring dependable EVEM. Additionally, the E2E latency is also applied to compare the efficiency of different deployment schemes.
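Both metrics can be computed in a few lines. The sketch below evaluates Equation (1) through its standard rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half), which equals the area under the ROC curve, and evaluates Equation (2) directly. The function names are illustrative, and the guard against zero ground-truth values is an added assumption, since MAPE is undefined there.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (1 = outlier); scores: anomaly scores.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent, as in Equation (2)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)
```

The pairwise AUC loop is quadratic in the number of samples; a sort-based implementation would be preferable for fleet-scale data.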

7.2.3. Comparatives

Given the absence of sufficient abnormal data, we compared the general performance of iEVEM with state-of-the-art unsupervised outlier detection methods [50]. These methods operate under the assumption that anomalies are typically located in low-density regions of the data distribution. They can be roughly categorized into shallow machine learning (i.e., KNN, CBLOF, IForest, and ECOD) and deep neural network methods (i.e., DSVDD). Since outlier detection requires multiple vehicle participation for distribution statistics, edge-only schemes lacking global information are impracticable. Therefore, all comparatives are implemented in cloud-based settings, with iEVEM being the only solution employing the edge–cloud collaborative design.

7.3. Implementation

7.3.1. The Implementation of Data Intelligence Architecture

The Physical Layer. The application of detecting outlier vehicles in energy consumption primarily involves data collection from onboard sensors in service vehicles. As introduced in the dataset description, vehicles collect 638-dimensional data every second, recording various states and information of the vehicle’s operation. It is worth noting that data from other sources, such as road information (e.g., road grade), environmental information (e.g., weather), and traffic conditions (e.g., congestion), would be helpful if introduced. However, considering the currently available data, we focus solely on the onboard sensor data from vehicles.
The Data Layer. According to the domain knowledge, the data dimensions can be roughly divided into six categories: (1) basic information, such as collection date, vehicle identity number, and battery type, (2) vehicle statuses, such as speed, location, and temperature inside the car, (3) battery statuses, such as state-of-charge, current, and voltage, (4) appliance statuses, such as the current and voltage of the motor, air conditioning, and lights, (5) failure statuses, comprising the fault indicators of all components, and (6) mechanical information, such as seat angle, tire pressure, and others. Based on expert labeling, 74 attributes were selected from the original 638 dimensions, with attributes empirically irrelevant to energy consumption (e.g., the groups of failure statuses and mechanical information) being systematically eliminated. Besides, 49 additional features (e.g., acceleration derived from velocity and time) were constructed from the 74 attributes using essential physical and statistical laws.
The Algorithm Layer. As for the algorithm design, referring to the expert business understanding, a two-step solution was constructed, comprising a regression sub-task $F_{1}$ of rational energy consumption estimation with extreme gradient boosting (i.e., XGBoost [52]) and a classification sub-task $F_{2}$ of outlier detection with a Gaussian distribution instead of conventional unsupervised one-step methods. In the first step, XGBoost operates by aggregating multiple decision trees to produce the final prediction, where each tree is trained to fit the residuals of the previous tree. Assuming there are K trees in total, given each vehicle’s input features x obtained from the data layer, sub-task $F_{1}$ can be approximated as follows:

$\hat{y} = F_{1} (x) = \sum_{k = 1}^{K} f_{k} (x),$

(3)

where $\{f_{k}\}_{k = 1}^{K}$ is a series of serially-generated trees and $\hat{y}$ is the predicted rational energy consumption value based on the driving features. The actual energy consumption y is calculated by integrating the product of voltage U and current I over a given time frame t, that is,

$y = \int U I d t .$

(4)

In the second step, the difference between the actual energy consumption value y and the rational value $\hat{y}$ determines the vehicle’s degree of deviation $ε$ :

$ε = | y - \hat{y} | .$

(5)

In practice, energy consumption deviations are normal and permissible. It is only when the deviation exceeds a certain threshold $δ$ that vehicles may be identified as outliers. Therefore, the sub-task $F_{2}$ can be summarized as an indicator function:

$F_{2} (ε) = \{\begin{matrix} 0, & ε < δ \\ 1, & ε \geq δ \end{matrix},$

(6)

where 1 indicates that the vehicle is an outlier and 0 signifies that it is not. In our design, the threshold is set through the Gaussian distribution of energy consumption deviation across all vehicles.
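The second step (Equations (4) to (6)) can be sketched as follows. The sketch assumes per-second voltage and current samples integrated with the trapezoidal rule, and it assumes that the threshold $δ$ is taken as the mean plus k standard deviations of the deviations observed on normal vehicles, with k = 5 borrowed from the dataset statistics; the exact thresholding rule used in iEVEM is not specified here, so this choice is illustrative.

```python
from statistics import mean, pstdev


def energy_from_samples(voltages, currents, dt=1.0):
    """Approximate y = integral of U * I dt (Eq. 4) with the trapezoidal rule."""
    power = [u * i for u, i in zip(voltages, currents)]
    return sum((power[k] + power[k + 1]) / 2.0 * dt for k in range(len(power) - 1))


def deviations(actual, rational):
    """Deviation degree eps = |y - y_hat| for each vehicle (Eq. 5)."""
    return [abs(y - y_hat) for y, y_hat in zip(actual, rational)]


def gaussian_threshold(normal_eps, k=5.0):
    """Assumed rule: delta = mean + k * std of deviations on normal data."""
    return mean(normal_eps) + k * pstdev(normal_eps)


def detect_outliers(eps, delta):
    """Indicator function F2 (Eq. 6): 1 if eps >= delta, else 0."""
    return [1 if e >= delta else 0 for e in eps]
```

Fitting the threshold on normal data only avoids the masking effect that occurs when a large outlier inflates the standard deviation it is compared against.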

7.3.2. The Implementation of Edge–Cloud Collaborative System Architecture

Edge–cloud collaborative storage. Data were collected in real-time from each vehicle and initially stored locally. Due to local storage limitations and the allowance for data sharing between enterprises and their vehicles (e.g., through data-sharing agreements), local data would be uploaded for centralized storage backup when vehicle-enterprise network resources are available. Only data from a certain time frame are retained locally and are periodically overwritten. Nevertheless, data privacy should be protected between different enterprises and third parties (i.e., governments), that is, the aggregated data within each enterprise must be strictly stored locally to prevent any potential privacy breaches.
Edge–cloud collaborative computing. For online model inference, an edge–cloud collaborative prototype was constructed with a Jetson Nano serving as the edge device (representing the EV’s on-site computer) and an NVIDIA 2080 Ti acting as the cloud server (representing the enterprise cloudlet). The edge–cloud communication was configured with a 10 Mbps bandwidth, adhering to the LTE standard, to simulate realistic vehicle-enterprise network conditions. In this setup, a model cascade was employed for efficient edge–cloud collaboration. Specifically, the rational energy consumption estimation sub-task was deployed on the edge device to process local data and minimize the need for massive raw data uploads, thereby reducing bandwidth usage. The cloud server, in turn, aggregated the energy consumption deviations reported by multiple edges and performed a centralized outlier detection sub-task using Flink. The collaboration ensures a balance between local processing efficiency and cloud-level computational scalability, meeting the requirements of real-time and large-scale EVEM. Of the two sub-tasks, the first (rational energy consumption estimation) requires a large amount of data for training. For offline model training, we simulated the process among multiple enterprises based on our cloud server. Without loss of generality, we considered three enterprises and a trusted third party (e.g., government) in our settings, where the enterprises represent edges and the government acts as the cloud. Since the training time does not affect application performance, we focused solely on the training accuracy, disregarding communication conditions. The objective function L of XGBoost is expressed as follows:

$L = l (y, \hat{y}) + \sum_{k = 1}^{K} Ω (f_{k}),$

(7)

where l represents the mean square error loss function and $Ω$ refers to the complexity of the tree. The specific definition follows the literature. To address cross-organizational privacy concerns, federated learning was adopted to achieve multiple-party joint training. During tree construction, each edge participant uploads the derivatives of the objective function to the cloud, protected with differential privacy. The cloud calculates the optimal split point by aggregating these derivatives using weighted average methods like FedAvg [53] and returns it to the edges. Differential privacy here means adding random noise to the uploaded data, preventing exposure of raw information. During aggregation, the noises from multiple edges tend to cancel out, allowing the cloud to derive overall model insights without revealing individual enterprise details, thus ensuring data security.
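The aggregation step can be illustrated with a minimal sketch in which each party perturbs its derivative statistics with zero-mean Gaussian noise before upload and the cloud forms a sample-size-weighted average in the spirit of FedAvg. The function names, the use of party sample counts as weights, and the noise scale `sigma` are assumptions; the noise here is not calibrated to any formal privacy budget, so the sketch shows the data flow rather than a complete differential-privacy mechanism.

```python
import random


def add_gaussian_noise(values, sigma, rng):
    """Perturb each uploaded statistic with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def weighted_average(party_values, party_weights):
    """Weighted element-wise average across parties."""
    total = sum(party_weights)
    length = len(party_values[0])
    return [
        sum(w * vals[j] for vals, w in zip(party_values, party_weights)) / total
        for j in range(length)
    ]


def federated_round(party_grads, party_sizes, sigma, seed=None):
    """One aggregation round: noisy upload at the edges, averaging at the cloud.

    party_grads: per-party lists of derivative statistics (same length).
    party_sizes: number of local samples per party, used as weights.
    """
    rng = random.Random(seed)
    noisy = [add_gaussian_noise(g, sigma, rng) for g in party_grads]
    return weighted_average(noisy, party_sizes)
```

Because the noise is zero-mean and independent across parties, its contribution to the average shrinks as more parties participate, which is the cancellation effect mentioned above.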

7.4. Main Results

To thoroughly validate the effectiveness of iEVEM, we first present its general performance and then explain the necessity of framework components by ablation experiments.

7.4.1. The General Performance

Based on the above setting, we compared the AUC and E2E latency of iEVEM with those of all the comparatives. Considering the statistics of real-world outliers (i.e., a 1% anomaly proportion and a $5σ$ deviation degree), we conducted extensive experiments with additional anomaly injection ratios R (i.e., $0.1\%$ and $10\%$) and deviation degrees D (i.e., $3σ$ and $7σ$), indicating scenarios with a hard and an easy mode, respectively (smaller deviations indicate anomalies that closely resemble normal situations and are more challenging to recognize). As illustrated in Table 1, iEVEM achieves at least 0.94 in terms of AUC and 185 ms in terms of E2E latency under various settings, enabling reliable and efficient EVEM. Additionally, iEVEM is distinctly superior to the comparatives (12.86% to 47.48% higher in AUC and $3.07\times$ to $148.97\times$ lower in E2E latency), which demonstrates that iEVEM outperforms them in detecting outlier vehicles in terms of both reliability and efficiency.

For a more intuitive demonstration of the effectiveness of the developed algorithm, the results of the two sub-tasks are shown in Figure 4. For the regression sub-task of rational energy consumption estimation, we randomly selected a 3-min driving segment from a vehicle for demonstration purposes. As shown in Figure 4a, the red curve represents the actual energy consumption per second, while the green curve indicates the rational energy consumption predicted by iEVEM. It is evident that the developed algorithm is capable of accurately predicting the normal EV energy consumption, thereby laying the groundwork for the detection of abnormal energy consumption situations. Note that negative energy consumption values indicate energy recovery during braking by the driver. For the classification sub-task of outlier vehicle detection, we sampled 100 vehicles, among which one exhibited abnormal behavior. The energy consumption error for all vehicles during their driving segments is shown in Figure 4b, with the shaded area representing the normal energy consumption range. It is obvious that iEVEM effectively detects outlier vehicles through distribution patterns. Upon investigation, it was found that the abnormal vehicle had illegally installed a heating device, resulting in significantly lower energy consumption during normal winter driving conditions.

7.4.2. Ablation Experiments

The effectiveness of the components in iEVEM is demonstrated as follows:

The Impact of Knowledge-enhanced Approach. We evaluated the impact of the data intelligence architecture by the MAPE of rational energy consumption estimation with different data processing and analysis, i.e., mechanism-driven and data-driven methods. The mechanism-driven method is built upon vehicle dynamics referring to [51], i.e., an analytical formulation of vehicle velocity and road grade, which is a representative work for EV energy consumption calculation. Specifically, the energy consumption E is calculated by the integration of power P over time t, i.e.,

$E = \int P d t .$

(8)

Following the formula derivation in the literature, the energy consumption estimation model used in our case study is expressed as follows:

$P = η [\frac{r R^{2}}{K^{2}} {(m a + k v^{2} + f_{r l} m g + m g sin θ)}^{2} + v (k v^{2} + f_{r l} m g + m g sin θ) + m a v],$

(9)

where $\frac{r R^{2}}{K^{2}} {(m a + k v^{2} + f_{r l} m g + m g sin θ)}^{2}$ is the power loss of the motor, $v (k v^{2} + f_{r l} m g + m g sin θ)$ is the power loss due to travel resistance, and $m a v$ is the possible energy gained from acceleration (or deceleration). Among them, $η$ is the transmission efficiency, r refers to the equivalent resistance of the motor, R indicates the radius of the tire, K indicates the product of the armature constant and magnetic flux, m is the vehicle weight, a is the acceleration, k is the aerodynamic resistance coefficient, v indicates the velocity, $f_{r l}$ refers to the rolling resistance coefficient, g is the acceleration of gravity, and $θ$ is the road slope gradient. Note that the suggested values of the hyperparameters are listed in [51]. The data-driven method is constructed on the same model as iEVEM but without knowledge-enhanced feature engineering, i.e., all the data dimensions are utilized. The results are shown in Figure 5a: iEVEM outperforms the comparatives in terms of MAPE. Specifically, the knowledge-enhanced method achieves a MAPE of 9.9%, which is substantially lower than the mechanism-driven method’s 13% and the data-driven method’s 12%. This indicates that the knowledge-enhanced approach is conducive to a more reliable EVEM.
The Impact of Edge–cloud Collaborative Deployment. We first compared the performance differences between federated learning and centralized training. Figure 6 depicts the loss function and validation set performance during the training process, with the x-axis representing the number of iterations and the y-axis displaying the mean squared error and MAPE, respectively. In each iteration, a tree is generated, with a maximum of 50 trees utilized in our settings. As illustrated in Figure 6a, federated learning converges more slowly and exhibits greater fluctuations compared to centralized training. However, this does not adversely affect the model’s performance after convergence. As shown in Figure 6b, federated learning can achieve a performance comparable to that of centralized training while preserving privacy. Then, we evaluated the impact of the edge–cloud collaborative system architecture on the E2E latency of outlier detection against conventional cloud computing. As shown in Figure 5b, the E2E latency of iEVEM is significantly lower when the identical two-step model is adopted. Specifically, the E2E latency of edge–cloud collaborative deployment is approximately 185 ms, compared to the 685 ms observed in the cloud-based deployment. It is worth noting that the collaborative scheme reduces traffic by more than $100 \times$ compared to the cloud-based scheme. The reduction is attributed to the transformation of the raw data into energy consumption values at the edge in the proposed two-step model. Hence, the edge–cloud collaboration can effectively reduce the traffic and thus E2E latency, enabling efficient EVEM.
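For reference, the mechanism-driven baseline of Equations (8) and (9) can be written as a short function. This is a sketch under stated assumptions: the parameter values must be supplied by the caller (the suggested values from [51] are not reproduced here), the road slope $θ$ is taken in radians, and the integral in Equation (8) is approximated by a rectangle-rule sum over fixed time steps.

```python
from math import sin


def motor_power(v, a, theta, *, eta, r, R, K, m, k, f_rl, g=9.81):
    """Instantaneous power P following Equation (9).

    v: velocity, a: acceleration, theta: road slope (radians).
    eta: transmission efficiency, r: equivalent motor resistance,
    R: tire radius, K: armature constant times magnetic flux,
    m: vehicle weight, k: aerodynamic resistance coefficient,
    f_rl: rolling resistance coefficient, g: gravitational acceleration.
    """
    resistance = k * v ** 2 + f_rl * m * g + m * g * sin(theta)
    force = m * a + resistance
    motor_loss = (r * R ** 2 / K ** 2) * force ** 2  # loss in the motor
    travel_loss = v * resistance                     # travel resistance
    kinetic = m * a * v                              # acceleration term
    return eta * (motor_loss + travel_loss + kinetic)


def energy(powers, dt=1.0):
    """Equation (8), E = integral of P dt, as a rectangle-rule sum."""
    return sum(p * dt for p in powers)
```

With per-second samples of velocity, acceleration, and slope, `energy([motor_power(...) for each sample])` yields the mechanism-driven estimate that the ablation compares against.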

8. Open Issues

The case study above demonstrates the effectiveness of iEVEM. Nevertheless, several important open issues deserve further exploration to support more sophisticated EVEM applications.

Multimodal Data Fusion for EVEM: In addition to the structured data discussed, broader and more diverse data modalities [28] should be considered to further enhance the effectiveness and accuracy of intelligent EVEM. For instance, integrating visual data and point-cloud data of the road environment can provide richer contextual information, facilitating more precise vehicle energy consumption modeling and prediction. Developing efficient approaches for fine-grained multimodal data fusion remains a critical challenge.
Automatic EVEM Knowledge Embedding: A simple attempt at knowledge-enhanced modeling proved effective in this article. However, automated knowledge embedding is essential for handling the vast, diverse, and ever-changing body of EVEM knowledge. For example, integrating new findings in battery materials or regularly revised energy management standards requires a systematic and automated approach. Achieving such a unified, automatic, and scalable knowledge embedding mechanism poses significant technical challenges and demands further investigation.
Dynamic Resource Management of EVEM Systems: Given the dynamic and often unpredictable nature of EVEM system resources (e.g., vehicle-to-cloud communication may degrade significantly inside tunnels or during network congestion), an agile platform for dynamic resource and scheme management is critical. For example, such a platform could enable seamless switching from in-situ energy-efficient route planning to cloud-based solutions when a vehicle exits a tunnel or encounters better network conditions. Addressing this issue will require novel strategies that adapt EVEM operations to varying resource availability in real time.

9. Conclusions

This article presents iEVEM, a novel big data-empowered framework for the intelligent management of EV energy, aiming to address the current development bottleneck faced by EVs from a technology perspective. By leveraging advanced intelligent techniques, iEVEM addresses the challenges associated with the complexity and fragmentation of EVE data in distributed and heterogeneous EVEM systems.

Specifically, through a comprehensive discussion of EVE status and a taxonomy of essential EVEM applications, the fundamental challenges of designing a framework are systematically sorted out from the data and system perspectives. To address these challenges, the proposed iEVEM serves as a tutorial for intelligent EVEM solutions, presenting a data intelligence architecture and an edge–cloud collaborative system architecture. For the data intelligence architecture, a hierarchical structure is proposed. The physical layer is responsible for managing distributed and isolated EVE data, while the data layer and algorithm layer work collaboratively by embedding domain-specific knowledge to derive more reliable big data processing and analysis methods, thereby providing robust support for a wide range of intelligent EVEM applications. For the edge–cloud collaborative system architecture, edge–cloud collaborative storage and computation are introduced to address the resource constraints and privacy concerns of distributed EVEM systems.

To validate the effectiveness of iEVEM, a case study on energy consumption outlier vehicle detection was conducted using real-world data. The experimental results demonstrate the performance gain of iEVEM in terms of detection accuracy and response speed, showing its potential to outperform traditional approaches and to support a wider range of intelligent EVEM applications. Moreover, this article lays a foundation for further exploration and innovation in the field of intelligent EVEM; accordingly, additional promising opportunities are highlighted at the end of this article for the further development of intelligent EVEM applications. The proposed methodology is intended to help turn theoretical concepts into practical cases and to promote the development of the field in practice.

Author Contributions

Conceptualization, S.G. and C.Z.; methodology, S.G. and C.Z.; software and data curation, S.G.; validation and visualization, S.G.; writing—original draft preparation, S.G.; supervision, C.Z.; project administration, S.G. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author; they are not publicly available due to the privacy of study participants.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Heidrich, O.; Dissanayake, D.; Lambert, S.; Hector, G. How cities can drive the electric vehicle revolution. Nat. Electron. 2022, 5, 11–13.
2. Simpkins, G. Benefits of electric vehicle adoption. Nat. Rev. Earth Environ. 2023, 4, 432.
3. Böhm, M.; Nanni, M.; Pappalardo, L. Gross polluters and vehicle emissions reduction. Nat. Sustain. 2022, 5, 699–707.
4. Husain, I.; Ozpineci, B.; Islam, M.S.; Gurpinar, E.; Su, G.J.; Yu, W.; Chowdhury, S.; Xue, L.; Rahman, D.; Sahu, R. Electric drive technology trends, challenges, and opportunities for future electric vehicles. Proc. IEEE 2021, 109, 1039–1059.
5. Herberz, M.; Hahnel, U.J.; Brosch, T. Counteracting electric vehicle range concern with a scalable behavioural intervention. Nat. Energy 2022, 7, 503–510.
6. Zhang, J.; Wang, Y.; Jiang, B.; He, H.; Huang, S.; Wang, C.; Zhang, Y.; Han, X.; Guo, D.; He, G.; et al. Realistic fault detection of li-ion battery via dynamical deep learning. Nat. Commun. 2023, 14, 5940.
7. Wang, Z.; Luo, W.; Xu, S.; Yan, Y.; Huang, L.; Wang, J.; Hao, W.; Yang, Z. Electric Vehicle Lithium-Ion Battery Fault Diagnosis Based on Multi-Method Fusion of Big Data. Sustainability 2023, 15, 1120.
8. Kim, D.; Shim, H.G.; Eo, J.S. A machine learning method for ev range prediction with updates on route information and traffic conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 28 February–1 March 2022; Volume 36, pp. 12545–12551.
9. Shen, H.; Zhou, X.; Ahn, H.; Lamantia, M.; Chen, P.; Chen, P. Personalized Velocity and Energy Prediction for Electric Vehicles with Road Features in Consideration. IEEE Trans. Transp. Electrif. 2023, 9, 3958–3969.
10. Cuchỳ, M.; Vokřínek, J.; Jakob, M. Multi-Objective Electric Vehicle Route and Charging Planning with Contraction Hierarchies. In Proceedings of the International Conference on Automated Planning and Scheduling, Banff, AB, Canada, 1–6 June 2024; Volume 34, pp. 114–122.
11. Zhang, Y.; Yin, Z.; Xiao, H.; Luo, F. Coordinated Planning of EV Charging Stations and Mobile Energy Storage Vehicles in Highways With Traffic Flow Modeling. IEEE Trans. Intell. Transp. Syst. 2024, 25, 21572–21584.
12. Khiari, J.; Olaverri-Monreal, C. Uncertainty-Aware Vehicle Energy Efficiency Prediction Using an Ensemble of Neural Networks. IEEE Intell. Transp. Syst. Mag. 2023, 15, 109–119.
13. Zhu, Z.; Chen, W.; Xia, R.; Zhou, T.; Niu, P.; Peng, B.; Wang, W.; Liu, H.; Ma, Z.; Gu, X.; et al. Energy forecasting with robust, flexible, and explainable machine learning algorithms. AI Mag. 2023, 44, 377–393.
14. Du, P.; Xiao, T.; Chakraborty, C.; Cao, H.; Alfarraj. Energy-efficient UAVs and BSs management in distributed Edge intelligence empowered IoV networks. IEEE Internet Things J. 2024. Early Access.
15. Al-Turjman, F.; Altrjman, C. Enhanced Medium Access for Traffic Management in Smart-Cities’ Vehicular-Cloud. IEEE Intell. Transp. Syst. Mag. 2021, 13, 273–280.
16. Yan, G.; Liu, K.; Liu, C.; Zhang, J. Edge Intelligence for Internet of Vehicles: A Survey. IEEE Trans. Consum. Electron. 2024, 70, 4858–4877.
17. Zhou, Z.; Yu, H.; Xu, C.; Chang, Z.; Mumtaz, S.; Rodriguez, J. BEGIN: Big data enabled energy-efficient vehicular edge computing. IEEE Commun. Mag. 2018, 56, 82–89.
18. Yin, L.; Luo, J.; Qiu, C.; Wang, C.; Qiao, Y. Joint task offloading and resources allocation for hybrid vehicle edge computing systems. IEEE Trans. Intell. Transp. Syst. 2024, 55, 10355–10368.
19. Li, B.; Kisacikoglu, M.C.; Liu, C.; Singh, N.; Erol-Kantarci, M. Big data analytics for electric vehicle integration in green smart cities. IEEE Commun. Mag. 2017, 55, 19–25.
20. So, D.; Oh, J.; Jeon, I.; Moon, J.; Lee, M.; Rho, S. BiGTA-Net: A Hybrid Deep Learning-Based Electrical Energy Forecasting Model for Building Energy Management Systems. Systems 2023, 11, 456.
21. Yang, H.; Zheng, K.; Zhang, K.; Mei, J.; Qian, Y. Ultra-reliable and low-latency communications for connected vehicles: Challenges and solutions. IEEE Netw. 2020, 34, 92–100.
22. Khamfroush, H. Resource-aware Federated Data Analytics in Edge–Enabled IoT Systems. In Proceedings of the AAAI Symposium Series, Arlington, VA, USA, 7–9 November 2024; Volume 3, p. 305.
23. Zhang, X.; Mavromatis, A.; Vafeas, A.; Nejabati, R.; Simeonidou, D. Federated Feature Selection for Horizontal Federated Learning in IoT Networks. IEEE Internet Things J. 2023, 10, 10095–10112.
24. Liu, Y.; Kang, Y.; Zou, T.; Pu, Y.; He, Y.; Ye, X.; Ouyang, Y.; Zhang, Y.Q.; Yang, Q. Vertical Federated Learning: Concepts, Advances, and Challenges. IEEE Trans. Knowl. Data Eng. 2024, 10, 3615–3634.
25. IEA. Global EV Outlook 2024. Available online: https://www.iea.org/reports/global-ev-outlook-2024 (accessed on 10 February 2025).
26. Hu, X.; Gao, F.; Xiao, Y.; Wang, D.; Gao, Z.; Huang, Z.; Ren, S.; Jiang, N.; Wu, S. Advancements in the safety of Lithium-Ion Battery: The trigger, consequence and mitigation method of thermal runaway. Chem. Eng. J. 2024, 481, 148450.
27. Peng, R.; Tang, J.H.C.G.; Yang, X.; Meng, M.; Zhang, J.; Zhuge, C. Investigating the factors influencing the electric vehicle market share: A comparative study of the European Union and United States. Appl. Energy 2024, 355, 122327.
28. Lin, M.; You, Y.; Meng, J.; Wang, W.; Wu, J.; Stroe, D.I. Lithium-ion batteries SOH estimation with multimodal multilinear feature fusion. IEEE Trans. Energy Convers. 2023, 38, 2959–2968.
29. Zhang, J.; Huang, C.; Chow, M.Y.; Li, X.; Tian, J.; Luo, H.; Yin, S. A data-model interactive remaining useful life prediction approach of lithium-ion batteries based on PF-BiGRU-TSAM. IEEE Trans. Ind. Inform. 2023, 20, 1144–1154.
30. Cao, Y.; Yi, J.; Liu, Y.; Zhao, C.; Li, D.; Zhang, Y.; Han, Z. Joint Routing and Charging Optimization of Electric Passenger Vehicles With Uninterruptible Charging Service. Chem. Eng. J. 2024, 11, 18180–18192.
31. Wang, R.; Wang, H.; Zhu, K.; Yi, C.; Wang, P.; Niyato, D. Mobile Charging Services for the Internet of Electric Vehicles: Concepts, Scenarios, and Challenges. IEEE Veh. Technol. Mag. 2023, 18, 110–119.
32. Zhu, Q.; Huang, Y.; Lee, C.F.; Liu, P.; Zhang, J.; Wik, T. Predicting electric vehicle energy consumption from field data using machine learning. IEEE Trans. Transp. Electrif. 2024, 11, 2120–2132.
33. McGovern, M.E.; Bruder, D.D.; Huemiller, E.D.; Rinker, T.J.; Bracey, J.T.; Sekol, R.C.; Abell, J.A. A review of research needs in nondestructive evaluation for quality verification in electric vehicle lithium-ion battery cell manufacturing. J. Power Sources 2023, 561, 232742.
34. Xu, L.; Wu, F.; Chen, R.; Li, L. Data-driven-aided strategies in battery lifecycle management: Prediction, monitoring, and optimization. Energy Storage Mater. 2023, 59, 102785.
35. Zhang, Z.; Li, J.; Guan, D. Value chain carbon footprints of Chinese listed companies. Nat. Commun. 2023, 14, 2794.
36. Liu, Y.; Zhang, C.; Hao, Z.; Cai, X.; Liu, C.; Zhang, J.; Wang, S.; Chen, Y. Study on the life cycle assessment of automotive power batteries considering multi-cycle utilization. Energies 2023, 16, 6859.
37. Kalakanti, A.K.; Rao, S. Charging Station Planning for Electric Vehicles. Systems 2022, 10, 6.
38. Zhao, H.; Wu, H.; Lu, N.; Zhan, X.; Xu, E.; Yuan, Q. Lane Changing in a Vehicle-to-Everything Environment: Research on a Vehicle Lane-Changing Model in the Tunnel Area by Considering the Influence of Brightness and Noise Under a Vehicle-to-Everything Environment. IEEE Intell. Transp. Syst. Mag. 2023, 15, 225–237.
39. Xiang, C.; Feng, C.; Xie, X.; Shi, B.; Lu, H.; Lv, Y.; Yang, M.; Niu, Z. Multi-Sensor Fusion and Cooperative Perception for Autonomous Driving: A Review. IEEE Intell. Transp. Syst. Mag. 2023, 15, 36–58.
40. Hahn, D.; Munir, A.; Behzadan, V. Security and Privacy Issues in Intelligent Transportation Systems: Classification and Challenges. IEEE Intell. Transp. Syst. Mag. 2021, 13, 181–196.
41. Kwade, A.; Haselrieder, W.; Leithoff, R.; Modlinger, A.; Dietrich, F.; Droeder, K. Current status and challenges for automotive battery production technologies. Nat. Energy 2018, 3, 290–300.
42. ISO 12405; Electrically Propelled Road Vehicles—Test Specification for Lithium-Ion Traction Battery Packs and Systems—Part 4: Performance Testing. ISO: Geneva, Switzerland, 2018.
43. GB/T 32960; Electric Vehicle Battery Management System. Standardization Administration of China: Beijing, China, 2016.
44. UL 1974; The Standard for Evaluation for Repurposing Batteries. UL: Northbrook, IL, USA, 2018.
45. Li, X.; Wang, Z.; Yan, J. Prognostic health condition for lithium battery using the partial incremental capacity and Gaussian process regression. J. Power Sources 2019, 421, 56–67.
46. Zhang, J.; Wang, W.; Guo, S.; Wang, L.; Lin, F.; Yang, C.; Yin, W. Solving general natural-language-description optimization problems with large language models. arXiv 2024, arXiv:2407.07924.
47. Wang, D.; Chen, Z.; Ni, J.; Tong, L.; Wang, Z.; Fu, Y.; Chen, H. Hierarchical graph neural networks for causal discovery and root cause localization. arXiv 2023, arXiv:2302.01987.
48. Wehner, C.; Kertel, M.; Wewerka, J. Interactive and intelligent root cause analysis in manufacturing with causal Bayesian networks and knowledge graphs. In Proceedings of the 2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring), Florence, Italy, 20–23 June 2023.
49. Li, Z.; Yang, Z.; Wang, M. Reinforcement learning with human feedback: Learning dynamic choices via pessimism. arXiv 2023, arXiv:2305.18438.
50. Han, S.; Hu, X.; Huang, H.; Jiang, M.; Zhao, Y. Adbench: Anomaly detection benchmark. Adv. Neural Inf. Process. Syst. 2022, 35, 32142–32159.
51. Zhang, J.; Wang, Z.; Liu, P.; Zhang, Z. Energy consumption analysis and prediction of electric vehicles based on real-world driving data. Appl. Energy 2020, 275, 115408.
52. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
53. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. Secureboost: A lossless federated learning framework. IEEE Intell. Syst. 2021, 36, 87–98.

Figure 1. Essential EVEM applications.

Figure 2. Data intelligence architecture of iEVEM.

Figure 3. Edge–cloud collaborative system architecture of iEVEM.

Figure 4. The performance evaluation of the developed algorithm with two sub-tasks. (a) The illustration of the rational energy consumption estimation regression sub-task. (b) The illustration of the outlier vehicle detection classification sub-task.

Figure 5. Performance of different data intelligence (i.e., knowledge-enhanced vs. mechanism- and data-driven) and system (i.e., edge–cloud collaborative vs. cloud-based) architectures in EV energy consumption outlier detection.

Figure 6. The performance evaluation of different training schemes, i.e., federated learning is marked in red and centralized training is in blue. (a) The illustration of the training loss. (b) The illustration of the validation set result.

Table 1. The overall performance of iEVEM and comparatives.

Method	Real-World Data (R = 1%, D = 5)	Deviation Degree, Hard (D = 3, R = 1%)	Deviation Degree, Easy (D = 7, R = 1%)	Injection Ratio, Hard (R = 0.01%, D = 5)	Injection Ratio, Easy (R = 10%, D = 5)	E2E Latency (ms)
KNN	0.8195	0.8181	0.8219	0.7851	0.8372	27560
CBLOF	0.7304	0.7147	0.7402	0.6353	0.7854	568
IForest	0.7185	0.6755	0.7447	0.6431	0.7865	694
ECOD	0.5303	0.5184	0.5484	0.5297	0.5389	9651
DSVDD	0.5000	0.4996	0.5000	0.4998	0.5000	675
iEVEM	0.9644	0.9467	0.9748	0.9591	0.9668	185

Note: The best performance in each column (bolded in the original table) is achieved by iEVEM, which has the highest detection score in every setting and the lowest E2E latency.
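The headline figures quoted in the abstract can be recomputed from Table 1. The following is a minimal sketch with the table values typed in by hand; the variable names are illustrative, and the accuracy gain is taken here as an absolute difference in score, which is an interpretation rather than something the text states.

```python
# Table 1: five detection scores per method, followed by E2E latency in ms.
table = {
    "KNN":     ([0.8195, 0.8181, 0.8219, 0.7851, 0.8372], 27560),
    "CBLOF":   ([0.7304, 0.7147, 0.7402, 0.6353, 0.7854], 568),
    "IForest": ([0.7185, 0.6755, 0.7447, 0.6431, 0.7865], 694),
    "ECOD":    ([0.5303, 0.5184, 0.5484, 0.5297, 0.5389], 9651),
    "DSVDD":   ([0.5000, 0.4996, 0.5000, 0.4998, 0.5000], 675),
    "iEVEM":   ([0.9644, 0.9467, 0.9748, 0.9591, 0.9668], 185),
}

ours_scores, ours_latency = table["iEVEM"]
baselines = {name: row for name, row in table.items() if name != "iEVEM"}

# Largest absolute score gain of iEVEM over any baseline in any setting.
max_gain = max(
    ours_scores[i] - scores[i]
    for scores, _ in baselines.values()
    for i in range(len(ours_scores))
)

# Smallest latency ratio, i.e., the speedup over the fastest baseline.
min_speedup = min(latency / ours_latency for _, latency in baselines.values())

print(f"max gain: {max_gain * 100:.2f} points, min speedup: {min_speedup:.2f}x")
```

Under this reading, the largest gain comes from the comparison with DSVDD in the easy deviation-degree setting, and the smallest speedup comes from the comparison with CBLOF.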

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Guo, S.; Zhao, C. iEVEM: Big Data-Empowered Framework for Intelligent Electric Vehicle Energy Management. Systems 2025, 13, 118. https://doi.org/10.3390/systems13020118