Development of a Smart Energy-Saving Driving Assistance System Integrating OBD-II, YOLOv11, and Generative AI

Meng-Hua Yen; You-Xuan Lin; Kai-Po Huang; Chi-Chun Chen

doi:10.3390/electronics14173435

Department of Electronic Engineering, National Chin-Yi University of Technology, Taichung 41170, Taiwan

* Author to whom correspondence should be addressed.

Electronics 2025, 14(17), 3435; https://doi.org/10.3390/electronics14173435

This article belongs to the Special Issue Intelligent Computing and System Integration

Abstract

In recent years, generative AI and autonomous driving have been highly popular topics. With the increasing global emphasis on carbon emissions and carbon trading, integrating autonomous driving technologies that can instantly perceive environmental changes with vehicle-based generative AI would enable vehicles to better understand their surroundings and provide drivers with recommendations for more energy-efficient and comfortable driving. This study employed You Only Look Once version 11 (YOLOv11) for visual detection of the driving environment, integrating it with vehicle speed data received from the OBD-II system. All information is integrated and processed on the embedded NVIDIA Jetson AGX Orin platform. For visual detection validation, part of the test set includes standard Taiwanese road signs. Experimental results show that incorporating Squeeze-and-Excitation Attention (SEAttention) into YOLOv11 improves the mAP50–95 accuracy by 10.1 percentage points. Generative AI processes this information in real time and provides the driver with appropriate driving recommendations, such as braking gently, noting a pedestrian ahead, or warning of excessive speed. These recommendations are delivered through voice output to prevent driver distraction caused by looking at an interface. When a red light or pedestrian is detected, early deceleration is suggested, which reduces fuel consumption and enhances driving comfort, ultimately achieving the goal of energy-efficient driving.

Keywords:

driving assistance; YOLOv11; traffic sign detection; autonomous driving; eco-driving; OBDII; RAG; generative artificial intelligence

1. Introduction

With growing global awareness of environmental protection and energy efficiency, achieving energy-saving and carbon reduction in transportation has become a key issue. Meanwhile, autonomous driving remains a hot topic, with vehicle automation levels having been defined [], and ongoing discussions comparing the advantages and limitations of LiDAR and vision-based systems. The functionality of autonomous vehicles relies heavily on sensors to detect the external environment, and driving characteristics and driver behavior significantly influence fuel consumption and emissions. Traditional driver assistance systems are mainly based on human experience and often lack real-time analysis and alert capabilities, making it difficult to support drivers in making environmentally friendly driving decisions.

Green driving refers to a smoother driving style that reduces rapid acceleration and sudden braking, which helps save fuel and improve driving safety []. By integrating deep learning and computer vision, real-time road detection can be achieved. When specific signs such as speed limits or traffic lights are detected, and the vehicle is moving too fast or approaching a yellow light that has not yet turned red, the system can use the detection results to recommend gentle braking in advance. This approach supports both comfortable and energy-efficient driving. In our previous research, we developed an integrated OBD-II system and applied two types of neural networks for fuel consumption analysis and prediction, with generated reports offering evaluations and behavioral suggestions [,].

Traditional warning systems often use rigid, predefined messages and require internet access to provide additional information. However, connecting a vehicle to the internet introduces potential risks, such as hacking or personal data breaches. An offline assistant system can reduce these risks by eliminating one common entry point for cyberattacks. With the integration of generative AI, the system’s role goes beyond that of a simple assistant. For example, linking generative AI with OBD-II allows for fault diagnosis [,]. In cases where an abnormal throttle opening is detected—a condition that may lead to uncontrolled acceleration but is difficult to notice in the early stage—generative AI can help monitor and issue early warnings. By combining YOLO, generative AI, and OBD-II, more advanced functions can be developed, further enhancing the capabilities of driver-assistance systems.

Numerous studies have applied the YOLO [,,] family of models for vehicle detection tasks [,]. Our laboratory has also developed a custom-integrated OBD-II system and used two types of neural networks to analyze and predict fuel consumption [,]. Generative AI has demonstrated strong capabilities in text processing and information integration, making it a promising core component for an in-vehicle assistant. Although some recent research has explored combining YOLO with image-based generative AI [], there is currently no study that fully integrates YOLO, generative AI, and OBD-II into a unified system.

This study integrates an On-Board Diagnostics II (OBD-II) system [] with real-time road condition recognition to develop an eco-driving assistance system. Using OBD-II, real-time vehicle operation data such as engine RPM, fuel consumption, and speed can be collected. These data are combined with the YOLOv11 deep-learning model, which is used to detect road objects ahead, including pedestrians, traffic lights, vehicles, and speed limit signs, in order to evaluate whether the driver’s behavior aligns with energy-saving principles. Furthermore, information from both OBD-II and image recognition is converted into textual input, which is then processed by a fine-tuned generative language model, Breeze:7B [,]. This model can generate context-aware suggestions, such as gently applying the brakes in advance or issuing a warning when the vehicle is speeding. The generated content is then delivered via a speech synthesis module, helping reduce driver distraction and enhancing both user experience and system intelligence.

2. Materials and Methods

2.1. System Architecture

This section describes the system architecture and overall operational flow of the proposed framework. As shown in Figure 1, the intelligent driving assistance system developed in this study integrates object detection, vehicle speed acquisition, distance estimation, and generative AI-based inference to provide real-time and accurate driving alerts and informational prompts.

Figure 1. System Architecture.

First, the YOLOv11 model (v11.0, Ultralytics, Frederick, MD, USA; Python 3.10, PyTorch 1.13.0, CUDA 11.7, etc.) [] is trained, and its weights are converted and deployed onto the Jetson AGX Orin Developer Kit (NVIDIA, Santa Clara, CA, USA), an embedded platform. A real-time video stream is captured using an SG2-IMX390C-5200-GMSL2-H120H camera (Shenzhen SENSING Technology Co., Ltd., Shenzhen, Guangdong, China). The system reads the vehicle’s current speed in real time via an OBD-II device and, in combination with the generative AI module, infers the distance to detected objects ahead and potential driving risks. Finally, the inference results are integrated into a GUI that provides speed limit warnings, distance alerts, voice prompts, and road object recognition.

2.2. System Flowchart

This section presents the architecture and overall workflow of the intelligent driving assistance system proposed in this study. As illustrated in Figure 2, the system integrates modules for image recognition, vehicle speed acquisition, object detection, and generative AI-based inference, with the goal of providing real-time driving alerts and safety recommendations.

Figure 2. System Flowchart.

For visual processing, the system uses OpenCV to interface with the camera and capture real-time color images. The YOLOv11 model, which has been converted to the Open Neural Network Exchange (ONNX) format, is applied to detect specific road objects such as traffic lights, pedestrians, and vehicles. The detection results are then processed with bounding boxes and annotations and subsequently passed to downstream inference modules for further analysis.

For vehicle speed acquisition, the system connects to the OBD-II device via the RS-232 communication protocol. It receives packet data that conforms to a predefined format (e.g., with a starting flag of 0x23 and length verification) and parses the Parameter ID (PID) values to obtain real-time vehicle speed information.

All acquired information—including image recognition results and vehicle speed data—is integrated into the generative AI module for semantic understanding and inference. The AI model determines whether to issue alerts (such as overspeed warnings, insufficient distance to the vehicle ahead, or red light detection) or provide driving suggestions (such as pedestrian presence or yellow light status). Finally, the inference results are converted into textual messages, which are either displayed on the user interface or delivered via voice prompts to enhance driving decision support and improve the overall user experience [].

The YOLOv11 neural network used in this study is a one-stage object detection model []. Its main advantages include high processing speed and satisfactory recognition accuracy. Compared to two-stage object detection algorithms, YOLOv11 achieves a higher frame rate (FPS), with the system in this study reaching an average of over 80 FPS.

2.3. YOLOv11 Training Process

Figure 3 illustrates the training process of YOLOv11. The dataset used in this study was compiled from multiple sources to increase diversity, including: (1) road images collected during on-road driving, (2) selected road imagery from Google Maps in the Taiping District of Taichung, Taiwan, (3) publicly available datasets from Kaggle, (4) publicly shared dashcam footage from the internet, and (5) publicly available overseas datasets, among others. In total, the dataset consists of 2926 images.

Figure 3. YOLOv11 Training Workflow.

In terms of object detection performance, it was observed that the model performed poorly when detecting small objects. To address this, we not only made architectural adjustments but also augmented the dataset with nighttime and rainy-condition images. Finally, we integrated the SEAttention [] module to enhance the detection of small objects and tuned the hyperparameters during model training. As a result, the final model evaluation showed significant improvements in both accuracy and mAP50–95 compared to the original configuration.

2.3.1. YOLO Training Image Categories

In this study, part of the dataset comes from a large-scale dataset on Kaggle. For the speed limit signs, we adopted images from the German Traffic Sign Recognition Benchmark (GTSRB), which originates from the multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) in 2011. The GTSRB dataset contains 43 categories and more than 50,000 images. From this dataset, we selected 411 images and combined them with 2515 self-collected images, resulting in a total of 2926 images. These images include nighttime and rainy-condition scenes, which were subsequently annotated.

The annotated categories cover 11 classes: speed limits from 30 km/h to 80 km/h, traffic lights, pedestrians, and vehicles. We adopted a fixed split for training and testing, with 2341 images (80%) used as the training set and 585 images (20%) as the testing set, for a total of 2926 images in the training-validation split. Figure 4 and Figure 5 show the 11 training categories and the distribution of annotations, while Table 1 lists the label IDs for each category.
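The fixed 80/20 split described above can be reproduced with a seeded shuffle. The following is a minimal sketch, not the authors' tooling; the file names, the seed, and the helper `split_dataset` are hypothetical and only illustrate how 2926 images yield 2341 training and 585 testing images.

```python
import random


def split_dataset(paths, train_fraction=0.8, seed=0):
    """Shuffle image paths reproducibly and split them into train/test lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 2926 annotated images.
images = [f"img_{i:04d}.jpg" for i in range(2926)]
train, test = split_dataset(images)
print(len(train), len(test))
```

Fixing the seed keeps the test set unchanged across training runs, so results from different model variants remain comparable.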

Figure 4. Illustration of the 11 Detection Categories Used in This Study.

Figure 5. Distribution of the number of annotations in each category.

Table 1. Number of Training Data Samples per Category for YOLOv11 Training.

For the annotation process, this study utilized LabelImg, an open-source image annotation tool commonly used for labeling objects with bounding boxes. During annotation, the bounding boxes for each object in the image were converted to the YOLO-specific format. Unlike the Pascal VOC format, which defines bounding boxes using the top-left and bottom-right coordinates, the YOLO format uses the normalized center coordinates (Xc, Yc) and relative width and height (w, h), which simplifies real-time computation and model output standardization.

After annotation, each object is assigned a corresponding class ID, and the labels are exported as text files (.txt) in YOLO format. Once a sufficient number of images are labeled and used for training, the model generates a weight file. This file can then be used for inference, during which the model draws bounding boxes around detected objects.

Each bounding box is defined by four corner points:

○ (X_min, Y_min): top-left corner
○ (X_max, Y_min): top-right corner
○ (X_min, Y_max): bottom-left corner
○ (X_max, Y_max): bottom-right corner

The conversion formulas used to generate YOLO annotations are as follows:

X_{C} = \frac{X_{min} + X_{max}}{2w}

(1)

Y_{C} = \frac{Y_{min} + Y_{max}}{2h}

(2)

w_{yolo} = \frac{X_{max} - X_{min}}{w}

(3)

h_{yolo} = \frac{Y_{max} - Y_{min}}{h}

(4)

where:

w, h: the original width and height of the image (in pixels).
All results are normalized to the range 0 to 1 to meet the input requirements of the YOLO model.
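Equations (1)–(4) can be checked with a short script. This is a minimal sketch; the function names, the class ID 3, and the 1280 × 720 image size are illustrative assumptions rather than values from the study.

```python
def voc_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert corner coordinates (pixels) to normalized YOLO (xc, yc, w, h)."""
    x_c = (x_min + x_max) / 2 / img_w   # Equation (1)
    y_c = (y_min + y_max) / 2 / img_h   # Equation (2)
    w = (x_max - x_min) / img_w         # Equation (3)
    h = (y_max - y_min) / img_h         # Equation (4)
    return x_c, y_c, w, h


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one annotation as a line of a YOLO .txt label file."""
    values = voc_to_yolo(*box, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Hypothetical box in a 1280 x 720 image, labeled with class ID 3.
line = yolo_label_line(3, (100, 200, 300, 400), 1280, 720)
print(line)  # 3 0.156250 0.416667 0.156250 0.277778
```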

Since the training image classes and quantities were customized, this study used a batch size of 2948 epochs and a learning rate of 0.03. After training, the model achieved an accuracy of over 90%.

For distance estimation, monocular vision was employed to calculate [] the distance between the camera and detected objects. Based on the principle of similar triangles, the distance can be approximated by using the known actual height of the object and the height of its bounding box in the image (in pixels), as detected by the YOLO algorithm.

D = \frac{f \cdot H}{h}

where:

D: Distance between the object and the camera (meters)
f: Focal length of the camera
H: Actual height of the object
h: Height of the bounding box in the image (pixels)
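A minimal sketch of this similar-triangles estimate follows. It assumes the focal length is expressed in pixels, so that the result carries the same unit as H. The numbers (a 1000-pixel focal length and a 1.5 m tall vehicle) are hypothetical. Note that h here is the bounding-box height, not the image height used in Equations (1)–(4).

```python
def estimate_distance_m(focal_px, real_height_m, bbox_height_px):
    """Monocular distance estimate D = f * H / h (similar triangles)."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_px * real_height_m / bbox_height_px


# Hypothetical values: f = 1000 px, H = 1.5 m, detected box height = 150 px.
print(estimate_distance_m(1000, 1.5, 150))  # 10.0
```

Because D is inversely proportional to h, a one-pixel error in the box height has a much larger effect for distant (small) objects than for nearby ones.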

2.3.2. YOLOv11 Algorithm Optimization

During vehicle motion, target scenes change rapidly and can be highly complex. To ensure accurate detection in such environments, this study evaluated the change in model performance after adding different modules. Ultimately, after weighing the pros and cons, we decided to incorporate SEAttention as the newly added module, as shown in Figure 6. The final results indicate that although computation speed decreased slightly, accuracy improved significantly compared to the model without module modifications.

Figure 6. The YOLOv11 Neural Network Architecture Used in This Study.

In real driving scenarios, the environment changes quickly and is highly complex. SEAttention helps the model ignore “irrelevant” background channels by emphasizing important channels and reducing interference from useless or redundant ones. This enables the model to capture target objects more accurately.
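The channel re-weighting idea behind squeeze-and-excitation can be illustrated in plain Python. This is a minimal sketch under simplifying assumptions (no bias terms, hand-picked toy weights, and nested lists instead of tensors); it is not the layer used in the trained model.

```python
import math


def se_attention(feature_map, w1, w2):
    """Apply squeeze-and-excitation to a feature map shaped [C][H][W].

    w1 has shape [C // r][C] and w2 has shape [C][C // r] (no biases).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates


# Toy input: two 2x2 channels; the weights are chosen by hand for illustration.
features = [[[4.0, 4.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
scaled, gates = se_attention(features, w1=[[1.0, -1.0]], w2=[[2.0], [-2.0]])
print([round(g, 4) for g in gates])
```

With these toy weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the "emphasize important channels, suppress redundant ones" behavior described above.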

The Cross-Stage Partial Selective Attention (C2PSA) module in YOLOv11 is not a traditional attention module; rather, it can be considered a lightweight attention mechanism. SEAttention, on the other hand, can stabilize performance without significantly impacting computation speed. The combination of these two modules effectively addresses the common weakness of one-stage detectors in detecting small objects.

Regarding the Cross Stage Partial with Kernel-2 Fusion Module (C3K2) in YOLOv11, it is a variant of the C3 module introduced in early YOLOv7. Its primary function is feature extraction, combining convolutions with two different kernel sizes to enhance the model’s ability to perceive objects at multiple scales. In the backbone structure shown in Figure 6, the model builds hierarchical features from layers 0–7. Layers 9–11 perform feature refinement and fusion, where layer 9 is Spatial Pyramid Pooling-Fast (SPPF), using multiple pooling operations to fuse multi-scale features; layer 10 strengthens key channel features; and layer 11 integrates channel and spatial attention to further refine and enhance the information.

In summary, this study optimizes the model’s object detection capability by incorporating additional modules tailored to improve small object detection and overall performance.

2.4. OBD-II Data Acquisition Process

The OBD-II operation flow (as shown in Figure 7) begins by creating a data buffer that continuously reads incoming data. Upon receiving the header byte 0x23, the parser first checks whether the preceding conditions are met and whether the data length is greater than 4. If so, it then checks whether the data length is less than 8. When both conditions are met, the process proceeds to the next stage, where the PID parameter determines the type of data to be read: PID = 1 corresponds to vehicle speed, PID = 2 to RPM, PID = 3 to STime, PID = 4 to coolant temperature, and so on. In this study, only the vehicle speed is read, for estimating and optimizing comfortable driving.

Figure 7. OBD-II Flowchart.

When retrieving vehicle speed data, it is necessary to first decode the data packet received from the OBD-II interface. The packet structure is shown in Table 2. Among the fields, the Protocol Type is used to identify the communication protocol being used. For example, a protocol type of 01 indicates SAE J1850 PWM, which is commonly used by Ford vehicles. Different OBD-II protocols impose different constraints on packet length and format. The final result of the decoding and parsing process is illustrated in Figure 8.

Table 2. Packet Format Structure Description.

Figure 8. Data transmitted from the OBD-II reading simulator. The highlighted blue field indicates the vehicle speed value (62 km/h) parsed from the data packet.

An example of the actual data parsing process is as follows. Assume that the received packet is:

0x23, 0x21, 0x01, 0x3C, 0x3D

0x23: Start flag indicating the beginning of the packet.
0x21: Encodes both the data type and length.
- Type = 2 → obtained by (0x21 >> 4)
- Length = 1 → obtained by (0x21 & 0x0F)
0x01: PID (Parameter ID), where 0x01 represents vehicle speed.
0x3C: Data field, representing the vehicle speed value = 60 km/h (0x3C in hexadecimal equals 60 in decimal).
0x3D: Checksum (CS). The expected checksum is calculated as:
- CS = 0x01 + 0x3C = 0x3D
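The parsing steps above can be sketched as a small decoder. The packet layout follows the worked example. The remaining details are assumptions made for illustration: the checksum is taken modulo 256, multi-byte data are read big-endian, and the `PID_NAMES` labels and the function `parse_packet` are hypothetical names.

```python
START_FLAG = 0x23
# PID meanings as listed in the text; the key names are illustrative.
PID_NAMES = {0x01: "speed_kmh", 0x02: "rpm", 0x03: "stime", 0x04: "coolant_temp"}


def parse_packet(packet: bytes) -> dict:
    """Decode one packet: start flag, type/length byte, PID, data, checksum."""
    if not 4 < len(packet) < 8:
        raise ValueError("packet length must be between 5 and 7 bytes")
    if packet[0] != START_FLAG:
        raise ValueError("missing 0x23 start flag")
    data_type = packet[1] >> 4      # high nibble: data type
    length = packet[1] & 0x0F       # low nibble: number of data bytes
    if len(packet) != length + 4:
        raise ValueError("length field does not match packet size")
    pid = packet[2]
    data = packet[3:3 + length]
    expected = (pid + sum(data)) & 0xFF  # assumed to wrap at one byte
    if packet[-1] != expected:
        raise ValueError("checksum mismatch")
    return {
        "type": data_type,
        "length": length,
        "pid": pid,
        "name": PID_NAMES.get(pid, "unknown"),
        "value": int.from_bytes(data, "big"),  # byte order is an assumption
    }


result = parse_packet(bytes([0x23, 0x21, 0x01, 0x3C, 0x3D]))
print(result["name"], result["value"])  # speed_kmh 60
```

Rejecting packets with a bad checksum or an inconsistent length keeps a corrupted serial frame from being reported to the language model as a valid speed.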

2.5. Generative AI Process

During the training of the generative AI model, fine-tuning was first conducted on a dedicated experimental server equipped with two NVIDIA RTX A6000 GPUs and 96 GB of RAM. The first step involved text data collection, which included both custom-prepared datasets and publicly available data. The custom dataset was constructed by manually inputting questions relevant to this research and generating related responses using ChatGPT (developed by OpenAI, model: GPT-4o). Irrelevant or redundant content was removed, followed by hyperparameter tuning for optimization.

Once fine-tuning was completed, the model was quantized using llama.cpp [] to reduce computational complexity and accelerate inference. Prompts were then applied for performance testing. Finally, the optimized model was deployed to an edge computing device for real-time execution. Figure 9 illustrates the workflow of the generative AI process.

Figure 9. Generative AI Process Flowchart.

To deploy Breeze:7B on the Jetson AGX Orin Developer Kit, it is essential to minimize computational load during runtime []. Therefore, the fine-tuning strategy adopted in this study involved dividing the dataset into two categories: a general-purpose dataset, which contains question-and-answer pairs relevant to the overall research context, and a domain-specific dataset, which focuses on critical alert-related targets such as red lights, speed limit signs, and dangerously close objects [].

The model was then fine-tuned using the LoRA (Low-Rank Adaptation) method [], which allows the model to acquire additional task-specific knowledge while preserving its original reasoning capabilities. After fine-tuning, prompt engineering was applied to guide the model’s responses according to driving assistance requirements.

2.5.1. Categories of Data Used for Generative AI Training

During the fine-tuning of Breeze:7B, it was necessary to prepare appropriate training data, as illustrated in Figure 10. This study adopted a structured JSON format, with each training sample composed of three fields, instruction, input, and output, following the format used in Alpaca [] and the LLaMA Factory framework.

Figure 10. Illustration of Fine-tuning Data Structure for Generative AI.

The data sources included manually written question–answer pairs relevant to the research domain, supplemented with additional content generated by ChatGPT. All responses were then manually reviewed and refined to improve semantic precision and logical consistency.

In the generative AI dataset, the general-purpose data consist of individual entries, each designed around a question relevant to the proposed system. These questions cover topics such as system advantages, hardware architecture, and implementation techniques. Each question is paired with a detailed response written in a tone consistent with technical documentation. Some responses include references to specific hardware (e.g., NVIDIA Jetson Orin), AI modules (e.g., YOLOv11 with SEAttention), and functional workflows (e.g., OBD-II data retrieval, task classification procedures). This structure helps the model develop an understanding of the overall system design and technical context during fine-tuning.

The domain-specific dataset focuses on scenarios where the system must determine whether to issue driving alerts. For instance, if a red light is detected, the current speed exceeds the posted limit, or nearby objects are too close to the vehicle (e.g., red: 2 car: 4 distance: 1 m, indicating 2 red lights, 4 vehicles, and a minimum distance of 1 m), the model is expected to generate a prompt suggesting that the driver apply the brakes. In contrast, when no critical conditions are present (e.g., green: 1 car: 2 distance: 8 m), the system remains in a normal state without issuing alerts. These data samples serve as the fine-tuning foundation for the generative language model, helping it learn to generate responses that are logically sound, contextually appropriate, and terminologically accurate for specific driving assistance tasks.
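A minimal sketch of how such domain-specific samples could be assembled in the instruction/input/output structure is given below. The status strings mirror the examples in the text. The instruction wording and the helper names `status_string` and `make_sample` are hypothetical, and the two response strings are placeholders rather than entries from the actual dataset.

```python
import json


def status_string(counts, distance_m):
    """Render detection counts and the minimum distance as a sensor status line."""
    parts = [f"{label}: {n}" for label, n in counts.items()]
    parts.append(f"distance: {distance_m} m")
    return " ".join(parts)


def make_sample(instruction, status, response):
    """Build one Alpaca-style record with instruction, input, and output fields."""
    return {"instruction": instruction, "input": status, "output": response}


# Hypothetical instruction text; the real dataset wording is not given here.
INSTRUCTION = "Decide whether the driver should be alerted."

samples = [
    make_sample(INSTRUCTION, status_string({"red": 2, "car": 4}, 1),
                "Please brake immediately."),
    make_sample(INSTRUCTION, status_string({"green": 1, "car": 2}, 8), "safe"),
]
print(json.dumps(samples, ensure_ascii=False, indent=2))
```

Keeping the status line in a fixed key-value order makes the training inputs match what the detection pipeline emits at run time.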

2.5.2. Evaluation Metrics for Generative AI

In this study, the driving assistance prompts generated by the model are compared with reference answers written according to traffic regulations. ROUGE and BERTScore are adopted as the evaluation metrics.

ROUGE measures the similarity in vocabulary and syntactic structure between the generated text and the reference answer based on n-gram overlap. It includes three sub-metrics:

ROUGE-1: Evaluates the overlap of unigrams (single characters or words) to assess whether the generated content contains key terms.

ROUGE-2: Evaluates the overlap of bigrams (two consecutive words) to assess the coherence of the generated text.

ROUGE-L: Calculates the overlap based on the Longest Common Subsequence (LCS) to assess the similarity in sentence structure and word order.
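The three ROUGE sub-metrics can be sketched in a few lines. This is a simplified illustration that assumes whitespace tokenization and reports the F-measure; Chinese text would need character-level or segmenter-based tokens, and the example sentences are hypothetical.

```python
from collections import Counter


def f_measure(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/candidate) and recall (overlap/reference)."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F-measure based on clipped n-gram overlap."""
    def grams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum((cand & ref).values())
    return f_measure(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F-measure based on the longest common subsequence."""
    cand, ref = candidate.split(), reference.split()
    return f_measure(lcs_length(cand, ref), len(cand), len(ref))


generated = "please brake gently now"
expected = "please brake immediately now"
print(rouge_n(generated, expected, 1), rouge_n(generated, expected, 2),
      rouge_l(generated, expected))
```

In this example three of four unigrams match (ROUGE-1 = 0.75), only one of three bigrams matches (ROUGE-2 ≈ 0.33), and the longest common subsequence "please brake now" gives ROUGE-L = 0.75.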

BERTScore evaluates the semantic similarity between the generated text and the reference answer by matching each token to its most similar counterpart in the other text, using the cosine similarity of their contextual token embeddings.
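The matching step behind BERTScore can be illustrated with toy vectors. This is a minimal sketch that assumes greedy matching without IDF weighting or baseline rescaling; the two-dimensional vectors are hypothetical stand-ins for the contextual embeddings a BERT model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def greedy_match_score(cand_vecs, ref_vecs):
    """Return (precision, recall, F1) from greedy token-level matching."""
    # Precision: each candidate token is matched to its best reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its best candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings: two candidate tokens and two reference tokens.
candidate_vectors = [[1.0, 0.0], [0.0, 1.0]]
reference_vectors = [[1.0, 0.0], [1.0, 1.0]]
p, r, f1 = greedy_match_score(candidate_vectors, reference_vectors)
print(round(p, 4), round(r, 4), round(f1, 4))
```

Unlike ROUGE, a token can score well here without matching exactly, which is why the two metrics are reported together.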

2.5.3. Hyperparameter Tuning for the Generative AI Model

During the fine-tuning process of Breeze:7B, a series of hyperparameter configurations were applied to enhance the model’s generalization capability and convergence efficiency for task-specific objectives. As shown in Table 3 and Table 4, the training process in this study was conducted with the following settings:

Table 3. Key Fine-Tuning Hyperparameters.

Table 4. Simplified Hyperparameter Tuning and Module Strategy.

1. Dataset Configuration

The maximum sequence length per sample (cutoff_len) was set to 2048 tokens, enabling support for extended contextual understanding in long-form input.

2. Training Type and Resource Allocation

Mixed-precision training was applied using bf16 (Brain Floating Point 16) to improve computational efficiency and reduce memory usage.

Each training step used a batch size of four, with eight gradient accumulation steps to simulate a larger effective batch size.

Dataset offloading was disabled, meaning data were processed directly in system memory (RAM).

3. Learning Rate and Optimizer Settings

The initial learning rate was set to 1 × 10⁻⁵, optimized using the adamw_torch optimizer.

A cosine decay learning rate scheduler was adopted to ensure smooth convergence during training.

4. Module Selection and Freezing Strategy

The majority of the model’s architecture was frozen, with only two layers kept trainable (freeze_trainable_layers: 2).

The all-module scope was specified, indicating global application of the freezing strategy.

This setup helps preserve the pre-trained language knowledge while adapting select layers for downstream tasks.

5. Parameter-Efficient Fine-Tuning with LoRA and GaLore

LoRA (Low-Rank Adaptation) was configured with rank = 8 and alpha = 16, with no dropout applied.

GaLore (Gradient Low-Rank Projection) was enabled with rank = 16, reducing memory and computational overhead while preserving training quality.

6. Custom Training Engine and Scheduling Framework

The training used custom optimization strategies, including APOLLO and BAdam, both of which dynamically adjusted training behaviors through parameters such as update_interval and scale.

These strategies allow for layer-wise adaptive scheduling, enhancing stability and fine-tuning performance on task-specific objectives.

Additionally, a special configuration was applied with badam_mode set to layer and badam_switch_mode set to ascending. This setting enhances adaptive learning dynamics in the higher layers of the model, allowing them to adjust more actively during fine-tuning and better capture task-specific representations.

In summary, the hyperparameter tuning strategy for Breeze:7B was designed to strike a balance between task-specific stability, computational efficiency, and preservation of pre-trained knowledge. Achieving this balance is essential for obtaining optimal results in specialized domain tasks. This approach is particularly well-suited for low-resource environments where fine-tuning must be both lightweight and effective.
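Some of the arithmetic implied by these settings can be made explicit. The sketch below uses the stated batch size, accumulation steps, learning rate, and LoRA rank and alpha. The warmup-free schedule, the 1000-step horizon, the single-device default, and the 4096 × 4096 weight matrix are assumptions chosen for illustration.

```python
import math

PER_DEVICE_BATCH = 4       # batch size per training step (from the text)
GRAD_ACCUM_STEPS = 8       # gradient accumulation steps (from the text)
BASE_LR = 1e-5             # initial learning rate (from the text)
LORA_RANK, LORA_ALPHA = 8, 16


def effective_batch_size(per_device, accum_steps, n_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device * accum_steps * n_devices


def cosine_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Cosine decay from base_lr to min_lr without warmup (assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


def lora_extra_params(d_in, d_out, rank=LORA_RANK):
    """Trainable parameters added by one LoRA pair: A (rank x d_in), B (d_out x rank)."""
    return rank * (d_in + d_out)


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))   # 32
print(cosine_lr(0, 1000), cosine_lr(500, 1000), cosine_lr(1000, 1000))
print(LORA_ALPHA / LORA_RANK)                                     # 2.0
print(lora_extra_params(4096, 4096), 4096 * 4096)                 # 65536 16777216
```

Under these assumptions, one optimizer update sees 32 samples per device, the LoRA update is scaled by alpha/rank = 2, and a rank-8 adapter on a hypothetical 4096 × 4096 projection trains about 0.4% of that matrix's parameters.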

2.5.4. Conversion of Breeze:7B to GPT-Generated Unified Format (GGUF)

After completing the fine-tuning of Breeze:7B, the model weights were converted into the GPT-Generated Unified Format (GGUF) to enable efficient inference on resource-constrained edge devices such as the Jetson AGX Orin Developer Kit. In this study, we used the open-source inference engine llama.cpp as the tool for model conversion and deployment. This engine supports quantization, memory optimization, and cross-platform execution, making it well-suited for deploying large language models in limited-resource environments.

3. Results

3.1. Testing of the Generative AI Model

The first part of the generative AI model evaluation focuses on verifying the model’s inference capability and semantic response quality. Figure 11 illustrates the processing workflow after an object is detected and converted into textual input. Table 5 presents the inference test results of the fine-tuned Breeze:7B model, conducted using a Python 3.10-based script.

Figure 11. Image Input Processing Workflow for the Generative AI Model.

Table 5. Generative AI Question–Answering Examples for Fine-Tuning.

The model responded:

“This system uses the Jetson AGX Orin Developer Kit as the edge computing device, along with an OBD-II adapter module and an in-vehicle camera for multimodal data collection.”

In another test, the input prompt was:

“What can this system do?”

The model responded:

“It provides real-time road condition monitoring and driving assistance.”

These results demonstrate that the Breeze: 7B model, after fine-tuning with a domain-specific dataset, is capable of understanding the contextual semantics of in-vehicle systems and generating natural language responses that are specific, accurate, and logically coherent. The model can effectively respond to hardware-related and functional queries, showing strong potential for deployment in edge environments.

It is important to note that the dataset entry “Car: 2 red: 1 distance: 8 m” in Table 5 is not presented in the form of a question. This is because such data do not represent natural language queries; rather, they are sensor status inputs received by an in-vehicle system in real driving scenarios. The design of this input aims to test whether the model can correctly interpret numerical representations of the environment (e.g., detecting two cars and one red light at a distance of 8 m simultaneously) and generate appropriate recommendations (such as “Please brake immediately”). Therefore, Table 5 includes both “question–answer type inputs” and “sensor status inputs” in order to comprehensively evaluate the model’s reasoning and response capabilities under different input conditions.

3.2. YOLO Model Results

Figure 12 shows the normalized confusion matrix results of the proposed model on the test dataset. It can be observed that most categories are accurately recognized by the model, with values along the diagonal close to 1, indicating strong classification performance in these classes. For example, traffic sign categories such as speed_limit_30 to speed_limit_80 all achieved 100% prediction accuracy, demonstrating the model’s high recognition capability for numeric speed limit signs.

Figure 12. Testing result: Confusion Matrix.

In the car category, 77% of the samples were misclassified as background. This is because, during dataset annotation, vehicles in the background were often left unmarked when they were either overlapping with other objects or too small, leading to many unannotated vehicles in the background class. However, subsequent real-world testing showed that this did not affect the detection of vehicles in practice.

A similar phenomenon has been reported in other YOLO-based studies []. For instance, in Rui Qian’s vehicle detection research (YOLOv10) [], although the car category also showed a notable confusion ratio, the F1-score still reached 0.76, indicating stable performance in real-world vehicle detection. Likewise, the application of YOLOv4 to aerial imagery [] showed that while some small vehicles were misclassified, major vehicle types (e.g., sedans) still maintained an accuracy of over 93%, suggesting that inaccuracies in the confusion matrix do not necessarily affect detection performance in real-world scenarios.

3.3. Hardware Deployment Diagram

Figure 13 shows the installation of the hardware modules on the vehicle, which include the automotive-grade camera, OBD-II module, Jetson AGX Orin Developer Kit, and the MAX9296 development board. The experimental vehicle used in this study is a Mitsubishi Lancer Fortis/iO.

Figure 13. Physical Setup of Hardware Modules in the Vehicle.

The OBD-II port is located at the lower left of the driver’s instrument panel. As shown in the figure, once the OBD-II module is connected to the vehicle, the indicator lights operate normally, indicating that the module is successfully receiving data. The bottom-right section of the figure displays the system’s operating interface.

Figure 14 presents the results of on-road testing, where Figure 14A illustrates a safe scenario and Figure 14B illustrates a warning scenario. The system demonstrated good stability during the tests, as seen in the GUI outputs in Figure 14, where the object bounding boxes are accurately drawn.

Figure 14. (A) Green Light—Road Test Scenario. (B) Red Light—Road Test Scenario.

In Figure 14A, when a green light is detected and no nearby vehicles are within an unsafe distance, the generative AI responds with “safe”. In Figure 14B, when a red light is detected and another vehicle is too close, the system issues a warning such as “Warning: Be aware of your surroundings and stop immediately.” These results indicate that the system is capable of providing reliable driving suggestions and early warnings during real-world road tests.

4. Discussion

This study integrates three core components: YOLOv11, OBD-II, and a generative AI model. The YOLOv11 architecture was customized to improve the accuracy of road object detection, enabling the system to better interpret environmental information. The OBD-II module captures real-time vehicle speed data, allowing the system to comprehensively gather both external (road) and internal (vehicle) information.

The more complete the incoming data, the more effectively the generative AI model can consolidate and reason over this information to generate responses. In this study, the generative AI serves as the “brains” of the system, offering several advantages over traditional rule-based systems. Not only can it integrate multimodal inputs more intelligently, but it also produces natural and flexible alert messages rather than rigid warnings. Furthermore, it supports offline interaction, enabling real-time question answering without requiring cloud access.

As shown in Figure 15A, the final training results were evaluated using common performance metrics, including Precision, mean Average Precision (mAP), Recall, and F1-score. The figure demonstrates that both the mAP and Precision curves not only consistently increased but also remained stable without significant fluctuations. Specifically, the model achieved a Precision of 0.91 and an mAP50–95 of 0.668. The smooth trend lines indicate that the model converged in a stable manner throughout training.

Figure 15. (A). Model Training Curve. (B). Training Curve of the Original YOLOv11 Model.

Compared to the original YOLOv11 model [], the experimental results show that the addition of SEAttention improves both accuracy and mAP50–95, as well as the overall convergence behavior. As shown in Figure 15B, when trained using the same hyperparameters and dataset, the baseline YOLOv11 exhibited greater oscillation in its training loss curve.

With SEAttention integrated into the YOLOv11 architecture, the model not only achieved higher precision but also demonstrated smoother convergence during training, reducing fluctuations in the learning process. This indicates that the enhanced model is more stable and accurate under the same training conditions.

Although the training performance of YOLOv11 was generally satisfactory, misclassification between yellow lights and red lights was occasionally observed. This issue is likely caused by variations in lighting conditions or viewing angles. To address this, the dataset can be augmented with additional images of yellow lights captured under diverse scenarios.

Furthermore, an approximately 80% similarity between the “background” and “car” classes was noted. This can be attributed to incomplete annotations during data labeling—certain vehicles were occluded or too small and thus were not annotated, resulting in visual overlap between these two categories. Improving the labeling strategy is expected to reduce this confusion and enhance classification accuracy. Detailed values are shown in Table 6.

Table 6. Accuracy and other indicators.

In the case of the generative AI component, there are corresponding limitations. Due to the relatively high computational resource requirements, the model is unable to perform inference on every single frame in real time. Nevertheless, the system is capable of issuing warnings when critical events occur, such as red light detection or the presence of imminent hazards. These alerts provide drivers with sufficient time to respond and thus fulfill the system’s role as a driving assistance mechanism.

In future research, more comprehensive traffic-related elements, such as pedestrian movement and vehicle behavior, can be further integrated into the proposed visual recognition system. Expanding the dataset to include a wider variety of road conditions and scenarios will enhance the system’s robustness and generalization. This approach could also be extended to autonomous vehicle (AV) applications, aligning with the principles of eco-driving to support energy-efficient and intelligent decision-making in future unmanned vehicle systems.

Additionally, to improve the reliability of Breeze: 7B in critical tasks, the integration of Retrieval-Augmented Generation (RAG) [] is recommended. RAG can enhance the model’s response accuracy by incorporating external knowledge retrieval during inference, thereby reducing the likelihood of factual errors [] and improving performance in mission-critical environments.

Table 7 presents the comparison between the modified YOLOv11-SE and other models. In the table, results highlighted in red indicate the best performance, while those in blue indicate the worst. Aside from a precision slightly lower than that of YOLOv7 (90.1% versus 90.4%), YOLOv11-SE is not outperformed by any other version on the accuracy metrics, and it achieves the highest mAP@50 (92%) and mAP@50–95 (66.6%). Its frame rate (81 FPS), however, is lower than that of every compared model except YOLOv8x (49 FPS).

Table 7. Comparison with other versions.

As shown in Table 8, integrating the SEAttention module clearly improves precision, mAP@50, and mAP@50–95, at the cost of a larger parameter count (3.4 M to 5.05 M) and a lower frame rate (189 to 81 FPS).

Table 8. Ablation Study.

In the evaluation results for BERTScore and ROUGE, as shown in Table 9, Breeze7B (fine-tuned) outperforms other non-fine-tuned models. For the ROUGE-2 metric, the non-fine-tuned models, lacking an understanding of YOLO’s label structure, tend to produce outputs with broken phrases and disordered word sequences. This leads to ROUGE-2 scores that are nearly zero, as the models fail to generate continuous word pairs corresponding to the reference answers.

Table 9. BERTScore and ROUGE Evaluation Results for Each Model.

In contrast, Breeze7B (fine-tuned) achieves a ROUGE-2 Precision of 0.3126, Recall of 0.2528, and F1-score of 0.2756, demonstrating a significantly better ability to interpret YOLO output labels and generate coherent Chinese prompt sentences than the base models, whose ROUGE-2 scores are near zero. The results are presented in Table 9.
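The effect of word order on ROUGE-2 can be illustrated with a minimal sketch. It assumes whitespace-tokenized English text and a single reference; the tokenization used for the Chinese outputs in this study is not specified, and the function names and example sentences here are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate_tokens, reference_tokens):
    """Compute ROUGE-2 precision, recall, and F1 from clipped bigram overlap."""
    cand = bigrams(candidate_tokens)
    ref = bigrams(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped count of shared bigrams
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical reference and two candidate outputs.
reference = ["please", "brake", "immediately"]
ordered = ["please", "brake", "immediately"]
shuffled = ["immediately", "please", "now", "brake"]

print(rouge2(ordered, reference))   # identical word order: every bigram matches
print(rouge2(shuffled, reference))  # same words, broken order: no bigram matches
```

The shuffled candidate contains every reference word yet shares no adjacent word pair with the reference, which is the situation described for the non-fine-tuned models.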

All latency values in Table 10 are based on actual measurements. Using Python, we called the time.time() function immediately before and after the execution of each module and took the difference as the latency of that stage. After collecting 100 latency measurements for each stage, we calculated the average latency per stage and the resulting total. Table 10 presents the quantified latency values for each step.
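The measurement procedure can be sketched as follows. This is a minimal illustration rather than the code used in the study: `average_latency_ms` and `dummy_stage` are hypothetical names, and the stand-in stage only sleeps for about one millisecond.

```python
import time


def average_latency_ms(stage, runs=100):
    """Call `stage` `runs` times and return the mean wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.time()
        stage()
        samples.append((time.time() - start) * 1000.0)
    return sum(samples) / len(samples)


def dummy_stage():
    """Hypothetical stand-in for one pipeline stage."""
    time.sleep(0.001)


print(f"{average_latency_ms(dummy_stage):.2f} ms")
```

For short stages, `time.perf_counter()` is a higher-resolution alternative to `time.time()` in the Python standard library.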

Table 10. Delay quantization experiment.

As shown in Table 10, the generative AI module also has inherent limitations. Due to its relatively high computational resource requirements, it is not feasible to perform inference for every single frame. However, in this study, the latency introduced by the generative AI does not compromise driving safety. When a red light or hazardous situation is detected, the system bypasses the generative AI and issues a direct alert instead. This approach ensures that the driver has sufficient time to respond, thereby fulfilling the goal of providing effective driver assistance.
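The bypass logic can be pictured with a short sketch. The set of critical classes, the function names, and the 50 km/h example speed are assumptions for illustration; only the 2117 ms total latency is taken from Table 10.

```python
# Assumed set of detector classes that trigger the direct alert path.
CRITICAL_LABELS = {"red_light", "person"}

# Total latency of the YOLO -> Breeze -> pyttsx3 path reported in Table 10.
GENAI_PATH_LATENCY_MS = 2117


def choose_path(detected_labels):
    """Return 'direct_alert' for critical detections, else 'generative_ai'."""
    if CRITICAL_LABELS & set(detected_labels):
        return "direct_alert"
    return "generative_ai"


def distance_travelled_m(speed_kmh, latency_ms):
    """Distance covered at constant speed while waiting `latency_ms`."""
    return speed_kmh / 3.6 * latency_ms / 1000.0


print(choose_path(["car", "red_light"]))
print(choose_path(["car", "green_light"]))
print(round(distance_travelled_m(50, GENAI_PATH_LATENCY_MS), 1))
```

At an assumed 50 km/h, the vehicle covers roughly 29 m during the full generative AI path, which illustrates why critical events are better served by a direct alert.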

5. Conclusions

This study implemented an energy-efficient driving assistance system based on generative AI. The system utilizes the YOLOv11 neural network to recognize common object types encountered on the road. YOLOv11 demonstrated favorable detection speed and accuracy, particularly in rapidly changing and complex driving environments. By integrating real-time vehicle speed data from the OBD-II interface, the system can provide contextual driving analysis. The collected information is then passed to Breeze: 7B, a generative language model, which successfully interprets the input and delivers appropriate driving suggestions or warnings.

The experimental site of this study was located in the urban area of Taichung City, where dedicated bicycle lanes have been planned. As a result, non-motorized vehicles (e.g., bicycles) typically do not interact with the main traffic flow, and thus bicycles were not included as one of the detection targets in this study. In future work, we will consider expanding the data collection scope to include scenarios with non-motorized vehicles in order to enhance the model’s generalization capability in more diverse road environments.

Encouraging drivers to brake earlier not only helps reduce fuel consumption but also decreases brake wear; however, it may extend travel time. This study did not incorporate these factors into our system for quantification. Nevertheless, in our previous work, we measured fuel consumption under different driving behaviors. In future research, we plan to integrate those findings with the proposed system to further calculate the fuel savings attributable to its interventions while also assessing its impact on greenhouse gas emissions. This approach will enable us to more comprehensively validate the system’s energy-saving and environmental benefits.

In future application scenarios, the proposed system is particularly well-suited for deployment on fully or semi-autonomous platforms. For example, autonomous taxis, shuttle buses, or other driverless vehicles can directly benefit from the system’s ability to process traffic scene data (through YOLO-based object detection and OBDII information) combined with a fine-tuned language model to generate context-aware driving commands. In such applications, the system can not only control driving behaviors but also function as an interactive assistant, responding to passengers’ questions or concerns regarding the driving route.

Author Contributions

Methodology, Y.-X.L.; Software, Y.-X.L. and K.-P.H.; Validation, Y.-X.L.; Writing—original draft, Y.-X.L.; Writing—review & editing, M.-H.Y. and C.-C.C. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the support provided for this study by the National Science and Technology Council, Taiwan, under grant numbers NSTC 113-2622-E-167-011, NSTC 113-2221-E-167-006, NSTC 114-2221-E-167-005-MY3, NSTC 114-2637-E-167-002, and NSTC 114-2221-E-167-039.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and research project restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhao, J.; Liang, B.; Chen, Q. The key technology toward the self-driving car. Int. J. Intell. Unmanned Syst. 2018, 6, 2–20.
Puchalski, A.; Komorska, I. Driving style analysis and driver classification using OBD data of a hybrid electric vehicle. Transp. Probl. 2020, 15, 83–94.
Chen, C.C.; Tien, S.L.; Lin, Y.T.; Teng, C.C.; Yen, M.H. Truck Driving Assistance System. In Proceedings of the 22nd IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing SNPD 2021-Fall, Taichung, Taiwan, 24–26 November 2021; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2022; pp. 122–125.
Yen, M.H.; Tian, S.L.; Lin, Y.T.; Yang, C.W.; Chen, C.C. Combining a universal OBD-II module with deep learning to develop an eco-driving analysis system. Appl. Sci. 2021, 11, 4481.
Stappen, L.; Dillmann, J.; Striegel, S.; Vögel, H.J.; Flores-Herr, N.; Schuller, B.W. Integrating Generative Artificial Intelligence in Intelligent Vehicle Systems. In Proceedings of the IEEE Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2024; pp. 5790–5797.
Ranchev, V.; Jordanov, R.; Miletiev, R. Integration of Generative AI for Intelligent Diagnostic of Vehicles. In Proceedings of the National Conference with International Participation (TELECOM), Sofia, Bulgaria, 21–22 November 2024; Institute of Electrical and Electronics Engineers: New York, NY, USA, 2024.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Los Alamitos, CA, USA, 2016; pp. 779–788.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
Zhang, X.; Song, Y.; Song, Y.; Yang, D.; Ye, Y.; Zhou, J.; Zhang, L. AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters. arXiv 2023, arXiv:2311.11587.
Lu, L.; He, D.; Liu, C.; Deng, Z. MASF-YOLO: An Improved YOLOv11 Network for Small Object Detection on Drone View. arXiv 2025, arXiv:2504.18136.
Alif, M.A.R. YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems. arXiv 2024, arXiv:2410.22898.
Modak, S.; Stein, A. Enhancing weed detection performance by means of GenAI-based image augmentation. arXiv 2024, arXiv:2411.18513.
Intelligent Vehicle Diagnostic System for Service Center Using OBD-II and IoT. In Proceedings of the International Conference of Science and Technology 2021, Oluvil, Sri Lanka, 27 July 2021; Available online: https://www.researchgate.net/publication/355184771_Intelligent_Vehicle_Diagnostic_System_for_Service_Center_using_OBD-II_and_IoT (accessed on 23 August 2025).
Hsu, C.-J.; Liu, C.-L.; Liao, F.-T.; Hsu, P.-C.; Chen, Y.-C.; Shiu, D.-S. Breeze-7B Technical Report. arXiv 2024, arXiv:2403.02712.
Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
Rasheed, A.F.; Zarkoosh, M. YOLOv11 Optimization for Efficient Resource Utilization. J. Supercomput. 2025, 81, 1085.
Say the Word: Voice Systems Can Reduce Some Types of Distraction—TRID. Available online: https://trid.trb.org/View/1347066 (accessed on 23 August 2025).
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2016; pp. 21–37.
Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; Available online: https://ieeexplore.ieee.org/document/8578843 (accessed on 23 August 2025).
Azurmendi, I.; Zulueta, E.; Lopez-Guede, J.M.; González, M. Simultaneous Object Detection and Distance Estimation for Indoor Autonomous Vehicles. Electronics 2023, 12, 4719.
Chen, L.; Zhao, Y.; Xie, Q.; Sheng, Q. Optimization of Armv9 architecture general large language model inference performance based on Llama.cpp. arXiv 2024, arXiv:2406.10816.
Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the ACL-IJCNLP 2021—59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2021; pp. 4582–4597.
Zheng, J.; Hong, H.; Wang, X.; Su, J.; Liang, Y.; Wu, S. Fine-tuning Large Language Models for Domain-specific Machine Translation. arXiv 2024, arXiv:2402.15061.
Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the 10th International Conference on Learning Representations (ICLR 2022), Virtual, 25–29 April 2022.
Taori, R.; Gulrajani, I.; Zhang, T.; Dubois, Y.; Li, X.; Guestrin, C.; Liang, P.; Hashimoto, T.B. Alpaca: A Strong, Replicable Instruction-Following Model. Available online: https://crfm.stanford.edu/2023/03/13/alpaca.html (accessed on 23 August 2025).
Geetha, A.S.; Alif, M.A.R.; Hussain, M.; Allen, P. Comparative Analysis of YOLOv8 and YOLOv10 in Vehicle Detection: Performance Metrics and Model Efficacy. Vehicles 2024, 6, 1364–1382.
Qian, R.; Ding, Y. An Efficient UAV Image Object Detection Algorithm Based on Global Attention and Multi-Scale Feature Fusion. Electronics 2024, 13, 3989.
Lin, T.H.; Su, C.W. Oriented Vehicle Detection in Aerial Images Based on YOLOv4. Sensors 2022, 22, 8394.
Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation: San Diego, CA, USA, 2020.
Jävergård, N.; Lyons, R.; Muntean, A.; Forsman, J. Preserving correlations: A statistical method for generating synthetic data. arXiv 2024, arXiv:2403.01471.

Figure 1. System Architecture.

Figure 2. System Flowchart.

Figure 3. YOLOv11 Training Workflow.

Figure 4. Illustration of the 11 Detection Categories Used in This Study.

Figure 5. Distribution of the number of annotations in each category.

Figure 6. The YOLOv11 Neural Network Architecture Used in This Study.

Figure 7. OBDII Flowchart.

Figure 8. Data transmitted from the OBD-II reading simulator. The highlighted blue field indicates the vehicle speed value (62 km/h) parsed from the data packet.

Figure 9. Generative AI Process Flowchart.

Figure 10. Illustration of Fine-tuning Data Structure for Generative AI.

Figure 11. Image Input Processing Workflow for the Generative AI Model.

Table 1. Number of Training Data Samples per Category for YOLOv11 Training.

Class ID	Class Name	Label Quantity
0	red_light	1005
1	green_light	1030
2	yellow_light	211
3	speed_limit_30	49
4	speed_limit_40	63
5	speed_limit_50	103
6	speed_limit_60	75
7	speed_limit_70	51
8	speed_limit_80	56
9	car	14,402
10	person	674

Table 2. Packet Format Structure Description.

Field	Size (Bytes)	Description
Header	1 byte	Fixed value 0x23, used to indicate the start of a packet.
Type + Length	1 byte	The upper 4 bits represent the Protocol Type; the lower 4 bits indicate the Packet Length.
PID	1 byte	Parameter Identifier, used to specify the type of vehicle data contained.
Data	N bytes	The data payload, with a length of (Packet Length − 1).
Checksum (CS)	1 byte	Used for validation, calculated as the sum of PID and Data bytes.
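
The packet layout in Table 2 can be parsed as in the following sketch. Two points are assumptions rather than statements from the table: the one-byte checksum is taken to be the low 8 bits of the sum, and the example packet (protocol type 1, PID 0x0D with one data byte) is hypothetical. PID 0x0D is the standard SAE J1979 vehicle speed identifier, but whether the module forwards it in this form is not confirmed here.

```python
HEADER = 0x23  # fixed start-of-packet value from Table 2


def parse_packet(packet: bytes) -> dict:
    """Parse one packet laid out as Header | Type+Length | PID | Data | Checksum."""
    if len(packet) < 4 or packet[0] != HEADER:
        raise ValueError("missing header or packet too short")
    protocol_type = packet[1] >> 4    # upper 4 bits
    packet_length = packet[1] & 0x0F  # lower 4 bits
    if packet_length < 1:
        raise ValueError("packet length must be at least 1")
    data_len = packet_length - 1      # data length per Table 2
    if len(packet) != 3 + data_len + 1:
        raise ValueError("length field does not match packet size")
    pid = packet[2]
    data = packet[3:3 + data_len]
    checksum = packet[3 + data_len]
    # Assumption: the one-byte checksum keeps only the low 8 bits of the sum.
    if (pid + sum(data)) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    return {"type": protocol_type, "pid": pid, "data": data}


# Hypothetical packet: type 1, length 2, PID 0x0D, one data byte 0x3E (62).
example = bytes([0x23, 0x12, 0x0D, 0x3E, 0x4B])
parsed = parse_packet(example)
print(parsed["pid"], parsed["data"][0])
```

Validating the header, the length field, and the checksum before reading the payload means a corrupted packet is rejected instead of being reported as a speed value.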

Table 3. Key Fine-Tuning Hyperparameters.

Parameter Name	Value	Description
cutoff_len	2048	Supports long-text context, suitable for handling complex tasks
compute_type	bf16	Utilizes bf16 mixed-precision training to save memory and improve efficiency
batch_size	4	Trains with small batch size; used in conjunction with gradient accumulation for stability
gradient_accumulation_steps	8	Simulates large-batch training to enhance model stability
learning_rate	1 × 10⁻⁵	Initial learning rate, influencing convergence speed
optimizer	adamw_torch	Uses the AdamW optimizer (Adam with decoupled weight decay) to support training stability
lr_scheduler_type	cosine	Applies cosine decay strategy to promote smooth convergence
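
How the values in Table 3 interact can be shown with a short calculation. The sketch assumes a single training device, no warmup, and decay to zero; the total step count is a hypothetical value, and `cosine_lr` is an illustrative helper rather than the scheduler implementation used in the study.

```python
import math

BATCH_SIZE = 4           # batch_size from Table 3
GRAD_ACCUM_STEPS = 8     # gradient_accumulation_steps from Table 3
BASE_LR = 1e-5           # learning_rate from Table 3

# Samples contributing to each optimizer update (single device assumed).
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS


def cosine_lr(step, total_steps, base_lr=BASE_LR):
    """Cosine decay from base_lr at step 0 to 0 at total_steps (no warmup)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


TOTAL_STEPS = 1000  # hypothetical number of optimizer steps
print(effective_batch)
for step in (0, 500, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under these assumptions each update is computed from 32 samples, and the learning rate falls to half its initial value at the midpoint of training.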

Table 4. Simplified Hyperparameter Tuning and Module Strategy.

Parameter Name	Value	Description
freeze_trainable_layers	2	Only two layers are trainable; others are frozen to preserve pre-trained knowledge
lora_rank	8	LoRA rank, controlling the dimensionality of low-rank adaptation
lora_alpha	16	Scaling factor to regulate the magnitude of LoRA adjustments
galore_rank	16	Applies GaLore to reduce memory and computational overhead
apollo_scale	32	Scaling factor for dynamic adjustment in Apollo-based parameter tuning

Table 5. Generative AI Question–Answering Examples for Fine-Tuning.

Dataset Category	Question	Responses
General Datasets	What hardware does this system use?	This system utilizes the NVIDIA Jetson AGX Orin as the edge computing device, along with an OBDII adapter and automotive-grade cameras for data collection.
General Datasets	What can this system do?	It provides real-time traffic monitoring and driving assistance.
Specialized Datasets	Car: 2 red: 1 distance: 8 m	Please brake immediately.
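
The label-to-prompt conversion step can be sketched as follows, using the specialized example in Table 5 as the target format. The mapping from detector class names to prompt keywords and the function name are hypothetical, and the distance is passed in as an argument because its source is not described in this table.

```python
from collections import Counter

# Hypothetical mapping from detector class names to short prompt keywords.
PROMPT_NAMES = {"car": "Car", "red_light": "red"}


def labels_to_prompt(labels, distance_m):
    """Count detected labels and format them as a compact text prompt."""
    counts = Counter(labels)  # keeps first-seen order of the labels
    parts = [f"{PROMPT_NAMES.get(name, name)}: {n}" for name, n in counts.items()]
    parts.append(f"distance: {distance_m} m")
    return " ".join(parts)


print(labels_to_prompt(["car", "car", "red_light"], 8))
```

With two car detections, one red light, and a distance of 8 m, the sketch reproduces the question string shown in the last row of Table 5.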

Table 6. Accuracy and other indicators.

Metric	Value
precision	90.1%
recall	90.2%
mAP50–95	66.6%
F1 score	90.14%
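
The F1 score in Table 6 follows from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short check below uses the rounded table values as inputs, so the last digit may differ from the reported figure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall from Table 6.
print(f"{100 * f1_score(0.901, 0.902):.2f}%")
```

The result is about 90.15%, which agrees with the 90.14% listed in Table 6 up to rounding or truncation of the inputs.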

Table 7. Comparison with other versions.

Model	Precision (%)	mAP@50 (%)	mAP@50–95 (%)	Params (M)	FPS
YOLOv7	90.4	84.7	52.9	36.5	161
YOLOv7-tiny	86.4	80.1	49.2	6	384
YOLOv5l	90.1	83.7	52.7	46.2	106
YOLOv8n	83.2	74.1	48.1	3	333
YOLOv8x	88.8	83.2	53.9	68.1	49
YOLOv11	82.8	80.2	56.5	3.4	189
YOLOv11-SE	90.1	92	66.6	5.05	81

Table 8. Ablation Study.

Model	Precision (%)	mAP@50 (%)	mAP@50–95 (%)	Params (M)	FPS
YOLOv11	82.8	80.2	56.5	3.4	189
YOLOv11-SE	90.1	92	66.6	5.05	81

Table 9. BERTScore and ROUGE Evaluation Results for Each Model.

Model	Metric	Precision	Recall	F1
Breeze7B (fine-tuned)	BertScore	0.8375	0.7732	0.8041
	Rouge1	0.5341	0.4381	0.4746
	Rouge2	0.3126	0.2528	0.2756
	RougeL	0.4904	0.3914	0.4273
Breeze7b (Base)	BertScore	0.7458	0.6631	0.7020
	Rouge1	0.0350	0.0459	0.0371
	Rouge2	0.0002	0.0003	0.0002
	RougeL	0.0218	0.0380	0.0063
llama_3_8b (Base)	BertScore	0.62164	0.5909	0.6561
	Rouge1	0.0118	0.0197	0.0371
	Rouge2	0	0	0
	RougeL	0.0053	0.0132	0.0075
Gemma7b (Base)	BertScore	0.4672	0.5131	0.4891
	Rouge1	0.0048	0.0099	0.0063
	Rouge2	0	0	0
	RougeL	0.0034	0.0085	0.0048

Table 10. Delay quantization experiment.

Module	Description	Latency (ms)
YOLOv11-SE	Detects 640 × 640 images, FPS 81	~12 ms
Label-to-prompt conversion	Converts detected object class labels into prompts	~5 ms
Breeze: 7B (unquantized, 40 tokens)	Generates AI response	~2000 ms
Pyttsx3	Text-to-speech output	~100 ms
Total latency (per instance)	YOLO → Breeze → pyttsx3	~2117 ms

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
