Proceeding Paper

Evaluation of Impact of Convolutional Neural Network-Based Feature Extractors on Deep Reinforcement Learning for Autonomous Driving †

Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan
* Author to whom correspondence should be addressed.
Presented at the 8th International Conference on Knowledge Innovation and Invention 2025 (ICKII 2025), Fukuoka, Japan, 22–24 August 2025.
Eng. Proc. 2025, 120(1), 27; https://doi.org/10.3390/engproc2025120027
Published: 2 February 2026
(This article belongs to the Proceedings of the 8th International Conference on Knowledge Innovation and Invention)

Abstract

Reinforcement learning (RL) enables learning optimal decision-making strategies by maximizing cumulative rewards. Deep reinforcement learning (DRL) enhances this process by integrating deep neural networks (DNNs) for effective feature extraction from high-dimensional input data. Unlike prior studies that focus on algorithm design, we investigated the impact of different feature extractors, i.e., DNN architectures, on DRL performance. We propose an enhanced feature extraction model, built on the proximal policy optimization (PPO) framework, to improve control effectiveness in autonomous driving scenarios. In a comparative analysis against well-known convolutional neural networks (CNNs), namely MobileNet, SqueezeNet, and ResNet, the experimental results demonstrate that our model achieves higher cumulative rewards and better control stability, providing valuable insights for DRL applications in autonomous systems.

1. Introduction

Reinforcement learning (RL) is a machine learning approach in which an agent interacts with an environment and learns to make optimal decisions through trial and error by receiving rewards or penalties. The agent’s objective is to learn a policy that maps environmental states to actions so as to maximize long-term cumulative rewards. This process involves balancing exploration (trying new actions to discover their effects) and exploitation (selecting known actions that yield high rewards) [1,2]. Deep reinforcement learning (DRL) extends traditional RL by incorporating deep neural networks (DNNs) to manage complex, high-dimensional state spaces effectively. In DRL, agents utilize deep learning techniques to extract features from raw sensory inputs, enabling them to learn optimal policies directly from the environment. This integration allows DRL to excel in complex tasks such as video game playing, robotic control, and autonomous driving [1,2,3]. The approach relies on DNNs for automatic feature extraction, which addresses a key limitation of conventional RL methods: their inability to efficiently process large-scale or unstructured data representations.
Previous research has focused on improving DRL algorithms [3,4,5]. For example, the deep Q-network (DQN) combines value-based Q-learning with DNNs to enable RL to handle high-dimensional state spaces. However, DQN suffers from instability and sample inefficiency, which motivated enhancements such as double DQN, dueling DQN, and prioritized experience replay. On the policy-based side, policy gradient methods provide a direct way to optimize policies, though they often struggle with high variance. This led to actor-critic architectures, such as advantage actor-critic (A2C) and asynchronous advantage actor-critic (A3C), which balance bias and variance more effectively. More recently, proximal policy optimization (PPO) further improved stability by constraining each policy update to remain close to the current policy. On the other hand, alternative approaches have been adopted to enhance performance. For instance, a training strategy that concentrates exclusively on critical events, i.e., challenging segments within the driving environment [3], has been used to improve training efficiency and accelerate learning.
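For concreteness, the stability constraint mentioned above can be stated as PPO's clipped surrogate objective (following the original PPO formulation), where r_t is the probability ratio between the new and old policies, Â_t is the advantage estimate, and ε is the clip range:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[
      \min\bigl(r_t(\theta)\,\hat{A}_t,\;
                \operatorname{clip}\bigl(r_t(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_t\bigr)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Clipping the ratio removes the incentive to move the policy far from its previous version in a single update, which is the source of PPO's improved stability.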
This idea inspires us to explore whether better training and inference performance can be achieved by modifying only the feature extractor, without altering the overall learning framework. Moreover, as with conventional convolutional neural network (CNN) architectures in other applications, deeper is not always better. While increasing depth can enhance feature extraction and improve performance on complex visual tasks, it also introduces potential drawbacks, e.g., overfitting, higher computational overhead, and training instability. In the context of DRL, where real-time decision-making is critical, moderately deep CNNs often offer a more practical balance between representational power and computational efficiency.
In this study, we utilize Highway-Env [6,7] as the simulation environment and Stable-Baselines3 (SB3) [8,9] as the RL framework. The former provides a set of realistic and configurable driving scenarios, such as highways, intersections, and merging lanes, which serve as the training ground for autonomous agents. The latter offers reliable implementations of state-of-the-art RL algorithms, including PPO, DQN, and SAC, enabling efficient and reproducible training. By integrating these two tools, we can design and evaluate our feature extractor in dynamic and diverse traffic environments while maintaining a modular experimental setup.
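The integration can be sketched as follows. Package names and the `GrayscaleObservation` configuration keys follow the public highway-env and Stable-Baselines3 documentation; the training call is gated behind a flag since it requires those packages to be installed, and the step count here is illustrative (the experiments in Section 4 use 5,000,000 steps):

```python
# Sketch: wiring Highway-Env into Stable-Baselines3's PPO.
# The configuration below mirrors the 150 x 150 grayscale observation
# setup described in Section 4; "weights" are the standard RGB -> luminance
# coefficients used by highway-env's GrayscaleObservation.
ENV_CONFIG = {
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (150, 150),
        "stack_size": 4,                      # frames stacked along the channel axis
        "weights": [0.2989, 0.5870, 0.1140],  # RGB -> grayscale conversion
    },
}

RUN_TRAINING = False  # set True when highway-env and stable-baselines3 are installed

if RUN_TRAINING:
    import gymnasium as gym
    import highway_env  # noqa: F401  (registers the highway-v0 environments)
    from stable_baselines3 import PPO

    env = gym.make("highway-v0", config=ENV_CONFIG)
    # "CnnPolicy" uses SB3's default CNN extractor; a custom extractor is
    # swapped in via policy_kwargs=dict(features_extractor_class=...).
    model = PPO("CnnPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)
```

Because SB3 exposes the feature extractor as a pluggable component of the policy, different CNN backbones can be compared while the PPO control algorithm itself stays fixed, which is exactly the modular setup this study relies on.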
The remainder of this article is structured as follows: Section 2 introduces the fundamental concepts underlying this study. Section 3 describes the proposed methodology. Section 4 presents and analyzes the experimental results. Finally, Section 5 concludes the paper by summarizing the key findings and outlining directions for future work.

2. Background Knowledge

2.1. DRL Framework

Figure 1 illustrates a typical DRL framework, highlighting the data flow from environmental sensing to decision-making. The procedure begins with the agent receiving raw environmental observations, which may include sensor data or structured states. These observations are then passed through a feature extractor, which transforms high-dimensional inputs into lower-dimensional, informative representations, helping the agent focus on features relevant to decision-making. Finally, the extracted features are fed into a neural network, the core of the agent’s policy or value function. This network processes the features to infer an action (policy-based methods) or the value of a given state (value-based methods), which guides the agent’s behavior in the environment.
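To make this data flow concrete, the following PyTorch sketch runs a batch of observations through a small extractor and separate policy/value heads; all layer sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

# observation -> feature extractor -> policy head / value head,
# mirroring the Figure 1 pipeline with toy layer sizes.
extractor = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B, 32) feature vector
)
policy_head = nn.Linear(32, 1)  # one continuous action (e.g., steering)
value_head = nn.Linear(32, 1)   # scalar state-value estimate

obs = torch.randn(4, 1, 150, 150)            # batch of grayscale frames
features = extractor(obs)                    # (4, 32)
action = torch.tanh(policy_head(features))   # squashed into [-1, 1]
value = value_head(features)                 # (4, 1)
```

The same feature vector feeds both heads, so improving the extractor benefits the policy and the value estimate simultaneously.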

2.2. Feature Extractor

In DRL, feature extractors play a crucial role in transforming high-dimensional raw observations into compact, meaningful representations that are easier for the algorithm to process. By extracting features from the environment, they reduce the complexity of the input space and emphasize task-relevant information, which is essential for efficient policy and value-function learning. A well-designed feature extractor improves training stability and efficiency and enhances the model’s ability to generalize to new scenarios. As noted in Section 1, increasing depth enhances feature extraction on complex visual tasks but introduces potential drawbacks, and in DRL, where real-time decision-making is critical, moderately deep CNNs often offer a more practical balance between representational power and computational efficiency. Hence, new design approaches are needed, rather than simply making the network deeper or more complex.

2.3. Critical Components

The spatial attention module (SAM) enhances the representational power of CNNs by focusing on where the important features are located in the spatial domain [10]. It computes a 2D attention map by applying average-pooling and max-pooling along the channel axis, then uses a convolutional layer to generate attention weights. Figure 2 shows the SAM procedure. This approach allows the network to highlight informative regions while suppressing irrelevant background features. Moreover, SAM introduces minimal computational overhead, making it an effective and efficient addition to CNNs.
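Under our reading of [10], a minimal PyTorch implementation of SAM might look like this; the kernel size of 7 follows CBAM's default:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool along the channel axis,
    then a single conv produces a 2D attention map."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)        # (B, 1, H, W)
        max_map = x.max(dim=1, keepdim=True).values  # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn  # highlight informative regions, suppress the rest

# The module only reweights spatial positions, so output shape equals input shape.
y = SpatialAttention()(torch.randn(2, 16, 12, 12))
```

The only learnable parameters are those of the single 2-to-1-channel convolution, which is why the module's computational overhead is minimal.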
Reference [11] introduced a multi-scale concept to enhance feature extraction. This design employs multiple convolutional branches with varying channel sizes, enabling the network to capture image features at different scales. It achieves richer feature representations by integrating these diverse features through dense and skip connections while maintaining low computational complexity.

3. Our Feature Extractor Design

Based on the above discussion, we integrated these two critical components, SAM and the multi-scale feature module, into our feature extractor. This combination enhances the model’s ability to focus on spatially informative regions while simultaneously capturing features at different scales, leading to richer and more discriminative representations. Moreover, both modules are lightweight and introduce minimal computational overhead, making the overall design effective and efficient for real-time applications.
Figure 3 illustrates the infrastructure component we repeatedly utilized to construct the feature extractor neural network. It begins with a convolutional layer that extracts initial features. These features are then refined by SAM, which generates an attention map emphasizing important spatial regions. The refined features are element-wise multiplied with the original features via a residual connection, enhancing key information while retaining context. Finally, a max pooling layer reduces the spatial dimensions, preparing the output for subsequent stages. This module is a reusable building block in constructing the feature extractor network.
Figure 4 illustrates the architecture of our feature extractor network, which integrates multi-scale infrastructure components followed by fully connected layers. The network consists of two parallel branches: one composed of infrastructure components with 3 × 3 kernels, and the other with 7 × 7 kernels. Each component is followed by a ReLU activation to introduce non-linearity. After three stages of processing in each branch, the outputs are flattened and concatenated to fuse the multi-scale features. This combined representation is then passed through a series of fully connected layers, each followed by ReLU activations, to generate the final feature.
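Combining the two modules, a self-contained PyTorch sketch of the Figure 3 block and the Figure 4 two-branch network could look as follows; channel widths, the adaptive pooling before flattening, and the fully connected sizes are our assumptions, not values reported in the paper:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (Section 2.3)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class InfraBlock(nn.Module):
    """One infrastructure component (Figure 3):
    conv -> SAM reweighting -> ReLU -> max pool."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.sam = SpatialAttention()
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        return self.pool(torch.relu(self.sam(self.conv(x))))

class TwoBranchExtractor(nn.Module):
    """Two parallel branches of three blocks (3x3 and 7x7 kernels),
    flattened, concatenated, then fully connected layers (Figure 4)."""
    def __init__(self, in_ch=1, feat_dim=256):
        super().__init__()
        def branch(k):
            return nn.Sequential(InfraBlock(in_ch, 16, k),
                                 InfraBlock(16, 32, k),
                                 InfraBlock(32, 64, k),
                                 nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.b3, self.b7 = branch(3), branch(7)
        self.fc = nn.Sequential(nn.Linear(2 * 64 * 16, 512), nn.ReLU(),
                                nn.Linear(512, feat_dim), nn.ReLU())

    def forward(self, x):
        return self.fc(torch.cat([self.b3(x), self.b7(x)], dim=1))

feats = TwoBranchExtractor()(torch.randn(2, 1, 150, 150))  # (2, 256)
```

The adaptive pooling before flattening keeps the fully connected layers small regardless of the input resolution; it is a sketching convenience on our part rather than a detail taken from Figure 4.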
The feature extractor network corresponds to the blue block in Figure 1. The preceding orange block represents the observation module, which obtains input data from the Highway-Env simulator. The subsequent block represents an independent RL control algorithm.

4. Results and Discussion

We used the following settings for the experiment.
  • Experimental PC specifications: CPU: Intel i7-12700k, GPU: NVIDIA RTX 3060 12 GB, RAM: 48 GB.
  • Python version: 3.12.9 [12].
  • PyTorch version: 2.6.0 [13].
  • CUDA version: 12.6 [14].
  • Observation: 150 × 150 grayscale images from the Highway-Env simulator.
  • Feature extractors: MobileNet V3 Small, SqueezeNet, ResNet18, or our design.
  • RL control algorithm: PPO.
  • The steering angle is represented as a continuous value in the range of −1 to 1.
  • The ego-vehicle is yellow, while the opponent vehicles (randomly 1 to 5) are blue.
  • Rewards and penalties: collision: −1 (reset environment), improper action: −0.3, lane-keeping reward: 2.
  • Training steps: 5,000,000.
Figure 5 shows the resulting experimental procedure.
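The reward scheme listed above can be sketched as a small helper; this is an illustrative shaping function for readability, not Highway-Env's internal implementation:

```python
def step_reward(collided: bool, improper_action: bool, kept_lane: bool) -> float:
    """Per-step reward mirroring the experimental settings:
    collision -1 (episode resets), improper action -0.3, lane keeping +2."""
    if collided:
        return -1.0  # terminal: the environment is reset on collision
    reward = 0.0
    if improper_action:
        reward -= 0.3
    if kept_lane:
        reward += 2.0
    return reward
```

A step with an improper action while still keeping the lane thus yields 1.7, so the agent is pushed toward smooth lane keeping rather than merely avoiding collisions.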
Figure 5. Experimental procedure.
The cumulative training rewards are shown in Figure 6 and Figure 7. In Figure 6, all models were trained without any pretrained weights. The proposed method consistently achieved significantly higher accumulated rewards than MobileNet, SqueezeNet, and ResNet18, demonstrating superior learning efficiency under identical training conditions. We then incorporated pretrained weights into the other well-known models (Figure 7), where all models except ours were initialized with their corresponding pretrained parameters. Despite this advantage, our method still achieved the highest accumulated reward throughout training, outperforming the pretrained baselines. This implies that pretrained weights offer limited advantage in this setting, because their source domain differs from our training environment, making them less relevant for the target task.
Figure 8 shows the average survival time in the simulator achieved by different models; the average is computed over ten simulation runs. The proposed method outperforms all others, including models with and without pretrained weights, demonstrating superior decision-making capability and robustness in dynamic environments.

5. Conclusions

We investigated the impact of feature extractors on DRL performance in autonomous driving. By integrating a spatial attention module and multi-scale feature extraction into a lightweight design, the proposed model significantly improves cumulative reward and survival time compared to existing CNNs. The results demonstrate that optimizing the feature extraction process alone, without altering the learning algorithm, can lead to superior learning efficiency and robust decision-making.
We plan to explore more architectures to enhance the expressiveness and robustness of the extracted features. Specifically, we intend to investigate integrating advanced mechanisms that allow the network to capture more discriminative and context-aware representations across time and space. In addition, the proposed feature extractor should be evaluated in more diverse and challenging autonomous driving scenarios. Such experiments can assess our model’s generalization capability under varying dynamics and uncertainties.

Author Contributions

Conceptualization, C.-C.C. and P.-T.W.; methodology, C.-C.C., P.-T.W. and Y.-M.O.; software, P.-T.W.; validation, C.-C.C. and P.-T.W.; writing—original draft preparation, C.-C.C. and P.-T.W.; writing—review and editing, C.-C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science and Technology Council, Taiwan, under Grant 112-2221-E-035-062-MY3.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hwang, K. Cloud Computing for Machine Learning and Cognitive Applications; The MIT Press: Cambridge, MA, USA, 2017. [Google Scholar]
  2. Hwang, K.; Chen, M. Big-Data Analytics for Cloud, IoT and Cognitive Computing; John Wiley & Sons: Hoboken, NJ, USA, 2017. [Google Scholar]
  3. Ooi, Y.-M.; Chang, C.-C. Improving recurrent deterministic policy gradient strategy in autonomous driving. Soft Comput. 2025, 29, 1931–1946. [Google Scholar] [CrossRef]
  4. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  5. Zhu, P.; Dai, W.; Yao, W.; Ma, J.; Zeng, Z.; Lu, H. Multi-robot flocking control based on deep reinforcement learning. IEEE Access 2020, 8, 150397–150406. [Google Scholar] [CrossRef]
  6. Highway-Env Documentation. Available online: https://highway-env.farama.org (accessed on 11 July 2025).
  7. GitHub—czh513/Auto-Driving-RL-Decision-Making: An Environment for Autonomous Driving Decision-Making. Available online: https://github.com/czh513/Auto-driving-RL-decision-making (accessed on 11 July 2025).
  8. Stable-Baselines3 Docs—Reliable Reinforcement Learning Implementations—Stable Baselines3 2.7.0a1 Documentation. Available online: https://stable-baselines3.readthedocs.io/en/master/index.html (accessed on 11 July 2025).
  9. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-baselines3: Reliable reinforcement learning implementations. J. Mach. Learn. Res. 2021, 22, 1–8. [Google Scholar]
  10. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  11. Lee, Y.; Jun, D.; Kim, B.-G.; Lee, H. Enhanced single image super resolution method using lightweight multi-scale channel dense network. Sensors 2021, 21, 3351. [Google Scholar] [CrossRef] [PubMed]
  12. Welcome to Python.org. Available online: https://www.python.org (accessed on 15 July 2025).
  13. PyTorch. Available online: https://pytorch.org (accessed on 15 July 2025).
  14. CUDA Toolkit—Free Tools and Training|NVIDIA Developer. Available online: https://developer.nvidia.com/cuda-toolkit (accessed on 18 July 2025).
Figure 1. DRL framework.
Figure 2. SAM procedure.
Figure 3. Proposed infrastructure component in this study.
Figure 4. Proposed feature extractor network in this study.
Figure 6. Comparison of training cumulated reward across different networks without pretrained weights.
Figure 7. Comparison of training cumulated reward across different networks with pretrained weights.
Figure 8. Survival time comparison across different models.