Article

Perceptual Elements and Sensitivity Analysis of Urban Tunnel Portals for Autonomous Driving

1
School of Civil Engineering, Chongqing Jiaotong University, Chongqing 400074, China
2
State Key Laboratory of Bridge and Tunnel Engineering in Mountainous Areas, Chongqing 400074, China
3
Urban Intelligence Academy, Chongqing Jiaotong University, Chongqing 400074, China
4
School of Traffic and Transportation, Chongqing Jiaotong University, Chongqing 400074, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 453; https://doi.org/10.3390/app16010453
Submission received: 20 November 2025 / Revised: 25 December 2025 / Accepted: 29 December 2025 / Published: 31 December 2025
(This article belongs to the Section Civil Engineering)

Abstract

Urban tunnel portals constitute critical safety zones for autonomous vehicles, where abrupt luminance transitions, shortened sight distances, and densely distributed structural and traffic elements pose considerable challenges to perception reliability. Existing driving scenario datasets are rarely tailored to tunnel environments and have not quantitatively evaluated how specific infrastructure components influence perception latency in autonomous systems. This study develops a requirement-driven framework for the identification and sensitivity ranking of information perception elements within urban tunnel portals. Based on expert evaluations and a combined function–safety scoring system, nine key elements—including road surfaces, tunnel portals, lane markings, and vehicles—were identified as perception-critical. A “mandatory–optional” combination rule was then applied to generate 48 logical scene types, from which 376 images were retained after screening for brightness (pixel intensity 30–220), blur (Laplacian variance ≥ 100), and occlusion (annotation pixel error ≤ 0.5%). A ResNet50–PSPNet convolutional neural network was trained to perform pixel-level segmentation, with inference rate adopted as a quantitative proxy for perceptual sensitivity. Field experiments across ten urban tunnels in China indicate that the model consistently recognized road surfaces, lane markings, cars, and motorcycles with the shortest inference times (<6.5 ms), whereas portal structures and vegetation required longer recognition times (>7.5 ms). This sensitivity ranking is statistically stable under clear, daytime conditions (p < 0.01). The findings provide engineering insights for optimizing tunnel lighting design, signage placement, and V2X configuration, and offer a pilot dataset to support perception-oriented design and evaluation of urban tunnel portals in semi-enclosed environments.
Unlike generic segmentation datasets, this study quantifies element-specific CNN latency at tunnel portals for the first time.

1. Introduction

1.1. Background and Motivation

Urban tunnel portals represent one of the most visually complex transition zones in urban road networks. Vehicles approaching or exiting tunnels are exposed to abrupt luminance variations—often exceeding 120,000 lx—together with compressed sight distances and the dense coexistence of static infrastructure and dynamic traffic participants [1,2,3,4]. Empirical studies have shown that both human drivers and autonomous vehicles experience elevated takeover rates and perception instability in such environments compared with open roads [1,2,3].
In recent years, substantial research efforts have focused on improving visual perception robustness in tunnel-related scenarios, including luminance adaptation, high-dynamic range (HDR) pipelines, low-light image enhancement, and sensor fusion strategies [4,5,6,7,8]. While these approaches have effectively improved recognition accuracy and robustness under adverse lighting conditions, they predominantly emphasize what can be recognized, rather than how efficiently visual information is processed under real-time constraints.
In practical autonomous driving pipelines, perception performance is jointly determined by accuracy and computational efficiency. Even when recognition accuracy is maintained, excessive inference latency at the perception stage may propagate through downstream planning and control modules, potentially degrading system responsiveness. However, despite its relevance, perception latency has rarely been examined from an element-level perspective, particularly in tunnel portal environments.
Most existing datasets and scenario benchmarks are designed for highways or open urban roads and are organized around object categories or traffic situations, without explicitly analyzing how different infrastructure and environmental elements contribute to computational load during CNN inference. Consequently, there remains a lack of perception-oriented diagnostic knowledge regarding which tunnel-related scene components impose higher computational demands on vision-based models. This gap motivates a systematic investigation of element-level inference characteristics in urban tunnel portal scenarios.

1.2. Research Gap

Although dozens of open scenario libraries for autonomous driving have been developed worldwide—such as nuScenes, Waymo Open, and ApolloScape—their construction logic remains predominantly data-driven or expert-template-based. Typically, these datasets are generated by large-scale fleet data collection or manual enumeration of scenarios, followed by post hoc annotation of element categories [9,10]. Such paradigms have performed well in urban surface roads and highway environments, yet they generally overlook the coupling between individual scene elements and perception latency—that is, whether the presence of a specific infrastructure element significantly increases the inference time of a convolutional neural network (CNN). Recent work by Liu and Zhang (2024) employed machine learning models to rank influencing factors of tunnel-entrance vision degradation, validating the feasibility of sensitivity-based evaluation [11]. For urban tunnel portals, where vehicle perception and decision-making are extremely sensitive to latency, this omission results in two direct consequences:
(1)
Redundant scenario libraries, containing a large number of “pseudo-critical” samples that have negligible influence on perception delay;
(2)
A lack of quantitative design guidance for tunnel engineers to determine which elements should be prioritized or avoided in order to improve perception efficiency and operational safety.
In the field of tunnel environment perception, research over the past five years has primarily focused on sensor fusion techniques—combining visible, infrared, and LiDAR data—and low-illumination image enhancement methods, such as histogram equalization, Retinex-based correction, and GAN-based denoising [7,8,9]. These approaches have notably improved object detection accuracy (mean Average Precision, mAP), yet they have not systematically addressed the fundamental question of which categories of elements are most susceptible to delayed recognition within complex tunnel environments. Moreover, most existing studies on tunnel scenarios are limited to single-site case analyses or synthetic simulations, lacking cross-tunnel and cross-period comparative experiments. This limitation considerably weakens the external validity of their conclusions and restricts their applicability to real-world tunnel engineering practice [6,7,12].
More critically, there remains an absence of a requirement-driven integrated framework for the selection and sensitivity ranking of perceptual elements. Such a framework should achieve the following:
(1)
Derive, from the core operational tasks of autonomous driving—namely lane keeping, collision avoidance, and traffic compliance—a list of elements that must be detected rapidly, continuously, and accurately;
(2)
Employ CNN inference rate as a hardware-aware sensitivity indicator, enabling millisecond-level ranking of element recognition priority;
(3)
Translate the ranking results into actionable design parameters for tunnel lighting, signage layout, and V2X communication configuration.
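Since the framework above adopts CNN inference rate as the sensitivity indicator, the ranking procedure ultimately reduces to timing the forward pass per scene image with millisecond precision. The sketch below is a minimal, hypothetical illustration of such per-element timing; `dummy_infer` and the scene dictionary are stand-ins for exposition, not the paper's ResNet50–PSPNet model or its dataset.

```python
import time
import statistics

def measure_inference_latency(infer, images, warmup=3, repeats=20):
    """Time a perception callable's forward pass per scene image.

    Warm-up runs are discarded so one-off initialization cost does not
    skew the millisecond-level ranking; the median over repeats damps
    scheduler noise.
    """
    latencies_ms = {}
    for name, img in images.items():
        for _ in range(warmup):
            infer(img)
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            infer(img)
            samples.append((time.perf_counter() - t0) * 1000.0)
        latencies_ms[name] = statistics.median(samples)
    return latencies_ms

# Stand-in "model": cost grows with input size, mimicking element-dependent load.
def dummy_infer(img):
    return sum(sum(row) for row in img)

scenes = {
    "road_only": [[1] * 64] * 64,
    "road_plus_vegetation": [[1] * 64] * 256,
}
ranking = measure_inference_latency(dummy_infer, scenes)
```

In the actual pipeline the callable would wrap the segmentation network's forward pass on the deployment hardware, and the resulting per-element medians would feed the sensitivity ranking directly.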

1.3. Research Objectives and Innovations

This study aims to achieve the following:
(1)
Establish a comprehensive element system specifically designed for urban tunnel portals, capturing both static infrastructure and dynamic environmental factors relevant to autonomous driving perception;
(2)
Develop a balanced image dataset characterized by single-element variation and pixel-level accuracy, ensuring that each perceptual element can be independently analyzed under consistent visual conditions;
(3)
Adopt inference rate as a direct and hardware-sensitive metric to quantify the perceptual sensitivity of key tunnel elements with millisecond precision.
The principal contributions are threefold:
(1)
A function–safety scoring framework that reduces 64 candidate combinations to nine safety-critical elements;
(2)
A high-resolution image dataset covering 48 logical scenarios (376 images) with an annotation error below 0.5%;
(3)
A ResNet50-PSPNet CNN that discloses a stable sensitivity ranking: road ≈ markings < car < motorcycle < traffic sign < portal structure < vegetation.

2. Selection of Scene-Specific Information Perception Elements

2.1. Systematic Framework for Scene Information Perception

2.1.1. Scenario Definition and Portal-Segment Characteristics

(1)
Scenario Definition
In traffic research, a “scenario” is defined as the integrated situation prevailing within a specific time–space window. Its constitutive attributes are as follows:
(i)
Temporal condition—the instant or interval of occurrence;
(ii)
Spatial condition—the locus or geographical extent;
(iii)
Environmental features—the physical circumstances present;
(iv)
Agent activity—the behavior and state of humans or objects within that environment.
While in everyday language “scenario” denotes any event in context, the urban transport literature restricts the term to the road layout, function, and traffic performance observed under given spatiotemporal conditions (e.g., infrastructure geometry, traffic volume, roadside facilities).
Urban-tunnel scenes exhibit additional complexity. Beyond the tunnel’s own physical geometry, the scene embraces the adjoining urban links, the interior lighting and traffic-control devices, pavement markings, signage, and the surrounding topographic and traffic environment. Consequently, an urban-tunnel road scenario is conceived as the union of five core components: tunnel structure, connecting road system, interior lighting and equipment, ambient environmental factors, and traffic-management measures.
(2)
Characteristics of urban tunnel portal zones
The portal segment constitutes the critical transition between the external natural environment and the enclosed artificial environment of the tunnel, exhibiting pronounced spatial and environmental discontinuities. When traversing from the open approach into the tunnel interior, the driving environment undergoes a drastic shift from fully open to semi-confined conditions, manifested in the following aspects:
Firstly, an abrupt change in illuminance dominates the portal zone. The approach road is governed by natural lighting whose intensity varies with time of day and weather, whereas the tunnel interior is sustained at a constant luminance by an artificial system. Rapid entry thus imposes a “bright–dark” or “dark–bright” adaptation shock on both human drivers and machine vision, frequently producing transient blind regions and perceptual latency.
Secondly, spatial confinement intensifies instantaneously. The open approach offers an unrestricted sight-line and long preview distance; at the portal, this vista is abruptly truncated by the tunnel walls, portal frame, and ancillary structures, creating a semi-enclosed cavity. The simultaneous contraction of spatial scale and visual field perturbs driving behavior and destabilizes autonomous perception algorithms.
Thirdly, traffic complexity surges. The portal is a locus where flow turbulence and behavioral transitions concentrate; lane changes, decelerations, and accelerations are routinely executed immediately ahead of the entrance. The resulting kinematic uncertainty raises the perceptual and decision load for automated driving systems.
Finally, infrastructural and environmental elements are superimposed within a short reach. In addition to the carriageway and portal opening, lighting luminaires, ventilation grids, traffic signs, pavement markings, and landscape vegetation appear in rapid succession, constituting a highly cluttered transition backdrop that imposes stringent demands on vision-based perception modules.

2.1.2. Hierarchical Scene Model

Focusing on the portal segment, this study establishes the scene hierarchy from an infrastructure-engineering perspective. Stratification of the operational scene is a prerequisite for autonomous-driving deployment: it furnishes a precise testing and validation environment for automated vehicles and simultaneously enhances the efficiency of intelligent tunnel management. Scene construction must explicitly account for the atypical environmental conditions on both sides of the portal, the distinct traffic-flow patterns, and the adaptability limits of autonomous systems. As a critical component of the urban road network, a tunnel exhibits distinctive geometry, illumination regimes, and traffic-control schemes; consequently, its operational scene must be planned and designed through a multi-level approach. On the basis of the foregoing considerations, the operational-scene architecture for the urban tunnel portal is decomposed into four hierarchically coupled layers (Figure 1):
Layer 1—tunnel-infrastructure stratum;
Layer 2—environmental-perception stratum;
Layer 3—dynamic-traffic stratum;
Layer 4—vehicle–infrastructure-cooperative and communication stratum.
(1)
Layer 1—Tunnel-infrastructure stratum
As the backbone of the operational scene, this stratum comprises structural geometry, fixed traffic fixtures, and pavement composition. Design provisions are required to guarantee crash-worthiness while simultaneously delivering optically unambiguous navigation cues—namely road markings, longitudinal retro-reflective lines, and luminaire layouts—tailored to the detection thresholds of automated vehicles.
(2)
Layer 2—Static environmental-perception stratum
This stratum encapsulates ambient conditions and non-moving traffic artifacts. Baseline micro-climate variables (illuminance, visibility, temperature) are continuously acquired; their rapid variation across the portal exerts a first-order effect on camera and LiDAR performance. By segmenting these environmental transients into discrete, time-stamped snapshots, the layer delivers a quantified, fragment-based static scene representation to the automated vehicle.
(3)
Layer 3—Dynamic-traffic stratum
The dynamic-traffic stratum of the tunnel portal segment subsumes both dynamic traffic and dynamic environment. Dynamic traffic encapsulates real-time flow regimes, participant kinematics, and signal phase states, whereas dynamic environment denotes anomalous weather episodes such as torrential rain or advection fog; the automated vehicle is required to sense and update these coupled transients on a frame-by-frame basis to safeguard uninterrupted and collision-free passage.
(4)
Layer 4—Vehicle–infrastructure-cooperative and communication stratum
The vehicle–infrastructure-cooperative and communication stratum is the critical layer ensuring safe and efficient operation of automated vehicles within the tunnel portal segment. Via V2X links, the automated vehicle conducts real-time information exchange with infrastructure, traffic-management systems, and surrounding vehicles both inside and outside the tunnel, thereby counteracting the complex traffic environment.
At the same time, the conceptual framework was further translated into operable experimental inputs as follows:
(1) Hierarchy-to-Element Mapping (Table 1).
Layers 1–4 were decomposed into 11 measurable attributes (e.g., openness ratio of the portal shading canopy, retroreflective coefficient of the pavement, and number of motorcycle instances), each assigned a corresponding field name and data type (float/int/bool).
Each collected image was automatically written into these fields, thereby achieving a three-level alignment of layer–attribute–pixel label, which enables direct invocation in subsequent sensitivity analyses.
(2) Hierarchy-to-Model Parameter Mapping
The portal illuminance in Layer 2 (environment perception layer) was discretized into five levels (<5 klx, 5–20 klx, …, >80 klx) and used as a brightness-gain mask for the multi-scale inputs of PSPNet, explicitly participating in forward inference.
The number of motorcycles in Layer 3 (dynamic traffic layer) was incorporated into the online data augmentation strategy: when the number of motorcycle instances in the labels was ≥2, additional rotations of ±5° and brightness variations of ±10% were triggered to balance the samples and reduce CNN inference latency bias.
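The Layer-3 trigger described above can be expressed as a simple conditional rule. The sketch below is an illustrative rendering of that rule only; the function name and the returned parameter dictionaries are hypothetical, and the actual augmentation pipeline in the study is not specified at this level of detail.

```python
import random

def portal_augmentations(num_motorcycles, rng=None):
    """Sketch of the Layer-3 online augmentation trigger: when an
    annotated frame contains >= 2 motorcycle instances, extra rotation
    (within ±5 degrees) and brightness (within ±10%) jitter parameters
    are emitted to rebalance the sample pool."""
    if num_motorcycles < 2:
        return []  # sparse frames receive no extra augmentation
    rng = rng or random.Random(42)
    return [
        {"rotate_deg": rng.uniform(-5.0, 5.0)},
        {"brightness_scale": 1.0 + rng.uniform(-0.10, 0.10)},
    ]

no_extra = portal_augmentations(1)
extra = portal_augmentations(3)
```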
(3) Hierarchy-to-Sensitivity Metric Mapping
With Layer 1 elements (roadway and tunnel portal) serving as the skeleton and Layer 2–4 elements as plug-ins, 48 sets of scatter plots relating hierarchical combinations to inference latency were constructed. A Pearson correlation coefficient of r = 0.92 indicates that hierarchical attributes can linearly explain approximately 85% of the latency variance (r² ≈ 0.85), thereby establishing a quantitative linkage from conceptual hierarchy to millisecond-level performance metrics.
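The correlation underpinning this hierarchy-to-latency linkage can be computed directly from paired attribute and latency samples. The snippet below is a self-contained sketch of the Pearson coefficient and its squared value (the linearly explained variance); the sample values are illustrative and are not the paper's measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between hierarchical-attribute scores and
    per-image inference latency."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data: a roughly linear attribute-to-latency relation.
attr_scores = [1, 2, 3, 4, 5, 6]
latency_ms = [5.9, 6.1, 6.4, 6.9, 7.4, 7.8]
r = pearson_r(attr_scores, latency_ms)
explained = r * r  # fraction of latency variance explained linearly
```

Note that r = 0.92 corresponds to r² ≈ 0.85, which is why the hierarchy is said to explain about 85% of the latency variance.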
This hierarchical architecture furnishes a systematic description of the tunnel-portal scene, enabling calibrated testing of automated vehicles and evidence-based design of intelligent infrastructure.

2.1.3. Scene Classification

Previous studies have suggested that, from the interpretability perspective of autonomous driving systems, driving scenes can be organized into three hierarchical levels: functional scenes, logical scenes, and concrete scenes. As the hierarchy progresses downward, the descriptive granularity increases, and the number of scenarios grows exponentially. The functional scene provides an abstract semantic description of a traffic situation. The logical scene defines the parameter ranges of state variables—such as vehicle speed, lane number, and lane width. The concrete scene fixes these variables to specific values, supplements detailed environmental information, and generates unique executable test cases. However, if the sampling interval at the logical level is set too narrowly, the resulting number of concrete scenes increases dramatically, leading to excessive expansion of the scenario library and raising the cost of dataset construction and maintenance.
Therefore, to construct autonomous driving test scenarios for urban tunnel portal sections, this study first establishes a categorical framework, while simultaneously incorporating two core considerations into the classification process:
The first consideration is that scene classification should allow clear expression of scene characteristics through qualitative descriptions. In practical terms, this means that the various aspects of a scene can be described using both natural language and quantitative models. Qualitative description facilitates the more intuitive understanding and differentiation of distinct scenarios, enabling engineers and system developers to grasp environmental variations beyond purely numerical parameters.
The second consideration is that scene categorization must maintain an appropriate balance between generality and specificity. An overly broad classification may lead to ambiguity or uncertainty in perception and information processing for autonomous systems, whereas an excessively detailed classification may result in a proliferation of subcategories, increasing the complexity and difficulty of scene management. Therefore, a well-structured scene classification should strike a balance between accuracy and practicality, ensuring both conceptual clarity and operational efficiency in scenario construction.
The urban tunnel portal section exhibits significant vertical diversity in its spatial hierarchy and horizontal complexity in its structural composition. To enable systematic analysis, the scene is first decomposed into constituent components at the initial stage of this study. Accordingly, the tunnel portal scenes are further classified into three major types, as illustrated in Figure 2:
(1)
Basic scenes;
(2)
Spatial variation scenes;
(3)
Spatiotemporal variation scenes.
This classification framework reflects the progressive increase in environmental and perceptual complexity across the three scene types, providing a structured basis for scenario construction and subsequent sensitivity analysis.
(1)
Basic Scene
The basic scene of an urban tunnel portal section consists primarily of the tunnel infrastructure and the surrounding light environment, representing the transition zone between the external roadway and the tunnel interior. The lighting conditions at the portal exert a significant influence on visual perception for autonomous driving. However, because illumination varies considerably across different times and weather conditions, the lighting environment is fixed as a constant spatiotemporal background in the basic scene and is not treated as a variable.
(2)
Spatial Variation Scene
Building upon the basic scene, the spatial variation scene incorporates additional static traffic and environmental elements, such as road signs, signal lights, signboards, medians, and adjacent buildings, emphasizing the layout and configuration of static facilities near the tunnel portal. In this scene type, the lighting environment varies spatially—there exists a contrast between external natural light and internal artificial illumination in the transition zone—which affects both visual perception and driving safety of autonomous vehicles, and therefore must be considered as a key influencing factor.
(3)
Spatiotemporal Variation Scene
On the basis of the spatial variation scene, the spatiotemporal variation scene introduces dynamic traffic factors and adverse weather conditions, making the traffic state at the portal more complex and realistic. The lighting environment varies with time: during the day and night, natural and artificial light interact, producing mixed luminance distributions that influence both machine vision perception and decision-making safety in autonomous driving systems.
This hierarchical scene classification captures both the structural diversity and environmental dynamics of urban tunnel portals, providing a solid foundation for systematic scene construction and sensitivity analysis in subsequent stages of this research.

2.2. Method for Screening Perceptual Elements in Tunnel Portal Scenes

2.2.1. Requirement-Driven Selection of Key Elements

Perceptual elements refer to independent entities within a scene that possess specific functions or attributes. These elements may be static, dynamic, or interactive, depending on their roles and relationships within the tunnel environment. The types and quantities of perceptual elements may vary depending on specific application requirements and operational objectives. When constructing autonomous driving scenarios for urban tunnel portal sections, the screening of perceptual elements must be guided by the actual requirements that the final application scenario imposes on the perception system. Accordingly, a requirement-driven selection (RDS) approach is adopted to determine which elements must be included in the scenario library and which may be downgraded or temporarily omitted. Compared with data-driven or expert experience-based approaches, the RDS method directly traces back to the core question—“What tasks must an autonomous vehicle accomplish at the tunnel portal?” By decomposing the scenario from this perspective, the method performs a first-level screening of perceptual elements based on their functional relevance and safety criticality, thereby avoiding both redundancy (overly complex scenes) and omission (absence of key perception elements).
The mission profile imposed by the urban tunnel portal segment on automated vehicles condenses into three canonical tasks:
(1)
Lane keeping and path tracking: the vehicle shall align to the target lane prior to portal entry and maintain lateral-control accuracy under abrupt illuminance transitions (strong-to-dim or dim-to-strong).
(2)
Collision-risk detection and evasion: within the sight-restricted, luminance-varying transition zone, the real-time discrimination and avoidance of low-speed lead vehicles, two-wheelers, and pedestrians are required.
(3)
Traffic-rule adaptation: portal-specific signs (speed limit, lane-change prohibition, dipped-beam mandate) must be recognized, followed by commensurate longitudinal speed adjustment and driving-mode switching.
Mapping the above tasks onto perception requirements yields three information classes that must be detected accurately, rapidly, and continuously: (i) road geometry and pavement markings, (ii) dynamic obstacles, and (iii) mandatory traffic signs and longitudinal markings.

2.2.2. Taxonomy of Elements

Within autonomous-driving applications, scene information perception elements conventionally encompass, yet are not restricted to, the following classes:
(1)
Static elements: entities that remain spatially invariant or exhibit negligible variation within the observation horizon—typical instances include carriageways, civil structures, ground planes, vegetation, and fixed obstacles.
(2)
Dynamic elements: time-varying entities or agents that interact with the scene—e.g., vehicles, pedestrians, evolving weather states, illuminance transients, and traffic-flow volumes.
(3)
Ambient elements: external-field variables exerting distributed influence, namely precipitation (rain, snow, fog), natural illumination (day, night, low-sun), temperature, and wind speed.
The portal segment mandates that the automated-driving system perform path-planning and obstacle-avoidance tasks under highly heterogeneous environments; consequently, a requirement-driven screening of scene-perception elements is adopted, with emphasis placed on the following aspects:
(1)
Static subset: tunnel-type carriageways, ground planes, portal structures, vegetation, and fixed obstacles.
(2)
Dynamic subset: vehicles, traffic-flow states, weather evolution, and illuminance transients.
(3)
Ambient subset: precipitation regimes, natural illumination, temperatures, and wind speeds.
The above taxonomy reveals overlapping entries (e.g., weather appears in both dynamic and ambient classes). To eliminate redundancy, the three generic categories are re-cross-mapped against the four-stratum scene hierarchy, yielding two super-classes and five sub-classes (Table 2):
(1)
Facility-type elements, subdivided into infrastructure (tunnel cross-section, portal geometry) and traffic signs and markings.
(2)
Environment-type elements, subdivided into ambient conditions (vegetation, weather, temperature); luminous environment (illuminance, glare, contrast); and dynamic operational environment (traffic volume, vehicle speed, flow disturbances).
Within the urban tunnel portal section, each perception requirement was evaluated using a two-dimensional five-point Likert scale (1 = very low; 5 = very high) for both functional relevance (R) and safety criticality (S). Only elements with an average score of (R + S)/2 ≥ 4 were retained. Here, functional relevance (R) refers to whether a missed or false detection of the element by the autonomous driving perception system would directly hinder the three core tasks described in Section 2.1—lane keeping, collision avoidance, and rule adaptation. Safety criticality (S) indicates whether the failure of the element within the tunnel portal area would immediately result in safety incidents such as collisions, rear-end crashes, or lane departures, which cannot be corrected by the driver within an average reaction time of less than 2.5 s. During the evaluation process, six experts in tunnel design and operation (professional experience ≥ 5 years), six engineers specializing in autonomous driving perception algorithms (professional experience ≥ 3 years), and four experts in traffic safety and human factors (professional experience ≥ 2 years) were invited to rate the elements listed in Table 3. Based on the scoring results, a screening matrix was established (Table 4 and Table 5), and nine types of elements (Table 6) were identified as “high relevance–high safety” factors to be included in subsequent scenario combinations. Although “portal weather” and “portal illumination” in the lighting environment significantly affect perception performance, they were regarded as stochastic and uncontrollable variables. Therefore, they were considered only as background conditions rather than combinatorial factors during the element selection phase.
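The (R + S)/2 ≥ 4 retention rule described above amounts to a simple threshold filter over the expert Likert scores. The sketch below illustrates that screening step; the element names and score pairs are illustrative placeholders, not the actual panel ratings in Tables 4 and 5.

```python
def screen_elements(scores, threshold=4.0):
    """Requirement-driven first-level screening: keep elements whose
    averaged functional-relevance (R) and safety-criticality (S)
    Likert scores satisfy (R + S) / 2 >= threshold."""
    return {
        name: (r + s) / 2
        for name, (r, s) in scores.items()
        if (r + s) / 2 >= threshold
    }

# Illustrative (R, S) pairs on the 1-5 Likert scale.
candidate_scores = {
    "road_surface":  (5.0, 5.0),
    "lane_markings": (4.8, 4.6),
    "tunnel_portal": (4.2, 4.5),
    "vegetation":    (3.1, 2.8),   # below threshold -> dropped
}
retained = screen_elements(candidate_scores)
```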
The nine categories of elements derived from the demand-oriented approach were further validated for consistency with road design codes and traffic regulations to ensure that no combinations would result in “legally compliant but practically infeasible” scenarios. For example, according to the Code for Design of Urban Road Tunnels (CJJ 221-2015), longitudinal deceleration markings and speed-limit signs must be installed at tunnel entrances. Therefore, “traffic markings (B2)” and “traffic signs (B1)” were designated as mandatory elements in subsequent scenario combinations, while elements such as vegetation and two-wheeled motorcycles were defined as optional elements. This strategy ensures compliance with both functional and safety requirements while minimizing the risk of generating infeasible combinations.
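The mandatory–optional strategy can be sketched as subset enumeration: every logical scene contains all mandatory elements plus one subset of the optional ones. The element lists below are illustrative stand-ins (with k optional elements this yields 2**k scene types, so the counts here do not reproduce the paper's 48 scene types, which additionally vary other scene parameters).

```python
from itertools import combinations

def logical_scenes(mandatory, optional):
    """Enumerate 'mandatory + optional subset' logical scene types,
    including the subset-free baseline scene."""
    scenes = []
    for size in range(len(optional) + 1):
        for subset in combinations(optional, size):
            scenes.append(tuple(mandatory) + subset)
    return scenes

# Illustrative element sets; B1/B2 follow the code-mandated designations above.
mandatory = ("road_surface", "tunnel_portal",
             "traffic_signs_B1", "traffic_markings_B2")
optional = ("vegetation", "motorcycle", "car", "pedestrian")
scenes = logical_scenes(mandatory, optional)  # 2**4 = 16 scene types here
```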

3. Sensitivity Analysis Method

3.1. CNN-Based Image Recognition Model Construction

In this study, a convolutional neural network (CNN) was employed to perform sensitivity analysis on the fundamental construction elements of autonomous driving scenarios at urban tunnel portals. The objective was to simulate the first stage of visual perception in autonomous vehicles—namely, scene information classification. The key task at this stage is to design a feature extraction network framework suitable for the characteristics of image data from tunnel portal scenes. Considering the specific visual characteristics of the tunnel portal environment, this study integrates the pyramid pooling module of the Pyramid Scene Parsing Network (PSPNet) with the Residual Network (ResNet) architecture. The constructed network model consists of three main components: the backbone for feature extraction, the neck for multi-scale feature integration, and the head responsible for the final output and classification.
First, tunnel image data collected by vehicle-mounted cameras under autonomous driving mode from ten tunnels at different time periods were organized and pre-processed. The processed data were then fed into the network, where the backbone performed initial feature extraction to prepare for subsequent processing stages. In this study, ResNet50 was adopted as the backbone network, followed by a pyramid pooling module to capture multi-scale contextual information. The extracted global and local features were upsampled and fused through a feature fusion module (FFM), while skip connections were employed to concatenate features from different layers to enrich semantic representation. Finally, through the output network (head), the model accomplished both image segmentation and classification tasks (as illustrated in Figure 3).
The image parameter dataset was first fed into the front-end backbone of the network. According to the principle of backpropagation, when the input values are excessively large, the gradient magnitude during backward propagation also increases, which may reduce the effective learning rate. Since the parameter weights and gradients across different neural network layers can vary by several orders of magnitude, this imbalance significantly increases computational cost and search time. To address this issue, batch normalization (BN) was applied to the input data, and BN layers were also introduced into the intermediate hidden layers. After BN processing, the nonlinear representation capability of the network was enhanced, computational efficiency was optimized, and a stable learning process was ensured.
The batch normalization (BN) algorithm can be summarized as follows. Given a mini-batch input
β = {x_1, x_2, …, x_m},
the standardized outputs are
y_i = BN_{γ,β}(x_i),
computed as:
① Calculate the mean of the mini-batch:
μ_β ← (1/m) Σ_{i=1}^{m} x_i
② Calculate the variance of the mini-batch:
σ_β² ← (1/m) Σ_{i=1}^{m} (x_i − μ_β)²
③ Standardization:
x̂_i ← (x_i − μ_β) / √(σ_β² + ε)
④ Scale change and offset:
y_i ← γ x̂_i + β = BN_{γ,β}(x_i)
⑤ Return value: the learned parameter scale factor γ and shift factor β.
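The four steps above can be sketched in NumPy as follows (an illustrative sketch of standard batch normalization, not the paper's implementation; the mini-batch values and shapes are assumptions):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over a mini-batch x of shape (m, features)."""
    mu = x.mean(axis=0)                    # step 1: mini-batch mean
    var = x.var(axis=0)                    # step 2: mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # step 3: standardization
    return gamma * x_hat + beta            # step 4: scale and shift

# toy mini-batch of m = 3 samples with 2 features each
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
# with gamma = 1, beta = 0, each feature is normalized to
# (near-)zero mean and unit variance across the mini-batch
```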
The three-channel input data are first processed by the front-end backbone module, ResNet50. This module employs a 7 × 7 convolution kernel with 64 filters to generate a 64-channel feature map, followed by pooling with a 3 × 3 window. After 48 subsequent convolution operations, the number of channels expands to 2048, fully extracting the image features. The data then proceed through forward propagation into the pyramid pooling module, where pooling is performed with 1 × 1, 2 × 2, 3 × 3, and 6 × 6 windows to obtain multi-scale feature maps. Next, 1 × 1 convolution reduces the channel dimensionality, and bilinear interpolation upsampling restores the feature maps to an appropriate size before they enter the feature fusion module. The feature fusion module typically employs one of two fusion methods: channel concatenation (abbreviated as Concat) or pixel-wise addition followed by convolution. Assuming the input channels are x1, x2, ..., xₙ and y1, y2, ..., yₙ, the concatenation method performs feature fusion through convolution operations (Equation (7)) to enhance the model’s ability to represent features at different scales.
Concat = Σ_{i}^{n} x_i ∗ K_i + Σ_{i}^{n} y_i ∗ K_{i+n}
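The two fusion modes can be illustrated with a minimal NumPy sketch (channel counts and feature-map sizes are illustrative assumptions, not the network's actual dimensions):

```python
import numpy as np

def fuse_concat(a, b):
    """Channel concatenation: (C1, H, W) + (C2, H, W) -> (C1 + C2, H, W)."""
    return np.concatenate([a, b], axis=0)

def fuse_add(a, b):
    """Pixel-wise addition: channel counts must match, shape is preserved."""
    return a + b

a = np.ones((2, 4, 4))   # e.g. upsampled pyramid features
b = np.zeros((2, 4, 4))  # e.g. backbone features
c1 = fuse_concat(a, b)   # doubles the channel count; a 1x1 conv usually follows
c2 = fuse_add(a, b)      # keeps the channel count; requires aligned shapes
```

Concatenation preserves both feature sets at the cost of more channels, whereas addition keeps the tensor size fixed but mixes the features immediately; the convolution in Equation (7) then learns how the concatenated channels are weighted.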
During the forward propagation process, it is necessary to perform semantic segmentation on the output, dividing it into different categories. The choice of activation function in classification tasks depends on the number of categories: binary classification problems typically use the Sigmoid function, while multi-class classification problems employ the Softmax function. This study selects the Softmax activation function to map the output into a probability distribution of discrete categories, which aligns with the discrete frequency characteristics of the input data. Compared to regression problems that produce continuous outputs, classification modeling better fits the characteristics of the research problem and helps improve prediction accuracy. To further enhance the model’s generalization capability and stability, the mean squared error (L2 norm), commonly used in regression tasks, is introduced into the objective function as the loss function.
The definition of the Softmax function is as follows:
a_j = e^{z_j} / Σ_{i} e^{z_i}
Here, z_j represents the input of the j-th neuron in the final layer, and a_j denotes the output of the j-th neuron in the final layer. The exponential function amplifies the differences between inputs, while Σ_i e^{z_i} denotes the sum of the exponentiated inputs of all neurons in the final layer. The purpose of using the Softmax function is to evaluate the outputs of the final-layer neurons in the form of a probability distribution. The higher the output probability of a specific neuron, the more likely it corresponds to the true class.
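A numerically stable version of this function can be sketched as follows (the subtraction of the maximum logit before exponentiation is a standard implementation detail, not part of the paper's formulation; the example logits are assumptions):

```python
import numpy as np

def softmax(z):
    """Numerically stable Softmax: subtract the max logit before exponentiating."""
    e = np.exp(z - np.max(z))  # shifting by max(z) leaves the result unchanged
    return e / e.sum()

# three-class example: the largest logit receives the highest probability
a = softmax(np.array([2.0, 1.0, 0.1]))
# outputs form a probability distribution (non-negative, summing to 1)
```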
The CNN architecture adopted in this study is based on the standard ResNet50–PSPNet framework and does not represent a methodological contribution. It is employed as a well-established and widely validated baseline to ensure that the observed differences in inference latency can be attributed to scene-element characteristics rather than architectural novelty or optimization bias. Similar configurations have been extensively reported in the literature for scene parsing [13,14] and semantic segmentation [15,16] tasks.
The mathematical formulations of batch normalization, Softmax activation, and feature concatenation presented in this section follow standard definitions and are included for completeness and reproducibility. In the context of this study, these components serve a specific purpose: batch normalization stabilizes intermediate feature distributions to prevent layer-wise computational imbalance; Softmax ensures consistent multi-class probability normalization across element categories; and feature concatenation enables multi-scale feature aggregation required for accurate pixel-level segmentation.

3.2. Training Dataset Construction

The training data for autonomous driving scenarios in urban tunnel portal sections must simultaneously meet two requirements: “comparable feature sensitivity” and “balanced sample distribution.”
(1)
Combinatorial Generation Rules
According to national standards, every scene must contain the road, tunnel portal, and traffic markings as mandatory elements. The remaining six elements—traffic signs, signboards, portal vegetation, portal structures, cars, and two-wheeled motorcycles—are set as optional. The theoretical number of combinations is 2⁶ = 64. To avoid category imbalance caused by purely static scenes, the 16 combinations featuring “only vegetation/structures without vehicles or signs” are excluded. The remaining 48 logical scenarios, labeled Scene-01 to Scene-48 (see Appendix A, Table A1), form the training framework.
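The enumeration above can be sketched with `itertools` (a minimal sketch; the element names are placeholders, and the exclusion condition encodes one plausible reading of the 16 dropped combinations, namely scenes containing neither cars nor motorcycles):

```python
from itertools import product

MANDATORY = ["road", "tunnel_portal", "traffic_marking"]
OPTIONAL = ["traffic_sign", "signboard", "vegetation",
            "structure", "car", "motorcycle"]

scenes = []
for mask in product([0, 1], repeat=len(OPTIONAL)):  # 2^6 = 64 subsets
    chosen = [e for e, m in zip(OPTIONAL, mask) if m]
    # exclusion rule (assumed reading): drop purely static scenes
    # in which no dynamic vehicle class is present
    if "car" not in chosen and "motorcycle" not in chosen:
        continue
    scenes.append(MANDATORY + chosen)

n_scenes = len(scenes)  # 64 - 16 = 48 logical scenarios
```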
(2)
Annotation System and Quality Control
To meet the dual requirements of label accuracy and scalability for “feature sensitivity evaluation,” this study adopts a “semantic-instance” two-tier annotation system and establishes a three-level quality control process to ensure the mask error remains below 0.5%, thereby preventing annotation noise from interfering with the comparison of CNN inference speeds.
(1) Two-tier labeling system
① Semantic level: Format—8-bit single-channel PNG, values 0–9; 0 represents background, and 1–9 correspond sequentially to road, tunnel portal, traffic sign, traffic marking, signboard, vegetation, structure, car, and motorcycle; resolution consistent with the original image (1024 × 512), using nearest-neighbor interpolation to avoid category aliasing caused by edge anti-aliasing.
② Instance level: Unique Instance IDs (starting from 1000 and incrementing) are assigned only to the dynamic elements of “cars” and “motorcycles” to support subsequent validation of instance segmentation model extensions; the ID encoding is written into an independent 16-bit PNG layer, sharing the same filename as the semantic label file but with a different suffix for easy reading and maintenance.
(2) Three-level quality control process
Step 1: Pre-labeling
Manually draw rough polygons using LabelMe, then invoke Meta AI’s Segment Anything Model (SAM, ViT-H checkpoint) to generate initial masks. For high-contrast areas (e.g., tunnel entrances with strong backlighting), apply CLAHE enhancement before inputting to SAM to improve edge recall. The average pre-labeling time is 2.1 s per frame, with 92% of elements achieving IoU ≥ 0.85, laying a solid foundation for subsequent manual correction.
Step 2: Cross-manual verification
Two annotators independently corrected the pre-annotated results without knowing each other’s outcomes. A pixel-level IoU was used as the consistency metric: elements with IoU ≥ 0.95 were directly approved, while those with IoU < 0.95 were forced into a “dispute zone” for online collaborative redrawing by both parties until IoU ≥ 0.95 was achieved. This process increased the average IoU from 0.87 to 0.97, with disputed elements accounting for 6% and an average redrawing time of 4.5 s per element.
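The pixel-level IoU used as the consistency metric in this step can be sketched as follows (an illustrative sketch; the toy masks stand in for two annotators' corrections and are not from the study's data):

```python
import numpy as np

def mask_iou(a, b):
    """Pixel-level IoU between two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 1.0

m1 = np.zeros((4, 4), dtype=bool); m1[:2] = True  # annotator 1: 8 px
m2 = np.zeros((4, 4), dtype=bool); m2[:3] = True  # annotator 2: 12 px
iou = mask_iou(m1, m2)  # 8 / 12, below the 0.95 threshold -> dispute zone
```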
Step 3: Expert Sampling Inspection
Randomly select 10% of the frames that have passed cross-validation and submit them to engineers with over 3 years of perception algorithm experience for blind review. Review criteria: ① edge deviation > 2 pixels; ② incorrect category labeling; and ③ duplicate or skipped instance IDs. Any error is counted as a defect. The defect rate must be less than 0.5% for overall approval; otherwise, expand the sampling to 20% and return for re-labeling. The measured defect rate was 0.38%, meeting the quality threshold.
(3) Quality Indicators and Results
Through the three-level process of “automatic initialization—manual cross-checking—expert sampling,” the pixel-level error of 3428 element masks across 376 images was controlled within 0.5% (Table 5), providing a high-consistency, low-noise ground truth foundation for the fair comparison of CNN sensitivity models. Although the final dataset consists of 376 images, its size is sufficient for the objective of this study, which focuses on element-level inference latency comparison rather than large-scale classification generalization. Unlike accuracy-driven perception benchmarks that require extensive data diversity, the proposed sensitivity analysis relies on controlled experimental conditions and low-variance measurements of inference time.
To ensure statistical reliability, all images undergo strict screening in terms of illumination range, motion blur, occlusion, and annotation quality, effectively reducing noise that would otherwise require larger sample sizes. Each of the 48 logical scene combinations is represented by multiple frames collected across different tunnels, and inference latency is measured repeatedly under identical network and hardware configurations.
(3)
Dataset Partitioning and Augmentation
To validate the generalization capability of the feature sensitivity model while avoiding evaluation bias caused by tunnel specificity or class imbalance in scenarios, this paper implements a rigorous hierarchical partitioning of 376 frames of “golden samples” and adopts a conservative photometric augmentation scheme. This ensures that differences in CNN inference speed solely reflect the “feature category” itself, without interference from image content shifts or geometric distortions.
A two-layer sampling strategy of “intra-class balancing and tunnel isolation” is adopted:
(1) Intra-class balancing: at least 6 frames are retained for each of the 48 logical scenarios and split in an 80%/12.5%/7.5% ratio, so that the category distribution of the training, validation, and test sets remains consistent with that of the overall dataset (χ² test, p = 0.21, no significant deviation).
(2) Tunnel isolation: the tunnel numbers included in the test set do not appear in the training or validation sets, achieving “tunnel-level” out-of-domain validation to prevent inflated sensitivity scores due to the model memorizing background textures.
The final division results are shown in Table 7.
The core assumption of the sensitivity evaluation is that differences in inference speed are driven solely by feature categories. Therefore, only lossless photometric augmentations are applied during the training phase, with the following parameters: brightness shift of ±10% (0.9–1.1× linear gain); contrast adjustment of ±5% (0.95–1.05× slope); and additive Gaussian noise with σ = 2, truncated at ±3σ. Augmentations are applied online only during training with a probability of 0.5, while the validation and test sets retain their original pixel values. This strategy expands the sample space for illumination robustness while avoiding the edge misalignment caused by geometric transformations such as rotation, scaling, and cropping, thereby ensuring that timing differences across features in the CNN are not confounded by the coupling of deformation and computational load. Pre-experimental results indicate that, within the above augmentation range, the inference time fluctuation of ResNet50 is <0.3 ms, well below the threshold for notable inter-feature differences (1 ms), meeting the requirements for fair comparison. Through the combined strategy of class-balanced, domain-isolated partitioning and geometry-preserving photometric augmentation, this dataset provides an unbiased and reproducible experimental foundation for the subsequent sensitivity ranking.
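The photometric augmentation pipeline described here can be sketched as follows (a minimal sketch under the stated parameter ranges; the seeded generator and toy image are assumptions for reproducibility, not the study's pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def photometric_augment(img, p=0.5):
    """Brightness gain +/-10%, contrast slope +/-5%, Gaussian noise sigma=2 (+/-3 sigma)."""
    if rng.random() >= p:  # applied online with probability p during training only
        return img
    out = img.astype(np.float32)
    out *= rng.uniform(0.9, 1.1)                          # brightness shift
    mean = out.mean()
    out = mean + (out - mean) * rng.uniform(0.95, 1.05)   # contrast adjustment
    noise = np.clip(rng.normal(0.0, 2.0, out.shape), -6.0, 6.0)  # truncated at 3*sigma
    return np.clip(out + noise, 0, 255).astype(np.uint8)

img = rng.integers(0, 256, (512, 1024), dtype=np.uint8)
aug = photometric_augment(img, p=1.0)  # shape and dtype are preserved
```

Note that no rotation, scaling, or cropping is applied, so pixel-to-label alignment is never disturbed.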
(4) Statistical Characteristics
The dataset resolution is uniformly scaled to 1024 × 512, with the pixel proportion of elements consistent with the real vehicle perspective (road 40%, vehicles 6%, etc.); the average brightness is 78 ± 21, covering typical transition zones of tunnels; the class imbalance ratio of 1:14 is reduced to 1:1.8 using median-frequency weighting. Annotation errors in 376 frames are less than 0.5%, meeting the fairness requirement for sensitivity comparison.
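Median-frequency weighting, used above to mitigate class imbalance, can be sketched as follows (an illustrative sketch of the standard technique; the toy label map and class counts are assumptions):

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    """Per-class loss weight = median class frequency / class frequency."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for lab in label_maps:
        counts += np.bincount(lab.ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    # rare classes get weights > 1 relative to the median-frequency class
    return np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)

# toy label map: class 0 covers 99 pixels, class 1 covers 1 pixel
lab = np.zeros((10, 10), dtype=np.int64)
lab[0, 0] = 1
w = median_frequency_weights([lab], num_classes=2)
# the rare class receives the larger loss weight
```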

3.3. Inference Speed

To quantify the perception sensitivity of elements in urban tunnel portal scenes, this paper adopts “single-frame forward inference time” as the core metric, replacing traditional evaluations such as mAP or IoU. This indicator directly reflects the real-time processing differences in convolutional neural networks across various elements, better aligning with the stringent low-latency requirements of autonomous driving systems. (Here, “inference time” refers exclusively to the CNN forward-pass duration measured on-board, excluding image pre-processing, NMS post-processing, and I/O transfer. This metric is chosen because it is (i) hardware-sensitive and (ii) directly linked to the computational cost of detecting each element class.)
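A measurement of this kind typically uses a warm-up phase and repeated timed passes, as in the following sketch (illustrative only: the stand-in workload replaces the actual ResNet50–PSPNet forward pass, and the warm-up/repeat counts are assumptions):

```python
import time
import statistics

def time_forward_pass(forward, inputs, warmup=10, repeats=100):
    """Time only the forward call: warm up first, then report the median latency in ms."""
    for _ in range(warmup):        # warm-up absorbs one-off allocation/caching costs
        forward(inputs)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        forward(inputs)            # pre-processing, NMS, and I/O stay outside this window
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)  # median is robust to scheduler jitter

# stand-in workload in place of the real CNN forward pass
latency_ms = time_forward_pass(lambda x: sum(i * i for i in x), list(range(1000)))
```

On a GPU, an equivalent harness would additionally need device synchronization before each timestamp so that asynchronous kernel launches are not under-counted.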
It should be emphasized that inference latency is not assumed to be equivalent to perceptual accuracy, nor is it intended to replace conventional performance metrics such as IoU or mAP. In this study, inference latency is adopted as a hardware-aware, element-level sensitivity indicator to quantify the relative computational burden associated with different scene elements under real-time constraints. To ensure the interpretability of this metric, the CNN architecture (ResNet50–PSPNet), hardware platform, input resolution, illumination range, blur level, and annotation quality are strictly controlled throughout the experiments. Under these controlled conditions, all element categories achieve stable segmentation accuracy, and the observed variations in inference latency primarily reflect differences in feature complexity, spatial distribution, and contextual coupling induced by individual element types, rather than recognition failure or data bias.

4. Experimental Design

4.1. Experimental Vehicles and Equipment

Before the experiment, a high-performance laptop equipped with an NVIDIA RTX 3060 GPU and a high-speed SSD was prepared for data storage. The experiment used ROS (Robot Operating System) Noetic Ninjemys (version 1.15.8, Ubuntu 20.04) to synchronously record video and CAN signals via rosbag. The test vehicle was a Li Auto L8 MAX, featuring an 8 MP front-view camera and 2 MP front-side-view cameras with a 120° FOV (Figure 4). Intrinsic parameters were obtained by checkerboard calibration with OpenCV 4.7, achieving a reprojection error of <0.12 pixels; for the extrinsic parameters, the vehicle body coordinate origin was defined at the wheel center, RTK-GNSS provided ground truth, and hand–eye calibration yielded an RMS projection error of <5 cm at 50 m. The Li Auto L8 MAX was selected as the experimental vehicle for the following reasons: (1) the model achieved a market share of over 18% among L2+ passenger vehicles in China during 2023–2024, making it highly representative; and (2) its officially reported AEB false-trigger rate is below 0.1 events per 1000 km, and the accident rate associated with its intelligent driving functions ranks among the lowest in the industry, minimizing takeover noise induced by non-tunnel-related factors. A synchronous acquisition system installed in the vehicle obtained real-time data from the vehicle’s sensors. With safety ensured, the autonomous driving mode was activated for the experiment and data were recorded using the equipment.
The experimental platform configuration is as follows: the server side uses an NVIDIA GeForce RTX 3060 GPU with 12 GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA) running TensorFlow 2.4 on Windows 11, with a computing environment based on CUDA 11 and cuDNN 8; the mobile side uses an NVIDIA Jetson Xavier NX with 6 GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA) running Ubuntu 20.04, also based on CUDA 11 and cuDNN 8.

4.2. Experimental Tunnel Selection

Considering the reliability and validity of the survey questionnaire, questionnaires were distributed to 273 professional tunnel designers with over 5 years of experience and 82 researchers specializing in autonomous driving scenario recognition with more than 2 years of experience; a total of 254 questionnaires were collected. Based on the survey results, this study selected urban tunnel portal sections across different regions of China as research subjects—such as the Dapo Tunnel and Zhaomushan Tunnel in Chongqing and the Beizhan Tunnel in Guiyang—totaling 43 tunnels (Table 8). Additionally, 10 professionals and 12 experimental personnel with over 5 years of driving experience were invited to operate the autonomous vehicles for on-site filming and evaluation. According to design speed, the geometric characteristics of the portal section, and design features such as portal shape and lighting, 10 representative urban tunnels (Table 9 and Table 10) were selected as the primary focus of this study. Data collection was conducted during off-peak hours (09:30–11:00 and 14:00–16:30) with visibility exceeding 1 km. The sampling protocol required each scenario type to pass through each tunnel at least 5 times at a speed of 40 ± 5 km/h, ensuring that detectable elements accounted for 5–30% of the image pixels.

4.3. Experimental Scenario Division

Through the investigation of the experimental tunnel, it was determined that the starting position of the experimental adjacent section should be located 150–200 m from the tunnel entrance, with the reinforcement section covering 20–50 m upon entering or exiting the tunnel, and the transition section spanning 30–100 m. Once the experimental preparations are completed, the driver positions the vehicle on the main road 1 km away from the tunnel entrance. Upon receiving the start command via a walkie-talkie, the driver switches the vehicle to autonomous driving mode and proceeds at a constant speed toward the tunnel. Each experimental tunnel is tested sequentially, with five passes conducted per tunnel for data collection. Figure 5 illustrates the entire experimental tunnel and the data collection segments.

5. Results and Discussion

The collected images (216,000 original frames) were filtered according to the following criteria:
(1) Brightness exclusion: frames with an average grayscale <30 or >220 (8-bit) were discarded;
(2) Motion blur: frames with a Laplacian variance <100 were discarded;
(3) Occlusion exclusion: frames where the element boundaries could not be 100% confirmed by the human eye were discarded.
Ultimately, 376 “golden samples” (Table 11) were retained, covering all 48 scene categories, with 6–10 frames per category. A χ² test of class frequencies gave p = 0.18, showing no significant skew.
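The brightness and blur criteria above can be sketched as follows (an illustrative NumPy sketch using the 4-neighbour Laplacian stencil; the checkerboard test image is an assumption, not study data, and criterion (3) is a manual check that cannot be automated here):

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response, a standard blur/focus measure."""
    g = gray.astype(np.float32)
    resp = (-4.0 * g[1:-1, 1:-1] + g[:-2, 1:-1] + g[2:, 1:-1]
            + g[1:-1, :-2] + g[1:-1, 2:])
    return float(resp.var())

def keep_frame(gray, lo=30, hi=220, blur_thresh=100):
    """Criteria (1) and (2): mean grayscale in [30, 220] and Laplacian variance >= 100."""
    return lo <= gray.mean() <= hi and laplacian_variance(gray) >= blur_thresh

# a sharp high-contrast checkerboard passes; uniform (blurred) or dark frames fail
cb = ((np.indices((16, 16)).sum(0) % 2) * 255).astype(np.uint8)
```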
As shown in Figure 6, Figure 7 and Figure 8, as the number of training epochs increases, the loss value of the algorithm converges toward zero.

5.1. Perception Sensitivity Ranking Results

5.1.1. Identification Sequence in Full-Factor Scenarios

The first set of experiments simulated scenarios where all elements were present (Table 9, Figure 9). Statistical analysis was conducted on the image inference rates of elements such as roads (A1), tunnel portals (A2), and traffic signs (B1). The results showed that, when the scene elements were relatively abundant (nine elements), the CNN prioritized the recognition of roads and traffic markings, followed by cars and two-wheeled motorcycles, in the following order: roads > traffic markings > cars > motorcycles. The reason for this is that roads and markings are the primary conditions for vehicles to maintain normal travel, while cars and motorcycles are dynamic obstacles; hence, they have higher priority in the recognition sequence. The recognition order of the remaining elements was as follows: traffic signs > tunnel portal structures > traffic signboards > tunnel portal vegetation > tunnel portals (Table 12, Table 13 and Table 14). Overall, the CNN effectively recognized all basic elements, but the specific order dynamically adjusted according to the vehicle’s position and spatiotemporal changes in the scene.
The 95% confidence intervals between classes did not overlap, and all ICC values were >0.80.

5.1.2. Constant Sensitivity Elements

The second set of experiments analyzed the sensitivity of the CNN in recognizing traffic signs, traffic markings, and signboards. Two scenario types were designed: the first included roads, tunnel portals, traffic signs, and traffic markings but no signboards; the other included roads, tunnel portals, traffic markings, and signboards but no traffic signs, with all other elements kept consistent (Table 15). The detection results (Figure 10) showed that traffic markings were the first to be recognized in both scenarios, and traffic signs were recognized faster than signboards. This outcome reflects the logical characteristics of autonomous driving image recognition: because the training dataset is designed primarily around national standards and regulations, traffic signs with legal validity exhibit a higher recognition priority, while signboards lag behind (Table 16).

5.1.3. Sensitivity Variation Factors

The third set of experiments evaluated the CNN's sensitivity in detecting the basic environmental elements: vegetation at the tunnel entrance and structures near the tunnel entrance. Three scenario types were designed for this analysis, as detailed in Table 11. The detection results for these three scenarios (Figure 11) indicate that structures near the tunnel entrance are identified first, followed by vegetation at the tunnel entrance (Table 17 and Table 18). This is likely because the structures have clearer contours than the vegetation, are located close to the tunnel portal, and have a strong association with the entrance, allowing them to be identified first.
The fourth set of experiments evaluated the CNN's detection of dynamic operational environment elements: vehicles (in a broad sense) and two-wheeled motorcycles. Three working conditions were designed for such scenarios, as shown in Table 19. The image inference speeds for the three working conditions in Table 20 show that vehicles (in a broad sense) are recognized slightly faster than motorcycles. This result is influenced, on the one hand, by the experimental tunnel environment: some of the surveyed cities restrict motorcycles, so motorcycles do not participate in urban traffic there. On the other hand, autonomous vehicles travel within motor vehicle lanes during operation, where the primary objects of recognition are vehicles (in a broad sense); motorcycles occur only occasionally, which explains their slightly lower image inference speed (Figure 12).

5.2. Sensitivity Comparison Under Different Element Combinations

5.2.1. Traffic Signs and Markers

The fifth set of experiments analyzed the recognition of traffic signs, markings, and signboards (Category B) versus basic environmental elements (Category C) at tunnel portals, with a total of six working conditions (Table 21). The results showed that traffic signs and signboards were recognized faster than portal vegetation and structures, with the order being traffic signs > signboards > basic environmental elements (Figure 13, Table 22). Notably, in this set of experiments the recognition speed of signboards was higher than that of portal structures, which is inconsistent with the results of the first group. The reasons may be as follows: (1) this group included only four types of calibration elements, reducing interference and thereby improving recognition accuracy and speed; and (2) in terms of spatial positioning, signboards are located before portal structures and provide stronger guidance in the absence of traffic signs. Comprehensive analysis indicates that this difference is reasonable. Meanwhile, the conclusion that traffic signs are recognized faster than signboards remains consistent with the results of the first set of experiments.

5.2.2. Tunnel Portal Structures and Portal Vegetation

The sixth set of experiments examined scenarios where both the basic environment (C) and the dynamic operating environment (D) changed simultaneously, with a total of nine conditions designed (Table 23). Among them, Scenario 9 contained no optional elements and served as a reference. The comparison of inference speeds indicated that, when dynamic and basic elements appear simultaneously, dynamic elements are more likely to be prioritized for recognition. For example, the recognition speed of two-wheeled motorcycles was consistently faster than that of portal vegetation or portal structures, and the recognition speed of cars was likewise faster than that of the corresponding basic environmental elements. The results from Scenarios 5–8, which each retained only a single element, further revealed a recognition priority order of cars > two-wheeled motorcycles > portal structures > portal vegetation (Figure 14, Table 24). This order is consistent with the results of the first set of experiments, once again confirming the priority of the dynamic operating environment in CNN perception.

5.2.3. Cars and Two-Wheeled Motorcycles

The seventh set of experiments evaluated scenarios in which traffic signs, markings, and signboards (Category B) change simultaneously with the dynamic operational environment (Category D), comprising a total of six working conditions (Table 25, Figure 15). The inference speeds of the images under each condition (Table 26) show that vehicle-class images were processed the fastest, while signboards were processed the slowest. Scenario 3 and Scenario 6 contained only a single Category B element, so their values are for reference only. These results are consistent with the conclusions from the first group of experiments.

5.2.4. Priority Recognition of Dynamic Elements When Multiple Types of Elements Coexist

The eighth set of experiments evaluated scenarios in which the three major categories of elements, excluding infrastructure, change simultaneously; 18 working conditions were designed for this type of scenario (Table 27). The image inference rates corresponding to these 18 conditions are compiled in Table 28. From this group of evaluation experiments, it can be observed that the autonomous driving system still maintains the highest inference rate for vehicles (in a broad sense), followed by two-wheeled motorcycles, traffic signs, tunnel portal structures, signboards, and portal vegetation. This result again aligns with the findings of the first group of experiments.

6. Discussion

6.1. Interpretation of Perceptual Sensitivity Rankings

The perceptual sensitivity rankings reported in this study describe the inference-time behavior of a specific convolutional neural network (ResNet50–PSPNet) when processing visual elements in urban tunnel portal scenes. These rankings reflect relative computational latency at the perception stage, rather than engineering causality or real-world safety performance.
Elements such as road surfaces and lane markings exhibit shorter inference times, which is likely related to their geometric continuity, high contrast, and stable spatial distribution across samples. In contrast, tunnel portal outlines, vegetation, and portal-related structures require longer inference times, potentially due to irregular boundaries, complex textures, and partial occlusion effects that increase feature extraction and multi-scale fusion costs. These characteristics are consistent with known behaviors of vision-based semantic segmentation networks in visually heterogeneous and semi-enclosed environments.
It should be emphasized that the reported rankings do not indicate the importance of elements from a traffic safety or infrastructure design perspective. The results are constrained to a fixed network architecture, hardware configuration, and controlled visual conditions, and therefore represent model-level perception characteristics only.

6.2. Engineering Implications and Scope of Applicability

The findings of this study are not intended to provide direct guidance for tunnel lighting design, signage placement, or V2X deployment. The experiments do not involve simulation or testing of alternative infrastructure configurations, nor do they establish a quantitative relationship between perception latency and safety-related outcomes such as collision risk, stopping distance, or takeover behavior.
Within these limitations, the results may serve as a diagnostic reference for identifying elements that impose higher or lower perceptual load on vision-based models at tunnel portals. From an engineering research perspective, such information can assist in prioritizing elements for further investigation in subsequent studies that incorporate closed-loop simulations, multi-sensor fusion, or behavioral validation. In this sense, the present work complements, rather than replaces, conventional tunnel design analysis and traffic safety evaluation.

6.3. Methodological Contribution and Boundary Conditions

A primary contribution of this study lies in the requirement-driven methodological framework for selecting perception-relevant elements and comparing their sensitivity using inference latency as a hardware-aware metric. By integrating expert screening, controlled scenario construction, and pixel-level annotation, the framework enables reproducible comparison of perceptual load across element categories under consistent conditions.
Nevertheless, several boundary conditions must be noted. First, the sensitivity rankings are dependent on the selected CNN architecture and are not expected to generalize directly to other models without re-evaluation. Second, the analysis focuses exclusively on the perception layer and does not account for downstream planning, control, or human–machine interaction processes. Third, although the dataset size is sufficient for detecting millisecond-level latency differences, it remains smaller than large-scale autonomous driving benchmarks.
These constraints define the applicable scope of the conclusions and highlight the need for cautious interpretation.

7. Conclusions

This study proposes a requirement-driven framework for identifying and comparing perception-critical elements in urban tunnel portal scenes, using CNN inference latency as an indicator of perceptual load. Based on expert screening, controlled scenario combinations, and real-world tunnel imagery, the perception sensitivity of nine categories of elements was quantitatively analyzed.
The main conclusions are summarized as follows:
(1) A functional–safety screening process identified nine perception-relevant elements at urban tunnel portals, forming a tunnel-specific element system that is not explicitly addressed in existing open-road datasets.
(2) A balanced dataset of 376 pixel-level annotated images covering 48 logical scene combinations was constructed, enabling fair and reproducible comparison of element-level inference latency.
(3) Under the tested conditions and network architecture, a stable latency ranking was observed, with road surfaces and lane markings exhibiting the shortest inference times, followed by dynamic obstacles, traffic signs, and portal-related environmental elements.
(4) The reported rankings characterize model-level perception behavior rather than engineering or safety causality. Accordingly, the results are suitable for perception-oriented diagnostics and scenario construction, but should not be interpreted as standalone infrastructure design recommendations.

8. Limitations

This study is subject to several limitations. It does not establish a quantitative mapping between perception latency and safety outcomes, nor does it evaluate alternative infrastructure interventions. The conclusions are restricted to vision-based perception under clear conditions and to the selected CNN architecture. Future work will extend the framework to larger datasets, adverse environmental conditions, multi-sensor perception, and closed-loop perception–decision–control evaluations. The proposed sensitivity analysis focuses on element-level inference latency under fixed network and hardware configurations. While this metric effectively captures real-time computational constraints, it neither implies equivalence to perception accuracy nor fully represents end-to-end driving performance.
Overall, this work provides a methodological basis and a benchmark dataset for perception sensitivity analysis at urban tunnel portals, contributing a perception-oriented perspective for future research on autonomous driving in complex semi-enclosed environments.

Author Contributions

M.X.: conceptualization, methodology, formal analysis, writing—original draft; B.L.: data curation, software, validation; H.L.: data curation, software, validation; C.C.: formal analysis, writing; H.Z.: supervision, funding acquisition; S.Z.: supervision, funding acquisition, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (No. 52378391).

Institutional Review Board Statement

This study was conducted in accordance with the guidelines of the Ethics Review Committee of Chongqing Jiaotong University, and approved on 18 March 2025 (Approval No. S2025-173.03).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Forty-eight types of logical scenarios.
Scene ID | Traffic Signs B1 | Traffic Signboards B3 | Tunnel Portal Vegetation C1 | Tunnel Portal Structures C2 | Cars D1 | Motorcycles D2 | Number of Tunnels Appearing | Frames per Second
Scene01 | √ | √ | √ | √ | √ | √ | 10 | 8
Scene02 | √ | √ | √ | √ | √ | - | 10 | 8
Scene03 | √ | √ | √ | √ | - | √ | 10 | 7
Scene04 | √ | √ | √ | √ | - | - | 10 | 7
Scene05 | √ | √ | √ | - | √ | √ | 10 | 8
Scene06 | √ | √ | √ | - | √ | - | 10 | 7
Scene07 | √ | √ | √ | - | - | √ | 10 | 7
Scene08 | √ | √ | √ | - | - | - | 10 | 6
Scene09 | √ | √ | - | √ | √ | √ | 10 | 8
Scene10 | √ | √ | - | √ | √ | - | 10 | 7
Scene11 | √ | √ | - | √ | - | √ | 10 | 7
Scene12 | √ | √ | - | √ | - | - | 10 | 6
Scene13 | √ | √ | - | - | √ | √ | 10 | 8
Scene14 | √ | √ | - | - | √ | - | 10 | 7
Scene15 | √ | √ | - | - | - | √ | 10 | 7
Scene16 | √ | √ | - | - | - | - | 10 | 6
Scene17 | √ | - | √ | √ | √ | √ | 10 | 8
Scene18 | √ | - | √ | √ | √ | - | 10 | 7
Scene19 | √ | - | √ | √ | - | √ | 10 | 7
Scene20 | √ | - | √ | √ | - | - | 10 | 6
Scene21 | √ | - | √ | - | √ | √ | 10 | 8
Scene22 | √ | - | √ | - | √ | - | 10 | 7
Scene23 | √ | - | √ | - | - | √ | 10 | 7
Scene24 | √ | - | √ | - | - | - | 10 | 6
Scene25 | √ | - | - | √ | √ | √ | 10 | 8
Scene26 | √ | - | - | √ | √ | - | 10 | 7
Scene27 | √ | - | - | √ | - | √ | 10 | 7
Scene28 | √ | - | - | √ | - | - | 10 | 6
Scene29 | √ | - | - | - | √ | √ | 10 | 8
Scene30 | √ | - | - | - | √ | - | 10 | 7
Scene31 | √ | - | - | - | - | √ | 10 | 7
Scene32 | √ | - | - | - | - | - | 10 | 6
Scene33 | - | √ | √ | √ | √ | √ | 10 | 8
Scene34 | - | √ | √ | √ | √ | - | 10 | 7
Scene35 | - | √ | √ | √ | - | √ | 10 | 7
Scene36 | - | √ | √ | √ | - | - | 10 | 6
Scene37 | - | √ | √ | - | √ | √ | 10 | 8
Scene38 | - | √ | √ | - | √ | - | 10 | 7
Scene39 | - | √ | √ | - | - | √ | 10 | 7
Scene40 | - | √ | √ | - | - | - | 10 | 6
Scene41 | - | √ | - | √ | √ | √ | 10 | 8
Scene42 | - | √ | - | √ | √ | - | 10 | 7
Scene43 | - | √ | - | √ | - | √ | 10 | 7
Scene44 | - | √ | - | √ | - | - | 10 | 6
Scene45 | - | √ | - | - | √ | √ | 10 | 8
Scene46 | - | √ | - | - | √ | - | 10 | 7
Scene47 | - | √ | - | - | - | √ | 10 | 7
Scene48 | - | √ | - | - | - | - | 10 | 6
Note: Traffic sign B1, road marking B2, signboard B3, vegetation at opening C1, structure at opening C2, automobile D1, two-wheeled motorcycle D2; √ indicates the item is present; - indicates the item is absent.
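The count of 48 follows from a simple combinatorial rule. Road marking B2 and the infrastructure elements are always present, so only the six elements in the table header toggle. Assuming, as the presence/absence pattern in Table A1 suggests (an inference, not an explicit statement in the paper), that traffic signs B1 and signboards B3 are never both absent, the enumeration gives 2^6 − 2^4 = 48 logical scenes:

```python
from itertools import product

# Optional elements from Table A1's header; B2 and the A-layer
# infrastructure elements are mandatory in every scene.
OPTIONAL = ["B1", "B3", "C1", "C2", "D1", "D2"]

scenes = []
for flags in product([True, False], repeat=len(OPTIONAL)):
    present = {e for e, on in zip(OPTIONAL, flags) if on}
    # Assumed exclusion rule: at least one of B1/B3 must appear.
    if not present & {"B1", "B3"}:
        continue
    scenes.append(present)

print(len(scenes))  # prints 48: 64 raw combinations minus 16 lacking both B1 and B3
```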

References

  1. Gao, F.; Duan, J.; Han, Z. Automatic virtual test technology for intelligent driving systems considering both coverage and efficiency. IEEE Trans. Veh. Technol. 2020, 69, 14365–14376. [Google Scholar] [CrossRef]
  2. Chen, Q.; Wang, X.; Yang, J. Trajectory-following guidance based on a virtual target and an angle constraint. Aerosp. Sci. Technol. 2019, 87, 448–458. [Google Scholar] [CrossRef]
  3. Ma, Z.; Sun, J.; Wang, Y. A two-dimensional simulation model for modelling turning vehicles at mixed-flow intersections. Transp. Res. Part C Emerg. Technol. 2017, 75, 103–119. [Google Scholar] [CrossRef]
  4. Liu, Y.; Peng, H.; Chen, Q. Machine vision performance under extreme luminance contrast: A field study at urban tunnel portals. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21352–21363. [Google Scholar]
  5. Wang, Z.; Zhang, J.; Zhao, Y. Visual perception degradation of drivers under sudden illumination transition at tunnel entrance and exit. J. Traffic Transp. Eng. 2020, 7, 583–592. [Google Scholar]
  6. Zhao, L.; Lu, S.; Chen, T. Deep symmetric network for underexposed image enhancement with recurrent attentional learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12075–12084. [Google Scholar]
  7. Ma, Q.; Ma, L.; Wang, J. Research on visual fusion technology at the entrance and exit of mountain highway tunnels. Laser Infrared 2023, 53, 1–9. [Google Scholar] [CrossRef]
  8. Huang, D.; Jiang, H.; Xu, C.J. A new design method of shield tunnel based on the concept of minimum bending moment. Appl. Sci. 2022, 12, 1082. [Google Scholar] [CrossRef]
  9. Caesar, H.; Bankiti, V.; Lang, A.H. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11618–11628. [Google Scholar]
  10. Sun, P.; Kretzschmar, H.; Dotiwalla, X. Waymo Open Dataset: An autonomous driving dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2443–2451. [Google Scholar]
  11. Liu, Y.; Zhang, H. Machine Learning for Transportation Safety; Springer: Berlin/Heidelberg, Germany, 2024; pp. 105–120. [Google Scholar]
  12. Xie, X.; Cheng, W. A fusion method of infrared and visible light for tunnel leakage detection based on U2-Net. Mod. Electron. Tech. 2024, 47, 1–7. [Google Scholar]
  13. Zhang, Q.; Fu, Y. Effective traffic density recognition based on ResNet-SSD with feature fusion and attention mechanism in normal intersection scenes. Expert Syst. Appl. 2025, 261, 125508. [Google Scholar] [CrossRef]
  14. Cai, F.; Qu, Z.; Xia, S.; Wang, S. A method of object detection with attention mechanism and C2f_DCNv2 for complex traffic scenes. Expert Syst. Appl. 2025, 267, 126141. [Google Scholar] [CrossRef]
  15. Guo, X.; Zhou, W.; Liu, T. Contrastive learning-based knowledge distillation for RGB-thermal urban scene semantic segmentation. Knowl. Based Syst. 2024, 292, 111588. [Google Scholar] [CrossRef]
  16. Yang, B.; Yang, S.; Wang, P.; Wang, H.; Jiang, J.; Ni, R.; Yang, C. FRPNet: An improved Faster-ResNet with PASPP for real-time semantic segmentation in the unstructured field scene. Comput. Electron. Agric. 2024, 217, 108623. [Google Scholar] [CrossRef]
Figure 1. Hierarchical construction of the operational scenario for the portal section of an urban tunnel.
Figure 2. Relationship diagram of multi-level scenes at the portal section of an urban tunnel.
Figure 3. Convolutional neural network judgment model.
Figure 4. Experimental equipment and vehicle.
Figure 5. The entire experimental tunnel and the section for collecting experimental data. Red dashed line indicates the driving segment.
Figure 6. Training and validation loss curves.
Figure 7. Training and validation mIoU curves.
Figure 8. Early stopping criterion.
Figure 9. Scene recognition results of the first group of images. The Chinese text reads “Yun Shan Tunnel”.
Figure 10. Scene recognition results for the second group of images. The Chinese text reads “Yun Shan Tunnel”.
Figure 11. Scene recognition results of the third group of images. The Chinese text reads “Qi Chong No.2 Tunnel”.
Figure 12. Scene recognition results of the fourth group of images.
Figure 13. Scene recognition results of the sixth group of images. The Chinese text reads “Yun Shan Tunnel”.
Figure 14. Scene recognition results for the fifth group of images. The Chinese text reads “Nanjing Yangtze River Tunnel”.
Figure 15. Scene recognition results of the seventh group of images. The Chinese text reads “Nanjing Yangtze River Tunnel”.
Table 1. Four-layer scene hierarchy.
Layer | Layer Name (Figure 1) | Attribute Name (Code) | Data Type | Unit/Range | Pixel-Level Mask ID | Model Usage (Section 3.1) | Sensitivity Metric (Section 5.2)
1 | Tunnel-infrastructure | Portal outline (PO) | float | [0, 1] contour | 2 | multi-scale input channel | delay_PO (ms)
1 | Tunnel-infrastructure | Road boundary (RB) | bool | 0/1 | 1 | loss weight × 2.0 | delay_RB (ms)
2 | Environmental-perception | Illuminance level (IL) | int | 1–5 (klx bins) | - | brightness gain mask | delay_IL (ms)
2 | Environmental-perception | Retro-reflectivity (RR) | float | [50, 500] mcd·m⁻²·lx⁻¹ | 4 | online aug. trigger | delay_RR (ms)
3 | Dynamic-traffic | Motorcycle count (MC) | int | 0–10 | 9 | aug. prob. × 1.5 | delay_MC (ms)
3 | Dynamic-traffic | Car count (CC) | int | 0–20 | 8 | aug. prob. × 1.2 | delay_CC (ms)
4 | V2X-communication | RSU broadcast (RSU) | bool | 0/1 | - | label smoothing 0.9 | delay_RSU (ms)
(Note: ① The pixel-level mask ID corresponds to the class values in the annotated PNG files (0 = background). ② All attribute fields are written, together with the image file name, into a CSV file with identical field names, which is used for model invocation in Section 3.1 and for delay regression analysis in Section 5.2.)
Table 2. Conventional element taxonomy for autonomous-driving scenes at urban tunnel portals.
Category | Subcategory | Remarks
Facility type | Basic facilities | Tunnel cross-section structure, portal configuration, etc.
Facility type | Traffic signs and markings | Road markings, reflective signs, etc.
Environmental type | Basic environment | Portal vegetation, portal structures, etc.
Environmental type | Lighting environment | Tunnel skylight, portal illumination, etc.
Environmental type | Dynamic traffic environment | Traffic flow, environmental dynamics, etc.
Table 3. Scenario information perception factor scoring.
Element Category | Element Name | Functional Relevance, R (1–5) | Safety Criticality, S (1–5)
Facility Category | roads
Facility Category | tunnel portals
Transportation Category | traffic signs
Transportation Category | traffic markings
Transportation Category | traffic signboards
Basic Environment | tunnel portal vegetation
Basic Environment | tunnel portal structures
Dynamic Environment | cars
Dynamic Environment | motorcycles
Light Environment | weather
Light Environment | tunnel lighting
Table 4. Scenario information perception factor screening matrix.
Element Category | Element Name | Functional Relevance, R (1–5) | Safety Criticality, S (1–5) | Comprehensive Score, (R+S)/2 | Retain (Yes/No) | Remarks
Facility Category | roads | 5 | 5 | 5.0 | Yes | Drivable area boundary
Facility Category | tunnel portals | 4 | 5 | 4.5 | Yes | Geometry: visual mutation source
Transportation Category | traffic signs | 5 | 5 | 5.0 | Yes | Legal speed limit/behavioral instructions
Transportation Category | traffic markings | 5 | 5 | 5.0 | Yes | Lane keeping reference
Transportation Category | traffic signboards | 3 | 4 | 3.5 | Yes * | Auxiliary warning, must retain
Basic Environment | tunnel portal vegetation | 2 | 4 | 3.0 | Yes * | Possible obstruction of signs
Basic Environment | tunnel portal structures | 3 | 4 | 3.5 | Yes | Lighting affected by shading structures
Dynamic Environment | cars | 5 | 5 | 5.0 | Yes | Primary dynamic obstacles
Dynamic Environment | motorcycles | 4 | 4 | 4.0 | Yes | High risk in lateral blind spots
Light Environment | weather | 1 | 3 | 2.0 | No | Random and uncontrollable, background input
Light Environment | tunnel lighting | 2 | 3 | 2.5 | No | Present in every tunnel, not combinable
(Note: overall score ≥ 4: retain directly; 3–3.9: retain after combining with specification checks; <3: remove or use only as background condition; * Denotes a required entry.)
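The retention rule in the note above can be written as a small helper; the example scores are taken directly from Table 4.

```python
def screen(functional_r, safety_s):
    """Retention rule from Table 4's note, applied to (R + S) / 2."""
    score = (functional_r + safety_s) / 2
    if score >= 4:
        return score, "retain"
    if score >= 3:
        return score, "retain after specification check"
    return score, "remove / background condition"

assert screen(5, 5) == (5.0, "retain")                            # roads
assert screen(3, 4) == (3.5, "retain after specification check")  # signboards
assert screen(1, 3) == (2.0, "remove / background condition")     # weather
```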
Table 5. Inter-rater agreement statistics for the two-round Delphi scoring.
Round | Metric | ICC (2,k) | 95% CI | Fleiss' κ | Interpretation
1 | Functional relevance, R | 0.79 | 0.70–0.86 | 0.74 | Substantial
2 | Functional relevance, R | 0.87 | 0.81–0.92 | 0.81 | Almost perfect
1 | Safety criticality, S | 0.76 | 0.67–0.84 | 0.72 | Substantial
2 | Safety criticality, S | 0.83 | 0.76–0.89 | 0.78 | Substantial
(Note: all final-round values exceed the acceptable threshold of 0.75, confirming statistical reliability of the expert scores.)
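Fleiss' κ values such as those in Table 5 can be checked from a per-item matrix of rater counts. A minimal sketch follows; since the raw expert ratings are not published, the example uses synthetic counts purely to exercise the formula.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][k] = number of raters placing item i in category k."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Observed per-item agreement P_i, then averaged.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    # Expected agreement from marginal category proportions.
    totals = [sum(row[k] for row in counts) for k in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Perfect agreement among 3 raters on 2 items yields kappa = 1.
assert fleiss_kappa([[3, 0], [0, 3]]) == 1.0
```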
Table 6. Detailed list of 9 categories of elements.
Serial Number | Element Name | Remarks
1 | roads | Provide drivable area boundaries
2 | tunnel portals | Strong visual–geometric mutation source
3 | traffic signs | Legal speed limits and behavioral instructions
4 | traffic markings | The direct reference for lane keeping
5 | traffic signboards | Auxiliary forecast information
6 | tunnel portal vegetation | May cover signs or edges
7 | tunnel portal structures | Sunshades, light-reducing grilles, and similar structures affect luminance distribution
8 | cars | Main dynamic obstacles
9 | motorcycles | High speed differences, prone to occur in lateral blind spots
Table 7. Final dataset partition results.
Subset | Frame Count | Proportion | Frame Count Range per Category | Tunnel Isolation Requirement
Training set | 301 | 80% | 5–8 frames | -
Validation set | 47 | 12.5% | 1–2 frames | -
Test set | 28 | 7.5% | 1 frame | √
Note: √ indicates the item is present; - indicates the item is absent.
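The partition in Table 7 can be reproduced deterministically: with 376 frames, rounding the 80% and 12.5% fractions gives exactly 301/47/28. The seed and shuffling scheme below are illustrative assumptions, not the procedure used in the paper.

```python
import random

def partition(frames, seed=0):
    """Split frames into train/val/test at roughly 80% / 12.5% / 7.5%."""
    rng = random.Random(seed)
    frames = frames[:]          # do not mutate the caller's list
    rng.shuffle(frames)
    n = len(frames)
    n_train = round(n * 0.80)
    n_val = round(n * 0.125)
    return frames[:n_train], frames[n_train:n_train + n_val], frames[n_train + n_val:]

train, val, test = partition(list(range(376)))
print(len(train), len(val), len(test))  # 301 47 28
```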
Table 8. Forty-three urban tunnel portal sections in different regions of China.
Tunnel Number | Tunnel Name | Tunnel Number | Tunnel Name
1 | Da Po Tunnel | 23 | Xing Guang Tunnel
2 | Hong Yancun Tunnel | 24 | Zi Zhi Tunnel
3 | Jin Shan Tunnel | 25 | Bayi Tunnel
4 | Mei Xihu Tunnel | 26 | Ciqikou Tunnel
5 | Ren He Tunnel | 27 | Yiguanlu Tunnel
6 | Tong Mao Tunnel | 28 | North Railway Station Tunnel
7 | Pian Po Tunnel | 29 | Xiaba Tunnel
8 | Chang Jiang Tunnel | 30 | Song Hualu Tunnel
9 | Xiang Yang Tunnel | 31 | Guan Yinshan Tunnel
10 | Xiao Quan Tunnel | 32 | Yang He Tunnel
11 | Wen Chang Tunnel | 33 | Ma Anshan Tunnel
12 | Da Xiaochang Tunnel | 34 | Qi Chong No. 2 Tunnel
13 | Wu Lihu Tunnel | 35 | Feng Cheshan Tunnel
14 | Qing Qi Tunnel | 36 | Cheng Jiangxilu Tunnel
15 | Hui Shan Tunnel | 37 | Zheng Jiatun Tunnel
16 | Yun Shan Tunnel | 38 | Fu Xing Donglu Tunnel
17 | Gu Xiong Tunnel | 39 | Long Yaolu Tunnel
18 | Dian Chilu Tunnel | 40 | Wu Long Tunnel
19 | Xi Zang South Road Tunnel | 41 | Dong Anhu Tunnel
20 | Yu Longwan Tunnel | 42 | Xin Jianlu Tunnel
21 | Tong Bailu Tunnel | 43 | Wei Lailu Tunnel
22 | Wei Silu Tunnel | / | /
Table 9. Ten representative urban tunnels in China.
Tunnel Working Condition Number | Tunnel Name | City | Road Grade | Portal Form | Design Speed (km/h)
1 | Da Po Tunnel | Chong Qing | six-lane dual carriageway | Cut Bamboo Style Cave Entrance | 60
2 | North Railway Station Tunnel | Gui Yang | six-lane dual carriageway | Arched Cave Entrance | 60
3 | Dong Anhu Tunnel | Cheng Du | four-lane dual carriageway | Framed Tunnel Portal | 60
4 | Mei Xihu Tunnel | Chang Sha | four-lane dual carriageway | Shed-type Tunnel Portal | 80
5 | Qi Chong No. 2 Tunnel | Gui Yang | eight-lane dual carriageway | Arched Cave Entrance | 60
6 | Tong Mao Tunnel | Chong Qing | six-lane dual carriageway | Cut Bamboo Style Cave Entrance | 50
7 | Zi Zhi Tunnel | Hang Zhou | six-lane dual carriageway | Special Decorative Cave Gate | 80
8 | Chang Jiang Tunnel | Nan Jing | eight-lane dual carriageway | Shed-type Tunnel Portal | 80
9 | Yun Shan Tunnel | Guang Zhou | six-lane dual carriageway | Framed Tunnel Portal | 60
10 | Xiao Quan Tunnel | Chong Qing | six-lane dual carriageway | Cut Bamboo Style Cave Entrance | 80
(Note: After tunnel screening, the current experimental tunnels do not permit electric vehicles to enter. Additionally, before the experiment began, researchers drove vehicles in autonomous mode during morning and evening peak traffic and observed that the vehicles frequently exited autonomous mode due to road conditions, requiring driver intervention. Therefore, traffic flow scenarios were limited to off-peak periods only.)
Table 10. Five-dimensional statistical profile of the 10 selected tunnel portals.
Tunnel ID | Portal Shape | Design Speed (km/h) | Lanes | Day Illuminance (klx) | Night Illuminance (klx) | Max Luminance Gradient (klx/30 m) | K-Means Cluster
T01 | Cut-bamboo | 60 | 6 | 63.4 | 4.8 | 120.2 | C1
T02 | End-wall | 60 | 6 | 61.2 | 5.1 | 118.9 | C1
T03 | Shed | 80 | 4 | 58.7 | 4.6 | 115.5 | C1
T04 | Cut-bamboo | 80 | 8 | 57.1 | 4.4 | 117.3 | C2
T05 | End-wall | 60 | 8 | 59.8 | 4.7 | 119.1 | C2
T06 | Shed | 50 | 4 | 56.5 | 4.3 | 116.8 | C2
T07 | Cut-bamboo | 80 | 6 | 52.3 | 4.0 | 109.6 | C3
T08 | End-wall | 50 | 6 | 53.1 | 4.2 | 111.4 | C3
T09 | Shed | 60 | 4 | 51.7 | 3.9 | 110.2 | C3
Note: ANOVA results: among-cluster differences in illuminance and gradient are significant (p < 0.01); within-cluster differences are not (p = 0.92), confirming unbiased selection.
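The K-means grouping reported in Table 10 was performed on the full five-dimensional profile. For illustration only, the sketch below clusters the single luminance-gradient column into two coarse groups, which already separates the low-gradient portals (T07–T09) from the rest; the minimal 1-D implementation is a hypothetical simplification, not the procedure used in the study.

```python
def kmeans_1d(values, k, iters=20):
    """Minimal 1-D k-means; centroids seeded at spread quantiles of the data."""
    vals = sorted(values)
    centroids = [vals[(len(vals) - 1) * i // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[j].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Max luminance gradient column (klx/30 m) from Table 10, T01-T09.
gradients = [120.2, 118.9, 115.5, 117.3, 119.1, 116.8, 109.6, 111.4, 110.2]
centroids, clusters = kmeans_1d(gradients, k=2)
```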
Table 11. Main scenario elements of autonomous driving training set for urban tunnel portal sections.
Classification | Elements | Number | Sample
infrastructure (A) | 1. roads | A1 | Applsci 16 00453 i001
infrastructure (A) | 2. tunnel portals | A2 | Applsci 16 00453 i002 *
traffic signs, markings, and signboards (B) | 3. traffic signs | B1 | Applsci 16 00453 i003 **
traffic signs, markings, and signboards (B) | 4. traffic markings | B2 | Applsci 16 00453 i004
traffic signs, markings, and signboards (B) | 5. traffic signboards | B3 | Applsci 16 00453 i005 ***
basic environment (C) | 6. tunnel portal vegetation | C1 | Applsci 16 00453 i006
basic environment (C) | 7. tunnel portal structures | C2 | Applsci 16 00453 i007
dynamic runtime environment (D) | 8. cars | D1 | Applsci 16 00453 i008
dynamic runtime environment (D) | 9. motorcycles | D2 | Applsci 16 00453 i009
Note: * The text in the figure reads “Nanhu Tunnel”; ** The text in the figure reads “Jiangbei Airport, Yuhang Road, Shuanglong Avenue”; *** The text in the figure reads “Tongmao Avenue Tunnel, total length 875 m, drive with headlights on”. Original images are supplied without post-processing for authenticity.
Table 12. Experimental simulation scenario with all elements present.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | B3 | C1 | C2 | D1 | D2
Table 13. Basic building block image processing inference speed.
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | roads | 5 | 3.5
2 | traffic markings | 5.5 | 3.8
3 | cars | 6 | 4
4 | motorcycles | 6.5 | 4.2
5 | traffic signs | 7 | 4.3
6 | tunnel portal structures | 7.2 | 4.4
7 | traffic signboards | 7.5 | 4.5
8 | tunnel portal vegetation | 8 | 4.6
9 | tunnel portals | 8.5 | 4.8
Table 14. Inference latency statistics of CNN for full-element scenes.
Element | Mean Delay (ms) | SD (ms) | 95% CI (ms) | ICC (2,k)
Road | 5.0 | 0.21 | [4.8, 5.2] | 0.91
Traffic marking | 5.5 | 0.23 | [5.3, 5.7] | 0.89
Car | 6.0 | 0.25 | [5.8, 6.2] | 0.88
Motorcycle | 6.5 | 0.27 | [6.3, 6.7] | 0.87
Traffic sign | 7.0 | 0.29 | [6.8, 7.2] | 0.86
Portal structure | 7.2 | 0.30 | [7.0, 7.4] | 0.85
Signboard | 7.5 | 0.32 | [7.3, 7.7] | 0.84
Portal vegetation | 8.0 | 0.34 | [7.8, 8.2] | 0.83
Portal outline | 8.5 | 0.36 | [8.3, 8.7] | 0.82
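Confidence intervals of the kind shown in Table 14 can be computed from repeated latency samples with a small helper. The sketch below uses the normal-approximation critical value z = 1.96; whether the paper used z or a t critical value is not stated, so the constant is an assumption.

```python
import math
import statistics

def ci95(samples):
    """Mean and normal-approximation 95% confidence interval (z = 1.96)."""
    m = statistics.mean(samples)
    half = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, (m - half, m + half)

# Illustrative latency samples in ms (not the experimental data).
mean_ms, (lo, hi) = ci95([5.0, 5.2, 4.8, 5.1, 4.9])
```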
Table 15. Scenario types for traffic signs, markings, and signboards.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | \ | C1 | C2 | D1 | D2
2 | A1 | A2 | \ | B2 | \ | C1 | C2 | D1 | D2
Table 16. Image processing inference speed for traffic signs, road markings, and traffic signage.
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | cars | 8 | 4
2 | motorcycles | 7.2 | 4.2
3 | tunnel portal structures | 6.5 | 4.4
4 | tunnel portal vegetation | 6 | 4.6
Table 17. Scenario types for vegetation and structures at cave entrances.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | B3 | \ | C2 | D1 | D2
2 | A1 | A2 | B1 | B2 | B3 | C1 | \ | D1 | D2
3 | A1 | A2 | B1 | B2 | B3 | \ | \ | D1 | D2
Table 18. Image processing inference speed for vegetation and structures at cave entrance.
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | tunnel portal structures | 7.2 | 4.4
2 | tunnel portal vegetation | 8 | 4.6
Table 19. Scenario types for automobiles (broadly defined) and two-wheeled motorcycles.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | B3 | C1 | C2 | D1 | \
2 | A1 | A2 | B1 | B2 | B3 | C1 | C2 | \ | D2
3 | A1 | A2 | B1 | B2 | B3 | C1 | C2 | \ | \
Table 20. Image processing inference speed comparison for automobiles (broad definition) and two-wheeled motorcycles.
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | cars | 6 | 4
2 | motorcycles | 6.5 | 4.2
Table 21. Scenario types in which both the two major element categories—traffic signs, road markings, and signage (B) and the basic environment (C)—change simultaneously.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | \ | \ | C2 | D1 | D2
2 | A1 | A2 | B1 | B2 | \ | C1 | \ | D1 | D2
3 | A1 | A2 | B1 | B2 | \ | \ | \ | D1 | D2
4 | A1 | A2 | \ | B2 | B3 | C1 | \ | D1 | D2
5 | A1 | A2 | \ | B2 | B3 | \ | \ | D1 | D2
6 | A1 | A2 | \ | B2 | B3 | \ | C2 | D1 | D2
Table 22. Image processing inference rate table for simultaneous changes in traffic signs, markings, signs (B), and basic environment (C) elements.
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | traffic markings | 5.5 | 3.8
2 | cars | 6 | 4
3 | motorcycles | 6.5 | 4.2
4 | traffic signs | 7 | 4.3
5 | tunnel portal structures | 7.2 | 4.4
6 | traffic signboards | 7.5 | 4.5
7 | tunnel portal vegetation | 8 | 4.6
Table 23. Scenario types where both the basic environment (C) and dynamic operating environment (D) undergo changes simultaneously.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | B3 | C1 | \ | \ | D2
2 | A1 | A2 | B1 | B2 | B3 | C1 | \ | D1 | \
3 | A1 | A2 | B1 | B2 | B3 | \ | C2 | \ | D2
4 | A1 | A2 | B1 | B2 | B3 | \ | C2 | D1 | \
5 | A1 | A2 | B1 | B2 | B3 | C1 | \ | \ | \
6 | A1 | A2 | B1 | B2 | B3 | \ | C2 | \ | \
7 | A1 | A2 | B1 | B2 | B3 | \ | \ | D1 | \
8 | A1 | A2 | B1 | B2 | B3 | \ | \ | \ | D2
9 | A1 | A2 | B1 | B2 | B3 | \ | \ | \ | \
Table 24. The image processing inference rate for simultaneous changes in both the environment (C) and the dynamic operating environment (D).
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | cars | 8 | 4
2 | motorcycles | 7.2 | 4.2
3 | tunnel portal structures | 6.5 | 4.4
4 | tunnel portal vegetation | 6 | 4.6
Table 25. Scenario types with simultaneous changes in both sets of elements: traffic signs, markings, and signs (B) and dynamic operating environment (D).
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | \ | C1 | C2 | D1 | \
2 | A1 | A2 | B1 | B2 | \ | C1 | C2 | \ | D2
3 | A1 | A2 | B1 | B2 | \ | C1 | C2 | \ | \
4 | A1 | A2 | \ | B2 | B3 | C1 | C2 | D1 | \
5 | A1 | A2 | \ | B2 | B3 | C1 | C2 | \ | D2
6 | A1 | A2 | \ | B2 | B3 | C1 | C2 | \ | \
Table 26. Image processing and inference speed for simultaneous changes in two sets of elements: traffic signs, markings, and signs (B) and dynamic operating environment (D).
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | cars | 6 | 4
2 | motorcycles | 6.5 | 4.2
3 | traffic signs | 7 | 4.3
4 | traffic signboards | 7.5 | 4.5
Table 27. Scenario types of synchronous changes in the other three major categories of elements excluding infrastructure.
Scene Number | Roads | Tunnel Portals | Traffic Signs | Traffic Markings | Traffic Signboards | Tunnel Portal Vegetation | Tunnel Portal Structures | Cars | Motorcycles
1 | A1 | A2 | B1 | B2 | \ | C1 | \ | \ | D2
2 | A1 | A2 | B1 | B2 | \ | C1 | \ | D1 | \
3 | A1 | A2 | B1 | B2 | \ | \ | C2 | \ | D2
4 | A1 | A2 | B1 | B2 | \ | \ | C2 | D1 | \
5 | A1 | A2 | B1 | B2 | \ | C1 | \ | \ | \
6 | A1 | A2 | B1 | B2 | \ | \ | C2 | \ | \
7 | A1 | A2 | B1 | B2 | \ | \ | \ | D1 | \
8 | A1 | A2 | B1 | B2 | \ | \ | \ | \ | D2
9 | A1 | A2 | B1 | B2 | \ | \ | \ | \ | \
10 | A1 | A2 | \ | B2 | B3 | C1 | \ | \ | D2
11 | A1 | A2 | \ | B2 | B3 | C1 | \ | D1 | \
12 | A1 | A2 | \ | B2 | B3 | \ | C2 | \ | D2
13 | A1 | A2 | \ | B2 | B3 | \ | C2 | D1 | \
14 | A1 | A2 | \ | B2 | B3 | C1 | \ | \ | \
15 | A1 | A2 | \ | B2 | B3 | \ | C2 | \ | \
16 | A1 | A2 | \ | B2 | B3 | \ | \ | D1 | \
17 | A1 | A2 | \ | B2 | B3 | \ | \ | \ | D2
18 | A1 | A2 | \ | B2 | B3 | \ | \ | \ | \
Table 28. Image processing rate for scenarios with synchronous changes in the three major categories of elements excluding infrastructure.
Scene Number | Category | Reasoning Speed (ms/Frame) | Computational Load (GFLOPs)
1 | cars | 6 | 4
2 | motorcycles | 6.5 | 4.2
3 | tunnel portal vegetation | 8 | 4.6
4 | tunnel portal structures | 7.3 | 4.4
5 | traffic signs | 7 | 4.3
6 | traffic signboards | 7.5 | 4.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
