1. Introduction
With the continuous evolution of automated driving technologies, safety validation has become a critical factor constraining their large-scale deployment. At present, the industry generally adopts a multi-pillar validation framework composed of simulation testing, proving ground testing, and open-road testing [1,2,3,4]. However, behavioral uncertainty in complex traffic environments, the diversity of traffic participants, and the high dynamic nature of scenarios make it difficult for any single testing modality to comprehensively cover the potential risk space. In particular, under heterogeneous traffic conditions, different testing environments exhibit significant differences in terms of risk exposure capability, cost structure, and controllability. A unified theoretical basis is still lacking for achieving quantitative, interpretable scenario allocation among simulation, proving grounds, and open roads, and for establishing a systematic evaluation scheme for testing resources.
In terms of testing platforms and testing environment frameworks, existing studies have mainly focused on the provision of testing capabilities and the construction of validation infrastructures, leading to the development of various testing platforms and scenario libraries [5]. These include large-scale simulation platforms for virtual environments and vehicle–road collaborative testing platforms [6,7,8], reconfigurable proving grounds and multi-level scenario libraries that support diverse road conditions and facility configurations [9,10], as well as layered validation frameworks integrating virtual testing with physical testing [11]. These studies have achieved significant progress in enhancing the operability of testing environments and the coverage of scenarios. However, their research emphasis has predominantly remained on the supply side of platforms and resources, without theoretically elucidating why different scenario attributes impose differentiated requirements on simulation-based testing, closed-proving-ground testing, and open-road testing, nor providing a unified quantitative evaluation scheme applicable to scenario allocation across different testing environments.
With regard to improving the detection efficiency of automated driving testing, existing studies have established a class of testing methodologies that take increasing the occurrence probability of high-risk conditions as the core objective [12,13,14,15,16]. Such studies mainly focus on the scenario exposure process itself [17], with optimization targets concentrated on scenario sampling and risk manifestation within a single testing environment. However, most existing research is conducted under a single testing environment framework and has not systematically addressed how scenarios with different levels of complexity and risk should be coordinated and allocated across multiple testing environments [18], nor has it provided a theoretical basis for a unified quantitative evaluation of such cross-environment allocation strategies.
Existing studies indicate that the performance of automated driving systems varies significantly across scenarios with different levels of complexity and risk [19,20]. However, these studies lack unified mathematical representations and quantitative comparison mechanisms, and have not established an interpretable mapping from scenario attributes to testing requirements.
Overall, within existing research frameworks, long-standing and pervasive challenges remain in terms of result consistency and validation credibility across different testing environments. Therefore, there is still no approach that simultaneously possesses:
- (1) a unified quantitative representation of scenario complexity and risk;
- (2) a mechanism to characterize the nonlinear relationships between scenario attributes and testing requirements;
- (3) a systematic framework that supports the evaluation of scenario allocation between proving ground testing and open-road testing.
To address the above challenges, this paper investigates the coupled modeling of scenario characteristics and testing resource allocation in automated driving validation. The focus is placed on how scenario-level attributes can support coordinated testing decisions across heterogeneous environments, including closed proving grounds and open-road testing.
A scenario-driven evaluation framework is developed, in which scenario complexity and risk are jointly modeled to form a unified representation of testing requirements. This unified scenario space enables consistent quantitative descriptions of multi-source and multi-type traffic scenarios, providing a common basis for subsequent allocation decisions across different testing environments.
Based on this representation, a fuzzy inference mechanism is employed to establish an explicit mapping between scenario attributes and testing environment allocation ratios. To improve the stability of this mapping under varying scenario conditions, evolutionary optimization is incorporated to refine the inference structure and parameter configuration while maintaining the semantic organization of the rule base.
The proposed framework is evaluated using representative highway, urban, and proving-ground scenarios, together with a cost-efficiency assessment model. The experimental results demonstrate the effectiveness of the framework in terms of testing coverage, resource utilization efficiency, and economic feasibility, indicating its applicability to multi-scenario automated driving validation.
The structure of the paper is illustrated in Figure 1.
2. Materials and Methods
2.1. Scenario and Data Construction
This section constructs a testing scenario set covering three typical road environments—highways, urban roads, and proving grounds—and establishes a unified feature space for quantitative scenario characterization. The construction process is primarily data-driven. Through the joint analysis of naturalistic driving data and accident samples, key features reflecting both safety-criticality and the diversity of traffic interactions are extracted, including speed, traffic flow density, traffic participant types, and environmental disturbance conditions. In testing environment allocation for automated driving systems, scenario characterization needs to account for both aspects. Accordingly, scenario risk is introduced to represent the potential severity and uncertainty of safety outcomes, which directly constrain the admissible testing environments, as commonly considered in scenario risk assessment studies [21]. Scenario complexity captures the structural richness of traffic interactions and the combinatorial space of participant behaviors, which has been widely used to identify challenging driving scenarios based on real-world data [22]. As these two attributes are complementary and may vary independently in real-world traffic, a joint risk–complexity representation is adopted as the basis for subsequent testing environment allocation.
2.1.1. Introduction to the China-FOT and CIDAS Data Sources and Data Selection Criteria
Existing datasets used in automated driving research can generally be categorized into two types: naturalistic driving databases, which record continuous real-world driving behaviors under normal operating conditions, and accident or in-depth crash databases, which focus on safety-critical events and collision causation. Representative datasets include China-FOT and CIDAS from China, as well as the European UDRIVE database and the U.S. SHRP2 Naturalistic Driving Study (NDS). The basic differences and primary purposes of the four databases are summarized in Table 1.
UDRIVE is a multi-country European naturalistic driving dataset with standardized behavioral annotations and is mainly used for driver behavior analysis and traffic safety studies, while SHRP2 NDS is a large-scale U.S. dataset providing long-term naturalistic driving trajectories together with detailed vehicle–driver information for behavior modeling and risk assessment. China-FOT is a large-scale naturalistic driving database collected under real-world traffic conditions across multiple Chinese cities, providing high-frequency time-series data on vehicle kinematics, driver control behaviors, surrounding traffic participant states, and environmental conditions. Owing to the high traffic density, mixed traffic composition, and frequent interaction scenarios in urban China, China-FOT is particularly suitable for capturing complex multi-agent interactions and realistic behavior distributions. CIDAS is a structured national accident database that focuses on detailed accident causation analysis and contains comprehensive information on collision types, impact configurations, injury severity, road environments, and traffic participant involvement. Unlike naturalistic driving datasets, CIDAS explicitly represents safety-critical and failure scenarios, thereby enabling systematic extraction of high-risk interaction patterns and key risk factors. The two data sources are complementary in terms of temporal resolution and event types, providing a solid data foundation for constructing scenario sets that cover both routine operating conditions and high-risk situations.
In this study, a data-driven scenario analysis method [23] is adopted. After data screening, a total of 1200 representative samples are obtained. According to typical L2/L2+ functional classifications, scenarios are grouped into automated lane keeping, automated lane change, ramp entry and exit, urban lane keeping, intersection crossing, low-speed driving, and automated parking. The corresponding relationships are shown in Table 2.
To ensure the representativeness and completeness of the samples, the following rules are applied during data screening:
- (1) Samples must contain complete key fields, including trajectories, speed, time, weather conditions, and other essential information;
- (2) Accident samples must have clearly defined collision causality and belong to vehicle–vehicle or vehicle–pedestrian interaction types;
- (3) Samples with missing fields, unclear labels, or irreproducible scenarios are excluded.
2.1.2. Environmental Factors and Scenario Feature Space
The set of environmental factors listed in Table 3 is adopted to provide a unified description of road geometry, lane organization, traffic control facilities, obstacle distribution, and environmental conditions. Each category of factors is discretized into a finite set of values, which are used to construct the scenario graph nodes and attributes in the static complexity model.
2.1.3. Selection of Representative Parameterized Scenarios
To cover representative regions of the complexity–risk space while controlling the total number of scenarios, this study constructs a testing matrix based on the complete scenario set by selecting four key parameter dimensions: vehicle speed, target object speed, weather conditions, and traffic flow density. Through multidimensional parameter combinations, eight representative parameterized scenarios (indexed as 1, 21, 31, 33, 34, 36, 39, and 47) are selected from 50 candidate scenarios, forming a representative subset that includes highway straight-road conditions, urban lane-keeping and lane-changing conditions, multi-agent interaction conditions at intersections, and low-speed proving-ground conditions.
Based on the environmental factors listed in Table 3 and in conjunction with functional classifications, an automated driving testing scenario set is constructed, with partial examples shown in Table 4. Scenarios are grouped by functions such as highway lane keeping, low-speed automated driving, urban lane keeping, and intersection crossing. By combining different environmental factors and participant configurations, a progressive sequence of scenarios ranging from low complexity to high complexity is formed. The scenarios indexed as 1, 21, 31, 33, 34, 36, 39, and 47 in Table 4 correspond to typical conditions such as braking of a preceding vehicle on a straight highway segment, low-speed disturbances in a proving ground, urban lane merging, and multi-agent convergence at intersections. These scenarios constitute the representative subset selected for subsequent analysis.
Based on the statistical analysis of China-FOT naturalistic driving data and CIDAS accident records, this study characterizes the distribution features of urban road scenarios from three key environmental dimensions: vehicle speed, weather conditions, and traffic flow density. These characteristics are used to constrain parameter ranges and to guide the construction of the testing matrix. The statistical results, as illustrated in Figure 2, indicate that urban vehicle speeds are mainly concentrated in low- and medium-speed ranges, while non-ideal weather conditions and high-density or congested traffic flows account for a relatively large proportion. Complex operating conditions are often generated by the superposition of multiple factors.
Based on these findings, the testing scheme adopts a non-uniform speed sampling strategy dominated by low and medium speeds, and constructs a stratified test set according to levels of environmental degradation and traffic flow density. This ensures that the selected representative scenarios more closely reflect the statistical characteristics of real-world urban traffic scenarios in terms of complexity and risk.
According to the statistical results of the databases, vehicle speeds in urban road scenarios are mainly distributed across three levels: low, medium, and high, while traffic flow density is primarily distributed across three levels: low, moderate, and high. Therefore, subsequent scenario construction adopts values based on these discrete levels to ensure consistency with real-world distributions.
Scenario design follows a progressive principle of “baseline–variation–extreme.” Baseline scenarios are used to represent typical operating conditions at a moderate level of complexity. Variation scenarios adjust vehicle speed or traffic flow density while keeping other conditions unchanged, in order to analyze the influence of single factors. Extreme scenarios introduce strong disturbance conditions, such as high-speed driving in rainy nighttime environments or congested traffic, to cover high-risk boundary cases. During the construction of the scenario matrix, factors such as the relative speed between the ego vehicle and target objects, as well as lighting conditions, are also considered. Several paired scenarios are designed to analyze the independent effects of relative speed and environmental factors on system performance, as well as their interaction effects. The detailed parameter configurations of the representative scenarios are provided in Table 5.
2.2. Scenario Evaluation Model
This section first constructs a scenario complexity evaluation model, in which road testing scenario complexity is decomposed into two dimensions: static complexity and dynamic complexity [24]. These dimensions respectively characterize the difficulty features of environmental structure and traffic interaction processes. To quantify complexity, information entropy theory is introduced based on spatiotemporal interaction relationships among vehicles, and discrete information sources within the scenario are modeled. The basic scenario complexity is then obtained through the combination of static entropy and dynamic entropy. Subsequently, on this basis, a risk evaluation model integrating dynamics- and risk-based indicators is developed to reflect the intensity of potential collision risk in the scenario.
2.2.1. Static Complexity
To reduce the influence of subjective experience on scenario complexity evaluation, information entropy theory is introduced to quantify static scenarios. Static complexity adopts a two-layer entropy-weighted structure: the first layer uses information entropy to characterize the diversity and non-uniformity of scenario elements, while the second layer reflects the relative importance of different elements to the driving task through weighting. Under a graph-based representation, the basic model of static scenario complexity is defined as:

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$

where $H$ denotes the static scenario complexity coefficient, $n$ represents the total number of scenario element groups, and $p_i$ indicates the proportion of nodes of a specific type in the graph structure relative to the total number of nodes. Unlike classical Shannon information entropy, physical elements such as roads, buildings, and traffic signs are mapped to nodes in a graph structure, and connections are established based on their topological relationships. In this way, the complexity of static scenarios is quantified in the form of structural entropy. Considering that different elements have varying degrees of influence on driving decision-making, an entropy-weighted corrected static complexity is introduced:

$$C_s = H \sum_{i=1}^{n} w_i S_i$$

where $C_s$ denotes the static scenario complexity, $w_i$ represents the weight corresponding to the $i$-th group of elements in the static scenario composition, $S_i$ denotes the sum of predefined scores of all elements within that group, and $H$ is the base information entropy. The weighting coefficients are determined through a combination of expert scoring and empirical analysis. While preserving the objective characterization of the intrinsic complexity of the scenario provided by the entropy measure, this approach also reflects the differentiated contributions of various elements to the driving task. Consequently, the static structures of different road environments can be compared under a unified metric.
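To make the two-layer computation concrete, the following Python sketch implements an entropy-weighted static complexity of the form described above. The function name, the dictionary-based inputs, and the example values are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def static_complexity(node_types, group_weights, group_scores):
    """Two-layer entropy-weighted static complexity (illustrative sketch).

    node_types   : list of element-group labels, one entry per scenario-graph node
    group_weights: dict mapping group label -> task-importance weight w_i
    group_scores : dict mapping group label -> summed element score S_i
    """
    n = len(node_types)
    counts = Counter(node_types)
    # First layer: Shannon-style structural entropy over node-type proportions.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Second layer: task-importance weighting of the element groups present.
    weighted = sum(group_weights[g] * group_scores[g] for g in counts)
    return weighted * entropy
```

A scenario whose graph contains only one element type yields zero entropy and hence zero static complexity, while mixed scenarios with strongly weighted groups score higher.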
2.2.2. Dynamic Scenario Complexity
Dynamic scenario complexity is used to characterize the interaction intensity among traffic participants and the nonlinear characteristics arising from variations in geometric relationships and motion states. Based on the statistical analysis of typical interaction scenarios, a multi-factor coupled formulation is adopted to construct the dynamic complexity model. The core form of the dynamic complexity model is given in [25]:

$$C_d = \exp\!\left(f_\theta \cdot f_v \cdot f_d\right)$$

where $f_\theta$, $f_v$, and $f_d$ denote the complexity factors corresponding to encounter angle, relative speed, and relative distance, respectively. An exponential mapping is adopted so that complexity increases in an accelerated manner with interaction intensity, which is consistent with the variation characteristics of driver perceptual load under critical operating conditions.
The angle factor is adapted from angle-based risk assessment concepts in the maritime domain and modified for road scenarios. Typical encounter relationships—such as same-direction, opposite-direction, and perpendicular crossing—are mapped to different complexity weights. Perpendicular crossings are assigned the highest weight, followed by opposite-direction encounters, with same-direction encounters assigned the lowest weight, reflecting differences in the difficulty of the perception–decision chain under different intersection geometries.
The relative speed factor $f_v$ and the relative distance factor $f_d$ introduce an adaptive safety distance on the basis of the traditional Time To Collision (TTC) metric, incorporating vehicle dynamic characteristics and road adhesion conditions into the modeling framework. For different types of traffic participants, including pedestrians, non-motorized vehicles, and motor vehicles, differentiated distance thresholds are applied to reflect differences in risk sensitivity.
Static complexity and dynamic complexity jointly determine the basic complexity of a scenario:

$$C_0 = C_s \cdot C_d$$
The multiplicative formulation is used to characterize the synergistic amplification effect between static environments and dynamic interactions, whereby identical dynamic behaviors result in higher overall complexity in more structurally complex environments.
The final scenario complexity $C$ is determined jointly by the basic complexity and the traffic flow density modulation factor:

$$C = C_0 \cdot g(q)$$

where $q$ denotes the traffic flow density. Under low traffic flow conditions, $g(q)$ approaches 1, while under moderate and high traffic flow conditions, it appropriately amplifies the complexity level. This layered modulation mechanism enables flexible adjustment of the overall complexity level across different traffic flow environments while maintaining the structural stability of the complexity model, thereby providing a quantitative basis for describing the risk amplification effect in high-interaction scenarios.
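The composition of static complexity, dynamic complexity, and flow modulation can be sketched as follows. Since the exact functional forms of the factor coupling and of the modulation factor are not fully specified here, the sketch assumes a multiplicative exponential coupling for the dynamic part and a simple linear modulation $g(q) = 1 + kq$ that tends to 1 at low flow; both are assumptions for illustration.

```python
import math

def dynamic_complexity(f_angle, f_speed, f_dist):
    # Exponential mapping: complexity accelerates with interaction intensity.
    return math.exp(f_angle * f_speed * f_dist)

def total_complexity(c_static, c_dynamic, flow_density, k=0.5):
    # Multiplicative coupling of static and dynamic complexity, modulated by
    # a flow-density factor g(q) = 1 + k*q (assumed form; g -> 1 at low flow).
    g = 1.0 + k * flow_density
    return c_static * c_dynamic * g
```

With zero interaction intensity the dynamic factor collapses to 1, and at zero flow density the total complexity reduces to the plain static–dynamic product.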
2.2.3. Risk Evaluation Method
This study proposes a collision risk measurement framework based on dynamic models. By integrating the Physical Crash Severity Index (PCSI) with the TTC, a more comprehensive scenario risk evaluation system is established. Analysis indicates that traditional single-indicator evaluation methods are insufficient to accurately characterize potential risks in complex traffic scenarios; therefore, a composite risk index $R$ is introduced [26]:

$$R = \frac{\mathrm{PCSI}}{\mathrm{TTC}}$$

This ratio design reflects the coupling between two key dimensions: PCSI quantifies the physical severity of a collision once it occurs, while TTC characterizes the urgency of collision occurrence. Taking their ratio enables the risk assessment to simultaneously account for both dimensions, providing a more comprehensive risk characterization in scenarios where a single indicator cannot adequately capture both severity and urgency.
For TTC modeling, considering the prevalent acceleration and deceleration behaviors in automated driving scenarios, relative acceleration is incorporated into the prediction framework. A piecewise function is employed to distinguish between accelerating and decelerating cases, thereby avoiding the underestimation of risk caused by the traditional constant-velocity assumption under highly dynamic operating conditions. The calculation formula is given as:

$$\mathrm{TTC} = \begin{cases} \dfrac{d - d_0}{v_r}, & a_r = 0 \\[6pt] \dfrac{-v_r + \sqrt{v_r^2 + 2 a_r (d - d_0)}}{a_r}, & a_r \neq 0 \end{cases}$$

where $d$ denotes the relative distance, $v_r$ denotes the relative speed, $a_r$ denotes the relative acceleration, and $d_0$ represents the safe stopping distance.
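A runnable sketch of such an acceleration-aware piecewise TTC is given below. The sign convention (closing speed and closing acceleration positive) and the infinite-TTC handling for gaps that are never closed are modeling assumptions made for this example.

```python
import math

def ttc(d, v_rel, a_rel, d_safe=0.0):
    """Piecewise TTC with relative acceleration (illustrative sketch).

    d      : relative distance; d_safe : safe stopping distance
    v_rel  : closing speed (positive = approaching)
    a_rel  : closing acceleration (positive = gap closing faster)
    Returns float('inf') when the gap d - d_safe is never closed.
    """
    gap = d - d_safe
    if gap <= 0:
        return 0.0  # already inside the safe stopping distance
    if abs(a_rel) < 1e-9:  # constant-velocity special case
        return gap / v_rel if v_rel > 0 else float('inf')
    disc = v_rel ** 2 + 2.0 * a_rel * gap
    if disc < 0:  # deceleration strong enough that the gap never closes
        return float('inf')
    t = (-v_rel + math.sqrt(disc)) / a_rel
    return t if t >= 0 else float('inf')
```

With a decelerating lead-vehicle interaction, the constant-velocity assumption would report a longer time margin than the piecewise form; the converse holds when the gap is closing with positive relative acceleration.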
The potential collision severity index (PCSI) integrates key physical variables such as collision angle, relative speed, and mass ratio, and is formulated as a weighted combination:

$$\mathrm{PCSI} = w_\theta f_\theta + w_v f_v + w_m f_m$$

where $f_\theta$, $f_v$, and $f_m$ denote the angle factor, the relative speed–distance factor, and the mass ratio factor, respectively, and $w_\theta$, $w_v$, and $w_m$ are the corresponding weights. The angle factor characterizes differences in vehicle structural deformation and occupant injury under different impact angles, explaining why lateral collisions are generally more hazardous than same-direction rear-end collisions. The relative speed–distance factor reflects the physical principle that collision energy increases rapidly with increasing relative speed and decreasing separation distance. The mass ratio factor is derived from the principle of momentum conservation and characterizes the mechanism by which lighter vehicles are exposed to higher risk in asymmetric collisions. The weights are obtained through regression analysis on accident databases, among which the speed-related factor has the highest weight, consistent with the physical law that kinetic energy is proportional to the square of velocity.
By integrating TTC and PCSI, the risk model simultaneously characterizes the urgency of collision occurrence and the potential severity within a unified framework, thereby providing risk indicators with clear physical meanings as inputs for the subsequent fuzzy system.
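The severity–urgency coupling can be sketched in a few lines. The weight values below are placeholders (the paper fits its weights by regression on accident data, with the speed-related weight largest), and the epsilon guard against a vanishing TTC is an added numerical assumption.

```python
def pcsi(f_angle, f_speed_dist, f_mass, w=(0.25, 0.5, 0.25)):
    # Weighted combination of the angle, speed-distance, and mass-ratio
    # factors. Placeholder weights; the speed-related weight is largest,
    # mirroring the regression result described in the text.
    w_a, w_v, w_m = w
    return w_a * f_angle + w_v * f_speed_dist + w_m * f_mass

def composite_risk(pcsi_value, ttc_value, eps=1e-6):
    # R = PCSI / TTC: severity scaled by urgency. The epsilon guards
    # against division by zero as TTC approaches 0 (imminent collision).
    return pcsi_value / max(ttc_value, eps)
```

For a fixed severity, halving the TTC doubles the composite risk, which is the intended amplification for urgent encounters.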
2.2.4. Normalization of Scenario Risk and Complexity
Before being used as inputs to the fuzzy inference system, the quantified scenario complexity and risk indices are normalized to a common numerical scale. For each index, the minimum and maximum values are determined from the entire constructed scenario dataset, and all scenario samples are linearly normalized to the interval [0, 1] using min–max scaling. The resulting normalized complexity and risk values are then directly used as the input variables of the fuzzy inference model for testing environment allocation.
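A minimal sketch of this min–max step, assuming the complexity or risk indices arrive as a plain Python list over the whole scenario dataset:

```python
def min_max_normalize(values):
    # Linearly rescale all index values to [0, 1] over the dataset.
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: all scenarios share one value
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```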
2.3. Fuzzy Framework
In automated driving testing scenarios, issues such as sensor noise, incomplete information, and fuzzy scenario boundaries are pervasive, making it difficult for analytical models based on precise thresholds to stably characterize the nonlinear relationships between complex scenario attributes and testing requirements. Therefore, a fuzzy inference system [27] is employed to map scenario complexity $C$ and risk $R$ to the combined allocation ratios of proving-ground testing and open-road testing, thereby constructing an interpretable combinatorial testing strategy model.
Among various fuzzy inference structures [28], the Mamdani-type system expresses knowledge in the form of “IF–THEN” rules and is suitable for complex decision-making problems with multiple inputs and single or multiple outputs. It can directly reflect the influence of expert experience on decision-making. Considering both interpretability and implementation complexity, a Mamdani-type fuzzy inference structure is adopted in this study to construct the combinatorial testing strategy model.
2.3.1. Mamdani-Type Fuzzy Inference System
The Mamdani-type fuzzy inference system adopted in this study is illustrated in Figure 3 and mainly consists of four modules: fuzzification, rule base, fuzzy inference, and defuzzification. The system inputs are the scenario complexity $C$ and risk $R$, while the output is a composite indicator representing the allocation ratios between proving-ground testing and open-road testing, which is used to guide testing resource allocation.
The fuzzification module maps the continuous input variables $C$ and $R$ into fuzzy sets through membership functions. Considering that variations in complexity and risk exhibit smooth transition characteristics, Gaussian membership functions are adopted to describe the input variables. This enables continuous transitions between adjacent levels at their boundaries, thereby enhancing robustness to measurement errors and parameter uncertainties.
The rule base module consists of a set of fuzzy rules in the form of IF–THEN statements. Based on combinations of complexity and risk levels, these rules provide corresponding recommendations for testing allocation ratios. The rule design integrates expert knowledge with statistical data analysis and covers the main combination patterns within the complexity–risk space, ensuring reasonable decision outputs under typical operating conditions.
The fuzzy inference module employs the Mamdani-type minimum–maximum inference mechanism to activate the corresponding rules according to the input membership degrees and aggregates the outputs of all activated rules to obtain a fuzzy output set for the testing allocation ratio. The defuzzification module uses the centroid method to convert the fuzzy output set into a crisp numerical value, yielding the recommended allocation ratio between proving-ground and open-road testing for a given scenario. The centroid method exhibits smooth response characteristics near decision boundaries, which helps avoid abrupt changes in testing allocation ratios between adjacent scenarios.
Through the above fuzzy inference structure, the combinatorial testing strategy model realizes an interpretable mapping of “complexity–risk–testing allocation” within a unified framework, providing a foundation for subsequent rule and parameter optimization based on evolutionary algorithms.
2.3.2. Establishment of the Fuzzy-Based Combinatorial Testing Model
On the basis of the proposed Mamdani-type fuzzy inference structure, this section presents the membership functions, rule formats, and defuzzification expressions adopted in this study, and provides a formal description of the mathematical implementation of the combinatorial testing model.
- (1) Inputs, Outputs, and Membership Functions
The model inputs are scenario complexity $C$ and risk $R$, and the output is the proving-ground testing proportion $P$. The variables $C$, $R$, and $P$ all adopt a seven-level linguistic variable set:

$$\{L_1, L_2, L_3, L_4, L_5, L_6, L_7\}$$

After discretization, a total of $7 \times 7 = 49$ scenario state combinations are formed in the $C$–$R$ space.
As shown in Figure 4, for each input variable $x$, the membership function of its $j$-th linguistic level adopts a Gaussian form:

$$\mu_j(x) = \exp\!\left(-\frac{(x - c_j)^2}{2\sigma_j^2}\right)$$

where $c_j$ and $\sigma_j$ are the center and width parameters of the $j$-th linguistic level, respectively. By constraining adjacent levels to satisfy the following condition at their intersection points:

$$\mu_j(x^*) = \mu_{j+1}(x^*) = 0.5$$

smooth transitions and appropriate overlap between linguistic levels are ensured, providing a tunable parameter space for subsequent evolutionary optimization.
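For evenly spaced levels, the half-membership crossing condition fixes the width analytically: with center spacing $\Delta$, requiring $\mu_j = 0.5$ at the midpoint gives $\sigma = \Delta / (2\sqrt{2\ln 2})$. The sketch below builds such a level set; the even spacing over $[0, 1]$ and the shared width are simplifying assumptions (the paper later lets evolutionary optimization tune these parameters).

```python
import math

def gaussian_mf(x, c, sigma):
    # Gaussian membership function with center c and width sigma.
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def build_levels(n_levels=7, lo=0.0, hi=1.0):
    """Evenly spaced Gaussian levels whose adjacent curves cross at 0.5.

    For spacing delta, mu_j(midpoint) = 0.5 requires
    sigma = delta / (2 * sqrt(2 * ln 2)).
    """
    delta = (hi - lo) / (n_levels - 1)
    sigma = delta / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    centers = [lo + j * delta for j in range(n_levels)]
    return centers, sigma
```

Any point on the domain then belongs to at most two levels with membership above 0.5, which is the overlap pattern the constraint is designed to produce.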
- (2) Rule Structure and Inference Mechanism
The rule base consists of a set of IF–THEN rules, and the $k$-th rule is expressed as:

$$\mathrm{IF}\ C\ \mathrm{is}\ A_k\ \mathrm{AND}\ R\ \mathrm{is}\ B_k,\ \mathrm{THEN}\ P\ \mathrm{is}\ D_k$$

where $A_k$, $B_k$, and $D_k$ are the corresponding linguistic levels. For a given input $(C, R)$, the activation strength of the $k$-th rule is given by:

$$\alpha_k = \min\!\left(\mu_{A_k}(C),\ \mu_{B_k}(R)\right)$$

and the consequent membership function $\mu_{D_k}(P)$ is clipped accordingly:

$$\mu_k'(P) = \min\!\left(\alpha_k,\ \mu_{D_k}(P)\right)$$

All rule outputs are aggregated through a maximum operation to obtain the overall membership function of the output variable $P$:

$$\mu_{\mathrm{out}}(P) = \max_{k}\ \mu_k'(P)$$
- (3) Defuzzification and Output Generation
The precise value of the testing proportion is obtained through defuzzification using the centroid method:

$$P^{*} = \frac{\int_{\Omega} P\,\mu_{\mathrm{out}}(P)\,\mathrm{d}P}{\int_{\Omega} \mu_{\mathrm{out}}(P)\,\mathrm{d}P}$$

where $\Omega$ denotes the domain of the output variable. To reduce computational complexity, the output domain is discretized into a finite set of sampling points $\{P_m\}_{m=1}^{M}$ in the numerical implementation, and a discrete centroid formulation is adopted:

$$P^{*} = \frac{\sum_{m=1}^{M} P_m\,\mu_{\mathrm{out}}(P_m)}{\sum_{m=1}^{M} \mu_{\mathrm{out}}(P_m)}$$
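The full min–clip–max–centroid chain can be sketched in one small function. The two-rule base, the shared Gaussian width, and the 101-point output grid below are all hypothetical choices made for illustration; they only demonstrate the mechanism, not the paper's tuned 49-rule system.

```python
import math

def gmf(x, c, sigma=0.15):
    # Gaussian membership function (shared width for simplicity).
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def mamdani_infer(c_val, r_val, rules, out_pts):
    """Minimal Mamdani step: min activation, clipped consequents,
    max aggregation, and discrete centroid defuzzification.

    rules: list of (c_center, r_center, p_center) triples standing in
    for the linguistic levels A_k, B_k, D_k (all modeled as Gaussians).
    """
    agg = [0.0] * len(out_pts)
    for c_c, r_c, p_c in rules:
        alpha = min(gmf(c_val, c_c), gmf(r_val, r_c))      # rule activation
        for m, p in enumerate(out_pts):
            agg[m] = max(agg[m], min(alpha, gmf(p, p_c)))  # clip + aggregate
    num = sum(p * mu for p, mu in zip(out_pts, agg))
    den = sum(agg)
    return num / den if den > 0 else 0.0

# Hypothetical two-rule base: low C and low R -> low proving-ground share;
# high C and high R -> high share.
rules = [(0.2, 0.2, 0.2), (0.8, 0.8, 0.8)]
pts = [m / 100.0 for m in range(101)]
```

Feeding a calm scenario (low normalized $C$ and $R$) through this sketch yields a proportion well below 0.5, while a complex high-risk scenario lands well above it, matching the monotonic behavior the rule base is meant to encode.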
Through the above definitions, scenario complexity and risk are mapped to a specific testing proportion via the fuzzy rule base and membership functions, thereby forming an interpretable nonlinear mapping structure from scenario attributes to testing resource allocation.
2.4. Optimization of the Fuzzy Rule Base
In the fuzzy inference model, the rule base determines the mapping relationship between scenario complexity $C$, risk $R$, and the testing proportion $P$. To improve the adaptability and decision accuracy of the combinatorial testing model across different scenarios, evolutionary optimization is applied to the Mamdani rule base.
Considering that both inputs and outputs are discrete linguistic variables, the rule base is represented using a binary encoding scheme [29]. Based on the input and output linguistic levels, the rule base can be expressed as a fixed-dimension rule matrix. After binary expansion, an individual can be represented as a fixed-length binary string. The genetic algorithm treats this representation as the search space and iteratively updates the rule structure through selection, crossover, and mutation operators.
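With seven output levels per rule, three bits suffice to encode each consequent, so a 7 x 7 rule matrix flattens into a 147-bit string. The encode/decode pair below sketches such a scheme; the clamping of the unused code 7 back into range is an added assumption about how invalid genomes are repaired.

```python
def encode_rule_base(consequents, bits=3):
    """Encode rule-matrix consequents (levels 0..6) as one bit string."""
    return "".join(format(level, f"0{bits}b") for level in consequents)

def decode_rule_base(bitstring, bits=3, n_levels=7):
    """Decode a bit string back into a list of consequent levels."""
    levels = [int(bitstring[i:i + bits], 2)
              for i in range(0, len(bitstring), bits)]
    # Code 7 (binary 111) is unused by a 7-level set; clamp it into range
    # so crossover/mutation never produce an invalid rule (assumed repair).
    return [min(l, n_levels - 1) for l in levels]
```

Crossover and mutation then operate directly on the bit string, and decoding recovers a valid rule matrix for fitness evaluation.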
The fitness function is jointly composed of the testing efficiency metric $E$ and the resource consumption metric $C_t$:

$$F = \alpha E - \beta C_t$$

where $\alpha$ and $\beta$ are weighting coefficients that balance the two objectives.
The former measures testing coverage and convergence efficiency under a given rule configuration, while the latter reflects the corresponding testing time and resource consumption. Through the combined fitness design, efficiency and cost are explicitly balanced during the evolutionary process, providing a quantitative optimization objective for the combinatorial testing strategy.
To address the issues of premature convergence and insufficient population diversity that traditional genetic algorithms tend to exhibit in high-dimensional rule spaces [30], an adaptive evolutionary mechanism integrating simulated annealing and K-means clustering is introduced into the genetic framework. First, a simulated annealing operator is embedded in the genetic iteration process, where the temperature parameter $T$ controls the acceptance probability of suboptimal solutions, and an exponential decay strategy is adopted for temperature updating:

$$T_{k+1} = \gamma T_k$$

where $T_k$ denotes the system temperature at the $k$-th iteration, and $\gamma \in (0, 1)$ is the cooling coefficient.
During the high-temperature stage, individuals with lower fitness are accepted with a certain probability to enhance the global search capability. State transitions are governed by an improved Metropolis criterion, which allows newly generated individuals with slightly lower fitness than the current individual to be accepted with a nonzero probability at higher temperatures, thereby effectively reducing the risk of being trapped in local optima:

$$p = \begin{cases} 1, & f_{\mathrm{new}} \geq f_{\mathrm{cur}} \\[4pt] \exp\!\left(\dfrac{f_{\mathrm{new}} - f_{\mathrm{cur}}}{T_k}\right), & f_{\mathrm{new}} < f_{\mathrm{cur}} \end{cases}$$

where $p$ is the probability that the current individual is replaced by the new individual, $f_{\mathrm{new}}$ is the fitness of the newly generated individual, and $f_{\mathrm{cur}}$ is the fitness of the current individual.
Second, K-means clustering is performed on the population at each generation, and the standard deviation of the distances between individuals and each cluster center is computed as a measure of population diversity. When the standard deviation is large, the population distribution is relatively dispersed and diversity is high; in this case, the crossover rate and mutation rate are appropriately increased to enlarge the search space. When the standard deviation is small, the population tends to be concentrated and close to the convergence region; accordingly, the crossover rate and mutation rate are reduced to avoid excessive disturbance to the current favorable structures and to accelerate convergence. Through this diversity-driven adaptive adjustment mechanism, a search strategy is realized that emphasizes global exploration in the early stage and strengthens local exploitation in the later stage.
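The annealing acceptance rule and the diversity-driven rate adjustment described above can be sketched as follows. This is a minimal illustration with a hand-rolled k-means; the population is assumed to be a NumPy array of individuals, and all function names, gains, and rate bounds are hypothetical choices, not values from the text:

```python
import numpy as np

def metropolis_accept(f_new, f_cur, temperature, rng):
    # Improved Metropolis criterion: always accept improvements; accept a
    # worse individual with probability exp((f_new - f_cur) / T).
    if f_new >= f_cur:
        return True
    return rng.random() < np.exp((f_new - f_cur) / temperature)

def diversity_std(pop, k=3, iters=10, seed=0):
    # Std of distances between individuals and their k-means cluster
    # centers, used as the population-diversity measure.
    rng = np.random.default_rng(seed)
    pop = pop.astype(float)
    centers = pop[rng.choice(len(pop), size=k, replace=False)]
    labels = np.zeros(len(pop), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(pop[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pop[labels == j].mean(axis=0)
    return np.linalg.norm(pop - centers[labels], axis=1).std()

def adapt_rates(std, std_ref, pc0=0.8, pm0=0.05, gain=0.5):
    # Dispersed population (large std): raise crossover/mutation rates to
    # enlarge the search; concentrated population: lower them to protect
    # favorable structures and accelerate convergence.
    scale = 1.0 + gain * np.tanh((std - std_ref) / (std_ref + 1e-9))
    return min(0.95, pc0 * scale), min(0.5, pm0 * scale)
```

In a full loop, each generation would compute `diversity_std`, update the crossover and mutation rates via `adapt_rates`, apply `metropolis_accept` when deciding whether offspring replace parents, and then decay the temperature.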
Within the improved genetic algorithm framework, both the structure and parameters of the fuzzy rule base are automatically evolved under the joint efficiency–cost optimization objective. This reduces the subjectivity introduced by manual parameter tuning and enhances the adaptability and robustness of the combinatorial testing strategy across scenarios with varying levels of complexity and risk.
2.5. Cost and Efficiency Calculation Model
To explicitly balance testing resource consumption and defect exposure capability during rule optimization, this study constructs an evaluation model for the combinatorial testing strategy from the two dimensions of cost and efficiency.
2.5.1. Testing Cost Model
The cost of proving-ground testing \(C_{\mathrm{pg}}\) mainly consists of site rental fees. The cost of open-road testing \(C_{\mathrm{road}}\) includes three components: equipment usage cost \(C_{\mathrm{e}}\), personnel input cost \(C_{\mathrm{p}}\), and vehicle operating cost \(C_{\mathrm{v}}\):

\(C_{\mathrm{road}} = C_{\mathrm{e}} + C_{\mathrm{p}} + C_{\mathrm{v}}\)

where the equipment cost is estimated from typical sensor rental prices reported in the literature [31], the personnel cost is calculated from the hourly labor cost of the technical staff involved in the actual testing organization, and the vehicle wear-and-tear cost is determined by the testing intensity.
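A minimal sketch of this open-road cost breakdown (all rates and the function name are illustrative placeholders, not values from the cited literature):

```python
def open_road_cost(hours, equip_rate, n_staff, hourly_wage, wear_rate):
    # C_road = equipment usage + personnel input + vehicle operating cost.
    c_equipment = equip_rate * hours         # sensor rental, per hour
    c_personnel = n_staff * hourly_wage * hours
    c_vehicle = wear_rate * hours            # wear-and-tear vs. test intensity
    return c_equipment + c_personnel + c_vehicle
```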
2.5.2. Scenario Exposure and Time Consumption
Unlike proving-ground testing, where operating conditions are controllable, the duration of open-road testing is strongly influenced by factors such as traffic flow, weather, and road conditions, and the occurrence of target scenarios is inherently stochastic. To estimate the time consumption required to complete testing for a specific scenario on open roads, this study introduces scenario exposure to characterize the expected number of occurrences of a given type of scenario per unit time.
For a given scenario, its exposure can be estimated from statistical frequency:

\(f_i = n_i / N\)

where \(n_i\) is the number of occurrences of the \(i\)-th value in all statistics, \(N\) is the total number of observations, and \(f_i\) is the frequency of occurrence of the \(i\)-th value, which serves as the estimate of the probability \(p_i\) of occurrence of the \(i\)-th value.
By selecting samples with structural similarity to Scenario 47 from the inD dataset and counting their occurrences across different intersections and observation periods, the frequency distribution of Scenario 47 can be obtained. The statistical results indicate that the occurrence frequencies of most scenarios exhibit a concentrated distribution.
Figure 5 presents the Kolmogorov–Smirnov test results for the occurrence frequency of Scenario 47.
As shown in Figure 5, the KS test results indicate that, within this dataset, the occurrence frequency of Scenario 47 can be approximately regarded as following a normal distribution. Therefore, a normal distribution is adopted as an approximate model for scenario exposure, which smooths the influence of extreme samples on the estimation of scenario exposure time. Its probability density function can be expressed as:

\(f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)\)

where \(\sigma^2\) and \(\mu\) denote the variance and the mean, respectively.
Accordingly, the testing time is derived from a conservative estimate of the scenario exposure:

\(E = \mu - z\sigma, \qquad T = N / E\)

where \(z\) is the standardized Z-score corresponding to a 95% confidence level, \(E\) denotes the scenario exposure (its lower confidence bound under the normal model), and \(T\) represents the time required to reach the expected exposure level. The required number of scenario occurrences \(N\) satisfies:

\(N = g(r)\)

where \(g\) is a function related to the testing proportion \(r\) output by the fuzzy framework.
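Under the normal exposure model, the conservative time estimate can be sketched as follows. Assumptions: exposure is measured in occurrences per hour, a one-sided 95% bound with z = 1.645 is used (the text does not state one-sided vs. two-sided), and `n_required` stands in for the output of the fuzzy-framework-dependent function g:

```python
import statistics

def expected_test_time(counts_per_hour, n_required, z=1.645):
    # Fit the normal exposure model to per-interval occurrence counts,
    # then divide the required number of scenario occurrences by a
    # conservative (lower-bound) exposure rate.
    mu = statistics.mean(counts_per_hour)
    sigma = statistics.stdev(counts_per_hour) if len(counts_per_hour) > 1 else 0.0
    exposure = max(mu - z * sigma, 1e-9)  # occurrences per hour, 95% bound
    return n_required / exposure
```

Using the lower confidence bound rather than the mean deliberately overestimates the required time, so the resulting test plan is robust to scenarios that occur less often than average.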
2.5.3. Testing Efficiency Model
In modeling testing efficiency, considering that defect exposure exhibits a typical diminishing-marginal-return pattern as the number of tests increases, an exponential saturation model is adopted to fit the relationship between testing efficiency and the number of tests:

\(E(n) = E_{\max}\left(1 - e^{-\lambda n}\right)\)

where \(E(n)\) denotes the cumulative efficiency achieved after \(n\) tests, \(E_{\max}\) is the maximum attainable testing efficiency, and \(\lambda\) is the efficiency constant reflecting the rate of efficiency improvement. This model captures the marginal-effect characteristics of testing efficiency and can be used to evaluate the efficiency performance of different combinatorial testing strategies under given cost constraints.
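The saturation model can be evaluated directly; a one-line sketch (parameter values here are illustrative only):

```python
import math

def cumulative_efficiency(n, e_max, lam):
    # E(n) = e_max * (1 - exp(-lam * n)); each additional test contributes
    # a smaller efficiency gain than the previous one.
    return e_max * (1.0 - math.exp(-lam * n))
```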
Based on the above cost and efficiency models, the composite fitness function in rule base optimization can explicitly balance efficiency improvement and resource consumption, thereby providing a quantitative basis for the evolutionary optimization of combinatorial testing proportions.
4. Comparative Analysis of Results
Based on the clearly defined experimental setup and evaluation metrics, this section conducts a comparative analysis of the performance of four types of genetic optimization strategies from two perspectives: fitness convergence characteristics and efficiency–cost ratio.
4.1. Ablation Study
To quantitatively evaluate the impact of different optimization mechanisms on the performance of the combinatorial testing strategy, four comparative experimental schemes based on the ablation concept are designed in this study. The specific settings of the four optimization strategies are as follows:
- (1) Original genetic algorithm (GA);
- (2) Genetic algorithm with simulated annealing mechanism (GA-SA);
- (3) Genetic algorithm with K-means clustering-based adaptive parameter tuning (GA-K Means);
- (4) Complete optimization scheme integrating simulated annealing and K-means clustering (GA-SA-K Means).
Figure 10a illustrates the overall convergence characteristics of the four strategies. The original GA exhibits slower convergence and lower terminal fitness, with evident premature convergence. Introducing the K-means mechanism (GA-K Means) significantly accelerates fitness improvement during the initial search phase but offers limited enhancement in terminal fitness. Incorporating the simulated annealing mechanism (GA-SA) primarily improves convergence quality in the mid-to-late stages. The complete scheme, GA-SA-K Means, demonstrates superior convergence performance across both initial and later phases, accelerating convergence while achieving the highest terminal fitness, reflecting the synergistic benefits of the two mechanisms in global exploration and local exploitation.
As shown in Figure 10b, the complete optimization scheme achieves higher efficiency–cost ratios across multiple representative scenarios. In particular, in urban high-interaction scenarios such as Scenarios 33 and 47, the improvement exceeds 25%, demonstrating the strategy’s advantage in resource allocation under high-complexity environments.
4.2. Comparison of Efficiency–Cost Ratio
As shown in Figure 11, the complete optimization scheme achieves a higher efficiency–cost ratio across multiple typical scenarios. In particular, for urban interaction-intensive scenarios (such as Scenario 33 and Scenario 47), GA-SA-K Means improves the efficiency–cost ratio by over 25% compared with the baseline GA, and also demonstrates clear advantages over the single-mechanism variants GA-SA and GA-K Means. This indicates that the complete scheme provides superior resource allocation capability in high-complexity environments.
Specifically, in high-interaction urban scenarios, the improvement in efficiency–cost ratio is most pronounced. Scenarios 33 and 47 both represent urban intersection conditions, featuring multiple motor vehicles, pedestrians, and potential occluding objects. Clear conflict points exist between vehicle trajectories, resulting in high levels of complexity and risk. In these scenarios, the combinatorial testing strategy adaptively increases the proportion of open-road testing based on quantified complexity and risk, allocating more testing resources to real traffic environments to enhance the exposure probability of potential failure modes. Simultaneously, by jointly optimizing the fuzzy rule base and the testing ratios, the number of proving-ground tests in low-value conditions is reduced, thereby significantly improving the efficiency–cost ratio without increasing the overall testing budget.
In contrast, in scenarios such as 21 and 36, which represent low-speed or proving-ground conditions with relatively high traffic flow, the cost per test is low, but the risk exposure per test is limited. In these scenarios, the combinatorial testing strategy tends to execute test cases in bulk within proving grounds, allocating open-road tests only as necessary to ensure overall risk exposure while controlling the additional cost associated with open-road testing. The improvement in efficiency–cost ratio in these scenarios is lower than in high-interaction urban scenarios, but overall remains superior to the baseline, indicating that the strategy can achieve differentiated resource allocation according to scenario complexity and risk characteristics.
Overall, across the eight selected representative parameterized scenarios, the combinatorial testing strategy achieves higher efficiency–cost ratios than the unoptimized and single-mechanism baseline schemes in highway, urban, and proving-ground environments. This demonstrates that the model can adaptively adjust the proportion of proving-ground and open-road testing based on scenario complexity and risk levels, balancing risk exposure capability and cost control under a fixed testing budget, and providing a stable advantage in resource allocation.
4.3. Sensitivity Analysis with Respect to Expert-Defined Parameters
4.3.1. Experimental Setup
In the proposed fuzzy inference framework, all fuzzy rules are assigned equal activation weights and remain active during the inference process. Therefore, no relative importance weighting is introduced among rules. Under this setting, expert-defined parameters mainly refer to the initial parameters of the membership functions specified based on domain knowledge prior to optimization. The expert-defined parameters considered in this study are summarized in Table 7.
To examine the sensitivity of the proposed framework to variations in expert-defined parameters, a controlled perturbation analysis was conducted on the membership function parameters listed in Table 7. Specifically, the centers and widths of the Gaussian membership functions were independently perturbed within a ±10% range around their reference values. During the perturbation process, the linguistic partitioning of input variables and the semantic structure of the fuzzy rule base were preserved to ensure consistency in rule interpretation.
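This perturbation scheme can be sketched as follows (function names are hypothetical; each (center, width) pair is scaled independently while the rule base itself is left untouched):

```python
import math, random

def gaussian_mf(x, center, width):
    # Gaussian membership degree of x for one linguistic term.
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def perturb_params(params, pct=0.10, seed=42):
    # Independently scale each center and width within +/-pct of its
    # reference value; linguistic partitioning is preserved because only
    # parameter values, not terms or rules, are modified.
    rng = random.Random(seed)
    return {term: (c * (1 + rng.uniform(-pct, pct)),
                   w * (1 + rng.uniform(-pct, pct)))
            for term, (c, w) in params.items()}
```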
For each perturbed parameter configuration, the testing environment allocation ratios were recalculated using the same set of scenario inputs as those employed in the main experiments. To isolate the effect of parameter perturbations, the analysis was conducted using Scenario 47, which is adopted as a representative case in the main text. The sensitivity analysis was carried out under identical experimental settings to maintain consistency with the primary experiments.
4.3.2. Results and Analysis
The results indicate that perturbations in the membership function parameters lead to observable numerical variations in the testing environment allocation ratios, as illustrated in Figure 12 and Figure 13. However, the overall allocation patterns and decision trends remain consistent across different parameter configurations.
Specifically, scenarios characterized by higher risk levels consistently result in increased proportions of closed proving-ground testing, while scenarios with higher complexity levels continue to exhibit a greater reliance on open-road testing. Although minor fluctuations in allocation ratios are observed under parameter perturbations, the relative ordering of testing preferences across different environments is preserved.
These observations suggest that the testing environment allocation decisions produced by the proposed framework are stable with respect to reasonable variations in expert-defined membership function parameters. The sensitivity analysis confirms that the resulting allocation behavior follows consistent scenario-driven trends rather than being driven by specific parameter settings.
Overall, the sensitivity analysis results indicate that, under reasonable perturbations of the membership function parameters while preserving the semantic structure of the fuzzy rule base, only limited numerical variations are observed in the testing environment allocation ratios, whereas the overall allocation patterns and decision trends remain unchanged. Under both combined and one-factor-at-a-time perturbation settings, the relative ordering of testing environment preferences is consistently preserved. These results demonstrate that the proposed testing environment allocation mechanism exhibits stable mapping behavior with respect to expert-defined membership function parameters, and that the resulting allocation decisions are primarily driven by scenario characteristics rather than specific parameter settings.
4.4. Interpretability Analysis of the Optimized Fuzzy Rules
To further examine the interpretability of the proposed framework, this section analyzes the fuzzy rule base before and after optimization by visualizing the corresponding rule surfaces. The fuzzy rule surface provides an intuitive representation of how testing environment allocation responds to variations in scenario risk and complexity, thereby offering a direct means to assess whether the optimization process preserves the semantic structure of the original rules.
Figure 14 presents the fuzzy rule-base heatmaps before and after optimization, where Figure 14a shows the initial rule-base heatmap and Figure 14b illustrates the optimized one. As observed, the optimized rule-base heatmap preserves the overall monotonic trend with respect to scenario risk and complexity. Specifically, higher risk levels consistently correspond to increased allocations toward closed proving-ground testing, while higher complexity levels continue to exhibit a stronger preference for open-road testing.
Compared with Figure 14a, the optimized rule-base heatmap in Figure 14b exhibits smoother transitions in regions associated with moderate risk and complexity, while more concentrated responses are observed under extreme scenario conditions. No structural inversions or counterintuitive patterns are introduced during optimization, and the relative ordering of testing environment preferences remains unchanged. These results indicate that the evolutionary optimization refines the inference behavior through parameter-level adjustments without altering the semantic structure of the original rule base.
5. Conclusions
This study proposes a fuzzy-system-based combinatorial testing strategy for automated driving systems, in which an improved genetic algorithm is integrated to optimize the allocation ratio between proving-ground testing and open-road testing. By jointly modeling scenario complexity and risk within a unified fuzzy inference framework, the proposed approach enables quantitative characterization of scenario-level testing requirements and supports interpretable testing resource allocation across heterogeneous driving environments. On this basis, the fuzzy rule base is further optimized using an improved genetic algorithm, allowing the allocation strategy to adapt to variations in scenario distributions while preserving rule-level interpretability. The framework incorporates multiple traffic interaction indicators, including information entropy, angular factors, relative speed, and TTC, to construct a conflict risk assessment model that does not rely on high-precision measurements, thereby extending its applicability to nonlinear and high-dimensional traffic conditions. Combined with reproducible scenario construction based on accident samples and naturalistic driving trajectories, and integrated with cost–efficiency modeling, the proposed strategy forms a closed-loop testing framework that supports quantitative evaluation and feedback of testing performance. Experimental results demonstrate that the optimized strategy achieves higher resource utilization efficiency and improved testing economy compared with traditional unoptimized approaches, particularly in scenarios characterized by high complexity and intensive interactions.
The proposed framework is developed and validated using traffic data collected from Chinese road environments, and its current scope is therefore aligned with the corresponding traffic characteristics. Nevertheless, the underlying modeling rationale—namely, the abstraction of scenario complexity and risk and the fuzzy inference-based testing environment allocation mechanism—is not inherently restricted to a specific region. For applications in other traffic systems, the overall framework can be retained while the statistical ranges and parameter distributions of scenario attributes are recalibrated using local data. Future work may further explore automatic scenario generation and semantic reconstruction methods to efficiently model both typical and edge-case traffic situations, as well as the integration of multi-source heterogeneous data such as V2X communication, infrastructure information, and sensor observations. In addition, advanced optimization methods, including reinforcement learning and evolutionary game-theoretic approaches, could be incorporated to systematically compare testing strategies under equivalent resource constraints, further improving the generality and adaptability of combinatorial testing frameworks for automated driving systems.