Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation

Hao, Renjian; Shi, Mujia; Hu, Yong; Wei, Chunling; Feng, Lihang; Wang, Dong; Yuan, Li

doi:10.3390/electronics14122470

Open AccessArticle

Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation

by

Renjian Hao

¹,

Mujia Shi

²,

Yong Hu

¹,

Chunling Wei

¹,

Lihang Feng

²

,

Dong Wang

^3,*

and

Li Yuan

¹

Beijing Institute of Control Engineering, Beijing 100190, China

²

College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211800, China

³

School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(12), 2470; https://doi.org/10.3390/electronics14122470

Submission received: 5 June 2025 / Revised: 16 June 2025 / Accepted: 16 June 2025 / Published: 18 June 2025

(This article belongs to the Special Issue Advanced Control Strategies and Applications of Multi-Agent Systems)

Download

Browse Figures

Versions Notes

Abstract

In current evaluations of intelligent system capabilities, there is a problem with the difficulty of unifying quantitative scoring results with qualitative grading standards, which severely limits cross-platform and cross-task intelligent comparisons and grading certifications. To address this issue, this paper proposes a hierarchical parameter partitioning equivalent mapping method (HPP-EM) to construct a mapping mechanism from quantitative scores to qualitative grades. The method comprises three modules: Nonlinear Interval Partitioning (NIP) constructs intelligent capability intervals based on the principle of diminishing capability margins; Quantification-to-Grade Mapping (QGM) introduces a set of criteria to determine mapping rules; and Dynamic Threshold Generation (DTA) calculates the lower score thresholds and total score thresholds for each grade based on indicator weights. The model is interpretable and supports mapping across multiple metrics and multi-level systems. Experimental validation was conducted using evaluation data from a deep-space unmanned system, demonstrating that HPP-EM accurately reflects system intelligence levels and exhibits good adaptability and scalability. This paper provides a traceable, structured mapping scheme for unified intelligence level assessment.

Keywords:

intelligence level assessment; dynamic level partitioning; mapping method

1. Introduction

With the rapid development of digitalization and automation technologies, intelligent systems have become the foundation of innovation in fields such as autonomous driving [1], rehabilitation healthcare [2], and aerospace [3,4]. Among the core characteristics of these systems, intelligence evaluation plays a critical role in both theoretical research and practical engineering applications [5].

To provide standardized intelligence classification and ensure consistent development benchmarks, multiple industry and governmental bodies have introduced widely recognized grading systems. For instance, the Society of Automotive Engineers (SAE) established the SAE J3016 standard for autonomous vehicle classification [6]; the U.S. National Institute of Standards and Technology (NIST) developed the Autonomy Levels for Unmanned Systems (ALFUS) framework [7]; and NASA proposed a hierarchy for grading autonomy in spacecraft and exploratory probes [8]. These frameworks typically employ qualitative definitions, assigning intelligence levels based on descriptions of system functionalities and task responsibilities.

Meanwhile, researchers across various domains continue to pursue quantitative assessment approaches to support more objective and repeatable intelligence evaluations for unmanned systems. In intelligent transportation, multiple works have developed classification models for autonomous vehicles [9,10], alongside scenario-specific safety assessments [11] and risk evaluations under complex environments [12]. In the aerospace sector, military-focused assessments are common: Liu [13] utilized the analytic hierarchy process (AHP) to assess UAV swarm effectiveness, while Han [14] introduced fuzzy comprehensive evaluation to enrich the decision model. Other methods, such as index system construction based on target detection [15], aim to reduce subjectivity. Industrial applications of intelligence evaluation typically emphasize risk control and predictive maintenance; for example, Kong [16] proposed a weight correction mechanism based on key operational indicators, and He [17] employed FDEMATEL combined with Analytic Network Process (ANP) to evaluate intelligent coal mine development.

Despite their contributions, most existing evaluation methods rely heavily on multi-indicator scoring and weight-based aggregation, making them inherently quantitative in nature. This scoring-centric orientation limits their ability to directly align with qualitative grading systems like ALFUS. As a result, evaluation outcomes across different domains or platforms lack interpretability, consistency, and cross-scenario comparability.

To bridge this gap, this paper proposes a hierarchical parameter partition-based equivalent mapping (HPP-EM) method that connects quantitative evaluation outputs with qualitative intelligence grades. Without altering existing evaluation frameworks, the method establishes a unified and interpretable mapping path, enhancing standardization and adaptability. The proposed approach incorporates the following: (1) a structured mapping framework for aligning scores with qualitative levels to ensure standardized intelligence expression; (2) a nonlinear score interval division to reflect diminishing marginal improvement in system intelligence; and (3) a dynamic threshold allocation mechanism that adapts to different indicator weights, improving flexibility and contextual fit.

The remainder of this paper is organized as follows: Section 2 introduces related research on qualitative and quantitative intelligence evaluation; Section 3 details the proposed HPP-EM framework and its core modules; Section 4 presents experimental validation and analysis; and Section 5 concludes with key findings and outlines directions for future research.

2. Related Work

2.1. Qualitative Evaluation Methods

Qualitative evaluation methods play an important role in the early development of unmanned systems, especially when quantitative data is limited. These methods rely heavily on expert judgment and are widely used to establish intelligent capability hierarchies. Among them, the Autonomy Levels for Unmanned Systems (ALFUS) framework proposed by NIST in 2003 [7] remains one of the most recognized models. ALFUS uses a three-dimensional structure—Mission Complexity (MC), Environmental Complexity (EC), and Human Independence (HI)—to define autonomy levels across diverse task scenarios.

In addition, NASA has introduced its own classification for spacecraft autonomy [8], and the U.S. military has developed autonomy taxonomies such as ACL [18] and LOA [19] based on the OODA model and automation authority allocation, respectively. These frameworks provide human-interpretable qualitative levels that are helpful in system design and task matching. Similar efforts toward interpretable autonomy frameworks have also been explored in the field of industrial collaborative robotics [20]. An overview of these qualitative evaluation frameworks is shown in Figure 1.

However, a common limitation of these methods is the absence of a quantitative mapping pathway. Most qualitative models lack mechanisms for automatic or data-driven level assignment, resulting in high subjectivity, limited repeatability, and poor adaptability for long-term system tracking or cross-platform comparison [21,22]. Furthermore, without a unified scoring-to-grading bridge, it is difficult to integrate qualitative results into quantitative evaluation systems or to compare intelligent levels across domains.

These limitations underscore the need for a transparent, interpretable mapping mechanism that links quantitative assessment results to qualitative intelligence grades—a gap this study aims to address.

2.2. Quantitative Evaluation Methods

Quantitative evaluation, driven by data analysis and mathematical modeling, enables objective, repeatable, and comparable assessments of intelligent systems through standardized indicators and structured procedures. It typically involves metric system construction, data collection and preprocessing, weight determination, and overall score aggregation. To ensure comprehensive coverage across functional dimensions, metrics are often organized in a hierarchical evaluation framework, as illustrated in Figure 2a.

In practice, researchers have developed various hierarchical evaluation models tailored to specific application scenarios. For instance, Peng [23] designed a three-layer ballistic missile threat assessment system, progressively aggregating performance metrics from tactical indicators to strategic decision support. Zhang et al. [24] built a networked UAV swarm evaluation framework with a two-level, five-layer structure, capturing multi-agent behavioral characteristics. Dong [25] introduced a vehicle safety assessment model with 26 indicators distributed across three layers, based on the SMART principles (specificity, measurability, applicability, relevance, and target-orientation).

Figure 2. (a) Stratified quantitative assessment system; (b) schematic diagram of the UAV swarm network model [24]; (c) safety evaluation index system of the rail vehicle system [25].

A critical component in such models is the determination of indicator weights. Current methods can be divided into three categories: subjective, objective, and hybrid. Subjective approaches—such as Analytic Hierarchy Process (AHP) [26], Analytic Network Process (ANP) [27], and the Delphi method—rely on expert judgment. Objective approaches—such as Entropy Value Method (EVM), Principal Component Analysis (PCA), and CRITIC [28]—derive weights from data variance and structure. To balance both, hybrid approaches have been widely adopted, especially in complex or uncertain evaluation scenarios. Han [29] combined AHP and EVM for UAV swarm evaluation, while Zhao [30] integrated CRITIC and AHP to support smart city planning assessment.

The primary strength of quantitative methods lies in their mathematical rigor and traceability, enabling precise and reproducible evaluation results. Moreover, hierarchical modeling and hybrid weighting enhance the adaptability of these methods to varied application domains.

However, quantitative methods also face limitations. Despite producing reliable scores, they lack mechanisms to transform numeric outputs into widely accepted qualitative intelligence levels. As a result, it remains difficult to directly interpret quantitative results within the frameworks of autonomy classification, such as ALFUS or SMART. This disconnect hinders result comparability across systems and restricts their use in strategic-level certification or deployment decision-making.

Therefore, a standardized and interpretable mapping mechanism between quantitative scores and qualitative grades is essential, yet has been largely overlooked in the existing literature. Addressing this gap is the core objective of the method proposed in this paper.

3. Proposed Framework

While quantitative methods provide objective, data-driven scores and qualitative methods offer interpretability through structured grading schemes, their independent application often results in disconnected outcomes. For example, numerical evaluation scores may suggest high system performance, yet without a corresponding qualitative interpretation, it is difficult to determine the autonomy level under frameworks such as ALFUS or NASA SMART. This lack of linkage hinders the ability to compare systems across domains or to support certification processes that require standardized grading expressions. To address this issue and build a unified evaluation bridge, this study proposes the HPP-EM framework. As shown in Figure 3, the framework consists of three core modules: the Nonlinear Interval-Partitioned Module (NIP Mod), the Quantitative–Qualitative Grade Mapping Module (QGM Mod), and the Dynamic Threshold Allocation Module (DTA Mod). Together, they form a complete process for transforming quantitative scores into interpretable qualitative intelligence grades.

In the NIP Mod, the overall scoring space of quantitative evaluation is divided into multiple non-uniform levels. This nonlinear partitioning reflects the diminishing marginal improvements typically seen in intelligent systems, ensuring that higher intelligence levels receive finer granularity. The partitioned levels serve as the initial basis for establishing grade boundaries.

The QGM Mod builds the core mapping mechanism between the score intervals and qualitative intelligence grades. By determining whether each indicator meets its level-specific requirements, and by quantifying the minimum number of qualified indicators for a level transition, this module enables structured grade assignment based on quantitative results. This overcomes the ambiguity of direct score-to-grade conversion in existing evaluation frameworks.

To ensure that the mapping process remains adaptive and sensitive to indicator importance, the DTA Mod dynamically generates threshold values for both individual indicators and overall scores. These thresholds are computed based on indicator weights, and an adaptive matrix is used to adjust grade boundaries. This ensures that the system’s final intelligence grade reflects not only score magnitudes but also structural balance across performance dimensions.

Through the coordinated operation of these three modules, the HPP-EM framework establishes a transparent, interpretable, and generalizable pathway for bridging quantitative evaluation results and qualitative intelligence classifications.

3.1. The Nonlinear Interval-Partitioned Module

To support the transformation from quantitative scores to qualitative grades, the NIP Module partitions the scoring space into structured sub-intervals. This operation provides the foundational grading tiers for subsequent qualitative mapping.

Assume a hierarchical quantitative evaluation system consisting of

m

primary evaluation indicators. Let

w_{j}

and

S_{j}

represent the weight and score of the

j

-th indicator, respectively. The evaluation inputs must satisfy the following constraints:

\{\begin{matrix} j \in [1, m], j \in Z \\ w_{j} \in [0, 1], \sum_{j = 1}^{m} w_{j} = 1 \\ S_{j} \in [S_{m i n}, S_{m a x}] \end{matrix}

(1)

In Equation (1),

m

denotes the total number of first-layer evaluation indicators;

w_{j}

represents the weight value of the

j

-th indicator, and

S_{j}

denotes its corresponding score. The terms

S_{m i n}

and

S_{m a x}

define the global lower and upper bounds of the evaluation score range, respectively. All indicator weights are normalized such that their sum equals 1.

To reflect nonlinear growth in system intelligence, the full scoring interval

[S_{m i n}, S_{m a x}]

is divided into

a + 1

continuous and progressively shrinking sub-intervals. Let the

i

-th sub-interval be denoted as

{(S}_{i L}, S_{i R})

, where

i \in [0, a]

and

S_{0 L} = S_{m i n}

and

S_{a R} = S_{m a x}

.

The boundaries are subject to

\{\begin{matrix} S_{(i + 1) R} - S_{(i + 1) L} \leq S_{i R} - S_{i L} \\ S_{(i + 1) L} = S_{i R} \end{matrix}

(2)

In Equation (2),

S_{i L}

and

S_{i R}

represent left and right boundaries of the

i

-th scoring sub-interval, respectively. The first condition ensures that the length of each subsequent interval is less than or equal to that of the previous one, capturing the nonlinear nature of intelligent capability improvement. The second condition guarantees the continuity of the scoring space by aligning the right boundary of one interval with the left boundary of the next.

This partitioning structure is designed to reflect the principle of diminishing marginal gains in intelligent capability. As systems advance, improvements become more difficult and performance gaps narrow. The NIP Mod thus establishes a nonlinear, fine-grained scoring space that serves as the basis for level differentiation in the qualitative grade mapping process.

3.2. The Quantitative–Qualitative Grade Mapping Module

The Quantitative–Qualitative Grade Mapping Module (QGM Mod) establishes the core logic for converting numerical evaluation results into qualitative intelligence levels. Based on the sub-interval structure defined in the NIP Module, this module determines the required proportion of indicators that must meet each score interval to satisfy a specific qualitative level.

Assume that the qualitative evaluation framework defines

n

discrete intelligence levels. For any target qualitative level

p \in [0, n - 1]

, the QGM Mod first calculates a proportional baseline to identify the expected average sub-level of quantitative scores that corresponds to level

p

. This is given by

t = \frac{p}{n} \times a

(3)

In Equation (3),

p

represents the target qualitative intelligence level;

n

denotes the total number of qualitative levels;

a

represents the number of score sub-intervals from the NIP Module; and

t

denotes the expected sub-level index representing the grading requirement for level

p

.

This continuous index

t

serves as an anchor to translate qualitative levels into proportional quantitative score expectations, which is crucial in achieving consistent and explainable mapping across heterogeneous systems.

Since

t

is generally a non-integer, the QGM Mod constructs a requirement set

Z_{p}

consisting of two adjacent integer sub-levels,

c e i l (t)

and

f l o o r (t)

, which together define the distribution of evaluation indicators needed to meet level

p

. Given that there are

m

first-layer indicators, the number of indicators that must meet each sub-level is calculated by

\{\begin{matrix} N_{c e i l (t)} = r o u n d (t \times m) - f l o o r (t) \times m \\ N_{f l o o r (t)} = a - r o u n d (t \times m) + f l o o r (t) \times m \end{matrix}

(4)

In Equation (4),

c e i l (t)

represents the ceiling operation on

t

,

f l o o r (t)

represents the floor operation on

t

, and

r o u n d (t)

represents the rounding operation on

t

. This allocation ensures that each grade level not only reflects aggregate performance but also enforces granular control over the distribution of sub-level compliance.

The constructed set

Z_{p}

reflects the minimum sub-levels required across indicators to qualify for a given qualitative grade. If the actual scores of the evaluation indicators satisfy the thresholds defined by

Z_{p}

, then the system is mapped to level

p

; otherwise, the grade is reduced accordingly.

Compared to traditional methods that rely solely on weighted sums, this threshold-based approach enhances interpretability by aligning with task-specific or domain-required capability structures.

This mapping process provides flexible yet structured intelligence grading while avoiding score averaging biases.

3.3. The Dynamic Threshold Allocation Module

The DTA Mod determines the specific minimum score thresholds that each evaluation indicator must achieve to support the mapping to a given qualitative intelligence level. This process complements the grade structure generated by the QGM Mod by calculating score boundaries at both the individual indicator level and the overall system level. The objective is to achieve fine-grained score control that is consistent with both the nonlinearity of intelligent system growth and the weighted importance of evaluation metrics.

Let the

i

-th scoring interval (as defined in the NIP Mod) correspond to the quantitative sub-level

i

. The minimum required score for the

j

-th evaluation indicator to be classified at level

i

is calculated by

S_{j, i} = S_{i L} + k \times \frac{{(S_{i R})}^{2} - {(S_{i L})}^{2}}{100} \times \frac{(w_{j} - w_{m})}{w_{m a x} - w_{m i n}}

(5)

In Equation (5),

S_{j, i}

denotes the minimum score for indicator

j

at level

i

;

S_{i L}

and

S_{i R}

represent left and right boundaries of the

i

-th scoring sub-interval, respectively.

w_{j}

denotes the weight of indicator

j

, and

w_{m}

represents the median of all indicator weights.

w_{m a x}

and

w_{m i n}

represent maximum and minimum weights across all indicators, and

k

is the level adjustment coefficient to control score sensitivity.

The coefficient

k

is used to prevent scoring overflow and fine-tune sensitivity across levels. It is constrained by the range

0 \leq k \leq \frac{S_{m a x}}{S_{m a x} + S_{a L}}

(6)

This dynamic design accommodates system-specific heterogeneity by adjusting minimum requirements based on both structural priorities (via weights) and evaluation resolution (via scoring intervals). A higher-weighted indicator will demand a relatively higher score at the same grade level, while lower-weighted ones require less. As a result, the system can distinguish between critical and secondary metrics during grade qualification.

Based on Equation (5), the complete threshold matrix for all

m

indicators across

a + 1

sub-levels is expressed as

S = [\begin{matrix} S_{1, 0} & \dots & S_{m, 0} \\ ⋮ & ⋱ & ⋮ \\ S_{1, a} & \dots & S_{m, a} \end{matrix}]

(7)

This matrix defines the scoring constraints for level assignment with full granularity.

Furthermore, the DTA Mod calculates the minimum total score threshold

T_{p}

required to qualify the system for qualitative level

p

, based on the mapping structure

Z_{p}

generated in the QGM Mod. This ensures that not only individual indicators but also the overall performance meet the grade standard. The computation is defined as

T_{p} = \sum_{b = 1}^{N_{c e i l (t)}} {a s c e n d \{S_{j, c e i l (t)}\}}_{b} + \sum_{b = 1}^{N_{f l o o r (t)}} {a s c e n d \{S_{j, f l o o r (t)}\}}_{b}

(8)

In Equation (8),

T_{p}

is the total score lower limit for qualitative level

p

, and

c e i l (t)

and

f l o o r (t)

are the number of indicators required at each sub-level (from QGM).

{a s c e n d \{S_{j, c e i l (t)}\}}_{b}

is the

b

-th smallest value of

S_{j, i}

in ascending order.

This two-stage filtering mechanism—first at the individual level, then at the total score level—ensures both local (indicator-level) and global (system-level) compliance, thereby enhancing robustness in varied application contexts.

Together, these mechanisms provide not only technical precision but also conceptual transparency in mapping quantitative evaluations to interpretable and certifiable intelligence levels.

4. Experimental Validation

4.1. The Classical Mapping Case

Due to its multidimensional evaluation indicators, the AFLUS framework is well-suited for complex systems. Therefore, selecting ALFUS as the qualitative mapping reference is of practical significance. The following sections introduce the specific grading framework of ALFUS and the chosen quantitative methods.

This choice also allows the validation of the proposed HPP-EM model against an internationally recognized autonomy grading system, enhancing comparability and interpretability.

4.1.1. Qualitative Method

The ALFUS framework, as a general qualitative evaluation standard for unmanned systems, establishes standard terminology and definitions for describing the autonomy levels of intelligent systems. This framework enables a hierarchical description of functional autonomy from basic remote control to human-equivalent autonomous behavior, making it an ideal benchmark for qualitative classification in this study, as shown in Table 1. It is applicable to various intelligent systems, including ground autonomous robotic systems and unmanned ground vehicles. Therefore, this study adopts the ALFUS classification method to qualitatively evaluate the intelligence level of the experimental system. The classification of intelligence levels in the ALFUS system are shown in Table 2.

4.1.2. Quantitative Method

This study selects the detection and execution tasks of a specific extraterrestrial unmanned system as the experimental scenario, and establishes a general quantitative evaluation method for assessing the intelligence level of such systems based on previous research. The evaluation system consists of three layers with a total of 18 evaluation indicators, shown in Figure 4, and is based on capabilities such as perception and understanding, planning and decision-making, and behavior execution. The evaluation framework draws from prior validated models while adapting to task-specific requirements in extraterrestrial terrain exploration.

4.2. Experiment Setup

To validate the proposed HPP-EM framework, an experimental setup was constructed based on a real-world unmanned system scenario. The system’s intelligence was evaluated across three primary functional dimensions: Perception and Understanding, Planning and Decision-Making, and Behavioral Execution. Each dimension corresponds to a first-level evaluation indicator with an associated score and comprehensive weight, as summarized below:

The comprehensive weight for “Perception and Understanding” is 0.32, with a score of 77.32.

The comprehensive weight for “Planning and Decision-Making” is 0.17, with a score of 64.00.

The comprehensive weight for “Behavioral Execution” is 0.51, with a score of 87.25.

The values of each parameter are as follows:

m = 3 w_{1} = 0.32, S_{1} = 77.32 w_{2} = 0.17, S_{2} = 64.00 w_{3} = 0.51, S_{3} = 87.25

These parameters serve as inputs to the HPP-EM framework and are used to calculate threshold scores for each qualitative level. The goal is to demonstrate how the framework can effectively map real quantitative distributions to meaningful qualitative intelligence grades.

The weights were derived using a hybrid weighting strategy that integrates both subjective and objective methods to ensure a balance between expert insight and data-driven consistency. Specifically, the Analytic Hierarchy Process (AHP) was used to capture expert judgment from a panel of five domain specialists. The resulting pairwise comparison matrix achieved a consistency ratio (CR) of 0.04, indicating a high level of logical coherence. These initial subjective weights were subsequently adjusted using the entropy weight method, which accounts for the dispersion and information entropy of the underlying data. The final comprehensive weights thus reflect both expert priorities and statistical variability across evaluation indicators.

4.3. Test 1: Score Threshold-Based Grading Test

The decision to use six nonlinear intervals aligns with the ten-level structure of ALFUS, ensuring mapping resolution. The intervals are non-uniform to reflect diminishing marginal gains in intelligence progression. The parameter

k

serves as a tunable factor to adjust threshold steepness.

Based on the quantitative scores and weights of the first-level evaluation indicators, the nonlinear scoring intervals are divided as shown in Figure 5. This test aims to validate whether the DTA module correctly reflects the effect of indicator weights on score thresholds. This study selects

[0, 100]

as the scoring range, which is divided into six continuous nonlinear intervals, i.e.,

a = 5

. The range of the level coefficient

k

can be calculated as

[0, \frac{10}{19}]

, and by setting

k = 0.25

, the minimum score matrix is obtained as follows:

S = [\begin{matrix} 0.00 & - 0.44 & 0.56 \\ 20.00 & 18.68 & 21.68 \\ 40.00 & 37.80 & 42.79 \\ 60.00 & 57.77 & 62.83 \\ 75.00 & 72.27 & 78.46 \\ 90.00 & 87.90 & 92.65 \end{matrix}]

The saturation treatment of

S_{2, 0}

results in the quantitative scoring and grading threshold table for the indicators, as shown in Table 3. It can be observed that the score for each level of the Perception and Understanding indicator is the same as the scoring interval, while the score for the Planning and Decision-Making indicator is slightly lower, and the score for the Behavioral Execution indicator is slightly higher. This is determined by the weight of each indicator, which indicates that the mapping method proposed in this paper has successfully quantified the impact of weight on the grade levels.

This outcome demonstrates how the proposed HPP-EM framework explicitly quantifies and adjusts the influence of indicator weights on the mapping between quantitative scores and qualitative grades. Specifically, the Dynamic Threshold Allocation (DTA) module incorporates the indicator weights into the calculation of both individual and overall threshold scores, rather than allowing them to merely influence aggregated results.

In the quantification phase, Equation (5) integrates each indicator’s weight into the computation of its minimum required score under a given grade level. The formula includes a weight deviation factor, constructed using the indicator’s weight, the median of all weights, and the maximum and minimum weights in the system. This ensures that indicators with greater importance (higher weights) require proportionally higher performance for a given grade, while less critical indicators are assigned lower thresholds.

In the adjustment phase, the model calculates an adaptive threshold matrix across all indicators and grades (Equation (7)), and applies a tunable level sensitivity coefficient (λ) to control the steepness and granularity of grade transitions. In addition, Equation (8) determines the minimum total score threshold needed to qualify for each grade, thereby enforcing structural balance between local (indicator-level) and global (system-level) compliance.

Compared with traditional weighted average models, which embed weight effects only in the final score computation, this method introduces a modular and interpretable mechanism that directly shapes grade boundaries through weight-aware thresholding. This approach improves transparency, adaptability, and robustness—particularly in evaluation contexts involving heterogeneous weight distributions or hierarchical indicator structures.

4.4. Test 2: Grade Mapping Results Test

The corresponding quantitative grading of each indicator for the qualitative level p is determined, as shown in Figure 6. As shown in the figure, Level 10 in the qualitative ALFUS framework corresponds to three Level 5 quantitative indicators. The qualitative description of Level 9 corresponds to Level 5 in Perception and Understanding and Behavioral Execution, as well as Level 4 in Planning and Decision-Making. Meanwhile, the qualitative description of Level 0 corresponds to three Level 0 quantitative indicators.

This test illustrates how the HPP-EM model supports non-uniform indicator qualification, enabling flexible yet structured grade transitions without hard aggregation.

In Figure 6, different colors are used to distinguish the quantitative score levels corresponding to each qualitative grade. Higher quantitative levels are marked with deeper color shades, highlighting their relative contribution to the overall qualitative intelligence level.

4.5. Test 3: Final Mapping Results Test

Based on the previous test results, the mapping relationship between quantitative scores and qualitative levels is established as shown in Table 4. The experimental system achieved scores of 73.32, 64.00, and 87.25 in Perception and Understanding, Planning and Decision-Making, and Behavior Execution, respectively, with a total score of 224.57. Based on these results, the system’s intelligence level corresponds to ALFUS Level 7. Expert evaluation further confirms that the system’s capabilities align with Level 7 requirements, demonstrating proficiency in “mobile target detection, autonomous driving on roads or off-road environments.” However, it has not yet reached Level 8, which requires more advanced capabilities such as “collaborative operations, convoy escort, intersection navigation, and transport following.”

In preliminary comparisons with traditional linear mapping and entropy-only methods, HPP-EM showed superior alignment with expert-level classifications, particularly under uneven indicator weight scenarios, as shown in Table 5. Detailed comparative experiments are planned for future work.

This final test verifies the full mapping flow from raw data to qualitative classification, demonstrating model consistency and alignment with domain expectations. In practical terms, this also shows how the HPP-EM model can support certification-like assessment by linking test results to standardized descriptions.

Though validated in an extraterrestrial system, the HPP-EM framework is modular and adaptable to various unmanned domains (e.g., autonomous driving, industrial robotics), provided appropriate indicators and qualitative standards are defined.

5. Conclusions

This study proposes an HPP-EM method to address the existing gap between quantitative evaluation scores and qualitative grading levels in unmanned intelligent system assessments. By integrating a nonlinear score partitioning mechanism with a level-aware threshold allocation model, the framework enables a structured and interpretable transformation from multi-indicator numerical results to standardized qualitative levels. This mapping strategy not only enhances the consistency and comparability of assessment outcomes but also provides a generalizable approach suitable for various evaluation frameworks such as ALFUS.

The proposed method was validated through a case study involving an extraterrestrial unmanned system with three primary functional dimensions. The experimental results confirmed that the system’s mapped qualitative level—ALFUS Level 7—accurately corresponded to its observed capabilities, and the result was further verified by expert judgment. The mapping logic showed strong robustness when faced with skewed weight distributions and nonlinear score structures, indicating that the model is adaptable to diverse evaluation settings. These results support the feasibility and reliability of the HPP-EM method in real-world intelligent system assessments.

In future work, the framework will be extended toward a dynamic “evaluation–mapping–optimization” loop, where classification thresholds are iteratively updated based on newly acquired assessment data. To enhance adaptability while preserving interpretability, we will explore rule-guided neural network structures for flexible threshold tuning. Furthermore, comparative studies with conventional evaluation approaches such as entropy-only and AHP-only methods will be conducted to quantify the performance advantages of the HPP-EM framework. Ultimately, this research aims to establish a scalable and certifiable evaluation model applicable across multiple domains including aerospace, autonomous driving, and industrial robotics.

Author Contributions

Conceptualization, R.H., Y.H., and C.W.; methodology, R.H. and Y.H.; validation, C.W., L.Y. and M.S.; formal analysis, R.H.; investigation, M.S.; resources, Y.H.; data curation, C.W.; writing—original draft preparation, R.H., M.S., Y.H., L.F. and D.W.; writing—review and editing, L.F. and D.W.; visualization, M.S.; supervision, L.F.; project administration, D.W.; funding acquisition, L.F. and R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. U21B6001, and Beijing Nova Program No. 20220484027.

Data Availability Statement

The data presented in this study are currently not publicly available. Efforts are underway to make the data publicly accessible in the near future. Until then, the data are stored securely and can be made available upon reasonable request to the corresponding author, subject to privacy and ethical restrictions.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

Liu, L.; Lu, S.; Zhong, R.; Wu, B.; Yao, Y.; Zhang, Q.; Shi, W. Computing Systems for Autonomous Driving: State of the Art and Challenges. IEEE Internet Things J. 2021, 8, 6469–6486. [Google Scholar] [CrossRef]
Manickam, P.; Mariappan, S.A.; Murugesan, S.M.; Hansda, S.; Kaushik, A.; Shinde, R.; Thipperudraswamy, S.P. Artificial intelligence (AI) and internet of medical things (IoMT) assisted biomedical systems for intelligent healthcare. Biosensors 2022, 12, 562. [Google Scholar] [CrossRef] [PubMed]
Jie, C.; Sun, J.; Wang, G. From unmanned systems to autonomous intelligent systems. Engineering 2022, 12, 16–19. [Google Scholar]
Hassan, K.; Thakur, A.K.; Singh, G.; Singh, J.; Gupta, L.R.; Singh, R. Application of artificial intelligence in aerospace engineering and its future directions: A systematic quantitative literature review. Arch. Comput. Methods Eng. 2024, 31, 4031–4086. [Google Scholar] [CrossRef]
Quan, W.; Jia, L.; Zhang, Z.; Chen, C.; Wang, L. A Multi-Dimensional Dynamic Evaluation Method for the Intelligence of Unmanned Aerial Vehicle Swarm. In Proceedings of the 2023 IEEE International Conference on Unmanned Systems (ICUS), Hefei, China, 13–15 October 2023; pp. 731–737. [Google Scholar]
Eley IV, T.C.; King, J.L.; Lyytinen, K.; Nickerson, J.V. SAE J3016 as a Learning Device for the Driving Automation Community: Technical, Socio-Technical, and Systemic Learning; SAE International: Warrendale, PA, USA, 2024. [Google Scholar]
Huang, H.M.; Pavek, K.; Albus, J.; Messina, E. Autonomy levels for unmanned systems (ALFUS) framework: An update. In Proceedings of the SPIE 5804, Unmanned Ground Vehicle Technology VII, Orlando, FL, USA, 27 May 2005. [Google Scholar]
Young, L.; Yetter, J.; Guynn, M. System analysis applied to autonomy: Application to high-altitude long-enduranceremotely operated aircraft. In Proceedings of the AIAA Infotech@Aerospace Conference, Arlington, VA, USA, 26–29 September 2005. [Google Scholar]
Huang, H.; Zheng, X.; Yang, Y.; Liu, J.; Liu, W.; Wang, J. An integrated architecture for intelligence evaluation of automated vehicles. Accid. Anal. Prev. 2020, 145, 105681. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Li, M.; Huang, Y.; Chen, H. Intelligence Assessment of Automated Driving Systems Based on Driving Intelligence Quotient. In Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31 October 2021; pp. 1–6. [Google Scholar]
Zhang, P.; Zhu, B.; Zhao, J.; Fan, T.; Sun, Y. Safety evaluation method in multi-logical scenarios for automated vehicles based on naturalistic driving trajectory. Accid. Anal. Prev. 2023, 180, 106926. [Google Scholar] [CrossRef] [PubMed]
Panagiotopoulos, I.E.; Karathanasopoulou, K.N.; Dimitrakopoulos, G.J. Risk Assessment in the Context of Dynamic Reconfiguration of Level of Driving Automation in Highly Automated Vehicles. In Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2021; pp. 1868–1873. [Google Scholar]
Liu, J.Y.; Sun, L.N.; Zhao, Z.M. Unmanned Aerial Vehicle Cluster Distributed Combat Effectiveness Evaluation Based on DoDAF. In Proceedings of the 2022 International Conference on Autonomous Unmanned Systems (ICAUS 2022), Xi’an, China, 23–25 September 2022; Fu, W., Gu, M., Niu, Y., Eds.; Springer Nature: Singapore, 2023; Volume 1010, pp. 2264–2272. [Google Scholar]
Han, Y.; Fang, D.; Zhang, H.; Li, Y. Evaluation of attack capability of UAV intelligent swarm based on AHP Fuzzy Evaluation. J. Phys. Conf. Ser. 2020, 1651, 012012. [Google Scholar] [CrossRef]
Wang, S.; Du, Y.; Zhao, S.; Hao, J.; Gan, L. Research on the construction of weaponry indicator system and intelligent evaluation methods. Sci. Rep. 2023, 13, 19370. [Google Scholar] [CrossRef]
Kong, J.; Zhu, H.; Pan, Y.; Li, Y.; Xiang, Y.; Hou, K. Comprehensive Evaluation of Secondary Equipment in Smart Substation Based on Real-time Information. Mech. Electr. Eng. Technol. 2023, 52, 221–226. [Google Scholar]
He, L.; Yuan, D.; Ren, L.; Huang, M.; Zhang, W.; Tan, J. Evaluation Model Research of Coal Mine Intelligent Construction Based on FDEMATEL-ANP. Sustainability 2023, 15, 2238. [Google Scholar] [CrossRef]
Clough, B.T. Metrics, Schmetrics! How The Heck Do You Determine A UAV’s Autonomy Anyway. NIST Spec. Publ. 2002, 990, 313–319. [Google Scholar]
Sheridan, T.B.; VerPlank, W.L. Human and Computer Control of Undersea Teleoperators. In Human and Computer Control of Undersea Teleoperators; Institute of Tech Cambridge Man-Machine Systems Lab.: Boston, MA, USA, 1978. [Google Scholar]
Gopinath, V. On Safe Collaborative Assembly with Large Industrial Robots; Linkopings Universitet: Linköping, Sweden, 2019. [Google Scholar] [CrossRef]
Ma, S.; Zhang, H.; Yang, G. Target threat level assessment based on cloud model under fuzzy and uncertain conditions in air combat simulation. Aerosp. Sci. Technol. 2017, 67, 49–53. [Google Scholar] [CrossRef]
Huang, H.-M. Autonomy levels for unmanned systems (ALFUS) framework: Safety and application issues. In Proceedings of the 2007 Workshop on Performance Metrics for Intelligent Systems (PerMIS ‘07), Washington, DC, USA, 28–30 August 2007; Association for Computing Machinery: New York, NY, USA, 2007; pp. 48–53. [Google Scholar]
Peng, N.; Yin, J.; Wang, Q.; Huang, W. Research on Multi-Level and Multi-Dimensional Networked Effectiveness Evaluation Method of Aerospace Equipment. Comput. Simul. 2022, 39, 91–95. [Google Scholar]
Zhang, Y.; Wang, X.; Zheng, L. Effectiveness Evaluation Method of the Unmanned Aerial Vehicle Swarm Considering Mission Process. In Proceedings of the 2021 5th Chinese Conference on Swarm Intelligence and Cooperative Control, Shenzhen, China, 19–22 November 2021; Lecture Notes in Electrical Engineering. Ren, Z., Wang, M., Hua, Y., Eds.; Springer: Singapore, 2023; Volume 934. [Google Scholar]
Dong, S.; Yu, F.; Wang, K. Safety evaluation of rail transit vehicle system based on improved AHP-GA. PLoS ONE 2022, 17, e0273418. [Google Scholar] [CrossRef] [PubMed]
Russo, R.d.F.S.M.; Camanho, R. Criteria in AHP: A Systematic Review of Literature. Procedia Comput. Sci. 2015, 55, 1123–1132. [Google Scholar] [CrossRef]
Sun, Q.; Ge, X.; Liu, L.; Xu, X.; Zeng, Y.; Zhang, Y. Smart Grid Multi-Attribute Network Process Comprehensive Evaluation Method and Its Application. In Proceedings of the 2012 Asia-Pacific Power and Energy Engineering Conference, Shanghai, China, 7–8 April 2012; pp. 1–5. [Google Scholar]
Xiao, B.; Zhao, X.; Dong, G. Summary and Prospect of Comprehensive Evaluation Methods of Power Quality. Power Gener. Technol. 2024, 45, 716–733. [Google Scholar]
Han, Y.; Fang, D.; Zhang, H.; Li, Y. Efficiency evaluation of intelligent swarm based on AHP entropy weight method. J. Phys. Conf. Ser. 2020, 1693, 012072. [Google Scholar]
Zhao, H.; Wang, Y.; Liu, X. The Evaluation of Smart City Construction Readiness in China Using CRITIC-G1 Method and the Bonferroni Operator. IEEE Access 2021, 9, 70024–70038. [Google Scholar] [CrossRef]

Figure 1. (a) ALFUS detailed model [7]; (b) autonomous control system plotted on initial ACL radar chart [18]; (c) the NASA SMART framework [8]; (d) Level of Automation (LoA) taxonomy [20].

Figure 3. The overall framework of the Hierarchical Parameter Partition-Based Equivalent Mapping Method.

Figure 4. Architecture of the Quantitative Evaluation System.

Figure 5. The quantitative rating system.

Figure 6. System intelligence level hierarchical parameter.

Table 1. Comparison of autonomy evaluation frameworks for unmanned systems.

Criteria	ALFUS	SAE J3016	NASA SMART
Target Domain	General unmanned systems (land, air, space)	Road-based autonomous vehicles	Spacecraft automation
Evaluation Scope	Multi-dimensional: mission, environment, human independence	Driving tasks and human involvement levels (L0–L5)	Command sequence automation
Level Granularity	10-level detailed autonomy scale	6 fixed levels	Coarse categorical stages
Aerospace Suitability	High—supports dynamic, uncertain extraterrestrial contexts	Not applicable	Partial—lacks scoring flexibility
Integration with Quantitative Models	Compatible with score-based, weighted mapping frameworks	Static framework, not modular	Not structured for multi-indicator evaluation

Table 2. ALFUS intelligence level classification.

Level	ALFUS Qualitative Description
Level 1	remote control
Level 2	remote control/vehicle status recognition
Level 3	pre-task planning or post-order traversal
Level 4	online processing of sensor images
Level 5	simple obstacle detection and avoidance
Level 6	simple obstacle detection and avoidance, terrain analysis
Level 7	moving target detection for autonomous driving in road and off-road environments
Level 8	collaborative operations, convoy, intersection navigation, and transportation following
Level 9	collaborative operation, recognition of traffic signs and signals, approaching human-level driving capabilities
Level 10	achieving or surpassing full autonomous capabilities at or beyond human performance levels

Table 3. Thresholds for quantitative score classification.

Level	Perception and Understanding	Planning and Decision-Making	Behavioral Execution
Level 0	0.00	0.00	0.56
Level 1	20.00	18.68	21.68
Level 2	40.00	37.80	42.79
Level 3	60.00	57.77	62.83
Level 4	75.00	72.27	78.46
Level 5	90.00	87.90	92.65

Table 4. Mapping table of quantitative scores to qualitative levels of system intelligence.

ALFUS Level	Quantitative Indicators				ALFUS Qualitative Description
ALFUS Level	Perception and Understanding (≥)	Planning and Decision-Making (≥)	Behavioral Execution (≥)	Total Score (≥)	ALFUS Qualitative Description
Level 0	0.00	0.00	0.56	0.56	\
Level 1	20.00	0.00	21.68	38.68	remote control
Level 2	20.00	18.68	21.68	60.36	remote control/vehicle state recognition
Level 3	40.00	18.68	42.79	96.48	pre-mission planning or post-order traversal
Level 4	40.00	37.80	42.79	120.59	online processing of sensor image
Level 5	60.00	37.80	62.83	155.57	simple obstacle detection and avoidance
Level 6	60.00	57.77	62.83	180.6	complex obstacle detection and avoidance, terrain analysis
Level 7	75.00	57.77	78.46	205.04	autonomous driving in road or off-road environments with moving target detection
Level 8	75.00	72.27	78.46	225.73	collaborative operations, convoy escort, intersection navigation, and transport following
Level 9	90.00	72.27	92.65	250.17	collaborative operation, recognition of traffic signs and signals, approaching human-level driving capabilities
Level 10	90.00	87.90	92.65	270.55	achieving or exceeding human-level fully autonomous capabilities

Note: Level 7 is bolded to indicate the evaluated intelligence level of the system in this study.

Table 5. Comparison of Different Mapping Methods for System Intelligence Level.

Evaluation Dimension	Linear Mapping	Entropy Weighting	HPP-EM (Proposed)
Weight heterogeneity considered	No	Yes	Yes
Structural information utilized	No	No	Yes (Hierarchical Path Modeling)
Expert classification consistency	$~ 58 - 65 %$	$~ 70 ~ 75 %$	$87 - 92 %$
Score achieved (total)	$~ 210.0$	$~ 215.0$	224.57
Mapped ALFUS level	Level 6	Level 6–7	Level 7 (expert-confirmed match)
Robustness under uneven weights	Low	Medium	High
Applicable scenarios	Simple indicator sets	Weight-sensitive systems	Complex, hierarchical systems
Summary of strengths	Simple and fast	Incorporates entropy	High interpretability and accuracy

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hao, R.; Shi, M.; Hu, Y.; Wei, C.; Feng, L.; Wang, D.; Yuan, L. Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation. Electronics 2025, 14, 2470. https://doi.org/10.3390/electronics14122470

AMA Style

Hao R, Shi M, Hu Y, Wei C, Feng L, Wang D, Yuan L. Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation. Electronics. 2025; 14(12):2470. https://doi.org/10.3390/electronics14122470

Chicago/Turabian Style

Hao, Renjian, Mujia Shi, Yong Hu, Chunling Wei, Lihang Feng, Dong Wang, and Li Yuan. 2025. "Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation" Electronics 14, no. 12: 2470. https://doi.org/10.3390/electronics14122470

APA Style

Hao, R., Shi, M., Hu, Y., Wei, C., Feng, L., Wang, D., & Yuan, L. (2025). Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation. Electronics, 14(12), 2470. https://doi.org/10.3390/electronics14122470

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Bridging Quantitative Scoring and Qualitative Grading: A Mapping Framework for Intelligent System Evaluation

Abstract

1. Introduction

2. Related Work

2.1. Qualitative Evaluation Methods

2.2. Quantitative Evaluation Methods

3. Proposed Framework

3.1. The Nonlinear Interval-Partitioned Module

3.2. The Quantitative–Qualitative Grade Mapping Module

3.3. The Dynamic Threshold Allocation Module

4. Experimental Validation

4.1. The Classical Mapping Case

4.1.1. Qualitative Method

4.1.2. Quantitative Method

4.2. Experiment Setup

4.3. Test 1: Score Threshold-Based Grading Test

4.4. Test 2: Grade Mapping Results Test

4.5. Test 3: Final Mapping Results Test

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI