Robot Performance Evaluation for Engineering Applications: A Systematic Review of Metrics, Methods and Practices

Wei, Xiang; Peng, Songjie; Zhao, Baosheng

doi:10.3390/technologies14050297

Open Access Systematic Review


by Xiang Wei ^1,2, Songjie Peng ^2,3 and Baosheng Zhao ^1,*

¹ School of Mechanical Engineering and Automation, Liaoning University of Science and Technology, Anshan 114051, China

² State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China

³ University of Chinese Academy of Sciences, Beijing 100049, China

* Author to whom correspondence should be addressed.

Technologies 2026, 14(5), 297; https://doi.org/10.3390/technologies14050297

Submission received: 26 March 2026 / Revised: 9 May 2026 / Accepted: 10 May 2026 / Published: 12 May 2026


Abstract

Robotics integration across manufacturing, healthcare, and hazardous environments demands robust performance evaluation. This study proposes a comprehensive Task–Environment–System–Metric (TESM) framework that links operational tasks and environmental constraints with quantifiable metrics. Based on TESM, a multi-level evaluation system is established, covering kinematic/dynamic performance, perception, human–robot interaction (HRI), reliability, and lifecycle economics. We systematically review key evaluation methodologies, including mechanistic modeling, digital twin simulation, physical testing, and multi-criteria decision-making (MCDM). Furthermore, typical engineering applications, ranging from industrial manipulators and mobile robots to collaborative and field systems, are analyzed to demonstrate practical implementation. Despite significant progress, challenges persist regarding unified standards, testing fidelity, and the “black box” nature of data-driven assessments in safety-critical scenarios. This review concludes by identifying future research directions, such as establishing benchmark testing platforms, improving lifecycle assessment schemes, and developing modular evaluation tools. These advances aim to ensure the scalable and reliable deployment of robotic systems in complex engineering environments.

Keywords:

robot performance evaluation; TESM framework; benchmarking; digital twin; multi-criteria decision-making; human–robot interaction

1. Introduction

Robotic systems have penetrated deeply into fields such as intelligent manufacturing, logistics and transportation, healthcare, and extreme field exploration, and their operational boundaries in complex, dynamic environments are constantly expanding [1]. This broadening of application scenarios means that autonomous perception, decision-making, intelligence, and collaborative safety are no longer optional enhancements but the core foundation for large-scale deployment. Against this backdrop, constructing systematic performance evaluation methods has become a key link between technology verification and engineering application [1,2,3,4,5]. From an engineering perspective, performance evaluation runs through the entire lifecycle of a robotic system: in the design phase, performance indicators guide the optimization of structural topology, drive schemes, and control strategies [6]; in the prototyping and calibration phase, test results directly determine whether the product meets standards and user requirements [7]; in the deployment phase, evaluation results inform process optimization and system configuration decisions [6,7,8]; and in the maintenance phase, performance degradation monitoring is crucial for ensuring system reliability and safety [9]. Establishing a systematic, repeatable, and engineering-oriented performance evaluation framework therefore has both scientific and practical engineering value.

Significant progress has been made in robot performance evaluation, and several representative research directions have emerged. For industrial robots, existing research covers measurement systems for geometric accuracy, dynamic performance, stiffness, and trajectory tracking error, and some of the results have been incorporated into international and industry standards [4,10,11,12]. In contrast, research on mobile and service robots focuses on navigation accuracy, obstacle avoidance, and human–robot interaction. Furthermore, as robot systems become more intelligent, data-driven performance prediction, health monitoring, and fault diagnosis methods are becoming new research hotspots.

Despite these advances, a comprehensive framework for addressing evaluation challenges in complex engineering environments is still lacking. Existing research faces several significant limitations. First, performance indicator systems are fragmented and lack uniformity, which makes coordination across scenarios and robot types difficult [13]. Second, evaluation methods often deviate from actual engineering environments and fail to reveal performance under complex working conditions and long-term operation [14]. Third, research on multi-criteria integrated evaluation and engineering decision support is insufficient, and a unified performance trade-off framework is missing [2,13]. Fourth, the credibility, interpretability, and verifiability of data-driven models in safety-critical tasks require further in-depth research [15,16]. This paper therefore systematically reviews and reconstructs robot performance evaluation from an engineering application perspective. Starting from the macroscopic perspective of “task–environment–system–metrics,” we clarify the mapping relationships between these levels and construct a practical metric generation path. Unlike previous studies, this paper focuses on robustness evaluation under uncertain environments by combining mechanistic modeling, simulation analysis, and physical testing, and it also looks ahead to future trends in standardized testing platforms and full lifecycle evaluation.

The remainder of this paper is structured as follows. Section 2 presents a bibliometric analysis of the literature on robot performance evaluation. Section 3 proposes an overall framework for robot performance evaluation from an engineering perspective and constructs a multi-level, multi-dimensional performance metric system. Section 4 reviews modeling methods and experimental testing means in robot performance evaluation, analyzing the applicability and characteristics of different methods. Section 5 summarizes application cases and lessons learned from the performance evaluation of typical robot systems in engineering practice. Section 6 discusses recent progress in data-driven and intelligent technologies and reviews the development status and trends of relevant standard systems and open testing platforms. Finally, Section 7 concludes the paper and offers an outlook on future research directions.

2. Bibliometric Analysis

To articulate the state of the art in robot performance evaluation, this systematic review was conducted in accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines; the PRISMA checklist is provided in the Supplementary Materials. A systematic search was performed across the Web of Science Core Collection, IEEE Xplore, and Scopus, using keywords such as “robot kinematic performance” and “performance measurement,” and covering the period from January 2000 to December 2025. The search strategy was designed to capture three main aspects: robot types, evaluation methods, and application scenarios. The core search string was: TS = (“robot*” OR “manipulator” OR “mobile robot*”) AND TS = (“performance evaluation” OR “performance metric*” OR “benchmark*” OR “standard*”) AND TS = (“engineering application*” OR “manufacturing” OR “service”). Traditional risk-of-bias tools (such as Cochrane RoB) are designed for clinical trials and do not fit engineering reviews well. Quality screening therefore relied primarily on peer-review status, the presence of physical experimental validation (as opposed to pure simulation), and relevance to industrial standards (such as ISO or GB/T). The literature identification and screening process is illustrated in the PRISMA flow diagram (Figure 1). After irrelevant records were excluded, 903 valid records were retained for further analysis. CiteSpace was then used to perform co-occurrence analysis, clustering, and burst detection to map the knowledge domain. For the co-occurrence analysis, the time slicing spanned January 2000 to December 2025 with one year per slice, and the node types were set to “Keyword” and “Reference”. The g-index (k = 25) was used to select the representative items in each slice. To make the visualization clearer and reduce network clutter, Pathfinder network scaling was applied together with the “Pruning sliced networks” option. This setup helped to identify the main structural paths and boundaries of different research clusters. To ensure the rigor of the selection process, two researchers independently screened the titles and abstracts of the retrieved records, with any discrepancies resolved through consensus. From the included studies, we extracted key data points, including robot categories, evaluation dimensions (e.g., kinematics, perception, and HRI), and specific performance metrics, to support the construction of the TESM framework. Since this review focuses on engineering technical frameworks and qualitative methodological synthesis rather than clinical intervention outcomes, a formal risk-of-bias assessment at the individual study level was not performed.
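The per-slice selection rule can be illustrated with a short sketch. We assume here the modified g-index as we understand it from the CiteSpace documentation: with citation counts c_i ranked in descending order, the largest g satisfying g² ≤ k · Σ_{i ≤ g} c_i is kept, where k is the scaling factor (k = 25 in this review). The function name and toy counts are hypothetical, and the selection is capped at the number of items in the slice.

```python
def g_index_selection(citations, k=25):
    """Number of top-cited items kept in one slice under the assumed rule
    g^2 <= k * (sum of the g largest citation counts)."""
    ranked = sorted(citations, reverse=True)
    selected = 0
    cumulative = 0
    for g, count in enumerate(ranked, start=1):
        cumulative += count
        if g * g <= k * cumulative:
            selected = g  # keep the largest g that satisfies the rule
    return selected


# Hypothetical citation counts of the items in one one-year slice
slice_counts = [3, 1, 0, 0]
print(g_index_selection(slice_counts, k=1))   # classic g-index -> 2
print(g_index_selection(slice_counts, k=25))  # larger k keeps more items -> 4
```

A larger k relaxes the threshold, so more items per slice enter the network; this is why k acts as a density control alongside the Pathfinder pruning.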

Through a systematic examination of the keyword network and burst terms, the evolution of research directions can be delineated. Figure 2 provides a comprehensive visualization of the primary research themes, where node size reflects the contribution of different groups.

As shown in Figure 2a, the cluster analysis highlights several core themes, including “adaptive control,” “SLAM,” and “AI.” These clusters indicate that robot performance evaluation is inherently a cross-disciplinary field. Notably, “adaptive control” forms the largest cluster, underscoring its predominant role in reliability verification. Figure 2b illustrates the international collaboration network: China and the USA hold dominant positions in terms of publication volume and network centrality, while countries such as Korea, Japan, and Germany also demonstrate strong research capabilities, reflecting a highly internationalized pattern of collaboration. Meanwhile, the burst detection analysis in Figure 2c identifies keywords that received significantly increased attention from 2015 to 2025. High-intensity terms such as “tracking” and “task analysis” suggest that the evaluation focus is shifting from basic stability metrics toward comprehensive performance analysis in complex scenarios.
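Burst detection in CiteSpace is based on Kleinberg's algorithm. The sketch below is a simplified, batched two-state variant written only for illustration, not CiteSpace's implementation: it assumes per-year counts r[t] of records containing a keyword out of d[t] records, a burst-state rate s times the baseline rate, and a cost gamma · ln(n) for entering the burst state. The parameter values and counts are hypothetical.

```python
import math


def _neg_log_binomial(r, d, p):
    """Cost of observing r matching records out of d at rate p (negative log-likelihood)."""
    return -(math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
             + r * math.log(p) + (d - r) * math.log(1.0 - p))


def kleinberg_two_state(r, d, s=2.0, gamma=1.0):
    """Return a 0/1 state per time slice (1 = burst) via Viterbi decoding."""
    n = len(r)
    p0 = sum(r) / sum(d)                  # baseline rate over the whole period
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline rate must lie strictly between 0 and 1")
    rates = (p0, min(s * p0, 0.9999))     # base state and burst state
    up_cost = gamma * math.log(n)         # penalty for entering the burst state

    def cost(state, t):
        return _neg_log_binomial(r[t], d[t], rates[state])

    # Start in the base state, so being in state 1 at t = 0 already pays up_cost
    best = [cost(0, 0), up_cost + cost(1, 0)]
    backpointers = []
    for t in range(1, n):
        new_best, pointers = [], []
        for j in (0, 1):
            # Moving up (0 -> 1) is penalized; staying or moving down is free
            candidates = [best[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if candidates[0] <= candidates[1] else 1
            new_best.append(candidates[i_best] + cost(j, t))
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)

    # Backtrack the minimum-cost state sequence
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        states.append(state)
    return states[::-1]


# Hypothetical yearly counts: 100 records per year, keyword surges in years 5-7
totals = [100] * 10
hits = [2, 2, 2, 2, 10, 12, 11, 2, 2, 2]
print(kleinberg_two_state(hits, totals))  # -> [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Raising gamma makes entering a burst more expensive, so only stronger or longer surges are flagged.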

The temporal evolution of publication volume, depicted in Figure 3, shows minimal fluctuation prior to 2004. Since 2005, however, publication volume has increased steadily, peaking in the last four years. This momentum indicates that robot performance evaluation has established itself as a focal area of research.

3. Framework Formulation and Hierarchical Metric System

3.1. The Task–Environment–System–Metric (TESM) Framework

From the perspective of engineering applications, robot performance evaluation is inherently a closed-loop process constituted by “Task Requirements–Environmental Conditions–System Capabilities–Evaluation Metrics” [17,18,19,20]. In contrast to traditional device testing, which focuses solely on the parameters of the robot body, performance evaluation oriented towards actual engineering scenarios emphasizes task-driven factors and environmental constraints. Spielberg et al. [21] and Tahir et al. [22] argued that “what performance to evaluate, which metrics to employ, and how to interpret the results” can only be rationally defined after clarifying “what task the robot needs to perform” and “in what environment the task is executed.”

In terms of overall architecture, robot performance evaluation typically encompasses four core elements. First, the Task. This describes the functions and objectives the robot must achieve in specific engineering scenarios [23], such as precision assembly by industrial manipulators, autonomous transport by mobile robots in warehousing, or cooperative screw-driving by collaborative robots. Tasks are generally characterized by a set of Key Performance Requirements (KPRs) [24], including positioning accuracy, cycle time, payload capacity, and safety levels. Second, the Environment. Environmental conditions—ranging from physical and operational constraints to information network layouts—directly dictate the reachable workspace, sensor measurement quality, and control algorithm effectiveness, serving as a critical source of variance in performance evaluation outcomes. Third, the System. As Miller et al. [25] stated, the system comprises not only the robot body (mechanical structure, drive units, controllers, and sensors) but also end-effectors, fixtures, external measurement systems, and upper-level scheduling/execution systems. In engineering applications, the object of performance evaluation is often the holistic “Robot + Process + Integration” system rather than the standalone robot. Fourth, the Metric. Metrics are formal representations of “task completion and system behavior,” used to quantitatively or semi-quantitatively measure performance in aspects such as accuracy, speed, stiffness, stability, safety, reliability, energy consumption, and economics [26]. Rational metrics must reflect the essential requirements of the task while possessing measurability, repeatability, and comparability.

Based on these four elements, a Task–Environment–System–Metric (TESM) framework is articulated. Starting from task requirements and given environmental conditions, the capabilities required for the robot system are analyzed to select or design matching performance metrics and evaluation methods. The evaluation results are then fed back into system design, process planning, and maintenance strategies, achieving a closed loop from “evaluation” to “optimization.” This architecture provides a unified logical starting point for the subsequent construction of the indicator system.
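To make the closed loop concrete, the sketch below encodes the four TESM elements as plain Python data classes and checks measured metrics against a task's KPRs; unmet or unmeasured KPRs are returned as feedback items for design, process planning, or maintenance. All class names, fields, and numbers are hypothetical illustrations, not a published TESM implementation.

```python
from dataclasses import dataclass, field


@dataclass
class KPR:
    """Key performance requirement: a named metric with a limit."""
    metric: str
    limit: float
    higher_is_better: bool = False   # e.g. positioning error: lower is better


@dataclass
class Task:
    name: str
    kprs: list


@dataclass
class Environment:
    name: str
    conditions: dict = field(default_factory=dict)   # e.g. temperature, lighting


@dataclass
class System:
    name: str
    components: list = field(default_factory=list)   # robot + tooling + integration


def evaluate(task, environment, system, measurements):
    """Compare measured metrics with the task's KPRs under one environment.

    Returns (passed, feedback), where feedback lists unmet or unmeasured KPRs.
    """
    feedback = []
    for kpr in task.kprs:
        value = measurements.get(kpr.metric)
        if value is None:
            feedback.append(f"{kpr.metric}: not measured")
            continue
        ok = value >= kpr.limit if kpr.higher_is_better else value <= kpr.limit
        if not ok:
            feedback.append(f"{kpr.metric}: {value} violates limit {kpr.limit} "
                            f"({system.name} in {environment.name})")
    return not feedback, feedback


# Hypothetical precision-assembly task evaluated in one environment
assembly = Task("precision assembly", [
    KPR("position_error_mm", 0.05),
    KPR("cycle_time_s", 12.0),
    KPR("success_rate", 0.99, higher_is_better=True),
])
cell = Environment("assembly cell", {"temperature_C": 25})
robot_cell = System("manipulator + gripper + fixture", ["arm", "gripper", "fixture"])
passed, notes = evaluate(assembly, cell, robot_cell,
                         {"position_error_mm": 0.04, "cycle_time_s": 13.5,
                          "success_rate": 0.995})
print(passed, notes)
```

Here only the cycle-time KPR fails, so the feedback points the optimization step at process planning rather than at the robot's accuracy.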

To clarify the methodological contribution of the TESM framework, Table 1 compares it with existing evaluation models. Traditional standards (e.g., ISO 9283 [27]) focus on kinematic precision in controlled settings, and systems engineering models emphasize developmental lifecycles, whereas TESM explicitly couples dynamic environmental factors with specific task requirements. This makes it better suited to evaluating modern AI-driven autonomy in unstructured scenarios.

In actual engineering projects, robot performance is subject to the combined influence of multi-layered factors. To facilitate analysis and optimization, performance should be decomposed hierarchically so that appropriate evaluation means and improvement measures can be applied at each level. Robot performance is generally divided into three levels: component-level, system-level, and task-level [14].

(1) Component-level Performance. This level primarily concerns the performance of individual parts or subsystems, such as the speed/torque characteristics of servo motors, response bandwidth of drives, backlash and stiffness of reducers, and sensor resolution. Khan et al. [30], in validating their hardware-in-the-loop simulation (HILS) testing system, noted that component-level evaluation is typically conducted under laboratory conditions using standardized testing fixtures and procedures, with the aim of verifying compliance with design specifications. Although superior component performance is the foundation of high system performance, it does not by itself guarantee success at the task level.

(2) System-level Performance. This level focuses on the robot body and its integrated overall behavior, such as absolute positioning accuracy and positioning repeatability, end-effector trajectory tracking error, workspace range, and overall stiffness/vibration characteristics. For mobile robots, this includes path tracking accuracy, pose estimation error, maximum traversable speed, and system stability. System-level evaluation is usually conducted in standardized or semi-standardized test environments, combining mechanistic modeling, simulation analysis, and physical measurement. Results at this level reflect both the aggregated effect of component performance and the quality of structural design, control algorithms, and system integration [28].

(3) Task-level Performance. This level is directly oriented towards specific application tasks and production/service goals, focusing on the comprehensive performance of the robot system under real or near-real operating conditions [15]. For instance, Nilakantan et al. [31] proposed a time-based model focusing on cycle time, task success rate, defect rate, downtime, and energy consumption per unit on assembly lines; Wang et al. [32] designed a hybrid human–robot collaborative manufacturing system, emphasizing collaborative efficiency, operator load, and subjective comfort; whereas Brosque et al. [33] introduced a robot evaluation framework for construction innovators, highlighting task completion rates, environmental adaptability, and safety accident rates on-site. Task-level evaluation typically necessitates field data, process monitoring, and statistical analysis. Although it is significantly affected by environmental fluctuations, process changes, and human operation, it most accurately reflects the degree to which the system satisfies engineering requirements.
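As an example of a system-level metric, ISO 9283 defines pose accuracy and pose repeatability from n repeated visits to a commanded pose. The sketch below computes only the positional parts: AP_p, the distance between the barycenter of the attained positions and the commanded position, and RP_l = l̄ + 3S_l, where l_j is the distance of each attained position from the barycenter and S_l is the sample standard deviation of the l_j. Orientation terms and the standard's full test procedure are omitted, and the readings are hypothetical.

```python
import math


def iso9283_position_metrics(attained, commanded):
    """Positional pose accuracy AP_p and repeatability RP_l (same units as input).

    attained:  list of (x, y, z) positions measured over n repeated cycles
    commanded: the (x, y, z) command position
    """
    n = len(attained)
    if n < 2:
        raise ValueError("at least two repeated measurements are required")
    # Barycenter of the attained positions
    bary = [sum(p[i] for p in attained) / n for i in range(3)]
    ap = math.dist(bary, commanded)
    # Distance of each attained position from the barycenter
    l = [math.dist(p, bary) for p in attained]
    l_mean = sum(l) / n
    s_l = math.sqrt(sum((lj - l_mean) ** 2 for lj in l) / (n - 1))
    rp = l_mean + 3.0 * s_l
    return ap, rp


# Hypothetical laser-tracker readings (mm) for 5 cycles to one commanded pose
readings = [(500.02, 0.01, 300.00), (500.03, -0.01, 300.01),
            (500.01, 0.00, 299.99), (500.02, 0.02, 300.00),
            (500.02, -0.02, 300.00)]
ap, rp = iso9283_position_metrics(readings, (500.0, 0.0, 300.0))
print(f"AP_p = {ap:.4f} mm, RP_l = {rp:.4f} mm")
```

Separating the barycenter offset (accuracy) from the scatter around it (repeatability) is what allows calibration to fix the former while the latter bounds what calibration can achieve.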

Notably, there exists a bottom-up propagation and top-down constraint relationship among these three levels. Component performance influences system behavior through integration and control strategies, while task-level metrics impose constraints and optimization directions on system and component levels. Wang et al. [34] cautioned in a quality assurance report that blindly pursuing extreme component precision may lead to a significant increase in cost and maintenance complexity, potentially reducing the cost-effectiveness of the system for specific tasks. Consequently, the construction of a performance evaluation system requires balancing synergy and trade-offs between different levels.

3.2. Mechanism for Mapping Performance Requirements to Evaluation Metrics

To ensure that performance evaluation effectively serves engineering decision-making, it is essential to systematically analyze performance requirements from specific application scenarios and map them into measurable, actionable evaluation metrics and testing schemes. The procedure typically follows these steps:

(1) Scenario and Task Analysis. First, the target engineering scenario is thoroughly investigated and modeled to clarify the system’s operation mode, workflow, and upstream/downstream interfaces. Duflou et al. [29] summarized methodologies for analyzing manufacturing cell flows, arguing that in discrete manufacturing it is necessary to understand takt times, workpiece types, and their fluctuation ranges. Similarly, Tulli et al. [35], in discussing order fulfillment efficiency, indicated that warehouse logistics must consider layout, traffic flow, and peak loads. Based on scenario analysis, key tasks (e.g., grasping, transporting, assembly, inspection, collaboration) are decomposed, and Critical Success Factors (CSFs) for each task are identified.

(2) Performance Requirement Extraction. Based on task decomposition, user needs and process requirements are translated into performance requirements, such as positioning accuracy limits, maximum cycle times, allowable downtime rates, safety levels, and environmental adaptability ranges. This process necessitates the involvement of equipment suppliers, system integrators, and end-users to formulate a consensus on performance requirements through analysis meetings and site surveys.

(3) Metric and Measurement Method Mapping. According to performance requirements, corresponding evaluation metrics are selected or designed, explicitly defining the physical meaning, units, reference standards, and measurement methods for each. In validating a vision system, Jiang et al. [36] mapped “hole-shaft mating quality and cycle stability” to specific metrics like “end-effector pose accuracy, insertion force curve characteristics, cycle time distribution, and yield rate.” Zanchettin et al. [37] and Costanzo et al. [38] concretized “collaborative safety” into metrics such as “maximum contact force, contact energy, minimum safety distance, and human–robot collaborative efficiency.” In this process, relevant international/industry standards and existing benchmarks must be referenced to ensure comparability and acceptability.

(4) Determination of Indicator Priority and Weights. Given the coexistence of multiple metrics, different engineering projects often place varying degrees of emphasis on accuracy, efficiency, safety, and cost. Multi-criteria decision-making methods, such as expert scoring, Analytic Hierarchy Process (AHP), and the Delphi method, are employed to quantify the importance of each indicator, forming a weight distribution or priority ranking. This not only aids in comprehensive scoring during the evaluation phase but also provides a basis for design optimization and scheme comparison.

(5) Evaluation Scheme and Closed-loop Application. Upon completing the requirement-metric mapping, a specific evaluation scheme is formulated, including test condition settings, sample size, experimental procedures, data acquisition/processing methods, and acceptance criteria. Evaluation results serve to verify whether the system meets requirements and are fed back into the design and operation and maintenance (O&M) phases to localize and rectify performance issues. For systems in long-term operation, online monitoring and periodic evaluation mechanisms can be established to close the loop of performance management.
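Step (4) can be illustrated with AHP. The sketch below derives weights as the principal eigenvector of a pairwise comparison matrix (computed by power iteration) and reports Saaty's consistency ratio CR = CI/RI, with CI = (λ_max − n)/(n − 1) and the commonly tabulated random index RI; a CR below about 0.1 is conventionally treated as acceptable. The four criteria and the judgment values are hypothetical.

```python
# Saaty's random consistency index for matrix sizes 1-9
SAATY_RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Principal-eigenvector weights and consistency ratio of a pairwise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):                        # power iteration
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [x / total for x in product]
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = SAATY_RI[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


criteria = ["accuracy", "efficiency", "safety", "cost"]
# Hypothetical judgments: entry [i][j] = importance of criterion i relative to j
judgments = [
    [1, 3, 1 / 2, 4],
    [1 / 3, 1, 1 / 4, 2],
    [2, 4, 1, 5],
    [1 / 4, 1 / 2, 1 / 5, 1],
]
weights, cr = ahp_weights(judgments)
for name, w in zip(criteria, weights):
    print(f"{name:<10} {w:.3f}")
print(f"consistency ratio = {cr:.3f} (commonly accepted if below 0.1)")
```

If the CR is too high, the judgments are revisited with the stakeholders before the weights are used for comprehensive scoring.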

Through this structured process, abstract “engineering requirements” are systematically transformed into quantifiable performance metrics and evaluation workflows, ensuring that robot performance evaluation transcends theoretical analysis or laboratory testing to be truly embedded in the full lifecycle of engineering projects. This also provides a clear application background and logical foundation for the subsequent construction of the indicator system, modeling, and testing methods.
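The online monitoring and periodic evaluation mentioned in step (5) can be sketched with an exponentially weighted moving average (EWMA) control chart, a standard statistical process control tool used here purely as an illustration. The in-control mean and standard deviation, the smoothing factor λ = 0.2, the limit width L = 3, and the repeatability series are all assumptions.

```python
import math


def ewma_first_alarm(samples, mu0, sigma, lam=0.2, width=3.0):
    """Index of the first sample whose EWMA leaves the control limits, or None."""
    z = mu0
    for i, x in enumerate(samples):
        z = lam * x + (1.0 - lam) * z
        t = i + 1
        # Time-varying half-width of the EWMA control limits
        half_width = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if abs(z - mu0) > half_width:
            return i
    return None


# Hypothetical periodic repeatability checks (mm): stable, then a step degradation
history = [0.020] * 10 + [0.035] * 10
print(ewma_first_alarm(history, mu0=0.020, sigma=0.005))  # -> 11
```

Because the EWMA accumulates small shifts, it flags gradual degradation (e.g., reducer wear) earlier than checking each measurement against a fixed acceptance limit.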

3.3. Construction of the Multi-Level Performance Indicator System

To evaluate overall performance systematically, the multi-level indicator system proposed in this section encompasses kinematics/dynamics, perception, human–robot interaction (HRI), and reliability. Its construction balances objective task requirements against the mutual influences and trade-offs among metrics, so that multi-dimensional goals can be evaluated in an engineering-feasible way.

Table 2 categorizes the key performance indicators of robot systems into four dimensions: kinematics and dynamics, perception and localization, human–robot interaction and safety, and reliability and economy. Each indicator is defined to support quantitative evaluation in engineering scenarios, and example application scenarios are given for reference.
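For the perception and localization dimension, a common way to quantify pose estimation error is the root-mean-square absolute trajectory error (ATE). The minimal sketch below assumes that the estimated and ground-truth 2D positions are already time-synchronized and expressed in the same frame; the usual rigid-body alignment step is deliberately omitted, and the trajectories are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMS position error between matched (x, y) samples of two trajectories."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [math.dist(e, g) ** 2 for e, g in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
estimated = [(0.0, 0.1), (1.0, -0.1), (2.1, 0.0), (2.9, 0.0)]
print(f"ATE RMSE = {ate_rmse(estimated, ground_truth):.3f} m")  # -> 0.100 m
```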

4. Modeling, Simulation, and Experimental Validation Methodologies

Robot performance evaluation requires not only the precise characterization of the system’s physical behaviors across various levels but also the consideration of engineering feasibility and cost constraints. Consequently, research extensively employs a comprehensive methodology encompassing mechanism-based modeling, simulation analysis, physical experimental validation, uncertainty quantification, and multi-criteria decision-making (MCDM). Based on the metric system established in the preceding sections, this section reviews the major existing methods from the perspectives of modeling and experimentation and discusses their applicability and limitations in engineering applications.

4.1. Mechanism-Based Performance Analysis

Mechanism-based analysis methods are founded on the kinematic and dynamic equations of robotic systems. By explicitly modeling structural parameters, control parameters, and error sources, these methods derive quantitative relationships between performance metrics and design variables [68]. Their core philosophy is to support structural design optimization, parameter calibration, and compensation strategies by analyzing key metrics, such as positioning accuracy, trajectory tracking error, stiffness, payload capacity, and energy consumption, with high physical interpretability. Specifically, mechanism-based modeling encompasses three core dimensions. First, at the kinematic level, researchers typically establish precise forward/inverse kinematic equations and Jacobian matrices to express explicitly the sensitivity of the end-effector pose to joint variables and geometric parameters. Building on this, geometric error models are constructed to analyze how link length deviations and joint zero-position offsets propagate to end-effector accuracy [69]. Second, at the dynamic level, models based on the Lagrangian or Newton–Euler formulations describe the coupling between mass distribution, inertia tensors, and drive characteristics. This facilitates the evaluation of joint torque requirements, system response bandwidth, and energy consumption under given trajectories [70]. Third, for heavy-payload or precision machining applications, statics and stiffness modeling are indispensable [71]. By combining structural flexibility theory with contact dynamics, equivalent stiffness and modal characteristic models are constructed, allowing the impact of force-induced deformation and vibration on trajectory accuracy and surface quality to be evaluated theoretically.
Table 3 systematically summarizes the theoretical foundations, key error sources, and corresponding engineering evaluation metrics for these three dimensions.
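The kinematic error model can be illustrated on a planar two-link (2R) arm, an assumed example not taken from the cited works. With x = L1 cos q1 + L2 cos(q1 + q2) and y = L1 sin q1 + L2 sin(q1 + q2), a first-order model maps joint zero offsets δq and link length deviations δL to end-effector error, δp ≈ J_q δq + J_L δL. The sketch compares this linear prediction with the exact deviation; the geometry and error magnitudes are hypothetical.

```python
import math


def fk(q1, q2, l1, l2):
    """Forward kinematics of a planar 2R arm: end-effector (x, y)."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def first_order_error(q1, q2, l1, l2, dq1, dq2, dl1, dl2):
    """Linearized end-effector error dp = J_q dq + J_L dL."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    # Joint Jacobian J_q: sensitivity to joint zero offsets
    dx = (-l1 * s1 - l2 * s12) * dq1 + (-l2 * s12) * dq2
    dy = (l1 * c1 + l2 * c12) * dq1 + (l2 * c12) * dq2
    # Geometric Jacobian J_L: sensitivity to link length deviations
    dx += c1 * dl1 + c12 * dl2
    dy += s1 * dl1 + s12 * dl2
    return dx, dy


# Hypothetical nominal configuration (m, rad) and small geometric errors
q1, q2, l1, l2 = 0.6, -0.9, 0.5, 0.3
errors = dict(dq1=2e-4, dq2=-1e-4, dl1=5e-5, dl2=-3e-5)
nominal = fk(q1, q2, l1, l2)
actual = fk(q1 + errors["dq1"], q2 + errors["dq2"],
            l1 + errors["dl1"], l2 + errors["dl2"])
exact = (actual[0] - nominal[0], actual[1] - nominal[1])
linear = first_order_error(q1, q2, l1, l2, **errors)
print("exact:", exact, "first-order:", linear)
```

The two results agree to second order in the error magnitudes, which is why linear error models are the usual basis for sensitivity analysis and calibration.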

Surrounding these mechanism models, error sensitivity analysis and tolerance allocation have become the bridge connecting theory with engineering practice [75]. Systematic sensitivity analysis of key structural parameters and control gains allows for the identification of factors with the most significant impact on end-effector performance, thus guiding precision allocation during the design phase. Furthermore, designers can construct multi-objective optimization problems with targets or constraints such as “maximum end-effector error,” “stiffness margin,” and “energy consumption” to comprehensively balance link dimensions, material selection, and drive configuration.
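A one-at-a-time sensitivity screen, one simple form of the sensitivity analysis described above, can rank which tolerance dominates end-effector error. The sketch reuses a planar 2R arm as an assumed example: each parameter is perturbed by its hypothetical tolerance, the worst-case position deviation over a set of test configurations is recorded, and parameters are ranked by that contribution. A full study would also consider interactions between parameters.

```python
import math


def fk(params, q1, q2):
    """Planar 2R forward kinematics including joint zero offsets."""
    l1, l2 = params["l1"], params["l2"]
    a1 = q1 + params["q1_offset"]
    a12 = a1 + q2 + params["q2_offset"]
    return (l1 * math.cos(a1) + l2 * math.cos(a12),
            l1 * math.sin(a1) + l2 * math.sin(a12))


def rank_tolerances(nominal, tolerances, poses):
    """Worst-case end-effector deviation caused by each tolerance acting alone."""
    contribution = {}
    for name, tol in tolerances.items():
        perturbed = dict(nominal, **{name: nominal[name] + tol})
        contribution[name] = max(
            math.dist(fk(nominal, *pose), fk(perturbed, *pose)) for pose in poses)
    ranking = sorted(contribution, key=contribution.get, reverse=True)
    return ranking, contribution


nominal = {"l1": 0.5, "l2": 0.3, "q1_offset": 0.0, "q2_offset": 0.0}
tolerances = {"l1": 1e-4, "l2": 1e-4, "q1_offset": 1e-3, "q2_offset": 1e-3}  # m, rad
poses = [(q1, q2) for q1 in (0.0, 0.8, 1.6) for q2 in (0.0, 0.7, 1.4)]
ranking, contribution = rank_tolerances(nominal, tolerances, poses)
for name in ranking:
    print(f"{name}: {contribution[name] * 1e3:.3f} mm")
```

In this toy case the base-joint offset dominates because its effect scales with the full reach of the arm, which suggests where tighter precision allocation would pay off.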

Additionally, with the development of calibration technologies, mechanism models are widely applied in performance compensation [71,72]: by identifying geometric and dynamic parameters through offline or online calibration and incorporating them into the control loop for feedforward correction, absolute positioning accuracy can be significantly enhanced. For systems with significant flexibility, stiffness models can be integrated for pose optimization and force/position control to mitigate errors induced by external loads. However, the accuracy of mechanism models relies heavily on the degree of idealization in assumptions and the quality of parameter identification; in highly complex or strongly nonlinear environments, fusion with data-driven methods is often necessitated [76].
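Parameter identification for calibration can be sketched in its simplest form. For an assumed planar 2R arm with known joint readings and zero joint offsets, the end-effector position is linear in the link lengths, so L1 and L2 follow from ordinary least squares over measured positions. Real calibration (e.g., full DH-parameter identification) is nonlinear and typically needs iterative solvers; the data below are synthetic.

```python
import math


def identify_link_lengths(samples):
    """Least-squares L1, L2 from (q1, q2, x_measured, y_measured) samples."""
    a = b = c = d = e = 0.0   # entries of the normal equations A^T A and A^T y
    for q1, q2, x, y in samples:
        for u, v, target in ((math.cos(q1), math.cos(q1 + q2), x),
                             (math.sin(q1), math.sin(q1 + q2), y)):
            a += u * u
            b += u * v
            c += v * v
            d += u * target
            e += v * target
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("poses do not excite both link lengths")
    return (c * d - b * e) / det, (a * e - b * d) / det


# Synthetic "measurements" of an arm whose true links differ from nominal 0.5/0.3 m
true_l1, true_l2 = 0.5012, 0.2991
poses = [(0.1, 0.4), (0.9, -0.6), (1.5, 1.1), (-0.7, 0.9)]
samples = [(q1, q2,
            true_l1 * math.cos(q1) + true_l2 * math.cos(q1 + q2),
            true_l1 * math.sin(q1) + true_l2 * math.sin(q1 + q2))
           for q1, q2 in poses]
l1_hat, l2_hat = identify_link_lengths(samples)
print(f"identified L1 = {l1_hat:.4f} m, L2 = {l2_hat:.4f} m")
```

The identified lengths would then replace the nominal values in the controller's kinematic model, which is the feedforward correction described above.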

4.2. Simulation-Based Evaluation Methods

With the rapid advancement of simulation platforms and computational power, simulation-based evaluation methods have become a critical intermediate link bridging mechanism models and physical testing [77,78]. By constructing the robot body, fixtures, production layouts, and typical operational scenarios in virtual environments, these methods enable the systematic assessment of kinematic/dynamic performance, safety, human–robot interaction behaviors, and task completion effectiveness prior to physical deployment. Compared with purely theoretical analysis, simulation offers a more intuitive visualization of motion trajectories, collisions, and load responses under complex conditions; compared with physical testing, it offers low cost, low risk, high repeatability, and ease of parameter scanning.
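The ease of parameter scanning can be illustrated with a Monte Carlo run, a generic technique used here only as an example. Encoder zero errors of an assumed planar 2R arm are drawn from hypothetical normal distributions and propagated to end-effector position error, giving a 95th-percentile error per configuration that could be compared with an accuracy requirement. A fixed seed keeps the run repeatable.

```python
import math
import random


def fk(q1, q2, l1=0.5, l2=0.3):
    """Planar 2R forward kinematics (hypothetical link lengths in m)."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def monte_carlo_error(q1, q2, sigma_q=5e-4, trials=5000, seed=1):
    """Sample joint errors (rad) and return the 95th-percentile position error (m)."""
    rng = random.Random(seed)
    nominal = fk(q1, q2)
    errors = sorted(
        math.dist(nominal, fk(q1 + rng.gauss(0.0, sigma_q), q2 + rng.gauss(0.0, sigma_q)))
        for _ in range(trials))
    return errors[int(0.95 * (trials - 1))]


# Scan a few hypothetical configurations to see where the arm is most error-sensitive
for q2 in (0.0, 1.0, 2.0):
    p95 = monte_carlo_error(0.3, q2)
    print(f"q2 = {q2:.1f} rad: 95th-percentile error = {p95 * 1e3:.3f} mm")
```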

At the mechanical and dynamic level, multibody dynamics simulation is widely employed to analyze the feasibility of novel structures and complex task trajectories. Rega et al. [79] introduced a virtual prototyping methodology, importing CAD models with assigned material properties and joint constraints into multibody simulation software. This approach enabled the evaluation of end-effector pose errors, joint torque margins, and structural vibration levels in a virtual environment, thereby optimizing link sections and damping designs.
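A minimal stand-in for this virtual-prototyping workflow is a time-stepped simulation of one rigid link under computed-torque (model feedforward plus PD) control, reporting the peak tracking error and the torque margin against an assumed actuator limit. Multibody tools do this for full CAD models; here the point-mass link, gains, trajectory, torque limit, and 20% mass mismatch are all hypothetical, and semi-implicit Euler integration is chosen for simplicity.

```python
import math


def simulate_link(mass_actual, mass_model=2.0, lc=0.25, kp=60.0, kd=12.0,
                  torque_limit=10.0, dt=1e-3, duration=4.0):
    """Track q_d(t) = 0.5 sin(2t) with one rigid link.

    Returns (maximum |tracking error| in rad, torque margin in N*m).
    """
    g = 9.81
    inertia_actual = mass_actual * lc ** 2     # point-mass link (assumption)
    inertia_model = mass_model * lc ** 2
    q, dq = 0.0, 1.0                           # start on the trajectory
    max_error, peak_torque = 0.0, 0.0
    for k in range(int(duration / dt)):
        t = k * dt
        qd = 0.5 * math.sin(2.0 * t)
        dqd = math.cos(2.0 * t)
        ddqd = -2.0 * math.sin(2.0 * t)
        max_error = max(max_error, abs(qd - q))
        # Computed-torque law: model-based feedforward plus PD feedback
        tau = (inertia_model * ddqd + mass_model * g * lc * math.cos(q)
               + kp * (qd - q) + kd * (dqd - dq))
        peak_torque = max(peak_torque, abs(tau))
        # "True" plant with the actual mass, integrated by semi-implicit Euler
        ddq = (tau - mass_actual * g * lc * math.cos(q)) / inertia_actual
        dq += ddq * dt
        q += dq * dt
    return max_error, torque_limit - peak_torque


for mass in (2.0, 2.4):   # matched model vs. a 20% heavier real link
    err, margin = simulate_link(mass)
    print(f"actual mass {mass} kg: max error {err:.4f} rad, torque margin {margin:.2f} N*m")
```

The mismatched case hints at why simulation results must later be checked against physical tests: an unmodeled mass change shows up directly as gravity-induced tracking error.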

In system-level applications, virtual prototyping and Digital Twin (DT) technologies further integrate robot models with process flows and logistics systems. Demirtas et al. [80] proposed a novel collaborative space concept, allowing engineers to iteratively evaluate different schemes in a virtual environment. This facilitates the pre-evaluation of workspace coverage, cycle time capabilities, and safety margins before physical construction, substantially reducing modification costs.
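The kind of pre-evaluation described above can be reduced to a toy calculation: check whether hypothetical station positions lie within the reachable annulus of an assumed planar 2R arm, and estimate cycle time from point-to-point moves with a trapezoidal velocity profile. Straight-line distances, v_max, a_max, and the dwell time are simplifying assumptions; a digital twin would instead simulate the full kinematics, dynamics, and process logic.

```python
import math


def reachable(point, l1=0.5, l2=0.3):
    """A planar 2R arm reaches points with |L1 - L2| <= r <= L1 + L2."""
    r = math.hypot(*point)
    return abs(l1 - l2) <= r <= l1 + l2


def move_time(distance, v_max, a_max):
    """Duration of a rest-to-rest move with a trapezoidal (or triangular) velocity profile."""
    if distance >= v_max ** 2 / a_max:          # cruise phase is reached
        return distance / v_max + v_max / a_max
    return 2.0 * math.sqrt(distance / a_max)    # triangular profile


def evaluate_layout(stations, v_max=1.0, a_max=2.0, dwell=0.5):
    """Return unreachable stations and the cycle time for visiting all stations in order."""
    unreachable = [name for name, p in stations if not reachable(p)]
    cycle = 0.0
    route = stations + stations[:1]              # return to the first station
    for (_, p), (_, q) in zip(route, route[1:]):
        cycle += move_time(math.dist(p, q), v_max, a_max) + dwell
    return unreachable, cycle


layout = [("feeder", (0.6, 0.0)), ("fixture", (0.0, 0.7)), ("outfeed", (-0.85, 0.1))]
missing, cycle_time = evaluate_layout(layout)
print("unreachable:", missing, f"estimated cycle: {cycle_time:.2f} s")
```

Flagging the unreachable outfeed before installation is exactly the sort of modification cost that virtual pre-evaluation is meant to avoid.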

Moreover, for mobile robots and autonomous systems, environmental and perception simulation is particularly crucial. Fei et al. [81] focused on achieving reliable performance in harsh environments, demonstrating that by constructing simulation environments containing diverse indoor/outdoor structures and traffic participants—embedded with high-fidelity sensor models (cameras, LiDAR, radar)—the accuracy, robustness, and boundary performance of perception and localization algorithms can be systematically benchmarked.
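Benchmarking across simulated conditions often reduces to per-condition detection statistics. The sketch below computes precision and recall from hypothetical true-positive, false-positive, and false-negative counts for several simulated rain intensities and reports the first condition where recall falls below an assumed requirement, one simple way to locate a boundary of acceptable performance.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def performance_boundary(results, min_recall=0.9):
    """First condition (in the given order) whose recall drops below min_recall."""
    for condition, (tp, fp, fn) in results:
        _, recall = precision_recall(tp, fp, fn)
        if recall < min_recall:
            return condition
    return None


# Hypothetical obstacle-detection counts (tp, fp, fn) per simulated rain rate (mm/h)
results = [(0, (190, 4, 10)), (10, (185, 6, 15)), (25, (176, 9, 24)), (50, (150, 15, 50))]
for condition, counts in results:
    p, r = precision_recall(*counts)
    print(f"rain {condition:>2} mm/h: precision {p:.3f}, recall {r:.3f}")
print("recall falls below requirement at rain rate:", performance_boundary(results))
```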

It is noteworthy that the validity of simulation results is highly dependent on model fidelity and scenario accuracy [82,83,84], often leading to a well-known “Sim2Real” gap. While physical testing provides the ultimate high-fidelity ground truth, it is inherently limited by high costs, safety risks, and uncontrollable environmental noise. Consequently, in rigorous engineering applications, simulation is rarely used in isolation but rather integrated with physical testing to form a “Simulation–Test” closed loop (as illustrated in Figure 4). On one hand, experimental data is used to calibrate simulation parameters; on the other, simulation results optimize experimental design by highlighting sensitive operating conditions. With the promotion of the Digital Twin concept, increasing numbers of engineering practices are feeding online operational data back into simulation models, forming a closed-loop evaluation system for performance monitoring, predictive maintenance, and process re-optimization [76,80,81,84].
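One concrete form of calibrating simulation parameters with experimental data is fitting a joint friction model to measured torque-velocity pairs and writing the fitted coefficients back into the simulator. The sketch assumes a Coulomb-plus-viscous model, τ_f = F_c · sign(v) + F_v · v, solved by linear least squares; the model choice is an illustrative assumption and the bench data are synthetic.

```python
import math


def fit_coulomb_viscous(velocities, torques):
    """Least-squares (Fc, Fv) for tau = Fc * sign(v) + Fv * v (zero-velocity samples skipped)."""
    a = b = c = d = e = 0.0
    for v, tau in zip(velocities, torques):
        if v == 0.0:
            continue                      # sign(0) is ambiguous in this simple model
        s = math.copysign(1.0, v)
        a += s * s
        b += s * v
        c += v * v
        d += s * tau
        e += v * tau
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("velocity samples do not excite both friction terms")
    return (c * d - b * e) / det, (a * e - b * d) / det


# Synthetic "bench" data from a joint with Fc = 0.8 N*m and Fv = 0.15 N*m*s/rad
speeds = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
measured = [0.8 * math.copysign(1.0, v) + 0.15 * v for v in speeds]
fc, fv = fit_coulomb_viscous(speeds, measured)
print(f"Fc = {fc:.3f} N*m, Fv = {fv:.3f} N*m*s/rad (values to update in the simulation model)")
```

In a full Simulation-Test loop, the residual between re-simulated and measured torques would then indicate whether the friction model itself needs refinement.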

4.3. Experimental Design and Test Platform Construction

Although mechanism modeling and simulation analysis provide strong theoretical support, in engineering practice, any performance conclusion must ultimately be validated through physical testing [85]. Therefore, scientific experimental design and high-quality test platforms are foundational to building a reliable performance evaluation system. Unlike general experiments, robot performance testing often involves multi-degree-of-freedom motion, multi-sensor data acquisition, and the reproduction of complex working conditions. This necessitates a reasonable balance in experimental schemes between coverage, accuracy requirements, and implementation costs [79,86].

Instead of exhaustive testing, engineering experiments therefore concentrate on representative test scenarios (e.g., critical workspace boundaries, payload extremes), each coupled with a suitable measurement scheme, such as laser trackers for manipulator pose, IMU/ground-truth fusion for mobile navigation, and force/torque sensors for HRI safety, so that the collected data efficiently support targeted decisions like acceptance or benchmarking.

Calibration and uncertainty analysis of the measurement system itself are indispensable, requiring validation against standard artifacts or benchmark conditions prior to testing [87,88,89]. Table 4 summarizes mainstream measurement systems used in robot performance evaluation literature, detailing their technical characteristics and application focus to guide equipment selection.

The design of test platforms and fixtures directly affects the reliability and repeatability of results [90]. For manipulator accuracy assessment, high fixture stiffness and repeatable workpiece positioning are required so that fixture deformation does not mask robot errors. For collaborative robot safety testing, biofidelic human-equivalent bodies (rigid or soft) must be constructed to capture collision forces accurately. In mobile robot testing, standardized site layouts and reproducible obstacle configurations are prerequisites for fair comparison. In terms of data acquisition, because modern robotic systems carry multi-source sensors, experiments often need to record internal controller data, external measurement data, and environmental sensor data simultaneously. This requires a unified time base or triggering mechanism so that subsequent data fusion and trajectory reconstruction are accurate. Finally, following metrological principles, measurement uncertainty should be estimated and reported as confidence intervals or error upper bounds, so that evaluation conclusions carry a clear statistical meaning [89,91].
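
Following metrological practice as codified in the GUM (JCGM 100), a minimal sketch of combining a Type A component (from repeated readings) with Type B components (e.g., instrument specification, thermal effects) into an expanded uncertainty is shown below. The readings, the Type B values, and the coverage factor k = 2 (roughly 95% coverage for a near-normal distribution) are illustrative assumptions; the inputs are also assumed to be uncorrelated with unit sensitivity coefficients.

```python
import math
import statistics

def expanded_uncertainty(readings, type_b_std, k=2.0):
    """Combine Type A and Type B standard uncertainties (GUM-style).

    readings   : repeated measurements of the same quantity
    type_b_std : standard uncertainties from other sources, already
                 converted to standard-deviation form
    Assumes uncorrelated inputs and unit sensitivity coefficients.
    """
    n = len(readings)
    mean = statistics.fmean(readings)
    # Type A: standard uncertainty of the mean of n readings.
    u_a = statistics.stdev(readings) / math.sqrt(n)
    u_c = math.sqrt(u_a ** 2 + sum(u ** 2 for u in type_b_std))
    return mean, u_c, k * u_c

# Hypothetical repeated distance readings [mm] from an external tracker.
readings = [500.012, 500.009, 500.015, 500.011, 500.008, 500.013]
# Type B: a uniform bound of ±0.010 mm has standard uncertainty 0.010/sqrt(3).
type_b = [0.010 / math.sqrt(3), 0.002]

mean, u_c, U = expanded_uncertainty(readings, type_b)
print(f"result = {mean:.4f} mm ± {U:.4f} mm (k = 2)")
```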

4.4. Uncertainty Modeling and Robustness Evaluation

In actual engineering environments, uncertainty is ubiquitous, stemming from manufacturing tolerances, sensor noise, environmental disturbances, payload fluctuations, and structural deviations due to model simplification [92]. Ignoring these uncertainties during performance evaluation often leads to overly idealized conclusions and to systems that are “compliant but unstable”: they pass nominal acceptance tests yet perform inconsistently once deployed. Consequently, uncertainty modeling and robustness evaluation have become integral components of performance evaluation. The objective is to characterize the distribution of performance metrics under uncertain conditions, quantify system sensitivity to disturbances, and provide a basis for margin allocation and risk management [91].

At the modeling level, for error sources whose statistical distributions can be characterized, Park et al. [93] employed probabilistic models, treating structural parameters or external disturbances as random variables. Through Monte Carlo simulation or stochastic finite element methods, the mean, variance, and failure probability of performance metrics are calculated to obtain “probabilistically meaningful” results. Conversely, for parameters where statistical samples are scarce but bounds are known, Kaya et al. [94] adopted interval analysis or fuzzy mathematics, representing parameters as interval variables or fuzzy numbers to derive upper/lower bounds or membership functions of performance metrics, thereby supporting safety analysis under worst-case scenarios. At the evaluation level, robustness is typically characterized by sensitivity to parameter perturbations, performance fluctuation ranges, and constraint satisfaction under given confidence levels. A typical approach samples perturbation points around the nominal design, performs multi-point simulation or testing, and statistically analyzes the resulting performance distribution to assess system robustness [95].
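
To make the probabilistic and interval treatments concrete, the sketch below propagates link-length uncertainty through the forward kinematics of a hypothetical two-link planar arm. The Monte Carlo part treats the link lengths as Gaussian random variables and estimates the mean and standard deviation of the position error and the probability that it exceeds a tolerance. The interval part treats the lengths as merely bounded; because the end-effector x-coordinate is linear in the link lengths at fixed joint angles, the bounds it returns are exact. Link lengths, tolerances, and joint angles are all assumed values.

```python
import numpy as np

L1_NOM, L2_NOM = 0.40, 0.30                  # nominal link lengths [m] (assumed)
Q1, Q2 = np.deg2rad(30.0), np.deg2rad(45.0)  # fixed test configuration (assumed)

def fk(l1, l2, q1=Q1, q2=Q2):
    """End-effector position of a planar 2-link arm (works on arrays)."""
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    return x, y

def monte_carlo(sigma, tol, n=100_000, seed=1):
    """Probabilistic view: link lengths ~ N(nominal, sigma^2)."""
    rng = np.random.default_rng(seed)
    l1 = rng.normal(L1_NOM, sigma, n)
    l2 = rng.normal(L2_NOM, sigma, n)
    x, y = fk(l1, l2)
    x0, y0 = fk(L1_NOM, L2_NOM)
    err = np.hypot(x - x0, y - y0)
    return err.mean(), err.std(ddof=1), np.mean(err > tol)

def interval_x(delta):
    """Interval view: each length lies in [nominal - delta, nominal + delta].

    x = l1*c1 + l2*c12 is linear in (l1, l2), so the bounds follow from
    the sign of each coefficient (exact, not an over-approximation).
    """
    c1, c12 = np.cos(Q1), np.cos(Q1 + Q2)
    lo = hi = 0.0
    for nom, c in ((L1_NOM, c1), (L2_NOM, c12)):
        a, b = (nom - delta) * c, (nom + delta) * c
        lo += min(a, b)
        hi += max(a, b)
    return lo, hi

mean_err, std_err, p_fail = monte_carlo(sigma=0.0005, tol=0.0015)
x_lo, x_hi = interval_x(delta=0.001)
print(f"MC: mean error {mean_err*1e3:.3f} mm, std {std_err*1e3:.3f} mm, "
      f"P(err > 1.5 mm) = {p_fail:.4f}")
print(f"Interval: x in [{x_lo:.4f}, {x_hi:.4f}] m")
```

For nonlinear metrics, naive interval arithmetic generally over-approximates the true bounds, which is one reason fuzzy or optimization-based bounding is used in the works cited above.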

Robust optimization methods based on uncertainty modeling further integrate evaluation with design. By explicitly introducing parameter uncertainty into the optimization problem, the traditional goal of “optimal performance under nominal parameters” is transformed into “acceptable worst-case performance” or “optimal expected performance under variance constraints.” For safety-critical applications such as collaborative robots or medical robots, these methods are particularly vital, helping to embed safety margins into performance metrics during the design phase, thus mitigating the risk of frequent “patching” during later operation [80,96].
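
The shift from “optimal performance under nominal parameters” to “acceptable worst-case performance” can be shown with a deliberately simple, hypothetical performance model f(d, θ) = d − θ·d², where d is a design variable (for instance a normalized speed-scaling factor) and θ is an uncertain penalty coefficient known only to lie in [0.5, 1.5]. A grid search picks the nominal optimum (θ = 1) and the max–min (worst-case) optimum; the model is an illustration, not a result from the cited works.

```python
import numpy as np

def performance(d, theta):
    """Hypothetical performance model: benefit d minus an uncertain penalty."""
    return d - theta * d ** 2

designs = np.linspace(0.0, 1.0, 301)     # candidate design values
thetas = np.linspace(0.5, 1.5, 51)       # samples of the uncertainty set
theta_nominal = 1.0

# Nominal design: best performance at the nominal parameter only.
idx_nom = np.argmax(performance(designs, theta_nominal))
d_nominal = designs[idx_nom]

# Robust (max-min) design: best performance in the worst case over theta.
table = performance(designs[:, None], thetas[None, :])   # shape (301, 51)
worst_case = table.min(axis=1)
d_robust = designs[np.argmax(worst_case)]

print(f"nominal design d = {d_nominal:.3f}, worst-case value {worst_case[idx_nom]:.3f}")
print(f"robust  design d = {d_robust:.3f}, worst-case value {worst_case.max():.3f}")
```

In this toy case the robust choice (d ≈ 0.333) gives up some nominal performance (0.222 versus 0.250 at θ = 1) in exchange for a higher guaranteed floor (about 0.167 versus 0.125), which is exactly the margin-for-performance trade described above.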

4.5. Application of Multi-Criteria Decision-Making (MCDM) Methods

The ultimate goal of robot performance evaluation is rarely to “provide an abstract score” but rather to support concrete engineering decisions, including equipment selection, system integration comparison, production line configuration, and maintenance strategy formulation. Since these decisions involve multiple conflicting objectives, safety constraints, and economic considerations, single indicators are often insufficient. Therefore, Multi-Criteria Decision-Making (MCDM) methods are widely applied in engineering projects [97,98]. The fundamental idea is to introduce weights and preferences based on the metric system and evaluation results, enabling the systematic ranking of candidate schemes to achieve a reasonable trade-off among multi-dimensional goals.

In equipment selection scenarios, engineering teams construct a “Scheme–Metric” matrix based on test results. Subsequently, subjective weighting methods like Analytic Hierarchy Process (AHP) or objective methods like Entropy Weighting are used to determine metric weights, followed by ranking algorithms such as TOPSIS, VIKOR, or Grey Relational Analysis [99,100].
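
A compact sketch of the objective-weighting-plus-ranking pipeline described above follows: entropy weights computed from a “Scheme–Metric” matrix, then TOPSIS ranking. The decision matrix (three candidate robots, four metrics) and the benefit/cost labels are hypothetical, and the snippet omits practical details such as handling zero entries in the entropy step.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weighting on a strictly positive decision matrix (rows = schemes)."""
    P = X / X.sum(axis=0)
    m = X.shape[0]
    # Entries are assumed strictly positive, so log(P) is finite.
    e = -(P * np.log(P)).sum(axis=0) / np.log(m)
    d = 1.0 - e                          # degree of diversification
    return d / d.sum()

def topsis(X, w, benefit):
    """Return TOPSIS closeness coefficients (higher is better)."""
    R = X / np.sqrt((X ** 2).sum(axis=0))        # vector normalization
    V = R * w
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti, axis=1)
    return d_minus / (d_plus + d_minus)

# Hypothetical Scheme–Metric matrix:
# columns = repeatability [mm], payload [kg], cycle time [s], price [k$]
X = np.array([
    [0.02, 10.0, 4.2, 35.0],
    [0.05, 12.0, 3.8, 28.0],
    [0.03,  8.0, 4.5, 22.0],
])
benefit = np.array([False, True, False, False])  # only payload is "larger is better"

w = entropy_weights(X)
closeness = topsis(X, w, benefit)
ranking = np.argsort(-closeness)
print("weights:", np.round(w, 3))
print("closeness:", np.round(closeness, 3))
print("ranking (best first):", [f"Robot {i + 1}" for i in ranking])
```

Entropy weights reward metrics that discriminate strongly between candidates, not metrics that engineers consider important; this is why subjective (AHP) and objective weights are often blended.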

With the increasing complexity of engineering problems, hybrid MCDM methods have become a mainstream trend. Rather than simply counting how often each method appears, our systematic review finds that the choice of MCDM tool depends strongly on data availability and engineering context. As a practical guide, Table 5 summarizes the characteristics and typical applications of the mainstream MCDM methods identified in the included literature [97,98,99,100,101].
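
Because AHP is a common subjective front end in these hybrid pipelines, a minimal sketch of deriving weights from a pairwise comparison matrix via its principal eigenvector, and of checking Saaty's consistency ratio (CR), may help. The judgments are hypothetical; the random-index values are Saaty's commonly tabulated ones, and CR < 0.1 is the conventional acceptance threshold.

```python
import numpy as np

# Saaty's random consistency index by matrix size.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Principal-eigenvector weights and consistency ratio of matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, lam_max, cr

# Hypothetical judgments: accuracy vs. speed vs. cost (Saaty 1-9 scale).
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w, lam_max, cr = ahp_weights(A)
print("weights:", np.round(w, 3), "lambda_max:", round(lam_max, 3), "CR:", round(cr, 3))
print("consistent" if cr < 0.1 else "revise judgments")
```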

For production line optimization, device-level evaluation results are elevated to the system level. Through discrete event simulation, throughput and resource utilization are obtained, and MCDM methods are used to seek the optimal trade-off between “Efficiency–Cost–Robustness” [100,102].
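
The device-to-system step relies on discrete event simulation. Below is a minimal event-driven sketch of a single robot cell fed by randomly arriving parts, using Python's `heapq` as the event list. The exponential arrival and cycle-time distributions, the parameter values, and the single-cell model itself are simplifying assumptions; a realistic line model would add buffers, multiple stations, changeovers, and failures.

```python
import heapq
import random
from collections import deque

def simulate_cell(mean_interarrival, mean_cycle, sim_time, seed=0):
    """Event-driven simulation of one robot cell (single server, FIFO queue).

    Returns throughput [parts per time unit], utilization [0-1], and the
    mean waiting time of parts that started processing.
    """
    rng = random.Random(seed)
    events = []                           # heap of (time, sequence, kind)
    seq = 0

    def schedule(t, kind):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind))
        seq += 1

    schedule(rng.expovariate(1.0 / mean_interarrival), "arrival")
    queue = deque()                       # arrival times of waiting parts
    busy, start_time = False, 0.0
    busy_time, completed, started, total_wait = 0.0, 0, 0, 0.0

    while events:
        t, _, kind = heapq.heappop(events)
        if t > sim_time:
            break
        if kind == "arrival":
            queue.append(t)
            schedule(t + rng.expovariate(1.0 / mean_interarrival), "arrival")
        else:                             # "departure": robot finishes a part
            busy = False
            busy_time += t - start_time
            completed += 1
        if not busy and queue:            # start the next part if possible
            total_wait += t - queue.popleft()
            busy, start_time = True, t
            started += 1
            schedule(t + rng.expovariate(1.0 / mean_cycle), "departure")

    if busy:                              # count the unfinished busy interval
        busy_time += sim_time - start_time
    return completed / sim_time, busy_time / sim_time, total_wait / max(started, 1)

throughput, utilization, wait = simulate_cell(mean_interarrival=5.0, mean_cycle=4.0,
                                              sim_time=10_000.0)
print(f"throughput {throughput:.3f} parts/s, utilization {utilization:.2%}, "
      f"mean wait {wait:.1f} s")
```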

A notable trend is the evolution of MCDM methods towards robustness analysis and visualization to enhance decision reliability and transparency. Figure 5 depicts the hybrid MCDM workflow proposed by scholars such as Kaya [94], Gamal [97], Goswami [98], Rashid [101], Praneeth [103], and Lin et al. [104], embodying the complete steps of weighting, ranking, and sensitivity analysis. The sensitivity analysis systematically assesses the impact of parameter and weight variations on the final ranking, thereby improving decision robustness. Simultaneously, by visualizing scheme performance via radar charts or parallel coordinates, decision-makers can intuitively understand “why a specific scheme is superior.” This concept of interpretable and robust decision-making aligns with the previously emphasized interpretability of performance evaluation, serving as a key factor in increasing the adoption of evaluation results in engineering practice.
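
The weight-sensitivity step can be illustrated with a small Monte Carlo perturbation study: the criterion weights are randomly perturbed, the schemes are re-ranked, and the share of trials in which each scheme ranks first is recorded. For brevity this sketch ranks by simple additive weighting of min-max-normalized scores rather than by a full hybrid pipeline; the matrix, base weights, and ±20% perturbation range are hypothetical.

```python
import numpy as np

def saw_scores(X, w, benefit):
    """Simple additive weighting on min-max normalized criteria."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    N = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    N = np.where(benefit, N, 1.0 - N)            # flip cost criteria
    return N @ w

def weight_sensitivity(X, w0, benefit, spread=0.2, trials=5000, seed=0):
    """Share of trials in which each scheme ranks first under perturbed weights."""
    rng = np.random.default_rng(seed)
    wins = np.zeros(X.shape[0])
    for _ in range(trials):
        w = w0 * rng.uniform(1 - spread, 1 + spread, w0.size)
        w /= w.sum()
        wins[np.argmax(saw_scores(X, w, benefit))] += 1
    return wins / trials

X = np.array([[0.02, 10.0, 4.2, 35.0],
              [0.05, 12.0, 3.8, 28.0],
              [0.03,  8.0, 4.5, 22.0]])
benefit = np.array([False, True, False, False])
w0 = np.array([0.4, 0.2, 0.2, 0.2])

base_best = int(np.argmax(saw_scores(X, w0 / w0.sum(), benefit)))
share = weight_sensitivity(X, w0, benefit)
print(f"base-case best: Robot {base_best + 1}")
print("share of trials ranked first:", np.round(share, 3))
```

A base-case winner that keeps first place in most perturbed trials is a more defensible recommendation than one whose lead disappears under small weight changes.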

5. Performance Evaluation Practices for Typical Robotic Systems

Building upon the TESM framework, this section comparatively analyzes evaluation practices across four representative domains: industrial manipulators, mobile robots, collaborative systems, and specialized field robots. By examining their differences in metric prioritization and methodological combinations, we illustrate the cross-domain adaptability and implementation logic of the proposed framework in actual engineering scenarios.

5.1. Performance Evaluation of Industrial Manipulators

Industrial manipulators exhibit the highest maturity in applications such as discrete manufacturing, welding, painting, and assembly/inspection, with a relatively complete and standardized performance evaluation system. In engineering practice, the evaluation typically centers on accuracy metrics, dynamic performance metrics, and system-level production performance, supplemented by reliability and maintainability assessments [4,14].

Regarding accuracy evaluation, absolute positioning accuracy, repeatability, and trajectory tracking accuracy constitute the three core dimensions. As the most authoritative international standard in this domain, ISO 9283 explicitly prescribes the definitions, test trajectories, and calculation methods for these metrics. As shown in Table 6, the standard defines test paths such as the cube diagonal and circle, detailing the physical meaning and evaluation criteria for position accuracy, orientation accuracy, trajectory repeatability, and distance accuracy. Based on these standard trajectories, typical evaluation workflows involve using laser trackers, Coordinate Measuring Machines (CMMs), or high-precision vision systems to acquire the actual end-effector pose, comparing it with the desired value to compute error distributions, extremes, and RMS values [105,106].
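
As a sketch of the calculations referred to here, ISO 9283 defines positional pose accuracy (AP) as the distance between the commanded position and the barycenter of the attained positions, and pose repeatability (RP) as l̄ + 3S_l, where l_j is the distance of each attained position from that barycenter and S_l is its sample standard deviation. The snippet implements only the positional part for a single test pose with synthetic readings; a full evaluation would repeat it for every test pose and add the orientation terms defined in the standard.

```python
import numpy as np

def iso9283_position(commanded, attained):
    """Positional accuracy AP and repeatability RP for one test pose.

    commanded : (3,) commanded position
    attained  : (n, 3) positions measured over n repeated visits
    """
    attained = np.asarray(attained, dtype=float)
    barycenter = attained.mean(axis=0)
    ap = np.linalg.norm(barycenter - np.asarray(commanded, dtype=float))
    l = np.linalg.norm(attained - barycenter, axis=1)   # spread about barycenter
    rp = l.mean() + 3.0 * l.std(ddof=1)
    return ap, rp

# Synthetic laser-tracker readings [mm] for 30 visits to one test pose.
rng = np.random.default_rng(42)
commanded = np.array([800.0, 0.0, 600.0])
bias = np.array([0.05, -0.03, 0.02])                    # systematic error
attained = commanded + bias + rng.normal(0.0, 0.01, size=(30, 3))

ap, rp = iso9283_position(commanded, attained)
print(f"AP = {ap:.3f} mm, RP = {rp:.3f} mm")
```

The separation matters for calibration: AP captures the systematic offset that calibration can remove, while RP bounds the random scatter that it cannot.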

In many practical projects, to reduce testing costs, engineers often deploy a limited number of test points at critical workstations or near joint limits. Mechanism models and interpolation methods are then used to infer accuracy levels across the entire workspace. Simultaneously, calibration techniques are employed to compensate for identified systematic errors, making the pre- and post-calibration performance comparison an integral part of the evaluation [107].

Dynamic performance focuses on cycle time capability, maximum acceleration, velocity stability, and vibration characteristics. Typical methods involve executing repetitive motions on representative process trajectories (e.g., weld paths, insertion force profiles), recording joint trajectories, driving torques, end-effector vibrations, and process quality metrics to analyze system response characteristics during high acceleration/deceleration phases and trajectory deviations induced by structural flexibility [108,109,110]. In certain high-speed assembly and packaging applications, frequency domain analysis and modal testing are required to identify natural frequencies, ensuring the operating bandwidth avoids major resonance peaks to mitigate fatigue and quality fluctuation risks.
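
For the frequency-domain step, a minimal sketch of estimating a dominant structural frequency from an end-effector acceleration record with NumPy's FFT, then checking it against the excitation band of the task, is shown below. The signal is synthetic (a decaying 18 Hz mode plus noise), and the Hann window, sampling rate, and “1.5 × highest excitation frequency” separation rule are illustrative assumptions rather than a modal-testing procedure.

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Frequency [Hz] of the largest spectral peak (DC bin excluded)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    k = np.argmax(spectrum[1:]) + 1           # skip the DC bin
    return freqs[k]

fs = 1000.0                                   # sampling rate [Hz]
t = np.arange(0.0, 2.0, 1.0 / fs)
rng = np.random.default_rng(3)
accel = np.exp(-1.5 * t) * np.sin(2 * np.pi * 18.0 * t) + 0.05 * rng.normal(size=t.size)

f_n = dominant_frequency(accel, fs)
excitation_max = 10.0     # highest motion frequency of the task [Hz] (assumed)
margin_ok = f_n > 1.5 * excitation_max        # illustrative separation rule
print(f"dominant frequency ≈ {f_n:.1f} Hz, separated from excitation band: {margin_ok}")
```

The frequency resolution here is fs/N = 0.5 Hz; a longer record or an impact-hammer modal test would be needed to resolve closely spaced modes.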

At the system level, evaluation is gradually expanding from standalone metrics to the holistic “Manipulator + End-effector + Fixture + Process” system [111,112]. Engineering practice often incorporates operational data, such as production cycle time, yield rate, defect type distribution, equipment availability, and changeover time, into the evaluation scope. By statistically analyzing the discrepancy between “theoretical cycle time” and “actual cycle time,” root causes (e.g., conservative path planning, overly cautious collision-avoidance settings, unstable fixture positioning) can be traced. Furthermore, with the rise of flexible manufacturing, the reconfigurability and ease of programming required for high-mix, low-volume production have elevated metrics like teaching time, program reuse rate, and changeover debugging duration to “generalized performance metrics,” carrying significant weight in engineering selection and scheme comparison [113,114].

Overall, the practice of industrial manipulator performance evaluation reflects a trend expanding from “body performance” to “system-level and process-level performance,” characterized by the combination of mechanism models, standardized testing, and operational data analysis.

5.2. Performance Evaluation of Mobile and Logistics Robots

Mobile and logistics robots are widely deployed in warehousing, e-commerce sorting, workshop material delivery, outdoor inspection, and agricultural operations. Compared to industrial manipulators, their performance evaluation places greater emphasis on navigation localization, path planning, motion safety, and system-level throughput [115,116]. Given that mobile robots often operate in semi-structured or unstructured environments, the impact of environmental factors on performance is particularly significant [117].

At the individual level, navigation and localization accuracy are core metrics [115,116,117,118]. Typical workflows involve executing multiple preset paths in test grounds equipped with high-precision ground truth systems (e.g., optical motion capture, total stations, or HD maps). Trajectory errors between the robot’s estimated pose and ground truth are recorded to compute Absolute Trajectory Error (ATE), Relative Pose Error (RPE), and their accumulation over distance. Building on this, error performance under different localization strategies (SLAM, Visual–Inertial Odometry, Multi-sensor Fusion) is examined to evaluate algorithm robustness against sparse features, lighting changes, or occlusion.
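
The ATE/RPE computation can be sketched for planar trajectories as follows. ATE is taken as the RMS position error after a least-squares rigid alignment (Kabsch/Umeyama without scale) of the estimate to ground truth, and RPE as the RMS translational error of relative motions over a fixed index offset. These are common conventions, but benchmarks differ in details (3D poses, timestamp association, scale alignment, rotational RPE), so the snippet should be read as a 2D illustration on synthetic data.

```python
import numpy as np

def align_rigid_2d(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    return R, mu_g - R @ mu_e

def ate_rmse(est, gt):
    """Absolute trajectory error (RMS) after rigid alignment; inputs are (N, 2)."""
    R, t = align_rigid_2d(est, gt)
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))

def rpe_rmse(est_pose, gt_pose, delta=1):
    """Translational RPE; poses are (N, 3) arrays of (x, y, heading)."""
    errs = []
    for i in range(len(gt_pose) - delta):
        rel = []
        for p in (est_pose, gt_pose):
            dx, dy = p[i + delta, :2] - p[i, :2]
            c, s = np.cos(p[i, 2]), np.sin(p[i, 2])
            rel.append(np.array([c * dx + s * dy, -s * dx + c * dy]))  # in frame i
        errs.append(np.sum((rel[0] - rel[1]) ** 2))
    return np.sqrt(np.mean(errs))

# Synthetic ground truth: a circular path; estimate = rotated/shifted copy + noise.
n = 200
theta = np.linspace(0, 2 * np.pi, n)
gt = np.column_stack([5 * np.cos(theta), 5 * np.sin(theta), theta + np.pi / 2])
rng = np.random.default_rng(7)
yaw = 0.1
Rz = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
est = gt.copy()
est[:, :2] = gt[:, :2] @ Rz.T + np.array([1.0, -0.5]) + rng.normal(0, 0.02, (n, 2))
est[:, 2] = gt[:, 2] + yaw

print(f"ATE = {ate_rmse(est[:, :2], gt[:, :2]):.3f} m, RPE = {rpe_rmse(est, gt):.3f} m")
```

Without the alignment step, the arbitrary frame offset between estimate and ground truth (here 0.1 rad and about 1.1 m) would dominate the ATE; RPE is insensitive to that offset by construction, which is why the two are usually reported together.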

Path planning and motion control performance are characterized by path length, travel time, velocity smoothness, and obstacle negotiation capability [119,120]. To systematically assess adaptability in complex terrains at low cost, simulation-based frameworks are widely adopted. As illustrated in Figure 6, Shu et al. [121] proposed a method starting from terrain map definition and morphological feature extraction. By constructing dynamic models of wheeled or legged robots in virtual environments, large-scale dynamic simulations and dataset generation are performed to output feasibility and efficiency scores for optimal path planning algorithms. This “Virtual–Real Integration” approach allows engineers to pre-evaluate adaptability to narrow passages, ramps, and uneven surfaces prior to physical prototyping.
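
The path-level indicators listed above can be computed directly from a time-stamped trajectory. The following is a minimal sketch assuming uniformly sampled (t, x, y) data; using RMS jerk as the velocity-smoothness proxy is one common choice among several and is an assumption here.

```python
import numpy as np

def path_metrics(t, xy):
    """Path length [m], travel time [s], mean speed [m/s], RMS jerk [m/s^3]."""
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    seg = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    length = seg.sum()
    duration = t[-1] - t[0]
    dt = t[1] - t[0]                                   # uniform sampling assumed
    vel = np.gradient(xy, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    jerk = np.gradient(acc, dt, axis=0)
    rms_jerk = np.sqrt(np.mean(np.sum(jerk ** 2, axis=1)))
    return length, duration, length / duration, rms_jerk

t = np.linspace(0.0, 10.0, 501)
straight = np.column_stack([0.5 * t, np.zeros_like(t)])            # constant speed
wobbly = np.column_stack([0.5 * t, 0.05 * np.sin(2 * np.pi * t)])  # lateral oscillation

for name, path in (("straight", straight), ("wobbly", wobbly)):
    L, T, v, j = path_metrics(t, path)
    print(f"{name:8s} length {L:.2f} m, time {T:.1f} s, mean speed {v:.2f} m/s, "
          f"RMS jerk {j:.2f} m/s^3")
```

The two paths have nearly the same length and identical travel time, yet the oscillating one has a much larger jerk, which illustrates why smoothness is reported separately from length and time.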

For mobile systems such as small service-type unmanned vehicles, traditional rule-based navigation algorithms often face generalization bottlenecks in unknown or confined environments. Recently, end-to-end deep reinforcement learning (DRL) strategies have significantly improved navigation success rates [122]. By integrating sensor fusion such as radar-based obstacle detection [123] or utilizing spatial-memory mechanisms [124], DRL models have demonstrated highly robust performance in narrow and partially observable spaces. Consequently, evaluating these algorithms requires learning-specific metrics, such as reward convergence and generalization performance across varying scenarios.
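
Because “reward convergence” and “generalization” are defined differently across studies, the sketch below adopts two simple, explicitly assumed definitions: the convergence episode is the first episode after which a moving average of the return stays within a tolerance band around its final value, and the generalization gap is the success rate on training scenarios minus the success rate on held-out scenarios. The reward curve and success counts are synthetic.

```python
import numpy as np

def convergence_episode(returns, window=20, tol=0.05):
    """First episode after which the moving-average return stays within
    +/- tol * |final value| of its final value."""
    r = np.asarray(returns, dtype=float)
    ma = np.convolve(r, np.ones(window) / window, mode="valid")
    final = ma[-1]
    inside = np.abs(ma - final) <= tol * abs(final)
    # The last moving-average point is always inside the band, so `start`
    # is a valid index; convergence begins right after the last excursion.
    outside = np.flatnonzero(~inside)
    start = 0 if outside.size == 0 else outside[-1] + 1
    return int(start + window - 1)        # map window start to its last episode

def generalization_gap(train_success, train_total, test_success, test_total):
    """Training-scenario success rate minus held-out success rate."""
    return train_success / train_total - test_success / test_total

rng = np.random.default_rng(0)
episodes = np.arange(500)
returns = 100 * (1 - np.exp(-episodes / 60)) + rng.normal(0, 2, episodes.size)

print("converged at episode:", convergence_episode(returns))
print(f"generalization gap: {generalization_gap(188, 200, 151, 200):.3f}")
```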

For AMR or AGV fleets in large-scale warehousing, system-level evaluation of scheduling strategies and traffic management mechanisms is required, focusing on task completion time distribution, order throughput, congestion levels, and deadlock rates. This typically relies on the combined analysis of Discrete Event Simulation (DES) and field operational data.

Safety and environmental adaptability are also indispensable aspects [125]. In indoor scenarios, collision risks and safe stopping performance during interaction with pedestrians and other vehicles must be evaluated using metrics like minimum safety distance, braking distance, and reaction time to sudden obstacles. In outdoor or special environments, tests for ramp climbing, uneven terrain adaptability, ingress protection (IP) ratings, and stability under extreme weather are added. Recently, with increased deployment in “human–robot coexistence” scenarios, capabilities such as human behavior prediction, comfort control in crowded environments, and safety strategies for anomalous behaviors (e.g., children chasing) have been incorporated into performance discussions [121,125,126]. In summary, mobile and logistics robot evaluation exhibits a multi-level coupling of “Algorithm Performance—System Operational Performance—Human–Environment Interaction Performance,” necessitating a closed loop across experimental grounds, simulation scenarios, and real-world data.
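
The braking-related indicators combine through elementary kinematics: assuming a constant deceleration a after a total reaction latency t_r (detection plus actuation), a robot moving at speed v covers v·t_r + v²/(2a) before stopping. The sketch below uses this relation to check a hypothetical required separation and to derive the largest admissible speed; the latency, deceleration, and margin values are assumptions, not values from a standard.

```python
def stopping_distance(v, t_react, decel):
    """Distance [m] travelled from hazard detection to standstill.

    v       : travel speed [m/s]
    t_react : detection + actuation latency [s]
    decel   : constant braking deceleration [m/s^2]
    """
    return v * t_react + v ** 2 / (2.0 * decel)

def max_safe_speed(distance, t_react, decel):
    """Largest speed whose stopping distance fits in `distance` [m].

    Solves v^2/(2a) + v*t_r - d = 0 for the positive root.
    """
    a, t = decel, t_react
    return a * (-t + (t ** 2 + 2.0 * distance / a) ** 0.5)

v, t_r, a = 1.5, 0.3, 1.0
d_stop = stopping_distance(v, t_r, a)
print(f"stopping distance at {v} m/s: {d_stop:.3f} m")      # 0.45 + 1.125 = 1.575 m
# Detection range 1.0 m minus an assumed 0.2 m margin leaves 0.8 m to stop in.
print(f"max speed for 0.8 m available: {max_safe_speed(0.8, t_r, a):.3f} m/s")
```

Measuring t_r and a on the real platform (rather than taking datasheet values) is what turns this formula into an evaluation result.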

5.3. Performance Evaluation of Collaborative and Service Robots

Collaborative and service robots share workspaces or engage in close physical contact with humans. Consequently, their performance evaluation is largely dominated by human–robot collaboration efficiency, safety, and user experience [127,128]. Unlike traditional industrial manipulators where payload and stiffness are paramount, for collaborative robots, safety margins, compliance control performance, and usability in co-existing scenarios become the key metrics.

To systematically quantify these “non-traditional” performances, researchers establish multi-dimensional metric systems encompassing safety, efficiency, and user experience [127,128,129,130]. As shown in Table 7, this system refines evaluation dimensions into primary indicators such as collision risk, monitoring function, task performance, and subjective perception, specifying corresponding physical measurement means (e.g., PFL testers) and subjective assessment methods (e.g., questionnaires).

Regarding safety evaluation, engineering practice typically adheres to standards like ISO/TS 15066, focusing on “collision risk” and “monitoring functions” [131]. Typical procedures involve testing planned contacts and unplanned collisions using force/torque sensors and biofidelic fixtures to verify whether peak contact forces and clamping pressures under various speeds and payloads remain below standard limits. Simultaneously, the response time of safety controllers under anomalous conditions is assessed to ensure rapid transition to a safe stop state [129,130].
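
For transient contact under power and force limiting, ISO/TS 15066 relates a permissible contact force to a maximum relative speed through an energy-transfer model. The energy F²/(2k) absorbed by a body region with spring constant k is equated with ½μv², where μ = (1/m_H + 1/m_R)⁻¹ is the reduced mass of human body region and robot, giving v_max = F/√(μk). The effective robot mass is approximated in the standard's informative annex as m_R = M/2 + m_L (M: moving mass of the robot, m_L: payload); readers should verify these relations against the standard itself. All numerical inputs below are placeholders, not body-region values from the standard.

```python
import math

def reduced_mass(m_human, robot_moving_mass, payload):
    """Two-body reduced mass mu [kg], with effective robot mass m_R = M/2 + m_L."""
    m_robot = robot_moving_mass / 2.0 + payload
    return 1.0 / (1.0 / m_human + 1.0 / m_robot)

def max_relative_speed(f_max, k_n_per_mm, mu):
    """v_max = F_max / sqrt(mu * k) for a transient contact.

    f_max      : permissible transient contact force [N]
    k_n_per_mm : effective spring constant of the body region [N/mm]
    mu         : reduced mass [kg]
    """
    k = k_n_per_mm * 1000.0              # N/mm -> N/m
    return f_max / math.sqrt(mu * k)

# Placeholder inputs; replace with body-region values taken from the standard.
mu = reduced_mass(m_human=1.0, robot_moving_mass=20.0, payload=2.0)
v_max = max_relative_speed(f_max=200.0, k_n_per_mm=50.0, mu=mu)
print(f"mu = {mu:.3f} kg, v_max = {v_max:.3f} m/s")
```

In a test campaign, the speed derived this way sets the programmed speed limit, and the biofidelic measurement then confirms that the measured peak force actually stays below F_max.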

In terms of efficiency and user experience, simple machine cycle time is insufficient. The focus shifts to “overall human–robot system performance,” utilizing video analysis and stopwatches to record collaborative cycle times, task success rates, and the ratio of parallel work time. Furthermore, service robot evaluation must incorporate user subjective feedback. Tools like the NASA-TLX scale or custom questionnaires are used to quantify operator trust, mental workload, and interaction comfort. This mode of combining objective data with subjective feedback distinguishes collaborative and service robots from traditional industrial counterparts.
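
On the subjective side, NASA-TLX aggregates six subscale ratings (mental demand, physical demand, temporal demand, performance, effort, frustration), each rated from 0 to 100. In the weighted variant, the weights are tallies from 15 pairwise comparisons of the subscales (summing to 15) and the overall workload is Σ(weight × rating)/15; the “raw TLX” variant simply averages the six ratings. The ratings and tallies below are hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def nasa_tlx(ratings, tallies=None):
    """Overall workload on a 0-100 scale.

    ratings : dict subscale -> rating in [0, 100]
    tallies : dict subscale -> times chosen in the 15 pairwise comparisons;
              if omitted, the unweighted (raw TLX) mean is returned.
    """
    if set(ratings) != set(SUBSCALES):
        raise ValueError("ratings must cover exactly the six subscales")
    if tallies is None:
        return sum(ratings.values()) / len(SUBSCALES)
    if sum(tallies.values()) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(ratings[s] * tallies.get(s, 0) for s in SUBSCALES) / 15.0

ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 30, "effort": 60, "frustration": 25}
tallies = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}

print(f"raw TLX = {nasa_tlx(ratings):.1f}, weighted TLX = {nasa_tlx(ratings, tallies):.1f}")
```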

5.4. Performance Evaluation of Field and Specialized Robots

Field and specialized robots are primarily designed for extreme or hazardous environments such as nuclear power plants, mines, deep sea, space, and disaster relief. Their evaluation focuses on environmental adaptability, task reliability, and safety. Compared to industrial robots in controlled workshop environments, evaluation conditions here are more stringent, and experimental organization is more challenging, often requiring a combination of environmental simulation chambers, scaled testing, and limited field trials [132,133].

For environmental adaptability, robots must withstand high temperatures, radiation, humidity, corrosion, dust, and severe vibration. Evaluation is typically conducted via environmental simulation labs combined with field tests. For instance, Cornejo et al. [134] elucidated the relevance of bio-derived functions (autonomy, adaptability, multi-functionality) in enhancing space robot efficiency in microgravity, conducting accelerated life tests and functional retention tests on key components under controlled temperature and radiation sources. Similarly, Kolvenbach et al. [135] evaluated traversability, pose stability, and control response on simulated terrain platforms featuring slopes, gravel, mud, and obstacles.

Regarding task reliability, since frequent human intervention is often impossible in field operations, evaluation focuses on system failure rates, task success rates, and redundancy effectiveness over long-term operation. Common practices involve constructing critical task sequences combined with mission scenarios, statistically analyzing success/failure modes over multiple cycles, and tracing failure chains to formulate targeted design improvements.
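
When task success is estimated from a limited number of mission cycles, reporting an interval rather than a bare percentage makes the reliability claim explicit. Below is a minimal sketch using the Wilson score interval for a binomial proportion, chosen here because it behaves better than the simple normal approximation at small sample sizes or near 0 and 1; the trial counts are hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success probability (z = 1.96 ~ 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2))
    return center - half, center + half

# Hypothetical field campaign: 27 successful runs out of 30 mission cycles.
lo, hi = wilson_interval(27, 30)
print(f"success rate 0.900, 95% interval [{lo:.3f}, {hi:.3f}]")
```

With only 30 cycles, a 90% observed success rate is compatible with a true rate as low as roughly 74%, a useful reminder of how much evidence a small field campaign actually provides.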

Safety evaluation holds a dual meaning: firstly, the robot system must maintain structural integrity and functionality to prevent secondary disasters or environmental contamination; secondly, in human–robot mixed operations (e.g., mining or rescue), risks to field personnel, including blind spots, path uncertainty, and behavior under communication loss, must be assessed. With the development of teleoperation and semi-autonomous control, evaluation has begun to address operator workload, situational awareness, and decision pressure within the “Human–Machine–Environment” loop, utilizing simulation and VR platforms to compare performance under different interfaces and control modes. Overall, due to the high cost and risk of field trials, specialized robot evaluation relies heavily on indirect means like high-fidelity simulation and scaled testing, with limited field trials serving primarily for model and hypothesis validation.

5.5. Comparative Analysis of Engineering Cases and Summary

To demonstrate intuitively how the aforementioned framework is implemented in engineering practice, a comparative analysis of typical cases is useful. A representative approach is to select four projects oriented towards “High-Precision Assembly,” “Large-Scale Logistics,” “Human–Robot Collaborative Assembly,” and “Hazardous Environment Inspection” and to analyze their commonalities and differences in requirements analysis, metric construction, method selection, and result application [33,136].

To explicitly demonstrate the practical implementation of the proposed framework, we detail a step-by-step evaluation paradigm based on the TESM logic, using a large-scale warehousing Autonomous Mobile Robot (AMR) system as a representative engineering case:

(1) Task Definition (T): The primary objective is high-throughput, continuous material transport across designated zones, necessitating autonomous navigation and dynamic obstacle avoidance [116].

(2) Environmental Constraints (E): The operational setting is a highly dynamic warehouse characterized by narrow aisles, frequent spatial occlusion by moving forklifts and human workers, and varying illumination, which introduces significant perception noise.

(3) System Capabilities (S): The evaluation target encompasses both the individual AMRs (equipped with LiDAR, vision sensors, and local planners) and the centralized Fleet Management System (FMS) responsible for global routing [119,120].

(4) Metric Selection (M): Metrics are hierarchically mapped. At the perception level, Absolute Trajectory Error (ATE) and obstacle detection recall are prioritized; at the safety level, minimum braking distance and safe stopping response time are selected; at the task level, overall fleet throughput (orders per hour) and deadlock resolution rates are utilized [137].

(5) Evaluation Method and Decision Output: A virtual–real closed-loop strategy is employed. Initial collision risks and maximum throughput are stress-tested using Discrete Event Simulation (DES) [100]. Subsequently, physical testing validates the safety metrics under specific corner cases. Finally, an MCDM method is applied to balance deployment costs, system efficiency, and safety constraints, outputting the optimal AMR fleet configuration for the specific warehouse [97,98].

Beyond the detailed logistics scenario, the TESM framework adapts dynamically to the unique constraints of other robotic domains. For example, in high-precision manufacturing, the “Task” (T) dictates micron-level assembly, shifting the “Metric” (M) focus towards structural stiffness and absolute trajectory accuracy, while the “Environment” (E) remains highly controlled. Consequently, the evaluation methodology strictly relies on physical testing via laser trackers rather than discrete-event simulation [138]. Conversely, for specialized field robots operating in extreme, unstructured environments, the evaluation paradigm drastically prioritizes terrain traversability and system reliability (e.g., MTBF) under unpredictable disturbances [115]. To systematically elucidate these paradigm shifts and demonstrate how evaluation focal points transition across diverse engineering applications, the characteristics of four typical robot categories are summarized in the cross-domain comparison matrix presented in Table 8.
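
Reliability figures such as MTBF are usually estimated from operating logs. The sketch below computes MTBF and MTTR from hypothetical failure and repair records, the steady-state availability MTBF/(MTBF + MTTR), and the probability of completing a mission of given duration without failure under a constant-failure-rate (exponential) assumption. That assumption is a simplification that may not hold for wear-dominated field hardware, and the sketch ignores censored (still-running) intervals.

```python
import math

def reliability_summary(operating_hours, repair_hours, mission_hours):
    """MTBF, MTTR, availability, and exponential mission reliability.

    operating_hours : uptime between consecutive failures [h]
    repair_hours    : downtime for each corresponding repair [h]
    Censored intervals (no failure observed yet) are not handled here.
    """
    failures = len(operating_hours)
    if failures == 0 or len(repair_hours) != failures:
        raise ValueError("need one repair record per failure")
    mtbf = sum(operating_hours) / failures
    mttr = sum(repair_hours) / failures
    availability = mtbf / (mtbf + mttr)
    # Constant failure rate lambda = 1/MTBF  =>  R(t) = exp(-t / MTBF).
    mission_reliability = math.exp(-mission_hours / mtbf)
    return mtbf, mttr, availability, mission_reliability

uptime = [410.0, 385.0, 520.0, 460.0]     # hypothetical hours between failures
repairs = [6.0, 4.5, 8.0, 5.5]            # hypothetical repair durations [h]
mtbf, mttr, avail, r_mission = reliability_summary(uptime, repairs, mission_hours=72.0)
print(f"MTBF {mtbf:.1f} h, MTTR {mttr:.1f} h, availability {avail:.4f}, "
      f"P(72 h mission without failure) {r_mission:.3f}")
```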

Through the longitudinal review of diverse engineering projects, several universally significant experiences can be summarized. First, successful performance evaluation often intervenes early in the requirements analysis phase rather than merely serving as a post-acceptance check. Operability is guaranteed only when evaluation feasibility and data availability are fully considered during the requirement-metric mapping phase. Second, the “Virtual–Real Closed Loop” is a common trend. Whether for kinematic calibration of industrial robots or adaptability testing of field robots, mechanism modeling, simulation, and physical testing are rarely used in isolation. Instead, they form a closed loop where “models drive test design, and test data feeds back into model correction.” Finally, in projects with multi-objective performance and multiple stakeholders, the focus of decision-making dictates method selection. As indicated in Table 8, ranging from the “Efficiency and Yield” orientation of industrial robots to the “Task Success Rate” orientation of field robots, the evaluation system must serve the ultimate engineering decision goal. Multi-Criteria Decision-Making (MCDM) methods are utilized to unify technical metrics, safety requirements, and economic constraints into a single framework, thereby enhancing the value and impact of evaluation results in real-world engineering.

6. Emerging Trends and Future Challenges in Robot Performance Evaluation

6.1. Data-Driven and Intelligent Evaluation Methodologies

In recent years, the systematic collection of extensive datasets has driven a paradigm shift in performance evaluation, necessitating a fundamental distinction between classical robotics metrics and data-driven metrics [15,139]. Classical metrics—such as kinematic repeatability, absolute positioning accuracy, and structural stiffness—rely on deterministic physical measurements and mechanism-based models, which are highly effective in controlled industrial scenarios. In contrast, for unstructured environments and AI-driven autonomy, traditional metrics fall short. Data-driven and AI-based metrics—such as mAP and F1-score for perception, ATE/RPE for localization, and reward convergence for learning-based control—are inherently probabilistic. They are specifically designed to assess an algorithm’s generalization, robustness, and performance degradation over long-term operation. The accumulation of vast operational data makes it possible to transition from limited offline testing to directly learning these probabilistic state-performance relationships from data [140,141]. Figure 7 illustrates a general framework for this data-driven intelligent evaluation, covering the entire process from multi-source data acquisition to model training and online inference.
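
To make the perception metrics concrete, here is a minimal single-class sketch: precision, recall, and F1 from matched detection counts, and average precision (AP) computed as the area under the precision–recall curve after applying the usual monotone precision envelope. Matching detections to ground truth (e.g., by an IoU threshold) is assumed to have been done already, and benchmark-specific conventions (interpolation points, IoU sweeps, averaging over classes to obtain mAP) are not reproduced.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point AP for one class.

    scores           : confidence of each detection
    is_true_positive : whether each detection matched a ground-truth object
    num_ground_truth : number of ground-truth objects (misses count here)
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    tp_cum = np.cumsum(hits)
    fp_cum = np.cumsum(~hits)
    recall = tp_cum / num_ground_truth
    precision = tp_cum / (tp_cum + fp_cum)
    # Make precision monotonically non-increasing from right to left.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum(np.diff(recall) * precision))

p, r, f1 = precision_recall_f1(tp=45, fp=5, fn=15)
ap = average_precision(scores=[0.9, 0.8, 0.7, 0.6, 0.5],
                       is_true_positive=[True, True, False, True, False],
                       num_ground_truth=4)
print(f"precision {p:.3f}, recall {r:.3f}, F1 {f1:.3f}, AP {ap:.3f}")
```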

(1) Performance Prediction: Performance prediction based on machine learning and statistical modeling aims to address “how performance evolves with operating conditions.” The core rationale is to select operational state variables (e.g., current, temperature, velocity), environmental variables, and process quality metrics highly correlated with performance as input features. Utilizing regression or classification models (e.g., Neural Networks, SVM, Gaussian Processes), the mapping laws between these inputs and output targets—such as end-effector trajectory error, cycle time, energy consumption, vibration levels, or task success rates—are characterized. Compared to traditional methods, data-driven prediction excels in handling high-dimensional, non-linear, and multi-source data. When constructing data-driven performance prediction models, processing multi-source sensor data under complex environmental disturbances (e.g., strong light interference) remains a major challenge. Introducing advanced attention mechanisms, such as spatial–temporal transformer architectures [142], can effectively capture dynamic fluctuations in time-series data, thereby enhancing the accuracy of online assessments and the system’s environmental adaptivity. To address performance uncertainty, modern prediction models move beyond single-value outputs, incorporating probabilistic graphical models or Bayesian learning frameworks to describe the distribution and confidence intervals of performance metrics. This probabilistic forecasting facilitates the setting of safety margins under given failure probability constraints in scenarios like navigation avoidance or human–robot collaboration, preventing the pursuit of “average optimality” at the expense of performance boundaries under extreme conditions [129,130,143,144].
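
As a minimal illustration of probabilistic performance prediction, the sketch below implements Gaussian process regression with a squared-exponential kernel in plain NumPy. It maps a single operating variable (a hypothetical normalized joint speed) to a hypothetical trajectory-error metric and returns a predictive mean with an approximately 95% band. Kernel hyperparameters are fixed by hand rather than optimized and the training data are synthetic; a production model would typically use a maintained library and multi-dimensional inputs.

```python
import numpy as np

def rbf(a, b, length=0.2, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.05, **kernel):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train, **kernel) + noise ** 2 * np.eye(x_train.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^-1 y
    K_s = rbf(x_train, x_test, **kernel)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf(x_test, x_test, **kernel).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(5)
speed = np.sort(rng.uniform(0.0, 1.0, 25))                        # operating condition
error = 0.2 + 0.6 * speed ** 2 + rng.normal(0, 0.05, speed.size)  # synthetic metric

query = np.array([0.5, 0.9, 1.4])                                 # 1.4 lies outside the data
mu, sd = gp_predict(speed, error - error.mean(), query)
mu += error.mean()                                                # undo centering
for q, m, s in zip(query, mu, sd):
    print(f"speed {q:.1f}: predicted error {m:.3f} ± {1.96 * s:.3f}")
```

The band widens sharply at the extrapolated query point, which is the behavior that lets safety margins be set conservatively outside the tested operating envelope.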

(2) Health Monitoring (PHM): Prognostics and Health Management (PHM) focuses on evaluating the robot’s “performance retention capability” and “degradation trends” [145]. Through training and validation on long-cycle operational data, this method strives to identify potential anomaly patterns and fault precursors, predicting remaining performance margins or Remaining Useful Life (RUL). In engineering practice, to mitigate the poor generalization of purely data-driven models under data scarcity or sudden condition changes, a hybrid modeling strategy of “Mechanism Prior + Data-Driven” is often adopted. Aivaliotis et al. [146] utilized mechanism models to filter obvious physical violations, subsequently employing data models to capture subtle degradation features. This approach supports the transition from “corrective maintenance” to “condition-based preventive maintenance,” significantly enhancing system availability and reliability by real-time monitoring of key components (e.g., reducers, joint modules).
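
In its simplest data-driven form, RUL estimation fits a trend to a degradation indicator and extrapolates it to a failure threshold. The sketch below does this with a least-squares linear fit to a hypothetical reducer health indicator (for instance a vibration RMS level). The linear trend, threshold, and data are assumptions; practical PHM models typically use nonlinear or stochastic degradation models and report uncertainty bounds on the RUL.

```python
import numpy as np

def linear_rul(hours, indicator, threshold):
    """Remaining useful life [h] from a linear degradation trend.

    Fits indicator ≈ a + b * hours and returns the time from the last
    observation until the fitted line reaches `threshold` (inf if the
    fitted trend is not increasing).
    """
    b, a = np.polyfit(hours, indicator, deg=1)      # slope first, then intercept
    if b <= 0:
        return float("inf")
    t_fail = (threshold - a) / b
    return max(t_fail - hours[-1], 0.0)

rng = np.random.default_rng(11)
hours = np.arange(0, 2000, 100, dtype=float)
vib_rms = 1.0 + 0.0015 * hours + rng.normal(0, 0.05, hours.size)   # synthetic [mm/s]

rul = linear_rul(hours, vib_rms, threshold=6.0)
print(f"estimated RUL ≈ {rul:.0f} h")
```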

(3) Online Self-Assessment: In flexible manufacturing and complex field task execution, offline evaluation cannot respond promptly to dynamic changes, making online self-assessment mechanisms a research hotspot [14,139,140]. This method typically embeds lightweight evaluation modules within the control system to continuously estimate and track key performance metrics or their proxies (e.g., real-time navigation residuals, contact forces, friction states). Online self-assessment serves not only as a “monitor” but also as a feedback loop for the “controller.” As shown in Figure 7b, based on online evaluation results, the system can introduce adaptive control or reinforcement learning to progressively adjust trajectory planning, control gains, or task allocation under safety constraints, achieving self-optimization during operation. For reliability, engineering applications generally adopt a strategy of “offline learning as primary, online fine-tuning as auxiliary”: the main policy search is completed in simulation, followed by restricted-step online updates in the real system, with the online evaluation module providing the necessary boundary constraints and safety triggers.
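
A lightweight online self-assessment module can be as simple as an exponentially weighted moving average (EWMA) of a performance proxy with a warning level and a safety-stop level. The sketch below tracks a stream of navigation residuals; the smoothing factor, thresholds, and the `OnlineMonitor` class itself are hypothetical design choices for illustration, not an API from the cited works.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OnlineMonitor:
    """EWMA tracker for a performance proxy with two alert levels (hypothetical)."""
    alpha: float = 0.1          # smoothing factor, 0 < alpha <= 1
    warn_level: float = 0.15    # e.g. localization residual [m] that triggers adaptation
    stop_level: float = 0.30    # residual that triggers a safe stop
    estimate: Optional[float] = None

    def update(self, residual: float) -> str:
        if self.estimate is None:
            self.estimate = residual
        else:
            self.estimate = self.alpha * residual + (1 - self.alpha) * self.estimate
        if self.estimate >= self.stop_level:
            return "safe_stop"
        if self.estimate >= self.warn_level:
            return "adapt"       # e.g. lower speed or re-plan
        return "nominal"

monitor = OnlineMonitor()
stream = [0.05] * 30 + [0.40] * 30    # residuals degrade abruptly at step 30
states = [monitor.update(r) for r in stream]
first_adapt = states.index("adapt")
first_stop = states.index("safe_stop")
print(f"adapt from step {first_adapt}, safe stop from step {first_stop}")
```

The smoothing factor trades detection delay against false alarms: here the monitor needs a few steps to react to the abrupt degradation, which is the price paid for ignoring single noisy residuals.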

6.2. Standard Systems and Open Testing Platforms

As robot performance evaluation transitions from laboratories to engineering practice, unified benchmarking tasks, clear evaluation standards, and reproducible open platforms serve as the critical “common language” and “common ruler.” They enable objective horizontal comparisons across different equipment, algorithms, and systems, providing a basis for technology selection and industrial regulation. Table 9 summarizes the representative general standards, special benchmarks, and open testing resources in this field.

(1) International and Domestic Standards: A systematic standard system has already formed in the industrial and collaborative robot domains. For Industrial Robots: Standards represented by ISO 9283 [27] specify test conditions and calculation methods for metrics like positioning accuracy, repeatability, and trajectory tracking, ensuring comparability across brands. For Collaborative Robots: Standards like ISO/TS 15066 [131,147] introduce specific requirements for human–robot coexistence, detailing force/speed limits, power monitoring, and human contact testing. For Mobile Robots: Normative documents regarding navigation accuracy, obstacle avoidance safety, and functional safety for AGVs and AMRs are being refined by relevant organizations. Although existing standards focus primarily on underlying safety and single-unit performance, they form the cornerstone for building higher-level evaluation frameworks. While the kinematic and safety dimensions of industrial and collaborative robots are well covered by existing norms, standardization for mobile, service, and field robots remains insufficient. In particular, unified testing protocols for AI-based perception in unstructured environments and for full-lifecycle performance assessment are still lacking, highlighting the urgent need for cross-domain standard development.

(2) Benchmark Tasks: Complementary to standards are capability-specific benchmark tasks. These benchmarks typically provide standardized task descriptions, environmental configurations, and evaluation metrics, driving technological iteration through challenges or leaderboards [148]. Manipulation and Grasping: Examples like the YCB Object Set and Grasping Challenge [149] focus on evaluating success rates, robustness, and diversity of manipulation. Navigation and SLAM: Datasets such as KITTI [137] and EuRoC [150] center on trajectory accuracy and environmental adaptability, evaluating algorithm performance under lighting variations and feature-sparse environments. Human–Robot Interaction: Standardized dialogue tasks or collaborative workflows assess interaction naturalness, efficiency, and user experience. Benchmark tasks effectively compensate for the insufficient coverage of general standards in specific application scenarios (e.g., complex assembly, unstructured navigation) and are essential tools for rapid technology screening.
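For the SLAM benchmarks mentioned above, trajectory accuracy is often reported as the absolute trajectory error (ATE): the estimated positions are rigidly aligned to ground truth, and the root-mean-square of the remaining position differences is reported. The sketch below assumes the two trajectories are already time-associated as (N, 3) arrays and uses a least-squares rotation-plus-translation alignment (the Kabsch/Horn solution, without scale). It is not the official KITTI or EuRoC evaluation tooling, which adds its own timestamp association and, for KITTI odometry, segment-based drift metrics.

```python
import numpy as np


def absolute_trajectory_error(gt: np.ndarray, est: np.ndarray) -> float:
    """RMSE of positions after rigid alignment of est onto gt.

    gt, est: arrays of shape (N, 3) holding time-associated positions.
    """
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    if gt.shape != est.shape or gt.ndim != 2 or gt.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    mu_gt, mu_est = gt.mean(axis=0), est.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (est - mu_est).T @ (gt - mu_gt)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_gt - R @ mu_est
    aligned = est @ R.T + t
    residuals = np.linalg.norm(gt - aligned, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))
```

Aligning before measuring removes the arbitrary choice of starting frame, so the remaining error reflects drift and local inaccuracy of the estimator rather than a global offset.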

(3) Open Testing Platforms: Open datasets and testing platforms provide the physical and data support for “reproducibility.” Open Datasets: Covering multi-weather, multi-scenario sensor data (images, point clouds, tactile data), they allow researchers to train and evaluate algorithms without expensive hardware. Physical/Virtual Test Sites: An increasing number of laboratories and enterprises are establishing open test grounds (e.g., autonomous driving test zones) or cloud-based simulation platforms, enabling external teams to access standardized evaluation services.

Table 9. Summary of representative standards, benchmark tasks, and open datasets for robot performance evaluation.

Application Areas	Benchmark or Platform Name	Type	Main Content and Test Scenarios	Core Evaluation Indicators
Grasping and manipulation	YCB Object & Model Set [149]	Dataset	77 everyday objects scanned with high precision, with geometric models and physical properties	Grasping success rate, object recognition accuracy, 6D pose estimation error
Grasping and manipulation	OCRTOC (Open Cloud Robot Table Organization Challenge) [151]	Competition and platform	Tabletop organization tasks, including object recognition, grasping, planning, and placement in cluttered environments	Task completion rate, average planning time, degree of scene organization
Mobility and navigation	KITTI Vision Benchmark [137]	Dataset	Outdoor autonomous-driving scenarios with stereo camera, LiDAR, and GPS/IMU data, including ground truth	Visual odometry drift rate, SLAM localization error, 3D object detection accuracy
Mobility and navigation	EuRoC MAV Datasets [150]	Dataset	Visual–inertial data recorded by micro aerial vehicles (MAVs) in complex indoor environments	Trajectory estimation accuracy, attitude estimation error, robustness to illumination changes
Mobility and navigation	Barometer (DARPA SubT) [152]	Simulation competition	Exploration and search and rescue in underground and extreme environments, with an emphasis on GPS-denied operation	Explored area coverage, number of artifacts detected, autonomous data transmission efficiency
Human–robot interaction and services	RoboCup@Home [153]	Competition	Home service scenarios with tasks such as “following someone,” “taking out the trash,” and “housekeeping services”	Total task score, naturalness of human–robot interaction, operational safety, user satisfaction
Human–robot interaction and services	H3.6M (Human3.6M) [154]	Dataset	Large-scale human motion capture data used for human behavior prediction and intent understanding in human–robot collaboration	Human pose prediction error, motion classification accuracy
Simulation and learning platform	Isaac Gym/Orbit [155]	Simulation platform	High-fidelity physics simulation environment supporting large-scale parallel reinforcement learning training	Algorithm convergence speed, Sim2Real (simulation-to-real) success rate, physical interaction realism
Simulation and learning platform	OpenAI Gym/Gymnasium [156]	Algorithm library	Standardized API and classic control environments for evaluating the performance of RL algorithms	Cumulative reward, algorithm stability, sample efficiency
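Success-rate indicators in Table 9, such as the grasping success rate and the task completion rate, are estimated from a finite number of trials, so two systems with similar point estimates may not be distinguishable. A minimal sketch, assuming independent trials with a constant success probability, reports the rate together with a Wilson score interval; the default z = 1.96 (about 95% coverage) and the trial counts below are illustrative.

```python
import math
from typing import Tuple


def success_rate_wilson(successes: int, trials: int,
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) using the Wilson score interval."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z ** 2 / trials
    # Interval center is shifted toward 0.5, which keeps it inside [0, 1].
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)
```

For example, 45 successful grasps out of 50 gives a rate of 0.90 with an interval of roughly 0.79 to 0.96, a useful reminder that leaderboard differences of a few percentage points over tens of trials may not be meaningful.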

6.3. Future Trends and Challenges

(1) Unification and Transferability of Metric Systems: Despite the systematic TESM framework proposed herein, fragmentation of metric systems and isolated evaluation methods remain prevalent across industries and manufacturers. The use of disparate metric combinations and test procedures for the same robot type weakens horizontal comparability and longitudinal traceability. Furthermore, many evaluation methods exhibit strong scenario dependency; conclusions drawn from specific test sites are difficult to transfer directly to other factories. Consequently, constructing a metric system with unified semantics and a generic structure while retaining necessary scenario differentiation is a long-term challenge. This requires broad consensus among academia, suppliers, and end-users to promote unified reference models and hierarchical evaluation frameworks at the standardization organization level. In this context, the proposed TESM framework can serve as a foundational reference. By providing a standardized semantic structure that directly links environmental constraints and task requirements to measurable metrics, TESM offers a methodological pathway to develop future standards for mobile, service, and AI-driven robots.
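One way to read “unified semantics with scenario differentiation” in code is to separate a shared metric dictionary (unit, direction, normalization range) from scenario-specific weight profiles, so that every site reports the same metric definitions but aggregates them differently. Everything in the sketch below, including the metric names, ranges, linear normalization, and the two profiles, is a hypothetical example and not the TESM framework as specified in this review.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MetricDef:
    """Shared, scenario-independent definition of one metric."""
    unit: str
    higher_is_better: bool
    lo: float  # raw value mapped to the low end of the linear scale
    hi: float  # raw value mapped to the high end of the linear scale

    def __post_init__(self) -> None:
        if self.hi == self.lo:
            raise ValueError("hi and lo must differ")

    def normalize(self, value: float) -> float:
        """Map a raw value to a score in [0, 1], where 1 is always best."""
        s = min(1.0, max(0.0, (value - self.lo) / (self.hi - self.lo)))
        return s if self.higher_is_better else 1.0 - s


# Common metric dictionary (hypothetical names and ranges).
METRICS: Dict[str, MetricDef] = {
    "repeatability_mm": MetricDef("mm", False, 0.0, 0.5),
    "cycle_time_s": MetricDef("s", False, 10.0, 60.0),
    "success_rate": MetricDef("-", True, 0.8, 1.0),
}

# Scenario-specific weight profiles over the same metrics (hypothetical).
PROFILES: Dict[str, Dict[str, float]] = {
    "precision_assembly": {"repeatability_mm": 0.6, "cycle_time_s": 0.1, "success_rate": 0.3},
    "logistics": {"repeatability_mm": 0.1, "cycle_time_s": 0.5, "success_rate": 0.4},
}


def scenario_score(measured: Dict[str, float], profile: str) -> float:
    """Weighted mean of normalized scores under one scenario profile."""
    weights = PROFILES[profile]
    total = sum(weights.values())
    return sum(w * METRICS[m].normalize(measured[m]) for m, w in weights.items()) / total
```

Because only the weights differ between profiles, results remain comparable at the level of individual metric scores even when the aggregated scores diverge between, for example, a precision-assembly site and a logistics site.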

(2) Evaluation in Complex Scenarios and Long-Term Operation: Another prominent challenge lies in faithfully capturing “actual performance” under complex, long-term operating conditions. Current evaluations are predominantly short-term, centralized laboratory tests in idealized environments that do not fully cover the disturbances and extreme conditions present in field operations. Crucial metrics such as system reliability, availability, and lifecycle cost require long-term operational data, which is often unavailable within short project cycles. A future direction is to systematically embed data acquisition and online performance monitoring mechanisms into robotic systems. With unified log formats and privacy-preserving mechanisms, a cross-project performance database can be accumulated, and data-driven methods can then bridge the gap between “short-term test results” and “long-term real-world performance.”
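Once logs share a common format, reliability and availability metrics can be derived directly from them. A minimal sketch, assuming a hypothetical log of (start, end, state) records in hours with states "up" and "down", and the usual steady-state definitions: MTBF as total operating time divided by the number of failures, MTTR as total repair time divided by the number of repairs, and inherent availability A = MTBF / (MTBF + MTTR).

```python
from typing import Dict, List, Tuple

# Hypothetical unified log record: (start_h, end_h, state) with state "up" or "down".
Interval = Tuple[float, float, str]


def reliability_summary(log: List[Interval]) -> Dict[str, float]:
    """Estimate MTBF, MTTR and inherent availability from an interval log."""
    up_time, repair_times = 0.0, []
    for start, end, state in log:
        if end < start or state not in ("up", "down"):
            raise ValueError(f"malformed log record: {(start, end, state)}")
        if state == "up":
            up_time += end - start
        else:
            # Each "down" interval is treated as one failure followed by one repair.
            repair_times.append(end - start)
    failures = len(repair_times)
    if failures == 0:
        return {"mtbf_h": float("inf"), "mttr_h": 0.0, "availability": 1.0}
    mtbf = up_time / failures
    mttr = sum(repair_times) / failures
    return {"mtbf_h": mtbf, "mttr_h": mttr, "availability": mtbf / (mtbf + mttr)}
```

A 400 h log with two 2 h outages, for instance, yields an MTBF of 198 h, an MTTR of 2 h, and an availability of 0.99; short laboratory campaigns rarely contain enough failures for such estimates to be stable, which is the gap the text describes.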

(3) Evaluation of Human Factors and Social Impact: As robots move into public spaces and environments shared with humans, the scope of performance evaluation is expanding beyond traditional technical metrics. Current evaluations focus heavily on “hard metrics” (accuracy, speed, safety compliance), with insufficient attention to “soft metrics” such as operator workload, learning costs, psychological safety, and impacts on employment structures. Future research must introduce systematic modules for evaluating human factors and social impact, for example methods that assess human–robot collaboration quality, comfort, and trust from physiological and behavioral indicators, and methods that quantify ethical risks against compliance frameworks. Integrating these “soft metrics” with traditional “hard metrics” in multi-criteria decision-making without significantly increasing evaluation costs remains a direction worth exploring.
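To illustrate how a soft metric can enter the same multi-criteria decision step as hard metrics, the sketch below ranks alternatives with TOPSIS, treating positioning error, cycle time, and a NASA-TLX workload score as cost criteria and task success rate as a benefit criterion. Vector normalization and Euclidean distances follow the standard TOPSIS formulation; the candidate values and weights are invented for illustration.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return the TOPSIS closeness coefficient (0..1, higher is better) per row."""
    n = len(weights)
    # Vector normalization of each criterion column, then weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    cols = list(zip(*v))
    # Ideal point: best value per criterion; anti-ideal point: worst value.
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, ideal), math.dist(row, anti)
        total = d_plus + d_minus
        scores.append(d_minus / total if total > 0 else 0.0)
    return scores


# Columns: positioning error [mm] (cost), cycle time [s] (cost),
# task success rate (benefit), NASA-TLX workload score (cost). Values are invented.
candidates = [
    [0.05, 30.0, 0.95, 40.0],  # robot A
    [0.10, 25.0, 0.90, 60.0],  # robot B
    [0.20, 40.0, 0.80, 70.0],  # robot C
]
weights = [0.3, 0.2, 0.3, 0.2]
benefit = [False, False, True, False]
scores = topsis(candidates, weights, benefit)
```

Mechanically, the workload score is just one more cost column; the open questions the text raises are how to collect such soft measurements cheaply and reliably, and how to justify their weights.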

(4) Reliability and Interpretability of Data-Driven Evaluation: Data-driven and intelligent evaluation methods introduce new issues regarding model reliability, interpretability, and safety. On the one hand, model performance depends heavily on training data coverage; significant deviations between deployment and training environments can lead to systematic bias. On the other hand, complex deep models often lack transparency. Addressing these issues requires efforts on multiple fronts: introducing physical priors and safety constraints into model structures, establishing strict quality control in data management, and deploying uncertainty estimation and degradation detection mechanisms. Ultimately, building intelligent evaluation systems with formally verifiable properties, so that interpretability and safety reach certifiable levels, is key to their adoption in high-stakes engineering fields. Furthermore, to address the prevalent “black box” problem in AI-driven assessments, using Large Language Models (LLMs) together with standardized data models to semi-automate the programming of test scripts and the interpretation of results is emerging as a promising pathway to improve system transparency and evaluation efficiency [157].
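As one concrete instance of a degradation-detection mechanism, an upper one-sided CUSUM test (Page’s cumulative sum) can monitor the residual between a data-driven model’s predicted metric and the measured value, raising an alarm when the residual drifts persistently upward. The drift allowance k, the threshold h, and the assumption that residuals are standardized (roughly zero mean and unit variance during healthy operation) are illustrative choices, not values from the cited work.

```python
from typing import Iterable, Optional


def cusum_alarm(residuals: Iterable[float], k: float = 0.5,
                h: float = 5.0) -> Optional[int]:
    """Return the index of the first alarm, or None if no alarm is raised.

    Upper one-sided CUSUM: S_t = max(0, S_{t-1} + r_t - k); alarm when S_t > h.
    Residuals are assumed standardized (zero mean, unit variance when healthy).
    """
    s = 0.0
    for t, r in enumerate(residuals):
        # Accumulate only evidence above the allowance k; reset at zero.
        s = max(0.0, s + r - k)
        if s > h:
            return t
    return None
```

With these settings, a sustained residual shift of +2 is flagged on the fourth affected sample, while small zero-mean fluctuations never accumulate; the alarm has a transparent statistical meaning, which complements less interpretable learned models.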

(5) Lifecycle and Cross-Domain Performance Management: Synthesizing the above factors (as shown in Figure 8), robot performance evaluation is expected to evolve from “single acceptance testing” to a comprehensive performance management system spanning design, manufacturing, integration, operation, and decommissioning.

In this vision, metric systems become dynamic knowledge structures that update with changes in tasks and environments; evaluation activities merge with digital twins and predictive maintenance to support continuous optimization. Simultaneously, the increasing commonality of evaluation issues across domains lays the foundation for cross-domain method libraries. Future efforts aiming to form unified evaluation frameworks and data interfaces at the common level, while retaining domain-specific constraints at the individual level, hold the promise of establishing a robust robot performance evaluation ecosystem.

6.4. Limitations of the Review

This review has a few limitations. First, our search was limited to English peer-reviewed papers in major databases, which means we might have missed some regional technical reports, non-English publications, or industry white papers. Second, because the robots discussed vary widely—from fixed industrial arms to mobile service robots—their evaluation metrics are highly domain-specific. This makes it difficult to conduct a quantitative meta-analysis, so our study mainly focuses on a qualitative summary and classification. Finally, as data-driven models and generative AI are increasingly used in robotics, traditional deterministic evaluation metrics are being challenged. How to effectively benchmark these AI-based dynamic behaviors is still an open question for future research.

7. Conclusions and Future Prospects

This paper systematically constructed a robot performance evaluation framework based on a multi-dimensional Task–Environment–System–Metric (TESM) perspective. Moving beyond descriptive summaries, our review yields several specific findings for engineering applications. First, the evaluation paradigm has fundamentally shifted from static acceptance testing to full-lifecycle performance management, requiring a multi-dimensional synergy of “Technology–Human–Economy”. Second, while mechanism modeling and physical testing remain foundational, their deep fusion with virtual simulation (e.g., virtual–real integration) is now indispensable for assessing dynamic environmental interference. Third, although data-driven evaluations offer immense potential for health monitoring and online self-assessment, their practical deployment is currently bottlenecked by “black-box” interpretability issues and a critical lack of cross-scenario standardization.

Consequently, to support the reliable deployment of robotic systems, future research must prioritize advancing unified, hierarchical metric standards and closed-loop management platforms spanning design, operation, and maintenance. Furthermore, deepening the fusion of mechanism priors with data-driven methods to balance evaluation accuracy with interpretability, along with systematically incorporating human factors and ethical impacts, will be critical for the responsible application of next-generation robots in complex social environments.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/technologies14050297/s1, Table S1: PRISMA 2020 Checklist [158].

Author Contributions

Conceptualization, X.W. and B.Z.; methodology, X.W. and S.P.; formal analysis, X.W.; investigation, X.W. and S.P.; resources, B.Z.; data curation, X.W.; writing—original draft preparation, X.W.; writing—review and editing, B.Z. and S.P.; visualization, X.W.; supervision, B.Z.; project administration, B.Z.; funding acquisition, B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by LiaoNing Revitalization Talents Program, grant number XLYC2411074. The APC was funded by the authors’ institution.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated and/or analysed during the current study are not publicly available due to the trade secrets involved but are available from the corresponding author on reasonable request.

Acknowledgments

The authors would like to thank the technical staff at the National Robot Quality Inspection and Testing Center (Liaoning) for their support in conducting the experiments. During the preparation of this manuscript, the authors used OpenAI ChatGPT (version GPT-8.13, 2025) for assistance in drafting text. The authors have reviewed and edited all generated content and take full responsibility for the final scientific content and conclusions of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TESM	Task–Environment–System–Metric
HRI	Human–Robot Interaction
MCDM	Multi-Criteria Decision-Making
KPRs	Key Performance Requirements
CSFs	Critical Success Factors
AHP	Analytic Hierarchy Process
HILS	Hardware-in-the-Loop Simulation
DES	Discrete Event Simulation
ATE	Absolute Trajectory Error
RPE	Relative Pose Error
SLAM	Simultaneous Localization and Mapping
AMR	Autonomous Mobile Robot
AGV	Automated Guided Vehicle
PFL	Power and Force Limiting
NASA-TLX	NASA Task Load Index
IP	Ingress Protection
DT	Digital Twin
PHM	Prognostics and Health Management
RUL	Remaining Useful Life
TOPSIS	Technique for Order Preference by Similarity to Ideal Solution
VIKOR	VlseKriterijumska Optimizacija I Kompromisno Resenje
BWM	Best-Worst Method
EDAS	Evaluation based on Distance from Average Solution
ISO	International Organization for Standardization
CMM	Coordinate Measuring Machine
IMU	Inertial Measurement Unit
CAD	Computer-Aided Design
VR	Virtual Reality
O&M	Operation and Maintenance
OED	Orthogonal Experimental Design
RSM	Response Surface Methodology
SVM	Support Vector Machine
GPS	Global Positioning System
LiDAR	Light Detection and Ranging
AI	Artificial Intelligence
MTBF	Mean Time Between Failures
ROS	Robot Operating System

References

Araujo, H.; Mousavi, M.R.; Varshosaz, M. Testing, validation, and verification of robotic and autonomous systems: A systematic review. ACM Trans. Softw. Eng. Methodol. 2023, 32, 51.
Chukwurah, N.; Adebayo, A.S.; Ajayi, O.O. Sim-to-real transfer in robotics: Addressing the gap between simulation and real-world performance. Int. J. Robot. Simul. 2024, 6, 89–102.
Caldas, R.; García, J.A.P.; Schiopu, M.; Pelliccione, P.; Rodrigues, G.; Berger, T. Runtime verification and field-based testing for ROS-based robotic systems. IEEE Trans. Softw. Eng. 2024, 50, 2544–2567.
Bi, Z.M.; Miao, Z.; Zhang, B.; Zhang, C.W. The state of the art of testing standards for integrated robotic systems. Robot. Comput.-Integr. Manuf. 2020, 63, 101893.
Vaghani, B.M. Robotic systems for minimally invasive surgery: Enhancing precision, safety, and real-time feedback through industry 4.0 and 5.0. Clin. Med. Health Res. J. 2025, 5, 1368–1381.
Hassanzadeh, A.; Moradi, S.; Burton, H.V. Performance-based design optimization of structures: State-of-the-art review. J. Struct. Eng. 2024, 150, 03124001.
Apraiz, A.; Lasa, G.; Mazmela, M. Evaluation of user experience in human–robot interaction: A systematic literature review. Int. J. Soc. Robot. 2023, 15, 187–210.
El-Meligy, M.A.; Mahmoud, H.A.; Sarhan, N.; Awwad, E.M. A configurable process control method for robotic system-based industrial service improvements. J. Eng. Res. 2025, 13, 579–589.
Tian, Y.; Chen, C.; Sagoe-Crentsil, K.; Zhang, J.; Duan, W. Intelligent robotic systems for structural health monitoring: Applications and future trends. Autom. Constr. 2022, 139, 104273.
Lau, K.; Hocken, R. A survey of current robot metrology methods. CIRP Ann. 1984, 33, 485–488.
Slamani, M.; Joubair, A.; Bonev, I.A. A comparative evaluation of three industrial robots using three reference measuring techniques. Ind. Robot. Int. J. 2015, 42, 572–585.
Jaber, A.A. Design of an Intelligent Embedded System for Condition Monitoring of an Industrial Robot; Springer: Cham, Switzerland, 2016.
Bi, Z.; Miao, Z.; Zhang, B.; Zhang, C.W. Framework for performance assessment of heterogeneous robotic systems. IEEE Syst. J. 2020, 15, 1191–1201.
Kakolu, S.; Faheem, M.A. Autonomous robotics in field operations: A data-driven approach to optimize performance and safety. Iconic Res. Eng. J. 2023, 7, 565–578.
Aali, M. Learning-Based Safety-Critical Control Under Uncertainty with Applications to Mobile Robots. 2025. Available online: https://hdl.handle.net/10012/21468 (accessed on 9 May 2026).
Haskard, A.; Herath, D. Secure robotics: Navigating challenges at the nexus of safety, trust, and cybersecurity in cyber-physical systems. ACM Comput. Surv. 2025, 57, 1–48.
Firoozi, R.; Tucker, J.; Tian, S.; Majumdar, A.; Sun, J.; Liu, W.; Zhu, Y.; Song, S.; Kapoor, A.; Hausman, K.; et al. Foundation models in robotics: Applications, challenges, and the future. Int. J. Robot. Res. 2025, 44, 701–739.
Ge, J.; Zhang, J.; Chang, C.; Zhang, Y.; Yao, D.; Li, L. Task-driven controllable scenario generation framework based on AOG. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6186–6199.
Peng, Y.; Han, J.; Zhang, Z.; Fan, L.; Liu, T.; Qi, S.; Feng, X.; Ma, Y.; Wang, Y.; Zhu, S.C. The tong test: Evaluating artificial general intelligence through dynamic embodied physical and social interactions. Engineering 2024, 34, 12–22.
Akalin, N.; Loutfi, A. Reinforcement learning approaches in social robotics. Sensors 2021, 21, 1292.
Spielberg, A.; Amini, A.; Chin, L.; Matusik, W.; Rus, D. Co-learning of task and sensor placement for soft robotics. IEEE Robot. Autom. Lett. 2021, 6, 1208–1215.
Tahir, N.; Parasuraman, R. Edge computing and its application in robotics: A survey. J. Sens. Actuator Netw. 2025, 14, 65.
Michalos, G.; Spiliotopoulos, J.; Makris, S.; Chryssolouris, G. A method for planning human robot shared tasks. CIRP J. Manuf. Sci. Technol. 2018, 22, 76–90.
Cheng, T.; Teizer, J.; Migliaccio, G.C.; Gatti, U.C. Automated task-level activity analysis through fusion of real time location sensors and worker’s thoracic posture data. Autom. Constr. 2013, 29, 24–39.
Miller, D.J.; Lennox, R.C. An object-oriented environment for robot system architectures. IEEE Control Syst. Mag. 2002, 11, 14–23.
Aller, F.; Pinto-Fernandez, D.; Torricelli, D.; Pons, J.L.; Mombaur, K. From the state of the art of assessment metrics toward novel concepts for humanoid robot locomotion benchmarking. IEEE Robot. Autom. Lett. 2019, 5, 914–920.
ISO 9283; Manipulating Industrial Robots–Performance Criteria and Related Test Methods. International Organization for Standardization: Geneva, Switzerland, 1988.
Keutzer, K.; Newton, A.R.; Rabaey, J.M.; Sangiovanni-Vincentelli, A. System-level design: Orthogonalization of concerns and platform-based design. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2000, 19, 1523–1543.
Duflou, J.; Kellens, K.; Dewulf, W. Unit process impact assessment for discrete part manufacturing: A state of the art. CIRP J. Manuf. Sci. Technol. 2011, 4, 129–135.
Khan, J. A Standardized Process Flow for Creating and Maintaining Component Level Hardware in the Loop Simulation Test Bench; Technical Report, SAE Technical Paper; SAE International: Warrendale, PA, USA, 2016.
Nilakantan, J.M.; Huang, G.Q.; Ponnambalam, S.G. An investigation on minimizing cycle time and total energy consumption in robotic assembly line systems. J. Clean. Prod. 2015, 90, 311–325.
Wang, T.; Zhan, W. Design and control a hybrid human–machine collaborative manufacturing system in operational management technology to enhance human–machine collaboration. Int. J. Adv. Manuf. Technol. 2024, 1–11.
Brosque, C.; Fischer, M. A robot evaluation framework comparing on-site robots with traditional construction methods. Constr. Robot. 2022, 6, 187–206.
Wang, C.; Yang, Z.; Li, Z.S.; Damian, D.; Lo, D. Quality assurance for artificial intelligence: A study of industrial concerns, challenges and best practices. arXiv 2024, arXiv:2402.16391.
Tulli, S.K.C. Warehouse Layout Optimization: Techniques for Improved Order Fulfillment Efficiency. Int. J. Acta Inform. 2023, 2, 138–168.
Jiang, T.; Cui, H.; Cheng, X.; Tian, W. A measurement method for robot peg-in-hole prealignment based on combined two-level visual sensors. IEEE Trans. Instrum. Meas. 2020, 70, 5000912.
Zanchettin, A.M.; Ceriani, N.M.; Rocco, P.; Ding, H.; Matthias, B. Safety in human–robot collaborative manufacturing environments: Metrics and control. IEEE Trans. Autom. Sci. Eng. 2015, 13, 882–893.
Costanzo, M.; De Maria, G.; Lettera, G.; Natale, C. A multimodal approach to human safety in collaborative robotic workcells. IEEE Trans. Autom. Sci. Eng. 2021, 19, 1202–1216.
Qin, Z.; Wang, P.; Sun, J.; Lu, J.; Qiao, H. Precise robotic assembly for large-scale objects based on automatic guidance and alignment. IEEE Trans. Instrum. Meas. 2016, 65, 1398–1411.
Mei, B.; Liang, Z.; Xie, Y.; Fu, Y.; Yang, Y. Positioning accuracy enhancement of a robotic assembly system for thin-walled aerostructure assembly. J. Ind. Inf. Integr. 2023, 35, 100518.
Kluz, R.; Trzepieciński, T. The repeatability positioning analysis of the industrial robot arm. Assem. Autom. 2014, 34, 285–295.
Kayacan, E.; Chowdhary, G. Tracking error learning control for precise mobile robot path tracking in outdoor environment. J. Intell. Robot. Syst. 2019, 95, 975–986.
Hassan, I.A.; Abed, I.A.; Al-Hussaibi, W.A. Path planning and trajectory tracking control for two-wheel mobile robot. J. Robot. Control (JRC) 2024, 5, 1–15.
Bowling, A.; Khatib, O. The dynamic capability equations: A new tool for analyzing robotic manipulator performance. IEEE Trans. Robot. 2005, 21, 115–123.
Eswaran, M.; Kumar Inkulu, A.; Tamilarasan, K.; Bahubalendruni, M.R.; Jaideep, R.; Faris, M.S.; Jacob, N. Optimal layout planning for human robot collaborative assembly systems and visualization through immersive technologies. Expert Syst. Appl. 2024, 241, 122465.
Kishi, Y.; Yamada, Y.; Yokoyama, K. The role of joint stiffness enhancing collision reaction performance of collaborative robot manipulators. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 376–381.
Pang, G.; Yang, G.; Heng, W.; Ye, Z.; Huang, X.; Yang, H.Y.; Pang, Z. CoboSkin: Soft robot skin with variable stiffness for safer human–robot collaboration. IEEE Trans. Ind. Electron. 2020, 68, 3303–3314.
Khosravani, M.R.; Haghighi, A. Large-scale automated additive construction: Overview, robotic solutions, sustainability, and future prospect. Sustainability 2022, 14, 9782.
Mehrotra, T.; Shetty, S. An innovation of energy harvesting for small scale robotics in automation industry. In Proceedings of the 2023 International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballar, India, 29–30 April 2023; pp. 1–6.
Kolar, P.; Benavidez, P.; Jamshidi, M. Survey of datafusion techniques for laser and vision based sensor integration for autonomous navigation. Sensors 2020, 20, 2180.
Básaca-Preciado, L.C.; Sergiyenko, O.Y.; Rodríguez-Quinonez, J.C.; García, X.; Tyrsa, V.V.; Rivas-Lopez, M.; Hernandez-Balbuena, D.; Mercorelli, P.; Podrygalo, M.; Gurko, A.; et al. Optical 3D laser measurement system for navigation of autonomous mobile robot. Opt. Lasers Eng. 2014, 54, 159–169.
Park, K.M.; Park, Y.; Yoon, S.; Park, F.C. Collision detection for robot manipulators using unsupervised anomaly detection algorithms. IEEE/ASME Trans. Mechatron. 2021, 27, 2841–2851.
Kim, A.; Eustice, R.M. Perception-driven navigation: Active visual SLAM for robotic area coverage. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 3196–3203.
Gusrial, M.H.; Othman, N.A.; Ahmad, H.; Hassan, M.H.A. Review of Kalman filter variants for SLAM in mobile robotics with linearization and covariance initialization. J. Mechatron. Electr. Power Veh. Technol. 2025, 16, 69–83.
Dias, M.B.; Zinck, M.; Zlot, R.; Stentz, A. Robust multirobot coordination in dynamic environments. In Proceedings of the IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, New Orleans, LA, USA, 26 April–1 May 2004; Volume 4, pp. 3435–3442.
Chen, B.; Hua, C.; Dai, B.; He, Y.; Han, J. Online control programming algorithm for human–robot interaction system with a novel real-time human gesture recognition method. Int. J. Adv. Robot. Syst. 2019, 16, 1729881419861764.
Hagras, H.; Callaghan, V.; Collry, M. Outdoor mobile robot learning and adaptation. IEEE Robot. Autom. Mag. 2001, 8, 53–69.
Steiner, J.A.; He, X.; Bourne, J.R.; Leang, K.K. Open-sector rapid-reactive collision avoidance: Application in aerial robot navigation through outdoor unstructured environments. Robot. Auton. Syst. 2019, 112, 211–220.
Pang, G.; Yang, G.; Pang, Z. Review of robot skin: A potential enabler for safe collaboration, immersive teleoperation, and affective interaction of future collaborative robots. IEEE Trans. Med. Robot. Bionics 2021, 3, 681–700.
Li, W.; Hu, Y.; Zhou, Y.; Pham, D.T. Safe human–robot collaboration for industrial settings: A survey. J. Intell. Manuf. 2024, 35, 2235–2261.
Gualtieri, L.; Rauch, E.; Vidoni, R. Development and validation of guidelines for safety in human-robot collaborative assembly systems. Comput. Ind. Eng. 2022, 163, 107801.
Hopko, S.K.; Khurana, R.; Mehta, R.K.; Pagilla, P.R. Effect of cognitive fatigue, operator sex, and robot assistance on task performance metrics, workload, and situation awareness in human-robot collaboration. IEEE Robot. Autom. Lett. 2021, 6, 3049–3056.
Ali, A.R.; Kamal, H. Time-to-fault prediction framework for automated manufacturing in humanoid robotics using deep learning. Technologies 2025, 13, 42.
Alaka, H.T.O.; Mpofu, K.; Ramatsetse, B.; Adegbola, T.A.; Adeoti, M.O. Developing reliability centered maintenance in automotive robotic welding machines for a tier 1 supplier. Front. Robot. AI 2025, 12, 1620370.
Tiwari, A.; Kumar, S.; Sharma, R.K.; Mehdi, H.; Saroha, M. Analysing the reliability factors of a robot utilized within an FMC comprising two machines and one robot. Int. J. Interact. Des. Manuf. (IJIDeM) 2025, 19, 4517–4531.
Alapati, H.; Nehru, J.; Ketha, P. Cost-effectiveness and return on the investment analysis for necrobotic systems. In Necrobotics for Healthcare Applications and Management; Elsevier: Amsterdam, The Netherlands, 2025; pp. 181–193.
Varadharajan, V.S. Communication, Coordination and Organization of Practical Robot Swarms; Ecole Polytechnique: Montreal, QC, Canada, 2022.
Zhang, P.; Yao, Z.; Du, Z. Global performance index system for kinematic optimization of robotic mechanism. J. Mech. Des. 2014, 136, 031001.
Chen, M.; Liang, H.; He, C.; Zhang, Y.; Huang, L. Research on error analysis and calibration method of 3-PUU parallel robot. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2025, 239, 1351–1364.
Zhou, Z.; Gosselin, C. Simplified inverse dynamic models of parallel robots based on a Lagrangian approach. Meccanica 2024, 59, 657–680.
Wu, K.; Li, J.; Zhao, H.; Zhong, Y. Review of industrial robot stiffness identification and modelling. Appl. Sci. 2022, 12, 8719.
Ma, K.; Xu, F.; Xu, Q.; Gao, S.; Jiang, G.P. Trajectory error compensation method for grinding robots based on kinematic calibration and joint variable prediction. Robot. Comput.-Integr. Manuf. 2025, 92, 102889.
Nonoyama, K.; Liu, Z.; Fujiwara, T.; Alam, M.M.; Nishi, T. Energy-efficient robot configuration and motion planning using genetic algorithm and particle swarm optimization. Energies 2022, 15, 2074.
Sun, Y.; Tang, Y.; Zheng, J.; Dong, D.; Bai, L. Optimal variable stiffness control and its applications in bionic robotic joints: A review. J. Bionic Eng. 2023, 20, 417–435.
Li, Y.; Gao, G.; Na, J.; Xing, Y. Error sensitivity flexibility compensation of joints for improving the positioning accuracy of industrial robots. IEEE Trans. Autom. Sci. Eng. 2024, 22, 5335–5348.
Chen, Z.; Renda, F.; Le Gall, A.; Mocellin, L.; Bernabei, M.; Dangel, T.; Ciuti, G.; Cianchetti, M.; Stefanini, C. Data-driven methods applied to soft robot modeling and control: A review. IEEE Trans. Autom. Sci. Eng. 2024, 22, 2241–2256.
Collins, J.; Chand, S.; Vanderkop, A.; Howard, D. A review of physics simulators for robotic applications. IEEE Access 2021, 9, 51416–51431.
Muratore, F.; Ramos, F.; Turk, G.; Yu, W.; Gienger, M.; Peters, J. Robot learning from randomized simulations: A review. Front. Robot. AI 2022, 9, 799893.
Rega, A.; Pasquariello, A.; Vitolo, F.; Cirillo, P.; Patalano, S. Computer-Aided Design and Multibody Modelling Integrated Approach for Virtual Prototyping of Customized Industrial Systems. In Proceedings of the International Conference of the Italian Association of Design Methods and Tools for Industrial Engineering, Palermo, Italy, 11–13 September 2024; pp. 385–392.
Demirtas, S.; Cankurt, T.; Samur, E. Development and implementation of a collaborative workspace for industrial robots utilizing a practical path adaptation algorithm and augmented reality. Mechatronics 2022, 84, 102764.
Fei, T.; Mukhopadhyay, S.C.; Da Costa, J.P.J.; RoyChaudhuri, C.; Lan, L.; Demitri, N. Spatial environment perception and sensing in automated systems: A review. IEEE Sens. J. 2024, 24, 21813–21833.
Durst, P.J.; McInnis, D.; Davis, J.; Goodin, C.T. A novel framework for verification and validation of simulations of autonomous robots. Simul. Model. Pract. Theory 2022, 117, 102515.
Chen, K.; Cao, R.; James, S.; Li, Y.; Liu, Y.H.; Abbeel, P.; Dou, Q. Sim-to-real 6d object pose estimation via iterative self-training for robotic bin picking. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 533–550.
Alhazmi, K.; Sarathy, S.M. Dynamic optimizers for complex industrial systems via direct data-driven synthesis. Commun. Eng. 2025, 4, 25.
Iranshahi, K.; Brun, J.; Arnold, T.; Sergi, T.; Müller, U.C. Digital twins: Recent advances and future directions in engineering fields. Intell. Syst. Appl. 2025, 26, 200516.
Pandy, G.; Pugazhenthi, V.J.; Jeyarajan, B.; Murugan, A. Advancing Robotics Testing: A Novel Framework for Adaptive and Scalable Evaluation. In Proceedings of the 2025 17th International Conference on Computer and Automation Engineering (ICCAE), Perth, Australia, 20–22 March 2025; pp. 1–7.
Liu, Y.; Cui, X.; Fan, S.; Wang, Q.; Liu, Y.; Sun, Y.; Wang, G. Dynamic validation of calibration accuracy and structural robustness of a multi-sensor mobile robot. Sensors 2024, 24, 3896.
Haque, E.A. The Role of Calibration Engineering In Strengthening Reliability of US Advanced Manufacturing Systems Through Artificial Intelligence. Rev. Appl. Sci. Technol. 2025, 4, 820–851.
Sigron, P.; Aschwanden, I.; Bambach, M. Compensation of geometric, backlash, and thermal drift errors using a universal industrial robot model. IEEE Trans. Autom. Sci. Eng. 2023, 21, 6615–6627.
Etoundi, A.C.; Dobner, A.; Agrawal, S.; Semasinghe, C.L.; Georgilas, I.; Jafari, A. A robotic test rig for performance assessment of prosthetic joints. Front. Robot. AI 2022, 8, 613579. [Google Scholar] [CrossRef]
Evans, M. Optimisation of Manufacturing Processes: A Response Surface Approach; CRC Press: Boca Raton, FL, USA, 2022. [Google Scholar]
Belgacem, H.; Chihi, I. Toward Reliable and Intelligent Sensor Systems: A Comprehensive Study of Fault Diagnosis and Mitigation. IEEE Sens. Rev. 2025, 2, 511–536. [Google Scholar]
Park, J.g.; Yoon, H.; Youn, B.D. Probabilistic framework for reliable optimal design of gearboxes in general-purpose industrial robots considering random use conditions. J. Comput. Des. Eng. 2023, 10, 539–548. [Google Scholar] [CrossRef]
Kaya, I.; Karaşan, A.; Özkan, B.; Colak, M. An integrated decision-making methodology based on Pythagorean fuzzy sets for social robot evaluation. Soft Comput. 2022, 26, 9831–9858. [Google Scholar] [CrossRef]
Yang, S.; Sun, C.; Li, B.; Li, L. Multi-Target Coverage Trajectory Planning for Ceiling Painting Robot Chassis via Two-Stage Optimization. IEEE Trans. Autom. Sci. Eng. 2025, 22, 20112–20125. [Google Scholar] [CrossRef]
Yazdi, M. Reliability-centered design and system resilience. In Advances in Computational Mathematics for Industrial System Reliability and Maintainability; Springer: Cham, Switzerland, 2024; pp. 79–103.
Gamal, A.; Mohamed, M. A hybrid MCDM approach for industrial robots selection for the automotive industry. Neutrosophic Syst. Appl. 2023, 4, 1–11.
Goswami, S.S.; Behera, D.K.; Afzal, A.; Razak Kaladgi, A.; Khan, S.A.; Rajendran, P.; Subbiah, R.; Asif, M. Analysis of a robot selection problem using two newly developed hybrid MCDM models of TOPSIS-ARAS and COPRAS-ARAS. Symmetry 2021, 13, 1331.
Swethaa, S.; Felix, A. An intuitionistic dense fuzzy AHP-TOPSIS method for military robot selection. J. Intell. Fuzzy Syst. 2023, 44, 6749–6774.
Khan, N.A.; Kumar, A.; Rao, N. A hybrid robot selection model for efficient decisive support system using fuzzy logic and genetic algorithm. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 2120–2129.
Rashid, T.; Ali, A.; Chu, Y.M. Hybrid BW-EDAS MCDM methodology for optimal industrial robot selection. PLoS ONE 2021, 16, e0246738.
Sahoo, S.K.; Goswami, S.S. A comprehensive review of multiple criteria decision-making (MCDM) methods: Advancements, applications, and future directions. Decis. Mak. Adv. 2023, 1, 25–48.
Praneeth, B.B.; Nadeem, S.P.; Vimal, K.; Kandasamy, J. Performance measurement of e-commerce supply chains using BWM and fuzzy TOPSIS. Int. J. Qual. Reliab. Manag. 2023, 40, 1259–1291.
Lin, K.Y.; Chang, K.H.; Lin, Y.W.; Wu, M.J. Exploring key considerations for artificial intelligence robots in home healthcare using the unified theory of acceptance and use of technology and the fuzzy analytical hierarchy process method. Systems 2025, 13, 25.
Moeller, C.; Schmidt, H.C.; Koch, P.; Boehlmann, C.; Kothe, S.; Wollnack, J.; Hintze, W. Real time pose control of an industrial robotic system for machining of large scale components in aerospace industry using laser tracker system. SAE Int. J. Aerosp. 2017, 10, 100–108.
Maghami, A.; Khoshdarregi, M. Vision-based target localization and online error correction for high-precision robotic drilling. Robotica 2024, 42, 3019–3043.
He, W.; Zhang, P.; Guo, K.; Sun, J.; Sivalingam, V.; Huang, X. Kinematic calibration and compensation of industrial robots based on extended joint space. IEEE Access 2023, 11, 109331–109340.
Paksoy, E.; Dede, M.I.C.; Kiper, G. Enhancing trajectory-tracking accuracy of high-acceleration parallel robots by predicting compliant displacements. Robotica 2025, 43, 2003–2029.
Kadri, M.B.; Khatri, S.A.; Yousuf, S. Trajectory Tracking Control of a Planar Robotic Arm Using Inverse Dynamics and Fuzzy Gain Scheduling: Simulation and Experimental Validation. IEEE Access 2025, 13, 186736–186759.
Pedrocchi, N.; Villagrossi, E.; Cenati, C.; Molinari Tosatti, L. Design of fuzzy logic controller of industrial robot for roughing the uppers of fashion shoes. Int. J. Adv. Manuf. Technol. 2015, 77, 939–953.
Wang, Y.; Zhou, Y.; Wei, L.; Li, R. Design of a four-axis robot arm system based on machine vision. Appl. Sci. 2023, 13, 8836.
Wang, S.; Tao, J.; Jiang, Q.; Chen, W.; Qin, C.; Liu, C. A digital twin framework for anomaly detection in industrial robot system based on multiple physics-informed hybrid convolutional autoencoder. J. Manuf. Syst. 2024, 77, 798–809.
Chennareddy, S.S.R.; Agrawal, A.; Karuppiah, A. Modular self-reconfigurable robotic systems: A survey on hardware architectures. J. Robot. 2017, 2017, 5013532.
Guo, J. Assembly of Slender Modules for Robotic Multi-Functionality and Adaptive Re-Configurability. Ph.D. Thesis, University of Illinois Urbana-Champaign, Champaign, IL, USA, 2025.
Vaquero, T.S.; Daddi, G.; Thakker, R.; Paton, M.; Jasour, A.; Strub, M.P.; Swan, R.M.; Royce, R.; Gildner, M.; Tosi, P.; et al. EELS: Autonomous snake-like robot with task and motion planning capabilities for ice world exploration. Sci. Robot. 2024, 9, eadh8332.
Oyekanlu, E.A.; Smith, A.C.; Thomas, W.P.; Mulroy, G.; Hitesh, D.; Ramsey, M.; Kuhn, D.J.; Mcghinnis, J.D.; Buonavita, S.C.; Looper, N.A.; et al. A review of recent advances in automated guided vehicle technologies: Integration challenges and research areas for 5G-based smart manufacturing applications. IEEE Access 2020, 8, 202312–202353.
Di Castro, M.; Ferre, M.; Masi, A. CERNTAURO: A modular architecture for robotic inspection and telemanipulation in harsh and semi-structured environments. IEEE Access 2018, 6, 37506–37522.
Guastella, D.C.; Muscato, G. Learning-based methods of perception and navigation for ground vehicles in unstructured environments: A review. Sensors 2020, 21, 73.
Gasparetto, A.; Boscariol, P.; Lanzutti, A.; Vidoni, R. Path planning and trajectory planning algorithms: A general overview. In Motion and Operation Planning of Robotic Systems: Background and Practical Approaches; Springer: Cham, Switzerland, 2015; pp. 3–27.
Yang, L.; Li, P.; Qian, S.; Quan, H.; Miao, J.; Liu, M.; Hu, Y.; Memetimin, E. Path planning technique for mobile robots: A review. Machines 2023, 11, 980.
Shu, Y.; Dong, L.; Liu, J.; Liu, C.; Wei, W. Overview of terrain traversability evaluation for autonomous robots. J. Field Robot. 2025, 42, 1724–1765.
Shi, H.; Shi, L.; Xu, M.; Hwang, K.S. End-to-end navigation strategy with deep reinforcement learning for mobile robots. IEEE Trans. Ind. Inform. 2019, 16, 2393–2402.
Pico, N.; Montero, E.; Vanegas, M.; Erazo Ayon, J.M.; Auh, E.; Shin, J.; Doh, M.; Park, S.H.; Moon, H. Integrating radar-based obstacle detection with deep reinforcement learning for robust autonomous navigation. Appl. Sci. 2024, 15, 295.
Montero, E.; Pico, N.; Ghergherehchi, M.; Choi, H. Collision-free robot navigation in confined and partially observable environments using spatial-memory deep reinforcement learning. Ain Shams Eng. J. 2026, 17, 103867.
Rubio, F.; Valero, F.; Llopis-Albert, C. A review of mobile robots: Concepts, methods, theoretical framework, and applications. Int. J. Adv. Robot. Syst. 2019, 16, 1729881419839596.
Cognominal, M.; Patronymic, K.; Wańkowicz, A. Evolving field of autonomous mobile robotics: Technological advances and applications. Fusion Multidiscip. Res. Int. J. 2021, 2, 189–200.
Hopko, S.; Wang, J.; Mehta, R. Human factors considerations and metrics in shared space human–robot collaboration: A systematic review. Front. Robot. AI 2022, 9, 799522.
Robla-Gómez, S.; Becerra, V.M.; Llata, J.R.; Gonzalez-Sarabia, E.; Torre-Ferrero, C.; Perez-Oria, J. Working together: A review on safe human-robot collaboration in industrial environments. IEEE Access 2017, 5, 26754–26773.
Ding, Z.; Ji, Y.; Gan, Y.; Wang, Y.; Xia, Y. Current status and trends of technology, methods, and applications of Human–Computer Intelligent Interaction (HCII): A bibliometric research. Multimed. Tools Appl. 2024, 83, 69111–69144.
Kosch, T.; Karolus, J.; Zagermann, J.; Reiterer, H.; Schmidt, A.; Woźniak, P.W. A survey on measuring cognitive workload in human-computer interaction. ACM Comput. Surv. 2023, 55, 283.
ISO/TS 15066:2016; Robots and Robotic Devices—Collaborative Robots. International Organization for Standardization: Geneva, Switzerland, 2016.
Ali, A.A.; Beigomi, B.; Zhu, Z.H. Development of 6DOF hardware-in-the-loop ground testbed for autonomous robotic space debris removal. Aerospace 2024, 11, 877.
Jiang, Z.; Otto, R.; Bing, Z.; Huang, K.; Knoll, A. Target tracking control of a wheel-less snake robot based on a supervised multi-layered SNN. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25 October 2020–24 January 2021; pp. 7124–7130.
Cornejo, J.; Weitzenfeld, A.; Baca, J.; García Cena, C.E. Aerospace Bionic Robotics: BEAM-D technical standard of Biomimetic Engineering design methodology applied to mechatronics systems. Biomimetics 2025, 10, 668.
Kolvenbach, H.; Arm, P.; Hampp, E.; Dietsche, A.; Bickel, V.; Sun, B.; Meyer, C.; Hutter, M. Traversing steep and granular martian analog slopes with a dynamic quadrupedal robot. Field Robot. 2022, 2, 910–939.
Cheng, Y.W.; Sun, P.C.; Chen, N.S. The essential applications of educational robot: Requirement analysis from the perspectives of experts, researchers and instructors. Comput. Educ. 2018, 126, 399–416.
Ortega-Gomez, J.I.; Morales-Hernandez, L.A.; Cruz-Albarran, I.A. A specialized database for autonomous vehicles based on the KITTI vision benchmark. Electronics 2023, 12, 3165.
Siau, K.; Rossi, M. Evaluation techniques for systems analysis and design modelling methods–a review and comparative analysis. Inf. Syst. J. 2011, 21, 249–268.
Izagirre, U.; Andonegui, I.; Eciolaza, L.; Zurutuza, U. Towards manufacturing robotics accuracy degradation assessment: A vision-based data-driven implementation. Robot. Comput.-Integr. Manuf. 2021, 67, 102029.
Wang, M.; Yang, A. Dynamic learning from adaptive neural control of robot manipulators with prescribed performance. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 2244–2255.
Seyyedi, A.; Bohlouli, M.; Oskoee, S.N. Machine learning and physics: A survey of integrated models. ACM Comput. Surv. 2023, 56, 115.
Liu, Q.; Li, J.; Lu, Z. ST-Tran: Spatial-temporal transformer for cellular traffic prediction. IEEE Commun. Lett. 2021, 25, 3325–3329.
Kang, J.; Fang, H.; Hao, Y. A closed-loop evaluation method for industrial robot performance driven by health data. IEEE/ASME Trans. Mechatron. 2022, 28, 726–736.
Deng, L.; Li, W.; Pan, Y. Data-efficient Gaussian process online learning for adaptive control of multi-DoF robotic arms. IFAC-PapersOnLine 2022, 55, 84–89.
Kumar, P.; Khalid, S.; Kim, H.S. Prognostics and health management of rotating machinery of industrial robot with deep learning applications—A review. Mathematics 2023, 11, 3008.
Aivaliotis, P.; Arkouli, Z.; Georgoulias, K.; Makris, S. Degradation curves integration in physics-based models: Towards the predictive maintenance of industrial robots. Robot. Comput.-Integr. Manuf. 2021, 71, 102177.
Ponikelský, J.; Chalupa, M.; Černohlávek, V.; Štěrba, J. Force and pressure dependent asymmetric workspace research of a collaborative robot and human. Symmetry 2024, 16, 131.
Khargonkar, N.; Allu, S.H.; Lu, Y.; Prabhakaran, B.; Xiang, Y. SceneReplica: Benchmarking real-world robot manipulation by creating replicable scenes. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 8258–8264.
Lerher, T.; Bencak, P.; Hercog, D.; Jerman, B.; Bizjak, L. Robotic Bin-Picking: Benchmarking Robotics Grippers with Modified YCB Object and Model Set. 2023. Available online: https://digitalcommons.georgiasouthern.edu/pmhr_2023/9 (accessed on 9 May 2026).
Chumuang, N.; Farooq, A.; Irfan, M.; Aziz, S.; Qureshi, M. Feature matching and deep learning models for attitude estimation on a micro-aerial vehicle. In Proceedings of the 2022 International Conference on Cybernetics and Innovations (ICCI), Virtual, 28 February–2 March 2022; pp. 1–6.
Liu, Z.; Liu, W.; Qin, Y.; Xiang, F.; Gou, M.; Xin, S.; Roa, M.A.; Calli, B.; Su, H.; Sun, Y.; et al. OCRTOC: A cloud-based competition and benchmark for robotic grasping and manipulation. IEEE Robot. Autom. Lett. 2021, 7, 486–493.
Agha, A.; Otsu, K.; Morrell, B.; Fan, D.D.; Thakker, R.; Santamaria-Navarro, A.; Kim, S.K.; Bouman, A.; Lei, X.; Edlund, J.; et al. NeBula: TEAM CoSTAR’s robotic autonomy solution that won phase II of DARPA subterranean challenge. Field Robot. 2022, 2, 1432–1506.
De Sousa, C.V.; Bagio, G.G.; Aidar, F.M.; Guimarães, K.H.; de Silva, M. RoboFEI@Home Team Description Paper for RoboCup@Home 2024: Eindhoven Edition. 2023. Available online: https://athome.robocup.org/wp-content/uploads/OPL-RoboFEI2024TDPEindhoven.pdf (accessed on 9 May 2026).
Ji, R.; Lu, C.; Zhong, J. Dynamic Differencing-Based Hybrid Network for Improved 3D Skeleton-Based Motion Prediction. AI 2024, 5, 2897–2913.
Mittal, M.; Yu, C.; Yu, Q.; Liu, J.; Rudin, N.; Hoeller, D.; Yuan, J.L.; Singh, R.; Guo, Y.; Mazhar, H.; et al. Orbit: A unified simulation framework for interactive robot learning environments. IEEE Robot. Autom. Lett. 2023, 8, 3740–3747.
Makoviychuk, V.; Wawrzyniak, L.; Guo, Y.; Lu, M.; Storey, K.; Macklin, M.; Hoeller, D.; Rudin, N.; Allshire, A.; Handa, A.; et al. Isaac Gym: High performance GPU-based physics simulation for robot learning. arXiv 2021, arXiv:2108.10470.
Syniawa, D.; Droste, L.; Kuhlenkötter, B. Semi-Automated Programming of Industrial Robotic Systems Using Large Language Models and Standardized Data Model. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6115435 (accessed on 9 May 2026).
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.

Figure 1. PRISMA 2020 flow diagram of the literature screening process. Asterisks (* and **) mark records excluded during the title and abstract screening phase.

Figure 2. The mapping knowledge domain of robotics research: (a) co-occurrence clusters of research topics, (b) country collaboration network, and (c) top keywords with the strongest citation bursts.

Figure 3. Number of published papers per year.

Figure 4. Closed-loop process for robot performance evaluation based on simulation and experimental verification.

Figure 5. Flowchart of the hybrid multi-criteria decision-making method: (a) hybrid Entropy-MOORA method for alternative ranking; (b) integrated framework for robot selection using CRITIC, COPRAS, ARAS, and TOPSIS; (c) hierarchical considerations for user adoption of AI robots in healthcare; (d) multi-stage evaluation process based on interval-valued Pythagorean fuzzy (IVPF) methods; (e) integrated BWM-EDAS decision-making method with sensitivity analysis; (f) systematic identification and assessment of key performance metrics (KPM) based on literature and fuzzy TOPSIS.

Figure 6. Evaluation framework for mobile robot path planning based on terrain simulation and dynamics model: (a) dynamic simulation models of different robot morphologies; (b) flowchart of the evaluation algorithm.

Figure 7. Data-driven robot performance evaluation framework: (a) traditional open-loop performance evaluation framework; (b) data-driven closed-loop performance evaluation framework.

Figure 8. Future challenges and driving factors in robot performance evaluation.

Table 1. Comparison of TESM with existing evaluation frameworks.

Framework	Core Focus	Environmental Context	Applicability to AI/Autonomy
Systems Engineering [28]	Design and lifecycle verification	Static (defined in requirements)	Limited (treated as standard software)
Standard Benchmarks (e.g., ISO) [27]	Kinematic accuracy and repeatability	Controlled and structured	Not applicable (focuses on deterministic execution)
Life Cycle Assessment (LCA) [29]	Sustainability and energy efficiency	Macroscopic (e.g., carbon footprint)	Indirect (measured via energy costs)
Proposed TESM	Coupled Task–System performance	Dynamic and unstructured	Explicitly evaluated (perception, planning, etc.)

Table 2. Robot performance index system.

Performance Category	Indicator Name	Indicator Definition	Example Application Scenarios
Kinematics and Dynamics	Absolute positioning accuracy	Deviation of the end effector from the commanded target position in global coordinates	Precision assembly and alignment operations [39,40]
	Positioning repeatability	Dispersion of the attained positions when the same target position is approached repeatedly	Repetitive handling and calibration scenarios [41]
	Tracking error	Difference between the actual and the desired trajectory	Trajectory control tasks [42,43]
	Maximum speed and acceleration	Maximum linear or angular velocity and acceleration of the end effector or body	High-speed transport and dynamic tasks [44]
	Effective workspace	Spatial extent and shape of the region the robot can reach	Production line layout assessment [45]
	Stiffness and compliance	Resistance to deformation (stiffness) or controlled yielding (compliance) under load	Processing and collaborative tasks [46,47]
	Energy efficiency	Energy consumed per unit task and energy utilization efficiency	Large-scale automated production [48,49]
Perception and Positioning	Perception accuracy (e.g., mAP, F1-score)	Accuracy of detection and classification by the sensor system	Visual and laser navigation tasks [50,51]
	Recall or detection rate	Fraction of true targets that the algorithm detects	Obstacle recognition and target detection [52]
	Localization error (e.g., ATE, RPE)	Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) with respect to ground truth	SLAM and navigation control [53,54]
	Robustness index	Performance retention under noise and occlusion	Dynamic and complex scenarios [55]
	Real-time performance	Latency and frame rate of the sensing and positioning modules	High-speed interaction and real-time control [56]
	Environmental adaptability	Performance stability under varying lighting and weather conditions	Outdoor or unstructured scenarios [57,58]
Human–robot interaction and safety	Maximum contact force	Peak force exerted on the human body during contact, compared against the permissible limit	Collaborative robot applications [59]
	Safety response time	Time from detection of an abnormal condition to a safe stop	Failure-risk scenarios [59,60,61]
	Collaboration efficiency	Efficiency with which humans and robots jointly complete tasks	Production collaboration or auxiliary operations [59,60,61]
	Workload and comfort	Subjective or physiological assessment of the load on the human operator	Human–robot collaboration or service robots [60,62]
Reliability and maintainability	Mean Time Between Failures (MTBF)	Average operating time between consecutive failures	Production lines operating around the clock [63,64]
	Availability	Fraction of time the system is in an operational state, accounting for downtime spent on repair	Automated system reliability assessment [63,64,65]
	Mean Time to Repair (MTTR)	Average time to restore the system to a usable state	Service and maintenance assessment [63,64]
	Risk Priority Number (RPN)	Risk score from failure mode and effects analysis (FMEA), computed as the product of severity, occurrence, and detection ratings	Maintenance strategy optimization [63,64]
Economy and lifecycle	Lifecycle cost	Total cost including purchase, installation, operation, and maintenance	Return-on-investment analysis [66]
	Return on investment (ROI)	Ratio of net lifecycle benefit to lifecycle cost	Scheme comparison and selection [66]
	Scalability	Ability of the system to be expanded or upgraded in the future	Robot clusters [66,67]
	Technology upgrade compatibility	Upgradeability of control platform hardware and software	Sustainable evolution and iteration [66]
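Several reliability and maintainability indicators in Table 2 reduce to simple functions of a failure log. The sketch below assumes a hypothetical log of operating hours before each failure and repair hours after it, and uses the common textbook relations MTBF = total operating time / number of failures, MTTR = total repair time / number of repairs, inherent availability A = MTBF / (MTBF + MTTR), and RPN = severity × occurrence × detection on 1 to 10 rating scales. The names `reliability_summary` and `FailureMode`, and all numbers, are illustrative assumptions rather than methods taken from the cited works.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet (ratings on 1-10 scales)."""
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Risk Priority Number = severity x occurrence x detection
        return self.severity * self.occurrence * self.detection


def reliability_summary(uptimes_h, repair_times_h):
    """Return (MTBF, MTTR, inherent availability) from a failure log.

    uptimes_h[i]: operating hours accumulated before failure i
    repair_times_h[i]: hours needed to restore service after failure i
    """
    if not uptimes_h or len(uptimes_h) != len(repair_times_h):
        raise ValueError("need one repair time per recorded failure")
    mtbf = sum(uptimes_h) / len(uptimes_h)        # total operating time / failures
    mttr = sum(repair_times_h) / len(repair_times_h)
    availability = mtbf / (mtbf + mttr)           # long-run fraction of time operational
    return mtbf, mttr, availability


# Hypothetical log with three failures
mtbf, mttr, avail = reliability_summary([400.0, 520.0, 480.0], [4.0, 6.0, 2.0])
print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.1f} h, availability = {avail:.4f}")

modes = [FailureMode("gearbox wear", 8, 4, 5), FailureMode("encoder dropout", 6, 3, 2)]
for mode in sorted(modes, key=lambda fm: fm.rpn, reverse=True):
    print(f"{mode.name}: RPN = {mode.rpn}")
```

Sorting failure modes by RPN is how the indicator typically feeds the "maintenance strategy optimization" scenario listed in the table; real studies would also weigh severity on its own, since very different rating combinations can yield the same RPN.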

Table 3. Summary of performance analysis dimensions and engineering applications based on mechanism models.

Analysis Dimensions	Theoretical Basis and Core Equations	Key Modeling Parameters and Error Sources	Key Evaluation Indicators	Engineering Applications and Decision Support
Kinematic analysis	Forward/inverse kinematic equations; Jacobian matrix; error propagation theory	Geometric parameters (link length, offset); joint zero-position deviation; assembly error	End-effector pose error (position and orientation); tracking error; repeatability	Precision allocation: identifying key geometric tolerances; error compensation: kinematic calibration and parameter correction [71,72]
Dynamic analysis	Lagrange equations; Newton–Euler method; friction model	Mass distribution and inertia tensor; drive system characteristics; external load conditions	Joint torque requirements; system response bandwidth; energy consumption	Selection and configuration: motor and gearbox selection; energy efficiency optimization: trajectory planning and lightweight design [73]
Statics and stiffness	Principle of virtual work; structural flexibility model; contact dynamics	Link and joint stiffness coefficients; material elastic modulus; contact surface characteristics	End-effector deformation; cutting or contact force stability; vibration modal frequencies	Stiffness enhancement: cross-section optimization and vibration reduction design; force/position control: deformation compensation and compliant force control [74]
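The Jacobian and error-propagation entries in Table 3 can be illustrated on a planar two-link (2R) arm: to first order, a small joint error δq maps to an end-effector position error δp ≈ J(q)·δq. The sketch below assumes a hypothetical arm with 0.5 m and 0.4 m links and a 0.01° zero-position offset on both joints, and compares the linearized estimate with the error obtained by re-evaluating the forward kinematics. It is plain Python for self-containment; a real calibration study would use the full 6-DOF model (e.g., DH parameters) over many poses.

```python
import math


def fk_planar_2r(q1, q2, l1, l2):
    """Forward kinematics of a planar 2R arm: end-effector position (x, y)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def jacobian_planar_2r(q1, q2, l1, l2):
    """Analytic 2x2 position Jacobian d(x, y) / d(q1, q2)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def propagate_joint_errors(q1, q2, l1, l2, dq1, dq2):
    """First-order end-effector position error: dp ~= J(q) dq."""
    J = jacobian_planar_2r(q1, q2, l1, l2)
    dx = J[0][0] * dq1 + J[0][1] * dq2
    dy = J[1][0] * dq1 + J[1][1] * dq2
    return dx, dy


# Hypothetical arm and error: 0.5 m / 0.4 m links, 0.01 deg offset on both joints
L1, L2 = 0.5, 0.4
q1, q2 = math.radians(30.0), math.radians(45.0)
dq = math.radians(0.01)

dx, dy = propagate_joint_errors(q1, q2, L1, L2, dq, dq)
x0, y0 = fk_planar_2r(q1, q2, L1, L2)
x1, y1 = fk_planar_2r(q1 + dq, q2 + dq, L1, L2)  # exact perturbed position
print(f"linearized error: {math.hypot(dx, dy) * 1e6:.1f} um")
print(f"exact error:      {math.hypot(x1 - x0, y1 - y0) * 1e6:.1f} um")
```

At this pose the two estimates agree to well under a micrometre, and the error is on the order of 0.2 mm, which illustrates why joint zero offsets dominate precision allocation and kinematic calibration.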

Table 4. Comparison and application of mainstream robot performance measurement systems.

Measurement System	Core Principles	Typical Accuracy Range	Applicable Indicators	Advantages
Laser tracker	Laser ranging and angle encoding	10–50 μm	Static and dynamic accuracy, trajectory accuracy	Large measurement range, high precision, and portability
Coordinate Measuring Machine (CMM)	Contact or non-contact probing	1–10 μm	Static repeatability and part geometric error	Very high precision; often serves as the reference (ground truth)
High-precision vision system	Multi-view and structured-light triangulation	0.1–1 mm	6D pose, vibration, and dynamic trajectory	Non-contact, high sampling rate, easy to integrate
Inertial Measurement Unit (IMU)	Integration of measured acceleration and angular velocity	No fixed bound; error accumulates through drift	Mobile robot navigation accuracy and body vibration	High real-time performance and compact size; suited to relative positioning
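Table 4 gives no fixed accuracy range for the IMU because position is obtained by integrating acceleration twice: a constant accelerometer bias b produces a velocity error b·t and a position error of roughly ½·b·t², so the error grows quadratically with time. The sketch below double-integrates an assumed residual bias of 0.01 m/s² with a semi-implicit Euler scheme. It deliberately ignores gyroscope errors, sensor noise, and gravity-compensation errors, which in practice usually make drift worse.

```python
def dead_reckon_position_error(bias_mps2, duration_s, dt_s=0.01):
    """Position error from double-integrating a constant accelerometer bias.

    The true motion is assumed to be zero, so everything integrated is error.
    Uses semi-implicit Euler integration with step dt_s.
    """
    velocity_err = 0.0
    position_err = 0.0
    for _ in range(round(duration_s / dt_s)):
        velocity_err += bias_mps2 * dt_s      # grows linearly in time
        position_err += velocity_err * dt_s   # grows quadratically in time
    return position_err


bias = 0.01  # hypothetical residual accelerometer bias, m/s^2
for t in (1.0, 10.0, 60.0):
    simulated = dead_reckon_position_error(bias, t)
    closed_form = 0.5 * bias * t ** 2
    print(f"t = {t:5.1f} s: simulated {simulated:7.3f} m, 0.5*b*t^2 = {closed_form:7.3f} m")
```

This quadratic growth is why IMUs are listed for relative positioning and are normally fused with absolute references (GPS, motion capture, or SLAM) in mobile-robot navigation tests.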

Table 5. Comparison of typical MCDM methods in robot performance evaluation.

MCDM Method	Category	Key Characteristics & Limitations	Typical Robotics Applications
AHP (Analytic Hierarchy Process)	Weighting (subjective)	Pros: captures expert knowledge through pairwise comparisons. Cons: prone to subjective bias; becomes unwieldy with many indicators.	Weighting human–robot collaboration safety criteria based on expert consensus
Entropy weighting	Weighting (objective)	Pros: derives weights directly from data dispersion, avoiding human bias. Cons: ignores the actual engineering importance of the metrics.	Evaluating kinematic metrics directly from physical test data
TOPSIS	Ranking	Pros: logically straightforward; ranks alternatives by their relative closeness to the ideal solution. Cons: sensitive to the normalization method.	Selecting the optimal industrial robot vendor among multiple candidates
VIKOR	Ranking	Pros: seeks a compromise that balances maximum group utility against minimum individual regret. Cons: computationally complex for dynamic data.	Comparing robot configurations in safety-critical tasks (e.g., field robots)
Hybrid (e.g., AHP-TOPSIS)	Comprehensive	Pros: combines subjective engineering preferences with objective data-based ranking. Cons: high implementation complexity.	Holistic lifecycle economics and system-level performance trade-offs
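To make the objective-weighting and ranking rows of Table 5 concrete, the sketch below chains entropy weighting with TOPSIS, one common hybrid. It assumes a hypothetical decision matrix of three candidate robots scored on repeatability (mm, lower is better), payload (kg, higher is better), and price (k$, lower is better); the data and function names are illustrative. Entropy weights are computed here from raw column proportions, which requires strictly positive values; some variants normalize the data first, which can change the weights.

```python
import math


def entropy_weights(matrix):
    """Objective criterion weights from data dispersion (entropy weight method).

    matrix[i][j] is the strictly positive score of alternative i on criterion j.
    Columns whose values are spread out (low entropy) receive larger weights.
    """
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("need at least two alternatives")
    divergence = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [x / total for x in col]
        e = -sum(pj * math.log(pj) for pj in p if pj > 0) / math.log(m)
        divergence.append(max(0.0, 1.0 - e))  # degree of diversification
    s = sum(divergence)
    if s == 0:
        return [1.0 / n] * n  # all criteria uninformative: fall back to equal weights
    return [d / s for d in divergence]


def topsis(matrix, weights, benefit):
    """Closeness of each alternative to the ideal solution (higher is better).

    benefit[j] is True when larger values of criterion j are preferred.
    """
    n = len(matrix[0])
    # Vector normalization of each column, then weighting
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the positive-ideal solution
        d_neg = math.dist(row, anti)   # distance to the negative-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical candidates: [repeatability (mm), payload (kg), price (k$)]
robots = {"R1": [0.02, 10.0, 30.0], "R2": [0.05, 20.0, 25.0], "R3": [0.03, 12.0, 40.0]}
names, data = list(robots), list(robots.values())
w = entropy_weights(data)
scores = topsis(data, w, benefit=[False, True, False])
for name, c in sorted(zip(names, scores), key=lambda t: t[1], reverse=True):
    print(f"{name}: closeness = {c:.3f}")
```

Replacing the vector normalization with, for example, min-max scaling can reorder closely ranked alternatives, which is the sensitivity noted for TOPSIS in Table 5; swapping `entropy_weights` for expert-derived AHP weights gives the subjective-objective hybrid in the last row.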

Table 6. Definition of core performance indicators and test trajectories for industrial robots based on the ISO 9283 standard.

Key Indicators	Test Path and Setup	Evaluation Criteria
Position accuracy; attitude (orientation) accuracy; trajectory (path) accuracy; positional repeatability; posture (pose) repeatability; trajectory repeatability; distance accuracy	Selected plane and measurement plane	Positioning accuracy and repeatability
	Poses to be used	Orientation accuracy and repeatability
	Definitions of planes for location of test path	Path accuracy and path repeatability for a command path
	Examples of test paths	Distance accuracy
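For the positioning accuracy and repeatability criteria in Table 6, ISO 9283 evaluates repeated approaches to the same commanded pose. As we understand the standard, position accuracy AP is the distance from the commanded position to the barycentre of the attained positions, and position repeatability is RP = l̄ + 3·S_l, where l_j is the distance of each attained position from that barycentre and S_l is the sample standard deviation of the l_j. The sketch below implements only these two position formulas (orientation accuracy and repeatability are treated analogously per rotation axis); the measurement data are hypothetical, and the standard itself should be consulted for test poses, cycle counts, and reporting rules.

```python
import math
from statistics import mean, stdev


def pose_accuracy_repeatability(attained, commanded):
    """Position accuracy AP and repeatability RP following the ISO 9283 formulas.

    attained:  (x, y, z) positions measured on repeated approaches to one
               commanded pose, e.g. with a laser tracker (same length unit).
    commanded: the commanded (x, y, z) position.
    """
    if len(attained) < 2:
        raise ValueError("need at least two measured approaches")
    xs, ys, zs = zip(*attained)
    barycentre = (mean(xs), mean(ys), mean(zs))
    ap = math.dist(barycentre, commanded)                # accuracy: offset of the mean
    dists = [math.dist(p, barycentre) for p in attained]
    rp = mean(dists) + 3.0 * stdev(dists)                # repeatability: l_bar + 3 * S_l
    return ap, rp


# Hypothetical laser-tracker readings (mm) for a commanded point (500, 0, 300)
readings = [(500.03, 0.01, 299.98), (500.02, -0.01, 299.99), (500.04, 0.00, 299.97),
            (500.03, 0.02, 299.98), (500.02, 0.00, 300.00)]
ap, rp = pose_accuracy_repeatability(readings, (500.0, 0.0, 300.0))
print(f"AP = {ap * 1000:.1f} um, RP = {rp * 1000:.1f} um")
```

Separating AP (a systematic offset that calibration can remove) from RP (scatter that it cannot) is the practical reason the two indicators are reported separately.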

Table 7. Evaluation index system for human–machine collaboration safety and efficiency.

Evaluation Dimensions	Primary Indicators	Example of Secondary Indicators	Measurement Methods
Safety	Collision risk	Peak contact force, clamping force	Impact force tester
Safety	Monitoring function	Stopping time, speed-limit response	Safety controller logs
Efficiency	Task performance	Collaborative cycle time, task success rate	Video analytics, stopwatch timing
User experience	Subjective perception	Trust, comfort, psychological burden	Questionnaires
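The peak-contact-force indicator in Table 7 is often linked to speed limits through an energy-transfer argument of the kind used in ISO/TS 15066 for transient contact. Treating the human body region as a linear spring of stiffness k and the collision as a two-body impact with reduced mass μ = (1/m_H + 1/m_R)⁻¹, the kinetic energy ½·μ·v² equals the stored spring energy F²/(2k) when the contact force reaches F, which gives v_max = F_max / √(μ·k). The sketch below evaluates this relation; the function names are hypothetical and the input values are placeholders, not the body-region limits tabulated in the specification.

```python
import math


def reduced_mass(m_human_kg, m_robot_kg):
    """Two-body reduced mass mu = (1/m_H + 1/m_R)^-1 in kg."""
    return 1.0 / (1.0 / m_human_kg + 1.0 / m_robot_kg)


def max_relative_speed(f_max_n, k_n_per_m, m_human_kg, m_robot_kg):
    """Relative speed at which a transient contact just reaches F_max.

    Equates kinetic energy 0.5 * mu * v^2 with the energy F^2 / (2k) stored in a
    body region modelled as a linear spring, giving v = F_max / sqrt(mu * k).
    """
    mu = reduced_mass(m_human_kg, m_robot_kg)
    return f_max_n / math.sqrt(mu * k_n_per_m)


# Placeholder inputs, not values taken from ISO/TS 15066
v = max_relative_speed(f_max_n=150.0, k_n_per_m=50_000.0, m_human_kg=1.5, m_robot_kg=8.0)
print(f"permissible relative speed ~ {v:.3f} m/s ({v * 1000:.0f} mm/s)")
```

Because v_max scales with 1/√μ, reducing the effective moving mass of the robot raises the permissible speed, which is one way the safety and efficiency rows of Table 7 trade off against each other.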

Table 8. Comparison matrix of performance evaluation features for cross-domain robotic systems.

Robot Type	Core Requirements	Key Performance Indicators	Main Modeling Methods	Typical Experimental Methods	Decision Focus
Industrial robotic arms	Accuracy, speed	Repeatability and cycle time	Kinematics and Dynamics	Laser tracker	Efficiency and yield
Mobile robots	Navigation, throughput	Tracking error, order throughput rate	Discrete-event simulation	Motion capture system/GPS positioning	Flexibility and scalability
Collaborative robots	Safety and inclusivity	Contact force, ergonomics	Contact dynamics	Collision force testing	Safety and ease of use
Specialized robots	Reliability, adaptability	Traversability, MTBF	Environment interaction model	Environmental simulation chamber	Task success rate

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wei, X.; Peng, S.; Zhao, B. Robot Performance Evaluation for Engineering Applications: A Systematic Review of Metrics, Methods and Practices. Technologies 2026, 14, 297. https://doi.org/10.3390/technologies14050297

