Article

A Method for Lunar Surface Autonomy Certification: Application to a Construction Pathfinder Mission

1 MDA Space Ltd., Brampton, ON L6Y 6K7, Canada
2 University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON M3H 5T6, Canada
3 Faculty of Applied Science and Engineering, University of Toronto, Toronto, ON M5S 1A1, Canada
4 Jet Propulsion Laboratory, California Institute of Technology, Cañada Flintridge, CA 91011, USA
5 NASA Johnson Space Center (JSC), Houston, TX 77058, USA
6 Bombardier Inc., Montreal, QC H4S 1Y9, Canada
* Author to whom correspondence should be addressed.
Aerospace 2025, 12(12), 1115; https://doi.org/10.3390/aerospace12121115
Submission received: 15 November 2025 / Revised: 9 December 2025 / Accepted: 11 December 2025 / Published: 18 December 2025
(This article belongs to the Special Issue Lunar Construction)

Abstract

Developing autonomous technologies will enable humanity to considerably expand its lunar and space exploration capabilities. Along with the technical challenges of developing autonomous technologies comes the issue of trust: stakeholders are often resistant to their use for a variety of psychological reasons. Nevertheless, several successful methods for gradually building trust have been developed for both terrestrial and space applications. Relevant case studies provide insights into how stakeholder trust is built for self-driving vehicles, Artificial Intelligence in aviation, space station operations, satellite rendezvous missions, and Mars rover surface operations. Based on these case studies, we propose a generalized method for building trust with stakeholders and apply it to a lunar construction pathfinder mission currently in development. Metrics for assessing success criteria for autonomous systems are provided as a means to progress through the proposed phases of autonomy deployment.

1. Introduction

A key goal of Artemis is to establish a base camp at the south pole of the Moon. Achieving a sustained presence on the lunar surface will require innovations across a myriad of disciplines, including geotechnical engineering, materials science for the extreme environment, and technologies for power systems, robotics, and sensors. While the need for most of these technologies is self-explanatory, the need for higher levels of autonomy is more subtle, as early lunar surface initiatives will employ minimal astronaut labour due to the high cost of human surface operations. While teleoperation will play a part in early infrastructure deployments, at scale it would require extremely high data volumes and rates between the Moon and Earth to ensure that human operators have sufficient situational awareness for safe operation.
As such, high levels of autonomy will be required for many of the systems deployed on the lunar surface. These assets will likely start off working in isolation, where the worst-case scenario of any off-nominal operations would lead to the loss of that vehicle or element, limiting the damage to the financial domain (i.e., loss of mission). As operations expand to include other vehicles or logistics modules (such as a habitation module), or, ultimately, astronauts, the failure of autonomous systems could lead to damage to a third party’s hardware and possible harm to astronauts within that environment.
Therefore, the question becomes how to ensure that the autonomous elements deployed on the lunar surface will meet the technical and operational safety demands—from both financial and human perspectives. The current work seeks to propose a method for autonomy certification, drawing from models used in both the terrestrial and space domains. These domains include aviation, self-driving cars, the International Space Station and Low Earth Orbit, and, finally, deep space exploration for Mars. This method will then be applied to a demonstration mission for lunar construction using Regolith Containment Units (RCUs) as a representative example of lunar autonomy.

1.1. Motivation for Autonomy

1.1.1. Definition of Autonomy

Autonomy is a major topic of interest across industries, from aeronautics to manufacturing and beyond. Each industry strives towards advancements in Artificial Intelligence (AI) and other technologies that will enable systems to operate independently of humans. NASA defines autonomy as “the ability of a system to achieve goals while operating independently of external control,” as stated in the 2015 NASA Technology Roadmap [1]. The Space Science and Technology Partnership, consisting of the Department of the Air Force (DAF) [2], NASA, and the National Reconnaissance Office (NRO), further states that an autonomous system must have “some level of decision-making authority” to achieve this. It is important to note that a system or subsystem can have various levels of autonomy, which can change over the duration of a mission. These range from no autonomy (with all operations conducted by human operators), to partial or supervised autonomy, and ultimately to full autonomy (requiring no human input to operate), as described in Section 1.2.
Full autonomy requires the (sub)system to have the ability [3] to (1) perceive and (2) analyze its state and environment, (3) make and plan decisions, and (4) execute those decisions, including with outcome anticipation, preparation [4], and anomaly detection and management capabilities [5]—without any input from humans. However, if full autonomy is not achieved, each of the four actions may exhibit different levels of autonomy within a given (sub)system, leading to partial autonomy. Partial autonomy can also be realized, for example, if operators can modify commands or take over control in the event of an emergency.

1.1.2. Operational Constraints: Why Is Autonomy Necessary?

As we scale our ambitions towards full-scale lunar habitats and, eventually, the Martian surface, autonomy for robotic systems will become essential for operation without direct human intervention. For early lunar operations, teleoperation will be employed, but autonomy will ultimately be necessary to scale operations and reduce operational costs in the pursuit of permanent infrastructure, such as habitation modules, landing pads, berms, power sources, and future endeavours like in-situ resource extraction (i.e., mining). Managing such a system without autonomy would require a very large team of human specialists, which would become prohibitively expensive if conducted on the lunar surface; as detailed below, it also presents inherent technical challenges if done remotely from Earth. As this infrastructure scales from the Moon to Mars, these challenges will become nearly insurmountable, making the case for the Moon as a proving ground for Mars even more evident.
Communication Constraints
As the operational distance from Earth increases, communication latencies lengthen and blackouts occur due to spacecraft orbital occultations. Time delays are not only costly: to resolve or avoid mission-critical anomalies, autonomy removes the need to wait for commands and telemetry to be up- and downlinked before acting. A reliance on teleoperation could have catastrophic consequences, such as when navigating a rover around a hazard or providing a medical response to a crew injury during deep space exploration. Although Low Earth Orbit (LEO) [6] enjoys near real-time communication (1–2 s latency), lunar missions will experience one-way delays of several seconds, while Mars missions [6] can experience delays of up to 22 min. Furthermore, depending on the selected mission Operations Concept, up to 21 days of communication blackout can occur during the transit of Mars missions [6]. Commands to Mars rovers are typically uplinked once a day to once every three days [7], a significant discrepancy from the real-time teleoperated Mobile Servicing System on the International Space Station (ISS). As an extreme example, a concept mission to land on Jupiter’s moon Europa anticipates 45 min one-way communication delays and blackouts every alternate 42 h during operation [8]. Latency and blackout periods thus increase with distance from Earth, and teleoperation quickly becomes an infeasible mode of command, requiring autonomous operations.
  • Limited Bandwidth/Data Rate—Beyond latency and blackout periods, as spacecraft travel further from Earth, weakening signal strength and increased interference reduce the achievable data rate, limiting the amount of data that can be up- or downlinked to command a system or return information; as with latency, this makes autonomy an essential part of deep space exploration. For reference, data rates for the ISS have recently been upgraded to 600 Mbps [9], while the DSN currently supports uplink data rates of up to 2 kbps and downlink data rates of up to 270 kbps [10]. As an example, the Perseverance rover [11] on Mars has a capacity of 2 Mbps on its ultra-high-frequency (UHF) antenna via an orbiting relay satellite around Mars, available for limited periods each day; between 160 and 600 bps on its X-band high-gain antenna for direct Earth-to-rover commands; and 10 bps on the X-band low-gain antenna. The Mars Reconnaissance Orbiter [12], a satellite orbiting Mars (a mission in its own right that also serves as a relay for various Mars landers and rovers), typically has an X-band data rate for primary communications between 500 bps and 4 Mbps at its farthest and closest distances from Earth, respectively. Meanwhile, the Voyager 1 and 2 deep space missions [13] can downlink in the S-band at 40 bps for health statuses and at up to 7.2 kbps in the X-band for science data, both via the high-gain antenna. These data rates depend on the RF capabilities at the time of development and the distance from Earth. Regardless, the constraints of limited bandwidth and data rate can also be overcome by autonomy: a system can make its own decisions without having to downlink data and subsequently uplink commands, reserving data transfer for the most valuable data and results.
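The constraints above are straightforward to quantify. The following back-of-envelope sketch shows how one-way light-time delay and downlink time scale with distance and data rate; the distances and the 5-megapixel image size are illustrative assumptions for this sketch, not mission-specific values.

```python
# Back-of-envelope deep-space communication figures: one-way light-time
# delay and the time to downlink a single image.
# Distances and image size are illustrative assumptions, not mission values.

C_KM_S = 299_792.458  # speed of light, km/s


def one_way_delay_s(distance_km: float) -> float:
    """One-way signal propagation time in seconds."""
    return distance_km / C_KM_S


def downlink_time_s(size_bits: float, rate_bps: float) -> float:
    """Time to transfer size_bits at a sustained rate of rate_bps."""
    return size_bits / rate_bps


# Mars at roughly its closest and farthest distances from Earth.
print(f"Moon:        {one_way_delay_s(384_400):.2f} s one-way")
print(f"Mars (close): {one_way_delay_s(54.6e6) / 60:.1f} min one-way")
print(f"Mars (far):   {one_way_delay_s(401e6) / 60:.1f} min one-way")

# An assumed 5-megapixel image at 8 bits/pixel over two of the Perseverance
# links cited above (2 Mbps UHF relay vs. 10 bps X-band low-gain antenna).
image_bits = 5e6 * 8
print(f"Image via UHF relay: {downlink_time_s(image_bits, 2e6):.0f} s")
print(f"Image via LGA:       {downlink_time_s(image_bits, 10) / 86_400:.0f} days")
```

The contrast between the two downlink paths (seconds versus weeks for the same image) is what makes on-board decision-making preferable to round-tripping raw data to Earth.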
Human Operator Constraints
Autonomy will not only be required for future space missions to enable efficient operations and reduce mission risk, but also to alleviate workloads for crews and ground operators. When crewed missions progress beyond the current LEO expeditions to the ISS and into deeper space, astronaut crews will have limited time and resources available to complete required operations, including science and maintenance, and autonomous systems will be needed so that crew time can be prioritized for essential tasks. For example, crewed missions may require up to 24 h of extravehicular activities per week on Mars, depending on mission design, leaving little cognitive capacity for extraneous activities that could instead be performed autonomously [14]. Autonomy would also reduce crew operator demands, including training demands: on the ISS today, for example, two operators are required for collocated teleoperation of the Space Station Remote Manipulator System (Canadarm2) from within the station, a large commitment of crew time. In fact, the Special Purpose Dexterous Manipulator (DEXTRE) [15] robot on the ISS was initially intended for collocated teleoperation by the crew, but complexity and training demands necessitated a shift to ground-only operation. Furthermore, the harsh environment of deep space exploration poses a large hazard to human life through isolation, radiation, gravity differences, sensorimotor skill impairment, vision changes, and other physical and mental challenges [6,16]. These factors require a shift towards greater autonomy to reduce training and operation demands and to reprioritize astronauts’ cognitive workload.
Autonomy of space systems will also be required as the quantity and scale of missions increase with the growing commercial market, with each mission otherwise requiring a large team of human specialists and more collaborating systems. Likely candidates include the Habitable Worlds Observatory [17,18], a planned space telescope with an anticipated diameter of more than 6 m, and solar power satellites currently in development that may require even larger on-orbit assembly, on the scale of kilometers. At these scales, the balance between relying solely on human operators and employing autonomy begins to tip towards the latter. Because of both communication and human limitations, as the space industry expands and explores further, autonomy will be an essential part of future space missions.

1.1.3. History of Autonomy in the Space Industry

Autonomy in the space industry has long been in development. It has been the focus of numerous studies and integral to many missions, including those on the Moon and Mars. However, much of space autonomy thus far has consisted of scripted autonomy (to execute a pre-written command independently) or other forms of partial autonomy. All missions have been supervised by a ground crew, with nearly all designed to permit human intervention. Despite this, autonomy has still proven necessary for the execution of critical phases of many missions, particularly in deep space.
In this section, a select few autonomous space missions in various space environments are examined, detailing the evolution of autonomy in the face of differing locational and operational requirements. In particular, the history of autonomy is considered in chronological order for missions intended for Earth orbit, lunar orbit or the lunar surface, Mars orbit or the Martian surface, and other deep space environments. Note that this list is not exhaustive and simply highlights core missions with some level of autonomy.
Intravehicular Systems (ISS)
Intravehicular systems onboard the ISS are unique in having collocated human-autonomy interaction, requiring rigorous safety validation. NASA’s Robonaut 2 (2011) [19,20] demonstrated supervised autonomy and teleoperation to assist with crew tasks, with further ground development for autonomous climbing, navigation, and obstacle avoidance. Astrobee free-flyers [21] perform interior inspection and maintenance using autonomous path planning, visual localization, and fault management. Both serve as testbeds for frameworks like ISAAC [21], targeting future missions such as the intermittently crewed Gateway lunar platform.
Extravehicular Autonomy in Earth Orbit
Autonomy in Earth orbit has been used in broader applications, particularly for servicing and formation flying. Japan’s ETS-VII (1997) [22] and DARPA’s Orbital Express (2007) [23] demonstrated autonomous rendezvous, docking, and robotic servicing using tools like ASPEN [24] for automated planning. ESA’s Automated Transfer Vehicle (ATV) [25] (2008–2014) operated with full autonomy across all mission phases, including docking and deorbiting, without teleoperation override except from the ISS crew [26]. More recently, the Starling CubeSat swarm (2023) [27] and Proba-3 formation-flying mission (2024) [28] showcased distributed autonomy, on-board task planning, and precise relative navigation without ground input.
Lunar Autonomy
Lunar missions have increasingly adopted autonomy for navigation and landing. China’s Chang’e-4 (2018) [29] used autonomous guidance and fault management for far-side lunar landing. India’s Chandrayaan-3 (2023) featured semi-autonomous navigation by the Pragyan rover using ground-generated maps and operator-approved routes [30]. Japan’s SLIM lander (2024) [31] used vision-based autonomous landing, even under anomaly conditions, with operator supervision. Its LEV-2 rover [32] operated fully autonomously—detaching, navigating, imaging, and transmitting data without any teleoperation [32].
Martian Autonomy
Due to high communication latency, Mars missions exhibit the most advanced autonomy. NASA’s Sojourner (1997) [33,34,35] began with waypoint navigation and hazard avoidance. Spirit and Opportunity (2004) introduced AutoNav [33], visual localization, and planning tools like MAPGEN [36,37], with Opportunity using AEGIS [7] for autonomous science targeting. Curiosity (2012) expanded on these capabilities, with AEGIS achieving > 93% success in target selection [7]. Perseverance (2020) [7] further improved AutoNav (used for 88% of first-year roving [7]), Terrain Relative Navigation (TRN) for landing, and the Rover Collision Model (RCM) [38] for arm and terrain safety. Other missions like Phoenix (2008) [39] and the Ingenuity helicopter [40,41,42] integrated autonomy for flight control and vision-based navigation. Ingenuity’s final flights included OUTLAST [43], testing autonomous health assessment and localization. Orbital missions like MAVEN [44] and Hope [45] employed limited autonomy primarily during orbit insertion.
Deep Space Autonomy
Missions beyond Mars have demonstrated targeted autonomous capabilities. Deep Space 1 (1998) pioneered autonomous navigation and constraint-based AI planning [36,46,47]. JAXA’s Hayabusa2 (2014) used autonomous descent and landing on asteroid Ryugu [48], as did NASA’s OSIRIS-REx with Natural Feature Tracking [49]. Europa Clipper (2024) used autonomy for post-launch configuration [50], with upcoming autonomous Jupiter Orbit Insertion [51] and FDIR [52].

1.2. Classifications for Autonomy

1.2.1. Definition of Trust

Trust is required to support the development of autonomous operations. There must be a belief that a system can mitigate errors and correct itself during anomalous circumstances. It is especially important in the context of space exploration, where technology is costly to develop and deploy, and the lack of additional resources or readily available support, given the long distance and latency, puts human lives at higher risk if failure occurs.
Balancing mission goals with inherent risks calls for trust in a system that can meet viable performance expectations in uncertain situations. As defined in the Space Trusted Autonomy Readiness Levels (STAR-L) publication by Hobbs et al., “trust is a fundamental social process wherein a trustor evaluates the trustworthiness of a referent and makes a decision (or series of decisions) related to how willing [they are] to be vulnerable to that referent” [53]. A significant cause of reluctance to adopt autonomous systems is a lack of motivation to trust them within largely uncertain environments, exacerbated by the risk of human injury on crewed missions. A lack of trust leads to longer development times, a high emphasis on Safety and Mission Assurance (S&MA) procedures, and a decrease in reliance behaviour, which is critical for space missions.
One way of increasing trust in a system is through quantitative classification levels and metrics. By clearly defining the capabilities of a system and ensuring that the appropriate classifications are obtained at different stages of design reviews, human trust increases as behaviours can be compared to expectations, and it can be seen that they have been tested against a set standard. There are multiple motivations for certification within the space exploration sector that promote trust, reliability, and the development of autonomous systems.

1.2.2. Industry Classification Systems

Across domains such as aviation, automotive, and robotics, different classification frameworks have been established to measure autonomy. Each reflects the usage of products in its sector, but the underlying goal is consistent: to provide a common language for describing what level of autonomy is achievable given the available technological capability.
A high-level summary of several classification systems is provided as they pertain to the case studies presented in Section 2. The focus is not to restate their full definitions but to provide a base level of reference for their categorization levels. Each established industry framework is mapped against the STAR-L Trust Readiness Levels (TrRLs), which serve as our central reference for assessing trust in the space autonomy industry. This comparison was made by identifying commonalities between the two systems so as to align them as closely as possible. While not an exact match, analogous elements provide useful comparisons between systems. This translation is meant to accentuate the incorporation of trust and system-to-operator involvement in existing frameworks.
Space Trusted Autonomy Readiness Levels (STAR-L)
The STAR-L framework proposed by Hobbs et al. (2023) is an extension of the traditionally used Technology Readiness Levels (TRLs), tailored to the deployment of autonomous systems in space missions [53]. STAR-L is structured as a two-dimensional evaluation of technological maturity and the degree of trust it instills in stakeholders.
The framework introduces two key axes, Technology Readiness and Trust, each ranging from Level 1 to Level 9 (see Figure 1) [53]. The vertical axis expands on the TRLs, measuring how mature an autonomous system is in performing its intended functions within lab, simulated, and operational environments. In contrast, the horizontal axis is unique to STAR-L, providing a quantifiable measure of the confidence that users, developers, operators, and other stakeholders can reasonably place in the system to perform its tasks. The TrRLs are thus designed to characterize the relationship between humans and autonomous systems. This trust is founded in how rigorously the system has been scoped, tested, verified, and validated in all relevant environments under realistic conditions of uncertainty. Definitions for the nine TrRL levels as described by Hobbs et al. are listed in Appendix A.
The resulting two-dimensional grid in Figure 1 emphasizes the need to balance functionality with assurance. If either axis develops more quickly than the other, the system risks falling towards the corners of overconfidence or underconfidence. In space applications, this imbalance can lead to undesirable outcomes. For example, overconfidence may occur when low-TRL technologies are prematurely delegated to mission-critical tasks, increasing the risk of failure. Conversely, underconfidence may arise when more mature autonomous systems are sidelined because stakeholders remain unwilling to accept the risks of development and deployment, leading primarily to an inefficient use of resources rather than direct mission hazards.
By progressing in a balanced manner, developers ensure that verification processes are executed at each stage of prototyping. However, in contrast to traditional systems, autonomous systems exhibit context-dependent responses that require careful behavioural analysis under uncertain conditions. This level of verification is necessary to build a complete understanding of the system’s functionality, limitations, and potential unanticipated behaviours in new environments. This steady pairing of technical demonstrations with assurance activities such as metric-based simulated testing fosters continuous trust and transparency in the system’s functionality. A balanced trajectory also provides ample opportunities for mission stakeholders to verify the system across the development stages, building familiarity and confidence in the system’s operations. In doing so, they become more comfortable engaging with the increased risks that naturally emerge as the autonomous technology advances, without being pressured into premature or underinformed reliance, nor warranting excessive hesitation.
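As an illustration of the two-axis idea, the sketch below encodes a (TRL, TrRL) pair and flags the overconfidence and underconfidence corners discussed above. The numeric gap threshold and the function name are our own assumptions for illustration; STAR-L itself does not prescribe a numeric imbalance rule.

```python
# Illustrative sketch of the STAR-L two-axis grid: classify a system's
# position on the Technology Readiness (TRL) vs. Trust Readiness (TrRL)
# plane. The max_gap threshold is an assumption made for this sketch,
# not part of the STAR-L framework.

def star_l_assessment(trl: int, trrl: int, max_gap: int = 2) -> str:
    """Flag over-/underconfidence when one axis outpaces the other."""
    if not (1 <= trl <= 9 and 1 <= trrl <= 9):
        raise ValueError("Both levels must be in the range 1-9")
    if trrl - trl > max_gap:
        # Trust has run ahead of demonstrated maturity: the
        # "overconfidence" corner of the grid.
        return "overconfidence: trust has outpaced technical maturity"
    if trl - trrl > max_gap:
        # Mature technology that stakeholders will not rely on: the
        # "underconfidence" corner.
        return "underconfidence: mature technology remains under-trusted"
    return "balanced: advance both axes together"


print(star_l_assessment(trl=3, trrl=7))  # immature tech, high trust
print(star_l_assessment(trl=8, trrl=4))  # mature tech, low trust
print(star_l_assessment(trl=6, trrl=6))
```

A rule of this shape could gate design reviews: a system is only allowed to take on higher-risk tasks while it remains in the balanced band.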
SAE J3016 Levels of Driving Automation
The Society of Automotive Engineers (SAE) International Standard J3016 [54] defines six levels of driving automation for motor vehicles on public roadways [55]. It is a widely accepted classification standard that describes the characteristics and features of the various autonomy levels within the autonomous vehicle industry. The framework is broadly divided into two categories, driver support features (Levels 0–2) and automated driving features (Levels 3–5), as shown in Figure 2. At the lower levels, the driver remains responsible for operating the vehicle while the system provides automated assistance, such as steering and speed control, individually or in combination. At the higher levels, the system holds more operational control than the user, adapting to driving conditions and increasing in autonomous capability until human intervention is no longer required, reflecting trust in system functionality. This division provides a clear path for consumers, developers, and industry regulators to understand the system’s expected technical capabilities as responsibility shifts from the human to the system. A mapping to TrRLs is summarized in Table 1.
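The two-category division can be captured compactly. The helper below is a hypothetical sketch of the responsibility split described above; the function and constant names are ours, not part of SAE J3016.

```python
# Compact encoding of the SAE J3016 division: Levels 0-2 are driver
# support (human responsible for the driving task), Levels 3-5 are
# automated driving (system holds operational control within its
# operational design domain). Names here are illustrative, not from
# the standard itself.

SAE_DRIVER_SUPPORT = range(0, 3)     # Levels 0, 1, 2
SAE_AUTOMATED_DRIVING = range(3, 6)  # Levels 3, 4, 5


def responsible_party(level: int) -> str:
    """Who performs the dynamic driving task at a given SAE level."""
    if level in SAE_DRIVER_SUPPORT:
        return "human driver"
    if level in SAE_AUTOMATED_DRIVING:
        return "driving automation system"
    raise ValueError("SAE J3016 defines levels 0-5 only")


print(responsible_party(2))  # → human driver
print(responsible_party(4))  # → driving automation system
```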
European Union Aviation Safety Agency (EASA) Classification of AI Applications
The European Union Aviation Safety Agency (EASA) AI Roadmap defines AI levels to guide the safe integration of AI in aviation, introducing a three-tiered classification scheme as presented in Figure 3. Each of these levels is broken down into two sub-levels, which gradually shift more accountability from the human to the AI system. Level 1 systems assist operators: “decisions are taken by the end user based on support of the AI-based system, and all actions are implemented by the end user” [56]. Level 2 systems encourage cooperation between the AI and the user, where the AI may provide information or propose appropriate actions; although decisions can be made by either party, the user remains accountable, must maintain constant supervision, and can override the AI’s actions at any moment. Lastly, the shift to Level 3 increases the authority of the AI from acting within human-defined boundaries to executing tasks and making decisions entirely on its own. An estimated mapping to TrRLs is shown in Table 2.
Levels of Automation (LoA)
Badger, Hollaway, and Taylor (2020) provide an adaptation of the Levels of Automation (LoA) framework for deep space vehicles applied to human spaceflight, defining six levels (0–5) across four performance categories: perception, information analysis, decision-making, and task execution, as shown in Figure 4 [3]. At the lower levels (0–1), the human maintains complete control over operations, requesting limited autonomous assistance if needed. The intermediate levels (2–3) provide partial autonomy, where the system can begin to analyze information and propose suggestions for human review. The higher levels (4–5) involve systems capable of independent decision-making, gradually reducing human operator participation until full autonomy is achieved. This structure not only defines the progression of autonomy through the six stages but also describes its distribution across each of the four categories of human-to-system collaboration. Within the context of space missions, the LoA classifications are designed to account for the human operator’s ability “to maintain enough situational awareness to correctly respond to a fault or to intervene if the automation fails and prevent the system from entering a compromised or degraded state as a result of automation error” [3]. This provides a practical framework for evaluating autonomous control in high-risk environments, including human spaceflight.
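A minimal sketch of this per-category structure follows, assuming a profile type of our own design. The summary rule, treating the lowest category level as the limiting factor on end-to-end autonomy, is an illustrative assumption of this sketch, not a rule stated in [3].

```python
# Sketch of the per-category LoA idea: a (sub)system is rated 0-5
# separately in each of the four performance categories from [3].
# The profile type and the min-based summary rule are illustrative
# assumptions, not part of the published framework.
from dataclasses import dataclass


@dataclass
class LoAProfile:
    perception: int
    information_analysis: int
    decision_making: int
    task_execution: int

    def limiting_level(self) -> int:
        """The lowest category level bounds overall autonomous operation."""
        return min(self.perception, self.information_analysis,
                   self.decision_making, self.task_execution)


# A hypothetical rover that perceives and analyzes well but still
# defers decisions to human operators:
rover = LoAProfile(perception=4, information_analysis=4,
                   decision_making=2, task_execution=3)
print(rover.limiting_level())  # → 2
```

Representing the four categories separately makes explicit what the framework emphasizes: a system can be highly autonomous in perception yet remain operator-dependent in decision-making.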

1.2.3. Autonomy Trust Versus Technological Readiness

While technological readiness provides greater reliability of the physical system through ever-increasing test rigor, this paper’s interpretation of the classification systems listed above relates much more to the trust of stakeholders. The STAR-L classification system is a matrix of both, emphasizing the need for human–machine engagement (trust) alongside machine–environment engagement (technological readiness). As will be discussed later in this work, understanding how the technology performs (and possibly fails) under a wide range of conditions is a key aspect of how the system is ultimately architected and operated.

2. Real-World Examples of How High Levels of Autonomy Are Certified

Building trust in autonomous systems has taken many forms, both terrestrial and aerospace. In this section, several real-world examples of how high levels of autonomy are viewed by stakeholders are explored, along with how these projects and missions have been certified as trusted systems. This covers several aspects of the classification systems above, including human safety considerations, hesitancy within a given industry to adopt higher levels of autonomy, and how these hurdles are overcome. Each of these case studies has differing goals for the ultimate level of autonomy achieved; for example, there are no current plans within aviation to achieve fully autonomous aircraft. Nevertheless, each demonstrates a level of autonomy that has been successfully employed in the design and deployment of a system that has ultimately gained the trust of its designers, operators, users, and, in some cases, the general public.

2.1. Case Study: Safety Assessment of the Waymo Autonomous Driving System

2.1.1. Introduction and Background

Automated Driving Systems (ADS) have progressed from experimental prototypes to commercially deployed Level 4 vehicles capable of operating without human supervision within defined Operational Design Domains (ODDs). Unlike Level 2 driver assistance features, which require continuous driver monitoring, Level 4 ADS can independently handle the full dynamic driving task under specified conditions. The motivations for pursuing this technology are predominantly safety and economic efficiency.
From a safety standpoint, human error accounts for the vast majority of serious road collisions—estimated at 94% in the United States according to NHTSA (2015) [57]. Eliminating the risks associated with distraction, impairment, and reckless driving could dramatically reduce fatalities and injuries. Economically, autonomous ride-hailing fleets offer the potential to lower per-mile operating costs, increase vehicle utilization, reduce insurance claims, and ensure round-the-clock availability. Nevertheless, Level 4 deployment presents significant challenges, including geographic and environmental limitations, high capital expenditure, and the necessity of earning public trust by demonstrating safety performance that clearly exceeds that of human drivers.
A fundamental difference exists between safety certification in autonomous driving and in the more established industries described above, such as aviation and ISS operations: the range of environmental and scenario variation is too vast to be analyzed exhaustively. Unlike in aviation and current space autonomy operations, it is not possible to complete a full Failure Modes and Effects Analysis (FMEA) of an autonomous vehicle system, as the driving conditions, environments, and dynamic agent behaviours that can affect the vehicle are too numerous. Instead, a multi-pronged approach to developing a safety case for autonomy is required; as such, this case study of industry leader Waymo’s approach to safety can help inform future space autonomy missions, where the complexity and variability of scenarios continue to expand.

2.1.2. Current State of the Art in Safety Demonstration

To gain public trust in their autonomy systems, Waymo has developed one of the most advanced safety assessment frameworks for ADS in existence today, grounded in a formal safety case methodology [58]. The safety case approach structures a logical argument, supported by evidence, that the system achieves an acceptable safety level for deployment within its ODD.
The company’s Verification and Validation (V&V) process integrates several layers of testing and evaluation. Billions of simulated miles are executed annually to evaluate system performance across a vast array of traffic scenarios, including rare but hazardous events that cannot be easily tested on public roads [59]. Closed-course track testing enables detailed evaluation of perception, prediction, and planning subsystems under controlled yet complex driving conditions. On-road testing with human safety drivers allows for direct observation and assessment of the ADS in its intended operational environments prior to rider-only deployment. Finally, each software release undergoes regression testing, combining simulation and track-based evaluations, to ensure performance consistency and prevent the reintroduction of known issues.

2.1.3. Simulation and Closed-Course Testing

Simulation forms the foundation of Waymo’s safety case. The company operates a large-scale virtual environment known as Carcraft in which digital replicas of real-world road networks, vehicles, pedestrians, cyclists, and environmental conditions are created. This environment enables the ADS to be tested against billions of miles of scenarios that include rare, dangerous, or adversarial situations—such as a jaywalking pedestrian stepping out from behind a parked vehicle, or an oncoming driver executing a sudden lane incursion—that would be unsafe to stage repeatedly in live testing [60]. Simulation also allows for sensitivity analyses, where perception errors or timing delays can be introduced artificially to measure the system’s robustness under degraded conditions.
Complementing simulation, Waymo employs closed-course testing at dedicated facilities designed to replicate complex urban environments. These courses incorporate intersections, crosswalks, roundabouts, and occlusion-inducing obstacles. Human-driven vehicles, mannequins, and remotely controlled robotic actors simulate interactions with vulnerable road users and aggressive or inattentive drivers. Specific maneuvers tested include emergency braking in response to sudden obstacles, evasive swerves to avoid cross-traffic, and precise yielding behavior at pedestrian crossings. Sensor and actuator performance are validated under varying lighting, weather, and road surface conditions. Importantly, closed-course testing also serves as the final validation step for new software releases before they are introduced to the public fleet, thereby reducing the risk of unintended regressions in real-world deployments.
Together, simulation and closed-course testing provide a dual-layered foundation for Waymo’s safety case: simulation delivers breadth and statistical diversity, while closed-course evaluations provide depth and controlled repeatability for high-risk interactions.

2.1.4. Evaluating Completeness of the Safety Case

A central challenge in ADS safety assurance is determining whether simulation and testing provide a sufficiently complete basis to claim safe deployment. Waymo addresses this issue through a combination of scenario design, statistical reasoning, and operational feedback.
Scenario selection for simulation is grounded in large-scale real-world data collection from both ADS-driven and human-driven fleets. Hazardous or unusual interactions—such as unprotected left turns, aggressive cut-ins, and vulnerable road user encounters—are extracted and abstracted into scenario templates. These templates are then parameterized to enable broad exploration of conditions (e.g., speeds, distances, occlusions), effectively stress-testing system performance at the boundaries of safe operation. Simulation thus provides statistical breadth, allowing rare events to be tested at scale.
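The template-and-parameterization idea above can be sketched in a few lines. The scenario class, parameter names, and values below are hypothetical illustrations of the approach, not Waymo's actual tooling; they show how one abstracted hazardous interaction (an aggressive cut-in) expands into a grid of concrete test cases.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class CutInScenario:
    """One concrete instance of a hypothetical 'aggressive cut-in' template."""
    ego_speed_mps: float   # speed of the autonomous vehicle
    gap_m: float           # longitudinal gap at the start of the cut-in
    occluded: bool         # whether the cutting-in vehicle starts occluded

def expand_template(speeds, gaps, occlusions):
    """Expand a scenario template into the cross-product of its parameters."""
    return [CutInScenario(s, g, o) for s, g, o in product(speeds, gaps, occlusions)]

scenarios = expand_template(
    speeds=[10.0, 15.0, 20.0],   # m/s
    gaps=[5.0, 10.0, 20.0],      # metres
    occlusions=[False, True],
)
# 3 speeds x 3 gaps x 2 occlusion states = 18 concrete test scenarios
```

Sweeping such parameter grids toward their extremes (small gaps, heavy occlusion) is what stress-tests performance at the boundaries of safe operation.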
Closed-course testing complements this by focusing on safety-critical “must-pass” scenarios. These are identified through structured hazard analyses, such as Failure Mode and Effects Analysis (FMEA) and System-Theoretic Process Analysis (STPA), which highlight potential failure points in perception, prediction, and planning. Test environments incorporate robotic actors, pedestrian dummies, and occlusion obstacles to replicate situations with the greatest safety impact if mishandled. Each new software release must pass this curated battery, ensuring consistency across updates.
Completeness is treated probabilistically, not absolutely. It is impossible to enumerate or test every conceivable driving scenario, given the effectively infinite combinations of agents, road geometries, and environmental factors. Instead, Waymo argues sufficiency by (1) systematically covering high-risk classes of scenarios, (2) ensuring diversity and volume through simulation, and (3) validating safety empirically by demonstrating crash rate reductions relative to human drivers. Novel real-world events that escape prior modeling are captured through continuous fleet monitoring, root-cause analysis, and rapid integration into simulation libraries and regression test suites.
This iterative process reflects the broader consensus in safety engineering: that a safety case must demonstrate reasonable completeness by systematically addressing foreseeable hazards and empirically bounding residual risks, rather than by claiming exhaustive coverage [58,61].

2.1.5. Constructing the Human Benchmark

Although the goal of autonomous driving is to improve the safety of the driving task, it is unlikely that a perfect record of zero accidents is achievable. Instead, the standard Waymo seeks to achieve is to exceed the performance of an ideally safe human driver, one free of the effects of distraction and impairment. To evaluate ADS performance in a statistically meaningful way, Waymo constructs a geographically matched human driver benchmark against which crash rates can be compared [60]. This process begins with the collection of police-reported crash data and information on vehicle miles traveled (VMT) from the same regions in which the ADS operates. The dataset is filtered to include only passenger vehicle crashes occurring on roadway types and speed limits comparable to those in the ADS’s ODD, and where the driver was in full control of their faculties.
Recognizing that police-reported data underrepresents the total number of crashes—particularly those involving minor injuries—Waymo applies a correction factor of 32%, based on NHTSA crash cost studies [62], to adjust for underreporting of injury crashes. In addition, the data are spatially re-weighted to match the ODD’s exposure profile, ensuring that comparisons are made between equivalent driving environments. The final benchmark metrics are expressed as incidents per million miles (IPMM) across several severity tiers, including “Any-Injury-Reported,” “Airbag Deployment,” and “Suspected Serious Injury+.”
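The core of the benchmark calculation can be sketched as follows. This is a simplified illustration assuming the 32% correction inflates police-reported injury crash counts; the spatial re-weighting step and severity tiering described above are omitted, and the crash and mileage numbers are invented for the example.

```python
def incidents_per_million_miles(reported_crashes: float,
                                vmt_miles: float,
                                underreporting_correction: float = 0.32) -> float:
    """Benchmark crash rate in incidents per million miles (IPMM).

    Assumes `underreporting_correction` inflates the police-reported injury
    crash count (a 32% adjustment, per the NHTSA-derived figure cited in the
    text). The actual benchmark pipeline is more elaborate.
    """
    corrected = reported_crashes * (1.0 + underreporting_correction)
    return corrected / (vmt_miles / 1_000_000.0)

# Illustrative numbers only: 500 reported injury crashes over 200 million VMT
rate = incidents_per_million_miles(500, 200_000_000)
# corrected count = 500 * 1.32 = 660 crashes over 200 million miles
```

Expressing both ADS and human performance in the same IPMM units, over the same exposure profile, is what makes the subsequent crash-rate comparisons meaningful.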

2.1.6. Safety Performance Results

As of January 2025, Waymo reports a total of 56.7 million rider-only miles driven within its ODD, concentrated primarily in Phoenix, San Francisco, and Los Angeles. Analysis of crash rates across the defined severity tiers reveals substantial reductions compared to the human driver benchmark. The results are summarized in Table 4 below.
These reductions are both statistically significant and robust to variations in the human benchmark construction methodology [60,63]. The magnitude of improvement is consistent across severity tiers, with the greatest relative reductions observed in the most severe crash categories.
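Statistical significance claims of this kind rest on comparing event rates over matched mileage. A minimal sketch of such a comparison, using a standard log-rate-ratio normal approximation and entirely hypothetical event counts (the cited analyses [60,63] use more careful methodology):

```python
import math

def rate_ratio_ci(ads_events: int, ads_miles: float,
                  hum_events: int, hum_miles: float, z: float = 1.96):
    """Crash-rate ratio (ADS / human) with an approximate 95% CI.

    Uses the usual normal approximation on the log rate ratio,
    SE = sqrt(1/ads_events + 1/hum_events). Illustrative sketch only.
    """
    rr = (ads_events / ads_miles) / (hum_events / hum_miles)
    se = math.sqrt(1.0 / ads_events + 1.0 / hum_events)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Hypothetical counts: 20 ADS events vs. 250 benchmark human events,
# each over the same 56.7 million miles of matched exposure
rr, lo, hi = rate_ratio_ci(20, 56.7e6, 250, 56.7e6)
# An upper confidence bound below 1 indicates a significant reduction
```

When the entire confidence interval lies below 1, the reduction relative to the human benchmark is statistically significant at the chosen level.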

2.1.7. Regulatory Compliance

The regulatory environment for autonomous driving in the United States is built on a combination of federal safety standards, voluntary guidance, and industry best practices, rather than prescriptive pre-market certification. At the federal level, Waymo vehicles must comply with the Federal Motor Vehicle Safety Standards (FMVSS) for crashworthiness, occupant protection, braking, and lighting. Because Waymo’s current platforms retain conventional driver controls, they remain largely FMVSS-compliant, but future purpose-built vehicles may require exemptions or alternative compliance pathways. Oversight by the National Highway Traffic Safety Administration (NHTSA) is primarily post-market, relying on mandatory crash reporting under the Standing General Order 2021-01 and recall authority for unsafe defects, including software-based hazards.
In parallel, NHTSA has published a series of voluntary guidance frameworks—Automated Driving Systems 2.0: A Vision for Safety, Automated Vehicles 3.0, and Automated Vehicles 4.0—that encourage transparency in areas such as safety case construction, cybersecurity, and human-machine interaction. While not binding, these frameworks shape industry reporting practices, and Waymo has aligned its public safety hub and safety case disclosures with their recommendations. State-level regimes, such as California’s DMV permit system and Arizona’s executive orders, impose additional requirements for disengagement reporting, proof of insurance, and safety documentation.
Beyond legal obligations, Waymo incorporates a range of industry safety standards that, while not mandatory, provide structured methodologies for hazard analysis and risk reduction. These include ISO 26262 [64] for functional safety, ISO/PAS 21448 (SOTIF) [65] for safety of intended functionality, and UL 4600, which formalizes safety case frameworks for autonomous systems. SAE J3016 [54], while not a safety standard per se, provides the common taxonomy for describing levels of driving automation and is central to regulatory classification. Waymo’s public statements indicate alignment with these standards as a means of bolstering the credibility and completeness of its safety case.
Finally, international considerations are becoming increasingly important as markets mature. The European Union, through the UNECE framework, has adopted binding regulations such as UN R157 (Automated Lane Keeping Systems) and cybersecurity/OTA update rules (UN R155, R156). These create a more prescriptive pre-market approval model compared to the U.S. self-certification regime. While Waymo’s deployments remain U.S.-based, international expansion would require adaptation of its safety case to conform to type-approval systems rather than post-market oversight.

2.1.8. Discussion and Future Directions

The available evidence suggests that Waymo’s ADS achieves a statistically significant safety advantage over human drivers when operating within its current ODD. Nonetheless, several technical and statistical challenges remain. Expanding the ODD to include more complex road types, adverse weather conditions, and greater interactions with vulnerable road users will require advances in perception, prediction, and decision-making capabilities. The handling of rare, high-severity events remains a key research focus, particularly as these events are infrequent enough to require extremely large exposure mileage for robust statistical estimation [61].
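The exposure-mileage problem for rare events can be made concrete with the classical "rule of three": if zero events of a given severity are observed over N miles, an approximate 95% upper confidence bound on the underlying rate is 3/N. The sketch below applies this to the mileage figure reported above, purely to illustrate the scale of exposure required.

```python
def rule_of_three_upper_bound(miles_without_event: float) -> float:
    """Approximate 95% upper confidence bound on an event rate when zero
    events have been observed ('rule of three'): rate <= 3 / exposure.
    Returned in events per million miles."""
    return 3.0 / (miles_without_event / 1_000_000.0)

# With 56.7 million event-free miles, the bound is roughly 0.053 events
# per million miles; demonstrating rates an order of magnitude lower
# would require roughly ten times the exposure.
bound = rule_of_three_upper_bound(56.7e6)
```

This is why surrogate measures and simulation-augmented evidence, as suggested below, are attractive: purely empirical bounds on very rare, high-severity events tighten only linearly with accumulated mileage.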
Another open question concerns the harmonization of safety benchmarks across jurisdictions. Current methods, while rigorous, are specific to the regions where the ADS operates. Broader certification will require standardized methodologies that can accommodate geographic, demographic, and infrastructural differences. Future research may also benefit from the integration of surrogate safety measures, blending real-world performance data with simulation results to accelerate the demonstration of safety benefits without requiring impractically large on-road exposure.

2.2. Case Study: Adoption of AI in the Aviation Industry

2.2.1. Introduction and Background

Artificial Intelligence (AI) is disrupting every industry, and aviation is no exception. While automation has been an integral part of aviation for decades, AI and Machine Learning (ML) represent a comparatively new and evolving development within the sector. Early adoption is predominantly concentrated in off-wing applications, where AI and ML are employed to enhance predictive and preventive maintenance, optimize operational and flight performance, and support air traffic management. Air safety regulators such as the Federal Aviation Administration (FAA) and the European Union Aviation Safety Agency (EASA) currently have no established path to certification for fully autonomous aircraft; however, both agencies are working toward one in coordination with industry, research institutions, OEMs, and international partners to ensure that certification processes keep pace with technological advancements. Recognizing the transformative potential of these technologies, aviation regulators have begun to formalize strategies for their safe integration into aircraft systems and operations. In 2023, EASA published its Artificial Intelligence Roadmap 2.0 [56], and the FAA released its Roadmap for Artificial Intelligence Safety Assurance [66] a year later. Both documents provide high-level insight into industry approaches and outline pathways for assuring the safety of AI in aircraft design and operation. They also establish guiding principles and priority areas aimed at enabling the responsible, risk-informed, and orderly introduction of AI-enabled technologies into the aviation domain. These documents are complemented by regular technical interchange meetings, workshops, and collaborative forums in which regulators, manufacturers, operators, and standards organizations share research findings, discuss operational experiences, and refine common safety methodologies.
This coordinated effort ensures that AI/ML development benefits from both industry innovation and regulatory oversight, building the safety evidence and operational confidence needed for broader adoption.

2.2.2. Industry AI Status and AI Category

In its Concept Paper: Guidance for Level 1 & 2 Machine Learning Applications [67], EASA provides the first set of technical objectives and organisational provisions it anticipates will be necessary for the approval of Level 1 AI applications (‘assistance to human’) and Level 2 AI applications (‘human-AI teaming’); Level 3 AI applications (‘advanced automation’) are not yet enabled and will be covered in a future revision. Consequently, the guidance covers only supervised learning and unsupervised learning, not reinforcement learning. For details of AI classification as per EASA, refer to Table 5 below.
In these early stages of AI and ML adoption in aviation, the FAA and EASA have broadly aligned strategies, particularly for what EASA classifies as Level 1 and Level 2 machine learning applications. Both agencies emphasize integrating AI into existing aviation safety and certification frameworks, ensuring that these technologies enhance safety without introducing unacceptable risks. Their approach is incremental, advancing from low-risk applications toward more safety-critical functions as operational experience and assurance evidence accumulate. To manage risk, each AI-based system is assigned an assurance level that reflects the severity of potential consequences in the event of failure. For aircraft design, under EASA’s current policy, supervised learning applications are initially acceptable only if they do not involve Design Assurance Level (DAL) A (Catastrophic) or DAL B (Hazardous) classifications, while unsupervised learning applications are limited to DAL D (Minor). In practice, this means that a failure in a supervised learning system must not result in the loss of an aircraft or lead to severe injuries, whereas a failure in an unsupervised learning system may only cause a slight reduction in the aircraft’s safety margins.
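The admissibility rules described above lend themselves to a simple encoding. The sketch below is an illustrative simplification of EASA's current policy as summarized in the text (supervised learning excluded from DAL A and B; unsupervised learning limited to at most DAL D); it additionally assumes DAL E (No Safety Effect) is admissible a fortiori, which the text does not state explicitly.

```python
# Simplified encoding of the DAL admissibility rules summarized in the text.
# DAL A = Catastrophic, B = Hazardous, C = Major, D = Minor, E = No Safety Effect.
ALLOWED_DALS = {
    "supervised": {"C", "D", "E"},   # excluded from DAL A and DAL B
    "unsupervised": {"D", "E"},      # limited to at most DAL D (Minor)
}

def is_admissible(learning_type: str, dal: str) -> bool:
    """Check whether an ML application of the given type may target a DAL.
    Unknown learning types (e.g., reinforcement learning, not yet covered
    by the guidance) are rejected."""
    return dal in ALLOWED_DALS.get(learning_type, set())
```

For example, a supervised-learning function at DAL C is admissible, while any supervised function whose failure would be Catastrophic (DAL A) is not, and reinforcement learning is rejected outright since the current guidance does not cover it.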
Most industry players were expected to begin with Level 1 AI “assistance” applications by 2025. Gradual adoption of more automated Level 2 solutions supporting extended Minimal Crew Operations (eMCO), Single-Pilot Operations (SPO), and virtual co-controllers in Air Traffic Management (ATM) is projected around 2035. More advanced automation (Levels 3A and 3B) is expected between 2035 and 2050, though certain sectors, such as the drone industry, may advance faster [67].

2.2.3. Impetus for Increasing Levels of Autonomy

The motivation for adopting Artificial Intelligence (AI) and Machine Learning (ML) in aviation over traditional algorithmic approaches lies primarily in their ability to address complex, dynamic operational environments. Traditional coded systems require all decision rules to be explicitly programmed, which becomes impractical when dealing with high variability, such as rapidly changing weather, complex airport taxiway layouts, or subtle sensor anomalies. AI/ML, by contrast, can learn from vast historical and real-time datasets to recognize patterns, predict potential hazards, and adapt to evolving conditions without a complete rewrite of the software logic. This capability enables enhanced decision support, allowing systems to process multi-sensor inputs and provide pilots or operators with timely, context-aware recommendations. Depending on the application, AI can therefore offer a more economical and efficient means of accomplishing existing tasks; in other cases, it has opened entirely new possibilities.
For instance, a Level 1 AI application is the Visual Landing Guidance System (VLS) developed by Daedalean [67]. The system provides landing guidance for Part 91 (General Aviation) aircraft on hard-surface runways in daytime Visual Meteorological Conditions (VMC), using a forward-looking high-resolution camera as the only external sensor. During daytime VMC flight under Visual Flight Rules (VFR), the system recognizes and tracks hard-surface runways present in the field of view and allows the operator to select the one intended for landing or use a pre-configured selection based on a flight plan. Once a runway has been selected and the aircraft begins its final descent towards it, the VLS provides the position of the aircraft in the runway coordinate frame as well as horizontal and vertical deviations from a configured glide slope, similar to a radio-based instrument landing system (ILS). Uncertainties and validity flags for all outputs are also produced by the system. The system employs a convolutional neural network (CNN) responsible for identifying the boundaries and orientation of the runway, as described in Figure 5. The use of a CNN to identify runway boundaries and orientation offers a significant advantage over traditional algorithmic coding, as conventional image-processing methods would require extensive, explicitly coded rules to handle diverse runway appearances, lighting variations, and environmental conditions. By contrast, the AI-based approach (see Table 5) enables the system to learn from a broad dataset of runway images, improving its robustness and adaptability to different airports and scenarios without the need for rewriting detection logic [68].
Furthermore, AI/ML facilitates automation of intricate or previously manual processes—such as predictive maintenance, anomaly detection, or autonomous taxi—thereby reducing crew workload and operational costs. For instance, EASA and Boeing are researching an Auto Taxi System that can autonomously taxi an aircraft once the appropriate clearance is received. After receiving clearance from Ground Control, the system is capable of providing a readback and planning the appropriate taxi route. It then controls the aircraft as it moves from the gate to the assigned runway. The system is designed to detect potential obstacles in the aircraft’s path and can stop, follow, or navigate around them as necessary. This is achieved using optical cameras and LiDAR sensors. The system is classified as Level 2A AI—Human-AI Cooperation—because it can autonomously control the aircraft along the planned route, while the pilots, as the end users, are still required to monitor its performance and can intervene at any time, as depicted by the communication flow in Figure 6. Going a step further, in its concept paper the EASA Scientific Committee has proposed a virtual copilot concept to support Single Pilot Operations (SPO) through human–AI teaming and crew resource management. Acting as a copilot, it would share tasks, maintain common goals with the Pilot in Command (PIC), and build situational awareness from real-time events to assist in decision-making. The system could adjust its support based on pilot activity, displayed information, and the pilot’s mental or physical state, detected via sensors and cameras. It would also monitor workload, communications, and aircraft position, intervening when necessary. It is classified as Level 2B AI because it shares a common set of goals with the PIC while automating certain decisions under the PIC’s supervision, thereby partially reducing the PIC’s ‘authority’.
These level 2 AI applications demonstrate how advanced automation can enhance operational efficiency, improve safety margins, and enable new operational models without removing human oversight from critical decision-making.
While the current common goal is to reduce pilot workload and enhance aircraft safety, the industry is also actively pursuing autonomous flight systems capable of gate-to-gate operation without direct pilot control. In such a system, the AI would assume full responsibility for specific phases of operation, with the human operator acting primarily as a safety supervisor rather than an active controller. Wisk Aerospace highlights that, beyond safety improvements, autonomy offers significant financial benefits by mitigating potential pilot shortages, reducing operating costs, increasing passenger access, and simplifying short- and long-term maintenance [69].

2.2.4. Certification of AI Systems

For certification purposes, systems must be designed to perform their intended functions under all foreseeable operating conditions. This requirement applies to the entire system, including any AI components. To gain confidence in the trustworthiness of an AI/ML application and to mitigate the “AI black box” concern, additional building blocks are introduced as a new AI framework on top of the established certification standards. The framework includes the following building blocks: AI Trustworthiness Block, AI Assurance Block, AI Human Factors Block, and AI Safety Risk Mitigation Block [67].
The AI trustworthiness block starts with characterizing the AI application, which includes an ethics-based assessment, and also encompasses the safety assessment and security assessment [67].
The AI assurance building block provides guidance specific to AI-based systems and focuses on three key areas. Firstly, learning assurance addresses the shift from traditional rule-based programming to data-driven learning, highlighting gaps in existing development assurance methods that do not fully cover AI/ML learning processes, as seen in Figure 7. Secondly, development and post-operational explainability ensure that users receive understandable, reliable, and relevant information about how an AI/ML application produces its results. Finally, data recording supports both continuous safety monitoring and investigations following incidents or accidents [67].
The “human factors for AI” building block outlines guidance for addressing human—AI interaction needs, including operational explainability to deliver timely and comprehensible information to end users, and the concept of human—AI teaming to promote effective collaboration between operators and AI systems [67].
Finally, the “AI safety risk mitigation” building block recognizes that full transparency of the “AI black box” may not always be possible, and that residual risks arising from this uncertainty must be managed through appropriate mitigation strategies [67].
Currently, even with the above published framework, certifying AI and ML systems in aviation presents several significant challenges. Firstly, traditional certification frameworks are designed for deterministic, rule-based systems, whereas AI/ML models are inherently probabilistic and adaptive, complicating safety assurance. Secondly, the “black box” nature of many AI algorithms limits interpretability and explainability, making it difficult for regulators to fully understand decision-making processes. Thirdly, the data-driven learning approach requires large, diverse, and high-quality datasets, raising concerns about data representativeness and bias. Fourthly, although not yet a concern, the future introduction of reinforcement learning means AI/ML systems may evolve post-certification through retraining or online learning, posing challenges for continuous compliance. Finally, existing regulatory guidance is still evolving, resulting in uncertainty around appropriate certification paths and standards for autonomous or AI-assisted functions in aircraft design and operations [56,66,67].

2.2.5. The Future of Autonomy in Aviation

Despite significant progress in integrating AI and ML into aviation, several open areas remain for improvement and exploration. In particular, improving AI transparency and explainability is crucial to building trust among regulators and operators; this will naturally advance over time as experience is gained from existing AI applications in service. Further research is also needed on ensuring consistent safety assurance as AI systems learn and evolve. Expanding AI’s role into more complex flight operations and refining human–AI collaboration models will be key to unlocking its full potential. However, the path forward is not without obstacles. Certification challenges persist due to the fundamentally probabilistic nature of AI and the difficulty of fitting these systems into traditional rule-based regulatory frameworks. Concerns about data integrity, bias, and cybersecurity also pose risks that must be carefully managed. Additionally, integrating autonomous systems with existing air traffic management and ensuring pilots remain situationally aware in increasingly automated cockpits are ongoing hurdles. Overcoming these challenges will be essential for the safe and effective adoption of AI-powered autonomy in the aviation industry.

2.3. Case Study: How Autonomy Is Adopted on the International Space Station

2.3.1. Introduction and Background

Automation has played a critical role in supporting the International Space Station (ISS) Program (ISSP), with applications ranging from simple, rule-based systems to more complex, semi-autonomous operations. One foundational example is Fault Detection, Isolation, and Recovery (FDIR), a class of automation designed to take immediate, critical action in response to system failures. ISS FDIR systems are responsible for actions such as automatically closing inter-module ventilation valves to prevent the spread of contaminated air during a fire or toxic atmosphere event, and initiating failover from a malfunctioning primary command and control computer to a hot backup to ensure uninterrupted spacecraft control.
Over the ISS’s 25-year operational history, automation has become increasingly critical to improving operational safety, repeatability, and efficiency. This section explores the application of automation to ISS in the domain of extravehicular robotics operations, focusing on the development and use of two software applications: (1) the Automated Robotics Mission Designer (ARMD), and (2) the Mobile Servicing System Application Computer (MAC). These examples illustrate common successes and challenges of integrating automation into human spaceflight ecosystems and offer insights into the future of autonomous systems as NASA prepares for human missions beyond low-Earth orbit.

2.3.2. Automation in Robotics Mission Design

ISS extravehicular robotics operations rely heavily on the Space Station Remote Manipulator System (SSRMS, or Canadarm2) and the Special Purpose Dexterous Manipulator (SPDM, or Dextre). These robotic systems are essential for tasks such as ISS assembly and reconfiguration, maintenance, positioning of scientific payloads, and maneuvering spacewalking astronauts. Between 2009 and 2019, the ISS experienced a 70% increase in the number of days of robotics operations performed by ground-based flight controllers, and a greater than 600% increase in the number of commands issued to the robotics systems [70]. This surge in activity prompted NASA’s Robotics Operations Branch to seek ways to improve the efficiency of robotics operations planning and execution.
Recognizing that the time required to plan ISS robotics operations far exceeds the time needed to execute the operations, the team worked with software developers within NASA’s Flight Operations Directorate to initiate development of a software application capable of automating many of the previously manual and iterative tasks associated with robotics mission design. Christened the Automated Robotics Mission Designer (ARMD), the application can generate safe, collision-free trajectories for the robots, taking into account ISS structural constraints, keep-out zones, and robot singularities. ARMD also produces operator notes detailing clearance concerns and potential intrusions into the ISS system and payload envelopes.
Given a high-level sequence of events—such as releasing a grapple fixture, maneuvering the robot to a target configuration or location, grappling a payload, and then maneuvering the payload to a target location—ARMD can automatically generate detailed procedures and command scripts for execution by flight controllers. The software has delivered significant time and cost savings, with a return on investment realized within the first 18 months of use [70]. Over time, confidence in ARMD’s performance has grown, allowing for a significant reduction in the required level of verification and validation of its outputs.
Despite its successes, the ARMD project has faced some challenges. To date, insufficient resources have been allocated to automate all desired tasks and to implement additional planned software features. Determining the appropriate level of end-to-end system understanding required of operators, and the degree to which new team members should learn to design robotics operations manually before being permitted to use ARMD, remains an ongoing area of research. Additionally, although the team notes that ARMD’s overall error rate is low, the errors that do occur are often subtle and require expert robotics system and mission planning knowledge to detect and resolve. For more information on the ARMD project, see Lucier [70].

2.3.3. The Mobile Servicing System Application Computer (MAC)

The MAC project similarly represents a step forward in the automation of extravehicular robotics operations for the ISSP. Developed by the Canadian Space Agency and MDA Space to assist with both the planning and the real-time execution of robotics operations, the MAC software suite and its associated hardware function onboard the ISS, allowing the system to support dynamic, in-situ decision-making and execution.
The primary motivation for the MAC project was to gain experience with systems employing increased autonomy in preparation for human missions to the Moon and Mars. These missions will require greater independence from Earth due to communication delays and the resulting limitations on real-time ground-based support. MAC successfully demonstrated proof-of-concept capability for complex robotics operations in 2022 and 2023 and is now actively used by NASA’s Robotics Operations Branch to automate repetitive, command-intensive tasks such as system powerup.
Although MAC’s positioning onboard the vehicle is representative of the sort of edge computing solutions required for Mars-class missions, this aspect of the design presented the unique challenge of requiring that the application itself, and all operations scripts it produces, be classified as “flight software” and therefore subject to rigorous validation and verification by certified flight controllers using a high-fidelity ISS software simulator. While this process mitigates operational risk, it also reduces the return on investment by increasing labor requirements and introducing schedule delays. Additionally, the MAC project highlighted broader challenges likely to persist in future automation efforts. Specifically, MAC’s costs associated with development, testing, operator learning, and ongoing usage are higher than ARMD’s. There are also difficulties integrating automation into legacy vehicle systems not originally designed for it, and an ongoing tension between system autonomy and operator authority.
One notable issue related to the latter is a requirement for frequent “authority to proceed” commands from human operators while executing MAC-assisted operations, a safeguard that has had the inadvertent effect of increasing the risk of operator complacency. Moreover, the complexity of MAC’s layered software and script architectures can make it difficult for users to determine how and when to intervene during failures, underscoring the importance of designing automation systems with operator usability and situational awareness in mind.
Despite these limitations, MAC has proven particularly valuable in automating tasks such as system powerup, and the foundational needs that MAC addresses remain highly relevant for future deep-space human exploration. For more information on the MAC project, the reader is directed to Rembala [71].

2.3.4. Lessons Learned from ISS Autonomy

Both ARMD and MAC represent important milestones in the evolution of automation within the ISSP. Their development and ongoing operational use have provided critical insights into how automation can be effectively integrated into human spaceflight ecosystems. Key lessons learned include the importance of considering operator needs early in the development process, prioritizing automation of the most labor-intensive tasks, and accounting for verification and validation requirements when selecting use cases. These insights will be vital as the industry continues to develop the autonomous and automated systems necessary for Earth-independent human space exploration.

2.4. Case Study: Orbital Express Robotic Rendezvous in Low Earth Orbit

2.4.1. Background

The Orbital Express program was a successful mission developed by the Defense Advanced Research Projects Agency (DARPA) and Boeing, with partners including Ball Aerospace, NASA, MDA Space, Northrop Grumman Space Technology, Charles Stark Draper Laboratory Inc., and Starsys Research. The goal of this mission was to demonstrate autonomous on-orbit satellite rendezvous and docking (RVD) and servicing (including fluid and On-orbit Replacement Unit transfers). Launched on 8 March 2007, the mission successfully concluded just over four months later on 22 July 2007, and consisted of the rendezvous of two satellites: Boeing’s ASTRO—the servicing vehicle, or chaser satellite, equipped with the MDA Space Orbital Express Demonstration Manipulator System (OEDMS)—and Ball Aerospace’s NextSat Client spacecraft.
The motivation for autonomy on this mission was the desire to prove the feasibility of in-situ satellite servicing. Satellites, particularly uncrewed satellites, are the most numerous of all space systems; as their numbers continue to grow and missions venture deeper into space, extending mission lifespans and improving performance through servicing becomes inseparable from autonomy.
Flight operations were performed with increasing levels of autonomy over the course of several mission scenarios spanning the mission lifespan. Early operational sequences included multiple Authority to Proceed (ATP) points, where the flight system would halt at an ATP and wait for confirmation from ground operators before continuing. The number of ATPs was reduced as the mission proceeded, and in the final mission scenario, a long and highly complex operational sequence was performed without ATPs or human intervention [72]. This level of autonomy was enabled by features including, but not limited to:
  • Autonomous Rendezvous and Capture Sensor System (ARCSS), a suite of sensors to obtain relative telemetry between ASTRO and NextSat [73];
  • AutoGuide and AutoNav, tools to develop commands for navigation of ASTRO when approaching or departing from NextSat [73];
  • ASTRO Flight Control System, to execute ASTRO motion commands from AutoGuide/AutoNav [73];
  • Mission Manager software to respond to anomalies, monitor health, and plan activities (to take over from teleoperated mode) [73];
  • Automated Scheduling and Planning ENvironment (ASPEN), an automated ground-based tool used to develop and modify long-term and daily operations plans, with outputs verified and edited by operators [24].
In particular, the OEDMS was a robotic manipulator with a vision system and control system, including visual servo mode, with the capability for teleoperation, supervised operation, or full autonomy via command from ASTRO’s mission manager [74].

2.4.2. Steps to Increased Autonomy

Full autonomy in this mission was achieved by starting with limited autonomous control and gradually increasing it throughout the demonstration. The lowest autonomy level consisted of pausing for ATP points between every scripted command. Ground supervision and approval gradually decreased, and telemetry was verified throughout against expected results from mission simulations. The key tasks completed at each level of autonomy are described below [72]:
  • Autonomy Level 1—ATP points for ground supervision and confirmation between every script command/execution, representing initial manual or supervised operations. Operations performed at Level 1 autonomy in the initial scenario of the mission included coupler mating, fluid transfer, and battery transfer (via OEDMS) between the two satellites.
  • Autonomy Level 2—Increased use of autonomy with fewer ATP points. Operations performed at level 2 autonomy included fluid and ORU transfers in Scenario 1.
  • Autonomy Level 3—Further reduced ATP points relative to Level 2. This included ORU and fluid transfers in Scenarios 2 and 7.
  • Autonomy Level 4—Full autonomy, with no ATP points and ground control merely overseeing operations without intervention, including autonomous fly arounds, station keeping, berthing, direct captures, and fluid and ORU transfers. Full autonomy was achieved gradually throughout the missions, including for fly-around and direct capture in Scenario 5, and later for the entire final rendezvous and servicing scenario, Scenario 8 (the Design Reference Mission).
Each task defined above can be considered an atomic task, meaning it is a core functionality that can be built upon to develop more complex actions. In Table 6, the progression of each atomic task through the Autonomy Levels across select scenarios is demonstrated, highlighting the combination and evolution of tasks to achieve full autonomy.
At the highest level of autonomy, with no ATP points or ground approvals needed, ASTRO demated from NextSat, performed a fly-around of NextSat, and re-approached the target satellite, allowing the OEDMS to grapple and berth NextSat before propellant and a battery ORU were transferred between the satellites—again, all conducted fully autonomously [72]. With each successive scenario, autonomy increased, and the number of tasks performed together without intervening commands from the ground increased as well. This method of gradually increasing complexity and autonomy helped the mission successfully accomplish fully autonomous satellite rendezvous, docking, and servicing operations.
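The progression from ATP-gated execution to uninterrupted sequences can be illustrated with a simple sketch. The following Python fragment is purely illustrative—the task names, the autonomy-level threshold, and the `request_atp` interface are our own constructs, not the Orbital Express flight software. At low autonomy levels every task is gated by a ground approval; at the highest level the same sequence runs end to end.

```python
# Hypothetical sketch of an ATP-gated sequence executor: at lower autonomy
# levels, execution halts at Authority-to-Proceed (ATP) checkpoints and waits
# for ground confirmation; at the highest level, the sequence runs end to end.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    name: str
    action: Callable[[], bool]   # returns True on nominal completion
    atp_below_level: int         # require an ATP if autonomy level < this value

def run_sequence(tasks: List[Task], autonomy_level: int,
                 request_atp: Callable[[str], bool]) -> List[str]:
    """Execute tasks, pausing for ground approval where the level requires it."""
    log: List[str] = []
    for task in tasks:
        if autonomy_level < task.atp_below_level:
            if not request_atp(task.name):       # ground withheld approval
                log.append(f"HALT before {task.name}")
                return log
            log.append(f"ATP granted for {task.name}")
        if not task.action():
            log.append(f"ABORT during {task.name}")  # terminate to a safe state
            return log
        log.append(f"{task.name} complete")
    return log
```

Run at Level 1, every task produces an ATP exchange; at Level 4, the same task list executes with no ground interaction, mirroring the mission’s final scenario.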
Note that, for the OEDMS, checkouts were performed prior to the scenarios, including having the arm approach NextSat without grappling so that ground operators could compare telemetry from the visual servo operations to dynamic simulations and verify the autonomy before any higher-risk operations were completed. Seven battery transfer operations and seven visual servo operations (including two full free-flyer captures) were conducted between lowest-level autonomy and full autonomy, demonstrating the evolution of ground supervision over the mission [74].

2.5. Case Study: Autonomy on the Surface of Mars with the Curiosity and Perseverance Rovers

The Curiosity rover, operated by the Mars Science Laboratory mission [75], and the Perseverance rover, operated by the Mars 2020 mission [76], are the two most recent of NASA’s Mars rover missions, and among the most complex spacecraft sent to planetary destinations. The operations for both rovers build on experience and advances from prior rover missions, including autonomy capabilities. The system complexity and ambitious science goals of both projects motivated automation and autonomy in a number of areas, either to reduce human workload and error risk, or to increase the capability of the flight system. Among the autonomy capabilities in use on the rovers [7] are Autonav [77], Simple Planner [78], AEGIS [79], and PIXL Adaptive Sampling [80].

2.5.1. Example Autonomy Systems on Current Mars Rovers

Autonav allows the rover to autonomously find a driving path from a starting point to a goal selected by the human operations team, while avoiding obstacles and satisfying other safety and operational constraints. The Autonav system on Curiosity is an advancement based on the system used previously on the MER rovers [77]; the Perseverance system is further enhanced, including the ability to perform perception and planning tasks while in motion, which speeds up the net traverse rate and allows the rover to travel distances of hundreds of metres per day when desired [81].
Simple Planner is an operations planning system, used on Mars 2020, which includes ground software tools and a flight software element called the OnBoard Planner (OBP). Simple Planner allows the development of flexible activity plans for the rover in which OBP determines the execution times of activities based on constraints set by the operations team. In addition to the allowed times for science activities, for example, OBP manages conflicts between activities that cannot run in parallel or that must run in a particular order, and manages onboard resources such as available electrical power and battery discharge limits. Certain subsystems on the rover require heating to reach safe temperatures at certain times of the Martian day, and OBP also autonomously computes the needed durations of heating for activities when scheduling them, and can adjust those times if the mechanisms do not reach the desired temperature when predicted. For activities with variable execution duration, or with an unexpected duration due to anomalous execution, OBP will detect such events and rebuild a new schedule to adapt. As such, OBP is capable of onboard autonomous activity scheduling and re-scheduling, and of adapting to actual environmental conditions, resource availability, and unexpected events on Mars. This employs information not available to the operations team at uplink planning time, and increases operational efficiency greatly. In addition to the OBP flight software, the Simple Planner system includes ground software tools for activity planning and simulation of the flight software behaviour. This includes a system that schedules activities in the ground tool according to the same constraints and logic used by OBP aboard the rover.
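The flexible-scheduling behaviour described above can be sketched in simplified form. The following Python fragment is an illustrative toy, not the OBP algorithm: it greedily places activities within operator-set time windows, runs them sequentially, and rebuilds the remaining schedule when an activity’s actual duration differs from its prediction.

```python
# Illustrative sketch (not the flight OBP): a greedy scheduler that assigns
# start times within each activity's allowed window and rebuilds the rest of
# the plan when one activity finishes at an unexpected time.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Activity:
    name: str
    duration: int    # predicted duration (minutes)
    earliest: int    # window open (minutes past plan start)
    latest: int      # latest allowed start

def build_schedule(activities: List[Activity], now: int = 0) -> Optional[Dict[str, int]]:
    """Greedily place each activity at the earliest feasible start time."""
    schedule: Dict[str, int] = {}
    t = now
    for act in activities:
        start = max(t, act.earliest)
        if start > act.latest:       # window missed: plan infeasible
            return None
        schedule[act.name] = start
        t = start + act.duration
    return schedule

def reschedule_after(activities: List[Activity], done_name: str,
                     actual_end: int) -> Optional[Dict[str, int]]:
    """Rebuild the plan for the activities after `done_name`, given its actual end."""
    idx = next(i for i, a in enumerate(activities) if a.name == done_name)
    return build_schedule(activities[idx + 1:], now=actual_end)
```

If the first activity overruns, `reschedule_after` shifts later activities while still honouring their windows—a highly simplified analogue of OBP adapting the onboard plan to actual execution times.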
AEGIS (Autonomous Exploration for Gathering Increased Science) is a flight software system that allows autonomous target selection for remote sensing instruments. It is used on the Curiosity rover to select targets for the ChemCam instrument, a remote Laser Induced Breakdown Spectrometer (LIBS) with a telescopic context camera called the Remote Micro-Imager (RMI). On Perseverance, it selects targets for the similar SuperCam instrument, which includes the LIBS and RMI sensing modes, as well as capabilities for Raman, reflectance, and luminescence spectroscopy, and a microphone. AEGIS is principally used to acquire ChemCam or SuperCam measurements of geological targets around the rover after it has driven to a new locale, but before images of the new locale have been returned to Earth for the operations team to select new science targets. It has significantly increased the rate at which ChemCam and SuperCam measurements can be made, and thus the richness and density of the geochemical survey conducted along the rovers’ traverses.

2.5.2. Successful Strategies for Implementing and Advancing Autonomy on Mars Rover Missions

A number of strategies for the successful deployment of increasingly capable autonomy systems are common to the capabilities described above. In general, early work in scoping the capability and socializing it with stakeholders throughout the mission system pays off greatly. Several of the strategies employed, to positive effect, are summarized below.

2.5.3. Make Clear the Case for Autonomy

It is important at the outset to have a clear explanation of the value of the proposed autonomy. In the case of AEGIS, the Curiosity rover was more than three years into its surface mission before the software was deployed; an argument could readily have been made that the mission had been successful without the software. The AEGIS team was able to show the potential increase in science return both to the ChemCam instrument team, who operate the instrument and also interpret and use the science data it returns, and to the Mars Science Laboratory engineering and project management teams, who are responsible for the rover system and the allocation of resources for operations and flight system evolution. The increased science return was plausible based on an understanding of the rover’s operational cadence (there would be times when AEGIS would be useful and usable) and the manner of operations of ChemCam (the capability would provide more and useful science data). This benefit was balanced against new costs, complexity, and risks, for which the AEGIS team was able to show the scale, character, and mitigations. These concerns are important to project managers, system engineers, and operators, both in the development phase and in operations, and if they appear to outweigh the benefits, they can prevent deployment or, once deployed, prevent successful employment of a new autonomy capability. Strategies for managing these concerns are further discussed in subsequent subsections.
A clear motivation for why and how an autonomy system will bring a meaningful benefit to a mission is essential in enabling its deployment: budget, complexity, risk, and other concerns will inevitably lead to the question “Is this worth it?”, and it is essential that the answer be “Yes, and here, specifically, is why and how.”

2.5.4. Involve Stakeholders Early and Continually

Systems that introduce new autonomous behaviours can bring significant changes to a space mission. For complex missions, it can be a significant task to understand the effects. This is true for members of the mission team who will have to understand how to use the autonomy system while achieving their mission goals and protecting the spacecraft, raising questions among different stakeholders, such as
  • Does the autonomous targeting system affect the ability of human operators to select science observations?
  • Are there circumstances in which the adaptive sampling algorithm should not be used?
  • Which restrictions apply to Autonav driving, and are they different from operator-directed drives?
  • Will the Onboard Planner’s freedom to schedule observations affect the quality of scientific measurements, which are sometimes very specific in their timing requirements?
Resolving these kinds of questions, and indeed recognizing which questions need to be answered, demands clear communication between personnel responsible for the autonomous system and other stakeholders in the mission. These include engineers, designers, operators, managers, and the science and instrument teams. Communication and access to specialists on each side enables resolving these questions, but it also helps to build confidence in the autonomy system among members of the mission team.

2.5.5. Add Safety Protections, Don’t Remove Them

Mission system engineers and spacecraft operators have developed procedures and systems to prevent major risks from being realized aboard the spacecraft. These can include ground tools that verify uplink products to check for disallowed or incompatible command elements, or onboard systems that validate commands or prevent unsafe executions. These procedures and systems can become essential to safe operations, with the assumption being that they will protect the spacecraft, and the consequent assumption that the procedures must be followed and the ground and flight systems must never be deactivated or bypassed. In such cases, any operation that requires changes or workarounds to these systems can be viewed as risky or complex.
For example, on both rover missions, ground procedures exist to prevent pointing ChemCam or SuperCam in unsafe directions. Each instrument contains a powerful laser which is capable of vapourizing the surface of solid rock. Such a laser is a potential hazard if accidentally targeted at the rover’s own hardware, much of which is potentially visible to ChemCam or SuperCam, which are mounted on the pan-tilt stage of their rovers’ Remote Sensing Mast (RSM).
AEGIS deployment meant allowing the onboard computer to select target locations, violating the assumption that humans on the ground would vet targets for laser collision. This risk was mitigated in a number of ways. The AEGIS system was deployed with internal checks to exclude targets in the laser collision zone. Importantly, these checks are made, internal to AEGIS, without removing the system-level checks that were already in place and in which the mission and instrument teams had confidence.
Finally, with the new checks added to the existing protections, the operator procedures were adapted to allow operators to ensure, and demonstrate, that AEGIS-targeted observations would not violate the laser collision constraints, just as they do for human-targeted observations.
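The layered-protection principle can be sketched as follows. This Python fragment is a hypothetical illustration—the keep-out geometry, values, and function names are invented: the autonomous selector filters out unsafe targets internally, while the pre-existing system-level check continues to vet every command independently, so neither layer is removed.

```python
# Hypothetical illustration of layered protection: the autonomy applies its
# own internal keep-out filter, and an independent system-level check still
# vets every command before execution. Zone boundaries are invented numbers.
from typing import List, Tuple

Target = Tuple[float, float]  # (azimuth, elevation) in degrees

# Assumed keep-out zone covering spacecraft hardware (illustrative only)
KEEP_OUT = {"az": (-180.0, -90.0), "el": (-90.0, -20.0)}

def in_keep_out(t: Target) -> bool:
    az, el = t
    return (KEEP_OUT["az"][0] <= az <= KEEP_OUT["az"][1]
            and KEEP_OUT["el"][0] <= el <= KEEP_OUT["el"][1])

def autonomous_select(candidates: List[Target]) -> List[Target]:
    """Internal check: the autonomy never proposes a keep-out target."""
    return [t for t in candidates if not in_keep_out(t)]

def system_level_vet(commands: List[Target]) -> List[Target]:
    """Pre-existing system-level check, retained unchanged as a second layer."""
    unsafe = [t for t in commands if in_keep_out(t)]
    if unsafe:
        raise ValueError(f"unsafe pointing rejected: {unsafe}")
    return commands
```

The design point is redundancy: even if the internal filter had a defect, the unchanged system-level check would still reject an unsafe pointing command.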

2.5.6. Recognize the Value of Familiarity and Heritage

Confidence in autonomous systems can be built over time, but for new operators or new systems, an argument from heritage can often be convincing. For example, when AEGIS was deployed to Curiosity, it benefited significantly from previous use as an experimental deployment on the Mars Exploration Rover (MER) Opportunity. Much of the MSL AEGIS software was common to the MER deployment, which simplified V&V, but also built confidence in engineering operations personnel. The ChemCam team, however, had mostly not been part of the MER mission, and the MER deployment had selected targets for a camera rather than a laser spectrometer. In addition to the heritage argument, it was necessary to demonstrate to the ChemCam team, by analysis, V&V, and procedural and software controls, that the system was safe and reliable.
Simple Planner, by comparison, was very novel; at the time of its deployment to Mars 2020, no previous rover mission had used such an autonomous scheduling system. However, because the ground tools were deployed at the start of surface operations, key elements of the system were already in place for the deployment of the Onboard Planner flight software element. For example, operators have been using the Copilot ground scheduling tool, which replicates the logic used onboard by OBP to build plans, since landing. This enhanced trust in the new OBP flight software, since its scheduling algorithm had been successful on the ground and had become familiar to the operations team.

2.5.7. Limits to Freedom of Action

Another way to reduce both risk and the concerns of operators, project managers, and science users is to constrain the range of actions an autonomous system can take. By circumscribing the freedom of action of the system, stakeholders can be reassured that the autonomy will not affect unrelated systems or behave in concerning ways.
For example, while rover Autonav can handle a great diversity of terrains, and has been tested in software simulations, physical testbeds, and through its experience on Mars, it is not given unrestricted freedom of path selection. The destination goal is set by the operators on Earth, but they also set keep-out zones over terrain suspected of containing significant hazards. Thus, while Autonav has the freedom to adjust its path (sometimes significantly) to avoid hazards and navigate towards its goal, operators can be confident it will not enter unsafe areas or wander into very unexpected areas.
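The keep-out-zone constraint can be illustrated with a minimal planner sketch. The fragment below is not the Autonav flight software; it is a toy breadth-first search on a grid in which operator-flagged cells are simply never expanded, so any returned path respects the keep-out zones by construction.

```python
# Minimal sketch of goal-directed planning constrained by operator-set
# keep-out zones: the planner is free to choose its route, but flagged cells
# are never expanded. (Illustrative only—not the Autonav algorithm.)
from collections import deque
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]

def plan_path(start: Cell, goal: Cell, width: int, height: int,
              keep_out: Set[Cell]) -> Optional[List[Cell]]:
    """Breadth-first search that never enters operator-defined keep-out cells."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cur = frontier.popleft()
        if cur == goal:
            path = []
            while cur is not None:       # walk parents back to the start
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < width and 0 <= ny < height
                    and nxt not in keep_out and nxt not in came_from):
                came_from[nxt] = cur
                frontier.append(nxt)
    return None  # no safe path exists: halt and wait for operator direction
```

Note the graceful outcome when the keep-out zones sever all routes: the planner returns no path rather than violating a constraint, leaving the decision to the ground.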
Such limitations to the freedom of action of autonomous systems can reduce complexity, make verification and validation tractable, and build confidence among system engineers, managers, operators, and science users in a way that enables deployment.

2.5.8. Fail Gracefully

Autonomous systems will on occasion reach conditions which are beyond their ability to manage without operator intervention. Under such circumstances, they should terminate their activities in a manner that preserves the safety of the spacecraft and its instrumentation, and ideally preserves the opportunity to collect scientific observations in a future command cycle.
AEGIS can, under certain circumstances, fail to detect any rock targets in its source image; this happened, for example, after an RSM fault on Curiosity that resulted in the source image erroneously being pointed at the sky. In this case, AEGIS correctly detected no rocks and terminated its commanding without ever pointing or focusing ChemCam or activating the LIBS laser.
Operators can have confidence in autonomous systems if, when they encounter problems or conditions beyond their programming, they fail gracefully to safe states.
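This fail-gracefully pattern can be sketched generically. The following Python fragment is an illustrative construct (the `detect`/`act` interfaces are invented, not any flight API): when detection yields no targets, or an unexpected error occurs, the cycle terminates in a safe state without issuing any actuation commands.

```python
# Hedged sketch of "fail gracefully": when the autonomy finds nothing to act
# on, or hits an unexpected error, it issues no actuation commands and
# reports a safe outcome for operators to assess on the next command cycle.
from typing import Callable, List, Tuple

def run_targeting_cycle(detect: Callable[[], List[str]],
                        act: Callable[[str], None]) -> Tuple[str, List[str]]:
    """Run one autonomous cycle; terminate safely if detection yields nothing."""
    try:
        targets = detect()
    except Exception as exc:                 # condition beyond the system's programming
        return (f"SAFE_ABORT: {exc}", [])    # no commands issued
    if not targets:                          # e.g., image erroneously pointed at the sky
        return ("SAFE_NO_TARGETS", [])       # no commands issued
    for t in targets:
        act(t)                               # nominal case: command each target
    return ("NOMINAL", targets)
```

The key property is that the empty and error branches return before `act` is ever called, preserving both spacecraft safety and the opportunity to retry in a later cycle.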

2.5.9. Minimize Operational Complexity

On complex missions with complex spacecraft such as Curiosity and Perseverance, the autonomy systems add their own complexities. It is also the case that they will be used by operators who have expertise in some aspect of the mission, but not necessarily in the autonomy system itself. The SuperCam team, for example, is composed mostly of geologists, geochemists, and instrument engineers, rather than specialists in autonomous robotics or flight software; this team is nonetheless responsible for the AEGIS system, which autonomously selects some of their instruments’ science targets. One important strategy to ensure autonomous systems can safely and regularly be used by the operations team is to manage the operational complexity.
In the case of AEGIS, parameters for rock selection are set to a small number of defined ‘scene profiles’ by specialists from the AEGIS software team. There are a great many such parameters that can, in principle, be adjusted for each uplink by the SuperCam team, but in practice, they rely on the default scene profiles. This allows highly capable target selection (which can be updated over time by the AEGIS specialists as needed) while avoiding the need for operators to consider how to adjust target-finding parameters or develop the expertise to do so.
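The scene-profile approach can be sketched as a simple configuration pattern. The profile names and parameter values below are invented for illustration; the point is that specialists curate named parameter bundles so operators select a profile rather than tuning individual parameters.

```python
# Illustrative sketch of 'scene profiles': specialists maintain named
# parameter bundles; operators pick one by name. All names and values here
# are invented, not actual AEGIS parameters.
from typing import Any, Dict

SCENE_PROFILES: Dict[str, Dict[str, Any]] = {
    "default_outcrop": {"min_rock_px": 40, "edge_thresh": 0.35, "max_targets": 3},
    "fine_gravel":     {"min_rock_px": 12, "edge_thresh": 0.55, "max_targets": 5},
}

def targeting_params(profile: str = "default_outcrop",
                     **overrides: Any) -> Dict[str, Any]:
    """Return the profile's parameters; expert overrides are possible but rare."""
    params = dict(SCENE_PROFILES[profile])   # copy so the curated profile stays intact
    params.update(overrides)
    return params
```

This keeps the many tunable parameters available to specialists while the operations team interacts only with a small, validated set of named profiles.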

2.5.10. Autonomy Deployment Process

Acceptance of an autonomy system relies on key steps in the deployment process, and can be set up for success by certain choices in the implementation. It is essential, for example, that a robust verification and validation process is conducted, considering all the reasonable nominal and off-nominal cases for the autonomy software itself, and for its interactions with the spacecraft and mission system more generally.
In many cases, a stagewise deployment is helpful. The Perseverance Autonav was extended in stages with new capabilities, wider acceptable conditions for use, and additional options over the course of several years in operations. AEGIS on Perseverance was deployed in two stages, allowing the team to become accustomed to using the system with a basic set of options before opening up more features to take advantage of desirable flexibility once operator experience and confidence had been built up.
Deployment of new autonomous systems must necessarily include training for operators. This would likely include practice sessions for using the tools, as well as relevant documentation that can be readily referenced and understood during uplink planning and downlink assessment. In the case of Simple Planner, an extensive “Flight School” training series was provided, along with a full multi-day Operations Readiness Test involving the full operations team, including engineering, science, and instrument operators, supported by specialists from the Simple Planner development team.
The complexity, scale, and scope of the deployment process should take into consideration the complexity of the system being deployed and the mission it is being deployed into, and the magnitude of its potential effects on that system.

3. Proposed Method for Building Trust in Lunar Autonomy

While there are hard technological challenges across all the domains required for a sustained lunar presence, autonomy is unique in that there are also cultural challenges to overcome: namely, human trust in autonomous space systems operating directly with human crews and habitable structures. To build this trust, the levels of autonomy must be increased gradually, while human involvement is slowly reduced until full autonomy with minimal human oversight is achieved.
While the case studies listed above provide areas of direct comparison, they also highlight challenges for their application to the lunar environment—the most notable example being the collection of large, real-world datasets. In the early phases of Artemis base development, operational data will be limited, owing to the limited number of vehicles present. The more instances of autonomous operation without human intervention that are logged, however, the more confident operators will be in the success of future fully autonomous missions.
Scalable autonomy will need to be introduced early in the technology development process if we are to overcome both the technological and cultural challenges that this new paradigm presents. To this end, we propose a series of phases that would demonstrate increasing levels of autonomy, taking elements from the case studies above and applying them specifically to lunar autonomy. Each phase will require a planned set of objectives to serve as proof that the system is certified to move on to the next phase, ultimately ending with a fully autonomous system. The following is meant to describe a method, not to comprehensively cover all possibilities; it is left to the reader to tailor it to the needs of their specific application.

3.1. Phase 0—Manufacturing Reliability and Ground Testing

3.1.1. Design for Reliability

The foundation for any autonomous system rests with the elements that go into its design and manufacture. While much emphasis is placed on the algorithms required for autonomy and their ultimate verification, an algorithm can only perform if its sensors are properly characterized and functioning nominally. To this end, building trust begins with the need to fully understand the low-level components that go into the system design—Technological Readiness. If these components are newly developed, unexpected failures or partial failures may occur, and their level of reliability may be quite low. Compare this to a component that has been on the market for decades, where all the causes of failure have been identified and mitigated in the design. For failure modes that cannot be fully mitigated, even knowledge of their existence can translate into monitoring elements being integrated into the complete system design, ensuring that the system cannot transition into a state where neither the human operator nor the autonomy element is in control. This is one of the first steps in the ISS method of FDIR (Fault Detection, Isolation and Recovery).
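One element of such monitoring can be sketched as a simple control-authority check. The fragment below is a hypothetical illustration—the heartbeat interface and timeout value are our own constructs: it tracks which agent currently holds control and flags the state, to be avoided, in which neither the operator nor the autonomy is in control.

```python
# Hypothetical sketch of one FDIR-style monitor: track which agent holds
# control authority via heartbeats, and detect the unsafe state in which
# neither agent is in control (which should trigger safing).
from enum import Enum

class Authority(Enum):
    OPERATOR = "operator"
    AUTONOMY = "autonomy"
    NONE = "none"

class ControlMonitor:
    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        # No heartbeats received yet: initialize far in the past
        self.last_heartbeat = {Authority.OPERATOR: float("-inf"),
                               Authority.AUTONOMY: float("-inf")}

    def heartbeat(self, who: Authority, t: float) -> None:
        """Record that an agent asserted control authority at time t."""
        self.last_heartbeat[who] = t

    def check(self, t: float) -> Authority:
        """Return the agent currently in control, or NONE if authority lapsed."""
        for who in (Authority.AUTONOMY, Authority.OPERATOR):
            if t - self.last_heartbeat[who] <= self.timeout_s:
                return who
        return Authority.NONE
```

In a full system, a `NONE` result would trigger a transition to a predefined safe state, consistent with the FDIR goal of never leaving the system uncontrolled.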
One example of this would be the effects of radiation on the relevant hardware, which could also affect any software that is operating at the time of the event. In any mission-critical space system, fault tolerance is typically handled at the system level; there will be elements of fault tolerance managed within the software (e.g., AI), but more conventional software and hardware methods are also incorporated into the design to ensure predictable and safe behaviors during mission operations.
The choice between high-reliability parts and Commercial Off-The-Shelf (COTS) parts often shapes space hardware design. While high-reliability parts can provide a more trusted solution, they drive up cost and complexity significantly, and COTS parts are increasingly being employed for space applications [82]. High-reliability parts can mitigate some anomalies; however, the level of criticality required of the system is often a factor (e.g., human spaceflight rated versus single lunar day). Often, COTS parts can be used with the understanding that their reliability, and therefore trust in the system, is lower, or they may be deployed with redundancy to mitigate any failures that might occur.

3.1.2. Testing for Reliability

Space hardware is certified through a series of verification tests to ensure that it will withstand the environment in which it will ultimately be deployed. This includes the effects of radiation, thermal extremes, vacuum, and vibration, to name just a few. These tests can be conducted at all levels of the manufacturing process to ensure that the final system will not experience failures during operation.
Testing of software and the algorithms built into them can be conducted under a variety of different conditions. Typically, software functional performance is verified through the environmental testing listed above. This would ensure that all the basics of operation have been exercised under a wide variety of environmental conditions. Performance testing of the software is conducted on the hardware, typically at the end of environmental testing just prior to launch. This can also be conducted during operations, typically performed on hardware representations of the completed system.

3.1.3. Algorithm Development

The validation of algorithms, however, often requires special consideration, and possibly the tuning or calibration of specific hardware elements prior to their use. Historically, this has been performed on hardware elements within a laboratory or simulated environment—for example, the JPL Mars Yard, which is analogous to Waymo’s closed-course testing in the case study above. Recently, however, the use of software simulators has become more prevalent. Analog operations provide close-to-real-world conditions but are often limited in the testing that can be applied. Limitations can occur owing to differences in the environment (e.g., reduced gravity) or simply the amount of time a single operation can take to conduct. Simulated environments employ digital representations of the hardware (and accompanying algorithms) being verified, along with physics engines that represent environmental elements such as reduced gravity, lunar regolith, or sun angles. Their advantage is that synthetic data can be generated and assessed much faster than with actual hardware, meaning a high number of test cases can be performed (often orders of magnitude more than analog tests), and often under conditions that could exceed hardware limits (for the purposes of anomaly detection, for example). This is akin to the Waymo simulation testing noted above.
The drawback of simulated environments is that gaps may be present, as there are always unknowns and uncertainties that cannot be accounted for (the “Reality Gap”). These gaps can relate to either observation or action. Observational gaps occur owing to incomplete information or a “too perfect” simulation, i.e., a mismatch in perception resolution or noise between simulation and reality. Action gaps are the result of oversimplified simulations, for example, simulations that do not account for real system latency in deploying an action.
As such, a hybrid approach of calibrating simulated data with analog data is undertaken. This provides a “best of both worlds” approach and is known as REAL2SIM, where the parameters of a model (e.g., regolith interactions) are inferred from real-world observations [83]. The work of Pfaff et al. provides a method for how this can be accomplished [84]. The use of a high-fidelity simulator prior to deployment can
  • Build trust in the algorithm through trials of numerous operational scenarios. This will identify cases where the algorithm may not function as expected, which must be addressed prior to deployment. It will also set initial benchmarking metrics for operational assessment, as real-world data will be limited in initial lunar deployments of autonomous systems.
  • Test off-nominal situations not possible without harming actual hardware, and as mentioned in the Mars rover case study, the ability to fail gracefully should always be considered.
  • Inform operators about strategies to optimize operational performance and/or for training purposes. Here, again, benchmarks of human operator performance of the simulation can be collected. Such data can also help assess which operations are time intensive, possibly requiring additional autonomous algorithm development.
  • Identify safety limits of subsystems or the system as a whole. This follows the JPL application of safety protections to verify identified limits for critical mission elements and/or the ISS example of ARMD’s automated command scripting, which would only supply allowed command parameters.
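As a concrete illustration of the REAL2SIM idea, the sketch below infers a single simulation parameter by least-squares fitting a toy simulator model to analog-testbed observations. The linear digging-resistance model, its coefficient, and the synthesized data are illustrative stand-ins, not the method of [84].

```python
import random

# Hypothetical REAL2SIM calibration sketch: infer a regolith-interaction
# parameter (a linear digging-resistance coefficient k) by least-squares
# fit of a toy simulation model to analog-testbed observations.

def simulated_force(depth, k):
    """Toy simulator: digging resistance grows linearly with depth."""
    return k * depth

def calibrate_k(depths, observed_forces):
    """Closed-form least-squares estimate of k for the model F = k * depth."""
    num = sum(d * f for d, f in zip(depths, observed_forces))
    den = sum(d * d for d in depths)
    return num / den

# "Real" observations from an analog testbed (synthesized here with noise).
random.seed(0)
true_k = 42.0
depths = [0.05 * i for i in range(1, 21)]            # dig depths in metres
observed = [true_k * d + random.gauss(0.0, 0.5) for d in depths]

k_hat = calibrate_k(depths, observed)
residual = sum((simulated_force(d, k_hat) - f) ** 2
               for d, f in zip(depths, observed))
print(f"inferred k = {k_hat:.2f}, residual = {residual:.3f}")
```

The calibrated simulator can then generate the large synthetic test campaigns described above while remaining anchored to real-world behaviour.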

3.1.4. Assessment of Risk

Similar to the aircraft DAL categorizations, an assessment of the level of risk for a given environment should be made prior to deployment on the lunar surface. This would include factors such as financial considerations (loss of vehicle or damage to local infrastructure), as well as the risk of introducing astronauts into the robotic workspace. This assessment will form part of the FDIR analysis and/or the Waymo STPA identification of failure points, and will pinpoint areas of higher concern. Benchmarking metrics scale with the criticality of failure for each operational task, thus determining the number of operational trials required for each (see Section 3.4 below for details).
The Mars Rover case study outlines one of the most important aspects of building trust: buy-in from stakeholders. Furthermore, this case illustrates that for missions without outside stakeholders (e.g., third-party infrastructure or astronauts in proximity), the only risk is to the operating vehicle itself, something that will likely be the case for early lunar missions. Risk assessment(s), mitigations (e.g., operational logic), and metrics (benchmarks) form the basis not only for evaluating the success rate of the various autonomous elements but, more importantly, for providing agreed-upon exit criteria, shared with stakeholders, for each phase of autonomy deployment.

3.1.5. Summary

This phase follows standard design, build, and test principles for space hardware; however, it is an important first step in building trust in the system. At the end of this phase, there should be a high degree of knowledge of the hardware (obtained through a mix of design and testing), along with trust in the autonomous algorithms based on analog or simulation testing (or a hybrid of both). A statistically meaningful number of tests (benchmarks) should be run to ensure that the algorithm has been exercised under a wide variety of real and synthetic conditions, including anomaly testing.
Applying the STAR-L designations to the autonomy portion only, this phase would have TrRL4, as the complete system is tested, with all elements exercised within a simulated or analog environment.

3.2. Phase I—Tele-Operations with Autonomy Support

In the first operational stage, control of the robotic systems will be performed by human operators on Earth via tele-robotics, with the support of an autonomous system. This arrangement is intended to ensure that the system both functions and performs as expected, prior to deploying any autonomous elements that would assume control of the system. It mimics the aviation industry’s roles of Pilot In Command (PIC) and Second In Command (SIC), whose role is to monitor all systems. (We amend these terms to highlight the use of an autonomy element and differentiate it from human control: ÆPIC and ÆSIC, respectively.) Examples of the roles that the tele-operator and autonomy support element could provide are as follows:

3.2.1. Tele-Operator PIC-System Validation

Here, the human tele-operator can put the system through its paces, building upon successful operations by conducting increasingly complicated tasks until the full capabilities of the system have been exercised (see Figure 8 below). This is the first step in building operational trust, as it is intended to show that no faults or unexpected behaviour were observed, and that the performance of the hardware and software (excluding any autonomous algorithms) is as expected. System trust is built through repetition of tasks, demonstrating that the system does not falter within a statistical margin that is satisfactory to the mission designers/operators.

3.2.2. Autonomy Supervisor ÆSIC-Data Collection and Learning

In this approach, models are updated periodically using newly captured data from the lunar surface and human operator feedback (similar to the scenario selection method used for autonomous vehicles). Data can be employed to update algorithms developed prior to launch and deployment (see above), with any differences between prediction and operator action, or deviations from expected performance in the lunar environment, necessitating further scrutiny and trialing as required. Deliberative AI algorithms intended for eventual deployment on the flight hardware could be trained on this dataset, though off-nominal cases (negative test cases) would likely need to be synthetically manufactured.
One direct example of this would be employing methods such as Teach and Repeat [85]. Teach and Repeat is a framework for collecting real-world data and training a robot to autonomously navigate an environment, separating development into “teach” and “repeat” phases. The teach phase involves manually or autonomously driving a rover along a known path, allowing it to capture the relevant sensory information (odometry, images) that informs its path planning. The repeat phase allows the motion control system to follow the learned path under similar conditions, with operators observing its ability to complete the task, ultimately allowing repetitive tasks to quickly gain a high trust factor.
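A minimal sketch of the teach/repeat split might look as follows; the waypoint recording, the stand-in controller, and the tracking-error metric are illustrative assumptions rather than the actual framework of [85].

```python
import math

# Teach and Repeat sketch (assumed interfaces, not flight code): in the
# "teach" phase an operator-driven traverse is recorded as waypoints; in
# the "repeat" phase the path is replayed and tracking error reported.

class TeachAndRepeat:
    def __init__(self):
        self.taught_path = []                 # recorded (x, y) waypoints

    def teach(self, pose):
        """Record one odometry/vision pose during the operator-driven pass."""
        self.taught_path.append(pose)

    def repeat(self, follow_pose_fn):
        """Replay the taught path; return the worst-case tracking error."""
        max_err = 0.0
        for target in self.taught_path:
            actual = follow_pose_fn(target)   # motion controller tracks target
            max_err = max(max_err, math.dist(actual, target))
        return max_err

tnr = TeachAndRepeat()
for i in range(10):                           # teach: drive a straight 9 m line
    tnr.teach((float(i), 0.0))

# A stand-in controller with a 2 cm systematic tracking offset.
worst = tnr.repeat(lambda target: (target[0] + 0.02, target[1]))
print(f"worst-case tracking error: {worst:.3f} m")
```

Repeated replays with consistently small tracking error are exactly the kind of repetition from which the trust statistics described above can be accumulated.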

3.2.3. Autonomy Supervisor ÆSIC-Heads-Up Display

In addition, some autonomy may be necessary to augment operator decisions, given the challenges of round-trip communication delays. Proprioceptive algorithms will additionally run in real time to augment human operator situational awareness during nominal supervision of the plan, assisting with, for example, tactical anomaly detection and obstacle avoidance. This deployment of autonomy is analogous to the aviation industry’s “assistance” applications, which can include anomaly detection.

3.2.4. Autonomy Supervisor ÆSIC-Predictive Planning

The deliberative algorithms intended for eventual deployment on the flight hardware could be employed to provide predictive planning capabilities on the ground, including suggestions on task planning, robotic motion planning, and strategic obstacle avoidance. This would serve as the entry point for some autonomous algorithms that can be deployed on the ground segment.

3.2.5. Autonomy Supervisor ÆSIC-Repetitive Tasks

As a stretch goal, low-level (atomic) autonomy tasks, such as simple repeated motion tasks, could be incorporated at this stage to improve operator success rates. This could include Authorization To Proceed (ATP) points, deployed both on the ISS and on Orbital Express, wherein the operator must assess each operational stopping point before giving the command to continue. The caution about operator complacency noted in the ISS case study for repetitive ATPs applies here as well and suggests that operators require a high degree of training to fully assess predicted plans prior to their execution.
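The ATP gating pattern described above can be sketched as follows; the task names and the operator-decision callback are hypothetical.

```python
# Sketch of Authorization To Proceed (ATP) gating: the autonomy proposes
# each step, but execution halts at every ATP point until the human
# operator issues an explicit go/no-go decision.

def run_with_atp(steps, operator_decision):
    """Execute steps in order, pausing at ATP points for operator approval.

    operator_decision(step) -> True (go) or False (no-go, abort remainder).
    Returns the list of executed step names.
    """
    executed = []
    for name, is_atp_point in steps:
        if is_atp_point and not operator_decision(name):
            print(f"NO-GO at ATP point '{name}'; halting sequence")
            break
        executed.append(name)
    return executed

plan = [
    ("locate grapple feature", True),
    ("move arm to grapple", True),
    ("grapple", True),
    ("move payload", True),
    ("ungrapple", True),
]

# Operator approves everything up to (but not including) the payload move.
done = run_with_atp(plan, lambda step: step != "move payload")
print(done)
```

Because each checkpoint requires an active decision, operator training (and vigilance against complacency) determines how meaningful each approval actually is.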

3.2.6. Summary

While the major goal of this phase will be to validate the complete end-to-end performance within a select environment using human teleoperators, this will also serve to provide data for the autonomy algorithms to be assessed in Phase II. The criteria for exiting this phase are presented below.
Applying the STAR-L designations, this would have TRL6-8/TrRL1-4, as both sub-assemblies and the complete system are tested; however, human operation is required, and operations would be conducted without any autonomous elements (minimal trust).

3.2.7. Phase II—Increasing Autonomy with Humans In-the-Loop

Once the initial objectives of system validation have been completed via tele-operations in Phase I, the planned set of tasks will be repeated, transferring control to the autonomous systems, while keeping humans in-the-loop for remote e-stop, regular checkpointing and go/no-go decisions. Thus, the Autonomy Supervisor becomes pilot in command (ÆPIC), while the human tele-operator becomes second in command (SIC).

3.2.8. Autonomous Supervisor ÆPIC-Atomic & Repetitive Tasks

The application of autonomy to simple (i.e., atomic) tasks that involve only a single system element will lay the groundwork for more complicated operations and will build trust with stakeholders that the system is robust to the deployed environment.
Models employing approaches such as teach-and-repeat are another way to gain stakeholder trust. This will greatly improve the probability of success and will provide a blueprint for “teaching” autonomous systems in situ to perform specific tasks within a full-scale mission.

3.2.9. Autonomous Supervisor ÆPIC-Compound & Operational Logic Tasks

Much like Phase I, once simple atomic tasks have been demonstrated, these can be augmented to increase the level of complexity. Two examples of operational complexity would be compound tasks that require two or more independent elements that must work in concert (e.g., a robotic arm on a rover) or the use of operational logic where decisions are made onboard in real-time based on feedback from the system. Ultimately, the robotic element would gain full autonomous operation (i.e., all functionality) whilst under human supervision.

3.2.10. Tele-Operator SIC-Gating & Monitoring

Deliberative planning algorithms can be deployed on the flight system, with algorithms that ensure that the plan is fully communicated from the flight system to the ground segment so that it can be validated and approved prior to execution. Human tele-operators will then authorize execution of the in-situ generated plan.
By keeping the task and the worksite the same as in Phase I, outcomes between phases can be directly compared, with differences in expected behaviour requiring a root-cause review and possible adjustments to the algorithm.
As tasks become more trusted, the need for human authorization can be considered for removal on a case-by-case basis. Proprioceptive algorithms can also be run onboard the flight system, at first in parallel to ground-based monitoring, leading towards certification of the onboard capability.
Applying the STAR-L designations, this would have TRL6-8/TrRL5-6, as both sub-assemblies and the complete system are tested; however, human monitoring is still required.

3.3. Phase III—Fully Autonomous Certification

Through the staged approach defined above, both deliberative and proprioceptive algorithms are eventually certified to run autonomously. The mission can then attempt to execute end-to-end with no human intervention. This may also be done in a staged manner wherein humans monitor the execution of the autonomously generated plan, serving as a redundant safety measure, leading eventually to fully autonomous lunar operations, and certification as an autonomous element—akin to what is done for the aviation industry.

3.3.1. Autonomous Supervisor ÆPIC-All Tasks

At this point, the system is considered fully autonomous, and control of the system is via onboard algorithms and accompanying sensors.

3.3.2. Tele-Operator-Periodic Monitoring

The role of the tele-operator transitions to one of periodic monitoring and maintenance scheduling of system elements.

3.3.3. Summary

Through systematic and comprehensive testing of both the system (hardware and software) as well as the autonomous algorithms, trust is built that the system will behave as expected while operating in a tested environment. While system failure is regrettable (loss of asset), system failure resulting in damage (loss of property) or harm (loss of life) must be demonstrated to have a very low probability.
Applying the STAR-L designations, this would have TRL7-8/TrRL7-8, as it has been proven in a single environment, but may require modifications for use in a multitude of environments.

3.4. Metrics for Phase Exit

The above sections describe the actions undertaken by the tele-operator and autonomy supervisor throughout the operational testing of the autonomous elements. This section outlines a proposed method for establishing both the benchmark testing required and the accompanying metrics to declare each phase complete. For the purposes of exiting Phase 0, only the algorithm testing is considered here, as the hardware and software testing would be part of a program plan for the entire system.

3.4.1. Autonomy Task Test Plan

The first step will be to develop an Autonomy Task Test Plan (ATTP) that identifies all of the atomic, compound and operational logic tasks that will eventually lead to all system functions being exercised. This could take the form shown in Figure 8.
For example, a robotic arm could (1) locate a grapple feature, (2) move the arm to grapple, (3) grapple, (4) move the payload, and (5) ungrapple, as a set of atomic tasks. Multiple instances of testing these atomic tasks would exercise all the possible operational variations. So, within the present example, the uncertainty in grapple fixture location (in 3D space) plus the uncertainty in robotic placement could yield a range of possible starting locations for trialing the “grapple task”, from which statistics on success and failure could be derived.
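The trialing of the grapple task over these pose uncertainties could, for instance, be organized as a Monte Carlo sweep such as the sketch below; the capture envelope and the uncertainty magnitudes are illustrative assumptions, not mission values.

```python
import random

# Monte Carlo sketch of trialing the "grapple" atomic task over pose
# uncertainty. The 3 cm capture envelope and the sigma values below are
# illustrative assumptions.

def grapple_succeeds(fixture_err, arm_err, capture_envelope=0.03):
    """Success if the combined 3D misalignment stays inside the envelope."""
    total = [f + a for f, a in zip(fixture_err, arm_err)]
    return sum(c * c for c in total) ** 0.5 <= capture_envelope

def run_trials(n, sigma_fixture=0.01, sigma_arm=0.005, seed=1):
    """Sample n random start conditions and return the success fraction."""
    rng = random.Random(seed)
    gauss3 = lambda s: [rng.gauss(0.0, s) for _ in range(3)]
    wins = sum(grapple_succeeds(gauss3(sigma_fixture), gauss3(sigma_arm))
               for _ in range(n))
    return wins / n

rate = run_trials(10_000)
print(f"estimated grapple success rate: {rate:.3f}")
```

The resulting success fraction is exactly the kind of statistic on which the per-task benchmarks described above could be based.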
The grouping of atomic tasks will ultimately be determined by the physical and/or operational relationships of the complete system. The example of the robotic arm could yield a compound task that exercises the atomic tasks in sequential order: grapple and move a payload. Additional compound tasks could be to interface with a variety of payloads or, if the arm were mounted on a rover, operations between subsystems.
An example of operational logic would be a missed grapple, where the system was not successful in completing a sequential task. If the grapple task was assessed to be successful 90% of the time, this would result in a branch in the operational flow to account for the remaining 10% (e.g., either through a second attempt or replanning without the payload).
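The missed-grapple branch can be expressed as simple operational logic, as in this sketch; the 90% success probability follows the example above, while the single-retry policy and the branch names are assumptions.

```python
import random

# Sketch of the operational-logic branch above: a missed grapple triggers
# one retry, and a second miss falls back to replanning without the
# payload (graceful degradation). Probabilities are illustrative.

def attempt_grapple(rng, p_success=0.9):
    """One grapple attempt with the assumed 90% success rate."""
    return rng.random() < p_success

def grapple_with_fallback(rng):
    """Return the branch actually taken for one operational cycle."""
    if attempt_grapple(rng):
        return "nominal"
    if attempt_grapple(rng):              # operational-logic branch: retry once
        return "retry-success"
    return "replan-without-payload"       # graceful degradation

rng = random.Random(7)
outcomes = [grapple_with_fallback(rng) for _ in range(10_000)]
for branch in ("nominal", "retry-success", "replan-without-payload"):
    print(branch, outcomes.count(branch) / len(outcomes))
```

Trialing the off-nominal branches with statistics of their own, rather than only the nominal path, is what lets every logic path in the ATTP be certified.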
Ultimately, the system would be tested in its entirety, with all tasks and logic paths exercised. It should be mentioned that off-nominal testing may require a special setup of the system; for example, the robotic arm could be purposefully mispositioned at its start to trial what a missed grapple would look like.

3.4.2. Determination of Metrics

Here, we follow the method developed for autonomous vehicles, in which the number (N) of trials (i.e., executions of a task) is determined based on the confidence interval chosen for the mission. These values would be determined by stakeholders with the following considerations: whether human safety is a factor (versus solely robotic operation), and whether external infrastructure (e.g., a habitation module or third-party robotic elements) can be damaged during operation. Importantly, this needs to be done both for the system as a whole and for sub-systems, to ensure that the correct level of criticality is assigned to each autonomous system/sub-system. Lastly, operational constraints such as those listed above (e.g., lunar cycles, communication latency and/or human operator constraints) must also be considered. Based on the level of criticality of the mission, a statistical risk threshold is proposed (e.g., 3σ, or 99.7% operational success).
Initial benchmark metrics are calculated based on simulated results (e.g., hardware-in-the-loop and software-in-the-loop testing) and any Earth-based physical testing. These benchmarks provide a mission baseline against which initial operational efficiency will be assessed. The rate of off-nominal operation forms part of this benchmark, including the effects of operator error, environmental factors such as expected radiation effects on the hardware, and inherent system inaccuracies owing to onboard sensors (e.g., a machine vision system). If these performance indicators are not met within statistical certainty during mission operations, an assessment of which part of the system is not fully characterized would be necessary, with changes to the autonomous algorithm as required.
The value of N can then be evaluated for the specific environment. This is done via the standard sample size formula for small populations, which employs the risk threshold/margin of error (e.g., 0.3% for 3σ) and the desired confidence level (e.g., 95%) [86].
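A sketch of this calculation is shown below, assuming the common form of the sample size formula for a proportion, with an optional finite population correction; the z-score, margin, anticipated proportion, and population inputs are illustrative, not the mission's actual values.

```python
import math

# Sketch of the trial-count calculation: standard sample size formula for
# a proportion, with a finite population correction ("small populations").
# All numeric inputs below are illustrative assumptions.

def sample_size(z, p, margin, population=None):
    """Trials needed to estimate a success proportion within +/- margin.

    z          -- z-score for the desired confidence level (1.96 for 95%)
    p          -- anticipated proportion (0.5 is the conservative choice)
    margin     -- acceptable margin of error (e.g., 0.003 for a 3-sigma
                  operational-success threshold of 99.7%)
    population -- optional finite population size for the correction
    """
    n0 = (z ** 2) * p * (1.0 - p) / (margin ** 2)
    if population is not None:        # finite population correction
        n0 = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n0)

# Conservative estimate (p = 0.5) at 95% confidence, 0.3% margin of error:
print(sample_size(1.96, 0.5, 0.003))
# Using an anticipated failure rate of 0.3% instead of the conservative 0.5:
print(sample_size(1.96, 0.003, 0.003))
```

Note how strongly the anticipated proportion and the margin drive N, which is why the stakeholder-agreed criticality levels above matter so much to the size of the test campaign.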
As shown in Figure 8, sample values of N are assigned such that the number of successful trials is met within the confidence interval for that particular sub-element, and should include the number of trials through all levels of the ATTP. Trials involving atomic tasks, compound tasks, operational logic tasks, plus the complete system, would yield a total N that would determine that the system was certified for operation of all system functions.

3.4.3. Effect of Environment

What has been described thus far applies to a single operational environment. Environment is defined here as a location with a set of external parameters that remain fixed throughout the operation of the system. Any changes to the external parameters would require recertification, repeating some or possibly all of the tasks in the ATTP.
As an example, consider a rover with a robotic arm that is deployed onto the lunar surface. The environment that it is certified to operate in would include the terrain, negotiation of obstacles such as boulders, and extreme lighting. If a second micro-rover were added to its operational workspace, particularly one in which there is no operational coordination between vehicles, this would necessitate recertification to account for this new obstacle that would be routinely encountered. Moving the rover to a different location on the lunar surface, the addition of a habitation module to the operational environment, or possibly human astronauts in close proximity would each necessitate careful consideration of the impact on autonomous elements onboard.
It is possible that not all elements will be impacted by a change in environment. Working with the previous example, the addition of a micro-rover may change how autonomy on the original rover is deployed but may not impact the operation of the robotic arm. Thus, the impact on each of the tasks in the ATTP will have to be considered. Similarly, the number of trials for each element will have to be evaluated, as some form of “credit” may be granted for prior certification. The resulting updated ATTP may look something like that presented in Figure 9, with each layer of translucent blocks constituting a separate environment.
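The bookkeeping of which ATTP tasks a given environmental change invalidates could be sketched as a simple sensitivity map; the task names and environment factors below are hypothetical.

```python
# Hypothetical sketch of assessing which ATTP tasks need re-trialing after
# an environmental change. Each task lists the environment factors its
# certification depends on; task names and factors are illustrative.

ATTP_SENSITIVITY = {
    "rover traverse": {"terrain", "obstacles", "lighting"},
    "arm grapple": {"lighting"},
    "dig and dump": {"terrain", "lighting"},
}

def tasks_to_recertify(changed_factors):
    """Return the ATTP tasks whose prior certification is invalidated."""
    return sorted(task for task, deps in ATTP_SENSITIVITY.items()
                  if deps & changed_factors)

# A second micro-rover adds a new class of obstacle to the workspace:
print(tasks_to_recertify({"obstacles"}))
```

In this toy map, the new obstacle forces re-trialing of the traverse task only, mirroring the point above that the arm's certification could carry over as "credit".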
A system that has been certified for a multitude of environments would be classified with TRL9/TrRL9 designation.

4. Demonstration of a Lunar Autonomy Sandbox

The trialing of this method in a lunar environment is envisioned for the LUNAR BRIC construction demonstration mission [87]. While the primary mission will be conducted using tele-operation, once complete, the trialing of higher levels of autonomy will be tested. The focus will be on the autonomy of the robotic arm and accompanying vision system(s), which will be affixed to a stationary lander.
This has several advantages: (1) a stationary platform limits any damage from anomalous behaviour to the host system, i.e., it is not possible to cause third-party damage; (2) the trialing of autonomous algorithms will be conducted following the primary mission, i.e., the mission does not rely on autonomy for complete success; and (3) there are no astronauts present, i.e., there is no possibility of causing harm to humans near the workspace. The consequences of off-nominal behaviour are therefore extremely limited, which will greatly reduce the number of trials needed to certify the system for this limited environment.

4.1. Application of a Trust-Based Approach to Autonomy for LUNAR BRICS

4.1.1. Phase 0

The selection of hardware elements with the appropriate level of reliability will be assumed for the purposes of this writing. Autonomous algorithms will be initially trialed in a hybrid approach described above. This will include the creation of a synthetic environment that has been calibrated against operations in an analog robotic testbed. This synthetic testbed can be used to perform Monte Carlo simulations of various operations to determine the best (e.g., most efficient) method for surface operations. The implementation of the REAL2SIM environment will be completed in future development, but is described here as a working example.
The outcome of this effort will provide several outputs useful to operations planning of the mission: (1) trialing of the robotic operations, (2) a list of off-nominal tests that will be required to completely test all aspects of the system, and (3) an initial set of metrics for the expected robotic performance during the full extent of robotic operations.

4.1.2. Phase 1

The operational phase of the primary mission will be considered to fulfill Phase 1. This includes tele-operation of the robotic arm and accompanying vision system, though it could also be a pre-scripted series of commands based on telemetry. This may also include some heads-up information to overcome the 2–3 s operational lag owing to light-time delays (see the suggested tele-operator tasks in Figure 10 below). This phase will verify that human operation meets or exceeds, within statistical uncertainty, the metrics estimated in Phase 0.
For the purposes of this demonstration mission, we will assume a 0.1% operational error in the camera or robot 3D localization. After 10 trials, the 95% confidence interval would be 0.7%, improving to 0.2% after 100 trials and 0.06% after 1000 (i.e., the true mean would lie within the calculated interval in 95 of 100 repetitions). Given the low risk of this demonstration mission, as well as the time constraints of the lunar day, a minimum value of N = 10 will be set for each task. The tasks conceived for deployment of the robotic arm and vision system are shown in Figure 10.
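One simple way to see how the interval narrows with trial count is the normal (Wald) approximation sketched below; the exact widths depend on the interval construction used, so these values are illustrative rather than a reproduction of the numbers quoted above.

```python
import math

# Illustrative confidence-interval half-widths via the normal (Wald)
# approximation for a proportion. This only demonstrates the narrowing of
# the interval as the number of trials N grows; the values quoted in the
# text depend on the specific interval construction used there.

def wald_half_width(p, n, z=1.96):
    """95% CI half-width for an observed proportion p after n trials."""
    return z * math.sqrt(p * (1.0 - p) / n)

p_err = 0.001                 # assumed 0.1% localization error rate
for n in (10, 100, 1000):
    print(f"N={n:5d}: +/-{wald_half_width(p_err, n) * 100:.2f}%")
```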
Some tasks have an increased number of trials because they are required for trialing more than one atomic task; e.g., moving to a specific position is required both to dig and to pick up a Regolith Containment Unit (RCU) from the surface. As such, those tasks will have a tighter confidence interval, and thus improved trust in their operation. Some algorithm-based tasks have a decreased number of trials due to situational constraints, such as equipment or time limitations.
The output from this phase will be (1) completion of the requisite number of trials (N) for each task; (2a) comparison of tele-operational data with Phase 0 simulated success metrics; (2b) identification of any deviations from expected behaviour; and (2c) identification of any changes to the autonomy algorithms.

4.1.3. Phase 2

This phase will commence at the end of the primary mission and will leverage all operational data collected via tele-operation. Here, the teleoperator will monitor specific operations that will be conducted, keeping in mind the 2–3 s lag time of operation.
The individual tasks are shown as the values in blue in Figure 11 above. As tasks are completed and the number of trials (N) is fulfilled, success statistics will be maintained and compared against those of the human tele-operator. From these, the total number of trials for each atomic task (i.e., singular motion) is tabulated, and the final confidence interval for each is calculated. Note that a task such as the creation of the RCU is considered part of the Dig and Dump cycle; this is not an autonomous algorithm and is instead triggered to seal and dispense the RCU once a fixed number of scoop operations has been completed.
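The scripted (non-autonomous) RCU trigger described above can be sketched as a simple counter; the five-scoop threshold is an assumed value.

```python
# Sketch of the scripted RCU trigger described above: after a fixed number
# of scoop operations, the system seals and dispenses the Regolith
# Containment Unit. The five-scoop threshold is illustrative.

class RCUCycle:
    def __init__(self, scoops_per_rcu=5):
        self.scoops_per_rcu = scoops_per_rcu
        self.scoops = 0                # scoops toward the current RCU
        self.rcus_dispensed = 0

    def scoop(self):
        """One dig-and-dump scoop; seal/dispense when the RCU is full."""
        self.scoops += 1
        if self.scoops == self.scoops_per_rcu:
            self.rcus_dispensed += 1
            self.scoops = 0

cycle = RCUCycle()
for _ in range(12):
    cycle.scoop()
print(cycle.rcus_dispensed, cycle.scoops)
```

Because the trigger is a deterministic script rather than an autonomous decision, it falls outside the trial statistics gathered for the autonomy algorithms.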
The output from this phase will be (1) completion of the requisite number of trials (N) for each task; (2a) comparison of data with Phase 1 teleoperated metrics; (2b) identification of any deviations from expected behaviour; and (2c) identification of any changes to the autonomy algorithms (which would require recertification).

4.1.4. Phase 3

At the commencement of this phase, it will be assumed that all Phase 2 trials are complete and the system is certified for this specific environment. This would mean that human operators would be left to monitor the success of the system on a periodic basis and resolve any anomalies that are not handled autonomously by the system.
For this demonstration mission, the addition of a second operational environment will be explored. This will be facilitated via the addition of an obstacle to the workspace, in this case, placement of a regolith pile or possibly a mission element such as the GPR. This will necessitate a re-trial of the various tasks that are affected by the placement of such an obstacle. The conceptual environmental layering of Figure 9 would be applied to Figure 11 to represent the addition of a second operational environment in the context of the demonstration.
The output from this phase will be (1) completion of the requisite number of trials (N) for each task; (2a) comparison of data with Phase 2 teleoperated metrics; (2b) identification of any deviations from expected behaviour; and (2c) identification of any changes to the autonomy algorithms (which would require recertification).

5. Conclusions

In order to facilitate a sustained lunar presence, or future deep space operations, it is necessary to develop fully autonomous systems capable of operation in a wide variety of environments. A generalized method is proposed for building trust with stakeholders when it comes to using autonomous systems, with application to a lunar construction pathfinder mission.
This method was derived using the NASA STAR-L system, which provides an excellent framework for categorizing the level of technological readiness and, most importantly, stakeholder trust. Case studies of already-deployed autonomous systems, which identify mutually reinforcing methods for cultivating stakeholder trust, were also evaluated as part of the development of this method. These include gradual increases in autonomy with human oversight and clear benchmarks for assessing each stage leading up to full autonomy. The case studies not only defined methods for building trust within each phase but also provided guidance on transitioning from one phase to the next, leading to full certification as a fully autonomous system.
This generalized method for building trust with stakeholders was then applied to a lunar construction pathfinder mission currently under development. Starting from hardware design and algorithm development, through to operational procedures, the gradual increase in autonomous capabilities was considered, thereby increasing human trust in these systems.
We hope that this paper will provide a starting point for future discussions about how we combine building advanced lunar automation capabilities in conjunction with building a deep level of trust in these systems from stakeholders. Such discussions are vital if humankind is to advance space exploration beyond our current capabilities.

Author Contributions

Conceptualization, C.S.D. and P.G.; methodology, C.S.D., P.G. and S.L.W.; formal analysis, C.S.D., P.G. and S.L.W.; investigation, C.S.D., P.G., S.L.W., D.A., R.F., L.M.L., A.N. and N.P.; writing—original draft preparation, C.S.D., P.G., S.L.W., D.A., R.F., L.M.L., A.N. and N.P.; writing—review and editing, C.S.D., P.G., S.L.W., D.A., R.F., L.M.L., A.N. and N.P. All authors have read and agreed to the published version of the manuscript.

Funding

Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Acknowledgments

The authors would like to acknowledge Andrew Ogilvie for his thoughtful comments on the Orbital Express program mentioned in Section 2.4, Case Study: Orbital Express Robotics Rendezvous in Low Earth Orbit, as well as Andrew Allen for his excellent comments and review of the paper.

Conflicts of Interest

Authors Dr. Cameron Dickinson and Dr. Paul Grouchy were employed by the company MDA Space Ltd. Author Mr. Anh Nguyen was employed by the company Bombardier Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Trust Readiness Levels (TrRLs)

Trust Readiness Levels (TrRLs) as defined by Hobbs et al. [53]:
TrRL1: The system’s conceptual performance is acceptable to the designer.
TrRL2: The system’s task performance is understandable (traceable and logical) to the designer.
TrRL3: The system’s task performance is acceptable and understandable (traceable and logical) to a tester.
TrRL4: The system’s task performance is acceptable and understandable to a tester across multiple task conditions (inclusive of conditions that could invoke errors).
TrRL5: The system’s performance is acceptable and understandable (traceable and logical) to an operator in a simulated environment.
TrRL6: The system’s performance is acceptable and understandable (traceable and logical) to an operator in a relevant environment.
TrRL7: The system’s performance is acceptable and understandable (traceable and logical) to an operator in an operational environment.
TrRL8: The system’s performance is acceptable and understandable (traceable and logical) to an operator across multiple task conditions (inclusive of conditions known to invoke errors).
TrRL9: The system’s performance is universally accepted and understood by the community of operators across multiple task conditions (inclusive of conditions known to invoke errors).

References

  1. NASA. NASA Technology Roadmaps: TA 4 Robotics and Autonomous Systems. National Aeronautics and Space Administration, July 2015. Available online: https://www.nasa.gov/wp-content/uploads/2016/08/2015_nasa_technology_roadmaps_ta_4_robotics_and_autonomous_systems_final.pdf (accessed on 8 August 2025).
  2. Jones, C.A.; Stafford, M.; Latorella, K.; Bard, C.; Dorelli, J.; Rodgers, E.; Pensado, A.R.; Benjamin, G.; Lewis, S.; Patrick, A.; et al. Recommendations to Advance Space Trusted Autonomy. In Proceedings of the ASCEND 2021, Las Vegas, NV, USA, 15–17 November 2021; pp. 1–26. [Google Scholar]
  3. Hollaway, D.; Taylor, E.; Badger, J. When the Eyes Don’t Have It: Autonomous Control of Deep Space Vehicles for Human Spaceflight. In Proceedings of the ASCEND 2020, Online, 16–18 November 2020; Available online: https://ntrs.nasa.gov/citations/20205007978 (accessed on 8 August 2025).
  4. European Space Agency. Control, Autonomy and Intelligence. ESA. Available online: https://www.esa.int/Enabling_Support/Space_Engineering_Technology/Automation_and_Robotics/Control_Autonomy_and_Intelligence (accessed on 8 August 2025).
  5. European Cooperation for Space Standardization. On-Board Autonomy. ECSS. Available online: https://ecss.nl/item/?glossary_id=692 (accessed on 9 August 2025).
  6. Whitmire, A. The Behavioural Health Challenges of Mars Missions. National Academies of Sciences Panel on Biological and Physical Sciences and Human Factors, Open Meeting #1, 2024. Available online: https://ntrs.nasa.gov/api/citations/20240010442/downloads/Behavioral%20Challenges%20of%20a%20Mars%20Mission.pdf (accessed on 12 August 2025).
  7. Verma, V.; Maimone, M.W.; Gaines, D.M.; Francis, R.; Estlin, T.A.; Kuhn, S.R.; Thiel, E.R. Autonomous Robotics is Driving Perseverance Rover’s Progress on Mars. Sci. Robot. 2023, 8, 80. [Google Scholar] [CrossRef]
  8. Wagner, C.; Mauceri, C.; Twu, P.; Marchetti, Y.; Russino, J.; Aguilar, D.; Reeves, G. Demonstrating Autonomy for Complex Space Missions: A Europa Lander Mission Autonomy Prototype. J. Aerosp. Inf. Syst. 2024, 21, 37–57. [Google Scholar] [CrossRef]
  9. NASA. Data Rate Increase on the International Space Station Supports Future Exploration; NASA, n.d. Available online: https://ntrs.nasa.gov/api/citations/20190025199/downloads/20190025199.pdf (accessed on 9 August 2025).
  10. Moore, R.C. Satellite RF Communications and Onboard Processing. In Encyclopedia of Physical Science and Technology, 3rd ed.; Academic Press: Cambridge, MA, USA, 2003; pp. 439–455. [Google Scholar] [CrossRef]
  11. Raible, D. Space Communications and Navigation. Technical Report; NASA, n.d. Available online: https://ntrs.nasa.gov/api/citations/20210010148/downloads/Mars%20Perseverance%20Communications%202021%20revised.pdf (accessed on 10 August 2025).
  12. Taylor, J.; Lee, D.K.; Shambayati, S. Mars Reconnaissance Orbiter. Technical Report; NASA, n.d. Available online: https://descanso.jpl.nasa.gov/monograph/series13/DeepCommo_Chapter6–141029.pdf (accessed on 11 August 2025).
  13. Davis, P. Spacecraft. Available online: https://science.nasa.gov/mission/voyager/spacecraft (accessed on 14 August 2025).
  14. Belobrajdic, B.; Melone, K.; Diaz-Artiles, A. Planetary Extravehicular Activity (EVA) Risk Mitigation Strategies for Long-Duration Space Missions. NPJ Microgravity 2021, 7, 16. Available online: https://pmc.ncbi.nlm.nih.gov/articles/PMC8115028/ (accessed on 14 August 2025). [CrossRef]
  15. Aziz, S. Development and Verification of Ground-Based Tele-Robotics Operations Concept for Dextre. Acta Astronaut. 2013, 86, 1–9. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0094576511003286 (accessed on 14 August 2025). [CrossRef]
  16. NASA. Exploration EVA Challenges: Human Health & Performance. Technical Report; NASA, July 2019. Available online: https://www.nasa.gov/wp-content/uploads/2018/11/hhp_eva_workshop_july_2019_final.pdf (accessed on 12 August 2025).
  17. Feinberg, L.D.; Ziemer, J.; Ansdel, M.; Crooke, J.; Dressing, C.; Mennsesson, B.; O’Meara, J.; Pepper, J.; Roberge, A. The Habitable Worlds Observatory Engineering View: Status, Plans and Opportunities. Technical Report; Goddard Space Flight Center, NASA, 2024. Available online: https://ntrs.nasa.gov/api/citations/20240006497/downloads/HWO%20Engineering%20View%20Status%20Plans%20Opportunities.pdf (accessed on 14 August 2025).
  18. Cash, I. Cassiopeia–A New Paradigm for Space Solar Power. Acta Astronaut. 2019, 159, 170–178. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0094576518320708 (accessed on 12 August 2025). [CrossRef]
  19. Badger, J.; Nguyen, V.; Mehling, J.; Hambuchen, K.; Diftler, M.; Luna, R.; Joyce, C. Towards Autonomous Operations of Robonaut 2 Humanoid Robotic Testbed. In Proceedings of the ISS Research and Development Conference, San Diego, CA, USA, 12–14 July 2016; Available online: https://ntrs.nasa.gov/citations/20160003480 (accessed on 15 August 2025).
  20. Listenbee, C. Kavraki Lab Develops Framework for NASA’s Robonaut 2. Available online: https://csweb.rice.edu/news/kavraki-lab-develops-framework-nasas-robonaut-2 (accessed on 15 August 2025).
  21. Coltin, B. Astrobee: The International Space Station Robotic Freeflyer. Technical Report; NASA Technical Reports Server, n.d. Available online: https://ntrs.nasa.gov/api/citations/20240012458/downloads/astronomy_society.pdf (accessed on 15 August 2025).
  22. Oda, M.; Kawano, I.; Inaba, N.; Mokuno, M. On-Ground Tele-Operation and On-Board Autonomous Control for the ETS-VII’s Rendezvous Docking and Space Robot Experiments. Technical Report; National Space Development Agency of Japan, n.d. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=1447c7ec8f384a0a6f84fb3570964f5ec1edfead (accessed on 15 August 2025).
  23. Howard, R.T.; Heaton, A.F.; Pinson, R.M.; Carrington, C.L.; Lee, J.E.; Bryan, T.C.; Johnson, J.E. The Advanced Video Guidance Sensor: Orbital Express and the Next Generation. AIP Conf. Proc. 2008, 969, 717–724. Available online: https://ntrs.nasa.gov/api/citations/20080015656/downloads/20080015656.pdf (accessed on 15 August 2025).
  24. Chouinard, C.; Knight, R.; Jones, G.; Tran, D. An ASPEN Application: Automating Ground Operations for Orbital Express. Technical Report; Jet Propulsion Laboratory, n.d. Available online: https://ccia.ugr.es/~lcv/SPARK/08/Papers/paper10.pdf (accessed on 15 August 2025).
  25. Pinard, D.; Reynaud, S.; Delpy, P.; Strandmoe, S.E. Accurate and Autonomous Navigation for the ATV. Aerosp. Sci. Technol. 2007, 11, 490–498. Available online: https://www.sciencedirect.com/science/article/abs/pii/S1270963807000624 (accessed on 15 August 2025).
  26. European Space Agency. Safety and Autonomy Make the ATV Unique. Available online: https://www.esa.int/Science_Exploration/Human_and_Robotic_Exploration/ATV/Safety_and_autonomy_make_the_ATV_unique (accessed on 15 August 2025).
  27. Friesen, T. What Is NASA’s Distributed Spacecraft Autonomy? Available online: https://www.nasa.gov/centers-and-facilities/ames/what-is-nasas-distributed-spacecraft-autonomy/ (accessed on 15 August 2025).
  28. European Space Agency (ESA). PROBA-3’s First Autonomous Formation Flight. Available online: https://www.esa.int/Enabling_Support/Space_Engineering_Technology/Proba-3/Proba-3_s_first_autonomous_formation_flight (accessed on 15 August 2025).
  29. Liu, J.; Ren, X.; Yan, W.; Li, C.; Zhang, H.; Jia, Y.; Wen, W. Descent Trajectory Reconstruction and Landing Site Positioning of Chang’E-4 on the Lunar Farside. Nat. Commun. 2019, 10, 4229. [Google Scholar] [CrossRef] [PubMed]
  30. Ghosh, R.; Tomar, S.; Mhatre, C.S.; Sumithra, K.; Gvp, B.K.; Siva, M.S. Path Planning for the Pragyan Rover: Experiences and Challenges. In Proceedings of the 2024 International Conference on Space Robotics (iSpaRo), Luxembourg, 24–27 June 2024; pp. 70–75. [Google Scholar] [CrossRef]
  31. Sakai, S.; Kushiki, K.; Sawai, S.; Fukuda, S.; Miyazawa, Y.; Ishida, T.; Saito, H. Moon Landing Results of SLIM: A Smart Lander for Investigating the Moon. Acta Astronaut. 2025, 235, 47–54. [Google Scholar] [CrossRef]
  32. Inazawa, M.; Hirano, D.; Sutoh, M.; Sawada, H.; Nagata, M.; Sakoda, G. Autonomous Control of Lunar Excursion Vehicle 2 (LEV-2). In Proceedings of the 2024 International Conference on Space Robotics (iSpaRo), Luxembourg, 24–27 June 2024; pp. 224–230. [Google Scholar] [CrossRef]
  33. Bajracharya, M.; Maimone, M.W.; Helmick, D. Autonomy for Mars Rovers: Past, Present, and Future. Comput. IEEE Comput. Soc. 2008, 41, 44–50. [Google Scholar] [CrossRef]
  34. Pederson, L.; Kortenkamp, D.; Wettergreen, D.; Nourbakhsh, I. A Survey of Space Robotics. In Proceedings of the 7th International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), Nara, Japan, 19–23 May 2003; Available online: https://ntrs.nasa.gov/api/citations/20030054507/downloads/2003isairas.pedersen.pdf (accessed on 15 August 2025).
  35. Mishkin, A.H.; Morrison, J.C.; Nguyen, T.T.; Stone, H.W.; Cooper, B.K.; Wilcox, B.H. Experiences with Operations and Autonomy of the Mars Pathfinder Microrover. In Proceedings of the 1998 IEEE Aerospace Conference Proceedings (Cat. No.98TH8339), Snowmass, CO, USA, 28 March 1998; Volume 2, pp. 337–351. [Google Scholar] [CrossRef]
  36. Ai-Chang, M.; Bresina, J.; Charest, L.; Jonsson, A.; Hsu, J.; Kanefsky, B.; Yglesias, J. MAPGEN: Mixed Initiative Planning and Scheduling for the Mars ‘03 MER Mission. AAAI Technical Report SS-03-04, 2003. Available online: https://cdn.aaai.org/Symposia/Spring/2003/SS-03-04/SS03-04-001.pdf (accessed on 15 August 2025).
  37. Bresina, J.; Hsu, J.; Jonsson, A.; Kanefsky, B.; McCurdy, M.; Yglesias, J. MAPGEN: Mixed-Initiative Activity Planning for the Mars Exploration Rover Mission, 2004. Available online: https://ntrs.nasa.gov/api/citations/20040084378/downloads/20040084378.pdf (accessed on 15 August 2025).
  38. Verma, V.; Huang, J.; Bailey, P.; Carsten, J.; Klein, D. Perseverance Rover Collision Model for a Range of Autonomous Behaviors. In Proceedings of the 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, USA, 5–12 March 2022; pp. 1–18. [Google Scholar] [CrossRef]
  39. Bonitz, R.G.; Shiraishi, L.; Robinson, M.; Arvidson, R.E.; Chu, P.C.; Wilson, J.J.; Smith, P. NASA Mars 2007 Phoenix Lander Robotic Arm and Icy Soil Acquisition Device. J. Geophys. Res. Planets 2008, 113, e3. [Google Scholar] [CrossRef]
  40. Roumage, G.; Azaiez, S.; Faure, C.; Louise, S. The Ingenuity Mars Helicopter Specified and Analyzed with the Real-Time Mode-Aware Dataflow Model. arXiv 2025. [Google Scholar] [CrossRef]
  41. Anderson, J.L.; Karras, J.T.; Cacan, M.; Kubiak, G.; Pyrzak, G.; Dor, H.; Pipenberg, B. Ingenuity: One Year of Flying on Mars. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; pp. 1–18. [Google Scholar] [CrossRef]
  42. Ackerman, E. How NASA Designed a Helicopter That Could Fly Autonomously on Mars. IEEE Spectrum. Available online: https://spectrum.ieee.org/nasa-designed-perseverance-helicopter-rover-fly-autonomously-mars (accessed on 16 August 2025).
  43. Basich, C.; Mauceri, C.; Kubiak, G.; Delfa, J.; Candela, A.; Proença, P.; Chien, S. Onboard Autonomous Health Assessment and Global Localization for the Mars Helicopter: Towards Multi-Flight Operations. In Proceedings of the International Symposium on Artificial Intelligence, Robotics, and Automation for Space (i-SAIRAS), Brisbane, Australia, 19–21 November 2024; Available online: https://ai.jpl.nasa.gov/public/documents/papers/outlast-isairas-2024.pdf (accessed on 16 August 2025).
  44. NASA. MAVEN Mission and Spacecraft Description. Available online: https://pds-atmospheres.nmsu.edu/data_and_services/atmospheres_data/MAVEN/logs/Mission%20and%20spacecraft%20description.pdf (accessed on 16 August 2025).
  45. Amiri, H.E.S.; Brain, D.; Sharaf, O.; Withnell, P.; McGrath, M.; Alloghani, M.; Yousuf, M. The Emirates Mars Mission. Space Sci. Rev. 2022, 218, 4. [Google Scholar] [CrossRef] [PubMed]
  46. Riedel, J.; Bhaskaran, S.; Synnott, S.P.; Desai, S.D.; Bollman, W.E.; Dumont, P.J.; Williams, B.G. Navigation for the New Millennium: Autonomous Navigation for Deep-Space 1. In Space Flight Dynamics, Proceedings of the 12th International Symposium, Darmstadt, Germany, 2–6 June 1997; European Space Agency: Paris, France, 1997; Available online: https://articles.adsabs.harvard.edu//full/1997ESASP.403..303R/0000303.000.html (accessed on 16 August 2025).
  47. Muscettola, N.; Nayak, P.P.; Pell, B.; Williams, B.C. Remote Agent: To Boldly Go Where No AI System Has Gone Before. Artif. Intell. 1998, 103, 5–47. [Google Scholar] [CrossRef]
  48. Ogawa, N.; Terui, F.; Mimasu, Y.; Yoshikawa, K.; Ono, G.; Yasuda, S.; Tsuda, M. Image-Based Autonomous Navigation of Hayabusa2 Using Artificial Landmarks: The Design and Brief In-Flight Results of the First Landing on Asteroid Ryugu. Astrodynamics 2020, 4, 89–103. [Google Scholar] [CrossRef]
  49. Lorenz, D.A.; Olds, R.; May, A.; Mario, C.; Perry, M.E.; Palmer, E.E.; Daly, M. Lessons Learned from OSIRIS-REx Autonomous Navigation Using Natural Feature Tracking. In Proceedings of the 2017 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2017; pp. 1–12. [Google Scholar] [CrossRef]
  50. Bradley, B.; Brennan, C.; Buffington, B.; Burgoyne, H.; Dooley, J.; Evans, J.; Cook, K. Europa Clipper Mission: System Integration Review Report. In Proceedings of the 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, USA, 5–12 March 2022; pp. 1–15. [Google Scholar] [CrossRef]
  51. Cangahuala, L.A.; Campagnola, S.; Bradley, B.K.; Boone, D.R.; Buffington, B.B.; Ludwinski, J.M.; Scott, C.J. Europa Clipper Mission Design, Mission Plan, and Navigation. Space Sci. Rev. 2025, 221, 22. [Google Scholar] [CrossRef]
  52. Srinivasan, J.M.; Barltrop, C.; Berman, S.; Bushman, S.; Dickson, J.; Drain, T.; Tuszynski, M. Europa Clipper Flight System Overview. Space Sci. Rev. 2025, 221, 30. [Google Scholar] [CrossRef]
  53. Hobbs, K.L.; Lyons, J.B.; Feather, M.S.; Bycroft, B.P.; Phillips, S.; Simon, M.; Paine, S. Space Trusted Autonomy Readiness Levels. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; pp. 1–17. [Google Scholar] [CrossRef]
  54. SAE Standard J3016; Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. SAE International: Warrendale, PA, USA, 2021.
  55. SAE International. SAE Levels of Driving Automation™ Refined for Clarity and International Audience. Available online: https://www.sae.org/standards/j3016_202104-taxonomy-definitions-terms-related-driving-automation-systems-road-motor-vehicles (accessed on 9 August 2025).
  56. European Union Aviation Safety Agency. EASA Artificial Intelligence Roadmap 2.0-A Human-Centric Approach to AI in Aviation. 2025. Available online: https://www.easa.europa.eu/en/document-library/general-publications/easa-artificial-intelligence-roadmap-20 (accessed on 8 August 2025).
  57. National Highway Traffic Safety Administration (NHTSA). Critical Reasons for Crashes Investigated in the National Motor Vehicle Crash Causation Survey. DOT HS 812 565, NHTSA National Center for Statistics and Analysis, Washington, DC, USA, 2015. Available online: https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812506 (accessed on 18 September 2025).
  58. Koopman, P.; Wagner, M. A Safety Standard Approach for Fully Autonomous Vehicles. SAE Int. J. Connect. Autom. Veh. 2018, 1, 8–24. [Google Scholar]
  59. Kalra, N.; Paddock, S.M. Driving to Safety: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability? Transp. Res. Part A Policy Pract. 2016, 94, 182–193. [Google Scholar] [CrossRef]
  60. Waymo. Waymo Safety Impact Report. Waymo Safety Hub, 2023. Available online: https://waymo.com/safety/impact/ (accessed on 18 September 2025).
  61. Kalra, N.; Groves, D.G. The Enemy of Good: Estimating the Cost of Waiting for Nearly Perfect Automated Vehicles. RAND Corporation. 2017. Available online: https://www.rand.org/pubs/research_reports/RR2150.html (accessed on 18 September 2025).
  62. Blincoe, L.J.; Miller, T.R.; Zaloshnja, E.; Lawrence, B. The Economic and Societal Impact of Motor Vehicle Crashes, 2010. NHTSA, DOT HS 812 013, May 2015. Available online: https://rosap.ntl.bts.gov/view/dot/78697 (accessed on 18 September 2025).
  63. Scanlon, J.M.; Kusano, K.D.; Engström, J.; Victor, T. Collision Avoidance Effectiveness of An Automated Driving System Using A Human Driver Behavior Reference Model in Reconstructed Fatal Collisions, 2022. Waymo, LLC. Available online: https://waymo.com/research/collision-avoidance-effectiveness-of-an-automated/ (accessed on 18 September 2025).
  64. ISO 26262:2018; Road Vehicles—Functional Safety. International Organization for Standardization: Geneva, Switzerland, 2018.
  65. ISO 21448:2022; Road Vehicles—Safety of the Intended Functionality (SOTIF). International Organization for Standardization: Geneva, Switzerland, 2022.
  66. Federal Aviation Administration. Roadmap for Artificial Intelligence Safety Assurance. 2023. Available online: https://www.faa.gov/sites/faa.gov/files/2023-03/AI_Safety_Roadmap.pdf (accessed on 22 August 2025).
  67. European Union Aviation Safety Agency. Guidance for Level 1 & 2 Machine Learning Applications, Concept Paper. 2023. Available online: https://www.easa.europa.eu/en/document-library/general-publications/easa-artificial-intelligence-concept-paper-issue-2 (accessed on 22 August 2025).
  68. Daedalean. Runway Landing Guidance. Available online: https://www.daedalean.ai/capabilities/landing (accessed on 10 August 2025).
  69. Wisk. Autonomy, Explained. Available online: https://wisk.aero/autonomy/ (accessed on 10 August 2025).
  70. Lucier, L.M.; Kirkpatrick, K.C.; Ramirez-Serrano, A. Lessons Learned in the Introduction of Automation and Autonomy to International Space Station (ISS) Robotics Operations Planning. In Proceedings of the 16th International Conference on Space Operations, Cape Town, South Africa, 18–22 May 2020; Available online: https://www.researchgate.net/publication/351391347_Lessons_Learned_in_the_Introduction_of_Automation_and_Autonomy_to_International_Space_Station_ISS_Robotics_Operations_Planning (accessed on 28 August 2025).
  71. Rembala, R.; Braithwaite, T.; Langley, C.S.; Lamarche, L.; Smith, B. The Utilization of ISS Canadian Robotics to Advance Variable Autonomy Robotic Techniques and Technologies for Future Deep Space Exploration Missions from Cislunar Space to Mars. In Proceedings of the 67th International Astronautical Congress, Guadalajara, Mexico, 26–30 September 2016; Available online: https://iafastro.directory/iac/paper/id/34101/summary/ (accessed on 28 August 2025).
  72. Friend, R.B. Orbital Express Program Summary and Mission Overview. In Proceedings of the Sensors and Systems for Space Applications II, Orlando, FL, USA, 17–18 April 2008; SPIE: Bellingham, WA, USA, 2008; Volume 6958, p. 695803. [Google Scholar] [CrossRef]
  73. Motaghedi, P. On-orbit performance of the Orbital Express Capture System. In Proceedings of the Sensors and Systems for Space Applications II, Orlando, FL, USA, 17–18 April 2008; SPIE: Bellingham, WA, USA, 2008; Volume 6958, p. 69580E. [Google Scholar] [CrossRef]
  74. Ogilvie, A.; Allport, J.; Hannah, M.; Lymer, J. Autonomous robotic operations for on-orbit satellite servicing. In Proceedings of the Sensors and Systems for Space Applications II, Orlando, FL, USA, 17–18 April 2008; SPIE: Bellingham, WA, USA, 2008; Volume 6958, p. 695809. [Google Scholar] [CrossRef]
  75. Vasavada, A.R. Mission Overview and Scientific Contributions from the Mars Science Laboratory (MSL) after Eight Years of Surface Operations. Space Sci. Rev. 2022, 218, 14. [Google Scholar] [CrossRef]
  76. Farley, K.A.; Williford, K.H.; Stack, K.M.; Bhartia, R.; Chen, A.; de la Torre, M.; Wiens, R. Mars 2020 Mission Overview. Space Sci. Rev. 2020, 216, 8. [Google Scholar] [CrossRef]
  77. Rankin, A.; Maimone, M.; Biesiadecki, J.; Patel, N.; Levine, D.; Toupet, O. Driving Curiosity: Mars Rover Mobility Trends During the First Seven Years. In Proceedings of the 2020 IEEE Aerospace Conference, Big Sky, MT, USA, 7–14 March 2020; pp. 1–19. [Google Scholar] [CrossRef]
  78. Siegfriedt, R.; Chien, S.; Gaines, D.; Kuhn, S.; Hazelrig, J.; Biehl, J.; Connell, A.; Francis, R.; Waldram, N. Mars 2020 Onboard Planner—Update And Preparations For Operations. In Proceedings of the Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA 2023), Leiden, The Netherlands, 18–20 October 2023; Available online: https://ai.jpl.nasa.gov/public/documents/papers/M2020-SP-ASTRA-2023.pdf (accessed on 2 October 2025).
  79. Francis, R.; Estlin, T.; Doran, G.; Johnstone, S.; Gaines, D.; Verma, V.; Burl, M.; Frydenvang, J.; Montaño, S.; Wiens, R.C.; et al. AEGIS autonomous targeting for ChemCam on Mars Science Laboratory: Deployment and results of initial science team use. Sci. Robot. 2017, 2, 7. [Google Scholar] [CrossRef]
  80. Lawson, P.R.; Kizovski, T.V.; Tice, M.M.; Clark, B.C.; VanBommel, S.J.; Thompson, D.R.; Wade, L.A.; Denise, R.W.; Heirwegh, C.M.; Elam, W.T.; et al. Adaptive sampling with PIXL on the Mars Perseverance rover. Icarus 2025, 429, 116433. [Google Scholar] [CrossRef]
  81. Rankin, A.; Del Sesto, T.; Hwang, P.; Justice, H.; Maimone, M.; Verma, V.; Graser, E. Perseverance Rapid Traverse Campaign. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; pp. 1–16. [Google Scholar] [CrossRef]
  82. Herbert, E. Risk Management for Commercial-Off-The-Shelf Parts Based Space Hardware. Syst. Eng. 2025, 28, 374–392. [Google Scholar] [CrossRef]
  83. Chen, Y.; Li, X.; Guo, S.; Ng, X.Y.; Ang, M. Real2Sim or Sim2Real: Robotics Visual Insertion using Deep Reinforcement Learning and Real2Sim Policy Adaptation. arXiv 2022, arXiv:2206.02679. [Google Scholar] [CrossRef]
  84. Pfaff, N.; Fu, E.; Binagia, J.; Isola, P.; Tedrake, R. Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups. arXiv 2025, arXiv:2503.00370. [Google Scholar] [CrossRef]
  85. Nourizadeh, P.; Milford, M.; Fischer, T. Teach and Repeat Navigation: A Robust Control Approach. arXiv 2024, arXiv:2309.15405. [Google Scholar]
  86. Ahmed, S.K. How to choose a sampling technique and determine sample size for research: A simplified guide for researchers. Oral Oncol. Rep. 2024, 12, 100662. [Google Scholar] [CrossRef]
  87. Dickinson, C.S.; Shi, F.N.; Vasudeva, K.; Mukherjee, R.M.; Blanchard, J.; Debrule, S.; Empey, J.; Kugler, J.; Maghoul, P.; Ryan, A.J.; et al. A Pathfinder Lunar Construction Mission Concept Using Regolith Filled Bags. Aerospace 2025, 12, 4135. [Google Scholar]
Figure 1. STAR-L Classification Levels Plotted on the Trust and Technology Readiness Axes. Green indicates the optimal development path for an autonomous system; red/orange indicates suboptimal use of the technology [53].
Figure 2. SAE J3016 Classification Levels [55].
Figure 3. EASA AI Classification Levels [56].
Figure 4. LoA Classification Levels [3].
Figure 5. Development View of the VLS [68].
Figure 6. Auto Taxi Project: High-Level Control Structure. The arrows between controllers denote information flows [67].
Figure 7. Global view of the learning assurance W-shaped process alongside the V-cycle process for non-AI/ML constituents. The dotted line distinguishes traditional development assurance processes (above) from processes adapted to data-driven learning approaches (below) [67].
Figure 8. Sample Autonomy Task Test Plan Diagram. Blue denotes simple (atomic) tasks; orange denotes compound tasks that involve a series of operations; and green denotes operational logic, where decisions are made onboard in real time based on feedback from the system.
Figure 9. Sample Multi-Environment Autonomy Task Test Plan Diagram. Blue denotes simple (atomic) tasks; orange denotes compound tasks that involve a series of operations; and green denotes operational logic, where decisions are made onboard in real time based on feedback from the system.
Figure 10. Robotic Arm and Vision System Atomic Task Breakdown for LUNAR BRICS.
Figure 11. Autonomy Task Test Plan (ATTP) Diagram for LUNAR BRICS Pathfinder Mission. Blue denotes simple (atomic) tasks; orange denotes compound tasks that involve a series of operations; and green denotes operational logic, where decisions are made onboard in real time based on feedback from the system. Line colours indicate task grouping.
Table 1. SAE J3016 Level Description and STAR-L Designation, adapted from [55].
SAE Level | STAR-L TrRL | Description of SAE J3016 Level
Support Features: Human is in control, must supervise feature performance
0 | 4 | Limited features: Warnings and short-term assistance in emergencies
1 | 4–5 | Supportive features: Steering or speed control
2 | 5–6 | Combination: Steering and speed control
Automated Features: Human is not primarily in control, may provide support
3 | 5–6 | Operational in ideal conditions, requests human control when needed
4 | 7–8 | Operational in ideal conditions and is self-sufficient
5 | 9 | Independently operational in all conditions
Table 2. EASA Level Description and STAR-L Designation, adapted from [56].
EASA Level | STAR-L TrRL | Description of EASA Level
AI Assistance to Humans
1A | 4 | Human Augmentation
1B | 4–5 | Human utilizes AI decision-making assistance
Human and AI Teaming
2A | 5–6 | AI provides information by following a task pattern for human decisions
2B | 5–6 | Human and AI collaborate on problem-solving toward a unified goal
Advanced Automation
3A | 7–8 | AI makes decisions and actions with human supervision
3B | 9 | AI makes non-supervised decisions and actions
Table 3. LoA Level Description and STAR-L Designation, adapted from [3].
LoA Level | STAR-L TrRL | Description of LoA Level
0 | 1–3 | Manual human operations
1 | 4 | System provides information when requested
2 | 5 | System suggests alternatives for human decision-making
3 | 6 | System proposes decisions for human approval
4 | 7–8 | System acts unless overruled by human intervention
5 | 9 | System acts independently with full confidence
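The level-to-TrRL mappings in Tables 1–3 are simple lookup tables, which makes cross-framework comparison mechanical. The sketch below is an illustrative encoding (the dictionary names and the `trrl_overlap` helper are ours, not drawn from the cited standards):

```python
# Illustrative encoding of the level-to-TrRL mappings in Tables 1-3.
# Keys are framework level labels; values are (min TrRL, max TrRL) ranges.
SAE_TO_TRRL = {"0": (4, 4), "1": (4, 5), "2": (5, 6),
               "3": (5, 6), "4": (7, 8), "5": (9, 9)}
EASA_TO_TRRL = {"1A": (4, 4), "1B": (4, 5), "2A": (5, 6),
                "2B": (5, 6), "3A": (7, 8), "3B": (9, 9)}
LOA_TO_TRRL = {"0": (1, 3), "1": (4, 4), "2": (5, 5),
               "3": (6, 6), "4": (7, 8), "5": (9, 9)}

def trrl_overlap(range_a, range_b):
    """Return the TrRL range shared by two framework levels, or None."""
    lo, hi = max(range_a[0], range_b[0]), min(range_a[1], range_b[1])
    return (lo, hi) if lo <= hi else None

# SAE Level 4 and LoA Level 4 both correspond to TrRL 7-8:
print(trrl_overlap(SAE_TO_TRRL["4"], LOA_TO_TRRL["4"]))  # (7, 8)
```

Expressing the tables this way lets a certification plan check, for instance, whether a requirement stated against one taxonomy is satisfiable by evidence gathered against another.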
Table 4. Comparison of Incident Rates between Waymo and Human Drivers [60].
Severity Tier | Waymo IPMM | Human IPMM | Reduction | 95% CI
Any Injury Reported | 0.85 | 4.04 | 79% | −85% to −71%
Airbag Deployment | 0.32 | 1.69 | 81% | −88% to −68%
Suspected Serious Injury+ | 0.04 | 0.24 | 85% | −94% to −46%
Police-Reported (all severities) | ∼2.1 | ∼4.68 | 55% | −62% to −45%
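The Reduction column follows directly from the two incidents-per-million-miles (IPMM) rates. The quick check below uses the rounded rates from Table 4, so the suspected-serious-injury row comes out near 83% rather than the published 85%, which Waymo computes from unrounded counts:

```python
def percent_reduction(waymo_ipmm: float, human_ipmm: float) -> int:
    """Relative reduction in incidents per million miles, as a rounded percent."""
    return round(100 * (1 - waymo_ipmm / human_ipmm))

# Rounded rates from Table 4 reproduce three of the published reductions exactly.
rows = {
    "Any Injury Reported": (0.85, 4.04),          # -> 79
    "Airbag Deployment": (0.32, 1.69),            # -> 81
    "Police-Reported (all severities)": (2.1, 4.68),  # -> 55
}
for tier, (waymo, human) in rows.items():
    print(tier, percent_reduction(waymo, human))
```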
Table 5. EASA AI Levels [67].
AI Level | Function Allocated to the System to Contribute to the High-Level Task | Authority of End User
1A: Human Augmentation | Automation support to information acquisition/analysis | Full
1B: Human Assistance | Automation support to decision-making | Full
2A: Human–AI Cooperation | Directed decision and automatic action implementation | Full
2B: Human–AI Collaboration | Supervised automatic decision and action implementation | Partial
3A: Safeguarded Advanced Automation | Safeguarded automatic decision and action implementation | Limited, upon alerting
3B: Non-supervised Advanced Automation | Non-supervised automatic decision and action implementation | Not applicable
Table 6. Autonomy Levels by Task and Scenario, as taken from a combination of papers from the International Society for Optics and Photonics (SPIE) [72,74].
Task | Scenario 0 | Scenario 1 | Scenario 2 | Scenario 8
Coupler mating | 1 | | |
Fluid (propellant) transfer | 1 | 2 | 3 | 4
ORU transfer | 1 | 2 | 3 | 4
OEDMS grapple and/or free-flyer capture & berthing | | | | 4
Undocking | | | 4 | 4
Approach | | | 4 | 4
Station-keeping | | | 4 | 4
Direct capture | | | | 4
Autonomous fly-around | | | | 4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dickinson, C.S.; Alam, D.; Francis, R.; Lucier, L.M.; Nguyen, A.; Prosser, N.; Waslander, S.L.; Grouchy, P. A Method for Lunar Surface Autonomy Certification: Application to a Construction Pathfinder Mission. Aerospace 2025, 12, 1115. https://doi.org/10.3390/aerospace12121115
