1. Introduction
Surface mining occupies a uniquely hazardous position among industrial sectors. It is characterized by the interaction of large-scale mobile equipment, complex operating conditions, and the persistent exposure of personnel to high-risk environments. The combination of heavy-haul trucks, monotonous work cycles, and challenging terrain introduces fatigue, human error, and the potential for accidents. This makes operational failures both severe and often irreversible [
1,
2]. The emergence of Autonomous Haulage Systems (AHS) represents one of the most significant technological transitions in modern mining. This transition is driven by the need to eliminate the “human factor,” improve safety, and enhance productivity. AHS continue to evolve and are gaining popularity as haulage solutions in surface mining. AHS deployment in the late 2000s increased utilization, extended operating hours, and improved performance by up to 20% compared with conventional systems [
2,
3]. Modern AHS integrate advanced perception, control systems, communication networks, and intelligent decision-making to enable safer and more efficient haulage operations [
1,
3]. Leading original equipment manufacturers, such as Komatsu and Caterpillar, have fielded proprietary AHS platforms that are now operating at industrial scale across surface mining sites on multiple continents [
2].
Despite these achievements, fundamental architectural limitations persist. Perception in current AHS is predominantly egocentric. Contemporary systems maintain situational awareness through onboard sensors such as LiDAR, radar, and cameras, supported by GNSS positioning and fleet management software for routing and scheduling [
4,
5]. This vehicle-centric approach has proven effective within carefully delineated operational design domains.
Modern commercial mining AHS integrate these sensing and fleet management components, together with obstacle detection, into highly reliable production systems. However, the systems remain largely vehicle-centric: they rely on onboard sensing and make limited use of external environmental information.
Urban and highway autonomous driving have made substantial strides in perception and object detection in recent years. Benchmark datasets such as KITTI [
6] and the Waymo Open Dataset [
7] established standardized protocols for evaluating LiDAR- and camera-based object detection, tracking, and scene understanding. Similarly, the single-stage detector architecture YOLO [
8] is widely used in onboard object-detection systems that are being adapted for haul trucks, as illustrated by the mining-specific YOLO variants reviewed in Section 3 [
9,
10]. However, these benchmarks were developed for structured autonomous driving, and their underlying assumptions of consistent road geometry, defined lane structures, and comparatively predictable traffic behavior do not transfer directly to the unstructured, dust-laden, and geotechnically dynamic conditions of open-pit mines. Consequently, research on mining AHS focuses on adapting these architectures through domain adaptation, transfer learning, and mining-specific datasets rather than applying them directly. This gap between general-purpose autonomous-driving perception research and the operational realities of surface mining motivates much of the mining-specific adaptation work reviewed in this paper.
The challenge is both perceptual and contextual. From a sensing standpoint, the surface mining environment presents some of the most difficult operating conditions for autonomous systems. Dust clouds generated by haul trucks operating over unconsolidated material substantially degrade LiDAR performance, introducing noise into point cloud data and reducing the detection range, often by more than half in extreme cases [
11,
12]. Vision-based sensors are similarly constrained, as they are highly sensitive to illumination variability. This includes glare from direct sunlight, low-light conditions at night, and motion-induced image distortion caused by continuous vehicle vibration [
13,
14,
15]. Collectively, these factors create persistent visibility-degradation scenarios that current egocentric sensor fusion approaches struggle to mitigate.
At a broader level, conventional vehicle-centric perception frameworks treat each autonomous unit as an independent agent that relies exclusively on its onboard sensing systems for navigation and safety-critical decision-making. This inherently localized perception paradigm creates significant blind spots throughout the operational environment, because sensor range, line of sight, and environmental conditions constrain system awareness. Hazards such as slope instability, localized ground weakening, or the presence of manned equipment may exist beyond the perceptual horizon of a single vehicle. These limitations are well recognized in the autonomous-systems literature: perception that is confined to immediate sensing and degraded by environmental conditions yields incomplete situational awareness [
4,
16,
17]. As a result, autonomous haulage operations often require supplementary input such as externally defined operational constraints or supervisory control to compensate for gaps in perception. This underscores a fundamental limitation of current vehicle-centric autonomy paradigms.
Similar limitations have been observed in intelligent transportation systems, where complex real-world environments have shown that vehicle-based autonomy alone is insufficient. In response, research has advanced toward cooperative perception frameworks such as Vehicle-to-Everything (V2X), which enable vehicles and infrastructure to exchange sensory and contextual data and thereby extend situational awareness beyond the limits of individual sensing [
18,
19,
20]. Simultaneously, the advent of digital twins and real-time geotechnical monitoring in mining contexts demonstrates the capacity to incorporate extensive environmental intelligence into operational decision-making [
21,
22]. In industrial practice, these ecosystem-level capabilities are already partially instantiated: commercial fleet management platforms such as Wenco Fleet Management System and Modular Mining’s dispatch coordinate truck–shovel assignment and routing across the mine [
23,
24,
25]. Wenco’s Fleet Management System provides real-time fleet dispatching, equipment tracking, production monitoring, and traffic coordination. This illustrates how operational intelligence already exists at the fleet level even though it remains largely disconnected from onboard perception. Ground-based slope-stability radar systems, such as those supplied by GroundProbe, have become a standard component of open-pit geotechnical monitoring programs [
26,
27,
28]. GroundProbe radar continuously measures slope deformation and provides early warnings of potential failures. However, these alerts are typically delivered to geotechnical personnel rather than directly integrated into autonomous haulage perception systems, illustrating the disconnect between environmental monitoring and vehicle autonomy.
In this review, ecosystem intelligence refers to a perception and decision-making paradigm that extends beyond vehicle-centric sensing by integrating information from multiple mine-wide sources, including connected vehicles, fleet management systems, fixed infrastructure sensors, digital twins, geotechnical monitoring platforms, and environmental sensing networks. By combining vehicle-level perception with ecosystem-level information, ecosystem intelligence enables a more comprehensive and context-aware understanding of the operational environment. This supports safer and more adaptive autonomous mining operations. Despite the availability of these technologies, their integration into a unified perception architecture remains under-developed.
This paper systematically reviews the gap between general-purpose autonomous-driving perception and its mining-specific adaptations, and examines the ecosystem intelligence available to support perception and decision-making in mining AHS. The paper is structured as follows. Section 2 describes the systematic review methodology. Section 3 reviews dynamic vision architectures in detail. Section 4 and Section 5 examine vehicle-level autonomy and fleet-scale ecosystem integration, respectively. Section 6 presents the ECDV framework. Section 7 addresses open challenges and emerging research directions. Section 8 and Section 9 present the discussion and conclusions, respectively.
2. Methodology
2.1. Review Design and Protocol Registration
This systematic review follows the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [
29]. The review protocol was prospectively registered with OSF before screening began and specified the research questions, search strategy, eligibility criteria, data extraction fields, and quality assessment procedure in advance of data collection. The review protocol was designed to synthesize evidence on dynamic vision architectures deployed and evaluated in surface-mining autonomy contexts and to identify the architectural gap between current egocentric implementations and ecosystem-level intelligence requirements.
2.2. Search Protocol
Searches were conducted across five electronic databases: IEEE Xplore, Scopus, Web of Science, ACM Digital Library, and Google Scholar, covering the period from January 2010 to 2026. The 2010 lower bound was selected to capture the first generation of deep learning-enabled perception systems while excluding pre-deep-learning literature that is methodologically distinct. The search was structured around three concept clusters combined with Boolean operators:
Cluster A (domain): “surface mining” OR “open pit” OR “open cut” OR “AHS” OR “autonomous haulage”
Cluster B (technology): “perception” OR “vision” OR “LiDAR” OR “sensor fusion” OR “deep learning” OR “object detection” OR “semantic segmentation”
Cluster C (systems scope): “autonomous” OR “autonomy” OR “cooperative perception” OR “fleet management” OR “ecosystem” OR “digital twin” OR “V2X”
The full search string applied was (Cluster A) AND (Cluster B) AND (Cluster C). For instance, the Scopus implementation was: ((“surface mining” OR “open pit” OR “open-cut” OR “AHS” OR “autonomous haulage”) AND (“perception” OR “vision” OR “LiDAR” OR “sensor fusion” OR “deep learning” OR “object detection” OR “semantic segmentation”) AND (“autonomous” OR “autonomy” OR “cooperative perception” OR “fleet management” OR “ecosystem” OR “digital twin” OR “V2X”)) AND PUBYEAR > 2009. The IEEE Xplore and Web of Science strings followed the same Boolean logic, substituted into each platform’s field-tag syntax. Google Scholar, which does not support nested Boolean field queries, was searched using the simplified phrase combination “autonomous haulage” OR “open pit” AND (“perception” OR “LiDAR” OR “sensor fusion”) AND (“cooperative” OR “digital twin” OR “V2X”), with the first 200 results screened per query.
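The three-cluster logic can also be expressed programmatically, which makes the query easier to audit and to port between databases. The following is a minimal sketch rather than the tooling used in this review: the function names are illustrative, straight quotes are assumed, and the term spellings follow the cluster lists above.

```python
# Term lists copied from Clusters A, B, and C.
CLUSTER_A = ["surface mining", "open pit", "open cut", "AHS", "autonomous haulage"]
CLUSTER_B = ["perception", "vision", "LiDAR", "sensor fusion", "deep learning",
             "object detection", "semantic segmentation"]
CLUSTER_C = ["autonomous", "autonomy", "cooperative perception", "fleet management",
             "ecosystem", "digital twin", "V2X"]


def or_group(terms):
    """Quote each term, join the terms with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(clusters, min_year_exclusive=None):
    """AND the OR-groups together; optionally append a Scopus-style year filter."""
    query = "(" + " AND ".join(or_group(c) for c in clusters) + ")"
    if min_year_exclusive is not None:
        query += f" AND PUBYEAR > {min_year_exclusive}"
    return query


scopus_query = build_query([CLUSTER_A, CLUSTER_B, CLUSTER_C], min_year_exclusive=2009)
print(scopus_query)
```

Other platforms would need their own field tags wrapped around the same groups, so only the Boolean skeleton is shared.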
2.3. Inclusion and Exclusion Criteria
Studies were included if they:
- (i) addressed sensing, perception, or decision-making for autonomous or semi-autonomous vehicles or systems in surface mining environments;
- (ii) documented evidence of fleet management, cooperative perception, digital twin, or geotechnical monitoring integration with AHS;
- (iii) reported original empirical results from field deployments, test-track trials, hardware system designs, or simulation experiments validated against real sensor data or hardware testing;
- (iv) were published in peer-reviewed journals, major conference proceedings (IEEE ICRA, IROS, ITSC, Mine Automation), or substantive industry technical reports from named OEMs and research institutes; and
- (v) were available in English.
Studies were excluded if they addressed exclusively underground mining, considered only teleoperation without autonomous perception, reported purely theoretical modeling without validation, or appeared only as abstracts or extended abstracts. Supplementary searches were conducted on the integration of geotechnical monitoring, digital twin applications in mining, and V2X cooperative perception.
2.4. Screening and Data Extraction
Title and abstract screening was conducted independently by two reviewers. The same two reviewers then independently applied the inclusion and exclusion criteria to the full text of the remaining papers. Disagreements at either stage were resolved through discussion between the two reviewers; cases that could not be resolved by consensus were adjudicated. Data extraction captured the architecture family, sensor modality, validation environment (field, test track, simulation), reported performance metrics, operational scale, safety outcomes, and explicitly stated limitations. The PRISMA flow diagram is shown in Figure 1.
Figure 1 presents the PRISMA 2020 flow diagram illustrating the systematic screening and selection process used in this review. A total of 4847 records were identified through database searching, and an additional 14 studies were identified through citation tracking and expert recommendations; these supplementary studies were assessed against the same eligibility criteria as the database-derived records. After duplicate removal, 3214 unique records remained for title and abstract screening. During this stage, 2681 records were excluded because they were not directly relevant to surface mining autonomy or vision-based systems, leaving 533 full-text articles to be assessed for eligibility. Of these, 447 were excluded for the following reasons: studies focused exclusively on underground mining (n = 99), simulation-only studies without hardware validation (n = 96), studies lacking a perception or vision component (n = 122), conference abstracts without sufficient technical detail (n = 60), and non-English publications (n = 70). Ultimately, 86 primary studies satisfied the inclusion criteria and were retained for the review. Together with the 14 supplementary studies, a total of 100 studies were included in the final synthesis. The selection process ensured that the review focused specifically on validated research on autonomous haulage, perception systems, and ecosystem-level intelligence in surface mining environments.
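The reported flow counts can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are not part of the review protocol, and all figures are those stated in the text.

```python
# Counts as reported in the PRISMA flow description.
after_dedup = 3214
excluded_title_abstract = 2681
full_text_assessed = 533
full_text_exclusions = {
    "underground mining only": 99,
    "simulation only, no hardware validation": 96,
    "no perception or vision component": 122,
    "abstract without sufficient technical detail": 60,
    "non-English": 70,
}
primary_included = 86
supplementary_included = 14


def check_prisma_flow():
    """Return named consistency checks over the reported counts."""
    excluded_full_text = sum(full_text_exclusions.values())
    return {
        "screening": after_dedup - excluded_title_abstract == full_text_assessed,
        "exclusion_reasons_sum": excluded_full_text == 447,
        "eligibility": full_text_assessed - excluded_full_text == primary_included,
        "final_total": primary_included + supplementary_included == 100,
    }


checks = check_prisma_flow()
print(checks)
```

Each check corresponds to one transition in the flow diagram, so a failed entry would point to the stage at which the counts diverge.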
2.5. Quality Assessment
Each included source was scored using a five-criterion quality-assessment rubric designed for this review, covering publication quality, methodological transparency, empirical or algorithmic validation, relevance to the review objectives, and completeness of reporting, as shown in Table 1. Each criterion was scored 1 if satisfied and 0 if not sufficiently satisfied by the same two reviewers responsible for screening, with disagreements resolved through discussion. The maximum possible score is 5. Total scores were mapped to four reliability levels: High (4–5), Moderate (3), Low (1–2), and No Reliability (0), as defined in Table 2.
Sources rated Low Reliability or No Reliability were retained for narrative and contextual discussion only and were excluded from the quantitative synthesis tables to ensure that comparative performance figures are not anchored to weakly substantiated results. Sources rated Moderate Reliability were retained in the synthesis tables but flagged, while sources rated High Reliability were treated as the primary evidentiary basis for quantitative claims. For sources retained in the synthesis tables, the reliability rating directly determined the evidence label applied. Studies validated on physical hardware at an operational or test-track mine site were labeled “deployment-validated”, whereas studies validated only in simulation, on automotive benchmark datasets, or at small research-prototype scale were labeled “research-prototype”.
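The scoring and mapping rules described in this subsection can be summarized in a few lines. This is a minimal sketch of the stated rubric, not the instrument used by the reviewers; the criterion keys and function names are illustrative.

```python
# Illustrative keys for the five rubric criteria.
CRITERIA = (
    "publication_quality",
    "methodological_transparency",
    "validation",
    "relevance",
    "completeness_of_reporting",
)


def reliability_level(scores):
    """Map five binary criterion scores to (total, reliability level)."""
    if set(scores) != set(CRITERIA):
        raise ValueError("scores must cover exactly the five criteria")
    if any(v not in (0, 1) for v in scores.values()):
        raise ValueError("each criterion is scored 0 or 1")
    total = sum(scores.values())
    if total >= 4:
        return total, "High"
    if total == 3:
        return total, "Moderate"
    if total >= 1:
        return total, "Low"
    return total, "No Reliability"


def in_quantitative_synthesis(level):
    """High and Moderate sources enter the synthesis tables; others do not."""
    return level in ("High", "Moderate")


example = dict.fromkeys(CRITERIA, 1)
example["validation"] = 0
example["completeness_of_reporting"] = 0
print(reliability_level(example))
```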
5. Fleet Intelligence and Ecosystem Integration
This section synthesizes evidence from the reviewed literature on ecosystem-level data sources that complement onboard vehicle perception. Fleet management systems, cooperative perception networks via V2X communication, fixed-infrastructure sensing, digital twin platforms, and geotechnical monitoring systems are increasingly mature technologies already deployed in major mining operations. This section examines the current state of each ecosystem integration layer and identifies the technical and organizational barriers to their coupling with vehicle-level perception and decision support.
5.1. Fleet Management Systems and Perception Coupling
Fleet management systems constitute the existing layer of mine-wide intelligence in automated surface mining operations. Fleet management platforms such as Wenco, Modular Mining’s dispatch, and Sandvik’s OptiMine are reported to aggregate GPS-derived vehicle positions, payload measurements, fuel consumption data, and maintenance alerts. This helps to optimize cycle times, reduce queue congestion, and maximize shovel utilization. However, these systems operate at a logistics and scheduling level. They do not currently consume or broadcast real-time perceptual data. AHS trucks generate LiDAR point clouds, camera feeds, and radar detections that remain siloed within each vehicle’s onboard computer platform [
18,
71].
This architectural separation means that safety-critical environmental information detected by one vehicle, such as an obstacle at a dump or a damaged berm section, is not shared with other vehicles approaching the same location. Each vehicle independently detects or fails to detect the same hazard. The gap between FMS-level situational awareness and vehicle-level perceptual awareness is the primary architectural bottleneck in the transition toward ecosystem intelligence.
5.2. Cooperative Perception and V2X in Mining
Cooperative perception refers to the sharing of sensory data or derived perception outputs between vehicles and infrastructure to construct a shared environmental model. This has advanced substantially in the field of autonomous driving. V2X architectures encompassing Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Network (V2N) communication have been standardized through IEEE 802.11p (DSRC) and C-V2X (cellular) protocols and validated in road platooning and intersection management applications [
18,
19].
The translation of cooperative perception to surface mining is still in its early stages. Key distinctions from the road context include the physical scale of mine operations (pit diameters of 1–10 km with significant elevation changes); the severity of the communication environment (pit-wall reflections, dust attenuation of wireless signals, and limited 5G coverage in deep pits); and the heterogeneity of the vehicle fleet (autonomous trucks, manned light vehicles, manual dozers, shovel operators, and drones). Wang et al. [
33] demonstrated a cooperative perception prototype at a research mine site in China, using LTE V2X to share compressed LiDAR feature maps among two autonomous trucks and a camera mounted on a roadside unit (RSU), achieving a 40% reduction in blind-zone area in the loading zone without exceeding available bandwidth constraints. This proof of concept represents the current frontier of cooperative perception in mining; no production deployments were identified in the reviewed literature.
The maturity of road V2X communication provides valuable insights for the evolution of cooperative perception and ecosystem-level data integration in surface mining, although direct adoption is constrained by the unique operational characteristics of mining environments. In autonomous road vehicles, standardized V2X communication architectures have enabled interoperable V2V, V2I, V2N, and Vehicle-to-Pedestrian (V2P) communication, allowing connected agents to exchange perception information, extend situational awareness beyond the sensing capabilities of individual vehicles, and support collaborative decision-making [
72,
73,
74]. However, the evidence synthesized in this review indicates that surface mining presents additional operational requirements that are not fully addressed by existing road-oriented V2X frameworks. First, ecosystem-level perception requires the integration of heterogeneous information sources, including onboard sensors, infrastructure sensing systems, fleet management systems, digital twins, geotechnical monitoring platforms, and environmental sensing networks, rather than relying solely on vehicle-to-vehicle communication. Second, communication frameworks should accommodate the unique characteristics of open-pit mining environments, including pit-wall signal reflections, variable terrain elevations, dynamic wireless coverage, and long-range operational requirements. Third, interoperability frameworks should support heterogeneous mining fleets comprising autonomous haul trucks, manually operated vehicles, stationary infrastructure, and unmanned aerial systems with diverse sensing and communication capabilities. Collectively, these findings suggest that future mining communication frameworks can build upon the architectural principles of mature V2X ecosystems while developing mining-specific extensions that facilitate interoperable ecosystem-level data integration, cooperative perception, and coordinated autonomous operations across the mining enterprise.
5.3. Infrastructure Sensing Integration
Fixed sensing infrastructure, including cameras and radar systems mounted at dump points, shovel pedestals, pit entry ramps, and berm edges, provides a complementary perspective. It is inherently free of the dust and vibration constraints that affect vehicle-mounted sensors. Several Tier 1 mining companies have deployed fixed camera arrays for traffic management and personnel safety monitoring at high-risk locations [
75]. However, these systems operate as independent safety overlays rather than as data sources integrated into the AHS perception pipeline. The technical pathway for infrastructure integration comprises two components: a communication interface between fixed sensor nodes and a central fusion server (achievable with existing LTE/5G private network infrastructure), and a data fusion architecture that combines detections from the fixed infrastructure and the ego vehicle into a coherent shared representation. The latter requires geometric calibration of fixed sensor frames to the mine coordinate system and temporal synchronization across heterogeneous hardware [
76,
77].
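The two fusion prerequisites named above, geometric calibration and temporal synchronization, can be illustrated with a small sketch. It assumes a planar rigid transform (translation plus yaw) already obtained from calibration and a simple nearest-timestamp pairing rule. The class name, function names, and the 50 ms tolerance are hypothetical and are not drawn from the reviewed systems.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorPose:
    """Hypothetical calibration result: sensor origin and heading in the mine frame."""
    east: float      # metres
    north: float     # metres
    yaw_rad: float   # counter-clockwise rotation from mine east axis to sensor x-axis


def to_mine_frame(pose, x, y):
    """Rotate, then translate, a sensor-frame point (x, y) into mine coordinates."""
    c, s = math.cos(pose.yaw_rad), math.sin(pose.yaw_rad)
    return (pose.east + c * x - s * y, pose.north + s * x + c * y)


def pair_by_time(fixed, vehicle, max_skew_s=0.05):
    """Pair each fixed-sensor detection with the nearest-in-time vehicle detection.

    Detections are (timestamp_s, payload) tuples. Pairs whose timestamps differ
    by more than max_skew_s are dropped rather than fused.
    """
    if not vehicle:
        return []
    pairs = []
    for t_f, p_f in fixed:
        t_v, p_v = min(vehicle, key=lambda d: abs(d[0] - t_f))
        if abs(t_v - t_f) <= max_skew_s:
            pairs.append((p_f, p_v))
    return pairs


pose = SensorPose(east=1000.0, north=2000.0, yaw_rad=math.pi / 2)
print(to_mine_frame(pose, 10.0, 0.0))
print(pair_by_time([(10.00, "f1"), (10.50, "f2")], [(10.02, "v1"), (10.90, "v2")]))
```

A production system would need a full 3D pose and hardware clock synchronization; the sketch only shows where each requirement enters the fusion step.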
5.4. Digital Twin Integration
Mining companies are increasingly investing in digital platforms that integrate geospatial information, equipment telemetry, operational processes, and enterprise data to create dynamic representations of mining operations. Although many of these systems represent important precursors rather than fully realized digital twins, they provide the technological foundation for network-centric mining by supporting visualization, simulation, and enterprise-wide operational coordination [
78,
79]. Commercial platforms such as Hexagon HxGN, Caterpillar MineStar, and Bentley iTwin provide key integration capabilities by combining survey data, fleet telemetry, geological models, production schedules, and infrastructure information into unified operational environments. These platforms are continuously enriched by drone photogrammetry, LiDAR surveys, GNSS positioning, equipment telemetry, and enterprise databases, enabling increasingly accurate and up-to-date representations of the evolving mine environment [
79].
Integrating digital twin information directly into the AHS perception pipeline represents a significant yet largely unexplored opportunity for ecosystem-level perception. Rather than relying solely on onboard sensors, AHS perception systems could exploit contextual information maintained by the digital twin, including current haul road conditions, geotechnical hazard zones, blast exclusion boundaries, temporary road closures, fleet locations, equipment health, and anticipated traffic patterns. This contextual knowledge extends perception beyond the vehicle’s immediate sensing horizon, allowing perception algorithms to reason about operational conditions that may not be directly observable through onboard sensing alone. Much like high-definition (HD) maps in autonomous road vehicles, a digital twin provides a continuously evolving environmental prior. However, unlike static HD maps, digital twins can reflect the rapidly changing conditions of active mining operations, including excavation progress, blasting activities, road maintenance, and evolving traffic patterns. By integrating data across information technology and operational technology systems, digital twins have the potential to improve localization robustness, perception reliability, trajectory planning, and safety-critical decision-making through mine-wide situational awareness [
79].
Despite this potential, digital twins have not yet been fully integrated into real-time perception pipelines for autonomous mining vehicles. Existing digital twin implementations primarily support operational planning, visualization, simulation, and enterprise-level decision-making, in which system updates typically occur at operational timescales rather than perception timescales. Onboard AHS perception, by contrast, performs localization, sensor fusion, and obstacle detection within tens of milliseconds. This disparity in temporal resolution creates a fundamental systems integration challenge. Future research should therefore investigate hierarchical synchronization architectures in which only rapidly changing, safety-critical information, such as newly detected hazards, equipment positions, traffic conditions, and geotechnical alerts, is streamed directly into the perception pipeline, while less dynamic operational information is updated asynchronously. Such selective real-time synchronization would transform the digital twin from a passive operational management platform into an active source of contextual intelligence for autonomous perception, advancing the transition from vehicle-centric autonomy toward true ecosystem-level situational awareness.
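One way to picture the selective synchronization proposed here is a two-tier router that separates safety-critical updates from slower operational data. The following is a conceptual sketch under stated assumptions: the update kinds, class name, and queue structure are hypothetical and do not describe any existing digital twin platform.

```python
from collections import deque

# Hypothetical update kinds treated as safety-critical and streamed immediately.
STREAMED_KINDS = {"hazard", "equipment_position", "traffic_condition", "geotechnical_alert"}


class TwinSynchronizer:
    """Route digital twin updates to a fast stream or a slow batch tier."""

    def __init__(self):
        self.stream = deque()   # consumed by perception at a high rate
        self.batch = []         # flushed asynchronously on a slower schedule

    def publish(self, kind, payload):
        """Place an update in the tier that matches its kind."""
        if kind in STREAMED_KINDS:
            self.stream.append((kind, payload))
        else:
            self.batch.append((kind, payload))

    def drain_stream(self):
        """Return and clear all pending safety-critical updates, oldest first."""
        items = list(self.stream)
        self.stream.clear()
        return items

    def flush_batch(self):
        """Return and clear all pending slow-tier updates."""
        items, self.batch = self.batch, []
        return items


sync = TwinSynchronizer()
sync.publish("geotechnical_alert", {"zone": "example-zone"})
sync.publish("survey_update", {"source": "drone photogrammetry"})
fast = sync.drain_stream()
slow = sync.flush_batch()
print(fast, slow)
```

The design choice illustrated is that tier membership is decided at publication time, so the perception loop never has to filter slow-changing data.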
5.5. Geotechnical Monitoring as a Perception Input
Geotechnical instability is one of the most significant safety hazards in surface mining because it threatens personnel, mobile equipment, and production continuity [
26]. To mitigate these risks, modern slope stability monitoring employs complementary sensing technologies, including ground-based slope stability radar, satellite Interferometric Synthetic Aperture Radar (InSAR), GNSS receivers, automated total stations, extensometers, inclinometers, MEMS tilt sensors, and distributed optical fiber sensing. Together, these technologies provide the spatial and temporal coverage needed to monitor surface deformation, displacement rates, crack propagation, and progressive rock mass instability. Recent advances further integrate AI and IoT architectures to improve automated hazard detection, predictive analysis, and multi-sensor data fusion for continuous slope assessment [
26,
27].
These monitoring systems continuously generate high-frequency geotechnical information describing displacement magnitude, deformation velocity, acceleration trends, and evolving failure mechanisms. Progressive acceleration in slope displacement is widely recognized as one of the most reliable precursors to impending slope failure and forms the basis of Trigger Action Response Plans (TARPs), which enable engineering inspections, evacuation procedures, and the establishment of exclusion zones before catastrophic collapse occurs [
26,
27]. Distributed optical fiber sensing has also demonstrated significant potential for detecting localized strain accumulation, fracture initiation, and crack evolution before visible failure occurs, while satellite InSAR complements ground-based radar by providing regional-scale deformation monitoring across multiple pits, enabling the identification of previously unknown or slowly evolving geotechnical hazards that may not be captured by localized monitoring systems [
80].
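The notion of progressive acceleration can be made concrete with finite differences over a displacement time series. The sketch below is a simplified illustration, not an operational TARP rule: the function names, the three-sample window, and the sample readings are assumptions, and real trigger levels are site-specific.

```python
def finite_difference(values, dt):
    """First-order forward differences of a uniformly sampled series."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def is_progressively_accelerating(displacement_mm, dt_h, window=3):
    """Flag a series whose last `window` acceleration estimates are all positive.

    displacement_mm: cumulative displacement samples in millimetres.
    dt_h: sampling interval in hours.
    """
    velocity = finite_difference(displacement_mm, dt_h)       # mm/h
    acceleration = finite_difference(velocity, dt_h)          # mm/h^2
    recent = acceleration[-window:]
    return len(recent) == window and all(a > 0 for a in recent)


# Hypothetical hourly readings: one accelerating slope, one creeping steadily.
accelerating = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
steady = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(is_progressively_accelerating(accelerating, dt_h=1.0))
print(is_progressively_accelerating(steady, dt_h=1.0))
```

Raw radar or extensometer data would normally be smoothed before differencing, since finite differences amplify measurement noise.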
Despite these advances, geotechnical monitoring systems are rarely directly coupled with the AHS perception and motion-planning pipelines. In current mining operations, deformation measurements and hazard assessments are typically communicated to geotechnical engineers or control room operators, who subsequently implement operational responses such as haul road closures, revised traffic routing, speed restrictions, or personnel exclusion zones through established operational procedures. Although this workflow provides an effective safety management process, it introduces a human-mediated delay between hazard detection and autonomous vehicle response, limiting the ability of AHS to react immediately to rapidly evolving geotechnical conditions.
Within the ECDV framework, geotechnical monitoring serves as an ecosystem-level perception layer that continuously augments onboard sensing with mine-wide environmental intelligence by integrating displacement measurements, radar alarms, InSAR deformation maps, distributed optical fiber strain measurements, and geotechnical risk assessments. This enables autonomous haulage vehicles to dynamically update maps, establish no-go zones, reroute traffic, adjust vehicle speeds, and anticipate hazards beyond the onboard sensing horizon, thereby improving operational safety and decision-making.
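As an illustration of how a geotechnical alert might translate into a no-go zone that a planner can act on, the sketch below tests route waypoints against polygonal exclusion zones using a standard ray-casting test. The function names, coordinates, and zone shape are hypothetical; the sketch checks waypoints only, not the path segments between them.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (east, north) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def blocked_waypoints(route, no_go_zones):
    """Return indices of route waypoints that fall inside any no-go zone."""
    return [
        i for i, waypoint in enumerate(route)
        if any(point_in_polygon(waypoint, zone) for zone in no_go_zones)
    ]


# Hypothetical exclusion zone (metres, mine grid) and a three-waypoint route.
zone = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
route = [(-5.0, 5.0), (5.0, 5.0), (15.0, 5.0)]
blocked = blocked_waypoints(route, [zone])
print(blocked)
```

A non-empty result would signal the planner to reroute or stop short of the affected waypoint.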
The principal research challenge is therefore no longer the development of additional sensing technologies, but the real-time integration of heterogeneous geotechnical information into autonomous perception and decision-making frameworks. Recent reviews emphasize that future slope stability monitoring should focus on multi-sensor integration, AI-assisted interpretation, IoT-enabled monitoring, and interoperable sensing architectures rather than isolated monitoring technologies [
26]. However, direct integration of geotechnical monitoring outputs into AHS perception remains largely unexplored. Among the ecosystem-level integration opportunities identified in this review, incorporating geotechnical monitoring into autonomous perception is one of the most immediately deployable approaches, as much of the sensing infrastructure already exists in modern surface mining operations. Transforming these systems from passive monitoring tools into active perception inputs would enable autonomous haulage systems to anticipate geotechnical hazards before they become observable by onboard sensors, advancing the transition from vehicle-centric autonomy toward true ecosystem-level situational awareness.
Table 5 summarizes the principal integration layers identified in the literature, their current deployment maturity, their role in enabling ecosystem intelligence, and the key research gaps that must be addressed to support large-scale implementation.
9. Conclusions
This systematic review synthesized evidence from studies published between 2010 and 2026 to examine the evolution of dynamic vision architectures for AHS in surface mining. The review traced the evolution of perception systems from conventional onboard sensing to increasingly connected, context-aware architectures, drawing on the literature from mining engineering, robotics, intelligent transportation systems, and computer vision. Beyond vehicle-level sensing, the review examined the ecosystem-level data sources that operate alongside AHS, including fleet management systems, cooperative perception, digital twins, and geotechnical monitoring.
Three principal findings emerge from the reviewed literature. First, perception technologies have advanced substantially over the past decade. Improvements in deep learning, LiDAR-based object detection, millimeter-wave radar, temporal perception, edge computing, and multi-sensor fusion have significantly enhanced the accuracy, robustness, and computational efficiency of onboard perception systems. These advancements have supported the reliable commercial deployment of autonomous haul trucks within carefully delineated operational domains.
Second, the review identifies environmental robustness as the dominant unresolved challenge limiting autonomous perception in surface mining. Across the reviewed studies, airborne dust, post-blast atmospheric conditions, vibration, GNSS degradation, mixed-traffic interactions, terrain variability, and geotechnical instability consistently reduce the reliability of perception, regardless of sensing modality. LiDAR performance is particularly sensitive to airborne particulates, with ranging accuracy degrading as atmospheric transmittance falls [
38]. Although sensor fusion improves resilience by exploiting complementary sensing characteristics, no existing perception architecture consistently maintains robust situational awareness across the diverse environmental conditions encountered in operational surface mines. These limitations reflect not only the physical constraints of individual sensors but also the inherent limitations of vehicle-centric perception architectures.
Third, the review reveals that many of the technologies needed to extend perception beyond vehicles are already in commercial use, even though they remain functionally disconnected from AHS perception pipelines. Fleet management platforms such as Wenco and Modular Mining dispatch aggregate truck position, payload, and scheduling data at commercial scale, but they do not currently consume or broadcast the LiDAR, camera, and radar detections generated onboard each truck [
18,
71]. Digital twin platforms, including Hexagon HxGN, Caterpillar MineStar, and Bentley iTwin, integrate survey data, fleet telemetry, and geological models into unified operational environments, yet function at planning and visualization timescales rather than the millisecond timescales required for perception [
78,
79]. Geotechnical monitoring systems, including slope stability radar, satellite InSAR, and MEMS-based sensing, continuously track displacement and deformation as precursors to slope failure. However, their outputs are typically routed to control room operators rather than directly into vehicle motion planning [
26,
27]. Research on cooperative perception, in which vehicles and infrastructure share sensor data to construct a common environmental model, has been demonstrated in prototype form but has not reached production deployment in mining. Consequently, the principal gap identified in the literature is no longer the availability of sensing technologies but the absence of a unified architecture that integrates these already mature, independently operated systems into a coherent, real-time perception framework.
This architectural gap motivates the ECDV framework proposed in
Section 6. ECDV organizes this integration across five functional layers, extending from onboard perception (L1) through cooperative perception via V2X communication (L2), non-perceptual ecosystem context drawn from digital twins and geotechnical feeds (L3), predictive hazard modeling and risk-aware planning (L4), to a human–machine interface that supports operator oversight and audit (L5). Rather than replacing onboard perception, the framework augments it while preserving four safety-critical design constraints: graceful degradation when external data sources are unavailable, latency-stratified routing of time-critical and slower-updating information, propagation of uncertainty from external sources into the vehicle’s risk assessment, and explainability sufficient to support incident investigation. The framework is offered as a structured hypothesis for how the ecosystem-level technologies cataloged above could be connected to vehicle perception. It has not been implemented or validated, and its contribution at this stage is conceptual rather than empirical.
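Three of the four design constraints can be made concrete with a short sketch. The Python below is illustrative only and is not an implementation of ECDV, which has not been implemented: the `ExternalInput` type, the `assess_risk` function, the source names, the freshness limits in `MAX_AGE_S`, and the uncertainty multiplier `k` are all assumptions introduced here. Stale or unknown sources are ignored (graceful degradation), each source class has its own freshness limit (latency stratification), and each external estimate enters as a conservative upper bound (uncertainty propagation). The returned explanation list gestures at the fourth constraint, explainability.

```python
from dataclasses import dataclass

# Hypothetical freshness limits per source class, in seconds.
# Fast-changing sources must be recent; slow-changing ones may be older.
MAX_AGE_S = {"v2x": 0.2, "digital_twin": 60.0, "geotech": 600.0}


@dataclass(frozen=True)
class ExternalInput:
    source: str   # key into MAX_AGE_S
    risk: float   # hazard estimate in [0, 1]
    sigma: float  # one-standard-deviation uncertainty of that estimate
    age_s: float  # time since the estimate was produced


def assess_risk(onboard_risk, external, k=2.0):
    """Combine onboard risk with fresh external inputs, conservatively.

    Returns (risk, explanation). Stale or unknown sources are skipped, so
    with no usable external data the result equals the onboard risk.
    """
    risk = onboard_risk
    explanation = [f"onboard={onboard_risk:.2f}"]
    for inp in external:
        limit = MAX_AGE_S.get(inp.source)
        if limit is None or inp.age_s > limit:
            explanation.append(f"{inp.source}: dropped (stale or unknown)")
            continue
        # Use an upper bound on the estimate, clipped to [0, 1].
        bound = min(1.0, max(0.0, inp.risk + k * inp.sigma))
        explanation.append(f"{inp.source}: upper bound {bound:.2f}")
        risk = max(risk, bound)
    return risk, explanation
```

Taking the maximum means an external source can only raise the assessed risk, never lower it below what onboard perception reports. That is one possible reading of "augments rather than replaces"; other fusion rules are equally compatible with the framework as described.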
Advancing this agenda requires progress on several fronts identified in
Section 7. The absence of large-scale, publicly available, mining-specific perception datasets and benchmarks remains the most consistently cited barrier in the reviewed literature, because it leaves the field dependent on automotive benchmarks that do not reflect mining-specific conditions such as dust, night operation, and geotechnical events. Safety certification for machine-learning-based perception also lacks an established precedent. Existing functional safety standards, including IEC 61508 and the mining-specific ISO 17757:2019, were developed primarily for deterministic control systems, and applying them to neural network-based perception will require new verification methodologies that combine formal methods, statistical testing, and runtime monitoring [
81]. Communication architecture presents a further constraint. Reviewed work on feature compression and network-aware cooperative perception seeks to reduce the bandwidth burden of sharing point cloud data among vehicles [
83,
84,
85]. Recent evidence suggests that private 5G networks, combined with mobile edge computing, can meaningfully address this constraint, with one deployment reporting average communication latency of approximately 15 milliseconds and uplink and downlink throughput of approximately 1 and 1.5 gigabits per second, respectively [
76].
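A back-of-the-envelope calculation shows why feature compression and link capacity interact. The Python sketch below takes the reported uplink throughput and latency as given and combines them with assumed point cloud parameters. The frame size (100,000 points), point encoding (16 bytes), frame rate (10 Hz), and 20:1 compression ratio are placeholders chosen for illustration, not values from the reviewed studies, and the calculation ignores protocol overhead and contention.

```python
UPLINK_BPS = 1e9   # ~1 Gbit/s uplink, as reported for the cited deployment
LATENCY_S = 0.015  # ~15 ms average communication latency, as reported


def stream_bps(points_per_frame, bytes_per_point, frames_per_s, compression=1.0):
    """Bit rate of one truck's point cloud stream after compression."""
    return points_per_frame * bytes_per_point * 8 * frames_per_s / compression


def max_trucks(per_truck_bps, uplink_bps=UPLINK_BPS):
    """Number of simultaneous streams that fit in the uplink (no overhead)."""
    return int(uplink_bps // per_truck_bps)


def frame_delay_s(points_per_frame, bytes_per_point, compression=1.0,
                  uplink_bps=UPLINK_BPS, latency_s=LATENCY_S):
    """Latency plus serialization time for one frame using the whole uplink."""
    bits = points_per_frame * bytes_per_point * 8 / compression
    return latency_s + bits / uplink_bps


# Assumed sensor parameters (hypothetical): 100k points, 16 B/point, 10 Hz.
raw = stream_bps(100_000, 16, 10)                         # 128 Mbit/s per truck
compressed = stream_bps(100_000, 16, 10, compression=20)  # 6.4 Mbit/s per truck
print(max_trucks(raw), max_trucks(compressed))
print(round(frame_delay_s(100_000, 16) * 1000, 1), "ms per raw frame")
```

Under these assumptions, raw sharing saturates the uplink at seven trucks, whereas a 20:1 feature compression would admit 156, and a single raw frame takes roughly 28 ms to deliver even with the link to itself.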
Several emerging technologies identified in this review offer promising, though still largely unvalidated, pathways for mining-specific perception. Foundation and vision-language models, such as the Segment Anything Model, have shown potential to reduce labeled-data requirements in adjacent domains: one adaptation for terrain segmentation is reported to achieve competitive performance using only 5% of the labeled data required by specialized models. Event-based and neuromorphic sensors respond to per-pixel luminance changes with microsecond resolution and a dynamic range of over 120 dB. These characteristics suggest potential applicability to the lighting transitions and vibration encountered in surface mines, but no mining-specific validation studies were identified in the reviewed literature [
89,
90]. Federated learning allows perception models to be trained across multiple mine sites without centralizing proprietary operational data. Although this could in principle provide substantially greater dataset diversity, it remains an area of ongoing research rather than a demonstrated capability in autonomous mining.
Regulatory and standardization pathways will need to evolve alongside these technical developments. ISO 17757:2019 establishes the principal international safety framework for autonomous earth-moving machinery but offers limited guidance on certifying perception systems that integrate heterogeneous sensing and cooperative decision-making [
92]. Jurisdictional frameworks such as the Western Australia Work Health and Safety (Mines) Regulations 2022 take a risk-based approach that accommodates innovation while requiring demonstrated risk reduction, offering a possible template for ecosystem-aware AHS certification [
93]. The absence of dedicated technical standards for intelligent mining has itself been identified as a contributor to heterogeneous equipment architecture and limited interoperability across the sector [
94]. As discussed in
Section 8.2, the development of autonomous road vehicles serves as a warning: the road autonomy industry discovered, often at great expense, that Level 4 capabilities confined to a geofenced area cannot be expanded to more complex settings without fundamental architectural changes [
95]. Surface mining is positioned to internalize that lesson proactively, through standardization and ecosystem integration. As discussed in
Section 8.3, this review is subject to limitations, including the predominance of automotive perception literature in the reviewed dataset and the limited availability of non-confidential operational data from commercial AHS deployments. These constraints should be considered when interpreting the findings summarized above.
Overall, the reviewed literature indicates that vehicle-level perception technologies in surface mining have advanced considerably. Ecosystem-level perception and information integration remain comparatively underrepresented in the published literature, despite substantial advances in many of the underlying technologies. Future research should prioritize mining-specific perception datasets and benchmarks, field validation of cooperative perception architectures under operational mining conditions, safety certification methodologies suited to learning-based perception, and empirical testing of ecosystem-centric frameworks such as ECDV against real deployment data. Progress in these areas would strengthen the evidence base for the transition from vehicle autonomy to ecosystem intelligence in surface mining. This would help ensure that advances in perception algorithms are matched by equivalent progress in system integration, standardization, and operational validation.