Optimizing Autonomous Wheel Loader Performance—An End-to-End Approach

Aoshima, Koji; Wadbro, Eddie; Servin, Martin

doi:10.3390/automation6030031

by Koji Aoshima ^1,2,*, Eddie Wadbro ^3,4 and Martin Servin ^2,5,*

¹ Komatsu Ltd., 2-3-6, Akasaka, Minato-ku, Tokyo 107-8414, Japan

² Department of Physics, Umeå University, SE-901 87 Umeå, Sweden

³ Department of Mathematics and Computer Science, Karlstad University, SE-651 88 Karlstad, Sweden

⁴ Department of Computing Science, Umeå University, SE-901 87 Umeå, Sweden

⁵ Algoryx Simulation AB, Kuratorvägen 2B, SE-907 36 Umeå, Sweden

* Authors to whom correspondence should be addressed.

Automation 2025, 6(3), 31; https://doi.org/10.3390/automation6030031

Submission received: 6 May 2025 / Revised: 9 July 2025 / Accepted: 10 July 2025 / Published: 12 July 2025

(This article belongs to the Collection Smart Robotics for Automation)

Abstract

Wheel loaders in mines and construction sites repeatedly load soil from a pile to load receivers. Automating this task presents a challenging planning problem since each loading’s performance depends on the pile state, which depends on previous loadings. We investigate an end-to-end optimization approach considering future loading outcomes and transportation costs between the pile and load receivers. To predict the evolution of the pile state and the loading performance, we use world models that leverage deep neural networks trained on numerous simulated loading cycles. A look-ahead tree search optimizes the sequence of loading actions by evaluating the performance of thousands of action candidates, which expand into subsequent action candidates under the predicted pile states recursively. Test results demonstrate that, over a horizon of 15 sequential loadings, the look-ahead tree search is 6% more efficient than a greedy strategy, which always selects the action that maximizes the current single loading performance, and 14% more efficient than using a fixed loading controller optimized for the nominal case.

Keywords:

wheel loader; automation; optimization; look-ahead tree search; world model; deep learning

MSC:

68T07; 68U99; 90B06; 93C85

1. Introduction

The task of most construction and mining machines, such as wheel loaders, is to perform long sequences of earthmoving operations in dynamic environments. Optimizing this task involves selecting loading and transportation actions that maximize the gain over time, according to some performance measure. This is challenging for several reasons. First, the performance of a single loading cycle depends highly on the local shape of the pile and the soil properties [1]. The loading actions need to be carefully adapted to these conditions. Second, each loading alters the pile shape. From the perspective of future loadings, such a change can be either for the better or worse. There might be actions that are of low performance in the short term but are necessary to achieve higher gains in the future [2]. Therefore, a greedy strategy of always choosing the loading action that maximizes the performance for a single loading might be sub-optimal in the long run. Such a strategy might gradually deteriorate the pile state or lead to excessive transportation between the dig location and the load receiver. This motivates considering future states when selecting the actions.

We refer to the problem of finding the action sequence that maximizes the total performance of many wheel loading cycles, given the initial pile state, as the wheel loader end-to-end optimization problem. To the best of our knowledge, this problem has not been scientifically explored previously. In fact, until recently, it was computationally intractable. In our previous work [3], we developed data-driven models capable of predicting the loading performance and the subsequent state of the pile within a few milliseconds. Model inputs are the previous pile state and selected action parameters. This makes it possible to rapidly evaluate the net performance of numerous alternative action sequences in advance. These models are referred to as a world model, as they predict the new state the environment will transition into, given the previous state and a selected action, and the observations associated with the state transition [4].

This paper investigates methods for solving the wheel loader end-to-end optimization problem and analyzes the obtained solutions. The research questions include what characterizes the optimal action sequence, how sensitive the results are to the initial pile state and planning horizon, what the computational demands are, and the feasibility of practical use of the method. The idea is that the methods could be implemented in site management systems with high-precision monitoring of equipment and the 3D topography, sending coordinated work plans to the individual machines.

We use a look-ahead tree search approach illustrated in Figure 1. Future action candidates for the foreseen future states are identified recursively, offline. The action candidates are evaluated, using data-driven models developed in previous research [3], to compute their respective performance and expand into further action candidates under the new states. The method should apply to any kind of world model with the same prediction capabilities. This approach is, reportedly, similar to how experienced operators intuitively plan their work, but we expect that a computerized model and search algorithms can make more accurate predictions and plan over longer horizons. To understand the importance of the look-ahead search, we examined the effect of using different planning horizons and objectives.

The wheel loader is assumed to have a high-level planner and a low-level control system, which work together in the hierarchical architecture described in refs. [5,6]. The high-level planner selects a subtask action, such as bucket filling, dumping, or V-shaped transportation, according to the current state of the machine and the work site. A low-level controller performs the subtasks given a motion plan and control parameters. For bucket filling, the machine is equipped with a type of admittance controller [7]. Our controller takes four action parameters that determine how reactive the boom and bucket actuation is in response to the instantaneous dig force [3]. The performance of a loading cycle is measured in terms of mass, time, and work, with contributions from each subtask.

We test the effect of the look-ahead tree search by considering a wheel loader tasked with performing 15 sequential loading cycles. The resulting performance and computational cost are compared with those obtained with a greedy search and a nominal strategy.

2. Related Work

The end-to-end optimization problem of sequential loading was discussed in [2], considered intractable, and substituted with a simpler problem of optimizing each loading cycle by searching in a lower-dimensional action space implicitly constrained by the pile state. An average of 90% efficiency (bucket payload relative to its capacity) was achieved over 20 cycles. The study was limited to 2D, used a heuristic simulation model for soil displacements and settling, and did not consider time or energy consumption. Attempts were made to improve the performance further by using heuristics to avoid action plans that produce challenging terrain configurations, but the optimization was computationally prohibitive.

Which dig location in the pile is favorable, given the pile shape in terms of a heightmap, has been studied from the perspective of both bucket filling and the transportation between the dig and dump locations. In [8], a coarse-level planner finds the points along the pile contour closest to the load receiver, and a refined planner avoids curved regions of the pile surface to minimize side loading of the bucket and the resulting drop in filling efficiency. A sequential coarse-to-fine planner was studied in [9] and compared to a greedy planner. The coarse planner first produces a sequence of dig regions ordered radially around the pile centroid. The fine planner then selects the dig points inside each region that have the best local pile shape. Simulations (cellular automata) of 50 consecutive loadings showed that the coarse-to-fine strategy maintains a good pile shape and performs at 80–90%, while a greedy strategy (always selecting the dig location with optimal pile shape) performs equally well or better during the first ten loadings but drops to about 60% after 25 loadings.

Models for adapting the loading control to the pile shape have been studied with different approaches. In [10], reinforcement learning was used to train a multiagent system for autonomous control of an underground load–haul–dump (LHD) vehicle using a depth camera, lidar, and force and kinematics sensors. One agent was used to select a favorable dig position, given the observed pile shape. A second agent had the task of steering the vehicle toward the target position and controlling the forward drive as well as the lift and tilt cylinders to perform bucket filling. The agents were trained in a simulated environment to learn control policies designed for high productivity and energy efficiency over multiple loading cycles while also avoiding collisions and wheel slip. The system achieved bucket filling with 75% of maximum capacity on average and used 4% less energy by actively selecting favorable dig locations. The vehicle was confined to operate in a narrow mine drift with little variability in viable dig position or how to navigate to and from the dump location. It is untested how well this approach applies to controlling a wheel loader above ground with much larger action and observation space. The study [11] explored multiobjective optimization of an LHD control strategy using surrogate models based on data from sequences of simulated loadings in a pile of fragmented rock. Each simulated sequence used a parameterized dig trajectory planner that adapts to the local pile state. Based on many simulations with different parameters, the learned model can identify the loading strategies that ensure a high average performance and avoid those that systematically deteriorate the pile state.

The optimization of the V-shaped transportation path between the dig and dump location has been addressed in several studies, focusing on finding the shortest path with curvature consistent with the vehicle’s minimum turning radius [12,13], taking into account the vehicle dynamics and construction working site constraints [14], non-uniform trajectories under forward and reverse driving conditions [15], or minimizing fuel consumption [16]. Time- and energy-efficient tracking of a planned path has been explored in several studies using optimal control [17,18].

The optimal bucket filling trajectory was researched in [19] using the discrete element method (DEM) and used in [20] to find the optimal short loading cycle using dynamic programming. For the sake of computational efficiency, the simulations were limited to quasi-2D piles sloped with the material’s angle of repose. The effect of the pile shape on the optimal trajectory or the resulting 3D pile state has not been explored using DEM simulation, apart from the work in [11].

3. Problem Formulation

We focus on the short loading cycle, in which the wheel loader repeatedly loads and dumps soil from a pile to a load receiver. Each cycle can be divided into a sequence of subtasks: V-turn-1, loading, V-turn-2, and dumping. These subtasks are illustrated in Figure 1. We assume the wheel loader is equipped with a low-level control system for each subtask. A high-level system (an operator or agent) activates the low-level systems by selecting a set of subtask action parameters:

a^{V 1}

,

a^{load}

,

a^{V 2}

, and

a^{dump}

, respectively.

At the beginning of each cycle

n = 1, 2, \dots, N

, the wheel loader is located by the load receiver at a position

x_{n}^{dump}

. The V-turn-1 subtask is to drive the wheel loader to a selected loading position

x_{n}^{dig}

somewhere along the edge of the nearby pile. A low-level controller plans and executes a V-turn motion that starts at

x_{n}^{dump}

and ends at

x_{n}^{dig}

, avoiding collisions with the environment, and with endpoints heading normal to the load receiver and the pile, respectively. As subtask action parameters, we use the target loading position,

a_{n}^{V 1} = x_{n}^{dig}

.

The next subtask is the actual loading from the pile. It starts with the machine approaching the selected loading position and ends with the vehicle reversing from the pile with a filled bucket. The task is carried out by an automatic bucket filling controller with some action parameters,

a_{n}^{load}

, that may be adapted to the current pile shape and material properties. The loading transforms the pile from a state

H_{n}

into a new state

H_{n + 1}

and results in some amount of soil mass

M_{n}

in the bucket. We identify the state with the geometric shape of the pile.

After bucket filling, the V-turn-2 subtask is performed. It ends with the wheel loader approaching the receiver located at a position

x_{n}^{dump}

, which is the corresponding action parameter

a_{n}^{V 2}

. The final subtask is to empty the bucket’s contents onto the load receiver with an action

a_{n}^{dump}

. When the receiver is filled, this subtask might require some adjustment action to prevent spillage. A completed loading cycle thus involves a selection of action parameters that we collect in a vector

a_{n} = [a_{n}^{V 1}, a_{n}^{load}, a_{n}^{V 2}, a_{n}^{dump}]

.

A loading cycle involves spending some amount of mechanical work

W_{n}

and time

T_{n}

to displace a certain mass

M_{n}

from the pile to the receiver. Each of the subtasks contributes to the cycle time and work, while it is only the loading subtask that produces a mass measurement. We attribute each loading cycle a performance, which we measure by the performance vector:

P_{n} = {[\frac{M_{0}}{M_{n}}, \frac{T_{n}}{T_{0}}, \frac{W_{n}}{W_{0}}]}^{T},

(1)

where

M_{0}

,

T_{0}

, and

W_{0}

are some characteristic values used for normalization. Note that the performance vector is a function of the pile state, the location of the load receiver, and the selected action, i.e.,

P_{n} (H_{n}, a_{n})

. We pose the end-to-end optimization problem of N sequential loading cycles as the problem of finding the sequence of actions

{a_{n}}_{n = 1}^{N}

that, given the initial pile state

H_{1}

and the dump location

x^{dump}

, satisfy

\underset{{a_{1}, \dots, a_{N}}}{argmin} \sum_{n = 1}^{N} w^{T} P_{n},

(2)

where

w

is a vector of positive weights for controlling the trade-off between maximizing loaded mass and minimizing time and work.

The posed optimization problem is computationally challenging to solve. An action sequence

{a_{n}}_{n = 1}^{N}

renders a sequence of pile states

{H_{n}}_{n = 1}^{N}

and performance measurements

{P_{n}}_{n = 1}^{N}

. These cannot be computed independently due to the strong dependency on the evolving pile state. The computational complexity for exhaustive search scales exponentially as

t_{a} D_{a}^{N}

, where

D_{a}

is the number of action candidates that must be evaluated for each pile state and

t_{a}

is the computational time for doing so.
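To give a feeling for the scale, the following worked example counts the distinct action sequences of an exhaustive search. The value of 10 candidates per pile state is an illustrative assumption, while the horizon of 15 cycles and the evaluation time of 73.5 ms per prediction are the values used in the experiments and timing reported in this paper.

```latex
% Illustrative assumption: D_a = 10 action candidates per pile state.
% N = 15 and t_a = 73.5 ms are taken from the experiments and timing in this paper.
D_{a}^{N} = 10^{15} \ \text{sequences}, \qquad
t_{a} D_{a}^{N} \approx 0.0735\,\text{s} \times 10^{15}
= 7.35 \times 10^{13}\,\text{s} \approx 2.3 \times 10^{6}\ \text{years}.
```

Even with a fast world model, enumerating all sequences is therefore out of reach, which motivates an approximate search.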

Assumptions and Delimitations

A number of delimitations and simplifying assumptions are made to bring down the difficulty of the problem. We focus on loading a single type of non-cohesive and homogeneous soil, namely gravel. The surrounding ground is assumed to be flat. We assume no soil is spilled from the bucket during the V-turn-2 subtask or when dumped on the receiver. Spillage would affect subsequent performance, either by loss in control precision or traction or by the need to clear the ground.

The receiver is located at a fixed location and orientation relative to the pile. We assume it is immediately replaced by another receiver at the same location when full. The wheel loader always approaches the receiver’s center position, orthogonally, and simply empties the bucket without considering the shape of the body. The contribution to the cycle performance is then a constant value. We thus ignore the selection of the dumping action parameter

a^{dump}

and the V-turn-2 parameter

a^{V 2}

from the problem but account for the contribution of the actions to the net performance.

Automatic bucket filling starts with the loader heading toward a dig location along the edge of the pile at a nominal target speed. To limit the dimensionality of the problem, the heading is always normal to the pile contour, and the edge is discretized into a finite number of candidates of dig locations separated by a step size smaller than the width of a bucket.

We assume there is always room for a collision-free V-turn between the receiver and the pile. As the dump location is known, the V-turn planner requires only the dig location and heading as input. The V-turns are performed with some performance

P_{n}^{V 1}

and

P_{n}^{V 2}

, and we assume that we can predict them (Section 4.2) with sufficient accuracy given the path and the bucket load mass.

For the loading, we assume access to a world model in the following form:

\begin{matrix} H_{n + 1} & = Φ (H_{n}, a_{n}), \end{matrix}

(3)

\begin{matrix} P_{n}^{load} & = Ψ (H_{n}, a_{n}), \end{matrix}

(4)

where

Φ

is a pile state predictor model and

Ψ

is a performance predictor model, which take as input the previous pile state

H_{n}

and the reduced action control parameter

a_{n} = [x_{n}^{dig}, a_{n}^{load}]

.

With these simplifications, the problem to solve is

\underset{{x_{n}^{dig}, a_{n}^{load}}_{n = 1}^{N}}{argmin} \sum_{n = 1}^{N} w^{T} P_{n} .

(5)

with the predicted performance

P_{n} = P_{n}^{load} + P_{n}^{V 1} + P_{n}^{V 2}

that must be computed sequentially along with the predicted pile state evolution

{H_{n}}_{n = 1}^{N}

starting from the initial state

H_{1}

.
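The sequential dependence can be made concrete with a minimal sketch of how one candidate action sequence might be scored. The toy predictors `toy_phi`, `toy_psi`, and `toy_vturns`, the pile representation (one soil height per dig location), and the normalization constant `M0` are hypothetical stand-ins for the world model and V-turn model, and the loading control parameters are left out for brevity.

```python
from typing import Sequence, Tuple

Pile = Tuple[float, ...]           # toy pile state: soil height at each dig location
Perf = Tuple[float, float, float]  # performance vector (M0/M, T/T0, W/W0)

M0 = 0.5  # hypothetical characteristic bucket load used for normalization


def toy_phi(pile: Pile, dig: int) -> Pile:
    """Toy pile state predictor: a loading removes up to M0 of soil at `dig`."""
    new = list(pile)
    new[dig] = max(0.0, pile[dig] - M0)
    return tuple(new)


def toy_psi(pile: Pile, dig: int) -> Perf:
    """Toy performance predictor: loaded mass is limited by the available soil."""
    mass = max(min(pile[dig], M0), 1e-6)  # guard against division by zero
    return (M0 / mass, 1.0, 1.0)


def toy_vturns(dig: int) -> Perf:
    """Toy combined cost of both V-turns: grows with the dig location index."""
    return (0.0, 0.1 * dig, 0.1 * dig)


def rollout(h1: Pile, digs: Sequence[int], w: Sequence[float],
            phi=toy_phi, psi=toy_psi, vturns=toy_vturns):
    """Accumulate sum_n w^T P_n while stepping the pile state forward."""
    pile, total = h1, 0.0
    for dig in digs:
        # P_n = P_n^load + P_n^V1 + P_n^V2, evaluated on the current pile state
        p = [pl + pv for pl, pv in zip(psi(pile, dig), vturns(dig))]
        total += sum(wi * pi for wi, pi in zip(w, p))
        pile = phi(pile, dig)  # H_{n+1} = Phi(H_n, a_n)
    return total, pile


total, final_pile = rollout((1.0, 0.25), [0, 0, 1], w=(2.0, 1.0, 1.0))
print(total, final_pile)
```

The performance of each cycle is evaluated on the pile state left by the previous cycle, so the cost of a sequence cannot be split into independent per-cycle terms.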

4. Method

This section describes our models for predicting the outcome of a single loading cycle and the search method to find the optimal sequence of wheel loader actions.

4.1. Loading Prediction

We use the data-driven world model developed in [3] for a Komatsu WA320-7 wheel loader doing automatic bucket filling in piles of gravel using an admittance controller. The models predict the loading outcome:

\begin{matrix} H^{'} & = Φ (H, x^{dig}, a^{load}), \end{matrix}

(6)

\begin{matrix} P^{load} & = Ψ (H, x^{dig}, a^{load}), \end{matrix}

(7)

in terms of the resulting heightmap

H^{'}

and loading performance,

P^{load} = {[M_{0} / M_{load}, T_{load} / T_{0}, W_{load} / W_{0}]}^{T}

, given the current heightmap

H

, dig location

x^{dig}

, and action parameters

a^{load}

for the automatic loading controller.

The dependency on the pile shape is limited to the local pile shape,

h

, in the area of the dig location and with the selected heading. Therefore, the function

Φ

first does a cutout operation of the global heightmap to obtain the local pile surface. This is then fed as input, along with the loading action parameters, to a deep neural network

ϕ

. A replace function then substitutes the predicted local heightmap

h^{'} = ϕ (h, a^{load})

into

H

to obtain the predicted global heightmap

H^{'}

. Similarly, the loading performance is predicted using a deep neural network,

P^{load} = ψ (h, a^{load})

, after first cutting out the local heightmap at the dig location and selected heading. The model has three convolutional layers to encode

h

before being fed with

a^{load}

to a multi-layer perceptron. The swish activation is used for all layers. The pile state and performance predictor models are illustrated in Figure 2 and Figure 3, respectively.
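The cutout and replace operations around the local network can be sketched as follows. This is a simplified illustration that assumes an axis-aligned square window and ignores the rotation to the selected heading; the function names and the stand-in `phi_local` are hypothetical and do not reproduce the implementation in [3].

```python
from typing import Callable, List

Grid = List[List[float]]


def cutout(H: Grid, row: int, col: int, size: int) -> Grid:
    """Extract a size x size local heightmap whose top-left cell is (row, col)."""
    return [r[col:col + size] for r in H[row:row + size]]


def replace(H: Grid, h_new: Grid, row: int, col: int) -> Grid:
    """Return a copy of H with the local patch h_new written back at (row, col)."""
    out = [list(r) for r in H]
    for i, patch_row in enumerate(h_new):
        out[row + i][col:col + len(patch_row)] = patch_row
    return out


def predict_pile(H: Grid, row: int, col: int, size: int,
                 phi_local: Callable[[Grid], Grid]) -> Grid:
    """Global predictor: cutout -> local prediction -> replace."""
    h = cutout(H, row, col, size)
    return replace(H, phi_local(h), row, col)


def lower_by_10cm(h: Grid) -> Grid:
    """Stand-in for the trained network: lowers the local surface by 0.1 m."""
    return [[max(0.0, z - 0.1) for z in r] for r in h]


H = [[1.0] * 4 for _ in range(4)]
H_next = predict_pile(H, 1, 1, 2, lower_by_10cm)
```

Only the cells inside the window change, which is what allows the network to operate on a small local heightmap instead of the whole pile.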

The models are trained and validated on a dataset from more than 10,000 random loading actions on various gravel pile shapes. The trained models, with roughly

10^{7}

parameters, achieved around 95% accuracy in predicting loading performance and 97% accuracy in predicting the resulting pile state.

As in [3], we use different sizes and resolutions of the local heightmap for the pile state and performance predictor models. For the former model, we discretize a square heightmap with 5.2 m sides in a

52 \times 52

grid. For the latter model, we use a

36 \times 36

grid and a side length of 3.6 m.

Optimal Loading Action Parameters

The wheel loader is equipped with the function of automatic bucket filling using admittance control. The controller is parameterized with four parameters

a^{load} \in {[0, 1]}^{4}

that control how reactive the boom lift and bucket tilt are to the perceived digging resistance, which in turn is an unknown function of the local pile shape. For each pile shape and soil strength, there exist some loading control parameters

a^{load *}

that optimize the loading performance:

a^{load *} = \underset{a^{load}}{argmin} w^{T} P^{load} .

(8)

The optimal control parameters can be computed using the gradient descent method with the iterative step:

a^{load} : = a^{load} - η \nabla (w^{T} P^{load}),

(9)

where

η

is the step length and

\nabla = \partial / \partial a^{load}

. As an initial guess, we use

a^{load} = {[1.0, 0.0, 0.5, 0.5]}^{T}

, which corresponds to the machine thrusting deep into the pile, filling the bucket with little or no lifting of it [3]. This type of action has been suggested for uniformly sloped piles of dry homogeneous soil [21], and we expect the optimization process to converge faster with this initialization. The maximum number of iterations is set to 30 with early stopping (patience 3 for tolerance

10^{- 4}

). The gradient with respect to

a^{load}

is calculated using PyTorch automatic differentiation (torch.autograd), exploiting that

ψ

has two input branches, separating

h

and

a^{load}

so that the encoding part only needs to be evaluated once during the optimization process.
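The iteration of Equation (9) with the stated stopping rule (at most 30 iterations, patience 3, tolerance 10^-4) might look like the sketch below. The quadratic cost, its analytic gradient, the target values, and the step length are hypothetical stand-ins; in the setting described here the cost would be the network output and the gradient would come from automatic differentiation.

```python
from typing import Callable, List, Sequence


def optimize_action(cost: Callable[[Sequence[float]], float],
                    grad: Callable[[Sequence[float]], List[float]],
                    a0: Sequence[float],
                    eta: float = 0.1,
                    max_iter: int = 30,
                    patience: int = 3,
                    tol: float = 1e-4) -> List[float]:
    """Gradient descent with early stopping on the scalar cost w^T P^load."""
    a = list(a0)
    best = cost(a)
    stale = 0  # consecutive iterations without an improvement larger than tol
    for _ in range(max_iter):
        g = grad(a)
        a = [ai - eta * gi for ai, gi in zip(a, g)]  # Equation (9)
        c = cost(a)
        if best - c > tol:
            best, stale = c, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return a


# Hypothetical smooth cost with its minimum at `target` (illustration only).
target = [0.8, 0.1, 0.4, 0.6]


def toy_cost(a: Sequence[float]) -> float:
    return sum((ai - ti) ** 2 for ai, ti in zip(a, target))


def toy_grad(a: Sequence[float]) -> List[float]:
    return [2.0 * (ai - ti) for ai, ti in zip(a, target)]


a_opt = optimize_action(toy_cost, toy_grad, [1.0, 0.0, 0.5, 0.5])
```

The sketch does not enforce the bounds on the parameters; a projection onto the unit interval after each step would be one way to do so and is not taken from the source.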

4.2. V-Turn Model

A loading cycle involves moving the wheel loader back and forth between the pile and the load receiver in a V-shaped pattern connecting the dump location

x^{dump}

and dig location

x^{dig}

. We estimate the transportation time and work by numerical integration of an assumed equation of motion for the wheel loader, including its motor strength, variable mass, and rolling resistance. For generating the V-turn paths, we use cubic B-splines [22]. In narrow spaces, Dubins curves may be the preferred choice to ensure that the vehicle’s minimum turning radius is respected [23]. Each V-turn path is formed by the combination of two spline curves. V-turn-1 is composed of the splines connecting

x^{dump} \to x^{V 1}

and

x^{V 1} \to x^{dig}

with the condition that the ingoing and outgoing directions should match in the switch back point

x^{V 1}

. Similarly, V-turn-2 is composed of two splines connecting

x^{dig} \to x^{V 2}

and

x^{V 2} \to x^{dump}

. See Figure 4 for an illustration.

To compute the splines, we follow the method and notations of [24]. The path to be found is represented by six control points,

{p_{i}}_{i = 0}^{5}

. These are obtained by solving the following equation system:

[\begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 \\ N_{0}^{(1)} (s_{0}) & N_{1}^{(1)} (s_{0}) & 0 & 0 & 0 & 0 \\ N_{0}^{(2)} (s_{0}) & N_{1}^{(2)} (s_{0}) & N_{2}^{(2)} (s_{0}) & 0 & 0 & 0 \\ 0 & 0 & 0 & N_{3}^{(2)} (s_{1}) & N_{4}^{(2)} (s_{1}) & N_{5}^{(2)} (s_{1}) \\ 0 & 0 & 0 & 0 & N_{4}^{(1)} (s_{1}) & N_{5}^{(1)} (s_{1}) \\ 0 & 0 & 0 & 0 & 0 & 1 \end{matrix}] [\begin{matrix} p_{0} \\ p_{1} \\ p_{2} \\ p_{3} \\ p_{4} \\ p_{5} \end{matrix}] = [\begin{matrix} q_{0} \\ α d_{0} \\ 0 \\ 0 \\ β d_{1} \\ q_{1} \end{matrix}],

(10)

where

N_{i}^{(j)} (s)

is the jth derivative of the ith spline basis function associated with

p_{i}

and curve parameter

s \in [0, 1]

. On the right-hand side,

q_{0}

and

q_{1}

are the spline endpoints with first derivatives

α d_{0}

and

β d_{1}

, respectively, where

α

and

β

are parameters for controlling the magnitudes of the derivatives.

For a given set of endpoints,

x^{dump}

and

x^{dig}

(and heading at these points), we compute the connecting splines, via

x^{V 1}

and

x^{V 2}

, that minimize the path length and curvature according to the following weighted objective function [24]:

γ_{1} \sum {(κ_{i})}^{2} Δ s_{i} + γ_{2} \sum {(\frac{d κ_{i}}{d s})}^{2} Δ s_{i} + γ_{3} \sum Δ s_{i},

(11)

with the weights set to

γ = [10.0, 10.0, 1.0]

to avoid sharp curves. We minimize the objective using Powell’s method and limit the search space such that the switch back points must reside in a box with side length

l_{2} = 10

m positioned at

l_{1} = 5

m from the endpoints, see Figure 4 for an illustration. These constraints are also chosen to shorten the reversing distance from

x^{dump}

and

x^{dig}

to the switch back points. This explains why the V-turn-1 and V-turn-2 paths are not identical although generated using the same

x^{dump}

and

x^{dig}

. The magnitudes of the derivatives at

x^{dump}

,

x^{dig}

,

x^{V 1}

, and

x^{V 2}

are fixed at 10, 30, 5, and 5, respectively, so that the paths are straight near these points. The paths are computed using scipy.optimize.

For each given V-turn path, the time and work for driving the vehicle is computed. The first step is to compute the velocity profile

v (t)

and traction force

f (t)

. The velocity is obtained by numerical integrations of the equation of motion:

M_{tot} \frac{d v}{d t} + C_{r} = f,

(12)

where the total variable mass

M_{tot} = M_{vehicle} + M_{load}

is composed of the vehicle mass

M_{vehicle}

,

M_{load}

is the loaded mass (zero for V-turn-1), and

C_{r} = μ_{r} M_{tot} g

is the rolling resistance with gravity acceleration g and rolling resistance coefficient

μ_{r}

. The machine is accelerated by a traction force

f (t)

to reach a preset target velocity during the transportation phase. We control

f (t)

via the throttle as described in [25]. When approaching an endpoint, the machine is decelerated by a constant brake force

f = f_{brake}

. The V-turn duration is computed by the time difference between the endpoints and the work as

\int \max [0, f (t)] v (t) d t

along the path. Note that only positive work is accounted for in the work computation, i.e., no energy can be accumulated during the decelerating phase. Also, we assume the vehicle follows the prescribed path precisely, with no required work for steering and with the bucket and boom angles held fixed during transport. We use a time step of 10 ms,

M_{vehicle} = 15.2

tonne,

μ_{r} = 0.01

, and the input throttle value can increase with a rate of

2.0

units/s. The target speed is 8.0 km/h for both reverse and forward driving, except within the last 5 m before

x^{dig}

where it increases to 11.4 km/h. Example V-turns and trajectories with the resulting speed and traction force are shown in Figure 5.
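The time and work computation can be sketched for a single straight leg with explicit Euler integration. This is a deliberately simplified illustration: the rolling resistance is treated as a constant force C_r = μ_r M_tot g, the throttle dynamics of [25] are replaced by a constant maximum traction force, and the values of `f_max` and `f_brake` are hypothetical. The vehicle mass, rolling resistance coefficient, time step, and target speed follow the values stated above.

```python
def simulate_leg(distance: float,
                 m_load: float = 0.0,
                 m_vehicle: float = 15200.0,   # kg (15.2 tonne)
                 mu_r: float = 0.01,
                 v_target: float = 8.0 / 3.6,  # 8.0 km/h in m/s
                 f_max: float = 40e3,          # N, hypothetical traction limit
                 f_brake: float = -40e3,       # N, hypothetical brake force
                 dt: float = 0.01,             # s (10 ms time step)
                 g: float = 9.81):
    """Return (time, positive work) for driving `distance` metres from rest to rest."""
    m_tot = m_vehicle + m_load
    c_r = mu_r * m_tot * g            # rolling resistance force
    decel = (c_r - f_brake) / m_tot   # deceleration magnitude while braking
    x = v = t = work = 0.0
    braking = False
    while x < distance:
        # Start braking once the remaining distance equals the stopping distance.
        if not braking and distance - x <= v * v / (2.0 * decel):
            braking = True
        if braking:
            f = f_brake
        elif v < v_target:
            f = f_max                 # accelerate toward the target speed
        else:
            f = c_r                   # hold speed: traction balances resistance
        v += dt * (f - c_r) / m_tot
        if v <= 0.0:                  # the vehicle has come to rest
            break
        work += max(0.0, f) * v * dt  # only positive traction work is counted
        x += v * dt
        t += dt
    return t, work


t_empty, w_empty = simulate_leg(20.0)                   # e.g., V-turn-1 leg, empty bucket
t_loaded, w_loaded = simulate_leg(20.0, m_load=3000.0)  # same leg with a hypothetical 3 t load
```

With these assumptions, a loaded leg requires more work than an empty one over the same distance, since both the inertia and the rolling resistance grow with the total mass.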

To save computational time during end-to-end optimization, we pre-compute many optimal V-turns using a grid of dig locations and different headings. The result is stored in a look-up table. During optimization, the estimated time and work for the V-turns are interpolated from the look-up table.
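A look-up of this kind could be implemented with bilinear interpolation over a regular grid of dig locations for a fixed heading, as in the sketch below. The grid, the tabulated values, and the function name are hypothetical, and the interpolation over heading is omitted; the source does not specify which interpolation scheme is used.

```python
from bisect import bisect_right
from typing import List, Sequence


def bilinear(xs: Sequence[float], ys: Sequence[float],
             table: List[List[float]], x: float, y: float) -> float:
    """Interpolate table[i][j], tabulated at (xs[i], ys[j]), at the point (x, y)."""
    # Locate the grid cell containing (x, y), clamped to the table bounds.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])


# Hypothetical pre-computed V-turn times (s) on a coarse grid of dig locations (m).
xs = [0.0, 1.0, 2.0]
ys = [0.0, 2.0]
times = [[2.0 * x + 3.0 * y for y in ys] for x in xs]

t_est = bilinear(xs, ys, times, 0.5, 1.0)
```

Because the toy table is linear in both coordinates, the interpolated value equals the underlying function at the query point, which gives a simple check of the implementation.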

4.3. Dumping

We assume the machine always empties the bucket at a fixed location at the receiver, without considering the shape of the loaded mass. Therefore, the emptying time is fixed at 5 s, and no work is associated with the dumping action.

4.4. Look-Ahead Tree Search

To find a near-optimal solution, we develop a look-ahead tree search algorithm that combines the world model and V-turn model for long-horizon predictions. The planning horizon is denoted by integer N and the search depth by

d \leq N

. At each planning step,

n = 1, 2, \dots, N

, a finite set of I candidate dig locations

{x_{i}^{dig}}_{i = 1}^{I}

and corresponding loading actions

{a_{i}^{load}}_{i = 1}^{I}

are considered. We are to select the dig location and loading action that are optimal over the search depth d. At the most shallow search depth,

d = 1

, this corresponds to picking the candidate that minimizes

w^{T} P_{n} (H_{n}, x_{n}^{dig}, a_{n}^{load})

. At the first planning step,

n = 1

, we know the pile state

H_{1}

and can make this pick. For the future planning steps, we must first expand the pile into its next state from the previous,

H_{n + 1} = H_{n}^{'} \equiv Φ (H_{n}, x_{n}^{dig}, a_{n}^{load})

. At larger search depths,

d \geq 2

, we compute the predicted net performance over the search horizon d using the following approximation of the evaluation function:

\begin{matrix} Q_{n}^{d} & = \sum_{k = n}^{n - 1 + d} w^{T} P_{k} (H_{k}, x_{k}^{dig}, a_{k}^{load}) \\ & ≲ w^{T} P_{n} (H_{n}, x_{n}^{dig}, a_{n}^{load}) + \sum_{k = n + 1}^{n - 1 + d} w^{T} P_{k} ({\bar{H}}_{k}, {\bar{x}}_{k}^{dig}, {\bar{a}}_{k}^{load}), \end{matrix}

(13)

where

{\bar{x}}_{k}

and

{\bar{a}}_{k}

are greedy choices given the pile state

{\bar{H}}_{k}

; that is, those that minimize

w^{T} P_{k} ({\bar{H}}_{k}, {\bar{x}}_{k}, {\bar{a}}_{k})

. The pile

{\bar{H}}_{k}

is, in turn, the result of an expansion from the pile state at the previous search depth,

{\bar{H}}_{k - 1}

, using a greedy choice at that level too (if there is a maximum number of loading cycles, N, then the summation in Equation (13) is limited by this in case

n + d > N

. Also, if

d = 1

, there is no second or higher term on the right-hand side). This approximation avoids an exhaustive search over all action candidates. Evaluating Equation (13) requires that the top-level pile state

H_{n}

is expanded into all the candidate piles

{H_{i}^{'}}_{i = 1}^{I}

using

{x_{i}^{dig}}_{i = 1}^{I}

and

{a_{i}^{load}}_{i = 1}^{I}

. The algorithm is displayed in Algorithm 1 and pictorially illustrated in Figure 1.

Algorithm 1: Look-ahead tree search with depth d and planning horizon N
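The following sketch illustrates the search as described in the text: every top-level candidate is expanded, deeper levels are filled in with greedy choices, and the evaluation is truncated at the planning horizon. It is not a reproduction of Algorithm 1. The toy cost table, the state representation (number of loadings already made at each dig location), and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # toy pile state: loadings already made at each location

# Hypothetical scalar costs w^T P per loading. Location 1 is costly at first
# but becomes favorable once its face has been opened up by one loading.
COSTS = {0: [3.0], 1: [4.0, 1.0]}


def toy_cost(pile: State, a: int) -> float:
    seq = COSTS[a]
    return seq[min(pile[a], len(seq) - 1)]


def toy_phi(pile: State, a: int) -> State:
    new = list(pile)
    new[a] += 1
    return tuple(new)


def toy_candidates(pile: State) -> List[int]:
    return [0, 1]


def lookahead_plan(h1: State, n_cycles: int, depth: int,
                   candidates: Callable = toy_candidates,
                   cost: Callable = toy_cost,
                   phi: Callable = toy_phi):
    """Plan n_cycles loadings; depth=1 reduces to the greedy strategy."""
    pile, plan, total = h1, [], 0.0
    for n in range(1, n_cycles + 1):
        # Greedy steps evaluated after the first one, truncated at the horizon.
        steps_ahead = min(depth, n_cycles - n + 1) - 1
        best_q, best_a = float("inf"), None
        for a in candidates(pile):
            q = cost(pile, a)       # first term of the evaluation function
            h = phi(pile, a)        # expand into the candidate pile state
            for _ in range(steps_ahead):
                a_bar = min(candidates(h), key=lambda c: cost(h, c))
                q += cost(h, a_bar)  # greedy choice at the deeper level
                h = phi(h, a_bar)
            if q < best_q:
                best_q, best_a = q, a
        total += cost(pile, best_a)
        plan.append(best_a)
        pile = phi(pile, best_a)
    return plan, total


greedy_plan, greedy_total = lookahead_plan((0, 0), n_cycles=3, depth=1)
deep_plan, deep_total = lookahead_plan((0, 0), n_cycles=3, depth=2)
print(greedy_plan, greedy_total, deep_plan, deep_total)
```

In this toy setting, the greedy plan never pays the initial cost at location 1 and ends up with a higher total cost than the plan found with depth 2, which mirrors the motivation for looking ahead.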

4.4.1. Greedy Strategy

We refer to the choice of

d = 1

as the entirely greedy strategy. The dig location is always selected to optimize the one-step evaluation function

Q_{n}^{d = 1} = Q_{greedy} \equiv w^{T} P_{n}

with the loading action optimized for the local pile state at the dig location. In this case, the pile state

H_{n}

needs only to be expanded once and not into multiple pile state candidates.

4.4.2. Maximum Loading Strategy

Setting

Q_{n}^{d = 1} = Q_{loading} \equiv w^{T} {P_{n}}^{load}

results in dig locations and loading actions that are optimized for short-term loading performance while ignoring the V-turn cost. We refer to this as the maximum loading strategy.

4.4.3. Nominal Strategy

For comparison, we also test the strategy of always selecting the dig location that has the lowest transportation cost, irrespective of the loading performance. This is accomplished with

Q_{n}^{d = 1} = Q_{nominal} \equiv w^{T} ({P_{n}}^{V 1} + {P_{n}}^{V 2})

. We refer to this as the nominal strategy as it is a standard method [26] and the natural choice when there is no access to a world model.

4.5. Computational Time

The computational time is what prevents us from computing the optimal solution by exhaustive search and makes us instead approximate the evaluation function by Equation (13) at higher search depths. We register the computational time for analyzing a single loading cycle. It includes predicting the new pile state

H^{'}

and total performance

P

, which involves updating

a^{load}

according to Equation (9) (dominant part) and computing and integrating the V-turns given a selected dig location

x^{dig}

as described in Section 4.2. Table 1 shows the average computational time from profiling the loading predictions made on a desktop computer with an Intel(R) Core(TM) i7-8700K, 3.70 GHz, 32 GB RAM on a Windows 64-bit system and NVIDIA GeForce RTX 2070 SUPER.

The computational time for solving the end-to-end optimization problem depends on the given time for search, the number of child nodes per node (I), and the tree search depth (d). For reference, a greedy search takes about 29.4 s for

d = 40

,

I = 10

with 73.5 ms per prediction as it requires

I \times d

predictions.
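The quoted figures can be reproduced from the per-function timings in Table 1. The following sketch sums those timings and applies the I × d prediction count; the dictionary keys are shorthand labels chosen here, not identifiers from the authors' code.

```python
# Per-function timings from Table 1, in milliseconds (labels are shorthand).
TABLE_1_MS = {
    "cutout": 9.0,
    "phi": 2.5,
    "replace": 12.0,
    "psi_load": 45.0,  # approximate; about 1.5 ms per gradient-descent iteration
    "psi_v1": 2.5,
    "psi_v2": 2.5,
}


def prediction_time_ms(parts: dict = TABLE_1_MS) -> float:
    """Time for one loading-cycle prediction: the sum of its parts."""
    return sum(parts.values())


def search_time_s(n_predictions: int, ms_per_prediction: float) -> float:
    """Total search time in seconds for a given number of predictions."""
    return n_predictions * ms_per_prediction / 1000.0


if __name__ == "__main__":
    per_prediction = prediction_time_ms()
    child_nodes, depth = 10, 40  # I and d from the text
    print(f"one prediction: {per_prediction:.1f} ms")
    print(f"I x d = {child_nodes * depth} predictions: "
          f"{search_time_s(child_nodes * depth, per_prediction):.1f} s")
```

The loading-action update dominates: at roughly 1.5 ms per iteration, 45 ms corresponds to about 30 gradient-descent iterations per prediction.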

5. Results

We test the look-ahead search algorithm considering a wheel loader performing a task of

N = 15

sequential loading cycles. First, we analyze the greedy strategy (prediction horizon

d = 1

) and compare it with the maximum loading and the nominal strategies. Next, we analyze the look-ahead tree search algorithm with different search depths to examine the effect of the prediction horizon. The weights were fixed at

w = [2, 1, 1]

such that production (loaded mass) and cost (loading time and mechanical work) contribute equally to the overall loading performance.

The initial pile surface

H_{1}

features a trapezoidal prism, 1.8 m tall with a

30^{\circ}

slope at the front, with Perlin noise [27] added to the surface to avoid ending up in the same local minima. A representative initial pile surface is illustrated in Figure 6.

The location and orientation of the receiver are fixed at

x^{dump} = [x, y, θ] = [- 12.0 m, - 3.0 m, - 30.0 °]

. The dig locations are constrained within

- 5.0 m \leq x \leq 8.0 m

and

0.0 m \leq y \leq 6.0 m

. The listup function generates candidate dig locations

{\tilde{x}}^{dig}

with a spacing of 1 m.
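How listup works internally is not specified in this section. As a rough sketch only, the code below assumes that the pile edge is available as a polyline in the ground plane, samples it at a fixed arc-length spacing, and keeps the samples inside the constraint box given above. The function names and the polyline representation are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def sample_edge(edge: List[Point], spacing: float = 1.0) -> List[Point]:
    """Return points at fixed arc-length spacing along a polyline."""
    samples = [edge[0]]
    carried = 0.0  # arc length travelled since the last emitted sample
    for (x0, y0), (x1, y1) in zip(edge, edge[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        s = spacing - carried  # distance into this segment of the next sample
        while s <= seg:
            t = s / seg
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            s += spacing
        carried = seg - (s - spacing)
    return samples


def in_search_region(p: Point) -> bool:
    """Constraint box from the text: -5 m <= x <= 8 m and 0 m <= y <= 6 m."""
    x, y = p
    return -5.0 <= x <= 8.0 and 0.0 <= y <= 6.0


def list_candidates(edge: List[Point], spacing: float = 1.0) -> List[Point]:
    """Hypothetical stand-in for listup: edge samples inside the search region."""
    return [p for p in sample_edge(edge, spacing) if in_search_region(p)]


if __name__ == "__main__":
    straight_edge = [(-5.5, 2.0), (8.5, 2.0)]  # made-up pile edge
    for point in list_candidates(straight_edge):
        print(point)
```

A longer or more irregular edge inside the box yields more candidates, which is one way the number of predictions per search can grow as the pile is reshaped.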

5.1. Greedy Strategy

The greedy strategy with no horizon,

d = 1

, is tested using the three evaluation functions described in Section 4.4.1, Section 4.4.2 and Section 4.4.3:

Q_{greedy} = w^{T} P_{n}

,

Q_{loading} = w^{T} P_{n}^{load}

, and

Q_{nominal} = w^{T} (P_{n}^{V 1} + {P_{n}}^{V 2})

. The pile states after five loading cycles are shown in Figure 7. The greedy strategy results in loadings that aim for areas in the search region with more soil easily accessible from the dump truck. The maximum loading strategy aims to the right where there is more soil but further from the dump truck. The nominal strategy loads to the left, in proximity to the dump truck, as can be expected when focusing entirely on minimizing the transportation cost.

The evolution of the loaded mass, time, and work over 15 loading cycles is shown in Figure 8, and the net result is listed in Table 2. The greedy strategy performs the best on average and in total. The nominal strategy’s performance drops over time, suggesting that the pile state is deteriorating.

The greedy strategy is 8% more productive and energy efficient than the nominal strategy, using nearly the same time to complete 15 cycles. The maximum loading strategy is 1% more productive than the greedy strategy in loading tasks but spends around 8% more time and energy during V-turns. That is an expected result, as transportation time and energy do not contribute to the evaluation function of the maximum loading strategy. The analysis demonstrates that the evaluation function works effectively.
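The comparisons can be checked against the totals in Table 2. The sketch below computes a few relative differences; the dictionary layout and key names are chosen here for illustration, and "load-phase productivity" is taken to mean loaded mass per second of loading time, which is an assumption about how the 1% figure was obtained.

```python
# Totals from Table 2: M [tonne], T [s], W [MJ]; key names are illustrative.
TABLE_2 = {
    "greedy": {"M": 64.4, "T_load": 197, "T_vturn": 421, "T": 694, "W": 15.5},
    "max_loading": {"M": 62.6, "T_load": 189, "T_vturn": 453, "T": 717, "W": 16.4},
    "nominal": {"M": 59.8, "T_load": 260, "T_vturn": 362, "T": 697, "W": 16.9},
}


def rel_diff(value: float, reference: float) -> float:
    """Relative difference of `value` with respect to `reference`."""
    return (value - reference) / reference


def load_productivity(row: dict) -> float:
    """Loaded mass per second spent in the loading phase (assumed definition)."""
    return row["M"] / row["T_load"]


if __name__ == "__main__":
    g, m, n = TABLE_2["greedy"], TABLE_2["max_loading"], TABLE_2["nominal"]
    print(f"greedy vs nominal, mass: {rel_diff(g['M'], n['M']):+.1%}")
    print(f"greedy vs nominal, work: {rel_diff(g['W'], n['W']):+.1%}")
    print(f"greedy vs nominal, time: {rel_diff(g['T'], n['T']):+.1%}")
    print("max loading vs greedy, load-phase productivity: "
          f"{rel_diff(load_productivity(m), load_productivity(g)):+.1%}")
    print("max loading vs greedy, V-turn time: "
          f"{rel_diff(m['T_vturn'], g['T_vturn']):+.1%}")
```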

5.2. Long-Horizon Planning with Look-Ahead Tree Search

The second test uses the tree search algorithm with finite prediction horizon d and evaluation function (13). Ten initial heightfields were prepared with different Perlin noise applied to the 1.8 m high trapezoidal prism. For each of these ten piles,

N = 15

sequential loading cycles were evaluated with the prediction horizon ranging from

d = 1

to

d = N

. The resulting loading outcome and performance evaluation function are shown in Figure 9. The results were averaged over the ten repetitions starting with the different initial piles. The performance converges around

d = 4

. Beyond this point, increasing the search horizon further does not enhance the overall performance.

The performance improves by about 5.6% on average when comparing the results between

d = 1

and

d = 4

. The total loaded mass is marginally reduced from about 64.7 to 64.0 tonnes (1.1%), while the total loading time and the work improve from about 665 to 632 s (5.0%) and from 14.9 to 13.9 MJ (6.7%), respectively. Note that we also tested with planning horizons of

N = 10

and

N = 5

. The results at

N = 10

showed a similar improvement ratio, but this trend was not found for

N = 5

, probably because of the shorter task length. As we have already found that the greedy strategy (

d = 1

) is 8% more performant than the nominal strategy, we conclude that the look-ahead tree search leads to a 14% increase in performance relative to the nominal strategy.
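The percentages in this subsection follow from the reported totals. The last line below is an interpretation rather than something the text states: it assumes that the 8% advantage of the greedy over the nominal strategy from Section 5.1 and the 5.6% gain from d = 1 to d = 4 combine multiplicatively.

```latex
\begin{align*}
\frac{64.7 - 64.0}{64.7} &\approx 1.1\%, &
\frac{665 - 632}{665} &\approx 5.0\%, &
\frac{14.9 - 13.9}{14.9} &\approx 6.7\%, \\
(1 + 0.08)\,(1 + 0.056) &= 1.14048 \approx 1.14 .
\end{align*}
```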

The computational cost of the search increases with the prediction horizon d. On average, the search took 250 and 10,843 predictions with

d = 1

and 4, respectively. The computational time was about 18 and 792 s, respectively, given that a prediction takes 73 ms on average. The computational cost for deciding an immediate optimal action is shown in Figure 10.

The search with

d = 15

required 3181 predictions to decide the first action, which amounts to 232 s, given a prediction takes 73 ms on average. The searches with

d = 4

and

d = 1

required 593 and 13 predictions to decide the first action, amounting to 43 s and 0.9 s, respectively. Note that the number of predictions depends on the prediction horizon, the resolution of listup, and the length along the pile edge within the constraint region. The loading action sequence can make the pile state complex, which can increase the number of predictions.
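To put these planning times in perspective, they can be compared with the duration of one loading cycle. The sketch below assumes that the total time of about 632 s reported for d = 4 covers all N = 15 cycles, so that the average cycle takes roughly 42 s; the variable and function names are illustrative.

```python
MS_PER_PREDICTION = 73.0  # average time per prediction quoted in the text

# Prediction counts needed to decide the first action, keyed by depth d.
FIRST_ACTION_PREDICTIONS = {1: 13, 4: 593, 15: 3181}

# Assumed average cycle duration: total time at d = 4 divided by N = 15.
AVG_CYCLE_S = 632.0 / 15


def planning_time_s(n_predictions: int) -> float:
    """Planning time in seconds for a given number of predictions."""
    return n_predictions * MS_PER_PREDICTION / 1000.0


if __name__ == "__main__":
    for depth, count in FIRST_ACTION_PREDICTIONS.items():
        t = planning_time_s(count)
        print(f"d = {depth:2d}: {t:6.1f} s "
              f"({t / AVG_CYCLE_S:.2f} average cycle times)")
```

Under this assumption, planning with d = 4 takes about as long as executing one cycle, so the next action could in principle be planned while the current cycle is carried out.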

The optimal loading locations are visualized in Figure 11 for prediction horizons

d = 1

and 4.

The first dig locations are rather similar, but the consecutive sequences are distinct from each other. The greedy search in Figure 11a starts at

x = 0

, continues to the right, and then switches to the left. The tree search method in Figure 11b starts a little closer to the dump truck and stays consistently to the left, closer to the dump truck, working its way into the pile with approximately three buckets’ width. The conclusion is that the tree search method transforms the pile in a way that maintains future loading with good outcomes while keeping proximity to the dump truck.

6. Discussion

With the look-ahead search method, it is possible to find action sequences that maximize the total performance over long horizons by improving the pile state for future loadings. The method adapts to the truck’s location, including a location that varies between cycles, provided that it is known in advance. In our experiments, the next optimal action can be computed (before any serious code optimization) in about 43 s with a look-ahead search with depth

d = 4

, which is close to the average loading cycle time. Future work should consider other soil properties, such as cohesive or inhomogeneous soil. The resulting pile state is likely to depend more strongly on the selected loading actions than it does for gravel, for which the pile always tends to settle with a local slope near the natural angle of repose. To support more realistic conditions, one should extend the framework to support non-flat ground and spillage. Capturing this with a world model would be more demanding and might benefit from a longer horizon than

d = 4

. That would motivate substantial improvements in the computational efficiency of the look-ahead search. In this study, we employed a set of specific weight factors that equally balance productivity (mass per unit time) and energy efficiency (mass per unit energy). It would be interesting, in future work, to explore the impact of different weight factors on the result, e.g., what digging behavior emerges when energy consumption is heavily penalized.

7. Conclusions

We describe and test a look-ahead tree search method to find near-optimal loading action sequences. This method uses a world model to predict future pile states and the performance for each loading cycle. It can dynamically adjust to varying conditions and constraints of the work site. Our results show significant performance improvements, showcasing the benefits of incorporating long-term predictions into decision making. The look-ahead tree search results in a 6% higher performance in sequential loading than a greedy strategy and 14% higher than the nominal strategy. Moreover, the tree search using long-horizon predictions leads to different decisions for immediate actions. This raises the question, for future work, of how sensitive the optimized action sequence is to variations in soil properties.

Author Contributions

Conceptualization and methodology, K.A., E.W. and M.S.; software and validation, K.A.; formal analysis, K.A.; investigation, K.A. and M.S.; resources, K.A.; writing—original draft preparation, K.A.; writing—review and editing, K.A., E.W. and M.S.; visualization, K.A. and M.S.; supervision and project administration, E.W. and M.S.; funding acquisition, K.A. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was in part funded by Komatsu Ltd.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon reasonable request from the corresponding author.

Acknowledgments

This research was supported in part by Komatsu Ltd. and Algoryx Simulation AB. We thank Erik Wallin and Arvid Fälldin for their valuable suggestions and implementation work to improve the prediction speed.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DEM	Discrete element method
LHD	Load–haul–dump

References

1. Singh, S.P.; Narendrula, R. Factors affecting the productivity of loaders in surface mines. Int. J. Min. Reclam. Environ. 2006, 20, 20–32.
2. Singh, S.; Simmons, R.G. Task planning for robotic excavation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Raleigh, NC, USA, 7–10 July 1992; Volume 92, pp. 1284–1291.
3. Aoshima, K.; Fälldin, A.; Wadbro, E.; Servin, M. World modeling for autonomous wheel loaders. Automation 2024, 5, 259–281.
4. Ding, J. Understanding World or Predicting Future? A Comprehensive Survey of World Models. arXiv 2024, arXiv:2411.14499.
5. Lever, P.J. An automated digging control for a wheel loader. Robotica 2001, 19, 497–511.
6. Wang, L.; Ye, Z.; Zhang, L. Hierarchical planning for autonomous excavator on material loading tasks. In Proceedings of the 38th International Symposium on Automation and Robotics in Construction, Dubai, United Arab Emirates, 2–4 November 2021; pp. 827–834.
7. Dobson, A.; Marshall, J.; Larsson, J. Admittance Control for Robotic Loading: Design and Experiments with a 1-Tonne Loader and a 14-Tonne Load-Haul-Dump Machine. J. Field Robot. 2017, 34, 123–150.
8. Singh, S.; Cannon, H. Multi-resolution planning for earthmoving. In Proceedings of the 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146), Leuven, Belgium, 20 May 1998; pp. 121–126.
9. Magnusson, M.; Kucner, T.; Lilienthal, A.J. Quantitative evaluation of coarse-to-fine loading strategies for material rehandling. In Proceedings of the 2015 IEEE International Conference on Automation Science and Engineering (CASE), Gothenburg, Sweden, 24–28 August 2015; pp. 450–455.
10. Backman, S.; Lindmark, D.; Bodin, K.; Servin, M.; Mörk, J.; Löfgren, H. Continuous control of an underground loader using deep reinforcement learning. Machines 2021, 9, 216.
11. Lindmark, D.M.; Servin, M. Computational exploration of robotic rock loading. Robot. Auton. Syst. 2018, 106, 117–129.
12. Sarata, S.; Weeramhaeng, Y.; Tsubouchi, T. Planning of Scooping Position and Approach Path for Loading Operation by Wheel Loader. In Proceedings of the 22nd International Symposium on Automation and Robotics in Construction, Ferrara, Italy, 11–14 September 2005.
13. Takei, T.; Hoshi, T.; Sarata, S.; Tsubouchi, T. Simultaneous determination of an optimal unloading point and paths between scooping points and the unloading point for a wheel loader. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 5923–5929.
14. Alshaer, B.; Darabseh, T.; Alhanouti, M. Path planning, modeling and simulation of an autonomous articulated heavy construction machine performing a loading cycle. Appl. Math. Model. 2013, 37, 5315–5325.
15. Shi, J.; Sun, D.; Qin, D.; Hu, M.; Kan, Y.; Ma, K.; Chen, R. Planning the trajectory of an autonomous wheel loader and tracking its trajectory via adaptive model predictive control. Robot. Auton. Syst. 2020, 131, 103570.
16. Hong, B.; Ma, X. Path planning for wheel loaders: A discrete optimization approach. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–6.
17. Nezhadali, V.; Frank, B.; Eriksson, L. Wheel loader operation—Optimal control compared to real drive experience. Control Eng. Pract. 2016, 48, 1–9.
18. Sardarmehni, T.; Song, X. Path planning and energy optimization in optimal control of autonomous wheel loaders using reinforcement learning. IEEE Trans. Veh. Technol. 2023, 72, 9821–9834.
19. Filla, R.; Frank, B. Towards finding the optimal bucket filling strategy through simulation. In Proceedings of the 15th Scandinavian International Conference on Fluid Power, Linköping, Sweden, 7–9 June 2017; pp. 402–417.
20. Frank, B.; Kleinert, J.; Filla, R. Optimal control of wheel loader actuators in gravel applications. Autom. Constr. 2018, 91, 1–14.
21. Bradley, D.A.; Seward, D.W. The development, control and operation of an autonomous robotic excavator. J. Intell. Robot. Syst. 1998, 21, 73–97.
22. Piegl, L.; Tiller, W. The NURBS Book; Springer: Berlin/Heidelberg, Germany, 1996.
23. Dubins, L.E. On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. Am. J. Math. 1957, 79, 497–516.
24. Maekawa, T.; Noda, T.; Tamura, S.; Ozaki, T.; Machida, K.-i. Curvature continuous path generation for autonomous vehicle using B-spline curves. Comput.-Aided Des. 2010, 42, 350–359.
25. Aoshima, K.; Servin, M.; Wadbro, E. Simulation-based optimization of high-performance wheel loading. In Proceedings of the 38th International Symposium on Automation and Robotics in Construction, Dubai, United Arab Emirates, 2–4 November 2021; pp. 688–695.
26. Hong, B.; Ma, X. Path optimization for a wheel loader considering construction site terrain. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 2098–2103.
27. Perlin, K. An image synthesizer. In Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, San Francisco, CA, USA, 22–26 July 1985; pp. 287–296.

Figure 1. Overview of the look-ahead tree search algorithm for a wheel loader moving soil in V-shaped loading cycles from the evolving pile to a dump truck.

Figure 2. Illustration of the pile state predictor model. The neural network,

ϕ

, has an encoder–decoder structure.

Figure 3. Illustration of the performance predictor model. Since the encoder does not depend on

a^{load}

, it and its gradient only need to be computed once for each dig location.

Figure 4. An example of V-turn-1 (gold) and V-turn-2 (cyan) connecting

x^{dump}

and

x^{dig}

via the switch back points

x^{V 1}

and

x^{V 2}

that must lie inside the square search regions (gray). Headings are indicated with dashed lines.

Figure 5. An example of the speed and the force at V-turn trajectories.

Figure 6. The initial heightfield with the dump truck area indicated to the left and the search region for dig candidates to the right. The size of the bucket is shown in front of the search region.

Figure 7. The pile shape after five loading cycles using a greedy strategy (

d = 1

) and the three different evaluation functions.

Figure 8. The performance at each loading cycle using the greedy, maximum loading, and nominal strategy. The time and work are split per subtask, normalized, and V-turn values are negated for ease of comparison.

Figure 9. The total performance of the optimal sequence, the evaluated values, and the total number of predictions per search as functions of the prediction horizon d. The values are averages over all test results with the ten different initial piles.

Figure 10. The number of predictions per loading cycle in a search with planning horizon

N = 15

. The line opacity increases with prediction horizon d from

d = 2

(light) to

d = 15

(dark). The golden lines are the trend of

d = 1

(dashed) and

d = 4

(solid).

Figure 11. Visualization of the optimized sequence of dig locations

\{x_{n}^{dig}\}_{n = 1}^{15}

computed using greedy search (

d = 1

) and tree search (depth

d = 4

). The first dig location is marked with a green frame, and later locations with increasing transparency.

Table 1. Profiling a single loading cycle prediction.

Function		Time [ms]	Dominant Computing Cost
$Φ$	`cutout`	9.0	heightfield rotation
	$ϕ$	2.5
	`replace`	12.0	heightfield rotation
$Ψ$	$ψ^{load} \to a^{load}$	∼45.0	`grad-descent` (1.5 [ms] × iteration)
	$ψ^{V 1}$	2.5	generating 2 × cubic spline
	$ψ^{V 2}$	2.5	generating 2 × cubic spline
Total		∼73.5

Table 2. The total performance by the three variants of the greedy search strategy in units M [tonne], T [s], and W [MJ].

Strategy	Load M	Load T	Load W	V-Turns T	V-Turns W	Total T	Total W
Greedy	64.4	197	6.2	421	9.3	694	15.5
Max loading	62.6	189	6.0	453	10.4	717	16.4
Nominal	59.8	260	9.4	362	7.5	697	16.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
