Probabilistic Contingent Planning Based on Hierarchical Task Network for High-Quality Plans

Zhao, Peng; Liu, Xiaoyu; Su, Xuqi; Wu, Di; Li, Zi; Kang, Kai; Li, Keqin; Zhu, Armando

doi:10.3390/a18040214


by Peng Zhao ^1,*,†, Xiaoyu Liu ^2,†, Xuqi Su ^3, Di Wu ^2,4, Zi Li ^3, Kai Kang ^5, Keqin Li ^6 and Armando Zhu ^7

^1 Microsoft Corporation, Beijing 100080, China
^2 Guanghua School of Management, Peking University, Beijing 100080, China
^3 School of Aeronautics and Astronautics, Shanghai Jiaotong University, Shanghai 200240, China
^4 School of Systems and Computing, University of New South Wales, Canberra 2612, Australia
^5 School of Business, University of New South Wales, Canberra 2612, Australia
^6 Department of Computer Science, AMA University, Quezon 1106, Philippines
^7 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA

^* Author to whom correspondence should be addressed.
^† These authors contributed equally to this work.

Algorithms 2025, 18(4), 214; https://doi.org/10.3390/a18040214

Submission received: 25 December 2024 / Revised: 7 March 2025 / Accepted: 2 April 2025 / Published: 9 April 2025

(This article belongs to the Special Issue Algorithms and Optimization for Project Management and Supply Chain Management)


Abstract

Deterministic hierarchical task network (HTN) planning assumes that planning evolves along a fully predictable path and neglects the quality of the plan in partially observable environments. To bridge this research gap, this paper proposes a probabilistic contingent HTN planner, named the High-Quality Contingent Planner (HQCP), designed to generate high-quality plans within partially observable contexts. Our methodology extends conventional HTN planning formalisms to accommodate partial observability and evaluates these extensions with respect to plan cost. Additionally, we propose a novel heuristic for high-quality plans and develop the integrated planning algorithm. Empirical studies verify the effectiveness and efficiency of the planner both in probabilistic contingent planning and in achieving high-quality plans.

Keywords: probabilistic contingent planning; high-quality plans; hierarchical task network planning

1. Introduction

Planning to complete a set of predefined goals is a fundamental aspect of decision-making processes across numerous domains, particularly in scenarios characterized by intricate procedures such as logistics, manufacturing, mission planning, information extraction, and emergency response [1,2]. A significant challenge inherent in these domains is pervasive uncertainty, which often leads to frequent failures in the execution of plans. Classical planning typically assumes that the planning environment is fully observable: the planner has complete knowledge of the world’s states, and all states change deterministically under the planner’s control [3]. In reality, however, it is difficult or even impossible for an agent to obtain complete information at runtime; thus, the practical environment is only partially observable to the planner. Additionally, the environment is subject to dynamic influences from uncertain factors, causing some states to fluctuate constantly. This motivates the agent to plan under partial observability by taking the incomplete information into account beforehand and generating more robust plans [4,5].

Furthermore, in partially observable environments, uncertain states that remain unknown until they are directly sensed introduce significant variability in the quality of generated plans. Merely ensuring the feasibility of a plan is insufficient to meet the stringent requirements of such dynamic and unpredictable settings. It is imperative to evaluate the quality of a plan prior to its execution, anticipating potential uncertainties and their impact on performance. However, due to incomplete information, solely pursuing an optimal plan may not be feasible during real-world execution in partially observable environments [6]. The complexity and uncertainty inherent in these settings necessitate a nuanced approach to planning. A high-quality plan in a partially observable environment is one that includes multiple branches, each designed to address a different possible observation and to cope effectively with uncertain states as they arise, while minimizing the expected cost. Take fire emergency evacuation as an example. Several dwellers are trapped in a burning building. They may be in some of the rooms, but their specific locations cannot be determined in advance. There are many feasible evacuation plans, but they have different costs. The goal of planning in this domain is to evacuate all the dwellers at the least cost. Therefore, planning under partial observability for high-quality plans is a challenging but valuable problem.

Hierarchical task network (HTN) planning [7,8] is an efficient AI planning approach. It solves complicated planning problems through hierarchical decomposition, analogous to the human decision-making process. The main idea is to decompose abstract compound tasks into more specific ones until all tasks are primitive and can be executed directly. Owing to its use of specific domain knowledge, HTN planning offers powerful reasoning and good support for large-scale domains [9,10,11,12,13]. In addition, it has been shown to be more expressive than classical planning representations [7,14]. HTN planning has been successfully applied in many real-world projects [15,16,17,18,19,20].

Although HTN planning shows great strengths, it faces limitations in partially observable contexts and when high-quality plans are required. Studies of HTN planning under partial observability [21] are unable to distinguish between feasible plans of different quality. They can only obtain a feasible plan that accomplishes the objectives, without considering quality or cost metrics. Several HTN planning approaches for high-quality plans [22] concentrate on deterministic domains. However, when deployed in real-world scenarios, those approaches are hampered by incomplete observable information, resulting in failures. To the best of our knowledge, HTN planning for high-quality plans under partial observability has not yet been addressed. This significant gap motivates us to develop a planner capable of dealing with this problem.

In this paper, we put forward a novel probabilistic contingent planning approach based on HTN, named the High-Quality Contingent Planner (HQCP), designed to generate high-quality plans within the partially observable environment. We empirically evaluate the HQCP in two partially observable domains, medicate and extended ZenoTravel. The experimental results show that the HQCP performs effectively in terms of both solving probabilistic contingent planning problems and searching for high-quality plans under partial observability. Furthermore, the HQCP enhances the efficiency of the search process for identifying high-quality plans. The main contributions of this paper are summarized as follows:

We extend the existing HTN planning formalisms to partial observability and evaluate them with respect to cost.
We propose a novel heuristic to search for high-quality plans in probabilistic contingent HTN planning.
We develop the HQCP, which implements probabilistic contingent HTN planning with the heuristic.

The remainder of this paper is organized as follows. Section 2 provides a review of the related literature. Section 3 presents the formalism, definitions, and notation extended in this paper. Section 4 is dedicated to the proposed heuristic. The integrated planning algorithm is detailed in Section 5. Section 6 surveys the empirical evaluation, along with the results of the experiments conducted. Section 7 concludes this paper and outlines the directions for future research.

2. Literature Review

Various planning methodologies rooted in HTN planning have emerged to address partially observable environments, including contingent planning and probabilistic contingent planning. Contingent planning is characterized by uncertainty about the initial state of the world, sensing actions that obtain factual observations of belief states, and conditional plans for potential contingencies. For instance, CondSHOP2 [23] integrates a conditional forward-chaining search into the SHOP2 framework [24], replacing the original forward-chaining planning, and is thereby better able to deal with uncertain states. Probabilistic contingent planning is another significant approach to partial observability, which incorporates probabilistic reasoning and representation. C-SHOP [25], which builds on SHOP2, likewise uses belief states to handle incomplete information about the environment and sensing actions to generate a conditional plan. Its improved successor, PC-SHOP [26], inherits all the characteristics of C-SHOP while introducing probabilistic reasoning about the effects of actions. Tang et al. [21] propose a methodology that uses probabilistic HTNs as an effective way for agents to plan when their problem-solving knowledge is uncertain and the environment is non-deterministic. This approach models the environment as a Markov Decision Process (MDP) and employs Earley graphs to bridge HTNs and MDPs. Expanding the review scope to non-hierarchical planning, Majercik and Littman [27] propose a planner for probabilistic propositional contingent planning. Brafman and Shani [28] compile contingent planning problems with sensing actions into classical planning. Maliah et al. [29] propose an online contingent planner with a landmarks-based heuristic. Partially observable contingent planning has also been developed for the penetration testing problem [30]. Shani [31] develops a heuristic function for Partially Observable Stochastic Contingent Planning. However, it is important to note that these approaches primarily focus on identifying feasible plans.

Simultaneously, some studies have investigated HTN-based planning aimed at generating high-quality or even optimal plans. In the HTN planner SHOP2, a branch-and-bound search is used to find the plan with the minimum cost. Hogg et al. [32] propose an approach that integrates HTN learning with reinforcement learning to generate high-quality plans by learning the methods and their values. Luo et al. [33] put forward a genetic algorithm (MGA) to search for optimal solutions in HTN planning. A length-variant chromosome is introduced to represent a possible planning solution in the form of a decomposition tree with a dynamic number of nodes. Georgievski and Lazovik [34] propose employing HTN planning in risk-sensitive planning domains. This approach introduces utility functions that encapsulate the risk preferences associated with compound tasks and adapts a best-first search algorithm to take such utilities into account. HTNPLAN-P [22] generates preferred plans by combining the procedural control knowledge specified by HTNs with rich user preferences. It proposes a branch-and-bound algorithm together with a set of heuristics that leverage the HTN structure to measure the progress made towards satisfying the preferences. Behnke et al. [35] introduce the first SAT-based approach for optimal HTN planning, translating HTN planning into propositional logic to find optimal plans. This process requires bounding the length of the solution rather than the decomposition depth. Holler et al. [36] introduce two novel progression algorithms that avoid unnecessary branching when the problem at hand is partially ordered and show that both are sound and complete. They also introduce a method for applying arbitrary classical planning heuristics to guide the search in HTN planning, which relaxes the HTN planning model to a classical model that is only used for calculating heuristics. Shao et al. [37] propose an HTN planning model based on Monte Carlo Tree Search to find the optimal solution. In its planning process, the planning tree is built by Monte Carlo Tree Search to guide the HTN planner in choosing the best decomposition method. Behnke and Speck [38] present a novel approach to optimal totally ordered HTN planning based on symbolic search. This approach is modified to find cost-optimal plans. Yousefi and Bercher [39] introduce an approach to finding optimal solutions for Fully Observable Non-Deterministic (FOND) HTN planning. Its formalization and complexity have also been analyzed [40,41]. Nevertheless, all these studies have been limited to producing high-quality plans within fully observable environments. We address this significant gap by focusing on HTN planning for high-quality plans in uncertain contexts.

3. Problem Definition

To emphasize the primary contribution of this study, this section mainly discusses the necessary extensions with respect to partial observability and high quality. The remaining notions and concepts are analogous to those of classical HTN planning approaches [24]. To represent partially observable information about the environment, belief states are introduced. To observe uncertain information, sensing actions are defined. The plan is extended to a conditional form to cope with uncertain outcomes. To generate high-quality plans, we must first decide how to evaluate them quantitatively. Since cost can be evaluated directly in the planning domain, a natural choice is to use cost to distinguish plans of different quality. The evaluation of cost is therefore also discussed in this section.

3.1. Belief States

Definition 1.

(Belief State) A belief state is a pair $\mathit{bs} = (S, \mathit{Prob})$, where
$S$ is a set of ground states, which inherits the representation in PC-SHOP;
$\mathit{Prob}$ is a probability distribution over $s \in S$.

In this paper, the belief state is adopted to represent incomplete information in the partially observable environment. A belief state stands for a probability distribution over a set of ground states. This paper inherits the representation of the ground states in PC-SHOP, which is described in terms of fluents and their values. Note that the probabilities of the element states must sum to 1, namely

$$\sum_{s \in S} \mathit{Prob}(s) = 1$$

and $\mathit{Prob}(s) \neq 0$.
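The two conditions on a belief state can be checked mechanically. The following is a minimal sketch, assuming a ground state is encoded as a `frozenset` of (fluent, value) pairs; the class name `BeliefState` and the example fluent `dweller_in` are hypothetical and are not taken from PC-SHOP.

```python
import math
from dataclasses import dataclass


@dataclass
class BeliefState:
    """bs = (S, Prob): maps each ground state in S to its probability."""
    prob: dict  # assumed encoding: frozenset of (fluent, value) pairs -> probability

    def __post_init__(self):
        # Every ground state listed in S must carry non-zero probability.
        if any(p <= 0 for p in self.prob.values()):
            raise ValueError("each ground state must have non-zero probability")
        # The probabilities over S must sum to 1.
        if not math.isclose(sum(self.prob.values()), 1.0):
            raise ValueError("probabilities must sum to 1")

    @property
    def states(self):
        """The set S of ground states."""
        return set(self.prob)


# Hypothetical fire-evacuation belief: the dweller is in room1 or in room2.
room1 = frozenset({("dweller_in", "room1")})
room2 = frozenset({("dweller_in", "room2")})
bs = BeliefState({room1: 0.6, room2: 0.4})
```

Validating at construction time keeps every later step (sensing, cost expectation) free to assume a well-formed distribution.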

3.2. Task Network

A task network is a hierarchical representation of tasks that need to be accomplished to achieve a goal. This paper inherits the definitions of the task network in traditional HTN planning. It consists of a set of tasks and the constraints between them. The tasks in the task network include two categories: primitive tasks and compound tasks. Primitive tasks are basic actions that can be executed directly without further decomposition. Compound tasks are higher-level tasks that need to be decomposed into simpler subtasks. The constraints define relationships and dependencies among tasks within a task network. The task network is defined as below.

Definition 2.

(Task network) A task network is a triple $\omega = (C, P, \mathit{Constraints})$, where
$C$ is a finite set of compound tasks;
$P$ is a finite set of primitive tasks, including actuation primitive tasks and sensing primitive tasks;
$\mathit{Constraints}$ is a finite set of constraints.

3.3. Planning Domain Knowledge

Planning domain knowledge encompasses all the necessary information and rules that guide the HTN planning process, enabling effective task decomposition and plan reasoning. Unlike classical HTN planning, this paper expands the planning domain knowledge into the partially observable domain for high-quality plans. The planning domain knowledge and its components are defined as follows.

Definition 3.

(Planning domain knowledge) The planning domain knowledge is defined as $D = (O_{act}, O_{sense}, M)$, where
$O_{act}$ is a finite set of actuation operators;
$O_{sense}$ is a finite set of sensing operators;
$M$ is a finite set of methods.

Sensing actions aim to observe the planning environment and obtain observations regarding uncertain information. Although sensing actions are classified as actions, they do not transform any ground states and must be distinguished from actions that do change the state of the environment. In this paper, we therefore designate the actions that change states and lack sensing capability as actuation actions. Sensing actions have no positive or negative effects; instead, their outcome is embodied in the observation. As the result of sensing, an observation presents the real situation perceived by the agent in the partially observable environment and is treated as the precondition of the conditional plan: if the agent senses an observation, it executes the corresponding plan. Consequently, the two types of operators are defined as below.

Definition 4.

(Actuation operator) An actuation operator is a four-tuple $O_{act} = (\mathit{head}, \mathit{pre}, \mathit{effect}, \mathit{cost})$, where
$\mathit{head}$ is the actuation operator’s head and should unify with an actuation primitive task that can be completed by the action;
$\mathit{pre}$ is the precondition that must hold;
$\mathit{effect}$ is the positive and negative effect of the actuation operator;
$\mathit{cost}$ is the cost function of the operator.

Some works on high-quality plans in HTN planning [24] assign each operator a constant cost. This assumption, however, has limitations. Since operators are an abstract form of actions, different actuation actions instantiated from one actuation operator may have different costs. We take the ZenoTravel [42] domain as an example. ZenoTravel describes a problem in which passengers are transported by airplane between cities. The actuation operator !fly describes an airplane flying from one city to another. Its cost is taken to be the fuel consumed during the voyage: the longer the flight route, the more fuel is consumed. Hence, in this paper, the cost of an actuation operator is modeled as a function of variables or as an if-else statement, which takes a concrete value once the operator is instantiated into an action. For example, the fuel cost of the actuation operator !fly can be calculated by a linear function of the length of the flight route between two cities. Because some states are not deterministic in uncertain environments, the cost of an actuation operator can also be formalized as an if-else function of ground states. For instance, when the airplane lands at an airport, it can be refueled by the oil supplier there, but it is uncertain whether the oil supplier is occupied. The time cost of the operator !refuel is 40 if the supplier is available but 100 if it is occupied.
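The two cost styles described above can be written down directly. This is a small sketch under stated assumptions: the fuel rate `fuel_per_km` and the fact name `supplier_occupied` are hypothetical, while the values 40 and 100 are the ones given in the text.

```python
def fly_cost(distance_km, fuel_per_km=0.5):
    """Cost of !fly as a linear function of the route length.

    fuel_per_km is an assumed rate chosen only for illustration.
    """
    return fuel_per_km * distance_km


def refuel_cost(ground_state):
    """Cost of !refuel as an if-else function of the ground state.

    ground_state is assumed to be a set of facts that hold in that state.
    """
    if "supplier_occupied" in ground_state:
        return 100  # time cost when the oil supplier is occupied
    return 40       # time cost when the oil supplier is available
```

Because both costs are ordinary functions, the same operator yields different costs for different instantiated actions, which a constant per-operator cost cannot express.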

Definition 5.

(Sensing operator) A sensing operator is a three-tuple $O_{sense} = (\mathit{head}, \mathit{pre}, \mathit{obs})$, where
$\mathit{head}$ is the sensing operator’s head, which should unify with a sensing primitive task;
$\mathit{pre}$ is the precondition;
$\mathit{obs}$ is a set of mutually exclusive observations that denote all the possible outcomes after the execution of the sensing action. An observation is a belief state.

In partially observable environments, the real outcome of a sensing action can only be determined at the execution time. However, during the planning process, the planner must account for all potential scenarios that may arise following the execution of a sensing action. An observation is represented as a belief state. While the sensing action does not alter the actual ground states within the belief state, it does modify the associated probability distribution. After executing a sensing action, one ground state within the belief state will be assigned a probability of 1, while the probabilities of all other ground states will be set to 0, reflecting the definitive results of the sensing action.
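The effect of a sensing action on a belief state can be illustrated as follows. This is a sketch, assuming the belief state is a plain dictionary from ground-state names to probabilities; the function name `sense` and the state names are hypothetical.

```python
def sense(belief):
    """Enumerate the possible outcomes of a sensing action at planning time.

    Returns one (branch_probability, observation) pair per ground state.
    Each observation keeps the same ground states but assigns probability 1
    to the observed state and 0 to all others.
    """
    branches = []
    for observed, p in belief.items():
        observation = {s: (1.0 if s == observed else 0.0) for s in belief}
        branches.append((p, observation))
    return branches


# Hypothetical medicate-style belief with two possible infections.
belief = {"infection_a": 0.7, "infection_b": 0.3}
branches = sense(belief)
```

The branch probability is the prior probability of the observed state, which is what the planner later needs when it computes an expected cost over the branches.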

Definition 6.

(Method) A method is a four-tuple $M = (\mathit{head}, \mathit{pre}, \mathit{subtask}, \mathit{cost})$, where
$\mathit{head}$ is the method’s head, which should unify with a compound task;
$\mathit{pre}$ is the precondition;
$\mathit{subtask}$ is the subtask network;
$\mathit{cost}$ is the cost function of the method.

Methods are the decomposition rules that describe how to break down compound tasks into simpler tasks or subtasks. Each method provides a way to transform a compound task into a set of subtasks. The methods defined in this paper differ from those of traditional HTN planning paradigms. Each method has one subtask network, but a compound task can be decomposed by many instances of the methods. The subtask network has the same structure and definition as the task network defined above and forms a part of the task network. The cost of the method is modeled as a function that indicates the cost of the subtask. It can be modeled as a heuristic function, that is, a cost estimate for the corresponding subtasks. This will be specified in the following sections.

The application of the decomposition of methods is systematic and follows a structured approach that ensures that the planner can effectively navigate from high-level abstract tasks to concrete actions that can be executed. The first step in applying a decomposition of a method is identifying which methods are applicable to the current task. A method is considered applicable if its preconditions are satisfied given the current belief state. The planner selects one applicable method to decompose the current task. The selected method provides a set of subtasks that replace the original task in the task network. The decomposition process is repeated recursively until all tasks in the network are primitive.
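The decomposition loop described above can be sketched in a few lines. The sketch makes several simplifying assumptions that are not part of the paper: subtask networks are totally ordered lists, primitive tasks are recognized by a leading `!`, the cheapest applicable method is always taken, and there is no backtracking. The `Method` class and the `travel` example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Method:
    head: str                    # compound task that this method decomposes
    pre: Callable[[dict], bool]  # precondition, evaluated on the belief state
    subtasks: list               # assumed totally ordered subtask network
    cost: float                  # heuristic cost estimate of the subtasks


def applicable(methods, task, belief):
    """Methods whose head matches the task and whose precondition holds."""
    return [m for m in methods if m.head == task and m.pre(belief)]


def decompose_all(tasks, methods, belief):
    """Repeatedly replace the first compound task until all are primitive."""
    tasks = list(tasks)
    while True:
        idx = next((i for i, t in enumerate(tasks) if not t.startswith("!")), None)
        if idx is None:
            return tasks      # every task is primitive
        options = applicable(methods, tasks[idx], belief)
        if not options:
            return None       # no applicable method: decomposition fails
        best = min(options, key=lambda m: m.cost)
        tasks = tasks[:idx] + best.subtasks + tasks[idx + 1:]


methods = [
    Method("travel", lambda b: True, ["!board", "!fly", "!debark"], 10),
    Method("travel", lambda b: True,
           ["!board", "!fly", "!refuel", "!fly", "!debark"], 15),
]
result = decompose_all(["travel"], methods, belief={})
```

In the full planner, a failed or non-minimal choice would trigger backtracking to another applicable method instead of returning `None` immediately.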

3.4. Problem and Plan

Based on the above definitions, the planning problem addressed in this paper is defined as follows.

Definition 7.

(Problem) A probabilistic contingent HTN planning problem for high-quality plans is a three-tuple $P = (\mathit{bs}_{0}, \omega_{0}, D)$, where $\mathit{bs}_{0}$ is the initial belief state, $\omega_{0}$ is the initial task network, and $D$ is the planning domain knowledge.

A plan is the solution to the planning problem. In deterministic HTN planning, the plan consists of a sequence of actions that, when executed, will achieve the goal. In this paper, the plan is represented as a conditional form, with different branches corresponding to different observations. The conditions necessary for a plan to be considered a solution are typically specified through the preconditions of methods and operators, as well as constraints within the task network.

Definition 8.

(Plan) A plan is defined as $\pi = \langle a_{1}, \dots, a_{n}, (a_{sense}, \mathit{ob}_{1}, \pi_{1}), \dots, (a_{sense}, \mathit{ob}_{m}, \pi_{m}) \rangle$, where $a_{n}$ are actuation actions, $\mathit{ob}_{m}$ are the observations resulting from a sensing action, and $\pi_{m}$ is the corresponding subplan. The conditional plan means that if the agent senses the observation $\mathit{ob}_{m}$, then it carries out the subplan $\pi_{m}$ in succession.
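A conditional plan of this form maps naturally onto a nested structure. The sketch below assumes one particular encoding: a plan is a list of action names, optionally ending in a pair `(sensing_action, branches)`, where `branches` groups the tuples $(a_{sense}, \mathit{ob}_{j}, \pi_{j})$ into a dictionary from observation to subplan. The function `execute` and the action names are hypothetical.

```python
def execute(plan, sense_fn):
    """Return the action sequence actually executed for a conditional plan.

    sense_fn stands in for the environment: given a sensing action, it
    returns the observation perceived at execution time.
    """
    executed = []
    for step in plan:
        if isinstance(step, str):
            executed.append(step)              # actuation action a_i
        else:
            sensing_action, branches = step    # branches: observation -> subplan
            executed.append(sensing_action)
            observation = sense_fn(sensing_action)
            executed.extend(execute(branches[observation], sense_fn))
            break                              # the branch point ends this level
    return executed


# Hypothetical medicate-style plan: diagnose, then treat what was observed.
plan = [("!diagnose", {
    "infection_a": ["!medicate_a"],
    "infection_b": ["!medicate_b"],
    "healthy": [],
})]
trace = execute(plan, lambda action: "infection_b")
```

Only one branch is followed at execution time, but the planner must have produced a subplan for every observation beforehand.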

4. Heuristic

In this section, we discuss the heuristic for high-quality plans in probabilistic contingent HTN planning. In HTN planning, the task network acts as a control strategy that guides the reasoning about actions, and the planning goal is replaced by the initial task network. This means that if a plan exists, the initial task network can be decomposed, using the given planning domain, into tasks that achieve the final goal. Inspired by the work [43], which proposes an admissible HTN planning heuristic capable of finding optimal solutions, we extend this heuristic to partially observable domains.

As defined above, methods have a cost attribute, which is a heuristic cost estimate for the corresponding compound task. Different decompositions have different cost estimates. When a compound task is decomposed by a method, the cost helps select the subtask with minimal cost from all the available ones. This function should never overestimate the true cost, so that it guides the search for a plan efficiently. The cost of methods can be modeled as a function of variables. Furthermore, actuation primitive tasks can have cost heuristics. Because sensing primitive tasks do not change the ground state of the environment, they do not have a cost heuristic of their own.

The costs of the tasks will be updated when planning goes forward. The cost of a compound task will be updated by the costs of the subtasks. We define a heuristic function, which is given as Formula (1), where $c_{i}$ are the compound tasks and $p_{i}$ are the primitive tasks in the subtask. The formula is calculated by the cost of the subtask that has the minimum cost.

$$t.\mathit{cost} = \min \Big\{ \sum_{c_{i}, p_{i} \in t.\mathit{subtask}} (c_{i}.\mathit{cost} + p_{i}.\mathit{cost}) \Big\} \tag{1}$$

Furthermore, as a sensing primitive task will have various observations, the cost of a sensing primitive task will be updated by the cost expectation of the corresponding tasks. We define Delta(t) as the algorithm that calculates the heuristic value in HTN planning. The detailed procedure is presented in Algorithm 1.

Algorithm 1. Delta(t)

begin
if $t$ is a compound task, where $c_{i}$ are the compound tasks and $p_{i}$ are the primitive tasks with $c_{i}, p_{i} \in t.\mathit{subtask}$ then
$t.\mathit{cost} \leftarrow \min \{\sum (c_{i}.\mathit{cost} + p_{i}.\mathit{cost})\}$
else if $t$ is a sensing primitive task, where $\mathit{Prob}_{i}$ are the probabilities in the belief state and $t_{i}$ are the corresponding tasks then
$t.\mathit{cost} \leftarrow \sum \mathit{Prob}_{i} \cdot t_{i}.\mathit{cost}$
return $t.\mathit{cost}$
end
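Algorithm 1 can be illustrated with a short Python sketch. It assumes that a compound task stores its alternative subtask networks as lists of `Task` objects and that a sensing task stores (probability, task) pairs; the `Task` class, its field names, and the 0.75/0.25 probabilities are hypothetical, while the costs 40 and 100 reuse the !refuel example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    kind: str          # "compound", "actuation", or "sensing"
    cost: float = 0.0
    decompositions: list = field(default_factory=list)  # compound: alternative subtask lists
    branches: list = field(default_factory=list)        # sensing: (probability, Task) pairs


def delta(t):
    """Update and return t.cost following the two cases of Algorithm 1."""
    if t.kind == "compound" and t.decompositions:
        # Cost of the cheapest subtask network, as in Formula (1).
        t.cost = min(sum(sub.cost for sub in subtasks)
                     for subtasks in t.decompositions)
    elif t.kind == "sensing" and t.branches:
        # Expected cost over the tasks that follow each observation.
        t.cost = sum(p * branch.cost for p, branch in t.branches)
    return t.cost


free = Task("!refuel_free", "actuation", 40)
busy = Task("!refuel_busy", "actuation", 100)
check = Task("?check_supplier", "sensing", branches=[(0.75, free), (0.25, busy)])
fly = Task("!fly", "actuation", 10)
direct = Task("!fly_direct", "actuation", 80)
travel = Task("travel", "compound", decompositions=[[fly, check], [direct]])

sensing_cost = delta(check)
travel_cost = delta(travel)
```

Here the sensing task costs 0.75 · 40 + 0.25 · 100 = 55, so the first decomposition costs 10 + 55 = 65 and is preferred over the direct flight at 80.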

5. Planning Algorithm

This paper aims to search for high-quality plans within a partially observable environment. Therefore, it necessitates the integration of partial observability with optimization procedures. More specifically, the obtained plan is capable of not only addressing the previously analyzed aspects of partial observability but also achieving high quality based on the evaluation of certain objectives. In this section, we elaborate on the planning algorithm that integrates the heuristic with probabilistic contingent planning.

The planner takes the planning problem $P = (\mathit{bs}_{0}, \omega_{0}, D)$ as the input, which includes the initial belief state $\mathit{bs}_{0}$, the initial task network $\omega_{0}$, and the planning domain $D$. The initial belief state encapsulates the initial information about the partially observable environment. Finally, the planning algorithm aims to generate a plan for $P$. The planning procedure is shown as Algorithm 2.

At the beginning of the algorithm, the plan $\pi$ is initialized to the empty plan. The framework of the algorithm can be regarded as a loop that plans each step and moves forward recursively. When the plan is generated successfully or the algorithm returns failure, the whole cyclic procedure stops and the planning ends. Each step plans the current task in the task network and then calls the next iteration of the algorithm, which starts again with the next task in the sequence. This cyclical process continues until all tasks in the network have been planned successfully, so that the overall objective is achieved at the lowest cost the search can find.

HTN planning involves breaking down complex tasks into simpler, more manageable subtasks. The planning forward step is elaborated as follows. The tasks in the task network fall into three categories: compound tasks, actuation primitive tasks, and sensing primitive tasks, corresponding to methods, actuation operators, and sensing operators. Hence, the algorithm proceeds in three ways. From line 4 to line 16, when the current task is an actuation primitive task, an actuation operator is instantiated into the set of actions whose preconditions are satisfied at the belief state $\mathit{bs}$. The action that has minimal cost is selected. The belief state and the task network are updated, respectively, and plan $\pi$ is appended with $a$.

Line 17 to line 27 describes the situation where the current task is a compound task. Active is a set that consists of the instances of the available methods. If Active is not empty, the instance that minimizes the cost is chosen. All the tasks in the task network are updated from the bottom up at line 21. The detailed procedure for updating is given in Algorithm 3. If Algorithm 3 finds that the task network is not consistent, updating is suspended, and the planner backtracks to search for other available instances of the methods. Otherwise, the task network is renewed at line 23 and the planning proceeds.

Lines 28 to 36 handle the case where the current task is a sensing primitive task. The planner senses the whole belief state for uncertainty: for every ground state in the belief state, it adds the sensing action and the observation to the corresponding subplan and calls the HQCP recursively, which forms a conditional plan.

Backtracking and planning forward are mutually exclusive. When the planner cannot find a valid method to decompose a task or there are no available actions to accomplish a primitive task, or the cost of the current task network is not minimum, the algorithm will not proceed with planning forward. Instead, the planner will backtrack, which involves a series of corrective actions: removing the actions that have been planned so far, reverting the belief states to a prior state, and restoring the task network. The planner will backtrack to the most recent decision point. This step is essential for undoing any erroneous decisions and setting the stage for a new attempt at finding a viable plan with minimum cost.
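The three cases and the backtracking behaviour described above can be condensed into a recursive sketch. It is not the HQCP listing itself and rests on several assumptions: the task network is a totally ordered list, a belief state is a dictionary from `frozenset` ground states to probabilities, an action is applicable only if its precondition holds in every ground state of the belief, trying candidates in order of increasing cost stands in for the cost-consistency check, observations keep only the observed ground state, and any task that is neither an actuation task nor a compound task is treated as a sensing task. All names are hypothetical.

```python
def hqcp(belief, tasks, actions, methods):
    """Return a conditional plan (a list) or None on failure."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in actions:                          # actuation primitive task
        usable = [a for a in actions[head]
                  if all(a["pre"] <= s for s in belief)]
        for a in sorted(usable, key=lambda a: a["cost"]):   # cheapest first
            nxt = {}
            for s, p in belief.items():          # apply effects to each ground state
                s2 = frozenset((s - a["delete"]) | a["add"])
                nxt[s2] = nxt.get(s2, 0.0) + p
            tail = hqcp(nxt, rest, actions, methods)
            if tail is not None:
                return [a["name"]] + tail
        return None                              # backtrack to the caller
    if head in methods:                          # compound task
        for _, subtasks in sorted(methods[head], key=lambda m: m[0]):
            plan = hqcp(belief, subtasks + rest, actions, methods)
            if plan is not None:
                return plan
        return None
    branches = {}                                # sensing primitive task
    for s in belief:                             # one branch per ground state
        sub = hqcp({s: 1.0}, rest, actions, methods)
        if sub is None:
            return None
        branches[s] = sub
    return [(head, branches)]


A, B = frozenset({"infection_a"}), frozenset({"infection_b"})
cured = frozenset({"cured"})
actions = {"!medicate": [
    {"name": "!medicate_a", "pre": A, "add": cured, "delete": A, "cost": 1},
    {"name": "!medicate_b", "pre": B, "add": cured, "delete": B, "cost": 1},
]}
methods = {"treat": [(1.0, ["?diagnose", "!medicate"])]}
belief = {A: 0.5, B: 0.5}
plan = hqcp(belief, ["treat"], actions, methods)
```

In this medicate-style example, medicating without diagnosing first fails because neither medication is safe in every ground state, whereas sensing first yields one single-action subplan per infection.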

Algorithm 2. HQCP($\mathit{bs}_{0}, \omega_{0}, D$)

Initialize: $\pi \leftarrow \emptyset$, $\mathit{bs} \leftarrow \mathit{bs}_{0}$, $\omega \leftarrow \omega_{0}$, $\forall t$, $t.\mathit{cost} \leftarrow 0$

Algorithm 3 describes the detailed procedure of updating the costs of the tasks. As the heuristic, the cost of the parent task $\bar{t}$ is updated at line 3. Line 4 formally defines consistency and checks if the task network is consistent. If the task network is not consistent, failure will be returned. This procedure repeats recursively until all tasks have been updated or the task network is not consistent.

Algorithm 3. update(t)

1: begin
2: $t \in \bar{t}.\mathit{subtask}_{i}$, $t \in \omega$
3: $\bar{t}.\mathit{cost} \leftarrow \mathit{Delta}(\bar{t})$
4: if $\exists\, \bar{t}.\mathit{subtask}_{j}$ such that $\bar{t}'.\mathit{cost} < \bar{t}.\mathit{cost}$ then
5: $\omega$ is not consistent
6: return failure
7: else if $\bar{t} = \emptyset$ then
8: return true
9: else
10: update($\bar{t}$)
11: end

6. Empirical Study

In this section, we evaluate the HQCP’s performance in two standard planning domains: the medicate domain [44] and the extended ZenoTravel domain. As there have been no existing standard domains describing high-quality planning problems that are partially observable, we extend the standard ZenoTravel into a partially observable domain in which the planner evaluates the probability and the cost. As there are no existing HTN planning approaches investigating high-quality planning under partially observable domains, we compare the HQCP with the powerful HTN-based probabilistic contingent planner PC-SHOP in a partially observable domain to demonstrate the performance of dealing with partial observability. The medicate domain is a standard partially observable domain and is selected for comparison. Section 6.1 aims to verify the performance of dealing with partial observability. Section 6.2 attempts to show the performance of planning for high-quality plans in a partially observable environment. All the experiments are run on a Windows 11 machine with Intel Core i9-10900X CPU @ 3.70 GHz and 32 GB of RAM.

6.1. Medicate Domain

The medicate domain is a challenging benchmark problem for testing and developing planning algorithms that must handle real-world complexities, especially partial observability. It considers a patient who may be healthy or may have one of several possible infections, and it is not known which. The planning goal is to diagnose the infection accurately and administer the correct medication to cure the patient; giving the wrong medication kills the patient. The domain does not explicitly consider plan quality: the primary focus is the patient's safety and effective treatment rather than secondary preferences such as plan cost, so the actions in this domain are not evaluated by cost. The planning domain contains one sensing operator, which diagnoses a disease, and one actuation operator, which medicates the patient with the corresponding remedy. The first challenge is that the patient's condition is not fully observable, which requires the planner to handle uncertainty. The second challenge is the growth in complexity. The plan generated by the planner diagnoses each disease and, conditionally, cures it with the right medication action. As the number of possible diseases increases, so do the uncertainty and the number of actions to consider, which significantly enlarges the search space and dramatically increases the plan length.
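The shape of such a conditional plan can be sketched as follows. This is a simplified illustration under stated assumptions, not the encoding used in the experiments: the patient has at most one infection, sensing is noise-free, and the names `build_plan`, `execute`, `diagnose`, and `medicate` are hypothetical.

```python
from typing import List, Optional, Tuple


def build_plan(diseases: List[str]) -> list:
    """Contingent plan: sense each disease in turn, medicate on a positive result."""
    if not diseases:
        return []
    first, rest = diseases[0], diseases[1:]
    # A diagnose step carries two branches: positive result and negative result.
    return [("diagnose", first, [("medicate", first)], build_plan(rest))]


def execute(plan: list, infection: Optional[str]) -> Tuple[str, int]:
    """Run the plan against a hidden infection (None means healthy)."""
    actions = 0
    status = "healthy" if infection is None else "infected"
    while plan:
        step = plan[0]
        actions += 1
        if step[0] == "diagnose":
            _, disease, if_positive, if_negative = step
            # The observation selects which branch of the plan is followed.
            plan = if_positive if infection == disease else if_negative
        else:  # "medicate"
            _, disease = step
            status = "cured" if infection == disease else "dead"
            plan = plan[1:]
    return status, actions


diseases = ["d1", "d2", "d3"]
plan = build_plan(diseases)
for hidden in [None] + diseases:
    print(hidden, execute(plan, hidden))
```

Because a medication is only ever applied after a positive diagnosis of the matching disease, no branch of this plan administers the wrong remedy.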

Figure 1 shows the runtime of the two planners. Every experiment is run ten times, and the average CPU time is recorded. Figure 1 indicates that the HQCP is faster than PC-SHOP. As the planning problem becomes harder and the search space grows, the efficiency of the HQCP does not decline dramatically, whereas PC-SHOP requires considerably more time. The experimental results also show that the HQCP can cope with highly complex problems. We attribute the better performance shown in Figure 1 to the more succinct algorithm structure and the more efficient heuristic rules.
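The measurement protocol (ten runs, average CPU time) could be reproduced with a small harness such as the one below. It is only a sketch: `average_cpu_time` is a hypothetical helper, and the workload passed to it is a placeholder for an actual planner call.

```python
import time
from typing import Callable


def average_cpu_time(run: Callable[[], object], repeats: int = 10) -> float:
    """Average CPU seconds consumed by `run` over `repeats` executions."""
    total = 0.0
    for _ in range(repeats):
        start = time.process_time()   # CPU time of this process, not wall-clock time
        run()
        total += time.process_time() - start
    return total / repeats


# Placeholder workload standing in for one planner invocation.
avg = average_cpu_time(lambda: sum(range(100_000)))
print(f"average CPU time: {avg:.6f} s")
```

Using `time.process_time` rather than a wall-clock timer excludes time the process spends sleeping or waiting, which matches a CPU-time measurement.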

6.2. Extended ZenoTravel Domain

To illustrate the performance of the HQCP in generating high-quality plans in a partially observable environment, we extend ZenoTravel, one of the standard domains in the planning competition of the International Conference on AI Planning and Scheduling (AIPS-2002), into a partially observable domain. Although numeric and temporal features are present in ZenoTravel, addressing them is not the focus of this paper, and they are treated within the framework of classical HTN planning. Given that no existing HTN planning approach specifically investigates high-quality planning under partial observability, we select two cases from the ZenoTravel domain to verify and compare the capability of generating high-quality plans in such environments. In the ZenoTravel domain, an airplane transports people from city A to city C, transiting at city B. The airplane has two navigation modes: zoom and fly. Zoom is faster but consumes more fuel. If the zoom mode is used, the airplane must refuel when it lands at an airport. Passengers board and debark from the airplane at the airport. There are therefore five actions: board, debark, fly, zoom, and refuel. Because the problem is temporal, all five actions have execution durations, and refueling can be carried out simultaneously with boarding or debarking.

One key issue in the extended ZenoTravel domain is the numeric cost: the domain requires the plan with the minimum fuel consumption. Furthermore, the airplane needs to refuel when it lands at an airport, but whether it can do so is uncertain. Specifically, when the airplane lands at an airport where the oil supplier is occupied by other planes, it cannot be refueled. This occupation cannot be known in advance.

Table 1 and Table 2 show the plans generated under two different deadlines. In Plan 1, the deadline for arriving in city C is 21:30, and the airplane uses only the fly mode to reduce cost. In Plan 2, the deadline for arriving in city C is 21:00, and the airplane uses both the fly and zoom modes. Both plans observe whether the oil supplier is usable when the airplane is at an airport. With the assistance of the heuristic, each plan has the minimum cost that satisfies its temporal requirement.
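The trade-off behind the two plans can be illustrated with a small enumeration. The sketch below is not the HQCP algorithm: it brute-forces the mode of each leg for every observation outcome. The durations are read from Tables 1 and 2; the fuel costs are hypothetical numbers chosen only so that zoom is more expensive than fly, and the rule that a zoom leg needs a usable oil supplier at its departure airport is an assumption inferred from Table 2.

```python
from itertools import product
from typing import Optional, Tuple

START = 16 * 60 + 35               # 16:35, start of boarding at A
BOARD_A, TURNAROUND_B = 20, 15     # minutes; refueling overlaps with boarding
DURATION = {("fly", "AB"): 135, ("zoom", "AB"): 95,
            ("fly", "BC"): 115, ("zoom", "BC"): 90}
# Hypothetical fuel costs (not given in the source).
FUEL = {("fly", "AB"): 100, ("zoom", "AB"): 150,
        ("fly", "BC"): 90, ("zoom", "BC"): 170}


def best_modes(deadline: int, usable_a: bool,
               usable_b: bool) -> Optional[Tuple[int, str, str]]:
    """Cheapest (fuel, mode A->B, mode B->C) arriving by `deadline`, or None."""
    best = None
    for mode_ab, mode_bc in product(("fly", "zoom"), repeat=2):
        # Assumed rule: zoom requires refueling at the departure airport.
        if mode_ab == "zoom" and not usable_a:
            continue
        if mode_bc == "zoom" and not usable_b:
            continue
        arrival = (START + BOARD_A + DURATION[(mode_ab, "AB")]
                   + TURNAROUND_B + DURATION[(mode_bc, "BC")])
        if arrival > deadline:
            continue
        fuel = FUEL[(mode_ab, "AB")] + FUEL[(mode_bc, "BC")]
        if best is None or fuel < best[0]:
            best = (fuel, mode_ab, mode_bc)
    return best


for label, deadline in (("21:30", 21 * 60 + 30), ("21:00", 21 * 60)):
    for usable_a, usable_b in product((True, False), repeat=2):
        print(label, usable_a, usable_b, best_modes(deadline, usable_a, usable_b))
```

Under these assumptions, the 21:30 deadline is met by flying both legs in every outcome, whereas the 21:00 deadline needs one zoom leg and has no feasible plan when neither airport can refuel, which mirrors the branch structure of Tables 1 and 2.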

This approach enhances the ability to generate high-quality plans in partially observable environments, a common challenge in real-world applications. By extending HTN formalisms to accommodate partial observability and incorporating quality evaluation, the planner can make more informed decisions under uncertainty. Its demonstrated efficiency in handling partial observability and its capability for high-quality planning in such environments suggest potential applications in areas such as emergency decision-making, supply chain management, and intelligent robots. The approach generates robust plans tailored to partially observable environments, which makes it more practical for real-world problems, and the high-quality plans help reduce costs and improve profitability in practical applications.

7. Conclusions and Future Work

Traditional HTN planning takes little account of partial observability and neglects plan quality in partially observable environments, which significantly limits its applicability in real-world scenarios. To bridge this research gap, we proposed a probabilistic contingent planner based on HTN, named the HQCP, specifically designed to generate high-quality plans in partially observable environments. We extended the HTN planning formalisms to partial observability and evaluated them in terms of cost. We developed a heuristic for high-quality plans and constructed a complete planning algorithm that reasons over the whole planning procedure. Finally, two partially observable domains were selected to verify the performance of the planning approach. The first domain demonstrated that the planner copes efficiently with partial observability, and the second domain highlighted its ability to produce high-quality plans in partially observable environments.

This paper addresses the challenges posed by partially observable environments and demonstrates the capability to generate high-quality plans under such conditions. The approach is particularly practical in real-world scenarios, which are inherently uncertain. For instance, in emergency situations and robotics, numerous uncertainties arise during plan execution, necessitating flexible and robust planning strategies. Moreover, this paper optimizes the quality of the plans, making them more suitable for cost-sensitive scenarios such as logistics. By accounting for partial observability and optimizing plan quality, our approach enhances the feasibility and efficiency of planning systems in dynamic and unpredictable settings.

Future work will focus on expanding the scope to a broader range of uncertainties in planning, such as temporal and resource uncertainties. We aim to enhance the representation and reasoning capabilities of our planning system to better accommodate various preferences. Additionally, we will explore incorporating machine learning techniques to improve the handling of uncertainties, extending the approach to multi-agent systems for collaborative planning, and improving scalability for large-scale problems. Conducting real-world case studies and addressing ethical considerations in AI planning will also be crucial areas of investigation. These efforts will contribute to making HTN planning more robust and applicable to diverse real-world domains.

Author Contributions

Conceptualization, P.Z.; methodology, P.Z.; software, X.L. and K.L.; validation, Z.L., X.L. and K.K.; formal analysis, P.Z. and A.Z.; investigation, D.W.; resources, Z.L.; data curation, K.K. and X.L.; writing—original draft preparation, P.Z.; writing—review and editing, D.W., X.S. and A.Z.; visualization, X.L.; supervision, X.S.; project administration, D.W.; funding acquisition, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. (The data are not publicly available due to privacy or ethical restrictions).

Conflicts of Interest

Author Peng Zhao was employed by the company Microsoft. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Xiao, Z.; Blanco, E. Are People Located in the Places They Mention in Their Tweets? A Multimodal Approach. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2561–2571.
2. Xiao, Z.; Huang, Y.; Blanco, E. Context Helps Determine Spatial Knowledge from Tweets. In Proceedings of the Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings), Nusa Dua, Indonesia, 1–4 November 2023; pp. 149–160.
3. Ahmad, A.S.; Shams El-Din, M.N.; El-Sisi, A.B. An Adaptation Technique to Enhance HTN Planning. IJCI Int. J. Comput. Inf. 2023, 10, 157–163.
4. Goldman, R.P.; Zaidins, P.; Kuter, U.; Nau, D. A Comparative Analysis of Plan Repair in HTN Planning. In Proceedings of the 7th ICAPS Workshop on Hierarchical Planning, Banff, AB, Canada, 1–6 June 2024.
5. Wang, H.-W.; Liu, D.; Zhao, P.; Chen, X. Review on Hierarchical Task Network Planning under Uncertainty. Acta Autom. Sin. 2016, 42, 655–667.
6. Hu, Y.; Zhuo, H.H. Multi-Task Reinforcement Learning with Cost-Based HTN Planning. In Proceedings of the 2024 5th International Conference on Computer Engineering and Application (ICCEA), Hangzhou, China, 12–14 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 155–160.
7. Erol, K.; Hendler, J.; Nau, D.S. HTN Planning: Complexity and Expressivity. In Proceedings of the AAAI, Seattle, WA, USA, 31 July–4 August 1994; Volume 94, pp. 1123–1128.
8. Erol, K.; Hendler, J.; Nau, D.S. Complexity Results for HTN Planning. Ann. Math. Artif. Intell. 1996, 18, 69–93.
9. Qi, C.; Wang, D.; Munoz-Avila, H.; Zhao, P.; Wang, H. Hierarchical Task Network Planning with Resources and Temporal Constraints. Knowl.-Based Syst. 2017, 133, 17–32.
10. Liu, D.; Wang, H.; Qi, C.; Zhao, P.; Wang, J. Hierarchical Task Network-Based Emergency Task Planning with Incomplete Information, Concurrency and Uncertain Duration. Knowl.-Based Syst. 2016, 112, 67–79.
11. Zhao, P.; Qi, C.; Liu, D. Resource-Constrained Hierarchical Task Network Planning under Uncontrollable Durations for Emergency Decision-Making. J. Intell. Fuzzy Syst. 2017, 33, 3819–3834.
12. Zhao, P.; Wang, H.; Qi, C.; Liu, D. HTN Planning with Uncontrollable Durations for Emergency Decision-Making. J. Intell. Fuzzy Syst. 2017, 33, 255–267.
13. Mai, Z.; Zhang, J.; Xu, Z.; Xiao, Z. Is LLaMA 3 Good at Sarcasm Detection? A Comprehensive Study. In Proceedings of the 2024 7th International Conference on Machine Learning and Machine Intelligence (MLMI), Osaka, Japan, 2–4 August 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 141–145.
14. Höller, D.; Behnke, G.; Bercher, P.; Biundo, S. Language Classification of Hierarchical Planning Problems. In Proceedings of the 21st European Conference on Artificial Intelligence, ECAI 2014, Prague, Czech Republic, 18–22 August 2014; IOS Press: Amsterdam, The Netherlands, 2014; pp. 447–452.
15. Sandamali, M.; Gamini, D. Comparison of HTN Planning and OR-Based Approaches for Solving Problems in Logistics Domains. Vidyodaya J. Sci. 2024, 27.
16. Zhao, P.; Li, K.; Hong, B.; Zhu, A.; Liu, J.; Dai, S. Task Allocation Planning Based on Hierarchical Task Network for National Economic Mobilization. J. Artif. Intell. Gen. Sci. (JAIGS) 2024, 5, 22–31.
17. Wang, J.; Li, H.; Ru, J.; Xu, H. Decision Support for Beyond Visual Range Air Combat Based on HTN Planning and Machine Learning. In Proceedings of the 2024 36th Chinese Control and Decision Conference (CCDC), Xi’an, China, 25–27 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 3188–3193.
18. Xiao, Z.; Cui, Y.; Mai, Z.; Xu, Z.; Li, J. Corporate Event Prediction Using Earning Call Transcripts. In Information Management and Big Data, Proceedings of the 10th Annual International Conference, SIMBig 2023, Mexico City, Mexico, 13–15 December 2023; Springer: Cham, Switzerland, 2024; pp. 261–272.
19. Xiao, Z.; Mai, Z.; Xu, Z.; Cui, Y.; Li, J. Corporate Event Predictions Using Large Language Models. In Proceedings of the 2023 10th International Conference on Soft Computing & Machine Intelligence (ISCMI), Mexico City, Mexico, 25–26 November 2023; pp. 193–197.
20. Shen, Y.; Yan, M. HTN Planning for Dynamic Vehicle Scheduling with Stochastic Trip Times. Neural Comput. Appl. 2023, 35, 9917–9930.
21. Tang, Y.; Meneguzzi, F.; Sycara, K.; Parsons, S. Planning over MDPs through Probabilistic HTNs. In Proceedings of the AAAI-11 Workshop on Generalized Planning, San Francisco, CA, USA, 7–8 August 2011.
22. Sohrabi, S.; Baier, S.; McIlraith, S.A. HTN Planning with Preferences. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 11–17 July 2009; pp. 1790–1797.
23. Kuter, U.; Nau, D.; Reisner, E.; Goldman, R. Conditionalization: Adapting Forward-Chaining Planners to Partially Observable Environments. In Proceedings of the ICAPS 2007 Workshop on Planning and Execution for Real-World Systems, Providence, RI, USA, 22–26 September 2007.
24. Nau, D.S.; Au, T.-C.; Ilghami, O.; Kuter, U.; Murdock, J.W.; Wu, D.; Yaman, F. SHOP2: An HTN Planning System. J. Artif. Intell. Res. 2003, 20, 379–404.
25. Bouguerra, A.; Karlsson, L. Hierarchical Task Planning Under Uncertainty. Available online: https://www.researchgate.net/profile/Lars-Karlsson-10/publication/2949969_Hierarchical_Task_Planning_under_Uncertainty/links/09e4150ffdb6d19c13000000/Hierarchical-Task-Planning-under-Uncertainty.pdf (accessed on 25 December 2024).
26. Bouguerra, A.; Karlsson, L. PC-SHOP: A Probabilistic-Conditional Hierarchical Task Planner. Intell. Artif. 2005, 2, 44–50.
27. Majercik, S.M.; Littman, M.L. Contingent Planning under Uncertainty via Stochastic Satisfiability. Artif. Intell. 2003, 147, 119–162.
28. Brafman, R.; Shani, G. A Multi-Path Compilation Approach to Contingent Planning. In Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; Volume 26, pp. 1868–1874.
29. Maliah, S.; Brafman, R.; Karpas, E.; Shani, G. Partially Observable Online Contingent Planning Using Landmark Heuristics. In Proceedings of the International Conference on Automated Planning and Scheduling, Portsmouth, NH, USA, 21–26 June 2014; Volume 24, pp. 163–171.
30. Shmaryahu, D.; Shani, G.; Hoffmann, J.; Steinmetz, M. Partially Observable Contingent Planning for Penetration Testing. In Proceedings of the Iwaise: First International Workshop on Artificial Intelligence in Security, Melbourne, Australia, 20 August 2017; Volume 33.
31. Shani, G. Heuristics for Partially Observable Stochastic Contingent Planning. In Proceedings of the ECAI 2024, 27th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 19–24 October 2024; IOS Press: Amsterdam, The Netherlands, 2024; pp. 4124–4131.
32. Hogg, C.; Kuter, U.; Munoz-Avila, H. Learning Methods to Generate Good Plans: Integrating Htn Learning and Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; Volume 24, pp. 1530–1535.
33. Luo, J.; Zhu, C.; Zhang, W. Messy Genetic Algorithm for the Optimum Solution Search of the HTN Planning. In Foundations of Intelligent Systems, Proceedings of the Sixth International Conference on Intelligent Systems and Knowledge Engineering, ISKE2011, Shanghai, China, 15–17 December 2011; Springer: Berlin/Heidelberg, Germany, 2012; pp. 93–98.
34. Georgievski, I.; Lazovik, A. Utility-Based HTN Planning. In Proceedings of the ECAI 2014, the Twenty-First European Conference on Artificial Intelligence, Prague, Czech Republic, 18–22 August 2014; IOS Press: Amsterdam, The Netherlands, 2014; pp. 1013–1014.
35. Behnke, G.; Höller, D.; Biundo, S. Finding Optimal Solutions in HTN Planning-A SAT-Based Approach. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 5500–5508.
36. Höller, D.; Bercher, P.; Behnke, G.; Biundo, S. HTN Planning as Heuristic Progression Search. J. Artif. Intell. Res. 2020, 67, 835–880.
37. Shao, T.; Zhang, H.; Cheng, K.; Zhang, K.; Bie, L. The Hierarchical Task Network Planning Method Based on Monte Carlo Tree Search. Knowl.-Based Syst. 2021, 225, 107067.
38. Behnke, G.; Speck, D. Symbolic Search for Optimal Total-Order HTN Planning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 11744–11754.
39. Yousefi, M.; Bercher, P. Laying the Foundations for Solving FOND HTN Problems: Grounding, Search, Heuristics (and Benchmark Problems). In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, Jeju, Republic of Korea, 3–9 August 2024.
40. Chen, D.; Bercher, P. Fully Observable Nondeterministic HTN Planning–Formalisation and Complexity Results. In Proceedings of the International Conference on Automated Planning and Scheduling, Guangzhou, China, 2–13 August 2021; Volume 31, pp. 74–84.
41. Chen, D.Z.; Bercher, P. Flexible Fond Htn Planning: A Complexity Analysis. In Proceedings of the International Conference on Automated Planning and Scheduling, Singapore (Virtual), 13–24 June 2022; Volume 32, pp. 26–34.
42. Quemy, A.; Schoenauer, M.; Dreo, J. MultiZenoTravel: A Tunable Benchmark for Multi-Objective Planning with Known Pareto Front. arXiv 2023, arXiv:2304.14659.
43. Bercher, P.; Behnke, G.; Höller, D.; Biundo, S. An Admissible HTN Planning Heuristic. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 480–488.
44. Weld, D.S.; Anderson, C.R.; Smith, D.E. Extending Graphplan to Handle Uncertainty & Sensing Actions. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), Madison, WI, USA, 26–30 July 1998; pp. 897–904.

Figure 1. Average runtime for the medicate domain with n diseases.

Table 1. Plan 1 for the extended ZenoTravel problems.

Plan 1
Actions	Start-Time	End-Time
(!Observe usable A)
(!board-passenger 20)	16:35	16:55
(!fly A B)	16:55	19:10
(!debark-passenger 10)	19:10	19:25
(!board-passenger 30)	19:10	19:25
(!fly B C)	19:25	21:20
(!debark-passenger 40)	21:20	21:35
(!Observe unusable A)
(!board-passenger 20)	16:35	16:55
(!fly A B)	16:55	19:10
(!debark-passenger 10)	19:10	19:25
(!board-passenger 30)	19:10	19:25
(!fly B C)	19:25	21:20
(!debark-passenger 40)	21:20	21:35

Table 2. Plan 2 for the extended ZenoTravel problems.

Plan 2
Actions	Start-Time	End-Time
(!Observe usable A)
(!refuel-at A)	16:35	16:55
(!board-passenger 20)	16:35	16:55
(!zoom A B)	16:55	18:30
(!debark-passenger 10)	18:30	18:45
(!board-passenger 30)	18:30	18:45
(!fly B C)	18:45	20:40
(!debark-passenger 40)	20:40	20:55
(!Observe unusable B)
(!board-passenger 20)	16:35	16:55
(!fly A B)	16:55	19:10
(!debark-passenger 10)	19:10	19:25
(!board-passenger 30)	19:10	19:25
(!Observe usable B)
(!refuel-at B)	19:10	19:25
(!zoom B C)	19:25	20:55
(!board-passenger 40)	20:55	21:10
(!Observe unusable B)
NULL

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

