
Resource Allocation Approach of Avionics System in SPO Mode Based on Proximal Policy Optimization

1 Key Laboratory of Civil Aircraft Airworthiness Technology, Civil Aviation University of China, Tianjin 300300, China
2 Science and Technology Innovation Research Institute, Civil Aviation University of China, Tianjin 300300, China
3 College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
4 Xi’an Aeronautics Computing Technique Research Institute, AVIC, Xi’an 710000, China
* Author to whom correspondence should be addressed.
Aerospace 2024, 11(10), 812; https://doi.org/10.3390/aerospace11100812
Submission received: 10 September 2024 / Revised: 29 September 2024 / Accepted: 2 October 2024 / Published: 4 October 2024
(This article belongs to the Collection Avionic Systems)

Abstract

Single-Pilot Operations (SPO) mode is set to reshape the decision-making process between human-machine and air-ground operations. However, the limited on-board computing resources impose greater demands on the organization of performance parameters and the optimization of process efficiency in SPO mode. To address this challenge, this paper first investigates the flexible requirements of avionics systems arising from changes in SPO operational scenarios, then analyzes the architecture of Reconfigurable Integrated Modular Avionics (RIMA) and its resource allocation framework in the context of scarcity and configurability. A “mission-function-resource” mapping relationship is established between the reconfiguration service elements of SPO mode and avionics resources. Subsequently, the Proximal Policy Optimization (PPO) algorithm is introduced to simulate the resource allocation process of IMA reconfiguration in SPO mode. The objective optimization process is transformed into a sequential decision-making problem by considering constraints and optimization criteria such as load, latency, and power consumption within the feasible domain of avionics system resources. Finally, the resource allocation scheme for avionics system reconfiguration is determined by controlling the probability of action selection during the interaction between the agent and the environment. The experimental results show that the resource allocation scheme based on the PPO algorithm effectively reduces power consumption and latency, and that the DRL model exhibits strong robustness to interference and good generalization. This enables avionics resources to respond dynamically to the capabilities required in SPO mode and enhances their ability to support the aircraft mission at all stages.

1. Introduction

As commercial aviation moves toward greater intelligence and lower cost, the Single-Pilot Operations (SPO) mode provides remote flight support services through advanced AI-based avionics systems and Ground Operators (GO), assisting flight crews in controlling aircraft operations [1,2]. It has become one of the main directions in the development of modern aviation technology [3,4]. Although the Two-Crew Operations (TCO) mode can reduce pilot workload, problems such as cognitive deficiencies, thinking biases, and inconsistent actions during collaboration directly affect the efficiency of on-board pilots’ decisions. If the number of crewmembers can be reduced while the functionality and safety requirements of commercial aircraft are still met, the cockpit can be made smaller, cockpit resources can be allocated more effectively, the worldwide pressure of pilot shortage can be alleviated, and flight deck crew costs can be reduced [5].
The avionics system is the aircraft’s “brain” and “central nervous system”, and its performance directly impacts the aircraft’s level of automation and autonomy [6]. The current generation of avionics systems is evolving toward Integrated Modular Avionics (IMA), characterized by highly shared system resources, highly integrated data, and highly concentrated software [7]. The dynamic reconfiguration technology of IMA effectively supports the function allocation mechanism of the avionics system through resource redistribution, redundant SW/HW backup, and the deployment of application functions [8]. Reconfigurable Integrated Modular Avionics (RIMA) minimizes the idleness and wastage of the system’s resources and capacity, enhancing the effectiveness of the system’s operational outcomes [9].
It is essential to ensure that system functions are intelligently aligned with the dynamic operational scenarios of the SPO mode. AI-based avionics systems need to have learning capabilities in certain operational scenarios, enabling them to function as “teammates” in collaboration with pilots, thereby forming a Human-Autonomy Teaming (HAT) cooperation model [10,11]. For instance, Lim et al. proposed a Virtual Pilot Assistance (VPA) system architecture for SPO, incorporating intelligent components such as a Cognition-Complexity model and an Uncertainty analysis model. The VPA allows the system to adjust its autonomy level according to task complexity and the current state of the crew, preventing an excessive workload on the single pilot [12,13]. Tokadlı et al. developed a Playbook delegation interface that enables pilots to call and modify plays in collaboration with their autonomous teammates. In a study involving twenty pilots, the Playbook interface was evaluated to explore real-time function allocation, identifying the teaming skills needed to support HAT [14,15]. Li et al. proposed a Virtual Co-Pilot (V-CoP) system for the SPO mode, which integrates pilot commands and real-time cockpit instrumentation data, utilizing a multimodal large-scale language model to achieve high accuracy in scenario analysis and information retrieval [16]. It is evident that the SPO mode will reshape the organizational and decision-making process between human-machine and air-ground operations. Specifically, intelligent functions will consume limited on-board computational resources, thereby imposing progressively higher demands on the organization of avionics system performance and the optimization of process efficiency.
Traditional reconfiguration methods for avionics systems typically rely on pre-trained resource allocation schemes generated by heuristic algorithms, such as simulated annealing [17], genetic algorithms [18], and particle swarm optimization [19]. At the same time, the theoretical foundations of these methods are heavily dependent on the accuracy of the model. As a result, it can be challenging to achieve optimal solutions for non-convex optimization problems. The IMA architecture synthesizes system applications, functions, and resources. The objectives, capabilities, operations, and optimization involved in this synthesis pose significant challenges for traditional optimization methods due to the increasing complexity of aircraft applications, the broader range of system information, and the higher demands for operational quality of system resources. This complexity substantially increases the difficulty of system integration and synthesis. Therefore, a more flexible framework is needed to tackle the resource allocation challenges in complex avionics systems. Deep Reinforcement Learning (DRL) offers a promising approach to intelligent decision-making in high-dimensional, complex state spaces. Unlike traditional methods, DRL does not require assumptions about the optimization objective and maintains model accuracy by learning from experiences, which are stored and used to train neural networks through repeated interactions between the intelligent agent and the system environment.
Recent scholarly efforts have increasingly focused on applying DRL to the dynamic reconfiguration and resource allocation challenges in IMA systems. Zhang et al. proposed a sequential game-based reconfiguration method for IMA systems using multi-agent reinforcement learning, which enhances the efficiency of resource allocation in avionics systems [20,21]. Li et al. developed a DRL-based IMA task allocation model and introduced a generalized architecture for solving the avionics task set allocation and scheduling problem in scenarios characterized by high uncertainty and complexity [22]. For scenarios such as SPO, Dong et al. designed a DRL-based task allocation framework for intelligent agent coalitions, addressing task allocation and scheduling issues by considering constraints like resource demands and execution windows within the SPO collaborative interaction system [23]. In essence, if the SPO mode can be cross-integrated with the dynamic reconfiguration technology of IMA, the ability of avionics functions to interpenetrate and be re-planned can be leveraged, maximizing the support of avionics resources for missions at each stage of the SPO mode.
This paper makes the following specific contributions: (1) It establishes a resource allocation framework for the avionics system reconfiguration process across different SPO scenarios and constructs the mapping relationship between the reconfiguration service elements of the SPO mode and the RIMA resources based on the “mission-function-resource” hierarchy. (2) It investigates the Proximal Policy Optimization (PPO) algorithm, transforming the multi-objective optimization process into a sequential decision-making problem by considering constraints and optimization criteria such as load, latency, and power consumption. (3) It validates the feasibility and effectiveness of the proposed algorithm through simulation experiments and analysis of the results.

2. System Model and Problem Description

2.1. Resource Allocation Framework for Avionics System in SPO Mode

Initial research on SPO primarily focused on updating cockpit on-board equipment and providing remote ground station support. However, as research progressed, attention shifted toward a combined “cockpit + ground station” approach [24]. In this combined program, the cognitive space of the single pilot and the logical space of the intelligent system are interactively deduced based on flight and pilot conditions. A taxonomy of operating conditions for SPO can be constructed, as shown in Figure 1.
As the Taxonomy Conditions (TC) progress from 1 to 4, the operating conditions become increasingly challenging, and the requirements for the safe implementation of SPO grow more complex. To enhance avionics resources to support each mode within the SPO taxonomy, this paper builds on the Cognitive Pilot-Aircraft Interface (CPAI) proposed in [25]. Accordingly, we present the resource allocation framework for the avionics system in SPO mode, as illustrated in Figure 2.
Drawing on the analysis in [26] of how different degrees of autonomy shape the avionics system requirements in SPO mode, and on the validation method for task-function allocation in SPO mode studied in [27], we observe that the resource allocation process of the avionics system in SPO mode can be divided into multiple consecutive and non-overlapping phases. The resource allocation and functional behaviors of each phase differ, suggesting characteristics of a Phased-Mission System (PMS) [28]. This means that the AI-based avionics system determines the resource allocation scheme according to different operating conditions and flight missions in SPO mode. For example, in the avionics system resource allocation process shown in Figure 2, compared to SPO TC-1, in TC-3, where the single pilot is incapacitated, more resources should be allocated to avionics functions related to autonomy and situational awareness. To achieve these objectives, the avionics system in SPO mode must have the capability to redistribute resources and deploy application functions. Therefore, the CPAI should adopt an integrated, modular, and reconfigurable architecture.

2.2. Resource Allocation Model for Avionics System in SPO Mode

2.2.1. Hierarchical Architecture of Avionics System

The avionics system architecture in SPO mode should transition from a functional organization of equipment or subsystems to a focus on system capability supply. By leveraging system equipment capabilities, specialization, and functional synthesis, it should establish the organization of equipment functions and resource capability configuration. The avionics system architecture can be defined based on the “mission-function-resource” hierarchical mapping relationship. The hierarchical architecture of the avionics system in the SPO scenario [29] is shown in Figure 3.
(1)
Mission layer
Firstly, the mission objectives of the avionics system need to be defined based on the specific requirements of the SPO-TC. By monitoring physiological measurables, cognitive indicators, and external condition variables, the operating scenario of the SPO-TC is constructed. These SPO-TC scenarios are defined as the system’s mission layer, laying the foundation for clarifying the mapping relationship between top-level missions and the avionics system functions.
(2)
Function layer
Considering the subsequent mapping from the function layer to the resource layer, the functions must be further decomposed. The avionics system primarily consists of communication, navigation, cockpit display and other functions [30]. For avionics software, these functions can be further decomposed into several independent or interrelated entities, referred to as “tasks” [31]. The set of these functions is defined as:
$F = \{ f_1, f_2, \dots, f_H \}$ (1)
where the h-th function is denoted as $f_h$, and H represents the total number of functions. From these H functions, a total of K tasks can be decomposed. Each task is characterized by attributes such as task arrival time, worst-case execution time, storage resource requirements, and reconfiguration loading time.
(3)
Resource layer
The generation of system functions is highly dependent on the effective allocation of resources, and it is necessary to establish a mapping relationship between the logical structure and physical entities. When researching the resident function, the resources of the RIMA platform should be quantified. To simplify calculations and meet actual system requirements, the resources provided by the avionics system are categorized into computing resources, storage resources, and communication resources. In the actual avionics system, the General Processing Modules (GPMs) can be divided into multiple virtual partitions based on their functions. All tasks within a partition share the resources allocated to that partition. The set of resources is defined as:
$C = \{ c_1, c_2, \dots, c_M \}$ (2)
where the m-th processing module is denoted as $c_m$, and M represents the total number of processing modules. These M modules can be divided into N partitions by virtual boundaries. Each partition is characterized by attributes such as partition timeframe and storage resource capacity.
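To make the task and partition attributes above concrete, a minimal sketch of the corresponding data structures is given below; the field names are illustrative assumptions for exposition, not identifiers from the paper.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One of the K tasks decomposed from the avionics functions (illustrative fields)."""
    arrival_time: float   # moment the reconfiguration request arrives
    wcet: float           # worst-case execution time (WCET_k)
    ram: float            # storage resource requirement (RAM_k)
    load_time: float      # reconfiguration loading time

@dataclass
class Partition:
    """One of the N virtual partitions hosted on the processing modules (illustrative fields)."""
    timeframe: float      # partition frame time (TD_n)
    memory: float         # storage resource capacity (PM_n)
```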

2.2.2. Resource Allocation Mechanism Based on RIMA

The execution of avionics system reconfiguration requires certain triggering conditions, primarily mission switching and system resource failure [32,33]. When a reconfiguration trigger condition is detected, the resource allocation scheme can be loaded into each partition of the RIMA platform to execute the reconfiguration of avionics system resources. Compared to resource failures, SPO mission switching has a more significant impact on flight safety. To focus on the research objectives of this paper, the following assumptions are made:
Assumption 1.
The reconfiguration process does not consider the failure-oriented resource allocation mechanism; the avionics system resource layer is failure-immune.
Assumption 2.
The resource allocation follows a First-Come-First-Served (FCFS) strategy to enable the responsive allocation of avionics resources to dynamic missions in SPO mode.
Assumption 3.
The avionics system has a backup module, ensuring that the reconfiguration and configuration of new avionics functions do not preempt resources allocated to existing functions.

2.2.3. Resource Allocation Constraints and Optimization Indicators

(1)
Resource allocation constraints
Firstly, the basic constraints are defined within the “mission-function-resource” architecture of the RIMA system to identify the effective feasible domain space for resources. In terms of time, each task has a specific arrival time and execution time, and all tasks within a partition must complete reconfiguration in the shortest possible time. Regarding memory, due to the limited capacity in each partition, it is essential to ensure that the total memory occupied by the tasks does not exceed the available memory in the partition. As for uniqueness, different tasks can be loaded into the same partition, provided that the resource capacity of the partition meets the requirements.
(2)
Resource allocation optimization indicators
Load balancing: Balancing the load in each partition enhances the execution speed of tasks. At the end of each reconfiguration, load balancing reflects the distribution of system resource allocation. The standard deviation measures the dispersion of the load across each partition. Based on this, the load balancing metric at moment t can be defined as:
$HB(t) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left( load_n(t) - \overline{load}(t) \right)^2}$ (3)
$load_n(t) = \beta_1 UR_n^{use}(t) + \beta_2 MR_n^{use}(t)$ (4)
$\overline{load}(t) = \frac{1}{N}\sum_{n=1}^{N} load_n(t)$ (5)
$UR_n^{use}(t) = \sum_{k=1}^{K} \psi_{k,n}(t) \times \frac{WCET_k}{TD_n}$ (6)
$MR_n^{use}(t) = \sum_{k=1}^{K} \psi_{k,n}(t) \times \frac{RAM_k}{PM_n}$ (7)
where $load_n(t)$ denotes the real-time load of partition n at moment t and $\overline{load}(t)$ denotes the average real-time load of the partitions; $UR_n^{use}(t)$ denotes the real-time CPU utilization of partition n, $WCET_k$ denotes the worst-case execution time of the kth task, and $TD_n$ denotes the frame time of partition n; $MR_n^{use}(t)$ denotes the memory utilization of partition n, $RAM_k$ denotes the planned memory of the kth task, and $PM_n$ denotes the storage capacity of partition n; $\psi_{k,n}(t)$ takes a value of either 1 or 0, signifying whether task k is allocated to partition n at moment t.
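The load-balancing metric in Equations (3)-(7) translates directly into code. The sketch below assumes the allocation is given as a binary matrix ψ of shape K × N; the variable names are illustrative.

```python
import numpy as np

def load_balancing(psi, wcet, ram, td, pm, beta1=0.5, beta2=0.5):
    """Compute HB(t) from Eqs. (3)-(7).

    psi       : (K, N) binary matrix, psi[k, n] = 1 if task k is allocated to partition n
    wcet, ram : (K,) per-task worst-case execution time and memory demand
    td, pm    : (N,) per-partition frame time and memory capacity
    """
    ur = (psi * wcet[:, None]).sum(axis=0) / td          # UR_n^use(t), Eq. (6)
    mr = (psi * ram[:, None]).sum(axis=0) / pm           # MR_n^use(t), Eq. (7)
    load = beta1 * ur + beta2 * mr                       # load_n(t), Eq. (4)
    return np.sqrt(np.mean((load - load.mean()) ** 2))   # HB(t), Eq. (3)
```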
Latency reducing: The resource allocation of the avionics system triggered by SPO mode switching is characterized by sudden, batched, and real-time task arrivals. The typical AFDX communication network in the IMA system uses an FCFS virtual link scheduling algorithm, i.e., the tasks in each queue are served in first-come-first-served order [34]. The recovery of processes takes up most of the time during IMA system reconfiguration, and if multiple processes are located in the same processing module, their recovery occurs serially. This means that the loading time for resource allocation must be accumulated according to the actual reconfiguration order of the processes. Based on this, the latency metric can be defined as:
$LT(t) = \sum_{k=1}^{K} \left( t_k' - t_k \right)$ (8)
where $t_k$ denotes the arrival time of task k and $t_k'$ denotes its start time. As shown in Figure 4, task 1, task 2, and task 3 arrive at moments $t_1$, $t_2$, and $t_3$ and start at moments $t_1'$, $t_2'$, and $t_3'$. When task 1 and task 2 arrive, the partition still has enough CPU resources, so their resource allocation demands are satisfied immediately. Task 3, however, arrives when CPU resources are insufficient and must wait for $t_3' - t_3$ in the queue before its reconfiguration is executed. Therefore, the resource allocation scheme needs to be further optimized to minimize the delay caused by avionics system resource allocation.
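A minimal sketch of how the waiting time in Equation (8) accumulates under FCFS, treating each partition as loading its assigned processes serially (in the spirit of the serial process recovery described above); the function and its arguments are illustrative assumptions.

```python
def total_latency(arrivals, load_times, assignment, n_partitions):
    """LT(t) = sum_k (t'_k - t_k) under FCFS with serial loading per partition (illustrative)."""
    free_at = [0.0] * n_partitions          # earliest time each partition is free again
    latency = 0.0
    for t_k, d_k, n in zip(arrivals, load_times, assignment):
        start = max(t_k, free_at[n])        # wait if the partition is still busy
        latency += start - t_k              # t'_k - t_k
        free_at[n] = start + d_k            # partition busy while loading this process
    return latency
```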
Power consumption reducing: The current IMA system exhibits unbalanced overall performance indices: hardware integration and processing capability are high, but so are volume, weight, and power consumption, and the system imposes stringent requirements on the environmental control of the installed components. In particular, when system reconfiguration occurs due to changes in the SPO mode operation scenario, the limited on-board computational resources must keep the power consumption associated with resource allocation to a minimum. Based on this, the instantaneous power consumption metric at moment t can be defined as:
$PC(t) = P_{0\%} + \left( P_{100\%} - P_{0\%} \right) \times \left[\, 2\,UR_n^{use}(t) - \left( UR_n^{use}(t) \right)^{1.4} \right]$ (9)
where $P_{0\%}$ and $P_{100\%}$ are the power consumption of a partition in idle mode and full-load mode, respectively; the N partitions in the IMA system are assumed to have the same values of $P_{0\%}$ and $P_{100\%}$. $UR_n^{use}(t)$ denotes the real-time CPU utilization of partition n.
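The instantaneous power model of Equation (9) is a simple function of CPU utilization; a one-function sketch, using the idle and full-load values later listed in Table 1 as example defaults:

```python
def partition_power(ur, p_idle=10.0, p_full=25.0):
    """PC(t) for one partition, Eq. (9): P_0% + (P_100% - P_0%) * (2u - u**1.4)."""
    return p_idle + (p_full - p_idle) * (2.0 * ur - ur ** 1.4)

print(partition_power(0.0), partition_power(1.0))   # 10.0 W when idle, 25.0 W at full load
```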
(3)
Objective optimization function
Combining the basic constraints and optimization indicators, the objective optimization function can be defined as:
$obj: \; \min \left\{ \lambda_1 HB(t) + \lambda_2 LT(t) + \lambda_3 PC(t) \right\}$
$s.t. \quad \beta_1 + \beta_2 = 1, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1$
$\qquad \; UR^{use}(t) \le UR^{use\text{-}max}(t)$
$\qquad \; MR^{use}(t) \le MR^{use\text{-}max}(t)$ (10)
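For illustration, the weighted objective and feasibility check of Equation (10) can be sketched as follows; the weight values passed in are examples, not the weights used in the experiments.

```python
def objective(hb, lt, pc, lam=(0.4, 0.3, 0.3)):
    """Weighted cost from Eq. (10); smaller is better."""
    assert abs(sum(lam) - 1.0) < 1e-9      # lambda_1 + lambda_2 + lambda_3 = 1
    return lam[0] * hb + lam[1] * lt + lam[2] * pc

def feasible(ur, mr, ur_max=1.0, mr_max=1.0):
    """Resource constraints of Eq. (10): utilization must stay within partition limits."""
    return all(u <= ur_max for u in ur) and all(m <= mr_max for m in mr)
```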

3. Problem Transformation and Algorithm Design

3.1. MDP for Avionics Resource Allocation in SPO Mode

The environment in a reinforcement learning problem is typically described by a Markov Decision Process (MDP), a mathematical model for sequential decision-making used to study optimization problems solvable by dynamic programming. An MDP is characterized by a quintuple $\langle S, A, P, R, \gamma \rangle$, where S and A denote the state space and action space, respectively. P defines the state transition probability, with $p(s'|s,a)$ denoting the probability of transitioning to state $s'$ after taking action a in state s. R is the reward function, giving the reward received for taking action a in state s. $\gamma \in (0,1]$ is the discount factor, which weights long-term rewards. The primary objective of reinforcement learning is to enable the agent to discover an optimal policy that maximizes the long-term cumulative reward through continuous interaction with the environment. Drawing inspiration from the success of reinforcement learning in solving sequential decision-making problems, the resource allocation problem for avionics systems in SPO mode can be modeled as an MDP.

3.1.1. State Space

The state space should encapsulate the information required for resource allocation in the avionics system under the SPO mode, as well as the selection of actions based on the observed data. The state $s(t_k)$ is defined as the concatenation of the partition states $s_n(t_k)$ and the task state $s_k$ within the RIMA system’s processing modules at the moment of the task’s arrival. The resource requirements of the kth task, namely its computational requirement and memory requirement, are denoted as $u_{k1}$ and $u_{k2}$, respectively. Similarly, the computational resource utilization and memory resource utilization of partition n at this moment are denoted as $u_{n1}(t_k)$ and $u_{n2}(t_k)$. The execution time of the kth task is denoted by $v_k$. Therefore, the state space can be defined as:
$s(t_k) = [\, s_1(t_k), \dots, s_N(t_k), s_k \,] = [\, u_{11}(t_k), u_{12}(t_k), \dots, u_{N1}(t_k), u_{N2}(t_k), u_{k1}, u_{k2}, v_k \,]$ (11)
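A sketch of how the state vector of Equation (11) could be assembled when task k arrives; the array layout mirrors the equation, and the helper name is an assumption.

```python
import numpy as np

def build_state(ur, mr, u_k1, u_k2, v_k):
    """s(t_k) = [u_11, u_12, ..., u_N1, u_N2, u_k1, u_k2, v_k], Eq. (11).

    ur, mr     : (N,) computational and memory utilization of each partition at t_k
    u_k1, u_k2 : computational and memory requirement of the arriving task k
    v_k        : execution time of task k
    """
    partition_part = np.stack([ur, mr], axis=1).ravel()   # interleave (u_n1, u_n2) per partition
    return np.concatenate([partition_part, [u_k1, u_k2, v_k]]).astype(np.float32)
```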

3.1.2. Action Space

The process of selecting an action is essentially a resource allocation procedure for avionics system reconfiguration oriented toward SPO mode switching. Specifically, this involves loading task k into partition n (indexed by the reconfiguration decision variable) at moment t. Given that there is a total of N possible actions to choose from, the action space can be defined as:
$a(t) = n \in \{ 1, 2, \dots, N \}$ (12)

3.1.3. Reward Function

The reward function serves as the sole metric for the agent to assess the effectiveness of its current actions. It is also crucial in converting decision-making problems into reinforcement learning tasks. This study proposes the conversion of the decision evaluation index (objective optimization function) into a reward function, ensuring compliance with the constraints outlined in Equation (10) during the allocation process. Consequently, the reward function is defined as:
$R(s, a) = -\lambda_1 HB(t) - \lambda_2 LT(t) - \lambda_3 PC(t)$ (13)

3.2. PPO Algorithm Network Model

The Proximal Policy Optimization (PPO) algorithm is a reinforcement learning method that operates within an actor-critic framework. This algorithm’s architecture consists of two main components: the actor network, which follows a policy-based approach to generate actions and interact with the environment for optimizing the policy model π ( s , a ) , and the critic network, which employs a value-based approach to evaluate the quality of actions and guide the selection of subsequent actions to optimize the value function [35]. The network structure of the PPO algorithm is shown in Figure 5.
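For reference, a minimal actor-critic pair consistent with the structure in Figure 5 and the sizes later reported in Table 1 (two hidden layers of 64 units) is sketched below; the paper does not publish its network code, so the activation choices and layer details are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network pi(a|s): outputs a distribution over the N candidate partitions."""
    def __init__(self, state_dim, n_partitions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_partitions),
        )

    def forward(self, s):
        return torch.distributions.Categorical(logits=self.net(s))

class Critic(nn.Module):
    """Value network V(s): scores the state to support the advantage estimate."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)
```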
Although traditional policy gradient algorithms, such as Asynchronous Advantage Actor Critic (A3C) [36] and Actor-Critic with Experience Replay (ACER) [37], have achieved good control effects in many decision-making problems, they still face difficulties such as selecting the iteration step size and low data utilization. The PPO algorithm is a newer policy gradient algorithm derived from the Trust Region Policy Optimization (TRPO) algorithm. Its updates are designed to be monotonic, so the updated policy is never worse than the previous one. The objective function in TRPO is denoted by:
$L^{cpi}(\theta) = \hat{\mathbb{E}}_t \left[ \frac{\pi_{\tilde{\theta}}(a_t \mid s_t)}{\pi_{\theta}(a_t \mid s_t)} \hat{A}_{\pi_\theta,t} \right] = \hat{\mathbb{E}}_t \left[ r_t(\theta)\, \hat{A}_{\pi_\theta,t} \right]$ (14)
where cpi stands for conservative policy iteration, $r_t(\theta) = \pi_{\tilde{\theta}}(a_t \mid s_t) / \pi_{\theta}(a_t \mid s_t)$ is the importance sampling weight between the old policy $\pi_{\theta}$ and the new policy $\pi_{\tilde{\theta}}$, and $\hat{A}_{\pi_\theta,t}$ is the estimate of the advantage function at timestep t, defined by Equation (15):
$\hat{A}_{\pi_\theta,t} = Q_{\pi_\theta,t}(s_t, a_t) - V(s_t, \omega_t)$ (15)
where $Q_{\pi_\theta,t}(s_t, a_t)$ denotes the value of executing action $a_t$ in state $s_t$ at time t under policy $\pi_{\theta}$, and $V(s_t, \omega_t)$ denotes the expected value of state $s_t$ at time t given the critic network parameters $\omega_t$.
However, maximizing $L^{cpi}$ may result in high variance and excessively large policy updates. To address this, the surrogate objective function employed in this work modifies the policy’s original objective function to ensure stable policy updates, which is denoted by:
$L^{clip}(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta)\, \hat{A}_{\pi_\theta,t},\; \mathrm{clip}\left( r_t(\theta), 1-\varepsilon, 1+\varepsilon \right) \hat{A}_{\pi_\theta,t} \right) \right]$ (16)
As shown in Equation (16), the objective function takes the minimum of the unclipped surrogate objective and the clipped surrogate objective, comparing the probability ratio with a clipped version of that ratio. The clipping function is defined as:
$\mathrm{clip}\left( r_t(\theta), 1-\varepsilon, 1+\varepsilon \right) = \begin{cases} 1-\varepsilon, & \text{if } r_t(\theta) \le 1-\varepsilon \\ 1+\varepsilon, & \text{if } r_t(\theta) \ge 1+\varepsilon \\ r_t(\theta), & \text{otherwise} \end{cases}$ (17)
which bounds the probability ratio to the range $[1-\varepsilon, 1+\varepsilon]$, preventing the new policy from deviating too far from the old one [38]. By minimizing this loss, the PPO algorithm keeps the policy updates in a monotonically non-decreasing direction and ensures that the magnitude of each update remains controllable.
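The clipped surrogate of Equations (16) and (17) maps directly onto a few lines of code; a hedged sketch (PyTorch, maximizing $L^{clip}$ by minimizing its negative):

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, eps=0.2):
    """-L^clip(theta) from Eq. (16); minimize this with a gradient step."""
    ratio = torch.exp(log_prob_new - log_prob_old)                   # r_t(theta)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage   # Eq. (17)
    return -torch.min(unclipped, clipped).mean()
```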

3.3. Avionics Resource Allocation Based on the PPO Algorithm

In view of the above ideas, to solve the resource allocation problem of the avionics system using the PPO algorithm, the training process is given in Algorithm 1.
Algorithm 1. DRL with PPO for avionics resource allocation
Input:
    System parameters, state space, action space, discount factor, learning rate.
Run:
    Initialize the hyperparameter and the network parameter of the PPO algorithm
    Generate a simulated avionics resource allocation environment for training according to predefined criteria
    For iteration = 1, 2, …, do
    Collect trajectory data to replay buffer D
        For timestep = 1 to T do
            For task = 1 to K do
                Observe the state $s_t$ of the resource allocation environment
                Run policy $\pi_\theta$ to choose an action $a_t$ based on the observed state
                The DRL agent receives an immediate reward $r_t$ and the next state $s_{t+1}$
                Input the agent’s state variables into the critic network to estimate the advantage function $\hat{A}_{\pi_\theta,t}$
            End for
        End for
        Update θ by a gradient method to optimize the loss function $L^{clip}(\theta)$
        Replace the parameters of the actor network: $\pi_\theta \leftarrow \pi_{\tilde{\theta}}$
    End for
Output:
Resource allocation scheme for avionics system in SPO mode
During the training process, the DRL agent interacts with the environment based on the operational scenario information of the avionics system in SPO mode. The agent executes an action $a_t$ according to the current state $s_t$. When a task arrives, the computational and storage resource utilization rates for each partition are known, allowing the derivation of the actual load, latency, and power consumption post-allocation. This process leads to a new state $s_{t+1}$ and returns a reward $R_t$ to the DRL agent. These steps are repeated with each new task: collecting new trajectories, computing the surrogate objective, estimating the policy gradient, and updating the policy parameters. This interaction continues for subsequent tasks in the same manner. Following offline training of the DRL model using the PPO algorithm, an optimal strategy is identified to maximize long-term cumulative rewards, which can then be saved and applied to generate the resource allocation scheme for the avionics system in SPO mode.
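As a sketch of how the offline training described above could be run with an off-the-shelf PPO implementation (stable-baselines3 is used here purely for illustration; AvionicsAllocationEnv is a hypothetical Gymnasium wrapper exposing the state, action, and reward of Section 3.1, not code released with the paper):

```python
from stable_baselines3 import PPO

# Hypothetical Gymnasium environment implementing the MDP of Section 3.1:
# observations follow Eq. (11), actions pick a partition, rewards follow Eq. (13).
env = AvionicsAllocationEnv(n_partitions=10, n_tasks=300)

model = PPO(
    "MlpPolicy", env,
    learning_rate=1e-4, gamma=0.99,           # values from Table 1
    policy_kwargs=dict(net_arch=[64, 64]),    # two hidden layers of 64 neurons
    verbose=1,
)
model.learn(total_timesteps=100_000)          # offline training
model.save("ppo_avionics_allocation")         # reuse the policy to emit allocation schemes
```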

4. Simulation Results and Analysis

4.1. Simulation Settings

4.1.1. Experimental Parameters

In the experiments, the deep reinforcement learning algorithms are trained on a server equipped with two Intel Xeon 8336C processors (32 cores, 64 threads, 2.30 GHz base frequency, 3.50 GHz turbo) and two NVIDIA RTX 4090 GPUs.

4.1.2. Operation Scenario Information

The information about the avionics system operation scenario in SPO mode includes the arrival times of 300 tasks and their associated resource requirements (including computational requirement and memory requirement), represented by the bubble diagram in Figure 6. The scenario information is derived from a mathematical model of CPAI, which is based on selected subsets of variables. The mathematical model evaluates physiological measurables (including heart rate, respiratory rate, and blink rate), cognitive indicators (such as mental workload coefficient and mental fatigue coefficient), and external condition variables (like operational complexity and mental fatigue coefficient) in real time. The switching of the SPO-TC is then implemented within the decision logic based on predefined rules [25].
As illustrated in Figure 6, the single pilot’s cognitive workload gradually increases before moment t0. At this point, the pilot would likely request the assistance of a ground operator. After moment t0, the single pilot becomes incapacitated, leading to a transition in the taxonomy condition from SPO TC-2 to SPO TC-3. The AI-based avionics system then assists the ground operator in taking over the duties of the single pilot. To compensate for the absence of the pilot’s independent cognition, behavior, and abilities, the weights of various avionics functions are adjusted, prioritizing those related to autonomy and situational awareness. It should be noted that the weights of the avionics functions determine the arrival rate of tasks and the computational/storage resource requirements in the scenario information: the higher the autonomy level (SPO-TC), the higher the task arrival rate and the greater the computational/storage resource requirements. In this paper, the DRL agent is trained with the PPO algorithm based on the operational scenario information of the avionics system in SPO mode and dynamically loads tasks into each partition of RIMA according to the resource allocation scheme.

4.1.3. Hyperparameters

Different hyperparameters affect the optimization and training process. To improve the quality of training, the learning rate and the number of neurons in the hidden layer are selected for comparison and validation. The candidate learning rates are set to 1 × 10−2, 1 × 10−3, and 1 × 10−4, and the number of neurons in the hidden layer is chosen from 32, 64, and 256. The orthogonal test method is then used to train the model with the permutations of these hyperparameters. Finally, the training effect, as reflected in the reward curves shown in Figure 7, is compared and the best parameter combination is selected.
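The comparison described above can be organized as a simple sweep over the candidate values; train_and_evaluate is a hypothetical helper standing in for one run of Algorithm 1 with the given hyperparameters.

```python
from itertools import product

learning_rates = [1e-2, 1e-3, 1e-4]
hidden_units = [32, 64, 256]

results = {}
for lr, width in product(learning_rates, hidden_units):
    # train_and_evaluate is a hypothetical helper: it runs Algorithm 1 with the
    # given hyperparameters and returns the final mean reward of the agent.
    results[(lr, width)] = train_and_evaluate(learning_rate=lr, hidden=width)

best = max(results, key=results.get)
print("best combination:", best)   # per Figure 7, this is lr = 1e-4 with 64 hidden units
```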
During reinforcement learning training, the agent aims to maximize its cumulative reward over time. By exploring the environment, trying different actions, and observing the resulting rewards, the agent learns to take actions that lead to higher rewards and avoid those that lead to lower rewards. This learning process enables the agent to converge toward an optimal policy that maximizes its long-term cumulative reward. Specifically, in Figure 7a, the reward curves oscillate strongly and in some cases fail to converge. The agent in Figure 7b converges the earliest, after about 32,000 steps, but it struggles to converge when the number of hidden units in the neural network is set to 256. In Figure 7c, convergence occurs earliest after about 17,000 steps and the overall effect is better, though slight fluctuations are observed when the number of hidden units is 32. Combining the above analyses, the convergence speed and stability of the neural network are best balanced when the number of hidden units is set to 64 and the learning rate is 1 × 10−4, indicating that the agent has learned a resource allocation scheme that satisfies the constraints of the SPO mode. The final selected DRL model parameters are shown in Table 1.

4.2. Simulation Experiments for Different Comparative Methods

To demonstrate the advantage of the PPO algorithm in addressing the resource allocation problem of the avionics system in SPO mode, the same state space, action space, reward function, and network parameters are established within a consistent environment. Training is then conducted using the PPO algorithm, A3C algorithm, and ACER algorithm, as illustrated in Figure 8.
Figure 8 shows that the A3C algorithm struggles to converge within a limited number of steps. The ACER algorithm improves on this but converges only to a locally optimal solution. In contrast, the PPO algorithm not only overcomes the convergence difficulty but also improves the agent’s learning speed, effectively addressing the resource allocation problem for avionics system reconfiguration. The figure clearly shows that the inflection point of the proposed method occurs earlier and the final convergence value is higher. This is because the core idea behind the PPO algorithm is to introduce a “proximal” term into the objective function, which constrains the policy update to remain close to the old policy. By maintaining this proximity, the PPO algorithm ensures that policy changes are not too drastic, thereby promoting stability during learning. To demonstrate the optimization effect of the PPO algorithm on the resource allocation scheme more comprehensively, the Greedy algorithm [39] is further introduced to replace the poorly performing A3C algorithm, as shown in Figure 9.
Next, based on the resource allocation schemes for the avionics system produced by different algorithms, a comparative analysis of power consumption and latency is conducted below.
The performance comparison of the three algorithms in terms of instantaneous power consumption is presented in Figure 10. The figure shows that when the number of task arrivals is small, the instantaneous power consumption of the algorithms does not differ significantly. However, around the arrival of the 140th task, the SPO mode switch leads to a surge in avionics function resource demand and an increase in RIMA system load. At this point, the optimization performance of the PPO algorithm is significantly better than that of the other algorithms, with power consumption reduced by 8.27% and 11.26% on average compared to the A3C algorithm and the Greedy algorithm, respectively. Figure 11 presents a comparison of the algorithms’ latency. The latency of all algorithms increases with the continuous arrival of tasks, and as the number of arriving tasks grows, the performance gap becomes evident. The poor optimization performance of the Greedy algorithm may stem from its strategy of assigning each task to the partition with the lowest current CPU utilization without considering the overall system optimization objective; as a result, it achieves only a locally optimal solution.

4.3. Simulation Experiments for Different Comparative Operation Scenarios

Following the information extraction process of the avionics system operation scenario depicted in Figure 6, two additional SPO mode avionics system operation scenarios are illustrated in Figure 12. These scenarios were developed based on varying task arrival times and their respective resource requirements.
A comprehensive evaluation of the avionics system resource allocation schemes generated by the method proposed in this paper for different SPO mode operation scenarios is presented below. The optimization effects on power consumption and delay are compared.
As illustrated in Figure 13 and Figure 14, despite the differences among the three types of SPO mode operation scenarios, the method proposed in this paper consistently maintains the superiority and stability of the avionics system resource allocation scheme. This is evidenced by an average reduction of 9.91% in instantaneous power consumption and a 22.06% decrease in cumulative delay. These results demonstrate the effectiveness of the proposed method in dynamically supporting avionics resources and enhancing the aircraft’s mission support capabilities across various phases.

5. Conclusions and Future Works

Based on the analysis of the SPO collaborative interaction scheme and multi-stage mission characteristics of the avionics system, this paper introduces a resource allocation framework tailored to the SPO mode avionics system. The framework incorporates resource allocation constraints and optimization criteria by comprehensively considering load, latency and power consumption within the feasible domain of the avionics system’s “Mission-Function-Resource” architecture. The complex optimization of the reconfigurable avionics system in SPO mode is reformulated into a sequential decision-making resource allocation scheme. Simulation results demonstrate that the resource allocation scheme proposed in this paper reduces power consumption by an average of 9.91% and latency by an average of 22.06% compared to the A3C, ACER, and Greedy algorithms. This enables avionics resources to respond dynamically to the capabilities required in SPO mode and enhances their ability to support the aircraft mission at all stages.
Although the method proposed in this paper demonstrates advantages in resource allocation problems for avionics systems in SPO mode, it also has some limitations. For instance, as an on-policy algorithm, the PPO algorithm cannot leverage past trajectory data, leading to extended training durations. Consequently, future research needs to address how to reduce computation time while maintaining the balance between exploration and exploitation efficiency. Furthermore, the highly nonlinear nature of DRL models can result in unexpected behaviors. Therefore, in aviation applications, it is crucial to conduct explainability analysis to enhance system developers’ understanding and trust in DRL model decisions. This will also help identify and improve factors that impact model performance.

Author Contributions

Conceptualization, L.D. and J.L.; methodology, J.L. and Z.S.; software, J.L.; validation, L.D., X.C. and P.W.; formal analysis, X.C.; investigation, L.D.; resources, L.D. and J.L.; data curation, Z.S.; writing—original draft preparation, J.L.; writing—review and editing, Z.S.; visualization, X.C. and P.W.; supervision, J.L.; project administration, P.W.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities 3122024037 and the Open Fund for the Key Laboratory of Civil Aircraft Airworthiness Technology SH2023101701.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

Author Zijing Sun was employed by the company AVIC. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SPO: Single-Pilot Operations
GO: Ground Operators
TCO: Two-Crew Operations
IMA: Integrated Modular Avionics
RIMA: Reconfigurable Integrated Modular Avionics
HAT: Human-Autonomy Teaming
VPA: Virtual Pilot Assistance
V-CoP: Virtual Co-Pilot
DRL: Deep Reinforcement Learning
PPO: Proximal Policy Optimization
TC: Taxonomy Conditions
CPAI: Cognitive Pilot-Aircraft Interface
GPMs: General Processing Modules
FCFS: First-Come-First-Served
MDP: Markov Decision Process
A3C: Asynchronous Advantage Actor Critic
ACER: Actor-Critic with Experience Replay
TRPO: Trust Region Policy Optimization

References

  1. Wang, G.; Li, M.; Wang, M.; Ding, D. A systematic literature review of human-centered design approach in single pilot operations. Chin. J. Aeronaut. 2023, 36, 1–23. [Google Scholar] [CrossRef]
  2. Gore, B.F.; Wolter, C. A task analytic process to define future concepts in aviation. In Proceedings of the Digital Human Modeling: Applications in Health, Safety, Ergonomics and Risk Management, 5th International Conference, Heraklion, Crete, Greece, 22–27 June 2014; pp. 236–246. [Google Scholar]
  3. Stanton, N.A.; Harris, D.; Starr, A. The future flight deck: Modelling dual, single and distributed crewing options. Appl. Ergon. 2016, 53, 331–342. [Google Scholar] [CrossRef] [PubMed]
  4. Chen, Y.; Zhong, K.L.; Luo, Y.; Miao, W. Key technology and future development of regional airliner. Acta Aeronaut. Astronaut. Sin. 2023, 44, 156–180. [Google Scholar]
  5. Bilimoria, K.D.; Johnson, W.W.; Schutte, P.C. Conceptual framework for single pilot operations. In Proceedings of the International Conference on Human-Computer Interaction in Aerospace, New York, NY, USA, 30 July–1 August 2014; pp. 1–8. [Google Scholar]
  6. Zaeske, W.M.M.; Brust, C.-A.; Lund, A.; Durak, U. Towards Enabling Level 3A AI in Avionic Platforms. In Proceedings of the Software Engineering 2023 Workshops, Paderborn, Germany, 2 February 2023; pp. 189–207. [Google Scholar]
  7. Gaska, T.; Watkin, C.; Chen, Y. Integrated modular avionics-past, present, and future. IEEE Aerosp. Electron. Syst. Mag. 2015, 30, 12–23. [Google Scholar] [CrossRef]
  8. Lukić, B.; Ahlbrecht, A.; Friedrich, S.; Durak, U. State-of-the-Art Technologies for Integrated Modular Avionics and the Way Ahead. In Proceedings of the IEEE/AIAA 42nd Digital Avionics Systems Conference (DASC), Barcelona, Spain, 1–5 October 2023; pp. 1–10. [Google Scholar]
  9. Omiecinski, T.; Johnson, D. Autonomous dynamic reconfiguration of integrated avionics systems. In Proceedings of the Guidance, Navigation, and Control Conference, New Orleans, LA, USA, 11–13 August 1997; pp. 1504–1514. [Google Scholar]
  10. Ziakkas, D.; Pechlivanis, K.; Flores, A. Artificial intelligence (AI) implementation in the design of single pilot operations commercial airplanes. In Proceedings of the 14th International Conference on Applied Human Factors and Ergonomics, San Francisco, CA, USA, 20–24 July 2023; Volume 69, pp. 856–861. [Google Scholar]
  11. Sprengart, S.M.; Neis, S.M.; Schiefele, J. Role of the human operator in future commercial reduced crew operations. In Proceedings of the IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK, 23–27 September 2018; pp. 1–10. [Google Scholar]
  12. Lim, Y.; Bassien-Capsa, V.; Ramasamy, S.; Liu, J.; Sabatini, R. Commercial airline single-pilot operations: System design and pathways to certification. IEEE Aerosp. Electron. Syst. Mag. 2017, 32, 4–21. [Google Scholar] [CrossRef]
  13. Lim, Y.; Gardi, A.; Ramasamy, S.; Sabatini, R. A virtual pilot assistant system for single pilot operations of commercial transport aircraft. In Proceedings of the 17th Australian International Aerospace Congress (AIAC), Melbourne, Australia, 26–28 February 2017; pp. 26–28. [Google Scholar]
  14. Tokadlı, G.; Dorneich, M.C.; Matessa, M. Development approach of playbook interface for human-autonomy teaming in single pilot operations. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Seattle, WA, USA, 28 October–1 November 2019; Sage: Los Angeles, CA, USA, 2019; pp. 357–361. [Google Scholar]
  15. Tokadlı, G.; Dorneich, M.C.; Matessa, M. Evaluation of playbook delegation approach in human-autonomy teaming for single pilot operations. Int. J. Hum.–Comput. Interact. 2021, 37, 703–716. [Google Scholar] [CrossRef]
  16. Li, F.; Feng, S.; Yan, Y.; Lee, C.-H.; Ong, Y.S. Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations. arXiv 2024, arXiv:2403.16645. [Google Scholar]
  17. Montana, D.; Hussain, T.; Vidver, G. A Genetic-Algorithm-Based Reconfigurable Scheduler; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  18. Zhao, C.; Zhang, W.; Dong, F.; Dai, J.; Dong, L. Research on Resource Allocation Method of Integrated Avionics System considering Fault Propagation Risk. Int. J. Aerosp. Eng. 2022, 2022, 8652818. [Google Scholar] [CrossRef]
  19. Xing, P.P. Dynamic Reconfiguration Strategy and Reliability Model Analysis of Integrated Avionics System. Ph.D. Thesis, Civil Aviation University of China, Tianjin, China, 2020. [Google Scholar]
  20. Zhang, T.; Chen, J.; Lv, D.; Liu, Y.; Zhang, W.; Ma, C. Automatic Generation of Reconfiguration Blueprints for IMA Systems Using Reinforcement Learning. IEEE Embed. Syst. Lett. 2021, 13, 182–185. [Google Scholar] [CrossRef]
  21. Zhang, T.; Zhang, W.; Dai Ling, C.; Wang, L.; Wei, Q. Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-Agent Reinforcement Learning. Acta Electron. Sin. 2022, 50, 954–966. [Google Scholar]
  22. Li, D.; Tian, Y.; Zhao, C.; Zou, J. Deep Reinforcement Learning-Based Constrained Optimal Reconfiguration Scheme for Integrated Modular Avionics System. In Proceedings of the 3rd International Conference on Electrical Engineering and Mechatronics Technology (ICEEMT), Nanjing, China, 21–23 July 2023; pp. 388–391. [Google Scholar]
  23. Dong, L.; Chen, H.; Chen, X.; Zhao, C. Distributed multi-agent coalition task allocation strategy for single pilot operation mode based on DQN. Acta Aeronaut. Astronaut. Sin. 2023, 44, 180–195. [Google Scholar]
  24. Neis, S.M.; Klingauf, U.; Schiefele, J. Classification and review of conceptual frameworks for commercial single pilot operations. In Proceedings of the IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK, 23–27 September 2018; pp. 1–8. [Google Scholar]
  25. Liu, J.; Gardi, A.; Ramasamy, S.; Lim, Y.; Sabatini, R. Cognitive pilot-aircraft interface for single-pilot operations. Knowl.-Based Syst. 2016, 112, 37–53. [Google Scholar] [CrossRef]
  26. Yin, J.; Zhu, Z. Flight Autonomy Impact to the Future Avionics Architecture. In Proceedings of the IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK, 23–27 September 2018; pp. 1–7. [Google Scholar]
  27. Wang, M.; Luo, Y.; Huang, K.; Zhao, P.; Wang, G. Optimization and verification of single pilot operations model for commercial aircraft based on biclustering method. Chin. J. Aeronaut. 2023, 36, 286–305. [Google Scholar] [CrossRef]
  28. Tang, M.; Xiahou, T.; Liu, Y. Mission performance analysis of phased-mission systems with cross-phase competing failures. Reliab. Eng. Syst. Saf. 2023, 234, 109174. [Google Scholar] [CrossRef]
  29. Chen, Y.; Luo, Y.; Wang, M.; Zhong, G.; Xiao, G.; Wang, G. DFCluster: An efficient algorithm to mine maximal differential biclusters for single pilot operations task synthesis safety analysis. Chin. J. Aeronaut. 2022, 35, 400–418. [Google Scholar] [CrossRef]
  30. Zhou, G.; Xu, J.; Ma, S.; Zong, J.; Shen, J.; Zhu, H. Review of key technologies for avionics systems integration on large passenger aircraft. Acta Aeronaut. Astronaut. Sin. 2024, 45, 253–295. [Google Scholar]
  31. He, F. Theory and Approach to Avionics System Integrated Scheduling; Tsinghua University Press: Beijing, China, 2017. [Google Scholar]
  32. Zhao, C.; He, F.; Li, H.; Wang, P. Dynamic reconfiguration method based on effectiveness for advanced fighter avionics system. Acta Aeronaut. Astronaut. Sin. 2020, 41, 355–365. [Google Scholar]
  33. Wang, P.; Liu, J.; Dong, L.; Zhao, C. Task oriented DIMA dynamic reconfiguration strategy for civil aircraft. Syst. Eng. Electron. 2021, 43, 1618–1627. [Google Scholar]
  34. Li, J.P.; Wang, Y.; Bai, Y. Design and Implementation for Adaptive Scheduling Strategy of AFDX Virtual Link. Comput. Meas. Control. 2012, 20, 1986–1988. [Google Scholar]
  35. Li, P.; Xiao, Z.; Wang, X.; Huang, K.; Huang, Y.; Gao, H. EPtask: Deep reinforcement learning based energy-efficient and priority-aware task scheduling for dynamic vehicular edge computing. IEEE Trans. Intell. Veh. 2023, 9, 1830–1846. [Google Scholar] [CrossRef]
  36. Mnih, V. Asynchronous Methods for Deep Reinforcement Learning. arXiv 2016, arXiv:1602.01783. [Google Scholar]
  37. Wang, Z.; Bapst, V.; Heess, N.; Mnih, V.; Munos, R.; Kavukcuoglu, K.; De Freitas, N. Sample efficient actor-critic with experience replay. arXiv 2016, arXiv:1611.01224. [Google Scholar]
  38. Han, S.-Y.; Liang, T. Reinforcement-learning-based vibration control for a vehicle semi-active suspension system via the PPO approach. Appl. Sci. 2022, 12, 3078. [Google Scholar] [CrossRef]
  39. Vince, A. A framework for the greedy algorithm. Discret. Appl. Math. 2002, 121, 247–260. [Google Scholar] [CrossRef]
Figure 1. A taxonomy of SPO operating conditions.
Figure 2. Resource allocation framework for the avionics system in SPO mode.
Figure 3. “Mission-function-resource” hierarchy of the avionics system in SPO scenario.
Figure 4. Example of resource allocation latency.
Figure 5. The structure of the Proximal Policy Optimization algorithm.
Figure 6. Extraction process of avionics system operation scenario information in SPO mode.
Figure 7. Comparison of hyperparameter selection test: (a) Learning rate α = 1 × 10−2; (b) Learning rate α = 1 × 10−3; (c) Learning rate α = 1 × 10−4.
Figure 8. Comparison of convergence performance for different methods.
Figure 9. Comparison of convergence performance for different methods: (a) Resource allocation scheme of PPO; (b) Resource allocation scheme of ACER; (c) Resource allocation scheme of Greedy.
Figure 10. Comparison of power consumption optimization effects for different algorithms.
Figure 11. Comparison of latency optimization effects for different algorithms.
Figure 12. Information from the avionics system operation scenario in SPO mode: (a) Operation scenario B; (b) Operation scenario C.
Figure 13. Comparison of power consumption optimization effects for different scenarios.
Figure 14. Comparison of latency optimization effects for different scenarios.
Table 1. Parameter setting for DRL model.
Experimental Parameter | Value
Learning rate α | 1 × 10−4
Discount factor γ | 0.99
Number of total timesteps T | 1 × 10^5
Number of hidden layers | 2
Number of neurons in each hidden layer | 64
Number of CPMs M | 4
Number of partitions N | 10
Power consumption of partition in the idle mode P0% | 10 W
Power consumption of partition in the full-load mode P100% | 25 W
Number of tasks K | 300
