Enhancing Quadcopter Autonomy: Implementing Advanced Control Strategies and Intelligent Trajectory Planning

In this work, an in-depth investigation into enhancing quadcopter autonomy and control capabilities is presented. The focus lies on the development and implementation of three control strategies to regulate the behavior of quadcopter UAVs: a proportional-integral-derivative (PID) controller, a sliding mode controller, and a fractional-order PID (FOPID) controller. Through careful adjustment and fine-tuning, each control strategy is customized to attain the desired dynamic response and stability during quadcopter flight. Additionally, a Dyna-Q learning approach for obstacle avoidance is introduced and integrated into the control system, enabling the quadcopter to autonomously navigate complex environments and avoid obstacles through real-time learning and decision-making. Extensive simulation experiments, conducted in MATLAB 2018a, compare the performance of the different control strategies together with the Dyna-Q learning-based obstacle avoidance technique. This comprehensive analysis clarifies the strengths and limitations of each approach, guiding the selection of the most effective control strategy for specific application scenarios. Overall, this research presents valuable insights and solutions for optimizing flight stability and enabling secure and efficient operation in diverse real-world scenarios.


Introduction
As technology continues to evolve, the domain of drones presents a fascinating landscape of innovation. Among the diverse types of drones available, quadcopters stand out as highly promising and versatile unmanned aerial vehicles (UAVs), known for their exceptional maneuverability and ability to navigate challenging terrain. They have become invaluable tools in diverse applications, such as surveillance and disaster response [1,2]. To fully harness their capabilities, it is essential to develop advanced control strategies and trajectory planning techniques that can effectively adapt to dynamic environments and avoid obstacles [3–7].
For implementation and simulation, various linear, nonlinear, and robust controllers have been studied [8–10]. The PID and FOPID controllers have been applied to the height, the x and y positions, and the roll, pitch, and yaw angles. Li, J. et al. proposed trajectory-tracking control of a quadrotor based on a fractional-order S-Plane model; their study focused on the low tracking accuracy and weak anti-interference ability of quadcopter drones in trajectory-tracking control [11]. Ademola, A. et al.'s study investigated the mathematical modeling and control of the nonlinear quadcopter system for stabilization and trajectory tracking using the feedback linearization (FBL) technique combined with a PD controller [12]. The implementation of a robust sliding mode controller (SMC) for trajectory tracking was carried out by Yih, C.-C. et al., who demonstrated from the Lyapunov stability theorem that the proposed control scheme can guarantee the asymptotic stability of the tilt-rotor quadcopter in terms of position and attitude following control allocation [13]. The Dyna-Q learning-based obstacle avoidance technique has been the subject of many research works and has been applied to a wide range of quadcopter controllers [14–16]. Budiyanto, A. et al. proposed the application of the Deep Dyna-Q algorithm to formation control in both simulations and actual experiments [15].
The main objectives of this work revolve around enhancing quadcopter autonomy and control capabilities through comprehensive investigation. The first section of this work explores the principles of flight dynamics, encompassing the mechanics that govern quadcopter movement. The second section delves into the modeling aspect, presenting a detailed mathematical representation of quadcopter dynamics. This comprehensive quadcopter model covers both kinematics and dynamics, serving as a robust framework for controller design and stability analysis. Utilizing this model, we proceed to implement three distinct control approaches: the proportional-integral-derivative (PID) controller, the fractional-order PID controller, and the sliding mode controller (SMC).
The PID controller, widely known for its simplicity and widespread use, serves as the baseline for comparison among the control strategies. Additionally, the project explores the integration of fractional calculus into the PID controller to enhance control performance, particularly in managing system nonlinearities and uncertainties [6]. Furthermore, the sliding mode controller's robustness against disturbances is evaluated.
In the final section of this work, the crucial aspect of obstacle avoidance is addressed through the development of an intelligent trajectory planning approach using Dyna-Q learning. This approach leverages real-time environmental data to generate safe and efficient paths for quadcopters, enabling them to autonomously navigate complex environments while intelligently avoiding obstacles. The proposed control strategies, together with trajectory planning using Dyna-Q learning, undergo extensive validation through simulations employing various performance metrics.
The anticipated outcomes of this research include significant advancements in quadcopter autonomy, enabling deployment in increasingly challenging scenarios. By integrating advanced control strategies and intelligent trajectory planning, we aim to enhance quadcopters' adaptability and responsiveness, making them indispensable assets in applications such as search and rescue missions, precision agriculture, and infrastructure inspection. Through an in-depth understanding of the strengths and limitations of various control strategies and innovative approaches like Dyna-Q learning, we strive to unlock the full potential of quadcopters and pave the way for their widespread adoption in diverse applications.

Quadcopter State Space Model
The state space vector $X$ describes the position of the quadcopter in space and its linear and angular velocities as follows [5–8,17–20]:

$$X = [x_1, x_2, \dots, x_{12}]^T = [x, \dot{x}, y, \dot{y}, z, \dot{z}, \phi, \dot{\phi}, \theta, \dot{\theta}, \psi, \dot{\psi}]^T$$

The control input vector for the quadcopter state space model is represented as

$$U = [U_1, U_2, U_3, U_4]^T$$

where $U_1$ controls the quadcopter's total thrust, which is responsible for the altitude, while $U_2$, $U_3$, and $U_4$ control the moments that produce the roll, pitch, and yaw motions, respectively, enabling precise control and stabilization of the quadcopter in response to external disturbances. Figure 1 illustrates the movement of a quadcopter in space. The variables x, y, and z represent the displacements of the quadcopter's center of mass from an Earth-fixed inertial frame along the respective axes. The quadcopter's orientation is represented by the three Euler angles: φ, the roll angle around the x-axis; θ, the pitch angle around the y-axis; and ψ, the yaw angle around the z-axis [5–7].

To obtain the state space representation of the quadcopter, the following equations describing its translational and rotational motion are used [17–20]. The translational equations of motion in terms of the control input are

$$\ddot{x} = \frac{U_1}{m_t}(\sin\phi\sin\psi + \cos\phi\sin\theta\cos\psi) - \frac{A_x}{m_t}\dot{x}$$
$$\ddot{y} = \frac{U_1}{m_t}(-\sin\phi\cos\psi + \cos\phi\sin\theta\sin\psi) - \frac{A_y}{m_t}\dot{y}$$
$$\ddot{z} = \frac{U_1}{m_t}\cos\phi\cos\theta - g - \frac{A_z}{m_t}\dot{z}$$

and the rotational equations of motion in terms of the control input are

$$\ddot{\phi} = \frac{I_{yy} - I_{zz}}{I_{xx}}\dot{\theta}\dot{\psi} - \frac{J_r}{I_{xx}}\omega_{rz}\dot{\theta} - \frac{K_{ax}}{I_{xx}}\dot{\phi} + \frac{U_2}{I_{xx}}$$
$$\ddot{\theta} = \frac{I_{zz} - I_{xx}}{I_{yy}}\dot{\phi}\dot{\psi} + \frac{J_r}{I_{yy}}\omega_{rz}\dot{\phi} - \frac{K_{ay}}{I_{yy}}\dot{\theta} + \frac{U_3}{I_{yy}}$$
$$\ddot{\psi} = \frac{I_{xx} - I_{yy}}{I_{zz}}\dot{\phi}\dot{\theta} - \frac{K_{az}}{I_{zz}}\dot{\psi} + \frac{U_4}{I_{zz}}$$

where $m_t$ is the total mass of the quadcopter; $A_x$, $A_y$, and $A_z$ are the aerodynamic translation coefficients; $K_{ax}$, $K_{ay}$, and $K_{az}$ are the aerodynamic rotation coefficients; $\omega_{rz}$ is the overall rotor angular velocity about the axis where the rotation occurs (the z-axis); $J_r$ is the rotor inertia; and $I_{xx}$, $I_{yy}$, and $I_{zz}$ are the mass moments of inertia about the three principal axes in the body frame.

The nonlinear state differential equations follow directly; for example, the second state (the velocity along x) evolves as

$$\dot{x}_2 = \frac{U_1}{m_t}(\sin x_7 \sin x_{11} + \cos x_7 \sin x_9 \cos x_{11}) - \frac{A_x}{m_t}x_2$$

with $x_7 = \phi$, $x_9 = \theta$, and $x_{11} = \psi$; the remaining states are obtained analogously from the equations above.
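As a quick sanity check on the translational equations of motion, the following Python sketch (a minimal illustration; the mass, gravity, and drag values are assumed here, not taken from the paper) evaluates the translational accelerations and confirms that, with level attitude and thrust equal to the weight, the model predicts hover:

```python
import math

# Illustrative parameters (assumed for this sketch, not the paper's values)
m_t, g = 1.0, 9.81            # total mass [kg], gravity [m/s^2]
A_x = A_y = A_z = 0.01        # aerodynamic translation coefficients

def translational_accel(U1, phi, theta, psi, vx, vy, vz):
    """Translational accelerations of the quadcopter in the inertial frame."""
    ax = (U1 / m_t) * (math.sin(phi) * math.sin(psi)
                       + math.cos(phi) * math.sin(theta) * math.cos(psi)) - A_x * vx / m_t
    ay = (U1 / m_t) * (-math.sin(phi) * math.cos(psi)
                       + math.cos(phi) * math.sin(theta) * math.sin(psi)) - A_y * vy / m_t
    az = (U1 / m_t) * math.cos(phi) * math.cos(theta) - g - A_z * vz / m_t
    return ax, ay, az

# Level attitude with thrust equal to the weight: the model should predict hover
ax, ay, az = translational_accel(U1=m_t * g, phi=0.0, theta=0.0, psi=0.0,
                                 vx=0.0, vy=0.0, vz=0.0)
```

The same function can be stepped forward in time with any ODE integrator to simulate the open-loop translational dynamics.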

PID Controller
PID controllers, known as proportional-integral-derivative controllers, are widely employed due to their simplicity and high efficiency in numerous industrial applications [19–22]. To design the PID controller for the nonlinear system, an initial step involves designing and tuning a controller for the linearized model. The designed controller is then implemented on the nonlinear quadcopter system. This approach simplifies the control design process by utilizing classical control techniques suitable for linear systems, providing an effective starting point for achieving stabilization. However, it is crucial to acknowledge that the linearized model is an approximation and may not fully capture all the complexities of the quadcopter's nonlinear dynamics. Therefore, further adjustments and fine-tuning on the actual nonlinear system may be required to optimize stability and overall performance in varying operating conditions.
Altitude controller: a PID controller is designed to generate the control input $U_1$ that governs the quadcopter's altitude, with the control law

$$U_1 = K_{P_z}\,e_z(t) + K_{I_z}\int_0^t e_z(\tau)\,d\tau + K_{D_z}\,\dot{e}_z(t), \qquad e_z = z_d - z$$

Roll controller: a PID controller of the same structure generates the control input $U_2$ that governs the quadcopter's roll motion, acting on the roll error $e_\phi = \phi_d - \phi$. Pitch controller: a PID controller generates the control input $U_3$ that governs the pitch motion, acting on the pitch error $e_\theta = \theta_d - \theta$. Heading controller: a PID controller generates the control input $U_4$ that governs the yaw motion, acting on the yaw error $e_\psi = \psi_d - \psi$. Position controller: to control the position, two PID controllers are designed to generate the control signals $U_x$ and $U_y$, which represent the commanded accelerations $\ddot{x}$ and $\ddot{y}$, respectively, acting on the position errors $e_x = x_d - x$ and $e_y = y_d - y$. Each loop has the same structure as the altitude law, with its own proportional, integral, and derivative gains.
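The PID loops described above can be sketched with a simple discrete-time implementation; the gains and sample time below are hypothetical placeholders for illustration, not the tuned values reported in the results:

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt              # rectangular integration
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Altitude loop sketch: hypothetical gains and a 100 Hz control rate
alt_pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.01)
U1 = alt_pid.update(setpoint=10.0, measurement=9.5)   # thrust command toward z_d = 10 m
```

In the full controller, one such loop per channel ($U_1$–$U_4$, $U_x$, $U_y$) is evaluated at each control step, each with its own gain set.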

Fractional-Order Controller
The nonlinear fractional controller, known as the $PI^{\lambda}D^{\mu}$ controller, represents a broader form of the classical PID controller. It extends the conventional integral and differential orders, λ and µ, beyond integer values into the real and complex domains [11,23,24].

The transfer function of a fractional-order controller has the following form:

$$C(s) = K_P + \frac{K_I}{s^{\lambda}} + K_D\,s^{\mu}$$

The equation for the $PI^{\lambda}D^{\mu}$ controller output in the time domain is as follows:

$$u(t) = K_P\,e(t) + K_I\,D_t^{-\lambda}e(t) + K_D\,D_t^{\mu}e(t)$$

$K_P$, $K_I$, and $K_D$ represent the proportional, integral, and derivative gains, respectively, and $e(t)$ denotes the error between the desired and obtained results. In addition to these parameters, the FOPID controller introduces two additional degrees of freedom, µ and λ, which play a crucial role in enhancing the controller's performance and providing greater flexibility in its design. These extra degrees of freedom allow for superior control capabilities and improved adaptability to various control tasks. After obtaining the appropriate PID gains for the nonlinear system, we implemented these gains in the fractional-order PID (FOPID) controller. The FOPID controller was designed using the FOMCON toolbox in MATLAB 2018a, which allows the parameters µ and λ of each controller to be tuned. Through a systematic process, we fine-tuned these parameters and adjusted the gains to suppress any observed system oscillations. This iterative tuning procedure aimed to achieve improved control performance and stability for the quadcopter system under consideration.
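In practice the fractional operators $D^{-\lambda}$ and $D^{\mu}$ must be discretized; one common route (shown here purely as an illustrative Python sketch, independent of how FOMCON implements it) is the Grünwald–Letnikov approximation, whose binomial coefficients follow a simple recurrence:

```python
def gl_coeffs(alpha, n):
    """Binomial coefficients of the Gruenwald-Letnikov expansion (recurrence form)."""
    c = [1.0]
    for j in range(1, n + 1):
        c.append(c[-1] * (1.0 - (alpha + 1.0) / j))
    return c

def gl_derivative(samples, alpha, h):
    """Approximate D^alpha of a sampled signal at its most recent sample."""
    n = len(samples) - 1
    c = gl_coeffs(alpha, n)
    return sum(c[j] * samples[-1 - j] for j in range(n + 1)) / h ** alpha

# Sanity check: for alpha = 1 the formula collapses to a backward difference
h = 0.01
ramp = [h * k for k in range(100)]            # f(t) = t
d1 = gl_derivative(ramp, alpha=1.0, h=h)      # ~ 1.0, the slope of the ramp
d_half = gl_derivative(ramp, alpha=0.5, h=h)  # half-derivative of t, ~ 2*sqrt(t/pi)
```

A FOPID controller then applies this operator to the error signal with orders $-\lambda$ and $\mu$ in place of the integer integral and derivative.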

Sliding Mode Controller
The sliding mode controller (SMC) technique is a nonlinear control approach that alters the dynamics of a nonlinear system using a discontinuous control signal. This signal compels the system to slide along a predefined cross-section of its normal behavior, showcasing remarkable accuracy and robustness [12,25,26]. The SMC technique comprises two essential components. First, a discontinuous control law is employed to drive the error vector towards a specific decision rule, referred to as the sliding surface. Once the error vector reaches this surface, a continuous component of the controller takes over to follow the system dynamics defined by the equations characterizing the sliding surface. The selection of the sliding surface, or decision rule, is a critical aspect and is based on performance criteria, since it determines the system's dynamics. It can be expressed as follows [12]:

$$S = \dot{e} + \lambda e$$

where $S$ is the sliding surface, λ is a positive tuning parameter, and $e$ is the tracking error. The reaching law imposed on the surface is

$$\dot{S} = -K_1\,\mathrm{sign}(S) - K_2 S$$

where $K_1$ and $K_2$ are positive tuning parameters. From the sliding surface and its derivative, controllers are developed for attitude, altitude, position, and heading.
Altitude sliding mode controller: considering the error between the desired altitude and the actual one, $e_z = z_d - z$, and substituting it into the sliding surface and reaching law gives

$$S_z = \dot{e}_z + \lambda_z e_z, \qquad \dot{S}_z = \ddot{e}_z + \lambda_z \dot{e}_z = -K_1\,\mathrm{sign}(S_z) - K_2 S_z$$

Since $\ddot{e}_z = \ddot{z}_d - \ddot{z}$, substituting the altitude dynamics and solving for the control input yields the $U_1$ control law:

$$U_1 = \frac{m_t}{\cos\phi\cos\theta}\Big(\ddot{z}_d + g + \frac{A_z}{m_t}\dot{z} + \lambda_z\dot{e}_z + K_1\,\mathrm{sign}(S_z) + K_2 S_z\Big)$$

The attitude sliding mode controllers follow the same procedure: the roll controller yields the control input $U_2$ from the roll error $e_\phi$, the pitch controller yields $U_3$ from the pitch error $e_\theta$, and the heading sliding mode controller yields $U_4$ from the yaw error $e_\psi$. The position sliding mode controllers likewise generate the commanded accelerations $\ddot{x}$ and $\ddot{y}$ from the position errors.
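A minimal Python sketch of the altitude sliding mode law follows; the mass, drag, and gain values are illustrative assumptions, not the tuned gains reported in the results:

```python
import math

def smc_altitude(z_d, zdot_d, zddot_d, z, zdot, phi, theta,
                 m_t=1.0, g=9.81, A_z=0.01, lam=2.0, K1=1.0, K2=1.0):
    """Altitude sliding mode law: drive S = e_dot + lam*e to zero."""
    e = z_d - z
    edot = zdot_d - zdot
    S = edot + lam * e                                 # sliding surface
    sign_S = math.copysign(1.0, S) if S != 0.0 else 0.0
    return (m_t / (math.cos(phi) * math.cos(theta))) * (
        zddot_d + g + A_z * zdot / m_t + lam * edot + K1 * sign_S + K2 * S)

# On the surface with zero error, the law reduces to the hover thrust m_t * g
U1 = smc_altitude(z_d=5.0, zdot_d=0.0, zddot_d=0.0,
                  z=5.0, zdot=0.0, phi=0.0, theta=0.0)
```

In implementations, the sign function is often replaced by a saturation or tanh function to reduce chattering.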
In this part, the trajectory of the quadcopter was refined using MATLAB, and three distinct controllers were explored: PID, fractional-order PID, and sliding mode. By simulating the quadcopter's dynamics and subjecting these controllers to rigorous testing, our objective was to significantly enhance its path-tracking performance. Meticulous adjustments were systematically implemented to improve the quadcopter's precision in trajectory tracking. This part underscores our practical approach to trajectory enhancement, integrating simulated dynamics with systematic control experimentation.

Results
The presented results encompass the performance of the PID, fractional-order PID, and sliding mode controllers. The simulation was conducted under various scenarios to assess their effectiveness in trajectory regulation and error minimization. The outcomes underline the adaptability of these control methods to the nonlinear model, as well as their robustness in handling disturbances. Before delving into the results of the controllers both with and without disturbances, it is important to discuss the reason for introducing disturbances into the simulation. The inclusion of disturbances in our study serves two significant purposes: first, to emulate real-world scenarios where systems often encounter unpredictable external influences, and second, to evaluate the controllers' ability to manage such disturbances and maintain stable performance. By subjecting the controllers to varying disturbance levels, we gain valuable insights into their robustness and adaptability, allowing us to make well-informed assessments of their practical efficacy.

Automation 2024, 5, FOR PEER REVIEW 7

Linear model control
The fully tuned PID controller for the linear model displayed accurate trajectory regulation and effective control over the system's behavior. The obtained trajectory closely approximates the desired trajectory, exhibiting a small deviation in its initial phase; the tuning process is therefore validated. Figure 2 shows the obtained and desired trajectories. Table 2 showcases the PID gains, settling time, and percentage overshoot for positions x, y, and z and orientations roll, pitch, and yaw. These parameters provide crucial insights into the PID controller's performance, stability, and accuracy in regulating the system. The PID gains obtained from the linear model are used as a starting point for the PID gains of the nonlinear model.
Table 3 displays the PID gains for positions x, y, and z and orientations roll, pitch, and yaw in the nonlinear model. Using the PID gains derived from the linear model as a starting point, we fine-tuned the PID controller for the nonlinear model and obtained the results shown in Figure 5. Disturbance was then introduced to the closed-loop system in the interval t = 10–13 s; the results are shown in Figure 8. Figure 9 illustrates the control inputs from the PID controller with disturbance, and Figure 10 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Fractional-Order PID Controller Results
The initial gains were obtained from the PID controller for the nonlinear model. Subsequently, these gains were further adjusted to optimize the performance of the fractional-order PID controller. The evaluation of the fractional-order PID controller was carried out under both disturbance and non-disturbance conditions to highlight its adaptability and effectiveness in handling external influences. The following section dives into the results of these simulations, providing insights into the performance and robustness of the fractional-order PID controller in realistic scenarios. Table 4 displays the fractional-order PID controller gains for positions x, y, and z and orientations roll, pitch, and yaw, as well as the parameters µ and λ. These optimized gains are essential in ensuring accurate and stable control of the system, allowing the fractional-order PID controller to regulate the trajectory effectively under the influence of nonlinear dynamics. Figure 11 shows the results obtained after modifying the gains from the previous PID controller and tuning the parameters of the fractional-order PID controller.

Figure 12 depicts the control inputs from the fractional-order PID controller without disturbance, and Figure 13 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time. After introducing disturbance to the closed-loop system in the interval t = 10–13 s, the results shown in Figure 14 were obtained. Figure 16 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time under disturbance.

Sliding Mode Controller Results
The simulation of the sliding mode controller starts with all initial gains set to one; these are then fine-tuned to achieve optimal values, ensuring the controller's robustness and adaptability. Through a comprehensive exploration of various scenarios, with and without disturbances, we examine the controller's adeptness in regulating trajectory outputs and ensuring system stability. This section presents a detailed analysis of the simulation results, providing valuable insights into the sliding mode controller's efficiency in handling intricate dynamics and external perturbations. Table 5 displays the sliding mode gains for positions x, y, and z and orientations roll, pitch, and yaw for the nonlinear model. These optimized gains are essential in ensuring accurate and stable control of the system, allowing the sliding mode controller to regulate the trajectory effectively under the influence of nonlinear dynamics.

Through careful adjustment of the gains, the results presented in Figure 17 demonstrate effective and stable control in the absence of disturbances. These outcomes serve as proof of the sliding mode controller's adaptability in successfully handling nonlinear dynamics. Despite the presence of disturbances, the sliding mode controller in Figure 20 demonstrates exceptional adaptability by effectively handling nonlinear dynamics, maintaining stability, and achieving precise trajectory tracking.

Discussion
Each controller has its advantages and suitability for specific applications. The PID controller is reliable and widely used, while the fractional-order PID controller provides enhanced flexibility. Meanwhile, the sliding mode controller excels in handling disturbances and uncertainties. The choice of controller depends on the specific requirements and characteristics of the control system. Further research and experimentation are encouraged to explore these controllers' performance in different applications and real-world scenarios.
The PID controller, originally designed for the linear model, initially exhibited a slight perturbation in trajectory due to a disparity between the desired y trajectory's starting point (set at 5) and the quadcopter's actual initial position (0). Additionally, a minor delay of approximately 1 s was observed in trajectory tracking, attributed to the system's inherent oscillatory behavior. When applied to the nonlinear model, the PID controller displayed an initial time lag in aligning with the trajectory. It manifested noticeable perturbations at the beginning of the trajectory: around 14 s for the x component and 5 s for the y component. Beyond this initial phase, however, the controller gradually stabilized, closely approaching the desired path while exhibiting a slight delay and a minor error. The PID controller demonstrated acceptable performance in handling disturbances. It effectively managed the impact of disturbances on the y component but exhibited a delayed reaction in addressing deviations in the x component. It is worth noting the consistent presence of a delay in trajectory tracking, as well as a minor error that persisted along the trajectory. Regarding the altitude and heading components, the PID controller's performance was notably satisfactory both in the presence and absence of disturbances. Despite these observations, the PID controller's handling of disturbances remained acceptable overall.

Comparison
After evaluating the PID, fractional-order PID (FOPID), and sliding mode controllers for trajectory tracking, distinct performance characteristics came to light. The PID controller displayed effective disturbance handling, although with considerable delays and overshoots. The fractional-order PID controller exhibited remarkable performance in rejecting disturbances; however, some error remained.
In contrast, the sliding mode controller excelled comprehensively: it not only achieved accurate trajectory tracking with minimal delays but also robustly handled disturbances, maintained minimal errors, achieved quick settling times, and demonstrated negligible overshoot. Taking these attributes into account, the sliding mode controller emerges as the most efficient choice, showcasing exceptional performance and suitability for precise and effective trajectory tracking across many scenarios.
This section has provided a comprehensive evaluation of three controllers: PID, fractional-order PID, and sliding mode.
The PID controller demonstrated reliability and adaptability, yet with delays and overshoots.The fractional-order PID exhibited potential and robustness, but there were still slight errors present, along with initial instability.In contrast, the sliding mode controller stood out with precise trajectory tracking, robust disturbance handling, rapid settling times, and minimal overshoot.
This exceptional performance positions the sliding mode controller as the preferred choice for real-world applications requiring accurate trajectory tracking, and it offers valuable insights for practical implementation.
When controlling a quadcopter, the PID, FOPID, and sliding mode controllers (SMCs) offer different advantages and challenges. The PID controller is simple and widely used but struggles with nonlinearities and disturbances. The FOPID controller, using fractional calculus, provides better flexibility and robustness but is complex to design and computationally intensive. The SMC excels in robustness and in handling nonlinearities, making it well suited for quadcopters, but it suffers from implementation complexity and chattering issues. The choice of control method hinges on balancing performance requirements, system complexity, and implementation feasibility.
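As a concrete illustration of the simplest of the three methods, a single-axis discrete-time PID update can be sketched as follows. This is a minimal sketch: the gains, sampling time, and toy altitude dynamics are illustrative and are not the tuned values reported in the tables.

```python
class PID:
    """Minimal discrete-time PID controller (illustrative gains, not the tuned values)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0      # accumulated integral of the error
        self.prev_error = 0.0    # error at the previous step

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                  # rectangular integration
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Example: regulating a toy altitude z toward 5 m with crude first-order damping
pid = PID(kp=2.0, ki=0.5, kd=0.8, dt=0.01)
z, vz = 0.0, 0.0
for _ in range(2000):            # 20 s of simulated time
    u = pid.step(5.0, z)
    vz += (u - 0.5 * vz) * 0.01  # toy dynamics: control input minus damping
    z += vz * 0.01
```

In practice the derivative term is often filtered or computed on the measurement rather than the error to avoid the initial "derivative kick" visible in this sketch.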

Enhancing Quadcopter Trajectory Tracking through Dyna-Q Learning
Trajectory planning and obstacle avoidance are important for quadcopter navigation, ensuring safe and efficient paths in complex environments. By employing reinforcement learning, particularly the Dyna-Q approach, quadcopters can enhance their decision-making and adapt their flight trajectories. This combination of strategic path planning and adaptive obstacle avoidance, aided by advanced machine learning, allows quadcopters to optimize their operations, prevent collisions, and maintain stability while dynamically adjusting to their surroundings and achieving mission objectives [27-31].

Reinforcement Learning Approaches
Reinforcement learning is a type of machine learning approach where an agent learns to make decisions by interacting with an environment [12,32]. Figure 23 shows the reinforcement learning block diagram.

The agent learns to take actions that maximize cumulative rewards over time, adjusting its behavior based on the reward received from the environment, which indicates how good or bad the actions were. This feedback loop helps the agent learn optimal strategies for achieving specific goals, making reinforcement learning well suited for tasks that involve sequential decision-making in dynamic environments [12,32].
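The agent-environment feedback loop described above can be sketched generically. The environment here is a hypothetical one-dimensional grid, used only to show the structure of the loop; the reward values and placeholder random policy are illustrative.

```python
import random

random.seed(0)

class GridEnv:
    """Hypothetical 1-D grid: the agent starts at 0 and the goal is at position 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):                       # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else -0.1  # reward signals progress
        done = self.state == 4
        return self.state, reward, done


env = GridEnv()
total_reward, done = 0.0, False
while not done:                        # the agent-environment feedback loop
    action = random.choice([-1, 1])    # a placeholder policy; RL replaces this
    state, reward, done = env.step(action)
    total_reward += reward
```

A learning agent would use the observed rewards to improve the action-selection rule instead of choosing randomly, which is exactly what the Q-learning and Dyna-Q methods below do.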

Reinforcement learning approaches consist of model-free and model-based methods. The model-based approach involves learning the model or being provided with it, while the model-free approach includes policy optimization and Q-learning techniques. Dyna-Q learning combines model learning and Q-learning to optimize the learning process effectively. In reinforcement learning, the Markov Decision Process (MDP) is used to model the interactions between an agent and the environment, helping the agent maximize cumulative rewards in uncertain environments [12,27,29,32,33]. MDPs aim to determine policies that guide the agent's actions:
- A deterministic policy specifies a single action for each state: for every state s there is a clear action choice π : S → A that the agent follows.
- A stochastic policy assigns a distribution over actions to each state, π : S → P(A), where the agent decides actions based on probabilities for each state s. This way, the agent can choose different actions in a state, with each option having its own chance of being selected.
Environments can also be deterministic (outcomes of actions are predictable) or stochastic (outcomes of actions are probabilistic and uncertain). Dyna-Q learning effectively navigates both deterministic and stochastic environments by optimizing the agent's decisions.
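The distinction between deterministic and stochastic policies can be made concrete with a small sketch. The states, actions, and probabilities below are illustrative, not values from this study.

```python
import random

random.seed(1)
actions = ["left", "right"]

# Deterministic policy pi : S -> A, one fixed action per state
det_policy = {"s0": "right", "s1": "left"}

# Stochastic policy pi : S -> P(A), a probability distribution over actions per state
sto_policy = {"s0": {"left": 0.2, "right": 0.8},
              "s1": {"left": 0.6, "right": 0.4}}

def act(policy, s, stochastic=False):
    """Return an action for state s under the given policy."""
    if not stochastic:
        return policy[s]                     # always the same action
    dist = policy[s]
    return random.choices(list(dist), weights=list(dist.values()))[0]

a_det = act(det_policy, "s0")                    # always "right"
a_sto = act(sto_policy, "s0", stochastic=True)   # sampled from the distribution
```

The same sampling idea applies to stochastic environments: the next state is drawn from a distribution P(s′ | s, a) rather than being fixed.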

Q-Learning Algorithms
Q-learning is a useful technique for improving how quadcopters are controlled. Researchers have tried different versions of Q-learning to address issues in quadcopter control, such as sparse rewards or complicated situations. By using Q-learning, quadcopters can quickly adapt their behavior, deal with different situations, and perform tasks on their own, which has the potential to enable these aerial robots to perform more advanced tasks in the future. The Q-learning technique leads the quadcopter to the development of a value function that helps it make smart decisions based on the rewards it expects to receive [28,30,33-35].
The Q-value Qπ(s, a) estimates the expected reward starting from state s, taking action a, and following policy π, while the V-function Vπ(s) estimates the cumulative reward starting from state s under policy π. The optimal policy π* maximizes the expected cumulative rewards. The agent faces the exploration-exploitation dilemma, where it must balance trying new actions (exploration) to learn about the environment and selecting known actions (exploitation) to maximize rewards. The epsilon-greedy strategy manages this balance by occasionally choosing random actions to explore while mostly selecting the best-known actions. This process continuously updates the Q-values, guiding the agent towards optimal decisions.
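The epsilon-greedy rule described above can be written down directly. The Q-values and the value of ε below are illustrative.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                       # exploration
    return max(range(len(q_row)), key=lambda a: q_row[a])      # exploitation

random.seed(0)
q_values = [0.1, 0.9, 0.3]     # Q(s, a) for three actions in some state s
greedy = epsilon_greedy(q_values, epsilon=0.0)   # always the best-known action
mixed = epsilon_greedy(q_values, epsilon=0.3)    # occasionally explores
```

In practice ε is often decayed over episodes so that the agent explores early and exploits its learned Q-values later.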
In the domain of reinforcement learning, the temporal difference (TD) error plays a crucial role in updating the expected return of an agent's actions as it transitions from one state to another. The TD error captures the difference between the current estimate of the Q-value of a state-action pair and the updated estimate based on observed outcomes. Mathematically, the TD error is defined as follows [12,28]:

δ = r + γ max_a′ Q(s′, a′) − Q(s, a)

The temporal difference update equation for Q-learning, a model-free reinforcement learning algorithm, updates the Q-values based on the observed rewards and transitions as follows [15,34]:

Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)]

where α is the learning rate:
- High learning rate (α near 1): the agent is highly responsive to the most recent experiences.
- Low learning rate (α near 0): the agent is less responsive to new experiences and relies more on existing knowledge.
Figure 24 shows the Q-learning algorithm block diagram.
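A single tabular Q-learning step, implementing the update rule Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)], can be sketched as follows. The Q-table contents and transition values are illustrative.

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One temporal-difference update of the Q-table; returns the TD error."""
    td_target = r + gamma * max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    td_error = td_target - Q[s][a]           # the TD error delta
    Q[s][a] += alpha * td_error
    return td_error

Q = [[0.0, 0.0], [1.0, 2.0]]   # two states, two actions (illustrative values)
delta = q_update(Q, s=0, a=0, r=1.0, s_next=1)
# td_target = 1.0 + 0.9 * 2.0 = 2.8, so delta = 2.8 and Q[0][0] becomes 1.4
```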


Implementation of Dyna-Q Learning for Trajectory Planning
Reinforcement learning begins with an agent's Q-table containing initial values. The agent explores and refines Q-values through interactions, guiding actions toward higher rewards. Dyna-Q learning combines real experiences with simulations, helping the agent navigate complex environments efficiently by learning to avoid obstacles and achieve goals [10,12]. It maintains a Q-table and uses an environment model to accelerate learning. This approach strikes a balance between real and simulated experiences [15,16,34-36].
In reinforcement learning for optimal policy derivation, the Bellman equation for the state-value function V(s) is defined as follows:

V(s) = max_a Σ_s′ P(s′ | s, a) [R(s, a, s′) + γ V(s′)]

This leads to the Bellman equation for the action-value function:

Q(s, a) = Σ_s′ P(s′ | s, a) [R(s, a, s′) + γ max_a′ Q(s′, a′)]

To create a transition matrix, T[s, a, s′] = P(s′ | s, a), the agent's possible states and actions are enumerated and transition probabilities are assigned. The agent randomly selects an action while in a random state, transitions to the next state according to the transition matrix, measures the reward according to the reward function, and updates the Q-values. This process is repeated in each of the m planning steps.
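The Dyna-Q procedure described above (a real environment step, a Q-update, model learning, then m simulated planning updates) can be sketched as follows. The tiny deterministic chain environment and all parameter values are illustrative, not the ones listed in Table 6; optimistic initial Q-values are used here to drive exploration.

```python
import random

random.seed(0)
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
alpha, gamma, epsilon, m = 0.5, 0.9, 0.1, 10

# Optimistic initial Q-values encourage systematic exploration early on
Q = [[1.0] * N_ACTIONS for _ in range(N_STATES)]
model = {}                        # learned model: (s, a) -> (r, s', done)

def env_step(s, a):
    """Toy deterministic chain: action 1 moves right, action 0 moves left."""
    s2 = min(GOAL, s + 1) if a == 1 else max(0, s - 1)
    return (1.0, s2, True) if s2 == GOAL else (0.0, s2, False)

def q_update(s, a, r, s2, done):
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])

for _ in range(50):               # episodes
    s, steps, done = 0, 0, False
    while not done and steps < 1000:
        if random.random() < epsilon:                         # epsilon-greedy
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        r, s2, done = env_step(s, a)
        q_update(s, a, r, s2, done)       # direct RL from the real transition
        model[(s, a)] = (r, s2, done)     # model learning
        for _ in range(m):                # planning: replay simulated experience
            ps, pa = random.choice(list(model))
            pr, ps2, pdone = model[(ps, pa)]
            q_update(ps, pa, pr, ps2, pdone)
        s, steps = s2, steps + 1

# Greedy policy extracted from the learned Q-table
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

The m planning updates reuse stored transitions, which is why Dyna-Q typically needs far fewer real environment steps than plain Q-learning to converge.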
Figure 25 illustrates the Dyna-Q learning algorithm block diagram.
The TD (temporal difference) error for a state-action pair (s, a) in Dyna-Q takes the same form as in Q-learning,

δ = r + γ max_a′ Q(s′, a′) − Q(s, a)

except that, during planning steps, the reward r and next state s′ are produced by the learned model rather than by the real environment.
-A lower exploration rate ε is better for obstacle avoidance during the initial stages of learning.

Results
Table 6 summarizes the chosen values of the parameters used in the improved Dyna-Q learning algorithm. Real-world environments often have some level of uncertainty, and incorporating stochastic elements in the learning process can be beneficial to train agents that perform well in uncertain and dynamic scenarios. Ultimately, the choice between deterministic and stochastic environments depends on the application and the learning goals.
The total reward and steps in the deterministic environment are shown in Figure 26. The total reward and steps in the stochastic environment are shown in Figure 28. The trajectory path planning is represented in 2D and 3D spaces in Figure 29. The stochastic environment appears to yield a more direct path from the starting point to the goal while avoiding obstacles effectively. This may be because the stochastic environment encourages the agent to explore and find efficient paths rather than taking a roundabout route. If this behavior aligns with the goals of the real-world scenario being simulated, the stochastic environment may be more appropriate.

Dyna-Q Learning with Sliding Mode Controller
After using the path determined by Dyna-Q learning as the planned path and employing a sliding mode controller to manage the quadcopter's movements, the resulting flight path can be observed in both 3D and 2D spaces in Figure 30. Table 7 displays the sliding mode gains for positions x, y, and z and orientations roll, pitch, and yaw for the trajectory obtained from Dyna-Q learning. The results shown in Figure 31 were obtained. Figure 33 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Discussion
The results obtained from the integration of the sliding mode controller with the Dyna-Q learning-based obstacle avoidance system were indeed remarkable. The Dyna-Q learning agent demonstrated impressive performance in learning optimal collision-free paths for the quadcopter in complex environments. Through iterations of simulation and interaction with various obstacles, the agent effectively produced a set of trajectories that enabled safe and efficient navigation. Employing these learned trajectories as the desired path for the quadcopter, the sliding mode controller showcased its ability to accurately track the trajectory, maintaining a high level of precision.
The sliding mode controller offers an excellent solution for trajectory planning and obstacle avoidance due to its robustness, adaptability, and ability to manage constraints. Its inherent capability to swiftly adjust control inputs to follow optimized trajectories, as demonstrated by its integration with Dyna-Q learning for obstacle avoidance, makes it a prime choice for achieving high-performance autonomous navigation. The SMC's capacity to ensure accurate and safe trajectory tracking, even in the presence of disturbances and uncertainties, makes it a key choice for enhancing the capabilities of autonomous systems operating in complex and obstacle-rich environments.
Finally, the integration of Dyna-Q learning and the sliding mode controller presents a powerful solution for obstacle avoidance in autonomous quadcopter navigation. This approach, driven by data-driven reinforcement learning and robust real-time control, demonstrates remarkable efficiency, safety, and adaptability. By enabling quadcopters to learn optimal trajectories and respond swiftly to dynamic environments, it holds great promise for enhancing quadcopter autonomy and successfully navigating intricate spaces.

Conclusions
The objective of this study was to formulate a precise mathematical model for a quadcopter and to devise three distinct control strategies: linear PID, nonlinear fractional-order PID, and sliding mode controllers, with the aim of stabilizing the quadcopter's behavior. Through extensive simulations and precise refinement of these controllers, important results were obtained. Remarkably, the fractional-order PID controller demonstrated superior performance compared to the conventional PID controller in accurately tracking flight paths that exhibited dynamic changes. Additionally, the sliding mode controller showcased exceptional proficiency in handling the complexities of nonlinear dynamics and external disturbances. The use of the sliding mode controller extended to trajectory planning. Notably, through the integration of Dyna-Q learning, the quadcopter's navigational capabilities were augmented, setting the stage for enhanced autonomous navigation. Since the sliding mode controller (SMC) offered the best results among the three controllers in handling nonlinearity, it was chosen to be combined with Dyna-Q learning for quadcopter trajectory planning. Together, the SMC's precise control and Dyna-Q's adaptive planning create a powerful system for achieving efficient and reliable trajectory planning in dynamic environments.
In essence, this study successfully accomplished its dual objectives by formulating an adept mathematical model for the quadcopter and introducing three controllers. The fractional-order PID controller emerged as a frontrunner in adapting to varying flight paths, while the sliding mode controller excelled in managing complexities. This study's insights contribute to the advancement of both quadcopter control methodologies and trajectory planning, with potential applications in autonomous aerial navigation.

Figure 1. Movement of a quadcopter: body frame and inertial frame.

Figure 2. Desired and obtained trajectory simulation using the PID controller for the linear model. Figure 3 illustrates the control inputs.

Figure 3. Control inputs of the PID controller simulation for the linear model.

Figure 4 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 4. Errors in positions and orientations for the linear model using the PID controller.

Figure 5. Desired and obtained trajectory simulation using the PID controller for the nonlinear model without disturbance. Figure 6 illustrates the control inputs from the PID controller without disturbance.

Figure 6. Control inputs of the PID controller simulation for the nonlinear model without disturbance.

Figure 7 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time. Disturbance was introduced to the closed-loop system in the interval t = {10 s–13 s}. The results are shown in Figure 8. Figure 9 illustrates the control inputs from the PID controller with a disturbance. Figure 10 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 7. Errors in positions and orientations in the nonlinear model using the PID controller without disturbance.

Figure 8. Desired and obtained trajectory simulation using the PID controller for the nonlinear model with disturbance.

Figure 9. Control inputs of the PID controller simulation for the nonlinear model with disturbance.

Automation 2024, 5, FOR PEER REVIEW

Figure 10. Errors in positions and orientations for the nonlinear model using the PID controller with disturbance.

Figure 11. Desired and obtained trajectory simulation using the FOPID controller without disturbance.

Figure 12. Control inputs of the FOPID controller simulation for the nonlinear model without disturbance.

Figure 13 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 13. Errors in positions and orientations for the nonlinear model using the FOPID controller without disturbance.

Figure 14. Desired and obtained trajectory simulation using the FOPID controller with disturbance. Figure 15 depicts the control inputs from the FOPID controller with disturbance.

Figure 15. Control inputs of the FOPID controller simulation for the nonlinear model with disturbance.

Figure 16 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 16. Errors in positions and orientations for the nonlinear model using the FOPID controller with disturbance.

Figure 17. Desired and obtained trajectory simulation using the SMC without disturbance. Figure 18 depicts the control inputs from a sliding mode controller without disturbance.

Figure 18. Control inputs of the SMC simulation for the nonlinear model without disturbance.

Figure 19 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 19. Errors in positions and orientations for the nonlinear model using the SMC without disturbance.

Figure 20. Desired and obtained trajectory using the SMC with disturbance. Figure 21 depicts the control inputs from the SMC with the presence of disturbance. Figure 22 displays how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 21. Control inputs of the SMC simulation for the nonlinear model with disturbance.

Figure 22. Errors in positions and orientations for the nonlinear model using the SMC with disturbance.

Figure 26. Total reward and total steps per episode for the deterministic environment. The trajectory path planning is represented in Figure 27 in 2D and 3D spaces.

Figure 27. The 2D and 3D deterministic environment path planning (the red squares: obstacles; the blue line: the optimal path).

Figure 28. Total reward and total steps per episode for the stochastic environment.

Figure 29. The 2D and 3D stochastic environment path planning (the red squares: obstacles; the blue line: the optimal path).

Figure 30. The 2D and 3D quadcopter trajectory using the sliding mode controller (the red squares: obstacles; the blue line: the optimal path).

Figure 31. Desired and obtained trajectory using the SM controller. Figure 32 depicts the control inputs from the sliding mode controller.

Figure 32. Control inputs of the sliding mode controller simulation.

Figure 33 shows how the errors of positions x, y, and z and orientations roll, pitch, and yaw change with respect to time.

Figure 33. Errors in positions and orientations using the sliding mode controller.

Table 1 summarizes all possible configurations for the types of classical PID controllers, which are determined by the values of µ and λ.

Table 1. Extended types of classical PID controllers.

Table 2. PID controller gains, settling time, and overshoot for the linear model. Table 3 displays the PID gains for positions x, y, and z and orientations roll, pitch, and yaw in the nonlinear model.

Table 3. PID controller gains for the nonlinear model.

Table 4. FOPID controller gains for the nonlinear model.

Table 5. The SMC gains for the nonlinear model.

Table 6. The values of the parameters used in the improved Dyna-Q learning algorithm.

Table 7. SM gains for the resulting path from Dyna-Q learning.