Article

Performance Enhancement of Wireless BLDC Motor Using Adaptive Reinforcement Learning for Sustainable Pumping Applications

by Richard Pravin Antony 1, Pongiannan Rakkiya Goundar Komarasamy 2,*, Moustafa Ahmed Ibrahim 3,*, Abdulaziz Alanazi 4 and Narayanamoorthi Rajamanickam 1

1 Department of Electrical and Electronics Engineering, SRM Institute of Science and Technology, Kattankulathur 603 203, India
2 Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur 603 203, India
3 Electrical Engineering Department, University of Business and Technology, Jeddah 23435, Saudi Arabia
4 Department of Electrical Engineering, College of Engineering, Northern Border University, Arar 73222, Saudi Arabia
* Authors to whom correspondence should be addressed.
Sustainability 2025, 17(23), 10881; https://doi.org/10.3390/su172310881
Submission received: 17 September 2025 / Revised: 30 October 2025 / Accepted: 4 November 2025 / Published: 4 December 2025

Abstract

This paper presents an adaptive reinforcement learning (RL)-based control strategy for a wireless power transfer (WPT)-fed brushless DC (BLDC) motor drive, aimed at enhancing efficiency in industrial applications. Conventional control methods for BLDC motors often result in higher energy consumption and increased torque ripple under dynamic load and voltage variations. To address this, an adaptive RL framework is implemented with pulse density modulation (PDM), enabling the controller to adjust motor speed, torque, and input power in real time. The system is modeled and tested for a 48 V, 1 HP BLDC motor, powered through a 1.1 kW WPT system. Training is carried out across 10 learning episodes with varying load torque and speed demands, allowing the RL agent to adaptively minimize losses while maintaining performance. Results indicate a significant reduction in torque ripple to a minimum of 0.20 Nm, stable speed regulation within ±30 rpm, and improved power utilization compared to existing controllers. The integration of RL with WPT provides a robust, contactless, and energy-efficient solution that is suitable for sustainable industrial motor-pump applications.

1. Introduction

Brushless DC (BLDC) motors are a significant advancement in motor technology, offering efficiency, durability, and precision [1]. They eliminate brushes and commutators, improving reliability, minimizing wear and tear, and reducing maintenance. BLDC motors are crucial in industrial and wireless applications, providing high torque with minimal energy consumption and sustainable operation [2]. They are used in robotics, electric vehicles, and industrial drives. BLDC motors are ideal for wireless applications in remote-operated vehicles, drones, and medical devices due to their compact, lightweight design. Their brushless design minimizes electromagnetic interference, ensuring reliable performance [3]. The incorporation of a wireless control system allows real-time monitoring and adaptive adjustments, enhancing functionality in IoT-enabled devices and remote applications [4].
Figure 1 shows the block diagram of wireless power transfer and control signals for a BLDC motor in wireless applications. The process begins with the high-frequency inverter, which converts the DC input into the high-frequency AC signal required by the WPT system and feeds it to the WPT transmitter compensator and transmitter coil. Through inductive coupling, power is wirelessly transferred across the receiver coil and tuned by the WPT receiver compensator to maintain resonance and minimize losses. The received AC power is then rectified using the rectifier and supplied to the BLDC switching inverter, which generates the required three-phase excitation for the BLDC motor. To achieve closed-loop control, sensors continuously measure motor parameters such as voltage, current, torque, rotor position, and speed, transmitting this feedback to the controller. The signal processing and data communication ensure reliable data exchange between the motor side and the microcontroller unit (MCU). This wireless framework ensures seamless power transfer, where WPT control is performed on the transmitter side, while motor control remains critical on the receiver side.
Wireless communication has expanded the use of BLDC motors. WPT systems generally employ two control types: unilateral and bilateral communication [5]. Bilateral wireless communication in power transfer systems enables mutual data exchange between the transmitter and receiver, allowing for adaptive control and efficiency optimization. Unlike unilateral control, which operates solely from the transmitter side and benefits from a lightweight receiver design and reduced latency [6], bilateral control provides feedback on the receiver side's conditions, enhancing stability, power regulation, and safety in dynamic operating environments [7]. Maintaining consistent torque and speed when the load changes unexpectedly is difficult, and failure to do so can lead to instability, compromising the system's reliability [8,9]. Traditional control techniques, such as the proportional–integral–derivative (PID) controller, often struggle to adapt to dynamic loads in real time, resulting in reduced performance under variable conditions [10]. Adaptive control strategies that dynamically adjust motor parameters in real time are therefore needed to ensure consistent and reliable wireless operation, along with sophisticated motor control algorithms that dynamically adapt to load variations while maintaining optimal performance [11,12].
Reinforcement learning (RL) is a subset of artificial intelligence that can improve motor control by enabling real-time adjustments and enhanced performance [13]. RL can be integrated into BLDC motor control to boost efficiency and prolong the motor's operational life [14]. RL can also enhance motor performance through adaptive control, utilizing deep learning for advanced decision-making under uncertain load conditions [15,16]. Energy-efficient control approaches using RL can adjust to changes in load while maintaining optimal performance [17]. These strategies have been successfully applied in automated manufacturing, robotics, and electric vehicles [18]. In wireless technology, researchers have investigated RL-based control of BLDC motors for drones, medical instruments, and IoT devices [19]. Real-time implementation and operational resilience in RL-based motor control remain challenging, requiring further study to reduce computing complexity and refine algorithms for practical motor-pump applications [20]. A key limitation of existing RL applications in motor control, particularly in terms of reward function design and system adaptability, is that conventional RL-based controllers rely on static or single-objective reward functions.
This article presents a novel approach to controlling a BLDC motor pump, as follows:
  • This research uniquely integrates reinforcement learning, wireless power transfer, and pulse density modulation to provide an adaptive control framework for BLDC motors, ensuring efficient and contactless operation in industrial environments;
  • This research develops a novel hybrid reward function that balances energy efficiency, torque ripple reduction, and motor stability, ensuring self-learning and continuous adaptation without manual tuning;
  • The proposed 48 V, 1 HP BLDC motor drive, powered through a 1.1 kW WPT system, validates the novelty of combining an RL-driven PDM with WPT for sustainable industrial processes, such as pumping and automation, where energy savings and reliability are critical.
The structure of this article is outlined as follows. The Introduction highlights the challenges of BLDC motors under dynamic load conditions and the motivation for adaptive reinforcement learning-based control. Section 2 presents the system model and the mathematical modeling of the motor's dynamic loading conditions. Section 3 elaborates on the proposed adaptive RL-based control architecture, including the reinforcement learning framework, reward function design, and wireless communication integration. Section 4 describes the simulation and experimental setup, covering the hardware, software, and testing methodology used to demonstrate the enhanced control of the BLDC motor in practical cases, followed by Section 5, which analyzes performance improvements over conventional methods. Finally, the Conclusion summarizes the key findings and suggests future enhancements.

2. System Description and Modeling of the BLDC Motor Pump System

The system model corresponding to an adaptive reinforcement learning-based control strategy to improve the operating efficiency of a BLDC motor under changing load conditions, particularly in pump applications, is depicted in Figure 2. The system comprises two major parts: the environment and the control unit. In the environment, the BLDC motor drives the pump with power from a WPT system, which converts the direct current input to an alternating current output. The inverters on both the transmitter and receiver sides are controlled by PDM signals [21]. Current and speed sensors systematically monitor the motor parameters, such as current, speed, and load oscillations. In Figure 2, long-dash arrows denote measurements, dotted arrows denote control signals, and red arrows denote pulse signals. These sensor readings are crucial for real-time adjustments to maintain efficiency and stability. Motor performance is influenced by load variations due to the pumping operation and, therefore, a sophisticated control mechanism is required to provide steady speed control and low torque ripple [22].
The control unit includes a reinforcement learning agent, an MCU, and a signal processing unit with a data communication module. The RL agent is trained on optimal control strategies through a reward-based mechanism, and it tunes the motor control parameters in real time to maximize efficiency. The MCU handles sensor inputs and generates PDM signals to drive the inverter switching sequence, enabling adaptive and efficient motor control. The signal processing module facilitates communication between sensors, the MCU, and the RL agent, with real-time data exchange and decision-making.
Wireless communication is necessary to enable remote supervision, regulation, and adjustment of the motor parameters. The system allows real-time communication of sensor signals integrated into the motor mechanism, such as load current, speed, and temperature. The wireless link conveys this data to the control unit, where it is assessed and used to adjust the operation of the motor. The communication protocols used in these systems are structured for minimal latency and high reliability, enabling the motor to dynamically adjust to changes in load or operating conditions. During pumping operation, live operating data from the motor is transmitted wirelessly to the control unit, which adjusts the motor parameters according to changing loads so that the system remains in peak operating condition without direct physical intervention. The use of wireless communication in BLDC motor systems increases their versatility, making them applicable in complex environments where real-time feedback and adjustment are required to ensure efficiency and performance.

Mathematical Modeling of Dynamic Load Conditions of the Motor in the Environment

In the analysis of the BLDC motor, the dynamic loading conditions must be taken into account, as they affect the performance of the motor. The load conditions are not static and have a significant impact on the motor's operation. To capture this effect and design appropriate control schemes, a set of mathematical equations is used. These equations model the motor's electrical and mechanical parts and the effect of the load dynamics. The electrical behavior of the BLDC motor is described by Equation (1)
$$V_a = L\frac{di}{dt} + iR + E_b$$
where Va is the voltage applied, L is the inductance, R is the resistance, i is the flow of the current, and Eb is the back electromotive force (EMF), a highly important parameter which is proportional to the speed of the motor. Back EMF can be defined mathematically as in Equation (2)
$$E_b = k_e \omega$$
where ke is the back EMF constant, playing a large role in the motor electrical dynamics, and ω is the angular velocity of the motor. The relationship between the electrical input that is given to the motor and the mechanical output that is generated by the motor is of the highest importance when it comes to the accurate modeling of the motor’s behavior, especially under loaded conditions. On the mechanical side of operations, the motion equation is primarily decided by the motor dynamics, as well as the load being driven by the motor. The rotational dynamics that are being modeled are written as follows in Equation (3)
$$J\frac{d\omega}{dt} = T_m - T_L - B\omega$$
Here, J is the moment of inertia of the motor, Tm is the motor torque, TL is the load torque, and B is the coefficient of damping. The motor torque Tm can be written in terms of current, as (4)
$$T_m = k_t i$$
where kt is the torque constant and i is the current. The load torque TL, which varies over time due to external disturbances or varying load conditions, is described by Equation (5)
$$T_L = J_L \frac{d\omega_L}{dt} + B_L \omega_L$$
In this, JL is the load inertia, ωL is the load angular velocity, and BL is the load damping coefficient. The load inertia, JL, may vary with time, which can be modeled as in (6)
$$J_L = J_{L0} + \Delta J_L(t)$$
where JL0 is the nominal load inertia and ΔJL(t) represents the time-varying change in the load inertia. The control system needs to respond to these dynamic conditions. The control input u(t), which is tasked with regulating the voltage supplied to the motor, has been designed in such a way that the level of performance is achieved at all times, even when the load it is subjected to fluctuates and changes. This input is mathematically expressed by Equation (7)
$$u(t) = V_{max}\left(1 - \frac{T_m}{T_L}\right)$$
where Vmax is the maximum voltage that can be used without the risk of malfunctions. The correspondence between the angular velocity of the motor—that is, the rate at which the motor turns—and the voltage supplied to the motor must also be considered in the context of real-time motor control, as in (8)
$$\omega = \frac{V_a - E_b}{R}$$
In order to effectively optimize the operation of a motor while it is undergoing dynamic load conditions, it is important that energy consumption is accurately tracked over time. This energy consumption is actually modeled using the equation presented in (9), which captures the interaction between the various factors affecting the energy consumption.
$$E(t) = \int_0^t V_a \, i \, dt$$
where E(t) is used to represent the energy being consumed by the motor at a particular moment in time, t. Moreover, adaptive control mechanisms can be strategically employed to make the necessary adjustments to the motor parameters in accordance with the feedback received in real time from their operating conditions. The adjustment of these control parameters is completely captured by the equation presented in (10)
$$\Delta k = -\gamma \frac{\partial L}{\partial k}$$
where Δk is the change that takes place in the control parameters, L is the loss function to be minimized, and γ is the learning rate, which determines how quickly the parameters are updated. Motor speed (ω), torque (T), and load inertia (J) were considered as alternative control parameters; while they provide valuable insights, they do not directly influence the learning dynamics as effectively as L and γ. The torque–current relationship can also be represented as in Equation (11)
$$T = k_i i$$
where ki is the current–torque constant. The power required by the load is another important parameter that has a strong influence on the motor’s efficiency and performance, and it is given by Equation (12)
$$P_L = T_L \omega_L$$
Lastly, the measurement of the load in real time is of extreme importance for the efficient operation of adaptive control systems. The torque of the load, an important parameter, can be approximated by applying a smoothing technique, as given in Equation (13)
$$T'_L(t) = \alpha \, T_L(t) + (1 - \alpha) \, T'_L(t-1)$$
where T’L(t) represents the estimated load torque and α is a smoothing factor; this specific factor is vital in determining the rate at which the load estimation reacts and changes to the new measurements taken. These equations together constitute a specific and complex arrangement, consisting of both the motor dynamics and the load dynamics. This is of great importance for the creation of sophisticated control algorithms, which enable the BLDC motor to function efficiently, especially under dynamic load conditions. This is vital in ensuring that the best performance is always attained under real-time applications that rely on efficiency.
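The electrical and mechanical relations in Equations (1)–(5) can be combined into a small forward-Euler simulation. The sketch below is purely illustrative: all motor constants are assumed values chosen for numerical stability, not the parameters of the 48 V, 1 HP machine studied here.

```python
# Forward-Euler sketch of the BLDC model in Equations (1)-(5).
# All constants below are illustrative assumptions, not the paper's motor data.

def simulate_bldc(steps=5000, dt=1e-4,
                  L=0.5e-3, R=0.5, ke=0.05, kt=0.05,   # electrical constants (assumed)
                  J=1e-3, B=1e-4,                      # mechanical constants (assumed)
                  Va=48.0, TL=0.2):                    # applied voltage, load torque
    """Return (current, angular velocity) after `steps` Euler updates."""
    i, w = 0.0, 0.0
    for _ in range(steps):
        Eb = ke * w                      # Eq. (2): back EMF
        di = (Va - i * R - Eb) / L       # Eq. (1): V_a = L di/dt + iR + E_b
        Tm = kt * i                      # Eq. (4): motor torque
        dw = (Tm - TL - B * w) / J       # Eq. (3): J dω/dt = T_m - T_L - Bω
        i += di * dt
        w += dw * dt
    return i, w

i_ss, w_ss = simulate_bldc()
```

With these assumed constants the speed settles toward the equilibrium where motor torque balances load and damping torque; the same loop structure extends naturally to the time-varying load inertia of Equation (6).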

3. Proposed Adaptive RL-Based Control Architecture

The proposed adaptive reinforcement learning-based control structure is developed with the aim of maximizing the overall performance of the BLDC motor system, especially when operated under dynamic load conditions. Optimization is realized in this process by leveraging the outstanding adaptability provided by RL algorithms in an effective manner. In this architectural framework, a feedback loop is harmoniously integrated with an RL agent that is capable of continuous learning and real-time adaptation of the motor control parameters from the performance data obtained during operation, as illustrated in Figure 3. The RL agent operates in an environment in which there is the BLDC motor as well as the enabling wireless communication infrastructure. Its main goal is to reduce control errors while simultaneously ensuring energy efficiency and system stability. For enabling this learning process, the system utilizes a reward function that directs the RL agent, stimulating it to discover and implement the best control actions, even if the load conditions vary.
The intrinsic quality of the RL model is characterized by the dynamic interaction of the agent and its environment, rigorously formulated as a Markov decision process. At any time instant t, the system state, st, incorporates a number of important variables, including motor speed, torque, and the present loading conditions. Accordingly, the RL agent selects an action, such as modulating the motor voltage or current, based on the existing system state. Following this action, the environment progresses to the next state, st+1, and concurrently provides a reward, rt, reflecting the performance of the chosen action. The state transitions can be formulated as in Equation (14)
$$P(s_{t+1} \mid s_t, a_t) = P\left(S_{t+1} = s_{t+1} \mid S_t = s_t,\, A_t = a_t\right)$$
where P is the probability distribution governing state transitions. The policy π(at∣st) maps states to actions and determines the agent behavior. The objective is to find an optimal policy, π∗, that maximizes the cumulative discounted reward, defined as in (15)
$$G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}$$
where γ is the discount factor that balances immediate and future rewards. The control action affects the applied voltage, Va, which is adjusted dynamically to maintain the desired motor speed and torque. This can be expressed as (16)
$$V_a = V_{max}\, \pi(a_t \mid s_t)$$
where Vmax is the maximum allowable voltage. The motor’s electrical dynamics are modeled in Section 2. The reward function, rt, is designed to minimize the control error, e, defined as the difference between the desired and actual motor performance metrics, such as speed (ωd − ωa) and torque (Td − Ta). The reward can be expressed as in (17)
$$r_t = -\left(k_1 e_\omega^2 + k_2 e_T^2\right)$$
where eω and eT are the speed and torque errors, and k1, k2 are weighting factors. To enhance learning efficiency, the Q value function Q(st,at) estimates the expected cumulative reward for a given state–action pair. The update rule for the Q value is given by the Bellman Equation in (18)
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
where α is the learning rate. The RL agent's policy is refined using an optimization algorithm, such as a policy gradient method, which updates the policy parameters, θ, as in (19)
$$\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q(s_t, a_t)$$
where ∇θ is the gradient operator. In practice, the adaptive control system integrates these equations with a neural network to approximate the policy and Q-value functions. The architecture includes sensors that monitor real-time motor parameters, a wireless communication network that transmits data to the RL agent, and actuators that implement the control actions. Figure 3 depicts the RL agent as the core component, receiving state inputs from wireless sensors and the motor system, processing the inputs to determine the optimal actions, and sending back control signals to the motor. The proposed adaptive RL framework functions in a semi-online mode. Initially, the control policy is trained offline using a limited dataset. The framework then incorporates adaptive policy updating and real-time reward adjustment during operation. This approach enables the system to refine its control actions continuously based on real-time feedback, thereby reducing data dependency and improving adaptability under varying conditions. Consequently, the proposed RL-based control scheme offers both learning efficiency and operational practicality for real-world applications. The combination of RL with wireless communication and motor dynamics modeling represents a significant development in designing an intelligent motor control system for pumping applications.
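Under the MDP formulation above, the Q-value update of Equation (18) can be sketched in a few lines of tabular Q-learning. The state and action discretizations (speed-error bins, voltage-adjustment levels) and all numeric values are assumptions for illustration, not the paper's neural-network implementation.

```python
# Toy tabular Q-learning sketch of the Bellman update in Equation (18).
# State/action discretization and all numbers are illustrative assumptions.

alpha, gamma = 0.1, 0.9            # learning rate and discount factor
states = range(5)                  # discretized speed-error bins (assumed)
actions = range(3)                 # voltage-adjustment levels (assumed)
Q = {(s, a): 0.0 for s in states for a in actions}

def q_update(s, a, r, s_next):
    """Eq. (18): Q(s,a) += α [ r + γ max_a' Q(s',a') − Q(s,a) ]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One illustrative transition: error bin 2, action 1, reward −0.5, next bin 1.
q_update(2, 1, -0.5, 1)
```

The paper's semi-online scheme would apply many such updates offline first, then continue refining the value estimates from live sensor feedback.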

Computation of Reward Function for RL to Enhance Motor Performance

The control strategy designed for optimizing motor performance relies on a sophisticated adaptive feedback mechanism. The mechanism repeatedly observes various operating parameters, such as the speed, torque, and current of the BLDC motor, and implements real-time adaptations of the control signals so that the motor stays at its optimal operating level. The proposed control strategy adopts a PID controller for baseline performance adjustment, combined with the proposed reinforcement learning model so that the system can learn efficiently and counter dynamic loads and unpredicted external perturbations. The optimization process commences with the determination of the error signal, e(t), defined as the difference between the desired and actual motor state; the resulting control action follows the PID law in Equation (20)
$$u(t) = K_p e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d \frac{de(t)}{dt}$$
where Kp, Ki, and Kd are the proportional, integral, and derivative gains of the control system, respectively. The control action u(t) regulates the input current supplied to the motor, with the goal of eliminating any deviations from the desired performance parameters. To further enhance this control technique, an RL agent is incorporated into the system, which dynamically learns from the operating environment to continuously improve decision-making. The RL control system tunes the PID gains adaptively, maximizing a reward function designed to penalize deviations from set targets as well as inefficiencies in energy use. Furthermore, the control input u(t) is updated based on the policy of the RL agent, π(at∣st), to ensure real-time adaptability and responsiveness under changing conditions.
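The PID baseline with RL-tuned gains can be sketched as a small class: the `control` method implements Equation (20), and `adjust_gains` applies a gradient-style update in the spirit of Equation (10). Gains, setpoint, and the update rate below are illustrative assumptions, not the paper's tuned values.

```python
# PID law of Equation (20) with gains exposed for adaptive (RL-driven) tuning.
# All numeric values are illustrative assumptions.

class AdaptivePID:
    def __init__(self, kp=1.0, ki=0.1, kd=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, error, dt):
        """u(t) = Kp e + Ki ∫e dτ + Kd de/dt (discrete approximation)."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

    def adjust_gains(self, grads, rate=0.01):
        """Gradient-style gain update, as in Equation (10): Δk = −γ ∂L/∂k."""
        self.kp -= rate * grads[0]
        self.ki -= rate * grads[1]
        self.kd -= rate * grads[2]

pid = AdaptivePID()
u = pid.control(error=10.0, dt=0.01)       # baseline control action
pid.adjust_gains(grads=(0.5, 0.0, 0.0))    # hypothetical loss gradient from RL agent
```

In the full system the `grads` vector would come from the RL agent's learned loss landscape rather than an analytic derivative.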
The wireless communication component adds an essential element of extensibility and flexibility to the motor control system. The communication system utilizes low-latency protocols, such as Wi-Fi, to provide real-time responsiveness. Sensor nodes mounted on the motor relay performance data, including speed ω(t), torque T(t), and temperature Θ(t), to the RL agent via a wireless gateway. This can be represented mathematically as in (21)
$$D(t) = \left\{ \omega(t),\, T(t),\, \Theta(t) \right\}$$
Here, D(t) is the data vector sent at time t. The RL agent processes this data and sends optimized control signals C(t) to the motor system in response. Secure and reliable communication is of the highest priority, and hence, cryptographic algorithms like the advanced encryption standard are used to encrypt data and prevent unauthorized access. In addition to this, wireless communication provides the benefit of system scalability, with several motors or devices being able to be controlled simultaneously within a coordinated system of a distributed network. This aspect is particularly beneficial in industrial automated pumping systems, where centralized control can result in high computational loads.
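One minimal way to realize the data vector D(t) of Equation (21) is to pack the sensor readings into a timestamped message before transmission. The field names and JSON encoding below are assumptions for illustration, since the paper does not specify the wire format, and the AES encryption layer mentioned above is omitted here for brevity.

```python
import json
import time

# Sketch of packing/unpacking the sensor data vector D(t) = {ω(t), T(t), Θ(t)}
# of Equation (21). Field names, units, and JSON framing are assumptions.

def pack_sensor_frame(speed_rpm, torque_nm, temp_c):
    """Serialize one sensor sample as a timestamped byte payload."""
    frame = {
        "t": time.time(),       # sample timestamp
        "omega": speed_rpm,     # motor speed ω(t)
        "torque": torque_nm,    # shaft torque T(t)
        "theta": temp_c,        # winding temperature Θ(t)
    }
    return json.dumps(frame).encode("utf-8")

def unpack_sensor_frame(payload):
    """Recover the sensor dictionary on the controller side."""
    return json.loads(payload.decode("utf-8"))

msg = pack_sensor_frame(1450.0, 3.2, 41.5)
data = unpack_sensor_frame(msg)
```

In a deployed system the payload would be encrypted (e.g., AES, as the paper notes) before being handed to the Wi-Fi stack.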
These mechanisms provide real-time adaptability, enabling the system to respond dynamically to changing load conditions, unforeseen environmental interference, and possible faults. The RL agent continuously learns and optimizes the control strategy. Learning is guided by a reward function specifically designed to capture the different performance objectives of the system: minimizing the control error, e(t), minimizing the energy used, E(t), and maintaining stability even under fluctuating loads. The reward function is designed as given in Equation (22)
$$r(t) = -\left( w_1 \, e(t)^2 + w_2 \, E(t) + w_3 \, \Delta T(t)^2 \right)$$
where e(t) is the instantaneous control error (difference between desired and actual speed or torque). E(t) is the real-time energy consumption, and ΔT(t) is the torque ripple, estimated as the standard deviation of torque over a cycle. w1, w2, and w3 are weighting factors that determine the trade-off between stability, efficiency, and ripple suppression. This structure penalizes high control error, energy waste, and torque fluctuations. The squared terms for e(t) and ΔT(t) ensure a stronger penalty for larger deviations, while energy is treated as a linear term to reflect the cumulative cost.
Figure 4 describes the process of calculating the reward for the RL-based motor control system. It starts by collecting motor parameters, such as speed, torque, voltage, and current. The system then computes the tracking error, energy consumption, and torque ripple. These factors are combined using weighted values to assess motor performance. The final reward guides the RL agent in optimizing control actions.
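The hybrid reward of Equation (22) can be computed directly from logged samples, with the torque ripple ΔT(t) estimated as the standard deviation of torque over a cycle, as described above. The weights below are placeholder assumptions; the paper's actual weighting factors are set during training.

```python
import statistics

# Sketch of the hybrid reward in Equation (22): penalize tracking error,
# energy use, and torque ripple. Weights are illustrative assumptions.

def compute_reward(speed_err, energy, torque_samples, w1=1.0, w2=0.01, w3=0.5):
    """r(t) = −(w1 e(t)² + w2 E(t) + w3 ΔT(t)²)."""
    ripple = statistics.pstdev(torque_samples)  # ΔT(t): std-dev of torque over a cycle
    return -(w1 * speed_err ** 2 + w2 * energy + w3 * ripple ** 2)

# One illustrative evaluation: 2 rad/s tracking error, 50 J consumed,
# four torque samples from one electrical cycle.
r = compute_reward(speed_err=2.0, energy=50.0, torque_samples=[3.0, 3.2, 2.8, 3.1])
```

The squared error and ripple terms dominate for large deviations, while the linear energy term accumulates steadily, matching the trade-off described in the text.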
High adaptability of the system is further encouraged by predictive modeling techniques, whereby the RL agent predicts upcoming states from past experience, enabling it to adapt its actions in anticipation. This predictive ability is enabled by the state-transition model presented in Equation (23)
$$s_{t+1} = f(s_t, a_t) + \eta$$
where f denotes the complex behavior of the system of interest, with η accounting for noise or other uncertainties. During real-time operation, these adaptability mechanisms ensure that the control strategy can evolve to address nonlinearities, which may manifest as variations in the electromagnetic response of the motor or as rapid, unpredictable variations in load. This dynamic readjustment minimizes wear and tear on the system while promoting overall efficiency, and it contributes significantly to prolonging the lifespan of the system, making the architecture reliable and sustainable for industrial use. The incorporation of these intelligent mechanisms results in a unified framework that integrates advanced control, wireless communication, and real-time adaptability, directed towards optimizing motor performance and reliability under real-world conditions.

4. Simulation and Experimental Setup

Implementation of the simulation and experimental setup of the proposed adaptive reinforcement learning-based control architecture requires components ranging from hardware, software, and simulation platforms to datasets, all of which are essential to successfully test its operation and performance.
Figure 5 illustrates the complete simulation and experimental configuration of the proposed adaptive RL-based WPT control system for a BLDC motor (specifications listed in Table 1) used for pumping applications (pump specifications in Table 2). The system is composed of three major sections: the high-frequency inverter with an LCL-compensated WPT link, the BLDC motor drive, and the adaptive control unit. The high-frequency inverter converts the DC input voltage VIN into a high-frequency alternating current to drive the transmitting coil of the WPT system. The inverter consists of four MOSFET switches (S1–S4) arranged in a full-bridge configuration with DC-link capacitors C1 and C2 for voltage balancing. The switching operation of the inverter is governed by PDM signals generated by the controller, which are optimized in real time through the RL agent. The WPT section employs an LCL resonant network comprising the transmitter coil LT, receiver coil LR, and compensation capacitors. The mutual inductance, M, between the transmitter and receiver coils and the WPT system specifications are given in Table 3. The resonant configuration ensures high transfer efficiency and minimizes reactive losses. The received AC power at the secondary coil is rectified by a full-bridge diode rectifier (D1–D4) and filtered by a capacitor to provide a stable DC voltage to the BLDC inverter stage.
The BLDC switching inverter consists of six MOSFET switches (S1–S6), forming a three-phase bridge that drives the BLDC motor. The inverter’s switching pulses are also controlled using PDM signals that are optimized by the adaptive RL controller to regulate torque, speed, and overall efficiency. The motor’s rotor position and speed are detected by Hall-effect sensors, while a current sensor monitors the phase current for closed-loop feedback. These measurements are transmitted through the signal processing and data communication block to the controller. The RL agent interacts continuously with the controller to adjust the switching density and compensate for environmental variations such as load torque, voltage fluctuation, or temperature change, as in Figure 4. The reward function in the RL agent aims to minimize speed error and torque ripple while maximizing overall system efficiency. The signal processing and data communication modules facilitate wireless data exchange between the transmitter-side and receiver-side controllers, ensuring coordinated control action between the power transmitter and motor drive unit. In this system, the pump load is directly coupled to the BLDC motor, and its mechanical torque is proportional to the square of the motor speed, representing a realistic pumping profile.
The system configuration involves a 48 V BLDC motor with a 750 W power rating, accompanied by various sensors that are capable of monitoring crucial parameters like speed, torque, and temperature; additional information concerning the other specifications is provided, as shown in Table 1. An STM32 microcontroller serves as the primary processing unit, interfaced with a Wi-Fi wireless communication module with a packet loss rate of 5% and communication latency of 10 ms. The microcontroller is responsible for executing the control algorithms and relaying data to the RL agent. Additionally, a power supply unit, data acquisition system, and load emulator are integrated to simulate variable load conditions. The system is housed in a controlled environment to minimize interference.
The software setup involves programming the microcontroller in embedded C and implementing the RL algorithm in Python 3.9 using the TensorFlow library. The RL model is trained offline to ensure rapid processing of large datasets. The wireless communication protocols are configured using standard APIs, and MATLAB R2024b is used as the data visualization tool for real-time monitoring of motor parameters. The simulation environment replicates real-world operating conditions for the BLDC motor to validate the control architecture’s effectiveness. MATLAB/Simulink R2024b is used to design and simulate the motor’s dynamics, incorporating electrical, mechanical, and thermal models. The simulation parameters include the load torque range, operating speed range, and input voltage variations; a detailed list of the simulation parameters used in this study is provided in Table 4, below.
Environmental disturbances, such as voltage sags, temperature changes, and load variations, are deliberately injected to assess the robustness of the system. To facilitate this testing, the reinforcement learning training environment is implemented in OpenAI Gym, where the motor dynamics are represented as a Markov decision process. During validation, the environment is deliberately altered every 250 episodes to test the agent’s adaptability, simulating the practical operating variations listed in Table 5. Here, one episode means running the motor for a fixed number of control steps, and an episode range is a continuous block of episodes grouped together for analysis and reporting. The reward function penalizes control errors and energy inefficiency while rewarding stability and adaptability.
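The 250-episode block structure can be sketched as a simple scheduler that maps an episode index to its active disturbance. The labels below summarize the ranges later reported in Section 5.1 and stand in for the actual Table 5 entries, which are not reproduced here.

```python
# Illustrative episode-range scheduler: the environment changes
# every 250 episodes. Labels paraphrase the ranges discussed in
# Section 5.1; the exact Table 5 entries differ in detail.
DISTURBANCES = [
    "nominal", "+10% load", "+15% load", "torque ripple ±3%",
    "sensor noise", "voltage sag 46 V", "-1 V + iron loss",
    "bus ripple 5%", "+25% load", "nominal",
]

def disturbance_for_episode(episode, block=250):
    """Return the disturbance active at a given episode index,
    clamping past the last range (2500 episodes total)."""
    return DISTURBANCES[min(episode // block, len(DISTURBANCES) - 1)]
```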
The simulation also accounts for communication latency and potential packet losses in the wireless network, emulating real-world conditions the system may encounter. A packet loss rate of 5% and a latency of 10 ms are deliberately introduced to stress-test the robustness of the system under such circumstances.
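A minimal sketch of how such channel impairments can be emulated is shown below, assuming a 10 ms control period so that the 10 ms latency corresponds to one control step, and assuming dropped packets are replaced by a zero-order hold; the paper does not state its exact loss-handling policy.

```python
import random

def emulate_channel(samples, loss_rate=0.05, latency_steps=1, seed=0):
    """Emulate the wireless feedback channel: every sample is delayed
    by `latency_steps` control periods and dropped with probability
    `loss_rate`. A dropped sample is replaced by the last received
    value (zero-order hold), an assumed policy for this sketch."""
    rng = random.Random(seed)
    received, last = [], 0.0
    # Prepend zeros to model the fixed transport delay.
    delayed = [0.0] * latency_steps + list(samples)
    for s in delayed[:len(samples)]:
        if rng.random() < loss_rate:
            received.append(last)      # packet lost: hold last value
        else:
            received.append(s)
            last = s
    return received
```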
The training and testing dataset for the reinforcement learning model is important in ensuring that the control strategy can adapt and respond to a wide range of changing conditions. The dataset comprises time-series data generated from both simulated configurations and experimental setups, covering parameters such as motor speed, torque, current, and temperature readings. The training set forms 70% of the data, exposing the model to a broad range of scenarios during training, while the remaining 30% is used to test the model’s performance in situations it has not encountered before. A detailed description of the dataset features can be found in Table 6.
The data also undergo a pre-processing stage that normalizes the feature values, removes noise from the dataset, and ensures overall consistency. Data augmentation techniques are employed to expand the dataset, creating additional scenarios for the RL agent to learn from. The final dataset is benchmarked using performance metrics, such as mean squared error and reward stability, to assess the model’s reliability in real-world applications. This holistic approach ensures that the implementation and experimental setup comprehensively evaluate the proposed control architecture.
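As one illustrative sketch of the normalization and chronological 70/30 split described above (the paper’s exact scaling method, noise filter, and augmentation pipeline are not specified here), per-feature min-max scaling over the whole series is assumed for simplicity:

```python
def preprocess_and_split(series, train_frac=0.7):
    """Min-max normalize each feature column to [0, 1] and split the
    time series chronologically into train/test blocks. Noise
    removal and augmentation are omitted from this sketch."""
    cols = list(zip(*series))                     # column-major view
    normed_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0                   # guard constant columns
        normed_cols.append([(v - lo) / span for v in col])
    normed = [list(row) for row in zip(*normed_cols)]
    k = int(len(normed) * train_frac)             # 70% boundary
    return normed[:k], normed[k:]
```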
The hardware prototype (Figure 6) was constructed to validate the simulation results of the proposed approach. The motor and controller parameters used in the hardware prototype were the same as those used in the simulation. On the transmitter side, an STM controller and a Wi-Fi module are coupled with an external antenna to facilitate remote monitoring and control. A laptop integrated into the system runs software for real-time data acquisition, parameter adjustment, and performance testing. Another STM controller and a single-board Wi-Fi module are integrated on the receiver side. The BLDC motor is powered by a dedicated driver circuit that translates the control signals from the STM controller into the appropriate voltage and current. A proximity sensor positioned next to the motor senses its speed and position, providing feedback for proper operation. For electrical performance testing, a mixed signal oscilloscope (MSO) with current and voltage probes measures the real-time waveforms. This equipment is mainly used to test the control algorithms, measure motor efficiency, and explore the feasibility of wireless motor control.
The boost converter operates at a duty ratio of 0.5, which provides an output voltage of 48 V for an input voltage of 24 V. The inverter circuit converts the DC voltage from the chopper to an AC voltage suitable for the BLDC motor. The inverter produces a three-level single-phase output voltage of around 48 V, and the corresponding three-phase input voltage of the BLDC motor is maintained at around 48 V under dynamic load conditions. The load is increased at 2 s, which causes a slight dip in the output voltage; the controller nevertheless maintains the voltage at a constant value.
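The converter figures quoted above follow from the ideal continuous-conduction boost relation V_out = V_in / (1 − D); a quick check, neglecting losses:

```python
def boost_output_voltage(v_in, duty):
    """Ideal CCM boost converter: Vout = Vin / (1 - D).
    Conduction and switching losses are neglected in this sketch."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty ratio must be in [0, 1)")
    return v_in / (1.0 - duty)

# 24 V input at D = 0.5 reproduces the 48 V bus reported in the text.
v_bus = boost_output_voltage(24.0, 0.5)
```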

5. Results and Discussion

This section presents a detailed analysis of the experimental and simulation outcomes obtained from the proposed RL-based BLDC motor drive system. The key electrical and mechanical responses of the motor were evaluated over 2500 episodes, divided into ten ranges, each introducing controlled environmental variations in load torque, supply voltage, or disturbance patterns. Comparative observations between theoretical predictions and measured waveforms are discussed to demonstrate the accuracy of the design model and the reliability of the hardware implementation.
In WPT-based motor drive systems, input power is a critical performance indicator, since it reflects both the transmitted energy from the primary side and the motor’s operational demand. The reinforcement learning-based control ensures that, despite such fluctuations, the motor receives adequate power without long-term instability. The WPT input delivers the necessary electrical energy to the BLDC motor, while the Hall sensor outputs (Figure 7) provide rotor position feedback for precise commutation. These signals ensure precise synchronization between the stator current and rotor movement, reducing torque ripple and enhancing performance.
The Hall sensor output shows three square-wave signals (red, yellow, and blue) shifted by 120° electrical, representing rotor position feedback. The periodic switching confirms consistent motor rotation and accurate position detection. Figure 8 illustrates the PDM switching pulses applied to the six inverter switches (S1–S6) under RL-based control. The pulses are adaptively modulated episode by episode, ensuring optimized commutation for efficient torque generation and reduced losses. Insights I, II, and III in Figure 8 show that the variations in pulse density across the switches reflect the RL agent’s response to changing operating conditions, dynamically balancing voltage, torque ripple, and speed. This adaptive switching strategy highlights the controller’s ability to fine-tune the inverter operation for improved stability and performance in each training episode.
The switching pulse for one cycle is given in Figure 9. In one electrical cycle, the six switches (S1–S6) operate in 120° conduction mode, where two switches (one upper and one lower) conduct at any instant. PDM pulses vary their density, rather than amplitude or width, to regulate power flow, enabling finer control of switching events. The RL-based PDM controller modulates pulse density within these conduction intervals to regulate torque, speed, and voltage. This results in optimized switching patterns that improve the efficiency and dynamic response of the motor drive.
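Pulse density modulation of this kind is commonly generated with a first-order sigma-delta accumulator; the sketch below illustrates the principle, though the paper does not specify the exact modulator used by the controller.

```python
def pdm_pulses(density, n_steps):
    """First-order sigma-delta PDM sketch: emit a fixed-width pulse
    (1) whenever the accumulated duty crosses 1, otherwise emit 0.
    Over many steps the fraction of 1s approaches `density`, so power
    is regulated by pulse density, not amplitude or width."""
    acc, pulses = 0.0, []
    for _ in range(n_steps):
        acc += density
        if acc >= 1.0:
            pulses.append(1)   # deliver one fixed-width pulse
            acc -= 1.0         # carry the residual error forward
        else:
            pulses.append(0)
    return pulses
```

In the drive described above, the RL agent would adjust `density` within each 120° conduction interval; the conduction sequencing itself is handled by the commutation logic.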
Figure 10 shows the three-phase current response across the environment changes, highlighting the phase-shifted current behavior for each episode of the BLDC motor under RL-based WPT control. Insight I provides a zoomed-in view of the detailed current response during an environment change, and Insight II shows the modulation effects with reduced ripple. The motor current is around 15 A and increases to around 18 A under the full-load condition, caused by the load increase at 2 s. The results confirm RL-based PDM operation with clear phase-current transitions.
Figure 11a presents the overall voltage response across all three phases of the BLDC motor for the full duration, including environment changes. Figure 11b provides a zoomed-in view, clearly showing that the transitions between episode ranges have little effect on the voltage, resulting in a stable and balanced voltage profile.
Figure 12 shows the dynamic response of the torque in the BLDC motor. The electromagnetic torque rises initially and then stabilizes with step changes under RL-based control. The nonlinear torque–speed characteristic of the pump load significantly affects the dynamic performance of the BLDC motor drive. As the torque demand rises quadratically with speed, existing controllers tend to exhibit overshoot and delayed responses during load transitions. However, the proposed controller successfully adapts to these nonlinear variations by continuously tuning the switching density of the inverter.
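The quadratic pump characteristic referred to above can be written as T_load = k·ω². In the sketch below, the constant k is scaled so that the nominal torque of 1.20 Nm (reported in Section 5.1) occurs at an assumed rated speed of 3000 rpm; that rated speed is an illustrative assumption, not a value from the paper.

```python
def pump_load_torque(speed_rpm, k=None,
                     rated_torque=1.20, rated_speed=3000.0):
    """Quadratic pump load: T = k * speed^2. By default, k is chosen
    so the nominal torque (1.20 Nm, from the text) occurs at an
    ASSUMED rated speed of 3000 rpm."""
    if k is None:
        k = rated_torque / rated_speed ** 2
    return k * speed_rpm ** 2
```

The quadratic scaling means torque demand falls to a quarter at half speed, which is why load transitions stress the controller far more near rated speed.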
The corresponding speed of the BLDC motor during varying load conditions is given in Figure 13. The speed remains essentially constant across load conditions, showing only a small dip when the load increases before recovering. The observations made under the varying load conditions of the BLDC motor provide a clear picture of the controller’s effectiveness.
Fluctuations in output power are often caused by dynamic changes in torque, speed, or external disturbances, such as voltage ripples and sensor noise. Figure 14 illustrates the power output of the motor system under varying load and voltage conditions to validate the dynamic adaptability of the reinforcement learning-based control.
The system is simulated over ten episode ranges. The output power initially rises sharply to 680 W and stabilizes under the nominal load, then varies with the environment changes, for example dipping to 550 W due to reduced demand. It then peaks at 660–680 W under higher torque/load conditions. Finally, the power settles back near the rated value once conditions normalize.
The response time has been optimized, with quicker system reactions to changes in load and operational conditions. Energy efficiency has also been enhanced through adaptive control strategies that minimize energy consumption without sacrificing performance. System stability shows notable improvements, as the reinforcement learning model continuously adapts to environmental variations, ensuring the motor operates within optimal parameters. Efficient energy utilization is achieved by adjusting control inputs dynamically, leading to reduced energy waste. Torque ripple is minimized through fine-tuned motor control, providing smoother operation. This dynamic adjustment is essential in practical applications, such as electric vehicles and pumping systems, where load variations are frequent and uninterrupted operation is necessary.

5.1. Analysis Based on the Reinforced Learning Episode Range

In Figure 15, the performance of the RL-based control strategy for a BLDC motor under WPT is evaluated, with the reward function used as a metric for system stability and adaptability over 2500 testing episodes. Blue lines represent raw rewards, the red curve shows a 50-episode moving average to highlight long-term trends, and dashed lines mark the environment change points.
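The 50-episode smoothing used for the red trace can be sketched as a trailing moving average; how the window is handled at the start of the series is an implementation choice, and a growing window is assumed here.

```python
def moving_average(rewards, window=50):
    """Trailing moving average over the previous `window` episodes,
    using a growing window for the first few points (an assumed
    boundary policy; the paper does not specify one)."""
    out = []
    for i in range(len(rewards)):
        lo = max(0, i - window + 1)
        out.append(sum(rewards[lo:i + 1]) / (i + 1 - lo))
    return out
```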
In Range 1 (episodes 0–250), the system operates at nominal conditions of 1.20 Nm torque and 48 V supply, maintaining a high and stable reward of 95. With the introduction of a +10% load disturbance in Range 2 (episodes 251–500), the reward drops sharply from 95 to 91 before gradually recovering to around 94, indicating the agent’s adaptation to the increased torque demand (1.32 Nm). Similarly, in Range 3 (episodes 501–750), under a +15% load (1.38 Nm), the reward decreases to 85.5 and recovers to 95. In Range 4 (episodes 751–1000), a torque ripple of ±3% results in a deeper reward dip to 86, followed by recovery to 94.8. In Range 5 (episodes 1001–1250), the presence of sensor noise reduces the reward to 87, with recovery up to 94. A voltage sag to 46 V in Range 6 (episodes 1251–1500) produces a significant drop to 86, with gradual recovery to 93. In Range 7 (episodes 1501–1750), voltage reduction by 1 V, combined with iron loss, leads to a dip near 88 and recovery to 92.5. A bus voltage ripple of 5% in Range 8 (episodes 1751–2000) results in a dip to 87 before recovering to 93.2. The most severe condition occurs in Range 9 (episodes 2001–2250), where a +25% load (1.50 Nm) causes the reward to decline sharply to 90 before rising back to 95. Finally, in Range 10 (episodes 2251–2500), the system returns to nominal operating conditions, and the reward stabilizes at 96, demonstrating a successful recovery. Across all ranges, recovery trends showed that the RL policy increasingly required fewer episodes to regain performance after repeated disturbances, indicating an improving adaptability. The plotted reward trajectory with vertical markers at change points clearly reflects this behavior, validating adaptive RL as a robust, self-optimizing control strategy for BLDC motors operating in dynamic, uncertain conditions.
The response time of the motor, as seen in Figure 16a, fluctuates between 100 ms and 300 ms across various environments. Initially, the response time is closer to the higher end, but as the tests progress, it stabilizes to near 150 ms. This trend suggests a gradual improvement in the motor’s responsiveness. The improvement in response time indicates that the motor is becoming more efficient over time, with a noticeable decrease in fluctuation toward the end of the tests. In terms of the system stability given in Figure 16b, the stability index remains quite high, ranging between 0.95 and 1.0. Most of the tests exhibit a stability index close to 1.0, with only a minor dip to 0.95 occurring in the beginning of the environment change. This indicates that the motor’s system stability is strong throughout.
Energy utilization, shown in Figure 17a, ranges from 80.56% to 89.12%, with the highest recorded value being 89.12%. The data reveal a small but steady improvement in energy utilization, suggesting that the motor uses energy more efficiently as learning proceeds over the episodes: the motor adapts and adjusts its energy consumption, leading to more effective energy utilization. Energy efficiency, depicted in Figure 17b, varies between 85.09% and 92.24%, with the highest efficiency recorded at 92.24% and an average of 88.66%. The performance remains relatively stable, indicating that the motor consistently operates at a high level of energy efficiency, with only slight fluctuations in energy consumption.
Motor stability is predominantly high, with values hovering around the 99% mark. The stability index shows only minor dips, but the overall performance remains strong throughout the test series. The highest stability is recorded at a value of 1.0, with the other tests showing minimal variation from this ideal value. This suggests that the motor is highly stable, with only slight variations that do not significantly affect overall performance. The stability levels throughout the test series reflect a motor that operates with reliability and consistency.

5.2. Hardware Results Analysis Based on the BLDC Motor

The hardware analysis examines the practical implementation of the BLDC motor drive and RL control system, validating the design through real-time measurements. It focuses on evaluating key parameters such as voltage, current, torque, and switching pulses to ensure reliable performance under varying load conditions over the testing episode range.
The rectified output shown in Figure 18a gives a steady DC voltage of nearly 48 V, with dips during every environmental change, confirming effective wireless power transfer. The corresponding current waveform in Figure 18b stabilizes around 15 A after the startup oscillations, indicating proper load regulation. These results validate that a stable DC supply is delivered to the BLDC motor from the WPT system.
Figure 19 displays the gate-drive switching pulses for each of the six inverter switches, clearly demonstrating the 120° conduction sequence required for BLDC motor operation. Each channel shows a precise, non-overlapping pattern that ensures proper commutation and minimizes switching losses. Figure 19 illustrates the corresponding three-phase line voltages of the BLDC motor under varying environmental conditions, highlighting the stable amplitude and balanced phase displacement despite environmental changes. The circled regions emphasize intervals where minor fluctuations occur, yet the waveform integrity and motor performance remain unaffected. Together, these plots confirm accurate pulse generation, robust voltage output, and reliable operation of the BLDC drive under dynamic loading conditions.
The switching pulses shown in Figure 20 remain well-timed even when PDM is applied, indicating that the 120° commutation logic is preserved while the density of the pulses adapts to load and speed changes. The hardware experimental results of the BLDC motor in Figure 19 show the three-phase voltages across 2500 episodes under the different dynamic operating conditions listed in the table. In the baseline range (0–250), the voltages are well balanced, with symmetrical PDM switching and amplitudes close to ±48 V, representing nominal operation. As the operating environment changes across the episode ranges, the phase voltages exhibit corresponding variations in amplitude, switching density, and waveform regularity: load increases lead to denser PDM activity, supply voltage reduction lowers the amplitude, and disturbances such as temperature, sensor noise, and bus ripple introduce irregularities and distortion. By the final range (2251–2500), the waveforms return close to nominal, confirming the adaptability of the control strategy.
The motor operates under the nominal conditions shown in Figure 20, Insight I (Episodes 0–250), where the three-phase voltages (VA, VB, VC) are stable, balanced, and symmetrical; this provides the reference signature against which later disturbances are compared. Insight II shows that with the supply drop to 46 V (Episodes 501–750), the reduced supply directly lowers the amplitude of the phase voltages while the PDM density increases. This shows the controller compensating to maintain torque, proving that the supply voltage directly governs the amplitude while the switching density adapts to the load demand. Insight III shows the load increase of 25% (Episodes 2001–2250); this heavy-load condition produces the densest PDM switching in the waveforms, with visibly stressed voltages compared to the baseline. The inverter must work harder, showing how load demand translates into increased switching effort and electrical stress.
Figure 21 illustrates the three-phase stator currents (IA, IB, IC) of the BLDC motor over 2500 episodes under dynamic environmental and load variations. At the baseline condition, the currents are sinusoidal-like, with proper phase separation, representing nominal balanced operation. As the table conditions progress, the current waveforms exhibit changes in amplitude, distortion, and ripple content, depending on supply, load, and disturbance factors. The top plot shows the DC current trend, while the middle and bottom plots highlight the detailed switching currents at selected episodes (I–IV).
From Figure 21, in Insight I (Episodes ~250–500, load increase of 10%), the phase current amplitude rises compared to the baseline, and the waveform becomes slightly denser with higher harmonic content. This reflects the additional torque demand, where the controller increases the duty cycle and current flow to maintain the motor speed. In Insight II (Episodes ~500–750, supply voltage drop to 46 V), the current amplitude increases further compared to Insight I: since the supply is reduced, the inverter compensates by drawing higher currents to maintain the output torque. This aligns with the table, where supply reduction directly stresses the current profile. In Insight III (Episodes ~1500–1750, sensor noise impact), the current waveforms become irregular and distorted, with noticeable asymmetry between phases. Sensor noise disrupts the rotor position estimation, causing unbalanced current distribution; the figure shows higher ripple and reduced smoothness in the phase currents compared to earlier ranges. Insight IV (Episodes ~2000–2250, load increase of 25%) is the most demanding condition: phase current amplitudes are significantly higher than in Insights I–III, and the waveforms appear highly stressed, with denser ripple content. The controller delivers maximum effort to meet the torque demand, matching the table condition of a heavy load increase. Throughout, the phase difference between the voltage and current of the BLDC motor is maintained at a minimum value.
From the motor output, the analysis is performed over each episode range. The torque ripple, shown in Figure 22, ranges from 0.5 Nm to 2.5 Nm, starting higher at each environment change and gradually decreasing over the episode range. This reduction in torque ripple suggests that the motor operation becomes smoother and more consistent as the system progresses.
From Table 7, the proposed system achieves a fast response time of 100 ms, ensuring quick adaptation to load variations. It enhances energy efficiency and energy utilization, optimizing power consumption while maintaining stable motor operation. System stability and motor stability demonstrate the robustness of the control strategy under dynamic conditions. Additionally, the torque ripple is minimized, leading to a smoother and more efficient motor performance.

5.3. Interpretation of Findings and Comparison of Relevant Works

The proposed adaptive RL controller demonstrates robust adaptation across a wide set of environmental perturbations (2500 episodes, 10 scenario ranges). Key performance highlights include the following: (i) a simulation peak efficiency of 92.24% and a hardware efficiency of around 90.45% (Figure 23, Table 7); (ii) torque ripple reduction to a minimum measured value near 0.5 Nm during stable operation and an average ripple of 1.5 Nm across episodes (Figure 22, Table 7); (iii) convergence of the RL policy within 180 episodes, with stable reward trends thereafter (Figure 15); and (iv) improved dynamic response, with the response time reduced to a 150 ms average, with extremes of 100–300 ms as the episodes progressed (Figure 16, Table 7). These results substantiate the claims of improved energy utilization, disturbance rejection, and faster adaptation under changing load and supply conditions.
Figure 23 compares the simulation and hardware results for the environmental variations and their effects on the energy efficiency of the BLDC motor with the RL framework. The simulation bars remain consistently high because the model assumes ideal thermal conditions. In contrast, the hardware efficiency reflects real-world effects such as motor temperature, which increases copper and core losses and causes a small drop in efficiency to about 85–86%. Episodes 2 and 3 correspond to cooler ambient conditions and a steadier input voltage, allowing the hardware to reach 90% and 89%, closely matching the simulation’s 92% and 91%. Mid-range episodes show moderate dips as occasional voltage sags or inverter heating slightly elevate the conduction and switching losses. These results confirm that the control algorithm keeps the motor performance within a narrow high-efficiency band near 92%, even under environmental changes. The comparison demonstrates the robustness of the proposed BLDC drive under dynamic loading conditions; it sustains around 89% efficiency, validating its suitability for energy-sensitive applications such as sustainable pumping and industrial automation.
The comparative analysis in Table 8 highlights that the proposed adaptive reinforcement learning approach delivers a superior balance of speed, robustness, and energy efficiency compared with existing control strategies. While classical feedback linearization and periodic adaptive methods achieve low computation times, they suffer from slower convergence and lower efficiency, limiting their suitability for dynamic industrial environments. Advanced techniques such as deep neural networks and soft computing optimization exhibit high accuracy and efficiency but incur longer computation times and moderate convergence rates. In contrast, the proposed adaptive RL converges in only 180 episodes, with a moderate computation time of 2.8 ms per episode, outperforming traditional RL and chaotic adaptive strategies in responsiveness. It also achieves the highest energy efficiency of 92%, demonstrating its ability to minimize power losses under varying load and environmental conditions. Although the torque ripple is slightly higher at 45%, the algorithm’s high robustness and very fast convergence make it ideal for real-time industrial applications, where rapid adaptation and energy savings are critical. Overall, the proposed adaptive RL method sets a new benchmark by combining fast learning, high efficiency, and strong disturbance rejection, making it particularly well-suited for sustainable pumping systems with WPT and BLDC motors under dynamic loading operations.

5.4. Limitations and Future Improvements

Although the proposed adaptive RL-based control integrated with WPT and PDM demonstrates significant performance improvements, certain limitations should be acknowledged to ensure a balanced understanding of the work. The adaptive RL algorithm, when executed in real time, imposes a moderate computational load on the microcontroller; although offline and semi-online training reduce this burden, complete on-board online learning at higher sampling rates may require a more powerful embedded processor or dedicated AI accelerator hardware. The training process relies on a sufficiently large dataset representing various dynamic load and voltage scenarios, but only 2500 episodes were used in this study. The system’s performance also depends on stable wireless communication: latency (10 ms) and packet losses (5%) were simulated, but more severe real-world interference could reduce response accuracy, and future work will explore error-tolerant or predictive communication protocols. Finally, temperature-dependent losses and coil misalignment influence the overall performance; these effects are not included in the study.
Future work will address these limitations by implementing on-chip reinforcement learning to achieve full online adaptability, optimizing the reward function for reduced computational demand, and testing the system with higher-power BLDC motors and varying coupling distances to validate scalability for industrial deployment.

6. Conclusions

The investigation confirms that adaptive reinforcement learning provides a robust and intelligent control strategy for BLDC motor drives in WPT systems. Unlike conventional fixed-parameter methods, adaptive RL continuously adjusts to varying operating conditions such as load fluctuations, supply voltage variations, and disturbances including temperature rise, bus ripple, and sensor noise. The hardware waveforms of the voltage and current revealed that the ARL controller effectively maintained phase balance and torque delivery, even under stress conditions, while reducing distortion and improving the overall waveform quality. From the energy efficiency analysis, peak efficiency was observed at 92.24% in the simulation and 90.45% in the hardware during moderate load conditions. Even under high-stress episodes, both the simulation and hardware followed the same trend, confirming that the proposed strategy preserves efficiency, despite real-world losses. Furthermore, torque ripple was reduced by nearly 12% compared to traditional modulation schemes, and the dynamic response time improved by about 18%, which was evident from the faster stabilization of reward signals during the training episodes.
The adaptive RL-based PDM scheme optimized the inverter operation by intelligently balancing torque, voltage, and speed regulation, leading to stable power delivery and improved recovery from transient disturbances. This adaptability was particularly evident during supply drops and heavy load increments, where the system maintained reliable performance with minimal efficiency degradation. In summary, the results highlight the capability of the ARL-based controller to achieve high adaptability, reduced torque ripple, stable and efficient power transfer, and resilience against uncertainties. These findings underscore the strong potential for deploying ARL-driven control in advanced industrial motor drive applications, where intelligent, adaptive, and energy-efficient operation is crucial for future sustainable energy systems.

Author Contributions

Conceptualization, R.P.A. and P.R.G.K.; methodology, R.P.A.; software, R.P.A.; validation, P.R.G.K., M.A.I., A.A., and N.R.; formal analysis, N.R.; investigation, R.P.A.; resources, A.A.; data curation, M.A.I.; writing—original draft preparation, R.P.A.; writing—review and editing, P.R.G.K.; visualization, A.A.; supervision, P.R.G.K.; project administration, M.A.I.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number “NBU-FFR-2025-332-08”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used for the study are made available within the manuscript.

Acknowledgments

The authors gratefully thank the Prince Faisal bin Khalid bin Sultan Research Chair in Renewable Energy Studies and Applications (PFCRE) at Northern Border University for their support and assistance.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RL: Reinforcement Learning
WPT: Wireless Power Transfer
BLDC: Brushless DC motor
PDM: Pulse Density Modulation
DC: Direct Current
AC: Alternating Current
MCU: Microcontroller Unit
PID: Proportional–Integral–Derivative controller
IoT: Internet of Things
EMF: Electromotive Force
MSO: Mixed Signal Oscilloscope

References

1. Mohanraj, D.; Aruldavid, R.; Verma, R.; Sathiyasekar, K.; Barnawi, A.B.; Chokkalingam, B.; Mihet-Popa, L. A Review of BLDC Motor: State of Art, Advanced Control Techniques, and Applications. IEEE Access 2022, 10, 54833–54869.
2. Kumar, A.; Marwaha, S.; Manna, M.S.; Marwaha, A.; Kumar, R.; Amir, M.; Bajaj, M.; Zaitsev, I. Comparative Analysis of Brushless DC and Switched Reluctance Motors for Optimizing Off-Grid Water Pumping. Sci. Rep. 2025, 15, 3527.
3. Vishnuram, P.; Suresh, P.; Narayanamoorthi, R.; Vijayakumar, K.; Nastasi, B. Wireless Chargers for Electric Vehicle: A Systematic Review on Converter Topologies, Environmental Assessment, and Review Policy. Energies 2023, 16, 1731.
4. Ramesh, P.; Komarasamy, P.R.G.; Rajamanickam, N.; Alharthi, Y.Z.; Elrashidi, A.; Nureldeen, W. A Comprehensive Review on Control Technique and Socio-Economic Analysis for Sustainable Dynamic Wireless Charging Applications. Sustainability 2024, 16, 6292.
5. Wang, H.; Chau, K.T.; Liu, W.; Goetz, S.M. Design and Control of Wireless Permanent-Magnet Brushless DC Motors. IEEE Trans. Energy Convers. 2023, 38, 2969–2979.
6. Gu, Y.; Wang, J.; Liang, Z.; Zhang, Z. Flexible Constant-Power Range Extension of Self-Oscillating System for Wireless In-Flight Charging of Drones. IEEE Trans. Power Electron. 2024, 39, 15342–15355.
7. Tian, J.; Zhou, Y.; Yin, L.; AlQahtani, S.A.; Tang, M.; Lu, S.; Wang, R.; Zheng, W. Control Structures and Algorithms for Force Feedback Bilateral Teleoperation Systems: A Comprehensive Review. Comput. Model. Eng. Sci. 2025, 142, 973–1019.
8. Prabhu, N.; Thirumalaivasan, R.; Ashok, B. Critical Review on Torque Ripple Sources and Mitigation Control Strategies of BLDC Motors in Electric Vehicle Applications. IEEE Access 2023, 11, 115699–115739.
9. Antony, R.P.; Komarasamy, P.R.G.; Rajamanickam, N.; Alroobaea, R.; Aboelmagd, Y. Optimal Rotor Design and Analysis of Energy-Efficient Brushless DC Motor-Driven Centrifugal Monoset Pump for Agriculture Applications. Energies 2024, 17, 2280.
10. Chen, J.I.Z.; Lin, H.Y. Performance Evaluation of a Quadcopter by an Optimized Proportional–Integral–Derivative Controller. Appl. Sci. 2023, 13, 8663.
11. Selvaraj, A.; Thottungal, R. A Novel Power Quality-Improved High-Step-Up-Gain Luo Converter-Powered BLDC Motor Drive with Model Reference Adaptive Controller for Electric Vehicles. Electr. Eng. 2025, 107, 4801–4817.
12. Nazeer, M.S.; Laschi, C.; Falotico, E. RL-Based Adaptive Controller for High Precision Reaching in a Soft Robot Arm. IEEE Trans. Robot. 2024, 40, 2498–2512.
13. Du, W.; Ding, S. A Survey on Multi-Agent Deep Reinforcement Learning: From the Perspective of Challenges and Applications. Artif. Intell. Rev. 2021, 54, 3215–3238.
14. Li, P.; Sun, W.; Shen, J.-X. Flux Observer Model for Sensorless Control of PM BLDC Motor with a Damper Cage. IEEE Trans. Ind. Appl. 2019, 55, 1272–1279.
15. Chen, P.; Zhao, J.; Liu, K.; Zhou, J.; Dong, K.; Li, Y.; Guo, X.; Pan, X. A Review on the Applications of Reinforcement Learning Control for Power Electronic Converters. IEEE Trans. Ind. Appl. 2024, 60, 8430–8450.
16. Gao, T. Optimizing Robotic Arm Control Using Deep Q-Learning and Artificial Neural Networks through Demonstration-Based Methodologies: A Case Study of Dynamic and Static Conditions. Robot. Auton. Syst. 2024, 181, 104771.
17. Shahid, M.B.; Jin, W.; Abbasi, M.A.; Bin Husain, A.R.; Munir, H.M.; Hassan, M.; Flah, A.; Souissi, A.S.E.; Alghamdi, T.A.H. Model Predictive Control for Energy Efficient AC Motor Drives: An Overview. IET Electr. Power Appl. 2024, 18, 1894–1920.
18. Jahan Jui, J.; Ahmad, M.A.; Molla, M.I.; Rashid, M.I.M. Optimal Energy Management Strategies for Hybrid Electric Vehicles: A Recent Survey of Machine Learning Approaches. J. Eng. Res. 2024, 12, 454–467.
19. Gao, L.; Zhao, J.; Duan, J.; Liu, S.; Ma, F. Reinforcement Learning-Based Motion Control Method for Electrical-Hydraulic Valve-Controlled System. IEEE/ASME Trans. Mechatron. 2025, 30, 751–762.
20. Zhang, K.; Cui, Z.; Ma, W. A Survey on Reinforcement Learning-Based Control for Signalized Intersections with Connected Automated Vehicles. Transp. Rev. 2024, 44, 1187–1208.
21. John, F.; Komarasamy, P.R.G.; Rajamanickam, N.; Vavra, L.; Petrov, J.; Kral, V. Performance Improvement of Wireless Power Transfer System for Sustainable EV Charging Using Dead-Time Integrated Pulse Density Modulation Approach. Sustainability 2024, 16, 7045.
22. Udayakumar, A.K.; Raghavan, R.R.V.; Houran, M.A.; Elavarasan, R.M.; Kalavathy, A.N.; Hossain, E. Three-Port Bi-Directional DC–DC Converter with Solar PV System Fed BLDC Motor Drive Using FPGA. Energies 2023, 16, 624.
23. Zbinden, J.; Molin, J.; Ortiz-Catalan, M. Deep Learning for Enhanced Prosthetic Control: Real-Time Motor Intent Decoding for Simultaneous Control of Artificial Limbs. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 1177–1186.
24. Boroujeni, M.S.; Markadeh, G.A.; Soltani, J. Torque Ripple Reduction of Brushless DC Motor Based on Adaptive Input–Output Feedback Linearization. ISA Trans. 2017, 70, 502–511.
25. Adıgüzel, F.; Türker, T. A Periodic Adaptive Controller for the Torque Loop of Variable Speed Brushless DC Motor Drives with Non-Ideal Back-Electromotive Force. Automatika 2022, 63, 732–744.
26. Rodríguez-Molina, A.; Villarreal-Cervantes, M.G.; Serrano-Pérez, O.; Solís-Romero, J.; Silva-Ortigoza, R. Optimal Tuning of the Speed Control for Brushless DC Motor Based on Chaotic Online Differential Evolution. Mathematics 2022, 10, 1977.
27. Elymany, M.M.; Enany, M.A.; Metwally, H.; Shaier, A.A. Enhanced Operation of PVWPS Based on Advanced Soft Computing Optimization Techniques. Sci. Rep. 2024, 14, 29429.
28. Zhang, Y.; Gono, R.; Jasiński, M. An Improvement in Dynamic Behavior of Single Phase PM Brushless DC Motor Using Deep Neural Network and Mixture of Experts. IEEE Access 2023, 11, 64260–64271.
Figure 1. Block diagram of the wireless system for a BLDC motor drive with control.
Figure 2. Block diagram of the adaptive RL for the proposed BLDC motor pump system.
Figure 3. Block diagram of proposed adaptive RL-based control architecture.
Figure 4. Process flow diagram of the reward function computation.
Figure 5. Schematic diagram for the proposed wireless BLDC motor with adaptive RL control.
Figure 6. Hardware setup for the proposed wireless BLDC motor system.
Figure 7. BLDC motor hall sensor output.
Figure 8. PDM switching gate pulse for BLDC motor inverter.
Figure 9. PDM switching pulse for one cycle with 120° conduction mode.
Figure 10. Current waveform of BLDC motor with RL control.
Figure 11. Voltage waveform of BLDC motor. (a) For 3 phases in all episode range; (b) voltage waveform during environment change.
Figure 12. Torque in the BLDC motor.
Figure 13. Speed response in the BLDC motor.
Figure 14. Power output of the BLDC motor under varying environment.
Figure 15. Reward over episodes for BLDC motor under varying conditions.
Figure 16. (a) Response time comparison of proposed RL framework and (b) comparison of system stability index.
Figure 17. Based on environment change: (a) energy utilization by the system; (b) proposed system efficacy analysis.
Figure 18. Rectified output from WPT receiver side. (a) Output voltage of rectifier and (b) output current of rectifier.
Figure 19. Switching pulses for the BLDC motor for 120° commutation.
Figure 20. Voltage waveform of BLDC motor with insight of environment variations.
Figure 21. Current waveform of BLDC motor.
Figure 22. Torque ripple, based on environment change.
Figure 23. Comparison of energy efficiency between simulation and hardware for the proposed BLDC motor.
Table 1. BLDC motor specifications.

Parameters | Values
BLDC motor power | 1 hp
Voltage | 48 V
Current | 20 A
Main winding resistance | 0.1 Ω
Main winding inductance | 0.5 mH
Rated torque | 3.18 Nm
Maximum motor speed | 3000 rpm
Casing | 304 stainless steel, 2 mm, IP6
Table 2. Pump specifications.

Parameters | Values
Motor power | 1 hp
Volumetric flow rate | 274 L/min
Outlet head | 10 m ≈ 0.7 bar
Inlet suction | 1–2 m above pump inlet
Fluid density | ρ ≈ 720 kg/m³
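As a quick consistency check on the pump ratings (a sketch we added, not the authors' code; the variable names and the standard hydraulic relations p = ρgH and P = ρgQH are our assumptions), the stated 10 m outlet head for a 720 kg/m³ fluid indeed corresponds to roughly 0.7 bar, and the hydraulic output power at rated flow is a few hundred watts, well within the 1 hp motor rating:

```python
# Hedged cross-check of Table 2 using standard hydraulics.
rho = 720.0          # fluid density (kg/m^3), from Table 2
g = 9.81             # gravitational acceleration (m/s^2)
Q = 274 / 60 / 1000  # volumetric flow rate: 274 L/min converted to m^3/s
H = 10.0             # outlet head (m), from Table 2

p_outlet_bar = rho * g * H / 1e5   # static pressure of a 10 m column, in bar
P_hydraulic = rho * g * Q * H      # hydraulic output power (W)

print(f"outlet pressure ~ {p_outlet_bar:.2f} bar")  # ~0.71 bar, matching "10 m = 0.7 bar"
print(f"hydraulic power ~ {P_hydraulic:.0f} W")     # ~323 W at rated flow and head
```

The gap between the ~323 W hydraulic output and the 746 W (1 hp) motor rating is absorbed by pump and motor losses, which is the loss budget the RL controller targets.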
Table 3. WPT system specifications.

Parameters | Values
WPT system power | 1.1 kW
Input DC voltage | 48 V
Coupling coefficient | 0.2
Switching frequency | 85 kHz
Mutual inductance | 10 µH
Self-inductance of the transmitter coil | 50 µH
Compensation capacitor of the transmitter side | 700 nF
Compensation inductance of the transmitter side | 20 µH
Self-inductance of the receiver coil | 50 µH
Compensation capacitor of the receiver side | 70 nF
Compensation inductance of the receiver side | 50 µH
Distance between transmitter and receiver coils | 180 mm
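Two of the WPT entries can be cross-checked numerically (a sketch we added; the variable names are ours, and treating the receiver coil as resonating with its 70 nF compensation capacitor is our simplifying assumption, since the listed compensation inductances suggest a higher-order network): the coupling coefficient follows from k = M/√(L₁L₂), and the receiver-side L–C pair resonates near the stated 85 kHz switching frequency.

```python
import math

# Hedged cross-check of the WPT specifications.
L_tx = 50e-6   # transmitter coil self-inductance (H)
L_rx = 50e-6   # receiver coil self-inductance (H)
M    = 10e-6   # mutual inductance (H)
C_rx = 70e-9   # receiver-side compensation capacitor (F)

# Coupling coefficient from mutual and self-inductances.
k = M / math.sqrt(L_tx * L_rx)

# Resonant frequency of the receiver coil with its compensation capacitor.
f_res = 1 / (2 * math.pi * math.sqrt(L_rx * C_rx))

print(f"k = {k:.2f}")                   # 0.20, matching the stated coupling coefficient
print(f"f_res = {f_res/1e3:.1f} kHz")   # ~85 kHz, aligned with the switching frequency
```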
Table 4. Environment metrics of proposed framework.

Parameters | Values
Load torque range | 0.1–2 Nm
Speed range | 500–3000 RPM
Voltage range | 40–60 V
Table 5. Environment scenario setup.

Range | Episodes | Environment Change at Range Start
1 | 0–250 | Baseline (nominal)
2 | 251–500 | Increase load by 10%
3 | 501–750 | Supply decrease to 46 V
4 | 751–1000 | Load decrease by 15%
5 | 1001–1250 | Increase mechanical loss
6 | 1251–1500 | Increase temperature
7 | 1501–1750 | Increase sensor noise
8 | 1751–2000 | Increase DC bus ripple
9 | 2001–2250 | Increase load by 25%
10 | 2251–2500 | Return near nominal
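The scenario schedule above can be sketched as a simple lookup, which is handy when reproducing the training environment; the function name and structure are illustrative, not taken from the paper's implementation:

```python
# Episode-to-perturbation schedule reproducing Table 5 (illustrative sketch).
SCHEDULE = [
    (0,    250,  "Baseline (nominal)"),
    (251,  500,  "Increase load by 10%"),
    (501,  750,  "Supply decrease to 46 V"),
    (751,  1000, "Load decrease by 15%"),
    (1001, 1250, "Increase mechanical loss"),
    (1251, 1500, "Increase temperature"),
    (1501, 1750, "Increase sensor noise"),
    (1751, 2000, "Increase DC bus ripple"),
    (2001, 2250, "Increase load by 25%"),
    (2251, 2500, "Return near nominal"),
]

def environment_change(episode: int) -> str:
    """Return the environment perturbation active at a given training episode."""
    for start, end, change in SCHEDULE:
        if start <= episode <= end:
            return change
    raise ValueError(f"episode {episode} is outside the 0-2500 training range")
```

For example, `environment_change(300)` falls in range 2 and returns the 10% load increase scenario.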
Table 6. Dataset used in proposed framework.

Parameter | Description | Unit
Speed (ω) | Rotational speed of the motor | RPM
Torque (T) | Load torque | Nm
Current (I) | Input current | A
Voltage (V) | Input voltage | V
Temperature (Θ) | Motor surface temperature | °C
Table 7. Performance parameters of the proposed system.

Metric | Minimum Value | Maximum Value | Average Value
Response Time | 100 ms | 300 ms | 150 ms
Energy Efficiency | 85.09% | 92.24% | 88.65%
System Stability | 0.95 | 1.00 | 0.98
Energy Utilization | 80.56% | 89.12% | 84.84%
Torque Ripple | 0.5 Nm | 2.5 Nm | 1.5 Nm
Motor Stability | 0.98 | 1.00 | 0.99
Table 8. Performance comparison of the proposed system.

Topology | Algorithm | Computation Time | Convergence | Robustness | Efficiency | Torque Ripple
[13] | Reinforcement Learning | Medium (2.5 ms/episode) | Moderate (400 episodes) | High | 90% | Not specified
[23] | Deep Neural Network with Mixture of Experts | High (4.8 ms/episode) | Fast (250 episodes) | High | 91% | Not specified
[24] | Adaptive Input–Output Feedback Linearization | Low (1.2 ms/episode) | Fast (200 episodes) | Medium | 88% | 30%
[25] | Periodic Adaptive Control | Low (1.0 ms/episode) | Slow (500 episodes) | Low | 85% | 25%
[26] | Chaotic Adaptive Tuning Strategy | Medium–High (3.5 ms/episode) | Moderate (350 episodes) | High | 89% | 33%
[27] | Soft Computing Optimization Algorithms | High (5.0 ms/episode) | Moderate (400 episodes) | Medium | 90% | Not specified
[28] | Super-Twisting Sliding Mode Control | Low–Medium (2.0 ms/episode) | Fast (220 episodes) | High | 91% | 40%
Proposed | Adaptive RL | Medium (2.8 ms/episode) | Very Fast (180 episodes) | High | 92.24% | 45%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
