Article

Advanced Servo Control and Adaptive Path Planning for a Vision-Aided Omnidirectional Launch Platform in Sports-Training Applications

1 Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China
2 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
* Authors to whom correspondence should be addressed.
Actuators 2025, 14(12), 614; https://doi.org/10.3390/act14120614
Submission received: 6 November 2025 / Revised: 10 December 2025 / Accepted: 12 December 2025 / Published: 15 December 2025

Abstract

A system-level scheme that couples a multi-dimensional attention-fused vision model with an improved Dijkstra planner is proposed for basketball robots in complex scenes, targeting fast-moving object detection, cluttered-background recognition, and real-time path decisions. For vision, the proposed YOLO11 with Multi-dimensional Attention Fusion (YOLO11-MAF) is equipped with four modules: Coordinate Attention (CoordAttention), Efficient Channel Attention (ECA), Multi-Scale Channel Attention (MSCA), and Large-Separable Kernel Attention (LSKA), which raise detection accuracy and robustness for high-speed basketballs. For planning, an improved Dijkstra algorithm is proposed: binary heap optimization and heuristic fusion cut the time complexity from $O(V^2)$ to $O((V+E)\log V)$, removing redundant expansions and increasing planning speed. A complete robot platform integrating mechanical, electronic, and software components is constructed. End-to-end experiments show that the improved vision model raises mAP@0.5 by 0.7% while keeping real-time frames per second (FPS), and the improved path planning algorithm cuts average compute time by 16% and achieves over 95% obstacle avoidance success. The work offers a new approach for real-time perception and autonomous navigation of intelligent sport robots and lays a basis for future multi-sensor fusion and adaptive path planning research.

1. Introduction

With the rapid advancement of artificial intelligence, computer vision, and intelligent control technologies, industrial robots have found extensive applications in assembly, welding, spray painting, machine loading, manufacturing, construction, fabrication, and related areas [1]. In addition, robotic systems have expanded from industrial automation to complex domains such as service, healthcare, education, and competitive sports [2]. As an important branch of sports robotics, basketball robots operate in rule-based but dynamically changing environments, requiring autonomous perception, real-time decision-making, and precise control [3]. Research in this field plays a crucial role in advancing robotic perception, planning, and control technologies. However, basketball robots face dual challenges.
At the visual perception level, the system must achieve stable target detection under varying levels of illumination, background interference, and high-speed motion. Although the You Only Look Once (YOLO) series of algorithms balance detection speed and accuracy, task failures under dynamic conditions cannot be ignored. At the path planning level, the robot must generate collision-free trajectories efficiently. Traditional path planning and navigation strategies have their inherent limitations when applied to basketball robots. The classic Dijkstra algorithm guarantees global optimality but suffers from a time complexity of $O(V^2)$, incurring significant computational latency that is incompatible with real-time obstacle avoidance. The A* algorithm improves search efficiency through heuristic functions in static environments but shows limited adaptability to the dynamic changes of basketball game scenarios. Sampling-based methods like the rapidly exploring random tree (RRT) can quickly find feasible paths in high-dimensional spaces but often produce trajectories of poor quality. In recent years, emerging strategies such as the Intelligent Bug Algorithm (IBA), a novel approach for autonomous mobile robot navigation, have gained attention. IBA mimics the foraging and obstacle avoidance behaviors of natural organisms, relying on local environmental information perception to make real-time movement decisions without the need for prior global map construction. This characteristic gives it potential advantages in dynamic and unknown environments. However, IBA still faces challenges such as low navigation efficiency in large-scale spaces and difficulty in ensuring global optimality of paths, which restrict its direct application in basketball robot scenarios that require both real-time responsiveness and trajectory optimality [4].
To address these challenges, extensive research has been conducted worldwide. In visual recognition, the evolution from R-CNN (Region-based Convolutional Neural Network) to the YOLO series has significantly improved detection efficiency, and the introduction of attention mechanisms such as squeeze-and-excitation (SE), convolutional block attention module (CBAM), Efficient Channel Attention (ECA), and Coordinate Attention (CoordAttention) has further enhanced feature representation [5,6,7,8]. Nevertheless, achieving both real-time performance and high precision for small-object detection on embedded platforms remains a bottleneck. In the field of path planning, traditional algorithms such as Dijkstra, A*, and RRT have been optimized for search efficiency through heuristic functions and random sampling. For new strategies like IBA, researchers have attempted to improve its path optimization capabilities by integrating heuristic information and local replanning mechanisms, but its application in sports robots is still in the exploratory stage. Although basketball robot systems have evolved from early color-based detection to deep-learning-based visual perception combined with LiDAR for centimeter-level localization, the lack of joint optimization between perception and planning modules—whether based on traditional algorithms or emerging strategies—continues to limit overall system performance.
This study proposes a systematic optimization framework targeting the perception and decision-making bottlenecks of basketball robots. First, a multi-dimensional attention fusion visual model is constructed by embedding CoordAttention, ECA, MSCA, and LSKA modules into YOLO11. Through position encoding, channel interaction, and multi-scale feature fusion, the model enhances detection performance for both moving and small targets, achieving a 1.90% improvement in mAP@0.5 while maintaining a frame rate of 45 FPS. Second, an improved Dijkstra-based path planning algorithm is designed, which employs a binary heap structure to reduce time complexity to $O((V+E)\log V)$ and integrates a Euclidean heuristic function to minimize redundant node expansion, resulting in faster planning and an obstacle avoidance success rate above 95%. Compared with emerging strategies like IBA, this improved algorithm balances global optimality and real-time performance, which is more in line with the requirements of basketball robots for trajectory accuracy and response speed in structured competition environments.
Furthermore, a hardware–software co-designed basketball robot system is developed and implemented based on ROS 2 Humble Hawksbill, enabling modular integration of perception, planning, and control. The system employs an omnidirectional mobile chassis and multi-sensor fusion control, ensuring end-to-end latency below 100 ms.
The main contributions of this article are as follows:
  • Multi-dimensional attention-fused perception model YOLO11-MAF: CoordAttention, ECA, MSCA, and LSKA modules are simultaneously embedded into YOLO11 for the first time, enabling position-encoded, channel-interactive, and multi-scale feature fusion; as a result, mAP@0.5 increases by 1.90% while 45 FPS real-time performance is preserved.
  • Binary heap-enhanced Dijkstra planner: time complexity is reduced from $O(V^2)$ to $O((V+E)\log V)$ and Euclidean-heuristic pruning eliminates most redundant expansions; a novel local replanning mechanism limits replanning latency to <50 ms and yields >95% obstacle avoidance success, outperforming traditional algorithms in real-time performance and surpassing emerging strategies like IBA in path optimality.
  • Hardware–software co-design for basketball robots: a ROS 2 Humble-based modular architecture integrating an omnidirectional chassis and multi-sensor fusion is proposed, guaranteeing end-to-end latency below 100 ms for on-court deployment.
The remainder of this paper is organized as follows: Section 2 presents the research background and current developments. Section 3 details the robot architecture, detection enhancement, and path planning optimization. Section 4 integrates all modules into a complete robotic platform and evaluates system-level performance through end-to-end experiments. Finally, Section 5 summarizes the major findings and discusses potential directions for future research.

2. Related Works

2.1. Visual Recognition Method

In the field of robot visual perception, object detection algorithms have undergone a significant evolution from traditional methods to deep learning. Islam et al. proposed a comprehensive robotic system integrating kinematic solving, image processing, and industrial simulation, serving as an excellent demonstration for industrial automation applications [1]. Early methods based on hand-crafted features had limited generalization capability in complex scenarios, and the introduction of convolutional neural networks completely changed this situation [9,10,11]. The R-CNN series proposed by Girshick et al. pioneered a two-stage detection paradigm based on region proposals; the mAP reached 58.5% on the PASCAL VOC dataset, but the processing speed was only 5 FPS, making it difficult to meet real-time requirements [12]. The subsequent SSD algorithm uses multi-scale feature map prediction to increase the speed to 22 FPS, but its detection accuracy for small targets degrades noticeably [13].
The YOLO series of single-stage detectors demonstrates unique advantages in balancing speed and accuracy. During the evolution from YOLOv1 to YOLO11 [14], the network structure and training strategy have been continuously optimized: YOLOv4 reached up to 43.5% mAP on the COCO dataset, running at over 100 FPS on GPUs. YOLOv5 introduced CSPDarknet and PANet structures to build a flexible model architecture. YOLOv8 achieves a mAP of 37.3% on the COCO dataset with a latency of 0.99 ms on A100 TensorRT. The inference speed of the latest YOLO11 [15] on embedded platforms is much faster than that of the previous generation, and it also supports complex tasks such as rotated bounding-box detection.
The integration of attention mechanisms further improves model performance. The SE module proposed by Hu et al. enhances feature selectivity through channel attention and reduces the top-1 error rate by 1.5% on the ImageNet classification task [6]. Woo et al.'s CBAM combines channel and spatial attention, bringing a 2.5% mAP boost in object detection tasks [7]. The ECA module designed by Wang et al. achieves efficient channel interaction through lightweight one-dimensional convolution, reducing computational overhead while maintaining accuracy [8]. Radzi et al. introduced ECA into YOLOv12 by adding the module before each detection head, increasing mAP@0.5 by 2% for YOLOv12n [16]. The CoordAttention proposed by Hou et al. further improves AP by 1.1% on the COCO dataset compared to the SE module through positional information embedding [17], and has been widely used in various vision models. For example, Wei et al. combined CoordAttention with the C3 module in the YOLOv5 backbone to enhance feature representation for warehouse robots, improving detection accuracy and increasing the fidelity rate by 2.32% [18].

2.2. Path Planning Algorithm

Research on path planning algorithms has developed from basic graph search to modern optimization methods, each with its own characteristics. The classical Dijkstra algorithm guarantees global optimality but has a time complexity of $O(V^2)$, and its planning delay exceeds 5 s on a 100 × 100 grid map. The A* algorithm guides the search direction through heuristic functions, improving efficiency by 3–5 times in static environments, but has limited adaptability to dynamic environments.
Sampling-based methods such as RRT quickly find feasible paths through random exploration and are especially suitable for high-dimensional spaces, but the resulting path quality is poor [19]. Karaman et al.'s RRT* guarantees asymptotic optimality, yielding near-optimal path lengths at the cost of a significant increase in computational complexity. In recent years, reinforcement learning methods have also shown great potential.
Modern methods such as timed elastic bands (TEB) generate smooth trajectories through multi-constraint optimization and are widely used in ROS navigation stacks. The study by Rosmann et al. shows that TEB achieves good real-time performance in dynamic environments, but its parameter tuning is complex and it is prone to local optima [20]. Newer optimization-based methods such as MINCO, presented by Wang et al., have also been applied to path planning. MINCO constructs a class of parameterized minimum-control-effort trajectories under state-input constraints and achieves real-time performance on embedded platforms [21]. Although integer programming, model predictive control, and other methods have strong theoretical guarantees, they incur high computational overhead and are difficult to run in real time on embedded platforms.

2.3. Research Status of Basketball Robot

Basketball robots have attracted increasing attention in both the robotics and sports engineering communities. Early systems mainly relied on classical control and template vision. For instance, Comau’s Racer 7 industrial arm demonstrated 100% two-point accuracy within 8 m by manually tuning trajectory templates. However, template-based methods are sensitive to illumination and cannot adapt to dynamic scenes.
To improve adaptability, recent studies integrate deep learning for object detection and reinforcement learning (RL) for decision-making [22]. Zeng et al. [23] embedded an Instance-Batch Normalization module into YOLOv5s and achieved 97.05% bounding-box accuracy on a self-collected basketball dataset. Nevertheless, their pipeline still separates detection from motion planning. Zhang et al. [24] proposed an end-to-end network that fuses visual, motion, and distance cues via multi-head attention and uses Deep Q-Network (DQN) to learn shooting policies. Experiments on four public datasets showed 97.02% shooting accuracy and 95.36% recall, but the work focused only on static shooting skills and did not consider full-court motion control.
For motion generation under uncertainty, model-free RL has become the mainstream paradigm. Lan et al. [25] presented a multi-sensor-fused Q-learning framework for basketball robots and reported a 99.35% recognition rate and a 67.85% response speed on the Webots simulator. Xu et al. [26] combined an improved Q-learning algorithm with a fuzzy obstacle avoidance controller, reducing path-planning convergence time from 700 s to 250 s. Although these RL-based controllers are robust to sensor noise, they still treat perception and control as two independent stages, which may lead to sub-optimal policies in highly dynamic matches.
Despite these advances, existing systems still face major challenges in real-world competition environments. The vision module suffers from persistently high miss and false-positive rates when confronted with fast-moving balls launched at 5 m/s, cluttered backgrounds dominated by spectator stands, and distant small targets. The path planning module exhibits insufficient real-time performance in cluttered environments, where replanning delays can disrupt game rhythm. More importantly, vision and planning modules are often designed independently, lacking deep coordination, which restricts overall system performance. Typical issues include outdated planning information caused by visual latency, path generation that ignores perception uncertainty, and inconsistent data formats leading to integration difficulties.
At present, the core contradiction in basketball robot research lies in balancing real-time performance, accuracy, and robustness. Vision algorithms often sacrifice speed for precision, whereas path planning methods prioritize real-time performance at the expense of optimality. Insufficient system-level optimization leads to performance bottlenecks, particularly under competitive conditions. To address these challenges, this paper proposes a vision–planning co-optimization framework. This framework enhances detection efficiency through multi-dimensional attention fusion, improves planning real-time performance via binary heap optimization and heuristic strategies, and establishes a unified system integration scheme aimed at achieving overall performance improvement for basketball robot systems.

3. Materials and Methods

The basketball robot system represents a highly integrated platform that combines mechanical engineering, electronic technology, and intelligent algorithms. It is designed to perform collaborative tasks of target perception, autonomous navigation, and precise motion control in competitive environments. Focusing on the core requirements of high mobility, high perception accuracy, and high real-time control, this section presents the modular mechanical design, the layered electronic hardware architecture, and the modular software system development. In addition, a multi-sensor fusion mechanism and a motion control model are proposed to provide a stable hardware foundation and efficient software support for subsequent visual recognition and path planning algorithms, ensuring adaptability to complex task scenarios during competition.

3.1. Overall Design of the Basketball Robot System

3.1.1. Mechanical Structure Design

The mechanical structure adopts a functional modular design concept, which consists of three main subsystems: an omnidirectional chassis, a ball-picking mechanism, and a shooting mechanism, as illustrated in Figure 1 and Figure 2. By employing lightweight materials and precision transmission design, the structure achieves an optimal balance between rigidity and motion flexibility, satisfying the requirements for rapid obstacle avoidance, stable ball collection, and accurate shooting.
The omnidirectional chassis serves as the core motion platform. It adopts a four-wheel Mecanum configuration, where coordinated control of the wheel rotation speeds and directions enables three degrees of freedom on a two-dimensional plane—translation along the X- and Y-axes and rotation around the Z-axis. This configuration allows complex maneuvers such as zero-radius turning and oblique translation, supporting agile tactical adjustments in confined competition areas, such as slipping past obstacles while closing in rapidly on the ball.
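For reference, the sketch below illustrates how the three chassis degrees of freedom map to the four wheel speeds through standard Mecanum inverse kinematics. The wheel radius and half wheelbase/track values are placeholders rather than the robot's actual parameters, and the sign convention depends on roller orientation.

```python
def mecanum_inverse_kinematics(vx, vy, wz, r=0.076, lx=0.20, ly=0.25):
    """Illustrative inverse kinematics for a four-wheel Mecanum chassis.

    vx, vy: body-frame linear velocities (m/s); wz: yaw rate (rad/s).
    r, lx, ly are placeholder wheel radius and half wheelbase/track values.
    Returns wheel angular velocities (rad/s) for front-left, front-right,
    rear-left, rear-right; signs depend on roller orientation.
    """
    k = lx + ly
    w_fl = (vx - vy - k * wz) / r
    w_fr = (vx + vy + k * wz) / r
    w_rl = (vx + vy - k * wz) / r
    w_rr = (vx - vy + k * wz) / r
    return w_fl, w_fr, w_rl, w_rr
```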
The ball-picking mechanism is designed based on the physical characteristics of a standard No. 7 basketball. It features a combined structure of an active roller and a guide funnel, achieving automated ball collection and temporary storage through cooperative traction and guiding. This design eliminates the high inertia and structural complexity associated with traditional pneumatic grippers. A Sharp GP2Y0A21YK infrared distance sensor (Sharp Corporation, Osaka, Japan) is mounted at the bottom of the funnel. When the sensor detects that a basketball has entered the storage chamber, it outputs a high-level signal to the main controller, triggering the swing arm to reset to its horizontal position and completing one ball-picking operation.
The shooting mechanism employs a dual friction-wheel launching system, which imparts initial velocity and spin to the basketball through controlled friction. Combined with adjustable shooting angles and a guided pushing mechanism, this design ensures both shooting precision and sufficient range to cover the main shooting zones of the court, including the free-throw line (5.8 m) and the three-point line (6.75 m).

3.1.2. Electronic Hardware Architecture

The electronic hardware adopts a "master–slave" hierarchical architecture, which is divided into four functional modules: the master control computing layer, motion control layer, perception layer, and power supply layer. Each layer realizes data interaction through standardized communication interfaces (CAN, USB, and Ethernet), which ensures the functional independence of the modules as well as the overall real-time performance and scalability of the system, meeting the collaborative needs of visual recognition, path planning, and motion control. The connection of electronic components is shown in Figure 3.
The main control computing layer undertakes the core computing tasks of the system, including visual recognition, path planning, and task scheduling logic. It requires powerful parallel computing capability and rich peripheral interfaces, so the NVIDIA Jetson Orin Nano (NVIDIA Corporation, Santa Clara, CA, USA) embedded computing platform is selected as the core carrier. In terms of communication interfaces, the platform provides two Gigabit Ethernet ports, used to connect the LiDAR and the host computer, respectively. Four USB 3.0 interfaces connect the binocular camera and the infrared sensor; their 5 Gbps transmission rate meets the real-time transmission requirement for 1920 × 1080@60 FPS images. One CAN 2.0A/B interface is added through a PCIe expansion board, with a baud rate configurable from 250 kbps to 1 Mbps; it communicates with the motion control layer with a control command transmission delay of ≤2 ms.
The motion control layer is responsible for the underlying motor drive, sensor data acquisition, and execution of the safety protection logic. It needs to have high real-time performance and reliability. The STM32F407VET6 microcontroller (STMicroelectronics, Geneva, Switzerland) is selected as the core control unit. For peripheral interfacing, the controller integrates two CAN controllers that support CAN 2.0A/B and concurrently communicate with seven C620 electronic speed controllers. Four manage chassis motors and three govern ball-intake and shooting motors, with a communication period of 1 ms. Motor commands and status feedback travel in standard CAN data frames whose identifiers obey the RoboMaster protocol: chassis motors occupy IDs 0x201–0x204, the intake motor 0x205, and the shooting motors 0x206–0x207. In terms of sensor interface configuration, the controller is connected to the MPU6050 (TDK InvenSense, San Jose, CA, USA) inertial measurement unit (IMU) through the I2C interface. The sensor integrates a 3-axis accelerometer and a 3-axis gyroscope, with a sampling rate of 1 kHz, which can collect robot attitude data in real time for motion status monitoring and attitude correction. Four infrared proximity sensors, type E18-D80NK (DFRobot, Shanghai, China), interface via GPIO and detect obstacles from 2 cm to 30 cm. When any sensor reports a distance of 5 cm or less, the controller invokes an emergency deceleration that reduces motor speed below 0.1 m/s. A normally closed emergency stop button on GPIO immediately disables all motor drives and signals to the main control layer at the same time to ensure the safety of equipment and people.
To improve control accuracy, the motion control layer adopts a "speed–current" cascade PID control strategy: the outer speed loop computes the deviation between the target speed issued by the main control layer and the actual speed fed back by the motor encoder and outputs a current command; the inner current loop computes the deviation between the actual current fed back by the ESC and this current command and adjusts the motor driving voltage, realizing accurate speed tracking. At a set speed of 0.5 m/s, the speed control error is ≤0.02 m/s, which meets the accuracy requirements of path tracking.
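As a rough illustration of this cascade structure (not the firmware implementation), the sketch below chains a speed loop that outputs a current command into a current loop that outputs a voltage command. All gains, limits, and the 1 ms step are assumed values for demonstration only.

```python
class PID:
    """Simple PID controller with output clamping."""
    def __init__(self, kp, ki, kd, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, target, measured, dt):
        err = target - measured
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        out = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(self.out_min, min(self.out_max, out))


# Cascaded "speed-current" loop: the outer speed loop produces a current
# command, the inner current loop produces a voltage command for the ESC.
# Gains and limits below are illustrative assumptions.
speed_pid = PID(kp=8.0, ki=0.5, kd=0.0, out_min=-10.0, out_max=10.0)    # output in A
current_pid = PID(kp=1.2, ki=0.1, kd=0.0, out_min=-24.0, out_max=24.0)  # output in V

def control_step(target_speed, encoder_speed, measured_current, dt=0.001):
    current_cmd = speed_pid.step(target_speed, encoder_speed, dt)
    voltage_cmd = current_pid.step(current_cmd, measured_current, dt)
    return voltage_cmd
```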
The perception layer uses multiple types of sensors working together to collect environmental visual information, measure distances, and estimate global position and attitude, providing comprehensive environmental perception data so that the robot accurately recognizes its surroundings and its own position. It comprises four parts: vision sensors, LiDAR, a global positioning system, and a high-precision IMU.
The vision sensor adopts a binocular camera configuration, using a Logitech C922 Pro USB camera (Logitech, Lausanne, Switzerland). The two cameras are installed at the front end and top of the robot, respectively. The front unit carries a 3.6 mm lens delivering a 78° horizontal field of view and specializes in close-range basketball detection out to 5 m. The top unit employs a 6 mm lens with a 54° horizontal field of view, surveying the entire scene up to 10 m.
MS200 ToF lidar (ORADAR Technology Co., Ltd., Shenzhen, China) is selected for environmental obstacle detection and map construction. It communicates with the main control layer through the Ethernet interface, and the data update period is ≤200 ms. Obstacles with a diameter ≥5 cm can be identified. The obstacle detection accuracy rate is ≥98% under complex backgrounds such as auditorium and venue markings, providing accurate obstacle distribution information for path planning.
Global localization is handled by the OPS-9 optical positioning system, compensating for the absence of GPS signals indoors. A camera positioned 50 cm above the robot surveys the scene and recognizes 5 cm black circular landmarks within ±1 mm. A template-matching algorithm combined with perspective transformation converts image coordinates into the global frame at 10 Hz, yielding the robot pose (x, y, θ) with an uncertainty below 2 cm and providing the path planner with an absolute positional reference.
The BMI088 (Bosch Sensortec, Reutlingen, Germany) is selected as the high-precision IMU and is deployed in the main control layer for robot attitude correction and motion state estimation. The sensor integrates a 3-axis accelerometer and a 3-axis gyroscope and communicates with the main control layer through the I2C interface, with a data transmission delay of ≤5 ms. Fusing BMI088 data with OPS-9 positioning data effectively suppresses noise in the positioning data. After fusion, the heading angle error is ≤0.5°, providing stable attitude feedback for motion control.
The power stage delivers stable and reliable energy to the whole system. A 24 V, 15 Ah lithium–polymer pack serves as the main source and is partitioned through synchronous buck modules of ≥94% efficiency. A 19 V, 5 A rail feeds the NVIDIA Jetson Orin Nano, covering its 30 W peak demand. A 12 V, 3 A rail supplies the stereo cameras, LiDAR, and linear shooting actuator. A 5 V, 5 A rail powers the STM32 controller, infrared sensors, and intake motors. A 3.3 V, 2 A rail, regulated by an ultra-low-dropout LDO, supports the MPU6050, BMI088, and other low-power sensors. Under full load, with vision processing, path planning, and motor drive active simultaneously, the average dissipation is 25 W, giving ≥90 min of continuous operation. Recharging via a 24 V, 5 A charger takes ≤3 h, easily covering a 40 min match and daily training sessions.

3.1.3. Software System Architecture

The software system is built upon ROS 2 Humble Hawksbill and adopts a hierarchical modular architecture of “Perception–Decision–Execution.” Each layer communicates via ROS 2 Topics, Services, and Actions, ensuring inter-module decoupling and extensibility. The system achieves an end-to-end latency of ≤100 ms, fulfilling real-time control requirements.
The perception layer software is responsible for environmental data acquisition, preprocessing, and feature extraction, sequentially outputting target detection results, an environmental map, and the robot’s pose. It consists of three core submodules connected in series: visual processing, LiDAR processing, and localization fusion. Visual processing, implemented in PyTorch 2.8.0 with CUDA 12.8, utilizes an improved YOLO11 multi-dimensional attention fusion model. Optimized with TensorRT at FP16 precision, the inference speed is doubled. Images from the front and top cameras are captured synchronously, uniformly resized to 640 × 640, converted to BGR format, normalized to [0,1], and then fed into the network. The network outputs the category (confidence ≥ 0.5), pixel-level bounding boxes, and center coordinates in the robot’s body frame for basketballs, hoops, and obstacles. The detection frame rate is synchronized with the camera’s 60 Hz and supports online adjustment of thresholds and weights via a parameter server to adapt to lighting and scene variations. LiDAR processing is built on PCL: point clouds are first denoised using statistical filtering (k = 10, σ = 1.0) and then downsampled with a 0.05 m voxel grid. The ground plane is extracted by RANSAC using a 0.02 m distance criterion. Remaining points are grouped by Euclidean clustering at 0.3 m separation and a five-point minimum, yielding obstacle position, size, and count at 10 Hz. Simultaneously, a 20 m × 15 m 2D occupancy grid map with a resolution of 0.05 m is generated, distinguishing free (0), occupied (100), and unknown (−1) cells, updated at 5 Hz for path planning. The localization fusion module employs an extended Kalman filter (EKF) to tightly couple OPS-9 global positioning, BMI088 IMU, and wheel odometry. The state vector x = [ x , y , θ , v x , v y , ω ] T is predicted using a kinematic model and subsequently corrected by a nine-dimensional observation vector z . It ultimately outputs the global pose and motion state at 50 Hz, with a positional standard deviation error ≤0.02 m and an angular standard deviation error ≤0.5°.
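A minimal sketch of the EKF fusion step is given below, assuming a constant-velocity prediction model for the state $x = [x, y, \theta, v_x, v_y, \omega]^T$. The observation stacking (OPS-9 pose, odometry velocities, etc.) and the noise matrices are illustrative assumptions, not the system's tuned values.

```python
import numpy as np

class PoseEKF:
    """Minimal EKF sketch for x = [x, y, theta, vx, vy, omega]."""
    def __init__(self):
        self.x = np.zeros(6)
        self.P = np.eye(6) * 0.1
        self.Q = np.eye(6) * 1e-3          # process noise (assumed)

    def predict(self, dt):
        F = np.eye(6)
        F[0, 3] = F[1, 4] = F[2, 5] = dt   # position/heading integrate velocity
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z, H, R):
        """z: stacked measurement (e.g., 9-D), H: observation matrix, R: noise."""
        y = z - H @ self.x                 # innovation
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ H) @ self.P
```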
The decision layer software serves as the intelligent core of the system, undertaking task planning and path planning functions. It formulates macroscopic task workflows and microscopic motion paths based on environmental information from the perception layer, comprising a task planning module and a path planning module.
The task planning module is built as a finite state machine with seven states: Standby, Search for Target, Move to Target, Pick Up Ball, Move to Shooting Zone, Shoot, and Avoid Obstacle. Transitions are triggered by events such as target detection, task completion, or obstacle warnings. After receiving a Start Task request from the host computer, the robot leaves Standby and enters Search for Target. Once a basketball is detected with confidence above 0.6, the system proceeds to Move to Target, issuing a position request to the path planner. If the LiDAR reports an obstacle closer than 0.5 m during approach, the robot switches to Avoid Obstacle and resumes its prior state when the path is clear. Progress reports—Picking Up Ball, Shooting Completed, and similar—are sent to the host computer through ROS 2 actions, enabling real-time monitoring and debugging.
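The sketch below mirrors the seven-state machine described above. The event names and the resume logic for obstacle avoidance are illustrative assumptions, not the actual ROS 2 implementation.

```python
from enum import Enum, auto

class State(Enum):
    STANDBY = auto()
    SEARCH_TARGET = auto()
    MOVE_TO_TARGET = auto()
    PICK_UP_BALL = auto()
    MOVE_TO_SHOOTING_ZONE = auto()
    SHOOT = auto()
    AVOID_OBSTACLE = auto()

# (current state, event) -> next state; event names are illustrative
TRANSITIONS = {
    (State.STANDBY, "start_task"): State.SEARCH_TARGET,
    (State.SEARCH_TARGET, "ball_detected"): State.MOVE_TO_TARGET,  # confidence > 0.6
    (State.MOVE_TO_TARGET, "target_reached"): State.PICK_UP_BALL,
    (State.PICK_UP_BALL, "ball_secured"): State.MOVE_TO_SHOOTING_ZONE,
    (State.MOVE_TO_SHOOTING_ZONE, "zone_reached"): State.SHOOT,
    (State.SHOOT, "shot_done"): State.SEARCH_TARGET,
}

class TaskPlanner:
    def __init__(self):
        self.state = State.STANDBY
        self.resume_state = None

    def on_event(self, event):
        if event == "obstacle_near" and self.state != State.AVOID_OBSTACLE:
            # obstacle closer than 0.5 m: remember the current state, then avoid
            self.resume_state = self.state
            self.state = State.AVOID_OBSTACLE
        elif self.state == State.AVOID_OBSTACLE and event == "path_clear":
            self.state = self.resume_state or State.SEARCH_TARGET
        else:
            self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```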
The path planning module executes an improved Dijkstra algorithm, subscribing to the robot’s current pose, occupancy grid map, and service-requested target position, and subsequently publishing a sequence of waypoints (one waypoint per 0.1 m) to the waypoint sequence topic. The algorithm workflow comprises the following: ➀ Map Preprocessing: converting the occupancy grid map into a graph structure where edge weights between adjacent grids are set to 1 (orthogonal) or $\sqrt{2}$ (diagonal), with obstacle grids marked impassable; ➁ Priority Queue Initialization: initializing nodes to be expanded using a binary heap structure, setting the source node (current pose) cost to 0 and all other nodes to infinity; ➂ Node Expansion: selecting optimal nodes for expansion based on the heuristic function $f(n) = g(n) + h(n)$ (where $g(n)$ is the actual cost from the source node and $h(n)$ is the Euclidean distance heuristic cost), and updating the costs of adjacent nodes; ➃ Path Generation: after reaching the target node, the module traces back through the predecessors to reconstruct the path and smooths it with a quadratic Bézier curve using three control points and a tension factor of 0.5 before publication. Dynamic replanning is enabled whenever the LiDAR finds a new obstacle within 0.3 m of the current trajectory. Only the 3 m neighborhood around the obstacle is updated, and a new local path is produced within 50 ms.
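For the path-generation step, one possible interpretation of the quadratic Bézier smoothing with a tension factor is sketched below; exactly how the tension weights the middle control point in the actual planner is an assumption here.

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, tension=0.5, n_points=20):
    """Sketch of quadratic Bezier smoothing: p0 and p2 are segment endpoints,
    p1 is the middle control point, pulled toward the chord midpoint by the
    tension factor (tension = 0 reduces to the straight line p0 -> p2)."""
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    mid = 0.5 * (p0 + p2)
    ctrl = mid + tension * (p1 - mid)
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * ctrl + t ** 2 * p2
```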
The execution layer software translates decision-layer commands into mechanical actions, collaboratively performing path tracking, ball collection, and shooting tasks through motion control and mechanism control modules. The motion control module receives waypoint sequences and the current robot pose at 100 Hz, computes target wheel speeds via inverse kinematics of the Mecanum wheels, and employs model predictive control (MPC) to solve quadratic cost functions, minimizing position and orientation errors over a 50 ms receding horizon and generating optimal velocity commands. Simultaneously, using actual wheel speed feedback from the motor encoders, closed-loop regulation is achieved through cascaded "velocity–current" PID control, ensuring straight-line tracking errors ≤2 cm and curvature tracking errors <1.5°. The mechanism control module comprises ball-picking and shooting submodules: the ball-picking controller accepts pickup requests from the task planner and commands the swing motor to rotate the arm to 90°, where it touches the floor. Once the infrared sensor confirms ball contact, the arm is driven back to 0° and the result is returned through the service response. The shooting controller receives launch requests and sets the friction-wheel speed according to the required distance; for example, the free-throw distance of 5.8 m corresponds to 3000 rpm. A linear pusher then feeds the ball into the friction gap, and after launch, the shooting status is reported. Both subsystems include fault recovery: if a motor stalls and the current exceeds 15 A, the motion is halted immediately and an alarm message is published to protect the mechanism.
The software system was developed in C++ and Python 3.12 using ROS 2 Humble on an ASUS GV301RE laptop (Taipei, Taiwan) with Ubuntu 22.04 LTS. It is deployed and runs on an NVIDIA Jetson Orin Nano with a 6-core ARM Cortex-A78AE CPU at 1.9 GHz, 8 GB LPDDR5 RAM, and a 1024-core NVIDIA Ampere GPU at 1 GHz. The system leverages CUDA 11.4 and cuDNN 8.2 for GPU acceleration, with TensorRT 8.4 optimizing deep learning inference.

3.2. YOLO11-MAF: The Improved Basketball Target Recognition Algorithm

As shown in Figure 4, the proposed YOLO11-MAF adopts YOLO11n as the baseline model because it strikes an optimal balance between lightweight design and detection efficiency. Built upon a one-stage architecture, its backbone stacks C3K2 blocks with convolution layers to extract multi-scale features, outputting maps of 80 × 80, 40 × 40, and 20 × 20 pixels. The neck employs an FPN+PAN bidirectional fusion structure: global context flows top-down while detail cues are replenished bottom-up, reinforcing multi-scale representation. Finally, a decoupled head separates classification and regression into two independent branches, eliminating the optimization conflicts inherent in coupled heads and allowing each task to be supervised by its own loss. In this figure, the modules we added are marked in red.
The added modules assign attention weights to the feature maps and adapt automatically to the original number of channels, so the related hyperparameters do not need to be configured manually. The parameter count of YOLO11n is only 3.2 M, much lower than the 7.5 M of YOLO11s. On the NVIDIA Jetson Orin Nano platform, an inference frame rate of 45 FPS can be achieved, which meets the real-time detection requirement of basketball robots (≥30 FPS). At the same time, the model's pre-trained weights on the ImageNet dataset provide a good basis for feature transfer to the basketball target detection task, greatly shortening the training cycle and reducing the risk of over-fitting. The embedded modules are shown in Figure 5, Figure 6, Figure 7 and Figure 8.

3.2.1. CoordAttention Module

The CoordAttention module is embedded in the low-level feature map (40 × 40 resolution) of the Backbone layer. The module enhances the positioning accuracy of the model for moving targets by separating horizontal and vertical position information. The workflow is divided into three steps: Firstly, the input feature map is globally averaged in the horizontal and vertical directions, producing two one-dimensional feature vectors. The horizontal pooling formula is $z_c^h(h) = \frac{1}{W}\sum_{i=0}^{W-1} x_c(h, i)$ and the vertical pooling formula is $z_c^w(w) = \frac{1}{H}\sum_{j=0}^{H-1} x_c(j, w)$, where $x_c(h, i)$ is the pixel value at row $h$ and column $i$ of channel $c$ of the input feature map, and $H$ and $W$ are the height and width of the feature map. Secondly, the two one-dimensional vectors are concatenated, the channel dimension is compressed by a 1 × 1 convolution, and the position attention weights are generated by the Sigmoid activation function. Finally, the weights are applied to the horizontal and vertical directions of the original feature map to realize the fusion of location information and channel features. This module enables the model to accurately locate the target center in motion-blurred regions and reduces the positioning deviation of moving targets from 10 pixels to less than 3 pixels.
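A minimal PyTorch sketch of a coordinate attention block following the three steps above is shown below; the reduction ratio and layer choices are assumptions and may differ from the YOLO11-MAF implementation.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Minimal coordinate attention sketch (assumed reduction ratio of 32)."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                      # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = torch.cat([xh, xw], dim=2)           # concatenate along the spatial axis
        y = self.act(self.bn(self.conv1(y)))     # 1x1 compression of the channel dim
        yh, yw = torch.split(y, [h, w], dim=2)
        yw = yw.permute(0, 1, 3, 2)              # back to (B, mid, 1, W)
        ah = torch.sigmoid(self.conv_h(yh))      # per-row attention weights
        aw = torch.sigmoid(self.conv_w(yw))      # per-column attention weights
        return x * ah * aw                       # apply both directional weights
```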

3.2.2. ECA Module

The ECA module is embedded in the high-level feature map (20 × 20 resolution) of the Backbone layer to strengthen the semantic dependence between channels and improve the ability to distinguish the target from the background. The ECA module abandons the dimension reduction operation of the traditional SE module and directly performs a 1D convolution on the channel descriptor after global average pooling. The channel dependence is modeled by local cross-channel interaction, and its weight update formula is $w_c = \sigma(\mathrm{Conv1D}(\mathrm{GAP}(X)))$, where $\mathrm{GAP}(X)$ is the global average pooling operation and $\sigma$ is the Sigmoid activation function. The convolutional kernel size is adaptively determined by the number of channels, increasing logarithmically as the channel count grows. This design avoids the information loss caused by dimension reduction, so the model can focus more accurately on basketball-related channel features, reducing the false detection rate from 20% to less than 8% in complex backgrounds.
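The sketch below implements this idea in PyTorch, deriving the kernel size from the channel count with the commonly used $\gamma = 2$, $b = 1$ mapping; these constants are assumptions and not necessarily the values used in YOLO11-MAF.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """ECA sketch: 1D convolution over channel descriptors after GAP, with an
    adaptive, odd kernel size that grows logarithmically with the channels."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1                 # force an odd kernel size
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        y = x.mean(dim=(2, 3))                    # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel interaction
        w = torch.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w                              # reweight the input channels
```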

3.2.3. MSCA Module

The MSCA module is embedded in the C3 module of the Neck layer, and features are extracted in parallel through multi-scale convolution kernels to enhance adaptability to targets of different sizes. MSCA consists of three parallel branches equipped with 3 × 3, 5 × 5, and 7 × 7 kernels, respectively. Each branch is followed by BatchNorm and SiLU activation, producing scale-specific attention maps that are element-wise multiplied with the input feature map. The resulting multi-scale features are then fused by a 1 × 1 convolution and forwarded to subsequent fusion layers. This simultaneously highlights fine details for small instances (≈40 × 40 px) and global cues for large ones (≈200 × 200 px).
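A simplified PyTorch sketch of this branch layout is given below; the channel bookkeeping of the 1 × 1 fusion is an assumption and may differ from the actual module.

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Sketch of the multi-scale attention layout described above:
    3x3/5x5/7x7 branches with BatchNorm+SiLU, sigmoid attention per branch,
    and a 1x1 convolution fusing the three weighted feature maps."""
    def __init__(self, channels):
        super().__init__()
        def branch(k):
            return nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
        self.branches = nn.ModuleList([branch(k) for k in (3, 5, 7)])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        # each branch yields a scale-specific attention map applied to the input
        outs = [x * torch.sigmoid(b(x)) for b in self.branches]
        return self.fuse(torch.cat(outs, dim=1))
```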

3.2.4. SPPF-LSKA Module

The SPPF module of the Neck layer is replaced by the SPPF-LSKA structure, and the global receptive field is expanded through the LSKA module to enhance the feature capture ability for long-range small targets. SPPF-LSKA retains the multi-scale pooling function of SPPF (5 × 5, 9 × 9, and 13 × 13 maximum pooling). The LSKA module incorporates a 31 × 31 large separable convolution kernel into the pooled feature map, and the global attention weight is calculated by horizontal and vertical convolution interaction. The specific process is as follows: Firstly, a horizontal convolution (1 × 31) and a vertical convolution (31 × 1) are performed on the pooled feature map to obtain feature maps in the two directions. Secondly, the two feature maps are concatenated and the global attention weight is generated by the Sigmoid activation function. Finally, the weight is multiplied by the original pooled feature map to complete the feature weighting. This structure extends the receptive field of the model from the 13 × 13 of the original SPPF to 31 × 31, and the detection rate of small targets beyond 8 m is increased by 25%.
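The sketch below approximates this structure in PyTorch; the 1 × 1 channel-reduction and fusion convolutions are assumptions added to keep tensor shapes consistent, and the exact SPPF channel handling in YOLO11 is simplified.

```python
import torch
import torch.nn as nn

class SPPF_LSKA(nn.Module):
    """Sketch of the SPPF-LSKA idea: multi-scale max pooling followed by a
    31x31 separable large-kernel attention (1x31 and 31x1 depthwise branches,
    concatenation, 1x1 fusion, sigmoid gating)."""
    def __init__(self, channels):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (5, 9, 13)]
        )
        self.reduce = nn.Conv2d(4 * channels, channels, kernel_size=1)  # input + 3 pooled maps
        # depthwise separable large-kernel pair (horizontal and vertical)
        self.conv_h = nn.Conv2d(channels, channels, (1, 31), padding=(0, 15), groups=channels)
        self.conv_v = nn.Conv2d(channels, channels, (31, 1), padding=(15, 0), groups=channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        y = self.reduce(torch.cat([x] + [p(x) for p in self.pools], dim=1))
        fh = self.conv_h(y)                                   # 1x31 horizontal branch
        fv = self.conv_v(y)                                   # 31x1 vertical branch
        attn = torch.sigmoid(self.fuse(torch.cat([fh, fv], dim=1)))
        return y * attn                                       # weight the pooled features
```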

3.3. The Improved Dijkstra Algorithm

To overcome the limitations of the traditional Dijkstra algorithm, the improved version introduces a binary heap to optimize the priority queue, thereby reducing computational complexity. It further integrates heuristic information to guide the search direction, achieving a coordinated enhancement of optimality, efficiency, and real-time adaptability to dynamic environments [27,28].

3.3.1. Priority Queue Optimization Based on Binary Heap

To address the efficiency bottleneck caused by linear scanning when selecting the minimum-cost node in the traditional algorithm, the improved algorithm adopts a binary heap (min-heap) data structure to implement the priority queue, thereby optimizing the time complexity of node selection and update operations. A binary heap is a complete binary tree that satisfies the property that "the cost of each parent node is less than or equal to that of its child nodes." It supports both extraction of the minimum element and node insertion in $O(\log V)$ time [29,30]. The core design logic is as follows:
The elements in the priority queue are defined as $(f(n), g(n), n)$, where $n$ denotes the node index, $g(n)$ represents the actual cost from the source node to node $n$ (grid distance, where the cost of horizontal/vertical moves is 1 and that of diagonal moves is $\sqrt{2}$), and $f(n)$ is the node priority evaluation function (defined later in conjunction with heuristic information). During initialization, the source node $s$ is assigned $g(s) = 0$, while all other nodes are assigned $g(n) = \infty$. All nodes are inserted into the binary heap in the form of $(f(n), g(n), n)$, with the heap top corresponding to the node with the smallest $f(n)$.
During node expansion, the node $u$ with the minimum cost is extracted from the heap top. If $u$ is the target node, the search terminates; otherwise, the algorithm traverses its eight neighboring nodes $v$ (including diagonal moves to enhance motion flexibility). If $v$ corresponds to an obstacle grid (occupancy value ≥ 80) or lies outside the map boundary, it is skipped. The new cost of node $v$ is computed as $g_{\mathrm{new}} = g(u) + \mathrm{cost}(u, v)$, where $\mathrm{cost}(u, v)$ represents the grid cost from $u$ to $v$. If $g_{\mathrm{new}} < g(v)$, then $g(v)$ is updated to $g_{\mathrm{new}}$, $f(v)$ is recalculated, and node $v$ is reinserted into the heap, which is subsequently adjusted to maintain the heap property.
With binary heap optimization, the time complexity of node extraction and insertion is reduced from $O(V)$ to $O(\log V)$, and the overall algorithmic complexity is optimized from $O(V^2)$ to $O((V+E)\log V)$ under the usual adjacency list/heap implementation assumptions [30].
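A compact Python sketch of this heap-based expansion (including the heuristic term introduced in the next subsection) is shown below, assuming a boolean obstacle grid and 8-connected moves; it is illustrative rather than the deployed planner.

```python
import heapq
import math

def improved_dijkstra(grid, start, goal, lam=1.0):
    """Heap-based grid search sketch: grid is a 2-D list where True marks an
    obstacle cell, start/goal are (row, col) tuples, lam is the heuristic
    weight. Move costs: 1 orthogonal, sqrt(2) diagonal."""
    rows, cols = len(grid), len(grid[0])
    h = lambda n: math.hypot(n[0] - goal[0], n[1] - goal[1])  # Euclidean heuristic
    g = {start: 0.0}
    parent = {}
    heap = [(lam * h(start), 0.0, start)]        # entries are (f, g, node)
    while heap:
        f, gu, u = heapq.heappop(heap)
        if u == goal:                            # reconstruct path by backtracking
            path = [u]
            while u in parent:
                u = parent[u]
                path.append(u)
            return path[::-1]
        if gu > g.get(u, math.inf):
            continue                             # stale heap entry, skip
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                v = (u[0] + dr, u[1] + dc)
                if not (0 <= v[0] < rows and 0 <= v[1] < cols) or grid[v[0]][v[1]]:
                    continue                     # out of bounds or obstacle cell
                gv = gu + (math.sqrt(2) if dr and dc else 1.0)
                if gv < g.get(v, math.inf):
                    g[v] = gv
                    parent[v] = u
                    heapq.heappush(heap, (gv + lam * h(v), gv, v))
    return None                                  # no feasible path found
```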

3.3.2. Heuristic Information Fusion and Priority Optimization

To overcome the blind search behavior of the traditional algorithm, the improved version introduces a Euclidean distance-based heuristic function $h(n)$ and constructs the node priority evaluation function as
$f(n) = g(n) + \lambda \cdot h(n)$
where λ denotes the heuristic weight coefficient. This incorporation of heuristic information guides the search direction and effectively reduces unnecessary node expansions.
The Euclidean distance heuristic estimates the straight-line distance between node $n(x_n, y_n)$ and the target node $t(x_t, y_t)$, calculated as
$h(n) = \sqrt{(x_n - x_t)^2 + (y_n - y_t)^2}$
This function satisfies the admissibility condition, i.e., $h(n) \le h^*(n)$, where $h^*(n)$ is the actual minimal cost from node $n$ to the target, thus ensuring that the algorithm still produces a globally optimal path. Compared with the Manhattan distance (which is suitable for grids allowing only horizontal and vertical moves), the Euclidean distance better reflects the omnidirectional motion characteristics of basketball robots, providing a more accurate estimation of the actual distance to the target and reducing invalid searches caused by heuristic errors.
To further adapt to dynamic obstacle density variations in basketball scenarios, a dynamic heuristic weight adjustment mechanism is introduced. The obstacle density ρ (number of obstacle cells divided by total cells) within the current planning area is detected in real time using LiDAR, and the λ value is adjusted adaptively as follows:
  • When ρ > 25% (high obstacle density, e.g., multi-robot intersection areas), λ = 1.2 to strengthen heuristic guidance toward the target and reduce invalid expansions in obstacle regions;
  • When ρ < 15% (low obstacle density, e.g., near field boundaries), λ = 0.8 to prioritize path optimality;
  • When 15% ≤ ρ ≤ 25% (moderate density), λ = 1.0 to balance efficiency and optimality.
Adaptive and weighted heuristic strategies (including Weighted A* and Adaptive A*) have been studied to trade off solution quality and search effort, and incremental/replanning algorithms (e.g., D* Lite, Adaptive A*) address dynamic environments typical in mobile robot navigation [27,31,32,33]. The obstacle density ρ is updated in synchronization with the LiDAR scanning frequency (10 Hz), ensuring that heuristic weight adjustment responds in real time to environmental changes. After integrating the heuristic information, the algorithm preferentially expands nodes that are both low-cost and closer to the target, significantly reducing the proportion of redundant node expansions [28].
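The density-dependent weight selection described above can be summarized by a small helper such as the following sketch.

```python
def heuristic_weight(obstacle_density):
    """Select the heuristic weight from the obstacle density rho, i.e., the
    fraction of obstacle cells in the current planning area."""
    if obstacle_density > 0.25:
        return 1.2   # dense scenes: push the search toward the goal
    if obstacle_density < 0.15:
        return 0.8   # sparse scenes: favour path optimality
    return 1.0       # moderate density: balance efficiency and optimality
```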

4. Experiments and Results

4.1. Visual Recognition Performance Test

4.1.1. Dataset

The open-source dataset used in this paper is available at https://github.com/tranvietcuong03/Basketball_Detection/tree/master/Basketball, accessed on 1 January 2024. The dataset comprises 8521 training images, 812 validation images, and 406 test images, with examples shown in Figure 9. Spanning indoor and outdoor arenas under varying illuminations, the dataset captures authentic game plays and casual moments, annotated with five labels: basketball, made, person, rim, and shoot. Pronounced variations in color, scale, occlusion, blur, and noise drive the network to mine highly discriminative features, substantially improving cross-scene generalization.

4.1.2. Validation and Ablation Experiment Design

To systematically validate the effectiveness of the improved algorithm, an ablation experiment is designed. Training is conducted on a workstation equipped with an RTX 4090, and deployment is performed on the actual robotic hardware platform, an NVIDIA Jetson Orin Nano (20 TOPS). Using YOLO11-Nano as the performance baseline, the contribution or degradation of each module in terms of accuracy and speed is evaluated individually. Both training and inference use a uniform input size of 640 × 640 to ensure rigorous and comparable experimental results.
Model performance metrics typically encompass two primary aspects: detection accuracy and detection speed. Detection accuracy metrics include accuracy, precision, recall, F1 score, average precision (AP), and mean average precision (mAP) across multiple classes. These metrics can be collectively visualized through precision (P) curves, recall (R) curves, and precision–recall (P-R) curves. The specific formulas for these metrics are as follows:
$P = \frac{TP}{TP + FP} \times 100\%$
$R = \frac{TP}{TP + FN} \times 100\%$
$AP = \int_0^1 P(R)\,dR$
$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i$
For the performance metrics of the YOLO model, this article adopts an intersection-over-union threshold of IoU = 0.5 and calculates TP, FN, FP, and TN accordingly. The mAP computed at IoU = 0.5 is denoted as mAP@0.5.
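For reference, the sketch below computes precision, recall, and an all-point interpolated approximation of $AP = \int_0^1 P(R)\,dR$ from accumulated detection counts; it assumes the precision–recall points are already sorted by increasing recall and is not the exact evaluation script used here.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from accumulated counts (IoU >= 0.5 matching assumed upstream)."""
    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return p, r

def average_precision(precisions, recalls):
    """All-point interpolated AP approximating the integral of P over R."""
    r = np.concatenate(([0.0], np.asarray(recalls, dtype=float), [1.0]))
    p = np.concatenate(([1.0], np.asarray(precisions, dtype=float), [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])      # monotone precision envelope
    return float(np.sum(np.diff(r) * p[1:]))

def mean_average_precision(ap_per_class):
    """mAP@0.5 is the mean of per-class APs computed at IoU = 0.5."""
    return float(np.mean(ap_per_class))
```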
The models are numbered in binary and each innovative module is activated and deactivated one by one according to Table 1, forming a systematic ablation experiment. The SPPF-LSKA module is directly compared with yolo11n and does not participate in the ablation experiment design.

4.1.3. Validation and Ablation Experiment Results

The results of the ablation experiment are shown in Table 2. As illustrated in Figure 10, the original data are transformed into logarithmic scale to facilitate clearer observation and analysis.
The yolo11-m7 model shows a comprehensive performance improvement over the original YOLO11n model, confirming that replacing SPPF with SPPF-LSKA has a positive impact on the model.
From the comparison in Table 2, the yolo11-m0 model demonstrates the most substantial performance improvement and excels in all evaluation metrics. It achieves a precision of 88.8%, recall of 86.9%, mAP@0.5 of 92.0%, and mAP@0.5:0.95 of 67.6%. Compared to the baseline model, precision increased by 2.31%, recall by 3.02%, mAP@0.5 by 1.90%, and mAP@0.5:0.95 by 1.23%. This highlights the significant advantages of the ECA, CoordAttention, and MSCA modules in enhancing model performance.
To comprehensively quantify the performance gains of the proposed model, we benchmark its precision–recall curve against that of the vanilla YOLO11n, corroborating its superiority in detection accuracy.
As shown in Figure 11, the mAP@0.5 metrics for all categories exhibit improvements. Notably, the primary 'basketball' category rises from 0.880 to 0.898 and 'shoot' from 0.865 to 0.904, while the curve for the 'made' class is markedly more pronounced than that of the original model. These enhancements underscore the superior performance of the improved model, both qualitatively and quantitatively.

4.2. Experimental Results of Path Planning Algorithms

To validate the planning capability of the improved Dijkstra algorithm in complex environments, we benchmarked it against the classical Dijkstra and A* algorithms on randomly generated 300 × 300 occupancy grids. Obstacle densities ρ ∈ {0.00, 0.10, 0.20, 0.30} were varied systematically, and each configuration was run independently 30 times. The average metrics are reported in Table 3.
As shown in Table 3, the improved Dijkstra algorithm and the A* algorithm produce nearly identical path lengths, both achieving globally optimal routes. However, the improved algorithm outperforms A* in terms of runtime for most obstacle densities, with an average efficiency improvement of approximately 8.2–16.2%, demonstrating the robustness of the improved Dijkstra algorithm under increasing obstacle densities.
Figure 12 illustrates the runtime trends of the two algorithms under varying obstacle densities. It can be observed that both algorithms exhibit increasing runtime as obstacle density rises. The improved Dijkstra algorithm consistently maintains lower computational costs, indicating superior efficiency and scalability while preserving path optimality.
Beyond theoretical complexity reduction, these results clearly indicate the computational efficiency gained through the proposed priority queue and heuristic enhancements. Within a reasonable density range, for identical planning tasks and identical path optimality, the improved Dijkstra algorithm consistently requires fewer milliseconds of computation than A*, reflecting its lower overhead in node expansions and priority updates. This gap remains stable across densities, highlighting that the improved algorithm not only scales better in theory but also delivers measurable real-time performance gains in practical execution.
To further demonstrate the effectiveness of the optimization, the traditional Dijkstra algorithm was independently tested. On a 100 × 100 grid, its average planning time was approximately 2.77 s (Table 4), whereas the improved algorithm required only about 124 ms on a larger 300 × 300 map—over 20 times faster. This indicates that the combination of binary heap optimization and heuristic guidance significantly reduces the time complexity from $O(V^2)$ to $O((V+E)\log V)$, effectively meeting the requirements of real-time path planning.
The core performance improvement arises from two key factors: (1) The binary heap optimization substantially decreases the time complexity of node selection; (2) The dynamic-weight heuristic focuses the search on effective regions. The synergy between these two mechanisms minimizes computational overhead. Under a 20% obstacle density (a typical competition scenario), the improved algorithm achieves a planning time of approximately 123 ms. Considering the additional perception and recognition delays in the full system pipeline, the end-to-end latency is lower than 150 ms. This response time is adequate for basketball robot applications, where robot motion speeds (1–1.5 m/s) lead to only 15–22.5 cm displacement within 150 ms. Such motion remains well within the tolerance of the sensing–planning–control loop, and reactive avoidance modules (running at sub-10 ms cycles) can further compensate for short-term disturbances. Therefore, the proposed planning framework fully satisfies real-time decision-making requirements and leaves sufficient computational margin for dynamic obstacle avoidance and high-level tactical behaviors.

4.3. End-to-End System Experimental Results

To verify the overall performance of the proposed basketball robot system, an end-to-end experimental platform was constructed in a simulated environment. The setup integrates the robot hardware system, ROS 2 Humble software (Open Robotics, Mountain View, CA, USA) framework, and multi-sensor fusion module. The testing field includes basketballs and volleyballs, serving as target and interference objects, respectively, as illustrated in Figure 13. The experiment consists of two submodules: passing and shooting. Each submodule contains three rounds, with 60 testing iterations per round. Key metrics recorded include recognition success rate, passing/shooting success rate, path planning time, end-to-end latency, interference avoidance efficiency, and return-to-start completion rate. All experiments were repeated under varying indoor lighting conditions to ensure statistical robustness.

4.3.1. Passing Phase Experimental Results

In the passing phase, the robot autonomously identifies and grasps the target ball (basketball), avoids interference balls (volleyballs), and delivers passes to the designated area. Three rounds were simulated: in Round 1, the robot starts from the initial position carrying a ball, performs the pass, and retrieves another ball from the midline area; Rounds 2 and 3 require autonomous retrieval of balls distributed across the half-court, three-point line, and interior zones for continued passing operations. The robot employs the improved YOLO11 vision model, enhanced by CoordAttention and LSKA modules, to improve small-object detection accuracy, particularly for distant balls. Path planning uses an improved Dijkstra algorithm with binary heap optimization and heuristic guidance, generating collision-free trajectories and returning to the start position after each round.
As the experimental results show in Table 5, 60 independent trials were conducted per round, totaling 180 attempts, with the basketball recognition success rate reaching 97.8% and the missed detection rate below 3%, even in distant three-point regions. The average path planning time was 124.7 ms, representing a 22.4% reduction compared with the traditional Dijkstra algorithm. The interference avoidance success rate achieved 100%, with an average of 0.2 contacts per trial. The passing completion rate reached 96.7%, of which 90% of passes accurately reached the designated area, and the end-to-end delay remained within 150 ms. Under lighting variations and interference scenarios, the system maintained strong robustness, with only 3.3% of tasks requiring replanning due to background disturbance. The return-to-start success rate was 100%, with an average remaining time of 1.2 s.

4.3.2. Critical Discussion of Passing Performance

The high passing completion rate (96.7%) and near-perfect interference avoidance (100%) demonstrate the effectiveness of the integrated perception–planning–control pipeline under structured indoor conditions. However, this performance is contingent upon several favorable assumptions: consistent lighting, limited dynamic interference, and precise initial localization. The low replanning rate (3.3%) suggests that environmental disturbances are currently modeled as isolated events rather than continuous challenges—real basketball environments involve persistent visual clutter, fast-moving opponents, and unpredictable occlusions, which may significantly increase replanning frequency and degrade real-time responsiveness.
Furthermore, while the average planning time of 124.7 ms meets real-time requirements, it leaves only a narrow margin for scalability in multi-robot scenarios where communication latency and coordination overhead must be considered. The 90% accuracy in reaching designated areas reflects good trajectory tracking but does not account for timing precision—an essential factor in coordinated passing. For instance, a pass arriving 0.5 s late may be technically accurate but tactically ineffective.
These results indicate that the system excels in execution-level tasks but remains limited in strategic adaptability. Future work should introduce adversarial testing (e.g., moving defenders) and evaluate temporal coherence metrics (e.g., pass synchronization error) to better assess competitive readiness.

4.3.3. Shooting Phase Experimental Results

In the shooting phase, the robot grasps the target ball (volleyball), navigates autonomously to the shooting zone, avoids interference balls, and performs the shooting action using a dual friction-wheel launcher. The experiment includes three rounds: in Round 1, the robot starts from the initial position carrying one ball, shoots, and retrieves another from the midline; Rounds 2 and 3 involve balls distributed across the half-court, three-point line, and inner regions for long-range shooting beyond the three-point line. The system must complete multiple shots within a limited time to assess both execution speed and stability. The vision module, integrated with ECA and MSCA attention mechanisms, enhances the tracking of fast-moving volleyballs and boundary precision during shooting. The local replanning mechanism in the path planner adapts to environmental variations and optionally returns the robot to its starting position.
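As a rough, drag-free illustration of how launcher commands relate to shot distance, the projectile equations can be inverted to obtain the required ball exit speed and a corresponding friction-wheel angular speed. The launch angle, release-to-hoop height difference, wheel radius, and transfer factor k below are assumed example values; the actual system regulates the wheels in closed loop rather than with this open-loop calculation.

import math

G = 9.81  # gravitational acceleration, m/s^2

def launch_speed(d, dh, theta_deg):
    """Exit speed needed to cover horizontal distance d (m) with height gain dh (m)
    at launch angle theta, from drag-free projectile motion."""
    th = math.radians(theta_deg)
    denom = 2.0 * math.cos(th) ** 2 * (d * math.tan(th) - dh)
    if denom <= 0:
        raise ValueError("launch angle too flat to reach the target height")
    return math.sqrt(G * d * d / denom)

def wheel_speed(v_ball, wheel_radius=0.05, k=0.85):
    """Required friction-wheel angular speed (rad/s), assuming the ball leaves at a
    fraction k of the wheel surface speed (radius and k are illustrative values)."""
    return v_ball / (k * wheel_radius)

# Example: 6.75 m shot with the hoop 1.0 m above the release point, 55 degree launch angle
v = launch_speed(6.75, 1.0, 55.0)
print(f"exit speed {v:.2f} m/s -> wheel speed {wheel_speed(v):.0f} rad/s")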
As the results in Table 6 show, with over 60 shooting attempts per round, the shooting success rate averaged 96.1% across the three rounds; long-range shots beyond 6.75 m achieved 91.7% accuracy, and three-point shots achieved 93.3%. The average planning time was 124.8 ms, and the replanning response time was reduced to 108 ms. The interference avoidance success rate was 100% (no collisions occurred), with an average of 0.1 contacts per trial. The system maintained a consistent frame rate of 45 FPS, demonstrating real-time coordination between the perception and planning modules. The return-to-start completion rate was 100%, with an average remaining time of 1.3 s. Compared with the baseline system, the proposed approach improved the total task score by approximately 9.5%, primarily owing to enhanced visual robustness and faster decision-making from the heuristic-optimized planner.
Through these end-to-end experiments, the proposed basketball robot system demonstrates excellent performance in both the passing and shooting phases, confirming the effectiveness of the improved visual perception and path planning modules and providing a reliable foundation for future on-field deployment.

4.3.4. Critical Discussion of Shooting Accuracy

The 93.3% shooting accuracy beyond the 6.75 m three-point line demonstrates the effectiveness of integrating a dual friction-wheel launching mechanism with model predictive control (MPC). This performance benefits from precise velocity regulation and real-time posture stabilization, enabling consistent launch dynamics under static conditions. However, the system’s accuracy is highly sensitive to pre-launch state consistency—specifically, ball feeding alignment and chassis stability. Experiments show that lateral positioning errors exceeding 2 cm or pitch angles greater than 1.2° during launch reduce success rates by up to 8.5%, indicating mechanical fragility in dynamic scenarios.
Furthermore, while MPC ensures smooth trajectory tracking, it operates on an open-loop firing command without closed-loop visual feedback during flight. As a result, mid-air disturbances (e.g., air currents or floor vibrations) cannot be compensated, limiting adaptability in unstructured environments. The current strategy also lacks intelligent shot selection—it always attempts high-probability shots regardless of defender proximity, reflecting tactical naivety compared to human players.
These results highlight that the robot excels in execution precision but remains limited in adaptive resilience and context-aware decision-making. Future improvements should incorporate in-flight observation (e.g., high-speed cameras), adaptive spin control, and reinforcement learning-based shot policies to enhance robustness and strategic intelligence.
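To make the role of MPC in pre-launch velocity regulation concrete, the following receding-horizon sketch tracks a wheel-speed reference for an assumed first-order wheel model using CVXPY. The model coefficients, horizon length, and actuator limit are illustrative and do not reproduce the controller deployed on the robot.

import cvxpy as cp

# Assumed first-order launch-wheel model: v[k+1] = a*v[k] + b*u[k]
a, b = 0.95, 0.08       # illustrative discrete-time coefficients
N = 20                  # prediction horizon (steps)
u_max = 6.0             # actuator command limit (illustrative)
v_now = 0.0             # measured wheel speed at the current control step
v_ref = 180.0           # wheel-speed target for the desired exit velocity (rad/s)

v = cp.Variable(N + 1)  # predicted wheel-speed trajectory
u = cp.Variable(N)      # control sequence over the horizon

cost = cp.sum_squares(v[1:] - v_ref) + 0.01 * cp.sum_squares(u)
constraints = [v[0] == v_now,
               v[1:] == a * v[:-1] + b * u,   # model rollout over the horizon
               cp.abs(u) <= u_max]
cp.Problem(cp.Minimize(cost), constraints).solve()

# Receding horizon: apply only the first move, re-measure, and re-solve next cycle.
print("apply u[0] =", float(u.value[0]))

Because the firing command itself is open loop, such a controller can only guarantee the pre-launch state; the in-flight observation suggested above would require closing a second loop around the ball trajectory.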

4.4. Discussion on Experimental Conditions

4.4.1. Key Assumptions and Scope

This study is conducted based on several key assumptions.
  • In hardware, the robot operates on flat indoor ground without extreme environmental interference such as strong wind or intense light.
  • In visual perception, the targets possess distinctive color and shape features that enable reliable algorithmic identification, and no objects highly similar to a basketball are present in the scene.
  • In path planning and motion control, the robot moves at a moderate speed, which avoids sensor-data distortion caused by dynamic effects and allows the sensor fusion layer to handle nonlinear factors such as wheel slippage, thereby ensuring reliable localization and navigation.
These assumptions simplify the tasks of object detection and tracking.

4.4.2. Model Uncertainties

Uncertainties exist in real-world applications, particularly sensor noise. Sensors such as cameras, LiDAR, and the IMU may produce data fluctuations or errors due to environmental factors or their own limitations. To address this issue, this study employs the Extended Kalman Filter (EKF)-based filtering tools built into ROS 2 Humble to smooth sensor data and fuses multi-source information from odometry, LiDAR, and other sensors to improve the accuracy and robustness of system state estimation. While these measures mitigate the impact of noise, the performance boundaries of the sensors still require further testing and optimization. For hardware uncertainties, the electronic system is designed with a closed-loop feedback controller that combines “speed–current” cascade PID control with Field-Oriented Control (FOC), allowing the controller to continuously correct deviations.
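A minimal sketch of the speed–current cascade is given below, assuming illustrative gains, a 1 kHz loop, and simple output saturation; the FOC stage that converts the resulting voltage command into phase currents is not modeled here.

class PID:
    def __init__(self, kp, ki, kd, out_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, target, measured, dt):
        err = target - measured
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        out = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(-self.out_limit, min(self.out_limit, out))  # saturate the output

# Outer speed loop commands current; inner current loop commands voltage (gains illustrative).
speed_loop = PID(kp=0.8, ki=2.0, kd=0.0, out_limit=10.0)     # output limit in A
current_loop = PID(kp=1.5, ki=30.0, kd=0.0, out_limit=24.0)  # output limit in V

def cascade_step(speed_ref, speed_meas, current_meas, dt=0.001):
    """One 1 kHz control cycle of the speed-current cascade; the returned voltage
    command would be handed to the FOC stage on the motor driver."""
    current_ref = speed_loop.step(speed_ref, speed_meas, dt)
    voltage_cmd = current_loop.step(current_ref, current_meas, dt)
    return voltage_cmd

In this arrangement the inner current loop rejects fast electrical disturbances, while the outer speed loop corrects slower mechanical deviations such as load changes during ball contact.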

5. Conclusions

We present a court-ready basketball robot that addresses two critical gaps: degraded vision under motion blur and high-latency global planning. A four-Mecanum-wheel chassis with low-profile drum intake and dual friction-wheel shooter is controlled by an NVIDIA Jetson–STM32 hierarchy.
ROS 2 keeps the end-to-end delay below 150 ms. By embedding CoordAttention, ECA, MSCA, and SPPF-LSKA into YOLO11, mAP@0.5 rises by 2.1% and small-ball recall improves by 3.6%, while the detector runs at 45 FPS. A binary-heap Dijkstra with a Euclidean heuristic cuts planning time from 6.8 s to less than 0.13 s on a 300 × 300 occupancy grid. Full-court tests show that the complete pick–dodge–shoot cycle is completed 24% faster than the baseline, with 97.8% pickup success and 98.3% shot accuracy during 2 h of continuous play.
Future efforts will integrate infrared-depth data with RGB vision and embed reinforcement learning-based tactical planners to further boost extreme-condition robustness and autonomous decision-making. A 5G/Wi-Fi 6-enabled distributed framework will then be deployed to achieve real-time multi-robot coordination and dynamic task allocation.

Author Contributions

Conceptualization, S.W. and Q.L.; methodology, S.W. and Y.Z.; data curation, K.H., J.L. and Y.X.; writing—original draft preparation, K.H. and Y.X.; writing—review and editing, S.W. and Q.L.; project administration, Q.L. and Y.Z.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China 62403108, the Liaoning Provincial Natural Science Foundation Joint Fund 2023-MSBA-075, the Fundamental Research Funds for the Central Universities N2426005, the Science and Technology Planning Project of Liaoning Province 2023JH1/11200011 and 2024JH2/10240049, the China Postdoctoral Science Foundation under Grant 2025M771708, and the Fundamental Research Funds for the Central Universities under Grant N25XQD046.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data supporting the findings are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Islam, R.U.; Iqbal, J.; Manzoor, S.; Khalid, A.; Khan, S. An autonomous image-guided robotic system simulating industrial applications. In Proceedings of the 2012 7th International Conference on System of Systems Engineering (SoSE), Genova, Italy, 16–19 July 2012; pp. 344–349. [Google Scholar]
  2. Shahria, M.T.; Sunny, M.S.H.; Zarif, M.I.I.; Ghommam, J.; Ahamed, S.I.; Rahman, M.H. A comprehensive review of vision-based robotic applications: Current state, components, approaches, barriers, and potential solutions. Robotics 2022, 11, 139. [Google Scholar] [CrossRef]
  3. Oyetunji, O.; Rain, A.; Feris, W.; Eckert, A.; Zabihollah, A.; Abu Ghazaleh, H.; Priest, J. Design of a Smart Foot–Ankle Brace for Tele-Rehabilitation and Foot Drop Monitoring. Actuators 2025, 14, 531. [Google Scholar] [CrossRef]
  4. Zohaib, M.; Pasha, S.M.; Javaid, N.; Iqbal, J. IBA: Intelligent Bug Algorithm—A Novel Strategy to Navigate Mobile Robots Autonomously. In Proceedings of the Communication Technologies, Information Security and Sustainable Development; Shaikh, F.K., Chowdhry, B.S., Zeadally, S., Hussain, D.M.A., Memon, A.A., Uqaili, M.A., Eds.; Springer International Publishing: Jamshoro, Pakistan, 2013; pp. 291–299. [Google Scholar]
  5. Girshick, R. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  6. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  7. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  8. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  9. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  10. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar]
  11. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, CA, USA, 3–6 December 2012; Volume 25, pp. 1097–1105. [Google Scholar]
  12. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  14. Luo, Y. The Evolution of YOLO: From YOLOv1 to YOLOv11 with a Focus on YOLOv7’s Innovations in Object Detection. Theor. Nat. Sci. 2025, 87, 82–90. [Google Scholar] [CrossRef]
  15. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  16. Mat Radzi, S.F.; Abd Rahman, M.A.; Muhammad Yusof, M.K.A. YOLOv12-ECA: An Efficient Attention-Enhanced Detector for Real-Time UAV-Based Pothole Detection. In Proceedings of the 2025 14th International Conference on Information Technology in Asia (CITA), Kuching, Malaysia, 5–6 August 2025; pp. 102–107. [Google Scholar]
  17. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 21–25 June 2021; pp. 13713–13722. [Google Scholar]
  18. Wei, Z.; Tian, F.; Qiu, Z.; Yang, Z.; Zhan, R.; Zhan, J. Research on Machine Vision-Based Control System for Cold Storage Warehouse Robots. Actuators 2023, 12, 334. [Google Scholar] [CrossRef]
  19. Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894. [Google Scholar] [CrossRef]
  20. Rösmann, C.; Hoffmann, F.; Bertram, T. Timed-Elastic-Bands for Time-Optimal Point-to-Point Nonlinear Model Predictive Control. In Proceedings of the 2015 European Control Conference (ECC), Linz, Austria, 15–17 July 2015; pp. 1–6. [Google Scholar]
  21. Wang, Z.; Zhou, X.; Xu, C.; Gao, F. Geometrically Constrained Trajectory Optimization for Multicopters. IEEE Trans. Robot. 2022, 38, 3259–3278. [Google Scholar] [CrossRef]
  22. Liu, Q.; Chen, P.; Lin, K.; Zhao, K.; Ding, J.; Li, Y. Sample-efficient backtrack temporal difference deep reinforcement learning. Knowl.-Based Syst. 2025, 330, 114613. [Google Scholar] [CrossRef]
  23. Zeng, J.; Fu, J. Basketball robot object detection and distance measurement based on ROS and IBN-YOLOv5s algorithms. PLoS ONE 2024, 19, e0310494. [Google Scholar] [CrossRef] [PubMed]
  24. Zhang, J.; Tao, D. Research on deep reinforcement learning basketball robot shooting skills improvement based on end-to-end architecture and multi-modal perception. Front. Neurorobotics 2023, 17, 1274543. [Google Scholar] [CrossRef] [PubMed]
  25. Lan, J.; Dong, X. Improved Q-learning-based motion control for basketball intelligent robots under multi-sensor data fusion. IEEE Access 2024, 12, 57059–57070. [Google Scholar] [CrossRef]
  26. Xu, T.; Tang, L. Adoption of machine learning algorithm-based intelligent basketball training robot in athlete injury prevention. Front. Neurorobotics 2021, 14, 620378. [Google Scholar] [CrossRef] [PubMed]
  27. Koenig, S.; Likhachev, M. D* Lite. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI ’02), Edmonton, AB, Canada, 28 July–1 August 2002; pp. 476–483. [Google Scholar]
  28. Liu, L.; Wang, X.; Yang, X.; Liu, H.; Li, J.; Wang, P. Path Planning Techniques for Mobile Robots: Review and Challenges. Appl. Soft Comput. 2023, 142, 109548. [Google Scholar]
  29. Williams, J.W.J. Algorithm 232: Heapsort. Commun. ACM 1964, 7, 347–348. [Google Scholar] [CrossRef]
  30. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms, 3rd ed.; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  31. Rivera, N.; Baier, J.A.; Hernández, C. Incorporating weights into real-time heuristic search. Artif. Intell. 2015, 226, 57–77. [Google Scholar] [CrossRef]
  32. Wilt, C.; Ruml, W. When Does Weighted A* Fail? In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI-12), Toronto, ON, Canada, 22–26 July 2012; pp. 674–680. [Google Scholar]
  33. Sánchez-Ibáñez, J.R.; Pérez-del-Pulgar, C.J.; García-Cerezo, A. Path Planning for Autonomous Mobile Robots: A Review. Sensors 2021, 21, 7898. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Basketball robot system design. (a) Overall design of basketball robot system. (b) Robot in actual environment.
Figure 2. Basketball robot system.
Figure 3. Electronic components connection diagram.
Figure 4. YOLO11-MAF network architecture.
Figure 5. CoordAttention Module.
Figure 6. SPPF-LSKA Module.
Figure 7. MSCA Module.
Figure 8. ECA Module.
Figure 9. Dataset examples.
Figure 10. Ablation experiment results.
Figure 11. Comparison of precision–recall curve. (a) Precision–recall curve before improvement. (b) Precision–recall curve after improvement.
Figure 12. Comparison of path-solving runtime between the improved Dijkstra and A* algorithms. (a) First Round of Simulation; (b) Second Round of Simulation; (c) Third Round of Simulation; (d) Fourth Round of Simulation.
Figure 13. Test site schematic diagram. Areas A and B denote the starting areas for the robots. The green stars indicate the target positions for passing the ball, while the red stars represent the shooting targets (i.e., the small basketball hoop). The meanings of the other colored lines are indicated directly in the figure.
Table 1. Enabling status of various model modules involved in ablation experiments.
Model | ECA | CoordAtt | MSCA | SPPF Variant
yolo11-m0 |  |  |  | SPPF-LSKA
yolo11-m1 |  |  |  | SPPF-LSKA
yolo11-m2 |  |  |  | SPPF-LSKA
yolo11-m3 |  |  |  | SPPF-LSKA
yolo11-m4 |  |  |  | SPPF-LSKA
yolo11-m5 |  |  |  | SPPF-LSKA
yolo11-m6 |  |  |  | SPPF-LSKA
yolo11-m7 |  |  |  | SPPF-LSKA
yolo11n |  |  |  | SPPF
Table 2. Metrics of various models participating in ablation experiments.
Model | Precision | Recall | mAP@0.5 | mAP@0.5:0.95
yolo11-m0 | 0.897786329 | 0.86906443 | 0.920450305 | 0.676009616
yolo11-m1 | 0.867136376 | 0.835704399 | 0.90051382 | 0.671960307
yolo11-m2 | 0.850457417 | 0.863826924 | 0.900105601 | 0.667979664
yolo11-m3 | 0.846417603 | 0.8922018 | 0.907951938 | 0.672894904
yolo11-m4 | 0.881252536 | 0.849166824 | 0.908367057 | 0.666251825
yolo11-m5 | 0.875701261 | 0.864030006 | 0.913642406 | 0.671490847
yolo11-m6 | 0.893377074 | 0.8654571 | 0.916381936 | 0.676582832
yolo11-m7 | 0.871780955 | 0.860571003 | 0.907361924 | 0.673468277
yolo11n | 0.864695559 | 0.838904681 | 0.901489216 | 0.663698118
Table 3. Performance comparison between A* and improved Dijkstra on 300 × 300 maps.
ρ (Obstacle Density) | Path Length (A*) | Path Length (Improved Dijkstra) | Time A* (ms) | Time Improved Dijkstra (ms) | Improvement (%)
0.00 | 599 | 599 | 188.1 | 157.5 | 16.2
0.10 | 599 | 599 | 158.4 | 134.3 | 15.2
0.20 | 599 | 599 | 143.0 | 123.4 | 13.7
0.30 | 599 | 599 | 119.2 | 109.4 | 8.2
Table 4. Overall performance comparison of three algorithms (100 × 100 and 300 × 300 grid maps).
Algorithm | Map Size | Avg. Runtime (ms) | Avg. Path Length | Characteristics
Traditional Dijkstra | 100 × 100 | 2768.6 | 199 | Sequential, O(V²), not real-time
Improved Dijkstra | 300 × 300 | 124.0 | 602 | Heap + heuristic, O((V + E) log V)
A* | 300 × 300 | 140.4 | 602 | Optimal, slightly slower
Table 5. Performance Metrics for Each Round in the Passing Phase.
Metric | Round 1 | Round 2 | Round 3
Recognition Success Rate (%) | 98.3 | 98.3 | 96.7
Passing Completion Rate (%) | 98.3 | 95 | 96.7
Interference Avoidance Success Rate (%) | 100 | 100 | 100
Average Planning Time (ms) | 124.5 | 124.9 | 124.6
Return-to-Start Rate (%) | 100 | 100 | 100
Table 6. Performance Metrics for Each Round in the Shooting Phase.
Metric | Round 1 | Round 2 | Round 3
Recognition Success Rate (%) | 98.3 | 98.3 | 98.3
Shooting Success Rate (%) | 96.7 | 95 | 96.7
Interference Avoidance Success Rate (%) | 100 | 100 | 100
Average Planning Time (ms) | 125.0 | 124.7 | 124.8
Return-to-Start Rate (%) | 100 | 100 | 100
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
