1. Introduction
High-degree-of-freedom (DoF) robot manipulators serve as essential components for flexible manipulation in complex environments and play an increasingly critical role in industrial automation and intelligent manufacturing systems [1]. Composed of multiple links and joints, these systems operate in high-dimensional state spaces and must generate physically feasible motions to reach diverse target configurations within constrained workspaces [2]. In such settings, computational efficiency and scalability become fundamental design requirements, particularly when environmental conditions and task specifications change frequently [3].
Traditional motion planning approaches rely on explicitly defined kinematic and dynamic models, employing graph-search- or sampling-based algorithms to generate collision-free trajectories [4]. Although these methods ensure feasibility and constraint satisfaction, their computational cost grows rapidly with the number of DoF and with environmental complexity. Furthermore, dynamic environments often require repeated replanning, limiting real-time applicability. The stochastic nature of sampling-based planners may also produce structurally different trajectories for identical start-to-goal pairs, which can reduce execution consistency and complicate downstream learning [5].
To address these limitations, data-driven motion generation methods have emerged as promising alternatives [6]. Neural network-based policies can generate motions directly through inference once trained, bypassing iterative optimization and search [7]. Reinforcement Learning (RL) and Imitation Learning (IL) represent the two dominant paradigms for policy learning in high-dimensional robotic control [8]. RL learns through interaction with the environment but typically requires extensive exploration and incurs high training costs [9]. In contrast, IL learns policies from demonstration data and often provides faster convergence and improved stability when high-quality demonstrations are available [10].
In parallel, Digital Twin (DT) frameworks integrated with Internet of Things (IoT) technologies have enabled continuous synchronization between physical robotic systems and their virtual replicas. In DT-enabled pipelines, sensor data from IoT devices support real-time monitoring, predictive simulation, and iterative policy updates within the virtual environment. However, the effectiveness of IL in such frameworks strongly depends on the quality and structural diversity of the trajectory dataset [11]. Even when trajectories are physically feasible, automatically generated motion candidates may contain redundant or task-irrelevant trajectories due to stochastic sampling processes [12]. As dataset size increases, such redundancy can degrade learning efficiency, increase computational cost, and negatively affect generalization performance [13]. Therefore, systematic trajectory curation that preserves structural representativeness while removing redundancy is essential for scalable IL in DT-based robotic systems.
Unlike existing trajectory filtering approaches that primarily focus on human-collected demonstrations or influence-based sample evaluation, this study addresses redundancy arising specifically from physics-based sampling planners. We propose a scalable imitation learning framework that automatically generates feasible motion trajectories using a physics-based planner, eliminating the reliance on expert teleoperation [14]. To manage structural diversity within planner-generated trajectory pools, we introduce a hybrid trajectory filtering strategy that combines clustering-based grouping with rule-based quality assessment. By selectively leveraging trajectory clusters corresponding to specific task conditions and target states, the proposed approach reduces redundant samples while maintaining representative trajectory structures, thereby improving both policy performance and data efficiency. For clarity, we use the term “trajectory filtering” to refer to data curation processes that select informative trajectories from planner-generated datasets.
2. Related Works
Imitation Learning (IL) is a learning paradigm in which a robot acquires a control policy by observing and mimicking human demonstrations [15]. Early studies referred to this approach as Learning from Demonstration (LfD) or Programming by Demonstration, and it was originally proposed as an alternative to reinforcement learning to address its low sample efficiency and long exploration times [16]. Schaal (1996) demonstrated that demonstration data can be leveraged not merely as action replicas but as a means to learn the underlying dynamics of a task, significantly improving learning speed and stability even in nonlinear control problems [17].
Subsequently, imitation learning was extended to a wide range of robotic manipulation tasks using robotic arms, where the quality and quantity of demonstration data were repeatedly shown to have a decisive impact on final policy performance [18]. Traditional approaches predominantly relied on expert demonstrations collected via teleoperation by skilled human operators. While this method ensures high-quality data, it inherently suffers from scalability limitations due to the substantial human labor and time costs required for large-scale dataset construction and task diversification [19].
To reduce dependence on expert demonstrations, recent studies have explored various strategies for minimizing human involvement in demonstration data collection. For example, Efficient Data Collection for Robotic Manipulation via Compositional Generalization (2024) proposed an in-domain data collection strategy that exploits the compositional generalization capability of policies to cover a broader range of task variations with the same data collection effort [20]. Similarly, Tool-as-Interface: Learning Robot Policies from Observing Human Tool Use (2025) demonstrated that robot policies can be learned without direct robot manipulation by reducing the embodiment gap through Gaussian Splatting-based 3D scene reconstruction and tool-centric action representations derived from videos of human tool use [21].
As the scale of demonstration datasets increases, the inclusion of low-quality demonstrations, redundant trajectories, and task-irrelevant behaviors becomes more likely, thereby degrading learning efficiency and policy performance. Consequently, trajectory filtering techniques that selectively utilize only informative demonstrations from large, pre-collected datasets have emerged as an important research topic. Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets (2023) demonstrated the effectiveness of post hoc data selection by retrieving task-relevant state–action pairs from unlabeled large-scale datasets [22]. CUPID (2025) applied influence functions to imitation learning to quantitatively estimate each demonstration’s contribution to closed-loop policy performance (expected return), enabling the removal of harmful data and the selective curation of beneficial trajectories [23]. SCIZOR (2025) combined self-supervised task progress estimation with similarity-based deduplication to automatically filter suboptimal and redundant state–action pairs, achieving an average performance improvement of 15.4% [24].
Although these studies have made significant progress in reducing demonstration costs and improving learning efficiency in large-scale data regimes, most existing approaches still rely heavily on human expert demonstrations, observation-based video data, or large-scale simulation environments. In particular, the systematic automatic generation of physically feasible trajectories under real-world constraints—and the exploitation of their structural properties for data selection—has not been sufficiently explored. Moreover, simulation-dependent approaches continue to suffer from limited real-world transfer due to the persistent sim-to-real gap [25]. These limitations highlight the need for scalable methods that can automatically construct and curate trajectory datasets from physics-generated motion planning results.
The novelty of this work lies in the integration of physics-generated trajectory datasets with a hybrid data curation pipeline that combines task-conditioned clustering and rule-based trajectory quality evaluation. Unlike conventional imitation learning pipelines that rely on human demonstrations or large-scale vision datasets, the proposed framework focuses on scalable trajectory filtering for planner-generated motion data. This perspective shifts the focus from policy architecture design to scalable trajectory data curation for physics-generated motion datasets.
To address these limitations, this work makes the following contributions. First, we propose a low-cost and scalable data construction framework that automatically generates physically feasible trajectory data using a physics-based motion planner, without requiring human expert demonstrations. Second, we introduce a hybrid data selection method that combines clustering-based grouping with rule-based quality evaluation. Third, by selectively utilizing trajectory clusters corresponding to specific task conditions and goal states, the proposed approach effectively filters out unnecessary data, thereby improving both the performance of imitation learning policies and the efficiency of computational resource utilization.
3. Background
3.1. Imitation Learning
Imitation Learning (IL) is a methodology for learning a policy by mimicking expert demonstrations without explicitly defining a reward function [26]. Representative approaches include Behavior Cloning (BC) and Inverse Reinforcement Learning (IRL) [27]. BC directly learns state–action pairs, making it simple and efficient to implement. However, when the distribution of expert data differs from the distribution encountered during actual policy execution, covariate shift occurs, leading to compounding errors. Additionally, collecting expert demonstration data is costly, which poses a practical barrier to applying imitation learning at scale. In contrast, IRL infers the latent reward function implicitly followed by the expert and then learns the policy through reinforcement learning (RL) [28]. While it offers high expressiveness (enabling more nuanced policies via reward inference), it suffers from increased computational costs.
Recent research has actively progressed to overcome these limitations of IL. For example, data re-collection methods such as DAgger (Dataset Aggregation) mitigate distribution shift [29], while adversarial IL techniques such as GAIL (Generative Adversarial Imitation Learning) [30] and AIRL (Adversarial Inverse Reinforcement Learning) [31] enable more accurate learning of expert policy structures. Furthermore, Vision–Language–Action (VLA) policy models utilizing large-scale multi-embodiment robot datasets, such as RT-1, RT-2, and Open X-Embodiment, significantly improve generalization performance and expand real-world applicability. In particular, RT-2 fuses vision, language, and action through transformer architectures to achieve zero-shot generalization in dynamic environments, but high-quality trajectory filtering remains essential to mitigate distribution shifts. Nevertheless, IL still faces structural and ethical limitations, including the high cost of acquiring high-quality demonstration data, difficulties in generalization to new environments, policy stability that is sensitive to data quality, and amplification of expert biases during real-world deployment.
3.2. Behavioral Cloning
Behavioral Cloning (BC) learns a deterministic policy from expert demonstrations by training the policy to predict the expert’s action for a given state. BC has the advantages of a simple structure, a fast training process, and feasibility with low computational resources. However, when encountering new states not present in the expert data, the policy can rapidly accumulate errors due to distribution shift, which is a major limitation of BC in real-world robot control.
The training objective of BC is to minimize the following loss function:

$$\mathcal{L}_{\mathrm{BC}}(\theta) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\left[\left\lVert \pi_\theta(s) - a \right\rVert^2\right]$$

where $\mathcal{D} = \{(s_i, a_i)\}$ is the expert demonstration dataset and $\pi_\theta$ is the policy network.
In this study, BC is used as the baseline model, employing an MLP (Multi-Layer Perceptron)-based structure (BC MLP) that takes state features as input and directly outputs the next-step joint values. The feedforward MLP structure is suitable for deterministic mapping in joint-space control and avoids complex sequence dependencies, making it an appropriate baseline for comparing the effectiveness of data selection techniques. The hybrid filtering framework proposed in this study is expected to help alleviate BC’s covariate shift problem.
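As a concrete illustration of this objective, the sketch below fits a linear stand-in for the BC policy by gradient descent on the mean-squared BC loss. The dimensions, learning rate, and synthetic “expert” data are illustrative assumptions, not the paper’s actual setup.

```python
import numpy as np

def bc_mse_loss(policy_w, states, actions):
    """Mean-squared BC loss for a linear stand-in policy pi(s) = s @ W."""
    pred = states @ policy_w
    return np.mean((pred - actions) ** 2)

rng = np.random.default_rng(0)
states = rng.normal(size=(64, 6))        # e.g. 6 joint angles per state
expert_w = rng.normal(size=(6, 6))
actions = states @ expert_w              # synthetic "expert" next-step targets

w = np.zeros((6, 6))
for _ in range(200):                     # plain gradient descent on the loss
    grad = 2 * states.T @ (states @ w - actions) / len(states)
    w -= 0.05 * grad

print(bc_mse_loss(w, states, actions))   # loss shrinks toward zero
```

In the actual BC MLP model, the linear map is replaced by a multi-layer network, but the loss being minimized has the same form.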
3.3. Recurrent Neural Network
Recurrent Neural Networks (RNNs), specialized for processing sequential data, can model temporal dependencies and are widely used in problems where time-series information is critical, such as continuous joint states or trajectories in robots. RNNs reflect past information in current action predictions through hidden states, which is effective in control problems where different actions are required over time even for the same input. The BC RNN model applies this recurrent structure to Behavior Cloning to overcome the per-timestep independent prediction limitation of MLP-based BC. Since actual robot trajectories exhibit long-term dependencies, structures such as LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) are known to learn more stably than vanilla RNNs. GRU reduces parameters by approximately 25% compared to LSTM while maintaining similar performance [32]. BC RNN reflects the temporal continuity of trajectories, enabling smoother and more stable prediction of entire trajectories. However, for very long sequences, recent studies indicate that Transformer-based models (e.g., Decision Transformer) excel at handling long-term dependencies [33].
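For reference, a single GRU step can be sketched in plain numpy as follows; the weight shapes and the 6-dimensional joint-state input are illustrative assumptions, and biases are omitted for brevity.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: gates decide how much of the past hidden state to keep.

    x: (D,) current input (e.g. joint state), h: (H,) previous hidden state."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sig(Wz @ x + Uz @ h)                 # update gate
    r = sig(Wr @ x + Ur @ h)                 # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h)) # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
D, H = 6, 8
# alternating input-to-hidden (H, D) and hidden-to-hidden (H, H) weights
Ws = [rng.normal(0, 0.1, (H, D)) if i % 2 == 0 else rng.normal(0, 0.1, (H, H))
      for i in range(6)]
h = np.zeros(H)
for t in range(5):                           # unroll over a 5-step sequence
    h = gru_step(rng.normal(size=D), h, *Ws)
print(h.shape)
```

The carried hidden state `h` is what lets BC RNN condition each action prediction on the trajectory history rather than on the current state alone.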
3.4. Dropout
Dropout is a representative regularization technique designed to prevent overfitting in neural networks by probabilistically deactivating a subset of neurons during training, thereby discouraging excessive reliance on specific features. In imitation learning settings with diverse trajectory patterns and noisy demonstrations, Dropout facilitates more generalized policy learning, typically yielding a 10–15% improvement in generalization performance [34].
When applying Dropout to recurrent neural networks (RNNs), such as LSTM or GRU, standard fully connected Dropout is generally avoided due to its disruptive effect on recurrent state dynamics. Instead, recurrent Dropout or variational Dropout is commonly employed [35]. In particular, variational Dropout applies a consistent dropout mask across timesteps, preserving temporal consistency in sequence modeling. In this study, the BC RNN Dropout model incorporates these RNN-specific Dropout strategies, with dropout rates typically ranging from 0.2 to 0.5, to reduce overfitting in RNN-based behavioral cloning and enable more stable policy learning under significant data quality variations.
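The timestep-consistent masking that distinguishes variational Dropout from naive per-timestep Dropout can be sketched as follows; the feature dimensions and rate are illustrative.

```python
import numpy as np

def variational_dropout(seq, rate=0.3, rng=None):
    """Apply ONE dropout mask, sampled per feature, to every timestep.

    seq: array of shape (T, D) -- one trajectory of hidden features.
    The same mask is reused across all T steps, preserving temporal
    consistency (unlike naive dropout, which resamples per timestep)."""
    if rng is None:
        rng = np.random.default_rng()
    keep = (rng.random(seq.shape[1]) >= rate).astype(seq.dtype)
    return seq * keep / (1.0 - rate)     # inverted-dropout scaling

rng = np.random.default_rng(1)
traj = np.ones((5, 8))                   # 5 timesteps, 8 features
out = variational_dropout(traj, rate=0.5, rng=rng)
# every timestep shares the same zeroed features
print(np.all(out == out[0]))             # → True
```

Because the mask is fixed along the sequence, the recurrent state dynamics are perturbed in a consistent way rather than being re-randomized at every step.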
3.5. Mixture Density Network
Mixture Density Networks (MDNs) extend conventional neural network architectures by modeling probabilistic output distributions rather than producing single-point predictions [36]. An MDN outputs the mixing coefficients ($\pi_k$), means ($\mu_k$), and variances ($\sigma_k^2$) of $K$ Gaussian mixture components, thereby representing multimodal action distributions.
The BC RNN MDN model combines RNN-based temporal modeling with MDN-based probabilistic action prediction, allowing multiple plausible actions to be inferred for the same state. This formulation is particularly advantageous in robot manipulation scenarios where multiple valid trajectories can achieve an identical goal. The MDN training objective is the negative log-likelihood of the expert action under the predicted mixture:

$$\mathcal{L}_{\mathrm{MDN}}(\theta) = -\,\mathbb{E}_{(s,a)\sim\mathcal{D}}\left[\log \sum_{k=1}^{K} \pi_k(s)\,\mathcal{N}\!\left(a \mid \mu_k(s), \sigma_k^2(s)\right)\right]$$
Despite its expressive power, MDN training is known to suffer from challenges such as instability, slow convergence, and mode collapse, with performance highly sensitive to data quality, dataset size, and filtering strategies. Effective training often requires careful parameter initialization and temperature annealing, and several stabilization techniques have been proposed in recent studies. In the context of this work, MDNs are particularly well suited for capturing multimodal optima inherent in trajectories generated by physics-based sampling motion planners.
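As a concrete illustration of the mixture likelihood, the sketch below evaluates the negative log-likelihood of a scalar action under a two-component mixture; the component parameters are illustrative stand-ins for the values an MDN head would predict.

```python
import numpy as np

def mdn_nll(pi, mu, sigma, a):
    """Negative log-likelihood of a scalar action a under a K-component
    Gaussian mixture with weights pi, means mu, and std devs sigma."""
    comp = pi * np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.log(np.sum(comp))

# two modes, as when a planner found two distinct valid trajectories
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([0.2, 0.2])

# an action near either mode is likely; one between the modes is not
print(mdn_nll(pi, mu, sigma, 1.0) < mdn_nll(pi, mu, sigma, 0.0))  # → True
```

This is exactly the property exploited here: a unimodal (MSE-trained) policy would average the two modes and land in the unlikely region between them, while the mixture assigns high likelihood to either valid mode.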
3.6. Random Sampling Algorithm
Random sampling is a core technique in sampling-based motion planning, enabling efficient exploration of high-dimensional configuration spaces without exhaustive search. Representative algorithms such as RRT (Rapidly exploring Random Tree) and PRM (Probabilistic Roadmap) incrementally construct feasible robot trajectories by sampling random states and connecting collision-free configurations. Due to their probabilistic completeness, these methods can find valid solutions with high probability as the number of samples increases.
In simulation environments such as Gazebo, sampling-based planners generate physically feasible trajectories while considering dynamic constraints. However, they often suffer from low sample efficiency in complex environments, particularly in narrow passages. These characteristics motivate the need for trajectory filtering strategies that can select informative samples from large sets of planner-generated trajectories. Such random sampling-based data generation forms the foundation of the hybrid trajectory filtering framework proposed in this study.
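A minimal RRT sketch in a 2D unit square illustrates the sampling loop described above. The obstacle, step size, and tolerances are illustrative, and a real planner (e.g., OMPL's RRT) samples in joint space and collision-checks whole edges rather than single endpoints.

```python
import numpy as np

def rrt_2d(start, goal, is_free, step=0.2, iters=5000, goal_tol=0.2, seed=0):
    """Minimal 2D RRT: grow a tree toward random samples in [0,1]^2.

    is_free(p) -> bool is the collision check (endpoint-only here)."""
    rng = np.random.default_rng(seed)
    goal = np.asarray(goal, float)
    nodes = [np.asarray(start, float)]
    parent = [-1]
    for _ in range(iters):
        q = rng.random(2)                          # random sample
        i = min(range(len(nodes)), key=lambda j: np.linalg.norm(nodes[j] - q))
        d = q - nodes[i]
        new = nodes[i] + step * d / (np.linalg.norm(d) + 1e-9)
        if not is_free(new):
            continue
        nodes.append(new)
        parent.append(i)
        if np.linalg.norm(new - goal) < goal_tol:  # reached the goal region
            path, j = [], len(nodes) - 1
            while j != -1:                         # walk back to the root
                path.append(nodes[j]); j = parent[j]
            return path[::-1]
    return None

free = lambda p: not (0.4 < p[0] < 0.6 and p[1] < 0.7)  # wall with a gap above
path = rrt_2d([0.1, 0.1], [0.9, 0.1], free)
print(path is not None)
```

Because samples are random, repeated runs yield structurally different paths around the wall, which is precisely the redundancy and diversity that the filtering framework in this study is designed to manage.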
5. Experiment Setting
5.1. Experimental Environment and Data Generation
The experiments were conducted in a Gazebo simulator under the ROS (Robot Operating System) environment. The virtual robot model in Gazebo was described using URDF (Unified Robot Description Format), and MoveIt was adopted as the motion planner for robot control.
For trajectory data generation, motion planning algorithms provided by OMPL (Open Motion Planning Library), including RRT, RRT*, and FMT, were utilized. For each target, the motion planner was executed three times per algorithm, resulting in a total of 69 diverse trajectory candidates per target. The collected trajectory candidates were subsequently refined using the proposed HC Filter and then used as training data.
The robot employed in this study was the Doosan Robotics H2017 model. As shown in Table 3, which compares robot model specifications with those used in prior studies, the H2017 has a higher payload capacity and a wider working range than the robots adopted in previous research.
5.2. Training and Evaluation Settings
The learning model incorporated four variations based on Behavior Cloning (BC): BC MLP, BC RNN, BC RNN Dropout, and BC RNN MDN. For trajectory prediction, a Sequential Approach was adopted. Instead of predicting the entire trajectory simultaneously, this method iteratively predicts the state of the next step based on the current state input until the target point is reached. This sequential approach offers distinct advantages, including high memory efficiency and the capability to respond in real time to dynamic environmental changes.
As defined in Table 4, the success criterion for this experiment was established as a Euclidean distance of less than 0.05 m (5 cm) between the robot end-effector’s final position and the target point. This threshold of 5 cm aligns with the success criteria employed in prior studies addressing similar robotic reaching tasks.
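The sequential prediction loop and the 5 cm success check can be sketched together as follows; the stand-in policy that moves a fixed fraction toward the goal is purely illustrative, taking the place of the trained BC model.

```python
import numpy as np

GOAL_TOL = 0.05  # 5 cm success threshold, as in Table 4

def rollout(policy_step, start, goal, max_steps=200):
    """Sequential approach: predict one step at a time from the current
    state until the end-effector is within GOAL_TOL of the target."""
    state = np.asarray(start, float)
    goal = np.asarray(goal, float)
    traj = [state.copy()]
    for _ in range(max_steps):
        state = policy_step(state, goal)
        traj.append(state.copy())
        if np.linalg.norm(state - goal) < GOAL_TOL:
            return traj, True            # success
    return traj, False                   # gave up

# hypothetical stand-in policy: move 10% of the way to the goal each step
toy_policy = lambda s, g: s + 0.1 * (g - s)
traj, ok = rollout(toy_policy, [0.0, 0.0, 0.0], [0.5, 0.3, 0.2])
print(ok)                                # → True
```

Because each step depends only on the current state, this loop needs no full-trajectory buffer, which is the memory-efficiency and reactivity advantage noted above.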
6. Experiment Result
This chapter presents the experimental results conducted to validate the effectiveness of the proposed HC Filter. The comparative analysis focused on the average target reach success rate and the quality of generated trajectories, depending on the application of data selection techniques.
6.1. Comparison of Success Rates by Data Selection Method
To evaluate the performance of the proposed methodology, the General model (without data selection), the Rule-based model, and existing trajectory filtering studies (BehaviorRetrieval, SCIZOR, CUPID) were established as baselines. Four variations based on Behavior Cloning (BC) were utilized as learning models: BC MLP, BC RNN, BC RNN Dropout, and BC RNN MDN. The success criterion was defined as a distance of less than 0.05 m (5 cm) between the end-effector and the target point.
It should be noted that the primary objective of this study is not to propose a new policy architecture, but to investigate trajectory data curation strategies for imitation learning using physics-generated motion planning data. Therefore, the selected baselines were chosen as representative trajectory filtering and data selection approaches that have been widely explored in imitation learning research. While some of these methods are often applied to high-dimensional or vision-based datasets, they remain relevant benchmarks for evaluating the effectiveness of data selection strategies.
To ensure fair comparison, all filtering methods were evaluated using the same imitation learning models and identical experimental settings. In this way, the comparison focuses on the impact of trajectory data selection rather than differences in policy architectures, allowing the evaluation to isolate the effect of filtering strategies on policy performance.
Unlike prior trajectory filtering approaches such as Behavior Retrieval, SCIZOR, and CUPID, which primarily rely on similarity-based retrieval or influence estimation for selecting training samples, the proposed framework emphasizes computational scalability for planner-generated trajectory datasets. In particular, the hybrid filtering strategy operates directly on motion planning outputs through task-conditioned clustering and rule-based trajectory quality evaluation, enabling efficient grouping and ranking without requiring expensive influence estimation or pairwise similarity computations.
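The two-stage structure described here can be sketched as follows, assuming each trajectory is summarized by a feature vector and using trajectory length as a stand-in rule-based quality score. The clustering is a tiny Lloyd's-algorithm k-means, a sketch rather than the exact implementation used in this study.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Tiny k-means (Lloyd's algorithm) over trajectory feature vectors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def hybrid_filter(features, lengths, k=2, keep_frac=0.5):
    """Stage 1: cluster trajectories by structure; stage 2: within each
    cluster, keep the top keep_frac by a rule-based score (length here)."""
    labels = kmeans(features, k)
    keep = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        ranked = idx[np.argsort(lengths[idx])]       # rule: shorter is better
        keep.extend(ranked[: max(1, int(len(ranked) * keep_frac))])
    return sorted(keep)

rng = np.random.default_rng(2)
# two structurally distinct trajectory families (e.g. two homotopy classes)
feats = np.vstack([rng.normal(0, 0.1, (10, 3)), rng.normal(3, 0.1, (10, 3))])
lens = rng.uniform(1, 10, 20)
print(len(hybrid_filter(feats, lens)))
```

Clustering first means the rule-based stage prunes within each structural family separately, so a family is never discarded wholesale just because its trajectories are longer on average.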
As shown in the experimental results (Figure 2), the proposed clustering-based filtering techniques (K-Means, DBSCAN, Hierarchical) and their hybrid variants combined with rule-based methods achieved higher success rates compared to all baseline models. Specifically, the K-Means (Hybrid) technique demonstrated the highest success rate of 79.1%, while the Hierarchical (Hybrid) and DBSCAN (Hybrid) techniques achieved 78.4% and 76.6%, respectively. When compared to the General model without data selection (54.1%) and the Rule-Based General model (40.0%), these results indicate that the proposed methodology effectively enhances the quality of training data, thereby significantly improving policy performance.
Furthermore, in the case of the K-Means technique, applying clustering alone (Pure) yielded a success rate of 74.4%. However, the Hybrid approach, which incorporates the rule-based filter, reached 79.1%, showing an additional performance improvement of 4.7 percentage points. This demonstrates that the secondary rule-based filtering, following the primary data selection via clustering, provides more precise control over data quality.
6.2. Comparison of Generated Trajectory Quality
To evaluate the qualitative efficiency of the generated trajectories in addition to the success rate, we conducted a comparative analysis of (1) Average Final Distance (Goal Reaching Accuracy), (2) Average Trajectory Length (Trajectory Efficiency), and (3) Average Joint Movement (Actuation Efficiency).
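Assuming per-step end-effector positions and joint states are logged, the three metrics can be computed directly from a trajectory; the straight-line toy trajectory below is illustrative.

```python
import numpy as np

def trajectory_metrics(ee_positions, joint_states, goal):
    """Compute the three quality metrics for one trajectory.

    ee_positions: (T, 3) end-effector positions in meters.
    joint_states: (T, J) joint angles in radians.
    goal:         (3,) target position."""
    final_dist = np.linalg.norm(ee_positions[-1] - goal)          # accuracy
    traj_len = np.sum(np.linalg.norm(np.diff(ee_positions, axis=0), axis=1))
    joint_move = np.sum(np.abs(np.diff(joint_states, axis=0)))    # actuation
    return final_dist, traj_len, joint_move

# toy straight-line trajectory from the origin to the goal
goal = np.array([0.3, 0.0, 0.4])
ee = np.linspace([0.0, 0.0, 0.0], goal, 11)
joints = np.linspace(np.zeros(6), np.full(6, 0.5), 11)
fd, tl, jm = trajectory_metrics(ee, joints, goal)
print(round(fd, 3), round(tl, 3), round(jm, 3))
```

A detoured or oscillating trajectory reaching the same goal would score the same final distance but a larger trajectory length and joint movement, which is exactly the inefficiency these metrics expose.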
As illustrated in Figure 3, the proposed clustering and hybrid methods demonstrated superior performance compared to the baseline models across all trajectory quality metrics.
Goal Reaching Accuracy (Final Distance): The proposed techniques (e.g., K-Means Hybrid: 0.103 m) achieved significantly closer proximity to the target at the final position compared to the baselines (e.g., General: >0.3 m).
Trajectory Efficiency (Trajectory Length): Whereas the baseline models generated inefficiently long trajectories (e.g., General: >25 m), the proposed methods produced highly concise and efficient trajectories (e.g., K-Means Hybrid: approx. 2.0 m).
Actuation Efficiency (Joint Movement): The proposed techniques (e.g., K-Means Hybrid: approx. 2.36) drastically reduced the total joint movement compared to the baselines (e.g., General: >50). This confirms that the proposed methods generate efficient trajectories, which are beneficial in terms of energy consumption and mechanical wear.
In conclusion, the proposed hybrid data filtering technique was demonstrated to be highly effective not only in enhancing the success rate of imitation learning but also in generating high-quality trajectories that ensure both the accuracy and efficiency essential for real-world robot operations.
6.3. Comparison of Success Rates by Imitation Learning Algorithm
This section validates whether the proposed hybrid data clustering filter consistently enhances performance across a broad spectrum of imitation learning algorithms, demonstrating its generalizability beyond specific model architectures.
The experiments measured success rates by applying the proposed selection techniques (K-Means, DBSCAN, Hierarchical, and their hybrid variants) as well as the baselines (General, SCIZOR, etc.) to each of the four models defined in Section 5.2 (BC MLP, BC RNN, BC RNN Dropout, and BC RNN MDN).
The most notable results were observed in models based on Recurrent Neural Networks (RNNs), as shown in Figure 4. When integrated with the BC RNN model, the proposed clustering techniques (DBSCAN Hybrid: 96.2%, Hierarchical Hybrid: 96.2%) achieved significantly higher success rates compared to the General model (82.5%). Specifically, for the BC RNN Dropout model, the proposed K-Means (Hybrid) and K-Means (Pure) techniques recorded a success rate of 97.5%. This result can be interpreted as a synergistic effect arising from the RNN effectively learning the sequential nature of trajectory data, Dropout mitigating overfitting, and the hybrid filter providing high-quality training data. The validity of these techniques was also confirmed in the BC MLP model, where Hierarchical (75.0%) and DBSCAN Hybrid (73.8%) methods consistently outperformed the General model (56.2%). This indicates that the proposed filter intrinsically enhances data quality independent of model complexity.
However, the BC RNN MDN model exhibited a divergent trend, with the baseline SCIZOR achieving the highest performance at 75.3%. This performance degradation in MDN-based models suggests a fundamental mismatch between the architecture and the filtered data. While MDN is designed to capture complex, multi-modal action distributions, the proposed hybrid filter selectively retains only the most efficient trajectories, effectively transforming a multi-modal planning problem into a unimodal one. This reduction in structural diversity likely hindered the training stability of the MDN, as its expressive capacity for diversity conflicted with the focused optimality of the curated dataset.
This observation highlights a potential trade-off between trajectory efficiency and structural diversity in curated datasets. While the proposed filtering strategy improves learning stability for deterministic policy models such as BC MLP and BC RNN, probabilistic architectures like MDNs may benefit from retaining a broader distribution of feasible trajectories. Future research could explore adaptive filtering strategies that preserve a controlled level of trajectory diversity while removing clearly inefficient trajectories. Such approaches may better support multi-modal policy architectures while maintaining the advantages of trajectory quality filtering.
In conclusion, the proposed hybrid data clustering filter induced a substantial performance enhancement across most imitation learning algorithms—including BC MLP, BC RNN, and BC RNN Dropout—by providing high-quality, streamlined training data.
6.4. Analysis of Success Rates by Selection Technique Across Target Distance Ranges
This section analyzes the impact of variations in the distance from the robot’s starting point to the target (Distance Range: 0.0 cm–5.0 cm) on the success rates of each model. The objective is to verify the stability of the control performance maintained by the proposed method despite varying task difficulties, as shown in Figure 5.
1. Stable Dominance in Short- and Medium-Range Tasks (0.0 cm–3.0 cm)
Maintenance of Top-Tier Performance by Hybrid Techniques: In environments where the target distance was 3.0 cm or less, K-Means Hybrid, DBSCAN Hybrid, and Hierarchical Hybrid consistently maintained high success rates, often exceeding 80% in short- and medium-range tasks.
Resilience to Increasing Difficulty: Notably, as the distance increased from 0.0 cm to 3.0 cm, the success rate of the General (unselected) model tended to decline. In contrast, the proposed hybrid techniques maintained high success rates based on curated high-quality data, demonstrating robustness against distance variations.
2. Analysis of Performance Inversion in Long-Range Tasks (4.0 cm–5.0 cm)
Relative Strength of SCIZOR: In the 5.0 cm range, representing the highest task difficulty, the legacy technique SCIZOR recorded the highest performance, with a success rate of approximately 68%.
Limitations and Implications of Hybrid Techniques: A slight decline in the success rates of the proposed clustering-based techniques was observed as the distance increased. This suggests that trajectory complexity increases with target distance, and filtering based solely on efficiency (e.g., distance and smoothness) may not sufficiently capture the diverse state–action pairs required for reaching distant targets. Nevertheless, hybrid techniques often maintained competitive or higher performance compared to the General model (approx. 50%) and the Rule-Based General model, thereby demonstrating the effectiveness of data selection.
3. Performance Disparity Among Trajectory Filtering Techniques
Limitations of CUPID: Across all distance ranges, CUPID recorded the lowest success rates (below 10%), suggesting limited compatibility with planner-generated trajectory data in this experimental setting.
Variability of the Optimal Technique: While the optimal hybrid algorithm (K-Means, DBSCAN, or Hierarchical) varied slightly depending on the distance, all consistently exhibited significantly higher performance compared to the unselected method (see Appendix A for additional results).
7. Conclusions
This paper proposes a hybrid data curation framework for imitation learning that leverages physics-generated trajectories to address the high costs associated with expert demonstration collection. The framework automatically accumulates a large-scale set of candidate trajectories via random sampling algorithms utilizing physics-based motion planners, and subsequently curates high-quality training data through a Hybrid Data Clustering Filter that integrates clustering algorithms (K-Means, DBSCAN, Hierarchical) with rule-based selection. This approach focuses on overcoming the limitations of conventional imitation learning and enhancing practicality in real-world robot control.
Experimental results demonstrated that the model utilizing the proposed technique achieved a target reach success rate of 79.1%, outperforming models without data selection (General) and those applying legacy filtering methods (CUPID, SCIZOR). This represents a 25.0 percentage point improvement over the General model (54.1%), verifying that enhanced data quality significantly improves the stability and generalization capabilities of the learning policy. Notably, the Hybrid technique significantly reduced the data selection time compared to legacy methods, confirming its effectiveness in improving preprocessing efficiency and reducing system operational costs.
The primary contributions of this study are as follows: First, the proposed trajectory generation framework reduces the costs of imitation learning data collection and is characterized by securing higher-quality training data through filtering. It overcomes the limitations of expert demonstrations by collecting massive amounts of data at low cost and improving learning performance through quality curation. Second, by selectively utilizing training data from clusters corresponding to the target point, the approach conserves computational resources while achieving higher success rates compared to non-selective models. Third, the incorporation of rule-based techniques following clustering further enhances the quality of the generated trajectories.
Future research plans to validate the sim-to-real gap by applying the proposed framework to actual robot hardware and to generalize it to complex scenarios such as dynamic obstacles or multi-robot environments. Additionally, surrogate-assisted optimization techniques could be explored to reduce the computational cost associated with large-scale trajectory generation in simulation environments [40]. Such approaches may improve the efficiency of trajectory dataset construction while maintaining the fidelity of physics-based motion planning. Furthermore, future work will investigate scalable trajectory filtering strategies for large trajectory pools generated in simulation environments, where efficient dataset curation becomes increasingly important as task complexity and planning dimensionality grow.