Author Contributions
Conceptualization, J.X. and X.W.; methodology, J.X. and Z.K.; software, J.X. and Z.K.; validation, J.X., B.S. and Y.H.; formal analysis, J.X. and B.S.; investigation, J.X. and Y.H.; resources, M.X.; data curation, B.S. and Y.H.; writing—original draft preparation, J.X. and Z.K.; writing—review and editing, X.W. and M.X.; visualization, B.S. and Y.H.; supervision, X.W. and M.X.; project administration, X.W. and M.X.; funding acquisition, X.W. and M.X. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Comparison of time-domain and frequency-domain planning in a sudden hazard scenario. (a) Time-Domain Planner: Delayed Response. (b) Frequency-Domain Planner: Prompt Global Adjustment. Red vehicles denote the ego vehicle and blue vehicles denote surrounding vehicles; the pedestrian icon indicates the hazard, and the speed–time curve illustrates the braking response over time.
Figure 2.
Comparison of autoregressive trajectory generation processes. (Left): time-domain autoregression generates one incremental action per step, forming a trajectory through local accumulation. (Right): frequency-domain autoregression progressively refines a global trajectory by predicting frequency coefficients from low to high.
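The contrast described in this caption can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes an orthonormal DCT-II basis and a toy one-dimensional decelerating trajectory, and the helper names (`dct_matrix`, `time_domain_rollout`, `frequency_domain_refinements`) are hypothetical. It accumulates per-step increments into a trajectory, then rebuilds the same trajectory from its first k frequency coefficients for increasing k.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th frequency."""
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)    # rescale the constant row so the matrix is orthonormal
    return m

def time_domain_rollout(increments: np.ndarray) -> np.ndarray:
    """Time-domain view: positions are the running sum of per-step increments."""
    return np.cumsum(increments, axis=0)

def frequency_domain_refinements(trajectory: np.ndarray) -> list:
    """Frequency-domain view: reconstructions from the first 1, 2, ..., T coefficients."""
    n = trajectory.shape[0]
    basis = dct_matrix(n)
    coeffs = basis @ trajectory
    refinements = []
    for k in range(1, n + 1):
        partial = np.zeros_like(coeffs)
        partial[:k] = coeffs[:k]                 # keep only the k lowest frequencies
        refinements.append(basis.T @ partial)    # inverse transform (transpose)
    return refinements

# Toy data (an assumption): 16 steps with linearly decreasing step length.
increments = np.linspace(1.0, 0.0, 16)
trajectory = time_domain_rollout(increments)
refinements = frequency_domain_refinements(trajectory)
errors = [float(np.linalg.norm(r - trajectory)) for r in refinements]
```

Because the basis is orthonormal, each added coefficient can only reduce the L2 reconstruction error, so every intermediate result is already a full-horizon trajectory that later coefficients refine.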
Figure 3.
Schematic of the incremental action representation for trajectory planning.
Figure 4.
Schematic of time-domain autoregressive planning with incremental action tokens and pose-based feedback.
Figure 5.
Pipeline of frequency-domain action tokenization using DCT and Byte Pair Encoding (BPE) [40] for compact vocabulary generation. The blue/orange/green curves and bars correspond to the three action dimensions, respectively. The colored blocks use the same color coding and represent the corresponding quantized DCT-coefficient tokens after flattening. The red dashed line marks the truncation boundary, and the red curve highlights the merge process applied afterward.
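The transform, truncation, quantization, and flattening stages of this pipeline can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the orthonormal DCT-II variant, the quantization `scale`, the number of kept coefficients `keep`, the frequency-major flattening order, and the function names `encode` and `decode` are all hypothetical choices, and the BPE merge step is omitted.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th frequency."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def encode(actions: np.ndarray, keep: int, scale: float) -> np.ndarray:
    """Map a (T, D) action sequence to keep * D integer symbols, low frequencies first."""
    coeffs = dct_matrix(actions.shape[0]) @ actions       # per-dimension DCT, shape (T, D)
    quantized = np.rint(coeffs[:keep] * scale).astype(np.int64)  # truncate, then round
    return quantized.reshape(-1)  # flatten frequency-major, dimensions interleaved

def decode(tokens: np.ndarray, horizon: int, dims: int, scale: float) -> np.ndarray:
    """Invert encode(): zero-pad the dropped high frequencies and apply the inverse DCT."""
    keep = tokens.size // dims
    coeffs = np.zeros((horizon, dims))
    coeffs[:keep] = tokens.reshape(keep, dims) / scale
    return dct_matrix(horizon).T @ coeffs

# Toy three-dimensional action sequence (an assumption), 16 steps long.
t = np.linspace(0.0, 1.0, 16)
actions = np.stack([t, np.sin(np.pi * t), np.full_like(t, 0.1)], axis=1)
tokens = encode(actions, keep=8, scale=1000.0)
recon = decode(tokens, horizon=16, dims=3, scale=1000.0)
```

In a full pipeline the integer symbols would then be passed to a BPE trainer, which merges frequent adjacent pairs into single vocabulary entries; that stage is left out here to keep the sketch dependency-free.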
Figure 6.
Model architecture. The framework consists of three key components: (1) a multimodal scene encoder that aggregates context from maps, agents, traffic lights, and ego-state; (2) an autoregressive frequency-domain token decoder that generates a sequence of action tokens; and (3) a deterministic trajectory reconstructor that decodes tokens into frequency coefficients and reconstructs motion plans.
Figure 7.
Workflow of the GRPO algorithm. The process involves: (1) generating candidate action token sequences through an autoregressive model; (2) decoding and executing these sequences in a simulated environment to compute a multi-component reward (safety, comfort, rule compliance, progress); and (3) using the reward returns to compute advantages for policy optimization.
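Step (3) of this workflow can be sketched with the group-relative normalization commonly used in GRPO, where each candidate's reward is standardized against the other candidates sampled for the same scene. This is a sketch under assumptions: the caption names the four reward components but not their weighting, so the `WEIGHTS` values, the toy component scores, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical weights: the caption lists the components, not how they are combined.
WEIGHTS = {"safety": 1.0, "comfort": 0.2, "rule_compliance": 0.5, "progress": 0.3}

def total_reward(components: dict) -> float:
    """Weighted sum of the per-component scores for one candidate rollout."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Standardize rewards within one group of candidates sampled for the same scene."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy group of three candidate rollouts (made-up scores in [0, 1]).
group = [
    {"safety": 1.0, "comfort": 1.0, "rule_compliance": 1.0, "progress": 0.5},
    {"safety": 0.0, "comfort": 0.5, "rule_compliance": 1.0, "progress": 1.0},
    {"safety": 1.0, "comfort": 0.0, "rule_compliance": 0.0, "progress": 1.0},
]
rewards = [total_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
```

Since the baseline is the group mean, no separate value network is needed; candidates that score above the group average receive positive advantages and the rest receive negative ones.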
Figure 8.
Comparison of training and validation losses between time-domain and frequency-domain action representations across training steps. (a) Training loss comparison. (b) Validation loss comparison.
Figure 9.
Learning curves of reward versus training steps for different RL algorithms. The line styles distinguish different methods as indicated in the legend, and the shaded regions show ± one standard deviation over N random seeds.
Figure 10.
Comparative response characteristics of two motion-planning paradigms in a sudden cut-in scenario. (a) Bird's-eye view (BEV) of the scenario (blue: frequency-domain, orange: time-domain). (b) Ego vehicle speed profiles under time-domain vs. frequency-domain planning.
Figure 11.
Comparative response characteristics of two motion-planning paradigms in an emergency braking scenario. (a) Scenario BEV (blue: frequency-domain, orange: time-domain). (b) Ego vehicle speed profiles under time-domain vs. frequency-domain planning.
Figure 12.
Comparative control characteristics of two motion-planning paradigms in a scenario with an abrupt obstruction by a stopped vehicle. (a) Scenario BEV (blue: frequency-domain, orange: time-domain). (b) Ego vehicle speed profiles under time-domain vs. frequency-domain planning.
Figure 13.
Comparative control characteristics of two motion-planning paradigms in a scenario with a sudden pedestrian crossing at a crosswalk. (a) Scenario BEV (blue: frequency-domain, orange: time-domain). (b) Ego vehicle speed profiles under time-domain vs. frequency-domain planning.
Figure 14.
Near-stop failure mode. The predicted trajectory points exhibit centimeter-level net displacement but persistent oscillatory geometry, resembling in-place spinning or jittering at near-zero speed. The blue circles denote the generated trajectory points, and the blue curve shows the resulting trajectory. The numbers indicate the order in which the points are generated. The arrow represents the displacement vector from the first point to the last point.
Table 1.
Computational cost comparison between time-domain and frequency-domain planners under the same hardware setting. Training time is reported in wall-clock hours using 24 A100 GPUs. Peak memory is measured as the maximum GPU memory footprint during training. Inference latency is measured per planning cycle in closed-loop evaluation, and we report the latency distribution statistics (mean, P95, and maximum) over all planning cycles.
| Method | IL Time (h) | RL Time (h) | Peak Mem (GB) | Mean Lat. (ms) | P95 Lat. (ms) | Max Lat. (ms) |
|---|---|---|---|---|---|---|
| Time-domain | 30 | 9 | 76 | 72.6 | 88.4 | 103.1 |
| Freq-domain | 40 | 14 | 69 | 81.8 | 101.9 | 118.5 |
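The three latency statistics reported in Table 1 can be computed from per-cycle measurements as in the sketch below. The function name `latency_summary` and the synthetic input are assumptions for illustration only; the values are not measurements from the paper, and `np.percentile` is used with its default linear interpolation, which may differ from the authors' percentile convention.

```python
import numpy as np

def latency_summary(latencies_ms) -> dict:
    """Mean, 95th percentile, and maximum over all planning-cycle latencies (ms)."""
    lat = np.asarray(latencies_ms, dtype=np.float64)
    return {
        "mean": float(lat.mean()),
        "p95": float(np.percentile(lat, 95)),  # default linear interpolation
        "max": float(lat.max()),
    }

# Synthetic per-cycle latencies of 1, 2, ..., 100 ms (not real measurements).
summary = latency_summary(np.arange(1, 101))
```

Reporting P95 and the maximum alongside the mean matters for closed-loop planning, since a planner that is fast on average can still miss its cycle deadline in the tail.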
Table 2.
Overall closed-loop planning performance on the nuPlan validation set reported as mean ± standard deviation over N random seeds.
| Planner | NR-CLS | Collision | TTC | Drivable | Speed | Comfort | Progress | R-CLS |
|---|---|---|---|---|---|---|---|---|
| Cont. + IL | | | | | | | | |
| Cont. + IL + PPO | | | | | | | | |
| Cont. + IL + GRPO | | | | | | | | |
| Freq. + IL | | | | | | | | |
| Freq. + IL + PPO | | | | | | | | |
| Freq. + IL + GRPO (Ours) | | | | | | | | |
Table 3.
Comparison with representative planning methods on the nuPlan benchmark. * denotes methods with rule-based post-processing. NR and R denote non-reactive and reactive closed-loop evaluation, respectively.
| Type | Planner | Val14 (NR) | Val14 (R) | Test14-Hard (NR) | Test14-Hard (R) | Test14-Random (NR) | Test14-Random (R) |
|---|---|---|---|---|---|---|---|
| Expert | Log-Replay | 93.53 | 80.32 | 85.96 | 68.80 | 94.03 | 75.86 |
| Rule-based & Hybrid | IDM [41] | 75.60 | 77.33 | 56.15 | 62.26 | 70.39 | 72.42 |
| Rule-based & Hybrid | PDM-Closed* [35] | 92.84 | 92.12 | 65.08 | 75.19 | 90.05 | 91.64 |
| Rule-based & Hybrid | PDM-Hybrid* [35] | 92.77 | 92.11 | 65.99 | 76.07 | 90.10 | 91.28 |
| Rule-based & Hybrid | GameFormer* [42] | 79.94 | 79.78 | 68.70 | 67.05 | 83.88 | 82.05 |
| Rule-based & Hybrid | PLUTO* [36] | 92.88 | 89.84 | 80.08 | 76.88 | 92.23 | 90.29 |
| Rule-based & Hybrid | PlanAgent* [43] | 93.26 | 92.75 | 72.51 | 76.82 | - | - |
| Rule-based & Hybrid | Diffusion [44] | 94.26 | 92.90 | 78.87 | 82.00 | 94.80 | 91.75 |
| Rule-based & Hybrid | Carplanner* [45] | - | - | - | - | 94.07 | 91.10 |
| Learning-based | UrbanDriver [46] | 68.57 | 64.11 | 50.40 | 49.95 | 51.83 | 67.15 |
| Learning-based | PDM-Open [35] | 53.53 | 54.24 | 33.51 | 35.83 | 52.81 | 57.23 |
| Learning-based | PlanTF [47] | 84.27 | 76.95 | 69.70 | 61.61 | 85.62 | 79.58 |
| Learning-based | PLUTO [36] | 88.89 | 78.11 | 70.03 | 59.74 | 89.90 | 78.62 |
| Learning-based | Diffusion Planner [44] | 89.87 | 82.80 | 75.99 | 69.22 | 89.19 | 82.93 |
| Learning-based | Ours (Freq. Tokens + GRPO) | 90.82 | 88.31 | 79.62 | 78.19 | 92.44 | 91.08 |
Table 4.
Effect of frequency truncation length K reported as mean ± standard deviation over N independent runs. For “Infer. Time (ms)”, the reported value is the per-run average inference latency, and the standard deviation reflects run-to-run variation rather than per-cycle tail latency.
| K | NR-CLS | Collision | TTC | Drivable | Speed | Comfort | Progress | R-CLS | Infer. Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 8 | | | | | | | | | |
| 12 | | | | | | | | | |
| 16 | | | | | | | | | |
| 24 | | | | | | | | | |
Table 5.
Effect of token vocabulary size reported as mean ± standard deviation over N runs.
| Vocab. Size | NR-CLS | Collision | TTC | Drivable | Speed | Comfort | Progress | R-CLS |
|---|---|---|---|---|---|---|---|---|
| 512 | | | | | | | | |
| 1024 | | | | | | | | |
| 2048 | | | | | | | | |
| 4096 | | | | | | | | |
Table 6.
Effect of tokenization scheme under the same frequency-domain representation reported as mean ± standard deviation over N runs. All frequency-domain methods use and are fine-tuned with GRPO. The vocabulary budget is matched to whenever applicable.
| Planner | NR-CLS | Collision | TTC | Drivable | Speed | Comfort | Progress | R-CLS |
|---|---|---|---|---|---|---|---|---|
| Cont. | | | | | | | | |
| Freq. + Bin | | | | | | | | |
| Freq. + KM | | | | | | | | |
| Freq. + BPE | | | | | | | | |
Table 7.
Effect of planning horizon reported as mean ± standard deviation over N runs.
| Horizon (s) | NR-CLS | Collision | TTC | Drivable | Speed | Comfort | Progress | R-CLS |
|---|---|---|---|---|---|---|---|---|
| 4.0 | | | | | | | | |
| 6.0 | | | | | | | | |
| 8.0 | | | | | | | | |