Abstract
Efficient scheduling of automated rail transportation in hilly orchards is critical for maintaining fruit freshness and ensuring timely market delivery. This study develops a dynamic scheduling method for multi-transporter orchard rail systems through mathematical modeling, reinforcement learning, and field validation. We formulated a scheduling model and designed four frameworks for handling randomly arriving tasks. Framework 3, a hybrid strategy that combines periodic global planning with local task point adjustment, was selected as the optimal framework; in simulation experiments it maintained high load factors while reducing task completion times. Within Framework 3, we compared six rule-based heuristic algorithms against three reinforcement learning approaches: centralized SAC, decentralized MARL-DQN, and conventional DQN. The centralized SAC algorithm outperformed the other methods, achieving a reward of 1533.71 ± 50.09 compared with 863.67 ± 30.54 for rule-based heuristics, a 77.6% improvement. We also developed and evaluated two emergency response strategies; for emergency tasks, Strategy 2 achieved faster response times with minimal disruption to routine operations. Field trials on a 153 m physical track with four autonomous transporters validated the DQN algorithm, confirming good sim-to-real consistency. This research provides a practical solution to dynamic scheduling in hilly orchards, with measurable efficiency improvements over traditional rule-based methods.
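As a check on the reported figure, the relative improvement follows directly from the two mean rewards quoted above. This worked calculation uses the means only and ignores the reported standard deviations:

\[
\frac{1533.71 - 863.67}{863.67} = \frac{670.04}{863.67} \approx 0.776 = 77.6\%
\]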