Open Access Article

Peer-Review Record

A Study on the Speed Decision Control of Agricultural Vehicles in a Collaborative Multi-Machine Operation Scenario

Sustainability 2025, 17(10), 4326; https://doi.org/10.3390/su17104326

by Guangfei Xu¹, Jiwei Feng¹, Quanjin Wang¹, Dongxin Xu¹, Jingbin Sun¹, Meizhou Chen²,* and Jian Wu¹,*

Reviewer 1: Boyi Xiao

Reviewer 2: Jianbo Feng

Reviewer 3: Anonymous

Reviewer 4: Andres Annuk

Submission received: 24 March 2025 / Revised: 28 April 2025 / Accepted: 7 May 2025 / Published: 9 May 2025

(This article belongs to the Special Issue Sustainable Traffic Flow Management and Smart Transportation)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript proposes a speed decision control study in agricultural vehicle cooperation scenarios. Some suggestions are:

Avoid direct translation. In line 17, “reinforcement learning method” should be an “algorithm” rather than a “method”. Please check the remaining content.
For figure 1, it seems strange to present the applied methodology in the introduction section. In addition, the paragraph starting from line 104 and paragraph 119 are redundant.
In equation 15, the state space of the vehicle model is constructed, but its derivative relationship seems lost. What’s the bond with the RL algorithm as the agent only takes observations into account?
How is the RL training data generated/collected? The iteration goes up to 10000 times, hence the amount of pre-training data should be large.
org/10.1016/j.energy.2022.124105 may provide some new thoughts for the study.

Author Response

Comment 1: Avoid direct translation. In line 17, “reinforcement learning method” should be an “algorithm” rather than a “method”. Please check the remaining content.

Response 1: Thank you for your constructive advice. The phrase “reinforcement learning method” has been changed to “reinforcement learning algorithm” throughout the manuscript.

Comment 2: For figure 1, it seems strange to present the applied methodology in the introduction section. In addition, the paragraph starting from line 104 and paragraph 119 are redundant.

Response 2: Thank you for your constructive advice. Figure 1 shows the overall logical architecture and is intended to make the presentation clear. The paragraphs starting at line 104 and line 119 describe the detailed working principle of Figure 1. Following your suggestion, we moved Figure 1 and the related paragraphs to Section 4, which is a more appropriate place to present the algorithm principle and experimental logic.

Comment 3: In equation 15, the state space of the vehicle model is constructed, but its derivative relationship seems lost. What’s the bond with the RL algorithm as the agent only takes observations into account?

Response 3: Thank you for your constructive advice. The derivative term may not have been displayed clearly in the draft uploaded to the system. We have rewritten the formula to ensure that it is displayed completely. Equation 15 is used to derive the LMI-based speed tracking controller, which has no direct connection with the RL algorithm.

Comment 4: How is the RL training data generated/collected? The iteration goes up to 10000 times, hence the amount of pre-training data should be large.

Response 4: The SUMO traffic simulator enables third-party systems to achieve reinforcement learning. In this case, TraCI will play the role of a "converter" between SUMO and reinforcement learning methods to establish this interaction. TraCI is capable of retrieving every piece of information in the simulation, including vehicles and networks. Based on the observation of the state, we can set and allocate rewards accordingly, and let reinforcement learning optimize the strategy according to the rewards. Afterwards, the reinforcement learning agent will assign new actions to the SUMO through TraCI and continuously observe the environmental state. The interaction between the reinforcement learning agent and the environment through TraCI will continue until the termination state is reached or the agent meets the termination condition. Each episode of interaction stores the required vehicle status data, such as speed, acceleration and other states.

In this research, the number of iterations is set to 10000; during training, the values are averaged every 10 iterations to make training easier.
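The interaction loop and the 10-iteration averaging described in this response can be sketched as follows. This is only an illustration under stated assumptions: `StubSimulator`, `run_episode`, and `block_average` are hypothetical names, the stub replaces SUMO so that the example runs on its own, and the policy and reward are placeholders rather than the authors' DMEPPO agent. With a real SUMO installation, the stub's methods would be replaced by TraCI calls such as `traci.simulationStep()`, `traci.vehicle.getSpeed()`, and `traci.vehicle.setSpeed()`.

```python
import random


class StubSimulator:
    """Hypothetical stand-in for SUMO + TraCI: one vehicle, speed in m/s."""

    def __init__(self, horizon=50):
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.t = 0
        self.speed = 0.0
        return self.speed

    def step(self, target_speed):
        # Move the vehicle speed halfway toward the commanded speed.
        previous = self.speed
        self.speed += 0.5 * (target_speed - self.speed)
        self.t += 1
        acceleration = self.speed - previous  # speed change per step
        done = self.t >= self.horizon         # termination condition
        return self.speed, acceleration, done


def run_episode(sim, policy, reward_fn):
    """One episode: observe the state, act, collect the reward, log the states."""
    log = []
    total = 0.0
    speed = sim.reset()
    done = False
    while not done:
        action = policy(speed)
        speed, accel, done = sim.step(action)
        total += reward_fn(speed, accel)
        log.append((speed, accel))  # stored vehicle status data
    return total, log


def block_average(values, block=10):
    """Average consecutive blocks of `block` values (a trailing partial block is dropped)."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]


if __name__ == "__main__":
    rng = random.Random(0)
    sim = StubSimulator()
    policy = lambda speed: rng.uniform(1.0, 3.0)        # placeholder policy
    reward_fn = lambda speed, accel: -abs(speed - 2.0)  # placeholder reward
    returns = [run_episode(sim, policy, reward_fn)[0] for _ in range(100)]
    print(len(block_average(returns)))  # number of averaged points
```

Whether the authors average episode returns or another training quantity is not stated in the response; the sketch assumes episode returns.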

Comment 5: org/10.1016/j.energy.2022.124105 may provide some new thoughts for the study.

Response 5: Thank you for your constructive advice. We have read the whole article; it is related to our study and provides a good reference for this research. We have cited it in our manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

The introduction is written in a chaotic manner without a clear logic chain, the majority of this part focuses on speed control, then what about the more important concept of "speed decision"? Besides, the authors claim that the present body of knowledge seldom considers uncertainties and disturbances, which deviates greatly from the reality.
The explicit contributions or innovations of this research need to be presented in the section of introduction or conclusions.
The method of dealing with variations of friction and other uncertainties is quite odd by the simple transformation to the uncertainty of tire cornering stiffness, please elaborate on that issue. In addition, how is that uncertainty tackled in both the controller design and the simulation test?
There are some peculiar expressions of the terms, for instance, the authors use "lateral deflection the rigidity" instead of "cornering stiffness", and "transverse pendulum" for a simple "yaw".
The explanation of Fx is missing, the longitudinal forces on the four tires are identical?
The setting of reward of operational efficiency is quite vague in itself, the statement is somewhat paradoxical, please elaborate on equation (33).
The authors claim to emphasize operational efficiency, yet they put more weight on fuel consumption, see the section after (37).
A major concern about the research is the neglect of correlation of the agricultural vehicle with the co-working machines, the whole process of decision making and controller design is completely concentrated on the vehicle, what is the effect and meaning of "Collaborative Multi-Machine Operation"? Is it not the kind of vehicle platoons or vehicle formation control? How is it reflected in the controller design?
The presentation of the results is quite obscure; is it possible to transform the distribution-type plots into variation-versus-time plots? This would help readers to fully understand the results.

Author Response

Comment 1: The introduction is written in a chaotic manner without a clear logic chain, the majority of this part focuses on speed control, then what about the more important concept of "speed decision"? Besides, the authors claim that the present body of knowledge seldom considers uncertainties and disturbances, which deviates greatly from the reality.

Response 1: Thank you for your constructive advice. The main idea of this manuscript is to make a speed decision that forms the speed tracking target. Therefore, both speed decision-making and speed control are important. In the introduction, the claim that the present body of knowledge seldom considers uncertainties and disturbances may not have been appropriate, so we have revised it. Finally, we have rearranged the introduction to make it more logical.

Comment 2: The explicit contributions or innovations of this research need to be presented in the section of introduction or conclusions.

Response 2: Thank you for your constructive advice. The explicit contributions and innovations of this research have been summarized and added to the Introduction.

Comment 3: The method of dealing with variations of friction and other uncertainties is quite odd by the simple transformation to the uncertainty of tire cornering stiffness, please elaborate on that issue. In addition, how is that uncertainty tackled in both the controller design and the simulation test?

Response 3: Thank you for your review. In this research, the agricultural vehicle mainly works in complex field conditions. Current tire models show a certain degree of inaccuracy, mainly due to unconsidered nonlinearities and disturbances that affect tire operating conditions. Factors such as tread depth, inflation pressure, tire temperature, and road surface conditions significantly impact the force and torque characteristics of tires. These factors can change considerably during tire operation and notably affect tire and vehicle performance. Tire cornering stiffness is a key parameter used to describe the lateral force of the tire. It changes with a series of factors such as the current tire slip angle, tire pressure, vertical load, and road friction coefficient, so it is uncertain. If the tire cornering stiffness is taken as a constant, there will be a large error relative to the actual behavior. Therefore, the uncertainty is lumped into the tire cornering stiffness.

In the controller design, based on the built control model including uncertainty of tire cornering stiffness as:

The closed-loop system is guaranteed to be stable under uncertainty by means of LMI constraints. For a given positive ρ, the above closed-loop system is asymptotically stable and satisfies the evaluation index if and only if there exists a symmetric positive definite matrix P satisfying the following inequality:

Then, use convex optimization tools (such as MATLAB's LMI Toolbox or YALMIP) to solve the optimization problem.

In the simulation test, the uncertainty can be present in complex situations. Then the designed LMI controller can suppress the influence of uncertainties as much as possible and achieve precise speed control.
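The uncertain control model and the inequality referred to in this response are not reproduced in this record. As a generic illustration only, and not necessarily the authors' condition, a standard quadratic-stability test can be written for closed-loop dynamics $\dot{x} = A_{cl}(\Delta C_f)\,x$, where the cornering stiffness is the nominal value plus an uncertain term, $C_f = C_{0f} + \Delta C_f$ (the nominal value is written "Cof" in the authors' Round 2 reply). The matrix $A_{cl}$ and the bound $\bar{\Delta}$ with $|\Delta C_f| \le \bar{\Delta}$ are assumed symbols. If $A_{cl}$ depends affinely on $\Delta C_f$, it is enough to check the two extreme values:

```latex
\begin{aligned}
& V(x) = x^{\top} P x, \qquad P = P^{\top} \succ 0, \\
& A_{cl}(\Delta C_f)^{\top} P + P\, A_{cl}(\Delta C_f) \prec 0
  \quad \text{for } \Delta C_f \in \{-\bar{\Delta},\; +\bar{\Delta}\}.
\end{aligned}
```

Because the left-hand side is affine in $\Delta C_f$ for a fixed $P$, satisfying the inequality at both endpoints implies that it holds over the whole interval, which is what makes the problem tractable for LMI solvers.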

Comment 4: There are some peculiar expressions of the terms, for instance, the authors use "lateral deflection the rigidity" instead of "cornering stiffness", and "transverse pendulum" for a simple "yaw".

Response 4: Thank you for your review. The manuscript was written jointly by all the authors, so the wording of some technical terms differed between parts. We have checked the whole manuscript to unify the terminology.

Comment 5: The explanation of Fx is missing, the longitudinal forces on the four tires are identical?

Response 5: Thank you for your review. The explanation may have been lost while the manuscript was being uploaded to the system; we have checked and made sure that the explanation of Fx appears at the corresponding positions. To simplify the model and make the controller easy to derive, the longitudinal forces are set to be identical.

Comment 6: The setting of reward of operational efficiency is quite vague in itself, the statement is somewhat paradoxical, please elaborate on equation (33).

Response 6: Thank you for your review. Agricultural vehicles have an optimal operating speed range. Within this range, the reward first increases and then decreases with the operating speed. When the speed is lower than the optimal operating speed, the reward increases as the operating speed increases. When the speed is higher than the optimal operating speed, the reward decreases as the operating speed increases. The speed change should also be as smooth as possible to ensure good operational efficiency. To make this clear, the function is rewritten in a clearer form:

The general relationship between rewards and speed can be shown in the following figure.

The relevant statement has been added to the manuscript.
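The rewritten equation (33) and its figure are not reproduced in this record. The shape described in the response (rising below the optimal speed, falling above it, with a preference for smooth speed changes) can be illustrated with a hypothetical piecewise-linear reward. The function names, the triangular shape, the penalty gain, and the example speeds are all assumptions made for this sketch, not the authors' formula.

```python
def efficiency_reward(speed, v_min, v_opt, v_max):
    """Triangular reward: 0 at the range limits, 1 at the optimal speed."""
    if speed <= v_min or speed >= v_max:
        return 0.0
    if speed <= v_opt:
        # Below the optimum the reward grows with speed.
        return (speed - v_min) / (v_opt - v_min)
    # Above the optimum the reward shrinks with speed.
    return (v_max - speed) / (v_max - v_opt)


def smoothness_penalty(speed, previous_speed, gain=0.1):
    """Penalty proportional to the speed change between two decisions."""
    return gain * abs(speed - previous_speed)


if __name__ == "__main__":
    # Illustrative range in m/s; these numbers are not from the paper.
    for v in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(v, efficiency_reward(v, 1.0, 2.0, 3.0))
```

A smooth bell-shaped curve would serve equally well; the triangular form is chosen here only because it makes the "increase then decrease" behavior easy to read.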

Comment 7: The authors claim to emphasize operational efficiency, yet they put more weight on fuel consumption, see the section after (37).

Response 7: Thank you for your review. The speed decision-making strategy considers operational efficiency, fuel consumption, safety, and smoothness to decide the final speed. The weights between them can be adjusted according to production needs. In this research, fuel consumption received more attention. However, this is a good reminder for us to carry out further research on other weight combinations in order to find more operation modes.
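The adjustable weighting described in this response can be sketched as a normalized weighted sum of the four reward terms. The helper name `total_reward`, the dictionary keys, and all numeric values are hypothetical; the weights actually used after equation (37) are not given in this record.

```python
def total_reward(components, weights):
    """Weighted sum of named reward terms; weights are normalised to sum to 1."""
    if set(components) != set(weights):
        raise ValueError("components and weights must use the same keys")
    scale = sum(weights.values())
    if scale <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(weights[k] / scale * components[k] for k in components)


if __name__ == "__main__":
    # Made-up per-step reward terms, each assumed to lie in [0, 1].
    step_rewards = {"efficiency": 0.8, "fuel": 0.4, "safety": 1.0, "smoothness": 0.9}
    fuel_first = {"efficiency": 1.0, "fuel": 2.0, "safety": 1.0, "smoothness": 1.0}
    efficiency_first = {"efficiency": 2.0, "fuel": 1.0, "safety": 1.0, "smoothness": 1.0}
    print(total_reward(step_rewards, fuel_first))
    print(total_reward(step_rewards, efficiency_first))
```

Switching between the two weight sets changes which behavior the agent is rewarded for most, which is the sense in which different weight combinations correspond to different operation modes.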

Comment 8: A major concern about the research is the neglect of correlation of the agricultural vehicle with the co-working machines, the whole process of decision making and controller design is completely concentrated on the vehicle, what is the effect and meaning of "Collaborative Multi-Machine Operation"? Is it not the kind of vehicle platoons or vehicle formation control? How is it reflected in the controller design?

Response 8: Thank you for your review. In designing the speed decision, the correlation between the position and speed of the current operating vehicle and those of the surrounding operating vehicles was considered, and the reward design for reinforcement learning was carried out on that basis. The specific correlation can be seen in Table 2. Through the operational efficiency, fuel consumption, safety, and smoothness terms mentioned above, "Collaborative Multi-Machine Operation" is reflected in the speed decision-making stage.

It is similar to vehicle platoon or vehicle formation control, but the driving environment is different, and the speed decision-making is oriented more toward field work. Based on the field working characteristics of agricultural vehicles, the speed decision-making model is trained with operational efficiency, fuel consumption, safety, and smoothness in mind, and the speed tracking controller is designed with the complex field environment in mind.

Comment 9: The presentation of the results is quite obscure; is it possible to transform the distribution-type plots into variation-versus-time plots? This would help readers to fully understand the results.

Response 9: Thank you for your advice. In fact, the number of training iterations and the elapsed training time are positively correlated, and their variation patterns are essentially the same. The training time may vary between devices, whereas the relationship between the variable distribution and the number of training iterations does not change significantly with the equipment, so it is highly reproducible. We will carefully consider your suggestion, which will help us present the experimental results more clearly and completely.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Please see the attachment for specific comments

Comments for author File: Comments.pdf

Comments on the Quality of English Language

English writing and expression are not professional enough and need to be improved

Author Response

Comment 1: The abstract is missing experimental data results and conclusions.

Response 1: Thank you for your constructive advice. Some experimental data results and conclusions have been added to the abstract.

Comment 2: The language expression needs improvement. For example, L514: "From the simulation outcomes, the following points can be inferred" should be changed to "From the simulation results". L519-L520: "The system has comprehensive performance such as..". L153: "StrategyGradient (PG)" should be corrected to "Strategy Gradient (SG)". L346: "two concealed layers, and an output layer, Each of the hidden layers contains 128". Should it be "concealed layer" or "hidden layer" explicitly? L390-L391: "the speed decision model is trained to obtain the speed decision model;" there is repetitive expression.

Response 2: Thank you for your constructive advice. We have checked the whole manuscript, and the language has been polished by experts who are native English speakers.

Comment 3: The content of Chapter 3 is too detailed, and the content of Section 4.1 in Chapter 4 belongs to the experimental method and should not be included in the results section. It is recommended to write it separately.

Response 3: Thank you for your constructive advice. We have checked the whole content of Chapters 3 and 4. The content of Chapter 3 has been simplified to improve readability, and Section 4.1 has been separated into a section of its own.

Comment 4: L396: "double-shifted shaped paths" is inconsistent with the title of Figure 2 "Double lane change path". Please unify the expression.

Response 4: Thank you for your constructive advice. We have checked the whole content and unified the expression as "double-shifted shaped paths".

Comment 5: L404: There is no need to specially mark when there is only one subsection (1).

Response 5: Thank you for your constructive advice. We have adjusted the content and removed the subsection marker (1).

Comment 6: L408-L410: "with the shaded regions representing the minimum and maximum outcomes from the training process with different random seeds." Does "the shaded regions" refer to the light green shaded areas in the figure? What is this information intended to explain?

Response 6: Thank you for your constructive advice. "The shaded regions" do refer to the light green shaded areas in the figure. The dark lines represent the average of two experiments with different random seeds, while the shaded parts above and below represent plus and minus one standard deviation. The average line therefore always runs through the vertical middle of the shaded region.
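The band described in this response can be computed per training iteration from the runs with different seeds. The sketch below is an assumption about how such a plot is typically built, not the authors' plotting code; `mean_and_band` is a hypothetical helper, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def mean_and_band(runs):
    """Per-iteration mean with a +/- one standard deviation band.

    `runs` is a list of equally long score sequences, one per random seed.
    """
    if len({len(r) for r in runs}) != 1:
        raise ValueError("all runs must have the same length")
    centre, lower, upper = [], [], []
    for scores in zip(*runs):
        m = mean(scores)
        s = pstdev(scores)  # population standard deviation across seeds
        centre.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return centre, lower, upper


if __name__ == "__main__":
    seed_a = [1.0, 2.0, 4.0]  # made-up reward scores
    seed_b = [3.0, 2.0, 5.0]
    print(mean_and_band([seed_a, seed_b]))
```

With exactly two runs, the population standard deviation equals half the gap between them, so the band edges coincide with the minimum and maximum of the two runs. That would reconcile the manuscript wording quoted by the reviewer with the description in the response.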

Comment 7: L410-L411: "At the same time, a reward score is set to represent the result of the intelligent body interacting with the dynamic environment [35-36]." No need to add references.

Response 7: Thank you for your constructive advice. We have adjusted the relevant content and removed references [35-36].

Comment 8: In Figures 9-13, the meaning of the vertical axis is unclear. Are there units for the values? How is the distribution information expressed? Please supplement the explanation in the text.

Response 8: Thank you for your constructive advice. Figures 9-13 are kernel density estimation (KDE) plots, which show the distribution characteristics of the data. KDE estimates the probability density function of a random variable with a smooth curve. The interpretation of a kernel density plot relies mainly on the shape and distribution of the curve, which represents the estimated probability density of the data: the higher the curve, the higher the density of data points at that position. The values therefore have no units. The relevant content has been supplemented in the text.
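A kernel density estimate of the kind described here can be sketched in a few lines. The Gaussian kernel, the fixed bandwidth, the helper name `gaussian_kde`, and the sample values are assumptions for illustration; the response does not say which kernel or bandwidth was used for Figures 9-13.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated probability density at x."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


if __name__ == "__main__":
    speeds = [1.8, 1.9, 2.0, 2.0, 2.1, 2.6]  # made-up values
    pdf = gaussian_kde(speeds, bandwidth=0.2)
    # Riemann sum over a wide grid approximates the area under the curve.
    step = 0.01
    area = sum(pdf(i * step) * step for i in range(500))
    print(pdf(2.0), pdf(3.5), area)
```

The curve is highest where the samples cluster, and the area under it is approximately one, which is what allows the vertical axis to be read as a density rather than a count.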

Comment 9: The titles of Figures 11 and 12 should be consistent.

Response 9: Thank you for your constructive advice. The titles of Figures 11 and 12 have been rewritten in the same form.

Comment 10: Sections 4.1 and 4.2 are too brief in the analysis of the experimental results, only describing the results in the figures, lacking exploration of the underlying reasons for the results.

Response 10: Thank you for your constructive advice. The experimental results in Sections 4.1 and 4.2 have been analyzed, and some exploration of the underlying reasons for the results has been added.

Comment 11: Figures 16-18. The horizontal axis of the figure is time, and the vertical axis is acceleration, which fails to reflect different speeds, only showing the trend of changes in the variables of Acceleration, Power consumption, and Driving torque under different methods of MPC, LMI, and target. Please modify.

Response 11: Thank you for your constructive advice. Figure 15 shows the variation of speed over time for the MPC controller, the LMI controller, and the target. Figures 16-18 mainly show other aspects of controller performance during speed tracking control, namely acceleration, power consumption, and driving torque. We have tried to modify the relevant statements, and your advice gives us inspiration for presenting future results.

Comment 12: Some references (such as [1,4,5,8,16, 21,23, 28, 31, 32]) are too old and need to be updated. The reference format is not uniform, and the positions of the years are inconsistent. Please unify the format.

Response 12: Thank you for your constructive advice. We have modified and updated all old references (such as [1,4,5,8,16, 21,23, 28, 31, 32]). Moreover, we have checked up the format of all references to make it uniform.

Comment 13: The introduction section is too long and is recommended to be condensed.

Response 13: Thank you for your constructive advice. We have modified the Introduction section to make it simpler and clearer.

Reviewer 4 Report

Comments and Suggestions for Authors

The strength of the article is its evaluation of the autonomous movement control of agricultural machinery. The article presents theoretical issues well, but the connection between theory and practice is weak. The description of the experiments is a positive.

The abbreviation “PPO” is used in the keywords in row 25. It's not eligible to be used in keywords as an abbreviation.
Figure 1. Is it the author's creation? If not, when does it need to add a citation?
All formulas in the article. Is it the author's creation? If not, when does it need to add a citation?
A short overview of the article's structure might be included at the end of the introduction.
Chapter 1 finishes with a formula without any explanations. It seems that these formulas do not directly relate to the article's content.
Formulas 1-7 are pretty sophisticated, so they need to explain their physical essence.
In Table 2, describe the parameters. To what formula do these belong?
Figure 3. What does the score mean?
What is the footstep of iterations?
In conclusion, a review is waiting for some numerical results.
What are the uncertainties of measurement devices and their label data and producers?

Author Response

Comment 1: The abbreviation “PPO” is used in the keywords in row 25. It's not eligible to be used in keywords as an abbreviation.

Response 1: Thank you for your advice. We have replaced the keyword “maximum entropy-constrained PPO” with “maximum entropy-constrained”.

Comment 2: Figure 1. Is it the author's creation? If not, when does it need to add a citation?

Response 2: Thank you for your constructive advice. Figure 1 is our own creation, and we had put it in the wrong place. Figure 1 shows the overall logical architecture and is intended to make the presentation clear. The paragraphs starting at line 104 and line 119 describe the detailed working principle of Figure 1. Following your reminder, we moved Figure 1 and the related paragraphs to Section 4, which is a more appropriate place to present the algorithm principle and experimental logic.

Comment 3: All formulas in the article. Is it the author's creation? If not, when does it need to add a citation?

Response 3: Thank you for your constructive advice. Some of the formulas are derived from other theories, and some formulas may have lacked citations. We have checked the whole manuscript and added citations to the relevant formulas.

Comment 4: A short overview of the article's structure might be included at the end of the introduction.

Response 4: Thank you for your constructive advice. We have added a short overview of the article's structure at the end of the introduction.

Comment 5: Chapter 1 finishes with a formula without any explanations. It seems that these formulas do not directly relate to the article's content.

Response 5: Thank you for your constructive advice. Chapter 1 mainly presents the preliminaries that are referred to and applied in this research. The subsequent decision-making strategy is derived from the relevant theory in Chapter 1.

Comment 6: Formulas 1-7 are pretty sophisticated, so they need to explain their physical essence.

Response 6: Thank you for your constructive advice. Formulas 1-7 are some basic theoretical formulas for reference in the derivation of this article, and their physical meanings need to be explained according to the specific application scenarios. Overall, they are the expressions of two classic reinforcement learning algorithms, mainly the mathematical manifestations of the update methods of reinforcement learning strategies when agents interact with the environment in real time.

Comment 7: In Table 2, describe the parameters. To what formula do these belong?

Response 7: Thank you for your constructive advice. The parameters are described in the table under "Input meaning". They are used in the reinforcement learning training and testing process to express the relationship between the agent and the surrounding vehicles. They do not belong to a theoretical formula.

Comment 8: Figure 3. What does the score mean?

Response 8: Thank you for your constructive advice. The score refers to the reward score of the agent during the training process, which is used to indicate whether the agent has achieved the goal in accordance with the designed requirements.

Comment 9: What is the footstep of iterations?

Response 9: Thank you for your constructive advice. In reinforcement learning, the footstep of iterations refers to the step-by-step process through which an algorithm updates its policy or value function to improve performance. The exact steps vary depending on the algorithm (e.g., value-based, policy-based, or actor-critic methods), but most RL iterations follow a core loop of interaction, evaluation, and improvement.
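The core loop of interaction, evaluation, and improvement mentioned in this response can be illustrated with a deliberately small tabular example. It is not PPO and not the authors' algorithm: the one-state problem, the epsilon-greedy exploration, the function name `train`, and the reward values are all assumptions chosen to keep the sketch short.

```python
import random


def train(reward_of, actions, iterations=2000, epsilon=0.1, step_size=0.1, seed=0):
    """Tabular value learning on a one-state problem.

    Each iteration: interact (pick an action, receive a reward),
    evaluate (update the value estimate), improve (act greedily next time).
    """
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    for _ in range(iterations):
        # Interaction: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=value.get)
        reward = reward_of(action)
        # Evaluation: move the estimate toward the observed reward.
        value[action] += step_size * (reward - value[action])
        # Improvement is implicit: the next greedy choice uses the updated values.
    return value


if __name__ == "__main__":
    # Hypothetical rewards for three speed choices.
    rewards = {"slow": 0.2, "optimal": 1.0, "fast": 0.5}
    learned = train(rewards.get, list(rewards))
    print(max(learned, key=learned.get))
```

Policy-gradient and actor-critic methods replace the value table with parameterized networks and the greedy step with a gradient update, but each iteration still follows the same three stages.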

Comment 10: In conclusion, a review is waiting for some numerical results.

Response 10: Thank you for your constructive advice. Some numerical results have been added to the conclusion.

Comment 11: What are the uncertainties of measurement devices and their label data and producers?

Response 11: Thank you for your constructive advice. Measurement devices and their labeled data are critical in fields like industrial automation, robotics, healthcare, and scientific research. However, uncertainties can arise from various sources, affecting accuracy, reliability, and decision-making. The uncertainties in measuring equipment, label data, and manufacturers mainly stem from three aspects: equipment errors (such as calibration drift, noise, and environmental interference), label issues (manual annotation errors, mislabeling due to sensor noise, dataset bias), and production defects (ambiguous specifications, inconsistent quality control, calibration flaws). These uncertainties affect data reliability and system performance, which need to be controlled through regular calibration, data validation, redundant design, and third-party testing to ensure measurement accuracy and model robustness.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The paper is mildly modified from the last revision. Several issues remain unsolved. For instance, the correlation of the ego vehicle with environmental machines is not reflected in the paper, a figure of the relation will be preferable, in addition, the relation is not explicitly shown in the speed decision-making process, the same also applies to the simulation verification section; the representation of friction variation with the uncertainty of cornering stiffness is questionable, as it deviates from the basic principles of vehicle dynamics theory; in line 326, the expression contradicts the authors' reply.

Some minor issues include that, (1) the authors claim this paper is centered on vehicle platoons-like concept, yet the components such as information flow topology and distributed controllers for formation/platoon vehicles are not seen in the paper; (2) the identical longitudinal force on the four tires can cause braking in steering maneuvers, which is quite inapplicable in reality.

Author Response

Comment: The paper is mildly modified from the last revision. Several issues remain unsolved. For instance, the correlation of the ego vehicle with environmental machines is not reflected in the paper, a figure of the relation will be preferable, in addition, the relation is not explicitly shown in the speed decision-making process, the same also applies to the simulation verification section; the representation of friction variation with the uncertainty of cornering stiffness is questionable, as it deviates from the basic principles of vehicle dynamics theory; in line 326, the expression contradicts the authors' reply.

Response: Thank you for your patient and fair suggestions.

In Section 3 (Speed Decision and Control Design) of the paper, a DMEPPO algorithm based on dynamic environmental information is proposed. Its input state variables (Table 2) include the speeds of surrounding vehicles (such as the real-time speeds of vehicles in front, behind, and on the left and right sides) and distance information.

These input parameters directly reflect the interaction relationship between the main vehicle and the environmental vehicles.

Furthermore, the safety distance (Equation 36) and energy efficiency (Equation 35) are considered in the reward function, and the constraint conditions of multi-vehicle collaboration are implicitly included.

To demonstrate the correlation more intuitively, we have added the following content. Figure supplement: the "Environmental Vehicle Status Input" module has been added to the speed decision framework diagram (Figure 1), and the information flow of multi-vehicle collaboration (such as speed and distance) is marked.

Simulation verification: in the experimental results section (Section 5), supplementary explanations are provided on how the dynamic environment is set up and run in the test scenarios, such as how the main vehicle adjusts its speed based on the status of surrounding vehicles when simulating multi-vehicle collaborative operations.

In Equation 9 of the paper, the tire cornering stiffness is modeled as the nominal value plus an uncertain term (Cf = Cof + ΔCf). This assumption is based on the fact in vehicle dynamics that the cornering stiffness is affected by factors such as the road adhesion coefficient and load changes (as in reference 28).

In the modeling section (Section 2), we have added that the uncertainty of the cornering stiffness mainly stems from equivalent stiffness changes caused by the complex field environment (such as soil moisture and tire wear), rather than being directly linked to the coefficient of friction.

Line 326 of the text concerns the weight allocation of the reward function. In this study, energy conservation was given priority and was therefore assigned a higher weight. However, there were clerical errors in the wording of the main text, which we have corrected.

The research focus of this paper is on the hierarchical decision-making control of a single vehicle (decision-making layer + tracking layer), rather than the collaborative control of multiple vehicle fleets.

However, the mention of "collaborative multi-machine operation scenarios" in the introduction may cause misunderstandings. The research scope is the autonomous decision-making of individual vehicles in a dynamic environment, rather than queue control in the strict sense. In the discussion section (Section 6), it is pointed out that the current work has laid the foundation for single-vehicle control, and in the future, it can be expanded to a multi-vehicle information flow topology and distributed control.

In this study, agricultural vehicles mainly travel in a straight line, and the longitudinal forces of the four wheels differ only slightly, so they can be considered equal. When the direction changes, however, the longitudinal forces of the four wheels change accordingly and are no longer equal. The last reply mainly considered the straight-line driving scenario. In the actual implementation of this research, the longitudinal forces of the four wheels vary according to the model.
