Article
Peer-Review Record

An Evaluation Framework and Algorithms for Train Rescheduling

Algorithms 2020, 13(12), 332; https://doi.org/10.3390/a13120332
by Sai Prashanth Josyula *, Johanna Törnquist Krasemann and Lars Lundberg
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 October 2020 / Revised: 4 December 2020 / Accepted: 7 December 2020 / Published: 11 December 2020
(This article belongs to the Special Issue Algorithms in Decision Support Systems)

Round 1

Reviewer 1 Report

The authors have addressed all my previous comments and the manuscript has been improved.

There are a few minor issues that should be addressed before the paper is published:

  • It is frequent in the manuscript to find consecutive sentences that end with the same citation. Please reduce the number of these repetitive citations to ease reading.
  • Table 3 appears twice.
  • Line 307: I think "log train delays" should be "long train delays".
  • As the algorithms are part of the contributions of the paper, I recommend extracting their explanation into a separate section to highlight the contribution.
  • Please include a clearer definition of the extensions included in ALG2.

Author Response

Please see the attached rejoinder.

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear Authors

I rarely read a paper that is so well written and ready to publish. This is essentially an "accept" judgement with minor comments.

On lines 505 and 506 I believe the correct word is "recommendation" where you write "advice".

Lines 570 to 574: I am actually not surprised that TFD and TAD are correlated. Please cite the reference below and mention that in many scenarios TFD and TAD are directly correlated.

Harrod, S., Cerreto, F., & Nielsen, O. A. (2019). A closed form railway line delay propagation model. Transportation Research Part C: Emerging Technologies, 102, 189–209.

Thanks for a very interesting paper.

Regards

Reviewer

Author Response

Please see the attached rejoinder.

Author Response File: Author Response.pdf

 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

 

Round 1

Reviewer 1 Report

The main topic of this paper is the development and evaluation of a framework that can be used to compare algorithms for train rescheduling.
Indeed, the authors list their contributions as:
(1) propose an evaluation framework for train rescheduling algorithms,
(2) improve two existing train rescheduling algorithms: a heuristic and a MILP-based exact algorithm,
(3) conduct an experiment to compare the two multi-objective algorithms using the proposed framework (a proof-of-concept).

The main objective of the paper, arriving at a unified framework for comparing the wealth of algorithms in this area, is indeed very important.
The main contribution, in my opinion, is the framework. Indeed, the alterations to the existing algorithms are very minor: e.g., for ALG2 a lexicographic multi-objective approach is taken rather than a single-objective approach. For the other algorithm the changes are only slightly more extensive.

The paper starts off very promisingly, and I find Sections 1, 2 and 3 particularly well written.
For Sections 4 and 5, which describe a framework to classify the models and to measure their quality, I have mostly very minor remarks. The main question that stuck with me, though, is whether the authors claim to present an exhaustive framework that will be able to classify all possible problems, or whether they aim to present a framework that focuses on the key aspects of algorithm characteristics and qualities. While the first seems hard to claim given the general differences between cases/applications, the second would require some motivation: why were these characteristics selected? These quality measures? Are they all important for evaluating algorithms?
The paper would greatly improve in value if motivation were indeed provided for the framework.
I would personally lean towards selecting fewer items if the framework is indeed proposed to be used henceforth by the larger community.

Sections 6, 7 and 8 are in such stark contrast with the message of S1-S5 that I came close to recommending rejection of the paper.
Section 6 is titled "Application of the framework for evaluation and comparison of algorithm performance". However, it does not refer to the framework at all, but rather provides an abstract summary of the workings of the two selected algorithms and of improvements that are hardly, if at all, linked to the framework.
On closer inspection I found that the original versions of the two algorithms discussed by the authors are actually described according to the framework in Table 3, Section 4, although the changes made to the algorithms and their consequences (if any) for the description in the framework are not discussed.

The main purpose of the framework is to compare algorithms. A natural comparison would also have been between the improved versions of the algorithms and the results of the original algorithms; this is, however, not included. Therefore, the benefit of the alterations remains unclear.
Secondly, the discussion of results in S7 does not follow the framework of S5 in an orderly fashion. The Quality Indicators in S7 are presented in a different order, and sometimes under different headings, than in S5. I see no reason for this difference.

The application of the QIs of S5 leads to an 8-page results section in which I personally find it hard to keep an overview of the relative performance of the two algorithms on the many different QIs included in the performance framework. That said, ALG2 often seems to outperform ALG1 on all of these, leading one to wonder whether they are all essential.
Part of the reason this section is long and somewhat cumbersome is:
1) a somewhat artificial split in discussing Scenarios 1-20 and 21-30 separately; scenarios 1-10 are actually also different from 11-20, so why not discuss S21-30 immediately as well?
2) There are many metrics, some of them close: are they all essential?
3) The choice of the authors to provide results for each scenario independently, without presenting aggregate numbers, makes for a somewhat cumbersome read.
4) The lengthy presentation (8 pages) makes one doubt whether the suggested framework is workable (even from a research perspective), especially when one wants to compare more than 2 algorithms.
5) Given the objective of the framework, I am personally missing the comparison between the "improved" algorithms and the original ones, which would demonstrate that the alterations indeed led to improvement.


Therefore, I have come to the recommendation of a major revision in which the authors focus mainly on restructuring and rewriting Sections 6 and 7 (and 8 to some extent), including the required changes in S4/S5, to arrive at a framework that is well motivated, demonstrated in a structured way, and indeed allows for easy and insightful comparison of multiple algorithms.

 

Minor remarks:

Table 2, page 6:
1) often "etc" is used. This seems out of place in a framework that is intended to be used in general for comparison. Is "etc" needed?
2) Degree of freedom: this could also be interpreted as part of the solution space that is being explored by the algorithm. Please consider rephrasing
3) "main ideas of the approach": the description here all has to do with the objective function. Also, what is a "fixed" vs "flexible" objective function?

Page 7, "special considerations". The text mentions "several core constraints". Could you include a list of these, or a reference to these? So that there is no ambiguity about the list. List could also appear in appendix.

Table 3, page 8 (and some others): I often have difficulty distinguishing where one column ends and the next starts, and likewise for rows that span multiple lines. Maybe the typesetting can be improved?

Table 3, page 9, row "applicable scenarios": the second two columns contain "etc.", while the first column contains all the same words apart from "etc." This makes the difference unclear (or is it a typo?).

Page 10: is TOPSIS really required for the expression of the QIs in Table 4? One may not be able to compute the ideal solution for every case...
Page 10: could you include motivation for these QIs? Is it an exhaustive list? A selection? How was it constructed?
Page 10, Table 4: "delays" includes "passenger delays" but not delays for freight trains. Is this correct?

Page 11, point 5 "freight trains": "Freight trains often ... of schedule." It is not entirely clear what point is made here. Is it that the on-time performance of freight trains is less important than the travel time? At the same time, you later actually argue for the problems of freight trains that arrive too early, and I assume similar problems (conflicts with resource availability for handling the train) equally exist for late arrivals...
Moreover, as for additional stops: don't these also impact costs and energy consumption?

Page 11, point 6: is the specific visualization of this metric important for the framework? And what features are on the axes?


Section 6: the title is misleading.
Also, please start by stating whether Table 3 gives an accurate description of the improved algorithms' properties (e.g., link it clearly to the framework).


Page 16, Listing 6.2.2:
It is not clear what the value of this code is, as the function "setObjectiveN()" is neither included nor explained; moreover, the concept is well explained in the text, and I do not see what the listing adds. Rather, include actual pseudocode to demonstrate the process.
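For context, setObjectiveN() is the gurobipy routine for attaching several objectives to one model; in hierarchical (lexicographic) mode, Gurobi optimizes the highest-priority objective first and then each lower-priority objective without degrading the earlier results (within the given tolerances). A minimal, self-contained sketch of that mechanism; the model, variables and objective expressions here are illustrative stand-ins, not the paper's actual formulation:

```python
import gurobipy as gp
from gurobipy import GRB

# Toy model; x and y stand in for the rescheduling decision variables.
m = gp.Model("lexicographic_demo")
x = m.addVar(ub=10, name="x")
y = m.addVar(ub=10, name="y")
m.addConstr(x + y <= 12, name="capacity")

m.ModelSense = GRB.MAXIMIZE

# Hierarchical (lexicographic) multi-objective: the objective with the
# highest priority is optimized first; the lower-priority objective is
# then optimized without degrading the higher-priority optimum.
m.setObjectiveN(x, index=0, priority=2, name="primary")
m.setObjectiveN(y, index=1, priority=1, name="secondary")

m.optimize()
print(f"x = {x.X:.1f}, y = {y.X:.1f}")  # expected: x = 10.0, y = 2.0
```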


Page 16: the TOPSIS approach is discussed, stating that solutions should preferably be "as close as possible" to the ideal solution. However, the "location" is defined in multiple dimensions (QIs): how do you express the overall distance to the ideal solution?
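For reference, standard TOPSIS collapses the multi-dimensional location into a scalar by taking the Euclidean distance from each solution's weighted, normalized QI vector to the ideal and anti-ideal points and ranking by relative closeness. A minimal sketch with made-up numbers, assuming all QIs are cost-type (lower is better):

```python
import numpy as np

# Rows: candidate solutions; columns: quality indicators (QIs).
# Illustrative values only.
scores = np.array([
    [120.0, 4.0, 30.0],
    [150.0, 2.0, 25.0],
    [100.0, 5.0, 40.0],
])
weights = np.array([0.5, 0.3, 0.2])  # assumed QI weights

# Vector-normalize each QI column, then apply the weights.
norm = scores / np.linalg.norm(scores, axis=0)
weighted = norm * weights

# For cost criteria, the ideal point is the column-wise minimum
# and the anti-ideal point the column-wise maximum.
ideal = weighted.min(axis=0)
anti_ideal = weighted.max(axis=0)

# Euclidean distance across all QI dimensions collapses the
# multi-dimensional "location" into one scalar per solution.
d_pos = np.linalg.norm(weighted - ideal, axis=1)
d_neg = np.linalg.norm(weighted - anti_ideal, axis=1)

closeness = d_neg / (d_pos + d_neg)  # in [0, 1]; higher is better
print("Relative closeness:", closeness.round(3))
```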

S7: could you consider presenting the results in a more aggregated form, e.g., a single number per metric for each algorithm, or at least an average over each group of 10 scenarios (1-10, 11-20, 21-30)? This would make it easier to obtain an overview.

Page 17: Figures 3 and 4 do not directly follow the framework (which, e.g., does not mention a focus on early trains, but rather mentions delayed trains, which only appear a whole page later).

Page 17, S7.1, first sentence "in this ... and discussed": however, you have already started this discussion with Figures 3 and 4 and the preceding text.

Table 9: could results be presented at a more aggregate level, e.g., the total number of delayed trains (as a percentage), so that they are easier to compare?
Table 10: the light grey is difficult to see in print.

Page 19, before Section 7.1.1, "from the results... secondary delay": could you please start with this and a matching table, i.e., start with the overview and then go into details?

Section 7.1.1:
- as the framework does not use a "passenger perspective", I see no reason to separate it here; it makes it more difficult to follow the match between framework and results
- "An increased... increased boarding times." This has already been discussed; furthermore, general discussion of the metrics belongs in S5.
- (page 20, last sentence) "from a passenger perspective". What about dispatchers?

Section 7.1.2
- The first paragraph mentions a problem with early arrivals, while earlier in the paper travel time was deemed more important than timeliness (or so it seemed). Please make this consistent, also with the later sentence on this page, "however, they prefer... unplanned stops".
- Also, Figure 5 shows early departures and NOT early arrivals. I found this confusing, as the importance of on-time arrivals for freight is mentioned in the preceding text.


Section 7.2.1
- repetition: "the algorithm ... by the disturbance"
- "comparable to the above numbers": unclear; please rephrase.

Section 7.2.2
Table 19: why is ALG2 not included? Are all its values 0?

Section 7.3
Why present, and focus the discussion on, the time window of 1 h when all results are for 1.5 h?

Section 8:
I miss some reflection on the framework.

Reviewer 2 Report

Dear Authors

This is a needed paper, and it just needs some adjustment and clarity of subject. The paper starts strong, with a clear objective, but towards the end I feel lost and lose the thread of the starting argument.

Your objective is to document and demonstrate assessment methods for train scheduling algorithms, but then you include point (2) "Improve two existing train rescheduling algorithms...", and I don't see how that supports the paper topic. It is a distraction, and possibly something you should remove and use in another paper.

You make no mention of PESP (the Periodic Event Scheduling Problem) in your review of scheduling algorithms. You don't need a deep discussion, but you should have a paragraph on PESP.

There is another paper you can cite that specifically compares two different scheduling formulations on the same data set:

Harrod, S., & Schlechte, T. (2013). A direct comparison of physical block occupancy versus timed block occupancy in train timetabling formulations. Transportation Research Part E: Logistics and Transportation Review, 54, 50–66. https://doi.org/10.1016/j.tre.2013.04.003

The largest revision I seek is in the argument of your goals and conclusions. There are two fundamental categories of comparison:

(1) performance measures that represent the explicit goals of the formulation - objective value, processing time, explicit constraints - how well does the formulation solve the problem compared to another formulation with the same goals?

(2) implied or external performance measures. The most common example would be robustness. Very few timetabling formulations explicitly address robustness, because it is non-linear. How should one measure and judge performance measures that are not explicitly part of the formulation?

How useful is it to judge algorithms on performance measures that are not explicitly part of the formulation? To say that an algorithm fails to achieve some performance measure when that measure is not in the objective or the constraints seems of low value to me.

You spend some time describing alternative optimal solutions for problems. How often does this happen? Is this a common concern in selecting algorithms?

You entirely miss a statistical analysis of your results: for example, a t statistic on the significance of your differences. These are matched pairs. If you perform these statistical tests, I am not sure you need to publish the tables with all the individual results.
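For illustration, a matched-pairs test on per-scenario results could look as follows; the arrays are placeholders rather than the paper's data, and SciPy's ttest_rel (with the Wilcoxon signed-rank test as a non-parametric alternative) is one standard way to run it:

```python
import numpy as np
from scipy import stats

# Placeholder per-scenario values of one quality indicator
# (e.g., total final delay) for the two algorithms; the scenarios
# are the matched pairs.
alg1 = np.array([310.0, 295.0, 402.0, 288.0, 350.0, 330.0])
alg2 = np.array([298.0, 280.0, 390.0, 284.0, 333.0, 321.0])

# Paired (dependent-samples) t-test on the per-scenario differences.
t_stat, p_value = stats.ttest_rel(alg1, alg2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Non-parametric alternative if normality of the differences is doubtful.
w_stat, w_p = stats.wilcoxon(alg1 - alg2)
print(f"W = {w_stat:.1f}, p = {w_p:.4f}")
```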

I miss a more intellectual discussion of 

(1) how should we organize an experimental test of a selection of algorithms?

(2) which measures are most important to the comparison, and which are secondary?

(3) how to value the primary and secondary measures?

(4) when the secondary measures are highly valued, does that mean the algorithm is poorly designed, because it neglects to address that measure directly?

(5) when do you stop? when do you say, "we have enough measures"?

There needs to be more thought on the strategy and goals of performance measure.

General comments about the manuscript:

The introduction starts with the same general statement on the value of railways that I am seeing in nearly every manuscript submitted to me. Remove it. Everybody knows railways have value. Use your first sentence to get right to the subject. Start from the beginning with a strong statement about your research.

Can you make your data set public? It would further the goal of comparing algorithms if future researchers could also test their algorithms against your results.

The abbreviations, TFD, TAD, ALG1, etc. are not easy to remember. Could you just spell them out with short word definitions? It is cumbersome to read, especially if one has to read the paper between interruptions (phone calls, office visits).

I look forward to your revision.

Regards

Reviewer

Reviewer 3 Report

The paper tackles the train rescheduling problem from the algorithmic point of view. It proposes an evaluation framework for train rescheduling algorithms and presents two existing algorithms with improvements to solve the rescheduling problem. The framework and the algorithms are tested in a case study using data from the Swedish railway network.

The paper is well-written and the topic is interesting for researchers and practitioners. However, it can be improved in some aspects.

- Section 4 presents a comparison between 3 real-life rescheduling algorithms. Why are other popular algorithms from the literature not included in the comparison table?

- The framework proposed is a set of metrics to assess the solutions obtained. However, as it is claimed as one of the main contributions, there is a lack of a detailed comparison with the metrics proposed in the literature and a justification of why these metrics are better or more general than the existing ones.

- There is a lack of explanation about why the authors chose the two algorithms studied in the paper. Both were proposed by the authors in previous papers. At least one of the most popular algorithms from the literature should be included.

- There is no analysis or explanation of the fairness of the comparison between the algorithms.

- There is not enough mention in the Conclusions section of the calculation time of the algorithms. The calculation-time results must be presented and analysed more clearly.

- Please review the reference list, as some references (for instance [20]) are missing the journal name, volume and/or pages.
