Review Reports - Assessing and Visualizing Pilot Performance in Traffic Patterns: A Composite Score Approach

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study presents a novel approach to objectively measure pilot performance using a composite score derived from various flight metrics. The paper also describes the development of a web application for visualizing performance metrics. Thirty general aviation pilots completed flight simulator scenarios under different flight rules (VFR vs IFR) and difficulty levels (Low vs High). The authors validated their composite score by examining correlations with workload indicators (subjective NASA-TLX and an objective auditory oddball task). Pilots also provided feedback on the usefulness of the visualization tool.

The following are major comments:

The paper lacks a clear justification for the weight distribution in the composite score (e.g., why landing is given 40% weight while the other metrics receive 15% each; Figure 3).
There is no external validation of the composite score against expert instructor ratings or real-world flight performance, even though you mention this in the limitations.
The pilot selection was limited to private pilots, with no commercial pilots included. Additionally, the flight simulator is modeled after an A320, an aircraft type with which the participants typically have no regular flight experience, and the composite score’s applicability to other aircraft types is not addressed. These limitations should be acknowledged in the study.
The scenarios were presented in a fixed order (VFR-low, VFR-high, IFR-low, IFR-high), which may introduce a learning effect that confounds workload comparisons. The study does not explicitly control for or analyze potential learning effects over repeated trials.
The study relies solely on NASA-TLX (subjective) and an auditory oddball task (objective) to measure workload, without considering physiological measures (e.g., heart rate variability, eye tracking). While a significant relationship is found between subjective and objective workload measures, confounding factors such as individual differences in cognitive abilities are not explored.
The reported correlations between the composite score and workload measures are weak to moderate (e.g., r = -0.47 for IFR-low), raising questions about their practical significance. Effect sizes and confidence intervals should be more prominently reported to aid interpretation of findings.
While the application received positive feedback, users reported limitations in responsiveness and visualization features. The study does not outline how the authors plan to address usability concerns or improve the tool based on feedback.
You need to include recent references, as most of the ones used are fairly old.
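Regarding the request above to report effect sizes and confidence intervals, a minimal sketch of one standard approach (Fisher's z-transform) is shown below. It assumes the IFR-low correlation of r = -0.47 was computed on n = 30 pilots; the sample size for that specific test is not stated in this report, and the function name fisher_ci is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation via Fisher's z."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)                              # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)                    # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    lo, hi = z - crit * se, z + crit * se
    return math.tanh(lo), math.tanh(hi)            # back-transform to the r scale


# Assumption: n = 30 pilots contributed to the IFR-low correlation.
r_ifr_low = -0.47
lo, hi = fisher_ci(r_ifr_low, n=30)
print(f"r = {r_ifr_low}, 95% CI [{lo:.2f}, {hi:.2f}], r^2 = {r_ifr_low ** 2:.2f}")
```

Under these assumptions the interval is roughly [-0.71, -0.13] and r² is about 0.22, i.e. the composite score would share only about a fifth of its variance with workload, which illustrates why reporting intervals alongside r aids interpretation.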

The following are minor comments:

You need to unify the reference style throughout the paper. Website references should be formatted in the same way as cited papers.
You mention that “The present study aimed to build on the findings of previous research [19–21] and develop a holistic metric of flight performance…”. Why did you base your research on these references [19–21]? They are old. Please check whether newer research has been published and build on the most recent work to make your research well grounded.
In line 85, you mention “An open-access web application (Shiny App) was…”, but the information is not clear there; only after reading many pages did I find this in line 204: “After data acquisition, an online application was developed using the Open Source software Shiny app in R Studio [28].” I therefore recommend making this clear to the reader at first mention.
In the title of Table 1, move the abbreviations to a note at the bottom of the table, perhaps by adding a symbol beside these abbreviations inside the table.
In Table 2, I assume “N” is the item number in this table; however, elsewhere in the paper N is the number of participants. Please clarify.
There is no need for the period “.” after the word “vs”. Omit it in all places.
In Table 5, why is N 21?
In the Results section, you include a large quantity of numbers in the text. I would prefer to keep the numbers in tables and explain them in the text.
Change the title of Section 4.5 to “Limitations and Future Directions”.
The conclusion section does not reflect your results very well and needs improvement.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

General comments

The paper proposes a new tool to assess pilots’ performance based on a composite score that combines various flight metrics, supported by a visual representation through an online application. The topic is quite interesting, and it suits the journal’s scope well. The research methodology used by the authors is reasonably well structured, including the collection and analysis of both direct and indirect data from various sources. The results add value to the existing body of knowledge on training strategies/procedures for aircraft pilots and can potentially contribute a practical and tangible benefit in assessing the skills/performance of pilots, provided the results can be verified/validated (which is one of the limitations and areas for future research that have been identified by the authors). There are a few issues, though, that should be clarified by the authors regarding how the new tool was applied, which may have a strong influence on the results, before the paper is published. I’ve made a few comments in the section below that hopefully will help enrich the argumentation presented by the authors, as well as noting a few minor editing issues/typos that need to be corrected.

Provided these issues are effectively addressed by the authors, I think the paper meets the requirements to be published in this journal.

Detailed comments

Line 21: the citation style for the reference “trophi.ai” is incorrect. There are other instances throughout the text where webpages have been cited as references. Please check whether the citation style is compliant with the journal’s instructions.
Line 26: pilots are also assessed through metrics/parameters that are set forth by national aviation authorities as minimum requirements for them to pass their flight exams and attain their corresponding licence. Your sentence seems to imply the performance metrics are only driven by research purposes.
Lines 50-51: “However, their composite score lacked an important metric related to the most critical part of the flight: the landing”. This sentence is a bit misleading, as you are assuming that this flight phase is the most critical based on accident statistics. You would need evidence of a correlation between pilot performance and the number of accidents during approach/landing to verify whether the latter could indeed be mostly attributed to pilot error. Also, many accidents in commercial aviation (which makes up the bulk of the available statistics) stem from crew resource management issues; thus the performance metrics would need to account for non-technical skills that could impair the performance of the crew/team, and not only of pilots individually. I suggest you rephrase this part of the text to include the above caveats in your argumentation.
Line 63: “…the critical landing phase”. The word “critical” seems a bit unnecessary here again given you have already stressed the “criticality” of this flight phase in the preceding section.
Line 64: “…these flight parameters and performances.” Maybe this should be rewritten as “…these flight performance metrics/scores”.
Lines 102-103: “airbus” should be capitalized (i.e., Airbus).
Line 138: “They had…”. Do you mean pilots? Please revise the sentence accordingly.
Section 2.4.1: the criteria used to allocate the weight percentages to each metric/flight phase are questionable. The landing phase, which is based on a g-force metric, was given a weight of 40%. However, one could argue this percentage is probably too high compared to the other metrics, as these would also have a meaningful impact on the performance of pilots during landing. For example, the flight path deviation metric was given only 15%, and yet a pilot who was (hypothetically) considerably above the target flight path and overshot the runway would arguably have a poor performance (and likely an accident). The same rationale could be extended to other metrics, e.g., airspeed. In fact, the latter parameter is of key importance when facing emergency situations, such as the engine failure event considered in one of the scenarios. As such, one could argue that a different weight allocation would lead to totally different composite scores, making the assessment of pilots’ performance very dependent on the weighting criteria. This should be made clear in the explanation provided in the paper, as a potential limitation of the proposed tool.
Figure 4: though it may seem obvious to the authors (and to this reviewer, for that matter), it would be good to include a brief explanation on the different attributes shown in these plots, e.g., points, curves/areas, vertical bars for each scenario, p, etc.
Section 3.2: the explanation in the text mentions N=30, but Table 5 is showing N=21. Please check.
Lines 319-321: “The higher scores obtained for the IFR conditions may be due to the participants being recreational pilots. As argued previously [30], these novice pilots experience higher workload during IFR conditions”. Did the pilots who participated in your experiment have similar experience levels? If not, how was this factor (variable) incorporated in the analysis of your results?
Line 343: typo on “highlights”, should be “highlight”
Line 407: I believe you meant “A last limitation…”?
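To make the Section 2.4.1 point about weight dependence concrete, here is a minimal sketch using entirely hypothetical, normalized (0 to 100) metric scores for two pilots. Only the 40% landing weight and the 15% weights come from the reviews; the metric names other_1 and other_2 are placeholders for metrics not named here, and the composite is assumed to be a simple weighted mean, which may differ from the paper’s actual formula.

```python
from typing import Dict

Scores = Dict[str, float]

# Weights as described in the reviews: landing 40%, remaining metrics 15% each.
# "other_1" and "other_2" are hypothetical placeholders for the unnamed metrics.
PAPER_WEIGHTS = {"landing": 0.40, "flight_path": 0.15, "airspeed": 0.15,
                 "other_1": 0.15, "other_2": 0.15}
# An alternative equal-weight scheme, for comparison only.
EQUAL_WEIGHTS = {name: 0.20 for name in PAPER_WEIGHTS}


def composite(scores: Scores, weights: Scores) -> float:
    """Weighted mean of normalized (0-100) metric scores."""
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in weights) / total


# Hypothetical pilots: A scores well on landing but poorly on path and airspeed;
# B is the reverse. These numbers are illustrative, not data from the study.
pilot_a = {"landing": 95, "flight_path": 50, "airspeed": 55, "other_1": 70, "other_2": 70}
pilot_b = {"landing": 60, "flight_path": 85, "airspeed": 85, "other_1": 70, "other_2": 70}

for label, w in (("paper weights", PAPER_WEIGHTS), ("equal weights", EQUAL_WEIGHTS)):
    a, b = composite(pilot_a, w), composite(pilot_b, w)
    print(f"{label}: A = {a:.2f}, B = {b:.2f}, higher = {'A' if a > b else 'B'}")
```

With the 40/15 weights pilot A scores 74.75 against B’s 70.50, while equal weights give A 68.00 and B 74.00, so under these assumptions the ranking flips; this is the sensitivity the reviewer asks the authors to acknowledge.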

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

The flight training domain and pilot performance evaluation need more studies like this to improve flight safety and training efficiency.

It is clear that a great deal of work and effort went into this research. To reflect this, my first comment is to map what you have done in the research in a single chart/visual, such as a methodological flow chart. This may give the reader a better understanding of your complex experiments.

Another comment concerns reflecting participants’ competency levels. The A320 simulator used in the experiments corresponds to CPL level, but the participants’ competency level is PPL. The simulated aircraft could be a C-172 or similar; the A320 is a more stable platform for observing their flight performance during the VFR aerodrome traffic circuit. My concern is how you could better reflect their competency levels relative to the scenarios’ complexity levels, considering workload. Competency-based explanations for the participants and scenarios, describing the tasks to be performed, would be valuable. Perhaps you could provide a matrix of the tasks for each scenario and flight phase.

Additionally, the experimental airspace could be presented better by using charts and scenarios together. This may help the reader understand it better.

My other comment is about workload measurement. You could ask participants during the flight how they are feeling at the different workload levels; an assessment after the experiment may miss their exact feelings about workload exposure. In your future studies, you may focus on how to better measure workload together with flight performance and tasks.

Finally, your results could be briefly summarized in a matrix with respect to your hypotheses and other studies in the domain. This may help the reader understand the results in one table/visual and reduce the complexity of the Results and Discussion sections, which repeat the hypothesis titles.

Regards

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

GREAT JOB

Reviewer 2 Report

Comments and Suggestions for Authors

The response of the authors to the issues raised in my original report was satisfactory. I recommend accepting the paper for publication.

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for the revised version of your study. I believe your study will add high value to the aviation domain.

Regards,