Peer-Review Record

Application of the Thermo-RAdiometric Normalization of Crop Observations (TRANCO) Back in Time: An Assessment of the Potential for Crop Time-Series Generalization to Past Years Using Wheat as a Proxy

Remote Sens. 2026, 18(4), 571; https://doi.org/10.3390/rs18040571

by Juanma Cintas¹,²,*, Emilio Guirado³, Jaime Martínez-Valderrama¹, Italo Moletto-Lobos⁴, Carmen López-Zayas², Tamara Escamilla², Inbal Becker-Reshef⁵, Javier Cabello⁶, Maria Jacoba Salinas-Bonillo⁶ and Belén Franch⁴

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 29 December 2025 / Revised: 2 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026

(This article belongs to the Special Issue New Perspectives in Plant Phenotyping: Satellite-Based Multispectral Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper addresses an important and widely recognized issue in agricultural remote sensing, namely the limited temporal generalizability of crop classification models across years. The authors propose a normalization strategy (TRANCO) that aligns multi-year crop time series using growing degree days and crop calendars, and use long-term Landsat data (2008 to 2020) to evaluate its effectiveness.

The research has a clear objective and the overall experimental design is reasonable. The results consistently show that compared with the baseline method and the time window method, TRANCO can improve inter-annual similarity and classification performance. Although the methodological contributions of this study are mainly incremental rather than conceptual, its outcomes provide a practical and effective solution for long-term crop mapping applications.

I believe that this manuscript would be suitable for publication in the journal "Remote Sensing" after minor revisions. The main purpose of these revisions is to clarify the methodological assumptions, ensure the fairness of comparisons among the normalization strategies, and moderate certain assertions regarding generalization ability and applicability.

1. In Section 4.2 (Line 176), the authors state: "...neither tackle Landsat-7's strip problem." Considering that the study's retrospective period (2008-2013) relied heavily on Landsat 7 data, would simply applying monthly composites and interpolation be sufficient to mitigate the significant data gaps caused by the SLC-off malfunction? The authors should address in the Discussion section the potential impact of this decision on classification accuracy. If certain blocks happened to be located in regions severely affected by stripe gaps, could this explain the performance decline observed in specific years (e.g., 2011 or 2013) as shown in Figure 5?

2. As shown in Figure 3(a) and described in the text, the Baseline model outperformed TRANCO during the training period (2017–2020). This finding is intriguing, suggesting that TRANCO's normalization process, while enhancing generalization capability, might simultaneously cause loss of annual spectral details (over-smoothing). The authors should explicitly discuss this trade-off between accuracy in known domains (Baseline's strength) and stability in unknown domains (TRANCO's advantage) in the Discussion section. This is not a limitation per se but requires clarification: If ground truth data for the target year is available, direct training might be preferable; however, in the absence of such data (the scenario addressed in this study), TRANCO becomes necessary.

3. The study employs CDL data as ground truth. However, the quality of CDL products generated during 2008–2010 may be inferior to those produced in 2017– How much of the observed performance decline during the validation period stems from model prediction errors versus potential inaccuracies in the contemporaneous CDL data itself? The authors should briefly discuss the potential interference of historical CDL data quality on evaluation outcomes.

4. TRANCO heavily relies on crop calendars to determine the GDD (Growing Degree Days) accumulation start point (biofix date). The authors utilized simulated/forecasted calendars (Line 209). How would the shape of TRANCO's phenological curves change if the predicted planting dates deviated by ±10 or ±15 days? While re-running experiments is not required, the authors should ideally reference or provide supplementary explanations regarding the method's sensitivity to biofix date errors.

5. As shown in Figure 5, all models exhibit a significant performance decline (reduction in F1 scores) during 2011 and 2013. Could the authors correlate this phenomenon with meteorological data (AgERA5) to investigate potential climatic causes? Were these years characterized by extreme weather events (e.g., severe droughts or floods) in the study region? This analysis would strengthen the assessment of TRANCO's robustness under climatic anomalies and clarify whether the model failures stem from universal climatic stressors or methodological limitations.

6. Section 5.4 identifies SIPI (Structure Insensitive Pigment Index) and EVI (Enhanced Vegetation Index) as the most critical features during specific GDD stages. Please strengthen the biological interpretation of these findings. Since SIPI correlates with carotenoid/chlorophyll ratios, does this indicate that capturing crop senescence processes or stress signals (via SIPI/EVI) is more critical for historical generalization than simple greenness metrics (e.g., NDVI)? Expanding this discussion with theoretical mechanisms (e.g., pigment dynamics during drought stress) would significantly enhance the paper's scientific contribution.

7. The manuscript concludes that TRANCO significantly outperforms both the time-window approach and baseline methods. However, it remains unclear whether all normalization strategies were compared under fully optimized conditions. Specify whether the parameters of the time-window approach were specifically tuned for wheat (e.g., window length, spectral bands) or pre-fixed without crop-specific adjustments. The authors should explicitly address whether any crop-specific information utilized in TRANCO (e.g., phenological phase derived from CDL) introduced prior knowledge not accessible to baseline methods, potentially biasing the comparison.

8. Figure 6 presents annual probability maps, but the thumbnails are too small to discern spatial details.
Recommendation: Consider displaying only 2–3 representative years (e.g., the best-performing year, the worst-performing year, and a training-period year) with enlarged views, or ensure sufficient resolution for publication to maintain readability.

9. Acronyms like "JD" (Julian Day) and "GDD" (Growing Degree Days) should be explicitly defined upon first mention in the text to improve readability for interdisciplinary audiences.

10. Conceptual Diagram for Method Comparison (Illustrative Enhancement)
Recommendation: Consider adding a schematic illustration to visually differentiate the baseline values, time-window values, and TRANCO normalization processes. Such a diagram would strengthen methodological clarity by explicitly showing how each approach handles temporal variability and spectral normalization.

Author Response

Thank you very much for your review. We found your advice and corrections interesting and necessary. Although we consider some of them outside the scope of this research, we would like to explore them in the future.

Thank you for your help. We added some information about this to the limitations section. Nonetheless, we do not consider this to be the cause of the decline in 2011 or 2013, which we attribute to the average quality of the CDL.

We suggested that the reason behind this is the influence of temporal autocorrelation in the training partition, since it is composed of four consecutive years. Although further analysis is necessary to confirm this, in a previous study on the spatial generalization of crop time series (doi:10.1016/j.jag.2023.103283) TRANCO outperformed the Baseline in the same period and at global scale. We interpreted that improvement as a sign of reduced spatial autocorrelation, but in the current case the extent of the USA and its wheat crops does not seem large enough for that reduction to matter. Thus, in Figure 5, where the model is confronted with years not seen in the training partition, the TRANCO approach outperforms the Baseline because it is also able to deal with temporal autocorrelation. Again, this explanation needs further testing and discussion, but we consider it outside our scope, which is to show that the accumulation of GDD also achieves temporal normalization of crop time series.

We are also not convinced that over-smoothing causes spectral details to be lost. GDD has been used to describe phenological stages in a wide variety of fields, so it should facilitate detecting the key stages of a crop (provided its parameters are well defined). Errors could arise from using a new source of information (AgERA5), which introduces a new source of variability. Because of that, when plenty of data are available and correct sampling is possible, non-normalized classification is expected to outperform the normalized ones. Nonetheless, other studies using this methodology over more limited spatial and temporal extents have found that the addition of GDD improves the classifications. We expanded our explanation of this in the Discussion section.

3. The study employs CDL data as ground truth. However, the quality of CDL products generated during 2008–2010 may be inferior to those produced in 2017– How much of the observed performance decline during the validation period stems from model prediction errors versus potential inaccuracies in the contemporaneous CDL data itself? The authors should briefly discuss the potential interference of historical CDL data quality on evaluation outcomes.

Thank you for the tip. However, when assessing the CDL performance for the codes I considered as wheat, it turns out that there is little difference in the metrics between the periods 2008-2010 and 2017-2020. There is, however, a steep decline in CDL performance in 2013, which TRANCO and Time Windows follow but the Baseline does not. This suggests that TRANCO is harder to pull away from what it recognizes as wheat, perhaps because it has extra phenological information coming from the crop calendars. Supporting this, the Time Windows approach, though with lower performance, also follows this decline in 2013. We added this to the discussion in the limitations section.

TRANCO normalization should not be strongly affected by a ±15-day variation in the static crop calendar consulted. In the case of the Start of Season, which corresponds to the emergence date after the dormancy period for winter wheat and to the sowing date for spring wheat, that variation should not shift the accumulation of GDD much, since temperatures on such dates are low. In the case of the End of Season, associated with harvesting dates, it indicates when the accumulation must stop; hence the curves would be lengthened towards the harvesting date, increasing the total accumulated GDD and possibly degrading their ability to describe phenological stages. We expanded this in the discussion.
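The reasoning above can be illustrated with a minimal sketch. It assumes the simple averaging form of GDD (daily mean temperature minus a base temperature, floored at zero), a base temperature of 0 °C, a synthetic sinusoidal temperature curve, and hypothetical Start and End of Season dates. None of these values are taken from the manuscript, and the function names are illustrative only.

```python
import math

def daily_gdd(tmin, tmax, t_base=0.0):
    """Simple averaging method: daily mean above the base temperature, floored at 0."""
    return max(0.0, (tmin + tmax) / 2.0 - t_base)

def accumulate_gdd(tmin_series, tmax_series, start, end, t_base=0.0):
    """Cumulative GDD curve from day index `start` (biofix) to `end`, inclusive."""
    total = 0.0
    curve = []
    for day in range(start, end + 1):
        total += daily_gdd(tmin_series[day], tmax_series[day], t_base)
        curve.append(total)
    return curve

# Synthetic year (assumption): coldest in mid-January, warmest in mid-July.
tmean = [10.0 - 12.0 * math.cos(2 * math.pi * (d - 15) / 365) for d in range(365)]
tmin = [t - 5.0 for t in tmean]
tmax = [t + 5.0 for t in tmean]

sos, eos = 60, 200  # hypothetical Start/End of Season as day-of-year indices
reference = accumulate_gdd(tmin, tmax, sos, eos)[-1]

for shift in (-15, 15):
    total_sos = accumulate_gdd(tmin, tmax, sos + shift, eos)[-1]
    total_eos = accumulate_gdd(tmin, tmax, sos, eos + shift)[-1]
    print(f"shift {shift:+d} d: SOS effect {total_sos - reference:+.1f} GDD, "
          f"EOS effect {total_eos - reference:+.1f} GDD")
```

With this synthetic curve, shifting the Start of Season changes the season total far less than shifting the End of Season by the same number of days, because the days added or removed near the start are cold. How large the effect is for real AgERA5 temperatures and real calendars would need to be checked on the actual data.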

5. As shown in Figure 5, all models exhibit a significant performance decline (reduction in F1 scores) during 2011 and 2013. Could the authors correlate this phenomenon with meteorological data (AgERA5) to investigate potential climatic causes? Were these years characterized by extreme weather events (e.g., severe droughts or floods) in the study region? This analysis would strengthen the assessment of TRANCO's robustness under climatic anomalies and clarify whether the model failures stem from universal climatic stressors or methodological limitations.

After checking, the average minimum, mean, and maximum temperatures for the wheat season seem to behave normally. Though some years have lower temperatures than others, they are within what is expected. As for total precipitation, the wheat-season values show large variability, and the driest years do not match the worst performances. However, the wettest years (2010 and 2016) seem to coincide with some of the best classification performances. Yet a more in-depth analysis must be performed, since flash floods could be missed at the temporal scale used. Nonetheless, though interesting (and planned for the future), we consider such analysis outside the scope of this paper. A couple of lines were written about this.

Thanks for the suggestion. We find it very interesting, and it could open exciting research lines. However, at the moment, we think this falls outside the scope of this research, which is focused on exploring the application of TRANCO dynamics for crop time-series normalization through time.

I think this is well-defined in the methodology section, where we define the characteristics of each approach. In the case of the Baseline, a pure time-series without further information about the crop is used (that role is taken by the classifier). In the case of the two normalization approaches, the Crop Calendars add some extra information by limiting the time series to the wheat season (Time windows) or by using accumulated GDD (TRANCO).

Thanks for noticing. We added the requested 600 dpi images as a zip file and a PDF with 120 dpi images. However, it seems the journal did not use the 600 dpi ones. For publication, we will ensure that the 600 dpi images are the ones used.

9. Acronyms like "JD" (Julian Day) and "GDD" (Growing Degree Days) should be explicitly defined upon first mention in the text to improve readability for interdisciplinary audiences.

Thanks for noticing. It should be fixed now.

Thanks, this was also noticed by the other reviewer. Please find the diagram in the Methodology.

Reviewer 2 Report

Comments and Suggestions for Authors

Summary of the paper

This manuscript investigates whether TRANCO, previously validated for spatial generalization, is effective for temporal generalization of crop time series. Using wheat as a proxy and the U.S. Crop Data Layer in concert with Landsat imagery from 2008 to 2020, the authors compare TRANCO against a Time Windows normalization and a non-normalized baseline. Performance is evaluated using Jeffries-Matusita distances and Random Forest classification, training models on recent years (2017-2020) and validating on earlier years (2008-2016). Results indicate that TRANCO produces more stable and consistent time-series normalization across years and leads to better and more stable classification performance in the validation period, supporting its potential use to reconstruct historical crop maps when ground truth data are in short supply.
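For readers unfamiliar with the separability metric mentioned in this summary, the following is a minimal sketch of the Jeffries-Matusita (JM) distance, computed as 2(1 − e^(−B)) with B the Bhattacharyya distance. It assumes univariate Gaussian class distributions and uses made-up index values; the feature space and implementation used in the manuscript are not described in this record, so the names and numbers here are purely illustrative.

```python
import math
from statistics import mean, variance

def bhattacharyya_gaussian(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians (mean, variance)."""
    mean_term = (m1 - m2) ** 2 / (4.0 * (v1 + v2))
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term

def jeffries_matusita(sample_a, sample_b):
    """JM distance in [0, 2]: 0 means identical classes, 2 means fully separable."""
    b = bhattacharyya_gaussian(
        mean(sample_a), variance(sample_a),
        mean(sample_b), variance(sample_b),
    )
    return 2.0 * (1.0 - math.exp(-b))

# Hypothetical vegetation-index values at one normalized time step.
wheat_2008 = [0.61, 0.65, 0.58, 0.70, 0.66]
wheat_2018 = [0.60, 0.67, 0.59, 0.69, 0.64]
other_crop = [0.25, 0.31, 0.22, 0.28, 0.35]

print("wheat 2008 vs wheat 2018:", jeffries_matusita(wheat_2008, wheat_2018))
print("wheat 2008 vs other crop:", jeffries_matusita(wheat_2008, other_crop))
```

Under this reading, a low JM distance between the same crop in different years indicates high inter-annual similarity, while a high JM distance between different classes indicates good separability.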

Major comments

General comment: all experiments are performed on wheat and only in the USA. Please soften general claims, e.g. "applicable through time", or clearly state that conclusions are valid only for wheat and CDL-like datasets and extension to other crops and regions remains future work.

Lines 174-176, 483-484: it is explicitly assumed in the manuscript that there was no harmonization between Landsat instruments, and Landsat-7 striping was not accounted for. Such a limitation may be quite serious, even for an interannual analysis. Please provide a clearer argument in the justification section for why this choice does not affect a temporal analysis, or discuss it in the limitations section, or provide a brief analysis of the effect it might have.

Lines 368-384, 398-426: the Baseline is better than TRANCO in training, but the reverse is true in validation. The reason is ascribed to the treatment of temporal autocorrelation only in the Discussion, Lines 514-518, but this is only conjecture. It should be clarified in the Methods section that temporal autocorrelation is not handled during training; additionally, more evidence is needed to support the hypothesis that this explains the Baseline overperformance, or the arguments for TRANCO’s training benefit should be toned down.

Lines 458-462 refer to the reliance on the static crop calendar as a limitation, given the dependence of TRANCO on the values of SOS and EOS. Please trace more closely how uncertainty in the crop calendar propagates into the TRANCO values in the normalization procedure, and possibly provide an analysis of how the error in the calendar varies across years.

Lines 579-580: "raw data will be available on request." - This is not sufficient to allow reproducibility. In fact, the processing (Google Earth Engine processing, feature selection, spatial CV, rotated sampling) is complex. Could the authors either host the scripts on some form of repository (e.g., GitHub or Zenodo) or describe the processing in pseudo-code? Specifically, this would help to clarify the TRANCO processing, rotated sampling strategy, and MSMD feature selection. Without it, the current study is hard to reproduce.

Minor comments

There are several tiny grammatical errors in the manuscript. These include:

Line 45: please replace “Literate review” with “Literature review.”

Line 76: please replace "spatial biased" with "spatial bias".

Line 176: “neither tackle Landsat-7’s strip problem” unclear phrasing.

Lines 6-7: please change “The tests we performed in this research shows that TRANCO approach is able to generalize information outside its time component” to “The tests performed in this research show that the TRANCO approach is able to generalize information outside its time component.”

Lines 25-27: please change “Then, we test the performance of two normalization approaches” to “Then, we tested the performance of two normalization approaches.”

Lines 75-76: please replace “The spatial biased to the Northern Hemisphere of crop type crop observations” with “The spatial bias toward the Northern Hemisphere in crop type observations.”

Lines 126-127: I would ask to replace “Our model rely on a reanalysis database” with “Our model relies on a reanalysis database.”

Lines 151-152: “required a well representation of the such variability” should be changed to “required a good representation of such variability.”

Lines 173-176: Replace "since our scope is to normalize time-series, and not classification" with "since our scope is time-series normalization and not classification."

Lines 186-187: Please replace “Vegetation’s metabolism is related with the daily temperature accumulation” with “Vegetation metabolism is related to daily temperature accumulation.”

Lines 311-313: please replace “we split our ARD into a training period, when three random forest classifiers will be trained” with “we split our ARD into a training period, where three random forest classifiers were trained.”

Lines 389-390: change "As far as Time windows is concerned" to "As far as the Time Windows approach is concerned."

Lines 458-459: please change “One reason for the low performance of the normalization would be the utilization of static crop calendars” to “One possible reason for the low normalization performance is the use of static crop calendars.”

Lines 463-464: please change "Though our main scope is not to develop a fine classification model" to "Although our main scope is not the development of a fine classification model."

Lines 482-483: "Lower results are expected" should be replaced with “Lower performance is expected.”

Lines 565-566: please change “suggests a better handling of autocorrelation by TRANCO; a better analysis of this is mandatory though” to “suggests a better handling of autocorrelation by TRANCO; however, a more detailed analysis is still required.”

Repeated concepts include: justification for the use of TRANCO, Description of Random Forest as a proxy method, Explanation of autocorrelation. The Introduction and Discussion need to be simplified by eliminating the repetition found in some concepts.

References section: there are some references cited more than once in successive sections, such as references [49] and [53], which refer to similar concepts; the authors may consider streamlining these citations.

Author Response

Summary of the paper

Thank you very much for your corrections and advice. We believe we have addressed most of your suggestions, although we found some of them difficult to implement in the time available (10 days). Nonetheless, we consider them interesting enough for future research. Thank you very much.

Major comments

Thanks for your suggestion. It was stated throughout the paper, but not in the conclusions. I added a sentence about this.

Thank you for noticing the need for further explanation. We did not account for the Landsat-7 striping because the main scope of the study is to normalize wheat crop time series. Hence, the average shape of such time series would be enough. Nonetheless, we also wanted to show that such normalization can be used to improve classification, but only prospectively. Thus, we did not fix the striping problem, since a correct classification is outside the scope of the study. Yet, because of the normalization and averaging steps, this problem would affect the TRANCO approach more than the Baseline.
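To make the gap handling discussed in this exchange concrete (monthly composites followed by interpolation, as raised in Reviewer 1's first comment), here is a minimal sketch for a single pixel. It assumes median compositing and linear interpolation, with the nearest valid value held at the edges. The manuscript's actual compositing rule is not stated in this record, and the function names and values are hypothetical.

```python
from statistics import median

def monthly_composite(observations):
    """Median per calendar month from (month, value) pairs; None marks a gap."""
    buckets = {m: [] for m in range(1, 13)}
    for month, value in observations:
        if value is not None:  # e.g. pixel lost to an SLC-off stripe or cloud
            buckets[month].append(value)
    return [median(buckets[m]) if buckets[m] else None for m in range(1, 13)]

def fill_gaps_linear(series):
    """Fill None entries by linear interpolation; edges hold the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no valid observations to interpolate from")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

# Hypothetical index values for one pixel; None marks acquisitions with no data.
observations = [(1, 0.20), (1, 0.30), (2, None), (3, 0.50),
                (4, None), (5, None), (6, 0.80)]
composite = monthly_composite(observations)
print(fill_gaps_linear(composite))
```

A sketch like this fills every gap, but it cannot recover phenological detail that falls inside a long run of missing months, which is the concern both reviewers raise for the Landsat-7 years.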

Thank you for noticing this. We toned down the argument in lines 519-520 of the submitted article by calling for a deeper study of these effects. However, it was not as clear as it should be. We added some extra sentences explaining that more in-depth studies are needed and that it is only a suggestion.

We now elaborate on this in the discussion; however, we consider further analysis outside the scope of this paper. In fact, we think that such an analysis should employ a summer crop, where errors are amplified by higher temperatures.

We considered hosting the complete scripts on Codeberg or Zenodo, but they rely on custom Python packages that could cause problems during installation (mainly for downloading Earth Engine data and handling geospatial information). When we state that “raw data will be available on request”, we are including the code. It is a first step to get in contact and resolve any issues that could arise. Furthermore, further explanation of the specific processing algorithms can be found in their respective citations; we only implemented them in Python. In addition, some pseudo-code for the rotated sampling methodology is already in the paper (older lines 250 – 257).