Duration Rotation in U.S. Treasury Fixed-Income ETFs: Evidence for a “Median” Strategy
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I will add my suggestions one by one below:
- I do not see any novelty in rotating the returns across three classes (no new model, theory, or statistically innovative technique).
- I do not see any hypothesis tests, confidence intervals, or significance tests, nor comparisons of Sharpe ratios, Sortino ratios, or other financial indicators with an assessment of whether the differences are statistically significant.
- I think the second biggest issue is data snooping and overfitting, which arises from searching over rebalancing frequencies and selecting the best-performing one.
- How did you select the transaction cost?
- I cannot see any out-of-sample or robustness analysis.
- The analysis looks like a course exercise.
The English has repetitions, phrasing that is too informal for an academic paper, a lack of technical explanation, and grammatical errors. Please review the text in detail.
Author Response
Please see the enclosed file
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
Please find my review and comments below.
The manuscript studies rotation strategies across the U.S. Treasury yield curve from 2008 to 2025, comparing three systematic portfolios (Winners, Median, and Losers) ranked by their prior-period total returns. Each strategy’s performance is compared against a passive buy-and-hold benchmark represented by the Bloomberg U.S. Treasury Total Return Index (LUATTRUU). Key performance metrics include cumulative portfolio value, annualized return, volatility, Sharpe ratio, and maximum drawdown. Transaction costs and taxes are also accounted for. One of the key findings is that it is possible to outperform the passive LUATTRUU benchmark by periodically rotating across U.S. Treasury maturities. In particular, the results suggest that while the Winner and Loser portfolios demonstrate cyclical outperformance, the Median strategy exhibits greater consistency, resilience, and lower drawdowns, offering an attractive balance between return and stability for fixed-income investors. This strategy achieves the best balance of return, volatility, and drawdown control when rebalanced on a semi-annual basis.
The sections are well organized and the results are described in terms of portfolio growth dynamics, annual returns, drawdowns, Sharpe ratios, periods of crisis, transaction costs, and taxes. The cited references are relevant and include mostly recent publications. However, the description of two references could be improved: “for International Settlements” (line 434) and “of New York” (line 462).
The research motivation is significant and the proposed idea is feasible. Nevertheless, there are some areas for improvement. Please address the following points.
- Clarify how this work differs from prior studies on the topic and why the chosen approach is expected to add value beyond standard benchmarks.
- Add a Methods section that specifies data fields, return definitions, rebalancing rules, the risk-free rate used for the Sharpe ratio, assumptions on execution, and the software used.
- Temper claims and clearly discuss limitations such as market regime dependence, backtest assumptions, and implementation frictions.
- Consider strengthening the link to the FinTech angle: automation, data pipeline, reproducible code, decision support, etc. The link to the data and code on lines 384-385 is not working (https://github.com/traders2025/Treasury_Rotation_Trading_Strategy).
Author Response
Please see the enclosed file
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The central contribution is positioned as a novel rotation approach and an empirical finding that semiannual rebalancing performs best. At present, that conclusion is drawn from a single in-sample backtest where the same full sample is used both to select the best-performing frequency and to evaluate it. This creates a clear risk of overfitting, particularly because multiple strategies and multiple rebalancing horizons are examined and the best outcome is highlighted without any adjustment for multiple testing. As a result, it is not possible to assess whether the reported advantage reflects a persistent effect or a selection artifact. The revised manuscript should implement a straightforward out-of-sample design (e.g., a train/test split or a rolling walk-forward procedure) in which the rebalancing frequency and any other key design choices are selected only using past data and then held fixed during evaluation. In addition, the paper should introduce an explicit correction for multiple comparisons across the strategy-frequency grid, using a simple and transparent procedure, so that statistical conclusions are not driven by data-snooping.
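The out-of-sample design and multiple-comparison correction requested above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes every candidate configuration (strategy and rebalancing frequency) has been reduced to a net return series on one common calendar, it uses the trailing mean return as the selection rule, and the names `walk_forward` and `bonferroni` are hypothetical.

```python
from statistics import mean


def walk_forward(returns_by_config, train_len):
    """Rolling walk-forward selection.

    returns_by_config maps a configuration name to its per-period net
    returns (all series share one calendar and one length). At each
    period t >= train_len the configuration with the highest mean return
    over the previous train_len periods is chosen, and only its return
    in period t is recorded, so no evaluation period is used for selection.
    """
    names = sorted(returns_by_config)
    n = len(returns_by_config[names[0]])
    if any(len(returns_by_config[k]) != n for k in names):
        raise ValueError("all return series must have the same length")
    chosen, out_of_sample = [], []
    for t in range(train_len, n):
        best = max(names, key=lambda k: mean(returns_by_config[k][t - train_len:t]))
        chosen.append(best)
        out_of_sample.append(returns_by_config[best][t])
    return chosen, out_of_sample


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy numbers, purely illustrative.
toy = {
    "median_semiannual": [0.01, 0.02, 0.03, 0.00, 0.01],
    "median_monthly": [0.02, 0.00, 0.00, 0.05, 0.02],
}
picked, oos = walk_forward(toy, train_len=2)
print(picked, oos)
print(bonferroni([0.01, 0.04, 0.5]))
```

The out-of-sample series returned by `walk_forward` is the one that would be compared with the benchmark. The Bonferroni adjustment would be applied to the p-values of every cell in the strategy-frequency grid, not only to the best cell.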
The statistical treatment of performance is currently incomplete. The tables and discussion emphasize annualized returns, volatility, Sharpe ratios, and drawdowns, but do not provide formal inference for the strategy’s excess performance over the benchmark, nor do they address serial correlation that is common in bond and ETF returns at weekly or monthly horizons. The revision should report significance tests for the mean difference between the strategy and benchmark using heteroskedasticity-and-autocorrelation robust methods (e.g., Newey–West), and should provide confidence intervals rather than relying on point estimates. A block bootstrap, even in a basic implementation, would materially strengthen the claims by yielding uncertainty bands for Sharpe ratios and for differences in Sharpe across strategies, and by making clear how often the strategy dominates under resampling that preserves time dependence.
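A basic version of the two inference tools mentioned above might look like the sketch below. It assumes per-period strategy and benchmark returns on the same dates, uses the Bartlett kernel for the Newey-West long-run variance, and uses a moving block bootstrap with a percentile interval; the function names, block length, and number of resamples are illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev


def newey_west_tstat(diff, lags):
    """t-statistic and standard error for the mean of diff, with a
    Bartlett-kernel (Newey-West) long-run variance using `lags` lags."""
    n = len(diff)
    mu = mean(diff)
    dev = [d - mu for d in diff]

    def gamma(lag):
        # Sample autocovariance at the given lag (divisor n).
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run_var = gamma(0) + 2.0 * sum(
        (1.0 - lag / (lags + 1.0)) * gamma(lag) for lag in range(1, lags + 1)
    )
    se = math.sqrt(long_run_var / n)
    return mu / se, se


def sharpe(returns):
    """Per-period Sharpe ratio of a series of excess returns."""
    return mean(returns) / stdev(returns)


def block_bootstrap_sharpe_diff(strategy, benchmark, block_len,
                                n_boot=2000, seed=0, alpha=0.05):
    """Percentile interval for Sharpe(strategy) - Sharpe(benchmark).

    Blocks of consecutive dates are resampled jointly for both series,
    which preserves short-range time dependence and cross-correlation.
    """
    n = len(strategy)
    if len(benchmark) != n or not 1 <= block_len <= n:
        raise ValueError("series must match and block_len must be in [1, n]")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)
    n_blocks = math.ceil(n / block_len)
    diffs = []
    for _ in range(n_boot):
        idx = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            idx.extend(range(s, s + block_len))
        idx = idx[:n]
        diffs.append(sharpe([strategy[i] for i in idx])
                     - sharpe([benchmark[i] for i in idx]))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

`newey_west_tstat` would be applied to the series of strategy-minus-benchmark returns. If the bootstrap interval for the Sharpe difference contains zero, the resampling evidence does not support a claim that one strategy dominates the other.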
Several design checks that are standard in the recent fixed income momentum and rotation literature are missing and are easy to incorporate. Specifically, the signal construction should be tested with a small skip between the measurement window and the holding period (e.g., skipping one week for weekly signals or one month for monthly signals) to reduce contamination from microstructure effects and short-term reversals. The manuscript should also demonstrate that the main findings are not an artifact of a single lookback by repeating the analysis for a small set of alternative windows (for example 3, 6, and 12 months). These additions require minimal changes but directly address concerns that the reported results are fragile to common implementation choices.
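The skip-period and alternative-lookback checks described above could be implemented along the following lines. The sketch assumes each instrument is represented by a list of total-return index levels at the signal frequency and that the three groups are formed by splitting the ranking into thirds; the manuscript's exact grouping rule is not stated in this report, and the tickers `A`, `B`, and `C` are placeholders.

```python
def lookback_return(prices, t, lookback, skip):
    """Return over a window of `lookback` periods that ends `skip`
    periods before the decision date t."""
    end = t - skip
    start = end - lookback
    if start < 0:
        raise ValueError("not enough history for this lookback and skip")
    return prices[end] / prices[start] - 1.0


def rank_groups(price_history, t, lookback, skip=1):
    """Rank instruments by lookback return (best first) and split the
    ranking into (winners, median, losers)."""
    scores = {k: lookback_return(p, t, lookback, skip)
              for k, p in price_history.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    third = len(ranked) // 3
    return (ranked[:third],
            ranked[third:len(ranked) - third],
            ranked[len(ranked) - third:])


# Placeholder price paths: the last observation changes the ranking,
# so skipping it changes which instrument lands in the Median group.
history = {
    "A": [100, 101, 102, 103, 110],
    "B": [100, 100, 100, 100, 90],
    "C": [100, 102, 104, 106, 100],
}
print(rank_groups(history, t=4, lookback=3, skip=1))
print(rank_groups(history, t=4, lookback=3, skip=0))
```

Repeating the backtest over a small grid, for example lookbacks of 3, 6, and 12 months each with and without a one-period skip, would show whether the Median result depends on a single signal definition.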
The modeling of trading costs needs to be made more defensible while remaining parsimonious. The current approach relies on a single constant bid–ask assumption and treats market impact as negligible, but cost sensitivity is precisely what differentiates rotation strategies across rebalancing frequencies. A simple scenario-based analysis would be sufficient for a revision: report results under low/base/high cost assumptions (e.g., 0.5×, 1×, 2× the current spread assumption) and identify the break-even cost level at which the advantage disappears. Relatedly, the turnover definition as written is not internally consistent with the stated portfolio construction, which undermines reproducibility of the net-of-costs results. This must be corrected by providing an unambiguous portfolio turnover formula matched to the actual number of holdings and the actual rebalancing rule.
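One unambiguous way to write the turnover, scenario, and break-even calculations requested above is sketched below. It assumes one-way turnover (half the sum of absolute weight changes), a cost of half the quoted spread on each side of a trade, and no market impact; all numbers in the example are hypothetical and are not taken from the manuscript.

```python
def turnover(old_weights, new_weights):
    """One-way turnover at a rebalance: half the sum of absolute weight
    changes over every instrument held before or after."""
    tickers = set(old_weights) | set(new_weights)
    return 0.5 * sum(abs(new_weights.get(k, 0.0) - old_weights.get(k, 0.0))
                     for k in tickers)


def net_excess(gross_excess, annual_turnover, spread):
    """Annual excess return after costs. One unit of one-way turnover is
    one sale plus one purchase, each paying half the spread."""
    return gross_excess - annual_turnover * spread


def break_even_spread(gross_excess, annual_turnover):
    """Spread at which the net excess return is exactly zero."""
    if annual_turnover <= 0:
        raise ValueError("annual_turnover must be positive")
    return gross_excess / annual_turnover


def cost_scenarios(gross_excess, annual_turnover, base_spread,
                   multipliers=(0.5, 1.0, 2.0)):
    """Net excess return under low/base/high spread assumptions."""
    return {m: net_excess(gross_excess, annual_turnover, m * base_spread)
            for m in multipliers}


# Hypothetical inputs: 1% gross excess return, a full switch twice a
# year (annual one-way turnover of 2.0), and a 5 bp base spread.
print(turnover({"A": 1.0}, {"B": 1.0}))
print(cost_scenarios(0.01, 2.0, 0.0005))
print(break_even_spread(0.01, 2.0))
```

Because annual turnover rises with rebalancing frequency, reporting the break-even spread for each frequency makes the cost sensitivity of the comparison explicit.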
Finally, there are reporting issues that currently impede comprehension by a specialist reader and need to be corrected in a revision. Some table labels and metrics are not defined in a way that allows independent verification. In particular, the “Tr. Err.” field appears with negative values, which is inconsistent with standard tracking error definitions unless it refers to a different quantity entirely; the manuscript must define it explicitly and ensure that labels match the computed statistics. The paper should also state clearly how annualization is performed for volatility and Sharpe ratios across different rebalancing frequencies, including the return sampling frequency, scaling factors, and the risk-free rate convention if excess returns are used. In addition, at least one robustness replication on an alternative but closely related proxy set (such as another issuer’s Treasury ETFs, Treasury futures, or constant-maturity total return indices) would significantly reduce concerns that the effect is specific to a particular ticker set and implementation.
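The annualization conventions and the tracking error definition asked for above could be stated as in the following sketch. It assumes simple per-period returns, square-root-of-time scaling (which itself presumes serially uncorrelated returns), and a constant per-period risk-free rate; `periods_per_year` would be 12 for monthly sampling and 52 for weekly sampling.

```python
import math
from statistics import mean, stdev


def annualized_volatility(returns, periods_per_year):
    """Sample standard deviation scaled by sqrt(periods per year)."""
    return stdev(returns) * math.sqrt(periods_per_year)


def annualized_sharpe(returns, rf_per_period, periods_per_year):
    """Sharpe ratio of excess returns, annualized by sqrt(periods per year)."""
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def tracking_error(strategy, benchmark, periods_per_year):
    """Annualized standard deviation of active returns.
    A standard deviation cannot be negative, so neither can this value."""
    active = [s - b for s, b in zip(strategy, benchmark)]
    return stdev(active) * math.sqrt(periods_per_year)
```

Under this definition a negative entry in a "Tr. Err." column cannot occur. A signed quantity, such as the mean active return, can be negative and would need a different label.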
Author Response
Please see the enclosed file
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The introduction should be substantially strengthened, for instance by engaging with the seminal contributions of Martin Leibowitz and the subsequent related literature.
The empirical analysis should also incorporate measures of relative risk-adjusted performance, such as the information ratio, in order to allow for a more meaningful comparison with the benchmark. For an overview of risk-adjusted performance measures, the authors may refer to https://link.springer.com/book/10.1007/978-3-031-59819-7 and https://www.wiley.com/en-us/Practical+Portfolio+Performance+Measurement+and+Attribution%2C+3rd+Edition-p-9781119831945 .
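For reference, one common definition of the information ratio is given below. It assumes per-period active returns $a_t = r_{p,t} - r_{b,t}$ (strategy minus benchmark), $k$ return periods per year, and square-root-of-time annualization; the symbols are introduced here for illustration and are not the manuscript's notation.

```latex
\mathrm{IR} \;=\; \frac{k\,\bar{a}}{\sqrt{k}\,s_a} \;=\; \sqrt{k}\,\frac{\bar{a}}{s_a},
\qquad
\bar{a} = \frac{1}{T}\sum_{t=1}^{T} a_t,
\qquad
s_a = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(a_t - \bar{a}\right)^{2}}
```

The denominator $\sqrt{k}\,s_a$ is the annualized tracking error, so the information ratio reads as annualized active return per unit of active risk.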
All figures should, where feasible, include benchmark data; this is particularly important for Figure 2, where such a comparison would be especially informative.
The citation to Magdon-Ismail et al. (2004) in line 196 is out of context and should be removed.
The reference to Ilmanen and Kizer (2012) appears to be incorrect. It is likely that the intended citation is this paper by the same authors: https://www.pm-research.com/content/iijpormgmt/38/3/15 .
Transaction costs are appropriately proxied by the bid–ask spread; however, the methodology surrounding their treatment is unclear. Transaction costs are discussed only ex post, in Section 5, and briefly mentioned in the conclusions (Section 6). This gives the impression that transaction costs have been included merely as an afterthought and have not been properly integrated into the empirical analysis. If this is indeed the case, the reported results are rendered irrelevant or potentially misleading. The authors must therefore clarify how transaction costs are incorporated into the analysis and, if they have not been fully accounted for, the methodology, empirical analysis, and results should be reconsidered and revised accordingly. Strategies of the type examined in this paper are meaningful only when evaluated within a realistic trading environment.
Author Response
Please see the enclosed file
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
My comments fall into two parts, major and minor:
Major Comments
- Novelty is limited and insufficiently justified. The paper claims a “median strategy” anomaly but does not clearly demonstrate how this differs from existing duration-allocation or carry-based strategies already documented in the fixed-income literature.
- Economic intuition is weak. The explanation that the “middle group avoids extremes” is largely descriptive and not supported by a formal theoretical framework or empirical decomposition.
- Issue of data-snooping bias.
- Small effective sample size. Semi-annual rebalancing produces only ~35 observations, which severely limits statistical power and reliability of the HAC and Sharpe tests.
- ETF universe is extremely small.
- Benchmark selection is questionable. I recommend that risk-adjusted alpha relative to duration-matched benchmarks be tested.
- Economic magnitude is modest. An excess return of ~1.25% annually may be explained by differences in duration exposure rather than by a structural anomaly.
- Robustness analysis is extensive but repetitive. Many robustness tests appear exploratory rather than hypothesis-driven.
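The sample-size point in the major comments can be made concrete with a short calculation. The sketch uses the large-sample standard error of a Sharpe ratio under independent and identically distributed returns, sqrt((1 + SR^2 / 2) / T), where SR is the per-period (not annualized) Sharpe ratio; the value SR = 0.3 is hypothetical and chosen only for illustration.

```python
import math


def sharpe_standard_error(sr, n_obs):
    """Large-sample standard error of a per-period Sharpe ratio
    under i.i.d. returns."""
    return math.sqrt((1.0 + 0.5 * sr * sr) / n_obs)


def sharpe_confidence_interval(sr, n_obs, z=1.96):
    """Normal-approximation confidence interval (z = 1.96 for 95%)."""
    se = sharpe_standard_error(sr, n_obs)
    return sr - z * se, sr + z * se


# About 35 semi-annual observations, hypothetical per-period Sharpe of 0.3.
print(sharpe_standard_error(0.3, 35))
print(sharpe_confidence_interval(0.3, 35))
```

With 35 observations the standard error is about 0.17, so the 95% interval for a per-period Sharpe ratio of 0.3 extends below zero. Serial correlation would change the standard error further, which is why the i.i.d. assumption is flagged.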
Minor Comments
- Several figures are redundant and could be consolidated (Sharpe, Sortino, volatility, drawdowns).
- Some sections repeat narrative explanations of the same result without adding new analysis. These could be moved to an appendix.
- Some references appear to be reports or unpublished work and should be clarified.
- Data source description should specify exact total return calculation and dividend reinvestment method.
- A clearer explanation of portfolio rebalancing implementation (trading dates, slippage assumptions) would improve transparency.
The English has repetitions, phrasing that is too informal for an academic paper, a lack of technical explanation, and grammatical errors. Please review the text in detail.
Author Response
Please see the enclosed file
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
My main concern remains the same in substance, even though the authors now acknowledge it explicitly in the cover letter. The central empirical conclusion - that the semi-annual version of the Median strategy is the preferred implementation - is still selected and evaluated in-sample. The response letter does not claim that a true train/test split or walk-forward procedure was implemented; instead, it states that such an out-of-sample framework was not feasible and that robustness checks were added in its place. I appreciate the candor, but robustness checks are not equivalent to genuine out-of-sample validation. As long as the specification is chosen and assessed on the same sample, the headline result remains vulnerable to selection effects.
I also acknowledge that the authors responded to the multiple-testing concern in the cover letter by reporting exact p-values and discussing Bonferroni-style corrections. This is a useful improvement in transparency. However, the concern is still only acknowledged rather than resolved. The manuscript continues to emphasize the best-performing strategy-frequency combination after searching across multiple configurations, yet no formal correction is incorporated into the main inferential claim. The paper is now more honest about the data-snooping issue, but the methodological implication of that issue has not been fully addressed.
The statistical treatment is improved but still incomplete. I agree that adding HAC inference for excess returns and Lo-style Sharpe-ratio inference is a meaningful step forward. However, the revision still does not provide the requested block-bootstrap analysis, nor does it provide uncertainty bands for Sharpe ratios or formal inference for differences in Sharpe across competing strategies. As a result, the paper now offers stronger evidence that one specification performs well relative to zero or to the benchmark, but it still does not convincingly establish that the preferred strategy is statistically superior to the alternative strategies once time dependence and model-selection uncertainty are taken seriously.
Some of the robustness additions are helpful and should remain. I appreciate the authors’ response on skip-period robustness and alternative lookback windows, both of which were explicitly mentioned in the cover letter and incorporated into the manuscript. Likewise, the transaction-cost section is much more defensible now, with a clearer turnover-based framework, low/base/high cost scenarios, and a break-even analysis. These revisions materially strengthen the paper.
However, one reporting inconsistency remains serious. The authors state in the cover letter that tracking error has been clarified and formally defined, and the manuscript now indeed defines tracking error as an annualized standard deviation of excess returns. By that definition, tracking error should be nonnegative. Yet Appendix Table A9 still reports negative values in the “Tr. Err.” columns. This means that either the table is mislabeled or the quantity shown is not tracking error as defined in the main text. This inconsistency directly affects verifiability and should be corrected before the manuscript can be considered further.
Finally, I do not view the added GOVT benchmark as a complete answer to the robustness concern regarding alternative proxies. It is a useful additional passive benchmark, and I welcome its inclusion. But that is not the same as replicating the strategy on an alternative investable proxy set, Treasury futures, or constant-maturity Treasury return indices. The current revision reduces benchmark dependence, but it does not yet rule out instrument-set dependence.
Author Response
Please see the enclosed file
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This revised version fully deserves to be published. Well done!
Author Response
Thank you for your recommendation. We are enclosing a file with the latest updates.
Author Response File:
Author Response.pdf
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors
I don't have anything else to add. On my end, it is OK. Thank you very much for the detailed replies.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed all of my suggestions in this revised version of the manuscript.

