Peer-Review Record

A Longitudinal Study Examining Physical Activity Habit Formation

Behav. Sci. 2026, 16(4), 535; https://doi.org/10.3390/bs16040535

by Thomas McAlpine^1,2, Caitlin Liddelow^2,3, Jessica Charlesworth^1,2, Enrique Mergelsberg^1,2, Astrid Green^1,2, Elizaveta Novoradovskaya^1,2, Teagan Franz^1,2, Darren Haywood^4,5, Frank D. Baughman^2, Hayley Breare^1,2 and Barbara Mullan^1,2,*

Reviewer 1: Anonymous

Reviewer 2: Ki Hong Joung

Reviewer 3: Timur Liwinski

Submission received: 19 November 2025 / Revised: 7 March 2026 / Accepted: 30 March 2026 / Published: 2 April 2026

(This article belongs to the Special Issue The Impact of Psychosocial Factors on Health Behaviors)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper 'A Longitudinal Study Examining Physical Activity Habit Formation' presents unique data (assessing habit formation during a 'naturally occurring' context change) alongside theoretically insightful and novel analyses. However, the paper has some shortcomings that could be addressed in a revision.

Abstract

What do authors mean by 'novel' cues?
Also, the term 'cue use' is not quite clear. The more common term is 'cue exposure', as 'cue use' implies that individuals actually do something with it, even though they only cognitively link a cue with an existing behaviour.
The term 'physical activity' should be used consistently. Why introduce a new term ('exercise') for the keywords?

Introduction

- Well written.

- When talking about simple vs. complex behaviour, it is better to use the terminology from the discussion (single vs. multiple step).

- Page 2, line 52: Consider using the term 'execution' instead of 'engagement' (in line with Gardner's terminology).

- Consider acknowledging the following relevant papers in the introduction and/or discussion:

Di Maio, S., Keller, J., Kwasnicka, D., Knoll, N., Sichert, L., Fleig, L. (2022). What helps to form a healthy nutrition habit? Daily associations of intrinsic reward, anticipated regret, and self-efficacy with automaticity, Appetite, 175, Article 106013. https://doi.org/10.1016/j.appet.2022.106083

Habit substitution toward more active commuting - Di Maio - 2025 - Applied Psychology: Health and Well-Being - Wiley Online Library

The temporal trajectories of habit decay in daily life: An intensive longitudinal study on four health‐risk behaviors - Edgren - 2025 - Applied Psychology: Health and Well-Being - Wiley Online Library

HabitWalk: A micro‐randomized trial to understand and promote habit formation in physical activity - Baretta - 2025 - Applied Psychology: Health and Well-Being - Wiley Online Library

- Page 2, line 72: leave out the phrase 'highly influential'.

- Use the term habits instead of routines

- The authors argue that participants experienced a context change. How did the authors ensure that this change actually affected participants' physical activity behavior, that is, that the usual 'cues' were actually 'taken away'? For example, if participants usually went for a run in a park or followed a YouTube routine, the Corona lockdown would not really have affected these patterns. I wonder whether cue exposure was really disrupted.

- Do you have any (qualitative) data on the type of cues participants chose?

- Please state at the beginning that you are referring to trait self-control.

Methods/Results

Was this study pre-registered?

Why was ethics approval granted in Australia, where data collection was not conducted, when the study was actually carried out in the US?

What was the rationale for only asking participants every other day and not daily? That appears to be a rather uncommon sampling scheme.

- Can you provide a reference for your PA measure?

- Was there any power analysis to determine how many measurements per person would be sufficient?

How did you ensure that participants actually chose a 'new' cue? How was 'new' defined? Did you have any information on how participants actually chose a 'new' behaviour?

Have you also considered testing the fit of a quadratic curve? (See Keller et al., 2021. The authors applied three different regression models for each participant and found that the best fit to the data across model types was for quadratic models.)
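To make the suggested comparison concrete, the following is a minimal sketch of fitting linear, logarithmic, and quadratic curves to a single participant's habit-strength series and ranking them by AIC. It is an illustration under stated assumptions, not the procedure used by Keller et al. or by the authors: the function names (`fit_rss`, `compare_curves`, `best_curve`) are hypothetical, ordinary least squares is assumed, and the AIC is the Gaussian form up to an additive constant.

```python
import math

import numpy as np


def fit_rss(design, y):
    """Ordinary least-squares fit; return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def compare_curves(t, y):
    """Return an AIC value for linear, logarithmic and quadratic fits.

    t: measurement occasions (must be > 0 for the logarithmic model).
    y: habit-strength scores for one participant (hypothetical input).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ones = np.ones(n)
    designs = {
        "linear": np.column_stack([ones, t]),
        "logarithmic": np.column_stack([ones, np.log(t)]),
        "quadratic": np.column_stack([ones, t, t ** 2]),
    }
    aic = {}
    for name, design in designs.items():
        # Guard against log(0) when a model fits the data exactly.
        rss = max(fit_rss(design, y), 1e-12)
        k = design.shape[1]  # number of regression coefficients
        aic[name] = n * math.log(rss / n) + 2 * k
    return aic


def best_curve(t, y):
    """Name of the model with the lowest AIC for this trajectory."""
    aic = compare_curves(t, y)
    return min(aic, key=aic.get)
```

Running `best_curve` once per participant and tabulating the winning model would give a per-person summary of trajectory shape. AIC is used here rather than raw residuals because the quadratic model nests the linear one and would otherwise always fit at least as well.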

Discussion

How would an intervention targeting trait self-control differ from an intervention targeting habit formation for physical activity (PA)? Wouldn't the former approach be more aligned with the 'global health approach' hypotheses (see Fleig et al., 2025/ Understanding Multiple Health Behaviours | 16 | The Routledge Internat

Thank you for giving me the opportunity to review your valuable paper. I look forward to reading the revised version.

Author Response

Please see attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study employed a 12-week longitudinal design to track the process of physical activity (PA) habit formation and investigate the influence of behavioral consistency, cue stability, complexity, and reward on habit strength (SRHI).

Strengths:

Significance of the Topic: Following the classic study by Lally et al. (2010), there remains a scarcity of longitudinal research specifically addressing the habit formation process of complex behaviors like physical activity. This study is a timely attempt to fill this gap.

Analytical Approach: The use of Linear Mixed Models (LMM) was appropriate for analyzing individual change trajectories while accounting for the missing data inherent in longitudinal designs.

Visualization: Figure 1 clearly illustrates the non-linear (asymptotic) curve of habit formation, aiding reader comprehension.

Weaknesses:

Severe Attrition Rate: Of the initial 488 participants, only 52 were included in the final analysis, representing a data loss rate of approximately 89%. This suggests a high potential for survivorship bias.

Simplicity of Measures: Key variables such as complexity and reward were measured using single items, which may compromise construct validity.
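The attrition figure quoted in the first weakness above can be checked with simple arithmetic. The sketch below uses only the two counts stated in that comment (488 initial participants, 52 analysed); the helper name `attrition_rate` is hypothetical.

```python
def attrition_rate(n_initial: int, n_final: int) -> float:
    """Proportion of the initial sample lost before the final analysis."""
    if n_initial <= 0 or not 0 <= n_final <= n_initial:
        raise ValueError("expected 0 <= n_final <= n_initial and n_initial > 0")
    return 1 - n_final / n_initial


# Counts as quoted in the reviewer's comment.
rate = attrition_rate(488, 52)
print(f"{rate:.1%}")  # 89.3%
```

This agrees with the "approximately 89%" stated in the comment.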

Please find further comments in the attachment.

Comments for author File: Comments.pdf

Author Response

Please see attached file.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript reports a longitudinal study of physical activity (PA) habit formation during COVID-19 lockdowns in the United States. Habit strength was assessed over 12 weeks using intensive repeated measures, and growth curve models were used to test whether habit formation follows a logarithmic trajectory and to identify predictors of change.

The topic is timely and theoretically relevant, particularly given the opportunity for habit disruption and re-formation during lockdowns (lines 77 - 86). The dense sampling schedule (43 measurement occasions) is a clear strength. However, several substantial methodological and analytic issues limit confidence in the conclusions. In particular, high attrition, ad-hoc handling of missing data, poor model fit, and coarse behavioral measures substantially weaken inferences about habit formation and its predictors. While some of these issues are acknowledged in the discussion, their implications are not fully reflected in the interpretation of results or the abstract.

Overall, I recommend major revisions. With more transparent handling of missing data, clearer justification of analytic choices, and a more cautious interpretation of findings, the manuscript could still make a useful contribution to the literature on habit formation under real-world constraints.

Strengths:

The study leverages the COVID-19 lockdown as a naturalistic disruption context, which is well motivated theoretically (lines 77 - 86) and extends prior habit research beyond stable environments.

The high-frequency assessment design (every two days; 43 time points; line 130) is a notable methodological strength and allows for fine-grained examination of intra-individual change.

The simulation-based power analysis informed by Lally et al. (2010) (lines 198 - 214) is innovative and transparently reported.

The focus on self-control as a potential moderator of habit formation is theoretically grounded (lines 62 - 66), and the observed association with habit growth (lines 277 - 279) is potentially interesting for intervention design.

The discussion of the negative intercept–slope covariance (lines 332–338) is thoughtful and appropriately cautious, suggesting that individuals with weaker initial habits may have greater room for growth.

Major Concerns:

Sample Size and Attrition Bias

The initial sample (N = 91) was reduced to N = 41 due to attrition and the requirement to complete at least 50% of follow-ups (line 231). Although the authors report comparisons between retained and excluded participants (lines 233 - 236), the finding that groups differed significantly in lockdown status (p = 0.021; line 236) is potentially consequential and currently underplayed. Given that lockdown severity likely constrained PA opportunities, this difference could systematically bias habit trajectories and predictor effects.

Suggestions for improvement:

Conduct and report sensitivity analyses excluding participants who reported “no lockdown” (line 236) to test whether the main findings hold under more homogeneous contextual conditions.

Alternatively, include lockdown status as a time-varying or time-invariant covariate in the growth models to explicitly account for contextual constraints.

Expand the limitations section to more explicitly discuss how selective attrition and the small final sample size constrain generalizability and statistical power, particularly for detecting heterogeneity in growth patterns.

Data Handling and Model Identification

To address missing data and identification issues, non-consecutive time points were removed, reducing the dataset from 43 to 24 time points (lines 239–245). This approach appears ad hoc and risks altering the shape of individual trajectories, especially in a study explicitly concerned with nonlinear growth. The resulting poor model fit for both linear and logarithmic models (e.g., CFI = 0.563, RMSEA = 0.322 for the linear model; Table 2, line 263) suggests that the specified models do not adequately represent the data.

Although FIML was used (line 223), the extent of missingness (noted via the 2:1 ratio issue at line 239) raises concerns about whether FIML alone is sufficient.

Suggestions for improvement:

Provide a clearer justification for removing non-consecutive time points and explicitly discuss how this decision may affect estimated growth parameters.

Explore alternative analytic approaches better suited to sparse and heterogeneous longitudinal data, such as Bayesian growth models or multiple imputation combined with growth modeling.

Given the pronounced inter-individual variability (Figure 1, line 261), consider implementing latent class or growth mixture models (already noted as a future direction at line 354) in the current revision to test whether distinct habit formation trajectories exist.

Temper claims about predictor effects (e.g., self-control predicting slope; line 278) in light of the consistently poor global model fit.

Measurement Validity

PA behavior was assessed using a single binary (yes/no) item covering the previous two days (lines 187–190). While pragmatic, this coarse measure limits sensitivity to meaningful variation in behavioral repetition and may partly explain the null findings for H2. Cue use was also assessed in a minimal format (yes/no/unsure; line 192), without information on cue type, consistency, or adherence, despite participants being asked to self-select cues (lines 154–159).

Habit strength was measured using the SRBAI (line 179), which demonstrated excellent internal consistency (α = 0.96; line 184). However, self-reported automaticity may be especially vulnerable to contextual disruption during lockdowns.

Suggestions for improvement:

More explicitly acknowledge that the null effects for behavioral repetition and cue frequency (lines 277 - 279) may reflect measurement limitations rather than true absence of effects.

If feasible, provide additional descriptive analyses (e.g., frequency of PA engagement across time) to contextualize the binary PA measure.

In the revision, clarify what types of cues participants selected and whether any guidance or examples were provided (lines 154 - 159).

Strengthen the future directions section by specifying how objective PA measures (e.g., accelerometry; line 391) or more granular self-report tools could address these limitations.

Hypotheses and Interpretation

Hypothesis 1 (logarithmic growth outperforming linear growth; line 117) was not supported, with only minimal increases in habit strength across the study period (M = 2.82 to 3.34; line 258). This finding stands in tension with the framing of habits as “fundamental” for PA maintenance (line 38). The discussion begins to address this (lines 301 - 307), but the implications could be developed further, particularly regarding the complexity and contextual dependence of PA habits.

Hypothesis 2 was only partially supported, with self-control emerging as the sole significant predictor (lines 277 - 279). The negative slope when predictors are zero (line 280) suggests that habit strength may decline in the absence of supportive factors, but this point is not fully explored.

The abstract statement that “higher levels of self-control were significantly related to faster habit strength formation” (line 32) somewhat overstates the strength of the evidence, given the poor model fit (CFI = 0.559; line 273).

Suggestions for improvement:

Reframe conclusions to emphasize the difficulty of forming PA habits under disrupted conditions rather than framing null growth as a failure of habit theory.

Discuss more explicitly why PA may differ from simpler behaviors studied in prior habit research (lines 301 - 307).

Temper claims in the abstract and discussion regarding self-control, explicitly acknowledging the marginal improvement in model fit.

Consider discussing alternative or complementary predictors (e.g., motivation, affective responses to PA; line 54) that may be more salient under lockdown conditions.

Minor Concerns

Clarity and Presentation

Line 117: clarify earlier that the logarithmic model is being compared directly to a linear model.

Line 258: clarify whether the reported t-test (p = 0.019) was one- or two-tailed, especially given the modest effect size (d = 0.40).

Table 1 (line 254): category percentages appear to exceed 100% (e.g., “More than three weeks” at 85.4% and “Not applicable” at 12.2%); please check and correct.

Proofread for typographical errors such as “Loglikeihood” (line 220).
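A check of the kind requested for Table 1 above could be scripted as follows. This is a minimal sketch: only the two percentages quoted in the comment are used, since the remaining Table 1 categories are not reproduced in this record, and the helper names and the tolerance value are assumptions.

```python
def percent_total(categories: dict) -> float:
    """Sum of category percentages, rounded to one decimal place."""
    return round(sum(categories.values()), 1)


def exceeds_100(categories: dict, tol: float = 0.1) -> bool:
    """True if the percentages sum to more than 100 plus a rounding tolerance."""
    return percent_total(categories) > 100 + tol


# Only the two values quoted in the comment; other categories are unknown here.
quoted = {"More than three weeks": 85.4, "Not applicable": 12.2}
print(percent_total(quoted))  # 97.6
```

The two quoted values alone sum to 97.6%, so any excess over 100% would have to come from the categories not quoted in the comment.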

References and Currency

Some references are dated (e.g., Warburton et al., 2006; line 97). Consider citing more recent reviews on PA benefits and habit formation.

Given the 2025 publication date and 2020 data collection (line 134), briefly situate the findings within post-pandemic habit research.

Tables and Figures

Figure 1 (line 261) effectively illustrates variability; clearer axis labels (e.g., “Measurement Occasion” or “Days”) would improve interpretability.

Ensure Table 3 (line 288) reports all estimates consistently, including standard errors and p-values.

Author Response

Please see attached file.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Peer Review Report

1. General Summary

This study investigates the dynamic process of physical activity habit formation during the unique context of COVID-19 lockdowns. By employing a high-frequency longitudinal design (43 time-points over 12 weeks), the authors attempt to replicate and extend established habit formation models. The finding that trait self-control is a significant predictor of the rate (slope) of habit formation, rather than the initial state, provides valuable theoretical insights into how complex health behaviors are sustained.

While the overall model fit was below conventional thresholds, this likely reflects the inherent inter-individual variability in complex, multi-step behaviors like exercise. The manuscript is well-written and the statistical approach is rigorous. With minor clarifications, this paper will make a meaningful contribution to the field.

2. Comments for the Authors

A. Introduction & Theoretical Framework

• Complexity of Behavior: The authors correctly identify physical activity as a multi-step behavior. It would strengthen the manuscript to briefly distinguish between habitual instigation (deciding to exercise) and habitual execution (the routine itself), as this distinction might explain some of the observed variability in growth trajectories.

• Context Change: The use of the "habit discontinuity hypothesis" as a backdrop for the COVID-19 lockdown is excellent. Ensure the connection between the "window of opportunity" and the specific participants (who all wanted to increase activity) is clearly emphasized.

B. Methodology

• Cue Specification: Participants selected a single cue to assist their habit formation. While frequency of cue use was measured, adding a brief descriptive summary or examples of the types of cues chosen (e.g., time-based vs. event-based) would provide better context for the readers.

• Attrition Analysis: The retention rate was approximately 45%. While the authors noted no demographic differences between those included and excluded, it would be beneficial to mention if there were differences in baseline motivation or past activity levels, as these often predict dropout in exercise studies.

C. Results & Discussion

• Interpretation of Self-Control: The positive relationship between self-control and the slope of habit formation is a key finding. The discussion should further elaborate on why self-control matters more for the rate of change in complex behaviors compared to simpler habits.

• Model Fit and Variability: The authors honestly report poor overall model fit. In the discussion, please reinforce that this "lack of fit" is a finding in itself—highlighting that group-level aggregate models may fail to capture the highly idiosyncratic nature of habit formation following major life disruptions.

• Future Directions: The suggestion to use Latent Class Growth Modeling (LCGM) in future research is a strong point. Expanding this slightly to suggest what kind of sub-groups might be expected (e.g., "fast-formers" vs. "fluctuators") would be helpful for researchers looking to build on this work.

3. Conclusion

The study is methodologically sound and addresses a critical gap in longitudinal habit research. The transparency regarding the data preparation and the use of sensitivity analyses (Supplementary Materials) is commendable. I recommend acceptance once the minor clarifications regarding cue types and self-control mechanisms are addressed.

Author Response

Please see attached response doc.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have adequately addressed all comments, and I support the publication of the manuscript in its current form.

Author Response

Reviewer #3

Comments and Suggestions for Authors

The authors have adequately addressed all comments, and I support the publication of the manuscript in its current form.

Authors Response

Thank you again for your time and effort in reviewing the manuscript. Your comments have improved the manuscript.
