2-Year Change in Revised Hammersmith Scale Scores in a Large Cohort of Untreated Paediatric Type 2 and 3 SMA Participants

The Revised Hammersmith Scale (RHS) is a 36-item ordinal scale developed using clinical expertise and sound psychometrics to investigate motor function in participants with Spinal Muscular Atrophy (SMA). In this study, we investigate the median change in RHS score over periods of up to two years in paediatric SMA type 2 and 3 participants and contextualise it against the Hammersmith Functional Motor Scale–Expanded (HFMSE). Change scores were considered by SMA type, motor function, and baseline RHS score. We consider a new transitional group, spanning crawlers, standers, and walkers-with-assistance, and analyse it alongside non-sitters, sitters, and walkers. The transitional group shows the clearest trend in change scores, with an average 1-year decline of 3 points. In the weakest patients, positive change in the RHS is most detectable in the under-5 age group, whereas in the stronger patients, decline in the RHS is most detectable in the 8–13 age group. The RHS has a reduced floor effect compared with the HFMSE, although our results show that it should be used in conjunction with the Revised Upper Limb Module (RULM) for participants scoring less than 20 points on the RHS. The timed items in the RHS have high between-participant variability, so participants with the same RHS total can be differentiated by their performance on the timed items.


Introduction
Spinal muscular atrophy (SMA) is an autosomal recessive neuromuscular disorder caused by mutations in the survival motor neuron 1 (SMN1) gene on chromosome 5q, leading to SMN protein deficiency [1,2,3,4]. It causes proximal muscle atrophy and weakness, leading to secondary complications including scoliosis, joint contractures, and progressive respiratory decline [4,5]. SMA is divided into types defined by the age of onset and the highest developmental milestone achieved. Children with type 1 do not achieve the ability to sit independently, children with type 2 can sit but cannot walk independently, and children with type 3 achieve independent walking but lose motor function over time, with many becoming wheelchair dependent [4,5].
In recent years, several treatment options have been clinically proven to be effective and approved for commercial use. Both nusinersen and risdiplam are specifically designed to increase the amount of functional SMN protein by altering the splicing of survival motor neuron 2 (SMN2) gene pre-mRNA. SMN2 is intact in all individuals with SMA, but a single nucleotide change leads to the exclusion of exon 7 from the majority of transcripts, with consequently lower levels of functional SMN protein. Both nusinersen [6,7,8] and risdiplam [9,10] have been studied in symptomatic type 1 and pre-symptomatic cohorts, as well as in type 2 and 3 SMA, and significant benefits have been demonstrated, with transformative changes especially when treatment is administered close to disease onset or pre-symptomatically [3,6,7,8,10,11]. However, functional improvement or stabilisation of function in more advanced and chronic stages of the disease requires more careful documentation, including comparison with the natural history of SMA types 2 and 3.
The Hammersmith Functional Motor Scale Expanded (HFMSE) is a clinical outcome assessment designed and validated to assess gross motor function in SMA [12]. The Revised Hammersmith Scale (RHS) was developed to address discontinuity in the HFMSE [13], and several items were adapted and added from the North Star Ambulatory Assessment (NSAA) [14] and the Children's Hospital of Philadelphia Infant Test of Neuromuscular Disorders (CHOP-INTEND) [15] to increase the sensitivity of the scale in the strongest and weakest patients, respectively. From the NSAA, a scale validated and widely used in Duchenne muscular dystrophy, items relating to one-legged standing, hopping, and climbing/descending box steps were included, alongside the two timed items (rise from floor (RFF) and the 10 metre walk/run test (10MWR)). From the CHOP-INTEND, the "Adduction from Crook Lying" item was included. The Revised Upper Limb Module (RULM) was specifically designed to capture upper limb function in SMA and is used as an outcome in ongoing clinical trials.
Because multiple outcome measures are available to assess patients with SMA types 2 and 3, their comparative strengths and weaknesses need to be understood. Recent therapeutic innovation and the increased availability of disease-modifying drugs have led to a change in phenotypes, with the majority of children with SMA now on treatment. The cohort analysed here is one of the largest natural history cohorts of SMA types 2 and 3 available. The increasing availability of treatments makes it crucial to understand the sensitivity to change of available outcome measures, to aid trial design and inform clinical care. The data presented here provide reference values against which changes in treated SMA 2 and 3 patients can be assessed.

Aims
We aim to characterise the change in RHS scores over a two-year period by age, motor function, and total RHS score in a large international cohort of untreated SMA 2 and 3 participants. We aim to contextualise these change scores by providing the corresponding change in the HFMSE score. The aim of this longitudinal, multicentre natural history study is to demonstrate how the RHS score can be used in conjunction with other functional measures such as the RULM and the RHS timed items to enhance the understanding of this cohort's disease progression and detect changes with treatments.

Inclusion Criteria
The participants included in this analysis were recruited from the International SMA Consortium (iSMAC) natural history studies (SMA REACH UK, PNCRN USA, and Italian Telethon) [16]. All participants had a genetically confirmed diagnosis of SMA type 2 or 3, were receiving SMA Standards of Care treatment [17,18,19], had no previous involvement in clinical trials, and had at least two RHS assessments performed between 17 March 2015 and 29 July 2019.

Scales
RHS, HFMSE, and RULM assessments were conducted by experienced neuromuscular physiotherapists who were part of, or trained by, iSMAC. RHS, HFMSE, and RULM scores were collected in clinics approximately every 6 months, as recommended in the standard of care [18,19]. The RHS is a 36-item ordinal scale with a maximum score of 69 points (33 items are scored 0-2, and three are scored 0-1). The HFMSE is a 33-item ordinal scale with a maximum score of 66 points. The RHS and HFMSE can be scored simultaneously, as many items are similar or identical across the two scales. The RULM is a 20-item ordinal scale (including a separately scored entry item) which captures proximal, mid-level, and distal arm performance, with a maximum score of 37 points.

Analysis
Participants without a known SMA type, gender, RHS total score, or HFMSE total score were excluded from the analysis. As the data were collected longitudinally in clinic, participant visits were not scheduled at exactly six-month intervals. Therefore, follow-up visits completed within ±3 months of each nominal six-month time point were accepted for analysis. Additionally, to maximise the number of participants contributing to each analysis, every participant assessment could act as a baseline [20].
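A minimal Python sketch of this pairing rule is given below (the analysis itself was run in R). The `Visit` record, its field names, and the choice of the visit closest to the nominal interval when more than one falls inside a window are assumptions made for illustration, not details taken from the study. Because windows of ±3 months around 6, 12, 18, and 24 months touch at their edges, a follow-up exactly 9, 15, or 21 months after baseline counts towards both neighbouring intervals in this sketch.

```python
from dataclasses import dataclass

TARGETS = (6, 12, 18, 24)  # nominal follow-up intervals, in months
WINDOW = 3                 # accepted tolerance around each interval, in months


@dataclass
class Visit:
    participant: str
    months: float  # hypothetical field: visit time in months from study entry
    rhs: int
    hfmse: int


def change_scores(visits):
    """Return (participant, baseline_months, target, d_rhs, d_hfmse) tuples.

    Every visit may act as a baseline. For each nominal interval, the later
    visit whose elapsed time is closest to the target (within +/- WINDOW)
    is used as the follow-up.
    """
    by_participant = {}
    for v in visits:
        by_participant.setdefault(v.participant, []).append(v)

    rows = []
    for pid, own in by_participant.items():
        own.sort(key=lambda v: v.months)
        for i, base in enumerate(own):
            later = own[i + 1:]
            for target in TARGETS:
                # Follow-ups whose elapsed time falls inside the acceptance window
                eligible = [f for f in later
                            if abs((f.months - base.months) - target) <= WINDOW]
                if not eligible:
                    continue
                follow = min(eligible,
                             key=lambda f: abs((f.months - base.months) - target))
                rows.append((pid, base.months, target,
                             follow.rhs - base.rhs, follow.hfmse - base.hfmse))
    return rows
```

With four visits at 0, 6, 13, and 24 months, for example, the sketch returns six baseline/follow-up pairs, because the visits at 6 and 13 months also serve as baselines.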
RHS medians and interquartile ranges are presented. Significance testing for the RHS and HFMSE change scores was performed using a sign test of the null hypothesis that the median change is 0, with a significance level of 5%. For some groups, the sign test could not be computed because of small sample sizes or too few non-zero change scores. Means and standard deviations (SD) of the change scores are presented in Supplementary Tables S1-S4, where p-values are calculated using a t-test. Participants were stratified according to SMA type, defined by peak motor function attainment (for SMA type 2 vs. SMA type 3) and symptom onset (for SMA type 3a vs. SMA type 3b). The World Health Organisation (WHO)-derived functional groups were determined based on previously published work [21], which grouped participants by WHO motor function, with scores of 2 (crawling), 3 (standing with assistance), 4 (standing independently), or 5 (walking with assistance) coded as the "transitional group". Of note, this scale is not ordinal: for example, patients who could not crawl but could stand with assistance were classified as a 3 rather than a 1. Change scores were stratified by type and age (<5, 5-7, 8-13, and 14-18 years) to align with previous research on the HFMSE, which used similar age groups [22,23]. Additionally, change scores were stratified by baseline motor function, described using quintiles of the RHS total score across the whole population. The quintiles were calculated using all the data and were defined as follows: Quintile 1 (Q1), scores 0-4; Quintile 2 (Q2), scores 5-9; Quintile 3 (Q3), scores 10-18; Quintile 4 (Q4), scores 19-42; Quintile 5 (Q5), scores 43-69.
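The sign test and the stratification rules described in this paragraph can be sketched in Python as follows. The exact two-sided binomial form of the sign test, with zero changes discarded, is the standard textbook version and is assumed rather than confirmed to match the study's R implementation. The age bands and RHS quintile bounds are taken from the text; binning fractional ages by completed years and the WHO coding of 0 (non-sitter), 1 (sitter), and 6 (walker) in the hypothetical `functional_group` helper are assumptions.

```python
from math import comb


def sign_test(changes):
    """Exact two-sided sign test of H0: median change = 0.

    Zero changes carry no sign information and are dropped. Returns
    (number of non-zero changes, p-value); the p-value is None when every
    change is zero, i.e. when the test cannot be computed.
    """
    nonzero = [c for c in changes if c != 0]
    n = len(nonzero)
    if n == 0:
        return 0, None
    positives = sum(1 for c in nonzero if c > 0)
    smaller = min(positives, n - positives)
    # Under H0 the number of positive signs is Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return n, min(1.0, 2 * tail)


def age_band(age_years):
    """Baseline age bands from the text; fractional ages are binned by completed years."""
    if age_years < 5:
        return "<5"
    if age_years < 8:
        return "5-7"
    if age_years < 14:
        return "8-13"
    if age_years < 19:
        return "14-18"
    return None  # outside the paediatric range analysed


# Upper bound of each RHS quintile as reported in the text (Q1 0-4 ... Q5 43-69)
RHS_QUINTILES = ((4, "Q1"), (9, "Q2"), (18, "Q3"), (42, "Q4"), (69, "Q5"))


def rhs_quintile(total):
    if not 0 <= total <= 69:
        raise ValueError("RHS total must lie between 0 and 69")
    for upper, label in RHS_QUINTILES:
        if total <= upper:
            return label


def functional_group(who_score):
    """Hypothetical coding: 0 non-sitter, 1 sitter, 2-5 transitional, 6 walker."""
    if who_score == 0:
        return "non-sitter"
    if who_score == 1:
        return "sitter"
    if 2 <= who_score <= 5:
        return "transitional"
    if who_score == 6:
        return "walker"
    raise ValueError(f"unexpected WHO score: {who_score}")
```

For example, seven positive changes and one zero change give `sign_test([1, 2, 1, 3, 0, 2, 1, 1])` = `(7, 0.015625)`, that is, p = 2/2^7.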
To model the relationship between the RULM and the RHS, and between the RHS timed items and the RHS total, a random effects model was used to account for the correlation between repeated assessments of the same participant. The timed tests were only considered valid if the participant scored >0 on the corresponding item. For the timed tests, a linear model with a person-specific intercept only was considered. To identify the RHS total score cut-off that was most predictive of performing each RHS timed item, receiver operating characteristic (ROC) analysis was used, which trades off the sensitivity and specificity of potential cut-offs. The ROC curve is not shown here. When jointly considering the RULM and the RHS, and the timed tests and the RHS, a piecewise linear model with one knot was used. This creates two joined straight lines to represent an inflection point in the relationship. The position of the inflection point was identified using a grid search (fitting the model with each possible breakpoint from 1 to 67), and the value that minimised the Akaike information criterion (AIC), which trades off model complexity against goodness of fit, was chosen. For the timed items, the linear model had a lower AIC than the piecewise linear model with one knot. All analyses were performed in R (version 3.6.0).
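The knot search can be illustrated with a simplified, dependency-free Python sketch. Unlike the study's models, it fits ordinary least squares and ignores the participant-level random effect, so it only illustrates the idea of a grid search over breakpoints scored by AIC; the helper names are hypothetical. Each candidate knot k adds a hinge term max(x − k, 0), so the fitted slope is b1 below k and b1 + b2 above it, and the two segments join at k.

```python
import math


def _ols(X, y):
    """Least-squares coefficients and residual sum of squares via the normal equations.

    Returns None if the design matrix is singular (e.g. a hinge column of zeros).
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        if abs(A[c][c]) < 1e-12:
            return None
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss


def gaussian_aic(rss, n, n_coef):
    """AIC of a Gaussian linear model up to an additive constant (coefficients + error variance)."""
    rss = max(rss, 1e-12)  # guard against log(0) on noiseless toy data
    return n * math.log(rss / n) + 2 * (n_coef + 1)


def best_knot(x, y, knots=range(1, 68)):
    """Compare a straight line with one-knot hinge models; return (aic, knot, beta).

    knot is None when the straight line has the lowest AIC.
    """
    n = len(y)
    line = _ols([[1.0, xi] for xi in x], y)
    if line is None:
        raise ValueError("x needs at least two distinct values")
    beta, rss = line
    best = (gaussian_aic(rss, n, 2), None, beta)
    for k in knots:
        # Hinge basis: slope beta[1] below k, beta[1] + beta[2] above k
        fit = _ols([[1.0, xi, max(xi - k, 0.0)] for xi in x], y)
        if fit is None:
            continue  # singular design, e.g. a knot at or above the largest x
        beta, rss = fit
        aic = gaussian_aic(rss, n, 3)
        if aic < best[0]:
            best = (aic, k, beta)
    return best
```

On noiseless toy data generated with a bend at x = 22, `best_knot` recovers the knot at 22; with real, noisy data, neighbouring knots can have very similar AIC values, so a selected breakpoint is best read as approximate.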

Participants
This analysis consisted of 177 participants assessed at 586 time points (an average of 3.3 assessments per participant). Participants were recruited from seven sites globally and the populations at each site were varied.
The majority of participants included in this analysis had SMA type 2 (62%), with 33% SMA type 3a and 5% type 3b. The full range of the RHS was observed in this cohort, with a median score of 12. An overview of the medians, interquartile ranges, and ranges observed in this population is available in Table 1. Female patients made up 47% of the cohort, and RHS scores were significantly higher in females than in males (p = 0.007). Of the 149 patients whose spinal surgery status was known, 23 (15%) had undergone spinal surgery. In these patients, the median RHS score was lower and the median age was higher.

2-Year Change in RHS and HFMSE
The full RHS and HFMSE change scores are presented in Table 2. We observe relative stability in the SMA type 2 and 3b groups. However, in the SMA type 3a group, there is a trend towards a slight decrease in scores, which is only significant at 18 months (−2 in the RHS (p = 0.027), −1.5 in the HFMSE (p = 0.01)). When considering change scores by age group, the under-5 age group is the only subgroup to show a significant positive change at all time points on both the RHS and HFMSE. In the 5-7 years subgroup, there is, on average, a trend of decline from 18 months onwards, although this is not significant. In the 8-13 years subgroup, RHS scores decrease at all time points, significantly so at 6, 12, 18, and 24 months (p = 0.001, p < 0.001, p < 0.001, and p = 0.005, respectively). The 14-18 years subgroup displays a mild, non-significant trend towards decline at 6 and 12 months. Grouping change scores by motor function, the strongest difference is observed in the transitional group, where the average change is significantly negative at 12, 18, and 24 months (p < 0.001, p < 0.001, and p = 0.016, respectively, for both the RHS and HFMSE). It is worth noting that this is the smallest subgroup.
When considering the change in RHS and HFMSE with respect to cross-tabulated SMA type and age (full results shown in Table 3), we found that in the under-5 age group, the SMA 2 subgroup showed increasing RHS and HFMSE scores, which were significant at all follow-ups (p = 0.01, p < 0.001, p = 0.002, and p = 0.003 for the RHS; p = 0.06, p = 0.002, p = 0.015, and p = 0.001 for the HFMSE). Notably, at 24 months, there was a median 1-point increase in the RHS, and 75% of the assessments with a 24-month follow-up showed an RHS increase of at least 1 point. In the under-5 age group, the SMA 3a subgroup also showed a trend towards increasing scores at and after 12 months, although this was only significant at 6 months on both the RHS and HFMSE (median 6 (p = 0.03) and 3 (p = 0.06), respectively).
In the 5-7 age group, there is a decline in RHS scores in the SMA 2 subgroup that is significant at 2 years (−2 points, p = 0.021), but this was not significant in the HFMSE. There is no significant change in the 3a participants in this age group, although there is a trend towards decline in the RHS but not in the HFMSE.
In the 8-13 age group, there is a clear decline in the SMA 2 participants on both the RHS and the HFMSE, with a significant median decline of −1 on both scales at 12 months (p = 0.003 and p < 0.001, respectively). In this group, the median 24-month change is −1.5 on the RHS (p = 0.019) and −3 on the HFMSE (p = 0.004). Similar changes were seen in the SMA 3a participants aged 8-13 years at baseline, with a significant decline on the RHS at 6, 12, and 18 months (p = 0.003, p = 0.004, and p = 0.013, respectively) and on the HFMSE at 6, 12, 18, and 24 months (p = 0.007, p = 0.001, p = 0.001, and p = 0.035, respectively). At 24 months, the median RHS change is −9, and the median HFMSE change is −6. In the over-14s, there is overall stability in the SMA 2 and 3b participants and a trend towards decline in the 3a participants, although significance could not be assessed due to the small numbers.
When considering the change scores by both age group and motor function at baseline (Table 4), we observed increasing RHS and HFMSE scores in the under-5 sitters, which were significant at 6, 12, 18, and 24 months (p = 0.014, p < 0.001, p < 0.001, and p < 0.001 for the RHS; p = 0.018, p = 0.002, p = 0.004, and p < 0.001 for the HFMSE). There is an average increase of 1.5 and 2.5 points on the RHS and HFMSE, respectively, at the 24-month time point. This contrasts with the sitters in the 5-7, 8-13, and 14-18 age groups, who display negative change scores at all time points on both the HFMSE and RHS. In the sitters aged 8-13 at baseline, there is a significant 1-year decline of −1 on the RHS (p = 0.002) and −2 on the HFMSE (p < 0.001). In the transitional group, there is a trend of decline in all age groups, with notable changes in the 5-7 and 8-13 age groups, although significance could not be assessed due to small sample sizes.
In the walkers, there is a clear improvement in the under-5s, which is significant at 6 months on both the RHS (median change 5, p = 0.006) and the HFMSE (median change 3, p = 0.012).
The full change scores of the data cross-tabulated by age group and RHS total score are presented in Table 5. In those scoring in the lowest quintile of the RHS (RHS score 0-4) before the age of 5, there is an increase in both the RHS and the HFMSE scores. The change in the RHS is significant at 6, 12, and 18 months (p = 0.002, 0.001, and 0.0031, respectively), but due to the small sample size, it was not possible to calculate significance at 24 months. Of note, this group does not show significant changes in the HFMSE. In other age groups, participants scoring in the lowest quintile had no detectable change (absolute average change ≤1) at any follow-up on either the RHS or the HFMSE. In the Q2 group (RHS score 5-9 at baseline), there was an increase in both the RHS and the HFMSE scores in the under-5 group, which was significant at 6, 12, and 24 months on the HFMSE (p = 0.043, 0.004, and 0.002, respectively), and at 12 and 18 months on the RHS (p = 0.004 and 0.004, respectively). It was not possible to assess the significance of the change at 24 months due to small sample sizes. In this group, the median 12-month change was 2 points on both the RHS and the HFMSE, whereas at 18 months, it was 5.5 and 6, respectively. In the Q2 group aged 8-13, there was a trend towards decline in both the RHS and HFMSE. In the Q3 participants (RHS score 10-18 at baseline), there was a decline in the 8-13 age group, which was significant at 12 and 18 months on both the HFMSE (p = 0.007 and 0.012) and the RHS (p = 0.035 and 0.021). In the Q4 group (RHS score 19-42 at baseline) aged 8-13, there was a significant decline on the RHS and HFMSE at 6, 12, and 18 months (for the RHS, p = 0.004, 0.001, and 0.001, respectively; for the HFMSE, p = 0.019, p < 0.001, and 0.002, respectively). In this subgroup, the median 2-year change is −9 on the RHS and −6 on the HFMSE.
In the Q5 group (RHS score 43-69 at baseline), there was a trend towards decline in the RHS and HFMSE, but it was not significant.

Ceiling and Floor Effect
Five participants, across 11 assessments, scored 0 on the RHS; all had SMA type 2 (median age 13.8 years (IQR: 12.2, 16.5)). Spinal surgery status was known for only eight of these assessments, of which six (75%) occurred after spinal fusion surgery. Eleven participants, all SMA type 2, scored 0 on the HFMSE across 19 assessments (median age 13.1 years (IQR: 10.6, 15.5)). Spinal fusion surgery status was known for all 19 of these assessments, and 12 (63%) occurred after spinal fusion surgery.
On the occasions where participants scored 0 on the RHS, the majority also scored 0 on the HFMSE (75%). However, floor scores were less frequent on the RHS than on the HFMSE: in two thirds (13/19) of the assessments scoring 0 on the HFMSE, participants had a non-zero RHS score. The details of this are presented in Table 6. The seven participants who scored 1 on the RHS all scored 1 on Item 4 (Adduction from Crook Lying), and the six participants who scored 2 on the RHS while scoring 0 on the HFMSE all scored 2 on Item 4 (Adduction from Crook Lying).
Only two participants achieved the maximum score of 69 on the RHS: a 6.3-year-old SMA 3a participant and a 15.2-year-old SMA 3b participant. Each achieved this at only one assessment, and both also scored the maximum of 66 on the HFMSE. These were the only participants who scored the maximum on the HFMSE.

Timed Tests
The timed portion of Item 19, runs 10 m, was recorded for 95 assessments; however, the corresponding item score was 0 for 7 (7%) of the assessments, and so these were removed. This yielded 88 assessments for 31 patients. The timed portion of Item 25, rise from floor, was recorded for 100 assessments; however, the corresponding item score was 0 for 19 (19%) of the assessments, and so these were removed. This yielded 81 assessments for 28 patients.
At an RHS total score of 50, the average time for Item 19 (run 10 m) was 7.91 s, and each one-point increase in RHS total score reduced this time by 0.17 s. At an RHS total score of 50, the average time for Item 25 (rise from floor) was 10.18 s, and each one-point increase in RHS total score reduced this time by 0.36 s. When comparing the trends of the RHS total score with the two timed items, we see that among participants scoring above 60 on the RHS, there is about 5 s of variation in the run time and nearly 20 s of variation in the rise-from-floor time. The rise-from-floor time is noticeably more variable than the run time, as can be seen in Figure 1, which suggests that greater granularity can be achieved by looking at the rise from floor. The rise from floor also shows a stronger person-specific effect, with some participants having longer rise-from-floor times at all time points. This may suggest that the rise from floor captures aspects of strength and function that are at least partially independent of the RHS total score. These differences may also reflect lower limb contractures, muscle imbalances, and differences in the distribution of muscle weakness, which can influence the choice of movement strategy and can be observed between participants who appear to perform at a similar level on a functional scale.
Table 6. Breakdown of RHS and HFMSE total scores in assessments where at least one was 0.
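The two population-average lines described above can be written as simple prediction functions. This sketch uses only the coefficients stated in the text (mean time at an RHS total of 50 and change per RHS point); the dictionary keys and function name are hypothetical. Because these lines come from a model with a participant-specific intercept, individual times scatter widely around them, and they should not be extrapolated far outside the range of RHS totals at which the timed items were completed.

```python
# Coefficients reported in the text:
# (mean time in s at an RHS total of 50, change in s per one-point RHS increase)
TIMED_ITEM_FITS = {
    "item19_run_10m": (7.91, -0.17),
    "item25_rise_from_floor": (10.18, -0.36),
}


def predicted_time(item, rhs_total):
    """Population-average time (s) for a timed RHS item at a given RHS total."""
    time_at_50, slope = TIMED_ITEM_FITS[item]
    return time_at_50 + slope * (rhs_total - 50)


for rhs in (44, 50, 60):
    run = predicted_time("item19_run_10m", rhs)
    rise = predicted_time("item25_rise_from_floor", rhs)
    print(f"RHS {rhs}: run 10 m ~{run:.2f} s, rise from floor ~{rise:.2f} s")
```

At an RHS total of 60, for instance, the lines give about 6.2 s for the run and about 6.6 s for the rise from floor.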

A score of 42 or above on the RHS was the most predictive (in terms of trading off specificity and sensitivity) of the ability to complete Item 19, run 10 m. Of the 463 assessments in which participants scored less than 42, the timed test was completed in six (1%). In 70% of the assessments in which participants scored 42 or more, the timed test was completed (83 completed, 35 not completed).
A score of 44 or above on the RHS was the most predictive (in terms of trading off specificity and sensitivity) of the ability to complete Item 25, rise from floor. Of the 477 assessments in which participants scored less than 44, the timed test was completed in three (<1%). In 72% of the assessments in which participants scored 44 or more, the timed test was completed (78 completed, 31 not completed).
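A common way to choose a cut-off that trades off sensitivity and specificity on an ROC curve is to maximise Youden's index, J = sensitivity + specificity − 1. The text does not state which criterion was applied, so the use of Youden's index here is an assumption, and the function names and toy data are hypothetical. The rule evaluated is "RHS total ≥ t predicts that the timed item is completed"; the second helper reports the completion rates below and at or above a chosen cut-off, which are the quantities quoted above for 42 and 44.

```python
def youden_cutoff(scores, completed, candidates=range(0, 70)):
    """Return (t, J) maximising J = sensitivity + specificity - 1 for the rule score >= t."""
    n_pos = sum(1 for c in completed if c)
    n_neg = len(completed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both completed and not-completed assessments")
    best_t, best_j = None, float("-inf")
    for t in candidates:
        tp = sum(1 for s, c in zip(scores, completed) if c and s >= t)      # true positives
        tn = sum(1 for s, c in zip(scores, completed) if not c and s < t)  # true negatives
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict inequality: ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


def completion_rates(scores, completed, t):
    """Share of assessments in which the timed item was completed, below and at/above t."""
    def rate(group):
        return sum(group) / len(group) if group else None

    below = [c for s, c in zip(scores, completed) if s < t]
    above = [c for s, c in zip(scores, completed) if s >= t]
    return rate(below), rate(above)


# Hypothetical toy data: RHS totals and whether the timed item was completed
scores = [10, 20, 40, 41, 42, 45, 50, 60]
completed = [False, False, False, False, True, True, False, True]
cut, j = youden_cutoff(scores, completed)
print(cut, round(j, 2), completion_rates(scores, completed, cut))
```

On this toy data the selected cut-off is 42, with a completion rate of 0 below it and 0.75 at or above it.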

RULM and RHS
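The RULM score was recorded for 86 participants at 226 time points. There is a clear trend linking the RULM and the RHS, with a score of 0 on the RHS corresponding to an average score of 10.33 on the RULM. The RULM score then increases linearly with the RHS, with each one-point increase on the RHS corresponding to 0.74 points on the RULM, up to an RHS total score of 22. Beyond this, the slope is shallower because scores are affected by the ceiling effect of the RULM: each one-point increase on the RHS corresponds to a 0.21-point increase on the RULM. This trend is shown in Figure 2. The data suggest that the RULM provides more information than the RHS in assessments where participants have relatively low RHS scores: for scores under 20 on the RHS, there is a range of 37 points on the RULM. Similarly, in those scoring under … on the RHS, there is a range of 25 points on the RULM, and in those scoring 0 on the …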
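Written out, the reported one-knot fit is: expected RULM = 10.33 + 0.74 × min(RHS, 22) + 0.21 × max(RHS − 22, 0). The short Python sketch below simply evaluates this formula; the function name is hypothetical. It describes the average trend only and, given the wide spread of RULM scores at low RHS totals, is not a per-patient prediction.

```python
KNOT = 22  # RHS total at which the slope changes, as reported in the text


def expected_rulm(rhs_total):
    """Average RULM score implied by the reported one-knot piecewise linear fit."""
    if not 0 <= rhs_total <= 69:
        raise ValueError("RHS total must lie between 0 and 69")
    below = min(rhs_total, KNOT)      # portion on the steeper segment (slope 0.74)
    above = max(rhs_total - KNOT, 0)  # portion on the shallower segment (slope 0.21)
    return 10.33 + 0.74 * below + 0.21 * above


for rhs in (0, 10, 22, 40, 69):
    print(rhs, round(expected_rulm(rhs), 2))
```

At the RHS maximum of 69, the formula gives about 36.5, just below the RULM maximum of 37, which is consistent with the flattening attributed to the RULM ceiling.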

Discussion
The RHS and HFMSE are two scales designed to capture motor function in SMA. Whereas the HFMSE captures the physical abilities of SMA type 2 and type 3 participants with limited or no ambulation, the RHS was developed to extend the range of functional abilities captured by the HFMSE at both the lower end (including the Adduction from Crook Lying item adapted from the CHOP-INTEND) and the upper end (including the Stand on One Leg, Hop, Climb Box Step, and Descend Box Step items, and the timed items adapted from the NSAA). Consequently, the RHS was designed to assess the physical abilities of participants ranging from very weak SMA type 2 participants who are no longer able to sit to very strong, ambulant participants with SMA type 3. However, longitudinal comparative analysis of these two scales is limited, and the correlation of the RHS with other functional scales capable of capturing upper limb function (the RULM) has not been reported so far.
In the non-sitters, our finding of no change at 12 and 24 months is in line with what has previously been reported with the HFMSE [20]. This remains a limitation for both scales, which can be addressed by the inclusion of additional assessments of upper limb function. Indeed, we found that the RHS is highly correlated with the RULM, similarly to the HFMSE [24]. Additionally, our results suggest that the RHS can be complemented by the RULM when considering the variation among weaker participants scoring under 20 on the RHS. In the non-sitters, who typically score between 1 and 3 on the RHS, RULM total scores ranged from 0 to 20, providing much more sensitivity to disease progression. We suggest that the RHS is performed alongside the RULM in these cases rather than using a gross motor scale such as the CHOP-INTEND [15]. As the CHOP-INTEND was developed for use in infants, items 15 and 16, which are performed in ventral suspension, are not appropriate for older patients, leading to incomplete assessments.
Our findings suggest that any improvement in RHS or HFMSE scores over time would represent a positive divergence from the natural history and could allow therapeutic response to be assessed even in this patient population. It is worth noting that the floor effect observed for the RHS occurs only in this subpopulation of non-sitters, and that the RHS exhibits less floor effect than the HFMSE, with two thirds of the participants who scored 0 on the HFMSE scoring 1 or 2 on the Adduction from Crook Lying item of the RHS. Our data therefore suggest that, by using the RHS, individuals who are otherwise indistinguishable on the HFMSE can be split into three groups based on their Adduction from Crook Lying item score.
In this study, we split the population that has been defined as sitters in previous studies with the HFMSE into two separate groups: one group included those patients whose maximum motor function was sitting (sitters), and another included those patients who were also able to crawl, stand with support, walk with support, or stand independently (transitional). Previous work from our group has shown that the transitional group have RHS scores significantly different from the sitters group [21]. In a previous study on the HFMSE, in which the group of sitters included both what we define here as sitters and the transitional group, Coratti et al., 2020 [20] found mean changes in HFMSE score of −0.83 at 12 months and −1.99 at 24 months. In our study, when we considered the sitters broken down by age group, we found a significant median change of 1.5 at 24 months in the under-5 age group, whereas there was a non-significant trend towards decline of −1 and −2 at 24 months in the 5-7 and 8-13 age groups, respectively. These findings suggest that an average gain in RHS or HFMSE score over time in sitters over the age of 5 would signify a treatment response in a treated patient population.
When considered in isolation, the newly defined transitional group displayed the greatest change over time. Although a smaller proportion of our participants are in this group, and despite its heterogeneity in terms of age, these participants display relatively homogeneous decline. We observed a trend towards decline in this group in all age groups beyond the age of 5, but none of the age-group sample sizes were large enough for significance testing. This transitional group is likely to be a population of interest for therapeutic research, as it would be the most likely to display a detectable treatment effect in the shortest timeframe.
Patients who scored between 10 and 18 on the RHS at baseline (Q3) represent the mid-to-strong sitters and the weaker transitional patients. In this group, we found increasing RHS scores in the under-5 age group, and a decline on both the RHS and HFMSE in the 8-13 age group that was significant at 12 and 18 months on both scales. In patients scoring between 19 and 42 on the RHS at baseline (Q4), who represent the strongest sitters, the transitional patients, and the weaker ambulant patients, we found improving scores in the under-5 age group, and a significant decline in the 8-13 age group on both the RHS and HFMSE, with a median 2-year change of −9 on the RHS and −6 on the HFMSE. In the SMA 3b group, we observed stability at 6, 12, 18, and 24 months, although the number of SMA 3b participants in this study was too small to consider change with respect to age group.
A ceiling effect was identified in <1% of RHS and HFMSE assessments (and <1% of participants) in the present study [25]. Both participants in this analysis who scored the maximum on the HFMSE also scored the maximum on the RHS. It was therefore not possible to compare the relative performance of the two scales in the very strongest SMA patients, as none were included in this sample. However, the RHS includes two timed tests that were designed to increase the sensitivity of the scale in the strongest patients. The relationship between the RHS total and the two timed tests was, on average, linear, and we found that the timed tests allowed us to discriminate between patients with similar RHS total scores. This was particularly true for the rise from floor time, for which the between-person variability was very high and the within-person variability was lower. This between-person variability likely reflects differences in the pattern of muscle involvement among individual participants affected by SMA, irrespective of subtype. Our findings therefore suggest that the rise from floor time can be used to differentiate patients who score similarly on the RHS, thus increasing the sensitivity of the scale in patients achieving a total score of over 40. Additionally, patients with an RHS total score above 41 were the most likely to perform the walk 10 m item, doing so in 70% of such assessments. Similarly, the rise from floor item was most likely to be performed in assessments where the participant had an RHS total score of at least 44. It is worth noting that in 7% and 19% of assessments in which walk 10 m and rise from floor times, respectively, were reported, the corresponding item score was 0, and these times were discarded. This high rate is potentially due to the newness of the RHS, although these times could still be informative, particularly for the rise from floor, where an item score of 0 corresponds to rising with the aid of furniture.
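To make the distinction between between-person and within-person variability concrete, the sketch below applies a simple descriptive decomposition to repeated rise from floor times. It takes the sample variance of participant means as the between-person component and the pooled variance of each participant's times around their own mean as the within-person component. This is an illustrative assumption, not necessarily the method used in the study, which is not described in this section. The times and the function name `variance_components` are invented, and zero times are dropped to mirror the discarding described above.

```python
from statistics import mean, variance

# Hypothetical rise from floor times in seconds, keyed by participant.
# Invented for illustration; 0.0 stands for an assessment whose item score was 0,
# which the text says was discarded before analysis.
times = {
    "p1": [3.0, 3.4, 3.2],
    "p2": [8.0, 8.6, 0.0],
    "p3": [15.0, 14.0, 16.0],
}


def variance_components(times_by_person):
    """Descriptive split into between-person and pooled within-person variance."""
    # Discard zero times, mirroring the handling described in the text.
    cleaned = {pid: [t for t in ts if t > 0] for pid, ts in times_by_person.items()}
    # Keep only participants with at least two valid times so that
    # within-person spread is defined.
    cleaned = {pid: ts for pid, ts in cleaned.items() if len(ts) >= 2}

    # Between-person: sample variance of the participant means.
    person_means = [mean(ts) for ts in cleaned.values()]
    between = variance(person_means)

    # Within-person: squared deviations from each participant's own mean,
    # pooled over participants with (n_i - 1) degrees of freedom each.
    ss_within = sum(sum((t - mean(ts)) ** 2 for t in ts) for ts in cleaned.values())
    df_within = sum(len(ts) - 1 for ts in cleaned.values())
    within = ss_within / df_within
    return between, within


between, within = variance_components(times)
print(f"between={between:.3f} within={within:.3f}")
```

With these invented values the between-person variance is far larger than the within-person variance, which is the pattern that allows a timed item to separate participants who share the same total score.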
In the SMA 3a and 3b participants, it would be important to understand the change in the timed tests by age group, but this was not possible with these data due to small sample sizes.
One limitation of this study is that, by splitting the participants into age groups rather than treating age as a continuous variable, the groups chosen may not fully represent the participants. The groups used here are nevertheless similar to those used in papers analysing the HFMSE, and they reflect the peak in motor function observed at 5 years in SMA 2 and at 7 years in SMA 3 [24]. To allow analysis of the change scores by motor function category and baseline RHS, the groups that have previously been used (i.e., <5, 5–13, and >13 years for SMA 2 [20,23,26,27]; and <5, 5–7, 8–14, and >14 years for SMA 3 [23,28]) were merged to create the following groups: <5, 5–7, 8–13, and 14–18 years. Only children were included in this analysis, but it will be important to analyse the natural history of the RHS in adults with SMA separately in the future [23]. Finally, as this population was selected from the full collected cohort based on completeness of the RHS (i.e., no missing items), we cannot exclude a bias towards a stronger population, because participants with items missing due to being unable to complete them may have been excluded from the analysis. The inclusion of missingness codes for each item in future prospective data collections, with options such as "not completed due to injury" vs. "not completed due to lack of time", could address this limitation.
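The merged age groups can be expressed as a small binning function. This is a sketch that assumes age is measured in completed years at baseline (so 7.9 years falls in the 5–7 group) and that only participants aged 0 to 18 years are included. The function name `age_group` is hypothetical.

```python
import math


def age_group(age_years):
    """Map age at baseline to the merged groups <5, 5-7, 8-13 and 14-18.

    Assumes age is counted in completed years, so a fractional age is
    floored first (e.g. 7.9 -> 7 -> "5-7").
    """
    completed = math.floor(age_years)
    if completed < 0 or completed > 18:
        raise ValueError("only paediatric ages (0-18 years) are covered")
    if completed < 5:
        return "<5"
    if completed <= 7:
        return "5-7"
    if completed <= 13:
        return "8-13"
    return "14-18"


for age in (3.2, 5.0, 7.9, 8.0, 13.5, 17.0):
    print(age, age_group(age))
```

Flooring to completed years keeps the four groups contiguous and non-overlapping, so every paediatric participant lands in exactly one group.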
Despite these limitations, this study represents the most comprehensive natural history dataset to contextualise disease progression in different subgroups of the SMA 2 and SMA 3 populations. This is particularly valuable because natural history data are becoming less available as treatments become increasingly available worldwide.

Conclusions
This study describes change over time in RHS scores across SMA type, age, ambulatory status, and newly described WHO-derived functional types. We showed that the RHS is more effective at differentiating non-sitter participants and has a reduced floor effect compared to the HFMSE. In addition, our findings confirm that the RULM should be used in conjunction with the RHS in the weakest patients; in particular, the RULM score should be used to differentiate participants scoring less than 20 on the RHS. We also showed that change in the RHS over time is as sensitive as change in the HFMSE, and that both scales can detect change in certain SMA type, age, and motor function subgroups. However, the timed tests included in the RHS give it an advantage over the HFMSE in discriminating between patients with similar RHS total scores. This information is important for improving clinical trial design, informing future clinical guidelines for patients, and interpreting the results of medical interventions in SMA types 2 and 3.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm12051920/s1, Table S1: Up to 2-year mean change in the RHS and HFMSE broken down by SMA type, age, and motor function group; Table S2: Up to 2-year mean change in the RHS and HFMSE cross-tabulated by SMA type and age; Table S3: Up to 2-year median change in the RHS and HFMSE cross-tabulated by motor function and age; Table S4: Up to 2-year median change in the RHS and HFMSE cross-tabulated by baseline RHS group and age; Table S5: Up to 2-year median change in the RHS and HFMSE cross-tabulated by baseline RHS group and age; Table S6: Count of patients scoring 0 on the RHS and the HFMSE.

Conflicts of Interest:
has served as a consultant and as a speaker in sponsored symposiums for Biogen, and has received personal fees from AveXis. A.W., E.Milev, G.S., M.C., M.M., and Z.Z.C. have no conflicting interests. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.