Baseline Variability Affects N-of-1 Intervention Effect: Simulation and Field Studies

The simulation study investigated the relationship between the local linear trend model’s data-comparison accuracy, baseline-data variability, and changes in level and slope after introducing the N-of-1 intervention. Contour maps were constructed from baseline-data variability, change in level or slope, and the percentage of non-overlapping data between the state and forecast values estimated by the local linear trend model. Simulation results showed that baseline-data variability and changes in level and slope after intervention affect the data-comparison accuracy of the local linear trend model. The field study re-examined the intervention effects in actual field data using the local linear trend model, which confirmed the effectiveness of the interventions in 100% of the previous N-of-1 studies. These results imply that baseline-data variability affects the data-comparison accuracy of the local linear trend model, which could accurately predict the intervention effects. The local linear trend model may help assess the effects of personalized interventions in precision rehabilitation.


Introduction
An N-of-1 research design can be used to examine the effectiveness of rehabilitation interventions for a single patient. This research design involves multiple measurements over time in one patient and is advantageous for assessing behavioral and functional changes in individual patients in various rehabilitation settings [1][2][3]. Two major types of N-of-1 research designs are the reversal design (Figure 1A) and the multiple-baseline design (Figure 1B) [4][5][6][7]. A reversal design runs consecutive sessions that alternate between control and treatment conditions, whereas a multiple-baseline design runs consecutive sessions in which each control condition lasts for a different number of data points.
Such designs expose the patient to both control (baseline phase) and treatment (intervention phase) conditions, thus comparing individuals' behavior and function between the baseline and intervention phases to discover whether the behavior and function in the control condition are changed after treatment initiation [8][9][10]. A previous study noted that a meta-analysis of randomized N-of-1 research designs represented the highest level of evidence in clinical practice [11].
Figure 1. N-of-1 research designs: (A) reversal design; (B) multiple-baseline design. In the reversal design, the researcher monitors a behavior by alternating control with treatment conditions. In the multiple-baseline design, the researcher monitors several behaviors by using a different number of control data points under the same treatment.
Once the N-of-1 data in the baseline and intervention phases are obtained, they are conventionally compared between the phases using various analysis methods, such as the percentage of non-overlapping data assuming a stable slope (PND-STB), the split-middle line, the autoregressive integrated moving average model (ARIMA), and Tau-U [12][13][14][15]. However, these conventional analysis methods are prone to over- and under-estimation because they assume a linearly stable slope of the data changes, whereas a patient's small and non-normal data may change nonlinearly or unstably over time [16,17]. Recently introduced Bayesian estimation approaches, such as the local linear trend model (LLT), are expected to provide a new, useful analysis method because they can accommodate unstable changes in both the level and the slope of the data [18,19]. Therefore, this model can compare N-of-1 data with time-course changes in the level and slope between the baseline and intervention phases.
Although the LLT is useful for the analysis of small and non-normal N-of-1 data with time-course changes in level and slope, little is known about the accuracy of comparison between baseline and intervention phases for the LLT. A previous study noted that the quality of data in the baseline phase influences the data comparison between baseline and intervention phases: first, baseline data should be less variable and highly stable to accurately predict future data; second, baseline data should lack any trend to allow for accurate evaluation of the change in slope and level of the data after the introduction of the treatment [20,21]. Based on previous studies on the quality of baseline data, we hypothesized that the accuracy of data comparison between baseline and intervention phases may be affected by the variability of baseline data and changes in slope and level after treatment initiation. Correspondingly, the LLT may accurately predict the effectiveness of actual field data. Exploring how the accuracy of data comparison between baseline and intervention phases is affected by baseline-data variability and changes in level and slope after treatment using the LLT may contribute to our understanding of the N-of-1 research design to examine the effectiveness of rehabilitation intervention for a single patient.
To address these issues, the present study aimed (1) to compare data-comparison accuracy between the LLT and conventional PND-STB methods according to baseline-data variability and compare changes in level and slope of the baseline data after introducing the N-of-1 intervention, (2) to explore the range of baseline-data variability and changes in level and slope for acceptable data-comparison accuracy using a simulation study, and (3) to examine the effect of intervention using the LLT and PND-STB in an actual N-of-1 field study.

Simulation Study
Our study was approved by the Research Ethics Committee of the Tokyo Kasei University (SKE2021-12) and followed the guidelines of the Declaration of Helsinki. We predicted that individual data, such as behavior and function, would randomly fluctuate and that the level and slope of the data would change after commencing treatment. Therefore, to generate the individual data, a simple state-changing model was constructed, which included the level as immediate effects, the slope as retardative effects, and random fluctuation as the data-variability process, as follows:

$$y_t = \alpha + \beta t + \varepsilon_t \qquad (1)$$

where α is the y-intercept of the data, reflecting the level of the data as immediate effects; β is the slope of the data, reflecting changes in the data as retardative effects; ε_t is the random variation, reflecting the fluctuation of the data as data variability; and t is the number of trials during the baseline and intervention phases. Figures 2 and 3 show the representative simulation data of the change in the level and slope, respectively, derived from Equation (1). ε_t deviated from the value of the slope trajectory by 0.0 to ±1.2 (0.2 steps) for the median value of the baseline data. α of the baseline phase was 1.0, whereas α of the intervention phase multiplied the level of the end of the baseline data by 1.0 to 1.6 (0.1 steps), reflecting the change in level. β of the baseline phase was 0.1, whereas β of the intervention phase multiplied the slope of the baseline phase by 1.0 to 2.2 (0.2 steps), reflecting the change in slope. t was set from 0 to 4 in the baseline phase and from 5 to 14 in the intervention phase in accordance with the number of trials in previous N-of-1 research [2,3,7,10,15]. Equations were constructed using Python 3.9.7 (Python Software Foundation, Wilmington, DE, USA).

Subsequently, the LLT was applied to the simulated data. The LLT assumes that both the level (Equation (3)) and the slope (Equation (4)) underlying the simulation data (Equation (2)) follow Gaussian random walks. An LLT was constructed for the simulated data as follows:

$$y_t = \mu_t + \varepsilon_t \qquad (2)$$

$$\mu_{t+1} = \mu_t + \nu_t + \xi_t \qquad (3)$$

$$\nu_{t+1} = \nu_t + \zeta_t \qquad (4)$$

where y_t is the simulation data, ε_t is the random variable, μ_t is the level, ν_t is the slope, ξ_t is the disturbance at the level, and ζ_t is the disturbance at the slope [18,19].
The LLT estimated the state value with random variables (ε_t) in the baseline and intervention phases (state value) and forecasted the value by extrapolating the baseline value to the intervention phase using the level (μ_t) and slope (ν_t) (forecast value). After calculating state and forecast values, the values in the intervention phase were compared using the percentage of non-overlapping data between state and forecast values (PND-LLT) and PND-STB, as follows:

$$\mathrm{PND} = \frac{g}{n} \times 100 \qquad (5)$$

where g is the number of state values greater than forecast values (PND-LLT) or the number of raw data in the intervention phase greater than the maximal data in the baseline phase (PND-STB), and n is the number of sessions in the intervention phase. In current practice, PND values in the range of 90-100 are accepted as "very effective", those in the range of 70-90 as "effective", those in the range of 50-70 as "questionable", and those below 50 as "ineffective" [22]. The generation and comparison of simulated data were repeated 100 times, and the PND-LLT and conventional PND-STB were calculated in each step of random error and changes in level and slope. The LLT was conducted using the statsmodels package in the Python environment.
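To make the simulation procedure concrete, the following minimal Python sketch generates one series from Equation (1) and computes the PND-STB from Equation (5). It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, ε_t is assumed to be uniformly distributed within ± the noise step multiplied by the median of the noise-free baseline, and the intervention phase is assumed to restart from the last noise-free baseline value scaled by the level factor.

```python
import random
from statistics import mean, median

BASELINE_T = range(0, 5)       # t = 0..4, as in the text
INTERVENTION_T = range(5, 15)  # t = 5..14, as in the text


def simulate_series(level_factor=1.0, slope_factor=1.0, noise=0.0,
                    alpha=1.0, beta=0.1, rng=None):
    """Return (baseline, intervention) following y_t = alpha + beta*t + eps_t.

    Assumptions (not stated in the source): eps_t is uniform on
    [-noise*m, +noise*m], where m is the median of the noise-free baseline,
    and the intervention phase starts from the last noise-free baseline
    value times level_factor, with slope beta * slope_factor.
    """
    if rng is None:
        rng = random.Random()
    clean_baseline = [alpha + beta * t for t in BASELINE_T]
    half_width = noise * median(clean_baseline)
    end_level = clean_baseline[-1]
    last_t = BASELINE_T[-1]

    def eps():
        return rng.uniform(-half_width, half_width)

    baseline = [y + eps() for y in clean_baseline]
    intervention = [
        level_factor * end_level + slope_factor * beta * (t - last_t) + eps()
        for t in INTERVENTION_T
    ]
    return baseline, intervention


def pnd(g, n):
    """Equation (5): percentage of non-overlapping data."""
    return 100.0 * g / n


def pnd_stb(baseline, intervention):
    """PND-STB: share of intervention points above the baseline maximum."""
    ceiling = max(baseline)
    g = sum(1 for y in intervention if y > ceiling)
    return pnd(g, len(intervention))


def mean_pnd_stb(level_factor, slope_factor, noise, repeats=100, seed=0):
    """Average PND-STB over repeated simulations (100 repeats, as in the text)."""
    rng = random.Random(seed)
    return mean(
        pnd_stb(*simulate_series(level_factor, slope_factor, noise, rng=rng))
        for _ in range(repeats)
    )


if __name__ == "__main__":
    # No simulated intervention effect (both factors 1.0) at several noise steps.
    for noise in (0.0, 0.4, 0.8, 1.2):
        print(noise, mean_pnd_stb(level_factor=1.0, slope_factor=1.0, noise=noise))
```

Under these assumptions, a noise-free series with both factors set to 1.0 yields a PND-STB of 100, because the baseline already trends upward and every later point exceeds the baseline maximum. This illustrates why a trending baseline can inflate the PND-STB even when no intervention effect is present.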
Next, we predicted that the variability of baseline data (ε_t of Equation (1)) and changes in the level (α of Equation (1)) and slope (β of Equation (1)) after introducing treatment would affect the data-comparison accuracy between the state and forecast values because the variability of baseline data with trend influences the data comparison [20,21]. Therefore, contour maps were constructed, which included baseline-data variability, change in level or slope, and the PND-LLT or conventional PND-STB.
In addition, the PND for the ε_t values of 0.0 ± 0.0, reflecting true value without random variation; the PND for 1.0 changes in the level and slope, reflecting no-intervention effects; and the PND for 1.6 changes in the level and slope, reflecting maximal intervention effects, were extracted from the PND-LLT and PND-STB.
Furthermore, to contribute to the clinical indices, the coefficient of variation (CV) values for ε_t values of 0.0-1.2 were calculated as well as the range of baseline-data variability (i.e., CV values) and changes in level and slope for acceptable data-comparison accuracy (i.e., PND value ≥ 70).
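As an illustration of the baseline-CV calculation, the sketch below computes the coefficient of variation as the standard deviation divided by the mean. The source does not state whether the sample or the population standard deviation was used; the sample standard deviation is assumed here, and the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


if __name__ == "__main__":
    # Hypothetical baseline scores from five sessions.
    baseline_scores = [12, 15, 11, 14, 13]
    print(round(coefficient_of_variation(baseline_scores), 3))
```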

Field Study
In addition, field testing involved applying PND-LLT and PND-STB to 17 N-of-1 trial datasets from the behavioral rehabilitation database, based on published Japanese articles [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39] (Table 1). Eligibility criteria for field studies included an N-of-1 research design, rehabilitation training, and a period of more than 2 and 4 days in baseline and intervention phases, respectively. All participants provided informed consent prior to participation in each published article. State and forecast values were calculated using the LLT, and the values in the intervention phase were compared using the PND-LLT and the PND-STB (Equation (5)). In the present study, when the PND value was ≥ 70, the intervention was considered effective.
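The comparison step described above can be sketched as follows. The example assumes that the level and slope at the end of the baseline phase and the state values in the intervention phase have already been estimated by a fitted LLT (for instance, with the statsmodels `UnobservedComponents` class and its "local linear trend" specification; the source names only the statsmodels package, so this choice is an assumption). The forecast is the standard LLT extrapolation of level plus h times slope; the function names, the input numbers, and the handling of band boundaries are hypothetical.

```python
def llt_forecast(level, slope, steps):
    """Extrapolate the baseline-end level and slope h = 1..steps sessions ahead.

    With zero-mean disturbances, the expected LLT forecast is level + h * slope.
    """
    return [level + h * slope for h in range(1, steps + 1)]


def pnd_llt(state_values, forecast_values):
    """Equation (5) with g = number of state values above the forecast values."""
    if len(state_values) != len(forecast_values):
        raise ValueError("state and forecast series must have equal length")
    g = sum(1 for s, f in zip(state_values, forecast_values) if s > f)
    return 100.0 * g / len(state_values)


def classify_pnd(pnd):
    """Map a PND value to the bands cited in the text [22].

    Boundary handling (e.g., exactly 90 counts as "very effective") is an
    assumption; the text gives only the ranges.
    """
    if pnd >= 90:
        return "very effective"
    if pnd >= 70:
        return "effective"
    if pnd >= 50:
        return "questionable"
    return "ineffective"


if __name__ == "__main__":
    # Hypothetical estimates: level 1.4 and slope 0.1 at the end of baseline.
    forecast = llt_forecast(level=1.4, slope=0.1, steps=10)
    # Hypothetical state values for the ten intervention sessions.
    state = [1.4, 1.7, 1.9, 2.0, 2.2, 2.3, 2.5, 2.6, 2.8, 2.9]
    value = pnd_llt(state, forecast)
    print(value, classify_pnd(value), value >= 70)
```

The final comparison `value >= 70` mirrors the decision rule used in the present study, under which an intervention was considered effective when the PND value was ≥ 70.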

Field Study
The effectiveness of intervention in each of the 17 N-of-1 studies was re-examined using the PND-LLT and the PND-STB. Participants in our field study were patients over the age of 40 years with various diseases, such as brain injury, stroke, dementia, and cervical myelopathy. In addition, the field study included various target behaviors, such as dressing, using chopsticks, and walking, and various interventions, such as token economy, shaping, and reinforcement. Figure 7 presents the results of the field study for the actual N-of-1 trial datasets. The PND-LLT values were ≥70 for all 17 (100%) studies, whereas the PND-STB values were ≥70 for 15 of the 17 (88%) studies. Additionally, for 8 of the 17 (47%) studies, the PND-LLT values were higher than the PND-STB values.


Discussion
This study aimed to test the hypothesis that the accuracy of data comparison between baseline and intervention phases is affected by the variability of baseline data and changes in slope and level after treatment initiation, and that the LLT can accurately predict the effectiveness of actual field data. Our computational simulation results showed that baseline-data variability and changes in the level and slope after the intervention affect the accuracy of data comparison based on the PND-LLT. In addition, according to our field results, the PND-LLT confirmed the effectiveness of 100% of previous N-of-1 studies in actual field intervention.
The first novel observation of our study is that the baseline-data variability and changes in the level and slope affect the accuracy of data comparison based on the PND-LLT. Once the N-of-1 data in the baseline and intervention phases were obtained, they were conventionally compared between the phases using the PND-STB [12,40,41]. In the PND-STB, the percentage of raw data in the intervention phase greater than the maximal data in the baseline phase was calculated [12,40,41]. Although conventional analysis methods, such as the PND-STB, are useful for N-of-1 data analysis, they assume linearly stable data and cannot accommodate a change in the slope of the data. Therefore, these analyses cannot precisely predict the current and future data. In our study, for the PND-LLT, the PND values between the state and forecast values increased with increasing baseline variability when the change in level or slope was 1.0 (i.e., no intervention effect), and they also increased as the change in level or slope rose above 1.0. In contrast, for the PND-STB, the PND values were high regardless of baseline variability when the changes in level and slope were 1.0. These simulation results indicate that baseline-data variability is important for understanding the effectiveness of an intervention and that the PND-LLT could predict the intervention effects more accurately than the conventional PND-STB because it produced fewer type I errors. These observations may help understand and promote personalized interventions based on an N-of-1 research design from the perspective of each individual's precision rehabilitation.
We calculated the CV values in the baseline phase and explored the range of baseline-CV values and changes in level and slope for acceptable data-comparison accuracy. For the PND-LLT, the limits of baseline-CV values for detectable changes in level were 0.13 ± 0.00, 0.17 ± 0.01, and 0.26 ± 0.01 for changes in level of 1.1, 1.3, and 1.5, respectively. Likewise, the limits of baseline-CV values for detectable changes in slope were 0.13 ± 0.00 and 0.17 ± 0.01 for changes in slope of 1.2 and 2.0, respectively. However, with baseline-CV values of ≥0.36 ± 0.01 or ≥0.26 ± 0.01, changes in level of 1.1-1.6 or changes in slope of 1.2-2.2, respectively, could not be detected. Our results suggest that the changes in level and slope needed to detect an intervention effect can be estimated by assessing baseline-CV values. Therefore, the ranges of baseline-CV values and changes in level and slope for acceptable data-comparison accuracy may contribute to an increasingly evidence-based personalized approach in clinical rehabilitation settings. Future research should address the detection of changes in level and slope when baseline-CV values are ≥0.36 ± 0.01 or ≥0.26 ± 0.01.
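The reported baseline-CV limits could be used as a simple lookup when planning an N-of-1 assessment, as in the sketch below. It assumes that each reported value acts as an upper bound on the baseline CV for the corresponding change to be detectable; this reading, the dictionary names, and the function name are assumptions, and only the mean limits quoted above are encoded (the ± spread is omitted).

```python
# Mean baseline-CV limits quoted in the text for the PND-LLT.
# Keys are the change factors; values are the reported CV limits.
LEVEL_CV_LIMITS = {1.1: 0.13, 1.3: 0.17, 1.5: 0.26}
SLOPE_CV_LIMITS = {1.2: 0.13, 2.0: 0.17}


def within_reported_limit(baseline_cv, change, limits):
    """Check a baseline CV against the reported limit for a given change.

    Returns True or False for tabulated changes and None when the text
    reports no limit for that change. Treating the limit as an upper bound
    is an assumption of this sketch.
    """
    limit = limits.get(change)
    if limit is None:
        return None
    return baseline_cv <= limit


if __name__ == "__main__":
    # Hypothetical baseline CV of 0.20 checked against two level changes.
    print(within_reported_limit(0.20, 1.3, LEVEL_CV_LIMITS))
    print(within_reported_limit(0.20, 1.5, LEVEL_CV_LIMITS))
```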
The second novel finding of our study is that the PND-LLT classified the interventions in all 17 (100%) field studies as effective. Moreover, in 8 of 17 (47%) studies, the PND-LLT values were higher than the PND-STB values. These field results indicate that the PND-LLT could predict the intervention effects more sensitively than the PND-STB. In the original studies, the N-of-1 data were conventionally compared between the baseline and intervention phases using not only the PND-STB but also visual inspection, split-middle line, ARIMA, or Tau-U [12][13][14][15]42]. In visual inspection, the trend and level of N-of-1 data are visually compared between phases without mathematical operations [42]. ARIMA can assess the difference between current and previous values with autoregression and can forecast future values [43]. Furthermore, Tau-U can consider the trend of the baseline and intervention phases and compare the data between the two [15]. These conventional analysis methods assume a linearly stable slope of the data, similar to the PND-STB. However, we only focused on the data-comparison accuracy between the PND-LLT and the conventional PND-STB. Therefore, future studies are needed to clarify the data-comparison accuracy for other analysis methods, including PND-LLT, the split-middle line, ARIMA, and Tau-U. When performing a rehabilitation intervention, therapists assess the effectiveness of the treatment using an N-of-1 paradigm and modify the treatment according to the analysis results based on N-of-1 research. These objective assessments may minimize intervention bias, such as continuation of less effective interventions. The PND-LLT applied to actual field data may offer advantages in future personalized interventions from the perspective of precision rehabilitation, as analysis reaching scientific significance may help clinicians to determine whether to continue the intervention.
Our study had several limitations. First, the number of trials in the baseline and intervention phases was set to 5 and 10, respectively, based on the number of sessions in previous N-of-1 research [2,3,7,10,15]. Our simulation results could not be applied to much longer or shorter trial periods, although a small number of trials (i.e., 5 to 10) in the baseline and intervention phases could be applied to broad rehabilitation interventions. Therefore, further research is needed to explore the relationship between the data-comparison accuracy of the PND-LLT, baseline-data variability, and changes in the slope and level after intervention using a simulation study with various trial periods. Second, our field study used non-probability samples. This sampling method contains some bias compared to random selection, because the N-of-1 trials do not have an equal chance of being selected under the specific inclusion criteria. Thus, larger probability samples are needed in future studies to enable us to readily assume that the sample represents a broad population in the case of rehabilitation interventions.

Conclusions
In conclusion, through a simulation study, we found that baseline-data variability and changes in level and slope after N-of-1 intervention affect the accuracy of data comparison based on the PND-LLT. In addition, the PND-LLT could accurately predict the intervention effects for actual data in a field analysis. Therefore, the PND-LLT may serve as a sensitive and meaningful analysis method in personalized clinical practice that uses an N-of-1 research design. However, with large baseline-data variability (CV values of ≥0.36 ± 0.01 or ≥0.26 ± 0.01), the tested changes in level or slope, respectively, could not be detected. These findings contribute toward an increasingly evidence-based approach for the personalized rehabilitation of individual patients.

Informed Consent Statement:
In the field study, informed consent was obtained from all participants involved in each published study.

Data Availability Statement:
Derived data supporting the findings of this study are available from the corresponding author, M.S.