Usefulness of Implementation Outcome Scales for Digital Mental Health (iOSDMH): Experiences from Six Randomized Controlled Trials

Objectives: Measuring implementation outcomes for digital mental health interventions is essential for examining the effective delivery of these interventions. The “Implementation Outcome Scale of Digital Mental Health” (iOSDMH) has been validated and used in several trials. This study aimed to compare iOSDMH scores of participants in six randomized controlled trials (RCTs) involving web-based interventions and to discuss the implications of the iOSDMH for improving the interventions. Additionally, this study examined the associations between iOSDMH scores and program completion rate (adherence). Methods: Variations in total scores and subscales of the iOSDMH were compared in six RCTs of digital mental health interventions conducted in Japan. The web-based intervention programs were based on cognitive behavioral therapy (2 programs), behavioral activation (1 program), acceptance and commitment therapy (1 program), a combination of mindfulness, behavioral activation, and physical activity (1 program), and government guidelines for suicide prevention (1 program). Participants were full-time employees (2 programs), perinatal women (2 programs), working mothers with children (1 program), and students (1 program). The total score and subscale scores were tested using analysis of variance for between-group differences. Results: Total score and subscale scores of the iOSDMH among six trials showed a significant group difference, reflecting users’ perceptions of how each program was implemented, including aspects such as acceptability, appropriateness, feasibility, overall satisfaction, and harm. Subscale scores showed positive associations with completion rate, especially in terms of acceptability and satisfaction (R-squared = 0.93 and 0.89, respectively). Conclusions: The iOSDMH may be a useful tool for evaluating participants’ perceptions of features implemented in web-based interventions, which could contribute to improvements and further development of the intervention.


Introduction
Digital mental health interventions have rapidly become available worldwide, with recent studies [1][2][3][4] finding they effectively prevent and improve various mental health outcomes. Although digital mental health interventions can be a key solution to the shortage of mental health care providers and the stigma of medical visits for mental health problems, these interventions face challenges related to insufficient implementation, including low program adherence and high attrition rates.
We previously developed and validated the Implementation Outcome Scales of Digital Mental Health (iOSDMH) for users (i.e., people who use the program or patients) to evaluate implementation aspects of mental health interventions delivered via digital and telecommunication technologies such as internet websites, videos, apps, and e-mails [5]. Although other measurement tools exist (e.g., the System Usability Scale [6]), there were few scales that comprehensively measured implementation outcomes with a focus on digital mental health. We thus developed the iOSDMH based on Proctor's conceptual framework of implementation outcomes [7,8], which comprehensively reflects the existing literature, and on related research [9][10][11][12], in order to assess indicators of implementation success, implementation processes, and intermediate outcomes linked to effectiveness or quality outcomes.
The iOSDMH for users includes 19 items covering acceptability, appropriateness, feasibility, satisfaction, and harm. However, it remains unclear which implementation items are more predictive of completion rate or participant attitude than others. A previous literature review indicated a positive association between treatment satisfaction and adherence, compliance, or persistence [13]. This association was partially explained by the theory of reasoned action (TRA) developed by Martin Fishbein and Icek Ajzen (1975, 1980), in which a positive preexisting attitude and subjective norms promote behavioral intentions [14]. Therefore, users' beliefs and values about the impact of receiving an intervention, which influence the evaluation of satisfaction, might be important in achieving high completion in digital mental health. Although satisfaction and other implementation outcomes can be influenced by many aspects of an intervention (e.g., efficacy, side effects, communication with health care providers, personal treatment history), improving these outcomes might increase intervention effectiveness if doing so indeed increases program completion rates. However, to our knowledge, there is no available evidence showing an association between implementation outcomes and completion rate in digital mental health interventions. Nor is it known how to utilize profile patterns of the iOSDMH to improve the completion rate of future web-based interventions. We therefore analyzed iOSDMH responses from intervention group participants in six randomized controlled trials (RCTs) of digital mental health interventions conducted in Japan. Our aim was two-fold: (1) to investigate the usefulness of iOSDMH total scores or subscale scores in differentiating implementation aspects of each intervention; and (2) to determine their association with completion rates.
We further discussed the interpretation and variations of iOSDMH scores and how such findings can improve program contents or intervention methods for future investigation.

Study Design
We compared variations of implementation outcomes in six RCTs of digital mental health interventions (with their scales). Table 1 presents the study characteristics and completion rates of the RCTs. These six trials were registered and/or the protocols were published elsewhere [15][16][17][18][19][20]. Table 1 also presents the time points at which the iOSDMH was measured in each study. All study procedures were approved by the Research Ethics Review Board of the Graduate School of Medicine, University of Tokyo, and each study utilized the same scale (iOSDMH) to measure implementation outcomes. Study 1 was an app-based self-help Cognitive Behavioral Therapy (CBT) intervention for pregnant women [18]. Study 2 was a web-based self-help acceptance and commitment therapy (ACT) intervention with a writing exercise for working mothers with a small child [20]. Study 3 was a machine-guided self-help CBT intervention with a writing exercise for workers [17]. Study 4 was a psychoeducational intervention using a website for workers [16]. Study 5 was a video-based gatekeeper program for students to prevent suicide of their peers. Study 6 was a web-based behavioral activation therapy (BA) intervention with a writing exercise for postnatal women [19].

Implementation Outcome Scales for Digital Mental Health (iOSDMH)
The iOSDMH has several distinct versions for users, providers, and managers. This study utilized the users' version, which comprises two parts: (1) evaluations (14 items) and (2) adverse events of using digital mental health programs (5 items). Each item was scored on a 4-point Likert-type scale ranging from 1 (disagree) to 4 (agree). Responses of "relatively agree" and "agree" were interpreted as the item being implemented (preferable). The subscales, with their number of items and possible score ranges, were as follows: acceptability (3 items; 3-12), appropriateness (4 items; 4-16), feasibility (6 items; 6-24), harm (5 items; 5-20), and satisfaction (1 item; 1-4). Subscale scores were calculated by summing the item scores. The total score comprises the 14 evaluation items (range 14-56) and excludes the harm items. The original development paper [5] calculated the total score by summing all 19 items; we changed this so that a high total score on the 14 evaluation items signifies good implementation, whereas a high score on the 5 harm items signifies less favorable implementation. Item 9 was reverse-scored before summing. Including reverse-scored items enhances scale validity because it prompts respondents to attend more carefully to the specific content of individual items [21].
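The scoring rules described above can be illustrated with a minimal sketch in Python. It is not the authors' scoring code. Item numbers 1 to 13 follow the items quoted in this article, whereas placing satisfaction at item 14 and harm at items 15 to 19 is an assumption made for the example, as are the function and variable names.

```python
# Illustrative scoring of the iOSDMH users' version (not the authors' code).
# Item numbers 1-13 follow the items quoted in the text; placing satisfaction
# at item 14 and harm at items 15-19 is an assumption made for this sketch.
SUBSCALES = {
    "acceptability": range(1, 4),    # 3 items, score range 3-12
    "appropriateness": range(4, 8),  # 4 items, score range 4-16
    "feasibility": range(8, 14),     # 6 items, score range 6-24
    "satisfaction": range(14, 15),   # 1 item, score range 1-4
    "harm": range(15, 20),           # 5 items, score range 5-20
}
REVERSED_ITEMS = {9}         # item 9 is reverse-scored before summing
MIN_SCORE, MAX_SCORE = 1, 4  # 4-point Likert-type scale


def score_iosdmh(responses):
    """Return subscale scores and the 14-item total for one respondent.

    `responses` maps item number (1-19) to the raw answer (1-4).
    """
    if set(responses) != set(range(1, 20)):
        raise ValueError("expected answers for items 1-19")
    adjusted = {}
    for item, value in responses.items():
        if not MIN_SCORE <= value <= MAX_SCORE:
            raise ValueError(f"item {item}: answer {value} is outside 1-4")
        if item in REVERSED_ITEMS:
            # Reverse scoring on a 1-4 scale maps 1<->4 and 2<->3.
            value = MIN_SCORE + MAX_SCORE - value
        adjusted[item] = value
    scores = {
        name: sum(adjusted[i] for i in items)
        for name, items in SUBSCALES.items()
    }
    # The total excludes the harm items, as in this study.
    scores["total"] = sum(v for name, v in scores.items() if name != "harm")
    return scores


if __name__ == "__main__":
    example = {item: 3 for item in range(1, 20)}  # hypothetical respondent
    print(score_iosdmh(example))
```

Keeping the harm subscale out of the total means that a higher total always points toward better implementation, while harm is read separately in the opposite direction.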

Details of the Intervention Studies
The study characteristics were collected as descriptive data: research design (target population, total number of study participants, recruitment method, primary aim of the intervention, primary outcome), intervention details (intervention type, basal theory of intervention, number of sessions, learning time per session, intervention duration, content type, homework/exercises, availability, interactions with professionals/other participants), facilitations and functions (timing of new module reminder, additional reminders for non-learners, participants' reward, reward for questionnaire completion), and findings and presentations (program completion rate, respondents' characteristics for the iOSDMH, timing of iOSDMH measurement).

Statistical Analysis
We determined whether iOSDMH scores can differentiate implementation outcomes among the studies by testing group differences in the proportion of positive evaluations ("implemented") for each item with a chi-square test. We tested total scores and subscale scores for group differences using analysis of variance. The association between implementation subscales and completion rate was assessed by calculating the R-squared value using Microsoft Excel (Microsoft, Redmond, WA, USA). Statistical significance for all analyses was set at 0.05 (two-tailed), and 95% confidence intervals were calculated. All other statistical analyses were performed using the Japanese version of SPSS 28.0 (IBM Corp., Armonk, NY, USA).

Results
Table 2 shows the descriptive scores and response rates indicating users' positive evaluations. All items demonstrated significant group differences (item 15, p = 0.029; others, p < 0.001).
For acceptability, Study 5 demonstrated the highest score for all items among the six trials. Study 5 was an online student peer gatekeeper program that provided basic knowledge about suicide prevention via YouTube video for students willing to become peer supporters of suicide prevention. Study 3 showed the second highest scores for all items in acceptability. Studies 1, 2, and 6 were e-learning programs targeting a specific population. These studies presented similar profile patterns for the three items in acceptability: (1) positive evaluation rates between 68.9% and 79.7% for item 1 (advantages outweigh the disadvantages for keeping my mental health) and item 3 (acceptable for me); and (2) low positive evaluation rates (between 30.0% and 51.8%) for item 2 (improves my social image). Study 4 was an e-learning program for full-time workers that encouraged participants to read webpages of interest. This study presented intermediate scores (positive evaluation rates between 67.2% and 77.2%) for items 1 (advantages outweigh the disadvantages for keeping my mental health) and 3 (acceptable for me) and a low evaluation score (44.7% positive evaluation rate) for item 2 (improves my social image).
For appropriateness, Study 5 demonstrated the highest evaluation scores for three of four items among all the programs: item 4 (appropriate [from your perspective, it is the right thing to do]), item 6 (suitable for my social condition), and item 7 (fits my living condition). The studies on prenatal (Study 1) and postnatal (Study 6) experiences demonstrated a positive evaluation rate above 80% in appropriateness-related items, especially for item 5 (applicable to my health status). Study 2 presented high user evaluations (77.6%) for item 4 (appropriate [from your perspective, it is the right thing to do]) and moderate user evaluations for other items. Study 3 also presented high evaluations (84.6%) for item 4 (appropriate [from your perspective, it is the right thing to do]) and moderate evaluations for other items. Study 4 demonstrated a low evaluation for item 5 (applicable to my health status) and item 7 (fits my living condition), with a positive evaluation rate of 50.2% and 48.9%, respectively.
For feasibility, Study 1, Study 3, and Study 5 demonstrated higher user evaluations for all items compared with other programs. Study 2 reported moderate to high feasibility (positive evaluation rates between 60.2% and 77.2%) except for item 9 (physical effort; 32.0%). Study 4 showed moderate user evaluations (positive evaluation rates between 57.4% and 67.1%). Study 6, which required the longest time per session and the longest total time to complete the whole program, reported low scores for item 8 (easy to use), item 9 (physical effort), item 10 (total length is implementable), item 11 (length of one content is implementable), and item 13 (easy to understand), with positive evaluation rates of 58.6%, 34.5%, 41.3%, 48.3%, and 41.4%, respectively.
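The positive evaluation rates reported here are proportions of respondents who answered "relatively agree" or "agree", compared across the six studies with a chi-square test (see Statistical Analysis). As a minimal sketch of that comparison, the Python code below computes the Pearson chi-square statistic for one item across six groups. The counts are hypothetical and are not data from the trials, the function name is illustrative, and the published analyses were run in SPSS rather than with this code. The value 11.070 is the standard chi-square critical value for 5 degrees of freedom at the 0.05 level.

```python
# Chi-square critical value for df = 5 at alpha = 0.05 (six groups, two categories).
CRITICAL_VALUE_DF5_ALPHA05 = 11.070


def chi_square_for_proportions(positive_counts, group_sizes):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Rows are positive vs. non-positive evaluations; columns are the k groups.
    """
    if len(positive_counts) != len(group_sizes) or len(group_sizes) < 2:
        raise ValueError("need one positive count per group and at least two groups")
    total_n = sum(group_sizes)
    pooled_rate = sum(positive_counts) / total_n
    if pooled_rate in (0.0, 1.0):
        raise ValueError("statistic is undefined when all answers fall in one category")
    statistic = 0.0
    for positive, n in zip(positive_counts, group_sizes):
        # Expected counts if every group shared the pooled positive rate.
        expected_positive = n * pooled_rate
        expected_negative = n * (1.0 - pooled_rate)
        statistic += (positive - expected_positive) ** 2 / expected_positive
        statistic += ((n - positive) - expected_negative) ** 2 / expected_negative
    return statistic, len(group_sizes) - 1


if __name__ == "__main__":
    # Hypothetical counts for one item in six studies (not trial data).
    positives = [80, 60, 85, 55, 95, 50]
    sizes = [100, 100, 100, 100, 100, 100]
    statistic, df = chi_square_for_proportions(positives, sizes)
    print(f"chi-square = {statistic:.2f}, df = {df}")
    print("group difference at 0.05:", statistic > CRITICAL_VALUE_DF5_ALPHA05)
```

Comparing the statistic with a fixed critical value keeps the sketch free of external libraries; a statistics package would report the exact p value instead.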
As for overall satisfaction, Study 5 reported the highest scores. For harms, Studies 2, 4, 5, and 6 reported high rates of concern that the program was time-consuming (over 25%), while in Studies 2 and 6 a substantial proportion of users perceived excessive learning-related stress (negative evaluation rates of 52.5% and 41.4%, respectively).
Table 3 shows iOSDMH total scores and subscale scores, which were characterized by significant group differences. Study 5 presented the highest total score as well as the highest subscale scores in acceptability, appropriateness, feasibility, and satisfaction. Study 1 and Study 3 presented the second highest total scores.
[Table 3, one surviving row; mean (SD) by study: 3.0 (0.6), 2.9 (0.9), 3.0 (0.7), 2.6 (0.7), 3.4 (0.7), 2.8 (0.9); F = 29.4, p < 0.001]
Figure 1 shows the association of iOSDMH subscale scores and total iOSDMH scores (excluding harm scores) with completion rates. Subscale scores and completion rates showed a nearly linear trend. Acceptability and satisfaction were highly associated with completion rate (R-squared = 0.93 and 0.89, respectively). Harm showed a weak inverse association. iOSDMH total scores also showed a high association with completion rate (R-squared = 0.95).
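For a straight-line fit with an intercept, the R-squared value equals the squared Pearson correlation between the two variables. The following Python sketch shows that calculation for a study-level comparison such as the one in Figure 1, where each study contributes one pair of values (a mean subscale score and a completion rate). The numbers are hypothetical and are not taken from the trials, the function name is illustrative, and the published values were computed in Microsoft Excel rather than with this code.

```python
def fit_line_r_squared(x, y):
    """Least-squares line y = slope * x + intercept and its R-squared."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when a variable is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor and an intercept, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical study-level values for six studies (not trial data).
    mean_subscale_scores = [8.0, 7.5, 8.5, 7.0, 9.5, 7.8]
    completion_rates = [40.0, 30.0, 55.0, 25.0, 80.0, 35.0]
    slope, intercept, r2 = fit_line_r_squared(mean_subscale_scores, completion_rates)
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R-squared = {r2:.2f}")
```

Because the unit of analysis is the study, each fit of this kind rests on six data points.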

Discussion
This study evaluated implementation and dissemination aspects of six RCTs of digital mental health programs using the previously developed iOSDMH [5]. The iOSDMH was useful for understanding program characteristics in terms of acceptability, appropriateness, feasibility, overall satisfaction, and harm, as well as user evaluations of these aspects. Moreover, iOSDMH scores showed moderate to high associations with program completion rates. Compared with appropriateness and feasibility, acceptability and satisfaction were more strongly associated with completion rate. These findings might indicate that subjective positive feelings about a program are important for adherence. Examining iOSDMH total scores and subscales made it possible to clarify the tasks that remain for promoting further program implementation.

Acceptability
Acceptability considers whether a program's users feel that the program benefits their mental health and they have a positive impression of the program.
Acceptability relates to various factors, including a program's topics, theory, and participation style. Study 5 (targeting students interested in suicide prevention) seemed to attract users' interest and utilized effective delivery methods such as peer role play and YouTube, resulting in high acceptability scores. YouTube videos and peer role play might have contributed to these high evaluations of acceptability, a finding that is consistent with a previous study [22]. Study 3 (e-learning programs with AI feedback customized to users) received the second highest acceptability scores. Individualized program support might have led to high program acceptance. Moreover, the programs that focused on concerns raised by specific populations of women (Studies 1, 2, and 6) also reported high user acceptability. Topic selection that matches users' characteristics is essential for increasing user acceptability. In contrast, Study 4 resulted in intermediate acceptability even though this study adopted a unique strategy that allowed users to tailor the program's order according to their interests. However, the topics covered might have been too general and not aimed at a specific population.
Our study showed high associations between acceptability and completion rate, which is consistent with previous studies showing that meeting client needs and preferences, sharing decision-making, and tailoring care are important for improving implementation and effectiveness [23,24]. Our findings suggest a direction for future research: personalized or tailored programs may increase acceptability and thereby improve completion rates.

Appropriateness
We found that matching program content with a user's physical or medical needs is an important factor for appropriateness. In Study 5, students willing to be gatekeepers for suicide prevention might have needed communication skills for suicide prevention and thus felt that the program content was appropriate and suited their social condition. They might have perceived that communication skills useful for listening to peers and connecting to professionals were relevant or correct. The programs on perinatal-specific issues (Studies 1 and 6) also received high scores for content appropriateness and fitting users' health status. Our finding was consistent with a previous finding that the perceived relevance of program content and personal circumstances were important for treatment engagement and adherence [25]. However, Study 2 focused on specific needs of working mothers with preschool children but did not receive high evaluations for appropriateness in terms of users' health status or social condition. Study 2 was developed based on ACT theory [26], and contained metaphors and stories about people under high distress. Some users might have felt that the program was not appropriate for their health status or social condition. Researchers should understand the pros and cons of targeting participants' specific issues and might consider assessing those needs in detail and adopting a segmentation strategy. In our study, appropriateness showed moderate or low associations with completion rate. However, belief in starting a program and treatment can be based upon perceived appropriateness, which indicates perceived efficacy in meeting needs and recognizing innovation for addressing personal issues and problems [27]. Improving appropriateness might therefore benefit program uptake rather than adherence.

Feasibility
For feasibility, we assessed the difficulty of using the program, time requirements for program participation, and content difficulty. Time seemed to be a key factor among programs with high feasibility evaluations. Programs that required less than 15 min to complete a session, with six sessions overall, received high evaluations. Study 5 received high feasibility evaluations because its videos, delivered through a video-sharing platform (YouTube), took little time per session, which also shortened the duration of the whole course. Media sharing sites such as YouTube are potential resources for knowledge translation as they are easy to use and free of cost [28]. Furthermore, young participants in Study 5 might have found the YouTube intervention more feasible than a reading-based platform. In contrast, each session and the whole course of Study 6 took longer to complete, with users evaluating this program as less implementable. Simplifying content and reducing the number of sessions and content length might lead to higher program feasibility. Feasibility showed a moderate association with completion rate. Feasibility affects the early phase of program adoption [8], and users' motivation to continue using a system can be reflected in the completion rate.

Satisfaction
Our study included satisfaction evaluations. Programs with high evaluations in other subscales also showed high satisfaction scores. In implementation science, users' perceived satisfaction with a program is important, and future studies should explore it in detail. Satisfaction showed a high association with completion rate. Previous research suggests that high treatment satisfaction leads to high adherence [13]. Note that overall satisfaction in the iOSDMH is evaluated by only one item, measured once after the intervention, even though satisfaction has multifaceted, multilevel, and time-varying components [14]. For instance, program completion might itself promote high satisfaction, but this study could not determine the direction of the association.

Total Scores of the iOSDMH
The iOSDMH total scores reflected overall subscale evaluations, with Study 5 receiving the highest scores of all. Study 5 successfully reached the population in need, had appropriate programs utilizing a feasible method, and thus provided a sense of satisfaction. The iOSDMH total scores showed a positive correlation with program completion rate, suggesting that perceived evaluation of implementation aspects has some influence on completion rate. Although completion rate might be influenced more by reminder frequency, internal learning needs, and informal pressures for contextual use, implementation outcomes measured by the iOSDMH partially predict adherence. Revising the program to improve its total score might contribute to program effectiveness and efficacy through increased adherence. Because the programs were based on different psychological models and have different primary outcomes, total scores among different trials should be interpreted with caution. Due to differences among the trials, cutoff points for total scores were not reported in this study. The iOSDMH total scores might be beneficial especially when compared with studies having similar research designs and primary outcomes or to reevaluate implementation aspects of the program after modifying program contents or intervention delivery methods.

Implications of the iOSDMH
We found that iOSDMH subscales provided rich information on users' evaluation of a program in terms of acceptability, appropriateness, feasibility, and overall satisfaction, and these measures seemed to be associated with program completion rate. Researchers can review each subscale score to identify unmet needs in the program contents or intervention design as perceived by users. Because this study discussed important factors for each aspect of implementation, researchers can use our findings to find clues for improvement. After refining a program or study design, researchers can then readminister the iOSDMH for reevaluation. Moreover, even when an intervention program fails to show a significant effect on primary endpoint outcomes, implementation outcomes would provide researchers with essential information for exploring why the program did not perform as expected, leading to future improvement of the program. In clinical practice, this scale can provide an indicator of progress in program improvement. Comprehensive assessment with the iOSDMH may highlight the areas that need to be improved and lead to further refinement of the contents or delivery-related strategies of digital mental health programs in clinical settings.

Limitations
This study had several limitations. First, a cut-off point was not reported for each iOSDMH subscale or total score because each study had different study settings and objectives. Second, iOSDMH scales relied on users' self-reported outcomes. Therefore, we attempted to compare iOSDMH scores with the program completion rate. Third, as the trials examined in this study consisted of a convenience sample of RCTs, our findings must be validated in other trials. Fourth, the iOSDMH was not validated against other scales for measuring implementation outcomes of digital interventions. There is still room to examine the advantages of the scale relative to other existing scales.

Conclusions
This study showed that the iOSDMH is useful for evaluating essential aspects of digital mental health programs and can serve as an indicator of program completion. Evaluation of implementation outcomes might also be important for maximizing effectiveness, which can be highly affected by completion rate. The iOSDMH can direct researchers toward future program goals to achieve social implementation and maximum effectiveness.

Author Contributions:
D.N. oversaw this project, supervised the process, and provided his expert opinion. E.O., N.S., K.I., K.N. and N.K. provided data from their original randomized controlled trials. Collaborators R.V., P.C., T.S. and M.K., who were among the developers of the iOSDMH, ensured that questions about the accuracy or integrity of any part of the work were appropriately investigated and resolved. E.O. and N.S. wrote the first draft of the manuscript, and all other authors critically revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement:
This study was not reviewed by the Research Ethics Committee because no new original data were collected in this study.

Data Availability Statement:
The data that support the findings of this study (related to iOSDMH only) are available from the corresponding author, DN, upon reasonable request.