Evaluation of Android and Apple Store Depression Applications Based on Mobile Application Rating Scale

A large number of mobile applications allow users to monitor their health status. The quality of these applications is evaluated only by users, not against standard criteria. This study aimed to examine depression-related applications in major mobile application stores and to analyze them using the Mobile Application Rating Scale (MARS). A search of digital applications for the control of symptoms and behavioral changes in depression was carried out on the two reference mobile operating systems, Apple (App Store) and Android (Play Store), by two reviewers using a blind methodology between September and October 2019 in the stores of Spain and the United Kingdom. Eighteen applications from the Android Play Store and twelve from the App Store were included in this study. The quality of the applications was evaluated using the MARS, which scores from 1 (inadequate) to 5 (excellent). The average MARS score of the applications was 3.67 ± 0.53. The sections with the highest scores were “Functionality” (4.51) and “Esthetics” (3.98), and those with the lowest were “Application Subjective quality” (2.86) and “Information” (3.08). Mobile health applications for the treatment of depression have great potential to influence the health status of users; however, applications reach the digital market without any health control.


Introduction
Depression is one of the major psychological disorders that can occur throughout life. It has a prevalence close to 20% and is more frequent in women [1]. It leads to a high number of hospital consultations and, therefore, to high costs for the public system, and it is an important factor in people's quality of life [2]. Sociodemographic and psychopathological factors and the course of the disease itself are fundamental in the treatment of this pathology, because they allow one to determine whether it is a chronic or non-chronic disorder [3].
The use of mobile phones as a tool for changing behavior and habits in patients is a reality. There are also a large number of mobile applications; in the Android Play Store, there are 2,100,000 applications available, whereas there are 1,800,000 in the App Store [4]. The mobile applications used in health can be categorized into six main groups: lifestyle-oriented apps, patient-oriented apps, clinician-oriented apps, disease management systems, traditional telehealth, and mHealth systems [5]. However, not all applications categorized as medical or health have the same effect on patients [6].
Cognitive behavioral therapy (CBT) and behavioral activation are two of the most widely used evidence-based treatments for depression, leaving pharmacological interventions aside [7]. However, the lack of adherence to these models and the lack of studies of their efficacy make the usefulness of mobile applications in this area questionable [8].
Users of the different digital marketplaces can give applications a score from 1 to 5. This assessment is entirely subjective and follows no previously established criterion, yet it allows applications to reach the highest positions in the search engine [9]. There are, however, validated tools that allow digital applications to be assessed against specific criteria [10], as well as tools that assess the ability of applications to produce behavioral changes [11]. Moreover, users can access these applications freely without the guidance of a specialist. This does not ensure that they are used reliably, which can be harmful. In no case should these applications be a substitute for the intervention of a professional; users should consider them a support tool [8].
The assessment of mobile applications can be carried out by means of the Mobile App Rating Scale (MARS) [10]. As far as we know, mobile applications for depression have not been assessed with this tool; it has been used for pain management [12], weight management [13], or asthma [14], among others. The MARS is a simple, objective, and reliable method to measure the quality of mobile applications, which has demonstrated good psychometric properties [10]. This instrument has been translated and adapted to Spanish while retaining its psychometric properties [15].
Therefore, it is of great relevance to carry out a systematic review of the depression-focused mobile applications available on the market, assessing their engagement, functionality, aesthetics, information, subjective quality, and impact on the user according to the criteria established by the MARS. The results of this review will allow specialists to recommend the best mobile applications to their patients. Consequently, the aim of this study was to examine depression-related applications in major mobile application stores and analyze them using the MARS.

Information Sources and Search Strategy
This study included the mobile applications (apps) related to depression (free and paid) identified in the App Store (iOS) and Play Store (Android) in July 2019. These two systems are the most widespread; together they run on 99% of mobile phones. The search was conducted in the app stores of Spain and the UK.
Two searches were conducted, the first with the term "depression" and the second with the terms "depression" and "CBT" (cognitive behavioral therapy). To be included, an app related to depression had to contain an evaluation system and to monitor the evaluations carried out. We also included apps that applied behavior change techniques (BCTs) to the user or set objectives related to depression.

Eligibility Criteria
An initial evaluation eliminated all "junk" apps that are not related to health problems, such as wallpaper apps or apps that only offer "text quotes." When the same app appeared in different stores with only the language changed (for example, in the UK and Spanish Apple stores), only one copy was analyzed. Apps related exclusively to depression assessment or general information, apps in a language other than English or Spanish, and apps that required an initial payment to run were also excluded.

Study Selection
Two of the authors selected the apps according to the inclusion and exclusion criteria using a blind methodology. In case of conflict, a third author decided on inclusion or exclusion based on the established criteria. After the search, 30 apps were included (18 from the Android Play Store and 12 from the Apple App Store). A flow diagram based on the PRISMA statement was prepared for the selected apps [16] (Figure 1). The included apps were independently assessed by two of the authors (J.M.-M. and A.M.-C.) using the Spanish version of the MARS [15].

Data Extraction and Quality Assessment
The MARS has 23 structured questions in six sections: engagement, functionality, esthetics, information, app subjective quality, and app-specific. Each question is rated from 1 (inadequate) to 5 (excellent), and a final score is generated as the average of the first four sections. The app subjective quality and app-specific sections are evaluated independently. The MARS scores of the two reviewers were compared, and where there was a discrepancy (a two-point difference), the two reviewers examined the scores again. If there was still disagreement, a third reviewer participated to determine the score. The final score for each app was the average of the two reviewers' scores.
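The scoring procedure described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (scores in the study were handled by the reviewers and SPSS); the function names, the number of items per section, the item-level reading of the two-point rule, and all ratings are assumptions made for illustration only.

```python
from statistics import mean

# The four objective MARS sections that feed the overall score.
OBJECTIVE_SECTIONS = ("engagement", "functionality", "esthetics", "information")


def section_means(item_ratings):
    """Average the 1-5 item ratings within each section."""
    return {section: mean(items) for section, items in item_ratings.items()}


def overall_score(sections):
    """Overall MARS score: mean of the four objective section means."""
    return mean(sections[name] for name in OBJECTIVE_SECTIONS)


def discrepancies(ratings_a, ratings_b, threshold=2):
    """List (section, item index) pairs where the reviewers differ by
    at least `threshold` points (assumed item-level reading of the rule)."""
    flagged = []
    for section in ratings_a:
        pairs = zip(ratings_a[section], ratings_b[section])
        for index, (a, b) in enumerate(pairs):
            if abs(a - b) >= threshold:
                flagged.append((section, index))
    return flagged


def final_score(ratings_a, ratings_b):
    """Final app score: average of the two reviewers' overall scores."""
    return mean(overall_score(section_means(r)) for r in (ratings_a, ratings_b))


# Hypothetical ratings for a single app (not data from the study).
reviewer_a = {
    "engagement": [4, 3, 4, 3, 4],
    "functionality": [5, 4, 5, 4],
    "esthetics": [4, 4, 3],
    "information": [3, 4, 3, 2, 3, 4, 3],
}
reviewer_b = {
    "engagement": [4, 3, 2, 3, 4],
    "functionality": [5, 4, 5, 4],
    "esthetics": [4, 4, 3],
    "information": [3, 4, 3, 2, 3, 4, 3],
}

flagged_items = discrepancies(reviewer_a, reviewer_b)
app_score = final_score(reviewer_a, reviewer_b)
print(flagged_items, round(app_score, 2))
```

In this hypothetical case, one engagement item differs by two points and would be re-examined by the reviewers before the final score is accepted.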

Data Synthesis and Analysis
Statistical analysis was performed using SPSS statistical software, version 22.0 (SPSS Inc, Chicago, IL, USA). Average scores for each section of each app were retrieved. A descriptive analysis (mean and standard deviation) was performed for each of the MARS sections. For inferential analysis, bivariate correlations were computed using Pearson's or Spearman's coefficient, depending on the normality of the variables. The analyses carried out were: user ratings versus MARS ratings; app protection (binary variable) versus the number of downloads; app protection versus user scores; and app protection versus MARS.
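As an illustration of the choice between the two coefficients, the following pure-Python sketch computes Pearson's r and, for variables that are not normally distributed, Spearman's rho as Pearson's r on average ranks. It is not the SPSS procedure used in the study: the normality decision is passed in as a flag rather than tested, p-values are not computed, and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate(x, y, both_normal):
    """Pick the coefficient from a normality decision made elsewhere."""
    if both_normal:
        return "pearson", pearson(x, y)
    return "spearman", spearman(x, y)


# Hypothetical user ratings and MARS scores (not data from the study).
user_ratings = [4.5, 3.0, 4.8, 4.0, 3.5]
mars_scores = [4.1, 2.9, 4.4, 3.2, 3.6]

method, coefficient = correlate(user_ratings, mars_scores, both_normal=False)
print(method, round(coefficient, 2))
```

Because ties receive average ranks, the same `spearman` function can also be applied to a binary variable such as app protection.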

Results
The initial search found a total of 1213 apps. A total of 30 apps (Figure 1) remained after the removal of paid apps, duplicated apps, and irrelevant apps. Eighteen applications from the Android Play Store and twelve from the App Store were included in this study. Twelve of the 30 applications (40%) included in-app payments to acquire full functionality or a subscription. Only two of the included applications had no commercial affiliation (Table 1). All of the included apps (n = 30) were evaluated using the MARS tool for evaluating the quality of health apps (Table 2).
No relationship was found between the number of downloads and the app's MARS score, r = 0.19 (p = 0.39). A moderate positive relationship was found between the users' score and the MARS score, r = 0.48 (p = 0.04). The correlation analysis showed no significant associations (p > 0.05) between app protection (binary variable) and the number of downloads, between app protection and user scores, or between app protection and MARS.
Based on the MARS results, the apps evaluated were divided into four quartiles (Table 2). An overall score was obtained for each app covering the target fields of engagement, functionality, aesthetics, and information. The score of the subjective subitems was also obtained. The best MARS score was 4.58 for "Sanvello," and the lowest was 2.38 for "PerSoNClinic (Depression, Chronic Pain, Cancer)." For engagement, the best app was "Sanvello." For functionality, 10 apps obtained the best score; in esthetics, 3 applications obtained the top score; in information, "MoodMission" was best; for app subjective quality, "WellTrack -Interactive Self-Help Therapy" and "Youper -Emotional Health" were best; and for app-specific, 4 applications obtained the best score (Table 3).
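The descriptive summary (mean ± standard deviation) and the division into quartiles can be sketched as follows. This is only an illustration: the app names and scores are placeholders rather than the values in Table 2, the cut points use the default method of Python's `statistics.quantiles`, which may differ from the SPSS method, and numbering the lowest-scoring group as quartile 1 is an assumption.

```python
from statistics import mean, quantiles, stdev

# Hypothetical overall MARS scores (placeholders, not data from the study).
scores = {
    "App A": 4.5,
    "App B": 3.9,
    "App C": 3.2,
    "App D": 2.4,
    "App E": 4.1,
    "App F": 3.6,
    "App G": 3.0,
    "App H": 3.8,
}

values = list(scores.values())
average = mean(values)
spread = stdev(values)  # sample standard deviation

# Three cut points split the apps into four groups.
cuts = quantiles(values, n=4)


def quartile(score):
    """Return 1 (lowest scores) to 4 (highest scores)."""
    return 1 + sum(score > cut for cut in cuts)


# Collect app names per quartile, in ascending order of score.
groups = {q: [] for q in (1, 2, 3, 4)}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    groups[quartile(score)].append(name)

print(f"{average:.2f} ± {spread:.2f}")
for q, names in groups.items():
    print(q, names)
```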

Discussion
In this paper, we aimed to use a tool commonly applied to the analysis of mobile apps (MARS) to assess the quality of apps dedicated to depression. There are a multitude of these apps in the mobile device app stores, but in many cases they are rated only by the users who use them and lack an objective evaluation.
The MARS tool is a multidimensional tool that provides an overall score covering four objective quality indicators (engagement, functionality, aesthetics, and information quality) [10]. The MARS tool also has two other sections: one for subjective quality and a sixth section to assess the perceived impact of the app on the user. The score on the MARS scale ranges from 1 (worst) to 5 (best). Based on the results of our analysis, the "Sanvello" application obtained the highest score (4.58) and "PerSoNClinic (Depression, Chronic Pain, Cancer)" the lowest (2.38).
The initial search for apps was carried out in the official Google Play Store and Apple App Store in Spain and the UK. A total of 1213 apps were found in this initial search, of which a total of 30 apps were analyzed. It should be noted that for the apps to be included in this study, they had to be functional and related to the subject of depression. They also had to include a system for evaluating and monitoring the progress made.
It is worth noting that the market for mobile apps is constantly changing. Some apps are constantly updated or disappear, while new ones appear. In the course of this work, some apps that were found in the initial search had disappeared when we started to analyze them in detail.
The scores that the apps obtained from users varied between 5 stars for the app with the highest score and 1.5 stars for the app with the lowest number of votes (average of 4.33). The apps obtained an average score on the MARS scale of 3.67, with scores ranging from 2.38 to 4.58. No correlation was observed between the users' score and the score obtained on the MARS (p = 0.39). This type of result is not uncommon [10,17]. The apps found in the app stores are usually commercial (in our sample, only one came from a government body and one from a university). Users' opinions are usually variable, unreliable, and subjective. This demonstrates the need to develop more science-based and less commercial apps in mHealth.
The tools most used for the management of depression in the apps analyzed were Assessment (18/30), CBT (17/30), and Feedback (15/30). CBT is a treatment approach for a range of mental and emotional health issues, and it is a technique widely used in mHealth apps [18].
A positive correlation was found between the number of downloads and the MARS score (p = 0.04). This could indicate that people do not just rely on user ratings, but also look for quality apps.
Nowadays, security is a very important issue on the internet, even more so when dealing with personal health data [19,20], as is the case in the depression apps of this study. The MARS evaluates the safety of the app in two items: it asks whether the app allows password protection and whether it requires log-in. However, these items do not count toward the overall MARS rating of the app. Among the apps examined in this study, only 50% met this minimum level of security (Table 1). Users do not seem to care much about privacy either. No relationship was found between the security of the app and the number of downloads (Spearman's rho = 0.27, p = 0.29), or between the security of the app and user ratings (Spearman's rho = 0.13, p = 0.56).

Conclusions
The use of smartphones is becoming more and more widespread. This leads to an increase in the use of apps, among which mHealth apps stand out. mHealth apps (such as apps for the treatment of depression) have great potential to influence the health status of the people who use them. Unfortunately, most of the apps we found in app stores have a commercial purpose and lack scientific rigor. These apps reach the market through mobile app stores without any kind of health control. The only control they undergo is the publication policy of the app store itself. In many cases, we found apps that only showed some information about depression (without stating where it was obtained), hosted forums for people with depression (without any moderation of the forum), or gave access to an online psychologist. Some of these apps may not only fail to benefit people with depression but may, in some cases, be harmful to the individual. This shows the great importance of developing apps with scientific rigor that can bring benefits to the individual who uses them.

Conflicts of Interest:
The authors declare no conflict of interest.