Automatic Diagnosis of Mental Healthcare Information Actionability: Developing Binary Classifiers

We aimed to develop a quantitative instrument to assist with the automatic evaluation of the actionability of mental healthcare information. We collected and classified two large sets of mental health information from certified mental health websites: generic and patient-specific mental healthcare information. We compared the performance of the optimised classifier with popular readability tools and non-optimised classifiers in predicting mental health information of high actionability for people with mental disorders. The sensitivity of the classifier using both semantic and structural features as variables was statistically higher than that of the binary classifier using either semantic (p < 0.001) or structural features (p = 0.001). The specificity of the optimised classifier was statistically higher than that of the classifier using structural variables (p = 0.002) and the classifier using semantic variables (p = 0.001). The difference in specificity between the full-variable classifier and the optimised classifier was statistically insignificant (p = 0.687). These findings suggest that the optimised classifier, using as few as 19 semantic-structural variables, was the best-performing classifier. By combining linguistic insights and statistical analyses, we effectively increased the interpretability and the diagnostic utility of the binary classifiers to guide the development and evaluation of the actionability and usability of mental healthcare information.


Introduction
Information readability and actionability are two key factors in the effectiveness of patient-oriented healthcare information [1][2][3][4][5]. Many current evaluations of online healthcare information quality focus on readability assessment, benefiting from the long tradition of quantitative readability evaluation in medical and health education using readability instruments [6][7][8][9][10]. Even though the two concepts are distinct in both research and clinical practice, many current studies still conflate them into a single dimension of health information quality assessment. Actionable content means information that can automatically prompt the best decisions about care at the point in time when clinical decisions need to be made [11][12][13]. This requires the design of health information that is based on the real-life circumstances of the intended information users and best reflects their practical needs, varying health literacy levels, cognitive abilities, socioeconomic circumstances, and other determinants [14][15][16][17]. Actionability assessment thus requires an approach distinct from readability or understandability assessment. Actionability can have an important impact on the acceptability and practical usability of the information for target readers [18,19]. Existing approaches to the evaluation of health information actionability are often qualitative. The Patient Education Materials Assessment Tool (PEMAT) is widely used by health and medical professionals to evaluate the practical usability of printed or audio-visual materials [20,21]. However, there is a lack of quantitative tools to assist health professionals with the evaluation of the actionability of mental health resources.
Our study aimed to offer an automatic quantitative assessment of mental health information actionability. This was based on two major types of online mental healthcare information: generic and patient-specific. Generic mental health information is written for the general public in accessible, understandable language, but without specifying the intended patients or information users. This represents the mainstream of health information in many medical and health domains. The other major type of mental healthcare information is patient-specific, using similarly easy, plain, and understandable language but with a focus on well-defined patient and reader groups. Generic mental health information focuses on the explanation of symptoms and signs of prevalent mental disorders, wide-ranging causes and determinants, and generic treatment plans and interventions. By contrast, patient-specific mental health information often adopts a narrative communication style, discussing the diverse, complex yet recurrent practical needs of the intended readers, recognising their potential for achieving better mental health and wellbeing, adapting general treatment plans and interventions to suit their needs, and taking into consideration likely barriers to external resources. An increasing number of not-for-profit organisations in English-speaking countries are actively engaged in the development of patient-specific mental health information for vulnerable people [22][23][24]. This growing movement towards personalised mental healthcare support through patient-tailored mental health information provides valuable resources for the development of automatic assessment instruments, which, in turn, will further advance research and improve clinical practice in patient-centred mental healthcare.

Information Sources and Search Strategies
We started the search for generic and patient-specific mental health resources on Google. We limited the search to the top 100 websites on mental healthcare up to 1 July 2021. We used HON.Net certification as a measure of health and medical content quality control [25]. For the collection of generic mental healthcare information, 36 websites were excluded for not having been certified by HON.Net, 22 websites were excluded for not addressing general readers, and 6 were excluded for not being suitable for automatic statistical analysis based on natural language features, for example, because paragraphs, sentences, or passages were too short. For patient-specific mental healthcare information, 36 websites were excluded for not having been certified by HON.Net, 42 websites were excluded for not addressing specific reader groups, and 4 were excluded for not being suitable for natural language mining and subsequent quantitative assessments. From the final screened texts, we randomly selected 70% from each text group to develop the training and testing dataset. The final dataset of patient-specific information included mental health self-care resources for 5 population groups: teenagers (aged 11-18) (8.9%), young adults (aged 18-35) (87.5%), people over 65 years (1.67%), women (1.17%), and men (0.76%).
The two sets of mental healthcare information collected, that is, generic (GEN) and patient-specific (PAS), were fully annotated with two sets of natural language features using Readability Studio (Oleander Software) [26,27] and the English semantic annotation system (USAS) developed by the University of Lancaster, United Kingdom [28][29][30]. Readability Studio annotated the health information with rich structural features to help us measure the lexical, morphological, and syntactic complexity of the texts. Some of these features have been studied extensively in the field of readability assessment, such as the average number of characters, the average number of syllables, the number of monosyllabic words, the average number of sentences per paragraph, and average sentence length, whereas others are less studied. Structural features added by Readability Studio were collectively labelled TOF in the training and testing of the machine learning classifiers. The automatic semantic annotation with USAS added rich semantic information to the mental healthcare resources.
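As a rough illustration of the kind of structural (TOF) features described above, the sketch below computes a handful of them in plain Python. This is a simplified, hypothetical approximation only: Readability Studio relies on dictionary-based syllable counting and far richer parsing, so the vowel-group heuristic here is not equivalent to its output.

```python
import re

def structural_features(text: str) -> dict:
    """A few of the structural (TOF) features described in the text.

    Simplified sketch only: real readability tools use dictionary-based
    syllable counts; the vowel-group heuristic below is a rough proxy.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word: str) -> int:
        # count vowel groups; every word has at least one syllable
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    return {
        "avg_chars_per_word": sum(len(w) for w in words) / len(words),
        "avg_syllables_per_word": sum(syllables(w) for w in words) / len(words),
        "monosyllabic_words": sum(1 for w in words if syllables(w) == 1),
        "avg_sentence_length": len(words) / len(sentences),
        "avg_sentences_per_paragraph": len(sentences) / len(paragraphs),
    }

feats = structural_features("The cat sat. It purred loudly.\n\nThen it slept.")
print(feats["avg_sentence_length"], feats["avg_sentences_per_paragraph"])
```

Each annotated text then yields one numeric vector of such features, which is the form the classifiers described later consume.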

Semantic Feature Labelling Strategy
Semantic features have been well studied in medical document classification and clinically significant information retrieval. In our study, semantic features (labelled SOF) refer to health information contents. The semantic classification of USAS was loosely based on Tom McArthur's Longman Lexicon of Contemporary English, which broadly divided the English lexicon into 22 large categories (Table 2) [31][32][33]. Semantic annotation has wide applications in medical and health informatics, such as document classification and information retrieval. In clinical settings, semantic annotation has been explored to organise unstructured clinical information or data to support medical research or clinical trials. It can aid the automatic extraction of critical information from clinical texts, such as temporal information, symptoms, and diseases, to facilitate clinical decision making. Compared to structural features, semantic features support more contextualised analyses of health information; in our study, this means how the use of certain semantic classes (Table 2), such as medical and health terms (B), emotions (E), sports (K), movement (M), measurements (N), temporal expressions (T), psychological actions, states and processes (X), science and technology (Y), and proper names (Z), may affect the level of actionability of mental healthcare resources.
Since the USAS semantic annotation scheme was designed for general language studies, we adapted the descriptive labels of some semantic categories (notably B, K, Z) to reflect their application in the study of mental healthcare information. Subcategories of the original 22 large semantic categories that occurred rarely in mental healthcare information were trimmed. As a result, the adjusted descriptive labels indicated the largest and most frequent subcategories within each large semantic category. For example, the original USAS descriptive label for B was Body and the Individual. In our study, we found that most instances of B belonged to B1 (Anatomy and physiology), B2 (Health and disease), and B3 (Medicines and medical treatment), whereas B4 (Cleaning and personal care) and B5 (Clothes and personal belongings) were largely absent. We therefore relabelled B as Medicine and Health Terms in our study.
The original USAS descriptive label for K was Entertainment, which includes K1 (Entertainment generally), K2 (Music and related activities), K3 (Recorded sound), K4 (Drama, the theatre and show business), K5 (Sports and games generally), and K6 (Children's games). Most words in K belonged to K5 and K6; we thus adjusted the descriptive label to reflect the most relevant subcategories of K in our study of mental healthcare resources, especially for youth mental healthcare. Similarly, the original descriptive label of category I was Money, including I1 (Money generally), I2 (Business), I3 (Work and employment), and I4 (Industry). The most frequent subcategory in mental healthcare resources was I3 on employment, so we relabelled I as Work and Employment. Another example is the semantic category Z, whose original USAS descriptive labels were Names (Z1-Z3) and Grammar (Z4-Z9 plus Z99). To distinguish the effects of infrequent expressions, such as medical jargon, we separated Z99 (out-of-dictionary expressions) from the other Z subclasses: Z1 (personal names), Z2 (geographical names), Z3 (other proper names), Z5 (grammatical bins), Z6 (negative expressions), Z7 (if), Z8 (pronouns), and Z9 (trash can).

Table 1 shows the statistical analysis (Mann-Whitney U test of 2 independent samples) of the distribution of structural features (TOF) in generic, non-specified and patient-specific mental healthcare information. First, we calculated the overall difficulty of written mental health information using validated readability assessment tools, including the Flesch Reading Ease score, FORCAST, the Gunning Fog Index, and the SMOG Index. The mean Flesch Reading Ease score of generic, non-specified mental healthcare information was 46.71 (SD = 11.92), statistically more difficult than that of patient-specific mental health information (mean = 66.74, SD = 12.27, Mann-Whitney U test, p = 0.000).
This suggests that generic information, with a mean of 46.71, was suitable for college students (Flesch Reading Ease range: 50-30), whereas patient-tailored mental health information could be easily understood by 13-15-year-old students or general readers with Year 9 education (Flesch Reading Ease range: 70-60). The average Gunning Fog score of the generic mental health information was 12.46 (SD = 2.03), statistically higher than that of patient-tailored mental health information (mean = 9.02, SD = 1.93, p = 0.000). The difficulty of generic mental health information measured by the SMOG Index was 12.50 (SD = 1.50), statistically higher than that of patient-tailored resources (mean = 10.12, SD = 1.6, p = 0.000). Both the Gunning Fog Index and the SMOG Index suggested that patient-tailored information was suitable for readers with Year 9-10 education, whereas generic mental healthcare resources required at least three more years of education. Both generic and patient-specific mental healthcare information was above the Year 6-8 reading levels recommended by the World Health Organisation.
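A group comparison of this kind can be run with SciPy's Mann-Whitney U test. The scores below are illustrative placeholders, not the study's data; they merely stand in for Flesch Reading Ease values of the two text groups.

```python
from scipy.stats import mannwhitneyu

# Illustrative placeholder scores only (not the study's data): Flesch
# Reading Ease values for generic (GEN) and patient-specific (PAS) texts.
gen_scores = [44.2, 48.1, 39.5, 51.0, 46.3, 43.8, 50.2, 41.7]
pas_scores = [68.4, 62.1, 70.3, 65.9, 71.2, 64.8, 69.5, 66.0]

# Two-sided test of whether the two samples come from the same distribution
u_stat, p_value = mannwhitneyu(gen_scores, pas_scores, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.6f}")
```

Because every placeholder GEN score is below every PAS score, the U statistic is 0 and the exact two-sided p-value falls well below 0.001, mirroring the direction of the reported result.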
In terms of morphological complexity, the mean of generic mental healthcare information was statistically higher than that of patient-specific information in 7 categories, including the average number of characters. Lastly, generic mental health information was syntactically more complex than patient-specific mental healthcare information: number of difficult sentences (more than 22 words) (mean_GEN = 14.69, mean_PAS = 8.68, p = 0.000), average sentence length (mean_GEN = 13.8, mean_PAS = 12.73, p = 0.000), and passive voice (mean_GEN = 5.22, mean_PAS = 2.79, p = 0.000). Characteristics of the syntactic structures of patient-specific mental healthcare information included the use of longer, more descriptive paragraphs compared to the syntactic brevity of generic, non-specified mental health information: average number of sentences per paragraph (mean_GEN = 1.54, mean_PAS = 2.58, p = 0.000); a stronger emphasis on logical coherence: sentences that begin with conjunctions (and, but, though, while, even though, etc.) (mean_GEN = 1.11, mean_PAS = 1.64, p = 0.000); and the use of more interactive sentence structures: number of interrogative sentences (questions) (mean_GEN = 2.81, mean_PAS = 4.3, p = 0.000) and number of exclamatory sentences (mean_GEN = 0.08, mean_PAS = 1.35, p = 0.000).

Table 2 shows that the mean of generic, non-specified mental healthcare information was statistically higher than that of patient-specific mental healthcare information in semantic categories such as general and abstract terms (A).

We divided the entire dataset between 70% for constructing the binary classifier and 30% for validating the classifier. Table 3 shows the result of the exploratory factor analysis (EFA) used to reduce the dimensions of the observed variables, i.e., the total 49 natural language features.
Within the two-dimensional instrument constructed, the first and second dimensions accounted for 42.361% and 36.396% of the total variance in the 70% training dataset, respectively. Figure 1 is the scree plot, which visualises that the increase in the amount of variance explained by the instrument started to flatten after the second dimension, again suggesting that the optimal number of dimensions be set at 2. Table 4 exhibits the rotated loadings (varimax rotation with Kaiser normalisation) of the observed variables on the first two large dimensions. The first dimension or component of the instrument encompassed 9 structural features and 1 semantic feature (B, medicine and health terms).
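The dimension-reduction step can be sketched with scikit-learn. The snippet below uses synthetic data standing in for the annotated language features, with a two-dimensional latent structure built in by construction; a scree-style variance check then flattens after the second component, and a varimax-rotated factor analysis recovers the loadings. This is an illustration of the technique, not the study's analysis.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)

# Synthetic stand-in for the annotated language features: two latent
# dimensions drive two blocks of correlated observed variables.
latent = rng.normal(size=(200, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0   # features 0-4 load on dimension 1
loadings[1, 5:] = 1.0   # features 5-9 load on dimension 2
X = latent @ loadings + 0.3 * rng.normal(size=(200, 10))

# Scree-style check: variance explained flattens after component 2
explained = PCA().fit(X).explained_variance_ratio_
print("cumulative variance of first two components:", explained[:2].sum())

# Varimax-rotated loadings of the observed variables on the two dimensions
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
rotated_loadings = fa.components_   # shape: (2 dimensions, 10 features)
```

Inspecting `rotated_loadings` row by row mirrors reading a rotated-loading table: each observed variable is assigned to the dimension on which it loads most heavily.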

Results
After exploratory factor analysis on the 70% training data, we validated the logistic regression model on the remaining 30% of the dataset and compared the performance of the optimised binary classifier with that of popular readability tools and binary classifiers with original variables. The 4 binary classifiers were based on semantic variables (22), structural variables (27), both semantic and structural features (49), and the optimised variables (19) obtained through exploratory factor analysis. Table 5 shows the paired-sample Wilcoxon signed-rank test, which assessed whether the difference between the AUC of each classifier or readability tool and the reference 0.5 was statistically significant. A p-value smaller than 0.05 was considered statistically significant. It shows that the AUC of the readability tools and binary classifiers using different variables was statistically higher than the reference AUC. Table 6 shows the paired-sample t-test of the AUC of readability formulas and binary classifiers using different variables, with the Bonferroni-adjusted significance level at 0.00179. p-values smaller than 0.00179 were considered statistically significant. It shows that, among the 4 readability formulas, the Gunning Fog Index (AUC = 0.893) and the Flesch Reading Ease Index (AUC = 0.882) were the two top classifiers, and the difference in their AUCs was statistically insignificant (p = 0.009852 > 0.00179). Among the four binary classifiers, the two top classifiers were the one using all 49 variables, semantic and structural (AUC = 0.872), and the one based on the 19 optimised variables (AUC = 0.863); the difference between the two was statistically insignificant (p = 0.2292). The AUC of the Gunning Fog Index was statistically higher than that of the binary classifier using 49 variables (p = 0.000168) and the optimised binary classifier using 19 variables (p = 0.000462).
The AUC of the Flesch Reading Ease Index was statistically similar to that of the binary classifier using 49 variables (p = 0.02513) and the optimised binary classifier using 19 variables (p = 0.1837). Table 7 shows the sensitivity and specificity pairs of the 2 top readability formula-based classifiers, the Gunning Fog Index and the Flesch Reading Ease Index. It shows that, across the different thresholds, sensitivity increases as specificity decreases. Table 8 shows the sensitivity and specificity of the 4 binary classifiers using natural language features as variables. When setting the specificity at 0.85, the binary classifier with all 49 variables had the highest sensitivity (0.907), followed by the optimised classifier with 19 variables selected through exploratory factor analysis (0.890), Gunning Fog (0.769), and Flesch Reading Ease (0.729). As a result, the two binary classifiers achieved better sensitivity and specificity than the readability formula-based classifiers, which had much lower specificity. Table 9 shows the paired-sample t-test of differences in sensitivity between the 4 binary classifiers using natural language features as independent variables. p-values are statistically significant when smaller than the Bonferroni correction (adjusted alpha = 0.00833). It shows that the sensitivity of the classifier using both semantic and structural features (49) was statistically higher than that of the binary classifier using either semantic (22) (p = 0.000) or structural features (27) (p = 0.0010). Similarly, the sensitivity of the binary classifier using the optimised features from factor analysis (19) was statistically higher than that of the binary classifier using either semantic (p = 0.000) or structural features (p = 0.0020). The difference in sensitivity between the binary classifier using the full variable set (49) and the optimised classifier (19) was statistically insignificant (p = 0.0398 > 0.00833).
Table 10 shows the paired-sample t-test of differences in specificity between 4 binary classifiers. It shows that the specificity of the classifier using full variables (49) was statistically higher than that of the classifier based on structural variables (27) (p = 0.001) but statistically similar to that of the classifier based on semantic variables (22) (p = 0.011 > 0.00833). The specificity of the optimised classifier was statistically higher than that of the classifier using structural variables (27) (p = 0.002) and the classifier using semantic variables (22) (p = 0.001).
Differences in specificity between the full-variable classifier and the optimised classifier were statistically insignificant (p = 0.687). These findings suggest that the optimised classifier using as few as 19 semantic-structural variables was the best-performing classifier.
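The evaluation pipeline described in this section, a 70/30 split, a logistic regression classifier, AUC on the held-out data, and sensitivity read off at a fixed specificity of 0.85, can be sketched with scikit-learn. The data below are synthetic stand-ins for the 19 optimised variables and the two text classes; the class separation is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the dataset: 19 "optimised" feature variables,
# with class 1 (hypothetically, the highly actionable texts) shifted
# upward on the first four features.
n = 600
X = rng.normal(size=(n, 19))
y = rng.integers(0, 2, size=n)
X[y == 1, :4] += 1.2

# 70/30 split for training and validation, as in the study design
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# AUC of the classifier on the held-out 30%
auc = roc_auc_score(y_te, scores)

# Sensitivity (TPR) at a fixed specificity of 0.85, i.e., FPR <= 0.15
fpr, tpr, _ = roc_curve(y_te, scores)
sens_at_spec85 = tpr[fpr <= 0.15].max()
print(f"AUC = {auc:.3f}, sensitivity at specificity 0.85 = {sens_at_spec85:.3f}")
```

Reading sensitivity at a fixed specificity, rather than at a default 0.5 probability threshold, is what makes classifiers with different variable sets directly comparable.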

Discussion
Improving the quality and usability of current mental healthcare necessitates the development of highly actionable and better-targeted resources for people with different mental disorders or at high risk of developing mental diseases. This requires a more personalised and patient-centred approach to mental health information evaluation. The increasing amount of high-quality mental health information on the internet provides valuable first-hand material for developing new quantitative evaluation tools and systems. Our study has made a useful attempt in this direction. In developing automatic tools for the evaluation of mental health information actionability, we found that semantic features had an important role in the actionability of mental health resources. For example, there were three semantic categories in which the mean of patient-specific mental healthcare information was statistically higher than that of generic mental health information: arts and cultures (C), locations (M), and speech acts (Q). In contrast, generic information was richer and more varied in its topical coverage than patient-specific mental healthcare information. Discussions in generic mental healthcare covered a broad range of risk factors causing mental disorders among the public, such as social and political circumstances (G, S), environmental stressors (W), household and living environments (H), employment (I), nutrition (F), individual attributes (E), physical activities (K), science, technology, and medicine (B, Y), and education (P). The wide range of topics in generic mental healthcare information, despite being more informative than patient-specific mental healthcare resources, could significantly reduce the actionability of the health information.
By contrast, patient-specific mental healthcare information had a stronger focus on more concrete and tangible approaches to mental healthcare, such as artistic and creative activities; typical words in the semantic category C were artwork, caricature, carvings, crochet, D.I.Y. graphics, knit, paintbrush, photos, and paintings. Patient-specific mental health information also made statistically higher (p = 0.045) use of speech act expressions such as address, (have a) conversation (about your mental health), tell (their stories), (your) point of contact, question, speak (with your communities), (peer) mentoring (programmes), talk (about your mental health openly), ask (what happened?), and talk (with a friend in need).
In our study, since most structural-semantic features showed statistically significant differences (p = 0.000) between the two sets of mental healthcare information, we computed the adjusted effect size Hedges' g (for 2 independent samples of unequal size) and the common language effect size (CLES). The general rules of thumb described by Cohen suggest that an effect size of 0.2 represents a "small" effect, 0.5 a "medium" effect, and 0.8 a "large" effect. CLES ranges between 0 and 1 and correlates positively with effect size. We found that an important shared property of the structural features retained in the variable-reduction process was that they had medium or large effect sizes, suggesting that effect size can be a useful indicator of whether a given observed variable is suitable for discriminating binary dependent variables. The linguistic meanings and the varying discriminating functionality of these structural and semantic features, as measured by their statistical significance (p < 0.05) and effect sizes (Hedges' g > 0.5), were used to develop automatic binary classifiers to assess the actionability of mental healthcare information.
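The two effect-size measures discussed above can be computed directly. Below is a minimal stdlib-only sketch of Hedges' g for two independent samples of unequal size, and of CLES under a normal approximation (the normality assumption is ours, made for illustration; CLES can also be computed non-parametrically).

```python
import math
from statistics import NormalDist

def hedges_g(x, y):
    """Hedges' g for two independent samples of unequal size:
    Cohen's d with a small-sample bias correction."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / pooled_sd
    return d * (1 - 3 / (4 * (nx + ny) - 9))   # small-sample bias correction

def cles(g):
    """Common language effect size under a normal approximation:
    the probability that a random draw from one group exceeds
    a random draw from the other."""
    return NormalDist().cdf(g / math.sqrt(2))
```

An effect size near zero gives a CLES of 0.5 (a coin flip), and larger effect sizes push CLES towards 1, which is the positive correlation noted above.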
Study Limitation: Mental health is highly complex. Patients with different demographic profiles and mental disorders need well-designed resources to better support them. In our study, we divided mental health information into patient-oriented and generic mental health information. However, within patient-oriented health information, our newly developed quantitative tool could not assess whether a certain piece of mental health information is more suitable for a particular social group, such as young people, older people, adults, men, or women. There is scope to develop evaluation tools to support health information assessment for specific populations.
Future Work: Increasing the actionability of mental health information for populations in need can significantly improve the quality of current mental health services in both developed and developing countries. To achieve this goal, the development of quantitative, machine learning-based evaluation tools and instruments will provide healthcare providers with much-needed resources. Our study made a useful attempt towards this goal. In future work, we aim to enrich the contents of the classifiers by testing different sets of language features, such as sentiment features [34,35]. We also aim to test our methods with mental health resources in languages other than English to help better support mental health organisations working with multicultural, multilingual populations. Tools such as the one we developed do not assume any prior knowledge of the patients' languages, which could effectively close the language gap between patients and the health professionals supporting them.

Conclusions
Our study developed a quantitative instrument to assist with the automatic evaluation of the actionability of mental healthcare information. By combining the insights of linguistics and statistical analyses, we effectively increased the interpretability and the diagnostic utility of the binary classifiers to guide the development and evaluation of the actionability and usability of mental healthcare information.
Author Contributions: M.J. was responsible for research conceptualisation, data interpretation, and writing; W.X. for formal analysis and development of the machine learning models; X.Q. and R.H. for data curation and collection. All authors have read and agreed to the published version of the manuscript.