Clinical Utility of the Parent-Report Version of the Strengths and Difficulties Questionnaire (SDQ) in Latvian Child and Adolescent Psychiatry Practice

Background and Objectives: Screening instruments can be crucial in child and adolescent mental healthcare practice, allowing professionals to triage the patient flow in limited-resource settings and supporting clinical decision-making. Our study aimed to examine whether the Strengths and Difficulties Questionnaire (SDQ), with the original UK-based scoring algorithm applied, can reliably detect children and adolescents with different mental disorders in a clinical population sample. Materials and Methods: A total of 363 outpatients aged 2 to 17 years from two outpatient child psychiatry centres in Latvia were screened with the parent-report version of the SDQ and assigned clinical psychiatric diagnoses. The ability of the SDQ to predict the clinical diagnosis in major diagnostic groups (emotional, conduct, hyperactivity, and developmental disorders) was assessed. Results: The subscales of the parent-report SDQ were significantly associated with the corresponding clinical diagnoses. The sensitivity of the SDQ ranged from 65% to 78%, and the specificity from 57% to 78%. The discriminative ability of the SDQ, as measured by the diagnostic odds ratio, fell short of the level of clinical utility needed in specialised psychiatric settings. Conclusions: We suggest that the SDQ be used in primary healthcare settings, where it can be an essential tool to help family physicians recognise children needing further specialised psychiatric evaluation. There is a need to assess the psychometric properties of the SDQ and validate it in a larger population sample in Latvia, to determine population-specific cut-off scores, and to reassess the performance of the scale in primary healthcare practice.


Introduction
Mental, behavioural, and neurodevelopmental disorders in children and adolescents have been a growing concern worldwide over recent decades [1]. They have become the leading cause of disability in this age group in developed and developing countries alike [2].
Childhood and adolescence are critical stages of development for mental health and wellbeing throughout the lifespan. Most mental health problems of adulthood have their onset during or before adolescence [3], making this a critical period for recognition and treatment. Early identification and access to appropriate, evidence-based psychosocial interventions and support in childhood mental, behavioural, and neurodevelopmental disorders are essential for good recovery and better psychosocial functioning in adulthood [4,5].
While the global coverage of prevalence data for mental disorders in children and adolescents is limited, and only one-quarter of countries collect data on the number of children treated by a mental health professional [6], there is clear evidence of a massive gap between the number of children and adolescents needing mental health treatment and support and the number actually receiving it from mental health services [7]. Mental health services worldwide are overwhelmed by demand, which has been further increased by the COVID-19 pandemic; this can result in long waiting times and more suffering for the young people and families seeking help [8].
The introduction of screening procedures in mental health services can be a helpful step because it can allow professionals to triage help-seeking patients based on their risk of having a mental disorder and to determine the most appropriate treatment programme and level of care. Young people at the highest risk and with the greatest need for intervention can then be prioritised, and less time is spent on the psychiatric evaluation of healthy children. However, for a screening procedure to work, we must be sure that the screening tools used have reasonable validity and clinical utility in the population in which they are used [9].
The Strengths and Difficulties Questionnaire (SDQ) [10] has long been established as one of the most widely used screening instruments in child mental health research and clinical practice. It is easy to complete, relatively short, and user-friendly as it covers not only the child's difficulties but also their strengths. It allows comparisons between different populations and is sensitive to change over time [11].
The SDQ consists of 25 items comprising five subscales: emotional symptoms, peer relationship problems, conduct problems, hyperactivity/inattention, and prosocial behaviour. The "total difficulties" scale is a compound scale calculated by summing the four "difficulties" subscales, excluding the prosocial subscale, which represents the child's "strengths" [10]. Later studies proposed two further compound scales: an internalising difficulties scale, the sum of the emotional problems and peer problems subscales, and an externalising problems scale, the sum of the conduct problems and hyperactivity subscales [12]. The SDQ is available in parent-report and teacher-report versions for 2- to 4-year-olds and 4- to 17-year-olds, as well as a self-report version for 11- to 17-year-olds.
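The relations between the subscales and the compound scales described above can be summarised in a short Python sketch. It assumes that the five subscale scores (each 0–10) have already been computed. The key and function names are hypothetical and chosen for illustration; they are not part of any official SDQ scoring package.

```python
from typing import Dict

# Hypothetical key names for the four "difficulties" subscales (each scored 0-10).
DIFFICULTY_SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")


def compound_scores(subscales: Dict[str, int]) -> Dict[str, int]:
    """Derive the compound SDQ scales from the subscale scores.

    The prosocial subscale represents strengths and is deliberately
    excluded from every compound scale.
    """
    total = sum(subscales[name] for name in DIFFICULTY_SUBSCALES)
    return {
        "total_difficulties": total,  # range 0-40
        "internalising": subscales["emotional"] + subscales["peer"],  # range 0-20
        "externalising": subscales["conduct"] + subscales["hyperactivity"],  # range 0-20
    }


example = {"emotional": 6, "conduct": 3, "hyperactivity": 8, "peer": 2, "prosocial": 7}
print(compound_scores(example))
# {'total_difficulties': 19, 'internalising': 8, 'externalising': 11}
```

By construction, the internalising and externalising scores always add up to the total difficulties score.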
Initially, the SDQ was suggested as a screening instrument in community screening programmes to potentially increase the detection of child psychiatric disorders, thereby improving access to effective treatments [13]. However, over the years, the SDQ has been used more and more in clinical settings as a measure of child psychopathology and in other types of research, e.g., aetiological, longitudinal, and service evaluation studies [11]. In the UK, parent and teacher versions of the SDQ demonstrated good validity, not only in discriminating between children with diagnosed mental disorders and a representative community sample [13] but also in identifying different categories of disorders within the clinical sample [14]. The parent and teacher SDQs proved to be valid and helpful questionnaires for use in the framework of a multi-dimensional behavioural assessment and appeared to be well suited for screening purposes, longitudinal monitoring of therapeutic effects, and scientific research purposes [15]. The findings by Hall et al. indicate that the SDQ is a valid outcome measure for use in RCTs and clinical settings [16].
There is a vast body of research on the internal consistency and reliability of the subscales of the SDQ in different contexts and populations with mixed results. In some studies, the SDQ exhibited strong internal consistency [17], but there are also studies expressing concerns regarding the reliability of the subscales, with most subscales showing only satisfactory or low internal consistency [18].
For younger age groups, it has been suggested that only the total difficulties score be used for screening, because some subscales are unreliable [19]. In their systematic review, Kersten et al. suggested that the assessment of a pre-schooler should not rely on a single informant, because agreement between informants is only moderate [20].
Many studies have examined the psychometric properties of the Strengths and Difficulties Questionnaire in different cultures. The Chinese version of the SDQ showed high reliability and validity, indicating that the SDQ is appropriate for assessing psychopathology in Chinese adolescents [18]. The parent- and self-report versions of the SDQ showed good concurrent validity and psychometric properties in a Dutch community sample [21]. A Swedish study concluded that its results favoured using the Swedish SDQ-S as a screening instrument for adolescents, despite the low internal consistency of some of its subscales [22]. The usefulness of the UK-based SDQ scoring algorithm for detecting mental health disorders among Norwegian patients was only partly supported; the instrument appeared best suited to identifying children and adolescents who do not require further psychiatric evaluation [23].
The SDQ has been routinely used as a screening tool in Latvian child and adolescent mental health services because it is one of the few screening tools that have been translated into and adapted for Latvian (in 2014), and it is freely available for clinical use from the developer's website [24]. Some data exist on the validity of the Latvian version of the SDQ in population samples, and it has been used in comparative research [25]. In a general school-based sample of 8–10-year-old children (n = 269), Cronbach's alphas for the Latvian parent-report SDQ subscales ranged from 0.49 to 0.70 [26]. In another recent study of 3–6-year-old children (n = 507), the reported internal consistency of the SDQ subscales ranged from 0.72 to 0.80 [27]. However, the validity and utility of the SDQ in clinical samples of Latvian children and adolescents have never been evaluated. Nor are population-specific cut-off scores available to distinguish normal from abnormal SDQ results, so Latvian clinical practice uses the original UK-based scoring algorithm, a practice that previous studies in other countries have shown to be potentially problematic [23].
This study aimed to examine whether the SDQ, with the original UK-based scoring algorithm applied as it is currently used in outpatient child and adolescent psychiatry settings in Latvia, can reliably detect children and adolescents with emotional disorders, conduct disorder, hyperactivity, or developmental disorders. It also aimed to examine the sensitivity, specificity, and other predictive properties of this screening instrument in a sample of Latvian children and adolescents with clinically established mental health diagnoses.

Participants
The study sample consisted of children and adolescents aged 2–17 years who received outpatient psychiatric care in two outpatient psychiatry centres in Latvia from November 2019 to October 2020. For all participants, screening data were collected from a parent before the first psychiatric appointment. The first study centre was the Child Psychiatry Clinic of the Children's Clinical University Hospital. It is located in the capital, Riga, and provides secondary and tertiary inpatient and outpatient psychiatric care to paediatric patients from all over the country, but mainly from the metropolitan region. There, screening was part of routine clinical practice, and the SDQ screening data were available for retrospective analysis in the patients' medical records. The second study centre was Hospital "Gintermuiza", a specialised psychiatric hospital in Jelgava (Latvia's fourth-largest city) that provides secondary inpatient and outpatient psychiatric care to children and adults, mostly from the rural Zemgale region. There, the screening procedure was introduced for the purposes of this study, so written informed consent was obtained from participants before inclusion.
The second centre was included to make the clinical study sample, despite being a convenience sample, as representative as possible of the types of outpatient child mental health services in Latvia and of the clinical help-seeking populations they serve. Further analysis showed no significant differences in the performance of the SDQ between centres, so centre-specific data are not reported separately in the Results section.

Questionnaires
The paper-based Latvian version of the parent-report SDQ was used. The questionnaire consists of 25 items covering five subscales: emotional difficulties, peer problems, hyperactivity and inattention, conduct problems, and prosocial behaviour. Each item is rated on a 3-point scale (0 = not true, 1 = somewhat true, 2 = certainly true). Subscale scores were calculated according to the SDQ manual by summing the item scores for each subscale. Subscales with at least one missing item were excluded from the analysis.
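The scoring rule described above sums five items rated 0–2 per subscale and excludes any subscale with a missing item. It can be sketched as follows. The sketch assumes that reverse-scored items have already been recoded as specified in the SDQ manual. The function name is hypothetical.

```python
from typing import Optional, Sequence

VALID_ITEM_SCORES = {0, 1, 2}  # 0 = not true, 1 = somewhat true, 2 = certainly true
ITEMS_PER_SUBSCALE = 5


def subscale_score(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum the five item scores of one SDQ subscale.

    Returns None when any item is missing, mirroring the rule that a
    subscale with at least one missing item is excluded. Items are assumed
    to be recoded already, so every value points in the scale's direction.
    """
    if len(items) != ITEMS_PER_SUBSCALE:
        raise ValueError(f"expected {ITEMS_PER_SUBSCALE} items, got {len(items)}")
    if any(item is None for item in items):
        return None  # subscale excluded from the analysis
    if any(item not in VALID_ITEM_SCORES for item in items):
        raise ValueError(f"item scores must be 0, 1 or 2: {list(items)}")
    return sum(items)


print(subscale_score([2, 1, 0, 2, 1]))     # 6
print(subscale_score([2, None, 0, 2, 1]))  # None
```

Returning None rather than imputing a value keeps the complete-case rule explicit. Any later step has to handle excluded subscales deliberately.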

Diagnostic Algorithm
A positive screening result was defined using the UK-based scoring algorithm developed by the author of the screening instrument from a large UK community sample [18]. A result was considered positive when it fell into the "high" or "very high" band of difficulties, which in the original UK population sample corresponded to scores above the 90th percentile. In previous research in clinical samples, both by Goodman in the UK population [14] and by researchers in other countries [28], defining "caseness" with this algorithm appeared more clinically relevant than choosing a lower (above the 80th percentile) or higher (above the 95th percentile) diagnostic threshold.
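The dichotomisation described above can be sketched as a simple threshold rule. The sketch rests on several assumptions. The actual band boundaries must be taken from the official SDQ scoring documentation for the informant and age version in use, and the threshold values below are placeholders for illustration only, not the published UK cut-offs. For the prosocial scale, the abnormal tail lies at the low end, so the comparison is reversed. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandCutoff:
    """Lower boundary of the abnormal band for one SDQ scale (hypothetical structure).

    For difficulty scales, scores >= threshold fall into the "high" or
    "very high" bands. For the prosocial scale (low_is_abnormal=True),
    scores <= threshold fall into the "low" or "very low" bands.
    """
    threshold: int
    low_is_abnormal: bool = False


def screens_positive(score: int, cutoff: BandCutoff) -> bool:
    """Return True if the score lies in the abnormal tail defined by the cut-off."""
    if cutoff.low_is_abnormal:
        return score <= cutoff.threshold
    return score >= cutoff.threshold


# Placeholder thresholds for illustration only; replace them with the
# published band boundaries for the informant and age version in use.
difficulty_cutoff = BandCutoff(threshold=7)
prosocial_cutoff = BandCutoff(threshold=5, low_is_abnormal=True)

print(screens_positive(8, difficulty_cutoff))  # True
print(screens_positive(8, prosocial_cutoff))   # False
```

Keeping the direction of the comparison in the cut-off object avoids scattering special cases for the prosocial scale through the analysis code.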

Clinical Diagnoses
Clinical psychiatric diagnoses were established by a board-certified child and adolescent psychiatrist, with the involvement of other members of the multidisciplinary outpatient team (e.g., a clinical psychologist). Diagnoses were based on a thorough clinical psychiatric examination of the child and detailed anamnestic information from multiple informants. A psychiatric interview was conducted with all study participants. For the purposes of the study, the clinician performing the diagnostic evaluation was blinded to the screening results.
Clinical diagnoses were established and formulated according to the ICD-10 diagnostic guidelines. For the purposes of this study, the clinical diagnoses were further grouped into broader diagnostic categories according to the relevance of the SDQ subscales for capturing particular psychopathological phenomena, as reported by Goodman [13,14]. The category of "any emotional disorder" included any mood disorder (F3 category in ICD-10), neurotic, stress-related and somatoform disorders (F4), and emotional disorders with onset specific to childhood (F93). The category of "any conduct disorder" included conduct disorder (F91) and mixed disorders of conduct and emotions (F92). The "hyperkinetic disorder" category was identical to the corresponding ICD-10 category (F90). The "any conduct disorder" and "hyperkinetic disorder" groups were then merged to form a broader "externalising disorder" group. The category of "any developmental disability" included mental retardation (F7), mixed specific developmental disorder (F83), and pervasive developmental disorder (F84).
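The grouping rules above can be expressed as a small lookup table. This is a sketch under our own assumptions. Codes are matched by prefix, so "F4" covers F40–F48 and "F7" covers F70–F79. A patient with comorbid diagnoses may belong to several groups. The function and group-key names are hypothetical.

```python
from typing import Iterable, Set

# Groupings as described in the text, matched by ICD-10 code prefix.
GROUP_PREFIXES = {
    "any emotional disorder": ("F3", "F4", "F93"),
    "any conduct disorder": ("F91", "F92"),
    "hyperkinetic disorder": ("F90",),
    "any developmental disability": ("F7", "F83", "F84"),
}
EXTERNALISING_COMPONENTS = {"any conduct disorder", "hyperkinetic disorder"}


def diagnostic_groups(icd10_codes: Iterable[str]) -> Set[str]:
    """Map a patient's ICD-10 codes to the study's broad diagnostic groups."""
    groups = set()
    for code in icd10_codes:
        normalised = code.strip().upper()
        for group, prefixes in GROUP_PREFIXES.items():
            if normalised.startswith(prefixes):  # str.startswith accepts a tuple
                groups.add(group)
    # Conduct and hyperkinetic disorders are merged into a broader group.
    if groups & EXTERNALISING_COMPONENTS:
        groups.add("externalising disorder")
    return groups


print(sorted(diagnostic_groups(["F90.0", "F32.1"])))
# ['any emotional disorder', 'externalising disorder', 'hyperkinetic disorder']
```

Codes outside the listed prefixes, such as F80, map to no group. Whether such patients were retained in the sample is not specified here.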

Statistical Analyses
Statistical analysis was performed with IBM SPSS Statistics, version 26 (IBM Corp., Armonk, NY, USA). Cronbach's alpha coefficients were computed to evaluate the internal consistency of the SDQ subscales and compound scales. Chi-square tests were used to examine the associations between dichotomous screening results and clinical diagnoses. Screening efficiency was described in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LHR+), negative likelihood ratio (LHR−), and diagnostic odds ratio (OR_D). Confidence intervals (95%) were calculated for the sensitivity and specificity estimates.
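Cronbach's alpha, used above to assess internal consistency, can be computed from the item variances and the variance of the total score. The following is a minimal pure-Python sketch of the standard formula, not the SPSS implementation the authors used. It assumes complete responses arranged as a respondents-by-items matrix. The function name is illustrative.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix with no missing values.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances are used throughout; the ratio is unaffected as long as
    the same variance definition is applied to items and totals.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Three respondents answering three items identically: perfect consistency.
print(cronbach_alpha([[0, 0, 0], [1, 1, 1], [2, 2, 2]]))  # 1.0
```

A subscale of five SDQ items would be passed as rows of five recoded item scores per respondent. Respondents with missing items are assumed to be removed beforehand.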
Sensitivity and specificity describe the accuracy of a screening test. For clinical practice, however, PPV and NPV are often more relevant. They represent the probability that a positive or negative screening result correctly reflects the presence or absence of a diagnosis and, unlike sensitivity and specificity, depend on the prevalence of the disorder in the screened population. Likelihood ratios are ratios of probabilities that show how much a positive or negative screening result changes the odds of a patient being diagnosed with a particular disorder. OR_D is a single summary measure of the discriminative ability of a screening test. To interpret the results, we used the criteria for clinical usefulness described by Fischer and colleagues. According to these criteria, a test has the potential to alter clinical decisions if it has an LHR+ > 7 and an LHR− < 0.3, or an OR_D > 20 [29].
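The metrics defined above follow directly from a 2×2 table of screening result against clinical diagnosis. The sketch below shows the formulas and the usefulness check of Fischer and colleagues as quoted in the text. It uses hypothetical counts rather than study data. The names are illustrative, and all four cells are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """2x2 cross-tabulation of screening result against clinical diagnosis."""
    tp: int  # screen positive, diagnosis present
    fp: int  # screen positive, diagnosis absent
    fn: int  # screen negative, diagnosis present
    tn: int  # screen negative, diagnosis absent


def screening_metrics(c: ScreeningCounts) -> dict:
    """Compute screening efficiency statistics (assumes no empty cells)."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,  # algebraically equal to (tp * tn) / (fp * fn)
    }


def potentially_useful(m: dict) -> bool:
    """Usefulness thresholds of Fischer and colleagues as quoted in the text."""
    return (m["lr_pos"] > 7 and m["lr_neg"] < 0.3) or m["dor"] > 20


metrics = screening_metrics(ScreeningCounts(tp=40, fp=30, fn=10, tn=120))
print(potentially_useful(metrics))  # False
```

In this hypothetical table, sensitivity and specificity are both 80%. The test still yields only LHR+ = 4 and OR_D = 16, which falls short of the thresholds. This shows how demanding the criteria are.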

Results
A total of 314 patients (86.5%) had received clinical diagnoses falling into only one of the diagnostic groups described above. A further 38 patients (10.5%) had comorbid diagnoses falling into two different groups, one patient (0.3%) into three groups, and one patient (0.3%) into four groups.
The internal consistency statistics of the five parent-report SDQ subscales and the compound scales are presented in Table 1. The emotional problems, hyperactivity, and prosocial subscales, as well as the externalising and total difficulties scales, demonstrated acceptable internal consistency (Cronbach's alpha > 0.7). The conduct problems subscale and the internalising difficulties scale came close to the acceptable level. The peer problems subscale was the only SDQ scale with poor internal consistency.
The results of the parent-report SDQ screening are summarised in Table 2. As expected in a clinical help-seeking sample, the screening results show much higher levels of reported psychopathology than are usually found in population cohorts. The proportion of positive screening results on the individual SDQ subscales ranged from 31.8% to 60.2%, and 60.9% of patients screened positive on the total difficulties scale. Dichotomous screening results could not be calculated for the compound internalising and externalising scales, because Goodman reported no cut-off values for these scales in the original UK sample.
Table 3 presents the screening efficiency of the parent-report version of the SDQ. A diagnosis of any emotional disorder was significantly associated with a positive screening result on the emotional problems subscale and with negative screening results on the peer problems and hyperactivity subscales. Any conduct disorder, hyperkinetic disorder, and externalising disorder were significantly associated with positive screening results on the hyperactivity, conduct problems, and total difficulties scales. Hyperkinetic disorder was also associated with a negative screening result on the emotional problems subscale. Developmental disability was significantly associated with a positive screening result on the peer problems subscale and with a low or very low score on the prosocial scale.
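The associations described above were tested with chi-square analysis of dichotomous screening results against diagnoses. A Pearson chi-square test for a single 2×2 table can be sketched in pure Python, although this is not the SPSS procedure the authors used. The text does not state whether a continuity correction was applied, and this sketch omits one. The function name and the counts in the usage line are hypothetical.

```python
import math


def chi_square_2x2(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Pearson chi-square test (df = 1, no continuity correction) for a 2x2 table.

    Rows: screen positive / screen negative; columns: diagnosis present / absent.
    Assumes all row and column totals are non-zero. Returns (statistic, p_value).
    """
    n = tp + fp + fn + tn
    observed = [[tp, fp], [fn, tn]]
    row_totals = [tp + fp, fn + tn]
    col_totals = [tp + fn, fp + tn]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


stat, p = chi_square_2x2(tp=40, fp=30, fn=10, tn=120)  # hypothetical counts
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

The closed-form p-value works because a chi-square variable with one degree of freedom is the square of a standard normal variable. This avoids any dependency on a statistics library.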
The sensitivity and specificity of the relevant parent-report SDQ subscales were, respectively, 67% (95% CI 57–77%) and 57% (95% CI 50–64%) for any emotional disorder; 78% (95% CI 67–89%) and 57% (95% CI 50–64%) for any conduct disorder; 65% (95% CI 55–75%) and 78% (95% CI 73–83%) for hyperkinetic disorder; and 72% (95% CI 63–81%) and 44% (95% CI 36–52%) for developmental disability.
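The intervals reported above are symmetric around their point estimates. This would be consistent with a normal-approximation (Wald) interval, although the text does not state which method was used. Under that assumption, a 95% interval can be sketched as follows. The function name is hypothetical, and the counts in the usage line are illustrative, not study data.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple:
    """Normal-approximation (Wald) confidence interval for a proportion.

    For sensitivity, successes = true positives and total = all patients with
    the diagnosis; for specificity, successes = true negatives and total = all
    patients without it. Bounds are clipped to [0, 1].
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


print(tuple(round(bound, 2) for bound in wald_ci(40, 50)))  # (0.69, 0.91)
```

Wald intervals can be inaccurate for small groups or for proportions close to 0 or 1. The Wilson score interval is a common alternative in those cases.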
Overall, none of the SDQ subscales reached the range of potential usefulness for clinical decision-making defined by the LHR+, LHR−, and OR_D criteria (LHR+ > 7 and LHR− < 0.3, or OR_D > 20).
For example, a positive screening result on the parent-report SDQ hyperactivity scale had an LHR+ of 2.95. A positive result was therefore 2.95 times more likely in children diagnosed with hyperkinetic disorder than in children without it, which multiplies the pre-test odds of the diagnosis by 2.95. Conversely, a negative screening result multiplied the odds of receiving the diagnosis by 0.45 (LHR−), reducing them by about 55%. Overall, the odds of being diagnosed with hyperkinetic disorder after a positive screening result were 6.56 times the odds after a negative result (OR_D). This difference is not large enough to substantially influence clinical decisions in this patient population, which has a high prevalence of psychopathology [28,29].
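The likelihood ratios and diagnostic odds ratio quoted above follow from the hyperkinetic disorder sensitivity (65%) and specificity (78%) reported earlier. The sketch below reproduces that arithmetic. As an illustration, it also applies LHR+ to a hypothetical pre-test probability of 30% (not a study figure), using Bayes' theorem in odds form. The function names are illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LHR+, LHR-, OR_D) computed from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hyperkinetic disorder: sensitivity 65%, specificity 78% (as reported in the text).
lr_pos, lr_neg, dor = likelihood_ratios(0.65, 0.78)
print(f"LHR+ = {lr_pos:.2f}, LHR- = {lr_neg:.2f}, OR_D = {dor:.2f}")
# LHR+ = 2.95, LHR- = 0.45, OR_D = 6.58

# Hypothetical pre-test probability of 30% (illustrative, not a study figure).
print(f"{post_test_probability(0.30, lr_pos):.2f}")  # 0.56
```

Computed from the rounded sensitivity and specificity, OR_D comes out at about 6.58. Dividing the rounded likelihood ratios (2.95 / 0.45) gives the reported 6.56, so the small difference presumably reflects rounding.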

Discussion
The use of screening procedures is a widespread and potentially beneficial practice in mental health services worldwide [9]. The benefits of screening might be even greater in lower-resource settings, as triaging the patient flow allows limited resources to be used more effectively [30]. However, as in other areas of medicine, screening can only be beneficial if the instruments used are "fit for purpose". Unfortunately, most research on the predictive properties of available mental health screening instruments in real-life clinical practice has been done in high-resource settings and in large countries, where these instruments are usually developed. There is little evidence on their clinical utility in lower-resource settings and smaller countries [28].
The primary purpose of our study was to evaluate the clinical utility of the parent-report version of the SDQ as it is currently used in Latvian child and adolescent mental healthcare settings, with the original UK-based scoring algorithm developed by the author of the screening instrument.
The level of psychopathology detected by the SDQ with the same UK-based scoring algorithm in our clinical child and adolescent sample appears comparable to that in clinical samples from other countries. The proportion of Latvian children screening positive on the parent-report SDQ for emotional difficulties (49%) was higher than reported in Norway (25%) and the UK (33%) but slightly lower than in Bangladesh (55%). In contrast, for conduct problems, our result (47.9%) was considerably lower than in the UK (70%) but slightly higher than in Norway (40%) or Bangladesh (34%). For hyperactivity, the Latvian result (31.8%) was similar to Norway (30%), slightly lower than Bangladesh (37%), and considerably lower than the UK (46%) [14,28].
The cross-cultural differences in the level of psychopathology detected by the parent-report SDQ are likely explained less by actual differences in the level of psychopathology in the population than by two other factors. The first is differences in the psychometric functioning of the screening instrument across cultural settings; the cross-cultural validity of the SDQ cannot be safely assumed in the absence of proper psychometric data and country-specific norms [30][31][32]. The second is differences in how local mental healthcare systems function, e.g., possible selection bias in forming the clinical sample within a specific mental healthcare system.
In our clinical sample, all the SDQ scales and subscales showed reasonable internal consistency, except the peer problems subscale, which demonstrated poor internal consistency. In their systematic review of the psychometric properties of the SDQ, Kersten et al. synthesised internal consistency estimates from 26 studies. They found weighted average Cronbach's alphas for the parent-report SDQ of 0.49 for the peer problems subscale, 0.56 for conduct problems, 0.62 to 0.69 for the other subscales, and 0.76 for the total difficulties scale, a pattern similar to the one found in our clinical data [20]. Interestingly, a similar pattern of internal consistency was recently reported for the adolescent self-report version of the SDQ. In a comparative study of datasets from seven countries (Bulgaria, Germany, Greece, the Netherlands, Poland, Romania, and Slovenia), Duinhof et al. reported Cronbach's alphas for the peer problems subscale ranging from 0.55 to 0.65. These were lower than for the other SDQ subscales, which all reached acceptable levels of internal consistency (above 0.7) [31]. These findings may indicate that the factor structure of the parent-report version of the SDQ in non-UK populations differs from the original factor structure of the screening instrument, with the peer problems subscale items being the most problematic for cross-cultural application.
The sensitivity, specificity, and other predictive properties of the parent-report SDQ in the Latvian clinical sample are similar to performance estimates obtained with the original UK-based scoring algorithm in clinical samples of children and adolescents in other countries [14,33]. In a study of looked-after children in the UK, Goodman et al. found a sensitivity of 60.4% for conduct disorder, 78.6% for hyperkinetic disorder, and 64.3% for any anxiety or depressive disorder [34]. These figures are similar to our findings and higher than the sensitivity and specificity reported by Goodman for single-informant screening in the community sample [14]. In a study by Brøndbo et [23]. These findings are somewhat better than the ones in our sample. Still, as in our study, the single-informant parent-report version of the SDQ in Norway did not reach a level of clinical utility sufficient to warrant its use as a screening tool in a clinical sample of children and adolescents with a high prevalence of psychopathology.

Strengths and Limitations
Although the SDQ is a highly studied and widely used screening instrument in both research and clinical practice, this is one of the few studies to examine not only the psychometric and predictive properties of the scale in a particular population but also its clinical utility in real-life practice. The clinicians who established the clinical diagnoses were blinded to the screening results. This prevented the results from being biased toward better agreement between screening and clinical diagnoses. The study was conducted in a sufficiently large group of patients to support inferences about the scale's psychometric properties, with a subject-to-item ratio of 14.5 (363 respondents for 25 items).
This study has several limitations. The study sample was a convenience sample of first-time psychiatric outpatients, so there is a potential for selection bias. Nevertheless, the patient population included in the study could be regarded as representative of day-to-day clinical practice in Latvia and is ecologically valid. Another major limitation is the use of a clinical psychiatric diagnosis as the study's assumed "gold standard". For example, in our sample, the hyperactivity and conduct problems subscales of the SDQ appeared very similar in their ability to predict diagnoses of both hyperkinetic disorder and conduct disorder. This may reflect limited discriminant ability of the subscales, but it also likely indicates that physicians in Latvian clinical child psychiatry practice tend to use these diagnoses interchangeably. They may not distinguish between conduct problems with and without an underlying attention deficit hyperactivity disorder neurophenotype. The use of the UK-based scoring algorithm to define "positive" screening results can also be regarded as a limitation, although it is the only viable option in the absence of local population-based norms. The same applies to the use of a single-informant screening protocol. Both the predictive properties and the clinical utility of the SDQ could potentially be improved by a multi-informant screening protocol (combining the parent-, teacher-, and self-report SDQ), as suggested by the author of the screening instrument [13,14,34].

Conclusions
This study is the first examination of the Latvian parent-report version of the SDQ performed in a clinical population sample to date. Its findings have the potential to influence the way this screening instrument is used in Latvian clinical practice. Our findings illustrate the need to assess not only the psychometric properties of a scale but also its clinical utility when making a decision to introduce a screening procedure to clinical practice. The findings of this study also add to the ongoing discussion on the cross-cultural applicability of this psychometric instrument [23,31,32].
Our study suggests that the parent-report version of the SDQ, as it is currently used in Latvian child and adolescent mental healthcare practice, has sufficient internal consistency, and the relevant subscales of the SDQ significantly correlate with the clinical diagnoses of emotional disorders, conduct disorder, hyperactivity, and developmental disability in a clinical sample.
However, the predictive properties and performance of the scale in the Latvian clinical population suggest that it might be more suitable for use in less clinically saturated samples. Examples include population-based research or screening by family physicians in primary healthcare settings to help decide whether to refer a child to a specialised child mental health service for further evaluation.
Based on the results of our analysis, we suggest using an aggregated externalising difficulties score to screen for any externalising disorder rather than using hyperactivity or conduct problem scores separately.
The performance of the parent-report SDQ could be further improved by establishing country-specific normative thresholds based on data from a larger general-population cohort. Further steps would be to investigate the factor structure and psychometric properties of the screening instrument and possibly to assess its predictive properties in a primary healthcare setting.
Author Contributions: Ņ.B. and A.V. contributed to the conception and design of the study. A.K. was responsible for data collection. Ņ.B. and A.K. analysed the data and drafted the manuscript; E.R. and A.V. participated in the data interpretation and critically reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Rīga Stradiņš University (protocol Nr.85/21.12.2017).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study at Hospital "Gintermuiza". The study data from the Children's Clinical University Hospital study site were collected retrospectively from anonymised patient records and did not require separate patient consent forms.

Data Availability Statement:
The full study protocol and datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.