Diagnostic Accuracy of Digital Solutions for Screening for Cognitive Impairment: A Systematic Review and Meta-Analysis

The early detection of cognitive impairment is essential in order to initiate interventions and guarantee access to healthcare services. Digital solutions are emerging in the literature as an alternative approach to cognitive screening. Our primary goal is to synthesize the evidence on digital solutions’ diagnostic ability to screen for cognitive impairment and their accuracy. A secondary goal is to distinguish whether the ability to screen for cognitive impairment varies as a function of the type of digital solution: paper-based or innovative digital solutions. A systematic review and meta-analysis of digital solutions’ diagnostic accuracy were conducted, including 25 studies. Digital solutions presented a variable diagnostic accuracy range. Innovative digital solutions offered at least 0.78 of sensitivity but showed lower specificity levels than the other subgroup. Paper-based digital solutions revealed at least 0.72 of specificity, but sensitivity started at 0.49. Most digital solutions do not demand the presence of a trained professional and include an automatic digital screening system and scoring, which can enhance cognitive screening and monitoring. Digital solutions can potentially be used for cognitive screening in the community and clinical practice, but more investigation is needed for an evidence-based decision. A careful assessment of the accuracy levels and quality of evidence of each digital solution is recommended.


Introduction
There are several known risk factors (e.g., diabetes, hypertension, hypercholesterolemia, depression, physical frailty, a low education level, or a low social support level) contributing to neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, Huntington's disease, or frontotemporal dementia, but aging is the strongest one [1][2][3][4][5][6]. Therefore, the prevalence of these diseases increases as our society ages.
Mild cognitive impairment (MCI), an intermediate stage between normal aging and dementia, is characterized by an objective cognitive decline in one or more cognitive domains (e.g., memory, attention, language, or executive function) without any significant impairment in daily-life activities [7], and may be associated with a variety of underlying causes, including Alzheimer's pathophysiology [7][8][9]. In turn, dementia is a major neurocognitive disorder that is characterized by a significant decline in one or more cognitive domains that interferes with a person's independence in daily activities [10]. Although there is evidence that patients with MCI may experience reversion to cognitive normality [10,11], there is a high probability that this condition will progress to dementia. Therefore, early detection of MCI is critical to effectively initiate the intervention (including counseling, psychoeducation, cognitive training, and medication [12]) and to guarantee both patients and relatives access to relevant healthcare services [13]. However, MCI is frequently misdiagnosed due to a diverse set of barriers, namely the high prevalence of comorbidities among older adults, a lack of expertise or limited confidence of the practitioners, the short duration of most primary care visits, limitations of the assessment instruments, or the inadequacy of electronic health record systems in terms of the integration of cognitive assessments, which limits the ability to track an individual's cognitive function over time [7].
Despite these barriers, there are many screening tests that provide a quick evaluation of cognitive and functional aspects. At present, two of the most well-known cognitive screening tests are the Mini-Mental State Examination (MMSE) [14] and the Montreal Cognitive Assessment (MoCA) [15], which include tasks for assessing multiple cognitive domains. In addition to MMSE and MoCA, other currently available cognitive tests also encompass multiple cognitive domains, including the Neuropsychiatry Unit Cognitive Assessment Tool (NUCOG) [16], the Saint Louis University Mental Status examination (SLUMS) [17], the Self-Administered Gerocognitive Examination (SAGE) [18], or Addenbrooke's Cognitive Examination III (ACE-III) [19]. In turn, screening tests such as the Alzheimer Quick Test (AQT) [20], Scenery Picture Memory Test (SPMT) [21], Memory Impairment Screen (MIS) [22], Mini-Cog [23,24], or Clock Drawing [25] measure one or two cognitive domains (i.e., attention for the AQT, episodic memory for the SPMT, memory and orientation for the MIS, memory and visuospatial abilities for the Mini-Cog, or executive functions and visuospatial abilities for the Clock Drawing), but require less than five minutes to be applied [26].
Computerized solutions to support neuropsychological tests have existed for several decades and might use different types of interaction devices, be it computers, handheld devices, or virtual reality [27]. Some solutions offer adaptations of paper-based tests to evaluate specific cognitive domains [28] (e.g., the Trail-Making Test or Simple and Complex Reaction Time) or multiple cognitive domains [27] (e.g., MoCA [29], MMSE [30], or SAGE [31]), while other solutions (e.g., Memoro [32], the NutriNet-Santé Cognitive Test Battery (NutriCog) [33], or the Cambridge Neuropsychological Test Automated Battery (CANTAB) [34]) were specifically developed to be applied using electronic means.
In recent years, several innovative solutions have been developed for diagnosing, monitoring (e.g., artificial intelligence applied to radiomics analysis [35]), and managing cognitive impairment (e.g., digital solutions to support self-management in older people with cognitive impairment [36]). Furthermore, the scientific literature reports the development of new instruments that are able to monitor individuals in their residential environments without the presence of a health professional [27]. This possibility maximizes flexibility and widens people's access to cognitive assessment at lower costs [27]. In this respect, smart devices (e.g., smartphones, smartwatches, or smart-home devices) may collect data on individuals' habits and patterns, which can be analyzed to detect subtle changes that may indicate a decline in cognitive performance [37]. Moreover, serious games and virtual reality are alternative approaches to cognitive screening and may also reduce feelings of test anxiety [37,38].
The research question to be addressed in this systematic review and meta-analysis is: how accurate are digital solutions for detecting both the presence and absence of cognitive impairment in individuals aged 18 years and over? The primary goal of the current review is to synthesize the evidence on digital solutions' diagnostic ability to screen for cognitive impairment and their accuracy. A secondary goal is to distinguish whether the ability to screen for cognitive impairment varies as a function of the type of digital solution: (1) based, in essence, on pre-existing traditional tests, named paper-and-pencil tests (abbreviated as paper-based digital solutions throughout this article); (2) developed from their inception to be applied by electronic means (abbreviated as innovative digital solutions throughout this article).

Protocol Registration
This systematic review was conducted considering the recommendations of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [39], and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [40]. The protocol was registered in PROSPERO [41] on 2 November 2021 (CRD42021282993).

Search Strategy and Study Eligibility Criteria
The search was performed in Scopus, Web of Science, and PubMed in September 2022. Databases were searched from inception to August 2022 using the following Boolean expression: ('cognitive screening' OR 'cognitive test' OR 'memory screening' OR 'memory test' OR 'attention screening' OR 'attention test') AND ('computer' OR 'game' OR 'gaming' OR 'virtual' OR 'online' OR 'internet' OR 'mobile' OR 'app' OR 'digital').
To be included in this review, studies had to: (i) focus on any digital solution (the index test, i.e., the new or alternative test whose accuracy is being evaluated against a reference standard, i.e., the test against which the index test is being compared and that is considered a "gold standard" [42]) that can be used as a generic community-based screening tool for cognitive impairment, and that was self-administered, i.e., performed independently by the participant without a professional conducting the test [27]; (ii) include a sample of adults (≥18 years old) or older adults (≥65 years old); (iii) compare the digital solution with a reference standard (i.e., another instrument, a clinical assessment, or a combination of these); (iv) be written in English; (v) follow case-control, cross-sectional, or cohort designs that at some point allow for the identification of two groups (with and without cognitive impairment); (vi) report at least one diagnostic accuracy property, namely sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), or, alternatively, provide enough data to calculate these indicators. Studies that included participants who had any acute neurological condition or cognitive impairment, or who were institutionalized, were excluded. In addition, studies that reported on digital solutions used as a monitoring tool for patients with an existing cognitive impairment diagnosis were also excluded.

Study Selection Process and Data Extraction
All retrieved references were imported into the Mendeley Desktop software, Version 1.19.8, and checked for duplicates by one author (NPR). This author (NPR) screened the titles and abstracts of all citations according to the predefined study-selection criteria. Then, the full texts of potentially relevant articles were retrieved and independently assessed by two randomly chosen authors from a set of three authors (AGS, AIM, and NPR), to verify if the inclusion and exclusion criteria were met. If a consensus could not be reached between the two authors, the third author was consulted.
Data from included studies were extracted by two authors (AIM and MM) using an electronic form developed for this purpose. The extracted information was revised and discussed with the other two authors (AGS and NPR). The information extracted from each study was: the author(s) and year of publication; the sample sizes and characteristics (e.g., sex, age); the type and name of the digital solution (index test); the type and name of the reference standard test; and the diagnostic accuracy property (e.g., estimates of sensitivity and specificity). For each study, the information used to construct a two-by-two contingency table for each index test, including the number of True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) results, was also extracted. When these counts were missing from the study, the data needed (e.g., sample size, number of participants with the target condition, estimates of sensitivity and specificity, and estimates of PPV and NPV) were extracted.
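As a minimal sketch of how the four accuracy properties follow from a two-by-two contingency table, the Python function below computes them from TP, FP, TN, and FN counts. The function name and the example counts are hypothetical and are not taken from any included study.

```python
def accuracy_from_counts(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts."""

    def ratio(numerator, denominator):
        # The measure is undefined when its denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # impaired participants flagged
        "specificity": ratio(tn, tn + fp),  # unimpaired participants cleared
        "ppv": ratio(tp, tp + fp),          # positive results that are correct
        "npv": ratio(tn, tn + fn),          # negative results that are correct
    }


# Hypothetical counts, for illustration only.
example = accuracy_from_counts(tp=40, fp=15, tn=85, fn=10)
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of participants with the target condition in the sample, so they are not directly comparable across studies with different case mixes.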
The results presented in this review consider the best cut-off reported for each index test in each study for achieving the best diagnostic ability to screen for cognitive impairment.
If more than one index test result was presented (e.g., different thresholds), we chose the results given by the best cut-off reported, considering the reference standard test. Sensitivity and specificity depend on the cut-off value considered positive to identify the target condition (i.e., generally, the higher the sensitivity, the lower the specificity, and the higher the specificity, the lower the sensitivity) [43,44].
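The trade-off between sensitivity and specificity as the cut-off moves can be illustrated with a short sketch. The scores below are invented, lower scores are assumed to indicate worse performance, and Youden's J (sensitivity + specificity - 1) is used as one common way to pick a "best" cut-off; the review does not state which criterion the individual studies applied.

```python
def sens_spec_at_cutoff(scores_impaired, scores_unimpaired, cutoff):
    """Treat a score at or below `cutoff` as a positive screening result."""
    tp = sum(score <= cutoff for score in scores_impaired)
    tn = sum(score > cutoff for score in scores_unimpaired)
    return tp / len(scores_impaired), tn / len(scores_unimpaired)


# Hypothetical test scores (lower = worse performance); not study data.
impaired = [18, 19, 20, 21, 22, 23, 25, 27]
unimpaired = [22, 26, 26, 27, 28, 28, 29, 30]

# Sensitivity and specificity at each candidate cut-off from 20 to 28.
curve = {c: sens_spec_at_cutoff(impaired, unimpaired, c) for c in range(20, 29)}

# Assumed criterion: maximise Youden's J = sensitivity + specificity - 1.
best_cutoff = max(curve, key=lambda c: sum(curve[c]) - 1)
```

Raising the cut-off in this sketch flags more participants as positive, so sensitivity never decreases and specificity never increases, which is the pattern described above.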

Methodological Quality Assessment
Each manuscript was independently assessed by two randomly chosen authors from a set of three authors (AGS, AIM, and NPR). Disagreements were solved by consensus or discussion with the third author. The assessment of the eligible studies' methodological quality was performed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool (QUADAS-2). QUADAS-2 is a validated tool used to evaluate the quality of diagnostic accuracy studies [45], and comprises four domains: patient selection, index test, reference standard, and flow and timing. Each domain is assessed in terms of the risk of bias through signaling questions that can be answered with "yes", "no", or "unclear". The first three domains (i.e., patient selection, index test, and reference standard) are also assessed based on applicability concerns. Overall concerns about the risk of bias and applicability per domain are then rated as "high", "low", or "unclear". These results are defined for each domain [45]. A pilot test for the bias risk assessment was conducted, using studies that were not eligible for this review.

Quality of the Evidence
The overall quality (certainty) of the evidence for each meta-analysis was assessed independently by two authors (MM and AGS), according to the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [46,47]. The GRADE approach guided the assessment and the rating of the evidence's quality and confidence considering the domains of risk of bias, inconsistency, indirectness, and publication bias. For the publication bias assessment, additional statistical analyses were conducted, namely the test for funnel plot asymmetry (Deeks' test). The quality of the evidence was rated, based on the assessment of each domain, as "high", "moderate", "low", or "very low".
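For readers unfamiliar with the funnel plot asymmetry test, the sketch below shows its usual construction: a regression of the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size. The function name, the 0.5 continuity correction applied to every cell, and the four example studies are assumptions for illustration; this is not the software or data used in the review, and the p-value step (a t-test on the slope) is omitted to keep the example dependency-free.

```python
import math


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), with weights = ESS.

    `studies` is a list of (tp, fp, fn, tn) tuples. Returns the slope and its
    t statistic; a slope far from zero suggests funnel plot asymmetry.
    """
    if len(studies) < 3:
        raise ValueError("at least three studies are needed")
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        # Assumed 0.5 continuity correction keeps the odds ratio finite.
        a, b, c, d = (value + 0.5 for value in (tp, fp, fn, tn))
        n_impaired, n_unimpaired = tp + fn, fp + tn
        ess = 4 * n_impaired * n_unimpaired / (n_impaired + n_unimpaired)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)
    total_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(studies) - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical (tp, fp, fn, tn) counts; the smallest study has the largest DOR.
hypothetical = [(45, 10, 5, 40), (90, 20, 10, 80), (18, 2, 2, 18), (180, 45, 20, 155)]
slope, t_stat = deeks_regression(hypothetical)
```

In this invented dataset the smallest study reports the most favourable odds ratio, so the fitted slope is positive; a slope close to zero would instead be consistent with a symmetric funnel plot.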

Data Analysis
For each study, a two-by-two contingency table was constructed, including the TP, FP, TN, and FN for the index tests. If these values were not reported in the manuscript, they were calculated from the data extracted from each study (sample size, number of participants with the target condition, estimates of sensitivity and specificity, or estimates of PPV and NPV), following the recommendations of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy for the calculation of TP, FP, TN, and FN results [48]. Approximations and rounding were made, if necessary. Calculations were double-checked and cross-checked against the accuracy measures presented in the study.
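The back-calculation described above can be sketched as follows for the case in which a study reports the sample size, the number of participants with the target condition, sensitivity, and specificity. The function name and the example study are hypothetical, and the rounding rule is an assumption rather than the authors' exact procedure.

```python
def counts_from_accuracy(n_total, n_impaired, sensitivity, specificity):
    """Approximate TP, FP, TN, and FN from reported summary measures."""
    n_unimpaired = n_total - n_impaired
    tp = round(sensitivity * n_impaired)    # rounded to a whole participant
    fn = n_impaired - tp
    tn = round(specificity * n_unimpaired)
    fp = n_unimpaired - tn
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical study: 200 participants, 60 of them with cognitive impairment.
table = counts_from_accuracy(200, 60, sensitivity=0.85, specificity=0.80)

# Cross-check: the reconstructed counts should reproduce the reported measures.
recovered_sens = table["TP"] / (table["TP"] + table["FN"])
recovered_spec = table["TN"] / (table["TN"] + table["FP"])
```

Because counts must be whole numbers, the recovered sensitivity and specificity can differ slightly from the reported values, which is why a cross-check against the published measures is useful.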
For the meta-analysis, we used hierarchical random-effects models and Receiver Operating Characteristic (ROC) analysis. Hierarchical Summary Receiver Operating Characteristics (HSROC) models were implemented for the estimation of a Summary Receiver Operating Characteristic (SROC) curve. This method provides information on the test performance, describing variations in sensitivity and specificity [43,49], considering the recommendations of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [39].
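For orientation, one widely used hierarchical random-effects formulation for diagnostic accuracy meta-analysis is the bivariate binomial model, which is closely related to the HSROC model. The notation below is introduced here for illustration only ($n_{1i}$ and $n_{0i}$ denote the numbers of participants with and without the target condition in study $i$); the exact parameterization fitted in this review may differ.

```latex
\begin{align}
TP_i &\sim \mathrm{Binomial}(n_{1i},\, Se_i), \qquad
TN_i \sim \mathrm{Binomial}(n_{0i},\, Sp_i), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
\begin{pmatrix} \sigma_{Se}^{2} & \sigma_{Se,Sp} \\ \sigma_{Se,Sp} & \sigma_{Sp}^{2} \end{pmatrix}
\right).
\end{align}
```

Under this formulation, the summary sensitivity and specificity are obtained by back-transforming $\mu_{Se}$ and $\mu_{Sp}$ from the logit scale, and the between-study covariance captures the tendency of sensitivity and specificity to trade off across studies.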
A SROC plot was developed, presenting the results of each study in the ROC space, the Summary ROC (SROC) curve, the summary estimates of sensitivity, and the false positive rate (1-specificity), with 95% confidence and 95% predictive regions.
A sensitivity analysis was performed by removing the studies of paper-based digital solutions and displaying the Summary ROC (SROC) curves, summary estimates of sensitivity, and the false positive rate (1-specificity), with 95% confidence and 95% predictive regions, in a SROC plot.
Data were subdivided into two subgroups considering the type of index test used: (i) paper-based digital solutions; (ii) innovative digital solutions. A SROC plot was also developed for each index test subgroup. The estimates of the summary points and confidence intervals (CI) for sensitivity and specificity were calculated.
To perform the meta-analysis, MetaDTA, a web application developed in R (R Core Team, Vienna) using Shiny, was used [50,51]. Among other features, MetaDTA allows the data obtained with the QUADAS-2 tool to be incorporated into the graphical representation.

Study Selection
The results of the search performed on databases are presented in Figure 1. A total of 8557 articles were identified. In the first step, 3452 duplicate articles, 311 reviews or surveys, 171 references without an abstract or without authors, 141 articles not written in English, and one retracted article were removed. After that, 4481 articles remained for screening based on the title and abstract. Of these, 4373 articles were excluded because they did not meet the outlined inclusion criteria, whereas 108 full-text articles were considered potentially eligible. Twenty-five studies were included in this systematic review according to the eligibility criteria (Figure 1).
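The selection counts reported above are internally consistent, as the following arithmetic check shows. The variable names are chosen here for illustration; the number of articles excluded at the full-text stage is derived from the reported figures rather than stated in the text.

```python
# Counts as reported in the study selection paragraph.
identified = 8557
removed_before_screening = {
    "duplicates": 3452,
    "reviews or surveys": 311,
    "no abstract or no authors": 171,
    "not written in English": 141,
    "retracted": 1,
}
excluded_on_title_abstract = 4373
included = 25

# Derived quantities.
screened = identified - sum(removed_before_screening.values())
full_text_assessed = screened - excluded_on_title_abstract
excluded_at_full_text = full_text_assessed - included  # derived, not reported
```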

Methodological Quality
The results of the QUADAS-2 assessment are summarized in Table 1 and displayed in Figure 2. The risk of bias in the flow and timing domain is low in 17 out of the 25 diagnostic accuracy studies evaluated. The risk of bias in the reference standard and index test domains is unclear in 16 out of the 25 studies. Eighteen studies present a high risk of bias in the patient-selection domain. Concerns about applicability for the patient-selection domain were rated as high in 12 studies, low in 12 studies, and unclear in 1 study. Concerns about applicability in the domains of the reference standard and the index test were rated as low for most of the studies. The exceptions were three studies [52][53][54] that scored high in the domain of the reference standard and one study [55] that scored high in the domain of the index test.

General Overview of Included Studies
The studies included in this review adopted distinct definitions of the target condition. Most of them (18 out of the 25) defined the target condition as Mild Cognitive Impairment (MCI), including two studies that used different terminologies, namely Subtle Cognitive Impairment (SCI) and Mild Cognitive Dementia (MCD). One study considered amnestic Mild Cognitive Impairment (aMCI) as the target condition. Three studies included MCI or Mild Impairment (MI) and other clinical conditions (e.g., MCI and Dementia, MCI and Mild Alzheimer's Disease, MI and Impairment). Three studies specified Cognitive Impairment (CI) as the target condition; one of them included a significant percentage of severe cases of dementia and was excluded from the meta-analysis [66], as the other studies did not have a substantial proportion of severe cases of dementia in their samples.
Twenty-three of the 25 included studies used distinct instruments and/or clinical assessment processes as reference standards.
The index tests differed across all included studies, except for two studies that used the MemTrax test (MTX) [67,72], and two studies that used the Brain Health Assessment test (BHA) [71,75]. Subgrouping the studies, there were 16 studies reporting on paper-based digital solutions and 9 studies reporting on innovative digital solutions. The index tests reporting on paper-based digital solutions varied from a direct transposition of the original paper-based test [62,63] to a substantial modification, including visual and visuospatial tasks [65] or the creation of a virtual environment [64]. The innovative digital solutions are also a very diverse subgroup, including solutions involving digital tasks [58,59], virtual reality and gaming [73], and artificial intelligence methods [55,76]. The characteristics of the included studies, the reference standard used, and the index tests' description can be found in the Supplementary Materials.

Meta-Analysis Results
The meta-analysis included 24 studies. The HSROC models for the estimation of a Summary ROC (SROC) curve project the results of each of the 24 studies in the ROC space, with the covariate of the index test subgroup, and display the summary estimates of sensitivity and the false positive rate (1-specificity), with 95% confidence and 95% predictive regions (Figure 3). The results indicate that sensitivity values for the index tests vary between 0.49 and 0.95 and the specificity values vary between 0.50 and 0.91. Innovative digital solutions presented values for sensitivity that vary between 0.78 and 0.94 and specificity values that vary between 0.50 and 0.90. The sensitivity of paper-based digital solutions varies between 0.49 and 0.95 and their specificity varies between 0.72 and 0.91.
The results of the sensitivity analysis of the original model (a random-effects meta-analysis of all digital solutions), when removing the studies of paper-based digital solutions (sensitivity analysis model), are displayed in Figure 4. The meta-analysis's accuracy estimates of the sensitivity, specificity, and false positive rate, with 95% confidence intervals (CI), are presented in Table 2. For the meta-analysis of all digital solutions, a low-quality estimate of the sensitivity is 0.79 (95% CI: 0.75-0.83), and a low-quality estimate of the specificity is 0.77 (95% CI: 0.73-0.81) (Table 3). For the meta-analysis of innovative digital solutions, a moderate-quality estimate of the sensitivity is 0.82 (95% CI: 0.79-0.86), and a low-quality estimate of the specificity is 0.73 (95% CI: 0.64-0.80); for the meta-analysis of paper-based digital solutions, a low-quality estimate of the sensitivity is 0.77 (95% CI: 0.70-0.83), and a moderate-quality estimate of the specificity is 0.78 (95% CI: 0.74-0.82) (Table 3).
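To help interpret the pooled estimates, the sketch below converts each pair of summary sensitivity and specificity values into positive and negative likelihood ratios. These ratios are not reported in the review; they are derived here from the point estimates above purely as an illustration, ignore the uncertainty and the low-to-moderate quality of the underlying estimates, and use a hypothetical function name.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled point estimates (sensitivity, specificity) as reported in the text.
pooled = {
    "all digital solutions": (0.79, 0.77),
    "innovative digital solutions": (0.82, 0.73),
    "paper-based digital solutions": (0.77, 0.78),
}

ratios = {name: likelihood_ratios(se, sp) for name, (se, sp) in pooled.items()}
```

A positive likelihood ratio expresses how much more likely a positive screening result is in people with cognitive impairment than in those without it, which makes the subgroup estimates easier to compare on a single scale.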
The results of each of the 24 studies in the ROC space with the quality assessment obtained using the QUADAS-2 tool, namely concerning the risk of bias and quality concerns, are present in the Supplementary Materials.

Specificity (all digital solutions), 24 studies
Risk of bias: Serious (downgraded one level). Seventeen studies (71%) suggested a high risk of bias in the patient selection, and sixteen studies (67%) presented an unclear risk of bias for the index test and reference standard domains, as assessed by the QUADAS-2.
Inconsistency: Serious (downgraded one level). High heterogeneity was identified based on the dissimilarity of the point estimates, the non-overlap of CIs in the forest plots, and the random-effects meta-analysis figure, where the 95% prediction region is much larger than the 95% confidence region.
Indirectness: Not serious. No critical differences were found between the populations studied and those for whom the recommendation is intended. Also, there is a low concern about the applicability for the index and reference test domains.
Imprecision: Not serious. The CI of the specificity summary point was considered sufficiently narrow not to demand downgrading.
Publication bias: None. A test for funnel plot asymmetry (Deeks' test) for the assessment of publication bias suggests that the publication bias is low (p-value = 0.78).
Quality of the evidence: Low

Sensitivity (innovative digital solutions), 9 studies (628 participants)
Risk of bias: Serious (downgraded one level). Six studies (67%) suggested a high risk of bias in the patient selection, eight studies (89%) presented an unclear risk of bias for the index test domain, and seven studies (78%) presented an unclear risk of bias for the reference standard domain, as assessed by the QUADAS-2.
Inconsistency: Not serious. Low heterogeneity was identified based on the similarity of the point estimates, the overlap of CIs in the forest plots, and the random-effects meta-analysis figure, where the 95% prediction region is close to the 95% confidence region.
Indirectness: Not serious. No critical differences were found between the populations studied and those for whom the recommendation is intended. Also, there is a low concern about applicability for the index and reference test domains.
Imprecision: Not serious. The CI of the sensitivity summary point was considered sufficiently narrow not to demand downgrading.
Publication bias: None. A test for funnel plot asymmetry (Deeks' test) for the assessment of publication bias suggests that the publication bias is low (p-value = 0.94).

Discussion
This systematic review assessed the diagnostic accuracy of digital solutions used for cognitive screening, further analyzing whether these were paper-based digital solutions or innovative digital solutions. There is low- to moderate-quality evidence suggesting that digital solutions are reasonably sensitive and specific to be used for cognitive impairment screening.
The index tests assessed were quite variable, with sensitivity levels varying between 0.49 and 0.95 and specificity levels between 0.50 and 0.91. The index tests classified as innovative digital solutions offered at least a sensitivity value of 0.78 but showed lower specificity levels than the other subgroup (between 0.50 and 0.90). The index tests classified as paper-based digital solutions revealed at least a specificity value of 0.72, but sensitivity started at 0.49 (and eight studies out of fifteen reported sensitivity values below 0.78).
The study that reported the highest sensitivity among the tests classified as paper-based digital solutions reported on the Beijing version of the MoCA (sensitivity = 0.95; specificity = 0.87) [62]. This performance was similar to that of the paper-and-pencil version of the MoCA for detecting MCI in elderly Chinese living in communities (sensitivity = 0.81; specificity = 0.83) [77], suggesting that both versions are equivalent. For the subgroup of tests classified as innovative digital solutions, the Digital Screening System [76] showed the highest sensitivity and specificity levels (sensitivity = 0.85, specificity = 0.90). These two index tests were assessed against robust reference tests (a clinical assessment performed by a team of health professionals including a neurologist, a geriatrician, and a psychiatrist (MoCA-CC [62]), and experienced doctors and neuropsychologists (Digital Screening System [76])). The assessment by a team of specialists is the gold standard for cognitive evaluation [78]. The Digital Screening System aims to assess visuospatial constructional capabilities, visual memory function, and cognitive functions, such as visuospatial abilities, visual episodic memory, organization skills, attention, and visuomotor coordination. It is based on the neuropsychological test Rey-Osterrieth Complex Figure and uses a data-driven convolutional neural network architecture through transfer learning and deep learning methods [76]. Despite being developed from inception to be applied by electronic means, most innovative digital solutions are inspired by traditional neuropsychological tests. In the innovative digital solutions subgroup, the index test Virtual Supermarket Program (VSP) stands out for using virtual reality game-based tests in screening for MCI in older adults, showing an attempt to develop a test that uses a task from daily life, potentially increasing its ecological validity. Interestingly, this test showed high sensitivity and specificity values (sensitivity = 0.85; specificity = 0.80) [73], suggesting that there is value in exploring the use of game-based tests to screen for cognitive impairment.
The index test that presented the lowest sensitivity in the subgroup of index tests based on paper-and-pencil tests is the MemTrax test (MTX) (sensitivity = 0.49, specificity = 0.78) [67]. This index test was based on the Continuous Recognition Task (CRT) paradigm. Among the index tests developed from inception to be applied as digital solutions, Cognivue [70] and CogEvo [58] showed the lowest sensitivity and specificity levels (Cognivue: sensitivity = 0.78, specificity = 0.50; CogEvo: sensitivity = 0.78, specificity = 0.54). These three index tests that presented the lowest sensitivity/specificity levels were compared against a reference standard consisting of only brief cognitive screening instruments (i.e., MoCA, SLUMS, and MMSE, respectively). The MoCA paper-and-pencil test demonstrated a sensitivity of 90% and a specificity of 87% for detecting MCI [15]. The MMSE paper-and-pencil test showed a pooled sensitivity of 85% and a specificity of 86% in a non-clinical community setting [79]. The SLUMS paper-and-pencil test for detecting MCI in patients with less than a high school education had a sensitivity of 92% and a specificity of 81%, and in patients with a high school education or more, a sensitivity of 95% and a specificity of 76% [17]. Despite the relatively high sensitivity and specificity levels, these instruments are not the gold standard for cognitive assessment; therefore, their use might have affected the sensitivity and specificity calculations of the index test and certainly undermines the confidence in the reported results.
The early detection of cognitive impairment is critical to an early intervention [12,13]. Index tests with high sensitivity levels are essential when the goal is to identify a serious disease with available treatment [44,80]. Digital solutions emerge as a valid alternative for cognitive screening, potentially enhancing cognitive screening and monitoring in the general and clinical population, since most do not require the presence of a trained professional and have an automatic digital screening system and scoring [52,62,76], decreasing the costs associated with their use and facilitating the screening for high numbers of individuals. Digital solutions can be valuable in neuropsychological assessment, enabling the development of large-scale, norm-based, and technology-driven tests [28]. These tools produce large cognitive datasets that can be informative through machine learning and big data analysis, contributing to the detection of patterns and declines in cognitive performance [28]. The accuracy estimates of sensitivity, specificity, and the false positive rate found in this meta-analysis suggest that digital solutions have satisfactory accuracy and the potential to be used as instruments for cognitive screening. However, these estimates must be interpreted and compared with caution, since the GRADE evidence assessment and rating of these accuracy estimates mainly showed a low quality of evidence. The risk of bias and inconsistency found in the GRADE assessment downgraded the quality of the evidence.
The quality of the included studies, as evaluated by the QUADAS-2 tool, suggests a risk of bias in the patient selection domain, including for those studies presenting the digital index tests with higher sensitivity/specificity values. A test accuracy study with a high risk of bias in the participant selection domain can give inflated estimates of sensitivity and specificity [81]. Despite the different definitions used by the studies, we found relative homogeneity in the target condition, as they all focus on the diagnostic ability and accuracy when screening for cognitive impairment. Nevertheless, the reference standards display substantial methodological heterogeneity. This heterogeneity was due to significant variations in the instruments adopted and/or the clinical assessment process followed across the studies. A similar reference standard, preferably a gold standard, should be applied across studies to facilitate accuracy comparisons and increase the confidence in the results [39].
Considering the heterogeneity in reference standards and index tests across studies, the meta-analysis estimates have limitations, and the interpretation and comparison of estimates should be performed cautiously [43]. Also, the high risk of bias in patient selection downgraded the quality of the evidence. When applying the GRADE approach, overall, there was a serious risk of bias due to less robust procedures regarding the patient selection, index test, and reference standard domains, and consequently, the quality of evidence was downgraded by one level in this domain. The high heterogeneity of the outcomes of the included studies also prompted the downgrading of the evidence by one level due to inconsistency. These aspects must be considered in the design of cognitive diagnostic accuracy studies to improve the quality of the evidence.
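The downgrading logic described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure used in the review. It assumes the usual GRADE convention that evidence from diagnostic accuracy studies starts at "high" (the starting level is not stated in this section), and the names `GRADE_LEVELS` and `grade_rating` are hypothetical.

```python
# Ordered from lowest to highest certainty.
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def grade_rating(start="high", downgrades=None):
    """Return the GRADE level after applying downgrades.

    `downgrades` maps a GRADE domain to the number of levels removed;
    the rating cannot drop below "very low".
    """
    downgrades = downgrades or {}
    index = GRADE_LEVELS.index(start) - sum(downgrades.values())
    return GRADE_LEVELS[max(index, 0)]


# One level for risk of bias and one level for inconsistency,
# as reported in the paragraph above.
rating = grade_rating(downgrades={"risk of bias": 1, "inconsistency": 1})
print(rating)
```

With these two one-level downgrades, the sketch arrives at the "low" rating that the review reports for most of its accuracy estimates.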
Future studies should adopt more rigorous, random sampling procedures to reduce the risk of bias from patient recruitment. Also, future studies of diagnostic tools should consider adopting a similar gold standard as the reference test to facilitate comparisons and increase the confidence in the results. Gold standards involve the assessment of multiple cognitive domains, including memory, by qualified professionals [78]. In the meantime, investigators and practitioners must consider the diagnostic properties of the different digital solutions and the reference test against which the accuracy values were calculated to make an informed choice.
The impact of participants' digital skills on access to, and on the results of, digitally administered tests can also be addressed in future studies. The feasibility of digital solutions for remote cognitive screening in specific populations also needs to be investigated.
out of the 25) defined the target condition as Mild Cognitive Impairment (MCI), including two studies that used different terminologies, namely Subtle Cognitive Impairment (SCI) and Mild Cognitive Dementia (MCD). One study considered amnestic Mild Cognitive Impairment (aMCI) as the target condition.

Figure 3. Random-effects meta-analysis of all digital solutions: summary ROC (SROC) curve, summary estimates of sensitivity and the false positive rate (1-specificity), with 95% confidence and 95% predictive regions, presenting the covariate index test subgroup.

Figure 4. Original model (random-effects meta-analysis of all digital solutions) sensitivity analysis, without the studies of paper-based digital solutions (sensitivity analysis model): summary ROC (SROC) curve, summary estimates of sensitivity and the false positive rate (1-specificity), with 95% confidence and 95% predictive regions.

Figures 5 and 6 display the summary estimates of sensitivity and the false positive rate (1-specificity), presenting the 95% confidence and 95% predictive regions for the index tests subgrouped according to the type of test (paper-based digital solutions or innovative digital solutions). Each study is represented with a circle in the ROC space. The forest plots of sensitivity and specificity are presented per study.
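As a small illustration of how a study is placed in the ROC space, the sketch below converts reported sensitivity and specificity into the plotted coordinates, with the false positive rate (1-specificity) on the horizontal axis and sensitivity on the vertical axis. It uses only the three index tests whose values are quoted in the discussion. The helper name `roc_point` is hypothetical, and the sketch does not reproduce the summary curves or regions of the figures.

```python
# (sensitivity, specificity) as quoted in the discussion for three index tests.
reported = {
    "MemTrax (MTX)": (0.49, 0.78),
    "Cognivue": (0.78, 0.50),
    "CogEvo": (0.78, 0.54),
}


def roc_point(sensitivity, specificity):
    """Return (false positive rate, sensitivity), i.e. (x, y) in ROC space."""
    return (1.0 - specificity, sensitivity)


roc_points = {name: roc_point(se, sp) for name, (se, sp) in reported.items()}

for name, (fpr, se) in roc_points.items():
    print(f"{name}: false positive rate = {fpr:.2f}, sensitivity = {se:.2f}")
```

Points closer to the upper-left corner of the ROC space (low false positive rate, high sensitivity) correspond to more accurate tests.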

Three studies in-

Table 2. Summary accuracy estimates: meta-analysis summary points of the sensitivity, the specificity, and the false positive rate, with 95% confidence intervals (CI).

Table 3. GRADE assessment results for the meta-analysis.