Accuracy of Artificial Intelligence-Based Technologies for the Diagnosis of Atrial Fibrillation: A Systematic Review and Meta-Analysis

Atrial fibrillation (AF) is the most common arrhythmia and carries a high burden of morbidity, including impaired quality of life and increased risk of thromboembolism. Early detection and management of AF could prevent thromboembolic events. Artificial intelligence (AI)-based methods in healthcare are developing quickly and may prove valuable for the detection of AF. In this meta-analysis, we aimed to review the diagnostic accuracy of AI-based methods for the diagnosis of AF. A predetermined search strategy was applied to four databases: PubMed on 31 August 2022, Google Scholar and the Cochrane Library on 3 September 2022, and Embase on 15 October 2022. The identified studies were screened by two independent investigators. Studies assessing the diagnostic accuracy of AI-based devices for the detection of AF in adults against a gold standard were selected. Qualitative and quantitative synthesis was performed to calculate the pooled sensitivity and specificity, and the QUADAS-2 tool was used for the risk of bias and applicability assessment. We screened 14,770 studies, of which 31 were eligible and included. All were diagnostic accuracy studies with a case–control or cohort design. The main technologies used were: (a) photoplethysmography (PPG), with pooled sensitivity 95.1% and specificity 96.2%, and (b) single-lead ECG, with pooled sensitivity 92.3% and specificity 96.2%. In the PPG group, 0% to 43.2% of the tracings could not be classified by the AI algorithm as AF or not, and in the single-lead ECG group, this figure ranged between 0% and 38%. Our analysis showed that AI-based methods have high sensitivity and specificity for the detection of AF. Further studies should examine whether the use of these methods could improve clinical outcomes.


Introduction
Atrial fibrillation (AF) is the most common arrhythmia in adults worldwide. AF can be completely asymptomatic, and often its initial presentation includes thromboembolic events, such as strokes. It is estimated that more than 25% of strokes are caused by previously asymptomatic atrial fibrillation. In most cases, the stroke could have been prevented if the atrial fibrillation had been detected earlier and the patients had been started on anticoagulation therapy [1].
Given that many of the complications are preventable, many screening strategies have been suggested [2,3]. Currently, the European Society of Cardiology (ESC) guidelines suggest opportunistic screening for people above 65 years old, and systematic screening for people > 75 years old or those with increased risk of stroke [3]. The recommended screening tools include pulse check, single-lead ECG > 30 s or 12-lead ECG interpreted by a physician [3]. However, since AF is often paroxysmal, these screening methods result in many false negative results, and therefore their use is limited [2].
Over the last few years, mobile health technology has been developing quickly [4]. Various mobile devices and smartwatches with AI algorithms have been developed to detect AF, and they demonstrate high diagnostic accuracy against a gold standard (i.e., 12-lead ECG, single-lead ECG, telemetry, Holter monitor, or implantable cardiac monitor) [5–7].
So far, the two main technologies used by AI-based devices to automatically detect AF are photoplethysmography (PPG) and single-lead ECG. The former is a photoelectric method that measures changes in blood volume in the peripheral vessels. PPG devices consist of a light source and a receptor and detect changes in blood volume from the reflected light. These changes are captured in a PPG trace, which is then interpreted by an AI algorithm [8,9]. The single-lead ECG methods consist of a portable or wearable device which can record a single-lead ECG trace. To complete this assessment, the individual is asked to keep two parts of their body (e.g., wrist and finger or two fingers, etc.) in touch with the device for a pre-determined time. The recording is then transmitted to an AI application for interpretation [10–12]. These AI methods classify their recordings as "possible AF", "normal" or "no AF", "undiagnosable/unclassified", or "error" [11,12].
Compared to the conventional methods, AI-based devices for the diagnosis of AF are widely available, easy to use, and offer prolonged monitoring times, which increase the chances of detecting paroxysmal episodes of AF [12]. If accurate, they can also accelerate the decision-making process by the physicians, who could use these data without the need to wait for further time-consuming investigations. In addition, single-lead ECG devices can save the ECG tracings, which can then be reviewed by a physician.
On the other hand, the rapid increase in uncertified devices and applications can lead to many false results. This can cause stress to the patients, unnecessary treatments and investigations, and a cost burden for the health care systems [12]. Also, single-lead ECGs are conducted by untrained individuals rather than trained health care professionals, which can result in poor quality tracings and thus unreliable outcomes [12].
The aim of our study is to provide a systematic review and meta-analysis of the diagnostic accuracy of all the available AI-based methods for the diagnosis of atrial fibrillation.

Inclusion and Exclusion Criteria
We included: (1) diagnostic studies with a cohort or case-control design, (2) studies conducted in adults 18 years old and above, (3) studies which tested AI-based devices to detect AF, (4) studies which used an acceptable reference standard interpreted by a healthcare professional, including 12-lead ECG, 6-lead ECG, single-lead ECG, 3-lead Holter monitor and telemetry, (5) studies that provided true positive, true negative, false positive, and false negative results or provided enough data to calculate them, (6) studies in which unclassified/unreadable results by the devices were reported separately.
Exclusion criteria included: (1) conference abstracts or studies without available full text, (2) studies published in a language other than English, (3) studies that only provided measurement-based instead of individual-based results, (4) studies that validated novel devices without automated interpretation, (5) studies in which the reference standard test was not completed in all the participants.
Unclassified results are the ones that could not be classified by the automated algorithm as AF or not AF. Unreadable results are the ones that could not be interpreted by the automated algorithm, e.g., poor quality or short tracings.

Data Sources and Search Strategy
To identify all the relevant studies, we searched the databases: (1) PubMed, (2) Embase, (3) Cochrane Library, and (4) Google Scholar. In addition, we conducted a manual search for further eligible studies.
The search in PubMed was undertaken on 31 August 2022, in Cochrane Library and Google Scholar on 3 September 2022 and in the Embase database on 15 October 2022.
The search strategy we used was: ((ai OR artificial intelligence OR machine learning OR ml OR deep learning OR neural network OR wearables OR smartwatches OR wearable OR smartwatch OR applewatch OR alivecor OR iECG) AND (diagnosis OR diagnosing OR detection OR detect OR detecting) AND (af OR atrial fibrillation OR afib OR arrhythmia OR svt OR supraventricular tachycardia OR atrial flutter OR tachycardia)).
The search strategy was created by the first author (NMS), reviewed by a second member of the team (IMS), and approved by the supervising professor (AB).

Screening
The identified citations were imported into the web application Covidence, which is endorsed by the Cochrane Collaboration for the conduct of systematic reviews [15]. The screening was performed by two independent and blinded researchers (NMS and IMS). Initially, duplicates were removed either automatically by the Covidence web app, or, less frequently, manually by the researchers. Following that, we screened the studies by reading the title and abstract, and then, for the selected studies, we performed a full-text review. Studies that met our inclusion and exclusion criteria were selected. In case of disagreement, the two researchers discussed until an agreement was reached.

Data Extraction
Data extraction was executed in Microsoft Excel, version 16.69. In case of uncertainty, a second researcher was asked to extract the data for the study in question, which was then discussed. In addition, when data calculation was impossible, the authors were contacted. If this was impossible, the study was reviewed by the second researcher before exclusion. For all the included studies, we extracted data including among others: the first author, the year of publication, the setting (inpatient vs. outpatient), the study design, the name of the device, the type of AI algorithm, the duration of the index test, the reference standard, basic demographics, true positive and negative results, false positive and negative results, and unclassified and unreadable results.

Assessment of Risk of Bias and Applicability
For the assessment of risk of bias and applicability, we used the quality assessment of diagnostic accuracy studies-2 (QUADAS-2) tool, which is recommended by the Cochrane Collaboration and the U.K. National Institute for Health and Care Excellence [16]. We assessed each study in four domains: (1) selection of participants, (2) index test, (3) reference standard, (4) flow and timing. For each study, we also assessed the first three domains regarding its applicability. We used predetermined signaling questions tailored to our review. The assessment of risk of bias and applicability was performed by the main researcher (NMS).

Statistical Analysis
Data synthesis was conducted separately for the two main types of technology, photoplethysmography (PPG) and single-lead ECG. For the studies that tested technologies other than the above two, we did not perform a quantitative analysis due to lack of sufficient data; however, we describe their results. As effect measures of diagnostic accuracy, we used sensitivity and specificity. For the unclassified/unreadable results, we did not perform a quantitative analysis; however, we describe them separately for each group. To present the unclassified/unreadable outcomes, we used their percentages out of total results as the effect measure. Studies that tested more than one device/technology are included as separate studies. We performed subgroup analysis on the PPG (inpatients vs. outpatients) and single-lead ECG groups (inpatients vs. outpatients and duration of index test).
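The effect measures named above can be illustrated with a minimal Python sketch. The `Counts` container, the function names, and the example numbers are hypothetical and are not taken from any included study; the sketch also assumes that "total results" means the four 2x2 cells plus the unclassified/unreadable tracings.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-study 2x2 counts plus tracings the algorithm could not classify."""
    tp: int
    fp: int
    tn: int
    fn: int
    unclassified: int = 0  # unclassified or unreadable, kept out of the 2x2 table


def sensitivity(c: Counts) -> float:
    # Proportion of reference-standard AF cases flagged by the device.
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    # Proportion of reference-standard non-AF cases cleared by the device.
    return c.tn / (c.tn + c.fp)


def unclassified_pct(c: Counts) -> float:
    # Assumption: the denominator is every result, classified or not.
    total = c.tp + c.fp + c.tn + c.fn + c.unclassified
    return 100.0 * c.unclassified / total


example = Counts(tp=90, fp=5, tn=95, fn=10, unclassified=50)  # made-up numbers
print(sensitivity(example), specificity(example), unclassified_pct(example))
```

Keeping the unclassified tracings outside the 2x2 table mirrors the choice described in the text: they do not enter sensitivity or specificity and are reported as a separate percentage.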
To calculate our summary values and create the graphical interpretations, we used the mada package in R, version 4.2.3. This package fits the bivariate model of Reitsma, which is equivalent to the HSROC model of Rutter and Gatsonis when covariates are not used. Also, we used the interactive online application MetaDTA, version 2.0 [17]. For the data synthesis, we used the random effects methodology due to the expected clinical heterogeneity among the studies. Due to the lack of a gold standard for the assessment of heterogeneity in diagnostic accuracy studies, we used the Zhou and Dendukuri approach, which considers the correlation between sensitivity and specificity for the calculation of I² [18].
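To give a feel for random effects pooling, the following Python sketch pools a single proportion (for example, sensitivity) on the logit scale with the DerSimonian-Laird estimator. This is a deliberately simplified, univariate stand-in: it is not the bivariate Reitsma model fitted by mada, it ignores the correlation between sensitivity and specificity, and the study counts are invented for illustration.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] / totals[i] is one study's proportion, e.g. TP / (TP + FN).
    A 0.5 continuity correction is added to both cells to avoid log(0).
    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, (n - e) + 0.5
        y.append(math.log(a / b))   # logit of the corrected proportion
        v.append(1.0 / a + 1.0 / b)  # approximate within-study variance

    # Fixed-effect step, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical studies: true positives and number of AF patients per study.
tp = [45, 88, 120]
af_patients = [50, 95, 125]
pooled, lower, upper = pool_logit_random_effects(tp, af_patients)
print(f"pooled sensitivity {pooled:.3f} (95% C.I. {lower:.3f}-{upper:.3f})")
```

A bivariate model pools sensitivity and specificity jointly and therefore also yields the SROC curve; the univariate sketch only shows how between-study variance widens the weights and the confidence interval.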

Study Selection
The flowchart (Figure 1) illustrates our study selection process. We identified 14,770 studies, from which 43 were selected. Of those, 12 studies were excluded at a later stage. Six of them were excluded because they only provided measurement-based, and not patient-based, results [19–23]. The remaining six studies were excluded because they either did not provide enough data or we were unable to communicate with the authors to provide data for analysis [24–29]. In the end, 31 studies were included in our analysis (Figure 1).

Assessment of Risk of Bias and Applicability of PPG Studies
Fourteen studies [11,34–40] were deemed high risk of bias in the participants' domain, and two studies [11,35] in the index test domain. The rest were deemed either low or unclear risk of bias (Figure 2). The studies were low in risk regarding their applicability (Figure 2).

Data Synthesis of the PPG Studies
The total sensitivity for the diagnosis of atrial fibrillation in the PPG group was 95.1% (95% C.I. 92.5-96.8%), the specificity was 96.2% (95% C.I. 94.3-97.5%), the area under the curve (AUC) for the SROC curve was 0.983 and the partial AUC was 0.961. The I² was 12.5% (Figures 3 and 4).
Among the studies, the AF prevalence was found to be between 2.5% and 57%, with a median prevalence of 44%. Based on these data, we used the total sensitivity and specificity to calculate the predicted false results in 1000 patients, by using different prevalence values. For a prevalence of 5%, PPG devices would have resulted in 47 (95% C.I. 30-71) false positive results and 2 (95% C.I. 1-3) false negative results in 1000 patients. For the median prevalence of our studies, 44%, PPG devices would have resulted in 27 (95% C.I.  false positive results and 17 (95% C.I. 11-25) false negative results in 1000 patients. For a high prevalence of 60%, PPG devices would have resulted in 20 (95% C.I. 13-30) false positive results and 23 (95% C.I. 15-34) false negative results in 1000 patients.
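For readers who want to experiment with other prevalence values, the sketch below converts a sensitivity, a specificity, and a prevalence into expected false results per 1000 people tested. It uses the textbook definitions (false negatives arise among people with AF, false positives among people without AF). This is an assumption about the calculation rather than a description of the procedure used for the figures above, and its outputs are not claimed to reproduce them. The function name is hypothetical.

```python
def false_results_per_1000(sens: float, spec: float, prevalence: float, n: int = 1000):
    """Expected (false positives, false negatives) among n people tested.

    Textbook definitions, assumed here:
      false negatives = (1 - sensitivity) * number with the condition
      false positives = (1 - specificity) * number without the condition
    """
    with_af = prevalence * n
    without_af = n - with_af
    false_neg = (1.0 - sens) * with_af
    false_pos = (1.0 - spec) * without_af
    return round(false_pos), round(false_neg)


# Pooled PPG estimates from the text, evaluated at three prevalence values.
for prev in (0.05, 0.44, 0.60):
    fp, fn = false_results_per_1000(sens=0.951, spec=0.962, prevalence=prev)
    print(f"prevalence {prev:.0%}: {fp} false positives, {fn} false negatives")
```

The same function can be applied to the lower and upper confidence limits of sensitivity and specificity to obtain a range instead of a point value.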

Subgroup Analysis (Inpatients vs. Outpatients) of the PPG Studies
We did not proceed to a formal subgroup analysis for the PPG studies due to the low number of studies per subgroup, but also because we did not observe clusters in this subgroup's SROC curve (Figure 5).

Assessment of Risk of Bias and Applicability of the Single-Lead ECG Studies
Twelve studies [11,37,40,41,43,45,47,49,52,55] were deemed to be at high risk of bias in the participants' domain, nine studies [11,32,45,49,51,54,57] in the index test domain, four studies [45,53] in the reference standard domain, and two studies [11,45] were deemed to be at high risk of bias in the flow and timing domain (Figure 3). The studies were low in risk regarding their applicability (Figure 6).

Data Synthesis of the Single-Lead ECG Studies
The total sensitivity for the detection of atrial fibrillation by using single-lead ECG was 92.3% (95% C.I. 88.9-94.8%), the specificity was 96.2% (95% C.I. 94.6-97.4%), the area under the curve (AUC) for the SROC curve was 0.979, and the partial AUC was 0.939. The I² was 9.2% (Figures 7 and 8).
Among the studies, the AF prevalence was found to be between 2% and 61%, with a median prevalence of 31%. Based on these data, we used the total sensitivity and specificity to calculate the predicted false results in 1000 patients, by using different prevalence values. For a prevalence of 5%, a single-lead ECG device would have resulted in 73 (95% C.I. 49-106) false positive results and 2 (95% C.I. 1-3) false negative results in 1000 patients. For the median prevalence of our studies, 31%, a single-lead ECG device would have resulted in 53 (95% C.I. 36-77) false positive results and 12 (95% C.I. 8-17) false negative results in 1000 patients. For a high prevalence of 60%, a single-lead ECG device would have resulted in 31 (95% C.I. 21-44) false positive results and 23 (95% C.I. 16-32) false negative results in 1000 patients.

Subgroup Analysis (Inpatients vs. Outpatients) of the Single-Lead ECG Studies
We conducted a subgroup analysis according to the setting. In this analysis, we did not include either the studies in which the setting was not clear, or the ones that included both inpatients and outpatients.
For the inpatients, the total sensitivity was 92.9% (95% C.I. 87.6-96) and the specificity was 94.2% (95% C.I. 91.8-95.9). The AUC was 0.974 and the partial AUC was 0.898. The I² was 14.4%. For the outpatients, the total sensitivity was 90.7% (95% C.I. 76.8-96.6) and the specificity was 98.1% (95% C.I. 95.1-99.3). The AUC was 0.983 and the partial AUC was 0.949. The I² was 26.9%. Although the sensitivity was higher in the inpatient group, the specificity was higher in the outpatients. However, the 95% confidence intervals were overlapping. In addition, there was a difference in I² between the subgroups. In the inpatient group, the I² was 14.4%, and in the outpatient group it was 26.9% (Figure 9).
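The overlap of the subgroup confidence intervals reported above can be checked with a few lines of Python. The helper name is hypothetical; the interval limits are the ones quoted in the text. Overlapping intervals are an informal descriptive check and not a formal test of a difference between subgroups.

```python
def intervals_overlap(a, b) -> bool:
    """Return True when two closed intervals (low, high) share at least one point."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high


# 95% C.I. limits (in %) for inpatients and outpatients, as quoted in the text.
sens_inpatients, sens_outpatients = (87.6, 96.0), (76.8, 96.6)
spec_inpatients, spec_outpatients = (91.8, 95.9), (95.1, 99.3)

sens_overlap = intervals_overlap(sens_inpatients, sens_outpatients)
spec_overlap = intervals_overlap(spec_inpatients, spec_outpatients)
print(sens_overlap, spec_overlap)
```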

Subgroup Analysis (Duration of Index Test) of the Single-Lead ECG Studies
We did not proceed to a formal subgroup analysis regarding the duration of the index test since most of the studies used it for 30 s (Figure 10).

Unclassified/Unreadable Results of the Single-Lead ECG Studies
Regarding the unclassified/unreadable results, we also identified significant heterogeneity in the single-lead ECG group. The reported unclassified/unreadable results ranged from a minimum of 0% [32,41,46,52,57] to a maximum of 38% [54] (Table 1).

Diagnostic Performance of Technologies Other Than PPG or Single-Lead ECG
As mentioned earlier, some of the studies tested technologies other than PPG and single-lead ECG. Because there were only a few such studies, we did not proceed to quantitative synthesis, but we have described them separately. Their characteristics are summarized in Table 1 and Figure 11.
The study of Lown et al., 2018 [49], apart from single-lead ECG, tested three more devices in the same population. It tested the Watch BP device, which is a modified sphygmomanometer, and compared it with a 12-lead ECG. The resulting sensitivity was 96.34% (95% C.I. 89.68-99.24%) and the specificity was 93.45% (95% C.I. 90.25-95.85%). The same study tested two more devices that can detect AF by using heart rate variability. The Polar H7 device had a sensitivity of 96.34% (95% C.I. 89.68-99.24%) and a specificity of 98.21% (95% C.I. 96.17-99.34%), and the Bodyguard 2 had a sensitivity of 96.34% (95% C.I. 89.68-99.24%) and a specificity of 98.51% (95% C.I. 96.56-99.52%).
Finally, the study of Chen et al., 2020 [40], which was described in both the PPG and single-lead ECG groups, also tested the combination of both technologies. Specifically, during this test, the PPG mode was on, and if AF was detected, then participants were notified to perform a single-lead ECG. If the single-lead ECG was also positive for AF, then the result was considered positive. Otherwise, the final result was considered negative. The sensitivity for this mode was 80% (95% C.I. 72.52-85.90) and the specificity was 96.81% (95% C.I. 93.58-98.51).
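The two-step decision rule described for the combined mode can be written as a short Python sketch. The function and parameter names are hypothetical and do not come from the device or from the study; the sketch only encodes the rule as stated in the text, where the ECG is requested only after a positive PPG result.

```python
from typing import Callable


def combined_result(ppg_flags_af: bool, run_single_lead_ecg: Callable[[], bool]) -> bool:
    """Serial PPG-then-ECG rule: positive only if both steps flag AF.

    run_single_lead_ecg is called only after a positive PPG result, mirroring
    the notification step in which the participant records an ECG.
    """
    if not ppg_flags_af:
        return False  # no notification, so no ECG is recorded
    return run_single_lead_ecg()


print(combined_result(True, lambda: True))    # both steps positive
print(combined_result(True, lambda: False))   # ECG does not confirm
print(combined_result(False, lambda: True))   # ECG is never requested
```

Because a positive result requires agreement of both steps, this kind of rule can only remove positives relative to PPG alone, which is consistent with the lower sensitivity and high specificity reported for this mode.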

Discussion
In this meta-analysis, the two main technologies used to automatically detect AF (PPG and single-lead ECG) demonstrated very high diagnostic accuracy. Although the PPG technology proved to be more sensitive than the single-lead ECG, their 95% confidence intervals were overlapping. On the other hand, the two technologies had equal specificity.
In the PPG group, we noticed that four studies [31,33,36,42] showed significantly lower specificity compared to the rest (Figure 3). A further review of the studies demonstrated that, in most cases, the duration of the index test was prolonged, which may increase the false positive results. Conversely, a prolonged index test can decrease the unclassified/unreadable results, since most of the studies with 0% unclassified/unreadable results used the devices for a longer period of time, specifically from 10 min [35] to 1 week [41]. In the subgroup analysis between inpatients and outpatients in the PPG group, we did not observe any differences in the SROC curve; however, the small number of studies did not allow us to proceed to a quantitative synthesis.
In the single-lead ECG group, the lower pooled sensitivity could be partially explained by the shorter duration of the index test. In most of the studies, it was applied for 30 to 60 s, compared to the PPG, which was applied for at least 1 min. In addition, operation of a single-lead ECG requires action by the individual, and therefore unsupervised recordings could result in more poor-quality tracings. In this group, we performed two subgroup analyses. In the inpatients versus outpatients subgroup, the 95% confidence intervals were overlapping, and in the duration of the index test analysis (30 s vs. 60 s), we did not observe any clusters in the SROC curve. In relation to the unclassified/unreadable results, we observed significant heterogeneity in this group as well. As in the PPG group, we noticed that most of the single-lead ECG studies with 0% unclassified/unreadable results used the index test for a prolonged period of time and/or allowed multiple measurements.
In both of the above groups, risk of bias was high or unknown in the participant selection domain, mainly due to case-control design in combination with ambiguity of the selection process. The rest of the domains were deemed mostly low risk of bias, and the applicability of the diagnostic test was satisfactory.
Other technologies, such as the modified sphygmomanometer and heart rate variability, demonstrated very high sensitivity and specificity in their respective studies; however, the data were not enough to conduct a meta-analysis. The study of Chen et al., 2020 [40] is especially interesting, because it tested the combination of PPG and single-lead ECG. During this study, individuals were tested by continuous PPG, and they were asked to perform a single-lead ECG only when the PPG outcome was "possible AF". Only if the single-lead ECG confirmed the diagnosis was the individual notified that they may suffer from AF. This study showed very high specificity but not as high sensitivity (~80%). Since more and more devices offer both PPG and single-lead ECG, their combination could prove valuable. All the technologies resulted in unclassified/unreadable results, which demonstrated significant heterogeneity among the studies.
Our findings are comparable with previous similar meta-analyses [5–7,59] and suggest that widely available AI-based devices can accurately detect AF and can be used as a screening tool. So far, screening for AF is a controversial area. ESC guidelines support screening in targeted populations [3]; however, the American guidelines advise that the evidence is limited [60]. Long-term continuous screening in high-risk populations proved effective in detection of AF in a randomized study [61]. Another randomized trial showed that screening for AF led to fewer events for the combined primary outcome, which included stroke, systemic embolism, bleeding leading to hospitalization and all-cause death [62]. Simulation studies using contemporary screening methods in elderly populations showed that screening is cost-effective and reduces stroke episodes, but increases bleeding risk and events [63,64].
In this context, our findings suggest that easily accessible AI-based devices can be convenient and non-invasive tools for AF screening. Compared to the traditional methods, these devices allow long-term passive monitoring, which is a paramount advantage given the paroxysmal and often asymptomatic nature of AF. They also give individuals the opportunity to record a trace at any time, which can be useful when, for example, they develop symptoms. Most importantly, the AI-based devices do not require a health care professional at the stage of rhythm diagnosis; therefore, the devices allow more time for physicians to focus on the rest of the management.

Strengths and Limitations
Our study was designed and conducted according to the PRISMA guidelines. The review was extensive, since it identified almost 15,000 studies, and more than 1000 studies were reviewed based on their full text. The screening was conducted by two blinded and independent investigators, and the statistical analysis was performed for the two main technologies separately. Furthermore, we proceeded to subgroup analysis and described technologies other than the main two. We also calculated the false results for different prevalence values, which is directly applicable to daily clinical practice.
On the other hand, our study has certain weaknesses. First of all, part of our study's drawbacks arises from the limitations of the included studies. To start with the unclassified/unreadable results, there was significant heterogeneity among the studies. Many authors excluded them completely, some included them as false, and others included them as true or false depending on the reference standard. In our study, these results were excluded from the calculation of sensitivity and specificity and were described separately. Also, in many studies, atrial flutter and fibrillation were considered as the same disease, with the argument that their complications and treatment are very similar. However, others either excluded patients with atrial flutter completely or included them in the control group. Lastly, there was heterogeneity in the control groups, since some studies used only patients in sinus rhythm as control, and others used patients with any rhythm other than AF. Another issue was the use of multiple different devices and AI algorithms. On several occasions, the name or the version of the device and/or the AI algorithm was not even reported. Many authors tested the same devices with different algorithms, or they tested an amended version of the commercial algorithm. This heterogeneity hinders the validation of devices and algorithms, since it is difficult to appreciate the impact of their variability.
In addition, the conduct of our study has certain limitations. First of all, the data extraction was performed mainly by one researcher, due to limited time and resources. Also, we had to amend our protocol, especially regarding the choice of reference standard test. Apart from the 12-lead ECG, we included other reference standard tests, since more tests are now accepted as gold standards for the diagnosis of atrial fibrillation. Furthermore, due to the complexity of diagnostic meta-analysis, we could not proceed to more advanced statistical analyses, such as further subgroup analysis and network meta-analysis. Similarly, we did not assess reporting bias due to the complexity of meta-analysis of diagnostic accuracy studies.

Conclusions
In summary, our findings support that both PPG and single-lead ECG devices have excellent sensitivity and specificity for the automated diagnosis of atrial fibrillation and can be used as screening tools. A prolonged period of monitoring may result in more false positive results, but fewer unclassified/unreadable outcomes. Further validation studies need to be conducted for alternative technologies, such as modified sphygmomanometry and the combination of PPG and single-lead ECG. Further clinical trials are necessary to evaluate the cost-effectiveness, and the risks and benefits, especially in younger populations where AI-based devices are widely available.

Figure 2. Assessment of risk of bias and applicability of the PPG studies. (Väliaho et al., 2019 (A): testing the AFEvidence algorithm; Väliaho et al., 2019 (B): testing the COSEn algorithm; Väliaho et al., 2021 (A): testing device performance when time interval between every measurement is 10 min; Väliaho et al., 2021 (B): testing device performance when time interval between every measurement is 20 min; Väliaho et al., 2021 (C): testing device performance when time interval between every measurement is 30 min; Väliaho et al., 2021 (D): testing device performance when time interval between every measurement is 60 min; Dörr et al., 2019 (A): testing performance of device when recording for 1 min; Dörr et al., 2019 (B): testing performance of device when recording for 3 min; Dörr et al., 2019 (C): testing performance of device when recording for 5 min).

Figure 10. Single-lead ECG group: subgroup analysis (duration of index test).