The Risk of Gastrointestinal Bleeding between Non-Vitamin K Antagonist Oral Anticoagulants and Vitamin K Antagonists in the Asian Atrial Fibrillation Patients: A Meta-Analysis

Background: Non-vitamin K antagonist oral anticoagulants (NOACs) are now used more commonly than vitamin K antagonists (VKAs) to prevent thromboembolic events in patients with atrial fibrillation (AF). However, the gastrointestinal bleeding (GIB) risk associated with NOACs in comparison with VKAs in the Asian AF patients remains unaddressed. Materials and Methods: A systematic search of studies on NOACs and VKAs in the Asian AF patients was conducted in PubMed, Cochrane Library, and ClinicalTrials.gov. The primary outcome was the hazard ratio (HR) of any GIB associated with NOACs versus VKAs. The secondary outcome was the GIB risk of each individual NOAC compared with VKAs. Results: This meta-analysis included two randomized controlled trials (RCTs) and four retrospective studies, comprising at least 200,000 patients in total. The GIB risk was significantly lower with NOACs overall than with VKAs in the Asian AF patients (HR: 0.633; 95% confidence interval: 0.535–0.748; p < 0.001). Additionally, the HRs of GIB for the individual NOACs versus VKAs were 0.392 for apixaban, 0.603 for edoxaban, 0.685 for dabigatran, and 0.794 for rivaroxaban. Conclusions: NOACs significantly reduced the risk of GIB in the Asian AF patients compared with VKAs. Among the four NOACs compared with VKAs, apixaban showed a trend toward the lowest GIB risk. Further head-to-head studies of different NOACs are needed to confirm which NOAC is the most suitable for Asian AF patients and to determine the optimal dosage regimen of each NOAC.


Introduction
The overall atrial fibrillation (AF) prevalence is about 1% worldwide, and nearly 10% in populations older than 80 years old [1]. Stroke prevention in the AF patients is an important issue because patients with AF have an approximately five times higher risk of stroke than those without AF [2,3]. The resulting mortality and bed-ridden status bring plenty of problems in terms of medical expenditure and long-term care [4].
Non-vitamin K antagonist oral anticoagulants (NOACs) include dabigatran, which inhibits thrombin, and rivaroxaban, apixaban, and edoxaban, which inhibit factor Xa. NOACs have some advantages, such as fewer drug-food and drug-drug interactions and no need for laboratory monitoring. Besides, Asian AF patients taking vitamin K antagonists (VKAs, warfarin) readily experience bleeding events and seldom achieve optimal international normalized ratio control.
PubMed, Cochrane Library, and ClinicalTrials.gov were searched from inception to December 2019, without publication date restriction, for studies relevant to GIB with NOACs versus VKAs in the Asian AF patients. The bibliographies of the included trials and related review articles were manually reviewed for relevant references. Duplicated records, titles not compatible with our population/intervention/control/outcome (PICO), case reports, and abstracts were excluded. We ruled out studies without GIB data or appropriate statistical methods, and removed literature that was probably extracted from an identical database or the same population, or that enrolled non-Asians or not exclusively Asians. We investigated studies that evaluated GIB risk in the Asian AF patients receiving different kinds of NOACs or VKAs. The search strategy (File S2) comprised the following keywords in various combinations: rivaroxaban, Xarelto, dabigatran, Pradaxa, apixaban, Eliquis, edoxaban, Lixiana, novel oral anticoagulant, new oral anticoagulant, direct oral anticoagulant, NOAC, DOAC, novel, new, oral, anticoagulant, coagulant, oral anticoagulant, OAC, antithrombin, thrombin, factor Xa inhibitor, Xa inhibitor, factor IIa inhibitor, IIa inhibitor, Non-vitamin K antagonist, gastrointestinal bleeding, GIB, GI bleeding, gastrointestinal, bleeding, gastrointestinal hemorrhage, GI hemorrhage, hemorrhage, warfarin, vitamin K antagonist, VKA, atrial fibrillation, AF, Afib, Asia, Asian, Taiwan, Taiwanese, China, Chinese, Abkhazian, Iran, Palestine, Afghanistan, Iraq, Akrotiri, Dhekelia, Israel, Philippines, Japan, Japanese, Qatar, Armenia, Jordan, Azerbaijan, Kazakh, Bahrain, North Korea, Russia, Bangladesh, Korea, Korean, Saudi Arabia, Bhutan, Kuwait, British Indian Ocean Territory, Kyrgyz, Singapore, Sri Lanka, Brunei, Laos, Syria, Cambodia, Lebanon, Tajik, Macao, Thailand, Christmas Island, Malaysia, East Timor, Cocos, Maldives, Turkey, Turkmenistan, Cyprus, Mongolia, Aliani, Arab, Egypt, Myanmar, Uzbek, Georgia, Nepal, Vietnam, Hong Kong, Oman, Yemen, India, Indian, Pakistan, and Indonesia.
Randomized controlled trials (RCTs) and comparative retrospective studies were included. All retrieved studies were required to comprise at least two treatment arms, one of which was NOACs and the other warfarin. The target population was Asian patients who had AF. Studies that explored the adverse events of NOACs or VKAs and the detailed sites of GIB were beyond the scope of the present meta-analysis.

Data Extraction and Quality Assessment
Two reviewers examined all the retrieved articles and extracted data by using a predetermined form. We recorded the first author, published year, type of interventions, study design, number of patients, average age, data source, country, outcome, GIB hazard ratio (HR), and 95% confidence interval (CI). The methodological quality of enrolled studies was evaluated by two independent reviewers. We used the risk of bias tools (ROB 2.0) for the RCTs [14]. We applied the risk of bias in non-randomized studies of interventions (ROBINS-I) for the retrospective studies [15]. ROB 2.0 evaluates the methodology of RCTs according to five domains, indicating low risk, some concerns, and high risk: bias arising from the randomization process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in measurement of the outcome, and bias in selection of the reported result. ROB 2.0 excluded "other bias" from the previous version.
The ROBINS-I contains three subgroups, including pre-intervention, at-intervention, and post-intervention. Pre-intervention emphasizes bias due to confounding and bias in selection of participants into the study. At-intervention highlights bias in classification of interventions. Post-intervention underlines bias due to deviations from intended interventions, bias due to missing data, bias in measurement of outcomes, and bias in selection of the reported result. The discrepancies between the reviewers were discussed under the supervision of the other authors.

Data Synthesis and Analysis
Before pooling RCTs and retrospective studies, we performed meta-regression analysis to assess the potential difference in reported outcomes between RCTs and retrospective studies. We pooled RCTs and retrospective studies and then separated them into RCT and retrospective study subgroups. A random-effects model was employed to pool the individual HRs and 95% CIs of any GIB among NOAC versus VKA users as the primary outcome. The data was extracted from the visual analog scales evaluated. An HR less than 1 indicated that NOACs were the favorable treatment option. The GIB risk of each kind of NOAC compared with VKAs comprised the secondary outcome. Forest plots were used to present the primary and secondary outcomes. All analyses were performed using Comprehensive Meta-Analysis (CMA) software, version 3 (Biostat, Englewood, NJ, USA). The I-squared statistic was used to determine the between-trial heterogeneity. Funnel plots and Egger's test were used to examine the potential publication bias [16]. A p-value less than 0.05 defined statistical significance, except for the determination of publication bias, which employed p less than 0.10.
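As a rough illustration of the pooling step, the sketch below recovers the standard error of each log HR from its reported 95% CI and combines the studies with a DerSimonian-Laird random-effects estimator, also returning I-squared. The choice of the DerSimonian-Laird estimator is an assumption (the text states only that a random-effects model was used in CMA), the function names are hypothetical, and the input values are illustrative placeholders rather than data from the enrolled studies.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR), recovered from a 95% CI given on the HR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def pool_random_effects(hrs, cis, z=1.96):
    """DerSimonian-Laird random-effects pooling of hazard ratios (assumed estimator)."""
    y = [math.log(h) for h in hrs]                      # effects on the log scale
    v = [se_from_ci(lo, hi, z) ** 2 for lo, hi in cis]  # within-study variances
    w = [1.0 / vi for vi in v]                          # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the method-of-moments between-study variance tau^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    swr = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / swr
    se_re = math.sqrt(1.0 / swr)
    return {
        "hr": math.exp(y_re),
        "ci_low": math.exp(y_re - z * se_re),
        "ci_high": math.exp(y_re + z * se_re),
        "tau2": tau2,
        "i2": i2,
    }

if __name__ == "__main__":
    # Illustrative placeholder values only, not the enrolled studies.
    example = pool_random_effects(
        hrs=[0.55, 0.70, 0.62],
        cis=[(0.45, 0.67), (0.52, 0.94), (0.40, 0.96)],
    )
    print(example)
```

Working on the log scale keeps the confidence interval symmetric around the pooled estimate before it is transformed back to the HR scale.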

GRADE System, Meta-Regressions, and Sensitivity Analyses
The GRADE system was used to grade the quality of evidence [17]. The GRADE system judged evidence of having a lower quality if there were study limitations, inconsistency, indirectness, imprecision, or publication bias. Large effect, dose-response, or plausible confounders were factors that caused higher quality. We made meta-regressions to examine the important and common covariates which might influence the outcomes. We also performed sensitivity analyses by excluding one study at a time and calculating the pooled HRs.
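The leave-one-out sensitivity analysis mentioned above can be sketched as a loop that drops each study in turn and re-pools the remainder. In this minimal sketch the pooling function is a plain inverse-variance (fixed-effect) average used as a stand-in; in practice the random-effects model from the main analysis would be passed in instead. All names and the example inputs are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# (label, log hazard ratio, standard error of the log hazard ratio)
Study = Tuple[str, float, float]

def inverse_variance_pool(studies: List[Study]) -> float:
    """Stand-in pooler: inverse-variance weighted mean of log HRs, returned as an HR."""
    weights = [1.0 / se ** 2 for _, _, se in studies]
    pooled_log = sum(w * log_hr for w, (_, log_hr, _) in zip(weights, studies)) / sum(weights)
    return math.exp(pooled_log)

def leave_one_out(
    studies: List[Study],
    pool: Callable[[List[Study]], float] = inverse_variance_pool,
) -> Dict[str, float]:
    """Pooled HR after excluding each study in turn, keyed by the excluded study's label."""
    results = {}
    for i, (label, _, _) in enumerate(studies):
        remaining = studies[:i] + studies[i + 1:]
        results[label] = pool(remaining)
    return results

if __name__ == "__main__":
    # Illustrative placeholder studies, not the enrolled articles.
    demo = [
        ("Study A", math.log(0.5), 0.1),
        ("Study B", math.log(0.5), 0.2),
        ("Study C", math.log(0.8), 0.1),
    ]
    for excluded, hr in leave_one_out(demo).items():
        print(f"excluding {excluded}: pooled HR = {hr:.3f}")
```

If the pooled HR stays on the same side of 1 for every exclusion, no single study is driving the direction of the result.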

Risk of Bias in Enrolled Articles
The risk of bias outcome was based on the Cochrane guidelines [14,15] and is summarized in Figure 2a,b.
The enrolled articles were mainly retrospective studies, which belong to the non-randomized group. They had a moderate to serious risk of bias due to confounding and a serious risk of bias in the selection of participants into the study. There was a low risk of bias in the classification of interventions and of bias due to missing data. Most studies were appraised as having a serious risk of bias due to deviations from intended interventions. All retrospective studies were evaluated as having a serious risk of bias in measurement of outcomes. Finally, all retrospective studies were appraised as having a moderate risk of bias in the selection of the reported result.
The two RCTs had low risk in bias arising from the randomization process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in measurement of the outcome, and bias in the selection of the reported result.

Gastrointestinal Bleeding Comparison and Possible Moderators
After adjusting for the study design with meta-regression, the heterogeneity variance did not decrease (pooled: 0.0393; adjusted: 0.0401), which suggested that the mean treatment effect did not differ between RCTs and retrospective studies. The estimated difference between these two designs was also not statistically significant (odds ratio (OR) = 1.43; 95% CI: 0.8033 to 2.5459; p-value = 0.2241). As shown in Table 2, pooling the four retrospective studies and two RCTs showed a significantly lower GIB risk with NOACs than with VKAs in the Asian AF patients (HR: 0.633; 95% CI, 0.535 to 0.748; p < 0.001; I²: 61.6%) (Figure 3a). The retrospective subgroup also showed a significantly lower GIB risk with NOACs than with VKAs (HR: 0.610; 95% CI, 0.509 to 0.730; p < 0.001; I²: 68.9%). However, the RCT subgroup revealed a trend toward lower GIB risk for NOAC users that did not reach statistical significance (HR: 0.864; 95% CI, 0.529 to 1.409; p = 0.557; I²: 0%) (Figure 3b).
Table 2 notes. Moderate certainty: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low certainty: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect. Very low certainty: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect. a Downgraded due to risk of bias. b Downgraded due to inconsistency. c Downgraded due to imprecision. d Downgraded due to indirectness. e Downgraded due to publication bias. f Upgraded due to large effect. g Upgraded due to dose-response. h Upgraded due to plausible confounders.
Further analysis ranked the NOACs from lower to higher GIB risk compared with VKAs, beginning with apixaban (HR: 0.392; 95% CI, 0.173 to 0.890; p = 0.025; I²: 82.0%) (Figure 4a), followed by edoxaban (HR: 0.603), dabigatran (HR: 0.685), and rivaroxaban (HR: 0.794).
Egger's test revealed no significant publication bias (p = 0.3584). Nor was publication bias evident in the retrospective studies, RCTs, edoxaban, rivaroxaban, or dabigatran subgroups. Because few articles on apixaban were available, publication bias could not be evaluated for that subgroup. The funnel plot shows the standard error against the log hazard ratio for overall NOACs (Figure 5).
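For readers unfamiliar with Egger's test, the following minimal sketch shows the regression behind it: each study's standardized effect (log HR divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The function name and inputs are hypothetical, and this is not the CMA implementation. The sketch returns the intercept and its t statistic only; a two-tailed p-value would come from a t distribution with k − 2 degrees of freedom.

```python
import math
from typing import List, Tuple

def egger_intercept(log_hrs: List[float], ses: List[float]) -> Tuple[float, float]:
    """Egger's regression: returns (intercept, t statistic of the intercept).

    Ordinary least squares of the standardized effect (log HR / SE)
    on precision (1 / SE). Requires at least three studies.
    """
    k = len(log_hrs)
    if k < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1.0 / se for se in ses]                       # precision
    z = [y / se for y, se in zip(log_hrs, ses)]        # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom
    rss = sum((zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t_stat

if __name__ == "__main__":
    # Illustrative placeholder inputs, not the enrolled studies.
    b0, t = egger_intercept(
        log_hrs=[-0.60, -0.36, -0.48, -0.15],
        ses=[0.10, 0.15, 0.22, 0.30],
    )
    print(f"intercept = {b0:.3f}, t = {t:.3f}")
```

With only a handful of studies, as in the subgroups here, the test has little power, which is consistent with the authors using the more lenient threshold of p < 0.10 for publication bias.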

GRADE for Overall
As shown in Table 2, there was a significantly lower hazard ratio for GIB with NOACs than with VKAs in the Asian AF patients, with moderate heterogeneity (I²: 61.6%). Because the overall meta-analysis included a large proportion of retrospective studies, the study limitations downgraded the quality of evidence to low. The overall risk of bias was serious. Imprecision and indirectness were not present. Publication bias was unlikely according to Egger's test, which showed a 2-tailed p-value of 0.3584 (> 0.1). The HR was more than 0.5, so the large effect was not prominent. Because the NOAC dosage regimen had a response in clinical conditions such as treatment efficacy and bleeding association, the quality was upgraded for dose-response. In retrospective studies, there were no obvious plausible confounders. The GRADE system showed very low certainty, which indicated that the confidence in the effect estimate was limited. The true effect might be substantially different from the estimate of the effect. However, there were few studies about the post-marketing evaluation of NOACs in Asia.

GRADE for RCT Subgroup
In RCTs, study limitation was mild due to some concerns in ROB 2.0 appraisal. Heterogeneity did not exist. Imprecision and indirectness were excluded. Publication bias was not evident, either. The HR was more than 0.5, so there was no large effect. Dose-response resulted in an upgrade for the quality of the evidence. There might be some plausible confounders in RCTs due to the short follow up duration for the enrolled patients, which could increase the HR and cause an upgrade for the quality of the evidence. We evaluated high certainty for the RCT group.

GRADE for Retrospective Studies Subgroup
As with the pooled outcome, a serious risk of bias downgraded the quality of evidence. Inconsistency (I²: 68.9%) existed. We found no obvious publication bias, indirectness, or imprecision. Large effect and plausible confounders were not present. Dose-response resulted in an upgrade for the quality of the evidence. The retrospective studies subgroup acquired very low certainty.

GRADE for the Subgroups of Different NOACs
In the ranking of the HRs of each NOAC, apixaban appeared to have the best GI safety profile. Its large effect (HR: 0.39) caused an upgrade for the quality of evidence, but publication bias was suspected due to the very low HR and the few GIB studies for apixaban. Study limitations existed in the four subgroups, whereas the heterogeneity was prominent only in the apixaban subgroup. Dose-response brought about an upgrade in all NOACs.

Meta-Regressions and Sensitivity Analyses
We examined several important and common covariates concerning the study outcomes. As shown in Table 3, the 2-sided p-value was > 0.05 for age, female ratio, and publication year, which indicated that no covariate showed a statistically significant association in the meta-regression analyses. The meta-regressions of the log hazard ratio on age, female ratio, and publication year are shown in Figures S1-S3. Sensitivity analyses revealed that the direction of the results did not change substantially. For example, when we excluded the study by Lee [54] (the study which carried the most weight) from our analysis, the HR remained statistically significant (HR: 0.572; 95% CI, 0.394 to 0.831; p = 0.003), and Egger's test showed a 2-tailed p-value of 0.8930 (no obvious publication bias). This analysis verified the consistency of the lower GIB risk with NOACs than with VKAs.

Discussion
This study was the first meta-analysis investigating the GIB risk associated with NOACs in Asian AF patients. It highlighted the real-world situation of GIB resulting from NOAC use in Asia. Our study recruited more than 200,000 patients. We used rigorous article appraisal tools, namely ROB 2.0 for RCTs and ROBINS-I for retrospective studies.
According to this study, NOACs overall carry a lower risk of GIB than conventional VKAs. Among them, apixaban seemed to carry the lowest risk of GIB compared with VKAs, though we could not evaluate the publication bias due to the limited number of enrolled studies. Rivaroxaban had the highest risk of GIB in the current analyses.
We set the primary outcome as "any" GIB instead of "major" GIB because clinical decision making is often affected by any GIB signs related to NOACs and VKAs. A major GIB analysis was also performed, and the result was similar to that for any GIB (Figure S4). Physicians usually withhold NOACs or VKAs whenever any GIB is suspected, not only in cases of major GIB. Therefore, our study was close to real-world circumstances. Additionally, we excluded the studies which probably used an identical database or the same population to increase the validity of the meta-analysis. We recruited only Asian patients to emphasize the clinical practicality when physicians prescribe NOACs.
Our study revealed the GI safety of NOACs versus VKAs and presented the individual NOACs versus VKAs for the Asian AF population. Previous systematic reviews and meta-analyses reported mainly on non-Asians and did not separate AF from the other diseases that also require NOAC treatment. One systematic review and meta-analysis revealed a similar risk of major GIB between NOACs and conventional anticoagulants [55]. However, it included patients with AF and venous thromboembolism (VTE), and the AF patients were almost all non-Asians. Another systematic review and meta-analysis enrolling data from RCTs and real-world studies reported no significant difference in the risk of major GIB between patients receiving NOACs and those receiving conventional anticoagulants. Rivaroxaban users had a 39% increase in the risk of major GIB [56]. However, the recruited patients were nearly all from non-Asian regions. Furthermore, that analysis did not focus on the AF population and did not consider the dosage differences between the enrolled studies. Another large-scale network meta-analysis showed that apixaban and edoxaban had the most favorable major GIB safety profile, while rivaroxaban and dabigatran were the least safe [57]. Although the primary outcome was similar to ours and the population was large, it did not focus on the Asian AF population, which might introduce selection bias. We used HRs, a more precise statistical measure that can represent the instantaneous risk over the study period or some subset thereof. HRs suffer somewhat less from selection bias concerning the endpoints chosen and can indicate the risks that occur before the endpoint. Therefore, our study could help clinicians choose among NOACs and give patients medical advice based on real GIB risk data.
In contrast to other areas of the world, NOACs are beneficial for the Asian population and result in less GIB risk. In the non-Asian population, the use of NOACs seems to carry a higher risk of GIB than in the Asian population. Holster et al. revealed an increased risk of GIB among NOAC users compared with standard care (pooled OR = 1.45), although significant heterogeneity existed regarding the choice of drugs and the indications for anticoagulation [11]. Another meta-analysis, recruiting mainly studies from the USA, New Zealand, and Europe, revealed a slightly higher risk of GIB with dabigatran compared with VKAs. In contrast, no significant difference was found between rivaroxaban and VKAs for GIB risk [58]. Another meta-analysis showed that rivaroxaban, high-dose dabigatran, and edoxaban should not be prescribed to patients with high GIB risk [59]. However, this study did not solely enroll Asians.
We showed that NOACs overall performed better than VKAs with respect to GIB in the Asian AF population. Previously, a scoring system named "SAMe-TT2R2" was shown to predict the quality of anticoagulation control among patients with AF on VKAs [60]. In Chan's study, the time in therapeutic range (TTR) decreased progressively with increasing SAMe-TT2R2 score (p: 0.016). When the cut-off value of the SAMe-TT2R2 score was set at 2, the sensitivity and specificity to predict TTR < 70% were 85.7% and 17.8%, respectively [61]. In the Chinese AF patients, the SAMe-TT2R2 score has a good correlation with TTR. For example, an Asian woman has a baseline SAMe-TT2R2 score of at least three, which is already high. The resulting low TTR could lead to VKA-related GIB. Therefore, NOACs are a better choice for Asians than VKAs.
We recruited two RCTs in this meta-analysis. Only one enrolled RCT showed some concerns regarding bias due to deviations from intended interventions, because the selected group using NOACs might have had less GIB risk. It also showed some concerns regarding bias in the selection of the reported result. Nevertheless, these two RCTs were very important for this study because there were few RCTs about GIB with NOACs in Asia. Although RCTs have advantages in the GRADE system, they may not reflect the real situation of the AF population in Asia. First, RCTs with limited follow-up can potentially underestimate the long-term benefits of treatment and may fail to detect delayed hazards. A post-trial follow-up of RCTs, meaning extended follow-up starting after the end of the scheduled period of the original trial, is needed. It is essential not only to define the impact of a long-term intervention but also to ascertain the safety profile. Moreover, potential hazards may not be obvious during the trial follow-up [62]. Second, RCTs usually pay attention to major GIB only, which might lead to an underestimation of all GIB. Third, we need real-world data to distinguish the GIB risk of different NOACs because it is currently impossible to perform head-to-head RCTs. In addition, the Asian AF patients only accounted for about 10% of the patients in the pivotal RCTs [54]. Therefore, we need to recruit post-marketing studies and retrospective observational studies for more accurate data. Besides, physicians might avoid prescribing NOACs for patients at high risk of GIB in real-world clinical conditions. Our study contained two RCTs and four retrospective studies, which was close to real-world practice. They were also strictly evaluated by the current appraisal tools from the Cochrane system. There were still some biases in our enrolled retrospective studies. They had a serious risk of bias in the selection of participants into the study, which was probably due to imprecise ICD codes for the diagnoses in the databases. There was a low risk of bias in the classification of interventions and of bias due to missing data because the clinical setting was prominent when GIB happened and the databases were intact in several Asian countries/regions (Taiwan, Korea, Japan, China/Hong Kong, etc.). A serious risk of bias due to deviations from intended interventions originated from unseen biases such as the methods of study design, the patients' lifestyle and eating habits, body mass index, and alcohol/betel nut/smoking. A serious risk of bias in the measurement of outcomes arose because clinical conditions such as bloody sputum or food digestion color were sometimes mistaken for GIB. A moderate risk of bias in the selection of the reported result was assigned because some negative results might not have been reported. Definite conclusions could not be based on these studies alone.
NOACs, mainly rivaroxaban and dabigatran, had been considered to carry a higher risk of GIB. However, the apixaban and edoxaban observational studies redefined the GIB risk. One study revealed that non-major bleeding (including GIB) was substantially less frequent with apixaban than with warfarin [69]. The first head-to-head Korean study compared the effectiveness and safety of rivaroxaban and edoxaban and showed that edoxaban had a trend toward less GIB [70]. The results from these two studies were similar to ours. However, we still need more observational studies from other countries in Asia to establish the GI safety profile of NOACs in the future.
Our study had some limitations. First, not all Asian countries were included, and the results could not be applied to the whole Asian population. Second, head-to-head studies of apixaban and edoxaban are still lacking because these drugs have been on the market for a shorter time than dabigatran and rivaroxaban. Third, we calculated the HR of GIB for different NOACs compared with VKAs, but our enrolled studies did not uniformly use the same definition of a GIB event and did not describe the source of GIB at all. Finally, we did not focus on meta-analyses of the different doses of NOACs versus VKAs for GIB risk because few studies addressed this concern. Among our enrolled articles, only Yamashita et al. [49] and Chan et al. [51] had analyzed the GIB risk of standard-dose and low-dose NOACs compared with VKAs. The result of these meta-analyses is shown in Figure S5, which suggested that low-dose NOACs were significantly associated with a lower risk of GIB than standard-dose NOACs when each was compared with VKAs.

Conclusions
This meta-analysis revealed that NOACs could cause less GIB risk than VKAs. Among the NOACs compared with VKAs, apixaban was associated with the least risk of GIB. We need further comparative studies of different NOACs to confirm which NOAC has the best GI safety for the Asian AF patients and to determine the best dosage regimen of different NOACs.