Verification of the “Upward Variation in the Reporting Odds Ratio Scores” to Detect the Signals of Drug–Drug Interactions

The reporting odds ratio (ROR) is easy to calculate, and it has been used in several studies because the “upward variation of ROR score” may speed up the detection of drug–drug interaction signals. However, since the validity of this detection method is unknown, this study followed previous studies to investigate its detection trend. The statistical models (the Ω shrinkage measure and the “upward variation of ROR score”) were compared using a verification dataset created from the Japanese Adverse Drug Event Report database (JADER). The drugs registered as “suspect drugs” in the verification dataset were the drugs investigated, and the target adverse event was Stevens–Johnson syndrome (SJS), as in previous studies. Of 3924 pairs that reported SJS, the numbers of positive signals detected by the Ω shrinkage measure and by the “upward variation of ROR score” (Model 1, the Susuta Model, and Model 2) were 712, 2112, 1758, and 637, respectively. Furthermore, 1239 positive signals were detected when the Haldane–Anscombe 1/2 correction was applied to Model 2, the statistical model that showed the most conservative detection trend among the ROR-based models. This result indicated the instability of the positive signals detected by Model 2. ROR scores, which are based on frequentist statistics, are easily inflated; thus, using the “upward variation of ROR scores” to search for drug–drug interaction signals increases the likelihood of false-positive signal detection. Consequently, the active use of the “upward variation of ROR scores” is not recommended, given the existence of the Ω shrinkage measure, which shows a conservative detection trend.


Introduction
To ensure the proper use of drugs, it is important to understand the related adverse events. However, pre-marketing randomized clinical trials focus on establishing the safety and efficacy of a single drug, rather than investigating drug-drug interactions. Therefore, patients using other drugs along with the drug being studied in a clinical trial are excluded from an investigation. However, unlike pre-marketing studies, in actual clinical practice, multiple drugs are generally used for treatment.
Recent reports have estimated that drug-drug interactions account for approximately 30% of unexpected adverse events [1]. Considering the numerous reports on polypharmacy in treatment in recent years [2][3][4][5][6], early identification of adverse events that may be caused by drug-drug interactions is an important issue that should be addressed.
Spontaneous reporting systems, which play an important role in pharmacovigilance, are a source of information for the detection of previously unknown adverse events not identified in clinical trials, including adverse events caused by drugs in post-marketing.
Using spontaneous reporting systems, there are several reports of safety assessments that reflect real-world use in specific populations and clinical practice. However, the databases used in spontaneous reporting systems contain only cases of adverse events caused by the use of drugs and do not include the number of users of the drugs; therefore, the incidence of adverse events cannot be calculated. Thus, instead of incidence, the disproportionality analysis signals have been used to search for unknown adverse events [7][8][9]. The disproportionality analysis focuses on differences in the proportion of adverse event reports. If the reporting rate of the medicinal product of interest is high compared to the average reporting rate for all other medicinal products, this indicates that "the medicinal product and the adverse event have an association" [10]. For this analysis, a number of algorithms have also been reported to search for signals of drug-drug interactions [11,12]. Among these, the Ω shrinkage measure [13][14][15], proposed by Noren et al., is used by the World Health Organization Uppsala Monitoring Center (WHO-UMC), and previous studies have shown that it has the most conservative signal detection trend among signal detection methods based on frequentist statistics [16]. This study defined a conservative signal detection algorithm as one that detects few signals that are specific to that detection algorithm, and many signals that are common to other algorithms.
Surprisingly, however, only a few papers have used the Ω shrinkage measure [17]; rather, several reports have used a detection method that extends the reporting odds ratio (ROR) score [18][19][20]. In addition, one report evaluated the ROR score of concomitant use [21]. However, the validity of the analysis method using the ROR score has been questioned by Kuss et al., and it was discussed together with the results obtained using another detection algorithm [22].
The ROR is an algorithm used by the Pharmaceuticals and Medical Devices Agency (PMDA) in Japan and the Pharmacovigilance Center (Lareb) in the Netherlands, which generally searches for adverse events caused by a single drug rather than drug-drug interactions [23].
A detection method that expands the ROR score evaluates a combination as having an increased risk of adverse events if the ROR score for the combination of drugs is higher than the ROR score for a single drug (i.e., if there is an upward variation in the ROR score). Although Susuta et al. were the first to propose a detection method that uses the "upward variation in the ROR score" as the signal for drug-drug interactions in Japan [18], the proposed cutoff values of the scores and the comparison with other methods for detecting signals of drug-drug interactions have not been sufficiently verified, and the validity of the signals detected by this analysis method is unknown. Therefore, in this study, we aimed to verify an analysis method that uses the "upward variation in the ROR scores" as the signal for drug-drug interactions, referring to similar previous studies that evaluated the detection tendency of search algorithms [16,24,25,26].

Data Sources
The dataset used for validation was the same dataset used in our previous study [16]. The dataset was created from the Japanese Adverse Drug Event Report database (JADER) data from the first quarter of 2004 to the fourth quarter of 2015. The JADER consists of four comma-separated values (csv) files as data tables: DEMO.csv (patient information), DRUG.csv (medicine information), HIST.csv (patient past history), and REAC.csv (AE event information). These files were linked by identification numbers, and the four csv files were combined using the identification numbers to create a database for analysis. In this study, we did not use patient information and past history, because we were only investigating the differences in detection trends between algorithms. Therefore, we removed this information when creating the verification dataset.
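The linking step described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the column names (case_id, drug_name, role, event) and the miniature data are hypothetical, and the real JADER files use their own column names. Only the drug and reaction tables are joined, since patient information and past history were not used.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature extracts standing in for DRUG.csv and REAC.csv.
DRUG_CSV = """case_id,drug_name,role
1,drugA,suspect
1,drugB,suspect
2,drugA,suspect
2,drugC,concomitant
3,drugB,suspect
"""
REAC_CSV = """case_id,event
1,Stevens-Johnson syndrome
2,Rash
3,Stevens-Johnson syndrome
"""


def build_cases(drug_file, reac_file, target_event):
    """Join both tables on case_id; return {case_id: (suspect_drugs, has_target_event)}."""
    drugs = defaultdict(set)
    for row in csv.DictReader(drug_file):
        if row["role"] == "suspect":  # keep suspect drugs only
            drugs[row["case_id"]].add(row["drug_name"])
    events = defaultdict(set)
    for row in csv.DictReader(reac_file):
        events[row["case_id"]].add(row["event"])
    # Inner join: keep only identifiers present in both tables.
    return {
        cid: (frozenset(drugs[cid]), target_event in events[cid])
        for cid in drugs.keys() & events.keys()
    }


cases = build_cases(io.StringIO(DRUG_CSV), io.StringIO(REAC_CSV),
                    "Stevens-Johnson syndrome")
```

Each case then carries the set of suspect drugs and a flag for the target adverse event, which is the information needed to fill the contingency tables used by the detection models.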
The Japanese authority, the PMDA, which owns these data, does not permit sharing the data directly. The latest version of the files can be accessed directly here: http://www.info.pmda.go.jp/fukusayoudb/CsvDownload.jsp (accessed on 10 August 2021; in Japanese only).

Targeted Drugs and Adverse Events
The drugs registered as "suspected drugs" in the verification dataset were the drugs investigated. As in previous studies [16,18,24,25,26], the only adverse event targeted in this signal search was the preferred term "Stevens-Johnson syndrome" (SJS) in the Medical Dictionary for Regulatory Activities Japanese version (MedDRA/J), as registered in the verification dataset.

Statistical Models and Criteria
The Ω shrinkage measure, which showed the most conservative signal detection trend in our previous study [16], was used as the control model in this study. The score of the Ω shrinkage measure was calculated from the number of reports (n111) and its expected value (E111), as shown in Table 1, using Equations (1)-(6). The lower limit of the 95% confidence interval (CI) for the Ω shrinkage measure was denoted Ω025, and a positive signal was considered to exist when Ω025 > 0, as previously reported [13].
In these equations, n is the number of reports shown in Table 1; for example, n111 is the number of reports of the target adverse event caused by drug D1 and drug D2, and E111 is its expected value. When f10 < f00, which denotes no risk of an adverse event caused by drug D1, the most sensible estimator is g11 = max(f00, f01), and vice versa when f01 < f00.
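For reference, the Ω shrinkage measure as commonly formulated following Norén et al. [13] can be written as below. This is a sketch under the assumption that Equations (1)-(6) follow that standard formulation; the grouping and order shown here are not taken from the source, and φ(0.975) denotes the 97.5th percentile of the standard normal distribution (approximately 1.96).

```latex
\begin{aligned}
f_{00} &= \frac{n_{001}}{n_{00+}}, \quad
f_{10} = \frac{n_{101}}{n_{10+}}, \quad
f_{01} = \frac{n_{011}}{n_{01+}}, \quad
f_{11} = \frac{n_{111}}{n_{11+}} \\
g_{11} &= 1 - \frac{1}{\max\left(\frac{f_{00}}{1-f_{00}}, \frac{f_{10}}{1-f_{10}}\right)
        + \max\left(\frac{f_{00}}{1-f_{00}}, \frac{f_{01}}{1-f_{01}}\right)
        - \frac{f_{00}}{1-f_{00}} + 1} \\
E_{111} &= g_{11}\, n_{11+} \\
\Omega &= \log_2 \frac{n_{111} + 0.5}{E_{111} + 0.5} \\
\Omega_{025} &= \Omega - \frac{\phi(0.975)}{\log(2)\sqrt{n_{111}}}
\end{aligned}
```

When f10 < f00, the first maximum equals f00/(1 − f00), so g11 reduces to max(f00, f01), which matches the special case described above.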

Table 1.

| | Target AE | Other AEs | Total |
|---|---|---|---|
| Drug D1 and drug D2 | n111 | n110 | n11+ |
| Drug D1 without drug D2 | n101 | n100 | n10+ |
| Drug D2 without drug D1 | n011 | n010 | n01+ |
| Neither drug D1 nor drug D2 | n001 | n000 | n00+ |
| Total | n++1 | n++0 | n+++ |

AE: adverse event; n: the number of reports (e.g., n111: the number of reports of the target AE induced by drug D1 and drug D2; n+++: the number of all reports).
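As an illustration of how the counts in Table 1 feed the control model, the following minimal sketch computes Ω and Ω025. It assumes the standard formulation of Norén et al. [13] with a shrinkage constant of 0.5; the function name, argument names, and example counts are hypothetical.

```python
import math


def omega_shrinkage(n111, n11p, n101, n10p, n011, n01p, n001, n00p, alpha=0.5):
    """Return (omega, omega025, e111) from Table 1 counts ("p" stands for "+")."""
    if n111 <= 0:
        raise ValueError("n111 must be positive to compute the interval")

    def odds(f):
        return f / (1.0 - f)  # assumes every reporting rate is below 1

    f00 = n001 / n00p  # rate of the target AE with neither drug
    f10 = n101 / n10p  # rate with drug D1 but not D2
    f01 = n011 / n01p  # rate with drug D2 but not D1
    denom = max(odds(f00), odds(f10)) + max(odds(f00), odds(f01)) - odds(f00) + 1.0
    g11 = 1.0 - 1.0 / denom       # expected rate without interaction
    e111 = g11 * n11p             # expected number of reports
    omega = math.log2((n111 + alpha) / (e111 + alpha))
    # 1.959964 approximates the 97.5th percentile of the standard normal.
    omega025 = omega - 1.959964 / (math.log(2) * math.sqrt(n111))
    return omega, omega025, e111


# Hypothetical counts: the point estimate is positive, but the lower limit is not.
omega, omega025, e111 = omega_shrinkage(10, 100, 50, 1000, 20, 1000, 1000, 100000)
is_signal = omega025 > 0
```

In this example Ω exceeds 0 while Ω025 does not, so no signal is flagged; the shrinkage term and the interval both pull sparse combinations toward a negative result.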

Upward Variation in Reporting Odds Ratio Scores
The ROR scores were calculated using the data from the number of reports shown in Table 2 and Equations (7) and (8). The lower limit of the 95% CI for ROR was ROR 025 , and the upper limit was ROR 975 [23].
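$$\mathrm{ROR} = \frac{N_{11}/N_{10}}{N_{01}/N_{00}} = \frac{N_{11} N_{00}}{N_{10} N_{01}} \tag{7}$$
$$\mathrm{ROR}\ (95\%\ \mathrm{CI}) = e^{\ln(\mathrm{ROR}) \pm 1.96\sqrt{\frac{1}{N_{11}} + \frac{1}{N_{10}} + \frac{1}{N_{01}} + \frac{1}{N_{00}}}} \tag{8}$$
where N is the number of reports shown in Table 2. For example, N11 is the number of reported target adverse events caused by drug D1 and drug D2.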
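A minimal sketch of the ROR and its 95% CI for a single 2×2 table is given below. The cell layout is an assumption, since Table 2 is not reproduced here: n11 and n10 are the target and other adverse events with the drug(s) of interest, and n01 and n00 are the same counts for all other reports. The function name is hypothetical.

```python
import math


def ror_with_ci(n11, n10, n01, n00, z=1.96):
    """Return (ROR, lower 95% limit, upper 95% limit) for a 2x2 table."""
    if min(n11, n10, n01, n00) <= 0:
        # With a zero cell the ROR is 0, infinite, or undefined.
        raise ValueError("all four cells must be positive")
    ror = (n11 * n00) / (n10 * n01)
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)  # SE of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)  # ROR025
    upper = math.exp(math.log(ror) + z * se)  # ROR975
    return ror, lower, upper


# Hypothetical counts for illustration only.
ror, ror025, ror975 = ror_with_ci(20, 80, 100, 9800)
```

The interval is computed on the log scale and exponentiated, so it is asymmetric around the point estimate and widens quickly as any cell count becomes small.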
The signal of drug-drug interactions detected using the ROR has recently been reported in Japan as "the lower limit of the 95% CI of the ROR score when drug D1 and drug D2 are used together (ROR025 drug D1 ∩ drug D2) > 1" and "the ROR score when drug D1 and drug D2 are used together (ROR drug D1 ∩ drug D2) is greater than the ROR score of drug D1 (ROR drug D1) or the ROR score of drug D2 (ROR drug D2), whichever is greater (Model 1)." The ROR score ratio (Model 1) is calculated using Equation (9).
$$\text{ROR score ratio (Model 1)} = \frac{\mathrm{ROR}_{D_1 \cap D_2}}{\max\left(\mathrm{ROR}_{D_1}, \mathrm{ROR}_{D_2}\right)} \tag{9}$$
Susuta et al. proposed "(ROR025 drug D1 ∩ drug D2) > 1" and "ROR score ratio (Model 1) > 2" as signal detection criteria for drug-drug interactions [18]; this was evaluated as the "Susuta model." However, Model 1 and the Susuta model do not account for a possible overlap between the lower limit of the 95% CI of the ROR score for drug D1 and drug D2 together and the upper limit of the 95% CI of the ROR score for drug D1 or drug D2, whichever is greater (Figure 1).

Figure 1. The association between the ROR score ratio (Model 1, Susuta model) and disproportionality score.
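The decision rules of Model 1 and the Susuta model can be sketched as below, assuming the ROR scores and the lower limit for the combination have already been computed. The function names and example values are hypothetical; the cutoffs (ratio > 1 for Model 1, ratio > 2 for the Susuta model) follow the criteria described above.

```python
def model1_ratio(ror_combo, ror_d1, ror_d2):
    """ROR score ratio (Model 1): combination ROR over the larger single-drug ROR."""
    return ror_combo / max(ror_d1, ror_d2)


def is_signal_model1(ror025_combo, ror_combo, ror_d1, ror_d2, cutoff=1.0):
    """Positive when ROR025 of the combination exceeds 1 and the ratio exceeds the cutoff."""
    return ror025_combo > 1.0 and model1_ratio(ror_combo, ror_d1, ror_d2) > cutoff


def is_signal_susuta(ror025_combo, ror_combo, ror_d1, ror_d2):
    """Susuta model: same rule as Model 1 with a ratio cutoff of 2."""
    return is_signal_model1(ror025_combo, ror_combo, ror_d1, ror_d2, cutoff=2.0)


# Hypothetical scores: the ratio is 1.5, so Model 1 flags the pair
# while the stricter Susuta cutoff does not.
flag_model1 = is_signal_model1(1.5, 6.0, 4.0, 2.0)
flag_susuta = is_signal_susuta(1.5, 6.0, 4.0, 2.0)
```

Neither rule looks at the confidence intervals of the single-drug ROR scores, which is the gap that Model 2 is meant to close.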
If such an overlap occurs, a risk signal may not be detected for concomitant use. Therefore, with reference to the interaction signal score (INTSS) [27] and concomitant signal score (CSS) [26], we used the following detection criteria: "(ROR025 drug D1 ∩ drug D2) > 1" and "the lower limit of the 95% CI of the ROR of the combination of drug D1 and drug D2 (ROR025 drug D1 ∩ drug D2) is greater than the upper limit of the 95% CI of the ROR of drug D1 (ROR975 drug D1) or the upper limit of the 95% CI of the ROR of drug D2 (ROR975 drug D2), whichever is greater (Model 2)" (Figure 2, Equation (10)).
$$\text{ROR score ratio (Model 2)} = \frac{\mathrm{ROR}_{025,\,D_1 \cap D_2}}{\max\left(\mathrm{ROR}_{975,\,D_1}, \mathrm{ROR}_{975,\,D_2}\right)} \tag{10}$$
Figure 2. The association between the ROR score ratio (Model 2) and disproportionality score.
However, in the estimation of the ROR, if any one of the four cells in Table 2 is 0, the ROR will be 0 or ∞. In addition, if a marginal total is zero, the ROR cannot be defined. Furthermore, when the number of reports in a cell is small, as seen in a few reports and in this study, the effect of a single case in that cell is very large, making the estimation unstable [28]. The Haldane-Anscombe 1/2 correction, which adds 1/2 to each cell, is known to solve this problem [29]. In this study, among the detection methods (Model 1, Susuta Model, and Model 2) that utilize the "upward variation in the ROR scores", the statistical model that showed the most conservative detection tendency was also analyzed with the Haldane-Anscombe 1/2 correction.
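The following sketch illustrates Model 2 with and without the Haldane-Anscombe 1/2 correction. It assumes that the correction adds 1/2 to every cell before both the ROR and its standard error are computed; the function names and counts are hypothetical, and each table is given as (n11, n10, n01, n00).

```python
import math


def ror_ci(table, correction=0.0, z=1.96):
    """Return (ROR, lower, upper) for a 2x2 table, or None when a cell is zero."""
    a, b, c, d = (x + correction for x in table)
    if min(a, b, c, d) <= 0:
        return None  # ROR is 0, infinite, or undefined
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ror, math.exp(math.log(ror) - z * se), math.exp(math.log(ror) + z * se)


def model2(combo, d1, d2, correction=0.0):
    """Return (ratio, is_signal); ratio is None when any ROR cannot be computed."""
    results = [ror_ci(t, correction) for t in (combo, d1, d2)]
    if any(r is None for r in results):
        return None, False
    lower_combo = results[0][1]                       # ROR025 of the combination
    ratio = lower_combo / max(results[1][2], results[2][2])  # over larger ROR975
    return ratio, (lower_combo > 1.0 and ratio > 1.0)


# Hypothetical pair with a zero cell in the combination table.
combo, d1, d2 = (3, 0, 50, 5000), (10, 200, 43, 4800), (8, 300, 45, 4700)
uncorrected = model2(combo, d1, d2)
corrected = model2(combo, d1, d2, correction=0.5)
```

In this made-up example the uncorrected ratio cannot be computed and the pair is treated as negative, whereas the corrected version returns a ratio above 1 from only three reports, which shows how a correction can turn a non-computable pair into a positive signal.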

Evaluation of Signal Similarity
The signal similarity of each statistical detection method was evaluated using Cohen's kappa coefficient (κ), the proportionate agreement for the positive rating (Ppos), and the proportionate agreement for the negative rating (Pneg), as reported in previous studies [16,24]. Cohen's kappa coefficient and its 95% CI were obtained from Table 3 and Equations (11)-(14) [30].
The values of Nyy, Ny., N.y, Nnn, Nn., N.n, and N.. can be obtained from Table 3.
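The agreement statistics named above can be sketched as follows. The table layout is an assumption, since Table 3 is not reproduced here: n_yy and n_nn count the pairs on which two methods agree (both positive, both negative), and n_yn and n_ny count the disagreements. The standard error uses a simple large-sample approximation, which may differ from the formula the authors used, and the counts are hypothetical.

```python
import math


def agreement(n_yy, n_yn, n_ny, n_nn, z=1.96):
    """Return kappa with its 95% CI, plus positive and negative agreement."""
    total = n_yy + n_yn + n_ny + n_nn                # N..
    row_y, row_n = n_yy + n_yn, n_ny + n_nn          # Ny., Nn.
    col_y, col_n = n_yy + n_ny, n_yn + n_nn          # N.y, N.n
    p_o = (n_yy + n_nn) / total                      # observed agreement
    p_e = (row_y * col_y + row_n * col_n) / total ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Assumed approximation of the standard error of kappa.
    se = math.sqrt(p_o * (1 - p_o) / (total * (1 - p_e) ** 2))
    return {
        "kappa": kappa,
        "ci": (kappa - z * se, kappa + z * se),
        "p_pos": 2 * n_yy / (row_y + col_y),
        "p_neg": 2 * n_nn / (row_n + col_n),
    }


# Hypothetical counts for two detection methods.
result = agreement(n_yy=40, n_yn=10, n_ny=20, n_nn=30)
```

Because negative pairs usually dominate in signal detection, Pneg tends to be high even when the methods disagree on many positive signals, which is why κ and Ppos are reported alongside it.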

$$\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad P_o = \frac{N_{yy} + N_{nn}}{N_{..}}, \qquad P_e = \frac{N_{y.} N_{.y} + N_{n.} N_{.n}}{N_{..}^2}$$
$$95\%\ \text{CI of kappa coefficient} = \kappa \pm 1.96 \times \mathrm{SE}(\kappa)$$
where SE(κ) is the standard error of κ. Ppos and Pneg can be obtained from Equations (15) and (16) and Table 3 [27].
$$P_{pos} = \frac{2N_{yy}}{N_{y.} + N_{.y}} \tag{15}$$
$$P_{neg} = \frac{2N_{nn}}{N_{n.} + N_{.n}} \tag{16}$$

Results
Table 4 shows the number of reports (n111) and the number of signals detected by each of the Ω shrinkage measure, Model 1, the Susuta model, Model 2, and Model 2 (Haldane-Anscombe 1/2 correction).

Table 4. Columns: Statistical models; Signal (Y/N); Number (%) of combinations.
However, the similarity of the detection trend of Model 2 decreased when the Haldane-Anscombe 1/2 correction was applied (κ: 0.476; 95% CI: 0.459-0.492; Ppos: 0.597; Pneg: 0.867). Figure 3 shows the relationship between the number of reports (n111), the ROR score (ROR drug D1 ∩ drug D2), and the ROR score ratio when drug D1 and drug D2 were used in combination.
In cases where the number of reports (N11 = [n111]) was less than 10, the ROR drug D1 ∩ drug D2 often exceeded 50, and the maximum score was over 700. The ROR score ratio (Model 1) was also inflated by the inflation of the ROR drug D1 ∩ drug D2; its maximum was over 450 (Figure 3).

Discussion
In this study, we examined a method of using the "upward variation in the ROR score" as a signal of drug-drug interactions. Strictly speaking, detection trends should be compared for all adverse events registered in the database. Unfortunately, even with fast and powerful computers, calculating signal scores for all combinations of multiple drugs and adverse events can be expected to take an enormous amount of time; thus, targeting all combinations was not realistic. Therefore, in this study, as in previous studies [16,18,24], the target adverse event was SJS.
The Ω shrinkage measure detected 712 positive signals [16]. The detection method whose detection tendency was most similar to that of the Ω shrinkage measure was Model 2 (Table 5), with 392 positive signals in common. However, even in Model 2, which showed the most conservative detection tendency among the methods that use the "upward variation in ROR score", there were 1160 pairs for which the ROR score ratio could not be calculated because at least one of the four cells in Table 2 was zero (Table 4). The problem of negative signals arising from the inability to calculate this score was solved by the Haldane-Anscombe 1/2 correction; however, the number of positive signals increased by 602 compared with that before the correction, for a total of 1239 (Table 4). This result indicated the instability of the positive signals detected by Model 2.
Furthermore, even Model 2, which had the highest similarity to the Ω shrinkage measure in this study, had a similarity lower than that between the Ω shrinkage measure and the Chi-square model [31] in a previous study using the same verification database and the same targeted adverse event [16].
The signals obtained from JADER, the database used in this study, require verification to confirm that they are true adverse events, and true data are needed to evaluate the validity of the detection results. However, it is not possible to prepare true data, including the data of "unknown" adverse events.
In the verification of a signal detection method for a single drug, Szarfman et al. set the information in the medical package inserts as the "true" data and evaluated the method using receiver operating characteristic (ROC) curves with different cutoff values for signal scores [32]. However, as Watanabe et al. pointed out, it is unclear whether the information in the package insert is the only "true" data, and this verification method has its limitations [33].
In fact, considering that signal detection methods are designed to search for "unknown" adverse events, it is not appropriate to evaluate the performance of detection methods using only "known" information. Thus, following the combinations of drug-drug interactions described in the medical package insert, as in Kubota et al. and our previous studies [16,24], we only compared the detection trends of each statistical model in this study. This study is not affected by patient background, as it only shows the difference between the calculated results of each statistical model and the interpretation of the signal scores. This is the same as previous studies [16,18,24].
However, this limitation of not being able to provide "true" data makes it difficult to determine whether the signal detection results from the detection method using the "upward variation in ROR score" are overestimated or whether the signal detection results from the Ω shrinkage measure are underestimated.
This study was conducted under limited conditions and may require further investigation using simulation data. However, unlike the Bayesian confidence propagation neural network (BCPNN) [34] and the empirical Bayes geometric mean (EBGM) [35], which are based on Bayesian statistical methods, the ROR, which is based on frequentist statistical methods, is prone to signal score inflation when the number of reports is small [36], leading to unstable detection results [37].
In general, the number of reports for the combination of two drugs (drug D1 ∩ drug D2) will be less than the number of reports for single drug use (drug D1 or drug D2); therefore, the ROR score (ROR drug D1 ∩ drug D2) is likely to be inflated. In fact, in this study, there were many cases of inflation of ROR drug D1 ∩ drug D2, as shown in Figure 3. The ROR drug D1 ∩ drug D2 is the numerator in the equation of Model 1 and the Susuta model, and such signal score inflation may make it easier to detect false-positive signals of drug-drug interactions.
Further, 95% CIs are known to be wider when the number of reports is small. In this study, the methods that did not take into account the overlap between the 95% CI of the signal score for the combination of two drugs (drug D1 ∩ drug D2) and the 95% CI of the signal score for single drug use (drug D1 or drug D2) (Model 1 and the Susuta model) were more likely to detect false-positive signals than Model 2, which took the overlap into account.
The results indicated that when the number of reports was small, as in the case of drug-drug interactions, it was important to consider the overlap between the 95% CI of the signal score for the combination of two drugs (drug D 1 ∩ drug D 2 ) and the 95% CI of the signal score for single drug use (drug D 1 or drug D 2 ) when calculating the ROR score ratio.
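The effect of a small number of reports can be illustrated with a short sketch. The two tables below are hypothetical and are chosen so that the point estimate of the ROR is identical; only the number of reports for the drug(s) of interest differs. Each table is given as (n11, n10, n01, n00), and the function names are illustrative.

```python
import math


def ror(n11, n10, n01, n00):
    """Point estimate of the reporting odds ratio."""
    return (n11 * n00) / (n10 * n01)


def ci_fold_width(n11, n10, n01, n00, z=1.96):
    """Ratio of the upper to the lower 95% CI limit of the ROR."""
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return math.exp(2 * z * se)


sparse = (2, 50, 500, 50000)      # few reports, as for a drug combination
dense = (200, 5000, 500, 50000)   # many reports, as for a single drug

same_estimate = ror(*sparse) == ror(*dense)
width_sparse = ci_fold_width(*sparse)
width_dense = ci_fold_width(*dense)
# Effect of one additional report in the sparse table.
jump = ror(3, 50, 500, 50000) / ror(*sparse)
```

With identical point estimates, the sparse table yields a much wider interval, and a single extra report multiplies its ROR by 1.5. A ratio built only from point estimates ignores this uncertainty, whereas comparing the lower limit for the combination with the upper limit for the single drugs takes it into account.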
The spontaneous reporting systems, which are used for disproportionality analysis, consist only of spontaneously "reported" cases and, naturally, do not include those that occur but are not reported. Additionally, reports are known to contain a variety of biases [38][39][40]. Therefore, the calculated signal score is also affected by the biases. Spontaneous reports often lack information on concomitant medications, leading to underestimation of the drug-drug interaction signal score in some cases. Thus, the use of any statistical analysis method cannot overcome the inherent qualitative and quantitative limitations of spontaneous reporting systems [37].
The ROR is simple to calculate, and utilizing the "upward variation in the ROR score" may speed up the detection of drug-drug interaction signals. However, even Model 2, which showed the most conservative detection tendency among the detection methods utilizing the "upward variation in the ROR score", did not show stable signal detection results because of the small number of reports, making it difficult to rule out the overestimation of positive signals of drug-drug interactions, as in Model 1 and the Susuta model.
It is known that many points require attention when analyzing a spontaneous reporting database, and various analysis algorithms have been proposed [10]. The BCPNN and the EBGM, which are based on Bayesian statistical models, were developed to avoid the signal score inflation that produces a large number of false-positive signals in the detection of single-drug signals [37]. Considering this history, and given that not only the Ω shrinkage measure used by the WHO-UMC but also several alternative methods are available for detecting signals of drug-drug interactions [10,11], there is no reason to actively recommend the use of the "upward variation in the ROR score", which is more likely to detect false-positive signals.

Conclusions
Recently, many patients have been using multiple drugs concomitantly, and in order to use drugs appropriately, it is necessary to screen for safety signals not only of single drugs but also of drug-drug interactions. In this study, Model 2, which corrects the problems contained in Model 1 by referring to INTSS and CSS, was also examined. However, ROR scores, which are based on frequentist statistics, are easily inflated; thus, the use of the "upward variation of ROR scores" in any of these statistical models to search for drug-drug interaction signals increases the likelihood of false-positive signal detection. Although some researchers have used the "upward variation of ROR scores", the active use of this algorithm is not recommended, because the Ω shrinkage measure, which shows a conservative detection trend, is available. To reduce false-positive signals, appropriate detection algorithms should be selected.