1. Introduction
In the hyperacute treatment of patients with acute ischemic stroke (AIS), the prompt identification of emergent large vessel occlusion (ELVO) and endovascular treatment (EVT) for rapid recanalization are key to improving outcomes [
1]. Thus, thrombectomy-capable centers optimize their hyperacute stroke imaging protocols to identify ELVO [
2]. However, not all patients attend thrombectomy-capable centers, and many present to primary stroke centers or primary hospitals [
3].
The reported median onset-to-revascularization time for ELVO patients in the United States was shorter for direct versus transfer patients, resulting in lower rates of functional independence in the transfer group [
4]. Good functional outcomes may rely on shorter transfer times and faster reperfusion in the interhospital transfer system [
5,
6]. However, there are multiple barriers to this, such as imaging acquisition time, imaging protocol differences across primary centers, and inadequate care from primary physicians or emergency medical technicians. As non-contrast computed tomography (NCCT) is routinely performed in nearly all hospitals for patients suspected of stroke, an artificial intelligence (AI)-based clinical decision support system that can identify ELVO and EVT candidacy based on NCCT images may help streamline interhospital transfers. Furthermore, NCCT may be performed even if the patient only has minor symptoms, even in thrombectomy-capable centers. ELVO identification based on NCCT may also facilitate the streamlining of in-hospital processes.
In this clinical trial, we aimed to validate diagnostic performance improvement with the assistance of an AI-based clinical decision support system that identifies ELVO and EVT candidacy based on NCCT (Heuron ELVO) by clinicians of varying specialties and expertise. In particular, this trial was designed to validate the performance and safety of Heuron ELVO for its approval by the South Korean Ministry of Food and Drug Safety.
2. Methods
2.1. Study Ethics
The protocols for this clinical trial were approved by the South Korean Ministry of Food and Drug Safety (KFDS-1283, 2021). Data collection and evaluation were approved by the Institutional Review Board of Gachon University Gil Medical Center (GCIRB2022-002). The ethical standards of the 1964 Declaration of Helsinki and its later amendments were implemented, and the need for written informed consent was waived owing to the retrospective data collection method used.
2.2. Heuron ELVO for the Identification of Emergent Large Vessel Occlusion
Heuron ELVO is a computer-aided triage and notification software that analyzes NCCT brain images. It is intended to assist hospital networks and trained stroke experts in workflow triage by flagging findings suggestive of ELVO in the internal carotid artery (ICA) or middle cerebral artery (MCA) M1 on NCCT images. To acquire these results, CT image processing and Deep Learning-based triage models are applied.
At least three imaging markers can be identified as indicators of emergent large vessel occlusion on NCCT images. The most representative imaging marker is the hyperdense artery sign (HAS), a phenomenon in which an artery is blocked by a thrombus and appears hyperdense on NCCT images. Although it is the most representative imaging marker of MCA occlusion on NCCT, its presence depends on the slice thickness of the NCCT images. Thus, HAS alone may have low sensitivity when classifying patients in NCCT [
7]. A second ELVO imaging marker that can be identified on NCCT is eyeball deviation, which deviates in the direction of the hemisphere of the ELVO. A study that classified patients with ELVO using eyeball deviation observed on CT images as a single indicator reported a sensitivity of 71% and a specificity of 77.5% [
8]. A third potential imaging marker is early ischemic change (EIC) following ELVO. EIC represents ischemic changes in brain tissues identified in NCCT images taken at an early stage after stroke onset and is the basis of the Alberta Stroke Program Early CT Score (ASPECTS) [
9]. While EIC is not a specific indicator of ELVO, its presence may indicate ELVO.
NCCT images of 2184 cases (ELVO: 1202 cases, non-ELVO: 982 cases) were used as the learning model for ELVO classification. The total cases were split in an 8:1:1 ratio for training, validation, and testing, respectively. In both the training and validation datasets for this model, any cases with the same registration ID from the same institution were excluded. Among the imaging biomarkers, HAS and EIC were identified by stroke experts and trained as ELVO imaging biomarkers. Eye deviation was independently identified and included in the learning model by detecting the deviation of the crystalline lens on NCCT.
Figure 1A shows the Heuron ELVO process from input to output. After the preprocessed image was delivered to the convolutional neural network (CNN) model, Heuron ELVO identified the presence of eyeball deviation, HAS, EIC, and old infarct (OI). We employed class activation mapping to visualize region-specific feature contributions in NCCT scans, particularly for identifying occlusion patterns. These methods allow us to validate whether model decisions correlated with the radiological features of large vessel occlusions (e.g., hyperdense artery signs or early ischemic change). Subsequently, the features identified using the Heuron-2D CNN models were concatenated and input into the Heuron recurrent neural network model to classify whether ELVO was present, and a probability value between 0 and 1 was calculated. We applied Platt scaling to the validation set for probability calibration. The decision threshold was determined by selecting the point with the highest Youden’s J statistic (J = sensitivity + specificity − 1J = sensitivity + specificity − 1) on the ROC curve (Shown in
Figure S1 and
Table S1).
Figure 1B shows an example of a Heuron ELVO output presented to clinicians.
2.3. The Design of the Clinical Trial
This study was a single-center, multi-reader, blinded, retrospective clinical test performed to evaluate the effectiveness of a Deep Learning solution that automatically classifies ELVO patients based on brain NCCT. The primary endpoint was the sensitivity and specificity of ELVO prediction following Heuron ELVO assistance compared to that of unassisted NCCT image analysis only. The sample size for primary endpoint verification was calculated based on the results of previous studies and the results of Heuron ELVO internal testing. In the studies by McCluskey et al. (2019) [
8] and Lim et al. (2018) [
10], sensitivity and specificity were approximately 69% and 80%, respectively, when clinicians identified ELVO using only NCCT images. In addition, the sensitivity and specificity of the Heuron ELVO in the internal test were 87.1% and 88.8%, respectively. The sample size was calculated with the significance level of the two-tailed test set to 0.05, with 90% power after correction according to multiple tests, and based on the sensitivity and specificity values mentioned above. Accordingly, the number of cases in the ELVO group was 101, and that in the non-ELVO group was 329. Accounting for a dropout rate of 10%, 113 and 366 patients were recruited in the ELVO and non-ELVO groups, respectively.
2.4. Study Population and Data Collection
This trial was conducted at a tertiary hospital (Gachon University Gil Medical Center). When a patient suspected of stroke presents to the emergency department, NCCT is performed first. CT angiography and perfusion imaging are then performed in patients who present within 6 h of symptom onset without hemorrhage and who are likely to benefit from reperfusion therapies. The current study was a retrospective imaging-based clinical trial. Specifically, patients aged ≥19 years presenting to the emergency department between January 2019 and January 2022 and sequentially undergoing NCCT, CT angiography, and perfusion imaging due to suspected acute ischemic stroke were retrospectively grouped based on their ELVO status. The ELVO group included patients in whom ELVO was confirmed using CT angiography, perfusion, and diffusion-weighted MRI, while the non-ELVO group included patients in whom ELVO was not present. Cases were consecutively selected in reverse chronological order for each group until the target number was obtained. During this process, patients were excluded if cerebral infarction occurred elsewhere other than in the anterior circulation, if there was a lack of information, such as metadata or electronic medical records during automatic anonymization or human errors to record, or if there was severe noise in the CT images (
Figure 2). All demographic information was anonymized, and an independent ID was provided for this study. In the CT images of registered cases, slight differences were observed in the scanning model of the device, scanning protocol, or parameters, depending on the scan date. However, all were obtained using devices from the same vendor (Siemens Healthineers, Erlangen, Germany), and the slice thickness varied from 3 to 5 mm.
2.5. The Generation of the ELVO Reference Standard
Two stroke experts with >10 years of clinical experience generated the ELVO reference standard based on non-contrast CT images, CT angiography, perfusion imaging, and diffusion-weighted MRI. The hemisphere was identified if a patient was diagnosed with a large ELVO. The criteria for ELVO positivity were limited to cases where vascular occlusion from the ICA terminus to the distal M1 segment of the MCA was confirmed on CT angiography, and an acute lesion (infarction) corresponding to the occlusion was identified on perfusion parameter maps such as CBF, CBV, or MTT, or on MR diffusion-weighted images. Core volume criteria or mismatch ratios were not utilized for its classification. If the opinions of the two specialists differed, a consensus was reached through discussion between them.
2.6. ELVO Classification Based on Imaging Evaluations
This clinical trial aimed to assess the effectiveness of Heuron ELVO in supporting the clinicians’ identification of ELVO. Therefore, the observer performance method used in previous studies was employed to verify its effectiveness [
11,
12]. Four readers (one board-certified neurosurgeon, one board-certified emergency medicine specialist, and two neurology residents) performed imaging evaluations. As the principal investigator, an individual senior neurosurgeon gathered the readings of the other investigators and made a final decision based on a majority vote. Such a design was based on the Artificial Intelligence Medical Device Clinical Trial Method Design Guidelines announced by the Korean Ministry of Food and Drug Safety in 2022. The assessment was conducted in two stages. First, an unassisted evaluation was performed by the physicians’ visual grading. After a 2-week washout period, a Heuron ELVO-assisted evaluation was performed by the same investigators using the Heuron ELVO results as a reference, identifying the AI-based presence of ELVO and the involved hemisphere if identified as positive. During the reading, each reader identified the level of suspicion for ELVO using a score of 0–4 (0, no suspicion at all; 1, uncertain but not suspicious; 2, slightly suspicious; 3, suspicious; and 4, highly suspicious). When analyzing the primary endpoint, the 0–1 levels of suspicion were treated as “negative,” and levels “2–4” were treated as positive [
13]. The primary investigator made the final decision based on the investigators’ majority-vote level of suspicion. When the results were split into a 2:2 ratio, the primary investigator decided on the final consensus after a personal image review. Both the primary investigator and other readers performed the image reading uniformly. For unassisted reading, they independently assessed non-contrast CT images, and during AI-assisted reading, they referenced AI-provided results while evaluating non-contrast CT scans.
2.7. Statistical Analysis
Demographic information such as sex and age was compared between the groups. The chi-square test was used for categorical variables, and an independent t-test was used for continuous variables. The statistical analysis of demographics was performed using IBM SPSS Statistics, Version 28 (IBM Corp., Armonk, NY, USA).
The primary endpoint was the sensitivity and specificity of ELVO prediction after Heuron ELVO assistance compared to when only NCCT images were analyzed. Four different analyses were used as secondary endpoints. First, the accuracy of the ELVO classification between the assisted and unassisted analyses was compared. Second, the receiver operating characteristic (ROC) curve and area under the curve (AUROC) values of the classification results were compared between analyses. Third, the differences in sensitivity and specificity between assisted and unassisted analyses for each reader were compared. Finally, the performance of Heuron ELVO in predicting ELVO (reference standard) was validated using sensitivity and specificity analyses.
Significant differences in the indicators between readings analyzed as primary and secondary endpoints were compared using McNemar’s Test (implemented in MedCalc version 20.111, MedCalc Software Ltd., Ostend, Belgium), such as sensitivity, specificity, accuracy, and AUROC. Confidence intervals were derived using the Clopper–Pearson confidence interval method. In general, an AUROC of 0.5 was thought to suggest no discrimination (i.e., the ability to diagnose patients with and without the disease or condition based on the test), 0.7–0.8 was considered acceptable, 0.8–0.9 was considered excellent, and >0.9 was considered outstanding [
14]. The interrater reliability of the four readers was analyzed and compared between assisted and unassisted readings using Fleiss’ kappa method [
15]. The interpretation of kappa values was as follows: κ < 0 as poor, 0.01–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement [
16]. The interrater reliability was analyzed using the Fleiss toolbox (version 2.0.0.0,
https://matlab.mathworks.com/open/fileexchange/v1?id=15426 (accessed on 3 August 2023)), which is based on MATLAB (R2023b Update 3, Mathworks, Natick, MA, USA). Please refer to the
Supplementary Materials for the formulas of metrics used in the statistical analysis (shown in
Table S2).
4. Discussion
In the current study, we report on the ability of the Deep Learning-based AI clinical decision support system, Heuron ELVO, to predict ELVO from NCCT images of patients presenting to the emergency room with suspected stroke. While optimized hyperacute stroke imaging protocols facilitate the diagnosis of ELVO, its identification on NCCT images, when possible, is clinically significant, as NCCT is the primary imaging protocol for patients suspected of stroke. This study shows that compared to unassisted evaluation, Heuron ELVO-assisted evaluation improves diagnostic performance in detecting ELVO stroke on NCCT. The Heuron ELVO algorithm resulted in outstanding standalone diagnostic performance. Furthermore, it improved the interrater reliability between readers to a substantial degree.
While some studies have sought to identify ELVO based on NCCT [
17,
18], the current study is the first clinical trial to show that the NCCT-based identification of ELVO can be improved with AI assistance. Furthermore, with AI assistance, the predictive performance of readers with different backgrounds and expertise can be improved to outstanding levels. The lowest sensitivity and specificity of the assisted evaluation of individual readers were 74% and 91%, respectively, both of which do not differ greatly from previous CT angiography-based AI identifications of large-vessel occlusions [
19]. The AUROC of the assisted evaluation was also outstanding, ranging from 0.91 to 0.96 among all four readers in the current trial.
We believe that the AI-assisted identification of ELVO based on NCCT will improve patient outcomes by facilitating interhospital delays and screening for ELVO in patients with minor symptoms in whom angiography was not obtained. In primary hospitals, CTA acquisition may be limited due to concerns about contrast exposure. Furthermore, brain magnetic resonance image-based angiography may be selected as the next imaging modality after NCCT, which can result in significant time losses. AI-supported patient triage based on NCCT images would encourage physicians to transfer potential patients needing reperfusion treatments in a timely manner. Furthermore, there is a growing interest in extending EVT to patients with low neurological severity [
20]. Only NCCT would be needed for patients with miscellaneous or minor symptoms, while AI clinical decision support systems may identify ELVO in these patients, specifically those who previously may have been neglected. Overall, the current study does not argue for an EVT patient selection protocol change based on NCCT. Rather, we believe that when CT angiography is not available, AI-supported patient triage based on NCCT images is a practical alternative.
The current study showed a high sensitivity and specificity of 88.39% and 91.23%, respectively, in Heuron ELVO standalone performance when predicting ELVO. Given the difficulty of making direct comparisons between commercially available programs with identical datasets, we reviewed the validation results reported for other products. A previous study reported the diagnostic accuracy of AI-based NCCT ELVO predictions. This study used MethinksLVO, which determines the presence of acute ischemic tissue, chronic lesions, intracranial hemorrhage, and hyperdense clot signs and combines them into a single scalar indicating the presence of LVO. This study reported an AUC of 0.87 for the identification of LVO (with a sensitivity of 83%, a specificity of 71%, a positive predictive value of 79%, and a negative predictive value of 76%) [
17]. Another study reported the ability of the RAPID NCCT stroke platform to detect LVO with a sensitivity of 63.5% and specificity of 95.1% [
18]. This platform identifies LVO based on HAS and EIC. While the model used in both studies was developed using a similar approach to identify the presence of HAS, EIC, OI, and ELVO, there was a slightly higher predictive performance in the current study. Despite the discrimination of similar biomarkers, the reason for the slightly higher predictive performance in the current study may be the inclusion of eye deviation as a reference. The reported specificity of RAPID NCCT was higher than that observed in our current study. For the validation of RAPID NCCT, the primary causes of false positives in the non-ELVO group were calcified internal carotid arteries or partial volume effects associated with chronic lesions. Heuron ELVO exhibited similar false-positive patterns. However, when human readers interpreted the NCCT images with AI assistance, they could exclude such patterns, suggesting that reader–AI interaction may mitigate some false positives and improve overall specificity in clinical settings. In addition, it may also differ in that the current study only focused on ICA and MCA ELVO (the RAPID NCCT study also limited ELVO patients to anterior circulation). Therefore, errors that could occur during the identification of posterior circulation or anterior cerebral artery ELVO were minimized. Further studies may be needed to determine the contribution of each imaging biomarker element to the overall predictive ability.
Although Heuron ELVO uses a Deep Learning algorithm, it is based on the detection of HAS, EIC, and imaging-based eyeball preponderance. Previously, a visual analysis of the hyperdense artery sign reported a sensitivity of 52% and specificity of 92%. In acute ischemic stroke, HAS indicates a high likelihood of arterial obstruction, but its absence indicates only a 50/50 chance of normal arterial patency [
7]. Another study reported a sensitivity and specificity of 67% and 82%, respectively, for detecting ELVO on thin NCCT scans [
10]. Radiological eye deviation predicted ELVO with a sensitivity, specificity, and accuracy of 71%, 77.5%, and 84.6%, respectively [
8]. Conversely, the predictive value of EIC for ELVO has not been previously evaluated. Theoretically, EIC may be observed in hemispheric or partial ischemic stroke even without ELVO; thus, it cannot be used alone to predict ELVO. In this study, we used EIC along with other variables and believe that it substantially improves the sensitivity of ELVO prediction compared to existing studies using HAS or eye deviations. In this study, we did not identify the specific distributions of EIC that may further point toward ELVO. Further studies are required to address this issue.
5. Limitations and Future Work
This clinical trial has some limitations. First, the study population may not represent real-world patients with thrombolysis code activation because this analysis was retrospectively performed based on medical data and imaging protocols. Patients with hemorrhagic stroke were not included in the analysis, and there was a seemingly low number of patients with ischemic stroke in the non-ELVO group. While this is a limitation, it should be noted that a substantial percentage of stroke mimics present to emergency departments [
21]. Moreover, as Heuron ELVO is pre-equipped with hemorrhage detection [
22], we believe that its diagnostic power results will be comparable in real-world clinical situations. Second, the diagnostic power of clinicians in diagnosing ELVO may have been overestimated because the final adjudication was based on consensus. Nevertheless, we believe that this does not weaken the efficacy of the Heuron ELVO in further assisting clinical decision-making because predictive power was improved during both individual-level evaluation and consensus-based analysis. Bias could be minimized using independent blind assessments or external validation panels. Third, a two-week adjudication interval may not have been sufficient to prevent recall bias, and a longer interval, or analysis by a separate reader, may be ideal in the future. The decision to set the washout period to two weeks in the current study was based on similar, prior research that utilized washout periods from two weeks up to four weeks [
23,
24]. To further minimize recall bias, we also randomized the order in which the images were presented to the readers, but we could not directly assess the impact of recall bias. Overall, the inherent limitation of the current study lies in its design as a single-center-based retrospective evaluation. Considering the above factors, a prospective multicenter trial evaluating reproducibility across various settings, such as the inclusion of hemorrhagic and posterior circulation stroke, is needed. Furthermore, setting patient classification thresholds for clinical trials should involve a cost–benefit analysis to determine whether reductions in false negatives or false positives should be prioritized in clinical practice, thereby optimizing patient triage. Thess issue will be explored in a prospective study that is being conducted as the next step of this research.
Finally, we acknowledge that a significant portion of the research team is affiliated with Heuron Co. Ltd., which may raise concerns regarding potential conflicts of interest and the risk of bias in study design, data handling, analysis, and interpretation. To mitigate these risks and uphold scientific transparency and objectivity, we implemented the following safeguards. First, although model development was led by Heuron-affiliated researchers, the final validation of the algorithm was conducted using a separate dataset curated and annotated by independent stroke experts unaffiliated with the company. These readers were blinded to both the algorithm output and patient outcomes during annotation. Second, the study followed a predefined analysis plan finalized prior to data collection and analysis. Key endpoints and statistical methods were established as a priori and were not altered post hoc. Third, for both reading stages, all CT image cases were reviewed by board-certified emergency medicine specialists, neurologists, and neurosurgeons who were blinded to the AI results and clinical data during adjudication. Fourth, the data used for model training and testing were locked prior to analysis, and version-controlled audit logs were maintained. Model performance metrics were computed on the locked dataset without retrospective tuning or performance inflation. Fifth, the study protocol was reviewed and approved by an independent IRB, and all research activities adhered to institutional data governance and ethical standards.
In conclusion, Heuron ELVO reliably identified the presence of ELVO on NCCT images, and its assistance significantly improved clinician diagnosis. These findings suggest that Deep Learning algorithm software may assist in shortening the in-hospital process of primary medical centers while facilitating interhospital transfer decisions, resulting in improved treatment outcomes by providing EVT to necessary patients.