1. Introduction
In recent years, deep learning techniques have developed rapidly, and there is much interest in their adoption for medical applications. More than 60 Artificial Intelligence (AI)-equipped medical devices have already been approved by the Food and Drug Administration (FDA) in the United States [1]. Indeed, it has been pointed out that diagnostic systems using deep learning may detect abnormalities and diseases more quickly and accurately than humans can; however, this requires the availability of sufficiently large datasets on both normal and abnormal subjects for different diseases [2,3].
Congenital heart disease (CHD) is estimated to occur in approximately 1% of live births, and critical CHD accounts for the largest proportion of infant mortality resulting from birth defects [4,5,6]. Abnormal cardiac findings on routine prenatal ultrasound screening, which is performed mainly by obstetricians, should therefore trigger a more precise examination as soon as feasible. Proper prenatal diagnosis, allowing for prompt treatment within a week of birth, is known to markedly improve the prognosis [7]. Fetal ultrasound screening of every pregnancy at risk for CHD is generally recommended worldwide at 18 to 22 weeks of gestation [8,9]. Despite its importance, however, the overall prenatal diagnostic rate of 30–50% remains insufficient owing to differences in diagnostic skill between examiners [8,10,11]. Because the examination is operated manually, effective fetal cardiac ultrasound screening requires a high level of skill and experience among examiners, coupled with feedback from fetal or pediatric cardiologists and cardiovascular surgeons. The relatively low incidence of CHD and the varying levels of medical expertise across hospitals result in inconsistent screening quality. Hence, it is important to develop a system that can consistently perform fetal cardiac ultrasound screening at a high skill level.
In the present study, we used deep learning with relatively small and incomplete datasets of fetal ultrasound videos to provide diagnostic support for examiners in fetal cardiac ultrasound screening. Each video in our datasets consisted of informative sequential cross-sections; hence, no high level of skill was required to accurately capture the standardized transverse scanning planes. Generally, experts use their own judgement to determine whether certain cardiac substructures, such as valves and blood vessels, are in the correct anatomical locations, by comparing normal and abnormal fetal heart images. This process resembles object detection, a technique that localizes and classifies multiple substructures appearing in videos, as sketched below. Here, we demonstrate a novel deep learning approach for the automatic detection of cardiac substructures and its application to detecting cardiac structural abnormalities in fetal ultrasound videos.
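As a rough illustration of this framing, the following Python sketch runs a generic per-frame object detector over an ultrasound video and records, for each frame, the highest confidence assigned to each substructure. The `detector` callable, its output format, and the abbreviated label list are hypothetical stand-ins, not the paper's actual network or full set of 18 substructure labels.

```python
# Minimal sketch, assuming a generic object detector trained on fetal
# cardiac substructures. `detector` is a hypothetical callable returning
# a list of (label, confidence, bbox) tuples for one frame.
import cv2

# Illustrative subset of labels; the study annotated 18 substructures.
SUBSTRUCTURES = ["left ventricle", "right ventricle", "aorta", "pulmonary artery"]

def detect_per_frame(video_path, detector, conf_thresh=0.5):
    """Return one dict per frame: substructure -> highest confidence."""
    cap = cv2.VideoCapture(video_path)
    timeline = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        row = {name: 0.0 for name in SUBSTRUCTURES}
        for label, conf, _bbox in detector(frame):
            if label in row and conf >= conf_thresh:
                row[label] = max(row[label], conf)
        timeline.append(row)
    cap.release()
    return timeline
```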
4. Discussion
Fetal cardiac ultrasound assessments of an affected pregnancy should be performed early enough to leave time for proper treatment if needed. The importance of fetal cardiac ultrasound screening incorporating multiple views of the heart and blood vessels has been advocated to improve the prenatal detection rate for CHD [8]. Recent advances in computer processing and transducer technology have also expanded the capacity of fetal ultrasound to include a wide variety of new modalities and sophisticated measures of cardiac structure and function. Nevertheless, the detection rate remains insufficient and dependent on the type of ultrasound practice and the experience of the examiners [24,25]. Prior experience with CHD, together with practical advice and feedback from experts, cardiologists, and cardiovascular surgeons, is necessary to become a well-qualified examiner. Manual operation adds to the practical difficulty of standardizing the sweep scanning techniques and the resulting images. Research and development of modalities with a fixed patient or subject and a constant measurement time, including computed tomography (CT), magnetic resonance imaging (MRI), X-ray, and pathological imaging, have led to advances in high-quality control [26,27]. However, the characteristic issues of ultrasound described above have slowed the progress of research, and there have been few publications and products associated with deep learning-based analyses of ultrasound images compared with other modalities [28,29,30]. Some models that support CHD screening by detecting the standardized transverse scanning planes have been reported, but the robustness of their input data needs to be considered [12,13,14].
We investigated deep learning using relatively small and incomplete datasets. The low incidence of CHD limited our ability to collect large volumes of relevant ultrasound images or videos for training. On the other hand, most pregnant women carry a singleton fetus with a normal heart, among which there is little structural atypia. We therefore developed a novel application of object detection, supervised on a dataset of normal cases only, to detect fetal cardiac substructures and structural abnormalities in fetal cardiac ultrasound screening. We analyzed fetal ultrasound videos consisting of informative sequential cross-sections in an examiner-independent manner. For quality control, a highly experienced expert assisted in addressing the technical variability of annotating the 18 different anatomical substructures. Our proposed SONO achieved high detection ability, although the distribution of average precision (AP) values indicated that some substructures were detectable and others were not: relatively small substructures such as the tricuspid valve, mitral valve, pulmonary vein, and ductus arteriosus were undetectable.
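As a hedged illustration of how such per-substructure AP values are computed, the sketch below scores ranked detections with scikit-learn. The detection results are placeholders, and the IoU matching of predicted boxes to ground-truth boxes that precedes AP computation in a real object detection evaluation is omitted.

```python
# Minimal sketch of per-substructure average precision (AP).
# y_true flags whether each ranked detection is a true positive (after
# IoU matching, omitted here); y_score is the detector's confidence.
from sklearn.metrics import average_precision_score

detections = {  # illustrative placeholder results, not the study's data
    "pulmonary artery": ([1, 1, 0, 1, 0], [0.95, 0.90, 0.60, 0.55, 0.30]),
    "tricuspid valve":  ([1, 0, 0, 0, 1], [0.70, 0.65, 0.50, 0.40, 0.20]),
}
for name, (y_true, y_score) in detections.items():
    print(f"{name}: AP = {average_precision_score(y_true, y_score):.2f}")
```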
We converted the video data into a barcode-like timeline. By enhancing the clarity of the whole examination, the barcode-like timeline made it easy to identify which substructures affected the diagnosis and hence shortened the confirmation time. Using automatic cardiac substructure detection, the examination results were standardized regardless of the examiners' technical levels. Our analyses comparing normal cases with several CHD cases showed that this timeline correctly captured their clinical characteristics. An important finding was that the pulmonary artery was not detected as normal in tetralogy of Fallot (TOF), reflecting its narrowing. In CHD cases, we could follow the probability transition and identify the critical differences from normal cases. Whereas previous methods have tried to suppress detection variability in video sequences, this study presented the variability in video object detection as useful information for examiners. The barcode-like timeline is useful in terms of explainability and can be highlighted as one of the features of "explainable AI."
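A minimal sketch of this conversion, assuming frame-wise confidences in the format produced by the earlier sketch: it stacks the values into a substructure-by-frame matrix and renders it with matplotlib. The threshold and colormap are illustrative choices, not the paper's exact rendering.

```python
# Hedged sketch: render frame-wise detection confidences as a barcode-like
# timeline (substructures on the y-axis, video frames on the x-axis).
import numpy as np
import matplotlib.pyplot as plt

def barcode_timeline(timeline, substructures, thresh=0.5):
    # timeline: list of {substructure: confidence} dicts, one per frame
    mat = np.array([[frame[s] for frame in timeline] for s in substructures])
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.imshow(mat >= thresh, aspect="auto", cmap="gray_r", interpolation="nearest")
    ax.set_yticks(range(len(substructures)))
    ax.set_yticklabels(substructures)
    ax.set_xlabel("Video frame")
    ax.set_title("Barcode-like timeline of detected substructures")
    return fig
```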
To assess the ability to detect cardiac structural abnormalities, we focused on 20 sequential video frames of cross-sections around the 4CVs and 3VTVs. In the ROC analysis, SONO outperformed the four conventional anomaly detection algorithms on both test datasets. In addition, SONO used only one-third as many training videos as the other algorithms, thereby reducing the cost and effort of data collection. Furthermore, within SONO, the detection accuracy for outflow tracts and vessels was higher than for the other heart structures. The conventional algorithms ConvAE and AE + global feature are technically sophisticated and were adapted to high-quality images captured by security cameras; however, their domain-specific anomaly detection abilities were insufficient for low-resolution ultrasound videos. AnoGAN, originally intended for still ultrasound images, and the versatile algorithm ConvAE-1frame were both inferior to SONO on fetal ultrasound videos.
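The ROC comparison can be reproduced at a sketch level as follows, assuming each algorithm outputs a scalar abnormality score per case. The labels and scores below are illustrative placeholders, not the study's data.

```python
# Hedged sketch of the ROC comparison across anomaly detection algorithms.
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([0, 0, 0, 0, 1, 1, 1])   # 0 = normal, 1 = CHD (placeholder)
scores = {                                  # placeholder abnormality scores
    "SONO":   np.array([0.1, 0.2, 0.2, 0.3, 0.8, 0.7, 0.9]),
    "ConvAE": np.array([0.4, 0.3, 0.5, 0.4, 0.6, 0.5, 0.7]),
}
for name, s in scores.items():
    print(f"{name}: AUC = {roc_auc_score(labels, s):.3f}")
```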
Limitations
There are several limitations in this study. First, owing to the relatively low incidence of CHD, we used a small volume of CHD data from a limited number of institutions. Our training data consisted of normal cases only; however, further CHD data should be collected as test data to evaluate the validity and reliability of detecting cardiac structural abnormalities, in cooperation with other hospitals throughout Japan or globally. Second, our fetal ultrasound videos were obtained with a single type of ultrasonography machine. In terms of robustness, we have to verify whether SONO works with different equipment and settings. Third, SONO was built mainly from apical-view data and could not handle every kind of fetal presentation. Feeding additional non-apical-view datasets to the CNN might resolve this limitation. Finally, it was still hard for SONO to capture isomerism, complete transposition of the great arteries, and subtle changes in the cardiac substructures, such as ventricular hypertrophy, ventricular septal defects, and valve abnormalities. Therefore, we have to consider add-on technologies, including image segmentation, for more accurate detection of these findings.