3.2.1. Congenital Heart Diseases
Congenital heart diseases (CHDs) are among the most common and severe congenital malformations in fetuses, occurring in approximately 6 to 13 out of every 1000 cases [
88]. Although CHDs may have no prenatal symptoms, they may result in significant morbidities, and even death, later in life. Since heart defects are the most common fetal anomalies, research interest in this area is consequently higher than for other types of defects. Evaluating the cardiac function of a fetus is challenging due to factors such as the fetus’s constant movement, rapid heart rate, small size, limited access, and insufficient expertise in fetal echocardiography among some sonographers, all of which make the identification of complex abnormal heart structures difficult and prone to error [
89,
90,
91]. Fetal echocardiography was introduced about 25 years ago and now stands to benefit from incorporating advanced technologies.
The failure to identify CHD during prenatal screening is more strongly influenced by a deficiency in adaptation skills during performance of the SAS test than by situational variables such as body mass index or fetal position. Cardiac images were of insufficient quality considerably more often in undetected cases than in detected ones. Despite satisfactory image quality, CHD went undetected in 31% of cases. Furthermore, it is worth noting that in 20% of cases where CHD went undetected, the condition was not visually apparent despite the presence of high-quality images [
92]. This study illustrates the significance and necessity of ML approaches as tools that can successfully reduce the number of undetected CHD cases and enhance the accuracy of prenatal diagnosis.
Echocardiography, a specialized US technique, remains the primary and essential method for the early detection of fetal cardiac abnormalities and associated mortality risk, aiming to identify congenital heart defects before birth. It is extensively employed during pregnancy, and the obtained images can be used to train DL models such as CNNs to automate and enhance the identification of abnormalities [
93]. An echocardiogram is a detailed prenatal US examination of the fetal heart; utilizing AI to analyze echocardiograms holds promise for advancing prenatal diagnosis and improving heart defect screening [
94]. In this context, Gong et al. developed an innovative GAN model that integrates the DANomaly and GACNN (generative adversarial CNN) neural network architectures. The objective of this study was to train the model on features extracted from FCH (four-chamber heart) images obtained from echocardiogram video slices. Moreover, they used an extension of the original GAN called the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to extract features from fetal FCH images. They eventually developed a novel DGACNN, intended to identify CHD by combining the GAN discriminator architecture with additional CNN layers. According to the study, the DGACNN model demonstrated an 85% recognition accuracy in detecting fetal congenital heart disease (FHD), surpassing other advanced networks by 1% to 20%. Compared to expert cardiologists in FHD recognition, the proposed network achieved a remarkable 84% accuracy on the test set [
95].
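The gradient-penalty idea behind WGAN-GP can be illustrated in a few lines: the critic's input gradient is evaluated at random interpolations of real and fake samples, and its norm is pushed toward 1. A minimal numpy sketch, using a deliberately simple linear critic f(x) = w·x (so the input gradient is just w) rather than the actual DGACNN critic:

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty_linear(w, x_real, x_fake, lam=10.0):
    """WGAN-GP penalty for a linear critic f(x) = w @ x.

    Samples points on the lines between real and fake data; for a linear
    critic the input gradient is simply w, so the penalty reduces to
    lam * (||w|| - 1)^2 at every interpolated point.
    """
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake   # interpolated samples
    grad = np.tile(w, (x_hat.shape[0], 1))        # d f / d x_hat == w
    grad_norm = np.linalg.norm(grad, axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)

w = np.array([0.6, 0.8])                 # ||w|| == 1, so zero penalty
x_real = rng.normal(size=(4, 2))
x_fake = rng.normal(size=(4, 2))
print(gradient_penalty_linear(w, x_real, x_fake))   # ≈ 0.0
```

With a real network the input gradient would come from automatic differentiation; only the structure of the penalty term carries over.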
While GANs have demonstrated their effectiveness in anomaly detection and generative modeling, their analytical performance on intricate tasks like fetal echocardiography assessment can be enhanced by training an ensemble of multiple neural networks and integrating their predictions. An ensemble of neural networks integrates different networks to address a given machine-learning objective; the key idea is that an ensemble of multiple networks typically outperforms any individual network. In this regard, Arnaout et al. trained an ensemble of neural networks to differentiate normal from CHD cases with respect to the guideline-recommended cardiac views. They used 107,823 images from 1326 echocardiograms and ultrasound images of fetuses between 18 and 24 weeks of gestation. A CNN view classifier was trained to identify the five screening views in fetal ultrasounds. Any image that did not correspond to one of the five views specified by guidelines, such as the head, foot, or placenta, was classified as ‘non-target’. The results indicated strong performance with an area under the curve (AUC) of 0.99 [
96].
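The averaging step of such an ensemble is straightforward; a small numpy sketch with hypothetical logits from three classifiers over the five screening views plus a 'non-target' class:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from three independently trained view classifiers
# for one image over the five screening views plus a 'non-target' class.
logits = np.array([
    [2.1, 0.3, 0.1, 0.2, 0.1, 0.0],   # model A
    [1.8, 0.9, 0.2, 0.1, 0.3, 0.1],   # model B
    [2.4, 0.2, 0.3, 0.1, 0.2, 0.2],   # model C
])

probs = softmax(logits)                # per-model class probabilities
ensemble = probs.mean(axis=0)          # simple averaging ensemble
prediction = int(np.argmax(ensemble))  # index 0 == first screening view
print(prediction)  # 0
```

Averaging probabilities is only the simplest combination rule; weighted averaging or majority voting work the same way on these arrays.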
The four-chamber view facilitates assessment of the cardiac chambers’ size and the septum. In contrast, the left ventricular outflow tract view offers a visualization of the aortic valve and root. The right ventricular outflow tract view provides insight into the pulmonary valve and artery, and the three-vessel view confirms normal anatomy by showcasing the pulmonary artery, aorta, and superior vena cava. Additionally, the arch view scrutinizes the transverse aortic arch and branching vessels. During routine obstetric US screenings, these five standard views—the four-chamber, left ventricular outflow, right ventricular outflow, three-vessel, and arch views—provide a comprehensive view of the fetal heart and major blood vessels (
Table 6). This comprehensive approach allows various significant congenital heart conditions to be detected before birth.
Emphasizing the importance of the four-chamber views, we can delve into a study by Zhou et al. [
97]. They introduced a category attention network aimed at simultaneous image segmentation for the four-chamber view. They modified the SOLOv2 model for object instance segmentation. However, SOLOv2 encounters a potential misclassification issue with grids within divisions containing pixels from different instance categories. This discrepancy arises because the category score of a grid might erroneously surpass that of surrounding grids, which affects the final quality of instance segmentation. Certain image portions would become intertwined, leading to challenges in accurate object classification. To address this, the researchers integrated a “category attention module” (CAM) into SOLOv2, creating CA-ISNet. The CAM analyzes various image sections, aiding in accurately determining object categories. The proposed CA-ISNet model underwent training using a dataset of 319 images encompassing the four cardiac chambers of the fetuses. The functionality of this model relies on three distinct branches:
Category Branch, responsible for assigning each instance to the appropriate cardiac chamber by predicting the semantic category of the instance.
Mask Branch, responsible for segmenting the heart chambers within the images.
Category Attention Branch, which learns category-related information of instances to rectify inaccurate classifications made by the category branch.
The results demonstrated an average precision rate of 45.64%, with a DICE range of 0.7470 to 0.8199. The Dice similarity coefficient (DICE) is the harmonic mean of precision and recall (equivalent to the F1 score) and provides an overall measure of segmentation performance.
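A minimal numpy sketch of these metrics on two toy binary masks (the masks are illustrative, not from the study):

```python
import numpy as np

def dice_precision_recall(pred, truth):
    """Dice coefficient of two binary masks, plus precision and recall."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of P and R
    return dice, precision, recall

pred  = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
dice, p, r = dice_precision_recall(pred, truth)
# tp=2, fp=1, fn=1: dice == 4/6 ≈ 0.667, equal to 2pr/(p+r)
```

The same arithmetic applies per class in multi-chamber segmentation; the reported range then reflects the per-chamber scores.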
Concerning the simultaneous segmentation framework, another study was conducted to analyze and simultaneously segment lung and heart US images using a U-Net-based architecture. One challenge with these approaches is the “multi-scale” problem: every neural network model has its own receptive field scale, but organs in US images vary in size and scale, so a single scale may not accurately segment all organs. A recent study addressed this problem with a multi-scale model with an attention mechanism that extracts multi-scale features from images and uses additive attention gate units to eliminate irrelevant features. Their dataset consisted of 312 US images of the fetal heart and lungs. The images, however, were acquired from a single source and were relatively few in number, which can lead to overfitting. Nevertheless, the simultaneous segmentation capability of this model has great potential because it allows a more holistic view of fetal anatomy when assessing developmental anomalies. In addition, it allows efficient single-pass processing of US images [
98].
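The additive attention gate mentioned above can be sketched per pixel: skip features x and a gating signal g are projected, summed, and squashed into a [0, 1] attention map that scales x. A numpy sketch with random weights (in the real model Wx, Wg, and psi are learned):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    """Additive attention gate (Attention U-Net style), applied per pixel.

    x : skip-connection features (H, W, C)
    g : gating features from the coarser decoder level (H, W, C)
    Returns x scaled by a [0, 1] attention map, suppressing irrelevant
    regions before the skip features are passed on.
    """
    q = np.maximum(x @ Wx + g @ Wg, 0.0)      # additive attention, ReLU
    alpha = sigmoid(q @ psi)                  # (H, W, 1) attention map
    return x * alpha, alpha

H, W, C, F = 4, 4, 8, 8
x = rng.normal(size=(H, W, C))
g = rng.normal(size=(H, W, C))
Wx, Wg = rng.normal(size=(C, F)), rng.normal(size=(C, F))
psi = rng.normal(size=(F, 1))
gated, alpha = attention_gate(x, g, Wx, Wg, psi)
```

Because alpha is bounded in (0, 1), the gate can only attenuate skip features, never amplify them, which is what makes it useful for discarding irrelevant anatomy.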
Another recent study aimed to predict 24 objects within the fetal heart in the four-chamber view using a Mask-RCNN architecture. Instead of using whole ultrasound frames, the researchers employed the four standard fetal heart views as input data. The objects comprised the four standard shapes of fetal heart views, 17 heart chamber objects for each view, and three types of CHD: atrial septal defect (ASD), ventricular septal defect (VSD), and atrioventricular septal defect (AVSD). The model achieved a DICE of 89.70% and an IoU of 79.97% [
99]. However, it is worth noting that their DL-based approach was evaluated using a relatively small dataset of 1149 fetal heart images. Additionally, the study was conducted using data from a single center, which may limit the generalization of the results to other populations.
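DICE and IoU are reported together so often because they are deterministically related; a short numpy check on toy masks:

```python
import numpy as np

def iou_and_dice(pred, truth):
    """Mask IoU (Jaccard index) and its exact relation to Dice:
    Dice = 2 * IoU / (1 + IoU)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    iou = inter / union
    dice = 2 * inter / (pred.sum() + truth.sum())
    return iou, dice

pred  = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
iou, dice = iou_and_dice(pred, truth)
# here iou == 0.5 and dice == 2*0.5/1.5 ≈ 0.667
```

One consequence of the relation is that Dice is always the larger of the two, which is consistent with the 89.70% DICE versus 79.97% IoU reported above.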
Xu Lu et al. proposed a novel approach to segmenting the apical four-chamber view in fetal echocardiography. Their method employs a cascaded CNN referred to as DW-Net [
100]. Cascaded CNNs connect multiple CNNs sequentially to learn hierarchical visual features. Unlike GANs for generative modeling or ensembles that combine different models, cascaded CNNs break a difficult vision task into smaller problems that can be solved efficiently in a pipeline. As an advantage, they can scale to very deep networks. However, training each CNN individually can be resource-intensive, and errors may propagate across the entire network. The DW-Net model provided by Xu et al. comprises two sequential stages: the initial stage produces a preliminary segmentation map, while the subsequent refinement stage enhances the map’s accuracy. This dual-stage segmentation process enhances the reliability of defect identification, and the cascaded network’s ability to generate refined segmentation maps ensures that subtle structural variations and anomalies within the fetal heart can be accurately delineated. However, the dataset used for training and evaluation was relatively small, including 895 images from healthy fetuses only, and only the apical four-chamber view was studied. In another study, Xu et al. developed a cascaded U-Net (CU-Net) that uses two branch supervisions to improve boundary clarity and prevent the vanishing-gradient problem as the network deepens. It also benefits from connections between network layers that transfer useful information from shallow to deep layers for more precise segmentation. Additionally, their SSIM loss helps maintain fine structural details and produce clearer boundaries in the segmented images [
101].
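The two-stage data flow of a cascade like DW-Net can be sketched with stand-in "networks" (per-pixel functions here; the real stages are full CNNs, and the weights below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-in "networks": per-pixel linear maps. A real cascade would use
# two full CNNs; the point here is the data flow between the stages.
def stage1(img, w1):
    return sigmoid(img * w1)                  # coarse probability map

def stage2(img, coarse, w2):
    feats = np.stack([img, coarse], axis=-1)  # image + coarse map as input
    return sigmoid(feats @ w2)                # refined probability map

img = rng.normal(size=(8, 8))
coarse = stage1(img, w1=1.5)
refined = stage2(img, coarse, w2=np.array([1.0, 2.0]))
mask = refined > 0.5                          # final binary segmentation
```

The refinement stage sees both the raw image and the first stage's output, which is how errors can be corrected but also how they can propagate if the coarse map is poor.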
A recent study has introduced the multi-feature pyramid U-net (MFP-Unet), a novel deep-learning architecture for automated segmentation of the left ventricle (LV) in 2D echocardiography images [
102]. MFP-Unet blends the U-Net and feature pyramid network (FPN) architectures to improve segmentation accuracy. FPNs are designed for object recognition and image segmentation tasks: they enhance feature representation by building a multi-scale hierarchy of feature maps through lateral connections and top-down pathways. This allows the network to capture both fine-grained detail and high-level context, which ultimately enhances accuracy when detecting objects of varying sizes. This capability can be especially beneficial for medical images. For example, in identifying fetal heart defects in echocardiographic images, FPNs can assist by effectively detecting complex cardiac structures, ranging from subtle anomalies to the broader context of anatomical features. Their multi-scale approach is crucial in recognizing both localized abnormalities and holistic heart structures. However, the FPN’s computational complexity and memory requirements may be limiting factors. Notably, combining MobileNet, U-Net, and FPNs demonstrated a 14.54% increase in IoU compared to using U-Net alone when applied to the segmentation of a cardiac four-chamber image [
103].
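The FPN top-down pathway with lateral connections can be sketched in numpy using nearest-neighbour upsampling and 1x1 projections (random weights; the backbone feature maps are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_merge(bottom_up, lateral_ws):
    """Top-down FPN pathway: start from the coarsest map, then repeatedly
    upsample and add a 1x1-projected lateral connection."""
    tops = [bottom_up[-1] @ lateral_ws[-1]]
    for feat, w in zip(reversed(bottom_up[:-1]), reversed(lateral_ws[:-1])):
        tops.append(upsample2x(tops[-1]) + feat @ w)
    return tops[::-1]       # finest-to-coarsest pyramid of merged maps

# Hypothetical backbone maps at three resolutions with growing channels
c1 = rng.normal(size=(16, 16, 8))
c2 = rng.normal(size=(8, 8, 16))
c3 = rng.normal(size=(4, 4, 32))
ws = [rng.normal(size=(8, 4)), rng.normal(size=(16, 4)), rng.normal(size=(32, 4))]
p1, p2, p3 = fpn_merge([c1, c2, c3], ws)
```

Every output level ends up with the same channel count, so a single prediction head can be applied to each pyramid level, which is what lets the detector handle objects of varying sizes.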
The proposed MFP-Unet model achieved an average DSC of 0.953 in a public dataset, outperforming other state-of-the-art models. The main innovation in this work is the combination of multi-scale feature pyramids with U-Net to enhance segmentation robustness and accuracy, along with “network symmetry and skip connections between the encoder-decoder paths” [
102]. Skip connections are essential in neural networks because they help overcome training challenges, facilitate information flow, handle different scales of features, and promote faster convergence. Because of the small dataset of only 137 images, an augmentation method was used in this study. The researchers created 10 slightly different versions of each image by applying the elastic deformation method, a tenfold augmentation that yielded a total of 1370 images. Each augmented image was treated as a new data point for training the neural network. By applying elastic deformation, they introduced variations in the shape and appearance of the heart structures in the echocardiographic images. This augmentation technique helps the network learn to be robust to the different shapes and conditions it might encounter in real-world echocardiographic data. Using data augmentation to artificially increase the size and diversity of a limited training dataset is common practice in deep learning.
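A crude numpy variant of elastic-deformation augmentation (published pipelines usually smooth a dense random displacement field with a Gaussian filter; here a coarse grid is upsampled instead, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

def elastic_deform(img, grid=4, alpha=2.0):
    """Crude elastic deformation: a coarse random displacement grid is
    upsampled to a smooth-ish per-pixel field, then pixels are resampled
    at the displaced, clipped integer coordinates."""
    h, w = img.shape
    scale = h // grid
    dy = rng.normal(scale=alpha, size=(grid, grid)).repeat(scale, 0).repeat(scale, 1)
    dx = rng.normal(scale=alpha, size=(grid, grid)).repeat(scale, 0).repeat(scale, 1)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
    xs = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
    return img[ys, xs]

img = rng.random((16, 16))
augmented = [elastic_deform(img) for _ in range(10)]  # 1 image -> 10 variants
```

Each call draws a fresh displacement field, so the ten variants differ from one another while keeping the overall anatomy of the source image, which is the property that makes this augmentation useful on small datasets.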
Table 6.
Overview of key sections in fetal echocardiography. A summary of the purposes of different views of the fetal heart that are used in a standard fetal echocardiography procedure [
104,
105,
106].
Section | Description | Purpose |
---|---|---|
Fetal Apical Four-Chamber Heart Section | View of the fetal heart from the apex, capturing all four chambers (left and right atria, left and right ventricles) | Assess size, structure, and function of each chamber individually and their alignment |
Three-Vessel View Section | Evaluates three major blood vessels in the fetus’s chest area: aorta, pulmonary artery, and superior vena cava | Assess size, position, and potential abnormalities of these vessels |
Three-Vessel Trachea Section | Evaluates aorta, pulmonary artery, superior vena cava, and trachea simultaneously | Detect abnormalities involving both cardiovascular and respiratory systems |
Right Ventricular Outflow Tract Section | Focuses on assessing the outflow tract of the right ventricle connecting to the pulmonary artery | Identify obstructions or malformations affecting blood flow from the right ventricle to the pulmonary artery |
Left Ventricular Outflow Tract Section | Concentrates on evaluating the outflow tract of the left ventricle connecting to the aorta | Identify abnormalities or blockages hindering the flow of oxygenated blood from the left ventricle to the aorta |
In a recent study protocol, Ungureanu et al. proposed an ML-based intelligent decision support system to analyze first-trimester fetal echocardiogram videos and help sonographers detect fetal cardiac anomalies. The system will be validated on new US videos, with the primary outcome being improved anomaly detection in critical views of the heart by less experienced sonographers. Secondary outcomes will assess optimization of the clinical workflow and reduced discrepancies between evaluators. As this is a protocol, no results are available yet. However, the approach merits further investigation as an aid to technicians in their diagnoses [
105].
Yang et al. developed a DL-based classifier to identify ventricular septal defects. They obtained 1779 normal and abnormal fetal US cardiac images in the five standard views of the heart. They used five YOLOv5 networks as their primary model to classify images into “normal” and “abnormal”. According to the study, their model reached an overall accuracy rate of 90.67%. The performance of YOLOv5 was also compared to other mainstream recognition models, such as Fast RCNN with ResNet50 and Fast RCNN with MobileNetv2, and was found to be superior in terms of accuracy [
107].
In addition to US image analysis, other approaches like cardiac QT signal processing have been used but require further research and assessment [
108]. In another study, Dong et al. developed a DL framework comprising three convolutional networks (a basic CNN, a deep CNN, and an aggregated residual visual block net (ARVBNet)) that detect key anatomical structures in a plane. They aimed to build a fully automatic quality control system for fetal heart US images. The model achieved a highest mean average precision (mAP) of 93.52% [
109].
In another study, researchers examined the effectiveness of HeartAssist, an AI-based software designed to evaluate fetal heart health and identify potential anomalies during screening. The study found that the number and percentage of images deemed visually adequate by the expert and by HeartAssist were equivalent, exceeding 87% for all cardiac views examined. This indicates that a program such as HeartAssist holds considerable potential for evaluating fetal cardiac problems during the second-trimester ultrasonographic screening for abnormalities [
110].
The mentioned studies can be used with other models to achieve a fully reliable automated system. For example, the work of Dong et al. [
109], where they developed a CNN-based framework, could be used to automatically assess the quality of fetal US cardiac images before they are fed into the primary model for diagnosis. This helps ensure that only high-quality images are used for diagnosis, which can further improve the accuracy and reliability of the diagnosis.
3.2.2. Head and Neck Anomalies
The development of the fetal brain is among the most essential processes taking place during weeks 18–21 of pregnancy. Any abnormalities in the fetal brain can severely affect various brain functions, such as cognitive function, motor skills, language development, cortical maturation, and learning capabilities [
111,
112]. Thus, a precise anomaly detection method is of the utmost importance. Currently, US is still the most commonly used method to initially examine the development of the fetal brain for any fetal anomalies during pregnancy. During the 18- to 21-week pregnancy period, US imaging is used to measure the cerebrum, midbrain, cerebellum, brainstem, and other regions of the brain as part of the screening for fetal abnormalities [
113,
114]. To detect fetal brain abnormalities, Sreelakshmy et al. developed a model (ReU-Net) based on U-Net and ResNet for the segmentation of fetuses’ cerebellum using 740 fetal brain US images [
115].
The cerebellum is an essential part of the brain that plays a crucial role in motor control, coordination, and balance. The fetal cerebellum can be seen and distinguished from other parts of the brain in US images, which makes it relatively easy for technicians to examine during scans and, consequently, for researchers to employ DL-based models for segmentation of the obtained images. Moreover, ResNet is a popular model frequently used for medical image segmentation; it offers skip connections to address the vanishing-gradient problem. More specifically, in deep networks, the gradients used to update layer weights can become smaller and smaller as they are multiplied at each layer, eventually approaching zero. This makes the network struggle to learn complex patterns from images, which is essential in medical image processing. Besides using ResNets, Sreelakshmy et al. also employed the Wiener filter, which reduces unwanted noise in most US images. As a result, their ReU-Net model achieved 94% precision and a DICE of 91%. Singh et al. also used the ResNet model in conjunction with U-Nets to automate the cerebellum segmentation procedure. In this study, however, by including residual blocks and using dilated convolution in the last two layers, they were able to improve cerebellar segmentation from noisy US images [
116].
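The residual trick is compact enough to show directly: the identity path bypasses the weighted layers, so gradients can flow through unchanged. A numpy sketch of one (fully connected, random-weight) residual block:

```python
import numpy as np

rng = np.random.default_rng(5)
relu = lambda z: np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """y = ReLU(x + F(x)): the identity path carries x past the weighted
    layers unchanged, which eases the vanishing-gradient problem."""
    return relu(x + relu(x @ W1) @ W2)

x = rng.normal(size=(4, 16))
W1 = rng.normal(size=(16, 16)) * 0.1
W2 = rng.normal(size=(16, 16)) * 0.1
y = residual_block(x, W1, W2)

# With zero weights the block reduces to ReLU(x): the skip path alone.
zeros = np.zeros((16, 16))
assert np.allclose(residual_block(x, zeros, zeros), relu(x))
```

The zero-weight check makes the design choice explicit: the residual branch only has to learn a correction on top of the identity mapping, not the full transformation.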
The subcortical volume development in a fetus is a crucial aspect to monitor during pregnancy. Hesse et al. constructed a CNN-based model for an automated segmentation of subcortical structures in 537 3D US images [
117]. One important aspect of this research is the use of few-shot learning to train the CNN with relatively little manually annotated data (in this case, only nine). Few-shot learning is a machine-learning paradigm in which a model is trained to perform tasks using a very restricted amount of data, often significantly less than conventional machine-learning approaches typically require. The basic goal of few-shot learning is to make models flexible and capable of performing tasks that would otherwise require extensive labeled data collection, which can be time-consuming or expensive.
Cystic hygroma is an abnormal growth that frequently occurs in the fetal nuchal area, within the posterior triangle of the neck. This growth originates from a lymphatic system abnormality, which develops from jugular-lymphatic blockage in 1 in every 285 fetuses [
118]. The diagnosis of cystic hygroma is made with an evaluation of the NT thickness. Studies have also shown the connection between cystic hygroma and chromosomal abnormalities in first-trimester screenings [
119]. In this concern, a CNN model called DenseNet was trained by Walker et al. on a dataset that included 289 sagittal fetal US images (129 images were from cystic hygroma cases, and 160 were from normal NT controls) in order to diagnose cystic hygroma in the first-trimester US images. The model was used to classify images as either “normal” or “cystic hygroma”, with an overall accuracy of 93% [
120]. Several studies have shown the advantages of DenseNet models over ResNet architectures in terms of achieving higher performance while requiring less computational power, along with parameter efficiency and enhanced feature reuse [
121,
122,
123].
Standard planes of the fetal brain are commonly used when performing US to look for abnormalities in the fetal brain. However, fetal head plane detection is a subjective procedure and, consequently, prone to errors by technicians. Recently, a study automated fetal head plane detection by constructing a multi-task learning framework with regional CNNs (R-CNN). This MF R-CNN model was able to accurately locate six fetal anatomical structures and perform quality assessment of US images [
124]. Similarly, Qu et al. proposed a method using differential CNNs to accurately identify the six fetal brain standard planes. Unlike traditional CNNs that process each image independently, a differential CNN takes two input images and computes the element-wise difference between corresponding pixels. This difference map, the differential image, is fed into the network for further processing. Large databases are necessary for researchers in this field; small datasets can cause overfitting and other model limitations. The researchers used a relatively small dataset of 155 fetal images. However, they applied several data augmentation methods, including rotation, flipping, and scaling, to increase the size of the training dataset to 30,000 images and prevent the model from overfitting [
125].
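Both ingredients of that study, the differential image and geometric augmentation, are simple array operations; a numpy sketch (scaling augmentation omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)

def differential_image(img_a, img_b):
    """Element-wise difference map: the input a differential CNN processes
    instead of each image independently."""
    return img_a.astype(np.float32) - img_b.astype(np.float32)

def augment(img):
    """Rotation/flip variants used to grow a small training set:
    8 geometric variants per input image."""
    out = []
    for k in range(4):                  # 0/90/180/270-degree rotations
        r = np.rot90(img, k)
        out.extend([r, np.flip(r, axis=1)])  # plus a horizontal flip
    return out

a, b = rng.random((8, 8)), rng.random((8, 8))
diff = differential_image(a, b)         # zero wherever the planes agree
variants = augment(a)                   # 8 geometric variants per image
```

Combining such geometric variants with intensity transforms is how 155 source images can plausibly be expanded toward tens of thousands of training samples.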
Lin et al. developed a model trained on 1842 2D sagittal-view US images to identify nine intracranial structures of the fetus, including the thalami, midbrain, palate, fourth ventricle, cisterna magna, NT, nasal tip, nasal skin, and nasal bone [
126]. The study used both standard and non-standard sagittal-view ultrasound images. The researchers also used an external test set of 156 images from a different medical facility to assess the generalization, robustness, and real-world applicability of their framework. This enabled them to evaluate how well the model performed beyond its initial training data, verifying that it could manage a wide range of clinical scenarios, patient demographics, and equipment variances. Unlike the Lin et al. model, which also handled non-standard planes, the Xie et al. model was trained only on standard planes, making it prone to misjudgments when non-standard planes are presented. Additionally, this model only indicates whether cases are normal or abnormal and lacks the specificity needed for a clear and comprehensive diagnosis [
127].
Based on the same dataset provided by Xie et al. [
127], another study was conducted to develop a computer-aided framework for diagnosing fetal brain anomalies. Craniocerebral regions of fetal head images were first extracted using a DCNN with U-Nets and a VGG-Net network, and then classified into normal and abnormal categories. On small datasets, VGG networks can overfit because of their large number of parameters; here, however, the model was applied to a large dataset of US images and achieved an overall accuracy of 91.5%. In addition, the researchers implemented class activation mapping (CAM) to localize lesions and provide visual evidence for diagnosing abnormal cases, which can make the results visually comprehensible for non-expert technicians. However, the IoU of the predicted lesions was too low, and thus more advanced object detection techniques are required for more precise localization [
128]. Furthermore, Sahli et al. proposed an SVM classifier to categorize fetal head US images as normal or abnormal. However, their database included only images of fetuses at the same gestational age, which may limit the model’s ability to generalize to images from other gestational ages [
129]. In another recent study, researchers used 43,890 neurosonography images of normal and abnormal fetuses to build a DL-based model using the YOLOv3 architecture to find different patterns of fetal intracranial anomalies in standard planes and make a diagnosis for congenital CNS malformations. Their model is called the Prenatal Ultrasound Diagnosis Artificial Intelligence Conduct System (PAICS) and is capable of diagnosing ten different types of patterns. The micro-average AUC values for the PAICS range from approximately 0.898 to 0.981, indicating a high level of accuracy [
130]. Real-time detection for tasks similar to this is essential for immediate diagnosis and decision making, especially if such models are eventually considered to be used in hospitals. In this case, Lin et al. used YOLOv3, which is known for its speed and efficiency in real-time object detection [
131]. Unlike the previous study, which used CAM to localize lesions following their classification, YOLOv3 can simultaneously classify and localize anomalies in bounding boxes more accurately.
Other valuable information can be drawn from the segmentation of fetal head images in obstetrics for monitoring fetal growth [
132]. This information is valuable for the assessment of fetal health. Everwijn et al. performed detailed neurosonography, including 3D volume acquisition, on fetuses with isolated CHD starting at 20 weeks of gestation. They used an algorithm to automatically evaluate the degree of fetal brain maturity and compare it between the CHD cases and the control group. The CHD cases were further categorized based on blood flow and oxygenation profiles according to the physiology of the defect, and subgroup analyses were then conducted. The results showed a significant delay in brain development in fetuses with CHD compared to the control group, especially those with transposition of the great arteries (TGA, a congenital defect in which the two main arteries leaving the heart are switched) or intracardiac mixing [
133]. However, the study did not explain the reasons for these differences or whether they were only due to decreased oxygenated blood flow to the fetal brain. The authors have previously published another study on this matter and concluded that, compared to healthy control cases, fetuses with isolated congenital heart abnormalities had a slight delay in their cortical development [
134].
Biometric parameters such as head circumference [
135], biparietal diameter, and occipitofrontal diameter are commonly used in ultrasound examinations to assess fetal skull characteristics such as shape and size [
59]. Zeng et al. developed a very lightweight DL-based model for a fast and accurate fetal head circumference measurement from two-dimensional US images [
136]. Using the same dataset as the previous study, Wang et al.’s model achieved a DSC of 98.21% for the automatic measurement of fetal head circumference using a graph convolutional network (GCN), exceeding other state-of-the-art methods such as U-Net, V-Net, and Mask-RCNN [
137]. Both of these studies used an augmentation method to increase the number of images. One important difference between the two studies was their efficiency in computation and memory demands. Lightweight DCNNs demand less computational power and memory compared to GCNs.
3.2.4. Chromosomal Abnormalities
Chromosomal disorders are frequently occurring genetic conditions that contribute to congenital disabilities. These disorders arise from abnormalities in the structure or number of chromosomes in an individual’s cells, leading to significant health challenges and impairments present from birth. There are, however, various ways to detect them early in pregnancy. Here, we are concerned with evaluations that help detect genetic disorders from US images. These include the following:
NT measurement, which measures the thickness of the fluid-filled space at the back of the fetus’s neck.
Detailed anomaly scan, a thorough US examination that checks for any structural abnormalities in fetuses.
Fetal echocardiography, which focuses on evaluating the fetal heart structure and function to detect cardiac anomalies.
Nasal bone (NB), whose absence is a valuable biomarker of Down syndrome in the first trimester of pregnancy.
In addition to the mentioned procedures, another technique that can be used to detect chromosomal disorders from US images is the measurement of fetal facial structure. Certain facial features can indicate the presence of certain genetic conditions [
143]. During a US screening, a technician carefully examines the fetus’s facial structure for any abnormalities or distinctive features that may suggest a chromosomal disorder. For example, common facial features of Down syndrome include a flat nasal bridge, upward-slanting eyes, and a small mouth. These features may be visible during a US and can raise the suspicion of a chromosomal disorder [
144].
Tang et al. developed a two-stage ensemble learning model named Fgds-EL that combines CNN and RF models to diagnose genetic diseases based on the facial features of fetuses. This study used 932 images (680 labeled normal and 252 diagnosed with various genetic disorders). To detect anomalies, the researchers extracted key features from the fetal facial structure, such as the nasal bone, frontal bone, and jaw, specific locations where genetic disorders such as trisomies 21, 18, and 13 can be identified. The CNN was trained to extract high-level features from the facial images, while the RF was used to classify the extracted features and make the final diagnosis. The proposed model achieved a sensitivity of 0.92 and a specificity of 0.97 on the test set [
145].
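A two-stage pipeline of this kind can be sketched as follows; for illustration, the CNN embeddings are replaced with synthetic feature vectors (the class sizes mirror the 680/252 split above, and the dimensionality and class separation are assumptions), with scikit-learn’s random forest serving as the second stage:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for CNN embeddings of fetal facial regions (nasal bone,
# frontal bone, jaw): in practice these would come from the penultimate
# layer of a trained CNN, not from random draws.
n_normal, n_abnormal, dim = 680, 252, 64
X = np.vstack([rng.normal(0.0, 1.0, (n_normal, dim)),
               rng.normal(0.8, 1.0, (n_abnormal, dim))])
y = np.array([0] * n_normal + [1] * n_abnormal)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stage 2: a random forest classifies the extracted features.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Splitting the pipeline this way lets the tree ensemble compensate for limited training data, which is one motivation the original two-stage design cites.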
NT is the term used to describe the sonographic appearance of an accumulation of fluid under the skin of the fetus’s neck at around 11–13 weeks into the pregnancy (
Figure 8b). Current research suggests that this measurement is crucial in assessing the risk of chromosomal abnormalities.
Currently, an NT measurement of 3.5 mm or more is considered an indication for invasive testing, often followed by chromosomal microarray analysis. However, an increased NT is not always accompanied by an abnormal fetal karyotype [
146]. In this vein, one study found that an NT thickness between the 95th centile and 3.5 mm may still be associated with chromosomal abnormalities [
25]. However, based on the quantitative results of another study, researchers concluded that the NT cut-off for invasive testing could be 3.0 mm instead of 3.5 mm [
147].
Identifying NT abnormalities can be difficult, and researchers have grouped fetal anomalies at the 11–13 week scan into the following detectability categories [
148]:
Always detectable
Never detectable
Sometimes detectable
In terms of NT measurement, there are specific locations on the fetal head where medical professionals look for abnormalities (
Figure 8a):
Tip of the Nose
Nasal Bone
Palate
Diencephalon
Nuchal Translucency
Checking these locations allows abnormalities or variations in NT thickness to be detected during the fetal US; abnormalities in these areas can indicate genetic disorders or chromosomal abnormalities such as Down syndrome, other trisomies, and Turner syndrome [
149]. Additionally, NT image segmentation using ML models has also been shown to be effective for the early diagnosis of brain anomalies [
150].
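As a rough illustration of how an NT measurement could be derived from such a segmentation, the hypothetical helper below takes a binary mask of the translucent band and reports its largest vertical extent in millimetres; the pixel spacing and the toy mask are assumptions, not values from any cited study:

```python
import numpy as np

def nt_thickness_mm(mask: np.ndarray, mm_per_pixel: float) -> float:
    """Estimate NT thickness from a binary segmentation mask as the
    largest per-column vertical extent of the fluid band, in mm."""
    cols = np.flatnonzero(mask.any(axis=0))
    if cols.size == 0:
        return 0.0
    extents = []
    for c in cols:
        rows = np.flatnonzero(mask[:, c])
        extents.append(rows[-1] - rows[0] + 1)
    return max(extents) * mm_per_pixel

# Toy mask: a band 7 pixels tall at its thickest; at an assumed
# 0.5 mm/pixel this gives 3.5 mm, i.e., exactly at the conventional
# cut-off for invasive testing.
mask = np.zeros((32, 32), dtype=bool)
mask[10:17, 5:25] = True
print(nt_thickness_mm(mask, 0.5))  # → 3.5
```

Real pipelines would measure perpendicular to the band rather than strictly column-wise, but the column-wise extent keeps the sketch short.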
Down syndrome, caused by a full or partial extra copy of chromosome 21, is the most frequent chromosomal abnormality and the most frequent cause of non-inherited intellectual disability. Children with Down syndrome often experience slower growth and cognitive delays [
151]. Thus, screening for trisomy 21 during the first trimester and early second trimester of pregnancy is crucial, so that mothers with affected fetuses can make informed decisions about their reproductive options as early as possible [
152].
Most fetuses with trisomy 21 have a thickened NT and an absent nasal bone [
153]. Babies born with trisomy 21 often have underdeveloped or absent nasal bones, resulting in a flat nasal bridge; accordingly, an absent nasal bone raises the likelihood of trisomy 21 [
154,
155]. Another study found that the nasal bone-to-nasal tip length ratio might also be a potential marker for the diagnosis of trisomy 21 [
156]. In a recently published paper, researchers employed an adaptive stochastic gradient descent algorithm to study the connection between NT thickness and the potential existence of fetal anomalies. They collected 100 fetal US images for evaluation; according to the authors, their model achieved an accuracy of 98.64% in classifying anomalies linked with NT thickness [
157]. The previously mentioned Lin et al. model was also capable of NT identification [
126].
Tekesin et al. demonstrated the value of first-trimester US scanning by incorporating a detailed fetal anomaly scan into first-trimester screening algorithms, which improves the detection of trisomy 18 and 13, triploidies, and Turner syndrome [
158,
159]. In this context, Sun et al. developed a nomogram based on US images of fetuses with trisomy 21. Since nomograms suit settings with multiple variables, they analyzed fetal profile images and identified facial markers and NT thickness. Based on the extracted markers, the LASSO (least absolute shrinkage and selection operator) method was used to build a prediction model for trisomy 21 screening in the first trimester of pregnancy. LASSO is a statistical method for regression analysis that adds a penalty term to ordinary least squares, shrinking some coefficients to zero and thereby selecting the most critical variables while reducing model complexity. The resulting LASSO model achieved high accuracy, with AUC values of 0.983 and 0.979 in the training and validation sets, respectively [
153]. The nomogram method for detecting Down syndrome from US images is simple and interpretable and does not require much data. It works well with limited resources and avoids overfitting by automatically selecting markers. Neural network models excel at finding complex patterns but demand large amounts of labeled data and computing power, which makes the nomogram a good choice when data are limited or interpretability is essential.
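LASSO’s variable-selection behaviour can be demonstrated on synthetic data; the marker matrix, coefficients, and penalty strength below are invented for illustration and do not come from the cited study:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Hypothetical first-trimester markers per fetus (e.g., NT thickness
# plus assorted facial-profile measurements); only the first two
# actually drive the synthetic outcome here.
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# The L1 penalty shrinks uninformative coefficients exactly to zero,
# which is the automatic variable selection a nomogram relies on.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)  # indices of markers retained by the penalty
```

Increasing `alpha` strengthens the penalty and prunes more markers, trading predictive detail for a simpler, more interpretable model.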
Tang et al. developed a fully automated prenatal screening algorithm called Pgds-ResNet based on deep neural networks. The model flagged high-risk fetuses affected by common genetic diseases, such as trisomy 21, 18, and 13, as well as rare genetic diseases, using a dataset of 845 normal images and 275 rare genetic disease images. Their feature extraction process indicated that the fetal nose, jaw, and forehead contained valuable diagnostic information [
160]. However, their model was trained on a relatively small dataset from a single data center. Moreover, it was primarily designed for genetic abnormality screening rather than diagnosing specific conditions.
To detect trisomy 21, Zhang et al. constructed a CNN-based model using US images from 822 fetuses (548 normal and 274 diagnosed with trisomy 21). Their model was not restricted to NT thickness: it successfully detected trisomy 21 from images of the fetal head region with an accuracy of 89% on the validation set [
161]. Nevertheless, one limitation of their model is that it was trained to diagnose only trisomy 21, whereas a fetus may present with more than one trisomy. Thus, developing a multi-task learning model for the simultaneous recognition of various types of trisomy is necessary [
162,
163].
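One simple way to realize such multi-label recognition, sketched here with synthetic embeddings and scikit-learn rather than the deep models used in the cited works, is to attach one binary head per trisomy to a shared feature representation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(2)

# Stand-in image embeddings and per-trisomy labels (T21, T18, T13);
# a fetus can carry more than one positive label, which a single
# softmax over mutually exclusive classes cannot express.
X = rng.normal(size=(300, 16))
Y = np.column_stack([
    (X[:, 0] > 0.5).astype(int),   # "T21" head
    (X[:, 1] > 0.5).astype(int),   # "T18" head
    (X[:, 2] > 0.5).astype(int),   # "T13" head
])

# One shared feature space, one independent binary classifier per
# condition; a deep multi-task model would share a CNN backbone instead.
clf = MultiOutputClassifier(LogisticRegression(max_iter=500)).fit(X, Y)
pred = clf.predict(X[:5])
print(pred.shape)  # → (5, 3)
```

In a true multi-task network the heads would also share gradients through the backbone, letting scarce labels for one trisomy benefit from the others.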