1. Introduction
Ultrasound is one of the most widely used diagnostic techniques due to its numerous advantages, including cost-effectiveness, safety (being radiation-free), and the ability to be performed in real time [
1]. Additionally, ultrasound demonstrates significant versatility, being applicable to almost any part of the body except for bones, lungs, and sections of the intestine. However, performing an ultrasound scan correctly necessitates substantial training and years of experience. This requirement creates a considerable workload for specialists and complicates the repeatability of diagnostic examinations. Consequently, the expertise and skill of the sonographer profoundly influence the performance of an examination. This leads to substantial inter-operator variability stemming from inconsistent probe positioning and acquisition angles, which can in turn impact the clarity and diagnostic consistency of echocardiographic images. Recent advancements in artificial intelligence techniques have enabled the exploration of both robotic and assisted ultrasound applications to address these critical issues [
2].
In the robotic scenario, a robotic arm performs ultrasound examinations under the guidance of artificial intelligence algorithms. These algorithms manage various functions, including planning the scanning path, adjusting the arm’s movement in space and the probe’s pressure on the scanned body area, and completing the scan. Robotic ultrasound systems can be classified into three categories based on their level of autonomy: teleoperated [
3], semi-autonomous [
4], and fully autonomous [
5]. Teleoperated systems involve robotic arms piloted by an operator, aiming to reduce or prevent musculoskeletal disorders in physicians and facilitate remote diagnosis. Semi-autonomous systems autonomously perform some tasks, such as positioning the probe in the body region of interest, but still require operator intervention to conduct the examination. Fully autonomous systems can independently plan and execute the ultrasound scan, acquiring the necessary images without sonographer intervention and allowing the physician to review the examination subsequently. By contrast, in the assisted ultrasound scenario a sonographer performs the examination while an algorithm guides their movement of the probe.
Although robotic ultrasound remains primarily within the research domain, some applications of assisted ultrasound have already reached the market, including [
6], which validated an AI-based real-time guidance system enabling novice operators to obtain diagnostic-quality echocardiographic images with minimal training, illustrating the potential of assisted ultrasound solutions to broaden access and streamline workflow in clinical settings. In particular, echocardiography requires precise probe placement and immediate feedback to capture complex cardiac dynamics, making it especially poised to benefit from these innovations.
Notable advancements in the echocardiographic domain have employed deep learning algorithms to guide non-expert operators. To reduce inter-operator variability, the use of deep learning algorithms in echocardiographic examinations is proving promising for the selection of echocardiographic views, image quality control, and improved real-time probe guidance [
7]. Among deep learning algorithms, CNNs and their variants have shown excellent performance in recognizing and segmenting cardiac structures [
8]. The use of CNNs allows for the quantification of cardiac parameters and identification of pathological patterns with accuracy comparable to that of clinical experts [
9]. Models developed for the classification of cardiac views differ in the number of classifiable views, but have shown accuracy levels of up to 98% [
10] for five cardiac views [
11] and fifteen cardiac views [
12]. An important distinction must be made between classification performed on videos and on static images; because the heart is a moving organ, it is necessary to identify transitions between different views and consider changes in the image due to the opening and closing of the heart valves. Some studies have explored the use of recurrent neural networks (RNNs) and generative adversarial networks (GANs); the former can enable real-time image classification and provide guidance for correct probe alignment [
13,
14], while the latter have been used to adapt the quality of the ultrasound image to a user-defined level without the need for matched pairs of low- and high-quality images [
15]. Using a GAN, [
16] achieved accuracy greater than 90% in both classification and segmentation of cardiac views for the diagnosis of left ventricular hypertrophy. To improve performance, architectures that simultaneously use spatial and temporal information have recently been explored, including time-distributed and two-stream networks [
14].
Despite significant advances in algorithms, there are numerous critical issues that need to be addressed in the development of assisted ultrasound applications that can be easily used by operators. The most relevant aspect is the need to acquire, label, and archive large datasets [
9]. This activity should be carried out through multi-center studies and with different equipment, which is contrary to usual practice [
11]. In addition, the use of specific datasets can improve the accuracy of automated diagnosis and promote greater integration of artificial intelligence into clinical practice for patients with congenital and structural heart disease [
17]. Another significant limitation is poor image quality due to background noise and limited acoustic windows [
18] as well as the movement of anatomical structures, which requires considering the temporal evolution of the image [
12]. This aspect is linked to the poor interpretability of deep learning algorithms [
7,
8] and may lead to more cautious adoption of these technologies by healthcare professionals [
10]. The computing power required to train such algorithms is another aspect that must be properly assessed, as it is necessary to find a balance between image resolution and computational load [
16].
For assisted probe positioning, one of the most important issues is the tracking method [
13], which can be implemented with cameras or IMUs. The use of cameras can lead to privacy issues and inflexible setups, while tracking with a single IMU is complex due to signal noise that prevents the correct reconstruction of the trajectory. One method that has been explored to obtain a generic trajectory of an object using a single IMU is to break down the acquired trajectory into basic trajectories, then pair these with predefined geometric models [
19]. Another promising technique appears to be exploiting the continuous wavelet transform to correct drift and obtain a more accurate position estimate [
20]. A further approach involves error compensation to correct drift accumulation through a reset mechanism. However, these approaches cannot compensate for poor sensor quality or avoid the need for accurate initial calibration [
21].
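As a purely illustrative example of the reset-based drift compensation mentioned above, the following minimal Python sketch double-integrates a single gravity-compensated acceleration channel and zeroes the velocity whenever the signal falls below a rest threshold. The sampling rate, threshold value, and function names are our own assumptions for illustration, not details taken from the cited works.

```python
# Illustrative sketch (not from the cited works): limiting drift when
# double-integrating a single accelerometer channel by resetting the
# velocity whenever the probe appears to be at rest.
import numpy as np

FS = 100.0               # assumed sampling rate [Hz]
DT = 1.0 / FS
REST_THRESHOLD = 0.05    # assumed |a| threshold [m/s^2] for "probe at rest"


def integrate_with_resets(acc: np.ndarray) -> np.ndarray:
    """Integrate 1-D gravity-compensated acceleration to position,
    zeroing the velocity at rest to cancel accumulated drift."""
    vel = 0.0
    pos = 0.0
    positions = np.empty_like(acc)
    for i, a in enumerate(acc):
        if abs(a) < REST_THRESHOLD:
            vel = 0.0            # reset mechanism: cancel drift in velocity
        else:
            vel += a * DT        # first integration: acceleration -> velocity
        pos += vel * DT          # second integration: velocity -> position
        positions[i] = pos
    return positions


if __name__ == "__main__":
    t = np.arange(0, 2, DT)
    # synthetic move-and-stop acceleration profile
    acc = np.where(t < 0.5, 0.2, np.where(t < 1.0, -0.2, 0.0))
    print(integrate_with_resets(acc)[-1])
```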
In summary, existing solutions for assisted echocardiography often rely on large labeled datasets [
9,
11] or multiple sensors [
13,
19], and may still encounter issues with noise, drift, and interpretability, particularly when using single-IMU setups and AI-based image analysis [
7,
8]. While deep learning has achieved notable results in classifying echocardiographic views and monitoring image quality [
13,
14], practical barriers such as complex calibration steps [
20,
21], data scarcity [
9,
11], and computational overhead [
16] continue to persist.
The objective of the ongoing research activity reported in this paper is to develop a cognitive unit that enhances ultrasound imaging by assisting operators with real-time guidance and assessment. In this framework, ‘cognitive’ highlights the system’s planned ability to learn optimal probe manipulation by fusing real-time inertial sensor data with CNN-based analysis, ultimately providing adaptive and context-specific guidance to the operator. This unit aims to improve image quality and provide valuable feedback during the scanning process. Analysis of the proposed system’s recognition of elementary movements was carried out using the accelerometer signal. The integration of such a cognitive unit with commercial ultrasound scanners can potentially aid in reducing inter-operator variability, benefiting both trainees and general practitioners.
2. Materials and Methods
2.1. Hardware Setup
The selection of appropriate hardware for executing deep learning algorithms was guided by a thorough literature review focused on evaluating the performance of edge devices in artificial intelligence applications. Given the parallel development of software and the absence of specific algorithms to test, it was crucial to base hardware choices on established benchmarks and comparative analyses available in the literature.
The initial comparative analysis included devices commonly tested in image processing applications, such as the ASUS Tinker Edge R, Raspberry Pi 4, Google Coral Dev Board, NVIDIA Jetson Nano, and Arduino Nano 33 BLE. The evaluation considered inference speed and accuracy across various network models. Findings indicated that the Google Coral Dev Board demonstrated superior performance in continuous computational applications for models compatible with the TensorFlow Lite framework. The NVIDIA Jetson Nano ranked closely behind, offering greater versatility and the ability to train models using the onboard GPU [
22].
Further analysis narrowed the focus to GPU-equipped devices. The literature indicated that the NVIDIA Jetson Nano exhibited better image processing performance compared to the Jetson TX2, GTX 1060, and Tesla V100 in convolutional neural network applications [
23]. Within the NVIDIA Jetson series, the Jetson Orin Nano was identified as significantly outperforming both the Jetson Nano and Jetson AGX Xavier for video processing tasks using convolutional neural network models developed in PyTorch and optimized with NVIDIA’s Torch-TensorRT SDK [
24]. Based on these insights, the NVIDIA Jetson Orin Nano was selected for its robust performance.
Table 1 summarizes the main features of the NVIDIA Jetson Orin Nano.
Figure 1 shows the experimental setup used for both ultrasound data collection and the inference phase. A small inertial measurement unit (IMU) was selected for attachment to the ultrasound probe in order to utilize the probe’s spatial position during inference.
After conducting tests on both online and offline ultrasound video processing through deep learning applications, we began to evaluate the performance of the hardware setup by developing a simple CNN model trained on the CAMUS dataset [
25], which includes echocardiographic sequences from 500 patients with varying image quality. Given the high variability in image acquisition and quality, the only preprocessing step we applied was resizing all images to 256 × 256 pixels to ensure input uniformity. This minimal preprocessing allows the model to remain robust across different input qualities, which is crucial for generalization in real-world clinical settings.
The neural network then classified echocardiographic images into “Apical projection with 2-chamber view (2CH)”, “Apical projection with 4-chamber view (4CH)”, and “Unknown”, with the latter including all non-classifiable images. This network was designed to perform classification of the observed cardiac projection automatically and in real time.
A second CNN was trained on the same dataset to perform binary classification of image quality, distinguishing between “good quality” and “bad/poor quality” for each frame. In this study, we define “good quality” images as those displaying clear anatomical structures, adequate contrast, and minimal artifacts, whereas “bad/poor quality” images exhibit significant noise, low contrast, or insufficient structural visibility.
Figure 2 presents example echocardiographic images acquired in both four-chamber (4CH) and two-chamber (2CH) views under varying quality conditions. No data from the inertial sensor were taken into account during the training and inference phases of the two models.
2.2. Neural Network Model and Data Capture
The CNN models used for cardiac projection classification (presented in
Figure 3) and quality classification (presented in
Figure 4) were both designed to balance computational complexity and generalization capacity, making them suitable for smaller datasets. This balance was evaluated as a tradeoff between validation accuracy, inference time, and resource efficiency. Each architecture comprises several layers, each of which performs specific operations to extract features and classify images. The original images were in NIfTI format and had different sizes; thus, they were converted into JPG format and resized to a resolution of 256 × 256 pixels. This resolution was chosen as a compromise between preserving sufficient spatial information for classification and meeting the computational constraints of the deployment environment by keeping the inference cost low. Each pixel is represented on 8 bits.
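For reference, the following sketch reproduces this preprocessing step under the assumptions stated above (NIfTI input, conversion to 8-bit JPG, resizing to 256 × 256). The use of nibabel and Pillow and the file naming scheme are our illustrative choices, not part of the original pipeline.

```python
# Minimal preprocessing sketch: NIfTI frames are converted to 8-bit JPG
# and resized to 256x256. Library choice and file naming are assumptions.
from pathlib import Path

import nibabel as nib
import numpy as np
from PIL import Image

TARGET_SIZE = (256, 256)


def nifti_to_jpg(nifti_path: Path, out_dir: Path) -> None:
    """Convert each 2-D frame of a NIfTI file to a resized 8-bit JPG."""
    out_dir.mkdir(parents=True, exist_ok=True)
    volume = np.asanyarray(nib.load(str(nifti_path)).dataobj)
    if volume.ndim == 2:                       # single frame
        volume = volume[..., np.newaxis]
    for idx in range(volume.shape[-1]):
        frame = volume[..., idx].astype(np.float32)
        # scale to the full 8-bit range (each pixel on 8 bits)
        frame -= frame.min()
        if frame.max() > 0:
            frame = frame / frame.max() * 255.0
        img = Image.fromarray(frame.astype(np.uint8)).resize(TARGET_SIZE)
        img.save(out_dir / f"{nifti_path.stem}_{idx:03d}.jpg")
```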
The first CNN model begins with a convolutional layer that applies eight filters to each 256 × 256 input image. This convolutional layer uses a stride of [1,1] and ‘same’ padding in order to preserve the spatial dimensions of the feature maps across convolutional layers. Following the convolution, a rectified linear unit (ReLU) activation function [26] introduces nonlinearity into the model, which is essential for learning complex data relationships. To mitigate overfitting, a dropout layer with a 30% dropout rate randomly deactivates neurons during training; this rate was selected empirically by testing different values. The second stage involves another convolutional layer, this time with 16 filters, maintaining the same stride and padding settings. The ReLU activation function is used again to ensure nonlinearity. Subsequently, a max pooling layer with a pool size of [2,2] and stride of [2,2] reduces the spatial dimensions of the input while preserving the most significant features. Max pooling was chosen for its ability to highlight the strongest activations within each region, which is particularly effective in compact architectures.
In the third stage, the model includes a third convolutional layer with 32 filters, followed by a max pooling layer with an identical configuration to the previous one. This further reduces the spatial dimensions of the input, ensuring that the model focuses on the most prominent features. The output from the previous layers is then flattened into a one-dimensional vector to prepare it for the fully connected layers. The first fully connected layer consists of 32 units, with a ReLU activation function to introduce further nonlinearity. The final fully connected layer is the output layer. It comprises three units, corresponding to the number of classes in the classification problem. This layer uses a softmax activation function to assign probabilities to each class.
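A minimal Keras sketch of this first model is given below. The filter counts, dropout rate, pooling configuration, and dense head follow the description above, while the 3 × 3 kernel size, grayscale 256 × 256 input, and placement of the activations are assumptions made for illustration.

```python
# Sketch of the view-classification CNN as described in the text.
# The 3x3 kernels and grayscale input are assumptions; layer order and
# widths follow the paper's description.
from tensorflow.keras import layers, models


def build_view_classifier(input_shape=(256, 256, 1), num_classes=3):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(8, (3, 3), strides=(1, 1), padding="same", activation="relu"),
        layers.Dropout(0.30),                               # 30% dropout against overfitting
        layers.Conv2D(16, (3, 3), strides=(1, 1), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
        layers.Conv2D(32, (3, 3), strides=(1, 1), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),    # 2CH / 4CH / Unknown
    ])
```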
The second CNN model is designed for image quality classification. It begins with two consecutive convolutional layers, each with 64 filters, using a stride of [1,1] and ‘same’ padding. Each convolution is followed by a ReLU activation, a max pooling layer with stride [2,2], and a 25% dropout layer. The second stage repeats this structure but with 32 filters per convolutional layer, while the third stage uses 16 filters. The feature maps are then flattened and passed through three fully connected layers of 64, 32, and 16 units, each followed by a ReLU activation and a 10% dropout. The final layer is a softmax output sized to the number of classes. The network is compiled using the Adam optimizer and the categorical cross-entropy loss function.
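An analogous sketch of the quality-classification model is shown below. Filter counts, dropout rates, the dense head, and the Adam/categorical cross-entropy compilation follow the text; the 3 × 3 kernel size, input shape, and the exact grouping of pooling and dropout with each convolution are our reading of the description.

```python
# Sketch of the binary quality-classification CNN described above.
# Kernel size and input shape are assumptions; block grouping is our
# interpretation of the textual description.
from tensorflow.keras import layers, models


def conv_block(filters):
    return [
        layers.Conv2D(filters, (3, 3), strides=(1, 1), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
        layers.Dropout(0.25),
    ]


def build_quality_classifier(input_shape=(256, 256, 1), num_classes=2):
    stages = (conv_block(64) + conv_block(64)      # first stage: 64 filters
              + conv_block(32) + conv_block(32)    # second stage: 32 filters
              + conv_block(16) + conv_block(16))   # third stage: 16 filters
    head = [
        layers.Flatten(),
        layers.Dense(64, activation="relu"), layers.Dropout(0.10),
        layers.Dense(32, activation="relu"), layers.Dropout(0.10),
        layers.Dense(16, activation="relu"), layers.Dropout(0.10),
        layers.Dense(num_classes, activation="softmax"),
    ]
    model = models.Sequential([layers.Input(shape=input_shape)] + stages + head)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```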
Our choice of simpler models such as those described above was motivated by the limited availability of training data. More complex models such as ResNet [
27] and vision transformer (ViT) [
28] are more prone to overfitting and require extensive computational resources, which is not ideal for scenarios with smaller datasets. The modular structure of simple CNN models allows for easy integration of additional components such as long short-term memory (LSTM) layers [
29], allowing for further enhancements if necessary. In a real-time acquisition scenario in which the probe is held steady in the correct position, LSTM layers can capture the temporal dependencies between consecutive frames, effectively modeling the cardiac motion cycle and potentially improving the temporal coherence of predictions. This flexibility and efficiency make such models well suited for the targeted application. It is also important to note that until now there has been no need to quantize the neural network model.
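As a purely illustrative example of this possible extension (not part of the current system), the sketch below wraps the frame-level view classifier in a TimeDistributed layer and adds an LSTM over short frame sequences; the sequence length and LSTM width are arbitrary assumptions.

```python
# Illustrative extension: an LSTM over per-frame CNN features to capture
# temporal dependencies across consecutive echocardiographic frames.
from tensorflow.keras import layers, models


def build_temporal_view_classifier(frame_model, seq_len=16, num_classes=3):
    """Wrap a frame-level CNN (without its softmax head) with an LSTM."""
    # reuse the frame model up to its last hidden layer as a feature extractor
    backbone = models.Model(frame_model.input, frame_model.layers[-2].output)
    return models.Sequential([
        layers.Input(shape=(seq_len, 256, 256, 1)),
        layers.TimeDistributed(backbone),      # per-frame CNN features
        layers.LSTM(32),                       # temporal dependencies across frames
        layers.Dense(num_classes, activation="softmax"),
    ])


# usage sketch: temporal_model = build_temporal_view_classifier(build_view_classifier())
```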
Due to strict patient privacy regulations, scarce physician availability for labeling, and the time-consuming nature of data annotation, data collection represents a critical aspect of artificial intelligence model development. Available datasets are often insufficient to meet research objectives, necessitating the acquisition of new data in collaboration with clinical professionals. To streamline this process, a comprehensive data capture unit was implemented. This unit is capable of recording ultrasound screens, capturing the spatial position of the probe, and receiving anonymized data from the ultrasound scanner. It also facilitates secure transmission of the acquired data for further analysis. The data capture unit incorporates components that automatically receive, process, and anonymize data from the ultrasound scanner. A dedicated graphical user interface (GUI) was developed to support the data collection process.
Despite the automation provided by this system, labeling the data requires significant back-office effort from clinical staff members. It has been established that each ultrasound examination requires not only the DICOM files but also the corresponding screen-recorded videos. For each frame of these videos, the corresponding timestamps are saved, allowing the samples of the signals acquired from the accelerometer and gyroscope to be associated with each frame. The goal is to associate the variations in image quality used to label each frame with the movements performed by the sonographer.
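This association step can be sketched as a nearest-timestamp match between video frames and IMU samples, as shown below; the column names and CSV layout are assumptions about the capture unit's output format, not its actual specification.

```python
# Hedged sketch of the frame/IMU association: each video-frame timestamp is
# matched to the nearest accelerometer/gyroscope sample. Column names
# ('frame_ts', 't') and the CSV layout are assumptions.
import pandas as pd


def attach_imu_to_frames(frame_ts_csv: str, imu_csv: str) -> pd.DataFrame:
    """Return one IMU row per video frame, matched on nearest timestamp."""
    frames = pd.read_csv(frame_ts_csv).sort_values("frame_ts")   # one row per frame
    imu = pd.read_csv(imu_csv).sort_values("t")                  # accelerometer + gyroscope samples
    return pd.merge_asof(frames, imu,
                         left_on="frame_ts", right_on="t",
                         direction="nearest")
```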
4. Discussion
This study introduces a novel approach to improving echocardiographic imaging by combining advanced convolutional neural networks with inertial motion data to guide operators and optimize image quality in real time. Our approach is novel in that it operates on an embedded device not bound to a specific ultrasound system, allowing onboard inference without high-end servers and facilitating integration into diverse clinical workflows. Furthermore, to simplify the hardware requirements and mitigate privacy concerns, we investigate the use of a single inertial sensor for motion tracking rather than employing traditional camera-based setups. Our system achieved an overall classification accuracy of over 85% in differentiating apical 2CH, apical 4CH, and unrelated views. In addition, it achieved 66% accuracy in distinguishing good-quality vs. poor-quality images on a limited dataset. Notably, we observed an average inference time of ∼13 ms per frame, allowing for a live on-screen classification overlay without disrupting the clinical workflow.
Our system addresses two key tasks in ultrasound practice: accurate identification of clinically relevant cardiac views, and automated assessment of frame quality. Evaluating image quality helps to ensure uniform acquisition standards and minimizes the need for repeat exams, enabling second opinions and consultations without further patient exposure. Simultaneously, identifying clinically relevant cardiac views is essential for acquiring reproducible measurements; our system’s automated guidance can significantly assist novice operators in mastering this crucial step. By developing and training two specialized neural networks (a multiclass model for distinguishing apical 2CH, apical 4CH, and unrelated images along with a binary model for evaluating image clarity), our system achieves notable performance on relatively small datasets. These results emphasize the effectiveness of streamlined architectures, which can provide robust feature extraction and high inference speeds in resource-limited clinical environments.
The hardware foundation built around the NVIDIA Jetson Orin Nano proved capable of handling real-time data processing, maintaining stable performance metrics and modest power usage during continuous operation. When compared with existing approaches that often rely on large labeled datasets or multiple sensors, our system is designed to minimize hardware complexity and data requirements by integrating a single low-cost IMU sensor with lightweight CNN models. By conducting preliminary tests with a low-cost IMU, we also found that the system can detect and categorize basic unidirectional translational movements along a single axis, achieving preliminary accuracy of around 75–90% in a controlled setting utilizing a robotic arm. Although integration with our CNN-based image analysis has not yet been carried out, the ongoing research plan is to collect synchronized IMU and ultrasound data from real-world scanning, enabling a fused approach that offers comprehensive real-time guidance.
Collaboration with sonographers and a structured data collection initiative will be essential for further refinement, as large and diverse datasets will allow the system to accommodate a wide range of patient anatomies, ultrasound machines, and operator skill levels. The dedicated data acquisition setup captures anonymized video streams, IMU signals, and clinical metadata, paving the way for deeper analyses of how operator actions correlate with ultrasound images. In addition to standardizing the acquisition process, this integrated solution could significantly streamline training, reduce inter-operator variability, and enable more consistent echocardiographic examinations.
Looking ahead, the system’s design supports additional enhancements. Incorporating advanced drift compensation algorithms for the IMU along with interpretability techniques could allow clinicians to more transparently understand the network’s outputs. By guiding operators towards optimal probe positioning and providing immediate quality feedback, this approach aims to mitigate common errors in ultrasound scanning and improve patient care. Ultimately, these innovations hold promise not only for accelerating workflows but also for expanding the reach of expert-level echocardiographic imaging across diverse clinical settings.