Accuracy and Efficacy of Artificial Intelligence-Derived Automatic Measurements of Transthoracic Echocardiography in Routine Clinical Practice

Background: Transthoracic echocardiography (TTE) is the gold standard modality for evaluating cardiac morphology, function, and hemodynamics in clinical practice. While artificial intelligence (AI) is expected to contribute to improved accuracy and is being applied clinically, its impact on daily clinical practice has not been fully evaluated. Methods: We retrospectively examined 30 consecutive patients who underwent AI-equipped TTE at a single institution. All patients underwent manual and automatic measurements of TTE parameters using the AI-equipped TTE. Measurements were performed by three sonographers with varying experience levels: beginner, intermediate, and expert. Results: A comparison between the manual and automatic measurements assessed by the expert showed extremely high agreement in the left ventricular (LV) filling velocities (E wave: r = 0.998, A wave: r = 0.996; both p < 0.001). The automated measurements of LV end-diastolic and end-systolic diameters were slightly smaller (−2.41 mm and −1.19 mm) than the manual measurements, although the differences were not significant, and the two methods showed high agreement (r = 0.942 and 0.977, both p < 0.001). However, LV wall thickness showed low agreement between the automated and manual measurements (septum: r = 0.670, posterior: r = 0.561; both p < 0.01), with automated measurements tending to be larger. Regarding interobserver variabilities, statistically significant agreement was observed among the expert, intermediate, and beginner sonographers for all measurements. Automatic measurement significantly reduced measurement time compared to manual measurement (p < 0.001). Conclusions: This preliminary study confirms the accuracy and efficacy of AI-equipped TTE in routine clinical practice. A multicenter study with a larger sample size is warranted.


Introduction
Transthoracic echocardiography (TTE) is the most commonly utilized modality to assess cardiac morphology, function, and hemodynamics in routine clinical practice owing to its cost-effectiveness and minimally invasive nature [1,2]. TTE is essential in various scenarios, including detecting myocardial damage due to anticancer drugs [3,4] and managing the rising incidence of heart failure and valvular diseases in an increasingly aging society [5]. This modality is essential for determining the need for and effectiveness of treatments. Consequently, the demand for TTE has surged in recent years and is expected to continue to rise. Beyond the traditionally measured left-sided size and function, there is now a growing demand for assessing right-sided function and global longitudinal strain using speckle-tracking echocardiography [6][7][8][9][10][11][12][13]. As the number of parameters has continued to increase, so have the complexity and time required for testing, which is a problem in daily clinical practice. Additionally, the accuracy of echocardiographic assessments is known to be operator-dependent, with interobserver variability being a significant concern [1]. Recently developed TTE systems equipped with artificial intelligence (AI)-assisted automatic measurement capabilities promise to enhance examination accuracy and efficiency by reducing both interobserver variability and measurement time [14][15][16][17]. However, the impact of these advancements on daily clinical practice has not yet been investigated. The purpose of this study was to compare the accuracy and time efficiency of routine examinations performed using an ultrasound system equipped with an AI application.

Study Population
This retrospective study enrolled 49 patients examined with AI-equipped echocardiography using an EPIQ CVx with an X5-1c transducer (Philips Healthcare Ultrasound LLC, Bothell, WA, USA) between 15 December 2022 and 6 January 2023 at St Marianna Medical University Hospital. Of these, a total of 19 cases were excluded: atrial fibrillation (9 cases), frequent premature ventricular contractions (1 case), pacemaker implantation (1 case), severe bradycardia (1 case), no images recorded for analysis (6 cases), and images that were difficult to delineate (1 case). Ultimately, 30 cases were included. The study protocol was approved by our ethics committee (approval no. 6189), and patient consent was obtained using an opt-out approach. The AI used in this study was trained and validated on TTE data collected in a multicenter, international effort from a large number of adults of various ethnicities across the Americas (North, Central, and South America), Europe, Africa, and Asia (North, Central, and South Asia), totaling more than 3000 TTE results from healthy subjects and patients with various heart diseases.
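The enrollment arithmetic above can be checked with a short sketch. The dictionary keys are shorthand labels introduced here for illustration only; they are not identifiers from the study records.

```python
# Exclusion counts as reported in the Study Population section.
# Key names are illustrative labels, not terms from the study database.
enrolled = 49
exclusions = {
    "atrial_fibrillation": 9,
    "frequent_pvc": 1,
    "pacemaker_implantation": 1,
    "severe_bradycardia": 1,
    "no_images_recorded": 6,
    "difficult_to_delineate": 1,
}

excluded = sum(exclusions.values())  # total number of excluded cases
included = enrolled - excluded       # cases remaining for analysis
print(excluded, included)  # 19 30
```

The per-reason counts sum to the 19 exclusions stated in the text, leaving the 30 analyzed cases.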

Transthoracic Echocardiography
Two-dimensional (2D) and Doppler echocardiography were performed according to the American Society of Echocardiography guidelines [2,7]. From parasternal left ventricular (LV) long-axis images, the measurements included the interventricular septum (IVS) thickness, LV posterior wall (LVPW) thickness, LV end-diastolic diameter (LVDd), and LV end-systolic diameter (LVDs). The Doppler measurements included mitral inflow velocities (E, A, and E/A) and deceleration time (DT) by pulsed Doppler, the velocity-time integral (VTI) and peak velocity of the LV outflow tract (LVOT) flow, and the septal and lateral mitral annular velocities (e' and a') by tissue Doppler. The automatic measurements were performed using automatic analysis software designed with AI-based algorithms, for the same parameters as in the manual measurements. The measurements were carried out offline on the equipment with the data stored on a hard disk.

Manual Measurements
Measurements of the aforementioned parameters were performed manually on the TTE machine. For LVDd, IVS, and LVPW, three measurements were taken at end-diastole, defined as the first frame immediately after mitral valve closure or the peak of the R wave of the ECG, just below the mitral valve leaflet and perpendicular to the endocardial border of the ventricular septum and posterior wall. LVDs was measured at the frame in which the LV was smallest, just before mitral valve opening. Mitral inflow velocities were measured from the recorded waveforms for the E wave (early diastole) velocity, A wave (atrial systole) velocity, and E wave deceleration time (DT). Mitral annular velocities were measured from the waveforms recorded on the septal and lateral sides of the mitral annulus, yielding the e' (early diastolic) and a' (atrial systolic) velocities. The LVOT velocity was traced, and the velocity-time integral (VTI) and peak velocity of the LVOT flow were measured.

Automatic Measurements
The automatic measurements were performed using automatic analysis software designed with AI-based algorithms for the same parameters as in the manual measurements. After automatic measurement, corrections were made as needed (Figure 1). The "Auto Measure" function was trained to predict the measured values for all items using an algorithm in accordance with the American Society of Echocardiography Guidelines [7], and the time phase setting and measurement were performed automatically when the panel button for each measurement item was pressed. For the LVDd, IVS, LVPW, and LVDs, the time phase was automatically set to end-diastole or end-systole as appropriate. If the time phase did not match, it was corrected manually so that the frame was set at end-diastole or end-systole as appropriate. Measurements of left ventricular wall thickness and left ventricular diameter were taken just below the tip of the mitral valve leaflet, perpendicular to the left ventricular long axis. The measurements were made just above the boundary between the ventricular septum and the lumen and between the left ventricular posterior wall and the pericardium, and corrections were made for measurement sites that did not align, such as in the case of poor images (Figure 1B). Doppler velocities were also automatically measured, and corrections were made for those Doppler velocities that were not measured correctly due to irregular envelopes (Figure 1D).
J. Clin. Med. 2024, 13, x FOR PEER REVIEW

Reproducibility
The manual and automated measurements for all cases and all parameters were tested for reproducibility by three investigators. To reduce potential bias between measurements, the manual and automatic measurements were performed at least two days apart. All measurements were performed by three sonographers with different years of echocardiographic experience: a beginner with less than one year of practice, an intermediate technician with less than five years, and an expert with more than 20 years. Interobserver and intraobserver reproducibility of the manual and automatic measurements was assessed in all cases. Two investigators independently analyzed the same images. These investigators were blinded to each other's results and all other previous measurements.

Examination Time Analysis
Each investigator recorded the time required to take the manual and automatic measurements. The timer was paused when the reader switched between images and was restarted when measurement resumed.
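The pause-and-restart timing rule described above can be illustrated with a minimal sketch. The class name `PausableTimer` and the injectable `clock` argument are hypothetical; the study does not describe the timing tool that was actually used.

```python
import time


class PausableTimer:
    """Accumulates elapsed time only while running (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock so the timer can be tested
        self._started_at = None  # None means the timer is currently paused
        self._elapsed = 0.0      # total seconds accumulated so far

    def start(self):
        # Begin (or resume) timing; ignored if already running.
        if self._started_at is None:
            self._started_at = self._clock()

    def pause(self):
        # Stop timing and bank the interval since the last start.
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    def elapsed(self):
        # Banked time plus the current running interval, if any.
        running = 0.0
        if self._started_at is not None:
            running = self._clock() - self._started_at
        return self._elapsed + running


# Usage with a fake clock: measure 0-20 s, switch images, measure 50-75 s.
ticks = iter([0.0, 20.0, 50.0, 75.0])
timer = PausableTimer(clock=lambda: next(ticks))
timer.start()   # measurement begins at t = 0 s
timer.pause()   # reader switches images at t = 20 s
timer.start()   # measurement resumes at t = 50 s
timer.pause()   # measurement ends at t = 75 s
print(timer.elapsed())  # 45.0
```

Only the two measuring intervals (20 s and 25 s) are counted; the 30 s spent switching images is excluded, which is the behavior the protocol describes.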

Statistical Analysis
Data were expressed as median and interquartile range (IQR) or as percentages, as appropriate. Both Spearman's rank correlation coefficient and the Bland-Altman method were used to investigate measurement error between manual and automatic measurements and between investigators. The time required for measurement was analyzed using the Wilcoxon signed-rank test, with p < 0.05 indicating a significant difference. Statistical analyses were performed using GraphPad Prism version 10.1.0 (264) for Mac OS (La Jolla, CA, USA).
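As an illustration of the Bland-Altman method named above, the following sketch computes the bias (mean difference) and the conventional 95% limits of agreement (bias ± 1.96 SD). It makes two assumptions that the text does not state: differences are taken as automatic minus manual, and the limits use the sample standard deviation. The input values are made up for illustration and are not study data; the analyses in this study were performed in GraphPad Prism, not with this code.

```python
from statistics import mean, stdev


def bland_altman(manual, automatic):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    Differences are automatic minus manual (an assumption), so a negative
    bias means the automatic values are smaller on average.
    """
    if len(manual) != len(automatic) or len(manual) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - m for m, a in zip(manual, automatic)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up LV end-diastolic diameters in mm (illustrative, not study data).
manual = [48.0, 52.0, 45.0, 50.0, 47.0]
automatic = [46.0, 49.0, 44.0, 47.0, 45.0]
bias, lower, upper = bland_altman(manual, automatic)
print(round(bias, 2))  # -2.2
```

Under this sign convention, a negative bias means that the automatic values are smaller on average than the manual ones.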

Comparisons of Manual vs. Automatic Measurements
Both the measurements manually evaluated by the expert investigator and the AI-based automatic measurements are shown in Table 2. For LVDd and LVDs, the automatic measurements were slightly smaller than the manual measurements but agreed closely with them (LVDd: r = 0.942; LVDs: r = 0.977; both p < 0.001), with a bias of −2.41 mm for LVDd and −1.19 mm for LVDs according to the Bland-Altman analysis. There was little substantial difference between the two measurements. On the other hand, the correlation coefficients for IVS and LVPW were lower (IVS: r = 0.670; LVPW: r = 0.561), and automatic measurements tended to yield slightly thicker walls than manual measurements. For the E and A waves of the LV inflow velocity, very high agreement was observed between the automatic and manual measurements (E wave: r = 0.998; A wave: r = 0.996), with a small bias in the Bland-Altman analysis (E wave: 1.37 cm/s; A wave: 0.08 cm/s). For DT and tissue Doppler (e' and a' waves), automated measurements also showed high agreement with the manual measurements (all correlation coefficients > 0.832), especially for e' in the lateral wall (r ≥ 0.957). The LVOT VTI and peak velocity also showed very high agreement between the two measurements (VTI: r = 0.982; peak velocity: r = 0.972), and the bias from the Bland-Altman analysis was very small (VTI: −0.13 cm; peak velocity: 2.87 cm/s). In addition, Table 3 shows the impact of image quality on measurement accuracy: better image quality was associated with more accurate automatic measurement.
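The correlation coefficients reported above are Spearman rank correlations. As a minimal sketch of how such a coefficient is obtained, the code below ranks each series (tied values share the mean of their ranks) and then computes the Pearson correlation of the ranks. The function names and the sample values are hypothetical, and the sketch does not compute p values.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either series is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up E-wave velocities in cm/s (illustrative, not study data).
manual = [60, 75, 82, 90, 110]
automatic = [61, 85, 74, 88, 112]  # one adjacent pair swaps rank order
rho = spearman_rho(manual, automatic)
print(round(rho, 3))  # 0.9
```

Because the coefficient depends only on rank order, it captures monotonic agreement but not systematic offsets, which is why the Bland-Altman bias is reported alongside it.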

Reproducibility
A comparison between the manual measurements taken by the expert and those taken by the intermediate user and the beginner is shown in Table 4. The correlation coefficients ranged from 0.549 to 0.992, indicating that most parameters were measured reliably regardless of experience level. Furthermore, a comparison of the manual measurements by the expert with the automatic measurements by the intermediate user and the beginner is shown in Table 5. Overall, the correlation coefficients were high, with p < 0.05 for all measurements, indicating statistically significant agreement between the intermediate and beginner automatic measurements and the expert manual measurements. A high level of agreement in LVDd and LVDs was observed between the manual (expert) and automatic (intermediate and beginner) measurements, with a correlation coefficient of 0.92 for each. For LV wall thickness, the agreement was lower for IVS (intermediate users: r = 0.70; beginners: r = 0.75) and for LVPW (intermediate users: r = 0.51; beginners: r = 0.41) than for the other measurements. High agreement was found for VTI and peak velocity in the LVOT (beginners: r = 0.97; intermediate users: r = 0.95). The E and A waves also showed very high agreement (beginners: r = 0.99; intermediate users: r = 0.99). Automatic measurements tended to be more consistent than manual measurements for most parameters.

Discussion
This study is the first to investigate the accuracy of commercially available AI-assisted automated TTE measurements and their implications for routine clinical practice.
The key findings are as follows: (1) High accuracy of AI: The accuracy of many echocardiographic parameters was high for automated measurements using AI. This was particularly true for Doppler echocardiography, which showed a high degree of agreement with manual measurements. (2) Reduced examination time: Automatic measurement reduced the examination time compared to manual measurement, suggesting that it may contribute to increased efficiency of the examination. The reduction was particularly noticeable for beginners. (3) Reduction in interobserver variabilities: The use of automated measurements reduced interobserver variabilities between experts and beginners, indicating that it can also be used as an educational tool.
Previous studies related to AI-based automation in echocardiography have reported the automation of morphological and functional assessments and the use of machine learning algorithms for image recognition and analyses for use in diagnosis [18][19][20][21][22][23][24][25][26]. This study focused on basic measurement parameters performed in routine clinical practice. AI-based automated measurements showed high agreement with conventional manual measurements for several measurements performed in echocardiography, with particularly high agreement (r > 0.99) for Doppler indexes. This suggests that AI technology facilitates the standardization of measurements and reduces interobserver variabilities. However, the results for LV wall thickness (IVS and LVPW) were less consistent than those for the other measurements. Measuring LV wall thickness requires care: the septal measurement should not include the wall column, the right ventricular zone, or the subvalvular tissue of the tricuspid valve, and the posterior wall measurement should not include the boundary between the LV cavity and the myocardium or the subvalvular tissue of the mitral valve [2]. The lower agreement is thought to reflect the sensitivity of these measurements to such structures, indicating the need for careful assessment, and correction where appropriate, for certain parameters.
With regard to the time required to carry out multi-item measurements, the results indicate that automatic measurements have the potential to reduce measurement time.
For both expert and beginner groups, the use of automatic measurements resulted in a significant reduction in the time required to take measurements. This suggests that automated measurement is an effective tool to improve the efficiency of ultrasound examinations not only for technicians with advanced expertise, but also for less-experienced technicians. In a previous study, Knackstedt et al. [15] found that fully automated LV volume and ejection fraction measurements reduced the measurement time and enabled more efficient examinations to be performed. In the present study, a similar reduction in measurement time was achieved. The fact that even experts reduced their measurement time suggests that automated measurement complements extensive experience and skill, in addition to speeding up the examination itself. Beginners also significantly reduced the time required to take measurements by using automatic measurement. The results suggest that beginners may have taken longer to perform manual measurements due to uncertainty and technical inexperience. Echocardiographic studies using AI have reported its usefulness as an image acquisition guide for inexperienced operators and as an educational and diagnostic aid [14,27]; the automatic measurement used in this study may likewise be useful for beginners, who can use it as a guide for the analysis itself, which may allow for faster examinations and more efficient skills training.
In contrast, no statistically significant difference in measurement time was observed for intermediate users overall; however, when the data were stratified by image quality, automatic measurement was significantly faster than manual measurement when image quality was good. This suggests that the AI-based automatic measurement algorithm works efficiently when the image quality is good, but requires more correction and time when image quality is low. The lack of a statistically significant overall difference for intermediate users may be because they have enough echocardiographic experience to be proficient in manual measurement but have not yet adapted to automatic measurement techniques, which may make it difficult for the benefits of AI to emerge. Additional research is needed on this point, in particular with technicians of many different levels of expertise. Taken together, however, the results for beginners indicate that automatic measurement may significantly improve their measurement time and accuracy compared to manual measurement across all image qualities. The improvements were particularly noticeable for poor images, suggesting that automatic measurement can be used by beginners as an examination aid to enable reliable measurements, as well as an educational support tool when learning the technique. While the current study focused on the impact of AI on the accuracy of echocardiographic measurements and examination time, future studies should investigate how AI measurements can influence diagnostic flow and decision making in clinical practice.

Study Limitations
The study had several limitations. First, this was a small, single-center study with limited data, which may be insufficient for determining statistical significance. The data may also lack coverage and diversity with respect to patients with different conditions and clinical backgrounds. Secondly, the assessment of image quality includes subjective assessments, which may lead to bias. Thirdly, comparisons with other modalities, such as MRI, were not made; however, because the automatically measured parameters in this study are not LV volume data, such comparisons are not necessarily required to verify their usefulness in routine clinical practice. Finally, this study did not compare the system with AI systems from other vendors, nor did it examine the superiority of our system's AI relative to others, which warrants further investigation.

Conclusions
In this study, AI-based automated measurements were found to have the potential to achieve high accuracy in routine clinical practice. The results also suggest that these types of measurements could contribute to reducing examination duration and interobserver variabilities. Although large-scale multicenter prospective studies are warranted in the future to confirm and expand on these findings, it is expected that AI-equipped TTE will be widely used in daily clinical practice.

Figure 1. Representative cases where fully automatic measurement was possible (A,C), and cases where correction was necessary (B,D). (A) had good image quality and did not need to be corrected after automatic measurement. (B) was of poor image quality, and the measurement position did not capture the boundaries of the left ventricle, so a correction was made. (C) No correction was made after automatic measurement. (D) Corrections were made because the boundaries of the pulsed Doppler waveform were not captured.

Figure 2 shows the results of a comparison of the time required for manual and automatic measurements for investigators with different levels of experience. Experts significantly reduced their measurement time with the use of automatic measurement (manual, 81.5 [73.4-92.0] s vs. automatic, 59.0 [38.0-75.0] s; p < 0.001). No statistically significant difference was found between the two methods for intermediate users (manual, 80.0 [76.0-99.5] s vs. automatic, 82.0 [66.3-95.0] s; p = 0.296). Similar to the experts, beginners took significantly less time with automatic measurement (manual, 121.5 [103.0-169.3] s vs. automatic, 89.0 [73.0-103.3] s; p < 0.001). Table 6 shows the results of a comparison of the variation in examination duration between manual and automatic measurements according to image quality. Beginners consistently took longer than intermediate users and experts for all image qualities in manual measurements, with medians of 120.0 [105.0-205.5] s for poor image quality, 121.5 [103.8-169.3] s for fair image quality, and 124.5 [104.3-146.8] s for good image quality. Automatic measurements took less time than manual measurements: poor image quality, 84 [72-105] s; fair image quality, 97 [79-114] s; and good image quality, 89 [67-96] s. For intermediate users, automatic measurements took slightly longer than manual measurements for poor and fair image quality. However, for good image quality, automatic measurements took significantly less time than manual measurements. Among the experts, automatic measurements reduced the measurement time for all image qualities compared to manual measurements, and, as at the intermediate level, the automatic measurement time shortened as image quality improved (poor image quality, 83.0 [55.5-95.0] s; fair image quality, 55.0 [49.0-78.0] s; and good image quality, 52.3 [41.5-67.8] s).
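The times above are reported as median [interquartile range]. A minimal sketch of how such a summary can be produced is given below. The helper names and the sample times are hypothetical, and the quartile convention (Python's default "exclusive" method in `statistics.quantiles`) is an assumption that may differ from the percentile method used by GraphPad Prism.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3) using the default 'exclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return median(values), q1, q3


def format_median_iqr(values, unit="s"):
    """Format a series as 'median [Q1-Q3] unit', matching the text's style."""
    med, q1, q3 = median_iqr(values)
    return f"{med:.1f} [{q1:.1f}-{q3:.1f}] {unit}"


# Made-up measurement times in seconds (illustrative, not study data).
times = [52, 59, 61, 70, 75, 81, 90]
summary = format_median_iqr(times)
print(summary)  # 70.0 [59.0-81.0] s
```

Different quartile conventions can shift Q1 and Q3 slightly for small samples, so values recomputed this way need not match the published figures exactly.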

Figure 2. Comparisons of examination duration at each investigator level. Experts and beginners showed a reduction in examination time for automatic measurement compared to manual measurement, but no difference in examination time was observed between the two methods for intermediate users.

Table 2. Comparison of manual vs. automatic measurements assessed by the expert.

Table 3. Comparison of manual vs. automatic measurements based on image quality.

Table 4. Comparison of manual measurements among experts, intermediate users, and beginners.

Table 6. Impact of image quality on measurement time.
