1. Introduction
Pressed pipe-fitting connections are critical components in aerospace and other high-performance applications. To ensure structural integrity, sealing capability, and process safety while maintaining low weight, nondestructive evaluation (NDE) is indispensable. Faulty connections can lead to leakage, structural failure, or costly rework. The groove filling level directly correlates with connection quality and service life. Conventional quality control is often carried out destructively by means of micrographs, which are time-consuming and costly. Furthermore, destructive methods cannot provide 100 percent inspection. Process variability, such as that due to pressing force, material tolerances and tool wear, requires an inline-capable NDE solution.
The subject of this work is the examination of pressed pipe-fitting connections. We present the main results of the project “Hauptarbeitspaket 2–NDT-Verfahren (Einrollen)” [
1]. The fittings have six annular grooves on the inside that run around the circumference, each with a cross-section of 1 mm × 0.25 mm. A sketch of the cross-section of such a pipe-fitting connection is shown in
Figure 1. The filling level of the grooves was defined as shown in
Figure 2. Considering a single groove, $x$ is defined as the difference between the highest and lowest point on the pipe and $y$ as the difference between the highest and the lowest point of the fitting. The filling level of the groove is then given as the ratio $x/y$.
The pipes consist of pure titanium, and the fittings consist of a titanium alloy. Photographs of a fitting are shown in
Figure 3a,b, and in
Figure 3c, a photograph of a pressed pipe-fitting connection taken from the inner side of the pipe is shown. The imprints of the tool for pressing the inner pipe into the grooves of the fitting are clearly visible.
For the joining of the pipe-fitting connections, the fittings are placed on the pipes, which are then pressed from the inside into the fittings, or more precisely into the grooves of the fittings. The task is to nondestructively determine the filling level of those grooves.
In aerospace fluid-distribution systems, pressed titanium pipe-fitting connections of the type considered here are used in fuel, hydraulic, and bleed-air lines where tightness, low mass, and high reliability are mandatory. In current industrial practice, the quality of such joints is typically qualified by destructive sectioning of a limited number of specimens and, where feasible, by high-resolution X-ray CT on representative samples, while routine production relies mainly on process control and leak testing. To our knowledge, there is no established nondestructive technique that provides quantitative groove-by-groove filling levels for these titanium press-fit joints under production conditions. The present work, therefore, addresses an industrially relevant gap by developing a PAUT plus CNN route that yields per-groove filling values that can, in principle, be integrated into automated test stations for aerospace pipe manufacturing.
2. State of the Art and Related Work
The state-of-the-art solutions for the given inspection task include X-ray computed tomography (CT), which allows for a complete 3D visualization with a high resolution. X-ray imaging measurements combine X-ray exposure with computational reconstruction to produce images or sequences of the object studied [
2]. In contrast to the 2D output of radiography, X-ray CT provides 3D models by taking many radiographs as the specimen rotates between the source and the detector, followed by model reconstruction and visualization [
3]. This technique delivers a high spatial resolution, making X-ray methods particularly effective for nondestructively identifying most irregularities in materials and welds. The process can be time-consuming and, due to the required rotation, may be restricted by geometry, depending on material and thickness. Therefore, radiography and CT are often used as reference methods [
3,
4]. X-ray techniques can detect very small defects within the specimen’s volume and can also be used to characterize different material or joint properties beyond defect detection [
5]. The disadvantages in this case are, on the one hand, the similar X-ray attenuation of Ti and Ti alloys, which are often used for such joints in aerospace applications, resulting in low material contrast. On the other hand, µCT would require long measurement times in the range of hours to days to achieve a sufficient voxel size on the order of 10 µm, which makes inline integration impossible. For these reasons, X-ray CT examinations were used solely as reference methods to visualize the joint quality of selected test specimens and thus ultimately to assess the detection limits of the other NDT methods within the described project. A possible remedy for these shortcomings is provided by ultrasound procedures, although the quantitative filling level determination in sub-millimeter grooves remains barely documented.
With respect to the achievable resolution and detection sensitivity, conventional ultrasound techniques often cannot detect sufficiently small irregularities required for efficient and safe lightweight design structures. To address this, specialized high-frequency ultrasound methods (HF-US) have been developed, operating in the 10 to 200 MHz range. The higher ultrasound frequency yields an improved resolution and sensitivity, which can be achieved by performing measurements in a water-filled immersion tank, taking advantage of the velocity difference between water and air [
6]. As a result, irregularities with a size down to 0.2 mm and even kissing bonds in weld seams, which are very difficult to detect with other NDT methods, can be reliably detected [
5]. In contrast to the measurements using a single transducer as a probe, phased array ultrasonic testing (PAUT) uses arrays of probes. An array consists of several elementary transducers that can be controlled independently of each other. They generate spherical (matrix array) or cylindrical (linear array) waves that overlap in the material and form a wavefront through their interference. By exciting the elements at different times, the shape of the interference can be controlled and thus swiveled or focused [
5,
7]. Nevertheless, HF-US and PAUT are not applicable as inline NDT methods in many applications because of corrosive effects. For the application described in this paper, no such adverse effects are expected, which is why the groove filling level was determined using PAUT.
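The time-delayed excitation of the array elements described above can be illustrated with a minimal focal-law sketch. It assumes a single homogeneous medium (water at a nominal 1480 m/s) and a hypothetical focal depth of 10 mm; the function name `focal_law_delays` and these two values are illustrative and not taken from the project. Only the element count (128) and the pitch (0.2 mm) correspond to the array used later in this work, and refraction at the water-titanium interface is ignored.

```python
import math

def focal_law_delays(n_elements, pitch_m, focus_x_m, focus_z_m, velocity_m_s):
    """Return per-element firing delays (seconds) that focus a linear array
    at (focus_x_m, focus_z_m); x runs along the array, z into the medium."""
    # Element centre positions, with the array centred on x = 0.
    half_aperture = (n_elements - 1) * pitch_m / 2.0
    positions = [i * pitch_m - half_aperture for i in range(n_elements)]
    # One-way time of flight from each element to the focal point.
    tofs = [math.hypot(focus_x_m - x, focus_z_m) / velocity_m_s for x in positions]
    # The element farthest from the focus fires first (zero delay);
    # nearer elements wait so that all wavefronts arrive simultaneously.
    t_max = max(tofs)
    return [t_max - t for t in tofs]

# 128 elements and 0.2 mm pitch as in the text; focal depth and
# sound velocity are assumptions made for this illustration.
delays = focal_law_delays(128, 0.2e-3, 0.0, 10e-3, 1480.0)
print(f"largest delay: {max(delays) * 1e9:.0f} ns")
```

For a focus on the array axis, the delay profile is symmetric, with the outermost elements firing first and the central elements last. Steering the beam corresponds to moving `focus_x_m` away from zero.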
For titanium alloys, ultrasonic propagation is characterized by comparatively high attenuation and microstructural scattering at high frequencies, as well as a relatively small acoustic impedance contrast between different titanium grades. At 20 MHz, this limits the inspection depth but remains sufficient for the wall thickness of the present pipes while, at the same time, making purely amplitude-based defect detection challenging. Phased array immersion testing was, therefore, chosen over single-element probes or contact setups because electronic focusing and steering enable the concentration of energy in the groove region and the acquisition of 2D B-scans in a single rotation. These B-scans contain multiple reflection paths and later echoes that are particularly valuable for distinguishing different groove filling levels in low-contrast titanium joints.
The ever-increasing use of AI in NDE also contributes to new possibilities and more robust evaluations of NDE sensor data. In recent years, there has been an increase in research focusing on the AI-based evaluation of ultrasound data for various inspection situations. Several studies support the feasibility of operating on minimally processed PAUT data. Siljama et al. show that CNNs can ingest multi-channel PAUT B-scans without the synthetic aperture focusing technique (SAFT) or the total focusing method (TFM) to detect weld flaws, leveraging augmentation to scale training [
8]. Virkkunen et al. similarly train deep CNNs on immersion PAUT B-scans around 1.8–2 MHz, using extensive virtual flaw augmentation while avoiding reconstruction [
9]. Pushing toward even rawer inputs, Jia and Rakhmatov classify crack attributes directly from 2D raw channel frames, highlighting the value of phase-preserved measurements over beamformed images [
10]. These works collectively demonstrate robust learning on raw or near-raw PAUT data, but they focus on detection or attribute classification. Continuous regression from array ultrasonics has been demonstrated most convincingly on reconstructed images. Pyle et al. use CNNs on plane-wave images at 5 MHz to regress crack length and angle, outperforming conventional sizing with hybrid simulated and experimental datasets [
11]. This establishes the practicality of quantitative targets in ultrasonic ML. Complementary evidence for quantitative characterization from coherent raw representations comes from Bai et al., who compare ML and Bayesian inversion on scattering matrices derived from FMC scans, arguing that physics-aware, phase-coherent domains can support parameter estimation with uncertainty [
12]. Together, these precedents suggest that regression is feasible and that raw or coherent representations are advantageous. Although not targeting regression from raw B-scans, this line of work underscores the acoustic and calibration demands of high-frequency arrays in Ti that are directly relevant at approximately 20 MHz. As a geometry analog, Shi et al. classify inner-wall circumferential slots using raw A-scans at around 2 MHz, demonstrating the practicality of circumferential scanning with a probe aligned to the pipe, but without PAUT, without regression, and at a far lower frequency [
13]. For broader context, two additional references illustrate the surrounding landscape without directly advancing the target. Latete et al. explore CNNs for PAUT defect location, identification, and sizing, contributing to the general trend toward ML-driven characterization in array ultrasonics, but without raw RF B-scan inputs, titanium-specific setups, or per-feature continuous regression [
14]. Naddaf-Sh et al. benchmark transformer and YOLO detectors on industrial PAUT B-scan images of pipeline welds, strengthening image-level detection baselines on real data but not addressing raw RF inputs, titanium materials, or continuous regression of quantitative targets [
15]. In parallel, several studies have investigated the ultrasonic testing of titanium alloys and quantitative characterization of small geometric features such as shallow surface-breaking notches or narrow grooves in metallic components [
16,
17]. These works demonstrate that sub-millimeter defect sizing in titanium is feasible in principle, but they focus on different joint geometries and do not provide continuous per-groove filling levels in press-fit connections. The present study complements this literature by targeting closed sub-millimeter grooves in titanium pipe-fitting joints and by combining high-frequency PAUT with CNN-based regression of the groove filling. In summary, the closest building blocks to our goal are raw multi-channel or channel-frame ingestion without reconstruction for detection or classification [
8,
9,
10], continuous regression for ultrasonic characterization but on reconstructed images [
11], coherent raw-domain parameter estimation with uncertainty [
12], and circumferential inner-wall slot scanning as a geometry analog at a low frequency [
13]. To our knowledge, within this set, there is no prior demonstration of continuous 0–100 percent per-groove fill regression from raw PAUT B-scans in titanium press-fit connections. The present work addresses this gap by combining phase-preserved raw B-scan inputs, high-frequency Ti acquisition, and per-groove CNN regression grounded in destructive metrology while situating results against image-level detection baselines and general PAUT-CNN sizing efforts for context [
14,
15].
Compared to conventional inspection routes, the proposed PAUT-CNN approach offers several advantages for the present titanium press-fit joints. X-ray computed tomography is limited by low contrast between pipe and fitting, requires long scan times, and is therefore unsuitable for inline use in this context. Destructive micrographs provide accurate groove filling levels but are slow, costly, and cannot be applied to every joint. Manual high-frequency ultrasound evaluation with a single probe, as illustrated in
Table 1, yields only moderate agreement with CT references and does not scale well because it relies on hand-picked transit times. A more automated time-of-flight evaluation based on hand-crafted features and multi-echo rules would be possible in principle, but it would require complex and inspection-specific signal processing pipelines that are difficult to tune and to maintain under varying noise and coupling conditions. In contrast, the PAUT-CNN method exploits the full raw B-scan, including later echoes, learns the relevant patterns directly from data without explicit feature engineering, and achieves a test RMSE of about 7% across all grooves while remaining fully nondestructive.
The application of convolutional neural networks to raw sensor data follows the broader paradigm of deep learning, where hierarchical representations are learned directly from input data through end-to-end optimization [
18]. This approach has demonstrated superior performance across diverse domains by eliminating manual feature engineering in favor of data-driven feature extraction through multiple layers of nonlinear transformations. For ultrasonic NDE, this paradigm shift enables the network to discover complex acoustic patterns (multi-echo interference, phase relationships, geometric signatures) that would be difficult to encode through conventional signal processing rules, particularly for sub-millimeter features in titanium joints where traditional time-of-flight analysis showed limited accuracy.
3. Methodology
In the first step, a pipe-fitting connection was tested from the outside of the pipe using phased array ultrasound in order to obtain an initial indication of the detectability of the grooves. The test setup for performing the measurements on the connections in question is shown in
Figure 4. The test was carried out using the immersion technique with a 20 MHz phased array probe with a linear scan.
Before the groove filling levels were determined destructively, reference measurements were obtained by means of X-ray computed tomography (CT). The resulting voxel edge length (38 µm) of the reconstructed volume image was unfortunately too coarse to obtain groove filling levels with satisfactory precision. Quantitatively, the groove height of 0.25 mm corresponds to only about six to seven voxels in the reconstructed CT volume, so partial-volume effects and small segmentation inaccuracies translate into large relative errors in the estimated filling level. In addition, the X-ray attenuation of the titanium pipe and the titanium-alloy fitting is almost identical, so the two components can hardly be distinguished in the CT scans. Together, these effects prevent reliable quantitative per-groove filling values from CT, so in the present study, CT is used only qualitatively to verify that the intended variation of groove filling across specimens has been achieved and to visualize the overall joint morphology. CT sectional views for pipe-fitting connections manufactured with low, medium and high force, respectively, are shown in
Figure 5a–c. These views confirm the targeted variation in the filling levels of the grooves between the differently pressed pipe-fitting connections.
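The resolution argument for CT can be checked with a short calculation. The sketch below uses only the groove height and voxel edge length stated in the text; the assumption that the pipe/fitting boundary is misplaced by exactly one voxel is illustrative and serves only to show the order of magnitude of the resulting error.

```python
groove_height_um = 250.0   # 0.25 mm groove height (from the text)
voxel_edge_um = 38.0       # CT voxel edge length (from the text)

# Number of voxels that span the groove height.
voxels_across_groove = groove_height_um / voxel_edge_um

# Assumption: segmentation misplaces the boundary by one voxel.
error_per_voxel_pct = 100.0 * voxel_edge_um / groove_height_um

print(f"{voxels_across_groove:.1f} voxels across the groove height")
print(f"{error_per_voxel_pct:.0f} % filling-level error per misplaced voxel")
```

Under this assumption, a single-voxel segmentation error already shifts the estimated filling level by about 15 percentage points, which is roughly twice the RMSE reported for the ultrasonic evaluation.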
The ultrasound A, B and C scans were used for evaluation. First, a measuring aperture was placed over the rear wall echo of the inner component (pipe), with the signal level scaled to 80% (see red frame in
Figure 6). This clearly shows the areas where the sound signal reaches or does not reach the inner pipe. The ultrasound B scan also shows that the rear wall of the fitting is partially invisible, whereas, in these cases, the rear wall of the pipe is clearly visible.
This already indicates the quality of the connection since, in the event of an air gap in the grooves, the sound transmission is interrupted, and therefore, no rear wall echo of the pipe can be detected within the area outlined in red. The red dotted line on the right-hand side of
Figure 7 illustrates the measurement position of the tested connection, which is shown in the amplitude image on the left.
It should be noted that the first groove, located directly at the transition to the larger diameter of the fitting (on the left in each figure), cannot be detected from the outside of the pipe in these phased array measurements due to the geometric boundary conditions. Therefore, investigations were carried out to determine possible optimizations of the sensor technology. To this end, after a pipe was cut open, high-resolution measurements were carried out using a single sensor from the inside of the pipe; see
Figure 8. Clear echoes of the first groove were also detected in the measured signals, as can be seen in
Figure 9.
A manual evaluation of these measurements, shown in
Figure 9 and
Figure 10, in which the transit times of the echoes from the grooves were compared with the transit times of the echoes from the adjacent bumps, revealed moderate agreement with the reference values derived from the CT measurements. Only in the case of transmission through the transition between the pipe and the fitting did this type of evaluation reveal discrepancies, as can be seen in
Figure 10 at probe position 7 (“PK 7”) and listed in
Table 1.
Since the manually evaluated single-sensor measurements showed only moderate agreement with the reference values, the natural next step was to consider not only the first returning echo but also later repeated echoes in order to improve accuracy. The otherwise complex feature engineering required to consider multiple echoes motivated the use of a convolutional neural network for the task of determining the filling level from the ultrasonic measurement data. Since training a neural network generally requires large amounts of suitable data to achieve satisfactory accuracy, 25 pipe-fitting connections, manufactured with different pressing forces to ensure an even distribution of filling levels, were examined by means of phased array ultrasonic measurements in a water bath. Measurements were taken at six equidistant angles around the circumference, such that, in total, 150 B-scans were acquired. Afterwards, the pipe-fitting connections were examined destructively to determine the filling levels of the grooves at those angles. The acquired B-scans, together with the destructively determined groove filling levels, were used to train a convolutional neural network.
Compared to this manual time-of-flight evaluation, the subsequent PAUT-CNN approach avoids explicit feature engineering and the manual picking of individual echoes. In principle, a more automated time-of-flight pipeline could be constructed by defining algorithmic rules for detecting several echo families, measuring their relative transit times, and combining these hand-crafted features in a regression model. However, such a rule-based design would be highly specific to the present geometry and measurement setup, would require extensive tuning to remain stable under noise and coupling variations, and would still rely on a limited set of engineered descriptors. The CNN, by contrast, learns directly from the full B-scan, including later echoes and subtle multi-echo interference patterns, which reduces the influence of ambiguous transit-time differences and improves quantitative accuracy while, at the same time, eliminating manual and rule-based feature design.
For the phased array ultrasonic measurements from the inside of the pipe, an M2M Multi2000 device was used, together with an array with 128 elements with a pitch of 0.2 mm. The sample rate was set to 100 MS/s, while the excitation frequency was 20 MHz. The measurements were taken in a water bath, with the specimen rotating axially around the array that was positioned inside the pipe. B-Scans were acquired in 60° steps around the circumference of the pipe. Examples of B-Scans taken from a pipe-fitting connection, manufactured with low force, medium force and high force, are shown in
Figure 11a–c. From an inline perspective, the proposed inspection chain is compatible with typical production cycle times. Each B-scan consists of
samples, which corresponds to less than 0.4 MB of raw data per scan, so the data volume per joint remains small, even when six circumferential positions are inspected. With standard phased-array repetition rates and a simple rotation mechanism, the acquisition of the six B-scans (in 60° steps) for one joint can be completed within a few seconds, such that ultrasonic acquisition does not dominate the overall cycle time. The trained CNN has about 65,000 trainable parameters, and a single forward pass on one B-scan takes only a few milliseconds on a modern CPU, so the computational inference time is negligible compared to mechanical handling and data acquisition. Inline feasibility is, therefore, primarily limited by the mechanical integration of the immersion setup, rather than by the evaluation algorithm itself.
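To relate the acquisition settings to the size of the grooves, the following short calculation may be helpful. The sample rate, excitation frequency, element count and pitch are taken from the text; the longitudinal sound velocity in titanium (6100 m/s) is an assumed nominal value that is not given in this work, so the derived lengths are indicative only.

```python
sample_rate_hz = 100e6            # 100 MS/s (from the text)
frequency_hz = 20e6               # excitation frequency (from the text)
n_elements, pitch_mm = 128, 0.2   # array layout (from the text)

# Assumed nominal longitudinal sound velocity in titanium (not from the text).
c_titanium_m_s = 6100.0

samples_per_period = sample_rate_hz / frequency_hz
sample_period_ns = 1e9 / sample_rate_hz
wavelength_ti_um = 1e6 * c_titanium_m_s / frequency_hz
# Pulse-echo: the wave travels to the reflector and back, hence the factor 2.
depth_per_sample_um = 1e6 * c_titanium_m_s / (2.0 * sample_rate_hz)
aperture_mm = n_elements * pitch_mm

print(samples_per_period, sample_period_ns, wavelength_ti_um,
      depth_per_sample_um, aperture_mm)
```

With these values, one period of the 20 MHz signal is sampled five times, and one sample corresponds to roughly 30 µm of depth in titanium. The assumed wavelength of about 0.3 mm is of the same order as the 0.25 mm groove height, which is consistent with the observation that a simple first-echo transit-time evaluation has limited accuracy.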
The CNN-based approach follows the paradigm established by Krizhevsky et al. for image classification [
19], enabling end-to-end learning of hierarchical features directly from raw 2D ultrasonic data without manual feature engineering. This is particularly advantageous for the present task, where traditional time-of-flight analysis showed limited accuracy due to complex multi-echo patterns in sub-millimeter grooves.
The architecture of the CNN used to evaluate the acquired B scans is shown in
Figure 12.
Table 2 and
Table 3 summarize the main structural details of each layer, including kernel size, stride, activation functions, dropout rates, and the number of trainable parameters. The model consists of four identical convolutional blocks with 10 filters of size 17 by 11 and stride 2, each followed by a LeakyReLU activation with an alpha of 0.1 and a dropout layer with a rate of 0.5. A flattening layer and two dense layers with 12 and 6 units form the regression head, where the final layer uses a sigmoid activation and Gaussian weight initialization with mean 0 and standard deviation 0.2. LeakyReLU was selected over standard ReLU to prevent dying neurons during training, which is particularly critical given the limited dataset size [
20]. This activation function maintains a gradient flow for negative inputs while enabling the principled weight initialization strategies that facilitate convergence in deeper networks. The last dense layer has a sigmoid activation and six output nodes corresponding to the six groove filling levels. The output values between 0 and 1 correspond to the groove filling levels from 0% to 100%.
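The layer description above can be cross-checked by counting trainable parameters. The following sketch is not the authors' implementation; it only counts weights and biases for the stated layer sizes. The input size of the B-scan and the padding mode are not given in the text, so the 128 × 112 input and the "same" padding used here are hypothetical and merely illustrate how the dense-layer count depends on the input.

```python
import math

def conv_params(in_channels, filters=10, kernel=(17, 11)):
    """Weights plus biases of one 2D convolution layer."""
    return (kernel[0] * kernel[1] * in_channels + 1) * filters

def dense_params(in_features, units):
    """Weights plus biases of one fully connected layer."""
    return (in_features + 1) * units

def count_parameters(in_h, in_w, n_blocks=4, filters=10, stride=2):
    """Total trainable parameters, assuming 'same' padding (hypothetical)."""
    total, channels, h, w = 0, 1, in_h, in_w
    for _ in range(n_blocks):
        total += conv_params(channels, filters)
        # With 'same' padding, each stride-2 block halves both axes (rounded up).
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        channels = filters
    flat = h * w * channels  # length of the flattened feature vector
    return total + dense_params(flat, 12) + dense_params(12, 6)

# The convolutional part does not depend on the input size.
conv_total = conv_params(1) + 3 * conv_params(10)
print(conv_total)                   # 58010
print(count_parameters(128, 112))   # 64820 for the hypothetical input
```

The four convolutional layers account for 58,010 parameters regardless of the input size, so most of the stated total of about 65,000 lies in the convolutional backbone. The remainder is determined by the length of the flattened feature vector, which depends on the actual B-scan dimensions and padding.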
This six-output architecture implements implicit multi-task learning, where a single shared convolutional backbone extracts features from the B-scan while the final dense layer produces independent predictions for each groove [
21]. Multi-task learning can improve generalization when tasks are related, as is the case here, where all six grooves share similar acoustic properties and geometric constraints. The shared representation forces the network to learn features relevant across all grooves, rather than overfitting to groove-specific noise, which is particularly valuable, given the 114-sample training set.
For the training of the CNN, the 150 B-scans, together with the destructively determined groove filling levels, were split into 114 samples for training and 36 samples for testing. The batch size was set to 1, which led to better generalization than larger batch sizes. Empirically, batch sizes of 4 and 8 reduced the stochasticity of the gradient but consistently yielded a higher RMSE on the held-out test set, despite a slightly faster decrease in the training loss. We, therefore, treat batch size 1 as a deliberate regularization choice for this small-data regression problem, accepting the longer training time and the somewhat higher run-to-run variance in exchange for better generalization. This observation aligns with the findings of Keskar et al., who showed that small-batch training converges to flatter minima in the loss landscape, yielding improved generalization compared to large-batch methods that tend toward sharp minima [
22]. Beyond regularization, the dropout layers enable Bayesian uncertainty estimation through Monte Carlo dropout at the inference time, where multiple forward passes with active dropout provide prediction variance [
23]. This capability is particularly relevant, given the limited training set of 114 samples, as it allows the quantification of model confidence for individual predictions. Given the small dataset and regression task, the combination of batch size 1, dropout, and implicit multi-task learning was critical to achieving robust generalization. For the optimizer, Adam was chosen with a learning rate of 0.0001, while the parameter $\beta_1$ was set to 0.3 and the parameter $\beta_2$ to 0.999. The Adam optimizer [24] was selected due to its adaptive learning rate properties and computational efficiency. The non-standard $\beta_1$ value of 0.3 (default: 0.9) was chosen to reduce momentum averaging, which helped prevent overfitting, given the limited training set size of 114 samples. The mean squared error (MSE) was used as the loss function.
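The effect of the reduced first-moment parameter can be made concrete with a minimal sketch of a single Adam update for one scalar parameter. This is the standard Adam update rule written out by hand, not the training code used in this work; the function name `adam_step` and the value of `eps` are illustrative assumptions, while the learning rate and the two moment parameters are those stated above.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.3, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) average
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment average
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for step t >= 1
    v_hat = v / (1.0 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Effective averaging horizon of the momentum term, roughly 1 / (1 - beta1).
horizons = {beta1: 1.0 / (1.0 - beta1) for beta1 in (0.9, 0.3)}
print(horizons)
```

With the default value of 0.9, the momentum term averages gradients over roughly ten steps, whereas 0.3 shortens this to fewer than two steps, so each update follows the most recent single-sample gradient much more closely.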
To quantitatively evaluate the agreement between CNN predictions and destructive reference measurements, we use the root mean square error (RMSE) of the filling levels. For a test set with $N$ grooves, reference filling levels $f_i$, and corresponding predictions $\hat{f}_i$, the RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{f}_i - f_i\right)^2},$$
while the MSE, which was used as the loss function during the training, is defined as
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{f}_i - f_i\right)^2.$$
Unless stated otherwise, we report the RMSE as a percentage, that is, the above expression multiplied by 100. In addition to the global RMSE across all grooves, we also compute separate RMSE values for each groove index to analyze potential systematic differences between groove positions.
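The error measures described above can be computed as in the following minimal sketch, which uses the standard definitions of MSE and RMSE. The function names and the example values are made up for illustration and are not data from this work; filling levels are assumed to be given on the 0 to 1 scale of the network output.

```python
import math

def mse(reference, predicted):
    """Mean squared error over paired values (filling levels in 0..1)."""
    return sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference)

def rmse_percent(reference, predicted):
    """RMSE expressed as a percentage of the filling level."""
    return 100.0 * math.sqrt(mse(reference, predicted))

def rmse_per_groove(reference_rows, predicted_rows):
    """RMSE per groove index; each row holds the six groove values of one B-scan."""
    n_grooves = len(reference_rows[0])
    return [
        rmse_percent([row[g] for row in reference_rows],
                     [row[g] for row in predicted_rows])
        for g in range(n_grooves)
    ]

# Made-up values for illustration only (two B-scans, six grooves each).
ref = [[0.2, 0.4, 0.6, 0.8, 1.0, 0.5], [0.3, 0.5, 0.7, 0.9, 0.1, 0.6]]
pred = [[0.25, 0.4, 0.55, 0.8, 0.9, 0.5], [0.3, 0.45, 0.7, 0.95, 0.1, 0.7]]
per_groove = rmse_per_groove(ref, pred)
print([round(value, 2) for value in per_groove])
```

The global RMSE is obtained by passing the flattened reference and prediction lists of all grooves to `rmse_percent`.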
No explicit data augmentation techniques were applied during training despite the limited dataset size of 114 B-scans. While augmentation strategies such as geometric transformations, additive noise, or intensity scaling are common for expanding small image datasets [
25], preserving the physical authenticity of ultrasonic signal characteristics was prioritized. The decision avoided potential artifacts that could misrepresent echo timing, amplitude relationships, or phase information critical for acoustic interpretation. Instead, regularization through dropout and small-batch training addressed overfitting, as evidenced by the 7% RMSE on held-out test data. Future work may explore domain-specific augmentation informed by ultrasonic wave propagation physics or hybrid synthetic-experimental data generation.
In the present project, we did not employ physics-based simulation to generate synthetic B-scans because setting up and validating a full wave propagation model for the specific titanium geometry and the 20 MHz array would have exceeded the available effort. Nevertheless, high-fidelity simulation-based augmentation is a promising option for future work when aiming at larger and more diverse training sets.
The architecture does not include batch normalization layers, which are common in modern CNNs for stabilizing training and enabling higher learning rates [
26]. This decision was deliberate, given the batch size of 1, for which batch normalization would compute statistics over single samples, rather than mini-batches, potentially introducing noise, rather than stabilization. With 114 training samples, the combination of dropout before each convolutional layer and small-batch gradient descent provided sufficient regularization without the statistical instability that single-sample batch normalization can introduce. Future work with larger datasets may explore batch normalization with appropriately sized mini-batches or alternative normalization strategies such as layer normalization or group normalization that are less sensitive to batch size.
The sequential convolutional architecture without residual connections was chosen for its simplicity and sufficiency, given the four-layer depth and 114-sample training set. While residual learning frameworks introduced by He et al. enable the training of much deeper networks by addressing vanishing gradients through skip connections [
27], the present shallow architecture showed no evidence of gradient degradation during training. However, residual connections could be explored in future work if scaling to deeper architectures becomes necessary for more complex inspection scenarios involving multiple materials, variable groove geometries, or additional defect types beyond filling level quantification. The current architecture balances representational capacity with overfitting risk appropriate to the available data volume.
4. Results
The result of the AI training is shown in
Figure 13a–h.
Figure 13a shows the comparison of the destructive reference values (x-axis) with the output of the neural network (y-axis) for the training data. The comparison for the test data, i.e., the data that is not used to train the neural network but to evaluate how well the neural network generalizes to new, unseen data, is shown in
Figure 13b. The test data generally show a slightly poorer correlation with the reference values than the training data. For the test data, the predictions of the neural network agree with the reference values with an RMSE (root mean square error) of approximately 7% of the groove filling level. In
Figure 13c–h, the groove filling levels are shown separately for the different groove positions.
Table 4 summarizes RMSE, mean error, and standard deviation per groove index. The values indicate that the prediction performance is relatively uniform across grooves, with only limited variation between positions. We did not observe a clear trend of systematically higher errors near the pipe-to-fitting transition or at specific groove indices, which suggests that the network can handle the moderate geometric variations within the present configuration.
All B-scans used for training and testing contain realistic measurement disturbances such as speckle, small coupling variations, and minor geometric misalignments. No explicit denoising was applied, so the reported RMSE of approximately 7% already reflects the influence of these effects under typical immersion-tank conditions. In the available data, we did not observe systematic failure patterns that could be clearly attributed to noise spikes or isolated artifacts. However, extreme situations such as the complete loss of coupling or strong saturation effects are not represented in the present dataset and, therefore, remain outside the validated operating range of the method.
One reason for using a neural network to determine the filling level from ultrasound data was the expectation that it would use not only the first echo but also subsequent echoes, thus enabling a more precise determination of the filling level. To verify that the neural network actually also “looks” at later echoes, the explainable AI technique “guided Grad-CAM” was used [
28] to visualize the regions in the input images (B-Scans) that are most decisive for the network’s decision.
Figure 14 shows an example of a B-scan (a), together with the saliency map determined by guided Grad-CAM (b), as well as both overlaid (c). The saliency map clearly shows that the CNN not only uses later echoes but even focuses on them.
In addition, the CNN-based evaluation compares favorably to the manual single-probe time-of-flight analysis discussed in
Section 3. While the largest deviation in
Table 1 is on the order of the groove height itself at certain probe positions, the CNN predictions achieve an RMSE of about 7% of the filling level across all grooves in the test set. Given that both manual and conceivable automated time-of-flight approaches would rely on a restricted number of hand-crafted timing features and explicit rules for echo selection, this underlines the benefit of exploiting the complete B-scan and multiple echo trains through data-driven learning instead of relying on hand-picked transit times and manually designed feature pipelines.