A Preliminary Study on the Adaptive SNR Threshold Method for Depth of Penetration Measurements in Diagnostic Ultrasounds

Featured Application: The present work is aimed at providing a novel image analysis-based method for the maximum depth of penetration (DOP) measurement in Quality Assessment (QA) of medical ultrasound systems.

Abstract: Maximum depth of penetration (DOP) is among the most relevant parameters in quality assurance programs for ultrasound (US) scanners. At present, a generally accepted protocol for DOP estimation is still lacking and, in common practice, DOP is visually assessed despite the low accuracy. To overcome the subjectivity of eye-based assessment, automatic image analysis methods have been proposed in the literature. The present work focuses on a novel automatic method, namely the adaptive signal-to-noise ratio (SNR) threshold method (AdSTM), developed in the MATLAB environment, and compares it with an existing automatic approach, namely the tangent threshold method (TTM), and with the mean judgment of eight observers (naked eye method). The three investigated methods were applied to data acquired from four general-purpose US scanners, each equipped with linear, convex, and vector array probes. Tests were carried out in two different configuration settings (raw scanner and default preset working conditions). AdSTM outcomes were tested by means of Monte Carlo simulations. Most measurement results were compatible, although the AdSTM seemed to be more sensitive and faster than the TTM. The analysis of the results confirms the higher dispersion of the naked eye method in DOP assessment with respect to the proposed automatic methods.


Introduction
As pointed out in several studies [1][2][3][4][5][6][7][8][9], quality assurance (QA) programs able to provide objective measurements of ultrasound (US) parameters have become increasingly important in recent years for assessing US scanner functionality, given that such scanners are employed worldwide in a wide range of well-known clinical applications for the visualization of anatomical districts. It is therefore of primary importance to make sure that clinicians can identify and characterize possible abnormalities in an anatomic image (e.g., malignancies). This should be possible at any depth with the highest reliability, resulting in an improvement of both the diagnostic and therapeutic phases and in a minimization of clinician misjudgments as well as of patient discomfort due to wrong therapeutic choices. Such results can be achieved as long as US scanners provide consistent image quality, which is verified by checking several B-mode US parameters. Among the main ones, the maximum depth of penetration (DOP) deserves particular attention. It is defined as the maximum low-contrast penetration depth at which US speckle, due to scattering within a tissue-mimicking material (TMM), is displayed by the US scanner without noise [1,2,4,8,10,11]. In particular, the speckle pattern is prominently displayed in the image at depths much smaller than the DOP, whereas at depths greater than the DOP the electronic noise predominates over speckle. The DOP is an indicator of how well the US probe is able to display the features of anatomical districts far from the probe-skin/phantom interface. Browne et al. suggested that it could be considered a predictor of the clinician's perception of image quality [12]. At present, there is neither a wide consensus nor a commonly accepted method for DOP estimation, and in many cases it is still visually assessed [13,14].
However, it should be highlighted that the operator dependence of DOP is detrimental to assessment objectivity. Mannila et al. [15], after defining DOP as the depth where speckle starts to be dominated by noise, pointed out that its estimation is necessary because of its influence on the accuracy of vertical distance measurements. Another study [16] provided an alternative definition of this parameter as the maximum depth at which speckle is still visible. The estimation was based on the computation of the standard deviation of both speckle and noise from two consecutive images captured while holding the probe still. To overcome the intrinsic limit of subjective, visually performed tests, image analysis-based off-line methods have been proposed in the literature [10,11,17,18,19].
Among the most used, the SNR-based evaluation at several depths deserves to be mentioned [20], since it is reported as a standard reference method in [17]. In particular, such methods involve the storage of images captured from a US phantom [7,19,21]; DOP is then evaluated in the post-processing phase as the maximum depth at which the estimated SNR exceeds a set threshold. Despite the development of such methods, it should be pointed out that the most accredited ones rely on a threshold value that is either subjectively chosen by the operator [4] or set without clarifying the selection criteria [16,19]. This constitutes a source of error and unreliability, because such methods suffer from indirect operator subjectivity.
The contribution of the present work lies firstly in the fact that the proposed method is no longer based on threshold values chosen by users or operators; instead, it determines the thresholds in a totally independent and automatic manner, which is a great advantage in terms of reproducibility and reliability of the results. Secondly, in order to validate this novel method, it was compared with another operator-independent algorithm, already proposed in the literature by the same authors [18], whose complexity may make it harder to employ. Therefore, the goal of this study was to improve and validate the functionality of the proposed method, the adaptive SNR threshold method (AdSTM), building on a previous preliminary study [22], and to compare its measurement results with both the tangent threshold method (TTM) in [18] and the mean judgment of eight different observers. In this study, the AdSTM was tested on four US scanners, each one equipped with three different probe models. Finally, Monte Carlo simulations (MCS) were carried out in order to quantify the method uncertainty in the DOP measurement.

Materials and Methods
The AdSTM is an SNR-based method developed to assess the maximum depth of penetration on the basis of phantom and in-air clips, obtained with the transducer placed on the phantom and held in air, respectively. The acquired data were processed in MATLAB to obtain a single image for each of the phantom and in-air clips (I and I_a), each computed as the average of the first 15 frames. Then, the proper diagnostic average images (I_c and I_{c,a}) were obtained by cropping I and I_a through automatic masking to remove all the US settings details. In order to compute the SNR as a function of depth, the signal S(z) and the noise N(z) were estimated as the mean pixel value on a row within a rectangular region of interest (ROI), automatically selected on I_c and I_{c,a}, respectively. An accurate estimation of DOP is closely related to a suitable number of pixels to include in the ROI at each depth. However, the main limitation on the number of pixels per depth is the phantom section width. Furthermore, other aspects should be taken into account: artifacts at the transducer-phantom (or transducer-air) interface; the possible presence of strong echoes corresponding to the phantom bottom; and the delimitation of the different attenuation zones. In addition, an excessive number of pixels per depth could increase the DOP computational cost. Therefore, the choice of the minimal pixel number (e.g., 30 pixels) is based on a trade-off between an accurate estimation and a limited computational time (about 10 ms). Then, the AdSTM evaluates the corresponding SNR(z) from the ratio S(z)/N(z) and computes the adaptive SNR threshold (SNR_th), automatically estimated according to Equation (1), where S_max is the maximum system sensitivity retrieved from the logistic function of the SNR curve [22], Δg_min is the minimum gray level difference distinguishable by the human eye [23][24][25], and L_fs is the luminance full scale expressed in terms of gray levels.
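The frame averaging and the row-wise SNR estimation described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the list-of-rows image representation, and the ROI parameters (`col_start`, `width`) are assumptions made for illustration, the automatic masking and cropping step is omitted, and the threshold computation is not covered.

```python
from statistics import mean

def average_frames(frames):
    """Pixel-wise average of equally sized 2-D frames (lists of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]

def depth_profile(image, col_start, width):
    """Mean gray level of each image row inside a rectangular ROI."""
    return [mean(row[col_start:col_start + width]) for row in image]

def snr_profile(phantom_frames, air_frames, col_start, width, n_frames=15):
    """SNR(z) = S(z) / N(z), one value per pixel row (depth)."""
    # Average the first n_frames of each clip into a single image.
    signal = depth_profile(average_frames(phantom_frames[:n_frames]),
                           col_start, width)
    noise = depth_profile(average_frames(air_frames[:n_frames]),
                          col_start, width)
    # Guard against division by zero where the in-air image is fully black.
    return [s / n if n > 0 else float("inf") for s, n in zip(signal, noise)]

# Toy clips: 2 frames, 3 rows (depths), 4 columns each.
phantom = [[[100.0] * 4, [50.0] * 4, [10.0] * 4] for _ in range(2)]
air = [[[10.0] * 4 for _ in range(3)] for _ in range(2)]
profile = snr_profile(phantom, air, col_start=1, width=2)
```

The profile is indexed by pixel row; converting row indices to depth in centimetres would require the pixel spacing of the scanner image, which is not modelled here.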
In particular, the ratio Δg_min/L_fs is always between 0 and 1, while S_max was determined according to the following steps:
1. The non-linear curve fitting f(z) was derived from SNR(z) by the iterative computation of the coefficients β, χ, γ, η according to the sigmoidal function of Equation (2).
2. The first-order derivative f'(z) was calculated, and the depth z_min of its minimum was used to estimate S_max according to Equation (3).
Finally, the AdSTM retrieved the depth value corresponding to the maximum depth of penetration from the intersection between the threshold SNR_th and the SNR curve.
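As a rough illustration of the last two operations, the following Python sketch locates the depth where the derivative of a sampled curve is minimum and the depth where a sampled SNR curve crosses a given threshold. The sigmoidal fit and the expressions for S_max and SNR_th are not reproduced here, so the threshold is simply passed in as a number; the function names, the finite-difference derivative, and the linear interpolation between samples are all assumptions of this sketch rather than the published procedure.

```python
def steepest_descent_depth(z, f):
    """Depth at which the finite-difference derivative of f(z) is minimum."""
    slopes = [(f[i + 1] - f[i]) / (z[i + 1] - z[i]) for i in range(len(z) - 1)]
    i_min = min(range(len(slopes)), key=slopes.__getitem__)
    # Report the midpoint of the interval with the most negative slope.
    return 0.5 * (z[i_min] + z[i_min + 1])

def depth_of_penetration(z, snr, threshold):
    """Deepest point where SNR(z) drops below the threshold.

    The crossing is located by linear interpolation between the two
    neighbouring samples; None is returned if the curve never crosses.
    """
    for i in range(len(z) - 1, 0, -1):
        if snr[i - 1] >= threshold > snr[i]:
            frac = (snr[i - 1] - threshold) / (snr[i - 1] - snr[i])
            return z[i - 1] + frac * (z[i] - z[i - 1])
    return None

# Toy curve: depths in cm and a monotonically decreasing SNR.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
snr = [10.0, 8.0, 4.0, 2.0, 1.0]
z_min = steepest_descent_depth(z, snr)
dop = depth_of_penetration(z, snr, threshold=3.0)
```

Scanning from the deepest sample upwards means that, if noise makes the curve cross the threshold more than once, the deepest crossing is reported; whether the original method behaves this way is not stated in the text.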
In order to test the AdSTM on different US scanners equipped with different probes, the results were compared with those obtained from the TTM, an alternative algorithm developed in [18], and with the mean judgment of eight independent observers without medical expertise, collected through an eye-based test on I_c (naked eye test). Medical expertise is not mandatory for quality controls of US systems by means of ad hoc tissue-mimicking phantoms, as they are usually performed by technicians. The protocol employed in this study for the naked eye test was designed by the authors on the basis of the American Association of Physicists in Medicine (AAPM) report [2]. Each observer performed the test independently, making sure to preserve the same test conditions: a well-lit room, maximum monitor brightness without glare, and a fixed distance from the monitor. Moreover, through in-house MATLAB software, the observer could indicate their decision on I_c with a simple mouse click. In order to test inter- and intra-observer variability, the test was repeated six times, with the software automatically presenting the images to the observer in a randomized order.
TTM is an automatic method developed for DOP measurement and based on the determination of the gray scale mapping function (GSMF) [18], without a threshold subjectively selected by the technician. In particular, after the ROI selection in the diagnostic image, the gray levels were computed as a function of depth by averaging adjacent columns within the ROI. The resulting mean depth profile g(z) was then interpolated with a non-linear curve b(z), according to the logistic function of Equation (4), in which ζ, ν, and ξ are constants. Following Equation (4), the DOP was evaluated as the depth at which b(z) intersects the tangent threshold T_th, estimated on b(z) according to Equation (5), where K_max is assumed as the maximum system sensitivity and depends both on some characteristics of the medium (i.e., the attenuation constant and a factor related to the absorption in the medium) and on some parameters related to the US scanner settings (i.e., the bandwidth central frequency, the electronic noise amplitude, and the echo signal amplitude at z = 0), as reported in [18]. The main advantage of the AdSTM, as for the TTM, is the automatic determination of the adaptive threshold based on the human eye sensitivity, as defined in Equations (1) and (5), respectively. An arbitrary threshold set by the operator is no longer required. On the other hand, the main disadvantage of the TTM is the slow GSMF determination and the high acquisition time.
As in [22], the phantom clips were acquired on a multi-purpose, multi-tissue US phantom (CIRS, Model 040GSE). The device [26] embeds test objects of known characteristics and was designed to provide tissue-mimicking properties under B-mode ultrasound. This phantom allows the US probe to be tested on two different zones with attenuation coefficients of 0.70 and 0.50 dB·cm⁻¹·MHz⁻¹. The main characteristics of the CIRS 040GSE are reported in Table 1. Four US diagnostic systems, indicated as A, B, C, and D, each equipped with three US probes (linear, convex, and vector array, respectively), were tested by means of the phantom zone with an attenuation of 0.70 dB·cm⁻¹·MHz⁻¹, paying attention to display the speckle background only. All the probes were set at their own central frequency within the suitable range (nominal frequency). Moreover, the scanning settings (e.g., overall gain, sensitivity time control (STC), and dynamic range) were kept unchanged throughout the clip acquisitions because of their influence on SNR and DOP. In order to compare the different diagnostic systems tested in this work, the phantom and in-air clip acquisitions were carried out according to two different configuration settings (sets 1 and 2 in Table 2), which corresponded to the raw scanner working condition (used in QA protocols to compare the outcomes from different scanners at similar working conditions) and the default preset working condition (clinical preset) provided by the US specialist, respectively. In addition, the ROI was always automatically selected in the same area of the diagnostic image according to the probe model. The influence of the ROI selection was evaluated, in terms of repeatability uncertainty, with changes in ROI size and position by means of Monte Carlo simulations (see the Monte Carlo Simulation section). Moreover, a tissue-mimicking phantom with uniform speckle was used in our tests to reduce the aforementioned error component as much as possible.
Finally, the compatibility between the different DOP outcomes retrieved from the AdSTM, the TTM, and the naked eye test was verified according to the condition in Equation (6) [27], where µ_1 and µ_2 are the mean DOP values, while σ_1 and σ_2 are the corresponding total standard deviations. If Equation (6) holds, there is no significant discrepancy between the measurements.
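A compatibility check of this kind can be sketched in Python as follows. The exact form of Equation (6) is not reproduced in this text, so the criterion below is an assumption: it uses a commonly adopted test in which the discrepancy between two means is compared with the root-sum-square of their standard deviations scaled by a coverage factor. The function name `compatible` and the default factor `k=2.0` are hypothetical choices, not values taken from the paper.

```python
from math import hypot

def compatible(mu1, sigma1, mu2, sigma2, k=2.0):
    """Assumed criterion: |mu1 - mu2| <= k * sqrt(sigma1^2 + sigma2^2).

    This is a generic compatibility test between two measurements and
    may differ from the condition actually used in the study.
    """
    return abs(mu1 - mu2) <= k * hypot(sigma1, sigma2)

# Illustrative values only (cm); they are not taken from Table 4.
same = compatible(10.0, 0.3, 10.5, 0.4)
different = compatible(10.0, 0.3, 12.0, 0.4)
```

With a criterion of this form, larger standard deviations make two results easier to reconcile, which is worth keeping in mind when a method with high dispersion appears compatible with all the others.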

Monte Carlo Simulation
Monte Carlo simulations (MCSs) can be used to evaluate measurement uncertainty and method robustness in several applications [28][29][30][31][32][33]. The MCS approach is also suitable for uncertainty estimation in DOP measurements, since image analysis-based methods [31][32][33] may not be expressible in an analytical form suitable for error analysis through the law of error propagation.
As in [22], a first series of MCSs (10⁴ cycles) was carried out to evaluate the standard deviation σ (SD) of the SNR threshold (SNR_th) through the 2.5 and 97.5 percentiles. The following uniform distributions were assigned to the variables influencing SNR_th: ROI width l and ROI shift s (evaluated with respect to the initial ROI position), both expressed as a number of pixels (Table 3a,b). The same distributions were used, in an analogous way, to retrieve the GSMF coefficients contained in the logistic function b(z). Afterwards, a second series of MCSs (10⁵ cycles) was carried out for the AdSTM to evaluate the DOP distribution histogram for each probe, according to the corresponding SNR_th distribution reported in Table 3a,b, together with the previously assigned distributions of l (mean value: 30 px, SD: 1 px) and s (mean value: 0 px, SD: 2 px). A similar procedure was applied for the TTM where, in addition to the l and s distributions, the previously retrieved distributions of the GSMF coefficients were used for DOP assessment. The lower number of iterations in the first series was due to the high computational cost of the non-linear curve fitting.
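The structure of such a simulation can be sketched in Python under stated assumptions. The measurement function, the integer ranges for the ROI width and shift, and the conversion from the 95% percentile interval to a standard deviation (half-width divided by 1.96, i.e., a normal-equivalent SD) are all hypothetical; the actual distributions are those of Table 3a,b, which is not reproduced here.

```python
import random

def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in 0..100) of a sorted list."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def monte_carlo_interval(measure, width_range, shift_range,
                         cycles=10_000, seed=0):
    """Propagate uniform ROI width/shift draws through `measure`."""
    rng = random.Random(seed)  # fixed seed for reproducible runs
    outcomes = sorted(
        measure(rng.randint(*width_range), rng.randint(*shift_range))
        for _ in range(cycles)
    )
    low, high = percentile(outcomes, 2.5), percentile(outcomes, 97.5)
    # Assumption: normal-equivalent SD from the 95 % interval half-width.
    return low, high, (high - low) / (2 * 1.96)

def toy_measure(width_px, shift_px):
    """Hypothetical stand-in for a DOP or threshold evaluation (cm)."""
    return 5.0 + 0.01 * (width_px - 30) + 0.02 * shift_px

low, high, sd = monte_carlo_interval(toy_measure, (28, 32), (-3, 3))
```

In a real run, `toy_measure` would be replaced by the full image-analysis pipeline evaluated for the drawn ROI width and shift, which is where the computational cost of the non-linear fit would enter.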

Results
DOP outcomes for the A, B, C, and D diagnostic systems, equipped with the three US probes and tested in two configuration settings, are reported in Table 4 and shown in Figures 1 and 2. Both the AdSTM and the TTM repeatability uncertainties were evaluated through the 2.5 and 97.5 percentiles of the data distributions from the second series of the aforementioned MCSs. Then, the total uncertainties reported in Table 4 were obtained by combining these repeatability uncertainties with the probe-dependent uncertainty due to the influence of probe placement on the phantom scanning area, according to [28]. The one-way ANOVA statistical test (p < 0.001) was used to test intra- and inter-observer variability in the naked eye test. Globally, considering the same probe model, independently of the US scanner (except for the D scanner), the measurement outcomes of the AdSTM, the TTM, and the observers' judgment are compatible (Table 4). Nevertheless, the fact that DOP values provided by the AdSTM were generally higher than the TTM ones seems to suggest that the former may be the more sensitive. This may have been due to the different reference profiles used in the two algorithms, SNR and gray level, respectively. This seems to be confirmed by the outcomes retrieved for the D scanner: for the latter, the TTM provided results incompatible with the mean observers' judgment, whose DOP values, in turn, almost agreed with the AdSTM outcomes.

Table 4 notes: L = linear array probe; C = convex array probe; V = vector/phased array probe. Results are reported in terms of mean ± SD (µ ± σ). The percentage error is expressed with respect to the field of view (% FOV = 100 (σ/FOV)), whose values are reported in Table 2. ¹ Missing outcome.
Furthermore, the only missing outcome reported in Table 4 was due to the TTM being unable to retrieve a DOP value for the convex probe in the set 1 configuration, despite the highest STC setting. A possible cause may be found in the intrinsically low dynamic range of the D scanner; this limit is particularly evident in the raw scanner working conditions (set 1), resulting in much darker US images, as shown in Figure 2g-i.
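The percentage error reported with the results, % FOV = 100 (σ/FOV), and the combination of repeatability and probe placement uncertainties can be illustrated with a short Python sketch. The percentage formula is the one given in the Table 4 notes; the root-sum-square combination is an assumption of this sketch (it presumes independent contributions), since the text only states that the combination follows [28]. Function names and numerical values are illustrative.

```python
from math import hypot

def total_uncertainty(sigma_repeatability, sigma_placement):
    """Root-sum-square combination (assumed; independent contributions)."""
    return hypot(sigma_repeatability, sigma_placement)

def percent_of_fov(sigma, fov):
    """Percentage error relative to the field of view: 100 * sigma / FOV."""
    return 100.0 * sigma / fov

# Illustrative values in millimetres; they are not taken from Table 4.
sigma_total = total_uncertainty(3.0, 4.0)
error_pct = percent_of_fov(sigma_total, fov=200.0)
```

Expressing σ relative to the field of view makes the uncertainties of probes with very different imaging depths directly comparable.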

Discussion
Focusing on the linear array probe for the first three scanners, the outcomes of both automatic methods were mostly compatible with the mean observers' judgment. Moreover, there was no significant discrepancy between DOP values in the two configuration settings. On the other hand, the results retrieved for the convex array probe showed a slight disagreement between the two automatic methods, despite the agreement between the TTM and the scores provided by the observers. Nevertheless, mean DOP values assessed with both automatic methods were usually higher in the set 1 configuration than the corresponding ones in set 2 for all the US scanners. This may suggest that, in the default preset condition, the scanner image pre- and post-processing improves the image quality to the detriment of the maximum penetration depth. For the vector array probe, both automatic methods were generally compatible with the visually assessed DOP values (excluding the D scanner results). Lastly, independently of the scanner considered, the linear array was the US probe that most frequently showed compatibility among the results obtained with the three different protocols. Furthermore, such compatibility occurred mainly for the C scanner.
As regards the percentage error for the AdSTM and the TTM, for the A and B scanners, the highest and the lowest percentage errors were observed for the linear array (4.3%) and for the convex array (1.1%) probes, respectively. On the other hand, for the C and D scanners, considering the AdSTM only, the highest (5.0% and 3.1%) and the lowest (1.1% and 1.0%) percentage errors were always observed for the vector array probe, whereas, considering the TTM for the C scanner only, the highest and the lowest percentage errors were observed for the convex array (3.3%) and for the vector array (1.1%) probes, respectively. Furthermore, considering the A, B, and C scanners, the data confirm the equivalence between the two automatic methods in terms of DOP uncertainty. In fact, the mean percentage error was around 2.3% for the AdSTM and around 2.6% for the TTM. Conversely, the mean percentage error for the visual assessment was around 4.5%, confirming the higher dispersion of the naked eye method in DOP assessment with respect to the automatic methods. Nevertheless, it should be noted that the lowest absolute uncertainty of the observers was obtained for the linear array probe, which was likely due to the differences in probe technology and scanning format between the linear and vector/convex array models.
Finally, the performances of the four US systems for the AdSTM, the TTM, and the naked eye test were compared on the basis of the highest and lowest DOP mean values. For the linear array probe models (Figure 3a), the best performance was provided by the C scanner (set 2) for all the methods, while the worst performance was given by the A scanner (set 1) for the AdSTM and by the D scanner (set 2) for the TTM and the naked eye method. On the other hand, among the convex array probe models (Figure 3b), the highest DOP mean value was achieved by the A scanner (set 1) when applying the automatic methods, whereas according to the observers' judgment it was obtained by the C scanner (set 1). The lowest DOP mean value was found for the B scanner (set 2) with the AdSTM and the naked eye method, and for the D scanner (set 2) with the TTM. Moreover, for the vector array probe models (Figure 3c), the AdSTM and the naked eye method agreed in attributing the best performance to the D scanner (set 2), while for the TTM the C scanner (set 2) provided the highest DOP mean value. Conversely, the worst performance was provided by the A scanner for the AdSTM and the naked eye method (sets 1 and 2, respectively), and by the D scanner (set 1) for the TTM. In general, considering the same probe model, no significant differences in the results for the four systems were detected when changing the method applied.
In particular, the highest and the lowest DOP mean values were found for the same US system, except for cases related to the D scanner, mainly because the intrinsically low brightness level of the D scanner prevented the TTM from providing consistent results. However, even if the results are very promising, some limitations should be taken into account for further developments: the influence of US system settings deserves an in-depth investigation and, in addition, more tests on different US phantoms should be carried out.

Conclusions
The adaptive SNR threshold method (AdSTM), developed in a previous preliminary study, was tested on a sample of medical US scanners in two different configuration settings. In particular, data were acquired on the same tissue-mimicking phantom from four US diagnostic systems, each equipped with three different US probes working at their own central frequency. The AdSTM outcomes were compared with the results from (a) another automatic method taken from the current state of the art (tangent threshold method, TTM), developed by the same authors, and (b) the mean judgment of eight different observers without medical expertise. Each observer independently performed the test through in-house MATLAB software while preserving the same acquisition conditions. Both AdSTM and TTM repeatability uncertainties were estimated through MCS and subsequently combined with the probe placement uncertainty.
The outcomes usually showed measurement compatibility among both the automatic methods and the observers' judgment. However, it should be noted that the compatibility of the naked eye test scores was conditioned by the large observer sample and by the high number of test repetitions. In QA programs where DOP assessment is still visually carried out by a single operator, such compatibility can no longer be guaranteed, whereas the application of the AdSTM and the TTM assures reproducibility and traceability, because the algorithm threshold does not depend on the subjectivity of the technician. Moreover, the AdSTM seems to have a higher sensitivity than the TTM, as suggested by its higher DOP values. Both methods have the advantage of simplicity in electronic noise estimation because, in the first case, the noise can be directly retrieved from the in-air clip, while in the second, the problem is bypassed through the GSMF evaluation. Nevertheless, for a US scanner with a limited gray level dynamic range, the TTM seemed to be less reliable. Moreover, the AdSTM required lower computing performance and lower acquisition and processing times, excluding the probe positioning and data transfer procedures, which are the same for both methods. With regard to the acquisition time, the TTM needed the acquisition of at least 60 images, obtained by varying the STC configuration and the overall gain for the GSMF evaluation, whereas the AdSTM required the acquisition of only two clips of 2 s each, with a US scanner frame rate in the range of 20-30 Hz. As a final remark, it should be pointed out that the AdSTM application usually took half the time of the TTM. On the basis of the above results and their limitations, further developments of the present study could include the application of the AdSTM to DOP measurements (a) on a wide number of US diagnostic systems available on the market, (b) on different US phantom models, and (c) with different US system settings.
Such follow-up studies will allow for the selection of the best US system settings for DOP measurement and the improvement of AdSTM robustness and will contribute to providing a reference measurement standard in ultrasound DOP measurements.