Author Contributions
Conceptualization, C.-Y.C. and L.-K.M.; methodology, C.-Y.C. and L.-K.M.; software, L.-K.M.; validation, C.-Y.C. and L.-K.M.; formal analysis, L.-K.M.; investigation, C.-Y.C., M.-H.H., C.-H.L. and L.-K.M.; resources, C.-Y.C., M.-H.H., C.-H.L. and Y.-N.S.; data curation, C.-Y.C., M.-H.H., C.-H.L. and L.-K.M.; writing—original draft preparation, L.-K.M.; writing—review and editing, C.-Y.C., M.-H.H., C.-H.L., Y.-N.S. and L.-K.M.; visualization, L.-K.M.; supervision, C.-Y.C., Y.-N.S. All authors have read and agreed to the published version of the manuscript.
Figure 1.
An overview of the proposed architecture for detecting malposition. It consists of a ResNet-based backbone, coarse-to-fine attention (CTFA), an FPN-based neck, an FCOS-based detection head, and a segmentation head. The legend at the bottom explains the operations shown above it.
Figure 2.
An illustration of coarse-to-fine attention (CTFA). CTFA consists of a global-modelling attention (GA) and a scale attention (SA). GA aims to capture long-range relationships, and SA aims to reweight the features using local relationships.
Figure 3.
An illustration of global-modelling attention (GA). GA generates long-range relationships through two branches. The upper branch captures long-range context information and the lower branch captures local context information. This information is then integrated by a series of operations.
Figure 4.
An illustration of scale attention (SA). SA addresses the defects of the convolutional block attention module (CBAM) by using adaptive channel pooling and a squeeze-and-excitation (SE) block.
Figure 5.
An illustration of the post-process algorithm.
Figure 6.
Ground Truth. (a) Original ground truth. (b) Pre-processed ground truth.
Figure 7.
Ensuring that at most one ETT tip/Carina remains. (a) Without post-process. (b) With post-process.
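The "at most one ETT tip/Carina" step in Figure 7 can be sketched as a per-class filter. The caption does not state the selection criterion, so keeping the highest-confidence detection is an assumption here, and the `Detection` type and the label strings are hypothetical names used only for illustration.

```python
from typing import Dict, List, NamedTuple


class Detection(NamedTuple):
    label: str    # hypothetical class name, e.g. "ett_tip" or "carina"
    score: float  # detector confidence (assumed to be available)
    x: float      # feature-point coordinates in pixels
    y: float


def keep_best_per_class(detections: List[Detection]) -> Dict[str, Detection]:
    """Return at most one detection per label: the one with the highest score.

    Assumption: confidence is the tie-breaking criterion. The source only
    states that at most one ETT tip/Carina is left after post-processing.
    """
    best: Dict[str, Detection] = {}
    for det in detections:
        current = best.get(det.label)
        if current is None or det.score > current.score:
            best[det.label] = det
    return best


if __name__ == "__main__":
    candidates = [
        Detection("ett_tip", 0.6, 10.0, 20.0),
        Detection("ett_tip", 0.9, 12.0, 22.0),
        Detection("carina", 0.8, 30.0, 40.0),
    ]
    for label, det in keep_best_per_class(candidates).items():
        print(label, det.score)
```

A class with no candidates is simply absent from the result, which matches the "at most one" wording.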
Figure 8.
Refining the feature point of the ETT tip/Carina using the bbox of the ETT/Bifurcation. (a) Without post-process. (b) With post-process.
Table 1.
The performance in ETT–Carina distance error.
| Test Folder | Acc. (%) | Mean (mm) | Std. (mm) |
|---|---|---|---|
| Folder 1 | 90.37 | 5.130 | 5.609 |
| Folder 2 | 87.70 | 5.969 | 8.325 |
| Folder 3 | 88.24 | 5.256 | 5.491 |
| Folder 4 | 86.63 | 5.437 | 6.663 |
| Folder 5 | 91.18 | 4.874 | 5.111 |
| Average | 88.82 | 5.333 | 6.240 |
| External val. | 90.67 | 5.015 | 5.147 |
Table 2.
The distance error distribution in ETT–Carina.
| Test Folder | ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
|---|---|---|---|---|
| Folder 1 | 62.57 | 85.29 | 93.58 | 96.79 |
| Folder 2 | 63.10 | 84.22 | 92.25 | 95.72 |
| Folder 3 | 63.90 | 83.96 | 92.25 | 95.45 |
| Folder 4 | 63.90 | 87.17 | 92.78 | 97.06 |
| Folder 5 | 64.44 | 88.50 | 93.85 | 97.06 |
| Average | 63.58 | 85.83 | 92.94 | 96.42 |
| External val. | 66.00 | 84.00 | 92.67 | 97.33 |
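A cumulative error distribution such as the one in Table 2 can be computed as the share of cases whose error is at or below each threshold. The sketch below assumes every case has a measured error value; how undetected cases enter the denominator is not stated here, and the sample errors are made up for illustration.

```python
from typing import Dict, Sequence


def error_distribution(
    errors_mm: Sequence[float],
    thresholds_mm: Sequence[float] = (5, 10, 15, 20),
) -> Dict[float, float]:
    """Percentage of cases with error <= each threshold (cumulative)."""
    if len(errors_mm) == 0:
        raise ValueError("errors_mm must not be empty")
    n = len(errors_mm)
    return {
        t: 100.0 * sum(1 for e in errors_mm if e <= t) / n
        for t in thresholds_mm
    }


if __name__ == "__main__":
    # Hypothetical errors in millimetres, not taken from the study.
    sample = [1.0, 4.0, 6.0, 12.0, 18.0, 25.0]
    for threshold, pct in error_distribution(sample).items():
        print(f"<= {threshold} mm: {pct:.2f}%")
```

Because the bins are cumulative, each column can only be greater than or equal to the one before it, as in every row of Table 2.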
Table 3.
The confusion matrix of diagnosis.
| Predict (rows) / GT (columns) | Suitable | Unsuitable |
|---|---|---|
| Suitable | 1350 | 126 |
| Unsuitable | 66 | 311 |
| Undetection | 12 | 5 |
Table 4.
The confusion matrix of diagnosis (external val.).
| Predict (rows) / GT (columns) | Suitable | Unsuitable |
|---|---|---|
| Suitable | 110 | 8 |
| Unsuitable | 5 | 26 |
| Undetection | 1 | 0 |
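As a consistency check, the malposition accuracy can be recomputed from the confusion matrices in Tables 3 and 4. The sketch assumes that accuracy is the number of correct diagnoses divided by all cases, with undetected cases counted as errors; under that assumption the counts give 88.82% and 90.67%, the average and external-validation accuracies listed in Table 1.

```python
from typing import Dict

ConfusionMatrix = Dict[str, Dict[str, int]]  # predicted label -> GT label -> count


def diagnosis_accuracy(matrix: ConfusionMatrix) -> float:
    """Accuracy in percent; undetected cases count as wrong (assumption)."""
    correct = matrix["Suitable"]["Suitable"] + matrix["Unsuitable"]["Unsuitable"]
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


# Counts copied from Table 3 (cross-validation) and Table 4 (external val.).
TABLE_3: ConfusionMatrix = {
    "Suitable": {"Suitable": 1350, "Unsuitable": 126},
    "Unsuitable": {"Suitable": 66, "Unsuitable": 311},
    "Undetection": {"Suitable": 12, "Unsuitable": 5},
}
TABLE_4: ConfusionMatrix = {
    "Suitable": {"Suitable": 110, "Unsuitable": 8},
    "Unsuitable": {"Suitable": 5, "Unsuitable": 26},
    "Undetection": {"Suitable": 1, "Unsuitable": 0},
}

if __name__ == "__main__":
    print(round(diagnosis_accuracy(TABLE_3), 2))  # 1661 / 1870 cases
    print(round(diagnosis_accuracy(TABLE_4), 2))  # 136 / 150 cases
```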
Table 5.
The performance in recall and precision.
| Test Folder | ETT Tip Recall (%) | ETT Tip Precision (%) | Carina Recall (%) | Carina Precision (%) |
|---|---|---|---|---|
| Folder 1 | 90.64 | 91.37 | 94.65 | 94.91 |
| Folder 2 | 89.30 | 89.54 | 93.58 | 93.58 |
| Folder 3 | 90.91 | 92.14 | 92.25 | 92.49 |
| Folder 4 | 91.18 | 91.42 | 94.92 | 95.17 |
| Folder 5 | 92.78 | 93.53 | 94.12 | 94.37 |
| Average | 90.96 | 91.60 | 93.90 | 94.10 |
| External val. | 92.67 | 93.29 | 88.00 | 88.59 |
Table 6.
The performance in object error.
| Test Folder | ETT Tip Object Error Mean (mm) | ETT Tip Object Error Std. (mm) | Carina Object Error Mean (mm) | Carina Object Error Std. (mm) |
|---|---|---|---|---|
| Folder 1 | 4.415 | 5.281 | 3.952 | 3.345 |
| Folder 2 | 4.858 | 7.869 | 4.236 | 3.663 |
| Folder 3 | 3.974 | 4.405 | 4.322 | 3.947 |
| Folder 4 | 4.584 | 6.273 | 3.895 | 3.527 |
| Folder 5 | 3.690 | 3.800 | 4.185 | 3.793 |
| Average | 4.304 | 5.526 | 4.118 | 3.655 |
| External val. | 3.733 | 4.613 | 4.688 | 4.043 |
Table 7.
The object error distribution in ETT tip.
| Test Folder | ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
|---|---|---|---|---|
| Folder 1 | 75.94 | 90.64 | 94.39 | 97.06 |
| Folder 2 | 75.40 | 89.30 | 94.65 | 96.79 |
| Folder 3 | 78.61 | 90.91 | 95.19 | 97.06 |
| Folder 4 | 73.26 | 91.18 | 94.92 | 97.86 |
| Folder 5 | 81.02 | 92.78 | 97.06 | 97.59 |
| Average | 76.85 | 90.96 | 95.24 | 97.27 |
| External val. | 83.33 | 92.67 | 94.67 | 96.67 |
Table 8.
The object error distribution in Carina.
| Test Folder | ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
|---|---|---|---|---|
| Folder 1 | 74.60 | 94.65 | 98.13 | 99.20 |
| Folder 2 | 74.06 | 93.58 | 97.59 | 99.20 |
| Folder 3 | 73.53 | 92.25 | 96.52 | 98.40 |
| Folder 4 | 78.34 | 94.92 | 97.86 | 98.93 |
| Folder 5 | 74.06 | 94.12 | 98.13 | 98.40 |
| Average | 74.92 | 93.90 | 97.65 | 98.83 |
| External val. | 68.67 | 88.00 | 96.67 | 98.00 |
Table 9.
The comparison results of accuracy and ETT–Carina distance error.
| Method | Malposition Accuracy (%) | ETT–Carina Distance Error Mean (mm) | ETT–Carina Distance Error Std. (mm) |
|---|---|---|---|
| SOTA average [11] | 88.11 | 5.543 | 6.310 |
| Ours average | 88.82 (+0.81%) | 5.333 (−3.79%) | 6.240 (−1.11%) |
| SOTA external val. [11] | 87.33 | 5.668 | 6.651 |
| Ours external val. | 90.67 (+3.82%) | 5.015 (−11.52%) | 5.147 (−22.61%) |
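The parenthesised percentages in Table 9 are consistent with a relative change against the SOTA value, (ours − SOTA) / SOTA × 100. This reading is inferred from the numbers rather than stated in the table, so the sketch below is a check under that assumption; `relative_change` is a hypothetical helper name.

```python
def relative_change(ours: float, baseline: float) -> float:
    """Relative change of `ours` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (ours - baseline) / baseline


if __name__ == "__main__":
    # (ours, SOTA) pairs copied from Table 9.
    rows = {
        "accuracy, average": (88.82, 88.11),
        "mean error, average": (5.333, 5.543),
        "error std., average": (6.240, 6.310),
        "accuracy, external val.": (90.67, 87.33),
        "mean error, external val.": (5.015, 5.668),
        "error std., external val.": (5.147, 6.651),
    }
    for name, (ours, sota) in rows.items():
        print(f"{name}: {relative_change(ours, sota):+.2f}%")
```

For accuracy a positive change is an improvement, whereas for the mean and standard deviation of the distance error a negative change is the improvement.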
Table 10.
The comparison results of error distribution on the ETT–Carina distance.
| Method | ETT–Carina Distance Error ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
|---|---|---|---|---|
| SOTA average [11] | 60.37 | 84.20 | 92.78 | 95.39 |
| Ours average | 63.58 (+5.32%) | 85.83 (+1.94%) | 92.94 (+0.17%) | 96.42 (+1.08%) |
| SOTA external val. [11] | 64.00 | 82.00 | 90.67 | 94.67 |
| Ours external val. | 66.00 (+3.13%) | 84.00 (+2.44%) | 92.67 (+2.21%) | 97.33 (+2.81%) |
Table 11.
The comparison results of recall, precision, and object error on the ETT tip.
| Method | ETT Tip Recall (%) | ETT Tip Precision (%) | ETT Tip Object Error Mean (mm) | ETT Tip Object Error Std. (mm) |
|---|---|---|---|---|
| SOTA average [11] | 93.31 | 93.49 | 4.122 | 4.402 |
| Ours average | 90.96 (−2.52%) | 91.60 (−2.02%) | 4.304 (+4.42%) | 5.526 (+25.53%) |
| SOTA external val. [11] | 90.27 | 90.27 | 4.286 | 5.943 |
| Ours external val. | 92.67 (+2.66%) | 93.29 (+3.35%) | 3.733 (−12.90%) | 4.613 (−22.38%) |
Table 12.
The comparison results of error distribution on the ETT tip.
| Method | ETT Tip Object Error ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
|---|---|---|---|---|
| SOTA average [11] | 75.08 | 93.31 | 96.36 | 98.21 |
| Ours average | 76.85 (+2.36%) | 90.96 (−2.52%) | 95.24 (−1.16%) | 97.29 (−0.94%) |
| SOTA external val. [11] | 79.33 | 90.27 | 95.33 | 96.97 |
| Ours external val. | 83.33 (+5.04%) | 92.67 (+2.66%) | 94.67 (−0.69%) | 96.67 (−0.31%) |
Table 13.
The comparison results of recall, precision, and object error on the Carina.
| Method | Carina Recall (%) | Carina Precision (%) | Carina Object Error Mean (mm) | Carina Object Error Std. (mm) |
|---|---|---|---|---|
| SOTA average [11] | 94.70 | 95.23 | 4.775 | 5.342 |
| Ours average | 93.90 (−0.84%) | 94.10 (−1.19%) | 4.118 (−13.76%) | 3.655 (−31.58%) |
| SOTA external val. [11] | 91.64 | 91.96 | 4.567 | 4.513 |
| Ours external val. | 88.00 (−3.97%) | 88.59 (−3.66%) | 4.688 (+2.65%) | 4.043 (−10.41%) |
Table 14.
The comparison results of error distribution on the Carina.
| Method | Carina Object Error ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
|---|---|---|---|---|
| SOTA average [11] | 68.84 | 94.70 | 95.55 | 97.12 |
| Ours average | 74.92 (+8.83%) | 93.90 (−0.84%) | 97.65 (+2.20%) | 98.83 (+1.76%) |
| SOTA external val. [11] | 73.33 | 91.64 | 95.33 | 96.54 |
| Ours external val. | 68.67 (−6.35%) | 88.00 (−3.97%) | 96.67 (+1.41%) | 98.00 (+1.51%) |
Table 15.
The effect of softmax in GA.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| w/softmax | 90.11 | 5.209 | 6.628 | 3.968 | 5.800 | 4.203 | 4.097 |
| w/o softmax | 91.18 | 4.911 | 5.114 | 3.689 | 3.802 | 4.238 | 3.862 |
Table 16.
The effect of channel, kernel and SE block of SA.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| SA (c1 + k7) | 83.69 | 4.904 | 4.813 | 3.998 | 3.625 | 4.386 | 3.750 |
| SA (c1 + k1) | 85.83 | 5.648 | 7.628 | 4.911 | 8.605 | 4.185 | 3.674 |
| SA (c4 + k1) | 85.83 | 5.182 | 6.245 | 4.188 | 4.067 | 4.611 | 5.759 |
| SA (c8 + k1) | 87.70 | 5.067 | 5.248 | 4.273 | 4.418 | 4.305 | 4.016 |
| SA (c8 + k7) | 83.69 | 4.644 | 4.401 | 4.007 | 3.615 | 4.028 | 3.372 |
| SA (c16 + k1) | 85.56 | 4.883 | 4.778 | 3.985 | 3.696 | 4.351 | 3.969 |
| SA (w/o SE) | 86.36 | 5.491 | 9.697 | 4.619 | 11.997 | 4.391 | 3.956 |
Table 17.
The comparison results of attention modules.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| FCOS [13] | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
| FCOS + SE [36] | 85.03 | 5.424 | 5.854 | 4.284 | 4.156 | 4.543 | 4.943 |
| FCOS + CSPnonlocal [45] | 86.10 | 5.404 | 5.817 | 3.980 | 3.708 | 4.332 | 4.416 |
| FCOS + nonlocal [33] | 86.10 | 5.422 | 6.139 | 4.521 | 10.059 | 4.411 | 4.423 |
| FCOS + CBAM [15] | 86.10 | 5.303 | 5.654 | 4.381 | 4.870 | 4.380 | 4.260 |
| FCOS + CCAM [14] | 86.90 | 4.632 | 4.491 | 4.025 | 3.641 | 4.035 | 3.517 |
| FCOS + SA | 87.70 | 5.067 | 5.248 | 4.273 | 4.418 | 4.305 | 4.016 |
Table 18.
The comparison results of attention modules in parameters and GFLOPs.
| Method | Parameters (M) | GFLOPs |
|---|---|---|
| FCOS [13] | 32.118 | 19.764 |
| FCOS + SE [36] | 32.126 (+0.02%) | 19.764 (+0%) |
| FCOS + CSPnonlocal [45] | 32.284 (+0.52%) | 19.782 (+0.09%) |
| FCOS + nonlocal [33] | 33.302 (+3.69%) | 19.882 (+0.60%) |
| FCOS + CBAM [15] | 32.127 (+0.03%) | 19.764 (+0%) |
| FCOS + CCAM [14] | 34.154 (+6.34%) | 19.964 (+1.01%) |
| FCOS + SA | 32.253 (+0.42%) | 19.778 (+0.07%) |
Table 19.
The results of GA and SA fusion method.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| FCOS | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
| FCOS + SA + GA | 83.96 | 5.225 | 5.306 | 4.304 | 4.376 | 4.425 | 4.032 |
| FCOS + GA \|\| SA | 87.17 | 5.492 | 6.583 | 4.956 | 9.549 | 4.164 | 4.158 |
| FCOS + GA + SA | 87.97 | 4.868 | 4.953 | 4.143 | 4.157 | 4.016 | 3.350 |
Table 20.
The effect of fusing global modelling attention and scale attention.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| FCOS | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
| FCOS + nonlocal*2 | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| FCOS + CSPnonlocal*2 | 85.56 | 5.800 | 8.991 | 4.391 | 5.543 | 4.703 | 7.512 |
| FCOS + CCAM*2 | 86.10 | 4.855 | 4.988 | 4.020 | 4.037 | 4.301 | 3.835 |
| FCOS + SA*2 | 87.17 | 5.643 | 6.820 | 4.432 | 5.021 | 4.422 | 5.387 |
| FCOS + CSPnonlocal + SA | 86.90 | 5.727 | 6.518 | 4.646 | 5.267 | 4.685 | 4.732 |
| FCOS + CCAM + SA | 87.97 | 4.868 | 4.953 | 4.143 | 4.157 | 4.016 | 3.350 |
Table 21.
The results of adding a mask branch to FCOS.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| FCOS [13] | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
| CTFA | 87.97 | 4.868 | 4.953 | 4.143 | 4.157 | 4.016 | 3.350 |
| Seg (All) | 87.97 | 4.909 | 5.179 | 3.939 | 4.468 | 4.043 | 3.121 |
| Seg (ETT) | 89.04 | 5.486 | 7.682 | 4.398 | 6.754 | 4.521 | 4.244 |
| Seg (ETT + Carina) | 90.11 | 5.334 | 6.752 | 4.088 | 5.989 | 4.329 | 4.253 |
| Seg (ETT + Carina) + Fusion | 91.18 | 4.911 | 5.114 | 3.689 | 3.802 | 4.238 | 3.862 |
Table 22.
The results of the post-process algorithm with and without mask prediction.
| Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
|---|---|---|---|---|---|---|---|
| w/mask | 85.56 | 7.438 | 10.552 | 6.389 | 10.672 | 4.329 | 4.253 |
| w/o mask | 90.11 | 5.334 | 6.752 | 4.088 | 5.989 | 4.329 | 4.253 |
Table 23.
The visualization results.