# GIT: A Transformer-Based Deep Learning Model for Geoacoustic Inversion


## Abstract


## 1. Introduction

## 2. Problem Formulation

## 3. Methodology

#### 3.1. Data Preprocessing for Tokenization

#### 3.2. Frequency and Sensor 2-D Positional Embedding

#### 3.3. Transformer Model Based on MHSA

A learnable `[CLS]` token is prepended to the feature tokens, similar to vision transformers [19]. The first encoder receives feature tokens ${\mathbf{X}}_{0}$ of size ($F\times L+1$, $dim$). The MHSA layer in the $n$-th transformer encoder then generates three attention matrices for each head $h$, a query matrix ${\mathbf{Q}}_{h}$, a key matrix ${\mathbf{K}}_{h}$, and the corresponding value matrix ${\mathbf{V}}_{h}$, to extract features from ${\mathbf{X}}_{n-1}$.
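The per-head query/key/value computation can be sketched in NumPy as below. This is a generic illustration of scaled dot-product MHSA, not the authors' exact PyTorch implementation; the names (`mhsa`, `Wq`, `Wk`, `Wv`) and the toy sizes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(X, Wq, Wk, Wv, H):
    """Multi-head self-attention over tokens X of shape (T, dim).

    Wq, Wk, Wv: (dim, dim) projection matrices; H: number of heads.
    Each head h uses its own (T, dim/H) slices Q_h, K_h, V_h.
    """
    T, dim = X.shape
    d = dim // H                        # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv    # (T, dim) each
    out = np.empty_like(Q)
    for h in range(H):
        s = slice(h * d, (h + 1) * d)
        # (T, T) attention weights for head h, scaled by sqrt(d)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(d))
        out[:, s] = A @ V[:, s]
    return out

# toy usage: F*L + 1 = 7 tokens (incl. [CLS]), dim = 8, H = 2 heads
rng = np.random.default_rng(0)
X = rng.standard_normal((7, 8))
W = [rng.standard_normal((8, 8)) for _ in range(3)]
Y = mhsa(X, *W, H=2)
print(Y.shape)  # (7, 8): same token count and embedding dimension as the input
```

The output keeps the input shape, so encoders can be stacked: ${\mathbf{X}}_{n}$ feeds the next MHSA layer directly.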

#### 3.4. Multi-Task Learning with Transformer

## 4. Experiments and Analysis

#### 4.1. Numerical Simulations

#### 4.2. Inversion Procedure

1. The proposed GIT model is trained on ${10}^{5}$ replicas, with each parameter sampled from a uniform distribution over its interval, as shown in Table 1. Given the large data size, the inversion parameters of the training data are believed to cover the whole parameter interval.
2. Multi-task learning is used to optimize the neural network for multiple objective losses, along the direction of minimizing the MSE loss function for each subtask $i$, which is expressed as:
$${L}_{i}=\frac{1}{N}\sum_{n=1}^{N}{\left({y}_{n}-{\hat{y}}_{n}\right)}^{2}$$
where ${y}_{n}$ and ${\hat{y}}_{n}$ are the true and predicted values of the $i$-th parameter for the $n$-th sample.
3. To improve robustness, Gaussian white noise is added to the training data with signal-to-noise ratios (SNRs) drawn randomly from 10 to 50 dB, which can be formulated as:
$$\mathrm{SNR}=10{\log}_{10}\frac{{\sum}_{l=1}^{L}{\Vert {\mathbf{p}}_{l}\Vert}_{2}^{2}}{FL{\sigma}^{2}}$$
where ${\mathbf{p}}_{l}$ is the pressure vector at the $l$-th sensor, $F$ is the number of frequencies, $L$ is the number of sensors, and ${\sigma}^{2}$ is the noise variance.
4. The testing data consist of ${10}^{4}$ samples (SNR = 20 dB), also generated using the parameter intervals in Table 1.
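The noise-injection step above can be sketched as follows. This is a minimal NumPy illustration of the stated SNR definition, not the authors' code; the function name `add_noise` and the complex-pressure array shapes are assumptions.

```python
import numpy as np

def add_noise(P, snr_db, rng=None):
    """Add complex white Gaussian noise to pressure data P of shape (F, L)
    at a target SNR in dB: SNR = 10*log10( sum_l ||p_l||_2^2 / (F*L*sigma^2) ).
    """
    rng = rng or np.random.default_rng()
    F, L = P.shape
    signal_power = np.sum(np.abs(P) ** 2)                 # sum_l ||p_l||_2^2
    sigma2 = signal_power / (F * L * 10 ** (snr_db / 10)) # solve SNR eq. for sigma^2
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(P.shape)
                                   + 1j * rng.standard_normal(P.shape))
    return P + noise

# training-style usage: random SNR drawn uniformly in [10, 50] dB per sample
rng = np.random.default_rng(1)
P = rng.standard_normal((64, 21)) + 1j * rng.standard_normal((64, 21))
noisy = add_noise(P, snr_db=rng.uniform(10, 50), rng=rng)
```

Solving the SNR equation for $\sigma^2$ and splitting the variance equally between the real and imaginary parts gives complex noise whose total per-sample power is $\sigma^2$, matching the definition.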

#### 4.3. Experimental Setup

#### 4.4. Experimental Results

#### 4.4.1. Performance Comparison with CNNs

#### 4.4.2. Performance Impact on the FS Positional Embedding

#### 4.5. Parameter Sensitivity

#### 4.6. Model Complexity

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Bonnel, J.; Pecknold, S.P.; Hines, P.C.; Chapman, N.R. An Experimental Benchmark for Geoacoustic Inversion Methods. IEEE J. Ocean. Eng. **2021**, 46, 261–282.
2. Xue, Y.; Lei, F.; Zhu, H.; Xiao, R.; Chen, C.; Cui, Z. An Inversion Method for Geoacoustic Parameters of Multilayer Seabed in Shallow Water. J. Phys. Conf. Ser. **2021**, 1739, 012019.
3. Dumaz, L.; Garnier, J.; Lepoultier, G. Acoustic and geoacoustic inverse problems in randomly perturbed shallow-water environments. J. Acoust. Soc. Am. **2019**, 146, 458–469.
4. Liu, H.; Yang, K.; Ma, Y.; Yang, Q.; Huang, C. Synchrosqueezing transform for geoacoustic inversion with air-gun source in the East China Sea. Appl. Acoust. **2020**, 169, 107460.
5. Lu, L.; Ren, Q.; Ma, L. Geoacoustic inversion base on modeling ocean-bottom reflection wave. J. Acoust. Soc. Am. **2016**, 140, 3067.
6. Wang, P.; Song, W. Matched-field geoacoustic inversion using propagation invariant in a range-dependent waveguide. J. Acoust. Soc. Am. **2020**, 147, EL491–EL497.
7. Dahl, P.H.; Dall’Osto, D.R. Vector Acoustic Analysis of Time-Separated Modal Arrivals From Explosive Sound Sources during the 2017 Seabed Characterization Experiment. IEEE J. Ocean. Eng. **2020**, 45, 131–143.
8. Bonnel, J.; Dall’Osto, D.R.; Dahl, P.H. Geoacoustic inversion using vector acoustic modal dispersion. J. Acoust. Soc. Am. **2019**, 146, 2930.
9. Zheng, G.; Zhu, H.; Wang, X.; Khan, S.; Li, N.; Xue, Y. Bayesian Inversion for Geoacoustic Parameters in Shallow Sea. Sensors **2020**, 20, 2150.
10. Yang, H.; Lee, K.; Choo, Y.; Kim, K. Underwater Acoustic Research Trends with Machine Learning: Passive SONAR Applications. J. Ocean Eng. Technol. **2020**, 34, 227–236.
11. Yang, H.; Lee, K.; Choo, Y.; Kim, K. Underwater Acoustic Research Trends with Machine Learning: Ocean Parameter Inversion Applications. J. Ocean Eng. Technol. **2020**, 34, 371–376.
12. Liu, M.; Niu, H.; Li, Z.; Liu, Y.; Zhang, Q. Deep-learning geoacoustic inversion using multi-range vertical array data in shallow water. J. Acoust. Soc. Am. **2022**, 151, 2101–2116.
13. Zhu, X.; Dong, H. Shear Wave Velocity Estimation Based on Deep-Q Network. Appl. Sci. **2022**, 12, 8919.
14. Shen, Y.; Pan, X.; Zheng, Z.; Gerstoft, P. Matched-field geoacoustic inversion based on radial basis function neural network. J. Acoust. Soc. Am. **2020**, 148, 3279–3290.
15. Alfarraj, M.; AlRegib, G. Semi-supervised Learning for Acoustic Impedance Inversion. arXiv **2019**, arXiv:1905.13412.
16. Yang, F.; Ma, J. Deep-learning inversion: A next-generation seismic velocity model building method. Geophysics **2019**, 84, R583–R599.
17. Feng, S.; Zhu, X. A Transformer-Based Deep Learning Network for Underwater Acoustic Target Recognition. IEEE Geosci. Remote Sens. Lett. **2022**, 19, 1505805.
18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
19. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv **2020**, arXiv:2010.11929.
20. Gong, Y.; Chung, Y.A.; Glass, J. AST: Audio spectrogram transformer. arXiv **2021**, arXiv:2104.01778.
21. Niu, H.; Ozanich, E.; Gerstoft, P. Ship localization in Santa Barbara Channel using machine learning classifiers. J. Acoust. Soc. Am. **2017**, 142, EL455–EL460.
22. Niu, H.; Reeves, E.; Gerstoft, P. Source localization in an ocean waveguide using supervised machine learning. J. Acoust. Soc. Am. **2017**, 142, 1176–1188.
23. Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7482–7491.
24. Monteiro, N.M.; Oliveira, T.C. Mesh generation for underwater acoustic modeling with KRAKEN. Adv. Eng. Softw. **2023**, 180, 103455.
25. Murray, J.; Ensberg, D. The Swellex-96 Experiment. 1996. Available online: http://swellex96.ucsd.edu/index.html (accessed on 12 March 2023).
26. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.

**Figure 3.** Simulated signal waveform and spectrum based on the ocean environment shown in Figure 2.

**Table 1.** Intervals of the geoacoustic parameters to be inverted.

| Parameters | Interval |
|---|---|
| ${h}_{s}$ (m) | [30, 40] |
| ${c}_{st}$ (m/s) | [1480, 1800] |
| ${c}_{sb}$ (m/s) | [1480, 1800] |
| ${\rho}_{s}$ (g/cm${}^{3}$) | [1.5, 2.2] |
| ${c}_{mb}$ (m/s) | [1800, 2600] |
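Drawing training replicas uniformly over these intervals, as described in Section 4.2, might look like the sketch below. The dictionary keys and the physical interpretations in the comments are assumptions for illustration, not the authors' code.

```python
import numpy as np

# assumed names/meanings for the five Table 1 parameters
INTERVALS = {
    "h_s":   (30.0, 40.0),      # sediment thickness (m)
    "c_st":  (1480.0, 1800.0),  # sound speed at sediment top (m/s)
    "c_sb":  (1480.0, 1800.0),  # sound speed at sediment bottom (m/s)
    "rho_s": (1.5, 2.2),        # sediment density (g/cm^3)
    "c_mb":  (1800.0, 2600.0),  # basement sound speed (m/s)
}

def sample_replicas(n, rng=None):
    """Draw n parameter sets, each uniform over its Table 1 interval."""
    rng = rng or np.random.default_rng()
    return {k: rng.uniform(lo, hi, size=n) for k, (lo, hi) in INTERVALS.items()}

params = sample_replicas(10**3, np.random.default_rng(0))
```

Each sampled parameter set would then drive one forward simulation to produce a training replica.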

| Item | Values |
|---|---|
| 2 × CPU | Intel Xeon Platinum 8377C |
| 4 × GPU | Nvidia GeForce RTX 3090 |
| Memory | 128 GB |
| Python | 3.9 |
| PyTorch | 1.8.0 |

| Methods | MAE | MAE$_{h_s}$ | MAE$_{c_{st}}$ | MAE$_{c_{sb}}$ | MAE$_{\rho_s}$ | MAE$_{c_{mb}}$ |
|---|---|---|---|---|---|---|
| CRNN | 93.2 | 0.167 | 9.33 | 28.6 | 0.0243 | 55.1 |
| FCN | 64.2 | 0.131 | 4.92 | 22.6 | 0.0175 | 36.6 |
| GIT (proposed) | 56.7 | 0.101 | 4.26 | 22.1 | 0.0128 | 30.2 |

| Position Embedding | MAE | MAE$_{h_s}$ | MAE$_{c_{st}}$ | MAE$_{c_{sb}}$ | MAE$_{\rho_s}$ | MAE$_{c_{mb}}$ |
|---|---|---|---|---|---|---|
| 1-D | 66.8 | 0.129 | 4.67 | 27.1 | 0.0125 | 34.9 |
| FS | 56.7 | 0.101 | 4.26 | 22.1 | 0.0128 | 30.2 |

| Number of Encoders N | MAE | MAE$_{h_s}$ | MAE$_{c_{st}}$ | MAE$_{c_{sb}}$ | MAE$_{\rho_s}$ | MAE$_{c_{mb}}$ |
|---|---|---|---|---|---|---|
| 2 | 67.8 | 0.134 | 4.54 | 27.9 | 0.0141 | 35.3 |
| 3 | 56.7 | 0.101 | 4.26 | 22.1 | 0.0128 | 30.2 |
| 4 | 55.6 | 0.111 | 4.30 | 21.2 | 0.0125 | 30.0 |
| 5 | 52.6 | 0.108 | 3.93 | 20.0 | 0.0120 | 28.5 |
| 6 | 75.2 | 0.157 | 5.15 | 32.5 | 0.0171 | 37.4 |

**Table 6.** Inversion performance varying with the number of attention heads on the objective parameters.

| Number of Heads H | MAE | MAE$_{h_s}$ | MAE$_{c_{st}}$ | MAE$_{c_{sb}}$ | MAE$_{\rho_s}$ | MAE$_{c_{mb}}$ |
|---|---|---|---|---|---|---|
| 3 | 59.2 | 0.109 | 5.17 | 21.7 | 0.0249 | 32.2 |
| 6 | 65.3 | 0.125 | 5.74 | 25.7 | 0.0165 | 33.7 |
| 12 | 56.7 | 0.101 | 4.26 | 22.1 | 0.0128 | 30.2 |
| 24 | 56.3 | 0.108 | 4.28 | 22.0 | 0.0139 | 30.0 |

| Embedding Dimension $dim$ | MAE | MAE$_{h_s}$ | MAE$_{c_{st}}$ | MAE$_{c_{sb}}$ | MAE$_{\rho_s}$ | MAE$_{c_{mb}}$ |
|---|---|---|---|---|---|---|
| 96 | 59.2 | 0.120 | 5.17 | 23.8 | 0.0165 | 36.5 |
| 192 | 67.3 | 0.125 | 5.74 | 25.7 | 0.0215 | 36.3 |
| 384 | 56.7 | 0.101 | 4.26 | 22.1 | 0.0128 | 30.2 |
| 768 | 67.6 | 0.108 | 4.26 | 27.9 | 0.0120 | 35.3 |

| Model | No. Params (M) | Avg. Time (ms) | FPS |
|---|---|---|---|
| CRNN | 0.237 | 1341.944 ± 33.171 | 0.75 |
| FCN | 31.043 | 8.805 ± 0.102 | 113.57 |
| GIT | 4.467 | 1.804 ± 0.246 | 554.46 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Feng, S.; Zhu, X.; Ma, S.; Lan, Q.
GIT: A Transformer-Based Deep Learning Model for Geoacoustic Inversion. *J. Mar. Sci. Eng.* **2023**, *11*, 1108.
https://doi.org/10.3390/jmse11061108
