Article

A Bidirectional Context Embedding Transformer for Automatic Speech Recognition

1 Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
2 Fujian Provincial Universities Engineering Research Center for Intelligent Driving Technology, Fujian University of Technology, Fuzhou 350118, China
3 College of Internet of Things Engineering, Hohai University, Changzhou 213022, China
4 College of Environmental Science and Engineering, North China Electric Power University, Beijing 102206, China
* Author to whom correspondence should be addressed.
Information 2022, 13(2), 69; https://doi.org/10.3390/info13020069
Submission received: 16 December 2021 / Revised: 23 January 2022 / Accepted: 23 January 2022 / Published: 29 January 2022
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)

Abstract

Transformers have become popular for building end-to-end automatic speech recognition (ASR) systems. However, transformer ASR systems are usually trained to produce output sequences in left-to-right order only, disregarding the right-to-left context. Existing transformer-based ASR systems that employ two decoders for bidirectional decoding are costly in terms of computation and optimization, while those that use a single decoder for bidirectional decoding require extra mechanisms (such as a self-mask) to prevent information leakage in the attention mechanism. This paper explores the development of a speech transformer that uses a single decoder equipped with bidirectional context embedding (BCE) for bidirectional decoding. The decoding direction, set at the input level, enables the model to attend to different directional contexts without extra decoders and also alleviates information leakage. The effectiveness of this method was verified with a bidirectional beam search that generates output sequences in both directions and selects the best hypothesis according to the output score. We achieved a word error rate (WER) of 7.65%/18.97% on the clean/other LibriSpeech test sets, outperforming the left-to-right decoding style in our work by 3.17%/3.47%. The results are also close to, or better than, other state-of-the-art end-to-end models.
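The idea of signaling the decoding direction at the input level can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `prepare_targets` and the direction labels `"l2r"`/`"r2l"` are hypothetical. For the right-to-left direction, the target sequence is simply reversed before the usual shifted-right decoder input is formed, so a single decoder with a standard causal mask can attend to either directional context without leaking future tokens.

```python
# Hypothetical sketch of direction-conditioned target preparation for a
# single-decoder bidirectional transformer. Names are illustrative only.

def prepare_targets(tokens, direction):
    """Return (decoder_input, decoder_output) for the given direction."""
    if direction == "r2l":
        seq = list(reversed(tokens))  # right-to-left: reverse the targets
    else:
        seq = list(tokens)            # left-to-right: keep original order
    sos, eos = "<sos>", "<eos>"
    decoder_input = [sos] + seq       # shifted right, as in standard transformers
    decoder_output = seq + [eos]
    return decoder_input, decoder_output

inp, out = prepare_targets(["a", "b", "c"], "r2l")
# inp == ["<sos>", "c", "b", "a"]; out == ["c", "b", "a", "<eos>"]
```

In a full model, a learned direction embedding (the bidirectional context embedding) would additionally be added to the token embeddings so the decoder knows which direction it is generating in; at inference, beam search is run in both directions and the highest-scoring hypothesis is kept.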
Keywords: automatic speech recognition (ASR); speech transformer; bidirectional decoder; bidirectional embedding; end-to-end model; attention; bidirectional beam search

Share and Cite

MDPI and ACS Style

Liao, L.; Afedzie Kwofie, F.; Chen, Z.; Han, G.; Wang, Y.; Lin, Y.; Hu, D. A Bidirectional Context Embedding Transformer for Automatic Speech Recognition. Information 2022, 13, 69. https://doi.org/10.3390/info13020069

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
