Multitask Fine Tuning on Pretrained Language Model for Retrieval-Based Question Answering in Automotive Domain
Abstract
1. Introduction
- We construct Chinese question-and-answer corpora, document corpora, and multitask annotated corpora specifically tailored to the automotive domain.
- We propose a joint learning framework with a pretraining–multitask fine-tuning architecture to incorporate domain knowledge and conduct a comparative analysis of the contributions of various auxiliary task objectives to model performance.
- We create an evaluation dataset through a ChatGPT-based semiautomated approach and introduce the MLWR metric for evaluation.
2. Related Work
2.1. Encoder
2.2. Retriever
3. Our Approach
3.1. Corpus Construction
- MLM Corpus: A subset of sentences is selected from the base corpus, and a certain proportion of words are randomly masked. The MLM corpus is created by recording the original word at each masked position as its label (a minimal construction sketch for the MLM and NSP corpora follows this list).
- NSP Corpus: Each example consists of two sentences. A certain proportion of sentence pairs are randomly selected from the base corpus. Half of the pairs consist of consecutive sentences, labeled as 1, while the other half consist of randomly selected sentence pairs, labeled as 0.
- STS Corpus: The STS corpus is constructed by paraphrasing sentences from the base corpus, with positive and negative examples generated in equal proportions to keep the corpus balanced. Half of the sentences in the document collection are randomly sampled and paraphrased into similar sentences using iFlytek tools; annotators then filter the paraphrases, and those judged similar to the originals are kept as positive examples (labeled 1, similar). Negative examples are generated by random matching: sentences of similar length to the positive examples are randomly sampled and paired (labeled 0, dissimilar). Keeping positive and negative examples comparable in length facilitates model training.
- TE Corpus: The TE corpus consists of 880,000 annotated examples extracted from an existing Chinese textual entailment dataset. Each example contains a premise and a hypothesis and is labeled, according to the inference relationship between them, as entailment (0), neutral (1), or contradiction (2).
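The unsupervised parts of this construction can be illustrated with a short sketch. The snippet below is a minimal illustration of the MLM masking and NSP pairing steps; the 15% masking ratio, the pre-tokenized input, and the helper names are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of MLM and NSP corpus construction as described above.
# Assumptions (not from the paper): pre-tokenized sentences, a 15% masking
# ratio, and these helper names.
import random

MASK = "[MASK]"

def build_mlm_example(tokens, mask_ratio=0.15):
    """Mask a proportion of tokens; keep the original word at each masked position as the label."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_ratio:
            masked.append(MASK)
            labels.append(tok)    # annotate the original word for this position
        else:
            masked.append(tok)
            labels.append(None)   # position not used by the MLM objective
    return masked, labels

def build_nsp_examples(sentences, n_pairs):
    """Half consecutive sentence pairs (label 1), half random pairs (label 0)."""
    examples = []
    for _ in range(n_pairs // 2):
        i = random.randrange(len(sentences) - 1)
        examples.append((sentences[i], sentences[i + 1], 1))  # consecutive pair
        j, k = random.sample(range(len(sentences)), 2)
        examples.append((sentences[j], sentences[k], 0))      # random pair
    return examples
```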
3.2. Encoder: A Pretraining–Multitask Fine-Tuning Framework
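As a rough illustration of the joint-learning idea behind the pretraining–multitask fine-tuning framework, the sketch below wires a shared pretrained encoder to lightweight heads for the NSP, STS, and TE tasks and sums their losses. The model name, mean pooling, head sizes, and unweighted loss sum are assumptions for illustration, and the MLM head is omitted for brevity; this is not the authors' exact implementation.

```python
# Illustrative multitask fine-tuning sketch: one shared encoder, several
# sentence-pair classification heads, summed losses. Not the paper's code.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultitaskEncoder(nn.Module):
    def __init__(self, model_name="hfl/chinese-roberta-wwm-ext", hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # shared encoder
        self.nsp_head = nn.Linear(hidden, 2)   # next-sentence prediction: 0/1
        self.sts_head = nn.Linear(hidden, 2)   # semantic similarity: 0/1
        self.te_head = nn.Linear(hidden, 3)    # textual entailment: 0/1/2

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean pooling over non-padding tokens (the MEAN setting in the result tables).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        return pooled

def multitask_loss(model, batches):
    """batches maps a task name to (input_ids, attention_mask, labels)."""
    heads = {"nsp": model.nsp_head, "sts": model.sts_head, "te": model.te_head}
    loss = 0.0
    for task, (ids, mask, labels) in batches.items():
        logits = heads[task](model(ids, mask))
        loss = loss + nn.functional.cross_entropy(logits, labels)
    return loss
```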
3.3. Retriever Based on Faiss
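As a rough sketch of how Faiss can back the retriever, the snippet below builds an exact inner-product index over L2-normalized document embeddings (i.e., cosine similarity) and returns the top-k documents for a query embedding. The flat index type and the normalization step are assumptions for illustration; the paper's actual Faiss configuration may differ.

```python
# Illustrative Faiss retriever over encoder embeddings (assumed configuration).
import numpy as np
import faiss

def build_index(doc_vectors: np.ndarray) -> faiss.Index:
    vecs = doc_vectors.astype("float32")
    faiss.normalize_L2(vecs)                   # cosine similarity via inner product
    index = faiss.IndexFlatIP(vecs.shape[1])   # exact search; swap for IVF/HNSW at scale
    index.add(vecs)
    return index

def retrieve(index: faiss.Index, query_vec: np.ndarray, k: int = 5):
    q = query_vec.astype("float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)           # top-k most similar documents
    return ids[0], scores[0]
```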
4. Experiments
4.1. Evaluation Dataset
4.2. Evaluation Metrics
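For reference, Hit@k and MRR can be computed from the 1-based rank at which the gold answer is retrieved for each query, as in the sketch below; MLWR is the metric introduced in this paper and is deliberately not reproduced here.

```python
# Standard Hit@k and MRR over per-query gold-answer ranks (None = not retrieved).
def hit_at_k(ranks, k):
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def mrr(ranks):
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Example: gold answers ranked 1st, 3rd, and not retrieved for three queries.
print(hit_at_k([1, 3, None], k=3))  # 0.666...
print(mrr([1, 3, None]))            # (1 + 1/3 + 0) / 3 ≈ 0.444
```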
4.3. Comparison Results
4.4. Case Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
An Example of QA Pairs

{
  “id”: 0,
  “question”: “WEY 是什么品牌的车?”, (Which automobile brand is WEY?)
  “answer”: “WEY是长城汽车旗下的高端品牌…” (WEY is a high-end brand under Great Wall Motors…)
}
Name | Data Format |
---|---|
MLM Corpus | [MASK][MASK]车队首次在1994年的JGTC第四站比[MASK]中亮相,获得了资格赛第二名的位置。 (The [MASK] team made its debut in the fourth round of the JGTC in 1994, securing the second position in the qualifying [MASK].) |
NSP Corpus | Sentence1:上世纪90年代,TRD为丰田TOM’S车队打造了Supra赛车。 (In the 1990s, TRD built Supra race cars for the Toyota TOM’S team.) Sentence2: 丰田车队首次在1994年的JGTC第四站比赛中亮相,获得了资格赛第二名的位置。(The Toyota team made its debut in the fourth round of the JGTC in 1994, securing the second position in the qualifying race.) Label: continuous (1) |
STS Corpus | Sentence1: 丰田车队首次在1994年的JGTC第四站比赛中亮相,获得了资格赛第二名的位置。 (The Toyota team made its debut in the fourth round of the JGTC in 1994, securing the second position in the qualifying race.) Sentence2: 在1994年的JGTC第四站比赛中,丰田车队首次参赛,并在资格赛中获得了第二名的成绩。 (In 1994, during the fourth round of the JGTC, the Toyota team made its debut and secured a second-place finish in the qualifying race.) Label: similar (1) |
TE Corpus | Sentence1: 一个年轻的黑人正试图向另外两个人解释一些事情。 (A young black person is trying to explain something to two other individuals.) Sentence2: 一位年轻的黑人男子正在和另外两位说话。 (A young black man is talking to two other people.) Label: entailment (0) |
Questions

- 未来车门会是什么样? (What will the car doors of the future look like?)
- WEY是什么品牌的车? (Which automobile brand is WEY?)
- 长安沃尔沃和吉利沃尔沃有什么区别? (What is the difference between Changan Volvo and Geely Volvo?)
An Illustration Example

{
  “reference”: “长安cs75plus车型热销背后的几点思考”, (Some thoughts behind the hot sales of the Changan CS75 Plus model.)
  “queries”: [
    “长安cs75plus车型热销背后原因解析”, (Analysis of the reasons behind the high sales of the Changan CS75 Plus model.)
    “购买长安cs75plus车型的几点原因”, (Several reasons for purchasing the Changan CS75 Plus model.)
    “长安cs75plus车型为什么能够热销,列出几点原因”, (Please list a few reasons to explain why the Changan CS75 Plus model sells well.)
    “长安cs75plus车型,热销背后的原因思考” (Reflection on the reasons behind the popularity of the Changan CS75 Plus model.)
  ]
}
| Model | Hit@1 CLS (%) | Hit@1 MEAN (%) | Hit@3 CLS (%) | Hit@3 MEAN (%) | Hit@5 CLS (%) | Hit@5 MEAN (%) | MRR CLS (%) | MRR MEAN (%) | MLWR CLS (%) | MLWR MEAN (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| | 30.00 | 64.50 | 39.50 | 72.25 | 43.25 | 77.25 | 36.57 | 70.21 | 47.37 | 79.97 |
| FT-MLM | 47.25 | 77.00 | 55.50 | 83.50 | 57.00 | 85.75 | 52.19 | 80.94 | 58.96 | 86.76 |
| FT-STS | 13.00 | 60.25 | 18.50 | 71.25 | 22.25 | 74.75 | 17.73 | 66.58 | 26.21 | 76.19 |
| FT-TE | 5.00 | 24.75 | 8.50 | 30.25 | 9.00 | 32.75 | 7.27 | 28.89 | 10.73 | 35.41 |
| FT-NSP | 14.75 | 59.00 | 14.75 | 67.75 | 20.75 | 71.00 | 18.53 | 64.53 | 24.56 | 73.29 |
Model | Hit@1 (%) | Hit@3 (%) | Hit@5 (%) | MRR (%) | MLWR (%) |
---|---|---|---|---|---|
STS+MLM [CLS] | 74.25 | 84.00 | 85.25 | 79.54 | 87.12 |
NSP+MLM [CLS] | 8.25 | 15.00 | 18.75 | 13.08 | 21.65 |
STS+MLM [MEAN] | 77.50 | 84.75 | 87.00 | 81.60 | 87.71 |
NSP+MLM [MEAN] | 73.25 | 78.75 | 80.50 | 76.77 | 82.17 |
| Model | Hit@1 CLS (%) | Hit@1 MEAN (%) | Hit@3 CLS (%) | Hit@3 MEAN (%) | Hit@5 CLS (%) | Hit@5 MEAN (%) | MRR CLS (%) | MRR MEAN (%) | MLWR CLS (%) | MLWR MEAN (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| | 26.50 | 64.50 | 39.50 | 72.25 | 43.25 | 77.25 | 36.57 | 70.21 | 47.37 | 79.97 |
| FT-MLM | 29.25 | 76.75 | 36.50 | 82.00 | 39.50 | 85.25 | 33.88 | 80.42 | 41.56 | 86.19 |
| MLM+STS | 18.50 | 78.00 | 22.20 | 83.50 | 23.50 | 85.75 | 21.13 | 81.67 | 25.16 | 88.42 |
| RoBERTa | 69.00 | 73.00 | 77.75 | 79.50 | 79.25 | 82.75 | 74.16 | 77.41 | 81.59 | 84.61 |
| FT-MLM | 73.25 | 75.75 | 79.25 | 81.75 | 83.25 | 84.00 | 77.63 | 79.79 | 85.31 | 86.81 |
| MLM+STS | 37.50 | 78.75 | 48.25 | 84.50 | 51.75 | 87.75 | 44.46 | 82.50 | 55.16 | 88.79 |
Input Query: “如何对高尔夫7系列车型进行自我检修?” (How to perform self-maintenance on the Volkswagen Golf 7 series?)

| Model | Retrieved answers |
|---|---|
| | 1. 高尔夫Mk1:1974年5月,第一代高尔夫... 2. 高尔夫Mk2:1983年8月,大众推出第二代... (1. Golf Mk1: In May 1974, the first-generation Golf was introduced... 2. Golf Mk2: In August 1983, Volkswagen unveiled the second generation...) |
| FT-MLM | 1. 动力组合对比:高尔夫一共有三种排量... 2. 质保政策对比:高尔夫官方保修周期为3年或者... (1. Powertrain comparison: the Golf is available in three different engine displacements... 2. Warranty policy comparison: the official warranty period for the Golf is 3 years or...) |
| STS+MLM | 1. 更换4.5升5w40美孚全合成机油、博世机滤、博世空气滤芯、博世空调滤芯、博世汽油滤芯、3m燃油添加剂、免拆三元催化清洗剂 2. 动平衡、更换原厂轮毂盖 (1. Replace 4.5 L of 5W-40 Mobil fully synthetic engine oil, the Bosch oil filter, Bosch air filter, Bosch cabin air filter, Bosch fuel filter, a 3M fuel additive, and a non-dismantling catalytic converter cleaner. 2. Dynamic balancing and replacement of the original wheel hub covers.) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).