PLTA-FinBERT: Pseudo-Label Generation-Based Test-Time Adaptation for Financial Sentiment Analysis
Abstract
1. Introduction
- (1) Proposed a pseudo-label generation mechanism that integrates multi-perturbation prediction with confidence-based filtering to ensure pseudo-label reliability and reduce reliance on manual annotation.
- (2) Established a test-time adaptation strategy that enables FinBERT to dynamically update itself during inference, thereby overcoming the limitations of traditional static models.
- (3) Conducted empirical evaluations on the FiQA and Financial Sentiment Analysis benchmark datasets, demonstrating that the proposed method achieves state-of-the-art performance across multiple metrics.
2. Related Work
2.1. Financial Sentiment Analysis
2.2. Test-Time Training
3. Method
3.1. Multi-Perturbation Prediction and Confidence Filtering
3.1.1. Data Augmentation
3.1.2. Confidence Filtering
3.2. Dynamic Update During Testing
Algorithm 1. Self-Learning Framework
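The steps of Algorithm 1 are not reproduced here, so the following is only a minimal sketch of what a self-learning loop of this kind might look like for the classification task. Everything in it is an assumption made for illustration: `TinyClassifier` is a hypothetical linear stand-in for FinBERT, Gaussian noise on the input features stands in for the perturbation scheme, a plain gradient step stands in for AdamW, and the choice to emit the prediction before updating is a guess. The defaults (10 variants, noise weight 0.1, modal confidence threshold 0.8, seed 42) are taken from the experimental parameter table; the learning rate is arbitrary.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


class TinyClassifier:
    """Hypothetical stand-in for FinBERT: a linear softmax classifier."""

    def __init__(self, n_features, n_classes):
        self.w = [[0.0] * n_features for _ in range(n_classes)]

    def predict_proba(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in self.w])

    def sgd_step(self, x, label, lr):
        """One cross-entropy gradient step (assumption: stands in for AdamW)."""
        probs = self.predict_proba(x)
        for c, row in enumerate(self.w):
            grad = probs[c] - (1.0 if c == label else 0.0)
            for j in range(len(row)):
                row[j] -= lr * grad * x[j]


def adapt_on_stream(model, stream, n_variants=10, noise=0.1,
                    threshold=0.8, lr=0.1, seed=42):
    """Process test samples one at a time (batch size 1).

    For each sample: predict on noisy variants, take the modal class as the
    pseudo-label, and update the model only if the modal class reaches the
    confidence threshold. Returns (predictions, number_of_updates).
    """
    rng = random.Random(seed)
    predictions, n_updates = [], 0
    for x in stream:
        votes = []
        for _ in range(n_variants):
            x_pert = [xi + noise * rng.gauss(0.0, 1.0) for xi in x]
            probs = model.predict_proba(x_pert)
            votes.append(max(range(len(probs)), key=probs.__getitem__))
        label = max(set(votes), key=votes.count)   # modal prediction
        confidence = votes.count(label) / n_variants
        predictions.append(label)                  # assumption: predict, then adapt
        if confidence >= threshold:
            model.sgd_step(x, label, lr)
            n_updates += 1
    return predictions, n_updates
```

Calling `adapt_on_stream(model, samples)` on a stream of feature vectors returns the per-sample predictions together with the number of samples whose pseudo-labels passed the filter and triggered an update.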
4. Experiment
4.1. Dataset
4.2. Evaluation Metrics
4.3. Baseline
4.4. Experimental Parameters
4.5. Experimental Results
4.5.1. Experimental Results on Financial Sentiment Analysis
4.5.2. Experimental Results on FiQA-SA
4.5.3. Hyperparameter Analysis
4.5.4. Dynamic Adaptability Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Manaka, T.; Zyl, T.V.; Kar, D.; Wade, A. Multi-step transfer learning in natural language processing for the health domain. Neural Process. Lett. 2024, 56, 177.
- Sheik, R.; Sundara, K.P.S.; Nirmala, S.J. Neural data augmentation for legal overruling task: Small deep learning models vs. large language models. Neural Process. Lett. 2024, 56, 121.
- Meng, Z.; Cai, Z.; Feng, J.; Ma, H.; Zhang, H.; Li, S. Braille Character Segmentation Algorithm Based on Gaussian Diffusion. Comput. Mater. Contin. 2024, 79, 1143–1159.
- Ranjan, R.; Sharma, K.; Kumar, A. Introduction to NLP in Finance: Sentiment Analysis and Risk Management. In Transformative Natural Language Processing: Bridging Ambiguity in Healthcare, Legal, and Financial Applications; Springer: Cham, Switzerland, 2025; pp. 75–100.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
- Araci, D. Finbert: Financial sentiment analysis with pre-trained language models. arXiv 2019, arXiv:1908.10063.
- Guo, Y.; Hu, C.; Yang, Y. Predict the future from the past? On the temporal data distribution shift in financial sentiment classifications. arXiv 2023, arXiv:2310.12620.
- Daudert, T. A multi-source entity-level sentiment corpus for the financial domain: The FinLin corpus. Lang. Resour. Eval. 2022, 56, 333–356.
- Rubtsova, Y. Reducing the deterioration of sentiment analysis results due to the time impact. Information 2018, 9, 184.
- Sedinkina, M.; Breitkopf, N.; Schütze, H. Automatic domain adaptation outperforms manual domain adaptation for predicting financial outcomes. arXiv 2020, arXiv:2006.14209.
- Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2020; pp. 8342–8360.
- Sohangir, S.; Petty, N.; Wang, D. Financial sentiment lexicon analysis. In Proceedings of the 2018 IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 31 January–2 February 2018; IEEE: New York, NY, USA, 2018; pp. 286–289.
- Li, G.; Lin, Z.; Wang, H.; Wei, X. A discriminative approach to sentiment classification. Neural Process. Lett. 2020, 51, 749–758.
- Loughran, T.; McDonald, B. Textual analysis in accounting and finance: A survey. J. Account. Res. 2016, 54, 1187–1230.
- Renault, T. Sentiment analysis and machine learning in finance: A comparison of methods and models on one million messages. Digit. Financ. 2020, 2, 1–13.
- Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P. Good debt or bad debt: Detecting semantic orientations in economic texts. J. Assoc. Inf. Sci. Technol. 2014, 65, 782–796.
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics: Minneapolis, MN, USA, 2016; pp. 1480–1489.
- Johnson, E.; Nasir, W.; Smith, C. Contrastive Learning-Based Sentiment Analysis. Preprints 2024.
- Wu, S.; Irsoy, O.; Lu, S.; Dabravolski, V.; Dredze, M.; Gehrmann, S.; Kambadur, P.; Rosenberg, D.; Mann, G. Bloomberggpt: A large language model for finance. arXiv 2023, arXiv:2303.17564.
- Wang, N.; Yang, H.; Wang, C.D. Fingpt: Instruction tuning benchmark for open-source large language models in financial datasets. arXiv 2023, arXiv:2310.04793.
- Konstantinidis, T.; Iacovides, G.; Xu, M.; Constantinides, T.G.; Mandic, D. Finllama: Financial sentiment classification for algorithmic trading applications. arXiv 2024, arXiv:2403.12285.
- Sun, Y.; Wang, X.; Liu, Z.; Miller, J.; Efros, A.; Hardt, M. Test-Time Training with Self-Supervision for Generalization under Distribution Shifts. In Proceedings of the 37th International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2020; pp. 9229–9248.
- He, H.; Hosseini, M.S.; Wang, Y. PathTTT: Test-Time Training with Meta-auxiliary Learning for Pathology Image Classification. In Proceedings of the International Conference on Information Processing in Medical Imaging, Kos, Greece, 25–30 May 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 33–46.
- Goyal, S.; Sun, M.; Raghunathan, A.; Kolter, J.Z. Test time adaptation via conjugate pseudo-labels. In Advances in Neural Information Processing Systems; NeurIPS Foundation: San Diego, CA, USA, 2022; Volume 35, pp. 6204–6218.
- Banerjee, P.; Gokhale, T.; Baral, C. Self-supervised test-time learning for reading comprehension. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics: Minneapolis, MN, USA, 2021; pp. 1200–1211.
- Maia, M.; Handschuh, S.; Freitas, A.; Davis, B.; McDermott, R.; Zarrouk, M.; Balahur, A. Www’18 open challenge: Financial opinion mining and question answering. In Companion Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2018; pp. 1941–1942.
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Minneapolis, MN, USA, 2014; pp. 1532–1543.
- Liu, Z.; Guo, X.; Lou, F.; Zeng, L.; Niu, J.; Wang, Z.; Xu, J.; Cai, W.; Yang, Z.; Zhao, X.; et al. Fin-r1: A large language model for financial reasoning through reinforcement learning. arXiv 2025, arXiv:2503.16252.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems; NeurIPS Foundation: San Diego, CA, USA, 2019; Volume 32.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
- Yang, S.; Rosenfeld, J.; Makutonin, J. Financial aspect-based sentiment analysis using deep representations. arXiv 2018, arXiv:1808.07931.
- Piao, G.; Breslin, J.G. Financial aspect and sentiment predictions with deep neural networks: An ensemble approach. In Companion Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; International World Wide Web Conferences Steering Committee: Lyon, France, 2018; pp. 1973–1977.





| Feature Category | Statistical Value |
|---|---|
| Total Samples | 5840 |
| Time Coverage | 2006–2022 |
| Text Type Distribution | |
| Traditional Financial News | 72.6% |
| Social Media Texts | 12.7% |
| Emerging Field Texts | 14.7% |
| Sentiment Distribution | |
| Positive | 28% |
| Negative | 12% |
| Neutral | 59% |

| Task Type | Evaluation Metrics |
|---|---|
| Classification Task | Accuracy, Macro-average F1 |
| Regression Task | MSE (Mean Squared Error), $R^2$ (Coefficient of Determination) |
| Method Stability | AGF (Adaptive Gain Frequency) |
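
As a rough illustration of the classification and regression metrics listed above, the sketch below computes accuracy, macro-average F1, MSE, and the coefficient of determination in plain Python. The function names are hypothetical, and macro F1 is averaged over the union of labels seen in the true and predicted sequences, which is an assumption about the averaging convention. AGF is omitted because its definition is not given in this section.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that appear."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mse(y_true, y_pred):
    """Mean squared error between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot
```

For example, with true labels `["pos", "neg", "neu", "neu"]` and predictions `["pos", "neu", "neu", "neu"]`, accuracy is 0.75 while macro F1 is 0.6, because the missed `neg` class contributes an F1 of zero to the average.
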

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Max sequence length | 64 tokens |
| Batch size | 1 |
| Number of perturbed variants per sample | 10 |
| Regression task: | |
| Learning rate | 48 |
| Variance threshold () | 0.1 |
| Noise weight | 0.1 |
| Classification task: | |
| Learning rate | 5 |
| Modal confidence threshold () | 0.8 |
| Noise weight | 0.1 |
| Computing device | NVIDIA RTX 3090 GPU (24 GB VRAM) |
| Random seed | 42 |
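
For the regression task, the table above lists a variance threshold of 0.1 and 10 perturbed variants per sample. One plausible reading, sketched below purely as an assumption, is that the predictions for the perturbed variants are averaged into a pseudo-label and the sample is kept only when their variance does not exceed the threshold. The function name, the use of population variance, and the inclusive comparison are all illustrative choices, not details confirmed by the source.

```python
from statistics import fmean, pvariance


def regression_pseudo_label(predictions, var_threshold=0.1):
    """Return the mean prediction as a pseudo-label, or None if rejected.

    predictions: sentiment scores predicted for the perturbed variants
    of one test sample. The sample is accepted only when the population
    variance of these scores is at most var_threshold (assumption).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    if pvariance(predictions) <= var_threshold:
        return fmean(predictions)
    return None
```

Under this reading, a sample whose perturbed predictions swing between -1.0 and 1.0 is rejected, while one whose predictions all sit at 0.5 is accepted with a pseudo-label of 0.5.
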

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| LSTM + GloVe | 0.7086 | 0.6209 | 0.5713 | 0.5708 |
| Fin-R1 | 0.7577 | 0.7279 | 0.7537 | 0.7274 |
| XLNet | 0.8033 | 0.8209 | 0.8033 | 0.8093 |
| DistilBERT | 0.6758 | 0.6002 | 0.6758 | 0.6326 |
| BERT | 0.7905 | 0.7809 | 0.7905 | 0.7835 |
| FinBERT | 0.8051 | 0.7700 | 0.8300 | 0.7700 |
| PLTA-FinBERT (Ours) | | | | |

| Model | MSE | $R^2$ |
|---|---|---|
| Yang et al. [31] 1 | 0.08 | 0.40 |
| Piao and Breslin [32] 1 | 0.09 | 0.41 |
| PLTA-FinBERT | | |

| Example Text | True Label | Initial Prediction | Corrected Prediction |
|---|---|---|---|
| “The Vaisala Group is a successful international technology company that develops, manufactures and markets electronic measurement systems and products.” | Positive | Neutral | Positive |
| “Operating loss totalled EUR 0.3 mn, down from a profit of EUR 5.1 mn in the first half of 2009.” | Neutral | Negative | Neutral |
| “My $DWA play up 6% today. I’m still skeptical. Will take profits. Not a time cheer.” | Neutral | Positive | Neutral |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, H.; Chen, H.; Jiang, C.; He, J.; Li, P. PLTA-FinBERT: Pseudo-Label Generation-Based Test-Time Adaptation for Financial Sentiment Analysis. Big Data Cogn. Comput. 2026, 10, 59. https://doi.org/10.3390/bdcc10020059

