This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Big Five Personality Trait Prediction Based on User Comments

by Kit-May Shum *, Michal Ptaszynski * and Fumito Masui
Text Information Processing Laboratory, Faculty of Engineering, Kitami Institute of Technology, 165 Koencho, Kitami 090-0015, Hokkaido, Japan
* Authors to whom correspondence should be addressed.
Information 2025, 16(5), 418; https://doi.org/10.3390/info16050418
Submission received: 10 March 2025 / Revised: 11 May 2025 / Accepted: 14 May 2025 / Published: 20 May 2025

Abstract

The study of personality is a major component of human psychology, and an understanding of personality traits has practical applications in domains such as mental health care, job performance prediction, and marketing strategy optimisation. This study explores the prediction of Big Five personality trait scores from online comments using transformer-based language models, focusing on improving model performance with a larger dataset and investigating the role of intercorrelations among traits. Using the PANDORA dataset from Reddit, the RoBERTa and BERT models, in both their base and large variants, were fine-tuned and evaluated to determine their effectiveness in personality trait prediction. Compared with previous work, our study utilises a substantially larger dataset to enhance the models' generalisation and robustness. The results indicate that RoBERTa outperforms BERT across most metrics, with RoBERTa large achieving the best overall performance. In addition to evaluating overall predictive accuracy, this study investigates the impact of intercorrelations among personality traits. A comparative analysis is conducted between a single-model approach, in which one model predicts all five traits simultaneously, and a multiple-model approach, in which separate models are fine-tuned independently, each predicting a single trait. The findings reveal that the single-model approach achieves lower RMSE and higher R² values, highlighting the importance of incorporating trait intercorrelations to improve prediction accuracy. Furthermore, RoBERTa large demonstrated a stronger ability to capture these intercorrelations than the models in previous studies. These findings emphasise the potential of transformer-based models in personality computing and underscore the importance of leveraging both larger datasets and trait intercorrelations to enhance predictive performance.
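The abstract reports per-trait RMSE and R² and asks how well predictions preserve the intercorrelations among the five traits. The following is a minimal sketch of how such an evaluation could be computed in plain Python. It is not the authors' code. The trait order (O, C, E, A, N), the helper names, and the `intercorrelation_gap` measure (mean absolute difference between the off-diagonal Pearson correlations of true and predicted scores) are illustrative assumptions, and the toy scores are invented rather than taken from PANDORA. Because scoring operates on rows of five predicted values, the same function would apply to the output of a single five-output model or to the column-stacked outputs of five single-trait models.

```python
import math
from statistics import mean

# Assumed trait order: openness, conscientiousness, extraversion,
# agreeableness, neuroticism (illustrative, not taken from the paper).
TRAITS = ("O", "C", "E", "A", "N")


def rmse(y_true, y_pred):
    """Root mean squared error for one trait."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def columns(rows):
    """Transpose per-user rows [O, C, E, A, N] into per-trait columns."""
    return [list(col) for col in zip(*rows)]


def intercorrelations(rows):
    """Trait-by-trait Pearson correlation matrix."""
    cols = columns(rows)
    k = len(cols)
    return [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]


def intercorrelation_gap(true_rows, pred_rows):
    """Hypothetical measure: mean |r_true - r_pred| over trait pairs i < j."""
    t = intercorrelations(true_rows)
    p = intercorrelations(pred_rows)
    k = len(t)
    return mean(abs(t[i][j] - p[i][j]) for i in range(k) for j in range(i + 1, k))


def evaluate(true_rows, pred_rows):
    """Return {trait: (RMSE, R^2)} and the intercorrelation gap."""
    tc, pc = columns(true_rows), columns(pred_rows)
    per_trait = {name: (rmse(t, p), r2(t, p)) for name, t, p in zip(TRAITS, tc, pc)}
    return per_trait, intercorrelation_gap(true_rows, pred_rows)


if __name__ == "__main__":
    # Invented toy scores for four users (not real data).
    true_scores = [
        [0.62, 0.48, 0.71, 0.55, 0.30],
        [0.35, 0.66, 0.42, 0.60, 0.52],
        [0.80, 0.40, 0.65, 0.45, 0.25],
        [0.50, 0.58, 0.33, 0.70, 0.61],
    ]
    pred_scores = [
        [0.58, 0.50, 0.66, 0.57, 0.35],
        [0.40, 0.61, 0.45, 0.58, 0.49],
        [0.72, 0.44, 0.60, 0.50, 0.30],
        [0.52, 0.55, 0.40, 0.66, 0.57],
    ]
    per_trait, gap = evaluate(true_scores, pred_scores)
    for name, (e, s) in per_trait.items():
        print(f"{name}: RMSE={e:.3f} R2={s:.3f}")
    print(f"intercorrelation gap={gap:.3f}")
```

Reporting RMSE and R² trait by trait lets the single-model and multiple-model approaches be compared on each trait, while the correlation gap gives one possible summary of whether the joint structure of the predictions resembles that of the ground truth.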
Keywords: natural language processing (NLP); personality; automatic personality recognition; machine learning; personality computing; big five personality traits


MDPI and ACS Style

Shum, K.-M.; Ptaszynski, M.; Masui, F. Big Five Personality Trait Prediction Based on User Comments. Information 2025, 16, 418. https://doi.org/10.3390/info16050418


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
