Article

Big Five Personality Trait Prediction Based on User Comments

by Kit-May Shum *, Michal Ptaszynski * and Fumito Masui
Text Information Processing Laboratory, Faculty of Engineering, Kitami Institute of Technology, 165 Koen-cho, Kitami 090-0015, Hokkaido, Japan
* Authors to whom correspondence should be addressed.
Information 2025, 16(5), 418; https://doi.org/10.3390/info16050418
Submission received: 10 March 2025 / Revised: 11 May 2025 / Accepted: 14 May 2025 / Published: 20 May 2025

Abstract

The study of personality is a major component of human psychology, and an understanding of personality traits supports practical applications in various domains, such as mental health care, predicting job performance, and optimising marketing strategies. This study explores the prediction of Big Five personality trait scores from online comments using transformer-based language models, focusing on improving model performance with a larger dataset and investigating the role of intercorrelations among traits. Using the PANDORA dataset from Reddit, the RoBERTa and BERT models, in both their base and large variants, were fine-tuned and evaluated to determine their effectiveness in personality trait prediction. Compared to previous work, our study utilises a significantly larger dataset to enhance the model's generalisation and robustness. The results indicate that RoBERTa outperforms BERT across most metrics, with RoBERTa large achieving the best overall performance. In addition to evaluating the overall predictive accuracy, this study investigates the impact of intercorrelations among personality traits. A comparative analysis is conducted between a single-model approach, which predicts all five traits simultaneously, and a multiple-model approach, in which models are fine-tuned independently, each predicting a single trait. The findings reveal that the single-model approach achieves a lower RMSE and higher R² values, highlighting the importance of incorporating trait intercorrelations in improving the prediction accuracy. Furthermore, RoBERTa large demonstrated a stronger ability to capture these intercorrelations compared to previous studies. These findings emphasise the potential of transformer-based models in personality computing and underscore the importance of leveraging both larger datasets and intercorrelations to enhance predictive performance.

1. Introduction

1.1. The Background

Personality traits are stable patterns of thoughts, emotions, and behaviours that indicate a consistent tendency to react in specific ways across different situations [1]. The study of personality is a major component of human psychology, and extensive studies have been conducted to understand human behaviour, emotions, and social interactions. An understanding of personality traits supports practical applications in various domains, such as mental health care [1], predicting job performance [2], optimising marketing strategies [3], and enhancing recommendation systems [4].
Various personality taxonomies have been developed to comprehend and assess personality—for example, the Big Five Model [5] and the Myers–Briggs Type Indicator (MBTI) [6]. The Big Five Model has emerged as one of the most widely utilised frameworks for personality evaluations due to its strong empirical backing and proven applicability across different cultures [7]. The Big Five Model consists of five key dimensions: Openness (OPE), Conscientiousness (CON), Extraversion (EXT), Agreeableness (AGR), and Neuroticism (NEU).
Personality trait assessments have traditionally depended on questionnaire-based methods, self-reports, or other reports [8]. Some well-known questionnaires that are used to assess the Big Five personality traits include the Revised NEO Personality Inventory (NEO-PI-R) [9] and the Big Five Inventory (BFI) [10]. These questionnaires generally include a set of items or statements that the participants evaluate by indicating their level of agreement or preference on a scale. The answers are subsequently scored and analysed to generate personality trait scores or profiles for each individual [10]. However, these questionnaire-based methods tend to be labour-intensive and time-intensive [8].
In the last decade, social media has revolutionised the way individuals express themselves, communicate, and engage with the world. With billions of users actively sharing their thoughts, experiences, and opinions online, social media platforms have become a rich data source reflecting diverse human behaviours and characteristics [11]. Researchers have recognised the opportunity to explore users' personalities based on their social media posts and profiles using automated personality assessment methods, giving rise to the field of personality computing (PC).
PC is an emerging field that bridges personality research and computer science using computational methods and machine learning techniques [12]. It aims to identify and analyse personality-related information, such as Big Five personality trait scores, by utilising techniques such as Natural Language Processing (NLP) and machine learning (ML) algorithms to process the information encapsulated in various forms, including written text [13], smartphone interactions [14], speech patterns [15], and gameplay behaviour [16]. ML algorithms are trained to analyse this information automatically and predict self-reported personality trait scores or trait scores reported in other ways using signals detected by the system [12]. This eliminates the need for human evaluators, decreasing the labour and reducing the time consumption.
Previous research has leveraged architectures such as Recurrent Neural Networks (RNNs) to predict personality trait scores and has shown substantial results [17]. However, transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT), which are pretrained on extensive collections of text, have demonstrated exceptional performance across various tasks [18]. These advancements have also significantly propelled the development of automated PC.
Although numerous studies have focused on predicting personality trait scores, the intercorrelations between traits have not been sufficiently explored, despite being considered essential for a deeper understanding of the underlying structure of personality [8]. Some studies have begun to address this gap, deepening our understanding of the intercorrelations of personality traits [19]. However, none of these studies has demonstrated how effectively these intercorrelations can be exploited to improve prediction.
This paper presents a detailed methodological framework, outlining the preprocessing steps and the implementation of the models on a larger dataset compared to that used in previous work [19]. Specifically, the BERT model [18] and the RoBERTa model [20], an advanced version of BERT known for its ability to comprehend context and semantics in text, were tested. Their results were then analysed to identify the optimal model for predicting personality trait scores within the chosen models. Furthermore, the impact of personality trait intercorrelations in improving the prediction of personality trait scores was investigated to highlight its importance.

1.2. Problem Definition

Understanding personality traits from social media data has garnered significant attention due to its broad applications in fields such as psychology [21] and marketing [3]. However, previous work [19] used only a subset of the dataset, which limited the model's ability to generalise to the broader online population. By utilising the full dataset, this study aims to enhance the model's robustness and improve its applicability to diverse user-generated content. The performance of several models was evaluated to identify the most effective approach to accurately predicting personality trait scores from text data.
Furthermore, while many studies have focused on predicting individual personality trait scores, the intercorrelations between traits remain relatively under-explored [8]. This gap limits our understanding of the structure of personality and its influence on prediction accuracy.
To address this issue, this paper examines how incorporating intercorrelations among traits impacts the predictive accuracy, demonstrating that leveraging these relationships enhances the model performance and contributes to the development of more robust personality prediction models.

1.3. Research Contributions

The main contributions of this work are summarised as follows:
  • Continuous Trait Prediction: We investigate the effectiveness of transformer-based language models, specifically BERT and RoBERTa, in predicting continuous Big Five personality trait scores, a regression task that remains under-explored in personality computing compared to classification approaches;
  • Evaluation on Real-World Data: We assess the model performance under challenging conditions, including large-scale, noisy, real-world data and scenarios with limited supervision, reflecting practical deployment challenges;
  • Trait Intercorrelation Analysis: We examine how intercorrelations among personality traits affect the prediction performance by comparing a multi-trait transformer model with independently fine-tuned single-trait models, providing empirical insights to guide future research on transformer-based personality predictions.

2. The Literature Review

2.1. The Big Five

The Big Five Model, also referred to as the OCEAN model, is a widely accepted taxonomy that represents personality traits along a dimensional structure [5]. Independent research groups have investigated this taxonomy [5,22,23] and have consistently identified five distinct dimensions (traits) that account for inter-individual differences in personality [8]. The five different traits include [24] the following:
  • Openness: Characterised by a keen intellectual curiosity and a desire for new experiences and diversity;
  • Conscientiousness: Demonstrated through traits such as discipline, organisation, and achievement orientation;
  • Extraversion: Marked by increased sociability, assertiveness, and talkativeness;
  • Agreeableness: Refers to being helpful, cooperative, and sympathetic towards others;
  • Neuroticism: Refers to the degree of emotional stability, impulse control, and susceptibility to anxiety.
These five traits, while distinct, often exhibit meaningful intercorrelations, forming a cohesive personality profile. Additionally, these personality dimensions play a crucial role in forecasting individual behaviours and life outcomes. For instance, Conscientiousness has consistently been linked to job performance across many occupations [25]. On the other hand, Neuroticism is strongly associated with challenges in psychological adjustment and emotional stability; individuals low in Neuroticism tend to regulate their emotions more effectively, which is linked to better long-term health outcomes [26]. Thus, understanding these traits and their interrelations can provide valuable insights into predicting various aspects of an individual's behaviour, career success, and overall well-being.

2.2. Intercorrelation of the Big Five Personality Traits

The Big Five personality traits, though categorised separately, are not entirely independent; instead, they show meaningful intercorrelations that reflect underlying patterns in personality, a finding consistently highlighted by questionnaire-based approaches. For example, in one of the earliest works, Digman [27] analysed inter-scale correlations among the Big Five personality traits across data from 14 studies and reported a mean correlation of 0.26 across all traits, signifying the existence of intercorrelations between the Big Five personality traits. Van der Linden et al. [28] also examined the intercorrelations among the Big Five personality traits, reporting a correlation of 0.43 between Extraversion and Openness and a similarly strong positive correlation of 0.43 between Conscientiousness and Agreeableness. In contrast, they identified a negative correlation of −0.43 between Conscientiousness and Neuroticism and a negative correlation of −0.36 between Agreeableness and Neuroticism.
Kachur et al. [29] observed that Conscientiousness and Agreeableness exhibited the strongest positive correlation in both men and women, with an average correlation coefficient of 0.406. Additionally, the correlation between Conscientiousness and Neuroticism was the strongest negative correlation across both genders, with an average correlation coefficient of −0.485.
Research such as that mentioned above suggests that certain pairs of Big Five traits often show positive correlations. For instance, individuals high in Openness frequently display sociable and outgoing behaviours, which are characteristic of Extraversion. Similarly, those high in Conscientiousness are often cooperative and considerate, behaviours associated with Agreeableness.
Conversely, some traits tend to exhibit negative correlations. High Neuroticism is often associated with lower Conscientiousness, which suggests difficulties in maintaining organisation and self-discipline. Likewise, individuals with high Neuroticism scores tend to have low Agreeableness, indicating possible difficulties in cooperating and being considerate toward others [28].
While such intercorrelations and patterns are commonly observed in the research, they should not be regarded as absolute. Moreover, the study of these intercorrelations remains under-explored in the emerging field of personality computing [8]. Given the importance of intercorrelations to the foundation of personality trait theory, further research and integration in this area could significantly enhance the understanding of how personality traits interact and influence one another.

2.3. Personality Computing

Personality computing, an interdisciplinary field that bridges psychology and computer science, has its roots in traditional personality assessments, based primarily on questionnaires [9,10]. However, these traditional methods tend to be labour-intensive, time-intensive, and susceptible to biases such as social desirability or a lack of self-awareness [8]. As the field advanced, researchers began to seek alternative data sources and methodologies that could overcome these limitations and provide richer, more dynamic insights into personality. This shift was catalysed by the rapid expansion of the internet, which emerged as a potentially rich source of behavioural data.
Researchers started investigating the use of social media text to predict the Big Five personality traits and demonstrated that social media platforms can serve as a valuable data source. Some of these earlier works relied on smaller sources of data, such as essays [30,31] and blogs [32]. The release of the MyPersonality dataset [33], which contains user profiles and Big Five labels collected from the widely used social media platform Facebook [34], proved highly successful and became a turning point in the field of personality computing. However, due to privacy concerns, most of the dataset was later removed.
Automatic personality trait detection methods using text generated by online users are divided into two categories: the lexical method and the open vocabulary machine learning method [35].
The lexical method is most commonly represented by Linguistic Inquiry and Word Count (LIWC) [36]. LIWC is a text analysis tool that classifies words into psychological, linguistic, and cognitive categories. Its purpose is to analyse individuals’ emotional, social, and psychological states based on their written language. It works by matching words to a predefined dictionary and measuring the frequency of words in specific psychological or linguistic categories. These psychological features have been widely used as inputs for machine learning models in personality detection studies [12,21]. However, despite its popularity, LIWC has several limitations. One major drawback is its limited language support, as it only covers a small selection of languages [37]. Furthermore, LIWC relies solely on basic dictionary look-ups and word matching, making it unable to understand the context or interpret metaphorical language [38].
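As a rough illustration of the lexical approach (this is a toy sketch, not LIWC's actual dictionary, categories, or scoring), dictionary-based analysis amounts to counting how often words from predefined category word lists occur in a text:

import re
from collections import Counter

# Toy category dictionary; LIWC's real categories and word lists are far larger.
categories = {
    "positive_emotion": {"happy", "love", "great"},
    "social": {"friend", "talk", "we"},
}

def category_frequencies(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    # Relative frequency of each category's dictionary words in the text.
    return {cat: sum(counts[w] for w in vocab) / total
            for cat, vocab in categories.items()}

print(category_frequencies("We love to talk with a friend, it makes me happy."))

Because such a method matches surface forms only, it cannot resolve context or metaphorical usage, which is precisely the limitation noted above.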
In contrast, the open vocabulary method analyses text data using machine learning and NLP techniques without relying on a predefined dictionary. This approach identifies patterns and linguistic features linked to psychological traits in the text [39]. Open vocabulary methods have demonstrated the ability to uncover more precise patterns across various content domains and handle ambiguous word meanings more effectively and are less prone to misinterpretation. These advantages make open vocabulary methods particularly effective in capturing the subtlety of everyday psychological processes [40].
In the last decade, deep neural networks (DNNs) and large language models (LLMs) have gained considerable attention due to their ability to model complex relationships within textual data, leading to significant advances in NLP applications [41]. These techniques have also been applied to personality computing. Tadesse et al. [42] used methods such as multilayer perceptrons, Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), and One-Dimensional Convolutional Neural Networks (1-DCNNs) to predict personality traits based on user behaviour on the Facebook platform. These methods helped to extract behavioural patterns from social media activity and text. Similarly, Xue et al. [41] employed deep-learning-based methods to recognise personality traits from text posts on online social networks, utilising DNNs and LSTMs to capture the complex relationships in the data and uncover complex patterns linked to personality traits.
Tanwijaya et al. [43] investigated the performance of traditional machine learning algorithms and a transformer model, finding that the transformer model outperformed the other machine learning models. Specifically, the transformer model achieved the highest accuracy of 75.46% in predicting Openness using the MyPersonality dataset [33], demonstrating its effectiveness in personality prediction tasks. Similarly, Arijanto et al. [44] leveraged BERT to analyse Twitter data and predict personality traits, further supporting the utility of transformer-based models in this domain. Moreover, Mehta et al. [45] highlighted the value of integrating psycholinguistic and language model features, such as BERT, to enhance the personality prediction performance. BERT excels in this task due to its ability to process complex language structures and semantic meanings [18], allowing it to detect subtle psychological traits embedded within the text.
This range of studies and approaches demonstrates the versatility and potential of the open vocabulary method in advancing automated personality computing.

3. Methodology

3.1. The Dataset

The PANDORA dataset [46], sourced from Reddit [47], was utilised in this study’s experiments. It comprises over 17 million comments on subreddits from over 10,000 users (authors), each annotated with self-reported personality labels. It provides three personality models: the Big Five, the Myers–Briggs Type Indicator (MBTI) [6], and the Enneagram [48]. However, only the Big Five data were used for the experiments in this study. Additionally, the PANDORA dataset includes the demographic information for some authors, such as gender, age, and country. Table 1 provides an overview of the dataset attributes, while Table 2 details the total number of authors with complete personality labels, along with their corresponding comments.
The MBTI9k dataset, based on the Pushshift Reddit dataset with comments dating back to 2015, served as the foundation for the PANDORA dataset. Flairs (short descriptions users add to their profiles on various subreddits) and occasionally comments were used to collect MBTI and Enneagram labels, as these were easier to identify compared to Big Five personality trait scores.
In contrast, the Big Five scores were more challenging to gather due to their varied reporting formats depending on the test taken. Unlike MBTI and Enneagram labels, Big Five scores were typically not included in flairs but rather found in comments replying to posts about specific online tests.
To extract Big Five personality scores from the Reddit comments, a semi-automatic process was employed to address the unstandardised formats and lack of flairs. Comments mentioning traits were first retrieved and linked to posts referencing specific tests, identified through the links provided. Comments tied to text-based prediction services were discarded, and scores were extracted using test-specific regular expressions.
The extracted scores were manually verified to ensure the accuracy of the data and its association with the correct comments. For comments lacking identifiable links, a test identification classifier with an F1-macro score of 81.4% was used to determine the referenced tests. The score extraction procedure was then repeated for these comments, yielding Big Five personality trait scores for a total of 1608 authors.
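As a purely hypothetical illustration of this kind of extraction (the actual test-specific patterns used by the PANDORA authors are not reproduced here), a regular expression might match comments that report scores in a "Trait: value" format:

import re

# Hypothetical pattern for one reporting format; PANDORA used several
# test-specific regular expressions, which differ from this sketch.
pattern = re.compile(
    r"(openness|conscientiousness|extraversion|agreeableness|neuroticism)"
    r"\s*[:\-]\s*(\d{1,3})",
    re.IGNORECASE,
)

comment = "My results: Openness: 72, Conscientiousness: 40, Neuroticism - 55"
scores = {trait.lower(): int(value) for trait, value in pattern.findall(comment)}
print(scores)  # {'openness': 72, 'conscientiousness': 40, 'neuroticism': 55}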
The authors of this paper were not involved in the collection or verification of the Big Five personality scores in the dataset. For detailed information on the data collection process, please refer to the original publication [46].
The dataset was distributed as two separate CSV files: one containing the author information and the other containing their generated text. Some authors had missing information; as noted in the PANDORA paper, some data may have been removed from the dataset upon request. Additionally, authors with incomplete Big Five scores were removed to ensure data consistency for the experiments. These exclusions resulted in fewer authors being included in this study compared to the number reported in the original paper.
From the first file, authors with complete Big Five scores were identified. Their corresponding comments from the second file were then extracted and combined into a single dataset. After processing, the final dataset included 1568 authors, yielding a total of 3,006,567 comments.
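A hedged sketch of this merging step is shown below, assuming the two files share an author identifier column; the file and column names are illustrative and may differ from those in the original PANDORA distribution.

import pandas as pd

# Illustrative file and column names; the actual distribution may differ.
authors = pd.read_csv("author_profiles.csv")
comments = pd.read_csv("all_comments.csv")

big5_cols = ["agreeableness", "openness", "conscientiousness",
             "extraversion", "neuroticism"]

# Keep only authors with complete Big Five scores.
complete = authors.dropna(subset=big5_cols)

# Attach each author's trait scores to all of their comments.
dataset = comments.merge(complete[["author"] + big5_cols],
                         on="author", how="inner")
print(dataset["author"].nunique(), len(dataset))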
The dataset is divided into three subsets: training, validation, and test sets. First, the dataset is split into a training set and a test set, with 70% of the data allocated into the training set and the remaining 30% into the test set. This ensures that the test set remains independent and is used solely for the final evaluation of the model. To maintain reproducibility, the split is performed using a fixed random state of 42.
Next, the training set obtained from the initial split is further divided into a smaller training set and a validation set. The validation set accounts for 20% of the original training set, which corresponds to 14% of the total dataset. The remaining 56% of the total data is used as the final training set. Similarly, the random state is fixed at 42 during this second split to ensure consistency.
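Continuing the sketch above, the two-stage split can be reproduced with scikit-learn's train_test_split using the fixed random state; the variable names are our own.

from sklearn.model_selection import train_test_split

# 70% training / 30% held-out test set, fixed seed for reproducibility.
train_full, test = train_test_split(dataset, test_size=0.30, random_state=42)

# 20% of the training portion becomes the validation set (14% of the full
# dataset); the remaining 56% is used as the final training set.
train, val = train_test_split(train_full, test_size=0.20, random_state=42)

print(len(train) / len(dataset), len(val) / len(dataset), len(test) / len(dataset))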

3.2. Data Cleaning and Preprocessing

The data cleaning and preprocessing for the dataset were minimal, as BERT is designed to understand the contextual meaning of words, eliminating the need for traditional preprocessing steps such as lemmatisation, stemming, stopword removal, and text case conversion [18,19]. BERT’s tokenizer uses WordPiece tokenisation to break words into subwords, allowing it to handle out-of-vocabulary words effectively by mapping them to smaller, meaningful units. This enables better representation and generalisation across diverse vocabulary [49].
Furthermore, BERT incorporates special tokens such as “[CLS]” and “[SEP]” to mark the start and end of sentences, allowing the model to process text contextually rather than treating words in isolation. This ability to capture context-dependent meanings makes BERT highly adaptable to linguistic variations [18].
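As a brief illustration of this behaviour, the Hugging Face tokenizer splits out-of-vocabulary words into WordPiece subwords and adds the special tokens automatically; the example sentence is arbitrary.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

text = "Honestly, procrastinating on reddit again"
print(tokenizer.tokenize(text))
# Words missing from the vocabulary are split into '##'-prefixed subword pieces.

encoded = tokenizer(text)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# The encoded sequence is wrapped with the special tokens,
# e.g. ['[CLS]', ..., '[SEP]'].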
The minimal data cleaning and preprocessing employed in this study also allow for a more realistic evaluation of how well BERT performs on real-world, noisy data. This approach provides valuable insights into the model’s ability to extract relevant information and make accurate predictions despite the challenges posed by unstructured and imperfect data.

3.3. Experiments

The two experiments conducted in this study are described below, including the models used, the flow of the training process, and their hyperparameters.

3.3.1. A Comparative Analysis of the RoBERTa and BERT Models for Predicting Big Five Personality Trait Scores

In previous work, Chen et al. [19] fine-tuned and evaluated two models—RoBERTa-base and Bidirectional Long Short-Term Memory (Bi-LSTM)—using a random sample of 20,000 comments from 134 authors. Building on this, the present study adopts a broader approach by including all authors with complete Big Five personality scores, a total of 3,006,567 comments from 1568 authors, thereby expanding the dataset for fine-tuning and evaluation. This larger dataset allows for a more comprehensive assessment of the model performance with a more generalised dataset.
In previous work [19], RoBERTa, an improved version of BERT, and Bi-LSTM were chosen to predict the Big Five personality trait scores. Although RoBERTa shares the same transformer-based architecture as BERT, utilising multiple layers of self-attention mechanisms, it sets itself apart with its extensive pretraining corpus [20]. This enables RoBERTa to effectively capture the underlying structure of language and the contextual meaning of words [19]. In the previous work [19], the results showed that RoBERTa base significantly outperformed Bi-LSTM across all the evaluation metrics, demonstrating the superiority of transformer-based architectures in capturing linguistic nuances and contextual relationships in a text.
The model used in this study is a pretrained model accessible through the Hugging Face library [50,51,52,53].
Raw text from the dataset is input into the model, which processes it and outputs a vector representation for each input text. The vector representation is then passed through a dropout layer to reduce overfitting by randomly disabling a portion of the inputs during fine-tuning, helping the model learn more generalised and robust features. The dropout rate is set to 0.3.
After the dropout layer, the vector representation is passed through a fully connected linear layer, which transforms it from a high-dimensional space into a lower-dimensional space corresponding to the Big Five personality traits. The output size of this linear layer is set to 5, matching the multitask regression formulation of personality trait prediction.
The models are fine-tuned using the AdamW optimiser, a variant of Adam that includes weight decay for better regularisation, with a learning rate of 1 × 10⁻⁵. The loss function used is the Mean Squared Error (MSE), and the models are fine-tuned over 10 epochs with a batch size of 16.
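A minimal sketch of the regression head and optimisation setup described above is given below, using PyTorch and the Hugging Face transformers library; the class and variable names are our own, and the data-loading loop is omitted.

import torch
import torch.nn as nn
from transformers import AutoModel

class Big5Regressor(nn.Module):
    """Transformer encoder followed by dropout and a linear regression head."""

    def __init__(self, model_name="roberta-base", num_traits=5, dropout=0.3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        # Maps the pooled text representation to the trait scores.
        self.regressor = nn.Linear(self.encoder.config.hidden_size, num_traits)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the representation of the first token ([CLS] / <s>) as the text vector.
        pooled = outputs.last_hidden_state[:, 0, :]
        return self.regressor(self.dropout(pooled))

model = Big5Regressor("roberta-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()
# Fine-tuning then iterates over 10 epochs with a batch size of 16 (loader omitted).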
Additionally, to explore potential improvements, the RoBERTa large model is fine-tuned and evaluated alongside the base variant. To further assess the suitability of transformer-based models for this task, BERT base and BERT large are also included in this study.
The hyperparameters are referenced from the previous work [19], which identified them as the most optimal through a grid search strategy. Moreover, to ensure fair and reliable comparisons, the fine-tuning process, the experimental environment, and the hyperparameters are kept consistent across all models. This controlled setup allows for an objective evaluation of the relative strengths and weaknesses of each model.

3.3.2. Investigating the Impact of Intercorrelations Among Big Five Personality Traits

To investigate the impact of the intercorrelations of the Big Five personality traits, two fine-tuning approaches were employed and tested.
The first approach is a single-model approach, fine-tuning a single model to predict all five traits simultaneously. This model received all Big Five personality trait scores as inputs and was trained to predict all five scores at once. Table 3 provides a sample of the data used for fine-tuning with the single-model approach. The first column, labelled “body”, contains comments written by the authors, while the five columns on the right represent the corresponding Big Five personality trait scores. As shown in the table, all five personality trait scores are used together to fine-tune a single model.
In contrast, the second approach is a multiple-model approach, fine-tuning five separate models, each dedicated to predicting only one specific trait. In this approach, each model received input data from which irrelevant trait scores were removed to ensure a focused learning process. Table 4 visually represents the data used for this fine-tuning method. The "body" column contains the authors' comments, while the "agr" column contains the corresponding authors' score for the trait on which the model is being fine-tuned; in this table, the Agreeableness personality trait scores are displayed.
Both approaches used the same models (the RoBERTa base model), parameters, dataset, and fine-tuning process as described in Section 3.3.1 to ensure comparability. The only difference was in the output size of the linear layer: in the single-model approach, the linear layer following the dropout layer had an output size of 5, consistent with the multitask regression setup. In contrast, the multiple-model approach set the output size to 1 for each model, as each was trained to predict a single personality trait independently.
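To make the architectural difference concrete, the only change between the two approaches is the output size of the final linear layer; the hedged sketch below reuses the illustrative Big5Regressor class from the previous section, and the trait keys are abbreviations of our own choosing (mirroring column labels such as "agr" in Table 4).

# Single-model approach: one model predicts all five trait scores jointly,
# so the shared representation can exploit intercorrelations among traits.
single_model = Big5Regressor("roberta-base", num_traits=5)

# Multiple-model approach: five models, each fine-tuned to predict one trait only.
trait_models = {trait: Big5Regressor("roberta-base", num_traits=1)
                for trait in ["ope", "con", "ext", "agr", "neu"]}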

4. Results

This section begins by introducing the evaluation metrics used to assess the model performance in Section 4.1. Then, Section 4.2 and Section 4.3 present the results of the two experiments described in Section 3.3, along with a comparison with the findings from previous work [19], highlighting key differences and insights.

4.1. Evaluation Metrics

The evaluation metrics chosen for this study are the Root Mean Square Error (RMSE) and the coefficient of determination (R²). These metrics were selected due to their complementary perspectives on model performance, offering a comprehensive evaluation framework. Additionally, using these metrics facilitates a clear and direct comparison of the results between this study and the findings presented in the previous work [19].
The RMSE quantifies the average prediction error, providing insights into how much the predicted values typically deviate from the actual values. A lower RMSE signifies greater accuracy, reflecting smaller prediction errors. It is an important metric for evaluating model performance and is widely used as a benchmark in numerous studies within the field of personality computing.
In contrast, R², or the coefficient of determination, assesses the extent to which the model's predictions align with the actual values by measuring the explained variance [54]. It reflects the proportion of variance in the dependent variable that can be attributed to the independent variable, thereby indicating the model's overall goodness of fit. A higher R² reflects a better model performance by indicating stronger alignment between the predicted and actual values.
Equations (1) and (2) illustrate the calculations of the RMSE and R², respectively.
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}    (1)
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}    (2)
  • n: the number of data points;
  • y_i: the actual value for the i-th data point;
  • \hat{y}_i: the predicted value for the i-th data point;
  • \bar{y}: the mean of the actual values.
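Both metrics can be computed directly from the predicted and actual trait scores, as in the short NumPy sketch below; the values are synthetic and serve only as an illustration.

import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Square Error, Equation (1).
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # Coefficient of determination, Equation (2).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic example scores for a single trait (not actual results).
y_true = np.array([0.55, 0.30, 0.80, 0.45])
y_pred = np.array([0.50, 0.35, 0.70, 0.50])
print(rmse(y_true, y_pred), r2(y_true, y_pred))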

4.2. A Comparative Analysis of the RoBERTa and BERT Models for Predicting Big Five Personality Trait Scores

Table 5 presents the performance of the four transformer-based models: RoBERTa base, RoBERTa large, BERT base, and BERT large. The table includes a comparison of the evaluation metrics covered in Section 4.1 to assess how well each model predicts the Big Five personality traits. The results provide insights into the impact of the model’s size and architecture on performance, helping to determine the most effective model for this task.
The average RMSE values across all of the models in this paper were similar, approximately 0.26, indicating comparable levels of prediction error. However, the RoBERTa models outperformed the BERT models in terms of R², with the large RoBERTa model achieving the highest average R² value of 0.2404. Furthermore, the large RoBERTa model demonstrated a slightly better performance than that of its base counterpart, suggesting that the dataset's size was sufficient to effectively leverage the increased capacity of the larger models. However, both the base and large BERT models exhibited a nearly identical performance, showing no improvement despite the increased model capacity.
When examining the prediction of the individual traits, Openness consistently exhibited the lowest RMSE values across all models, indicating that it was the easiest trait to predict. Similarly, Extraversion achieved the highest R² values across all models, highlighting its relative predictability. In contrast, the models struggled with predictions of Conscientiousness, Agreeableness, and Neuroticism. Conscientiousness posed significant challenges, as three models (RoBERTa base, BERT base, and BERT large) recorded the lowest R² values for this trait. Agreeableness was also difficult to predict, with two models yielding the highest RMSE values for this trait. Neuroticism proved particularly challenging for the RoBERTa large model, which recorded both the highest RMSE and the lowest R² values for this trait.
These findings highlight that while a larger model such as RoBERTa large generally outperforms smaller models with this dataset, the prediction difficulty varies significantly across traits. Traits like Openness and Extraversion are more predictable, while others, such as Conscientiousness and Neuroticism, present persistent challenges even for advanced architectures.
Figure 1 illustrates the training loss trajectories for the four models, while Figure 2 provides the corresponding validation loss trends for the same models.
Based on the training loss graph (Figure 1), RoBERTa large consistently achieves the lowest training loss across all epochs, demonstrating its effectiveness in learning from the training data. BERT large also shows a strong performance, with a lower training loss compared to both RoBERTa base and BERT base. This shows that the larger variants were able to handle the dataset better than the base variants.
Turning to the validation loss graph (Figure 2), RoBERTa large again outperforms the other models, achieving the lowest validation loss throughout, which indicates its superior generalisation capabilities. Notably, despite exhibiting a higher training loss than BERT large, RoBERTa base generally achieves a better validation performance, indicating more effective generalisation to unseen data.
While BERT large performs slightly better than BERT base, both are outperformed by the RoBERTa models. These results further highlight the advantages of the RoBERTa large architecture in effectively capturing linguistic nuances and context for this task.
Table 6 compares the results of a previous study [19], the models fine-tuned on a subset of the dataset used in this study, and the best-performing model in this paper—RoBERTa large—fine-tuned on a larger dataset. The columns labelled “Mean Baseline” and “RoBERTa” present the results from the reference study [19], with “Mean Baseline” serving as a simplistic benchmark and “RoBERTa” showing the performance of the RoBERTa base model. In contrast, the column labelled “RoBERTa (large)” represents the best-performing model in this paper, the RoBERTa large model.
The RoBERTa large model, which was fine-tuned on a larger dataset, outperformed the mean baseline in all five Big Five personality traits, demonstrating a superior predictive accuracy across the board. In comparison to the mean baseline, the RoBERTa large model consistently achieved lower RMSE values and higher R² scores, indicating an improved model performance, especially in the R² evaluation metric.
However, when comparing RoBERTa large to the RoBERTa model used in the previous work, we observe that the larger dataset used to fine-tune RoBERTa large did not translate into a better performance in the evaluation metrics used. Despite its larger training data, RoBERTa large underperformed relative to the RoBERTa model from the previous work in terms of both the RMSE and R² across all traits. Specifically, the RoBERTa model from the previous work achieved an average RMSE of 0.2241 and an average R² of 0.4148, both of which were generally better than those of RoBERTa large, which achieved an average RMSE of 0.2606 and an average R² of 0.2404. This suggests that even though a larger dataset was used to fine-tune RoBERTa large, this did not necessarily improve its predictive capabilities compared to those of RoBERTa fine-tuned on a smaller dataset in the previous work.
Despite these differences, the two models exhibited similar trends in terms of trait prediction. Both RoBERTa models showed the lowest RMSE in predicting Openness, suggesting that the models were more accurate for this trait. Additionally, both models showed the highest R² for Extraversion, indicating their relatively stronger performance in predicting this trait. However, both models struggled with Neuroticism, as it was associated with the highest RMSE and the lowest R² in both cases. This suggests that predicting Neuroticism is particularly challenging for these models, regardless of the dataset size.

4.3. Investigating the Impact of Intercorrelations Among Big Five Personality Traits

Table 7 summarises the results of the Big Five personality trait score predictions in Section 3.3.2 in comparison to the previous work [19]. The columns labelled “Mean Baseline” and “RoBERTa” present the results from the reference study [19]. In contrast, the columns labelled “RoBERTa (Single)” and “RoBERTa (Multiple)” reflect the results of this study. “RoBERTa (Single)” represents the results of the single-model approach, fine-tuning a single model to predict all five traits simultaneously, while “RoBERTa (Multiple)” refers to the average performance of the separate models in the multiple-model approach, fine-tuned individually for each trait.
When the overall performance of the single-model RoBERTa approach in this study is compared with the results of RoBERTa in the previous work, it is observed that the model fine-tuned in this study underperformed despite it being fine-tuned on a larger dataset. However, the trends in the results are mostly consistent between the two models.
For both models, Openness achieved the lowest RMSE (0.1742 for the model from the previous work and 0.2345 for the single-model approach), while Neuroticism had the highest RMSE (0.2527 for the model from the previous study and 0.2737 for the single-model approach). This indicates that Openness is relatively easier to predict, whereas Neuroticism is more challenging. In terms of R², both models also agree that Extraversion is the easiest trait to predict, with the model from the previous work scoring 0.5229 and the single-model approach scoring 0.2604. However, the model fine-tuned in this study recorded the lowest R² for Conscientiousness (0.2161), whereas the previous work reported Neuroticism as having the lowest R² (0.3248). This suggests differences in the trait-specific performance despite some alignment in the trait predictability trends.
In contrast, the results for the multiple-model approach, where separate models were fine-tuned for each trait, were closer to the mean baseline results. These models exhibited higher RMSE values overall and R² values close to zero, indicating their poor predictive performance. This suggests that fine-tuning separate models for each trait is less effective and produces more inconclusive results than the single-model approach. This highlights the significance of the intercorrelations among the Big Five personality traits in enhancing the predictive performance.

5. Discussion

Figure 3 illustrates the distribution of the Big Five personality trait scores that were used in the experiments conducted in this paper. Although the distribution does not appear to have many outliers or extreme bias, the dataset predominantly comprises authors from English-speaking countries, specifically the United States and Canada, as reported by Gjurković et al. [46]. Consequently, the dataset may not fully represent the distribution of the global online population, which could limit the generalisability of the findings.
The average number of comments per author is approximately 1917, with the range varying from a single comment to as many as 52,406 comments. The author column was removed during the fine-tuning process, which prevented the models from distinguishing between repeated authors. This may have caused the models to overfit to certain authors who had a fixed set of Big Five personality scores. Such overfitting could lead to a lack of generalisation and poorer performance on unseen data.
Table 8 presents the intercorrelations of the Big Five personality traits derived from the dataset alongside those predicted by the RoBERTa large model. Slight differences can be observed between the two sets of correlations. Notably, the largest differences occur in the trait pairs AGR-CON (0.1290) and EXT-OPE (0.0653). However, these variations are relatively minor. A paired t-test analysis of the actual and predicted personality trait correlations yielded a p-value of 0.9367, which was significantly greater than the commonly accepted significance threshold of 0.05. This result indicates that the observed differences are not statistically significant, suggesting that the variations between the predicted and actual correlations can be attributed to random variation rather than systematic error.
Table 9 compares the intercorrelations of the Big Five personality traits from the dataset used in the experiments with the intercorrelations predicted by the RoBERTa models from the previous work [19]. Although RoBERTa fine-tuned with a larger dataset underperformed compared to the model from previous work, it demonstrated a lower average absolute difference in the intercorrelations between the actual and predicted values. Specifically, the absolute average difference in this paper is 0.0372, compared to 0.2540 for the previous work. This finding indicates that while the overall performance was lower based on the evaluation metrics used in this paper, the RoBERTa large model was able to capture the intercorrelations between the Big Five personality traits better after being fine-tuned on a larger dataset.
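The two comparisons reported above (the paired t-test on actual versus predicted trait intercorrelations and the average absolute difference between them) can be reproduced from the ten pairwise correlation values, as in the sketch below; the numbers are placeholders rather than the actual entries of Tables 8 and 9.

import numpy as np
from scipy import stats

# Placeholder values for the ten trait-pair correlations (upper triangle);
# the actual values appear in Tables 8 and 9.
actual = np.array([0.12, -0.08, 0.25, -0.30, 0.10, 0.05, -0.15, 0.20, -0.05, 0.18])
predicted = np.array([0.10, -0.06, 0.30, -0.28, 0.08, 0.09, -0.12, 0.22, -0.07, 0.15])

# Paired t-test over the pairwise correlations (as reported for Table 8).
t_stat, p_value = stats.ttest_rel(actual, predicted)

# Average absolute difference between actual and predicted intercorrelations.
mean_abs_diff = np.mean(np.abs(actual - predicted))

print(f"p = {p_value:.4f}, mean absolute difference = {mean_abs_diff:.4f}")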

6. Conclusions and Future Work

This study highlights the impact of the dataset size on the model performance, revealing that while larger datasets can lead to greater generalisation, they may also lower the predictive accuracy. However, this trade-off allowed the model to capture the intercorrelations among the Big Five personality traits better, an essential factor in accurately predicting personality trait scores.
Among the models evaluated, the RoBERTa large model outperformed its base counterpart, achieving the lowest RMSE and the highest R². In contrast, the BERT large model showed minimal improvements over its base counterpart, despite its more complex architecture designed for large datasets. While the RoBERTa large model underperformed in its evaluation metrics compared to those in the previous work [19], it effectively modelled the intercorrelations between traits, achieving an average absolute difference of just 0.0372.
Interestingly, all of the tested models demonstrated a stronger predictive accuracy for Openness and Extraversion but struggled with Neuroticism, Agreeableness, and Conscientiousness. This study also reinforces the importance of intercorrelations among the Big Five personality traits, an aspect often overlooked in previous research.
Another key finding was the advantage of fine-tuning the models simultaneously on all five traits rather than training separate models for each trait. While the separately fine-tuned models exhibited high RMSE values and an average R² close to zero, similar to the mean baseline, the simultaneous fine-tuning approach significantly reduced the RMSE and increased R², demonstrating a superior performance.
These results highlight the potential of RoBERTa, particularly its large model variant, to enhance personality trait predictions. They also underscore the under-explored influence of intercorrelations among the Big Five traits, which could improve both the prediction accuracy and model interpretability.
Despite the strong performance in capturing the intercorrelations of the Big Five personality traits, future work should focus on addressing potential dataset biases and enhancing the model’s generalisability to improve the accuracy of personality trait predictions. Given the large range of comments per author, balancing the dataset by imposing constraints on the number of comments or the number of words per user can help to prevent overfitting to specific score patterns. Additionally, incorporating demographic labels from the original dataset as features will provide valuable context, allowing the model to account for variations across different population groups. Expanding the dataset to include multiple languages and a broader representation of countries and regions would further enhance the model’s applicability, ensuring it reflected the diversity of the global online population better.
To boost the performance further, testing various hyperparameter configurations or exploring newer LLMs like LLaMA, which may be better at capturing the intercorrelations between the traits, could improve the accuracy and predict personality trait scores better. Future research can advance the field of personality trait prediction by addressing these key areas. This will ensure that models are not only more accurate but also equitable and generalisable across diverse contexts and populations.

Author Contributions

Conceptualization: M.P. and K.-M.S. Methodology: M.P. and K.-M.S. Software: K.-M.S. Validation: F.M., M.P. and K.-M.S. Formal analysis: K.-M.S. Investigation: K.-M.S. Resources: K.-M.S. Data curation: K.-M.S. Writing—original draft preparation: K.-M.S. Writing—review and editing: M.P. and K.-M.S. Visualization: M.P. and K.-M.S. Supervision: F.M. and M.P. Project administration: F.M. and M.P. Funding acquisition: F.M. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study utilizes the PANDORA dataset, available at https://psy.takelab.fer.hr/datasets/all/ (accessed on 12 April 2024), with the code for this research available at https://github.com/kitmay2001/Big-Five-personality-score-predictions_RoBERTa_and_BERT (accessed on 8 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AGR: Agreeableness
BERT: Bidirectional Encoder Representations from Transformers
BFI: Big Five Inventory
Bi-LSTM: Bidirectional Long Short-Term Memory
CLS: Classification Token
CON: Conscientiousness
CSV: Comma-Separated Values
DNN: Deep Neural Network
EXT: Extraversion
GRU: Gated Recurrent Unit
LIWC: Linguistic Inquiry and Word Count
LLaMA: Large Language Model Meta AI
LLM: Large Language Model
LSTM: Long Short-Term Memory
MBTI: Myers–Briggs Type Indicator
ML: Machine Learning
MSE: Mean Squared Error
NEO: Neuroticism, Extraversion, Openness
NEU: Neuroticism
NLP: Natural Language Processing
OCEAN: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
OPE: Openness
PC: Personality Computing
RoBERTa: Robustly Optimized BERT Pretraining Approach
RMSE: Root Mean Square Error
RNN: Recurrent Neural Network
R²: Coefficient of Determination
SEP: Separator Token
1-DCNN: One-Dimensional Convolutional Neural Network

References

  1. Kang, W.; Steffens, F.; Pineda, S.; Widuch, K.; Malvaso, A. Personality traits and dimensions of mental health. Sci. Rep. 2023, 13, 7091. [Google Scholar] [CrossRef] [PubMed]
  2. He, Y.; Donnellan, M.B.; Mendoza, A.M. Five-factor personality domains and job performance: A second order meta-analysis. J. Res. Personal. 2019, 82, 103848. [Google Scholar] [CrossRef]
  3. Caliskan, A. Applying the right relationship marketing strategy through big five personality traits. J. Relatsh. Mark. 2019, 18, 196–215. [Google Scholar] [CrossRef]
  4. Ricci, F.; Rokach, L.; Shapira, B. Recommender systems: Techniques, applications, and challenges. In Recommender Systems Handbook; Springer Nature: Berlin/Heidelberg, Germany, 2021; pp. 1–35. [Google Scholar]
  5. Costa, P.T., Jr.; McCrae, R.R. The five-factor model of personality and its relevance to personality disorders. J. Personal. Disord. 1992, 6, 343–359. [Google Scholar] [CrossRef]
  6. Myers, I.B. The Myers-Briggs Type Indicator: Manual (1962); PsycInfo Database Record (c) 2025 APA; American Psychological Association (APA): Washington, DC, USA, 1962. [Google Scholar]
  7. Ching, C.M.; Church, A.T.; Katigbak, M.S.; Reyes, J.A.S.; Tanaka-Matsumi, J.; Takaoka, S.; Zhang, H.; Shen, J.; Arias, R.M.; Rincon, B.C.; et al. The manifestation of traits in everyday behavior and affect: A five-culture study. J. Res. Personal. 2014, 48, 1–16. [Google Scholar] [CrossRef]
  8. Fang, Q.; Giachanou, A.; Bagheri, A.; Boeschoten, L.; van Kesteren, E.J.; Kamalabad, M.S.; Oberski, D. On text-based personality computing: Challenges and future directions. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 10861–10879. [Google Scholar]
  9. Costa, P.T.; McCrae, R.R. The revised neo personality inventory (neo-pi-r). SAGE Handb. Personal. Theory Assess. 2008, 2, 179–198. [Google Scholar]
  10. John, O.P.; Srivastava, S. The Big-Five trait taxonomy: History, measurement, and theoretical perspectives. In Handbook of personality: Theory and Research; Guilford Press: New York, NY, USA, 1999. [Google Scholar]
  11. Azucar, D.; Marengo, D.; Settanni, M. Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis. Personal. Individ. Differ. 2018, 124, 150–159. [Google Scholar] [CrossRef]
  12. Phan, L.V.; Rauthmann, J.F. Personality computing: New frontiers in personality assessment. Soc. Personal. Psychol. Compass 2021, 15, e12624. [Google Scholar] [CrossRef]
  13. Feizi-Derakhshi, A.R.; Feizi-Derakhshi, M.R.; Ramezani, M.; Nikzad-Khasmakhi, N.; Asgari-Chenaghlu, M.; Akan, T.; Ranjbar-Khadivi, M.; Zafarni-Moattar, E.; Jahanbakhsh-Naghadeh, Z. Text-based automatic personality prediction: A bibliographic review. J. Comput. Soc. Sci. 2022, 5, 1555–1593. [Google Scholar] [CrossRef]
  14. Harari, G.M.; Vaid, S.S.; Müller, S.R.; Stachl, C.; Marrero, Z.; Schoedel, R.; Bühner, M.; Gosling, S.D. Personality sensing for theory development and assessment in the digital age. Eur. J. Personal. 2020, 34, 649–669. [Google Scholar] [CrossRef]
  15. Lukac, M. Speech-based personality prediction using deep learning with acoustic and linguistic embeddings. Sci. Rep. 2024, 14, 30149. [Google Scholar] [CrossRef] [PubMed]
  16. Quwaider, M.; Alabed, A.; Duwairi, R. Shooter video games for personality prediction using five factor model traits and machine learning. Simul. Model. Pract. Theory 2023, 122, 102665. [Google Scholar] [CrossRef]
  17. Kosan, M.A.; Karacan, H.; Urgen, B.A. Predicting personality traits with semantic structures and LSTM-based neural networks. Alex. Eng. J. 2022, 61, 8007–8025. [Google Scholar] [CrossRef]
  18. Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  19. Chen, Y. Exploring the Intercorrelations of Big Five Personality Traits: Comparing Questionnaire-Based Methods and Automated Personality Assessment using BERT and RNN Models. Master’s Thesis, Utrecht University, Utrecht, The Netherlands, 2023. [Google Scholar]
  20. Liu, Y. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  21. Li, M.; Liu, H.; Wu, B.; Bai, T. Language style matters: Personality prediction from textual styles learning. In Proceedings of the 2022 IEEE International Conference on Knowledge Graph (ICKG), Orlando, FL, USA, 30 November–1 December 2022; pp. 141–148. [Google Scholar]
  22. Cattell, R.B. The description of personality. I. Foundations of trait measurement. Psychol. Rev. 1943, 50, 559. [Google Scholar] [CrossRef]
  23. Tupes, E.C.; Christal, R.E. Recurrent personality factors based on trait ratings. J. Personal. 1992, 60, 225–251. [Google Scholar] [CrossRef]
  24. Komarraju, M.; Karau, S.J.; Schmeck, R.R.; Avdic, A. The Big Five personality traits, learning styles, and academic achievement. Personal. Individ. Differ. 2011, 51, 472–477. [Google Scholar] [CrossRef]
  25. Wilmot, M.P.; Ones, D.S. A century of research on conscientiousness at work. Proc. Natl. Acad. Sci. USA 2019, 116, 23004–23010. [Google Scholar] [CrossRef]
  26. Oh, V.; Tong, E.M. Negative emotion differentiation and long-term physical health—The moderating role of neuroticism. Health Psychol. 2020, 39, 127. [Google Scholar] [CrossRef]
  27. Digman, J.M. Higher-order factors of the Big Five. J. Personal. Soc. Psychol. 1997, 73, 1246. [Google Scholar] [CrossRef] [PubMed]
  28. van der Linden, D.; te Nijenhuis, J.; Bakker, A.B. The General Factor of Personality: A meta-analysis of Big Five intercorrelations and a criterion-related validity study. J. Res. Personal. 2010, 44, 315–327. [Google Scholar] [CrossRef]
  29. Kachur, A.; Osin, E.; Davydov, D.; Shutilov, K.; Novokshonov, A. Assessing the Big Five personality traits using real-life static facial images. Sci. Rep. 2020, 10, 8487. [Google Scholar] [CrossRef]
  30. Pennebaker, J.W.; King, L.A. Linguistic styles: Language use as an individual difference. J. Personal. Soc. Psychol. 1999, 77, 1296. [Google Scholar] [CrossRef]
  31. Mairesse, F.; Walker, M.A.; Mehl, M.R.; Moore, R.K. Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. 2007, 30, 457–500. [Google Scholar] [CrossRef]
  32. Oberlander, J.; Nowson, S. Whose thumb is it anyway? Classifying author personality from weblog text. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia, 17–18 July 2006; pp. 627–634. [Google Scholar]
  33. Stillwell, D.J.; Kosinski, M. myPersonality project: Example of successful utilization of online social networks for large-scale social research. Am. Psychol. 2004, 59, 93–104. [Google Scholar]
  34. Meta Platforms, Inc. Facebook. 2004. Available online: https://www.facebook.com/ (accessed on 18 January 2025).
  35. Ren, Z.; Shen, Q.; Diao, X.; Xu, H. A sentiment-aware deep learning approach for personality detection from text. Inf. Process. Manag. 2021, 58, 102532. [Google Scholar] [CrossRef]
  36. Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. LIWC2015 User’s Manual; LIWC.net: Austin, TX, USA, 2015; Available online: www.liwc.net (accessed on 18 January 2025).
  37. Adi, G.Y.N.; Tandio, M.H.; Ong, V.; Suhartono, D. Optimization for automatic personality recognition on Twitter in Bahasa Indonesia. Procedia Comput. Sci. 2018, 135, 473–480. [Google Scholar] [CrossRef]
  38. Ptaszynski, M.; Zasko-Zielinska, M.; Marcinczuk, M.; Leliwa, G.; Fortuna, M.; Soliwoda, K.; Dziublewska, I.; Hubert, O.; Skrzek, P.; Piesiewicz, J.; et al. Looking for razors and needles in a haystack: Multifaceted analysis of suicidal declarations on social media—A pragmalinguistic approach. Int. J. Environ. Res. Public Health 2021, 18, 11759. [Google Scholar] [CrossRef]
  39. Schwartz, H.A.; Eichstaedt, J.C.; Kern, M.L.; Dziurzynski, L.; Ramones, S.M.; Agrawal, M.; Shah, A.; Kosinski, M.; Stillwell, D.; Seligman, M.E.; et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE 2013, 8, e73791. [Google Scholar] [CrossRef]
  40. Eichstaedt, J.C.; Kern, M.L.; Yaden, D.B.; Schwartz, H.A.; Giorgi, S.; Park, G.; Hagan, C.A.; Tobolsky, V.A.; Smith, L.K.; Buffone, A.; et al. Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations. Psychol. Methods 2021, 26, 398. [Google Scholar] [CrossRef] [PubMed]
  41. Xue, D.; Wu, L.; Hong, Z.; Guo, S.; Gao, L.; Wu, Z.; Zhong, X.; Sun, J. Deep learning-based personality recognition from text posts of online social networks. Appl. Intell. 2018, 48, 4232–4246. [Google Scholar] [CrossRef]
  42. Tadesse, M.M.; Lin, H.; Xu, B.; Yang, L. Personality predictions based on user behavior on the Facebook social media platform. IEEE Access 2018, 6, 61959–61969. [Google Scholar] [CrossRef]
  43. Tanwijaya, J.; Suhartono, D. Towards Personality Identification from Social Media Text Status Using Machine Learning and Transformer. ICIC Express Lett. 2022, 13, 233–240. [Google Scholar]
  44. Arijanto, J.E.; Geraldy, S.; Tania, C.; Suhartono, D. Personality prediction based on text analytics using bidirectional encoder representations from transformers from English Twitter dataset. Int. J. Fuzzy Log. Intell. Syst. 2021, 21, 310–316. [Google Scholar] [CrossRef]
  45. Mehta, Y.; Fatehi, S.; Kazameini, A.; Stachl, C.; Cambria, E.; Eetemadi, S. Bottom-up and top-down: Predicting personality with psycholinguistic and language model features. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 1184–1189. [Google Scholar]
  46. Gjurković, M.; Karan, M.; Vukojević, I.; Bošnjak, M.; Šnajder, J. PANDORA talks: Personality and demographics on Reddit. arXiv 2020, arXiv:2004.04460. [Google Scholar]
  47. Reddit, Inc. Reddit. 2005. Available online: https://www.reddit.com/ (accessed on 18 January 2025).
  48. Riso, D. Personality Types: Using the Enneagram for Self-Discovery; Houghton Mifflin Company: Boston, MA, USA, 1996. [Google Scholar]
  49. Ma, W.; Cui, Y.; Si, C.; Liu, T.; Wang, S.; Hu, G. CharBERT: Character-aware pre-trained language model. arXiv 2020, arXiv:2011.01513. [Google Scholar]
  50. Chaumond, J.; Sanseviero, O.; Debut, L. BERT Base Model (Cased). 2018. Available online: https://huggingface.co/google-bert/bert-base-cased (accessed on 13 June 2024).
  51. Chaumond, J.; Sanseviero, O.; Debut, L. BERT Large Model (Cased). 2018. Available online: https://huggingface.co/google-bert/bert-large-cased (accessed on 13 June 2024).
  52. Chaumond, J.; Debut, L. RoBERTa Base Model. 2019. Available online: https://huggingface.co/FacebookAI/roberta-base (accessed on 13 June 2024).
  53. Chaumond, J.; Debut, L. RoBERTa Large Model. 2019. Available online: https://huggingface.co/FacebookAI/roberta-large (accessed on 13 June 2024).
  54. Uma, A.N.; Fornaciari, T.; Hovy, D.; Paun, S.; Plank, B.; Poesio, M. Learning from disagreement: A survey. J. Artif. Intell. Res. 2021, 72, 1385–1470. [Google Scholar] [CrossRef]
Figure 1. Training loss progression for RoBERTa base (blue), RoBERTa large (red), BERT base (green), and BERT large (yellow) over 10 training epochs.
Figure 2. Validation loss progression for RoBERTa base (blue), RoBERTa large (red), BERT base (green), and BERT large (yellow) over 10 training epochs.
Figure 3. Distribution of Big Five personality trait scores (Note: Extraversion (EXT), Neuroticism (NEU), Agreeableness (AGR), Conscientiousness (CON), Openness (OPE)).
Table 1. PANDORA dataset details.

Attributes          | Details
Source              | Reddit
Period              | 2015–2020
Total authors       | 10,295
Total comments      | 17,640,979
Demographic labels  | Age, Gender, Location *
Language            | English, Spanish, French *

* Note: Demographic labels were not available for all authors. Only the three most common languages in the dataset are listed.
Table 2. Number of comments and authors with complete personality labels.

Personality Taxonomy         | Number of Authors | Number of Comments
MBTI                         | 9067              | 15,555,974
Big Five                     | 1568              | 3,006,567
Enneagram                    | 794               | 1,458,816
Big Five + MBTI              | 377               | 1,045,375
Big Five + Enneagram         | 64                | 235,883
MBTI + Enneagram             | 793               | 1,457,625
Big Five + MBTI + Enneagram  | 63                | 234,692
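The counts in Table 2 imply a filtering step in which only authors with all five trait scores are retained and their comments inherit those author-level labels. The sketch below illustrates one way such a join could be done with pandas; it is not the authors' preprocessing code, and the file and column names (author_profiles.csv, all_comments.csv, author, and the trait columns) are assumptions about how the released PANDORA files might be laid out.

```python
# Minimal sketch (assumed file and column names) of joining comments with
# authors that have complete Big Five labels, as counted in Table 2.
import pandas as pd

TRAITS = ["agreeableness", "openness", "conscientiousness", "extraversion", "neuroticism"]

authors = pd.read_csv("author_profiles.csv")    # one row per Reddit author
comments = pd.read_csv("all_comments.csv")      # one row per comment

# Keep only authors for whom all five trait scores are present.
big5_authors = authors.dropna(subset=TRAITS)

# Attach the author-level trait scores to every comment written by those authors.
labelled = comments.merge(big5_authors[["author"] + TRAITS], on="author", how="inner")

print(len(big5_authors), "authors,", len(labelled), "labelled comments")
```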
Table 3. Sample input data and corresponding Big Five trait scores for single-model fine-tuning.

Body                                          | Agr  | Ope  | Con  | Ext  | Neu
I admit having fallen into the trap myself…   | 0.3  | 0.7  | 0.15 | 0.15 | 0.5
thats a great business idea, why didn’t i…    | 0.09 | 0.59 | 0.05 | 0.73 | 0.07
Hey, at least you lost something that’s…      | 0.09 | 0.61 | 0.13 | 0.04 | 0.72

Note: The ellipsis (“…”) in the table represents the remainder of the sentences not shown for brevity.
Table 4. Sample input data and corresponding Agreeableness scores for fine-tuning multiple models.

Body                                                                  | Agr
I admit having fallen into the trap myself. As much as I know…       | 0.3
thats a great business idea, why didn’t i think of that!!            | 0.09
Hey, at least you lost something that’s still currently made. I’ve…  | 0.09

Note: The ellipsis (“…”) in the table represents the remainder of the sentences not shown for brevity.
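Tables 3 and 4 show the two input formats used for fine-tuning: a single model that regresses all five trait scores at once versus separate models that each regress one trait. The sketch below, assuming the Hugging Face transformers library and the RoBERTa checkpoints listed in refs. [52,53], shows how the two head configurations could be instantiated; the hyperparameters and training loop are omitted, and the snippet is illustrative rather than the authors' implementation.

```python
# Minimal sketch of the two fine-tuning configurations compared in Tables 3 and 4:
# one five-output regression head (single-model approach) versus one one-output
# head per trait (multiple-model approach). Settings are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "FacebookAI/roberta-large"   # checkpoint from refs. [52,53]
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Single-model approach: one head predicting Agr, Ope, Con, Ext, Neu together.
single_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=5, problem_type="regression"
)

# Multiple-model approach: five independent models, each with a one-output head.
trait_models = {
    trait: AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=1, problem_type="regression"
    )
    for trait in ["agr", "ope", "con", "ext", "neu"]
}

batch = tokenizer(["I admit having fallen into the trap myself."],
                  truncation=True, padding=True, return_tensors="pt")
print(single_model(**batch).logits.shape)         # torch.Size([1, 5])
print(trait_models["agr"](**batch).logits.shape)  # torch.Size([1, 1])
```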
Table 5. Personality trait predictions using RoBERTa and BERT.

Big Five Personality Traits | RoBERTa (Base) RMSE / R² | RoBERTa (Large) RMSE / R² | BERT (Base) RMSE / R² | BERT (Large) RMSE / R²
Openness                    | 0.2345 / 0.2201          | 0.2315 / 0.2403           | 0.2391 / 0.1895       | 0.2386 / 0.1926
Conscientiousness           | 0.2669 / 0.2161          | 0.2653 / 0.2255           | 0.2739 / 0.1743       | 0.2730 / 0.1797
Extraversion                | 0.2625 / 0.2604          | 0.2602 / 0.2734           | 0.2694 / 0.2212       | 0.2702 / 0.2168
Agreeableness               | 0.2743 / 0.2288          | 0.2727 / 0.2379           | 0.2803 / 0.1949       | 0.2805 / 0.1939
Neuroticism                 | 0.2737 / 0.2223          | 0.2732 / 0.2248           | 0.2796 / 0.1884       | 0.2800 / 0.1859
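The RMSE and R² values reported in Tables 5–7 are standard regression metrics computed from per-comment predictions for each trait. The sketch below shows that computation, assuming scikit-learn; the arrays hold placeholder values rather than outputs of the fine-tuned models.

```python
# Minimal sketch of the evaluation metrics used in Tables 5-7 (placeholder data).
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.30, 0.09, 0.09, 0.61])   # gold trait scores for one trait
y_pred = np.array([0.35, 0.15, 0.20, 0.55])   # model predictions for the same comments

rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # root mean squared error
r2 = r2_score(y_true, y_pred)                         # 1 - SS_res/SS_tot; near zero for a mean-level baseline
print(f"RMSE = {rmse:.4f}, R² = {r2:.4f}")
```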
Table 6. Personality trait prediction results of RoBERTa large in comparison with previous work [19].

Big Five Personality Traits | Mean Baseline RMSE / R² | RoBERTa [19] RMSE / R² | RoBERTa (Large) RMSE / R²
Openness                    | 0.2314 / −0.0002        | 0.1742 / 0.4333        | 0.2315 / 0.2403
Conscientiousness           | 0.2633 / 0.0000         | 0.2046 / 0.3964        | 0.2653 / 0.2255
Extraversion                | 0.3574 / −0.0001        | 0.2469 / 0.5229        | 0.2602 / 0.2734
Agreeableness               | 0.3036 / −0.0004        | 0.2423 / 0.3964        | 0.2727 / 0.2379
Neuroticism                 | 0.3074 / −0.0002        | 0.2527 / 0.3248        | 0.2732 / 0.2248
Table 7. Results of Big Five personality trait score predictions with single-model and multiple-model approaches in comparison to previous work [19].

Big Five Personality Traits | Mean Baseline RMSE / R² | RoBERTa [19] RMSE / R² | RoBERTa (Single) RMSE / R² | RoBERTa (Multiple) RMSE / R²
Openness                    | 0.2314 / −0.0002        | 0.1742 / 0.4333        | 0.2345 / 0.2201            | 0.2655 / 0.0008
Conscientiousness           | 0.2633 / 0.0000         | 0.2046 / 0.3964        | 0.2669 / 0.2161            | 0.3016 / −0.0005
Extraversion                | 0.3574 / −0.0001        | 0.2469 / 0.5229        | 0.2625 / 0.2604            | 0.3033 / 0.0131
Agreeableness               | 0.3036 / −0.0004        | 0.2423 / 0.3964        | 0.2743 / 0.2288            | 0.3124 / −0.0001
Neuroticism                 | 0.3074 / −0.0002        | 0.2527 / 0.3248        | 0.2737 / 0.2223            | 0.3103 / −0.0001
Table 8. Comparison of Big Five personality trait intercorrelations between the dataset and the correlations predicted by RoBERTa large.

Trait Pairs | Dataset  | RoBERTa Large
EXT-NEU     | −0.2905  | −0.4252
EXT-AGR     | −0.0615  | −0.0803
EXT-CON     | 0.0689   | 0.0923
EXT-OPE     | 0.2344   | 0.2782
NEU-AGR     | 0.0491   | 0.1027
NEU-CON     | −0.2474  | −0.2377
NEU-OPE     | 0.0433   | 0.0583
AGR-CON     | 0.1350   | 0.1823
AGR-OPE     | 0.1178   | 0.1132
CON-OPE     | −0.0655  | −0.0861

Note: Extraversion (EXT), Neuroticism (NEU), Agreeableness (AGR), Conscientiousness (CON), Openness (OPE).
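The intercorrelations in Tables 8 and 9 are pairwise Pearson correlations between trait-score columns, computed once on the gold labels and once on the model predictions. The sketch below shows that computation with pandas on placeholder values; it is illustrative only and does not reproduce the study's numbers.

```python
# Minimal sketch of computing trait intercorrelations as in Tables 8 and 9
# (placeholder values, one row per author or comment).
import pandas as pd

pred = pd.DataFrame({
    "EXT": [0.15, 0.73, 0.04, 0.40],
    "NEU": [0.50, 0.07, 0.72, 0.30],
    "AGR": [0.30, 0.09, 0.09, 0.55],
    "CON": [0.15, 0.05, 0.13, 0.60],
    "OPE": [0.70, 0.59, 0.61, 0.45],
})

corr = pred.corr(method="pearson")     # 5x5 symmetric matrix of trait intercorrelations
print(corr.loc["EXT", "NEU"])          # e.g., the EXT-NEU entry compared in Table 8
```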
Table 9. Comparison of Big Five personality trait intercorrelations between the dataset and the correlations predicted by RoBERTa in previous work (Chen 2023 [19]).

Trait Pairs | Dataset  | RoBERTa
EXT-NEU     | −0.5471  | −0.7611
EXT-AGR     | −0.4373  | −0.6480
EXT-CON     | −0.0484  | −0.1260
EXT-OPE     | 0.2428   | −0.4436
NEU-AGR     | 0.0309   | 0.3321
NEU-CON     | −0.1578  | −0.0346
NEU-OPE     | −0.0572  | −0.1750
AGR-CON     | 0.3486   | 0.6360
AGR-OPE     | −0.1860  | −0.4895
CON-OPE     | −0.4200  | −0.6385

Note: Extraversion (EXT), Neuroticism (NEU), Agreeableness (AGR), Conscientiousness (CON), Openness (OPE).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
