Article
Peer-Review Record

Personality Prediction Model: An Enhanced Machine Learning Approach

Electronics 2025, 14(13), 2558; https://doi.org/10.3390/electronics14132558
by Moses Ashawa 1,*, Joshua David Bryan 2 and Nsikak Owoh 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 26 May 2025 / Revised: 16 June 2025 / Accepted: 18 June 2025 / Published: 24 June 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a machine learning framework for predicting individuals' Big Five personality traits using Instagram data. It integrates visual features extracted from user images, leveraging Places_365 and YOLOv8, with metadata and self-reported questionnaire scores. The study also discusses ethical data handling and introduces a  GUI for real-time personality inference.

While the abstract outlines the method and some results, it lacks a clear statement of the research challenges, the motivation, and the broader implications of the findings.

The introduction section does not clearly define a research gap. It moves too quickly from background information to implementation without adequately framing the problem or discussing the technical and ethical challenges involved in personality prediction from social media data.

In the Related Works section, it is unclear how the literature review directly informs or supports the proposed solution. The connection between past studies and the current approach should be made more explicit.

The Materials and Methods section would benefit from significant restructuring. Currently, it reads more like annotated source code than a scholarly description. There is too much focus on implementation details such as file names and directory paths, and not enough emphasis on the rationale behind each step. Repetitions, particularly around CSV handling, reverse scoring, and data normalisation, should be consolidated. Additionally, the section lacks higher-level summaries or visual diagrams explaining the overall pipeline or justifying key design choices.

The Results and Discussion section also reveals several technical issues.
- Metric confusion. The term A.M. in Table 3 seems to be incorrectly used to compare with MAE values in Tables 2 and 4. These are fundamentally different metrics and should not be compared directly.
- Inconsistency in MAE scores. Table 4 reports a MAE of 0.3671 for Neuroticism, whereas Table 2 lists it as 0.1021?
- Lack of statistical significance testing. Without significance testing, readers cannot determine whether the observed improvements over benchmarks are meaningful or possibly due to random variation.

Author Response

We sincerely thank you for taking the time to review our manuscript and for providing thoughtful and constructive feedback. We have carefully considered each of your comments and have addressed them thoroughly in the revised version. We also proofread the manuscript to correct grammatical errors and ambiguities where possible.

 

 

Comment 1: While the abstract outlines the method and some results, it lacks a clear statement of the research challenges, the motivation, and the broader implications of the findings.

Response to Comment 1: We revised the whole abstract to articulate the research motivation, challenges, and broader implications of the findings more clearly, as highlighted in the text.

Abstract: In today’s digital era, social media platforms like Instagram have become deeply embedded in daily life, generating billions of content items each day. This vast stream of publicly accessible data presents a unique opportunity for researchers to gain insights into human behavior and personality. However, leveraging such unstructured and highly variable data for psychological analysis introduces significant challenges including data sparsity, noise, and ethical considerations around privacy. This study addresses these challenges by exploring the potential of machine learning to infer personality traits from Instagram content. Motivated by the growing demand for scalable, non-intrusive methods of psychological assessment, we developed a personality prediction system combining convolutional neural networks (CNNs) and random forest (RF) algorithms. Our model is grounded in the Big Five Personality framework which includes Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness. Using data collected with informed consent from 941 participants, we extracted visual features from their Instagram images using two pretrained CNNs which were then used to train five RF models, each targeting a specific trait. The proposed system achieved an average mean absolute error of 0.1867 across all traits. Compared to the PAN-2015 benchmark, our method demonstrated competitive performance. These results highlight that using social media data for personality prediction offers potential applications in personalized content delivery, mental health monitoring, and human-computer interaction.

 

Comment 2: The introduction section does not clearly define a research gap. It moves too quickly from background information to implementation without adequately framing the problem or discussing the technical and ethical challenges involved in personality prediction from social media data.

 

Response to Comment 2: The authors have included a discussion of the state-of-the-art challenges in paragraphs 1 and 2 of the introduction section to address this comment.

However, despite the promising potential of personality prediction from social media data, several challenges continue to hinder meaningful progress and widespread adoption in applied settings. First is the absence of a unified, scalable methodology for accurately linking diverse forms of social media content such as text, images, metadata, and user interactions to establish and validate psychological personality models. While there are existing models that try to solve these challenges, their approaches vary significantly in terms of data sources and feature engineering strategies. This often leads to limited generalizability across different user populations and platforms. This lack of standardization makes it difficult to compare findings across studies or deploy these systems in real-world scenarios with confidence, especially when such systems are modelled for social media platforms such as Instagram.

Second, there are ethical and legal concerns surrounding the use of user-generated content for psychological inference. Many users are unaware of the extent to which their online activity can be analyzed to derive personal attributes including mental health status and personality traits. This asymmetry of awareness introduces serious concerns about informed consent, autonomy, and the potential misuse of the generated data. Addressing these challenges requires the development of a framework that incorporates ethical safeguards to ensure data anonymization and to provide clear communication to users about how their data will be used. Only by addressing both the technical inconsistencies and the ethical vulnerabilities can an effective and scalable personality prediction model for real-world deployment be achieved.

 

Comment 3: In the Related Works section, it is unclear how the literature review directly informs or supports the proposed solution. The connection between past studies and the current approach should be made more explicit.

Response to Comment 3: The authors have added sentences and paragraphs that directly link the reviewed studies to the areas that support the proposed model. These changes are highlighted at the end of paragraph 1, beginning of paragraph 2, end of paragraph 3, beginning of paragraph 4 and end of paragraph 4, end of paragraph 5, beginning and end of paragraph 6, end of paragraph 7.

 

 

Comment 4: The Materials and Methods section would benefit from significant restructuring. Currently, it reads more like annotated source code than a scholarly description. There is too much focus on implementation details such as file names and directory paths, and not enough emphasis on the rationale behind each step. Repetitions, particularly around CSV handling, reverse scoring, and data normalisation, should be consolidated. Additionally, the section lacks higher-level summaries or visual diagrams explaining the overall pipeline or justifying key design choices.

 

Response to Comment 4: The authors deleted repetitive statements around CSV handling, reverse scoring, and data normalisation. Rather than adding many diagrams, the authors used algorithms and text to provide high-level summaries justifying key design choices, as highlighted in the following sentences and paragraphs:

3.1. Data collection

Data acquisition began with the design of a structured questionnaire based on the Big Five Personality Traits model, comprising 44 items aimed at quantitatively assessing personality dimensions. An informed consent section was included at the beginning of the questionnaire to ensure ethical compliance and voluntary participation. The survey was created using Microsoft Forms and disseminated through professional and social platforms, including LinkedIn and Instagram, over an eight-week period.

A total of 941 participants provided consent and completed the questionnaire. Respondents were requested to confirm their submissions to enable the accurate linkage of their Instagram profiles with the collected data, thereby supporting subsequent multimodal analysis.

Responses were exported via the native functionality of Microsoft Forms to ensure data integrity and ease of organization. The dataset was processed to prepare it for analysis, including steps for response validation, scoring based on the Big Five Inventory, and preparing data for model input. Due to the inclusion of personally identifiable information, the dataset is not publicly available. However, all scripts used in data preprocessing and model development are available in the accompanying open-source repository: https://github.com/JoshuaBryan02/Personality-Prediction-System.

3.2. Data extraction

To facilitate the preprocessing and analysis of raw survey data collected via Microsoft Forms, a Python-based data processing pipeline was developed. The core script, titled CSV-splitting.py [link], was designed to automate the transformation of raw inputs into structured personality trait scores based on the Big Five Inventory (BFI) framework [20]. The extraction begins with the definition of file path variables to locate the original dataset and check for prior executions. Upon initialization, the script prompts the user for confirmation to proceed if previous processing results exist.

The raw dataset, originally stored in Microsoft Excel (.xlsx) format, was converted into a standardized comma-separated values (.csv) format using the pandas library in Python. This conversion ensured compatibility with downstream data processing routines and facilitated efficient parsing. To clearly separate the data, the converted dataset was split into two distinct files, one containing participant consent information and the other comprising responses to the BFI questionnaire. The latter file is read to compute the individual trait scores, with data structures initialized for each of the five BFI dimensions based on validated scoring schemes. Reverse scoring and normalization techniques are applied internally where required to ensure data consistency and comparability.

To measure personality traits in alignment with the Big Five Inventory (BFI), a systematic data processing framework was created, centered on trait-wise aggregation and normalization [21] of questionnaire responses. The framework employs a nested iteration structure wherein each participant’s response vector is individually processed to ensure accurate mapping of item-level responses to the five BFI dimensions.

To ensure consistency and comparability across participants, the raw personality trait scores were normalized using min-max scaling. This transformation mapped the original scores onto a [0, 1] interval, which were then scaled to a [0, 100] range to enhance interpretability as percentage-based trait intensities.

The normalization bounds were determined based on the theoretical minimum and maximum values for each trait, calculated from the total number of associated questionnaire items and the Likert scale range (1–5). This approach preserves the relative differences in trait expression while allowing for standardized comparisons across individuals. The normalized scores for the five traits were structured into a unified dataset to support downstream tasks such as model training, evaluation, and visual analysis. This design choice facilitates streamlined data integration and aligns with best practices in personality prediction studies employing a supervised learning approach.
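
The reverse scoring and theoretical-bounds normalization described above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function name `score_trait` and the three-item key are hypothetical and stand in for the real BFI scoring key, which is not reproduced here. Only the 1–5 Likert range and the [0, 100] output range come from the text.

```python
from typing import List, Sequence

LIKERT_MIN, LIKERT_MAX = 1, 5  # Likert range stated in the text


def score_trait(responses: Sequence[int], items: List[int],
                reversed_items: List[int]) -> float:
    """Return one trait score as a percentage in [0, 100].

    responses: answers ordered by item number (responses[0] is item 1).
    items: 1-based item numbers that belong to the trait.
    reversed_items: subset of `items` that is reverse-keyed.
    """
    raw = 0
    for item in items:
        answer = responses[item - 1]
        if item in reversed_items:
            # Reverse scoring on a 1-5 scale maps 1->5, 2->4, ..., 5->1.
            answer = LIKERT_MIN + LIKERT_MAX - answer
        raw += answer
    # Theoretical bounds follow from the item count and the Likert range.
    lowest = LIKERT_MIN * len(items)
    highest = LIKERT_MAX * len(items)
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical three-item key, for illustration only (not the BFI key).
EXAMPLE_KEY = {"items": [1, 2, 3], "reversed_items": [2]}

answers = [5, 1, 3]  # item 2 is reverse-keyed, so the answer 1 counts as 5
example_score = score_trait(answers, **EXAMPLE_KEY)
```

Because the bounds are computed from the item count rather than from the observed sample, a score stays comparable even if participants are added or removed later.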

 

3.5. Normalization

A nested iterative process was employed to extract frequency-based features from image-level metadata. The outer loop aggregated the occurrence frequency of each individual across all Instagram posts. To ensure comparability across samples, frequencies were normalized by the total number of images per volunteer. A further normalization step scaled the values to the [0, 1] interval. To mitigate the risk of division-by-zero errors and enhance numerical stability, a small constant offset was added during this transformation.
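
As a rough sketch of this step, assuming the per-image person counts are already available (for example from an object detector), the nested aggregation and the two normalization stages might look as follows. The names `people_per_image` and `min_max_scale`, the sample counts, and the value of `EPSILON` are assumptions made for illustration; the text only states that a small constant offset was used.

```python
from typing import Dict, List

EPSILON = 1e-9  # assumed value; the text only mentions "a small constant offset"


def people_per_image(counts_by_volunteer: Dict[str, List[int]]) -> Dict[str, float]:
    """Average number of detected people per image for each volunteer."""
    rates = {}
    for volunteer, counts in counts_by_volunteer.items():  # outer loop: volunteers
        total = 0
        for count in counts:  # inner loop: that volunteer's images
            total += count
        # Volunteers without images get a rate of 0 instead of a division error.
        rates[volunteer] = total / len(counts) if counts else 0.0
    return rates


def min_max_scale(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to [0, 1]; EPSILON keeps the denominator non-zero."""
    low, high = min(values.values()), max(values.values())
    return {key: (value - low) / (high - low + EPSILON)
            for key, value in values.items()}


# Hypothetical person counts per image, keyed by anonymised volunteer id.
sample_counts = {"a": [0, 2, 1], "b": [3, 3], "c": [0]}
scaled = min_max_scale(people_per_image(sample_counts))
```

With the offset in the denominator, a cohort in which every volunteer has the same rate scales to 0.0 everywhere rather than raising `ZeroDivisionError`.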

 

To predict personality traits, the personality classification process was implemented using Python, with a Random Forest model serving as the baseline algorithm due to its robustness against overfitting and interpretability. The process integrates multiple feature sources: environmental descriptors, social interaction features, and image-derived person-frequency metrics. Data integration commenced by independently importing and preprocessing the environmental and people count feature sets. These were concatenated to form a unified feature matrix. Simultaneously, raw Instagram metadata was processed to extract auxiliary features, which were normalized using min–max scaling with pre-stored normalization parameters to ensure cross-session consistency. Next, the normalized personality trait labels derived from a validated psychometric tool were merged with the composite feature matrix to form the final training dataset. This dataset was stored as Final_Dataset.csv for use in model training and evaluation.
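
The data-integration step in this paragraph, scaling metadata with pre-stored min–max parameters and then merging features with trait labels, can be sketched in plain Python as below. This is an illustrative outline under stated assumptions, not the authors' code: the feature names, the stored bounds, the label value, and the clipping of out-of-range values are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def scale_with_stored_bounds(row: Row, bounds: Bounds) -> Row:
    """Min-max scale a row using bounds saved from an earlier session."""
    scaled = {}
    for name, value in row.items():
        low, high = bounds[name]
        span = high - low
        ratio = (value - low) / span if span else 0.0
        # Assumption: values outside the stored range are clipped to [0, 1].
        scaled[name] = min(1.0, max(0.0, ratio))
    return scaled


def build_dataset(features: Dict[str, Row], labels: Dict[str, Row]) -> List[dict]:
    """Inner-join feature rows and trait labels on the participant id."""
    rows = []
    for pid in sorted(features.keys() & labels.keys()):
        rows.append({"participant": pid, **features[pid], **labels[pid]})
    return rows


# Hypothetical stored bounds and metadata, for illustration only.
STORED_BOUNDS: Bounds = {"followers": (0.0, 1000.0), "posts": (0.0, 200.0)}
raw_metadata = {
    "p1": {"followers": 250.0, "posts": 50.0},
    "p2": {"followers": 5000.0, "posts": 10.0},
}
features = {pid: scale_with_stored_bounds(row, STORED_BOUNDS)
            for pid, row in raw_metadata.items()}
labels = {"p1": {"extraversion": 62.5}, "p3": {"extraversion": 40.0}}
dataset = build_dataset(features, labels)
```

Reusing stored bounds, instead of recomputing them per run, is what keeps a feature value on the same scale at training time and at inference time.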

 

 

Comment 5: The Results and Discussion section also reveals several technical issues.
- Metric confusion. The term A.M. in Table 3 seems to be incorrectly used to compare with MAE values in Tables 2 and 4. These are fundamentally different metrics and should not be compared directly.
- Inconsistency in MAE scores. Table 4 reports a MAE of 0.3671 for Neuroticism, whereas Table 2 lists it as 0.1021?
- Lack of statistical significance testing. Without significance testing, readers cannot determine whether the observed improvements over benchmarks are meaningful or possibly due to random variation.

Response to Comment 5: We appreciate the reviewer’s observation. In response, the previous Table 3, which included comparisons based on the Arithmetic Mean (A.M.), has been removed. In Table 4 (now Table 3), the A.M. values were removed as well. Additionally, the variation in the MAE value for Neuroticism was identified as a typographical error. This has been carefully reviewed and corrected to ensure accuracy. We also conducted a paired-sample t-test to test the significance of the difference between our model and the benchmark. This is highlighted in subsection 4.4 as shown below:

4.4. Statistical testing

To determine whether the observed performance differences are statistically significant, we conducted a paired-sample t-test using the MAE values for each personality trait across our model and the benchmark model in Table 3. Two hypotheses, $H_0$ and $H_1$, are formulated. $H_0$ postulates that there is no significant difference in MAE between the PAN-2015 benchmark and our proposed model. $H_1$ postulates that there is a significant difference in MAE between the two models.

Let $x_i$ represent the MAE from the PAN-2015 model for trait $i$, and $y_i$ represent the MAE from our proposed model for trait $i$. The difference for each personality trait, based on the values in Table 3, is $d_i = x_i - y_i$. Given $n = 5$ paired traits (E, N, A, C, O), we perform the paired-sample t-test to determine whether the mean difference $\bar{d}$ between the two models is significantly different from zero. Equations (6)-(8) are the mathematical expressions of how the t-statistic, degrees of freedom and p-value are calculated.

$$t = \frac{\bar{d}}{s_d / \sqrt{n}} \qquad (6)$$

$$df = n - 1 = 4 \qquad (7)$$

$$p = 2\,P\left(T_{4} > |t|\right) \qquad (8)$$

where $s_d$ is the sample standard deviation of the differences $d_i$ and $T_{4}$ is the $t$-distribution with 4 degrees of freedom. Since $p = 0.309 > 0.05$, we fail to reject the null hypothesis. Thus, there is no statistically significant difference between the models at the 95% confidence level. However, the practical improvements in Neuroticism and Agreeableness remain substantial.

Despite the lack of statistical significance, the magnitude of error reduction in traits like Neuroticism (a drop of 54.9%) and Agreeableness (a drop of 30.3%) in our model is substantial from a practical and application-oriented perspective. These traits are often considered challenging to model due to their high variability and significant expression in user-generated content. The observed improvements are still valuable in real-world scenarios, particularly in affective computing and human-centered systems, where even small enhancements in predictive models can improve personalization and user interaction.
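
For readers who wish to reproduce a paired-sample t-test of the kind described in subsection 4.4, a minimal standard-library sketch is given here. The MAE values in the example are placeholders and are not the values from Table 3. The function names are illustrative, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice of this sketch and not necessarily how the authors computed it.

```python
import math
from typing import Sequence, Tuple


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: 1 minus twice the density area over [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def paired_t_test(baseline: Sequence[float],
                  proposed: Sequence[float]) -> Tuple[float, int, float]:
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    n = len(baseline)
    diffs = [b - p for b, p in zip(baseline, proposed)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t_stat = mean_d / math.sqrt(var_d / n)
    df = n - 1
    return t_stat, df, two_sided_p(t_stat, df)


# Placeholder per-trait MAE values, NOT the values reported in the paper.
baseline_mae = [0.20, 0.25, 0.15, 0.18, 0.22]
proposed_mae = [0.18, 0.12, 0.16, 0.19, 0.20]
t_stat, df, p_value = paired_t_test(baseline_mae, proposed_mae)
```

With only five paired observations the test has 4 degrees of freedom, so the critical value is large (about 2.776 at the two-sided 5% level) and a sizeable improvement on one or two traits can still fail to reach significance.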

 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents a personality prediction system using convolutional neural networks (CNN) and random forest (RF) to predict personality traits using Instagram data. In this paper, the Authors employed two pretrained convolutional neural networks to extract features from the images, and then to train five RF models, each corresponding to one of the Big Five traits, such as Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness. The experiments were conducted by the Authors, and the model was compared to the PAN-2015 benchmark. The topic is interesting, and the paper corresponds well with the journal’s aim and scope.
The paper is quite well written, and it contains all of the relevant sections. However, some shortcomings need to be improved.
The Introduction contains the considered problem and the aim of the paper. The contributions are also included. 
In section 2 Related works, the Authors included the comparison of related works – please consider adding a short description of the contents. Only one sentence is added (lines 152-153). 
Is it possible to extend the data collection process? How were the participants selected? 
In section 3, Figure 1 needs to be better prepared and moved to section 3.1
The equations need to be prepared according to the MDPI format.

Author Response

We sincerely thank you for taking the time to review our manuscript and for providing thoughtful and constructive feedback. We have carefully considered each of your comments and have addressed them thoroughly in the revised version.

Comment 1: In section 2 Related works, the Authors included the comparison of related works – please consider adding a short description of the contents.

Response to Comment 1: We added short descriptions in the Related Works section. These can be found in the highlighted areas of that section. The authors have also added sentences and paragraphs that directly link the reviewed studies to the areas that support the proposed model. These changes are highlighted at the end of paragraph 1, beginning of paragraph 2, end of paragraph 3, beginning of paragraph 4 and end of paragraph 4, end of paragraph 5, beginning and end of paragraph 6, end of paragraph 7.

 

Comment 2: Only one sentence is added (lines 152-153). 

Response to Comment 2: The authors have expanded on this sentence to make the paragraph clearer, as highlighted in the document.

The study employed the Big Five Personality traits (Extraversion, Neuroticism, Agreeableness, Conscientiousness, and Openness to Experience) as the theoretical framework for personality classification. Each dimension is treated as a continuous latent construct, enabling nuanced trait quantification at the individual level. By leveraging this framework, the study benefits from a robust psychological taxonomy that has demonstrated high construct validity and cross-cultural applicability. The integration of the Big Five facilitates alignment with standardized psychometric instruments (e.g., BFI, NEO-PI-R), which allows for more interpretable and comparable modeling outcomes across computational personality research.

 

Comment 3: Is it possible to extend the data collection process? 

Response to comment 3: We (the authors) wanted to recruit more than 941 participants. However, after the eight-week period, we waited another 6 days, and no further individuals participated. Since participation was voluntary and many people may not want to link their social media handle to a research study, we decided to proceed with the data collected from the 941 participants. In the future, we hope to gather more data using a face-to-face approach rather than a survey.

 

Comment 4: How were the participants selected? 

Response to comment 4: Participants were selected voluntarily. A total of 941 individuals agreed to take part in the study by giving their consent and completing a questionnaire. After submitting their responses, they were asked to confirm their participation so that the researchers could correctly match their answers with their Instagram profiles. This allowed for accurate analysis using both their questionnaire data and social media activity.


Comment 5: In section 3, Figure 1 needs to be better prepared and moved to section 3.1

Response to comment 5: The authors increased the resolution of Figure 1 to enhance its visibility as recommended. The authors also moved Figure 1 to subsection 3.1 as recommended.


Comment 6: The equations need to be prepared according to the MDPI format.

Response to comment 6: The authors rearranged the equations and prepared them to fit the MDPI format.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear authors,

The study is well designed and structured; however, it presents some flaws that need to be addressed.

Despite a developed framework on the theme an initial research question is missing in the Introduction section. It emerges in the conclusion section but it must be highlighted and followed since the beginning.  

It needs a smooth transition from section to section – for instance through small sections’ synthesis  

A sample of Instagram users tends to be demographically distorted (e.g. urban people)

The Conclusion must include practical and theoretical consequences from the study and also emphasize its main limitations:

In particular, it is highly ambitious to reduce personality to instagram pattern behaviors, as personality is very complex and context dependent, while online behavior can aim at simple social rewarding.

Moreover questionnaires might mismatch online behavior, as one could fill it only in the vein of social correctness and conventions, which introduces bias. As a result, posts do not necessarily reflect personality but rather social trends instead.  

Also, the risk of stereotyping personalities into some given psychological categories should be considered.

Those aforementioned limitations should be dealt with, either throughout the main text or in a final Limitations section.

Best regards

Author Response

We sincerely thank you for taking the time to review our manuscript and for providing thoughtful and constructive feedback. We have carefully considered each of your comments and have addressed them thoroughly in the revised version.

 

Comment 1: Despite a developed framework on the theme an initial research question is missing in the Introduction section. It emerges in the conclusion section but it must be highlighted and followed since the beginning. 

Response to Comment 1: The research question has been relocated from the conclusion section to the introduction section to enhance the clarity of the study’s objectives and align with standard academic structure, as identified by the reviewer. This is highlighted in paragraph 8 of the introduction section.

 

Comment 2: It needs a smooth transition from section to section – for instance through small sections’ synthesis  

Response to comment 2: Many sentences and paragraphs were added or rewritten to smooth the transitions between sections and subsections. These changes are highlighted in all the sections. See the abstract, introduction, related work, and discussion sections of the paper.

 

Comment 3: A sample of Instagram users tends to be demographically distorted (e.g. urban people)

Response to comment 3: We appreciate the reviewer’s observation regarding the demographic skew of Instagram users, particularly the tendency toward urban representation. This observation is valid and aligns with existing literature, which notes that urban populations are more active on visually driven platforms like Instagram due to greater digital literacy, consistent internet access, and higher smartphone penetration.

Our study intentionally focused on urban Instagram users for several practical and methodological reasons. Urban populations are more likely to engage regularly with Instagram, offering a richer and more consistent stream of social media content necessary for robust feature extraction and model training. This accessibility enables the capture of multimodal digital footprints. Recruiting participants from urban areas allowed for more reliable data collection, as participants were more reachable, responsive, and familiar with digital consent procedures and online questionnaires. This minimized missing data and improved the quality of linked personality trait assessments. Prior studies (e.g., [10], [12]) have noted that personality frameworks such as the Big Five tend to generalize more effectively in literate and urbanized contexts, where social expression patterns and identity formation align more closely with the psychometric assumptions underlying these models. While the current study is urban-centric, this choice was made to first validate the methodology under stable and data-rich conditions. In future work, we plan to explore model adaptation for rural or underrepresented populations using culturally-aware fine-tuning techniques and broader data augmentation strategies.

 

Comment 4: The Conclusion must include practical and theoretical consequences from the study and also emphasize its main limitations. In particular, it is highly ambitious to reduce personality to instagram pattern behaviors, as personality is very complex and context dependent, while online behavior can aim at simple social rewarding.

Response to comment 4: The authors have added two paragraphs in the conclusion section to include practical and theoretical consequences from the study. The two major limitations in the study are also highlighted in the last paragraph on the conclusion section as follows:

This study demonstrates the viability of leveraging multimodal social media data specifically image-derived features from Instagram to predict personality traits. This has direct applications in domains such as talent acquisition, personalized recommendation systems, mental health assessment, and human-computer interaction, where understanding an individual's personality can enhance decision-making and user experience design. Moreover, the integration of Random Forest and CNN-based models illustrates how hybrid architectures can improve robustness across trait dimensions to offer a scalable solution for real-world deployment. From a theoretical perspective, our study contributes to ongoing discourse in computational psychometrics by validating that the Big Five personality model retains predictive relevance when inferred from visual behavioural indicators. The findings support the emerging hypothesis that personality manifests in digital footprints particularly visual content which opens new directions for affective computing research and social signal processing.

However, our study is not without limitations. The limited size and demographic homogeneity of the dataset predominantly composed of urban Instagram users constrains the generalizability of the model to broader populations. Also, the underperformance in predicting Neuroticism suggests that certain personality traits may be less visually discernible or require context-enriched multimodal attributes such as interaction patterns for accurate inference. Future work will focus on expanding the dataset to cover rural areas and to incorporate cross-platform data for deeper personality signal extraction.

 

Comment 5: Moreover, questionnaires might mismatch online behaviour, as one could fill it only in the vein of social correctness and conventions, which introduces bias. As a result, posts do not necessarily reflect personality but rather social trends instead. Also, the risk of stereotyping personalities into some given psychological categories should be considered.

Response to comment 5: We appreciate the reviewer’s insightful observation regarding the potential mismatch between self-reported questionnaire data and actual online behaviour. We acknowledge that self-reported questionnaires, such as those based on the Big Five Personality Model, may be susceptible to socially desirable responding. However, our study design mitigates this concern in the following ways. By complementing questionnaire data with Instagram-derived behavioural signals, our model captures personality-relevant traits that are less consciously controlled by users. This approach enables us to model both explicit (self-reported) and implicit (observed) personality indicators, reducing overreliance on any single modality that may carry inherent biases. We recognize that online behaviour can be influenced by evolving social trends. However, our machine learning approach, particularly through the use of image- and text-based features over time, is designed to detect consistent behavioural patterns rather than transient or trend-driven anomalies.

 

Comment 6: Those aforementioned limitations should be dealt with, either throughout the main text or in a final Limitations section.

 

Response to comment 6: We have included a paragraph in the conclusion section to state the limitation of our research as follows:

However, our study is not without limitations. The limited size and demographic homogeneity of the dataset predominantly composed of urban Instagram users constrains the generalizability of the model to broader populations. Also, the underperformance in predicting Neuroticism suggests that certain personality traits may be less visually discernible or require context-enriched multimodal attributes such as interaction patterns for accurate inference. Future work will focus on expanding the dataset to cover rural areas and to incorporate cross-platform data for deeper personality signal extraction.

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed the concerns raised. The abstract was revised to clearly explain the research motivation, challenges etc., and the introduction was enhanced to highlight the research gap. The Related Works section now includes more connections between prior studies and the proposed approach. Additionally, the Materials and Methods section has been restructured to remove repetitive information and hence to provide clearer rationales. Algorithmic descriptions and high-level summaries were also added.
For future work, areas such as more rigorous statistical analysis, including clearer t-test reporting and interpretation, could be further refined to strengthen the reliability of the findings.

 
