Application of Artificial Intelligence for Better Investment in Human Capital

Mohammed Abdullah Ammer; Zeyad A. T. Ahmed; Saleh Nagi Alsubari; Theyazn H. H. Aldhyani; Shahab Ahmad Almaaytah

doi:10.3390/math11030612

¹ Department of Finance, School of Business, King Faisal University, Al-Ahsa 31982, Saudi Arabia

² Department of Computer Science, Dr Babasaheb Ambedkar Marathwada University, Aurangabad 431004, India

³ Applied College in Abqaiq, King Faisal University, Al-Ahsa 31982, Saudi Arabia

* Authors to whom correspondence should be addressed.

Mathematics 2023, 11(3), 612; https://doi.org/10.3390/math11030612

This article belongs to the Special Issue Advanced Artificial Intelligence Models and Its Applications

Abstract

Selecting candidates for a specific job or nominating a person for a specific position takes time and effort because each individual’s file must be searched, and the resulting hiring decision may still be unsuccessful. Artificial intelligence can help organizations and companies choose the right person for the right job. In addition, artificial intelligence contributes to the selection of harmonious working teams capable of achieving an organization’s strategy and goals. This study aimed to contribute to the development of machine-learning models that analyze and cluster personality traits and classify applicants, in order to support correct hiring decisions for particular jobs and to identify applicants’ weaknesses and strengths. Helping applicants to succeed while managing work and training employees with weaknesses is necessary for achieving an organization’s goals. To apply the proposed methodology, we used a publicly available Big Five personality-traits-test dataset. Preprocessing techniques were adopted to clean the dataset. Moreover, hypothesis testing was performed using Pearson’s correlation approach. Based on the testing results, we concluded that a positive relationship exists among four personality traits (agreeableness, conscientiousness, extraversion, and openness), and that neuroticism is negatively correlated with these four traits. Because the dataset was unlabeled, we applied the k-means clustering algorithm to the data-labeling task. Furthermore, various supervised machine-learning models, namely random forest (RF), support vector machine (SVM), K-nearest neighbor (KNN), and AdaBoost, were used for classification purposes. The experimental results revealed that the SVM attained the highest results, with an accuracy of 98%, outperforming the other classification models.
This study adds to the current literature and body of knowledge through examining the extent of the application of artificial intelligence in the present and, potentially, the future of human-resource management. Our results may be of significance to companies, organizations and their leaders and human-resource executives, in addition to human-resource professionals.

Keywords:

Big Five personality test; artificial intelligence; human resources; employee selection; teamwork; machine learning

MSC:

68T01

1. Introduction

Recently, the world has witnessed tremendous developments in artificial intelligence (AI) techniques, which are necessary for management science because of their predictive accuracy, classification, ease of analysis, and time-saving features. Traditional methods were used in the past, based on handwriting as an analytical tool for personality or the manual observation of some personal traits [1].

The application of artificial intelligence (AI) for human-resource management through the use of the Big Five personality test can be a powerful tool for making data-driven decisions and improving efficiency. By analyzing the results of the Big Five personality test, human-resources professionals can identify patterns and trends in personality traits and predict which candidates will be likely to be successful in specific job roles. Additionally, AI algorithms can be used to identify training-and-development needs, analyze employee-performance data, and identify factors that contribute to high levels of employee engagement and retention. Overall, the use of AI in human-resource management has the potential to greatly enhance decision-making and improve the overall effectiveness of HR processes [2,3].

When candidates submit their applications for a specific job to a company, the first expectation of the human-resources manager is that they select the right candidate for potential placement. A common approach is to require a certificate and experience from the applicant. Most organizations focus on specific criteria, including creativity, communication, the ability to analyze, speed of intuition, and the ability to overcome the types of challenges associated with the position. In addition, leadership skills are often sought. These include firmness, administrative discipline, and the ability to direct others toward the organization’s goals [4,5,6].

The Big Five model, sometimes referred to as the five-factor model, is currently the theory of personality that has the greatest level of acceptance among psychologists. According to this idea, an individual’s personality may be broken down into five primary components, sometimes known by the acronym OCEAN (openness, conscientiousness, extraversion, agreeableness, and neuroticism) [7].

The Big Five personality test may be carried out by any organization. However, the purpose of the test is not to exclude people from particular jobs; the goal is more significant and profound. Some organizations are interested in developing a team with a high capacity for carrying out specific tasks to achieve company goals [8,9,10]. Figure 1 shows the Big Five personality traits.

Figure 1. Big Five personality traits.

In modern psychology, the Big Five personality test examines the essential categories of individual personality included in OCEAN [11], which determine individuals’ personalities and explain their behaviors. The OCEAN categories include the following:

Openness to new experiences: This trait characterizes people who enjoy the arts and new adventures. People with a high score on this characteristic are often inquisitive, less traditional, and more inventive.
Conscientiousness measures people who are organized, productive, and accountable. High-scoring results for this trait are often obtained by meticulous and highly reliable individuals. Low scores are given to people who exhibit low performance and are uninterested in their jobs.
Extraversion evaluates sociability and an individual’s source of energy and excitement. Furthermore, in a manner that could be compared with a spotlight, those who score well on this characteristic often inspire others to succeed.
Agreeableness measures trustworthiness, candor, and getting along with people. Low scores for this trait are often indicative of less trustworthy and more dissatisfied individuals. These tend to be more argumentative, which may reduce their chances of being hired [12].
Neuroticism assesses an individual’s emotional stability, impulsivity, and anxiety in the face of pressure. People who go absent without leave (AWOL) from work, use harsh language, or behave negatively after an intense meeting will likely have a higher score for this trait.

This approach, which is recommended in this study, has two advantages:

A quicker and less expensive way to determine job applicants’ personalities.
A faster process means there is no need to spend significant time determining applicants’ behavior, reducing the need to spend significant money on interviews [13,14].

It was mentioned in [15] that it is necessary to investigate personality with more focus and with a fuller consideration of the particulars of expansion and revolution than have been employed hitherto. To highlight the issue addressed in this study, Verma and Bandi [16] indicated that 69% of employees’ inadequate qualities are due to poor hiring decisions. The explanations for this include deficiencies in the understanding of candidates’ profiles and their alignment with companies’ cultures, in addition to the subjective assessment of candidates’ hard and soft skills. This is because the employment process is typically conducted by human recruiters, who individually check CVs and other sources to find applicants. As individuals have limited capacity, performing all the necessary tasks is not easy and generally demands more time from each recruiter. Other issues include human limitations, such as prejudices, biases, and time constraints, which can influence how a recruitment procedure operates [17] and may lead a company to lose applicants who are better suited [18]. AI and machine learning may help to solve this problem by making human-capital management smarter and more effective. Thus, the purpose of this study is to fill the gap in the current research by examining how AI can help organizations to select their human capital and increase the effectiveness of their recruitment.

The rest of the article is organized as follows. Section 2 presents a literature review of the existing studies. Section 3 provides the methodology, which covers the data collection and sources, the techniques used for the analysis, and the experimental results. The discussion of these results and their connection to the motivation behind the study is presented in Section 4. The conclusion and potential directions for future research are presented in Section 5.

2. Literature Review

In the current era, technological advancements have made it possible to obtain and analyze data to acquire information about human behavior [19]. For example, the analysis of the Big Five personality traits has been applied in different fields, such as health care, education, online-behavior analysis, and human-resource management. Alamsyah et al. [20] identified prospective employment applicants based on a personality measurement using an ontology model with the help of social-media data. They selected five Twitter users whose data were available on social media as samples. Through this approach, they found that each job applicant had different personality traits, such as openness, extraversion, agreeableness, and conscientiousness.

Another study, presented by Laleh et al. [21], analyzed the behavior of users and customers on social-media platforms such as Twitter and Facebook. The users’ text data, such as likes, follow-ups, and online posts were collected in order to track their activities. The aim was to determine which customers were targeted and attracted by the promotion of particular companies’ products, thus increasing these companies’ profits. Some companies also use online social media posts for behavioral and psychological analysis. Qin et al. [22] presented a deep-learning model based on an artificial neural network, BP, to predict OCEAN personality traits. The textual analysis and deep-learning model were trained and tested on a dataset collected from the Sina Weibo website. The results showed that the model can predict the efficiency of the OCEAN personality test, achieving an accuracy of 74%.

Some studies have been conducted in the academic field. For instance, John et al. [23] used questionnaires to test students’ performances. Another study, by Curtis et al. [24], tested the relationship between personality traits and employee aging. The study found that neuroticism may be negatively related to the tested individuals’ general cognitive ability, capacity, and smooth thinking.

In health care, Dymecka et al. [25] tested the influence of self-efficacy and the Big Five personality characteristics on emergency-telephone-number operators’ stress during the COVID-19 epidemic. One hundred emergency-telephone operators participated in the research and provided the data. The authors discovered that the operators of emergency telephone numbers suffered from a considerable amount of stress. All the Big Five personality characteristics and self-efficacy were linked to the amount of stress experienced. Self-efficacy and emotional stability were significant predictors of reported stress in a sample of emergency-telephone-number operators using stepwise regression.

Furthermore, Muntean et al. [26] tested doctors’ stress. The doctors studied were exposed daily to several stressors, and their levels of occupational stress were thoroughly examined. In mental health, Chavoshi et al. [27] studied the relationship between depressive symptoms and the Big Five personality traits. The results showed a significantly positive association between neuroticism and depressive symptoms, whereas the associations of extraversion, conscientiousness, and openness with depressive symptoms were significantly negative [28].

Xu et al. [29] studied how the geographic environment influences human personality at the provincial level. The authors studied the association between the Big Five personality characteristics and the measurement of mountainous areas. They investigated the differences in the personalities of inhabitants of mainland China in relation to geographical region by exploring the relationships between the Big Five personality characteristics and indices of mountainous areas. Priyadharshini et al. [30] applied the Big Five personality test in the selection of decision makers and leaders in various investment, military, and government sectors, in which decisions determine the fates of countries or other sectors, and the failure or success of their projects.

There are some similarities between the Big Five test and the Myers–Briggs Type Indicator (MBTI). The MBTI is a personality model that is rarely used in personality computing. Unlike the Big Five and HEXACO, the MBTI defines personality according to types rather than traits; in other words, the human personality is defined solely by a specific personality type or class, rather than through different scores for multiple traits. The MBTI classifies people along four dichotomies: extraversion or introversion, sensing or intuition, thinking or feeling, and judging or perceiving. By applying these four binary criteria, it categorizes individuals into one of 16 distinct personality types. It is usually used to help people better understand their personal communication preferences and the manner in which they engage with other people. Knowledge of the MBTI may assist individuals in adapting their interpersonal styles to suit a variety of settings and audiences. In psychology, the MBTI has long since been replaced by approaches such as the Big Five traits, which are more reliable, valid, and complete. These approaches are considerably more descriptive of the underlying reality (e.g., most individuals are neither drastically introverted nor extroverted, but rather somewhere in the center) than the categorical dimensions deployed by the MBTI. The Big Five is not the only contemporary theory of personality. Its most notable rival is the honesty–humility, emotionality, extraversion, agreeableness, conscientiousness, and openness-to-experience (HEXACO) model. However, HEXACO and the Big Five are relatively similar; HEXACO’s additional honesty–humility element is the primary distinguishing feature [28]; it adds a sixth dimension to personality analysis [31].

3. Methodology

Many modern psychologists who study personality point to the Big Five personality traits as evidence that there are at least five fundamental aspects of human nature [32]. Extraversion, agreeableness, openness, conscientiousness, and neuroticism are the five main characteristics of a person’s character. In modern management methods, knowledge of personality traits is essential. Therefore, career professionals and psychologists use this information in personality career tests for recruitment and candidate assessment [33]. This study applied a machine-learning approach to a Big Five personality-test dataset to give decision makers in organizations or businesses detailed information on the personalities of applicants and more insight into how they react in different situations, which can help in selecting occupations for employees. The proposed methodology has seven phases, as follows:

Dataset collection.
Data preprocessing.
Feature selection.
Clustering algorithms.
Data splitting.
Training machine-learning models.
Evaluation of the results.

Figure 2 shows the structure of the methodology.

Figure 2. Architecture of the proposed methodology.

3.1. Dataset Collection

Open Psychometrics collected this dataset from participants worldwide through an online model [34]. The dataset contains the responses of 1,015,342 individuals to a questionnaire comprising 50 questions. It is publicly available on Kaggle [35]. Figure 3 shows the countries with 10,000 or more participants. The largest numbers of participants were from the following countries: the USA, with 545,912 participants; the UK, with 66,487; Canada, with 61,805; Australia, with 49,753; the Philippines, with 19,844; and India, with 17,482. Some countries had few participants. These included Yemen, with 14 participants, and Burundi, with one participant.

Figure 3. The distribution of participants on the dataset.

3.2. Data Preprocessing

This dataset needed processing because it had missing values. When no value was stored for a particular feature in the dataset, the “dropna” method was used to clean the dataset. The dataset also contained some unwanted features; therefore, we focused on the responses to questions that only related to personal traits. Next, MinMaxScaler was used to scale the data. It modifies attributes by scaling each attribute to a specified range. The default range is between 0 and 1. Subsequently, principal-component analysis (PCA) was used to reduce the dimensionality of the data and k-means for clustering to label the data. The following sections explain the rest of the preprocessing.
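The cleaning and scaling steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the tiny data frame, the column names (`EXT1`, `EXT2`, `OPN1`, `country`), and the choice of which columns to keep are assumptions made only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the questionnaire: three item columns plus one
# unrelated column, with answers on a 1-5 scale and a few missing values.
raw = pd.DataFrame({
    "EXT1": [4, 1, np.nan, 5, 3],
    "EXT2": [2, 5, 3, 1, 4],
    "OPN1": [5, 3, 4, np.nan, 1],
    "country": ["US", "GB", "CA", "AU", "IN"],
})

# Keep only the personality-item columns (the names are assumptions).
items = raw[["EXT1", "EXT2", "OPN1"]]

# Drop every respondent with at least one missing answer.
clean = items.dropna()

# Rescale each column to MinMaxScaler's default range of [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(clean), columns=clean.columns, index=clean.index
)
print(scaled)
```

Dropping incomplete rows is the simplest option and is affordable here because the dataset is large; imputation would be an alternative when data are scarce.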

3.2.1. Correlation Testing Using Pearson’s Approach

The Pearson correlation coefficient (named after Karl Pearson) can be used to summarize the strength of the linear relationship between data samples. It is expressed by Equation (1), below, where $m_{x}$ and $m_{y}$ are the means of x and y. Python 3.9 was used to find the correlations between features.

r = \frac{\sum (x - m_{x}) (y - m_{y})}{\sqrt{\sum {(x - m_{x})}^{2} \sum {(y - m_{y})}^{2}}}

(1)

The range of the correlation is between −1 and +1. When the correlation value is close to zero, there is no linear trend between the two variables. When the correlation is close to +1, the relationship is strongly positive, which means that the two variables tend to increase together. A correlation close to −1 is similar in strength, but one variable decreases as the other increases [36]. In the heatmap, the diagonal entries are all “1” (dark blue) because these squares correlate each variable with itself, indicating a perfect correlation. For the rest of the values, the larger the number and the darker the color, the stronger the correlation between the two variables. The plot is also symmetrical about the diagonal, since the same two variables are paired together in mirrored squares. To make the heatmap in Figure 4 more comprehensible, we combined the 50 item variables into five personality-trait variables by taking the mean of the items belonging to each trait, and then tested the correlations. Figure 4 shows a positive correlation of 0.4 between conscientiousness (CSN) and openness (OPN). Furthermore, there was a positive correlation of 0.4 between extraversion (EXT) and agreeableness (AGR), and a positive correlation of 0.36 between AGR and CSN. When we compared the correlations between neuroticism and each of the other personality traits, we found only weak correlations, as shown in Figure 4.

Figure 4. Correlations between personality traits.
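Equation (1) can be checked numerically with a short sketch. The function `pearson_r` and the trait scores below are hypothetical and chosen only for illustration; they are not values from the dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, following Equation (1) term by term."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()          # x - m_x
    dy = y - y.mean()          # y - m_y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up per-respondent trait means (illustrative only).
ext = [3.0, 3.5, 2.0, 4.5, 4.0]
agr = [3.2, 3.9, 2.5, 4.1, 4.4]

r_pos = pearson_r(ext, agr)                 # two traits that rise together
r_neg = pearson_r(ext, [-v for v in ext])   # perfectly opposed variables
r_ref = float(np.corrcoef(ext, agr)[0, 1])  # NumPy's built-in for comparison
print(r_pos, r_neg, r_ref)
```

For a full table of pairwise correlations such as the one behind a heatmap, `pandas.DataFrame.corr()` computes the same coefficient for every pair of columns.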

3.2.2. Feature Selection

The dataset used in this study had 50 features, and each group of personal traits had 10 positively or negatively worded questions. These groups had strong internal correlations. To develop our model, we used a subset of 20,000 (20 k) samples from the dataset because of hardware limitations. PCA was then applied. Principal-component analysis is an unsupervised learning approach that is extensively employed as a dimensionality-reduction algorithm [37,38,39]. Reducing the dimensionality of the input features used to train and test a predictive model can improve its performance. It also makes large datasets easier to process and classify in less computation time. The main objective of the PCA algorithm is to project high-dimensional features onto a lower-dimensional space from which they can be approximately reconstructed. The mean-centering step of PCA is given by Equation (2), where $\bar{x}$ is the mean and $x_{i}$ is a set of input features. Table 1 shows the results of the PCA method when selecting significant features.

x_{j} = x_{i} - \bar{x}

(2)

where i and j are simply index variables that are used to refer to different data features within a dataset.

Table 1. Results of PCA.
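As a rough sketch of how the centering in Equation (2) fits into PCA, the example below projects synthetic data onto two principal components with scikit-learn. The random data and the choice of two components are assumptions for illustration; they do not reproduce the values in Table 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 200 respondents x 50 questionnaire items.
rng = np.random.default_rng(0)
X = rng.random((200, 50))

# Equation (2): subtract the per-feature mean from every sample.
X_centered = X - X.mean(axis=0)

# Two components is an illustrative choice only.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)  # scikit-learn performs the centering internally

# The same projection done by hand from the centered data.
Z_manual = X_centered @ pca.components_.T

print(Z.shape)
print(pca.explained_variance_ratio_)
```

The explained-variance ratios indicate how much of the original variation each retained component captures, which is the usual basis for deciding how many components to keep.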

3.2.3. Clustering Algorithm

Clustering is the process of gathering data into groups based on patterns of similarity and distance. In this study, the dataset was not labeled; for this reason, we applied clustering to label the data. The use of k-means clustering is a simple way to divide a dataset into K groups that do not overlap. To implement k-means clustering, we must first assume how many clusters we require [40,41]. The k-means algorithm then locates each observation in one of the K clusters. The number of clusters was determined using the Elbow method.

Elbow method for clustering determination

The Elbow method is one of the most popular methods for determining the optimal value of k, the number of clusters. Its name comes from the shape of the curve obtained by plotting the clustering score against k: the curve bends like an arm, and the bend (the “elbow”) marks the point beyond which adding more clusters yields little improvement. The shape of the curve may change depending on how the parameter “metric” is set. The Elbow method ran the k-means algorithm for each k in the range 2 to 9 to find groups in the unlabeled data. The method detected five clusters, as shown in Figure 5.

Figure 5. Elbow-method plot.
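A minimal sketch of the Elbow procedure is given below. It uses the within-cluster sum of squares (`inertia_`) as the score and synthetic blob data as a stand-in for the questionnaire responses; both choices are assumptions, and the plotting tool and score used to produce Figure 5 may differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with five well-separated groups (illustrative only).
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=0.6, random_state=0)

# Fit k-means for k = 2..9 and record the within-cluster sum of squares.
inertias = {}
for k in range(2, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Inspect (or plot) the scores and look for the bend in the curve.
for k, value in inertias.items():
    print(k, round(value, 1))
```

The score keeps falling as k grows, so the method looks for the k after which the decrease flattens rather than for a minimum.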

The k-means algorithm grouped the data by dividing the samples into K groups of equal variance. This was achieved by minimizing what is called inertia, or the within-cluster sum of squares. The aim was to find the centroids that yield the least inertia. The following is an explanation of how k-means works:

Step 1: Choose the value “K,” which denotes the number of clusters. In this instance, we chose K = 5 (agreeableness, extraversion, openness, conscientiousness, neuroticism).
Step 2: Initialize the clusters by choosing K centroids at random from the data points. With K = 5, five centroids are chosen.
Step 3: Calculate the distance between each point and each centroid.
Step 4: Assign each point to the cluster whose centroid is closest.
Step 5: Compute the mean of each cluster and use it as that cluster’s new centroid.
Step 6: Repeat steps 3–5 with the new centroids until convergence (the cluster assignments no longer change from one round to the next) or until the maximum number of iterations is reached.

The algorithm created groups based on the similarity between the answers, and it was arranged in five clusters. The next step was training and testing output-cluster data based on machine-learning models.
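The six steps listed above can be sketched in plain NumPy as follows. The function `kmeans` is a hypothetical teaching implementation, and the toy data use two obvious groups (K = 2) rather than the K = 5 of this study; a library implementation such as scikit-learn's `KMeans` would normally be preferred in practice.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means following Steps 1-6; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 3: distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 4: assign each point to its nearest centroid.
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy data: two tight groups around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centroids = kmeans(X, k=2)  # Step 1: K chosen up front
print(labels)
```

The resulting `labels` array is what turns an unlabeled dataset into a labeled one for the supervised models that follow.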

3.3. Machine-Learning Models

Supervised learning is a machine-learning method that is applied using labeled datasets. The models based on this method must determine the mapping function connecting the input variable (X) to the output variable (Y). After the data were labeled, using a k-means clustering algorithm, we trained and tested several machine-learning algorithms to obtain a high-accuracy model to predict measurements of different personality traits. Random forest (RF), linear support vector machine (LSVM), K-nearest neighbor (KNN), and AdaBoost algorithms were applied to divide the dataset into the following classes: Class 0, Class 1, Class 2, Class 3, and Class 4.

3.3.1. Support Vector Machine (SVM) Method

Support-vector-machine classification is a supervised-learning algorithm that uses support-vector machines to classify feature values into different categories. This algorithm is particularly useful for linearly separable data, meaning the feature values can be easily separated into distinct categories based on their features. One of the main advantages of SVM classification is its ability to handle high-dimension data and large datasets. It can also handle cases in which the data are not linearly separable by using kernel functions to transform the data into a higher-dimensional space, in which they become separable.

Support-vector-machine classification works by finding the hyperplane in a high-dimensional space that maximally separates different classes. In predicting personality traits, SVM classification can be used to identify patterns in the data indicative of specific traits. For example, a hyperplane separating individuals high in openness from those low in openness may be identified through SVM classification, allowing for accurate predictions of an individual’s openness level. In this research, the radial basis function (RBF) was employed to classify the data [42].

K(X, X') = \exp\left(-\frac{\| X - X' \|^{2}}{2\sigma^{2}}\right)

(3)

where $\| X - X' \|^{2}$ is the squared Euclidean distance between the input vectors.
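To make Equation (3) concrete, the sketch below evaluates the RBF kernel for a few pairs of points. The function name `rbf_kernel_value`, the sample points, and the default `sigma=1.0` are illustrative assumptions; the kernel width used in the experiments is not stated here.

```python
import math
import numpy as np

def rbf_kernel_value(x, x_prime, sigma=1.0):
    """RBF kernel of Equation (3): exp(-||x - x'||^2 / (2 * sigma^2))."""
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    sq_dist = np.sum((x - x_prime) ** 2)  # squared Euclidean distance
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)))

k_same = rbf_kernel_value([1.0, 2.0], [1.0, 2.0])  # identical points
k_near = rbf_kernel_value([1.0, 2.0], [1.5, 2.0])  # close points
k_far = rbf_kernel_value([1.0, 2.0], [4.0, 6.0])   # squared distance 25
print(k_same, k_near, k_far)
```

The kernel equals 1 for identical points and decays toward 0 as points move apart, so it acts as a similarity score. In scikit-learn's `SVC(kernel="rbf")`, the `gamma` parameter plays the role of 1/(2σ²).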

3.3.2. AdaBoost Method

AdaBoost (adaptive boosting) classification is a machine-learning technique. It works by iteratively training weak classifiers, which are models that perform slightly better than chance, and then combining them into a single strong classifier (Freund and). The weak classifiers are trained on different subsets of the data, with a greater weight given to misclassified samples in order to focus on improving their classification. One of the key benefits of AdaBoost classification is that it can be applied to a wide range of classification problems, including binary and multi-class classification. It has also been shown to perform well in cases in which the data are unbalanced.

AdaBoost classification can be used to predict personality traits in order to identify patterns in the data indicative of specific traits. For example, a classifier combining multiple weak learners that can accurately predict an individual’s openness level may be identified through AdaBoost classification.
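The reweighting idea behind AdaBoost can be illustrated with a single boosting round. This is a sketch of the standard discrete AdaBoost update on a made-up binary example, not the internals of any particular library or of the model trained in this study.

```python
import numpy as np

# Five samples with labels in {-1, +1}; the weak learner errs on the last one.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])

# Start with uniform sample weights.
weights = np.full(len(y_true), 1.0 / len(y_true))

# Weighted error of the weak learner.
miss = y_pred != y_true
error = float(np.sum(weights[miss]))

# Learner weight: larger when the weighted error is smaller.
alpha = float(np.log((1.0 - error) / error))

# Increase the weights of misclassified samples, then renormalize.
new_weights = weights * np.exp(alpha * miss)
new_weights = new_weights / new_weights.sum()
print(error, alpha, new_weights)
```

After one round, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.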

3.3.3. K-Nearest Neighbors (KNN)

K-nearest neighbors (KNN) is a classification technique that is both one of the simplest and one of the most essential in ML. It is a supervised-learning strategy that is used frequently in pattern recognition, data mining, and intrusion detection. Because it does not make any underlying assumptions about how the data are distributed, it is widely applicable in real-world scenarios [43,44]. The goal of the KNN algorithm is to determine the class label of a particular query point by locating the training points that are closest to it in feature space. We set the k value to 3.

A_{i} = \sqrt{(c_{1} - c_{2})^{2} + (d_{1} - d_{2})^{2}}

(4)

The k value determines how many of the closest points in the feature space are located and used to label a query point. As a result, the value must be chosen carefully. Furthermore, $c_{1} - c_{2}$ and $d_{1} - d_{2}$ are the coordinate-wise differences between the two feature vectors being compared when finding the closest point.
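A small sketch of KNN with k = 3 and the Euclidean distance of Equation (4) is shown below. The function `knn_predict` and the six two-dimensional training points are hypothetical and serve only to show the voting mechanism.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training point (Equation (4)).
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels.
    votes = Counter(int(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set with two classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([1.1, 1.0]))
pred_b = knn_predict(X_train, y_train, np.array([5.0, 5.0]))
print(pred_a, pred_b)
```

With more than two classes, ties among the k votes are possible, which is one reason the choice of k deserves care.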

3.4. Evaluation Metrics

A model was evaluated by testing it on an unseen dataset that was not used during the training step. In these experiments, we used micro averages of several standard metrics, namely accuracy, f1-score, precision, and recall, together with the confusion matrix. The classification results were also quantified using the ROC curve, which plots the true-positive rate against the false-positive rate, as illustrated in the graphs below. A confusion matrix shows four categories of results: (1) true positive (TP), (2) false negative (FN), (3) false positive (FP), and (4) true negative (TN). The equations for these metrics are as follows:

\mathrm{Accuracy} = \frac{TP + TN}{FP + FN + TP + TN} \times 100

(5)

\mathrm{Precision} = \frac{TP}{TP + FP} \times 100

(6)

\mathrm{Recall} = \frac{TP}{TP + FN} \times 100

(7)

\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100

(8)
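Equations (5)–(8) can be worked through with a short sketch. The confusion-matrix counts below are made up, and the code assumes that precision and recall enter Equation (8) as fractions, so that the final multiplication by 100 yields a percentage.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score as percentages."""
    precision = tp / (tp + fp)  # fraction in [0, 1]
    recall = tp / (tp + fn)     # fraction in [0, 1]
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn) * 100,
        "precision": precision * 100,
        "recall": recall * 100,
        "f1": 2 * precision * recall / (precision + recall) * 100,
    }

# Hypothetical counts for one class treated as "positive".
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

In a five-class problem, these quantities are computed per class (one class versus the rest) and then averaged.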

3.5. Experimental Results

This section presents the empirical results from experiments conducted to classify participants’ personal traits into Class_0, Class_1, Class_2, Class_3, and Class_4. The participants belonging to Class_0 were identified as having the same medium scores for three traits (extraversion, agreeableness, and openness) and low scores for conscientiousness and neuroticism. Class_1 means that the participant has a high score for neuroticism and low scores for conscientiousness, openness, extraversion, and agreeableness. Class_2 means that the participant has a high agreeableness score and medium scores for the other traits. Class_3 means that the participant has a low conscientiousness score and similar scores for the other traits. Class_4 means that the participant has identical scores for openness and agreeableness and the same medium scores for the other three traits. We used 20,000 samples as a subset of the Big Five personality-test dataset.

The training and testing of the used dataset were carried out by using two different data-split approaches: traditional data splitting and cross-validation splitting.

3.5.1. Traditional Data Split

The samples were split as follows. In total, 70% were placed in the training set and 30% were used to test various machine-learning models to detect and classify participants’ personal traits. These models included KNN, SVM, RF, and AdaBoost. Performance evaluation of each model was conducted using weighted-measurement metrics, such as precision, recall, f1-score, and accuracy. In addition to these metrics, a confusion matrix was also applied. Figure 6 shows the confusion matrix for the best model-classification results.

Figure 6. Confusion matrix for the SVM model.

Based on the confusion matrix, the SVM model misclassified 95 of the 6000 samples used as the testing set. The model’s performance was therefore reliable, and it could be applied to classify the participants’ personality traits accurately. Table 2 summarizes the classification results of the models under traditional data splitting.

Table 2. Testing results of the proposed machine-learning models using traditional data splitting.
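The 70/30 evaluation can be sketched as follows. The synthetic five-class data, the stratified split, and the default SVM hyperparameters are all assumptions made for illustration; only the final line uses figures reported in the text (95 misclassified samples out of 6000).

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the labeled samples with five classes.
X, y = make_blobs(n_samples=2000, centers=5, n_features=5, random_state=0)

# 70% training, 30% testing; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# RBF-kernel SVM with scikit-learn defaults (hyperparameters are assumed).
svm = SVC(kernel="rbf")
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

# Off-diagonal entries of the confusion matrix are the misclassified samples.
cm = confusion_matrix(y_test, y_pred)
misclassified = int(cm.sum() - cm.trace())
accuracy = accuracy_score(y_test, y_pred) * 100

# The same arithmetic applied to the counts reported for the SVM model.
reported_accuracy = (6000 - 95) / 6000 * 100
print(misclassified, accuracy, round(reported_accuracy, 1))
```

Applied to the reported counts, this arithmetic gives roughly 98.4%, consistent with the 98% accuracy stated for the SVM.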

Table 2 and the confusion matrix in Figure 6 summarize the performances of the proposed models. By comparing the results of the evaluation metrics, we found that the SVM classifier proved its effectiveness and efficiency with an accuracy of 98%. Furthermore, it outperformed the other classifiers in classifying and predicting participants’ personality traits using the different categories described in the previous section. In contrast, poor performance was observed using the AdaBoost model. Figure 7 shows the ROC curve of the SVM classifier under the traditional training-and-testing split.

Figure 7. The ROC of the SVM model using traditional data split.

3.5.2. Cross-Validation Split

Cross-validation is a statistical method used to evaluate the performances of machine-learning models. It involves dividing a dataset into separate training and testing subsets and using the training subset to fit the model. The model is then evaluated on the testing subset to assess its performance. This process is repeated multiple times, with different subsets of the data used as the training and testing sets each time. In this study, we implemented the common k-fold cross-validation method to assess the reliability of our results, as shown in Tables 3–6.

In this experiment, we used five-fold cross-validation to evaluate the proposed machine-learning models. The dataset is divided into five subsets (folds); the model is trained on four folds and evaluated on the remaining fold. This process is repeated five times, with a different fold used as the test set in each iteration. The performance measure is then averaged across all five iterations to estimate the model’s performance on unseen data. The testing results of the KNN, RF, SVM, and AdaBoost classifiers using five-fold cross-validation are presented in Tables 3–6. The experimental results clearly show that the SVM classifier provided the best performance and outperformed the other classifiers.
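The five-fold procedure described above can be sketched in plain Python. This is an illustrative sketch rather than the code used in the study: the helper names `k_fold_indices` and `cross_validate` are hypothetical, the folds are taken in order without shuffling (whether the study shuffled is not stated), and `fit_and_score` stands in for any routine that trains a model on the training indices and returns a score for the test indices.

```python
from statistics import mean
from typing import Callable, List, Tuple

Fold = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int = 5) -> List[Fold]:
    """Return k (train_indices, test_indices) pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds: List[Fold] = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_validate(
    n_samples: int,
    k: int,
    fit_and_score: Callable[[List[int], List[int]], float],
) -> float:
    """Average the score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train, test) for train, test in k_fold_indices(n_samples, k)]
    return mean(scores)


# With 20,000 samples, each fold holds out 4000 samples and trains on 16,000.
folds = k_fold_indices(20_000, k=5)

# Averaging the per-fold SVM testing accuracies reported in Table 5.
svm_fold_accuracy = [98.3, 98.5, 98.7, 98.5, 98.1]
svm_mean_accuracy = round(mean(svm_fold_accuracy), 1)
```

The last two lines average the per-fold SVM accuracies from Table 5, which is how the "Mean" row of each cross-validation table is obtained.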

Table 3. The results of KNN classifier using five-fold cross-validation.

Table 4. The results of the RF classifier using five-fold cross-validation.

Table 5. The results of the SVM classifier using five-fold cross-validation.

Table 6. The results of the AdaBoost classifier using five-fold cross-validation.

4. Discussion

In this study, we presented a personal-traits-testing model based on machine-learning techniques that can help organizations and government agencies to select appropriate employees for specific jobs or to form a working team to perform a specific task. The compatibility of team qualities contributes significantly to the success of large businesses and the achievement of their strategic goals. The model was designed based on a standard dataset collected from the responses of individuals worldwide. Machine-learning techniques were used in the data analysis, clustering, and classification. For the clustering task, the k-means algorithm successfully sorted the data into five clusters, each containing similar personal patterns from the participants. These clusters were agreeableness, conscientiousness, extraversion, openness, and neuroticism. To the best of our knowledge, no other study has applied the same idea and dataset. However, some related studies were identified in the literature review, such as studies of social-identity personality traits based on social-media data [15,21] and studies in other domains, such as healthcare (using questionnaires [24]) and education [22]. Table 7 compares the proposed system’s results with those of previous studies. Figure 8 shows the ROC curve of the SVM model under five-fold cross-validation.
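The clustering step mentioned above, in which k-means assigns each unlabeled participant a cluster index that later serves as a class label, can be sketched as follows. This is a minimal plain-Python illustration and not the study's implementation: the function name `k_means`, the explicit initial centroids (passed in to keep the run deterministic), and the two-dimensional toy scores are all assumptions, whereas the study clustered five trait scores into five clusters.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means: alternate assignment and centroid updates until stable."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical (extraversion, neuroticism) scores, for illustration only.
toy_points = [(0.60, 0.48), (0.68, 0.42), (0.20, 0.90), (0.25, 0.85)]
toy_labels, toy_centroids = k_means(toy_points, [(0.6, 0.4), (0.2, 0.9)])
```

The returned labels play the role of the Clusters column in Table 1 and can be used as targets for the supervised classifiers.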

Table 7. Comparison of results and those of previous systems.

Figure 8. The ROC of the SVM model using five-fold cross-validation.

The application of information technology in human-resource management has developed steadily in different countries, in step with advances in information technology. Human-resource-management information systems have the potential to reduce the amount of information that must be transmitted, as well as the time this requires. They also have the potential to release human-resource personnel from mundane administrative tasks and to change the mode of service that human-resources departments provide, shifting them toward a management role that involves providing decision-making support and solutions.

5. Conclusions

This study concluded that behavioral tests must be applied in human-resources management and performance management to motivate qualified employees toward achieving an organization’s strategic goals. The study analyzed behavioral data from 20,000 participants, taken from a publicly available dataset on the Kaggle platform. We used the Big Five personality-test results to cluster the participants by type of behavior. The clustering algorithm divided the individuals into clusters, each of which represents a set of similar personality traits. Supervised machine-learning models were then used to classify the responses according to personal traits.

The findings of the study were as follows:

The internal correlation test of the groups for every ten questions showed a positive correlation.
There was a positive relationship between the following four traits: agreeableness, conscientiousness, extraversion, and openness.
The relationship between one personality trait, neuroticism, and the other four personality traits was negative.
This test helps to identify participants’ psychological and behavioral traits in any domain.
Companies and organizations prefer participants who can integrate and adapt to their work teams.
This test can be a source of safety for organizations in preventing violent behavior.
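The correlation findings in the list above rest on Pearson's correlation coefficient. The sketch below shows how such a coefficient is computed in plain Python; the function name `pearson_r` and the trait scores are hypothetical values chosen only to illustrate a positive and a negative relationship, and are not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical normalized trait scores for four participants.
extraversion = [0.2, 0.4, 0.6, 0.8]
openness = [0.3, 0.5, 0.6, 0.9]
neuroticism = [0.9, 0.7, 0.4, 0.2]

r_positive = pearson_r(extraversion, openness)     # rises together: r > 0
r_negative = pearson_r(extraversion, neuroticism)  # moves oppositely: r < 0
```

A coefficient near +1 indicates that two traits rise together, and a coefficient near -1 indicates that one falls as the other rises. The findings report this second pattern for neuroticism against the other four traits.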

Machine-learning models, such as SVM, RF, KNN, and AdaBoost, were used to classify personalities based on psychological traits, derived from the participants’ responses, with satisfactory results. By comparing the evaluation-metrics results, we found that the SVM classifier proved effective and efficient, with an accuracy of 98%. Furthermore, it outperformed the other classifiers in classifying and predicting the participants’ personality traits using different categories. Finally, poor performance was evident when using the AdaBoost model.

The proposed personal-traits-testing model can be adopted in practice: its accuracy is high, and it can save time and effort compared with personal interviews and direct questions for determining candidates’ characteristics, where answers may be untrue and may hide aspects of a candidate’s true personality. Organizations can apply the proposed methodology to evaluate employees’ personality traits while they work on strategic plans. Achieving the goals that maintain an organization’s image requires people with specific personal and behavioral skills. Analyses of user preferences and behavioral predictions learned from user data may provide useful reference points for optimizing information structure and improving service accuracy. However, OCEAN user-personality-model-identification algorithms still have certain limitations. Machine learning is a modern approach with a comparative advantage: it can be adapted quickly to a broad range of problems, which makes it a competitive option. When we began to write this paper, one of our primary objectives was to improve the identification process used by the OCEAN personality model through a neural-network approach. Our plan for future work is to develop a model based on deep-learning algorithms.

Author Contributions

Conceptualization, M.A.A., Z.A.T.A., S.N.A. and T.H.H.A.; methodology, M.A.A., Z.A.T.A. and S.N.A.; software, Z.A.T.A. and T.H.H.A.; validation, M.A.A., Z.A.T.A., S.A.A. and T.H.H.A.; formal analysis, Z.A.T.A. and S.N.A.; investigation, M.A.A., S.N.A. and T.H.H.A.; resources, M.A.A., Z.A.T.A., S.A.A. and T.H.H.A.; data curation, M.A.A., Z.A.T.A., S.N.A. and T.H.H.A.; writing (original draft preparation), M.A.A., Z.A.T.A., S.N.A. and T.H.H.A.; writing (review and editing), Z.A.T.A., S.N.A. and T.H.H.A.; visualization, Z.A.T.A., S.N.A. and T.H.H.A.; supervision, Z.A.T.A., S.N.A. and T.H.H.A.; project administration, M.A.A. and T.H.H.A.; funding acquisition, M.A.A. and T.H.H.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. GRANT2713].

Data Availability Statement

Not applicable.

Acknowledgments

The authors acknowledge the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. GRANT2713].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Alnamrouti, A.; Rjoub, H.; Ozgit, H. Do Strategic Human Resources and Artificial Intelligence Help to Make Organisations More Sustainable? Evidence from Non-Governmental Organisations. Sustainability 2022, 14, 7327.
2. Robbins, S.; van Wynsberghe, A. Our New Artificial Intelligence Infrastructure: Becoming Locked into an Unsustainable Future. Sustainability 2022, 14, 4829.
3. Wang, M.; Pan, X. Drivers of Artificial Intelligence and Their Effects on Supply Chain Resilience and Performance: An Empirical Analysis on an Emerging Market. Sustainability 2022, 14, 16836.
4. Carmichael, L.; Poirier, S.-M.; Coursaris, C.K.; Léger, P.-M.; Sénécal, S. Users’ Information Disclosure Behaviors during Interactions with Chatbots: The Effect of Information Disclosure Nudges. Appl. Sci. 2022, 12, 12660.
5. Mahlasela, S.; Chinyamurindi, W.T. Technology-related factors and their influence on turnover intentions: A case of government employees in South Africa. Electron. J. Inf. Syst. Dev. Ctries. 2020, 86, e1.
6. Bamiatzi, V.; Bozos, K.; Cavusgil, S.T.; Hult, G.T.M. Revisiting the firm, industry, and country effects on profitability under recessionary and expansion periods: A multilevel analysis. Strateg. Manag. J. 2016, 37, 1448–1471.
7. Otero-López, J.M.; Santiago, M.J.; Castro, M.C. Big Five Personality Traits, Coping Strategies and Compulsive Buying in Spanish University Students. Int. J. Environ. Res. Public Health 2021, 18, 821.
8. Fernández-Mesa, A.; Alegre, J. Entrepreneurial orientation and export intensity: Examining the interplay of organisational learning and innovation. Int. Bus. Rev. 2015, 24, 148–156.
9. Zhu, C.; Liu, A.; Wang, Y. Integrating organisational learning with high-performance work system and entrepreneurial orientation: A moderated mediation framework. Front. Bus. Res. China 2019, 13, 1–24.
10. North, K.; Kumta, G. Knowledge Management: Value Creation through Organizational Learning; Springer: Cham, Switzerland, 2018.
11. De Raad, B. The Big Five Personality Factors: The Psycholexical Approach to Personality; Hogrefe & Huber Publishers: Göttingen, Germany, 2000.
12. John, O.P.; Srivastava, S. The Big-Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives. 1999. Available online: https://personality-project.org/revelle/syllabi/classreadings/john.pdf (accessed on 10 November 2022).
13. Benet-Martínez, V.; John, O.P. Los Cinco Grandes across Cultures and Ethnic Groups: Multitrait-Multimethod Analyses of the Big Five in Spanish and English. J. Pers. Soc. Psychol. 1998, 75, 729–750.
14. John, O.P.; Donahue, E.M.; Kentle, R.L. The Big Five Inventory; Versions 4a and 54; University of California: Berkeley, CA, USA, 1991.
15. Digman, J.M. Personality Structure: Emergence of the Five Factor Model. Annu. Rev. Psychol. 1990, 41, 417–440.
16. Verma, R.; Bandi, S. Challenges of artificial intelligence in human resource management in Indian IT sector. In Proceedings of the XXI Annual International Conference, New Delhi, India, 4–5 January 2020; Available online: https://www.internationalconference.in/XXI_AIC/TS5E/MsRichaVerma.pdf (accessed on 20 November 2022).
17. McRobert, C.J.; Hill, J.C.; Smale, T.; Hay, E.M.; Van der Windt, D.A. A multi-modal recruitment strategy using social media and internet-mediated methods to recruit a multidisciplinary, international sample of clinicians to an online research study. PLoS ONE 2018, 13, e0200184.
18. Baron, I.S.; Mustafa; Agustina, H. The challenges of recruitment and selection systems in Indonesia. J. Manag. Ment Mark. Rev. 2018, 3, 185–192.
19. Arpaci, I.; Kocadag Unver, T. Moderating Role of Gender in the Relationship between Big Five Personality Traits and Smartphone Addiction. Psychiatr. Q. 2020, 91, 577–585.
20. Alamsyah, A.; Dudija, N. Identifying Personality of the New Job Applicants using the Ontology Model on Twitter Data. In Proceedings of the 2nd International Conference on ICT for Rural Development (IC-ICTRuDev), Jogjakarta, Indonesia, 27–28 October 2021; pp. 1–5.
21. Laleh, A.; Shahram, R. Analyzing Facebook activities for personality recognition. In Proceedings of the 16th IEEE international conference on machine learning and applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 960–964.
22. Qin, X.; Liu, Z.; Liu, Y.; Liu, S.; Yang, B.; Yin, L.; Zheng, W. User OCEAN Personality Model Construction Method Using a BP Neural Network. Electronics 2022, 11, 3022.
23. John, R.; John, R.; Rao, Z.R. The Big Five personality traits and academic performance. J. Law Soc. Stud. 2020, 2, 10–19.
24. Curtis, R.G.; Windsor, T.D.; Soubelet, A. The relationship between Big-5 personality traits and cognitive ability in older adults–a review. Aging Neuropsychol. Cogn. 2015, 22, 42–71.
25. Dymecka, J.; Tarczyński, R.; Gerymski, R. Stress in emergency telephone number operators during the COVID-19 pandemic: The role of self-efficacy and Big Five personality traits. Health Psychol. Rep. 2022. Available online: https://www.researchgate.net/publication/361843686_Stress_in_emergency_telephone_number_operators_during_the_COVID-19_pandemic_the_role_of_self-efficacy_and_Big_Five_personality_traits (accessed on 20 December 2022).
26. Muntean, L.M.; Nireștean, A.; Mărușteri, M.; Sima-Comaniciu, A.; Lukacs, E. Occupational Stress and Personality in Medical Doctors from Romania. Healthcare 2022, 10, 1612.
27. Chavoshi, P. The Relationship Between the Big-Five Personality Traits and Depressive Symptoms: A Meta-Analysis. Master’s Thesis, The University of Western Ontario, London, ON, Canada, 2022. Available online: https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=11546&context=etd (accessed on 10 November 2022).
28. Aghabayk, K.; Rejali, S.; Shiwakoti, N. The Role of Big Five Personality Traits in Explaining Pedestrian Anger Expression. Sustainability 2022, 14, 12099.
29. Xu, L.; Luo, Y.; Wen, X.; Sun, Z.; Chao, C.; Xia, T.; Xu, L. Human Personality Is Associated with Geographical Environment in Mainland China. Int. J. Environ. Res. Public Health 2022, 19, 10819.
30. Priyadharshini, S.U. Influence of Big 5 personality traits on the investment decisions of retail investors-an empirical approach. PalArch’s J. Archaeol. Egypt/Egyptol. 2020, 17, 9725–9736.
31. Ludeke, S.G.; Bainbridge, T.F.; Liu, J.; Zhao, K.; Smillie, L.D.; Zettler, I. Using the Big Five Aspect Scales to translate between the HEXACO and Big Five personality models. J. Personal. 2019, 87, 1025–1038.
32. Mueller, A.; Claes, L.; Mitchell, J.E.; Wonderlich, S.A.; Crosby, R.D.; de Zwaan, M. Personality Prototypes in Individuals with Compulsive Buying Based on the Big Five Model. Behav. Res. Ther. 2010, 48, 930–935.
33. Big Five personality traits. Available online: https://www.123test.com/big-five-personality-theory/ (accessed on 2 November 2022).
34. Big Five Personality Test. Available online: https://openpsychometrics.org/tests/IPIP-BFFM/ (accessed on 28 October 2022).
35. Big Five Personality Test. Available online: https://www.kaggle.com/datasets/tunguz/big-five-personality-test (accessed on 24 October 2022).
36. Almaiah, M.A.; Almomani, O.; Alsaaidah, A.; Al-Otaibi, S.; Bani-Hani, N.; Hwaitat, A.K.A.; Al-Zahrani, A.; Lutfi, A.; Awad, A.B.; Aldhyani, T.H.H. Performance Investigation of Principal Component Analysis for Intrusion Detection System Using Different Support Vector Machine Kernels. Electronics 2022, 11, 3571.
37. Al-Nefaie, A.H.; Aldhyani, T.H.H. Bitcoin Price Forecasting and Trading: Data Analytics Approaches. Electronics 2022, 11, 4088.
38. Ibrahim, S.; Nazir, S.; Velastin, S.A. Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis. J. Imaging 2021, 7, 225.
39. Aldhyani, T.H.H.; Alkahtani, H. Artificial Intelligence Algorithm-Based Economic Denial of Sustainability Attack Detection Systems: Cloud Computing Environments. Sensors 2022, 22, 4685.
40. Granato, D.; Santos, J.S.; Escher, G.B.; Ferreira, B.L.; Maggio, R.M. Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective. Trends Food Sci. Technol. 2018, 72, 83–90.
41. Aldhyani, T.H.; Alshebami, A.S.A.; Alzahrani, M.Y. Soft Computing Model to Predict Chronic Diseases. J. Inf. Sci. Eng. 2020, 36, 365–376.
42. Al-Adhaileh, M.H.; Aldhyani, T.H.H. Artificial intelligence framework for modeling and predicting crop yield to enhance food security in Saudi Arabia. PeerJ Comput. Sci. 2022, 2022, e1104.
43. Alkahtani, H.; Aldhyani, T.H.H. Developing Cybersecurity Systems Based on Machine Learning and Deep Learning Algorithms for Protecting Food Security Systems: Industrial Control Systems. Electronics 2022, 11, 1717.
44. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295.
45. Kodinariya, T.M.; Makwana, P.R. Review on determining number of Cluster in K-Means Clustering. Int. J. Adv. Res. Comput. Sci. Manag. Stud. 2013, 1, 90–95.
46. Zeng, Z.; Qi, L. “Internet + Artificial Intelligence” Human Resource Information Management System Construction Innovation and Research. Math. Probl. Eng. 2021, 2021, 5585753.



Table 1. Results of PCA.

Extraversion	Neuroticism	Agreeableness	Conscientiousness	Openness	Clusters
0.60	0.48	0.62	0.64	0.66	4
0.68	0.42	0.64	0.62	0.54	0
0.58	0.52	0.56	0.56	0.62	0
0.52	0.54	0.64	0.54	0.62	3
0.70	0.46	0.60	0.64	0.72	4

Table 2. Testing results of the proposed machine-learning models using traditional data splitting.

Classifier Name	Precision %	Recall %	F1-Score %	Testing Accuracy %	Training Accuracy %
KNN	92.88	92.85	92.85	92.85	96.1
RF	95.7	95.7	95.7	95.7	100.00
SVM	98.42	98.41	98.41	98.41	98.73
AdaBoost	68.69	68.35	68.17	68.35	69.00

Table 3. The results of KNN classifier using five-fold cross-validation.

K-Fold Iteration	Precision %	Recall %	F1-Score %	Testing Accuracy %	Training Accuracy %
Fold 1	92.9	92.9	92.9	92.9	96.1
Fold 2	93.5	93.5	93.5	93.5	96.0
Fold 3	93.4	93.4	93.4	93.4	95.9
Fold 4	92.1	92.1	92.1	92.1	96.4
Fold 5	93.3	93.3	93.3	93.3	96.0
Mean	93.0	93.0	93.0	93.0	96.1

Table 4. The results of the RF classifier using five-fold cross-validation.

K-Fold Iteration	Precision %	Recall %	F1-Score %	Testing Accuracy %	Training Accuracy %
Fold 1	96.1	96.1	96.1	96.1	100.0
Fold 2	95.7	95.7	95.7	95.7	100.0
Fold 3	96.1	96.1	96.1	96.1	100.0
Fold 4	95.6	95.6	95.6	95.6	100.0
Fold 5	95.3	95.3	95.3	95.3	100.0
Mean	95.8	95.8	95.8	95.8	100.0

Table 5. The results of the SVM classifier using five-fold cross-validation.

K-Fold Iteration	Precision %	Recall %	F1-Score %	Testing Accuracy %	Training Accuracy %
Fold 1	98.3	98.3	98.3	98.3	98.9
Fold 2	98.5	98.5	98.5	98.5	98.8
Fold 3	98.7	98.7	98.7	98.7	98.9
Fold 4	98.5	98.5	98.5	98.5	98.8
Fold 5	98.1	98.1	98.1	98.1	99.0
Mean	98.4	98.4	98.4	98.4	98.9

Table 6. The results of the AdaBoost classifier using five-fold cross-validation.

K-Fold Iteration	Precision %	Recall %	F1-Score %	Testing Accuracy %	Training Accuracy %
Fold 1	68.3	68.3	68.3	68.3	67.6
Fold 2	66.2	66.2	66.2	66.2	67.5
Fold 3	73.6	73.6	73.6	73.6	73.5
Fold 4	63.8	63.8	63.8	63.8	64.3
Fold 5	69.1	69.1	69.1	69.1	69.5
Mean	68.2	68.2	68.2	68.2	68.5

Table 7. Comparison of results and those of previous systems.

References	Method	Accuracy %
Ref. [45]	ANN	85.06
Ref. [46]	ANN	71
Proposed system	SVM	98


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
