A Strategy Based on Genetic Algorithms for Forming Optimal Collaborative Learning Groups: An Empirical Study

Abstract: Considering that group formation is key when developing activities in collaborative learning scenarios, this paper proposes a strategy based on a genetic algorithm approach for achieving optimal collaborative learning groups, considering the students’ personality traits as grouping criteria. A controlled experiment was designed with 238 students, quantifying their personality traits through the “big five inventory” (BFI), forming working groups, and developing a collaborative activity in programming and related courses. The experiment results allowed validation not only from a computational point of view, evaluating the algorithm performance, but also from a pedagogical point of view, comparing the results obtained by students grouped with the proposed approach with those obtained through other group formation strategies. The highlight of the study is that, when developing a collaborative activity, the groups whose formation was pre-established by the teachers through the proposed strategy generally had a better collaborative performance than the groups with traditional formation, except in the case of heterogeneous formation. In addition, the experiment showed that not considering criteria related to personality traits before the group formation generally led to lower results.


Introduction
Although many studies have investigated the benefits of group formation in collaborative learning scenarios, to date, few studies have analyzed the achievements of published research on the subject [1]. Research has focused on the proper functioning of collaborative learning, but the formation process of such groups has been neglected or downplayed [2]. Group formation in collaborative environments is not a trivial task when homogeneity or heterogeneity within the groups is to be achieved. The general academic benefit depends largely on applying a good formation strategy, one that considers not just one but several of the students' characteristics [3]. Given the above, one of the aspects to be evaluated in group formation may be the personality of the students.
Despite the important contributions of the work on group formation, it is observed that few studies in the literature combine personality traits with other important factors in the grouping of students [4]. Even rarer is research that is based on pedagogical approaches, based on theories and learning strategies, and that considers student personality traits to support the formation of learning groups [1]. Therefore, there is great potential for research on this subject. In this context, the development of this work is motivated by the lack of scientific studies that combine personality traits with collaborative learning theories to support the group formation process in these scenarios.

Collaborative Performance Evaluation
Defining the characteristics of the collaboration process relevant to its quality and developing instruments capable of evaluating these characteristics are not trivial tasks [9]. In the literature, several works propose different methods and approaches for assessing the quality of collaboration, both for academic and organizational contexts. However, this multiplicity of dispersed works is generally presented without any validation [10,11]. Chebil et al. [12] mention that generally, in these works, the evaluation is carried out through the identification of metrics and their measurement and through the design of questionnaires that reflect the aspects to be evaluated and their use. Table 1 presents studies from the last decade specifically oriented to academic contexts, indicating some of their basic characteristics, as well as their validation scenario.

Table 1. Studies on the evaluation of collaboration in the last decade.

Reference | Title-Year | Description
[13] | Assessing Your Collaboration: A Self Evaluation Tool-1999 | Self-assessment questionnaire by factors. No validation scenario is specified.
[14] | Collaboration Assessment Guide and Tool-2007 | Self-assessment questionnaire by indicators grouped into categories. Of general application, including educational. No validation scenario is specified.
[9] | A Rating Scheme for Assessing the Quality of Computer-Supported Collaboration Processes-2007 | Based on ratings by dimensions. The teacher is the evaluator. Validated in a real scenario.
[15] | Validating empirically a rating approach for quantifying the quality of collaboration-2012 | Based on ratings by dimensions. Uses support software. Validated in a real scenario.
[16] | A Domain-Independent, Transferable and Timely Analysis Approach to Assess Student Collaboration-2013 | Based on metrics. The teacher is the evaluator. Validated in a real scenario.
[17] | Tool to Assess Teamwork Performance in Higher Education-2013 | Self-assessment questionnaire by indicators. Validated in a real scenario.
[18] | ReaderBench: Automated Evaluation of Collaboration Based on Cohesion and Dialogism-2015 | Based on metrics. For speaking and writing collaborative activities. Validated in a real scenario.
[19] | Evaluating Collaboration for Effectiveness-2015 | Self-assessment questionnaire by factors. Of general application, including educational. Validated in a real scenario.
[20] | Towards the Real-Time Evaluation of Collaborative Activities: Integration of an Automatic Rater of Collaboration Quality in the Classroom from the Teacher's Perspective-2016 | Based on real-time monitoring software. The teacher is the evaluator. Validated in a real scenario.
[21] | Co-Measure: Developing an Assessment for Student Collaboration in STEAM Activities-2017 | Based on rubrics by dimensions. Validated in a real scenario.
[22] | Mapping Individual to Group Level Collaboration Indicators Using Speech Data-2019 | Based on machine learning. For collaborative speaking activities. Validated in a real scenario.
For this study, the questionnaire-type tool proposed by Diez et al. [17] was selected and adapted, taking into account its completeness, its systematic application, and its orientation toward and specific validation in educational contexts. Furthermore, since it is a self-assessment and co-assessment questionnaire, its use does not imply an additional "workload" for the teacher. Collaborative performance is measured through seven indicators that the authors call operational processes (see Figure 1). These processes describe functions and interactions that appear during group work and how these are managed to perform group tasks. They coincide directly or indirectly with the criteria or dimensions of group performance evaluation to be taken into account in collaborative work scenarios, described in some of the studies listed in Table 1. It is worth clarifying that the proposal of Diez et al. [17] considers collaborative performance evaluation as an integral multidimensional process in which one of the dimensions is precisely collaboration, disaggregated into collaboration, cooperation, and coordination, and understood more as a competence or ability of the students than as the general macro-process of collaboration.
The tool is composed of a questionnaire that assesses the level of collaborative performance and a document that allows students to interpret the score obtained and provides them with guidelines for improvement. The questionnaire originally contains 23 questions directly related to activities and operations associated with the aforementioned processes; these were adapted in their wording and supplemented with an additional question on the leadership process. The questions are scored on a Likert scale from 0 to 4 (0 = totally disagree or not applicable, 1 = partially disagree, 2 = indifferent or undecided, 3 = partially agree, 4 = totally agree). Each operational process to be evaluated is associated with several questions in the questionnaire. To obtain the final score for a given process, the scores of its associated questions are averaged.
The questionnaire is designed to be completed by students at the end of a collaborative activity, first individually by each member of each group and then agreeing on the answers as a group. In this way, at the end of the process, the groups have a consensus score for each question in the questionnaire. With these global scores, the averages are obtained for each operational process, from which the "trace" of each group is constructed, which represents evidence of their collaborative performance in carrying out a specific activity. Figure 2 presents an example of a trace for a working group. In practice, this trace is what would allow the teacher to provide feedback on the group's work, identifying the processes in which it could be improved. This indirectly provides the group and, by extension, each student, in particular, a powerful tool for the analysis of strengths and weaknesses of the working group as an initial basis for improvement [23,24].
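The averaging step that produces a group's trace can be sketched as follows. This is only an illustration: the process names and the mapping of questions to processes are hypothetical placeholders, since the actual assignment belongs to the instrument of Diez et al. [17] and is not reproduced here, and the answers are made up.

```python
from statistics import mean

# Hypothetical mapping: process name -> question numbers (1-based).
# The adapted questionnaire assigns its 24 questions to seven processes;
# only two invented processes are shown here.
PROCESS_QUESTIONS = {
    "process_a": [1, 2, 3],
    "process_b": [4, 5],
}


def process_scores(answers, mapping=PROCESS_QUESTIONS):
    """Average the 0-4 Likert scores of the questions tied to each process."""
    for question, score in answers.items():
        if not 0 <= score <= 4:
            raise ValueError(f"question {question}: score {score} outside 0-4")
    return {
        name: mean(answers[q] for q in questions)
        for name, questions in mapping.items()
    }


# Made-up consensus answers of one group (question number -> score).
group_answers = {1: 4, 2: 3, 3: 2, 4: 1, 5: 4}
trace = process_scores(group_answers)
```

Plotting the values of `trace` on one axis per process would give a chart in the spirit of the trace described above.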

About Group Formation
Groups constitute a basic social structure. They are formed in many ways for many different purposes: people meet in social situations, coordinate to perform work-related tasks or constitute commissions because of common interests. Although, in the academic fields, they are also easily formed, and for very different purposes, group creation in the classroom can be a complicated and unnatural process. However, for collaborative learning to succeed, it is important to set up effective groups [8].
Group formation can be characterized by different parameters, such as the size of the group, the duration of the group, the ideal method of grouping, the authority of grouping, and the type of the groups formed, among others. In addition, it is necessary to pay attention to the educational and psychological characteristics of the members involved in the grouping. These characteristics vary from group to group if the group's goals and tasks are different. In some of the works found in the literature, groups are formed using characteristics of the members such as knowledge, skills, and competencies, while other works are based on learning styles, personality traits, and other characteristics [25]. Regardless of what these characteristics are, three types of grouping can be generated according to the level of uniformity or similarity in them: homogeneous grouping, in which all the groups formed show greater similarity to the totality of individuals when all the characteristics are considered together; heterogeneous grouping, in which all the groups formed show greater variability with respect to the totality of individuals when all the characteristics are considered together; and mixed grouping, in which all the groups formed show both similarity and variability with respect to the totality of individuals, because homogeneity is sought for certain characteristics and heterogeneity for others [25].
Several studies deal with the group formation problem, considering a limited number of characteristics of students and employing different optimization techniques. Sancho-Asensio et al. [26] experimented with an incremental search technique called "backtracking" applied to student profiles. Zheng and Pinkwart [27] used the discrete particle swarm optimization algorithm to group learners heterogeneously according to their personality traits and gender. Costaguta et al. [28] and Lescano et al. [29] present an approach to performing the automatic group formation of students in collaborative environments, which considers their learning styles, applying data mining techniques and genetic algorithms. Basu Roy et al. [30] employed a semantics-based recommendation system for group formation with different objectives. Tien et al. [31] examined the effectiveness of the group formation process by proposing a new strategy that adopted a genetic algorithm with the TOPSIS technique. Srba and Bielikova [32] formed groups through the application of so-called group technology, a concept used in the engineering and manufacturing management sector; they considered feedback as a criterion for the group formation, arguing that the quality of previous collaboration is important when forming a group. Battur et al. [33] propose in their study that the teams be formed based on the students' complementary skills, ensuring that each team has an expert member in each identified skill. Wichmann et al. [34] investigated how group formation based on student behavior affects productivity in a small group task. Amara et al. [35] conducted research to form homogeneous groups in a mobile collaborative learning environment, with a personalized selection of training attributes; the technique used was the K-Means algorithm. Sadeghi and Kardan [36] and Amarasinghe et al. [37] propose binary integer linear programming as an optimization technique for group formation, based on task assignment, gender, and language preferences. Manske and Hoppe [38] used a semantic algorithm that maximized the diversity of knowledge in the groups; artefacts produced by students were analyzed to extract knowledge from each of them, and student concepts were added to a unified data model and used to calculate diversity. Odo et al. [39] discussed how students' affective status can affect their performance in collaborative study groups. Lykourentzou et al. [40] and R. C. D. Reis et al. [41] highlight in their works the importance of personality traits as critical elements affecting student collaboration and interaction, stating that this factor can influence student performance and satisfaction, and induce various actions and behaviors in group work.

Genetic Algorithms
Genetic algorithms were described by Holland [42] and are considered as a computational model family inspired by Darwin's principles of evolution. The common characteristic of these algorithms is that they encode the potential solutions to the problem using a data structure, generally a vector, as a chromosome, and apply recombination operators seeking to preserve the critical information that guides a satisfactory solution [43].
Figure 3 presents the general scheme of these algorithms. One starts from an initial population of individuals, typically random, where an individual is understood as a possible solution (not necessarily a good one). Each individual is represented by a chromosome made up of genes, where each gene refers to a part or sequence of such a solution. These individuals are then evaluated using a specific fitness function, and genetic operators are applied to obtain a new population (the next generation). These operators aim to preserve the chromosomes, or parts of them, that represent better solutions, following the principles of natural selection, so that the fittest individuals are more likely to survive. Regarding their application, genetic algorithms have been used, particularly during the last two decades, in a wide range of combinatorial optimization problems, including the traveling salesman problem (TSP), the knapsack problem (KP), the sequencing and scheduling of tasks, and vehicle routing, among others [44], which makes this technique a good candidate to solve the grouping problem outlined in this paper.
The literature review by Cruz and Isotani [45] concerning group formation demonstrates the great interest of researchers in using the technique of genetic algorithms as a solution to the problem, given its relevance in dealing with a large number of variables and their ability to quickly generate optimal solutions, i.e., useful groups.
In this work, a modified genetic algorithm was chosen over other optimization techniques for, among others, the following reasons: genetic algorithms are intrinsically parallel, that is, they operate simultaneously on several solutions instead of working sequentially as traditional techniques do; they are less affected by local maxima (false solutions) than traditional techniques; they can manipulate many parameters simultaneously, which is useful when several objectives must be met; they are relatively easy to run on today's massively parallel architectures; and they use probabilistic operators instead of the typical deterministic operators of other techniques. A more detailed explanation of their specific application to the study problem is found in Section 3.3.

The "Big Five" Model
Currently, a wide variety of personality models and theories offer different perspectives on how to approach a person's personality. Some of these are Carl Jung's psychological types, Keirsey's personality types theory, the "big five" factors personality model, and the Myers-Briggs type indicator (MBTI), among others [46].
In this work, it was decided to use the "big five" or "five-factor model (FFM)" personality model because it has obtained the greatest consensus in the area of Psychology and because it is one of the most widely used in the literature [47,48]. The "big five" is a hierarchical model of personality traits composed of five broad factors, each representing personality characteristics at a more general level of abstraction. These factors or dimensions are traditionally referred to as extraversion, agreeableness, conscientiousness, neuroticism, and openness. Table 2 shows the most prominent characteristics of each of the dimensions of the model [49]. Each combination of values in the different dimensions generates a personality type with a different tendency to behave, interact, react, and reason. Each personality has a different preference for performing certain tasks or working with certain types of people. A personality analysis aims to identify a person's abilities and how to take advantage of them and, in turn, to identify the weaknesses that could manifest, so that they can be balanced with the abilities of other people.
The "big five" model of personality is the most widely applied when investigating the influential role of personality traits on group functioning and performance [52][53][54][55]. Specifically, the relationship between this personality model and group performance has been investigated in several studies. For example, Baer et al. [56], Rhee et al. [57], Sánchez Hórreo and Carro [58] and Thoms et al. [59] found a positive relationship between the extraversion level and group performance; levels of agreeableness were positively related to group performance [55,60]; conscientiousness was considered the most related trait in both individual and group success [55], since conscientious individuals are described as hardworking, self-disciplined and reliable; De Dreu and Van Vianen [60] and Peeters et al. [55] showed a positive relationship between levels of conscientiousness and group performance; Peeters et al. [55] and Rhee et al. [57] also investigated emotional stability (or reversely neuroticism), and found a negative relationship between emotional stability and group performance; finally, Sánchez Hórreo and Carro [58] showed that heterogeneous groups based on openness got better results than homogeneous groups, and Baer et al. [56] found that the openness level was positively related to the group performance. Facets of personality, such as agreeableness, emotional stability, and helpfulness, contribute to the outcomes of collaborative work; other criteria related to relationships, such as the level of members' communication and the team cooperation, have also been associated with personality traits (Davies and Kanaki, 2006).

The Strategy for Forming Collaborative Learning Groups
The proposed strategy seeks to form homogeneous, heterogeneous, or mixed groups by contrasting the five dimensions that the "big five" model uses to describe a person's personality. Each of these personality dimensions can be measured with the psychometric instrument "big five inventory" (BFI); the group mean in each dimension is then computed, and the groups are formed so as to optimize some measure of intra-heterogeneity and inter-homogeneity.

Methodological Approach
The methodological approach for forming optimal collaborative learning groups based on participants' personality traits is presented as a sequential process comprising three stages described in the following sections. Figure 4 schematically summarizes this process. The texts at the top are the required inputs at each stage, and those at the bottom describe the outputs of each of them.
It is clarified that, as such, the proposed group-building technique would go only until the second stage. Stage three, related to the collaborative activity or activities to be developed, is incorporated into the process only for evaluation of the technique. This stage would be relative to the academic space in which it will be implemented, considering that a collaborative learning process consists of several tasks or activities that must be developed both by the cognitive mediator or facilitator and by the groups of learners, clearly defining three stages: pre-process, process and post-process, as it is proposed by Collazos et al. [61].

Measurement of Personality Traits
A Spanish adaptation of the BFI (big five inventory) is used as an instrument to measure the personality traits of students. The aim of using this instrument is to have a scientifically accepted way to quantify the personality traits of an individual, which, as will be seen later, is the input required by the grouping algorithm. At no time is it intended to issue any type of psychological opinion or diagnosis of the study participants, as this is outside the scope of this study. The adaptation of the BFI into Spanish by Oliver P. John and Verónica Benet-Martínez [62] is used, with the corresponding consent for research purposes. The instrument requires the individual to fill out an "informed consent", with which the use of the results for a specific purpose is accepted, in this case, the formation of groups as a didactic strategy. Figure 5 shows the Spanish and English versions of the instrument, the latter as a reference for non-Spanish-speaking readers, which were reconstructed from the original paper.
Once the BFI has been applied to each of the n students to be grouped, the results obtained must be stored as shown in Table 4. Each row corresponds to a student; the first two columns are the student's identifier and name, and the remaining five hold the measure of each of the personality dimensions considered by the "big five" model: extraversion, agreeableness, conscientiousness, neuroticism, and openness.
For filling out and processing the instrument, a "survey"-type activity and a plugin for Moodle™ were developed, respectively; both can be imported into any course or academic space implemented in that LMS. The two resources are available on request from the authors.

Group Formation
The grouping of elements is a general combinatorial problem that consists of distributing a set of elements among a defined number of groups, generally of the same size, in such a way that a certain condition is satisfied [31]. Although at first glance it may seem simple, the complexity of this problem lies mainly in two aspects. The first refers to the condition that must be satisfied, which in the most common case is to obtain "equitable" or homogeneous groups with respect to a certain measure of value for each element. For example, in the case that concerns the present study, suppose that several groups of students are to be formed in a class. A simple method of achieving some homogeneity (academically speaking) would be to order the students from highest to lowest according to some type of prior assessment and assign them one by one to each group in a sequential manner. However, what happens when each element is described not by one attribute but by several, and each group should be equitable considering them all? Worse still, what happens when such attributes are not proportional to each other? In these cases, the distribution criterion is no longer trivial but requires some intelligent search to find a solution that satisfies the required condition.
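The simple single-attribute method mentioned above (rank the students and deal them out to the groups in turn) can be sketched as follows. The function name and the marks are made up for illustration; the sketch also shows why the method does not extend naturally to several attributes, since it relies on a single sort key.

```python
def sequential_assignment(scores, n_groups):
    """Rank students by score (highest first) and deal them out in turn."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for position, student in enumerate(ranked):
        # Student at rank 0 goes to group 0, rank 1 to group 1, and so on,
        # wrapping around once every group has received one student.
        groups[position % n_groups].append(student)
    return groups


# Made-up prior assessment marks for six students.
marks = {"s1": 4.8, "s2": 3.1, "s3": 4.2, "s4": 2.5, "s5": 3.9, "s6": 3.0}
groups = sequential_assignment(marks, n_groups=2)
```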
The second aspect is the combinatorial explosion that goes hand-in-hand with the total number of elements to be grouped and the number of groups to be formed. For example, if 50 students are to be distributed into groups of 5, there are already 2,118,760 ways of choosing a single group of 5 (the binomial coefficient C(50, 5)), and the number of ways of partitioning the whole class into 10 such groups is far larger, which makes finding the best solution through an exhaustive search infeasible in many cases.
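As a quick check of these magnitudes, the following sketch computes the binomial coefficient quoted above and, for comparison, the number of ways to split a class into unlabeled groups of equal size, N! / ((s!)^G · G!). The function names are illustrative, and the partition count assumes that the class size is an exact multiple of the group size.

```python
from math import comb, factorial


def ways_to_pick_one_group(n_students, group_size):
    """Binomial coefficient: ways to choose a single group."""
    return comb(n_students, group_size)


def ways_to_partition(n_students, group_size):
    """Ways to split all students into unlabeled groups of equal size."""
    if n_students % group_size:
        raise ValueError("n_students must be a multiple of group_size")
    n_groups = n_students // group_size
    return factorial(n_students) // (
        factorial(group_size) ** n_groups * factorial(n_groups)
    )


single_group = ways_to_pick_one_group(50, 5)
full_partitions = ways_to_partition(50, 5)
```

Here `single_group` equals the 2,118,760 figure from the text, while `full_partitions` is many orders of magnitude larger, which is what rules out exhaustive search.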
When considering these two aspects together, it is easy to intuit that this problem can be treated as a multiobjective optimization, where each objective consists of reaching the highest possible level of similarity between the average of each group concerning each attribute and the average of all the elements [63].
One way to approach this problem, as mentioned above, is through an exhaustive search, which, depending on the number of elements and groups, will not always be possible given the inherent computational limitations. In these cases, heuristic search methods can be a good alternative because, although these methods do not guarantee finding the optimal solution, they generally find a satisfactory one using considerably less computational effort. Among these methods, we can mention [64] local search, tabu search, and genetic algorithms, among others, the latter being the object of study in this proposal. As mentioned in Section 2.2, the specific case of the combinatorial problem of group formation solved through the genetic algorithm technique has been widely addressed by researchers, given its relevance when dealing with a large number of variables and its ability to quickly generate optimal solutions.
Taking into account the principles of genetic algorithms, as well as the nature of the problem of interest, the proposed method for group formation is described in detail in this section. The procedure is based on the work of Moreno et al. [2], who propose a method to group elements (not necessarily students) in a homogeneous way, using genetic algorithms.

Students Representation
Since the idea is to consider not only one but several characteristics of the students, each student n (1 ≤ n ≤ N) can be represented by a vector as follows, where M is the number of characteristics and C_{n,m} is the value of characteristic m for student n:

$$E_n = \{C_{n,1}, C_{n,2}, \ldots, C_{n,M}\} \tag{1}$$

These characteristics might be of a different nature, for example, demographic (age, sex, etc.), psychological (personality traits, skills, abilities, etc.), academic (qualifications, pretests, self-assessment, etc.), and cognitive (learning styles, intelligence types, etc.), among others.
This representation requires that any characteristic m (1 ≤ m ≤ M) be quantified by a numerical value in a predefined range, which does not mean that categorical attributes cannot be considered. In these cases, a prior numerical discretization process would be required. For example, if an attribute takes "high", "medium", and "low" values, these may be changed to 1, 2, and 3, respectively.
The complete set of students can be represented by an array of M × N, with M being the number of characteristics and N the number of students, as shown in Table 4. Once the data are organized in this way, they may need to be scaled to a common range so that no characteristic distorts the calculation and the values are easily comparable. An easy way to achieve this is to bring all data into the range 0-1 by applying unity-based statistical normalization [65], using the following formula:

$$X' = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{2}$$

where Xmax and Xmin are the maximum and minimum values of the corresponding dimension.
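The scaling of each characteristic to the range 0-1 using its minimum and maximum can be sketched as follows. The layout (one row per student, one column per characteristic, as in the table of BFI results) and the handling of a constant column are assumptions made for this illustration, and the numbers are invented.

```python
def min_max_scale(rows):
    """Scale every column of a list of equal-length rows to the range 0-1."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # Assumption: a constant column (max == min) is mapped to 0.0
            # to avoid dividing by zero.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Made-up raw values: three students, two characteristics.
raw = [[1, 10], [3, 20], [5, 40]]
scaled = min_max_scale(raw)
```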

Individual Representations
In the grouping case, an individual corresponds to a given collection of G groups, each with up to N/G students, with N being the total number of students. In most works that use genetic algorithms, the data structure used is a vector where each position corresponds to a gene of the solution. The proposed model instead uses a matrix, where the number of rows corresponds to the desired number of groups G and the number of columns corresponds to the maximum group size N/G. In this way, each gene that makes up the chromosome contains the identifier of an element, and its position within the matrix defines the group to which it would belong. This representation, in addition to the clarity it entails, facilitates the use of the genetic crossover operator proposed below.
In the group formation problem, as in other combinatorial problems, a chromosome cannot have repeated genes, meaning that an individual (feasible solution) is one in which each element occupies a single position in the chromosome. For example, if there are 20 students in total and 4 groups are to be formed, each group will contain exactly 5 students. In this case, a possible individual, if the students are numbered consecutively, could be like the one presented in Table 3.
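A minimal sketch of this matrix representation and of the no-repeated-genes restriction is given below. The function names are illustrative, students are assumed to be numbered 1..N, and N is assumed to be an exact multiple of G; the consecutive layout built here is just one feasible individual among many.

```python
def consecutive_individual(n_students, n_groups):
    """Build a G x (N/G) matrix: row g holds the identifiers of group g."""
    size = n_students // n_groups  # assumes N is divisible by G
    ids = list(range(1, n_students + 1))
    return [ids[g * size:(g + 1) * size] for g in range(n_groups)]


def is_feasible(individual, n_students):
    """True if every identifier 1..N appears in exactly one position."""
    genes = [gene for group in individual for gene in group]
    return sorted(genes) == list(range(1, n_students + 1))


individual = consecutive_individual(20, 4)

# Copying a gene into another position repeats one student and drops
# another, so the result is no longer a feasible individual.
broken = [list(group) for group in individual]
broken[1][0] = broken[0][0]
```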

Fitness Measurement
Since this method aims to obtain homogeneous/heterogeneous groups concerning all students, it is necessary to define a measure of this homogeneity/heterogeneity. One possible way to do this is described below. The average of each characteristic of the entire student (TM) is first calculated: Then for each g group (1 ≤ g ≤ G) of everyone, the average of each characteristic is calculated. Because each individual i is represented as a X i matrix, such averages (IM) can be represented as follows: Subsequently, the sum of the squared differences between the characteristics M for each group g of the individual i and the average of each characteristic is calculated in the entire elements as well: The lower this value (with a minimum of 0), the more similar each of the groups will be on average concerning the total number of students, for the case of homogeneous formation, and the higher this value, the less similar each of the groups will be on average concerning the total number of students, for the case of heterogeneous formation. Therefore, the objective function of the problem could be expressed as follows: For mixed formation, that is, heterogeneous for certain characteristics and homogeneous for others, the problem becomes one of multiobjective optimization: it is required to maximize the differences for the heterogeneous characteristics and at the same time minimize the differences for the homogeneous characteristics. Considering the above, a possible way to deal with this situation is described below.
Let HT and HM be the vectors of characteristic for which heterogeneity and homogeneity are considered, respectively, represented as follows: For the fitness measure, the sum of the squared differences between the J characteristics of heterogeneity for each group g of individual i and the average of each characteristic in all the elements is calculated, and the value obtained from the sum is subtracted from the differences squared between the K characteristics of homogeneity for each group g of individual i and the average of each characteristic in all the elements, all this in absolute value, thus: The greater the difference in objectives, the better heterogeneity the groups would have in the HT characteristics and the better homogeneity in the HM characteristics, simultaneously. The objective function can be expressed as follows: To clarify this metric and apply the concepts presented so far, the following example is considered where you have 6 students with the 5 dimensions of the "big five" model valued, as shown in Table 4. After scaling these values according to the procedure described at the end of the "students representation" section, Table 5 is obtained. Now we want to form two groups, each with three students. Two possible individuals are shown in Table 6.
Finally, calculating the fitness measures by applying (5) gives $D_1 = 0.2790$ and $D_2 = 0.3282$. It can be observed that the grouping represented by individual 1 is more inter-homogeneous than that of individual 2. In other words, with this distribution, the groups of individual 1 reflect the whole set of students more precisely when all the characteristics are considered together. Conversely, the grouping represented by individual 2 is more inter-heterogeneous than that of individual 1; that is, with this distribution, the groups of individual 2 show greater variability with respect to the whole set of students when all the characteristics are considered together.

Now suppose a mixed formation is desired that is homogeneous for C2 and C3 and, at the same time, heterogeneous for C1, C4, and C5. According to (7) and (8), HT = (C1, C4, C5) and HM = (C2, C3). Calculating the fitness measures by applying (9), it can be seen that the grouping represented by individual 1 is more inter-homogeneous for C2 and C3, and more inter-heterogeneous for C1, C4, and C5, than that of individual 2. That is, with this distribution, the groups of individual 1 more accurately reflect similarity to and variability from the whole set of students when homogeneity for C2 and C3 and heterogeneity for C1, C4, and C5 are sought simultaneously.
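As an illustration, the fitness measures referred to as (5) and (9) can be sketched in Python. This is a minimal sketch written from the verbal description in this section, not the authors' implementation. The function names, the list-based data layout, and the toy values are assumptions made for the example; in particular, the toy values are not those of Tables 4 to 6.

```python
from statistics import mean


def column_means(rows):
    """Average of each characteristic over a list of student vectors."""
    return [mean(col) for col in zip(*rows)]


def fitness(students, groups, chars=None):
    """Sum over groups of squared differences between the overall
    averages and the group averages, as described for measure (5).

    students: list of per-student characteristic vectors (already scaled).
    groups:   list of groups, each a list of student indices.
    chars:    indices of the characteristics to include (default: all).
    """
    total = column_means(students)  # averages over all students (TM)
    if chars is None:
        chars = range(len(total))
    d = 0.0
    for members in groups:
        group = column_means([students[s] for s in members])  # averages of one group (IM)
        d += sum((total[m] - group[m]) ** 2 for m in chars)
    return d


def mixed_fitness(students, groups, ht, hm):
    """Absolute difference between the heterogeneity sum (characteristics
    in `ht`) and the homogeneity sum (characteristics in `hm`), as
    described for measure (9)."""
    return abs(fitness(students, groups, ht) - fitness(students, groups, hm))


# Hypothetical toy data: 4 students, 2 scaled characteristics each.
toy_students = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
toy_groups = [[0, 2], [1, 3]]
print(fitness(toy_students, toy_groups))
print(mixed_fitness(toy_students, toy_groups, ht=[0], hm=[1]))
```

Passing the characteristic indices explicitly keeps one routine for the homogeneous, heterogeneous, and mixed cases; only the direction of optimization changes between them.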

Initial Population and Evolution
The example represented in Table 6 shows a trivial group formation: each student is assigned to a group according to his or her identifier. The first N/G students (in this case, 3) belong to group 1, the following N/G to group 2, and so on. While this formation is valid, the idea of the initial population is to randomly generate k individuals, using the matrix representation described above and complying with the restriction that each student must appear in one and only one position of the matrix.
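The generation of the initial population can be sketched as follows. This is a minimal sketch under stated assumptions: students are identified by the integers 1 to N, N is divisible by G (how leftover students are handled is not covered here), and the function names are hypothetical.

```python
import random


def trivial_individual(n_students, n_groups):
    """Assign students to groups in identifier order (first N/G, next N/G, ...)."""
    size = n_students // n_groups
    ids = list(range(1, n_students + 1))
    return [ids[g * size:(g + 1) * size] for g in range(n_groups)]


def random_individual(n_students, n_groups, rng):
    """Shuffle the identifiers and cut them into G rows of N/G students.

    Because the rows are slices of a permutation, each student appears
    in one and only one position of the matrix.
    """
    size = n_students // n_groups
    ids = list(range(1, n_students + 1))
    rng.shuffle(ids)
    return [ids[g * size:(g + 1) * size] for g in range(n_groups)]


def initial_population(k, n_students, n_groups, seed=None):
    """Randomly generate k individuals."""
    rng = random.Random(seed)
    return [random_individual(n_students, n_groups, rng) for _ in range(k)]


population = initial_population(k=4, n_students=6, n_groups=2, seed=0)
for individual in population:
    print(individual)
```

Building each individual from a shuffled permutation satisfies the uniqueness restriction by construction, so no repair step is needed after generation.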
Once the initial population is obtained, and according to the scheme represented in Figure 3, the evolution process is carried out: the population passes from one generation to the next through genetic operators until the desired fitness value is obtained or until a total of h generations is reached.
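The evolution loop with its two stopping conditions can be sketched as follows. The genetic operators of Figure 3 are not detailed in this section, so the operators used here (elitism, binary tournament selection, and a swap mutation that exchanges two students between groups) are assumptions chosen only because they preserve the one-student-one-position restriction. The function names and toy data are likewise hypothetical.

```python
import random


def evolve(population, fitness, h, target=None, minimize=True, seed=None):
    """Evolve `population` for at most `h` generations.

    population: list of individuals; each is a list of groups (lists of student ids).
    fitness:    callable mapping an individual to a number.
    target:     optional fitness value at which evolution stops early.
    """
    rng = random.Random(seed)
    pick = min if minimize else max

    def reached(value):
        if target is None:
            return False
        return value <= target if minimize else value >= target

    def swap_mutation(individual):
        # Exchange two students between two different groups, so every
        # student still appears exactly once (assumed operator).
        child = [list(group) for group in individual]
        g1, g2 = rng.sample(range(len(child)), 2)
        p1, p2 = rng.randrange(len(child[g1])), rng.randrange(len(child[g2]))
        child[g1][p1], child[g2][p2] = child[g2][p2], child[g1][p1]
        return child

    best = pick(population, key=fitness)
    for _ in range(h):
        if reached(fitness(best)):
            break
        offspring = [best]  # elitism: keep the best individual unchanged
        while len(offspring) < len(population):
            parent = pick(rng.sample(population, 2), key=fitness)  # binary tournament
            offspring.append(swap_mutation(parent))
        population = offspring
        best = pick(population, key=fitness)
    return best


# Toy usage with hypothetical data: one scaled characteristic per student.
scores = {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.8, 5: 0.9, 6: 1.0}
overall = sum(scores.values()) / len(scores)


def toy_fitness(individual):
    # Sum of squared differences between the overall mean and each group mean.
    return sum((overall - sum(scores[s] for s in g) / len(g)) ** 2 for g in individual)


start = [[[1, 2, 3], [4, 5, 6]], [[1, 2, 4], [3, 5, 6]]]
best = evolve(start, toy_fitness, h=100, seed=1)
print(best, toy_fitness(best))
```

With elitism, the best fitness in the population cannot get worse from one generation to the next, which makes the generation limit h a safe fallback when the target value is never reached.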

Empirical Design
The research process followed an empirical design based on an experiment [66], as shown in Figure 6, seeking to test the following hypotheses: H0: the collaborative performance of the students does not improve when their personality traits are considered in forming working groups (null hypothesis); H1: the collaborative performance of the students improves when their personality traits are considered in forming working groups (research hypothesis). The experiment consisted of a collaborative learning activity carried out with 238 students from 14 programming and related courses belonging to the Systems Engineering programs at the University of Nariño, Mariana University, and CESMAG University in the city of San Juan de Pasto, Colombia, in the academic semesters B-2019 and A-2020. It is worth mentioning that for the A-2020 period only two courses were available, both delivered in virtual modality because of the COVID-19 situation.
The courses indexed from 1 to 10 were the experimental groups, to which the experimental treatment (Xi) was applied: homogeneous, heterogeneous, and mixed formation of the groups required for the collaborative activity, using the proposed strategy. The courses indexed from 11 to 14 were the control groups, to which the experimental treatment was not applied; the groups required for the activity were formed by student preference. Table 7 summarizes this characterization. In turn, Oi and Oj were the posttests applied at the end of the experiment to both the experimental and the control groups. They consisted of the individual and group completion of the instrument described in Section 2.2 and sought to determine the effect of the experimental treatment on the collaborative performance of the participants. In the mixed formations, for the functional validation of the strategy, one dimension of the big five model was considered for homogeneity in each case, namely openness (O), neuroticism (N), or conscientiousness (C), and the remaining dimensions for heterogeneity. These dimensions were chosen for homogeneity because the evidence suggests that they may be relevant to academic success in higher education [67].

Class Session
Before presenting the results obtained in the research process, it is useful to show how a class session would be conducted when groups are to be formed with the proposed strategy. To do this, the teacher, taking into account the methodological scheme presented in Figure 4, performs the steps indicated in the activity diagram shown in Figure 7, which comprises the following activities:
1. Preparation of students: The teacher in charge must explain to the students why and for what purpose the whole process is going to be carried out. Students should be advised to answer the entire BFI as honestly as possible, since there are no "right" or "wrong" answers, and it should be made clear that the results provide a general personality profile of the participants, which will be used only as input for the grouping process. As mentioned above, it is recommended to keep a record of this by having each student fill out an "informed consent" form, which accepts the use of the results for a specific purpose, in this case, group formation as a didactic strategy. The form can be filled out on paper or electronically.
2. BFI application: Once the students have been prepared, the "Spanish BFI" is applied. The questionnaire can be administered in printed form or digitally through a PDF form, a spreadsheet, or a software application specifically designed for this purpose. Students can do this activity at home or in a class session designated for this purpose, reserving 10-15 min for filling out the instrument.
3. Obtaining and tabulating results: After the questionnaires have been applied, the teacher obtains each student's score in each of the dimensions. As in the previous task, the results can be obtained manually or automatically through a computational tool.
At this point, it is important to clarify that these first three activities, corresponding to the measurement of the students' personality traits, are carried out once in the academic period (for the case under study, once a semester), since the group generally remains stable throughout the period, and the results can be used as many times as desired for the formation of new working groups, in terms of members or number of these.

4. Group formation: The teacher supplies the measurements of the students' personality traits to the computational tool that implements the algorithmic proposal, which produces the "ideal" formation of the groups (homogeneous, heterogeneous, or mixed), taking into account the participants' personality traits.
5. Assignment of the collaborative activity: Considering the curricular planning of the course, the teacher assigns the class the corresponding collaborative activity, establishing the execution and evaluation rules and informing the students how the different working groups were formed.
6. Development of the collaborative activity: The students in the different working groups develop the activity proposed by the teacher under the teacher's guidance and supervision.
7. Evaluation of the collaborative activity: Once the students have finished the activity, the teacher evaluates it, taking into account the established evaluation rubric.

Table 8 describes the roles, the fulfillment, and the duration of each of the tasks that were carried out to support the different themes through the strategy proposed in the experimental courses listed in Table 7.

Results
To comprehensively evaluate the collaborative performance achieved by the students participating in this study, the instrument described in Section 2.2, implemented as a "survey" type activity for Moodle™, was applied to 10 experimental groups (with 58 working groups) and 4 control groups (with 16 working groups). The "working group" is considered the unit of experimentation; in total, there were 74 working groups, each made up of 3 or 4 students.
The working groups developed collaborative activities within the framework of their respective experimentation groups (experimental and control) for each of the courses (see Table 7). The students carried out one collaborative work per course, which was evaluated on the basis of the instrument filled out by each group. These works correspond to practical cases of collaborative programming. In all courses, the group activity was treated as a non-face-to-face task, and the instruments were filled out outside of school hours.
Before analyzing the data, the reliability of the results was verified through the calculation of Cronbach's α, obtaining a value of 0.875, classified as a high level of reliability. This value supports the reliability of the results discussed below.
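For reference, Cronbach's α for an instrument with k items is commonly computed as k/(k − 1) · (1 − Σ item variances / variance of the total scores). The following is a minimal sketch of that standard formula; the function name and the toy responses are assumptions for illustration and are unrelated to the data of this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance(item) for item in zip(*responses)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of 5 respondents to a 3-item scale.
toy_responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(toy_responses), 3))
```

The function assumes complete responses and at least two items; it raises an error if all total scores are identical, since the total variance would then be zero.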
However, Cronbach's α by itself does not prove the validity of the instrument used. The conceptual coherence of the instrument was validated by observing whether the results it provides are consistent with the theory about collaborative work and the influence of personality in programming and software development activities. Specifically, and as will be seen later, the experimental groups of each type of grouping (homogeneous, heterogeneous, and mixed) were contrasted with the control groups. Table 9 and Figure 8 show the overall results of the working groups. Their aim is to summarize the results of all groups, but their analysis at a general level is not significant for this study. In practice, what is interesting about the experience is the particular analysis of each group and its subsequent feedback. Concerning the overall results of the working groups, it can be observed that operational processes 4 and 6 obtain the highest average scores, which implies that internal communication is adequate and effective and that collaboration is present in the groups to achieve the established objectives. On the other hand, operational process 7 obtained the lowest score; this is a key factor to take into account when managing collaborative working groups [68,69], regardless of how they are composed, since undefined leadership in the groups can affect their performance, especially when it is not managed well [70]. A fairly uniform behavior is also observed between the groups (little variability in the data).
The mean scores of the experimental groups were contrasted with those of the control groups, seeking to verify whether the proposed strategy brings a general improvement in the collaborative performance of the working groups, compared with the self-formation of groups traditionally used by teachers and their students, when developing a collaborative activity. Table 10 summarizes the mean scores of the experimentation groups (experimental and control) in each of the operational processes and for each of the types of formation, followed by the corresponding plots (see Figure 9). Figure 10 contrasts the experimental groups by type of grouping. These behaviors highlight the following: Those groups whose formation was pre-established by the teachers (experimental) through the proposed strategy generally had a better collaborative performance than the groups with traditional formation (control), except in the case of heterogeneous formation. The mixed formations obtained the best scores. The lowest rated process in all relationships is "conflict management". This is an aspect to improve regardless of the type of grouping, since conflict within the group can affect its performance [70].
"Leadership" is the process that evolved the most in the different relationships, except in the one that considers the mixed formation privileging the dimension of neuroticism (N), that is, homogeneous formation for neuroticism and heterogeneous formation for the other dimensions. This is explained by the fact that emotional stability and high levels of emotional adjustment, the opposite of neuroticism, predict leadership potential in a person, whereas neurotic people have a low probability of showing leadership characteristics, even more so when working together [71].
"Internal communication" is the process that always excels in control groups. Allowing students to choose their classmates or group members makes them feel more comfortable and motivated to work, allowing communication within the group to flow in a better way [8]. The processes of "participation/decision-making", "problem resolution", "external communication" and "collaboration" score better in the experimental groups, except in the case of heterogeneous formation, since students may feel uncomfortable with the diversity of opinions and the possible tensions arising from disagreements, situations that tend to occur in this type of grouping [8].
The mixed formations that favor the dimensions of openness (O) and conscientiousness (C) share the best average score for all operational processes. This is consistent with studies stating that these dimensions of the big five model are among the most representative in programmers and software developers, both in their individual and group work [72][73][74][75][76][77][78]. For its part, as mentioned above, the mixed formation that favors the neuroticism dimension (N) obtained the lowest average score, which is likewise consistent with related studies [49,72,73,77,[79][80][81].

Discussion
This work aimed to propose a strategy based on a genetic algorithm approach that allows obtaining optimal collaborative learning groups, considering the students' personality traits as a grouping criterion, addressing the different needs for group formation: homogeneous, heterogeneous, and mixed.
In practice, a group formation strategy like this can be particularly useful for teachers just beginning to understand the unique abilities of their students or when they want to consider more complex factors for such formation; it can also be useful for distance education teachers who need to form "virtual groups" without the benefits of face-to-face meetings; outside of academia, for example, software managers may find it useful to form development teams, although they would need to be very specific in the use of psychological variables. On the other hand, it may be less useful for teachers who know their students well enough to develop their own strategies for creating successful small learning groups.
The algorithm is flexible regarding the number and type of attributes of the individuals to be grouped. Depending on the context, different sets of attributes can be used to form the groups. In this case, the five personality dimensions of the big five model were used as characteristics or attributes. Other attributes that can be quantified and used with the algorithm include learning styles, personal skills, levels of knowledge, and team roles, among others [1,25], which opens a wide range of research possibilities in the field of group formation in collaborative learning scenarios.
The proposed strategy also entails some ethical implications that must be taken into account when replicating the experience, since, as mentioned throughout the paper, it involves to some extent a "classification" of the students by their personality traits, assessed through a psychometric test. Among these implications, it is worth highlighting that "informed consent" must be obtained from the participating students; the psychometric test to be used must be properly authorized for use; the teacher becomes the custodian of the privacy and confidentiality of the data even though no psychological assessments are issued; and it should be made clear to students that test results, whether personal or group, are not communicated.
Likewise, since the computational implementation involves a "genetic allocation algorithm" with highly stochastic elements and programmed by humans, it may not necessarily lead to the "perfect solution" in the real context, because a certain degree of bias may be present. It is therefore important that the group formation process also consider the teacher's perception of the personal characteristics of the students, especially in situations where it is evident that the algorithm exerts a high degree of subjectivity or discrimination. This is proposed as an ethical safeguard against the possible consequences of the diffusion of the model and its eventual adoption as a standard.
Given that the context of the application of the study is framed in academic scenarios, which by their nature are dynamic, situations may arise that threaten its internal validity, and this affects the results in experimentation. According to the classification made by Kirk [66], specifically for the study, some of these situations are: external events occur during the experiment and influence only some students (history); the students change or mature during the experiment (maturation); the students abandon the experiment (experimental mortality); students from different groups communicate with each other (treatments diffusion); students from control groups perceive that they receive nothing and this discourages them (compensation); the teacher's behavior has a positive or negative influence on the experiment (experimenter behavior); and, the environmental conditions of the experiment are not the same for all participating groups (experimental environment).
For its part, the main threat to the external validity of the results of the study lies in the size and context of the sample, 238 students of programming courses and related courses of the systems engineering programs of the City of San Juan de Pasto-Colombia, presenting limitations in terms of the number of courses available for experimentation by type of grouping (homogeneous, heterogeneous, mixed or by student preference), and in terms of population diversity, which somewhat restricts the generalization of the findings.
Taking into account that the processes for measuring personality traits and collaborative performance assessment require the filling out of questionnaires, a response bias could occur if this activity is not carried out conscientiously by the students. In the same way, experimenting with different teachers in different courses could cause a procedural bias if the criteria for experimentation are not previously unified.

Conclusions and Further Work
The work presented in this paper aims to be a relevant contribution in the context of collaborative learning since the research focuses on a key element: the formation of learning groups. The main objective is to obtain homogeneous or heterogeneous groups, as similar or different as possible to the general characteristics (personality traits) of the total sample of students.
Although in this work the experimentation was limited to a few characteristics (the five dimensions of the big five model), the proposed strategy allows working with an unlimited number of characteristics of different nature, so it can be considered generic. To achieve this goal, the method uses a matrix representation for each student and then translates the grouping problem into a multiobjective optimization problem under a combinatorial scenario, which is solved with an approach based on genetic algorithms. This algorithm defines the fitness value from the squared differences between the characteristic averages of the entire sample and those of each group, and it uses an iterative process based on the principles of the evolution of species, in which different solutions are obtained in search of the optimal value. From the pedagogical point of view, it is confirmed that not considering criteria related to personality traits before the group formation has generally led to lower results. Taking personality traits into account in the group formation process enhances collaborative performance in software development activities [78,[82][83][84][85][86][87][88][89][90]. In this respect, the highlight of the study is that those groups whose formation was pre-established by the teachers through the proposed strategy have, in general, had a better collaborative performance than groups with traditional formation, except in the case of heterogeneous formation.
The experimental data collected shows the suitability of the instrument used to measure collaborative performance in terms of its representativeness and objectivity. The results observed in real working groups created and formed according to the proposed strategy are consistent with what could be expected.
It is important to clarify that none of the working groups had previous experience and/or training in structured and well-defined collaborative work. Studies affirm that later experiences and/or previous training in collaborative work would lead to a more significant improvement in the group performance [91,92].
Taking into account the latter, in the future, it is suggested to extend the described experiment by incorporating a pretest, which would allow, in the first place, to verify the hypothesis that later experiences and/or previous training lead to an evolution in the collaborative performance of participants; and, second, as part of the pretest, an assessment of academic performance could be carried out to later be contrasted with the corresponding posttest, in search of evolution or involution in the academic performance of the participants.
It is also proposed to assess in each context, again through controlled experiments, which type of approach is the most appropriate for organizing the groups (homogeneous, heterogeneous, or mixed), bearing in mind that a strategy and a tool now exist that facilitate and automate this work.
In addition, it is suggested to explore which specific dimensions of personality most directly influence the learning of programming and software engineering in general. On this basis, a recommendation system could eventually be proposed that "recommends" the "ideal" formation of a working group, taking into account the specific characteristics of the students and the proposed collaborative activities.
Given that the application of the strategy would entail an additional effort for both teachers and students, it is also proposed as future work to evaluate this "amount of additional effort" and whether, or under what circumstances, it is worth it.

Institutional Review Board Statement: Ethical review and approval were waived for this study because all participant students were informed about the research's purpose and scope and were ensured their anonymity.
Informed Consent Statement: Informed consent was obtained from all students involved in the study.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.