Fuzzy Multicriteria Models for Decision Making in Gamification

Gamification is an innovative teaching technique that may prove hugely beneficial when properly used. For this reason, since 2002, the number of situations in which gamification is used has increased exponentially. This large number of options makes it difficult to choose the best application, especially under the uncertainty that real-life decision making usually involves. To address this problem, this study creates two models, one using a fuzzy analytic hierarchy process (AHP) and the other combining fuzzy AHP with the measuring attractiveness by a categorical-based evaluation technique (MACBETH) approach, to choose the best gamification application for the 'Operations Management' course in the Masters in Industrial Engineering. This is the first contribution in the literature to combine fuzzy AHP and MACBETH. The decision centre was the lecturer who teaches the course. There is no precedent in the literature for using fuzzy logic to choose the best gamification application for a course. The results show that Socrative is the best gamification application for this Masters course; if the models were applied to undergraduate degree courses, Quizizz would be the better choice, and all the more clearly the earlier in the degree programme the course is taught.


Introduction
Unlike the traditional teaching model based on lectures received passively by students, recent trends in education recommend active and participative methodologies [1]. In this regard, gamification stands out as a tool that adapts the mechanics and strategies of games and videogames to education or other non-gaming environments, with the aim of altering people's behaviour [2] and encouraging certain actions. According to Contreras and Egia [3], however, it is an improvement process that can provide the experience of a game and serve as support for the users' activities. The gamification of learning is therefore an educational approach designed to motivate students to learn by using video game design and game elements in learning environments.
The benefits of using gamification in teaching are many [4][5][6][7]: it increases the involvement and motivation of participants; combines theory and practice, making learning easier to assimilate; provides immediate feedback; turns mistakes into an incentive to try again; offers continuous intellectual activity, as students are constantly interacting with the computer; makes learning difficult concepts and material fun; has a high degree of interdisciplinarity; develops skills for seeking and choosing information; improves group dynamics; fosters a healthy spirit of competition; increases creativity; encourages individual, rigorous and methodical work; develops digital skills; assists in problem solving; allows simulations to be visualised; encourages interaction between participants; increases interest in participating in class; increases the channels of communication between teacher and student; and drives connectivity and interoperability. Many applications support gamification, including ClassDojo, Class Realm, CodeCombat, Coursera, Duolingo, Edmodo Gamificación, Gimkit, Goose Chase, Knowre, Kahoot!, keySkillset, Khan Academy, Maven, Memrise, Minecraft: Education Edition, Pear Deck, Play Brighter, Quizizz, Quizlet, TEDEd, The World Peace Game, Tinycards, Trivinet, Toovari, SoloLearn, Udemy, Yousician and Virtonomics [29][30][31][32], and this number is expected to rise in the future [2]. Although Acuña [9] states that the five best applications for universities are FlipQuiz, Quizizz, Socrative, Kahoot! and uLearn Play, the large number of applications makes it difficult to choose the best one for any given course. Some statistical comparisons between applications have been undertaken; for example, Göksün and Gürsoy [33] report that educational activities with Quizizz are less effective than those with Kahoot! with respect to academic achievement.
However, a literature review of several databases (Emerald, Hindawi, ProQuest, ScienceDirect and Scopus) using the search terms "gamification multicriteria" and "education game multicriteria" found only one precedent, [34], which develops a model based on the analytic hierarchy process (AHP) for selecting a gamification solution in the business field. That research, however, focuses on the characteristics of the vendor (supplier credibility, product competitiveness, contract terms, etc.) and is therefore closer to choosing software of any kind than to choosing gamification software for teaching. No model has therefore been found to help teachers choose the best gamification application for a given class, either at university or at lower levels. This study therefore develops two models, one using fuzzy AHP and another combining fuzzy AHP with the measuring attractiveness by a categorical-based evaluation technique (MACBETH) approach, to choose a gamification application for the Operations Management course taught at the Industrial Engineering School of Ciudad Real, part of the University of Castilla-La Mancha. The decision maker was the lecturer who teaches the course. There is no precedent in the literature for using fuzzy logic to choose the best gamification application for a course. Fuzzy logic allows the uncertainty, ambiguity, doubt and indecision typical of real-world decision making to be taken into account. Fuzzy AHP was chosen over other fuzzy techniques because previous studies have shown it to be effective in many real-life problems [35].
Fuzzy AHP is combined with the MACBETH approach because MACBETH provides additional tools for handling ambiguous, imprecise or inadequate information, or situations in which precise values cannot be given (Roubens [36] acknowledges this advantage in the assessment of fuzzy measures), and it can be combined with fuzzy AHP.
MACBETH was also chosen over other multicriteria techniques for its ability to build quantitative measurement scales from qualitative judgements [37]. This is the first contribution in the literature to combine these two multicriteria techniques. MACBETH has previously been combined with other multicriteria decision analysis techniques: Pishdar et al. [38] combine it with the best-worst method to classify the 19 international airports in Iran and determine the most suitable hub airport, while Gürbüz et al. [39] combine the analytic network process, the Choquet integral and MACBETH to assess enterprise resource planning (ERP) alternatives.
Ertay et al. [40] compare the models built by MACBETH and fuzzy AHP to evaluate renewable energy alternatives such as solar, wind, hydropower and geothermal energy. Along similar lines, Ferreira and Santos [41] compare models built by AHP, Delphi and MACBETH in an application of trade-off readjustments in the credit risk analysis of mortgage loans, finding that AHP and MACBETH are very similar in accuracy and that both perform better than Delphi.
Although Dhouib [42] developed an extension of MACBETH for the fuzzy environment, using the approach of Herrera and Martínez [43], in an application to assess options in reverse logistics for tyre waste, the combination of methodologies used in this study includes ambiguity and imprecise information in equal measure, in a way that is easier to apply.
The gamification applications assessed in the models are Kahoot!, Quizizz and Socrative, since they include the most commonly used tools at present, Kahoot! and Socrative [44], and they are among the very few whose free version does not restrict the number of users or questions [45]. The number of alternatives is also limited because experts cannot discuss a large number of alternatives at the same time [46].
The layout of this paper is as follows. Firstly, the methodology of the fuzzy analytic hierarchy process is set out. Then, the model built via the fuzzy analytic hierarchy process is described, including the structure of the model with the criteria, decision matrices and the intermediate values. Next, the model built by combining fuzzy AHP and the MACBETH approach is described. Finally come the results from both models, which include the sensitivity analysis, conclusions and future development.

Fuzzy Analytic Hierarchy Process Methodology
AHP is a well-known multicriteria technique whose essential elements can be found in [47] and [48]. It is widely recognised that, for a complex problem, human judgements are expressed linguistically, with imprecise patterns [49]. To deal with the uncertainty caused by the imprecision or vagueness of human judgements, Zadeh [50] first introduced fuzzy set theory. Since linguistic terms only approximate the subjective judgements of decision makers, fuzzy numbers such as trapezoidal, interval and type-2 fuzzy numbers are used to reflect the vagueness of these terms, although triangular fuzzy numbers are the most widely used in the literature because they are intuitive and computationally efficient.
A triangular fuzzy number comprises three numbers, $\tilde{M} = (l, m, u)$, where l is the smallest value, m the modal value and u the largest value used in defining uncertain judgements [51], with l ≤ m ≤ u; when l = m = u, the number is crisp. A triangular fuzzy number is characterised by a continuous membership function that assigns each real number a value in the interval [0, 1] [52]. The membership function $\mu_{\tilde{M}}(x)$ of $\tilde{M}$ is given by Equation (1) [53]:

$$
\mu_{\tilde{M}}(x) =
\begin{cases}
0, & x < l,\\
\dfrac{x - l}{m - l}, & l \le x \le m,\\
\dfrac{u - x}{u - m}, & m \le x \le u,\\
0, & x > u.
\end{cases}
\qquad (1)
$$

The closer $\mu_{\tilde{M}}(x)$ is to 1, the more x belongs to the fuzzy set $\tilde{M}$ [54].
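To make Equation (1) concrete, the Python sketch below implements a triangular fuzzy number and its membership function. The class name `TFN` is hypothetical. The addition, multiplication and inverse methods use the approximate component-wise arithmetic commonly adopted in fuzzy AHP for positive triangular numbers. This is an assumption about the operations the study relies on, not a statement of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (l, m, u) with l <= m <= u."""
    l: float
    m: float
    u: float

    def __post_init__(self):
        if not (self.l <= self.m <= self.u):
            raise ValueError("a triangular fuzzy number requires l <= m <= u")

    def membership(self, x: float) -> float:
        """Degree to which x belongs to the fuzzy set, following Equation (1)."""
        if x < self.l or x > self.u:
            return 0.0
        if x == self.m:
            return 1.0
        if x < self.m:  # rising branch; m > l here, so no division by zero
            return (x - self.l) / (self.m - self.l)
        return (self.u - x) / (self.u - self.m)  # falling branch; u > m here

    # Approximate arithmetic for positive TFNs (assumed, as commonly used in fuzzy AHP).
    def __add__(self, other: "TFN") -> "TFN":
        return TFN(self.l + other.l, self.m + other.m, self.u + other.u)

    def __mul__(self, other: "TFN") -> "TFN":
        return TFN(self.l * other.l, self.m * other.m, self.u * other.u)

    def inverse(self) -> "TFN":
        """(l, m, u)^-1 = (1/u, 1/m, 1/l): the bounds swap places."""
        if self.l <= 0:
            raise ValueError("the inverse is defined here only for positive TFNs")
        return TFN(1 / self.u, 1 / self.m, 1 / self.l)


a = TFN(2, 3, 4)
print(a.membership(3.5))  # 0.5
print(a + TFN(1, 1, 1))   # TFN(l=3, m=4, u=5)
```

A crisp number such as TFN(3, 3, 3) has membership 1 at 3 and 0 everywhere else, matching the l = m = u case described above.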

Operations
Fuzzy AHP has been widely applied in the literature, as shown by the review by Kubler et al. [55] of 190 papers published between 2004 and 2016: applications are predominantly in the manufacturing, industry and government sectors, and a high percentage of these studies combine fuzzy AHP with other tools, as our own study does. However, none of them combines it with the MACBETH approach. Further applications of fuzzy AHP are described in Calabrese et al. [56] and Turskis et al. [57].
Several fuzzy AHP methodologies have been proposed in the literature to calculate the relative weights of the criteria: logarithmic least squares [58], geometric mean [59], synthetic extent analysis [60], fuzzy least-squares priority [61], direct fuzzification of $\lambda_{max}$ [62], fuzzy preference programming [63] and two-stage logarithmic goal programming [64]. Although Chang's extent analysis is widely used in the literature, it gives less accurate results [65]. The geometric mean method suggested by Buckley [59] is therefore used in this study, as it is easier to apply and understand than other methodologies [66] and guarantees a unique solution.
The procedure for applying Buckley's geometric mean methodology is as follows:

1. Choose the decision maker: an individual or a group of experts who know the problem and can provide information, judgements and the necessary validation of the results obtained from the model.

2. Build the hierarchical structure. Choose and define the criteria and subcriteria relevant to the problem and structure them into a hierarchical tree. The objective of the problem is at the top of the hierarchy, the criteria and subcriteria occupy the following levels, and the alternatives are placed at the bottom level.

3. Select the fuzzy scale. The original scale proposed by Saaty, in which each judgement is associated with an integer from 1 to 9 or its reciprocal, does not capture the uncertainty (doubt, vagueness, hesitancy or ambiguity) [67] that characterises real-world decision problems. Moreover, decision makers sometimes feel more confident giving interval judgements than crisp ones [68]. Various fuzzy scales have been proposed in the literature [67, 69, 70, 71]. Of these, the scale given by Lamata [72], shown in Table 1, was chosen for this study because it corresponds more closely to Saaty's original scale for crisp AHP. An example of the application of this scale can be found in Koulinas et al. [73].

[Table 1. Fuzzy scale (after Lamata [72]). Only the final rows survive extraction: 8, fuzzy number (7, 8, 9), reciprocal (1/9, 1/8, 1/7); Extremely more important, 9, fuzzy number (8, 9, 9), reciprocal (1/9, 1/9, 1/8).]

4. Build the fuzzy judgement matrices for the entire hierarchy. The individual decision maker or the decision group gives fuzzy judgements when comparing criteria, subcriteria or alternatives at the same level of the hierarchy, and when comparing the scale levels of each criterion or subcriterion. The elements of the fuzzy pairwise comparison matrix $\tilde{A}$ are the fuzzy values $\tilde{M}_{ij}$, which express, using the fuzzy scale of Table 1, the decision maker's judgement about the relative importance of element i over element j.

5. Calculate the fuzzy weights. The geometric mean method [59] is used to obtain the fuzzy weight of each criterion or subcriterion via Equations (3) and (4):

$$\tilde{r}_i = \left( \tilde{M}_{i1} \otimes \tilde{M}_{i2} \otimes \cdots \otimes \tilde{M}_{in} \right)^{1/n} \qquad (3)$$

$$\tilde{w}_i = \tilde{r}_i \otimes \left( \tilde{r}_1 \oplus \tilde{r}_2 \oplus \cdots \oplus \tilde{r}_n \right)^{-1} \qquad (4)$$

where $\tilde{M}_{ij}$ is the fuzzy comparison value of criterion i with respect to criterion j, $\tilde{r}_i$ is the geometric mean of the fuzzy comparison values of criterion i with respect to each criterion, and $\tilde{w}_i$ is the fuzzy weight of criterion i.

6. Defuzzify. The fuzzy weights $\tilde{w}_i$ must be defuzzified by finding the best non-fuzzy performance (BNP) value [74]. Different methods can be used for this: mean of maximum (MOM), centre of area (COA) and α-cut. The COA or centroid method is simple and practical and does not require bringing in the preferences of any assessors [75], so it is the method applied in this research. Equation (5) is applied to each $\tilde{w}_i = (l_i, m_i, u_i)$ obtained from Equation (4) [74,76]:

$$BNP_{w_i} = \frac{(u_i - l_i) + (m_i - l_i)}{3} + l_i \qquad (5)$$
7. Evaluate the consistency of the judgements. Let $\tilde{R} = [\tilde{r}_{ij}]$ be a fuzzy judgement matrix comprising triangular fuzzy numbers. Saaty [47] defined the consistency ratio (CR) to quantify the consistency of the judgements given in each pairwise comparison matrix (Equation (7)):

$$CR = \frac{CI}{RCI} \qquad (7)$$

$$CI = \frac{\lambda_{max} - n}{n - 1} \qquad (8)$$

where CI is the consistency index given by Equation (8), $\lambda_{max}$ is the maximum eigenvalue of the matrix and n its dimension. The random consistency index (RCI) is obtained by simulation from random matrices of the same dimension as the matrix being assessed.

The judgements provided by the decision maker are considered consistent if CR is less than 0.05 for a 3 × 3 matrix, 0.08 for a 4 × 4 matrix, or 0.1 for matrices larger than 4 × 4 [77]. If the CR exceeds these thresholds, the judgements in the pairwise comparison matrix should be discarded and elicited afresh.
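Steps 5 to 7 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, fuzzy products and sums use the approximate component-wise arithmetic for positive triangular numbers, the consistency check is run on the modal values m_ij (as is done for the criteria matrix in Section 3.2), and the RCI values are Saaty's commonly tabulated ones.

```python
import math

import numpy as np

# Saaty's commonly tabulated random consistency indices (assumed values).
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
       7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def buckley_weights(matrix):
    """Buckley's geometric mean method for a fuzzy pairwise comparison matrix.

    matrix[i][j] is the triangular judgement (l, m, u) of element i over j.
    Returns the fuzzy weights (Eqs. 3-4), their centroid BNP values (Eq. 5)
    and the normalised crisp weights.
    """
    n = len(matrix)
    # Eq. (3): component-wise geometric mean of row i.
    r = [tuple(math.prod(cell[k] for cell in row) ** (1 / n) for k in range(3))
         for row in matrix]
    # (r_1 + ... + r_n)^-1: inverting a TFN swaps its lower and upper bounds.
    total = [sum(ri[k] for ri in r) for k in range(3)]
    inv_total = (1 / total[2], 1 / total[1], 1 / total[0])
    # Eq. (4): w_i = r_i (x) (sum of r)^-1, multiplied component-wise.
    fuzzy_w = [tuple(ri[k] * inv_total[k] for k in range(3)) for ri in r]
    # Eq. (5): centroid defuzzification, BNP = [(u - l) + (m - l)] / 3 + l.
    bnp = [((u - l) + (m - l)) / 3 + l for l, m, u in fuzzy_w]
    total_bnp = sum(bnp)
    return fuzzy_w, bnp, [b / total_bnp for b in bnp]


def consistency_ratio(matrix):
    """CR = CI / RCI (Eqs. 7-8), computed on the modal values m_ij."""
    a = np.array([[cell[1] for cell in row] for row in matrix], dtype=float)
    n = a.shape[0]
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    lam_max = np.max(np.linalg.eigvals(a).real)
    ci = (lam_max - n) / (n - 1)
    return ci / RCI[n]


def cr_threshold(n):
    """Acceptance thresholds quoted above: 0.05 (3x3), 0.08 (4x4), 0.1 otherwise."""
    return {3: 0.05, 4: 0.08}.get(n, 0.10)


one = (1, 1, 1)
# Hypothetical judgements for three criteria; entries below the diagonal are reciprocals.
example = [[one, (2, 3, 4), (4, 5, 6)],
           [(1 / 4, 1 / 3, 1 / 2), one, (1, 2, 3)],
           [(1 / 6, 1 / 5, 1 / 4), (1 / 3, 1 / 2, 1), one]]
_, _, crisp = buckley_weights(example)
cr = consistency_ratio(example)
print([round(w, 3) for w in crisp], round(cr, 3), cr < cr_threshold(3))
```

Normalising the BNP values after Equation (5) is what makes the crisp weights sum to 1; the same normalisation is mentioned when the criteria weights are reported in Section 3.2.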

Fuzzy Analytic Hierarchy Process Model for Decision Making in Gamification
This section firstly describes the structuring process that allows the hierarchy structure of the problem to be built; then the weighting process is set out, with the calculations associated with the application of the geometric mean method, and finally the weighting process for the scale levels of each descriptor is described.

Structuring
The criteria used in this study are original and specific to the Business Management course in the Masters programme in Industrial Engineering, but could be used or adapted for use in other subjects and courses.
It is necessary to define a descriptor for each criterion or subcriterion to be assessed. A descriptor is an ordered set of possible levels of function or efficiency in a criterion to describe the impacts of the alternatives with respect to each criterion objectively [78]. The definition of the descriptor will facilitate the final assessment process for the alternatives with respect to each criterion and means that the criteria need not be assessed subjectively, and no information about the criterion is lost during the process.
The descriptors may be direct or indirect. In a direct descriptor, the scale levels directly reflect effects, while the levels of an indirect descriptor indicate causes. When the criterion is subjective or intangible in nature, or is made up of a number of interrelated elements, or when there are information gaps, there are no direct or indirect descriptors that can be defined, and so a constructed descriptor is used. The scale levels of a constructed descriptor may be quantitative, qualitative, or mixed, and may be produced from the description of the possible consequences of a scenario involving several elements [79]. All the descriptors described in this study are constructed.
After analysing the literature on gamification [2, 9, 12, 13, 14, 26, 34, 80] and considering the direct experience of applying the assessed alternatives in the classroom, the following decision criteria were established:
• Learning rhythm (LRH). Learning rhythm is the speed at which a person can learn; fast, medium and slow rhythms can be identified. It is essential that the application allows the rhythm of the activities, and thus of the learning, to be controlled. This criterion therefore uses the descriptor Capacity of the Teacher to Control the Rhythm, by setting a time limit, allowing unlimited time, and repeating the test as often as necessary. The scale levels of the descriptor, in decreasing order of performance, are:
  - L21. Can be controlled by the teacher, who can either set a time limit or let each student complete the questionnaire at their own rhythm, without a time limit and as many times as they wish (with no need to wait for other people's answers). (Good)
  - L22. Can be controlled by the teacher, who can either set a time limit or let each student complete the questionnaire at their own rhythm, without a time limit but only once (with no need to wait for other people's answers). (Neutral)
  - L23. A time limit must be set for solving each question, and it cannot exceed 15 min.
  - L24. A time limit must be set for solving each question, and it cannot exceed 120 s.
• Assessment of the questionnaire (AQU). The descriptor used to evaluate this criterion is the versatility in assigning a score to each question and in calculating the time taken to respond.
• Quality of the question library (QQL). The descriptor used to assess this criterion is the number of publicly available questionnaires, the capacity to share, duplicate or edit them, and the strength of the forum for exchanging experiences and information. It has the following scale levels:
  - L71. It has a library with more than two million publicly available questionnaires that can be used. The questionnaires can be shared, duplicated and edited. It has an active Internet forum for exchanging experience and information. (Good)
  - L72. It has a library with up to half a million publicly available questionnaires that can be used. The questionnaires can be shared and duplicated, but not edited. There are no forums with significant information on the application. (Neutral)
  - L73. There is no library, or the resources of other users are not available. Questionnaires cannot be shared, duplicated or edited. There are no forums with significant information on the application.

Because the Business Management course has a small number of students, usually about 25, the decision maker did not consider team competition a suitable criterion: he wanted students to be assessed individually, although they may consult their classmates about the questions while the tests are being carried out, so the application is used more as a learning tool than as an assessment tool. In addition, only alternatives with a free version were assessed, as these are the only ones applicable to the course, so cost is not a criterion in this model either.
The hierarchy structure of the model is shown in Figure 1.

Weighting
The lecturer who teaches the course acts as the decision maker, giving the fuzzy judgements for matrix $\tilde{A}$ in Equation (3). The fuzzy scale shown in Table 1 was used to give the judgements. Table 3 shows the fuzzy pairwise comparison matrix between the criteria, with each entry expressed as $(l_{ij}, m_{ij}, u_{ij})$.
We now calculate the geometric mean $\tilde{r}_i$ of the fuzzy comparison values of each criterion i using Equation (3), and the fuzzy weights $\tilde{w}_i$ using Equation (4), which requires $(\tilde{r}_1 \oplus \tilde{r}_2 \oplus \cdots \oplus \tilde{r}_n)^{-1}$. We then use the centroid method, Equation (5), to obtain crisp weights for the criteria. After normalisation, the resulting crisp weights are: $w_{FCQ} = 0.261$, $w_{LRH} = 0.159$, $w_{AQU} = 0.159$, $w_{ORR} = 0.159$, $w_{AJI} = 0.103$, $w_{EGS} = 0.060$, $w_{QQL} = 0.060$ and $w_{EUC} = 0.039$.
To check the consistency of the judgements given by the decision maker in building the pairwise comparison matrix, CR is calculated from the modal values $m_{ij}$ of the judgement matrix in Table 3.
A pairwise comparison matrix was produced between the scale levels of each indicator, and AHP applied, from the judgements given by the teacher of the course. In this case, the crisp numbers of Table 1 have been used, obtaining the valuations of each scale level for each indicator; these were turned into the utility vectors shown in Table 4; each component of the utility vector is associated with a previously defined level of each descriptor during the structuring section. The utility of the most preferred level for a descriptor is 1, and for the least preferred level it is 0.
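The text does not state exactly how the AHP valuations of the scale levels were turned into utility vectors with the most preferred level at 1 and the least preferred at 0. The Python sketch below assumes that the principal-eigenvector priorities are rescaled linearly between those two endpoints; the function names and the example judgements are hypothetical.

```python
import numpy as np


def ahp_priorities(a):
    """Normalised principal eigenvector of a crisp pairwise comparison matrix."""
    vals, vecs = np.linalg.eig(np.asarray(a, dtype=float))
    v = np.abs(vecs[:, np.argmax(vals.real)].real)
    return v / v.sum()


def utility_vector(a):
    """Rescale priorities so the most preferred level is 1 and the least is 0.

    Linear min-max rescaling is an assumption about how the utility vectors
    of Table 4 were built; it needs at least two levels with different priorities.
    """
    p = ahp_priorities(a)
    return (p - p.min()) / (p.max() - p.min())


# Hypothetical crisp judgements among three scale levels, best level first.
levels = [[1, 3, 7],
          [1 / 3, 1, 3],
          [1 / 7, 1 / 3, 1]]
print(utility_vector(levels))
```

With a perfectly consistent matrix whose priorities are 4/7, 2/7 and 1/7, this rescaling gives the utilities 1, 1/3 and 0.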
All the pairwise comparison matrices used in the multicriteria model have consistency ratios below 0.1 or 10%.

Fuzzy Analytic Hierarchy Process Combined with the Measuring Attractiveness by a Categorical-Based Evaluation Technique (MACBETH) Approach Model for Decision Making in Gamification
The MACBETH approach, developed by Bana e Costa and Vansnick [81], is a complete multicriteria methodology that requires only qualitative judgements from an individual or a decision group to obtain a quantitative valuation of the alternatives. Its theoretical foundations, together with examples of real-world applications, can be found in Bana e Costa et al. [78], Bana e Costa et al. [82] and Bana e Costa and Vansnick [83].
The MACBETH approach is a complete methodology for supporting objective decision making. It includes an exhaustive procedure that other techniques lack: the definition of indicators associated with each criterion, reference levels assigned to the scale levels of each descriptor, value functions that allow the criteria to be compared on a common scale, validation of the values assigned to each alternative, and consistency checks on the judgements given.
The M-MACBETH software, which helps to build MACBETH models, is described in Bana e Costa et al. [84] (a user's guide can be downloaded at http://m-macbeth.com/demo/ and a description of the software can be found in Bana e Costa and Vansnick [85]).

Structuring
The structuring process is similar to that set out in Section 3.1, although MACBETH requires two reference levels, good and neutral, to be assigned among the scale levels of each descriptor. Good is a level the decision maker considers fully satisfactory, and neutral is a level the decision maker considers neither satisfactory nor unsatisfactory [86]. This assignment by the decision maker is shown in Section 3.1.
The value tree or hierarchy structure of the model is similar to that shown in Figure 1.

Weighting
The combination of fuzzy AHP and the MACBETH approach takes place at this stage: the criteria weights derived with the geometric mean method are used in the MACBETH model.
The value functions for each criterion were then obtained. To this end, the decision maker gives judgements between the scale levels of each descriptor using the MACBETH semantic categories: no, very weak, weak, moderate, strong, very strong, extreme, or a combination of two or more successive categories. When the difference in attractiveness between scale levels cannot be quantified accurately, the category positive can be used. This feature of MACBETH is very useful for reflecting the decision maker's uncertainty when giving judgements, and it reinforces the fuzzy logic expressed in fuzzy AHP. For example, for the criterion Flexibility in creating questionnaires, the decision maker gave the judgements shown in Figure 2. The good level is shown in green (L31) and the neutral level in blue (L33). The figure shows that, when comparing level L11 with level L15, the decision maker hesitated between very strong and extreme, so the range very strong-extreme was assigned. Something similar happens when comparing the other levels with L15. The judgements given are consistent.
With the reference levels and the judgements issued, M-MACBETH uses linear programming to build a value function that assigns 100 to the good level and 0 to the neutral level. The resulting value function for the criterion Flexibility in the creation of questionnaires is shown in Figure 3. A similar process was followed for the other criteria, yielding consistent judgement matrices and their respective value functions in every case (see Figure 3). These value functions should be validated by the decision maker to ensure that they properly represent the relative magnitude of the decision maker's judgements [87].
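M-MACBETH's linear program is not reproduced here. The Python sketch below only illustrates the final anchoring step described above: because a MACBETH value scale is an interval scale, any positive affine transformation of it is admissible, so scores can be rescaled so that the good level is 100 and the neutral level is 0. The function name, level names and raw scores are hypothetical.

```python
def anchor_scale(raw, good, neutral):
    """Rescale interval-scale scores so that raw[good] -> 100 and raw[neutral] -> 0.

    A positive affine map preserves the ratios of differences in
    attractiveness, which is the information an interval scale carries.
    """
    span = raw[good] - raw[neutral]
    if span <= 0:
        raise ValueError("the good level must be more attractive than the neutral one")
    return {level: 100 * (score - raw[neutral]) / span
            for level, score in raw.items()}


# Hypothetical raw scores for five scale levels, most attractive first.
raw_scores = {"A": 14, "B": 10, "C": 6, "D": 3, "E": 0}
print(anchor_scale(raw_scores, good="B", neutral="C"))
# {'A': 200.0, 'B': 100.0, 'C': 0.0, 'D': -75.0, 'E': -150.0}
```

Levels more attractive than good score above 100 and levels less attractive than neutral score below 0, while the ratios between differences in the raw scores are unchanged.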
Finally, the weightings of the criteria obtained in Section 3.2 by fuzzy AHP were fed into the M-MACBETH software. The M-MACBETH software automatically completes the judgement matrix between the criteria, to verify the weightings introduced (see Figure 4).

Results and Discussion
Each alternative has been assessed by assigning a scale level to each descriptor in the two methodologies described.
The results of the model built with fuzzy AHP are shown in Figure 5.

The assessment of the gamification applications via MACBETH is performed by simple additive aggregation from the bottom to the top of the hierarchy or value tree. When n decision criteria are considered, the performance V(A) of an alternative A is calculated from Equation (9) [82]:

$$V(A) = \sum_{j=1}^{n} w_j \, v_j(A), \qquad \sum_{j=1}^{n} w_j = 1, \quad w_j > 0 \qquad (9)$$

where $w_j$ is the weight of criterion j and $v_j(A)$ is the value of alternative A on criterion j.

The results of the fuzzy AHP model (per unit) and of the model combining fuzzy AHP and the MACBETH approach (as a percentage) are similar: Socrative is the chosen gamification application, followed by Quizizz and finally Kahoot!. Given these results, the teacher of the course was asked for his opinion. He remarked that Socrative was in fact the alternative he intended to introduce and that, given the characteristics of the students and the programme, it was the application best suited to them. The results of the models are therefore validated.
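Equation (9) is a weighted sum, sketched below in Python. The weights are the crisp fuzzy AHP weights reported in Section 3.2; the per-criterion scores in the example are hypothetical placeholders, not the study's values, and the function name is an assumption.

```python
# Crisp criterion weights reported for the fuzzy AHP model (they sum to 1).
WEIGHTS = {"FCQ": 0.261, "LRH": 0.159, "AQU": 0.159, "ORR": 0.159,
           "AJI": 0.103, "EGS": 0.060, "QQL": 0.060, "EUC": 0.039}


def overall_value(scores, weights=WEIGHTS):
    """Equation (9): V(A) = sum over criteria j of w_j * v_j(A)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"no score for criteria: {sorted(missing)}")
    return sum(w * scores[c] for c, w in weights.items())


# Hypothetical MACBETH scores (neutral = 0, good = 100) for one alternative.
example = {"FCQ": 100, "LRH": 100, "AQU": 50, "ORR": 100,
           "AJI": 0, "EGS": 0, "QQL": 0, "EUC": 100}
print(round(overall_value(example), 2))  # 69.75
```

The same function serves the fuzzy AHP model if the scores are utilities on a 0 to 1 scale instead of MACBETH values; only the unit of V(A) changes (per unit instead of a percentage).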
The decision maker justified the low weight given to the criterion Elements of gamification (fun) with impact/motivation on the student body on the grounds that this is a Masters course: the students already hold a degree with which to start their professional life, so those who enrol on the course are already motivated to follow it. However, he considered that if the judgements were given for an undergraduate degree course, this criterion would be more important, and all the more so the earlier in the programme gamification is applied.
Regarding the criterion Quality of the library of questions, the decision maker gave it a small weight because the material taught is very specific and no existing questionnaires cover it, either partly or wholly. However, as with the criterion related to fun, the material in the degree programme is more basic, and its concepts may be found in questionnaires designed by other users. Again, in the first years of the degree courses that lead to the Masters in Industrial Engineering, this criterion would be more important, and its importance would decrease as the degree progresses.
The stability of the models is confirmed via a sensitivity analysis. This was done by making coherent changes to the weighting of each criterion, maintaining the proportionality of the weights in the other criteria, to observe possible changes in the classification of the alternatives.
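The one-at-a-time procedure described above, in which one criterion's weight changes while the others keep their proportions, can be sketched in Python as follows. This is an illustration under assumptions, not the Logical Decisions or M-MACBETH implementation; the function names and the two-criterion example are hypothetical.

```python
def reweight(weights, criterion, new_weight):
    """Give `criterion` a new weight and rescale the others proportionally.

    The remaining weights keep their mutual ratios and all weights still sum to 1.
    """
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must lie in [0, 1]")
    rest = 1.0 - weights[criterion]
    if rest <= 0.0:
        raise ValueError("the other criteria have no weight to rescale")
    factor = (1.0 - new_weight) / rest
    return {c: new_weight if c == criterion else w * factor
            for c, w in weights.items()}


def ranking(alternatives, weights):
    """Alternatives ordered by additive value, best first."""
    def value(name):
        return sum(w * alternatives[name][c] for c, w in weights.items())
    return sorted(alternatives, key=value, reverse=True)


def rank_change_points(alternatives, weights, criterion, steps=100):
    """Grid weights for `criterion` at which a different alternative ranks first."""
    best = ranking(alternatives, weights)[0]
    return [i / steps for i in range(steps + 1)
            if ranking(alternatives, reweight(weights, criterion, i / steps))[0] != best]


# Hypothetical two-criterion example.
weights = {"a": 0.6, "b": 0.4}
alts = {"X": {"a": 1.0, "b": 0.0}, "Y": {"a": 0.0, "b": 1.0}}
points = rank_change_points(alts, weights, "a")
print(min(points), max(points))
```

A model is judged robust in this sense when the change points lie far from the weights the decision maker actually assigned, which is the reasoning applied to each criterion below.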
The sensitivity analysis of the model built with fuzzy AHP was performed using the tool Dynamic Sensitivity of the software © Logical Decision. Increasing the weighting of the criteria Flexibility in the creation of questionnaires, Obtaining results and reports, Ability to apply just-in- The results in the case of fuzzy AHP (per unit) and combining fuzzy AHP and the MACBETH approach (as a percentage) are similar, and Socrative is the chosen gamification application, followed by Quizizz, and finally Kahoot!. Given the results provided by both models, the teacher of the course was asked his opinion. He remarked that this was in fact the alternative he intended to introduce and, given the characteristics of the students and the programme, it was the application best suited to these characteristics. Therefore, the results of the models are validated.
The decision maker justified the low weightings of the criteria Elements of gamification (fun) with impact/motivation of the student body because it was a Masters course, the students already had a degree with which to start their professional life, and so those who signed up to the course were already motivated to follow it. However, the decision maker considered that if the judgements were offered for a degree course, this criterion would have been more important, the more so the earlier the stage at which gamification is applied.
With respect to the criterion Quality of the library of questions, the decision maker gave it a small weighting because the material taught is very specific and there are no ready-made questionnaires that assess it, either wholly or in part. However, as with the criterion related to the element of fun, the material of the degree programme is more basic, and its concepts can be found in some questionnaires designed by other users. Again, in the first-year courses of the degree that leads to the Masters in Industrial Engineering, this criterion would be more important, and its importance would decrease as the degree progresses.
The stability of the models is confirmed by a sensitivity analysis. The weighting of each criterion was changed in a coherent way while the weightings of the remaining criteria were kept in the same proportion to one another, in order to observe any change in the classification of the alternatives.
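The proportional reweighting described above can be sketched in a few lines of Python. This is a minimal illustration that assumes an additive (weighted-sum) aggregation of normalised scores; it is not the implementation of the software used in this study, and the helper names (`reweight`, `rank`, `first_change`), the criteria, weights and scores are all hypothetical placeholders rather than values from the models.

```python
# Sketch: one-at-a-time weight sensitivity with proportional rescaling.
# Assumes an additive (weighted-sum) model; all names and numbers below are
# hypothetical placeholders, not the weights or scores of this study.

def reweight(weights, k, new_w):
    """Set criterion k's weight to new_w and rescale the other weights so they
    keep their relative proportions and all weights still sum to 1."""
    if not 0.0 <= new_w <= 1.0:
        raise ValueError("new_w must lie in [0, 1]")
    rest = sum(w for i, w in enumerate(weights) if i != k)
    scale = (1.0 - new_w) / rest
    return [new_w if i == k else w * scale for i, w in enumerate(weights)]


def rank(weights, scores):
    """Order alternatives by weighted-sum value, best first."""
    totals = {alt: sum(w * s for w, s in zip(weights, vals))
              for alt, vals in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)


def first_change(weights, scores, k, direction, step=0.001):
    """Move criterion k's weight up (direction=+1) or down (-1) in small steps and
    return the first weight at which the ranking differs from the baseline,
    or None if it never changes within [0, 1]."""
    baseline = rank(weights, scores)
    w = weights[k]
    while True:
        w = round(w + direction * step, 10)  # rounding avoids float drift
        if w < 0.0 or w > 1.0:
            return None
        if rank(reweight(weights, k, w), scores) != baseline:
            return w


weights = [0.5, 0.3, 0.2]          # hypothetical criterion weights
scores = {"A": [0.9, 0.4, 0.5],    # hypothetical normalised scores
          "B": [0.6, 0.8, 0.6],
          "C": [0.3, 0.5, 0.9]}

print(rank(weights, scores))                 # ['A', 'B', 'C']
print(first_change(weights, scores, 1, +1))  # 0.318
print(first_change(weights, scores, 1, -1))  # None
```

With these placeholder values, raising the second criterion from 30% to about 31.8% (a relative increase of roughly 6%) lets B overtake A, while lowering it all the way to 0% leaves the ranking unchanged. Reporting the break-even weight as a relative increase over the assigned weight is the form in which the robustness arguments of this section are stated.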
The sensitivity analysis of the model built with fuzzy AHP was performed using the Dynamic Sensitivity tool of the Logical Decisions software. Increasing the weighting of any of the criteria Flexibility in the creation of questionnaires, Obtaining results and reports, or Ability to apply just-in-time teaching up to 100%, or decreasing it to 0%, produces no change in the classification of the alternatives: Socrative is chosen in all cases.
Increasing the weighting of the criterion Learning rhythm up to 100% does not change the classification of the alternatives, but decreasing it to 0.5% changes the classification in favour of Quizizz. A weighting of 0.5% means that in practice the criterion would not exist, which is not considered a logical scenario.
If the weighting of the criterion Valuation of the questionnaire is increased to 29% (an increase of 82.39%), Quizizz becomes the preferred alternative, whereas if it is decreased to 0%, Socrative remains the preferred alternative. Something similar happens with the criterion Elements of gamification (fun) with impact/motivation of the student body: when its weighting reaches 20% (an increase of 233.33% with respect to the weighting actually assigned), Quizizz would be chosen, and when it is decreased, Socrative is always chosen. This supports the decision centre's judgement that at earlier stages this criterion would be more important, and Quizizz would probably be chosen ahead of Socrative. When the weighting of Quality of the library of questions is increased to 36% (an increase of 500%), Kahoot! would be chosen, while decreasing it from the assigned level of 6% does not change the choice of Socrative. Increasing the weighting of Ease of use in class to 92% would lead to the choice of Quizizz, while a decrease of 3.9% does not change the choice of Socrative. In all these cases the required increases in the weightings are too high, given the importance of the other criteria, and decreasing the weighting of any of these criteria by any amount never changes the classification of the alternatives. The fuzzy AHP model is therefore held to be robust.

Figure 7 shows how the valuation of the alternatives changes as the weighting of each criterion is varied in the model combining fuzzy AHP and MACBETH. The red vertical line marks the weighting (as a percentage) proposed by the decision maker for the criterion, and the y-axis shows the valuation of the alternatives as that weighting is varied.
The x-axis shows the possible weighting of the criterion, from 0% to 100%. For the criteria Flexibility in the creation of questionnaires, Learning rhythm, and Obtaining results and reports, no variation in the weightings changes the classification of the alternatives. For the criterion Ease of use in class, Socrative and Quizizz reach a similar valuation only when the criterion has a weighting of 100% (so that all the other criteria have a weighting of 0%); since this is not a coherent scenario, it is considered inviable. The criterion Valuation of the questionnaire, with a weighting of 15.90%, would have to increase to 62.6% (an increase of 293.71%) for the classification to be inverted and Quizizz to take first place. Similarly, the weighting of Ability to apply just-in-time teaching would have to increase by 245.63%, that of Elements of gamification (fun) with impact/motivation of students by 313.33%, and that of Quality of the library of questions by 500% for a comparable switch in the classification of the alternatives. All these increases are too high to be considered logical, so the model is concluded to be robust: small variations in the weightings assigned to the criteria do not change the final classification of the alternatives.
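The percentage increases quoted in this sensitivity analysis are relative to the weighting actually assigned to each criterion. Writing w_k for that assigned weighting and w_k* for the weighting at which the classification of the alternatives first changes (notation introduced here only for illustration), three of the reported figures can be reproduced as follows: Valuation of the questionnaire in the fuzzy AHP model, the same criterion in the model combining fuzzy AHP and MACBETH, and Quality of the library of questions.

```latex
\begin{aligned}
\Delta_k &= \frac{w_k^{*} - w_k}{w_k} \times 100\% \\
\Delta_{\text{Valuation, fuzzy AHP}} &= \frac{29 - 15.90}{15.90} \times 100\% \approx 82.39\% \\
\Delta_{\text{Valuation, fuzzy AHP + MACBETH}} &= \frac{62.6 - 15.90}{15.90} \times 100\% \approx 293.71\% \\
\Delta_{\text{Library of questions}} &= \frac{36 - 6}{6} \times 100\% = 500\%
\end{aligned}
```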

Conclusions
The search for more active methodologies in education has motivated the use of gamification in the classroom. Its benefits are numerous and have been corroborated in many real experiments, and this success has led to the development of many gamification applications. However, the teacher has to choose among thousands of applications. The literature review carried out shows that there are no models to assist in this decision, even though it can have a great effect on students' results.
This study therefore developed two methodologies, one using fuzzy AHP and the other combining fuzzy AHP with the MACBETH approach, to account for the uncertainty arising from the imprecision or vagueness of human judgements in real-world problems. This is the first contribution in the literature combining fuzzy AHP with the MACBETH approach. The geometric mean method suggested by Buckley was used to calculate the criteria weightings for choosing a gamification application for a course in a Masters programme, and the crisp weights obtained from fuzzy AHP were used to obtain the complete classification of the alternatives in both methodologies.
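Buckley's geometric mean method can be illustrated with triangular fuzzy numbers (l, m, u). In the sketch below, the fuzzy geometric mean of each row of the pairwise comparison matrix is computed componentwise, and each mean is multiplied by the inverse of the fuzzy sum of all the means (the inverse of (L, M, U) being (1/U, 1/M, 1/L)) to give a fuzzy weight. The comparison matrix is hypothetical, and the step that turns fuzzy weights into crisp ones (centroid (l + m + u)/3 followed by normalisation) is an assumption, since the defuzzification used in the study is not stated here.

```python
import math

# Sketch of Buckley's geometric mean method with triangular fuzzy numbers (l, m, u).
# Assumptions: the comparison matrix is hypothetical, and crisp weights are
# obtained by centroid defuzzification plus normalisation (not confirmed by the study).

def fuzzy_geometric_mean(row):
    """Componentwise geometric mean of the triangular fuzzy numbers in one row."""
    n = len(row)
    return tuple(math.prod(t[i] for t in row) ** (1.0 / n) for i in range(3))


def buckley_weights(matrix):
    """Return (fuzzy weights, normalised crisp weights) for a fuzzy pairwise matrix."""
    r = [fuzzy_geometric_mean(row) for row in matrix]
    sum_l = sum(t[0] for t in r)
    sum_m = sum(t[1] for t in r)
    sum_u = sum(t[2] for t in r)
    # w_i = r_i * (r_1 + ... + r_n)^-1, with (L, M, U)^-1 = (1/U, 1/M, 1/L)
    fuzzy = [(l / sum_u, m / sum_m, u / sum_l) for l, m, u in r]
    crisp = [(l + m + u) / 3.0 for l, m, u in fuzzy]  # centroid (assumed)
    total = sum(crisp)
    return fuzzy, [c / total for c in crisp]


# Hypothetical reciprocal matrix for three criteria: entry (j, i) is the
# reciprocal (1/u, 1/m, 1/l) of entry (i, j).
comparisons = [
    [(1, 1, 1), (2, 3, 4), (4, 5, 6)],
    [(1/4, 1/3, 1/2), (1, 1, 1), (1, 2, 3)],
    [(1/6, 1/5, 1/4), (1/3, 1/2, 1), (1, 1, 1)],
]

fuzzy, crisp = buckley_weights(comparisons)
print([round(w, 3) for w in crisp])  # [0.637, 0.233, 0.129]
```

The first criterion receives the largest weight because it is preferred in both of its comparisons. In an actual application, the matrix would instead hold the decision maker's pairwise judgements expressed as triangular fuzzy numbers.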
The results obtained from both methodologies are similar, showing that Socrative is the most suitable gamification application for the course assessed, ahead of Quizizz and Kahoot!. However, if the course were taught at an early stage of a degree programme, the choice would probably shift towards Quizizz, because students at earlier stages are considered to need more motivation, which makes elements related purely to games more important.
This result is considered suitable because, in a university-level course, it allows students to obtain feedback on the level of learning they have reached without giving the gamification elements more weight than learning itself.
The methodologies, criteria or descriptors and weightings described in this study can be used in other courses or programmes, or adapted to the specific needs of each course.
In future work, the aim is to introduce new gamification applications as alternatives for this course, for example by using the group decision-making methodology based on card sorting proposed by Morente-Molinera et al. [46] to reduce the initial set of alternatives to a feasible one that experts can analyse comfortably. It is also planned to apply both methodologies to other university courses in degree and Masters programmes, especially where several lecturers are in charge of a class and a decision group must be formed to provide the qualitative judgements used to weight the criteria.