Machine Learning-Driven Approach for Large Scale Decision Making with the Analytic Hierarchy Process

Abstract: The Analytic Hierarchy Process (AHP) multicriteria method can be cognitively demanding for large-scale decision problems because it requires the decision maker to make pairwise evaluations of all alternatives. To address this issue, this paper presents an interactive method that uses online learning to make the AHP scalable. In the proposed method, a machine learning algorithm learns the decision maker's preferences from evaluations of small subsets of solutions and guides the search for the optimal solution. The methodology was tested on four optimization problems with different surfaces to validate the results. We conducted a one-factor-at-a-time experiment on each implemented hyperparameter: the number of alternatives presented to the decision maker per query, the learner method, and the strategies for solution selection and recommendation. The results demonstrate that the model is able to learn the utility function that characterizes the decision maker in approximately 15 iterations with only a few comparisons, resulting in significant savings of time and cognitive effort. The initial subset of solutions can be chosen randomly or by clustering. Subsequent subsets are recommended during the iterative process, and the best selection strategy depends on the problem type. Recommendation based solely on the smallest Euclidean or Cosine distances yields better results on linear problems. The proposed methodology can also easily incorporate new parameters and other multicriteria methods based on pairwise comparisons.


Introduction
According to the Paradox of Choice [1], the greater the number of alternatives available, the harder the decision process becomes. The Decision Maker (DM) needs different alternatives to choose from; however, if there are too many, the process can become time-consuming and tedious, and the evaluations may become inconsistent over time [2,3]. Furthermore, in many projects, such as engineering problems, the DM (or user, manager, consumer, specialist, etc.) must participate in several stages of the process, which requires even more time to rank the different alternatives.
In decision theory, several Multicriteria Decision Making (MCDM) methods help to rank the existing solutions of multifaceted problems. Those in the Multi-Attribute Utility Theory (MAUT) subclass assume the existence of a degree of utility that reflects the preferences of the DM. MAUT is often used to identify trade-offs and to obtain, in a consistent way, a utility value for items or alternatives over more than one criterion.
In the MAUT subclass, the Analytic Hierarchy Process (AHP) [4] is the best-known and most widely used method [5–10]. It is based on pairwise comparison matrices, a divide-and-conquer technique that analyzes two alternatives at a time to determine their relative utilities or preferences. However, the high number of queries (NQ) that the DM must answer at once to construct the comparison matrices can make the method complex and infeasible for large-scale problems [2,11–14]. The main criticisms of this method concern the high cognitive effort [15], inconsistency [2,3,11,16,17], and the time required of the specialists [2,15].
The AHP, described in Section 2.1, has been applied in different areas, such as the polymer extrusion process [18], sustainable supplier selection [9], engineering [5,19], operational risk in power substations [20], online shopping [10], plant location [21], undergraduate elective course planning [22], the management of architectural heritage in smart cities [23], supplier selection in the automotive industry [24], university rankings [13], and others [2,5,25,26]. However, this method has a notable drawback: it requires n × (n − 1)/2 comparisons of n alternatives for each criterion to solve the decision problem [11]. The time investment of experts, the required cognitive effort, and the possibility of ambiguous judgments make it challenging to use this method in large-scale problems [2,10–12,15,25].
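To make this quadratic growth concrete, the count can be worked out for two arbitrary sizes, with m criteria (n = 10 and n = 100 are illustrative values only, not taken from the experiments):

```latex
\mathrm{NQ}_{\mathrm{AHP}} = m \, \frac{n(n-1)}{2}, \qquad
n = 10:\ \frac{10 \cdot 9}{2} = 45, \qquad
n = 100:\ \frac{100 \cdot 99}{2} = 4950 \quad \text{(per criterion)}
```

A tenfold increase in the number of alternatives therefore multiplies the queries per criterion by 110.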
Learning the DM's preferences has been one way to make MCDM methods, particularly those based on pairwise comparisons, practical to use. Comparisons take a long time to perform, and there are currently no guarantees of scalability to larger problems [2,15]. In companies, data scientists and statisticians often need simpler ways to present the best alternatives to managers. This is achieved by excluding dominated solutions and by selecting the most diverse solutions on the Pareto front and those closest to the utopian point or another reference point, which facilitates sorting the solutions from least to most desirable. Nevertheless, the problem remains difficult, especially for Multi-Objective Optimization (MOO) problems (two or three objectives) and Many-Objective Optimization Problems (MaOP) (four or more objectives) [27], in which all alternatives lie on the Pareto front and the preferences are not known.
According to Tuljak-Suban and Bajec [12], "decision makers need a relatively simple, reliable method which is not terribly time consuming". To this end, this paper proposes an approach that couples the AHP multicriteria method with online machine learning to facilitate the task of obtaining preferences. Scalability is improved by presenting fewer alternatives to the DM. In the classical approach, all of the solutions are presented and evaluated at once, which is not feasible for large-scale problems, where decisions must be made quickly and with timely information.
To overcome these challenges, we employ a machine-learning-driven approach that works as an interactive process. At each interaction, small subsets of solutions are chosen using different strategies to query the DM and capture their preferences. A regression method is then applied to learn these preference relations and to predict the remaining ones. In other words, the model learns the Multi-Attribute Utility Function (MAUF) that represents the DM while considering only a few comparisons per iteration, which reduces the number of comparisons relative to the original method. To measure the agreements and disagreements between rankings, the Kendall tau (KDT) distance [28] is used, as in [15,29]; similar rankings mean that the model was able to approximate the DM's behavior.
In the related literature, the learning phase is viewed as an offline process [30], preference relations are treated as binary [29,31], and there are no strategies for selecting new alternatives [15]. Although binary relations reduce the problem dimensions, they restrict the preference between two alternatives to a yes/no relation rather than a range of preference intensities, and these approaches may still require many evaluations compared with the original method.
The main contributions of this paper are summarized as follows:
1. A new version of the AHP method that makes it scalable;
2. Recommendation strategies and different solution-selection strategies, applied in an interactive process to learn more about the DM's preferences;
3. A reduction in the number of solutions that need to be evaluated;
4. A reduction in the time and cognitive effort needed to evaluate the solutions until the decision problem is solved;
5. The possibility of re-using the trained model, without new queries to the DM, in problems from the same domain.
The remainder of this paper is organized as follows: Section 2 presents the literature review, introducing the AHP method and related works; Section 3 details the proposed approach; Section 4 presents the results and discusses the main findings; and Section 5 concludes the paper and points out new directions.

Introduction to the Analytic Hierarchy Process
MCDM comprises two parts: MOO and MaOP problems in the first part, and Multiple Criteria Decision Analysis in the second [32]. Optimization plays an important role in the design cycle, and solving large-scale problems poses challenges to practitioners [27,33]. Modeling problems under multiple objectives and disciplines is known as MCDM [32]. Fundamentally, the goal of these methods is to solve a decision problem and to help the DMs choose the solution that best reflects their preferences across all objectives.
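In general, these methods are structured in a two-dimensional decision matrix $D_{n \times m}$, as in (1), where $C_j$ is the $j$-th criterion, $a_i$ the $i$-th alternative solution, and $x_{ij} = C_j(a_i)$ the evaluation of $a_i$ under $C_j$:

$$D_{n \times m} = \begin{array}{c|cccc} & C_1 & C_2 & \cdots & C_m \\ \hline a_1 & x_{11} & x_{12} & \cdots & x_{1m} \\ a_2 & x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_n & x_{n1} & x_{n2} & \cdots & x_{nm} \end{array} \qquad (1)$$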
Some methods normalize or standardize these values, for instance with Min-Max normalization or with the z-score,

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, \qquad z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},$$

where $\mu_j$ is the mean and $\sigma_j$ is the standard deviation of criterion $C_j$. Then, these scaled values are multiplied by a vector of weights $W = (w_1, \ldots, w_m)$ that represents the importance of each criterion, where $w_j \geq 0$ and $\sum_{j=1}^{m} w_j = 1$. Others, such as the AHP, elicit the weights directly and can evaluate both quantitative and qualitative criteria [25]. This integrated approach is common in the literature: analysts use the weights extracted from the AHP as input to another method that relies on the decision maker to state the importance of each criterion; see, for instance, [5,8,9,12,22,39].
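The scaling-and-weighting scheme can be sketched in plain Python as follows. This is a minimal illustration, not the implementation of any particular MCDM method: it assumes every criterion is a benefit criterion (larger is better), uses the population standard deviation for the z-score, and sums the weighted values (simple additive weighting). The function names and the small matrix are hypothetical.

```python
import statistics


def min_max(column):
    """Min-Max normalization of one criterion column to [0, 1] (assumes max > min)."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def z_score(column):
    """Standardization with the column mean mu_j and population standard deviation sigma_j."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]


def weighted_scores(D, W, scale=min_max):
    """Scale each criterion column of D (n x m), then score a_i = sum_j w_j * x'_ij."""
    if any(w < 0 for w in W) or abs(sum(W) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    columns = [scale(list(col)) for col in zip(*D)]  # one scaled list per criterion C_j
    return [sum(w * columns[j][i] for j, w in enumerate(W)) for i in range(len(D))]


# Hypothetical decision matrix: 3 alternatives x 2 benefit criteria.
D = [[1, 10], [2, 30], [3, 20]]
print(weighted_scores(D, [0.5, 0.5]))  # [0.0, 0.75, 0.75]
```

Passing `scale=z_score` swaps in standardization; the weight check enforces the constraints $w_j \geq 0$ and $\sum_j w_j = 1$.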
Pairwise comparisons are a vital part of the prioritization procedure in AHP. When conducting an assessment, the decision problem is first structured hierarchically to obtain the matrix D. Then, the nine-point Likert scale known as Saaty's scale, presented in Table 1, is used: for each criterion/objective, the DMs state their numeric, graded preferences for each pair of alternatives.
Table 1. Saaty's scale for pairwise comparisons (…; Absolute importance: 1/9 (0.111)). Source: Adapted from Saaty [40].
The steps of the AHP, adapted from its creator, Prof. Saaty [4], are as follows:
1. Structuring the hierarchy: The problem is decomposed hierarchically into three parts: goal, criteria (and subcriteria), and alternatives. The AHP thus structures any complex problem into different hierarchy levels aimed at accomplishing the stated objective of the problem.

2. Perform pairwise comparisons: Construct a matrix of pairwise comparisons for the set X of alternatives, whose entries indicate how much the DM prefers one solution or criterion over another, using the description or the importance value from Table 1. It is important to highlight that the NQ is n × (n − 1)/2 evaluations for each n × n matrix. Depending on the number of alternatives n and/or criteria m, this can be a lengthy process.

3. Calculate the consistency of the DM's assessments: The comparison matrix Q is considered consistent if all of its elements are transitive and reciprocal, that is, $x_{ij} = x_{ik} \times x_{kj}$ and $x_{ji} = 1/x_{ij}$ for any indices i, j, and k, with $x_{ii} = 1$ on the main diagonal. To obtain the relative priority of each criterion, a normalized matrix is defined according to Equation (2):

$$\bar{x}_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}} \qquad (2)$$

where $\sum_{i=1}^{n} x_{ij}$ is the sum of the elements in column j. Then, the relative weight of each row is computed by dividing the sum of the values of each row by the number of elements n, see Equation (3):

$$w_i = \frac{1}{n} \sum_{j=1}^{n} \bar{x}_{ij} \qquad (3)$$

The resulting priority vector w approximates the principal eigenvector of each comparison matrix. The maximum eigenvalue ($\lambda_{\max}$) is a measure of consistency within the pairwise comparison matrix [17]. To obtain $\lambda_{\max}$ (the eigenvalue problem [24]), calculate the arithmetic mean of the elements of the vector $(Qw)_i / w_i$, $i = 1, \ldots, n$. The largest eigenvalue satisfies $\lambda_{\max} \geq n$, and the closer $\lambda_{\max}$ is to n, the more consistent Q is. The Consistency Index (CI) is given by Equation (4):

$$CI = \frac{\lambda_{\max} - n}{n - 1} \qquad (4)$$
4. Calculate the Consistency Ratio (CR): The CR is calculated according to Equation (5):

$$CR = \frac{CI}{RI} \qquad (5)$$

where RI is the Random Index, the average CI of randomly generated comparison matrices, proposed in the seminal work of Prof. Saaty [4] and widely discussed in the literature [41]. According to Saaty [4], if CR ≤ 0.10, the level of inconsistency in the DM's judgments is acceptable; otherwise, the process needs to be redone, partially or fully.
5. Synthesizing the results: The pairwise matrices are synthesized to calculate the overall priorities of the alternative solutions. The priorities are sorted, and the alternative with the highest priority is selected.
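A compact sketch of steps 2 to 5 in plain Python, under stated assumptions: the priority vector uses the column-normalization and row-averaging approximation of Equations (2) and (3), $\lambda_{\max}$ is taken as the mean of $(Qw)_i / w_i$, and the random-index table holds the values commonly attributed to Saaty for n ≤ 10. The helper names are hypothetical, not taken from the paper.

```python
# Saaty's random index RI for n = 1..10 (commonly cited values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def reciprocal_matrix(n, judgments):
    """Build Q from upper-triangle judgments {(i, j): x_ij} (i < j) on Saaty's scale."""
    Q = [[1.0] * n for _ in range(n)]  # x_ii = 1 on the main diagonal
    for (i, j), value in judgments.items():
        Q[i][j] = float(value)
        Q[j][i] = 1.0 / value          # reciprocity: x_ji = 1 / x_ij
    return Q


def ahp_priorities(Q):
    """Return (w, lambda_max, CI, CR) following Equations (2)-(5)."""
    n = len(Q)
    if n < 2:
        raise ValueError("need at least two items to compare")
    col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]
    normalized = [[Q[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # Eq. (2)
    w = [sum(row) / n for row in normalized]                                       # Eq. (3)
    Qw = [sum(Q[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Qw[i] / w[i] for i in range(n)) / n  # mean of (Qw)_i / w_i
    ci = (lambda_max - n) / (n - 1)                        # Eq. (4)
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0                        # Eq. (5); 2x2 is always consistent
    return w, lambda_max, ci, cr


# Perfectly consistent 3x3 example: x_02 = x_01 * x_12 = 2 * 3 = 6.
w, lam, ci, cr = ahp_priorities(reciprocal_matrix(3, {(0, 1): 2, (0, 2): 6, (1, 2): 3}))
ranking = sorted(range(3), key=lambda i: w[i], reverse=True)  # step 5: highest priority first
print([round(x, 3) for x in w], ranking)  # [0.6, 0.3, 0.1] [0, 1, 2]
```

A single judgment maps to the 2 × 2 case: `reciprocal_matrix(2, {(0, 1): 7})` gives priorities 7/8 and 1/8, and the CR is reported as 0 because RI = 0 for n = 2. Replacing the last judgment above with 1/3 breaks transitivity and pushes the CR well above 0.10.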
The most exhausting way to solve the decision problem is to present all the solutions to the DM at once and ask them to evaluate all of them. This leads to a scalability problem when the number of alternatives and/or objectives is large [10,14,25]. This approach also carries a greater risk of inconsistency during the elicitation process, which can increase the number of evaluations needed and further complicate the process [2,10,11,15,37]. As a result, the size of the pairwise comparison matrix remains a limitation.
The complexity of dynamic systems has prompted efforts in the literature to improve the scalability of the decision making process. Possible approaches include data reduction [32], the selection of regions of interest (ROI) on the Pareto front [29,42], offline learning [30], and the selection of the prominent alternative [43]. However, they do not exploit online learning to capture the DM's preferences by presenting only a few solutions at a time in an iterative process.

Decision Maker Preference Learning
The term preference may take on different meanings, but here it is interpreted as a subjective, comparative evaluation that expresses the DM's specific desires in a declarative way [44]. It is subjective because it is typically attributed to a human; comparative because an item is evaluated relative to another item; and an evaluation because it concerns matters of value, typically in practical reasoning.
Learning, or eliciting, preferences is central to decision making. Preferences can be obtained explicitly, for example through ratings, statements, or queries, or implicitly, for example through user observation or inference, ultimately yielding utility functions that range from simple to complex [15]. Fürnkranz and Hüllermeier [45] described different types of preferences that follow some properties presented in Salvatore [46]. Preference learning, in turn, has emerged as a new sub-field of machine learning (ML) dealing with learning (predictive) models from observed, revealed, or automatically extracted information [45]. It has been successfully used in Decision Theory and Multicriteria Problems [44,45], often to form a total order relation on a collection of alternatives [44], which is known as a ranking problem.
In online learning, the number of alternatives presented to the DM for review must necessarily be very limited, and the data become available sequentially. This differs, for instance, from active learning (or query learning [47]), a weakly supervised technique in which both labeled and unlabeled data may be used, and from offline (or batch) learning, in which the data are collected and the model is trained once [47].
In the learning phases, the DM is expected to maximize a utility function U, indicating the priority between two alternatives i, j ∈ X with respect to each criterion over a sequence of t iterations [48]. At each iteration, the learner makes a prediction $p_t$; then the correct answer $y_t$, taken from a target domain Y, is revealed, and the learner suffers a loss $l(p_t, y_t)$. When answers and predictions are binary (yes/no), that is, Y = {0, 1}, the setting is called online classification [48]. In regression problems, the focus of this research, each solution in X is described by a feature vector in $\mathbb{R}^d$ that represents it in the variable space, and Y = ℝ. After each prediction, a loss function measures the difference between p and y. The most common loss functions and evaluation metrics are the Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), Root MSE (RMSE), and Coefficient of Determination (R2) [49].
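The predict-reveal-loss protocol and the four metrics can be illustrated with the sketch below. The running-mean learner is a deliberately trivial stand-in (an assumption, not the regressor used in this work), its first prediction of 1.0 ("equal importance" on Saaty's scale) is an arbitrary choice, and MAPE assumes nonzero targets, which holds for Saaty-scale labels.

```python
import math


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (targets must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def online_rounds(stream, first_guess=1.0):
    """Online protocol: predict p_t, then y_t is revealed and the loss l(p_t, y_t) is incurred.

    The learner is a trivial running-mean regressor used only for illustration.
    """
    revealed, predictions, losses = [], [], []
    for y_t in stream:
        p_t = sum(revealed) / len(revealed) if revealed else first_guess
        predictions.append(p_t)
        losses.append((p_t - y_t) ** 2)  # squared loss l(p_t, y_t)
        revealed.append(y_t)             # y_t becomes available only after predicting
    return predictions, losses


preds, losses = online_rounds([3.0, 5.0, 1.0])
print(preds, losses)  # [1.0, 3.0, 4.0] [4.0, 4.0, 9.0]
```

Note that R2 is a goodness-of-fit score (higher is better), whereas the other three are errors to minimize.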
These are explicit ways of gathering the DM's preferences; however, preferences can also be obtained implicitly or even inferred. To improve the learning process and reduce the number of queries, it may be beneficial to personalize the presentation of new solutions based on the DM's past evaluations and preferences.
Chen and Lin [30] stated that the key process in solving an MCDM problem is capturing the preference structure of the DM. Machine learning-driven approaches have made great progress because they enable learning the utility function regardless of its structure or properties. These authors developed an interactive Decision Neural Network (DNN) architecture to model the DM's preferences. Scalability is intrinsic to their approach, since some solutions are chosen to test whether the DNN model is satisfactory; if the trained model is not consistent with the results given by the DM, they suggest adding new solutions and retraining the algorithm. Chen and Lin [50] proposed a DNN with a "twin-topology" for MOO problems to search for the most desirable solution. These approaches are quite similar to the one proposed by Alves et al. [15], who suggested an online learning methodology based on the Extreme Gradient Boosting (XGBoost) algorithm and used the Kendall tau (KDT) distance [28] to evaluate model convergence over the iterations.
Pedro and Takahashi [14] proposed a Multilayer Perceptron (MLP) architecture to capture information from the DM and to model a utility function based on a partial sorting process. Later, these authors proposed a method called Neural Network Decision Maker (NNDM) [51] that uses the MLP and queries to approximate the utility function, extracting the DM's preferences. Mendonça et al. [31] extended the NNDM to another version, called NNDM-2, for portfolio optimization. In that research, the authors proposed a multiobjective financial portfolio optimization model analyzed with the decision methods NNDM and NNDM-2, considering conservative, moderate, and aggressive investor risk profiles. Although these proposals contribute to computational finance, they do not focus on learning pairwise matrices or on reducing the DM's cognitive effort.
Even with fewer pairwise comparison matrices, many-objective optimization problems (MaOPs) can demand high computational effort and require good visualization techniques to keep the decision process simple and efficient. In this direction, Pedro and Takahashi [29] worked on selecting alternatives within the ROI: the DM interacts with the method by evaluating the alternatives, and a Radial Basis Function network is trained to construct the preference function. By limiting comparisons to a specific region of the Pareto front, this strategy helps to avoid redundant and unnecessary queries.
These works provide relevant approaches to reducing the number of assessments, but most of them suffer from limited scalability, bias, and ambiguity in the comparisons, which may lead to inconsistency, and they lack smart strategies for presenting new solutions to the DM over the iterations.

Proposed Approach
This section introduces a novel technique that improves the AHP multicriteria decision making method with machine learning to gain scalability, as illustrated in Figure 1.
Scalability is improved by reducing the number of alternatives presented to the DM. This follows the main argument of the Paradox of Choice [1] and brings two benefits: first, decision making is easier with fewer options and, second, eliciting preferences becomes easier.
In this way, instead of presenting all the solutions at once and asking for the DM's preferences, the approach works iteratively, with only a few alternatives compared at a time. A machine learning regression algorithm learns these preferences and, after some iterations, the model is expected to infer the behavior of the DM and to guide the search for an optimal solution to the decision problem.
The approach follows the steps detailed below. The numbers in the list correspond to the numbers on each box shown in Figure 1.

1. Set of available solutions X.
In this paper, we use the Generalized Position-Distance (GPD) [27] tool to simulate different decision problems, as illustrated in Figure 2, to validate the applicability of the proposed methodology. We vary the number of objectives (two, three, and seven) and the surface of the regions in the Pareto front (convex, linear, and discontinuous). This allows us to simulate problems such as those that managers in organizations deal with daily: from a set of available alternatives, managers must evaluate the alternatives to choose the one that best meets their desires. In Figure 2, the best solution according to the AHP method is highlighted.

2. Select q alternatives from X. When the DM starts evaluating solutions, very little is known about their preferences. Therefore, we implement two ways of choosing the initial solutions: random and clustering. In the former, q alternatives are chosen at random from X to form Q ⊂ X. The latter builds q clusters and chooses one alternative from each cluster, which promotes diversity by presenting alternatives from different regions of the Pareto surface.

3. Query the DM and build the training set. The training set is composed of all pairwise combinations of the alternatives in Q ⊂ X, labeled with the DM's utilities. For each pair, the DM is asked: "How much do you prefer alternative i over j for objective k?". Following illustration (3) in Figure 1, suppose that the first two alternatives are Q = {$A_1$, $A_2$}, shown in blue. Using Table 1, the DM states that $A_1$ has very strong importance over $A_2$ in objective 1, which reveals that $u(A_1)/u(A_2) = 7$ and, hence, $u(A_2)/u(A_1) = 1/7$. In addition, the DM states that $A_1$ has moderate importance over $A_2$ in objective 2, which leads to $u(A_1)/u(A_2) = 3$ and $u(A_2)/u(A_1) = 1/3$.

5. Make the predictions on the test set. The test set consists of all pairs of the remaining alternatives, X \ Q. The trained model predicts the labels for this entire set, simulating the DM's behavior. Note that the alternatives evaluated here are those not included in the training set built in step 3. Because the process is iterative, more alternatives are added to Q at each iteration: as the training set grows, the test set shrinks.
6. Apply the AHP and generate the ranking. In this step, the classical AHP [4] is applied to generate the total ordering of the alternatives in X. To do this, the method merges the preferences given by the DM in step 3 with those predicted by the machine learning model in step 5.
7. Compute the convergence measure. The KDT [28], defined in Equation (6), is used as the convergence measure:

$$K(\tau_1, \tau_2) = \frac{2}{n(n-1)} \sum_{i<j} \bar{K}_{i,j}(\tau_1, \tau_2) \qquad (6)$$

where $\tau_1(i)$ and $\tau_2(i)$ are the positions of element i in the two rankings, and $\bar{K}_{i,j}(\tau_1, \tau_2) = 1$ if i and j appear in opposite order in the two rankings, i.e., $(\tau_1(i) - \tau_1(j))(\tau_2(i) - \tau_2(j)) < 0$, and 0 otherwise. This metric measures the dissimilarity between two rankings and lies in the interval [0, 1]: $K(\tau_1, \tau_2)$ is 0 if the rankings are identical and 1 if one is the reverse of the other. The lower the dissimilarity between the ranking generated from the model's predictions at iteration t and the one generated at iteration t − 1, the greater the model's ability to approximate the DM's preferences. Additionally, the MAPE, MSE, RMSE, and R2 regression metrics [49] are computed to assess overfitting and the quality of the preferences predicted by the model.
8. In this work, the stopping criterion is a KDT less than or equal to 5% between the rankings of iterations t and t − 1. To obtain the model that best fits the utility function U representing the DM, the algorithm is retrained and tuned whenever it does not reach the desired minimum similarity between the rankings. This is done to reduce the number of iterations needed to reach the stopping criterion.
9. Recommending new solutions to be compared by the DM. Depending on the utility function that represents the DM, it is useful to explore new regions of the decision boundary, selecting and recommending new solutions that the DM is likely to be interested in and that will improve the model's performance. To do this, a parameter θ was implemented that represents the fraction of recommended solutions picked at random from the second iteration onward: θ = 0.0 means that all recommended solutions are selected by either the Cosine distance or the Euclidean distance [52], and θ = 1.0 means that all of them are picked at random.
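Steps 3, 7, and 8 can be sketched as below. Assumptions to note: the DM is represented by a hypothetical callback `ask(i, j, k)` returning a Saaty-scale ratio, the feature encoding (concatenating the objective vectors of i and j) is illustrative only, and training the regressor is omitted. The distance function is the standard normalized Kendall tau distance, and the 5% tolerance is the stopping criterion stated in step 8.

```python
from itertools import combinations


def build_training_set(Q, X, ask, k):
    """Step 3 sketch for objective k: one row per ordered pair of queried alternatives.

    Features: the objective vectors of i and j concatenated (an illustrative encoding).
    Label: the DM's Saaty-scale ratio u(i)/u(j); the reversed pair gets the reciprocal.
    """
    rows = []
    for i, j in combinations(Q, 2):
        ratio = ask(i, j, k)
        rows.append((list(X[i]) + list(X[j]), ratio))
        rows.append((list(X[j]) + list(X[i]), 1.0 / ratio))
    return rows


def kendall_tau_distance(r1, r2):
    """Normalized Kendall tau distance: share of discordant pairs, in [0, 1].

    r1 and r2 list the same items ordered from best to worst.
    """
    if set(r1) != set(r2) or len(r1) != len(r2):
        raise ValueError("rankings must contain the same items")
    n = len(r1)
    if n < 2:
        return 0.0
    pos1 = {item: p for p, item in enumerate(r1)}
    pos2 = {item: p for p, item in enumerate(r2)}
    discordant = sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )
    return 2.0 * discordant / (n * (n - 1))


def should_stop(previous, current, tol=0.05):
    """Step 8: stop once rankings at t-1 and t differ by at most 5% (KDT <= tol)."""
    return previous is not None and kendall_tau_distance(previous, current) <= tol


# Hypothetical objective vectors; the answers reproduce the example in step 3.
X = {"A1": (0.2, 0.8), "A2": (0.6, 0.4)}
answers = {("A1", "A2", 1): 7, ("A1", "A2", 2): 3}


def ask(i, j, k):
    """Stands in for querying the DM."""
    return answers[(i, j, k)]


rows = build_training_set(["A1", "A2"], X, ask, k=1)
print(rows[0])  # ([0.2, 0.8, 0.6, 0.4], 7)
```

Swapping two adjacent items in a four-item ranking yields a distance of 1/6, well above the 5% tolerance, so such a change between iterations would not trigger the stop.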
Finally, Table 2 details the parameters and hyperparameters of the proposed approach and the values currently implemented; new ones can easily be added. The baseline is the classical AHP [4]. The NQ required until the ranking reaches the stopping criterion is computed for both the proposed approach and the original method.

Results and Discussion
This section first discusses the results of optimizing the hyperparameters described in Table 2. The ranking obtained by the proposed scalable method is then presented and discussed at the end. The results show that the proposal is effective for different problems and provides satisfactory results compared with the classical method.
The results are presented in subsections for the sake of organization. Section 4.1 describes the case studies; Section 4.2 shows the learners' performance on each problem; Section 4.3 details the effects of choosing the initial solutions by clustering or at random; Section 4.4 analyzes model convergence when the recommended solutions are based on the Euclidean or Cosine distance; Section 4.5 discusses the best choice of q; Section 4.6 shows how θ guides the local search to minimize the NQ until the stopping condition; Section 4.7 presents a ranking obtained with the proposed approach; and Section 4.8 compares the approach with other works and outlines future directions.

Case Studies
The problems PF1 to PF4 illustrated in Figure 2 are used to validate the applicability of the proposed approach; they represent different types of problems that managers may face daily. The hyperparameters are then explored one at a time to improve the learner's performance and, consequently, to reduce the NQ. The results are analyzed over four runs of each experiment.

Machine Learning Methods Performance
The ML methods Gradient Boosting Regressor (GBR), Lasso, ElasticNet, Ridge, and Random Forest (RF) were applied to learn preferences for the problems PF1 to PF4. While the ML algorithm varies, the other hyperparameters are kept fixed at q = 5 and θ = 0.2. The initial solutions were selected by clustering, and the subsequent ones by the smallest Cosine distance.
For decision problem PF1, the GBR- and RF-based models both reached the stopping criterion first, at the 10th iteration. For PF2 to PF4, RF was the fastest, at iterations 11, 15, and 10, respectively. Table 3 shows the order of the models for each problem, based on the number of iterations each model needed to reach the stopping condition.
Table 3. Selection of the best ML method for each decision problem (columns: Problem, Order, Iteration): the order of the models and the iteration at which the best one (in bold) achieved the stopping criterion.

Initial Selection of the Solutions
The first alternatives for the DMs to evaluate can be selected in two ways: randomly or by clustering. Both have pros and cons. Random selection gives a greater chance of choosing one or more solutions that already lie in the region of the global optimum; on the other hand, unlike clustering, it does not guarantee coverage of the entire Pareto front. In MOO and MaOP problems, diversity on the Pareto front is crucial [27]. Figure 3 shows the learning curve for problem PF3, which behaved differently from the others: ranking stability throughout the iterations was greater when clustering was used to select the first q solutions. In the figure, the lines with dots and the shaded areas represent the mean and the 95% confidence interval, respectively. A smaller shaded area indicates that the trained model predicted the preferences more consistently across runs in that iteration. Additionally, the rankings generated in consecutive iterations were closer, as seen at $t_5$.
The random choice of initial solutions resulted in a larger standard deviation over the iterations, since it does not guarantee coverage of the entire Pareto surface, as clustering with q = 5 may. This problem corresponds to the DTLZ7 function, whose surface is highly nonlinear and exhibits many discontinuities. These characteristics make the learner's behavior harder to predict and pose more challenges to the model.
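The two initialization strategies can be sketched as follows. The text does not name the clustering algorithm, so this sketch assumes k-means (Lloyd's iterations with a deterministic farthest-point initialization) and represents each cluster by the alternative closest to its centroid; all names are hypothetical.

```python
import math
import random


def random_initial(n, q, seed=0):
    """Random strategy: q distinct alternative indices drawn uniformly from X."""
    return random.Random(seed).sample(range(n), q)


def cluster_initial(points, q, iters=50):
    """Cluster strategy, assuming k-means: build q clusters and return the
    index of the alternative closest to each centroid."""
    # Deterministic farthest-point seeding spreads the initial centroids over the front.
    centroids = [list(points[0])]
    while len(centroids) < q:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):  # Lloyd's iterations
        groups = [[] for _ in range(q)]
        for p in points:
            nearest = min(range(q), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        updated = [
            [sum(axis) / len(g) for axis in zip(*g)] if g else centroids[k]
            for k, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    chosen = []
    for c in centroids:
        # Representative: the closest alternative not already chosen.
        idx = min(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: math.dist(points[i], c),
        )
        chosen.append(idx)
    return chosen


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster_initial(points, 2))  # [0, 3]
```

On these two well-separated groups, the clustering strategy always returns one alternative from each group, whereas `random_initial` may return two from the same group, which is the coverage issue discussed above.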

Similarity among the Recommended Solutions
This part concerns the selection of alternatives during the interactions with the DM. These alternatives are picked iteratively from the set X \ Q based on proximity metrics. This analysis considers the best models identified above, initialization by clustering, q = 5, and θ = 0.2. The Euclidean and Cosine distances are the two reference measures of proximity between solutions.
As before, the main difference occurred in problem PF3, as illustrated in Figure 4: the model that selected alternatives based on the shortest Cosine distance outperformed the Euclidean one, reaching the stopping condition at iteration $t_{10}$ versus $t_{12}$. The Euclidean distance measures the straight-line distance between two points, while the Cosine distance measures the angle between two vectors. A possible explanation for this difference is that the Euclidean distance is known to perform poorly in high dimensions [53], and possibly on discontinuous surfaces. This may cause the later convergence and the larger standard deviation of the predicted preferences in the final iterations. Further studies could investigate this difference specifically.
In this work, these recommendation strategies are used to reduce bias when selecting the alternatives presented to the DM. They aim to identify the solutions most relevant to the DM, leading to more informed choices.

Number of Alternatives Presented to the DM
For this analysis, we considered q ∈ {3, 4, 5, 7, 10}, focusing on minimizing the NQ and using the most appropriate hyperparameters from the previous analyses. Figure 5 shows the results for the different values of q. Using KDT as the reference, the models reached the stopping criterion at iterations 5, 7, 10, and 11 for PF1, PF2, PF3, and PF4, respectively. Although more solutions give the learner more information, increasing q yields only a slight improvement in the model's performance while requiring more comparisons. Fewer comparisons translate into less effort and time for the DM and, consequently, a faster selection of the preferred solution. Smaller subsets also make consistent assessments, and thus an acceptable CR (Equation (5)), more likely.
The NQ until the learner reached the stopping condition is given in Table 4. Based on these problems, the most appropriate number of alternatives to present to the DM appears to be three at a time, which requires three queries per objective. In practice, managers must allocate time and effort for the analyst to perform these assessments. In the proposed approach, q can easily be modified according to business needs.
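The per-iteration cost of a given q follows directly from the pairwise count. A minimal sketch, assuming that each iteration only compares the q newly presented alternatives with each other on each of the m objectives (how new alternatives relate to previously queried ones is not specified here); the function names are hypothetical:

```python
from math import comb


def queries_per_iteration(q, m):
    """Pairwise queries when q alternatives are shown for m objectives: m * q(q-1)/2."""
    return m * comb(q, 2)


def queries_until_stop(q, m, iterations):
    """Total queries, assuming each iteration only compares the q newly shown alternatives."""
    return iterations * queries_per_iteration(q, m)


for q in (3, 4, 5, 7, 10):  # the values of q examined in this section
    print(q, queries_per_iteration(q, m=1))
```

For one objective this prints 3, 6, 10, 21, and 45 queries per iteration, so q = 3 keeps each interaction to the three comparisons mentioned above, while q = 10 already demands fifteen times as many.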
Following the main argument of the Paradox of Choice, lower values of q are preferable: a smaller q implies fewer comparisons per iteration, even if more interactions are needed.
After that, the model learns the utility function and can predict the preferences for the remaining alternatives and build the final ranking.
Two limitations of AHP are addressed here: complexity and limited scope. Complexity is reduced because fewer comparisons are required. The scope of AHP applications is broadened because scalability allows complex problems to be broken down and solved iteratively.

Strategies for Local Search
The hyperparameter θ guides the local search towards the more promising solutions picked at each iteration t. The search can range from fully random (θ = 1.0) to purely distance-based (θ = 0.0) (see Section 4.4). Using q = 3 from the previous analysis, we vary θ so that between 0 and q of the selected solutions are chosen at random.
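The role of θ can be sketched as follows, assuming that θ is the fraction of the q slots filled at random and that the remaining slots go to the candidates closest to a reference solution. Both this interpretation and the names `select_next` and `reference` are hypothetical; the authors' selection rule (Section 4.4) may differ in its details.

```python
import math
import random

def select_next(reference, pool, q, theta, rng):
    """Pick q candidate indices: round(theta * q) at random, the rest nearest to reference."""
    n_random = round(theta * q)
    by_distance = sorted(range(len(pool)), key=lambda i: math.dist(reference, pool[i]))
    nearest = by_distance[: q - n_random]          # distance-based part (exploitation)
    remaining = [i for i in range(len(pool)) if i not in nearest]
    random_pick = rng.sample(remaining, n_random)  # random part (exploration)
    return nearest + random_pick

pool = [(float(i), float(i)) for i in range(10)]  # hypothetical candidates
rng = random.Random(0)
print(select_next((0.0, 0.0), pool, 3, 0.0, rng))    # [0, 1, 2]: purely distance-based
print(select_next((0.0, 0.0), pool, 3, 1 / 3, rng))  # [0, 1] plus one random index
```

With θ = 0.0 the search only exploits the neighborhood of the reference, which the results suggest suits linear boundaries; larger θ values inject exploration, which may help on nonlinear or discontinuous ones.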
The models converged similarly on problems PF1 and PF3. However, an interesting finding is observed for problems PF2 and PF4 (see Figure 6): the purely distance-based search was more efficient, reaching the stopping condition in fewer iterations and, thus, with a smaller NQ. For problems whose decision boundaries are nonlinear and/or discontinuous, such as PF1 and PF3, a randomness factor can help the learner explore new regions of the search space and escape from local minima. In linear problems, however, this issue is less pronounced, and convergence speed is the major challenge.

Ranking with the Scalable Approach
Since this paper deals with decision making problems with many criteria/objectives, a final ranking is expected at the end of the process. As an example, we use the ranking obtained for problem PF2 with θ = 0.0, presented in Section 4.6 and Figure 6a.
Notice that the RF-based model required eight iterations to learn the preferences. After that, it predicts the preferences among all the remaining pairs. Although only 18 iterations are shown, the model can be applied to all $|X|/q$ iterations without requiring new queries or retraining. Figure 7 presents the ranking obtained with the predicted preferences at iteration $t_{18}$. The model created with θ = 0.0 requires six assessments per iteration. At $t_{18}$, the preferences predicted by the model generate a ranking quite similar to that produced by the analytical AHP using the entire set $X$. The arrows indicate the swaps between indices needed to obtain the exact ranking. Using the AHP method, the best alternative is 169; the proposed approach ranked it second. Notably, the best solutions predicted by the proposed method lie in the same region as the best AHP solution.
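The gap between the predicted and the exact AHP rankings can be quantified by counting discordant pairs, which equals the minimum number of adjacent swaps needed to turn one ranking into the other (the swaps the arrows depict). The sketch below assumes that KDT denotes this Kendall tau distance, optionally normalized by the number of pairs; the labels are hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b, normalize=True):
    """Count item pairs ordered differently by the two rankings."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    discordant = sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    if not normalize:
        return discordant
    n = len(rank_a)
    return discordant / (n * (n - 1) / 2)

exact = ["A", "B", "C", "D"]      # hypothetical exact AHP order
predicted = ["B", "A", "C", "D"]  # hypothetical model order: top two swapped
print(kendall_tau_distance(exact, predicted, normalize=False))  # 1
print(kendall_tau_distance(exact, predicted))                   # 1/6 of all pairs
```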

Other Analysis and Directions
As argued before, preferences can be obtained either explicitly, by querying the DMs, or implicitly, when they are collected and/or inferred. Step 3 of the AHP (see Section 2.1) concerns the consistency of the preferences, and this transitivity is now used to obtain implicit relations. In each objective, the differences between $A_i$ and $A_j$ and between $A_j$ and $A_q$ (recommended) are computed, and the relation of $A_i$ with respect to $A_q$ is inferred from them. Thus, the preference is simply the difference between $A_i$ and $A_q$. For this, q − 1 alternatives are recommended, and one is used to extract the preferences. After some rounds of experiments, we noticed that this strategy did not reduce the number of iterations needed to reach the stopping criterion. The main reason is that the amount of data imputed implicitly at each iteration grows linearly with the number of training samples. Therefore, in the first iterations, as the method converged quickly, not enough data had yet been imputed to accelerate the model's convergence. The authors believe this strategy may work for more complex problems, which can be investigated in future analyses.
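The transitive inference can be sketched for a single objective under the assumption that preferences are expressed additively as differences, so that $\Delta(A_i, A_q) = \Delta(A_i, A_j) + \Delta(A_j, A_q)$ with $A_j$ acting as the pivot. The function name, the dictionaries, and the numbers below are hypothetical. The sketch also shows why the imputed data grows linearly: each new iteration yields one inferred pair per previously evaluated alternative and per recommended one.

```python
def infer_implicit(diff_to_pivot, diff_pivot_to_new):
    """Infer Δ(A_i, A_q) = Δ(A_i, A_j) + Δ(A_j, A_q) on one objective.

    diff_to_pivot[i]     : known preference difference of A_i over the pivot A_j
    diff_pivot_to_new[q] : elicited difference of the pivot A_j over a new A_q
    """
    return {
        (i, q): d_ij + d_jq
        for i, d_ij in diff_to_pivot.items()
        for q, d_jq in diff_pivot_to_new.items()
    }

old = {"A1": 2.0, "A2": -1.0}  # hypothetical differences to the pivot
new = {"A5": 0.5, "A6": -3.0}  # hypothetical differences from the pivot
print(infer_implicit(old, new))
# {('A1', 'A5'): 2.5, ('A1', 'A6'): -1.0, ('A2', 'A5'): -0.5, ('A2', 'A6'): -4.0}
```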
We also compared the proposed method with other works in the literature, namely the method proposed by Alves et al. [15] and the classical AHP [4]. The former is simpler: both the initial selection and the recommendation are purely random, which is equivalent to our approach with θ = 1.0. Based on the results discussed in Section 4.3, the cluster strategy shows less variance on problems such as PF3 and PF4. In addition, as shown in Section 4.6, distance-based recommendation works better for problems with linear boundaries. Thus, the results of the current research are expected to outperform those presented in [15] in these cases. Compared to the latter, our proposal requires only Q alternatives to be evaluated, in subgroups of size q, over a few iterations, whereas AHP requires the evaluation of all X alternatives in a single round.
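The difference in effort can be estimated with simple arithmetic: per objective, classical AHP needs $n(n-1)/2$ judgements over all $n$ alternatives, whereas the iterative scheme needs $q(q-1)/2$ per iteration. The problem size below is hypothetical, and holding the iteration count fixed across q is a simplification, since the results above show that smaller q values may need more iterations.

```python
def ahp_comparisons(n: int) -> int:
    """Classical AHP on one objective: every pair of the n alternatives."""
    return n * (n - 1) // 2

def iterative_comparisons(q: int, iterations: int) -> int:
    """Iterative scheme on one objective: only the q-subset is compared per iteration."""
    return iterations * (q * (q - 1) // 2)

n, t = 200, 15  # hypothetical: 200 alternatives, about 15 iterations
print("full AHP:", ahp_comparisons(n))  # full AHP: 19900
for q in [3, 4, 5, 7, 10]:
    print(f"q={q}: {q * (q - 1) // 2} per iteration, {iterative_comparisons(q, t)} in total")
```

Both counts scale linearly with the number of objectives, so the ratio between them is unaffected by it.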
This approach is an alternative for solving large-scale multicriteria decision making problems, in which the number of alternatives and/or criteria is so large that comparing solutions becomes hard for the decision maker. The methodology is customizable and can be extended with new features and problems. Investigations on dimensionality reduction to further reduce DM effort (cognitive or in NQ), on the effect of correlation between criteria, on consistency indicators, and on other aspects can be incorporated as new hyperparameters and functionalities [3,10,33,54].

Conclusions
Managers and organizations are looking for tools that speed up the decision process. However, methods based on pairwise comparisons, such as the AHP, make this difficult to achieve. This paper describes an approach to making the classical Analytic Hierarchy Process scalable. Instead of presenting all solutions to the DMs at once, scalability is achieved through successive iterations in which a machine learning method learns the preferences and predicts the remaining ones.
Scalability is improved through hyperparameter optimization, which acts directly on reducing both the number of queries and the probability of inconsistent evaluations. In addition, it indirectly reduces wasted time and makes it possible to reuse an automated model in new problems of the same domain.
The methodology offers several parameters that can help to further explore the decision problem and to accelerate convergence. A higher number of interactions with the DM allows for better convergence to the desired region of the Pareto front. However, as the number of queries depends on the number of interactions, analysts and DMs need to agree on a stopping condition that meets the organization's needs. In this article, KDT was used as the merit function, and new ones can be implemented. Once the model has been trained, it can be reused without requiring new queries to the DM, except when the domain changes.
The parameters were analyzed one at a time on four problems with different shapes, including convex, linear, and discontinuous ones. Other findings are that presenting q = 3 solutions at a time requires the fewest queries, and that searching for solutions based on the shortest distance tends to accelerate the models' learning when the problem has a linear surface. When θ is set to 0.0, the search is purely distance-based and tends to reach the stopping condition in fewer iterations. In practical applications, the DM's effort per round is reduced from $n(n-1)/2$ to $q(q-1)/2$ assessments, with $q \ll n$, in approximately 15 iterations.
Among the criticisms of the AHP method, this article directly addresses complexity, limited scope, and bias; it addresses subjectivity indirectly; and it does not address the lack of transparency. Scalability acts on complexity and scope limitation: between two questioning phases, the model can learn the utility function from fewer alternatives. Bias is reduced by the recommendation strategies and is also controlled by the parameter θ, which focuses on the most relevant solutions. Subjectivity is mitigated by reducing the number of comparisons at a time, which decreases the chances of inconsistent evaluations. However, uncertainty or vagueness during evaluations is not directly discussed, and it may be the target of future studies, such as the application of fuzzy logic or scenario analysis. In addition, the AHP method may be difficult to understand or to explain to stakeholders, causing a lack of transparency. Future improvements may involve, for instance, explainability methods.
For future research, we also suggest implementing a grid search to automate parameter tuning, and extending the methodology to more problems, with either artificial or real data. Furthermore, evaluating solutions in MaOPs is a very hard task, from objective function calculation and visualization to choosing the best solution. The proposed approach can support this class of problems, and it was designed so that other features can be easily added, such as new ML algorithms, recommendation and solution selection strategies, and even other multicriteria methods based on pairwise comparisons.

Data Availability Statement:
The data supporting the findings of this study are available at: https://github.com/mvoicer/doutorado.