3.1. BWM and BBWM
BWM is a subjective decision-making approach rooted in the AHP. Originally developed by [62], the BWM derives criterion weights by comparing the most preferred (best) and least preferred (worst) criteria or alternatives with all the others. Unlike matrix-based methods such as AHP, the BWM follows a vector-based approach, requiring significantly fewer comparisons: only 2n − 3 comparisons compared to n(n − 1)/2 comparisons in AHP. This reduction in comparison burden enhances its applicability while maintaining higher consistency in results. Furthermore, a dedicated consistency ratio is utilized to assess the reliability of the derived weights. The authors argue that the BWM outperforms AHP due to its improved consistency in pairwise comparisons, making it a more efficient and reliable MCDM method [62].
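The difference in comparison burden can be checked with a short sketch. The function names are hypothetical; n = 5 corresponds to the number of main criteria used later in this study, while n = 16 is purely illustrative.

```python
def bwm_comparisons(n: int) -> int:
    """Number of pairwise comparisons the BWM needs for n criteria (2n - 3)."""
    return 2 * n - 3


def ahp_comparisons(n: int) -> int:
    """Number of pairwise comparisons AHP needs for n criteria (n(n - 1)/2)."""
    return n * (n - 1) // 2


# Compare the two methods for a small and a larger set of criteria.
for n in (5, 16):
    print(f"n={n}: BWM={bwm_comparisons(n)}, AHP={ahp_comparisons(n)}")
```

The gap widens quickly because the BWM count grows linearly in n while the AHP count grows quadratically.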
A key advantage of the BWM is its reliance on integer-based comparisons, eliminating the need for fractional numbers, which are commonly used in AHP. This characteristic simplifies its practical implementation, increasing its usability across various decision-making contexts. Additionally, while traditional MCDM methods rely on a consistency ratio to assess the reliability of comparisons, the BWM inherently ensures a consistent reliability level as an output, thereby reducing subjectivity-induced errors in the decision-making process.
The BBWM represents a statistical extension of the BWM, introduced by [17]. Unlike the traditional BWM, the BBWM conceptualizes criteria as random events and their corresponding weights as probabilities of realization [64]. In this probabilistic framework, data from pairwise comparisons are modeled using probability distributions, enabling a more nuanced and statistically grounded approach to weight determination.
One of the most notable advantages of the BBWM is its ability to capture the collective preferences of multiple decision makers in a group decision-making setting. Unlike conventional aggregation techniques (e.g., arithmetic or geometric averaging), the BBWM combines individual pairwise comparisons into a final probability distribution that represents the overall group consensus [64]. Furthermore, belief ranking is introduced, assigning a relationship and confidence level to each criterion pair. This ranking is then visualized through a weighted directed graph, which effectively illustrates the interdependencies among criteria.
The BBWM offers a superior alternative to other BWM extensions by integrating probability theory into the weight calculation process, thereby allowing for probabilistic control over criterion ranking. Although still a relatively recent development, empirical studies have already demonstrated its successful applications across diverse fields [17,64]. The application steps of the BBWM are systematically outlined in the literature, providing a structured methodological framework for researchers and practitioners aiming to implement this innovative decision-making tool [17].
Step 1: The main and sub-criteria of the problem are systematically identified and denoted as Cj where j = 1, 2, 3, …, n. These criteria represent the key elements influencing the evaluation and selection process. Simultaneously, the decision makers (DMs) involved in the assessment are defined as DMk, where k = 1, 2, 3, …, K.
Step 2: Each DM determines the criteria they perceive as the most important (Best, CB) and least important (Worst, CW) based on their expertise and experience. These selections reflect the individual priorities of the decision makers in the evaluation process.
Following this identification, each DM assesses the best criterion in comparison to all other criteria. This process generates the best-to-others vector denoted by AB, which is mathematically represented as shown in Equation (1). This structured approach ensures a systematic and objective weighting of criteria, forming the foundation for further analysis within the Best–Worst Method (BWM) framework.
Similarly, each DM assesses the worst criterion concerning all other criteria. This process results in the formulation of the others-to-worst vector denoted by AW, which is mathematically defined in Equation (2). By incorporating both the best-to-others and others-to-worst comparisons, this approach ensures a balanced and structured weighting of criteria, enhancing the reliability of the Best–Worst Method (BWM) analysis.
In these vectors, aBj represents the degree of preference of the best criterion over the jth criterion, while ajW denotes the degree of preference of the jth criterion over the worst criterion. The best criterion is always equally preferred to itself, meaning that aBB = 1, and similarly, the worst criterion is always equally preferred to itself, ensuring that aWW = 1. These conditions provide a consistent reference point for the paired comparisons within the Best–Worst Method (BWM) framework.
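In the notation commonly used for the BWM, the two comparison vectors of decision maker k can be written as follows. The superscript k and the column (transposed) form of the others-to-worst vector are assumptions about notation rather than a quotation of Equations (1) and (2).

```latex
A_B^{k} = \left(a_{B1}^{k}, a_{B2}^{k}, \ldots, a_{Bn}^{k}\right), \qquad
A_W^{k} = \left(a_{1W}^{k}, a_{2W}^{k}, \ldots, a_{nW}^{k}\right)^{T}, \qquad
k = 1, \ldots, K
```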
Step 3: Unlike the classical Best–Worst Method (BWM), where the AkB and AkW vectors obtained from different decision makers are directly aggregated, the Bayesian BWM adopts a statistical perspective to derive the AB and AW vectors. From this viewpoint, each criterion is regarded as a stochastic event, and the corresponding weights represent the probabilities of occurrence.
Per probability theory, the weight assigned to each criterion (wj) must be greater than zero, and the total weight of all criteria in the decision problem must sum to one. This condition justifies the use of probabilistic modeling in decision making, ensuring that the weight distributions accurately reflect uncertainty and expert preferences.
To model both the input distributions (AB and AW) and the output distributions (i.e., the optimal integrated final weights), the multinomial distribution is employed for both the best and the worst criteria. The probability mass function for the worst criterion (AW) is mathematically expressed in Equation (3), where w denotes the probability distribution and AW is the number of occurrences of each event.
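For orientation, the standard multinomial probability mass function, written with the entries ajW of AW as the event counts, takes the form below. This is the textbook form of the distribution and is offered as a sketch of what Equation (3) expresses, not as a reproduction of it.

```latex
P\left(A_W \mid w\right) =
\frac{\left(\sum_{j=1}^{n} a_{jW}\right)!}{\prod_{j=1}^{n} a_{jW}!}
\prod_{j=1}^{n} w_j^{\,a_{jW}}
```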
In the Bayesian Best–Worst Method (BBWM), the probability of event j is determined using the multinomial distribution, which models the likelihood of different criteria being selected as the best or worst by decision makers. For such a distribution, the probability of a given criterion is calculated as the ratio of its number of occurrences to the total number of trials.
Mathematically, this probability is formulated as shown in Equation (4), ensuring that the derived weights accurately reflect expert preferences while maintaining probabilistic consistency.
Analogously, the weight of the worst criterion (wW) is obtained from the condition aWW = 1, so that the worst criterion serves as the reference point in the weighting process.
Considering Equation (4), the relationship required to determine the optimal criterion weights in the BBWM framework can be formulated as shown in Equation (5). This step is fundamental in ensuring consistency in the weighting process, aligning the derived weights with decision makers’ comparative evaluations of the best and worst criteria.
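The proportionality described above can be illustrated with a minimal sketch. It uses only the frequency interpretation (occurrences divided by total trials) for a single decision maker and is not the full Bayesian estimate; the function name and the others-to-worst vector are hypothetical.

```python
def weights_from_others_to_worst(a_w):
    """Frequency-based weights: each entry divided by the sum of all entries."""
    total = sum(a_w)
    return [a / total for a in a_w]


# Hypothetical others-to-worst vector; the fourth criterion is the worst (a_WW = 1).
a_w = [4, 2, 8, 1, 5]
worst = a_w.index(1)

w = weights_from_others_to_worst(a_w)

# Ratios w_j / w_W recover the original comparison values a_jW.
ratios = [w_j / w[worst] for w_j in w]
print(w)
print(ratios)
```

The weights are positive and sum to one, and dividing each weight by the weight of the worst criterion returns the stated preference values, which is the relationship the text attributes to Equation (5).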
Like AW, the best-to-others vector AB can also be represented using a multinomial distribution. However, there is a key distinction: the direction of the pairwise comparisons is reversed, because AB expresses the preference of the best criterion over each of the others rather than the preference of each criterion over the worst. This adjustment is mathematically expressed in Equation (6), where w is the probability distribution and the division is taken element-wise, ensuring that the probabilistic framework accurately reflects the preference relationships among criteria.
Using Equation (6), the expression in Equation (7) can be written similarly:
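In the form usually given in the BBWM literature [17], the reversed relationship for the best criterion reads as follows. The layout is an assumption and may differ from the typeset Equations (6) and (7).

```latex
A_B \sim \operatorname{multinomial}\!\left(\frac{1}{w}\right), \qquad
\frac{1}{w_j} \propto \frac{a_{Bj}}{\sum_{i=1}^{n} a_{Bi}}
\;\;\Longrightarrow\;\;
\frac{w_B}{w_j} \propto a_{Bj}, \quad \forall j = 1, \ldots, n
```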
The methodological approach undertaken in this study involves modeling the inputs of the Best–Worst Method (BWM) using a multinomial probability distribution. This probabilistic framework enables the derivation of criteria weights in MCDM problems through the estimation of a probability distribution. Given that criterion weights must adhere to the constraints of non-negativity and summation to unity, the Dirichlet distribution (Dir) emerges as the most appropriate statistical model, as it inherently satisfies such properties.
To operationalize this, statistical inference techniques are employed to estimate two key components: the optimal integrated weight vector (w* = w*1, …, w*n), representing the aggregated priorities across all criteria, and the individual weight vectors (wk, k = 1, …, K) corresponding to each decision maker (DM), derived from the probabilistic model by using Equation (8). This approach ensures robustness in weight estimation by leveraging the Dirichlet distribution’s capacity to model compositional data, thereby aligning theoretical rigor with the practical requirements of MCDM applications.
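A hierarchical model of the kind described here is often summarized as below in the BBWM literature [17]. The concentration parameter γ and its hyperparameters a and b do not appear in the text above and are assumptions introduced for illustration; the block is a sketch of the model structure, not a reproduction of Equation (8).

```latex
\begin{aligned}
A_W^{k} \mid w^{k} &\sim \operatorname{multinomial}\!\left(w^{k}\right), & k &= 1, \ldots, K,\\
A_B^{k} \mid w^{k} &\sim \operatorname{multinomial}\!\left(1 / w^{k}\right), & k &= 1, \ldots, K,\\
w^{k} \mid w^{*} &\sim \operatorname{Dir}\!\left(\gamma \, w^{*}\right), & k &= 1, \ldots, K,\\
\gamma &\sim \operatorname{gamma}(a, b),\\
w^{*} &\sim \operatorname{Dir}(\mathbf{1}).
\end{aligned}
```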
To estimate the optimal criteria weights, the Bayesian Best–Worst Method (BBWM) employs Just Another Gibbs Sampler (JAGS), a computational tool for Bayesian inference that utilizes Markov chain Monte Carlo (MCMC) simulation. This approach facilitates the derivation of the posterior probability distribution for the criteria weight vector (W), which is informed by the aggregated evaluations of decision makers.
The methodological framework is finalized through the application of credal ranking [17], a metric grounded in the Dirichlet distribution. Credal ranking quantifies the degree of superiority among criteria and evaluates the reliability of these dominance relationships. The results are complemented by visual representations, such as credibility intervals or posterior probability plots, to illustrate the confidence levels associated with the hierarchical ordering of criteria.
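One way to picture credal ranking is as a count over posterior samples: the confidence that criterion i dominates criterion j is approximated by the share of samples in which wi exceeds wj. The sketch below assumes this sample-based reading; the function name is hypothetical, and the Dirichlet draws merely stand in for MCMC output and do not come from the study's data.

```python
import numpy as np


def credal_confidence(samples):
    """Return an (n, n) matrix whose entry [i, j] is the fraction of
    posterior draws in which weight i is larger than weight j."""
    samples = np.asarray(samples, dtype=float)
    # Broadcast to shape (S, n, n): element [s, i, j] is True when w_i > w_j in draw s.
    greater = samples[:, :, None] > samples[:, None, :]
    return greater.mean(axis=0)


# Hypothetical posterior draws for five criteria (stand-in for sampler output).
rng = np.random.default_rng(0)
draws = rng.dirichlet([30.0, 25.0, 15.0, 20.0, 10.0], size=5000)

conf = credal_confidence(draws)
print(np.round(conf, 3))
```

Each off-diagonal entry can be read as the weight of a directed edge from criterion i to criterion j in the credal ranking graph.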
Unlike traditional MCDM methods that require a separate consistency ratio, the Bayesian Best–Worst Method (BBWM) incorporates consistency evaluation through its probabilistic structure. Specifically, the credal ranking technique, based on the Dirichlet posterior distributions, allows for the assessment of the confidence level in the dominance of one criterion over another. The resulting credibility intervals, as visualized in Figure 1 and Figure 2, serve as statistical evidence supporting the robustness of the derived weights. This built-in mechanism offers a more nuanced and reliable approach to consistency validation in group decision-making settings, as also supported by [17].
The Dirichlet distribution is selected as the prior for the probabilistic modeling of criteria weights in the BBWM due to its mathematical properties. Specifically, it is the natural conjugate prior for the multinomial distribution and ensures that the generated weight vectors are non-negative and collectively sum to one. Additionally, the Dirichlet distribution provides analytical convenience and computational efficiency when deriving posterior estimates, which makes it particularly suitable for group decision-making scenarios involving normalized weights.
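The conjugacy property can be illustrated for the simplest case of a single multinomial observation, where the posterior is again a Dirichlet distribution whose parameters are the prior parameters plus the observed counts. The full BBWM model is hierarchical and is estimated by MCMC rather than in closed form, so the sketch below shows only the underlying property; the function name and the counts are hypothetical.

```python
import numpy as np


def dirichlet_posterior(alpha_prior, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    alpha_prior = np.asarray(alpha_prior, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return alpha_prior + counts


# Uniform prior over five criteria combined with hypothetical counts.
alpha_post = dirichlet_posterior(np.ones(5), [4, 2, 8, 1, 5])

# The posterior mean is a valid weight vector: non-negative and summing to one.
posterior_mean = alpha_post / alpha_post.sum()
print(alpha_post, posterior_mean)
```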
3.2. Implementation
This section builds upon the BBWM methodology outlined earlier to evaluate the criteria for TMS software selection within an integrated decision-making framework. The hierarchical structure consists of five main criteria and sixteen sub-criteria (Table 2). The BBWM is employed to derive the relative weights of these criteria using a probabilistic group decision-making model that incorporates expert consensus and accounts for uncertainty in preferences. The analysis proceeds in two stages: (1) pairwise evaluation of main criteria relative to one another, and (2) pairwise evaluation of sub-criteria within each main criterion. Once the weight structure is established, the TOPSIS method is applied to assess and rank the TMS software alternatives based on their performance across these weighted criteria. This sequential integration ensures a coherent transition from criteria prioritization to alternative evaluation, allowing the final decision to reflect both subjective expert judgment and objective performance metrics.
3.2.1. Data Collection and Expert Input
To determine the weights of predefined criteria for TMS software selection, a panel of eight business experts, actively engaged in logistics operations, is convened. Participants, averaging 31 years of age with six years of business-specific expertise, contribute evaluations through a structured three-phase protocol: (a) Likert scale assessments, where criteria are rated on a five-point scale (1: very low to 5: very high) to quantify their perceived importance; (b) best–worst pairwise comparisons, through which each decision maker (DM) identifies the most (best) and least (worst) critical main criterion and then compares the best criterion against all others and all others against the worst; and (c) construction of two evaluation matrices (best-to-others and others-to-worst).
To ensure methodological rigor, a user-friendly decision support system is designed to aggregate responses, validate consistency, and minimize evaluator bias. The system automates data harmonization across the eight experts, consolidating inputs for both main and sub-criteria into a unified analytical framework.
3.2.2. Bayesian Computation and Results
The aggregated evaluations are processed in MATLAB R2024b, following the Bayesian inference procedure detailed by [17]. Customized versions of the authors’ publicly available MATLAB code (URL-1, 2021) are executed to compute local weights for criteria and sub-criteria. These weights are derived from posterior distributions, and the accompanying belief rankings quantify the probabilistic dominance relationships between criteria.
The results include Table 3, presenting the BBWM-derived weights of the main criteria, and Figure 1, illustrating belief rankings via credibility intervals. These outputs validate the robustness of hierarchical prioritization, aligning theoretical rigor with empirical expert judgments.
The analysis reveals functionality as the paramount main criterion in TMS software selection, whereas software vendor business emerges as the least influential. Notably, cost—ranked second—exhibits minimal disparity in weight scores relative to functionality, yet both criteria demonstrate substantial divergence from software vendor business. This hierarchy underscores the prioritization of software functionality over ancillary elements in business decision making.
As illustrated in Figure 1, the credal ranking derived from the Bayesian BWM delineates the probabilistic dominance relationships among criteria. Functionality occupies the apex, with directed edges (arrows) originating from it, signifying its superiority over subordinate criteria. Conversely, criteria such as software vendor business, positioned at the nadir, receive edges from all other criteria, reflecting their diminished relative importance. The width and directionality of these edges quantify the confidence intervals associated with each dominance relationship, as modeled by the Dirichlet distribution.
Table 4 delineates the weight scores of sub-criteria underpinning the TMS software selection process, derived via the Bayesian BWM. Within the technological competence domain, information flow and transparency attain the highest weight, emphasizing the imperative for software to enable seamless data transfer and operational clarity across stakeholders. Ease of use ranks second, underscoring the necessity of intuitive user interfaces to minimize learning curves. Customized reporting receives a comparatively lower priority, indicating that tailored reporting features are of secondary importance.
With respect to the service domain, reliability emerges as the foremost element, reflecting the criticality of consistent software performance and uninterrupted functionality, followed closely by after-sales support, highlighting the value of sustained vendor assistance post-implementation. Training and error management receive lower weights, indicating an ancillary role relative to core service attributes.
Concerning the functionality domain, load tracking dominates as the most critical sub-criterion, prioritizing the software’s capacity to monitor logistics operations in real time. Ranking second, software modules emphasize the need for modular flexibility to accommodate diverse operational workflows. Moreover, software customization displays moderate importance, aligning with businesses’ demands for adaptability.
In the cost domain, software licenses claim the highest weight, underscoring the dominance of initial investment considerations, followed with notable significance by update cost, signaling concerns over long-term maintenance expenditures. Software module cost is assigned the lowest priority, suggesting less emphasis on modular pricing structures.
As for the software developer (vendor) domain, industrial know-how achieves paramount importance, stressing the necessity of vendor expertise in business-specific challenges, while references and reputation have lower weights, implying limited influence of vendor accolades relative to technical proficiency.
Figure 2 complements these findings through credal ranking displays, illustrating probabilistic dominance relationships among sub-criteria. The visualizations corroborate the hierarchy in Table 4, with arrows quantifying confidence intervals for superiority assertions. For instance, information flow and transparency exhibit robust dominance over peers in technological competence, while industrial know-how maintains clear precedence in software vendor evaluations.
3.3. TOPSIS
Leveraging the dataset derived in the previous section, a comparative evaluation was conducted between two software vendors specializing in logistics-business solutions. The analysis employs TOPSIS, an MCDM method introduced by [65]. TOPSIS operationalizes the principle of identifying alternatives that simultaneously minimize geometric distance to the positive ideal solution (PIS) and maximize distance to the negative ideal solution (NIS) [66,67].
Step 1: Objectives are set, and evaluation criteria are defined.
Step 2: A decision matrix (D) is constructed with the help of Equation (9). Alternatives (a1, …, an) are systematically listed, and for each alternative, the corresponding characteristics for each criterion (y1k, …, ynk) are recorded, as detailed by [68].
Step 3: The normalized decision matrix (R) is then constructed. Each entry of the original decision matrix is divided by the square root of the sum of the squared scores of the corresponding criterion [68]. Equation (10) delineates the normalization process, which ultimately yields the R matrix presented in Equation (11).
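The normalization in Step 3 can be sketched as follows, assuming rows are alternatives and columns are criteria. The function name and the scores are hypothetical and are not taken from the study's tables.

```python
import numpy as np


def vector_normalize(decision_matrix):
    """Divide every entry by the Euclidean norm of its column (criterion)."""
    d = np.asarray(decision_matrix, dtype=float)
    column_norms = np.sqrt((d ** 2).sum(axis=0))
    return d / column_norms


# Hypothetical scores: two alternatives (rows) evaluated on two criteria (columns).
scores = [[3.0, 4.0],
          [4.0, 3.0]]

r = vector_normalize(scores)
print(r)
```

After normalization every column has unit Euclidean length, which makes criteria measured on different scales comparable.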
Step 4: The weighted normalized decision matrix (V) is created through Equation (12). Each criterion j is assigned a weight (wj) reflecting its relative importance. The weights are applied to the corresponding elements of the normalized decision matrix, as delineated by [66].
Subsequently, each element within the columns of the R matrix (refer to Equation (11)) is multiplied by its corresponding weight (wj) as specified in Equation (12). This operation results in the formation of the V matrix, as presented in Equation (13) [66].
Step 5: Ideal (A*) and negative ideal (A-) solutions are obtained. The positive ideal solution is defined as the set of optimal performance values obtained from the weighted normalized decision matrix, whereas the negative ideal solution comprises the least favorable values [67]. These ideal solutions are computed using Equations (14) and (15).
In both formulas, I denotes the set of benefit criteria (maximization), and J denotes the set of cost criteria (minimization) [66]. The results derived from Equation (14) can be expressed as A* = {v1*, v2*, …, vk*}, whereas the outcomes from Equation (15) are represented as A- = {v1-, v2-, …, vk-}.
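In the standard TOPSIS formulation, the two ideal solutions are commonly written as below, with i indexing alternatives and j indexing criteria. The index letters are an assumption and may differ from those used in Equations (14) and (15).

```latex
A^{*} = \left\{ \left(\max_{i} v_{ij} \;\middle|\; j \in I\right),\;
               \left(\min_{i} v_{ij} \;\middle|\; j \in J\right) \right\},
\qquad
A^{-} = \left\{ \left(\min_{i} v_{ij} \;\middle|\; j \in I\right),\;
               \left(\max_{i} v_{ij} \;\middle|\; j \in J\right) \right\}
```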
Step 6: Separation measures are calculated, that is, the distance of each alternative from the two ideal solutions. The distance of each alternative from the positive ideal solution is calculated as in Equation (16) [66].
Similarly, the distances from the negative ideal solution are calculated as in Equation (17) [66].
Step 7: The relative proximity of each alternative to the ideal solution is computed using Equation (18) [66].
Step 8: Alternatives are ranked in accordance with their closeness to the ideal solution. The alternative with the maximum closeness value is selected [66].
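Steps 3 to 8 can be put together in a compact sketch. It assumes Euclidean separation measures and the usual closeness coefficient (distance to the negative ideal divided by the sum of both distances), and it handles cost criteria through the ideal solutions as in Step 5. The function name, scores, weights, and criterion types are hypothetical and are not the values reported in this study.

```python
import numpy as np


def topsis(decision_matrix, weights, is_benefit):
    """Return the closeness coefficient of each alternative (row)."""
    d = np.asarray(decision_matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(is_benefit, dtype=bool)

    # Step 3: vector normalization, column by column.
    r = d / np.sqrt((d ** 2).sum(axis=0))
    # Step 4: apply the criterion weights.
    v = r * w
    # Step 5: ideal solutions (max for benefit, min for cost, and the reverse).
    pis = np.where(benefit, v.max(axis=0), v.min(axis=0))
    nis = np.where(benefit, v.min(axis=0), v.max(axis=0))
    # Step 6: Euclidean distances to both ideal solutions.
    s_plus = np.sqrt(((v - pis) ** 2).sum(axis=1))
    s_minus = np.sqrt(((v - nis) ** 2).sum(axis=1))
    # Step 7: relative closeness (assumes the alternatives are not all identical).
    return s_minus / (s_plus + s_minus)


# Hypothetical example: two alternatives, three criteria (the third is a cost).
scores = [[8, 6, 4],
          [6, 8, 7]]
weights = [0.5, 0.3, 0.2]
is_benefit = [True, True, False]

closeness = topsis(scores, weights, is_benefit)
# Step 8: rank alternatives from highest to lowest closeness.
ranking = np.argsort(-closeness)
print(closeness, ranking)
```

In this made-up example the first alternative ranks higher because it leads on the most heavily weighted criterion and also has the lower cost score.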
3.3.1. TOPSIS Implementation
This section operationalizes the TOPSIS methodology, as delineated in prior sections, to evaluate and rank two leading Transportation Management System (TMS) software solutions within the logistics business in Türkiye. The analysis leverages the hierarchical criteria framework established before, comprising 5 main criteria and 16 sub-criteria (see Table 2), with their respective weights derived through the Bayesian Best–Worst Method (BBWM). The methodological integration of the BBWM and TOPSIS ensures a robust evaluation process: the BBWM-derived weights (see Table 4) reflect the probabilistic dominance relationships among criteria, grounded in expert judgments, and utilizing these weights, TOPSIS calculates the geometric proximity of each software alternative to the positive ideal solution (PIS) and negative ideal solution (NIS), as defined by [65]. This dual-phase approach harmonizes the strengths of probabilistic weighting (BBWM) and distance-based ranking (TOPSIS), enabling a systematic, transparent comparison of software alternatives against multidimensional criteria. Subsequent subsections detail the computational steps and the results of the final rankings.
TOPSIS Computation and Results
The input scores and the normalized values used in the TOPSIS analysis are presented in Table 5 and Table 6, respectively. The two software vendors (referred to as ABC and XYZ hereafter) exhibit varying levels of performance across different indicators, each possessing distinct competitive advantages in specific domains; a comprehensive evaluation therefore necessitates the application of TOPSIS.
As discussed in previous sections, TOPSIS is chosen for its ability to integrate multiple criteria, quantify geometric proximity to ideal solutions, and produce a unified ranking that accounts for both benefit maximization and cost minimization principles (Hwang & Yoon, 1981) [65]. This methodological approach ensures that trade-offs between conflicting criteria (such as functionality versus cost or vendor reliability versus technical adaptability) are systematically analyzed. Consequently, it facilitates a balanced, data-driven assessment of vendor suitability within the logistics business.
Table 5 presents the initial evaluation scores of the two software alternatives (ABC and XYZ) based on 16 sub-criteria, as assessed by expert opinions. The scores are assigned on a scale from 1 to 10, where higher values denote superior performance in benefit-type criteria, while lower values indicate better performance in cost-type criteria. This table serves as the foundational input for the TOPSIS analysis, providing the raw data necessary for normalization and subsequent multi-criteria evaluation.
The normalized decision table (Table 6) mitigates the impact of differing units or scales across criteria, ensuring consistency in evaluation. Through vector normalization, benefit-type criteria are positively scaled, while cost-type criteria are inversely normalized. This process enhances comparability across all criteria and prepares the data for the subsequent integration of criterion weights. Normalization represents a crucial transformation, establishing a dimensionless and standardized decision framework essential for robust multi-criteria analysis.
As seen in Table 7, the alternative software vendors are assessed through the evaluation stages. As a result of this analysis, ABC emerges as the top-performing software supplier, achieving the highest overall score.