Electronics
  • Article
  • Open Access

30 January 2025

Extension of Interval-Valued Hesitant Fermatean Fuzzy TOPSIS for Evaluating and Benchmarking of Generative AI Chatbots

Department of Management and Quantitative Methods in Economics, University of Plovdiv Paisii Hilendarski, 4000 Plovdiv, Bulgaria
This article belongs to the Special Issue Generative AI and Its Transformative Potential

Abstract

To aid in the selection of generative artificial intelligence (GAI) chatbots, this paper introduces a fuzzy multi-attribute decision-making framework based on their key features and performance. The proposed framework includes a new modification of the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), adapted for an interval-valued hesitant Fermatean fuzzy (IVHFF) environment. This TOPSIS extension addresses the limitations of classical TOPSIS in handling complex and uncertain data by capturing detailed membership degrees and representing hesitation more precisely. The framework is applicable to both static and dynamic evaluations of GAI chatbots with crisp or fuzzy assessments. Results from a practical example demonstrate the effectiveness of the proposed approach for comparing and ranking GAI chatbots. Finally, recommendations are provided for selecting and implementing these conversational agents in various applications.

1. Introduction

Generative Artificial Intelligence (GAI) chatbots, also known as conversational agents, are becoming an increasingly prevalent type of chatbot worldwide for several reasons. Advancements in AI and natural language processing have significantly enhanced their capabilities [1]. These intelligent chatbots leverage large language models (LLMs) to generate human-like responses in natural language, enabling more dynamic and contextually appropriate interactions compared to their traditional rule-based predecessors [2,3].
Additionally, the rise of digital communication platforms and the growing demand for instant customer service have driven businesses to adopt GAI chatbots. These systems provide a cost-effective way to deliver customer support, answer queries, and even facilitate transactions.
The COVID-19 pandemic further accelerated the adoption of remote work and virtual communication, increasing the demand for AI chatbots [4]. They have proven instrumental in managing the surge of online interactions, such as handling customer inquiries, scheduling appointments, and disseminating information.
The flexibility and scalability of GAI chatbots make them suitable for diverse industries, including construction [5], healthcare [6], finance [7], and e-commerce [8], but each industry may face unique regulatory, operational, or technological constraints. By tailoring chatbots to the specific requirements and integrating them into existing systems, organizations in many sectors can potentially enhance productivity and user experiences [9].
Market research predictions confirm a growing adoption of AI chatbots in the coming years. According to a Statista forecast [10], the global chatbot market is projected to reach approximately USD 1.25 billion by 2025—a nearly fivefold increase from USD 190.8 million in 2016. Meanwhile, Gartner estimated that over 80% of enterprises will leverage GAI APIs or applications by 2026 [11], highlighting the rapid and widespread adoption of advanced AI technologies for enhancing business efficiency, innovation, and customer experiences. However, this growth also presents challenges in areas such as data privacy, ethics, and addressing skill gaps.
Despite their growing popularity, GAI chatbots face several challenges to widespread adoption. Key obstacles include the following:
  • Lack of trust: Users may hesitate to fully trust AI-powered chatbots, especially when dealing with sensitive information or complex interactions. Building trust in the accuracy, security, and reliability of these systems is critical for their broader acceptance;
  • Limited understanding and awareness: Many users are unfamiliar with the capabilities and benefits that GAI chatbots can provide. This lack of knowledge or understanding about how they function and what they offer may hinder adoption;
  • User experience and satisfaction: Poorly designed chatbots can lead to unsatisfactory user experiences. Frustrating interactions or failure to resolve queries effectively may discourage continued use;
  • Cost and ROI: Developing and maintaining GAI chatbots can be expensive for small- and medium-sized enterprises. Organizations must carefully assess the return on investment (ROI) and weigh costs against potential benefits;
  • Ethical and bias concerns: GAI chatbots are only as reliable and fair as the data they are trained on, which can sometimes perpetuate biases or unfair practices. Ensuring chatbots are ethical, unbiased, and inclusive is important for their acceptance and broader implementation.
Overcoming these barriers will require advancements in technology, increased transparency, education, and a focus on user-centric design. To address the first three challenges, multi-criteria decision-making (MCDM) methods can be employed. These techniques enable organizations to compare a finite set of decision alternatives across various criteria, helping them select the most feasible option. MCDM methods have been successfully applied in several GAI-related fields, such as technology selection [12] and cloud system prioritization [13].
While conventional MCDM methods are reliable, they often struggle to address the complexities associated with imprecise and ambiguous evaluations. In contrast, fuzzy-based methods are specifically designed to manage such uncertainties, making them more effective in identifying the most suitable alternatives.
Various MCDM techniques have been enhanced through the integration of fuzzy sets and their advanced extensions [14]. By incorporating fuzzy assessments, these methods provide a more accurate representation of real-world conditions, thereby improving the reliability of rankings in scenarios characterized by subjectivity and evaluation uncertainties.
The key advantage of fuzzy multi-criteria algorithms lies in their ability to produce more realistic and dependable rankings, enhancing the overall decision-making process.
Key contributions of this paper include the following:
  • Analysis and categorization of existing multi-criteria approaches for AI chatbot selection, classified by the techniques used and the types of estimates employed (numeric, interval, linguistic values, as well as fuzzy numbers). These approaches are then grouped into three main categories based on complexity (number of multi-criteria techniques), flexibility (type of fuzziness), and iterativeness (single or repeated data processing);
  • Development of a theoretical framework for ranking GAI chatbots using both single and hybrid methods with crisp and fuzzy estimates. Single methods rely on one weight determination or ranking technique, while hybrid methods integrate several. The framework also incorporates complementary capabilities, including evaluations using crisp or fuzzy numbers, statistical analyses, and ranking interpretation, to enhance the decision-making process. Additionally, it introduces a newly developed 3D distance metric to enhance the effectiveness of the Fermatean fuzzy group TOPSIS method in case of hesitant interval assessments for more precise and effective multi-criteria comparisons of chatbot features;
  • Creation of static and dynamic rankings of an AI chatbot dataset via single or repeated multi-criteria decision analysis. In static rankings, experts’ opinions serve as inputs for the decision matrices, whereas dynamic rankings measure user attitudes based on behavior or survey data. Comparative analyses with other multi-criteria baselines underscore both the effectiveness and reliability of the proposed methods.
The paper begins with a literature review in Section 2, discussing the motivation behind exploring fuzzy ranking for GAI chatbots. Next, Section 3 details the proposed theoretical decision-making framework for GAI chatbot selection, emphasizing the role of interval-valued hesitant Fermatean fuzzy numbers (IVHFFNs) and a modified TOPSIS method tailored for the IVHFF environment. Practical examples and result analysis are provided in Section 4, showcasing the application of the framework. The final section concludes the research by summarizing the key findings, offering insights, and proposing directions for future studies.

3. Methodological Framework for GAI Chatbot Selection

This section outlines the theoretical foundations of interval-valued hesitant Fermatean fuzzy numbers (IVHFFNs), introduces a modified TOPSIS approach utilizing IVHFFNs, and proposes a conceptual framework for decision analysis of GAI chatbot data.
To address the challenge of GAI chatbot selection, we employed the classic Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [36], complemented by a recently developed fuzzy set modification. As a distance-based multi-criteria decision-making method, TOPSIS determines the relative closeness of each alternative to the ideal solution (best outcome) and the anti-ideal solution (worst outcome) for each criterion. The alternative with the highest coefficient of relative closeness to the ideal solution is selected as the most suitable.

3.1. Interval-Valued Hesitant Fermatean Fuzzy Numbers: Some Basic Definitions and Operations

To enhance the TOPSIS methodology, we integrate interval-valued hesitant Fermatean fuzzy sets (IVHFFSs) [14]. This subsection provides an overview of the key concepts and arithmetic operations associated with IVHFFNs, which are essential for implementing this modification.
IVHFFSs extend earlier models such as interval-valued fuzzy sets (IVFSs) (1975) [37], hesitant fuzzy sets (HFSs) (2010) [38] and Fermatean fuzzy sets (FFSs) (2020) [39]. Represented in a three-dimensional space, IVHFFSs use interval values within the range [0, 1] to describe the belongingness degree (BD), non-belongingness degree (NBD), and indeterminacy degree. A notable feature of IVHFFSs is the use of several interval values for BD and NBD, with the constraint that the sum of the cubes of the largest upper bounds of the BD and NBD intervals must not exceed 1. Compared to FFSs, IVHFFSs provide a richer representation of uncertainty.
When crisp BD and NBD values are challenging to obtain—due to imprecise or uncertain data—IVHFFNs, with their interval-valued flexibility and the ability to accommodate multiple intervals, offer a practical solution for decision makers and researchers. This flexibility ensures more accurate assessments of alternatives in situations where precise evaluations are unattainable.
In this section, some basic concepts of IVHFFSs are described.
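Definition 1 ([14]). The IVHFFS $T$ in a universe $U$ is defined by the following:
$$T = \left\{ \left\langle u_i,\ \alpha_T(u_i),\ \beta_T(u_i) \right\rangle \mid u_i \in U \right\}, \tag{1}$$
where
$$\alpha_T(u_i) = \bigcup_{[\mu_T^l(u_i),\, \mu_T^u(u_i)] \in \alpha_T(u_i)} \left\{ [\mu_T^l(u_i),\, \mu_T^u(u_i)] \right\} \quad \text{and} \quad \beta_T(u_i) = \bigcup_{[\nu_T^l(u_i),\, \nu_T^u(u_i)] \in \beta_T(u_i)} \left\{ [\nu_T^l(u_i),\, \nu_T^u(u_i)] \right\}$$
represent two sets of interval values in [0, 1], signifying the possible BD and NBD of an object $u_i \in U$ to $T$, with the following constraints:
$$0 \le \mu_T^l(u_i) \le \mu_T^u(u_i) \le 1, \quad 0 \le \nu_T^l(u_i) \le \nu_T^u(u_i) \le 1 \quad \text{and}$$
$$0 \le \left( \left(\mu_T^u(u_i)\right)^{+} \right)^3 + \left( \left(\nu_T^u(u_i)\right)^{+} \right)^3 \le 1,$$
such that
$$[\mu_T^l(u_i),\, \mu_T^u(u_i)] \in \alpha_T(u_i), \quad [\nu_T^l(u_i),\, \nu_T^u(u_i)] \in \beta_T(u_i),$$
$$\left(\mu_T^u(u_i)\right)^{+} \in \alpha_T^{+}(u_i) = \bigcup_{[\mu_T^l(u_i),\, \mu_T^u(u_i)] \in \alpha_T(u_i)} \max \left\{ \mu_T^u(u_i) \right\},$$
$$\left(\nu_T^u(u_i)\right)^{+} \in \beta_T^{+}(u_i) = \bigcup_{[\nu_T^l(u_i),\, \nu_T^u(u_i)] \in \beta_T(u_i)} \max \left\{ \nu_T^u(u_i) \right\} \quad \text{for all } u_i \in U.$$
The pair $(\alpha_T(u_i), \beta_T(u_i))$ is called an interval-valued hesitant Fermatean fuzzy number (IVHFFN), denoted by $\xi = (\alpha, \beta)$.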
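The constraints of Definition 1 can be checked mechanically. The following Python sketch is illustrative only: it assumes an IVHFFN is stored as two lists of (lower, upper) tuples, and the name `is_valid_ivhffn` is hypothetical rather than taken from [14].

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def is_valid_ivhffn(alpha: List[Interval], beta: List[Interval],
                    tol: float = 1e-12) -> bool:
    """Check the Definition 1 constraints for one IVHFFN (alpha, beta).

    alpha holds the BD intervals, beta the NBD intervals (assumed layout).
    """
    if not alpha or not beta:
        return False
    # Every interval must satisfy 0 <= lower <= upper <= 1.
    for lo, hi in list(alpha) + list(beta):
        if not (0.0 <= lo <= hi <= 1.0):
            return False
    # The largest upper bounds play the role of mu^+ and nu^+.
    mu_plus = max(hi for _, hi in alpha)
    nu_plus = max(hi for _, hi in beta)
    # Fermatean constraint: the cubes of mu^+ and nu^+ sum to at most 1.
    return mu_plus ** 3 + nu_plus ** 3 <= 1.0 + tol
```

For instance, BD intervals [0.6, 0.7] and [0.8, 0.9] with the NBD interval [0.5, 0.6] pass the check, because 0.9³ + 0.6³ = 0.945 does not exceed 1, even though the corresponding sum of squares would.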
Definition 2 ([14]). Suppose that $\xi = (\alpha, \beta)$ is an IVHFFN. Then, the score function $s$ for $\xi$ can be defined as follows:
$$s(\xi) = \frac{1}{2} \left( \frac{1}{\#a} \sum_{[\mu_T^l(u_i),\, \mu_T^u(u_i)] \in \alpha_T(u_i)} \left(\mu_T^l(u_i)\right)^3 - \frac{1}{\#b} \sum_{[\nu_T^l(u_i),\, \nu_T^u(u_i)] \in \beta_T(u_i)} \left(\nu_T^l(u_i)\right)^3 + \frac{1}{\#a} \sum_{[\mu_T^l(u_i),\, \mu_T^u(u_i)] \in \alpha_T(u_i)} \left(\mu_T^u(u_i)\right)^3 - \frac{1}{\#b} \sum_{[\nu_T^l(u_i),\, \nu_T^u(u_i)] \in \beta_T(u_i)} \left(\nu_T^u(u_i)\right)^3 \right), \tag{2}$$
where $\#a$ and $\#b$ represent the number of interval values in $\alpha$ and $\beta$, respectively.
The larger the score value $s(\xi)$, the greater the IVHFFN $\xi$.
Since $s(\xi) \in [-1, 1]$, an improved score function for an IVHFFN $\xi$ is described in the following definition:
Definition 3 ([14]). Assume $\xi = (\alpha, \beta)$ is an IVHFFN. Then, an improved score function is defined by the following:
$$s^{*}(\xi) = \frac{1}{2} \left( s(\xi) + 1 \right), \tag{3}$$
such that $s^{*}(\xi) \in [0, 1]$.
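As a brief illustration of Definitions 2 and 3, the sketch below computes both score functions. It assumes an IVHFFN is represented as two lists of (lower, upper) tuples; the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def _mean_cube(intervals: List[Interval], idx: int) -> float:
    """Average of the cubed lower (idx=0) or upper (idx=1) bounds."""
    return sum(iv[idx] ** 3 for iv in intervals) / len(intervals)


def score(alpha: List[Interval], beta: List[Interval]) -> float:
    """Score function s of Definition 2; the result lies in [-1, 1]."""
    lower = _mean_cube(alpha, 0) - _mean_cube(beta, 0)
    upper = _mean_cube(alpha, 1) - _mean_cube(beta, 1)
    return 0.5 * (lower + upper)


def improved_score(alpha: List[Interval], beta: List[Interval]) -> float:
    """Improved score s* of Definition 3, rescaled to [0, 1]."""
    return 0.5 * (score(alpha, beta) + 1.0)
```

For the BD intervals [0.6, 0.7] and [0.7, 0.8] with the single NBD interval [0.2, 0.3], the averaged cubes are 0.2795 and 0.4275 for BD and 0.008 and 0.027 for NBD, giving s = 0.336 and s* = 0.668.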
If the BD and NBD of an IVHFFN contain different numbers of intervals, a preprocessing step should be added. Here, we assume that the shorter set is extended with the mean value of the BD or the NBD of the given object.
The arithmetic operations on IVHFFNs are given by the next definition.
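Definition 4 ([14]). Let $\xi_1 = (\alpha_1, \beta_1)$ and $\xi_2 = (\alpha_2, \beta_2)$ be two IVHFFNs. Then, we have the following:
$$\xi_1 \oplus \xi_2 = \bigcup_{\substack{[\mu_{\xi_1}^l, \mu_{\xi_1}^u] \in \alpha_1,\ [\nu_{\xi_1}^l, \nu_{\xi_1}^u] \in \beta_1 \\ [\mu_{\xi_2}^l, \mu_{\xi_2}^u] \in \alpha_2,\ [\nu_{\xi_2}^l, \nu_{\xi_2}^u] \in \beta_2}} \left\{ \left[ \sqrt[3]{(\mu_{\xi_1}^l)^3 + (\mu_{\xi_2}^l)^3 - (\mu_{\xi_1}^l)^3 (\mu_{\xi_2}^l)^3},\ \sqrt[3]{(\mu_{\xi_1}^u)^3 + (\mu_{\xi_2}^u)^3 - (\mu_{\xi_1}^u)^3 (\mu_{\xi_2}^u)^3} \right],\ \left[ \nu_{\xi_1}^l \nu_{\xi_2}^l,\ \nu_{\xi_1}^u \nu_{\xi_2}^u \right] \right\} \tag{4}$$
$$\xi_1 \otimes \xi_2 = \bigcup_{\substack{[\mu_{\xi_1}^l, \mu_{\xi_1}^u] \in \alpha_1,\ [\nu_{\xi_1}^l, \nu_{\xi_1}^u] \in \beta_1 \\ [\mu_{\xi_2}^l, \mu_{\xi_2}^u] \in \alpha_2,\ [\nu_{\xi_2}^l, \nu_{\xi_2}^u] \in \beta_2}} \left\{ \left[ \mu_{\xi_1}^l \mu_{\xi_2}^l,\ \mu_{\xi_1}^u \mu_{\xi_2}^u \right],\ \left[ \sqrt[3]{(\nu_{\xi_1}^l)^3 + (\nu_{\xi_2}^l)^3 - (\nu_{\xi_1}^l)^3 (\nu_{\xi_2}^l)^3},\ \sqrt[3]{(\nu_{\xi_1}^u)^3 + (\nu_{\xi_2}^u)^3 - (\nu_{\xi_1}^u)^3 (\nu_{\xi_2}^u)^3} \right] \right\} \tag{5}$$
$$\lambda \xi = \bigcup_{[\mu_{\xi}^l, \mu_{\xi}^u] \in \alpha,\ [\nu_{\xi}^l, \nu_{\xi}^u] \in \beta} \left\{ \left[ \sqrt[3]{1 - \left(1 - (\mu_{\xi}^l)^3\right)^{\lambda}},\ \sqrt[3]{1 - \left(1 - (\mu_{\xi}^u)^3\right)^{\lambda}} \right],\ \left[ (\nu_{\xi}^l)^{\lambda},\ (\nu_{\xi}^u)^{\lambda} \right] \right\}, \tag{6}$$
where $\lambda > 0$, $\lambda \in \mathbb{R}$.
$$\xi^{\lambda} = \bigcup_{[\mu_{\xi}^l, \mu_{\xi}^u] \in \alpha,\ [\nu_{\xi}^l, \nu_{\xi}^u] \in \beta} \left\{ \left[ (\mu_{\xi}^l)^{\lambda},\ (\mu_{\xi}^u)^{\lambda} \right],\ \left[ \sqrt[3]{1 - \left(1 - (\nu_{\xi}^l)^3\right)^{\lambda}},\ \sqrt[3]{1 - \left(1 - (\nu_{\xi}^u)^3\right)^{\lambda}} \right] \right\}, \tag{7}$$
where $\lambda > 0$, $\lambda \in \mathbb{R}$.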
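The operations of Definition 4 can be sketched in Python as follows. This is a minimal illustration, not the implementation from [14]: it assumes an IVHFFN is a pair of lists of (lower, upper) tuples, reads the union as ranging over every pairing of BD intervals and every pairing of NBD intervals, and uses hypothetical function names.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]
IVHFFN = Tuple[List[Interval], List[Interval]]


def _cbrt(x: float) -> float:
    """Real cube root of a non-negative value (clamped against rounding)."""
    return max(x, 0.0) ** (1.0 / 3.0)


def _cubic_sum(a: float, b: float) -> float:
    """Cube root of a^3 + b^3 - a^3 * b^3."""
    return _cbrt(a ** 3 + b ** 3 - a ** 3 * b ** 3)


def add(x1: IVHFFN, x2: IVHFFN) -> IVHFFN:
    """Addition of two IVHFFNs over all interval pairings (assumed reading)."""
    (a1, b1), (a2, b2) = x1, x2
    alpha = [(_cubic_sum(l1, l2), _cubic_sum(u1, u2))
             for (l1, u1), (l2, u2) in product(a1, a2)]
    beta = [(l1 * l2, u1 * u2) for (l1, u1), (l2, u2) in product(b1, b2)]
    return alpha, beta


def scale(lam: float, x: IVHFFN) -> IVHFFN:
    """Scalar multiplication lam * x for lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    a, b = x
    alpha = [(_cbrt(1 - (1 - l ** 3) ** lam), _cbrt(1 - (1 - u ** 3) ** lam))
             for l, u in a]
    beta = [(l ** lam, u ** lam) for l, u in b]
    return alpha, beta
```

A quick sanity check is that adding a single-interval IVHFFN to itself gives the same result as scaling it by 2, since 1 − (1 − μ³)² = 2μ³ − μ⁶.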
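Definition 5 (based on [14]). Let $\xi_1 = (\alpha_1, \beta_1)$ and $\xi_2 = (\alpha_2, \beta_2)$ be two IVHFFNs. Then, the distance between $\xi_1$ and $\xi_2$ is defined as follows:
$$d(\xi_1, \xi_2) = \left( \frac{1}{4} \left( \left| (\varphi_1^{\mu^l})^3 - (\varphi_2^{\mu^l})^3 \right|^{\lambda} + \left| (\varphi_1^{\mu^u})^3 - (\varphi_2^{\mu^u})^3 \right|^{\lambda} + \left| (\varphi_1^{\nu^l})^3 - (\varphi_2^{\nu^l})^3 \right|^{\lambda} + \left| (\varphi_1^{\nu^u})^3 - (\varphi_2^{\nu^u})^3 \right|^{\lambda} + \left| (\pi_1^l)^3 - (\pi_2^l)^3 \right|^{\lambda} + \left| (\pi_1^u)^3 - (\pi_2^u)^3 \right|^{\lambda} \right) \right)^{1/\lambda}, \tag{8}$$
where
$$\varphi_s^{\mu^l} = \frac{1}{\#a_s} \sum_{i=1}^{\#a_s} (\mu_i^l)^3, \quad \varphi_s^{\mu^u} = \frac{1}{\#a_s} \sum_{i=1}^{\#a_s} (\mu_i^u)^3, \quad \varphi_s^{\nu^l} = \frac{1}{\#b_s} \sum_{i=1}^{\#b_s} (\nu_i^l)^3, \quad \varphi_s^{\nu^u} = \frac{1}{\#b_s} \sum_{i=1}^{\#b_s} (\nu_i^u)^3,$$
$\#a_s$ and $\#b_s$ denote the number of BD and NBD intervals in $\xi_1$ and $\xi_2$, respectively; $s = 1, 2$; $\lambda > 0$; and the following:
$$\pi_1^l = \sqrt[3]{1 - \left( \frac{1}{\#a_1} \sum_{[\mu_1^l,\, \mu_1^u] \in \alpha_1} (\mu_1^u)^3 + \frac{1}{\#b_1} \sum_{[\nu_1^l,\, \nu_1^u] \in \beta_1} (\nu_1^u)^3 \right)}, \quad \pi_1^u = \sqrt[3]{1 - \left( \frac{1}{\#a_1} \sum_{[\mu_1^l,\, \mu_1^u] \in \alpha_1} (\mu_1^l)^3 + \frac{1}{\#b_1} \sum_{[\nu_1^l,\, \nu_1^u] \in \beta_1} (\nu_1^l)^3 \right)},$$
$$\pi_2^l = \sqrt[3]{1 - \left( \frac{1}{\#a_2} \sum_{[\mu_2^l,\, \mu_2^u] \in \alpha_2} (\mu_2^u)^3 + \frac{1}{\#b_2} \sum_{[\nu_2^l,\, \nu_2^u] \in \beta_2} (\nu_2^u)^3 \right)}, \quad \pi_2^u = \sqrt[3]{1 - \left( \frac{1}{\#a_2} \sum_{[\mu_2^l,\, \mu_2^u] \in \alpha_2} (\mu_2^l)^3 + \frac{1}{\#b_2} \sum_{[\nu_2^l,\, \nu_2^u] \in \beta_2} (\nu_2^l)^3 \right)}.$$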
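The distance of Definition 5 can be sketched in Python as shown below. The sketch rests on two assumptions that should be checked against [14]: each φ term is read as the mean of the cubed bounds and is then cubed inside the distance, and the π³ terms are taken as one minus the corresponding sums of mean cubes. The representation (two lists of (lower, upper) tuples) and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
IVHFFN = Tuple[List[Interval], List[Interval]]


def _mean_cube(intervals: List[Interval], idx: int) -> float:
    """Average of the cubed lower (idx=0) or upper (idx=1) bounds."""
    return sum(iv[idx] ** 3 for iv in intervals) / len(intervals)


def _terms(x: IVHFFN) -> List[float]:
    """The six quantities compared by the distance (assumed reading)."""
    alpha, beta = x
    phi_mu_l, phi_mu_u = _mean_cube(alpha, 0), _mean_cube(alpha, 1)
    phi_nu_l, phi_nu_u = _mean_cube(beta, 0), _mean_cube(beta, 1)
    # Cubing pi removes its cube root, leaving 1 minus the mean cubes.
    pi_l_cubed = 1.0 - (phi_mu_u + phi_nu_u)
    pi_u_cubed = 1.0 - (phi_mu_l + phi_nu_l)
    return [phi_mu_l ** 3, phi_mu_u ** 3, phi_nu_l ** 3, phi_nu_u ** 3,
            pi_l_cubed, pi_u_cubed]


def distance(x1: IVHFFN, x2: IVHFFN, lam: float = 2.0) -> float:
    """Minkowski-type distance between two IVHFFNs for lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = sum(abs(p - q) ** lam for p, q in zip(_terms(x1), _terms(x2)))
    return (0.25 * total) ** (1.0 / lam)
```

Under this reading, the distance between the extreme values ({[1, 1]}, {[0, 0]}) and ({[0, 0]}, {[1, 1]}) equals 1 for any λ, and the distance from any IVHFFN to itself is 0.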
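Definition 6 ([14]). Let $\xi_i = \left( \{ [\mu_i^l, \mu_i^u] \}, \{ [\nu_i^l, \nu_i^u] \} \right)$ $(i = 1, 2, \ldots, m)$ be a collection of IVHFFNs and $w = (w_1, w_2, \ldots, w_m)^T$ such that $w_i \ge 0$, $\sum_{i=1}^{m} w_i = 1$; then, an interval-valued hesitant Fermatean fuzzy weighted average (IVHFFWA) operator is a mapping $\mathrm{IVHFFWA}: T^m \to T$:
$$\mathrm{IVHFFWA}(\xi_1, \xi_2, \ldots, \xi_m) = \bigoplus_{i=1}^{m} w_i \xi_i = \bigcup_{[\mu_i^l, \mu_i^u] \in \alpha_i,\ [\nu_i^l, \nu_i^u] \in \beta_i} \left\{ \left[ \sqrt[3]{1 - \prod_{i=1}^{m} \left(1 - (\mu_i^l)^3\right)^{w_i}},\ \sqrt[3]{1 - \prod_{i=1}^{m} \left(1 - (\mu_i^u)^3\right)^{w_i}} \right],\ \left[ \prod_{i=1}^{m} (\nu_i^l)^{w_i},\ \prod_{i=1}^{m} (\nu_i^u)^{w_i} \right] \right\}. \tag{9}$$
Specifically, if $w = (1/m, 1/m, \ldots, 1/m)^T$, then the IVHFFWA operator is converted into the following formula:
$$\mathrm{IVHFFWA}(\xi_1, \xi_2, \ldots, \xi_m) = \frac{1}{m} \bigoplus_{i=1}^{m} \xi_i = \bigcup_{[\mu_i^l, \mu_i^u] \in \alpha_i,\ [\nu_i^l, \nu_i^u] \in \beta_i} \left\{ \left[ \sqrt[3]{1 - \prod_{i=1}^{m} \left(1 - (\mu_i^l)^3\right)^{1/m}},\ \sqrt[3]{1 - \prod_{i=1}^{m} \left(1 - (\mu_i^u)^3\right)^{1/m}} \right],\ \left[ \prod_{i=1}^{m} (\nu_i^l)^{1/m},\ \prod_{i=1}^{m} (\nu_i^u)^{1/m} \right] \right\}.$$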
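The aggregation in Definition 6 can be illustrated with the following Python sketch. It is written under stated assumptions: the NBD bounds are aggregated as the weighted product of the bounds, which is what composing the scalar multiplication and addition of Definition 4 yields; the union is read as ranging over every choice of one BD interval (and one NBD interval) per input; and the name `ivhffwa` is hypothetical.

```python
from itertools import product
from math import prod
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
IVHFFN = Tuple[List[Interval], List[Interval]]


def ivhffwa(numbers: Sequence[IVHFFN],
            weights: Optional[Sequence[float]] = None) -> IVHFFN:
    """Weighted average of IVHFFNs; equal weights 1/m if none are given."""
    m = len(numbers)
    if m == 0:
        raise ValueError("at least one IVHFFN is required")
    if weights is None:
        weights = [1.0 / m] * m
    if len(weights) != m or min(weights) < 0 or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def cbrt(x: float) -> float:
        return max(x, 0.0) ** (1.0 / 3.0)

    alpha = []
    # One BD interval is picked from every input for each combination.
    for combo in product(*(a for a, _ in numbers)):
        lo = 1 - prod((1 - l ** 3) ** w for (l, _), w in zip(combo, weights))
        hi = 1 - prod((1 - u ** 3) ** w for (_, u), w in zip(combo, weights))
        alpha.append((cbrt(lo), cbrt(hi)))

    beta = []
    # NBD bounds are combined as a weighted geometric product.
    for combo in product(*(b for _, b in numbers)):
        lo = prod(l ** w for (l, _), w in zip(combo, weights))
        hi = prod(u ** w for (_, u), w in zip(combo, weights))
        beta.append((lo, hi))
    return alpha, beta
```

With equal weights, averaging two copies of the same single-interval IVHFFN returns that IVHFFN (up to rounding), which is a convenient check when the operator is used to merge expert opinions.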
In summary, the space of interval-valued hesitant Fermatean fuzzy numbers (IVHFFNs) is broader than that of interval-valued Fermatean fuzzy numbers (IVFFNs). With a less restrictive constraint, IVHFFSs provide greater precision in addressing complex and uncertain MCDM problems compared to IVFFSs.

3.2. TOPSIS in IVHFFNs Environment

TOPSIS evaluates alternatives by measuring their closeness to an ideal solution and their distance from a negative-ideal solution. To adapt this method for IVHFFNs, we propose calculating the distances between alternatives using Equation (8). The pseudocode for the modified TOPSIS approach within the IVHFFN framework is presented in Algorithm 1.
Let $A_i$, $i = 1, 2, \ldots, N$ represent the given set of alternatives, $C_j$, $j = 1, 2, \ldots, M$ denote the set of criteria identified for evaluating the alternatives, and $w_j$ be the relative weights of the criteria.
Algorithm 1. IVHFFNs TOPSIS.
Step 1. Gather the linguistic evaluations provided by expert $k$ in the decision matrix
$$X^{k}_{i,j}\left(A_i, C_j\right), \quad k = 1, 2, \ldots, K,$$
where $K$ is the number of experts. Convert the entries of the $X^{k}$ matrices into IVHFFNs.
Step 2. Compute the aggregated matrix $\tilde{X}$ for all experts according to Equation (9). Assume equal weighting for all experts ($1/K$) and apply the averaging formula provided:
$$\tilde{X}_{i,j} = \mathrm{IVHFFWA}\left( \tilde{X}^{1}_{i,j}, \tilde{X}^{2}_{i,j}, \ldots, \tilde{X}^{K}_{i,j} \right).$$
Step 3. Identify the minimizing criteria, referred to as the cost criteria and denoted by $\mathbb{C}$, while the remaining criteria are categorized as benefit criteria and denoted by $\mathbb{B}$.
Step 4. Determine the normalized values of the decision matrix $\tilde{X}$ using its score function as described in Equation (3):
$$\tilde{r}_{i,j} = \frac{\tilde{x}_{i,j}}{\left\lVert \tilde{x}_{i,j} \right\rVert_{2}}.$$
Step 5. Derive the weighted values of assessments for each criterion:
$$\tilde{a}_{i,j} = w_j \tilde{r}_{i,j}$$
according to Equation (6).
Step 6. Establish the ideal $\tilde{A}^{*}$ and negative ideal $\tilde{A}^{-}$ solutions for each criterion:
$$\tilde{A}^{*} = \left\{ \tilde{a}_1^{*}, \tilde{a}_2^{*}, \ldots, \tilde{a}_M^{*} \right\} = \left\{ \max_{i} \tilde{a}_{i,j} \mid j \in \mathbb{B};\ \min_{i} \tilde{a}_{i,j} \mid j \in \mathbb{C} \right\},$$
$$\tilde{A}^{-} = \left\{ \tilde{a}_1^{-}, \tilde{a}_2^{-}, \ldots, \tilde{a}_M^{-} \right\} = \left\{ \min_{i} \tilde{a}_{i,j} \mid j \in \mathbb{B};\ \max_{i} \tilde{a}_{i,j} \mid j \in \mathbb{C} \right\},$$
for benefit ($\mathbb{B}$) and cost ($\mathbb{C}$) criteria.
Step 7. Measure the distances from each alternative to the ideal and negative ideal solutions using Equation (8):
$$D_i^{*} = \sum_{j=1}^{M} D_G\left( \tilde{a}_{i,j},\ \tilde{a}_j^{*} \right),$$
$$D_i^{-} = \sum_{j=1}^{M} D_G\left( \tilde{a}_{i,j},\ \tilde{a}_j^{-} \right).$$
Step 8. Calculate the coefficients of relative closeness of each alternative to the ideal solution:
$$RC_i = \frac{D_i^{-}}{D_i^{-} + D_i^{*}}.$$
Order the alternatives in descending order of their coefficients of relative closeness $RC_i$ and select the alternative with the highest coefficient as the optimal choice.
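Steps 5 to 8 of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: it starts from an already aggregated matrix of IVHFFNs (Steps 1 and 2 are skipped), omits the normalization of Step 4, picks the ideal and negative ideal values per criterion by the improved score, and reads the distance terms as means of cubed bounds that are cubed again. All function names and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Interval = Tuple[float, float]
IVHFFN = Tuple[List[Interval], List[Interval]]


def mean_cube(ivs: List[Interval], idx: int) -> float:
    return sum(iv[idx] ** 3 for iv in ivs) / len(ivs)


def scale(lam: float, x: IVHFFN) -> IVHFFN:
    """Scalar multiplication, used for criterion weighting (Step 5)."""
    a, b = x
    root = lambda v: max(v, 0.0) ** (1.0 / 3.0)
    alpha = [(root(1 - (1 - l ** 3) ** lam), root(1 - (1 - u ** 3) ** lam))
             for l, u in a]
    return alpha, [(l ** lam, u ** lam) for l, u in b]


def improved_score(x: IVHFFN) -> float:
    a, b = x
    s = 0.5 * (mean_cube(a, 0) - mean_cube(b, 0)
               + mean_cube(a, 1) - mean_cube(b, 1))
    return 0.5 * (s + 1.0)


def distance(x1: IVHFFN, x2: IVHFFN, lam: float = 2.0) -> float:
    def terms(x: IVHFFN) -> List[float]:
        a, b = x
        phi = [mean_cube(a, 0), mean_cube(a, 1),
               mean_cube(b, 0), mean_cube(b, 1)]
        return [p ** 3 for p in phi] + [1 - (phi[1] + phi[3]),
                                        1 - (phi[0] + phi[2])]
    total = sum(abs(p - q) ** lam for p, q in zip(terms(x1), terms(x2)))
    return (0.25 * total) ** (1.0 / lam)


def ivhff_topsis(matrix: Sequence[Sequence[IVHFFN]],
                 weights: Sequence[float],
                 cost: FrozenSet[int] = frozenset(),
                 lam: float = 2.0) -> List[float]:
    """Return the relative closeness RC_i of every alternative (row)."""
    m = len(weights)
    weighted = [[scale(weights[j], row[j]) for j in range(m)]
                for row in matrix]
    ideal, anti = [], []
    for j in range(m):
        # Step 6: order the column by improved score (assumed comparison).
        col = sorted((row[j] for row in weighted), key=improved_score)
        worst, best = col[0], col[-1]
        if j in cost:  # for cost criteria the lowest score is the ideal
            best, worst = worst, best
        ideal.append(best)
        anti.append(worst)
    closeness = []
    for row in weighted:
        d_star = sum(distance(row[j], ideal[j], lam) for j in range(m))
        d_minus = sum(distance(row[j], anti[j], lam) for j in range(m))
        total = d_star + d_minus
        closeness.append(d_minus / total if total else 0.5)  # Step 8
    return closeness


# Hypothetical data: three alternatives, criterion 0 benefit, criterion 1 cost.
high = ([(0.8, 0.9)], [(0.1, 0.2)])
mid = ([(0.4, 0.5), (0.5, 0.6)], [(0.4, 0.5)])
low = ([(0.2, 0.3)], [(0.7, 0.8)])
rc = ivhff_topsis([[high, low], [mid, mid], [low, high]],
                  weights=[0.5, 0.5], cost=frozenset({1}))
```

In this toy matrix the first alternative coincides with the ideal solution on both criteria and the third with the negative ideal, so their closeness coefficients are 1 and 0, with the second alternative in between.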
The proposed modification of TOPSIS integrates a new flexible IVHFFNs distance metric from Equation (8). Unlike standard Fermatean fuzzy numbers, which operate within a three-dimensional (3D) space, or IVFFNs, which utilize three 3D intervals to define the membership, non-membership, and hesitancy degrees, IVHFFNs introduce an even more complex structure. Specifically, they allow for different numbers of intervals to define the belongingness and the non-belongingness degrees. The increased flexibility in representing uncertainty results in a more accurate evaluation of alternatives.
However, the proposed new TOPSIS extension in IVHFF environment has a higher time complexity compared to its counterparts using crisp, classical, or other fuzzy models, including IVFFNs. This increased complexity arises from more intricate arithmetic operations, computationally intensive distance metric calculations, and the evaluation of multiple interval-based values in the score function.
Nevertheless, the tradeoff between more accurate representation and increased time complexity is justified, as these advanced 3D fuzzy numbers enable a more precise depiction of uncertainty in alternative evaluations.

3.3. Theoretical Framework for GAI Chatbot Selection

Selecting an appropriate generative AI (GAI) chatbot involves a structured, multi-stage decision-making process to ensure alignment with organizational needs and user expectations. The new framework for unified decision analysis of GAI chatbot data consists of eight stages (Figure 1).
Figure 1. Flowchart of the proposed framework for decision analysis of GAI chatbots.
  • Stage 1: Needs Assessment
The decision-making process begins with clearly identifying the specific requirements and expectations for a GAI chatbot. This involves collecting data on available chatbots and understanding the current state of chatbot technology. Relevant information can be gathered from industry reports, user reviews, and technical specifications. The goal is to determine which chatbots are available, their capabilities, and how well they align with the organization’s needs. If the assessment confirms a need for a GAI chatbot, the process advances to the next stage.
  • Stage 2: User Requirements Specification
In this stage, surveys or interviews are conducted to collect feedback from potential users about their expectations and preferences. This input helps define the desired features and functionalities of the chatbot, such as natural language understanding, integration capabilities, and user interface design.
  • Stage 3: Development of Evaluation Criteria
A multi-criteria evaluation system is created to facilitate a systematic comparison of chatbots. This system is based on user requirements and the organizational importance of specific chatbot features. Key criteria may include technological specifications, ease of integration, user friendliness, scalability, and cost.
  • Stage 4: Selection of Data Types
The choice of data types and decision-making methods depends on the resources available and the data collected in Stage 3. If resources are limited, decision makers may select traditional data types and algorithms with lower computational complexity. For more precise results, advanced data types and MCDM methods can be employed, though they may require greater resources. Data collection methods may include expert evaluations, user testing, and market analysis.
  • Stage 5: Data Preprocessing and Storage
The collected data are processed and stored appropriately for further analysis. This step includes coding qualitative assessments into numerical forms, identifying and resolving duplicates or errors, addressing missing values, and ensuring overall data integrity. Once processed, the data are stored in a database or dataset for subsequent stages.
  • Stage 6: Determination of Criteria Weights
Based on the evaluation criteria and collected data, weight coefficients are assigned to each criterion to reflect their relative importance. These weights can either be predetermined or calculated using methods such as AHP or other weighting techniques.
  • Stage 7: Multi-Criteria Analysis
In this stage, the MCDM algorithm is applied to rank chatbot alternatives according to the weighted criteria. Using multiple MCDM methods or hybrid combinations can yield a more robust and comprehensive analysis.
  • Stage 8: Results Analysis and Interpretation
Decision makers analyze the rankings to identify the top chatbot alternatives. If the highest-ranked option satisfies organizational requirements, it is selected. If not, additional data may be collected and the process iterated from Stage 4. The final selection should align with long-term organizational goals and user expectations.
This structured approach ensures a comprehensive and objective selection process for GAI chatbots, customized to meet specific organizational needs.

4. A Case Study of Quality-Based Evaluation of GAI Chatbots

Let S be an organization faced with a GAI chatbot selection problem. The benefits of implementation of a GAI chatbot in the workflow of Organization S are numerous. The problem is how to find the best GAI chatbot for the organizational specifics.
The execution of Stage 1 of the proposed framework shows that there are several available GAI chatbots, and the process of chatbot selection can start. In this illustrative example, we utilize our own chatbot dataset, collected from benchmarking websites such as [23]. The dataset consists of four assessment criteria, namely $C_1, C_2, \ldots, C_4$ (Section 2.2), and five GAI chatbots, namely $A_1, A_2, \ldots, A_5$ (Section 2.3). The criteria are related to the following aspects of GAI chatbot features: $C_1$ (conversational ability), $C_2$ (user experience), $C_3$ (integration capability), and $C_4$ (price). The GAI chatbots are as follows: $A_1$ (ChatGPT), $A_2$ (Copilot), $A_3$ (Gemini), $A_4$ (Claude), and $A_5$ (Perplexity).
In Stage 2, experts from Organization S complete a questionnaire about their requirements for GAI chatbots. Respondents rate the importance of the chatbot features on a five-point Likert scale ranging from “unimportant” (1) to “extremely important” (5).
In Stage 3, a multi-attribute criteria index is developed, consisting of the variables $C_i$, $i = \overline{1,4}$.
In Stage 4, decision makers select IVHFFNs as the data type and adopt the proposed IVHFFNs TOPSIS modification.
In Stage 5, the values of the decision matrix are expressed as linguistic variables on a five-point scale (Table 3). Each linguistic variable is then transformed into its corresponding IVHFFN using the conversion rules provided in Table 4.
Table 3. Input decision matrix for GAI chatbots selection.
Table 4. Linguistic variables and their corresponding IVHFFNs.
The weight coefficients for the criteria are equal, such that $w_1 = w_2 = w_3 = w_4 = 0.25$.
The overall scores and rankings of the given GAI chatbots obtained by using the IVHFFNs and crisp TOPSIS methods are displayed in Table 5.
Table 5. Scores and their corresponding rankings: TOPSIS method in IVHFFNs.
The problem was also solved using several other MCDM methods (Table 6)—weighted sum method (WSM), triangular fuzzy numbers’ (TFNs) WSM, evaluation based on distance from average solution (EDAS), and TOPSIS. In order to show that the IVHFF TOPSIS solution is feasible, we compare the obtained ranking with those obtained with crisp and triangular fuzzy estimates.
Table 6. Overall scores and their corresponding ranking.
The final rankings are as follows:
WSM (Benchmarking method): A1     A2     A3     A4     A5.
TFNs WSM: A1     A3     A2     A4     A5, ρ = 95 % .
EDAS: A1     A3     A2     A5     A4, ρ = 85 % .
TOPSIS: A1     A2     A3     A4     A5, ρ = 95 % .
IVHFFNs TOPSIS: A1     A2     A3     A4     A5, ρ = 90 % .
Spearman’s rank correlation coefficient was utilized to assess the agreement between the benchmark ranking (WSM) and the rankings produced by the other four MCDM methods. The analysis demonstrated high agreement for the alternative methods, with TFNs WSM and TOPSIS both achieving a Spearman’s ρ of 95% and EDAS reaching a ρ of 85%. The proposed IVHFFNs TOPSIS attained ρ = 90%, which indicates that it aligns closely with the benchmark and the alternative methods and produces dependable and consistent ranking outcomes.
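For readers who wish to run a similar agreement check, the sketch below computes Spearman’s ρ from two rank vectors using the classical formula for rankings without ties. The function name and the rank vectors are hypothetical, and the sketch does not attempt to reproduce the coefficients reported above, since the handling of ties is not specified in this section.

```python
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    # Sum of squared rank differences.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical ranks of five alternatives under two methods.
benchmark = [1, 2, 3, 4, 5]
other = [1, 2, 3, 5, 4]
rho = spearman_rho(benchmark, other)
```

When ties are present, the coefficient is usually obtained instead as the Pearson correlation of the (average) ranks.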
Analysis of the obtained rankings categorizes the GAI chatbots into two primary groups.
Group 1 (leading GAI chatbots) includes ChatGPT (A1), Copilot (A2), and Gemini (A3). ChatGPT consistently secures the top position across all methods, highlighting its superior conversational ability (C1) and robust user experience (C2). Copilot and Gemini follow closely, demonstrating strong performance in integration capability (C3) and competitive price (C4). While Gemini maintains a comparable standing in most methods, Copilot showcases enhanced strengths in specific criteria, particularly in integration capability.
Group 2 (lower-ranked GAI chatbots) comprises Claude (A4) and Perplexity (A5), which consistently occupy the lower ranks across all methods. Claude exhibits moderate performance but lags in conversational ability (C1), user experience (C2), and integration capability (C3), whereas Perplexity falls behind primarily due to its less competitive integration capability (C3).
The ranking analysis across multiple MCDM methods consistently identifies ChatGPT as the leading AI chatbot, followed by Copilot and Gemini. Claude and Perplexity are positioned in the lower tier, highlighting the need for further enhancements to improve their performance in areas such as conversational ability, user experience, and integration capability. The high correlation coefficient shows the robustness of the proposed TOPSIS modification, ensuring that the ranking reflects the underlying performance metrics.
Based on the real-life characteristics of the compared GAI chatbots, the final ranking is adequate. ChatGPT (A1) stands out with its strong conversational abilities, high user satisfaction, and versatile integration options, which is further supported by its extensive real-world adoption and positive user feedback. Copilot (A2) also offers robust capabilities, particularly in development-oriented tasks, while retaining reasonable usability and pricing. Gemini (A3), although relatively new and not widely available, is expected to provide advanced conversational features in line with its strong technological backing, albeit with moderate integration and a lower price point than ChatGPT (A1) or Copilot (A2). Claude (A4), known for producing safer, more controlled outputs, and Perplexity (A5), valued for its quick question-answering style, both serve specific niches; their medium-to-lower scores in conversational ability and integration reflect these narrower focuses compared to ChatGPT (A1) and Copilot (A2). Consequently, the observed ordering is consistent with the advantages, target markets, and limitations of these chatbots.
It can be concluded that the proposed framework is reliable and properly reflects the requirements of organization S.
Selecting the appropriate chatbot is crucial for enhancing user engagement and operational efficiency. To streamline this selection process, a comprehensive approach is essential. This methodology enables experts to evaluate various technological, integration, and performance characteristics; establish specific requirements; utilize fuzzy assessments; and objectively identify the most suitable chatbot for a particular organization. Decision makers can further refine the evaluation system by incorporating factors such as anticipated interaction volumes, scalability, maintenance and support, error handling and recovery, and customization capabilities.
The proposed methodology offers benefits to both end users and organizational decision makers. For end users, aligning chatbot functionalities with user preferences and requirements enhances satisfaction and engagement. A chatbot selected through this process delivers precise and efficient assistance, thereby elevating the overall user experience. For organizational decision makers, the new MCDM approach provides a clear and unbiased framework for evaluating chatbots against the organization’s strategic goals and operational needs. This leads to informed investment choices and the smooth integration of AI technologies into business processes.

5. Conclusions

The rapid advancement of LLMs has significantly increased the prominence of GAI chatbots in various sectors. Many organizations are integrating these conversational assistants into their workflows to enhance efficiency and user engagement. However, there is currently no unified algorithmic approach for selecting suitable intelligent assistants.
In response to this challenge, we developed an integrated framework for GAI chatbot selection. This framework introduces an extension of TOPSIS within an IVHFFNs environment, enabling objective evaluation of generative chatbots. The fuzzy nature of this method effectively addresses uncertainty and vagueness in expert assessments. Moreover, the framework is versatile, accommodating both single and repeated data processing for chatbot selection.
The key advantages of the IVHFF TOPSIS include the following:
  • Incorporation of several interval-valued membership and non-membership grades, along with interval-valued hesitancy degrees in the evaluation process;
  • Integration of a Minkowski distance-based family of metrics, enabling flexible and accurate distance calculations tailored to various data types;
  • Consideration of the lengths of belongingness, non-belongingness, and hesitancy intervals in distance calculations, ensuring a comprehensive assessment of each criterion’s impact.
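To make the distance-related points above more concrete, the following is a minimal sketch of a Minkowski-type distance between two interval-valued Fermatean fuzzy numbers. It is an illustration under stated assumptions rather than the exact metric defined in this paper: the class `IVFFN`, the function `minkowski_distance`, the use of cubed interval endpoints, and the equal averaging over the six endpoints are all hypothetical choices, and each hesitant element is assumed to contain a single interval-valued number. The hesitancy interval is derived from the Fermatean condition, i.e., the cubes of the upper membership and non-membership bounds sum to at most 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IVFFN:
    """Interval-valued Fermatean fuzzy number (hypothetical helper type)."""
    mu_lo: float  # lower bound of the membership interval
    mu_hi: float  # upper bound of the membership interval
    nu_lo: float  # lower bound of the non-membership interval
    nu_hi: float  # upper bound of the non-membership interval

    def __post_init__(self):
        ordered = (0 <= self.mu_lo <= self.mu_hi <= 1
                   and 0 <= self.nu_lo <= self.nu_hi <= 1)
        if not ordered:
            raise ValueError("bounds must satisfy 0 <= lower <= upper <= 1")
        # Fermatean condition on the upper bounds.
        if self.mu_hi ** 3 + self.nu_hi ** 3 > 1 + 1e-12:
            raise ValueError("Fermatean condition violated")

    def hesitancy(self):
        """Return the hesitancy interval (lower, upper)."""
        lo = max(0.0, 1 - self.mu_hi ** 3 - self.nu_hi ** 3) ** (1 / 3)
        hi = max(0.0, 1 - self.mu_lo ** 3 - self.nu_lo ** 3) ** (1 / 3)
        return lo, hi


def minkowski_distance(a, b, p=2.0):
    """Minkowski-type distance over cubed endpoints (assumed form).

    p = 1 gives a Hamming-like distance, p = 2 a Euclidean-like one.
    """
    ends_a = (a.mu_lo, a.mu_hi, a.nu_lo, a.nu_hi, *a.hesitancy())
    ends_b = (b.mu_lo, b.mu_hi, b.nu_lo, b.nu_hi, *b.hesitancy())
    total = sum(abs(x ** 3 - y ** 3) ** p for x, y in zip(ends_a, ends_b))
    return (total / len(ends_a)) ** (1 / p)
```

Changing `p` switches between members of the distance family without altering the rest of the procedure, which is the kind of flexibility the second bullet refers to. Including both endpoints of every interval means that the interval lengths also influence the result.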
To demonstrate the effectiveness of this new framework, we applied it to a practical scenario involving the evaluation of five GAI chatbots: ChatGPT, Copilot, Gemini (formerly Bard), Claude, and Perplexity. To capture the performance of the chatbots, we selected four critical criteria that align with user needs and technological capabilities. The analysis of the results indicates that the new methodology reliably reflects the features of the chatbots in the final rankings.
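The final ranking step of a TOPSIS-style procedure can be sketched as follows. This is a generic illustration, not the paper's exact formulation: the function names `closeness_coefficients` and `rank_alternatives` are hypothetical, the per-criterion distances to the positive and negative ideal solutions are assumed to be computed beforehand, and the default of equal criterion weights is an assumption that can be overridden.

```python
def closeness_coefficients(dist_to_pis, dist_to_nis, weights=None):
    """Compute TOPSIS closeness coefficients.

    dist_to_pis[i][j]: distance of alternative i to the positive ideal
    solution on criterion j; dist_to_nis[i][j]: the same for the
    negative ideal solution. Weights default to equal values.
    """
    n_criteria = len(dist_to_pis[0])
    if weights is None:
        weights = [1.0 / n_criteria] * n_criteria
    if len(weights) != n_criteria or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match the criteria and sum to 1")

    scores = []
    for d_pos, d_neg in zip(dist_to_pis, dist_to_nis):
        s_pos = sum(w * d for w, d in zip(weights, d_pos))
        s_neg = sum(w * d for w, d in zip(weights, d_neg))
        denom = s_pos + s_neg
        # An alternative far from the negative ideal scores close to 1.
        scores.append(s_neg / denom if denom > 0 else 0.0)
    return scores


def rank_alternatives(names, scores):
    """Return alternative names ordered from best to worst score."""
    ordered = sorted(zip(names, scores), key=lambda t: t[1], reverse=True)
    return [name for name, _ in ordered]
```

A call such as `rank_alternatives(["A1", "A2"], closeness_coefficients(d_pos, d_neg))` returns the alternatives in descending order of closeness. Here `A1`, `A2`, `d_pos`, and `d_neg` are placeholders and do not correspond to the chatbots or data evaluated in this study.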
This evaluation process can be conducted periodically to account for the rapid advancements in GAI technologies and the evolving needs of organizations. Implementing an iterative procedure allows for continuous refinement of the selection criteria and adaptation to new developments, ensuring that the chosen chatbot solutions remain optimal over time.
In future work, we aim to enhance this conceptual framework by integrating recently developed multi-criteria decision-making methods. Additionally, we intend to develop a new hybrid method for chatbot evaluation that combines innovative weight determination algorithms with advanced multi-criteria decision-making techniques. We also plan to expand the ranking mechanism to address uncertainties using various classic and interval fuzzy sets, including interval type-3 and T-spherical fuzzy numbers. Furthermore, we acknowledge that assuming equal weight coefficients for the criteria is a limitation of the current study, and we plan to address it by incorporating adaptive weighting mechanisms in our future research.

Funding

This research was partially funded by the Ministry of Education and Science and by the National Science Fund, co-financed by the European Regional Development Fund, Grant Nos. BG05M2OP001-1.002-0002 and BG16RFPR002-1.014-0013-M001 “Digitization of the Economy in Big Data Environment”.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The author thanks the academic editor and anonymous reviewers for their insightful comments and suggestions.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Bulchand-Gidumal, J. Impact of artificial intelligence in travel, tourism, and hospitality. In Handbook of e-Tourism; Springer International Publishing: Cham, Switzerland, 2022; pp. 1943–1962.
  2. Obaid, A.J.; Bhushan, B.; Rajest, S.S. (Eds.) Advanced Applications of Generative AI and Natural Language Processing Models; IGI Global: Hershey, PA, USA, 2023.
  3. Al-Amin, M.; Ali, M.S.; Salam, A.; Khan, A.; Ali, A.; Ullah, A.; Alam, M.N.; Chowdhury, S.K. History of Generative Artificial Intelligence (AI) Chatbots: Past, Present, and Future Development. arXiv 2024, arXiv:2402.05122. Available online: https://arxiv.org/abs/2402.05122 (accessed on 1 January 2025).
  4. Yenduri, G.; Srivastava, G.; Maddikunta, P.K.R.; Jhaveri, R.H.; Wang, W.; Vasilakos, A.V.; Gadekallu, T.R. Generative pre-trained transformer: A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. arXiv 2023, arXiv:2305.10435.
  5. Saka, A.; Taiwo, R.; Saka, N.; Salami, B.A.; Ajayi, S.; Akande, K.; Kazemi, H. GPT models in construction industry: Opportunities, limitations, and a use case validation. Dev. Built Environ. 2023, 17, 100300.
  6. Dwivedi, Y.K.; Pandey, N.; Currie, W.; Micu, A. Leveraging ChatGPT and other generative artificial intelligence (AI)-based applications in the hospitality and tourism industry: Practices, challenges and research agenda. Int. J. Contemp. Hosp. Manag. 2024, 36, 1–12.
  7. Chen, B.; Wu, Z.; Zhao, R. From fiction to fact: The growing role of generative AI in business and finance. J. Chin. Econ. Bus. Stud. 2023, 21, 471–496.
  8. Ghaffari, S.; Yousefimehr, B.; Ghatee, M. Generative-AI in E-Commerce: Use-Cases and Implementations. In Proceedings of the 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), Babol, Iran, 21–22 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5.
  9. Al Naqbi, H.; Bahroun, Z.; Ahmed, V. Enhancing Work Productivity through Generative Artificial Intelligence: A Comprehensive Literature Review. Sustainability 2024, 16, 1166.
  10. Statista. Chatbot Market Worldwide 2016–2025. Available online: https://www.statista.com/statistics/656596/worldwide-chatbot-market/ (accessed on 30 June 2024).
  11. Gartner. Gartner Says More Than 80% of Enterprises Will Have Used Generative AI APIs or Deployed Generative AI-Enabled Applications by 2026. Available online: https://www.gartner.com/en/newsroom/press-releases/2023-10-11-gartner-says-more-than-80-percent-of-enterprises-will-have-used-generative-ai-apis-or-deployed-generative-ai-enabled-applications-by-2026 (accessed on 30 June 2024).
  12. Wang, K.; Ying, Z.; Goswami, S.S.; Yin, Y.; Zhao, Y. Investigating the role of artificial intelligence technologies in the construction industry using a Delphi-ANP-TOPSIS hybrid MCDM concept under a fuzzy environment. Sustainability 2023, 15, 11848.
  13. Alshahrani, R.; Yenugula, M.; Algethami, H.; Alharbi, F.; Goswami, S.S.; Naveed, Q.N.; Zahmatkesh, S. Establishing the fuzzy integrated hybrid MCDM framework to identify the key barriers to implementing artificial intelligence-enabled sustainable cloud system in an IT industry. Expert Syst. Appl. 2024, 238, 121732.
  14. Mishra, A.R.; Liu, P.; Rani, P. COPRAS method based on interval-valued hesitant Fermatean fuzzy sets and its application in selecting desalination technology. Appl. Soft Comput. 2022, 119, 108570.
  15. Chakrabortty, R.K.; Abdel-Basset, M.; Ali, A.M. A multi-criteria decision analysis model for selecting an optimum customer service chatbot under uncertainty. Decis. Anal. J. 2023, 6, 100168.
  16. Santa Barletta, V.; Caivano, D.; Colizzi, L.; Dimauro, G.; Piattini, M. Clinical-chatbot AHP evaluation based on “quality in use” of ISO/IEC 25010. Int. J. Med. Inform. 2023, 170, 104951.
  17. ISO/IEC. Systems and Software Engineering—Systems and Software Quality Requirements and Evaluation (SQuaRE)—Product Quality Model; International Organization for Standardization (ISO): Geneva, Switzerland, 2023; Available online: https://www.iso.org/standard/78176.html (accessed on 1 January 2025).
  18. Singh, C.; Dash, M.K.; Sahu, R.; Singh, G. Evaluating Critical Success Factors for Acceptance of Digital Assistants for Online Shopping Using Grey–DEMATEL. Int. J. Hum. Comput. Interact. 2023, 40, 8674–8688.
  19. Pandey, M.; Litoriya, R.; Pandey, P. Indicators of AI in Automation: An Evaluation Using Intuitionistic Fuzzy DEMATEL Method with Special Reference to Chat GPT. Wirel. Pers. Commun. 2024, 134, 445–465. Available online: https://link.springer.com/article/10.1007/s11277-024-10917-7 (accessed on 30 June 2024).
  20. Pathak, A.; Bansal, V. Factors Influencing the Readiness for Artificial Intelligence Adoption in Indian Insurance Organizations. In Transfer, Diffusion and Adoption of Next-Generation Digital Technologies; Sharma, S.K., Dwivedi, Y.K., Metri, B., Lal, B., Elbanna, A., Eds.; IFIP Advances in Information and Communication Technology; Springer: Cham, Switzerland, 2024; Volume 698, pp. 384–397. Available online: https://link.springer.com/chapter/10.1007/978-3-031-50192-0_5 (accessed on 1 January 2025).
  21. Wiangkham, A.; Vongvit, R. Comparative Analysis of MCDM Methods for Prioritizing Influential Factors of ChatGPT Adoption in Higher Education. 2024. Available online: https://ssrn.com/abstract=5040810 (accessed on 1 January 2025).
  22. Ojo, Y.; Davids, V.; Oni, O.; Odoemene, M.; Idowu-Collin, P.; Eyeregba, U. A Multi-Criteria Approach for Evaluating the Use of AI for Matching Patients to Optimal Mental Health Treatment Plans. World Sci. News 2024, 193, 201–222. Available online: https://worldscientificnews.com/wp-content/uploads/2024/04/WSN-1932-2024-201-222.pdf (accessed on 1 January 2025).
  23. Chatbot Arena. Available online: https://lmarena.ai (accessed on 1 January 2025).
  24. Artificial Analysis. Available online: https://artificialanalysis.ai/ (accessed on 1 January 2025).
  25. Parasuraman, A.; Zeithaml, V.A.; Berry, L.L. SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality. J. Retail. 1988, 64, 12–40.
  26. Davis, F.D. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q. 1989, 13, 319–340.
  27. Verhoef, P.C.; Lemon, K.N.; Parasuraman, A.; Roggeveen, A.; Tsiros, M.; Schlesinger, L.A. Customer experience creation: Determinants, dynamics and management strategies. J. Retail. 2009, 85, 31–41.
  28. Venkatesh, V.; Morris, M.G.; Davis, G.B.; Davis, F.D. User Acceptance of Information Technology: Toward a Unified View. MIS Q. 2003, 27, 425–478.
  29. Tornatzky, L.G.; Fleischer, M. The Processes of Technological Innovation; Lexington Books: Lanham, MD, USA, 1990.
  30. Yusof, M.M.; Kuljis, J.; Papazafeiropoulou, A.; Stergioulas, L.K. An Evaluation Framework for Health Information Systems: Human, Organization and Technology-Fit Factors (HOT-Fit). Int. J. Med. Inform. 2008, 77, 386–398.
  31. Pan, C.; Banerjee, J.S.; De, D.; Sarigiannidis, P.; Chakraborty, A.; Bhattacharyya, S. ChatGPT: A OpenAI platform for society 5.0. In Proceedings of the Doctoral Symposium on Human Centered Computing, Singapore, 25 February 2023; Springer Nature: Singapore, 2023; pp. 384–397.
  32. Stratton, J. An Introduction to Microsoft Copilot. In Copilot for Microsoft 365: Harness the Power of Generative AI in the Microsoft Apps You Use Every Day; Apress: Berkeley, CA, USA, 2024; pp. 19–35.
  33. Saeidnia, H.R. Welcome to the Gemini era: Google DeepMind and the information industry. Library Hi Tech News, 2023; ahead-of-print.
  34. Priyanshu, A.; Maurya, Y.; Hong, Z. AI Governance and Accountability: An Analysis of Anthropic’s Claude. arXiv 2024, arXiv:2407.01557.
  35. Deike, M. Evaluating the performance of ChatGPT and Perplexity AI in Business Reference. J. Bus. Financ. Librariansh. 2024, 29, 125–154.
  36. Hwang, C.L.; Yoon, K. Multiple Attribute Decision Making: Methods and Applications A State-of-the-Art Survey; Springer: Berlin/Heidelberg, Germany, 1981; Volume 186.
  37. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—I. Inf. Sci. 1975, 8, 199–249.
  38. Torra, V. Hesitant fuzzy sets. Int. J. Intell. Syst. 2010, 25, 529–539.
  39. Senapati, T.; Yager, R.R. Fermatean fuzzy sets. J. Ambient Intell. Humaniz. Comput. 2020, 11, 663–674.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
