
Getting Insight into How Different Chatbots Answer the Same Questions and How Reliable They Are

Topic Information

Dear Colleagues,

From November 2022 onward, ChatGPT has been used by a growing number of people looking for answers to a variety of questions in fields such as education, science, law, finance, marketing, public administration, health, and leisure. Since it was made publicly available, a steady flow of scientific studies has been published. Many of these studies use a SWOT analysis to describe its strengths, weaknesses, opportunities, and threats (e.g., Farrokhnia et al., 2023). Others have analyzed ChatGPT’s conversational characteristics or the extent to which ChatGPT is capable of self-reflection (Loos et al., 2023). Ethical questions have also attracted attention (e.g., Balmer, 2023; Loos & Radicke, submitted). But ChatGPT is not the only chatbot generating human-like text; other examples include Google Bard and Perplexity AI. It struck us that there are almost no studies (with work such as Liesenfeld et al., 2023 being a notable exception) systematically comparing how different chatbots answer the same questions and how reliable these answers are. For this reason, we invite researchers to submit cutting-edge empirical studies that showcase how at least two chatbots answer the same question(s) and then present an analysis of the reliability of the answer(s). It is of the utmost importance that researchers define and operationalize the notion of a ‘reliable answer’; this calls for a thorough knowledge of the field from which the questions are raised. We welcome empirical contributions that compare the answers given by at least two chatbots to the same question(s) in order to analyze topics including, but not limited to, the following:

  • The extent to which the language used in the prompts plays a role in the way the chatbots’ answers are generated;
  • Whether the countries in which the chatbots are used influence their answers;
  • The degree to which chatbots can generate answers in a specific language style (for example, for a twelve-year-old reader);
  • Whether chatbots are able to generate their answers in a specific textual form (e.g., as an essay, a recipe, a scientific paper, etc.);
  • The extent to which chatbots can execute different tasks, such as writing a text, translating a text, formulating a research design, collecting data, coding, conducting data analysis, summarizing a text, correcting a text, or giving feedback on a text;
  • The sources the chatbots refer to and how reliable these sources are;
  • The interactional characteristics of the exchange between users’ prompts and chatbots’ answers;
  • What answers chatbots generate when confronted with an ethical dilemma or a politically or societally sensitive issue;
  • When chatbots hallucinate and what the characteristics of these hallucinations are;
  • Whether chatbots can answer questions related to current topics.

References

Balmer, A. A Sociological Conversation with ChatGPT about AI Ethics, Affect and Reflexivity. Sociology 2023, 00380385231169676.

Farrokhnia, M.; Banihashem, S.K.; Noroozi, O.; Wals, A. A SWOT analysis of ChatGPT: Implications for educational practice and research. Innovations in Education and Teaching International 2023, 1-15.

Liesenfeld, A.; Lopez, A.; Dingemanse, M. Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators. In Proceedings of the 5th International Conference on Conversational User Interfaces, July 2023; pp. 1-6.

Loos, E.; Gröpler, J.; Goudeau, M.L.S. Using ChatGPT in Education: Human Reflection on ChatGPT’s Self-Reflection. Societies 2023, 13, 196.

Loos, E.; Radicke, J. (submitted). Using ChatGPT-3 as a writing tool: An educational assistant or a moral hazard? Current ChatGPT-3 media representations compared to Plato’s critical stance on writing in Phaedrus.

Dr. Eugène Loos
Dr. Loredana Ivan
Topic Editors

Keywords

  • AI
  • LLMs
  • chatbots
  • ChatGPT
  • Google Bard
  • Perplexity AI
  • reliability

Participating Journals

Algorithms
Open Access; 4,263 articles; launched in 2008
Impact Factor: 2.1 | CiteScore: 4.5 | Median Time to First Decision: 19 days | Highest JCR Category Ranking: Q2

Behavioral Sciences
Open Access; 5,504 articles; launched in 2011
Impact Factor: 2.5 | CiteScore: 3.1 | Median Time to First Decision: 32 days | Highest JCR Category Ranking: Q2

Societies
Open Access; 1,788 articles; launched in 2011
Impact Factor: 1.6 | CiteScore: 3.0 | Median Time to First Decision: 30 days | Highest JCR Category Ranking: Q2

Technologies
Open Access; 1,774 articles; launched in 2013
Impact Factor: 3.6 | CiteScore: 8.5 | Median Time to First Decision: 19 days | Highest JCR Category Ranking: Q1
