Search Results (43)

Search Parameters:
Keywords = chat bots

19 pages, 1763 KB  
Article
Research on the Automatic Generation of Information Requirements for Emergency Response to Unexpected Events
by Yao Li, Chang Guo, Zhenhai Lu, Chao Zhang, Wei Gao, Jiaqi Liu and Jungang Yang
Appl. Sci. 2025, 15(22), 11953; https://doi.org/10.3390/app152211953 - 11 Nov 2025
Viewed by 521
Abstract
Making scientific and correct decisions is critical in dealing with emergency events, and formulating information requirements is an essential prerequisite for doing so. Taking earthquakes as a representative type of unexpected event, this paper constructs a large-language-model-driven system for automatically generating information requirements for earthquake response. The research explores how different departments interact during an earthquake emergency response, how information flows among them, and how the information-requirement process operates. The system is designed from three perspectives: building a knowledge base, designing and developing prompts, and designing the system architecture, and it describes how information requirements for sudden emergencies can be generated automatically. In the experiments, four Large Language Models (LLMs) served as backbone architectures: chatGLM (GLM-4.6), Spark (SparkX1.5), ERNIE Bot (4.5 Turbo), and DeepSeek (V3.2). Following the designed system process, information requirements were generated from real-world cases and then compared with information requirements gathered by experts. In the comparison, a “keyword weighted matching + text structure feature fusion” method was used to calculate semantic similarity. True positives, false positives, and false negatives were counted to identify differences and to compute precision, recall, and F1-score. The experimental results show that all four LLMs achieved precision and recall above 90% in earthquake information extraction, with F1-scores all exceeding 85%, verifying the feasibility of the analytical method adopted in this research. Comparative analysis found that chatGLM performed best, with an F1-score of 93.2%.
Finally, Python is used to script the processes above and to produce complete comparison charts for visual inspection and verification of the test results. Protégé was also used to build the knowledge-requirements ontology, making it easy to display and inspect. This research is particularly useful for emergency management departments, earthquake emergency response teams, and those working on intelligent emergency information systems or on automated information-requirement generation using technologies such as LLMs. It provides practical support for optimizing rapid decision-making in earthquake emergency response. Full article
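The evaluation described in this abstract counts true positives, false positives, and false negatives between generated and expert-gathered information requirements, then derives precision, recall, and F1. A minimal sketch of that computation (not the paper's implementation; it assumes the requirement items have already been matched, e.g. via the semantic-similarity step):

```python
def prf1(generated, expert):
    """Precision, recall, and F1 for a set of generated requirement
    items against an expert reference set. Items are assumed to be
    already normalized/matched (e.g., via semantic similarity)."""
    gen, ref = set(generated), set(expert)
    tp = len(gen & ref)   # items the model got right
    fp = len(gen - ref)   # spurious generated items
    fn = len(ref - gen)   # expert items the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical requirement items, for illustration only:
p, r, f = prf1({"epicenter", "magnitude", "casualties"},
               {"epicenter", "magnitude", "road damage"})
```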

19 pages, 276 KB  
Review
The Role of AI in Academic Writing: Impacts on Writing Skills, Critical Thinking, and Integrity in Higher Education
by Promethi Das Deep and Yixin Chen
Societies 2025, 15(9), 247; https://doi.org/10.3390/soc15090247 - 4 Sep 2025
Cited by 6 | Viewed by 28481
Abstract
Artificial Intelligence (AI) tools have transformed academic writing and literacy development in higher education. Students can now receive instant feedback on grammar, coherence, style, and argumentation using AI-powered writing assistants, like Grammarly, ChatGPT, and QuillBot. Moreover, these writing assistants can quickly produce completed essays and papers, leaving little else for the student to do aside from reading and perhaps editing the content. Many teachers are concerned that this erodes critical thinking skills and undermines ethical considerations since students are not performing the work themselves. This study addresses this concern by synthesizing and evaluating peer-reviewed literature on the effectiveness of AI in supporting writing pedagogy. Studies were selected based on their relevance and scholarly merit, following the Scale for the Assessment of Narrative Review Articles (SANRA) guidelines to ensure methodological rigor and quality. The findings reveal that although AI tools can be detrimental to the development of writing skills, they can foster self-directed learning and improvement when carefully integrated into coursework. They can facilitate enhanced writing fluency, offer personalized tutoring, and reduce the cognitive load of drafting and revising. This study also compares AI-assisted and traditional writing approaches and discusses best practices for integrating AI tools into curricula while preserving academic integrity and creativity in student writing. Full article
24 pages, 3720 KB  
Article
A Comparative Study of the Accuracy and Readability of Responses from Four Generative AI Models to COVID-19-Related Questions
by Zongjing Liang, Yun Kuang, Xiaobo Liang, Gongcheng Liang and Zhijie Li
COVID 2025, 5(7), 99; https://doi.org/10.3390/covid5070099 - 30 Jun 2025
Viewed by 1728
Abstract
The purpose of this study is to compare the accuracy and readability of Coronavirus Disease 2019 (COVID-19) prevention and control knowledge texts generated by four current generative artificial intelligence (AI) models—two international models (ChatGPT and Gemini) and two domestic models (Kimi and Ernie Bot)—and to evaluate the other performance characteristics of texts generated by domestic and international models. This paper uses the questions and answers in the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) as the evaluation criteria. The accuracy, readability, and comprehensibility of the texts generated by each model are scored against the CDC standards. A neural network model is then used to identify the factors that affect readability, and the medical topics of the generated texts are analyzed using text analysis techniques. Finally, a questionnaire-based manual scoring approach was used to evaluate the AI-generated texts, which was then compared to automated machine scoring. Accuracy: domestic models have higher textual accuracy, while international models have higher reliability. Readability: domestic models produced more fluent and publicly accessible language; international models generated more standardized and formally structured texts with greater consistency. Comprehensibility: domestic models offered superior readability, while international models were more stable in output. Readability factors: the average words per sentence (AWPS) emerged as the most significant factor influencing readability across all models. Topic analysis: ChatGPT emphasized epidemiological knowledge; Gemini focused on general medical and health topics; Kimi provided more multidisciplinary content; and Ernie Bot concentrated on clinical medicine. 
The empirical results show that manual and machine scoring are highly consistent on the SimHash and FKGL indicators, which supports the effectiveness of the evaluation method proposed in this paper. Conclusion: Texts generated by domestic models are more accessible and better suited for public education, clinical communication, and health consultations. In contrast, the international models achieve higher accuracy when generating expert knowledge, especially in epidemiological studies and in assessing knowledge literature on disease severity. The inclusion of manual evaluations confirms the reliability of the proposed assessment framework. It is therefore recommended that future AI-generated knowledge systems for infectious disease control balance professional rigor with public comprehensibility, in order to provide reliable and accessible reference materials during major infectious disease outbreaks. Full article
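The FKGL indicator mentioned above (Flesch-Kincaid Grade Level) and the AWPS factor the study found most influential are both fixed formulas over word, sentence, and syllable counts. A minimal sketch, assuming the counts have already been extracted from the text (syllable counting in practice relies on heuristics or a library):

```python
def awps(words: int, sentences: int) -> float:
    """Average words per sentence, the readability factor the study
    identified as most influential."""
    return words / sentences

def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (standard published formula):
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    return 0.39 * awps(words, sentences) + 11.8 * (syllables / words) - 15.59

# Illustrative counts: a 100-word, 5-sentence passage with 130 syllables.
grade = fkgl(100, 5, 130)
```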
(This article belongs to the Section COVID Public Health and Epidemiology)

20 pages, 5749 KB  
Review
Artificial Intelligence Research in Tourism and Hospitality Journals: Trends, Emerging Themes, and the Rise of Generative AI
by Wai Ming To and Billy T. W. Yu
Tour. Hosp. 2025, 6(2), 63; https://doi.org/10.3390/tourhosp6020063 - 3 Apr 2025
Cited by 20 | Viewed by 13268
Abstract
This study examined the trends and key themes of artificial intelligence in the field of tourism and hospitality research. On 5 March 2025, a search was performed using “artificial intelligence” and related terms in the “Title, Abstract, and Keywords”, focusing on tourism and hospitality journals indexed in Scopus. The identified documents were subjected to performance analysis and science mapping techniques. The search yielded 921 documents, comprising 882 articles and 39 reviews. The number of documents increased from 3 in 1987 to 277 in 2024. R. Law from the University of Macau was the most prolific author, while the Hong Kong Polytechnic University recorded the highest publication count. Chinese researchers produced the most documents, totaling 262 articles and reviews. A keyword co-occurrence analysis revealed four key themes: “machine learning and sentiment analysis of online reviews”, “adoption of AI including robots and ChatGPT in the hospitality industry”, “artificial neural networks for tourism management and demand analysis”, and “random forest models in travel”. Additionally, the study noted a shift in research focus from tourism demand forecasting and sentiment analysis to using service bots and applying artificial intelligence to enhance service quality, with a recent emphasis on generative AI tools like ChatGPT. Full article

11 pages, 506 KB  
Article
Language Artificial Intelligence Models as Pioneers in Diagnostic Medicine? A Retrospective Analysis on Real-Time Patients
by Azka Naeem, Omair Khan, Syed Mujtaba Baqir, Kundan Jana, Prem Shankar, Avleen Kaur, Morad Zaaya, Fatima Sajid, Fizza Mohsin, Marlon Rivera Boadla, Aung Oo, Victor Wong, Momna Noor, Samar Pal Singh Sandhu, Kseniya Slobodyanuk, Vijay Shetty and Aaron Z. Tokayer
J. Clin. Med. 2025, 14(4), 1131; https://doi.org/10.3390/jcm14041131 - 10 Feb 2025
Cited by 3 | Viewed by 1659
Abstract
Background/Objectives: GPT-3.5 and GPT-4 have shown promise in assisting healthcare professionals with clinical questions. However, their performance in real-time clinical scenarios remains underexplored. This study aims to evaluate their precision and reliability compared to board-certified emergency department attendings, highlighting their potential in improving patient care. We hypothesized that board-certified emergency department attendings at Maimonides Medical Center exhibit higher accuracy and reliability than GPT-3.5 and GPT-4 in generating differentials based on history and physical examination for patients presenting to the emergency department. Methods: Real-time patient data from Maimonides Medical Center's emergency department, collected from 1 January 2023 to 1 March 2023, were analyzed. Demographic details, symptoms, medical history, and discharge diagnoses recorded by emergency room attendings were examined. AI algorithms (ChatGPT-3.5 and GPT-4) generated differential diagnoses, which were compared with those by attending physicians. Accuracy was determined by comparing each rater's diagnoses with the gold-standard discharge diagnosis, calculating the proportion of correctly identified cases. Precision was assessed using Cohen's kappa coefficient and the Intraclass Correlation Coefficient to measure agreement between raters. Results: The mean age of patients was 49.12 years, with 57.3% males and 42.7% females. Chief complaints included fever/sepsis (24.7%), gastrointestinal issues (17.7%), and cardiovascular problems (16.4%). Diagnostic accuracy against discharge diagnoses was highest for ChatGPT-4 (85.5%), followed by ChatGPT-3.5 (84.6%) and ED attendings (83%). Cohen's kappa demonstrated moderate agreement (0.7) between the AI models, with lower agreement observed for ED attendings. Stratified analysis revealed higher accuracy for gastrointestinal complaints with ChatGPT-4 (87.5%) and cardiovascular complaints with ChatGPT-3.5 (81.34%). 
Conclusions: Our study demonstrates that ChatGPT-4 and GPT-3.5 exhibit diagnostic accuracy comparable to board-certified emergency department attendings, highlighting their potential to aid decision-making in dynamic clinical settings. The stratified analysis revealed comparable reliability and precision of the AI chatbots for cardiovascular complaints, which represent a significant proportion of the high-risk patients presenting to the emergency department, and provided targeted insights into rater performance within specific medical domains. This study contributes to integrating AI models into medical practice, enhancing efficiency and effectiveness in clinical decision-making. Further research is warranted to explore broader applications of AI in healthcare. Full article
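Cohen's kappa, used in this study to measure agreement between raters, corrects raw observed agreement for the agreement expected by chance. A minimal two-rater sketch (illustrative, not the study's code; the labels are hypothetical complaint categories):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the marginal label frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical diagnosis categories from two raters:
k = cohens_kappa(["GI", "CV", "GI", "sepsis"],
                 ["GI", "CV", "CV", "sepsis"])
```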
(This article belongs to the Section Intensive Care)

34 pages, 8804 KB  
Article
Artificial Intelligence (ChatGPT) and Bloom’s Taxonomy in Theoretical Computer Science Education
by Hashim Habiballa, Martin Kotyrba, Eva Volna, Vladimir Bradac and Martin Dusek
Appl. Sci. 2025, 15(2), 581; https://doi.org/10.3390/app15020581 - 9 Jan 2025
Cited by 9 | Viewed by 5643
Abstract
The study focuses on evaluating the performance of AI-based tools, specifically ChatGPT versions 3.5 and 4.0, in comparison to human students in the field of Theoretical Computer Science Education. The experiment aims to assess the capabilities of both AI and human subjects in solving learning tasks based on Bloom’s Taxonomy. The primary objectives of the study are to determine the educational performance of AI and students, identify areas where students may outperform AI in learning tasks, and evaluate the normalized overall results of educational performance with equal assignment weights. The assessment included testing on various types of Bloom’s Learning Task Taxonomy (LTT) based on specific Learning Objectives (LOs). Hypotheses were formulated for quantitative and qualitative analysis to compare the performance of AI and human subjects. Quantitative analysis revealed engaging results regarding the educational performance evaluation with AI-based tools. While some differences were observed in the performance of AI and students, the normalized overall results with equal assignment weights did not show statistically significant differences. The study highlighted the advantages and disadvantages of both humans and AI bots in solving learning tasks. The study concludes that there are areas where students may have an advantage over AI in tasks requiring understanding, evaluation, and creative thinking. Recommendations are provided for educators on utilizing AI-based tools in education, emphasizing the coexistence of AI possibilities with well-designed assignments. The findings suggest that AI has a valuable role in education, and a thorough analysis of teaching approaches and student evaluation is essential in leveraging artificial intelligence effectively. Full article

34 pages, 482 KB  
Article
The Use of Large Language Models for Translating Buddhist Texts from Classical Chinese to Modern English: An Analysis and Evaluation with ChatGPT 4, ERNIE Bot 4, and Gemini Advanced
by Xiang Wei
Religions 2024, 15(12), 1559; https://doi.org/10.3390/rel15121559 - 20 Dec 2024
Cited by 6 | Viewed by 3982
Abstract
This study conducts a comprehensive evaluation of large language models (LLMs), including ChatGPT 4, ERNIE Bot 4, and Gemini Advanced, in the context of translating Buddhist texts from classical Chinese to modern English. Focusing on three distinct Buddhist texts encompassing various literary forms and complexities, the analysis examines the models’ capabilities in handling specialized Buddhist terminology, classical Chinese grammar, and the translation of complex, lengthy sentences. The study employs a methodology where selected excerpts from these texts are translated by the LLMs, followed by an in-depth analysis comparing these machine-generated translations to human translations. The evaluation criteria include word translation accuracy, the ability to recognize and correctly interpret specific meanings within both classical and modern contexts, and the completeness of phrases without omitting or unnecessarily adding words. The findings reveal significant variations in the performance of these LLMs, with detailed observations on their strengths and weaknesses in translating specialized terms, managing grammatical structures unique to classical Chinese, and maintaining the integrity of the original texts’ meanings. This paper aims to shed light on the potential and limitations of using LLMs for translating complex literary works from ancient to modern languages, contributing valuable insights into the field of computational linguistics and the ongoing development of translation technologies. Full article
50 pages, 6509 KB  
Article
A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI
by Yulia Kumar, Mengtian Lin, Christopher Paredes, Dan Li, Guohao Yang, Dov Kruger, J. Jenny Li and Patricia Morreale
Electronics 2024, 13(24), 4991; https://doi.org/10.3390/electronics13244991 - 18 Dec 2024
Cited by 4 | Viewed by 5812
Abstract
In a previous paper we defined testFAILS, a set of benchmarks for measuring the efficacy of Large Language Models in various domains. This paper defines a second-generation framework, testFAILS-2, to measure how current AI engines are progressing towards Artificial General Intelligence (AGI). The testFAILS-2 framework offers enhanced evaluation metrics that address the latest developments in Artificial Intelligence Linguistic Systems (AILS). A key feature of this work is the “Chat with Alan” project, a Retrieval-Augmented Generation (RAG)-based AI bot inspired by Alan Turing, designed to distinguish between human- and AI-generated interactions, thereby emulating Turing's original vision. We assess a variety of models, including ChatGPT-4o-mini and other Small Language Models (SLMs), as well as prominent Large Language Models (LLMs), utilizing expanded criteria that encompass result relevance, accessibility, cost, multimodality, agent creation capabilities, emotional AI attributes, AI search capacity, and LLM-robot integration. The analysis reveals that testFAILS-2 significantly enhances the evaluation of model robustness and user productivity, while also identifying critical areas for improvement in multimodal processing and emotional reasoning. By integrating rigorous evaluation standards and novel testing methodologies, testFAILS-2 advances the assessment of AILS, providing essential insights that contribute to the ongoing development of more effective and resilient AI systems towards achieving AGI. Full article
(This article belongs to the Section Artificial Intelligence)

41 pages, 12999 KB  
Article
The AI-Powered Evolution of Big Data
by Yulia Kumar, Jose Marchena, Ardalan H. Awlla, J. Jenny Li and Hemn Barzan Abdalla
Appl. Sci. 2024, 14(22), 10176; https://doi.org/10.3390/app142210176 - 6 Nov 2024
Cited by 36 | Viewed by 18024
Abstract
The rapid advancement of artificial intelligence (AI), coupled with the global rollout of 4G and 5G networks, has fundamentally transformed the Big Data landscape, redefining data management and analysis methodologies. The ability to manage and analyze such vast and varied datasets has exceeded the capacity of any individual or organization. This study introduces an enhanced framework that expands upon the traditional four Vs of Big Data—volume, velocity, variety, and veracity—by incorporating six additional dimensions: value, validity, visualization, variability, volatility, and vulnerability. This comprehensive framework offers a novel and straightforward approach to understanding and addressing the complexities of Big Data in the AI era. This article further explores the use of ‘Big D’, an AI-driven, RAG-based Big Data analytical bot powered by the ChatGPT-4o model. This article's innovation represents a significant advance in the field, accelerating and deepening the extraction and analysis of insights from large-scale datasets. This will enable us to develop a more nuanced and comprehensive understanding of intricate data landscapes. In addition, we propose a framework and analytical tools that contribute to the evolution of Big Data analytics, particularly in the context of AI-driven processes. Full article
(This article belongs to the Special Issue Applied Intelligence in Natural Language Processing)

18 pages, 17110 KB  
Article
Contribution of Artificial Intelligence (AI) to Code-Based 3D Modeling Tasks
by Marianna Zichar and Ildikó Papp
Designs 2024, 8(5), 104; https://doi.org/10.3390/designs8050104 - 18 Oct 2024
Cited by 2 | Viewed by 2863
Abstract
The rapid advancement of technology and innovation is also impacting education across different levels. The rise of Artificial Intelligence (AI) is beginning to transform education in various areas, from course materials to assessment systems. This requires educators to reconsider how they evaluate students’ knowledge. It is crucial to understand if and to what extent assignments can be completed using AI tools. This study explores two hypotheses about the risks of using code-based 3D modeling software in education and the potential for students to delegate their work to AI when completing assignments. We selected two tasks that students were able to successfully complete independently and provided the same amount of information (both textual and image) to AI in order to generate the necessary code. We tested the widely used ChatGPT and Gemini AI bots to assess their current performance in generating code based on text prompts or image-based information for the two models. Our findings indicate that students are not yet able to entirely delegate their work to these AI tools. Full article
(This article belongs to the Special Issue Design Process for Additive Manufacturing)

35 pages, 15883 KB  
Article
Bias and Cyberbullying Detection and Data Generation Using Transformer Artificial Intelligence Models and Top Large Language Models
by Yulia Kumar, Kuan Huang, Angelo Perez, Guohao Yang, J. Jenny Li, Patricia Morreale, Dov Kruger and Raymond Jiang
Electronics 2024, 13(17), 3431; https://doi.org/10.3390/electronics13173431 - 29 Aug 2024
Cited by 13 | Viewed by 9239
Abstract
Despite significant advancements in Artificial Intelligence (AI) and Large Language Models (LLMs), detecting and mitigating bias remains a critical challenge, particularly on social media platforms like X (formerly Twitter), to address the prevalent cyberbullying on these platforms. This research investigates the effectiveness of leading LLMs in generating synthetic biased and cyberbullying data and evaluates the proficiency of transformer AI models in detecting bias and cyberbullying within both authentic and synthetic contexts. The study involves semantic analysis and feature engineering on a dataset of over 48,000 sentences related to cyberbullying collected from Twitter (before it became X). Utilizing state-of-the-art LLMs and AI tools such as ChatGPT-4, Pi AI, Claude 3 Opus, and Gemini-1.5, synthetic biased, cyberbullying, and neutral data were generated to deepen the understanding of bias in human-generated data. AI models including DeBERTa, Longformer, BigBird, HateBERT, MobileBERT, DistilBERT, BERT, RoBERTa, ELECTRA, and XLNet were initially trained to classify Twitter cyberbullying data and subsequently fine-tuned, optimized, and experimentally quantized. This study focuses on intersectional cyberbullying and multilabel classification to detect both bias and cyberbullying. Additionally, it proposes two prototype applications: one that detects cyberbullying using an intersectional approach and the innovative CyberBulliedBiasedBot that combines the generation and detection of biased and cyberbullying content. Full article
(This article belongs to the Special Issue Emerging Artificial Intelligence Technologies and Applications)

16 pages, 6121 KB  
Article
Prediction of Machine-Generated Financial Tweets Using Advanced Bidirectional Encoder Representations from Transformers
by Muhammad Asad Arshed, Ștefan Cristian Gherghina, Dur-E-Zahra and Mahnoor Manzoor
Electronics 2024, 13(11), 2222; https://doi.org/10.3390/electronics13112222 - 6 Jun 2024
Cited by 1 | Viewed by 2191
Abstract
With the rise of Large Language Models (LLMs), distinguishing between genuine and AI-generated content, particularly in finance, has become challenging. Previous studies have focused on binary identification of ChatGPT-generated content, overlooking other AI tools used for text regeneration. This study addresses this gap by examining various AI-regenerated content types in the finance domain. Objective: The study aims to differentiate between human-generated financial content and AI-regenerated content, specifically focusing on ChatGPT, QuillBot, and SpinBot. It constructs a dataset comprising real text and AI-regenerated text for this purpose. Contribution: This research contributes to the field by providing a dataset that includes various types of AI-regenerated financial content. It also evaluates the performance of different models, particularly highlighting the effectiveness of the Bidirectional Encoder Representations from Transformers (BERT) Base Cased model in distinguishing between these content types. Methods: The dataset is meticulously preprocessed to ensure quality and reliability. Various models, including BERT Base Cased, are fine-tuned and compared with traditional machine learning models using TF-IDF and Word2Vec approaches. Results: The BERT Base Cased model outperforms the other models, achieving an accuracy, precision, recall, and F1 score of 0.73, 0.73, 0.73, and 0.72, respectively, in distinguishing between real and AI-regenerated financial content. Conclusions: This study demonstrates the effectiveness of the BERT base model in differentiating between human-generated financial content and AI-regenerated content. It highlights the importance of considering various AI tools in identifying synthetic content, particularly in the finance domain in Pakistan. Full article
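The traditional machine-learning baselines in this abstract vectorize text with TF-IDF before classification. A minimal sketch of one common TF-IDF variant (raw term frequency times natural-log inverse document frequency; not the authors' pipeline, which would typically use a library such as scikit-learn):

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weights: term frequency in the document
    times log of inverse document frequency across the corpus.
    Terms appearing in every document get weight 0."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        weights.append({t: (tf[t] / total) * math.log(n / df[t])
                        for t in tf})
    return weights

# Hypothetical financial snippets, for illustration only:
w = tfidf(["stocks rally on earnings",
           "stocks fall on inflation fears"])
```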

17 pages, 2827 KB  
Article
Analysis of the Effectiveness of Model, Data, and User-Centric Approaches for Chat Application: A Case Study of BlenderBot 2.0
by Chanjun Park, Jungseob Lee, Suhyune Son, Kinam Park, Jungsun Jang and Heuiseok Lim
Appl. Sci. 2024, 14(11), 4821; https://doi.org/10.3390/app14114821 - 2 Jun 2024
Cited by 1 | Viewed by 2768
Abstract
BlenderBot 2.0 represents a significant advancement in open-domain chatbots by incorporating real-time information and retaining user information across multiple sessions through an internet search module. Despite its innovations, there are still areas for improvement. This paper examines BlenderBot 2.0’s limitations and errors from three perspectives: model, data, and user interaction. From the data perspective, we highlight the challenges associated with the crowdsourcing process, including unclear guidelines for workers, insufficient measures for filtering hate speech, and the lack of a robust process for verifying the accuracy of internet-sourced information. From the user perspective, we identify nine types of limitations and conduct a thorough investigation into their causes. For each perspective, we propose practical methods for improvement and discuss potential directions for future research. Additionally, we extend our analysis to include perspectives in the era of large language models (LLMs), further broadening our understanding of the challenges and opportunities present in current AI technologies. This multifaceted analysis not only sheds light on BlenderBot 2.0’s current limitations but also charts a path forward for the development of more sophisticated and reliable open-domain chatbots within the broader context of LLM advancements. Full article

9 pages, 1615 KB  
Article
Evaluation of AI ChatBots for the Creation of Patient-Informed Consent Sheets
by Florian Jürgen Raimann, Vanessa Neef, Marie Charlotte Hennighausen, Kai Zacharowski and Armin Niklas Flinspach
Mach. Learn. Knowl. Extr. 2024, 6(2), 1145-1153; https://doi.org/10.3390/make6020053 - 24 May 2024
Cited by 8 | Viewed by 3684
Abstract
Introduction: Large language models (LLMs), such as ChatGPT, are a topic of major public interest, and their potential benefits and threats are a subject of discussion. The potential contribution of these models to health care is widely discussed. However, few studies to date have examined LLMs. For example, the potential use of LLMs in (individualized) informed consent remains unclear. Methods: We analyzed the performance of the LLMs ChatGPT 3.5, ChatGPT 4.0, and Gemini with regard to their ability to create an information sheet for six basic anesthesiologic procedures in response to corresponding questions. We made multiple attempts to create consent forms for anesthesia and analyzed the results against checklists based on existing standard sheets. Results: None of the LLMs tested were able to create a legally compliant information sheet for any basic anesthesiologic procedure. Overall, fewer than one-third of the risks, procedural descriptions, and preparations listed were covered by the LLMs. Conclusions: There are clear limitations of current LLMs in terms of practical application. Advantages in the generation of patient-adapted risk stratification within individual informed consent forms are not available at the moment, although the potential for further development is difficult to predict.
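As a rough illustration of this checklist-based evaluation, coverage can be computed as the fraction of required items mentioned in a generated sheet. The checklist items and the matching rule (case-insensitive substring search) below are illustrative assumptions, not the authors' exact protocol:

```python
# Sketch of checklist-based scoring of an LLM-generated consent sheet.
# The checklist and matching rule are illustrative assumptions, not the
# study's actual evaluation protocol.

def checklist_coverage(generated_text: str, checklist: list[str]) -> float:
    """Return the fraction of checklist items mentioned in the text."""
    text = generated_text.lower()
    hits = sum(1 for item in checklist if item.lower() in text)
    return hits / len(checklist)

# Hypothetical checklist of risks for general anesthesia
risks = ["nausea", "sore throat", "dental damage",
         "allergic reaction", "aspiration", "awareness"]

sheet = ("Possible side effects include nausea, a sore throat, "
         "and rarely an allergic reaction.")
coverage = checklist_coverage(sheet, risks)
print(f"{coverage:.0%} of checklist items covered")  # 3 of 6 items -> 50%
```

A finding such as "fewer than one-third of the risks were covered" corresponds to a coverage score below 0.33 under a metric of this kind.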

24 pages, 10127 KB  
Article
Unveiling AI-Generated Financial Text: A Computational Approach Using Natural Language Processing and Generative Artificial Intelligence
by Muhammad Asad Arshed, Ștefan Cristian Gherghina, Christine Dewi, Asma Iqbal and Shahzad Mumtaz
Computation 2024, 12(5), 101; https://doi.org/10.3390/computation12050101 - 15 May 2024
Cited by 9 | Viewed by 3635
Abstract
This study is an in-depth exploration of the nascent field of Natural Language Processing (NLP) and generative Artificial Intelligence (AI), and it concentrates on the vital task of distinguishing between human-generated text and content that has been produced by AI models. In particular, this research pioneers the identification of financial text derived from AI models such as ChatGPT and paraphrasing tools like QuillBot. While our primary focus is on financial content, we have also pinpointed texts generated by paragraph rewriting tools and utilized ChatGPT for various contexts; this multiclass identification was missing in previous studies. In this paper, we use a comprehensive feature extraction methodology that combines TF–IDF with Word2Vec, along with individual feature extraction methods. Notably, combining a Random Forest model with Word2Vec features yields strong results. Moreover, this study investigates the significance of the window-size parameter in the Word2Vec approach, revealing that a window size of one produces outstanding scores across various metrics, including accuracy, precision, recall, and the F1 measure, all reaching a notable value of 0.74. In addition, our developed model performs well in classification, attaining AUC values of 0.94 for the ‘GPT’ class; 0.77 for the ‘Quil’ class; and 0.89 for the ‘Real’ class. We also achieved an accuracy of 0.72, precision of 0.71, recall of 0.72, and F1 of 0.71 for our extended prepared dataset. This study contributes significantly to the evolving landscape of AI text identification, providing valuable insights and promising directions for future research.
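The feature-combination step described above can be sketched as a TF–IDF-weighted average of word vectors, producing one fixed-length feature vector per document for a downstream classifier such as a Random Forest. The toy corpus, the 2-dimensional vectors, and the exact weighting scheme below are illustrative assumptions, not the study's pipeline:

```python
# Sketch of TF-IDF-weighted Word2Vec document features. The corpus and
# the toy 2-d "embeddings" are illustrative; a real pipeline would train
# Word2Vec (e.g. with window size 1, which the study found worked best).
import math
from collections import Counter

docs = [["profits", "rose", "sharply"],
        ["profits", "fell"],
        ["markets", "rose"]]

vectors = {"profits": [1.0, 0.0], "rose": [0.0, 1.0],
           "fell": [0.5, 0.5], "sharply": [0.2, 0.8],
           "markets": [0.9, 0.1]}

def idf(term, corpus):
    """Inverse document frequency of a term over the corpus."""
    df = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / df)

def doc_features(doc, corpus):
    """TF-IDF-weighted average of the word vectors in one document."""
    counts = Counter(doc)
    feat = [0.0, 0.0]
    total_weight = 0.0
    for term, tf in counts.items():
        w = (tf / len(doc)) * idf(term, corpus)
        total_weight += w
        feat = [f + w * v for f, v in zip(feat, vectors[term])]
    return [f / total_weight for f in feat] if total_weight else feat

features = [doc_features(d, docs) for d in docs]
```

The resulting `features` matrix (one row per document) would then be passed to the classifier's `fit` method alongside human/AI labels.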
