Advances in Large Language Model Empowered Machine Learning: Design and Application

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 10 December 2024 | Viewed by 9112

Special Issue Editors


Dr. Hao Fei
Guest Editor
School of Computing, National University of Singapore, 5 Prince George's Park, Singapore 118404, Singapore
Interests: natural language processing; computer vision; vision-language learning

Dr. Fei Li
Guest Editor
School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Interests: natural language processing

Dr. Wei Ji
Guest Editor
School of Computing, National University of Singapore, Singapore 117417, Singapore
Interests: vision and language; video understanding

Special Issue Information

Dear Colleagues,

The field of artificial intelligence (AI) has witnessed a monumental transformation with the advent of large language models (LLMs). LLM-based systems such as ChatGPT, GPT-4, Llama, Flamingo, and BLIP-2 have displayed remarkable capabilities in natural language processing (NLP), computer vision (CV), and many other intelligence-related tasks. The breakthroughs achieved by LLMs have not only revolutionized the way we interact with machines but have also opened up exciting avenues for empowering various machine learning applications. LLMs have proven remarkably effective at understanding and processing varied data, such as language, vision, and video, making them a powerful tool for tackling complex problems in a wide range of disciplines, including but not limited to healthcare, finance, education, marketing, and the social sciences.

The focus of this Special Issue is to explore cutting-edge developments in LLM-empowered machine learning, with particular emphasis on both the design of LLMs and their application across diverse domains, such as NLP and CV. We welcome submissions (both original research papers and review articles) on topics including, but not limited to, the following:

  • LLM-empowered machine learning:
    • In-context learning;
    • Chain-of-thought reasoning;
    • Content creation;
    • Data analysis and understanding;
    • Knowledge-base/knowledge-graph-enhanced reliable generation.
  • LLM-empowered NLP:
    • Summarization and text generation;
    • Information extraction;
    • Question answering;
    • Sentiment analysis and opinion mining;
    • Semantic parsing;
    • Machine translation;
    • Recommendation.
  • LLM-empowered multimodal learning:
    • Text-to-image generation;
    • Text-to-video generation;
    • Image/video captioning;
    • 3D understanding;
    • Multimodal information retrieval;
    • Multimodal question answering;
    • Multimodal fusion and integration of information;
    • Multimodal applications/pipelines.

We look forward to receiving your contributions.

Dr. Hao Fei
Dr. Fei Li
Dr. Wei Ji
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • large language models
  • natural language processing
  • computer vision
  • vision-language learning
  • machine learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (5 papers)


Research

32 pages, 5459 KiB  
Article
Toward the Adoption of Explainable Pre-Trained Large Language Models for Classifying Human-Written and AI-Generated Sentences
by Luca Petrillo, Fabio Martinelli, Antonella Santone and Francesco Mercaldo
Electronics 2024, 13(20), 4057; https://doi.org/10.3390/electronics13204057 - 15 Oct 2024
Viewed by 1293
Abstract
Pre-trained large language models have demonstrated impressive text generation capabilities, including understanding, writing, and performing many tasks in natural language. Moreover, with time and improvements in training and text generation techniques, these models are proving efficient at generating increasingly human-like content. However, they can also be modified to generate persuasive, contextual content weaponized for malicious purposes, including disinformation and novel social engineering attacks. In this paper, we present a study on identifying human- and AI-generated content using different models. Specifically, we fine-tune several models belonging to the BERT family, an open-source version of the GPT model, ELECTRA, and XLNet, and then perform a text classification task using two labeled datasets: the first consisting of 25,000 sentences generated by both AI and humans, and the second comprising 22,929 abstracts either generated by ChatGPT or written by humans. Furthermore, in an additional phase, we submit 20 sentences generated by ChatGPT and 20 sentences randomly extracted from Wikipedia to our fine-tuned models to verify their efficiency and robustness. To understand the models' predictions, we performed an explainability analysis on two sentences, one generated by the AI and one written by a human, leveraging integrated gradients and token importance techniques to analyze the words and subwords of the two sentences. In the first experiment, we achieved an average accuracy of 99%, precision of 98%, recall of 99%, and F1-score of 99%. In the second experiment, we reached an average accuracy of 51%, precision of 50%, recall of 52%, and F1-score of 51%.
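
A rough illustration of the fine-tuning setup described in this abstract: the sketch below trains a BERT-family sequence classifier on a labeled human/AI sentence corpus with Hugging Face Transformers. The model name, CSV column names, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Sketch: fine-tune a BERT-family model to label sentences as
# human-written (0) or AI-generated (1). File names, column names,
# and hyperparameters are assumptions, not the paper's exact setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Hypothetical CSVs with columns "text" and "label" (0 = human, 1 = AI).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     padding="max_length")

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
print(trainer.evaluate())
```

The explainability phase could be approximated with an attribution library such as Captum, applying its integrated gradients implementation to the fine-tuned model's token embeddings.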

18 pages, 1876 KiB  
Article
Improving Training Dataset Balance with ChatGPT Prompt Engineering
by Mateusz Kochanek, Igor Cichecki, Oliwier Kaszyca, Dominika Szydło, Michał Madej, Dawid Jędrzejewski, Przemysław Kazienko and Jan Kocoń
Electronics 2024, 13(12), 2255; https://doi.org/10.3390/electronics13122255 - 8 Jun 2024
Cited by 2 | Viewed by 1765
Abstract
The rapid evolution of large language models, in particular OpenAI’s GPT-3.5-turbo and GPT-4, indicates a growing interest in advanced computational methodologies. This paper proposes a novel approach to synthetic data generation and knowledge distillation through prompt engineering. The potential of large language models (LLMs) is used to address the problem of unbalanced training datasets for other machine learning models. This is not only a common issue but also a crucial determinant of final model quality and performance. Three prompting strategies are considered: basic, composite, and similarity prompts. Although the initial results do not match the performance of comprehensive datasets, the similarity prompts method exhibits considerable promise, outperforming the other methods. Our investigation of these rebalancing methods opens pathways for future research on leveraging continuously developed LLMs for the enhanced generation of high-quality synthetic data, which could have an impact on many large-scale engineering applications.
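
To make the prompting idea concrete, the sketch below shows one way a "similarity prompt" could be assembled and sent to a chat-completion API to synthesize additional minority-class samples. The prompt wording, model name, and number of in-context examples are assumptions; the paper's exact templates are not reproduced here.

```python
# Sketch: rebalance a dataset by asking an LLM for new minority-class
# samples that resemble real ones (a similarity-style prompt). The prompt
# text and model name are assumptions, not the paper's exact setup.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def similarity_prompt(examples: list[str], label: str) -> str:
    shots = "\n".join(f"- {e}" for e in examples)
    return (f"Here are training sentences labeled '{label}':\n{shots}\n"
            "Write one new sentence with the same label and a similar style "
            "but different content. Reply with the sentence only.")

def generate_synthetic(minority: list[str], label: str, n_needed: int) -> list[str]:
    synthetic = []
    for _ in range(n_needed):
        shots = random.sample(minority, k=min(3, len(minority)))
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": similarity_prompt(shots, label)}])
        synthetic.append(resp.choices[0].message.content.strip())
    return synthetic
```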

26 pages, 8779 KiB  
Article
LCV2: A Universal Pretraining-Free Framework for Grounded Visual Question Answering
by Yuhan Chen, Lumei Su, Lihua Chen and Zhiwei Lin
Electronics 2024, 13(11), 2061; https://doi.org/10.3390/electronics13112061 - 25 May 2024
Viewed by 924
Abstract
Grounded visual question answering systems rely heavily on substantial computational power and data resources for pretraining. In response to this challenge, this paper introduces LCV2, a modular approach that utilizes a frozen large language model (LLM) to bridge an off-the-shelf generic visual question answering (VQA) module with a generic visual grounding (VG) module. It leverages the generalizable knowledge of these expert models, avoiding the need for any large-scale pretraining. Innovatively, within the LCV2 framework, question and predicted answer pairs are transformed into descriptive and referring captions, enhancing the clarity of the visual cues directed by the question text for the VG module’s grounding. This compensates for the missing intrinsic text–visual coupling in non-end-to-end frameworks. Comprehensive experiments on benchmark datasets such as GQA, CLEVR, and VizWiz-VQA-Grounding were conducted to evaluate the method’s performance and compare it with several baseline methods. In particular, it achieved an IoU F1 score of 59.6% on the GQA dataset and 37.4% on the CLEVR dataset, surpassing some baseline results and demonstrating LCV2’s competitive performance.
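
The modular flow sketched in this abstract can be summarized in a few lines of Python. All three callables below (vqa_model, llm, vg_model) are hypothetical stand-ins for off-the-shelf components, and the caption-conversion prompt only approximates the spirit of the method, not the authors' actual interface.

```python
# Sketch of a pretraining-free grounded-VQA pipeline: a frozen LLM rewrites
# a (question, answer) pair into a referring caption that a generic visual
# grounding module can localize. All components are hypothetical stand-ins.
def grounded_vqa(image, question, vqa_model, llm, vg_model):
    # Step 1: an off-the-shelf VQA module predicts an answer.
    answer = vqa_model(image, question)

    # Step 2: the frozen LLM turns the Q/A pair into a referring caption,
    # making the visual cue implied by the question explicit.
    caption = llm(
        "Rewrite the question and answer as a single phrase describing "
        f"the object being referred to.\nQuestion: {question}\nAnswer: {answer}")

    # Step 3: a generic visual grounding module localizes the caption.
    bounding_box = vg_model(image, caption)
    return answer, bounding_box
```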

22 pages, 3038 KiB  
Article
Benchmarking Large Language Model (LLM) Performance for Game Playing via Tic-Tac-Toe
by Oguzhan Topsakal and Jackson B. Harper
Electronics 2024, 13(8), 1532; https://doi.org/10.3390/electronics13081532 - 17 Apr 2024
Viewed by 2884
Abstract
This study investigates the strategic decision-making abilities of large language models (LLMs) via the game of Tic-Tac-Toe, renowned for its straightforward rules and definitive outcomes. We developed a mobile application coupled with web services, facilitating gameplay among leading LLMs, including Jurassic-2 Ultra by AI21, Claude 2.1 by Anthropic, Gemini-Pro by Google, GPT-3.5-Turbo and GPT-4 by OpenAI, Llama2-70B by Meta, and Mistral Large by Mistral, to assess their rule comprehension and strategic thinking. Using a consistent prompt structure in 10 sessions for each LLM pair, we systematically collected data on wins, draws, and invalid moves across 980 games, employing two distinct prompt types to vary the presentation of the game’s status. Our findings reveal significant performance variations among the LLMs. Notably, GPT-4, GPT-3.5-Turbo, and Llama2 secured the most wins with the list prompt, while GPT-4, Gemini-Pro, and Mistral Large excelled using the illustration prompt. GPT-4 emerged as the top performer, achieving victory with the minimum number of moves and the fewest errors for both prompt types. This research introduces a novel methodology for assessing LLM capabilities using a game that can illuminate their strategic thinking abilities. Beyond enhancing our comprehension of LLM performance, this study lays the groundwork for future exploration into their utility in complex decision-making scenarios, offering directions for further inquiry and the exploration of LLM limits within game-based frameworks.
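
As an illustration of how the two prompt types might differ, the sketch below renders the same board state as a "list" presentation and as an "illustration" presentation, together with the validity check behind the invalid-move metric. The exact wording used in the study is an assumption.

```python
# Sketch: two ways to present a Tic-Tac-Toe board to an LLM. A board is a
# list of 9 cells ('X', 'O', or '' for empty), numbered 1-9 for the model.
def list_prompt(board: list[str]) -> str:
    taken = [f"square {i + 1}: {c}" for i, c in enumerate(board) if c]
    free = [str(i + 1) for i, c in enumerate(board) if not c]
    return ("You are playing Tic-Tac-Toe as X.\n"
            f"Occupied squares: {', '.join(taken) or 'none'}.\n"
            f"Free squares: {', '.join(free)}.\n"
            "Reply with the number of the square you take.")

def illustration_prompt(board: list[str]) -> str:
    cells = [c or str(i + 1) for i, c in enumerate(board)]
    grid = "\n---------\n".join(
        " | ".join(cells[r * 3:r * 3 + 3]) for r in range(3))
    return ("You are playing Tic-Tac-Toe as X. The board below shows free "
            f"squares as numbers:\n{grid}\n"
            "Reply with the number of the square you take.")

def is_valid_move(board: list[str], move: int) -> bool:
    # Out-of-range or already-occupied replies count as invalid moves.
    return 1 <= move <= 9 and not board[move - 1]
```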

10 pages, 4481 KiB  
Article
Evaluation of Human Perception Thresholds Using Knowledge-Based Pattern Recognition
by Marek R. Ogiela and Urszula Ogiela
Electronics 2024, 13(4), 736; https://doi.org/10.3390/electronics13040736 - 11 Feb 2024
Viewed by 1105
Abstract
This paper presents research on determining individual perceptual thresholds in cognitive analyses and the understanding of visual patterns. Such techniques are based on the processes of cognitive resonance and can be applied to the division and reconstruction of images using threshold algorithms. The research presented here considers the most important parameters affecting the determination of visual perception thresholds: the thematic knowledge and personal expectations that arise at the time of image observation and recognition. Perceptual thresholds were determined using visual pattern splitting techniques based on threshold methods. The divided patterns were reconstructed by combining successive components, allowing more and more details to become apparent until the observer could recognize the image correctly. A study carried out in this way made it possible to determine individual perceptual thresholds for dozens of test subjects. The results also showed strong correlations between the determined perceptual thresholds and the participants’ accumulated thematic knowledge, expectations, and experience from previously recognizing similar image patterns.
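
A minimal sketch of the split-and-reconstruct procedure, assuming simple intensity-band thresholding: the image is divided into brightness bands that are recombined step by step, and an observer's perceptual threshold can be read off as the first step at which recognition succeeds. The band count and recombination order are assumptions, not the authors' algorithm.

```python
# Sketch: split a grayscale image into intensity bands and rebuild it
# progressively; each step reveals more detail to the observer.
import numpy as np
from PIL import Image

def split_into_bands(path: str, n_bands: int = 8) -> list[np.ndarray]:
    img = np.asarray(Image.open(path).convert("L"), dtype=np.uint8)
    edges = np.linspace(0, 256, n_bands + 1)
    # Each band keeps only pixels whose intensity falls in its range.
    return [np.where((img >= lo) & (img < hi), img, 0)
            for lo, hi in zip(edges[:-1], edges[1:])]

def progressive_reconstructions(bands: list[np.ndarray]) -> list[np.ndarray]:
    # Step k combines the first k bands; the perceptual threshold is the
    # first step at which the observer recognizes the image.
    partial = np.zeros_like(bands[0])
    steps = []
    for band in bands:
        partial = np.maximum(partial, band)
        steps.append(partial.copy())
    return steps
```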
