AI, Volume 5, Issue 4 (December 2024) – 31 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
15 pages, 2741 KiB  
Article
SC-Phi2: A Fine-Tuned Small Language Model for StarCraft II Build Order Prediction
by Muhammad Junaid Khan and Gita Sukthankar
AI 2024, 5(4), 2338-2352; https://doi.org/10.3390/ai5040115 - 13 Nov 2024
Abstract
Background: This article introduces SC-Phi2, a fine-tuned StarCraft II small language model. Small language models, like Phi-2, Gemma, and DistilBERT, are streamlined versions of large language models (LLMs) with fewer parameters that require less computational power and memory to run. Method: To teach Microsoft’s Phi-2 model about StarCraft, we create a new SC2 text dataset with information about StarCraft races, roles, and actions and use it to fine-tune Phi-2 with self-supervised learning. We pair this language model with a Vision Transformer (ViT) from the pre-trained BLIP-2 (Bootstrapping Language Image Pre-training) model, fine-tuning it on the StarCraft replay dataset, MSC. This enables us to construct dynamic prompts that include visual game state information. Results: Unlike the large models, such as GPT-3.5, used in previous LLM-based StarCraft agents, Phi-2 is trained primarily on textbook data and contains little inherent knowledge of StarCraft II beyond what is provided by our training process. By using LoRA (Low-Rank Adaptation) and quantization, our model can be trained on a single GPU. We demonstrate that our model performs well at build order prediction, an important StarCraft macromanagement task. Conclusions: Our research on the usage of small models is a step towards reducing the carbon footprint of AI agents. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
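A minimal sketch of the training recipe the abstract names (4-bit quantization plus LoRA adapters, so a Phi-2-scale model fits on a single GPU), assuming the Hugging Face transformers and peft libraries; the hyperparameters and target modules are illustrative, not the authors' exact configuration:

```python
# Sketch: 4-bit quantized Phi-2 with LoRA adapters (assumed setup, not the
# authors' exact pipeline; the SC2 text dataset and training loop are omitted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", quantization_config=bnb, device_map="auto"
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # Phi-2 attention
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```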

17 pages, 430 KiB  
Article
Adoption and Impact of ChatGPT in Computer Science Education: A Case Study on a Database Administration Course
by Daniel López-Fernández and Ricardo Vergaz
AI 2024, 5(4), 2321-2337; https://doi.org/10.3390/ai5040114 - 11 Nov 2024
Viewed by 342
Abstract
The irruption of GenAI such as ChatGPT has changed the educational landscape. Therefore, methodological guidelines and more empirical experiences are needed to better understand these tools and know how to use them to their fullest potential. This contribution presents an exploratory and correlational study conducted with 37 computer science students who used ChatGPT as a support tool to learn database administration. The article addresses three questions: the first explores the degree of use of ChatGPT among computer science students to learn database administration, the second explores the profile of students who get the most out of tools like ChatGPT for database administration activities, and the third explores how the utilization of ChatGPT can impact academic performance. To empirically shed light on these questions, the students’ grades and a comprehensive questionnaire were employed as research instruments. The obtained results indicate that traditional learning resources, such as teachers’ explanations and students’ reports, were widely used and correlated positively with students’ grades. The usage and perceived utility of ChatGPT were moderate, but positive correlations between students’ grades and ChatGPT usage were found. Indeed, a significantly higher use of this tool was identified among the group of outstanding students, indicating that high-performing students are the ones using ChatGPT the most. Consequently, a new digital divide could be opening between these students and those with a weaker grasp of the fundamentals and poorer prompting skills, who may not be able to take full advantage of ChatGPT’s possibilities. Full article
(This article belongs to the Topic Explainable AI in Education)

21 pages, 4886 KiB  
Article
Comparison of CNN-Based Architectures for Detection of Different Object Classes
by Nataliya Bilous, Vladyslav Malko, Marcus Frohme and Alina Nechyporenko
AI 2024, 5(4), 2300-2320; https://doi.org/10.3390/ai5040113 - 11 Nov 2024
Viewed by 426
Abstract
(1) Background: Detecting people and technical objects in various situations, such as natural disasters and warfare, is critical to search and rescue operations and the safety of civilians. Fast and accurate detection of people and equipment can significantly increase the effectiveness of search and rescue missions and provide timely assistance to people. Computer vision and deep learning technologies play a key role in detecting the required objects due to their ability to analyze big volumes of visual data in real time. (2) Methods: The performance of neural networks such as You Only Look Once (YOLO) v4-v8, Faster R-CNN, Single Shot MultiBox Detector (SSD), and EfficientDet has been analyzed using the COCO2017, SARD, SeaDronesSee, and VisDrone2019 datasets. The main metrics for comparison were mAP, Precision, Recall, F1-Score, and the ability of the neural network to work in real time. (3) Results: The most important metrics for evaluating the efficiency and performance of models for this task are accuracy (mAP), F1-Score, and processing speed (FPS), since they capture both the accuracy of object recognition and the ability to use the models in real-world environments where high processing speed is essential. (4) Conclusion: Although different neural networks lead on particular metrics, YOLO achieved the best results overall, with mAP = 0.88, F1-Score = 0.88, and 48 FPS, so the focus was placed on these models. Full article
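As a hedged illustration of the real-time criterion used above, the following measures throughput (FPS) for a YOLOv8 checkpoint with the ultralytics package; the weights file and frame paths are placeholders, not the paper's setup:

```python
# Sketch: timing YOLOv8 inference to estimate FPS (placeholder inputs).
import time
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                            # any YOLOv8 checkpoint
frames = [f"frame_{i:04d}.jpg" for i in range(100)]   # hypothetical video frames

start = time.perf_counter()
for frame in frames:
    model.predict(frame, verbose=False)               # one detection pass per frame
fps = len(frames) / (time.perf_counter() - start)
print(f"throughput: {fps:.1f} FPS")
```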

21 pages, 402 KiB  
Systematic Review
Enhancing IoT Security in Vehicles: A Comprehensive Review of AI-Driven Solutions for Cyber-Threat Detection
by Rafael Abreu, Emanuel Simão, Carlos Serôdio, Frederico Branco and António Valente
AI 2024, 5(4), 2279-2299; https://doi.org/10.3390/ai5040112 - 6 Nov 2024
Viewed by 1042
Abstract
Background: The Internet of Things (IoT) has improved many aspects of industry and people’s daily lives. To begin with, the IoT enables communication across a wide range of devices, from household appliances to industrial machinery. This connectivity allows for better integration of pervasive computing, making devices “smart” and capable of interacting with each other and with their users in a seamless way. However, the widespread adoption of IoT devices has introduced security challenges, because these devices usually run in environments that have limited resources. As IoT technology becomes more integrated into critical infrastructure and daily life, the need for stronger security measures will increase. These devices are exposed to a variety of cyber-attacks. This literature review synthesizes the current research on artificial intelligence (AI) technologies to improve IoT security. The review addresses key research questions, including: (1) What are the primary challenges and threats that IoT devices face? (2) How can AI be used to improve IoT security? (3) What AI techniques are currently being used for this purpose? and (4) How does applying AI to IoT security differ from traditional methods? Methods: We included a total of 33 peer-reviewed studies published between 2020 and 2024, specifically journal and conference papers written in English. Studies irrelevant to the use of AI for IoT security, duplicate studies, and articles without full-text access were excluded. The literature search was conducted using scientific databases, including MDPI, ScienceDirect, IEEE Xplore, and SpringerLink. Results were synthesized through a narrative synthesis approach, with the help of the Parsifal tool to organize and visualize key themes and trends. Results: We focus on the use of machine learning, deep learning, and federated learning, which are used for anomaly detection to identify and mitigate the security threats inherent to these devices. AI-driven technologies offer promising solutions for attack detection and predictive analysis, significantly reducing the need for human intervention. This review acknowledges limitations such as the rapidly evolving nature of IoT technologies, the early-stage development or proprietary nature of many AI techniques, the variable performance of AI models in real-world applications, and potential biases in the search and selection of articles. The risk of bias in this systematic review is moderate. While the study selection and data collection processes are robust, the reliance on narrative synthesis and the limited exploration of potential biases in the selection process introduce some risk. Transparency in funding and conflict of interest reporting reduces bias in those areas. Discussion: The effectiveness of these AI-based approaches can vary depending on model performance and computational efficiency. In this article, we provide a comprehensive overview of existing AI models applied to IoT security, including machine learning (ML), deep learning (DL), and hybrid approaches, and examine their role in enhancing detection accuracy. Despite all the advances, challenges remain in terms of data privacy and the scalability of AI solutions in IoT security. Conclusion: This review provides a comprehensive overview of ML applications to enhance IoT security. We also discuss and outline future directions, emphasizing the need for collaboration between interested parties and ongoing innovation to address the evolving threat landscape in IoT security. Full article

19 pages, 2078 KiB  
Article
Enhancing Medical Image Classification with Unified Model Agnostic Computation and Explainable AI
by Elie Neghawi and Yan Liu
AI 2024, 5(4), 2260-2278; https://doi.org/10.3390/ai5040111 - 5 Nov 2024
Viewed by 531
Abstract
Background: Advances in medical image classification have recently benefited from general augmentation techniques. However, these methods often fall short in performance and interpretability. Objective: This paper applies the Unified Model Agnostic Computation (UMAC) framework specifically to the medical domain to demonstrate its utility in this critical area. Methods: UMAC is a model-agnostic methodology designed to develop machine learning approaches that integrate seamlessly with various paradigms, including self-supervised, semi-supervised, and supervised learning. By unifying and standardizing computational models and algorithms, UMAC ensures adaptability across different data types and computational environments while incorporating state-of-the-art methodologies. In this study, we integrate UMAC as a plug-and-play module within convolutional neural networks (CNNs) and Transformer architectures, enabling the generation of high-quality representations even with minimal data. Results: Our experiments across nine diverse 2D medical image datasets show that UMAC consistently outperforms traditional data augmentation methods, achieving a 1.89% improvement in classification accuracy. Conclusions: Additionally, by incorporating explainable AI (XAI) techniques, we enhance model transparency and reliability in decision-making. This study highlights UMAC’s potential as a powerful tool for improving both the performance and interpretability of medical image classification models. Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)

23 pages, 632 KiB  
Article
Filtering Useful App Reviews Using Naïve Bayes—Which Naïve Bayes?
by Pouya Ataei, Sri Regula, Daniel Staegemann and Saurabh Malgaonkar
AI 2024, 5(4), 2237-2259; https://doi.org/10.3390/ai5040110 - 5 Nov 2024
Viewed by 419
Abstract
App reviews provide crucial feedback for software maintenance and evolution, but manually extracting useful reviews from vast volumes is time-consuming and challenging. This study investigates the effectiveness of six Naïve Bayes variants for automatically filtering useful app reviews. We evaluated these variants on datasets from five popular apps, comparing their performance in terms of accuracy, precision, recall, F-measure, and processing time. Our results show that Expectation Maximization-Multinomial Naïve Bayes with Laplace smoothing performed best overall, achieving up to 89.2% accuracy and 0.89 F-measure. Complement Naïve Bayes with Laplace smoothing demonstrated particular effectiveness for imbalanced datasets. Generally, incorporating Laplace smoothing and Expectation Maximization improved performance, albeit with increased processing time. This study also examined the impact of data imbalance on classification performance. Our findings suggest that these advanced Naïve Bayes variants hold promise for filtering useful app reviews, especially when dealing with limited labeled data or imbalanced datasets. This research contributes to the body of evidence around app review mining and provides insights for enhancing software maintenance and evolution processes. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
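For readers who want a concrete starting point, two of the compared variants are available directly in scikit-learn; the snippet below is a toy sketch with placeholder reviews (the Expectation Maximization variants the paper evaluates are not part of scikit-learn):

```python
# Sketch: Multinomial and Complement Naive Bayes with Laplace smoothing
# (alpha=1.0) on toy app-review text; data and labels are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, ComplementNB
from sklearn.pipeline import make_pipeline

reviews = ["app keeps crashing on startup", "love the new design",
           "please add dark mode", "great colors and layout"]
useful = [1, 0, 1, 0]  # 1 = actionable feedback for maintenance

for nb in (MultinomialNB(alpha=1.0), ComplementNB(alpha=1.0)):
    clf = make_pipeline(CountVectorizer(), nb)
    clf.fit(reviews, useful)
    print(type(nb).__name__, clf.predict(["crashing after the update"]))
```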

19 pages, 589 KiB  
Article
Adaptive Exploration Artificial Bee Colony for Mathematical Optimization
by Shaymaa Alsamia, Edina Koch, Hazim Albedran and Richard Ray
AI 2024, 5(4), 2218-2236; https://doi.org/10.3390/ai5040109 - 5 Nov 2024
Viewed by 442
Abstract
The artificial bee colony (ABC) algorithm is a well-known swarm intelligence method utilized across various disciplines due to its robustness. However, it exhibits limitations in its exploration mechanisms, particularly in high-dimensional or complex landscapes. This article introduces the adaptive exploration artificial bee colony (AEABC), a novel variant that revisits the real-world inspiration behind the ABC algorithm. AEABC incorporates new distance-based parameters and mechanisms that correct the original design, enhancing its robustness. The performance of AEABC was evaluated against 33 state-of-the-art metaheuristics across twenty-five benchmark functions and an engineering application. AEABC consistently outperformed its counterparts, demonstrating superior efficiency and accuracy. In a variable-sized problem (n = 10), the traditional ABC algorithm converged to 3.086 × 10^6, while AEABC achieved a convergence of 2.0596 × 10^−255, highlighting its robust performance. By addressing the shortcomings of the traditional ABC algorithm, AEABC significantly advances mathematical optimization, especially in engineering applications. This work underscores the significance of revisiting the inspiration of the traditional ABC algorithm in enhancing the capabilities of swarm intelligence. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
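For context, the classical employed-bee update that AEABC modifies is sketched below; AEABC's distance-based parameters are described in the paper and are not reproduced here, and the test function and population sizes are illustrative:

```python
# Sketch: one employed-bee pass of the classical ABC algorithm.
import numpy as np

def employed_bee_step(X, fitness, f, rng):
    """Greedy neighbourhood move per food source (classical ABC update)."""
    n, d = X.shape
    for i in range(n):
        k = rng.choice([j for j in range(n) if j != i])  # random partner
        j = rng.integers(d)                              # random dimension
        phi = rng.uniform(-1, 1)
        v = X[i].copy()
        v[j] = X[i, j] + phi * (X[i, j] - X[k, j])       # candidate solution
        if f(v) < fitness[i]:                            # keep if better
            X[i], fitness[i] = v, f(v)
    return X, fitness

rng = np.random.default_rng(0)
sphere = lambda x: float(np.sum(x ** 2))                 # toy objective
X = rng.uniform(-5, 5, size=(20, 10))
fit = np.array([sphere(x) for x in X])
for _ in range(100):
    X, fit = employed_bee_step(X, fit, sphere, rng)
print("best value:", fit.min())
```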

15 pages, 5169 KiB  
Article
Predicting the Multiphotonic Absorption in Graphene by Machine Learning
by José Zahid García-Córdova, Jose Alberto Arano-Martinez, Cecilia Mercado-Zúñiga, Claudia Lizbeth Martínez-González and Carlos Torres-Torres
AI 2024, 5(4), 2203-2217; https://doi.org/10.3390/ai5040108 - 4 Nov 2024
Viewed by 326
Abstract
This study analyzes the nonlinear optical properties exhibited by graphene, focusing on the nonlinear absorption coefficient and the nonlinear refractive index. The evaluation was conducted using the Z-scan technique with a 532 nm wavelength laser at various intensities. The nonlinear optical absorption and the nonlinear optical refractive index were measured. Four machine learning models, including linear regression, decision trees, random forests, and gradient boosting regression, were trained to analyze how the nonlinear optical absorption coefficient varies with variables such as spot radius, maximum energy, and normalized minimum transmission. The models were trained with synthetic data and subsequently validated with experimental data. Decision tree-based models, such as random forests and gradient boosting regression, demonstrated superior performance compared to linear regression, especially in terms of mean squared error. This work provides a detailed assessment of the nonlinear optical properties of graphene and highlights the effectiveness of machine learning methods in this context. Full article
(This article belongs to the Section Chemical Artificial Intelligence)
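A minimal sketch of the model comparison the abstract describes, with synthetic stand-ins for the Z-scan features (spot radius, maximum energy, normalized minimum transmittance); the data-generating rule is invented for illustration:

```python
# Sketch: comparing the four regressors named in the abstract on toy data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))                    # [spot radius, energy, T_min]
y = X @ np.array([2.0, -1.0, 3.0]) + 0.1 * rng.normal(size=500)  # stand-in
                                                  # for the nonlinear absorption
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
for model in (LinearRegression(), DecisionTreeRegressor(),
              RandomForestRegressor(), GradientBoostingRegressor()):
    model.fit(Xtr, ytr)
    print(type(model).__name__, mean_squared_error(yte, model.predict(Xte)))
```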

16 pages, 648 KiB  
Article
Dynamic Multiobjective Optimization Based on Multi-Environment Knowledge Selection and Transfer
by Wei Song and Jian Yu
AI 2024, 5(4), 2187-2202; https://doi.org/10.3390/ai5040107 - 1 Nov 2024
Viewed by 612
Abstract
Background: Dynamic multiobjective optimization problems (DMOPs) involve multiple conflicting and time-varying objectives, and dynamic multiobjective algorithms (DMOAs) aim to find, as quickly as possible, Pareto optima that are close to the true ones in each new environment. In particular, the introduction of transfer learning in DMOAs has led to good results in solving DMOPs. However, the selection of valuable historical knowledge and the mitigation of negative transfer remain important problems in existing transfer learning-based DMOAs. Method: A DMOA based on multi-environment knowledge selection and transfer (MST-DMOA) is proposed in this article. First, by clustering historical Pareto optima, some representative solutions that reflect the main evolutionary information are selected as knowledge of the environment. Second, the similarity between the historical and current environments is evaluated, and the knowledge of multiple similar environments is then selected as valuable historical knowledge to construct the source domain. Third, high-quality solutions in the new environment are obtained to form the target domain, which helps the historical knowledge adapt to the current environment, thus effectively alleviating negative transfer. Conclusions: We compare the proposed MST-DMOA with five state-of-the-art DMOAs on fourteen benchmark test problems, and the experimental results verify the excellent performance of MST-DMOA in solving DMOPs. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

17 pages, 3717 KiB  
Article
OTM-HC: Enhanced Skeleton-Based Action Representation via One-to-Many Hierarchical Contrastive Learning
by Muhammad Usman, Wenming Cao, Zhao Huang, Jianqi Zhong and Ruiya Ji
AI 2024, 5(4), 2170-2186; https://doi.org/10.3390/ai5040106 - 1 Nov 2024
Viewed by 453
Abstract
Human action recognition has become crucial in computer vision, with growing applications in surveillance, human–computer interaction, and healthcare. Traditional approaches often use broad feature representations, which may miss subtle variations in timing and movement within action sequences. Our proposed One-to-Many Hierarchical Contrastive Learning (OTM-HC) framework maps the input into multi-layered feature vectors, creating a hierarchical contrastive representation that captures various granularities within the temporal and spatial domains of a human skeleton sequence. Using sequence-to-sequence (Seq2Seq) transformer encoders and downsampling modules, OTM-HC can distinguish between multiple levels of action representations, such as instance, domain, clip, and part levels. Each level contributes significantly to a comprehensive understanding of action representations. The OTM-HC model design is adaptable, ensuring smooth integration with advanced Seq2Seq encoders. We tested the OTM-HC framework across four datasets, demonstrating improved performance over state-of-the-art models. Specifically, OTM-HC achieved improvements of 0.9% and 0.6% on NTU60, 0.4% and 0.7% on NTU120, and 0.7% and 0.3% on PKU-MMD I and II, respectively, surpassing previous leading approaches across these datasets. These results showcase the robustness and adaptability of our model for various skeleton-based action recognition tasks. Full article
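The paper's level-specific objectives are not reproduced here, but a standard InfoNCE-style contrast, of the kind such hierarchical frameworks typically apply at each level, can be sketched as follows (embedding dimensions are illustrative):

```python
# Sketch: InfoNCE loss between two views of the same sequences (generic
# contrastive building block; not the paper's exact level-specific heads).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    """z1, z2: (batch, dim) embeddings of two views of the same sequences."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # cosine similarities
    targets = torch.arange(z1.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
print(loss)
```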

23 pages, 6011 KiB  
Article
Optimizing Steering Angle Prediction in Self-Driving Vehicles Using Evolutionary Convolutional Neural Networks
by Bashar Khawaldeh, Antonio M. Mora and Hossam Faris
AI 2024, 5(4), 2147-2169; https://doi.org/10.3390/ai5040105 - 30 Oct 2024
Viewed by 644
Abstract
The global community is awaiting the advent of a self-driving vehicle that is safe, reliable, and capable of navigating a diverse range of road conditions and terrains. This requires a lot of research, study, and optimization. Thus, this work focused on implementing, training, and optimizing a convolutional neural network (CNN) model, aiming to predict the steering angle during driving (one of the main issues). The considered dataset comprises images collected inside a car-driving simulator and further processed for augmentation and removal of unimportant details. In addition, an innovative data-balancing process was previously performed. A CNN model was trained with the dataset, conducting a comparison between several different standard optimizers. Moreover, evolutionary optimization was applied to optimize the model’s weights as well as the optimizers themselves. Several experiments were performed considering different approaches of genetic algorithms (GAs) along with other optimizers from the state of the art. The obtained results demonstrate that the GA is an effective optimization tool for this problem. Full article
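A hedged sketch of the evolutionary weight optimization idea: a generic genetic algorithm loop over flattened network weights, where steering_loss is a hypothetical stand-in for evaluating the CNN's steering-angle error on the driving data:

```python
# Sketch: selection, one-point crossover, and mutation over weight vectors.
import numpy as np

rng = np.random.default_rng(0)
pop_size, n_weights = 30, 200
population = rng.normal(size=(pop_size, n_weights))

def steering_loss(w):
    """Hypothetical fitness stub; in practice, evaluate the CNN's angle error."""
    return float(np.mean(w ** 2))

for _ in range(50):
    fitness = np.array([steering_loss(w) for w in population])
    parents = population[np.argsort(fitness)[: pop_size // 2]]   # selection
    children = parents.copy()
    cut = rng.integers(1, n_weights)
    children[:, cut:] = np.roll(parents, 1, axis=0)[:, cut:]     # crossover
    mask = rng.random(children.shape) < 0.05                     # mutation
    children[mask] += 0.1 * rng.normal(size=mask.sum())
    population = np.vstack([parents, children])
print("best loss:", steering_loss(population[0]))
```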

20 pages, 892 KiB  
Article
TRust Your GENerator (TRYGEN): Enhancing Out-of-Model Scope Detection
by Václav Diviš, Bastian Spatz and Marek Hrúz
AI 2024, 5(4), 2127-2146; https://doi.org/10.3390/ai5040104 - 30 Oct 2024
Viewed by 349
Abstract
Recent research has drawn attention to the ambiguity surrounding the definition and learnability of Out-of-Distribution recognition. Although the original problem remains unsolved, the term “Out-of-Model Scope” detection offers a clearer perspective. The ability to detect Out-of-Model Scope inputs is particularly beneficial in safety-critical applications such as autonomous driving or medicine. By detecting Out-of-Model Scope situations, the system’s robustness is enhanced and it is prevented from operating in unknown and unsafe scenarios. In this paper, we propose a novel approach for Out-of-Model Scope detection that integrates three sources of information: (1) the original input, (2) its latent feature representation extracted by an encoder, and (3) a synthesized version of the input generated from its latent representation. We demonstrate the effectiveness of combining original and synthetically generated inputs to defend against adversarial attacks in the computer vision domain. Our method, TRust Your GENerator (TRYGEN), achieves results comparable to those of other state-of-the-art methods and allows any encoder to be integrated into our pipeline in a plug-and-train fashion. Through our experiments, we evaluate which combinations of the encoder’s features are most effective for discovering Out-of-Model Scope samples and highlight the importance of a compact feature space for training the generator. Full article
(This article belongs to the Section AI in Autonomous Systems)

23 pages, 4795 KiB  
Article
Hybrid Artificial Intelligence Strategies for Drone Navigation
by Rubén San-Segundo, Lucía Angulo, Manuel Gil-Martín, David Carramiñana and Ana M. Bernardos
AI 2024, 5(4), 2104-2126; https://doi.org/10.3390/ai5040103 - 29 Oct 2024
Viewed by 569
Abstract
Objective: This paper describes the development of hybrid artificial intelligence strategies for drone navigation. Methods: The navigation module combines a deep learning model with a rule-based engine depending on the agent state. The deep learning model has been trained using reinforcement learning. The rule-based engine uses expert knowledge to deal with specific situations. The navigation module incorporates several strategies to explain the drone’s decisions based on its observation space, and different mechanisms for including human decisions in the navigation process. Finally, this paper proposes an evaluation methodology based on defining several scenarios and analyzing the performance of the different strategies according to metrics adapted to each scenario. Results: Two main navigation problems have been studied. For the first scenario (reaching known targets), it has been possible to obtain a 90% task completion rate, significantly reducing the number of collisions thanks to the rule-based engine. For the second scenario, it has been possible to reduce the time required to locate all the targets by 20% using the reinforcement learning model. Conclusions: Reinforcement learning is a very good strategy for learning drone navigation policies, but in critical situations, it is necessary to complement it with a rule-based module to increase the task success rate. Full article
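The state-dependent switch between the learned policy and the expert rules can be pictured with a small dispatch sketch; the thresholds, state keys, and rl_policy stub are hypothetical, not the paper's engine:

```python
# Sketch: rule-based engine handles critical states, RL policy handles the rest.
def navigate(state, rl_policy):
    if state.get("obstacle_distance", float("inf")) < 2.0:
        return "climb"                       # expert rule: imminent collision
    if state.get("battery", 1.0) < 0.15:
        return "return_to_base"              # expert rule: low battery
    return rl_policy(state)                  # learned policy otherwise

action = navigate({"obstacle_distance": 10.0, "battery": 0.8},
                  rl_policy=lambda s: "forward")
print(action)
```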

12 pages, 1581 KiB  
Article
Airfoil Shape Generation and Feature Extraction Using the Conditional VAE-WGAN-gp
by Kazuo Yonekura, Yuki Tomori and Katsuyuki Suzuki
AI 2024, 5(4), 2092-2103; https://doi.org/10.3390/ai5040102 - 28 Oct 2024
Viewed by 631
Abstract
A machine learning method was applied to solve an inverse airfoil design problem. A conditional VAE-WGAN-gp model, which couples a conditional variational autoencoder (VAE) with a Wasserstein generative adversarial network with gradient penalty (WGAN-gp), is proposed as an airfoil generation method and is then compared with the WGAN-gp and VAE models. The VAEGAN family couples the VAE and GAN models, which enables feature extraction within the GAN framework. In airfoil generation tasks that must produce shapes satisfying lift coefficient requirements, VAE is known to outperform WGAN-gp with respect to the accuracy of lift-coefficient reproduction, whereas GAN outperforms VAE with respect to the smoothness and variety of the generated shapes. In this study, VAE-WGAN-gp demonstrated good performance in all three aspects. The latent distribution was also studied to compare the feature extraction ability of the proposed method. Full article
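The gradient penalty term that distinguishes WGAN-gp from a plain GAN critic is standard and can be sketched as follows; the critic architecture and tensor shapes are illustrative, not the paper's airfoil setup:

```python
# Sketch: WGAN-gp gradient penalty at points interpolated between real and fake.
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize the critic's gradient norm deviating from 1 at mixed points."""
    eps = torch.rand(real.size(0), 1)                        # per-sample mix
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    return lam * ((grad.norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(),
                             torch.nn.Linear(128, 1))        # toy critic
real, fake = torch.randn(8, 64), torch.randn(8, 64)          # toy shape codes
print(gradient_penalty(critic, real, fake))
```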

26 pages, 809 KiB  
Review
Deep Learning in Finance: A Survey of Applications and Techniques
by Ebikella Mienye, Nobert Jere, George Obaido, Ibomoiye Domor Mienye and Kehinde Aruleba
AI 2024, 5(4), 2066-2091; https://doi.org/10.3390/ai5040101 - 28 Oct 2024
Viewed by 1204
Abstract
Machine learning (ML) has transformed the financial industry by enabling advanced applications such as credit scoring, fraud detection, and market forecasting. At the core of this transformation is deep learning (DL), a subset of ML that is robust in processing and analyzing complex and large datasets. This paper provides a comprehensive overview of key deep learning models, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Deep Belief Networks (DBNs), Transformers, Generative Adversarial Networks (GANs), and Deep Reinforcement Learning (Deep RL). Beyond summarizing their mathematical foundations and learning processes, this study offers new insights into how these models are applied in real-world financial contexts, highlighting their specific advantages and limitations in tasks such as algorithmic trading, risk management, and portfolio optimization. It also examines recent advances and emerging trends in the financial industry alongside critical challenges such as data quality, model interpretability, and computational complexity. These insights can guide future research directions toward developing more efficient, robust, and explainable financial models that address the evolving needs of the financial sector. Full article
(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)

29 pages, 7459 KiB  
Article
Leveraging Explainable Artificial Intelligence (XAI) for Expert Interpretability in Predicting Rapid Kidney Enlargement Risks in Autosomal Dominant Polycystic Kidney Disease (ADPKD)
by Latifa Dwiyanti, Hidetaka Nambo and Nur Hamid
AI 2024, 5(4), 2037-2065; https://doi.org/10.3390/ai5040100 - 28 Oct 2024
Viewed by 693
Abstract
Autosomal dominant polycystic kidney disease (ADPKD) is the predominant hereditary factor leading to end-stage renal disease (ESRD) worldwide, affecting individuals across all races with a prevalence of 1 in 400 to 1 in 1000. The disease presents significant management challenges: options for slowing cyst progression are limited, and the use of tolvaptan is restricted to high-risk patients due to potential liver injury. However, determining high-risk status typically requires magnetic resonance imaging (MRI) to calculate total kidney volume (TKV), a time-consuming process demanding specialized expertise. Motivated by these challenges, this study proposes alternative methods for high-risk categorization that do not rely on TKV data. Utilizing historical patient data, we aim to predict rapid kidney enlargement in ADPKD patients to support clinical decision-making. We applied seven machine learning algorithms—Random Forest, Logistic Regression, Support Vector Machine (SVM), Light Gradient Boosting Machine (LightGBM), Gradient Boosting Tree, XGBoost, and Deep Neural Network (DNN)—to data from the Polycystic Kidney Disease Outcomes Consortium (PKDOC) database. The XGBoost model, combined with the Synthetic Minority Oversampling Technique (SMOTE), yielded the best performance. We also leveraged explainable artificial intelligence (XAI) techniques, specifically Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), to visualize and clarify the model’s predictions. Furthermore, we generated text summaries to enhance interpretability. To evaluate the effectiveness of our approach, we proposed new metrics to assess explainability and conducted a survey with 27 doctors to compare models with and without XAI techniques. The results indicated that incorporating XAI and textual summaries significantly improved expert explainability and increased confidence in the model’s ability to support treatment decisions for ADPKD patients. Full article
(This article belongs to the Special Issue Interpretable and Explainable AI Applications)
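The winning combination reported above (SMOTE oversampling feeding XGBoost) follows a standard pattern; a minimal sketch with synthetic data standing in for the PKDOC variables:

```python
# Sketch: balance the minority (rapid-progression) class with SMOTE, then fit
# XGBoost; features and class ratio are toy stand-ins, not the PKDOC data.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, weights=[0.85], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

Xres, yres = SMOTE(random_state=0).fit_resample(Xtr, ytr)   # oversample minority
clf = XGBClassifier(eval_metric="logloss").fit(Xres, yres)
print("test accuracy:", clf.score(Xte, yte))
```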

19 pages, 7968 KiB  
Article
Intelligent Manufacturing in Wine Barrel Production: Deep Learning-Based Wood Stave Classification
by Frank A. Ricardo, Martxel Eizaguirre, Desmond K. Moru and Diego Borro
AI 2024, 5(4), 2018-2036; https://doi.org/10.3390/ai5040099 - 28 Oct 2024
Viewed by 637
Abstract
Innovative wood inspection technology is crucial in various industries, especially for determining wood quality by counting rings in each stave, a key factor in wine barrel production. (1) Background: Traditionally, human inspectors visually evaluate staves, compensating for natural variations and characteristics like dirt and saw-induced aberrations. These variations pose significant challenges for automatic inspection systems. Several techniques using classical image processing and deep learning have been developed to detect tree-ring boundaries, but they often struggle with woods exhibiting heterogeneity and texture irregularities. (2) Methods: This study proposes a hybrid approach combining classical computer vision techniques for preprocessing with deep learning algorithms for classification, designed for continuous automated processing. To enhance performance and accuracy, we employ a data augmentation strategy using cropping techniques to address intra-class variability in individual staves. (3) Results: Our approach significantly improves accuracy and reliability in classifying wood with irregular textures and heterogeneity. The use of explainable AI and model calibration offers a deeper understanding of the model’s decision-making process, ensuring robustness and transparency, and setting confidence thresholds for outputs. (4) Conclusions: The proposed system enhances the performance of automatic wood inspection technologies, providing a robust solution for industries requiring precise wood quality assessment, particularly in wine barrel production. Full article

41 pages, 4270 KiB  
Article
Integrating Digital Twins and Artificial Intelligence Multi-Modal Transformers into Water Resource Management: Overview and Advanced Predictive Framework
by Toqeer Ali Syed, Muhammad Yasar Khan, Salman Jan, Sami Albouq, Saad Said Alqahtany and Muhammad Tayyab Naqash
AI 2024, 5(4), 1977-2017; https://doi.org/10.3390/ai5040098 - 25 Oct 2024
Viewed by 1019
Abstract
Various Artificial Intelligence (AI) techniques in water resource management highlight the current methodologies’ strengths and limitations in forecasting, optimization, and control. We identify a gap in integrating these diverse approaches for enhanced water prediction and management. We critically analyze the existing literature on artificial neural networks (ANNs), deep learning (DL), long short-term memory (LSTM) networks, machine learning (ML) models such as supervised learning (SL) and unsupervised learning (UL), and random forest (RF). In response, we propose a novel framework that synergizes these techniques into a unified, multi-layered model and incorporates a digital twin and a multi-modal transformer approach. This integration aims to leverage the collective advantages of each method while overcoming individual constraints, significantly enhancing prediction accuracy and operational efficiency. This paper sets the foundation for an innovative digital twin-integrated solution, focusing on reviewing past works as a precursor to a detailed exposition of our proposed model in a subsequent publication. This advanced approach promises to redefine accuracy in water demand forecasting and contribute significantly to global sustainability and efficiency in water use. Full article

22 pages, 1720 KiB  
Article
Machine Learning Models Informed by Connected Mixture Components for Short- and Medium-Term Time Series Forecasting
by Andrey K. Gorshenin and Anton L. Vilyaev
AI 2024, 5(4), 1955-1976; https://doi.org/10.3390/ai5040097 - 22 Oct 2024
Viewed by 783
Abstract
This paper presents a new approach in the field of probability-informed machine learning (ML): improving the results of ML algorithms and neural networks (NNs) by using probability models as a source of additional features in situations where it is impossible to enlarge the training datasets for various reasons. We introduce connected mixture components as a source of additional information that can be extracted from a mathematical model. These components are formed using probability mixture models and a special algorithm for merging parameters in sliding-window mode. This approach has proven effective when applied to real-world time series data for short- and medium-term forecasting. In all cases, the models informed by the connected mixture components showed better results than those that did not use them, although different informed models may be effective for different datasets. The fundamental novelty of the research lies both in a new mathematical approach to informing ML models and in the demonstrated increase in forecasting accuracy in various applications. For geophysical spatiotemporal data, the decrease in Root Mean Square Error (RMSE) was up to 27.7%, and the reduction in Mean Absolute Percentage Error (MAPE) was up to 45.7% compared with ML models without probability informing. The best metric values were obtained by an informed ensemble architecture that fuses the results of a Long Short-Term Memory (LSTM) network and a transformer. The Mean Squared Error (MSE) for the electricity transformer oil temperature from the ETDataset improved by up to 10.0% compared with vanilla methods, with the best MSE value obtained by an informed random forest. The introduced probability-informed approach allows us to outperform both transformer NN architectures and classical statistical and machine learning methods. Full article
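A hedged sketch of the general idea of probability informing: mixture-model parameters estimated in a sliding window become additional features for a downstream forecaster. The window size and component count are illustrative, and the paper's parameter-merging algorithm is simplified away:

```python
# Sketch: sliding-window Gaussian mixture parameters as informed features.
import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_features(series, window=64, k=3):
    feats = []
    for t in range(window, len(series)):
        gm = GaussianMixture(n_components=k, random_state=0)
        gm.fit(series[t - window:t].reshape(-1, 1))
        feats.append(np.concatenate([gm.weights_,          # component weights
                                     gm.means_.ravel(),    # component means
                                     gm.covariances_.ravel()]))
    return np.array(feats)   # (len(series) - window, 3k) extra features

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.normal(size=500)
print(mixture_features(series).shape)
```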

13 pages, 851 KiB  
Article
Feasibility of GPT-3.5 versus Machine Learning for Automated Surgical Decision-Making Determination: A Multicenter Study on Suspected Appendicitis
by Sebastian Sanduleanu, Koray Ersahin, Johannes Bremm, Narmin Talibova, Tim Damer, Merve Erdogan, Jonathan Kottlors, Lukas Goertz, Christiane Bruns, David Maintz and Nuran Abdullayev
AI 2024, 5(4), 1942-1954; https://doi.org/10.3390/ai5040096 - 16 Oct 2024
Viewed by 677
Abstract
Background: Nonsurgical treatment of uncomplicated appendicitis is a reasonable option in many cases despite the sparsity of robust, easy-access, externally validated, and multimodally informed clinical decision support systems (CDSSs). Developed by OpenAI, the Generative Pre-trained Transformer 3.5 model (GPT-3.5) may provide enhanced decision support for surgeons in less certain appendicitis cases or those posing a higher risk for (relative) operative contraindications. Our objective was to determine whether GPT-3.5, when provided high-throughput clinical, laboratory, and radiological text-based information, comes to clinical decisions similar to those of a machine learning model and a board-certified surgeon (reference standard) when deciding between appendectomy and conservative treatment. Methods: In this cohort study, we collected a random sample of patients presenting at the emergency department (ED) of two German hospitals (GFO, Troisdorf, and University Hospital Cologne) with right abdominal pain between October 2022 and October 2023. Statistical analysis was performed using R, version 3.6.2, on RStudio, version 2023.03.0 + 386. Overall agreement between the GPT-3.5 output and the reference standard was assessed by means of inter-observer kappa values as well as accuracy, sensitivity, specificity, and positive and negative predictive values with the “Caret” and “irr” packages. Statistical significance was defined as p < 0.05. Results: There was agreement between the surgeon’s decision and GPT-3.5 in 102 of 113 cases, and all cases where the surgeon decided upon conservative treatment were correctly classified by GPT-3.5. The estimated model training accuracy was 83.3% (95% CI: 74.0, 90.4), while the validation accuracy for the model was 87.0% (95% CI: 66.4, 97.2). In comparison, the GPT-3.5 accuracy was 90.3% (95% CI: 83.2, 95.0), which was not significantly better than the machine learning model (p = 0.21). Conclusions: To our knowledge, this is the first study of the “intended use” of GPT-3.5 for surgical treatment decisions. Comparing surgical decision-making against the algorithm, we found a high degree of agreement between board-certified surgeons and GPT-3.5 for patients presenting to the emergency department with lower abdominal pain. Full article
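The agreement statistics named in the Methods were computed in R with the "Caret" and "irr" packages; an equivalent sketch with scikit-learn and placeholder decision labels:

```python
# Sketch: inter-observer kappa plus sensitivity/specificity (toy labels,
# not the study's data; the authors used R rather than Python).
from sklearn.metrics import cohen_kappa_score, confusion_matrix

surgeon = [1, 1, 0, 1, 0, 0, 1, 1]   # 1 = appendectomy, 0 = conservative
gpt35   = [1, 1, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(surgeon, gpt35)
tn, fp, fn, tp = confusion_matrix(surgeon, gpt35).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"kappa={kappa:.2f} sens={sensitivity:.2f} spec={specificity:.2f}")
```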

24 pages, 893 KiB  
Systematic Review
Digital Technologies Impact on Healthcare Delivery: A Systematic Review of Artificial Intelligence (AI) and Machine-Learning (ML) Adoption, Challenges, and Opportunities
by Ifeanyi Anthony Okwor, Geeta Hitch, Saira Hakkim, Shabana Akbar, Dave Sookhoo and John Kainesie
AI 2024, 5(4), 1918-1941; https://doi.org/10.3390/ai5040095 - 12 Oct 2024
Viewed by 2401
Abstract
Recent significant advances in the healthcare industry due to artificial intelligence (AI) and machine learning (ML) have been shown to revolutionize healthcare delivery by improving efficiency, accuracy, and patient outcomes. However, these technologies can face significant challenges and ethical considerations. This systematic review aimed to gather and synthesize the current knowledge on the impact of AI and ML adoption in healthcare delivery, with its associated challenges and opportunities. This study adhered to the PRISMA guidelines. Articles from 2014 to 2024 were selected from various databases using specific keywords. Eligible studies were included after rigorous screening and quality assessment using checklist tools. Themes were identified through data analysis and thematic analysis. From 4981 articles screened, a data synthesis of nine eligible studies revealed themes, including productivity enhancement, improved patient care through decision support and precision medicine, legal and policy challenges, technological considerations, organizational and managerial aspects, ethical concerns, data challenges, and socioeconomic implications. There exist significant opportunities, as well as substantial challenges and ethical concerns, associated with integrating AI and ML into healthcare delivery. Implementation strategies must be carefully designed, considering technical, ethical, and social factors. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

25 pages, 396 KiB  
Article
Causal Economic Machine Learning (CEML): “Human AI”
by Andrew Horton
AI 2024, 5(4), 1893-1917; https://doi.org/10.3390/ai5040094 - 11 Oct 2024
Viewed by 933
Abstract
This paper proposes causal economic machine learning (CEML) as a research agenda that utilizes causal machine learning (CML), built on causal economics (CE) decision theory. Causal economics is better suited for use in machine learning optimization than expected utility theory (EUT) and behavioral economics (BE) based on its central feature of causal coupling (CC), which models decisions as requiring upfront costs, some certain and some uncertain, in anticipation of future uncertain benefits that are linked by causation. This multi-period causal process, incorporating certainty and uncertainty, replaces the single-period lottery outcomes augmented with intertemporal discounting used in EUT and BE, providing a more realistic framework for AI machine learning modeling and real-world application. It is mathematically demonstrated that EUT and BE are constrained versions of CE. With the growing interest in natural experiments in statistics and causal machine learning (CML) across many fields, such as healthcare, economics, and business, there is a large potential opportunity to run AI models on CE foundations and compare results to models based on traditional decision-making models that focus only on rationality, bounded to various degrees. To be most effective, machine learning must mirror human reasoning as closely as possible, an alignment established through CEML, which represents an evolution to truly “human AI”. This paper maps out how the non-linear optimization required for the CEML structural response functions can be accomplished through Sequential Least Squares Programming (SLSQP) and applied to data sets through the S-Learner CML meta-algorithm. Upon this foundation, the next phase of research is to apply CEML to appropriate data sets in various areas of practice where causality and accurate modeling of human behavior are vital, such as precision healthcare, economic policy, and marketing. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
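The paper proposes Sequential Least Squares Programming (SLSQP) for the non-linear optimization of the CEML structural response functions; a minimal SciPy sketch with an illustrative objective and constraint standing in for those functions:

```python
# Sketch: constrained non-linear optimization with SLSQP (toy problem).
import numpy as np
from scipy.optimize import minimize

objective = lambda x: (x[0] - 1) ** 2 + (x[1] - 2.5) ** 2   # stand-in response
constraints = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}]  # x0+x1 >= 1

res = minimize(objective, x0=np.array([2.0, 0.0]),
               method="SLSQP", constraints=constraints,
               bounds=[(0, None), (0, None)])
print(res.x, res.fun)
```
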
23 pages, 13442 KiB  
Article
From Play to Understanding: Large Language Models in Logic and Spatial Reasoning Coloring Activities for Children
by Sebastián Tapia-Mandiola and Roberto Araya
AI 2024, 5(4), 1870-1892; https://doi.org/10.3390/ai5040093 - 11 Oct 2024
Viewed by 800
Abstract
Visual thinking leverages spatial mechanisms in animals for navigation and reasoning. Therefore, given the challenge of abstract mathematics and logic, spatial reasoning-based teaching strategies can be highly effective. Our previous research verified that innovative box-and-ball coloring activities help teach elementary school students complex notions like quantifiers, logical connectors, and dynamic systems. However, given the richness of the activities, correction is slow, error-prone, and demands high attention and cognitive load from the teacher. Moreover, feedback to the teacher should be immediate. Thus, we propose to provide the teacher with real-time help from LLMs. We explored various prompting techniques with and without context—Zero-Shot, Few-Shot, Chain of Thought, Visualization of Thought, Self-Consistency, Logic-LM, and emotional prompting—to test GPT-4o’s visual, logical, and correction capabilities. We found that the Visualization of Thought and Self-Consistency techniques enabled GPT-4o to correctly evaluate 90% of the logical–spatial problems that we tested. Additionally, we propose a novel prompt combining some of these techniques that achieved 100% accuracy on a testing sample, excelling in spatial problems and enhancing logical reasoning. Full article
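Of the techniques listed, Self-Consistency is straightforward to sketch: sample several answers at non-zero temperature and keep the modal one. The ask_llm callable below is a hypothetical wrapper around whatever chat API is available, not a specific SDK:

```python
# Sketch: self-consistency as majority voting over sampled completions.
from collections import Counter

def self_consistent_answer(ask_llm, prompt, n=7):
    """Sample n answers at non-zero temperature and keep the most common one."""
    answers = [ask_llm(prompt, temperature=0.8) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# usage (hypothetical): self_consistent_answer(ask_llm,
#     "Is the ball in box B colored red? Answer yes or no.")
```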

12 pages, 999 KiB  
Perspective
Collaborative Robots with Cognitive Capabilities for Industry 4.0 and Beyond
by Giulio Sandini, Alessandra Sciutti and Pietro Morasso
AI 2024, 5(4), 1858-1869; https://doi.org/10.3390/ai5040092 - 9 Oct 2024
Viewed by 750
Abstract
The robots that entered the manufacturing sector in the second and third Industrial Revolutions (IR2 and IR3) were designed for carrying out predefined routines without physical interaction with humans. In contrast, IR4* robots (i.e., robots since IR4 and beyond) are supposed to interact with humans in a cooperative way for enhancing flexibility, autonomy, and adaptability, thus dramatically improving productivity. However, human–robot cooperation implies cognitive capabilities that the cooperative robots (CoBots) in the market do not have. The common wisdom is that such a cognitive lack can be filled in a straightforward way by integrating well-established ICT technologies with new AI technologies. This short paper expresses the view that this approach is not promising and suggests a different one based on artificial cognition rather than artificial intelligence, founded on concepts of embodied cognition, developmental robotics, and social robotics. We suggest giving these IR4* robots designed according to such principles the name CoCoBots. The paper also addresses the ethical problems that can be raised in cases of critical emergencies. In normal operating conditions, CoCoBots and human partners, starting from individual evaluations, will routinely develop joint decisions on the course of action to be taken through mutual understanding and explanation. In case a joint decision cannot be reached and/or in the limited case that an emergency is detected and declared by top security levels, we suggest that the ultimate decision-making power, with the associated responsibility, should rest on the human side, at the different levels of the organized structure. Full article
(This article belongs to the Special Issue Intelligent Systems for Industry 4.0)

21 pages, 1242 KiB  
Article
A Bag-of-Words Approach for Information Extraction from Electricity Invoices
by Javier Sánchez and Giovanny A. Cuervo-Londoño
AI 2024, 5(4), 1837-1857; https://doi.org/10.3390/ai5040091 - 8 Oct 2024
Viewed by 550
Abstract
In the context of digitization and automation, extracting relevant information from business documents remains a significant challenge. It is typical to rely on machine-learning techniques to automate the process, reduce manual labor, and minimize errors. This work introduces a new model for extracting key values from electricity invoices, including customer data, bill breakdown, electricity consumption, or marketer data. We evaluate several machine learning techniques, such as Naive Bayes, Logistic Regression, Random Forests, or Support Vector Machines. Our approach relies on a bag-of-words strategy and custom-designed features tailored for electricity data. We validate our method on the IDSEM dataset, which includes 75,000 electricity invoices with eighty-six fields. The model converts PDF invoices into text and processes each word separately using a context of eleven words. The results of our experiments indicate that Support Vector Machines and Random Forests perform exceptionally well in capturing numerous values with high precision. The study also explores the advantages of our custom features and evaluates the performance of unseen documents. The precision obtained with Support Vector Machines is 91.86% on average, peaking at 98.47% for one document template. These results demonstrate the effectiveness of our method in accurately extracting key values from invoices. Full article
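A toy sketch of the per-word classification scheme described above, where each word is labeled from a context window of eleven words using a bag-of-words pipeline; the sample text and field labels are stand-ins for the IDSEM schema:

```python
# Sketch: classify each invoice word from its eleven-word context window.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def windows(words, size=11):
    half = size // 2
    padded = ["<pad>"] * half + words + ["<pad>"] * half
    return [" ".join(padded[i:i + size]) for i in range(len(words))]

words = "Total amount due 48.37 EUR for contract 0123".split()
labels = ["O", "O", "O", "TOTAL", "CURRENCY", "O", "O", "CONTRACT_ID"]  # toy

clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(windows(words), labels)
print(clf.predict(windows("Amount due 12.50 EUR".split())))
```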
Show Figures

Figure 1

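As a rough sketch of the per-word pipeline this abstract describes — each word classified from a bag-of-words over an eleven-word context — the following assumes hypothetical field labels and toy data; it is not the authors' implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def context_windows(words, size=11):
    """Yield an eleven-word context string centered on each word."""
    half = size // 2
    for i in range(len(words)):
        lo, hi = max(0, i - half), min(len(words), i + half + 1)
        yield " ".join(words[lo:hi])


# Toy training data: one context window per word, with its field label.
words = "Total amount due 84.31 EUR customer John Smith".split()
X = list(context_windows(words))
y = ["other", "other", "other", "bill_total", "other",
     "other", "customer_name", "customer_name"]

model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(X, y)
print(model.predict(list(context_windows("amount due 12.50 EUR".split()))))
```

Classifying each word independently keeps the model small; the eleven-word window supplies the local context that distinguishes, say, a bill total from another numeric field.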
21 pages, 5748 KiB  
Article
Automated Audible Truck-Mounted Attenuator Alerts: Vision System Development and Evaluation
by Neema Jakisa Owor, Yaw Adu-Gyamfi, Linlin Zhang and Carlos Sun
AI 2024, 5(4), 1816-1836; https://doi.org/10.3390/ai5040090 - 8 Oct 2024
Viewed by 766
Abstract
Background: The rise in work zone crashes due to distracted and aggressive driving calls for improved safety measures. While Truck-Mounted Attenuators (TMAs) have helped reduce crash severity, the increasing number of crashes involving TMAs shows the need for improved warning systems. Methods: This [...] Read more.
Background: The rise in work zone crashes due to distracted and aggressive driving calls for improved safety measures. While Truck-Mounted Attenuators (TMAs) have helped reduce crash severity, the increasing number of crashes involving TMAs shows the need for improved warning systems. Methods: This study proposes an AI-enabled vision system that automatically alerts drivers on collision courses with TMAs, addressing the limitations of manual alert systems. The system uses multi-task learning (MTL) to detect and classify vehicles, estimate distance zones (danger, warning, and safe), and perform lane and road segmentation. MTL improves efficiency and accuracy, making it well suited to devices with limited resources. A Generalized Efficient Layer Aggregation Network (GELAN) backbone enhances stability and performance. Additionally, an alert module triggers alarms based on speed, acceleration, and time to collision (an illustrative sketch follows this entry). Results: The model achieves a recall of 90.5%, an mAP of 0.792 for vehicle detection, an mIOU of 0.948 for road segmentation, an accuracy of 81.5% for lane segmentation, and an accuracy of 83.8% for distance classification. Conclusions: The results show that the system accurately detects vehicles, classifies distances, and provides real-time alerts, reducing TMA collision risks and enhancing work zone safety. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

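As a hedged sketch of the alert module this abstract mentions — alarms triggered from speed, acceleration, and time to collision — the following assumes a constant-acceleration motion model and an illustrative three-second threshold, which may differ from the paper's exact trigger rules:

```python
import math


def time_to_collision(distance_m, speed_mps, accel_mps2):
    """TTC under constant acceleration: solve d = v*t + 0.5*a*t^2 for t."""
    if abs(accel_mps2) < 1e-6:
        return distance_m / speed_mps if speed_mps > 0 else math.inf
    disc = speed_mps**2 + 2 * accel_mps2 * distance_m
    if disc < 0:
        return math.inf  # vehicle decelerates to a stop before reaching the TMA
    t = (-speed_mps + math.sqrt(disc)) / accel_mps2
    return t if t > 0 else math.inf


def should_alert(distance_m, speed_mps, accel_mps2, ttc_threshold_s=3.0):
    """Raise an alarm when estimated TTC drops below the threshold."""
    return time_to_collision(distance_m, speed_mps, accel_mps2) < ttc_threshold_s


print(should_alert(distance_m=40.0, speed_mps=20.0, accel_mps2=0.0))  # True: TTC = 2 s
```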
23 pages, 12844 KiB  
Article
Aircraft Skin Damage Visual Testing System Using Lightweight Devices with YOLO: An Automated Real-Time Material Evaluation System
by Kuo-Chien Liao, Jirayu Lau and Muhamad Hidayat
AI 2024, 5(4), 1793-1815; https://doi.org/10.3390/ai5040089 - 29 Sep 2024
Viewed by 924
Abstract
Inspection and material evaluation are some of the critical factors to ensure the structural integrity and safety of an aircraft in the aviation industry. These inspections are carried out by trained personnel, and while effective, they are prone to human error, where even [...] Read more.
Inspection and material evaluation are some of the critical factors to ensure the structural integrity and safety of an aircraft in the aviation industry. These inspections are carried out by trained personnel, and while effective, they are prone to human error, where even a minute error could result in a large-scale negative impact. Automated detection devices designed to improve the reliability of inspections could help the industry reduce the potential effects caused by human error. This study aims to develop a system that can automatically detect and identify defects on aircraft skin using relatively lightweight devices, including mobile phones and unmanned aerial vehicles (UAVs). The study incorporates an internet of things (IoT) network, allowing the results to be reviewed in real time, regardless of distance. The experimental results confirmed effective recognition of defects, with a mean average precision (mAP@0.5) of 0.853 for YOLOv9c across all classes. However, despite the effective detection, the test device (a mobile phone) was prone to overheating, significantly reducing its performance. While there is still room for further enhancement, this study demonstrates the potential of automated image detection technology to assist the inspection process in the aviation industry (an illustrative inference sketch follows this entry). Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

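As an illustration of the detection step, the sketch below runs a YOLOv9c model with the ultralytics package; the weights file and image path are placeholders, and the authors' model was fine-tuned on their own defect classes rather than the stock COCO weights shown here:

```python
from ultralytics import YOLO

# In the study, the weights would come from fine-tuning on aircraft-skin
# defect images; "yolov9c.pt" stands in as a placeholder checkpoint.
model = YOLO("yolov9c.pt")

results = model.predict("skin_panel.jpg", conf=0.5)  # confidence threshold 0.5
for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        print(f"{cls_name}: {float(box.conf):.2f} at {box.xyxy.tolist()}")
```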
14 pages, 2453 KiB  
Article
Advancing Persistent Character Generation: Comparative Analysis of Fine-Tuning Techniques for Diffusion Models
by Luca Martini, Saverio Iacono, Daniele Zolezzi and Gianni Viardo Vercelli
AI 2024, 5(4), 1779-1792; https://doi.org/10.3390/ai5040088 - 29 Sep 2024
Viewed by 908
Abstract
In the evolving field of artificial intelligence, fine-tuning diffusion models is crucial for generating contextually coherent digital characters across various media. This paper examines four advanced fine-tuning techniques: Low-Rank Adaptation (LoRA), DreamBooth, Hypernetworks, and Textual Inversion. Each technique enhances the specificity and consistency [...] Read more.
In the evolving field of artificial intelligence, fine-tuning diffusion models is crucial for generating contextually coherent digital characters across various media. This paper examines four advanced fine-tuning techniques: Low-Rank Adaptation (LoRA), DreamBooth, Hypernetworks, and Textual Inversion. Each technique enhances the specificity and consistency of character generation, expanding the applications of diffusion models in digital content creation. LoRA efficiently adapts models to new tasks with minimal adjustments, making it ideal for environments with limited computational resources. It excels in low-VRAM contexts due to its targeted fine-tuning of low-rank matrices within cross-attention layers, enabling faster training and efficient parameter tweaking (a sketch of this mechanism follows this entry). DreamBooth generates highly detailed, subject-specific images but is computationally intensive and suited to robust hardware environments. Hypernetworks introduce auxiliary networks that dynamically adjust the model’s behavior, allowing for flexibility during inference and on-the-fly model switching. This adaptability, however, can result in slightly lower image quality. Textual Inversion embeds new concepts directly into the model’s embedding space, allowing for rapid adaptation to novel styles or concepts, but is less effective for precise character generation. This analysis shows that LoRA is the most efficient for producing high-quality outputs with minimal computational overhead. In contrast, DreamBooth excels at high-fidelity images at the cost of longer training times. Hypernetworks provide adaptability with some tradeoffs in quality, while Textual Inversion serves as a lightweight option for style integration. These techniques collectively enhance the creative capabilities of diffusion models, delivering high-quality, contextually relevant outputs. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
Show Figures

Figure 1

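As a hedged sketch of the LoRA mechanism this abstract credits for low-VRAM efficiency — low-rank adapters on the cross-attention projections — the following uses the diffusers and peft libraries; the base model identifier, rank, and alpha are illustrative choices, not the paper's settings:

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Illustrative base model; any Stable Diffusion checkpoint with a UNet works.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

lora_config = LoraConfig(
    r=8,           # low-rank dimension: only rank-8 update matrices are trained
    lora_alpha=8,  # scaling factor applied to the LoRA update
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # a small fraction of the full UNet
```

Restricting training to these low-rank matrices is what keeps VRAM use and training time down relative to full fine-tuning.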
20 pages, 14487 KiB  
Article
Fault Classification of 3D-Printing Operations Using Different Types of Machine and Deep Learning Techniques
by Satish Kumar, Sameer Sayyad and Arunkumar Bongale
AI 2024, 5(4), 1759-1778; https://doi.org/10.3390/ai5040087 - 27 Sep 2024
Viewed by 943
Abstract
Fused deposition modeling (FDM), a method of additive manufacturing (AM), comprises the extrusion of materials via a nozzle and the subsequent combining of the layers to create 3D-printed objects. FDM is a widely used method for 3D-printing objects since it is affordable, effective, [...] Read more.
Fused deposition modeling (FDM), a method of additive manufacturing (AM), comprises the extrusion of materials via a nozzle and the subsequent combining of the layers to create 3D-printed objects. FDM is a widely used method for 3D-printing objects since it is affordable, effective, and easy to use. Defects such as poor infill, elephant foot, layer shift, and poor surface finish arise in FDM components at the printing stage due to variations in printing parameters such as printing speed, nozzle changes, or bed temperature. Proper fault classification is required to identify the cause of faulty products. In this work, multi-sensory data are gathered using vibration, current, temperature, and sound sensors. Data acquisition is performed using a National Instruments (NI) Data Acquisition System (DAQ), which provides synchronized multi-sensory data for model training. To induce faults, the data are captured under different conditions, such as variations in printing speed, temperature, and jerk during printing. The collected data are used to train machine learning (ML) and deep learning (DL) classification models to classify the variations in printing parameters. ML models such as k-nearest neighbor (KNN), decision tree (DT), extra trees (ET), and random forest (RF), along with a convolutional neural network (CNN) as the DL model, are used to classify the variable printing parameters. Among the ML models, the RF classifier shows a classification accuracy of around 91%, whereas the CNN model shows good classification performance, with accuracy ranging from 92% to 94% under variable operating conditions (a sketch of the RF pipeline follows this entry). Full article
(This article belongs to the Special Issue Intelligent Systems for Industry 4.0)
Show Figures

Figure 1

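As a rough sketch of the fault-classification pipeline this abstract describes, the following trains a random forest on simple statistical features from synthetic multi-sensor windows; the feature set, window shape, and labels are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def window_features(window):
    """Per-channel mean, std, and peak of a multi-sensor window."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           np.abs(window).max(axis=0)])


rng = np.random.default_rng(0)
# Toy data: 300 windows x 256 samples x 4 channels (vibration, current,
# temperature, sound), with 3 hypothetical operating conditions as labels.
windows = rng.normal(size=(300, 256, 4))
labels = rng.integers(0, 3, size=300)  # e.g., normal / speed fault / jerk fault

X = np.array([window_features(w) for w in windows])
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```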
16 pages, 696 KiB  
Article
Software Defect Prediction Based on Machine Learning and Deep Learning Techniques: An Empirical Approach
by Waleed Albattah and Musaad Alzahrani
AI 2024, 5(4), 1743-1758; https://doi.org/10.3390/ai5040086 - 24 Sep 2024
Viewed by 3196
Abstract
Software bug prediction is a software maintenance technique used to predict the occurrences of bugs in the early stages of the software development process. Early prediction of bugs can reduce the overall cost of software and increase its reliability. Machine learning approaches have [...] Read more.
Software bug prediction is a software maintenance technique used to predict the occurrences of bugs in the early stages of the software development process. Early prediction of bugs can reduce the overall cost of software and increase its reliability. Machine learning approaches have recently offered several prediction methods to improve software quality. This paper empirically investigates eight well-known machine learning and deep learning algorithms for software bug prediction. We compare the created models using different evaluation metrics and a well-accepted dataset to make the study results more reliable. The study uses a large dataset, collected from five publicly available bug datasets, that includes about 60 software metrics. Source-code metrics of internal class quality, covering cohesion, coupling, complexity, documentation, inheritance, and size, were used as features to predict buggy and non-buggy classes. Four performance metrics, namely accuracy, macro F1 score, weighted F1 score, and binary F1 score, are used to quantitatively evaluate and compare the constructed bug prediction models. The results demonstrate that the deep learning model (LSTM) outperforms all other models across these metrics, achieving an accuracy of 0.87 (a sketch of such a model follows this entry). Full article
Show Figures

Figure 1

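As a hedged sketch of a bug-prediction model along the lines of the study's best performer, the following feeds each ~60-metric vector to an LSTM as a sequence of scalar steps; this framing, the shapes, and the hyperparameters are assumptions for illustration, not the authors' configuration:

```python
import numpy as np
import tensorflow as tf

n_metrics = 60
rng = np.random.default_rng(0)
# Toy data: each sample is one class's metric vector, reshaped to (steps, 1).
X = rng.normal(size=(1000, n_metrics, 1)).astype("float32")
y = rng.integers(0, 2, size=1000)  # buggy vs. non-buggy

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_metrics, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary bug prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```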