Search Results (76)

Search Parameters:
Keywords = COSTLy language

21 pages, 2793 KiB  
Article
Link Predictions with Bi-Level Routing Attention
by Yu Wang, Shu Xu, Zenghui Ding, Cong Liu and Xianjun Yang
AI 2025, 6(7), 156; https://doi.org/10.3390/ai6070156 - 14 Jul 2025
Viewed by 338
Abstract
Background/Objectives: Knowledge Graphs (KGs) are often incomplete, which can significantly impact the performance of downstream applications. Manual completion of KGs is time-consuming and costly, emphasizing the importance of developing automated methods for knowledge graph completion (KGC). Link prediction serves as a fundamental task in this domain. The semantic correlation among entity features plays a crucial role in determining the effectiveness of link-prediction models. Notably, the human brain can often infer information from a limited set of salient features. Methods: Inspired by this cognitive principle, this paper proposes a lightweight bi-level routing attention mechanism specifically designed for link-prediction tasks. The proposed module adopts a theoretically grounded, lightweight structural design aimed at enhancing the semantic recognition capability of language models without altering their core parameters, improving the model’s ability to attend to feature regions with high semantic relevance. With only a marginal increase of approximately one million parameters, the mechanism effectively captures the most semantically informative features. Results: The module replaces the original feature-extraction module within the KGML framework and is evaluated on the publicly available WN18RR and FB15K-237 datasets. Conclusions: Experimental results demonstrate consistent improvements in standard evaluation metrics, including Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@10, confirming the effectiveness of the proposed approach. Full article
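The abstract gives no implementation details, so the following is only a rough sketch of the routing idea in PyTorch: a coarse region-to-region affinity picks the top-k most relevant key regions, and token-level attention is then computed only over tokens gathered from those regions. The tensor layout, pooling choice, and top-k value are assumptions for illustration, not the configuration used in the paper or its KGML integration.

```python
import torch
import torch.nn as nn

class BiLevelRoutingAttention(nn.Module):
    """Simplified sketch: coarse region routing, then fine token attention."""

    def __init__(self, dim, topk=2):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.topk = topk
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, regions, tokens_per_region, dim)
        B, R, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Level 1: region descriptors (mean pooling) and region-to-region
        # affinity; keep only the top-k key regions for each query region.
        affinity = q.mean(dim=2) @ k.mean(dim=2).transpose(-1, -2)    # (B, R, R)
        route = affinity.topk(self.topk, dim=-1).indices              # (B, R, k)

        # Gather key/value tokens from the routed regions only.
        idx = route[..., None, None].expand(-1, -1, -1, T, D)         # (B, R, k, T, D)
        k_sel = torch.gather(k.unsqueeze(1).expand(-1, R, -1, -1, -1), 2, idx).flatten(2, 3)
        v_sel = torch.gather(v.unsqueeze(1).expand(-1, R, -1, -1, -1), 2, idx).flatten(2, 3)

        # Level 2: scaled dot-product attention restricted to the routed tokens.
        attn = torch.softmax(q @ k_sel.transpose(-1, -2) * self.scale, dim=-1)
        return self.proj(attn @ v_sel)                                # (B, R, T, D)

# Toy usage: 8 regions of 16 tokens each, 64-dimensional features.
x = torch.randn(2, 8, 16, 64)
print(BiLevelRoutingAttention(dim=64)(x).shape)   # torch.Size([2, 8, 16, 64])
```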

23 pages, 2579 KiB  
Article
Multimodal Particulate Matter Prediction: Enabling Scalable and High-Precision Air Quality Monitoring Using Mobile Devices and Deep Learning Models
by Hirokazu Madokoro and Stephanie Nix
Sensors 2025, 25(13), 4053; https://doi.org/10.3390/s25134053 - 29 Jun 2025
Viewed by 386
Abstract
This paper presents a novel approach for predicting Particulate Matter (PM) concentrations using mobile camera devices. In response to persistent air pollution challenges across Japan, we developed a system that utilizes cutting-edge transformer-based deep learning architectures to estimate PM values from imagery captured by smartphone cameras. Our approach employs Contrastive Language–Image Pre-Training (CLIP) as a multimodal framework to extract visual features associated with PM concentration from environmental scenes. We first developed a baseline through comparative analysis of time-series models for 1D PM signal prediction, finding that linear models, particularly NLinear, outperformed complex transformer architectures for short-term forecasting tasks. Building on these insights, we implemented a CLIP-based system for 2D image analysis that achieved a Top-1 accuracy of 0.24 and a Top-5 accuracy of 0.52 when tested on diverse smartphone-captured images. The performance evaluations on Graphics Processing Unit (GPU) and Single-Board Computer (SBC) platforms highlight a viable path toward edge deployment. Processing times of 0.29 s per image on the GPU versus 2.68 s on the SBC demonstrate the potential for scalable, real-time environmental monitoring. We consider that this research connects high-performance computing with energy-efficient hardware solutions, creating a practical framework for distributed environmental monitoring that reduces reliance on costly centralized monitoring systems. Our findings indicate that transformer-based multimodal models present a promising approach for mobile sensing applications, with opportunities for further improvement through seasonal data expansion and architectural refinements. Full article
(This article belongs to the Special Issue Machine Learning and Image-Based Smart Sensing and Applications)
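The NLinear baseline the authors credit for short-term 1D PM forecasting is simple enough to reproduce directly. The sketch below follows the published NLinear idea (subtract the last observed value, map the lookback window to the horizon with a single linear layer, add the value back); the window length, horizon, and channel count are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class NLinear(nn.Module):
    """NLinear-style baseline: normalize by the last observation, then a
    single linear projection from the lookback window to the horizon."""

    def __init__(self, seq_len, pred_len):
        super().__init__()
        self.linear = nn.Linear(seq_len, pred_len)

    def forward(self, x):                      # x: (batch, seq_len, channels)
        last = x[:, -1:, :].detach()           # last observed PM value per channel
        x = x - last                           # remove the level (distribution shift)
        y = self.linear(x.transpose(1, 2)).transpose(1, 2)
        return y + last                        # add the level back

# Illustrative use: 96-step lookback of two PM channels, 24-step forecast.
model = NLinear(seq_len=96, pred_len=24)
history = torch.randn(8, 96, 2)
print(model(history).shape)                    # torch.Size([8, 24, 2])
```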

17 pages, 1272 KiB  
Article
Multi Stage Retrieval for Web Search During Crisis
by Claudiu Constantin Tcaciuc, Daniele Rege Cambrin and Paolo Garza
Future Internet 2025, 17(6), 239; https://doi.org/10.3390/fi17060239 - 29 May 2025
Viewed by 488
Abstract
During crisis events, digital information volume can increase by over 500% within hours, with social media platforms alone generating millions of crisis-related posts. This volume creates critical challenges for emergency responders who require timely access to the concise subset of accurate information they are interested in. Existing approaches rely strongly on large language models. However, the use of large language models limits the scalability of the retrieval procedure and may introduce hallucinations. This paper introduces a novel multi-stage text retrieval framework to enhance information retrieval during crises. Our framework employs a novel three-stage extractive pipeline in which (1) a topic modeling component filters candidates based on thematic relevance, (2) an initial high-recall lexical retriever identifies a broad candidate set, and (3) a dense retriever reranks the remaining documents. This architecture balances computational efficiency with retrieval effectiveness, prioritizing high recall in early stages while refining precision in later stages. The framework avoids the introduction of hallucinations, achieving a 15% improvement in BERT-Score compared to existing solutions without requiring any costly abstractive model. Moreover, our sequential approach accelerates the search process by 5% compared to a single-stage dense retrieval approach, with minimal effect on performance in terms of BERT-Score. Full article
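A minimal sketch of the three-stage pipeline is shown below, with a keyword filter standing in for the topic-modeling component, rank_bm25 for the high-recall lexical stage, and a sentence-transformers encoder for the dense reranker. The packages, model checkpoint, and toy documents are assumptions, not the authors' stack.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

documents = [
    "Flooding reported downtown, shelters open at the community center.",
    "Concert tickets on sale this weekend.",
    "Road closures near the river due to rising water levels.",
]
query = "where are emergency shelters for the flood"

# Stage 1: thematic filter (keyword stand-in for the topic-modeling component).
crisis_terms = {"flood", "flooding", "shelter", "shelters", "evacuation", "closures"}
candidates = [d for d in documents if crisis_terms & set(d.lower().split())]

# Stage 2: high-recall lexical retrieval with BM25 over the filtered set.
bm25 = BM25Okapi([d.lower().split() for d in candidates])
scores = bm25.get_scores(query.lower().split())
top_lexical = [candidates[i] for i in sorted(range(len(candidates)),
                                             key=lambda i: -scores[i])[:10]]

# Stage 3: dense reranking of the surviving candidates.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
q_emb = encoder.encode(query, convert_to_tensor=True)
d_emb = encoder.encode(top_lexical, convert_to_tensor=True)
ranked = sorted(zip(top_lexical, util.cos_sim(q_emb, d_emb)[0].tolist()),
                key=lambda p: -p[1])
print(ranked[0])
```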

17 pages, 9012 KiB  
Article
PLM-ATG: Identification of Autophagy Proteins by Integrating Protein Language Model Embeddings with PSSM-Based Features
by Yangying Wang and Chunhua Wang
Molecules 2025, 30(8), 1704; https://doi.org/10.3390/molecules30081704 - 10 Apr 2025
Viewed by 443
Abstract
Autophagy critically regulates cellular development while maintaining pathophysiological homeostasis. Since the autophagic process is tightly regulated by the coordination of autophagy-related proteins (ATGs), precise identification of these proteins is essential. Although current computational approaches circumvent the costly and time-consuming nature of experimental identification, they still have room for improvement, since handcrafted features inadequately capture the intricate patterns and relationships hidden in sequences. In this study, we propose PLM-ATG, a novel computational model that integrates support vector machines with the fusion of protein language model (PLM) embeddings and position-specific scoring matrix (PSSM)-based features for ATG identification. First, we extracted sequence-based features and PSSM-based features as inputs to six classifiers to establish baseline models. Among these, the combination of the SVM classifier and the AADP-PSSM feature set achieved the best prediction accuracy. Second, two popular PLM embeddings, i.e., ESM-2 and ProtT5, were fused with the AADP-PSSM features to further improve the prediction of ATGs. Third, we selected the optimal feature subset from the combination of the ESM-2 embeddings and AADP-PSSM features to train the final SVM model. The proposed PLM-ATG achieved an accuracy of 99.5% and an MCC of 0.990, which are nearly 5% and 0.1 higher than those of the state-of-the-art model EnsembleDL-ATG, respectively. Full article
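The fusion-then-SVM step can be sketched as follows; random arrays stand in for the ESM-2 embeddings and AADP-PSSM features, and the feature dimensions and SVM settings are illustrative assumptions rather than the paper's values.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_proteins = 200
esm2_embeddings = rng.normal(size=(n_proteins, 1280))   # stand-in per-protein PLM embedding
aadp_pssm = rng.normal(size=(n_proteins, 420))          # stand-in AADP-PSSM feature vector
labels = rng.integers(0, 2, size=n_proteins)            # 1 = autophagy protein (ATG)

# Fuse the two views by simple concatenation, then classify with an SVM.
fused = np.hstack([esm2_embeddings, aadp_pssm])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, fused, labels, cv=5).mean())
```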

33 pages, 19319 KiB  
Article
Optimising Contract Interpretations with Large Language Models: A Comparative Evaluation of a Vector Database-Powered Chatbot vs. ChatGPT
by P. V. I. N. Saparamadu, Samad Sepasgozar, R. N. D. Guruge, H. S. Jayasena, Ali Darejeh, Sanee Mohammad Ebrahimzadeh and B. A. I. Eranga
Buildings 2025, 15(7), 1144; https://doi.org/10.3390/buildings15071144 - 31 Mar 2025
Cited by 2 | Viewed by 1370
Abstract
Ambiguities in contract terms frequently lead to costly legal disputes and project delays in the construction industry. Large Language Models (LLMs) offer a promising solution, enhancing accuracy and reducing misinterpretations. As studies have pointed out, many professionals use LLMs, such as ChatGPT, to assist with minor professional tasks, such as information retrieval from the Internet and content editing. With access to a construction regulation database, LLMs can automate contract interpretation. However, the lack of Artificial Intelligence tools tailored to industry regulations hinders their adoption in the construction sector. This research addresses this gap by developing and deploying a publicly available specialised chatbot built on the ChatGPT language model. The development process includes architectural design, data preparation, vector embeddings, and model integration. The study uses qualitative and quantitative methodologies to evaluate the chatbot’s role in resolving contract-related issues through standardised tests. The specialised chatbot, trained on construction-specific legal information, achieved an average score of 88%, significantly outperforming ChatGPT’s 36%. The integration of a domain-specific language model promises to revolutionise construction practices through increased precision, efficiency, and innovation. These findings demonstrate the potential of optimised language models to transform construction practices. Full article
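A minimal sketch of the retrieval step behind such a vector-database-powered chatbot is shown below: contract clauses are embedded, the closest ones to a question are retrieved, and a grounded prompt is assembled for the chat model. The clauses, embedding model, and prompt wording are placeholders, not the system evaluated in the paper.

```python
from sentence_transformers import SentenceTransformer, util

clauses = [
    "Clause 20.1: The Contractor shall give notice of a claim within 28 days.",
    "Clause 8.4: The Contractor is entitled to an extension of time for delays caused by the Employer.",
    "Clause 13.1: Variations may be initiated by the Engineer at any time before taking over.",
]
question = "How long does the contractor have to notify a claim?"

# Embed clauses and question, retrieve the two most similar clauses.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(encoder.encode(question, convert_to_tensor=True),
                      encoder.encode(clauses, convert_to_tensor=True))[0]
top = [clauses[i] for i in scores.argsort(descending=True)[:2].tolist()]

# Assemble a grounded prompt; the reply would come from the chat model.
prompt = (
    "Answer the question using only the contract clauses below.\n\n"
    + "\n".join(top)
    + f"\n\nQuestion: {question}\nAnswer:"
)
print(prompt)
```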

16 pages, 248 KiB  
Article
Challenges from 4e Cognition to the Standard Cognitive Science of Religion Model
by David H. Nikkel
Religions 2025, 16(4), 415; https://doi.org/10.3390/rel16040415 - 25 Mar 2025
Viewed by 785
Abstract
Embodied, enactive cognition, which is also embedded or emplaced cognition and extended cognition through tools, including language, presents various challenges to the standard model of the cognitive science of religion. In its focus on unconscious brain mechanisms, the standard model downplays or eliminates religious meaning as epiphenomenal or illusory. It often denies that religion, once present, is adaptive or admits as adaptive only costly signaling. It regards humans’ perceptions of their environments as representations, mistaking an environment as determinate before cognition occurs. This support for indirect perception makes no sense given its emphasis on the need for sensing possible threats to survival. As brain mechanisms of individuals do all the heavy lifting, the model regards culture and its influence as nonexistent or insignificant. This stance denies how the social constitutes a huge part of our embodied preobjective and tacit engagement with the world, as well as socio-cultural realities, including religion, as self-organizing systems. The neglect of embodiment extends to its take on supernatural agents as allegedly disembodied minds. The standard model overlooks how ordinary rituals promote bonding through group presence, synchrony, and endorphin production and how some rituals increase knowledge of a particular natural environment, thus overlooking how religion can be adaptive. Full article
(This article belongs to the Special Issue Situating Religious Cognition)
7 pages, 1587 KiB  
Proceeding Paper
Sensitivity Analysis of Conformal Cooling Channels for Injection Molds: Two-Dimensional Transient Heat Transfer Analysis
by Hugo Miguel Silva, João Tiago Noversa, Leandro Fernandes, Hugo Luís Rodrigues and António José Pontes
Eng. Proc. 2025, 87(1), 16; https://doi.org/10.3390/engproc2025087016 - 12 Mar 2025
Cited by 1 | Viewed by 355
Abstract
In recent years, conformal cooling channels (CCCs) have become simpler and less costly to produce, largely owing to recent developments in additive manufacturing. In injection molding applications, CCCs provide superior cooling performance compared to conventional straight-drilled channels, because they can conform to the molded part and cool it more uniformly. Using CCCs significantly decreases cooling time, total injection time, thermal stresses, and warpage. Despite this, CCC design is more difficult than conventional channel design, and producing a cost-effective and efficient design depends on CAE simulations. This study performs a sensitivity analysis of design variables in preparation for the future adoption of a design optimization approach. The goal is to optimize the position of the cooling channels (CCs) so as to reduce ejection time and promote temperature distribution uniformity. The ANSYS Parametric Design Language (APDL) parametrization and the selected design variables are usable in future optimization work. Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
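Purely as an illustration of the kind of one-at-a-time sensitivity sweep described, the sketch below varies a single channel-position variable and records two response metrics; the solver call is a stub standing in for the parametrized APDL transient heat-transfer model, and the formulas inside it are invented placeholders.

```python
import numpy as np

def run_transient_simulation(channel_depth_mm: float) -> dict:
    # Stub: in the real workflow this would drive the parametrized APDL model
    # and return the simulated responses (values below are placeholders).
    ejection_time = 12.0 + 0.8 * (channel_depth_mm - 6.0) ** 2
    temp_spread = 4.0 + 1.5 * abs(channel_depth_mm - 6.0)
    return {"ejection_time_s": ejection_time, "temp_spread_K": temp_spread}

# One-at-a-time sweep over the channel's distance to the cavity wall.
depths = np.linspace(3.0, 10.0, 8)
for d in depths:
    r = run_transient_simulation(d)
    print(f"depth={d:4.1f} mm  ejection={r['ejection_time_s']:5.2f} s  "
          f"spread={r['temp_spread_K']:4.2f} K")
```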

20 pages, 2026 KiB  
Article
RL–Fusion: The Large Language Model Fusion Method Based on Reinforcement Learning for Task Enhancing
by Zijian Wang, Jiayong Li, Yu Liu, Xuhang Li, Cairong Yan and Yanting Zhang
Appl. Sci. 2025, 15(4), 2186; https://doi.org/10.3390/app15042186 - 18 Feb 2025
Viewed by 1179
Abstract
Model fusion is a technique of growing interest in the field of machine learning, which constructs a generalized model by merging the parameters of multiple independent models with different capabilities without the need to access the original training data or perform costly computations. However, during model fusion, when the number of parameters in a large language model is high, the dimension of the parameter space increases, which makes it more challenging to find the optimal combination of weights. Meanwhile, there is considerable potential for further development in sustainable optimization schemes for task-specific performance enhancement through model fusion in this area. In this paper, we propose a large-scale language model fusion approach based on task-enhanced reinforcement learning (RL–Fusion) to efficiently explore and optimize model fusion configurations. The key innovation of RL–Fusion lies in its use of reinforcement learning to guide parameter selection during model fusion, enabling a more intelligent and adaptive exploration of the parameter space. Additionally, RL–Fusion introduces a dynamic evaluation mechanism that adjusts the evaluation dataset in real-time based on feedback from SOTA models, ensuring continuous enhancement of domain-specific capabilities. RL–Fusion outperforms the baseline model by improving 1.75% in the MMLU benchmark test, 1.8% in the C-eval test, and 1.8% in the Chinese Named Entity Recognition (NER) test on the Yayi NER dataset by 16%. The results show that RL–Fusion is an effective and scalable model fusion solution that improves performance without increasing the computational cost of traditional optimization methods and has a wide range of applications in AI research and practice. Full article
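A toy sketch of score-guided fusion is given below: two small models are merged by linear interpolation of their parameters, and a greedy search over the merge coefficient stands in for the reinforcement-learning policy described in the paper. The models, data, and search strategy are all illustrative.

```python
import copy
import torch
import torch.nn as nn

def merge(model_a, model_b, alpha):
    """Interpolate the parameters of two models with coefficient alpha."""
    merged = copy.deepcopy(model_a)
    sd = {k: alpha * model_a.state_dict()[k] + (1 - alpha) * model_b.state_dict()[k]
          for k in model_a.state_dict()}
    merged.load_state_dict(sd)
    return merged

def task_score(model, x, y):
    # Stand-in evaluation: negative MSE on a small validation batch.
    with torch.no_grad():
        return -nn.functional.mse_loss(model(x), y).item()

torch.manual_seed(0)
model_a, model_b = nn.Linear(4, 1), nn.Linear(4, 1)
x_val, y_val = torch.randn(32, 4), torch.randn(32, 1)

# Greedy search over the merge coefficient, guided by the task score.
best_alpha = max((a / 10 for a in range(11)),
                 key=lambda a: task_score(merge(model_a, model_b, a), x_val, y_val))
print("selected merge coefficient:", best_alpha)
```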

30 pages, 667 KiB  
Article
Large Language Models for Electronic Health Record De-Identification in English and German
by Samuel Sousa, Michael Jantscher, Mark Kröll and Roman Kern
Information 2025, 16(2), 112; https://doi.org/10.3390/info16020112 - 6 Feb 2025
Cited by 1 | Viewed by 2222
Abstract
Electronic health record (EHR) de-identification is crucial for publishing or sharing medical data without violating the patient’s privacy. Protected health information (PHI) is abundant in EHRs, and privacy regulations worldwide mandate de-identification before downstream tasks are performed. The ever-growing data generation in healthcare and the advent of generative artificial intelligence have increased the demand for de-identified EHRs and highlighted privacy issues with large language models (LLMs), especially data transmission to cloud-based LLMs. In this study, we benchmark ten LLMs for de-identifying EHRs in English and German. We then compare de-identification performance for in-context learning and full model fine-tuning and analyze the limitations of LLMs for this task. Our experimental evaluation shows that LLMs effectively de-identify EHRs in both languages. Moreover, in-context learning with a one-shot setting boosts de-identification performance without the costly full fine-tuning of the LLMs. Full article
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)
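The one-shot in-context setting can be illustrated with a simple prompt in which a single worked example precedes the record to be processed; the example records, tag set, and wording below are invented and not taken from the benchmark data.

```python
# One-shot de-identification prompt: a single demonstration pair, then the
# target record. The completed prompt is passed to the LLM under evaluation.
demonstration_input = (
    "Patient John Miller, DOB 03/04/1961, was seen at St. Anna Hospital on 12 May 2023."
)
demonstration_output = (
    "Patient [NAME], DOB [DATE], was seen at [HOSPITAL] on [DATE]."
)
target_record = (
    "Maria Gruber, born 22.11.1978, was admitted to LKH Graz with chest pain."
)

prompt = (
    "Replace all protected health information (names, dates, institutions, "
    "addresses) with placeholder tags.\n\n"
    f"Record: {demonstration_input}\n"
    f"De-identified: {demonstration_output}\n\n"
    f"Record: {target_record}\n"
    "De-identified:"
)
print(prompt)
```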

31 pages, 1741 KiB  
Article
Context Is King: Large Language Models’ Interpretability in Divergent Knowledge Scenarios
by Andrés Piñeiro-Martín, Francisco-Javier Santos-Criado, Carmen García-Mateo, Laura Docío-Fernández and María del Carmen López-Pérez
Appl. Sci. 2025, 15(3), 1192; https://doi.org/10.3390/app15031192 - 24 Jan 2025
Viewed by 2661
Abstract
Large language models (LLMs) have revolutionized the field of artificial intelligence in both academia and industry, transforming how we communicate, search for information, and create content. However, these models face knowledge cutoffs and costly updates, driving a new ecosystem for LLM-based applications that leverage interaction techniques to extend capabilities and facilitate knowledge updates. As these models grow more complex, understanding their internal workings becomes increasingly challenging, posing significant issues for transparency, interpretability, and explainability. This paper proposes a novel approach to interpretability by shifting the focus to understanding the model’s functionality within specific contexts through interaction techniques. Rather than dissecting the LLM itself, we explore how contextual information and interaction techniques can elucidate the model’s thought processes. To this end, we introduce the Context-Driven Divergent Knowledge Evaluation (CDK-E) methodology, along with the Divergent Knowledge Dataset (DKD), for evaluating the interpretability of LLMs in context-specific scenarios that diverge from the model’s inherent knowledge. The empirical results demonstrate that advanced LLMs achieve high alignment with divergent contexts, validating our hypothesis that contextual information significantly enhances interpretability. Moreover, the strong correlation between LLM-based metrics and semantic metrics confirms the reliability of our evaluation framework. Full article
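The core idea, supplying a context that diverges from the model's parametric knowledge and then scoring how closely the answer follows that context, can be sketched as below. The divergent fact and the similarity-based scoring are illustrative and do not reproduce the CDK-E/DKD protocol.

```python
from sentence_transformers import SentenceTransformer, util

divergent_context = ("In this fictional setting, water boils at 40 degrees "
                     "Celsius at sea level.")
question = "At what temperature does water boil at sea level?"
context_grounded_answer = "Water boils at 40 degrees Celsius at sea level."

# The prompt sent to the LLM prepends the divergent context to the question.
prompt = divergent_context + "\n\nQuestion: " + question

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def context_alignment(model_answer: str) -> float:
    """Cosine similarity between the model's answer and the context-grounded answer."""
    return util.cos_sim(
        encoder.encode(model_answer, convert_to_tensor=True),
        encoder.encode(context_grounded_answer, convert_to_tensor=True),
    ).item()

# A high score indicates the answer follows the supplied (divergent) context
# rather than the model's parametric knowledge.
print(context_alignment("In that setting, water boils at 40 degrees Celsius at sea level."))
```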

13 pages, 1023 KiB  
Article
Multilingual Prediction of Cognitive Impairment with Large Language Models and Speech Analysis
by Felix Agbavor and Hualou Liang
Brain Sci. 2024, 14(12), 1292; https://doi.org/10.3390/brainsci14121292 - 22 Dec 2024
Viewed by 1635
Abstract
Background: Cognitive impairment poses a significant global health challenge, emphasizing the critical need for early detection and intervention. Traditional diagnostics like neuroimaging and clinical evaluations are often subjective, costly, and inaccessible, especially in resource-poor settings. Previous research on speech analysis has primarily used English data, leaving multilingual settings unexplored. Methods: In this study, we present our results from the INTERSPEECH 2024 TAUKADIAL Challenge, where we aimed to automatically detect mild cognitive impairment (MCI) and predict cognitive scores for English and Chinese speakers (169 in total). Our approach leverages Whisper, a speech foundation model, to extract language-agnostic speech embeddings. We then utilize ensemble models to incorporate task-specific information. Results: Our model achieved an unweighted average recall of 81.83% in the MCI classification task and a root mean squared error of 1.196 in the cognitive score prediction task, placing second and first, respectively, in the rankings for these tasks. A comparison between language-agnostic and language-specific models reveals the importance of capturing language-specific nuances for accurate cognitive impairment prediction. Conclusions: This study demonstrates the effectiveness of language-specific ensemble modeling with Whisper embeddings in enabling scalable, non-invasive cognitive health assessments for Alzheimer’s disease, achieving state-of-the-art results in multilingual settings. Full article
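The embedding-plus-ensemble idea can be sketched as follows: Whisper's encoder yields a language-agnostic representation that is mean-pooled and passed to a classical ensemble. The model size, pooling, classifiers, and the random placeholder audio are assumptions, not the challenge submission's configuration.

```python
import numpy as np
import torch
from transformers import WhisperFeatureExtractor, WhisperModel
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
whisper = WhisperModel.from_pretrained("openai/whisper-base").eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pooled Whisper encoder embedding for one utterance."""
    features = extractor(waveform, sampling_rate=sr, return_tensors="pt").input_features
    with torch.no_grad():
        hidden = whisper.encoder(features).last_hidden_state   # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Placeholder data: random audio stands in for the challenge recordings.
X = np.stack([embed(np.random.randn(16000 * 5).astype(np.float32)) for _ in range(8)])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])                          # 1 = MCI

ensemble = VotingClassifier(
    [("svm", SVC(probability=True)), ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:2]))
```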

21 pages, 2265 KiB  
Article
A Generative Artificial-Intelligence-Based Workbench to Test New Methodologies in Organisational Health and Safety
by Andrea Falegnami, Andrea Tomassi, Giuseppe Corbelli, Francesco Saverio Nucci and Elpidio Romano
Appl. Sci. 2024, 14(24), 11586; https://doi.org/10.3390/app142411586 - 11 Dec 2024
Cited by 5 | Viewed by 1433
Abstract
This paper introduces a novel generative artificial intelligence workbench specifically tailored to the field of safety sciences. Utilizing large language models (LLMs), this innovative approach significantly diverges from traditional methods by enabling the rapid development, refinement, and preliminary testing of new safety methodologies. Traditional techniques in this field typically depend on slow, iterative cycles of empirical data collection and analysis, which can be both time-intensive and costly. In contrast, our LLM-based workbench leverages synthetic data generation and advanced prompt engineering to simulate complex safety scenarios and generate diverse, realistic data sets on demand. This capability allows for more flexible and accelerated experimentation, enhancing the efficiency and scalability of safety science research. By detailing an application case, we demonstrate the practical implementation and advantages of our framework, such as its ability to adapt quickly to evolving safety requirements and its potential to significantly cut down development time and resources. The introduction of this workbench represents a paradigm shift in safety methodology development, offering a potent tool that combines the theoretical rigor of traditional methods with the agility of modern AI technologies. Full article
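On the synthetic-data side, such a workbench comes down to structured prompting; the sketch below asks an LLM for machine-readable near-miss reports that could seed tests of a new safety methodology. The schema and wording are illustrative, and the generation call depends on whichever LLM backend the workbench uses.

```python
import json

# Illustrative schema for a synthetic near-miss report.
schema = {
    "site": "string",
    "task": "string",
    "hazard": "string",
    "barrier_that_failed": "string",
    "severity_1_to_5": "integer",
    "free_text_report": "string",
}

prompt = (
    "Generate 5 synthetic but realistic occupational near-miss reports from a "
    "construction site as a JSON list. Each item must follow this schema:\n"
    + json.dumps(schema, indent=2)
    + "\nVary the hazards and severities; do not reuse real names or companies."
)
print(prompt)   # sent to the LLM; the JSON reply is then parsed and validated
```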

18 pages, 5055 KiB  
Article
Investigating the Performance of Open-Vocabulary Classification Algorithms for Pathway and Surface Material Detection in Urban Environments
by Kauê de Moraes Vestena, Silvana Phillipi Camboim, Maria Antonia Brovelli and Daniel Rodrigues dos Santos
ISPRS Int. J. Geo-Inf. 2024, 13(12), 422; https://doi.org/10.3390/ijgi13120422 - 24 Nov 2024
Cited by 2 | Viewed by 1571
Abstract
Mapping pavement types, especially in sidewalks, is essential for urban planning and mobility studies. Identifying pavement materials is a key factor in assessing mobility, such as walkability and wheelchair usability. However, satellite imagery in this scenario is limited, and in situ mapping can be costly. A promising solution is to extract such geospatial features from street-level imagery. This study explores using open-vocabulary classification algorithms to segment and identify pavement types and surface materials in this scenario. Our approach uses large language models (LLMs) to improve the accuracy of classifying different pavement types. The methodology involves two experiments: the first uses free prompting with random street-view images, employing Grounding Dino and SAM algorithms to assess performance across categories. The second experiment evaluates standardized pavement classification using the Deep Pavements dataset and a fine-tuned CLIP algorithm optimized for detecting OSM-compliant pavement categories. The study presents open resources, such as the Deep Pavements dataset and a fine-tuned CLIP-based model, demonstrating a significant improvement in the true positive rate (TPR) from 56.04% to 93.5%. Our findings highlight both the potential and limitations of current open-vocabulary algorithms and emphasize the importance of diverse training datasets. This study advances urban feature mapping by offering a more intuitive and accurate approach to geospatial data extraction, enhancing urban accessibility and mobility mapping. Full article
(This article belongs to the Topic Geocomputation and Artificial Intelligence for Mapping)
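A zero-shot sketch of open-vocabulary surface classification with CLIP is shown below. The generic openai/clip-vit-base-patch32 checkpoint and the small set of OSM-style surface labels stand in for the paper's fine-tuned Deep Pavements model; the image path is a placeholder.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

surfaces = ["asphalt", "concrete", "paving stones", "gravel", "grass", "sett"]
prompts = [f"a photo of a sidewalk surfaced with {s}" for s in surfaces]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("street_view_crop.jpg")          # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# Rank the candidate surface materials by image-text similarity.
for surface, p in sorted(zip(surfaces, probs.tolist()), key=lambda t: -t[1]):
    print(f"{surface:15s} {p:.3f}")
```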

19 pages, 716 KiB  
Article
Applying Large Language Model to User Experience Testing
by Nien-Lin Hsueh, Hsuen-Jen Lin and Lien-Chi Lai
Electronics 2024, 13(23), 4633; https://doi.org/10.3390/electronics13234633 - 24 Nov 2024
Cited by 1 | Viewed by 3022
Abstract
The maturation of internet usage environments has elevated User Experience (UX) to a critical factor in system success. However, traditional manual UX testing methods are hampered by subjectivity and lack of standardization, resulting in time-consuming and costly processes. This study explores the potential of Large Language Models (LLMs) to address these challenges by developing an automated UX testing tool. Our innovative approach integrates the Rapi web recording tool to capture user interaction data with the analytical capabilities of LLMs, utilizing Nielsen’s usability heuristics as evaluation criteria. This methodology aims to significantly reduce the initial costs associated with UX testing while maintaining assessment quality. To validate the tool’s efficacy, we conducted a case study featuring a tennis-themed course reservation system. The system incorporated multiple scenarios per page, allowing users to perform tasks based on predefined goals. We employed our automated UX testing tool to evaluate screenshots and interaction logs from user sessions. Concurrently, we invited participants to test the system and complete UX questionnaires based on their experiences. Comparative analysis revealed that varying prompts in the automated UX testing tool yielded different outcomes, particularly in detecting interface elements. Notably, our tool demonstrated superior capability in identifying issues aligned with Nielsen’s usability principles compared to participant evaluations. This research contributes to the field of UX evaluation by leveraging advanced language models and established usability heuristics. Our findings suggest that LLM-based automated UX testing tools can offer more consistent and comprehensive assessments. Full article
(This article belongs to the Special Issue Recent Advances of Software Engineering)
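The evaluation step can be sketched as a prompt that pairs a recorded interaction log with Nielsen's heuristics; the log entries and the shortened heuristic list below are illustrative, and the paper's tool additionally attaches the screenshots captured by Rapi.

```python
heuristics = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Error prevention",
    "Recognition rather than recall",
]
interaction_log = [
    {"step": 1, "action": "click", "target": "#book-court", "result": "no visible feedback for 4s"},
    {"step": 2, "action": "submit", "target": "form#reservation", "result": "error: date format unclear"},
]

prompt = (
    "You are a UX expert. Review the interaction log below against these "
    "usability heuristics and list concrete violations with the heuristic "
    "name, the affected step, and a suggested fix.\n\n"
    "Heuristics:\n- " + "\n- ".join(heuristics) + "\n\n"
    "Interaction log:\n" + "\n".join(str(e) for e in interaction_log)
)
print(prompt)   # sent to the LLM together with the corresponding screenshots
```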

24 pages, 432 KiB  
Article
Sequence-to-Sequence Models and Their Evaluation for Spoken Language Normalization of Slovenian
by Mirjam Sepesy Maučec, Darinka Verdonik and Gregor Donaj
Appl. Sci. 2024, 14(20), 9515; https://doi.org/10.3390/app14209515 - 18 Oct 2024
Viewed by 1087
Abstract
Sequence-to-sequence models have been applied to many challenging problems, including those in text and speech technologies. Normalization is one of them. It refers to transforming non-standard language forms into their standard counterparts. Non-standard language forms come from different written and spoken sources. This paper deals with one such source, namely speech from the less-resourced highly inflected Slovenian language. The paper explores speech corpora recently collected in public and private environments. We analyze the efficiencies of three sequence-to-sequence models for automatic normalization from literal transcriptions to standard forms. Experiments were performed using words, subwords, and characters as basic units for normalization. In the article, we demonstrate that the superiority of the approach is linked to the choice of the basic modeling unit. Statistical models prefer words, while neural network-based models prefer characters. The experimental results show that the best results are obtained with neural architectures based on characters. Long short-term memory and transformer architectures gave comparable results. We also present a novel analysis tool, which we use for in-depth error analysis of results obtained by character-based models. This analysis showed that systems with similar overall results can differ in the performance for different types of errors. Errors obtained with the transformer architecture are easier to correct in the post-editing process. This is an important insight, as creating speech corpora is a time-consuming and costly process. The analysis tool also incorporates two statistical significance tests: approximate randomization and bootstrap resampling. Both statistical tests confirm the improved results of neural network-based models compared to statistical ones. Full article
(This article belongs to the Special Issue Computational Linguistics: From Text to Speech Technologies)
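The approximate randomization test mentioned for the analysis tool is easy to sketch: per-sentence error counts of two systems are randomly swapped many times to estimate how often a difference at least as large as the observed one arises by chance. The error counts below are synthetic.

```python
import random

errors_a = [3, 1, 0, 2, 4, 1, 0, 5, 2, 1]   # system A, errors per sentence
errors_b = [2, 1, 0, 1, 3, 0, 0, 4, 2, 1]   # system B, errors per sentence
observed = abs(sum(errors_a) - sum(errors_b))

random.seed(0)
trials, at_least_as_extreme = 10000, 0
for _ in range(trials):
    swapped_a, swapped_b = [], []
    # Randomly swap the two systems' outputs sentence by sentence.
    for a, b in zip(errors_a, errors_b):
        if random.random() < 0.5:
            a, b = b, a
        swapped_a.append(a)
        swapped_b.append(b)
    if abs(sum(swapped_a) - sum(swapped_b)) >= observed:
        at_least_as_extreme += 1

print("approximate p-value:", (at_least_as_extreme + 1) / (trials + 1))
```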
