Background: Spinocerebellar ataxia type 2 (SCA2) is an inherited neurodegenerative disorder characterized by progressive cerebellar degeneration. One difficulty in treating this disease lies in identifying preclinical carriers: individuals who carry the pathogenic ATXN2 mutation but remain asymptomatic with respect to motor manifestations. Although magnetic resonance imaging (MRI) has proven valuable in supporting the diagnosis of ataxia, traditional univariate approaches using linear measurements have shown limited ability to capture the complex anatomical changes that occur across the disease spectrum, particularly during the preclinical phase. Methods: This study employed a multivariate approach to improve the classification of individuals across the SCA2 spectrum. We developed a multinomial logistic regression model incorporating multiple MRI-derived linear measurements to discriminate between healthy controls (n = 72), preclinical carriers (n = 17), and patients with manifest SCA2 (n = 61). To mitigate class imbalance, particularly in the smaller preclinical subgroup, we applied the Synthetic Minority Over-sampling Technique (SMOTE), generating a balanced dataset that improves the model’s ability to discern distinctive anatomical features. The model trained on the balanced data was compared with the same model trained on the unbalanced data, and performance improved with SMOTE. Results: The multivariate model achieved an overall accuracy of 80.7% and discriminated healthy controls (AUC: 0.96), preclinical individuals (AUC: 0.75), and clinical individuals (AUC: 0.95). This represents an advance over previous univariate approaches, which have had difficulty capturing the neurodegenerative changes characteristic of the preclinical stage.
Conclusions: By integrating multiple neuroimaging biomarkers into a multivariable model, this study provides a tool for early identification of preclinical SCA2 carriers. The ability to accurately classify these individuals opens an opportunity for early therapeutic intervention before irreversible neurological deterioration occurs. This approach shows promise for optimizing clinical trial design and personalized care in SCA2. Full article
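
The oversampling step described in the Methods can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is interpolated between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' code. The two-feature measurements, the `smote` helper and the choice of k = 5 are assumptions made for illustration; only the class sizes (17 preclinical carriers, 72 controls) come from the abstract.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Hypothetical two-feature measurements for the 17 preclinical carriers.
data_rng = random.Random(42)
preclinical = [[data_rng.gauss(10.0, 1.0), data_rng.gauss(5.0, 0.5)] for _ in range(17)]

# Oversample up to the size of the largest class in the abstract (72 controls).
extra = smote(preclinical, n_new=72 - len(preclinical))
balanced = preclinical + extra
print(len(balanced))
```

In practice, oversampling is normally applied to the training folds only, so that synthetic points never leak into the data used for evaluation.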

(This article belongs to the Section Neuroimaging)

25 pages, 1841 KB

Open Access Review

Advances in AI-Guided CRISPR-Cas9 Engineering Strategies for Microbial Biotechnology

by Javier Alejandro Delgado-Nungaray, Dulce Alitzel Pérez-Ponce, Luis Joel Figueroa-Yáñez, Eire Reynaga-Delgado, Mario Alberto García-Ramírez and Orfil Gonzalez-Reynoso

J. Genome Biotechnol. Genet. 2026, 1(2), 10; https://doi.org/10.3390/jgbg1020010 (registering DOI) - 24 Jun 2026

Abstract

CRISPR-Cas9 has transformed microbial biotechnology by enabling precise genome modifications; however, achieving high editing efficiency remains a challenge due to multiple determinants, including on-target specificity, off-target events, PAM sequence, sgRNA scaffold composition, and RNA secondary structure. This review examines how artificial intelligence (AI) can address these challenges by automating the identification and optimisation of highly active guide RNAs (gRNAs). We highlight the influence of a data-driven training strategy focused on high-quality, diverse, and accurately labelled microbial datasets, particularly because models derived from mammalian systems are not directly transferable to microbial organisms. Moreover, we discuss the key role of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles and centralised, curated CRISPR-Cas databases as foundational elements for developing robust and predictive frameworks. Emerging directions are also explored, including generative AI approaches capable of supporting automated experimental planning. Given the potential dual use of such technologies, the review further addresses the bioethical considerations and regulatory frameworks needed to ensure responsible genome engineering, as well as the implementation of safeguards against misuse, particularly in pathogenic microorganisms. Furthermore, the convergence of standardised experimental data, specialised microbial datasets, and advanced AI architectures is paving the way to transform microbial biotechnology by accelerating metabolic engineering and synthetic biology applications. Full article

21 pages, 20156 KB

Open Access Data Descriptor

Synthetic Reference Energy Community Load Profiles for Artificial Case Studies

by Arne Surmann, Elena Timofeeva, Fabian Liesenhoff, Patrick Selzam and Pierre Hülsemann

Data 2026, 11(7), 156; https://doi.org/10.3390/data11070156 (registering DOI) - 23 Jun 2026

Abstract

This data descriptor presents CINES-REC-CITY, an open synthetic dataset providing high-resolution load profiles for energy community research. The dataset represents a typical German urban district with 70 apartments across eight multi-family buildings, including diverse socioeconomic characteristics. Three main components are provided at 15 min resolution for a full year: non-controllable residential electricity consumption for all apartments, charging profiles for 17 battery electric vehicles with trip information, and heat pump operation data for both variable-speed and hysteresis-controlled ground-source systems. All profiles were generated using validated bottom-up stochastic simulation models accounting for realistic user behavior, mobility patterns, and thermal building physics. The modular structure allows for selective combination of components, enabling investigation of different technology penetration scenarios. The dataset serves as a reference benchmark for reproducible research, allowing for direct comparison of optimization approaches, business models, and control strategies using identical underlying consumption patterns. It is suitable for techno-economic analysis, algorithm development for flexible load control, and grid impact assessment. All data is provided in CSV format with weather data for consistent extensions. Full article

(This article belongs to the Section Data Science for Chemistry, Energy and Materials)

34 pages, 11399 KB

Open Access Article

RSSI Data Augmentation Algorithm Based on Polynomial Regression and Stochastic Signal Fade Modeling

by Mateusz Sumorek, Adam Idźkowski and Krzysztof Konopko

Electronics 2026, 15(13), 2757; https://doi.org/10.3390/electronics15132757 (registering DOI) - 23 Jun 2026

Abstract

This article presents a simple, original data augmentation algorithm for Received Signal Strength Indicator (RSSI), dedicated to indoor localization systems. The aim of the research was to develop a synthetic data generation method to serve as a regularization technique, making models more robust against measurement noise. The proposed approach combines propagation modeling using polynomial regression with the individual statistical characteristics of each Access Point (AP), accounting for signal fluctuations and a probabilistic signal outage mechanism. The effectiveness of the proposed solution was experimentally verified by evaluating K-NN and MLP neural network models in both classification and regression variants. The study was conducted on datasets with different measurement grid granularities, demonstrating the algorithm’s ability to improve the generalization properties of estimators, even with a limited number of samples in the training set. The results showed that the use of augmentation reduced the Mean Absolute Error (MAE) by an average of approximately 20% for the dense training set and about 17% for the sparse set. Within the evaluated test environment, models trained on the augmented sparse measurement grid, which contained 67% fewer physical calibration points (30 points compared to the dense grid’s 92), reached a precision comparable to models trained on the dense real-world dataset. Analysis of histograms and Cumulative Distribution Functions (CDF) of the error confirmed the preservation of the signal’s statistical integrity and the effective mitigation of gross errors. The proposed solution constitutes an efficient and easy-to-implement alternative to complex generative models (e.g., GANs). These findings serve as a successful proof-of-concept and pilot study, laying the foundation for further development and validation in larger, more complex spatial environments. Full article
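
The three ingredients named in the abstract (a polynomial propagation model, per-AP fluctuation statistics, and a probabilistic outage mechanism) can be combined in a short sketch. This is not the authors' algorithm. The function name, the use of distance as the polynomial's input, the Gaussian fluctuation model, and the placeholder value of -100 dBm for a missing reading are all assumptions.

```python
import random

def augment_rssi(distance_m, coeffs, sigma_db, p_outage, n,
                 seed=0, outage_value=-100.0):
    """Generate n synthetic RSSI readings for one access point.

    coeffs       -- polynomial coefficients, highest degree first (assumed
                    to come from a regression fitted on real measurements)
    sigma_db     -- per-AP standard deviation of the signal fluctuation
    p_outage     -- per-AP probability that the AP is not heard in a scan
    outage_value -- hypothetical placeholder used for a missing reading
    """
    rng = random.Random(seed)
    # Horner evaluation of the polynomial at the given distance
    mean_rssi = 0.0
    for c in coeffs:
        mean_rssi = mean_rssi * distance_m + c
    samples = []
    for _ in range(n):
        if rng.random() < p_outage:
            samples.append(outage_value)  # simulated signal outage
        else:
            samples.append(mean_rssi + rng.gauss(0.0, sigma_db))
    return samples

# Hypothetical AP: quadratic fit, 2 dB fluctuation, 5% outage probability.
print(augment_rssi(2.0, [0.5, -6.0, -40.0], sigma_db=2.0, p_outage=0.05, n=5))
```

Keeping the fluctuation and outage parameters per access point, as the abstract describes, lets each AP retain its own statistical character in the synthetic samples.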

(This article belongs to the Special Issue Recent Advance of Auto Navigation in Indoor Scenarios)

17 pages, 8857 KB

Open Access Article

An Interpretable Deep Learning System for Fine-Grained Classification and Longitudinal Tracking of Neonatal Auricular Deformities

by Yihui Feng, Xujun Hu, Xiwen Zhang, Xiaobao Ma, Jialin Xie, Jianyong Chen and Yangyang Yuan

Biology 2026, 15(13), 985; https://doi.org/10.3390/biology15130985 (registering DOI) - 23 Jun 2026

Abstract

Early non-invasive correction of neonatal auricular deformities is highly dependent on timely and precise diagnosis. However, clinical practice is often compromised by the subjectivity of visual assessments and the lack of objective tracking metrics, which frequently leads to missed optimal treatment windows. To address these challenges, we developed an interpretable deep learning-based diagnostic system for the automated screening and fine-grained classification of these deformities. Methodologically, a large-scale, multi-source dataset (n = 4644) was curated to support model training. The system pairs an automated object detector (YOLOv11) for background-reduced region-of-interest isolation with a cascaded classification pipeline optimized via ConvNeXt-Tiny. Crucially, we introduced a supervised contrastive learning module to project high-dimensional morphological features into a continuous severity score, enabling quantitative longitudinal tracking of therapeutic efficacy. To evaluate generalization and robustness, the framework underwent rigorous evaluation across three independent real-world cohorts and one controlled synthetic stress test. The system achieved 88.2% accuracy (Area Under the Curve (AUC): 0.949) in binary screening and 87.4% accuracy (macro-AUC: 0.976) in multi-class subtyping on the internal baseline. To enhance interpretability and build clinical trust, Gradient-weighted Class Activation Mapping (Grad-CAM) was utilized to explore the spatial distribution of the model’s attention, which frequently aligned with key anatomical landmarks. Furthermore, the learned severity scores robustly quantified post-intervention improvements (p = 0.0004), effectively capturing subtle anatomical normalization. 
While validation for rare subtypes remains underpowered, and the severity score currently functions mainly as a learned morphological similarity index requiring future clinical calibration, this study ultimately provides an objective and standardized web-based tool to facilitate the early intervention and precision management of neonatal auricular anomalies. Full article

(This article belongs to the Special Issue AI Deep Learning Approach to Study Biological Questions (3rd Edition))

35 pages, 7584 KB

Open Access Article

A Comparative Study of Time Series Clustering Performance with Classification as a Benchmark

by Maria Sadowska and Krzysztof Gajowniczek

Big Data Cogn. Comput. 2026, 10(7), 201; https://doi.org/10.3390/bdcc10070201 (registering DOI) - 23 Jun 2026

Abstract

This paper extends a previous classification study by examining clustering methods on the same synthetic datasets and comparing their behavior with the previously obtained classification results. This study investigates the performance of selected time series clustering methods under controlled changes in noise level and class complexity. Six clustering methods representing distance-based, feature-based, and deep learning approaches were evaluated on 82 balanced synthetic datasets. The datasets contained from two to six classes, different levels of additive Gaussian noise, 200 time series per dataset, and 1000 observations per time series. The analysis focused on clustering quality, comparative behavior with classification models, and computational cost in terms of training time and peak memory usage. Clustering quality was assessed mainly using Adjusted Rand Index and V-measure, while accuracy after Hungarian label matching was used as an auxiliary measure for comparison with classification models. The results show that distance-based methods, and particularly TimeSeriesKMedoids, achieved the most robust and consistent clustering performance across the considered settings. Clustering quality decreased with both the number of classes and the noise level, but the effect of noise was clearly stronger. Feature-based and deep learning-based clustering methods were generally more sensitive to noise, while deep models were also associated with substantially higher computational cost. In terms of memory usage, classical clustering methods remained below 50 MiB, whereas deep learning-based clustering methods required substantially more memory. This study further shows that accuracy computed after Hungarian label matching may provide an overly optimistic view of clustering quality. 
Accuracy after Hungarian label matching is reported only as an auxiliary metric, while the main interpretation of clustering quality is based on structure-sensitive measures such as Adjusted Rand Index and V-measure. Overall, the findings highlight the importance of robust distance-based approaches and of using structure-sensitive evaluation measures when analyzing time series clustering. Full article
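
The difference between matched accuracy and a structure-sensitive measure can be seen on a toy example. The sketch below is illustrative and not taken from the paper. It finds the best one-to-one mapping of cluster labels to classes by brute force (the Hungarian algorithm finds the same optimum efficiently) and computes the Adjusted Rand Index from the contingency table. The labels are made up, and the helper assumes there are no more clusters than classes.

```python
from itertools import permutations
from math import comb

def matched_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one mappings of cluster ids to classes.
    Assumes the number of clusters does not exceed the number of classes."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)

def adjusted_rand_index(y_true, y_pred):
    """ARI from the contingency table (undefined for degenerate partitions)."""
    cells, rows, cols = {}, {}, {}
    for t, p in zip(y_true, y_pred):
        cells[(t, p)] = cells.get((t, p), 0) + 1
        rows[t] = rows.get(t, 0) + 1
        cols[p] = cols.get(p, 0) + 1
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(len(y_true), 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

# Toy labels: two classes of four series, one series misplaced in each cluster.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]
print(matched_accuracy(y_true, y_pred), adjusted_rand_index(y_true, y_pred))
```

For these labels the matched accuracy is 0.75 while the ARI is only 0.125. Because the matching step always picks the most favourable label mapping, even a random two-cluster split scores at least 0.5 on accuracy, whereas ARI is corrected for chance agreement.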

(This article belongs to the Section Data Mining and Machine Learning)

25 pages, 15914 KB

Open Access Article

A Safety-Case-Driven Hybrid Digital Twin for Centrifugal Compressor Health Monitoring

by Hezrone Mujawo and Oyeniyi Akeem Alimi

Machines 2026, 14(7), 712; https://doi.org/10.3390/machines14070712 (registering DOI) - 23 Jun 2026

Abstract

Centrifugal compressors are critical assets in the oil and gas, petrochemical, and power generation industries, where unplanned downtime results in severe economic and safety consequences. Despite the application of digital twin technology for predictive maintenance, existing approaches struggle to combine accurate degradation modeling with formal assurance evidence that regulators and operators demand before trusting machine learning-augmented systems. This paper proposes a hybrid digital twin framework whose architecture is structured around a formal safety case template, addressing both the accuracy and the trustworthiness challenges simultaneously. The methodology couples a first-principles thermodynamic model with a neural-network residual learner, and the complete system is organized through a design-stage safety case constructed in Goal Structuring Notation. The design stage identifies the requirements for operational deployment. Validation through a simulation study on a one-year synthetic operational dataset shows that the hybrid model reduces root-mean-square prediction error by over 50% for both pressure ratio and polytropic efficiency compared to the physics-only baseline. The anomaly detection module, presented here as a proof of concept, achieves 92% recall in identifying injected faults, and a composite health index tracks the progression of fouling, erosion, and seal wear over the simulated service life. This study is purely theoretical, with no experimental measurements conducted. It demonstrates the structural viability and coherence of the proposed framework within a controlled environment, providing a solid theoretical and computational foundation for future physical validation efforts. 
These findings provide preliminary evidence that embedding a structured safety argument into the design of a hybrid digital twin is technically feasible and beneficial for building the confidence needed to deploy such systems in safety-critical industrial environments. Full article
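
The coupling of a first-principles model with a residual learner can be sketched in a few lines. This toy example makes several assumptions. The `physics_model` function and the synthetic "plant" are hypothetical, and an ordinary least-squares line stands in for the neural-network residual learner described in the abstract, purely to keep the example short. Errors are computed on the fitting data, so the numbers say nothing about the paper's results.

```python
import math
import random

def physics_model(flow):
    """Hypothetical first-principles estimate of pressure ratio."""
    return 3.0 - 0.5 * flow

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def rmse(pred, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

rng = random.Random(0)
flow = [rng.uniform(0.5, 1.5) for _ in range(200)]
# Synthetic "plant": physics plus an unmodelled degradation term and sensor noise.
measured = [physics_model(f) - 0.2 * f + rng.gauss(0.0, 0.01) for f in flow]

# The residual learner is trained on what the physics model fails to explain.
residuals = [m - physics_model(f) for f, m in zip(flow, measured)]
a, b = fit_line(flow, residuals)

physics_only = [physics_model(f) for f in flow]
hybrid = [physics_model(f) + a * f + b for f in flow]
print(rmse(physics_only, measured), rmse(hybrid, measured))
```

Training the learner on the residual rather than on the raw signal keeps the physics model as the backbone of the prediction, so the data-driven part only has to capture the discrepancy.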

(This article belongs to the Special Issue Intelligent Fault Diagnosis and Predictive Maintenance Systems: Advanced Methods for Industrial Equipment and Dynamic Operating Conditions)

29 pages, 1519 KB

Open Access Article

Spatial Multi-Sensor Fusion with Heterogeneous Error Characteristics

by Ben Ingram, Rodrigo Paredes, Joel Díaz, Felipe Besoaín and Ricardo Baettig

Appl. Sci. 2026, 16(13), 6294; https://doi.org/10.3390/app16136294 (registering DOI) - 23 Jun 2026

Viewed by 37

Abstract

Fusing spatial observations from sensors with heterogeneous error characteristics is a persistent challenge in geostatistics. Classical kriging assumes a Gaussian likelihood for all observations, an assumption that fails when sensors exhibit one-sided or asymmetric noise. We present a Variable Rank Kriging (VRK) formulation that supports per-observation heterogeneous likelihoods where each observation may define its own likelihood function, thus enabling principled fusion of sensors whose noise structures are significantly different in terms of distribution family and magnitude. Within this framework, we use the exponential (one-sided) likelihood as a case study to demonstrate the method and compare it with sampling-based numerical alternatives for general likelihoods without closed forms. A non-collocated RTK calibration workflow uses kriging predictions from a sparse high-accuracy reference to characterise sensor-specific likelihood parameters without requiring co-located paired observations. Synthetic 1-D and 2-D experiments show that correct per-point likelihood specification reduces RMSE by up to 92% (1-D) and 57% (2-D) relative to a misspecified Gaussian model while also eliminating systematic positive bias. A demonstration using NEON Airborne Observation Platform lidar data at Harvard Forest confirms these findings in a practical, real-world scenario. Across multiple subsamples of the lidar dataset, the exponential likelihood reduces vegetated-zone RMSE by 20.6% (open zone: 18.6%) and mean absolute bias by 26.5% relative to a heteroscedastic Gaussian baseline. The open-source vrk Python (>=3.10) package provides a reproducible implementation that can be applied to any spatial domain that requires multi-sensor spatial fusion with heterogeneous error structures. Full article
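
For reference, the contrast between a symmetric Gaussian likelihood and a one-sided exponential likelihood can be written as follows. The notation is assumed for illustration and may differ from the paper's parameterisation: $y_i$ is the observation at location $s_i$, $f(s_i)$ the latent field, $\sigma_i$ a per-observation noise scale and $\lambda_i$ a per-observation rate. The choice of positive-only errors is also an assumption, suggested by the abstract's mention of systematic positive bias.

```latex
p_{\mathrm{G}}\bigl(y_i \mid f(s_i)\bigr)
  = \frac{1}{\sqrt{2\pi\sigma_i^{2}}}
    \exp\!\left(-\frac{\bigl(y_i - f(s_i)\bigr)^{2}}{2\sigma_i^{2}}\right),
\qquad
p_{\mathrm{E}}\bigl(y_i \mid f(s_i)\bigr)
  = \begin{cases}
      \lambda_i \exp\!\bigl(-\lambda_i\,(y_i - f(s_i))\bigr), & y_i \ge f(s_i),\\[4pt]
      0, & y_i < f(s_i).
    \end{cases}
```

Under the exponential model the expected error is $\mathbb{E}[\,y_i - f(s_i)\,] = 1/\lambda_i > 0$, so treating such observations as zero-mean Gaussian pulls the fused estimate upward by roughly that amount.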

(This article belongs to the Section Computing and Artificial Intelligence)

43 pages, 4986 KB

Open Access Article

Enhanced Data Security in Metadata-Governed Cloud IOT Using Optimized Provenance and Access Control Through MARShield, ThreshGuard and SentinelScheduler

by Abbi Kala, Mahalakshmi Guruvayur Suryanarayanan and Sendhilkumar Selvaradjou

Appl. Sci. 2026, 16(12), 6280; https://doi.org/10.3390/app16126280 (registering DOI) - 22 Jun 2026

Viewed by 202

Abstract

Manual data storage methods on various mobile devices, IoT devices, and traditional computing platforms still lack sufficient security governance due to the absence of a unified security framework. Unlike application-controlled environments, manual storage locations such as file systems, removable media, and IoT devices are highly susceptible to unauthorized access, misuse, and exfiltration. To address this problem, the paper proposes a metadata-based security framework for manual storage systems comprising three algorithms that operate together: MARShield, ThreshGuard, and SentinelScheduler. MARShield enforces immutable metadata, token-based multi-access rights, and persistent source tracking by cryptographically securing provenance logs. ThreshGuard provides adaptive threshold-based misuse regulation and bottleneck-controlled serialized execution. SentinelScheduler optimizes the use of cryptography by incorporating trust-based application profiling and idle-time scheduling for heavy security operations. The proposed methodology is evaluated using a hybrid approach combining real-world datasets (CIC-IoT2023, TON-IoT, Bot-IoT and ISCX VPN non-VPN) and dataset-driven synthetic access pattern generation. Real datasets are used to model realistic IoT traffic behaviors, while additional synthetic scenarios are introduced to evaluate adaptability against evolving and previously unseen attack patterns. Network-level features from these datasets are systematically transformed into storage-level access behaviors to evaluate metadata-driven access control. The experimental results indicate improved detection accuracy (94.6%), reduced false positive rate (4.3%), improved misuse control efficiency (92%) and scalability (94%).
The proposed methodology for securing manual storage domains is scalable, adaptive, and portable, extending the security of applications and their associated domains. Full article

28 pages, 6207 KB

Open Access Article

Machine Learning-Driven Rapid Optimization of Solar Power Plant Sizing Using HOMER-Generated Synthetic Scenarios

by Nazım Elmalı and Cemil Altın

Sustainability 2026, 18(12), 6364; https://doi.org/10.3390/su18126364 (registering DOI) - 22 Jun 2026

Viewed by 279

Abstract

Solar power plants are among the most widely used renewable energy sources today. Because radiation levels vary from region to region and consumption varies from user to user within a region, optimal sizing of these plants is challenging. In this study, we developed a machine learning-based surrogate model for the real-time sizing optimization of solar power plants, trained on an original dataset. In the first stage, 500 different solar power plant installation scenarios were synthetically generated and evaluated in HOMER, and the resulting optimal sizing outputs were used as training targets for the surrogate model in place of real operational data. The results obtained by applying various machine learning methods to the generated dataset are presented comparatively. Among seven machine learning models, XGBoost, Gradient Boosting, and LightGBM performed best. The developed model achieved an average R² score of 0.9425 across the three targets, with target-specific R² scores of 0.9747 for inverters, 0.9365 for PV panels, and 0.9165 for batteries. This model serves as a computationally efficient surrogate of the HOMER optimization process, enabling high-accuracy real-time predictions while significantly reducing the computational burden associated with intensive mathematical calculations, iterative procedures, and complex search spaces. Full article
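
As a quick illustration of how the reported scores relate, the sketch below defines the coefficient of determination and averages the three per-target values quoted in the abstract. The `r2_score` helper and the dictionary keys are names chosen for illustration, not the authors' code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Per-target R² values as quoted in the abstract.
reported = {"inverter": 0.9747, "pv_panels": 0.9365, "battery": 0.9165}
mean_r2 = sum(reported.values()) / len(reported)
print(mean_r2)
```

The unweighted mean of the rounded per-target values agrees with the reported average of 0.9425 to within rounding. Any small difference in the last digit is presumably because the reported average was computed from unrounded scores.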

(This article belongs to the Special Issue AI-Driven Low-Carbon Sustainable Energy Systems: System Design, Computational Strategies, and Emerging Innovations)

10 pages, 223 KB

Open Access Review

Generative AI and Language Models in Human Genetics and Health: From Variant Interpretation to Clinical Decision Support

by Yael Pinchevsky Itan and Yuval Itan

Genes 2026, 17(6), 723; https://doi.org/10.3390/genes17060723 (registering DOI) - 22 Jun 2026

Viewed by 142

Abstract

Generative artificial intelligence (AI) is transforming biological and medical research and data analysis. Beyond analyzing existing information, these models can learn complex patterns and generate new data such as realistic protein sequences, genetic variants, or clinical notes. In molecular biology, language-like sequence models can read and generate DNA, RNA, and amino acid sequences to predict genetic variant effects, design new proteins, and explore molecular functions. In medicine, large language models (LLMs) trained on biomedical literature and electronic health records (EHRs) can summarize clinical findings, identify patterns, and provide decision support for clinicians and healthcare providers. Additionally, synthetic data generation can help protect patient privacy and augment existing disease datasets. While these advances make tasks that were previously impractical possible at scale, they also carry major risks, including producing convincing but incorrect results, reflecting hidden biases in the training data, and underperforming when real-world conditions change. Full article

(This article belongs to the Section Technologies and Resources for Genetics)

24 pages, 4627 KB

Open Access Article

A State Space Model-Driven Feature Disentanglement Network for Real-Time Detection of Morphologically Complex Insect Pests in Agricultural Fields

by Jiaren Sun, Yating Jiang, Shuai Teng, Zongchao Liu and Nuo Chen

Modelling 2026, 7(3), 122; https://doi.org/10.3390/modelling7030122 (registering DOI) - 21 Jun 2026

Viewed by 150

Abstract

Accurate detection of field insect pests remains a significant challenge for precision agriculture due to the elongated and variable morphology of the target organisms, their frequent resemblance to complex background textures, and the long-tail distribution of species in natural datasets. While deep convolutional neural networks (CNNs) have advanced the field, they are often constrained by a limited effective receptive field and the entanglement of semantic and spatial features, which can lead to elevated false-positive rates and missed detections for low-contrast or rare targets. This paper introduces a novel detection framework that integrates state space modeling with multi-stream feature disentanglement to address these limitations. First, a visual state space module is employed as the backbone feature extractor, enabling the establishment of a global receptive field with linear computational complexity and thereby improving the perception of long-range morphological structures. Second, a Topological Feature Disentanglement Pyramid Network is proposed. This architecture explicitly separates feature representations into semantic and spatial streams and recombines them through graph convolutional interactions, which serves to suppress background interference and enhance localization precision. A meta-auxiliary detection head, active only during training, is introduced to amplify supervision signals for hard, low-contrast samples via adversarial gradient modulation. Furthermore, an implicit neural radiance field augmentation pipeline is used to generate physically consistent synthetic views of underrepresented pest classes, mitigating the negative effects of long-tail data distributions. 
Experimental evaluations on the public BAU-Insectv2 benchmark demonstrate that the proposed method achieves a mean average precision (mAP@0.5) of 81.8%, representing a 4.4-percentage-point improvement over a comparable baseline, while maintaining a compact parameter count of 2.33 M and an inference speed of 178.6 FPS. The framework exhibits particular efficacy in detecting elongated, minute, and rare pests, suggesting a promising technical approach for real-time, field-based pest surveillance in precision agriculture. Full article

26 pages, 8518 KB

Open Access Article

CVA-Net: Multi-View 3D Reconstruction for Fringe Projection Profilometry via Cross-View Attention and Sim2Real Learning

by Zuqiong Chen, Xiaopin Zhong and Yibin Tian

Photonics 2026, 13(6), 601; https://doi.org/10.3390/photonics13060601 (registering DOI) - 21 Jun 2026

Viewed by 194

Abstract

Fringe projection profilometry (FPP) is widely used for 3D reconstruction, but conventional single-view FPP systems suffer from inherent occlusions and shadow regions, leading to incomplete surface recovery. In this study, we propose CVA-Net, an end-to-end deep learning framework with cross-view attention (CVA) that directly reconstructs dense depth maps from multi-view fringe patterns. CVA-Net simultaneously processes four fringe images acquired from orthogonal projection directions and leverages a CVA module to explicitly model inter-view dependencies, enabling adaptive fusion of complementary information. A 3D U-Net backbone with attention gates, atrous spatial pyramid pooling (ASPP), and an auxiliary parameter estimation branch further enhances reconstruction accuracy and structural consistency via multitask learning. To support Sim2Real network training, we build a Blender-based digital twin of a multi-view FPP system and generate a large-scale synthetic dataset with perfect ground truth. Extensive experiments on both synthetic and real-world objects demonstrate that CVA-Net significantly outperforms state-of-the-art single-view methods. With a symmetric four-view configuration and fringe period of 8, CVA-Net achieves an MAE of 0.0359 mm, an MSE of 0.0379 mm² and an RMSE of 0.1947 mm, reducing the MAE, MSE, and RMSE by 32.8%, 54.1%, and 32.2%, respectively, compared to the best single-view competitor. Ablation studies validate the contribution of each architectural component, while real-system experiments demonstrate the feasibility of transferring a network trained purely on synthetic data to practical FPP measurements without domain adaptation. Although further improvements are required to enhance reconstruction accuracy under real imaging conditions, the proposed framework provides an effective initial step toward bridging the gap between digital-twin-based training and real-world multi-view FPP applications. 
CVA-Net provides a robust, occlusion-aware solution for multi-view FPP reconstruction. Full article

(This article belongs to the Special Issue Optical Imaging for 3D Surface and Phase Recovery: Techniques and Applications)

21 pages, 673 KB

Open Access Review

Bridging Ancestry-Stratified Bias in Pharmacogenomics AI: Toward Metabolomics-Inclusive Multi-Omics Precision Medicine

by Heayyean Lee, Khadijah Sajid and Dayeon Lee

J. Pers. Med. 2026, 16(6), 332; https://doi.org/10.3390/jpm16060332 (registering DOI) - 20 Jun 2026

Viewed by 188

Abstract

Pharmacogenomics AI offers significant potential for individualized drug therapy; however, its clinical benefits remain unevenly distributed. Models trained predominantly on European-ancestry data consistently underperform in non-European populations, with polygenic risk scores (PRS) showing an estimated 39–73% reduction in predictive accuracy in African-ancestry cohorts across complex traits. These disparities have driven increased interest in moving beyond single-layer genomic approaches. Multi-omics frameworks integrating genomic, transcriptomic, proteomic, and metabolomic data have emerged as a promising strategy to improve prediction across heterogeneous clinical populations, as each molecular layer provides distinct and complementary biological information. Among these layers, metabolomics may represent a particularly transferable component across populations. Metabolite profiles capture the downstream functional output of biological systems influenced by genetic, environmental, dietary, and microbiome-related factors, and may therefore be less reliant on ancestry-stratified allele frequency structures that underlie performance disparities in genomic models. This review synthesizes evidence regarding the mechanistic basis of genomic bias in pharmacogenomics AI, the emerging role of multi-omics integration, especially metabolomics, in improving predictive performance, and the current landscape of computational strategies for bias mitigation, including federated learning, transfer learning, domain adaptation, and synthetic data generation. Collectively, current evidence supports metabolomics-inclusive multi-omics frameworks as a biologically plausible, hypothesis-generating strategy to reduce reliance on ancestry-linked genomic features. However, direct evidence that such frameworks reduce ancestry-related bias in clinical AI outputs remains limited, underscoring the need for globally diverse datasets and prospective multi-population validation. Full article

(This article belongs to the Section Omics/Informatics)

26 pages, 357 KB

Open Access Article

A Reproducible Synthetic Socio-Digital Network Dataset for Analyzing Digital Gaps in Community-Based Tourism Communities in Rural Ecuador

by Dolores Mieles-Cevallos, Lourdes Suntagsi-Tuasa, Jael Zambrano-Mieles, Velasco Zambrano-Burgos, Miguel Vera, Nicolás Márquez and Cristian Vidal-Silva

Data 2026, 11(6), 151; https://doi.org/10.3390/data11060151 (registering DOI) - 20 Jun 2026

Viewed by 181

Abstract

Digital transformation has become an essential component of sustainable rural development, yet substantial inequalities persist in how communities access, adopt, and benefit from digital technologies. Understanding these disparities requires not only information about technological resources but also knowledge of the relational structures through which information, support, and opportunities circulate. This article presents a reproducible synthetic socio-digital network dataset designed to support the analysis of digital gaps in community-based tourism (CBT) environments. Rather than containing original respondent-level observations, the repository was computationally reconstructed from aggregate statistics derived from field studies conducted in three rural communities in the province of Guayas, Ecuador: Bucay (5 de Septiembre), Manglares Churute, and Ruta de los Chirijos. All node-level records, survey variables, and support relationships included in the repository were synthetically generated to preserve aggregate community characteristics while protecting participant confidentiality and preventing individual re-identification. The repository contains synthetic actor metadata, reconstructed socio-digital variables, directed support networks, graph representations in interoperable formats, and precomputed Social Network Analysis (SNA) indicators. The dataset includes 90 synthetic actors, more than one thousand generated support interactions distributed across multiple socio-digital dimensions, machine-readable metadata, and reusable scripts for preprocessing, validation, graph construction, and metric computation. The represented dimensions include financial assistance, training support, information exchange, technological support, social media promotion, institutional collaboration, trust, and emotional closeness. To facilitate reuse, all resources are distributed in standardized formats compatible with NetworkX, Gephi, Neo4j, and graph-learning frameworks. 
The repository follows FAIR principles and includes documentation intended to support transparency, reproducibility, and methodological benchmarking. Potential applications include social network analysis, graph mining, graph neural networks, digital inequality research, computational social science, community resilience studies, and educational activities. By providing an openly documented synthetic dataset and reproducible computational workflow, the repository contributes to the study of socio-digital systems, privacy-preserving data sharing, and community-level digital transformation processes. Full article
