Review

Artificial Intelligence in Ocular Transcriptomics: Applications of Unsupervised and Supervised Learning

by Catherine Lalman 1,2, Yimin Yang 3 and Janice L. Walker 1,2,4,*

1 Department of Pathology and Genomic Medicine, Thomas Jefferson University, Philadelphia, PA 19107, USA
2 Sidney Kimmel Medical School, Thomas Jefferson University, Philadelphia, PA 19107, USA
3 Department of Electrical and Computer Engineering, Western University, London, ON N6A 3K7, Canada
4 Department of Ophthalmology, Thomas Jefferson University, Philadelphia, PA 19107, USA
* Author to whom correspondence should be addressed.
Cells 2025, 14(17), 1315; https://doi.org/10.3390/cells14171315
Submission received: 25 July 2025 / Revised: 19 August 2025 / Accepted: 24 August 2025 / Published: 26 August 2025

Abstract

Transcriptomic profiling is a powerful tool for dissecting the cellular and molecular complexity of ocular tissues, providing insights into retinal development, corneal disease, macular degeneration, and glaucoma. With the expansion of microarray, bulk RNA sequencing (RNA-seq), and single-cell RNA-seq technologies, artificial intelligence (AI) has emerged as a key strategy for analyzing high-dimensional gene expression data. This review synthesizes AI-enabled transcriptomic studies in ophthalmology from 2019 to 2025, highlighting how supervised and unsupervised machine learning (ML) methods have advanced biomarker discovery, cell type classification, and eye development and ocular disease modeling. Here, we discuss unsupervised techniques, such as principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and weighted gene co-expression network analysis (WGCNA), now the standard in single-cell workflows. Supervised approaches are also discussed, including the least absolute shrinkage and selection operator (LASSO), support vector machines (SVMs), and random forests (RFs), and their utility in identifying diagnostic and prognostic markers in age-related macular degeneration (AMD), diabetic retinopathy (DR), glaucoma, keratoconus, thyroid eye disease, and posterior capsule opacification (PCO), as well as deep learning frameworks, such as variational autoencoders and neural networks that support multi-omics integration. Despite challenges in interpretability and standardization, explainable AI and multimodal approaches offer promising avenues for advancing precision ophthalmology.

1. Introduction

Understanding the cellular and genetic landscape of ocular tissues is fundamental to advancing knowledge of eye development, physiology, and disease. The eye is a complex organ composed of many different tissues, including the retina, the retinal pigment epithelium (RPE), the choroid, and the optic nerve in the posterior eye and the cornea, the lens, the ciliary body, the iris, the aqueous humor, and the trabecular meshwork in the anterior segment [1,2,3]. As the eye develops from different embryonic tissues, including ectoderm (both surface and neuroectoderm) and mesoderm, with important contributions from neural crest cells and mesenchymal cells, a complex and delicate network of genomic interactions must be precisely regulated to ensure normal morphogenesis and functional differentiation for proper visual function [1,4,5]. Given this spatial and cellular heterogeneity, transcriptomic analysis approaches, including microarrays and bulk and single-cell RNA-seq, have emerged as powerful strategies for investigating ocular biology at multiple scales. For example, bulk RNA-seq has enabled transcriptome-wide profiling across many ocular tissues and revealed age-related inflammatory genes in the choroid and dynamic changes in the Wnt, Hedgehog, and Notch pathways during early retinal progenitor cell proliferation [6,7]. More recently, single-cell RNA-seq (scRNA-seq) has generated high-resolution gene expression maps of the retina, the RPE, and the choroid, enabling the identification and molecular characterization of distinct cell populations [8,9,10,11]. Transcriptomic approaches have also provided valuable insights into the molecular pathogenesis of numerous eye diseases, including keratoconus, glaucoma, thyroid eye disease (TED), age-related macular degeneration (AMD), and diabetic retinopathy (DR). Microarray transcriptomic studies have identified disease-associated gene signatures in glaucoma, TED, and DR [12,13,14]. Bulk RNA-seq has implicated dysregulation of the Wnt and Notch1 pathways in the corneal degradation observed in keratoconus [15]. Similarly, scRNA-seq analyses have revealed inflammatory and immune-related transcriptional signatures that may contribute to DR severity and AMD progression [16,17].
However, while transcriptomic analyses have significantly expanded our understanding of differential gene expression in the eye, the datasets they generate often comprise tens of thousands of genes measured across hundreds to millions of cells [18]. Furthermore, these datasets are frequently noisy and exhibit nonlinear structures, which presents challenges for conventional analytic approaches [19]. To address this, many AI algorithms, ranging from unsupervised techniques to various ML algorithms, have been developed and employed. While statistical transcriptomics focuses on detecting differentially expressed genes using predefined models, AI-based approaches uncover complex, nonlinear patterns and enable predictive analysis of high-dimensional gene expression data. These approaches have been used to cluster retinal cell types in single-cell atlases, identify ferroptosis-related gene signatures in keratoconus, and classify transcriptomic profiles in glaucoma using predictive models [8,10,20,21]. As such, applying AI algorithms to large transcriptomic datasets offers an opportunity to uncover hidden regulatory networks and identify disease-associated biomarkers more efficiently. In fact, these strategies have been successfully applied in identifying genes related to the progression of AMD, DR, and keratoconus [17,21,22,23,24].
Despite the growing number of studies applying AI to ocular transcriptomic datasets, to the best of our knowledge, there are currently no comprehensive reviews synthesizing these approaches across multiple transcriptomic modalities and disease contexts in the eye [25,26,27]. In contrast, several articles have reviewed how AI can be used to clinically diagnose ocular diseases, including glaucoma, keratoconus, DR, and AMD [28,29]. Because clinical diagnosis is beyond the focus of this review, the reader is referred to those reviews for details [28,29]. Importantly, none of them provides a unified synthesis spanning bulk RNA-seq, scRNA-seq, and microarray data that integrates both machine learning and deep learning approaches across the full spectrum of ocular diseases. By consolidating these advances into a single, structured framework, our review fills this gap and offers researchers a comprehensive and methodologically diverse reference on the topic. This is timely and useful considering how quickly AI-based technologies are being integrated to reshape big “omic” data analyses in the eye, providing both new opportunities and potential challenges, all of which are considered and discussed in this review.
We conducted a structured literature search of PubMed, Scopus, and Google Scholar for articles published between 1 January 2019, and 1 August 2025. Search terms combined keywords related to transcriptomic modalities and AI methods: transcriptomics, RNA-seq, microarray, machine learning, deep learning, artificial intelligence, and ophthalmology. Boolean operators and truncation were applied to broaden results (e.g., (“RNA-seq” OR transcriptomics) AND (“machine learning” OR “deep learning”) AND ophthalmology). Eligible studies included primary research applying AI (machine learning or deep learning) to microarray, bulk RNA-seq, or single-cell RNA-seq data in ocular tissues or diseases using either human or animal data with translational relevance to ophthalmology. Review articles were also consulted for background information and to identify additional relevant primary studies. We excluded commentaries, conference abstracts, preprints, unpublished studies, non-English publications, and research not involving transcriptomic data or unrelated to ophthalmology. All titles and abstracts were screened by a single reviewer, with full texts assessed to confirm eligibility. Quality assessment included recording study design, dataset size, transcriptomic modality, AI approach, validation strategy, and reproducibility measures (e.g., code or data availability). While nearly all studies described internal validation (e.g., cross-validation, train/test splits), external validation using independent cohorts was uncommon. We distinguished between openly accessible custom pipelines (e.g., GitHub repositories or dedicated websites) and cases where authors relied on standard public libraries (e.g., scikit-learn, Seurat) or indicated that the code was “available upon request.” Only the former was classified as publicly available code.
We begin by outlining transcriptomic modalities, including microarray, bulk RNA-seq, and single-cell RNA-seq, to provide context for each data structure’s advantages and analytical challenges. We then describe the AI approaches that have been applied to these data, including unsupervised shallow learning, unsupervised deep learning, supervised shallow learning, and supervised deep learning, together with their roles in data preprocessing, pattern discovery, and predictive modeling. Finally, we synthesize these applications across several areas of ophthalmic research, including corneal disorders, retinal development, macular degeneration, diabetic retinopathy, glaucoma, thyroid eye disease, and posterior capsule opacification, emphasizing the biological insights that were identified. Figure 1 summarizes these AI approaches and the associated biological and clinical outputs.

2. Transcriptomic Modalities

2.1. Microarray

Microarray transcriptomic profiling, which was one of the earliest high-throughput approaches for measuring genome-wide gene expression, uses pre-designed DNA probes fixed to a chip, allowing only known transcripts to be detected [30,31]. Although microarray use has largely given way to RNA-seq technologies, which involve direct sequencing, microarrays remain valuable due to their cost-effectiveness, standardized platforms, and broad availability of archived datasets [32]. Traditional analytic pipelines for microarray data already incorporate various computational tools, such as limma for normalization and DAVID or GSEA for downstream pathway enrichment [33,34,35,36]. However, microarray datasets are sensitive to both technical and biological sources of variability, requiring robust normalization and correction to ensure comparability [37]. Variability can be introduced during sample collection, RNA extraction, labeling, hybridization, and scanning, potentially obscuring true biological differences if not carefully controlled [38,39]. The quality of the sample itself, including RNA integrity and yield, further contributes to noise and technical variance [40]. Batch effects are another major source of confounding bias in microarray analyses and must be addressed through statistical correction [41]. Moreover, the lower dynamic range of microarray platforms and their inability to detect novel transcripts or isoforms can constrain biological interpretation [42].
In recent years, AI-based methods have been applied to address these challenges by enhancing signal extraction, reducing noise through feature selection and dimensionality reduction, and integrating multi-cohort datasets to improve robustness. However, their performance ultimately depends on the quality of the underlying data, underscoring the need for rigorous normalization, batch correction, and standardized processing. For example, Suo et al. utilized limma to normalize data and correct for batch effects to integrate four different GEO microarray datasets. They then utilized SVM and RF to identify genes involved in the pathogenesis of open-angle glaucoma. Similarly, Wu et al. also used classification models, including support vector machine–recursive feature elimination (SVM-RFE), to identify ferroptosis-related genes that may play a significant role in the development of keratoconus [12,21]. As such, microarray data remain a valuable resource for hypothesis generation and validation, particularly when coupled with AI-based modeling.
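The SVM-RFE strategy mentioned above can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic expression matrix with hypothetical labels; it is not the pipeline used by Suo et al. or Wu et al., and the sample size, gene count, and effect size are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for a normalized expression matrix (samples x genes).
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat([0, 1], n_samples // 2)  # hypothetical labels: 0 = control, 1 = disease

# Assumption for illustration: the first five genes are shifted in the disease group.
X[y == 1, :5] += 3.0

# A linear kernel exposes per-gene weights (coef_), which RFE uses for ranking.
svm = SVC(kernel="linear", C=1.0)

# Recursively drop the 10% lowest-ranked genes until five remain.
selector = RFE(estimator=svm, n_features_to_select=5, step=0.1)
selector.fit(X, y)

selected_genes = np.flatnonzero(selector.support_)
print("Indices of retained genes:", selected_genes)
```

In practice, the elimination would be wrapped in cross-validation so that the reported gene set does not depend on a single split of the data.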

2.2. Bulk RNA-Seq

Bulk RNA-seq was the first widely adopted sequencing-based high-throughput transcriptomic tool, and it has been used to analyze the gene expression patterns in mixed cell populations at scale [43]. In ophthalmology and vision research, bulk RNA-seq has been extensively applied to investigate gene expression changes that occur with keratoconus, glaucoma, cataract development, and retinal degeneration [44,45,46,47]. Traditionally, these studies have relied on established computational pipelines in which reads are mapped and quantified and the resulting counts are analyzed using tools like DESeq2, limma, edgeR, and baySeq [33,48,49,50]. Differential expression analysis is then typically performed to identify statistically significant gene changes between conditions, while principal component analysis is frequently used for exploratory data visualization and dimensionality reduction [51]. Furthermore, tools like DAVID and clusterProfiler are widely used for functional enrichment analysis, enabling researchers to assign differentially expressed genes to biological processes, molecular functions, and canonical pathways, such as those in Gene Ontology (GO) or KEGG [34,52]. Although RNA-seq’s scalability and cost-effectiveness have enabled researchers to analyze a wide range of biological systems, the technique’s complex workflow presents numerous challenges that must be addressed to ensure the extraction of meaningful biological insights [53,54]. At any stage of the RNA-seq pipeline, technical or analytical issues can introduce bias into the dataset [54]. Examples include intergroup sample variability, sample-to-sample variation, batch effects, and the misuse of normalization methods. Tissue heterogeneity, in particular, remains a significant source of confounding variation, as differences in cellular composition can obscure true biological signals [53,55,56,57]. These sources of variation can generate spurious associations and limit the reproducibility of transcriptomic findings. Addressing them requires standardization of experimental protocols and rigorous quality control measures, such as RNA integrity assessment, normalization, batch effect correction, and computational deconvolution, to ensure data are both comparable and biologically meaningful.
While these statistical tools have proven effective in identifying enriched pathways and gene groups, they often fall short when dealing with high-dimensional, noisy datasets and subtle, nonlinear regulatory relationships. Moreover, traditional enrichment approaches frequently return thousands of significant pathways, leaving researchers with limited guidance on how to prioritize biological relevance [58].
AI-based approaches, including ML and DL, offer more scalable solutions by uncovering patterns not detectable through linear methods and improving feature selection in the context of thousands of correlated variables [59,60,61]. These methods enhance exploratory and predictive power in transcriptomic studies by enabling the identification of latent biological structure and robust biomarkers [62].
This shift is exemplified in ophthalmic studies. For instance, Cheng et al. used limma and DESeq2 to normalize data and identify differentially expressed genes in corneal and keratoconus datasets, followed by KEGG and GO enrichment via clusterProfiler. Their workflow incorporated sample quality control through RNA integrity assessment and normalization, which helped reduce technical variation. LASSO and SVM-RFE were applied to select key diagnostic biomarkers that were integrated into a predictive nomogram [63]. Similarly, Huang et al., combining edgeR and shinyGO with supervised ML algorithms, implemented batch effect correction to mitigate site-specific biases and improve feature prioritization [20]. Wang et al. employed limma and clusterProfiler for normalization and enrichment, followed by supervised ML, and addressed tissue heterogeneity by using computational deconvolution to adjust for cell type composition [60]. These examples illustrate a broader methodological transition, as AI is no longer a supplementary tool but increasingly forms the analytical backbone for interpreting bulk RNA-seq data at scale. However, the accuracy and generalizability of AI models depend heavily on the quality and consistency of the underlying biological data. Incorporating rigorous quality control and standardization measures, such as RNA integrity assessment, normalization, batch effect correction, and computational deconvolution, ensures that these models are trained on reproducible, biologically meaningful inputs, ultimately improving predictive accuracy and interpretability.
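To make the LASSO step concrete, the following minimal sketch fits an L1-penalized logistic regression to a synthetic expression matrix and reports the genes with nonzero coefficients. It assumes scikit-learn; the data, labels, and penalty strength (C = 0.1) are hypothetical and are not taken from any of the cited studies, which would normally tune the penalty by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for normalized bulk RNA-seq data (samples x genes).
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 100))
y = np.repeat([0, 1], 40)  # hypothetical labels: 0 = control, 1 = disease

# Assumption for illustration: three genes differ between the groups.
X[y == 1, :3] += 2.5

# Scale genes so that the L1 penalty treats them comparably.
X_scaled = StandardScaler().fit_transform(X)

# The L1 penalty drives most coefficients to exactly zero (smaller C = stronger penalty).
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X_scaled, y)

nonzero_genes = np.flatnonzero(lasso.coef_[0])
print("Genes with nonzero LASSO coefficients:", nonzero_genes)
```

The surviving genes are candidate biomarkers only; as emphasized above, their value depends on the quality of the input data and on validation in independent cohorts.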

2.3. scRNA-Seq

Since its first application in 2009, where Tang et al. demonstrated that transcriptome-wide mRNA profiling could be successfully performed at the resolution of a single cell, scRNA-seq has become an essential tool for unraveling cellular heterogeneity [64]. Unlike bulk RNA-seq, which provides averaged expression profiles across mixed populations, scRNA-seq enables transcriptomic profiling at single-cell resolution, revealing distinct cellular subtypes, lineage trajectories, and cell state transitions. This is especially useful for analyzing tissues in the eye, where the resolution of scRNA-seq enables fine-grained analysis of gene expression changes across diverse cell types in the retina and the RPE/choroid [9]. Although the matrices generated by scRNA-seq are information-rich, they are often unwieldy, with tens of thousands of gene features per cell and sample sizes that can reach into the millions of cells [43]. This introduces major analytical challenges, including noise, nonlinear structure, and difficulty in clustering or labeling distinct cell populations [65,66]. Furthermore, factors like donor-to-donor variability, RNA degradation, uneven cell viability, and dissociation-induced gene expression changes can alter cell type proportions and confound clustering or classification results, ultimately influencing downstream AI model performance [67,68,69]. As such, unsupervised AI techniques, particularly clustering and dimensionality reduction, including PCA, t-SNE, and Louvain or Leiden clustering algorithms, are now embedded in nearly all scRNA-seq analysis pipelines and have become standard practice for identifying structures in high-dimensional single-cell data [65,70,71,72]. Other tools, such as Seurat and Monocle3, support normalization and pseudotime analysis, while LIGER enables cross-condition and cross-species integration of single-cell datasets [10,73,74].
Supervised AI methods have been increasingly applied to scRNA-seq datasets in ophthalmology, both to improve data processing steps, such as cell type clustering and labeling, and to enhance classification accuracy and biomarker identification. For example, Miao et al. introduced the SCCAF (Single-Cell Clustering Assessment Framework), a supervised learning approach that iteratively refines cluster identities by training a classifier on gene expression features, allowing for the identification of novel and previously unannotated retinal cell types [65]. Zhang et al. applied seven supervised learning algorithms to identify genes predictive of developmental stages across eight fetal retinal cell types [74]. These examples demonstrate how supervised learning enables predictive modeling and cell type inference in complex single-cell datasets, complementing unsupervised techniques by adding quantitative rigor to cell classification and gene prioritization.

3. Artificial Intelligence Approaches

3.1. Unsupervised Machine Learning

3.1.1. Shallow Methods: PCA, Clustering, WGCNA

Unsupervised ML refers to computational methods that analyze and organize data without predefined labels or outcomes [75]. In transcriptomic analysis, these techniques are often used to uncover hidden patterns, group similar cells or samples, reduce dimensionality, or infer biological structure from high-dimensional gene expression data [18,43,66,76]. This is particularly valuable in transcriptomics, where each gene represents a dimension in a high-dimensional expression space [77]. Furthermore, many of the unsupervised ML methods currently utilized are considered shallow learning methods that rely on relatively simple model architectures and fewer parameters compared to deep learning [62,78]. These shallow methods are often sufficient for capturing lower-level structures in the data and are widely used due to their interpretability and efficiency on smaller datasets [79]. In bulk RNA-seq ocular studies, linear dimensionality reduction techniques, such as PCA, are used to visualize samples with similar gene expression profiles by identifying combinations of variables that explain the greatest variance across samples [63,66,80,81,82]. Similarly, given that thousands of genes among tens of cell population clusters may be studied in a single scRNA-seq experiment, nonlinear dimensionality reduction techniques, such as t-SNE and UMAP, which calculate similarity scores between each pair of points within a dataset before projecting the data into a lower-dimensional space, are commonly applied [9,19,76,83]. Because PCA relies on linear projections, it is limited in its ability to capture nonlinear relationships in complex or noisy transcriptomic data [84]. While t-SNE is more effective than PCA at capturing nonlinear relationships, it often fails to preserve global structure, making it difficult to interpret trajectories or distances between clusters. In contrast, UMAP tends to better preserve both local and global structure, but its performance can be less stable on small datasets [85]. Another challenge in processing scRNA-seq data is that technical noise can be misinterpreted as true gene expression patterns. To address this, several imputation tools have been developed. One example is MAGIC, which improves data quality by constructing a graph of cells with similar expression profiles and averaging gene expression values of closely related cells in a process called diffusion to correct for outliers and smooth out noise [86,87]. Finally, batch effects, where gene expression levels vary systematically across different samples, can be corrected with unsupervised integration methods, such as Harmony, which aligns single-cell datasets across different batches or conditions by iteratively adjusting low-dimensional embeddings to minimize technical variation [88,89,90].
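As a brief illustration of the linear projection underlying PCA, the sketch below computes principal component scores from a centered expression matrix using the singular value decomposition. It assumes only NumPy; the function name `pca` and the synthetic two-group dataset are introduced here for illustration.

```python
import numpy as np

def pca(expr, n_components=2):
    """Project a samples x genes matrix onto its top principal components."""
    centered = expr - expr.mean(axis=0)           # center each gene
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic data: 20 samples x 500 genes, with 50 genes shifted in the second group.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 500))
expr[10:, :50] += 3.0

scores, explained = pca(expr, n_components=2)
print("Variance explained by PC1 and PC2:", explained)
```

Because each component is a weighted sum of genes, the loadings remain interpretable, which is one reason PCA is usually run before nonlinear methods such as t-SNE or UMAP.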
Wang et al. (2022) used Harmony in their multiome study to identify cell types. Several unsupervised clustering techniques exist, including k-means clustering, hierarchical clustering, graph-based clustering, and density-based clustering [91,92]. DEGreport, for example, utilizes hierarchical clustering, which calculates the correlation between the expression profiles of genes to help visually identify groups of genes that share similar expression patterns [93,94,95]. Wang et al. (2023) used this method to identify clusters of genes that were consistently expressed with the progression of AMD, while Wang et al. (2022) used DEGreport to identify genes related to the progression of DR [16,17]. WGCNA, on the other hand, is a slightly more complex gene clustering tool that constructs a co-expression network by first calculating pairwise correlations, then refining gene relationships using a topological overlap matrix (TOM), and finally applying hierarchical clustering to prune connections that may not be as integral to disease states or other clinical traits [96]. In particular, WGCNA not only groups genes that demonstrate similar regulation patterns but also quantifies the relationships between the genes that are expressed within a sample. For example, Dong et al. utilized WGCNA to identify genes associated with Sjogren’s syndrome that may also be associated with keratoconjunctivitis sicca development [81]. WGCNA was also employed by Huang et al. and Ma et al. to identify key CD8+ T-cell-related genes in DR and to confirm that ML-prioritized genes are a part of GO pathways associated with AMD, respectively [97]. While hierarchical clustering algorithms have been applied in both bulk RNA-seq and scRNA-seq studies, graph-based clustering techniques, such as Leiden, play an integral role in scRNA-seq experiments and are used to identify cell populations that share expression profiles [65,71].
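The WGCNA steps described above (pairwise correlation, topological overlap, hierarchical clustering) can be sketched in a few lines. This is a simplified illustration assuming NumPy and SciPy, not the WGCNA R package itself: it uses an unsigned network with a soft-thresholding power of 6 (an assumed value), the standard unsigned TOM formula, and a fixed number of modules instead of dynamic tree cutting. The synthetic data contain two hypothetical co-expression modules.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def tom_similarity(expr, beta=6):
    """Unsigned topological overlap for a samples x genes matrix."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta            # soft-thresholded adjacency
    np.fill_diagonal(adj, 0.0)
    shared = adj @ adj                    # connection strength via shared neighbors
    k = adj.sum(axis=1)                   # connectivity of each gene
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

# Synthetic data: two modules of 10 genes, each driven by its own latent factor.
rng = np.random.default_rng(2)
n_samples = 50
factor_a = rng.normal(size=(n_samples, 1))
factor_b = rng.normal(size=(n_samples, 1))
expr = np.hstack([
    factor_a + 0.3 * rng.normal(size=(n_samples, 10)),
    factor_b + 0.3 * rng.normal(size=(n_samples, 10)),
])

# Cluster genes on TOM dissimilarity with average linkage.
dissim = 1.0 - tom_similarity(expr)
dissim = (dissim + dissim.T) / 2.0        # remove floating-point asymmetry
np.fill_diagonal(dissim, 0.0)
tree = linkage(squareform(dissim, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print("Module assignment per gene:", labels)
```

In a full analysis, each module would then be summarized by an eigengene and correlated with clinical traits.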
Another important application of unsupervised algorithms in scRNA-seq analysis is pseudotime inference, which computationally models dynamic cellular processes, such as differentiation. In this framework, cells are projected onto a trajectory graph, where each branch represents a potential lineage. The position of each cell along the trajectory defines its pseudotime, the relative measure of a cell’s progress through a biological process, inferred from gene expression patterns rather than chronological time. Many different algorithms exist, which differ in their ability to capture branching trajectories, resolve fine temporal differences, handle complex topologies, and integrate prior biological knowledge or covariates into the modeling process [98]. For example, Monocle3 reconstructs developmental trajectories by using UMAP to embed cells into low-dimensional space, constructing a graph that follows the shape of the data, and walking that graph to order cells along pseudotime. Palantir, in contrast, models cell fate as a probabilistic process, using diffusion-based graphs and Markov chains to calculate both pseudotime and the likelihood of each cell adopting a particular fate [99]. While analyzing the differentiation of retinal neuronal cells, Li et al. used a combination of Monocle3 and Palantir to reconstruct their developmental trajectories [73].
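The idea of ordering cells by walking a graph can be illustrated with a deliberately simplified sketch: cells are connected to their nearest neighbors in a low-dimensional embedding, and pseudotime is taken as the shortest-path distance from a chosen root cell. This assumes SciPy and scikit-learn and is neither Monocle3 nor Palantir; the embedding, the number of neighbors, and the choice of root are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

# Synthetic 2D embedding: cells placed along a curved differentiation path.
rng = np.random.default_rng(3)
n_cells = 200
true_time = np.linspace(0.0, 1.0, n_cells)
embedding = np.column_stack([true_time, np.sin(3.0 * true_time)])
embedding += rng.normal(scale=0.01, size=embedding.shape)

# k-nearest-neighbor graph with Euclidean edge weights.
graph = kneighbors_graph(embedding, n_neighbors=10, mode="distance")

# Pseudotime = shortest-path distance from the root cell (index 0, assumed known).
dist = dijkstra(graph, directed=False, indices=0)
pseudotime = dist / dist.max()

print("Correlation with simulated time:", np.corrcoef(pseudotime, true_time)[0, 1])
```

Dedicated tools add what this sketch omits, including branch detection, principled root selection, and, in the case of Palantir, fate probabilities.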
As the previously mentioned techniques become more integrated into single-cell analysis pipelines, comprehensive toolkits, such as Seurat, which performs clustering, dimensionality reduction, trajectory inference, and batch correction using these methods, have become increasingly widespread [100,101]. For example, Jia et al. used Seurat to perform normalization, dimensionality reduction, and unsupervised clustering of their scRNA-seq data and to construct a cellular atlas of the trabecular meshwork in glaucomatous and healthy non-human primates, identifying 14 distinct cell types and contraction-related genes that were significantly downregulated in glaucoma [102]. Similarly, Zhang et al. and Li et al. used Seurat to perform normalization, dimensionality reduction, and clustering of macular degeneration datasets and fetal retinal transcriptomic and epigenetic data, respectively [24,73].
As transcriptomic studies increasingly span multiple conditions, modalities, and species, advanced integration frameworks have been developed to harmonize data while preserving biological variation. One such approach is LIGER, which uses nonnegative matrix factorization (NMF), a technique that is able to identify shared factors that capture common biological signals across all datasets and dataset-specific factors that reflect unique aspects of each dataset [103]. Liang et al. applied LIGER to integrate single-cell RNA-seq data from human, monkey, chicken, and mouse retinas, allowing for cross-species comparisons of neuronal subtypes and transcriptional regulators [10]. This enabled the identification of conserved and divergent features in retinal development and cell-type-specific expression programs. Similarly, Zhang et al. developed a Deep Subspace Nonnegative Matrix Factorization (DS-NMF) model to stratify AMD subtypes [24].
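The factorization at the core of these methods can be sketched with plain NMF. The example below assumes scikit-learn and applies ordinary NMF to two stacked synthetic datasets; it is not LIGER's integrative NMF, which additionally learns dataset-specific factor matrices, nor the DS-NMF model of Zhang et al. The number of factors and all data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic non-negative expression: 120 cells x 40 genes built from 3 gene programs.
rng = np.random.default_rng(4)
usage_true = rng.random((120, 3))
usage_true[:60, 2] = 0.0                 # program 3 is absent from dataset A (cells 0-59)
programs_true = rng.random((3, 40))
X = usage_true @ programs_true

# Factorize X ~ W @ H with non-negative W (cell loadings) and H (gene programs).
model = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(X)
H = model.components_

rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("Relative reconstruction error:", rel_error)
print("Mean factor usage, dataset A:", W[:60].mean(axis=0))
print("Mean factor usage, dataset B:", W[60:].mean(axis=0))
```

Comparing mean factor usage between datasets gives a rough view of which programs are shared and which are specific to one dataset.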
Among computational approaches used to interpret transcriptomic data, deconvolution has emerged as a powerful technique for estimating cell type composition within mixed tissue samples [43]. RNA-seq deconvolution is an unsupervised technique that infers the proportions of cell types within a heterogeneous sample based on gene expression profiles, which is particularly valuable when the isolation of individual cell types is challenging [104,105,106]. Two commonly used algorithms are CIBERSORT, which uses the ML technique ν-support vector regression (ν-SVR) to fit transcriptomic profiles against a reference gene expression matrix to profile immune cells, and CIBERSORTx, an extension of the original algorithm that supports batch correction and provides greater cell-fraction resolution [107,108]. For example, Cheng et al. used CIBERSORT to calculate the proportions of immune cells in keratoconus samples [63]. Wang et al. (2023) used CIBERSORTx to estimate the infiltration of immune cells in retina tissues of different ages, while Ma et al. used the algorithm, as well as several other deconvolution techniques, to reveal the cellular composition of healthy and AMD samples [16,97]. While CIBERSORT and CIBERSORTx are designed to estimate the proportions of major immune cell types in bulk transcriptomic data, ImmuCellAI can additionally profile T-cell subpopulations and recognizes 18 different T-cell subtypes [109]. It was employed by Huang et al. to calculate the immune cell composition of DR and diabetic macular edema (DME) samples [61].
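Reference-based deconvolution can be illustrated with a minimal sketch. For simplicity, it substitutes non-negative least squares (NNLS) for the ν-SVR used by CIBERSORT, so it should be read as an illustration of the general idea and not as a reimplementation of that algorithm. It assumes NumPy and SciPy; the signature matrix, the cell type fractions, and the function name `deconvolve` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def deconvolve(mixture, signature):
    """Estimate cell type fractions from a bulk profile.

    mixture:   expression vector of the bulk sample (genes,)
    signature: reference expression matrix (genes x cell types)
    """
    coef, _ = nnls(signature, mixture)    # non-negative regression weights
    total = coef.sum()
    return coef / total if total > 0 else coef

# Synthetic reference: 50 marker genes x 4 cell types.
rng = np.random.default_rng(5)
signature = rng.gamma(shape=2.0, scale=1.0, size=(50, 4))

# Simulated noise-free bulk sample with known composition.
true_fractions = np.array([0.5, 0.3, 0.15, 0.05])
mixture = signature @ true_fractions

estimated = deconvolve(mixture, signature)
print("Estimated fractions:", np.round(estimated, 3))
```

Real bulk samples are noisy and may contain cell types missing from the reference, which is part of the motivation for more robust regression schemes and for the batch correction offered by CIBERSORTx.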
Gene set scoring algorithms represent another category of unsupervised transcriptomic analysis enabling functional interpretation beyond individual gene-level changes. These methods assign enrichment scores to predefined gene sets, such as Gene Ontology or KEGG pathways, on a per-sample or per-cell basis, allowing for the comparison of biological activity across experimental conditions [110,111]. One widely used method, Gene Set Variation Analysis (GSVA), transforms normalized expression data into pathway-level enrichment profiles, capturing subtle variations in pathway activity even in the absence of significant differential expression [112]. This makes GSVA particularly valuable for identifying biologically meaningful shifts in signaling or immune processes that may not be evident at the single gene level. For example, GSVA was employed by Huang et al. to quantify pathway activity related to CD8+ T-cell-associated genes in DR and applied across multiple diabetic nephropathy datasets to assess immune involvement. Similarly, a combination of GSVA and CIBERSORT was used by Wang et al. (2021) to profile the immune cell composition of RPE tissue in patients with AMD [113].
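The principle of per-sample gene set scoring can be shown with a simplified sketch that averages gene-wise z-scores within each set. This is not the GSVA algorithm, which uses a rank-based, non-parametric enrichment statistic; it only illustrates how gene-level values are condensed into a pathway-level score. It assumes NumPy, and the gene names, gene sets, and expression values are hypothetical.

```python
import numpy as np

def gene_set_scores(expr, gene_names, gene_sets):
    """Mean z-score per sample for each gene set (expr: samples x genes)."""
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-8)
    index = {gene: i for i, gene in enumerate(gene_names)}
    scores = {}
    for name, genes in gene_sets.items():
        cols = [index[g] for g in genes if g in index]
        if cols:                          # skip sets with no measured genes
            scores[name] = z[:, cols].mean(axis=1)
    return scores

gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
expr = np.array([
    [1.0, 2.0, 5.0, 5.1],   # samples 0-2: hypothetical controls
    [1.2, 2.1, 4.8, 5.0],
    [0.9, 1.9, 5.2, 4.9],
    [3.0, 4.0, 5.1, 5.2],   # samples 3-5: hypothetical disease samples
    [3.2, 4.1, 4.9, 4.8],
    [2.9, 3.9, 5.0, 5.0],
])
gene_sets = {
    "hypothetical_pathway": ["GENE_A", "GENE_B"],
    "background_set": ["GENE_C", "GENE_D"],
}

scores = gene_set_scores(expr, gene_names, gene_sets)
print(scores["hypothetical_pathway"])
```

The resulting sample-by-pathway scores can be compared between conditions in the same way as gene-level expression values.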
Kuchroo et al. applied an unsupervised machine learning framework based on manifold learning and diffusion-based clustering to single-nucleus RNA-seq data from AMD donor retinas. By grouping transcriptionally similar cells along a low-dimensional structure, their method reconstructed how cells transition between different states. This framework revealed distinct glial subpopulations and uncovered a conserved microglia-to-astrocyte IL-1β signaling axis driving neovascularization in late-stage AMD, demonstrating the power of geometry-aware unsupervised learning to resolve inflammatory mechanisms in retinal degeneration [23].
Put simply, this section covers tools that allow researchers to explore large gene expression datasets and uncover hidden patterns without knowing the correct answers in advance. These unsupervised tools can group similar cells, reduce the complexity of the data so that it is easier to see relationships, and reveal how cells might change over time. Some methods, like PCA, t-SNE, and UMAP, are mainly used for visualizing patterns, while others, like clustering algorithms and WGCNA, group genes or cells that behave similarly. Special approaches can also estimate what types of cells are in a tissue sample (deconvolution) or measure the activity of whole biological pathways (gene set scoring). Together, these methods give scientists a clearer picture of how genes, cells, and pathways work together in eye diseases, even when the data are messy or come from different experiments. The unsupervised machine learning techniques and their biological relevance have been summarized in Table 1.

3.1.2. Deep Methods: Autoencoders, scVI

Deep learning (DL), a subset of AI built on artificial neural networks, has emerged as a powerful tool for modeling complex, high-dimensional relationships in transcriptomic data. While traditional ML relies on manual or statistical feature engineering, such as selecting subsets of genes or performing principal component analysis (PCA), deep learning models, particularly neural networks and autoencoders, can automatically learn complex, hierarchical representations of transcriptomic data directly from raw inputs [114]. Additionally, DL models are built to capture complex, nonlinear relationships in high-dimensional data and can be efficiently trained on large-scale datasets [115,116]. Because DL models contain many parameters, they generally require large datasets to avoid overfitting and learn meaningful, generalized representations when compared to the shallow learning models described earlier [79]. For example, scVI is an unsupervised deep learning framework based on variational autoencoders that models gene expression in scRNA-seq data while accounting for technical variations, such as batch effects and dropout [115,117]. Unlike traditional methods, scVI learns the probabilistic structure of the data, representing each cell and gene expression value as a distribution rather than a fixed point. This allows it to perform denoising, batch correction, dimensionality reduction, and differential expression analysis with improved robustness and biological resolution [115]. This was utilized by Liang et al., where scVI integrated over 2 million single-nucleus RNA-seq and ATAC-seq profiles. scVI enabled rigorous batch correction and clustering, which supported the identification of 110 distinct retinal cell types and subtypes, including rare populations that were previously uncharacterized [10].
In simpler terms, deep learning uses layered networks of computational neurons to automatically find patterns in large and complex gene expression datasets. Unlike traditional methods, which often require researchers to choose specific genes or features ahead of time, deep learning can learn these patterns directly from the data. This makes it especially powerful for large studies, where it can uncover subtle relationships, correct technical noise, and reveal rare cell types that might otherwise be missed.

3.2. Supervised Learning

3.2.1. Shallow Methods: SVM, RF, LASSO

While unsupervised machine learning plays a fundamental role in many aspects of bulk RNA-seq and scRNA-seq studies, supervised machine learning, which relies on known sample labels, such as disease vs. control, or cell type identity, to classify samples or predict outcomes, has just begun to be incorporated into experimental pipelines. Like unsupervised methods, supervised techniques can be broadly categorized into shallow and deep learning approaches. Shallow learning models, such as support vector machines or random forests, are generally more interpretable and well-suited to smaller datasets [79,118]. In contrast, deep learning models, such as neural networks, are capable of modeling more complex, nonlinear relationships in large-scale data but typically require more training data and computational resources [62,78]. Supervised approaches are employed in ophthalmology to improve feature selection, predict disease progression, and model cellular development [24,60,80,97]. A variety of supervised methods have been employed in ocular transcriptomic studies, each offering distinct advantages in terms of interpretability, scalability, or modeling complexity. An overview of the shallow supervised algorithms discussed and their properties is provided in Table 2.
This table categorizes widely used supervised ML algorithms by type and describes their primary functions, strengths, and limitations in the context of gene expression studies. It highlights their applicability to high-dimensional biological data, particularly in single-cell and bulk transcriptomic analyses.
Several tools complement classifier-based approaches by enabling biological network analysis and module detection. For instance, CytoHubba and MCODE, both Cytoscape plugins, analyze protein–protein interaction (PPI) networks to reveal regulatory architecture. CytoHubba ranks genes based on topological features, such as connectedness and centrality, to identify hub genes, while MCODE detects densely connected modules that often correspond to molecular complexes or functionally related gene clusters [144,145]. These network-based methods provide complementary biological insights that may be missed by expression-based or ML-driven gene selection alone [59,60].
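As a rough illustration of topology-based hub ranking, the sketch below orders the nodes of a toy interaction network by degree, one of the simplest centrality measures. It does not call Cytoscape or CytoHubba; the edge list, node names, and the function `rank_hubs` are hypothetical.

```python
from collections import defaultdict

def rank_hubs(edges, top_n=3):
    """Rank nodes of an undirected network by degree (number of distinct
    neighbours). ASSUMPTION: degree stands in for the richer set of
    topological scores offered by dedicated network tools."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    # Sort by descending degree, breaking ties alphabetically.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical protein-protein interaction edges.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
    ("P2", "P3"), ("P4", "P5"), ("P5", "P6"),
]
hubs = rank_hubs(edges)
print(hubs)
```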
Miao et al. developed a clustering refinement framework for scRNA-seq data that combines simulation and supervised learning. They first simulated scRNA-seq data using multivariate normal distributions and Splatter to benchmark clustering accuracy. Their pipeline begins with over-clustering via standard algorithms, such as Louvain or Leiden, followed by supervised logistic regression to classify each cluster against all others. Marker genes were selected based on model coefficients, and poorly separable clusters were iteratively merged until all groups were distinguishable with high accuracy. Applied to the Shekhar et al. mouse retina dataset, the refined clustering closely matched the original annotations (Rand index > 0.99) and achieved an average Rand index > 0.94 on a retinal bipolar neuron dataset [65].
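For readers unfamiliar with the metric, the Rand index measures how often two partitions agree on whether a pair of cells belongs together. The sketch below computes the unadjusted Rand index by direct pair counting; whether the cited study used this or an adjusted variant is not stated here, and the label vectors are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree, i.e. both put
    the pair in the same cluster or both put it in different clusters."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Identical partitions with different label names agree on every pair.
ri_same = rand_index([0, 0, 1, 1], ["x", "x", "y", "y"])
# Crossed partitions agree only on pairs that both clusterings separate.
ri_crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ri_same, round(ri_crossed, 3))
```

The pairwise loop is quadratic in the number of cells, so large datasets are usually handled with a contingency-table formulation instead.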
For keratoconus, Cheng et al. applied LASSO and SVM-RFE to select six genes from transcriptomic data, which were used to train logistic regression, support vector machine (SVM), and naïve Bayes models, all of which achieved high classification accuracy (AUCs (Areas Under the Curve) of 0.95 to 0.98) in distinguishing keratoconus from healthy controls [63]. Wu et al. similarly used SVM and SVM-RFE on microarray profiles to identify ferroptosis-related genes with strong diagnostic performance, suggesting a mechanistic link between ferroptosis and keratoconus [21]. Liu et al. analyzed keratoconus corneal data to identify genes related to programmed cell death, using LASSO and RF to define five signature genes related to ferroptosis [13]. Finally, Dong et al. performed bioinformatics and machine learning analysis on transcriptomic data from patients with Sjögren’s syndrome mediated keratoconjunctivitis sicca (KCS). They used LASSO and SVM-RFE to identify genes most associated with the development of Sjogren’s before validating three candidate genes on a separate KCS microarray dataset. Immune infiltration analysis performed with CIBERSORT revealed that these genes correlated with several immune cell types in the ocular surface, implicating them in the inflammatory response [81].
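The feature-selection behaviour of LASSO, which drives the coefficients of uninformative genes exactly to zero, can be illustrated with a small coordinate-descent sketch. This is a teaching example under simplifying assumptions (centered features, no intercept, a linear outcome, made-up data), not the pipeline of any study cited above; the function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate
    descent. ASSUMPTION: columns of X and y are centered (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Hypothetical centered data: 6 samples x 3 features; only feature 0 matters.
X = [
    [-2, 1, 1], [-1, -1, 1], [0, 1, -2],
    [0, -1, -2], [1, 1, 1], [2, -1, 1],
]
y = [3 * row[0] for row in X]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
print([round(v, 3) for v in weights])
```

The informative feature keeps a shrunken but nonzero coefficient, while the two irrelevant features are dropped, which is the property exploited when LASSO is used to shortlist candidate genes.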
Multiple studies have also leveraged supervised learning to identify diagnostic and pathogenic markers in AMD. Han et al. used LASSO and SVM to define a 15-gene diagnostic signature, followed by the construction of RF, SVM, XGBoost, and GLM models to build a clinical prediction tool [59]. Ma et al. applied logistic regression, RF, neural networks, and XGBoost to gene expression profiles from 453 donor retinas, identifying an 81-gene glial-enriched signature that was validated through permutation testing and integration with AMD genome-wide association data [97]. Zhang et al. employed SVM-RFE, RF, K-nearest neighbor (KNN), and AdaBoost algorithms, along with differential gene expression, to select key genes and build a diagnostic model that effectively identified AMD in test data [24]. Wang et al. integrated transcriptomic and DNA methylation data from RPE tissues to identify epigenetically regulated genes, building random forest models from expression and methylation features. The methylation-based model demonstrated superior diagnostic performance (AUC = 0.973) compared to the expression-based model (AUC = 0.825), highlighting the potential of epigenetic features for AMD diagnosis [113]. Lastly, Oca et al. analyzed RNA-seq data from peripheral blood mononuclear cells of AMD patients prior to ranibizumab treatment. Using correlation-based feature selection and RF classification, they identified a panel of four mRNAs and one miRNA that predicted treatment response with high accuracy (AUC = 0.968), effectively distinguishing responders from non-responders before therapy initiation [146].
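Because many of the studies above report performance as an AUC, a short worked example may help. The sketch below computes the area under the ROC curve from its pairwise definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    labels: 1 for disease, 0 for control; scores: classifier outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two controls and two disease samples.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75: three of the four positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; with small cohorts, the estimate can vary considerably between training and independent validation data.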
A growing number of transcriptomic studies have applied ML to DR, aiming to identify diagnostic biomarkers, model disease progression, uncover immune and vascular mechanisms, and predict therapeutic targets across both bulk and single-cell datasets. Toh et al. developed a transcriptomic clock based on retinal vasculature gene expression in Nile rats to predict early vascular changes in prodromal DR. Using segmented regression, they identified genes associated with acellular capillary density and trained a random forest regression model on the top 14 predictors [147]. Huang et al. used LASSO and SVM-RFE in conjunction with CytoHubba and WGCNA to identify eight CD8+ T-cell-related genes. This diagnostic model was then validated in a separate proliferative DR fibrovascular membrane dataset and in a diabetic nephropathy dataset, which showed that the genes could also be used to distinguish early- vs. late-stage diabetic nephropathy [60]. In a separate study, Huang et al. also applied LASSO, SVM-RFE, and CytoHubba to identify six Th17-related genes from NPDR and DME datasets. These genes distinguished DME from NPDR and were similarly validated in independent diabetic nephropathy cohorts [61]. Liu et al. used LASSO and RF algorithms to select five ferroptosis-related genes, which were validated using an external dataset. Immune infiltration profiling performed using CIBERSORT suggested that these genes were linked to altered immune microenvironments in DR [13]. Furthermore, Laich et al. used single-cell RNA and imaging mass spectrometry to study epiretinal membranes from patients with proliferative vitreoretinopathy (PVR), a fibrotic complication of advanced DR. To characterize immune and glial cell populations, they applied xCell, a supervised deconvolution algorithm that uses predefined gene signatures to infer the enrichment of specific cell types. Their analysis revealed diverse immune and glial subpopulations, including activated Müller glia and astrocytes, and showed enrichment of extracellular matrix remodeling pathways [148].
The vertebrate retina is a highly specialized neural tissue composed of diverse neuronal and glial cell types that form through tightly regulated developmental programs. Understanding these trajectories and how they vary across species, development, and disease has been significantly advanced by transcriptomic profiling, particularly at the single-cell level [6,9]. However, the complexity of retinal lineage commitment and functional heterogeneity pose challenges for traditional analytic approaches [58]. A growing number of transcriptomic studies have leveraged machine learning to map developmental trajectories, classify cell types, and uncover regulatory programs in the retina. For example, Zhang et al. integrated multi-omics data using LASSO and a suite of algorithms, including AdaBoost, CatBoost, ExtraTrees, LightGBM, RF, and XGBoost, to rank features by importance and identify stage-specific markers across retinal cell types. Their incremental feature selection approach highlighted dynamic expression patterns for each cell type across the three stages, clarifying the developmental route of the fetal retina and its stage-specific markers [74]. Liang et al. created a multi-omics retina atlas by integrating scRNA-seq and scATAC-seq data across 69 human neuronal types and used RF classifiers to align cell identities between human and macaque retina, highlighting conserved photoreceptor profiles and divergent inner retinal neurons [10]. Lukowski et al. employed scGPS, a supervised probabilistic classifier, to compare postmortem human retinal cells with human induced pluripotent stem cell (hiPSC)-derived cones, using canonical correlation analysis (CCA) to correct donor effects and resolve 16 retinal cell types, including two rod subpopulations [8]. Li et al. combined scRNA-seq and scATAC-seq with regulatory network inference tools, such as GRNBoost2, to identify key transcription factors that govern fate transitions between retinal progenitors and Müller glia [73]. Goetz et al. integrated electrophysiology, morphology, and transcriptomics to classify 42 functional retinal ganglion cell types, training supervised classifiers on peristimulus time histograms and assigning transcriptomic identities with XGBoost [149]. Similarly, Norrie et al. used logistic regression, SVM, and RF models to map euchromatin and heterochromatin domains across the genome and integrated these maps with chromatin compartments identified through Hi-C (High-throughput Chromosome Conformation Capture) [80].
Several transcriptomic studies have utilized artificial intelligence to elucidate the molecular mechanisms underlying primary open-angle glaucoma (POAG). Suo et al. analyzed four GEO microarray datasets from trabecular meshwork and optic nerve tissues, applying batch correction, differential expression analysis, and supervised learning models (RF and SVM) to identify five diagnostic genes (AUCs 0.74–0.83). To assess the immune microenvironment, they used CIBERSORT, finding that KRT14 was positively correlated with plasma cells and neutrophils but negatively correlated with regulatory T-cells, which were increased in POAG tissues [12]. Similarly, Wang et al. used WGCNA, RF, and SVM on two microarray datasets to identify three POAG biomarkers. Immune profiling via ConsensusClusterPlus revealed distinct immune subtypes, suggesting a regulatory role of the immune system in disease progression [20]. In a study by Zhao et al., optic nerve head tissue was analyzed to identify diagnostic biomarkers of POAG. LASSO and SVM-RFE were used to select three genes, each of which showed strong diagnostic performance (AUCs 0.89–0.97). To explore regulatory mechanisms, the authors constructed a competing endogenous RNA network and a compound–mRNA interaction map before applying Mendelian Randomization, a genetic causal inference method that uses SNPs as instrumental variables, finding that DNA methylation GrimAge acceleration, an epigenetic measure of biological aging, was causally linked to glaucoma [150].
AI-assisted transcriptomic analysis has also been applied to thyroid eye disease (TED). Shu et al. analyzed lacrimal gland gene expression data using differential expression analysis and WGCNA to identify disease-relevant secretory genes. LASSO, RF, SVM-RFE, and XGBoost were then used to prioritize two diagnostic markers. Immune deconvolution with CIBERSORT revealed that KIAA0319 was positively associated with CD8+ T-cells and activated mast cells, while PRDX4 was linked to resting memory CD4+ T-cells [151]. In a separate study, Ma et al. used LASSO, RF, and SVM-RFE across two GEO datasets to identify six autophagy-related genes as diagnostic features. These genes were significantly correlated with immune infiltration in orbital tissues, reinforcing the role of both secretory dysfunction and immune modulation in TED pathogenesis [14].
This section describes supervised machine learning methods, which work by learning from labeled examples, such as whether a sample comes from a healthy or diseased eye, to make predictions or classify new data. Shallow models like support vector machines, random forests, and LASSO are easier to interpret and perform well on smaller datasets, while more complex models can capture intricate patterns in larger datasets but require more computational power. In ocular transcriptomics, these methods have been used to identify genes linked to diseases, predict how a disease might progress, and even forecast treatment responses. They can also highlight which genes or pathways are most important for distinguishing different conditions, giving researchers valuable clues about disease mechanisms and potential targets for therapy.

3.2.2. Deep Methods: Neural Networks

Supervised DL models, such as convolutional neural networks (CNNs), have been increasingly applied to transcriptomic data to perform classification, regression, and feature prioritization tasks [152]. Like their unsupervised counterparts, supervised deep learning models require extensive tuning and large datasets; however, when trained on labeled data, they can achieve high predictive accuracy and reveal biologically meaningful relationships [47,78]. Another challenge in deep learning is its limited interpretability; while neural networks can achieve high predictive accuracy, they do not inherently reveal how specific input genes influence the model’s decisions, which can, in turn, make biological interpretation difficult [153,154]. To address this, explainable AI methods, such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model–Agnostic Explanations), which can be applied across model types, are increasingly used to attribute predictions to individual genes or pathways, helping to interpret deep models in a biologically meaningful way [155,156]. These deep learning approaches have begun to gain traction in ocular transcriptomic studies, where they are increasingly applied to uncover gene expression patterns, identify disease-associated cell types, and integrate multi-omics data across diverse eye tissues and disease states. For example, Wang et al. generated scRNA-seq and scATAC-seq libraries from postmortem adult human retina and integrated them with HiChIP, which maps histone-mark-mediated chromatin interactions, and eQTL data, which link genetic variants to gene expression. To prioritize functional SNP–gene interactions, they applied convolutional neural networks (CNNs) based on the BPNet architecture to model chromatin accessibility [88,155,156].
Furthermore, in Ma et al., a neural network classifier was evaluated alongside logistic regression, RF, and XGBoost on bulk retinal transcriptomes. Including a neural network reflects an interest in hierarchical, nonlinear modeling, although its architecture and performance were not described in detail. To interpret model predictions and identify the key genes driving classification, the authors applied SHAP, which attributed importance scores to individual genes and enabled the selection of an 81-gene signature enriched in glial cell types, such as microglia and astrocytes [97].
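The attribution principle behind SHAP can be shown with an exact, brute-force Shapley value calculation on a tiny model. The sketch below is an illustration under stated assumptions: it enumerates all feature subsets, which is feasible only for a handful of features, and it replaces absent features with a fixed baseline value. The SHAP library relies on approximations and model-specific algorithms instead, and the toy model and function names here are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.
    ASSUMPTION: a feature that is 'absent' takes its baseline value."""
    n = len(x)

    def value(subset):
        # Features in subset take the sample's values; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that exactly this many features precede feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model: gene 0 acts additively, genes 1 and 2 interact.
def toy_model(g):
    return 2.0 * g[0] + g[1] * g[2]

phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 3) for v in phi])
```

The attributions sum to the difference between the prediction for the sample and the prediction at the baseline, and the interaction between the second and third genes is split equally between them.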
Deep learning models, such as neural networks, can learn complex patterns in large, labeled gene expression datasets to make accurate predictions about disease or cell type. While these models can be powerful, they are often harder to interpret because they do not directly show how each gene contributes to their decisions. New explainable AI tools, like SHAP and LIME, help address this by revealing which genes or pathways most influence the model’s predictions. In eye research, these approaches have been used to link genetic changes to functional effects, highlight key cell types involved in disease, and integrate multiple layers of molecular data.

4. AI Applications in Ocular Diseases and Retinal Development

Artificial intelligence has been increasingly applied to transcriptomic data across a wide spectrum of ocular diseases to uncover pathogenic mechanisms, identify diagnostic and prognostic biomarkers, and support therapeutic development [10,13,21,146]. These approaches span bulk, single-cell, and spatial transcriptomic modalities and integrate machine learning and deep learning frameworks for data interpretation [8,60,151]. A comprehensive summary of AI-enabled transcriptomic studies across ocular diseases, including transcriptomic modality, AI methods, and key findings, is provided in Table S1 (AI-Guided Transcriptomic Studies Across Ocular Diseases).

4.1. Corneal Disease

Corneal diseases, such as keratoconus and KCS (keratoconjunctivitis sicca), involve multifactorial pathogenesis, including oxidative stress, immune dysregulation, and ECM remodeling. While conventional transcriptomic analyses have identified candidate pathways, recent AI-enabled approaches have enhanced our ability to disentangle complex molecular interactions and prioritize disease-relevant biomarkers.
Oxidative stress and ECM remodeling have emerged as key drivers in keratoconus. Cheng et al. analyzed bulk RNA-seq data using curated oxidative stress and ECM gene sets, integrating immune cell deconvolution via CIBERSORT and machine-learning-based feature selection. Their results identified 17 dysregulated genes that implicated innate immune activation and structural dysregulation in disease pathology. These genes were proposed as a predictive gene signature and used to build a nomogram for keratoconus (KC) prediction, which was supported by expression signatures validated in patient samples and soft PDMS (polydimethylsiloxane)-cultured corneal cells [63]. Liu et al. extended this analysis by intersecting keratoconus transcriptomes with multiple cell death pathways, including ferroptosis and autophagy. Their integrated machine learning pipeline revealed hub genes enriched for TNF and IL-17 signaling, linking programmed cell death to inflammatory remodeling in the cornea [157]. Furthermore, Wu et al. investigated ferroptosis in KC using WGCNA and SVM-RFE, identifying hub genes associated with oxidative stress, immune regulation, and metal ion transport. They identified miR-184 as a potential regulatory factor; it has previously been reported to be the most abundant miRNA in corneal and lens epithelial cells and has been linked to horizontal protein alignment in the cornea. In that study, decreased AKR1C3 expression was found to reduce miR-184 synthesis in keratoconus, and four predicted AKR1C3-targeting drugs (indomethacin, daunorubicin, doxorubicin, and docetaxel) were highlighted as potential genetic-level interventions. No clinical validation was performed [21]. Finally, Cai et al. identified nine characteristic genes involved in the pathogenesis of keratoconus using machine learning algorithms. These genes were associated with oxidative stress, ferroptosis, inflammatory responses, and mitochondrial apoptotic pathways. Validation with a single-cell RNA-seq dataset highlighted ACSL4, an enzyme that activates polyunsaturated fatty acids and promotes ferroptosis, as the most significant hub gene, which was further confirmed by RT-PCR and Western blotting in corneal stromal cells. Notably, ACSL4 expression was significantly upregulated under conditions of reduced substrate stiffness, implicating this gene, and mitochondria-related pathways more broadly, as critical drivers in the development of keratoconus. However, these findings remain preclinical, as no clinical validation has been performed [22].
Immune involvement has also been highlighted in KCS. Dong et al. applied ML-based gene selection to a microarray dataset and identified biomarkers correlated with inflammatory cytokines in a rat corneal injury model. Upregulation of key mediators, such as JAK1, SKI, and ZBTB16, paralleled increased IL-6, IL-1β, and TNF-α expression, implicating innate immune activation in dry eye pathogenesis. While these results are based on a rat corneal injury model and require confirmation in human studies, they suggest that JAK1, SKI, and ZBTB16 could represent candidate biomarkers for future diagnostic assays or targets for immunomodulatory therapy in dry eye [81].
Together, these studies demonstrate how AI-guided transcriptomic profiling can uncover convergent biological themes across corneal diseases, including oxidative stress, immune infiltration, and ECM (extracellular matrix) disorganization, while enabling the development of predictive gene signatures for future diagnostic and therapeutic applications.

4.2. Age-Related Macular Degeneration

AMD is a complex, multifactorial disease involving aging, genetic susceptibility, inflammation, and environmental exposures. While transcriptomics has advanced our understanding of AMD’s pathogenesis, the integration of AI, particularly supervised machine learning, has enabled deeper analysis of high-dimensional gene expression data. These approaches can uncover hidden molecular patterns, identify diagnostic and prognostic biomarkers, and stratify patients into biologically meaningful subgroups. In AMD, AI-guided transcriptomic studies have revealed novel insights into immune dysregulation, glial remodeling, mitochondrial dysfunction, and aging-related transcriptional reprogramming.
Several studies have revealed immune activation as a central feature of AMD pathogenesis. Han et al. screened transcriptomic data from human donor RPE-choroid tissues to identify 15 disease signature genes and constructed a five-gene clinical prediction model. They implicated several immune functions, such as macrophage activation and CD4+ memory T-cell function, as predictors, which they used to differentiate molecular subtypes of AMD and predict disease progression. Although the model shows strong predictive performance in silico, its clinical utility remains to be validated in prospective patient cohorts [59]. Wang et al. further investigated immune subtype regulation in aging human retinas through immune deconvolution and methylation–transcriptome integration, showing that genes like SMAD2 and NGFR were able to predict AMD progression and were associated with specific immune system functions, including inflammatory signaling. These genes, in turn, could be potential therapeutic targets in AMD; however, as no clinical validation was performed in that study, future studies should pursue further biological testing [113]. Another study by Wang et al., focused on aging, found increased infiltration of M2 macrophages and activated T-cells in association with disease severity. Using CIBERSORTx and scRNA-seq reference data, they showed that immune dysregulation was concentrated in glial cell populations, such as Müller glia and microglia. Eight age-related MGS genes were identified in these processes, potentially playing critical roles in the progression of AMD with age. While no clinical trial validation was performed, the findings suggest potential mechanisms and targets for slowing age-related AMD progression [16]. Oca et al. contributed to this immunologic perspective by identifying a blood-based transcriptomic signature predictive of anti-VEGF (vascular endothelial growth factor) treatment response, suggesting that systemic immune alterations may also modulate therapeutic outcomes. The signature, derived from PBMCs of AMD patients, comprised four mRNAs and one miRNA and retrospectively predicted a successful response to ranibizumab with good accuracy. The authors proposed that machine learning classifiers based on mRNA and miRNA profiles, particularly when combined with baseline clinical characteristics, could improve the identification of patients unlikely to respond adequately to ranibizumab and enable patient-specific treatment planning from the first visit. The study was conducted in clinical cohorts [146]. Collectively, these studies underscore the centrality of immune dysfunction in AMD’s pathobiology.
Glial cells, particularly Müller glia, astrocytes, and microglia, have emerged as consistent cellular contributors across transcriptomic studies. Ma et al. identified an 81-gene signature enriched in glial markers that predicted AMD status and overlapped with known AMD gene loci, reinforcing the genetic basis of glial involvement. They also discovered a novel AMD-associated variant, rs4133124 at PLCG2, suggesting that genes involved in retinal glial function may drive AMD’s pathology and that disease progression may not follow a strictly linear course [97]. Kuchroo et al. used single-nucleus RNA-seq and a topology-aware clustering algorithm (CATCH) to resolve glial subpopulations across disease stages [23]. They revealed early activation states involving phagocytosis and lysosomal remodeling, and, in advanced disease, they identified a conserved microglia-to-astrocyte IL-1β signaling axis that drives angiogenesis, functionally validated in vitro and in vivo. Given that anti-VEGF therapy remains the only approved intervention for AMD and is primarily effective in advanced stages, the authors proposed that inhibiting microglia-derived IL-1β could offer therapeutic benefit by preventing further neovascularization in advanced patients or even forestalling its onset in earlier stages. These experiments were performed in vitro and in mouse models, with no clinical trials conducted to date. These studies collectively highlight glial remodeling as both an early marker and late-stage driver of AMD’s progression.
Mitochondrial dysfunction and oxidative stress are also increasingly recognized as drivers of retinal degeneration in AMD. Zhang et al. integrated bulk and single-cell RNA-seq to define two molecular subtypes based on mitochondrial gene expression profiles. They constructed and validated a 13-gene diagnostic model linked to immune microenvironment shifts, suggesting that mitochondrial stress may both result from and exacerbate chronic inflammation. Using four machine learning methods, they further identified pathways associated with these subtypes and predicted ten potential small-molecule therapeutics for AMD. While these findings provide a theoretical basis for targeted treatment, no clinical validation has yet been performed [24]. This aligns with immune-focused studies and highlights how mitochondrial and inflammatory processes intersect in AMD pathogenesis.
Additionally, Wang et al. identified 26 age-associated genes whose expression levels correlated with both chronological age and clinical disease severity. These genes were enriched in glial cell types, particularly Müller glia, microglia, and astrocytes. Bulk RNA-seq deconvolution further confirmed an increase in glial proportions with age, especially in advanced AMD. Notably, age-related immune changes overlapped with those identified in inflammatory and mitochondrial studies, suggesting a shared molecular axis of degeneration. Together, these findings support the view that aging-associated transcriptional reprogramming, glial expansion, and immune activation converge to create a permissive environment for retinal degeneration [16].

4.3. Retinal Development

Recent applications of artificial intelligence in retinal development have illuminated the cellular and molecular complexity of neurogenesis, chromatin dynamics, and disease-associated transcriptional programs. By integrating single-cell multi-omics technologies with ML and DL models, researchers have begun to map the regulatory architecture underlying retinal cell type specification, fate commitment, and degeneration. These studies have not only refined our understanding of normal retinogenesis but also identified critical pathways and biomarkers relevant to conditions like glaucoma, macular degeneration, and photoreceptor loss.
Multimodal and AI-based profiling has enhanced our understanding of functional diversity within retinal cell types. Goetz et al. used electrophysiological, morphological, and transcriptomic profiling of mouse retinal ganglion cells (RGCs) to define functional subtypes. Machine learning classifiers trained on light-evoked responses revealed that Tusc5 expression marked a specific RGC type with transient responses and compact dendritic fields, linking transcriptional identity to visual function and offering insights into how dysfunction in specific RGC subtypes may contribute to retinal neurodegeneration. Furthermore, the study enabled the identification of each RGC type along with its projection and wiring patterns in the brain and within the retina, as well as the underlying molecular determinants. No clinical validation was performed [149].
AI-guided transcriptomics has also deepened insight into the transcriptional regulators that govern retinal development. Li et al. integrated embryonic human scRNA-seq and scATAC-seq data to identify transcription factors, such as REST, IRX1/2, ONECUTs, and LHX3/4, that direct Müller glia and retinal neuron differentiation. Of particular note, a glial subtype (MGC2) appeared essential for supporting macular neuron formation, implicating early glial dysfunction in congenital retinal disorders. In parallel, they examined the top 25 disease-related genes across congenital and other ocular diseases, noting that elevated PI3K family gene expression was linked to retinoblastoma, mutations in mGluR6 cascade members represented the third most common cause of complete congenital stationary night blindness, and MGC2 showed the strongest associations with pathogenic genes implicated in AMD, DR, and common uveitis. The study was based on a small number of human embryonic eye samples, and clinical validation is required to confirm these findings [73]. Similarly, Zhang et al. used seven supervised ML models to prioritize stage-specific regulators across major human retinal cell types. Key biomarkers, including RELN, DAB1, ANK3, RIMS2, PDE6H, NFIA, and WIF1, were implicated in neural migration, synaptic stability, phototransduction, and retinal repair, with specific genes, such as RELN, DAB1, and RIMS2, linked to lineage specification in amacrine, bipolar, and photoreceptor cells. These findings provide a mechanistic basis for understanding developmental disorders and inherited retinal dystrophies and for identifying potential therapeutic targets for retinal diseases, including cone-rod dystrophy, diabetic retinopathy, AMD, and glaucoma. No clinical trials or in vivo validation have yet been conducted [74]. Finally, Lukowski et al. created a single-cell transcriptomic atlas from postmortem human retinas, identifying 18 distinct neural retinal cell populations using unsupervised clustering and the scGPS machine learning framework. Their analysis revealed postmortem-dependent downregulation of MALAT1 in rod photoreceptors, suggesting its potential as a target to enhance photoreceptor survival and preserve retinal function. The atlas also serves as a benchmark for assessing stem-cell-derived retinal cell types and detecting early molecular changes in retinal disease. The atlas was built from a limited number of donors and profiled cells, and no clinical validation has yet been performed [8].
Genomic and epigenomic integration has clarified how noncoding variants shape disease risk. Wang et al. generated scRNA-seq and scATAC-seq libraries from postmortem adult human retinas and combined them with HiChIP and eQTL data to identify functional SNP–gene interactions. Convolutional neural networks (CNNs) based on the BPNet architecture modeled chromatin accessibility, prioritizing variants like rs7727244 and rs4102217 as risk loci for myopia and glaucoma. Notably, rs1532278 was shown to regulate CLU expression specifically in Müller glia, highlighting the cell-type-specific impact of noncoding variants in diseases like AMD and glaucoma. The study also nominated additional pathogenic SNP–target gene interactions (e.g., rs1874459) relevant to AMD, glaucoma, DR, myopia, and type 2 macular telangiectasia, providing a valuable resource for interpreting noncoding variation in the eye. No clinical trials have been performed [88].
DL has also enabled new frameworks for regenerative therapy evaluation. Schaub et al. used neural networks trained on quantitative brightfield absorbance microscopy (QBAM) images of iPSC-derived RPE cells to predict VEGF secretion and transepithelial resistance, both of which are key indicators of RPE maturity and function. Complemented by ML models like MLPs, PLSR, and RF, this approach identified morphological features predictive of therapeutic quality, presenting a clinically compatible strategy for evaluating cell therapy products prior to implantation [158].
Chromatin accessibility and transcriptional regulation at the single cell level are also being mapped to establish disease baselines. Liang et al. used snRNA-seq and snATAC-seq to construct a comprehensive atlas of healthy human retina, applying the SCENIC pipeline to reveal transcription factor modules active in specific retinal cell types, providing a molecular framework for understanding normal retinal biology and detecting regulatory disruptions in degenerative diseases. While no clinical trials were performed, these findings offer a valuable resource that may inform the design of future retinal disease studies and targeted therapeutic strategies [10].

4.4. Diabetic Retinopathy

DR is a progressive retinal complication of diabetes that causes significant visual impairment. It is broadly divided into an early stage, non-proliferative diabetic retinopathy (NPDR), and an advanced stage, proliferative diabetic retinopathy (PDR). An important additional category is diabetic macular edema (DME), the most common cause of vision loss in patients with DR. While DR has traditionally been characterized by vascular damage, AI-enabled transcriptomic analyses have expanded our understanding of it as a multifaceted disorder involving immune dysregulation, oxidative stress, and microvascular remodeling.
Immune activation has emerged as a key driver of DR progression. In two complementary studies, Huang et al. applied network-based analysis and supervised gene selection methods to DR-related transcriptomes, identifying stage-specific immune markers. In the first study, they found that Th17 cell infiltration increased progressively from control to NPDR to DME and identified six hub genes associated with Th17-related inflammation, CD44, CDC42, TIMP1, BMP7, RHOC, and FLT1, which are largely involved in leukocyte trafficking, angiogenesis, and cytoskeletal remodeling [61]. In another study, they focused on CD8+ T-cells and identified eight additional genes, IKZF1, PTPRC, ITGB2, ITGAX, TLR7, LYN, CD74, and SPI1, associated with immune activation, cell adhesion, and innate immune sensing pathways. In both studies, hub gene expression was validated by GSVA and qPCR in murine models, suggesting that adaptive immune cells, particularly Th17 and CD8+ T-cells, play active roles in the progression to vision-threatening DR phenotypes like DME [60].
Oxidative stress and ferroptosis have also been implicated in DR. Liu et al. analyzed microarray data to identify 40 differentially expressed ferroptosis-related genes and applied LASSO and RF to prioritize five hub genes involved in antioxidant defense, iron metabolism, and autophagy regulation. These genes were linked to immune infiltration and validated by qRT-PCR in human retinal microvascular endothelial cells exposed to high glucose, suggesting that ferroptosis may contribute to immune activation and vascular injury in diabetic eyes. Molecular docking further demonstrated strong binding of glutathione to CAV1 and TLR4, highlighting a potential ferroptosis-targeting therapeutic approach. While these findings highlight ferroptosis as a potential therapeutic avenue, in vivo and clinical validation remain necessary [13].
Microvascular degeneration and remodeling were explored by Toh et al., who performed RNA-seq on retinal vascular tissue from Nile rats and trained a random forest model to predict acellular capillary density, which is a hallmark of early DR. Their 14-gene signature provided a molecular correlate for vascular dropout before clinical signs emerge and was used in a data-driven approach to identify three candidate drugs—NVP-TAE684, geldanamycin, and NVP-AUY922—that could potentially attenuate early DR by downregulating gene expression linked to acellular capillary density. These findings offer transcriptomic insight into early-stage DR pathogenesis and a framework for drug repurposing, though in vivo and clinical studies are needed to confirm their therapeutic potential [147]. In parallel, Wang et al. analyzed macular transcriptomes across DR severity stages and identified seven genes with expression patterns that tracked disease progression. Among them, CCND1 and FCGR2B were associated with increased infiltration of M2 macrophages, supporting a role for immune remodeling in late-stage DR. By controlling for factors like age and gender, they confirmed that expression of these genes, along with proportions of memory B cells, M2 macrophages, and Müller glia, increased with DR severity. These findings suggest potential molecular and cellular targets for studying the mechanisms of DR progression. The study was based solely on in silico analysis of transcriptomic data [17].
Laich et al. extended these insights by using single-cell RNA and protein profiling to study epiretinal membranes from patients with proliferative vitreoretinopathy (PVR), a fibrotic complication of advanced DR. Their work revealed diverse immune and glial subpopulations, including activated Müller glia and astrocytes, and showed enrichment of extracellular matrix remodeling pathways. Drug-matching analysis ranked aminocaproic acid, levamisole, and TOP2A inhibitors (etoposide, mitoxantrone, doxorubicin) among the top candidates, alongside daunomycin, which has already been investigated in PVR clinical trials. These findings provide targets for developing PVR diagnostics and therapeutics, with future clinical trials representing the next step [148].
Together, these studies suggest that DR progression is shaped by converging mechanisms of immune infiltration, oxidative injury, and vascular degradation. AI-assisted transcriptomic approaches have enabled the identification of stage-specific biomarkers and clarified the contributions of Th17 and CD8+ T-cells, ferroptosis pathways, and macrophage-driven inflammation. These insights lay the foundation for new strategies in early diagnosis, therapeutic targeting, and individualized risk stratification in DR.

4.5. Glaucoma

Factors that affect the pathogenesis and development of glaucoma are still incompletely understood. As such, transcriptomic analyses have increasingly been used to uncover molecular contributors to disease onset and progression. Recent studies integrating artificial intelligence (AI) have identified gene signatures pointing to roles in neurovascular signaling, vesicle transport, and metabolic stress.
For example, Wang et al. applied ML analysis to gene expression data filtered through an immune gene set from the ImmPort database and identified three hub genes: CD40LG, MDK, and TEK. While all three genes have immune or vascular relevance, the study primarily highlights their potential roles in glaucoma through mechanisms like neurotrophic signaling and vascular stability, suggesting that immune-adjacent pathways may contribute to disease development. Using the DGIdb database, the authors further predicted small-molecule drugs targeting these genes, providing a potential therapeutic direction for primary open-angle glaucoma (POAG). Expression patterns were validated through RT-PCR in mouse ocular tissues, but no human clinical validation has yet been performed [20]. Similarly, Zhao et al. analyzed POAG-associated optic nerve head transcriptomes using ML and Mendelian randomization and identified three hub genes: RAB8A, PRG3, and SMAD3. These genes implicate vesicle trafficking and inflammatory signaling in glaucomatous neurodegeneration, with SMAD3 suggesting a potential role for TGF-β-mediated fibrosis and immune activation in POAG. The authors also constructed a small-molecule compound–mRNA interaction network, suggesting potential pharmacologic modulators of these biomarkers. Expression patterns were validated through RT-PCR in mouse optic nerve head tissue, but no human clinical validation has yet been performed [150]. Finally, Suo et al. applied machine learning to POAG-associated transcriptomic data from the trabecular meshwork and optic nerve and identified five signature genes associated with disruptions in epithelial integrity, oxidative metabolism, and iron homeostasis. Using the Connectivity Map database, the authors identified five compounds, avrainvillamide-analog-3, cytochalasin-D, NPI2358, oxymetholone, and vinorelbine, whose gene expression profiles were inversely correlated with those of POAG, suggesting the potential to mitigate or reverse the disease state.
These predictions are based solely on in silico analyses, with no in vivo or clinical validation performed [12].
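The signature-reversal logic behind this kind of compound screening can be sketched in a few lines. The example below assumes a disease signature expressed as log fold changes and a matrix of compound-induced log fold changes over the same genes, and it ranks compounds by Spearman correlation. The Connectivity Map itself uses an enrichment-based connectivity score rather than a simple correlation, so this is a simplified illustration; the compound names and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def rank_reversers(disease_lfc, compound_lfc, names):
    """Rank compounds from most negatively to most positively correlated
    with the disease signature."""
    scores = {}
    for name, profile in zip(names, compound_lfc):
        rho, _p = spearmanr(disease_lfc, profile)
        scores[name] = rho
    return sorted(scores.items(), key=lambda item: item[1])


# Simulated example (hypothetical data, for illustration only)
rng = np.random.default_rng(1)
n_genes = 200
disease_lfc = rng.normal(0.0, 1.0, n_genes)

names = ["compound_A", "compound_B", "compound_C"]
compound_lfc = np.vstack([
    -0.8 * disease_lfc + rng.normal(0.0, 0.5, n_genes),  # reverses the signature
    rng.normal(0.0, 1.0, n_genes),                        # unrelated to it
    0.8 * disease_lfc + rng.normal(0.0, 0.5, n_genes),   # mimics the signature
])

ranking = rank_reversers(disease_lfc, compound_lfc, names)
for name, rho in ranking:
    print(f"{name}: rho = {rho:+.2f}")
```

A strongly negative score only indicates that a compound's expression profile opposes the disease signature in the reference cell lines; it does not by itself establish efficacy in ocular tissue.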

4.6. Thyroid Eye Disease

Thyroid eye disease (TED) is an autoimmune orbital disorder, often associated with Graves’ disease, that is characterized by orbital inflammation, fibroblast activation, and extracellular matrix remodeling. While its clinical manifestations include proptosis, diplopia, and vision loss, the underlying molecular mechanisms remain incompletely understood. Recent AI-guided transcriptomic studies have expanded our understanding of TED beyond broad immune activation, highlighting more nuanced contributions from secretory dysfunction, autophagy, and immune-fibroblast interactions.
Immune activation has emerged as a prominent contributor to TED’s pathogenesis. Shu et al. analyzed lacrimal gland transcriptomes using differential expression and WGCNA, followed by supervised machine learning methods, including LASSO, random forest, SVM-RFE, and XGBoost. Two hub genes, KIAA0319 and PRDX4, were identified as diagnostic markers. Immune cell deconvolution using CIBERSORT revealed that KIAA0319 expression correlated positively with CD8+ T-cells and activated mast cells, whereas PRDX4 was associated with resting memory CD4+ T-cells. These findings suggest that both cytotoxic and regulatory immune cell populations play roles in TED’s severity and orbital tissue remodeling. The same study by Shu et al. found that secretory genes involved in tear film production and mucin secretion were downregulated in TED patients, supporting clinical observations of dry eye symptoms and ocular surface instability. The involvement of CD8+ T-cells and mast cells further suggests that inflammation may disrupt normal glandular function. However, these results are derived entirely from computational analyses and await validation in tear fluid from clinical cohorts [151].
Autophagy-related pathways have been highlighted in additional TED studies. Ma et al. analyzed orbital tissue transcriptomes from two GEO datasets and used a combination of LASSO, RF, and SVM-RFE to identify six autophagy-associated genes with strong diagnostic potential. These genes showed significant correlations with immune cell infiltration, linking impaired autophagic flux to immune activation and tissue remodeling in the orbit. The authors further performed drug–gene interaction screening to identify potential compounds targeting these hub genes, suggesting avenues for therapeutic exploration. While these results suggest that defective cellular recycling may exacerbate antigen presentation, inflammation, and fibroblast activation in TED, they remain unvalidated in human samples or clinical cohorts [14].
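The multi-algorithm strategy recurring in these studies, in which candidate hub genes are those selected by LASSO, RF, and SVM-RFE together, can be sketched with scikit-learn. This is a minimal illustration on simulated data, not the pipeline of any cited study: the regularization strength (C = 0.1), the top-10 cutoffs, and the data itself are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Simulated expression matrix (hypothetical): 150 samples x 200 genes,
# where only the first 5 genes differ between cases and controls.
rng = np.random.default_rng(42)
n_samples, n_genes, n_informative = 150, 200, 5
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :n_informative] += 2.0
X = StandardScaler().fit_transform(X)

# 1) LASSO-type selection: L1-penalised logistic regression keeps genes
#    whose coefficients are not shrunk to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X, y)
lasso_genes = set(np.flatnonzero(lasso.coef_[0]))

# 2) Random forest: keep the 10 genes with the highest impurity importance.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
rf_genes = set(np.argsort(rf.feature_importances_)[-10:])

# 3) SVM-RFE: recursively drop the genes with the smallest linear-SVM weights.
svm_rfe = RFE(SVC(kernel="linear"), n_features_to_select=10, step=0.1).fit(X, y)
svm_genes = set(np.flatnonzero(svm_rfe.support_))

# Candidate hub genes = genes chosen by all three methods.
consensus = sorted(int(g) for g in lasso_genes & rf_genes & svm_genes)
print("Consensus gene indices:", consensus)
```

Taking the intersection trades sensitivity for specificity: a gene must survive three different inductive biases, which tends to shorten the list but does not replace validation in an independent cohort.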

4.7. Posterior Capsule Opacification

The lens has also been extensively profiled using transcriptomic and multi-omics approaches, providing a foundation for AI applications [152,153,154,155,156,157,158,159,160,161,162,163,164]. Our group recently applied supervised machine learning to transcriptomic data from an ex vivo chick lens injury model to distinguish molecular signatures associated with regenerative wound healing versus fibrotic posterior capsule opacification (PCO) outcomes. Using LASSO, support vector machines, and random forests, we identified distinct gene panels linked to wound healing (e.g., HS3ST2, ID1) and fibrosis (e.g., VGLL3, CEBPD, MXRA7), with pathway analysis implicating MAPK and HIPPO signaling in divergent outcomes. While experimental validation was performed using RT-PCR, clinical validation was not performed at that time [165]. This approach illustrates how AI-guided feature selection and classification can resolve biologically meaningful signatures in a clinically relevant model of secondary cataract formation.

5. Challenges and Limitations

AI-enabled transcriptomic analyses in ophthalmology are increasingly translating molecular insights into clinical applications, from diagnostic gene panels and patient stratification tools to therapeutic target discovery and drug repurposing opportunities. However, despite the growing utility of AI in ocular transcriptomics, some limitations remain. Some ocular transcriptomic studies, especially those involving rare diseases or single-cell datasets, face challenges related to limited sample sizes and high dimensionality, which may increase the risk of overfitting and reduce generalizability [118,162]. Another challenge is the inconsistency in clustering and annotation strategies across studies, which can hinder the accuracy of downstream analyses, such as deconvolution and cell type comparisons. For example, while some studies rely on standard clustering algorithms like Louvain, others implement custom pipelines using supervised classifiers, such as scGPS in Lukowski et al., feature selection frameworks in Zhang et al., or iterative refinement methods in Miao et al., making direct comparisons difficult [8,65,74]. Additionally, if findings are not followed by in vivo or in vitro functional experiments, biological interpretation is limited [166]. While many ML models demonstrate promising diagnostic or predictive performance, their limited interpretability remains a significant concern, particularly in biological and clinical settings [167]. Complex models, such as random forests, support vector machines, and deep neural networks, often function as “black boxes,” providing limited insight into how predictions are derived [168]. Explainable AI techniques, including SHAP and LIME, offer promising solutions by attributing importance scores to individual input features [169,170]. However, these tools are not yet routinely implemented in ocular transcriptomic studies [171]. 
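To make the idea of feature attribution concrete, the sketch below computes Shapley-style attributions in closed form for the log-odds of a linear classifier. It assumes independent features, in which case each attribution is the coefficient multiplied by the feature's deviation from its background mean. It does not call the SHAP or LIME libraries, and the gene names and simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]   # hypothetical names
X = rng.normal(size=(200, len(genes)))
# Simulated labels driven mainly by gene_A (positive) and gene_C (negative)
logit = 2.0 * X[:, 0] - 1.5 * X[:, 2]
y = (logit + rng.logistic(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)


def linear_attributions(model, X_background, x):
    """Per-feature contribution to the log-odds of one sample,
    relative to the average log-odds over the background data."""
    return model.coef_[0] * (x - X_background.mean(axis=0))


sample = X[0]
phi = linear_attributions(model, X, sample)
baseline = model.decision_function(X).mean()        # average log-odds
prediction = model.decision_function(sample.reshape(1, -1))[0]

for gene, value in zip(genes, phi):
    print(f"{gene}: {value:+.3f}")
# Attributions are additive: baseline + sum(phi) equals the sample's log-odds
print(f"{baseline + phi.sum():.3f} vs {prediction:.3f}")
```

For non-linear models, such as random forests or deep networks, no such closed form exists, and attributions must be approximated, which is where dedicated explainability tools become necessary.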
Moreover, there is a risk of over-interpreting associations without orthogonal validation using proteomics, functional assays, or genetic perturbation studies [172]. Furthermore, case-control designs are prevalent, but they do not support causal inference, and external validation is often lacking [173,174,175]. For example, Toh et al. used text mining to indirectly support the three compounds predicted to inhibit the genes in their acellular capillary density signature, but no experimental studies were performed to validate these predictions [147]. Similarly, Huang et al. (2022a) and Huang et al. (2022b) both acknowledge that no in vitro or in vivo confirmatory studies were performed [60,61]. These studies also note that, because the original data came from case-control studies, the causal relationship between biomarker gene expression and the presence of immune cells cannot be established [60,61].
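The overfitting risk in small, high-dimensional datasets can be demonstrated with a short simulation. The sketch below uses pure-noise data, so no gene carries real signal, and compares cross-validated accuracy when gene selection is performed on all samples before cross-validation with accuracy when selection is refit inside each training fold. The sample size, gene count, and number of selected genes are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Pure-noise "expression" data (hypothetical): 40 samples, 5000 genes,
# labels unrelated to any gene, so true accuracy is about 50%.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5000))
y = np.repeat([0, 1], 20)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: genes are selected using ALL samples, then cross-validated.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X_leaky, y, cv=cv
).mean()

# Honest: gene selection is refit inside each training fold.
pipeline = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
honest_acc = cross_val_score(pipeline, X, y, cv=cv).mean()

print(f"Selection outside CV: {leaky_acc:.2f}")
print(f"Selection inside CV:  {honest_acc:.2f}")
```

The gap between the two estimates arises entirely from information leaking from held-out samples into the selection step, which is why reported performance is difficult to interpret without nested selection or an external validation cohort.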
Another notable limitation across the reviewed literature was the lack of publicly available code and limited external validation, underscoring ongoing challenges for reproducibility and generalizability in AI-driven transcriptomic studies. While several studies mentioned code availability, closer inspection revealed that most were not publicly accessible in practice. For example, Wang et al. (2022) stated that analysis scripts could be provided “upon request,” which we classified as not openly available [17]. Goetz et al. created an open online resource (rgctypes.org) for their retinal ganglion cell classification framework, which provides access to data and algorithms, although this represents a data portal rather than a fully documented analysis pipeline [149]. Norrie et al. deposited sequencing data in GEO and described the use of existing open-source software but did not release new custom scripts [80]. In contrast, Kuchroo et al. represent the only example in which a custom pipeline was made publicly available; their CATCH Python library was deposited on GitHub with documentation and tutorials [23]. Thus, apart from this isolated case, most studies did not provide usable custom code, reflecting a broader reproducibility gap in the field.
Finally, clinical translation remains a major hurdle. Most AI-driven transcriptomic insights have not yet been incorporated into diagnostic, prognostic, or therapeutic workflows, and regulatory pathways for such tools are still in early development [176,177]. Notably, the majority of studies reviewed here lacked direct clinical validation, limiting their immediate applicability to patient care. Key barriers include the absence of robust cross-cohort validation, lack of standardized analytic pipelines, and the need to demonstrate clinical utility in prospective trials. Unresolved issues related to data privacy, reproducibility, interpretability, and integration into existing electronic health records further complicate regulatory approval. Overcoming these challenges will be essential to bridge the gap between discovery-driven analyses and real-world clinical application.
In addition to technical and translational challenges, the application of AI to ocular transcriptomics raises important ethical, legal, and privacy concerns. Transcriptomic datasets, particularly when linked to clinical records or genomic metadata, can contain sensitive personal information [178]. Robust de-identification, compliance with data protection frameworks, such as HIPAA (US) or GDPR (EU), and secure data storage are essential to prevent re-identification risks [179,180]. Moreover, equitable model development requires addressing biases introduced through the underrepresentation of specific populations, which can lead to disparities in diagnostic or therapeutic recommendations [181]. Legal and regulatory pathways for AI-driven molecular diagnostics are still evolving, underscoring the importance of transparency, reproducibility, and adherence to ethical guidelines when integrating these tools into clinical practice [181].

6. Future Directions

Integrating multimodal data (e.g., imaging, epigenomics, proteomics) remains technically and computationally complex, although it holds substantial promise for future precision ophthalmology applications [88,177]. Recent advances in multimodal AI frameworks suggest promising avenues for integrating spatial imaging and molecular data in ophthalmology. For instance, Jackson et al. developed a high-resolution, AI-derived retinal thickness map from optical coherence tomography (OCT) and associated it with transcriptomic and epigenomic profiles from spatially matched retinal tissues. Their model uncovered spatial transcriptomic signatures linked to retinal architecture and disease-prone regions, illustrating how DL imaging phenotypes can be anchored to underlying molecular biology [11]. While most DL applications in ocular transcriptomics focus on gene expression data alone, Schaub et al. demonstrated the potential of using neural networks on non-invasive imaging to predict cell function. They trained models to forecast transepithelial resistance and VEGF secretion from label-free microscopy images of iPSC-derived RPE, highlighting opportunities to integrate morphological, functional, and molecular data [25].
AI-driven multi-omics integration is also gaining traction [28,29,182,183,184]. Laich et al. combined scRNA-seq with imaging mass spectrometry, a technique that maps spatial distributions of metabolites and proteins within tissue sections, to characterize immune and glial heterogeneity in proliferative vitreoretinopathy (PVR), revealing ECM remodeling pathways [148]. Imaging mass spectrometry thus bridges molecular and spatial dimensions by visualizing biochemical species directly in situ. Furthermore, Wolf et al. developed TEMPO (Tracing Expression of Multiple Protein Origins), a framework integrating high-resolution proteomics of aqueous humor with single-cell transcriptomic data from all major retinal cell types. A neural network “proteomic clock” was developed to predict biological aging in specific cell types, revealing accelerated aging signals in diabetic retinopathy and uveitis even after clinical control, illustrating how AI can trace tissue-level disease processes in vivo [185].
While AI applications in ocular transcriptomics have expanded rapidly in recent years, several ocular tissues and diseases remain underexplored. Although substantial work has focused on the retina and the cornea, tissues like the lens, the trabecular meshwork, and the ciliary body remain comparatively underrepresented in AI-driven transcriptomic studies [21,60,148,186]. For example, while the lens has been the subject of numerous transcriptomic and multi-omics analyses, these high-resolution datasets have yet to be analyzed comprehensively using AI techniques [152,153,154,155,156,157,158,159,160,161,162,163,164].
Moving forward, multimodal machine learning frameworks may enable noninvasive inference of tissue state, immune activity, or gene regulation by integrating transcriptomic data with imaging, proteomic, or metabolomic features. In parallel, advances in spatial transcriptomics will allow molecular profiles to be mapped directly onto tissue architectures, enabling earlier detection of disease processes and finer resolution of therapeutic targets. The development of large-scale foundation models trained on diverse ocular datasets could further provide generalizable representations across diseases and modalities, reducing the need for disease-specific retraining. However, many anterior segment tissues remain largely absent from such integrative AI efforts, representing an important opportunity for future work. Achieving these advances will require robust validation using multiple datasets, alongside frameworks for interpretability, regulatory approval, and ethical deployment, ensuring that technological progress translates into tangible clinical benefits.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cells14171315/s1, Table S1: AI-Guided Transcriptomic Studies Across Ocular Diseases.

Author Contributions

Conceptualization, C.L.; investigation, C.L.; writing—original draft preparation, C.L.; writing—review and editing, C.L., Y.Y. and J.L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AI: Artificial intelligence
AMD: Age-related macular degeneration
AUCs: Areas Under the Curve
BaySeq: Bayesian Sequencing
CIBERSORT: Cell type Identification By Estimating Relative Subsets Of RNA Transcripts
DAVID: Database for Annotation, Visualization, and Integrated Discovery
DEGs: Differentially expressed genes
DESeq2: Differential Expression analysis based on the Negative Binomial distribution (version 2)
DL: Deep learning
DME: Diabetic macular edema
DNN: Deep neural networks
DR: Diabetic retinopathy
DS-NMF: Deep Subspace Nonnegative Matrix Factorization
ECM: Extracellular matrix
edgeR: Empirical Analysis of Digital Gene Expression Data in R
GO: Gene Ontology
GSEA: Gene Set Enrichment Analysis
GSVA: Gene Set Variation Analysis
GWAS: Genome-Wide Association Study
iPSC: Induced pluripotent stem cells
KCS: Keratoconjunctivitis sicca
LASSO: Least absolute shrinkage and selection operator
LIGER: Linked Inference of Genomic Experimental Relationships
LIME: Local Interpretable Model–Agnostic Explanations
limma: Linear Models for Microarray Data
MAGIC: Markov Affinity-based Graph Imputation of Cells
MCODE: Molecular Complex Detection
ML: Machine learning
NMF: Non-negative matrix factorization
NPDR: Non-proliferative diabetic retinopathy
OCT: Optical coherence tomography
PCA: Principal component analysis
PDMS: Polydimethylsiloxane
PDR: Proliferative diabetic retinopathy
POAG: Primary open-angle glaucoma
PVR: Proliferative vitreoretinopathy
QBAM: Quantitative brightfield absorbance microscopy
RF: Random forest
RGCs: Retinal ganglion cells
RNA-seq: RNA sequencing
RPE: Retinal pigment epithelium
SCCAF: Single-Cell Clustering Assessment Framework
scGPS: Single-Cell Global fate Potential of Subpopulations
scRNA-seq: Single-cell RNA sequencing
scVI: Single-cell variational inference
SHAP: SHapley Additive exPlanations
SVM: Support vector machine
TED: Thyroid eye disease
TEMPO: Tracing Expression of Multiple Protein Origins
TNF: Tumor Necrosis Factor
t-SNE: t-distributed Stochastic Neighbor Embedding
UMAP: Uniform Manifold Approximation and Projection
VAE: Variational autoencoders
VEGF: Vascular endothelial growth factor
v-SVR: Support vector regression (SVR)
WGCNA: Weighted gene co-expression network analysis
XGBoost: eXtreme Gradient Boosting

References

  1. Sinn, R.; Wittbrodt, J. An eye on eye development. Mech. Dev. 2013, 130, 347–358.
  2. Chow, R.L.; Lang, R.A. Early eye development in vertebrates. Annu. Rev. Cell Dev. Biol. 2001, 17, 255–296.
  3. Miesfeld, J.B.; Brown, N.L. Eye organogenesis: A hierarchical view of ocular development. Curr. Top. Dev. Biol. 2019, 132, 351–393.
  4. Zuber, M.E.; Gestri, G.; Viczian, A.S.; Barsacchi, G.; Harris, W.A. Specification of the vertebrate eye by a network of eye field transcription factors. Development 2003, 130, 5155–5167.
  5. Vöcking, O.; Famulski, J.K. Single cell transcriptome analyses of the developing zebrafish eye—Perspectives and applications. Front. Cell Dev. Biol. 2023, 11, 1213382.
  6. Voigt, A.P.; Whitmore, S.S.; Mulfaul, K.; Chirco, K.R.; Giacalone, J.C.; Flamme-Wiese, M.J.; Stockman, A.; Stone, E.M.; Tucker, B.A.; Scheetz, T.E.; et al. Bulk and single-cell gene expression analyses reveal aging human choriocapillaris has pro-inflammatory phenotype. Microvasc. Res. 2020, 131, 104031.
  7. Hack, S.J.; Petereit, J.; Tseng, K.A.-S. Temporal Transcriptomic Profiling of the Developing Xenopus laevis Eye. Cells 2024, 13, 1390.
  8. Lukowski, S.W.; Lo, C.Y.; Sharov, A.A.; Nguyen, Q.; Fang, L.; Hung, S.S.; Zhu, L.; Zhang, T.; Grünert, U.; Nguyen, T.; et al. A single-cell transcriptome atlas of the adult human retina. EMBO J. 2019, 38, e100811.
  9. Voigt, A.P.; Mulfaul, K.; Mullin, N.K.; Flamme-Wiese, M.J.; Giacalone, J.C.; Stone, E.M.; Tucker, B.A.; Scheetz, T.E.; Mullins, R.F. Single-cell transcriptomics of the human retinal pigment epithelium and choroid in health and macular degeneration. Proc. Natl. Acad. Sci. USA 2019, 116, 24100–24107.
  10. Liang, Q.; Cheng, X.; Wang, J.; Owen, L.; Shakoor, A.; Lillvis, J.L.; Zhang, C.; Farkas, M.; Kim, I.K.; Li, Y.; et al. A multi-omics atlas of the human retina at single-cell resolution. Cell Genom. 2023, 3, 100298.
  11. Jackson, V.; Wu, Y.; Bonelli, R.; Owen, J.; Scott, L.; Farashi, S.; Kihara, Y.; Gantner, M.L.; Egan, C.; Williams, K.M.; et al. Multi-omic spatial effects on high-resolution AI-derived retinal thickness. Nat. Commun. 2025, 16, 1317.
  12. Suo, L.; Dai, W.; Qin, X.; Li, G.; Zhang, D.; Cheng, T.; Yao, T.; Zhang, C. Screening of primary open-angle glaucoma diagnostic markers based on immune-related genes and immune infiltration. BMC Genom. Data 2022, 23, 67.
  13. Liu, J.; Li, X.; Cheng, Y.; Liu, K.; Zou, H.; You, Z. Identification of potential ferroptosis-related biomarkers and a pharmacological compound in diabetic retinopathy based on machine learning and molecular docking. Front. Endocrinol. 2022, 13, 988506.
  14. Ma, Q.; Hai, Y.; Shen, J. Signatures of Six Autophagy-Related Genes as Diagnostic Markers of Thyroid-Associated Ophthalmopathy and Their Correlation with Immune Infiltration. Immun. Inflamm. Dis. 2024, 12, e70093.
  15. Owen, N.; Moosajee, M. RNA-sequencing in ophthalmology research: Considerations for experimental design and analysis. Ther. Adv. Ophthalmol. 2019, 11, 251584141983546.
  16. Wang, J.-H.; Wong, R.C.; Liu, G.-S. Retinal aging transcriptome and cellular landscape in association with the progression of age-related macular degeneration. Investig. Ophthalmol. Vis. Sci. 2023, 64, 32.
  17. Wang, J.-H.; Wong, R.C.B.; Liu, G.-S. Retinal Transcriptome and Cellular Landscape in Relation to the Progression of Diabetic Retinopathy. Investig. Ophthalmol. Vis. Sci. 2022, 63, 26.
  18. Yang, T.; Zhang, N.; Yang, N. Single-cell sequencing in diabetic retinopathy: Progress and prospects. J. Transl. Med. 2025, 23, 49.
  19. Ahsanuddin, S.; Wu, A.Y. Single-cell transcriptomics of the ocular anterior segment: A comprehensive review. Eye 2023, 37, 3334–3350.
  20. Wang, D.; Pu, Y.; Tan, S.; Wang, X.; Zeng, L.; Lei, J.; Gao, X.; Li, H. Identification of immune-related biomarkers for glaucoma using gene expression profiling. Front. Genet. 2024, 15, 1366453. [Google Scholar] [CrossRef]
  21. Wu, X.; Deng, Q.; Han, Z.; Ni, F.; Sun, D.; Xu, Y. Screening and identification of genes related to ferroptosis in keratoconus. Sci. Rep. 2023, 13, 13956. [Google Scholar] [CrossRef]
  22. Cai, Y.; Zhou, T.; Cai, X.; Shi, W.; Sun, H.; Fu, Y. Deciphering mitochondrial dysfunction in keratoconus: Insights into ACSL4 from machine learning-based bulk and single-cell transcriptome analyses and experimental validation. Comput. Struct. Biotechnol. J. 2025, 27, 1962–1974. [Google Scholar] [CrossRef]
  23. Kuchroo, M.; DiStasio, M.; Song, E.; Calapkulu, E.; Zhang, L.; Ige, M.; Sheth, A.H.; Majdoubi, A.; Menon, M.; Tong, A.; et al. Single-cell analysis reveals inflammatory interactions driving macular degeneration. Nat. Commun. 2023, 14, 2589. [Google Scholar] [CrossRef]
  24. Zhang, S.; Yang, Y.; Chen, J.; Su, S.; Cai, Y.; Yang, X.; Sang, A. Integrating Multi-omics to Identify Age-Related Macular Degeneration Subtypes and Biomarkers. J. Mol. Neurosci. 2024, 74, 74. [Google Scholar] [CrossRef]
  25. Schaub, N.J.; Hotaling, N.A.; Manescu, P.; Padi, S.; Wan, Q.; Sharma, R.; George, A.; Chalfoun, J.; Simon, M.; Ouladi, M.; et al. Deep learning predicts function of live retinal pigment epithelium from quantitative microscopy. J. Clin. Investig. 2020, 130, 1010–1023. [Google Scholar] [CrossRef]
  26. Lu, C.; Mao, X.; Yuan, S. Decoding physiological and pathological roles of innate immune cells in eye diseases: The perspectives from single-cell RNA sequencing. Front. Immunol. 2024, 15, 1490719. [Google Scholar] [CrossRef]
  27. Syta, A.; Podkowiński, A.; Chorągiewicz, T.; Karpiński, R.; Gęca, J.; Wróbel-Dudzińska, D.; Jonak, K.E.; Głuchowski, D.; Maciejewski, M.; Rejdak, R.; et al. Machine learning-assisted early detection of keratoconus: A comparative analysis of corneal topography and biomechanical data. Sci. Rep. 2025, 15, 24399. [Google Scholar] [CrossRef] [PubMed]
  28. Ting, D.S.W.; Pasquale, L.R.; Peng, L.; Campbell, J.P.; Lee, A.Y.-Y.; Raman, R.; Tan, G.S.W.; Schmetterer, L.; Keane, P.A.; Wong, T.Y. Artificial intelligence and deep learning in ophthalmology. Br. J. Ophthalmol. 2019, 103, 167–175. [Google Scholar] [CrossRef]
  29. Hogarty, D.T.; Mackey, D.A.; Hewitt, A.W. Current state and future prospects of artificial intelligence in ophthalmology: A review. Clin. Exp. Ophthalmol. 2019, 47, 128–139. [Google Scholar] [CrossRef]
  30. Schena, M.; Shalon, D.; Davis, R.W.; Brown, P.O. Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science 1995, 270, 467–470. [Google Scholar] [CrossRef]
  31. Lockhart, D.J.; Winzeler, E.A. Genomics, gene expression and DNA arrays. Nature 2000, 405, 827–836. [Google Scholar] [CrossRef]
  32. Gao, X.; Yourick, M.R.; Campasino, K.; Zhao, Y.; Sepehr, E.; Vaught, C.; Sprando, R.L.; Yourick, J.J. An updated comparison of microarray and RNA-seq for concentration response transcriptomic study: Case studies with two cannabinoids, cannabichromene and cannabinol. BMC Genom. 2025, 26, 392. [Google Scholar] [CrossRef]
  33. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015, 43, e47. [Google Scholar] [CrossRef] [PubMed]
  34. Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4, 44–57. [Google Scholar] [CrossRef] [PubMed]
  35. Candia, J.; Ferrucci, L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS ONE 2024, 19, e0302696. [Google Scholar] [CrossRef]
  36. Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550. [Google Scholar] [CrossRef]
  37. Bolstad, B.M.; Irizarry, R.A.; Åstrand, M.; Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19, 185–193. [Google Scholar] [CrossRef]
  38. Tan, D.S.P.; Lambros, M.B.; Natrajan, R.; Reis-Filho, J.S. Getting it right: Designing microarray (and not ‘microawry’) comparative genomic hybridization studies for cancer research. Lab. Investig. 2007, 87, 737–754. [Google Scholar] [CrossRef]
  39. Piccolo, S.R.; Sun, Y.; Campbell, J.D.; Lenburg, M.E.; Bild, A.H.; Johnson, W.E. A single-sample microarray normalization method to facilitate personalized-medicine workflows. Genomics 2012, 100, 337–344. [Google Scholar] [CrossRef]
  40. Rhodius, V.A.; Gross, C.A. Using DNA microarrays to assay part function. Methods Enzymol. 2011, 497, 75–113. [Google Scholar]
  41. Leek, J.T.; Scharpf, R.B.; Corrada-Bravo, H.; Simcha, D.; Langmead, B.; Johnson, W.E.; Geman, D.; Baggerly, K.; Irizarry, R.A. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 2010, 11, 733–739. [Google Scholar] [CrossRef]
  42. Agapito, G.; Milano, M.; Cannataro, M. A statistical network pre-processing method to improve relevance and significance of gene lists in microarray gene expression studies. BMC Bioinform. 2022, 23, 393. [Google Scholar] [CrossRef]
  43. Tzec-Interián, J.A.; González-Padilla, D.; Góngora-Castillo, E.B. Bioinformatics perspectives on transcriptomics: A comprehensive review of bulk and single-cell RNA sequencing analyses. Quant. Biol. 2025, 13, e78. [Google Scholar] [CrossRef]
  44. Donato, L.; Bramanti, P.; Scimone, C.; Rinaldi, C.; D’Angelo, R.; Sidoti, A. miRNA expression profile of retinal pigment epithelial cells under oxidative stress conditions. FEBS Open Bio 2018, 8, 219–233. [Google Scholar] [CrossRef]
  45. You, J.; Corley, S.M.; Wen, L.; Hodge, C.; Höllhumer, R.; Madigan, M.C.; Wilkins, M.R.; Sutton, G. RNA-Seq analysis and comparison of corneal epithelium in keratoconus and myopia patients. Sci. Rep. 2018, 8, 389. [Google Scholar] [CrossRef]
  46. Lozano, D.C.; Choi, D.; Jayaram, H.; Morrison, J.C.; Johnson, E.C. Utilizing RNA-Seq to Identify Differentially Expressed Genes in Glaucoma Model Tissues, Such as the Rodent Optic Nerve Head. In Methods in Molecular Biology; Springer: New York, NY, USA, 2018; pp. 299–310. [Google Scholar]
  47. Anand, D.; Kakrana, A.; Siddam, A.D.; Huang, H.; Saadi, I.; Lachke, S.A. RNA sequencing-based transcriptomic profiles of embryonic lens development for cataract gene discovery. Hum. Genet. 2018, 137, 941–954. [Google Scholar] [CrossRef]
  48. Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550. [Google Scholar] [CrossRef]
  49. Hardcastle, T.J.; Kelly, K.A. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinform. 2010, 11, 422. [Google Scholar] [CrossRef]
  50. Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26, 139–140. [Google Scholar] [CrossRef]
  51. Zeng, I.S.L.; Lumley, T. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science). Bioinform. Biol. Insights 2018, 12, 117793221875929. [Google Scholar] [CrossRef]
  52. Yu, G.; Wang, L.-G.; Han, Y.; He, Q.-Y. clusterProfiler: An R Package for Comparing Biological Themes Among Gene Clusters. OMICS A J. Integr. Biol. 2012, 16, 284–287. [Google Scholar] [CrossRef]
  53. Deshpande, D.; Chhugani, K.; Chang, Y.; Karlsberg, A.; Loeffler, C.; Zhang, J.; Muszyńska, A.; Munteanu, V.; Yang, H.; Rotman, J.; et al. RNA-seq data science: From raw data to effective interpretation. Front. Genet. 2023, 14, 997383. [Google Scholar] [CrossRef]
  54. Koch, C.M.; Chiu, S.F.; Akbarpour, M.; Bharat, A.; Ridge, K.M.; Bartom, E.T.; Winter, D.R. A Beginner’s Guide to Analysis of RNA Sequencing Data. Am. J. Respir. Cell Mol. Biol. 2018, 59, 145–157. [Google Scholar] [CrossRef]
  55. Van den Berge, K.; Hembach, K.M.; Soneson, C.; Tiberi, S.; Clement, L.; Love, M.I.; Patro, R.; Robinson, M.D. RNA sequencing data: Hitchhiker’s guide to expression analysis. Annu. Rev. Biomed. Data Sci. 2019, 2, 139–173. [Google Scholar] [CrossRef]
  56. Yu, Y.; Mai, Y.; Zheng, Y.; Shi, L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol. 2024, 25, 254. [Google Scholar] [CrossRef]
  57. Zhao, S.; Ye, Z.; Stanton, R. Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA 2020, 26, 903–909. [Google Scholar] [CrossRef]
  58. Heil, B.J.; Crawford, J.; Greene, C.S. The effect of non-linear signal in classification problems using gene expression. PLoS Comput. Biol. 2023, 19, e1010984. [Google Scholar] [CrossRef]
  59. Han, D.; He, X. Screening for biomarkers in age-related macular degeneration. Heliyon 2023, 9, e16981. [Google Scholar] [CrossRef]
  60. Huang, J.; Zhou, Q. CD8+T Cell-Related Gene Biomarkers in Macular Edema of Diabetic Retinopathy. Front. Endocrinol. 2022, 13, 907396. [Google Scholar] [CrossRef]
  61. Huang, J.; Zhou, Q. Gene Biomarkers Related to Th17 Cells in Macular Edema of Diabetic Retinopathy: Cutting-Edge Comprehensive Bioinformatics Analysis and In Vivo Validation. Front. Immunol. 2022, 13, 858972. [Google Scholar] [CrossRef]
  62. Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 2015, 16, 321–332. [Google Scholar] [CrossRef]
  63. Cheng, Z.; Hao, J.; Cai, S.; Feng, P.; Chen, W.; Ma, X.; Li, X. A novel combined oxidative stress and extracellular matrix related predictive gene signature for keratoconus. Biochem. Biophys. Res. Commun. 2025, 742, 151144. [Google Scholar] [CrossRef]
  64. Tang, F.; Barbacioru, C.; Wang, Y.; Nordman, E.; Lee, C.; Xu, N.; Wang, X.; Bodeau, J.; Tuch, B.B.; Siddiqui, A.; et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 2009, 6, 377–382. [Google Scholar] [CrossRef]
  65. Miao, Z.; Moreno, P.; Huang, N.; Papatheodorou, I.; Brazma, A.; Teichmann, S.A. Putative cell type discovery from single-cell gene expression data. Nat. Methods 2020, 17, 621–628. [Google Scholar] [CrossRef]
  66. Hwang, B.; Lee, J.H.; Bang, D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp. Mol. Med. 2018, 50, 96. [Google Scholar] [CrossRef]
  67. Denisenko, E.; Guo, B.B.; Jones, M.; Hou, R.; de Kock, L.; Lassmann, T.; Poppe, D.; Clément, O.; Simmons, R.K.; Lister, R.; et al. Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows. Genome Biol. 2020, 21, 130. [Google Scholar] [CrossRef]
  68. Rich, J.M.; Moses, L.; Einarsson, P.H.; Jackson, K.; Luebbert, L.; Booeshaghi, A.S.; Antonsson, S.; Sullivan, D.K.; Bray, N.; Melsted, P.; et al. The impact of package selection and versioning on single-cell RNA-seq analysis. bioRxiv 2024. bioRxiv:2024.04.04.588111. [Google Scholar] [CrossRef]
  69. Hu, Z.; Ahmed, A.A.; Yau, C. CIDER: An interpretable meta-clustering framework for single-cell RNA-seq data integration and evaluation. Genome Biol. 2021, 22, 337. [Google Scholar] [CrossRef]
  70. Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef]
  71. Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233. [Google Scholar] [CrossRef]
  72. van der Maaten, L.; Hinton, G. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  73. Li, R.; Liu, J.; Yi, P.; Yang, X.; Chen, J.; Zhao, C.; Liao, X.; Wang, X.; Xu, Z.; Lu, H.; et al. Integrative Single-Cell Transcriptomics and Epigenomics Mapping of the Fetal Retina Developmental Dynamics. Adv. Sci. 2023, 10, 2206623. [Google Scholar] [CrossRef]
  74. Zhang, J.; Du, T.; Jin, Y.; Bao, Y.; Ma, Q.; Cai, Y.-D.; Zhang, J. Machine Learning Identifies Key Gene Markers Related to Fetal Retina Development at Single-Cell Transcription Level. Investig. Ophthalmol. Vis. Sci. 2025, 66, 60. [Google Scholar] [CrossRef]
  75. Yazici, İ.; Shayea, I.; Din, J. A survey of applications of artificial intelligence and machine learning in future mobile networks-enabled systems. Eng. Sci. Technol. Int. J. 2023, 44, 101455. [Google Scholar] [CrossRef]
  76. Voigt, A.P.; Mullin, N.K.; Stone, E.M.; Tucker, B.A.; Scheetz, T.E.; Mullins, R.F. Single-cell RNA sequencing in vision research: Insights into human retinal health and disease. Prog. Retin. Eye Res. 2021, 83, 100934. [Google Scholar] [CrossRef]
  77. Wang, Y.; Miller, D.J.; Clarke, R. Approaches to working in high-dimensional data spaces: Gene expression microarrays. Br. J. Cancer 2008, 98, 1023–1028. [Google Scholar] [CrossRef]
  78. Angermueller, C.; Pärnamaa, T.; Parts, L.; Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 2016, 12, 878. [Google Scholar] [CrossRef]
  79. Oleynik, M.; Kugic, A.; Kasáč, Z.; Kreuzthaler, M. Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification. J. Am. Med. Inf. Assoc. 2019, 26, 1247–1254. [Google Scholar] [CrossRef]
  80. Norrie, J.L.; Lupo, M.S.; Little, D.R.; Shirinifard, A.; Mishra, A.; Zhang, Q.; Geiger, N.; Putnam, D.; Djekidel, N.; Ramirez, C.; et al. Latent epigenetic programs in Müller glia contribute to stress and disease response in the retina. Dev. Cell 2025, 60, 1199–1216. [Google Scholar] [CrossRef] [PubMed]
  81. Dong, Z.; Wang, C.; Dou, S.; Yang, X.; Wang, D.; Shi, K.; Wu, N. JAK1, SKI, ZBTB16 as potential biomarkers mediate the inflammatory response in keratoconjunctivitis sicca. Gene 2024, 927, 148691. [Google Scholar] [CrossRef]
  82. Ringnér, M. What is principal component analysis? Nat. Biotechnol. 2008, 26, 303–304. [Google Scholar] [CrossRef]
  83. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimensionality Reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
  84. Islam, S.; Anand, S.; Hamid, J.; Thabane, L.; Beyene, J. Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration. Stat. Appl. Genet. Mol. Biol. 2017, 16, 199–216. [Google Scholar] [CrossRef]
  85. Nayak, R.; Hasija, Y. A hitchhiker’s guide to single-cell transcriptomics and data analysis pipelines. Genomics 2021, 113, 606–619. [Google Scholar] [CrossRef]
  86. Van Dijk, D.; Sharma, R.; Nainys, J.; Yim, K.; Kathail, P.; Carr, A.J.; Burdziak, C.; Moon, K.R.; Chaffer, C.L.; Pattabiraman, D.; et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. Cell 2018, 174, 716–729.e27. [Google Scholar] [CrossRef]
  87. Stegle, O.; Teichmann, S.A.; Marioni, J.C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 2015, 16, 133–145. [Google Scholar] [CrossRef]
  88. Wang, S.K.; Nair, S.; Li, R.; Kraft, K.; Pampari, A.; Patel, A.; Kang, J.B.; Luong, C.; Kundaje, A.; Chang, H.Y. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom. 2022, 2, 100164. [Google Scholar] [CrossRef]
  89. Haghverdi, L.; Lun, A.T.L.; Morgan, M.D.; Marioni, J.C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 2018, 36, 421–427. [Google Scholar] [CrossRef]
  90. Korsunsky, I.; Millard, N.; Fan, J.; Slowikowski, K.; Zhang, F.; Wei, K.; Baglaenko, Y.; Brenner, M.; Loh, P.-R.; Raychaudhuri, S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 2019, 16, 1289–1296. [Google Scholar] [CrossRef]
  91. Zhang, S.; Li, X.; Lin, J.; Lin, Q.; Wong, K.-C. Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA 2023, 29, 517–530. [Google Scholar] [CrossRef]
  92. Johnson, K.A.; Krishnan, A. Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data. Genome Biol. 2022, 23, 1. [Google Scholar] [CrossRef]
  93. Jaskowiak, P.A.; Campello, R.J.; Costa, I.G. On the selection of appropriate distances for gene expression data clustering. BMC Bioinform. 2014, 15, S2. [Google Scholar] [CrossRef]
  94. Do, J.H.; Choi, D.-K. Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data. Mol. Cells 2008, 25, 279–288. [Google Scholar] [CrossRef]
  95. Pantano, L.; Hutchinson, J.; Barrera, V.; Kirchner, R.; Steinbaugh, M. DEGreport: Report of DEG analysis. Bioconductor, 15 April 2025. [Google Scholar]
  96. Langfelder, P.; Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinform. 2008, 9, 559. [Google Scholar] [CrossRef]
  97. Ma, K.; Nakajima, H.; Basak, N.; Barman, A.; Ratnapriya, R. Integrating explainable machine learning and transcriptomics data reveals cell-type specific immune signatures underlying macular degeneration. npj Genom. Med. 2025, 10, 48. [Google Scholar] [CrossRef]
  98. Saelens, W.; Cannoodt, R.; Todorov, H.; Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 2019, 37, 547–554. [Google Scholar] [CrossRef]
  99. Setty, M.; Kiseliovas, V.; Levine, J.; Gayoso, A.; Mazutis, L.; Pe’Er, D. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 2019, 37, 451–460. [Google Scholar] [CrossRef]
  100. Butler, A.; Hoffman, P.; Smibert, P.; Papalexi, E.; Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018, 36, 411–420. [Google Scholar] [CrossRef]
  101. Stuart, T.; Butler, A.; Hoffman, P.; Hafemeister, C.; Papalexi, E.; Mauck, W.M.; Hao, Y.; Stoeckius, M.; Smibert, P.; Satija, R. Comprehensive Integration of Single-Cell Data. Cell 2019, 177, 1888–1902.e21. [Google Scholar] [CrossRef]
  102. Jia, X.; Wu, J.; Chen, X.; Hou, S.; Li, Y.; Zhao, L.; Zhu, Y.; Li, Z.; Deng, C.; Su, W.; et al. Cell atlas of trabecular meshwork in glaucomatous non-human primates and DEGs related to tissue contract based on single-cell transcriptomics. iScience 2023, 26, 108024. [Google Scholar] [CrossRef]
  103. Welch, J.D.; Kozareva, V.; Ferreira, A.; Vanderburg, C.; Martin, C.; Macosko, E.Z. Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity. Cell 2019, 177, 1873–1887.e17. [Google Scholar] [CrossRef]
  104. Avila Cobos, F.; Alquicira-Hernandez, J.; Powell, J.E.; Mestdagh, P.; De Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 2020, 11, 5650. [Google Scholar] [CrossRef]
  105. Gong, T.; Szustakowski, J.D. DeconRNASeq: A statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics 2013, 29, 1083–1085. [Google Scholar] [CrossRef]
  106. Jin, H.; Liu, Z. A benchmark for RNA-seq deconvolution analysis under dynamic testing environments. Genome Biol. 2021, 22, 102. [Google Scholar] [CrossRef]
  107. Newman, A.M.; Steen, C.B.; Liu, C.L.; Gentles, A.J.; Chaudhuri, A.A.; Scherer, F.; Khodadoust, M.S.; Esfahani, M.S.; Luca, B.A.; Steiner, D.; et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019, 37, 773–782. [Google Scholar] [CrossRef]
  108. Newman, A.M.; Liu, C.L.; Green, M.R.; Gentles, A.J.; Feng, W.; Xu, Y.; Hoang, C.D.; Diehn, M.; Alizadeh, A.A. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 2015, 12, 453–457. [Google Scholar] [CrossRef]
  109. Miao, Y.R.; Zhang, Q.; Lei, Q.; Luo, M.; Xie, G.Y.; Wang, H.; Guo, A.Y. ImmuCellAI: A Unique Method for Comprehensive T-Cell Subsets Abundance Prediction and its Application in Cancer Immunotherapy. Adv. Sci. 2020, 7, 1902880. [Google Scholar] [CrossRef]
  110. Das, S.; McClain, C.J.; Rai, S.N. Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. Entropy 2020, 22, 427. [Google Scholar] [CrossRef]
  111. Wang, Y.; Yang, X.; Zhang, Y.; Hong, L.; Xie, Z.; Jiang, W.; Chen, L.; Xiong, K.; Yang, S.; Lin, M.; et al. Single-cell RNA sequencing reveals roles of unique retinal microglia types in early diabetic retinopathy. Diabetol. Metab. Syndr. 2024, 16, 49. [Google Scholar] [CrossRef]
  112. Hänzelmann, S.; Castelo, R.; Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinform. 2013, 14, 7. [Google Scholar] [CrossRef]
  113. Wang, Z.; Huang, Y.; Chu, F.; Liao, K.; Cui, Z.; Chen, J.; Tang, S. Integrated Analysis of DNA methylation and transcriptome profile to identify key features of age-related macular degeneration. Bioengineered 2021, 12, 7061–7078. [Google Scholar] [CrossRef]
  114. Yousef, M.; Allmer, J. Deep learning in bioinformatics. Turk. J. Biol. 2023, 47, 366–382. [Google Scholar] [CrossRef]
  115. Lopez, R.; Regier, J.; Cole, M.B.; Jordan, M.I.; Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 2018, 15, 1053–1058. [Google Scholar] [CrossRef]
  116. Tran, D.; Nguyen, H.; Tran, B.; La Vecchia, C.; Luu, H.N.; Nguyen, T. Fast and precise single-cell data analysis using a hierarchical autoencoder. Nat. Commun. 2021, 12, 1029. [Google Scholar] [CrossRef] [PubMed]
  117. Eraslan, G.; Simon, L.M.; Mircea, M.; Mueller, N.S.; Theis, F.J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 2019, 10, 390. [Google Scholar] [CrossRef] [PubMed]
  118. Dou, B.; Zhu, Z.; Merkurjev, E.; Ke, L.; Chen, L.; Jiang, J.; Zhu, Y.; Liu, J.; Zhang, B.; Wei, G.-W. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem. Rev. 2023, 123, 8736–8780. [Google Scholar] [CrossRef]
  119. Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  120. Simon, N.; Friedman, J.H.; Hastie, T.; Tibshirani, R. Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. J. Stat. Softw. 2011, 39, 1–13. [Google Scholar] [CrossRef]
  121. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  122. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  123. Huang, S.; Cai, N.; Pacheco, P.P.; Narrandes, S.; Wang, Y.; Xu, W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genom. Proteom. 2018, 15, 41–51. [Google Scholar]
  124. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]
  125. Ding, Y.; Wilkins, D. Improving the Performance of SVM-RFE to Select Genes in Microarray Data. BMC Bioinform. 2006, 7, S12. [Google Scholar] [CrossRef]
  126. Li, Z.; Xie, W.; Liu, T. Efficient feature selection and classification for microarray data. PLoS ONE 2018, 13, e0202167. [Google Scholar] [CrossRef]
  127. Díaz-Uriarte, R.; Alvarez de Andrés, S. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006, 7, 3. [Google Scholar] [CrossRef]
  128. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  129. Strobl, C.; Boulesteix, A.-L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 2007, 8, 25. [Google Scholar] [CrossRef]
  130. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  131. Zhang, C.; Liu, C.; Zhang, X.; Almpanidis, G. An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst. Appl. 2017, 82, 128–150. [Google Scholar] [CrossRef]
  132. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 6638–6648. [Google Scholar]
  133. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for big data: An interdisciplinary review. J. Big Data 2020, 7, 94. [Google Scholar] [CrossRef]
  134. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 3146–3154. [Google Scholar]
  135. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  136. Martinez, W.; Gray, J.B. Noise peeling methods to improve boosting algorithms. Comput. Stat. Data Anal. 2016, 93, 483–497. [Google Scholar] [CrossRef]
  137. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
  138. Moerman, T.; Aibar Santos, S.; Bravo González-Blas, C.; Simm, J.; Moreau, Y.; Aerts, J.; Aerts, S. GRNBoost2 and Arboreto: Efficient and scalable inference of gene regulatory networks. Bioinformatics 2019, 35, 2159–2161. [Google Scholar] [CrossRef]
  139. Pratapa, A.; Jalihal, A.P.; Law, J.N.; Bharadwaj, A.; Murali, T.M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 2020, 17, 147–154. [Google Scholar] [CrossRef]
  140. Karamveer; Uzun, Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform. Biol. Insights 2024, 18, 11779322241287120. [Google Scholar] [CrossRef]
  141. Thompson, M.; Matsumoto, M.; Ma, T.; Senabouth, A.; Palpant, N.J.; Powell, J.E.; Nguyen, Q. scGPS: Determining Cell States and Global Fate Potential of Subpopulations. Front. Genet. 2021, 12, 666771. [Google Scholar] [CrossRef]
  142. Aran, D.; Hu, Z.; Butte, A.J. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017, 18, 220. [Google Scholar] [CrossRef]
  143. Sutton, G.J.; Poppe, D.; Simmons, R.K.; Walsh, K.; Nawaz, U.; Lister, R.; Gagnon-Bartsch, J.A.; Voineagu, I. Comprehensive evaluation of deconvolution methods for human brain gene expression. Nat. Commun. 2022, 13, 1358. [Google Scholar] [CrossRef]
  144. Chin, C.-H.; Chen, S.-H.; Wu, H.-H.; Ho, C.-W.; Ko, M.-T.; Lin, C.-Y. cytoHubba: Identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 2014, 8, S11. [Google Scholar] [CrossRef]
  145. Bader, G.D.; Hogue, C.W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 2003, 4, 2. [Google Scholar] [CrossRef] [PubMed]
  146. Oca, A.I.; Pérez-Sala, Á.; Pariente, A.; Ochoa, R.; Velilla, S.; Peláez, R.; Larráyoz, I.M. Predictive Biomarkers of Age-Related Macular Degeneration Response to Anti-VEGF Treatment. J. Pers. Med. 2021, 11, 1329. [Google Scholar] [CrossRef]
  147. Toh, H.; Smolentsev, A.; Sadjadi, R.; Clegg, D.; Yan, J.; Stewart, R.; Thomson, J.A.; Jiang, P. Transcriptomic clock predicts vascular changes of prodromal diabetic retinopathy. Sci. Rep. 2023, 13, 12968.
  148. Laich, Y.; Wolf, J.; Hajdu, R.I.; Schlecht, A.; Bucher, F.; Pauleikhoff, L.; Busch, M.; Martin, G.; Faatz, H.; Killmer, S.; et al. Single-Cell Protein and Transcriptional Characterization of Epiretinal Membranes from Patients with Proliferative Vitreoretinopathy. Investig. Ophthalmol. Vis. Sci. 2022, 63, 17.
  149. Goetz, J.; Jessen, Z.F.; Jacobi, A.; Mani, A.; Cooler, S.; Greer, D.; Kadri, S.; Segal, J.; Shekhar, K.; Sanes, J.R.; et al. Unified classification of mouse retinal ganglion cells using function, morphology, and gene expression. Cell Rep. 2022, 40, 111040.
  150. Zhao, S.; Dai, Q.; Rao, Z.; Li, J.; Wang, A.; Gao, Z.; Fan, Y. Identification of Optic Nerve–Related Biomarkers in Primary Open-Angle Glaucoma Based on Comprehensive Bioinformatics and Mendelian Randomization. Transl. Vis. Sci. Technol. 2024, 13, 21.
  151. Shu, X.; Zeng, C.; Zhu, Y.; Chen, Y.; Huang, X.; Wei, R. Screening of pathologically significant diagnostic biomarkers in tears of thyroid eye disease based on bioinformatic analysis and machine learning. Front. Cell Dev. Biol. 2024, 12, 1486170.
  152. Lachke, S.A.; Ho, J.W.K.; Kryukov, G.V.; O’Connell, D.J.; Aboukhalil, A.; Bulyk, M.L.; Park, P.J.; Maas, R.L. iSyTE: Integrated Systems Tool for Eye gene discovery. Investig. Ophthalmol. Vis. Sci. 2012, 53, 1617–1627.
  153. Tangeman, J.A.; Rebull, S.M.; Grajales-Esquivel, E.; Weaver, J.M.; Bendezu-Sayas, S.; Robinson, M.L.; Lachke, S.A.; Del Rio-Tsonis, K. Integrated single-cell multiomics uncovers foundational regulatory mechanisms of lens development and pathology. Development 2024, 151, dev202249.
  154. Disatham, J.; Brennan, L.A.; Kantorow, M. Epigenetic regulation during lens fiber cell differentiation. Epigenet. Chromatin 2022, 15, 9.
  155. Disatham, J.; Brennan, L.; Kantorow, M.; Cvekl, A. Profiling chromatin accessibility during lens development reveals regulatory motif dynamics and Pax6 involvement. Epigenet. Chromatin 2019, 12, 55.
  156. Jiang, J.; Shihan, M.H.; Wang, Y.; Duncan, M.K. Lens Epithelial Cells Initiate an Inflammatory Response Following Cataract Surgery. Investig. Ophthalmol. Vis. Sci. 2018, 59, 4986–4997.
  157. Faranda, A.P.; Shihan, M.H.; Wang, Y.; Duncan, M.K. The aging mouse lens transcriptome. Exp. Eye Res. 2021, 209, 108663.
  158. Novo, S.G.; Faranda, A.P.; D’Antin, J.C.; Wang, Y.; Shihan, M.; Barraquer, R.I.; Michael, R.; Duncan, M.K. Human lens epithelial cells induce the inflammatory response when placed into the lens capsular bag model of posterior capsular opacification. Mol. Vis. 2024, 30, 348–367.
  159. Duot, M.; Coomson, S.Y.; Shrestha, S.K.; Nagulla, M.M.K.; Audic, Y.; Barve, R.A.; Huang, H.; Gautier-Courteille, C.; Paillard, L.; Lachke, S.A. Transcriptome Meta-Analysis Uncovers Cell-Specific Regulatory Relationships in Embryonic, Juvenile, Adult, and Aged Mouse Lens Epithelium and Fibers. Investig. Ophthalmol. Vis. Sci. 2025, 66, 42.
  160. Gorai, S.; Faranda, A.P.; Shihan, M.H.; Wang, Y.; Duncan, M.K. LIRTS Viewer: A Web-Based Resource to View the Transcriptional Response of Lens Epithelial Cells to Injury. Investig. Ophthalmol. Vis. Sci. 2025, 66, 53.
  161. Zhao, Y.; Zheng, D.; Cvekl, A. A comprehensive spatial-temporal transcriptomic analysis of differentiating nascent mouse lens epithelial and fiber cells. Exp. Eye Res. 2018, 175, 56–72.
  162. Zhao, Y.; Wilmarth, P.A.; Cheng, C.; Limi, S.; Fowler, V.M.; Zheng, D.; David, L.L.; Cvekl, A. Proteome-transcriptome analysis and proteome remodeling in mouse lens epithelium and fibers. Exp. Eye Res. 2019, 179, 32–46.
  163. Disatham, J.; Brennan, L.; Cvekl, A.; Kantorow, M. Multiomics Analysis Reveals Novel Genetic Determinants for Lens Differentiation, Structure, and Transparency. Biomolecules 2023, 13, 693.
  164. Hao, C.; Li, K.; Wei, Z.; Radeen, K.R.; Zhang, X.; Purohit, S.; Fan, X. Transcriptomic Analysis of Human Lens Epithelium Tissue With and Without Cataract Surgery: Uncovering Novel Pathways of Post-Surgical Lens Epithelium Remodeling. Investig. Ophthalmol. Vis. Sci. 2025, 66, 28.
  165. Lalman, C.; Stabler, K.R.; Yang, Y.; Walker, J.L. Supervised machine-based learning and computational analysis to reveal unique molecular signatures associated with wound healing and fibrotic outcomes to lens injury. Int. J. Mol. Sci. 2025, 26, 7422.
  166. Kakati, T.; Bhattacharyya, D.K.; Kalita, J.K.; Norden-Krichmar, T.M. DEGnext: Classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning. BMC Bioinform. 2022, 23, 17.
  167. Zeng, H.; Edwards, M.D.; Liu, G.; Gifford, D.K. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016, 32, i121–i127.
  168. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  169. Ekanayake, I.U.; Meddage, D.P.P.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059.
  170. Budhkar, A.; Song, Q.; Su, J.; Zhang, X. Demystifying the black box: A survey on explainable artificial intelligence (XAI) in bioinformatics. Comput. Struct. Biotechnol. J. 2025, 27, 346–359.
  171. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
  172. Liu, J.; Gao, J.; Xing, S.; Yan, Y.; Yan, X.; Jing, Y.; Li, X. Bioinformatics analysis of signature genes related to cell death in keratoconus. Sci. Rep. 2024, 14, 12749.
  173. Ahmed, Z.; Wan, S.; Zhang, F.; Zhong, W. Artificial intelligence for omics data analysis. BMC Methods 2024, 1, 4.
  174. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
  175. Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 2017, 18, 851–869.
  176. Rung, J.; Brazma, A. Reuse of public genome-wide gene expression data. Nat. Rev. Genet. 2013, 14, 89–99.
  177. Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56.
  178. Oestreich, M.; Chen, D.; Schultze, J.L.; Fritz, M.; Becker, M. Privacy considerations for sharing genomics data. EXCLI J. 2021, 20, 1243–1260.
  179. Konnoth, C. AI and data protection law in health. In Research Handbook on Health, AI and the Law; Solaiman, B., Cohen, I.G., Eds.; Edward Elgar Publishing Ltd.: Cheltenham, UK, 2024; Chapter 7.
  180. Abbas, S.R.; Abbas, Z.; Zahir, A.; Lee, S.W. Advancing genome-based precision medicine: A review on machine learning applications for rare genetic disorders. Brief. Bioinform. 2025, 26, bbaf329.
  181. Pham, T. Ethical and legal considerations in healthcare AI: Innovation and policy for safe and fair use. R. Soc. Open Sci. 2025, 12, 241873.
  182. Haider, S.; Pal, R. Integrated analysis of transcriptomic and proteomic data. Curr. Genom. 2013, 14, 91–110.
  183. Argelaguet, R.; Velten, B.; Arnol, D.; Dietrich, S.; Zenz, T.; Marioni, J.C.; Buettner, F.; Huber, W.; Stegle, O. Multi-Omics Factor Analysis—A framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 2018, 14, e8124.
  184. Lee, C.H.; Yoon, H.-J. Medical big data: Promise and challenges. Kidney Res. Clin. Pract. 2017, 36, 3–11.
  185. Wolf, J.; Franco, J.A.; Yip, R.; Dabaja, M.Z.; Velez, G.; Liu, F.; Bassuk, A.G.; Mruthyunjaya, P.; Dufour, A.; Mahajan, V.B. Liquid Biopsy Proteomics in Ophthalmology. J. Proteome Res. 2024, 23, 511–522.
  186. Schaub, J.M.; Fu, D.J.; van Velthoven, C.T.J.; Park, D.Y.; Lin, J.H.; Lee, C.S. A comprehensive review of artificial intelligence models for screening major retinal diseases. Artif. Intell. Rev. 2024, 57, 3487–3518.
Figure 1. AI-assisted transcriptomic analysis pipeline and clinical applications. Transcriptomic and epigenomic data are analyzed using unsupervised and supervised machine learning approaches to uncover biological insights and inform applications like biomarker discovery, therapeutic prediction, and cell atlas construction. Created using BioRender.com (accessed on 18 August 2025, latest web-based version).
Table 1. Summary of unsupervised machine learning algorithms used in transcriptomic analysis.

| Algorithm | Modality | Biological Relevance |
|---|---|---|
| PCA, t-SNE, UMAP | Microarray, bulk RNA-seq, scRNA-seq | Dimensionality reduction |
| MAGIC | scRNA-seq | Imputes missing values/dropouts |
| Harmony | scRNA-seq | Corrects for batch effects |
| DEGreport | Microarray, bulk RNA-seq, scRNA-seq | Groups genes by correlated expression patterns |
| WGCNA | Microarray, bulk RNA-seq, scRNA-seq | Clusters groups of genes based on expression profiles |
| Leiden | scRNA-seq | Detects cell communities by clustering single-cell transcriptomes |
| Monocle3, Palantir | scRNA-seq | Reconstructs developmental trajectories |
| Seurat | Microarray, bulk RNA-seq, scRNA-seq | Comprehensive toolkit for clustering, dimensionality reduction, and batch correction |
| LIGER | scRNA-seq | Harmonizes data from multiple datasets |
| CIBERSORT, CIBERSORTx | Microarray, bulk RNA-seq | Estimates cell type composition from mixed tissue samples |
| GSVA | Microarray, bulk RNA-seq | Estimates variation in pathway activity across a sample population |
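
As a rough illustration of the dimensionality reduction entry in Table 1, the sketch below computes the first principal component of a small simulated expression matrix by power iteration. Everything in it is an assumption made for illustration: the data are synthetic (two groups of samples that differ in two genes), the function names are hypothetical, and only the Python standard library is used so that the example is self-contained. Real analyses would normally rely on an established package rather than hand-written linear algebra.

```python
import math
import random


def simulate_expression(rng, n_per_group=5, n_genes=6, shift=4.0):
    """Return a samples x genes matrix; the second group has genes 0 and 1 shifted up."""
    samples = []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(5.0, 0.5) for _ in range(n_genes)]
            if group == 1:
                row[0] += shift
                row[1] += shift
            samples.append(row)
    return samples


def center_columns(matrix):
    """Subtract each gene's mean so that PCA describes variation around the mean."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def covariance(centered):
    """Gene x gene sample covariance matrix of a column-centered matrix."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(p)] for i in range(p)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_principal_component(cov, rng, iterations=200):
    """Power iteration: repeatedly apply the covariance matrix and renormalize."""
    vec = [rng.gauss(0.0, 1.0) for _ in cov]
    for _ in range(iterations):
        vec = matvec(cov, vec)
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Rayleigh quotient of the unit vector gives the variance along PC1.
    eigenvalue = sum(v * w for v, w in zip(vec, matvec(cov, vec)))
    return vec, eigenvalue


rng = random.Random(0)  # fixed seed so the synthetic data are reproducible
expression = simulate_expression(rng)
centered = center_columns(expression)
cov = covariance(centered)
pc1, eigenvalue = first_principal_component(cov, rng)

# Project every sample onto PC1 and report the share of total variance it captures.
scores = matvec(centered, pc1)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print([round(s, 2) for s in scores])
print(round(explained, 2))
```

In this toy setting the two simulated groups fall on opposite sides of zero along the first component, which is the kind of separation that PCA plots in transcriptomic studies are used to reveal. The sign of the component is arbitrary, so only the separation is meaningful, not which group is positive.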
Table 2. Summary of supervised machine learning algorithms used in transcriptomic analysis.

| Algorithm | Category | What It Does | Strength | Limitations |
|---|---|---|---|---|
| LASSO | Linear model | Selects a small set of predictive genes by shrinking the contribution of less significant genes [119] | Avoids overfitting, more interpretable [120] | Assumes linear relationships; can underperform with correlated predictors [121] |
| SVM | Classifier | Separates classes by finding the optimal boundary [122] | Good for complex high-dimensional gene data with few samples [123] | Requires tuning |
| SVM-RFE | Classifier with feature elimination | Iteratively removes uninformative genes [124] | Good for complex, nonlinear, high-dimensional data [124] | Slow; does not account for correlated features [125,126] |
| RF | Decision tree ensemble | Combines many trees to improve accuracy and estimate gene importance [127] | Robust to noise and overfitting; more interpretable and precise [127,128] | Feature importance can be biased toward variables with more categories or more split points (e.g., continuous features) [129] |
| XGBoost | Gradient boosting (ensemble) | Sequentially builds decision trees to correct previous errors [130] | Handles missing values, fast and efficient [130] | Can overfit without tuning [128,131] |
| CatBoost | Gradient boosting (categorical) | Deals well with categorical variables, with strong generalization accuracy [128,132] | Performs well on mixed data types; minimal preprocessing of categorical features [133] | Different hyperparameters can significantly change speed/accuracy [133] |
| LightGBM | Gradient boosting | Uses histogram-based learning [134] | Efficient memory usage; fast training; supports large-scale data [134] | Performance may degrade on datasets with extremely high-cardinality categorical features without tuning [132] |
| AdaBoost | Boosted ensemble | Combines many models, correcting for mistakes made by earlier versions [135] | Fast, avoids overfitting, and handles nonlinear data well [135] | Sensitive to noise and outliers [136] |
| ExtraTrees | Randomized tree ensemble | Builds multiple decision trees with extra randomness to reduce overfitting [137] | Fast and avoids overfitting [137] | Less accurate and harder to interpret [137] |
| GRNBoost2 | Tree-based with network inference | Reconstructs gene regulatory networks [138] | Captures nonlinear regulatory relationships [138] | Requires large datasets; sensitive to noise [139,140] |
| scGPS | Classifier [139,140] with projection scoring | Trains classifiers on labeled cell subpopulations to infer trajectories and inter-sample similarity [141] | Good for comparing single-cell populations and predicting cell fates across datasets [8] | Performance may drop with novel or underrepresented cell types [141] |
| xCell | Signature-based deconvolution | Estimates relative enrichment of immune cells [142] | Robust to noise, requires no retraining [142] | Limited to predefined signatures [143] |
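
To make the LASSO entry in Table 2 more concrete, the following sketch shows how an L1 penalty can shrink the coefficients of uninformative genes to exactly zero. It is a minimal coordinate-descent implementation written for illustration, not the procedure used in any of the cited studies. The data are simulated (eight hypothetical genes, of which only the first two drive the outcome), the penalty `alpha = 0.5` is an arbitrary choice, and the model assumes roughly standardized features and no intercept.

```python
import random


def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw while all weights are zero
    for _ in range(sweeps):
        for j in range(p):
            column = [row[j] for row in X]
            # Correlation of gene j with the residual once its own contribution is added back.
            rho = sum(c * (r + weights[j] * c) for c, r in zip(column, residual)) / n
            scale = sum(c * c for c in column) / n
            new_weight = soft_threshold(rho, alpha) / scale
            # Keep the residual consistent with the updated coefficient.
            delta = new_weight - weights[j]
            residual = [r - delta * c for r, c in zip(residual, column)]
            weights[j] = new_weight
    return weights


rng = random.Random(42)  # fixed seed so the synthetic data are reproducible
n_samples, n_genes = 60, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
# Only genes 0 and 1 influence the simulated outcome; the rest are noise.
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected genes:", selected)
print("coefficients:", [round(w, 2) for w in weights])
```

The retained coefficients are smaller in magnitude than the simulated effects of 3 and -2, because the penalty biases every estimate toward zero. This is the trade-off behind the sparse, more interpretable gene sets that the table attributes to LASSO.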
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
