A Comprehensive Review of the Impact of Machine Learning and Omics on Rare Neurological Diseases

Background: Rare diseases, predominantly caused by genetic factors and often presenting neurological manifestations, are significantly underrepresented in research. This review addresses the urgent need for advanced research in rare neurological diseases (RNDs), which suffer from data scarcity and diagnostic challenges. The integration of machine learning (ML) and omics technologies can help bridge this gap, offering potential insights into the genetic and molecular complexities of these conditions. Methods: We employed a structured search strategy, using a combination of machine learning and omics-related keywords, alongside the names and synonyms of 1840 RNDs as identified by Orphanet. Our inclusion criteria were limited to English language articles that utilized specific ML algorithms in the analysis of omics data related to RNDs. We excluded reviews and animal studies, focusing solely on studies with the clear application of ML in omics data to ensure the relevance and specificity of our research corpus. Results: The structured search revealed the growing use of machine learning algorithms for the discovery of biomarkers and diagnosis of RNDs, with a primary focus on genomics and radiomics because genetic factors and imaging techniques play a crucial role in determining the severity of these diseases. With AI, we can improve diagnosis and mutation detection and develop personalized treatment plans. There are, however, several challenges, including small sample sizes, data heterogeneity, model interpretability, and the need for external validation studies. Conclusions: The sparse knowledge of valid biomarkers, disease pathogenesis, and treatments for rare diseases presents a significant challenge for RND research. The integration of omics and machine learning technologies, coupled with collaboration among stakeholders, is essential to develop personalized treatment plans and improve patient outcomes in this critical medical domain.

Most rare diseases (RDs) are genetically rooted [19], a fact that omics technologies can exploit to accelerate diagnosis and drug discovery. DNA and RNA sequencing advancements have led to various genomic analysis techniques, like whole-exome sequencing, whole-genome sequencing (WGS), and single-cell RNA sequencing, providing deep genomic insights [20,21]. These omics investigations unveil disease aspects previously obscured by traditional approaches. For example, WGS has identified pathogenic variants in rare epilepsies and the genetic causes of rare diseases [20]. Omics extends beyond genomic resolution, including proteomics, metabolomics, epigenomics, and lipidomics, which assess proteins, metabolites, DNA machinery, and lipids [22]. Radiomics, a novel omics field, involves high-throughput medical imaging assessments [23]. Artificial intelligence (AI) and machine learning algorithms analyze these diverse data, enabling reanalysis for research and healthcare solutions [24]. Machine learning, a key interest area, involves training algorithms on large datasets to make predictions on unseen data. The algorithms fall into three categories: supervised learning (learning from labeled data), unsupervised learning (finding patterns in input data without targets), and reinforcement learning (action-reward-based learning) [25]. AI can assist RD research and treatment, aiding in variant classification, biomarker identification, the analysis of gene interactions, and the understanding of protein and metabolite profiles [26]. It facilitates disease diagnosis and prognosis by integrating phenotype data with omics data, discovering new drug molecules, and managing patient registries and rare disease databases. This review assesses omics and AI in a combined approach to overcome the RD treatment challenges. Understanding RDs' molecular pathophysiology and drug development is crucial, especially for neurological RDs involving the nerves, muscles, and brain. AI enhances pharmaceutical development with automated processes, efficiency, and unconventional insight generation. The focus is on compiling ML applications exploring omics data in rare neurological disorders and raising the awareness of AI and omics in rare diseases. A list of some of the rare neurological disorders and a brief explanation of the algorithms and omics data described in this review are given separately in Table 1.

Difficulties in Disease Mechanism Investigation and Biomarker Discovery
One of the prime challenges in the rare disease diagnosis domain is the lack of understanding of the disease and the mechanisms that cause it. Since the molecular pathophysiological factors of rare diseases are unknown, clinicians find it difficult to link the symptoms between different organ systems and differentiate between disorders with overlapping symptoms. The lack of valid parameters and biomarkers, as well as the low frequency of occurrence of the disease, makes it difficult to derive statistically significant and clinically relevant parameters that can assist in diagnosis. However, next-generation sequencing (NGS) technologies such as whole-genome sequencing, whole-exome sequencing, and DNA methylation techniques are now being commonly utilized in the research and diagnosis of rare diseases. One of the clear advantages of NGS is the ability to interrogate multiple targets at the same time, making it possible to uncover the molecular heterogeneity between and within rare neurological diseases. The main challenges discussed in this section are summarized in Table 2.

Mutation Detection or Prediction
Detecting pathogenic variants in genomes is crucial for diagnosis and in guiding precision medicine. Deep intronic variants, often challenging to detect via whole-exome sequencing, play a role in multiple disorders [42]. Machine learning tools like SpliceAI and SpliceRover are revolutionizing this area of detection. SpliceAI, using a 32-layer deep convolutional neural network, predicts splice junctions from pre-mRNA, identifying cryptic splicing variants [43]. It has successfully identified de novo mutations in conditions like intellectual disability and autism spectrum disorder (ASD), with observed enrichment in these disorders. SpliceRover employs convolutional neural networks to identify splice sites, offering a more nuanced analysis compared to traditional probabilistic methods [27]. It detected a significant cryptic exon in Joubert syndrome [44]. Additionally, tools like the Variant Effect Scoring Tool (VEST) use algorithms like random forest to prioritize gene variants for diseases like Freeman-Sheldon syndrome and Miller syndrome, outperforming other tools in missense variant prioritization [28]. These advancements highlight the growing role of machine learning in understanding complex genetic variations and rare neurological conditions.
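Convolutional networks for splice-site prediction operate on a numeric representation of the nucleotide sequence rather than on the raw letters. The sketch below shows one common way to prepare such input, one-hot encoding, under the assumption that each base is mapped to a four-element 0/1 row in A, C, G, T order. It is a generic preprocessing illustration; the function name `one_hot_encode` is hypothetical and this is not the code of SpliceAI or SpliceRover.

```python
# Generic preprocessing sketch (assumption): not the SpliceAI or SpliceRover code.
BASES = "ACGT"  # column order of the encoding


def one_hot_encode(sequence):
    """Return one row of four 0/1 values per nucleotide (A, C, G, T order).

    Bases outside A/C/G/T, such as 'N', become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


if __name__ == "__main__":
    example = "GTAAGN"  # made-up sequence fragment for illustration
    for base, row in zip(example, one_hot_encode(example)):
        print(base, row)
```

The resulting list of rows has one entry per base, which is the shape a one-dimensional convolution would slide over.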

AI in Tumor Identification
The application of bioinformatics and AI in tumor studies, particularly for brain tumors classified by growth rate and recurrence, is advancing tumor diagnosis and treatment. Glioma tumors, originating from mutations in glial cells, are sub-classified as astrocytomas, oligodendrogliomas, or ependymomas and graded based on their aggressiveness [45]. Pediatric and adult brain tumors exhibit copy number alterations (CNAs), contributing to genomic instability and tumor progression [46][47][48]. CNV calling from sequencing data, particularly AluScan, is complex, but AluScanCNV has been developed for efficient CNV calling, distinguishing non-cancerous and cancerous tissues in glioma samples [29].
Molecular testing, crucial for the diagnosis of oligodendroglial tumors, requires the detection of IDH gene mutations and 1p/19q co-deletion [49]. A one-dimensional convolutional neural network analyzed CNVs from NGS data to detect 1p/19q co-deletion in 61 tumors, validated against 427 low-grade glial tumors from The Cancer Genome Atlas [49].
In PURA syndrome research, exome sequencing and AI algorithms identified a de novo mutation, c.697-699del p.Phe233del in the PURA gene, with structural analysis using AlphaFold and hybrid quantum mechanics-molecular mechanics (QM-MM) analyses [30,50]. This study marks a significant advancement in understanding the functional impact of mutations at an atomic level, laying the groundwork for future functional analyses.

Genotype-Phenotype Integration
Integrating genomic data with phenotype and clinical features enhances models that predict phenotypic traits and outcomes, revealing biomarkers and insights into the heritability of complex traits. PhenoApt, using ML-based graph embedding techniques, prioritizes genes for Mendelian disorder diagnoses by mapping data from HPO, OMIM, and Orphanet [31]. It assigns scores based on phenotype-gene vector representations, aiding in gene prioritization.
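To make the idea of scoring genes from vector representations concrete, the following minimal sketch ranks genes by the cosine similarity between a patient's phenotype vector and each gene's embedding. The use of cosine similarity, the three-dimensional vectors, and the placeholder gene names are all assumptions made for illustration; the actual scoring function of PhenoApt is not described here and may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(phenotype_vector, gene_vectors):
    """Return (gene, score) pairs sorted from most to least similar."""
    scores = {
        gene: cosine_similarity(phenotype_vector, vector)
        for gene, vector in gene_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical embeddings; real embeddings would be learned from HPO/OMIM/Orphanet graphs.
GENE_VECTORS = {
    "GENE_A": [0.9, 0.1, 0.0],
    "GENE_B": [0.1, 0.8, 0.3],
    "GENE_C": [0.0, 0.2, 0.9],
}
PATIENT_VECTOR = [0.8, 0.2, 0.1]  # hypothetical aggregated phenotype embedding

if __name__ == "__main__":
    for gene, score in rank_genes(PATIENT_VECTOR, GENE_VECTORS):
        print(f"{gene}\t{score:.3f}")
```

In this toy setting, the gene whose embedding points in nearly the same direction as the patient vector is ranked first.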
DOMINO, another tool, focuses on identifying dominant mutations in Mendelian disorders, a challenge due to frequent non-pathogenic heterozygous variants [32]. It uses linear discriminant analysis on genomic data, protein interactions, and structures, trained on 985 genes with known Mendelian inheritance patterns. In epilepsy and intellectual disability cases, DOMINO accurately identified known genes and predicted new candidates. An ML study on genotyping and clinical data from neurological disease patients developed a multinomial linear model that accurately identified 88% of disease samples, emphasizing the importance of age and cognition [33]. This analysis also found common SNPs across neurological diseases, linking MND to RBBP5 and TNF, and MG to oncogenes and brain-related genes. In phenylketonuria (PKU), the PPML machine learning framework predicts the PKU phenotype based on nucleotide mutations and amino acid changes [34]. Using a random forest classifier, it accurately classified PKU into classical PKU, mild PKU, and mild hyperphenylalaninemia categories, enhancing the genotype-to-phenotype linkage, which is crucial for treatment strategy and prognosis prediction.

Omics Data Integration for Disease Characterization
To develop effective disease treatments, understanding the affected molecules and their interactions is key, with metabolites playing a crucial role as they reflect the biochemical activity in cells. Liquid chromatography-mass spectrometry (LC-MS) is commonly used to globally measure metabolites, but identifying them can be challenging due to multiple metabolites matching a single peak [35]. Pirhaji et al. developed PIUMet, a network-based algorithm integrating protein and metabolite interactions to identify metabolites from LC-MS peaks. Utilizing ML, statistical analysis, and network optimization, PIUMet infers putative metabolites and dysregulated pathways. Applied to Huntington's disease (HD) data, it identified disrupted features like the sphingolipid subnetwork and steroid metabolism.
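The ambiguity that motivates tools such as PIUMet can be illustrated with a simple mass lookup: isomers share the same monoisotopic mass, so a single LC-MS peak matches several candidates. The sketch below assumes positive-mode [M+H]+ ions and a 10 ppm tolerance; the function name `match_peak`, the tolerance, and the four-entry candidate table are illustrative choices, and this is only the naive matching step, not the PIUMet network algorithm.

```python
PROTON_MASS = 1.007276  # daltons added to a neutral mass for an [M+H]+ ion

# Monoisotopic neutral masses in daltons; structural isomers share the same value.
CANDIDATES = {
    "leucine": 131.09463,
    "isoleucine": 131.09463,
    "glucose": 180.06339,
    "fructose": 180.06339,
}


def match_peak(observed_mz, candidates, tolerance_ppm=10.0):
    """Return the sorted names of candidates whose [M+H]+ m/z lies within the ppm tolerance."""
    matches = []
    for name, neutral_mass in candidates.items():
        expected_mz = neutral_mass + PROTON_MASS
        error_ppm = abs(observed_mz - expected_mz) / expected_mz * 1e6
        if error_ppm <= tolerance_ppm:
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    # One observed peak, two equally plausible identities.
    print(match_peak(132.1019, CANDIDATES))
```

Because mass alone cannot separate such candidates, additional evidence (for PIUMet, protein and metabolite interaction networks) is needed to decide which metabolite is the likely source of the peak.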
Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) share pathological, clinical, and genetic features, including C9orf72 repeat expansion [51]. Dickson et al. analyzed RNA-seq data from the frontal cortex tissue of FTLD and FTLD/MND patients to understand their clinical variability. Although the initial regression models did not yield significant genes post-adjustment, ML models like LASSO and random forest regression with leave-one-out cross-validation highlighted biologically relevant genes consistently associated with outcomes. Genes such as VEGFA, CDKL1, EEF2K, and SGSM3 were promising, with VEGFA linked to the disease onset age.
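Leave-one-out cross-validation is attractive for small cohorts because every sample is used once as the test case while the model is trained on all the others. The sketch below shows the procedure with a deliberately simple one-feature least-squares line standing in for LASSO or random forest regression; the helper names and the data values are made up for illustration and do not come from the cited study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # constant feature: fall back to predicting the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_errors(xs, ys):
    """Hold out each sample in turn, refit on the rest, return absolute prediction errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        errors.append(abs(prediction - ys[i]))
    return errors


if __name__ == "__main__":
    # Made-up values: a gene expression level and a continuous clinical outcome.
    expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    outcome = [52.0, 55.0, 61.0, 60.0, 66.0, 70.0]
    errors = leave_one_out_errors(expression, outcome)
    print("mean absolute error:", sum(errors) / len(errors))
```

With n samples the model is refit n times, which is affordable precisely in the small-sample regime typical of rare diseases.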

Disease Mechanisms and Research Models
ML and omics technologies are instrumental in understanding rare neurological diseases, revealing the connections between gene variants, phenotypes, and clinical features. These technologies have advanced the knowledge of disease pathogenesis and aided in developing experimental models. Trevino et al. used single-cell methods to map the gene-regulatory circuit in human corticogenesis, employing a deep learning model derived from BP-Net [36]. This model predicted genetic variants' impacts on epigenomic elements and highlighted ASD-related mutations. Wilscher et al. utilized self-organizing maps (SOMs) for detailed mapping from the transcriptomics and DNA methylation data of gliomas [37]. Their high-resolution molecular map revealed connections between gene expression, methylome changes, and the tumor microenvironment, offering insights into glioma subtypes and prognosis. Loeffler-Wirth et al. implemented SOMs to analyze the transcriptome and methylome in developing and aging brains [38]. Their maps showed gene expression and methylation changes over the lifespan, identifying gene sets impacting gliomas and providing potential biomarkers. Huang et al. created arcuate organoids (ARCOs) from human iPSCs to model hypothalamic arcuate nucleus development in neurodevelopmental disorders [39]. They found abnormal differentiation and transcriptomic dysregulation in ARCOs from Prader-Willi syndrome patients, demonstrating their value in studying early human arcuate development in these diseases. For Huntington's disease, gene expression profiles from the caudate nuclei of asymptomatic HD+ individuals were compared with those of symptomatic HD individuals and healthy controls [40]. A random forest classifier identified genes potentially involved in the early onset of the disease.
In neuropsychiatric research, the SH-SY5Y neuroblastoma cell line is commonly used. The CoNTeXT framework, an ML algorithm, estimates the developmental stage and regional identity of transcriptomic signatures [41]. This study found significant gene overlaps in ASD, Fragile X Syndrome, intellectual disability, and schizophrenia, highlighting pathways specific to each disorder during early neurodevelopment.

Diagnosis
The landscape of the diagnosis of rare neurological diseases is evolving rapidly, transitioning from traditional heuristic approaches to more advanced and precise methodologies. Traditional methods, which relied heavily on clinical experience and medical literature, often resulted in a long, uncertain journey towards diagnosis for many patients. In contrast, recent advancements in genomics and data analysis are providing new pathways to understand these complex conditions.
Gene panels, microarrays, and exome sequencing have become pivotal in uncovering the molecular basis of many previously undiagnosed and rare diseases. These techniques, when coupled with long-read technology, transcriptomics, metabolomics, proteomics, and methylome data, are enhancing the precision and speed of diagnosis. The integration of artificial intelligence (AI) with these methods is pushing the boundaries further, allowing for more comprehensive and nuanced analysis. For instance, Choi et al. [52] conducted a systematic evaluation of machine learning algorithms and feature selection methods to classify neuromuscular diseases with remarkable accuracy. Their study utilized support vector machines (SVM) and directed acyclic graphs, achieving a reported 100% success rate. This breakthrough is significant, as it demonstrates the potential of AI in identifying diseases with complex genetic backgrounds. Similarly, Caputo et al. [53] developed a machine learning-driven protocol for the classification of Facioscapulohumeral Muscular Dystrophy (FSHD) based on DNA methylation patterns, showcasing the ability of these technologies to discern subtle genetic variations. Their methodology, which incorporated various machine learning models, was able to differentiate FSHD patients from controls with high precision. In the realm of idiopathic inflammatory myopathies (IIM), a study analyzed the plasma and urine metabolomes of patients, employing machine learning algorithms to identify specific biomarkers [53]. This approach is essential in diseases like IIM, where subtypes exhibit overlapping symptoms but require distinct treatments.
Moreover, neural networks are proving invaluable in distinguishing conditions such as sporadic Creutzfeldt-Jakob disease (sCJD) from healthy states [54]. By analyzing differentially methylated CpG loci, these models can effectively differentiate sCJD patients with notable accuracy. Advancements in DNA methylation studies are also facilitating the better understanding and diagnosis of conditions like malformations of cortical development (MCDs). Jabari et al. [55] applied machine learning and deep learning to decipher the DNA methylation pattern in MCD, achieving high accuracy and predictive value.
Machine learning models have also been instrumental in pre-diagnostic risk assessments for diseases like amyotrophic lateral sclerosis (ALS) [56,57]. Although challenges remain in accurately differentiating ALS patients from healthy individuals, these studies have identified metabolic dysregulation that occurs years before disease onset. The novel computational method CTD [58] exemplifies the integration of untargeted metabolomics with genomic data, offering a more refined approach to diagnosing inborn errors of metabolism (IEMs). This method connects metabolite perturbations to disease-specific networks, improving clinical decision-making.
In oncology, AI models like those developed by Zhao et al. [59] and Capper et al. [60] are differentiating between types of brain tumors based on multi-omics data and DNA methylation profiles, demonstrating the potential of AI in precision medicine. The field of radiomics is another area where AI is making significant strides. By extracting quantitative features from radiographic images, machine learning algorithms are enhancing the diagnosis of rare neurological diseases and tumors [61][62][63]. These models not only compete with but, in some cases, outperform human experts in diagnosing conditions like high-grade gliomas [64,65].
In summary, the integration of AI with genomic and omics technologies is revolutionizing the diagnosis and understanding of rare neurological diseases. By enabling more precise, efficient, and early diagnostics, these advancements hold great promise for patients who have long struggled with undiagnosed conditions. However, this evolving landscape also presents new challenges and opportunities for future research and clinical application.

Prognosis
Early diagnosis and optimal care for rare diseases are pivotal, especially for underserved populations. Advances in medical bioinformatics, artificial intelligence (AI), and machine learning (ML) have enabled the identification of disease patterns, the prediction of disease progression, and the assessment of treatment responses. The random forest algorithm, applied to multi-omics data, identified 111 genes linked to survival outcomes in astrocytoma and oligodendroglioma, serving as diagnostic biomarkers [65]. Neural networks have been crucial in identifying prognosis-related genes in neuroblastoma, a common childhood extracranial solid tumor, with 84% sensitivity and 90% specificity for poor-outcome patients [66]. Deep neural networks outperformed support vector machines and random forest in predicting neuroblastoma outcomes from omics data [67].
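Sensitivity and specificity, as quoted above for neuroblastoma outcome prediction, are derived from the counts of a confusion matrix: sensitivity is the fraction of true poor-outcome patients that the model flags, and specificity is the fraction of good-outcome patients it correctly leaves unflagged. The following minimal sketch computes both; the function name and the label vectors are made up for illustration and do not reproduce the data of the cited study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels: 1 = poor outcome, 0 = good outcome.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(truth, predicted)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting both values matters in rare-disease cohorts, where class imbalance can make overall accuracy look high even when most affected patients are missed.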
Linear support vector machines and random forest, trained on various omics data, have been used for predictive classification in neuroblastoma [68]. Integrative network fusion improved prognosis prediction by integrating microarray and aCGH datasets. DNA methylation alterations in neuroblastoma, analyzed using random forest and XGBoost, highlighted distinct methylation patterns as indicators of disease progression [69,70]. Network-based methods have been evaluated for the integration of multi-omics data to predict clinical outcomes in neuroblastoma, achieving 65-80% accuracy [71].
ML algorithms have also been applied to assess genotyping data from medulloblastoma patients, identifying genetic predictors of intellectual functioning post-treatment [72]. In medulloblastoma, logistic regression used mRNA expression and DNA methylation signatures to guide prognosis. A novel framework by Mihaylov et al. integrated gene expression and clinical data from neuroblastoma and breast cancer patients to predict the survival time [73]. Bratulic et al. explored metabolomic profiles for early cancer detection, finding that glycosaminoglycan profiles could be used to detect cancer types with good sensitivity [74]. In ALS, Sparse Canonical Correlation Analysis explored the role of genes in cognitive dysfunction using whole-genome sequencing [75,76]. For epilepsy, random forest and XGBoost identified co-expressed genes linked to the cardiac event risk [77]. The machine learning-driven metabolomic profiling of aneurysmal subarachnoid hemorrhage patients uncovered biomarkers for functional outcomes [78].
Radiomics studies, enhanced by ML, have improved the glioblastoma biopsy guidance and differentiated brain metastases from glioblastoma [79,80]. Convolutional neural networks have been used to detect fatty infiltration in neuromuscular diseases, with HRNet being the most effective [81]. Finally, ML regression models have been employed to predict the muscle fat fraction in FSHD, aiding in disease progression assessment [82]. These advancements in AI and ML are transforming the landscape of diagnosis, prognosis, and treatment in rare diseases.

Therapeutic Approach
Precision medicine, particularly in the context of rare diseases, is transforming healthcare through a holistic approach that includes diagnosis, treatment, and follow-up tailored to an individual's genetic makeup. Artificial intelligence (AI) and machine learning (ML) are pivotal in this transformation, analyzing diverse data types like clinical features, multi-omics data, and medical images and incorporating phenotype, pharmacogenomic, and pharmacokinetic factors.
Gene therapy, especially CRISPR-based tools, is revolutionizing the treatment of rare neurological diseases. Shen et al. developed inDelphi, an ML algorithm, to predict Cas9-induced insertions and deletions with high accuracy, aiding template-free DNA editing for diseases like Hermansky-Pudlak syndrome and Menkes disease [83]. In Duchenne muscular dystrophy (DMD), Nishida et al. and Malueka et al. explored exon-skipping therapies, using AI to identify cryptic exons and classify dystrophin gene exons for potential therapeutic targets [84].
For adamantinomatous craniopharyngioma (ACP), Lin et al. utilized random forest and LASSO regression to identify diagnostic markers S100A2 and SDC1 from gene expression profiles, pinpointing potential drug targets like Pentostatin and Wortmannin [85]. In medulloblastoma (MB), an ML model was used to discover gene expression-based stemness indices and DNA methylation-based stemness indices, leading to the identification of 96 compounds targeting MB pathways [86].
Gilard et al.'s study on glioblastoma used random forest classifiers to differentiate between diseased and control samples based on metabolomic profiles, highlighting phosphatidylcholine (PC aa C36:6) as a key biomarker [87]. De Jong et al. applied various ML models in precision medicine for rare epilepsy conditions, with the XGBoost trees classifier demonstrating notable effectiveness in predicting the drug response [88]. DNA methylation profiling in temporal lobe epilepsy (TLE) patients identified potential biomarkers for the drug response, utilizing ML for accurate prediction [88].
Dahlin et al.'s research on the gut microbiota in children with drug-resistant epilepsy revealed the potential benefits of a ketogenic diet, employing ML to analyze the gut microbiome's role in epilepsy [89]. Kurkiewicz et al. approached myotonic dystrophy type 1 (DM1) as a spectrum disorder, using ML models to predict the modal allele length of the DM1 CTG expansion, a crucial factor in disease progression and the treatment response [90].
In summary, AI and ML are pivotal in advancing precision medicine, especially in rare diseases, by enabling personalized treatment strategies based on genetic and molecular profiles.

Methods
To identify scientific articles that described the application of artificial intelligence to omics data about rare neurological diseases, the Medline database and PubMed were used, with additional searches in Scopus and Web of Science to ensure the thorough coverage of the biomedical literature. A triple combination of keywords related to machine learning ("machine learning", "artificial intelligence"), omics ("genomics", "proteomics", "multiomics"), and rare neurological diseases ("rare neurological disease", "rare neurological disorder") was used to create the search string. Additionally, the names and synonyms for 1840 specific rare neurological diseases were searched in combination with the general terms/keywords of machine learning and omics. These specific rare neurological diseases were identified with the help of Orphanet [91]. Orphanet is a comprehensive database that provides information on rare diseases and orphan drugs to improve the diagnosis, care, and treatment of patients with RDs. It addresses the scarcity and fragmentation of knowledge on RDs by providing multiple levels of classification and nomenclature [92,93]. Only diseases with known point prevalence ("1-5/10,000", "1-9/100,000", "1-9/1,000,000", "1/1,000,000") were included in the search. For most diseases, Orphanet provides PubMed search strings, which were used to construct the search term (for example, "Aneurysm* (subarachnoid hemorrhage [ti] OR subarachnoid hemorrhage[ti] OR subarachnoid hemorrhage[mh]) OR aneurysmal SAH[tw]" for the disorder acquired aneurysmal subarachnoid hemorrhage). For diseases where no search terms were available from Orphanet, the disorder name was used instead. The inclusion/exclusion criteria were as follows: (1) manuscripts written in English and that included a title and abstract were selected; (2) Orphanet classification was used and only rare neurological diseases with Orpha codes were included for further study; (3) manuscripts involving the use of at least one concrete AI/ML algorithm to handle/explore omics data related to rare neurological diseases were included; (4) reviews and studies on animal models were excluded from the results. The literature search covered articles published from January 2000 to December 2023, allowing us to capture the evolution of AI and omics technologies in the context of rare neurological diseases over the past two decades. We aimed to minimize potential bias by conducting a comprehensive search across multiple databases, including studies with varying outcomes and methodologies, and using systematic and transparent selection criteria. The list of rare neurological disorders, as well as the details of the algorithm and omics data described in this review, can be found in the Supplementary Materials.
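The triple keyword combination described above can be sketched as a Boolean query in which the terms within each group are joined with OR and the three groups are joined with AND. The snippet below is only an illustration of that structure, assuming quoted phrases and plain AND/OR operators; the helper names `or_group` and `build_query` are hypothetical, and the exact syntax and field tags submitted to each database are not given in the text.

```python
def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(ml_terms, omics_terms, disease_terms):
    """Combine the three keyword groups with AND."""
    groups = (ml_terms, omics_terms, disease_terms)
    return " AND ".join(or_group(group) for group in groups)


# Keyword groups as listed in the Methods text.
ML_TERMS = ["machine learning", "artificial intelligence"]
OMICS_TERMS = ["genomics", "proteomics", "multiomics"]
DISEASE_TERMS = ["rare neurological disease", "rare neurological disorder"]

if __name__ == "__main__":
    print(build_query(ML_TERMS, OMICS_TERMS, DISEASE_TERMS))
```

For the disease-specific searches, the third group would be swapped for the Orphanet-provided search string or, where none exists, the disorder name.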

Discussion and Conclusions
In this review, the scientific literature on ML and omics methods was assessed to explore which artificial intelligence techniques are being utilized to advance the understanding of rare neurological diseases (RNDs) as well as how they are being applied. The most commonly used algorithms were random forest, support vector machines, and artificial neural networks. The most common applications were with regard to biomarker discovery and the diagnosis of rare neurological diseases based on omics data. The majority of the studies gathered in the review were found to be focused on genomics and radiomics. This was expected given that genetic factors are the leading cause of rare diseases and magnetic resonance imaging is the most frequently utilized clinical tool in neuroimaging. The integration of various omics technologies to enhance our understanding of RNDs is illustrated in Figure 1. The random forest algorithm is advantageous as it uses an ensemble of decision trees to lower the variance and reduce overfitting. It is also robust to outliers and requires no feature scaling. Support vector machines are useful in cases where the number of features exceeds the number of samples, and the kernel functions associated with SVM can be customized to enhance classification. Artificial neural networks are able to learn and model non-linear complex relationships and they capture new features in the hidden layers that can be instrumental to understanding the molecular details of diseases. Indeed, images derived from medical imaging techniques can be best standardized and processed by deep neural networks.
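The variance-lowering effect of a tree ensemble can be made explicit with a standard textbook argument. As a simplifying assumption (the symbols are introduced here and do not appear in the review), let the predictions $T_1, \dots, T_B$ of $B$ trees be identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$:

```latex
\begin{align}
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
  &= \frac{1}{B^{2}}\left[\sum_{b=1}^{B}\operatorname{Var}(T_b)
     + \sum_{b \neq b'} \operatorname{Cov}(T_b, T_{b'})\right] \\
  &= \frac{1}{B^{2}}\left[B\sigma^{2} + B(B-1)\rho\sigma^{2}\right] \\
  &= \rho\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}.
\end{align}
```

Under these assumptions, the second term shrinks as more trees are added, while the first term is limited by how correlated the trees are, which is why random forests decorrelate trees through bootstrap sampling and random feature selection.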
With the rise of "big data", there is an increasing need to automate tasks that currently require human intervention. In the field of biomedicine, artificial intelligence (AI) techniques have been developed to analyze a wide range of data, from individual omics data and clinical phenotypes to large-scale health databases and multiparametric studies involving large cohorts of patients. Over the past 20 years, machine learning has become a well-established and highly useful discipline. Although there are several learning paradigms available today, machine learning has been successful in various applications, including life sciences and medical research. However, the clinical use of machine learning methods is still relatively rare. AI algorithms have the potential to enhance the diagnosis and understanding of rare neurological diseases by performing mutation detection, prediction, classification, and the identification of disease biomarkers. This can lead to an increase in the number of diagnosed cases and uncover new disease mechanisms and therapeutic targets. However, there is still a need to improve the rate of research and development for rare neurological diseases. While AI has made significant progress in diagnosis, progress in therapy development has been modest. It is known that machine learning plays a significant role in improving treatment by accelerating drug development, predicting the drug's efficacy, optimizing the dosages, and repurposing existing drugs for other diseases [94,95]. With the ever-evolving AI frameworks, it can be assumed that the promising results obtained so far will soon change the current scenario in the treatment of rare neurological diseases.
To diagnose and characterize rare neurological disease patients, AI-based multi-omics integrative approaches are being adopted as genomic data alone are often insufficient. Additionally, novel applications of AI are being explored to develop new research models for RNDs [96]. However, there is still room for improvement in AI-mediated diagnosis, particularly in designing and training models for rare diseases. This is due to various confounding and detrimental factors, such as small patient cohorts and differences in patient ethnicity and gender. The most significant limitation in building predictive models for RNDs is the data collection process [97]. Applying machine learning models to unstructured, poorly standardized, and poorly quality-controlled data can adversely affect the model's performance [98]. This is because noise, incompleteness, and sparsity can lead to model overfitting, resulting in high prediction accuracy on training data but low prediction accuracy on new evaluation data.
Regarding the limitation of small sample sizes, it must be noted that deep learning models generally require thousands of samples to generalize over the data and achieve robust solutions, while shallow models may still need at least a few hundred samples to build reasonably high-performing models. However, there are several ways to deal with this issue of small sample sizes in machine learning for RNDs. One method is to learn from data from other disorders that are related to the disease being studied or at least share overlapping features. If the heterogeneity is accounted for, one can take into account data regarding the same disease but derived from diverse sources, such as 'multi-omics' data; medical imaging; clinical features; patient registries; open-source databases on genes, proteins, mutations, and drug interactions; and phenotypic data. Data augmentation, that is, enhancing the existing data with simulated samples, can be considered as well [92,99]. Transfer learning is another option, wherein one can use the knowledge learned by other similar models and fine-tune it to suit the studied domain.
Deep learning paradigms have been successful in big data scenarios with large sample sizes, but they often produce models that are difficult to interpret. To enable clinicians to understand the meaning of the classification results, it is necessary to use less complex but explainable models [43,93]. Interpretations of the data derived from explainable models must be uniform across multiple learning algorithms and within the domain or disease being studied. This is possible only when feature extraction and weighting is a stable process and captures the biologically relevant data patterns. Such efforts may strengthen the clinical decision in the small sample regime of RNDs. Additionally, features must be assessed from a biological and statistical standpoint, and robust error analysis must be conducted. Sometimes, routine diagnostic techniques may be insufficient in providing a feature set that can be analyzed by AI to generate results relevant to disease pathogenesis; this may warrant slight modifications in the diagnostic tools. In one report, Dionnet et al. developed a 'minigene' functional assay to identify aberrant splicing in CAPN3, the gene responsible for limb girdle muscular dystrophy [22]. Whole-exome sequencing followed by analysis with AI tools failed to predict the splicing impact for the majority of the deep exonic variants. However, a change in the functional assay that specifically targeted the CAPN3 gene helped to identify 24 variants with AI techniques, of which seven were clinically important. Although AI predictions can help to solve medical challenges in RNDs, all results must be experimentally validated to confirm their biomedical relevance.
One issue that needs to be addressed is the lack of external validation studies for AI models in clinical practice. These studies are crucial in assessing the generalizability and reliability of the algorithm and determining its potential use in clinical settings. However, only a few studies have conducted this type of validation, partly due to the difficulty in obtaining large and diverse datasets and the lack of standardized methods for data collection and analysis. Without adequate validation, there 
is a risk that the algorithms will produce unreliable or inaccurate results when applied to new datasets or patient cohorts.To solve this problem, collaboration among researchers, clinicians, and data scientists is necessary to develop standardized methods and share data and algorithms to facilitate external validation studies.Furthermore, it is important to note that AI-based applications must be tailored to the biomedical issue.Biomedical data and the associated challenges are complex, and numerous AI-based algorithms and methods are being improved.Technical limitations and data management and protection must also be carefully considered when designing an AI approach in the medical context.Artificial intelligence and machine learning models show great promise in the identification, diagnosis, treatment, and follow-up of rare neurological diseases.With vast amounts of heterogeneous data now available, ML algorithms can identify patterns and rapidly analyze such data, which would otherwise be incomprehensible to human analysts.While omics-based classifiers assist in the diagnosis of RNDs and help to distinguish between disease mimics, predictive modeling techniques can help to monitor disease progression, thereby allowing for earlier interventions and better treatment planning.From a precision medicine perspective, by identifying biomarkers associated with a particular rare disease, AI algorithms can help to develop personalized treatment plans, helping to improve patient outcomes and reduce the risk of side effects.Rare neurological diseases pose specific challenges such as a limited understanding of the molecular pathophysiology of the disease, small patient groups, and a big data regime-specifically, 'omics' data.AI models should be designed to overcome these challenges and need to be validated through clinical trials and real-world evidence.The use of AI and omics data in rare disease research raises significant ethical and privacy concerns, including the 
challenges of obtaining valid consent, protecting confidentiality, and navigating privacy, data protection, and copyright issues [100][101][102].Patients and caregivers generally support the use of AI in healthcare research, highlighting the need for transparency and disclosure [103].Privacy laws like the GDPR in the EU and HIPAA in the US are crucial for patient data protection, yet their application can pose threats to the progress of rare disease research [103,104].The tension between the potential benefits and risks of AI in healthcare, including privacy concerns, has been underscored [105].Ethical frameworks have been suggested to address the use and sharing of clinical data for AI applications, advocating for data stewards and the protection of patient privacy [106].However, the need for further research into these ethical implications, especially in lowand middle-income countries, is paramount [107].Regulations and governance approaches need refinement to tackle the ethical challenges posed by AI in rare disease research effectively [104,107].The issue of equity is also pivotal, with an emphasis on ensuring that AI and omics advancements benefit all populations and do not exacerbate health disparities [107].The integration of privacy, trust, accountability, responsibility, and bias into the research framework is essential to navigate the complex landscape of AI and omics data in rare disease research [104,107,108].
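The overfitting risk described earlier in this discussion, namely high accuracy on training data but low accuracy on new data when samples are few and features are many, can be illustrated with a minimal sketch. The example below is an assumption-laden toy rather than a method taken from any reviewed study: the cohort is pure random noise (40 hypothetical patients, 500 hypothetical features, randomly assigned labels), the classifier is a simple one-nearest-neighbour rule chosen only for brevity, and all function names are illustrative.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_X, train_y, query):
    """Return the label of the training sample closest to `query`."""
    nearest = min(range(len(train_X)),
                  key=lambda i: squared_distance(train_X[i], query))
    return train_y[nearest]


def resubstitution_accuracy(X, y):
    """Accuracy when the model is scored on the samples it was fitted on."""
    hits = sum(predict_1nn(X, y, X[i]) == y[i] for i in range(len(X)))
    return hits / len(X)


def leave_one_out_accuracy(X, y):
    """Accuracy when each sample is predicted from all the other samples."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += predict_1nn(rest_X, rest_y, X[i]) == y[i]
    return hits / len(X)


def make_noise_cohort(n_samples, n_features, seed):
    """Simulate a cohort whose features carry no information about the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced labels
    rng.shuffle(y)                         # assigned at random
    return X, y


# Hypothetical small-sample, high-dimensional setting (assumed sizes).
X, y = make_noise_cohort(n_samples=40, n_features=500, seed=0)
train_acc = resubstitution_accuracy(X, y)
loo_acc = leave_one_out_accuracy(X, y)
print(f"training accuracy:      {train_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

Because every sample is its own nearest neighbour, the training accuracy is perfect even though the features contain no signal, whereas the leave-one-out estimate is expected to fall near chance level. The same gap is what external validation on an independent cohort is meant to expose.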
A major opportunity for future work lies in the emerging role of Large Language Models (LLMs) and knowledge bases in enhancing omics and machine learning research in RNDs. A key objective of future research on rare neurological disorders should be to leverage the full potential of LLMs and knowledge bases by strategically integrating these two tools into omics and machine learning research. Realizing the transformative impact of these technologies will require the development of robust frameworks for their ethical and effective application.
Finally, artificial intelligence techniques strongly rooted in clinical understanding, the appropriate ethical principles, and sound computational frameworks can help to address the knowledge gap in rare neurological diseases and benefit patients and their families.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biomedinformatics4020073/s1, Table S1: The algorithm and omics data for rare neurological disorders reviewed in the article. A detailed description of the algorithm and omics data for rare neurological disorders is attached.
Funding: This research did not receive any specific grants from funding agencies in the public, commercial, or not-for-profit sectors.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: The author has provided informed consent for the publication of identifiable details within this manuscript. This consent encompasses publication across various media, including print and digital formats, ensuring the dissemination of the material in the public domain.
Data Availability Statement: Not applicable.

Figure 1. Omics technology for rare neurological disease (RND) research. The figure was drawn using BioRender.com.

Table 1. Important rare neurological disorders with algorithms and omics data.

Table 2. Summary of the main challenges discussed in Section 2.