1. Introduction
Rare pediatric neurological diseases (RPNDs) represent one of the most complex challenges in modern medicine. More than 7000 rare diseases have been registered; approximately 80% of them have a genetic basis, and many first manifest in childhood, including numerous neurological conditions. Although each individual disease is rare (approximately 1 in 10,000–40,000 newborns), rare diseases collectively affect 6–8% of the population, corresponding to over 300 million people worldwide. These figures are supported by international organizations such as EURORDIS (https://www.eurordis.org, accessed on 24 March 2025) and Orphanet (https://www.orpha.net, accessed on 30 March 2025) [1].
RPNDs are characterized by high clinical complexity, heterogeneous presentation, frequently unclear etiology, and a progressive course, leading to severe disability, reduced quality of life, and increased mortality in children. Traditional diagnostic pathways are often prolonged and involve numerous examinations, consultations, and repeated testing. According to Orphanet, approximately 70% of rare diseases begin in childhood, and the National Organization for Rare Disorders (NORD) reports that the diagnostic odyssey may last from 3 to 7 years (https://rarediseases.org/rare-disease-information/what-are-rare-diseases/, accessed on 2 April 2025). A study published in Genetics in Medicine showed that more than 60% of patients with rare genetic diseases underwent over five standard diagnostic procedures before receiving an accurate diagnosis, which significantly increases the financial burden on the healthcare system and places substantial emotional stress on families [2,3]. These data highlight the need for new methodological approaches capable of reducing diagnostic time and resource expenditure.
In recent years, significant progress in deep phenotyping and genomics has opened new diagnostic opportunities. Deep phenotyping enables the identification of subtle clinical features that may be overlooked during standard clinical assessment [4]. Robinson (2012) emphasized that detailed phenotypic descriptions are a cornerstone of precision medicine [5]. When deep phenotyping is integrated with genomic technologies such as exome sequencing (ES) or genome sequencing (GS), phenotypic manifestations can be correlated with specific genetic variants, substantially increasing diagnostic accuracy [6].
In many regions, access to next-generation sequencing (NGS) technologies remains limited [7]. Where the use of costly sequencing platforms is constrained, deep phenotyping can serve as an effective pre-laboratory step for selecting patients for genetic testing. By narrowing the range of suspected genetic conditions on the basis of clinical presentation, this approach supports more rational resource utilization and reduces the overall cost of molecular genetic diagnostics.
In many low- and middle-income settings, including Kazakhstan, children with early motor impairments receive a diagnosis of cerebral palsy (CP) based solely on clinical neurological examination, even when neuroimaging (e.g., MRI) shows no focal lesions or structural abnormalities to support CP. This blanket application of the CP label overlooks monogenic syndromes that phenotypically mimic non-progressive motor disorders, so-called “CP-phenocopies”, for which genetic diagnosis is essential to enable targeted therapies, refine prognostic expectations, and provide accurate genetic counseling. Incorporating systematic evaluation for CP-phenocopies within the deep phenotyping workflow allows for more precise patient stratification for exome or genome sequencing, thereby enhancing diagnostic yield and promoting equitable access to precision diagnostics in resource-limited environments.
Integrating deep phenotyping with genomics also supports sustainable healthcare development. Accurate and early diagnosis optimizes the use of medical resources, reduces the need for additional testing, and improves equitable access to advanced technologies, which is particularly relevant in resource-limited settings. The integrated approach facilitates the timely initiation of targeted treatment, improves clinical outcomes, and alleviates both the emotional and the financial burden on families. It aligns with the United Nations Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-being), SDG 9 (Industry, Innovation and Infrastructure), and SDG 10 (Reduced Inequalities) (https://sdgs.un.org/goals, accessed on 10 April 2025) [8].
The objective of this study was to develop and implement an integrated diagnostic approach combining deep phenotyping using the Human Phenotype Ontology (HPO), ES with laboratory interpretation following American College of Medical Genetics and Genomics (ACMG) guidelines, and reverse phenotyping based on OMIM data [9]. The approach aims to shorten the time to accurate diagnosis, optimize resource utilization, and ensure equitable access to modern diagnostic methods for RPNDs. We expected that combining deep and reverse phenotyping with ES would increase the proportion of genetically confirmed diagnoses and reduce the duration and number of diagnostic procedures.
2. Materials and Methods
2.1. Study Design
This study employed a prospective observational design with retrospective collection of pre-diagnostic parameters. Patients were identified and deeply phenotyped according to a standardized prospective protocol, whereas key clinical and procedural measures, such as time to diagnosis, number of diagnostic interventions prior to ES, and turnaround time, were collected retrospectively from medical records. This design allowed the evaluation of real-world diagnostic trajectories in children with suspected rare neurological diseases and, as a pilot, the assessment of a diagnostic algorithm based on deep phenotyping and ES, with a focus on reducing the time to diagnosis and minimizing the diagnostic burden. Such a design is appropriate for studying rare diseases because it allows an integrated diagnostic approach to be implemented with rational use of time and resources [10].
Patients were referred from 11 healthcare institutions in South Kazakhstan between 1 July and 20 December 2023. Key stages of the study—deep phenotyping and biological sample collection—were conducted at the Clinical and Diagnostic Center of the Khoja Akhmet Yassawi International Kazakh-Turkish University.
The study was conducted in accordance with the Declaration of Helsinki. Approval was obtained from the Local Ethics Committee of Khoja Akhmet Yassawi International Kazakh-Turkish University (protocol №16, 8 June 2023; project identification code: №16/2023). Written informed consent was obtained from the parents or legal guardians of all participants.
2.2. Preliminary Phase: Physician Training
Before patient recruitment, individual and group training sessions were conducted for outpatient physicians on recognizing rare neurological diseases with suspected genetic etiology. The training covered key phenotypic features of RPNDs and aimed to develop physicians’ skills in the early identification of patients who may benefit from deep phenotyping and genetic testing. After the training, pediatric neurologists and general practitioners referred patients who met the established criteria, with the aim of ensuring high-quality patient selection for further investigation.
2.3. Study Population
Initially, 250 children under the age of 18 with suspected RPNDs were considered for inclusion. The final study sample was determined based on strict inclusion and exclusion criteria.
Inclusion Criteria: complex neurological phenotype involving a combination of syndromes (e.g., epilepsy, cognitive impairment, movement disorders, neurodegeneration, psychiatric symptoms, peripheral neuropathy); clinical suspicion of hereditary etiology; progressive or relapsing disease course of unknown etiology; lack of a definitive diagnosis following clinical and instrumental investigations; CP phenocopies (children initially diagnosed with cerebral palsy on clinical grounds, with normal neuroimaging but an atypical or progressive clinical course); developmental and epileptic encephalopathy phenotype (early-onset, drug-resistant seizures with significant global developmental impairment in the absence of acquired causes).
Exclusion Criteria: established non-genetic causes (infectious or toxic CNS injury and vascular brain lesions, excluded by thorough clinical history review, TORCH serology, toxicology screening, and neuroimaging for infarcts or hemorrhages); neuromuscular diseases confirmed by electromyography, muscle biopsy, or molecular genetic testing; specific nosologies (e.g., isolated epilepsy, confirmed chromosomal abnormalities); significant structural brain abnormalities detected by MRI or CT that do not require genetic confirmation.
After these criteria were applied, the final study sample comprised 81 patients. Given the epidemiological rarity of the conditions under investigation and the stringent inclusion and exclusion criteria, we considered this sample size adequate for a pilot study assessing the feasibility and clinical utility of an integrated diagnostic approach in children with suspected rare neurological diseases.
2.4. Deep Phenotyping
Each patient underwent a comprehensive clinical assessment, including detailed history-taking, neurological examination, and review of instrumental findings [4]. The goal of this phase was to standardize and systematize clinical data for subsequent correlation with genetic findings. Deep phenotyping was performed using the HPO, allowing structured and uniform documentation of clinical manifestations.
Stages of Deep Phenotyping:
Clinical Examination and History Taking: Each patient underwent a detailed physical and neurological examination, assessing both central and peripheral nervous systems. History taking focused on identifying neurological and somatic symptoms, their onset, progression, and dynamics.
HPO-Based Coding of Clinical Features: Identified clinical manifestations were converted into standardized HPO terms, and each symptom was recorded with the appropriate HPO code, taking its severity and specificity into account. Special attention was paid to the term hierarchy: more general (ancestor) terms define broad clinical categories, while more specific (descendant) terms describe particular manifestations within those categories. For example, if a patient had seizures of unspecified type, the general term “Seizure” (HP:0001250) was used; if the seizures were generalized clonic seizures, the more specific descendant term “Generalized clonic seizure” (HP:0011169) provided further detail. Similarly, severe muscular hypotonia was recorded as “Severe muscular hypotonia” (HP:0006829).
Phenotypic Profile Formation: Based on the coded HPO terms, an individual phenotypic profile was created for each patient. This profile served as an integrated clinical map reflecting all detected symptoms and their characteristics. For instance, a patient presenting with generalized clonic seizures and severe muscular hypotonia would have a profile including HP:0011169 and HP:0006829. This coding ensured a standardized and precise clinical description, which is critical for subsequent genetic interpretation.
Phenotype Verification: At this stage, the phenotypic profiles were analyzed to eliminate redundant or non-informative data. Each phenotypic feature was assessed for clinical relevance and consistency with known disease patterns. Results were discussed during interdisciplinary case reviews. All verified data were uploaded to the 3billion laboratory portal to maintain accuracy and support the precise interpretation of genetic findings.
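The hierarchy rule described above (record the most specific applicable term and drop the ancestors it already implies) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used by the authors or the laboratory: the `ANCESTORS` map is a hand-written, deliberately incomplete stand-in for the full HPO graph containing only the terms cited in this section, and `remove_redundant_ancestors` is a hypothetical helper name.

```python
# Hand-written, illustrative ancestor map using only the terms cited above.
# This is NOT the full HPO graph; real use would load the complete ontology release.
ANCESTORS: dict[str, set[str]] = {
    "HP:0001250": set(),           # Seizure
    "HP:0011169": {"HP:0001250"},  # Generalized clonic seizure is a kind of Seizure
    "HP:0006829": set(),           # Severe muscular hypotonia (ancestors omitted here)
}


def remove_redundant_ancestors(profile: set[str], ancestors: dict[str, set[str]]) -> set[str]:
    """Return the profile without terms already implied by a more specific term in it."""
    implied: set[str] = set()
    for term in profile:
        implied |= ancestors.get(term, set())
    return profile - implied


raw_profile = {"HP:0001250", "HP:0011169", "HP:0006829"}
print(sorted(remove_redundant_ancestors(raw_profile, ANCESTORS)))
# ['HP:0006829', 'HP:0011169']
```

The result matches the example profile given above: the general "Seizure" term is dropped because the more specific clonic seizure term already implies it.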
2.5. Molecular Genetic Analysis
To identify the genetic causes of the disorders, ES was performed at the 3billion laboratory in Seoul, Republic of Korea. Genomic DNA was extracted from peripheral blood or dried blood spots using standard protocols. Exome capture was performed using the xGen Exome Research Panel v2 (Integrated DNA Technologies, Coralville, IA, USA), supplemented with mitochondrial and custom panels. Sequencing was carried out on the NovaSeq X platform (Illumina, San Diego, CA, USA), achieving an average depth of 140× with ≥20× coverage for 99.6% of the targeted regions. Sequencing was performed to clinical-grade standards in a CAP-accredited, CLIA-certified laboratory.
Raw data were processed using the 3billion bioinformatics pipeline EVIDENCE v4.2, which incorporates GATK (v4.4.0) for SNV/INDEL calling, Manta for structural variant detection, 3bCNV for copy number variation (CNV) analysis, and ExpansionHunter, MELT, and AutoMap for repeat expansions, mobile element insertions, and regions of homozygosity, respectively. Variant annotation was performed using Ensembl’s Variant Effect Predictor (VEP v104.2) [11].
Variants were filtered using a multi-parameter approach. First, an allele frequency threshold was applied, excluding variants with a minor allele frequency (MAF) ≥ 1% in the gnomAD v4.1.0 database. Second, predicted pathogenicity was assessed, with priority given to protein-truncating variants (frameshift, nonsense, and canonical splice-site changes) and to missense variants predicted to be deleterious by multiple in silico tools, including SIFT, PolyPhen-2, and Combined Annotation Dependent Depletion (CADD). Third, inheritance patterns were considered, taking into account family history and zygosity (e.g., homozygous, compound heterozygous, de novo). Finally, high-confidence variants with a sequencing depth of at least 20× were retained for analysis, whereas low-quality or ambiguous calls were subjected to orthogonal validation by Sanger sequencing.
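The frequency, consequence, and depth steps of this filter can be expressed as a small triage function. The sketch below is illustrative only and makes several assumptions: the `Variant` record and its fields are hypothetical (they are not the EVIDENCE pipeline's output format), "multiple in silico tools" is read as at least two, the consequence names are the Sequence Ontology terms that VEP reports for frameshift, nonsense, and canonical splice-site changes, and the inheritance and zygosity step is not modelled.

```python
from dataclasses import dataclass

# Sequence Ontology consequence terms (as reported by VEP) treated as protein-truncating.
TRUNCATING = {
    "frameshift_variant",
    "stop_gained",              # nonsense
    "splice_donor_variant",     # canonical splice site
    "splice_acceptor_variant",  # canonical splice site
}

MAX_MAF = 0.01             # exclude MAF >= 1% in gnomAD
MIN_DEPTH = 20             # high-confidence calls need >= 20x
MIN_DELETERIOUS_TOOLS = 2  # assumption: "multiple tools" read as at least two


@dataclass
class Variant:  # hypothetical record layout, for illustration only
    gene: str               # placeholder gene labels in the example below
    consequence: str        # VEP consequence term
    gnomad_maf: float       # 0.0 if absent from gnomAD
    depth: int              # read depth at the variant position
    deleterious_tools: int  # number of in silico tools calling the variant deleterious


def is_prioritized(v: Variant) -> bool:
    """Truncating variants, or missense variants supported by several predictors."""
    if v.consequence in TRUNCATING:
        return True
    return v.consequence == "missense_variant" and v.deleterious_tools >= MIN_DELETERIOUS_TOOLS


def triage(v: Variant) -> str:
    """Return 'exclude', 'sanger' (needs orthogonal confirmation), or 'retain'."""
    if v.gnomad_maf >= MAX_MAF or not is_prioritized(v):
        return "exclude"
    if v.depth < MIN_DEPTH:
        return "sanger"
    return "retain"


variants = [
    Variant("GENE_A", "stop_gained", 0.0, 85, 0),
    Variant("GENE_B", "missense_variant", 0.0002, 60, 3),
    Variant("GENE_C", "missense_variant", 0.03, 90, 4),
    Variant("GENE_D", "frameshift_variant", 0.0, 12, 0),
    Variant("GENE_E", "synonymous_variant", 0.0, 70, 0),
]
for v in variants:
    print(v.gene, triage(v))
```

Routing shallow but otherwise prioritized calls to a separate "sanger" bucket, rather than silently dropping them, mirrors the distinction in the text between retained high-confidence variants and those requiring orthogonal validation.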
Variant classification was performed by 3billion Inc. (Seoul, Republic of Korea) using the validated Evidence® platform, which implements the ACMG/AMP guidelines with ClinGen specifications. This pipeline integrates multiple in silico prediction tools (including SIFT, PolyPhen-2, REVEL, CADD, and SpliceAI). The PP3 criterion was applied only when several algorithms concordantly indicated a deleterious effect, whereas BP4 was applied when the predictors consistently supported a benign effect. These assignments were generated automatically by the laboratory pipeline. For transparency, the ACMG evidence codes applied to each variant, including PP3 or BP4 where applicable, are listed in Table S1. For each variant, the inheritance pattern was annotated according to OMIM/HGNC, and zygosity (heterozygous, homozygous, compound heterozygous, hemizygous, or homoplasmic) was determined from the ES data.
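The concordance rule for PP3 and BP4 can be illustrated as follows. This is a sketch of the rule as described here, not the Evidence® platform's implementation: it assumes each tool's score has already been converted to a verdict ("deleterious", "benign", or "uncertain") using tool-specific thresholds that are deliberately left out, and the minimum of three predictors (`min_tools`) is a hypothetical choice.

```python
from typing import Optional

VALID_CALLS = {"deleterious", "benign", "uncertain"}


def computational_evidence(calls: dict[str, str], min_tools: int = 3) -> Optional[str]:
    """Assign PP3 or BP4 only when all predictors agree; otherwise return None.

    `calls` maps a tool name to its per-variant verdict, already dichotomized
    with tool-specific thresholds (those thresholds are outside this sketch).
    """
    unknown = set(calls.values()) - VALID_CALLS
    if unknown:
        raise ValueError(f"unexpected verdicts: {unknown}")
    if len(calls) < min_tools:
        return None  # too few predictors to speak of concordance
    verdicts = set(calls.values())
    if verdicts == {"deleterious"}:
        return "PP3"
    if verdicts == {"benign"}:
        return "BP4"
    return None  # discordant or uncertain predictions: neither code is applied


print(computational_evidence({"SIFT": "deleterious", "PolyPhen-2": "deleterious",
                              "REVEL": "deleterious", "CADD": "deleterious"}))  # PP3
print(computational_evidence({"SIFT": "benign", "PolyPhen-2": "deleterious",
                              "REVEL": "benign"}))  # None
print(computational_evidence({"SIFT": "benign", "REVEL": "benign", "CADD": "benign"}))  # BP4
```

Requiring unanimity is the strictest reading of "concordantly"; a pipeline could instead use a majority or weighted scheme, which would change how often PP3 or BP4 is applied.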
CNVs were identified from exome sequencing data and subsequently classified according to the ACMG/ClinGen technical standards for constitutional CNV interpretation. The classification framework incorporated genomic coordinates, gene content, haploinsufficiency and triplosensitivity scores, population frequency data (gnomAD-SV), ClinVar records, and published literature.
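In the 2020 ACMG/ClinGen CNV technical standards, evidence from the scoring sections is summed into a point total that is mapped to five classes: ≥0.99 pathogenic, 0.90 to 0.98 likely pathogenic, −0.89 to 0.89 uncertain significance, −0.90 to −0.98 likely benign, and ≤−0.99 benign. The sketch below covers only that final mapping; how many points each piece of evidence earns is not modelled, and the point totals in the example are made up for illustration.

```python
def classify_cnv(points: list[float]) -> str:
    """Map summed evidence points to a five-tier ACMG/ClinGen CNV class."""
    score = round(sum(points), 2)  # points use two decimals; rounding avoids float drift
    if score >= 0.99:
        return "Pathogenic"
    if score >= 0.90:
        return "Likely pathogenic"
    if score > -0.90:
        return "Uncertain significance"
    if score > -0.99:
        return "Likely benign"
    return "Benign"


# Made-up point totals for illustration only (not cohort data).
for pts in ([1.0], [0.45, 0.45], [0.3, 0.15], [-0.9], [-1.0]):
    print(pts, classify_cnv(pts))
```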
2.6. Reverse Phenotyping
Reverse phenotyping, the process of correlating identified genetic variants with a patient’s clinical manifestations to clarify their significance, was conducted using the OMIM database [12]. The clinical profile of each patient was compared with the phenotypic descriptions of disorders associated with the identified variants. Genetic findings, including variants of uncertain significance (VUSs), were reviewed in interdisciplinary case discussions involving neurologists and neurogeneticists, who assessed their diagnostic relevance and interpretation. VUSs were documented separately and were not included in the calculation of the primary diagnostic yield. Each VUS was systematically re-evaluated by reverse phenotyping in the context of patient-specific clinical features, with multidisciplinary review of genotype–phenotype correlations; final classification followed ACMG/ClinGen guidelines.
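The comparison step can be sketched as a set comparison between a patient's documented features and the features reported for the disorder linked to a candidate gene. This is a simplified illustration, not the procedure used in the study: it uses plain feature labels for a hypothetical disorder, exact-match overlap instead of ontology-aware semantic similarity, and no OMIM access. The `to_reassess` set corresponds to the targeted re-examination that, in this study, revealed additional features in some patients.

```python
from dataclasses import dataclass


@dataclass
class ProfileComparison:
    shared: set[str]           # documented in the patient and reported for the disorder
    unexplained: set[str]      # patient features not reported for the disorder
    to_reassess: set[str]      # disorder features not (yet) documented in the patient
    fraction_explained: float  # share of patient features covered by the disorder


def compare_profiles(patient: set[str], disorder: set[str]) -> ProfileComparison:
    shared = patient & disorder
    return ProfileComparison(
        shared=shared,
        unexplained=patient - disorder,
        to_reassess=disorder - patient,
        fraction_explained=len(shared) / len(patient) if patient else 0.0,
    )


# Hypothetical feature sets for illustration only.
patient = {"Seizure", "Global developmental delay", "Hypotonia", "Scoliosis"}
disorder = {"Seizure", "Global developmental delay", "Hypotonia", "Cardiac arrhythmia"}
result = compare_profiles(patient, disorder)
print(result.fraction_explained, sorted(result.to_reassess))
```

Keeping unexplained and not-yet-documented features separate matters for interpretation: the first weakens the genotype–phenotype fit, while the second only prompts a targeted clinical re-check.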
2.7. Evaluation of the Reduction in Diagnostic Odyssey
This study assessed the effectiveness of the diagnostic approach in terms of reducing the so-called “diagnostic odyssey”—the period from the appearance of the first symptoms to the establishment of a final diagnosis.
The analysis was based on two key parameters:
Duration of the Diagnostic Process: the time interval from the onset of the first symptoms to the patient’s inclusion in the study and the interval from inclusion to the final diagnosis. Comparing these intervals allowed us to evaluate whether the new diagnostic algorithm accelerated the diagnostic process.
Diagnostic Burden: the total number of diagnostic tests (e.g., specialist consultations, MRI, biopsies) performed before and after the patient’s inclusion in the study. This parameter allowed us to assess whether the new diagnostic intervention reduced the number of additional investigations or, conversely, required more.
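The two duration intervals can be computed per patient from three dates. The sketch assumes whole calendar months as the unit (the text reports durations in months but does not state how partial months were counted), and the dates are hypothetical. The burden parameter is a simple count of procedures and needs no special handling.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


# Hypothetical dates for one patient (not study data).
onset = date(2017, 7, 10)
inclusion = date(2023, 7, 10)
diagnosis = date(2023, 11, 28)

pre_study = months_between(onset, inclusion)       # symptom onset -> inclusion
post_study = months_between(inclusion, diagnosis)  # inclusion -> final diagnosis
print(pre_study, post_study)  # 72 4
```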
Together, these parameters indicate the extent to which the proposed approach contributes to diagnostic optimization, reduced time and resource expenditure, and improved quality of medical care for patients.
2.8. Statistical Analysis
Statistical analysis was performed using R software (version 4.3.1) and JASP (version 0.18.1). Descriptive statistics for quantitative variables were presented as median, interquartile range (IQR), range, and standard deviation, depending on the distribution of the data. Categorical variables were described using absolute numbers and percentages.
Normality of distribution was assessed using the Shapiro–Wilk test and visual inspection of Q–Q plots. Because the data deviated from a normal distribution, the Wilcoxon signed-rank test was used to compare paired samples.
To quantify the effect size, Cohen’s d (adapted for paired samples) was calculated. Statistical significance was set at p < 0.05 (two-tailed), and 95% confidence intervals were reported.
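For readers who want to reproduce this type of paired comparison, a dependency-free Python sketch follows. It is not the authors' R/JASP analysis. It computes the Wilcoxon signed-rank statistic with the large-sample normal approximation (zero differences dropped, average ranks for ties, no tie or continuity correction in the variance) and uses d_z (mean paired difference divided by the standard deviation of the differences) as one common paired-sample variant of Cohen's d; the text does not state which variant was used. In practice, R's `wilcox.test(..., paired = TRUE)` or SciPy's `scipy.stats.wilcoxon` would be preferred, since they also provide exact p-values for small samples. The data are hypothetical.

```python
import math
from statistics import mean, stdev


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before: list[float], after: list[float]) -> tuple[float, float, float]:
    """Return (W+, z, two-sided p) using the normal approximation."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


def cohens_dz(before: list[float], after: list[float]) -> float:
    """Paired-sample effect size: mean difference / SD of the differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)


# Hypothetical months-to-diagnosis for six patients (not study data).
before = [72, 30, 120, 48, 96, 60]
after = [5, 4, 5, 4, 5, 4]
w, z, p = wilcoxon_signed_rank(before, after)
print(w, round(z, 2), round(p, 3), round(cohens_dz(before, after), 2))
```

With every difference positive, W+ reaches its maximum n(n + 1)/2; with only six pairs, the normal approximation is rough, which is why exact methods are preferable for small samples.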
3. Results
3.1. Results of Deep Phenotyping
The study included 81 patients with complex neurological phenotypes. Age at the time of evaluation ranged from 6 months to 17 years (median: 6 years; interquartile range: 4–11.5 years).
Almost all patients exhibited psychomotor developmental delay and intellectual disability. Key demographic and clinical features, including sex distribution, epilepsy prevalence, motor function, and intellectual disability, are summarized in Figure 1.
Each patient’s clinical phenotype was standardized and structured using HPO terms. On average, 12 HPO terms were assigned per patient (median: 11; range: 2–30), reflecting the spectrum of clinical manifestations. The most common neurological features included global developmental delay/intellectual disability, epilepsy, infantile hypotonia, and spastic paresis, among others. Nearly all patients exhibited complex phenotypes with multiple (two or more) manifestations, and in over half of the cases, 10 to 15 HPO terms were used to describe the phenotype.
The development of such detailed phenotypic profiles allowed for a standardized clinical description, which is essential for the interpretation of subsequent genetic testing results.
Figure 2 presents the distribution of the number of HPO terms per patient. A detailed clinical characterization of the probands is presented in Table S2.
3.2. ES Results
In the study group of 81 patients, genomic variations were identified in 43 individuals (53.1%), including single nucleotide variants (SNVs) in 38 patients (46.9%) and copy number variations (CNVs) in 5 patients (6.2%).
Among the SNVs, 29 patients (35.8%) carried pathogenic or likely pathogenic variants, which were considered confirmed molecular diagnoses. An additional 9 patients (11.1%) harbored VUSs, which were reported separately and not included in the diagnostic yield (Table 1 and Table S1). In one patient, two different SNVs were identified, in the IDS and MKKS genes. In total, 39 rare variants were detected across the cohort; according to ACMG criteria, 18 were classified as pathogenic, 12 as likely pathogenic, and 9 as VUSs. The variants involved a broad range of genes; the most frequently affected were SCN1A, TSC2, and ARID1B, in each of which two different variants were identified in distinct patients. Variants were detected in autosomal dominant genes in 27 patients, in autosomal recessive genes in 9, in X-linked genes in 3, and in the mitochondrial genome in 1. Within the autosomal recessive group, most patients carried homozygous or compound heterozygous variants, whereas in one case only a single heterozygous variant was identified (MTHFS). Detailed information on inheritance patterns and zygosity is provided in Supplementary Table S1.
A significant portion of the identified variants was associated with developmental and epileptic encephalopathies (DEE). Pathogenic or likely pathogenic mutations were identified in genes associated with DEE, including KCNQ2 (DEE7), SCN1A (Dravet syndrome, DEE type 6), SCN8A (DEE13), CDKL5 (DEE2), WWOX (DEE28), DNM1 (DEE31A), and GRIN2A (focal epilepsy with speech disorder and intellectual disability). Altogether, variants in DEE-related genes accounted for approximately 20% of all identified mutations.
Another large group of findings (35.5%) involved genes associated with neurodevelopmental disorders (intellectual disability, developmental and behavioral impairments). These included, for example, KDM6B and KCND2 (both associated with autosomal dominant forms of developmental delay), UBAP2L and BRAT1 (autosomal recessive neurodevelopmental syndromes), as well as TRIO, CUL4B, DYRK1A, and others.
In addition, a number of identified variants were linked to hereditary syndromic disorders: in our cohort, mutations were found that are associated with tuberous sclerosis (TSC1/TSC2), Costello syndrome (HRAS), Kabuki syndrome (KMT2D), Smith–Magenis syndrome (RAI1), Lowe syndrome (OCRL), Bardet–Biedl syndrome (MKKS), and other monogenic disorders.
Of particular note are variants related to inherited metabolic diseases: pathogenic variants causing Salla disease (SLC17A5) and L2-hydroxyglutaric aciduria (L2HGDH) were identified, as well as a mitochondrial variant associated with MELAS syndrome (MT-TL1).
The identified CNVs included deletions and duplications of various chromosomal regions associated with specific phenotypic manifestations (Table 2). All CNVs were classified as pathogenic or likely pathogenic according to ACMG standards and were considered confirmed molecular diagnoses. In Case 18, a duplication at 9p24.3p22.2 was detected that is not associated with any known syndrome; however, duplications partially or completely overlapping this region have been reported in individuals with phenotypes similar to that of this patient [13,14,15].
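The proportions reported in this section can be checked with simple arithmetic. The primary diagnostic yield counts only pathogenic or likely pathogenic SNVs and CNVs and excludes patients with VUSs only. The snippet uses only figures given in this section and assumes, as the text implies, that no patient falls into more than one category.

```python
COHORT = 81
snv_plp = 29   # patients with pathogenic/likely pathogenic SNVs
cnv_plp = 5    # patients with pathogenic/likely pathogenic CNVs
vus_only = 9   # patients with VUSs only (excluded from the yield)

confirmed = snv_plp + cnv_plp
any_finding = confirmed + vus_only


def pct(k: int, n: int = COHORT) -> float:
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100 * k / n, 1)


print(f"Confirmed (primary yield): {confirmed}/{COHORT} = {pct(confirmed)}%")  # 34/81 = 42.0%
print(f"Any reported variant: {any_finding}/{COHORT} = {pct(any_finding)}%")    # 43/81 = 53.1%
```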
3.3. Results of Reverse Phenotyping
In the group of 29 patients with pathogenic or likely pathogenic variants, reverse phenotyping allowed for refinement of the clinical diagnosis in 100% of cases. In all of these patients, the clinical features were fully consistent with the identified genetic variants. Reverse phenotyping also contributed to the identification of additional phenotypic features in 10 patients (34.5%). During targeted re-evaluation, previously undocumented findings were discovered, including metabolic abnormalities, dysmorphic features, clinically insignificant cardiac arrhythmia, and others.
Among the 9 patients with VUS, an in-depth analysis was performed using reverse phenotyping combined with interdisciplinary discussion. In 8 of these 9 cases (88.9%), the refined phenotypic profile was consistent with published descriptions of disorders associated with the respective genes. This provided supportive clinical evidence for a potential pathogenic role; however, the variants remained formally classified as VUSs and were not included in the diagnostic yield. The remaining case involved a patient with a single heterozygous variant in the autosomal recessive gene MTHFS, in whom no phenotypic correlation could be established.
When formulating the presumptive genetic interpretation, the following factors were considered: concordance between phenotype and known disease manifestations, results of repeated phenotypic assessment using HPO terms, and conclusions from interdisciplinary case review. Although VUSs did not undergo reclassification, these analyses strengthened their clinical relevance in the individual context.
3.4. Outcomes of the Reduction in Diagnostic Odyssey
Prior to inclusion in the study, the median time interval from the onset of the first symptoms to participation in the study was 72 months (range: 6–204 months) (Figure 3). After inclusion and the initiation of molecular genetic testing, the interval to diagnosis was at most 5 months.
Although the plotted data show notable dispersion, most post-study diagnoses were achieved within a narrow 4–5 month window, corresponding to a post-study SD of 0.70 months.
Before undergoing genetic testing, patients underwent an average of 20 different diagnostic procedures (Figure 4). After the implementation of genetic testing and targeted confirmatory investigations, the median number of additional tests was 2 (Table S2).
A comparison of these indicators before and after study participation demonstrates a marked reduction in the diagnostic odyssey. With the new diagnostic approach, the average time to diagnosis was reduced approximately 19-fold compared with the pre-study period, and the number of required diagnostic procedures decreased approximately tenfold.
The Q–Q plots demonstrate deviations from normality in both the duration of the diagnostic odyssey and the number of diagnostic procedures before genetic testing, indicating non-normal data distribution (Figure 5).
To assess whether the duration of the diagnostic process differed before and after inclusion in the study, the Shapiro–Wilk test was applied first; it revealed deviation from normality (p < 0.05) for both measurements, which justified a non-parametric comparison. The Wilcoxon signed-rank test for paired samples was therefore used.
The median duration of the diagnostic odyssey was significantly reduced following implementation of the proposed diagnostic algorithm (p = 1.91 × 10⁻⁶). The main statistical indicators are summarized in Table 3. To quantify the magnitude of change, an effect size analysis was performed, yielding a Cohen’s d of 2.43, which corresponds to a very large effect in reducing diagnostic time.
4. Discussion
The results of our study demonstrate the high effectiveness of integrating deep phenotyping and genomic sequencing in the diagnosis of RPNDs. The diagnostic yield in our cohort reached 42% (34 of 81 patients received a genetically confirmed diagnosis), which appears higher than yields reported in studies that applied genomic testing without standardized deep phenotyping, such as HPO-based annotation. Our findings approach leading international outcomes; for instance, a recent GS study reported a diagnostic yield of 61%, underscoring the value of comprehensive clinical assessment and reverse phenotyping [16]. To our knowledge, no similar large-scale studies integrating structured phenotyping and genomic diagnostics have been conducted in Central Asia, highlighting the novelty and regional significance of our work.
All patients in our cohort underwent systematic deep phenotyping, including a thorough medical history, neurological examination, and standardized documentation using the HPO. Although some of the most frequently used HPO terms, such as “global developmental delay”, are non-specific and occur in numerous genetic conditions, they were never used in isolation; instead, they formed part of a broader phenotypic profile of 2–30 HPO terms per patient. This structured approach enabled precise alignment between clinical manifestations and ES results. We did not statistically assess the correlation between term frequency and diagnostic outcome, as the study was not powered for this purpose; such an analysis could provide further insight in larger future studies.
Building on the diagnostic yield analysis, we examined the role of reverse phenotyping, particularly in the context of VUSs. Reverse phenotyping uncovered previously unrecognized features in a substantial subset of patients, thereby strengthening genotype–phenotype correlations. Such methodologies are increasingly endorsed in the literature as effective strategies for interpreting VUSs [16]. Despite the lack of blinded validation, our experience underscores the practical utility of reverse phenotyping in everyday clinical settings, especially when performed by trained clinicians with access to both phenotypic and genetic information. This strategy facilitated timely and context-sensitive interpretation of genomic data and proved feasible in a resource-limited regional network.
In addition, nine patients (11.1%) carried VUSs. Although these variants were not included in the primary diagnostic yield under ACMG criteria, reverse phenotyping supported their potential clinical relevance in eight of them (9.9% of the cohort), indicating that the true diagnostic contribution of the approach may be higher, pending future validation.
From a healthcare delivery perspective, this integrated diagnostic model resulted in a marked reduction in the diagnostic odyssey. Prior to inclusion, the median duration of the diagnostic process was 72 months, and patients had undergone approximately 20 diagnostic procedures each. Following implementation of our protocol, diagnoses were achieved within 4–5 months and required a median of two confirmatory investigations. The figure of approximately 20 pre-study procedures is a cohort-level average across a clinically heterogeneous group; as shown in Table S2, the number of procedures before exome sequencing ranged from 7 to 33 depending on disease complexity. This inter-case variability reflects real-world diagnostic trajectories rather than uniform testing. These results align with global initiatives aimed at streamlining diagnostics for rare diseases. While genomic testing incurs initial costs, recent analyses show that early deployment of such technologies can substantially reduce cumulative diagnostic expenses [17]. Our data are consistent with this notion, suggesting that a single, well-targeted sequencing test can replace years of fragmented, inconclusive diagnostics.
Although these findings are compelling, we acknowledge that the broad range of pre-study diagnostic durations (6 to 204 months) introduces inter-case variability that may affect the interpretability of the reported median and effect size (Cohen’s d = 2.43). This variation likely reflects both clinical heterogeneity and disparities in access to timely genetic testing. While no adjustment for disease severity was made in the current analysis, the consistently short diagnostic timeline post-intervention suggests that the integrated approach may offer robust benefits across a wide range of clinical contexts. Future studies may consider stratified analyses to further refine the impact of disease complexity on diagnostic timelines.
Beyond economic considerations, timely diagnosis confers important intangible benefits, such as alleviating familial anxiety and enabling earlier clinical management. Thus, the integrated approach presented here not only enhances diagnostic efficacy but also supports the broader goal of sustainable healthcare by promoting more rational resource allocation and may improve the lived experiences of affected families.
Comparison with other cohorts shows that our diagnostic performance is consistent with, or exceeds, global benchmarks. The proportion of genetically confirmed diagnoses in neurodevelopmental disorders is reported to range between 30% and 50%, depending on clinical severity and sequencing methodology [18,19,20,21]. Our study focused on patients with severe phenotypes (combinations of epilepsy, intellectual disability, and complex syndromic features) that frequently have a monogenic basis, which likely contributed to the high diagnostic yield.
Our findings span a broad genetic spectrum, including well-established pathogenic variants (e.g., RAI1 in Smith–Magenis syndrome, KCNQ2 in early infantile epileptic encephalopathy) and less well-characterized findings (e.g., a novel KDM6B variant likely linked to developmental delay). Notably, several molecular diagnoses diverged from the initial clinical hypotheses, highlighting the diagnostic value of ES in cases with atypical or overlapping phenotypes.
Despite methodological similarities to prior studies, our work is distinctive in demonstrating the structured integration of deep and reverse phenotyping into routine clinical pathways in a resource-limited region, offering a reproducible model for other low-resource healthcare systems.
Implementation & Feasibility
The present study was designed as a pragmatic implementation pilot rather than a comprehensive sustainability model. The pathway combined standardized HPO-based data entry, outsourced ES, and concise multidisciplinary reverse phenotyping, aiming to minimize the infrastructure burden on participating hospitals. Importantly, HPO tagging was performed using freely available online platforms, eliminating the need for licensed software or specialized local bioinformatics capacity. Sequencing was conducted by an external accredited laboratory under existing institutional agreements, which allowed sample processing without new capital investment in sequencing equipment. Case discussions were structured as brief (<30 min) multidisciplinary huddles that could be conducted remotely, ensuring flexibility and minimal disruption to daily clinical practice.
These features demonstrate short-term feasibility of the approach within a resource-limited context. The operational requirements were restricted to informed consent, sample logistics, and standardized digital data entry, all of which were achievable without substantial new infrastructure. Nonetheless, long-term sustainability will depend on stable funding mechanisms, streamlined procurement processes, and the development of local expertise in variant interpretation. We explicitly recognize that a formal cost-effectiveness analysis and health-economic modeling were beyond the scope of this pilot study. Scale-up is currently planned through integration with the regional rare disease registry, which may provide a platform for training, case aggregation, and pooled procurement strategies. This planned stepwise integration indicates a potential route to scale-up, while acknowledging that durable sustainability requires broader policy support and financial commitment.
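To make the idea of standardized HPO-based data entry described above more concrete, the sketch below shows one way a per-patient phenotype record could reject free-text entries and store only well-formed HPO term identifiers (the prefix `HP:` followed by seven digits, e.g. HP:0001250, Seizure). The `CaseRecord` class, its fields, and the validation rule are hypothetical illustrations; they are not the study’s actual data schema or the online platforms it used.

```python
import re
from dataclasses import dataclass, field

# HPO term identifiers: "HP:" followed by exactly seven ASCII digits,
# e.g. HP:0001250 (Seizure).
HPO_ID = re.compile(r"HP:[0-9]{7}")


@dataclass
class CaseRecord:
    """Hypothetical per-patient phenotype record kept before exome sequencing."""

    case_id: str
    hpo_terms: set[str] = field(default_factory=set)

    def add_term(self, term_id: str) -> None:
        # fullmatch rejects free text, lowercase prefixes, and trailing characters,
        # so every stored entry is a well-formed HPO identifier.
        if not HPO_ID.fullmatch(term_id):
            raise ValueError(f"not a valid HPO identifier: {term_id!r}")
        # A set silently ignores duplicate entries of the same term.
        self.hpo_terms.add(term_id)
```

Storing ontology identifiers rather than free-text descriptions is what allows the same record to be shared with an external sequencing laboratory and used for phenotype-driven variant prioritization without manual re-coding.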
This study was intentionally designed as a pilot implementation project rather than a controlled trial. The absence of a control group reflects the pragmatic focus on feasibility and diagnostic yield in a real-world, resource-limited setting, where constructing a comparator arm would have required substantially greater resources and longer timelines. Importantly, variants of uncertain significance (VUS) were not counted toward the primary diagnostic yield. Instead, they were reported separately and systematically re-assessed through reverse phenotyping and multidisciplinary review. Although this process provided valuable clinical insights and supported hypothesis generation, no formal reclassification of VUS was performed, and variant classification remained in accordance with ACMG standards. These methodological choices help ensure that the diagnostic outcomes reported here are conservative and clinically robust, and they highlight areas for future research, including functional validation studies and the incorporation of comparator cohorts.
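The rule that VUS are reported separately and excluded from the primary diagnostic yield can be stated precisely in a short sketch. It assumes, as is conventional, that pathogenic and likely pathogenic findings count as diagnostic, and that each case is summarized by the single reported classification of its best candidate variant (or `None` when nothing was reported). The `Finding` labels and the `diagnostic_yield` function are hypothetical; real case-level adjudication (multiple variants, recessive genotypes, phenotype fit) is more involved.

```python
from enum import Enum
from typing import Optional


class Finding(Enum):
    """Hypothetical case-level labels based on ACMG variant classes."""

    PATHOGENIC = "pathogenic"
    LIKELY_PATHOGENIC = "likely pathogenic"
    VUS = "uncertain significance"


# Only P/LP findings count toward the primary yield; VUS never do.
DIAGNOSTIC = {Finding.PATHOGENIC, Finding.LIKELY_PATHOGENIC}


def diagnostic_yield(cases: list[Optional[Finding]]) -> tuple[float, int]:
    """Return (yield, n_vus).

    yield: proportion of all cases with a pathogenic or likely pathogenic finding.
    n_vus: number of cases whose best finding was a VUS, reported separately.
    """
    if not cases:
        raise ValueError("no cases supplied")
    solved = sum(1 for c in cases if c in DIAGNOSTIC)
    n_vus = sum(1 for c in cases if c is Finding.VUS)
    return solved / len(cases), n_vus
```

Keeping the VUS count as a separate output, rather than folding it into the numerator, mirrors the conservative reporting described above: reverse phenotyping may make a VUS clinically plausible, but it does not move the case into the solved category without formal reclassification.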
5. Limitations and Future Directions
While this study demonstrates important strengths, several limitations must be acknowledged, many of which reflect the realities of conducting genomic diagnostics in resource-limited settings.
First, the study was designed as a pragmatic pilot implementation without a control group. This choice was deliberate to assess feasibility under real-world conditions, but it limits direct comparisons with standard diagnostic workflows. Future studies incorporating comparator cohorts will be necessary to fully quantify the incremental benefit of this approach.
Second, genomic testing relied on exome sequencing rather than genome sequencing. Although exome sequencing is more affordable and more widely available, it cannot reliably detect deep intronic variants, many structural variants, or low-level mosaic variants. This limitation may account for some of the 47% of patients who remained undiagnosed. Integrating genome sequencing, long-read sequencing, and epigenetic testing could further increase diagnostic yield in future work.
Third, the cohort was drawn from a single geographic region and enriched for children with severe, complex phenotypes. While this design reflects the referral pattern to tertiary care and helps to demonstrate feasibility in the most challenging cases, it may limit the generalizability of the findings to broader pediatric populations. Larger, multi-regional studies will be required to validate the diagnostic performance of this model.
Fourth, VUSs were reported separately and were not included in the diagnostic yield. Reverse phenotyping supported a plausible clinical role in several cases, but no functional validation was performed and no formal reclassification occurred. This underscores the importance of ongoing follow-up, segregation analysis, and laboratory studies to clarify the pathogenicity of unresolved variants.
Finally, retrospective collection of certain pre-diagnostic data (e.g., diagnostic timelines, number of prior procedures) may have introduced recall or documentation bias, although prospective phenotyping and standardized HPO annotation were applied consistently.
Despite these limitations, the study provides a unique pilot dataset and demonstrates the feasibility of integrating deep phenotyping and genomic testing in a resource-limited region. Future directions include expansion to genome sequencing, incorporation of functional assays, and validation in larger multi-center cohorts to refine genotype–phenotype correlations and establish clinical utility.
6. Conclusions
To our knowledge, this is one of the first multi-center studies in Central Asia to demonstrate the successful integration of deep phenotyping and advanced genomic diagnostics in pediatric neurology. Unlike isolated case reports, we present a reproducible and scalable model built on interdisciplinary collaboration, standardized clinical data collection, phenotype-driven variant prioritization, and reverse phenotyping. This diagnostic framework can serve as a prototype for establishing rare disease centers in other resource-limited settings.
In settings where families affected by rare disorders face significant emotional and financial strain, shortening the diagnostic journey from years to months represents a major advancement in care. Our findings not only validate a practical diagnostic approach but also emphasize the need for continued genomic innovation in unresolved cases. Early, accurate diagnosis enables targeted management, improves outcomes, and aligns with the principles of sustainable healthcare.
In conclusion, the integration of deep phenotyping, first-line ES, and reverse phenotyping in children with rare neurological disorders yielded a diagnostic rate of 42% in patients who had remained undiagnosed after prolonged conventional work-up. This approach substantially shortened diagnostic delays and reduced unnecessary investigations, with expected benefits for patient quality of life and healthcare system efficiency. Our results support the incorporation of this model into routine practice and advocate for the broader adoption of genomics-informed, resource-conscious diagnostic strategies in pediatric neurology.
7. Patents
This work resulted in the development of a diagnostic algorithm for complex neurological phenotypes using deep phenotyping methods. The algorithm is registered as a scientific intellectual property object with the Copyright Certificate No. 56915, issued on 17 April 2025, by the Ministry of Justice of the Republic of Kazakhstan.
Supplementary Materials
The following supporting information can be downloaded at:
https://www.mdpi.com/article/10.3390/ijtm5040047/s1, Figure S1: Diagnostic workflow; Table S1: SNV detected in the studied cohort by WES analysis; Table S2: Clinical features of the study cohort; Table S3: Implementation resources required for the HPO→ES→Multidisciplinary team pathway.
Author Contributions
Conceptualization, N.Y. and R.K.; methodology, N.Y., R.K. and N.Z.; software, R.K.; validation, N.Z. and R.K.; formal analysis, N.Y. and A.O.; investigation, N.Y. and R.K.; resources, N.Z. and G.N.; data curation, N.Y. and A.O.; writing—original draft preparation, N.Y. and S.A.; writing—review and editing, N.Y., S.A. and R.K.; visualization, N.Y. and G.N.; supervision, N.Z. and R.K.; project administration, A.O.; funding acquisition, R.K., G.N. and A.O. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by 3Billion Inc. (South Korea), CAT-RPND (https://www.cat-genomics.com/, accessed on 10 March 2025), and the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. BR24992814).
Institutional Review Board Statement
The study protocol was reviewed and approved by the Ethics Committee of Khoja Akhmet Yassawi International Kazakh-Turkish University (Protocol No. 16, dated 8 June 2023). All procedures were conducted in accordance with the Declaration of Helsinki (2013 revision), the Convention on the Rights of the Child, and national regulations governing research involving minors.
Informed Consent Statement
Prior to inclusion in the study, all participants and their legal guardians were thoroughly informed—both verbally and in writing—about the objectives, procedures, potential risks and benefits of the research, as well as the possibility of publication of anonymized clinical and genetic data. Written informed consent for participation and publication was obtained from all legal guardians. When appropriate, verbal and written assent was also obtained from the minor participants in accordance with their age and level of understanding.
Data Availability Statement
De-identified clinical and genomic datasets generated during this study are available from the corresponding author upon reasonable request, subject to ethical and institutional review board approval.
Acknowledgments
The authors sincerely thank all the families who participated in this study for their trust, time, and cooperation. We gratefully acknowledge the contributions of pediatricians, pediatric neurologists, and clinical specialists involved in patient evaluation, recruitment, and phenotypic data collection. We thank 3Billion Inc. (South Korea) for providing ES services and variant interpretation. We also acknowledge the collaborative and scientific support of CAT-RPND, coordinated by the UCL Queen Square Institute of Neurology (London, UK). We are grateful to the South Kazakhstan Medical Academy (Shymkent, Kazakhstan) for administrative and logistical support, including documentation and international transport of biological samples. The authors thank the Khoja Akhmet Yassawi International Kazakh-Turkish University for its institutional support throughout the study. Finally, we acknowledge the Clinical and Diagnostic Center of Khoja Akhmet Yassawi International Kazakh-Turkish University (Turkestan, Kazakhstan) for providing facilities for clinical assessments, patient visits, and biological sample collection.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
ACMG | American College of Medical Genetics and Genomics |
CAT-RPND | Central Asia and Transcaucasia Rare Pediatric Neurological Diseases Genomic Consortium |
CNV | Copy Number Variation |
DEE | Developmental and epileptic encephalopathy |
ES | Exome Sequencing |
GMFCS | Gross Motor Function Classification System |
GS | Genome Sequencing |
HPO | Human Phenotype Ontology |
NGS | Next-Generation Sequencing |
NORD | National Organization for Rare Disorders |
RPNDs | Rare Pediatric Neurological Diseases |
SDGs | Sustainable Development Goals |
SNVs | Single Nucleotide Variants |
VUS | Variant of uncertain significance |
References
- Ferreira, C.R. The burden of rare diseases. Am. J. Med. Genet. A 2019, 179, 885–892.
- Stark, Z.; Tan, T.Y.; Chong, B.; Brett, G.R.; Yap, P.; Walsh, M.; Yeung, A.; Peters, H.; Mordaunt, D.; Cowie, S.; et al. A prospective evaluation of whole-exome sequencing as a first-tier molecular test in infants with suspected monogenic disorders. Genet. Med. 2016, 18, 1090–1096.
- Makarova, E.V.; Krysanov, I.S.; Valilyeva, T.P.; Vasiliev, M.D.; Zinchenko, R.A. Evaluation of orphan diseases global burden. Eur. J. Transl. Myol. 2021, 31, 9610.
- Köhler, S.; Schulz, M.H.; Krawitz, P.; Bauer, S.; Dölken, S.; Ott, C.E.; Mundlos, C.; Horn, D.; Mundlos, S.; Robinson, P.N. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 2009, 85, 457–464.
- Robinson, P.N. Deep phenotyping for precision medicine. Hum. Mutat. 2012, 33, 777–780.
- Yang, Y.; Muzny, D.M.; Reid, J.G.; Bainbridge, M.N.; Willis, A.; Ward, P.A.; Braxton, A.; Beuten, J.; Xia, F.; Niu, Z.; et al. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N. Engl. J. Med. 2013, 369, 1502–1511.
- Kaiyrzhanov, R.; Zharkinbekova, N.; Guliyeva, U.; Ganieva, M.; Tavadyan, Z.; Gachechiladze, T.; Salayev, K.; Guliyeva, S.; Isayan, M.; Kekenadze, M.; et al. Elucidating the genomic basis of rare pediatric neurological diseases in Central Asia and Transcaucasia. Nat. Genet. 2024, 56, 2582–2584.
- Strong, K.; Noor, A.; Aponte, J.; Banerjee, A.; Cibulskis, R.; Diaz, T.; Ghys, P.; Glaziou, P.; Hereward, M.; Hug, L.; et al. Monitoring the status of selected health related sustainable development goals: Methods and projections to 2030. Glob. Health Action 2020, 13, 1846903.
- Richards, S.; Aziz, N.; Bale, S.; Bick, D.; Das, S.; Gastier-Foster, J.; Grody, W.W.; Hegde, M.; Lyon, E.; Spector, E.; et al. ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 2015, 17, 405–424.
- Kernohan, K.D.; Boycott, K.M. The expanding diagnostic toolbox for rare genetic diseases. Nat. Rev. Genet. 2024, 25, 401–415.
- Seo, G.H.; Kim, T.; Choi, I.H.; Park, J.Y.; Lee, J.; Kim, S.; Won, D.G.; Oh, A.; Lee, Y.; Choi, J.; et al. Diagnostic yield and clinical utility of whole exome sequencing using an automated variant prioritization system, EVIDENCE. Clin. Genet. 2020, 98, 562–570.
- Wilczewski, C.M.; Obasohan, J.; Paschall, J.E.; Zhang, S.; Singh, S.; Maxwell, G.L.; Similuk, M.; Wolfsberg, T.G.; Turner, C.; Biesecker, L.G.; et al. Genotype first: Clinical genomics research through a reverse phenotyping approach. Am. J. Hum. Genet. 2023, 110, 3–12.
- Capkova, Z.; Capkova, P.; Srovnal, J.; Adamova, K.; Prochazka, M.; Hajduch, M. Duplication of 9p24.3 in three unrelated patients and their phenotypes, considering affected genes, and similar recurrent variants. Mol. Genet. Genom. Med. 2021, 9, e1592.
- The Janssen-CHOP Neuropsychiatric Genomics Working Group; Glessner, J.T.; Li, J.; Wang, D.; March, M.; Lima, L.; Desai, A.; Hadley, D.; Kao, C.; Gur, R.E.; et al. Copy number variation meta-analysis reveals a novel duplication at 9p24 associated with multiple neurodevelopmental disorders. Genome Med. 2017, 9, 106.
- Guilherme, R.S.; Meloni, V.A.; Perez, A.B.A.; Pilla, A.L.; de Ramos, M.A.P.; Dantas, A.G.; Takeno, S.S.; Kulikowski, L.D.; Melaragno, M.I. Duplication 9p and their implication to phenotype. BMC Med. Genet. 2014, 15, 142.
- Akgun-Dogan, O.; Bengur, E.T.; Ay, B.; Ozkose, G.S.; Kar, E.; Bengur, F.B.; Bulut, A.S.; Yigit, A.; Aydin, E.; Esen, F.N.; et al. Impact of deep phenotyping: High diagnostic yield in a diverse pediatric population of 172 patients through clinical whole-genome sequencing at a single center. Front. Genet. 2024, 15, 1347474.
- Runheim, H.; Pettersson, M.; Hammarsjö, A.; Nordgren, A.; Henriksson, M.; Lindstrand, A.; Levin, L.; Soller, M.J. The cost-effectiveness of whole genome sequencing in neurodevelopmental disorders. Sci. Rep. 2023, 13, 6904.
- Wang, Q.; Tang, X.; Yang, K.; Huo, X.; Zhang, H.; Ding, K.; Liao, S. Deep phenotyping and whole-exome sequencing improved the diagnostic yield for nuclear pedigrees with neurodevelopmental disorders. Mol. Genet. Genom. Med. 2022, 10, e1918.
- Pandey, R.; Brennan, N.F.; Trachana, K.; Katsandres, S.; Bodamer, O.; Belmont, J.; Veenstra, D.L.; Peng, S. A meta-analysis of diagnostic yield and clinical utility of genome and exome sequencing in pediatric rare and undiagnosed genetic diseases. Genet. Med. 2025, 27, 101398.
- Albuquerque, A.L.B.; dos Santos, G.G.; Sadok, S.H.; Antonello, B.B.; de Jesus, L.M.; de Carvalho, M.E.A.; Mutarelli, A.; Ribeiro, P.V.Z. Diagnostic Yield of Genome Sequencing Versus Exome Sequencing in Pediatric Patients With Rare Phenotypes: A Systematic Review and Meta-Analysis. Am. J. Med. Genet. A 2025, 2025, e64146.
- Stoyanova, M.; Yahya, D.; Hachmeriyan, M.; Levkova, M. Diagnostic Yield of Next-Generation Sequencing for Rare Pediatric Genetic Disorders: A Single-Center Experience. Med. Sci. 2025, 13, 75.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).