Unraveling the Androgen Receptor’s Role in Hypospadias: A Systematic Review and Meta-Analysis
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
General Assessment
This manuscript addresses a relevant and understudied question: whether androgen-receptor (AR) expression differs between boys with hypospadias and age-matched controls, through a systematic review and meta-analysis. The topic is scientifically important, and the authors provide a rich mechanistic background on androgen signaling, epigenetics, and environmental disruptors.
However, in its current form, the manuscript suffers from major methodological inconsistencies, statistical concerns, and conceptual overstatements that significantly compromise the validity and interpretability of the findings.
Substantial revision is necessary before the work can be considered for publication.
Major Comments
- Critical inconsistencies in study selection (PRISMA)
There are conflicting numbers across the manuscript:
- Figure 1 indicates 28 full texts reviewed and 11 studies included.
- Section 3.1 states 38 full texts reviewed and 13 studies included.
- Table 1 lists 13 studies.
These discrepancies undermine the transparency and reproducibility of the review process.
The entire PRISMA flow, inclusion/exclusion accounting, and study counts must be corrected and made fully consistent.
- Literature search and methods description are incomplete and partially truncated
Section 2.1 begins with a clearly truncated sentence (“Table 1979. through 2025…”), suggesting an editing error. The search strategy lacks:
- complete Boolean strings,
- use of MeSH/keywords,
- exact filters applied,
- full list of databases searched.
As written, the search cannot be replicated, which is unacceptable for a systematic review.
- Potential statistical errors in the calculation of effect sizes
There is a serious risk of miscalculation in the meta-analysis:
- Authors state that SD was converted into SE for analysis.
- The metagen function typically requires SD, not SE, unless specified otherwise.
- The extremely large SMD for protein expression (SMD = 7.02; CI −7.74 to 21.78) suggests:
- scale inconsistencies,
- use of non-comparable semi-quantitative measures,
- or incorrect use of SE in place of SD.
This issue requires full clarification, including:
- explicit R code used,
- confirmation of whether SD or SE was passed to metagen,
- re-analysis if necessary.
Given this uncertainty, the current pooled protein result cannot be considered reliable.
- Extreme heterogeneity (I² ~ 100%) not appropriately addressed
Both protein and mRNA meta-analyses show near-maximum heterogeneity (I² = 100% and 99.7%).
In such conditions:
- a pooled random-effects estimate may be statistically meaningless,
- and may obscure true clinical or methodological sources of variation.
Yet, the authors:
- did not perform subgroup analyses (e.g., proximal vs distal, tissue type, assay method),
- did not conduct meta-regression beyond a simple age analysis,
- did not explore the impact of outliers.
A comprehensive heterogeneity exploration is required; otherwise, the manuscript should acknowledge that a quantitative synthesis may not be appropriate for these data.
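To make the quantities in this comment concrete, the following is a minimal Python sketch of random-effects pooling with Cochran's Q, tau² and I². It assumes the DerSimonian-Laird estimator and a normal-approximation confidence interval; the estimator actually used in the manuscript is not stated in this report, and the input values are illustrative, not data from the included studies.

```python
import math

def random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooling (needs at least two studies).

    effects: per-study effect sizes (e.g. SMDs)
    ses:     their standard errors
    """
    k = len(effects)
    w = [1.0 / se**2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "ci": ci, "tau2": tau2, "i2": i2}

# Illustrative values only: studies that disagree in direction and magnitude
result = random_effects([3.0, -2.5, 0.4, 6.1], [0.4, 0.5, 0.3, 0.6])
print(result)
```

With effects that point in opposite directions, as in the illustrative input, I² approaches 100% and the interval around the pooled estimate becomes wide, which is the situation this comment describes.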
- Overstatement of findings, especially in the Abstract and Conclusion
The abstract states that AR expression was “neither up nor down regulated” in hypospadias.
This phrasing implies biological equivalence, which is not supported by:
- wide confidence intervals,
- high heterogeneity,
- inconsistent study-level directions of effect,
- small sample sizes.
A more accurate interpretation would be:
“No consistent direction of effect could be demonstrated due to substantial heterogeneity and imprecision.”
The current wording risks misinforming readers and should be revised.
- Disconnect between mechanistic discussion and actual findings
The Discussion provides an extensive narrative review of androgen biology, epigenetics, and environmental toxicology.
Although informative, much of it is:
- weakly connected to the results,
- based on extrapolations from prostate cancer or animal models,
- not directly testable within the included data.
A tighter, more focused discussion is recommended, emphasizing:
- what the meta-analysis can conclude,
- what it cannot resolve,
- and the limitations of the underlying evidence.
- Limitations section is insufficient
The meta-analysis faces multiple major limitations—not just “small number of studies.”
The authors should explicitly state:
- inconsistency in study selection (once corrected),
- heterogeneity near 100%,
- reliance on semi-quantitative assays,
- potential extraction error via WebPlotDigitizer,
- mixing of tissue types and developmental stages,
- inability to adjust for clinical confounders.
The current limitations section is too brief and underrepresents these issues.
Minor Comments
- Formatting and editorial issues
- The manuscript header contains outdated formatting (“Int. J. Mol. Sci 2022, 23, x”), inconsistent with 2025 citations.
- Several references show formatting irregularities (e.g., spacing, author ordering).
- “will are found” in the Data Availability statement is ungrammatical.
- Table 1 ambiguities
- Danurdoro 2023 is labeled as “DNA” rather than mRNA or protein.
Please clarify what this means in terms of AR quantification.
- Provide a supplementary table detailing analytical methods (IHC, qPCR, Western blot, scoring systems).
- Funnel plots
- With <10 studies per subgroup, funnel plot interpretation is unreliable.
This caveat should be stated clearly.
- Combining proximal and distal hypospadias
- The justification (“direction of effect similar within each study”) is qualitative and insufficient.
- At minimum, a sensitivity analysis separating severities is recommended.
- Reproducibility
- If Supplemental Code 1 is essential for verifying analyses, it should be provided in a public repository (e.g., Zenodo, OSF, GitHub).
The manuscript would benefit from targeted revisions to improve clarity, precision, and readability. Several sections contain grammatical inconsistencies, truncated sentences, and ambiguous wording that may obscure key methodological details or weaken the interpretation of results. In addition, transitions between concepts—particularly in the Discussion—are sometimes abrupt, and some sentences are overly long or complex, reducing overall clarity. Improving the English will help ensure that the scientific message is communicated accurately and effectively.
Author Response
Reviewer 1:
Major Comments
Comment 1: Critical inconsistencies in study selection (PRISMA)
There are conflicting numbers across the manuscript:
- Figure 1 indicates 28 full texts reviewed and 11 studies included.
- Section 3.1 states 38 full texts reviewed and 13 studies included.
- Table 1 lists 13 studies.
These discrepancies undermine the transparency and reproducibility of the review process.
The entire PRISMA flow, inclusion/exclusion accounting, and study counts must be corrected and made fully consistent.
Response 1: Thank you for catching this. We have uploaded the corrected Figure 1 to resolve the discrepancies present in the paper.
Comment 2: Literature search and methods description are incomplete and partially truncated
Section 2.1 begins with a clearly truncated sentence (“Table 1979. through 2025…”), suggesting an editing error. The search strategy lacks:
- complete Boolean strings,
- use of MeSH/keywords,
- exact filters applied,
- full list of databases searched.
As written, the search cannot be replicated, which is unacceptable for a systematic review.
Response 2: We believe this was an editing issue when reformatting to meet journal requirements. We have fixed the wording, which improves the clarity of the search strategy we used.
Comment 3: Potential statistical errors in the calculation of effect sizes
There is a serious risk of miscalculation in the meta-analysis:
- Authors state that SD was converted into SE for analysis.
- The metagen function typically requires SD, not SE, unless specified otherwise.
- The extremely large SMD for protein expression (SMD = 7.02; CI −7.74 to 21.78) suggests:
- scale inconsistencies,
- use of non-comparable semi-quantitative measures,
- or incorrect use of SE in place of SD.
This issue requires full clarification, including:
- explicit R code used,
- confirmation of whether SD or SE was passed to metagen,
- re-analysis if necessary.
Given this uncertainty, the current pooled protein result cannot be considered reliable.
Response 3: We used the standard deviation to calculate Cohen’s d and used the SE from each study to conduct the meta-analysis. metagen and other meta-analysis tools typically require standard error estimates. The Arguments section of ?metagen states:
- TE: Estimate of treatment effect, e.g., log hazard ratio or risk difference or an R object created with pairwise.
- seTE: Standard error of treatment estimate or standard deviation of n-of-1 trials.
Also, several tutorials state that the standard error is required for meta-analysis calculations:
https://lukaswallrich.github.io/GoldCoreQuants/meta-analyses-a-very-brief-introduction.html
https://doing-meta.guide/pooling-es
We agree that the current RNA and protein results cannot be considered reliable. We believe this reflects a mixture of inconsistent study design and the large amount of heterogeneity that exists within the patient populations. This further underscores the need for more and better research. We hope this manuscript will draw attention to these necessary details.
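The distinction between SD and SE in this exchange can be illustrated with a short Python sketch. It assumes the common pooled-SD form of Cohen's d and the usual large-sample approximation for its variance; it is not the authors' Supplemental Code 1, and the function name and input values are hypothetical.

```python
import math

def cohens_d(mean_case, sd_case, n_case, mean_ctrl, sd_ctrl, n_ctrl):
    """Return (d, se_d) for two independent groups.

    The group SDs enter only through the pooled SD used to scale the
    mean difference. The standard error of d is a separate quantity
    derived from the sample sizes and d itself.
    """
    pooled_var = ((n_case - 1) * sd_case**2 + (n_ctrl - 1) * sd_ctrl**2) / (
        n_case + n_ctrl - 2
    )
    d = (mean_case - mean_ctrl) / math.sqrt(pooled_var)
    # Large-sample approximation to the variance of d
    var_d = (n_case + n_ctrl) / (n_case * n_ctrl) + d**2 / (2 * (n_case + n_ctrl))
    return d, math.sqrt(var_d)

# Hypothetical study: 10 cases and 10 controls, both with SD = 1.0
d, se_d = cohens_d(2.0, 1.0, 10, 1.0, 1.0, 10)
print(d, se_d)
```

In this sketch the pair (d, se_d) corresponds to what would be supplied as TE and seTE. Supplying a group SD where se_d belongs would distort the inverse-variance weights, which is the error the reviewer asked the authors to rule out.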
Comment 4: Extreme heterogeneity (I² ~ 100%) not appropriately addressed
Both protein and mRNA meta-analyses show near-maximum heterogeneity (I² = 100% and 99.7%).
In such conditions:
- a pooled random-effects estimate may be statistically meaningless,
- and may obscure true clinical or methodological sources of variation.
Yet, the authors:
- did not perform subgroup analyses (e.g., proximal vs distal, tissue type, assay method),
- did not conduct meta-regression beyond a simple age analysis,
- did not explore the impact of outliers.
A comprehensive heterogeneity exploration is required; otherwise, the manuscript should acknowledge that a quantitative synthesis may not be appropriate for these data.
Response 4: We agree the heterogeneity is very high in this meta-analysis. This is likely due to the near 50:50 split of studies that report elevated vs diminished AR expression. When the direction of effect is uncertain, heterogeneity will be very high. We believe that the heterogeneity is high in this study due to 1) the diversity of analytical methods used for protein analysis, and 2) the large heterogeneity that exists in patient populations. We do not believe this diminishes the usefulness of a meta-analysis.
To investigate technical contributors to heterogeneity, we included an outlier test in our analysis. In the protein data we found Rai et al. and Kahana et al. to be outliers. After removing these outliers, heterogeneity dropped to 96.8%, the p-value became 0.0417, and the SMD was 3.126 (95% CI [0.18 to 6.25]), supporting a slight increase in AR protein within hypospadias patients (Supplemental Figure 1).
In the mRNA data, Qiao et al. was identified as an outlier. After removing Qiao et al., the SMD became -0.281 (95% CI [-2.44 to 1.96]) with a p-value of 0.7741 and an I² of 65.1%, still demonstrating uncertainty in AR expression changes in hypospadias cases.
To identify whether hypospadias severity explains some of the study heterogeneity, we performed meta-regressions by hypospadias severity (distal, midshaft, and proximal) separately for protein and mRNA outcomes. No association was observed between severity and standardized mean difference for mRNA (distal p = 0.3653, midshaft p = 0.3615, proximal p = 0.8321) or for protein (distal p = 0.8793, proximal p = 0.7205; midshaft data were insufficient across protein studies). These results indicate that, within the limits of available reporting, severity did not explain the between-study heterogeneity in AR expression.
This information has been added to respective sections in the results and the resulting figures added to supplemental figure 3.
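A simple way to see how a single study can drive both the pooled estimate and I² is a leave-one-out analysis, sketched below in Python. This is a related but simpler technique than a formal outlier test and is not necessarily the procedure the authors ran. It assumes DerSimonian-Laird pooling, and the study labels and values are hypothetical.

```python
def pool(effects, ses):
    """DerSimonian-Laird pooled estimate and I² (percent); needs >= 2 studies."""
    w = [1.0 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, i2

def leave_one_out(labels, effects, ses):
    """Re-pool the data once per study, each time omitting that study."""
    results = {}
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_ses = ses[:i] + ses[i + 1:]
        results[label] = pool(rest_effects, rest_ses)
    return results

# Hypothetical data: three concordant studies and one extreme study
labels = ["Study A", "Study B", "Study C", "Study D"]
effects = [0.5, 0.6, 0.4, 9.0]
ses = [0.3, 0.3, 0.3, 0.3]

full = pool(effects, ses)
loo = leave_one_out(labels, effects, ses)
print("all studies:", full)
for label, (estimate, i2) in loo.items():
    print("omitting", label, "->", round(estimate, 3), round(i2, 1))
```

In this hypothetical input, omitting the extreme study brings I² down to zero, while omitting any other study leaves it high. Reporting such a table alongside the outlier-removed estimates would show readers how much each conclusion depends on individual studies.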
Comment 5: Overstatement of findings, especially in the Abstract and Conclusion
The abstract states that AR expression was “neither up nor down regulated” in hypospadias.
This phrasing implies biological equivalence, which is not supported by:
- wide confidence intervals,
- high heterogeneity,
- inconsistent study-level directions of effect,
- small sample sizes.
A more accurate interpretation would be:
“No consistent direction of effect could be demonstrated due to substantial heterogeneity and imprecision.”
The current wording risks misinforming readers and should be revised.
Response 5: Thank you for pointing this out. We agree that the results need to be stated more clearly so that readers are not misled. We have made the recommended changes in the abstract and conclusion. The following was changed in the abstract:
OLD: In both mRNA and protein assays, androgen receptor expression was neither up nor down regulated, suggesting that hypospadias etiology is more complicated than just the sole expression of androgen receptor.
NEW: Due to substantial heterogeneity and imprecision in both mRNA and protein assays no consistent direction of androgen receptor expression could be demonstrated, suggesting that hypospadias etiology may be more complicated than just the sole expression of androgen receptor.
The following was changed in the beginning of the discussion:
OLD: Despite many investigations into the role of androgen receptor signaling in hypospadias, findings on AR expression in hypospadias patients are inconsistent, as seen in our results. Approximately half the studies reported higher AR expression and the other half lower, and pooled effects of both protein and mRNA results were not statistically significant. These observations are consistent with prior work, which suggests that the AR-pathway disturbance is multifactorial, arising from genetic, epigenetic, and environmental mechanisms [17, 47, 48]. Thus, depending on the underlying etiology, AR expression levels can either be elevated, diminished or unchanged.
NEW: Across 13 human foreskin cohorts quantifying AR mRNA or protein, we found pooled effects were not statistically significant and with very high heterogeneity. These observations indicate that AR abundance is not a reliable, direction consistent marker of hypospadias. Variation in age-range, assay, and quantified variable did not explain the substantial variability between studies. This could further support that AR-pathway disturbance in hypospadias is multifactorial, arising from genetic, epigenetic, and environmental mechanisms [17, 47, 48].
The following was changed in the conclusion:
OLD: In this systematic review and meta-analysis of human hypospadias studies, we found no uniform difference in androgen-receptor expression between hypospadias patients and controls, with bidirectional and highly heterogeneous results.
NEW: In this systematic review and meta-analysis of human hypospadias studies, we found no consistent direction of effect in androgen-receptor expression between hypospadias patients and controls, likely due to imprecision and highly heterogeneous results.
Comment 6: Disconnect between mechanistic discussion and actual findings
The Discussion provides an extensive narrative review of androgen biology, epigenetics, and environmental toxicology.
Although informative, much of it is:
- weakly connected to the results,
- based on extrapolations from prostate cancer or animal models,
- not directly testable within the included data.
A tighter, more focused discussion is recommended, emphasizing:
- what the meta-analysis can conclude,
- what it cannot resolve,
- and the limitations of the underlying evidence.
Response 6: The following was added at the end of section 4.1: These mechanisms help interpret the increased AR expression in our study findings but are not directly testable within our dataset.
At the end of section 4.2 we added: These observations are consistent with lower AR expression in some contexts, but our meta-analysis cannot resolve specific causality.
Comment 7: Limitations section is insufficient
The meta-analysis faces multiple major limitations—not just “small number of studies.”
The authors should explicitly state:
- inconsistency in study selection (once corrected),
- heterogeneity near 100%,
- reliance on semi-quantitative assays,
- potential extraction error via WebPlotDigitizer,
- mixing of tissue types and developmental stages,
- inability to adjust for clinical confounders.
The current limitations section is too brief and underrepresents these issues.
Response 7: To address these points, we have greatly expanded this section, incorporating your suggestions. Below is the added section:
4.6 Strengths and Limitations
This review has several strengths. We conducted a comprehensive, PRISMA-guided search with data extraction across four decades of literature, applying prespecified inclusion and exclusion criteria. Heterogeneous assays were harmonized using standardized mean differences, and random-effects models were used with full reporting of heterogeneity and small-study assessments. When extracting data from studies with figures, we performed calibrated graphical extraction to minimize error while maintaining transparency. The analysis supports a clear conclusion that there is no uniform shift in AR expression among these studies, and we translate these findings into recommendations to improve future studies.
This study also has important limitations, which mostly reflect the quality and consistency of the underlying studies rather than the meta-analysis approach. Most cohorts were small and single-center, with broad age ranges that often spanned mini-puberty, a period of dynamic AR regulation. Despite the age-matching of the controls, the heterogeneity in developmental timing could have obscured true effects. In terms of data collection, the studies had non-standardized IHC scoring, incomplete variance reporting, and frequent reliance on figure-only data, which required graphical extraction and could introduce errors even with calibration. The funnel plot asymmetry in protein studies showed small-study effects or publication bias, and the mRNA funnel plot, while more symmetric, still had low power. Reporting of key modifiers, including anatomic site, tissue compartment, hypospadias severity, and pre-operative hormone exposure, was sparse and inconsistent. Assay heterogeneity and different quantification methods further increased the variance. Because most of the studies reported measures from bulk tissue, potential compartmental dilution could not be addressed in our analysis. These limitations define the minimal reporting and design standards needed for future studies capable of resolving AR pathway involvement in hypospadias pathogenesis.
Minor Comments
- Formatting and editorial issues
- The manuscript header contains outdated formatting (“Int. J. Mol. Sci 2022, 23, x”), inconsistent with 2025 citations.
We believe this was added in by the journal when submitted. We have fixed it.
- Several references show formatting irregularities (e.g., spacing, author ordering).
We apologize for this. We believe this was due to reformatting when submitting to the journal.
- “will are found” in the Data Availability statement is ungrammatical.
We have corrected this grammatical error.
- Table 1 ambiguities
See previous response. These ambiguities have been addressed.
- Danurdoro 2023 is labeled as “DNA” rather than mRNA or protein. Please clarify what this means in terms of AR quantification.
This was entered incorrectly; we have fixed this issue.
- Provide a supplementary table detailing analytical methods (IHC, qPCR, Western blot, scoring systems).
We have now detailed the techniques and methods used to analyze the protein data in supplemental table 1.
- Funnel plots. With <10 studies per subgroup, funnel plot interpretation is unreliable. This caveat should be stated clearly.
Yes, we agree with this statement. We had a statement in the figure legends and have also added the following to the results section:
Being that funnel plots with a sample size <10 can be unreliable; these data should be interpreted cautiously
- Combining proximal and distal hypospadias
- The justification (“direction of effect similar within each study”) is qualitative and insufficient.
- At minimum, a sensitivity analysis separating severities is recommended.
We performed meta-regressions by hypospadias severity (distal, midshaft, and proximal) where indicated, separately for protein and mRNA outcomes. No association was observed between severity and standardized mean difference for mRNA (distal p = 0.3653, midshaft p = 0.3615, proximal p = 0.8321) or for protein (distal p = 0.8793, proximal p = 0.7205, midshaft data were insufficient across protein studies). These results indicate that within the limits of available reporting, severity did not explain the between-study heterogeneity in AR expression.
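For readers unfamiliar with how such a severity test is formed, the following Python sketch regresses study-level effect sizes on one moderator using inverse-variance weights and a normal-approximation z-test. It assumes a fixed-effect meta-regression for brevity, which ignores between-study variance; the model the authors fitted is not specified here and may differ. The moderator coding (for example, the proportion of proximal cases per study) and all values are hypothetical.

```python
import math

def meta_regression(effects, ses, moderator):
    """Inverse-variance weighted regression of effect size on one moderator.

    Returns (intercept, slope, se_slope, p_value), where p_value is the
    two-sided normal-approximation p-value for the slope.
    """
    w = [1.0 / se**2 for se in ses]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, moderator))
    swy = sum(wi * yi for wi, yi in zip(w, effects))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, moderator))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, moderator, effects))
    denom = sw * swxx - swx**2  # zero if the moderator does not vary
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    se_slope = math.sqrt(sw / denom)  # valid when weights are 1/variance
    z = slope / se_slope
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return intercept, slope, se_slope, p_value

# Hypothetical data: effect size rises by one unit per unit of the moderator
intercept, slope, se_slope, p_value = meta_regression(
    [1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [0.0, 1.0, 2.0]
)
print(intercept, slope, se_slope, p_value)
```

With only a handful of studies per outcome, a non-significant slope in such a model indicates limited power as much as absence of an association, so p-values of this kind are best read alongside the number of studies contributing to each severity category.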
- Reproducibility
- If Supplemental Code 1 is essential for verifying analyses, it should be provided in a public repository (e.g., Zenodo, OSF, GitHub).
Supplemental code 1 was uploaded to GitHub at https://github.com/amatociro/AR-and-hypospadias-meta-analysis-code/blob/main/Supplemental%20Code%201
Reviewer 2 Report
Comments and Suggestions for Authors
In this review, Ko S et al. conducted a quantitative meta-analysis comparing androgen receptor expression in hypospadias patients and healthy boys, with a mini review summarizing the various mechanisms. The manuscript is well written, and I feel it is almost suitable for publication.
If possible, in Figure 5, it would be better to draw regulators of AR expression such as HOXA13 as they mentioned.
Author Response
Reviewer 2:
In this review, Ko S et al. conducted a quantitative meta-analysis comparing androgen receptor expression in hypospadias patients and healthy boys, with a mini review summarizing the various mechanisms. The manuscript is well written, and I feel it is almost suitable for publication.
Comment 1: If possible, in Figure 5, it would be better to draw regulators of AR expression such as HOXA13 as they mentioned.
Response 1: Thank you for this comment. We have added the regulators of AR to the figure.
