Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Reviewer Comments

This manuscript presents a computational framework suggesting that RNA-binding protein (RBP) occupancy patterns derived from eCLIP data can predict lncRNA subcellular localization. The topic is potentially interesting because subcellular localization is a key determinant of lncRNA function. However, in its current form, the study remains largely descriptive and several conclusions are overstated relative to the evidence provided. Significant revisions are required before the manuscript can be considered for publication.

Major Comments

Novelty is overstated.

The central claim of the manuscript is the existence of an RBP occupancy “localization code”. However, the presented analyses demonstrate a statistical association rather than a biological code. The reported predictive performance is only moderate (AUC 0.71–0.77, R² approximately 0.25), indicating that the majority of localization variance remains unexplained. Throughout the manuscript, terms such as “code”, “predicts”, and “mechanistically interpretable” imply a stronger biological relationship than is supported by the data. The authors should substantially moderate their language and clearly distinguish correlation from mechanism.

Lack of independent biological validation.

All analyses rely on publicly available ENCODE datasets and computational modeling. No experimental validation is provided.

For example, the manuscript identifies several RBPs (e.g., BUD13, FUS, DDX3X, IGF2BP1) as major contributors to localization prediction. However, no perturbation experiments, CRISPR knockdown studies, CLIP validation, or localization assays are performed to demonstrate that these factors causally regulate lncRNA localization.

Without experimental validation, it is difficult to assess whether the identified RBPs represent true biological determinants or merely correlate with other transcriptomic features.

Cell-line generalizability remains weakly supported.

The manuscript repeatedly emphasizes cross-cell-line reproducibility. In reality, only two ENCODE cell lines (K562 and HepG2) are analyzed. A transfer AUC of 0.71 and coefficient sign agreement of 72% indicate only partial conservation.

Given the known cell-type specificity of RBP expression, localization mechanisms, and transcriptome architecture, it is premature to claim broad generalizability. At minimum, the Discussion should explicitly acknowledge that conclusions are currently restricted to the limited cellular contexts examined.

Potential confounding factors remain insufficiently addressed.

The occupancy matrix is generated from binary peak overlap across entire gene loci, including intronic regions. Consequently, occupancy may be strongly influenced by:

gene length,
transcript abundance,
transcriptional activity,
intron content,
splicing complexity.

Although length controls are included, these factors are highly correlated and difficult to disentangle.

Previous studies have shown that gene architecture itself contributes substantially to lncRNA localization. The manuscript does not convincingly demonstrate that RBP occupancy provides information independent of these underlying transcript features.

Biological interpretation is selective.

The manuscript highlights examples consistent with existing knowledge (e.g., spliceosomal factors associated with nuclear localization and translation-associated factors associated with cytoplasmic localization). However, no systematic enrichment analysis is presented.

It remains unclear whether the observed coefficient distribution truly reflects compartment-specific biology or whether the authors selected representative examples that support the proposed model.

A formal pathway enrichment or functional category analysis of positively and negatively weighted RBPs would substantially strengthen the biological interpretation.

Missing discussion of dynamic regulation.

Subcellular localization is highly dynamic and context-dependent. Localization can vary with cell state, stress response, differentiation, infection, and disease progression.

Recent studies have highlighted that RNA-binding proteins can regulate transcriptional programs, RNA processing, apoptosis, EMT, and cellular adaptation in highly context-dependent manners. For example, RBMX2 has been shown to regulate intron retention and epithelial apoptosis during Mycobacterium infection and was later linked to EMT-associated transcriptional reprogramming and tumor progression (Wang et al., Frontiers in Immunology, 2024; Wang et al., eLife, 2025). These observations illustrate that RBP-associated regulatory functions are highly dynamic and may not be fully captured by static occupancy measurements alone.

The authors should discuss this important limitation.

Minor Comments

The rationale for defining occupancy at the entire gene-locus level rather than exon-level resolution requires further justification.
Figure 3 highlights top coefficients but does not provide confidence intervals or stability estimates.
The manuscript should report calibration metrics in addition to AUC.
The term “localization code” should be reconsidered throughout the manuscript.
Several statements in the Abstract and Discussion imply causal relationships that are not demonstrated by the presented analyses.

Overall Assessment

The study provides an interesting reanalysis of publicly available datasets and suggests that RBP occupancy contains information relevant to lncRNA localization. However, the current manuscript remains primarily correlative, lacks independent validation, and overstates both novelty and biological interpretation. Substantial revision and a more cautious presentation of the findings are necessary before publication.

Author Response

Author’s Reply to the Review Report — Reviewer 1

Manuscript ijms-4387146 (Revision 1)
Title (revised): RNA-Binding Protein Occupancy Composition Predicts Long Noncoding RNA Subcellular Localization
Author: Hidenori Tani

Dear Reviewer,

Thank you for the constructive and detailed evaluation. I have revised the manuscript comprehensively. The most substantial changes in response to your report are: the central claim has been reframed from a biological “code” to an interpretable, moderate-strength statistical association (including a new title); and four new analyses were added that directly answer your requests — a confound-independence analysis, a systematic functional-category enrichment, bootstrap confidence intervals for the coefficients, and calibration metrics. All reported numbers are regenerated by the analysis scripts and are locked by a test suite (tests/test_revision_analyses.py, 11 tests). Changes are highlighted in the revised manuscript. A point-by-point response follows.

General comment (descriptive; conclusions overstated relative to evidence). I agree and have moderated the framing throughout. The thesis is now stated as a moderate-strength, correlational association rather than a biological “code,” and the new analyses below provide the systematic and quantitative support that was missing.

Major 1 — Novelty/“code” overstated; distinguish correlation from mechanism. Addressed. (i) The word “code” has been removed from the title and the manuscript and replaced by “signed RBP-occupancy profile” / “occupancy composition.” (ii) A dedicated paragraph in the Discussion now states explicitly that the performance is moderate (AUC 0.71–0.77), that a substantial fraction of variance is unexplained, and that the analysis is correlational and does not establish causal control. (iii) “Mechanistically interpretable” has been changed to “biologically interpretable,” and causal phrasings (e.g., “encodes”) have been removed. (Abstract; Introduction, final paragraph; Discussion, paragraphs 1–2.)

Major 2 — No independent experimental validation. I acknowledge this directly. The study is explicitly framed as a computational re-analysis. A new Discussion paragraph (“No independent experimental validation”) states that no perturbation, knockdown, orthogonal CLIP or localization assays were performed, that the contributing RBPs should be regarded as statistically informative correlates rather than validated causal regulators, and that perturbation experiments knocking down individual nuclear- or cytoplasmic-predictive RBPs would be required to establish causal control. This is stated as the primary limitation and an explicit direction for future work, while the well-calibrated, reproducible, function-coherent profile is offered as a principled way to nominate such candidates.

Major 3 — Cell-line generalizability weakly supported (only two cell lines; partial conservation). Addressed in the Discussion (“Limitations,” third point): the conclusions are now stated to be established only for the two cellular contexts examined (K562, HepG2), the 72% sign-agreement is explicitly described as partial rather than complete conservation, and the cell-type specificity of RBP expression and localization mechanisms is acknowledged as the reason that broader generalizability requires eCLIP and matched fractionation in additional cell types.
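
For readers less familiar with the two quantities discussed in this point, the following minimal Python sketch shows how a coefficient sign agreement and a transfer AUC could be computed. It is an illustration under stated assumptions, not the code used for the manuscript: the RBP names, coefficient values, labels and scores are invented toy data, and positive coefficients are assumed to indicate the nuclear direction.

```python
def sign_agreement(coef_a, coef_b):
    """Fraction of RBPs present in both models whose coefficients share a sign."""
    shared = [r for r in coef_a if r in coef_b and coef_a[r] != 0 and coef_b[r] != 0]
    same = sum(1 for r in shared if (coef_a[r] > 0) == (coef_b[r] > 0))
    return same / len(shared)


def auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients from models fitted separately in two cell lines.
coef_k562 = {"RBP_A": 0.8, "RBP_B": -0.5, "RBP_C": 0.3, "RBP_D": -0.2, "RBP_E": 0.1}
coef_hepg2 = {"RBP_A": 0.6, "RBP_B": -0.4, "RBP_C": -0.1, "RBP_D": -0.3, "RBP_F": 0.2}
agreement = sign_agreement(coef_k562, coef_hepg2)

# Hypothetical transfer setting: scores from a model trained in one cell line,
# evaluated against localization labels (1 = nuclear) measured in the other.
labels_target = [1, 1, 1, 0, 0, 0]
scores_from_source_model = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
transfer_auc = auc(labels_target, scores_from_source_model)

print(f"sign agreement = {agreement:.2f}, transfer AUC = {transfer_auc:.3f}")
```

Only RBPs profiled in both cell lines enter the sign comparison, which is why the agreement is a statement about the shared subset rather than about either model as a whole.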

Major 4 — Confounding factors (gene length, abundance, transcriptional activity, intron content, splicing complexity). Addressed with a new analysis (Section 2.2; Methods 4.3; Figure 1d–f; scripts/loc_confound.py). The baseline was expanded to absorb transcript abundance (total cytosolic + nuclear expression), intron fraction, exon number and transcript number, in addition to length and binding amount. Each confound is itself correlated with localization (Pearson r = 0.20–0.27) and with the number of bound RBPs (r = 0.34–0.57), and including them raised the baseline cross-validated R-squared from 0.08 to 0.17. Nonetheless, RBP-occupancy composition still added a significant increment over this expanded baseline (delta-R-squared = 0.12; AUC = 0.77; Freedman–Lane p = 0.005 in K562; delta-R-squared = 0.14, AUC = 0.82 in HepG2). RBP composition therefore carries information that is not reducible to these transcript features.

Major 5 — Biological interpretation selective; no systematic enrichment. Addressed with a new systematic analysis (Section 2.3; Methods 4.7; Figure 2b; scripts/loc_enrichment.py). Every RBP in the model was assigned a priori to a nuclear-process or cytoplasmic-process functional super-class from its canonical GO biological-process / UniProt role, independently of its fitted coefficient (the full assignment is provided as Supplementary Table S1; multifunctional shuttling factors were left unclassified and excluded). Across all classified RBPs, nuclear-process factors carried significantly more nuclear-direction coefficients than cytoplasmic-process factors (Mann–Whitney p = 0.013 in K562, rank-biserial 0.28; p = 0.005 in HepG2, 0.40), and nuclear-process factors were enriched among the most nuclear-predictive RBPs (Fisher odds ratio = 3.0 in K562). The compartment coherence is therefore a systematic property of the full coefficient distribution, not a selection of examples.
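
The two effect sizes named in this point can be illustrated with a short sketch. It is not the code in scripts/loc_enrichment.py: the class assignments and coefficients are invented, positive coefficients are assumed to mean nuclear direction, and "nuclear-predictive" is simplified here to "coefficient above zero", which may differ from the threshold used in the manuscript. P-values are omitted; in practice they could be obtained from scipy.stats.mannwhitneyu and scipy.stats.fisher_exact.

```python
def rank_biserial(group_a, group_b):
    """Rank-biserial correlation derived from the Mann-Whitney U statistic.

    U counts the pairs in which a value from group_a exceeds one from group_b
    (ties count 0.5); the result ranges from -1 to 1.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in group_a for b in group_b)
    return 2.0 * u / (len(group_a) * len(group_b)) - 1.0


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]; infinite if b*c is 0."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Hypothetical a priori functional classes and fitted coefficients.
functional_class = {
    "RBP_A": "nuclear", "RBP_B": "nuclear", "RBP_C": "nuclear", "RBP_D": "nuclear",
    "RBP_E": "cytoplasmic", "RBP_F": "cytoplasmic",
    "RBP_G": "cytoplasmic", "RBP_H": "cytoplasmic",
}
coefficient = {
    "RBP_A": 0.8, "RBP_B": 0.5, "RBP_C": 0.3, "RBP_D": -0.1,
    "RBP_E": -0.6, "RBP_F": -0.4, "RBP_G": 0.2, "RBP_H": -0.2,
}

nuclear = [coefficient[r] for r, c in functional_class.items() if c == "nuclear"]
cytoplasmic = [coefficient[r] for r, c in functional_class.items() if c == "cytoplasmic"]
effect_size = rank_biserial(nuclear, cytoplasmic)

# 2x2 table: functional class (rows) by coefficient direction (columns).
nuc_pos = sum(1 for v in nuclear if v > 0)
nuc_neg = len(nuclear) - nuc_pos
cyt_pos = sum(1 for v in cytoplasmic if v > 0)
cyt_neg = len(cytoplasmic) - cyt_pos
enrichment = odds_ratio(nuc_pos, nuc_neg, cyt_pos, cyt_neg)

print(f"rank-biserial = {effect_size:.3f}, odds ratio = {enrichment:.1f}")
```

Because the class labels are fixed before the coefficients are inspected, the comparison uses every classified RBP rather than a hand-picked subset.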

Major 6 — Dynamic regulation not discussed (e.g., RBMX2; Wang et al., 2024, 2025). Addressed in the Discussion (“Limitations,” fourth point). A new passage states that subcellular localization and RBP function are dynamic and context-dependent, that a static, steady-state occupancy map of two cell lines cannot capture such dynamics, and that the profile should be read as a steady-state average. I have critically evaluated and cited the two recommended references on RBMX2, which I judge to be appropriate illustrations of context-dependent RBP regulatory function (now references [15] and [16]).

Minor 1 — Rationale for gene-locus vs exon-level occupancy. The rationale is now stated explicitly in Methods 4.2: locus-level occupancy was chosen deliberately because many RBPs (notably the spliceosomal and co-transcriptional factors that dominate the nuclear signature) bind intronic and nascent-transcript sequence that exon-level occupancy would discard.
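
The practical difference between the two definitions can be shown with a small sketch. It is illustrative only: the coordinates, gene structure and RBP names are hypothetical, and chromosome and strand matching are ignored for brevity.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def occupancy(peaks, intervals):
    """Binary occupancy: 1 if any peak overlaps any interval, otherwise 0."""
    return int(any(overlaps(ps, pe, s, e)
                   for ps, pe in peaks for s, e in intervals))


# Hypothetical lncRNA gene with three exons and two introns.
gene = {
    "locus": (1000, 9000),
    "exons": [(1000, 1500), (4000, 4400), (8500, 9000)],
}

# Hypothetical eCLIP peaks for three RBPs.
peaks_by_rbp = {
    "RBP_A": [(2500, 2560)],    # falls in the first intron
    "RBP_B": [(4100, 4150)],    # falls in the second exon
    "RBP_C": [(12000, 12050)],  # outside the gene
}

locus_row = {rbp: occupancy(p, [gene["locus"]]) for rbp, p in peaks_by_rbp.items()}
exon_row = {rbp: occupancy(p, gene["exons"]) for rbp, p in peaks_by_rbp.items()}

print("locus-level:", locus_row)
print("exon-level: ", exon_row)
```

In this toy case the intronic peak of RBP_A is counted at locus level but lost at exon level, which is the behaviour the rationale above refers to.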

Minor 2 — Figure 3 lacks confidence intervals / stability estimates. Addressed with a new gene-level bootstrap (1,000 replicates; Methods 4.6; scripts/loc_coef_stability.py). The revised coefficient figure (Figure 2a) now shows 95% bootstrap confidence intervals as error bars, with RBPs whose interval excludes zero shown in bold. 27 of 139 K562 coefficients (29 of 105 in HepG2) are individually stable, and the median sign-consistency is 0.86.
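
A gene-level percentile bootstrap of this kind can be sketched as below. The sketch is not scripts/loc_coef_stability.py: to stay self-contained it replaces the fitted model coefficient with a simple stand-in statistic (mean localization score of bound genes minus that of unbound genes), and the data are invented. In the actual analysis the full model would presumably be refitted on each resampled gene set.

```python
import random
import statistics


def effect(genes):
    """Stand-in coefficient: mean score of bound genes minus mean of unbound genes."""
    bound = [score for is_bound, score in genes if is_bound == 1]
    unbound = [score for is_bound, score in genes if is_bound == 0]
    if not bound or not unbound:
        return None  # this replicate lost one group and is skipped
    return statistics.mean(bound) - statistics.mean(unbound)


def bootstrap_ci(genes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval and sign consistency from gene resampling."""
    rng = random.Random(seed)
    point = effect(genes)
    replicates = []
    for _ in range(n_boot):
        sample = [rng.choice(genes) for _ in genes]  # resample genes with replacement
        estimate = effect(sample)
        if estimate is not None:
            replicates.append(estimate)
    replicates.sort()
    last = len(replicates) - 1
    lower = replicates[int((alpha / 2) * last)]
    upper = replicates[int((1 - alpha / 2) * last)]
    consistency = sum(1 for r in replicates if r * point > 0) / len(replicates)
    return point, lower, upper, consistency


# Hypothetical genes: (bound by the RBP, nuclear-versus-cytoplasmic score).
genes = [(1, 1.0), (1, 1.2), (1, 0.8), (1, 1.1), (1, 0.9), (1, 1.3),
         (0, -1.0), (0, -0.8), (0, -1.2), (0, -0.9), (0, -1.1), (0, -0.7)]

point, lower, upper, consistency = bootstrap_ci(genes)
stable = lower > 0 or upper < 0  # interval excludes zero
print(f"estimate = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}], "
      f"sign consistency = {consistency:.2f}, stable = {stable}")
```

A coefficient is called individually stable here when its interval excludes zero, matching the criterion described for the bold entries in the revised figure.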

Minor 3 — Report calibration metrics in addition to AUC. Addressed (Section 2.2; Methods 4.5; Figure 3c; scripts/loc_calibration.py). Out-of-fold predicted probabilities give a Brier score of 0.10 (K562) and 0.13 (HepG2), both better than a prevalence-only reference, with an expected calibration error of 0.02 in both cell lines; reliability curves are shown in Figure 3c.
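
For reference, the two calibration metrics can be computed as in the following sketch. It is not scripts/loc_calibration.py: the labels and probabilities are toy values, and the use of ten equal-width bins for the expected calibration error is an assumption, since the binning scheme is not stated here.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def prevalence_reference_brier(y_true):
    """Brier score of a reference model that always predicts the class prevalence."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Bin-size-weighted mean gap between observed frequency and mean prediction."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / total * abs(observed - predicted)
    return ece


# Toy out-of-fold predictions (1 = nuclear), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_pred = [0.95, 0.85, 0.75, 0.35, 0.25, 0.15, 0.05, 0.05]

brier = brier_score(y_true, p_pred)
reference = prevalence_reference_brier(y_true)
ece = expected_calibration_error(y_true, p_pred)
print(f"Brier = {brier:.4f} (prevalence-only reference {reference:.4f}), ECE = {ece:.4f}")
```

A Brier score below the prevalence-only reference indicates that the predicted probabilities carry information beyond the class balance, which is the comparison reported above.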

Minor 4 — Reconsider “localization code” throughout. Done; the term has been removed throughout (see Major 1).

Minor 5 — Abstract/Discussion imply causal relationships. Done; causal implications have been removed from the Abstract and Discussion, and the correlational nature of the analysis is stated explicitly.

Thank you again for the careful review. I believe the new analyses and the more cautious presentation have substantially strengthened the work.

Sincerely,
Hidenori Tani
Department of Health Pharmacy, Yokohama University of Pharmacy, Yokohama, Kanagawa, Japan
hidenori.tani@yok.hamayaku.ac.jp

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear author,

The standard of the English language used throughout all manuscript sections requires improvement to ensure full reader understanding. Further thought is also suggested to improve the quality of presentation of the included Figures. In summary, the content is of interest, but more care is required to reach an acceptable standard of data presentation, data reporting, and data interpretation. Section-by-section comments are:

Abstract:

do not use "we" or "I" in scientific writing; be more factual in descriptions. You cannot use "we" as you are a single author (and other sections suffer from this issue too).
do not italicise or bold text unless you are conforming to field / journal conventions, which you are not here for IJMS
provide full words in the Abstract to ensure reader understanding and to encourage expansion of the readership.
numerous instances of poor word choice and incorrect sentence structure; these are all identified on the annotated PDF
Keywords require further consideration. These are to be used to attract readers and should be listed in a hierarchical fashion

Introduction:

word choice, sentence format, and sentence structure issues occur throughout the Introduction. These are all identified on the annotated PDF; please address all of them.
reference standards must be improved: references are to be ordered correctly and must cite appropriate works
the sentence at lines 41-46 needs a lot of work in order to make sense to the reader
again, do not place text in italics or bold wherever you deem appropriate. You must conform to journal and field norms
the sentence across lines 64 to 68 needs improvement to allow for full reader understanding.
The Introduction is very brief and could benefit from the inclusion of additional background text.
Also ensure that major findings are described in full in the final paragraph of the Introduction

Results:

Do not use terms like "we" and "I" in scientific writing. Instead provide descriptive text
again, make sure the order of references cited is correct
terms are to be written in full on their first mention, with the abbreviation provided in brackets immediately after. This helps with full reader understanding, instead of the reader having to find the meaning of the abbreviations themselves.
many instances of poor word choice or incorrectly used words = all these are identified on the annotated PDF and must be corrected in revised manuscript.
many instances of poorly worded sentences, overly long sentences, or poorly formatted sentences = all are identified in the annotated PDF and must be addressed in the revised manuscript.
Figures 1 and 2 could be combined to produce a larger Figure which carries greater impact
Examples of RBPs listed in the text of the results do not match completely with those provided in the Figure 3 graphic - why? Consistency is required throughout the manuscript's text
Could Figures 3 and 4 be combined to create a larger Figure of greater impact?
Is there additional data which can be included / added to Figure 5? Currently this Figure does not carry enough weight to form a standalone Figure.
Please carefully construct all Figure titles so that they accurately describe the data being presented in each

Discussion:

quite brief and therefore could benefit from expanding out data interpretations.
Many instances of poorly selected words / incorrectly selected words / word misuse = all are identified in the annotated PDF and must be addressed in the revised manuscript.
many poorly structured, worded, formatted sentences in the Discussion = all are identified in the annotated PDF and must be addressed in the revised manuscript.

Materials and Methods:

please ensure all methods are fully detailed
present methods in the order they were used in the study / data reporting

Conclusion:

Adds little and can be completely replaced with a Conclusion of appropriate standard.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

needs a lot of improvement

Author Response

Author’s Reply to the Review Report — Reviewer 2

Manuscript ijms-4387146 (Revision 1)
Title (revised): RNA-Binding Protein Occupancy Composition Predicts Long Noncoding RNA Subcellular Localization
Author: Hidenori Tani

Dear Reviewer,

Thank you for the careful, line-level reading and for the annotated PDF. I have addressed all of the marked items and the section-by-section comments. The language has been thoroughly revised, the figures consolidated and strengthened, and the references reordered. Changes are highlighted in the revised manuscript. A point-by-point response follows; manuscript locations refer to the revised manuscript.

English / first-person / emphasis (Abstract and throughout). The first-person “we”/“our” has been removed entirely, consistent with single authorship; the text now uses descriptive, impersonal phrasing. Stray italic/bold emphasis (e.g., “composition,” “its,” “which,” “how many,” “combination”) has been removed. Abbreviations are now defined in full on first mention (e.g., enhanced crosslinking and immunoprecipitation (eCLIP); receiver-operating-characteristic area under the curve (AUC)).

Keywords. Revised and ordered from general to specific: long noncoding RNA; subcellular localization; RNA-binding protein; eCLIP; cross-validated prediction; nuclear retention.

Introduction (word choice; sentences at lines 41–46 and 64–68; brevity; references; major findings in final paragraph). The Introduction has been rewritten and expanded. The two flagged sentences (the determinants/gene-architecture sentence and the closing “Here we construct…” sentence) have been recomposed for clarity. Additional background on lncRNA localization determinants and RBP roles has been added, and the final paragraph now states the major findings in full, including the new confound, enrichment and calibration results. References are cited in the corrected order (see note below).

Results (first person; reference order; abbreviations; word choice; figures; text vs Figure 3 consistency). First-person phrasing has been removed and abbreviations are defined on first use. The text-versus-figure inconsistency in the RBP examples has been resolved: the RBPs named in the text are now exactly those shown in the corresponding figure, and, where stated, the bootstrap-stable ones are those highlighted in bold.

Figures (combine 1+2 and 3+4; strengthen Figure 5; figure titles). The figures were consolidated from five to three as suggested. New Figure 1 combines the resource and predictive-performance panels (former Figures 1 and 2) and adds the confound-control panels; new Figure 2 combines the signed-profile and cross-cell-line panels (former Figures 3 and 4) and adds bootstrap confidence intervals and the functional-category enrichment; new Figure 3 (formerly the under-weight Figure 5) is strengthened with calibration reliability curves and the expanded-baseline robustness bar, in addition to the imbalance-aware metrics. All figure captions have been rewritten to describe the data presented.

Discussion (expand; word choice; sentence structure). The Discussion has been substantially expanded with dedicated paragraphs on the correlational nature of the findings, the non-circular but process-linked interpretation, the relationship to sequence-based predictors, the absence of experimental validation, dynamic regulation, and a structured Limitations paragraph. The flagged long sentences have been rewritten.

Materials and Methods (full detail; present in order of use). Methods have been reorganized to follow the order of the study (4.1 Data Sources → 4.2 Occupancy Matrix → 4.3 Confound Covariates → 4.4 Predictive Model and Permutation Tests → 4.5 Calibration → 4.6 Interpretable Profile and Coefficient Stability → 4.7 Functional-Category Enrichment → 4.8 Cross-Cell-Line Validation → 4.9 Robustness and the Half-Life Comparison → 4.10 Software), and the new analyses are fully described.

Conclusion. The Conclusions section has been replaced with a concise, forward-looking statement of the contribution, its correlational and moderate-strength nature, and the experimental work required to establish causal control.

“(Results)” cross-pointer. The parenthetical “(Results)” pointer flagged in the Discussion has been removed and the sentence rephrased.

Reference ordering / format (and editorial note on references [1] and [6]). In response to the request that references be ordered correctly, all references have been renumbered in order of first citation. As a consequence, the previously numbered reference [1] (Van Nostrand et al., 2020, Nature) is now reference [9], and the previously numbered reference [6] (Statello et al., 2021, Nat. Rev. Mol. Cell Biol.) is now reference [1]; both entries follow the corrected MDPI reference format. Two references were added for the dynamic-regulation discussion (now [15] and [16]).

Thank you again for the detailed and constructive review, which has substantially improved the clarity and presentation of the manuscript.

Sincerely,
Hidenori Tani
Department of Health Pharmacy, Yokohama University of Pharmacy, Yokohama, Kanagawa, Japan
hidenori.tani@yok.hamayaku.ac.jp

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Reviewer 2 Report

Comments and Suggestions for Authors

Dear author,

Thank you kindly for actioning my requests stemming from review of your original submission.

Addressing all my comments/concerns/suggestions is greatly appreciated. I thank you for positive action on the revision front - such active response to the requests of a reviewer is highly refreshing! So thank you.

I do not have any remaining concerns with your study: a study which reports a set of novel and highly interesting results.
