Review Reports - Comparative Evaluation of hiPSC-Derived Brain Organoids as Platforms for Assessing Thyroid Hormone System Disrupting Chemicals

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In my opinion, the paper "Comparative evaluation of hiPSC-derived brain organoids as platforms for assessing thyroid hormone system-disrupting chemicals" by Valeria Fernandez Vallone and colleagues is very well-written and comprehensive. I consider it a valuable and highly interesting work. The paper presents a comparative analysis of two hiPSC-derived brain organoid models (cerebral organoids and NSC-derived organoids) as platforms for studying thyroid hormone system-disrupting chemicals (THSDCs). The authors combine functional (T3 metabolism), molecular (gene expression), and phenotypic (imaging analysis) approaches, providing a comprehensive strategy for assessing NAM-type models. The topic is timely and relevant, particularly in the context of the development of alternative methods to animal testing and the growing importance of developmental toxicology. The paper has the potential to contribute to the development of standards for organoids in the safety assessment of chemicals. However, despite its numerous strengths, the manuscript requires several methodological improvements, clarification of the interpretation of results, and organization of the narrative to meet publication standards.

Introduction section

Lines 48-49 "….. during sensitive developmental windows…." This part of the sentence raises some doubts. Is it just “….. during sensitive developmental windows…”? Or does it rather refer to a general principle relating to overall development and functioning?

Insufficient emphasis on the novelty of the work. Although the purpose is described, a clear sentence such as “To our knowledge, this is the first study that…” is missing.

Some sentences in the Introduction section are too long. I would suggest breaking them into shorter, more specific ones.

Materials and Methods section

Very high level of detail (reproducibility), which on the one hand is a positive feature of this description, but this section seems excessively long and overloaded with detail. In short, this section is very long and at times too technical for the main text.

Results section

This section is very well developed and described in detail. However, it constitutes a very large (volume-wise) part of the work, which may be difficult for many readers. Therefore, I would suggest emphasizing the key results more strongly ("The key finding of this experiment is..." "Importantly,..." "Notably,..."…)

Discussion section

This section provides a good link between the results and the work's hypothesis and places them firmly in the context of the literature. However, this section seems significantly too long and at times feels like a repetition of the Introduction and Results sections. It also feels as if the authors are re-describing the results instead of interpreting them. I would suggest highlighting the more important conclusions and using clear statements such as "Our findings demonstrate that...", "The most important implication is...".

Conclusion section

In this section, I would emphasize the novelty of the work more and give more prominence to the authors' contributions regarding what exactly constitutes the new contribution of the work and why the results are groundbreaking.

Author Response

Reviewer 1

Comments and Suggestions for Authors

We sincerely thank the reviewer for the positive and thoughtful assessment of our manuscript, and for recognizing the relevance, timeliness, and potential contribution of our work to the development of human-relevant NAMs for chemical safety assessment. We greatly appreciate the reviewer’s constructive comments, which have helped us improve the methodological clarity, interpretation of results, and overall organization of the manuscript. We have carefully addressed each point raised and revised the manuscript accordingly.

Introduction section

Reviewer 1 commented: Lines 48-49 "….. during sensitive developmental windows…." This part of the sentence raises some doubts. Is it just “….. during sensitive developmental windows…”? Or does it rather refer to a general principle relating to overall development and functioning?

We thank the reviewer for this observation. Local and temporal thyroid hormone (TH) availability is required throughout life in all tissues. However, during the fetal and early postnatal periods, fine-tuned TH provision is especially important, e.g. for the development and proper organization of the brain. Thus, sensitive (developmental) windows exist during this period but also later in life, e.g. during puberty, tissue remodelling or repair (Dentice et al. doi.org/10.1016/j.bbagen.2012.05.007).

We agree that the original wording could be interpreted as referring to specific, well-defined windows during brain development, though our intention was to refer more broadly to the well-established role of TH in human brain development and function, including processes such as neural proliferation, migration and myelination.

To avoid ambiguity and prevent confusion with the exposure windows later used in our organoid experiments, we have revised the sentence to refer to overall brain development and function. The corresponding change is highlighted in the revised manuscript (lines 69-70).

Reviewer 1 commented: Insufficient emphasis on the novelty of the work. Although the purpose is described, a clear sentence such as “To our knowledge, this is the first study that…” is missing.

We have revised the final part of the Introduction to better emphasize the original contribution of the work. In particular, we now clarify that, to our knowledge, this is the first study to directly compare two hiPSC-derived brain organoid platforms for THSDC assessment while integrating model standardization, quality control and tissue-specific TH system endpoints. The modified text is included in lines 139-152 of the revised manuscript.

Reviewer 1 commented: Some sentences in the Introduction section are too long. I would suggest breaking them into shorter, more specific ones.

We thank the reviewer for this helpful suggestion. The Introduction has been substantially revised, and we have carefully considered the reviewer’s comment when editing this section.

Several long sentences have been shortened or divided into more focused sentences to improve readability, clarity, and flow throughout the revised Introduction.

Materials and Methods section

Reviewer 1 commented: Very high level of detail (reproducibility), which on the one hand is a positive feature of this description, but this section seems excessively long and overloaded with detail. In short, this section is very long and at times too technical for the main text.

We thank the reviewer for this constructive comment. We agree that, although methodological detail is important for reproducibility, the Methods section was overly long and contained technical details that were not essential for the main text.

To improve readability while preserving reproducibility, we have substantially shortened and reorganized the Methods section. Specifically, we have:

Moved details on hiPSC banking and quality control to the Supplementary Information section “hiPSC banks and quality control”.
Transferred detailed CO differentiation procedures to the Supplementary Information, as this protocol largely follows the manufacturer’s instructions with only minor deviations.
Published the full protocols for NSC generation, quality control and banking (https://dx.doi.org/10.17504/protocols.io.x54v9pxoqg3e/v1) and for NSCO generation (https://dx.doi.org/10.17504/protocols.io.3byl4j6kzlo5/v1) on Protocols.io (Springer Nature, open source); both are now cited as references in the manuscript.
Moved additional methodological details from the main text to the Supplementary Information: histology and immunofluorescence, including a table of antibody details, and the RT-qPCR procedure, including a table of primer sequences.
Shortened the remaining Methods sections to retain only information essential for understanding and reproducing the study.

The titles of the revised and shortened Methods subsections are highlighted in yellow in the manuscript.

Results section

Reviewer 1 commented: This section is very well developed and described in detail. However, it constitutes a very large (volume-wise) part of the work, which may be difficult for many readers. Therefore, I would suggest emphasizing the key results more strongly ("The key finding of this experiment is..." "Importantly,..." "Notably,..."…)

We thank the reviewer for this constructive suggestion. Accordingly, we have added transition words, where appropriate, to signal the relevance of specific observations. In addition, we revised the final paragraph of each Results subsection, which summarizes the most relevant findings and helps the reader interpret the main conclusions of that section. Changes are highlighted in yellow.

Discussion section

Reviewer 1 commented: This section provides a good link between the results and the work's hypothesis and places them firmly in the context of the literature. However, this section seems significantly too long and at times feels like a repetition of the Introduction and Results sections. It also feels as if the authors are re-describing the results instead of interpreting them. I would suggest highlighting the more important conclusions and using clear statements such as "Our findings demonstrate that...", "The most important implication is...".

We thank the reviewer for this constructive suggestion. We have revised and reorganized the discussion to reduce redundancy with the results section and to place greater emphasis on interpretation rather than description. Where appropriate, we now highlight the main conclusions more explicitly using clearer framing statements.

Conclusion section

Reviewer 1 commented: In this section, I would emphasize the novelty of the work more and give more prominence to the authors' contributions regarding what exactly constitutes the new contribution of the work and why the results are groundbreaking.

We thank the reviewer for this helpful suggestion. We have revised the conclusion to more clearly emphasize the novelty and main contributions of the study. We also revised the subsection dedicated to “Future directions”. Changes are highlighted in yellow.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

I have carefully reviewed your manuscript addressing the use of hiPSC-derived brain organoids for assessing thyroid hormone system-disrupting chemicals. The topic is outstanding and of clear relevance for the development of human-relevant NAMs. The study is well conceived but some aspects would benefit from clarification to further strengthen the manuscript.

Abstract

Lines 37-38: The statement on treatment-associated changes in cell composition remains quite general. Including an indication of the level or magnitude of the observed effects would make the findings more informative and strengthen the scientific value of the abstract.

Introduction

Lines 72-79: The discussion on the discrepancy between systemic TH levels and tissue-specific effects is well articulated. The inclusion of specific examples or key studies demonstrating this mismatch can better support the rationale for developing more advanced in vitro systems.

Lines 80-83: The statement that thyroid hormone system disruption is comparatively underserved within the OECD framework is relevant and well aligned with the rationale of the study. However, as currently phrased, it comes across as a rather strong general claim. It may be helpful to briefly clarify or contextualize this statement, for instance by referring more explicitly to the current limitations or gaps in available test strategies addressing thyroid-related endpoints.

Lines 100-102: The sentence describing organoid advantages should be revised for readability.

Lines 106-115: The limitations of organoids are appropriately acknowledged but I suggest clearly linking the variability, discussed in general terms, with how it is addressed in your experiments.

Lines 126-134: The study objective is quite extensive and combines elements of model comparison, assay development, and validation. For clarity, I suggest distinguishing more explicitly between primary and secondary aims, which would help guide the reader through the manuscript.

Materials and Methods

Lines 144-164: How many biological replicates per line were used in each experiment?

Lines 165-182: While reagent preparation is described in detail, the use of different solvents (ethanol for T3 and DMSO for other compounds) is not clearly addressed.

Lines 202-203: The exclusion of batches failing to develop cortical units is mentioned, but the number or proportion of excluded batches is not reported.

Lines 212-257: The early versus late exposure design is interesting but not sufficiently justified. Explain how these time points relate to specific stages of human neurodevelopment or to known windows of TH sensitivity.

Lines 223-227: The distinction between pulse and chronic exposure is described, but the rationale behind the chosen durations is not entirely clear, so adding a brief justification would help interpret the biological relevance of these conditions.

Lines 229-230 and 245-246: The use of technical replicates for LC-MS/MS is noted, but biological replication is not clearly described.

Lines 258-307: While I appreciate the thorough description of the NSC quality control procedures, I found it somewhat difficult to understand how these QC steps translate into actual acceptance criteria. In particular, markers such as SOX2, PAX6, or Nestin are mentioned, but no quantitative thresholds are provided to define when a culture is considered suitable for downstream applications. Including explicit criteria (for example, percentage of positive cells or acceptable ranges) would significantly strengthen the reproducibility of the workflow.

Lines 308-335: The description of NSCO generation includes pilot experiments, but it is not entirely clear how these relate to the main dataset. In my opinion, distinguishing exploratory from confirmatory experiments can improve clarity.

Lines 336-343: The exposure timeline used for NSCOs differs from that applied to COs, yet I could not find a clear rationale for this choice. Since one of the main objectives of the study is to compare these two organoid platforms, I suggest clarifying whether these differences reflect biological considerations such as developmental stage equivalence or are driven by technical constraints. Without this explanation, it becomes more difficult to interpret whether observed differences between models are intrinsic or simply related to the exposure design.

Results

The Results section is rich and integrates multiple layers of information, including hormone metabolism, transcriptional responses, and imaging-based phenotyping, which is certainly a strength of the study. However, I found it challenging to understand how these different datasets are systematically connected. A clearer explanation of the analytical framework used to integrate these endpoints (are they interpreted independently or combined to support specific mechanistic conclusions?) would improve the interpretability of the results. In addition, it is not entirely clear how variability inherent to organoid systems has been handled at the statistical level. Given the known impact of factors such as donor origin, batch effects, and organoid-to-organoid variability, it would be important to state whether and how these sources of variability were accounted for in the analyses.

Discussion

The Discussion appropriately highlights the relevance of the proposed models in the context of NAMs, but in my opinion a slightly more critical evaluation of their current limitations is needed. In particular, aspects such as incomplete maturation, lack of vascularization, and potential constraints in metabolic competence are well-recognized features of brain organoids and could influence the interpretation of the results. Moreover, since the comparison between COs and NSCOs represents a central aspect of the work, I encourage the authors to discuss the practical compromises between these models. While their complementary nature is well described, in my opinion it would be helpful to clarify in which contexts one model might be preferred over the other, especially in terms of complexity, reproducibility and scalability.

Conclusions

The conclusions are generally consistent with the abstract but remain somewhat wide-ranging in their current form. They would be strengthened by more explicitly distinguishing what has been experimentally demonstrated in this study from what is proposed as a future application, particularly in the context of regulatory implementation.

Kind Regards

Author Response

Reviewer 2

Comments and Suggestions for Authors

Dear Authors,

We thank the reviewer for the positive assessment of our manuscript and for highlighting the relevance of hiPSC-derived brain organoids as human-relevant NAMs for assessing thyroid hormone system-disrupting chemicals (THSDCs). We greatly appreciate the constructive comments, which have helped us clarify and strengthen several aspects of the manuscript. We have addressed each point in detail below and indicate, where appropriate, the corresponding changes made in the revised manuscript.

Abstract

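Reviewer 2 commented: Lines 37-38: The statement on treatment-associated changes in cell composition remains quite general. Including an indication of the level or magnitude of the observed effects would make the findings more informative and strengthen the scientific value of the abstract.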

We thank the reviewer for this helpful suggestion. We have revised the sentence in the abstract to provide a more specific description of the observed treatment-associated changes in cell composition, including the direction of the effect. The revised text is highlighted in yellow in the manuscript (lines 58-60).

Introduction

Reviewer 2 commented: Lines 72-79: The discussion on the discrepancy between systemic TH levels and tissue-specific effects is well articulated. The inclusion of specific examples or key studies demonstrating this mismatch can better support the rationale for developing more advanced in vitro systems.

We thank the reviewer for this helpful suggestion. We have strengthened this section by adding references that support the mismatch between circulating TH levels and tissue-specific TH action (Stub et al., 2026; 10.1016/j.tox.2025.154353 and Thomas et al. 2026; 10.1093/toxsci/kfaf152). Additionally, we now cite studies showing that local T3 availability is regulated by deiodinase activity, including DIO2-mediated activation and DIO3-mediated inactivation, and that this regulation can differ between tissues and even between cell types within the same tissue (Hernandez et al., 2021; Bernal, 2022; Luongo et al., 2019). We also added references highlighting that developmental changes in thyroid hormone receptor expression further modulate local TH action during brain development (Bernal, 2007; Bernal, 2017). Together, these studies support the concept that brain T3 availability is shaped not only by circulating hormone levels, but also by local deiodinase activity, membrane transporter function, and TH receptor expression. Local tissue hormone regulation and action in the brain remains a developing field. The modifications are now in lines 88-92.

Reviewer 2 commented: Lines 80-83: The statement that thyroid hormone system disruption is comparatively underserved within the OECD framework is relevant and well aligned with the rationale of the study. However, as currently phrased, it comes across as a rather strong general claim. It may be helpful to briefly clarify or contextualize this statement, for instance by referring more explicitly to the current limitations or gaps in available test strategies addressing thyroid-related endpoints.

We thank the reviewer for this helpful comment. We agree that the original wording was too broad and required further contextualization. While a broad inventory of test methods relevant to thyroid hormone system disruption is available (Vergauwen et al. 2024; 10.12688/openreseurope.18739.1; also cited in the manuscript introduction), their applicability, standardization, validation, and regulatory implementation for hazard identification and risk assessment remain limited. There are currently no internationally validated in vitro assays that capture the consequences of disrupted thyroid hormone action on the developing brain, and no validated endpoints that specifically reflect adequate local T3 provision during brain development (Ramhoj et al. 2023; 10.3389/ftox.2023.1189303 and Gilbert et al. 2020; 10.1210/endocr/bqaa106).

For simplification, we have revised the sentence and specified the current test methods and their limitations. The revised text is highlighted in the manuscript (lines 96-99).

Reviewer 2 commented: Lines 100-102: The sentence describing organoid advantages should be revised for readability.

We have modified the sentence for readability as suggested by the reviewer. See lines 115-117.

Reviewer 2 commented: Lines 106-115: The limitations of organoids are appropriately acknowledged but I suggest clearly linking the variability, discussed in general terms, with how it is addressed in your experiments.

We thank the reviewer for this valuable suggestion. We have revised this section to more explicitly link the general limitations of organoid models, particularly variability and reproducibility, to the experimental design and objectives of our study.

Specifically, we now clarify that these limitations were addressed by implementing standardized workflows, intermediate quality control steps, multiple hiPSC lines, and fit-for-purpose endpoints to assess model performance. These revisions are included in lines 148–152 of the revised manuscript.

Reviewer 2 commented: Lines 126-134: The study objective is quite extensive and combines elements of model comparison, assay development, and validation. For clarity, I suggest distinguishing more explicitly between primary and secondary aims, which would help guide the reader through the manuscript.

We thank the reviewer for this helpful suggestion. We agree that the original formulation combined several objectives and could be made clearer for the reader.

We have revised this section to distinguish the main aim of the study from the stepwise strategy used to address it. Specifically, we now first define the overall aim of comparing two hiPSC-derived brain organoid models, and then describe how this was approached through endpoint establishment and subsequent application to THSDC assessment. The revised text is included in lines 139-147 of the manuscript.

Materials and Methods

Reviewer 2 commented: Lines 144-164: How many biological replicates per line were used in each experiment?

We thank the reviewer for raising this important point. In this study, independent hiPSC lines were considered biological replicates. For each dataset, the number of biological and technical replicates is specified in the corresponding Materials and Methods section and figure legends.

Reviewer 2 commented: Lines 165-182: While reagent preparation is described in detail, the use of different solvents (ethanol for T3 and DMSO for other compounds) is not clearly addressed.

We thank the reviewer for this important comment. We have revised the Methods section to clarify the rationale for using different solvents for T3 and the reference THSDCs.

T3 was prepared in ethanol according to the manufacturer’s recommendation. The final ethanol concentration was kept constant across the corresponding conditions and models, with a maximum concentration of 0.0007% in the 20 nM T3 treatments, which is considered negligible in terms of cytotoxicity.
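The relationship between the nominal T3 concentration and the residual vehicle concentration can be sketched as follows. This is a minimal illustration, not the authors' calculation: it assumes the T3 stock is dissolved in pure ethanol and diluted into the medium in a single step, and the function names are hypothetical. The stock concentration is not stated in the response, so the value computed here is only what the reported figures (20 nM T3, 0.0007% ethanol) would imply under those assumptions.

```python
def vehicle_percent(stock_molar: float, final_molar: float,
                    stock_solvent_percent: float = 100.0) -> float:
    """Final solvent concentration (% v/v) after a single direct dilution.

    Assumes the compound stock is dissolved in a solvent at
    `stock_solvent_percent` (100 for pure ethanol) and diluted
    straight into the culture medium.
    """
    if stock_molar <= 0 or final_molar < 0:
        raise ValueError("concentrations must be positive")
    return stock_solvent_percent * final_molar / stock_molar


def implied_stock_molar(final_molar: float, final_solvent_percent: float,
                        stock_solvent_percent: float = 100.0) -> float:
    """Stock concentration implied by a final dose and its vehicle content."""
    if final_solvent_percent <= 0:
        raise ValueError("final solvent percentage must be positive")
    return final_molar * stock_solvent_percent / final_solvent_percent


# Figures quoted in the response: 20 nM T3 carrying 0.0007% ethanol.
stock = implied_stock_molar(final_molar=20e-9, final_solvent_percent=0.0007)
print(f"implied stock: {stock * 1e3:.2f} mM")  # about 2.86 mM under the stated assumptions

# Round trip: the same stock diluted to 20 nM gives back the vehicle percentage.
print(f"vehicle at 20 nM: {vehicle_percent(stock, 20e-9):.4f} %")
```

Because the vehicle percentage scales linearly with the final dose, lower T3 concentrations prepared from the same stock would carry proportionally less ethanol unless the vehicle is topped up to keep it constant across conditions.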

SC and IA were prepared in DMSO due to their higher solubility in this solvent compared to ethanol, which allowed lower solvent volumes to be used. In addition, DMSO at low concentrations has been reported to show limited cytotoxicity in most tested cell systems, although solvent tolerance depends on cell type and exposure duration. By contrast, ethanol may show higher cytotoxicity and therefore requires careful concentration control (Asiri et al., 2025; https://doi.org/10.3390/mps8040093).

We have modified the Methods section accordingly to describe the solvent choice (lines 176-177 and 190-191).

Reviewer 2 commented: Lines 202-203: The exclusion of batches failing to develop cortical units is mentioned, but the number or proportion of excluded batches is not reported.

Information regarding the early exclusion (day 15) of batches failing to develop cortical units is reported in the Results section (now lines 489-492). We now also mention it in the Materials and Methods (now in the Supplementary Information, highlighted in yellow).

Reviewer 2 commented: Lines 212-257: The early versus late exposure design is interesting but not sufficiently justified. Explain how these time points relate to specific stages of human neurodevelopment or to known windows of TH sensitivity.

We thank the reviewer for this important comment. We agree that the rationale for the early versus late exposure design required clearer explanation.

The early versus late exposure design was based on the differentiation dynamics of cerebral organoids (COs), rather than being defined as a direct clinical equivalent. This design was applied only to COs, as this model shows recognizable developmental progression over time. COs broadly reflect aspects of first-trimester human fetal brain development. Until approximately day 40–45, progenitor proliferation and early neurogenesis are dominant processes, with the emergence of ventricle-like cavities, proliferative zones, intermediate progenitors, and early differentiating neurons resembling ventricular and subventricular zone organization (Nascimento et al., 2019; doi: 10.3389/fcell.2019.00303). After this stage, neuronal migration and further differentiation become more prominent.

Thus, the rationale was to compare TH/THSDC effects in a relatively progenitor-rich stage versus a more differentiated stage showing initial neuronal layering. These processes broadly correspond to developmental events occurring in vivo between approximately gestational weeks 6 and 15 (Eze et al., 2021, https://doi.org/10.1038/s41593-020-00794-1).

We also agree that timing is a critical determinant of the consequences of altered TH signalling during neurodevelopment. Clinical evidence indicates that early treatment of congenital hypothyroidism can efficiently improve neurological outcome when initiated rapidly after birth (Grosse et al., 2011; DOI: 10.1136/adc.2010.190280). In contrast, disrupted TH action during gestation may lead to more persistent neurodevelopmental consequences, as illustrated by Allan–Herndon–Dudley syndrome caused by mutations in the TH transporter MCT8. These patients show severe neurodevelopmental delay, likely linked to impaired TH availability in the developing brain and reduced T3 uptake by neurons and their progenitors (Krude et al., 2020; DOI: 10.1055/a-1108-1456).

However, despite this clinical evidence, tissue-wide TH bioavailability and local TH action during early human brain development remain incompletely understood. Therefore, our exposure windows were not intended to model specific clinical windows directly, but rather to test whether altered TH availability produces different effects depending on the developmental state of the organoid model. This is further supported by our data showing the presence of key TH-related components, including MCT8 and DIO3, during both phases of differentiation, and by previous evidence that THRA expression increases along the differentiation trajectory from radial glial cells to excitatory neurons.

To address the reviewer’s comment, we have added a concise explanation of the rationale for the early versus late exposure design in the Methods section (lines 199-204).

Reviewer 2 commented: Lines 223-227: The distinction between pulse and chronic exposure is described, but the rationale behind the chosen durations is not entirely clear, so adding a brief justification would help interpret the biological relevance of these conditions.

We thank the reviewer for this helpful comment. We agree that the rationale for the pulse and chronic exposure durations required clearer explanation.

For the CO model, the exposure scheme was designed in relation to the two developmental phases described above. Chronic exposure was initiated after cortical units had formed and after removal of the extracellular matrix capsule at day 23. This allowed chronic treatment to cover the subsequent differentiation phase until either the early endpoint at day 40 or the later endpoint at day 60, while maintaining a comparable treatment duration across both conditions.

The 48 h pulse exposure was selected as a short-term treatment window sufficient to detect T3 metabolism in the culture supernatant by LC–MS/MS before nutrient depletion became limiting. This duration also produced clear transcriptional modulation in bulk RNA-seq, supporting its suitability as a short-term response window.

For NSCOs, early and late developmental windows were not applied because this model does not undergo the same cytoarchitectural progression as COs. Instead, NSCOs retain a neurosphere-like organization, with a progenitor pool and increasing neuronal proportion over time. Therefore, chronic exposure was initiated once NSCOs were established. The pulse duration was selected to allow sufficient T3 consumption from the medium for detection using the high-throughput ELISA-based depletion assay, and this time point also showed clear gene-expression modulation by bulk RNA-seq.

We now clarify in the manuscript that these exposure schemes were designed as proof-of-concept conditions. For future applications, the optimal pulse and chronic exposure windows may require model- and chemical-specific adjustment through preliminary exploratory experiments. A brief justification is described in the discussion (lines 952-957), while the Materials and Methods section remains focused on the technical description of the exposure conditions.

Reviewer 2 commented: Lines 229-230 and 245-246: The use of technical replicates for LC-MS/MS is noted, but biological replication is not clearly described.

We thank the reviewer for pointing this out. We have revised the presentation of the LC–MS/MS data to make the biological replication clearer.

Specifically, this information has been added to Figure 4A and 4B, where biological replicates are now shown as color-coded data points. For control conditions, the previous version displayed technical replicates from individual wells. We have now averaged the technical replicates and used the mean value of each corresponding biological replicate for plotting and statistical analysis. The statistical analysis has been updated accordingly, and the figure legends have been revised and highlighted in yellow.
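The averaging step described above can be sketched as follows. This is a minimal illustration with made-up values, not the authors' analysis script: it assumes each measurement is tagged with its hiPSC line (the biological replicate) and condition, and the function name and example data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_technical_replicates(measurements):
    """Average technical replicates within each biological replicate.

    `measurements` is an iterable of (line, condition, value) tuples,
    one per well. Returns {(line, condition): mean value}, so that each
    hiPSC line contributes a single data point per condition.
    """
    grouped = defaultdict(list)
    for line, condition, value in measurements:
        grouped[(line, condition)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical well-level readouts (arbitrary units), for illustration only.
wells = [
    ("line_A", "control", 10.0),
    ("line_A", "control", 12.0),
    ("line_B", "control", 8.0),
    ("line_B", "control", 9.0),
    ("line_B", "control", 10.0),
]

per_line = collapse_technical_replicates(wells)
for (line, condition), value in sorted(per_line.items()):
    print(line, condition, value)

# n for statistics is the number of biological replicates, not wells.
n_biological = len(per_line)
```

Collapsing wells first keeps the sample size equal to the number of independent hiPSC lines, which avoids treating wells from the same line as independent observations.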

Reviewer 2 commented: Lines 258-307: While I appreciate the thorough description of the NSC quality control procedures, I found it somewhat difficult to understand how these QC steps translate into actual acceptance criteria. In particular, markers such as SOX2, PAX6, or Nestin are mentioned, but no quantitative thresholds are provided to define when a culture is considered suitable for downstream applications. Including explicit criteria (for example, percentage of positive cells or acceptable ranges) would significantly strengthen the reproducibility of the workflow.

We thank the reviewer for this valuable suggestion. We agree that explicit acceptance criteria are important to improve reproducibility and clarify how NSC quality control informs downstream use.

To reduce the length of the main Methods section while preserving methodological detail, we have now published the full NSC differentiation, banking, and QC workflow on Protocols.io (Springer Nature, open source; https://dx.doi.org/10.17504/protocols.io.x54v9pxoqg3e/v1). We have added this reference to the new manuscript version. In the manuscript, we retained a concise description of the methodology and added the NSC banking eligibility criteria, including quantitative marker-based acceptance criteria, as requested.

This additional information is highlighted in the revised manuscript (lines 257-265).

Reviewer 2 commented: Lines 308-335: The description of NSCO generation includes pilot experiments, but it is not entirely clear how these relate to the main dataset. In my opinion, distinguishing exploratory from confirmatory experiments can improve clarity.

We thank the reviewer for this helpful comment. We agree that the relationship between the pilot experiments and the main dataset required clearer explanation.

The full protocols for NSCO generation under both static and dynamic conditions are now described in detail on Protocols.io (Springer Nature, open source; https://dx.doi.org/10.17504/protocols.io.3byl4j6kzlo5/v1). In the manuscript, we have retained a concise description of the workflow and revised the text to clearly distinguish between the main experiments and exploratory pilot experiments.

Specifically, NSCOs generated under static conditions were used for the main experimental dataset, whereas NSCOs generated under dynamic conditions were used in pilot experiments to explore their capacity to metabolize T3. This clarification has been added and highlighted in the revised manuscript in the Materials and Methods (lines 270-286) and in the Results (lines 775-776 and 802-805).

Reviewer 2 commented: Lines 336-343: The exposure timeline used for NSCOs differs from that applied to COs, yet I could not find a clear rationale for this choice. Since one of the main objectives of the study is to compare these two organoid platforms, I suggest clarifying whether these differences reflect biological considerations such as developmental stage equivalence or are driven by technical constraints. Without this explanation, it becomes more difficult to interpret whether observed differences between models are intrinsic or simply related to the exposure design.

We thank the reviewer for this important comment. We agree that the rationale for using different exposure timelines in COs and NSCOs required clearer explanation, particularly because the study compares both platforms.

The exposure schemes were defined separately for each model based on their distinct biological and technical characteristics. COs show greater cytoarchitectural complexity and developmental progression over time, allowing the design of early and late exposure windows that reflect different differentiation stages. In contrast, NSCOs have a more neurosphere-like organization, retain a progenitor pool, and show less pronounced cytoarchitectural staging. In addition, the NSCO culture period compatible with maintenance in 96-well plates is considerably shorter than that of COs, approximately 30 days versus 60 days, respectively. This represents an additional technical constraint that influenced the exposure design. Therefore, applying the same early/late exposure logic used for COs was not considered biologically or technically appropriate for NSCOs.

The purpose of the comparison was not to impose identical exposure timelines on both models, but to assess whether each platform can detect THSDC-mediated interference with T3 action using comparable endpoint categories, including gene expression, T3 metabolism, and cell composition. In this context, COs may be more suitable for mechanistic studies requiring higher developmental complexity, whereas NSCOs may be more suitable for scalable or higher-throughput applications.

The Results section (lines 698-703) now explains that the different exposure timelines reflect model-specific biological and technical considerations rather than an attempt to establish direct developmental equivalence between COs and NSCOs. We have also added this point to the Discussion.

Results

Reviewer 2 commented: The Results section is rich and integrates multiple layers of information, including hormone metabolism, transcriptional responses, and imaging-based phenotyping, which is certainly a strength of the study. However, I found it challenging to understand how these different datasets are systematically connected. A clearer explanation of the analytical framework used to integrate these endpoints (are they interpreted independently or combined to support specific mechanistic conclusions?) would improve the interpretability of the results.

We thank the reviewer for this helpful comment. We have now clarified in the discussion (lines 932-938, highlighted) how the different datasets were interpreted and integrated. Specifically, we emphasize that hormone metabolism, transcriptional responses, and imaging-based phenotyping were first evaluated as distinct but complementary endpoints and then considered together to support mechanistic interpretation of TH system disruption in brain organoids. Based on the findings of this proof-of-concept study, we show that combining endpoints reduces the risk of overlooking relevant effects and provides a stronger basis for linking local TH action to potential adverse outcome pathways.

Reviewer 2 commented: In addition, it is not entirely clear how variability inherent to organoid systems has been handled at the statistical level. Given the known impact of factors such as donor origin, batch effects, and organoid-to-organoid variability, it would be important to state whether and how these sources of variability were accounted for in the analyses.

We thank the reviewer for this valuable comment. We agree that the statistical analysis should explicitly account for the intrinsic variability of organoid-based systems, including donor origin, batch effects, and organoid-to-organoid variation.

In the original version of the manuscript, individual organoids were analyzed as independent replicates. However, organoids generated from the same donor and batch do not represent fully independent biological observations, and such an approach may introduce pseudo-replication.

To address this issue, we have comprehensively revised the statistical analysis. In the revised manuscript, the data are analyzed within a hierarchical mixed-effects framework that reflects the experimental structure, with organoids nested within donor and, where applicable, batch (only for NSCOs derived from BIHi250-A). Treatment was specified as the fixed effect of interest, while donor (hiPSC line) and batch were modelled as random effects. Accordingly, organoids are now considered within-donor replicates, and donor is treated as the biological unit of replication. This revised approach more appropriately captures donor-dependent variability, batch-related effects, and residual organoid-to-organoid heterogeneity.

The re-analysis was performed in RStudio using linear mixed-effects models with treatment as a fixed effect and donor as a random effect. Model assumptions were assessed by inspection of residual and Q-Q plots. Planned contrasts were adjusted for multiple testing using the Holm method.
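For readers less familiar with the Holm method mentioned above, the following Python sketch shows the step-down adjustment. It is an illustration only, not the authors' code (their analysis was run in R). The function name and the example p-values are hypothetical. The rule is the standard one: sort the p-values, multiply the k-th smallest by (m - k + 1), enforce monotonicity, and cap at 1.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for three planned contrasts (not results from the study).
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

Because the adjustment depends on the number of contrasts m, the set of planned contrasts has to be fixed before the correction is applied.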

The re-analysed plots are in Figures 1B-C, 3B-C, 4C, and 5C and in Supplementary Figures 9B and 11C. Figure legends were modified accordingly, and the changes are highlighted in yellow.

After re-analysis using a mixed-effects model with donor included as a random effect, the main conclusions remained unchanged.

Discussion

Reviewer 2 commented: The Discussion appropriately highlights the relevance of the proposed models in the context of NAMs, but in my opinion a slightly more critical evaluation of their current limitations is needed. In particular, aspects such as incomplete maturation, lack of vascularization, and potential constraints in metabolic competence are well-recognized features of brain organoids and could influence the interpretation of the results. Moreover, since the comparison between COs and NSCOs represents a central aspect of the work, I encourage the authors to discuss the practical compromises between these models. While their complementary nature is well described, in my opinion it would help to clarify in which contexts one model might be preferred over the other, especially in terms of complexity, reproducibility and scalability.

We thank the reviewer for this important suggestion. We have added a more critical discussion of the current limitations of the brain organoid models used in this study, including incomplete maturation, lack of vascularization and blood–brain barrier-related properties, and potential constraints in metabolic competence, all of which may influence the interpretation of TH and THSDC-related endpoints. This has been addressed in lines 898-909.

In the Discussion, we now clarify the practical compromises between the CO and NSCO models. Specifically, we indicate that COs may be preferable for mechanistic studies requiring greater cytoarchitectural complexity and developmental fidelity, whereas NSCOs may be more suitable for screening-oriented applications where reproducibility, scalability, and reduced variability are prioritized. This section can be found in lines 893-897.

Conclusions

Reviewer 2 commented: The conclusions are generally consistent with the abstract but remain somewhat wide-ranging in their current form. They would be strengthened by more explicitly distinguishing what has been experimentally demonstrated in this study from what is proposed as a future application, particularly in the context of regulatory implementation.

We thank the reviewer for this helpful comment. We have revised the conclusion to more clearly distinguish between findings experimentally demonstrated in this study and future applications, particularly regarding regulatory implementation. Changes are highlighted in lines 988-1008.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

I appreciate the thorough and constructive manner in which you have addressed the points raised during the review process, which has significantly improved the overall quality of the manuscript. In particular, I find that the additional explanations regarding the experimental design and the clearer description of procedures have strengthened the robustness of the study.

Kind regards