Acoustic Cues to Automatic Identification of Phrase Boundaries in Lithuanian: A Preparatory Study

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Please see the attached file.
Comments for author File: Comments.pdf
Author Response
Dear reviewer,
Thank you for your thoughtful and constructive comments on the manuscript. Your feedback was insightful and has helped us improve the clarity and quality of the paper.
Please see the attachment with our responses to your comments. We hope the changes address your concerns and enhance the manuscript accordingly.
Thank you again for your valuable input.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The theme of this paper is highly interesting, and the work has the potential to make a valuable contribution to ongoing debates in prosody and speech technology. Phrasing (e.g., chunking, pauses) is a salient feature in both natural and synthesized speech, yet much work on prosody remains overly abstract and artificially tied to syntactic structure. For this reason, a study focused on the automatic detection of prosodic boundaries in a less-studied language like Lithuanian could help fill an important gap in the field.
That said, the argument currently appears weak due to several substantial issues across the paper, starting from the introduction and continuing through the methodology and evaluation.
The introduction is overly simplistic and lacks depth in framing the theoretical background. Key points require clarification, especially regarding the complex and debated relationship between syntax, morphosyntax, and the prosodic hierarchy. The current explanation of how prosodic units are derived from syntax oversimplifies the matter and omits essential references. Foundational works such as Selkirk (1984) and Nespor & Vogel (1986), even if they advocate for a top-down approach that differs from that of the paper, should be cited. Additionally, some references (e.g., focused on French) are overly language-specific and not well contextualized. If the introduction aims for general relevance, it should be expanded with references to other languages; if the focus is on Lithuanian, more appropriate language-specific references should be added.
There are also significant methodological concerns. First, the total volume of data (e.g., duration, number of tokens) is not clearly reported, which is a basic but crucial piece of information. More importantly, the initial prosodic annotations, which constitute the foundation of the acoustic and computational analyses, were performed impressionistically by a single annotator. The absence of any reported measure of inter-annotator agreement is a major flaw.
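To illustrate the kind of agreement measure requested above, the following is a minimal sketch of Cohen's kappa for two annotators who label the same word tokens as boundary (1) or non-boundary (0). The function name and the toy labels are hypothetical and are not taken from the manuscript; the manuscript may well use a different agreement statistic.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    labels_a, labels_b: equal-length sequences of category labels
    (hypothetical example: 1 = boundary, 0 = no boundary).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label
    # proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data, invented for illustration only.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for agreement expected by chance, which matters when one label (here, non-boundary) dominates the data.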
In the computational component, the automatic boundary detection model is trained and tested on the same dataset used for feature analysis, using 20-fold cross-validation. While cross-validation is useful, it is not a substitute for an independent test set, particularly when working with a small, single-speaker dataset. This severely limits the generalizability of the results and raises concerns about overfitting. At minimum, the authors should introduce a speaker-independent test or hold-out set in a revised version. Dataset imbalance (with a predominance of word-final boundaries) also deserves further discussion, as it may bias model predictions. Additionally, even if the feature selection appears theoretically sound and is drawn from the literature, a clearer explanation and citation of the specific sources supporting each acoustic feature would strengthen the section.
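As a rough sketch of the speaker-independent hold-out suggested above, the split below assigns whole speakers to either the training or the test portion, so that no speaker contributes to both. It assumes a multi-speaker dataset, which the reviewed study does not currently have, and the "speaker" field name and function name are hypothetical.

```python
import random


def split_by_speaker(samples, test_fraction=0.2, seed=0):
    """Hold out whole speakers so train and test share no speaker.

    samples: list of dicts, each with a "speaker" key (hypothetical schema).
    """
    speakers = sorted({s["speaker"] for s in samples})
    if len(speakers) < 2:
        raise ValueError("a speaker-independent split needs at least two speakers")

    rng = random.Random(seed)
    rng.shuffle(speakers)

    # Hold out at least one speaker, but always leave one for training.
    n_test = max(1, round(len(speakers) * test_fraction))
    n_test = min(n_test, len(speakers) - 1)
    test_speakers = set(speakers[:n_test])

    train = [s for s in samples if s["speaker"] not in test_speakers]
    test = [s for s in samples if s["speaker"] in test_speakers]
    return train, test


# Toy data, invented for illustration only.
data = [{"speaker": f"spk{i}", "token": j} for i in range(5) for j in range(2)]
train_set, test_set = split_by_speaker(data)
print(len(train_set), len(test_set))
```

Cross-validation can still be run inside the training portion for model selection, with the held-out speakers used only once for the final estimate.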
A further serious concern lies in the fact that, by the authors' own admission, some salient prosodic cues for boundary marking (such as assimilation, degemination, and articulatory strengthening/weakening) were disregarded both in annotation and analysis. Instead, boundaries were defined based on syntactic and semantic criteria, with little regard for phonetic realization. This directly contradicts the goals of a study on prosodic boundary detection, where annotation should be driven by phonetic evidence. Even more troubling is the suggestion that the criteria used for annotation were not fully defined, as admitted in the text.
In summary, while the topic is promising and the intentions are commendable, the study currently suffers from theoretical oversimplifications and major methodological limitations that must be addressed. I recommend major revisions, including re-annotation with multiple trained annotators using a clearly defined and phonetics-based protocol, restructuring the machine learning evaluation with independent test data, and a substantial expansion and refinement of the theoretical background in the introduction.
Ethical Concerns:
- Subjective Annotation by a Single Annotator: The study relies heavily on the manual, impressionistic annotation of prosodic boundaries by a single annotator. This forms the basis for both the acoustic analysis and the training data used in the machine learning component. However, no inter-annotator agreement or validation procedure is reported. Given the inherently subjective nature of prosodic boundary identification, this absence raises concerns about the reliability of the dataset and the reproducibility of the results.
- Lack of Transparency About Participant Consent and Data Usage: The manuscript does not provide information about how recordings were obtained, whether participants consented to the use of their data, or whether any ethical approval was obtained for data collection. While this may be an oversight, it is essential for any empirical study involving human data to report these aspects.
Comments for author File: Comments.pdf
Author Response
Dear reviewer,
Thank you for your thoughtful and constructive comments on the manuscript. Your feedback was insightful and has helped us improve the clarity and quality of the paper.
Please see the attachment with our responses to your comments. We hope the changes address your concerns and enhance the manuscript accordingly.
Thank you again for your valuable input.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Please see the attached file.
Comments for author File: Comments.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I appreciate the authors’ thorough and thoughtful responses to the review comments. The revised version of the manuscript shows clear and substantial improvement across all key areas. Given these improvements, I now find the manuscript suitable for publication.