Article
Peer-Review Record

Investigation of Pitch and Tone Preference of Preschool Children in Mandarin

Behav. Sci. 2026, 16(3), 460; https://doi.org/10.3390/bs16030460
by Minmin Yin 1,2, Surina Zhang 1,2,3,4, Hongyun Zhu 5, Jieyi Huang 1,2, Shengnan Ge 6 and Baoming Li 1,3,4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 6 January 2026 / Revised: 13 March 2026 / Accepted: 17 March 2026 / Published: 20 March 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

REVIEW ADDITIONS:  Behavioral Sciences (Manuscript ID: behavsci-4110819) 

The topic of this study is of interest to scholars studying early language acquisition, as it affords analysis of the potential effects of both pitch and tone as they occur in input to Mandarin child learners. Mandarin input affords observation of both pitch and tone dimensions of speech input from adult speakers in the children’s environment. The study focusses on 3–6-year-old kindergarten-age children who have already acquired basic language skills in their dominant language, enabling participation in a task-related set of responses that contained a balanced input level across the child participants, in contrast to the capacities of younger participants with less developed language and cognitive capacities. An additional strength of the study is that the authors focus on child responses rather than adult speaker characteristics, allowing consideration of the effects of pitch and tone on children’s interest and responsiveness to language input in a critical language learning phase.

Methods employed in constructing and executing the study were missing some aspects of detail that would enhance the reader’s understanding and precision in evaluating study results. For example, the authors describe a study population of 94 children and give a list of four participant inclusion criteria employed to select participants (page 6).  Criteria 2, 3, and 4 are listed with no information at all about how the children were tested to determine eligibility on these criteria.

The speech stimuli involved were described as being read at “high”, “normal”, and “low” pitches by a female who recorded the speech stimuli “with Mandarin Level 2A” (without definition of what Level 2A indicates). While the authors explicate the F0 method of discriminating high, mid, and low pitches, they do not define the quantities of “rate” and “amplitude”, leaving the reader with an imprecise understanding of what these parameters mean relative to the goal and findings of their study.

In contrast to missing information on definition of terms in the study parameters (discussed above), data collection and analysis procedures were precisely and helpfully described, a positive aspect of the lucidity of the narrative.

Data analysis results, including the Tables and Figures provided, were a strength of the narrative. In combination with the Discussion section, the findings of the study were well established and clear to the reader. Establishing basic parameters of pre-school children’s response to prosodic parameters in input, particularly in a language that leans heavily on prosody to support meaning in early language development, is an important contribution, particularly as there are far fewer studies of child responsiveness than there have been of ADS and IDS speech styles produced by adult speakers to consider adult input. More work is needed to understand the effects of input prosody on child responses. Precision in definitions of study terms would help in understanding the meaning of the present study results.


EXAMPLE OF TERMS USED BELOW: page # (P), line # (L), comment (C)

P 2, L 104, C omit "s"

P3, L104, C insert range "of" F0;  L113, C omit "the";  L117, C omit "It is found that";  L120, C substitute "between" for "of";  L 131, C insert "the";  L 136 C substitute "have" for "has"

P4, L149, C insert "of" between "range" & "F0"; L 167, C insert "or" between "tone" & "durations"; L178, C insert "s" at the end of "nurture"

P 6, L 200, C substitute "between" for "and"; L 202, C insert "there were"; omit "aged"; L 203, C omit "aged"; omit "are instances of"; L 215, C substitute "because" for "due to"; L 221, C substitute "exhibits" for "have"; L 241, C substitute "the" for "a"

P 8, L 266, C substitute "other" for "cross-linguistic"

P 10, L 314, C correct "mad" to "made" ; L 330, 331 & 333 C omit "s" in "years"

P 11, L 338, C omit "alternative"

P 13, L 389, C substitute "have" for "do" & "mastered" for "Master"

P 14, L 438 C substitute "in an" for "under"

Comments on the Quality of English Language

There are many infelicities and errors in written English across the narrative. I would strongly suggest that a native English speaker edit your manuscript before you submit it.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript addresses an interesting and potentially important question concerning preschool children’s pitch preferences in Mandarin and its implications for child-directed speech. However, the paper suffers from substantial conceptual, theoretical, and methodological problems that fundamentally weaken its contribution. In particular, the literature review lacks clear organization and theoretical precision, key constructs such as child-directed speech, pitch, tone, and intonation are repeatedly conflated, and several central claims are not adequately supported by the cited literature or the empirical results. In addition, there are serious issues regarding construct validity, data coding transparency, and the logical coherence of the conclusions drawn from the findings. Taken together, these issues prevent the manuscript from providing a clear or convincing answer to its stated research questions. For these reasons, I recommend rejection of the manuscript in its current form.

The literature review needs to be substantially reorganized to clarify the theoretical motivation of the study.


Lines 35–46: The opening paragraph does not clearly state its main point, and it is unclear how this paragraph motivates the research questions. The authors should explicitly articulate what argument they are building and how this paragraph fits into the overall structure of the introduction.

Lines 49–67: Two potential reasons are proposed for why teachers change their pitch when talking to preschool children, but the two explanations sound highly overlapping. It would be helpful for the authors to clearly distinguish these factors and explicitly state how they differ conceptually and empirically.

Lines 67–73: The discussion of worn voices and vocal fatigue appears disconnected from the purpose of the study. The authors should clarify how this point relates to the research questions or remove it if it is not directly relevant.

Lines 83–91: The current framing conflates child-directed speech with pitch modulation. Pitch is only one component of CDS, rather than CDS itself. Attributing the effects of CDS on children’s speech perception primarily to pitch risks obscuring the contributions of other acoustic, prosodic, and interactional features that co-occur in CDS.

Lines 106–108: The claim that enhanced pitch facilitates children’s vocabulary growth is not clearly supported by the cited literature. My reading of the referenced work suggests that CDS interventions may mediate vocabulary development, but that pitch alone has not been shown to play such a role. The authors should carefully re-examine the cited studies and revise this claim accordingly.

Lines 111–123: The studies reviewed here demonstrate that maternal pitch is a reliable and developmentally sensitive correlate of CDS and child age. However, these findings primarily establish pitch as a descriptive marker of caregiver adaptation, rather than as a functional mechanism supporting language learning. The relevance of this literature to the paper’s central research question remains indirect and would benefit from clearer theoretical linkage.

Lines 159–174: This paragraph addresses an important issue concerning pitch and tone in Mandarin CDS, but its logic and terminology require clarification. Pitch, lexical tone, and intonation are not clearly distinguished and are sometimes treated interchangeably. Evidence for higher pitch in CDS is presented alongside findings showing no significant enhancement of tonal contrast, yet the relationship between these results is not adequately explained. The abrupt shift to specific tone contrasts (e.g., T1–T4 vs. T2–T3) and the concluding claim about balancing tone between syllables are insufficiently motivated.

Table 4: The manuscript does not clearly explain how dependent variable values ranging from 0 to 2 were derived. Given that individual trials involve binary “like/dislike” responses, it appears that responses may have been summed across multiple sentences within each condition; however, this scoring procedure is not explicitly described. The authors should clearly state how responses were coded, how many trials contributed to each condition, and how the reported means and standard deviations were calculated.
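
The scoring procedure conjectured in the Table 4 comment can be illustrated with a minimal sketch. It assumes, as the comment speculates and the manuscript does not confirm, that each condition contained two sentences and that responses were coded like = 1 and dislike = 0, so that a per-child condition score ranges from 0 to 2. The function name and the data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def condition_score(responses):
    """Sum binary like (1) / dislike (0) responses for one child in one condition."""
    if any(r not in (0, 1) for r in responses):
        raise ValueError("responses must be coded 0 (dislike) or 1 (like)")
    return sum(responses)


# Hypothetical data: three children, two trials each within a single condition.
trials = {
    "child_a": [1, 1],  # liked both sentences
    "child_b": [1, 0],  # liked one sentence
    "child_c": [0, 0],  # liked neither sentence
}

# One score per child, each between 0 and the number of trials (here 2).
scores = [condition_score(r) for r in trials.values()]

# Condition-level mean and sample standard deviation across children.
condition_mean = mean(scores)
condition_sd = stdev(scores)
```

Under this assumed coding, reporting the number of trials per condition alongside each mean would let readers see how a value between 0 and 2 maps onto a proportion of "like" responses.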

Line 437 (Conclusion): The conclusions drawn are not logically supported by the results. The data show no difference between high and normal pitch, yet the authors conclude that kindergarten teachers should modify their pitch and avoid both high and low pitch. If children dislike low pitch but do not dislike high pitch, it is unclear why high pitch should be avoided. These recommendations are not justified by the reported findings and should be reconsidered or removed.

 

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The sentences are generally easy to understand; however, the logical flow between them is difficult to follow. It would be helpful to review how each sentence connects to the next to ensure logical coherence and clear relevance to the research questions.

     

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Very interesting paper showing that Mandarin-speaking children have no significant preference for high-pitch over normal-pitch speech but have a clear dislike for low-pitch speech. I recommend the paper for publication with minor revisions. In the attached review, I have left specific comments regarding various aspects that need improvement (some unclear passages that need refinement, various typos, etc.). While the paper is important in showing children's attitudes towards pitch, I wonder what effects this may have upon learning; it would be nice if the authors engaged with this question as well.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

There are several English grammar mistakes in the paper; I have left various comments in the paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The study aims to investigate whether Mandarin-speaking preschool children prefer high, normal, or low pitch. To this end, the researcher carried out a perceptual experiment in which Mandarin children were asked to choose between liking or disliking the pitch of the stimuli (sentences with T1, T2, T3, T4, or mixed tone) that they heard. This paper is interesting and useful because, so far, few studies have focused on the pitch preference of these children.

 

General concept comments

As mentioned above, I consider that the ideas presented in this paper are interesting and that the research would benefit the relevant field. However, to further emphasize the originality of this paper, I think it is essential to clarify and strengthen the results and discussion sections, especially regarding the statistical analysis. To that end, I have also provided detailed comments in the comments section of the relevant page, so please take a moment to review them.

It should also be noted that no limitations of the study are stated in the paper. I believe that all studies have some limitations. Therefore, I suggest including some limitations and, if applicable, making recommendations for further research. For example, you only have one female speaker, but her age is not specified. Depending on whether she is older, her voice frequency might differ. In that case, would the results be the same as those obtained in this study? Furthermore, what would the results have been if there were multiple speakers instead of just one? One could argue that there is insufficient data to draw conclusions about whether children who are native Mandarin speakers prefer higher voices over lower ones based on a single speaker. It might be worthwhile to discuss these points.

 

Specific comments 

Regarding the entire article:

  • As for apostrophes, I found two types: ' and ’. I believe it would be more readable to unify them to one style.
  • Regarding quotation marks, I found two types: “” and "". I believe it would be more readable to unify them to one style.

I have not specifically commented on the abstract; however, considering the individual points mentioned below, I believe it would be beneficial to reconsider the content of the abstract.

Suggested improvements or changes in each section (except comments on English Language)

  • (p.1, line 39) Typo: You need to insert a space between “vocal organs” and “(Xu, 2019)”.
  • (p.2 line 59) “Szabo Portela et al. 2013).” => “Szabo Portela et al. 2014).”
  • (p.3 line 99-100) “The same facilitative results were obtained in intervention studies of child-directed speech.”
    Whose studies are being referred to here? Is this referring to the studies cited in the next sentence? I believe it would be clearer for readers if the specific researchers were identified in this part.
  • (p.3 line 112) “Brightness”: I think this term may be unclear to those who are encountering this field for the first time in this paper. Therefore, I recommend including an explanation.
  • (p.3 line 112) The phrase "As a special tonal language, Mandarin has four tones" could be clarified. What is meant by “special” in this context? It would be beneficial to elaborate on what makes Mandarin unique, specifically regarding its tonal aspects.
  • (p.3 line 130-131) “like vowels, consonants and other phonemic segments,” => “phonemic segments”
  • (p.4 line 145) “in early childhood, and” => “in early childhood, since”
  • (p.4 line 152) Are you referring to “some researchers” as Gauthier & Shi (2011)? If so, I think it would be better to use a narrative citation here to clarify that you mean those researchers for the readers. (For the narrative citation, please see https://mdpi-res.com/data/the-mdpi-apa-reference-list-and-citations-style-guide-v2-2025.12-online.pdf)
  • (p.4 line 169) “the control” => “the contrast”?
  • (p.4 line 171-173) “However, because of… T1 and T4”: Honestly, I found this sentence unclear, so I suggest rewriting it to make it easier for readers to understand.
  • (p.4 line 180) Are you referring to “studies” as Steen & Englund (2022)? If so, I think it would be better to use a narrative citation here to clarify that you mean those researchers for the readers.
  • (p.4 line 190) “Mandarin”: I suggest changing the paragraph starting from this word.
  • (p.5 line 196-197) I kindly request a reconsideration of the relevant section for the following reasons:
    1. I believe your first hypothesis is too vague. It would be more effective to specify what you mean by “clear preference for pitch”, such as whether participants prefer higher pitches or lower pitches. Without this specificity, it is difficult to explain, from the perspective of hypothesis validation, why you conducted perceptual experiments using three pitches (high, normal, and low) and statistically analyzed the results.
    2. I honestly do not understand what “vocal pitch” refers to in your second hypothesis. I believe that you need to clarify it.
    3. I believe that the term “Mandarin” is missing from both hypotheses. Without this term, it becomes unclear to readers why you included a section on “tone of Mandarin” in your literature review.
    4. This section lacks a description of the interactions between lexical tones and pitch, as well as between the age of participants and pitch—elements that you statistically analyzed. I believe that statistical analysis is meant to demonstrate whether your hypotheses are supported. If you investigated elements unrelated to the hypotheses, you should explain why these independent variables were included in your statistical analysis before presenting the results. As for participants’ ages, it is not clear to me why you conducted statistical analyses by dividing the participants into these three age groups, as this is not evident from your literature review.
  • (p.6 line 224-225) The next page lists who has read the text, but I believe it would be clearer for the readers if this information were anticipated here.
  • (p.7 line 231-232) I suggest providing an explanation about “Mandarin Level 2A” for readers, especially those who do not know Mandarin.
  • (p.7 line 245) “pitch” => “tone”?
  • (p.7 line 250-251) Were the F0 and intensity values extracted from the midpoint of each sentence? If so, please specify that.
  • (p.8 line 254) “According to the average Chinese speech fundamental frequency norm” Please specify the source of this information.
  • (p.8 line 262) “Analysis” => “Statistical analysis”
  • (p.8 line 262-273) In this section, I believe you should provide more detail about the statistical methods used. Table 3 mentions a paired-sample statistic comparison, but why was a t-test employed from the start? Or did you use ANOVA and then conduct post-hoc tests based on those results? Additionally, how can you claim “(line 264) Main acoustic parameters”? I believe one of the acoustic correlates of Mandarin lexical tone is duration (cf. p.2 of Chang H-S, Lee C-Y, Wang X, Young S-T, Li C-H, Chu W-C (2023) Emotional tones of voice affect the acoustics and perception of Mandarin tones. PLoS ONE 18(4): e0283635. https://doi.org/10.1371/journal.pone.0283635). However, the acoustic parameters of the speech stimuli you presented do not include duration. It would be beneficial to clarify the reason for this omission.
  • (p.8 line 265) “Masataka (Masataka, 1992)” => “Masataka (1992)”
  • (p.9 line 275) “E-prime program”: please consider changing this title as Perceptual experiment.
  • (p.9 line 276) “36” => “30”?
  • (p.10 line 320-322) Please consider writing the three independent variables as “Age,” “Pitch,” and “Tone” to maintain consistency with Tables 5 and 6, using capital letters for the initial letters.
  • (p.11 line 337) “the within-subject variables (pitch and pitch × tone)” => “the within-subject variables, namely pitch and pitch × tone,”
  • (p.11 line 339) “ANOVA”: please consider specifying three-way ANOVA
  • (p.11 line 340)
    1. “was extremely significant” => “was significant”.
    2. “p”: In the text, the “p” in p-values is presented both in italics and non-italics. Personally, I prefer the italicized format; however, setting aside my preference, I believe it should be unified to one format.
  • (p.11 line 342-343) “The third-order effect” => “Three-way interaction”?
  • (p.11 line 343-344) “Since the main effects of pitch and tone were significant and both had two or more levels”: I believe that you need to add: “the interaction between Pitch and Age was also significant”
  • (p.11 line 344) Please specify which adjustment method you used when conducting the multiple comparisons.
  • (p.11 line 346) Please consider changing the title of Table 5 from “Spherical Test for the Pitch Preference of the Preschools” to “Results for Sphericity Test Hypothesis.”
  • (p.11 line 347) Please consider changing the title of Table 6 from “ANOVA for the pitch preference of the preschools.” to “Results for three-way ANOVA.”
  • (p.12 line 361-362) “1.23±0.04…” Unlike the entry on p. 11, line 350, “1.49 ± 0.04…”, there is no space between the number and the ± sign here. Please consider unifying them into one style.
  • (p.13 line 375-376) “the preference for high and normal pitches did not change significantly with age”: I believe this part contradicts what was stated in the previous sentence, so I kindly request reconsideration.
  • (p.13 line 381) “significant dislike”: This term appears multiple times, not just in this sentence, and I believe the expression “significant dislike” should be revised. For example, you could use “significant difference between three pitch heights.” Since like/dislike is one of your dependent variables, using “significant” might imply a statistically significant difference. If you intended to say that participants strongly dislike the low pitch, it would be better to use another expression to avoid confusion.
  • (p.13 line 384) According to the results of your statistical analysis on p. 12, a significant difference was found between T2 and the mixed tone. Therefore, it appears to me that this statement contradicts those results.
  • (p.13 line 401-402) “A significant interaction effect of pitch and age was seen only for low pitch”: It seems to me that this sentence contradicts the results of your statistical analysis. For high pitch, there was a significant difference between the small-age group and the large-age group, am I correct?
  • (p.13 line 402-403) “Preschool children had similar preference for both high and normal pitches”: It appears to me that this sentence contradicts the results of your statistical analysis, which revealed a significant difference, for high pitch, between the small-age group and the large-age group, am I correct?
  • (p.13 line 407-408) It seems to me that this sentence merely describes the phenomenon that preferences for pitch vary by participants’ age, but it does not explain why this phenomenon occurs. I kindly request a reconsideration of this statement.
  • (p.14 line 412 and 416) Following APA style, I think you need to use narrative citations.
  • (p.14 line 420) It appears to me that this sentence contradicts the results of your statistical analysis on p. 12: a significant difference was found between T2 and the mixed tone.
  • (p.14 line 420) “Children who speak tonal languages” => “Children who speak tonal languages as their mother tongue”
  • (p.14 line 440-441) It appears to me that this sentence contradicts the results of your statistical analysis on p. 12: a significant difference was found between T2 and the mixed tone.
  • (p.14 line 448) “Huang ;” => “Huang;”
  • (p.15 line 483-484) The journal name, Journal of the Acoustical Society of America, is fully written out here, but it is abbreviated in other citations. I believe it would be better to unify the format to one style.
  • (p.15 line 485) “10.1111/infa.12006” => “797-824”
  • (p.15 line 492) “El380-” => “EL380-”
  • (p.16 line 500) “NH, D. E. J.,” => “De Jong, NH.,”
  • (p.16 line 508) “e923-948.e929” => “e23-948.e29”
  • (p.16 line 509) https://doi.org/10.1016/j.lin-gua.2012.02.005 => https://doi.org/10.1016/j.lingua.2012.02.005
  • (p.16 line 511) “12(1)” => “12”
  • (p.16 line 532) “507.e501-505” => “507.e1-507.e5”
  • (p.16 line 535) “J Voice.” => “J Voice, 39(1), 105-112.”
  • (p.17 line 563-564) https://www.sciencedirect.com/sci-ence/article/pii/S0273229716300429 => https://www.sciencedirect.com/science/article/pii/S0273229716300429
  • (p.17 line 575) “4464” => “4464-4473”
  • (p.17 line 577) “3169” => “3169-3183”
Comments on the Quality of English Language

I have a question regarding the tense used, particularly whether some verbs in the present tense on pages 3 and 4 should be written in the past tense. However, since I am not a native English speaker, I recommend having a native English speaker check the paper’s language.

 

Suggested improvements or changes in each section

  • (p.3 line 136) “has” => “have”.
  • (p.4 line 148-149) I noticed that the verb “use” is repeated twice in the same sentence, and I believe it would be better to avoid this repetition.
  • (p.4 line 152-156) I recommend rewriting this sentence, as it is unclear what “which” refers to in line 154.
  • (p.4 line 163) “Mothers over-articulated both” => “Mothers over-articulated to both”
  • (p.4 line 167) “tone, duration” => “tone and duration”
  • (p.4 line 193-195) Is the subject of this sentence “a balance of utterance… and syllable tone”? If so, the verb needs to be plural: “were”.
  • (p.6 line 215) “Due to” => “Due to the fact that”
  • (p.7 line 235) “by females” => “by a female”? (You had only one female native Mandarin speaker, am I correct?)
  • (p.8 line 262) “show” => “showed”
  • (p.8 line 266) “Tagalog, Korean)” => “Tagalog, and Korean)”
  • (p.9 line 281) “sentence” => “sentences”?
  • (p.16 line 505) Typo 1. “Pereference” => “Preference”. Typo 2. “Stidies” => “Studies”

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

Firstly, I would like to sincerely thank the authors for taking the time to read each of my comments carefully, for taking them seriously, and for making the necessary corrections in the respective areas. I truly appreciate the thoughtful responses I received from you, even though reading the reviews and revising the paper accordingly can be quite labor-intensive and stressful. After reading your cover letter, I also read your revised paper, and I found it to be much more readable and engaging. To further enhance the readability of your manuscript, I would like to offer a few suggestions, and hope that they will be useful.

 

  • (p.1, line 28-29) “a strong and significant aversion to low-pitch speech”: I apologize for insisting again, but I believe the expression “significant aversion” should be revised. I kindly request that you consider eliminating “and significant.” As I mentioned in my previous comment, since “like/dislike” is your dependent variable, using “significant” might imply a statistically significant difference.
  • (p. 5, line 242) “Some evidence”: I kindly suggest that you cite a reference.
  • (p. 10, line 369) “the average Chinese speech fundamental frequency norm”: I kindly suggest that you cite a reference.
  • (p.13, line 467, 473) I kindly request that you capitalize each word: “Pitch”, “Pitch × Tone” (line 467); “Pitch”, “Tone”, and “Age” (line 473). This is to maintain coherence with p.12, and Tables 5 and 6.
  • (p.15, line 508 and 510) “reaction time” (line 508); “looking time” (line 510): These two measures appeared only in the results section. However, I believe it would be beneficial to mention in the methodology section that both reaction time and looking time were measured to avoid their sudden appearance in the results section. Furthermore, I kindly request that you consider specifying that these two measures were not your dependent variables; instead, they were only used to analyze “like/dislike response,” which is your dependent variable.
  • (p.16, line 548) “An alternative explanation”: While the term “alternative” is used, the previous paragraph discusses not an explanation for your findings, but rather your experimental results. Therefore, I kindly suggest considering changing “An alternative explanation” to “A possible explanation.”

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
