1. Introduction
Pseudowords are widely used in research across psycholinguistics, linguistics, and cognitive neuroscience. As nonwords that conform to the phonological and orthographic constraints of a language but lack semantic content (e.g., “badoti”), they provide a powerful means of investigating cognitive mechanisms such as phonological decoding, word recognition, reading, and natural and artificial language acquisition. Their use in research dates back to early experimental psychology. Ebbinghaus [
1] is often credited as a conceptual forerunner, employing nonsense syllables (e.g., “bok”, “dap”) to study learning and forgetting while minimizing confounds related to prior knowledge and meaning. Pseudowords became fully integrated into experimental research in the mid-20th century through influential contributions such as Broadbent’s [
2] use of nonsense materials in dichotic listening tasks to study selective attention to auditory streams, and Berko Gleason’s [
3] seminal “Wug Test,” which used pseudowords (e.g., “wug”) to assess morphological competence independently of lexical familiarity. Since then, pseudowords have become central not only to controlled experimental investigations, but also to the assessment of language and literacy skills and to research on language-related disorders, including developmental language disorder, dyslexia, and dysgraphia (see [
4, 5, 6, 7]).
A major domain of application concerns artificial language learning paradigms, such as the triplet-embedded statistical-learning paradigm introduced by Saffran and colleagues [
8]. In this paradigm, participants are familiarized with a continuous stream of syllables (e.g., “gikobatokibutipolugopilatokibu”), in which trisyllabic pseudowords (e.g., “gikoba”, “tokibu”, “tipolu”, “gopila”) are embedded without pauses or explicit boundary cues. Transitional probabilities are typically higher within pseudowords than across boundaries (e.g., 1.0 vs. 0.33), allowing learners to segment the stream based solely on distributional regularities. Learning is then assessed in a test phase, often using a two-alternative forced-choice (2-AFC) task, by asking participants to discriminate familiar “words” from foils such as part-words (sequences spanning pseudoword boundaries; e.g., “kobato”) or nonwords (novel recombinations of the syllables used in the stream; e.g., “gitoti”). Many studies have since used the original paradigm or closely related variants to examine how learners detect recurring patterns in auditory and/or visual input and build phonological and grammatical representations in the absence of explicit instruction (e.g., [
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]).
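The distributional cue described above can be made concrete with a short computation. The following Python sketch is an illustration only, not the procedure of any cited study: it estimates forward transitional probabilities, P(next syllable | current syllable), from a stream built from the four example pseudowords. The function name and the word order are assumptions, chosen so that the word "gikoba" is followed equally often by each of the other three words.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TPs: P(b | a) = count(a followed by b) / count(a with any successor)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Example pseudowords from the text, split into their CV syllables.
WORDS = {
    "gikoba": ["gi", "ko", "ba"],
    "tokibu": ["to", "ki", "bu"],
    "tipolu": ["ti", "po", "lu"],
    "gopila": ["go", "pi", "la"],
}

# Illustrative (assumed) concatenation order, with no pauses between words.
ORDER = ["gikoba", "tokibu", "gikoba", "tipolu", "gikoba", "gopila"]
stream = [syl for word in ORDER for syl in WORDS[word]]

tps = transitional_probabilities(stream)
print(tps[("gi", "ko")])            # within-word transition: 1.0
print(round(tps[("ba", "to")], 2))  # across a word boundary: 0.33
```

In this toy stream the within-word transition is fully predictable, whereas the transition out of "ba" is split across three continuations, which reproduces the 1.0 vs. 0.33 contrast mentioned in the text.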
Because pseudowords can be constructed and manipulated in highly systematic ways, artificial-language-learning research has used them to probe how multiple linguistic properties (such as transitional probabilities, phonotactic probability, syllable structure, or morphological complexity) shape learning, offering insights into mechanisms relevant for natural language acquisition and processing (e.g., [
23,
24,
25,
26,
27]). However, like words, pseudowords are complex stimuli. Their construction requires careful control over multiple dimensions, including the legality of letter/phoneme/syllable sequences, the frequency with which those sequences occur in the lexicon, grapheme–phoneme correspondence regularity, and the density of orthographic and phonological neighbors [
28,
29,
30,
31,
32]. For instance, pseudowords that are phonologically identical or very similar to existing words (e.g., pseudohomophones such as “brane” for brain) are processed more slowly and are harder to reject as nonwords, yielding robust pseudohomophone effects [
33,
34]. Likewise, pseudowords derived from base words via letter substitution (e.g., “brone” from brave) or transposition (e.g., “jugde” from judge) show reliable processing costs relative to entirely novel forms [
35,
36] (see [
37] for a recent overview). These findings underscore that similarity to real words and sublexical distributional structure can introduce confounds if not carefully controlled.
A further methodological issue is that pseudowords are often created and controlled primarily in their written form. While this is appropriate for many reading-focused questions, written control may fail to capture phonetic and suprasegmental factors that shape spoken-language processing, including syllable structure, phonotactic constraints in speech, and patterns of prominence and stress, especially in languages with variable stress assignment such as European Portuguese (EP; see [
38,
39,
40,
41]). To address this limitation, researchers frequently generate auditory tokens of orthographically created pseudowords, sometimes via text-to-speech synthesis (e.g., [
42,
43]; see [
44] for a different approach). In addition, many statistical-learning paradigms deliberately counterbalance syllables across positions (e.g., “gikoba”, “kobagi”, “bagiko”) to reduce confounds arising from segmental or distributional properties of specific syllables and their combinations (e.g., [
19,
20,
45,
46,
47]).
However, as Siegelman et al. [
48] pointed out, such paradigms often assume that learners have no prior knowledge of the artificial language. This assumption is problematic because the stimuli are typically composed of syllables from participants’ native language and, therefore, carry distributional regularities that may affect segmentation and learning. For example, Elazar et al. [
9] showed that, in Spanish, the frequency with which syllable combinations occur in the native language influences how participants segment auditory streams composed of trisyllabic nonsense words (see also [
49,
50]). These findings challenge “blank slate” assumptions and highlight the importance of quantifying the degree of similarity between experimental pseudowords and the listener’s existing lexical patterns.
This similarity is often referred to as wordlikeness, and it has been operationalized in multiple ways (e.g., [
51,
52,
53,
54,
55,
56,
57,
58]). Classic accounts link wordlikeness to phonotactic probability, i.e., the likelihood or frequency with which phonemes and phoneme sequences occur in specific positions within words in a language [
54,
59,
60] (see also [
59]) and to neighborhood density, i.e., the number of real words that are phonologically similar to a given form [
60]. Converging evidence suggests that higher phonotactic probability and denser lexical neighborhoods tend to yield higher wordlikeness judgments (see [
51] for a review). Bailey and Hahn [
51], for instance, collected explicit acceptability judgments (“how good would this be as a word of English?”) and showed that wordlikeness is not reducible to a single cue: both phonotactic probability and lexical similarity contributed independently, with lexical similarity often emerging as the stronger predictor. Evidence from auditory lexical decision further supports this view: high-probability/high-density nonwords are typically harder to reject, consistent with stronger activation of multiple lexical candidates [
61]. Yet, the factors shaping wordlikeness ratings are not yet fully understood and may vary across languages and stimulus sets, suggesting that additional cues beyond segmental distributional structure may contribute to what “sounds like a word”.
A factor that has been largely overlooked in wordlikeness judgements is the extent to which pseudowords conform to the dominant stress pattern of a given language, a prosodic cue that is particularly relevant in languages with variable stress, as in EP [
38,
39,
41]. Indeed, previous studies have shown that speakers of languages with fixed stress, such as French, where stress falls on the last syllable, were less efficient than speakers of languages with variable stress, such as Spanish, at distinguishing pseudowords that vary only in stress position (e.g., nuPi vs. NUpi—uppercase letters are used for illustrative purposes), showing stress “deafness” (e.g., [
62,
63,
64,
65,
66]). Nevertheless, it is important to note that lexical stress is signaled by multiple acoustic correlates, including duration, fundamental frequency (F0), intensity, and vowel quality [
41,
67], and that the relevance of these cues in stress detection can differ across languages (e.g., [
68,
69]). For example, in English, F0-related prominence (pitch) is often described as a major cue to stress perception (e.g., [
70,
71]), whereas in EP, vowel quality (i.e., vowel reduction) is typically assumed to play a critical role in stress discrimination, since, when it is minimized in the input, EP speakers show a “stress-deafness-like” pattern resembling what has been reported for speakers of languages with largely predictable (fixed) stress (see [
72]). Still, in a subsequent study that examined whether native EP speakers show the stress “deafness” pattern at pre-attentive (event-related potential, ERP) and attentive (behavioral) levels when vowel-quality (vowel reduction) cues are unavailable, Lu et al. [
73] found that stress contrasts between trochaic (“BUbu” [’bubu]) and iambic (“buBU” [bu’bu]) pseudowords elicited the ERP mismatch negativity (MMN) component and a subsequent late negativity, indicating that EP listeners can discriminate stress pre-attentively even when vowel reduction is absent. Moreover, Lu et al. [
73] reported an iambic advantage, with larger and more sustained neural responses for iambic patterns, converging with behavioral evidence that iambic stress is processed more efficiently than trochaic stress even under these cue-reduced conditions (see also [
74,
75] for further evidence of an iambic stress advantage in EP infants).
The present work builds on these considerations with a practical goal: to provide a normative resource of wordlikeness judgments for EP auditory pseudowords that can support reproducible stimulus selection and matching in speech-oriented paradigms. Specifically, we introduce the Minho Pseudoword Wordlikeness Ratings (MPWR), the first normative dataset of wordlikeness ratings for EP auditory trisyllabic CV pseudowords. We compiled a set of 120 trisyllabic items (CV.CV.CV) assembled from naturally produced syllables drawn from the Minho Spoken Syllable Pool (MSSP; [
76]), an EP resource providing high-quality recordings of 266 CV syllables along with linguistic/acoustic annotations and corpus-derived syllable-frequency counts based on SUBTLEX-PT [
77]. Trisyllabic CV strings were selected because they provide a useful balance between simplicity and representational richness and because they closely match the stimulus structure widely used in statistical learning and speech-stream segmentation research following Saffran et al. [
8] (see [
78] for a meta-analysis).
To increase the utility of the resource across a broader range of auditory paradigms, each pseudoword was implemented in three token types (120 × 3 = 360 auditory tokens) while keeping segmental composition constant: a baseline (flat) condition and two F0-enhanced versions (+15%), which increase pitch prominence on either the penultimate or the final syllable. This manipulation is widely used as a controlled experimental cue to highlight syllabic prominence in auditory materials, particularly when stimuli are presented in isolation or under cue-reduced conditions, while preserving segmental identity across conditions. It should not, however, be interpreted as modeling the full acoustic realization of lexical stress in EP, which normally reflects multiple interacting cues such as vowel quality, duration, intensity, and F0 [
38,
39,
41].
Wordlikeness ratings (“How good would this be as a word of EP?”) were collected from native EP speakers for all tokens. Normative item means were computed after standard data screening and trimming procedures. In addition to the normative ratings, the MPWR provides syllable-based corpus metrics that facilitate stimulus characterization and enable exploratory analyses of rating variability. Because phoneme-level phonotactic probability and comprehensive lexical-neighborhood measures—central to much previous wordlikeness research (e.g., [
51,
59,
60,
61,
79])—are not yet available in directly comparable form for EP auditory pseudowords, we operationalized sublexical familiarity at the syllabic level using MSSP-derived norms. Importantly, these measures are not equivalent to phoneme-level phonotactic probability in the strict sense, nor do they replace lexicon-wide neighborhood metrics. Rather, they provide complementary descriptors that are particularly well suited to the structure of the present materials. Our stimuli are auditory trisyllabic CV pseudowords assembled from naturally produced spoken syllables, and the MSSP allows these syllabic building blocks to be characterized in terms of their corpus-based distributional familiarity, both across EP more broadly and within trisyllabic words specifically. In addition, for the F0-enhanced tokens, the same resource makes it possible to estimate how often the targeted syllable occurs in stressed position in trisyllabic EP words. Thus, the present indices capture aspects of syllable-level familiarity and stress-position regularity that are directly aligned with the auditory and prosodic structure of the stimuli. More broadly, psycholinguistic tools developed for the written domain, such as Wuggy [
80] and SYLLABARIUM [
81], show that syllabified and subsyllabic structure can provide a principled basis for generating controlled nonwords, further supporting the relevance of descriptors defined at this level.
Specifically, based on MSSP counts, we computed (i) a general syllable-familiarity index capturing overall syllable frequency independent of prominence marking, and (ii) a prominence-position index quantifying how often the syllable targeted by the F0 enhancement occurs in stressed position in trisyllabic words, based on MSSP stress-position counts. Because baseline tokens do not explicitly cue prominence via pitch, prominence-position metrics are only defined for the two F0-enhanced token types, where the target syllable is explicitly highlighted.
In sum, the MPWR provides the first normative dataset of wordlikeness ratings for EP auditory trisyllabic pseudowords, together with unified item-level metadata and syllable-based corpus metrics derived from MSSP. The resource is designed to support reproducible stimulus selection and matching in research on spoken word recognition and learning. The inclusion of optional F0-enhanced tokens broadens applicability to paradigms that benefit from an explicit, controlled prominence cue while maintaining constant segmental content.
3. Results
All analyses were conducted on item-level normative ratings. For each of the 120 trisyllabic pseudowords, we derived an average wordlikeness rating for each of the three prosodic realizations (flat, middle-pitch, and final-pitch). These item means were computed after applying a series of data-screening and trimming procedures designed to ensure the integrity of the normative estimates, following previous Portuguese norming studies (e.g., [
40,
82,
83,
84]). Ratings that deviated by more than ±2.5 standard deviations from the mean of each pseudoword and prosodic condition were removed, thereby minimizing the influence of occasional extreme judgments on the final estimates. This outlier trimming affected a small proportion of trials (flat: 1.3%, middle-pitch: 1.6%, final-pitch: 1.5%), leaving 3988 valid responses in the flat condition, 3976 in the middle-pitch condition, and 3948 in the final-pitch condition (an average of 33 valid ratings per item across conditions). The trimmed mean rating for each token was then computed from these valid responses. The normative wordlikeness values for each of the 120 pseudowords in the flat, middle-pitch, and final-pitch token types can be downloaded from the OSF repository at https://osf.io/5at7b/overview?view_only=42151fffef4e488ba45d62a1e84a7860 (accessed on 1 April 2026) and from the University of Minho server at https://s3.eu-central-1.amazonaws.com/files.cipsi.uminho.pt/s3fs-public/2026-01/MPWR_files.zip?VersionId=a.YPvpxgNDYwmS9qpKRx3cMPC6IUpqG1 (accessed on 1 April 2026), and are also available as Supplementary Materials associated with this paper.
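The trimming rule described above can be sketched as follows. This is a minimal Python illustration, not the authors' script: it assumes that the ±2.5 SD criterion is applied once per pseudoword-by-token-type cell using the sample standard deviation, and the ratings shown are hypothetical.

```python
from statistics import mean, stdev


def trim_ratings(ratings, k=2.5):
    """Keep only ratings within k sample SDs of the cell mean (single pass)."""
    if len(ratings) < 2:
        return list(ratings)
    m, sd = mean(ratings), stdev(ratings)
    if sd == 0:
        return list(ratings)
    return [r for r in ratings if abs(r - m) <= k * sd]


# Hypothetical ratings for one token on the 7-point scale (not MPWR data).
ratings = [2] * 19 + [7]
kept = trim_ratings(ratings)
trimmed_mean = mean(kept)
print(len(kept), trimmed_mean)
```

Here the single rating of 7 lies more than 2.5 SDs above the cell mean and is dropped, so the trimmed mean is computed over the remaining 19 ratings.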
In the database, pseudowords are listed alphabetically and are numerically indexed (MPWR_Pseudoword_ID, 1–120). For each item, we provide IPA phonetic transcriptions (PH_t) for the three token types: PH_t_Flat, PH_t_Middle-Pitch, and PH_t_Final-Pitch. In the two pitch-enhanced token types, a prime mark (′) precedes the syllable that received the localized pitch manipulation (i.e., the targeted prominence location). This mark is absent in the flat token type, which contains no localized prominence cue. For example, [ba′dɔti] in PH_t_Middle-Pitch indicates F0 enhancement on syllable 2, whereas [badɔ′ti] in PH_t_Final-Pitch indicates F0 enhancement on syllable 3; the corresponding flat transcription is [badɔti]. The wordlikeness ratings are then summarized for each condition: mean, standard deviation, minimum, maximum, median, first and third quartiles, and 95% confidence intervals.
In addition to the normative ratings, the database includes syllable-based corpus metrics derived from the MSSP norms for the three constituent syllables of each pseudoword (PH_s1, PH_s2, PH_s3). For each syllable, we report type and token frequency counts computed (i) over the full corpus (e.g., MSSP_Syll_freq_type_all_s1, MSSP_Syll_freq_token_all_s1) and (ii) over the subset of trisyllabic words only (e.g., MSSP_Syll_freq_type_N3_s1, MSSP_Syll_freq_token_N3_s1). We also provide position-sensitive syllable frequencies indicating how often each syllable occurs in word position p1, p2, or p3 (e.g., MSSP_Syll_freq_type_N3_p1_s1, MSSP_Syll_freq_type_N3_p2_s1, MSSP_Syll_freq_type_N3_p3_s1, and corresponding token-frequency fields). Finally, to support stimulus characterization and exploratory modeling of rating variability, the database reports syllable counts in stressed positions within the trisyllabic corpus (e.g., MSSP_Syll_stress_type_N3_p1_s1, MSSP_Syll_stress_type_N3_p2_s1, MSSP_Syll_stress_type_N3_p3_s1).
As mentioned, because phoneme-level phonotactic probability and lexicon-wide neighborhood measures commonly used in wordlikeness work (e.g., [
51,
60,
61,
79]) are not yet readily available for EP pseudowords, we operationalized sublexical familiarity at the syllable level using MSSP token-frequency and stress-position counts [
76,
77]. Specifically, we computed two general syllable wordlikeness familiarity indices (SWI) for each pseudoword: (i) SWIAll, defined as the mean of ln-transformed token frequencies (natural logarithm, base e) of the three syllables (s1, s2, and s3) based on global MSSP counts (i.e., across the full SUBTLEX-PT-derived corpus underlying MSSP); and (ii) SWIN3, defined analogously using MSSP token counts restricted to trisyllabic words (N3). These indices provide two versions of syllable familiarity reflecting different baselines of distributional familiarity, operationalized by the following formulas:

$$\mathrm{SWI}_{\mathrm{All}} = \frac{1}{3}\sum_{i=1}^{3} \ln\bigl(f_{\mathrm{All}}(s_i) + 1\bigr), \qquad \mathrm{SWI}_{\mathrm{N3}} = \frac{1}{3}\sum_{i=1}^{3} \ln\bigl(f_{\mathrm{N3}}(s_i) + 1\bigr)$$

where $f_{\mathrm{All}}(s_i)$ and $f_{\mathrm{N3}}(s_i)$ denote the MSSP syllable token frequency for syllable $s_i$ (i = 1–3), computed from SUBTLEX-PT [77], using all words vs. the subset of trisyllabic words (N3), respectively. In the database, $f_{\mathrm{All}}(s_i)$ corresponds to the MSSP_Syll_freq_token_all_s1, MSSP_Syll_freq_token_all_s2, and MSSP_Syll_freq_token_all_s3 metrics, and $f_{\mathrm{N3}}(s_i)$ corresponds to the MSSP_Syll_freq_token_N3_s1, MSSP_Syll_freq_token_N3_s2, and MSSP_Syll_freq_token_N3_s3 metrics. The log transform with a +1 offset reduces the influence of extremely frequent syllables and yields a smoother familiarity scale comparable across items. If perceived wordlikeness reflects distributional familiarity, pseudowords composed of higher-frequency syllables should, on average, be judged as more plausible wordforms [51, 54].
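As a worked illustration of the index just described (the mean of ln-transformed syllable token frequencies with a +1 offset), the short Python sketch below computes both SWI variants for a single item. The function name and the frequency values are hypothetical and are not taken from the MSSP; in practice the inputs would be the MSSP_Syll_freq_token_all_s1 to _s3 and MSSP_Syll_freq_token_N3_s1 to _s3 fields.

```python
import math


def swi(token_freqs):
    """Mean of ln(f + 1) over the syllables of one pseudoword."""
    return sum(math.log(f + 1) for f in token_freqs) / len(token_freqs)


# Hypothetical token frequencies for s1, s2, s3 (not real MSSP values).
f_all = [12000, 350, 0]  # counts over all words
f_n3 = [900, 40, 0]      # counts over trisyllabic words only

swi_all = swi(f_all)
swi_n3 = swi(f_n3)
print(round(swi_all, 3), round(swi_n3, 3))
```

The +1 offset keeps the index defined for syllables with a zero count, which contribute ln(1) = 0 to the mean.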
Moreover, to analyze whether F0-based prominence marking aligns with distributional tendencies of stress placement in EP, we computed an exploratory index capturing the Stress-Position Propensity of the syllable targeted by the F0 enhancement (SPPmarked). This index was computed only for the two F0-enhanced token types, because the baseline (flat) tokens contain no explicit acoustic cue specifying which syllable is intended to be prominent. Let sm be the syllable carrying the F0 enhancement, with position p = 2 in the middle-pitch token type and p = 3 in the final-pitch token type. From MSSP stress-position counts restricted to trisyllabic words (N3), we extracted C(s, p): the number of trisyllabic word types in which syllable s (s2 or s3) occurs stressed in position p (2 or 3). For transparency and re-use, the database also provides these underlying stress-position counts for the targeted syllable in each condition (e.g., C_stress_p2_s2 for the middle-pitch token type and C_stress_p3_s3 for the final-pitch token type), along with the corresponding derived indices SPP_markedP2 and SPP_markedP3.
Because many syllables have sparse counts in some stressed positions (particularly position 3), we applied add-one (Laplace) smoothing over the MSSP syllable inventory (V = 266):

$$\hat{P}(s_m \mid p) = \frac{C(s_m, p) + 1}{\sum_{s'} C(s', p) + V}$$

where the sum runs over the V syllables of the inventory. We then log-transformed this probability:

$$\mathrm{SPP}_{\mathrm{marked}} = \ln \hat{P}(s_m \mid p)$$

This yields SPPmarked for the middle-pitch (p2) condition (SPPmarkedP2) and for the final-pitch (p3) condition (SPPmarkedP3). The rationale is that, if introducing an explicit prominence cue affects plausibility judgments in a way that is sensitive to corpus-based stress-position regularities, F0 marking may be less penalizing (or more acceptable) when the targeted syllable has higher stress-position propensity in that position in trisyllabic words, over and above general syllable familiarity.
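The smoothing and log-transform steps can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: it assumes the standard add-one form, in which the denominator is the total number of stressed occurrences at position p summed over all syllables plus the inventory size V = 266. The function name and the counts are hypothetical.

```python
import math

V = 266  # size of the MSSP syllable inventory, as stated in the text


def spp_marked(count_target, total_stressed_at_p, inventory_size=V):
    """ln of the add-one smoothed share of stressed position p held by the target syllable.

    count_target: C(s_m, p), stressed word types for the target syllable at p.
    total_stressed_at_p: sum of C(s', p) over all syllables (assumed denominator).
    """
    p_hat = (count_target + 1) / (total_stressed_at_p + inventory_size)
    return math.log(p_hat)


# Hypothetical counts (not real MSSP values).
attested = spp_marked(count_target=14, total_stressed_at_p=5000)
unattested = spp_marked(count_target=0, total_stressed_at_p=5000)
print(round(attested, 3), round(unattested, 3))
```

Because the smoothed probability is always below 1, the index is negative, and a syllable never attested as stressed in a given position still receives a finite value.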
A summary of descriptive statistics for the MPWR item means and the derived syllable metrics is provided in
Table 3, including mean, SD (across items), median, range, and quartiles (Q1 and Q3).
As shown in
Table 3, item-level wordlikeness ratings were generally low, clustering toward the lower end of the 7-point scale, while still exhibiting substantial variability across items in all three token types. This overall low wordlikeness profile is expected given the design goals of the stimulus set: the pseudowords were deliberately constructed to sound minimally like existing EP words, reducing the likelihood of accidental overlap with real lexical items (e.g., close phonological neighbors or pseudohomophone-like forms) and thereby limiting unintended familiarity-based advantages in downstream experiments—particularly in artificial language learning and speech-stream segmentation paradigms, where even subtle lexical similarities can bias segmentation and learning.
Table 3 further reports 95% confidence intervals for the item-level means. These intervals are relatively narrow, reflecting stable mean estimates across the 120 items, while the SDs, ranges, and quartiles highlight meaningful dispersion in plausibility across the stimulus set. In line with the intended design, average wordlikeness was slightly higher for the baseline (flat) tokens than for the two F0-enhanced token types, whereas the middle- and final-pitch means were very similar. Descriptively, this pattern suggests that adding a localized F0-based prominence cue to otherwise cue-reduced CV strings did not increase lexical plausibility and, if anything, made the stimuli sound marginally less like plausible EP wordforms.
The table also summarizes the distribution of the MSSP-derived syllable metrics. Both general familiarity indices (SWIAll and SWIN3) show meaningful dispersion across items, indicating that—even within a tightly controlled CV inventory—the stimulus set spans a broad range of syllabic distributional familiarity depending on whether frequencies are computed across the full corpus or restricted to trisyllabic words. The position-specific indices for the F0-targeted syllable (SPP_markedP2 and SPP_markedP3) are negative by construction because they reflect log-transformed smoothed probabilities; importantly, they also vary across items, capturing differences in how often the targeted syllable is attested as stressed in the corresponding position in trisyllabic EP words. Together, these descriptive patterns confirm that MPWR provides not only normative ratings but also interpretable corpus-based descriptors that can be used to characterize, match, and (exploratorily) model item-level variation in perceived wordlikeness.
To test whether F0-based prominence marking modulated perceived wordlikeness, we analyzed the ratings using within-item models that account for repeated measurements across token types. Specifically, we estimated (i) a repeated-measures ANOVA over items (within-item factor: token type) and (ii) linear mixed-effects models with random intercepts for item, which yield equivalent inferences under this design. Planned comparisons focused on (i) middle-pitch vs. flat, (ii) final-pitch vs. flat, and (iii) middle-pitch vs. final-pitch. We report mean differences with 95% confidence intervals and paired-effect sizes (Cohen’s dz). To examine whether distributional familiarity accounted for variability in wordlikeness, we used the two MSSP-derived measures SWIAll and SWIN3 (ln-transformed token frequencies with a +1 offset, averaged across syllables). Additionally, SPPmarkedP2 and SPPmarkedP3 were used to test whether stress-position propensity of the F0-targeted syllable modulated ratings within the two F0-enhanced token types. For the latter, models were fitted on the subset of F0-enhanced tokens (middle-pitch, final-pitch) and included an interaction between F0-target position and SPPmarked. Continuous predictors were mean-centered for interpretability.
3.1. Effect of Token Type on Wordlikeness Ratings
To test whether token type modulated wordlikeness at the trial level, we fitted a Gaussian linear mixed-effects model with token type (flat, middle-pitch, final-pitch) as a fixed effect and random intercepts for participant and item, i.e., (1 | subject) + (1 | item). The model converged and showed a significant omnibus effect of token type, F(2, 11,692) = 8.42, p < 0.001. Holm-corrected pairwise comparisons indicated that flat tokens were rated higher than both F0-enhanced token types: flat > final-pitch (Δ = 0.1147, SE = 0.0310, t(11,695) = 3.695, pHolm < 0.001) and flat > middle-pitch (Δ = 0.1050, SE = 0.0310, t(11,694) = 3.391, pHolm = 0.001). In contrast, the two F0-enhanced token types did not differ (middle-pitch ≈ final-pitch; Δ = 0.0097, SE = 0.0311, t(11,695) = 0.311, pHolm = 0.756). Overall, adding a localized F0-based prominence cue did not increase perceived lexical plausibility; instead, it produced a small but reliable decrease in wordlikeness relative to the flat baseline, with comparable effects for penultimate- vs. final-syllable targeting.
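For readers less familiar with the Holm correction applied to these pairwise comparisons, the following Python sketch implements the standard step-down procedure. It is a generic illustration, not the software routine used in the study, and the p-values are invented for the example.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative uncorrected p-values for three pairwise contrasts.
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
print([round(p, 2) for p in adj])
```

The monotonicity step explains why two contrasts with different uncorrected p-values can end up with the same Holm-adjusted value.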
The within-item repeated-measures ANOVA on item means yielded a consistent pattern, F(2, 238) = 4.14, p = 0.017, ηp2 = 0.034. Holm-corrected paired comparisons across items showed flat > middle-pitch (Δ = 0.11, 95% CI [0.02, 0.20], t(119) = 2.45, pHolm = 0.041, dz = 0.22) and flat > final-pitch (Δ = 0.11, 95% CI [0.02, 0.20], t(119) = 2.50, pHolm = 0.041, dz = 0.23), with no difference between middle- and final-pitch (Δ = 0.01, 95% CI [−0.08, 0.09], t(119) = 0.12, p = 0.904). These effect sizes are small, indicating that token-type differences are reliable but modest relative to the broader item-to-item variability in the set. From a practical perspective, this means that the localized F0 manipulation does not create qualitatively different classes of pseudowords, nor does it substantially alter the overall wordlikeness profile of the database. Rather, it exerts a small but systematic shift in perceived plausibility that becomes relevant when stimuli need to be closely matched across experimental conditions. In this sense, the manipulation is best understood as a fine-grained control factor, not as a major determinant of item selection.
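The paired effect size reported above, Cohen's dz, is the mean of the item-wise differences divided by their standard deviation, and the paired t statistic equals dz multiplied by the square root of the number of items. The Python sketch below illustrates this relationship with hypothetical item means; the function name is an assumption.

```python
import math
from statistics import mean, stdev


def paired_effect(a, b):
    """Return (mean difference, Cohen's dz, paired t) for matched observations."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_mean = mean(diffs)
    dz = d_mean / stdev(diffs)   # sample SD of the differences
    t = dz * math.sqrt(n)        # paired t with n - 1 degrees of freedom
    return d_mean, dz, t


# Hypothetical item means for two token types (not MPWR data).
flat = [3, 4, 5, 6]
pitch = [2, 2, 4, 4]
d_mean, dz, t = paired_effect(flat, pitch)
print(round(d_mean, 2), round(dz, 3), round(t, 3))
```

With 120 items, a dz of about 0.22 corresponds to a t of roughly 0.22 × √120 ≈ 2.4, which agrees with the reported t(119) values up to rounding of dz.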
3.2. Do Syllable-Based Corpus Metrics Explain Variability in Ratings?
We next examined whether corpus-derived syllable metrics accounted for item-to-item variability in MPWR ratings. Because the SWI indices are defined at the item level, we modeled item means in long format (120 items × 3 token types) using a Gaussian linear mixed-effects model with token type and syllable familiarity as fixed effects and a random intercept for item. General syllable familiarity (SWIAll; mean log-transformed syllable frequency across the three syllables) reliably predicted higher wordlikeness ratings across token types (β = 0.417, SE = 0.098, z = 4.254, p < 0.001, 95% CI [0.225, 0.608]). Controlling for SWIAll, the fixed-effect pattern for token type remained consistent with Section 3.1: relative to the final-pitch reference, flat tokens were rated higher (β = 0.113, SE = 0.044, z = 2.559, p = 0.010), whereas middle-pitch did not differ from final-pitch (β = 0.005, SE = 0.044, z = 0.120, p = 0.904). Together, these results indicate that item-to-item variability in MPWR ratings is systematically structured by syllable-level distributional familiarity, over and above small differences associated with token type.
To provide condition-specific descriptive context for stimulus selection, we also estimated separate item-level regressions within each token type. SWIAll significantly predicted item means in all three token types (flat: β = 0.40, SE = 0.11, t(118) = 3.81, p < 0.001, R2 = 0.109; middle-pitch: β = 0.38, SE = 0.11, t(118) = 3.44, p < 0.001, R2 = 0.091; final-pitch: β = 0.47, SE = 0.10, t(118) = 4.49, p < 0.001, R2 = 0.146), confirming a consistent familiarity–wordlikeness relationship across token types. When both familiarity indices were entered simultaneously (SWIAll and SWIN3), SWIAll remained significant, whereas SWIN3 did not (all ps ≥ 0.293), indicating that the global syllable-frequency baseline captured most of the explainable variance in this stimulus set.
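The within-token-type regressions can be illustrated with a minimal ordinary least squares sketch in Python. This is not the authors' analysis code: it fits a single mean-centered predictor, and the SWI values and ratings are hypothetical. With a centered predictor, the intercept equals the mean rating.

```python
from statistics import mean


def simple_regression(x, y):
    """OLS of y on mean-centered x; returns (slope, intercept, R squared)."""
    mx, my = mean(x), mean(y)
    xc = [xi - mx for xi in x]  # mean-centered predictor
    sxx = sum(v * v for v in xc)
    sxy = sum(v * (yi - my) for v, yi in zip(xc, y))
    slope = sxy / sxx
    intercept = my  # holds because the predictor is centered
    ss_res = sum((yi - (intercept + slope * v)) ** 2 for v, yi in zip(xc, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical item-level values (not MPWR data).
swi_all = [5.1, 6.0, 6.8, 7.4, 8.2]
ratings = [2.1, 2.6, 2.4, 3.0, 3.3]
slope, intercept, r2 = simple_regression(swi_all, ratings)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

The slope plays the role of β in the reported models: the expected change in mean rating per one-unit increase in the familiarity index.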
3.3. Does the Stress-Position Propensity Index Explain Additional Variance in the F0-Enhanced Tokens?
We then tested whether the stress-position propensity of the F0-targeted syllable in the corresponding position in trisyllabic words (SPPmarked) explained additional variance beyond general syllable familiarity. Analyses were restricted to the F0-enhanced token types because SPPmarked is defined only when the targeted syllable is explicitly specified (penultimate targeting: SPPmarkedP2; final targeting: SPPmarkedP3).
Controlling for SWIAll, SPPmarked did not reliably predict ratings in either F0-enhanced token type. In the middle-pitch token type, the effect of SPPmarkedP2 was negligible (β = 0.011, SE = 0.081, t(117) = 0.13, p = 0.894). In the final-pitch token type, SPPmarkedP3 likewise did not explain additional variance (β = −0.076, SE = 0.067, t(117) = −1.14, p = 0.259). Thus, once general syllable familiarity is taken into account, the position-specific stress propensity of the targeted syllable provides limited incremental explanatory power within this stimulus set. Likewise, exploratory models predicting the F0 “penalty” relative to flat (Δ = F0-enhanced − flat) did not provide evidence that higher stress-position propensity reduced the penalty (middle-pitch penalty: β = −0.009, SE = 0.052, t(117) = −0.18, p = 0.858; final-pitch penalty: β = −0.028, SE = 0.046, t(117) = −0.59, p = 0.554). Overall, within these cue-reduced materials, variability in perceived wordlikeness was explained more consistently by overall syllable familiarity (SWIAll) than by the stress-position propensity of the F0-targeted syllable.
Together, these results indicate that MPWR items are consistently rated as low-to-moderately word-like, as intended given the goal of minimizing inadvertent lexical familiarity for downstream paradigms. Inferentially, token type reliably affected wordlikeness (flat > both F0-enhanced versions; middle ≈ final), item-to-item variability was systematically explained by general syllable familiarity (SWIAll), and the stress-position propensity indices for the targeted syllable (SPPmarkedP2/SPPmarkedP3) did not account for reliable additional variance beyond general familiarity in the F0-enhanced token types.
4. Discussion
“Scrabbling syllables into words” captures the practical motivation behind this work: researchers often need auditory pseudowords that are assembled from well-characterized sublexical units and whose lexical plausibility is known a priori. The main contribution of the present study is the Minho Pseudoword Wordlikeness Ratings (MPWR), the first normative dataset of EP auditory trisyllabic pseudowords paired with syllable-based corpus descriptors derived from the MSSP. Beyond providing norms for stimulus selection and matching, the study addressed two exploratory questions: whether a localized F0-based prominence cue modulates perceived wordlikeness, and whether item-level variability in ratings is structured by distributional familiarity, here captured by syllable-level indices (cf. [
51,
54,
59,
60,
62]). Three findings are particularly informative. First, ratings were low overall, consistent with the goal of minimizing accidental similarity to real words in downstream paradigms such as artificial language learning and speech segmentation. In that sense, the low mean ratings should not be seen as a weakness of the dataset, but as a direct consequence of its design goals: pseudowords that sound too word-like may inadvertently recruit lexical representations and bias later measures of segmentation, learning, or recognition [
8,
33,
34,
35,
54]. At the same time, the dataset still spans meaningful item-to-item variation in plausibility, which is precisely what makes it useful for stimulus selection and matching. Second, token type reliably affected judgments, but only modestly: flat baseline tokens were rated slightly higher than both F0-enhanced versions, whereas middle- and final-targeting did not differ. Third, item-to-item variability was robustly explained by general syllable familiarity (SWIAll). In contrast, the stress-position propensity of the F0-targeted syllable (SPPmarked) did not explain additional variance once general familiarity was controlled.
As noted above, the low mean ratings are a direct consequence of the dataset's design goals rather than a weakness. Pseudowords that sound too word-like can inadvertently recruit lexical representations (e.g., via close phonological neighbors or pseudohomophone-like relationships), biasing downstream measures of segmentation, learning, or recognition [
33,
34,
35,
54]. For paradigms such as the triplet-embedded design [
8], reducing uncontrolled lexical familiarity is often desirable to ensure that performance reflects sensitivity to distributional structure rather than accidental resemblance to known words. MPWR therefore provides norms precisely in the region of the stimulus space that is frequently needed in practice (novel, low-familiarity forms) while still spanning meaningful variability across items.
Our exploratory prediction that adding an explicit prominence cue might increase wordlikeness was not supported. Instead, introducing an F0 boost led to a small but reliable decrease in ratings relative to the flat baseline, with no robust difference between targeting syllable 2 and targeting syllable 3. This pattern highlights a useful dissociation: increasing acoustic salience via a localized F0 cue does not necessarily translate into greater lexical plausibility in tightly controlled, cue-reduced materials. From a practical perspective, the effect is modest but relevant for stimulus control. Localized F0 cues can slightly shift plausibility judgments and should therefore be considered when stimuli are closely matched, yet their impact in the present set was clearly smaller than the effect of overall syllable familiarity. In this sense, F0 marking did not create qualitatively different classes of pseudowords, but rather introduced a small yet consistent source of variance that may need to be controlled in tightly matched auditory materials.
This result fits current views of cue weighting in lexical stress. Stress is indexed by multiple correlates, including duration, F0, intensity, and vowel quality [
67], and languages differ in how these cues are typically weighted [
68,
69]. In English, F0-related prominence is often treated as a major perceptual cue [
70,
71]. In EP, by contrast, vowel quality (including vowel reduction) and duration are widely considered central to stress perception and production [
38,
39,
41], and when vowel-quality cues are minimized, listeners can show reduced sensitivity to stress contrasts [
72]. The present materials were intentionally designed under cue-reduced conditions: our pseudowords were constructed from CV syllables in order to maximize experimental control and provide tightly matched auditory materials for artificial-language paradigms. This design reduces unintended variation arising from syllable complexity, consonant overlap, and vowel-reduction cues, but it also limits ecological validity, since the resulting pseudowords do not reflect the full phonological and prosodic diversity of natural spoken EP. Crucially, it also means that the observed F0 effect should not be interpreted as a full estimate of how lexical stress contributes to EP wordlikeness. Rather, under these cue-reduced conditions, a localized F0 boost may have been perceived less as a canonical lexical-stress pattern and more as an intonationally marked or otherwise atypical prominence cue, thereby lowering perceived lexical naturalness. In short, F0 enhancement made items more prominent, but not more “word-like.” Consistent with this interpretation, the penalty was comparable for targeting syllable 2 and syllable 3, suggesting that the manipulation was not strongly mapped onto stress-location typicality for these stimuli. The practical significance of this effect should therefore be understood in terms of stimulus control rather than broad lexical naturalness. For many applications, the value of this finding lies less in showing that F0 marking substantially changes wordlikeness than in identifying a small yet consistent source of variance that may need to be controlled when constructing tightly matched auditory materials.
At first glance, the present findings might seem at odds with evidence that EP listeners can process stress without vowel reduction. Lu et al. [
73] showed that trochaic versus iambic contrasts in disyllabic pseudowords elicited an MMN and a subsequent late negativity, indicating pre-attentive discrimination even when vowel-quality cues are unavailable, and reported an iambic advantage in neural and behavioral measures (see also [
74,
75]). Our findings do not contradict Lu et al. [
73] because the constructs differ. Lu et al. [
73] showed that EP listeners can tell two stress patterns apart under cue-reduced conditions. MPWR, in contrast, measures lexical plausibility—whether a form sounds like it could be a real EP word. Thus, a listener can successfully detect “the prominence is on syllable 2 vs. syllable 3” while still feeling that a pitch-only prominence cue makes the token sound atypical in EP, and therefore less word-like. Moreover, projecting disyllabic asymmetries onto trisyllables is not straightforward. In trisyllables, multiple factors can compete (e.g., right-edge biases, frequency distributions, and lexical/prosodic-word constraints), and EP permits both penultimate and final stress with non-trivial frequencies depending on corpus, criteria, and weighting [
41,
84,
85,
86]. Against this background, the most principled conclusion is that F0-only prominence marking is not sufficient to increase perceived wordlikeness for trisyllabic CV forms, even though stress contrasts can remain discriminable when vowel reduction is absent.
A central value of MPWR is that it pairs normative ratings with transparent corpus-based descriptors that are directly aligned with the structure of the auditory materials. Across token types, general syllable familiarity (SWIAll) robustly explained item-to-item rating variability, showing that perceived wordlikeness in this stimulus set is systematically shaped by distributional familiarity at the syllabic level. This is consistent with a large body of literature showing that wordlikeness judgments are sensitive to probabilistic knowledge of sublexical structure [
51,
54,
60,
61,
79]. At the same time, previous research also shows that lexical similarity contributes independently to such judgments [
51,
62], reinforcing the view that wordlikeness is multidetermined rather than reducible to a single cue. Unlike phoneme-level phonotactic probability or lexicon-wide neighborhood density, the present resource captures a complementary aspect of sublexical well-formedness: how familiar the spoken syllabic building blocks of a trisyllabic CV pseudoword are in EP, both in the language more broadly and within trisyllabic words specifically. This focus is particularly appropriate here because the stimuli themselves were assembled from naturally produced spoken syllables, making syllable-level familiarity directly relevant to their auditory structure. In addition, the MSSP-based stress-position counts make it possible to characterize whether the F0-targeted syllable tends to occur in stressed position, a prosodically relevant descriptor that standard phoneme-level metrics do not typically provide in directly comparable form for EP auditory materials. These indices complement phoneme-based and neighborhood measures by providing transparent, reproducible, and linguistically appropriate descriptors for EP auditory pseudowords. Their use necessarily limits direct comparability with parts of the prior wordlikeness literature, but it also extends the characterization of auditory pseudowords in a direction that is especially useful for syllable-based speech paradigms.
In contrast, the stress-position propensity of the F0-targeted syllable (SPPmarkedP2/SPPmarkedP3) did not account for reliable additional variance beyond general familiarity in the F0-enhanced token types. This null result is informative and likely reflects both design constraints and cue ecology. Given the restricted segmental inventory and the CV format, many items are “non-lexical” by design; under such conditions, position-specific stress tendencies may be too weak to overcome the overall “nonword” signal, and the F0 cue may not be interpreted as lexical stress in a stable way. We therefore view stress-position propensity indices as auxiliary descriptors—useful for stimulus characterization and exploratory modeling—rather than primary determinants of wordlikeness in the present stimulus space. More generally, the F0 enhancement is included as a practical experimental cue to highlight syllabic prominence while keeping segmental content constant; it should not be interpreted as modeling the full acoustic realization of lexical stress in EP.
A further boundary of MPWR concerns generalizability across Portuguese varieties. The present norms were collected for European Portuguese and should therefore be interpreted as variety-specific. Although many of the CV combinations and the overall stimulus format may still be useful as tightly controlled auditory materials in Brazilian Portuguese and other Portuguese varieties, perceived wordlikeness is likely to vary as a function of variety-specific phonotactics, vowel quality patterns, cue weighting in stress perception, and lexical similarity structure. MPWR should therefore be treated as an EP-specific normative resource, and local re-norming would be advisable whenever the materials are to be used as wordlikeness-calibrated stimuli outside EP.