System for Automatic Assignment of Lexical Stress in Croatian

Mikelić Preradović, Nives; Nacinovic Prskalo, Lucia

doi:10.3390/electronics11223687

by Nives Mikelić Preradović ¹ and Lucia Nacinovic Prskalo ²,*

¹ Faculty of Humanities and Social Sciences, University of Zagreb, Ivana Lučića 3, 10000 Zagreb, Croatia

² Faculty of Informatics and Digital Technologies, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia

* Author to whom correspondence should be addressed.

Electronics 2022, 11(22), 3687; https://doi.org/10.3390/electronics11223687

Submission received: 1 October 2022 / Revised: 6 November 2022 / Accepted: 8 November 2022 / Published: 10 November 2022

(This article belongs to the Special Issue Recent Advances in the IoT and Smart City Based on Artificial Intelligence)

Abstract

It is very popular today to integrate voice interfaces into IoT devices. The pronunciation and proper prosody of speech play a major role in the intelligibility and naturalness of synthesized voices. Each language has its own prosodic characteristics. In this paper, we present the results of a study aimed at testing the applicability of methods for modelling and predicting the prosodic features of the Croatian language. The extent to which their performance can be improved by incorporating linguistic features and linguistic peculiarities specific to the Croatian language was investigated. In the model learning process, tree classification was used to predict the lexical stress position and the type of stress in a word, and a lexicon of 1,011,785 word forms was used as the model learning set. Separate models were created for predicting the position and type of lexical stress. The results improved significantly after the rules for atonic words (clitics) were applied. A hybrid approach combining a rule-based approach and a modelling approach was also proposed. The final accuracy of assigning lexical stress using the hybrid approach was 95.3%.

Keywords: voice interface; automatic lexical stress assignment; lexical stress detection; lexical stress in Croatian; text-to-speech synthesis

1. Introduction

Voice is a fundamental means of human communication. To facilitate human–computer interaction, text-to-speech (TTS) and speech-to-text (STT) systems have been used for some time. With the advent of the Internet of Things (IoT), voice interfaces that serve as a means of communication between humans and various devices have also become popular. Voice interfaces also use speech synthesis and speech recognition. Their intelligibility and naturalness largely depend on the language used [1].

Highly inflected languages (such as Slavic languages) are considered under-resourced because they do not have extensive resources that allow for the orthographic and phonological assignment of numerous inflected word forms. Croatian is an under-resourced language in terms of the availability of corpora (especially speech corpora) and resources needed for robust natural text and speech processing systems [2]. Only recently were the first open speech-to-text system [3] and an ASR training dataset for Croatian [4] published.

In Croatian, it is mostly straightforward to convert the graphemes of a word into phonemes once the lexical stress type and position are determined. There are some exceptions, which are described in [5].

Syllables, as smaller units of which words are composed, have different durations, strengths, tones, and pronunciation accuracies. In languages with a pitch accent, a syllable in a word is distinguished by the above-mentioned features and is called the lexical stress of the word. The role of lexical stress on a word is to emphasize the word as a unit of the speech sequence, and the position of the stressed syllable in the word is not important. In some languages, the position of the stressed syllable in a word is precisely determined; for example, in French, it is always the last syllable in a word [6]; in Polish, it is the penultimate syllable [7]; and in Czech, it is the first syllable in a word [8]. In other languages such as Croatian, lexical stress can be placed almost arbitrarily on any syllable, except for the last syllable [9]. Apart from the fact that stress is not bound to a particular syllable, it can also change within the paradigm so that different inflectional forms of the same word can have any of the four accents in the Croatian language. For example, the lexical unit stolac (Eng. chair) has a short rising accent in the nominative singular and a long rising accent from the genitive singular onwards, except for the vocative singular, where the accent is long falling. The plural forms of the same lexical unit have a long falling accent, except for the genitive plural, where the accent is short falling. Therefore, a prerequisite for modelling the prosody of the Croatian language is the existence of a lexicon in which the lexical stress of both the canonical (base) forms and the declined inflectional forms of words is marked.

In digital Croatian dictionaries such as in [10], only the canonical forms (lemmas) are marked with lexical stress, whereas the information for the lexical stress of all the declined word forms is missing. Most lexical entries contain only basic gender markers (masculine, feminine, and neuter). One also finds genitive and/or plural forms with lexical stress for some nouns, certain adjective forms, and the first person singular present tense for verbs, present verbal adverbs, past verbal adverbs, and verbal nouns. Therefore, in [11], an algorithm was implemented based on an extensive set of rules for all word classes, allowing for the prediction of the lexical stress position and type of accent in the canonical forms, as well as the morphological accent shift in the inflectional forms of all words of the open class. The result of the algorithm is a lexicon with 72,366 lemmas and 1,011,785 inflectional forms marked with lexical stress. The lexicon consists of all inflectional forms of nouns, verbs, and adjectives, as well as most inflectional forms of pronouns and numbers. Each entry in the lexicon consists of the stressed word form, its morphosyntactic tag, and the unstressed word form.

However, a lexicon cannot cover Out-Of-Vocabulary (OOV) words that might be encountered in Croatian texts. For such cases, a system for automatic lexical stress assignment must be developed.

The aim of this work is to investigate an automatic approach to lexical stress assignment in Croatian based on a machine learning model and a hybrid approach based on rules and machine learning models. The main research questions are:

RQ1: Is it possible to develop an automatic system for lexical stress assignment in Croatian based on machine learning models with high accuracy?

RQ2: Is it possible to combine a rule-based system for lexical stress assignment with a system for lexical stress assignment based on machine learning models to achieve even better results?

The main contributions of this work are the developed systems for automatic lexical stress assignment in Croatian (machine-learning-based and hybrid). The developed models were also evaluated: their output was compared against texts that were not used for model training and in which linguistic experts had assigned stress manually. Details about the method used to develop the models and their results are described in Section 3 and Section 4.

The paper is organized as follows. Section 2 presents the background, describes the lexical stress features of Croatian, and presents previous work; Section 3 presents our method and a model trained on the lexicon data with the lexical stresses assigned to the words; and Section 4 presents our results. Finally, Section 5 presents the discussion and conclusions.

2. Background

2.1. Lexical Stress Features in the Croatian Language

In Croatian, there are four types of stresses: short falling (SF), with a high tone on the initial short stressed syllable and low tones elsewhere; short rising (SR), with a high tone on the short-stressed syllable and the following syllable; long falling (LF), with a falling tone on the long-stressed syllable; and long rising (LR), with a high tone on the long-stressed syllable, passing high into the following syllable [12]. The position of lexical stress in a word is also not fixed; it can be anywhere, except for on the last syllable.

In Croatian, a distinction is also made between prosodic and orthographic words. An orthographic word is a part of the text that remains an inseparable whole even with syntagmatic modifications (insertion or change). Parts of an orthographic word are morphemes: semantic core, prefixes, infixes, and suffixes. Orthographic words are written separately in a text. The prosodic word consists of syllables in a sequence, which syntagmatically refer to a single stressed syllable. It is usually the semantic nucleus and one or more morphemes that denote the linguistic, modal, and logical relations of the nucleus. These are grammatical morphemes that can be parts of an orthographic word, or separate grammatical words (clitics, or atonic words) that do not have their own lexical stress and cannot occur as separate spoken words.

Clitics can depend on the following word for lexical stress, in which case they are called proclitics, or they can depend on the preceding word, in which case they are called enclitics (e.g., auxiliary verbs, pronouns, and the particle “li”). All other words that are not clitics are called tonic words. Table 1 lists Croatian proclitics and enclitics according to [13]. Enclitics do not have independent stress, e.g., vȉdīmga (Eng. I see him) or prȅdālisunamse (Eng. they surrendered to us). On the other hand, proclitics lack lexical stress before words with rising stress, e.g., u vòdi (Eng. in the water) or po ljepòti (Eng. by the beauty), but possess one before words with falling accents, as the stress is shifted from the stressed word to the proclitic, e.g., ȕzoru (Eng. at dawn) and pȍvodu (Eng. for water).

The prosody of a prosodic word consists of the number of syllables, the prosodic features of the syllables, and the mutual prosodic relations. A prosodic word is a whole regardless of the number of orthographic words that make it up. This is confirmed by the shift of lexical stress. The rule states that the falling stress can only be on the first syllable, and when a prefix or proclitic is added to the semantic nucleus, the stress shifts from the first syllable of the nucleus to the prefix or proclitic, e.g., znȁti (Eng. to know), pòznati (Eng. to recognize), nè znati (Eng. not know). Exceptions are compounds where the falling stress is also found in the middle part of the word, and sometimes the link with the proclitic is on the same level of the compound, e.g., poljoprìvreda (Eng. agriculture). In trisyllabic and multisyllabic words, the lexical stress does not shift, e.g., poȍpomenama (Eng. after warnings).
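The grouping of orthographic words into prosodic words described above can be sketched as follows. This is only an illustration: the clitic sets below are a small subset taken from the examples in this section (the full lists are in Table 1), and the function name and the handling of edge cases are assumptions, not the procedure used in this study.

```python
# Illustrative subset only; the full proclitic and enclitic lists are in Table 1.
PROCLITICS = {"u", "po", "ne"}
ENCLITICS = {"ga", "se", "su", "nam", "li"}


def group_prosodic_words(tokens):
    """Group orthographic words into prosodic words.

    Proclitics attach to the following word, enclitics to the preceding one.
    Tokens are assumed to be lowercase and written without stress marks.
    """
    groups = []
    pending = []  # proclitics waiting for their host word
    for tok in tokens:
        if tok in PROCLITICS:
            pending.append(tok)
        elif tok in ENCLITICS and groups and not pending:
            groups[-1].append(tok)  # enclitic joins the preceding prosodic word
        else:
            groups.append(pending + [tok])  # host word closes pending proclitics
            pending = []
    if pending:  # trailing proclitics without a host form their own group
        groups.append(pending)
    return groups


print(group_prosodic_words(["vidim", "ga", "u", "vodi"]))
```

Stress assignment would then operate on each group as a unit, which is what allows a falling accent to move from the host word onto a proclitic.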

Since a prosodic word consists of one or more orthographic words, it is on average about 40% larger than the orthographic word. In the standard Croatian language, the average prosodic word has 3.12 syllables, and the average orthographic word has 2.25 syllables. Orthographic words are most often monosyllabic (43.42%), then two-syllabic (27.3%), three-syllabic (21.6%), four-syllabic (12.5%), five-syllabic (3%), and six-syllabic (0.85%). Prosodic words, according to [9], are most frequently trisyllabic (31.5%), then two-syllabic (28.7%), followed by four-syllabic (22.4%), five-syllabic (8.6%), monosyllabic (4.9%), and six-syllabic (2.9%).

2.2. Related Work

In addition to text-to-speech systems, lexical stress assignments can be useful in a variety of domains. For example, in [14], lexical stress classification was used in medicine to assess dysprosody in childhood apraxia of speech. The research showed promising results in automatically classifying lexical stress to detect errors in children’s speech during diagnosis or treatment-related changes, but the authors concluded that further training of the algorithms on larger datasets is needed. In [15], lexical stress in language learning was used to detect errors in the speech of non-native speakers of English as a second language. The authors reported results of 94.8% precision and 49.2% recall for detecting incorrectly stressed words in the English L2 speech of Baltic and Slavic speakers. In [16], automatic accent classification was performed and its use in forensic applications was described.

Although there is a large body of literature on the automatic assignment of lexical stress for well-resourced languages, research on Slavic languages (a family to which the Croatian language belongs) is sparse. The main reason for this is that these languages are not only under-resourced (i.e., lack speech or pronunciation corpora and language models) but also morphologically rich.

Although there are efforts to train multilingual models and build resources to support languages with insufficient resources [17,18,19], the languages in the model must be related in some way to produce high-quality results. Since Slavic languages are similar but generally under-resourced and morphologically rich, it is difficult to create an environment in which they can be trained together with well-resourced languages.

One of the main problems of languages with insufficient resources and rich morphologies is how to deal with words outside the vocabulary (Out-Of-Vocabulary, OOV).

Therefore, the basic goal of developing rule-based text-to-phoneme mapping systems is to handle stressed OOV words. The lexicon in which the lexical stress of basic and derived inflectional word forms is marked cannot cover all the words that may occur in the texts. In such situations, one of the statistical methods can be applied to find the most probable position of lexical stress and type of accent in a word. For this purpose, classification and regression decision trees, the support vector machine (SVM) method, hidden Markov models (HMMs), and naive Bayes classifiers have been used for low-resource languages.

In terms of the related work in the field of automatic lexical stress detection and mapping for low-resource languages, Ni, Liu, and Xu [20] developed a hierarchical model-based boosting classification and regression tree (CART) for Mandarin stress detection using acoustic evidence and textual information. Gharavian, Sheikhan, and Ghasemi [21] developed a combined classification model for lexical stress detection in Farsi (HMM was used to segment stressed sentences, additional features were extracted from pitch and formant frequencies, and six feature sets were selected using fast correlation-based filter feature selection). For Hindi, a hybrid model (rule-based and statistical learning) was used [22]. James et al. [23] also used HMMs for the under-resourced language of Māori (New Zealand).

Ciobanu, Dinu, and Dinu [24] used SVM to find the boundary between syllables and predict stressed syllables for Romanian. Lorincz et al. [25] describe “RoLEX”—a dataset for the Romanian language containing over 330,000 entries with information on the lemma, morphosyntactic description, syllabification, lexical stress, and phonemic transcription. Moreover, Marinčič, Tušar, Gams and Šef [26] used classification trees to determine the lexical stress position and type of accent in Slovenian. First, based on the context of each vowel, a model predicting whether it is stressed was created (a new model was created for each vowel), followed by a model predicting the type of accent. Unlike Croatian, where there are four types of accents that can be placed on any of the five vowels or “syllabic r”, in Slovenian, the vowel e can have three types of accents, the vowel o can have only two, and the other vowels are either stressed or unstressed.

For Croatian, there are works proposing acoustic modelling for speech recognition and speech synthesis [27], intonation modelling [28], and prototype systems for Croatian speech synthesis [29], although in the mentioned studies, lexical stress was not considered.

There are also lexical resources for Croatian that are very useful for various tasks in processing the language, such as the Croatian Morphological Lexicon [30] and hrLex v1.3 [31], which are inflectional lexicons of Croatian. In recent years, resources for processing Croatian at the level of derivative morphology have been published, such as the Croatian Derivative Lexicon-CroDeriv [32] and DerivBase HR [33]. None of these lexicons, however, contain the lexical stress on words, which is a very important feature for the naturalness of synthesized speech and the performance of speech-to-text systems.

As far as we know, there is no comparable research in the field of automatic stress recognition and assignment for the Croatian language so far.

3. Method

In this study, classification trees were used to predict the lexical stress position and the lexical stress type (LF, LR, SF, SR) in a word. Two separate models were built for this purpose.

3.1. Dataset

The dataset used to train the models is a lexicon described in detail in [11], which consists of all inflectional forms of nouns, verbs, and adjectives, as well as manually added inflectional forms of pronouns and numbers. The lexicon contains a total of 72,366 lemmas and 1,011,785 inflectional word forms (a total of 1,084,151 entries). Each entry in the lexicon consists of a stressed word form, its morphosyntactic (MSD) tag, and the corresponding unstressed word form. Figure 1 shows the entries for the Croatian word lonac (Eng. pot) in all inflectional forms. The first column represents the stressed word form, the second column the MSD tag (consisting of a tag for category noun (N); gender (m, f, or n); number (s or p); case (n, g, d, a, v, l, or i); animate (y or n)), and the third column the unstressed word form. It can be noted that the word lonac in Croatian has different stress types in different forms. Therefore, the MSD tag of the word is an important feature in determining the correct lexical stress. In addition to the MSD tag, there are other features that are important in determining the stress type, such as the position of the stressed syllable in the word, the phonetic features of the syllables in the words, etc. All features that were used in training the models can be found in Section 3.2.
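To make the three-column entry layout concrete, the following sketch parses one entry and decodes a noun MSD tag. The tab-separated file format, the class and function names, and the five-character tag string `Nmsnn` are assumptions made for illustration; only the order of the tag fields follows the description above. The example form stòlac (short rising accent in the nominative singular) is taken from the Introduction.

```python
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    stressed: str    # word form with the accent mark
    msd: str         # morphosyntactic description tag
    unstressed: str  # plain orthographic form


GENDER = {"m": "masculine", "f": "feminine", "n": "neuter"}
NUMBER = {"s": "singular", "p": "plural"}
CASE = {"n": "nominative", "g": "genitive", "d": "dative", "a": "accusative",
        "v": "vocative", "l": "locative", "i": "instrumental"}
ANIMATE = {"y": True, "n": False}


def parse_line(line: str) -> LexiconEntry:
    """Parse one entry; a tab-separated layout is assumed here."""
    stressed, msd, unstressed = line.rstrip("\n").split("\t")
    return LexiconEntry(stressed, msd, unstressed)


def decode_noun_msd(msd: str) -> dict:
    """Decode a noun tag laid out as N + gender + number + case + animacy."""
    if len(msd) != 5 or msd[0] != "N":
        raise ValueError(f"not a 5-character noun tag: {msd!r}")
    return {
        "gender": GENDER[msd[1]],
        "number": NUMBER[msd[2]],
        "case": CASE[msd[3]],
        "animate": ANIMATE[msd[4]],
    }


entry = parse_line("stòlac\tNmsnn\tstolac")  # hypothetical entry line
print(entry.unstressed, decode_noun_msd(entry.msd))
```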

3.2. Linguistic Features

In order to predict the lexical stress position and the stress type in a word, in addition to the dataset described above on which the model is trained, the linguistic features to be considered in the construction of the trees must be specified. The following linguistic features were used to train the first model to predict the stress position in a word: the number of syllables in a word, the phonetic features of the last four phonemes, the phonetic features of the first three phonemes in a word, and the part-of-speech tags (POS). These features are listed in Table 2. For each entry, a set of 44 attributes was created and used in model training to predict the position of lexical stress in a word.

The phonetic features of the phonemes and the labels used for training the models are listed in Table 3.

The following linguistic features were used to train the second model to predict the stress type: the number of syllables in a word, phonetic features of the last three phonemes in a word, phonetic features of the first two phonemes in a word, POS tags, ordinal number of the stressed syllable (obtained from the first model), identity of the stressed phoneme, phonetic features of the phoneme before the stressed phoneme, and phonetic features of the phoneme after the stressed phoneme. The linguistic features used in the second model are listed in Table 4. Thus, for each entry, a set of 55 attributes was created and used in model training for predicting the type of lexical stress in a word.

Figure 2 shows an example of a set of labels used by the model to predict the stress type. For each entry, there are 55 attributes compiled according to Table 4. The examples in the figure show that some of the words differ in the following features: the number of syllables, phonetic features of the last three phonemes in the word, and ordinal number of the stressed syllable counted from the end of the word. The other features are the same for all words in this example. This is because they have the same beginning (prímjer) so the phonetic features of the first seven phonemes are the same. The last attribute is the category attribute. There are altogether five possible categories: the four types of stress and zero stress (ks, ku, du, ds, 0). The labels abbreviate the Croatian accent names: ks (kratkosilazni, SF), ku (kratkouzlazni, SR), du (dugouzlazni, LR), and ds (dugosilazni, LF).
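One of the features above, the number of syllables in a word, can be approximated directly from the unstressed orthographic form. The sketch below counts the five vowels plus an "r" that has no vowel next to it, as a rough stand-in for syllabic r. This heuristic and the function name are assumptions for illustration; the study does not describe how syllables were counted.

```python
VOWELS = set("aeiou")


def count_syllables(word: str) -> int:
    """Count syllable nuclei in an unstressed Croatian word form.

    A vowel always counts. An 'r' counts only when neither neighbour is a
    vowel, which approximates syllabic r (e.g. "prst"). Input is expected
    without accent marks, since accented vowels are not in VOWELS.
    """
    w = word.lower()
    count = 0
    for i, ch in enumerate(w):
        if ch in VOWELS:
            count += 1
        elif ch == "r":
            prev_is_vowel = i > 0 and w[i - 1] in VOWELS
            next_is_vowel = i + 1 < len(w) and w[i + 1] in VOWELS
            if not prev_is_vowel and not next_is_vowel:
                count += 1
    return count


for w in ["lonac", "prst", "primjer", "poljoprivreda"]:
    print(w, count_syllables(w))
```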

3.3. Model Training

Classification and regression trees (CARTs) were used to construct a decision tree for solving classification and regression problems. In cases where the most likely membership of a feature to a particular class or category (qualitative variable) is predicted, these are called classification trees, and when the value of a numerical or continuous feature (quantitative variable) is predicted, these are called regression trees.

To construct the tree, a set of learning data S = {(z_n, y_n), n = 1, …, N} was used, where z_n is the feature vector of the nth sample (instance) and y_n is the dependent variable to be predicted. The tree construction started with the root node, which had all the data in the set associated with it. The next step was to find the attribute that best partitions the data in the set according to a given criterion. The C4.5 algorithm [34], which was used for training the classification and regression trees, used the gain ratio criterion. The procedure was repeated recursively for the dataset in each node and ended when all instances in the subset of a given node had the same value of the target variable, when further branching no longer helped to improve the results, or when further subdivision was not possible. Each leaf in a tree represented the value of the target variable for inputs whose attribute values matched the path from the root of the tree to that leaf.
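The gain ratio criterion mentioned above divides the information gain of a split by the entropy of the split itself, which penalizes attributes with many distinct values. The following is a minimal sketch for a single nominal attribute, using toy data. It is not the Weka J48 code, and it omits parts of C4.5 such as continuous attributes and missing values.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute: information gain / split information."""
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: attribute = number of syllables, label = stressed syllable position.
syllables = [2, 2, 3, 3]
position = [1, 1, 2, 2]
print(gain_ratio(syllables, position))
```

At each node, the attribute with the highest gain ratio would be chosen for the split.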

Tree-building algorithms usually classify learning data very well. However, they may not classify unseen data with the same accuracy. When a model fits the learning data so closely that it no longer reflects the true dependencies between the input and output variables, this is called overfitting. It usually happens when the model is too complex, that is, when it contains too many parameters relative to the number of instances. In tree construction, pruning is usually used to prevent overfitting. In this way, the branches responsible for overfitting are not considered. Pruning is usually conducted by evaluating the tree using data that were not used to build the tree and then simplifying it by ignoring parts of the tree that do not classify the data well [35].

The C4.5 decision tree algorithm used for model training was implemented in the Weka tool as a classifier named J48 [36]. Pruning was left enabled, as in the default settings.

First, we built a model for predicting the lexical stress position and then a model for predicting the type of stress. We searched for the most likely ordinal number of the stressed syllable in a word. After that, the most likely stress type (short falling, short rising, long falling, long rising) was found in the new model. We used 10-fold cross-validation in training the model. We also experimented with attribute selection and hyperparameter optimization techniques within the J48 classifier, but this did not produce better results. Finally, when the accuracies of the predictions for the lexical stress position and type of accent were evaluated, the model was applied to unseen data for testing.

For this purpose, a text of 2160 words with hand-marked lexical stress was used. It is a transcription of speech in which linguists assigned lexical stress according to the Croatian literary standard. The text was annotated using the morphosyntactic tagger ReLDI [31,37].

Using the model trained on the training dataset, we obtained a prediction of the most likely lexical stress positions for words in the test data. These probabilities were filtered out and used to build a model predicting the most likely type of accent in the test data. The probabilities were added to the second model as language features (one probability for one instance).

4. Results

The test of the first model used to predict the most likely position of stress in a word based on lexicon data resulted in a 90.78% rate of correctly classified instances, and the second model, which predicted the most likely type of stress, resulted in an 86.7% rate of correctly classified instances. Table 5 shows the detailed accuracies of the models in terms of the TP (true-positive) rate, FP (false-positive) rate, precision, recall, F-measure, and ROC area.

After the words from the test data described above were presented with the linguistic feature labels, the predictions provided by the first model were applied to determine the ordinal number of the syllable most likely to be stressed in the word. The obtained predictions were added as a linguistic feature to each word from the test data and thus included in the test of the second model to determine the most likely type of stress. The obtained prediction values were compared with the actual values to determine the accuracy of the model on the test data.
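The two-stage procedure described above, in which the output of the position model becomes an input feature of the type model, can be sketched as follows. The two models are passed in as plain callables, and the feature key `stressed_syllable` is a hypothetical name. In the study, both models are J48 trees trained in Weka.

```python
def predict_stress(features, position_model, type_model):
    """Cascade: predict the stressed syllable, then the stress type.

    `position_model` and `type_model` are any callables that take a feature
    dict and return a prediction; they stand in for the trained trees.
    """
    position = position_model(features)
    # Copy the features so the caller's dict is not modified, and add the
    # first model's prediction as an extra attribute for the second model.
    type_features = dict(features, stressed_syllable=position)
    stress_type = type_model(type_features)
    return position, stress_type


# Stand-in models for demonstration only; they carry no linguistic knowledge.
def demo_position(f):
    return 1


def demo_type(f):
    return "SR" if f["stressed_syllable"] == 1 else "LF"


print(predict_stress({"syllables": 2, "pos": "N"}, demo_position, demo_type))
```

One consequence of this design is that an error made by the first model is passed on to the second, so the combined accuracy cannot exceed the accuracy of either stage.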

Since the test data were a transcription of speech, they contained many short words, prepositions, and consonants that did not have lexical stress assigned in the test data, such as the atonic clitics se, ga, po. These were not considered for calculating the accuracy of the model on the test data.

For the remaining 1422 words, the model accurately predicted the lexical stress position for 1365 words (95.99%) and the type of accent for 1056 words (74.26%), whereas for 1026 words (72.15%), it accurately predicted both the stress position and type of accent.

Here, the accuracy represented the percentage of words that were assigned the correct stress in the test text by the automatic procedure, and the accuracy was calculated according to the following formula:

$$\mathrm{Accuracy} = \frac{|N_t \cap N_s|}{N_s} \times 100\ (\%) \qquad (1)$$

where the intersection

$$|N_t \cap N_s| \qquad (2)$$

represents the number of words that were assigned the correct stress compared to the test text and $N_s$ represents the total number of words in the test text.
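As a quick check of the accuracy formula, the reported counts for the 1422 evaluated words can be plugged in directly. The function name is an assumption; the counts come from the results above.

```python
def accuracy(n_correct: int, n_total: int) -> float:
    """Percentage of words that received the correct stress."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


TOTAL = 1422  # words remaining after unstressed clitics were excluded
for name, correct in [("position", 1365), ("type", 1056), ("both", 1026)]:
    print(f"{name}: {accuracy(correct, TOTAL):.2f}%")
```

Rounded to two decimals, these give 95.99%, 74.26%, and 72.15%, matching the figures reported for the test text.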

It was assumed that the accuracy of the model in the test dataset was better at predicting the lexical stress position and worse at predicting the type of accent because most of the words in the test dataset were monosyllabic and two-syllabic words. For such words, the probability of determining the exact position was higher than for multisyllabic words (because the lexical stress in Croatian is most often positioned on the first syllable), but the probability of determining the type of accent was lower because words with a stressed first syllable can have any of the four accents, whereas the other syllables can have only rising accents.

When the rules for atonic proclitics and enclitics were applied, i.e., when the stress was shifted from the stressed word to the proclitic, the following accuracy was obtained: 97.4% for the position of the stress, 82.4% for the type of the stress, and 80.1% for both the position and type of the stress. The results are shown in Table 6.

Hybrid Approach

In order to achieve the highest possible accuracy of lexical stress assignment in the text, we proposed a hybrid approach that combined the comprehensive set of morphological accent rules for all words in their base and inflectional forms described in [11,38] with the models described in this paper. The accent was first assigned to the words using the set of morphological accentuation rules. The words that were not assigned an accent in this first step because they were OOV words were then assigned an accent automatically by the described models.

Although the main source for training the models was a lexicon that did not contain OOV words, lexical stress was still assigned to such words in the hybrid approach because the model was trained based on the linguistic features of other words in the lexicon. Therefore, lexical stress was also assigned to OOV words based on words that share the same linguistic features as the OOV word. Implementing the proposed hybrid approach, a 95.3% accuracy was achieved, i.e., 95.3% of the words from the test data were assigned the correct lexical stress.

A schematic representation of the proposed hybrid approach is shown in Figure 3. It shows all the procedures required to obtain a final result—a text with lexical stress assigned (label 6 in Figure 3) from a text without lexical stress (label 1 in Figure 3). We can see that the first step was the MSD tagging (label a in Figure 3) of the text without lexical stress. After tagging, we obtained the text containing the words with their MSD tags (label 2 in Figure 3). Then, we applied an extensive set of rules described in [11,38] (label b in Figure 3). As a result, we obtained words with assigned stress (label 3a in Figure 3) or words without assigned stress for the OOV words (label 3b in Figure 3). For the OOV words, we performed automatic stress assignment using machine learning models (label c in Figure 3). Finally, we had all words with assigned stress (labels 3a and 4 in Figure 3). Next, we enforced the rules for atonic words, enclitics, and proclitics (label d in Figure 3). In this way, we obtained words with assigned stress and enforced rules (label 5 in Figure 3), which was our final goal—a text with assigned lexical stress.
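The flow in Figure 3 can be summarized in a short sketch. All names here are hypothetical: the rule step is represented by a plain dictionary lookup standing in for the rule set of [11,38], and the model and clitic steps are passed in as callables. It shows only the order of the steps, not the implementation used in this study.

```python
def assign_stress(tagged_words, rule_lexicon, model_predict, apply_clitic_rules):
    """Hybrid stress assignment: rules first, model for OOV words, then clitics.

    tagged_words: list of (word, msd) pairs produced by the MSD tagger (step a).
    rule_lexicon: dict mapping (word, msd) to a stressed form (step b stand-in).
    model_predict: callable (word, msd) -> stressed form, used for OOV words (step c).
    apply_clitic_rules: callable applied to the whole stressed sequence (step d).
    """
    stressed = []
    for word, msd in tagged_words:
        form = rule_lexicon.get((word, msd))
        if form is None:  # not covered by the rules: treat as OOV
            form = model_predict(word, msd)
        stressed.append(form)
    return apply_clitic_rules(stressed)


# Demonstration with stand-ins; "<...>" marks a model-assigned placeholder.
rules = {("stolac", "Nmsnn"): "stòlac"}
result = assign_stress(
    [("stolac", "Nmsnn"), ("blog", "Nmsnn")],
    rules,
    lambda w, m: f"<{w}>",
    lambda words: words,  # identity: no clitic rules in this demo
)
print(result)
```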

5. Discussion and Conclusions

In this work, we tested the applicability of methods for modelling and predicting prosodic features for Croatian, more specifically, lexical stress position and type.

In order to solve the problem of the automatic assignment of lexical stress to OOV words (i.e., words not present in the lexicon), a system for automatic stress assignment was developed using a model. Classification trees were used to predict the position of stress and the type of stress in a word, whereas a lexicon, which consisted of 1,011,785 word forms containing information about the stressed word form, its unstressed equivalent, and morphosyntactic (MSD) tag was used for training.

The model used to predict the most likely position of stress in a word based on the lexicon data yielded a 90.78% accuracy of correctly classified instances, and the model that predicted the most likely type of stress yielded an 86.7% accuracy of correctly classified instances.

The accuracy of the models in predicting the position of lexical stress on the unseen test data was 95.99%, in predicting the type of stress it was 74.26%, and in predicting both the position of stress and type of accent, a 72.15% accuracy was achieved. After applying the rules for atonic words, the following accuracies were achieved: 97.4% for the lexical stress position, 82.4% for the lexical stress type, and 80.1% for both the lexical stress position and stress type.

Based on our experiments, we can conclude that the best model for the automatic assignment of lexical stress in Croatian was a hybrid model, where words were first assigned the appropriate stress using morphological accent rules, followed by the enforcement of rules for unstressed words. For words that had not been assigned stress, the model-based stress assignment procedure was applied. Then, the rules for atonic words were applied again, i.e., the lexical stress was transferred from the stressed word to the proclitic. The final accuracy in assigning stress to words was 95.3%.

Our future work includes experimenting with other machine learning algorithms and expanding the test data corpus. A deep learning approach was not considered because a sufficiently large speech or pronunciation corpus is not yet available for the highly inflected Croatian language. Our results suggest that a hybrid model is effective for under-resourced languages for which labelled training data is not available.

Author Contributions

Conceptualization, N.M.P. and L.N.P.; Data curation, N.M.P. and L.N.P.; Formal analysis, L.N.P.; Funding acquisition, L.N.P.; Investigation, N.M.P. and L.N.P.; Methodology, N.M.P. and L.N.P.; Project administration, L.N.P.; Resources, N.M.P. and L.N.P.; Software, L.N.P.; Supervision, N.M.P.; Validation, N.M.P. and L.N.P.; Visualization, L.N.P.; Writing—original draft, N.M.P. and L.N.P.; Writing—review and editing, N.M.P. and L.N.P. All authors have read and agreed to the published version of the manuscript.

Funding

Partial financial support was received from the University of Rijeka under Grant Agreement No 18.14.2.2.02.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Quesada, W.; Lautenbach, B. Programming Voice Interfaces; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017.
Tadić, M. European Language Equality. In D1.7 Report on the Croatian Language; ELE: Karaez, France, 2022.
Ljubešić, N.; Koržinek, D.; Rupnik, P.; Jazbec, I. ParlaSpeech-HR—A freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. In Proceedings of the ParlaCLARIN III @ LREC2022, Marseille, France, 20 June 2022; pp. 111–116.
Ljubešić, N.; Koržinek, D.; Rupnik, P.; Jazbec, I.; Batanović, V.; Bajčetić, L.; Evkoski, B. ASR training dataset for Croatian ParlaSpeech-HR v1.0. In Slovenian language resource repository CLARIN.SI; Jožef Stefan Institute: Ljubljana, Slovenia, 2022; ISSN 2820-4042.
Načinović, L.; Pobar, M.; Ipšić, I.; Martinčić-Ipšić, S. Grapheme-to-Phoneme Conversion for Croatian Speech Synthesis. In Proceedings of the 32nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2009), Opatija, Croatia, 25–29 May 2009; pp. 318–323.
Vaissière, J. Cross-linguistic prosodic transcription: French vs. English. In Problems and Methods of Experimental Phonetics; Volskaya, N.B., Svetozarova, N.D., Skrelin, P.A., Eds.; St Petersburg State University Press: St Petersburg, Russia, 2002; pp. 147–164.
Malisz, Z.; Zygis, M. Lexical stress in Polish: Evidence from focus and phrase-position differentiated production data. In Proceedings of the 9th International Conference on Speech Prosody, Poznan, Poland, 13–16 June 2018; pp. 1008–1012.
Skarnitzl, R.; Eriksson, A. The Acoustics of Word Stress in Czech as a Function of Speaking Style. In Proceedings of the 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, 20–24 August 2017; pp. 3221–3225.
Babić, S.; Brozović, D.; Moguš, M.; Pavešić, S.; Škarić, I.; Težak, S. Povijesni Pregled, Glasovi i Oblici Hrvatskoga Književnog Jezika [Historic Review, Sounds and Forms of the Standard Croatian Language]; Globus, Nakladni zavod: Zagreb, Croatia, 1991.
Anić, V. Veliki Rječnik Hrvatskoga Jezika [The Great Dictionary of Croatian Language]; Novi liber: Zagreb, Croatia, 2009.
Mikelic Preradovic, N.; Nacinovic Prskalo, L. Development of the accent dictionary for the pitch-accent language: The case of Croatian. J. Slav. Linguist. submitted 2022.
Pletikos Olof, E.; Bradfield, J. Standard Croatian pitch-accents: Fact and fiction. In Proceedings of the 19th International Congress of Phonetic Sciences ICPhS 2019, Melbourne, Australia, 5–9 August 2019; Calhoun, S., Escudero, P., Tabain, M., Warren, P., Eds.; Australian Speech Science & Technology Association Inc: Canberra, Australia, 2019; pp. 855–858.
Barić, E.; Lončarić, M.; Malić, D.; Pavešić, S.; Peti, M.; Zečević, V.; Znika, M. Hrvatska Gramatika [Croatian Grammar]; Školska knjiga: Zagreb, Croatia, 2003.
McKechnie, J.; Shahin, M.; Ahmed, B.; McCabe, P.; Arciuli, J.; Ballard, K.J. An Automated Lexical Stress Classification Tool for Assessing Dysprosody in Childhood Apraxia of Speech. Brain Sci. 2021, 11, 1408.
Korzekwa, D.; Barra-Chicote, R.; Zaporowski, S.; Beringer, G.; Lorenzo-Trueba, J.; Serafinowicz, A.; Droppo, J.; Drugman, T.; Kostek, B. Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention. Proc. Interspeech 2021, 2021, 3915–3919.
Brown, G.; Franco-Pedroso, J.; González-Rodríguez, J. A segmentally informed solution to automatic accent classification and its advantages to forensic applications. Int. J. Speech Lang. Law 2022, 28, 201–232.
Woldemariam, Y.D. Transfer Learning for Less-Resourced Semitic Languages Speech Recognition: The Case of Amharic. In Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020), Marseille, France, 11–12 May 2020; pp. 61–69.
Rosenberg, A.; Audhkhasi, K.; Sethy, A.; Ramabhadran, B.; Picheny, M. End-to-end speech recognition and keyword search on low-resource languages. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5280–5284.
Das, A.; Jyothi, P.; Hasegawa-Johnson, M. Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka. In Proceedings of the 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, 8–12 September 2016; pp. 3524–3528.
Ni, C.; Liu, W.; Xu, B. Mandarin stress detection using hierarchical model based boosting classification and regression tree. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–5.
Gharavian, D.; Sheikhan, M.; Ghasemi, S.S. Combined classification method for prosodic stress recognition in Farsi language. Int. J. Speech Technol. 2018, 21, 333–341.
Bellur, A.; Narayan, K.B.; Raghava Krishnan, K.; Murthy, H.A. Prosody modeling for syllable-based concatenative speech synthesis of Hindi and Tamil. In Proceedings of the 2011 National Conference on Communications (NCC), Bangalore, India, 28–30 January 2011; pp. 1–5.
James, J.; Shields, I.; Berriman, R.; Keegan, P.J.; Watson, C.I. Developing Resources for Te Reo Māori Text To Speech Synthesis System. In Lecture Notes in Computer Science, Proceedings of the Text, Speech, and Dialogue (TSD), Brno, Czech Republic, 8–11 September 2020; Sojka, P., Kopeček, I., Pala, K., Horák, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12284, pp. 294–302.
Ciobanu, A.M.; Dinu, A.; Dinu, P.L. Predicting Romanian Stress Assignment. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26–30 April 2014; pp. 64–68.
Marinčič, D.; Tušar, T.; Gams, M.; Šef, T. Analysis of Automatic Stress Assignment in Slovene. Informatica 2009, 20, 35–50.
Lorincz, B.; Irimia, E.; Stan, A.; Barbu Mititelu, V. RoLEX: The development of an extended Romanian lexical dataset and its evaluation at predicting concurrent lexical information. Nat. Lang. Eng. 2022, 1–26.
Martinčić-Ipšić, S.; Ribarić, S.; Ipšić, I. Acoustic Modelling for Croatian Speech Recognition and Synthesis. Informatica 2008, 19, 227–254.
Načinović, L.; Pobar, M.; Martinčić-Ipšić, S.; Ipšić, I. Automatic Intonation Event Detection Using Tilt Model for Croatian Speech Synthesis. In Proceedings of the INFuture2011, The Future of Information Sciences, Information Sciences and e-Society, Zagreb, Croatia, 9–11 November 2011; pp. 383–391.
Martinčić-Ipšić, S.; Ipšić, I. Croatian HMM Based Speech Synthesis. J. Comput. Inf. Technol. CIT 2006, 14, 299–305.
Tadić, M. The Croatian Lemmatization Server. South. J. Linguist. 2005, 29, 206–217.
Ljubešić, N.; Klubička, F.; Agić, Ž.; Jazbec, I.-P. New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 23–28 May 2016; Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., et al., Eds.; pp. 4262–4270.
Filko, M.; Šojat, K.; Štefanec, V. The Design of Croderiv 2.0. Prague Bull. Math. Linguist. 2020, 115, 83–104.
Šnajder, J. Derivbase.hr: A high-coverage derivational morphology resource for Croatian. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S., Eds.; pp. 3371–3377.
Salzberg, S.L. C4.5: Programs for Machine Learning. Mach. Learn. 1994, 16, 235–240.
Krzywinski, M.; Altman, N. Classification and regression trees. Nat. Methods 2017, 14, 757–758.
Witten, I.H.; Frank, E.; Hall, M.A. Data Mining, Practical Machine Learning Tools and Techniques, 3rd ed.; Elsevier: Amsterdam, The Netherlands, 2011.
Ljubešić, N. MULTEXT-East Morphosyntactic Specifications, revised Version 4; Croatian Specifications. 2013. Available online: http://nlp.ffzg.hr/data/tagging/msd-hr.html (accessed on 15 May 2022).
Mikelić Preradović, N. Pristupi Izradi Strojnog Tezaurusa za Hrvatski Jezik, Doktorska Disertacija [Approaches to the Development of the Machine Lexicon for Croatian Language]. Ph.D. Thesis, University of Zagreb, Zagreb, Croatia, 2008.

Figure 1. Entries in the lexicon [8] for the word lonac (Eng. pot) and its inflectional forms.

Figure 2. Label set for the linguistic features of individual words.

Figure 3. Hybrid approach to lexical stress assignment using morphological accentuation rules and models.

Table 1. Proclitics and enclitics in the Croatian language.

Proclitics
Prepositions	monosyllabic		s, k, zbog, na, o, po, pri, u, etc.
	two-syllabic		među, mimo, nada, poda, pokraj, preko, prema, oko
	three-syllabic		umjesto (Eng. instead), all compounds with the preposition “iz-”, e.g., između, iznad, ispod (Eng. between, above, below)
Conjunctions			a, i, ni, da, kad (when it is stressed)
Negative particle			ne
Enclitics
Pronoun enclitic	unstressed forms of personal pronouns	genitive	me, te, ga, je, nas, vas, ih
		dative	mi, ti, mu, joj, nam, vam, im
		accusative	me, te, ga (nj), ju (je), nju, nas, vas, ih
	unstressed forms of the reflexive pronoun	genitive	se
		dative	si
		accusative	se
Verb enclitic	unstressed present tense forms of the verb biti (to be)		sam, si, je, smo, ste, su
	unstressed present tense forms of the auxiliary verb htjeti (will)		ću, ćeš, će, ćemo, ćete, će
	unstressed aorist forms of the verb biti (to be)		bih, bi, bi, bismo, biste, bi
			conjunctive-interrogative li
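The clitic inventory in Table 1 lends itself to a simple lookup. The sketch below encodes an abbreviated selection of the forms listed in the table and classifies a word as a proclitic, an enclitic, or neither. The names `PROCLITICS`, `ENCLITICS`, and `clitic_kind` are our own assumptions. The lookup ignores context, although several of these forms are homographs of stressed words, so it should be read as an illustration and not as the rule set used in the system.

```python
from typing import Optional

# Abbreviated selection of forms from Table 1 (the table itself is open-ended).
PROCLITICS = {
    # prepositions
    "s", "k", "zbog", "na", "o", "po", "pri", "u",
    "među", "mimo", "nada", "poda", "pokraj", "preko", "prema", "oko",
    "umjesto", "između", "iznad", "ispod",
    # conjunctions and the negative particle
    "a", "i", "ni", "da", "kad", "ne",
}

ENCLITICS = {
    # unstressed pronoun forms
    "me", "te", "ga", "je", "nas", "vas", "ih",
    "mi", "ti", "mu", "joj", "nam", "vam", "im",
    "nj", "ju", "se", "si",
    # unstressed verb forms
    "sam", "smo", "ste", "su",
    "ću", "ćeš", "će", "ćemo", "ćete",
    "bih", "bi", "bismo", "biste",
    # conjunctive-interrogative particle
    "li",
}


def clitic_kind(word: str) -> Optional[str]:
    """Return 'proclitic', 'enclitic', or None for a context-free lookup."""
    w = word.lower()
    if w in PROCLITICS:
        return "proclitic"
    if w in ENCLITICS:
        return "enclitic"
    return None


for w in ["Na", "ćemo", "lonac"]:
    print(w, clitic_kind(w))
```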

Table 2. Linguistic features used to build models to predict the lexical stress position in a word.

Linguistic Feature	Feature Description	Potential Values	Number of Attributes
number of syllables	number of syllables in a word	values from 1 to 12	1
the last phoneme in a word	phonetic features of the last phoneme in a word	according to Table 3	6
the penultimate phoneme in a word	phonetic features of the penultimate phoneme in a word	according to Table 3	6
the third phoneme from the end of a word	phonetic features of the third phoneme from the end of a word	according to Table 3	6
the fourth phoneme from the end of a word	phonetic features of the fourth phoneme from the end of a word	according to Table 3	6
the first phoneme in a word	phonetic features of the first phoneme in a word	according to Table 3	6
the second phoneme in a word	phonetic features of the second phoneme in a word	according to Table 3	6
the third phoneme in a word	phonetic features of the third phoneme in a word	according to Table 3	6
POS tag	POS tag of a word	N, V, A, P, R, S, C, M, Q, I	1
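The feature set in Table 2 can be illustrated with a short extraction routine. The sketch below rests on several simplifying assumptions that are ours: phonemes are approximated by letters, with the digraphs dž, lj, and nj treated as single phonemes; syllables are counted as vowel letters only, so syllabic r is ignored; and each phoneme is kept as a raw symbol instead of being expanded into the six phonetic attributes of Table 3. The function names are hypothetical.

```python
from typing import Dict, List, Optional

VOWELS = set("aeiou")
DIGRAPHS = ("dž", "lj", "nj")  # single phonemes written with two letters


def to_phonemes(word: str) -> List[str]:
    """Split a word into letter-based phoneme symbols (an approximation)."""
    word = word.lower()
    phonemes: List[str] = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            phonemes.append(word[i:i + 2])
            i += 2
        else:
            phonemes.append(word[i])
            i += 1
    return phonemes


def position_features(word: str, pos_tag: str) -> Dict[str, object]:
    """Collect the Table 2 features for one word, using None for missing slots."""
    ph = to_phonemes(word)

    def at(idx: int) -> Optional[str]:
        return ph[idx] if -len(ph) <= idx < len(ph) else None

    return {
        "n_syllables": sum(p in VOWELS for p in ph),  # syllabic r not counted
        "last": at(-1),
        "penultimate": at(-2),
        "third_from_end": at(-3),
        "fourth_from_end": at(-4),
        "first": at(0),
        "second": at(1),
        "third": at(2),
        "pos": pos_tag,
    }


features = position_features("lonac", "N")
print(features)
```

Short words leave some slots empty, which the sketch represents with `None`; how such gaps were encoded for the classification trees is not specified here.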

Table 3. Phonetic features of phonemes.

Phonetic Feature		Label
Vowel or consonant	vowel	v
Vowel or consonant	consonant	c
Type of consonant according to manner of articulation	approximant	p
	vibrant	tr
	lateral	b
	nasal	n
	plosive	z
	fricative	tj
	affricate	s
Type of consonant according to place of articulation	labial	u
	labiodental	z
	dental	d
	postalveolar	p
	palatal	n
	velar	j
Type of consonant according to voicing	voiced	z
Type of consonant according to voicing	voiceless	b
Height of vowels	high vowel	v
	mid vowel	s
	low vowel	n
Vowels according to place of articulation	i	p
	e	ps
	a	a
	o or u	s

Table 4. Linguistic features used to build models to predict the type of accent in a word.

Linguistic Feature	Feature Description	Potential Values	Number of Attributes
number of syllables	number of syllables in a word	values from 1 to 12	1
the last phoneme in a word	phonetic features of the last phoneme in a word	according to Table 3	6
the penultimate phoneme in a word	phonetic features of the penultimate phoneme in a word	according to Table 3	6
the third phoneme from the end of a word	phonetic features of the third phoneme from the end of a word	according to Table 3	6
the first phoneme in a word	phonetic features of the first phoneme in a word	according to Table 3	6
the second phoneme in a word	phonetic features of the second phoneme in a word	according to Table 3	6
POS tag	POS tag of a word	N, V, A, P, R, S, C, M, Q, I	1
ordinal number of the stressed syllable	ordinal number of the stressed syllable as a result of the first model	values from 1 to 12	1
ordinal number of the stressed syllable counting from the end of a word	ordinal number of the stressed syllable counting from the end of a word	values from 1 to 12	1
the identity of the stressed phoneme	identity of the stressed phoneme	a, e, i, o, u, syllabic r	1
the phoneme in front of the stressed one	phonetic features of the phoneme in front of the stressed one	according to Table 3	5 (except for vowels/consonants)
the phoneme following the stressed one	phonetic features of the phoneme following the stressed one	according to Table 3	5 (except for vowels/consonants)

Table 5. Detailed accuracy (weighted average) of models predicting stress position and type.

	TP Rate	FP Rate	Precision	Recall	F-Measure	ROC Area
Prediction of lexical stress position (weighted avg.)	0.908	0.084	0.907	0.908	0.908	0.964
Prediction of lexical stress type (weighted avg.)	0.867	0.072	0.867	0.867	0.867	0.948

Table 6. Accuracy of prediction of stress position and type on the test set.

	Accuracy of Prediction of Lexical Stress Position	Accuracy of Prediction of Lexical Stress Type	Accuracy of Prediction of Both Stress Position and Type
Before enforcing the rules for unstressed words	95.99%	74.26%	72.15%
After enforcing the rules for unstressed words	97.4%	82.4%	80.97%


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
