Fuzzy Property Grammars for Gradience in Natural Language
Abstract
1. Introduction
- In linguistics, a model that considers grammaticality a vague object has to be a fuzzy grammar, which defines grammaticality as “knowledge of a grammar for processing language meaning”. This model offers a tool to evaluate complexity and universality in terms of degree. This research direction has already been shown in Torrens-Urrutia et al. [25,26].
- For society and individuals, the concept of grammaticality as graded and vague opens the path to developing language software with linguistic explicative capacity. Explicative stands for the notion of explicability as in Floridi et al. [27], that is, combining demands for intelligibility and accountability. Therefore, software with such characteristics will bear linguistic knowledge that can be incremented or reduced in real time according to the evolution of natural languages, always displaying the linguistic information as a white-box method. This software could extract information with linguistic explicability in web data mining, in the automatic extraction of other natural language grammars with explicit linguistic information, in self-learning programs, in the development of computer tools for the automatic detection of language pathologies, etc.
- In Section 2, we present the background of our research with some general ideas regarding key concepts in our framework.
- In Section 3, we lay out the formal prerequisites of our model regarding property grammars and fuzzy natural logic.
- In Section 4, the formal model of Fuzzy Property Grammars is introduced.
- In Section 5, materials and methods for extracting and computing the degrees of grammaticality are presented.
- In Section 6, we show the results of our research by introducing our Fuzzy Property Grammar of the Spanish language and its idiosyncrasies.
- In Section 7, a theoretical application of the degrees of grammaticality in natural language with examples is displayed.
2. Background
- (1) A grammar represents the linguistic knowledge that a native speaker has regarding a specific natural language. This also concerns the abstract linguistic rules which arrange the surface structures of a particular grammar. Lakoff [4] already strongly claimed that such rules must be considered part of the linguistic competence, and not as part of the linguistic performance (disagreeing with Chomsky [21]).
- (2) The linguistic knowledge concerns not only those rules that can generate perfect grammatical utterances but also the knowledge that a native speaker has acquired for processing and understanding non-grammatical utterances.
- (3) Linguistic knowledge can be defined through linguistic constraints (as a type of linguistic rule) and must tackle the notion of markedness. Therefore, we definitely need to consider canonical, prototypical, and non-canonical, non-prototypical, or borderline constraints when defining a grammar.
- – The notion of linguistic constraints stands for a relation that puts together two or more linguistic elements. When this relation happens mostly in any context, the linguistic constraint is labeled as canonical or prototypical.
- – The notion of markedness stands for non-prototypical linguistic contexts. When a linguistic constraint happens in a marked context, the constraint is labeled as non-canonical, non-prototypical, or borderline.
- – The canonical constraints are those that definitely belong to a specific grammar. The non-canonical ones, even though they are part of the grammar, are “not definitely” part of it, since their belonging to the grammar depends on how marked the linguistic context is.
- A framework with constraints: It is necessary to choose a grammar or a model which takes into account constraints.
- Linguistic constraints: The notion of linguistic constraint stands for a relation that puts together two or more linguistic elements. For example, “A Transitive Verb Requires a Subject and an Object”. The linguistic constraint is the “requirement”, and the linguistic elements are “Transitive Verb”, “Subject”, and “Object”. Formally, a linguistic constraint is an n-tuple ⟨C1, …, Cn⟩, where C1, …, Cn are linguistic categories. We usually have n = 2, as shown in Torrens-Urrutia et al. [25,26].
- Context effects and markedness: The concept of markedness arises to represent the importance of context for a word. A sentence S1 is more marked than a sentence S2 if S1 is acceptable in fewer contexts than S2. Müller [55] claimed that markedness can be determined either by the judgments of the speakers or by extracting the number of possible context types for a sentence. Keller [8] points out that “a constraint is context-dependent if the degree of unacceptability triggered by its violation varies from context to context”.
- Constraint ranking: It takes into account how some constraint violations are more significant than the other ones. Constraint ranking is especially essential for representing degrees of acceptability since it seems clear that the speakers find some violations more notable than others.
- Cumulativity: This effect is present in those structures that violate multiple constraints in contrast to those structures that violate a single constraint which is highly ranked.
- Constraint counterbalance: This notion is found in Blache and Prost [56] (p. 7) as an alternative use of cumulativity. Constraint counterbalance claims that “cumulativity must take into account both violated and satisfied constraints; in contrast with standard cumulativity which takes into account only the violated ones.”
- Ganging up effect: This effect shows up when a constraint has been violated multiple times in a structure. Acknowledging this effect allows us to consider that a constraint, which might be ranked below another one, can trigger more unacceptability if it has been violated more repeatedly than that which is ranked higher and violated just a single time.
- Soft and hard constraints are proposed as a paired concept by Keller [8,9]. Both constraints share features such as universal effects of being ranked, being cumulative, and performing a ganging up effect. However, they also have features that distinguish them: Hard constraints trigger strong unacceptability when violated, while soft constraints trigger mild violations; hard constraints are independent of their context, while soft ones are context-dependent; hard constraints are mandatory during the acquisition process of a language, both in native and in second language acquisition, while soft constraints display optional traits when they are being acquired.
- Violation position: This notion is also from Blache and Prost [56] (p. 7) and points out how the value of a violation of a constraint might differ from one syntactic structure to another.
- Weights and rules: Linguists who work in gradience weigh constraints according to their ranking, context effect, and how hard and soft they are. The weights of constraints are deeply dependent on the perceived, extracted or intuited impact on native speaker’s acceptability. Usually, the degree of grammaticality and acceptability of a linguistic input is computed as the sum of the weights of the violations triggered by an utterance.
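The weighted-sum view of gradience described above can be sketched in a few lines. This is a minimal illustration, not the authors' model; the constraint names and weight values are assumptions for the example:

```python
# Illustrative constraint weights (hypothetical ranking, not from the paper):
# higher weight = harder constraint, whose violation hurts acceptability more.
WEIGHTS = {"agreement": 0.8, "word_order": 0.5, "doubled_det": 0.3}

def acceptability_penalty(violations):
    """Sum the weights of the violated constraints (cumulativity:
    repeated violations of the same constraint all count)."""
    return sum(WEIGHTS[v] for v in violations)

# Ganging up effect: repeated violations of a lower-ranked constraint
# can outweigh a single violation of a higher-ranked one.
assert acceptability_penalty(["word_order", "word_order"]) > \
       acceptability_penalty(["agreement"])
```

Note that this simple sum only counts violated constraints; the constraint-counterbalance notion of Blache and Prost would also credit the satisfied ones.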
3. Formal Prerequisites
3.1. Property Grammars
- − Linearity of precedence order between two elements: A precedes B, in symbols, A ≺ B. Therefore, a violation is triggered when B precedes A. A typical example of this property can be found with the precedence relation between the determiner (DET) and the noun (NOUN) in English: For example, in “The kid”.
- − Co-occurrence between two elements: A requires B, in symbols, A ⇒ B. A violation is triggered if A occurs, but B does not. A typical example of this property in English is “The woman plays basketball”, where NOUN ⇒ DET. A violation would be “Woman plays basketball”. Moreover, co-occurrence demands at the same time that B requires A. This property is non-hierarchic and non-headed. Therefore, the co-occurrence property must figure in both categories.
- − Exclusion between two elements: A and B never appear in co-occurrence in the specified construction, in symbols, A ⊗ B, that is, only A or only B occurs. A violation is triggered if both A and B occur. An example of this property in English is the exclusion between the pronoun (PRON) and the noun (NOUN): For example, in “She woman watches a movie”. Unlike co-occurrence, this property does not necessarily figure in both property descriptions.
- − Uniqueness means that neither a category nor a group of categories (constituents) can appear more than once in a given construction. A violation is triggered if one of these constituents is repeated in a construction. A classical example in English is the non-repetition of the determiner and the relative pronoun concerning the nominal construction: In “The the woman that who used to be my partner”, both the determiner and the relative pronoun are repeated within the nominal construction.
- − Dependency: An element A has a dependency on an element B, in symbols, A ⇝ B. A violation is triggered if the specified dependency does not occur. A classical example in English is the relation between an adjective (ADJ), with the dependency of a modifier, and a noun. For example, in “Colombia is a big country”, ADJ ⇝ NOUN. One such violation might be: “Colombia is a big badly”. This property can be perceived as the syntactic property which mixes syntactic features with semantic ones.
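The properties above can be checked mechanically over a part-of-speech sequence. The following is a sketch of four of them (dependency is omitted, since it needs a parse rather than a flat tag sequence); the tag names and example sequences are illustrative, not the authors' implementation:

```python
from collections import Counter

def precedence(tags, a, b):
    """Linearity A ≺ B: if both occur, A must come before B."""
    if a in tags and b in tags:
        return tags.index(a) < tags.index(b)
    return True  # property not triggered

def requirement(tags, a, b):
    """Co-occurrence A ⇒ B: if A occurs, B must occur too."""
    return b in tags if a in tags else True

def exclusion(tags, a, b):
    """Exclusion A ⊗ B: A and B never co-occur in the construction."""
    return not (a in tags and b in tags)

def uniqueness(tags, a):
    """Uniqueness: A appears at most once in the construction."""
    return Counter(tags)[a] <= 1

seq = ["DET", "NOUN", "VERB"]                       # "The kid runs"
assert precedence(seq, "DET", "NOUN")               # satisfied
assert not precedence(["NOUN", "DET"], "DET", "NOUN")    # "Kid the": violated
assert not requirement(["NOUN", "VERB"], "NOUN", "DET")  # "Woman plays": violated
assert not exclusion(["PRON", "NOUN", "VERB"], "PRON", "NOUN")  # "She woman watches"
assert not uniqueness(["DET", "DET", "NOUN"], "DET")     # "The the woman"
```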
3.2. Fuzzy Natural Logic
- Development of methods for the construction of models of systems and processes on the basis of expert knowledge expressed in genuine natural language.
- Development of algorithms making computers “understand” natural language and behave accordingly.
- Help to understand the principles of human thinking.
- A satisfied constraint is a constraint of a grammar which is found reproduced in a linguistic input.
- A violated constraint is a constraint of a grammar which is found infringed in a linguistic input.
- A variability constraint is a constraint that is triggered when a violation occurs, compensating the final value of grammaticality. Variability rules are found and justified by context-effects phenomena, mainly by the tandem of the linguistic concept of markedness and frequency of appearance, as in the work of Keller [8] and Müller [55].
4. Fuzzy Property Grammars
4.1. Our Basic Idea of Graded Grammaticality
- (1) Grades can be found in grammar by considering:
- – Linguistic constraints that definitely belong to a grammar. Those are labeled as canonical or prototypical constraints.
- – Linguistic constraints that belong to the grammar only in marked contexts. Those are labeled as non-canonical, non-prototypical, borderline, or marked constraints.
- (2) Graded grammaticality can be found in linguistic inputs (utterances) when describing how many linguistic constraints from a specific linguistic input can be found in a specific grammar. Therefore, a gradient grammar such as a Fuzzy Property Grammar understands grammaticality as the vague relationship between a naturally produced linguistic input and a grammar, in gradient terms. This relationship can be expressed as a degree in [0, 1] according to how many rules and/or linguistic constraints of a linguistic input have been identified by the grammar as constraints that are definitely part of the grammar (satisfied and prototypical), partially part of the grammar and satisfied (satisfied and borderline), and definitely not part of the grammar (violated constraints).
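This three-way split can be turned into a degree in [0, 1] by counting constraints. The function below is our illustration of the idea, not the paper's exact algorithm; the 0.5 value for borderline constraints follows the variability value used later in Section 5.3:

```python
def grammaticality_degree(satisfied, borderline, violated, borderline_value=0.5):
    """Degree of grammaticality in [0, 1]:
    constraints definitely in the grammar (satisfied, prototypical) count 1,
    borderline satisfied constraints count borderline_value,
    violated constraints count 0."""
    total = satisfied + borderline + violated
    if total == 0:
        return 1.0  # nothing to evaluate
    return (satisfied + borderline_value * borderline) / total

assert grammaticality_degree(4, 0, 0) == 1.0    # fully prototypical input
assert grammaticality_degree(2, 1, 1) == 0.625  # mixed input
```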
4.2. Definition of a Fuzzy Grammar and Fuzzy Property Grammar
- The set of constraints that can be determined in phonology.
- The set of constraints that can be determined in morphology.
- The set of constraints that characterize syntax.
- The set of constraints that characterize semantic phenomena.
- The set of constraints that occur at the lexical level.
- The set of constraints that characterize pragmatics.
- The set of constraints that can be determined in prosody.
- The morphological domain, which defines the part-of-speech (or linguistic categories) and the constraints between lexemes and morphemes. For example, in English, the lexeme of a “Regular Verb” ≺ (precedes) the morpheme -ed.
- The syntactical domain, which defines the structure relations between categories in a linguistic construction or phrase. For example, in English, an adverb as a modifier of an adjective is dependent (⇝) on such an adjective.
- The semantic domain, which defines the network-of-meanings of a language and its relation with the syntactical domain. This can be defined with semantic frames [75,76]. It is also responsible for explaining semantic phenomena such as metaphorical meaning, metonymy, and semantic implausibility. For example, in English, the object of read ⇒ (requires) the feature of being readable. A metonymy can be triggered with the following rule: If something is asked for with read without that feature (i.e., “I am reading R. L. Stevenson”), then the author is included as a feature in the frame of the readable object as a borderline frame.
4.3. A Fuzzy Grammar Computed Using Evaluative Linguistic Expressions
- IF an input is significantly satisfied THEN the degree of grammaticality is high.
- IF an input is quite satisfied THEN the degree of grammaticality is medium.
- IF an input is barely satisfied THEN the degree of grammaticality is low.
- IF the degree of grammaticality is high THEN the input is significantly grammatical.
- IF the degree of grammaticality is medium THEN the input is quite grammatical.
- IF the degree of grammaticality is low THEN the input is barely grammatical.
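The rule base above can be implemented with fuzzy membership functions over the degree of grammaticality. A minimal sketch follows; the breakpoints are illustrative assumptions, not the paper's evaluative-expression definitions:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def grammaticality_labels(degree):
    """Fuzzy membership of a degree in the labels low / medium / high
    (breakpoints 0.2, 0.5, 0.8 are assumed for illustration)."""
    return {
        "low":    max(0.0, 1.0 - degree / 0.5),
        "medium": tri(degree, 0.2, 0.5, 0.8),
        "high":   max(0.0, (degree - 0.5) / 0.5),
    }

# A degree of 0.625 is mostly "medium": the input is quite grammatical.
labels = grammaticality_labels(0.625)
assert max(labels, key=labels.get) == "medium"
```

Note the labels overlap: an input can be simultaneously "quite" and (to a lesser degree) "significantly" grammatical, which is exactly the vagueness the evaluative expressions are meant to capture.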
4.4. Constraint Behavior
- (a) Syntactic Canonical Properties: These are the properties which define the gold standard of the Fuzzy Grammar. These are strictly the most representative constraints, based on both their frequency of occurrence and some theoretical reasons. These properties are represented by a type of their own.
- (b) Syntactic Violated Properties: These properties are canonical properties which have been violated regarding a linguistic input or a dialect. Pointing out the violation of a canonical property is necessary in order to trigger the related syntactic variability properties (if needed). These properties are represented with a type of their own.
- (c) Syntactic Variability Properties: These properties are the core of this framework. These are triggered in the fuzzy grammar only when a violation is identified in an input. Therefore, these are borderline cases between a violation and a canonical property. They explain linguistic variability concerning a fuzzy grammar. When a variability property is satisfied, it triggers a new value over the violated constraint, improving its degree of grammaticality. These properties are represented with a type of their own. Variability constraints are found and justified by context-effects phenomena, mainly by the tandem of the linguistic concept of markedness and frequency of appearance, as in Keller [8] and Müller [55].
4.5. Syntactic Variability and xCategory
4.6. Constraint Characterization: Part of Speech and Features
- Our part-of-speech nomenclature for constraint categorization for both words and lexical units takes into account only these 10 categories. The constraints have been extracted by using Universal Dependencies. Therefore, FPGr has based its part of speech on the Universal Dependencies criteria for future implementations.
- Our construction nomenclature takes into account only these six constructions. These constructions have been found to be the most frequent in Spanish while extracting the Universal Dependencies corpus (Section 5.1). Therefore, we have considered that those are the most general constructions of the Spanish language. Our grammar does not yet consider more marked structures such as comparatives, superlatives, or widespread idioms.
- Our construction nomenclature in Table 1 takes into account three constraint behaviors already mentioned in Section 4.4.
5. Materials and Methods for Extracting and Computing Degrees of Grammaticality
5.1. Extracting and Placing Constraints
5.1.1. Universal Dependency Spanish Corpus Treebank
- Clause 10441:2. “El municipio de Sherman” (“The Municipality of Sherman”), as a subject.
- – “El” (“The”, masculine and singular) is tagged as a determiner with the dependency of determiner towards the noun “municipio” (“municipality”).
- – “municipio” is tagged as a noun and it is the root of the subject clause (10441:2). It receives as dependents “El” (tagged as determiner) and a proper noun clause (10441:5) headed by “Sherman” as a proper noun. “Sherman” receives as a dependent the adposition “de” (“of”).
- Clause 10441:10. “ubicado en las coordenadas” (“located at the coordinates”) is denoted as a verbal complement.
- – “ubicado” (“located”) is tagged as a verb and it is the root of the complement of the verbal clause (10441:10). It receives as a dependent a noun clause (10441:12) headed by “coordenadas” (“coordinates”) as a noun. “coordenadas” receives as dependents the adposition “en” (“in”) and the determiner “las” (“the”, feminine and plural).
- – “se” is denoted as an unstressed pronoun since, in this case, the verb “encuentra” is a pronominal verb which requires the pronoun for expressing the meaning of “finding something or somebody in a location”.
5.1.2. Spanish Syntactic Properties and MarsaGram
- (a) The corpus allows us to work with linguistic categories and their dependencies to find dependency phrases: Noun phrases, adjective phrases, prepositional phrases, and so on, and their properties. For example, Figure 3 shows the properties of a linguistic category as a dependency.
- (b) We can check the most important/frequent categories for each construction in terms of dependencies and properties. For example, Figure 3 displays the eight phrase structures ranked with an index 58, 59, 90, 100, 117, 170, 211, and 366. They all have low frequency due to the fact that a PRON as a subject in Spanish is rare. We find the constructions in “constituents”. The constituents in index 58, 59, 90, 170, and 366 display structures where we can find a PRON in a nominal construction of subject, while in index 211 the PRON is in a nominal construction of subject as a passive subject, which can show different properties.
- (c) We can apply the notion of construction from Goldberg [80] to the pair of constituency plus dependencies which appear in the RULES section, i.e., a subject construction is a subject dependency-constituency-phrase, a direct object construction is extracted from a direct object dependency-constituency-phrase, and so on. Therefore, we can see which constituents take part in the most common syntactic constructions of Spanish since we operate with an objective statistical frequency number.
- (d) MarsaGram provides two weights based on the frequency of each property: a satisfaction weight, which depends on the number of times a property has been violated, and an importance weight, a numerical value of the importance of a property in the corpus. This value corresponds to the frequency of a property. Therefore, a property that has never been violated (satisfaction of 1) but which has a low importance value in the corpus (such as 0.001) is either residual or an exception. A property with a high value of importance, together with a high value of satisfaction, is a significant property which the speakers tend to respect.
- (e) The properties of linearity, co-occurrence, exclusion, and uniqueness have been automatically extracted by MarsaGram. However, particular care needs to be taken with the exclusion property (or it should be disregarded) since it seems that the algorithm over-induces exclusion regarding a category for every other category which does not appear in a construction. MarsaGram makes it possible to check every property extracted in the context of the real sentence.
- 1. The generation of properties depends on the Universal Dependencies tag. If the latter tag is wrong, it will generate an unwanted property. For this reason, in general, it is better to always review the properties for each specific construction, its dependencies, and the actual sentence altogether, without implicitly trusting the automatically extracted ones. Therefore, some properties need to be justified with additional theoretical reasons rather than just frequencies.
- 2. It is not possible to automatically extract rules or properties for single elements with MarsaGram and Universal Dependencies. For example, if we want to check a PRON (alone) as a subject, as in “Este es mi cuarto” (“This is my bedroom”), we cannot do it by checking PRON-nsubj, because “Este” (“This”), a PRON appearing alone as a subject, is not extracted as a clause. In order to check this, we have to do it manually. In this case, we would need to check a PRON alone as a subject on the rules and properties extracted from the root of a sentence. This is illustrated in Figure 3.
5.1.3. Overview of Spanish Universal Dependencies and MarsaGram Corpus
5.1.4. Why an Extraction of the Subject Construction?
- The categories of NOUN, PROPN, and PRON are the ones which most often perform the categories of subject construction, direct object construction, and indirect object construction.
- ADJ, NOUN, and PROPN (with a preposition) are the categories which mostly introduce a modifier construction.
- The VERB is essential for representing verbal constructions, together with all those other constructions that have requirement relations with it: subject, direct object, and indirect object constructions.
5.2. Defining Prototypical and Non-Prototypical Fuzzy Constraints
- (a) Property assignment: Each property receives a behavior and a number regarding the category in a construction.
- (b) Specifications: They can specify features for each category. This trait is handy for those categories which have sub-categories, just like the verbs. We could specify some properties for infinitive verbs, others for copulative verbs, intransitive verbs, and so on.
- (c) ∧: This symbol is understood as and. It allows defining a category and its properties concerning many different categories (or features) at the same time. Therefore, all the elements must be satisfied, or a violation will be triggered. This property prevents over-satisfaction, since it groups many categories under the same property. Over-satisfaction mainly occurs concerning the exclusion property, which usually involves many categories.
- (d) ∨: This symbol is understood as or. It allows defining a category and its property concerning many different categories (or features) at the same time. One of the elements regarding ∨ must satisfy the specified property, or a violation will be triggered. This property prevents over-violation.
- (e) xCategory: It allows specifying the properties for the xCategory feature within the prototypical category. In Table 6, the constraints for a non-canonical noun with the syntactic fit of a noun can be found.
5.3. Word Density and Degrees of Grammaticality
- Context effects: We have extracted the properties according to their frequency and by applying theoretical notions such as the concept of markedness. A value based just on frequencies is avoided, in favor of a value based on a combination of frequencies plus the notion of markedness, among other theoretical reasons according to context effects. In such manner:
- A theoretical canonical value is understood as 1.
- A violated value is understood as 0.
- A variability value is understood as 0.5.
- Cumulativity, ganging up effect, constraint counterbalance, and positive ganging up effect. A Property Grammar takes into account different constraint behavior (both violated and satisfied) and the multiple repetitions of both a single violation or various violations for calculating degrees of grammaticality. It also considers the multiple repetitions of both a single satisfaction or various satisfied properties for calculating degrees of grammaticality.
- Density: This notion weights each constraint regarding the number of constraints that define a category. In our approach, density weights each constraint according to the number of constraints of a category that are triggered (either satisfied or violated) in the construction of an input.
5.4. Computing the Grammaticality Values from an Input
6. Results
- Thirty-two canonical properties for 6 types of Verb construction. Three variability properties for 6 types of verb construction.
- Five canonical properties for the noun (NOUN) as subject, and one variability property.
- Seven canonical properties for the adjective (ADJ), and one variability property.
- Six canonical properties for the noun (NOUN) as a modifier, and one variability property.
- Three canonical properties for the preposition (ADP) as a specifier.
- Two canonical properties for the proper noun (PROPN) as subject, and three variability properties.
- Four canonical properties for the proper noun (PROPN) as a modifier, and three variability properties.
- Five canonical properties for the pronoun (PRON) and two variability properties.
- Four canonical properties for the determiner (DET), and one variability property.
- Five variability properties for .
- Two variability properties for the .
- Two variability properties for the .
7. Discussion: Theoretical Application of Degrees of Grammaticality in Natural Language Examples
7.1. Example 1: Parsing Constructions with Variability Constraints
- The value of the word “funcionarios” (“public workers”) as a NOUN is estimated with the theoretical value of 1.
- The value of each canonical property is calculated by dividing our standard value of a canonical property (1) by all the triggered canonical properties, both satisfied and violated (4). The canonical value of each property in Table 8 is 0.25.
- The value of a variability property is calculated by dividing the value of a variability property (0.5) by all the triggered satisfied and violated constraints in “funcionarios” (4). The variability value of each property in Table 8 is 0.125.
- The uniqueness property of the determiner can be neither satisfied nor violated because no determiner has appeared. Therefore, our property grammar cannot evaluate its uniqueness, and the property is not triggered.
- “funcionarios” satisfies 2 canonical properties out of 4, which we calculate as 0.5.
- “funcionarios” satisfies 1 variability property out of 1, which we calculate as 0.125.
- The value of grammaticality of the word “funcionarios” in the subject construction in (22) is 0.625. The input is quite grammatical.
- However, the value of grammaticality of the input in (23) is 0.5. That input displays a borderline case between being quite grammatical and barely grammatical. Because our Fuzzy Property Grammar took such a variability property into account, we can provide a more fine-grained value such as the one presented in (22).
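The computation in Example 1 can be reproduced in a few lines. The function is our illustration of the scheme used above (each triggered canonical property is worth 1 divided by the number of triggered properties, and a satisfied variability property is worth 0.5 divided by the same number):

```python
def word_grammaticality(sat_canonical, viol_canonical, sat_variability):
    """Degree of grammaticality of a word in a construction, following the
    density-weighted scheme of Example 1."""
    n_triggered = sat_canonical + viol_canonical
    canonical_value = 1.0 / n_triggered    # 0.25 when 4 properties trigger
    variability_value = 0.5 / n_triggered  # 0.125 when 4 properties trigger
    return sat_canonical * canonical_value + sat_variability * variability_value

# "funcionarios" as NOUN in the subject construction: 4 canonical properties
# triggered, 2 of them satisfied, plus 1 satisfied variability property.
degree = word_grammaticality(sat_canonical=2, viol_canonical=2, sat_variability=1)
assert abs(degree - 0.625) < 1e-9   # quite grammatical, as in (22)

# Without the variability property, the degree drops to the coarser 0.5 of (23).
assert word_grammaticality(2, 2, 0) == 0.5
```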
7.2. Example 2: Mind Which Constraints Shall Be Included in the Grammar
7.3. Example 3: The Feature xCategory in Processing Natural Language
- “El hombre robot corre” (The robot man runs)
- Our Fuzzy Grammar considers a value of 0.581 as an input quite satisfied; therefore, its value of grammaticality is medium. The input is quite grammatical.
- In contrast, without our variability property, the value of the input “robot” would be 0.498. In this manner, this input would be computed in our fuzzy grammar as barely satisfied. Therefore, its value of grammaticality would be low. The input would be barely grammatical.
- We recognize that another combination of two nouns, such as “El hombre paz” (“The man peace”) or “El cielo hombre” (“The sky man”), would have the same value of grammaticality regarding the syntactic domain. However, it would not have the same grammaticality value regarding a cross-domain perspective of a Fuzzy Grammar. We state that the combination of a noun with another noun is syntactically possible to a certain degree. However, its degree of grammaticality regarding the other domains (such as semantics) would rely on the satisfaction or the violation of its properties in such domains. Consequently, the final value of grammaticality of two identical syntactic structures might be different in a Fuzzy Property Grammar when we calculate the grammaticality of an utterance regarding all the properties in all their domains.
8. Conclusions
9. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Spanish Fuzzy Property Grammars for Subject Construction
(Tables of canonical properties and variability properties for each category of the subject construction.)
References
- Hayes, P.J. Flexible parsing. Am. J. Comput. Linguist. 1986, 7, 232–242. [Google Scholar] [CrossRef]
- Bolinger, D.L.M. Generality: Gradience and the All-or-None, 14th ed.; Mouton Publishers: The Hague, The Netherlands, 1961. [Google Scholar]
- Ross, J.R. The category squish: Endstation hauptwort. In Papers from the 8th Regional Meeting of the Chicago Linguistic Society; Peranteau, P.M., Levi, J., Phares, G., Eds.; Chicago Linguistic Society: Chicago, IL, USA, 1972; pp. 316–328. [Google Scholar]
- Lakoff, G. Fuzzy grammar and the performance/competence terminology game. In Papers from the Ninth Regional Meeting of the Chicago Linguistic Society; Corum, C.W., Smith-Stark, T.C., Weiser, A., Eds.; Chicago Linguistic Society: Chicago, IL, USA, 1973. [Google Scholar]
- Manning, C.D. Probabilistic syntax. In Probabilistic Linguistics; Bod, R., Hay, J., Jannedy, S., Eds.; MIT Press: Cambridge, UK, 2003; pp. 289–341. [Google Scholar]
- Aarts, B.; Denison, D.; Keizer, E.; Popova, G. (Eds.) Fuzzy Grammar: A Reader; Oxford University Press: Oxford, UK, 2004. [Google Scholar]
- Aarts, B. Conceptions of gradience in the history of linguistics. Lang. Sci. 2004, 26, 343–389. [Google Scholar] [CrossRef]
- Keller, F. Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. Thesis, University of Edinburgh, Edinburgh, UK, 2000. [Google Scholar]
- Keller, F. Linear Optimality Theory as a model of gradience in grammar. In Gradience in Grammar: Generative Perspectives; Fanselow, G., Féry, C., Schlesewsky, M., Vogel, R., Eds.; Oxford University Press: Oxford, UK, 2006; pp. 270–287. [Google Scholar] [CrossRef]
- Fanselow, G.; Féry, C.; Vogel, R.; Schlesewsky, M. Gradience in grammar. In Gradience in Grammar: Generative Perspectives; Fanselow, G., Féry, C., Schlesewsky, M., Vogel, R., Eds.; Oxford University Press: Oxford, UK, 2006; pp. 1–23. [Google Scholar]
- Prost, J.P. Modelling Syntactic Gradience with Loose Constraint-Based Parsing. Ph.D. Thesis, Macquarie University, Macquarie Park, NSW, Australia, 2008. [Google Scholar]
- Bresnan, J.; Nikitina, T. The gradience of the dative alternation. In Reality Exploration and Discovery: Pattern Interaction in Language and Life; Uyechi, L., Wee, L.H., Eds.; The University of Chicago Press: Chicago, IL, USA, 2009; pp. 161–184. [Google Scholar]
- Goldberg, A. The nature of generalization in language. Cogn. Linguist. 2009, 20, 1–35. [Google Scholar] [CrossRef]
- Baldwin, T.; Cook, P.; Lui, M.; MacKinlay, A.; Wang, L. How noisy social media text, how diffrnt social media sources? In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, 14–18 October 2013; pp. 356–364. [Google Scholar]
- Lesmo, L.; Torasso, P. Interpreting syntactically ill-formed sentences. In Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting on Association for Computational Linguistics, Stanford, CA, USA, 2–6 July 1984; Association for Computational Linguistics: Stanford, CA, USA, 1984; pp. 534–539. [Google Scholar]
- Eisenstein, J. What to do about bad language on the internet. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; ACL: Atlanta, GA, USA, 2013; pp. 359–369. [Google Scholar]
- Lavie, A. GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, 1996. [Google Scholar]
- Lau, J.H.; Clark, A.; Lappin, S. Measuring Gradience in Speakers’ Grammaticality Judgements. Proc. Annu. Meet. Cogn. Sci. Soc. 2014, 36, 821–826. [Google Scholar]
- Blache, P. Property grammars and the problem of constraint satisfaction. In Proceedings of the ESSLLI 2000 Workshop on Linguistic Theory and Grammar Implementation, Birmingham, UK, 6–18 August 2000; pp. 47–56. [Google Scholar]
- Blache, P. Representing syntax by means of properties: A formal framework for descriptive approaches. J. Lang. Model. 2016, 4, 183–224. [Google Scholar] [CrossRef]
- Chomsky, N. Aspects of the Theory of Syntax; MIT Press: Cambridge, MA, USA, 1965. [Google Scholar]
- Chomsky, N. The Minimalist Program; MIT Press: Cambridge, MA, USA, 1995. [Google Scholar]
- Sorace, A.; Keller, F. Gradience in linguistic data. Lingua 2005, 115, 1497–1524. [Google Scholar] [CrossRef]
- Lau, J.H.; Clark, A.; Lappin, S. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cogn. Sci. 2017, 41, 1202–1241. [Google Scholar] [CrossRef]
- Torrens-Urrutia, A.; Novák, V.; Jiménez-López, M.D. Describing Linguistic Vagueness of Evaluative Expressions Using Fuzzy Natural Logic and Linguistic Constraints. Mathematics 2022, 10, 2760. [Google Scholar] [CrossRef]
- Torrens-Urrutia, A.; Jiménez-López, M.D.; Brosa-Rodríguez, A.; Adamczyk, D. A Fuzzy Grammar for Evaluating Universality and Complexity in Natural Language. Mathematics 2022, 10, 2602. [Google Scholar] [CrossRef]
- Floridi, L.; Cowls, J.; Beltrametti, M.; Chatila, R.; Chazerand, P.; Dignum, V.; Luetge, C.; Madelin, R.; Pagallo, U.; Rossi, F.; et al. AI4People—An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds Mach. 2018, 28, 689–707. [Google Scholar] [CrossRef] [Green Version]
- Blache, P. Property grammars: A fully constraint-based theory. In Constraint Solving and Language Processing; Christiansen, H., Skadhauge, P.R., Villadsen, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3438, pp. 1–16. [Google Scholar]
- Joshi, A.K.; Levy, L.S.; Takahashi, M. Tree adjunct grammars. J. Comput. Syst. Sci. 1975, 10, 136–163. [Google Scholar] [CrossRef]
- Schütze, C.T. The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology; University of Chicago Press: Chicago, IL, USA, 1996. [Google Scholar]
- Pullum, G.K.; Scholz, B.C. On the distinction between model-theoretic and generative-enumerative syntactic frameworks. In Proceedings of the International Conference on Logical Aspects of Computational Linguistics, Le Croisic, France, 27–29 June 2001; Springer: Berlin/Heidelberg, Germany, 2001; pp. 17–43. [Google Scholar]
- Aarts, B. Modelling linguistic gradience. Stud. Lang. 2004, 28, 1–49. [Google Scholar] [CrossRef]
- Jespersen, O. The Philosophy of Grammar; George Allen and Unwin Ltd.: London, UK, 1924. [Google Scholar]
- Curme, G.O. A Grammar of the English Language, Vol. II: Parts of Speech and Accidence; DC Heath and Co.: Boston, MA, USA, 1935. [Google Scholar]
- Wells, R. Is a structural treatment of meaning possible? In Proceedings of the Eighth International Congress of Linguists; Sivertsen, E., Ed.; Oslo University Press: Oslo, Norway, 1958; pp. 636–704. [Google Scholar]
- Crystal, D. English. Lingua 1967, 17, 24–56. [Google Scholar] [CrossRef]
- Quirk, R. Descriptive statement and serial relationship. Language 1965, 41, 205–217. [Google Scholar] [CrossRef]
- Chomsky, N. The Logical Structure of Linguistic Theory; Plenum Press: New York, NY, USA, 1975. [Google Scholar]
- Daneš, F. The relation of centre and periphery as a language universal. Trav. Linguist. de Prague 1966, 2, 9–21. [Google Scholar]
- Vachek, J. On the integration of the peripheral elements into the system of language. Trav. Linguist. de Prague 1966, 2, 23–37. [Google Scholar]
- Neustupný, J.V. On the analysis of linguistic vagueness. Trav. Linguist. de Prague 1966, 2, 39–51. [Google Scholar]
- Ross, J.R. Adjectives as noun phrases. In Modern Studies in English; Reibel, D.A., Schane, S.A., Eds.; Prentice-Hall: Englewood Cliffs, NJ, USA, 1969; pp. 352–360. [Google Scholar]
- Ross, J.R. Auxiliaries as main verbs. In Studies in Philosophical Linguistics, Series One; Todd, W., Ed.; Great Expectations Press: Evanston, IL, USA, 1969; pp. 77–102. [Google Scholar]
- Ross, J.R. A fake NP squish. In New Ways of Analyzing Variation in English; Bailey, C.J.N., Shuy, R.W., Eds.; Georgetown University Press: Washington, DC, USA, 1973; pp. 96–140. [Google Scholar]
- Ross, J.R. Nouniness. In Three Dimensions of Linguistic Theory; Kiparsky, P., Fujimura, O., Eds.; TEC Company Ltd.: Tokyo, Japan, 1973; pp. 137–328. [Google Scholar]
- Ross, J.R. Three batons for cognitive psychology. In Cognition and the Symbolic Processes; Weimer, W.B., Palermo, D.S., Eds.; Lawrence Erlbaum: Oxford, UK, 1974; pp. 63–124. [Google Scholar]
- Ross, J.R. The frozenness of pseudoclefts: Towards an inequality-based syntax. In Papers from the Thirty-Sixth Regional Meeting of the Chicago Linguistic Society; Okrent, A., Boyle, J., Eds.; Chicago Linguistic Society: Chicago, IL, USA, 2000; pp. 385–426. [Google Scholar]
- Lakoff, G. Linguistics and natural logic. Synthese 1970, 22, 151–271. [Google Scholar] [CrossRef]
- Rosch, E. On the internal structure of perceptual and semantic categories. In Cognitive Development and Acquisition of Language; Moore, T.E., Ed.; Academic Press: Amsterdam, The Netherlands, 1973; pp. 111–144. [Google Scholar]
- Rosch, E. Natural categories. Cogn. Psychol. 1973, 4, 328–350. [Google Scholar] [CrossRef]
- Rosch, E. Cognitive representations of semantic categories. J. Exp. Psychol. Gen. 1975, 104, 192. [Google Scholar] [CrossRef]
- Labov, W. The boundaries of words and their meanings. In New Ways of Analyzing Variations in English; Bailey, C., Shuy, R., Eds.; Georgetown University Press: Washington, DC, USA, 1973; pp. 340–373. [Google Scholar]
- Prince, A.; Smolensky, P. Optimality Theory: Constraint Interaction in Generative Grammar; Rutgers University: New Brunswick, NJ, USA, 1993. [Google Scholar]
- Legendre, G.; Miyata, Y.; Smolensky, P. Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: An application. In Proceedings of the 12th Annual Conference of the Cognitive Science Society, Cambridge, MA, USA, 25–28 July 1990; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1990; pp. 884–891. [Google Scholar]
- Müller, G. Optimality, markedness, and word order in German. Linguistics 1999, 37, 777–818. [Google Scholar] [CrossRef]
- Blache, P.; Prost, J.P. A quantification model of grammaticality. In Proceedings of the Fifth International Workshop on Constraints and Language Processing (CSLP2008); Villadsen, J., Christiansen, H., Eds.; Computer Science Research Reports; Roskilde University: Roskilde, Denmark, 2008; Volume 122, pp. 5–19. [Google Scholar]
- Blache, P.; Balfourier, J.M. Property Grammars: A Flexible Constraint-Based Approach to Parsing. In Proceedings of the 7th International Conference on Parsing Technologies, IWPT 2001, Beijing, China, 17–19 October 2001. [Google Scholar]
- Fillmore, C.J. The mechanisms of “construction grammar”. In Proceedings of the Fourteenth Annual Meeting of the Berkeley Linguistics Society; Axmaker, S., Jaisser, A., Singmaster, H., Eds.; Berkeley Linguistics Society: Berkeley, CA, USA, 1988; Volume 14, pp. 35–55. [Google Scholar]
- Goldberg, A. Constructions: A new theoretical approach to language. Trends Cogn. Sci. 2003, 7, 219–224. [Google Scholar] [CrossRef] [PubMed]
- Guénot, M.L.; Blache, P. A descriptive and formal perspective for grammar development. In Proceedings of the Foundations of Natural-Language Grammar, Edinburgh, UK, 16–20 August 2005; Available online: https://hal.science/hal-00134236/document (accessed on 29 November 2022).
- Blache, P.; Prévot, L. A formal scheme for multimodal grammars. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Association for Computational Linguistics, Beijing, China, 23–27 August 2010; pp. 63–71. [Google Scholar]
- Blache, P.; Bertrand, R.; Ferré, G. Creating and exploiting multimodal annotated corpora: The toma project. In Multimodal Corpora; Kipp, M., Martin, J.C., Paggio, P., Heylen, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; Volume LNAI 5509, pp. 38–53. [Google Scholar]
- Novák, V. Fuzzy Natural Logic: Towards Mathematical Logic of Human Reasoning. In Towards the Future of Fuzzy Logic; Seising, R., Trillas, E., Kacprzyk, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; pp. 137–165. [Google Scholar]
- Novák, V. On Fuzzy Type Theory. Fuzzy Sets Syst. 2005, 149, 235–273. [Google Scholar] [CrossRef]
- Montague, R. Universal grammar. Theoria 1970, 36, 373–398. [Google Scholar] [CrossRef]
- Novák, V. A Comprehensive Theory of Trichotomous Evaluative Linguistic Expressions. Fuzzy Sets Syst. 2008, 159, 2939–2969. [Google Scholar] [CrossRef]
- Novák, V. Mathematical Fuzzy Logic in Modeling of Natural Language Semantics. In Fuzzy Logic—A Spectrum of Theoretical & Practical Issues; Wang, P., Ruan, D., Kerre, E., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 145–182. [Google Scholar]
- Novák, V.; Lehmke, S. Logical Structure of Fuzzy IF-THEN rules. Fuzzy Sets Syst. 2006, 157, 2003–2029. [Google Scholar] [CrossRef]
- Murinová, P.; Novák, V. Syllogisms and 5-Square of Opposition with Intermediate Quantifiers in Fuzzy Natural Logic. Log. Universalis 2016, 10, 339–357. [Google Scholar] [CrossRef]
- Novák, V. A Formal Theory of Intermediate Quantifiers. Fuzzy Sets Syst. 2008, 159, 1229–1246. [Google Scholar] [CrossRef]
- Novák, V. Fuzzy Natural Logic: Theory and Applications. In Proceedings of the Fuzzy Sets and Their Applications FSTA 2016, Liptovsky Jan, Slovakia, 27 January 2016; Available online: https://irafm.osu.cz/f/Conferences/FSTA2016Sli.pdf (accessed on 29 November 2022).
- Torrens-Urrutia, A. An Approach to Measuring Complexity with a Fuzzy Grammar & Degrees of Grammaticality. In Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing, Santa Fe, NM, USA, 25 August 2018; pp. 59–67. [Google Scholar]
- Torrens-Urrutia, A. An approach to measuring complexity within the boundaries of a natural language fuzzy grammar. In Proceedings of the International Symposium on Distributed Computing and Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2018; pp. 222–230. [Google Scholar]
- Torrens-Urrutia, A. A Formal Characterization of Fuzzy Degrees of Grammaticality for Natural Language. Ph.D. Thesis, Universitat Rovira i Virgili, Tarragona, Spain, 2019. [Google Scholar]
- Fillmore, C.J.; Baker, C. A frames approach to semantic analysis. In The Oxford Handbook of Linguistic Analysis; Oxford University Press: Oxford, UK, 2010. [Google Scholar]
- Goldberg, A. Verbs, constructions and semantic frames. In Syntax, Lexical Semantics, and Event Structure; Oxford University Press: Oxford, UK, 2010; pp. 39–58. [Google Scholar]
- Novák, V.; Perfilieva, I.; Dvorak, A. Insight into Fuzzy Modeling; John Wiley & Sons: Hoboken, NJ, USA, 2016. [Google Scholar]
- Novák, V. Evaluative linguistic expressions vs. fuzzy categories. Fuzzy Sets Syst. 2015, 281, 73–87. [Google Scholar] [CrossRef]
- Blache, P.; Rauzy, S.; Montcheuil, G. MarsaGram: An excursion in the forests of parsing trees. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 23–28 May 2016; pp. 2336–2342. [Google Scholar]
- Goldberg, A. Constructions: A Construction Grammar Approach to Argument Structure; University of Chicago Press: Chicago, IL, USA, 1995. [Google Scholar]
- Sakthivel, R.; Kavikumar, R.; Mohammadzadeh, A.; Kwon, O.M.; Kaviarasan, B. Fault estimation for mode-dependent IT2 fuzzy systems with quantized output signals. IEEE Trans. Fuzzy Syst. 2020, 29, 298–309. [Google Scholar] [CrossRef]
Constraint Characterization in FPGr

Part of Speech | Constructions and Construction Features | Constraint Behavior
---|---|---
Determiner | Subject construction |
Adjective | Verbal construction |
Noun | Direct Object construction |
Pronoun | Modifier construction |
Verb | Specifier construction |
Adverb | Coordinate construction |
Coordinate conjunction | Subordinate construction |
Subordinate conjunction | |
Preposition | |
Universal Dependencies Treebank in Spanish

Argument Constructions

Constructions | Subject Construction | Verbal Construction | Direct Object Construction | Indirect Object Construction
---|---|---|---|---
Dependencies in Universal Dependencies | nsubj, nsubjpass, csubj | root, cop, aux, aux:pass | dobj, ccomp, xcomp | iobj
Dependencies in FPGr | subj | dep, comp, aux | obj | iobj

Adjunct Constructions

Constructions | Modifier Construction | Specifier Construction | Conjunctive Construction | Subordinative Conjunctive Construction | Others
---|---|---|---|---|---
Dependencies in Universal Dependencies | nmod, appos, name, nummod, amod, advmod, neg, acl, advcl | det, case | cc, conj | mark, acl:relcl | compound, mwe, parataxis, punct, dep
Dependencies in FPGr | mod | spec | conj, dep | conj, dep | Not considered
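The mapping above can be written as a simple lookup. The sketch below is an assumption on our part: the pairing inside the verbal construction (root to dep, cop to comp, aux/aux:pass to aux) follows the order of the labels in the table, and the two conjunctive constructions are simplified to the single label conj where the table lists "conj, dep".

```python
# Assumed mapping from Universal Dependencies relations to FPGr dependencies,
# read off the table above. The within-construction pairing is positional and
# therefore our assumption, not the authors' stated rule.
UD_TO_FPGR = {
    # Argument constructions
    "nsubj": "subj", "nsubjpass": "subj", "csubj": "subj",
    "root": "dep", "cop": "comp", "aux": "aux", "aux:pass": "aux",
    "dobj": "obj", "ccomp": "obj", "xcomp": "obj",
    "iobj": "iobj",
    # Adjunct constructions
    "nmod": "mod", "appos": "mod", "name": "mod", "nummod": "mod",
    "amod": "mod", "advmod": "mod", "neg": "mod", "acl": "mod", "advcl": "mod",
    "det": "spec", "case": "spec",
    "cc": "conj", "conj": "conj",
    "mark": "conj", "acl:relcl": "conj",
}

def to_fpgr(ud_relation: str) -> str:
    # compound, mwe, parataxis, punct and dep are "Not considered" in FPGr.
    return UD_TO_FPGR.get(ud_relation, "not considered")

print(to_fpgr("nsubjpass"))   # subj
print(to_fpgr("parataxis"))   # not considered
```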
Parts of Speech | NB RULES / FILTERED RULES | PROPERTIES | OCCURRENCES | CORPUS FREQUENCY
---|---|---|---|---
NOUN | 1769 (+1070) | 117 | 77,925 | 18.08%
ADP –Adposition/Preposition– | 26 (+40) | 86 | 70,738 | 16.42%
DET –Determiner– | 9 (+27) | 45 | 60,465 | 14.04%
PUNCT –Punctuation– | 1 (+6) | 0 | 47,448 | 11.01%
VERB | 2437 (+6387) | 157 | 40,950 | 9.51%
PROPN –Proper Noun– | 670 (+1467) | 140 | 40,506 | 9.40%
ADJ –Adjective– | 358 (+1070) | 166 | 23,891 | 5.55%
CONJ –Coordinating conjunction– | 16 (+24) | 52 | 13,787 | 3.20%
PRON –Pronoun– | 146 (+351) | 118 | 13,552 | 3.15%
ADV –Adverb– | 72 (+124) | 117 | 12,510 | 2.90%
NUM –Numeral– | 116 (+211) | 116 | 11,834 | 2.75%
SCONJ –Subordinating conjunction– | 16 (+6) | 67 | 8,059 | 1.87%
AUX –Auxiliary– | 15 (+32) | 42 | 6,033 | 1.40%
X –Non-classified– | 94 (+263) | 114 | 1,952 | 0.45%
SYM –Symbol– | 44 (+73) | 113 | 1,077 | 0.25%
PART –Particle– | 1 (+4) | 2 | 37 | 0.01%
Categories and Dependencies

subj | | dobj | | iobj | | mod | | root |
---|---|---|---|---|---|---|---|---|---
CAT | FREQ % | CAT | FREQ % | CAT | FREQ % | CAT | FREQ % | CAT | FREQ %
NOUN | 2.32% | NOUN | 2.49% | PRON | 1.54% | ADP+NOUN | 14.70% | VERB | 2.75%
PROPN | 0.92% | PRON | 0.45% | NOUN | 0.13% | ADJ | 4.64% | NOUN | 0.61%
PRON | 0.35% | PROPN | 0.18% | PROPN | 0.05% | NUM | 2.48% | ADJ | 0.24%
NUM | 0.02% | NUM | 0.01% | | | ADV | 2.17% | PRON | 0.06%
| | | | | | PROPN | 0.62% | PROPN | 0.04%
| | | | | | NOUN | 0.03% | NUM | 0.01%
Subject Construction | ||
---|---|---|
Category | Frequency in Corpus | Frequency as Subject |
NOUN | 2.32% | 64.62% |
PROPN | 0.92% | 25.62% |
PRON | 0.35% | 9.47% |
[Table: Variability Properties]
Parts of Speech (dep)

 | | | | | |
---|---|---|---|---|---
Satisfied constraints out of total constraints | 4/4 | 5/5 | 2/3 | 3/3 | 4/4 | 1/3
Weights of (partially) satisfied constraints | 0.25, 0.25, 0.25, 0.25 | 0.2, 0.2, 0.2, 0.2, 0.2 | 0, 0.33, 0.33, 0.165 | 0.25, 0.25, 0.25, 0.25 | | 0.33, 0, 0
Degree of grammaticality per element | 1 | 1 | 0.825 | 1 | 1 | 0.33
Degree of grammaticality of the input | 0.8591 | | | | |
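The figures in the table above can be reproduced with a short sketch. Two assumptions on our part, suggested by the numbers rather than stated in the table: an element's degree is the sum of the weights of its (partially) satisfied constraints, and the degree of the whole input is the arithmetic mean over its elements.

```python
def input_degree(element_degrees):
    # Degree of the whole input = arithmetic mean of the elements' degrees
    # (an assumption consistent with the reported 0.8591).
    return sum(element_degrees) / len(element_degrees)

# Per-element degrees from the table; partially satisfied elements sum the
# weights of their satisfied constraints:
per_element = [
    1.0,                   # 4/4 constraints satisfied
    1.0,                   # 5/5
    0.33 + 0.33 + 0.165,   # 2/3 column -> 0.825
    1.0,                   # 3/3
    1.0,                   # 4/4
    0.33,                  # 1/3 column
]
print(round(input_degree(per_element), 4))  # 0.8592; the table truncates to 0.8591
```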
Case: Triggering Variability Properties

Sentence | Funcionarios (Public workers) | del estado (of the state) | sufrieron las pérdidas (suffered the losses)
---|---|---|---
Grammaticality | 0.25; 0.125; 0.625 | |
Case: "El chico corre" (first variant)

Sentence | El (The) | chico (boy) | corre (runs)
---|---|---|---
Grammaticality | 0.25; 1 | 0.142; 0.714 | 0.25; 1
Case: "El chico corre" (second variant)

Sentence | El (The) | chico (boy) | corre (runs)
---|---|---|---
Grammaticality | 0.25; 1 | 0.2; 1 | 0.25; 1
Case: "El hombre robot corre"

Sentence | El (The) | hombre (man) | robot (robot) | corre (runs)
---|---|---|---|---
Grammaticality | 0.25; 1 | 0.2; 1 | 0.166; 0.083; 0.581 | 0.25; 1
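Across the worked cases, the reduced figures (0.125 = 0.25/2 for "Funcionarios", 0.083 = 0.166/2 for "robot") are consistent with one reading: a constraint whose violation triggers a variability property contributes half its weight instead of being discarded. The sketch below encodes that reading; it is our reconstruction from the numbers, not a rule stated in the tables.

```python
def element_degree(weight, n_satisfied, n_variability=0):
    # Sketch (our assumption): fully satisfied constraints contribute their
    # full weight; a violated constraint licensed by a variability property
    # contributes half its weight.
    return weight * n_satisfied + (weight / 2) * n_variability

# "robot" in "El hombre robot corre": weight 0.166, three satisfied
# constraints plus one half-weight contribution (0.083) -> 0.581
print(round(element_degree(0.166, 3, 1), 3))  # 0.581

# "Funcionarios": weight 0.25, two satisfied plus one half-weight
# contribution (0.125) -> 0.625
print(element_degree(0.25, 2, 1))  # 0.625
```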
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Torrens-Urrutia, A.; Novák, V.; Jiménez-López, M.D. Fuzzy Property Grammars for Gradience in Natural Language. Mathematics 2023, 11, 735. https://doi.org/10.3390/math11030735