Achieving Semantic Consistency for Multilingual Sentence Representation Using an Explainable Machine Natural Language Parser (MParser)

Abstract: In multilingual semantic representation, the interaction between humans and computers faces the challenge of understanding meaning, which causes ambiguity and inconsistency in heterogeneous information. This paper proposes a Machine Natural Language Parser (MParser) to address the semantic interoperability problem between users and computers. By leveraging a semantic input method for sharing common atomic concepts, MParser represents any simple English sentence as a bag of unique and universal concepts via the case grammar of an explainable machine natural language. In addition, it provides a human- and computer-readable and -understandable interaction concept to resolve semantic shift problems and guarantees consistent information understanding among heterogeneous sentence-level contexts. To evaluate annotator agreement on MParser outputs, which generate a list of English sentences under a common multilingual word sense, three expert participants manually and semantically annotated 75 English sentences (505 words in total). In addition, 154 non-expert participants evaluated the sentences' semantic expressiveness. The evaluation results demonstrate that the proposed MParser shows high compatibility with human intuitions.


Introduction
Multilingual semantic representation [1] presents words, phrases, texts, or documents across heterogeneous parties (e.g., English and Chinese) to achieve semantic consistency. It has been applied in several areas, such as machine translation [2], question answering [3], and document representation [4,5]. The process of parsing a natural language sentence into its semantic representation is called semantic parsing [6], which parses sentences without representing the syntactic classification of their components. Semantic parsing is an essential process and has attracted great attention in multilingual semantic representation and NLP research over the last few decades [6]. Typically, a semantic parser labels each word in the original sentence according to its semantic role or represents each compound component based on its meaning [7]. Several semantic approaches have been proposed for parsing natural language sentences into semantic representations, such as the Groningen Meaning Bank (GMB) [8] and abstract meaning representation (AMR) [9]. Still, their annotation schemes are designed for individual languages with language-dependent features. Because many applications require multilingual capabilities, several efforts are underway to create more cross-lingual natural language resources, such as universal conceptual cognitive annotation (UCCA) [10], universal networking language (UNL) [11], and universal dependencies (UD) [12], which are frameworks for cross-linguistically consistent grammatical annotation. Despite these efforts, some remaining interlanguage variations important for practical usage are not yet captured.
These variations create obstacles to a truly cross-lingual meaning representation that would enable downstream applications written in one language to be applicable to other languages. Using cross-lingual semantic parsing of one language to improve the representation of another remains a largely under-explored research question. This paper focuses on the problem of multilingual semantic interoperability in semantic representation.
In semantic analysis and labeling, texts and documents are generally very complex because of flexible structural and complex morphological grammars. State-of-the-art semantic parser methods and applications have not achieved satisfying results. One technical challenge is the lack of consistent conversions across domains. Heterogeneous text may carry heterogeneous meaning and cause semantic loss or misunderstanding between a computer and a user [13]. For example, Figure 1 shows an English inquiry sheet illustrating the multilingual semantic interoperability problems. The table consists of 10 cells; cells 1-9 each contain a single atomic concept, i.e., "one cell, one atomic concept" (e.g., Date in cell 1). However, one atomic concept may have multiple meanings. For instance, the word "company" in cell 10 has several meanings, such as "a commercial business" and "the fact or condition of being with another or others, especially in a way that provides friendship and enjoyment". To achieve accurate atomic concept exchange and guarantee semantic consistency in cells 1-9, several document representation approaches [5,14] have been proposed to solve the heterogeneous concept or meaning exchange problem. An effective solution is a collaboration mechanism that connects heterogeneous domains or contexts, allowing the exchange of heterogeneous semantic documents by a semantic input method (SIM) approach [15]. However, some sentences also contain sequences of atomic concepts in a free-text cell (e.g., cell 10), which makes it hard to ensure that the meaning (M1) of an English sentence Ei := List(w1, w2, …, wn) and the meaning (M2) of a translated Chinese sentence Cj := List(w1, w2, …, wn) are semantically equivalent.
The reasons why M1 ≠s M2 ("≠s" denotes "not semantically equal") include:
(1) Heterogeneous grammatical rules: The language grammars of the components in Ei and Cj have their own rules for generating a sentence, and a one-to-one mapping is impossible.
(2) Synonyms and homonyms: Each term in Ei may have several synonyms or homonyms. A term with the wrong meaning may cause semantic ambiguity.
(3) Peculiar language phenomena: Some phenomena in Ei never appear in Cj, resulting in asymmetric mapping. For example, the particles "を, に, で, へ, より" in Japanese have no counterparts in Chinese.
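The semantic-shift problem can be made concrete with a small sketch: comparing two translations word by word fails, while comparing language-neutral concept identifiers of the kind MParser relies on succeeds. The identifier values and the lookup table below are illustrative assumptions, not entries from CoDic.

```python
# Illustrative sketch: literal word lists differ across languages,
# but shared concept identifiers can still match.
# All identifier values below are made up for illustration.

english = ["I", "enjoy", "travel", "in", "summer"]
chinese = ["我", "喜欢", "在", "夏天", "旅行"]

# Hypothetical mapping of each surface word to a language-neutral concept id.
concept_id = {
    "I": "c01", "我": "c01",
    "enjoy": "c02", "喜欢": "c02",
    "travel": "c03", "旅行": "c03",
    "in": "c04", "在": "c04",
    "summer": "c05", "夏天": "c05",
}

# Literal comparison fails: surface forms and word order differ.
literally_equal = english == chinese            # False

# Concept-level comparison succeeds: both sentences share one bag of concepts.
bag_en = {concept_id[w] for w in english}
bag_zh = {concept_id[w] for w in chinese}
semantically_equal = bag_en == bag_zh           # True
```

This is exactly the intuition behind representing a sentence as a bag of unique, universal concepts: equality is checked at the concept level, not the surface-string level.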
Therefore, the same sentence will produce completely different scenarios in a heterogeneous text, and the original meaning in mind may shift to another meaning. The above problems are called semantic shift problems, which change a sentence's original meaning in multilingual semantic representation. Moreover, in natural language texts, users cannot express their information needs in a computer-understandable way or interpret the representation correctly due to problems in representing complex semantics. Therefore, the development of a novel model has been motivated by the following aspects:
(1) Computer-human-understandable representation: providing information understandable by both computers and humans, realizing the accurate interpretation of sentences in the human-computer messaging cycle without ambiguity.
(2) Accurate semantic representation among computing applications: applying computer-human-understandable information in computing applications and enabling information to be semantically interoperable.
(3) Automated multilingual information processing by software agents: allowing multilingual information to be automatically processed across domains and contexts.
Thus, this research proposes a new multilingual semantic representation parser for sentence-based text or documents that enhances textual representation and reduces multilingual ambiguity. Based on our previous conceptual work [16], we propose a novel Machine Natural Language Parser (MParser) to realize unambiguous universal representations between computers and users. The explainable MParser parses a simple English sentence, resolving complex concepts into a bag-of-universal-concepts sentence that is readable and understandable for any heterogeneous information, and mediates contextual human natural languages collaboratively, as shown in Figure 2. The universal-concepts sentence shares a common concept at both the syntactic and semantic levels between users and computers.
To achieve consistency and universal representation, MParser is designed around human input and sentence generation:
(1) In human input, each unique concept is collaboratively edited with SIM [15] based on a common dictionary (CoDic) [17] for eliminating atomic concept ambiguity and morphological features.
Thus, a simple English sentence can be converted to a sequence of unique concepts across conversational contexts.
(2) To maintain complex semantic concept consistency between computers and users, an MParser for English sentences parses the semantic roles between English words and represents them to derive a unique concept that can be accurately represented and understood by computers through case grammar [16]. The cases are used to label words, which are aligned from local language perspectives. The proposed parser utilizes powerful linguistic tools such as the Stanford Parser and universal dependency relations.
(3) The proposed MParser is evaluated through annotator agreement between experts' case labeling and MParser's outputs. Additionally, 154 non-expert participants provided judgments of semantic expressiveness.
The rest of this paper is organized as follows. Section 2 compares the proposed approach with related work. Section 3 introduces the general process and methodology of MParser. Section 4 introduces the activity of human semantic input. Section 5 introduces the activity of sentence computerization. Sections 6 and 7 implement and evaluate MParser. Finally, a conclusion is given.

Related Work
Semantic representation presents the meaning of sentences, and the process should be reliable and computational [18]. Approaches to semantic representation can be divided into two categories: document representation [1,19] and meaning representation. Document Representation: Currently, document information exchange mainly follows three approaches: (1) Standardization approaches define a semantic document by combining a set of standardized document compositions: for example, EDI-based (http://www.edibasics.com/ediresources/document-standards/), XML-based (ebXML, available: http://www.ebxml.org), and Web service-based (http://www.edibasics.com/ediresources/document-standards/) approaches. The problem with this approach is that documents are only interoperable on representation syntax and templates, and these standards are heterogeneous and incompatible with each other. (2) Ontology modeling approaches [20,21] define a semantic document in a certain domain (e.g., RDF [22], RDFS [23], and OWL (http://www.edibasics.com/ediresources/document-standards/)). They are usually used to solve the problem of semantic interoperability and realize collaboration. Generally, an ontology clearly describes the relationships of entities [18] and can be employed for knowledge representation. However, if computers in different contexts participate in user-computer interaction, it will not be easy to achieve a consistent understanding, because an ontology is domain-dependent, preventing it from being understood across heterogeneous document descriptions. (3) Collaborative approaches [17,24] allow participants from different contexts to construct document terms and solve the cross-domain problem, but the document is constrained by a template and lacks flexibility. One issue is that the user still needs a user template to construct the document.
Current subjects of research on document representations are rule format [25,26], ontology [20,24], XML+Ontology [21], tree/graph [27], and collaborative approach [15,17,28]. First, it is not easy to embed and extract meanings to/from a document automatically. For example, it is not easy for a document written in natural languages to be automatically converted to a machine-processable format (e.g., RuleML [25,26]). Second, constructing semantic documents needs intensive work. For example, [5] proposes a semantic disambiguation solution by using a machine-readable semantic network (e.g., WordNet) as a common knowledge base. However, it is time-consuming and sometimes unnecessary because it also disambiguates unambiguous terms. To acquire accurate semantic concept representation for a document, [20] requires learning a concept border from a particular document collection based on a particular ontology in the same domain. However, there is a heavy workload and enormous data redundancy to construct and store concept borders for different domains. Third, it is not easy to maintain semantic consistency between heterogeneous document systems. For example, [24] claims accurate mapping between different ontologies' entities, and [20] requires the similarity computation between keywords in a received document and equivalent terms in a domain-wide ontology. Both approaches hardly reach a trade-off between low computational demand and semantic interoperability.
In short, these approaches rely on the homogeneity of concepts in multilingual text or domain semantics, and sentence-based documents or complex concepts may cause semantic loss among different contexts under the above state-of-the-art approaches.
Semantic Representation: It defines annotations to construct syntactic structure, such as FrameNet [29] and Semlink [30], but focuses on arguments rather than other relations. In this context, several semantic representation approaches are available. For instance, universal networking language (UNL) proposes a language-independent representation so that sentences inputted in any language can be translated into any other natural language. Abstract meaning representation (AMR) [9] proposes a relatively more straightforward sentence-level semantic parser to cover broad semantic role predictions. AMR manually annotates sentences and utilizes PropBank frames [31] to represent the semantic relationships between words. However, AMR faces difficulties across translation because syntactical similarity is not suitable cross-linguistically [32]. Therefore, new multilayered solutions such as universal conceptual cognitive annotation (UCCA) [10] and universal decompositional semantics (UDS) [33] are applied to cross-lingual text for semantic annotation and word senses via BabelNet [34] and the Open Multilingual Wordnet (http://compling.hss.ntu.edu.sg/omw/). They constructed substantial multilingual semantic nets to achieve universality by connecting resources such as WordNet and Wikipedia. The method adapts linguistic theory to build a manual and multilingual scheme. However, UCCA annotates short sentences (e.g., multiword expressions), where the same multiword or entity is annotated in many different sentences. The Groningen Meaning Bank (GMB) is a new solution that integrates language phenomena into a single formalism instead of covering single phenomena in an isolated way. Additionally, universal dependencies (UD) [12] build cross-lingual dependency-based annotations for multilingual sentences.
Most semantic representation methods use simple concepts, such as UCCA, but some others adopt existing concept inventories, such as WordNet synsets for UNL and PropBank frames for AMR. Furthermore, UNL has its own relationship set, while AMR uses PropBank relationships. UNL, UCCA, and AMR are fully manually annotated, while GMB produces meaning representations automatically, which can then be corrected by experts. However, such approaches (e.g., AMR, UCCA, GMB, and UNL) focus on lexical semantics or multilingual words rather than on sentence semantics and cannot guarantee a sentence-based semantic representation that is universal and unambiguous across languages. Most of the proposed semantic representation methods do not consider the morphological and syntactic characteristics of the language in the construction of sentence-level semantic labeling. Contributions made in the semantic representation of any language text will utilize translated English resources, which may negatively affect the performance of other semantic representation methods. In our research work, MParser proposes a universal semantic representation to extract semantic relationships from local language text using local language tools and resources, such as the Stanford Parser. In addition, the proposed parser takes into account the syntactic and morphological features of a given sentence. It is worth noting that the proposed MParser model uses various tools, resources, and text features to reduce the negative impact of resource quality on semantic representation. Moreover, MParser achieves a universal representation and semantic consistency across languages.

Overview
MParser comprises two processes: (1) human semantic input (HSI) and (2) sentence computerization (SC), as shown in Figure 3. First, human semantic input is the process of converting a human natural language sentence (HNLi) (here, i indicates English), typed by an editor from CoDic, into a machine-readable sentence SiShi, which comprises sequentially converting sets of literals into a list of symbolic signs. The editor (i.e., human user) inputs the HNLi via SIM from CoDic to constrain sentence creation based on strict rules. Second, sentence computerization (fc) is the process of converting a sentence SiShi into a sentence SiSm that is universally readable and understandable by a computer in MParser, denoted by fc := SiSm ← SiShi. In particular, this involves a sequence of activities: sentence analysis (i.e., parsing a local language sentence based on the local grammatical rules through the robust Stanford Parser and universal dependencies), case generation (i.e., appending a case to each sign to represent its grammatical functions and properties), and machine representation (i.e., representing a sentence so that it is computer-readable and -understandable). Thus, the sentence SiSm ⊂ MParser, readable and understandable only by computers, can be converted back to a human-readable and -understandable SiShj (here, j indicates other languages), such that fr := SiShj ← SiSm rebuilds human-readable sentences based on SiSm.
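The two processes can be sketched as a minimal pipeline. The class and function names below are illustrative stand-ins for HSI and SC, and the CoDic fragment is a made-up toy, not MParser's actual implementation.

```python
from dataclasses import dataclass

# Minimal sketch of the MParser pipeline (all names are illustrative).

@dataclass
class Sign:
    term: str       # surface form in the local language
    iid: str        # language-neutral identifier from CoDic
    case: str = ""  # grammatical case label, filled in by sentence computerization

def human_semantic_input(words, codic):
    """HSI: convert a human natural-language sentence (HNLi) into a
    machine-readable sign sequence by looking each term up in CoDic."""
    return [Sign(term=w, iid=codic[w]) for w in words]

def sentence_computerization(signs):
    """SC (fc): append a case to each sign so the sentence becomes the
    universal representation SiSm. A real parser would derive cases from
    dependency relations; here we just attach a placeholder label."""
    return [Sign(s.term, s.iid, case="CASE") for s in signs]

# Toy CoDic fragment (identifiers are made up).
codic = {"I": "c01", "enjoy": "c02", "travel": "c03"}
sism = sentence_computerization(human_semantic_input(["I", "enjoy", "travel"], codic))
```

The reverse function fr would then regenerate a human-readable sentence in another language from the same sign sequence, which is what makes the intermediate SiSm language-neutral.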

Methodology

The theoretical foundation of MParser comes from the sign description framework (SDF) [26], as shown in Figure 4. It is a language for representing signs in computing systems and is particularly intended to represent the interpreted meanings or ideas of all objects in reality, such as those appearing in dictionaries, texts, software, and web pages. A sign := (sign, denoter, reifier, denotation, connotation) is modeled by a bi-tree, consisting of three relationships between signs: a denotation, a connotation, and a reification.

A denotation is an internal relationship between a sign and its denoter, such that the denoter denotes the properties of a sign. We can understand a denoter as a feature container, containing all features of a sign. For a natural language, these features consist of the form (e.g., iid, term, and pronunciation), sense (i.e., meaning), part of speech (e.g., noun), tense (e.g., past), aspect (e.g., perfective), gender (e.g., male), number (e.g., single), and context (e.g., English). In essence, denotation provides a way to define a sign in the context of a sentence by a set of properties provided by a denoter.
A connotation is an external relationship between signs, such that a sign is connoted by a set of signs, which builds a parse tree of a set of signs. For instance, when a set of signs constructs a sentence as a sign in language, it can be parsed through connotation in grammatical cases. For example, we replace the sign of a sentence, and connotation can then parse the sentence sign into many atomic signs.
A reification is an instantiation relationship between a reifier (often a particular sign) and a specific denoter (often an abstract sign). For instance, given a denoter denoting the sign "color", "white" is a reifier, and between "white" and "color" there is a reification relationship. Similarly, if the sign is the INT datatype, then 1234 is a reifier.
By generalizing these represented concepts of objects into structured signs, SDF represents all objects in reality, such as objects of abstract and concrete, physical and virtual, and real and fictitious.
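A minimal data-structure sketch of the SDF sign model described above may help fix the three relationships. The attribute names follow the paper's terminology (denoter, connotation, reification), but the class layout itself is an assumption for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Denoter:
    """Feature container holding the features of a sign (denotation)."""
    form: str = ""     # e.g., iid, term, pronunciation
    sense: str = ""    # meaning
    pos: str = ""      # part of speech, e.g., "noun"
    tense: str = ""    # e.g., "past"
    number: str = ""   # e.g., "single"
    context: str = ""  # e.g., "English"

@dataclass
class Sign:
    name: str
    denoter: Denoter = field(default_factory=Denoter)  # denotation
    connoted_by: list = field(default_factory=list)    # connotation: child signs
    reifiers: list = field(default_factory=list)       # reification: instances

# Reification example from the text: "white" reifies the abstract sign "color".
color = Sign("color", Denoter(sense="a visual property", context="English"))
white = Sign("white")
color.reifiers.append(white)

# Connotation example: a sentence sign is connoted by its atomic word signs,
# forming the bi-tree the text describes.
sentence = Sign("sentence")
sentence.connoted_by = [Sign("I"), Sign("enjoy"), Sign("travel")]
```

Here the denotation edge points down into a sign's own feature container, while connotation and reification are edges between signs, which together give the bi-tree structure.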
CoDic (http://www.cis.umac.mo/~jzguo/pages/codic/, accessed on 30 August 2021) [17] is a common dictionary and an application of the SDF, consisting of 93,546 English words, 20,446 Chinese words, and 190,001 word senses. In CoDic, a concept is a basic element of a sentence and consists of words and phrases. Each concept has already been collaboratively edited without semantic ambiguity. Any dictionary term in CoDic (called a sign) is identified by a unique and internal identifier iid ∈ IID, which is neutral and independent of any natural language and can refer to any term of a natural language. Consider a simple sign s = (t, iid) = (icebox, 5107df00b635) = (common noun, "An insulated chest or box into which ice is placed, used for cooling and preserving food."), as shown in Figure 5. Specifically, the form of the sign is presented as follows:
• IID := POS+Y+ID indicates the universal sign representational form. For instance, in iid = 5107df00b635, the 1 after 5 refers to a common noun, 7df refers to the year 2015, and 00b635 is the ID.
• Term indicates the literal representational form of a sign, e.g., "icebox" is the literal representation of the sign 5107df00b635 in the English context.

• Meaning is the sense of a sign, e.g., "An insulated chest or box into which ice is placed, used for cooling and preserving food" is the sense of 5107df00b635.

PoS plays a very important role and includes 16 kinds of signs. Thus, the meaning of iid 5107df00b635 = "icebox" = "アイスボックス" = "电冰箱", even though these terms appear in heterogeneous contexts.
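The IID := POS+Y+ID layout can be checked with a small decoder. The field boundaries and the PoS code table below are inferred from the worked examples in this paper (second hex digit 1 for ncm, 7 for vtr, a for prep; "7df" in hex is the year 2015), so they are an assumption rather than CoDic's official specification.

```python
# Hedged sketch: decode an iid of the form POS+Y+ID, with field boundaries
# inferred from the example 5107df00b635 (1 = ncm, 07df hex = 2015, 00b635 = ID).

# PoS codes observed in this paper's examples; the full 16-entry table is
# not reproduced here, so unknown codes fall back to "unknown".
POS_CODES = {"1": "ncm", "7": "vtr", "a": "prep"}

def decode_iid(iid: str):
    pos = POS_CODES.get(iid[1], "unknown")  # second hex digit encodes PoS
    year = int(iid[2:6], 16)                # e.g., "07df" -> 2015
    ident = iid[6:]                         # running identifier
    return pos, year, ident

parts = decode_iid("5107df00b635")
# -> ("ncm", 2015, "00b635"), matching the "icebox" sign in the text
```

The same decoder applied to the other identifiers quoted later (e.g., 5707df00184b for "enjoy") yields the vtr code and the same 2015 year field, which is some evidence the inferred layout is consistent.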

Human Semantic Input (HSI)
In human semantic input, the user's initial intention is essential when translating the transmitted concepts into unique semantic representations. If semantics are insufficient for a clear and accurate representation, the same literal words in users' minds may differ across contexts between computers and users, and the information interaction may fail because of ad hoc user input. Therefore, HSI addresses ad hoc input through supervised sentence input, so that users cannot casually input the words and phrases in their minds.
In MParser, all written sentences are constrained by HSI, which supervises human-readable sentence input via CoDic. We developed an editor for inputting any term by selecting its PoS and exact meaning, which has a unique identifier (iid) pointing to the same meaning regardless of context. We use a simple English sentence, "I enjoy travel in summer.", to illustrate HSI. First, a user types words one by one by selecting terms, as shown in Figure 6: the terms "I" (ncm, 0x5107df00b5e2), "enjoy" (vtr, 0x5707df00184b), "travel" (ncm, 0x5107df01848b), "in" (prep, 0x5a07df000103), and "summer" (ncm, 0x5107df016d86).
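The HSI step for the running example amounts to a supervised lookup from surface terms to (PoS, iid) pairs. The dictionary below simply restates the identifiers quoted in the text; the function name and return shape are illustrative.

```python
# Sketch of HSI for "I enjoy travel in summer.": each term is entered by
# selecting its PoS and sense, yielding the iids quoted in the text.

codic_fragment = {
    "I":      ("ncm",  "0x5107df00b5e2"),
    "enjoy":  ("vtr",  "0x5707df00184b"),
    "travel": ("ncm",  "0x5107df01848b"),
    "in":     ("prep", "0x5a07df000103"),
    "summer": ("ncm",  "0x5107df016d86"),
}

def hsi(sentence: str):
    """Convert a sentence into a sequence of (term, pos, iid) signs."""
    words = sentence.rstrip(".").split()
    return [(w, *codic_fragment[w]) for w in words]

signs = hsi("I enjoy travel in summer.")
# first sign -> ("I", "ncm", "0x5107df00b5e2")
```

In the real editor the lookup is interactive (the user picks the PoS and sense when a term is ambiguous), whereas this sketch assumes each term has exactly one entry.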
In human semantic input, the user's initial intention is essential when they try to translate the transmitted concepts into unique semantic representations. If semantics are insufficient for a clear and accurate representation, in that case, the same literal words in users' minds may be different from different contexts between computers and users; it is possible to fail the information interaction because of ad hoc user input. Therefore, HSI tries to solve ad hoc input through a supervised sentence input that cannot casually input the words and phrases in users' minds.
In MParser, all written sentences are constrained by HSI, which is a supervised human-readable sentence via CoDic. We developed an editor to input any term by selecting PoS and the exact meaning, which has a unique identifier (iid), to point to the same meaning regardless of contexts. We use a simple English sentence "I enjoy travel in summer." to illustrate HSI. First of all, a user types words one by one by selecting terms as shown in Figure 6: the terms "I" (ncm,0 × 5107df00b5e2), "enjoy" (vtr, 0x5707df00184b), "travel" (ncm, 0x5107df01848b), "in" (prep, 0x5a07df000103), and "summer" (ncm, 0x5107df016d86). Figure 6. HSI for English sentence "I enjoy travel in summer." (a) term "I"; (b) term "enjoy"; (c) term "travel"; (d) term "in"; (e) term "summer".
CoDic resources are all at the level of lemmas, and a "term" can be seen as a word sense in CoDic, which cannot express the different morphological forms of a word. For instance, in English, the lemma "enjoy" yields the inflected forms enjoys, enjoyed, and enjoying. Thus, a morphological feature (mf) is designed for each lemma of CoDic and lists the forms needed in each language. The morphological feature (mf) comprises gender (G) and number (N) for nouns, and tense (T), aspect (A), and voice (V) for verbs. The morphological feature (mf) can differ from language to language (for details, please see Appendix B). Morphological features (mf) are parsed according to local grammar rules because different languages exhibit different, language-dependent morphological phenomena. In practice, populating the morphological features is an engineering effort of its own. In HSI, users manually select the correct feature for each term in CoDic. Thus, when a user inputs nouns or verbs, he/she makes a second selection that includes the morphological feature (mf), as shown in Figure 7.
In Figure 7, the terms "I", "travel", and "summer" select singular and neuter (English actually has no gender attribute, so neuter is the default), and the hex is 0 for each noun. The term "enjoy" selects active present imperfective habitual, and the hex is 03 for the verb. The morphological feature identification algorithm is presented in Table 1:

Function (Input words)
    String ← Input word
    if (String.pos = "ncm" or "npp" or "ntp") then
        Gender(G) := n | m | f | b    /* Select noun's gender */
        Number(N) := s | p | u        /* Select noun's number */
        return ← noun morphological feature (mf)
    if (String.pos = "vtr" or "vid" or "vit") then
        Tense(T) := present | past | future | past future    /* Select verb's tense */
        Aspect(A) := f | g | w | h | p    /* Select verb's aspect */
        Voice(V) := active | passive      /* Select verb's voice */
        return ← verb morphological feature (mf)

Table 2 shows the tenses of a sentence in English and in HSI through a basic example ("she go home"). An interesting observation from Table 2 is that helping verbs (bold font) are removed during human sentence input for all verb tenses; MParser uses only the root form of the verb. Helping verbs such as "is, am, be, being, has, had" are instead encoded by the hex of the morphological feature (mf). Thus, the human input sentence is universal across languages.
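The Table 1 selection logic can be sketched as a small function; the dictionary encoding of the mf is our illustrative choice, while the PoS tags and value codes follow the paper's abbreviations:

```python
# Hedged sketch of the Table 1 morphological feature identification:
# nouns select gender and number; verbs select tense, aspect, and voice.
NOUN_POS = {"ncm", "npp", "ntp"}
VERB_POS = {"vtr", "vid", "vit"}

def identify_mf(pos, **features):
    """Return the morphological feature (mf) selected by the user."""
    if pos in NOUN_POS:
        return {
            "G": features["gender"],  # n | m | f | b
            "N": features["number"],  # s | p | u
        }
    if pos in VERB_POS:
        return {
            "T": features["tense"],   # present | past | future | past future
            "A": features["aspect"],  # f | g | w | h | p
            "V": features["voice"],   # active | passive
        }
    return None  # other PoS carry no mf in this sketch

# "summer": singular, neuter; "enjoy": active present imperfective habitual
mf_noun = identify_mf("ncm", gender="n", number="s")
mf_verb = identify_mf("vtr", tense="present", aspect="h", voice="active")
```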
HSI converts a sequence of human-readable literals HNL_i into a sequence of signs SiS_ci that a computer program can understand without semantic ambiguity. Formally, the concepts are defined below.
Definition 1. (Human Sentence "SiS_hi"): A human input sentence is a word sequence SiS_hi := (w_0.mf, w_1.mf, . . . , w_k.mf, . . . , w_m.mf) = ∑_{k=0}^{m} w_k, where w_0 is an automatically generated leading word signifying the beginning of a sentence, and 0 < k ≤ m is the word sequence number of the word w_k in SiS_hi.

Definition 2. (Computer Sentence "SiS_ci"): Given SiS_hi := (w_0.mf, w_1.mf, . . . , w_k.mf, . . . , w_m.mf) = ∑_{k=0}^{m} w_k, SiS_hi is generated into an iid sentence, called the computer sentence SiS_ci := (iid_0.mf, iid_1.mf, . . . , iid_k.mf, . . . , iid_m.mf) = ∑_{k=0}^{m} iid_k, where iid_0 is an automatically generated leading word signifying the beginning of a sentence, and 0 < k ≤ m is the iid sequence number of iid_k in SiS_ci. As the result of human semantic input, SiS_ci is a supervised computer-readable sentence.
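Definitions 1 and 2 amount to replacing each mf-annotated word with its iid. A small sketch, assuming a toy lookup table in place of CoDic (the iids are those from the running example, and the `<s>` leading word and its iid 0x0 are our illustrative assumptions):

```python
# Sketch of Def. 1 -> Def. 2: a human sentence SiS_hi (words with mf)
# becomes a computer sentence SiS_ci (iids with mf).
codic = {  # hypothetical word -> iid mapping; example iids from the paper
    "<s>": 0x0,  # leading word w0 marking the sentence start (assumed)
    "I": 0x5107df00b5e2,
    "enjoy": 0x5707df00184b,
    "travel": 0x5107df01848b,
    "in": 0x5a07df000103,
    "summer": 0x5107df016d86,
}

def to_computer_sentence(sis_hi):
    """SiS_hi = (w0.mf, ..., wm.mf)  ->  SiS_ci = (iid0.mf, ..., iidm.mf)."""
    return [(codic[word], mf) for word, mf in sis_hi]

sis_hi = [("<s>", None), ("I", "0"), ("enjoy", "03"),
          ("travel", "0"), ("in", None), ("summer", "0")]
sis_ci = to_computer_sentence(sis_hi)
```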

Sentence Computerization (SC)
Sentence computerization (SC) transforms a human sentence into a computer sentence. It consists of three main activities: (1) analyze the constituency structure and universal dependency relationships of the words output by the HSI step; (2) adapt the Stanford parser tool to extract potential relationships between the output words; (3) apply predefined case grammar rules to label the semantic roles of the output words and generate a universal sentence. These activities comprise local sentence analysis (Section 5.1), case generation (Section 5.2), and machine representation (Section 5.3).

Local Sentence Analysis
Each word arrives from the HSI step tagged with its PoS and morphological feature (mf). Local sentence analysis identifies the relationships between the words that constitute the English sentence. We adapted the Stanford parser tool [35], which provides full syntactic analysis, minimally a constituency (bracketed sentence) parse of local English sentences across PoS. A constituency parse describes what the constituents are and how the words are put together. For instance, the sentence "the quick brown fox jumps over the lazy dog" can be transformed into a bracketed sentence that represents grammatical functions, such as NP, VP, and PP, based on English grammar. However, the Stanford parser adopts the Penn PoS tagset rather than the CoDic PoS tagset, so it cannot parse our sentences directly. Thus, we built a mapping between the Penn PoS tagset and our CoDic PoS tagset; the PoS mapping algorithm is shown in Table 3. In particular, particle words, which exist only in the local English language, have no common iid to map to other languages and do not appear in the final machine representation; the mapping algorithm filters them with the check if (CoDicpos = par and iid = "xxx").
The Stanford parser presents a word's relationships as a pure constituency structure but ignores semantic roles. For example, an SVO (subject-verb-object) structure is presented as S → NP VP NP by the Stanford parser, from which the subject, object, and other semantic roles in a sentence cannot be recovered. Nivre et al. [12] proposed universal dependencies (UD), which use dependency labels and PoS tags to parse sentences across different languages. The UD annotation defines a classification of around 40 relations as the universal dependency label set (https://universaldependencies.org/#language-tagset, accessed on 30 August 2021), such as nsubj (nominal subject) and amod (adjectival modifier). It is therefore natural to combine UD with the Stanford parser. For instance, the sentence "the quick brown fox jumps over the lazy dog" can be transformed into its universal dependency representation. Finally, through the Stanford parser and UD, the local English sentence becomes a segmented sentence with a dependency relationship for each word, as shown in Definition 3.
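The Penn-to-CoDic re-tagging and particle filtering described above can be sketched as a table lookup; the tag pairs below are illustrative assumptions rather than the paper's full Table 3:

```python
# Hedged sketch of the Table 3 idea: map Penn Treebank tags onto CoDic
# PoS tags, and drop local particle words (par), which have no common
# iid and never reach the final machine representation.
PENN_TO_CODIC = {
    "NN": "ncm", "NNP": "npp",   # nouns (illustrative pairs)
    "VB": "vtr", "VBZ": "vtr",   # verbs (MParser keeps the root form)
    "IN": "prep",
    "RP": "par",                 # particles: English-only function words
}

def map_pos(tagged_words):
    """Re-tag a Penn-tagged sentence with CoDic PoS; filter particles."""
    out = []
    for word, penn in tagged_words:
        codic_pos = PENN_TO_CODIC.get(penn)
        if codic_pos is None or codic_pos == "par":
            continue  # unmapped or particle -> excluded from representation
        out.append((word, codic_pos))
    return out

mapped = map_pos([("fox", "NN"), ("jumps", "VBZ"), ("up", "RP")])
```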

Case Generation
MParser grammar is a set of machine natural language grammars, such as universal grammar (UG) and case grammar (CG), originating from Fillmore's case study [36,37]. MParser grammar specifies various sequences of signs, forming a general natural language commonly read and understood by both humans and computer systems. It consists of morphological features (intrinsic, discussed under HSI) and case grammar components (extrinsic). The morphological component varies from one language to another in its sets of morphological features, which are inflection forms themselves, but uses common naming conventions. Each case label either presents a syntactic, semantic, or computational function or marks a grammatical function in general, abstracting a particular grammatical phenomenon pertaining to a group of words, phrases, sentences, or other units that appear in natural languages.
In our previous work [16], we proposed a case grammar representing universal and deep cases (or semantic roles) reflected in a sentence as the central means of explaining both the syntactic structure and the meaning of sentences. The case grammar component provides a common representation of syntactic structures and structural words and can be used as a resource for language processing tasks such as translation, multilingual generation, and machine inference. The novel available cases are defined as follows: • Nominative Case (NOM): denotes a semantic category of entities that initiate actions, trigger events, or give states. The nominative case is often associated with the agentive properties of volition, sentience, instigation, and motion. In this paper, cases are labels or tags that mark signs' syntactic, semantic, and computational functions in marked forms such as marked words, phrases, and sentences within a natural language's text. For example, in the sentence "earth moves around sun", the behavior "move" is performed by the entity "earth" and the behavioral method is "around the sun". A case labels the functionality of a word or a phrase in the sentence, such as "NOM.earth PRE.moves COMv.around NOM.sun". The universal case grammar provides a common grammar transformable to the grammar of any existing natural language.

Tree Generation
Case generation converts a sequence of single concepts (i.e., atomic signs) into a complex concept (i.e., a compound sign) that is self-described. MParser builds a sentence-based case concept associated with each iid, defining how the iid grammatically functions and combines with other iids under the case grammar. The order of the sentence need not be considered, as the sentence is a bag of concepts. The key to generating a case for a sign lies in two facts: (1) a PoS is already associated with the term (from HSI); (2) the term has a clear grammatical relationship with the other terms in the sentence (from local sentence analysis).
A sentence is defined as a sequence of signs, each marked with a functionality label defined as a case. Each sign in a sentence can describe its case grammar relationship with other signs; this is the compound sign, called SignX. NOM and ACC cases are appended to nouns such as the words "fox" and "dog", and the PRE case is appended to verbs such as the word "jump". Thus, the case generation (fox_NOM (jump_PRE)) yielding the English "fox jump" can be turned into Chinese by just changing the lexical item: (狐狸_NOM (跳_PRE)) yielding "狐狸跳". Appending NOM and PRE forms correct sentences in both languages. Meanwhile, the morphological feature (mf) builds inflection features for nouns and verbs in both languages.
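The lexical-swap idea behind this example can be sketched in a few lines; the helper names and the toy lexicon are ours, and the point is only that the case-labelled structure is language-independent:

```python
# Minimal sketch of the case-generation example: append a case label to
# each concept, then translate by swapping lexical items only, leaving
# the case-labelled structure untouched.
def case_generate(subject, verb):
    """Build the compound sign (subject_NOM (verb_PRE))."""
    return f"({subject}_NOM ({verb}_PRE))"

def relabel(signx, lexicon):
    """Swap each lexical item via the lexicon; cases stay in place."""
    for src, dst in lexicon.items():
        signx = signx.replace(src, dst)
    return signx

english = case_generate("fox", "jump")        # (fox_NOM (jump_PRE))
chinese = relabel(english, {"fox": "狐狸", "jump": "跳"})
```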
Based on sign theory [28], every concept (e.g., fox, jump) is a meaning group, which appends a single case (e.g., NOM, PRE) to modify a larger meaning group in a tree of concepts. If each concept in a sequence is unique, then the sequence is also unique. The tree is defined as T = (N, E), where N is a set of nodes and E ⊆ N × N is a set of edges. A path in a tree is a sequence of nodes n_1, n_2, . . . , n_{k-1}, n_k in which each consecutive pair has e(n_i, n_{i+1}) ∈ E. A cycle is a path n_1, n_2, . . . , n_{k-1}, n_k (k > 2) consisting of distinct nodes except n_1 = n_k. In tree generation, we present a sentence as a tree-based SignX representation T_SignX. The nodes N comprise two main types: iid nodes N_iid and case nodes N_c. Formally, the node set is

N = {N_iid, N_c | iid ∈ IID, c ∈ C},

where IID is the set of iids of all the words in the sentence, each represented as a node in T_SignX, and C is the set of predefined case concepts, including NOM, PRE, and so on. Additionally, edges E link nodes in the tree, where

E ⊆ {(n_f, n_c) | n_f, n_c ∈ N}.

An important principle in sentence construction is the father-child relationship. Each edge e(n_f, n_c), with n_f, n_c ∈ N, carries a father-child relationship representing the structural relationship between its two connected nodes n_f and n_c: whether a father node (f) is modified by a child node (c), with the father node taking precedence. A father node is a key sentence constituent; a child node, in contrast, is always dependent and belongs to a father node. This correspondence is illustrated in Figure 8. Applying this principle, we can always construct sentences with different orderings of atomic concepts while ensuring structural equivalence.
The case generation is converted into T_SignX using the tree generation algorithm. T_SignX provides a phrase-based structure, such as SVO and OVS sequences, and case labeling, and is a non-redundant representation. The T_SignX tree algorithm is derived as follows:

1. Linearize the input to a term sequence S.
2. Connect each term in S to its smallest subtree in T_SignX.
3. Append one case to each node of T_SignX based on the case grammar rules.
4. Parse the universal dependency labels at each branching node N of the T_SignX.
5. Find the dependency relationship in the node of each word:
   a. If a corresponding dependency label exists, replace the current case using the dependency mapping rules;
   b. If there is no dependency relationship, keep the current case.
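The five steps above can be condensed into a sketch; the dependency-to-case table is a small illustrative subset and the flat-list tree encoding is our assumption, not the paper's data structure:

```python
# Simplified sketch of T_SignX construction: attach each term to the
# tree with a father-child edge, append its default case, then replace
# the case where a universal dependency label maps to one (step 5a/5b).
DEP_TO_CASE = {"nsubj": "NOM", "obj": "ACC", "amod": "ATT"}  # assumed subset

def build_tsignx(terms):
    """terms: list of (word, iid, default_case, dep_label, father_index)."""
    nodes, edges = [], []
    for i, (word, iid, case, dep, father) in enumerate(terms):
        case = DEP_TO_CASE.get(dep, case)     # step 5a: replace, 5b: keep
        nodes.append({"word": word, "iid": iid, "case": case})
        if father is not None:                # father-child edge e(n_f, n_c)
            edges.append((father, i))
    return nodes, edges

terms = [
    ("jump", 0x01, "PRE", None,    None),  # root (father node)
    ("fox",  0x02, "NOM", "nsubj", 0),
    ("dog",  0x03, "ACC", "obj",   0),
]
nodes, edges = build_tsignx(terms)
```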
The proposed T_SignX model can represent different sentences with the same tree if they have the same semantics. Because word order does not affect the representation, it reduces the influence of languages with flexible word order. Through case generation, which appends a case concept to each word, a sentence becomes a case sentence, as shown in Definition 4, where the length of the i-th subsequence (iid_1.mf.C, . . . , iid_p.mf.C)_i = ∑_{j=1}^{p} iid_j is p (1 ≤ p ∈ N), and C is the appended case.

Machine Representation
After attaching a case to each word, the machine universal language representation yields a computer-readable and -understandable sentence without requiring huge extra data to process it. An extended iid (eiid) is defined as eiid := Term.iid.mf.Case.F.C, where term and iid refer to a sense in CoDic (the PoS is already encoded in the iid), mf is the morphological feature, F is the index of the higher-level father sign in the MParser tree, and C is the index of the lower-level child node in the MParser tree. Additionally, the machine representation referring to PoS is defined. Finally, through the machine representation activity, a sentence becomes a bag of semantic concepts, indexed by term rather than by position in the sentence, that can be self-described for understanding by computers.
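A hedged sketch of the eiid serialization (the dot-separated string formatting is our assumption; only the field order Term.iid.mf.Case.F.C comes from the definition above):

```python
# Sketch of the extended iid (eiid := Term.iid.mf.Case.F.C): each concept
# is serialized with its iid, morphological feature, case, father index F,
# and child index C, so the sentence becomes an unordered bag of
# self-described concepts.
def make_eiid(term, iid, mf, case, father, child):
    return f"{term}.{iid:#x}.{mf}.{case}.{father}.{child}"

bag = {
    make_eiid("enjoy",  0x5707df00184b, "03", "PRE",  0, 1),
    make_eiid("I",      0x5107df00b5e2, "0",  "NOM",  1, 2),
    make_eiid("summer", 0x5107df016d86, "0",  "COMv", 1, 3),
}

# Order no longer matters: the set compares equal however it was built.
```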

Implementation
MParser is implemented in Python and Java under macOS 11.0.1, and runs on Python 3.7 and JDK 1.8. CoDic is represented in XML format for English and Chinese. In addition, the Stanford parser and universal dependency APIs are called by MParser. In the implementation, several sentences are processed and analyzed to describe how a sentence is represented and how semantic consistency is maintained for English sentences. In MParser, the user first types words one by one, selecting terms and additional morphological features, for example "I enjoy travel in summer", in the HSI step. The sentence is then generated by calling the constructInfo function in MParser. Meanwhile, a universal dependency parse is obtained by calling the dependency_parse function, and each word's dependency relationship is found by the everyWordDep function. We applied our case grammar rules to generate the MParser tree. The tree visualizations are produced with the NLTK API (http://nltk.org). Figure 9 shows the structure and tree screenshot from MParser.
Finally, the machine representation generates a universal sentence. The universal sentence presents a sequence of extracted meaningful concepts related to each other via cases and syntactic relationships. The sentence can also be mapped into Chinese words in the Chinese CoDic via the unique iids. Table 4 illustrates the transformation from a local English HNL_i to a universal sentence, and then to Chinese HNL_j.

First, the English sentence is converted to machine-readable iid sequences using the English CoDic. Then, through the case generation and machine representation steps, the English computer-understandable sentence is converted into a universal computer-readable and -understandable eiid sentence that is a bag of unique concepts. Finally, the eiid sentence can be translated into another language, such as Chinese, based on local rules. MParser ensures that any sentence in an HNL_i can be transformed into HNL_j without any semantic loss.
We also tested a passive sentence in English, "dog is hit by man heavily.", to illustrate the difference between NOM and ACC in terms of semantic role, as shown in Figure 10.
From the example, we found that "dog" is ACC and "man" is NOM in the passive sentence, meeting the standard semantic roles for a passive sentence. The PoS of the word "is" is null since it is inserted during local sentence analysis rather than drawn from CoDic, and it does not appear in the final machine representation. We further illustrate the tenses of three English sentences in Table 5.

Evaluation
Human manual evaluation is the crucial and ultimate criterion for validating semantic case labeling, given our definition of semantics as meaning as understood by a language speaker [38]. In this research, MParser was evaluated using intrinsic and extrinsic evaluation. The intrinsic (reader-focused) evaluation assessed the properties of MParser output by asking participants to rate the degree of semantic expressiveness of the output in a questionnaire. The extrinsic (expert-focused) evaluation assessed the agreement rate of case labeling between MParser outputs and experts.

Dataset
In our experiment, we randomly selected 100 sentences from a dataset (https://www.kaggle.com/c/billion-word-imputation/data, accessed on 30 August 2021) [39], which is a large corpus of English sentences, and manually input each word of each sentence into MParser. After removing words unrecognizable in CoDic and unparseable sentences, 75 sentences were retained (N = 75; please see Appendix C for the 75 automatic sentence outputs from MParser). To ensure the validity of the questionnaire, we divided the 75 sentences into 5 groups of 15 sentences each (N = 15), namely Groups A, B, C, D, and E. Table 6 shows our test dataset: 50 short sentences with fewer than 8 words and 25 long sentences with more than 8 words.

Experiment Settings
Intrinsic: An intrinsic (reader-focused) design usually requires a larger sample of (non-expert) participants. To investigate judgments of the semantic expressiveness of MParser outputs, we recruited 154 valid participants to judge the degree of semantic expressiveness of the 75 generated sentences through a questionnaire [40]. The semantic expressiveness criterion was "how clear is it to understand what is being described" or "how clear it would be to identify the case label from the description". We adopted a 5-point Likert scale of semantic expressiveness, as follows:

1. Very unclear 2. Unclear 3. Acceptable 4. Clear 5. Perfectly clear
Readers were drawn from cohorts of undergraduate and graduate students pursuing English-related degrees. Before completing the questionnaire, they were expected to understand the attributes of each MNL case label; each group required at least 25 readers.
Extrinsic: In the semantic case labeling evaluation, ideally, by asking annotators to make semantic predictions or annotations based on pre-specified criteria and comparing them with the cases extracted by the proposed method, the degree of agreement between the proposed method and the experts' annotations can be determined. Thus, a small number of expert annotators were recruited to label the cases of MParser [41]. We used three experts, two Ph.D. students majoring in English linguistics-related research areas and one university English lecturer, to label the 75 sentences. Before labeling, they were required to fully understand the description of the attributes of each MNL case by learning the case grammar. Each of the five groups of sentences (15 sentences each) was completed by all three experts, meaning every expert labeled all 75 sentences. To facilitate labeling by the experts and comparison with the MParser test data, we split each sentence into words, and the experts only needed to select the case for each word. We measured pairwise agreement in the extrinsic evaluation among experts and MParser outputs using the kappa coefficient (κ), which is widely used in computational linguistics for measuring agreement on category judgments [42]. It is defined as

κ = (P(A) − P(E)) / (1 − P(E)),

where P(A) is the observed agreement rate of case labeling between two annotators (e.g., experts 1 and 2), and P(E) is the agreement rate expected by chance. The simple kappa coefficient assumes binary classification. Thus, case labeling was treated as a binary classification in which each case is marked Yes (1) or No (0); for example, annotators might mark a word as NOM case (1) or non-NOM case (0). We calculated κ in two settings: inter-annotator agreement and intra-annotator agreement. Inter-annotator agreement was calculated over the 75 sentences annotated by two experts.
Intra-annotator agreement followed a similar process but was calculated over the 75 sentences as annotated by an expert versus the MParser outputs. The interpretation of kappa (−1 to 1) follows Landis and Koch [43]:
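The kappa computation over binary case-label decisions can be sketched directly from the formula above; the two annotation vectors are illustrative toy data, not the study's annotations:

```python
# Cohen's kappa: observed agreement P(A) against expected chance
# agreement P(E) over binary case-label decisions (1 = case assigned,
# 0 = not assigned).
def cohen_kappa(a, b):
    n = len(a)
    p_a = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement from each annotator's marginal label frequencies
    p1a, p1b = sum(a) / n, sum(b) / n
    p_e = p1a * p1b + (1 - p1a) * (1 - p1b)
    return (p_a - p_e) / (1 - p_e)

# toy binary labelings by two annotators over eight words
expert1 = [1, 0, 1, 1, 0, 1, 0, 0]
expert2 = [1, 0, 1, 0, 0, 1, 0, 1]
kappa = cohen_kappa(expert1, expert2)
```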

Results
From Table 7 and Figure 11, the judgments of semantic expressiveness indicated that MParser performed well, since Clear and Perfectly clear together accounted for the largest percentage overall. Additionally, the Perfectly clear percentages for short sentences (N = 50) and long sentences (N = 25), at 44% and 23%, respectively, indicated that semantic expressiveness was significantly clearer for short sentences. Table 7. The judgements of semantic expressiveness in intrinsic evaluation.

Figure 11. The percentage of semantic expressiveness for short and long sentences in intrinsic evaluation.

Table 8 shows the experimental results using MParser and human expert labeling. The average κ values were 0.693 for inter-annotator agreement and 0.717 for intra-annotator agreement. As 0.6 < κ < 0.8 indicates substantial agreement, the empirical results showed good consistency between the predictions generated by our approach and those of the experts.
The analysis of the κ values for the three experts found that the agreement κ values for experts 2 and 3 were relatively higher. There was a slight gap between expert 1 and experts 2 and 3, but the κ values remained within the range 0.6 < κ < 0.8. Table 8 also shows that experts 2 and 3 had higher average κ values than expert 1 for intra-annotator agreement. In addition, we calculated average κ values for intra-annotator agreement on short versus long sentences, as shown in Table 9. The average κ value for long sentences was significantly lower than that for short sentences. This result is consistent with the trend in our intrinsic evaluation: the higher the complexity of a sentence, the more likely it was to cause disagreement in case grammar labeling. In summary, comparing expert and MParser outputs, inter-annotator and intra-annotator agreement were substantial, and there was no major disagreement between our MParser results and those of the experts. From the results shown in Table 10, we found that the PRE and GEN cases had extremely high MRs of 0.986 and 0.959, respectively, followed by the ADV, ACC, and LIN cases. The MR of DAT was relatively low because of differences in the judgment of the infinitive. To our surprise, the MR of the NOM case was also relatively low. Through one-by-one analysis of the sentences, we found that when nouns appeared under the COM (COM_n/COM_v) structure, some experts still labeled the nouns with a COM case, whereas our MParser identified them as NOM. For the COM, COM_n, and COM_v cases, the MR was not very high because the experts differed on which COM case to use for prepositions. However, when the COM cases are considered as a single general case, COM_all, the average MR reached a very high score of 0.920, indicating consensus on the COM case.

Semantic Consistency
Here, we discuss the multilingual semantic consistency of MParser between English and Chinese. In MParser, a sentence is a concept tree consisting of simple sentences defined by a sequential list SiS, where each atomic concept iid is a low-level concept llc ∈ LLC produced in the human semantic input (HSI) step, and each compound concept eiid ∈ EIID is a high-level concept hlc ∈ HLC generated in MParser, acting as a sentence constituent in the sentence computerization (SC) step. Given two sentences, an English sentence SiS_i and a Chinese sentence SiS_j, if low-level concept equivalence and high-level concept equivalence both hold, such that SiS_i =_m SiS_j (where =_m denotes semantic equivalence), then they are semantically consistent. Low-level concept equivalence is word-based semantic consistency of terms; high-level concept equivalence is sentence-based semantic consistency.
1. Low-level concept equivalence: SiS_i and SiS_j are equivalent if and only if the two sentences share a common iid ∈ CoDic for each single concept. This guarantees that two heterogeneous single concepts are semantically consistent.
2. High-level concept equivalence: SiS_i and SiS_j are equivalent if and only if: (1) the mapping relationship IID ↔ EIID holds, i.e., each iid in Definition 4 is mapped to an eiid in Definition 5, so that the required mapping path exists for semantic equivalence. It is obvious that if all three conditions are met, then SiS_i =_m SiS_j. Figure 12 illustrates that languages i and j are semantically consistent, as they share common tree concepts across languages through the unique iid and eiid in MParser. Figure 12. An illustration of semantic consistency.
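The two equivalence checks can be sketched as set comparisons over a toy concept encoding (the dict layout and the eiid strings are illustrative assumptions):

```python
# Toy sketch of semantic consistency: two sentences are consistent when
# their low-level concepts (iids) and high-level concepts (eiids)
# coincide as bags, regardless of word order or language.
def semantically_consistent(sis_i, sis_j):
    low_i  = {c["iid"]  for c in sis_i}
    low_j  = {c["iid"]  for c in sis_j}
    high_i = {c["eiid"] for c in sis_i}
    high_j = {c["eiid"] for c in sis_j}
    return low_i == low_j and high_i == high_j

english = [{"word": "fox",  "iid": 0x02, "eiid": "0x02.NOM"},
           {"word": "jump", "iid": 0x01, "eiid": "0x01.PRE"}]
chinese = [{"word": "跳",   "iid": 0x01, "eiid": "0x01.PRE"},
           {"word": "狐狸", "iid": 0x02, "eiid": "0x02.NOM"}]
consistent = semantically_consistent(english, chinese)
```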

Conclusions and Future Work
Creating a common semantic representation for multilingual languages is an essential goal of the NLP community. To facilitate multilingual sentence representation and semantic interoperability, this research presented MParser for parsing local-language sentences and providing a common understanding across heterogeneous sentences. For any simple multilingual sentence, MParser converts its complex concepts into a computer-readable and -understandable universal sentence. This approach provides a universal grammatical feature such that any sentence can be processed as a bag of concepts and can refer to any term of a natural language. Additionally, it lays a theoretical foundation for enabling humans and computers to understand sentences semantically through the unique iid and eiid.
In the future, we plan to apply the approach to more real-world applications. For example, we will conduct research on how to achieve content persistence during construction of the Metaverse [44] by proposing a content-level persistence maintenance model, since language ambiguity, such as the use of synonyms to express a single idea, creates persistence problems. In blockchain settings, we will explore how to achieve semantic interoperability between IoT devices and users [45]. In the field of smart contracts, we will study the cross-context issues of smart contracts between unknown business partners, such as developers, who may come from different backgrounds or speak different languages; because language barriers prevent cross-language searches, most users do not have easy access to such content [46]. Moreover, it will also be necessary to extend the research to include semantic inference on the extracted meaning. We hope that our novel method will inspire the community to integrate various functions into our work.


B. Grammatical Features
In MParser, the gender and number features are attributed only to nouns. The features of tense, aspect, and voice are attributed only to verbs. For the grammatical aspects, we have the following definitions:
- Perfect (prf): a verb form indicating that an action or circumstance occurred earlier than the time under consideration, often focusing attention on the resulting state rather than on the occurrence itself. E.g., "I have made dinner".
- Perfect progressive (pfg): a verb form indicating that an action was in progress up to, and finished by, a given time. E.g., "I had been doing homework until 6 PM yesterday".
- Perfective (pfv): a grammatical aspect describing an action viewed as a simple whole, i.e., a unit without interior composition. Sometimes called the aoristic aspect, it usually refers to past events. For example, "I came".
- Imperfective (ipfv): a grammatical aspect describing a situation viewed with interior composition. The imperfective describes ongoing, habitual, repeated, or similar semantic roles, whether the situation occurs in the past, present, or future. Although many languages have a general imperfective, others have distinct aspects for one or more of its various roles, such as the progressive, habitual, and iterative aspects.
1. Imperfective habitual (iph): describes habitual and repeated actions. For example, "I read". "The rain beat down continuously through the night".
2. Imperfective progressive (ipp): describes ongoing actions or events. For example, "The rain was beating down".
Thus, we now have the feature combinations for nouns and verbs as shown in Tables 2 and 3.
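The aspect codes defined above can be made concrete with a small sketch. This is a hypothetical illustration, not part of MParser itself; the function and table names are assumptions. A verb token is annotated with tense, aspect, and voice features, with the aspect code validated against the list of definitions above:

```python
# Aspect codes as defined in this appendix (code -> full name).
ASPECTS = {
    "prf":  "perfect",
    "pfg":  "perfect progressive",
    "pfv":  "perfective",
    "ipfv": "imperfective",
    "iph":  "imperfective habitual",
    "ipp":  "imperfective progressive",
}

def annotate_verb(lemma, tense, aspect, voice):
    """Attach tense/aspect/voice features to a verb token, rejecting
    aspect codes that are not in the inventory above."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect code: {aspect}")
    return {"lemma": lemma, "tense": tense,
            "aspect": ASPECTS[aspect], "voice": voice}

# "The rain was beating down" -> past, imperfective progressive, active.
print(annotate_verb("beat", "past", "ipp", "active"))
```

Restricting verbs to a closed aspect inventory in this way mirrors how the feature combinations in Tables 2 and 3 are enumerable rather than open-ended.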

C. MParser Output-75 Sentences
In MParser, we manually input 75 valid sentences and automatically output parsed results for each sentence, as shown in Table 4.