Article

A Formal Model for Semantic Computing Based on Generalized Probabilistic Automata

School of Computer Science, South China Normal University, Guangzhou 510631, China
*
Author to whom correspondence should be addressed.
Entropy 2019, 21(9), 903; https://doi.org/10.3390/e21090903
Submission received: 25 July 2019 / Revised: 28 August 2019 / Accepted: 10 September 2019 / Published: 17 September 2019

Abstract

In most previous research, “semantic computing” refers to computational implementations of semantic reasoning, without support from the formal theory of computation. To provide solid foundations for semantic computing, researchers have proposed a different understanding of semantic computing based on finite automata, which offers a computation-theoretic approach. However, finite automata cannot deal with imprecise knowledge. Therefore, in this paper, we provide foundations for semantic computing based on probabilistic automata. Although traditional probabilistic automata can handle imprecise knowledge, they are defined on a fixed finite input alphabet, which severely restricts their abilities. In this paper, we rebuild traditional probabilistic automata for semantic computing. The new probabilistic automata are robust enough to take any symbol as input, whether predefined or not, and they perform better in many applications. We provide an application to weather forecasting, a domain for which traditional probabilistic automata are not effective because of their finite input alphabet; our new probabilistic automata overcome this limitation.

1. Introduction

Semantic similarity is applied in numerous applications in artificial intelligence and computational linguistics, such as word sense disambiguation, information retrieval, knowledge acquisition and natural language processing [1,2,3]. However, most research on Semantic Computing (SC) is devoted to the following three fields: (1) formal models for the theoretical foundations of SC, such as description logics, ontology reasoning and answer set programming [4,5]; (2) fundamental languages and technologies for interoperability and reuse of information, including RDF, RDFS, the OWL family of languages, the WSML family of languages, and SPARQL [6,7]; (3) applications of SC [8,9,10]. As a result, “semantic computing” refers to computational implementations of semantic reasoning (e.g., ontology reasoning, rule reasoning, semantic query and semantic search), but not from the perspective of the formal theory of computation. To provide solid foundations for semantic computing, it is important to bridge the gap between semantic computing and the formal theory of computation.
Recently, Reference [11] proposed a different understanding of semantic computing based on finite automata. Unlike previous research, it explains semantic computing from the perspective of computation theory, with models based on finite automata. However, finite automata cannot deal with imprecise knowledge, which is the strength of probabilistic automata. Unfortunately, a traditional probabilistic automaton M = (Q, Σ, δ, q_0, F) is not capable enough, because it is defined on a finite input alphabet Σ. When an undefined input a ∉ Σ is transmitted to the automaton in an application, the automaton becomes invalid. Yet in applications the inputs transmitted by users are unpredictable: users may input synonyms, equivalent concepts, acronyms or even words with spelling errors. The fixed finite set of inputs severely restricts the abilities of automata. For robustness, we prefer to build an automaton that can take any symbol as input, and semantic computing is the key to building such automata.
In this paper, we rebuild traditional probabilistic automata to provide formal models for semantic computing, based on the semantic similarity of two symbols or concepts [2,3,12,13,14]. We apply existing semantic similarity methods to probabilistic automata directly. For more information about semantic similarity, readers are encouraged to read our previous work [2,3,15].
The intuitive idea behind rebuilding a traditional probabilistic automaton is that we redefine a string S ∉ Σ* as its corresponding most similar input string S′ ∈ Σ*, using semantic similarity methods. Specifically, when the automaton needs to deal with an undefined input string S, we find its most similar string S′ over the input alphabet and then handle S in the same way as S′. As a result, probabilistic automata can take any string as input, whether defined or not. In Section 4, we present an application to weather forecasting. Because weather factors change all the time, it is impossible to define all of them in a finite input alphabet, so traditional probabilistic automata are not competent for weather forecasting. But with semantic computing methods, our new models can treat any undefined input as its corresponding most similar defined input. Moreover, our new models provide a computation-theoretic approach to semantic computing.
The rest of this paper is organized as follows. In Section 2, we briefly recall some basic notions of probabilistic automata. In Section 3, we build the models for semantic computing based on probabilistic automata and semantic similarity. Section 4 gives an application to weather forecasting, and Section 5 concludes this paper.

2. Related Work

For convenience, in this section, we will briefly recall some basic notions of probabilistic automata. There are more details in References [16,17,18,19,20].
Definition 1
(Probability distribution). Suppose U is a finite set. A probability distribution on U is a function f from U to [0,1] such that ∑_{x∈U} f(x) = 1. The set { x ∈ U | f(x) > 0 } is called the support of f. The set of all probability distributions on U is denoted as D(U). A probability distribution f ∈ D(U) is denoted as follows [21]:
f = f(x_1)/x_1 + f(x_2)/x_2 + ⋯ + f(x_n)/x_n, where U = {x_1, x_2, …, x_n}.
For any λ ∈ [0,1], f ∈ D(U), the scalar multiplication λ·f : U → [0,1] is defined as (λ·f)(x) = λ·f(x).
Definition 2
(PA). A probabilistic automaton (PA) is a five-tuple M = (Q, Σ, δ, q_0, F), where
  • Q is a finite set of states,
  • Σ is a finite input alphabet,
  • δ : Q × Σ → D(Q) is the transition function,
  • q_0 ∈ Q is the start state, q_0 = ∑_i α_i q_i with ∑_i α_i = 1, α_i ∈ [0,1],
  • F ⊆ Q is the set of accept states.
Remark 1.
δ(q, a)(p) denotes the probability of the automaton entering state p with input a from state q. In [17], the transition function is defined in an equivalent form δ : Q × Σ × Q → [0,1] satisfying ∑_{p∈Q} δ(q, a, p) = 1, where δ(q, a, p) denotes the probability of the automaton entering state p with input a from state q, i.e., δ(q, a)(p) = δ(q, a, p). In the form δ : Q × Σ → D(Q), we do not need a constraint similar to ∑_{p∈Q} δ(q, a, p) = 1, because by the complete probability formula we get that ∑_{x∈U} f(x) = 1 for any f ∈ D(U). For convenience, both forms of the transition function are denoted as δ, i.e., δ(q, a)(p) = δ(q, a, p).
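For readers who prefer a concrete view, the following Python sketch encodes a PA as a dictionary of distributions. The class and attribute names are our own illustration and are not part of the formal definition; the start state is simplified to a single state rather than a distribution ∑_i α_i q_i.

```python
# A minimal sketch of Definition 2, assuming a dictionary-based encoding.
class PA:
    def __init__(self, states, alphabet, delta, start, accept):
        self.Q = set(states)        # finite set of states
        self.Sigma = set(alphabet)  # finite input alphabet
        self.delta = delta          # dict: (q, a) -> {p: probability}, each distribution summing to 1
        self.q0 = start             # start state (a single state for simplicity)
        self.F = set(accept)        # set of accept states

    def step(self, q, a):
        """Return the distribution delta(q, a) as a dict {p: delta(q, a, p)}."""
        return self.delta.get((q, a), {})
```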
Definition 3
(Semantic similarity). Let U be a set of concepts or symbols. The semantic similarity of two concepts a, b ∈ U is given by a function sim : U × U → I, where I is the unit interval [0,1].
Among traditional approaches, Reference [1] computes the semantic similarity of concepts based on WordNet, our previous work [2,3,15] is based on Wikipedia, and Couto et al. [22] proposed an ontology-based approach. In this paper, we combine the theory of semantic similarity with computation theory to construct a new model of probabilistic automaton.

3. Formal Model of Semantic Computing Based on Probabilistic Automata

In this section, we present formal models for semantic computing by generalizing the automata mentioned in the above section. Firstly, we consider a simple case, extending automata by equivalent concepts. Then we investigate a general situation, generalizing automata by semantically related concepts.

3.1. Probabilistic Automata under Equivalent Concepts

Suppose that M = (Q, Σ, δ, q_0, F) is a PA. One of the limitations of a PA M is that M is restricted to the finite input alphabet Σ. Suppose x = a_1 a_2 ⋯ a_n ∈ Σ* (Σ* is the set of all strings over Σ, containing the empty string ε) and x′ = a_1′ a_2′ ⋯ a_n′ ∉ Σ*. For any 1 ≤ i ≤ n, a_i and a_i′ are equivalent concepts or synonyms. For instance, x = a_1 a_2 ∈ Σ* and x′ = a_1′ a_2′ ∉ Σ*, where a_1 = true, a_2 = valid and a_1′ = correct, a_2′ = authorized. M is valid with input x, but not with x′. The problem is that in many applications the inputs transmitted from users are unpredictable. We defined a_1 = true as a legal input beforehand, but in the application users may input a_1′ = correct instead of a_1 = true. Because a_1′ ∉ Σ, the automaton M is invalid with input a_1′. For robustness, the automaton M should also be able to take equivalent concepts or synonyms as inputs.
Definition 4.
Suppose Σ is an alphabet and π is the alphabet of all possible symbols. For any b ∈ Σ, a ∈ π, if sim(a, b) = 1, then b is called an equivalent concept or synonym of a in Σ, denoted as a^e = b. If a ∈ Σ, then a^e = a. If there does not exist any equivalent concept or synonym in Σ, then a^e = ε, where ε is the empty string. Therefore, a^e exists for any a ∈ π. Both equivalent concepts and synonyms are treated as equivalent concepts in this paper.
To handle the case a^e = ε, the transition function δ : Q × Σ → D(Q) of a PA M needs to be generalized to a function Q × Σ* → D(Q), which for convenience we still denote as δ. For q ∈ Q,
δ(q, A) = δ(q, ε) = 1/q, if A = ε;  δ(q, a_1 a_2) = ∑_{p∈Q} δ(q, a_1)(p) · δ(p, a_2), if A = a_1 a_2 with a_1 ∈ Σ and a_2 ∈ Σ*.
The language accepted by PA M is defined as a function L_M : Σ* → [0,1]: for any x = a_1 a_2 ⋯ a_n ∈ Σ*,
L_M(x) = ∑_{q∈F, q_i∈Q} { δ(q_0, a_1, q_1) · δ(q_1, a_2, q_2) ⋯ δ(q_{n−1}, a_n, q) }, i = 1, 2, …, n−1,
where δ(q_0, a_1, q_1) = ∑_i α_i δ(q_i, a_1, q_1) if q_0 = ∑_i α_i q_i.
Definition 5
(DPEC). A deterministic probabilistic automaton for semantic computing under equivalent concepts (DPEC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • Q is a finite set of states,
  • Σ is a finite input alphabet,
  • δ : Q × Σ → D(Q) is the internal transition function,
  • q_0 ∈ Q is the start state, q_0 = ∑_i α_i q_i with ∑_i α_i = 1, α_i ∈ [0,1],
  • F ⊆ Q is the set of accept states,
  • Σ′ = { a | a ∈ π, ∃! a^e ∈ Σ }, where ∃! a^e ∈ Σ means that there exists exactly one a^e ∈ Σ for every such a,
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  δ(q, a^e), if a ∈ Σ′.
In the traditional perspective, Σ ∪ Σ′ is the alphabet of M′ and δ′ is the transition function of M′. In this paper, however, δ is called the internal transition function and δ′ the generalized transition function; simply, by the transition function of M′ we mean δ′. An input in Σ is called an original input and an input in Σ′ is called a generalized input.
Example 1
(Command order). We consider a simple case of command input: every time a user inputs a command, the automaton enters a state according to a probability distribution. Suppose that in PA M = (Q, Σ, δ, q_0, F), Q = {q, p}, Σ = {true, false}, q_0 = q, F = Q, and δ is defined as:
δ(q, true, q) = 0.7, δ(q, false, q) = 0.8, δ(q, true, p) = 0.3, δ(q, false, p) = 0.2,
δ(p, true, q) = 0.1, δ(p, false, q) = 0.4, δ(p, true, p) = 0.9, δ(p, false, p) = 0.6.
In PA M, “true” and “false” in Σ are commands input by users. δ(q, true, p) = 0.3 means that when the automaton is in state q and the user inputs the command “true”, the probability of entering state p is 0.3; the other values are read similarly. In applications, however, users may input “right” and “wrong” instead of “true” and “false”, because for them these are equivalent concepts. In our daily life there are many equivalent concepts, and users cannot know which ones are legal commands.
In order to give M the ability to take equivalent concepts as commands, we rebuild PA M as a DPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) as follows. Suppose that we only need to consider the additional case in which users input “wrong” and “right”, i.e., Σ′ = {wrong, right}; other cases are analogous. Define δ′(k, a, k′) = δ(k, a, k′) for k, k′ ∈ Q, a ∈ Σ. Suppose
sim(right, true) = 1, sim(wrong, false) = 1,
and for all other cases sim(a, a′) = 0, a ∈ Σ′, a′ ∈ Σ. For a ∈ Σ′, δ′ is defined as:
δ′(q, right, p) = δ(q, true, p) = 0.3, δ′(q, wrong, p) = δ(q, false, p) = 0.2,
δ′(q, right, q) = δ(q, true, q) = 0.7, δ′(q, wrong, q) = δ(q, false, q) = 0.8,
δ′(p, right, q) = δ(p, true, q) = 0.1, δ′(p, wrong, q) = δ(p, false, q) = 0.4,
δ′(p, right, p) = δ(p, true, p) = 0.9, δ′(p, wrong, p) = δ(p, false, p) = 0.6.
With the above extended Σ′ and δ′, when we expect “true” from users but they input “right”, PA M is invalid whereas DPEC M′ is still valid. Similarly, we can deal with spelling errors such as “ture”: the key is the semantic similarity function sim. If we define sim(ture, true) = 1 and δ′(k, ture, k′) = δ(k, true, k′) for k, k′ ∈ Q, then the misspelling “ture” can be taken as a legal input.
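A minimal sketch of how Example 1 could be implemented: δ is stored as a dictionary, and the generalized transition δ′ falls back to the unique equivalent concept (sim = 1) when the input is not in Σ. The names and data layout are illustrative, not prescribed by the definition.

```python
# Sketch of Example 1: DPEC transitions derived from PA M via equivalent concepts.
# The delta values are those given in the example; 'equivalents' encodes sim(a, a_e) = 1.
delta = {
    ('q', 'true'):  {'q': 0.7, 'p': 0.3},
    ('q', 'false'): {'q': 0.8, 'p': 0.2},
    ('p', 'true'):  {'q': 0.1, 'p': 0.9},
    ('p', 'false'): {'q': 0.4, 'p': 0.6},
}
equivalents = {'right': 'true', 'wrong': 'false', 'ture': 'true'}  # including the spelling error

def dpec_step(q, a):
    """Generalized transition delta'(q, a): use delta directly for original inputs,
    otherwise fall back to the unique equivalent concept; a_e = epsilon keeps the state."""
    if (q, a) in delta:
        return delta[(q, a)]
    a_e = equivalents.get(a)
    return delta[(q, a_e)] if a_e else {q: 1.0}

print(dpec_step('q', 'right'))   # same distribution as for 'true': {'q': 0.7, 'p': 0.3}
```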
But in many cases, for some a ∈ π, there may exist several equivalent concepts a^e ∈ Σ. This leads to the following nondeterministic case.
Definition 6
(NPEC). A nondeterministic probabilistic automaton for semantic computing under equivalent concepts (NPEC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • Q, Σ, δ, q_0, F are the same as in a DPEC,
  • Σ′ = { a | a ∈ π, ∃ a^e ∈ Σ },
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  ∑_{a^e∈Σ} δ(q, a^e), if ∑_{a^e∈Σ} δ(q, a^e) ≤ 1 and a ∈ Σ′;  1, if ∑_{a^e∈Σ} δ(q, a^e) > 1 and a ∈ Σ′.
In the traditional perspective, Σ ∪ Σ′ is the alphabet of NPEC M′ and δ′ is the transition function of NPEC M′.
Example 2
(Command order). Suppose that the designers of the PA M in Example 1 above receive feedback that users frequently get an error report for the inputs “right” and “wrong”. To make the automaton more convenient for users, the designers improve it as follows. In PA M = (Q, Σ, δ, q_0, F), Σ is redefined as Σ = {true, false, right, wrong} and δ is redefined as:
δ(q, right, p) = δ(q, true, p) = 0.3, δ(q, wrong, p) = δ(q, false, p) = 0.2,
δ(q, right, q) = δ(q, true, q) = 0.7, δ(q, wrong, q) = δ(q, false, q) = 0.8,
δ(p, right, q) = δ(p, true, q) = 0.1, δ(p, wrong, q) = δ(p, false, q) = 0.4,
δ(p, right, p) = δ(p, true, p) = 0.9, δ(p, wrong, p) = δ(p, false, p) = 0.6.
But there are many more equivalent concepts. What if a user inputs “correct” instead of “right” or “true”?
In this case, we rebuild PA M as an NPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) as follows, with Σ′ = {correct}. Define δ′(k, a, k′) = δ(k, a, k′) for k, k′ ∈ Q, a ∈ Σ. Because δ(q, right, p) = δ(q, true, p) = 0.3 and “correct” is equivalent to both “right” and “true”, we define δ′(q, correct, p) = δ(q, right, p) + δ(q, true, p) = 0.3 + 0.3 = 0.6. Analogously, δ′ is defined as:
δ′(q, correct, p) = δ(q, right, p) + δ(q, true, p) = 0.6,
δ′(q, correct, q) = min{ δ(q, right, q) + δ(q, true, q), 1 } = 1,
δ′(p, correct, q) = δ(p, right, q) + δ(p, true, q) = 0.2,
δ′(p, correct, p) = min{ δ(p, right, p) + δ(p, true, p), 1 } = 1.
With the above extended Σ′ and δ′, when we expect “true” or “right” from users but they input “correct”, PA M is invalid whereas NPEC M′ is still valid. Similarly, we can deal with other equivalent concepts.
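The nondeterministic rule of Definition 6 can be sketched in the same style: when an input such as “correct” has several equivalent concepts, their probabilities are summed and capped at 1. The dictionaries below reuse the values of Example 2 and are only illustrative.

```python
# Sketch of the NPEC rule applied to Example 2: 'correct' is equivalent to both
# 'true' and 'right', so delta'(q, correct, p) = min{delta(q, true, p) + delta(q, right, p), 1}.
delta2 = {
    ('q', 'true'):  {'q': 0.7, 'p': 0.3}, ('q', 'right'): {'q': 0.7, 'p': 0.3},
    ('q', 'false'): {'q': 0.8, 'p': 0.2}, ('q', 'wrong'): {'q': 0.8, 'p': 0.2},
    ('p', 'true'):  {'q': 0.1, 'p': 0.9}, ('p', 'right'): {'q': 0.1, 'p': 0.9},
    ('p', 'false'): {'q': 0.4, 'p': 0.6}, ('p', 'wrong'): {'q': 0.4, 'p': 0.6},
}
equivalents_multi = {'correct': ['true', 'right']}

def npec_step(q, a):
    if (q, a) in delta2:
        return delta2[(q, a)]
    out = {}
    for a_e in equivalents_multi.get(a, []):
        for p, prob in delta2.get((q, a_e), {}).items():
            out[p] = min(out.get(p, 0.0) + prob, 1.0)   # sum, capped at 1
    return out

print(npec_step('q', 'correct'))   # {'q': 1.0, 'p': 0.6}, matching the example
```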
The key point is that the inputs transmitted from users are unpredictable, and the finite input alphabet Σ of PA M cannot take every case into account, whereas an NPEC is competent. Based on the research on semantic computing introduced previously, we can apply semantic computing methods to PA M directly: once an undefined input is transmitted to M, we find its equivalent inputs and modify δ′, so that it can be taken as a legal input. However, most of the time we can only find similar inputs instead of equivalent concepts; we leave this case to the next subsection.
The transition function of a DPEC or NPEC M′ can be generalized to a function δ′ : Q × (Σ ∪ Σ′)* → D(Q), which for convenience we still denote as δ′:
δ′(q, A) = δ′(q, ε) = 1/q, if A = ε, q ∈ Q;  δ′(q, a_1 a_2) = ∑_{p∈Q} δ′(q, a_1)(p) · δ′(p, a_2), if A = a_1 a_2 with a_1 ∈ (Σ ∪ Σ′), a_2 ∈ (Σ ∪ Σ′)*, q ∈ Q.
With the above generalized transition function, we obtain the robustness of a DPEC or NPEC M′:
Theorem 1
(Robustness). Suppose that M′ is a DPEC or NPEC. Then M′ can take any string of symbols A ∈ π* as an input, i.e., the transition function of a DPEC or NPEC M′ can be generalized to a function δ′ : Q × π* → D(Q).
Proof. 
For any q ∈ Q, if A = ε, then δ′(q, A) = δ′(q, ε) = 1/q. If A ≠ ε, then suppose A = a_1 a_2 ⋯ a_n where a_i ∈ π, 1 ≤ i ≤ n. If a_i = ε, then δ′(q, a_i) = δ′(q, ε) = 1/q. If a_i ∈ Σ, then δ′(q, a_i) = δ(q, a_i). If a_i ∈ Σ′, then δ′(q, a_i) = δ(q, a_i^e). Recall that if a ∈ Σ, then a^e = a. Therefore
δ′(q, A) = δ′(q, a_1 a_2 ⋯ a_n) = ∑_{p_1, p_2, …, p_{n−1} ∈ Q} δ′(q, a_1)(p_1) · δ′(p_1, a_2)(p_2) ⋯ δ′(p_{n−1}, a_n) = ∑_{p_1, p_2, …, p_{n−1} ∈ Q} δ(q, a_1^e)(p_1) · δ(p_1, a_2^e)(p_2) ⋯ δ(p_{n−1}, a_n^e),
which means M′ can take any string of symbols A ∈ π* as an input, i.e., the transition function of M′ can be generalized to a function δ′ : Q × π* → D(Q). □
As analysed above, in many applications of automata the inputs transmitted from users are unpredictable. If an automaton can take any string of symbols as an input, it can be applied in more fields. This robustness makes probabilistic automata more stable. Intuitively, a DPEC or NPEC is a semantic expansion of a PA that frees the PA from its finite input alphabet.
Definition 7.
The language accepted by a DPEC or NPEC M′ is defined as a function L_{M′} : π* → [0,1]: for any x = a_1 a_2 ⋯ a_n ∈ π*,
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) }, i = 1, 2, …, n−1.
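As a sketch, the acceptance degree of Definition 7 can be computed by a forward pass over state distributions rather than by enumerating all state sequences; the sum over paths ending in an accept state is exactly the mass that lands in F. Here 'step' stands for any of the generalized transition functions sketched above, and all names are illustrative.

```python
# Sketch: compute L_M'(x) by propagating the state distribution symbol by symbol
# and summing the probability mass that ends in an accept state.
def accept_degree(step, q0, F, x):
    dist = {q0: 1.0}                       # start distribution 1/q0
    for a in x:                            # x is a sequence of input symbols
        nxt = {}
        for q, w in dist.items():
            for p, prob in step(q, a).items():
                nxt[p] = nxt.get(p, 0.0) + w * prob
        dist = nxt
    return sum(w for q, w in dist.items() if q in F)

# e.g. accept_degree(dpec_step, 'q', {'q', 'p'}, ['right', 'wrong'])
```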
DPEC and NPEC are semantic generalizations of PA for semantic computing. To show this relationship, we can define DPEC and NPEC in the following equivalent forms.
Definition 8
(DPEC). A deterministic probabilistic automaton for semantic computing under equivalent concepts (DPEC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃! a^e ∈ Σ },
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  δ(q, a^e), if a ∈ Σ′.
Definition 9
(NPEC). A nondeterministic probabilistic automaton for semantic computing under equivalent concepts (NPEC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃ a^e ∈ Σ },
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  ∑_{a^e∈Σ} δ(q, a^e), if ∑_{a^e∈Σ} δ(q, a^e) ≤ 1 and a ∈ Σ′;  1, if ∑_{a^e∈Σ} δ(q, a^e) > 1 and a ∈ Σ′.
The languages accepted by a DPEC (or NPEC) and by a PA have the following properties.
Theorem 2
(Semantic generalization). Suppose that DPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) is a semantic generalization of PA M = (Q, Σ, δ, q_0, F) and that the languages accepted by them are the functions L_{M′} and L_M, respectively. The languages accepted by them have the following properties:
1.
for any x ∈ Σ*, L_{M′}(x) = L_M(x),
2.
for any x ∈ π*, there exists x′ ∈ Σ* such that L_{M′}(x) = L_{M′}(x′) = L_M(x′).
Proof. 
For DPEC M′ and PA M, if x = ε, then δ′(q, x) = δ(q, x) = 1/q for any q ∈ Q; obviously, L_{M′}(x) = L_M(x). If x ≠ ε, (1) for any x = a_1 a_2 ⋯ a_n ∈ Σ*, a_i ∈ Σ, 1 ≤ i ≤ n,
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) }.
Since δ′(q, a, p) = δ(q, a, p) for any a ∈ Σ, q, p ∈ Q, we get
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) } = ∑_{q∈F, q_i∈Q} { δ(q_0, a_1, q_1) · δ(q_1, a_2, q_2) ⋯ δ(q_{n−1}, a_n, q) } = L_M(x).
(2) For any x = a_1 a_2 ⋯ a_n ∈ π*, a_i ∈ π, 1 ≤ i ≤ n,
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) }.
For a ∈ Σ′, δ′(q, a) = δ(q, a^e), and recall that a^e = a if a ∈ Σ. Let x′ = a_1^e a_2^e ⋯ a_n^e; then x′ ∈ Σ*.
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) } = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1^e, q_1) · δ′(q_1, a_2^e, q_2) ⋯ δ′(q_{n−1}, a_n^e, q) } = L_{M′}(x′).
Since x′ ∈ Σ*, by (1) we get L_{M′}(x′) = L_M(x′). Therefore, L_{M′}(x) = L_{M′}(x′) = L_M(x′). □
Theorem 3
(Semantic generalization). Suppose that NPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) is a semantic generalization of PA M = (Q, Σ, δ, q_0, F) and that the languages accepted by them are the functions L_{M′} and L_M, respectively. The languages accepted by them have the following properties:
1.
for any x ∈ Σ*, L_{M′}(x) = L_M(x),
2.
for any x ∈ π*, there exists x′ ∈ Σ* such that L_{M′}(x) ≥ L_{M′}(x′) = L_M(x′).
Proof. 
For NPEC M′ and PA M, if x = ε or x = a_1 a_2 ⋯ a_n ∈ Σ*, a_i ∈ Σ, 1 ≤ i ≤ n, the argument is the same as for a DPEC.
For any x = a_1 a_2 ⋯ a_n ∈ π*, a_i ∈ π, 1 ≤ i ≤ n,
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) }, i = 1, 2, …, n−1.
For a ∈ Σ′, δ(q, a^e, p) ≤ ∑_{a^e∈Σ} δ(q, a^e, p) and δ(q, a^e, p) ≤ 1; therefore,
δ′(q, a, p) = min{ ∑_{a^e∈Σ} δ(q, a^e, p), 1 } ≥ δ(q, a^e, p).
Recall that a^e = a if a ∈ Σ. Let x′ = a_1^e a_2^e ⋯ a_n^e; then x′ ∈ Σ*.
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) } ≥ ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1^e, q_1) · δ′(q_1, a_2^e, q_2) ⋯ δ′(q_{n−1}, a_n^e, q) } = L_{M′}(x′).
Since x′ ∈ Σ*, by (1) we get L_{M′}(x′) = L_M(x′). Therefore, L_{M′}(x) ≥ L_{M′}(x′) = L_M(x′). □
The intuitive idea behind the above two theorems is that, because DPEC and NPEC M′ are semantically generalized from PA M, an original input in Σ yields the same result in both. For a generalized input in Σ′, if there is only one equivalent concept, they yield an equivalent result too; but if there are several equivalent concepts, the generalized input covers all the equivalent cases, so NPEC M′ yields a larger probability. These semantic generalization properties are reflected in the transition functions and in the accepted languages.
An important difference between a DPEC and an NPEC is that a DPEC is a traditional probabilistic automaton, whereas an NPEC M′ is not, because it does not satisfy the complete probability formula. In PA M, ∑_{p∈Q} δ(q, a, p) = 1. But in the corresponding NPEC M′, because δ(q, a^e, p) ≥ 0, when a ∈ Σ′,
∑_{p∈Q} δ′(q, a, p) ≥ ∑_{p∈Q} δ(q, a^e, p) = 1.
If we want to define an NPEC M′ as a kind of traditional probabilistic automaton, the transition function needs to be adjusted as follows.
Definition 10
(NPEC). A nondeterministic probabilistic automaton for semantic computing under equivalent concepts (NPEC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃ a^e ∈ Σ }, and N(a^e) is the number of equivalent concepts of a,
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  (1/N(a^e)) ∑_{a^e∈Σ} δ(q, a^e), if a ∈ Σ′.
Obviously, the properties of robustness and semantic generalization remain valid, and with the above adjusted transition function it is easy to obtain the following property.
Theorem 4
(Semantic computing). In a DPEC or NPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), for any a ∈ Σ ∪ Σ′, ∑_{p∈Q} δ′(q, a)(p) = sim(a, a^e) = 1.
Proof. 
In a DPEC or NPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F),
  • for any a ∈ Σ, ∑_{p∈Q} δ′(q, a)(p) = ∑_{p∈Q} δ(q, a)(p) = 1 = sim(a, a^e),
  • for any a ∈ Σ′,
    ∑_{p∈Q} δ′(q, a)(p) = ∑_{p∈Q} (1/N(a^e)) ∑_{a^e∈Σ} δ(q, a^e)(p) = (1/N(a^e)) ∑_{a^e∈Σ} ∑_{p∈Q} δ(q, a^e)(p) = (1/N(a^e)) ∑_{a^e∈Σ} 1 = (1/N(a^e)) · N(a^e) = 1 = sim(a, a^e).
 □
The intuitive idea of this theorem is that the total probability that a DPEC or NPEC M′ enters any of the states on a generalized input equals the semantic similarity between the generalized input and its equivalent input. The main purpose of a DPEC (or NPEC) M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) generalized from a PA M = (Q, Σ, δ, q_0, F) is semantic computing.
From the above theorems, we can see that the generalized PAs do not suffer from the limitations of a fixed finite input alphabet. They explain semantic computing with computation theory, instead of as implementations of semantic reasoning (e.g., ontology reasoning, rule reasoning, semantic query, and semantic search).

3.2. Probabilistic Automata under Related Concept

In the previous subsection, we supposed that for any input a ∈ π there exists an equivalent concept a^e ∈ Σ. In fact, this is just an ideal case. As illustrated in the bookshop example (Example 4), when a PA M = (Q, Σ, δ, q_0, F) is applied in practice, we rebuild it as an NPEC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) in order to handle unpredictable inputs. When an input a ∉ Σ is transmitted from users, we check the original input alphabet Σ. If there exists an a^e ∈ Σ such that sim(a, a^e) = 1, then we add a to Σ′ and define δ′(q, a) = δ(q, a^e), q ∈ Q. But it is too hard to find an a^e ≠ ε in Σ such that sim(a, a^e) = 1 for every input a. In fact, most of the time we can only find an a^s ∈ Σ such that 0 ≤ sim(a, a^s) ≤ 1. Therefore, in this subsection, we generalize DPEC and NPEC to the case 0 ≤ sim(a, a^s) ≤ 1.
Definition 11.
Suppose Σ is an alphabet and π is the alphabet of all possible symbols. For any b ∈ Σ, a ∈ π, if sim(a, b) = max_{b′∈Σ} { sim(a, b′) } > 0, then b is called a most similar concept of a in Σ, denoted as a^s = b. If a ∈ Σ, then a^s = a^e = a. If max_{b∈Σ} { sim(a, b) } = 0, then a^s = ε. Notice that a^s is not necessarily unique.
With the most similar concepts, we generalize DPEC and NPEC to the universal case as follows.
Definition 12
(DPRC). A deterministic probabilistic automaton for semantic computing under related concept (DPRC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃! a^s ∈ Σ },
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  δ(q, a^s) · sim(a, a^s), if a ∈ Σ′.
Definition 13
(NPRC). A nondeterministic probabilistic automaton for semantic computing under related concept (NPRC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃ a^s ∈ Σ },
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  ∑_{a^s∈Σ} δ(q, a^s) · sim(a, a^s), if ∑_{a^s∈Σ} δ(q, a^s) · sim(a, a^s) ≤ 1 and a ∈ Σ′;  1, if ∑_{a^s∈Σ} δ(q, a^s) · sim(a, a^s) > 1 and a ∈ Σ′.
Obviously, DPEC and NPEC are special cases of DPRC and NPRC, namely the case sim(a, a^s) = sim(a, a^e) = 1.
Example 3
(Command order). Recall the PA M in Example 2 above. Here we consider the case in which users may input “fit” instead of “right” or “true”. The term “fit” is not an equivalent concept of “right” or “true”, but they are similar. Suppose sim(fit, right) = 0.7 and sim(fit, true) = 0.8. Then we can only find a similar concept for “fit”, not an equivalent one, so DPEC and NPEC are not applicable in this case. In fact, receiving a similar input is more likely than receiving an equivalent concept.
In order to deal with similar inputs, we rebuild PA M as a DPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) as follows. In this case, Σ′ = {fit}. Define δ′(k, a, k′) = δ(k, a, k′) for k, k′ ∈ Q, a ∈ Σ. For a ∈ Σ′, δ′ is defined as:
δ′(q, fit, p) = sim(fit, true) · δ(q, true, p) = 0.24, δ′(q, fit, q) = sim(fit, true) · δ(q, true, q) = 0.56,
δ′(p, fit, q) = sim(fit, true) · δ(p, true, q) = 0.08, δ′(p, fit, p) = sim(fit, true) · δ(p, true, p) = 0.72.
With the above extended Σ′ and δ′, when we expect “true” or “right” from users but they input “fit”, PA M is invalid whereas DPRC M′ is still valid.
Suppose instead that sim(fit, right) = sim(fit, true) = 0.8; then PA M needs to be rebuilt as an NPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) as follows. Σ′ = {fit}. Define δ′(k, a, k′) = δ(k, a, k′) for k, k′ ∈ Q, a ∈ Σ. For a ∈ Σ′, δ′ is defined as:
δ′(q, fit, p) = sim(fit, right) · δ(q, right, p) + sim(fit, true) · δ(q, true, p) = 0.48,
δ′(q, fit, q) = min{ sim(fit, right) · δ(q, right, q) + sim(fit, true) · δ(q, true, q), 1 } = 1,
δ′(p, fit, q) = sim(fit, right) · δ(p, right, q) + sim(fit, true) · δ(p, true, q) = 0.16,
δ′(p, fit, p) = min{ sim(fit, right) · δ(p, right, p) + sim(fit, true) · δ(p, true, p), 1 } = 1.
Spelling errors can also be treated as similar inputs analogously.
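A sketch of the DPRC step used in Example 3: the undefined input “fit” is routed to its most similar defined concept (here “true”, with sim = 0.8) and the probabilities are scaled by that similarity. The sim values follow the example; everything else (names, data layout) is illustrative, and the NPRC variant would instead sum sim-weighted terms and cap at 1 as computed by hand above.

```python
# Sketch of the DPRC rule from Definition 12 on the data of Example 3.
sim = {('fit', 'true'): 0.8, ('fit', 'right'): 0.7}
delta3 = {('q', 'true'): {'q': 0.7, 'p': 0.3}, ('q', 'right'): {'q': 0.7, 'p': 0.3}}

def dprc_step(q, a):
    if (q, a) in delta3:
        return delta3[(q, a)]
    candidates = [b for (x, b) in sim if x == a]        # defined concepts similar to a
    a_s = max(candidates, key=lambda b: sim[(a, b)])    # unique most similar concept (assumed to exist)
    return {p: prob * sim[(a, a_s)] for p, prob in delta3.get((q, a_s), {}).items()}

print({p: round(v, 2) for p, v in dprc_step('q', 'fit').items()})   # {'q': 0.56, 'p': 0.24}
```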
The transition function of a DPRC or NPRC M′ can also be generalized to a function δ′ : Q × (Σ ∪ Σ′)* → D(Q), which for convenience we still denote as δ′:
δ′(q, A) = δ′(q, ε) = 1/q, if A = ε, q ∈ Q;  δ′(q, a_1 a_2) = ∑_{p∈Q} δ′(q, a_1)(p) · δ′(p, a_2), if A = a_1 a_2 with a_1 ∈ (Σ ∪ Σ′), a_2 ∈ (Σ ∪ Σ′)*, q ∈ Q.
With this generalized transition function, a DPRC (or NPRC) M′ inherits the robustness of a DPEC (or NPEC).
Theorem 5
(Robustness). Suppose that M′ is a DPRC or NPRC. Then M′ can take any string of symbols A ∈ π* as an input, i.e., the transition function of a DPRC or NPRC M′ can be generalized to a function δ′ : Q × π* → D(Q).
Proof. 
The proof is similar to the proof of robustness for DPEC and NPEC. □
Definition 14.
The language accepted by a DPRC or NPRC M′ is defined as a function L_{M′} : π* → [0,1]: for any x = a_1 a_2 ⋯ a_n ∈ π*,
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) }, i = 1, 2, …, n−1.
A DPRC M′ also inherits the relationship between the languages accepted by a DPEC and a PA.
Theorem 6
(Semantic generalization). Suppose that DPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) is a semantic generalization of PA M = (Q, Σ, δ, q_0, F) and that the languages accepted by them are the functions L_{M′} and L_M, respectively. The languages accepted by them have the following properties:
1.
for any x ∈ Σ*, L_{M′}(x) = L_M(x),
2.
for any x ∈ π*, there exists x′ ∈ Σ* such that L_{M′}(x) ≤ L_{M′}(x′) = L_M(x′).
Proof. 
If x = ε or x ∈ Σ*, the proof is similar to the proof of semantic generalization for DPEC and NPEC. For any x = a_1 a_2 ⋯ a_n ∈ π* \ Σ*,
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) }, i = 1, 2, …, n−1.
Since sim(a, a^s) ≤ 1, δ′(q, a, p) = δ(q, a^s, p) · sim(a, a^s) ≤ δ(q, a^s, p) for any q, p ∈ Q. Let x′ = a_1^s a_2^s ⋯ a_n^s ∈ Σ*.
L_{M′}(x) = ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1, q_1) · δ′(q_1, a_2, q_2) ⋯ δ′(q_{n−1}, a_n, q) } ≤ ∑_{q∈F, q_i∈Q} { δ′(q_0, a_1^s, q_1) · δ′(q_1, a_2^s, q_2) ⋯ δ′(q_{n−1}, a_n^s, q) } = L_{M′}(x′).
And since x′ = a_1^s a_2^s ⋯ a_n^s ∈ Σ*, by (1) we get L_{M′}(x′) = L_M(x′). □
Theorem 7
(Semantic generalization). Suppose that DPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) and NPRC M″ = (Q, Σ, δ, Σ″, δ″, q_0, F) are semantic generalizations of PA M = (Q, Σ, δ, q_0, F) and that the languages accepted by them are the functions L_{M′}, L_{M″} and L_M, respectively. The languages accepted by them have the following properties:
1.
for any x ∈ Σ*, L_M(x) = L_{M′}(x) = L_{M″}(x),
2.
for any x ∈ π*, there exists x′ ∈ Σ* such that L_{M′}(x) ≤ L_M(x′) and L_{M′}(x) ≤ L_{M″}(x).
Proof. 
If x = ε or x ∈ Σ*, the proof is similar to the proof of semantic generalization for DPEC and NPEC. For any x = a_1 a_2 ⋯ a_n ∈ π* \ Σ* and p ∈ Q,
δ″(q, a, p) = min{ ∑_{a^s∈Σ} δ(q, a^s, p) · sim(a, a^s), 1 } ≥ δ(q, a^s, p) · sim(a, a^s) = δ′(q, a, p).
Therefore, L_{M″}(x) ≥ L_{M′}(x). By the semantic generalization of the DPRC (Theorem 6), there exists x′ ∈ Σ* such that L_{M′}(x) ≤ L_M(x′). □
The above two theorems show the relationship between the languages accepted by an NPRC or DPRC M′ and by the corresponding PA M. Every element of M is included in M′, and the language accepted by M can be taken as a part of the language accepted by M′. Therefore, M′ is a semantic generalization of M.
Example 4
(Bookshop). Let M = (Q, Σ, δ, q_0, F) be a PA. Suppose that M is applied in a recommendation system for a bookshop. When a customer inputs the name of a book and buys it, PA M recommends a list of books, ordered by the probability that the customer will buy each of them as well. Suppose that Q is the set of books, i.e.,
Q = { q_1 = Web Ontology Languages, q_2 = Knowledge Representation, q_3 = Artificial Intelligence, q_4 = Theory of Computation }.
Σ includes all combinations of the key words of every book title in Q, which means Σ = { Web, Ontology, Languages, Web Ontology, Web Ontology Languages, Knowledge, Representation, Knowledge Representation }. δ(q, a, p) is the probability of recommending book p to a customer after this customer has bought q with input a. The value of every δ(q, a, p) can be calculated from historical data, which means the recommendation system is based on customers' purchasing history.
But as a user, when I have to input “Web Ontology Languages”, I prefer to input “WOL”—short for “Web Ontology Languages”—because it is more efficient to input an acronym. When I have to input “Knowledge Representation” or “Artificial Intelligence”, I also prefer to input “KR” or “AI” as well.
On the other hand, from a designer's point of view, the inputs transmitted from users are unpredictable. Users may input acronyms, synonyms or semantically equivalent concepts, or even strange symbols coming from spelling mistakes. Therefore, when defining the input alphabet Σ, we need to collect as many synonyms or equivalent concepts of every element of Σ as we can. But when Σ is very large, it is too hard to collect all the synonyms or equivalent concepts. An even worse case is that, with the development of globalization, users may come from any country and input symbols in any language (English, Chinese, French, and so on). For robustness and universality, it is too hard for a designer to define a PA M able to handle all those cases, because a PA is restricted by the fixed finite input alphabet Σ defined in advance. But an NPRC is competent for this challenge.
Hence, we rebuild PA M as an NPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) as follows. Firstly, define Σ′ = { linguistics, represent } as a set of possible inputs. For every a ∈ Σ′, define δ′(q, a, p) = δ(q, a^s, p) · sim(a, a^s). Notice that, once we have defined the way to compute semantic similarity, we only need to collect familiar equivalents and synonyms as Σ′. When an input a ∉ Σ ∪ Σ′ is transmitted from customers, we add a to Σ′ and find a^s based on the function sim; then we define δ′(q, a, p) = δ(q, a^s, p) · sim(a, a^s), q, p ∈ Q.
For instance, we consider a simple case. In PA M = (Q, Σ, δ, q_0, F), Q = { q = Web Ontology Languages, p = Knowledge Representation }, Σ = { Web, Ontology, Languages, Representation, Knowledge }, q_0 = q, F = Q, and δ is defined as:
δ(q, Web, q) = 0.7, δ(q, Ontology, q) = 0.8, δ(q, Languages, q) = 0.9, δ(q, Representation, q) = 0.1, δ(q, Knowledge, q) = 0.2,
δ(q, Web, p) = 0.3, δ(q, Ontology, p) = 0.2, δ(q, Languages, p) = 0.1, δ(q, Representation, p) = 0.9, δ(q, Knowledge, p) = 0.8,
δ(p, Web, q) = 0.1, δ(p, Ontology, q) = 0.4, δ(p, Languages, q) = 0.5, δ(p, Representation, q) = 0.7, δ(p, Knowledge, q) = 0.9,
δ(p, Web, p) = 0.9, δ(p, Ontology, p) = 0.6, δ(p, Languages, p) = 0.5, δ(p, Representation, p) = 0.3, δ(p, Knowledge, p) = 0.1.
δ(q, Web, p) = 0.3 means that after a customer inputs the key word “Web” when searching for a book and finally buys the book “Web Ontology Languages”, the probability that he will also buy the book “Knowledge Representation” is 0.3; the other values are read similarly.
In order to deal with similar inputs, we rebuild PA M as an NPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) as follows. Suppose that we only consider the additional case in which customers input “linguistics” and “represent” when searching for a book, that is, Σ′ = { linguistics, represent } and δ′(k, a, k′) = δ(k, a, k′) for k, k′ ∈ Q, a ∈ Σ. Suppose
sim(linguistics, Languages) = 0.9, sim(represent, Representation) = 0.7,
and for all other cases a ∈ Σ′, a′ ∈ Σ, sim(a, a′) = 0. For a ∈ Σ′, δ′ is defined as:
δ′(q, linguistics, p) = δ(q, Languages, p) · sim(linguistics, Languages) = 0.9 × 0.1 = 0.09,
δ′(q, represent, p) = δ(q, Representation, p) · sim(represent, Representation) = 0.7 × 0.9 = 0.63,
δ′(q, linguistics, q) = δ(q, Languages, q) · sim(linguistics, Languages) = 0.9 × 0.9 = 0.81,
δ′(q, represent, q) = δ(q, Representation, q) · sim(represent, Representation) = 0.7 × 0.1 = 0.07,
δ′(p, linguistics, q) = δ(p, Languages, q) · sim(linguistics, Languages) = 0.9 × 0.5 = 0.45,
δ′(p, represent, q) = δ(p, Representation, q) · sim(represent, Representation) = 0.7 × 0.7 = 0.49,
δ′(p, linguistics, p) = δ(p, Languages, p) · sim(linguistics, Languages) = 0.9 × 0.5 = 0.45,
δ′(p, represent, p) = δ(p, Representation, p) · sim(represent, Representation) = 0.7 × 0.3 = 0.21.
With the above extended Σ′ and δ′, when an input “linguistics” that is not in Σ is transmitted to the automaton, PA M is invalid but NPRC M′ is still valid.
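A small sketch of the bookshop step for the undefined query “linguistics”: the recorded distribution for the most similar key word “Languages” is scaled by sim(linguistics, Languages) = 0.9, reproducing the values computed above. Only the fragment of the data used in the example appears here, and the names are illustrative.

```python
# Scale the recorded distribution for "Languages" by sim(linguistics, Languages) = 0.9.
delta_q = {'Languages': {'Web Ontology Languages': 0.9, 'Knowledge Representation': 0.1}}
sim_linguistics_languages = 0.9

scaled = {book: prob * sim_linguistics_languages
          for book, prob in delta_q['Languages'].items()}
print({book: round(p, 2) for book, p in scaled.items()})
# {'Web Ontology Languages': 0.81, 'Knowledge Representation': 0.09}
```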
An NPRC can be modified into a traditional probabilistic automaton as follows.
Definition 15
(NPRC). A nondeterministic probabilistic automaton for semantic computing under related concept (NPRC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃ a^s ∈ Σ },
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  ( 1 / ∑_{a^s∈Σ} sim(a, a^s) ) ∑_{a^s∈Σ} δ(q, a^s) · sim(a, a^s), if a ∈ Σ′.
The proof that ∑_{p∈Q} δ′(q, a)(p) = 1 is similar to the NPEC case. The key to proving this is that, for different most similar concepts a^s and a^{s′} of a,
sim(a, a^s) = sim(a, a^{s′}) = max_{b∈Σ} { sim(a, b) }.
After the above modification of the transition function, an NPRC becomes a traditional probabilistic automaton, and obviously the robustness property is still valid. But recall that the main purpose of this paper is to build computational models for semantic computing; therefore, we prefer to modify it into the following form.
Definition 16
(NPRC). A nondeterministic probabilistic automaton for semantic computing under related concept (NPRC) is a seven-tuple M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where
  • M = (Q, Σ, δ, q_0, F) is a PA,
  • Σ′ = { a | a ∈ π, ∃ a^s ∈ Σ }, and N(a^s) is the number of most similar concepts of a,
  • δ′ : Q × (Σ ∪ Σ′) → D(Q) is the generalized transition function of δ:
    δ′(q, a) = δ(q, a), if a ∈ Σ;  (1/N(a^s)) ∑_{a^s∈Σ} δ(q, a^s) · sim(a, a^s), if a ∈ Σ′.
Based on the above form of NPRC, any DPRC and NPRC have the following property.
Theorem 8
(Semantic computing). For any DPRC or NPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) and any a ∈ Σ′, ∑_{p∈Q} δ′(q, a)(p) = sim(a, a^s).
Proof. 
For any DPRC M′ and any a ∈ Σ′,
∑_{p∈Q} δ′(q, a)(p) = ∑_{p∈Q} δ(q, a^s)(p) · sim(a, a^s) = sim(a, a^s) ∑_{p∈Q} δ(q, a^s)(p) = sim(a, a^s) · 1 = sim(a, a^s).
For any NPRC M′ and any a ∈ Σ′,
∑_{p∈Q} δ′(q, a)(p) = ∑_{p∈Q} (1/N(a^s)) ∑_{a^s∈Σ} δ(q, a^s)(p) · sim(a, a^s) = (1/N(a^s)) ∑_{a^s∈Σ} sim(a, a^s) ∑_{p∈Q} δ(q, a^s)(p) = (1/N(a^s)) ∑_{a^s∈Σ} sim(a, a^s) = (1/N(a^s)) · N(a^s) · sim(a, a^s) = sim(a, a^s).
 □
Therefore, in a DPRC or NPRC M′, for any a ∈ Σ′, ∑_{p∈Q} δ′(q, a)(p) turns out to be the semantic similarity of two concepts, sim(a, a^s). It means that the total probability that a DPRC or NPRC enters any of the states on a generalized input equals the semantic similarity between the generalized input and its most similar input.
In this subsection, we have defined different kinds of NPRC, because different cases require different transition functions. In an NPRC M′ based on Definition 13, any generalized input is related to all similar inputs; an NPRC M′ based on Definition 15 is a traditional probabilistic automaton that satisfies the properties of traditional probabilistic automata; and an NPRC M′ based on Definition 16 is a probabilistic automaton model for semantic computing.
In the above subsections, traditional probabilistic automata were generalized for semantic computing under equivalent and related concepts or words, respectively. Compared with traditional probabilistic automata, the generalized probabilistic automata are more robust, because they remain valid when the input transmitted from users is not in the defined input alphabet Σ. They explain semantic computing from computation theory. In the following section, we show this robustness with an application to weather forecasting.

4. Application

The theory of computation has been applied in different areas such as social networks [23], clustering [24], and e-Services [25]. In this paper, we build a machine for weather forecasting [26,27,28,29] based on automata.
For a given location and time, the primary mission of weather forecasting is to predict the conditions of the atmosphere. Human beings have attempted to predict the weather informally for millennia, because we need weather warnings to protect life and property. Forecasts based on temperature and precipitation are essential to agriculture, and we need to be prepared for floods and droughts in advance. In our daily life, the weather forecast is an important factor in planning outdoor activities.
The understanding of atmospheric physics forms the foundation of modern numerical weather prediction [30]. After quantitative data about the current state of the atmosphere at a given place are collected, weather forecasts are made by using meteorology to predict how the atmosphere will change. The basic idea of numerical weather prediction is the following: firstly, sample the state of the fluid at a given location and time; then use the equations of fluid dynamics and thermodynamics to estimate the state of the fluid at some time in the future. Reference [28] states that:
“The weather station variables analyzed were air temperature, RH, 6-m (20 ft) wind speed, precipitation, cloud cover and solar radiation. Note that in keeping with standard fire weather terminology we will refer to the 6-m wind speed as the 20-ft wind speed throughout the rest of the paper.”
Therefore, in this section, we define a factor that impacts weather as a six-tuple
a = (air temperature, RH, 6-m wind speed, precipitation, cloud cover, solar radiation).
A numerical representation of the above six-tuple is called a weather factor, which means that the change of the weather state is based on the changing trend of temperature, RH, 6-m wind speed, precipitation, cloud cover and solar radiation. We want to build a probabilistic automaton PA M = (Q, Σ, δ, q_0, F) for weather forecasting based on finite historical observation data. The intuitive idea is that Q is the set of weather states (sunny, cloudy, rainy, and so on), Σ is the set of weather factors, and δ(q, a)(p) represents the probability of the weather changing from state q to state p with weather factor a.
The challenge is that the input alphabet of a traditional probabilistic automaton is a fixed finite set, but in the natural environment the weather is changing all the time. It is impossible and impractical to define every possible weather factor; we analyze this lack of data with Table 1 below. One compromise solution is to extract a finite set of weather factors from historical observation data, defined as Σ. When a new weather factor a ∉ Σ arrives, we find a most similar weather factor a^s ∈ Σ such that sim(a, a^s) = max_{a′∈Σ} { sim(a, a′) }, and then we treat a as a^s. With this idea, we rebuild PA M as a DPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F) based on historical observation data for weather forecasting.
Firstly, we build a PA M = (Q, Σ, δ, q_0, F). For convenience, the six factors (temperature, RH, 6-m wind speed, precipitation, cloud cover and solar radiation) are normalized to the unit interval I = [0, 1] by the normalization function:
f(x) = (x − min) / (max − min),
where min and max represent the minimum and maximum values taken into account. For instance, if we only consider temperatures between −20 and 50 degrees centigrade, then every temperature t is normalized by:
f(t) = (t + 20) / 70.
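A minimal sketch of this normalization step; the bounds are parameters, and only the temperature range from the text is used in the example call.

```python
# Normalize a raw measurement x in [lo, hi] to the unit interval [0, 1].
def normalize(x, lo, hi):
    return (x - lo) / (hi - lo)

print(normalize(15, -20, 50))   # temperature 15 degrees C -> (15 + 20) / 70 = 0.5
```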
Suppose that Σ is a finite set of weather factors extracted from historical observation data. Because of normalization, every a ∈ Σ is a tuple a = (a_1, a_2, …, a_6), |a_i| ∈ I, where the a_i correspond to the changes of the six factors: temperature, RH, 6-m wind speed, precipitation, cloud cover and solar radiation, respectively. The sign of a_i indicates whether the corresponding factor is increasing (+) or decreasing (−), and the absolute value |a_i| describes the rate of change. For instance, a_1 = −0.2 means the temperature is decreasing at rate 0.2, and a_2 = 0.3 means RH is increasing at rate 0.3. In fact, a ∈ Σ is the changing trend of the weather state. δ is defined as δ(q, a)(p) = n(q, a, p) / |Σ|, where |Σ| is the total number of weather factors in Σ and n(q, a, p) is the number of times that the weather in state q changed to state p with weather factor a in the historical observation data. It means the probability of the weather changing from state q to state p with weather factor a is n(q, a, p) / |Σ|. Let F = Q and let q_0 be the state of the first observation record. Then PA M gives the probability of the weather changing from one state to another based on historical observation data.
Suppose that we have recorded weather factors for Σ every 10 min for 10 years. Roughly, the total number of records in Σ is 10 × 365 × 24 × 6 = 525,600 ≈ 5.3 × 10^5. However, if the six elements of a weather factor a ∈ Σ are accurate to 0.01, every element has 201 possible values between −1.00 and 1.00. Roughly, the total number of possible weather factors is (200)^6 ≈ 6.4 × 10^13. Hence, for a randomly observed weather factor a, the probability of a ∈ Σ is:
P(a ∈ Σ) = (5.3 × 10^5) / (6.4 × 10^13) ≈ 0.
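The counting argument can be checked directly; the quantities below follow the assumptions stated in the text (one record every 10 minutes for 10 years, each of the six elements accurate to 0.01).

```python
# Quick check of the counting argument behind P(a in Sigma) ~ 0.
records = 10 * 365 * 24 * 6                   # one record per 10 min for 10 years: 525,600
values_per_element = 201                      # -1.00, -0.99, ..., 1.00 in steps of 0.01
possible_factors = values_per_element ** 6    # ~6.6e13 (approximated as 200^6 in the text)
print(records, possible_factors, records / possible_factors)   # ratio ~ 8e-9, effectively 0
```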
Table 1 shows the approximate probability of a ∈ Σ in different cases. For example, the first row means that if we record weather factors every 10 minutes for 10 years, we obtain approximately 5.3 × 10^5 records, and the probability that a newly observed weather factor a is in Σ is near 0.
Table 1 shows that even when we collect records over 100 years, it is still not enough to build Σ for PA M. Therefore, PA M is not competent for weather forecasting, because when we observe the current weather factor a, the probability of a ∈ Σ is almost 0. The weather is changing all the time, and there are so many possible weather factors that even recording every minute for 100 years is still far from enough. It seems impossible to guarantee that every observed weather factor a will be in the set Σ, and once the current weather factor a is not in Σ, PA M cannot work.
In order to build an automaton for weather forecasting, we need to redefine PA M as a DPRC M′ = (Q, Σ, δ, Σ′, δ′, q_0, F), where Σ′ is the set of all possible weather factors, that is, Σ′ = E^6, E = [−1, 1]. Recall that, by the robustness of a DPRC, δ′ can be generalized to a function δ′ : Q × π* → D(Q); in this application, δ′ is generalized to δ′ : Q × (E^6)* → D(Q).
The semantic similarity of two weather factors a, b ∈ Σ′ is defined as sim : E^6 × E^6 → E, and here we use the classical cosine function to compute similarity, that is,
sim(a, b) = cos(a, b) = (a · b) / (|a| · |b|) = ∑_{i=1}^{6} a_i b_i / ( √(∑_{i=1}^{6} a_i²) · √(∑_{i=1}^{6} b_i²) ),
where a = (a_1, a_2, …, a_6) and b = (b_1, b_2, …, b_6). The transition function δ′ is defined as
δ′(q, a)(p) = δ(q, a)(p), for a ∈ Σ;  δ(q, a^s)(p) · sim(a, a^s), for a ∈ E^6 \ Σ,
where a^s satisfies sim(a, a^s) = max_{a′∈Σ} { sim(a, a′) }.
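A sketch of the cosine similarity between two six-dimensional weather factors and of the scaled transition of the weather DPRC. Here delta and the set of recorded factors are placeholders for the structures built from the historical data, and the function names are our own.

```python
import math

# Cosine similarity of two six-dimensional weather factors (tuples of floats).
def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# delta'(q, a) = delta(q, a_s) scaled by sim(a, a_s), where a_s is the most similar
# recorded factor; delta maps (state, factor) -> {state: probability}.
def weather_step(q, a, delta, recorded_factors):
    a_s = max(recorded_factors, key=lambda b: cos_sim(a, b))
    s = cos_sim(a, a_s)
    return {p: prob * s for p, prob in delta.get((q, a_s), {}).items()}
```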
The intuitive idea is as follows. Suppose the current weather factor is a and the current weather state is q. When we want to know the probability of the current weather state q changing to state p with weather factor a, we go back and check the records. By the semantic similarity function sim we find the recorded weather factor a^s ∈ Σ most similar to a. The probability of q changing to p with weather factor a^s can be obtained from PA M as δ(q, a^s)(p). We then conclude that the probability of the current state q changing to state p is δ(q, a^s)(p) · sim(a, a^s). Thus, by DPRC M′, we can obtain the probability that the current weather state changes to any weather state with any weather factor.
Table 2 compares the data needed for PA M and DPRC M′. Suppose that we only have the data recorded every minute over the last 10 years. PA M needs about 6.4 × 10^13 records to build Σ: the data we have are far from enough, it is impossible to obtain so much data, and if we had to obtain it the cost would be out of control. But DPRC M′ does not have this problem; the data from the last 10 years are enough.
Another challenge is the precision of the instruments used to measure the weather factors: DPRC M′ can only deal with precise information. Suppose that, when measuring a weather factor, the result obtained from the instruments is a, but because of the instruments' precision the real weather factor may in fact be a + k_a (k_a is the deviation of a). In this case, the weather factor is imprecise information, which cannot be dealt with by DPRC M′ but can be described by a distribution. Suppose W_a = 0.9/a + 0.1/(a + k_a) means that when the measured result is a, the real weather factor is a with probability 0.9 and a + k_a with probability 0.1, owing to the precision of the instruments. In this case, we need to generalize the above DPRC M′ to a model M″ = (Q, Σ, δ, Σ_w, δ″, q_0, F), where Σ_w = { W_a = 0.9/a + 0.1/(a + k_a) | a ∈ Σ′ }. δ″ : Q × Σ_w → D(Q) is defined as:
δ″(q, W_a)(p) = ∑_b W_a(b) · δ′(q, b)(p) = 0.9 δ′(q, a)(p) + 0.1 δ′(q, a + k_a)(p) = 0.9 δ(q, a^s)(p) · sim(a, a^s) + 0.1 δ(q, (a + k_a)^s)(p) · sim(a + k_a, (a + k_a)^s).
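A sketch of this distribution-weighted step: the observed factor is a distribution over candidate factors, and the result mixes the corresponding scaled transitions with the distribution's weights (the 0.9/0.1 split is the one assumed in the text; names are illustrative).

```python
# Mix the generalized transitions according to the measurement distribution W,
# a dict {weather_factor: probability}; 'step' is a function (q, a) -> distribution,
# e.g. a closure over weather_step above.
def noisy_step(q, W, step):
    mixed = {}
    for a, w in W.items():
        for p, prob in step(q, a).items():
            mixed[p] = mixed.get(p, 0.0) + w * prob
    return mixed

# Example: noisy_step(q, {a: 0.9, a_plus_k: 0.1}, lambda q, a: weather_step(q, a, delta, recorded))
```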
Therefore, even though PA M is incompetent for weather forecasting, DPRC M′ is competent. The critical difference is that PA M is restricted to a fixed finite input alphabet Σ, whereas DPRC M′ can take any weather factor as input because of its robustness. Even if the input is not in the previously defined input alphabet Σ, the DPRC can treat it as a similar case and give a similar result based on semantic similarity.

5. Conclusions

Traditional automata become invalid when an undefined input is transmitted from users in applications. In this paper, we generalize probabilistic automata for semantic computing. The generalized automata are more robust, as they can take any input as a legal input. The semantic generalization of our new probabilistic automata is reflected in the languages they accept, i.e., the language accepted by a traditional probabilistic automaton is a part of the language accepted by the corresponding probabilistic automaton under semantic similarity. As shown in the weather forecasting application, when a new weather factor is observed, traditional probabilistic automata become invalid, because the new weather factor hardly ever belongs to the input alphabet. For robustness, we rebuilt traditional probabilistic automata based on semantic similarity. Then, when a probabilistic automaton receives an undefined input from users, even though it is not in the original input alphabet, the automaton is still valid with this undefined input.
Furthermore, probabilistic automata under semantic similarity provide a different understanding of “semantic computing” from the perspective of computation theory. In most traditional semantic computing models, computational implementations of semantic reasoning are based on ontology reasoning, rule reasoning, semantic query, semantic search, and so on, but they are not from the perspective of the formal theory of computation. The semantic computing property of our new probabilistic automata bridges the gap between semantic computing and computation theory and provides a solid foundation for semantic computing. In future work, we plan to generalize the formal models for semantic computing with timed automata, and to apply the formal models for semantic computing to problems such as natural language processing and semantic search.

Author Contributions

Conceptualization, G.H.; methodology, G.H.; software, G.H.; validation, G.H., S.H.W. and L.W.; formal analysis, G.H.; investigation, G.H.; resources, G.H.; writing—original draft preparation, G.H. and L.W.; writing—review and editing, G.H. and S.H.W.; visualization, G.H. and S.H.W.; supervision, Y.J.; funding acquisition, Y.J.

Funding

The work described in this paper is supported by The National Natural Science Foundation of China under Grant Nos. 61772210 and 61272066; Guangdong Province Universities Pearl River Scholar Funded Scheme (2018); The Project of Science and Technology in Guangzhou in China under Grant No. 201807010043; The key project in universities in Guangdong Province of China under Grant No. 2016KZDXM024.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, H.; Bao, H.; Xu, D. Concept vector for semantic similarity and relatedness based on wordnet structure. J. Syst. Softw. 2012, 85, 370–381. [Google Scholar] [CrossRef]
  2. Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R. Feature-based approaches to semantic similarity assessment of concepts using Wikipedia. Inf. Process. Manag. 2015, 51, 215–234. [Google Scholar] [CrossRef]
  3. Jiang, Y.; Bai, W.; Zhang, X.; Hu, J. Wikipedia-based information content and semantic similarity computation. Inf. Process. Manag. 2017, 53, 248–265. [Google Scholar] [CrossRef]
  4. Baader, F.; Calvanese, D.; McGuinness, D.; Patel-Schneider, P.; Nardi, D. The Description Logic Handbook: Theory, Implementation and Applications; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  5. Eiter, T.; Ianni, G.; Lukasiewicz, T.; Schindlauer, R.; Tompits, H. Combining answer set programming with description logics for the semantic web. Artif. Intell. 2008, 172, 1495–1539. [Google Scholar] [CrossRef]
  6. Horrocks, I.; Patel-Schneider, P.F.; Bechhofer, S.; Tsarkov, D. OWL rules: A proposal and prototype implementation. Web Semant. Sci. Serv. Agents World Wide Web 2005, 3, 23–40. [Google Scholar] [CrossRef] [Green Version]
  7. Storey, V.C.; Burton-Jones, A.; Sugumaran, V.; Purao, S. CONQUER: A methodology for context-aware query processing on the World Wide Web. Inf. Syst. Res. 2008, 19, 3–25. [Google Scholar] [CrossRef]
  8. Sohn, M.; Jeong, S.; Kim, J.; Lee, H.J. Crowdsourced healthcare knowledge creation using patients’ health experience-ontologies. Soft Comput. 2017, 21, 5207–5221. [Google Scholar] [CrossRef]
  9. Tappolet, J.; Kiefer, C.; Bernstein, A. Semantic web enabled software analysis. Web Semant. Sci. Serv. Agents World Wide Web 2010, 8, 225–240. [Google Scholar] [CrossRef]
  10. Wang, F.; Hu, L.; Zhou, J.; Hu, J.; Zhao, K. A semantics-based approach to multi-source heterogeneous information fusion in the internet of things. Soft Comput. 2017, 21, 2005–2013. [Google Scholar] [CrossRef]
  11. Jiang, Y. A formal model of semantic computing. Soft Comput. 2018, 1–19. [Google Scholar] [CrossRef]
  12. Aouicha, M.B.; Taieb, M.A.H.; Hamadou, A.B. SISR: System for integrating semantic relatedness and similarity measures. Soft Comput. 2018, 22, 1855–1879. [Google Scholar] [CrossRef]
  13. Li, Y.; Bandar, Z.A.; McLean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 2003, 15, 871–882. [Google Scholar]
  14. Pirró, G. A semantic similarity metric combining features and intrinsic information content. Data Knowl. Eng. 2009, 68, 1289–1308. [Google Scholar] [CrossRef]
  15. Qu, R.; Fang, Y.; Bai, W.; Jiang, Y. Computing semantic similarity based on novel models of semantic representation using Wikipedia. Inf. Process. Manag. 2018, 54, 1002–1021. [Google Scholar] [CrossRef]
  16. Cao, Y.; Xia, L.; Ying, M. Probabilistic automata for computing with words. J. Comput. Syst. Sci. 2013, 79, 152–172. [Google Scholar] [CrossRef]
  17. Qiu, D.; Wang, H. A probabilistic model of computing with words. J. Comput. Syst. Sci. 2005, 70, 176–200. [Google Scholar] [CrossRef] [Green Version]
  18. Shamsizadeh, M.; Zahedi, M. Intuitionistic general fuzzy automata. Soft Comput. 2016, 20, 3505–3519. [Google Scholar] [CrossRef]
  19. Sipser, M. Introduction to the Theory of Computation; Thomson Course Technology: Boston, MA, USA, 2006; Volume 2. [Google Scholar]
  20. Ying, M. A formal model of computing with words. IEEE Trans. Fuzzy Syst. 2002, 10, 640–652. [Google Scholar] [CrossRef]
  21. Zadeh, L.A. Fuzzy Languages and Their Relation to Human and Machine Intelligence. In Fuzzy Sets, Fuzzy Logic, And Fuzzy Systems: Selected Papers by Lotfi A Zadeh; World Scientific: Singapore, 1996; pp. 148–179. [Google Scholar]
  22. Couto, F.M.; Silva, M.J.; Coutinho, P.M. Measuring semantic similarity between Gene Ontology terms. Data Knowl. Eng. 2007, 61, 137–152. [Google Scholar] [CrossRef]
  23. Moradabadi, B.; Meybodi, M.R. Link prediction in weighted social networks using learning automata. Eng. Appl. Artif. Intell. 2018, 70, 16–24. [Google Scholar] [CrossRef]
  24. Mohammad, H.M.; Alireza, R. Learning Automata Clustering. J. Comput. Sci. 2018, 24, 379–388. [Google Scholar]
  25. El-Qurna, J.; Yahyaoui, H.; Almulla, M. A new framework for the verification of service trust behaviors. Knowl. Based Syst. 2017, 121, 7–22. [Google Scholar] [CrossRef]
  26. Sobash, R.A.; Schwartz, C.S.; Romine, G.S.; Fossell, K.R.; Weisman, M.L. Severe weather prediction using storm surrogates from an ensemble forecasting system. Weather Forecast. 2016, 31, 255–271. [Google Scholar] [CrossRef]
  27. Halperin, D.J.; Torn, R.D. Diagnosing conditions associated with large intensity forecast errors in the Hurricane Weather Research and Forecasting (HWRF) model. Weather Forecast. 2018, 33, 239–266. [Google Scholar] [CrossRef]
  28. Page, W.G.; Wagenbrenner, N.S.; Butler, B.W.; Forthofer, J.M.; Gibson, C. An Evaluation of NDFD Weather Forecasts for Wildland Fire Behavior Prediction. Weather Forecast. 2018, 33, 301–315. [Google Scholar] [CrossRef]
  29. Guan, H.; Zhu, Y. Development of verification methodology for extreme weather forecasts. Weather Forecast. 2017, 32, 479–491. [Google Scholar] [CrossRef]
  30. Richardson, L.F. Weather Prediction by Numerical Process; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
Table 1. The approximate probability of a new observed weather factor a ∈ Σ in different cases.
Interval (Year)    Recording Interval (min)    Amount         Probability (Approximate)
10                 10                          5.3 × 10^5     ≈ 0
10                 1                           5.3 × 10^6     ≈ 0
50                 10                          2.6 × 10^6     ≈ 0
50                 1                           2.6 × 10^7     ≈ 0
100                10                          5.3 × 10^6     ≈ 0
100                1                           5.3 × 10^7     ≈ 0
Table 2. Comparison of the data needed for PA M and DPRC M′.
Automaton    How Many Data We Have    Is the Data Enough?    How Many Data Are Needed    Cost
PA M         5.3 × 10^6               no                     6.4 × 10^13                 out of control
DPRC M′      5.3 × 10^6               yes                    5.3 × 10^6                  under control

Share and Cite

MDPI and ACS Style

Huang, G.; Wasti, S.H.; Wei, L.; Jiang, Y. A Formal Model for Semantic Computing Based on Generalized Probabilistic Automata. Entropy 2019, 21, 903. https://doi.org/10.3390/e21090903

AMA Style

Huang G, Wasti SH, Wei L, Jiang Y. A Formal Model for Semantic Computing Based on Generalized Probabilistic Automata. Entropy. 2019; 21(9):903. https://doi.org/10.3390/e21090903

Chicago/Turabian Style

Huang, Guangjian, Shahbaz Hassan Wasti, Lina Wei, and Yuncheng Jiang. 2019. "A Formal Model for Semantic Computing Based on Generalized Probabilistic Automata" Entropy 21, no. 9: 903. https://doi.org/10.3390/e21090903
