A Formal Framework for Integrated Environment Modeling Systems

Integrated Environment Modeling (IEM) has become increasingly important for environmental studies and applications, and IEM systems have been extended from scientific studies to a much wider range of practical applications. The quality and development efficiency of IEM systems have therefore become increasingly critical. Although many advanced and creative technologies have been adopted to improve IEM systems, there is scarcely any formal method for evaluating and improving them. This paper proposes a formal method to improve the quality and development efficiency of IEM systems and makes two primary contributions. First, a formal framework for IEM is proposed. The framework not only reflects the static and dynamic features of IEM but also covers the views of the various roles involved throughout the IEM lifecycle. Second, the formal operational semantics corresponding to the formal model of IEM are derived in detail; they can serve as the basis for automated integrated modeling and for verifying the integrated model.


Introduction
Over the past 30 years, Integrated Environment Modeling (IEM) has become increasingly important for environmental studies, as it provides holistic views and solutions for environmental science coupled with ecology, economy, and social activities [1][2][3][4]. Assisted by advances in computer science and technology, many IEM systems have been developed to provide technical support for IEM.
IEM systems have gone through approximately three stages since IEM emerged. The earliest IEM systems, and even many newer ones, are strictly integrated according to the classification proposed by Voinov and Shugart [5]. In such a system, the models, data, and analysis algorithms are tightly coupled. Although most of these systems are effective and efficient, they are too tightly coupled to be reused for similar research, modeling, and analysis, which reduces our ability to develop new investigative paradigms efficiently. Therefore, the second stage focused on adopting modular technologies in system development. Modularized software models and analysis algorithms were developed, allowing for more loosely coupled systems. When new studies or applications arise, these modules can be extracted from the original systems and integrated into a new system with minimal adaptation. Thus, models and algorithms become more reusable, and both the development efficiency and the quality of the new system improve.
In the last 10-15 years, with the pressing need for environmentally sustainable development, IEM systems have become the basis for applied uses as well as research [6]. This requires IEM systems to be developed more efficiently and with much better quality; hence, IEM systems evolved into the third stage, in which advanced software development technologies have been adopted to meet these requirements. Models and algorithms are developed independently as self-contained software components, which are managed in a model or algorithm library. Each component offers one or more interfaces, through which different models and algorithms are woven together to simulate complex phenomena or processes. The main objective in this stage is to achieve high reusability, loose coupling, and an extensible structure. At the same time, to make integration easier, more efficient, and more correct, some support tools have been specially developed. These tools aid IEM by helping build modular components and may be libraries or frameworks, such as MMS [7], OpenMI [8], OMS3 [9,10], ESMF [11], IWRMS [12], GMS/WMS/SMS BASINS [13], CSDMS [14], and SEAMLESS [15]. These tools aim to relieve integrators of complicated software development techniques, allowing them to concentrate on the interactions and connections between models, which reflect the relationships of the corresponding physical objects in the real world.
IEM systems have continually and promptly adopted many of the latest computing models, architectures, and techniques. Recently, Multi-Agent Systems (MAS) [16,17], grid computing [18,19], Service-Oriented Architecture (SOA) [20,21], and cloud computing [10,22] have also been applied. These techniques make models less dependent on each other and make it possible to integrate many models regardless of where they are deployed.
Nevertheless, the growing number of models in one system also increases its complexity, because more models lead to more interactions. In addition to the dataflow, there is the control flow, which serves to bridge the gaps between the models' semantics and implementations. Thus, the integrated model in IEM system development has a complex and even dynamic structure: the set of active models and the dataflow among them may themselves change during the simulation. Consequently, it becomes difficult to describe the structure of the integrated model precisely.
Yet it is critical to describe the structure of the model correctly in order to reduce ambiguity and ensure that the model runs smoothly despite its dynamic structure. Formal methods for developing computer systems use mathematically based techniques to describe system properties and can provide frameworks to specify, develop, and verify systems in a systematic manner [23]. Research into formal development methods may therefore help IEM as well; in this paper, we focus on using formal methods to enrich IEM systems.
Although there is little direct research on formal descriptions for developing IEMs, many studies have informed our work. Argent [24] provided an overview of IEM that covered requirements, modeling, integration, development, frameworks, practice, and applications. Laniak et al. [6] summarized the landscape of IEM as containing four independent elements (applications, science, technology, and community) and proposed a roadmap for each element. In addition, Argent et al. [25] supplied an evaluation framework that includes conceptual ease, ease of development, support for model development, and run-time features. Voinov et al. [6,26] stressed the role of data in IEM and proposed that it is the data that link different models together and, most importantly, distinguish IEM from pure software composition. Rizzoli [27] focused on how the semantics of model interfaces allow models to be linked together smoothly, while Schmitz et al. [28] focused in detail on cooperation between models with different temporal scales. Kragt et al. [29] emphasized the role of modelers and provided a framework through which environmental modelers can guide more successful integrative research programs. Lloyd et al. [30] used software engineering methods to show that models with lower framework invasiveness tended to be smaller, less complex, and less coupled. Our own integrated modeling practice [31,32] also suggested that technologies are needed to describe integration more precisely.
Inspired by these studies, this paper aims to construct a formal framework to improve the development efficiency of integrated models and IEM systems. Although many formal methods have been designed to tackle the composition of components, services, and agents [33,34], few formal methods describe the dynamics of the integrated model's structure at the application level. Thus, this paper proposes actor-oriented-like semantics that define the static and dynamic attributes of the models themselves, rather than those of the software components, services, or agents that implement them.
Two different but related IEM views are proposed to satisfy different objectives. Graph-based semantics supply the visual semantics for integration, from the perspective of the integrators, and for runtime control, from the perspective of the IEM system. The interaction among models is represented by an intuitive multidigraph with ports, in which dataflows are represented by edges, models by vertices, and variables by ports.
For conceptual ease of reuse, another, unified, view of the model is defined based on an integration algebra. Reuse is an important factor in IEM: integrated models, as well as related knowledge, should be easy to reuse, as this is the goal of integration. To simplify reuse of the integrated model, this paper proposes a hierarchical FSM (HFSM)-based integration algebra for modeling the control flow of IEM, such that a unified view of both the simple model and the integrated model is achieved in our semantic framework.
Some studies have tackled the formal analysis of hierarchical structures, such as analyzing MAS with a hierarchical Petri net [35] and model checking of hierarchical FSMs [36]. This implies that the complexity arising from the unified view of the model is affordable. Analogous studies have been conducted for decades. In the DSP (Digital Signal Processing) area, dataflow (DF) is combined with HFSM to improve the quality of the system (for example, by enabling static scheduling) [37]. In some scientific workflow systems (e.g., Kepler) and simulation frameworks (e.g., Ptolemy, which is Kepler's kernel), models and the dataflow among them are also represented by composite actors [38].
Distinct from these existing methods, the main contribution of this paper is its emphasis on the dynamics of the model's structure and the reusability of the integrated model. The former leads to more scalable models, while the latter leads to a unified representation of the model and also decouples the model from other processes (such as input and output in Ptolemy).
In the following sections, a use case is first given to explain the problems encountered by modelers in IEM practice. The unified view of IEM is then proposed, and the corresponding operational semantics of integration are constructed. The multidigraphic view is then given, and its consistency with the unified view is discussed. Finally, several conclusions of our research are drawn.

A Use Case as an Example of IEM
In this section, we give a use case as an example of IEM. In recent research using OMS3, Peña-Haro et al. [31] and Zhang et al. [32] integrated WOFOST, a crop growth and production model, with HYDRUS-1D, an unsaturated flow model, and MODFLOW, a saturated flow model, to simulate the interaction between crop growth and unsaturated-saturated flow processes. In the integration, the MODFLOW domain is divided into several zones according to similarities in crops, soil properties, and groundwater depth. Only one WOFOST/HYDRUS-1D profile is assigned to each zone, as illustrated in Figure 1. In this integrated model, WOFOST provides HYDRUS-1D with the Leaf Area Index (LAI), root depth (RD), and crop height (CH), while HYDRUS-1D provides WOFOST with the water stress factor (the ratio of actual transpiration (vRoot) to potential transpiration (rRoot)). Meanwhile, HYDRUS-1D provides MODFLOW with recharge fluxes (vBot or recharge) at the water table, while MODFLOW provides HYDRUS-1D with the pressure head value (Hb or H), which is used as the bottom boundary condition in HYDRUS-1D. All three models use a time step of one day; that is, in each step the three models simulate one day's evolution of crop growth, unsaturated flow, and groundwater flow, respectively. The relationship is illustrated in Figure 2.
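To make the coupling concrete, the exchanges listed above can be written down as the kind of port-labelled multidigraph the framework uses, with models as vertices, variables as ports, and dataflows as edges. The following Python sketch is illustrative only: the tuple layout, the helper names, and the `stress` port name for the water stress factor are assumptions, not part of OMS3 or of the cited studies.

```python
from collections import Counter

# Each edge is one dataflow: (source model, source port, target model, target port).
# Model and variable names follow the use case; the `stress` port name is hypothetical.
EDGES = [
    ("WOFOST", "LAI", "HYDRUS-1D", "LAI"),
    ("WOFOST", "RD", "HYDRUS-1D", "RD"),
    ("WOFOST", "CH", "HYDRUS-1D", "CH"),
    ("HYDRUS-1D", "stress", "WOFOST", "stress"),  # vRoot / rRoot
    ("HYDRUS-1D", "vBot", "MODFLOW", "recharge"),
    ("MODFLOW", "H", "HYDRUS-1D", "Hb"),
]


def edge_multiplicity(edges):
    """Number of parallel edges between each ordered pair of models."""
    return Counter((src, dst) for src, _, dst, _ in edges)


def input_ports(edges, model):
    """Ports of `model` that are fed by another model, in sorted order."""
    return sorted({dst_port for _, _, dst, dst_port in edges if dst == model})


def active_edges(edges, active_models):
    """Dataflows that survive when only `active_models` are running."""
    return [e for e in edges if e[0] in active_models and e[2] in active_models]


if __name__ == "__main__":
    print(edge_multiplicity(EDGES)[("WOFOST", "HYDRUS-1D")])  # 3 parallel edges
    print(input_ports(EDGES, "HYDRUS-1D"))
    # Before sowing, WOFOST is inactive and its dataflows disappear.
    print(len(active_edges(EDGES, {"HYDRUS-1D", "MODFLOW"})))  # 2
```

The last call mirrors the first difficulty discussed in this section: when WOFOST is inactive, only the vBot and H dataflows remain, so the structure of the graph itself changes during the simulation.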
In practice, we encountered several difficulties. The first is that WOFOST has a different active period from HYDRUS-1D and MODFLOW. When there is no crop (before sowing and after maturation in Figure 2), WOFOST is deactivated and the input dataflow of HYDRUS-1D from WOFOST disappears.
Should we integrate only HYDRUS-1D and MODFLOW, or WOFOST, HYDRUS-1D, and MODFLOW together? If we adopt the former, WOFOST has to be integrated in the "after sowing" period, which requires additional integration effort. Moreover, the integrated model is not consistent over the whole simulation. If we adopt the latter, how should the simulation be handled when there is no crop? In our earlier study, we had to add extra code to examine the sowing and growth state of the crop and decide whether WOFOST should be triggered. At the same time, both the structure and the dataflow of the model vary during the simulation, which makes the integrated model difficult to reuse.
The second difficulty is that the conceptually integrated model is difficult to reuse during instantiation. Different spatial discretization schemas (such as Schema 1 with 2 zones and Schema 2 with 5 zones, shown in Figure 1) require different model instances and different dataflows, so much of the integration code has to be repeated for each schema. Other problems arise in other contexts, such as how to integrate models with different spatial dimensions or different time steps.

Unified View of the Model
The IEM support tool and runtime environment should adapt to the different views of different users. The platform provides the unified form of the model to the end user to make the model easier to use. Each model's static features (the parameters, input variables, and output variables involved in the whole simulation) are visible to users. However, all sub-models of the integrated model (i.e., the blocks used to construct the complex model) are managed by the runtime environment, including data transfer, model scheduling, and process control, which is transparent to the end user. During the simulation, the integrated model's parameters, input variables, and output variables may vary across time steps. For example, in the above WOFOST, HYDRUS-1D, and MODFLOW integrated model, WOFOST's parameters, input variables, and output variables are not needed when the model is in the "before sowing" and "after maturation" phases, so the dataflow related to the model changes too. These variations, i.e., the model's dynamic features, should be represented carefully so that the runtime environment can detect and manage them. Although FSMs, Petri nets, and other mechanisms [35][36][37] can all model dynamic features, we chose FSMs to represent them in our framework because of their simplicity and efficiency in dataflow management [37].
In the following section, a formal framework to support these mechanisms is discussed in detail.

Formal Definition of the Model
In our framework, a unified view represents all models, whether basic or integrated. Here, a basic model is one that is not integrated from other models. Based on the concept of the actor, the model is formally and abstractly defined as a 5-tuple m = <P, I, O, F, A>, where P is the set of parameters, I is the set of input variables, O is the set of output variables, F is the set of functions of the model, and A is the FSM of the model. Each fe ∈ F maps the values of Pe and Ie to the values of Oe, where Pe ⊆ P, Ie ⊆ I, and Oe ⊆ O contain n', l', and m' elements, respectively, with n', l', m' ∈ N and l', m' ≥ 0. Pe, Ie, Oe, and fe are the effective parameter set, effective input variable set, effective output variable set, and effective function under a specific state, respectively.
When the model is in a specific state (such as before sowing in the example), its effective parameter set, effective input variable set, effective output variable set, and effective function remain the same. When new inputs arrive and all satisfy the constraints, the model is fired. Once Pe, Ie, Oe, or fe changes (such as from before sowing to after sowing), the model changes to a new state; these changes can be represented by a finite state machine (FSM).
A is the FSM of the model, defined as A = <S, s0, T>, where S is the finite set of states of the model and T ⊆ S × S is the state transition relation. When the parameters Pe and input variables Ie are valuated, the model transits to the new state consistent with Pe and Ie, and fe is fired to obtain Oe. In the framework, the initial state s0 = <Ø, Ø, Ø, fØ> ∈ S means that the model has been activated, where fØ denotes that there is nothing to do except await new input.
In this definition, P, I, O, and F represent the static structural properties of the model, while the FSM A describes its dynamic characteristics, i.e., the transitions of the model's structure. We use the block diagram shown in Figure 3 to represent the model. In the figure, the model m = <P, I, O, F, A>, A = <S, s0, T>, S = {s0, s1, s2}, and T = {(s0, s1), (s0, s2), (s1, s1), (s2, s2), (s1, s2), (s2, s1)}.
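A minimal Python sketch of the 5-tuple m = <P, I, O, F, A> may help fix the ideas. It assumes that T is stored as a set of state pairs, that each state carries its effective sets and effective function, and that a model fires only along a transition in T with all effective parameters and inputs valuated. The crop-like model, its variable names (`rue`, `radiation`), and the toy growth rule are hypothetical; only the transition set is copied from the Figure 3 example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set, Tuple

Valuation = Dict[str, float]
EffectiveFn = Callable[[Valuation, Valuation], Valuation]


@dataclass(frozen=True)
class State:
    """An FSM state <Pe, Ie, Oe, fe>: effective sets plus the effective function."""
    Pe: FrozenSet[str]
    Ie: FrozenSet[str]
    Oe: FrozenSet[str]
    fe: EffectiveFn


class Model:
    """m = <P, I, O, F, A> with A = <S, s0, T>; T is kept as a set of state pairs."""

    def __init__(self, P: Set[str], I: Set[str], O: Set[str],
                 S: Dict[str, State], s0: str, T: Set[Tuple[str, str]]):
        for st in S.values():  # effective sets must lie inside the static structure
            if not (st.Pe <= P and st.Ie <= I and st.Oe <= O):
                raise ValueError("effective sets must be subsets of P, I and O")
        self.P, self.I, self.O = P, I, O
        self.F = [st.fe for st in S.values()]  # F collects the effective functions
        self.S, self.s0, self.T = S, s0, T
        self.current = s0

    def fire(self, target: str, params: Valuation, inputs: Valuation) -> Valuation:
        """Transit to `target`, then fire its fe on the valuated Pe and Ie to obtain Oe."""
        if (self.current, target) not in self.T:
            raise ValueError(f"no transition {self.current} -> {target}")
        st = self.S[target]
        missing = (st.Pe - set(params)) | (st.Ie - set(inputs))
        if missing:
            raise ValueError(f"unvaluated variables: {sorted(missing)}")
        self.current = target
        out = st.fe({k: params[k] for k in st.Pe}, {k: inputs[k] for k in st.Ie})
        if set(out) != st.Oe:
            raise ValueError("fe must produce exactly the effective outputs Oe")
        return out


def f_empty(params: Valuation, inputs: Valuation) -> Valuation:
    """f_Ø: nothing to do except await new input."""
    return {}


NONE: FrozenSet[str] = frozenset()
crop = Model(
    P={"rue"}, I={"radiation"}, O={"LAI"},
    S={
        "s0": State(NONE, NONE, NONE, f_empty),  # activated, awaiting input
        "s1": State(NONE, NONE, NONE, f_empty),  # e.g. before sowing
        "s2": State(frozenset({"rue"}), frozenset({"radiation"}), frozenset({"LAI"}),
                    lambda p, i: {"LAI": p["rue"] * i["radiation"]}),  # toy rule
    },
    s0="s0",
    T={("s0", "s1"), ("s0", "s2"), ("s1", "s1"),
       ("s2", "s2"), ("s1", "s2"), ("s2", "s1")},
)
```

Calling `crop.fire("s1", {}, {})` and then `crop.fire("s2", {"rue": 0.5}, {"radiation": 4.0})` returns `{}` and then `{"LAI": 2.0}`; a subsequent attempt to move to `s0` raises `ValueError`, because (s2, s0) is not in T.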

Algebra for Integrating Models
To obtain a unified view of the model in integrated modeling, we present an algebra similar to that of Hamadi and Benatallah [33] to model the control flow of integrated modeling (IM); it allows new models to be created using existing ones as building blocks. With this algebra, each new model also satisfies the unified definition of the model.
We describe below the syntax and informal semantics of the model algebra operators. The constructs are chosen to support both common and advanced model integration. The set of models can be defined by the following grammar in BNF-like notation:
M ::= ε | X | M → M | M ← M | M ♦ M | M ↑ M | μM
ε represents an empty model, i.e., a model that performs no operation. X represents a model constant, used as a basic model in this context. M1 → M2 represents an integrated model that performs model M1 followed by model M2, i.e., → is the sequence operator.
M1 ← M2 represents an integrated model in which the next stage of model M1 must be performed after the current stage of model M2, i.e., ← is the feedback operator.
M1 ♦ M2 represents an integrated model that behaves as either model M1 or model M2. Once one of them executes its first operation, the other model is discarded; thus, ♦ is the select operator.
M1 ↑ M2 represents a model in which M1 and M2 can execute simultaneously. There is no interaction between M1 and M2, i.e., ↑ is the parallel operator.
μM represents a model that performs model M a certain number of times, i.e., μ is the iterate operator.
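The grammar can be mirrored directly by an abstract syntax tree, which is one way a tool could store integrated models built with this algebra. The Python sketch below is an assumption about representation, not part of the framework's definition: the class names are hypothetical, binary terms are printed fully parenthesized, and `basic_models` collects the model constants X that an expression refers to.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Empty:
    """ε: the empty model."""


@dataclass(frozen=True)
class Atom:
    """X: a basic model constant."""
    name: str


@dataclass(frozen=True)
class Binary:
    """M1 op M2 for op in {→, ←, ♦, ↑}."""
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Iter:
    """μM: iterate M."""
    body: "Expr"


Expr = Union[Empty, Atom, Binary, Iter]
OPERATORS = {"→", "←", "♦", "↑"}


def show(e: Expr) -> str:
    """Print an expression with every binary term parenthesized."""
    if isinstance(e, Empty):
        return "ε"
    if isinstance(e, Atom):
        return e.name
    if isinstance(e, Iter):
        return f"μ({show(e.body)})"
    if e.op not in OPERATORS:
        raise ValueError(f"unknown operator {e.op!r}")
    return f"({show(e.left)} {e.op} {show(e.right)})"


def basic_models(e: Expr) -> Set[str]:
    """All model constants X occurring in an expression."""
    if isinstance(e, Atom):
        return {e.name}
    if isinstance(e, Binary):
        return basic_models(e.left) | basic_models(e.right)
    if isinstance(e, Iter):
        return basic_models(e.body)
    return set()


# (WOFOST ♦ ε) → HYDRUS-1D → MODFLOW, following the use case
example = Binary("→", Binary("→", Binary("♦", Atom("WOFOST"), Empty()),
                             Atom("HYDRUS-1D")), Atom("MODFLOW"))
```

Here `show(example)` yields `"(((WOFOST ♦ ε) → HYDRUS-1D) → MODFLOW)"`. Because every constructor produces another `Expr`, an integrated model can itself be used as a building block, which is the closure property the unified view relies on.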

Formal Operational Semantics of Integrated Modeling
In the framework, formal operational semantics are presented to represent and implement the algebraic operators for integration.

Connector
The models interact with each other through the dataflow among them. However, because models may be developed in different scenarios, for different purposes, by different developers, there can be gaps between models that prevent them from connecting smoothly. These barriers mainly involve differences in dimensions, scales, and even semantics between the variables to be connected during integration.
The framework defines the connector to 'glue' models together and fill these gaps. A connector is a simple computing unit with a single function. It collects data from the output variables of a model, computes a new datum from them, and sends it to an input variable of another model or back to the model that produced the output.
Each connector can be represented as a 5-tuple, similar to the basic model: c = <P, I, O, F, A>. Compared with the basic model's definition, the only difference is that O = {o}, i.e., a connector always has exactly one output variable. To distinguish it from a model, we use a box with a triangle to represent the connector in the block diagram, as seen in Figure 5. Sometimes the value of an output variable of the first model can be transferred directly to an input variable of the second model; the connector is then simpler, i.e., P = Ø, I = {i}, and O = {o}, with its function passing the value of i to o unchanged.
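As an illustration, the water stress factor in the use case, the ratio of actual transpiration (vRoot) to potential transpiration (rRoot), is the kind of value a connector could compute, and a direct transfer such as vBot to recharge corresponds to the simpler connector with P = Ø and I = {i}. The Python sketch below assumes this reading; the class and the `identity` helper are hypothetical, the FSM part of the tuple is omitted, and zero potential transpiration is not handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Valuation = Dict[str, float]


@dataclass(frozen=True)
class Connector:
    """Connector <P, I, {o}, F, A>: one output variable; the FSM A is omitted here."""
    P: Dict[str, float]
    I: FrozenSet[str]
    o: str
    f: Callable[[Dict[str, float], Valuation], float]

    def __call__(self, outputs: Valuation) -> Valuation:
        """Collect the needed model outputs and emit the single new datum."""
        missing = self.I - set(outputs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        return {self.o: self.f(self.P, {k: outputs[k] for k in self.I})}


# Water stress factor = actual / potential transpiration (vRoot / rRoot).
stress = Connector({}, frozenset({"vRoot", "rRoot"}), "stress",
                   lambda p, x: x["vRoot"] / x["rRoot"])


def identity(src: str, dst: str) -> Connector:
    """Simplest connector: P = Ø, I = {src}, O = {dst}, value passed unchanged."""
    return Connector({}, frozenset({src}), dst, lambda p, x: x[src])
```

For example, `stress({"vRoot": 3.0, "rRoot": 4.0})` returns `{"stress": 0.75}`, and `identity("vBot", "recharge")` ignores any output it does not need.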

Parallel
The parallel operator indicates that two or more models can run in parallel in the same cycle. Parallel models do not depend on each other through the dataflow. However, if one model fails, the integrated model fails too, because the models share the same control. For example, in Schema 1 of Figure 1, HYDRUS 1 and HYDRUS 2 should be integrated with the parallel operator.
Given two models m1 = <P1, I1, O1, F1, A1> and m2 = <P2, I2, O2, F2, A2>, where A1 = <S1, s10, T1> and A2 = <S2, s20, T2>, let m = m1 ↑ m2 = <P, I, O, F, A>. The integration of the function can be represented with F = F1 × F2 and f = (f1 ↑ f2) ∈ F, which means that f1 ∈ F1 and f2 ∈ F2 execute simultaneously. In accordance with FSM, the initial state of the model is s0 = <s10, s20>, where s10 and s20 are the initial states of m1 and m2, respectively, which means that m1 and m2 are in their initial states simultaneously. Furthermore, S = {s0} ∪ (S1\{s10} × S2\{s20}), where S\{s} = S − {s}. The transition T is likewise the combination of T1 and T2.
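The state set S = {s0} ∪ (S1\{s10} × S2\{s20}) can be computed mechanically. The text only says that T combines T1 and T2; the Python sketch below assumes one natural reading, a synchronous product in which both sub-models move in the same step and only pairs lying in S are kept. The function name and the small example FSMs are hypothetical.

```python
from itertools import product
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, Hashable]


def parallel_fsm(S1: Set[Hashable], s10: Hashable, T1: Set[Transition],
                 S2: Set[Hashable], s20: Hashable, T2: Set[Transition]):
    """A = A1 ↑ A2 under the synchronous-product assumption."""
    s0 = (s10, s20)
    S = {s0} | set(product(S1 - {s10}, S2 - {s20}))
    # Both sub-models take one transition together; keep only moves inside S.
    T = {((a, b), (a2, b2))
         for (a, a2), (b, b2) in product(T1, T2)
         if (a, b) in S and (a2, b2) in S}
    return S, s0, T


S, s0, T = parallel_fsm(
    {"s10", "s11"}, "s10", {("s10", "s11"), ("s11", "s11")},
    {"s20", "s21", "s22"}, "s20",
    {("s20", "s21"), ("s21", "s22"), ("s22", "s21")},
)
```

With these two FSMs, S contains s0 and the pairs (s11, s21) and (s11, s22), and T contains three transitions: from s0 to (s11, s21), and between (s11, s21) and (s11, s22) in both directions. Mixed pairs such as (s10, s21) never appear, because they are excluded from S.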

Sequence
The sequence operator is the most common operation in integration. It indicates that m1 must finish its action before m2 can be activated in the same cycle, which is generally required by the dataflow. If the data of one or more input variables of m2 come from the output variables of m1, then there is a sequence m1→m2. Usually, there are one or more connectors between m1 and m2; we denote this as m1→(c1↑…↑ck)→m2, where c1, …, ck are parallel.

Suppose there are two models m1 = <P1, I1, O1, F1, A1> and m2 = <P2, I2, O2, F2, A2>, with A1 = <S1, s10, T1> and A2 = <S2, s20, T2>, connected through a connector c = <Pc, Ic, Oc, Fc, Ac> with Ac = <Sc, sc0, Tc>, and let m = m1→c→m2 = <P, I, O, F, A>. The integration of the function can be represented with F = F1 × Fc × F2 and f = (f1→fc→f2) ∈ F, which means that f1 ∈ F1, fc ∈ Fc, and f2 ∈ F2 are performed in sequence. According to FSM, A = A1 × Ac × A2, and the initial state of the model is s0 = <s10, sc0, s20>, which means that m1, c, and m2 are in their initial states simultaneously. In addition, S = {s0} ∪ (S1\{s10} × Sc\{sc0} × S2\{s20}). The transition T is likewise the combination of T1, Tc, and T2. The sequence operation is illustrated in Figure 7.
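Under the assumption that each function maps a dictionary of valuated variables to a dictionary of outputs, f = (f1→fc→f2) and the state set S = {s0} ∪ (S1\{s10} × Sc\{sc0} × S2\{s20}) can be sketched in Python as follows. The toy HYDRUS-1D-like and MODFLOW-like functions and their coefficients are hypothetical; following the example in the text, the integrated outputs are taken as the union of the models' outputs.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Set, Tuple

Valuation = Dict[str, float]
Fn = Callable[[Valuation], Valuation]


def sequence(f1: Fn, fc: Fn, f2: Fn) -> Fn:
    """f = f1 → fc → f2: fc turns f1's outputs into inputs of f2."""
    def f(external: Valuation) -> Valuation:
        o1 = f1(external)
        oc = fc(o1)                  # connector output, not a model output
        o2 = f2({**external, **oc})
        return {**o1, **o2}          # union of the models' output variables
    return f


def sequence_states(*fsms: Tuple[Set[Hashable], Hashable]) -> Set[tuple]:
    r"""S = {s0} ∪ (S1\{s10} × ... × Sk\{sk0}) for components given as (S, s_init)."""
    s0 = tuple(init for _, init in fsms)
    return {s0} | set(product(*(S - {init} for S, init in fsms)))


def toy_hydrus(x: Valuation) -> Valuation:
    """Hypothetical unsaturated-zone step producing a recharge flux."""
    return {"vBot": 0.5 * x["rain"]}


def to_recharge(o: Valuation) -> Valuation:
    """Identity-like connector: vBot becomes MODFLOW's recharge input."""
    return {"recharge": o["vBot"]}


def toy_modflow(x: Valuation) -> Valuation:
    """Hypothetical groundwater step: head rises by the recharge."""
    return {"H": x["H_prev"] + x["recharge"]}


step = sequence(toy_hydrus, to_recharge, toy_modflow)
```

For instance, `step({"rain": 2.0, "H_prev": 5.0})` returns `{"vBot": 1.0, "H": 6.0}`. With three two-state components, `sequence_states` yields exactly two states, matching the two states s0 = (s10, s20, s30) and s1 = (s11, s21, s31) of the WOFOST → HYDRUS-1D → MODFLOW example.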
ISPRS Int. J. Geo-Inf. 2017, 6, x FOR PEER REVIEW 10 of 24
The sequence operator is associative, so m = m1→m2→m3 = (m1→m2)→m3 = m1→(m2→m3). In our example model, the relations from WOFOST to HYDRUS-1D and from HYDRUS-1D to MODFLOW are sequences, which can be represented as WOFOST → HYDRUS-1D and HYDRUS-1D → MODFLOW, respectively. Together, these are equal to WOFOST → HYDRUS-1D → MODFLOW. The parameters of the integrated model are the union of the three sub-models' parameters. The input variables are the union of the three sub-models' input variables, except LAI, RD, and CH (three of HYDRUS-1D's input variables) and Hb (HYDRUS-1D's bottom boundary condition, supplied by MODFLOW). The output variables are the union of the three models' output variables, and the function of the integrated model is the combination of the three sub-models' functions. The model has only two states: s0 = (s10, s20, s30) (the initial state) and s1 = (s11, s21, s31).
Sometimes, there may be no explicit dataflow between two models. In a cycle, the successor models can begin to run if and only if the predecessor models have finished. The empty connector cε, which is similar to the empty model, can be used to express such sequences.

Feedback
A feedback operation occurs across two cycles of the model. It may act on one model or on two different models, denoted m1←m1 or m1←m2, respectively. It implies that data are transferred, through a connector c, from an output variable of the latter model in the previous cycle to an input variable of the former model in the next cycle. The second type of feedback operation cannot exist on its own; generally, it is based on a sequence or parallel operation within a single cycle. Additionally, m1 and m2 can be understood as different components of a model m, such that m1←m2 is equivalent to m←m. Thus, we need only discuss the semantics of m1←m1 or (c→m1)←m1. Suppose m = (c→m1)←m1, with m = <P, I, O, F, A> and A = <S, s0, T>.
We can thus obtain the following conclusions.
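The essence of (c→m1)←m1, an output of m1 in cycle k reaching an input of m1 in cycle k+1 through c, can be sketched as a loop over cycles. This Python fragment is illustrative only: the head-like variable names, the toy update rule, and the initial value of the fed-back input are assumptions, loosely modeled on the MODFLOW pressure head that serves as HYDRUS-1D's bottom boundary condition in the use case.

```python
from typing import Callable, Dict, Iterable, List

Valuation = Dict[str, float]


def run_with_feedback(m1: Callable[[Valuation], Valuation],
                      c: Callable[[Valuation], Valuation],
                      initial_feedback: Valuation,
                      external: Iterable[Valuation]) -> List[Valuation]:
    """(c → m1) ← m1: c maps m1's outputs in cycle k to m1's inputs in cycle k+1."""
    fed_back = dict(initial_feedback)   # value used in the first cycle
    history = []
    for ext in external:                # one iteration per cycle
        out = m1({**ext, **fed_back})
        history.append(out)
        fed_back = c(out)               # becomes available only in the next cycle
    return history


def toy_head(x: Valuation) -> Valuation:
    """Hypothetical update: new head = previous head + recharge."""
    return {"H": x["Hb"] + x["recharge"]}


def head_to_boundary(out: Valuation) -> Valuation:
    """Connector c: pass H on as the next cycle's Hb."""
    return {"Hb": out["H"]}


history = run_with_feedback(toy_head, head_to_boundary, {"Hb": 5.0},
                            [{"recharge": 1.0}, {"recharge": 0.5}, {"recharge": -0.5}])
```

Over the three cycles, the head takes the values 6.0, 6.5, and 6.0; each value depends on the previous cycle's output, which is exactly what distinguishes feedback from a sequence within one cycle.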

Select
The Select operation makes it possible to choose between different sub-models or sub-model groups. These sub-models or sub-model groups may have similar functions but different implementation methods, or they may face different initial or boundary conditions. For example, in the integration of HYDRUS and WOFOST, WOFOST does not need to run before the crop sprouts or after it is harvested. Suppose m1 = <P1, I1, O1, F1, A1>, m2 = <P2, I2, O2, F2, A2>, A1 = <S1, s10, T1>, and A2 = <S2, s20, T2>. If m = <P, I, O, F, A> and m = m1♦m2, then m1 is performed if the guard expression g1 is satisfied, and m2 is performed if g2 is satisfied. In the Select operation, each guard expression has a parameter set Pg, an input variable set Ig, and an output variable set Og. For the FSM A, S = ({s10} × S2) ∪ (S1 × {s20}) and s0 = (s10, s20); in the transition set T, T1i0 and T2i0 denote transitions that do not exist in the original models, namely (s1i, s10) for model m1 and (s2i, s20) for model m2, where s1i and s2i stand for arbitrary states of m1 and m2, respectively. Suppose si = <Pe, Ie, Oe, fe>. If si = (s1j, s20), then Pe = Pe1j ∪ Pg1 ∪ Pg2; the case si = (s10, s2j) is obtained similarly. The select operation is illustrated in Figure 9. In the example, WOFOST is integrated with the select operation WOFOST ♦ ε.
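The guarded choice m1♦m2, and the WOFOST ♦ ε pattern in particular, can be sketched in Python as follows. This is a minimal sketch assuming models and guards are plain functions over a dictionary of variables; `select`, `empty_model`, `toy_crop`, and the day-120 to day-270 growing window are hypothetical stand-ins, not WOFOST or the authors' code.

```python
from typing import Callable, Dict

Vars = Dict[str, float]
Model = Callable[[Vars], Vars]
Guard = Callable[[Vars], bool]


def empty_model(v: Vars) -> Vars:
    """Stand-in for the empty model ε: it computes nothing."""
    return {}


def select(g1: Guard, m1: Model, g2: Guard, m2: Model) -> Model:
    """Build m1 ♦ m2: run m1 when g1 holds, otherwise m2 when g2 holds."""
    def m(v: Vars) -> Vars:
        if g1(v):
            return m1(v)
        if g2(v):
            return m2(v)
        raise ValueError("no guard expression is satisfied")
    return m


def toy_crop(v: Vars) -> Vars:
    # Hypothetical crop-growth stand-in; not WOFOST itself.
    return {"biomass_gain": 0.5 * v["radiation"]}


def in_season(v: Vars) -> bool:
    # Hypothetical growing window between sprouting (day 120) and harvest (day 270).
    return 120 <= v["day"] <= 270


crop_or_idle = select(in_season, toy_crop, lambda v: not in_season(v), empty_model)
print(crop_or_idle({"day": 100, "radiation": 10.0}))  # {}
print(crop_or_idle({"day": 150, "radiation": 10.0}))  # {'biomass_gain': 5.0}
```

Checking g1 before g2 is a design choice of the sketch; with mutually exclusive guards, as here, the order does not matter.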

Iterate
In environmental simulations, different time steps must be supported, ranging from hourly simulations (e.g., of a fungal infestation) up to several decades for sustainability studies.
Although most models work iteratively, the framework defines the iterate operation only for the case in which one model needs to perform many cycles while the other models being integrated perform only one cycle. The typical scenario is the integration of models with different time steps: the model with the shorter time step iterates many times while the other model executes once. In the integration algebra, we use μm to denote the iteration of a model m.
Consider the model m1 = <P1, I1, O1, F1, A1> with A1 = <S1, s10, T1>. If m = <P, I, O, F, A> and m = μm1, meaning that m1 is performed until the guard expression g is no longer satisfied (g has parameter set Pg, input variable set Ig, and output variable set Og), then P = Pg ∪ P1. For the FSM A = <S, s0, T>, we define s0 = s10. To simplify integration with other operations, the running states of the model are divided into three types: the first running states sf ∈ Sf, where sf = <Pef, Ief, Oef, fef>; the iterative running states Sit; and the end-iteration state se. Thus, S = {s0, se} ∪ Sf ∪ Sit, and the transitions associated with these states are Tf, Tit, and Te, respectively. If (s10, s1i) ∈ Tf, then (s10, s1i) ∈ T1. Let s1i = <Pe1i, Ie1i, Oe1i, fe1i>; then Pef = Pe1i ∪ Pg, Ief = Ie1i ∪ Ig, Oef = Oe1i ∪ Og, and fef = g × fe1i. For each (s1i, s1j) ∈ T1 with i ≠ 0, (s1i, s1j) ∈ Tit. Let sit = <Peit, Ieit, Oeit, feit>; then Peit = Pe1j ∪ Pg, Ieit = Ie1j ∪ Ig, Oeit = Oe1j ∪ Og, and feit = g × fe1j. For the end-iteration state, let se = <Pee, Iee, Oee, fee>, with Pee = Pg, Iee = Ig, Oee = Og, and fee = g. Then, Te = {(si, se) | si ∈ Sf ∪ Sit}. Figure 10 illustrates the iterate operation of the model.
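The typical use of μm1, where a short-time-step model runs many times inside one step of a coarser model, can be sketched as a guarded loop. This minimal Python sketch assumes that the guard g is evaluated before each pass (so m1 may run zero times) and that each pass's outputs overwrite the matching variables for the next pass. `iterate` and `hourly_step` are hypothetical names, and the hourly-to-daily setting is only an illustration.

```python
from typing import Callable, Dict

Vars = Dict[str, float]


def iterate(m1: Callable[[Vars], Vars], g: Callable[[Vars], bool]) -> Callable[[Vars], Vars]:
    """Build μm1: keep performing m1 while the guard g is satisfied."""
    def m(v: Vars) -> Vars:
        state = dict(v)
        while g(state):
            # Outputs of one pass overwrite the matching variables for the next pass.
            state.update(m1(state))
        return state
    return m


def hourly_step(v: Vars) -> Vars:
    # Hypothetical hourly model: advance the clock and accumulate a flux.
    return {"hour": v["hour"] + 1, "flux_total": v["flux_total"] + v["flux_rate"]}


daily = iterate(hourly_step, lambda v: v["hour"] < 24)
result = daily({"hour": 0, "flux_total": 0.0, "flux_rate": 0.25})
print(result["hour"], result["flux_total"])  # 24 6.0
```

From the perspective of a daily model integrated with `daily`, the 24 hourly passes look like a single cycle.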

Combine Iterate with Other Operators
Because the iterate operation should not be used alone, it is combined with the parallel, sequence, feedback, and select operations. The framework uses M↑μM, M→μM, μM→M, μM←M, M←μM, and M♦μM to denote these combinations.
M→μM and μM→M mean that a model's iteration is performed in sequence with another model. They can be expanded in the same way as the combination of parallel and iteration. The former expands to a sequence M→M in the first cycle followed by a series of cycles of ε→M. Similarly, the latter expands to a series of cycles of ε→M followed by a sequence M→M in the last cycle. The integration is illustrated in Figure 12.
The combination of feedback with iteration is similar to that of sequence with iteration, and the combination of select with iteration is straightforward. Thus, these combinations are omitted in this paper.
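The cycle-by-cycle expansion of M→μM and μM→M described above can be written down explicitly. This minimal Python sketch only produces the schedule of (predecessor, successor) pairs for a given number of cycles. The function name `unroll` and the string labels are hypothetical, and "ε" stands for the empty model.

```python
from typing import List, Tuple

EPS = "ε"  # label for the empty model


def unroll(a: str, b: str, cycles: int, iterate_second: bool = True) -> List[Tuple[str, str]]:
    """Unroll a→μb (iterate_second=True) or μa→b (False) into per-cycle sequences."""
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    if iterate_second:
        # a→μb: a→b in the first cycle, then ε→b in every later cycle.
        return [(a, b)] + [(EPS, b)] * (cycles - 1)
    # μa→b: ε→a in every cycle but the last, then a→b in the last cycle.
    return [(EPS, a)] * (cycles - 1) + [(a, b)]


print(unroll("M1", "M2", 3))
# [('M1', 'M2'), ('ε', 'M2'), ('ε', 'M2')]
print(unroll("M1", "M2", 3, iterate_second=False))
# [('ε', 'M1'), ('ε', 'M1'), ('M1', 'M2')]
```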

Model Graph
With these basic operators, a complex model can be integrated from simpler models, down to basic models. The result of an Iterate or Select operation is treated as a sub-model in the framework. From the integrator's perspective, many sub-models can be synthesized together with connectors and the corresponding message-transferring channels. These sub-models may include both basic models and integrated models, so that the sub-models and channels form a graph called a model graph. A simple example of a model graph is shown in Figure 13.

Formal Semantics of Model Graph
Given three finite alphabets ΣM, ΣC, and ΣVar as the available labels for the models, connectors, and variables, respectively, the integration of the models is defined as a labeled multidigraph with ports, as follows: GM := <M, C, L, PVar, IVar, OVar, s, t, ι, lM, lC, lVar>.
M and C are two finite sets of vertices that denote models and connectors, respectively.M includes basic models and pre-existing integrated models.
In the model graph, there are two types of relations between any two sub-models. The first is the partial ordering relation, which is based on the sequence operator. If there is a path between two sub-models mh and mt such that mh→m1, m1→m2, …, mi→mt, we write mh →→ mt and call it order. Clearly, within one cycle, mt begins to run only after mh has finished. The second type is called noninterference, i.e., two sub-models run in parallel without mutual interference.
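The order relation →→ is reachability along sequence edges, and noninterference means that neither sub-model reaches the other. The following is a minimal Python sketch under two stated assumptions: only sequence edges (label ls) are used, because feedback edges (label lb) cross cycle boundaries, and connectors are treated as ordinary vertices. The helper names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(seq_edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Index sequence edges (label ls) by source vertex; feedback edges are left out."""
    adj: Adjacency = defaultdict(set)
    for src, dst in seq_edges:
        adj[src].add(dst)
    return adj


def precedes(adj: Adjacency, mh: str, mt: str) -> bool:
    """True if mh →→ mt, i.e. a path of sequence edges leads from mh to mt."""
    seen: Set[str] = set()
    queue = deque(adj.get(mh, ()))
    while queue:
        node = queue.popleft()
        if node == mt:
            return True
        if node not in seen:
            seen.add(node)
            queue.extend(adj.get(node, ()))
    return False


def noninterfering(adj: Adjacency, a: str, b: str) -> bool:
    """Two distinct sub-models with no order in either direction may run in parallel."""
    return a != b and not precedes(adj, a, b) and not precedes(adj, b, a)


adj = build_adjacency([("m1", "c1"), ("c1", "m2"), ("m1", "m3")])
print(precedes(adj, "m1", "m2"), noninterfering(adj, "m2", "m3"))  # True True
```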
As a result, a breadth-first search algorithm can be used to construct a model from a model graph. The algorithm is shown in Figure 14.
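One way to picture a breadth-first construction is Kahn-style layering. Sub-models whose predecessors have all been placed form one stage and are composed with ↑, and successive stages are composed with →. The following Python sketch illustrates that idea under stated assumptions. It is not necessarily the algorithm of Figure 14, it works on sequence edges only, and `construct_expression` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def construct_expression(models: Iterable[str], seq_edges: Iterable[Tuple[str, str]]) -> str:
    """Breadth-first (Kahn-style) layering of a model graph into ↑ and → terms."""
    nodes = list(models)
    succ: Dict[str, Set[str]] = defaultdict(set)
    indegree = {n: 0 for n in nodes}
    for src, dst in seq_edges:
        # Parallel edges between the same pair (different ports) impose one ordering.
        if dst not in succ[src]:
            succ[src].add(dst)
            indegree[dst] += 1
    frontier = sorted(n for n in nodes if indegree[n] == 0)
    stages: List[List[str]] = []
    placed = 0
    while frontier:
        stages.append(frontier)
        placed += len(frontier)
        nxt: List[str] = []
        for n in frontier:
            for d in succ[n]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        frontier = sorted(nxt)
    if placed != len(nodes):
        raise ValueError("sequence edges form a cycle; cross-cycle flow needs feedback edges")
    terms = [s[0] if len(s) == 1 else "(" + " ↑ ".join(s) + ")" for s in stages]
    return " → ".join(terms)


print(construct_expression(["A", "B", "C", "D"],
                           [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
# A → (B ↑ C) → D
```

The stage barrier is a simplification: it can be stricter than the partial order →→ (for example, with edges A→B and C→D it yields (A ↑ C) → (B ↑ D), which makes B wait for C).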

Construct Complex Integrated Model
The model graph is intuitive, so the integrator can easily use it to represent the interactions among integrated models. At the same time, IEM tools can use the properties of the graph to check the … From the algorithm, we know that the integration of many sub-models can itself be treated as a model.

Results
Although few formal frameworks have been developed specifically for IEM, each IEM platform has its own implicit formal basis. We compare our formal framework with several platforms and standards to illustrate its characteristics (Table 1). The selected platforms are OMS3 [39], OpenMI [8], and ESMF [40]. The comparison indicates several advantages of our framework. The formal framework can be used as the semantic basis of an IEM Domain Specific Language (DSL) with which complex models can be integrated from pre-existing models. Based on the framework, a lightweight IEM DSL named irDSL (integration-reusable Domain Specific Language) has been developed in our recent work. Appendix A lists code snippets in irDSL for the integrated model in Figure 1. Further details of irDSL will be discussed in another paper.

Conclusions and Future Work
In this paper, a formal framework for IEM systems is proposed. In the framework, the features of a model are divided into two parts: static features and dynamic features. The static features include the traditional parameters, input variables, output variables, and functions of the model. The dynamic feature is the transformation of the static features during the simulation; it is represented as an FSM, which allows the integrated model to adapt to dynamic application scenarios. Based on the framework, a unified definition of the model is proposed that makes integrated models as manageable and reusable as simple models. The integration is also represented as a multidigraph with ports, and an algorithm is given that constructs an integrated model from such a multidigraph, so that every model graph has a unified representation as a model.
Our proposed definition can also serve as the interface between specification and formal verification. The framework supports integration verification at specification time (similar to [34,41]), so that the integration specification can be proven correct before it is implemented; thus, repeated iterations between implementation and specification can be avoided. At the same time, global understanding of the integration is improved, which makes the application easier to understand and, therefore, easier to maintain. Future work should enhance this tool by supporting the proposed methodology with additional automation features.

Figure 1. The conceptual model and a 2-D hypothetical simplified scenario with two spatial discretization schemes.

Figure 2. The process of the integrated model's simulation.

M := <P, I, O, F, A>
In this framework, P, I, and O are the sets of parameters, input variables, and output variables, respectively. They are the interfaces of the model, used to interact with the external environment. The framework defines them similarly, as follows: P = Ø or P = {p1, p2, …}; I = Ø or I = {i1, i2, …}; and O = Ø or O = {o1, o2, …}.
… {o1}, and Oe2 = {o2}. The transition is illustrated in the figure.

Figure 4. The diagram definition of the basic model.
ISPRS Int. J. Geo-Inf. 2017, 6, 47
… variable of the model is collected by the connector. A new datum is computed from those input data and sent to the input variable of another model or back to the model that produced the output. Each connector can be represented as a 5-tuple, similar to the basic model: C := <P, I, O, F, A>. Compared with the basic model's definition, the only difference is that O = {o}, i.e., a connector always has exactly one output variable. To distinguish it from a model, we use a box with a triangle to represent the connector in the block diagram, as seen in Figure 5. Sometimes the value of the output variable of the first model can be transferred directly to the input variable of the second model; the connector is then simpler, i.e., P = Ø, I = {i}, and f is o = f(i) = i.
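The connector definition, including the single-output constraint O = {o} and the identity case o = f(i) = i, can be mirrored in a small data structure. This is a minimal Python sketch that covers only the static part <P, I, O, F> and leaves out the FSM A. `Component`, `make_connector`, `identity_connector`, and the variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable

Vars = Dict[str, float]


@dataclass(frozen=True)
class Component:
    """Static part of <P, I, O, F, A>; the FSM A is left out of this sketch."""
    params: FrozenSet[str]
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    func: Callable[[Vars], Vars]


def make_connector(params: Iterable[str], inputs: Iterable[str], output: str,
                   func: Callable[[Vars], Vars]) -> Component:
    """A connector is a component whose output set is exactly {o}."""
    return Component(frozenset(params), frozenset(inputs), frozenset({output}), func)


def identity_connector(i: str, o: str) -> Component:
    """Simplest connector: P = Ø, I = {i}, and o = f(i) = i."""
    return make_connector((), (i,), o, lambda v: {o: v[i]})


relay = identity_connector("runoff", "inflow")
merge = make_connector((), ("q1", "q2"), "q_in", lambda v: {"q_in": v["q1"] + v["q2"]})
print(relay.func({"runoff": 3.2}))         # {'inflow': 3.2}
print(merge.func({"q1": 1.0, "q2": 2.5}))  # {'q_in': 3.5}
```

The `merge` connector shows that a connector may collect several inputs, yet it still produces exactly one output.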

Figure 5. The diagram definition of the connector.

… s10, T1>, and A2 = <S2, s20, T2>, let m = m1↑m2 and m = <P, I, O, F, A>. Then, P = P1 ∪ P2, I = I1 ∪ I2, and O = O1 ∪ O2, where the union of the output sets requires that the output variables of the different models be distinct. The integration of the functions can be represented as F = F1 …
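The interface part of m1↑m2 can be sketched in a few lines of Python, including the requirement that the output variables of the two models be distinct. This minimal sketch assumes that running both functions on the same inputs and merging their outputs is an acceptable reading of the parallel function. The FSM product is not modeled, and `Comp`, `parallel`, and the example models are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, NamedTuple

Vars = Dict[str, float]


class Comp(NamedTuple):
    params: FrozenSet[str]
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    func: Callable[[Vars], Vars]


def parallel(m1: Comp, m2: Comp) -> Comp:
    """m1 ↑ m2: P = P1 ∪ P2, I = I1 ∪ I2, O = O1 ∪ O2 with O1 and O2 disjoint."""
    clash = m1.outputs & m2.outputs
    if clash:
        raise ValueError(f"output variables must be distinct: {sorted(clash)}")

    def f(v: Vars) -> Vars:
        # Both models see the same inputs; their disjoint outputs are merged.
        out = dict(m1.func(v))
        out.update(m2.func(v))
        return out

    return Comp(m1.params | m2.params, m1.inputs | m2.inputs, m1.outputs | m2.outputs, f)


a = Comp(frozenset(), frozenset({"rain"}), frozenset({"runoff"}), lambda v: {"runoff": v["rain"] * 0.5})
b = Comp(frozenset(), frozenset({"temp"}), frozenset({"et"}), lambda v: {"et": v["temp"] * 0.25})
m = parallel(a, b)
print(m.func({"rain": 4.0, "temp": 8.0}))  # {'runoff': 2.0, 'et': 2.0}
```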

Figure 6. The diagram definition of the Parallel operator.
For clarity, we assume that m1 and m2 each have two states in addition to their initial states, denoted s11 and s12 for m1 and s21 and s22 for m2.

Figure 7. The diagram definition of the Sequence operator.

Figure 8. The diagram definition of the Feedback operator.

Figure 10. The diagram definition of the Iterate operator.

L is the set of edges, which indicate the dataflow among the sub-models and connectors. The edge set includes both sequence-related and feedback-related dataflow. To distinguish them, we use the label ls for sequence edges and lb for feedback edges, and L = Ls ∪ Lb. PVar, IVar, and OVar are the sets of parameters, input variables, and output variables, respectively, and Var = PVar ∪ IVar ∪ OVar. ι = ιM ∪ ιC, where ιM = (ιp, ιi, ιo): M → PVar × IVar × OVar assigns a parameter variable (port) set, an input variable (port) set, and an output variable (port) set to each model, with ιp(M) = PM, ιi(M) = IM, ιo(M) = OM, and ι(M) = (PM, IM, OM).

Figure 13. The example of a model graph. (a) The example of a model graph; (b) the unified view of an integrated model based on (a).

… f1, f2, …}. The framework defines a state of the model as s := <Pe, Ie, Oe, fe>, where fe ∈ F and fe: Pe × Ie → Oe, i.e., (oe1, oe2, …, oen') = fe(pe1, pe2, …, pel', ie1, ie2, …, iem'), where Pe ⊆ P, Ie ⊆ I, Oe ⊆ O, n', l', m' ∈ N, and l', m' ≥ 0. Pe, Ie, Oe, and fe are the effective parameter set, effective input variable set, effective output variable set, and effective function in the specific state, respectively.
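A state <Pe, Ie, Oe, fe> restricts the model to a subset of its interfaces and applies one effective function. The following is a minimal Python sketch, assuming variables are named strings with float values. `State`, `is_valid_for`, and `fire` are hypothetical names, and the check that fe produces exactly Oe is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Vars = Dict[str, float]


@dataclass(frozen=True)
class State:
    """A state <Pe, Ie, Oe, fe>: effective parameters, inputs, outputs, and function."""
    pe: FrozenSet[str]
    ie: FrozenSet[str]
    oe: FrozenSet[str]
    fe: Callable[[Vars], Vars]

    def is_valid_for(self, p: FrozenSet[str], i: FrozenSet[str], o: FrozenSet[str]) -> bool:
        """Check Pe ⊆ P, Ie ⊆ I, and Oe ⊆ O."""
        return self.pe <= p and self.ie <= i and self.oe <= o

    def fire(self, values: Vars) -> Vars:
        """Apply fe to the effective parameters and inputs only."""
        args = {k: values[k] for k in self.pe | self.ie}
        result = self.fe(args)
        if set(result) != set(self.oe):
            raise ValueError("fe must produce exactly the effective outputs Oe")
        return result


P, I, O = frozenset({"a", "b"}), frozenset({"x", "y"}), frozenset({"o1", "o2"})
s1 = State(frozenset({"a"}), frozenset({"x"}), frozenset({"o1"}),
           lambda v: {"o1": v["a"] * v["x"]})
print(s1.is_valid_for(P, I, O))                           # True
print(s1.fire({"a": 2.0, "b": 9.0, "x": 3.0, "y": 1.0}))  # {'o1': 6.0}
```

Only the effective variables are passed to fe, so the values of b and y have no influence in state s1.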

Table 1. Comparison with other platforms.