Agriculture Model Comparison Framework and MyGeoHub Hosting: Case of Soil Nitrogen

Abstract: To be able to compare many agricultural models, a general framework for model comparison when field data may limit direct comparison of models is proposed, developed, and demonstrated. The framework first calibrates the benchmark model against the field data and then calibrates the test model against the data generated by the calibrated benchmark model. The framework is validated for the modeling of the soil nutrient nitrogen (N), a critical component in the overall agriculture system modeling effort. The nitrogen dynamics and related carbon (C) dynamics, as captured in advanced agricultural models such as RZWQM (Root Zone Water Quality Model), are highly complex, involving numerous states (pools) and parameters. Calibrating many parameters requires more time and data to avoid overfitting, and the execution time of a complex model is higher as well. A study of the tradeoff between modeling complexity and speed-up, and of the corresponding impact on modeling accuracy, is therefore desirable. This paper surveys soil nitrogen models and ranks them by complexity in terms of the number of parameters and C-N pools. It also examines a lean soil N and C dynamics model and compares it with an advanced model, RZWQM. Since nitrate and ammonia are not directly measured in this study, we first calibrate RZWQM using the available data from an experimental field in Greeley, CO, and then use the daily nitrate and ammonia data generated by RZWQM as ground truth, against which the lean model's N dynamics parameters are calibrated. In both cases, crop growth was removed so that plant uptake is zero and only the soil N dynamics are compared. The comparison showed good accuracy, with coefficients of determination (R²) of 0.99 and 0.62 for nitrate and ammonia, respectively, while affording a significant speed-up in simulation time. The lean model is also hosted in the MyGeoHub cyberinfrastructure for universal online access.

Author Contributions: Conceptualization, A.B. and R.K.; methodology, A.B., R.K., and R.M.; software, A.B. and B.F.; validation, A.B.; formal analysis, A.B. and R.K.; investigation, A.B. and R.K.; resources, R.K.; data curation, A.B.; writing (original draft preparation), A.B.; writing (review and editing), A.B., R.K., and R.M.; visualization, A.B. and R.K.; supervision, R.K.; project administration, R.K.; funding acquisition, R.K. All authors have read and agreed to the manuscript.


Introduction
Mathematical models of agriculture systems have been developed since the 1950s [1,2,3] for on-field decision management support and prediction. These models receive weather, agricultural management, and model parameter inputs from a user and predict various agricultural variables as a function of time (the common timescale is daily) and depth (models are typically one-dimensional). Weather inputs are daily temperature, rainfall, radiation, humidity, etc. Management inputs can include tillage time and type, sowing, irrigation and fertilizer application times and quantities, and harvest day. Model parameters can include soil hydraulic and thermal properties, microbial reaction rates, etc. Before an agriculture model is used, its unknown parameters need to be estimated or calibrated.
An overall agriculture system model consists of several components such as water flow and ion transport in soil, temperature distribution in soil, biochemical dynamics of […] make the model accessible through a browser, thereby making it operating-system independent. In addition, a user need not install the model executable on their local machine. For the cloud server, we employ MyGeoHub [39], a science gateway powered by the HUBzero [40] cyberinfrastructure platform, which supports the geospatial modeling, data analysis, and visualization needs of the broad research and education communities by hosting groups, datasets, tools, training materials, and educational content. MyGeoHub hosts software applications in a Linux environment, which makes it impossible to serve a Windows-based model. Accordingly, we report here our development of a lean nutrient model [10] in Python and compare it against RZWQM. Since soil nitrate and ammonia data are not available, we first calibrate RZWQM using the available data from an experimental field in Greeley, CO, and then use the daily nitrate and ammonia data generated by RZWQM as ground truth, against which the lean model's N dynamics parameters are calibrated. Agriculture model comparison and evaluation is one of the many objectives of AgMIP (the Agricultural Model Intercomparison and Improvement Project, agmip.org). The comparison between the lean N-model and RZWQM yields coefficients of determination (R²) of 0.99 and 0.62 (where a value of one corresponds to a perfect match) for nitrate and ammonia, respectively. Additionally, a run of RZWQM requires 12 s vs. 0.35 s for a run of our Python implementation of the lean N-model, a speed-up of roughly 34 times. Although RZWQM also has components other than N dynamics, so the two run times are not strictly comparable, we can expect the N dynamics of RZWQM alone to take longer than the lean N-model because RZWQM has more pools and parameters. The lean model is hosted in the cloud on MyGeoHub.
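The two figures of merit quoted above can be computed as follows. This is a sketch assuming the common definition R² = 1 - SS_res/SS_tot; the text does not state whether the squared Pearson correlation was used instead, which can give a different value for biased predictions. The function names are illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Equals 1.0 for a perfect match and can be negative when the
    predictions fit worse than the mean of the observations.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def speedup(t_reference_s, t_candidate_s):
    """Ratio of two run times, e.g. benchmark model vs. lean model."""
    return t_reference_s / t_candidate_s


# Run times reported in the text: RZWQM 12 s, lean N-model 0.35 s.
print(round(speedup(12.0, 0.35), 1))  # 34.3
```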
The following are the key motivations of this article:
• A high-fidelity, complex agriculture model requires more execution time and space, whereas a reduced model with comparable accuracy is desirable for fast prototyping;
• High-quality data with fine temporal resolution, as a basis for model comparison, are lacking;
• Programs for calibration and decision-making are typically written in a language different from that of the model, which makes the interfacing slow; this also needs to be sped up;
• A complex agriculture model is generally not accessible across platforms and requires local download and installation, which needs to be addressed;
• A lean model can be used for quick initial exploration of a global search space by an optimization routine (whether for automated calibration or decision-making), whose result can then seed a subsequent, more refined local search using a complex model; this increases the usefulness of a basic lean model.
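The coarse-to-fine use of a lean model in the last motivation can be illustrated with a small sketch. It is a toy under stated assumptions, not the procedure used in this paper: plain random sampling stands in for the global optimizer, a shrinking-step coordinate search stands in for the local refinement, and the two quadratic cost functions are hypothetical proxies for a lean model and a complex model that disagree slightly about the optimum.

```python
import random


def coarse_to_fine(cheap_cost, expensive_cost, bounds, n_global=200,
                   n_local=50, step=0.1, seed=0):
    """Seed a local search on an expensive model with a global search on a cheap one."""
    rng = random.Random(seed)
    # Stage 1: global exploration of the whole box using only the cheap model.
    best_x, best_f = None, float("inf")
    for _ in range(n_global):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        f = cheap_cost(x)
        if f < best_f:
            best_x, best_f = x, f

    # Stage 2: coordinate search around the seed, evaluated with the expensive model.
    x = list(best_x)
    fx = expensive_cost(x)
    widths = [step * (hi - lo) for lo, hi in bounds]
    for _ in range(n_local):
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] = min(hi, max(lo, x[i] + sign * widths[i]))
                ft = expensive_cost(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            widths = [w / 2 for w in widths]  # no move helped: shrink the step
    return x, fx


def expensive(x):  # hypothetical "complex model" misfit
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def cheap(x):      # hypothetical "lean model" misfit, slightly biased
    return (x[0] - 1.2) ** 2 + (x[1] + 1.8) ** 2


x_best, f_best = coarse_to_fine(cheap, expensive, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print([round(v, 3) for v in x_best])
```

The expensive model is called only about two hundred times here; in practice the benefit depends on how close the lean model's optimum is to that of the complex model.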
The following are the key contributions of this article:
• Develop a framework to compare any two models when high-quality field data for calibrating the test model are not directly available, by calibrating the test model against a calibrated benchmark model;
• Survey soil nitrogen models and rank them by the number of N-pools and parameters, which are indicative of model complexity;
• Implement a lean N-model [10] and compare it against the high-fidelity, complex agriculture model RZWQM, measuring the degree of fit as well as the speed-up in simulation time;
• Implement both the lean N-model and the automated calibration routine in the same programming language for fast interfacing;
• Host the lean soil nitrogen model in MyGeoHub, making the model cloud-accessible through a browser and cross-platform, thus eliminating local installation.
In the rest of the paper, we first list and compare some popular soil nitrogen (N) models from the literature in Section 2 and Table 1. We briefly describe the nitrogen (and carbon) model in RZWQM in Section 3.1. This is followed by a description of a lean nitrogen model [10] in Section 3.2. The model comparison setup used in this work is explained in Section 3.3. The MyGeoHub cyberinfrastructure that hosts the lean N-model is briefly described in Section 3.4, and the procedure for hosting the model in MyGeoHub in Section 3.4.1. In Section 4, the implemented lean N-model is calibrated and compared with the RZWQM-simulated values of daily nitrate and ammonia levels. Sections 5 and 6 present the discussion and conclusions, respectively.

Review of Soil Nitrogen Models
Almost all soil nitrogen models comprise several organic and inorganic pools for N and C, with more complex models having more pools. Carbon dynamics are coupled with nitrogen dynamics because organic matter (OM) is considered and because soil mineral N (ammonia and nitrate) is produced by OM decay. Plants take up N from the mineral N pool. Table 1 lists soil N-models in decreasing order of complexity, as determined by the number of C and N pools (state variables) and N model parameters used. These models discretize the soil into a number of layers, and the numbers of pools (state variables) and N model parameters listed are per layer. The models are one-dimensional (vertical) and simulate on a daily timescale. Common input variables of the N-models include soil moisture and water flow, soil temperature, and management inputs such as fertilization, irrigation, sowing, tillage, and harvest. Soil moisture and temperature in turn depend on soil properties and on meteorological inputs such as air temperature, wind speed, solar radiation, humidity, and rainfall. The N model is integrated with soil water, soil temperature, and crop growth models. Common outputs of the N models include daily soil nitrate, ammonia, N loss, and nitrogen oxide gases. Processes common to all the N models are organic matter decomposition, mineralization, immobilization, nitrification, denitrification, microbial effects, crop N uptake, and N leaching. Some models have additional inputs and processes, which are listed in Table 1.
There have also been works comparing selected nitrogen models. For instance, in [58], fourteen nitrogen turnover models for a soil-crop system were compared in terms of the processes included, their descriptions, and the results of simulations carried out with the same data set. In [59], six soil nitrogen cycle models were compared and their processes discussed; to compare their accuracy, the models were run with the same data set. In [60], the processes of four soil nitrogen dynamics models were reviewed, compared, and analyzed with reference to the equations used in each model. Works [61,62] have discussed many soil N models. The reaction rates of the soil nitrogen cycle depend on temperature and moisture, and this dependency can be modeled by many different equations. The work in [63] compared the Q10, the Arrhenius, and a logistic function as predictors of the temperature dependence of the soil N mineralization rate. The works [64,65] analyzed different functions describing the effects of soil moisture, temperature, and their interaction on soil nitrogen transformation. To the best of our knowledge, no prior work compares RZWQM with the lean model of Porporato et al. [10] as done in our setting.
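The three temperature-response functions compared in [63] can each be written as a multiplicative factor on a rate at a reference temperature. The sketch below uses textbook forms with illustrative parameter values (Q10 = 2, E_a = 50 kJ/mol, a logistic midpoint of 15 °C); [63] may parameterize the logistic function differently, and none of these values are taken from the cited works.

```python
import math


def q10_factor(temp_c, q10=2.0, t_ref_c=20.0):
    """Q10 model: the rate multiplies by q10 for every 10 degC above t_ref_c."""
    return q10 ** ((temp_c - t_ref_c) / 10.0)


def arrhenius_factor(temp_c, ea_j_per_mol=50_000.0, t_ref_c=20.0):
    """Arrhenius model relative to t_ref_c: exp(-Ea/R * (1/T - 1/T_ref)), T in kelvin."""
    r_gas = 8.314  # J mol^-1 K^-1
    t_k = temp_c + 273.15
    t_ref_k = t_ref_c + 273.15
    return math.exp(-ea_j_per_mol / r_gas * (1.0 / t_k - 1.0 / t_ref_k))


def logistic_factor(temp_c, t_mid_c=15.0, steepness=0.2, max_factor=2.0):
    """Logistic (sigmoidal) model that saturates at max_factor in warm soil."""
    return max_factor / (1.0 + math.exp(-steepness * (temp_c - t_mid_c)))


for t in (5.0, 20.0, 35.0):
    print(t, round(q10_factor(t), 2), round(arrhenius_factor(t), 2),
          round(logistic_factor(t), 2))
```

The Q10 and Arrhenius forms grow without bound with temperature, whereas the logistic form levels off; this difference in behavior at high temperatures is one reason such functions are compared.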

Materials and Methods
The macronutrients that crops need include nitrogen, carbon, phosphorus, potassium, calcium, magnesium, and sulfur. Among these, nitrogen and carbon are the most important: nitrogen is required for amines and proteins in plants, whereas carbon is required for carbohydrates and for energy for metabolism. Soil contains inorganic minerals as well as dead and living organic matter, which contain nitrogen and carbon. Each form of organic matter maintains its own near-constant C:N ratio. As organic matter decomposes, it either becomes another form of organic matter that is more resistant to decomposition or is consumed by microbes, which transform it and at times also release carbon dioxide or methane.
While soil holds carbon in the form of organic matter, plants take carbon only from the atmosphere. In contrast, although the atmosphere has ample nitrogen, crops can take nitrogen from the soil only in its inorganic forms, namely ammonium and nitrate. Mineralization occurs when nitrogen in organic matter is converted to inorganic ammonium through microbial action. Nitrification, another microbial process, then converts ammonium to nitrate. The reverse microbial process of converting inorganic nitrogen to organic form is termed immobilization. Nitrogen can be released from soil to the atmosphere by denitrification, the microbial reduction of nitrate to N2 and N2O.

Soil Nutrient (C and N) Module in RZWQM
The nitrogen cycle in general, and as modeled in RZWQM, is depicted in Figure 1, where the numbered paths are as follows:
(1) The Haber process, N2 + 3H2 → 2NH3, which fixes atmospheric nitrogen for the production of fertilizers (nitrate, ammonia, or urea).
(2) Ammonium-containing fertilizer added to the ammonium pool. Urea, which has two amide groups, converts to ammonium by urea hydrolysis.
(3) Nitrate-containing fertilizer added to the nitrate pool. Nitrate moves downward along with water, leading to leaching losses.
(4) The slow surface residue pool (e.g., manure), which is decomposed by heterotrophic bacteria; a fraction of the decomposed residue becomes ammonium.
(5) The part of the slow surface residue that, on decay, joins the fast humus pool below the soil surface.
(6) Crops with nitrogen-fixing bacteria in their root nodules, which fix N from the atmosphere.
(7) The decay of fast surface residue (e.g., dead stalks and leaves); a fraction of the decayed matter joins the organic matter (OM) pool below the soil surface.
(8) The decay of OM by bacteria, which produces ammonium.
(9) The conversion of mineral N to organic form through consumption by bacteria.
(10) Crop root uptake of mineral N from the soil.
(11) Denitrification, which converts nitrate to nitrogen gas.
(12) Crop N removal via harvest.
(13) Nitrification, by which nitrifying bacteria convert ammonium to nitrate.
(14) Volatilization of ammonium to ammonia gas.
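For working with the cycle programmatically, the fourteen paths can be transcribed into a small lookup table. The pool and process labels below are our own shorthand for the descriptions above, not RZWQM identifiers, and lumped pools such as "mineral N" (paths 9 and 10) are kept as in the text.

```python
# Path number -> (source pool, sink pool, process), transcribed from the
# path descriptions of Figure 1. Labels are illustrative shorthand.
N_CYCLE_PATHS = {
    1: ("atmospheric N2", "fertilizer", "Haber process"),
    2: ("fertilizer", "NH4", "ammonium fertilizer / urea hydrolysis"),
    3: ("fertilizer", "NO3", "nitrate fertilizer"),
    4: ("slow surface residue", "NH4", "decomposition"),
    5: ("slow surface residue", "fast humus", "decomposition"),
    6: ("atmospheric N2", "crop", "biological N fixation"),
    7: ("fast surface residue", "soil OM", "decomposition"),
    8: ("soil OM", "NH4", "mineralization"),
    9: ("mineral N", "microbial (organic) N", "immobilization"),
    10: ("mineral N", "crop", "root uptake"),
    11: ("NO3", "N2 gas", "denitrification"),
    12: ("crop", "harvest", "N removal"),
    13: ("NH4", "NO3", "nitrification"),
    14: ("NH4", "NH3 gas", "volatilization"),
}


def paths_into(pool):
    """Path numbers whose sink is the given pool."""
    return sorted(p for p, (_src, sink, _proc) in N_CYCLE_PATHS.items() if sink == pool)


def paths_out_of(pool):
    """Path numbers whose source is the given pool."""
    return sorted(p for p, (src, _sink, _proc) in N_CYCLE_PATHS.items() if src == pool)


print(paths_into("NH4"))    # [2, 4, 8]
print(paths_out_of("NH4"))  # [13, 14]
```

Because paths 9 and 10 act on the lumped "mineral N" pool, they also deplete NH4 and NO3 even though they do not appear in the per-pool queries above.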
The input variables to the C-N system are fertilizer (inorganic N), manure (organic N and C), and soil moisture, temperature, and pH. RZWQM has nineteen C and N pools [9], namely, OM pool 1 (slow-decaying structural material), OM pool 2 (fast-decaying metabolic material), OM pool 3 (fast-decaying), OM pool 4 (medium-decaying), OM pool 5 (slow-decaying), heterotrophic biomass (soil decomposers), autotrophic biomass (nitrifiers), heterotrophic biomass (facultative anaerobes), NO3, NH4, CO2 (acts as source or sink), urea, N2 sink, NH4 mineralized, NO3 immobilized, N2O sink, NH3 volatilized, NH4 immobilized, and CH4 (source/sink). Most of the state variables of the system are shown as ellipses and dashed boxes in Figure 1. The dynamics of the first ten state variables are given in Equations (1)-(10); the remaining pools are used to balance the mass of C and N during the transformation of the first ten state variables. RZWQM segregates soil OM into five pools according to their resistance to decomposition and proximity to the surface, and it divides microbes into three pools. Simpler N models may segregate OM into fewer pools. The transformations among pools are shown by arrows in Figure 1, with the corresponding processes labeled on the arrows. Fertilizer and manure increase the nitrate and ammonia pools by the amounts added. Soil water, pH, and temperature affect the decay rates of the various pools. The flow of water also affects N content through transport. (Water flow is a separate sub-module in RZWQM.) The outputs, nitrate and ammonia, are available for plant uptake.
The carbon contents of the organic matter pools shown in Figure 1 are C_OM1 for near-surface slow residue pools (e.g., manure), C_OM2 for near-surface fast residue pools (e.g., crop residue), C_OM3 for below-surface fast humus pools, C_OM4 for below-surface intermediate humus pools, C_OM5 for below-surface slow humus pools, C_het for aerobic heterotrophs (organic matter decomposers), C_aut for autotrophs (nitrifiers), and C_ana for facultative heterotrophs (anaerobic denitrifiers). The nitrogen content of an organic matter pool is obtained by dividing its carbon content by the C:N ratio of that pool. The concentrations of the inorganic pools are C_NH4 for ammonium, C_NH3 for ammonia gas, C_NO3 for nitrate, C_CO2 for carbon dioxide gas, and C_N2 for nitrogen gas. The complexity of RZWQM can be gauged from the state update equations for the five pools of soil carbon, three pools of microbes, and two pools of mineral nitrogen, which are given in the following. These 10 equations and the interrelations among them have been consolidated from [9].
In these equations, the following notation is used: T, soil temperature; k_b, Boltzmann constant; h_p, Planck's constant; f_aer, factor for the extent of aerobic conditions (which depends on soil water content and bulk density); A_i, pool-specific rate coefficient; E_a, apparent activation energy; O2, oxygen concentration in soil; H, hydrogen ion concentration in soil; γ_1, activity coefficient for monovalent ions; k_h, hydrogen ion exponent for the decay of OM; e_max and e_nit, efficiency factors; f_r, fraction of soil decomposers over total heterotrophs; f_t(i), fraction of decayed OMi lost to other OM pools; C_N, C:N ratio of autotrophic biomass; a_ad, decay conversion factor (which depends on soil water content and bulk density); e_ad, fraction of decayed soil carbon converted to biomass; C_S, weighted carbon substrate concentration (a linear combination of the OM pools); E_an, apparent activation energy for nitrification; E_death, apparent activation energy for microbial death; R_ij, fraction of decayed OM transferred from pool i to pool j; E_den, apparent activation energy for denitrification; A_den, denitrification rate coefficient; k_v0, volatilization constant; W, wind speed; Z, soil depth; P_NH3, soil depth; C_urea, urea concentration; E_k, equilibrium constant between ammonia and ammonium; A_u, rate coefficient for urea hydrolysis; E_u, activation energy for urea hydrolysis.
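The presence of k_b, h_p, E_a, and A_i in this notation points to transition-state (Eyring-type) rate coefficients. As a hedged sketch, assuming the generic form A_i (k_b T / h_p) exp(-E_a / (R T)) with E_a in J/mol (R, the molar gas constant, is our addition to the notation), the temperature sensitivity of such a coefficient can be computed as below. The other RZWQM factors (f_aer, O2, hydrogen-ion, and substrate terms) are deliberately left out, and the parameter values are illustrative.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H_P = 6.62607015e-34  # Planck constant, J s
R_GAS = 8.314462618   # molar gas constant, J/(mol K)


def eyring_rate_coefficient(a_i, e_a, temp_c):
    """A_i * (k_b T / h_p) * exp(-E_a / (R T)), with T in kelvin and E_a in J/mol.

    Only the temperature-dependent core; other RZWQM multipliers are omitted.
    """
    t_k = temp_c + 273.15
    return a_i * (K_B * t_k / H_P) * math.exp(-e_a / (R_GAS * t_k))


# Illustrative: relative change from 15 to 25 degC for E_a = 50 kJ/mol.
k15 = eyring_rate_coefficient(1e-12, 50_000.0, 15.0)
k25 = eyring_rate_coefficient(1e-12, 50_000.0, 25.0)
print(round(k25 / k15, 2))
```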
Equation (1) models the decay of the OM1 pool in Figure 1 (paths 4 and 5). Equation (2) models the decay of the OM2 pool (path 7). Equation (3) models the decay of the OM3 pool, along with the addition of a fraction of decayed OM2 (path 7) and of dead microbes to OM3. Equation (4) models the decay of the OM4 pool, along with the addition of fractions of decayed OM1 (path 5) and OM3 to OM4. Equation (5) models the decay of the OM5 pool and the conversion of a fraction of the decayed OM4 pool into OM5. Equation (6) describes the growth of the heterotroph microbe pool (soil decomposers) by consumption of decayed OM pools, and its own death. Equation (7) describes the growth and death of the autotroph microbe pool (nitrifiers). Equation (8) describes the growth of the anaerobe microbe pool (facultative heterotrophs) by feeding on decayed OMs and by denitrifying soil NO3, alongside their death. Equation (9) models the loss of ammonium due to nitrification (path 13) and volatilization (path 14), and its accumulation due to urea hydrolysis if urea fertilizer is added (path 2). If ammonium-containing fertilizer is applied, that amount is also added to the NH4 pool (path 2). Equation (10) describes the accumulation of nitrate due to nitrification (path 13), the loss of nitrate by denitrification (path 11), and, if nitrate-containing fertilizer is applied, the direct addition of that amount to the NO3 pool (path 3). If a crop is also present, root uptake of N diminishes the NH4 and NO3 pools (path 10). The N level of each OM pool is calculated from the pool's C:N ratio, which is constant and specific to that pool. To maintain mass balance, C is adjusted by the release/absorption of CO2 (if O2 is not limiting) or CH4 (if O2 is scarce), and N is adjusted by the release (mineralization, path 8) or absorption (immobilization, path 9) of NH4.
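The C:N bookkeeping used to close the mass balance can be illustrated with a single transfer. In this sketch (our simplification, not an RZWQM routine), decayed carbon leaves a source pool, a fraction is respired as CO2, and the rest enters a receiving pool; because each pool's N content is its C divided by its C:N ratio, any surplus N is released as NH4 (mineralization, path 8) and any deficit is drawn from NH4 (immobilization, path 9). The numbers in the usage lines are illustrative.

```python
def n_balance_of_transfer(c_decayed, cn_source, cn_target, frac_respired):
    """Mineral-N flux implied by moving decayed carbon between fixed-C:N pools.

    c_decayed      carbon lost from the source pool (e.g. kg C/ha)
    cn_source      C:N ratio of the source pool
    cn_target      C:N ratio of the receiving pool
    frac_respired  fraction of the decayed carbon released as CO2

    Returns (co2_c, n_net): n_net > 0 is N released to NH4 (mineralization),
    n_net < 0 is N taken from NH4 (immobilization).
    """
    n_released = c_decayed / cn_source            # N freed by the decay
    c_to_target = (1.0 - frac_respired) * c_decayed
    n_needed = c_to_target / cn_target            # N the receiving pool must hold
    co2_c = frac_respired * c_decayed
    return co2_c, n_released - n_needed


# 100 kg C of residue consumed by microbes (C:N 8), 60% respired.
for cn_residue in (15.0, 50.0):
    co2_c, n_net = n_balance_of_transfer(100.0, cn_residue, 8.0, 0.6)
    kind = "mineralization" if n_net > 0 else "immobilization"
    print(cn_residue, round(n_net, 2), kind)
```

N-rich residue (low C:N) releases ammonium, whereas N-poor residue forces the microbes to draw on the mineral pool, which is the same logic the lean model encodes in its net mineralization term.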

A Lean Nutrient Model
While RZWQM represents the upper extreme of complexity, a lean nutrient model is desirable, as explained in the Introduction. We therefore implemented the model of [10] and compare it with RZWQM. This lean model makes certain simplifying assumptions: there is no volatilization of ammonium to ammonia, no ammonia absorption, no nitrogen deposition from the atmosphere, no biological fixation of nitrogen, and no denitrification of nitrate to nitrogen gas. Furthermore, only one soil layer of 200 cm depth is considered instead of the seven soil layers in RZWQM. Figures 2 and 3 show the same set of pools, but Figure 2 shows the carbon fluxes, while Figure 3 shows the N fluxes. Plants take up mineral nitrogen in the form of ammonium (NH4) and nitrate (NO3), which are made available through OM decomposition; that is why the nitrogen cycle is closely linked to that of carbon. As can be seen from Figures 1 and 2, the lean model has one litter pool in contrast to the two residue pools in RZWQM, and the three soil humus pools of RZWQM are replaced with a single humus pool. Furthermore, only one microbial biomass pool is used instead of the three microbial pools of RZWQM. All the pools maintain a constant C:N ratio: whenever there is a flux of carbon between pools, there is a commensurate flux of nitrogen between them. In Figure 2, a fraction r_h of decayed litter (in the form of carbon) goes to the humus pool and a fraction r_r is respired as CO2; the remaining fraction, 1 - r_h - r_r, is consumed by microbes for their growth. On decomposition, the humus pool goes to the microbial pool and is partly respired as CO2 to the atmosphere.
In Figure 3, various additional fluxes affect the nitrogen pools, namely, (i) the MIN flux, the flow of nitrogen from the organic matter pools to the mineral N pool, which occurs when litter and humus contain more nitrogen than microbial growth requires, and (ii) the IMM flux, which accounts for immobilization; this flux occurs when there is insufficient nitrogen for microbial growth and the microbes consume inorganic nitrogen in the form of ammonia and nitrate. Ammonia and nitrate can leach out of the soil profile. They also receive an incoming flux of nitrogen when fertilizers are added, and they can be taken up by crop roots.
The rate equation for the carbon content of the litter pool is given by (11), where t is the time step (day) and ADD is the external input into the system: where, in (11), the microbial biomass death BD is given as: in which C_b is the carbon concentration in the microbial biomass pool and k_d is a proportionality constant. The decay of litter by microbes, DEC_l, is given as: where C_l is the carbon concentration in the litter pool; k_l is a proportionality constant that defines the rate of decomposition of the litter pool; the coefficient ρ (with 0 ≤ ρ ≤ 1, where 0 implies that there is not enough mineral N for immobilization) is a factor that accounts for the reduction of the decomposition rate when the litter is very poor in nitrogen and immobilization is not sufficient to supply the nitrogen required by the bacteria; s is the relative soil moisture; and the soil moisture factor on decay, f_d, is given as: in which s_fc is the normalized field capacity. The variation of f_d is shown in Figure 4. The dynamics of the nitrogen component of the litter are as follows: where (C/N)_add, (C/N)_b, and (C/N)_l are the C:N ratios of the added plant residue, microbe, and litter pools, respectively. The equation for the carbon content of the humus pool is: where r_h is the isohumic coefficient. In (16), the decay rate of humus carbon is given as: in which the proportionality constant k_h ≪ k_l. The equation for the nitrogen content of the humus pool is: The rate equation for carbon in the microbial pool is: The nitrogen balance of the microbial pool is: where φ[t] is the net mineralization, given as: where MIN is the gross mineralization, IMM is the total rate of immobilization, IMM+ is the immobilization rate of ammonium, and IMM- is the immobilization rate of nitrate. Net mineralization (i.e., φ > 0) implies IMM = 0 and MIN = φ, whereas net immobilization (i.e., φ < 0) implies MIN = 0 and IMM = -φ.
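The carbon bookkeeping and net mineralization described above can be sketched as follows. The functional forms are our reading of [10] and Figure 4, and should be treated as assumptions: each decay flux is first-order in its substrate, scaled by microbial carbon C_b, f_d, and ρ; f_d rises linearly up to field capacity and falls as s_fc/s above it. The expression for φ follows from N mass balance: the N released by decay, minus the N needed by the carbon moved into humus and biomass at their C:N ratios. Parameter names and values are illustrative, not the calibrated values of Table 2.

```python
def f_d(s, s_fc):
    """Assumed soil-moisture factor on decomposition: linear rise to 1 at
    field capacity s_fc, then decline as s_fc / s in wetter soil."""
    return s / s_fc if s <= s_fc else s_fc / s


def carbon_fluxes(c_l, c_h, c_b, s, p, rho=1.0):
    """Return (BD, DEC_l, DEC_h) for one day.

    c_l, c_h, c_b: carbon in the litter, humus and microbial biomass pools.
    p: parameter dict with k_d, k_l, k_h and s_fc (names are ours).
    rho: decomposition reduction factor (1 unless N is limiting, Case III).
    """
    bd = p["k_d"] * c_b                                      # BD = k_d C_b
    dec_l = rho * f_d(s, p["s_fc"]) * p["k_l"] * c_b * c_l   # litter decay
    dec_h = rho * f_d(s, p["s_fc"]) * p["k_h"] * c_b * c_h   # humus decay
    return bd, dec_l, dec_h


def carbon_derivatives(add, bd, dec_l, dec_h, p):
    """dC_l/dt, dC_h/dt, dC_b/dt following the carbon partition in the text."""
    r_h, r_r = p["r_h"], p["r_r"]
    dc_l = add + bd - dec_l                  # inputs + dead microbes - decay
    dc_h = r_h * dec_l - dec_h               # isohumic share of litter - humus decay
    dc_b = (1 - r_h - r_r) * dec_l + (1 - r_r) * dec_h - bd
    return dc_l, dc_h, dc_b


def net_mineralization(dec_l, dec_h, p):
    """phi = N released by decay minus N locked into new humus and biomass.

    cn_l is the current litter C:N ratio; cn_h and cn_b are those of humus
    and biomass. phi > 0: mineralization; phi < 0: immobilization demand.
    """
    r_h, r_r = p["r_h"], p["r_r"]
    cn_l, cn_h, cn_b = p["cn_l"], p["cn_h"], p["cn_b"]
    from_litter = dec_l * (1 / cn_l - r_h / cn_h - (1 - r_h - r_r) / cn_b)
    from_humus = dec_h * (1 / cn_h - (1 - r_r) / cn_b)
    return from_litter + from_humus


# Illustrative parameter values only (not the calibrated values of Table 2).
params = dict(k_d=8.5e-3, k_l=6.5e-5, k_h=2.5e-6, s_fc=0.5,
              r_h=0.25, r_r=0.6, cn_l=30.0, cn_h=22.0, cn_b=11.5)
bd, dec_l, dec_h = carbon_fluxes(1000.0, 8000.0, 50.0, 0.3, params)
print(carbon_derivatives(2.0, bd, dec_l, dec_h, params))
print(net_mineralization(dec_l, dec_h, params))
```

A useful check on any implementation of this partition is that the three carbon derivatives sum to ADD minus r_r (DEC_l + DEC_h), i.e., the only carbon leaving the system is respired CO2.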
Accordingly, three cases can arise depending on nitrogen availability:
Case I: If litter and humus have sufficient N to keep (C/N)_b constant, net mineralization occurs.
Case II: If litter and humus have insufficient N, microbes consume N from the mineral pool, i.e., immobilization occurs.
Case III: If litter, humus, and mineral N together have insufficient N, decay is slowed down by reducing ρ. Mathematically, the three cases are as follows. If φ > 0, then Case I applies and ρ = 1. If φ < 0, immobilization occurs, and if IMM ≤ IMM_max, then Case II applies with ρ = 1, where the immobilization of ammonia and nitrate is given as: Otherwise, Case III applies, with ρ reduced accordingly: The equations for nitrate and ammonia dynamics are: with nitrification given as: The moisture factor on nitrification is given by: The leaching of ammonia and nitrate is given by: in which the downward water flow (leakage) is: where s[t] is the soil moisture content on day t; a_± is the mixing coefficient of ammonia and nitrate with water; n is the soil porosity; Z_r is the depth of the soil profile; k_n is the nitrification constant; k_l is the decay constant for the litter pool; k_h is the decay constant for the humus pool; k_d is the death rate of microbes; K_S is the saturated hydraulic conductivity; and (C/N)_l, (C/N)_b, and (C/N)_h are the C:N ratios of the litter, biomass, and humus pools. Note that the lean model's reaction rates do not depend on temperature, unlike RZWQM, where reaction rates depend on temperature, pH, and soil O2 as well as moisture.
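A sketch of the three-case logic, nitrification, and leaching follows. Because the corresponding equations are not reproduced here, the forms below are assumptions based on our reading of [10]: IMM_max is taken as first-order in each mineral pool, immobilization is split between ammonium and nitrate in proportion to those first-order terms, and the leakage L(s) uses an exponential form with a shape parameter beta that does not appear in the text. Since both decay fluxes are proportional to ρ, so is φ; in Case III the reduced ρ therefore follows directly as IMM_max divided by the immobilization demand at ρ = 1. Plant uptake is omitted, matching the comparison setup in which crop growth is disabled.

```python
import math


def f_n(s, s_fc):
    """Assumed moisture factor on nitrification: linear rise to 1 at s_fc,
    then a linear decline to 0 at saturation (s = 1)."""
    return s / s_fc if s <= s_fc else (1 - s) / (1 - s_fc)


def resolve_cases(phi_rho1, nh4, no3, k_imm_nh4, k_imm_no3):
    """Apply Cases I-III. Returns (rho, MIN, IMM, IMM_plus, IMM_minus).

    phi_rho1 is the net mineralization evaluated with rho = 1. IMM_max is
    assumed to be k_imm_nh4 * NH4 + k_imm_no3 * NO3 (rate names are ours).
    """
    if phi_rho1 >= 0:                      # Case I: net mineralization
        return 1.0, phi_rho1, 0.0, 0.0, 0.0
    cap_nh4, cap_no3 = k_imm_nh4 * nh4, k_imm_no3 * no3
    imm_max = cap_nh4 + cap_no3
    demand = -phi_rho1
    if imm_max == 0:                       # no mineral N available: decay stops
        return 0.0, 0.0, 0.0, 0.0, 0.0
    if demand <= imm_max:                  # Case II: immobilization suffices
        rho, imm = 1.0, demand
    else:                                  # Case III: slow decay until demand == IMM_max
        rho, imm = imm_max / demand, imm_max   # phi scales linearly with rho
    share_nh4 = cap_nh4 / imm_max          # split IMM in proportion to capacity
    return rho, 0.0, imm, imm * share_nh4, imm * (1 - share_nh4)


def nitrification(s, c_b, nh4, k_n, s_fc):
    """NIT = f_n(s) * k_n * C_b * NH4 (assumed form)."""
    return f_n(s, s_fc) * k_n * c_b * nh4


def leakage(s, k_s, s_fc, beta):
    """Assumed leakage L(s): zero up to field capacity, K_S at saturation.
    beta is a hypothetical shape parameter not listed in the text."""
    if s <= s_fc:
        return 0.0
    return k_s * (math.exp(beta * (s - s_fc)) - 1) / (math.exp(beta * (1 - s_fc)) - 1)


def leaching(a, s, n, z_r, flow, n_pool):
    """LE = a * L(s) * N / (s * n * Z_r): the dissolved fraction a leaves with drainage."""
    return a * flow * n_pool / (s * n * z_r)


def mineral_n_derivatives(min_, imm_plus, imm_minus, nit, le_plus, le_minus):
    """dNH4/dt and dNO3/dt with crop uptake set to zero, as in the comparison setup."""
    return min_ - imm_plus - nit - le_plus, nit - imm_minus - le_minus
```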

RZWQM vs. Lean Model Comparison Setup
Owing to the unavailability of high-frequency field nitrogen data, we first calibrate RZWQM using the available data from an experimental field in Greeley, CO, and then use the daily nitrate and ammonia data generated by RZWQM as ground truth, against which the lean model's N dynamics parameters are calibrated. The experimental field is the Limited Irrigation Research Field, a USDA research site where maize was grown from 2008 to 2011 under different irrigation treatments. Urea ammonium nitrate (UAN) was applied so that there was no N stress. Details of the field experiment can be found in [66]. RZWQM was calibrated to this field as in our prior work [37,67]. Once calibrated, crop growth was zeroed out in the model, since our goal is to compare only the N dynamics. With this, only the soil N, soil water, and heat dynamics modules of RZWQM remain active, and they were run under normal fertilizer and irrigation applications. The run yielded the daily soil water content and nitrate and ammonium values at seven soil depths. The averages of these data across the seven depths were then treated as ground truth for calibrating the lean N-model. The lean model requires soil moisture values, as can be seen from Equations (13), (30), and (32), but it does not have a coupled soil moisture model. The daily soil water content from RZWQM (averaged over the seven soil layers of RZWQM) was therefore given as the soil moisture input to the lean N-model. The N dynamics parameters listed in Table 2 were calibrated using the pyswarm particle swarm optimization package in Python to fit the ammonium and nitrate data obtained from the field-calibrated RZWQM. The calibrated lean N-model's ammonium and nitrate outputs were compared with those of RZWQM, as reported in Section 4.
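The calibration used the pyswarm package. So that the example runs without that dependency, the sketch below re-implements a minimal global-best PSO using the same default inertia and acceleration coefficients as pyswarm (omega = phip = phig = 0.5); with pyswarm installed, `pso(cost, lb, ub)` from that package plays the same role. The sketch also shows averaging per-layer output over depths into a single daily series. The toy first-order NH4 model and its rate are hypothetical stand-ins for the lean N-model and the RZWQM-generated data, not the actual calibration setup.

```python
import math
import random


def pso_minimize(cost, lb, ub, swarm_size=30, max_iter=100,
                 omega=0.5, phi_p=0.5, phi_g=0.5, seed=0):
    """Minimal global-best particle swarm optimizer on the box [lb, ub]."""
    rng = random.Random(seed)
    dim = len(lb)
    span = [hi - lo for lo, hi in zip(lb, ub)]
    pos = [[rng.uniform(lb[d], ub[d]) for d in range(dim)] for _ in range(swarm_size)]
    vel = [[rng.uniform(-span[d], span[d]) for d in range(dim)] for _ in range(swarm_size)]
    p_best = [list(x) for x in pos]                 # each particle's best position
    p_cost = [cost(x) for x in pos]
    g = min(range(swarm_size), key=lambda i: p_cost[i])
    g_best, g_cost = list(p_best[g]), p_cost[g]     # swarm's best position
    for _ in range(max_iter):
        for i in range(swarm_size):
            for d in range(dim):
                r_p, r_g = rng.random(), rng.random()
                vel[i][d] = (omega * vel[i][d]
                             + phi_p * r_p * (p_best[i][d] - pos[i][d])
                             + phi_g * r_g * (g_best[d] - pos[i][d]))
                # move, then clip to the parameter bounds
                pos[i][d] = min(ub[d], max(lb[d], pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < p_cost[i]:
                p_best[i], p_cost[i] = list(pos[i]), c
                if c < g_cost:
                    g_best, g_cost = list(pos[i]), c
    return g_best, g_cost


def sse(simulated, reference):
    """Sum of squared errors between two equal-length daily series."""
    return sum((a - b) ** 2 for a, b in zip(simulated, reference))


def layer_average(profile_by_day):
    """Average per-layer values (e.g. seven RZWQM depths) into one value per day."""
    return [sum(layers) / len(layers) for layers in profile_by_day]


def toy_nh4(k, days=30, nh4_0=50.0):
    """Hypothetical stand-in model: first-order decline of an NH4 pulse."""
    return [nh4_0 * math.exp(-k * t) for t in range(days)]


# Hypothetical seven-layer benchmark output (same value in every layer).
benchmark = layer_average([[v] * 7 for v in toy_nh4(0.12)])
best, err = pso_minimize(lambda x: sse(toy_nh4(x[0]), benchmark), lb=[0.0], ub=[1.0])
print(round(best[0], 3), err)
```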
We next describe our framework for comparing a "test model" (the lean N-model in our case) against a "benchmark model" (the RZWQM N model in our case) when the available dataset is not directly usable by the test model. The benchmark model, by contrast, is embedded within a complete agriculture model that can be calibrated with the available dataset. This more complete model is used to first calibrate its embedded benchmark model, and the simulation of the calibrated benchmark model is then used to generate the data needed to calibrate the test model. The proposed framework is shown in Figure 5. The field data (1) are used by a calibration program (2) to calibrate a benchmark model embedded within a complete agriculture model (3). The program (2) makes calls to the benchmark model (3) to explore the parameter values that best fit the benchmark model to the field data, and it outputs the calibrated parameters (4). With these parameters, the calibrated benchmark model (5) outputs the data (6) required to calibrate the test model. The types of outputs in (6) can differ from those of the field data (1). The output data in (6) are used by program (7) to calibrate a test model (8), again by iterative exploration. The calibrated parameters (9) given by (7) are used by the calibrated test model (10) to produce its own outputs (11) for the same variables as in (6). Finally, the outputs of the two models are compared in (12).
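The framework's data flow can be sketched as a higher-order function. This is a minimal illustration under stated assumptions: an exhaustive search over small candidate lists stands in for the iterative calibration programs (2) and (7) (PSO in our case), and the toy benchmark and test models are hypothetical, chosen only so that the field variable (a seasonal total) differs from the variable passed to the test model (a daily series), as step (6) allows.

```python
def compare_via_benchmark(field_obs, run_benchmark, run_test,
                          bench_candidates, test_candidates, score):
    """Two-stage comparison framework (steps (1)-(12) of Figure 5).

    run_benchmark(params) -> (outputs matching field_obs, outputs needed by the test model)
    run_test(params)      -> outputs for the same variables as the benchmark's 2nd output
    score(sim, ref)       -> error to minimize
    """
    # (2)-(4): calibrate the benchmark model against the field data.
    bench_params = min(bench_candidates,
                       key=lambda p: score(run_benchmark(p)[0], field_obs))
    # (5)-(6): the calibrated benchmark generates the data the test model needs.
    generated = run_benchmark(bench_params)[1]
    # (7)-(9): calibrate the test model against the generated data.
    test_params = min(test_candidates, key=lambda p: score(run_test(p), generated))
    # (10)-(12): compare the two models' outputs for the same variables.
    return bench_params, test_params, score(run_test(test_params), generated)


def bench(a):
    """Toy benchmark: daily NO3 is simulated, but only the seasonal total is observed."""
    daily_no3 = [a * t for t in range(5)]
    return [sum(daily_no3)], daily_no3


def lean_toy(b):
    """Toy test model producing the same daily variable as the benchmark's 2nd output."""
    return [b * t for t in range(5)]


def sq_err(sim, ref):
    return sum((x - y) ** 2 for x, y in zip(sim, ref))


candidates = [i / 10 for i in range(51)]  # 0.0, 0.1, ..., 5.0
result = compare_via_benchmark([20.0], bench, lean_toy, candidates, candidates, sq_err)
print(result)  # (2.0, 2.0, 0.0)
```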

MyGeoHub Cyber-Infrastructure
Several cyberinfrastructures exist to support a variety of disciplines and communities. For instance, TeraGrid [68,69] provides distributed computing, storage, and networking facilities to many scientific disciplines. The nanoHUB [70] infrastructure is geared toward nanotechnology. The Integrated Computational Materials Engineering (ICME) [71] cyberinfrastructure supports the manufacturing and materials science community. A cyberinfrastructure for the US plant and life science community, formerly iPlant and now renamed CyVerse [72], also exists. An infrastructure for a geographical soil moisture dataset is also being developed [73]. The HydroShare [74] environment supports the collaboration and sharing of hydrologic data and models. HUBzero [40] (the platform used in this work) lets users access and share tools, datasets, videos, and tutorials with DOIs (digital object identifiers) for subsequent citation, and collaborate online to develop simulation and modeling tools. Numerous scientific communities, such as pharmaceutical engineering (pharmaHUB.org), heat transfer (https://nanohub.org/groups/thermal (accessed on 28 March 2021)), healthcare (https://indianactsi.org/ (accessed on 28 March 2021)), and geospatial projects (https://mygeohub.org (accessed on 28 March 2021)), use the HUBzero platform.
For this work of implementing a lean N-model as a cloud service, we use the MyGeoHub [39] infrastructure, which is built on HUBzero [40]. The MyGeoHub system architecture is depicted in Figure 6. MyGeoHub lets us create collaborative applications built from open-source software such as the Linux OS, the Apache web server, the MySQL database, the Joomla content management system, and PHP web scripting. It has built-in support/ticket features and usage statistics, integrates with GitHub, Google Drive, and Dropbox, and supports wiki-style documentation. Users access interactive and graphical tools via a Java applet in their Web browsers. Hub tools run in OpenVZ containers and use VNC (Virtual Network Computing) to provide interactive access. The tools run on HPC clusters, from which jobs can be dispatched to more powerful computers in the national grid infrastructure. The Rappture toolkit generates a GUI from an XML description of a tool's inputs and outputs, and the underlying simulator may be written in any programming language. Tool development on the hub is done in the "Workspace" tool, which provides a complete Linux environment; apart from standard libraries, domain-specific libraries can also be imported. MyGeoHub hosts four projects, namely, (i) driNET, for sharing local to regional drought information; (ii) WaterHub, for sharing hydrologic information; (iii) Useful-to-Usable (U2U), to transform historical climate data, knowledge, and models into decision support tools for crop farmers, educators, and Ag advisors; and (iv) Geoshare, to develop a spatially global database on agriculture integrated with an economic analysis tool. Some AgMIP [75] projects use MyGeoHub to host their tools. iData in MyGeoHub supports the entire cycle of geospatial-data-driven analysis, management, preview, map search, sharing, and publication. Hosting and accessing tools on MyGeoHub is free for the broad research and education communities.
MyGeoHub requires the file system in the tool repository to be organized in a particular fashion. The general setup is as follows. The root folder of the tool repository has three folders:
(i) the bin folder, which holds the executable file that is run on MyGeoHub when the tool is launched;
(ii) the middleware folder, which holds an invoke file, a bash script that gets the tool ready to run;
(iii) the src folder, which holds three items: (a) the source code for the tool, (b) a manifest file used by the makefile to aid in the compilation of the source code, and (c) a makefile, which automates the compilation of the source code and the creation of the executable file.
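The layout can be created with a short script. This is a sketch only: the file name manifest.txt, the tool name, and the Makefile contents are hypothetical placeholders, and the invoke script body is deliberately left as a comment because the exact hub launch command is tool-specific and should be taken from the MyGeoHub/HUBzero documentation.

```python
import tempfile
from pathlib import Path


def scaffold_tool_repo(root, tool_name):
    """Create the bin/, middleware/ and src/ layout with placeholder files.

    Returns the created paths relative to root, in POSIX form and sorted.
    """
    root = Path(root)
    for folder in ("bin", "middleware", "src"):
        (root / folder).mkdir(parents=True, exist_ok=True)

    # middleware/invoke: shell script run to get the tool ready (body is a placeholder).
    invoke = root / "middleware" / "invoke"
    invoke.write_text("#!/bin/sh\n# Placeholder: hub-specific launch command goes here.\n")
    invoke.chmod(0o755)

    # src/: source code (added separately), a manifest used by the makefile, the makefile.
    (root / "src" / "manifest.txt").write_text(f"# Placeholder manifest for {tool_name}\n")
    (root / "src" / "Makefile").write_text(
        "# Placeholder: compile the sources, build the executable, copy it to ../bin\n"
    )
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = scaffold_tool_repo(tmp, "lean_n_model")
print("\n".join(layout))
```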

Hosting of Lean N-Model in MyGeoHub
Our implementation of the lean N-model [10] is hosted in MyGeoHub. To host any tool in MyGeoHub, an account has to be created and approved by the MyGeoHub administration team. User-specific files can be stored in the account, much as in Google Drive. To create a tool, one navigates to Contribute under the Resources drop-down menu on the account home page. When creating the tool, details such as the tool name, ID, and description must be provided, which we did. After this step, an SVN (Apache Subversion, subversion.apache.org) repository was set up; SVN is an open-source, centralized version control system comparable to Git. The SVN client was installed on the local computer and synced with the tool being developed on MyGeoHub. This lets a tool developer add files on a local development machine and commit them to the SVN server. MyGeoHub then uses the code in the SVN server to deploy the tool.
The lean N-model [10] was implemented in Java using the JavaFX library. JavaFX is an extensive Java library for building complete applications and user interfaces, supporting user input and output through buttons, text input boxes, file upload buttons, and graphs. Several input validations and error displays were put in place, for example, checks for the non-negativity of ammonium and nitrate values and for the proper format of the ammonium, nitrate, and soil moisture files.
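The deployed tool performs these checks in Java; the Python sketch below only illustrates the same kind of validation. The CSV layout (a header `day,nitrate,ammonium` followed by one row per day) and the `InputError` type are hypothetical, since the file format is not specified here.

```python
import csv
import io
from typing import List, Tuple


class InputError(ValueError):
    """Raised when an input file or value is invalid (hypothetical type)."""


def parse_n_inputs(text: str) -> List[Tuple[float, float]]:
    """Parse hypothetical 'day,nitrate,ammonium' CSV text into (nitrate, ammonium) pairs."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None or [h.strip().lower() for h in header] != ["day", "nitrate", "ammonium"]:
        raise InputError("expected header: day,nitrate,ammonium")
    rows = []
    for lineno, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 3:
            raise InputError(f"line {lineno}: expected 3 columns, got {len(row)}")
        try:
            nitrate, ammonium = float(row[1]), float(row[2])
        except ValueError:
            raise InputError(f"line {lineno}: non-numeric value") from None
        if nitrate < 0 or ammonium < 0:
            raise InputError(f"line {lineno}: nitrate and ammonium must be non-negative")
        rows.append((nitrate, ammonium))
    return rows


def check_initial_values(nitrate0: float, ammonium0: float) -> None:
    """Reject negative initial pool values entered in the input boxes."""
    if nitrate0 < 0 or ammonium0 < 0:
        raise InputError("initial nitrate and ammonium must be non-negative")


if __name__ == "__main__":
    print(parse_n_inputs("day,nitrate,ammonium\n1,0.5,0.1\n2,0,0\n"))
```

Raising one descriptive error type per problem keeps the GUI side simple: it only has to catch that error and display its message.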
The makefile for our tool compiles the source code, packages the generated class files into an executable jar file, and then moves the jar file to the bin directory. The makefile was tricky to write because it runs shell commands, and the local development computer runs Windows while the MyGeoHub platform runs Linux. Windows and Linux use different command-line commands and may also have different versions of Java installed, so a makefile that works on a local computer is likely to fail on the MyGeoHub platform. With guidance from the MyGeoHub support team on which commands to use, virtual Linux machines were used to test makefile commands that would also work in MyGeoHub. Hence, it is advisable to develop the tool on a local computer that also runs Linux. Another issue was that the file upload buttons in the tool opened the file system of the MyGeoHub Linux machine instead of that of the tool user's local machine, making it impossible to upload files from the user's local machine. This was resolved by adding a line to the invoke file that enables a special MyGeoHub file uploading tool. With it, any tool user can upload files to the MyGeoHub Linux system and then use the file buttons in the tool to choose those files from the Linux machine.
The lean N-model is deployed at https://mygeohub.org/tools/benfeddersen (accessed on 28 March 2021). Any user with a MyGeoHub account can access the tool. The tool takes as inputs daily soil moisture values and daily nitrate and ammonia prescriptions to the soil. After launching the tool, the input files have to be uploaded from the user's local computer to the MyGeoHub user space using the MyGeoHub file upload tool, as shown in Figure 7. The number of simulation days and the initial nitrate and ammonium values are entered in the input boxes shown in Figure 8. The nitrate, ammonium, and soil moisture input files are then selected with the buttons labeled Choose N input file and Choose swc file, also shown in Figure 8; these buttons choose files from the MyGeoHub user space. Clicking the Run button in the top left corner runs the tool, which then generates and displays two graphs, as shown in Figure 8. Clicking the Restart button clears the graphs and restarts the tool. Note that if the tool is restarted, the upload step shown in Figure 7 can be skipped, because the files remain on the Linux machine.

Results
Following the general framework of Figure 5, we set the benchmark model to be RZWQM and the test model to be the lean N-model of [10]. The benchmark model is calibrated with the coordinate descent algorithm from our prior work [37,76], which predicted the field data well, with a coefficient of determination (R²) of 0.79. The field data (1) are taken from a USDA experimental field in Greeley, CO [66], and consist of soil moisture and plant growth measurements. The parameters calibrated in RZWQM are eight hydraulic parameters at each of seven depths plus three crop growth parameters (59 in total). The compared outputs, (6) and (11), are daily nitrate and ammonium. The output of the calibrated RZWQM in (6) was used to calibrate 12 parameters (9) of the lean N test model using Particle Swarm Optimization (7). The calibrated lean N-model (10) is run with the parameters in (9) to output daily nitrate and ammonia values (11). The outputs from (6) and (11) are compared in (12) visually through plots and quantitatively using the coefficient of determination.
For the calibrated RZWQM, we examined its daily nitrate and ammonia predictions (with no crop) and used those values to calibrate our lean N-model [10], implemented in Python and JavaFX (and hosted in MyGeoHub). Its nutrient parameters, namely, a+, a−, b, K_S, r_h, r_r, k_d, k_l, k_h, k_n, k_i+, and k_i−, were calibrated against the RZWQM-predicted nitrate and ammonia data, which serve as a substitute for field N data.
The lean N-model parameters were calibrated using PSO (Particle Swarm Optimization [77]) in Python. PSO is a stochastic technique inspired by the social behavior of bird flocks searching for food. PSO is initialized with a group of random particles (candidate solutions) and searches for the optimum by updating them over iterations. In every iteration, each particle's velocity and position are updated using two best values: the best position the particle itself has achieved so far, and the best position attained so far by any particle in the swarm. The PSO algorithm was implemented with the Python library pyswarm, which also supports constrained optimization. The constraints we imposed were lower and upper bounds on the nutrient parameters, so that only physically plausible values are fitted to the data.
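The update rule described above can be sketched in a few lines of NumPy. This is a minimal global-best PSO written from scratch for illustration, not the pyswarm implementation used in this work; the inertia weight `w` and acceleration coefficients `c1`, `c2` are common textbook-style choices assumed here, and bounds are enforced by clipping positions.

```python
import numpy as np


def pso(cost, lb, ub, swarm_size=50, max_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO with box bounds (illustrative, not pyswarm)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    x = rng.uniform(lb, ub, size=(swarm_size, dim))                      # positions
    v = 0.1 * rng.uniform(-(ub - lb), ub - lb, size=(swarm_size, dim))  # velocities
    pbest = x.copy()                                  # each particle's best position
    pbest_cost = np.array([cost(p) for p in x])
    g = pbest[np.argmin(pbest_cost)].copy()           # best position of the whole swarm
    g_cost = pbest_cost.min()
    for _ in range(max_iter):
        r1 = rng.random((swarm_size, dim))
        r2 = rng.random((swarm_size, dim))
        # Velocity is pulled toward the particle's own best and the swarm's best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)                    # keep parameters within bounds
        c = np.array([cost(p) for p in x])
        improved = c < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = c[improved]
        if pbest_cost.min() < g_cost:
            g_cost = pbest_cost.min()
            g = pbest[np.argmin(pbest_cost)].copy()
    return g, g_cost


if __name__ == "__main__":
    target = np.array([1.0, -2.0])
    best, best_cost = pso(lambda p: float(np.sum((p - target) ** 2)), [-5, -5], [5, 5])
    print(best, best_cost)
```

pyswarm's `pso` function likewise takes the cost function plus lower and upper bound vectors and returns the best position and its cost, so a cost written for this sketch has the same shape as one written for the library.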
The lower and upper bounds and the calibrated values obtained by PSO are summarized in Table 2. The PSO population size was 100,000, and the algorithm terminated after about 1000 iterations, when its cost function could not be reduced further. We fitted both nitrate and ammonia by defining the PSO cost function as the sum of the weighted RMSEs of ammonia and nitrate, with weights equal to the inverse of the mean RZWQM-predicted ammonia and nitrate values, respectively. (This is because, although nitrate and ammonia are in the same units, nitrate values are numerically much higher than ammonia values.) The RMSE upon termination was 0.8353.
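A sketch of this weighted cost, assuming each series is a sequence of daily values; the function names are illustrative. Weighting each RMSE by the inverse mean of its reference series makes the two terms roughly scale-free, so the numerically larger nitrate values do not dominate the fit.

```python
import numpy as np


def rmse(pred, ref):
    """Root-mean-square error between two equal-length series."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def weighted_cost(pred_no3, pred_nh4, ref_no3, ref_nh4):
    """Sum of nitrate and ammonia RMSEs, each weighted by 1 / mean(reference)."""
    w_no3 = 1.0 / np.mean(ref_no3)
    w_nh4 = 1.0 / np.mean(ref_nh4)
    return w_no3 * rmse(pred_no3, ref_no3) + w_nh4 * rmse(pred_nh4, ref_nh4)


if __name__ == "__main__":
    # Toy numbers: an error of 5 units on nitrate (mean 50) weighs 0.1,
    # while an error of 1 unit on ammonia (mean 2) weighs 0.5.
    cost = weighted_cost([45, 55], [2, 2], [40, 60], [1, 3])
    print(round(cost, 6))  # 0.6
```

In a calibration loop, a hypothetical `cost(params)` would run the lean model with `params` and pass its nitrate and ammonia outputs, together with the RZWQM series, to `weighted_cost`.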
The ammonia and nitrate values from the two models are plotted in Figures 9 and 10, respectively. The plots show similar predictions by the two models, with a coefficient of determination (R²) of 0.99 and 0.62 (1 being a perfect fit) for nitrate and ammonia, respectively. In addition, while a run of RZWQM takes 12 s without the SHAW module and 70 s with it, the lean N-model runs in only 0.35 s, roughly 34 and 200 times faster, respectively. It should be noted, however, that RZWQM also simulates the soil water and heat modules in addition to the N-cycle module. The lean model is thus also more flexible, since it takes soil moisture as an input rather than simulating it. Calibrating the lean N-model with PSO took 1.5 h.
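The exact R² formula used is not stated; the sketch below assumes the common definition R² = 1 − SS_res/SS_tot with the RZWQM series as the reference, and also works out the speed-up implied by the reported runtimes.

```python
import numpy as np


def r_squared(reference, predicted):
    """R^2 = 1 - SS_res / SS_tot, with `reference` (e.g., RZWQM output) as ground truth."""
    ref = np.asarray(reference, float)
    pred = np.asarray(predicted, float)
    ss_res = np.sum((ref - pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((ref - ref.mean()) ** 2)  # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)


# Speed-up implied by the runtimes reported above (seconds per run).
LEAN_RUNTIME_S = 0.35
for label, rzwqm_s in [("without SHAW", 12.0), ("with SHAW", 70.0)]:
    print(f"RZWQM {label}: {rzwqm_s / LEAN_RUNTIME_S:.0f}x the lean N-model runtime")
# prints 34x and 200x
```

Under this definition R² can also become negative when a model fits worse than the mean of the reference series, which is why it is a stricter figure of merit than the squared correlation coefficient.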

Discussion
Figures 9 and 10 show promising results: the timing of the peaks and the rates of rise and fall are comparable. Some mismatch is expected, as the lean model omits many physical processes and works with fewer nitrogen pools. RZWQM also discretizes the soil into many layers (with different soil properties if required), which the lean model avoids. We conclude that the lean N-model, with fewer pools and parameters, yields reasonably accurate ammonium and nitrate predictions, which is helpful for faster prototyping. We attribute the lower fit for ammonia (R² = 0.62, compared with R² = 0.99 for nitrate) possibly to the lean N-model not accounting for ammonia volatilization (the loss of soil ammonium to the atmosphere as ammonia gas). Volatilization is implemented in RZWQM and depends on wind speed, soil temperature, and the partial pressure gradient of ammonia.
The lean N-model is easier to implement and calibrate (online and offline), and it runs faster. High-dimensional optimization has a very large search space, and exploring it with a model that has a long execution time is impractical. In this regard, the lean N-model facilitates a quick initial phase of global optimization, after which a complex model can be used to fine-tune the result, starting from the output of the lean-model-based optimization.
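This coarse-to-fine strategy can be sketched as follows. Everything here is hypothetical: the two quadratic cost functions stand in for a cheap lean-model calibration cost and an expensive complex-model cost, phase 1 uses plain random search rather than PSO, and phase 2 uses a simple coordinate search with step halving. The point is that the broad search touches only the cheap model, while the expensive cost is evaluated only a few hundred times.

```python
import numpy as np


def coarse_to_fine(cheap_cost, expensive_cost, lb, ub,
                   n_global=2000, step=0.1, n_sweeps=50, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    # Phase 1: broad global search that evaluates only the cheap (lean) model.
    samples = rng.uniform(lb, ub, size=(n_global, lb.size))
    x = samples[np.argmin([cheap_cost(s) for s in samples])].copy()
    # Phase 2: local coordinate search on the expensive model, starting from phase 1.
    best = expensive_cost(x)
    h = step * (ub - lb)
    for _ in range(n_sweeps):
        improved = False
        for i in range(x.size):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] = np.clip(trial[i] + sign * h[i], lb[i], ub[i])
                c = expensive_cost(trial)
                if c < best:
                    x, best, improved = trial, c, True
                    break
        if not improved:
            h = h / 2.0  # shrink the step when no coordinate move helps
    return x, best


calls = {"expensive": 0}


def cheap(p):  # lean-model stand-in with a slightly biased optimum
    return float(np.sum((p - np.array([1.0, 1.0])) ** 2))


def expensive(p):  # complex-model stand-in with the "true" optimum
    calls["expensive"] += 1
    return float(np.sum((p - np.array([1.1, 0.9])) ** 2))


if __name__ == "__main__":
    x, c = coarse_to_fine(cheap, expensive, lb=[-5, -5], ub=[5, 5])
    print(x, c, calls["expensive"])
```

The bias between the two optima mimics the structural mismatch between a lean and a complex model; the local phase corrects it without ever running a global search on the expensive model.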
MyGeoHub was chosen to host and serve the model and related tools in the cloud. Hosting the model there required some learning and support from the MyGeoHub team; after that, however, server setup, database configuration, and session management are all handled by MyGeoHub.
High-frequency field nutrient data are unavailable because in-situ soil nutrient sensors have not been deployed. However, with recent advances in portable in-situ soil moisture, nutrient, and gas sensors, including those from our research group [78–82], high-frequency measurement of soil and plant variables is becoming feasible; such data could then be used to calibrate the models independently of each other.

Conclusions
A general framework for model comparison when field data may limit direct comparison of models was proposed, developed, and demonstrated. The framework first calibrates the benchmark model against the field data and then calibrates the test model against the data generated by the calibrated benchmark model. This work qualitatively compared popular N models by their modeling complexity and, using the proposed framework, quantitatively compared the most complex one (RZWQM) with the least complex one (a lean N-model) to understand the gap between them and the possibility of using a lean model for fast prototyping. In addition, to the best of our knowledge, no prior literature compares the N-model of RZWQM with the lean model of Porporato et al. [10] on the Greeley, CO, field data. Our prior calibration work [37] improved upon the then state-of-the-art calibration of RZWQM against the Greeley, CO, field data performed by an agriculture expert [67]. Here we used the method of [37] to calibrate RZWQM and used its simulated data to calibrate the lean N-model. The models were then compared visually through plots and quantitatively through the coefficient of determination.
We implemented in Python a lean, reliable alternative for a soil nutrient dynamics module, calibrated it using a particle swarm optimizer also written in Python, and compared its predictions with those of the state-of-the-art complex model, RZWQM. The lean N-model showed good prediction accuracy relative to RZWQM, with R² of 0.99 and 0.62 for nitrate and ammonia, respectively, and offered a significant runtime speed-up over RZWQM. Having a lean, accurate model enables fast prototyping of offline designs. The lean N-model was hosted in MyGeoHub to be universally accessible to the community. Development and integration of lean water flow and crop growth modules are possible future extensions. In the long term, we envision integrating most agricultural processes, automated model calibration, and an optimized decision recommender in one platform that is independent of the user's operating system and accessible anywhere through the internet.