Two-Stage Wiener-Physically-Informed-Neural-Network (W-PINN) AI Methodology for Highly Dynamic and Highly Complex Static Processes

Hurd, Dillon G.; González, Yuderka T.; Oyler, Jacob; Wolfe, Spencer; Lamm, Monica H.; Rollins, Derrick K.

doi:10.3390/stats9010006

by Dillon G. Hurd ¹, Yuderka T. González ², Jacob Oyler ¹, Spencer Wolfe ¹, Monica H. Lamm ¹ and Derrick K. Rollins ¹,³,*

¹ Department of Chemical and Biological Engineering, Iowa State University, Ames, IA 50011, USA

² Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 50011, USA

³ Department of Statistics, Iowa State University, Ames, IA 50011, USA

* Author to whom correspondence should be addressed.

Stats 2026, 9(1), 6; https://doi.org/10.3390/stats9010006 (registering DOI)

Submission received: 6 October 2025 / Revised: 26 December 2025 / Accepted: 29 December 2025 / Published: 1 January 2026

Abstract

Our new Theoretically Dynamic Regression (TDR) modeling methodology was recently applied in three types of real data modeling cases using physically based dynamic model structures with low-order linear regression static functions. Two of the modeling cases achieved the validation set modeling goal of

r_{f i t, v a l} \geq 0.9

. However, the third case, consisting of eleven (11) type 1 diabetes sensor glucose data sets, and thus eleven individual models, fell considerably short of this modeling goal in every case, with the average

r_{f i t, v a l}

,

{\bar{r}}_{f i t, v a l}

= 0.68. For this case, the dynamic forms are highly complex 60 min forecast, second-order-plus-dead-time-plus-lead (SOPDTPL) structures, and the static form is a twelve (12) input first-order linear regression structure. Using these dynamic structure results, the objective is to significantly increase

r_{f i t}

for each of the eleven (11) modeling cases using the recently developed Wiener-Physically-Informed-Neural-Network (W-PINN) approach as the static modeling structure. Two W-PINN stage-two static structures are evaluated: one developed using the JMP® Pro Version 16 Artificial Neural Network (ANN) toolbox and the other developed using a novel ANN methodology coded in Python version 3.12.3. The JMP

{\bar{r}}_{f i t, v a l}

= 0.74 with a maximum of 0.84. The Python

{\bar{r}}_{f i t, v a l}

= 0.82 with a maximum of 0.93. Incorporating bias correction, using current and past SGC residuals, the Python estimator improved the average

{\bar{r}}_{f i t, v a l}

from 0.82 to 0.87 with the maximum still 0.93.

Keywords:

block-oriented modeling; forecast modeling; forecast control; free-living data collection; glucose modeling; hammerstein modeling; physically-informed-neural-network; type 1 diabetes; wiener modeling

1. Introduction

Our broad objective is the extension of the Wiener-Physically-Informed-Neural-Network (W-PINN) (see Figure 1c) approach developed by [1] to improve modeling effectiveness when dynamic processes (i.e., systems) have highly nonlinear static behavior. This work is an extension and advancement of the modeling approach in [2] that was applied to three types of freely existing data sets. The first data set consisted of four (4) nutrient inputs (x_i, i = 1, …, 4) and modeled the change in weight (y) over time using first-order dynamic structures (v_i) for each input i, and a quadratic, multiple linear regression, static output structure,

f (V),

where

V

is a vector of the v_i’s. The second one consisted of nine (9) x_i’s and modeled the top tray temperature (y) of a pilot distillation column using second-order dynamic structures for the v_i’s and a first-order multiple linear regression, static structure for

f (V) .

Our goal, for the fitted correlation coefficient (

r_{f i t}

) of y and

\hat{y}

(the fitted y), is

r_{f i t} \geq 0.9

for test sets or validation sets when the test set is not possible, and this was met for these two cases in [2].

The third case in [2] consisted of individually modeling eleven (11), two-week, type 1 diabetes data sets, with twelve inputs (x_i’s), originally modeled in [3]. For each data set, its first week is used as training data and its second week as validation data. The sensor glucose concentration (SGC) sampling rate of five (5) minutes resulted in two very large data sets for each of the eleven modeling cases. The critical complexity of this case is that it requires forecast modeling for closed-loop forecast control, a future objective. Thus, all the inputs must have a model deadtime greater than or equal to the effective deadtime (θ_MV) of the manipulated variable (MV) unless it has a scheduled (known) change (e.g., a meal) and a deadtime less than θ_MV, called an “announcement input”. These requirements were not followed in [3] and it is, therefore, not applicable to the modeling objectives of this work. The modeling strategy in [2] estimates θ_MV first, then any announcement inputs, and then all other inputs, using a one-input simple linear regression structure, to obtain initial estimates of the dynamic parameters for each input separately. After completion of this step for all the inputs, a full second-order dynamic structure and first-order static structure strategy was used to obtain final parameter estimates. This approach resulted in an average validation set

r_{f i t, v a l}

of 0.68 and a maximum

r_{f i t, v a l}

of 0.77, considerably below the individual goal of

r_{f i t, v a l} \geq 0.9

.

Our hypothesis is that [2] used an effective dynamic structure and estimated the dynamic parameters sufficiently accurately for the eleven (11) SGC cases. However, the first-order linear regression static structure did not, and cannot, accurately capture the complex static forecast nature of SGC in these data sets. Thus, the overall goal of this work is the development of a two-stage modeling methodology for highly dynamic and highly complex static behavior that achieves the modeling goal of

r_{f i t, v a l} \geq 0.9

, for one or more of the eleven SGC data sets. While our process example is SGC, it was selected because of its highly dynamic and highly complex static nature. Thus, it is not the objective of this work to focus on the issues related to advancing SGC modeling but to present a general approach for modeling highly dynamic and highly complex static processes.

More specifically, the approach of this work is to use the

{\hat{V}}_{i}

’s results in [2], for each model i, as the first stage in a two-stage W-PINN approach, where the second stage is a nonlinear static ANN structure. Note that empirical dynamic ANN modeling and PINN modeling methods are outside this scope since they do not use

{\hat{V}}_{i}

’s (i.e., ANN) or do not use one for each input (i.e., PINN). W-PINN is the only methodology that can directly use

{\hat{V}}_{i}

’s results obtained in [2] and in any context.

Moreover, the objective of this work is the development of an effective W-PINN modeling approach for systems with significantly varying input dynamic behavior and complex nonlinear static behavior. More specifically, by using the Stage 1 dynamic modeling results obtained in [2], the objective of this work is to significantly increase

r_{f i t, v a l}

using a novel, proposed, two-stage (TS), W-PINN modeling approach. Two types of TS methodologies are proposed, one that uses JMP and one that uses Python coding to develop a novel input factor W-PINN approach.

The classical ANN approach is a one-box, empirical modeling methodology as illustrated in Figure 1a. As shown, p measured x_i inputs enter the ANN function,

\hat{f} (X)

, which is an estimated (as denoted by “^”) empirical function with constant coefficients that are adjusted under some criterion, commonly least-squares estimation, to maximize agreement (i.e., fit) between its modeled output,

\hat{y} = \hat{f} (X)

, and its measured output, y (e.g., SGC), where

X

is a pth dimensional vector of measured inputs. In Figure 1a,

\hat{f} (X)

can be a static function, e.g., a nonlinear regression function, or a combined static and empirically dynamic function of lag variables, e.g., a Long Short-Term Memory (LSTM) [4,5] function.

Phenomenologically dynamic and empirically static methodologies are illustrated in Figure 1b (PINN [6]) and Figure 1c (W-PINN [1]). Refs. [6,7] named their methodology “physically-informed-neural-network” (PINN). The critical difference between Figure 1a (classical ANN) and Figure 1b,c is the number of stages. Figure 1a has one stage for static and dynamic structures, and the two-stage methods in Figure 1b,c have one stage for static structures and one stage for dynamic structures. The PINN dynamic block is not restricted to linear dynamic structures as it is for W-PINN. However, nonlinear dynamic behavior is modeled when the dynamic outputs of W-PINN (i.e.,

{\hat{v}}_{p}

’s) are passed through

\hat{f} (\hat{V})

. A critical advantage of W-PINN over PINN is that each input has its own dynamic model structure, as illustrated by comparing Figure 1b,c.

The next section, Section 2, describes the W-PINN methodology fundamentally and mathematically. The theoretical structure of a general and complete second-order dynamic structure is first given in Section 2.1 as a differential equation and then transformed to its discrete-time version using backward difference derivatives. It then gives the explicit equation for

{\hat{v}}_{i, t}

, where “t” is the sampling time. Section 2.2 gives important Stage 1 details and results that assist in understanding the Stage 2 methodology. Section 2.3 describes the JMP and Python Stage 2 methodologies. Section 2.4 gives the mathematical details of Model 1 (input-only model), Model 2 (input-output model), and Model 1–2, a combination of the strengths of Models 1 and 2. Section 2.5 gives the forecast model structure, i.e., the structure for

{\hat{y}}_{t + k Δ t}

. Finally, Section 2.6 gives the information and equations for the summary statistics.

Section 3 gives a table with all the numerical results for the three modeling methods. It also gives Model 1, Model 2, and Model 1–2 graphical results for the best fitting Stage 2 subject, Subject 2. For all three models, their

r_{f i t, v a l}

is 0.93. Section 4 gives a discussion of the results and Section 5 comments on work in progress and speculates on other possible future directions.

2. Materials and Methods

This section describes the two-stage W-PINN methodology in detail. It also gives critical Stage 1 SGC modeling particulars used by [2] to obtain the

{\hat{v}}_{i, t}

’s posted on the website of the last author (see https://drollins9.wixsite.com/derrickrollins, accessed on 6 October 2025). With both the

x_{i, t}

’s and

{\hat{v}}_{i, t}

’s posted on this website, modelers have the option of using these data sets to build both stages or just the second stage, the aim of this work.

With the sampling rate, Δt, equal to 5 min, our SGC models are forecasting 12 steps (i.e., 60 min) into the future. The forecast nature of this work is an artifact of the data sets we are using and their application and, thus, not a necessity of the methodology. As described in [2], 12Δt is the estimated observable time it took for the manipulated variable (MV), exogenous insulin, to cause SGC to start decreasing after a bolus increase (or insulin injection). The 60 min estimate was very consistent for the subjects in this clinical study, as noted in [2].

The models that this work develops are for an unobservant, 60 min, forecast monitoring scenario. “Unobservant” is meant to convey the protocol that the person(s) determining insulin changes have no knowledge of the forecast (

{\hat{y}}_{i j}

) estimates. In addition, even though the authors of [2] worked diligently to obtain models that minimize

{\hat{y}}_{i j}

pairwise correlation, we note that it is still significantly present. Thus, for this reason, this work is best understood as an unobservant monitoring application and not applicable to closed-loop control.

2.1. W-PINN

Our W-PINN approach uses backward difference derivatives (BDD) to discretize second-order-plus-dead-time-plus-lead (SOPDTPL) theoretical dynamic systems, the only type used in this work, as given in Equation (1) below (for details of this methodology, see [8]). The dynamic system does not have to be initially at a steady state for our W-PINN modeling methodology since the initial conditions are also estimated. Note that Equation (1) applies to each of the p inputs.

τ_{i}^{2} \frac{d^{2} v_{i} (t)}{d t^{2}} + 2 τ_{i} ζ_{i} \frac{d v_{i} (t)}{d t} + v_{i} (t) = τ_{a i} \frac{d x_{i} (t - θ_{i})}{d t} + x_{i} (t - θ_{i})

(1)

with

E [y (t)] = f (V (t))

(2)

where

t \geq θ_{i} \geq 0

,

τ_{i} > 0

, and

ζ_{i} \geq 0

for i = 1, …, p. x_i(t) is the value of the ith input variable at t, and v_i(t) is the value of the ith output variable at t, in the units of x_i; y(t) is the output variable in its units at t,

E [y (t)]

means the expected value (i.e., true mean) of y(t); and

f (V (t))

is the true output (gain) function of V(t), the vector of the v_i(t)’s. For input i,

θ_{i}

is the deadtime,

ζ_{i}

is the damping coefficient,

τ_{i}

is the primary time constant, and

τ_{a i}

is the lead time constant. When

f (V (t))

is a nonlinear function of V(t), as in the ANN (i.e., W-PINN) case, Equations (1) and (2), taken together, have a Wiener block-oriented structure [8], as shown in Figure 1c.

The lead term is the first term on the right side of the equal sign in Equation (1). This term tends to “speed up” the response and provides what the process modeling and control community has termed “numerator dynamics” [8,9,10]. Ref. [8] developed a second-order, multiple-input, single-output, discrete-time, nonlinear Wiener dynamic approach using BDD based on Equation (1). More specifically, using BDD approximation applied to a sampling interval of Δt, an approximate discrete-time form of Equation (1) is

{\hat{v}}_{i, t} = \begin{cases} {\hat{δ}}_{1, i} {\hat{v}}_{i, t - Δ t} + {\hat{δ}}_{2, i} {\hat{v}}_{i, t - 2 Δ t} + {\hat{ω}}_{1, i} x_{i, t - {\hat{θ}}_{i} - Δ t} + {\hat{ω}}_{2, i} x_{i, t - {\hat{θ}}_{i} - 2 Δ t}, & t > {\hat{θ}}_{i} = {\hat{m}}_{i} Δ t \\ {\hat{v}}_{{\hat{θ}}_{i}} = {\hat{v}}_{{\hat{m}}_{i} Δ t}, & t = {\hat{θ}}_{i} = {\hat{m}}_{i} Δ t \\ \text{is undefined}, & t < {\hat{θ}}_{i} = {\hat{m}}_{i} Δ t \end{cases}

(3)

with

{\hat{δ}}_{1, i} = \frac{2 {\hat{τ}}_{i}^{2} + 2 {\hat{τ}}_{i} {\hat{ζ}}_{i} Δ t}{{\hat{τ}}_{i}^{2} + 2 {\hat{τ}}_{i} {\hat{ζ}}_{i} Δ t + Δ t^{2}}

(4)

{\hat{δ}}_{2, i} = \frac{- {\hat{τ}}_{i}^{2}}{{\hat{τ}}_{i}^{2} + 2 {\hat{τ}}_{i} {\hat{ζ}}_{i} Δ t + Δ t^{2}}

(5)

{\hat{ω}}_{1, i} = \frac{({\hat{τ}}_{a i} + Δ t) Δ t}{{\hat{τ}}_{i}^{2} + 2 {\hat{τ}}_{i} {\hat{ζ}}_{i} Δ t + Δ t^{2}}

(6)

where

{\hat{ω}}_{2, i} = 1 - {\hat{δ}}_{1, i} - {\hat{δ}}_{2, i} - {\hat{ω}}_{1, i}

to satisfy the unity gain constraint. From Equation (3) with

t > {\hat{θ}}_{i} + 2 Δ t

,

\begin{array}{l} {\hat{v}}_{i, t} - {\hat{δ}}_{1, i} {\hat{v}}_{i, t - Δ t} - {\hat{δ}}_{2, i} {\hat{v}}_{i, t - 2 Δ t} \\ = {\hat{ω}}_{1, i} x_{i, t - ({\hat{θ}}_{i} + Δ t)} + {\hat{ω}}_{2, i} x_{i, t - ({\hat{θ}}_{i} + 2 Δ t)} \end{array}

(7)

\begin{array}{l} \Rightarrow (1 - {\hat{δ}}_{1, i} B - {\hat{δ}}_{2, i} B^{2}) {\hat{v}}_{i, t} \\ = ({\hat{ω}}_{1, i} B^{{\hat{θ}}_{i} + 1} + {\hat{ω}}_{2, i} B^{{\hat{θ}}_{i} + 2}) x_{i, t} \end{array}

(8)

\Rightarrow G_{i, t} = \frac{{\hat{v}}_{i, t}}{x_{i, t}} = \frac{{\hat{ω}}_{1, i} B^{{\hat{θ}}_{i} + 1} + {\hat{ω}}_{2, i} B^{{\hat{θ}}_{i} + 2}}{1 - {\hat{δ}}_{1, i} B - {\hat{δ}}_{2, i} B^{2}}

(9)
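
As a check on Equations (4) and (5) and on the unity gain constraint, the short derivation below substitutes the standard backward-difference approximations into the left side of Equation (1) and collects terms. It is a sketch, not the derivation given in [8]; D_i is shorthand introduced here for the common denominator, and hats are omitted for brevity.

```latex
% Backward differences with sampling interval \Delta t:
%   dv/dt     \approx (v_t - v_{t-\Delta t}) / \Delta t
%   d^2v/dt^2 \approx (v_t - 2 v_{t-\Delta t} + v_{t-2\Delta t}) / \Delta t^2
\begin{aligned}
&\tau_i^2 \,\frac{v_{i,t} - 2 v_{i,t-\Delta t} + v_{i,t-2\Delta t}}{\Delta t^2}
  + 2 \tau_i \zeta_i \,\frac{v_{i,t} - v_{i,t-\Delta t}}{\Delta t} + v_{i,t}
  = \text{(right side of Eq. (1))} \\[4pt]
&\Rightarrow\; D_i \, v_{i,t}
  = (2\tau_i^2 + 2\tau_i\zeta_i\Delta t)\, v_{i,t-\Delta t}
  - \tau_i^2 \, v_{i,t-2\Delta t}
  + \Delta t^2 \,\text{(right side)},
  \qquad D_i = \tau_i^2 + 2\tau_i\zeta_i\Delta t + \Delta t^2 \\[4pt]
&\Rightarrow\; \delta_{1,i} = \frac{2\tau_i^2 + 2\tau_i\zeta_i\Delta t}{D_i},
  \qquad \delta_{2,i} = \frac{-\tau_i^2}{D_i} \\[4pt]
% Closed form of omega_2 implied by the unity gain constraint and Eq. (6):
&\omega_{2,i} = 1 - \delta_{1,i} - \delta_{2,i} - \omega_{1,i}
  = \frac{D_i - (2\tau_i^2 + 2\tau_i\zeta_i\Delta t) + \tau_i^2 - (\tau_{ai} + \Delta t)\Delta t}{D_i}
  = \frac{-\tau_{ai}\,\Delta t}{D_i} \\[4pt]
% Steady-state gain of Eq. (9), obtained by setting B = 1:
&G_i \big|_{B = 1} = \frac{\omega_{1,i} + \omega_{2,i}}{1 - \delta_{1,i} - \delta_{2,i}} = 1
\end{aligned}
```

The last line shows why the constraint is called unity gain: a constant input x_i produces, at steady state, a dynamic output v_i equal to that input, so all static gain is carried by the static function.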

After obtaining

{\hat{v}}_{i, t}

for each input i, the modeled output value, at time t, is determined by entering these results into

\hat{f} ({\hat{V}}_{t})

, a static ANN in this application, i.e.,

{\hat{y}}_{t} = \hat{f} ({\hat{V}}_{t})

(10)
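
The two blocks of the Wiener structure can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names, the parameter values, the choice to hold the first samples at an initial value, and the made-up static nonlinearity are all assumptions introduced here. Only the coefficient formulas and the recursion come from Equations (3)–(6) and (10).

```python
import math


def bdd_coefficients(tau, zeta, tau_a, dt):
    """Coefficients of Equations (4)-(6) plus omega_2 from the unity gain constraint."""
    denom = tau ** 2 + 2.0 * tau * zeta * dt + dt ** 2
    delta1 = (2.0 * tau ** 2 + 2.0 * tau * zeta * dt) / denom
    delta2 = -(tau ** 2) / denom
    omega1 = (tau_a + dt) * dt / denom
    omega2 = 1.0 - delta1 - delta2 - omega1
    return delta1, delta2, omega1, omega2


def simulate_v(x, coeffs, m, v_init=0.0):
    """First branch of Equation (3) on sample indices, with a deadtime of m samples.

    Assumption: the first m + 2 samples are held at v_init so that every
    index used on the right side is non-negative.
    """
    d1, d2, w1, w2 = coeffs
    v = [v_init] * len(x)
    for n in range(m + 2, len(x)):
        v[n] = d1 * v[n - 1] + d2 * v[n - 2] + w1 * x[n - m - 1] + w2 * x[n - m - 2]
    return v


def wiener_output(v_by_input, static_f):
    """Equation (10): apply the static function to the vector of v's at each sample."""
    return [static_f(v_t) for v_t in zip(*v_by_input)]


# Hypothetical parameter values (in minutes) and a made-up static nonlinearity.
coeffs = bdd_coefficients(tau=20.0, zeta=1.0, tau_a=5.0, dt=5.0)
x_step = [1.0] * 400                        # unit step input
v_step = simulate_v(x_step, coeffs, m=12)   # 12 samples = 60 min deadtime
y_hat = wiener_output([v_step], lambda v_t: 100.0 + 50.0 * math.tanh(v_t[0]))
```

With a unit step input, the dynamic output settles at 1 because of the unity gain constraint; the static function alone then sets the output level.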

2.2. Stage 1 Modeling Method

This subsection gives important Stage 1 details and results that assist in understanding the Stage 2 methodology. While missing output (i.e., SGC) measurements are acceptable for discrete-time modeling, missing input values are not. Activity tracker data were the only missing input data. These missing values were estimated by averaging the two values bounding a gap (one on each side) and filling in the gap with this value. Some gaps were several hours long. Blocked cross-validation [11,12] was used to guard against overfitting, with the first week as the training (Tr) data set and the second week as the validation (Val) data set.
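
The gap-filling rule described above can be sketched as follows. This is a minimal illustration, assuming that missing samples are marked with None; the function name and the handling of gaps that touch the start or end of the series (where only one neighbor exists) are assumptions introduced here, since the source does not describe that case.

```python
def fill_gaps(values):
    """Fill each run of None with the mean of the two observed values bounding it.

    Assumption: a gap at the very start or end of the series has only one
    observed neighbor, so that single value is used as the fill.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        left = filled[start - 1] if start > 0 else None
        right = filled[i] if i < n else None
        neighbors = [z for z in (left, right) if z is not None]
        if not neighbors:
            raise ValueError("series has no observed values")
        fill = sum(neighbors) / len(neighbors)
        for j in range(start, i):
            filled[j] = fill
    return filled
```

Because every sample in a gap receives the same value, a gap of several hours becomes a flat segment in the input, which is worth keeping in mind when interpreting the dynamic response to that input over such periods.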

In Stage 1, all inputs were first modeled separately on their own Excel worksheet with a first-order linear regression static function. The tool that one chooses to use for this step, as well as all the steps, is a matter of preference. However, we encourage modelers to break the modeling process down for this, and all large/complex data sets, as we have for this case. Note that for the other two modeling cases in [2], this decomposition procedure was not used.

For each case, insulin was modeled first. The estimated deadtime, i.e.,

{\hat{θ}}_{M V}

, was set at 60 min and was varied one Δt forwards and backwards at a time to find the value that gave the best fit. For all the inputs, the estimate of θ_MV,

{\hat{θ}}_{M V}

, was determined to be 60 min, i.e., 12 Δt. The food variables were the only ones with announcements, and the carbohydrate input was the only one found to have a deadtime less than

{\hat{θ}}_{M V}

(see [8] for details on how to incorporate announced inputs into this approach). The “time of day” input has no deadtime, and the deadtime for all other inputs was

{\hat{θ}}_{M V}

, except for fats, which had deadtimes much larger than

{\hat{θ}}_{M V}

, as determined by model estimation. After estimating the dynamic coefficients for each input (i.e., Equations (4)–(6)), these values were copied to an Excel worksheet as the dynamic structure starting values for fitting the SOPDTPL dynamic, and first-order static, multiple-input model (i.e., Equation (10)). With

\hat{f} (\hat{V})

as a first-order multiple linear regression static function,

\hat{V}

, for Stage 2 was determined.

2.3. Stage 2 Model Development Modeling Methods

The objective of this work is the development, evaluation, and comparison of two, Stage 2, W-PINN modeling approaches using

{\hat{V}}_{t}

from Stage 1 to obtain

\hat{f} ({\hat{V}}_{t})

for each of the eleven data sets for the two approaches. The first approach used the JMP ANN toolbox to approximately find the smallest SSE (i.e., SSR) by fitting many cases and selecting the best one. All analyses were conducted using JMP Pro Version 16 (SAS Institute Inc, Cary, NC, USA) for neural network construction, preprocessing, and model optimization. More specifically, JMP was used to create and optimize a three-layer ANN structure. The ANN model began as a fully connected single-layer perceptron, functioning as a decision-making node. This architecture consisted of three layers: the input layer, a hidden transfer layer, and the output layer. Each node in the transfer layer received weighted inputs from the input layer, and the final predictions were made based on the output layer’s activations. The transfer functions used within the model were a combination of linear and Gaussian transformations, and boosting techniques were applied to enhance the model’s performance. To ensure the inputs were appropriately scaled and transformed, continuous covariates were preprocessed by fitting them to a Johnson Su distribution. Using maximum likelihood estimation, this preprocessing step helped transform the data closer to normality, thereby mitigating the effects of skewed distributions and outliers. The general fitting approach aimed to minimize the negative log-likelihood of the observed data, augmented by a penalty function to regulate the model complexity. Specifically, a sum-of-squares penalty, applied to a scaled and centered subset of the parameters, was used to address the overfitting problem that often occurs with ANN models. This penalty was based on the magnitude of the squared residuals (i.e.,

e_{i} = y_{i} - {\hat{y}}_{i}

), helping to stabilize the parameter estimates and improve the model’s optimization. Cross-validation was performed using the holdout method to assess the model’s ability to generalize to new data. The training set consisted of the initial data range, while the validation set represented future observations, ensuring the model’s predictive capacity for unseen data was tested.

The second approach is our newly developed ANN structure, coded in Python using a tanh activation function and the dual annealing solver from the scipy.optimize library. As Section 3 will show, both nonlinear regression, static stage-two ANN approaches (JMP and Python) significantly improved the fit in comparison with the first-order linear regression, static stage-two approach used by [2].

Our W-PINN JMP approach uses a classical ANN p-input variable layer (see Figure 1) where

x_{i}

enters node i, i = 1, …, p, as shown in Figure 1c. In contrast, our proposed W-PINN Python approach uses a q-input factor layer where

q \geq p

(see Figure 2). For example, it uses terms like quadratic factors (e.g.,

x_{1}^{2}

) and interaction factors (e.g.,

x_{1} x_{2}

) as inputs to the input layer, where

w_{i}

represents input factor i (e.g.,

w_{5} = x_{1}^{2}

). Moreover, our proposed W-PINN Python methodology uses q input factors, and not p input variables, as shown in Figure 2 below.
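
The expansion from p input variables to q input factors can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function name, the flags, and the ordering of the factors (linear terms, then squares, then pairwise products) are choices made here. With p = 4, this ordering happens to place the square of the first input in the fifth position, matching the example given above.

```python
from itertools import combinations


def build_input_factors(inputs, include_squares=True, include_interactions=True):
    """Expand p input variables into q >= p input factors.

    Assumed order: linear terms, then squares, then pairwise products.
    """
    factors = list(inputs)
    if include_squares:
        factors += [u * u for u in inputs]
    if include_interactions:
        factors += [u * w for u, w in combinations(inputs, 2)]
    return factors
```

For p inputs, this sketch yields q = 2p + p(p − 1)/2 factors when both flags are on, e.g., 14 factors for p = 4. Which factors are actually retained in the Python W-PINN is not specified in this section.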

2.4. Three Input Models

We developed three types of input model structures for this application. The first one we call the “input only model” or “Model 1.” All the inputs in this structure have a deadtime

\geq {\hat{θ}}_{M V}

except for announcement inputs that can have deadtimes less than

{\hat{θ}}_{M V}

like carbohydrates, equal to

{\hat{θ}}_{M V}

like proteins, greater than

{\hat{θ}}_{M V}

like fats, and zero like time of day, as mentioned above.

The second one we call the “input-output model” or “Model 2.” It combines the input-only structure of Model 1 (i.e., Equation (10)) with a model of weighted residuals (i.e., bias correction, see [8]), a minimum of

{\hat{θ}}_{M V}

distance in the past (note that this is model building and not model forecasting), as shown in Equation (11) below (see [8] for the derivation).

\begin{array}{l} {\hat{y}}_{t} = \hat{f} ({\hat{V}}_{t}) + {\hat{ϕ}}_{1} (y_{t - {\hat{θ}}_{M V}} - {\hat{y}}_{t - {\hat{θ}}_{M V}}) \\ + {\hat{ϕ}}_{2} (y_{t - {\hat{θ}}_{M V} - Δ t} - {\hat{y}}_{t - {\hat{θ}}_{M V} - Δ t}) + \dots \\ = \hat{f} ({\hat{V}}_{t}) + {\hat{ϕ}}_{1} e_{t - {\hat{θ}}_{M V}} + {\hat{ϕ}}_{2} e_{t - {\hat{θ}}_{M V} - Δ t} + \dots \end{array}

(11)

Equation (11) has no value if any residual is not determinable due to missing output measurements. Thus, unlike Model 1, which has estimates for all t since it uses only input data, Model 2 will not have an estimate when an output value is missing.

The final model, Model 1–2, is a combination of the strengths of Models 1 and 2. More specifically, for Model 1–2,

{\hat{y}}_{t} = \begin{cases} \text{Model 2 } {\hat{y}}_{t}, & \text{if its value exists} \\ \text{Model 1 } {\hat{y}}_{t}, & \text{if its Model 2 value does not exist} \end{cases}

(12)
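
Equations (11) and (12) can be sketched as follows. This is an illustration only: the function names are hypothetical, missing measurements are assumed to be marked with None, and the residuals are assumed to be computed against the Model 1 fit, which is one reading of Equation (11) and is not confirmed by this section.

```python
def model_2(y, y1_hat, phis, lag):
    """Equation (11): Model 1 fit plus weighted residuals at least `lag` samples old.

    y      : measured output, with None where a measurement is missing
    y1_hat : Model 1 fit, available at every sample
    phis   : residual weights (phi_1, phi_2, ...)
    lag    : deadtime of the manipulated variable, in samples
    Returns None wherever a required residual cannot be determined.
    """
    out = [None] * len(y1_hat)
    for t in range(len(y1_hat)):
        correction = 0.0
        for j, phi in enumerate(phis):
            k = t - lag - j
            if k < 0 or y[k] is None:
                correction = None
                break
            correction += phi * (y[k] - y1_hat[k])
        if correction is not None:
            out[t] = y1_hat[t] + correction
    return out


def model_1_2(y1_hat, y2_hat):
    """Equation (12): use Model 2 where it exists, otherwise fall back to Model 1."""
    return [y2 if y2 is not None else y1 for y1, y2 in zip(y1_hat, y2_hat)]
```

The fallback is what gives Model 1–2 an estimate at every sample: Model 1 depends only on inputs, so it fills exactly those times at which a missing measurement leaves the bias correction undefined.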

2.5. Forecast Structures

Equation (10),

{\hat{y}}_{t} = \hat{f} ({\hat{V}}_{t})

, is the fitted structure for Model 1, i.e., estimate of the output,

y_{t}

, at the current time, t. There are no missing input values in Equation (10). This is why missing armband data had to be estimated. In addition, non-announcement input values to obtain Equation (10) must be at least a distance of

{\hat{θ}}_{M V}

in the past. This requirement is because the model developed input lag must be the same as the forecast input lag, i.e.,

\hat{V}

at t, for forecasting a

{\hat{θ}}_{M V}

distance into the future.

After obtaining

{\hat{V}}_{t}

, its transformation into the kΔt forecast form, i.e., the online version, is given by Equation (13) below:

{\hat{y}}_{t + k Δ t} = \hat{f} ({\hat{V}}_{t + k Δ t})

(13)

where, from Equation (7) with t replaced by t + kΔt,

\begin{array}{l} {\hat{v}}_{i, t + k Δ t} = {\hat{δ}}_{1, i} {\hat{v}}_{i, t + (k - 1) Δ t} + {\hat{δ}}_{2, i} {\hat{v}}_{i, t + (k - 2) Δ t} \\ + {\hat{ω}}_{1, i} x_{i, t - ({\hat{θ}}_{i} - (k - 1) Δ t)} + {\hat{ω}}_{2, i} x_{i, t - ({\hat{θ}}_{i} - (k - 2) Δ t)} \end{array}

(14)

Note that if k = 12 and Δt = 5 min, Equation (14) is forecasting 60 min into the future. Thus, all the non-announcement inputs must use a model building and forecast prediction deadtime of at least 60 min.
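
The deadtime requirement can be made concrete with a small sketch of the recursion in Equation (14). The function below is hypothetical and covers non-announcement inputs only: it advances the dynamic output k samples past the current time using only input samples that have already been observed, which is possible when the deadtime (in samples) is at least k.

```python
def forecast_v(x_hist, v_hist, coeffs, m, k):
    """Advance Equation (14) k samples beyond the last sample of the histories.

    x_hist, v_hist : input and dynamic-output samples for indices 0..t
    coeffs         : (delta1, delta2, omega1, omega2)
    m              : deadtime in samples; m >= k is required so that every
                     input used has an index no later than t - 1
    """
    if m < k:
        raise ValueError("deadtime is shorter than the forecast horizon")
    if len(x_hist) != len(v_hist) or len(x_hist) < m + 2:
        raise ValueError("histories must be equally long with at least m + 2 samples")
    d1, d2, w1, w2 = coeffs
    t = len(x_hist) - 1
    v = list(v_hist)
    for step in range(1, k + 1):
        n = t + step
        v.append(d1 * v[n - 1] + d2 * v[n - 2]
                 + w1 * x_hist[n - m - 1] + w2 * x_hist[n - m - 2])
    return v[-1]
```

With m = k = 12, the latest input sample used is the one at t − Δt, so no future input values are needed. An input with a shorter deadtime would require values that have not yet occurred, which is why such inputs must be announced in advance.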

2.6. Statistical Analyses

Formal statistical inference tools and methodologies, such as confidence intervals and hypothesis testing, are not applicable to dynamic modeling because response data are time-correlated, an inherent feature of time delay and time lag (i.e., dynamic) behavior [12]. Thus, it is not possible to randomize the occurrence (i.e., time order) of trials. Consequently, only informal inference, i.e., direct comparison of the values of statistical features (numerical and/or visual), is applicable to dynamic modeling. This work uses the following statistics.

The first, and most important, modeling statistic is

r_{f i t}

(which is bounded between −1 and 1), the fitted correlation of the measured SGC,

y_{i}

, and the fitted SGC,

{\hat{y}}_{i}

, as given in Equation (15) below.

r_{f i t} = r_{y_{t}, {\hat{y}}_{t}} = \frac{\sum_{i = 1}^{n} (y_{i} - \bar{y}) ({\hat{y}}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} \cdot \sqrt{\sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{\hat{y}})}^{2}}}

(15)

where n is the number of samples in the set and the bar above a statistic denotes its sample mean. The equations for the average absolute difference (AAD) and the average difference (AD) between the measured and fitted values are, respectively,

\mathrm{AAD} = \frac{\sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|}{n}

(16)

\mathrm{AD} = \frac{\sum_{i = 1}^{n} (y_{i} - {\hat{y}}_{i})}{n}

(17)

The equation for SSE, the more common name and the one used by JMP (also written SSR, the sum of squared residuals), is

\mathrm{SSE} = \mathrm{SSR} = \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(18)
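
Equations (15)–(18) translate directly into code. The sketch below is an illustration with a hypothetical function name; skipping pairs with a missing value (marked None) is an assumption made here to reflect the missing SGC measurements discussed elsewhere in this work.

```python
import math


def summary_statistics(y, y_hat):
    """Return r_fit, AAD, AD, and SSE per Equations (15)-(18).

    Assumption: pairs in which either value is None are skipped.
    """
    pairs = [(a, b) for a, b in zip(y, y_hat) if a is not None and b is not None]
    n = len(pairs)
    y_bar = sum(a for a, _ in pairs) / n
    yh_bar = sum(b for _, b in pairs) / n
    s_yyh = sum((a - y_bar) * (b - yh_bar) for a, b in pairs)
    s_yy = sum((a - y_bar) ** 2 for a, _ in pairs)
    s_hh = sum((b - yh_bar) ** 2 for _, b in pairs)
    return {
        "r_fit": s_yyh / (math.sqrt(s_yy) * math.sqrt(s_hh)),
        "AAD": sum(abs(a - b) for a, b in pairs) / n,
        "AD": sum(a - b for a, b in pairs) / n,
        "SSE": sum((a - b) ** 2 for a, b in pairs),
    }
```

A fit that is offset from the measurements by a constant still has a correlation of 1, so the correlation alone does not reveal bias; AD and AAD are the statistics that capture it.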

The loss function

J

used by the Python program, shown in Equation (19), is the SSE over the training data plus a penalty on the ANN parameter vector σ, which is given in Equation (20).

J (σ) = \sum_{k = \text{starting\_data\_lim}}^{\text{Training\_data\_lim}} {(y (k) - \hat{y} (k))}^{2} + 10{,}000 \cdot ‖σ^{2}‖

(19)

where

σ = {[a_{11}, b_{11}, a_{22}, b_{22}, \dots, a_{12, 12}, b_{12, 12}, w_{1}, \dots, w_{12}, b_{w}, y_{a}, y_{b}]}^{T}

(20)
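
A penalized loss of the form in Equation (19) can be sketched as below. Much of this is assumed rather than taken from the source: the forward pass (one tanh unit per input, a weighted sum, and an output scale and offset) was chosen only so that the parameter count and ordering match Equation (20); the penalty term is read here as the sum of squared parameters; and all function names are hypothetical. It should not be taken as the authors' network.

```python
import math

P = 12  # number of dynamic outputs per sample, matching the indices in Equation (20)


def unpack(sigma):
    """Split sigma = [a_11, b_11, ..., a_PP, b_PP, w_1..w_P, b_w, y_a, y_b]."""
    a = sigma[0:2 * P:2]
    b = sigma[1:2 * P:2]
    w = sigma[2 * P:3 * P]
    b_w, y_a, y_b = sigma[3 * P:3 * P + 3]
    return a, b, w, b_w, y_a, y_b


def predict(sigma, v_row):
    """Assumed forward pass: per-input tanh units, weighted sum, output scaling."""
    a, b, w, b_w, y_a, y_b = unpack(sigma)
    hidden = [math.tanh(a[i] * v_row[i] + b[i]) for i in range(P)]
    return y_a * (sum(w[i] * hidden[i] for i in range(P)) + b_w) + y_b


def loss(sigma, v_rows, y, penalty_weight=10_000.0):
    """SSE over samples with a measurement, plus the parameter penalty."""
    sse = sum((yk - predict(sigma, vk)) ** 2
              for vk, yk in zip(v_rows, y) if yk is not None)
    penalty = penalty_weight * sum(s * s for s in sigma)  # assumed reading of the norm
    return sse + penalty
```

A function with this signature can be passed as `func` to `scipy.optimize.dual_annealing`, with `args=(v_rows, y)` and a list of (lower, upper) bounds, one pair per parameter. The bounds used in this work are not stated in this section.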

3. Results

Training and validation Stage 1 and Stage 2 summary statistics for the eleven subjects are given in Table 1. All the results are

r_{f i t}

unless indicated otherwise. Recall that each subject has a fixed

{\hat{V}}_{t}

that was determined in Stage 1 using the first-order static structure as given in Equation (21) below:

{\hat{y}}_{t} = {\hat{a}}_{0} + {\hat{a}}_{1} {\hat{v}}_{1, t} + \dots + {\hat{a}}_{p} {\hat{v}}_{p, t}

(21)

As shown in Table 1, Stage 1, Model 1,

r_{f i t, v a l}

results varied from 0.59 to 0.77, with a mean of 0.68. Moreover, Stage 2, Model 1,

r_{f i t, v a l}

results improved significantly over the Stage 1 results for both ANN approaches. As shown, JMP Stage 2, Model 1,

r_{f i t, v a l}

results varied from 0.60 to 0.85, with a mean of 0.74. However, Python Stage 2, Model 1,

r_{f i t, v a l}

results are significantly better than JMP, varying from 0.72 to 0.93, with a mean of 0.82. As a result, Model 2 training and validation results and Models 1–2 validation results are given in Table 1 for Python only. From Model 1 to Model 2, the Python mean

r_{f i t, v a l}

increased from 0.82 to 0.87, the minimum from 0.72 to 0.80, and the maximum of 0.93 did not change. In summary, Python Stage 2 results improved considerably over Stage 1 results and are significantly better than JMP Stage 2 results.

Graphical Python Stage 2 fitted and measured SGC results for Subject 2 (the best case) are given (i.e., plotted) in Figure 3, Figure 4 and Figure 5. Figure 3 shows Model 1 training and validation. Figure 4 is Model 2 training and validation. Figure 5 presents the combined validation results, where Model 1 is plotted when there is no output data, and Model 2 is plotted when there is output data, i.e., the Model 1–2 validation plot. The Model 1–2 plots are associated with the results in the last three columns in Table 1. Figure 5 shows the excellent fit of Model 2 and the highly realistic behavior of Model 1 when Model 2 results are not possible because of missing SGC data (see Equation (11)).

The Python Stage 2, Model 1 contains fewer than 65 trainable parameters and performs 60 min-ahead inference in 0.9–1.4 ms on an Intel Core i7-11850H. This can be extrapolated to an estimated <20 ms on an ARM Cortex-M7 microcontroller, making it highly suitable for real-time embedded deployment. The linear dynamic stage is analytically stable with all poles inside the unit circle, the shallow tanh network is Lipschitz continuous, and the optimization consistently converges across all 11 subjects [13]. Compared with an equivalent single-stage LSTM (~65 k parameters), the proposed method could train 4–6× faster, requires orders of magnitude less memory, and achieves superior validation performance, confirming excellent computational efficiency and numerical stability for practical diabetes monitoring and control applications.
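
The pole condition mentioned above can be checked numerically for any estimated pair of autoregressive coefficients. The sketch below is illustrative, with hypothetical function names: it finds the roots of the characteristic polynomial obtained from the denominator of Equation (9) by substituting B = 1/z, and tests whether both lie strictly inside the unit circle.

```python
import cmath


def dynamic_poles(delta1, delta2):
    """Roots of z^2 - delta1*z - delta2 = 0, i.e., the poles of Equation (9)."""
    disc = cmath.sqrt(delta1 ** 2 + 4.0 * delta2)
    return (delta1 + disc) / 2.0, (delta1 - disc) / 2.0


def is_stable(delta1, delta2):
    """True when both poles lie strictly inside the unit circle."""
    return all(abs(z) < 1.0 for z in dynamic_poles(delta1, delta2))
```

For example, hypothetical values of 20 min for the primary time constant, 1 for the damping coefficient, and 5 min for Δt give coefficients of 1.6 and −0.64 from Equations (4) and (5), which correspond to a repeated pole at 0.8.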

In this study, two physically based virtual forecasting sensor approaches were developed for obtaining the value of the response variable, SGC, a θ_MV time distance in the future, for a two-stage forecast modeling application. The first stage, a physically (i.e., theoretically) based dynamic modeling approach [2], estimates the physically interpretable dynamic parameters from the measured inputs (x_i’s) with multiple physical constraints to obtain dynamic outputs (v_i’s). The v_i’s are the inputs to the second stage, a static ANN structure. For the first method, this structure was determined by using the ANN toolbox in JMP. For the second method, this structure was coded using Python. Both methods resulted in large average improvements over the Stage 1 results using a first-order linear regression static structure (see Table 1). In addition, a critical advantage of these two approaches is that the modeling is much easier and much less time-consuming than the second-order multiple linear regression (MLR) approach. We do note, however, that the static behavior of these data sets is highly nonlinear. Thus, we strongly recommend ANN over MLR for the static model structure, i.e.,

{\hat{y}}_{t} = \hat{f} ({\hat{V}}_{t}) .

Note that ANN modeling is just a particular class of nonlinear regression. We also note that the MLR model was applied to Subject 11, the highest

r_{f i t, v a l}

(0.79) in [3], to compare the performance with the ANN models. The

r_{f i t, v a l}

showed a modest improvement over Stage 1 alone, going from 0.74 to 0.79, but this is much less than the 0.85 obtained by the Python ANN (P-ANN).

4. Discussion

We were pleasantly surprised by the P-ANN achievements of Models 1, 2, and 1–2. Model 1 has two subjects over the

r_{f i t, v a l}

goal of 0.90 and a mean of 0.82. Model 2 has four subjects meeting or exceeding the

r_{f i t, v a l}

goal and a significant increase in the mean

r_{f i t, v a l}

to 0.87. The combined Model 1 and Model 2 approach, i.e., Model 1–2, had essentially the same summary statistics as Model 2, as shown in Table 1. Thus, combining Models 1 and 2 to obtain continuous forecasting without missing fits did not adversely affect

r_{f i t, v a l}

relative to Model 2, which had missing fits due to missing SGC measurements.
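The subject counts and means quoted above can be checked directly against the P-ANN validation columns of Table 1. The short sketch below does this; the variable and function names are illustrative only.

```python
GOAL = 0.90  # validation-set modeling goal for r_fit,val

# P-ANN r_fit,val values for Subjects 1 to 11, copied from Table 1
model1_val = [0.91, 0.93, 0.88, 0.71, 0.77, 0.87, 0.80, 0.80, 0.75, 0.76, 0.85]
model2_val = [0.91, 0.93, 0.86, 0.80, 0.86, 0.92, 0.87, 0.83, 0.85, 0.86, 0.90]

def summarize(values, goal=GOAL):
    """Return (number of subjects at or above the goal, mean rounded to 2 places)."""
    n_meeting = sum(v >= goal for v in values)
    mean = round(sum(values) / len(values), 2)
    return n_meeting, mean

print("Model 1:", summarize(model1_val))
print("Model 2:", summarize(model2_val))
```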

The amount of missing data for all the cases in this work is considerable and can be clearly seen in Model 1–2 plots such as Figure 5. Missing SGC occurred for three reasons: the sensor was offline, the sensor was online but not saving the data, or SGC measurements exceeded the upper range of 400 mg/dL. In this application, missing output SGC data is unavoidable because the sensors must be replaced periodically, and new sensors must be recalibrated to the attributes of the subject before they save data. While lost sensor data is undesirable, it is the quantity of lost data and the activity periods in which it occurs that are most critical. Moreover, when missing data is possible, modeling protocols should include ways to minimize its impact.
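One simple way to limit the effect of missing SGC values on the summary statistics is to compute them only over time points where both a measurement and a fitted value exist. The sketch below illustrates this for r_fit, AD, and AAD. It assumes missing values are coded as NaN and that the difference is taken as measured minus fitted; the sign convention for AD is an assumption, and `fit_metrics` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def fit_metrics(y_meas, y_fit):
    """r_fit, AD, and AAD over the time points where both series are present."""
    y_meas = np.asarray(y_meas, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    mask = ~np.isnan(y_meas) & ~np.isnan(y_fit)   # drop missing pairs
    m, f = y_meas[mask], y_fit[mask]
    diff = m - f                                   # assumed sign convention
    return {
        "r_fit": float(np.corrcoef(m, f)[0, 1]),   # Pearson correlation
        "AD": float(diff.mean()),                  # average difference
        "AAD": float(np.abs(diff).mean()),         # average absolute difference
        "n_used": int(mask.sum()),
    }

# Example with two missing SGC measurements (values in mg/dL)
measured = [100.0, np.nan, 120.0, 140.0, np.nan, 160.0]
fitted = [110.0, 115.0, 120.0, 130.0, 150.0, 170.0]
print(fit_metrics(measured, fitted))
```

Masking pairs this way keeps the statistics well defined, but it does not address where in time the gaps fall, which the text above identifies as the more critical issue.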

5. Conclusions and Future Work

The W-PINN approach is particularly powerful because each input x_i is dynamically transformed to its v_i counterpart, which is then an input to a static ANN. The proposed two-stage W-PINN approach greatly improved the SGC model fit for eleven historical diabetes data sets. A one-stage W-PINN approach, in its evaluation stage, is the next step in this research.

During his time as a professor, the corresponding author gained valuable insight into the limitations of empirical modeling through a real-world industrial application. A BS Chemical Engineering student, also pursuing an MS in Statistics, undertook a summer project at a leading Midwest chemical company, which was approved for her MS thesis. The project focused on developing a multivariate Statistical Process Control (SPC) monitoring methodology for a process line. Data were collected, and an SPC chart was developed, resulting in an excellent model fit. However, when the process exceeded control limits, adjustments to the manipulated variable based on this model failed to restore control. A subsequent attempt with new data and a revised control chart, despite another excellent fit, similarly failed to correct deviations when applied in a feedback control scenario. This experience highlighted that the control chart, designed for monitoring, was unsuitable for feedback control because it relied on empirical correlation rather than cause-and-effect relationships. Empirical SGC modeling, which uses free-living data and non-physiological structures, faces similar limitations, as it cannot adequately capture the cause-and-effect dynamics critical for model-based control applications such as automatic forecast control. In contrast, physically informed modeling, which integrates physiological information and structure with free-living data, offers inherent intelligence and a robust structure for potentially developing effective models for control applications. The W-PINN methodology proposed in this manuscript is approaching the goal for the diabetes data set, but more research and creative screening of inputs are needed to fully realize the goal for SGC closed-loop feedback control. In addition, this advancement relies on a dual hormone scenario: one hormone to decrease SGC, i.e., insulin, and one to increase SGC just as effectively and safely, possibly glucagon.
Nonetheless, our proposed two-stage methodology appears promising for physically based, highly nonlinear static systems or processes.

Type 1 and 2 diabetes SGC modeling for monitoring can be effective (i.e., informative) using empirical or physically informed dynamic modeling approaches [14,15,16,17,18]. Closed-loop Type 1 SGC automatic control is inherently forecast automatic control because a change in the manipulated variable (MV), injected insulin, takes a time of θ_MV to start lowering SGC. For automatic closed-loop control, empirical dynamic modeling approaches are not likely to succeed because, unlike PINN approaches, they lack a cause-and-effect relationship. Insulin is the process variable that is changed to keep SGC close to its set point, i.e., it is the MV. For a control system to do this well in a forecast feedback control scheme, the controlled variable, SG_{θ_MV}, must be accurately estimated. An empirical method could possibly control SG_{θ_MV} online accurately if the correlation structure remained the same as it was when the model was developed. However, it is not possible for the correlation structure of an empirical forecast modeling approach in this context to remain intact, i.e., fixed, in online forecast feedback control because the correlation structure changes each time the controller signal to the manipulated variable is transmitted. Thus, it is prudent to restrict free-living empirical modeling to monitoring open-loop processes and not to use it to decide how much to change a manipulated variable in order to change the controlled variable.
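Because the controlled variable is SGC a time θ_MV ahead, a forecast model pairs the dynamic inputs available at time t with the SGC measured at t + θ_MV. The sketch below shows one way to build such pairs from sampled data. It assumes θ_MV is a whole multiple of the sampling time Δt and that missing SGC values are coded as NaN; `forecast_pairs` and the numbers in the example are hypothetical.

```python
import numpy as np

def forecast_pairs(V, y, theta_mv, dt):
    """Pair inputs at time t with the output at time t + theta_mv.

    V: (n, q) array of dynamic inputs; y: (n,) array of SGC with NaN for gaps.
    theta_mv and dt must be in the same time units.
    """
    V = np.asarray(V, dtype=float)
    y = np.asarray(y, dtype=float)
    steps = int(round(theta_mv / dt))      # forecast horizon in samples
    if steps <= 0 or steps >= len(y):
        raise ValueError("forecast horizon must be between 1 and len(y) - 1 samples")
    V_now = V[:-steps]                     # inputs known at time t
    y_future = y[steps:]                   # response theta_mv later
    keep = ~np.isnan(y_future)             # drop pairs with no measured target
    return V_now[keep], y_future[keep]

# Example: 60 min horizon with an assumed 20 min sampling time
V = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10, dtype=float) * 10.0
y[5] = np.nan
V_t, y_ahead = forecast_pairs(V, y, theta_mv=60.0, dt=20.0)
print(V_t.ravel(), y_ahead)
```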

There are several challenges in the data sets used in this work. First, they are nearly a decade and a half old, and SGC technology has advanced considerably, particularly in reducing missing and lost data. Second, wearable technology has advanced considerably in reliability, in sensor measurements such as heart rate, and in data management. In addition, there are advancements in ways to obtain accurate measurements of food nutrient consumption. Thus, one (distant) future goal is to evaluate W-PINN using data generated by current technology.

Author Contributions

Conceptualization, D.K.R. and D.G.H.; methodology, D.K.R., S.W. and D.G.H.; software, D.G.H., S.W., M.H.L. and Y.T.G.; validation, D.G.H., S.W., Y.T.G. and D.K.R.; formal analysis, D.K.R. and D.G.H.; investigation, D.G.H.; resources, D.K.R., M.H.L. and D.G.H.; data curation, D.G.H., S.W. and J.O.; writing—original draft preparation, D.G.H.; writing—review and editing, D.K.R., Y.T.G. and J.O.; visualization, D.G.H.; supervision, D.K.R.; project administration, D.K.R.; funding acquisition, M.H.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work is part of the PhD research of Dillon G. Hurd with Dr. Rollins as Major Professor and Dr. Lamm as Co-Major Professor. Jacob Oyler was supported by the National Science Foundation under Grant No. EEC 1852125.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data sets are available to the public on the website of the corresponding author at https://drollins9.wixsite.com/derrickrollins, accessed on 6 October 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DR	dynamic regression
W-PINN	Wiener-Physically-Informed-Neural-Network
PINN	Physically-Informed-Neural-Network
ANN	Artificial neural network
SGC	sensor glucose concentration
SOPDTPL	second-order-plus-dead-time-plus-lead
SSR	sum-of-squared residuals
SSRval	validation sum-of-squared residuals
SSE	sum of squared errors
SSRtr	training sum-of-squared residuals
y	response or output
θ	deadtime
τ	lag
x	static input
v	dynamic input
t	the current time
r_fit	fitted correlation of the measured outputs and the modeled outputs
r_fit,val	r_fit for the validation data set
AD	average difference
AAD	average absolute difference
MV	manipulated variable
Tr	training
Val	validation
Δt	sampling time
θ_MV	Deadtime of the manipulated variable
θ_i	Deadtime of the ith input variable
X	static input vector
V	dynamic input vector
DTD	discrete-time derivatives
BDD	backward difference derivatives
TDR	theoretically based dynamic regression
SOPDT	second-order-plus-dead-time
FOPDT	first-order-plus-dead-time

References

1. Hurd, D.G.; Rollins, D.K. A Powerful AI Grain Moisture Sensor Approach with Demonstration on a Real In-Bin Grain Dryer. Smart Agric. Technol. 2025, 12, 101467.
2. Rollins, D.K.; Nilsen-Hamilton, M.; Kreienbrink, K.; Wolfe, S.; Hurd, D.; Oyler, J. Theoretically Based Dynamic Regression (TDR)—A New and Novel Regression Framework for Modeling Dynamic Behavior. Stats 2025, 8, 89.
3. Kotz, K.; Cinar, A.; Mei, Y.; Roggendorf, A.; Littlejohn, E.; Quinn, L.; Rollins, D.K., Sr. Multiple-Input Subject-Specific Modeling of Plasma Glucose Concentration for Feedforward Control. Ind. Eng. Chem. Res. 2014, 53, 18216–18225.
4. Jaloli, M.; Cescon, M. Long-Term Prediction of Blood Glucose Levels in Type 1 Diabetes Using a CNN-LSTM-Based Deep Neural Network. J. Diabetes Sci. Technol. 2023, 17, 1590–1601.
5. Kamalraj, R.; Neelakandan, S.; Ranjith Kumar, M.; Chandra Shekhar Rao, V.; Anand, R.; Singh, H. Interpretable Filter Based Convolutional Neural Network (IF-CNN) for Glucose Prediction and Classification Using PD-SS Algorithm. Measurement 2021, 183, 109804.
6. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. J. Comput. Phys. 2019, 378, 686–707.
7. Guo, Y.; Cao, X.; Liu, B.; Gao, M. Solving Partial Differential Equations Using Deep Learning and Physical Constraints. Appl. Sci. 2020, 10, 5917.
8. Rollins, D.K.; Bhandari, N.; Kleinedler, J.; Kotz, K.; Strohbehn, A.; Boland, L.; Murphy, M.; Andre, D.; Vyas, N.; Welk, G.; et al. Free-Living Inferential Modeling of Blood Glucose Level Using Only Noninvasive Inputs. J. Process Control 2010, 20, 95–107.
9. Seborg, D.E.; Edgar, T.F.; Mellichamp, D.A. Process Dynamics and Control, 2nd ed.; Wiley: Hoboken, NJ, USA, 2003; ISBN 978-0-471-00077-8.
10. Smith, C.A.; Corripio, A.B. Principles and Practices of Automatic Process Control, 3rd ed.; Wiley: Hoboken, NJ, USA, 2005; ISBN 978-0-471-43190-9.
11. Yates, L.A.; Aandahl, Z.; Richards, S.A.; Brook, B.W. Cross Validation for Model Selection: A Review with Examples from Ecology. Ecol. Monogr. 2023, 93, e1557.
12. Roberts, D.R.; Bahn, V.; Ciuti, S.; Boyce, M.S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.J.; Schröder, B.; Thuiller, W.; et al. Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure. Ecography 2017, 40, 913–929.
13. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK; New York, NY, USA; Melbourne, Australia; New Delhi, India; Singapore, 2023; ISBN 978-0-521-83378-3.
14. Haleem, M.S.; Katsarou, D.; Georga, E.I.; Dafoulas, G.E.; Bargiota, A.; Lopez-Perez, L.; Rujas, M.; Fico, G.; Pecchia, L.; Fotiadis, D. A Multimodal Deep Learning Architecture for Predicting Interstitial Glucose for Effective Type 2 Diabetes Management. Sci. Rep. 2025, 15, 27625.
15. Petridis, P.D.; Kristo, A.S.; Sikalidis, A.K.; Kitsas, I.K. A Review on Trending Machine Learning Techniques for Type 2 Diabetes Mellitus Management. Informatics 2024, 11, 70.
16. Liu, K.; Li, L.; Ma, Y.; Jiang, J.; Liu, Z.; Ye, Z.; Liu, S.; Pu, C.; Chen, C.; Wan, Y. Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis. JMIR Med. Inf. 2023, 11, e47833.
17. Prendin, F.; Pavan, J.; Cappon, G.; Del Favero, S.; Sparacino, G.; Facchinetti, A. The Importance of Interpreting Machine Learning Models for Blood Glucose Prediction in Diabetes: An Analysis Using SHAP. Sci. Rep. 2023, 13, 16865.
18. Alfian, G. Blood Glucose Prediction Model for Type 1 Diabetes Based on Artificial Neural Network with Time-Domain Features. Biocybern. Biomed. Eng. 2020, 40, 1586–1599.

Figure 1. AI approaches. (a) Classical ANN, (b) PINN, and (c) W-PINN. The “^” means estimate.

Figure 2. W-PINN with q input factors. The “^” means estimate.

Figure 3. Python Stage 2, Model 1, Observed and Fitted, Training, and Validation graphical results for Subject 2.

Figure 4. Python Stage 2, Model 2, Observed and Fitted, Training and Validation graphical results, for Subject 2.

Figure 5. Python Stage 2, Model 1–2, Observed and Fitted, Validation graphical results, for Subject 2.

Table 1. Stages 1 and 2 Modeling Results ¹.

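Subject	ID	Stage 1 First-Order Linear Reg r_fit,tr	Stage 1 First-Order Linear Reg r_fit,val	Stage 2 J-ANN Model 1 r_fit,tr	Stage 2 J-ANN Model 1 r_fit,val	Stage 2 P-ANN Model 1 r_fit,tr	Stage 2 P-ANN Model 1 r_fit,val	Stage 2 P-ANN Model 2 r_fit,tr	Stage 2 P-ANN Model 2 r_fit,val	Stage 2 P-ANN Model 1–2 Validation AD	Stage 2 P-ANN Model 1–2 Validation AAD	Stage 2 P-ANN Model 1–2 Validation r_fit,val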
1	501	0.67	0.77	0.78	0.91	0.78	0.91	0.86	0.91	−9.17	25.71	0.91
2	502	0.73	0.75	0.82	0.93	0.82	0.93	0.91	0.93	−0.51	27.20	0.93
3	503	0.62	0.67	0.77	0.88	0.77	0.88	0.86	0.86	8.54	27.81	0.86
4	505	0.52	0.61	0.60	0.71	0.60	0.71	0.84	0.80	9.66	29.30	0.80
5	506	0.73	0.59	0.77	0.77	0.77	0.77	0.84	0.86	3.50	46.78	0.82
6	507	0.67	0.75	0.84	0.87	0.84	0.87	0.89	0.92	1.14	21.79	0.92
7	509	0.59	0.65	0.80	0.80	0.80	0.80	0.86	0.87	−12.33	31.53	0.86
8	510	0.37	0.62	0.57	0.80	0.57	0.80	0.74	0.83	2.45	26.05	0.83
9	511	0.64	0.60	0.73	0.75	0.73	0.75	0.85	0.85	−13.85	35.89	0.84
10	514	0.50	0.72	0.76	0.76	0.76	0.76	0.84	0.86	14.08	34.99	0.85
11	515	0.76	0.74	0.80	0.85	0.80	0.85	0.84	0.90	−1.49	28.24	0.90
Mean		0.62	0.68	0.75	0.82	0.75	0.82	0.85	0.87	0.18	30.48	0.87
Min		0.37	0.59	0.57	0.71	0.57	0.71	0.74	0.80	−13.85	21.79	0.80
Max		0.76	0.77	0.84	0.93	0.84	0.93	0.91	0.93	14.08	46.78	0.93

¹ All results are r_fit unless otherwise indicated. Stage 1 results are for Model 1. J-ANN means JMP ANN and P-ANN means Python ANN.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

