Article

Latent Autoregressive Student-t Prior Process Models to Assess Impact of Interventions in Time Series

by Patrick Toman 1,2,*, Nalini Ravishanker 2, Nathan Lally 1 and Sanguthevar Rajasekaran 3

1 Hartford Steam Boiler, Hartford, CT 06106, USA
2 Department of Statistics, University of Connecticut, Storrs, CT 06269, USA
3 Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
* Author to whom correspondence should be addressed.
Future Internet 2024, 16(1), 8; https://doi.org/10.3390/fi16010008
Submission received: 10 November 2023 / Revised: 19 December 2023 / Accepted: 23 December 2023 / Published: 28 December 2023
(This article belongs to the Special Issue Wireless Sensor Networks in the IoT)

Abstract: With the advent of the “Internet of Things” (IoT), insurers are increasingly leveraging remote sensor technology in the development of novel insurance products and risk management programs. For example, Hartford Steam Boiler’s (HSB) IoT freeze loss program uses IoT temperature sensors to monitor indoor temperatures in locations at high risk of water-pipe burst (freeze loss), with the goal of reducing insurance losses via real-time monitoring of the temperature data streams. In the event these monitoring systems detect a potentially risky temperature environment, an alert is sent to the end-insured (business manager, tenant, maintenance staff, etc.), prompting them to take remedial action by raising temperatures. If an alert is sent and freeze loss nonetheless occurs, the firm is not liable for any damages incurred by the event. For the program to be effective, there must be a reliable method of verifying whether customers took appropriate corrective action after receiving an alert. Due to the program’s scale, direct follow-up via text or phone calls is not possible for every alert event. In addition, direct feedback from customers is not necessarily reliable. In this paper, we propose the use of a non-linear, autoregressive time series model, coupled with the time series intervention analysis method known as causal impact, to evaluate directly from the IoT temperature streams whether or not a customer took action. Our method offers several distinct advantages over other methods: it is (a) readily scalable with continued program growth, (b) entirely automated, and (c) inherently less biased than human labelers or direct customer response. We demonstrate the efficacy of our method using a sample of actual freeze alert events from the freeze loss program.

1. Introduction

With the advent of the “Internet of Things”, it has become increasingly commonplace to develop real-time alerting and monitoring systems capable of mitigating the risk of mechanical failures via human intervention. Such systems pose several challenges. First, the multiple time series generated in these scenarios often exhibit non-linear/non-Gaussian temporal dependencies. Next, for alerting mechanisms to be effective, statistical methods are needed that can assess the impact of exogenous interventions (alerts), that is, whether an alert spurs prompt human action. Finally, because there are likely thousands of such sensors involved in these types of IoT programs, any modeling paradigm must be readily scalable. In this paper, we propose a latent autoregressive Student-t process model that accomplishes all three of these goals.
Intervention analysis is a well-established time series approach. An integral component of a successful intervention analysis is the use of a suitable time series model to learn the pre-intervention time series behavior. Linear models have traditionally been employed for this purpose, including Gaussian autoregressive integrated moving average (ARIMA) models [1,2], Gaussian dynamic linear models (DLMs) [3,4], and Bayesian structural time series (BSTS) models [5,6].
In the Bayesian framework, the pre-intervention model is used to derive the joint posterior predictive distribution of post-intervention observations. Samples from this posterior predictive distribution serve as counterfactuals to the post-intervention observations. By measuring the difference between these forecast values and the observed post-intervention data, a semi-parametric posterior estimate of the impact of the intervention is constructed. Due to its simplicity and versatility, the methodology described in [6] has been employed across a wide array of disciplines. For instance, Ref. [7] adapted the BSTS model to evaluate the impact of rebates for turf removal on water consumption across many households. In the public health context, Ref. [8] evaluated the impact of bariatric surgery (used for weight loss) on health care utilization in Germany. Another interesting example is given by [9], who used the impact framework in conjunction with a variety of climate time series to assess whether an anomalous climate change event can be credibly linked to the collapse of several Bronze Age civilizations in the Mediterranean region.
For intervention impact analysis to be successful, it is critical that the underlying time series model adequately captures the pre-intervention time series dynamics. Traditional linear, Gaussian models can be inadequate for capturing the dynamics of time series that exhibit complex non-linear and/or long-term dependencies, and/or non-Gaussian behavior. As a consequence, the counterfactual forecasts may be inadequate to give a useful assessment of the impact. For example, multiple time series generated by “Internet of Things” (IoT) sensors often exhibit nonlinear temporal dependence that cannot be easily modeled by BSTS models. Successful intervention analysis of such time series requires sophisticated models of pre-intervention data such as those described in this paper.
For intervention impact analysis in multiple IoT time series, Ref. [10] proposed a Gaussian process (GP) prior regression model [11] with a covariance kernel tailored for these series as the underlying predictive model. This model is effective in that it can incorporate typical time series behavior such as seasonality and local linear trends, as well as non-linear time trends and dependencies between the target variable and exogenous predictors. While this model was demonstrated to be effective at capturing a wide array of time series dynamics, it does not directly incorporate information from past values of the time series. In addition, the GP prior can fail for time series that exhibit heavy-tailed behavior.
With this in mind, we propose an extension of the latent Gaussian process time series model presented in [12] in which we replace the latent GP with a Student-t process (TP) [13,14], and we use the resulting model as the basis for time series impact analysis using both simulated and real-world time series data from the IoT domain. In addition to going beyond the GP functional prior, our model has the added versatility of accommodating arbitrary likelihoods, allowing heavy-tailed observations to be modeled more robustly.
Note that because we require a model that allows for posterior sampling, mixture autoregressive models such as the one proposed in [15] are not suitable for solving our problem. In addition, the mixture autoregressive assumptions described in [15] may be unsuitable to describe the data generating process of IoT time series data.
The format of this paper is as follows: Section 2 describes the IoT temperature sensors and their associated data streams; Section 3 gives an overview of GP regression, including descriptions of existing GP regression models tailored for time series. Section 4 details the requisite background information regarding TP regression; we also introduce our autoregressive TP model. Section 5 compares the performance of our proposed model with existing methods on a time series intervention analysis problem. Section 6 summarizes our findings and discusses potential avenues for future research.

2. Background

One domain in which IoT sensor technology has been successfully deployed is the insurance context. For example, insurers have used IoT temperature sensors as part of freeze loss prevention programs. The goal of these programs is to reduce insurance losses due to water-pipe burst (freeze loss) by providing temperature sensors to end users (insured property owners) to be installed in areas with a high risk of water-pipe burst. In an ideal scenario, losses are prevented (or at least mitigated) by sending real-time alerts to customers to promptly take remedial action (i.e., raising temperatures to a safe level) in the event of dangerously low temperatures within the monitored space.
In this paper, we apply our methods to sensor temperature readings that are relayed in real time at a 15 min frequency. For each sensor stream, a decision rule algorithm combines information from (a) recent sensor readings and (b) outdoor temperatures from nearby weather stations to alert end users of potential imminent freeze loss. After receiving an alert, an end user is expected to take remedial action within 12 h. Due to the program’s scale, it is impossible to directly verify whether a customer took corrective action. Therefore, methods must be developed that infer customer action solely from the observed post-alert sensor streams themselves. To that end, we employ the causal impact methodology proposed by [6], which uses a counterfactual forecasting model to infer whether the alert system is effective in instigating customer action for a given alert event. More details, as well as an example of an alert event, can be found in Section 5.
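To make the counterfactual comparison concrete, the sketch below implements its core step in NumPy: posterior predictive draws of the post-alert temperatures (generated under a model trained on pre-alert data) are compared with the observed readings, and the alert is declared addressed only if the credible interval for the cumulative impact excludes zero from above. The function name, the 95% level, and the zero-exclusion rule are illustrative assumptions, not the program's production logic.

```python
import numpy as np

def causal_impact_summary(forecast_samples, observed, alpha=0.05):
    """Summarize intervention impact from counterfactual forecasts.

    forecast_samples : (S, H) posterior predictive draws for the H post-alert
        time points (the counterfactual "no action" path).
    observed : (H,) array of observed post-alert temperatures.
    Returns the pointwise mean impact and a (1 - alpha) credible interval
    for the cumulative impact.
    """
    impact = observed[None, :] - forecast_samples        # (S, H) impact draws
    cumulative = impact.sum(axis=1)                      # (S,) cumulative impact
    lo, hi = np.quantile(cumulative, [alpha / 2, 1 - alpha / 2])
    return impact.mean(axis=0), (lo, hi)

# Toy usage: declare "Action" only if the interval excludes zero from above.
rng = np.random.default_rng(0)
samples = rng.normal(55.0, 2.0, size=(1000, 48))   # counterfactual draws
observed = np.full(48, 62.0)                        # post-alert readings
_, (lo, hi) = causal_impact_summary(samples, observed)
print("Action" if lo > 0 else "No Action")
```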

3. Review of Gaussian Process Regression Models

We review basic Gaussian process (GP) regression and its extensions for analyzing time series. Section 3.1 summarizes standard GP and sparse GP regression techniques. Section 3.2 reviews the current literature on using nonlinear auto-regressions with exogenous predictors (NARX) in conjunction with GP regression (GP-NARX) models. In Section 3.3, we give a detailed review of the GP-RLARX model, a robust time series regression model that uses an autoregressive latent state whose transition function follows a GP prior and whose observations follow a normal distribution with time-varying scale.

3.1. GP and Sparse GP Regression

Gaussian processes (GPs) are a set of methods that generalize the multivariate normal distribution to infinite dimensions. Not only do GPs have a flexible non-parametric form, they are also attractive because they offer principled uncertainty quantification via a predictive distribution. For supervised learning problems, GP prior models have the distinct advantage of allowing the user to automatically learn the correct functional form linking the input space $\mathcal{X}$ to the output space $\mathcal{Y}$. This is achieved by specifying a prior over the distribution of functions, which then allows the derivation of the posterior distribution over these functions once data have been observed. Throughout this paper, we use the notation $f(\cdot) \sim \mathcal{GP}(\mu(\cdot), k(\cdot,\cdot))$ to denote a generic GP prior with mean function $\mu(\cdot)$ and covariance function $k(\cdot,\cdot)$. Let $y_i \in \mathbb{R}$ denote an observed response and $\mathbf{x}_i$ and $\mathbf{x}_j$ be two distinct input vectors in $\mathbb{R}^p$. The GP regression model is defined as

$$y_i = f(\mathbf{x}_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma_y^2), \tag{1}$$
$$f(\cdot) \sim \mathcal{GP}\big(\mu(\cdot), k(\cdot,\cdot)\big), \tag{2}$$

where the mean function and the covariance kernel are, respectively,

$$\mu(\mathbf{x}_i) = \mathbb{E}[f(\mathbf{x}_i)], \tag{3}$$
$$k(\mathbf{x}_i, \mathbf{x}_j) = \mathbb{E}\big[(f(\mathbf{x}_i) - \mu(\mathbf{x}_i))(f(\mathbf{x}_j) - \mu(\mathbf{x}_j))\big], \tag{4}$$

where $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$ are fixed predictors and $\mathbf{f} = (f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n))^\top$ is an $n$-dimensional vector. Given observed responses $\mathbf{y} = (y_1, \ldots, y_n)^\top \in \mathbb{R}^n$, it follows that the Gaussian process in (2) has a multivariate normal distribution with mean vector $\boldsymbol{\mu} = (\mu(\mathbf{x}_1), \ldots, \mu(\mathbf{x}_n))^\top$ and variance–covariance matrix $\mathbf{K}_{\mathbf{xx}} = \{k(\mathbf{x}_i, \mathbf{x}_j)\} \in \mathbb{R}^{n \times n}$.

Standard GP regression provides convenient closed forms for posterior inference. The posterior distribution $p(f(\mathbf{x}_i) \mid \mathbf{X}, \mathbf{y})$ is Gaussian with mean and variance, respectively, given by

$$\mu_i^* = \mu(\mathbf{x}_i) + \mathbf{k}_i^\top (\mathbf{K}_{\mathbf{xx}} + \sigma^2 \mathbf{I}_n)^{-1} (\mathbf{y} - \boldsymbol{\mu}), \qquad k_i^* = k(\mathbf{x}_i, \mathbf{x}_i) - \mathbf{k}_i^\top (\mathbf{K}_{\mathbf{xx}} + \sigma^2 \mathbf{I}_n)^{-1} \mathbf{k}_i, \tag{5}$$

where $\mathbf{k}_i = (k(\mathbf{x}_i, \mathbf{x}_j),\ j = 1, \ldots, n)^\top$.
Given a new set of inputs $\mathbf{X}_* \in \mathbb{R}^{p \times m}$, the joint distribution of the observed response $\mathbf{y}$ and the GP prior $f(\cdot)$ evaluated at the new input set $\mathbf{X}_*$ is

$$\begin{pmatrix} \mathbf{y} \\ \mathbf{f}_* \end{pmatrix} \sim \mathcal{N}_{n+m}\left( \begin{pmatrix} \boldsymbol{\mu}_f \\ \boldsymbol{\mu}_* \end{pmatrix}, \begin{pmatrix} \mathbf{K}_{\mathbf{xx}} + \sigma^2 \mathbf{I}_n & \mathbf{K}_{\mathbf{x}*} \\ \mathbf{K}_{*\mathbf{x}} & \mathbf{K}_{**} \end{pmatrix} \right). \tag{6}$$

Thus, we have the posterior predictive density for $\mathbf{f}_*$ as

$$p(\mathbf{f}_* \mid \mathbf{X}, \mathbf{X}_*, \mathbf{y}) = \mathcal{N}_m\big(\boldsymbol{\mu}_{*|\mathbf{x}}, \boldsymbol{\Sigma}_{*|\mathbf{x}}\big), \tag{7}$$

where

$$\boldsymbol{\mu}_{*|\mathbf{x}} = \boldsymbol{\mu}_* + \mathbf{K}_{*\mathbf{x}} (\mathbf{K}_{\mathbf{xx}} + \sigma^2 \mathbf{I}_n)^{-1} (\mathbf{y} - \boldsymbol{\mu}_f), \tag{8}$$
$$\boldsymbol{\Sigma}_{*|\mathbf{x}} = \mathbf{K}_{**} - \mathbf{K}_{*\mathbf{x}} (\mathbf{K}_{\mathbf{xx}} + \sigma^2 \mathbf{I}_n)^{-1} \mathbf{K}_{\mathbf{x}*}. \tag{9}$$
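The closed forms in (5)–(9) translate directly into a few lines of linear algebra. The sketch below assumes a zero mean function and fixed RBF kernel hyperparameters purely for illustration; in practice, hyperparameters are learned, e.g., by maximizing the marginal likelihood.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, X_star, noise_var=0.1):
    """Posterior predictive mean and covariance of f_*, Eqs. (7)-(9), mu = 0."""
    Kxx = rbf_kernel(X, X)
    Ksx = rbf_kernel(X_star, X)
    Kss = rbf_kernel(X_star, X_star)
    A = Kxx + noise_var * np.eye(len(X))
    alpha = np.linalg.solve(A, y)                 # solve, don't invert
    mean = Ksx @ alpha
    cov = Kss - Ksx @ np.linalg.solve(A, Ksx.T)
    return mean, cov
```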
One notable drawback of the GP model is its difficulty in scaling to large datasets due to the inversion of the kernel covariance matrix $\mathbf{K}_{\mathbf{xx}} \in \mathbb{R}^{n \times n}$, which has $\mathcal{O}(n^3)$ time complexity. Sparse GP methods [16,17,18,19] remedy this issue and reduce the computational cost of fitting GP models to long time series.
For $m \ll n$, they approximate the GP posterior in (5) by learning inducing inputs $\mathbf{Z} = \{\mathbf{z}_1, \ldots, \mathbf{z}_m\} \subset \mathcal{X}$, which lead to a finite set of inducing variables $\mathbf{U} = \{u_1, \ldots, u_m\}$ with $u_i = f(\mathbf{z}_i)$, where $f(\cdot)$ was defined in (2). Let $\mathbf{u} = (f(\mathbf{z}_1), \ldots, f(\mathbf{z}_m))^\top$. The joint distribution of $\mathbf{f}$ and $\mathbf{u}$ is

$$\begin{pmatrix} \mathbf{f} \\ \mathbf{u} \end{pmatrix} \sim \mathcal{N}_{n+m}\left( \begin{pmatrix} \boldsymbol{\mu}_\mathbf{x} \\ \boldsymbol{\mu}_\mathbf{z} \end{pmatrix}, \begin{pmatrix} \mathbf{K}_{\mathbf{xx}} & \mathbf{K}_{\mathbf{xz}} \\ \mathbf{K}_{\mathbf{zx}} & \mathbf{K}_{\mathbf{zz}} \end{pmatrix} \right), \tag{10}$$

and using properties of the multivariate normal distribution,

$$p(\mathbf{f} \mid \mathbf{u}) = \mathcal{N}_n\big(\boldsymbol{\mu}_\mathbf{x} + \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} (\mathbf{u} - \boldsymbol{\mu}_\mathbf{z}),\ \mathbf{K}_{\mathbf{xx}} - \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}}\big), \tag{11}$$
$$p(\mathbf{u}) = \mathcal{N}_m(\boldsymbol{\mu}_\mathbf{z}, \mathbf{K}_{\mathbf{zz}}). \tag{12}$$
The conditional distribution in (11) now only requires inversion of the $m \times m$ matrix $\mathbf{K}_{\mathbf{zz}}$ instead of the $n \times n$ matrix $\mathbf{K}_{\mathbf{xx}}$. The target is the $n$-dimensional marginal distribution of $\mathbf{f}$ given by

$$p(\mathbf{f}) = \int p(\mathbf{f} \mid \mathbf{u})\, p(\mathbf{u})\, d\mathbf{u}. \tag{13}$$

To facilitate this computation, we replace $p(\mathbf{u})$ given in (12) by its variational approximation

$$q(\mathbf{u}) = \mathcal{N}_m(\mathbf{m}_\mathbf{z}, \boldsymbol{\Sigma}_{\mathbf{zz}}), \tag{14}$$

which in turn leads to approximating $p(\mathbf{f})$ by $q(\mathbf{f})$. Again, using properties of the multivariate normal distribution, $q(\mathbf{f})$ is given by

$$q(\mathbf{f}) = \int p(\mathbf{f} \mid \mathbf{u})\, q(\mathbf{u})\, d\mathbf{u} = \mathcal{N}_n\big(\boldsymbol{\mu}_\mathbf{x} + \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} (\mathbf{m}_\mathbf{z} - \boldsymbol{\mu}_\mathbf{z}),\ \mathbf{K}_{\mathbf{xx}} + \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} (\boldsymbol{\Sigma}_{\mathbf{zz}} - \mathbf{K}_{\mathbf{zz}}) \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}}\big). \tag{15}$$
Furthermore, given a new set of test inputs $\mathbf{X}_*$, the approximate posterior predictive density for $\mathbf{f}_*$ has the form

$$p(\mathbf{f}_* \mid \mathbf{y}) \approx \iint p(\mathbf{f}_* \mid \mathbf{f}, \mathbf{u})\, p(\mathbf{f} \mid \mathbf{u})\, q(\mathbf{u})\, d\mathbf{f}\, d\mathbf{u} = \int p(\mathbf{f}_* \mid \mathbf{u})\, q(\mathbf{u})\, d\mathbf{u}. \tag{16}$$

The integral in (16) is tractable and takes a form analogous to that in (15).

Given observed data $\mathbf{y} \in \mathbb{R}^n$, the variational inference approach for approximating the exact posterior of $\mathbf{f}$ in sparse GP regression reduces to maximizing the evidence lower bound (ELBO) [20]

$$\log p(\mathbf{y}) \geq \mathbb{E}_{q(\mathbf{u})}\Big[\mathbb{E}_{p(\mathbf{f} \mid \mathbf{u})}\big[\log p(\mathbf{y} \mid \mathbf{f})\big]\Big] - \mathrm{KL}\big(q(\mathbf{u})\,\|\,p(\mathbf{u})\big) = \mathbb{E}_{q(\mathbf{f})}\big[\log p(\mathbf{y} \mid \mathbf{f})\big] - \mathrm{KL}\big(q(\mathbf{u})\,\|\,p(\mathbf{u})\big), \tag{17}$$

where $q(\mathbf{f})$ is defined as in (15). For more details on the ELBO optimization procedure, refer to Section 4 of [19].
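As a companion to (15), the following sketch computes the mean and covariance of $q(\mathbf{f})$ from the variational parameters $\mathbf{m}_\mathbf{z}$ and $\boldsymbol{\Sigma}_{\mathbf{zz}}$, again assuming zero prior mean functions; the kernel is passed in as a function (e.g., the rbf_kernel from the earlier sketch).

```python
import numpy as np

def sparse_q_f(X, Z, m_z, S_zz, kernel):
    """Mean and covariance of q(f) in Eq. (15), with zero prior mean functions.

    kernel(A, B) returns the cross-kernel matrix between input sets A and B.
    """
    Kxx = kernel(X, X)
    Kxz = kernel(X, Z)
    Kzz = kernel(Z, Z) + 1e-6 * np.eye(len(Z))   # jitter for stability
    A = Kxz @ np.linalg.inv(Kzz)                  # K_xz K_zz^{-1}
    mean = A @ m_z
    cov = Kxx + A @ (S_zz - Kzz) @ A.T
    return mean, cov
```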

3.2. GP-NARX Models

Quite often, we seek to model time series data as a function of exogenous inputs and an autoregressive function of past observations. A class of GP models incorporating both non-linear autoregressive structure and exogenous predictors (typically abbreviated as GP-NARX) offer a principled way to propagate uncertainty when forecasting.
An early example comes from [21], who proposed a GP-NARX model in which the inputs at time $t$ consist of the past $L$ lags of the response time series $y_t$ as well as available exogenous inputs $\mathbf{c}_t \in \mathbb{R}^{n_c}$. The input vector at time $t$ in the GP-NARX model is the tuple $(\mathbf{x}_t, \mathbf{c}_t)$, where $\mathbf{x}_t = (y_{t-1}, \ldots, y_{t-L})^\top$. Since $y_{t-1}, \ldots, y_{t-L}$ are known during the training phase of model fitting, estimation is performed using maximum likelihood or maximum a posteriori methods.

Although training the GP-NARX model is similar to training the GP regression model and is straightforward, predicting future values is more challenging. Suppose our goal is to generate $k$-step-ahead forecasts for future responses $y_{T+1}, \ldots, y_{T+k}$ given training data $\mathbf{y} = (y_1, \ldots, y_T)^\top \in \mathbb{R}^T$. Because all or part of $\mathbf{x}_t$ is unobserved in the holdout period (since it involves $y_{T+j-1}, \ldots, y_{T+j-L}$, $j = 1, \ldots, k$) and is an uncertain input during forecasting, direct application of (8) would fail to take this inherent uncertainty into account.
Ref. [21] deals with the uncertain inputs issue by assuming that, for each $j = 1, \ldots, k$, $\mathbf{x}_{T+j} \sim \mathcal{N}_L\big(\boldsymbol{\mu}_{\mathbf{x}_{T+j}}, \boldsymbol{\Sigma}_{\mathbf{x}_{T+j}}\big)$. Then, given the training data $\mathcal{D} = \{\mathbf{c}_t, \mathbf{x}_t, y_t\}_{t=L+1}^{T}$ and a set of exogenous inputs $\mathbf{c}_{T+j}$, $j = 1, \ldots, k$, the posterior predictive distribution for $y_{T+j}$ is

$$p(f_{T+j}) = \int p(f_{T+j} \mid \mathbf{x}_{T+j}, \mathbf{c}_{T+j}, \mathcal{D})\, p(\mathbf{x}_{T+j})\, d\mathbf{x}_{T+j}. \tag{18}$$
Although there is no closed form for (18), the moments of the posterior predictive distribution can be obtained via Monte Carlo sampling or one of several different approximation methods [22].
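The sketch below illustrates the Monte Carlo treatment of uncertain inputs: sampled one-step predictions are fed back as lagged inputs rather than plugging in predictive means. Here, `predict` is a stand-in for the fitted model's one-step predictive distribution (for instance, (7)–(9) evaluated at a single input); the nested loop favors clarity over speed.

```python
import numpy as np

def narx_mc_forecast(predict, y_hist, c_future, L=3, n_samples=500, rng=None):
    """Monte Carlo propagation of input uncertainty for k-step NARX forecasts.

    predict(x) -> (mean, var): one-step predictive distribution of the fitted
        model evaluated at input vector x (a stand-in here).
    y_hist : 1-D array of observed responses up to the forecast origin T.
    c_future : (k, n_c) array of known exogenous inputs over the horizon.
    Returns an (n_samples, k) array of sampled future trajectories.
    """
    rng = rng or np.random.default_rng()
    k = len(c_future)
    paths = np.tile(np.asarray(y_hist, dtype=float)[-L:], (n_samples, 1))
    out = np.empty((n_samples, k))
    for j in range(k):
        for s in range(n_samples):
            # Lagged inputs, most recent first, plus exogenous inputs.
            x = np.concatenate([paths[s, -L:][::-1], c_future[j]])
            m, v = predict(x)
            out[s, j] = rng.normal(m, np.sqrt(v))   # sample; don't plug in the mean
        paths = np.column_stack([paths, out[:, j]])
    return out
```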
Recently, these ideas have been extended to sparse GP models. Ref. [23] developed an approximate uncertainty propagation approach to be used alongside the sparse pseudo-input GP regression method, known as the Fully Independent Sparse Training Conditional (FITC) model [16]. Ref. [24] derived uncertainty propagation methods for a wide variety of competing sparse GP methods, and [25] extended sparse GP-NARX time series modeling to an online setting.

3.3. GP-RLARX Models

Ref. [12] proposed an alternative to the GP-NARX model. Their GP-RLARX model assumes a latent autoregressive structure for the lagged inputs, leading to the description below:

$$y_t = x_t + \varepsilon_t^{(y)}, \tag{19a}$$
$$x_t = f\big(x_{t-1}, \ldots, x_{t-L_x}, \mathbf{c}_{t-1}, \ldots, \mathbf{c}_{t-L_c}\big) + \varepsilon_t^{(x)}, \tag{19b}$$
$$f(\cdot) \sim \mathcal{GP}\big(\mu(\cdot), k(\cdot,\cdot)\big), \tag{19c}$$
$$\varepsilon_t^{(x)} \sim \mathcal{N}\big(\varepsilon_t^{(x)} \mid 0, \sigma_x^2\big), \tag{19d}$$
$$\varepsilon_t^{(y)} \sim \mathcal{N}\big(\varepsilon_t^{(y)} \mid 0, \tau_t\big), \tag{19e}$$
$$\tau_t \sim \mathrm{IG}(\tau_t \mid \alpha, \beta), \tag{19f}$$

where $\mathbf{c}_{t-1}, \ldots, \mathbf{c}_{t-L_c}$ are lagged exogenous inputs with maximal lag $L_c$, and $x_{t-1}, \ldots, x_{t-L_x}$ are the lagged latent states with maximal lag $L_x$.
This framework is reminiscent of a state-space model in which (19a) denotes the observation equation at time t and (19b) is the corresponding state equation, where x t is an autoregressive function of the preceding L x lags of the latent state.
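To make the generative structure of (19a)–(19f) concrete, the following sketch simulates from the model, with the unknown GP-distributed transition $f$ replaced by a fixed non-linear map and the exogenous inputs omitted, purely for illustration.

```python
import numpy as np

def simulate_rlarx(T=200, Lx=2, sigma_x=0.1, alpha=3.0, beta=2.0, seed=1):
    """Simulate y_{Lx+1}, ..., y_T following the structure of (19a)-(19f)."""
    rng = np.random.default_rng(seed)
    # Illustrative fixed transition standing in for the GP-distributed f.
    f = lambda lags: 0.9 * lags[0] - 0.2 * lags[1] + 0.3 * np.sin(lags[0])
    x = list(rng.normal(0.0, 1.0, Lx))                   # initial latent states
    y = []
    for t in range(Lx, T):
        lags = x[::-1][:Lx]                              # most recent state first
        x_t = f(lags) + rng.normal(0.0, sigma_x)         # (19b), (19d)
        tau_t = 1.0 / rng.gamma(alpha, 1.0 / beta)       # (19f): tau_t ~ IG(alpha, beta)
        y.append(x_t + rng.normal(0.0, np.sqrt(tau_t)))  # (19a), (19e)
        x.append(x_t)
    return np.array(y)
```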
To facilitate inference in the GP-RLARX model, Ref. [12] used a sparse variational approximation similar to that described in Section 3.1, where $\mathbf{u} \in \mathbb{R}^m$ are inducing points generated by evaluating the GP prior over pseudo-inputs $\mathbf{Z} = \{\mathbf{z}_1, \ldots, \mathbf{z}_m\}$, $\mathbf{z}_i \in \mathbb{R}^{L_x + L_c}$, $i = 1, \ldots, m$. It follows that $p(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{0}, \mathbf{K}_{\mathbf{zz}})$, where $\mathbf{K}_{\mathbf{zz}}$ denotes the kernel covariance matrix evaluated over the pseudo-inputs $\mathbf{Z}$. Then, the GP-RLARX hierarchical model takes the form

$$p(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{0}, \mathbf{K}_{\mathbf{zz}}), \tag{20a}$$
$$p(f_t \mid \mathbf{u}, \mathbf{x}) = \mathcal{N}\big(f_t \mid [\mathbf{a}_\mathbf{x}]_t, [\boldsymbol{\Sigma}_{\mathbf{xx}}]_{tt}\big), \tag{20b}$$
$$p(x_t) = \mathcal{N}(x_t \mid \mu_t, \lambda_t), \quad t \in \{1, \ldots, L_x\} \ (\text{initial state}), \tag{20c}$$
$$p(x_t \mid f_t) = \mathcal{N}(x_t \mid f_t, \sigma_x^2), \quad t \in \{L_x + 1, \ldots, T\}, \tag{20d}$$
$$p(\tau_t) = \mathrm{IG}(\tau_t \mid \alpha, \beta), \quad t \in \{L_x + 1, \ldots, T\}, \tag{20e}$$
$$p(y_t \mid x_t, \tau_t) = \mathcal{N}(y_t \mid x_t, \tau_t), \quad t \in \{L_x + 1, \ldots, T\}, \tag{20f}$$

where $\mathbf{a}_\mathbf{x} = \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{u}$, $\boldsymbol{\Sigma}_{\mathbf{xx}} = \mathbf{K}_{\mathbf{xx}} - \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}}$, and $f_t = f(x_{t-1}, \ldots, x_{t-L_x}, \mathbf{c}_{t-1}, \ldots, \mathbf{c}_{t-L_c})$ with $f(\cdot) \sim \mathcal{GP}(0, k(\cdot,\cdot))$. For brevity, we denote $\tilde{\mathbf{x}}_t = (x_{t-1}, \ldots, x_{t-L_x}, \mathbf{c}_{t-1}, \ldots, \mathbf{c}_{t-L_c})$ in the remainder of the paper. The joint distribution is succinctly expressed as

$$p(\mathbf{y}, \mathbf{x}, \mathbf{f}, \mathbf{u}, \boldsymbol{\tau}) = \prod_{t=L_x+1}^{T} p(y_t \mid x_t, \tau_t)\, p(x_t \mid f_t)\, p(\tau_t)\, p(f_t \mid \mathbf{u}, \tilde{\mathbf{x}}_t)\, p(\mathbf{u}) \prod_{t=1}^{L_x} p(x_t). \tag{21}$$
Ref. [12] used a variational inference approach [20] to estimate the latent variables, adopting the variational approximation

$$q(\mathbf{x}, \boldsymbol{\tau}, \mathbf{f}, \mathbf{u}) = \prod_{t=1}^{T} q(x_t) \prod_{t=L_x+1}^{T} q(\tau_t) \prod_{t=L_x+1}^{T} p(f_t \mid \mathbf{u}, \tilde{\mathbf{x}}_t)\, q(\mathbf{u}), \tag{22}$$

where $q(x_t) = \mathcal{N}(x_t \mid \mu_t^{(x)}, \lambda_t^{(x)})$, $q(\tau_t) = \mathrm{IG}(\tau_t \mid a_t, b_t)$, $p(f_t \mid \mathbf{u}, \tilde{\mathbf{x}}_t) = \mathcal{N}(f_t \mid [\mathbf{a}_\mathbf{x}]_t, [\boldsymbol{\Sigma}_{\mathbf{xx}}]_{tt})$, and $q(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{m}_\mathbf{z}, \boldsymbol{\Sigma}_{\mathbf{zz}})$. In this framework, $\mu_t^{(x)}, \lambda_t^{(x)}, \mathbf{m}_\mathbf{z}, \boldsymbol{\Sigma}_{\mathbf{zz}}, a_t, b_t$ are variational parameters that are optimized according to a variational inference strategy similar to that found in [26]. We refer readers to Section 5.1 of [12] for more details, including the exact expression of the ELBO.
Table 1 summarizes the basic characteristics of the GP-NARX and GP-RLARX models as well as their main pros and cons.

4. Proposed Methods: Autoregressive TP Models

Recently, there has been growing interest in extending Gaussian process models to other types of elliptical process models, with particular emphasis on Student-t process models (TP) [13,14]. In this section, we present extensions to both the GP-NARX and GP-RLARX models by replacing the GP functional prior by a Student-t process prior. Section 4.1 gives an overview of the Student-t process as well as a recently developed method for sparse Student-t processes. Next, Section 4.2 describes the TP-NARX model as an extension of the GP-NARX model. Finally, Section 4.3 gives details of the proposed extension of the GP-RLARX model to the TP-RLARX model. To the authors’ best knowledge, there has been no research on the development or implementation of a NARX model or RLARX model using TP priors. These are useful additions to the literature, and they are discussed in the following sections.

4.1. Review and Notation for Student-t Processes

We say that $\mathbf{f} \in \mathbb{R}^n$ follows a multivariate Student-t distribution with degrees of freedom $v \in \mathbb{R}_+$, location $\boldsymbol{\mu} \in \mathbb{R}^n$, and positive definite scale matrix $\mathbf{K} \in \mathbb{R}^{n \times n}$ if and only if it has the density

$$p(\mathbf{f}) = \frac{\Gamma\big((v+n)/2\big)}{(v\pi)^{n/2}\, \Gamma(v/2)\, |\mathbf{K}|^{1/2}} \left(1 + \frac{(\mathbf{f} - \boldsymbol{\mu})^\top \mathbf{K}^{-1} (\mathbf{f} - \boldsymbol{\mu})}{v}\right)^{-\frac{v+n}{2}}, \tag{23}$$

which can be written succinctly as $\mathbf{f} \sim T_n(v, \boldsymbol{\mu}, \mathbf{K})$. Now, suppose that we have $\mathbf{f} \in \mathbb{R}^n$ and $\mathbf{f}_* \in \mathbb{R}^m$ with joint density

$$\begin{pmatrix} \mathbf{f} \\ \mathbf{f}_* \end{pmatrix} \sim T_{n+m}\left(v, \begin{pmatrix} \boldsymbol{\mu}_f \\ \boldsymbol{\mu}_* \end{pmatrix}, \begin{pmatrix} \mathbf{K}_{\mathbf{ff}} & \mathbf{K}_{\mathbf{f}*} \\ \mathbf{K}_{*\mathbf{f}} & \mathbf{K}_{**} \end{pmatrix}\right). \tag{24}$$

By properties of the multivariate Student-t distribution, we have

$$\mathbf{f} \sim T_n\big(v, \boldsymbol{\mu}_f, \mathbf{K}_{\mathbf{ff}}\big) \quad \text{and} \quad \mathbf{f}_* \mid \mathbf{f} \sim T_m\left(v + n,\ \tilde{\boldsymbol{\mu}},\ \frac{v + \beta - 2}{v + n - 2}\, \tilde{\mathbf{K}}\right), \tag{25}$$

where

$$\tilde{\boldsymbol{\mu}} = \boldsymbol{\mu}_* + \mathbf{K}_{*\mathbf{f}} \mathbf{K}_{\mathbf{ff}}^{-1} (\mathbf{f} - \boldsymbol{\mu}_f), \quad \beta = (\mathbf{f} - \boldsymbol{\mu}_f)^\top \mathbf{K}_{\mathbf{ff}}^{-1} (\mathbf{f} - \boldsymbol{\mu}_f), \quad \tilde{\mathbf{K}} = \mathbf{K}_{**} - \mathbf{K}_{*\mathbf{f}} \mathbf{K}_{\mathbf{ff}}^{-1} \mathbf{K}_{\mathbf{f}*}. \tag{26}$$

Finally, we say that $f(\cdot)$ follows a Student-t process on $\mathcal{X}$, denoted $\mathcal{TP}\big(v, \mu(\cdot), k(\cdot,\cdot)\big)$, where $v > 2$ denotes the degrees of freedom, $\mu(\cdot)$ the mean function, and $k(\cdot,\cdot)$ the covariance function, if for any finite collection of function values we have $\mathbf{f} = \big(f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)\big)^\top \sim T_n(v, \boldsymbol{\mu}, \mathbf{K})$.
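The conditional formulas in (25)–(26) are straightforward to implement directly; the sketch below returns the conditional degrees of freedom, location, and scale of $\mathbf{f}_*$ given $\mathbf{f}$, with the inflation factor $(v + \beta - 2)/(v + n - 2)$ applied to $\tilde{\mathbf{K}}$.

```python
import numpy as np

def student_t_conditional(v, mu, K, f_obs):
    """Conditional parameters of f_* | f for a joint T_{n+m}(v, mu, K).

    The first len(f_obs) coordinates of (mu, K) correspond to the observed
    block f. Returns (df, location, scale matrix) of the conditional.
    """
    n = len(f_obs)
    mu_f, mu_s = mu[:n], mu[n:]
    Kff, Ksf, Kss = K[:n, :n], K[n:, :n], K[n:, n:]
    resid = f_obs - mu_f
    sol = np.linalg.solve(Kff, resid)
    beta = resid @ sol                                    # Mahalanobis term
    mu_tilde = mu_s + Ksf @ sol
    K_tilde = Kss - Ksf @ np.linalg.solve(Kff, Ksf.T)
    scale = (v + beta - 2.0) / (v + n - 2.0) * K_tilde    # Eq. (25) inflation
    return v + n, mu_tilde, scale
```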
While less popular than GP models, Student-t processes have still been employed in a number of contexts. For instance, Ref. [27] proposed an online time series anomaly detection algorithm that employs TP regression to simultaneously learn time series dynamics in the presence of heavy-tailed noise and identify anomalous events. Another example comes from [28], in which the authors proposed a Student-t process latent variable model with the goal of identifying a low-dimensional set of latent factors capable of explaining variation among non-Gaussian financial time series. Ref. [29] employed Student-t processes in the development of degradation models used to analyze the lifetime reliability of manufactured products.
Recently, Ref. [30] proposed a variational inference approach for sparse Student-t processes, similar to the sparse GP methods described in Section 3.1. Suppose that $r \sim \mathrm{IG}(\alpha, \beta)$. If we let $\mathbf{Z} = \{\mathbf{z}_1, \ldots, \mathbf{z}_m\} \subset \mathcal{X}$ with $m \ll n$ denote a set of inducing inputs, then we can define corresponding inducing variables $\mathbf{u} \mid r = \big(f(\mathbf{z}_1), \ldots, f(\mathbf{z}_m)\big)^\top \sim \mathcal{N}_m(\mathbf{0}, r\mathbf{K}_{\mathbf{zz}})$. It follows that the joint density of $(\mathbf{f}, \mathbf{u}, r)$ is

$$p(\mathbf{f}, \mathbf{u}, r) = p(\mathbf{f} \mid \mathbf{u}, r)\, p(\mathbf{u} \mid r)\, p(r) = \mathcal{N}_n\big(\mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{u},\ r(\mathbf{K}_{\mathbf{xx}} - \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}})\big)\, \mathcal{N}_m(\mathbf{0}, r\mathbf{K}_{\mathbf{zz}})\, \mathrm{IG}(\alpha, \beta). \tag{27}$$

The goal is to develop an approximate distribution $q(\mathbf{f}, \mathbf{u}, r)$ capable of accurately approximating $p(\mathbf{f}, \mathbf{u}, r)$. Ref. [30] proposed the variational distribution

$$q(\mathbf{f}, \mathbf{u}, r) = p(\mathbf{f} \mid \mathbf{u}, r)\, q(\mathbf{u} \mid r)\, q(r) = \mathcal{N}_n\big(\mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{u},\ r(\mathbf{K}_{\mathbf{xx}} - \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}})\big)\, \mathcal{N}_m(\mathbf{m}_\mathbf{z}, r\boldsymbol{\Sigma}_{\mathbf{zz}})\, \mathrm{IG}(a, b). \tag{28}$$

It follows that the evidence lower bound (ELBO) is

$$\log p(\mathbf{y}) \geq \mathbb{E}_{q(\mathbf{f}, \mathbf{u}, r)}\big[\log p(\mathbf{y} \mid \mathbf{f}, \mathbf{u}, r)\big] - \mathrm{KL}\big(q(\mathbf{f}, \mathbf{u}, r)\,\|\,p(\mathbf{f}, \mathbf{u}, r)\big), \tag{29}$$

where $\mathrm{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence between the respective joint densities. The KL term can be re-expressed as

$$\mathrm{KL}\big(q(\mathbf{f}, \mathbf{u}, r)\,\|\,p(\mathbf{f}, \mathbf{u}, r)\big) = \iint q(\mathbf{u}, r) \log \frac{q(\mathbf{u}, r)}{p(\mathbf{u}, r)}\, d\mathbf{u}\, dr, \tag{30}$$

since the $p(\mathbf{f} \mid \mathbf{u}, r)$ terms cancel. Furthermore, we can evaluate the likelihood component as

$$\mathbb{E}_{q(\mathbf{f}, \mathbf{u}, r)}\big[\log p(\mathbf{y} \mid \mathbf{f}, \mathbf{u}, r)\big] = \iiint q(\mathbf{f}, \mathbf{u}, r) \log p(\mathbf{y} \mid \mathbf{f}, \mathbf{u}, r)\, d\mathbf{f}\, d\mathbf{u}\, dr = \int q(\mathbf{f}) \log p(\mathbf{y} \mid \mathbf{f})\, d\mathbf{f}, \tag{31}$$

where

$$q(\mathbf{f}) = \iint p(\mathbf{f} \mid \mathbf{u}, r)\, q(\mathbf{u} \mid r)\, q(r)\, d\mathbf{u}\, dr,$$

which can be expressed as

$$q(\mathbf{f}) = T_n\left(2a,\ \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{m}_\mathbf{z},\ \frac{b}{a}\big(\mathbf{K}_{\mathbf{xx}} - \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}} + \mathbf{K}_{\mathbf{xz}} \mathbf{K}_{\mathbf{zz}}^{-1} \boldsymbol{\Sigma}_{\mathbf{zz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{zx}}\big)\right).$$

Given a set of test inputs $\mathbf{X}_*$, we can obtain the approximate predictive distribution for $\mathbf{f}_*$ as

$$p(\mathbf{f}_* \mid \mathbf{y}) \approx \iint p(\mathbf{f}_* \mid \mathbf{u}, r)\, q(\mathbf{u} \mid r)\, q(r)\, d\mathbf{u}\, dr = T_n\left(2a,\ \mathbf{K}_{*\mathbf{z}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{m}_\mathbf{z},\ \frac{b}{a}\big(\mathbf{K}_{**} - \mathbf{K}_{*\mathbf{z}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{z}*} + \mathbf{K}_{*\mathbf{z}} \mathbf{K}_{\mathbf{zz}}^{-1} \boldsymbol{\Sigma}_{\mathbf{zz}} \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{K}_{\mathbf{z}*}\big)\right),$$

which is structurally quite similar to its sparse GP counterpart described in (16).
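Since this predictive is a closed-form multivariate Student-t, extracting its parameters from the variational quantities requires only a little linear algebra. The sketch below assumes precomputed kernel blocks and zero mean functions.

```python
import numpy as np

def sparse_tp_predictive(Kss, Ksz, Kzz, m_z, S_zz, a, b):
    """Parameters (df, mean, scale) of the sparse TP predictive T_n(2a, ., .)."""
    A = Ksz @ np.linalg.inv(Kzz)                 # K_*z K_zz^{-1}
    mean = A @ m_z
    scale = (b / a) * (Kss - A @ Ksz.T + A @ S_zz @ A.T)
    return 2.0 * a, mean, scale
```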

4.2. TP-NARX Model

Our first proposed model is the TP-NARX model, a straightforward extension of the GP-NARX model (see Section 3.2) obtained by replacing the GP functional prior with the Student-t process prior defined in Section 4.1. Further, during the forecasting phase, rather than assuming that $\mathbf{x}_{T+j}$ from (18) approximately follows a multivariate normal distribution, we assume instead that $\mathbf{x}_{T+j} \overset{\text{approx.}}{\sim} T_L\big(v, \boldsymbol{\mu}_{\mathbf{x}_{T+j}}, \boldsymbol{\Sigma}_{\mathbf{x}_{T+j}}\big)$, where $v > 2$ denotes the degrees of freedom of the multivariate Student-t distribution. A Monte Carlo sampling approach is used to approximate the integral in (18).
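Relative to GP-NARX, the only new ingredient during forecasting is sampling the uncertain inputs from a multivariate Student-t rather than a Gaussian. A standard way to do this is via the Gaussian scale-mixture representation, as sketched below; the function could replace the Gaussian draw inside the Monte Carlo forecasting loop shown in Section 3.2.

```python
import numpy as np

def sample_multivariate_t(v, mu, Sigma, size, rng=None):
    """Draw from T_L(v, mu, Sigma) via the Gaussian scale mixture:
    x = mu + z / sqrt(g), with z ~ N(0, Sigma) and g ~ Gamma(v/2, rate=v/2)."""
    rng = rng or np.random.default_rng()
    z = rng.multivariate_normal(np.zeros(len(mu)), Sigma, size=size)
    g = rng.gamma(v / 2.0, 2.0 / v, size=(size, 1))   # numpy uses scale = 1/rate
    return np.asarray(mu) + z / np.sqrt(g)
```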

4.3. TP-RLARX Model

For our second proposed model, we extend the GP-RLARX model by replacing the Gaussian process prior with a Student-t process prior. Similar to the GP-RLARX model's sparse approximation approach, we employ the sparse variational Student-t process (SVTP) framework presented in [30] as the functional prior over our state transition. The TP-RLARX generative model is therefore

$$p(r) = \mathrm{IG}(r \mid \alpha, \beta), \tag{32a}$$
$$p(\mathbf{u} \mid r) = \mathcal{N}_m(\mathbf{u} \mid \mathbf{0}, r\mathbf{K}_{\mathbf{zz}}), \tag{32b}$$
$$p(f_t \mid \mathbf{u}, \mathbf{x}, r) = \mathcal{N}\big(f_t \mid [\mathbf{a}_\mathbf{x}]_t, [r\boldsymbol{\Sigma}_{\mathbf{xx}}]_{tt}\big), \tag{32c}$$
$$p(x_t) = \mathcal{N}(x_t \mid \mu_t, \lambda_t), \quad t \in \{1, \ldots, L_x\}, \tag{32d}$$
$$p(x_t \mid f_t) = \mathcal{N}(x_t \mid f_t, \sigma_x^2), \quad t \in \{L_x + 1, \ldots, T\}, \tag{32e}$$
$$p(\tau_t) = \mathrm{IG}(\tau_t \mid \kappa, \theta), \quad t \in \{L_x + 1, \ldots, T\}, \tag{32f}$$
$$p(y_t \mid x_t, \tau_t) = \mathcal{N}(y_t \mid x_t, \tau_t), \quad t \in \{L_x + 1, \ldots, T\}, \tag{32g}$$
where $\mathbf{a}_\mathbf{x}$ and $\boldsymbol{\Sigma}_{\mathbf{xx}}$ are the same as in Section 3.3, whereas $f(\cdot)$ is now marginally distributed as $\mathcal{TP}\big(2\alpha, 0, \frac{\beta}{\alpha} k(\cdot,\cdot)\big)$. We employ a variational inference approach to approximate the generative model described in (32). The variational distribution has the form

$$q(\mathbf{f}, \mathbf{u}, \mathbf{x}, \boldsymbol{\tau}, r) = q(\mathbf{x})\, q(\boldsymbol{\tau}) \prod_{t=L_x+1}^{T} p(f_t \mid \mathbf{u}, r, \tilde{\mathbf{x}}_t)\, q(\mathbf{u} \mid r)\, q(r),$$

where each term is identical to that found in Section 3.3, with the exceptions of $q(\tau_t) = \mathrm{IG}(a_t, b_t)$, $q(\mathbf{u} \mid r) = \mathcal{N}_m(\mathbf{m}_\mathbf{z}, r\boldsymbol{\Sigma}_{\mathbf{zz}})$, $\prod_{t=L_x+1}^{T} p(f_t \mid \mathbf{u}, r, \tilde{\mathbf{x}}_t) = \prod_{t=L_x+1}^{T} \mathcal{N}\big(f_t \mid [\mathbf{a}_\mathbf{x}]_t, r[\boldsymbol{\Sigma}_{\mathbf{xx}}]_{tt}\big)$, and the additional variational distribution $q(r) = \mathrm{IG}(r \mid \gamma, \sigma)$. The evidence lower bound (ELBO) for this model takes the form
$$\begin{aligned} \log p(\mathbf{y}) &\geq \mathbb{E}_{q(\boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)}\left[\log \frac{p(\mathbf{y}, \boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)}{q(\boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)}\right] \\ &= \mathbb{E}_{q(\boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)}\big[\log p(\mathbf{y} \mid \boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)\big] - \mathrm{KL}\big(q(\boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)\,\|\,p(\boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)\big) \\ &= \mathbb{E}_{q(\boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)}\big[\log p(\mathbf{y} \mid \boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)\big] - \mathrm{KL}\big(q(\boldsymbol{\tau})\,\|\,p(\boldsymbol{\tau})\big) - \mathrm{KL}\big(q(\mathbf{x})\,\|\,p(\mathbf{x} \mid \mathbf{f})\big) - \mathrm{KL}\big(q(\mathbf{u}, r)\,\|\,p(\mathbf{u}, r)\big). \tag{33} \end{aligned}$$
With the exception of the additional scale parameter for the Student-t process, the derivation of the ELBO terms follows similarly to [12]. For the likelihood term, we have

$$\begin{aligned} \mathbb{E}_{q(\cdot)}\big[\log p(\mathbf{y} \mid \boldsymbol{\tau}, \mathbf{x}, \mathbf{f}, \mathbf{u}, r)\big] &= \sum_{t=L_x+1}^{T} \mathbb{E}_{q(x_t, \tau_t)}\big[\log p(y_t \mid x_t, \tau_t)\big] \\ &= \sum_{t=L_x+1}^{T} \mathbb{E}_{q(x_t)}\left[-\frac{1}{2}\log 2\pi - \frac{1}{2}\,\mathbb{E}_{q(\tau_t)}[\log \tau_t] - \mathbb{E}_{q(\tau_t)}\!\left[\frac{(y_t - x_t)^2}{2\tau_t}\right]\right] \\ &= \sum_{t=L_x+1}^{T} -\frac{1}{2}\left(\log 2\pi + \log b_t - \psi(a_t) + \frac{a_t}{b_t}\big(y_t^2 - 2\mu_t^{(x)} y_t + \lambda_t^{(x)} + (\mu_t^{(x)})^2\big)\right), \tag{34} \end{aligned}$$

using $\mathbb{E}_{q(\tau_t)}[\log \tau_t] = \log b_t - \psi(a_t)$ and $\mathbb{E}_{q(\tau_t)}[\tau_t^{-1}] = a_t/b_t$ for $\tau_t \sim \mathrm{IG}(a_t, b_t)$,
where $\psi(\cdot)$ denotes the digamma function. Next, for the KL divergence between $q(\boldsymbol{\tau})$ and $p(\boldsymbol{\tau})$, we have

$$\mathrm{KL}\big(q(\boldsymbol{\tau})\,\|\,p(\boldsymbol{\tau})\big) = \sum_{t=1}^{T} \left[ (a_t - \kappa)\psi(a_t) - \log \Gamma(a_t) + \log \Gamma(\kappa) + \kappa\big(\log b_t - \log \theta\big) + \frac{a_t(\theta - b_t)}{b_t} \right]. \tag{35}$$
For the KL divergence of the latent states, we have

$$\begin{aligned} \mathrm{KL}\big(q(\mathbf{x})\,\|\,p(\mathbf{x} \mid \mathbf{f})\big) &= \sum_{t=1}^{L_x} \mathrm{KL}\big(q(x_t)\,\|\,p(x_t)\big) + \sum_{t=L_x+1}^{T} \mathrm{KL}\big(q(x_t)\,\|\,p(x_t \mid f_t)\big) \\ &= \frac{1}{2} \sum_{t=1}^{L_x} \left[\frac{\lambda_t^{(x)} + (\mu_t - \mu_t^{(x)})^2}{\lambda_t} - \log \frac{\lambda_t^{(x)}}{\lambda_t} - 1\right] + \sum_{t=L_x+1}^{T} \mathbb{E}_{q(x_t, \tau_t, f_t, \mathbf{u}, r)}\left[\log \frac{q(x_t)}{p(x_t \mid f_t)}\right] \\ &= \frac{1}{2} \sum_{t=1}^{L_x} \left[\frac{\lambda_t^{(x)} + (\mu_t - \mu_t^{(x)})^2}{\lambda_t} - \log \frac{\lambda_t^{(x)}}{\lambda_t} - 1\right] - \sum_{t=L_x+1}^{T} \frac{1}{2}\big(\log 2\pi \lambda_t^{(x)} + 1\big) \\ &\quad - \sum_{t=L_x+1}^{T} \iiint q(\mathbf{x})\, q(\mathbf{u}, r)\, p(f_t \mid \tilde{\mathbf{x}}_t, \mathbf{u}, r) \log p(x_t \mid f_t)\, d\mathbf{x}\, d\mathbf{f}\, d\mathbf{u}\, dr. \tag{36} \end{aligned}$$

Closed forms are not available for the third term in $\mathrm{KL}\big(q(\mathbf{x})\,\|\,p(\mathbf{x} \mid \mathbf{f})\big)$ for most kernel configurations; therefore, we employ a black-box variational inference procedure as described in [31].
Finally, from [30], we have

$$\mathrm{KL}\big(q(\mathbf{u}, r)\,\|\,p(\mathbf{u}, r)\big) = \iint q(\mathbf{u}, r) \log \frac{q(\mathbf{u}, r)}{p(\mathbf{u}, r)}\, d\mathbf{u}\, dr = \frac{\gamma}{2\sigma} \mathbf{m}_\mathbf{z}^\top \mathbf{K}_{\mathbf{zz}}^{-1} \mathbf{m}_\mathbf{z} + \frac{1}{2} \mathrm{Tr}\big(\mathbf{K}_{\mathbf{zz}}^{-1} \boldsymbol{\Sigma}_{\mathbf{zz}}\big) + \frac{1}{2} \log \frac{|\mathbf{K}_{\mathbf{zz}}|}{|\boldsymbol{\Sigma}_{\mathbf{zz}}|} - \frac{m}{2} + \alpha \log \frac{\sigma}{\beta} - \log \frac{\Gamma(\gamma)}{\Gamma(\alpha)} + (\gamma - \alpha)\psi(\gamma) + \frac{(\beta - \sigma)\gamma}{\sigma}, \tag{37}$$

where $\mathrm{Tr}(\cdot)$ denotes the matrix trace. Model fitting is carried out using the Python library Pyro [32], which is dedicated to probabilistic programming with a particular emphasis on black-box variational inference (BBVI) and stochastic variational inference (SVI) methods.
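As an illustration of how such hierarchies are expressed in Pyro, the sketch below encodes only the heavy-tailed observation layer: a latent AR(1) state with per-time inverse-gamma observation variances, fitted by SVI with an automatic Gaussian guide. The GP/TP transition prior is deliberately collapsed to a single learnable AR weight, so this is a toy analogue of the model structure, not the full TP-RLARX implementation; all names and hyperparameter values are illustrative.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def rlarx_obs_model(y):
    # Latent AR(1) state in place of the GP/TP transition (toy simplification).
    phi = pyro.sample("phi", dist.Normal(0.0, 1.0))
    sigma_x = pyro.sample("sigma_x", dist.LogNormal(0.0, 0.5))
    x_prev = pyro.sample("x_init", dist.Normal(0.0, 1.0))
    for t in range(len(y)):
        x_t = pyro.sample(f"x_{t}", dist.Normal(phi * x_prev, sigma_x))
        # Per-time inverse-gamma variance gives the heavy-tailed y | x, cf. (32f)-(32g).
        tau_t = pyro.sample(f"tau_{t}", dist.InverseGamma(3.0, 2.0))
        pyro.sample(f"y_{t}", dist.Normal(x_t, tau_t.sqrt()), obs=y[t])
        x_prev = x_t

pyro.clear_param_store()
y = torch.randn(50)                      # placeholder series
guide = AutoNormal(rlarx_obs_model)      # mean-field Gaussian guide
svi = SVI(rlarx_obs_model, guide, Adam({"lr": 0.01}), Trace_ELBO())
for step in range(200):
    loss = svi.step(y)
```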
Table 2 summarizes our contributions discussed in this section as well as our findings that are more thoroughly presented in Section 5.2 and Section 5.3.

5. Application: IoT Temperature Time Series

We apply the TP-NARX and TP-RLARX models to perform impact analysis on temperature time series. First, Section 5.1 describes the IoT sensor data and the objective of the intervention impact analysis. Section 5.2 reports each model's performance on a number of forecasting metrics and shows example forecasts. Finally, Section 5.3 presents detailed results from the impact analysis and a thorough interpretation of the findings. Throughout, we compare the TP-NARX and TP-RLARX approaches with their Gaussian process counterparts.

5.1. Data Description

We analyze data spanning the time period from 1 October 2020 to 25 February 2021 on a sample of N = 50 sensors that are distributed across the contiguous US, with a concentration in the Upper Midwest, Southeast, and the East coast. A sensor measures the internal room temperature at 15 min intervals. Figure 1 shows an example of a sensor’s temperature stream and when an alert was sent (the alert time is denoted by the vertical black line). We see that, just prior to the alert, internal temperatures plunge, while the external temperature is at a low level.
For this program to be effective, it is imperative to have accurate information on whether a customer actually takes meaningful action in a timely manner after receiving an alert, in order to avoid freeze loss. Although this information would ideally be obtained directly from the customer, that is rarely possible in practice. As such, the effectiveness of the alert must be ascertained purely from the observed pre-alert and post-alert time series. We refer to this analysis as intervention impact analysis. Essentially, a customer action has likely occurred if there is a large increase in post-alert temperatures that is incongruous with forecasts generated by a suitable time series model trained on pre-alert temperatures.

Since many of the IoT temperature time series exhibit non-linear behavior, methods such as Bayesian structural time series are inadequate for intervention impact analysis. In contrast, the NARX and RLARX models are capable of learning non-linear behavior directly from the data; thus, we use them in place of BSTS models for our impact analysis. The performance of the proposed TP models is compared with that of the GP-NARX and GP-RLARX models.

5.2. Results

We apply the TP-NARX and TP-RLARX models to each alert event in our dataset and compare the results with those of the GP-NARX and GP-RLARX models. For each model, we use one of four covariance kernels: radial basis function (RBF), Matérn 3/2, Matérn 5/2, or Ornstein–Uhlenbeck (OU). Outdoor (external) temperature is the only exogenous predictor used in the experiment.
Figure 2 and Figure 3 depict the forecasts for the GP-RLARX and TP-RLARX models with the RBF kernel on the same alert event. As expected, the point estimates (red lines) are similar for both models; however, the predictive interval (red shaded area) for TP-RLARX is slightly wider. For the results depicted in Figure 2 and Figure 3, the average width between the 0.025 and 0.975 quantiles is 11.76 for GP-RLARX versus 15.703 for TP-RLARX. The wider predictive interval of TP-RLARX means that our decision on whether a customer has taken appreciable action will be more conservative. Firms wishing to be more conservative in assessing customer behavior, or those with noisier time series data, might prefer the TP-RLARX model.
Furthermore, Table 3 shows the root mean squared error (RMSE), symmetric mean absolute percentage error (sMAPE), and continuous ranked probability score (CRPS) [33] for each combination of model and kernel. Each metric is calculated by averaging the metrics for each alert event within each model combination. In addition, Table 3 also gives CPU times for each model. Overall, we find that the TP-RLARX model using the Ornstein–Uhlenbeck kernel provides the best average RMSE, followed by the TP-NARX using the Matérn 5 / 2 kernel. Furthermore, we find that the GP-RLARX models give considerably worse performance than both TP-NARX and TP-RLARX models, regardless of kernel.
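For reference, the sketch below computes the three forecast metrics for a single alert event from a point forecast and posterior predictive draws. It uses one common convention for sMAPE (reported as a fraction) and the standard sample-based CRPS estimator E|Y − y| − ½E|Y − Y′|; whether these match the paper's exact implementations is an assumption.

```python
import numpy as np

def forecast_metrics(y_true, point_fcst, sample_fcst):
    """RMSE, sMAPE, and sample-based CRPS for one alert event.

    sample_fcst : (S, H) posterior predictive draws; CRPS per time point is
    estimated from the draws and then averaged over the horizon H.
    """
    rmse = np.sqrt(np.mean((y_true - point_fcst) ** 2))
    smape = np.mean(np.abs(y_true - point_fcst)
                    / ((np.abs(y_true) + np.abs(point_fcst)) / 2.0))
    term1 = np.mean(np.abs(sample_fcst - y_true[None, :]), axis=0)
    term2 = 0.5 * np.mean(np.abs(sample_fcst[:, None, :]
                                 - sample_fcst[None, :, :]), axis=(0, 1))
    crps = np.mean(term1 - term2)
    return rmse, smape, crps
```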

5.3. Intervention Impact Analysis and Interpretation

For each alert event in this experiment, we are given a label indicating whether a panel of domain experts believed the customer took corrective action based on visual inspection of time series plots similar to Figure 1. If a majority of experts thought that action had been taken, an alert was labeled as “Action”, indicating that appreciable customer action likely occurred; otherwise, it was labeled as “No Action”. In the absence of observed labels, this is the closest approximation to the ground truth that we have and constitutes the benchmark we will compare against.
Due to this inherently biased labeling scheme, the labels we have are more aptly described as “pseudo-labels” in that they do not represent an objective truth. For alerts that experts labeled as “No Action”, we find that there is a high degree of correspondence between the model and expert labels, as indicated in Table 4. This result is unsurprising, as it is quite obvious to both experts and the models when no action has been taken because the observed internal temperature will remain flat or even decrease after the alert. Conversely, for instances labeled “Action” by experts, every model is likely to disagree, as shown in Table 4. This is attributable to the fact that whenever post-alert temperatures experience a sharp positive increase, human labelers are biased towards labeling it an action, regardless of the historical time series behavior or its correlation with the exogenous predictor. For example, Figure 4 shows an alert event labeled as “Action” by domain experts; however, there is clearly a strong, positive correlation between the internal and external temperatures that appears to instigate the increase in post-alert internal temperature. Furthermore, the post-alert increase in the response variable is quite modest and is congruent with pre-intervention temperature levels. Unsurprisingly, every possible combination of model and kernel tested returns a decision of “No Action” for this alert.
Indeed, the results of the impact analysis are congruent with our goals in that the models are far more conservative in assessing customer intervention. Expert opinion is that customers typically do not take action, so it is desirable to have models that require a large shift in post-alert behavior in order to declare an alert event as having been addressed. To that end, the RLARX models yield the best results in that they are both highly unwilling to assume an intervention has been successful without significant evidence.

6. Conclusions

In this paper, we have proposed extensions to both the GP-NARX and GP-RLARX models by replacing the GP functional prior with a Student-t process prior. The goal is to use these models as underlying forecasting models for intervention impact analysis of IoT temperature data streams. We have demonstrated that the TP-NARX and TP-RLARX models provide improved forecasting accuracy relative to the GP-NARX and GP-RLARX models. Furthermore, as shown in Section 5, the TP-RLARX model has the desirable trait of being more conservative than both the GP models and human labelers in declaring that an intervention was effective in instigating appreciable customer action. As such, the TP-RLARX model is preferable in impact analyses where the ground truth is not necessarily known and there is a high cost associated with false positives.
The analysis performed here opens several avenues for future research. First, it would be interesting to apply the same Student-t process extension to Gaussian process state space models, such as those presented in [34,35,36], and compare their performance with models presented in this research. Furthermore, a comparison of our model with parametric non-linear time series models, such as the deep state space framework proposed in [37], would also be a worthwhile endeavor. The authors intend to explore these ideas in future research.

Author Contributions

Conceptualization, P.T., N.R. and N.L.; methodology, P.T., N.R. and N.L.; writing, reviewing and editing, P.T., N.R. and S.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors gratefully acknowledge Hartford Steam Boiler for providing the IoT sensor data used in this paper.

Conflicts of Interest

Patrick Toman and Nathan Lally were employed by the company Hartford Steam Boiler. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Brockwell, P.J.; Davis, R.A. Introduction to Time Series and Forecasting; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  2. Abraham, B. Intervention analysis and multiple time series. Biometrika 1980, 67, 73–78. [Google Scholar] [CrossRef]
  3. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications; Springer: Berlin/Heidelberg, Germany, 2000; Volume 3. [Google Scholar]
  4. Van den Brakel, J.; Roels, J. Intervention analysis with state-space models to estimate discontinuities due to a survey redesign. Ann. Appl. Stat. 2010, 4, 1105–1138. [Google Scholar] [CrossRef]
  5. Scott, S.L.; Varian, H.R. Predicting the present with Bayesian structural time series. Int. J. Math. Model. Numer. Optim. 2014, 5, 4–23. [Google Scholar] [CrossRef]
  6. Brodersen, K.H.; Gallusser, F.; Koehler, J.; Remy, N.; Scott, S.L. Inferring causal impact using Bayesian structural time-series models. Ann. Appl. Stat. 2015, 9, 247–274. [Google Scholar] [CrossRef]
  7. Schmitt, E.; Tull, C.; Atwater, P. Extending Bayesian structural time-series estimates of causal impact to many-household conservation initiatives. Ann. Appl. Stat. 2018, 12, 2517–2539. [Google Scholar] [CrossRef]
  8. Kurz, C.F.; Rehm, M.; Holle, R.; Teuner, C.; Laxy, M.; Schwarzkopf, L. The effect of bariatric surgery on health care costs: A synthetic control approach using Bayesian structural time series. Health Econ. 2019, 28, 1293–1307. [Google Scholar] [CrossRef]
  9. Ön, Z.B.; Greaves, A.; Akçer-Ön, S.; Özeren, M.S. A Bayesian test for the 4.2 ka BP abrupt climatic change event in southeast Europe and southwest Asia using structural time series analysis of paleoclimate data. Clim. Chang. 2021, 165, 1–19. [Google Scholar] [CrossRef]
  10. Toman, P.; Soliman, A.; Ravishanker, N.; Rajasekaran, S.; Lally, N.; D’Addeo, H. Understanding insured behavior through causal analysis of IoT streams. In Proceedings of the 2023 6th International Conference on Data Mining and Knowledge Discovery (DMKD 2023), Chongqing, China, 24–26 June 2023. [Google Scholar]
  11. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2. [Google Scholar]
  12. Mattos, C.L.C.; Damianou, A.; Barreto, G.A.; Lawrence, N.D. Latent autoregressive Gaussian processes models for robust system identification. IFAC-PapersOnLine 2016, 49, 1121–1126. [Google Scholar] [CrossRef]
  13. Shah, A.; Wilson, A.; Ghahramani, Z. Student-t processes as alternatives to Gaussian processes. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Reykjavik, Iceland, 22–25 April 2014; pp. 877–885. [Google Scholar]
  14. Solin, A.; Särkkä, S. State space methods for efficient inference in Student-t process regression. In Proceedings of the Artificial Intelligence and Statistics, PMLR, San Diego, CA, USA, 9–12 May 2015; pp. 885–893. [Google Scholar]
  15. Meitz, M.; Preve, D.; Saikkonen, P. A mixture autoregressive model based on Student's t-distribution. Commun. Stat. Theory Methods 2023, 52, 499–515. [Google Scholar] [CrossRef]
  16. Snelson, E.; Ghahramani, Z. Sparse Gaussian processes using pseudo-inputs. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; Volume 18. [Google Scholar]
  17. Titsias, M. Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Clearwater Beach, FL, USA, 16–18 April 2009; pp. 567–574. [Google Scholar]
  18. Hensman, J.; Fusi, N.; Lawrence, N.D. Gaussian processes for big data. arXiv 2013, arXiv:1309.6835. [Google Scholar]
  19. Hensman, J.; Matthews, A.; Ghahramani, Z. Scalable variational Gaussian process classification. In Proceedings of the Artificial Intelligence and Statistics, PMLR, San Diego, CA, USA, 9–12 May 2015; pp. 351–360. [Google Scholar]
  20. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef]
  21. Girard, A.; Rasmussen, C.E.; Quinonero-Candela, J.; Murray-Smith, R.; Winther, O.; Larsen, J. Multiple-step ahead prediction for non linear dynamic systems–a Gaussian process treatment with propagation of the uncertainty. Adv. Neural Inf. Process. Syst. 2002, 15, 529–536. [Google Scholar]
  22. Girard, A. Approximate Methods for Propagation of Uncertainty with Gaussian Process Models; University of Glasgow (United Kingdom): Glasgow, UK, 2004. [Google Scholar]
  23. Groot, P.; Lucas, P.; Bosch, P. Multiple-step time series forecasting with sparse gaussian processes. In Proceedings of the 23rd Benelux Conference on Artificial Intelligence, Gent, Belgium, 3–4 November 2011. [Google Scholar]
  24. Gutjahr, T.; Ulmer, H.; Ament, C. Sparse Gaussian processes with uncertain inputs for multi-step ahead prediction. IFAC Proc. Vol. 2012, 45, 107–112. [Google Scholar] [CrossRef]
  25. Bijl, H.; Schön, T.B.; van Wingerden, J.W.; Verhaegen, M. System identification through online sparse Gaussian process regression with input noise. IFAC J. Syst. Control 2017, 2, 1–11. [Google Scholar] [CrossRef]
  26. Titsias, M.; Lawrence, N.D. Bayesian Gaussian process latent variable model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; JMLR Workshop and Conference Proceedings. pp. 844–851. [Google Scholar]
  27. Xu, Z.; Kersting, K.; Von Ritter, L. Stochastic Online Anomaly Analysis for Streaming Time Series. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 3189–3195. [Google Scholar]
  28. Uchiyama, Y.; Nakagawa, K. TPLVM: Portfolio Construction by Student's t-Process Latent Variable Model. Mathematics 2020, 8, 449. [Google Scholar] [CrossRef]
  29. Peng, C.Y.; Cheng, Y.S. Student-t processes for degradation analysis. Technometrics 2020, 62, 223–235. [Google Scholar] [CrossRef]
  30. Lee, H.; Yun, E.; Yang, H.; Lee, J. Scale mixtures of neural network Gaussian processes. arXiv 2021, arXiv:2107.01408. [Google Scholar]
  31. Ranganath, R.; Gerrish, S.; Blei, D. Black box variational inference. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Reykjavik, Iceland, 22–25 April 2014; pp. 814–822. [Google Scholar]
  32. Bingham, E.; Chen, J.P.; Jankowiak, M.; Obermeyer, F.; Pradhan, N.; Karaletsos, T.; Singh, R.; Szerlip, P.; Horsfall, P.; Goodman, N.D. Pyro: Deep Universal Probabilistic Programming. J. Mach. Learn. Res. 2018, 20, 973–978. [Google Scholar]
  33. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 2007, 102, 359–378. [Google Scholar] [CrossRef]
  34. Frigola, R.; Chen, Y.; Rasmussen, C.E. Variational Gaussian process state-space models. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
  35. Doerr, A.; Daniel, C.; Schiegg, M.; Duy, N.T.; Schaal, S.; Toussaint, M.; Sebastian, T. Probabilistic recurrent state-space models. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1280–1289. [Google Scholar]
  36. Curi, S.; Melchior, S.; Berkenkamp, F.; Krause, A. Structured variational inference in partially observable unstable Gaussian process state space models. In Proceedings of the Learning for Dynamics and Control, PMLR, Berkeley, CA, USA, 11–12 June 2020; pp. 147–157. [Google Scholar]
  37. Krishnan, R.; Shalit, U.; Sontag, D. Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
Figure 1. Example of an alert event. The black dotted line denotes the time of an alert.
Figure 2. Forecasts for GP-RLARX.
Figure 3. Forecasts for TP-RLARX.
Figure 4. Alert event mislabeled by human labelers as “Action”.
Table 1. Comparison of GP-NARX and GP-RLARX methods.

Model: GP-NARX
Characteristics: (1) the target is a non-linear, autoregressive function of observed past values and exogenous predictors; (2) trained via Type II MLE; (3) forecasts obtained via simple Monte Carlo sampling.
Pros: (1) fast to train; (2) non-parametric GP prior; (3) predictive uncertainty.
Cons: (1) incapable of handling heavy-tailed noise and outliers; (2) assumes a Gaussian likelihood.

Model: GP-RLARX
Characteristics: (1) the target variable is assumed to equal a latent state plus noise; (2) autoregressive behavior is captured through latent state dynamics; (3) exogenous predictors can be placed at the observed and latent levels; (4) trained using variational Bayesian and sparse GP methods; (5) forecasts by sampling from the approximate posterior.
Pros: (1) robust to heavy-tailed noise and outliers; (2) non-parametric sparse GP prior; (3) predictive uncertainty; (4) arbitrary likelihoods.
Cons: (1) slower to train than GP-NARX; (2) somewhat more challenging to train.
Table 2. Summary of contributions and findings.

Proposed contribution: TP-NARX
Summary: extends the GP-NARX model to the Student-t likelihood in order to accommodate heavy-tailed noise and outliers.
Findings: gains the robustness of a heavy-tailed likelihood without increasing computational cost relative to GP-NARX.

Proposed contribution: TP-RLARX
Summary: extends the GP-RLARX model by substituting the latent GP prior with a Student-t process prior, making the method robust to heavy-tailed noise at both the observational and latent levels; the ELBO is derived for this proposed model.
Findings: (1) gains the robustness of a heavy-tailed latent state with only a minor increase in computational cost relative to GP-RLARX; (2) TP-RLARX performs at least as well as GP-RLARX on the intervention analysis task.
Table 3. Forecast metrics and CPU times.

Model                      RMSE     sMAPE    CRPS     CPU Time
GP-NARX RBF                13.456   0.046    10.705      33.69
GP-NARX Matérn 3/2         13.605   0.046    10.882      34.88
GP-NARX Matérn 5/2         13.669   0.047    10.875      34.63
GP-NARX OU                 13.385   0.046    10.717      32.51
GP-RLARX RBF               15.748   0.051    12.253     741.50
GP-RLARX Matérn 3/2        15.610   0.051    12.429     889.78
GP-RLARX Matérn 5/2        15.453   0.051    12.309     954.87
GP-RLARX OU                14.831   0.049    11.798     907.22
TP-NARX RBF                13.110   0.044    10.340      31.95
TP-NARX Matérn 3/2         13.234   0.046    10.361      30.72
TP-NARX Matérn 5/2         13.073   0.046    10.361      31.27
TP-NARX OU                 13.728   0.047    10.707      28.82
TP-RLARX RBF               13.666   0.046    10.886     967.20
TP-RLARX Matérn 3/2        13.149   0.046    10.481    1131.45
TP-RLARX Matérn 5/2        13.312   0.046    10.574    1129.89
TP-RLARX OU                13.003   0.045    10.394    1104.85
Table 4. Confusion matrices for each model (RBF kernel); rows are model-predicted labels, columns are human (expert) labels.

(a) GP-NARX
Predicted \ Human    No Action    Action
No Action                   33         7
Action                       7         3

(b) GP-RLARX
Predicted \ Human    No Action    Action
No Action                   38        10
Action                       2         0

(c) TP-NARX
Predicted \ Human    No Action    Action
No Action                   24         8
Action                      16         2

(d) TP-RLARX
Predicted \ Human    No Action    Action
No Action                   39         9
Action                       1         1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
