If You Like It, GAN It. Probabilistic Multivariate Time Series Forecast With GAN

The contribution of this paper is two-fold. First, we present ProbCast, a novel probabilistic model for multivariate time-series forecasting. We employ a conditional GAN framework to train our model with adversarial training. Second, we propose a framework that transforms a deterministic model into a probabilistic one with improved performance. The motivation of the framework is either to transform existing, highly accurate point forecast models into their probabilistic counterparts or to train GANs stably by selecting the architectures of the GAN's components carefully and efficiently. We conduct experiments on two publicly available datasets, namely an electricity consumption dataset and an exchange-rate dataset. The results of the experiments demonstrate the remarkable performance of our model as well as the successful application of our proposed framework.


Introduction
A large sector of industry, including health care, the automotive industry, the aerospace industry, and weather forecasting, deals with time-series data in its operations. Knowledge about what will happen in the future is essential for making sound decisions, and accurate forecasting of future values is a key to success. Hence, a huge body of research is dedicated to addressing the forecasting problem; an overview of various research on forecasting is provided in [16]. Currently, the field is dominated by point prediction methods, which are easy to understand. However, these deterministic models report the mean of the possible outcomes and cannot reflect the inherent uncertainty that exists in the real world. Probabilistic forecast models are devised to answer these shortcomings: they try to quantify the uncertainty of the predictions by forming a probability distribution over the possible outcomes [5].
In this paper, we propose ProbCast, a new probabilistic forecast model for multivariate time-series based on Conditional Generative Adversarial Networks (GANs). Conditional GANs are a class of NN-based generative models that enable us to learn a conditional probability distribution from a given dataset. The ProbCast is trained in a Conditional GAN setup to learn the probability distribution of future values conditioned on the historical information of the signal.
While GANs are powerful methods for learning complex probability distributions, they are notoriously hard to train. The training process is very unstable and quite dependent on a careful selection of the model architecture and hyperparameters [8]. In addition to ProbCast, we suggest a framework to transform an existing deterministic forecaster, which is comparatively easy to train, into a probabilistic one that surpasses its predecessor. By using the proposed framework, the search space for the GAN's architecture becomes considerably smaller. Thus, this framework provides an easy way to adapt highly accurate deterministic models and construct useful probabilistic models that exploit the potential of GANs without compromising accuracy.
In summary, the main contributions of this article are as follows:

- We introduce ProbCast, a novel probabilistic model for multivariate time-series forecasting. Our method employs a conditional GAN setup to train a probabilistic forecaster.
- We suggest a framework to transform a point forecast model into a probabilistic model. This framework eases the process of replacing deterministic models with probabilistic ones.
- We conduct experiments on two publicly available datasets and report results which show the superiority of the ProbCast. Furthermore, we demonstrate that our framework is capable of transforming a point forecast method into a probabilistic model with improved accuracy.

Related Work
Due to the lack of a standard evaluation method for GANs, they were initially applied to domains in which their results are intuitively assessable, e.g. images.
However, recently they have also been applied to time-series data. Currently, GANs are used in various domains for generating realistic time-series data, including health care [3,7,10,18,23], finance [21,22], and the energy industry [1,4,25]. In [24], the authors combine GAN and auto-regressive models to improve sequential data generation. Ramponi et al. [20] condition a GAN on timestamp information to handle irregular sampling. Furthermore, researchers have used conditional GANs to build probabilistic forecasting models. Koochali et al. [13] use a Conditional GAN to build a probabilistic model for univariate time-series. They use Long Short-Term Memory (LSTM) networks in the GAN's components and test their method on a synthetic dataset as well as two publicly available datasets. In [26], the authors utilize an LSTM and a Multi-Layer Perceptron (MLP) in a conditional GAN structure to forecast the daily closing price of stocks, combining the Mean Square Error (MSE) with the generator loss of the GAN to improve performance. Zhou et al. [27] employ an LSTM and a convolutional neural network (CNN) in an adversarial training setup to forecast the high-frequency stock market. To guarantee satisfying predictions, this method minimizes the forecast error, in the form of Mean Absolute Error (MAE) or MSE, during training in conjunction with the GAN value function. Lin et al. [15] propose a pattern-sensitive forecasting model for traffic flow which can provide accurate predictions in unusual states without compromising its performance in usual states. This method uses a conditional GAN with MLPs in its structure and adds two error terms to the standard generator loss: the first specifies the forecast error and the second expresses the reconstruction error. Kabir et al. [12] make use of adversarial training for quantifying the uncertainty of the electricity price with prediction intervals.
This line of research is the most closely aligned with the method we present in this article; however, the methods suggested in [15,26,27] include a point-wise loss function in the GAN loss function. Minimizing the suggested loss functions decreases statistical error values such as RMSE, MAPE, and MSE. However, they encourage the model to learn the mean of the possible outcomes instead of the probability distribution of future values. Hence, their probabilistic forecasts can be misleading despite the small statistical errors.

Background
Here, we work with a multivariate time-series X = {X_0, X_1, ..., X_T} where each X_t = {x_{t,1}, x_{t,2}, ..., x_{t,f}} is a vector whose size f equals the number of features. In this paper, x_{t,f} refers to the data point at time step t of feature f, and X_t denotes the feature vector at time step t. The goal is to model P(X_{t+1} | X_t, ..., X_0), the probability distribution of X_{t+1} given the historical information {X_t, ..., X_0}.
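As a concrete illustration of this setup, the series can be split into history windows and next-step targets. The fixed window length w below is an assumption for illustration; the formulation above conditions on the full history.

```python
import numpy as np

def make_windows(series, window):
    """Split a multivariate series of shape (T, f) into
    (history, target) pairs: history {X_{t-w}, ..., X_{t-1}},
    target X_t."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return np.stack(X), np.stack(y)

# Toy series: 10 time steps, 2 features.
hist, target = make_windows(np.arange(20.0).reshape(10, 2), window=3)
```

Each row of `hist` is a length-3 window and the matching row of `target` is the feature vector one step ahead.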

Mean regression forecaster
To address the forecasting problem, we can take the predictive view of regression [5]. Ultimately, regression analysis aims to learn the conditional distribution of a response given a set of explanatory variables [11]. Mean regression methods are deterministic methods concerned with accurately predicting the mean of the possible outcomes, i.e. µ(P(X_{t+1} | X_t, ..., X_0)). A broad range of mean regression methods is available in the literature; however, all of them are unable to reflect uncertainty in their forecasts. Hence, their results can be unreliable and misleading in some cases.
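A minimal numerical illustration of this limitation, on toy data that is not from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy target whose outcome is bimodal: +1 or -1 with equal probability.
y = rng.choice([-1.0, 1.0], size=10_000)

# The MSE-optimal constant forecast is the mean of the outcomes...
point_forecast = y.mean()  # close to 0.0
# ...yet an outcome near 0 is never observed, so the point forecast
# is misleading even though its squared error is minimal.
```

A probabilistic forecaster would instead report the two modes and their probabilities.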

Generative Adversarial Network
In 2014, Goodfellow et al. [9] introduced a powerful generative model called the Generative Adversarial Network (GAN). A GAN can implicitly learn, with high precision, the probability distribution which describes a given dataset, i.e. P(data). Hence, it is capable of generating artificial samples with high fidelity. The GAN architecture is composed of two neural networks, namely the generator and the discriminator, which are trained simultaneously in an adversarial process. In the training process, first, a noise vector z is sampled from a known probability distribution P_noise(z) and fed into the generator. Then, the generator transforms z from P_noise(z) into a sample which follows P_data. On the other hand, the discriminator checks how well the generator is performing by comparing the generator's outputs with real samples from the dataset. During training, this two-player minimax game is set in motion by optimizing the following value function:

min_G max_D V(D, G) = E_{x~P_data(x)} [log D(x)] + E_{z~P_noise(z)} [log(1 - D(G(z)))]. (1)

However, GAN's remarkable performance is not easily obtained. The training process is quite unstable, and careful selection of the GAN's architecture and hyperparameters is vital for stabilizing it [8]. Since the optimal architectures of the generator and discriminator must be searched for simultaneously, finding a good combination of structures in a large search space is normally a cumbersome and time-consuming task.
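One adversarial training step under this value function can be sketched as follows. The network sizes and data are toy assumptions, and the generator update uses the non-saturating loss (maximize log D(G(z))) that is standard practice rather than the literal minimax form.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; sizes are illustrative assumptions.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(32, 2)  # stand-in batch from P_data
z = torch.randn(32, 8)     # z ~ P_noise

# Discriminator step: ascend log D(x) + log(1 - D(G(z))),
# expressed as descending the equivalent binary cross-entropy.
d_loss = bce(D(real), torch.ones(32, 1)) + \
         bce(D(G(z).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: non-saturating variant, ascend log D(G(z)).
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```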

Conditional GAN
The Conditional GAN [17] enables us to incorporate auxiliary information, called the condition, into the process of data generation. In this method, we provide an extra piece of information, such as labels, to both the generator and the discriminator. The generator must respect the condition while synthesizing a new sample, because the discriminator considers the given condition while checking the authenticity of its input. The new value function V(G, D) for this setting is:

min_G max_D V(D, G) = E_{x~P_data(x)} [log D(x|condition)] + E_{z~P_noise(z)} [log(1 - D(G(z|condition)|condition))]. (2)

After training a Conditional GAN, the generator has implicitly learned the probability distribution of the data given the condition, i.e. P(data|condition).

ProbCast: The proposed multivariate forecasting model
In this article, we use a Conditional GAN to train a probabilistic forecast model through adversarial training. In this perspective, the generator is our probabilistic model (i.e. the ProbCast) and the discriminator provides the gradient required for optimizing the ProbCast during training. To learn P(X_{t+1} | X_t, ..., X_0), the historical information {X_t, ..., X_0} is used as the condition of our Conditional GAN, and the generator is trained to generate X_{t+1}. Hence, the probability distribution learned by the generator corresponds to P(X_{t+1} | X_t, ..., X_0), i.e. our target distribution. The value function used for training the ProbCast (denoted PC) is:

min_PC max_D V(D, PC) = E_{X_{t+1}~P_data} [log D(X_{t+1} | X_t, ..., X_0)] + E_{z~P_noise(z)} [log(1 - D(PC(z | X_t, ..., X_0) | X_t, ..., X_0))]. (3)

Fig. 1: Demonstration of the proposed framework and the adversarial training setup. The pipeline is followed from top to bottom. First, we search for the optimal architecture of the deterministic model, which consists of a GRU block for learning a representation of the input window and two dense layers that map the representation to the forecast. Then, the noise vector z is integrated into the deterministic model to build the generator. Finally, the generator is trained with a suitable discriminator in a conditional GAN setup to obtain the ProbCast.
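The two expectations of this conditional value function can be sketched as loss computations. The flattened window, the MLP bodies, and all sizes below are illustrative assumptions, not the paper's tuned architecture; conditioning is implemented by concatenating the history to the network inputs.

```python
import torch
import torch.nn as nn

# Toy conditional setup: the flattened history window is the condition.
w, f, noise_dim = 24, 3, 8
PC = nn.Sequential(nn.Linear(w * f + noise_dim, 32), nn.ReLU(),
                   nn.Linear(32, f))      # PC(z | X_t, ..., X_0)
D = nn.Sequential(nn.Linear(w * f + f, 32), nn.ReLU(),
                  nn.Linear(32, 1))       # D(X_{t+1} | X_t, ..., X_0)
bce = nn.BCEWithLogitsLoss()

cond = torch.randn(16, w * f)             # flattened history window
x_next = torch.randn(16, f)               # real X_{t+1}
z = torch.randn(16, noise_dim)
fake = PC(torch.cat([cond, z], dim=1))    # generated X_{t+1}

# First expectation: log D(X_{t+1} | history) on real samples;
# second expectation: log(1 - D(PC(z | history) | history)) on fakes.
d_loss = bce(D(torch.cat([cond, x_next], dim=1)), torch.ones(16, 1)) + \
         bce(D(torch.cat([cond, fake.detach()], dim=1)), torch.zeros(16, 1))
g_loss = bce(D(torch.cat([cond, fake], dim=1)), torch.ones(16, 1))
```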

The proposed framework for converting deterministic model to probabilistic model
Stepping into the realm of multivariate time-series raises additional challenges. In the multivariate setting, we require more complex architectures to capture the dependencies between features and forecast the future with high accuracy. Furthermore, as previously mentioned, GANs require precise hyperparameter tuning to achieve a stable training process. Considering the network complexity required for handling multivariate time-series data, it is very cumbersome, or in some cases impossible, to concurrently find suitable generator and discriminator architectures that perform accurately. To address this problem, we propose a new framework for building a probabilistic forecaster on top of a deterministic forecaster using the GAN architecture.
In this framework, we build the generator based on the architecture and hyperparameters of the deterministic forecaster and train it with an appropriate discriminator. In this fashion, we can perform the tasks of finding appropriate generator and discriminator architectures separately, which simplifies the GAN architecture search process. In other words, by using this framework, we can transform an existing accurate deterministic model into a probabilistic model with increased precision and better alignment with the real world. Figure 1 illustrates the proposed framework as well as the conditional GAN setup for training the ProbCast. First, we build an accurate point forecast model by searching for the optimal architecture of the deterministic model; in the case that a precise point forecast model already exists, we can skip this step and use the existing model. Then, we integrate the noise vector z into the deterministic model's architecture. In our experiments, we obtained the best results when inserting the noise vector into the later layers of the network, letting the earlier layers learn the representation of the input window. Finally, we train this model using adversarial training to acquire our probabilistic forecast model, i.e. the ProbCast.

Train pipeline
With the generator architecture at hand, we only need to search for an appropriate time-series classifier to serve as the discriminator during GAN training. By reducing the search space of the GAN architecture to the discriminator only, we can efficiently find a discriminator structure capable of training a ProbCast with superior performance in comparison to the deterministic model. The following steps summarize the framework:

1. Employ an accurate deterministic model:
   (a) either use an existing model,
   (b) or search for an optimal deterministic forecaster.
2. Structure the generator based on the deterministic model's architecture and incorporate the noise vector into the network, preferably into its later layers.
3. Search for an optimal discriminator structure and train the generator with it.
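Step 2 might look as follows for the GRU-plus-dense deterministic architecture described in this paper; the hidden and noise sizes, and the exact way the dense block is extended, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProbCastSketch(nn.Module):
    """Generator built from the deterministic architecture (GRU block
    followed by dense layers), with the noise vector concatenated to
    the GRU representation, i.e. injected into the later layers."""
    def __init__(self, n_features, hidden=32, noise_dim=8):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features))
        self.noise_dim = noise_dim

    def forward(self, window):                 # window: (B, w, f)
        _, h = self.gru(window)                # h: (layers, B, hidden)
        z = torch.randn(window.size(0), self.noise_dim)
        return self.head(torch.cat([h[-1], z], dim=1))  # (B, f)
```

Because z is redrawn on every call, repeated forward passes on the same window yield different forecasts, i.e. samples from the learned conditional distribution.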

Datasets
We tested our method on two publicly available datasets, namely electricity and exchange-rate. The electricity dataset consists of the electricity consumption of 321 clients in kWh, collected every 15 minutes between 2012 and 2014; it is converted to reflect hourly consumption. The exchange-rate dataset contains the daily exchange rates of eight countries, namely Australia, Britain, Canada, Switzerland, China, Japan, New Zealand, and Singapore, collected between 1990 and 2016.

Setup
In each of our experiments, we first run an architecture search to find an accurate deterministic model, which is trained with MAE as the loss function. The architecture of the deterministic model is shown in Figure 1.
We use a Gated Recurrent Unit (GRU) [2] block to learn the representation of the input window; the representation then passes through two dense layers that map it to the forecast. We adopt the architecture of the most precise deterministic model we found to build the ProbCast, concatenating the noise vector to the output of the GRU block (i.e. the representation) and extending the MLP block as shown in Figure 1. Finally, we search for the optimal architecture of the discriminator (Figure 2) and train the ProbCast. The discriminator concatenates X_{t+1} to the end of the input window, constructing {X_{t+1}, X_t, ..., X_0}, and then utilizes a GRU block followed by two MLP layers to inspect the consistency of this window. We use a genetic algorithm to search for the optimal architecture and implement our method in PyTorch [19].

Fig. 2: The discriminator architecture of our conditional GAN. The number of layers and of cells in the GRU block are hyperparameters.
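A sketch of the discriminator described above; the hidden size and the two-layer MLP widths are illustrative assumptions, since in the paper they are found by the architecture search.

```python
import torch
import torch.nn as nn

class DiscriminatorSketch(nn.Module):
    """Discriminator as in Fig. 2: X_{t+1} is appended to the input
    window and a GRU block followed by two dense layers scores the
    consistency of the extended window."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, window, x_next):         # (B, w, f), (B, f)
        # Build the extended window {X_{t+1}, X_t, ..., X_0}.
        extended = torch.cat([window, x_next.unsqueeze(1)], dim=1)
        _, h = self.gru(extended)
        return self.mlp(h[-1])                 # real/fake logit
```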

Evaluation Metric
To report the performance of the ProbCast, we used the negative form of the Continuous Ranked Probability Score [6] (denoted by CRPS*) as the metric. The CRPS* reflects the sharpness and calibration of a probabilistic method. It is defined as follows:

CRPS*(F, x) = E_F |X - x| - (1/2) E_F |X - X'|, (4)

where X and X' are independent copies of a random variable drawn from the probabilistic forecaster F, and x is the ground truth. The CRPS* provides a direct way to compare deterministic and probabilistic models: in the case of a deterministic forecaster, the CRPS* reduces to the Mean Absolute Error (MAE), a commonly used point-wise error metric. In other words, in a deterministic setting, the CRPS* is equivalent to the MAE:

CRPS*(F, x) = |x^ - x|, (5)

where x is the ground truth and x^ is the point forecast. After the GAN training has concluded, we calculate the CRPS* of the ProbCast and of the deterministic model. To calculate the CRPS* of the ProbCast using Equation 4, we draw 200 samples from it (100 for each random variable). Table 2 presents the optimal hyperparameters found for each dataset using our framework during the experiments, and Table 3 summarizes the results, presenting the CRPS* of the best deterministic model and of the ProbCast for each dataset.
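A sample-based estimate of the CRPS* (in its standard kernel form, E_F|X - x| - (1/2) E_F|X - X'|) from two groups of forecast samples can be sketched as follows; the function name and the element-wise pairing of the two groups are implementation choices not specified in the paper.

```python
import numpy as np

def crps_star(samples_a, samples_b, x):
    """Estimate E_F|X - x| - 0.5 * E_F|X - X'| from two independent,
    equally sized groups of forecast samples. Lower is better; for a
    point forecast it reduces to the absolute error |x_hat - x|."""
    a, b = np.asarray(samples_a), np.asarray(samples_b)
    term1 = np.abs(np.concatenate([a, b]) - x).mean()  # E_F|X - x|
    term2 = np.abs(a - b).mean()                       # E_F|X - X'|
    return term1 - 0.5 * term2
```

For a degenerate (point) forecaster all samples coincide, term2 vanishes, and the score equals the absolute error, matching the MAE reduction above.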

Results and Discussion
In the experiment on the electricity dataset, the ProbCast is more accurate than the deterministic model despite having an almost identical structure. Furthermore, this experiment shows that our model can provide precise forecasts for multivariate time-series even when the number of features is substantial. In the exchange-rate experiment, the ProbCast again outperforms its deterministic predecessor despite the structural similarities. We also observe that our method works well even though this dataset is considerably smaller than the one in the previous experiment.
Furthermore, it confirms that our framework is capable of transforming a deterministic model into a probabilistic model that is more accurate than its predecessor. The question now arises: considering the sensitivity of GANs to the architecture of their components, why does employing the deterministic model architecture to define the ProbCast work well, even though it is borrowed from a totally

Conclusion and Future works
In this paper, we present ProbCast, a probabilistic model for one-step-ahead forecasting of multivariate time-series. We exploit the ability of conditional GANs to learn conditional probability distributions in order to model the distribution of future values given past values, i.e. P(X_{t+1} | X_t, ..., X_0). Furthermore, we propose a framework for efficiently finding the optimal architectures of the GAN's components. This framework builds the probabilistic model upon a deterministic model to improve its performance; hence, it enables us to search for the optimal architectures of the generator and the discriminator separately. Furthermore, it can transform an existing deterministic model into a probabilistic model with increased precision and better alignment with the real world.
We assess the performance of our method on two publicly available datasets. The exchange-rate dataset is small with few features, while the electricity dataset is larger with a considerably greater number of features. We compare the performance of the ProbCast with its deterministic equivalent; in both experiments, our method outperforms its counterpart. The results demonstrate that the ProbCast can learn patterns precisely from a small dataset and, at the same time, is capable of capturing the dependencies between a large number of features and forecasting future values accurately when a big dataset is available. Furthermore, the results indicate the successful application of our framework, which paves the way for a systematic and straightforward approach to replacing currently used deterministic models with probabilistic models to improve accuracy and obtain realistic forecasts.
The promising results of our experiments signify a great potential for probabilistic forecasting with GANs and suggest many new frontiers for pushing the research in this direction. For instance, we employ a vanilla GAN in our research, while many modifications for improving GANs have been suggested in recent years. One possible direction is to apply these modifications and inspect the resulting improvement in the performance of the ProbCast. Another direction is to experiment with more sophisticated architectures for the generator and discriminator. Finally, we only use knowledge from the deterministic model to shape the generator. It would be interesting to push this direction further and incorporate more knowledge from the deterministic model into the GAN training process to improve and optimize the probabilistic model.