The Generative Adversarial Approach: A Cautionary Tale of Finite Samples

Escobar-Anel, Marcos; Jiao, Yiyao

doi:10.3390/a18090564

Open AccessArticle

The Generative Adversarial Approach: A Cautionary Tale of Finite Samples

by

Marcos Escobar-Anel

^*

and

Yiyao Jiao

Department of Statistical and Actuarial Sciences, University of Western Ontario, London, ON N6A 5B7, Canada

^*

Author to whom correspondence should be addressed.

Algorithms 2025, 18(9), 564; https://doi.org/10.3390/a18090564

Submission received: 16 July 2025 / Revised: 28 August 2025 / Accepted: 3 September 2025 / Published: 5 September 2025

(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Download

Browse Figures

Versions Notes

Abstract

Given the relevance and wide use of the Generative Adversarial (GA) methodology, this paper focuses on finite samples to better understand its benefits and pitfalls. We focus on its finite-sample properties from both statistical and numerical perspectives. We set up a simple and ideal “controlled experiment” where the input data are an i.i.d. Gaussian series where the mean is to be learned, and the discriminant and generator are in the same distributional family, not a neural network (NN), as in the popular GAN. We show that, even with the ideal discriminant, the classical GA methodology delivers a biased estimator while producing multiple local optima, confusing numerical methods. The situation worsens when the discriminator is in the correct parametric family but is not the oracle, leading to the absence of a saddle point. To improve the quality of the estimators within the GA method, we propose an alternative loss function, the alternative GA method, that leads to a unique saddle point with better statistical properties. Our findings are intended to start a conversation on the potential pitfalls of GA and GAN methods. In this spirit, the ideas presented here should be explored in other distributional cases and will be extended to the actual use of an NN for discriminators and generators.

Keywords:

generative adversarial estimator; maximum likelihood estimation; finite sample properties; oracle discriminator

1. Introduction

This work studies the Generative Adversarial (GA) methodology, herein referred to as the “classical GA” method, from a finite-sample estimator perspective. This methodology was popularized in the context of GANs (Generative Adversarial Networks) by the seminal work of [1]. We derive theoretical results and reveal pitfalls using a simple example, revealing a better understanding of the benefits and inaccuracies of the methodology. In particular, we study the impact of the sample size from the original data (T) on the statistical and numerical quality of the estimators implied by the classical GA method; we assume a single sample is observed (

K = 1

), and we also study the impact of the number of generated samples (

K_{G}

), assuming the size of each generated sample (

T_{G}

) is the size of the original data (

T_{G} = T

). Statistical quality refers to the estimator’s bias and variance, while numerical quality entails the existence and uniqueness of first-order conditions (FOC) and a saddle point for the corresponding game, i.e., the optimization problem. The key motivation for the finite-sample analysis comes from applications in economics and finance. There is a vast amount of literature on GAN uses in these areas; see [2,3,4] for financial time series generation, refs. [5,6] for advanced GAN methodologies, and [7] for an overview of applications in finance. The literature focuses mostly on large-sample analysis, even though there are many instances where data are in short supply in finance, for instance, a lack of liquidity in traded stocks, especially during a crisis; qualitative changes in prices due to changes in market conditions, leading to changes in parameters, i.e., a different model; and a low frequency of the needed data, e.g., monthly or quarterly data. Last but not least, access to databases can be very expensive in finance, stopping individuals and companies from conducting more in-depth analysis.

We tackle this very general problem of finite-sample implications by focusing on a special, well-known case in statistics. That is, we set up a “control experiment” where we know the true model for the original data, assumed to be an i.i.d. Gaussian series of size T, where the variance is known but the mean needs to be learned using the GA methodology. To benefit the GA methodology, in terms of avoiding the need for approximations, we also assume the ideal situation where the discriminator and the generator are in the same distribution family as the true model. This contrasts with the classical GA neural network (NN) setting (GAN), where both are assumed to be NNs, and hence form an approximation of the reality implied by the true model. The rationale behind avoiding an NN in this study is that we aim to detect pitfalls in ideal conditions (i.e., Gaussian i.i.d. data, with the discriminator and generator in the same Gaussian family), setting the stage for the need for follow-up analyses in a non-ideal NN setting.

In this simple context, we produce

K_{G}

samples each of size

T_{G} = T

by solving the classical GA optimization in closed form for two cases. The first case assumes a perfect discriminator, the so-called oracle discriminator (see [8]). This means the discriminator is of the right functional form and has the true value of the parameter, the mean. In the second case, the discriminator has the correct functional form, i.e., the same distribution family as the true model (named family-oracle) up to the unknown parameter. In both cases, the generator is in the same family as the original data. For the oracle discriminator case, when assuming infinitely many samples of size T from the original data and of size

T_{G}

from the generator (

K, K_{G}

\to \infty

), we demonstrate a unique saddle point for the classical GA method, therefore confirming the underlying game. In this case, the optimal solution is precisely the true distribution. This can be interpreted as an asymptotic analysis, as it uses expected values (see [8] for a similar interpretation). On the other hand, we could not prove a saddle point for the family-oracle case, although, numerically, we could detect the saddle point. These findings confirm the excellent asymptotic large-sample properties of the methodology.

We then shift the focus to the main objective of the paper: the finite-sample analysis. That is, we only have one finite sample of size T from the original data, and we generate

K_{G}

samples of finite size

T_{G} = T

, where

K_{G}

can go to ∞. This is the most practical situation in many areas of application, especially in finance or economics. As before, we first study the case of the oracle discriminator. Here, we obtain a closed-form representation for first-order condition (FOC) candidates. We then rely on a numerical analysis to reveal the shortcomings of the optimal as a function of

K_{G}

and T. Particularly, the possibility of multiple local minima could make the numerical search for the optimal solution very unstable.

We then study the family-oracle discriminator case. This setting leads to two estimators: one for the parameter (mean) of the discriminant, and one for the parameter (mean) of the generator. We also obtain a closed-form representation for first-order condition (FOC) candidates. Numerically, the results of this setting reveal several pitfalls. First, there is an absence of saddle points in some situations, and hence a failure of the game interpretation for the methodology. We also detect a significant bias for the estimator of the generator, which, together with a small variance, could lead practitioners away from the true value of the parameter.

Given these problems with the classical GA estimator, we propose an alternative GA game estimator, that is, an alternative loss function. This alternative GA estimator can deliver better optimal estimators for the discriminant and generator, e.g., a unique saddle point, unbiased and with low variance, on finite samples. We show numerically that this alternative approach is as precise as maximum likelihood estimation. There are many other loss functions, and therefore GAN games, proposed in the literature; for instance, see [9] for a Geometric GAN, ref. [10] for a Least-Squares GAN, and [11] for a Wasserstein GAN. A comparison of our proposal and these previous cases will be the topic of future research.

Our study has some connection with the work of [12], where the authors also entertain the application of a GAN and WGAN in a simple Gaussian case, but they have no objective regarding finite-sample properties. On the other hand, ref. [13] delves into the finite-sample properties of the GAN estimator for the autoregressive parameter of a linear dynamic fixed-effects panel data model with Gaussian errors, reporting the best results for the GAN over the MLE. They rely on an approximation for the estimator, and do not explore the absence of saddle points. More recently, ref. [8] is the most similar paper to our work. The authors study the “adversarial estimator” (AE) as the estimator of the generator in a classical GAN game, and they consider the case of a perfect discriminator as the “oracle discriminator”. On pages 2044–2045, they briefly talk about the location-Gaussian example. They do not notice or acknowledge that the first-order condition (FOC) in a GAN game might have multiple extremes. They discuss other choices of discriminators resulting in other types of estimators (e.g., the simulated method of moments, etc.). Still, they do not entertain other loss functions, which is another advantage of our analysis. Very interestingly, they also study the properties of the AE when using a neural network for the discriminator (i.e., not the oracle or family-oracle discriminator), which we plan to tackle in future research. Other works have used GA methods for training Gaussian mixtures, e.g., ref. [14], explaining low-dimensional problems [15], and, in the search for bounds on errors with non-linear objective functions, [16]; however, the issue of finite-sample implications (on the training and generated data) has not been sufficiently addressed.

For clarity, the setting, main results, and contributions of this work are listed next:

We identify and study six GA method-related problems for a simple model of an i.i.d. Gaussian series with no NN approximation of the generator or discriminant. In all cases, the generator is assumed to be in the same distribution family as the true data-generating process.
The first two problems tackle the classical GA method, assuming infinitely many samples of finite sample size: $K, K_{G} \to \infty$ , and $T_{G} = T \leq \infty$ . The first uses the oracle discriminator (see Proposition 1), and the second uses the family-oracle discriminator (see Proposition 2). The former shows the uniqueness of the solution, and the latter shows a unique saddle point, as prescribed by [1].
The next two problems assume a finite number of samples ( $K = 1, K_{G} < \infty$ ). The first problem uses the oracle discriminator (see Proposition 4), and the second uses the family-oracle (see Proposition 5). The numerical analyses reveal several pitfalls: the possibility of multiple solutions to the FOC, the potential absence of saddle points, and biases for the generator and the discriminator within the family-oracle case.
We propose a new loss function for a GA method that delivers unique saddle points for $K, K_{G} \to \infty$ (see Proposition 3) and for $K = 1, K_{G} < \infty$ (see Proposition 6). Moreover, the new implied GA estimators for the discriminant and generator have better properties in terms of bias and variance than the corresponding classical GA estimators.
We conclude that, given the pitfalls in such a simple Gaussian case, more complex parametric cases or discriminators/generators outside the correct family, like neural networks, will likely perform worse in finite samples.

The paper is organized as follows: Section 2 introduces the notation and the various problems of interest, and it solves the cases of asymptotic games. Section 3 delivers the main results of the paper, as it tackles the finite-sample properties of the GA estimators associated with various choices of discriminators and loss functions. Section 4 reports numerical results confirming the benefits and pitfalls of the various GA approaches. Section 5 concludes, while Appendix A includes all the proofs and notation.

2. Problem Setting and Asymptotic Solutions

We start by explaining the ideas of our experiment before setting the mathematical details. We assume the true model or true data process is an i.i.d. sequence of Gaussian random variables with mean

μ^{M}

and variance

σ

. We want to use the GA methodology to capture this true Gaussian distribution, which means we want the GA method to be able to generate new samples from the true Gaussian distribution.

For simplicity, we design the GA methodology such that both the discriminator and the generator have the advantage of knowing the functional form of the true data process, i.e., the i.i.d. Gaussian series. We call them the family-oracle discriminator and generator. If the discriminator were to know not only the functional form (Gaussian), but also the true value of the parameters, then it would be called the oracle discriminator (see [8] for similar language). As a byproduct, this means that we avoid using neural networks as an approximation to the discriminant or generator, clearly benefiting the GA methodology. We assume that both the discriminator and the generator know the true volatility,

σ

, of the Gaussian series, but they might not know the mean. Therefore, effectively, the GA methodology here aims to learn the true mean of the original Gaussian data.

Let us denote the joint density function of an independent and identically distributed (i.i.d.) series

Y = Y_{1 : T} = \{Y_{1}, \dots, Y_{T}\}

with parameter of interest

μ^{Y}

as follows:

f_{Y} (Y; μ^{Y}) = \frac{1}{{(2 π)}^{T / 2} {(σ)}^{T}} exp (- \sum_{t = 1}^{T} \frac{{(Y_{t} - μ^{Y})}^{2}}{2 σ^{2}}) .

We assume the true data process can be expressed by

R_{t} = μ^{M} + σ Z_{t},

where the parameter

μ^{M}

is potentially unknown,

σ

is known, and

Z_{t}

is a standard Gaussian i.i.d. sequence. Let us denote a random sample of size T as

R_{1 : T} = \{R_{1}, \dots, R_{T}\}

. The joint density function would be denoted as

f_{M}

. The discriminator is a function of data

X_{1 : T} = \{X_{1}, \dots, X_{T}\}

with parameter

μ^{D}

, generically

D (X_{1 : T}; μ^{D})

. The generator chosen for our problem is in the same distribution family as the true data process, albeit with an unknown mean parameter

μ^{G}

:

R_{t}^{G} = μ^{G} + σ Z_{t}^{G},

where

σ

is given and

Z_{t}^{G}

is a standard Gaussian i.i.d. sequence. The generator density function is denoted as

f_{G}

. This means the generator uses the true volatility, but it does not know the mean. Let us denote a random sample from the generator of size

T_{G}

as

R_{1 : T_{G}}^{G, i} = \{R_{1}^{G, i}, \dots, R_{T_{G}}^{G, i}\}

for

i = 1, \dots, K_{G}

samples.

The “asymptotic” or “population” loss function can be written generically as follows:

L (μ^{D}, μ^{G}) = E_{M} [g_{1} (D (R_{1 : T}; μ^{D}))] + E_{G} [g_{2} (D (R_{1 : T_{G}}^{G}; μ^{D}))],

where

g_{1}, g_{2}

are functions, and

E_{M} [.]

,

E_{G} [.]

is the notation for the expected value compared to the true measure/density and the generator measure/density, respectively. In many application fields, like finance, we can observe only one sample (

K = 1

) of size T of real data,

R_{1 : T}

, and we can create several samples of size

T_{G}

with the generator,

R_{1 : T}^{G, (i)}

i = 1, \dots, K_{G}

; therefore, the use of expectations is misleading. Instead, the actual loss function is computed using empirical averages as approximations of the expected values:

\begin{matrix} E_{M} [g_{1} (D (R_{1 : T}; μ^{D}))] & \approx & g_{1} (D (R_{1 : T}; μ^{D})), \\ E_{G} [g_{2} (D (R_{1 : T_{G}}^{G}; μ^{D}))] & \approx & \frac{1}{K_{G}} \sum_{i = 1}^{K_{G}} g_{2} (D (R_{1 : T_{G}}^{G, (i)}; μ^{D})) . \end{matrix}

Therefore, the “finite sample” loss function would be

L^{(E)} (μ^{D}, μ^{G}) = g_{1} (D (R_{1 : T}; μ^{D})) + \frac{1}{K_{G}} \sum_{i = 1}^{K_{G}} g_{2} (D (R_{1 : T_{G}}^{G, (i)}; μ^{D})) .

Working with the previous loss function would allow us to study the properties of GA estimators (i.e., optimal parameter estimates coming from a sample).

The generic “asymptotic” GA game problem is defined as follows:

min_{μ^{G}} \{max_{μ^{D}} \{L (μ^{D}, μ^{G})\}\} .

In case of the existence of a unique solution, we can write it as follows:

({(μ^{G})}^{*}, {(μ^{D})}^{*}) = \underset{μ^{G}}{arg min} \{\underset{μ^{D}}{arg max} \{L (μ^{D}, μ^{G})\}\} .

(1)

Similarly, for a finite sample, we write

min_{μ^{G}} \{max_{μ^{D}} \{L^{(E)} (μ^{D}, μ^{G})\}\} .

If there is a unique solution, we can write it as follows:

({\hat{μ}}^{G}, {\hat{μ}}^{D}) = \underset{μ^{G}}{arg min} \{\underset{μ^{D}}{arg max} \{L^{(E)} (μ^{D}, μ^{G})\}\} .

(2)

Intuitively, we want the discriminant to learn the distribution of true data and the generator to generate from the true data. Therefore, from the asymptotic side, we aim at

\begin{matrix} ({(μ^{D})}^{*}, {(μ^{G})}^{*}) & = & (μ^{M}, μ^{M}) . \end{matrix}

From the finite-sample side, we want the estimators

{\hat{μ}}^{D}

and

{\hat{μ}}^{G}

to satisfy standard statistical concepts like being unbiased and having small variance in capturing

μ^{M}

.

Why is the word ‘game’ used in the problems above? Generally speaking, at its root, any min–max problem regarding the function of two variables can be interpreted as a game as long as it has a unique saddle point. The saddle point is the unique extreme value that fulfills the minimization in terms of

μ_{G}

, as well as the maximization in terms of

μ_{D}

. In this case, each player controls one of the variables, i.e.,

μ^{G}

or

μ^{D}

. If the function has a unique saddle point solution, then we have a game and it is well-defined (i.e., the max–min produces the same results as the min–max) (see [17], Section 1.7, or [18], Section 5.4.2). The precise interpretation of the game depends on the choice of the loss function and variables.

Next, we describe the classical GA choice for the loss function with two discriminators (Section 2.1 and Section 2.2), as well as our proposal for the discriminant and loss function (Section 2.3). We also report solutions for the asymptotic (population) case.

2.1. Classical GA Oracle Discriminator

The choice of an “asymptotic” loss function for the classical GA method in [1] is

g_{1} (x) = ln (x)

,

g_{2} (x) = ln (1 - x)

, leading to

L^{(C)} (D, μ^{G}) = E_{M} [ln (D (R_{1 : T}))] + E_{G} [ln (1 - D (R_{1 : T_{G}}^{G}))] .

(3)

This is motivated by a game (min–max) between a discriminator, captured by the function D, and a generator, captured by the parameter

μ^{G}

. It is essential to notice that, in this original setting, the discriminator is assumed to be quite flexible, and, hence, there is no specification of a functional family or parametrization, i.e., no need for

μ^{D}

, as in our previous notation. The reason for this is that the authors aim to find the best possible discriminator, the so-called oracle discriminator.

The corresponding classical GA game problem would be

({(μ^{G})}^{*}, {(D)}^{*}) = \underset{μ^{G}}{arg min} \{\underset{D}{arg max} \{L^{(C)} (D, μ^{G})\}\},

where the function

{(D)}^{*}

is called the oracle discriminator (see [8]) and it has been found to have the following representation (see [1]):

D^{*} (x) = \frac{f_{M} (x)}{f_{M} (x) + f_{G} (x)} .

(4)

Therefore, the corresponding classical GA problem becomes

{(μ^{G})}^{*} = \underset{μ^{G}}{arg min} L^{(C)} (D^{*}, μ^{G}) .

(5)

The following proposition reveals that the previous problem is well-defined with a closed-form solution.

Proposition 1.

Assume the loss function as per Equation (3). Then, Equation (5) has a unique minimum of the form

{(μ^{G})}^{*} = μ^{M} .

Proof.

See Appendix A. □

2.2. Classical GA Family-Oracle Discriminator

Given the fact that, in a real situation,

f_{M} (x)

is unknown, we explore a specific choice of parametric discriminant with parameter

μ^{D}

, that is,

\begin{matrix} D (x; μ^{D}) & = & \frac{f_{D} (x)}{f_{D} (x) + f_{G} (x)} \\ = & \frac{1}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{x})}, \end{matrix}

(6)

where

\bar{x} = \sum_{t = 1}^{T} \frac{x_{t}}{T}

. Note that this discriminator can be written as

1 / (1 + \frac{f_{G}}{f_{D}})

, where

\frac{f_{G}}{f_{D}}

is a ratio of Gaussian densities, leading to an exponential quadratic on the parameters. This choice of discriminator is in the same functional family as the oracle discriminator. This family is characterized by the parameter

μ^{D}

; hence, if

μ^{D} = μ^{M}

, we recover the oracle discriminator.

Note that the common choice for the discriminator in the literature is a neural network (NN),

D (x; μ^{D})

. Hence, by construction, it will not be in the same functional family as the oracle. This means practitioners can only approximate the oracle, leading to inaccuracies. This potential problem will be explored in future research.

The “asymptotic” loss function and classical GA game would be, respectively,

\begin{matrix} L^{(C)} (μ^{D}, μ^{G}) = E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D}))], \end{matrix}

(7)

\begin{matrix} ({(μ^{G})}^{*}, {(μ^{D})}^{*}) = \underset{μ^{G}}{arg min} \{\underset{μ^{D}}{arg max} \{L (μ^{D}, μ^{G})\}\} . \end{matrix}

(8)

The following proposition shows that one can search for the best discriminator parametrically within the functional class defined above, leading to similar results as using the oracle discriminator. This is an intuitive and obvious result, as it confirms that, if users know the functional form of the data, then there is no need to use neural networks.

Proposition 2.

Assume the loss function as per Equation (7). Then, a candidate saddle point for Equation (8) is

\begin{matrix} {(μ^{D})}^{*} & = & μ^{M}, \\ {(μ^{G})}^{*} & = & {(μ^{D})}^{*} . \end{matrix}

Proof.

See Appendix A. □

2.3. Alternative Proposal for the Loss Function and Discriminator

In this section, we propose an alternative choice of loss function and discriminator for the application at hand. (The idea can be extended to any parametric density or neural network.) In terms of a discriminator, we define it as the joint likelihood of a sample:

\begin{matrix} D (X_{1 : T}; μ^{D}) & = f_{D} (X_{1 : T}; μ^{D}) = \prod_{t = 1}^{T} f_{D} (X_{t}; μ^{D}) \\ = \frac{1}{{(2 π)}^{T / 2} {(σ)}^{T}} exp (- \sum_{t = 1}^{T} \frac{{(X_{t} - μ^{D})}^{2}}{2 σ^{2}}) . \end{matrix}

(9)

This is the likelihood structure of a Gaussian model with the mean and volatility

(μ^{D}, σ)

.

An alternative loss function is proposed as follows:

L^{(A)} (μ^{D}, μ^{G}) = E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [- λ ln (D (R_{1 : T_{G}}^{G}; μ^{D}))] .

(10)

This is basically the expected log-likelihood on the real data, minus the expected log-likelihood on the generated data, weighted by a parameter

λ

. Therefore, in the game to be defined next, we want to find the parameter

μ^{D}

that maximizes the likelihood on the true data (so that the discriminator learns the true model), and we want to find the parameter

μ^{G}

that maximizes the likelihood that the generated data come from the distribution learned by the discriminator (hence from the true data too).

The GA game problem would be

({(μ^{G})}^{*}, {(μ^{D})}^{*}) = \underset{μ^{G}}{arg min} \{\underset{μ^{D}}{arg max} \{L^{(A)} (μ^{D}, μ^{G})\}\} .

(11)

The proposition next confirms the good properties of the game and its solution.

Proposition 3.

Assume the discriminant and loss function as per Equations (9) and (10), and

0 < λ < 1

. Then, Equation (11) has a unique solution with a saddle point:

\begin{matrix} {(μ^{D})}^{*} & = & μ^{M}, \\ {(μ^{G})}^{*} & = & {(μ^{D})}^{*} . \end{matrix}

Proof.

See Appendix A. □

If

λ = 1

, then the solution is not a saddle point, and, therefore, it cannot be interpreted as a game. On the other hand, it can still be seen as an approach with an estimator and a generator. Furthermore, if

λ > 1

, we would produce a max–min rather than a min–max. In summary, and as desired, the discriminant must learn the true mean, and the generator must also learn the true mean. This confirms the validity, for long samples, of the new choices of discriminant and loss function.

3. Theoretical Results for a Finite Sample

Here, we study the classical GA method and the alternative GA method in terms of their implied estimators and their statistical properties. We have three cases to explore: two cases for the classical GA method (i.e., an oracle discriminator and a parametric discriminator), and one case for the alternative GA method.

3.1. Classical GA Approach

In this section, we focus on the loss function, Equation (3), and the two viable choices of discriminant: the oracle from Equation (4) and the family-oracle in Equation (6).

3.1.1. Finite Sample with Oracle Discriminator

In this setting, the loss function becomes

L^{(E, C)} = ln (D^{*} (R_{1 : T})) + \frac{1}{K_{G}} \sum_{i = 1}^{K_{G}} ln (1 - D^{*} (R_{1 : T_{G}}^{G, (i)})) .

(12)

Recall that the oracle discriminant has the following representation:

\begin{matrix} D^{*} (X_{1 : T}) & = & \frac{1}{1 - exp (({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{M} - μ^{G}) \frac{T}{σ^{2}} \bar{X})} . \end{matrix}

(13)

As the maximization is already accounted for with the choice of discriminant, the optimization problem would be

\underset{μ^{G}}{arg min} \{L^{(E)}\} .

(14)

Proposition 4.

Assume the discriminant and loss function as per Equations (12) and (13). Then, the candidate solution(s) to Equation (14) solves the equation:

{\hat{μ}}^{G} = \bar{R} - h_{5} (R^{G}, μ^{M}, {\hat{μ}}^{G}) .

(15)

Moreover, if K goes to infinity, then

{\hat{μ}}^{G}

solves

({\hat{μ}}^{G} - μ^{M}) \frac{E_{\bar{G}} [{(\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}})}^{2}]}{1 - D (\bar{R})} + {\hat{μ}}^{G} = \bar{R} .

As a result,

{\hat{μ}}^{G} \neq μ^{M}

, unless

\bar{R} = μ^{M}

. Also,

{\hat{μ}}^{G} \neq \bar{R}

unless

\bar{R} = μ^{M}

. When

\bar{R} = μ^{M}

, the unique minimum is

{\hat{μ}}^{G} = μ^{M}

.

Proof.

See Appendix A. □

Note that, in the case of an infinite K, since

μ^{G} < μ^{M} \Rightarrow \bar{R} < μ^{G}

and

μ^{G} > μ^{M} \Rightarrow \bar{R} > μ^{G}

, we find that one of the following is true: (i)

\bar{R} < {\hat{μ}}^{G} < μ^{M}

; (ii)

\bar{R} > {\hat{μ}}^{G} > μ^{M}

; or (iii)

\bar{R} = {\hat{μ}}^{G} = μ^{M}

. Thus, the infinite-sample classical oracle GA estimator is between the true mean

μ^{M}

and the MLE

\bar{R}

.

It can be shown, for

K = 1

and specific choices of

\bar{R}

and

{\bar{R}}^{G}

, that Equation (15) has multiple solutions. The existence of multiple candidates means that there are many local optima or non-unique global optima. Given the complexity of the expressions, we study these solutions in more detail in the numerical section. Next, we give the key pitfall of the approach in this section.

Pitfall 1.

The potentially many solutions for Equation (15) are an indication of the potentially ill-posedness of the classical GA method for finite samples. This is the case even in the ideal situation where the discriminant is chosen perfectly and the generator is in the same family as the true data process.

It is important to realize that, given that we use all the data available and no approximation on the form of discriminant/generator, the limitation explained above is not due to a wrong training methodology or a bad choice of neural networks. The limitation is purely due to having a finite sample. It can only be “fixed” by increasing the number of data.

3.1.2. Finite Sample with the Family-Oracle Discriminator

In this setting, the loss function becomes

L^{(E, C)} = ln (D (R_{1 : T}; μ^{D})) + \frac{1}{K_{G}} \sum_{i = 1}^{K_{G}} ln (1 - D (R_{1 : T_{G}}^{G, (i)}; μ^{D})),

and the optimization problem would be

\underset{μ^{G}}{arg min} \{\underset{μ^{D}}{arg max} \{L^{(E)}\}\} .

The solution is presented next.

Proposition 5.

Assume the discriminant and loss function as per the previous equations. Then, the candidate solution(s) solves the following system of equations:

\begin{matrix} \begin{matrix} {\hat{μ}}^{D} & = {\hat{μ}}^{G}, \\ {\hat{μ}}^{G} & = \bar{R} - h_{3} (R, R^{G}, {\hat{μ}}^{G}, {\hat{μ}}^{G}) σ {\bar{Z}}^{G} . \end{matrix} \end{matrix}

(16)

Proof.

See Appendix A. □

The system of Equation (16) may have non-unique solutions, or even no solutions, with the consequential failure of saddle points. This will be explored in the numerical section. The two key pitfalls of this methodology are as follows:

Pitfall 2.

The potentially many solutions for Equation (16) are an indication of the potentially ill-posedness of the classical GA method for finite samples. This is the case even in the ideal situation where the discriminant and the generator are in the same family as the true data-generating process.

Pitfall 3.

We will show, numerically, that the estimator(s) implied by the previous equations is biased; that is,

E [{\hat{μ}}^{D}] \neq μ^{M}

,

E [{\hat{μ}}^{G}] \neq μ^{M}

.

As explained for Pitfall 1, Pitfall 2 is not due to a wrong training methodology or a bad choice of neural networks; it can only be “fixed” by increasing the number of data. Pitfall 3 provides a statistical way of measuring the size of the error in this “control experiment”. Given the complexity of Equation (16), we cannot obtain a closed-from representation for the size of the bias,

E [{\hat{μ}}^{G}] - μ^{M} = E [{\hat{μ}}^{D}] - μ^{M}

.

3.2. Alternative GA Proposal

The new loss function and discriminant lead to the expression

L^{(E, A)} = ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K_{G}} \sum_{i = 1}^{K_{G}} ln (D (R_{1 : T_{G}}^{G, (i)}; μ^{D})),

and the optimization problem is

\underset{μ^{G}}{arg min} \{\underset{μ^{D}}{arg max} \{L^{(E)}\}\} .

The proposition next presents the main results for this new setting.

Proposition 6.

Assume the discriminant and loss function as per Equations (9) and (10) with

0 < λ < 1

. Then, Equation (2) has a unique solution, a saddle point with

\begin{matrix} {\hat{μ}}^{D} & = & \frac{1}{T} \sum_{t = 1}^{T} R_{t}, \end{matrix}

(17)

\begin{matrix} {\hat{μ}}^{G} & = & {\hat{μ}}^{D} - \frac{σ}{T K_{G}} \sum_{i = 1}^{K_{G}} \sum_{t = 1}^{T} Z_{t}^{G, (i)} . \end{matrix}

(18)

These estimators are unbiased and consistent. Moreover, if

K_{G}

\to \infty

,

\begin{matrix} {\hat{μ}}^{G} = {\hat{μ}}^{D} = \bar{R} . \end{matrix}

Proof.

See Appendix A. □

These expressions are quite intuitive, as the estimate for

{\hat{μ}}^{D}

is what one would produce via the MLE; hence, it converges to

μ^{M}

, and the estimate for

{\hat{μ}}^{G}

converges fairly quickly (in

T_{G}

and in

K_{G}

) to

{\hat{μ}}^{D}

and therefore to

μ^{M}

. These estimates are unbiased and consistent for the targeted parameter

μ^{M}

.

This also highlights the importance of simulating either a large number

K_{G}

of samples or a long sample

T_{G}

from the generator to better capture the true parameter.

4. Numerical Analysis

In this section, we implement all three finite-sample solutions proposed in Propositions 4–6. Each implementation uses parameter settings relevant to financial practitioners. Specifically, we assume a daily expected return of

μ^{M} = \frac{0.10}{252}

and a daily volatility of

σ = \frac{0.25}{\sqrt{252}}

, consistent with typical financial stock behavior.

We consider investors who observe a one-year sample of returns (

T = 252

trading days) and aim to generate additional samples of 252 days using the GA framework. To assess the performance of the estimators, we evaluate two scenarios based on the number of generated samples:

Scenario A: A finite number of generated samples ( $K_{G} = 100$ );
Scenario B: An infinite number of generated samples ( $K_{G} = \infty$ ).

The objectives and key findings in this section are outlined as follows: In Section 4.1, we apply the classical oracle GA estimator (Proposition 4) in both scenarios, A and B. When

K_{G} \neq \infty

, the resulting loss function becomes non-convex, potentially leading to multiple solutions. Section 4.2 analyzes the classical family-oracle GA estimator (Proposition 5) and highlights its occasional failure to identify saddle points when

K_{G} \neq \infty

. In Section 4.3, we investigate the alternative GA method (Proposition 6) in both scenarios. The results demonstrate the existence of a unique saddle point that offers a stable solution. Section 4.4 presents density plots that compare all three finite-sample estimators with the MLE. Their means and standard deviations are also reported for a comprehensive performance comparison, demonstrating the bias of the classical GA approach.

The numerical minimization and maximization of the loss function in our finite-sample analysis employ the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm. This quasi-Newton method offers greater robustness than other first-order optimizers such as the Adam algorithm—particularly when the Hessian matrix is nearly singular and the model is small. For the case

K_{G} = \infty

, wherein expectations of the loss function involve integration, we use the QAGI routine from the QUADPACK FORTRAN library [19] to perform the numerical quadrature.

4.1. Classical Oracle GA Estimator

The shape of the loss function associated with the classical oracle GA estimator is fundamental to understanding its statistical properties. To illustrate this, we generate loss function plots by sampling values of

\bar{R}

and

{({\bar{R}}^{G, (i)})}_{i = 1, 2, \dots, K_{G}}

for Scenario A, and only

\bar{R}

for Scenario B. Figure 1a–c show the loss functions of the classical oracle GA estimator under various combinations of

\bar{R}

and

{({\bar{R}}^{G, (i)})}_{i = 1, 2, \dots, K_{G}}

, while Figure 1d displays the loss function for Scenario B, based solely on

\bar{R}

. In each subplot of Figure 1, both the true mean

μ^{M}

and the MLE of the corresponding

\bar{R}

are marked for direct comparison with the classical oracle GA estimator.

Despite having access to

μ^{M}

, the loss function of the classical oracle GA estimator can sometimes produce estimates worse than those of the MLE. In Figure 1a, the classical oracle GA estimator deviates significantly from the true mean, offering no advantage over the MLE, even when the true mean is explicitly known.

Figure 1b,c illustrate cases where the loss function defined in Equation (12) exhibits multiple minima. In particular, non-convexity in the loss function poses significant challenges for optimization, such as convergence to local minima or divergence to infinity. The severity of these issues depends on the degree of non-convexity and may be partially addressed through methods like random restarts. Nevertheless, these complications increase the computational effort required to obtain reliable estimates. Moreover, as seen in Figure 1a, even when the global minimum is successfully located—as in Figure 1b—it may still be farther from the true mean than the maximum likelihood estimator (MLE), raising concerns about the statistical efficiency of the classical oracle GA estimator.

Additionally, for certain combinations of

\bar{R}

and

{({\bar{R}}^{G, (i)})}_{i = 1, 2, \dots, K_{G}}

, the loss function may have multiple global minima, as depicted in Figure 1c. In such cases, the classical oracle GA estimator admits more than one equally optimal solution, each yielding the same loss value. This inherent ambiguity cannot be resolved through numerical optimization techniques alone and may lead to confusion, inconsistent interpretations, or suboptimal decisions, ultimately compromising the robustness of the estimation process.

The loss function demonstrates more favorable properties in Scenario B. As shown in Figure 1d, it consistently exhibits a single minimum. Moreover, this minimum is between the true mean and the MLE, as established in Proposition 4. Consequently, in Scenario B, the classical oracle GA estimator is always closer to the true mean than the MLE, highlighting its improved accuracy and robustness in this setting.

4.2. Classical Family-Oracle GA Estimator

Following the methodology of the previous section, we analyze the loss function of the classical family-oracle GA estimator. Figure 2a presents a visualization analogous to Figure 1a, demonstrating that the GA estimator can, in some instances, perform worse than the MLE by deviating further from the true mean. In this figure, the point

({\hat{μ}}^{G}, {\hat{μ}}^{D})

represents the estimates produced by the classical family-oracle GA estimator where

{\hat{μ}}^{G} = {\hat{μ}}^{D}

. The point

(μ^{M}, μ^{M})

denotes the true mean, while

(MLE, MLE)

represents the MLE for the true data and generator, i.e.,

{\bar{R}}^{G}

and

\bar{R}

. It is evident that

({\hat{μ}}^{G}, {\hat{μ}}^{D})

lies farther from

(MLE, μ^{M})

than

(MLE, MLE)

, indicating that, in certain samples, the classical GA estimator may yield less accurate estimates than the MLE.

In contrast, the case illustrated in Figure 1b and Figure 2b provides an example in which the min–max optimization procedure fails due to the loss function not being convex–concave over the

(μ^{G}, μ^{D})

plane. Consequently, no saddle point exists, and the minimax equality does not hold:

min_{μ^{G}} max_{μ^{D}} L (μ^{D}, μ^{G}) \neq max_{μ^{D}} min_{μ^{G}} L (μ^{D}, μ^{G}) .

The min–max solution is denoted as

({\hat{μ}}_{1}^{G}, {\hat{μ}}_{1}^{D})

, with its corresponding MLE labeled (MLE1, MLE), while the max–min solution is

({\hat{μ}}_{2}^{G}, {\hat{μ}}_{2}^{D})

, with corresponding MLE tag (MLE2, MLE).

The absence of a saddle point in this setting presents a practical challenge; multiple suboptimal solutions may emerge across simulation runs, undermining estimator reliability. Figure 2c shows the behavior of the loss function near these two points. The dashed green line represents the loss function with

μ^{G} = {\hat{μ}}_{1}^{G}

held constant, while the solid green line represents the loss with

μ^{D} = {\hat{μ}}_{1}^{D}

fixed. Similarly, the dashed olive line fixes

μ^{G} = {\hat{μ}}_{2}^{G}

, and the solid olive line fixes

μ^{D} = {\hat{μ}}_{2}^{D}

. These plots reveal that neither

({\hat{μ}}_{1}^{G}, {\hat{μ}}_{1}^{D})

nor

({\hat{μ}}_{2}^{G}, {\hat{μ}}_{2}^{D})

corresponds to a local maximum with respect to

μ^{D}

or a local minimum with respect to

μ^{G}

. As a result, the classical family-oracle GA estimator fails to reach an optimal solution in this scenario, degrading its performance.

Finally, Figure 2d illustrates the loss function of the classical family-oracle GA estimator in Scenario B. As expected, the loss function exhibits convex–concave behavior, and a unique saddle point is identified. Moreover, the resulting estimator is notably closer to the MLE, where the red and the green dot are the same, indicating improved performance.

4.3. Alternative GA Estimator

The loss functions of the alternative GA estimator are illustrated in Figure 3. In contrast, Figure 1a, Figure 2a, and Figure 3a show that the alternative GA estimate

{\hat{μ}}^{D}

no longer deviates from the MLE. Instead, discrepancies arise only in

{\hat{μ}}^{G}

, attributable to random noise from the generator. Consequently, the alternative GA estimator closely approximates the MLE and outperforms the classical oracle GA estimator with the family-oracle discriminator.

Moreover, the issue of the non-existence of a saddle point—observed with the classical family-oracle GA estimator—is resolved in the alternative GA framework. Figure 3b depicts a well-behaved convex–concave loss function exhibiting a unique saddle point that serves as a local minimum with respect to

μ^{G}

and a local maximum with respect to

μ^{D}

. This property is further clarified in Figure 3c, where the solid green line represents the loss function with

μ^{G} = {\hat{μ}}^{G}

fixed, and the dashed green line represents the function with

μ^{D} = {\hat{μ}}^{D}

fixed.

In Scenario B, the alternative GA estimator offers no clear advantage over the classical GA estimators. As shown in Figure 3d, its proximity to the MLE is similar to that observed in Figure 2d (i.e., red and green dots are in the same position).

4.4. Distribution of Estimators

In the previous sections, we have produced five GA estimators. These are the classical GA with the discriminator’s estimator for the generator (see Equation (14)), the classical GA with the family-oracle discriminator’s estimators for the discriminant and generator (see Equation (16)), and the alternative GA method’s estimators for the discriminant and generator (see Equations (17) and (18)). To compare the statistical properties of these five GA estimators with the MLE, we construct density plots for all.

Specifically, we generate 20,000 simulated samples of

{Z_{t}}_{t = 1 : T}^{n}

and

{Z_{t}^{G, (i)}}_{t = 1 : T, i = 1 : K_{G}}^{n}

, where

n = 1, 2, \dots, 20,000

represents the trial index. For each trial n, the five estimators are computed based on the corresponding

{Z_{t}}_{t = 1 : T}^{n}

and

{Z_{t}^{G, (i)}}_{t = 1 : T, i = 1 : K_{G}}^{n}

. The resulting distributions of the estimators are visualized using Gaussian kernel density estimates based on histograms from the 20,000 trials. These density plots are presented in Figure 4 and Figure 5.

Figure 4 is about the estimators of

μ^{D}

. It presents three distributions: the distributions of the estimators for the discriminator (

μ^{D}

) from both the classical GA method with the family-oracle and the alternative GA method with the family-oracle. It also includes the distribution of the MLE, while the true mean

μ^{M}

is indicated by the dashed blue vertical line. The density plots for the alternative GA method and MLE are all symmetric around the true mean. Notably, the density plot of the alternative GA method closely resembles that of the MLE. Descriptive statistics support these observations. The means and standard deviations of the alternative GA method and the MLE are nearly identical, confirming the strong performance of the alternative GA method.

In contrast, the classical GA method with the family-oracle discriminator exhibits a significant bias relative to the other estimators, despite having the lowest standard deviation (0.000668). To see the implication of this low standard deviation, we compute confidence intervals for these parameters, under the classical assumption of a Gaussian distribution for the estimators. At the 1% significance level, the 99% confidence interval for the mean of the classical family-oracle GA discriminator is

(0.000216, 0.000241)

, which excludes the true mean—indicating substantial bias and limited reliability for estimating the true value. On the other hand, the 99% confidence intervals for both the MLE and the alternative GA method are

(0.000376, 0.000412)

, and, for the classical oracle GA estimator,

(0.000379, 0.000409)

; all of these intervals contain the true mean. The bias seems to arise from the possibility that no saddle point exists for certain sample paths,

{Z_{t}}_{t = 1 : T}^{n}

and

{Z_{t}^{G, (i)}}_{t = 1 : T, i = 1 : K_{G}}^{n}

, as discussed in Section 4.3. The failure of the min–max optimization in these cases highlights the instability of the classical GA method with the family-oracle discriminator and suggests challenges in applying this method reliably in financial practice.

Figure 5 targets the estimators of

μ^{G}

. It displays distributions of four estimators for the generator (

μ^{G}

). These are the distribution from both the classical GA method with the family-oracle and with the oracle discriminator and the alternative GA method with the family-oracle. It also includes the true parameter (

μ^{M}

) and the distribution of the corresponding MLE.

In this figure, the distribution of the estimator from the classical GA method with the family-oracle discriminator deteriorates significantly, exhibiting a multi-modal pattern. Although its mean (0.000359) is closer to the true mean than in Figure 4, the estimator remains substantially biased, and its standard deviation increases to 0.001154. Moreover, the 99% confidence interval for the mean is

(0.000338, 0.000380)

, which excludes the true mean, indicating persistent bias.

The estimator from the alternative GA method also shows a slight decline in performance. Its mean decreases marginally from 0.000394 to 0.000393, and its standard deviation increases from 0.000994 to 0.000998. However, these changes are minimal and can be attributed to the inherent randomness in the generator. Its 99% confidence interval for the mean remains robust at

(0.000375, 0.000411)

.

Lastly, the estimator from the classical GA method with the oracle discriminator shows great performance, exhibiting the same mean and a smaller standard deviation (0.000811) compared to the other estimators. However, this improved performance is expected, as the classical GA method with the oracle discriminator incorporates the true mean into its loss function, effectively giving it an unfair and unrealistic advantage.

Overall, this figure reinforces that the classical GA estimator with the family-oracle discriminator is largely unreliable, whereas the alternative GA estimator remains statistically sound and robust.

5. Conclusions

Summarizing our methodology and main findings, using a simple setting of a Gaussian i.i.d. sequence with an unknown mean, we reveal several pitfalls for the classical GA methodology of [1] in finite samples. Our intention with this simple setting is, on the one hand, to gain a deeper understanding of the GA methodology by relying on a control experiment, i.e., where the true data generator or model is known. On the other hand, we intend to make the case that, if the methodology falters in simple settings, then it should be of concern, and further studies should be carried out in more complex settings.

In order to favor the methodology further, avoiding approximations to the discriminant or generator, we assume that the discriminants and generators are in the same family as the true data-generating process (family-oracle). We also try the oracle discriminant case but not as a serious contender due to its impracticality. In other words, the discriminant is perfect as it knows the true model. Theoretically and numerically, we find that the classical GA method might deliver multiple local optima, or even multiple global optimal in the oracle case, while occasionally failing to deliver a saddle point in the family-oracle setting. Moreover, the estimator implied in the family-oracle is biased with smaller variance than the maximum likelihood estimator. Such a combination of bias and small variance would only keep practitioners away from the true model.

Given the limitations of the classical GA method, we propose an alternative GA method that delivers, theoretically, a unique saddle point. This optimal point, as an estimator, is unbiased and consistent. Moreover, numerically, the methodology is very stable and a little time-consuming. The findings in the numerical sections were tested for robustness, with many other examples excluded from the paper to streamline the presentation.

In future research, the ideas presented in this paper will be used as a blueprint to study more advanced models (i.e., true data-generating processes) like non-Gaussian i.i.d series, autoregressive models, or autoregressive conditional heteroskedasticity (ARCH) models, as well as other GA methodologies or loss functions like Geometric GA, Least-Squares GA, and Wasserstein GA methods. And, more importantly, the analyses should be extended, first to a discriminant or generators relying on neural networks (NN), as a way of testing the capacity of NNs to uniquely and reliably capture true models, and, secondly, to true data-generating processes that are NNs themselves. If NN discriminators and generators can capture these models in control experiments, then they have a real shot at capturing reality; otherwise, the hype around NNs might be an illusion.

Author Contributions

Methodology, M.E.-A. and Y.J.; software, Y.J.; validation, M.E.-A. and Y.J.; formal analysis, M.E.-A. and Y.J.; investigation, M.E.-A. and Y.J.; data curation, M.E.-A. and Y.J.; writing—original draft, M.E.-A. and Y.J.; writing—review and editing, M.E.-A. and Y.J.; visualization, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Nomenclature and Proofs

Appendix A.1. Nomenclature

The following table describes the symbols, and their interpretation, used throughout the paper.

Table A1. Nomenclature.

Symbols	Comment
$K, T$	Number of samples, size of each sample
$K_{G}, T_{G}$	Number of generated samples, size of each sample
$μ_{M}$	Mean of Gaussian r.v., true model
$σ^{2}$	Variance of Gaussian r.v., true model
D	Discriminator,
$μ^{D}$ , ${\hat{μ}}^{D}$	Parameter of discriminator, estimator
$μ^{G}$ , ${\hat{μ}}^{G}$	Parameter of generator, estimator
$R_{1 : T_{G}}^{G, i}$	Sample i generated of size $T_{G}$
$R_{1 : T}$	Sample of size T from true model
L	Generic loss function asymptotic
$L^{(E)}$	Generic loss function finite sample
$L^{(C)}$	Classical GA, loss function asymptotic
$L^{(E, C)}$	Classical GA, loss function finite sample
$L^{(A)}$	Alternative GA, loss function asymptotic
$L^{(E, A)}$	Alternative GA, loss function finite sample

Appendix A.2. Proofs

Lemma A1.

Let

X \sim N (μ, σ^{2})

,

f \in C^{1} (R, R)

, and

f^{'} \in L^{1} (Ω, F, P)

. Then,

E [(X - μ) g (X)] = σ^{2} E [g^{'} (X)]

.

Proof.

We begin by proving that

E [| (X - μ) g (X) |] < \infty

. Defining

Z = (X - μ) / σ

and expressing the expectation in integral form, we obtain

E [| (X - μ) g (X) |] = - \frac{1}{\sqrt{2 π}} \int_{- \infty}^{0} | g (μ + σ z) | z e^{- \frac{z^{2}}{2}} d z + \frac{1}{\sqrt{2 π}} \int_{0}^{\infty} | g (μ + σ z) | z e^{- \frac{z^{2}}{2}} d z .

Focusing on the second integral,

\begin{matrix} \begin{matrix} \frac{1}{\sqrt{2 π}} \int_{0}^{\infty} | g (μ + σ z) | z e^{- \frac{z^{2}}{2}} d z & = \frac{1}{\sqrt{2 π}} \int_{0}^{\infty} |g (μ) + σ \int_{0}^{z} g^{'} (μ + σ t) d t| z e^{- \frac{z^{2}}{2}} d z \\ \leq \frac{1}{\sqrt{2 π}} \int_{0}^{\infty} |g (μ)| z e^{- \frac{z^{2}}{2}} d z + \frac{σ}{\sqrt{2 π}} \int_{0}^{\infty} (\int_{0}^{z} |g^{'} (μ + σ t)| d t) z e^{- \frac{z^{2}}{2}} d z \\ = \frac{|g (μ)|}{\sqrt{2 π}} + \frac{σ}{\sqrt{2 π}} \int_{0}^{\infty} (\int_{t}^{\infty} z e^{- \frac{z^{2}}{2}} d z) |g^{'} (μ + σ t)| d t \\ = \frac{|g (μ)|}{\sqrt{2 π}} + \frac{σ}{\sqrt{2 π}} \int_{0}^{\infty} |g^{'} (μ + σ t)| e^{- \frac{t^{2}}{2}} d t \\ \leq \frac{|g (μ)|}{\sqrt{2 π}} + E [|g^{'} (X)|] \\ < \infty . \end{matrix} \end{matrix}

Here, the third line follows from Tonelli’s theorem, and the final inequality holds due to the assumption that

f^{'} \in L^{1} (Ω, F, P)

. A similar argument establishes that the first integral is also finite, thereby confirming that

E [| (X - μ) g (X) |] < \infty

.

Since this expectation is finite, we can apply Fubini’s theorem:

\begin{matrix} \begin{matrix} E [(X - μ) g (X)] & = σ E [Z g (μ + σ Z)] \\ = \frac{σ}{\sqrt{2 π}} \int_{- \infty}^{\infty} (g (μ) + σ \int_{0}^{z} g^{'} (μ + σ t) d t) z e^{- \frac{z^{2}}{2}} d z \\ = \frac{σ^{2}}{\sqrt{2 π}} \int_{- \infty}^{\infty} (\int_{0}^{z} g^{'} (μ + σ t) d t) z e^{- \frac{z^{2}}{2}} d z \\ = \frac{σ^{2}}{\sqrt{2 π}} \int_{- \infty}^{0} (\int_{0}^{z} g^{'} (μ + σ t) d t) z e^{- \frac{z^{2}}{2}} d z + \frac{σ^{2}}{\sqrt{2 π}} \int_{0}^{\infty} (\int_{0}^{z} g^{'} (μ + σ t) d t) z e^{- \frac{z^{2}}{2}} d z \\ = - \frac{σ^{2}}{\sqrt{2 π}} \int_{- \infty}^{0} (\int_{- \infty}^{t} z e^{- \frac{z^{2}}{2}} d z) g^{'} (μ + σ t) d t + \frac{σ^{2}}{\sqrt{2 π}} \int_{0}^{\infty} (\int_{t}^{\infty} z e^{- \frac{z^{2}}{2}} d z) g^{'} (μ + σ t) d t \\ = \frac{σ^{2}}{\sqrt{2 π}} \int_{- \infty}^{0} g^{'} (μ + σ t) e^{- \frac{t^{2}}{2}} d t + \frac{σ^{2}}{\sqrt{2 π}} \int_{0}^{\infty} g^{'} (μ + σ t) e^{- \frac{t^{2}}{2}} d t \\ = \frac{σ^{2}}{\sqrt{2 π}} \int_{- \infty}^{\infty} g^{'} (μ + σ t) e^{- \frac{t^{2}}{2}} d t \\ = σ^{2} E [g^{'} (μ + σ Z)] \\ = σ^{2} E [g^{'} (X)] . \end{matrix} \end{matrix}

Thus, we have established the desired result. □

Proof.

Proof of Proposition 1.

First, recall that the oracle discriminator can be written as

\begin{matrix} D^{*} (x) & = & \frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)} \\ = & \frac{1}{1 + exp (({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{1}{2 σ^{2}} - (μ^{M} - μ^{G}) \frac{x}{σ^{2}})} . \end{matrix}

We then search for the best generator, i.e., the best

μ^{G}

:

\begin{matrix} \underset{μ^{G}}{arg min} \{E_{M} [ln (D^{*} (\bar{R}))] + E_{G} [ln (1 - D^{*} (\bar{R_{G}}))]\} \\ = & \underset{μ^{G}}{arg min} \{\int ln (\frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)}) f_{\bar{M}} (x) d x + \int ln (\frac{f_{\bar{G}} (y)}{f_{\bar{M}} (y) + f_{\bar{G}} (y)}) f_{\bar{G}} (y) d y\} . \end{matrix}

The result is shown next:

min_{μ^{G}} \{\int ln (\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}}) f_{\bar{M}} d x + \int ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}}) f_{\bar{G}} d y\}

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{\int ln (\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}}) f_{\bar{M}} d x + \int ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}}) f_{\bar{G}} d y\} \\ = & - \int \frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}} \frac{\partial f_{\bar{G}}}{\partial μ^{G}} d x + \int (\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}} \frac{\partial f_{\bar{G}}}{\partial μ^{G}} + ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}}) \frac{\partial f_{\bar{G}}}{\partial μ^{G}}) d y \\ = & - \frac{T}{σ^{2}} \int (x - μ^{G}) \frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}} f_{\bar{G}} d x + \frac{T}{σ^{2}} \int (y - μ^{G}) (\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}} + ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}})) f_{\bar{G}} d y = 0 . \end{matrix}

The first-order condition is

\int (x - μ^{G}) ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}}) f_{\bar{G}} d x = 0

or

E [(x - μ^{G}) ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}})] = 0, x \sim N (μ^{G}, σ^{2} / T) .

Applying Lemma A1 to the FOC,

g (x) = ln (\frac{f_{\bar{G}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)})

, and the FOC is

E [(x - μ^{G}) g (x)] = \frac{σ^{2}}{T} E [g^{'} (x)] = 0, x \sim N (μ^{G}, σ^{2} / T) .

Since

g^{'} (x) = \frac{T}{σ^{2}} (μ^{G} - μ^{M}) \frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)}

, the FOC requires

(μ^{G} - μ^{M}) E [\frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)}] = 0, x \sim N (μ^{G}, σ^{2} / T) .

Since

\frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)} > 0

for all x,

E [\frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)}] > 0

, and the FOC is satisfied only when

μ^{G} = μ^{M}

. Now, we prove that the unique extreme of the classical GA problem is

μ^{G} = μ^{M}

. Next, we prove that

μ^{G} = μ^{M}

is a minimum.

The second-order derivative of the loss function with respect to

μ^{G}

is

\begin{matrix} \frac{T}{σ^{2}} \frac{\partial}{\partial μ^{G}} \int (x - μ^{G}) ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}}) f_{\bar{G}} d x \\ = \frac{T^{2}}{σ^{4}} \int ((R + ln (1 - R)) {(x - μ^{G})}^{2} - \frac{σ^{2}}{T} ln (1 - R)) f_{\bar{G}} d x \end{matrix}

where

R (x) = \frac{f_{\bar{M}} (x)}{f_{\bar{M}} (x) + f_{\bar{G}} (x)}

. At

μ^{G} = μ^{M}

, the second-order derivative equals

\begin{matrix} \begin{matrix} \frac{T^{2}}{σ^{4}} E [(\frac{1}{2} + ln \frac{1}{2}) {(x - μ^{M})}^{2} - ln \frac{1}{2} \frac{σ^{2}}{T}], x \sim N (μ^{M}, σ^{2} / T) \\ = \frac{T}{2 σ^{2}} > 0 . \end{matrix} \end{matrix}

We conclude that

μ^{G} = μ^{M}

is the unique global minimum of Equation (1) with the loss function of Equation (3). □

Proof.

Proof of Proposition 2. First, we produce the first-order condition (FOC) with respect to

μ^{D}

.

\underset{μ^{D}}{arg max} L (μ^{D}, μ^{G}) = \underset{μ^{D}}{arg max} \{E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D}))]\}

The FOC reads

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D}))]\} \\ = & \int \frac{\partial}{\partial μ^{D}} ln (D (x; μ^{D})) f_{M} (x) d x + \int \frac{\partial}{\partial μ^{D}} ln (1 - D (y; μ^{D})) f_{G} (y) d y \\ = & \int \frac{T (\bar{x} - μ^{D})}{σ^{2}} (\frac{f_{G} (x)}{f_{G} (x) + f_{D} (x)}) f_{M} (x) d x - \int \frac{T_{G} (\bar{y} - μ^{D})}{σ^{2}} (\frac{f_{D} (y)}{f_{G} (y) + f_{D} (y)}) f_{G} (y) d y = 0 \end{matrix}

where the integrations are in dimensions T and

T_{G}

, respectively.

It is easy to see that, if

T = T_{G}

, then the FOC becomes

\int \frac{(\bar{x} - μ^{D})}{σ^{2}} [(\frac{f_{M} (x) - f_{D} (x)}{f_{G} (x) + f_{D} (x)})] f_{G} (x) d x = 0

and

μ^{D} = μ^{M}

satisfies this FOC regardless of

μ^{G}

.

Next, we produce the first-order condition (FOC) with respect to

μ^{G}

:

\underset{μ^{G}}{arg min} L (μ^{D}, μ^{G}) = \underset{μ^{G}}{arg min} \{E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D}))]\}

The FOC reads

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D}))]\} \\ = & \int \frac{\partial}{\partial μ^{G}} \{ln (D (x; μ^{D})) f_{M} (x)\} d x + \int \frac{\partial}{\partial μ^{G}} \{ln (1 - D (y; μ^{D})) f_{G} (y)\} d y \\ = & - \int \frac{1}{f_{G} (x) + f_{D} (x)} \frac{\partial f_{G} (x)}{\partial μ^{G}} f_{M} (x) d x + \int (D (y; μ^{D}) + ln (1 - D (y; μ^{D}))) \frac{\partial f_{G} (y)}{\partial μ^{G}} d y \\ = & - \int \frac{T (\bar{x} - μ^{G})}{σ^{2}} \frac{f_{G} (x)}{f_{G} (x) + f_{D} (x)} f_{M} (x) d x + \int \frac{T_{G} (\bar{y} - μ^{G})}{σ^{2}} (D (y; μ^{D}) + ln (1 - D (y; μ^{D}))) f_{G} (y) d y \\ = & 0 \end{matrix}

Hence, the FOC(s) would be

\begin{matrix} \frac{T}{σ^{2}} \int (\bar{x} - μ^{D}) D (x; μ^{D}) \frac{f_{M} (x)}{f_{D} (x)} f_{G} (x) d x - \frac{T_{G}}{σ^{2}} \int (\bar{y} - μ^{D}) D (y; μ^{D}) f_{G} (y) d y = 0 \\ \frac{T_{G}}{σ^{2}} \int (\bar{y} - μ^{G}) (D (y; μ^{D}) + ln (1 - D (y; μ^{D}))) f_{G} (y) d y - \frac{T}{σ^{2}} \int (\bar{x} - μ^{G}) \frac{f_{G} (x)}{f_{G} (x) + f_{D} (x)} f_{M} (x) d x = 0 \end{matrix}

It is easy to see that the point

(μ^{D}, μ^{G}) = (μ^{M}, μ^{M})

satisfies the FOC(s).

Note that, with

μ^{D} = μ^{M}

, the optimal discriminator would be the oracle.

D (x; {(μ^{D})}^{*}) = D (x; μ^{M}) = \frac{f_{M} (x)}{f_{M} (x) + f_{G} (x)} = D^{*} (x)

□

Proof.

Proof of Proposition 3. For convenience, we work with T and K rather than

T_{G}

and

K_{G}

. Recall,

D (X_{1 : T}; μ^{D}) = f_{D} (X_{1 : T}; μ^{D}) = \prod_{t = 1}^{T} f_{D} (X_{t}; μ^{D}) = \frac{1}{{(2 π)}^{T / 2} {(σ)}^{T}} exp (- \sum_{t = 1}^{T} \frac{{(X_{t} - μ^{D})}^{2}}{2 σ^{2}})

This is the likelihood of a Gaussian model with mean and volatility

(μ^{D}, σ)

.

The new objective function is

L (μ^{D}, μ^{G}) = E_{M} [ln (D (R_{1 : T}; μ^{D}))] + E_{G} [- λ ln (D (R_{1 : T}^{G}; μ^{D}))]

with

λ > 0

. Then, we have

\begin{matrix} E_{M} [ln (D (R_{1 : T}, μ^{D}))] \\ = & \int ln (D (R_{1 : T}, μ^{D})) f (R_{1 : T}; μ^{M}) d R_{1 : T} \\ = & \int (- ln ({(2 π)}^{T / 2} {(σ)}^{T}) - \sum_{t = 1}^{T} \frac{{(R_{t} - μ^{D})}^{2}}{2 σ^{2}}) \frac{exp (- \sum_{t = 1}^{T} \frac{{(R_{t} - μ^{M})}^{2}}{2 σ^{2}})}{{(2 π)}^{T / 2} {(σ)}^{T}} d R_{1 : T} \\ = & - ln ({(2 π)}^{T / 2} {(σ)}^{T}) - \frac{T}{2} - \frac{T}{2 σ^{2}} {(μ^{M} - μ^{D})}^{2} \end{matrix}

and

\begin{matrix} E_{G} [ln (D (R_{1 : T}^{G}; μ^{D}))] \\ = & \int ln (D (R_{1 : T}^{G}; μ^{D})) f (R_{1 : T}^{G}; μ^{G}) d R_{1 : T}^{G} \\ = & \int (- ln ({(2 π)}^{T / 2} {(σ)}^{T}) - \sum_{t = 1}^{T} \frac{{(R_{t}^{G} - μ^{D})}^{2}}{2 σ^{2}}) \frac{exp (- \sum_{t = 1}^{T} \frac{{(R_{t}^{G} - μ^{G})}^{2}}{2 σ^{2}})}{{(2 π)}^{T_{G} / 2} {(σ)}^{T_{G}}} d R_{1 : T}^{G} \\ = & - ln ({(2 π)}^{T / 2} {(σ)}^{T}) - \frac{T}{2} - \frac{T}{2 σ^{2}} {(μ^{G} - μ^{D})}^{2} \end{matrix}

Together, the loss function becomes

\begin{matrix} L & = & E [ln (D (R_{1 : T}, μ^{D}))] - λ E [ln (D (R_{1 : T_{G}}^{G}, μ^{D}))] \\ = & (λ - 1) ln ({(2 π)}^{T / 2} {(σ)}^{T}) + (λ - 1) \frac{T}{2} - \frac{T}{2 σ^{2}} ({(μ^{M} - μ^{D})}^{2} - λ {(μ^{G} - μ^{D})}^{2}) \\ = & (λ - 1) ln ({(2 π)}^{T / 2} {(σ)}^{T}) + (λ - 1) \frac{T}{2} \\ - \frac{T}{2 σ^{2}} ({(μ^{M})}^{2} - λ {(μ^{G})}^{2} - (λ - 1) {(μ^{D})}^{2} - 2 μ^{D} (μ^{M} - λ μ^{G})) \end{matrix}

Let us compute the first-order condition with respect to

μ^{D}

:

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{E [ln (D (R_{1 : T}; μ^{D}))] - λ E [ln (D (R_{1 : T}^{G}; μ^{D}))]\} \\ = & \frac{T}{2 σ^{2}} (2 (λ - 1) μ^{D} + 2 (μ^{M} - λ μ^{G})) = 0 \\ \Leftrightarrow & (λ - 1) μ^{D} + (μ^{M} - λ μ^{G}) = 0 \end{matrix}

Now, we compute the first-order condition with respect to

μ^{G}

:

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{E [ln (D (R_{1 : T}; μ^{D}))] - λ E [ln (D (R_{1 : T}^{G}; μ^{D}))]\} \\ = & \frac{T}{2 σ^{2}} (2 λ μ^{G} - 2 λ μ^{D}) = 0 \\ \Leftrightarrow & - μ^{G} + μ^{D} = 0 \end{matrix}

The first-order conditions are satisfied uniquely if the following relation holds:

\begin{matrix} μ^{D} & = & μ^{M} \\ μ^{G} & = & μ^{M} \end{matrix}

A sufficient condition for a saddle point is a negative Hessian.

\begin{matrix} L_{D D} & = \frac{\partial^{2} L (μ^{D}, μ^{G})}{\partial μ^{D} \partial μ^{D}} = (λ - 1) \frac{T}{σ^{2}} \\ L_{D G} & = \frac{\partial^{2} L (μ^{D}, μ^{G})}{\partial μ^{D} μ^{G}} = - λ \frac{T}{σ^{2}} \\ L_{G D} & = \frac{\partial^{2} L (μ^{D}, μ^{G})}{\partial μ^{G} μ^{D}} = - λ \frac{T}{σ^{2}} \\ L_{G G} & = \frac{\partial^{2} L (μ^{D}, μ^{G})}{\partial μ^{G} μ^{G}} = λ \frac{T}{σ^{2}} \end{matrix}

Finally, the Hessian is

\begin{matrix} L_{G G} L_{D D} - L_{G D} L_{D G} & = & (λ - 1) λ \frac{T}{σ^{2}} \frac{T}{σ^{2}} - λ λ \frac{T}{σ^{2}} \frac{T}{σ^{2}} \\ = & \{\begin{matrix} - λ \frac{T^{2}}{σ^{4}} & \begin{matrix} λ \neq 1, \\ λ = 1, μ^{G} \neq μ^{M} \end{matrix} \\ 0 & λ = 1, μ^{G} = μ^{M} \end{matrix} \end{matrix}

Therefore, the point

(μ^{D}, μ^{G}) = (μ^{M}, μ^{M})

is the unique saddle point for the given optimization problem if

λ > 0

and

λ \neq 1

. Moreover, if

λ < 1

, then

L_{D D} < 0

, making the critical point a maximum in

μ^{D}

. □

Proof.

Proof of Proposition 4.

For convenience, we work with K rather than

K_{G}

. Let us compute the first-order condition with respect to

μ^{G}

:

\frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T})) + \frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T}^{G, (i)}))\} = 0

We have

\begin{matrix} ln (D (R_{1 : T})) & = & ln (\frac{1}{1 + exp (({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{M} - μ^{G}) \frac{T}{σ^{2}} \bar{R})}) \\ ln (1 - D (R_{1 : T_{G}}^{G, (i)})) & = & ln (\frac{1}{1 + exp (- ({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{M} - μ^{G}) \frac{T}{σ^{2}} {\bar{R}}^{G, (i)})}) \end{matrix}

then

\frac{\partial}{\partial μ^{G}} ln (D (R_{1 : T})) = \frac{\frac{T}{σ^{2}} (μ^{G} - \bar{R})}{1 + exp (- ({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{M} - μ^{G}) \frac{T}{σ^{2}} \bar{R})}

and

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{\frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T}^{G, (i)}))\} \\ = & \frac{1}{K} \sum_{i = 1}^{K} (\frac{\frac{T}{σ^{2}} ({\bar{R}}^{G, (i)} - μ^{M})}{1 + exp (({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{M} - μ^{G}) \frac{T}{σ^{2}} {\bar{R}}^{G, (i)})}) \end{matrix}

Together,

\frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T})) + \frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T}^{G, (i)}))\} = 0

\begin{matrix} \frac{\frac{T}{σ^{2}} (μ^{G} - \bar{R})}{1 + exp (- ({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{M} - μ^{G}) \frac{T}{σ^{2}} \bar{R})} \\ + \frac{1}{K} \sum_{i = 1}^{K} (\frac{\frac{T_{G}}{σ^{2}} ({\bar{R}}^{G, (i)} - μ^{M})}{1 + exp (({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{M} - μ^{G}) \frac{T}{σ^{2}} {\bar{R}}^{G, (i)})}) & = & 0 \end{matrix}

Hence, we obtain a highly non-linear equation for

μ^{G}

, where we should recall that

R_{t}^{G, (i)} = μ^{G} + σ Z_{t}^{G, (i)}

, and

{\bar{R}}^{G, (i)} = μ^{G} + σ {\bar{Z}}^{G, (i)} .

Let us define

\begin{matrix} h_{1} (R, μ^{M}, μ^{G}) & = & \frac{1}{1 + exp (- ({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{M} - μ^{G}) \frac{T}{σ^{2}} \bar{R})} \\ h_{4} (R^{G}, μ^{M}, μ^{G}) & = & \frac{1}{K} \sum_{i = 1}^{K} (\frac{({\bar{R}}^{G, (i)} - μ^{M})}{1 + exp (({(μ^{M})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{M} - μ^{G}) \frac{T}{σ^{2}} {\bar{R}}^{G, (i)})}) \\ h_{5} (R, R^{G}, μ^{M}, μ^{G}) & = & \frac{h_{4} (R^{G}, μ^{M}, μ^{G})}{h_{1} (R, μ^{M}, μ^{G})} \end{matrix}

Then, we can write the first-order condition as follows:

(μ^{G} - \bar{R}) + \frac{h_{4} (R^{G}, μ^{M}, μ^{G})}{h_{1} (R, μ^{M}, μ^{G})} = 0

An implicit solution could be written as

{\hat{μ}}^{G} = \bar{R} - h_{5} (R^{G}, μ^{M}, {\hat{μ}}^{G})

Note that

{\hat{μ}}^{G} = \bar{R}

is very close to a solution of the FOC when

{\bar{Z}}^{M}

,

{\bar{Z}}^{G, (i)}

, and T is sufficiently large.

For the case of

K \to \infty

, we compute the first-order condition with respect to

μ^{G}

on the objective function:

\frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T})) + \int ln (1 - D (R_{1 : T}^{G})) f (R_{1 : T}^{G}; μ^{G}) d R_{1 : T}^{G}\} = 0

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{\int ln (1 - D (R_{1 : T}^{G})) f (R_{1 : T}^{G}; μ^{G}) d R_{1 : T}^{G}\} \\ = & \int \frac{\partial}{\partial μ^{G}} \{ln (1 - D (y)) f (y; μ^{G})\} d y \\ = & \int \frac{\partial}{\partial μ^{G}} f (y; μ^{G}) \{D (y) + ln (1 - D (y))\} d y \end{matrix}

Together, we obtain

\frac{- D (R_{1 : T})}{f_{M} (R_{1 : T})} \frac{\partial}{\partial μ^{G}} f (R_{1 : T}; μ^{G}) + \int \frac{\partial}{\partial μ^{G}} f (y; μ^{G}) \{D (y) + ln (1 - D (y))\} d y = 0

- (1 - D (R_{1 : T})) (\bar{R} - μ^{G}) + \frac{T_{G}}{T} \int (\bar{y} - μ^{G}) f (y; μ^{G}) \{D (y) + ln (1 - D (y))\} d y = 0

Let us continue simplifying the FOC:

- \frac{T}{σ^{2}} (1 - D (\bar{R})) (\bar{R} - μ^{G}) + \frac{T}{σ^{2}} \int (y - μ^{G}) (\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}} + ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}})) f_{\bar{G}} d y = 0

\int (y - μ^{G}) (\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}} + ln (\frac{f_{\bar{G}}}{f_{\bar{M}} + f_{\bar{G}}})) f_{\bar{G}} d y = (1 - D (\bar{R})) (\bar{R} - μ^{G})

Applying Lemma A1,

(μ^{G} - μ^{M}) E_{\bar{G}} [{(\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}})}^{2}] = (\bar{R} - μ^{G}) (1 - D (\bar{R})) .

(μ^{G} - μ^{M}) \frac{E_{\bar{G}} [{(\frac{f_{\bar{M}}}{f_{\bar{M}} + f_{\bar{G}}})}^{2}]}{1 - D (\bar{R})} + μ^{G} = \bar{R} .

As a result,

{\hat{μ}}^{G} \neq μ^{M}

unless

\bar{R} = μ^{M}

. Also,

{\hat{μ}}^{G} \neq \bar{R}

, the MLE, unless

\bar{R} = μ^{M}

. When

\bar{R} = μ^{M}

, the unique minimum is

{\hat{μ}}^{G} = μ^{M}

.

Furthermore, since

μ^{G} < μ^{M} \Rightarrow \bar{R} < μ^{G}

and

μ^{G} > μ^{M} \Rightarrow \bar{R} > μ^{G}

, we find that one of the following is true: (i)

\bar{R} < {\hat{μ}}^{G} < μ^{M}

; (ii)

\bar{R} > {\hat{μ}}^{G} > μ^{M}

; or (iii)

\bar{R} = {\hat{μ}}^{G} = μ^{M}

. Thus, the infinite-sample classical oracle GA estimator is between the true mean

μ^{M}

and the MLE

\bar{R}

. □

Proof.

Proof of Proposition 5.

Recall that we have

\begin{matrix} ln (D (R_{1 : T}; μ^{D})) & = & ln (\frac{1}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{R})}) \\ ln (1 - D (R_{1 : T_{G}}^{G, (i)}; μ^{D})) & = & ln (\frac{1}{1 + exp (- ({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T_{G}}{2 σ^{2}} + (μ^{D} - μ^{G}) \frac{T_{G}}{σ^{2}} {\bar{R}}^{G, (i)})}) \end{matrix}

Let us compute the first-order condition with respect to

μ^{D}

:

\frac{\partial}{\partial μ^{D}} \{ln (D (R_{1 : T}; μ^{D})) + \frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T_{G}}^{G, (i)}; μ^{D}))\} = 0

Then,

\frac{\partial}{\partial μ^{D}} ln (D (R_{1 : T}; μ^{D})) = \frac{\frac{T}{σ^{2}} (\bar{R} - μ^{D})}{1 + exp (- ({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{R})}

and

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{\frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T_{G}}^{G, (i)}; μ^{D}))\} \\ = & \frac{1}{K} \sum_{i = 1}^{K} (\frac{\frac{T_{G}}{σ^{2}} (μ^{D} - {\bar{R}}^{G, (i)})}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T_{G}}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T_{G}}{σ^{2}} {\bar{R}}^{G, (i)})}) \end{matrix}

Together,

\begin{matrix} \frac{\frac{T}{σ^{2}} (\bar{R} - μ^{D})}{1 + exp (- ({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{R})} \\ + \frac{1}{K} \sum_{i = 1}^{K} (\frac{\frac{T_{G}}{σ^{2}} (μ^{D} - {\bar{R}}^{G, (i)})}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T_{G}}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T_{G}}{σ^{2}} {\bar{R}}^{G, (i)})}) & = & 0 \end{matrix}

Hence, we obtain a highly non-linear equation for

μ^{D}

, where we should recall that

R_{t}^{G, (i)} = μ^{G} + σ Z_{t}^{G, (i)}

, and

{\bar{R}}^{G, (i)} = μ^{G} + σ {\bar{Z}}^{G, (i)}

.

Next, we compute the first-order condition with respect to

μ^{G}

:

\frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T}; μ^{D})) + \frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T_{G}}^{G, (i)}; μ^{D}))\} = 0

then

\frac{\partial}{\partial μ^{G}} ln (D (R_{1 : T}; μ^{D})) = \frac{- \frac{T}{σ^{2}} (\bar{R} - μ^{G})}{1 + exp (- ({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{R})}

and

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{\frac{1}{K} \sum_{i = 1}^{K} ln (1 - D (R_{1 : T_{G}}^{G, (i)}; μ^{D}))\} \\ = & \frac{1}{K} \sum_{i = 1}^{K} (\frac{- \frac{T_{G}}{σ^{2}} (μ^{D} - {\bar{R}}^{G, (i)})}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T_{G}}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T_{G}}{σ^{2}} {\bar{R}}^{G, (i)})}) \end{matrix}

Together,

\begin{matrix} \frac{- \frac{T}{σ^{2}} (\bar{R} - μ^{G})}{1 + exp (- ({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{R})} \\ + \frac{1}{K} \sum_{i = 1}^{K} (\frac{- \frac{T_{G}}{σ^{2}} (μ^{D} - {\bar{R}}^{G, (i)})}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T_{G}}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T_{G}}{σ^{2}} {\bar{R}}^{G, (i)})}) & = & 0 \end{matrix}

We also obtain a highly non-linear equation for

μ^{G}

.

Let us denote

\begin{matrix} h_{1} (R, μ^{D}, μ^{G}) & = & \frac{1}{1 + exp (- ({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T}{2 σ^{2}} + (μ^{D} - μ^{G}) \frac{T}{σ^{2}} \bar{R})} \\ h_{2} (R^{G}, μ^{D}, μ^{G}) & = & \frac{1}{K} \sum_{i = 1}^{K} (\frac{- (μ^{D} - {\bar{R}}^{G, (i)})}{1 + exp (({(μ^{D})}^{2} - {(μ^{G})}^{2}) \frac{T_{G}}{2 σ^{2}} - (μ^{D} - μ^{G}) \frac{T_{G}}{σ^{2}} {\bar{R}}^{G, (i)})}) \\ h_{3} (R, R^{G}, μ^{D}, μ^{G}) & = & \frac{h_{2} (R^{G}, μ^{D}, μ^{G})}{h_{1} (R, μ^{D}, μ^{G})} \end{matrix}

Then, we can write the first-order conditions as follows:

\begin{matrix} (\bar{R} - μ^{D}) + \frac{T_{G}}{T} h_{3} (R, R^{G}, μ^{D}, μ^{G}) & = & 0 \\ (\bar{R} - μ^{G}) + \frac{T_{G}}{T} h_{3} (R, R^{G}, μ^{D}, μ^{G}) & = & 0 \end{matrix}

which can be simplified to

\begin{matrix} {\hat{μ}}^{D} & = & {\hat{μ}}^{G} \\ {\hat{μ}}^{G} & = & \bar{R} + \frac{T_{G}}{T} h_{3} (R, R^{G}, {\hat{μ}}^{G}, {\hat{μ}}^{G}) \end{matrix}

For the case

K \to \infty

, we first compute the first-order condition with respect to

μ^{D}

on the objective function:

\frac{\partial}{\partial μ^{D}} \{ln (D (R_{1 : T}; μ^{D})) + \int ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D})) f (R_{1 : T_{G}}^{G}; μ^{G}) d R_{1 : T_{G}}^{G}\} = 0

where

D (x; μ^{D}) = \frac{f (x; μ^{D})}{f (x; μ^{D}) + f (x; μ^{G})}

, and we obtain

\frac{\partial}{\partial μ^{D}} ln (D (R_{1 : T}; μ^{D})) = \frac{1 - D (R_{1 : T}; μ^{D})}{f (R_{1 : T}; μ^{D})} (\bar{R} - μ^{D}) \frac{T}{σ^{2}}

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{\int ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D})) f (R_{1 : T_{G}}^{G}; μ^{G}) d R_{1 : T_{G}}^{G}\} \\ = & \int \frac{\partial}{\partial μ^{D}} \{ln (1 - D (y; μ^{D})) f (y; μ^{G})\} d y \\ = & \int \frac{\partial}{\partial μ^{D}} \{ln (\frac{f (y; μ^{G})}{f (y; μ^{D}) + f (y; μ^{G})}) f (y; μ^{G})\} d y \\ = & - \int (\bar{y} - μ^{D}) \frac{T_{G}}{σ^{2}} f (y; μ^{G}) d y = - \frac{T_{G}}{σ^{2}} (μ^{G} - μ^{D}) \end{matrix}

Together, we obtain the FOC:

\frac{1 - D (R_{1 : T}; μ^{D})}{f (R_{1 : T}; μ^{D})} (\bar{R} - μ^{D}) - \frac{T_{G}}{T} (μ^{G} - μ^{D}) = 0

Now, we compute the first-order condition with respect to

μ^{G}

:

\frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T}; μ^{D})) + \int ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D})) f (R_{1 : T_{G}}^{G}; μ^{G}) d R_{1 : T_{G}}^{G}\} = 0

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{\int ln (1 - D (R_{1 : T_{G}}^{G}; μ^{D})) f (R_{1 : T_{G}}^{G}; μ^{G}) d R_{1 : T_{G}}^{G}\} \\ = & \int \frac{\partial}{\partial μ^{G}} \{ln (1 - D (y; μ^{D})) f (y; μ^{G})\} d y \\ = & \int \frac{\partial}{\partial μ^{G}} f (y; μ^{G}) \{D (y; μ^{D}) + ln (1 - D (y; μ^{D}))\} d y \end{matrix}

Together, we obtain

\frac{- D (R_{1 : T}; μ^{D})}{f (R_{1 : T}, μ^{D})} \frac{\partial}{\partial μ^{G}} f (R_{1 : T}; μ^{G}) + \int \frac{\partial}{\partial μ^{G}} f (y; μ^{G}) \{D (y; μ^{D}) + ln (1 - D (y; μ^{D}))\} d y = 0

- (1 - D (R_{1 : T}; μ^{D})) (\bar{R} - μ^{G}) + \frac{T_{G}}{T} \int (\bar{y} - μ^{G}) f (y; μ^{G}) \{D (y; μ^{D}) + ln (1 - D (y; μ^{D}))\} d y = 0

Then, the system of the FOC(s) is as follows:

\begin{matrix} \frac{1 - D (R_{1 : T}; μ^{D})}{f (R_{1 : T}; μ^{D})} (\bar{R} - μ^{D}) - \frac{T_{G}}{T} (μ^{G} - μ^{D}) = 0 \\ - (1 - D (R_{1 : T}; μ^{D})) (\bar{R} - μ^{G}) + \frac{T_{G}}{T} \int (\bar{y} - μ^{G}) f (y; μ^{G}) \{D (y; μ^{D}) + ln (1 - D (y; μ^{D}))\} d y = 0 \end{matrix}

□

Proof.

Proof of Proposition 6.

For convenience, we work with T and K rather than

T_{G}

and

K_{G}

. Let us compute the first-order condition with respect to

μ^{D}

:

\frac{\partial}{\partial μ^{D}} \{ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T}^{G, (i)}; μ^{D}))\} = 0

We have

\frac{\partial}{\partial μ^{D}} ln (D (R_{1 : T}, μ^{D})) = - \frac{\partial}{\partial μ^{D}} \frac{\sum_{t = 1}^{T} {(R_{t} - μ^{D})}^{2}}{2 σ^{2}} = \frac{\sum_{t = 1}^{T} (R_{t} - μ^{D})}{σ^{2}}

and

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{\frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T_{G}}^{G, (i)}; μ^{D}))\} & = & - \frac{λ}{K} \sum_{i = 1}^{K} (\frac{\partial}{\partial μ^{D}} \frac{\sum_{t = 1}^{T} {(R_{t}^{G, (i)} - μ^{D})}^{2}}{2 σ^{2}}) \\ = & \frac{λ}{K} \sum_{i = 1}^{K} (\frac{\sum_{t = 1}^{T} (R_{t}^{G, (i)} - μ^{D})}{σ^{2}}) \end{matrix}

Together,

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T_{G}}^{G, (i)}; μ^{D}))\} \\ = & \frac{\sum_{t = 1}^{T} (R_{t} - μ^{D})}{σ^{2}} - \frac{λ}{K} \sum_{i = 1}^{K} (\frac{\sum_{t = 1}^{T} (R_{t}^{G, (i)} - μ^{D})}{σ^{2}}) = 0 \end{matrix}

Hence, we obtain the following equation, where we have used

R_{t}^{G, (i)} = μ^{G} + σ Z_{t}^{G, (i)}

:

\sum_{t = 1}^{T} R_{t} - σ \frac{λ}{K} \sum_{i = 1}^{K} (\sum_{t = 1}^{T} Z_{t}^{G, (i)}) + (λ - 1) T μ^{D} = λ T μ^{G}

λ T μ^{G} - (λ - 1) T μ^{D} = \sum_{t = 1}^{T} R_{t} - σ \frac{λ}{K} \sum_{i = 1}^{K} (\sum_{t = 1}^{T} Z_{t}^{G, (i)})

Now, we compute the first-order condition with respect to

μ^{G}

:

\frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T}^{G, (i)}; μ^{D}))\} = 0

As before,

\frac{\partial}{\partial μ^{G}} ln (D (R_{1 : T}; μ^{D})) = 0

and

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{\frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T}^{G, (i)}; μ^{D}))\} & = & - \frac{λ}{K} \sum_{i = 1}^{K} \frac{\partial}{\partial μ^{G}} [\frac{\sum_{t = 1}^{T} {(R_{t}^{G, (i)} - μ^{D})}^{2}}{2 σ^{2}}] \\ = & - \frac{λ}{K} \sum_{i = 1}^{K} [\frac{\sum_{t = 1}^{T} (μ^{G} + σ Z_{t}^{G, (i)} - μ^{D})}{σ^{2}}] \end{matrix}

Together,

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T}^{G, (i)}; μ^{D}))\} \\ = & \frac{λ}{K} \sum_{i = 1}^{K} [\frac{\sum_{t = 1}^{T} (μ^{G} + σ Z_{t}^{G, (i)} - μ^{D})}{σ^{2}}] = 0 \end{matrix}

leading to the second equation:

μ^{D} - μ^{G} = \frac{σ}{T K} \sum_{i = 1}^{K} \sum_{t = 1}^{T} Z_{t}^{G, (i)}

The estimates proposed by our GAN would be the system of two equations:

\begin{matrix} λ μ^{G} - (λ - 1) μ^{D} & = & \frac{1}{T} \sum_{t = 1}^{T} R_{t} - σ \frac{λ}{T K} \sum_{i = 1}^{K} \sum_{t = 1}^{T} Z_{t}^{G, (i)} \\ μ^{D} - μ^{G} & = & \frac{σ}{T K} \sum_{i = 1}^{K} \sum_{t = 1}^{T} Z_{t}^{G, (i)} \end{matrix}

which leads to

\begin{matrix} {\hat{μ}}^{D} & = & \frac{1}{T} \sum_{t = 1}^{T} R_{t} \\ {\hat{μ}}^{G} & = & \frac{1}{T} \sum_{t = 1}^{T} R_{t} - \frac{σ}{T K} \sum_{i = 1}^{K} \sum_{t = 1}^{T} Z_{t}^{G, (i)} \end{matrix}

We need to show that the solution to the FOC is unique, and it constitutes a saddle point.

Let us compute the Hessian:

\begin{matrix} \frac{\partial}{\partial μ^{D}} \{ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T}^{G, (i)}; μ^{D}))\} \\ = & \frac{\sum_{t = 1}^{T} (R_{t} - μ^{D})}{σ^{2}} - \frac{λ}{K} \sum_{i = 1}^{K} (\frac{\sum_{t = 1}^{T} (R_{t}^{G, (i)} - μ^{D})}{σ^{2}}) = 0 \end{matrix}

\begin{matrix} \frac{\partial}{\partial μ^{G}} \{ln (D (R_{1 : T}; μ^{D})) - \frac{λ}{K} \sum_{i = 1}^{K} ln (D (R_{1 : T}^{G, (i)}; μ^{D}))\} \\ = & \frac{λ}{K} \sum_{i = 1}^{K} [\frac{\sum_{t = 1}^{T} (μ^{G} + σ Z_{t}^{G, (i)} - μ^{D})}{σ^{2}}] = 0 \end{matrix}

\begin{matrix} L_{D D}^{(E)} & = \frac{\partial^{2} L^{(E)} (μ^{D}, μ^{G})}{\partial μ^{D} \partial μ^{D}} = (λ - 1) \frac{T}{σ^{2}} \\ L_{D G}^{(E)} & = \frac{\partial^{2} L^{(E)} (μ^{D}, μ^{G})}{\partial μ^{D} μ^{G}} = - λ \frac{T}{σ^{2}} \\ L_{G D}^{(E)} & = \frac{\partial^{2} L^{(E)} (μ^{D}, μ^{G})}{\partial μ^{G} μ^{D}} = - λ \frac{T}{σ^{2}} \\ L_{G G}^{(E)} & = \frac{\partial^{2} L^{(E)} (μ^{D}, μ^{G})}{\partial μ^{G} μ^{G}} = λ \frac{T}{σ^{2}} \end{matrix}

Finally, the Hessian is, for

λ \neq 1

and

\bar{R} \neq \frac{1}{K} \sum_{i = 1}^{K} {\bar{R}}^{G, (i)}

,

L_{G G} L_{D D} - L_{G D} L = - λ \frac{T^{2}}{σ^{4}}

The point

(μ^{D}, μ^{G}) = ({\hat{μ}}^{D}, {\hat{μ}}^{G})

is the unique saddle point for the given optimization problem, and a maximum in

μ^{D}

, if

0 < λ < 1

.

It can be seen that unbiasedness and consistency follow by noticing that

\begin{matrix} E_{M} [{\hat{μ}}^{D}] & = & μ^{M} \\ E_{G} [{\hat{μ}}^{G}] & = & μ^{M} - \frac{σ}{T K} \sum_{i = 1}^{K} \sum_{t = 1}^{T} E_{G} [Z_{t}^{G, (i)}] = μ^{M} \end{matrix}

and

\begin{matrix} V_{M} [{\hat{μ}}^{D}] & = & \frac{μ^{M}}{T} \\ V_{G} [{\hat{μ}}^{G}] & = & \frac{μ^{M}}{T} - \frac{σ}{T K} V_{G} [Z_{t}^{G, (i)}] = \frac{μ^{M}}{T} - \frac{σ}{T K} \end{matrix}

both approach zero as T, and K goes to infinity. □

References

Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
Wiese, M.; Knobloch, R.; Korn, R.; Kretschmer, P. Quant GANs: Deep generation of financial time series. Quant. Financ. 2020, 20, 1419–1440. [Google Scholar] [CrossRef]
Takahashi, S.; Chen, Y.; Tanaka-Ishii, K. Modeling financial time-series with generative adversarial networks. Phys. A Stat. Mech. Its Appl. 2019, 527, 121261. [Google Scholar] [CrossRef]
Yang, J.; Gong, X.; Fang, A. Extreme Risk Spillover from Commodity Markets to Green Finance Markets: New Evidence Utilizing GAN and GARCH Model. Comput. Econ. 2025, 1–29. [Google Scholar] [CrossRef]
Ganguly, S. Implementing quantum generative adversarial network (qgan) and qcbm in finance. arXiv 2023, arXiv:2308.08448. [Google Scholar] [CrossRef]
Liao, S.; Ni, H.; Sabate-Vidales, M.; Szpruch, L.; Wiese, M.; Xiao, B. Sig-Wasserstein GANs for conditional time series generation. Math. Financ. 2024, 34, 622–670. [Google Scholar] [CrossRef]
Eckerli, F.; Osterrieder, J. Generative adversarial networks in finance: An overview. arXiv 2021, arXiv:2106.06364. [Google Scholar] [CrossRef]
Kaji, T.; Manresa, E.; Pouliot, G. An adversarial approach to structural estimation. Econometrica 2023, 91, 2041–2063. [Google Scholar] [CrossRef]
Lim, J.H.; Ye, J.C. Geometric GAN. arXiv 2017, arXiv:1705.02894, 02894. [Google Scholar] [CrossRef]
Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Smolley, S.P. Least Squares Generative Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2813–2821. [Google Scholar] [CrossRef]
Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
Athey, S.; Imbens, G.W.; Metzger, J.; Munro, E. Using wasserstein generative adversarial networks for the design of monte carlo simulations. J. Econom. 2021, 240, 105076. [Google Scholar] [CrossRef]
Kaji, T.; Manresa, E.; Pouliot, G.A. Adversarial Inference Is Efficient; AEA Papers and Proceedings; American Economic Association: Nashville, TN, USA, 2021; Volume 111, pp. 621–625. [Google Scholar]
Farnia, F.; Wang, W.W.; Das, S.; Jadbabaie, A. Gat–gmm: Generative adversarial training for gaussian mixture models. SIAM J. Math. Data Sci. 2023, 5, 122–146. [Google Scholar] [CrossRef]
Chakraborty, S.; Bartlett, P.L. On the statistical properties of generative adversarial models for low intrinsic data dimension. arXiv 2024, arXiv:2401.15801. [Google Scholar] [CrossRef]
Birrell, J. Statistical Error Bounds for GANs with Nonlinear Objective Functionals. arXiv 2024, arXiv:2406.16834. [Google Scholar]
Barron, E.N. Game Theory: An Introduction; John Wiley & Sons: Hoboken, NJ, USA, 2024. [Google Scholar]
Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
Piessens, R.; de Doncker-Kapenga, E.; Überhuber, C.W.; Kahaner, D.K. Quadpack: A Subroutine Package for Automatic Integration; Springer: Berlin/Heidelberg, Germany, 1983; Volume 1. [Google Scholar]

Figure 1. Loss functions of the classical oracle GA estimator in different scenarios.

Figure 2. Loss functions of the classical family-oracle GA estimator in different scenarios.

Figure 3. Loss functions of the alternative GA estimator in different scenarios.

Figure 4. Density of estimator for

μ^{D}

using the classical GA method and the alternative GA method with the family-oracle discriminator.

Figure 4. Density of estimator for

μ^{D}

using the classical GA method and the alternative GA method with the family-oracle discriminator.

Figure 5. Density of estimators for

μ^{G}

using the classical GA method, the oracle and the family-oracle, and the alternative GA method and the MLE.

Figure 5. Density of estimators for

μ^{G}

using the classical GA method, the oracle and the family-oracle, and the alternative GA method and the MLE.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Escobar-Anel, M.; Jiao, Y. The Generative Adversarial Approach: A Cautionary Tale of Finite Samples. Algorithms 2025, 18, 564. https://doi.org/10.3390/a18090564

AMA Style

Escobar-Anel M, Jiao Y. The Generative Adversarial Approach: A Cautionary Tale of Finite Samples. Algorithms. 2025; 18(9):564. https://doi.org/10.3390/a18090564

Chicago/Turabian Style

Escobar-Anel, Marcos, and Yiyao Jiao. 2025. "The Generative Adversarial Approach: A Cautionary Tale of Finite Samples" Algorithms 18, no. 9: 564. https://doi.org/10.3390/a18090564

APA Style

Escobar-Anel, M., & Jiao, Y. (2025). The Generative Adversarial Approach: A Cautionary Tale of Finite Samples. Algorithms, 18(9), 564. https://doi.org/10.3390/a18090564

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

The Generative Adversarial Approach: A Cautionary Tale of Finite Samples

Abstract

1. Introduction

2. Problem Setting and Asymptotic Solutions

2.1. Classical GA Oracle Discriminator

2.2. Classical GA Family-Oracle Discriminator

2.3. Alternative Proposal for the Loss Function and Discriminator

3. Theoretical Results for a Finite Sample

3.1. Classical GA Approach

3.1.1. Finite Sample with Oracle Discriminator

3.1.2. Finite Sample with the Family-Oracle Discriminator

3.2. Alternative GA Proposal

4. Numerical Analysis

4.1. Classical Oracle GA Estimator

4.2. Classical Family-Oracle GA Estimator

4.3. Alternative GA Estimator

4.4. Distribution of Estimators

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Nomenclature and Proofs

Appendix A.1. Nomenclature

Appendix A.2. Proofs

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI