Appendix B. Some Results on Sub-Linear Expectation
The frameworks and notations of [23,24,28] are used in this paper.
Given a measurable space $(\Omega, \mathcal{F})$, let $\mathcal{H}$ be a linear space of real functions on $\Omega$ such that the conditions listed below hold:
(i) if $X_1, \dots, X_n \in \mathcal{H}$, then $\varphi(X_1, \dots, X_n) \in \mathcal{H}$ for every $\varphi \in C_{l,Lip}(\mathbb{R}^n)$, where $C_{l,Lip}(\mathbb{R}^n)$ represents the linear space of (local Lipschitz) functions $\varphi$ that satisfy $|\varphi(x) - \varphi(y)| \le C(1 + |x|^m + |y|^m)\,|x - y|$, $\forall x, y \in \mathbb{R}^n$, for some $C > 0$ and $m \in \mathbb{N}$ depending on $\varphi$;
(ii) for any $A \in \mathcal{F}$, $I_A \in \mathcal{H}$, where $I_A$ is the indicator function of the event $A$.
Definition A6 ([28]). A functional $\hat{\mathbb{E}} : \mathcal{H} \to [-\infty, +\infty]$ is called a sub-linear expectation on $\mathcal{H}$ if the properties below hold for all $X, Y \in \mathcal{H}$:
- (a) Monotonicity: if $X \ge Y$, then $\hat{\mathbb{E}}[X] \ge \hat{\mathbb{E}}[Y]$;
- (b) Constant preserving: $\hat{\mathbb{E}}[c] = c$, $\forall c \in \mathbb{R}$;
- (c) Sub-additivity: if $\hat{\mathbb{E}}[X] + \hat{\mathbb{E}}[Y]$ is not of the form $+\infty - \infty$ or $-\infty + \infty$, then $\hat{\mathbb{E}}[X + Y] \le \hat{\mathbb{E}}[X] + \hat{\mathbb{E}}[Y]$;
- (d) Positive homogeneity: $\hat{\mathbb{E}}[\lambda X] = \lambda \hat{\mathbb{E}}[X]$, $\forall \lambda \ge 0$.
The triple $(\Omega, \mathcal{H}, \hat{\mathbb{E}})$ is named a sub-linear expectation space. A random variable under sub-linear expectation is a function $X : \Omega \to \mathbb{R}$ that satisfies $\varphi(X) \in \mathcal{H}$ for each $\varphi \in C_{l,Lip}(\mathbb{R})$; thus $\mathcal{H}$ can be thought of as a space of random variables. Given a sub-linear expectation $\hat{\mathbb{E}}$, the conjugate expectation $\hat{\varepsilon}$ of $\hat{\mathbb{E}}$ is defined by $\hat{\varepsilon}[X] := -\hat{\mathbb{E}}[-X]$, $\forall X \in \mathcal{H}$.
When the inequality in (c) of Definition A6 becomes an equality, $\hat{\mathbb{E}}$ is a linear expectation; this is a special case of sub-linear expectations. Apart from this particular case, a sub-linear expectation need not satisfy $\hat{\mathbb{E}}[X] = -\hat{\mathbb{E}}[-X]$ for every $X \in \mathcal{H}$, although it may satisfy this equality for some $X \in \mathcal{H}$.
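Before the urn example, here is a minimal numerical sketch of Definition A6 (the sample space, family of measures, and test vectors are our own, not from the source): on a three-point sample space, the supremum of two linear expectations is a sub-linear expectation, and in general $\hat{\mathbb{E}}[X] \neq -\hat{\mathbb{E}}[-X]$.

```python
import numpy as np

# Two candidate probability measures on a three-point sample space
# Omega = {w1, w2, w3}; the family {P1, P2} is chosen for illustration.
P1 = np.array([0.5, 0.3, 0.2])
P2 = np.array([0.2, 0.3, 0.5])

def E_hat(X):
    """Sub-linear expectation: supremum of the linear expectations."""
    return max(P1 @ X, P2 @ X)

X = np.array([1.0, 0.0, -1.0])
Y = np.array([0.0, 2.0, 1.0])

assert E_hat(X + 1.0) >= E_hat(X)                    # (a) monotonicity (X+1 >= X)
assert abs(E_hat(np.full(3, 7.0)) - 7.0) < 1e-12     # (b) constant preserving
assert E_hat(X + Y) <= E_hat(X) + E_hat(Y) + 1e-12   # (c) sub-additivity
assert abs(E_hat(3.0 * X) - 3.0 * E_hat(X)) < 1e-12  # (d) positive homogeneity

# Conjugate expectation: here E_hat[X] = 0.3 but -E_hat[-X] = -0.3,
# so the two values differ, as discussed above.
print(E_hat(X), -E_hat(-X))
```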
Next, we give an example of sub-linear expectations.
Example A1. During a game, a participant selects a ball at random from an urn that contains balls of three colors: white (W), blue (B) and red (R). The participant is not informed of the exact numbers of W, B and R balls by the urn's owner, who serves as the game's banker; he/she only guarantees that  and . Consider a random variable ξ, and let  We can evaluate the loss ,  conservatively. We obtain that the distribution of ξ is  For each fixed , the robust expectation of ξ can be expressed as  Next, we show that  is a sub-linear expectation. For every , we have:
- (a) Monotonicity: if $X \ge Y$, then $\hat{\mathbb{E}}[X] \ge \hat{\mathbb{E}}[Y]$, by the monotonicity of each linear expectation and of the supremum;
- (b) Constant preserving: $\hat{\mathbb{E}}[c] = c$, $\forall c \in \mathbb{R}$;
- (c) Sub-additivity: the supremum of a sum never exceeds the sum of the suprema, so $\hat{\mathbb{E}}[X + Y] \le \hat{\mathbb{E}}[X] + \hat{\mathbb{E}}[Y]$;
- (d) Positive homogeneity: for $\lambda \ge 0$, $\sup_{\theta} E_{\theta}[\lambda X] = \lambda \sup_{\theta} E_{\theta}[X]$, so $\hat{\mathbb{E}}[\lambda X] = \lambda \hat{\mathbb{E}}[X]$.
Hence,  is a sub-linear expectation. In fact, there exist ,  such that . Let  and , . Obviously, , and . In Example A1, a participant selects a ball at random from an urn that contains W, B and R balls. When the participant picks a ball from the urn repeatedly, at time i the banker knows the true distribution of ξ, but the participant does not. At time , the banker may change the numbers of B balls  and W balls  without telling the participant; however, the ranges of  and  are both fixed within . For example,  at the first draw, while  at the second. Simply stated, each time a ball is picked, the urn is of a different type. The difference from the classic game is that the numbers of B and W balls change every time the participant picks a ball, i.e., the type of urn changes at each draw, whereas in the classic game it does not. The ranges of B and W here are either announced by the banker or determined from the data; more generally, they are the range of . A principle for determining the range of  from the data is given later in Proposition A3.
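A small simulation in the spirit of Example A1 (the urn size, number of red balls, payoffs, and the range for B below are placeholders of our own, since the exact counts are specified in the original example): at each round the banker may move the B/W split anywhere within a fixed range, so the robust expectation is the supremum of the linear expectations over that range.

```python
import random

TOTAL = 100          # hypothetical urn size
R = 40               # hypothetical fixed number of red balls
B_RANGE = (10, 50)   # hypothetical range for the number of blue balls; W = TOTAL - R - B

def linear_expectation(b):
    """E_P[xi] when the urn holds b blue balls; xi = +1 on W, -1 on B, 0 on R."""
    w = TOTAL - R - b
    return (w * 1.0 + b * (-1.0)) / TOTAL

# Robust (maximum) and conjugate (minimum) expectations over the banker's choices.
E_upper = max(linear_expectation(b) for b in range(B_RANGE[0], B_RANGE[1] + 1))
E_lower = min(linear_expectation(b) for b in range(B_RANGE[0], B_RANGE[1] + 1))
print("maximum expectation:", E_upper, " minimum expectation:", E_lower)

# One simulated round: the banker re-picks b within the fixed range each time,
# which is why the draws are IID under the sub-linear expectation but need not
# be IID in the classical sense.
b = random.randint(*B_RANGE)
ball = random.choices(["W", "B", "R"], weights=[TOTAL - R - b, b, R])[0]
print("drawn ball:", ball)
```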
Proposition A1 ([28]). Given a sub-linear expectation space $(\Omega, \mathcal{H}, \hat{\mathbb{E}})$, let $\hat{\varepsilon}$ be the conjugate expectation of $\hat{\mathbb{E}}$. For any $X, Y \in \mathcal{H}$:
(i) $\hat{\varepsilon}[X] \le \hat{\mathbb{E}}[X]$.
(ii) $\hat{\mathbb{E}}[X + c] = \hat{\mathbb{E}}[X] + c$, for $c \in \mathbb{R}$.
(iii) $|\hat{\mathbb{E}}[X] - \hat{\mathbb{E}}[Y]| \le \hat{\mathbb{E}}[|X - Y|]$.
(iv) $\hat{\mathbb{E}}[X]$ and $\hat{\varepsilon}[X]$ are both finite if $\hat{\mathbb{E}}[|X|]$ is finite.
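For instance, when $\hat{\mathbb{E}}[X]$ and $\hat{\mathbb{E}}[-X]$ are finite, item (i) is a one-line consequence of constant preserving and sub-additivity:
$$0 = \hat{\mathbb{E}}[X + (-X)] \le \hat{\mathbb{E}}[X] + \hat{\mathbb{E}}[-X], \qquad \text{hence} \qquad \hat{\varepsilon}[X] = -\hat{\mathbb{E}}[-X] \le \hat{\mathbb{E}}[X].$$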
It is very helpful to know the following representation theorem for sub-linear expectations.
Theorem A1 ([28]) (Robust Daniell–Stone theorem). Suppose that $(\Omega, \mathcal{H}, \hat{\mathbb{E}})$ is a sub-linear expectation space satisfying the following condition: $\hat{\mathbb{E}}[X_n] \downarrow 0$ for every sequence $\{X_n\}_{n=1}^{\infty}$ of random variables in $\mathcal{H}$ which fulfills $X_n(\omega) \downarrow 0$ for each $\omega \in \Omega$. Then, on the measurable space $(\Omega, \sigma(\mathcal{H}))$, there exists a family of probability measures $\{P_\theta\}_{\theta \in \Theta}$ such that
$$\hat{\mathbb{E}}[X] = \sup_{\theta \in \Theta} E_{P_\theta}[X], \quad \forall X \in \mathcal{H}.$$
Here $\sigma(\mathcal{H})$ denotes the smallest σ-algebra generated by $\mathcal{H}$.
Remark A1. Theorem A1 shows that, under a suitable condition, a sub-linear expectation $\hat{\mathbb{E}}$ can be expressed as a supremum of linear expectations. Based on this theorem, Chen [23] gave the definition of the maximum expectation $\mathbb{E}$. Definition A7 below provides the concepts of the maximum expectation $\mathbb{E}$ and the minimum expectation $\mathcal{E}$ introduced by Chen [23]. In most cases, the sub-linear expectation $\hat{\mathbb{E}}$ is the maximum expectation $\mathbb{E}$.
Definition A7 ([23]). Suppose that $(\Omega, \mathcal{F})$ is a measurable space, $\mathcal{M}$ is the set of all probability measures on Ω, and $E_P$ is the linear expectation under the probability measure $P \in \mathcal{M}$. For a non-empty subset $\mathcal{P} \subseteq \mathcal{M}$, $A \in \mathcal{F}$ and $X \in \mathcal{H}$, the upper probability $\mathbb{V}$, the lower probability v, the maximum expectation $\mathbb{E}$ and the minimum expectation $\mathcal{E}$ are defined by
$$\mathbb{V}(A) := \sup_{P \in \mathcal{P}} P(A), \quad v(A) := \inf_{P \in \mathcal{P}} P(A), \quad \mathbb{E}[X] := \sup_{P \in \mathcal{P}} E_P[X], \quad \mathcal{E}[X] := \inf_{P \in \mathcal{P}} E_P[X],$$
respectively.
Definition A8 ([1]). Let $V$ be a set function from $\mathcal{F}$ to $[0, 1]$. $V$ is named a non-additive probability (also a capacity) if properties (i) and (ii) below hold, and is named a lower or an upper continuous non-additive probability if property (iii) or (iv) below also holds:
- (i) $V(\emptyset) = 0$, $V(\Omega) = 1$;
- (ii) if $A \subseteq B$ and $A, B \in \mathcal{F}$, then $V(A) \le V(B)$;
- (iii) if $A_n, A \in \mathcal{F}$ and $A_n \uparrow A$, then $V(A_n) \uparrow V(A)$;
- (iv) if $A_n, A \in \mathcal{F}$ and $A_n \downarrow A$, then $V(A_n) \downarrow V(A)$.
Remark A2. (i) The upper probability $\mathbb{V}$ and the lower probability v are two specific kinds of non-additive probabilities. Furthermore, $\mathbb{V}$ is subadditive and satisfies $v(A) = 1 - \mathbb{V}(A^c)$ for every given $A \in \mathcal{F}$, while v is superadditive but in general not subadditive. $\mathbb{E}$ is a sub-linear expectation, and $\mathcal{E}$ is the conjugate expectation of $\mathbb{E}$. Indeed, given a maximum expectation $\mathbb{E}$, $\mathbb{V}$ and v can be produced by $\mathbb{V}(A) = \mathbb{E}[I_A]$ and $v(A) = \mathcal{E}[I_A]$ for any $A \in \mathcal{F}$.
(ii) We also call the subset $\mathcal{P}$ a family of probability measures related to the sub-linear expectation $\mathbb{E}$. For a given random variable X, let $\{F_P(x) := P(X \le x), P \in \mathcal{P}\}$ and call it a family of ambiguous probability distributions of X. Therefore, there exist two distributions for X: the lower distribution $F_*(x) := v(X \le x)$, $x \in \mathbb{R}$, and the upper distribution $F^*(x) := \mathbb{V}(X \le x)$, $x \in \mathbb{R}$. In fact, the lower distribution and the upper distribution of X can characterize the ambiguity of the distributions of X.
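To illustrate Remark A2 (ii), here is a small sketch in which the ambiguity set is our own choice (all normal laws $N(\mu, 1)$ with $\mu$ in a fixed interval): for each x, the upper and lower distributions are obtained by maximizing and minimizing $P(X \le x)$ over the family.

```python
import math

# Hypothetical ambiguity set: all N(mu, 1) with mu in [-0.5, 0.5].
MUS = [i / 100.0 for i in range(-50, 51)]

def Phi(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_upper(x):
    """Upper distribution F*(x) = sup_P P(X <= x)."""
    return max(Phi(x - mu) for mu in MUS)

def F_lower(x):
    """Lower distribution F_*(x) = inf_P P(X <= x)."""
    return min(Phi(x - mu) for mu in MUS)

# The envelope [F_*, F*] characterizes the distributional ambiguity of X.
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  F_*(x)={F_lower(x):.3f}  F*(x)={F_upper(x):.3f}")
```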
Throughout this paper, we only explore the case where the sub-linear expectation $\hat{\mathbb{E}}$ is the maximum expectation $\mathbb{E}$ and the upper probability $\mathbb{V}$ is upper continuous. Next, the concept of IID random variables under the sub-linear expectation $\mathbb{E}$ is introduced. We use the concept of IID random variables proposed by [24,28].
Definition A9. (i) Independence: Let $\{X_n\}_{n=1}^{\infty}$ be a random variable sequence satisfying $X_n \in \mathcal{H}$. If for every Borel-measurable function φ on $\mathbb{R}^{n+1}$ with $\varphi(X_1, \dots, X_n, X_{n+1}) \in \mathcal{H}$ and $\varphi(x_1, \dots, x_n, X_{n+1}) \in \mathcal{H}$ for every $(x_1, \dots, x_n) \in \mathbb{R}^n$,
$$\mathbb{E}[\varphi(X_1, \dots, X_n, X_{n+1})] = \mathbb{E}\big[\overline{\varphi}(X_1, \dots, X_n)\big]$$
holds, where $\overline{\varphi}(x_1, \dots, x_n) := \mathbb{E}[\varphi(x_1, \dots, x_n, X_{n+1})]$, then the random variable $X_{n+1}$ is independent of $(X_1, \dots, X_n)$ under $\mathbb{E}$.
(ii) Identical distribution: Given random variables X and Y, if for every Borel-measurable function φ satisfying $\varphi(X), \varphi(Y) \in \mathcal{H}$, $\mathbb{E}[\varphi(X)] = \mathbb{E}[\varphi(Y)]$, then X and Y are identically distributed, denoted $X \overset{d}{=} Y$.
(iii) IID random variables: Given a sequence of random variables $\{X_n\}_{n=1}^{\infty}$, if $X_{n+1} \overset{d}{=} X_1$ and $X_{n+1}$ is independent of $(X_1, \dots, X_n)$ for every $n \ge 1$, then $\{X_n\}_{n=1}^{\infty}$ is IID.
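Computationally, Definition A9 (i) says that when $X_{n+1}$ is independent of the earlier variables, the robust expectation is evaluated inside-out: an inner supremum over the ambiguity of $X_{n+1}$ with the earlier values frozen, then an outer supremum over the earlier variables. A minimal two-variable sketch, with hypothetical finite families of laws of our own:

```python
import numpy as np

# Hypothetical finite families of marginal laws for X and Y on {-1, +1}.
support = np.array([-1.0, 1.0])
laws_X = [np.array([0.5, 0.5]), np.array([0.3, 0.7])]
laws_Y = [np.array([0.6, 0.4]), np.array([0.4, 0.6])]

def E_sup(laws, values):
    """Maximum expectation over a finite family of laws."""
    return max(p @ values for p in laws)

def E_phi_Y_indep_of_X(phi):
    """E[phi(X, Y)] when Y is independent of X (Definition A9 (i)):
    inner supremum over the law of Y for each frozen x, then an
    outer supremum over the law of X."""
    inner = np.array([E_sup(laws_Y, phi(x, support)) for x in support])
    return E_sup(laws_X, inner)

phi = lambda x, y: x * y * y - y   # an arbitrary test function
print(E_phi_Y_indep_of_X(phi))
```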
Remark A3. (i) The statement "Y is independent of X" fits the case where Y occurs after X, so the information of X should be taken into account in a robust expectation. In a sub-linear expectation space $(\Omega, \mathcal{H}, \mathbb{E})$, "Y is independent of X" means that the family of distributions of Y is unchanged after every realization of X happens; that is, the "conditional sub-linear expectation" of Y with respect to X is . Under a linear expectation, this notion of independence reduces to the classical one. As illustrated by Peng [27], it should be noticed that "Y is independent of X" does not automatically mean that "X is independent of Y" under sub-linear expectations; Proposition A2 (ii) below illustrates this point.
(ii) Let us consider Example A1 again. Now we mix the balls completely and randomly. After selecting a ball from the urn, we receive 1 dollar if the selected ball is W, and −1 dollar if it is B. The gain  of this game is  That is,  if an R ball is selected. We repeat this game, but after time i, once the random variable  is realized and a new game starts, the banker can change the number of B balls  within the fixed range  without telling us. Now if we sell a contract  based on the i-th output , then, taking the worst case into account, the robust expectation is  So the sequence  is identically distributed. It can also be proved that  is independent of . Generally speaking, if  denotes a path-dependent loss function, then the robust expected loss is
(iii) If the sample  consists of actual data related to the humanities, management, economics or finance, it is often not guaranteed to satisfy the IID condition of classical probability theory. Example A1, however, shows that most actual data can satisfy, at least approximately, the IID condition under $\mathbb{E}$ given by Definition A9. Moreover, the test function φ has different meanings in different situations: for example,  could be a financial contract based on X, a put option , a consumption function, a profit-and-loss function, a cost function in an optimal control system, etc. When dealing with theoretical problems about sub-linear expectations, we only need to calculate the sub-linear expectation  corresponding to the class of functions φ that we care about (see Peng [27], Subsections 3.2 and 3.3, for more details).
Proposition A2. Given the maximum expectation $\mathbb{E}$, let X and Y be two random variables under $\mathbb{E}$, let  be the family of probability distributions of X corresponding to the family of probability measures , and let  be the family of probability distributions of Y corresponding to the family of probability measures . For every given , there exists a family of probability measures  satisfying:
(i)  and  Specifically, if Y is independent of X under $\mathbb{E}$, or X is independent of Y under $\mathbb{E}$, we have  and
(ii) Suppose that Y is independent of X under $\mathbb{E}$; then the lower distribution and the upper distribution of  are  and , respectively. Suppose that X is independent of Y under $\mathbb{E}$; then the lower distribution and the upper distribution of  are  and , respectively.
Proof. (i) For (A1), for every given , we obtain  by Definition A7 and Remark A2. Similarly, (A2) follows from the fact
Next, we show (A3). Since Y is independent of X under $\mathbb{E}$, we have  by the positive homogeneity of $\mathbb{E}$. Since X is independent of Y under $\mathbb{E}$, it follows that  Hence, (A3) is proved.
Finally, we consider (A4). Since Y is independent of X under $\mathbb{E}$, we have  If X is independent of Y under $\mathbb{E}$, we can prove (A4) in a similar manner. Thus, (i) is proved.
(ii) Without loss of generality, we only show (A5) and (A6). Since Y is independent of X under $\mathbb{E}$, we obtain  Hence, (A5) is proved. Since X is independent of Y under $\mathbb{E}$, we obtain  Hence, (A6) is proved. □
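The proof above uses the order of the iterated suprema in an essential way. The following self-contained sketch (hypothetical two-point laws of our own, test function φ(x, y) = xy) shows numerically that swapping which variable is independent of which can change the value, so independence under a sub-linear expectation is not a symmetric relation, as noted in Remark A3 (i):

```python
import numpy as np

support = np.array([-1.0, 1.0])                        # common support of X and Y
laws_X = [np.array([0.5, 0.5]), np.array([0.3, 0.7])]  # hypothetical laws of X
laws_Y = [np.array([0.6, 0.4]), np.array([0.4, 0.6])]  # hypothetical laws of Y

def E_sup(laws, values):
    return max(p @ values for p in laws)

phi = lambda x, y: x * y

# "Y independent of X": inner supremum over the law of Y, x frozen.
E_Y_indep_X = E_sup(laws_X, np.array([E_sup(laws_Y, phi(x, support)) for x in support]))
# "X independent of Y": inner supremum over the law of X, y frozen.
E_X_indep_Y = E_sup(laws_Y, np.array([E_sup(laws_X, phi(support, y)) for y in support]))

print(E_Y_indep_X, E_X_indep_Y)  # 0.2 vs 0.24: the order matters
```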
Definition A10 ([28]) (Maximal distribution). If $\hat{\mathbb{E}}[\varphi(\eta)] = \sup_{\underline{\mu} \le y \le \overline{\mu}} \varphi(y)$ for $\varphi \in C_{l,Lip}(\mathbb{R})$, where $\overline{\mu} := \hat{\mathbb{E}}[\eta]$ and $\underline{\mu} := -\hat{\mathbb{E}}[-\eta]$, then the random variable η on a sub-linear expectation space $(\Omega, \mathcal{H}, \hat{\mathbb{E}})$ is called maximally distributed.
Definition A11 ([28]) (G-normal distribution). If for every given $\varphi \in C_{l,Lip}(\mathbb{R})$, writing $u(t, x) := \hat{\mathbb{E}}[\varphi(x + \sqrt{t}\,\eta)]$, $(t, x) \in [0, \infty) \times \mathbb{R}$, u is the viscosity solution of the partial differential equation (PDE)
$$\partial_t u - G(\partial_{xx}^2 u) = 0, \quad u(0, x) = \varphi(x),$$
where $G(a) := \frac{1}{2}(\overline{\sigma}^2 a^+ - \underline{\sigma}^2 a^-)$ and $\overline{\sigma}^2 := \hat{\mathbb{E}}[\eta^2]$, $\underline{\sigma}^2 := -\hat{\mathbb{E}}[-\eta^2]$, then the random variable η on a sub-linear expectation space $(\Omega, \mathcal{H}, \hat{\mathbb{E}})$ with $\hat{\mathbb{E}}[\eta] = -\hat{\mathbb{E}}[-\eta] = 0$ is said to be G-normally distributed, denoted $\eta \sim N(0, [\underline{\sigma}^2, \overline{\sigma}^2])$.
Theorem A2 ([24]) (Kolmogorov strong LLN under sub-linear expectation). Assume that $\{X_n\}_{n=1}^{\infty}$ is a sequence of IID random variables under $\mathbb{E}$, and $\mathbb{E}[|X_1|^{1+\alpha}] < \infty$ for some $\alpha > 0$. Let $\overline{\mu} := \mathbb{E}[X_1]$, $\underline{\mu} := \mathcal{E}[X_1]$, $S_n := \sum_{i=1}^{n} X_i$, and let v and $\mathbb{V}$ be the lower and upper probabilities, respectively. Then
$$v\Big(\underline{\mu} \le \liminf_{n \to \infty} \frac{S_n}{n} \le \limsup_{n \to \infty} \frac{S_n}{n} \le \overline{\mu}\Big) = 1.$$
If $\mathbb{V}$ is upper continuous, then
$$\mathbb{V}\Big(\limsup_{n \to \infty} \frac{S_n}{n} = \overline{\mu}\Big) = 1 \quad\text{and}\quad \mathbb{V}\Big(\liminf_{n \to \infty} \frac{S_n}{n} = \underline{\mu}\Big) = 1.$$
Definition A12 (Empirical distribution function). Assume that $\{X_n\}_{n=1}^{\infty}$ is a sequence of IID random samples from a family of ambiguous probability distributions  under the sub-linear expectation $\mathbb{E}$. Then
$$F_n(x) := \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i \le x\}}, \quad x \in \mathbb{R},$$
is called the empirical distribution function.
Proposition A3. Assume that $\{X_n\}_{n=1}^{\infty}$ is a sequence of IID random samples from a family of ambiguous probability distributions  under the sub-linear expectation $\mathbb{E}$. Set $F^*(x) := \mathbb{V}(X_1 \le x)$, $F_*(x) := v(X_1 \le x)$. Then for every given $x \in \mathbb{R}$,
$$\mathbb{V}\Big(\limsup_{n \to \infty} F_n(x) = F^*(x)\Big) = 1 \tag{A7}$$
and
$$\mathbb{V}\Big(\liminf_{n \to \infty} F_n(x) = F_*(x)\Big) = 1. \tag{A8}$$
Proof. For any given $x \in \mathbb{R}$, set $Y_i := I_{\{X_i \le x\}}$, $i \ge 1$. Since $\{X_n\}_{n=1}^{\infty}$ is a sequence of IID random samples under $\mathbb{E}$, $\{Y_n\}_{n=1}^{\infty}$ is also a sequence of IID random samples under $\mathbb{E}$. It can be shown that $\{Y_n\}_{n=1}^{\infty}$ has means $\overline{\mu} = \mathbb{E}[Y_1] = \mathbb{V}(X_1 \le x) = F^*(x)$ and $\underline{\mu} = \mathcal{E}[Y_1] = v(X_1 \le x) = F_*(x)$, and $\mathbb{E}[|Y_1|^{1+\alpha}] \le 1 < \infty$ for any given $\alpha > 0$. Take $X_n = Y_n$ in Theorem A2; then, noting that $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} Y_i$, for any $x \in \mathbb{R}$, (A7) and (A8) hold. □
Remark A4. In fact, the empirical distribution function under $\mathbb{V}$ of a set of observed data may fail to converge. When it converges, we are in the setting of classical probability; when it does not converge, we are in the scenario hypothesized here. Consider Example A1 again: the case where the empirical distribution function under $\mathbb{V}$ does not converge is exactly the situation we are concerned with. In that case, the empirical distribution function still has upper and lower limits, where the upper limit is given by the upper probability and the lower limit by the lower probability; this is exactly the content of Proposition A3. By Proposition A3, we can determine the boundary of the values of  in Example A1, and hence identify the range of values for  in Example A1.
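A simulation of this scenario (all numbers below are placeholders of our own): the banker switches $P(X_i = 1)$ between the endpoints of a fixed range on blocks of geometrically growing length, so $F_n(0)$ keeps oscillating instead of converging; its running extremes spread toward the envelope $[F_*(0), F^*(0)]$ of Proposition A3.

```python
import random

P_LO, P_HI = 0.3, 0.7   # hypothetical fixed range for P(X_i = 1)
N = 200_000

# Adversarial banker: p switches between the endpoints of the range on
# ever-longer blocks, so the empirical frequency cannot settle down.
xs, p, block, next_switch = [], P_HI, 1, 10
for i in range(1, N + 1):
    if i == next_switch:
        p = P_LO if p == P_HI else P_HI
        block += 1
        next_switch = 10 ** block
    xs.append(1 if random.random() < p else 0)

zeros, lim_inf, lim_sup = 0, 1.0, 0.0
for n, x in enumerate(xs, start=1):
    zeros += (x == 0)
    if n >= 1000:                 # skip the initial transient
        f_n = zeros / n           # F_n(0): empirical frequency of {X_i <= 0}
        lim_inf, lim_sup = min(lim_inf, f_n), max(lim_sup, f_n)

# The running extremes widen toward [1 - P_HI, 1 - P_LO] = [0.3, 0.7]
# as N grows, matching F_*(0) and F^*(0) in Proposition A3.
print(f"observed range of F_n(0): [{lim_inf:.3f}, {lim_sup:.3f}]")
```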
Definition A13. Suppose that $(\Omega, \mathcal{F})$ is a given measurable space, and V is a non-additive probability. If $V\big(\{\omega : \lim_{n \to \infty} \eta_n(\omega) = \eta(\omega)\}\big) = 1$, then the random variable sequence $\{\eta_n\}_{n=1}^{\infty}$ converges almost surely (a.s.) to η with respect to V, denoted $\eta_n \to \eta$, a.s.
Definition A14. Suppose that $(\Omega, \mathcal{F})$ is a given measurable space, and V is a non-additive probability. If for every given $\delta > 0$ there exists $A \in \mathcal{F}$ with $V(A) \le \delta$ such that $\eta_n$ converges uniformly to η on $A^c$ as $n \to \infty$, then the random variable sequence $\{\eta_n\}_{n=1}^{\infty}$ converges uniformly to η, a.s.
Theorem A3 (Egoroff's Theorem under sub-linear expectation). Suppose that $(\Omega, \mathcal{F})$ is a given measurable space, and v and $\mathbb{V}$ are the lower and upper probabilities, respectively, with $\mathbb{V}$ upper continuous. If the random variable sequence $\{\eta_n\}_{n=1}^{\infty}$ converges to η, a.s., with respect to v, then $\{\eta_n\}_{n=1}^{\infty}$ converges uniformly to η, a.s., with respect to v; i.e., for any given $\delta > 0$, there exists $A \in \mathcal{F}$ with $\mathbb{V}(A) \le \delta$ such that $\eta_n$ converges uniformly to η on $A^c$ as $n \to \infty$.
Proof. Let D denote the set of those points $\omega \in \Omega$ at which $\eta_n(\omega)$ does not converge to $\eta(\omega)$. Then
$$D = \bigcup_{m=1}^{\infty} \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} \Big\{\omega : |\eta_n(\omega) - \eta(\omega)| \ge \frac{1}{m}\Big\}.$$
Since $\{\eta_n\}_{n=1}^{\infty}$ converges to η, a.s., with respect to v, from $v(D^c) = 1$ we have $\mathbb{V}(D) = 0$. Then, for every given $m \ge 1$, setting
$$E_N^m := \bigcup_{n=N}^{\infty} \Big\{\omega : |\eta_n(\omega) - \eta(\omega)| \ge \frac{1}{m}\Big\},$$
it is clear that
$$E_N^m \downarrow \bigcap_{N=1}^{\infty} E_N^m \subseteq D, \quad \text{as } N \to \infty.$$
So, from the condition that $\mathbb{V}$ is upper continuous, it can be obtained that
$$\mathbb{V}(E_N^m) \downarrow \mathbb{V}\Big(\bigcap_{N=1}^{\infty} E_N^m\Big) = 0, \quad \text{as } N \to \infty.$$
So, for every given $\delta > 0$ and every $m \ge 1$, there exists $N_m$ such that
$$\mathbb{V}\big(E_{N_m}^m\big) \le \frac{\delta}{2^m}.$$
Take $A := \bigcup_{m=1}^{\infty} E_{N_m}^m$; then, by the sub-additivity of $\mathbb{V}$,
$$\mathbb{V}(A) \le \sum_{m=1}^{\infty} \mathbb{V}\big(E_{N_m}^m\big) \le \sum_{m=1}^{\infty} \frac{\delta}{2^m} = \delta.$$
Since $A^c = \bigcap_{m=1}^{\infty} (E_{N_m}^m)^c$, for every $\omega \in A^c$ and every given $m \ge 1$ we have $|\eta_n(\omega) - \eta(\omega)| < \frac{1}{m}$ for all $n \ge N_m$. Thus, $\{\eta_n\}_{n=1}^{\infty}$ converges uniformly to η on $A^c$, a.s., with respect to v. □
Theorem A4. Suppose that  is a sequence of IID random variables under $\mathbb{E}$, , and , for any  and some . For any , denote ,  and , . Define  and
(a) For every given  and  satisfying , , there exists  such that for every ,
(b) For every given  and  satisfying , there exists  such that for every ,
Proof. Since the proofs of (a) and (b) are similar, we only show (a) here.
(a) Since  is a sequence of IID random variables under $\mathbb{E}$, for any ,  is a sequence of IID random variables under $\mathbb{E}$. Hence, by Theorem A2, we have (A9). For every given  and  such that , denote  and  Obviously, , as , and by (A9), it follows that  Thus, from Theorem A3, there exist  and  such that ,  and, for any ,
By (A10), for all  and , we have  This implies that
□