Proof about Weakly Universal State Contracting Networks

Recurrent networks whose transfer functions are Lipschitz continuous with L=1 may be echo state networks if certain limitations on the recurrent connectivity are applied. It was initially shown that it is sufficient for the largest singular value $S$ of the recurrent connectivity to be smaller than 1. Here it is investigated under which conditions the network can still be shown to be an echo state network even if $S=1$.


Introduction
Jäger [Jäger, 2001, 2010, 2003] has shown under which circumstances recurrent connectivity results (1) in internal states that are independent of the initial states for sufficiently long input histories, so that the state of the network is finally determined completely by the input. On the other hand, Jäger also determined a set of conditions (2) under which the network behaves chaotically and thus is sensitive to the initial conditions in the internal layer. These conditions are called Echo State (ES) conditions. However, Jäger's mathematical proof leaves a gap of undetermined configurations for which it is still open whether the network belongs to either of the two groups. As outlined in sect. 4, the initial work of Jäger proves the validity of the ES conditions, in terms of topology, for an open set of conditions. The subject of this work is to check under which conditions the border of the open set (i.e. the critical point) belongs to the area where the ES condition is fulfilled. As it turns out below, the behavior at the border depends on the choice of the transfer function; the ES conditions are fulfilled only for certain transfer functions.

In the following section the definitions for echo state networks are reiterated. Then two different transfer functions are presented that are used in the proof. Sect. 4 describes the result of the initial proof by Jäger. The following 3 sections outline the extended proof. Finally, the results are discussed.
Figure 1: ESN scheme: The initial ESN approach chooses input and recurrent connectivity randomly, although it has to obey the echo state condition. Learning is only applied to the output layer.

Model
The model is based on Jäger's ESN approach [Jäger, 2001, 2010, 2003]. It consists of an input layer, a recurrent layer and possibly an output layer (though the latter is not explicitly introduced in the scope of this paper). ESNs are composed of rate-based neurons with real-valued transfer functions; the update rule is:

$$x_{lin,t+1} = W x_t + w^{in} u_t, \qquad (1)$$

$$x_{t+1} = \theta(x_{lin,t+1}), \qquad (2)$$

$$o_t = w^{out} x_t, \qquad (3)$$

where the vectors $u_t$, $x_t$ and $o_t$ are the input, the neurons of the hidden layer and the neurons of the output layer, respectively, and $w^{in}$, $W$ and $w^{out}$ are the matrices of the respective synaptic weight factors. As a convention, the transfer function $\theta(\cdot)$ is continuous, differentiable and monotonically increasing with the limit $1 \ge \theta'(\cdot) \ge 0$, which is compatible with the requirement that $\theta(\cdot)$ fulfills the Lipschitz continuity with L = 1. Jäger's approach uses random matrices for $W$ and $w^{in}$; learning is restricted to the output layer $w^{out}$ (see fig. 1). The learning (i.e. training $o_t$) can be performed by linear regression. Since the learning process with regard to some output is not of interest in the scope of this paper, this part of the approach is not outlined here.
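The state update described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not part of the original text: Gaussian random weights, the helper names `make_reservoir` and `step`, and a rescaling of W so that its largest singular value is exactly 1 (the critical case discussed in this paper).

```python
import numpy as np

def make_reservoir(n_hidden, n_in, seed=0):
    """Draw random weights (an assumed Gaussian choice) and rescale W."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hidden, n_hidden))
    w_in = rng.standard_normal((n_hidden, n_in))
    # The matrix 2-norm is the largest singular value of W,
    # so after this division the largest singular value is 1.
    W = W / np.linalg.norm(W, 2)
    return W, w_in

def step(W, w_in, x, u, theta=np.tanh):
    """One hidden-layer update: x_{t+1} = theta(W x_t + w_in u_t)."""
    return theta(W @ x + w_in @ u)

W, w_in = make_reservoir(n_hidden=20, n_in=3)
x = np.zeros(20)
inputs = np.random.default_rng(1).uniform(-1.0, 1.0, size=(50, 3))
for u in inputs:
    x = step(W, w_in, x, u)

largest_sv = np.linalg.svd(W, compute_uv=False)[0]
```

The output layer is left out on purpose, since training is outside the scope of the text.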

Transfer functions
The following proof considers two different transfer functions,

$$\theta(x) = \tanh(x) \qquad (4)$$

and

$$\theta(x) = 0.5x - 0.25 \sin(2x). \qquad (5)$$

Note that in both cases the maximal derivative $\theta'(x)$ is 1.
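The statement about the maximal derivative can be checked numerically. The sketch below assumes the two transfer functions are the hyperbolic tangent and $0.5x - 0.25\sin(2x)$; the function names and the grid are illustrative choices. The closed-form derivatives are $1 - \tanh^2(x)$ and $\sin^2(x)$, both of which lie between 0 and 1.

```python
import numpy as np

def theta_sin(x):
    """Second transfer function: 0.5 x - 0.25 sin(2 x)."""
    return 0.5 * x - 0.25 * np.sin(2.0 * x)

def dtheta_tanh(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

def dtheta_sin(x):
    """Derivative of theta_sin: 0.5 - 0.5 cos(2 x) = sin(x)^2."""
    return np.sin(x) ** 2

xs = np.linspace(-10.0, 10.0, 200001)
max_tanh = dtheta_tanh(xs).max()
max_sin = dtheta_sin(xs).max()
min_tanh = dtheta_tanh(xs).min()
min_sin = dtheta_sin(xs).min()

# Cross-check the closed-form derivative against central differences
# (interior grid points only, where np.gradient is second-order accurate).
numeric = np.gradient(theta_sin(xs), xs)[1:-1]
max_dev = np.abs(numeric - dtheta_sin(xs)[1:-1]).max()
```

Both derivatives stay within [0, 1] on the grid, which matches the convention that the transfer function is monotonically increasing and Lipschitz continuous with L = 1.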

Echo-state condition
A necessary condition for the performance of an ESN is that the echo state condition is fulfilled. Consider a time-discrete recursive function

$$x_{t+1} = F(x_t, u_t), \qquad (6)$$

where the $x_t$ are interpreted as internal states and the $u_t$ form some external input sequence, i.e. the stimulus. The echo state condition is defined as follows. Assume an infinite stimulus sequence $\bar{u}^\infty = u_0, u_1, \ldots$ and two random initial internal states of the system, $x_0$ and $y_0$. To the initial states $x_0$ and $y_0$, the sequences $\bar{x}^\infty = x_0, x_1, \ldots$ and $\bar{y}^\infty = y_0, y_1, \ldots$ can be assigned. Then the system $F(\cdot)$ is called universally state contracting if, independently of the set $u_t$, for any $(x_0, y_0)$ and all real values $\epsilon > 0$, there exists an iteration $\tau(\epsilon)$ for which

$$d(x_t, y_t) \le \epsilon \qquad (7)$$

for all $t \ge \tau$. Jäger showed ([Jäger, 2001, 2010], p. 43) that the echo state condition is fulfilled if and only if the network is state contracting. The ESN is designed to be universally state contracting and thus to fulfill the echo state condition.
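The definition can be illustrated by driving two copies of the same network, started from different states, with an identical stimulus and recording their distance. The following is a numerical sketch, not a proof; the network size, the uniform random stimulus, the tanh transfer function and the Euclidean distance are assumptions made for the example. W is rescaled so that its largest singular value equals 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_in, n_steps = 10, 2, 2000

W = rng.standard_normal((n_hidden, n_hidden))
W = W / np.linalg.norm(W, 2)          # largest singular value becomes 1
w_in = rng.standard_normal((n_hidden, n_in))

# Two different initial states x_0 and y_0.
x = rng.uniform(-1.0, 1.0, n_hidden)
y = rng.uniform(-1.0, 1.0, n_hidden)

distances = [np.linalg.norm(y - x)]
for _ in range(n_steps):
    u = rng.uniform(-1.0, 1.0, n_in)  # the same stimulus for both copies
    x = np.tanh(W @ x + w_in @ u)
    y = np.tanh(W @ y + w_in @ u)
    distances.append(np.linalg.norm(y - x))
distances = np.array(distances)
```

In the Euclidean norm the recorded distance cannot grow from one step to the next, because tanh is Lipschitz continuous with L = 1 and multiplication by W cannot stretch a vector by more than the largest singular value. Whether the distance actually falls below every $\epsilon$ is exactly the question the echo state condition asks.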

Echo state condition limit with weak contraction
Jäger's sufficient echo state condition (see [Jäger, 2001, 2010], App. C, p. 41) has been strictly proven only for non-critical systems (largest singular value $\Lambda < 1$) and with $\tanh(\cdot)$ as the transfer function. The original proof is based on the fact that tanh in combination with $\Lambda < 1$ is a contraction; in that case Jäger shows an exponential convergence. That proof does not cover the critical case that is presented in this work. However, the initial theorem can be extended under the following circumstances.

Theorem: If the hyperbolic tangent or the function of eq. 5 is used as the transfer function and all singular values are larger than zero, the echo state condition (see eq. 7) is fulfilled even if $\Lambda = 1$.

Summary of the proof: As an important precondition, the proof requires that both transfer functions fulfill the requirement of eq. 8, which is stated in terms of a test function $\varphi(z)$ (eq. 9). Here $1 > \gamma, \eta > 0$ and $\kappa \ge 1$ are parameters that have to be determined separately for each transfer function, and also for each metric norm $d(\cdot,\cdot) = \|\cdot\|_p$ (see footnote 1). In sect. 6 it is shown that both transfer functions indeed fulfill that requirement. It then remains to prove that, in the slowest case, the convergence is a process with 2 stages. In the first stage, as long as $d(y_t, x_t) > \gamma$, the convergence is faster than or equal to an exponential decay. In the second stage, the convergence is faster than or equal to a power law decay.

Proof: In analogy to Jäger, one can now use the test function $\varphi$ to check the contraction between the time steps $t$ and $t+1$:

$$d(y_{t+1}, x_{t+1}) \le d(y_{lin,t+1}, x_{lin,t+1}) \times \varphi(d(y_{lin,t+1}, x_{lin,t+1})). \qquad (10)$$

Here the subscript lin denotes the linear response before the transfer function is applied, i.e. $x_{lin,t+1} = W x_t + I$ with $I = w^{in} u_t$. One can rewrite

$$d(y_{lin,t+1}, x_{lin,t+1}) = \|(W y_t + I) - (W x_t + I)\| = \|W(y_t - x_t)\|. \qquad (11)$$

Thus one gets

$$d(y_{lin,t+1}, x_{lin,t+1}) \times \varphi(d(y_{lin,t+1}, x_{lin,t+1})) = \|W(y_t - x_t)\| \times \varphi(\|W(y_t - x_t)\|) \le \Lambda d(y_t, x_t) \times \varphi(\lambda d(y_t, x_t)), \qquad (12)$$

where $\Lambda$ and $\lambda$ are the largest and smallest singular values of $W$, respectively. Merging eq. 10 and eq. 12 results in the inequality

$$d(y_{t+1}, x_{t+1}) \le \Lambda d(y_t, x_t) \times \varphi(\lambda d(y_t, x_t)). \qquad (13)$$
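The step from the norm of $W(y - x)$ to the bounds in terms of $\Lambda$ and $\lambda$ rests on a standard property of singular values: in the Euclidean norm, $\lambda \|v\| \le \|Wv\| \le \Lambda \|v\|$ for every vector $v$. A short numerical sanity check is sketched below; the matrix size, the Gaussian random draw and the variable names are assumptions made for the example, and other p-norms would need rescaled constants.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W = W / np.linalg.norm(W, 2)               # critical case: largest singular value 1

singular_values = np.linalg.svd(W, compute_uv=False)  # sorted, descending
Lam, lam = singular_values[0], singular_values[-1]

# Each column of v plays the role of a difference vector y_t - x_t.
v = rng.standard_normal((8, 1000))
norm_v = np.linalg.norm(v, axis=0)
norm_Wv = np.linalg.norm(W @ v, axis=0)

upper_ok = np.all(norm_Wv <= Lam * norm_v + 1e-12)
lower_ok = np.all(norm_Wv >= lam * norm_v - 1e-12)
```

The upper bound limits how much a distance can grow under W, while the lower bound, together with a test function that decreases in its argument, controls how small the argument of the test function can become.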
First, assuming $\Lambda < 1$, one gets an exponential decay. This case is handled by Jäger's initial proof. With regard to an upper limit of the contraction speed (cf. eq. 7), one can find a corresponding $\tau(\epsilon)$. If the largest singular value $\Lambda > 1$, then the largest absolute eigenvalue is also larger than 1 due to the spectral theorem (which applies to normal matrices $W$). Thus, the echo state condition is not fulfilled due to Jäger's first and second condition (C1 & C2).

What remains is to check the critical case $\Lambda = 1$. Here again one considers 2 cases. If $d(y_t, x_t) > \gamma$, the update inequality of eq. 13 yields the first stage of the convergence: for all $\epsilon \ge \gamma$, the slowest decay process can be covered by an exponential decay. If $\epsilon < \gamma$, eq. 13 leads to the second stage. One can now consider a sequence $a_t$ (eq. 19) that fulfills $a_t \ge d(y_t, x_t)$ for all $t$ if one sets $a_0 = d(y_0, x_0) \le \gamma$. Unfortunately, it seems hard to find an explicit form for the sequence of eq. 19. Fortunately, another sequence, $a^*_t$, in turn covers the sequence $a_t$ (cf. sect. 7). From the convergence of this covering sequence, one can conclude the convergence of $d(x_t, y_t)$. From its solution one can calculate the upper limit $\tau(\epsilon)$ for $\epsilon \le \gamma$.

Footnote 1: Since p-norms are equivalent, once a valid set of parameters $\eta_{p_1}$ and $\gamma_{p_1}$ has been found for one p-norm $p_1$, a new set for another norm $p_2$ is obtained by simply rescaling $\eta_{p_2} = a\eta_{p_1}$, $\gamma_{p_2} = a\gamma_{p_1}$, where $a > 0$ is a finite scaling factor.

Weak contraction with the present transfer function
In this section a test function according to eq. 9, i.e. a set of parameters $\kappa$, $\gamma$ and $\eta$, is verified for the function of eq. 5 and for the hyperbolic tangent. Within this section a specific set of values is tested for both transfer functions and the Manhattan norm ($\|\cdot\|_1$). In order to derive these values, one can start by considering the linear responses $\|y_{lin,t} - x_{lin,t}\|$ and the final value $\|y_t - x_t\|$ within one single neuron. Inserting the corresponding definitions into eq. 8, for a single neuron and for any $\zeta$, gives a condition that depends on $\zeta$. Thus, it suffices to consider the maximum over $\zeta$ (eq. 26), which can be found by basic analysis.
Extremal points can be found where the derivative with respect to $\zeta$ vanishes. Since the derivative $\theta'$ is an even function for both suggested transfer functions, one gets $\zeta = -z/2$ as the extremal point. Basic analysis shows that this point is a maximum in both cases, so eq. 26 can be evaluated at $\zeta = -z/2$. The case of $z \le \gamma$ can be analyzed by Taylor series expansion, using the Taylor series expansion $T_4(f, x_0, x)$ of a function $f$ with a fourth-order upper limit estimate as its right-hand addend.

Sequence $a^*_t$ covers sequence $a_t$
In the following, the same definitions as in sect. 5 are used. One can start from the statement $-\frac{\lambda\eta}{\kappa} \ge -\lambda\eta$.
Since $\lambda \ge 0$, $\eta \ge 0$ and $\kappa \ge 1$, it can easily be seen that the inequality is fulfilled for any combination of $\lambda$, $\eta$ and $\kappa$. One can now extend the numerator and denominator of the right side by $(\frac{\lambda\eta}{\kappa} t + C)$ and also add $(\frac{\lambda\eta}{\kappa} t + C)$ to both sides of the inequality, where $C$ keeps the same definition here and in the following. After rearranging, one can add $\frac{\lambda\eta}{\kappa}$ to both sides, which gives inequality eq. 45. Using two further relations, one can rewrite eq. 45 and finally arrive at the desired covering relation between $a^*_t$ and $a_t$.

Acknowledgements
The NSC (National Science Council) of Taiwan provided the budget for our projects. Thanks also go to AIM-HI for various kinds of support. N. M. M. thanks Chris Shane for his cross-reading.