Article

Analysis of Recurrent Neural Network and Predictions

Jieun Park, Dokkyun Yi and Sangmin Ji
1 Seongsan Liberal Arts College, Daegu University, Kyungsan 38453, Korea
2 Department of Mathematics, College of Natural Sciences, Chungnam National University, Daejeon 34134, Korea
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(4), 615; https://doi.org/10.3390/sym12040615
Submission received: 14 March 2020 / Revised: 31 March 2020 / Accepted: 1 April 2020 / Published: 13 April 2020
(This article belongs to the Special Issue Discrete Mathematics and Symmetry)

Abstract: This paper analyzes the operating principle and the predicted values of the recurrent neural network (RNN), the most basic neural-network structure for handling data that change over time in various types of artificial intelligence (AI). In particular, an RNN in which all connections are symmetric is guaranteed to converge. The RNN operates by forming linear combinations of the data and composing them with a nonlinear activation function. The linear combination of data is similar to the autoregressive moving average (ARMA) method of statistical processing. However, the distortion caused by the nonlinear activation function in the RNN makes its predicted value differ from the ARMA prediction. From this analysis, we obtain the limit of the predicted value of an RNN and the range over which the prediction changes according to the learning data. In addition to mathematical proofs, numerical experiments confirm our claims.

1. Introduction

Artificial intelligence (AI) embodied in machines is entering our daily lives. In the near future, machines will take over jobs in a variety of fields, from driverless cars becoming commonplace to personal-routine assistants, automatic response system (ARS) counsellors, and bank clerks. In the age of machines, it is natural to let machines do such work [1,2,3,4,5], which makes it important to understand the operating principle of a machine and the direction of its predictions. In this paper, we analyze the principles of operation and prediction of the recurrent neural network (RNN) [6,7,8].
An RNN is an AI methodology that processes incoming data in time order; it learns temporal changes and predicts them. This predictive capability comes from the recurrent structure, which produces results similar to those of time-series methods in general statistical processing [9,10,11,12]. The predicted value of a time series is calculated from the general term of its recurrence relation. The RNN calculation is very similar to that of a time series, but the activation function in a neural-network (NN) structure is nonlinear, so nonlinear effects appear in the prediction. For this reason, it is very difficult to find the predicted value of an RNN in closed form. Nevertheless, owing to the advantages of the recurrent structure and the development of artificial-neural-network (ANN) training methods, the accuracy of predicted values keeps improving. This has led to further development of, and greater demand for, ANNs based on RNNs. For example, long short-term memory (LSTM), gated recurrent units (GRUs), and R-RNNs [13,14,15,16] all start from an RNN and are used in various fields. In other words, RNN-based neural networks are used to learn temporal changes and to make the corresponding predictions.
Few papers attempt to interpret the recurrent structure itself, and results are also lacking. The recurrent structure computes an expected value by being applied iteratively as data arrive in time order; that is, it predicts future values from past data. When a future value is unknown, it is natural to use the information one does know to predict it. Logical methods of this kind include the time-series methods of statistical processing as well as numerical methods, and the RNN structure closely resembles a combination of the two. The autoregressive moving average (ARMA) model in time-series analysis predicts future values through a recurrence relation built from a linear combination of historical data; more details can be found in [17,18]. Taylor expanding the RNN under certain constraints likewise yields a linear combination of historical data, as in the time series; more details are given in the text. From these results, this paper describes the range of the predicted value of an RNN.
This paper is organized as follows. Section 2 introduces and analyzes the RNN and relates it to existing methods. Section 3 explains the change of the predicted value through the RNN. Section 4 confirms our claims through numerical experiments.

2. RNN and ARMA Relationship

In this section, we explain how an RNN works by interpreting its structure. In particular, the RNN is based on the ARMA format in statistical processing; more details can be found in [19,20,21]. This is explained through the following process.

2.1. RNN

In this section, we describe the basic RNN among the various modified RNNs; for convenience, RNN refers to this basic form. The RNN that we deal with is
$$y_t = w_1 h_t + b_y,$$
where $t$ represents time, $y_t$ is the predicted value, $w_1$ is a real value, and $h_t$ is the hidden layer. The hidden layer is computed by
$$h_t = \tanh(w_2 x_t + w_3 h_{t-1} + b_h),$$
where $x_t$ is the input data, $w_2$ and $w_3$ are real values, and $h_{t-1}$ is the previous hidden layer. For machine learning, let $LS$ be the set of learning data and let $\kappa > 2$ be its size. In other words, when the first time of the learning data is 1, we can write $LS = \{x_1, x_2, \ldots, x_\kappa\}$. Assuming that the initial condition of the hidden layer is zero ($h_0 = 0$), we can compute $y_t$ for each time $t$. Since $x_t$ is the datum at time $t$ and $y_t$ is a predicted value, we want $y_t = x_{t+1}$ to hold. Because this equality does not hold in general, an error occurs between $y_t$ and $x_{t+1}$. So, let $E_t = (y_t - x_{t+1})^2$ and $E = \sum_{t=1}^{\kappa-1} E_t$. Machine learning based on the RNN is therefore the process of finding $w_1$, $w_2$, and $w_3$ that minimize the error value $E$. We used $x_1, x_2, \ldots, x_{\kappa-1}$ in the learning data $LS$ to find $w_1$, $w_2$, and $w_3$ that minimize the error $E$, and used them to predict the values ($y_\kappa, y_{\kappa+1}, \ldots$) after time $\kappa$. More details can be found in [22,23,24,25].
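For concreteness, the following NumPy sketch (our own illustration, not the authors' code; the parameter values are arbitrary) evaluates the scalar RNN above and the squared-error objective $E$:

```python
import numpy as np

def rnn_forward(x, w1, w2, w3, b_y, b_h, h0=0.0):
    """Scalar RNN: h_t = tanh(w2*x_t + w3*h_{t-1} + b_h), y_t = w1*h_t + b_y."""
    h, ys = h0, []
    for x_t in x:
        h = np.tanh(w2 * x_t + w3 * h + b_h)
        ys.append(w1 * h + b_y)
    return np.array(ys), h          # predictions and the last hidden state

def error(x, w1, w2, w3, b_y, b_h):
    """E = sum_t (y_t - x_{t+1})^2 over the learning data."""
    ys, _ = rnn_forward(x[:-1], w1, w2, w3, b_y, b_h)
    return float(np.sum((ys - x[1:]) ** 2))

# arbitrary (untrained) parameters on a toy learning set
x = np.array([0.0, 0.12, 0.23, 0.38, 0.5])
print(error(x, w1=0.5, w2=0.5, w3=0.1, b_y=0.0, b_h=0.0))
```

In practice, $w_1$, $w_2$, $w_3$, $b_y$, and $b_h$ would then be adjusted (for example by gradient descent on $E$, as in [23]) rather than fixed by hand.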

2.2. ARMA in Time Series

People have long wanted to predict stocks. This requires predictions from historical stock data, and various methods have been studied and utilized. The most widely and commonly used is the ARMA method, which was developed on the basis of statistics. This method simply forms a linear combination of historical data for the value to be predicted and computes the prediction on that basis:
$$\hat{x}_{\kappa+1} = C_0 x_\kappa + C_1 x_{\kappa-1} + C_2 x_{\kappa-2} + \cdots + C_\kappa x_0 + C,$$
where $x_0, \ldots, x_\kappa$ are the given data, and we can calculate the predicted value $\hat{x}_{\kappa+1}$ by determining the coefficients $C_0, \ldots, C_\kappa$ and $C$. There are various ways to obtain these coefficients, such as numerical optimization over the data values, Yule–Walker estimation, and correlation calculations. This equation is used to predict future values through the calculation of the general term of the recurrence relation. More details can be found in [17].
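As an illustration of one of the estimation routes mentioned above (a minimal least-squares sketch of our own, not the authors' implementation), the coefficients of an AR($p$) model can be estimated as follows:

```python
import numpy as np

def fit_ar(x, p):
    """Estimate AR(p) coefficients C_0..C_{p-1} and intercept C by least squares.
    Model: x_{t+1} ~ C_0*x_t + C_1*x_{t-1} + ... + C_{p-1}*x_{t-p+1} + C."""
    rows, targets = [], []
    for t in range(p - 1, len(x) - 1):
        rows.append(np.concatenate([x[t - p + 1:t + 1][::-1], [1.0]]))
        targets.append(x[t + 1])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coeffs[:-1], coeffs[-1]   # (C_0..C_{p-1}), intercept C

def predict_next(x, C, c0):
    """One-step prediction of x_{kappa+1} from the last p observations."""
    p = len(C)
    return float(np.dot(C, x[-1:-p - 1:-1]) + c0)

# toy usage on a short series
x = np.array([0.0, 0.1, 0.22, 0.35, 0.46, 0.55, 0.63])
C, c0 = fit_ar(x, p=2)
print(predict_next(x, C, c0))
```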

2.3. RNN and ARMA

In an RNN, the hidden layer is constructed with the hyperbolic tangent function,
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}.$$
The function tanh has the expansion
$$\tanh(x) = x - \frac{1}{3}x^3 + \frac{2}{15}x^5 - \frac{17}{315}x^7 + \cdots,$$
where $x$ is in $(-\pi/2, \pi/2)$. Using this fact and expanding $h_t$,
$$h_t = \tanh(w_2 x_t + w_3 h_{t-1}) = w_2 x_t + w_3 h_{t-1} + e_t,$$
where $e_t$ is an error term. Therefore, $y_t = w_1 w_2 x_t + w_1 w_3 h_{t-1} + w_1 e_t$.
Since the same expansion applies to $h_{t-1}$,
$$\begin{aligned}
y_t &= w_1 w_2 x_t + w_1 w_3 h_{t-1} + w_1 e_t \\
    &= w_1 w_2 x_t + w_1 w_3 \left( w_2 x_{t-1} + w_3 h_{t-2} + e_{t-1} \right) + w_1 e_t \\
    &= w_1 w_2 x_t + w_1 w_2 w_3 x_{t-1} + w_1 w_3^2 h_{t-2} + w_1 e_t + w_1 w_3 e_{t-1}.
\end{aligned}$$
Repeating this process,
$$y_t = w_1 w_2 x_t + w_1 w_2 w_3 x_{t-1} + w_1 w_2 w_3^2 x_{t-2} + w_1 w_3^3 h_{t-3} + w_1 e_t + w_1 w_3 e_{t-1} + w_1 w_3^2 e_{t-2}.$$
Therefore,
$$y_t = \sum_{k=0}^{t-1} \left( w_1 w_2 w_3^k x_{t-k} + w_1 w_3^k e_{t-k} \right) + w_1 w_3^t h_0.$$
If $w_3$ is less than 0.1, the terms beyond the fourth order ($w_3^4$) are too small to affect the predicted value. Conversely, if $w_3$ is greater than 1, the predicted value grows exponentially. Under the assumption that the hyperbolic tangent (tanh) can be expanded as above, $w_3$ must be less than 1. Since only $w_1$, $w_2$, and $w_3$ can be adjusted, the RNN can be written as
$$\begin{aligned}
y_t \approx{} & w_1 w_2 x_t + w_1 w_2 w_3 x_{t-1} + w_1 w_2 w_3^2 x_{t-2} + w_1 w_2 w_3^3 x_{t-3} + w_1 w_2 w_3^4 x_{t-4} \\
              & + w_1 e_t + w_1 w_3 e_{t-1} + w_1 w_3^2 e_{t-2} + w_1 w_3^3 e_{t-3} + w_1 w_3^4 e_{t-4}.
\end{aligned}$$
This is an ARMA model of order 5; more details can be found in [18]. This derivation relies on the premise that the argument of the tanh function is smaller than a specific value ($|x| < \pi/2$ in $\tanh(x)$), and is therefore limited in terms of applicability.
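To illustrate the approximation (our own numerical check under the stated small-argument, small-$w_3$ assumptions; the parameter values and input series are arbitrary), one can compare the exact tanh recursion with the truncated linear expansion:

```python
import numpy as np

def rnn_exact(x, w1, w2, w3):
    """y_t from h_t = tanh(w2*x_t + w3*h_{t-1}), y_t = w1*h_t (biases omitted as in the text)."""
    h, ys = 0.0, []
    for x_t in x:
        h = np.tanh(w2 * x_t + w3 * h)
        ys.append(w1 * h)
    return np.array(ys)

def rnn_linearized(x, w1, w2, w3, order=5):
    """Truncated expansion y_t ~ sum_{k<order} w1*w2*w3^k * x_{t-k} (error terms dropped)."""
    ys = []
    for t in range(len(x)):
        ys.append(sum(w1 * w2 * w3**k * x[t - k] for k in range(min(order, t + 1))))
    return np.array(ys)

x = 0.1 * np.sin(0.3 * np.arange(20))     # small inputs keep tanh close to its linear regime
diff = rnn_exact(x, 0.9, 0.9, 0.09) - rnn_linearized(x, 0.9, 0.9, 0.09)
print(np.max(np.abs(diff)))               # small: the ARMA(5)-like form is a good approximation
```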

3. Analysis of Predicted Values

From the above section, $w_1$, $w_2$, $w_3$, $b_y$, and $b_h$ are fixed. Feeding each prediction back in place of the unknown input, we obtain the sequence $\{y_\kappa\}$ from the following equalities:
$$y_{\kappa+1} = w_1 h_\kappa + b_y = w_1 \tanh\left( w_2 y_\kappa + w_3 h_{\kappa-1} + b_h \right) + b_y, \qquad h_\kappa = \tanh\left( \theta h_{\kappa-1} + b \right), \tag{14}$$
where $\theta = w_1 w_2 + w_3$ and $b = b_h + w_2 b_y$.
Theorem 1.
The sequence $\{h_\kappa\}$ is bounded and has a convergent subsequence.
Proof. 
Since $|\tanh| \le 1$, we have $|h_\kappa| \le 1$ for all $\kappa$. A bounded sequence has a convergent subsequence (by the Bolzano–Weierstrass theorem; see also [26]). □
To see how the value of $h_\kappa$ changes, note that if $h_\kappa$ converges to a limit $h$, Equation (14) becomes $h = \tanh(\theta h + b)$. Therefore, as the values of $\theta$ and $b$ change, the value of $h$ satisfying this equation changes.

3.1. Limit Points of Prediction Values

We now analyze the limiting values of the sequence. To study its convergence, we introduce the following functions:
$$y = x,$$
$$y = \tanh(\theta x + b).$$
For convenience of calculation, the fixed-point condition is written as
$$z = \tanh(\theta z + b), \tag{17}$$
where $z_0$ is an initial condition; if the corresponding iteration converges, its limit $z_\ast$ satisfies Equation (17) ($z_\ast = \tanh(\theta z_\ast + b)$). Therefore, we have to look at the roots of Equation (17).
Theorem 2.
Equation (17) has at least one solution.
Proof. 
Let $g(z) = \tanh(\theta z + b) - z$. The function $g$ is continuous and differentiable. Since $|\tanh| \le 1$, if $z < -2$, then $g(z) > 0$; if $z > 2$, then $g(z) < 0$. Therefore, by the intermediate value theorem, there exists at least one solution. □
Theorem 3.
If $\theta \le 1$, then Equation (17) has exactly one solution.
Proof. 
If $\theta \le 1$, then $g'(z) = \theta\,\mathrm{sech}^2(\theta z + b) - 1 \le 0$. Therefore, $g$ is a monotonically decreasing function. As a result, there exists only one solution satisfying $g = 0$. □
Under the assumption that $\theta > 1$, two values satisfying $g'(z) = 0$ necessarily exist. Therefore, assuming $\theta > 1$, we find $z_l$ and $z_r$ satisfying $\theta\,\mathrm{sech}^2(\theta z_l + b) - 1 = \theta\,\mathrm{sech}^2(\theta z_r + b) - 1 = 0$, and we have $g(z_l) < g(z_r)$, assuming $z_l < z_r$. From computing $g'$, we obtain $g'(z) < 0$ on $z < z_l$, $g'(z) > 0$ on $z_l < z < z_r$, and $g'(z) < 0$ on $z_r < z$. Setting $g(z_l) = 0$ and $g(z_r) = 0$, we obtain $b = b_l = \theta \tanh\!\big( (\mathrm{sech}^2)^{-1}(1/\theta) \big) - (\mathrm{sech}^2)^{-1}(1/\theta)$ and $b = b_r = (\mathrm{sech}^2)^{-1}(1/\theta) - \theta \tanh\!\big( (\mathrm{sech}^2)^{-1}(1/\theta) \big)$, respectively. From computing $\mathrm{sech}^2$, $b_r < b_l$ is obtained.
Theorem 4.
Assume $\theta > 1$. If $b = b_l$ or $b = b_r$, then $g$ has two solutions. If $b_r < b < b_l$, then $g$ has three solutions. If $b_l < b$ or $b < b_r$, then $g$ has one solution.
Proof. 
This proof assumes that $\theta > 1$. If $b < b_r$, then $g(z_r) < 0$, so $g(z_l) < g(z_r) < 0$. Since $g$ is monotonically decreasing on $z < z_l$, there exists a unique solution of $g(z) = 0$, and it lies in $z < z_l$. If $b = b_r$, then $g(z_r) = 0$ and $g(z_l) < g(z_r) = 0$; for the same reason there exists a unique solution of $g(z) = 0$ on $z < z_l$, so, if $b = b_r$, we have two solutions: one with $g(z) = 0$ on $z < z_l$ and the other $g(z_r) = 0$. If $b_r < b < b_l$, we have $g(z_l) < 0$ and $g(z_r) > 0$; there are three solutions, one with $g(z) = 0$ on $z < z_l$, one on $z_l < z < z_r$, and one on $z_r < z$. If $b = b_l$, then $g(z_l) = 0$; since $g(z_r) > 0$ and $g$ is monotonically decreasing on $z_r < z$, there is a solution satisfying $g(z) = 0$ on $z_r < z$, so, if $b = b_l$, we have two solutions: $g(z_l) = 0$ and $g(z) = 0$ on $z_r < z$. If $b_l < b$, then $g(z_l) > 0$; since $g(z_r) > g(z_l) > 0$ and $g$ is decreasing on $z_r < z$, there is a unique solution of $g(z) = 0$, and it lies on $z > z_r$. □
We have now seen how the number of solutions of Equation (17) changes as the values of $\theta$ and $b$ change. Next, we explain how the sequence behaves according to its initial condition and according to the number of solutions of Equation (17).
Figure 1 shows the boundary curves $b = b_l(\theta)$ and $b = b_r(\theta)$. If the point $(\theta, b)$ is contained in the white region, there is one solution; if it lies on the red curve, there are two solutions; if it is contained in the blue region, there are three solutions. In Section 4, we plot the point $(\theta, b)$ in this solution-number region to check the number of solutions in each case. A sketch of how the region can be computed is given after this paragraph.
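The following is a minimal sketch (ours, not the authors' plotting code) of how the boundary $b_l(\theta)$ and the solution count in Figure 1 can be computed:

```python
import numpy as np

def b_l(theta):
    """b_l(theta) = theta*tanh(v) - v with sech^2(v) = 1/theta, valid for theta > 1."""
    v = np.arccosh(np.sqrt(theta))   # sech^2(v) = 1/theta  <=>  cosh(v) = sqrt(theta)
    return theta * np.tanh(v) - v

def count_solutions(theta, b, grid=np.linspace(-5.0, 5.0, 200001)):
    """Count sign changes of g(z) = tanh(theta*z + b) - z on a fine grid."""
    g = np.tanh(theta * grid + b) - grid
    return int(np.sum(np.sign(g[:-1]) != np.sign(g[1:])))

theta = 2.0
print(b_l(theta), -b_l(theta))       # b_l and b_r = -b_l for this theta
print(count_solutions(theta, 0.1))   # 3: (2, 0.1) lies inside the three-solution region
print(count_solutions(theta, 1.0))   # 1: b > b_l, one solution
```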

3.2. Change of Prediction Values (Sequence)

We have examined the number of solutions of $g$ depending on the values of $\theta$ and $b$. To see how the predicted value changes with $\theta$ and $b$, Equation (14) is rewritten as $z_{i+1} = \tanh(\theta z_i + b)$, which generates the sequence $\{z_i\}$. The sequences $\{z_i\}$, $g$, and $h_\kappa$ are related by $z_{i+1} = z_i + g(z_i)$ and $z_0 = h_\kappa$. Therefore, the predicted value $y_{\kappa+m+1}$ is obtained from $y_{\kappa+m+1} = w_1 h_{\kappa+m} + b_y$ with $h_{\kappa+m} = z_m$. By $z_{i+1} = z_i + g(z_i)$, the solutions of $g$ are the limit points of the sequence $\{z_i\}$. One of the reasons for interpreting the predictions is to identify where they move and accumulate, that is, their limiting values. The previous theorems identified the various cases in which the function $g$ is zero; we now explain how the sequence behaves according to the initial condition $z_0$ in each case (a small numerical illustration follows below).
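The iteration itself is easy to reproduce; the following sketch (our illustration; the values of $\theta$, $b$, and $z_0$ are arbitrary) runs $z_{i+1} = \tanh(\theta z_i + b)$ and checks that the limit is a root of $g$:

```python
import numpy as np

def iterate(theta, b, z0, n=200):
    """Fixed-point iteration z_{i+1} = tanh(theta*z_i + b) starting from z0."""
    z = z0
    for _ in range(n):
        z = np.tanh(theta * z + b)
    return z

# Three-solution case (theta, b) = (2, 0.1): the limit depends on the initial condition.
for z0 in (-1.0, 0.5):
    z_inf = iterate(2.0, 0.1, z0)
    residual = np.tanh(2.0 * z_inf + 0.1) - z_inf    # g(z_inf), should be ~0
    print(z0, z_inf, residual)
```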
Theorem 5.
Assume $\theta > 1$ and $b_l < b$. Then the sequence $\{z_i\}$ converges to $z_\ast$, where $z_\ast$ satisfies $g(z_\ast) = 0$.
Proof. 
Under the conditions $\theta > 1$ and $b_l < b$, we have $g(z) > 0$ on $z < z_\ast$ and $g(z) < 0$ on $z_\ast < z$. If $z_0 < z_\ast$, then $g(z_0) > 0$ and, from the computation, $\{z_i\}$ is a monotonically increasing sequence bounded above by $z_\ast$, so it converges to $z_\ast$. If $z_\ast < z_0$, then $g(z_0) < 0$ and $\{z_i\}$ is a monotonically decreasing sequence, so it converges to $z_\ast$. □
Theorem 6.
Assume $\theta > 1$ and $b = b_l$. Then there exist two solutions $z_l$ and $z_\ast$ (with $z_l < z_\ast$) satisfying $g(z) = 0$. If $z_0 < z_l$, the sequence $\{z_i\}$ converges to $z_l$; if $z_l < z_0$, the sequence $\{z_i\}$ converges to $z_\ast$.
Proof. 
We have $0 \le g(z)$ on $z < z_\ast$, so $\{z_i\}$ is a monotonically increasing sequence there. If $z_0 < z_l$, $\{z_i\}$ is bounded above by $z_l$ and converges to $z_l$; if $z_l < z_0 < z_\ast$, $\{z_i\}$ converges to $z_\ast$. On $z_\ast < z_0$, we have $g(z_0) < 0$, so $\{z_i\}$ is a monotonically decreasing sequence and converges to $z_\ast$. □
Theorem 7.
Assume $\theta > 1$ and $b_r < b < b_l$, and denote the three solutions of $g(z) = 0$ by $z_l < z_\ast < z_r$. If $z_0 < z_\ast$, $\{z_i\}$ converges to $z_l$; if $z_0 > z_\ast$, $\{z_i\}$ converges to $z_r$, where $z_0$ is the initial condition.
Proof. 
From computing $g(z)$, we have $g(z) > 0$ on $z < z_l$, and $\tanh(\theta z_i + b) > z_i$ whenever $z_i < z_l$. Therefore the sequence $\{z_i\}$ is monotonically increasing and converges to $z_l$. From $g''(z) > 0$ ($g$ is convex there) and $g(z_l) = g(z_\ast) = 0$, we have $g(z) < 0$ on $z_l < z < z_\ast$. For $z_l < z_0 < z_\ast$ we have $g(z_i) = \tanh(\theta z_i + b) - z_i < 0$, so the sequence $\{z_i\}$ is monotonically decreasing and its limit is $z_l$. By the same calculation, $g$ is concave on $z_\ast < z < z_r$ and $g(z_\ast) = g(z_r) = 0$, so $g(z) > 0$ on $z_\ast < z < z_r$ and $g(z_i) = \tanh(\theta z_i + b) - z_i > 0$ for $z_\ast < z_0 < z_r$; the sequence $\{z_i\}$ is monotonically increasing and its limit is $z_r$. If $z > z_r$, then $g(z) < 0$; therefore $g(z_i) = \tanh(\theta z_i + b) - z_i < 0$ for $z_0 > z_r$, so the sequence $\{z_i\}$ is monotonically decreasing and its limit is $z_r$. □
Theorem 8.
Assume $\theta > 1$ and $b = b_r$. Then there exist two solutions $z_\ast$ and $z_r$ (with $z_\ast < z_r$) satisfying $g(z) = 0$. If $z_r < z_0$, the sequence $\{z_i\}$ converges to $z_r$; if $z_\ast < z_0 < z_r$, the sequence $\{z_i\}$ converges to $z_\ast$; if $z_0 < z_\ast$, the sequence $\{z_i\}$ converges to $z_\ast$.
Proof. 
If $z_r < z_0$, then $g(z_0) < 0$, so the sequence $\{z_i\}$ is monotonically decreasing and converges to $z_r$. If $z_\ast < z_0 < z_r$, then $g(z_0) < 0$, so the sequence is monotonically decreasing and converges to $z_\ast$. If $z_0 < z_\ast$, then $g(z_0) > 0$, so the sequence is monotonically increasing and converges to $z_\ast$. □
Theorem 9.
Assume $\theta > 1$ and $b < b_r$. Then the sequence $\{z_i\}$ converges to $z_\ast$, where $z_\ast$ satisfies $g(z_\ast) = 0$.
Proof. 
Under the conditions $\theta > 1$ and $b < b_r$, we have $g(z_r) < 0$. Therefore, if $z_\ast < z_0$, then $g(z_0) < 0$, so the sequence $\{z_i\}$ is monotonically decreasing and converges to $z_\ast$. If $z_0 < z_\ast$, then $g(z_0) > 0$, so the sequence is monotonically increasing and converges to $z_\ast$. □
Theorem 10.
Assume $0 \le \theta \le 1$. Then the sequence $\{z_i\}$ converges to $z_\ast$, where $z_\ast$ satisfies $g(z_\ast) = 0$.
Proof. 
Under the condition $0 \le \theta \le 1$, $g$ has a unique solution satisfying $g(z) = 0$. If $z_0 < z_\ast$, then $g(z_0) > 0$, so the sequence $\{z_i\}$ is monotonically increasing and converges to $z_\ast$. If $z_\ast < z_0$, then $g(z_0) < 0$, so the sequence is monotonically decreasing and converges to $z_\ast$. □
Under the condition $\theta > 0$, the function $\tanh(\theta z + b)$ is increasing in $z$, so the iterates approach the fixed point from one side and the sign of $z_i - z_\ast$ does not change. Under the condition $\theta < 0$, the function $\tanh(\theta z + b)$ is decreasing in $z$, so the sign of $z_i - z_\ast$ changes from step to step.
Theorem 11.
Assume $-1 < \theta < 0$. Then the sequence $\{z_i\}$ converges to $z_\ast$, where $z_\ast$ satisfies $g(z_\ast) = 0$.
Proof. 
By the mean value theorem,
$$|z_{i+1} - z_i| = \left| \tanh(\theta z_i + b) - \tanh(\theta z_{i-1} + b) \right| = |\theta|\,\mathrm{sech}^2(\theta \zeta + b)\, |z_i - z_{i-1}|,$$
where $\zeta$ is between $z_{i-1}$ and $z_i$. Therefore,
$$|z_{i+1} - z_i| \le |\theta|\, |z_i - z_{i-1}|.$$
Since $|\theta| < 1$, the sequence $\{z_i\}$ is a Cauchy sequence, and it converges to $z_\ast$. □
Theorem 12.
Assume $\theta \le -1$. Then the sequence $\{z_i\}$ either converges to $z_\ast$, where $z_\ast$ satisfies $g(z_\ast) = 0$, or it vibrates (oscillates without converging).
Proof. 
As in the previous proof,
$$|z_{i+1} - z_i| = \left| \tanh(\theta z_i + b) - \tanh(\theta z_{i-1} + b) \right| = |\theta|\,\mathrm{sech}^2(\theta \zeta + b)\, |z_i - z_{i-1}|,$$
where $\zeta$ is between $z_{i-1}$ and $z_i$. Therefore,
$$|z_{i+1} - z_i| \le |\theta|\,\mathrm{sech}^2(\theta \zeta + b)\, |z_i - z_{i-1}|.$$
If $|\theta|\,\mathrm{sech}^2(\theta \zeta + b) < 1$ along the iteration, the sequence $\{z_i\}$ is a Cauchy sequence that converges to $z_\ast$. If $|\theta|\,\mathrm{sech}^2(\theta \zeta + b) \ge 1$, the sequence $\{z_i\}$ vibrates. □
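A quick numerical illustration of Theorems 11 and 12 (our own example; the parameter values are arbitrary): for $|\theta| < 1$ the iteration settles to a fixed point, while for a strongly negative $\theta$ it settles into a two-cycle, i.e., it vibrates.

```python
import numpy as np

def tail(theta, b, z0, n=1000, keep=4):
    """Return the last few iterates of z_{i+1} = tanh(theta*z_i + b)."""
    z, history = z0, []
    for _ in range(n):
        z = np.tanh(theta * z + b)
        history.append(z)
    return [round(v, 4) for v in history[-keep:]]

print(tail(-0.5, 0.2, 0.3))   # |theta| < 1: tail is (nearly) constant -> convergence
print(tail(-3.0, 0.0, 0.3))   # theta <= -1: tail alternates between two values -> "vibration"
```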

4. Numerical Experiments

In this section, we present numerical results that confirm the RNN analysis of the previous section. As we saw there, RNN predictions fall into three cases: in Case 1, Equation (17) has one solution; in Case 2, Equation (17) has two solutions; and in Case 3, Equation (17) has three solutions. In Cases 1 to 3, we check the number of solutions of Equation (17) and the predicted values according to the initial conditions. In Cases 4 through 7, experiments were conducted for learning data that increase, increase and then decrease, decrease and then increase, and vibrate, respectively. Each numerical experiment produced a figure. In each figure, (a) plots the RNN predictions and the learning data (the red curve is $\sin$), (b) marks $\theta$ and $b$ in the solution-number region, and (c) shows the iteration of $z$ for Equation (17). A small harness for reproducing this kind of check is sketched below.
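The following harness is our own sketch (the parameter values shown are placeholders, not the trained values reported below, which would come from minimizing $E$); it maps trained parameters to $(\theta, b)$, counts the solutions of Equation (17), and rolls the prediction forward:

```python
import numpy as np

def analyze(w1, w2, w3, b_y, b_h, h_last, n_steps=40):
    """Compute (theta, b), count roots of g(z) = tanh(theta*z + b) - z, and roll predictions forward."""
    theta, b = w1 * w2 + w3, b_h + w2 * b_y
    grid = np.linspace(-5.0, 5.0, 200001)
    g = np.tanh(theta * grid + b) - grid
    n_roots = int(np.sum(np.sign(g[:-1]) != np.sign(g[1:])))
    z, y = h_last, None
    for _ in range(n_steps):
        z = np.tanh(theta * z + b)       # h_{kappa+m} = z_m
        y = w1 * z + b_y                 # y_{kappa+m+1} = w1 * h_{kappa+m} + b_y
    return theta, b, n_roots, y          # the last y approximates the prediction limit

# placeholder parameters and final hidden state
print(analyze(w1=0.9, w2=0.5, w3=0.4, b_y=0.1, b_h=0.05, h_last=0.2))
```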

4.1. Case 1: One-Solution Case of Equation (17)

The situation with one solution was divided into the case where θ is less than 1 and θ is greater than 1.

4.1.1. Theta < 1

Let $x_0 = 0$, $x_1 = 0.12$, $x_2 = 0.23$, $x_3 = 0.38$, and $x_4 = 0.5$; $x_0$–$x_4$ are the learning data. In this case, we obtained $w_1 = 0.9$, $w_2 = 0.9$, $w_3 = 0.09$, $b_y = 0.2$, and $b_h = 0.08$. Therefore, $\theta = 0.9$ and $b = 0.1$. The limit of $y_t$ is $y_\infty \approx 0.65$.
In Figure 2a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 2b shows $\theta$ and $b$ (the point $(\theta, b)$). Figure 2c shows the result for Equation (17); the asterisk marks $z_0$. From Figure 2, we see that, for this learning data, Equation (17) has one solution, the initial value $z_0$ is 0.6, and $z_{40}$ is 0.5.

4.1.2. Theta > 1

Let $x_0 = 0$, $x_1 = 0.03$, $x_2 = 0.15$, $x_3 = 0.33$, and $x_4 = 0.4$; $x_0$–$x_4$ are the learning data. In this case, we obtained $w_1 = 0.9$, $w_2 = 0.1$, $w_3 = 1.39$, $b_y = 0.2$, and $b_h = 0.18$. Therefore, $\theta = 1.3$ and $b = 0.2$. The limit of $y_t$ is $y_\infty \approx 0.64$.
Figure 3 shows results similar to those in Figure 2. Figure 3a shows $x_0$–$x_4$ and $y_4$–$y_{40}$ ($y_4$–$y_{40}$ are the prediction values). Figure 3b shows $\theta$ and $b$. Figure 3c shows the result for Equation (17).

4.2. Case 2: Two-Solution Case of Equation (17)

In this situation, Equation (17) has two solutions, with $(\theta, b) = (1.3, 0.101)$. Let $x_0 = 0$, $x_1 = 0.02$, $x_2 = 0.19$, $x_3 = 0.36$, and $x_4 = 0.5$; $x_0$–$x_4$ are the learning data. Figure 4 shows the solution-number region and $(\theta, b)$ (black star). As shown in Figure 4, Equation (17) has two solutions for this learning data. In this situation, we conducted two experiments. In the first case, the initial condition $z_0$ lies between $z_l$ and $z_r$; in the second case, the initial condition $z_0$ is less than $z_l$. In the first case, the limit value of $z_i$ must, by the proof, go to $z_r$, and in the second case, the limit value of $z_i$ must go to $z_l$. These results were verified by the numerical experiments, which confirm the theory of the previous section.

4.2.1. First Case

In this case, we obtained $w_1 = 0.9$, $w_2 = 0.4$, $w_3 = 0.94$, $b_y = 0.1$, and $b_h = 0.141$. Therefore, $\theta = 1.3$ and $b = 0.101$. The limit of $y_t$ is $y_\infty \approx 0.47$.
In Figure 5a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 5b shows the result for Equation (17); the asterisk marks $z_0$, and $z_{40}$ is 0.71.

4.2.2. Second Case

In this case, we obtained $w_1 = 0.6$, $w_2 = 6.5$, $w_3 = 2.6$, $b_y = 0.7$, and $b_h = 4.65$. Therefore, $\theta = 1.3$ and $b = 0.101$. The limit of $y_t$ is $y_\infty \approx 0.2$.
In Figure 6a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 6b shows the result for Equation (17); the asterisk marks $z_0$, and $z_{40}$ is −0.34.

4.3. Case 3: Three-Solution case of Equation (17)

In this situation, Equation (17) has three solutions, with $(\theta, b) = (2, 0.1)$. Let $x_0 = 0$, $x_1 = 0.01$, $x_2 = 0.16$, $x_3 = 0.37$, and $x_4 = 0.46$; $x_0$–$x_4$ are the learning data. Figure 7 shows the solution-number region and $(\theta, b)$ (black star). As shown in Figure 7, Equation (17) has three solutions for this learning data. In this situation, we conducted two experiments. For convenience, the three roots are denoted by $z_l$, $z_\ast$, and $z_r$, as in the notation above. In the first case, the initial condition $z_0$ lies between $z_l$ and $z_r$ (above $z_\ast$); in the second case, the initial condition $z_0$ lies between $z_l$ and $z_\ast$. In the first case, the limit value of $z_i$ must, by the proof, go to $z_r$, and in the second case, it must go to $z_l$. These results were verified by the numerical experiments, confirming the theory of the previous section.

4.3.1. First Case

In this case, we obtained $w_1 = 0.6$, $w_2 = 0.5$, $w_3 = 1.7$, $b_y = 0.1$, and $b_h = 0.15$. Therefore, $\theta = 2$ and $b = 0.1$. The limit of $y_t$ is $y_\infty \approx 0.58$.
In Figure 8a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 8b shows the result for Equation (17); the asterisk marks $z_0$, and $z_{40}$ is 0.79.

4.3.2. Second Case

In this case, we obtained $w_1 = 1.2$, $w_2 = 3$, $w_3 = 1.6$, $b_y = 0.1$, and $b_h = 0.2$. Therefore, $\theta = 2$ and $b = 0.1$. The limit of $y_t$ is $y_\infty \approx 1.03$.
In Figure 9a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 9b shows the result for Equation (17); the asterisk marks $z_0$, and $z_{40}$ is −0.86.

4.4. Case 4: Learning Data Increase

Let $x_0 = 0$, $x_1 = 0.15$, $x_2 = 0.3$, $x_3 = 0.45$, and $x_4 = 0.58$; $x_0$–$x_4$ are the learning data. In this case, we obtained $w_1 = 0.96$, $w_2 = 0.95$, $w_3 = 0.13$, $b_y = 0.24$, and $b_h = 0.08$. Therefore, $\theta = 1.04$ and $b = 0.15$. The limit of $y_t$ is $y_\infty \approx 0.93$.
In Figure 10a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 10b shows $\theta$ and $b$. Figure 10c shows the result for Equation (17). From $\theta$ and $b$, Equation (17) has one solution. As can be seen in Figure 10, the learning data increase and the prediction converges to a specific value.

4.5. Case 5: Learning Data Increase and Decrease

Let $x_0 = 0.95$, $x_1 = 0.98$, $x_2 = 1$, $x_3 = 0.98$, and $x_4 = 0.95$; $x_0$–$x_4$ are the learning data. In this case, we obtained $w_1 = 0.49$, $w_2 = 0.58$, $w_3 = 0.07$, $b_y = 0.67$, and $b_h = 0.2$. Therefore, $\theta = 0.21$ and $b = 0.6$. The limit of $y_t$ is $y_\infty \approx 0.97$.
In Figure 11a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 11b shows $\theta$ and $b$. Figure 11c shows the result for Equation (17). From $\theta$ and $b$, Equation (17) has one solution. As can be seen in Figure 11, the training data increase and then decrease, and the prediction converges to a specific value close to the average of the learning data.

4.6. Case 6: Learning Data Decrease and Increase

Let $x_0 = 0.95$, $x_1 = 0.98$, $x_2 = 1$, $x_3 = 0.98$, and $x_4 = 0.95$; $x_0$–$x_4$ are the learning data. In this case, we obtained $w_1 = 0.32$, $w_2 = 0.55$, $w_3 = 0.14$, $b_y = 0.58$, and $b_h = 0.28$. Therefore, $\theta = 0.06$ and $b = 0.47$. The limit of $y_t$ is $y_\infty \approx -0.97$.
In Figure 12a, $x_0$–$x_4$ are the black stars and $y_0$–$y_{40}$ are the prediction values (blue line). Figure 12b shows $\theta$ and $b$. Figure 12c shows the result for Equation (17). From $\theta$ and $b$, Equation (17) has one solution. As can be seen in Figure 12, the learning data decrease and then increase, and the prediction converges to a specific value close to the average of the learning data.

4.7. Case 7: Learning Data Vibrate

Let $x_0 = 1$, $x_1 = -1$, $x_2 = 1$, $x_3 = -1$, and $x_4 = 1$; $x_0$–$x_4$ are the learning data (the values alternate between 1 and −1). In this case, we obtained $w_1 = 0.5$, $w_2 = 11.74$, $w_3 = 5.15$, $b_y = 0$, and $b_h = 2.48$. Therefore, $\theta = 0.71$ and $b = 2.48$. The limit of $y_t$ is $y_\infty \approx 0.5$.
In Figure 13a, $x_0$–$x_4$ are the green circles, $y_0$–$y_4$ are the black stars, and $y_4$–$y_{40}$ are the prediction values (blue line). In Figure 13a, the values of the learning data ($x_t$) and of the learning results ($y_t$) differ because the RNN structure is simple and sufficient learning was not achieved; in future work, we aim to study RNN structures that can learn such complex data well. Figure 13b shows $\theta$ and $b$. Figure 13c shows the result for Equation (17). From $\theta$ and $b$, Equation (17) has one solution, and the prediction converges to a specific value. For these values of $\theta$ and $b$, the solution of Equation (17) must be unique; however, this contradicts the learning data, which require the two values 1 and −1. As a result, the cost function only increased.

5. Conclusions

In this paper, we interpreted the structure underlying the RNN and, on this basis, identified the principles by which the RNN predicts. A basic RNN works like a time series over a very narrow range of its variables. Over a general range, the nonlinear activation function, whose maximum and minimum are bounded, forces the function value into an iterative range. Because the function value is repeated within a certain range, the predicted value behaves like a fixed-point iteration. In other words, since we used the tanh activation function, the value lies in the range −1 to 1, and the absolute value of the predicted value in this range is less than 1. As a result, as the prediction is iterated, it converges to a specific value. Through this paper, we found that the basic operating principle of an RNN combines the operating principle of the time series, which is linear analysis, with fixed-point iteration, which is nonlinear. In general, Equation (17) had one solution in our numerical calculations. Therefore, the present structure could not handle the case of numerical experiment Case 7 (vibrating learning data). To solve this problem, it is necessary to diversify the structure, increase the number of layers, and switch to a vector structure. In future work, we aim to further study RNNs with vector structures.

Author Contributions

Conceptualization, J.P. and D.Y.; Data curation, J.P.; Formal analysis, D.Y.; Funding acquisition, D.Y.; Investigation, J.P.; Methodology, D.Y. and S.J.; Project administration, J.P. and D.Y.; Resources, J.P.; Software, S.J.; Supervision, S.J.; Validation, S.J.; Visualization, S.J.; Writing—original draft, D.Y.; Writing—review & editing, J.P. and S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (grant number NRF-2017R1E1A1A03070311).

Acknowledgments

We sincerely thank the anonymous reviewers whose suggestions helped to greatly improve and clarify this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
2. Werbos, P.J. Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. 1988, 1, 339–356.
3. Schmidhuber, J. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks. Connect. Sci. 1989, 1, 403–412.
4. Cho, K.; Merrienboer, B.V.; Bahdanau, D.; Bengio, Y. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv 2014, arXiv:1409.1259.
5. Jin, Z.; Zhou, G.; Gao, D.; Zhang, Y. EEG classification using sparse Bayesian extreme learning machine for brain–computer interface. Neural Comput. Appl. 2018, 1–9.
6. Schmidhuber, J. A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks. Neural Comput. 1992, 4, 243–248.
7. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318.
8. Cho, K.; Merrienboer, B.V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078.
9. Dangelmayr, G.; Gadaleta, S.; Hundley, D.; Kirby, M. Time series prediction by estimating markov probabilities through topology preserving maps. In Applications and Science of Neural Networks, Fuzzy Systems, and Evolutionary Computation II; International Society for Optics and Photonics: Bellingham, WA, USA, 1999; Volume 3812, pp. 86–93.
10. Wang, P.; Wang, H.; Wang, W. Finding semantics in time series. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, Athens, Greece, 12–16 June 2011; pp. 385–396.
11. Afolabi, D.; Guan, S.; Man, K.L.; Wong, P.W.H.; Zhao, X. Hierarchical Meta-Learning in Time Series Forecasting for Improved Inference-Less Machine Learning. Symmetry 2017, 9, 283.
12. Xu, X.; Ren, W. A Hybrid Model Based on a Two-Layer Decomposition Approach and an Optimized Neural Network for Chaotic Time Series Prediction. Symmetry 2019, 11, 610.
13. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
14. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
15. Gers, F.A.; Schraudolph, N.N.; Schmidhuber, J. Learning Precise Timing with LSTM Recurrent Networks. J. Mach. Learn. Res. 2002, 3, 115–143.
16. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
17. Brockwell, P.J.; Davis, R. Introduction to Time-Series and Forecasting; Springer: New York, NY, USA, 2002.
18. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications; Springer: New York, NY, USA, 2000.
19. Elman, J.L. Finding structure in time. Cognit. Sci. 1990, 14, 179–211.
20. Rohwer, R. The moving targets training algorithm. In Advances in Neural Information Processing Systems 2; Touretzky, D.S., Ed.; Morgan Kaufmann: San Mateo, CA, USA, 1990; pp. 558–565.
21. Mueen, A.; Keogh, E. Online discovery and maintenance of time series motifs. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 1089–1098.
22. Khaled, A.A.; Hosseini, S. Fuzzy adaptive imperialist competitive algorithm for global optimization. Neural Comput. Appl. 2015, 26, 813–825.
23. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference for Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
24. Zhang, Y.; Wang, Y.; Zhou, G.; Jin, J.; Wang, B.; Wang, X.; Cichocki, A. Multi-kernel extreme learning machine for EEG classification in brain-computer interfaces. Expert Syst. Appl. 2018, 96, 302–310.
25. Zhang, X.; Yao, L.; Wang, X.; Monaghan, J.; Mcalpine, D.; Zhang, Y. A Survey on Deep Learning based Brain Computer Interface: Recent Advances and New Frontiers. arXiv 2019, arXiv:1905.04149.
26. Yosida, K. Functional Analysis; Springer: New York, NY, USA, 1965.
Figure 1. Solution number region.
Figure 2. One-solution case of Equation (17) (θ < 1).
Figure 3. One-solution case of Equation (17) (θ > 1).
Figure 4. Solution number region in Case 2.
Figure 5. Two-solution case of Equation (17).
Figure 6. Two-solution case of Equation (17).
Figure 7. Solution number region in Case 3.
Figure 8. Three-solution case of Equation (17).
Figure 9. Three-solution case of Equation (17).
Figure 10. Learning data increase.
Figure 11. Learning data increase and decrease.
Figure 12. Learning data decrease and increase.
Figure 13. Learning data vibrate.
