Independent Random Recurrent Neural Networks for Infrared Spatial Point Targets Classification

Exo-atmospheric infrared (IR) point target discrimination is an important research topic of space surveillance systems. It is difficult to describe the characteristic information of the shape and micro-motion states of the targets and to discriminate different targets effectively by the characteristic information. This paper has constructed the infrared signature model of spatial point targets and obtained the infrared radiation intensity sequences dataset of different types of targets. This paper aims to design an algorithm for the classification problem of infrared radiation intensity sequences of spatial point targets. Recurrent neural networks (RNNs) are widely used in time series classification tasks, but face several problems such as gradient vanishing and explosion, etc. In view of shortcomings of RNNs, this paper proposes an independent random recurrent neural network (IRRNN) model, which combines independent structure RNNs with randomly weighted RNNs. Without increasing the training complexity of network learning, our model solves the problem of gradient vanishing and explosion, improves the ability to process long sequences, and enhances the comprehensive classification performance of the algorithm effectively. Experiments show that the IRRNN algorithm performs well in classification tasks and is robust to noise.


Introduction
Spatial target recognition is a significant problem in precision guidance systems and space surveillance systems. Infrared imaging technology is widely used in spatial target recognition systems. Because of the long distance between the target and the sensor, a spatial target often appears as a single pixel in the infrared image, which makes recognition a great challenge [1]. The grey level of the target changes over time; this variation, called the infrared signature, contains rich information and can be employed in discrimination systems.
In the past few decades, many analysis methods have been applied. Resch [2] implemented exo-atmospheric object recognition using the ratios of the object's irradiance and the time-averaged irradiance values of each object in the field of view (FOV). A spatial target may exhibit micro-motions, such as tumbling, spinning, and precessing, due to maneuvering control or uneven force during exo-atmospheric flight [3,4]. An analysis method based on mixed micro-Doppler time-frequency sequences has been put forward to extract micro-motion dynamic and inertial characteristics (including the spin rate, the precession rate, and the nutation angle) of free rigid targets in space [5][6][7].
As an important precondition for target classification, the IR radiation signature has been studied with effective results. Constructing a model of the infrared signature of spatial targets helps us to understand the signature better [8]. Dynamic parameters in the model change with the movement of the target. The infrared signature is influenced by a range of factors, such as the wavelength range, the line of sight (LOS) orientation, and the shape and temperature of the target [9,10].
In the field of machine learning, artificial neural networks (ANNs) have superior feature learning and data representation capabilities [11,12]. RNNs extend feed-forward ANNs with a time-recurrent structure that gives them memory of previous information. Moreover, RNNs have a simple structure, high computational efficiency, and low computational and storage requirements [13]. However, RNNs essentially focus only on local information, especially when the network is trained by error back-propagation, which inevitably limits their grasp of the overall information of the sequence and hinders their ability to learn complex decision functions [13,14]. The widely used LSTM algorithm is capable of selectively remembering and forgetting data, retaining important hidden data and features for a longer period of time. The "processor" in the LSTM algorithm judges whether information is important, so that, through training, information in the target data waveform is "memorized" or "forgotten" accordingly [15][16][17]. However, for both RNNs and LSTM, the ability to process long sequences is limited, which makes it difficult to capture the periodicity of the time series data [18].
In research on spatial target classification, a more effective classification algorithm is needed that is tailored to the characteristics of infrared radiation intensity sequences in practical conditions [19]. Based on the recurrent structure of RNNs, this paper proposes an IRRNN model that adopts an independent structure in the hidden layer, so that an unsaturated activation function can be used to address gradient vanishing and gradient explosion. At the same time, our model introduces the historical output information into the input layer in the form of random weighting, which moves the data in a direction that makes it easier to classify [20].
The rest of the paper is organized as follows. An infrared signature model is constructed and simulated in Section 2. Discrimination of spatial point targets based on IRRNN is put forward in Section 3, followed by experiments and a discussion in Section 4. Conclusions are presented in the last section.

Radiation Intensity Analysis
The emitted radiation is the main part of the external radiation from the surface of targets in outer space. It is determined by temperature, infrared emissivity, projection area, observing angle, etc. [21]. If the target surface is assumed to have gray-body radiation and diffuse reflection characteristics, then, according to Planck's law, the infrared radiation intensity received by the detector focal plane in the band can be approximated as in Equation (1). Here $A_0$ is the entrance pupil area, $M_T(\lambda)$ is the radiation value at temperature T, and $\Delta t$ is the integration time of the observation. If all available parameters are absorbed into κ, Equation (1) can be further simplified. It can be seen from the resulting formula that the infrared radiation intensity is mainly determined by the target surface temperature T, the detection distance R (i.e., the linear distance between the detector and the target), and the geometric projection area $A_{proj}$ of the target along the line of sight (LOS).
Heat transfer in the outer space environment is mainly by thermal radiation, and the temperature T of the various targets changes only within a small range during the middle stage of flight [22]. In long-distance detection, there is often a dense group of point targets that are close together in space, so the changes of R are basically the same for all of them. These two variables therefore provide little information for the classification of ballistic targets [23,24]. In contrast, $A_{proj}$ depends on the target nutation and geometric parameters, which shape the waveform of the data, and it is an important parameter for target recognition research.

Attitude Motion Model
To calculate the geometric projection area of the ballistic target in the LOS direction, it is common practice to split the target surface into a number of small facets and then accumulate the projected area of each facet. Assume that the target surface is divided into N small-area facets, that the normal vector and area of the i-th facet are $\vec{n}_i$ and $a_i$, respectively, and that the LOS vector is $\vec{n}_{los}$. The key to calculating the projection area sequence is to transform $\vec{n}_i$ and $\vec{n}_{los}$ into the reference coordinate system and to determine how $\vec{n}_i$ rotates over time in that system. The conversion between $\vec{n}_i$ and its counterpart in the reference coordinate system (X, Y, Z) can be described by the rotation matrix $R_{init}$ determined by the Euler angles (ϕ, θ, ϕ) [22], taken in the order zxy. At time $t_0$, the azimuth and elevation angles of the target local axis z in the reference coordinate system are $\alpha_0$ and $\beta_0$, respectively, as shown in Figure 1, so $R_{init}$ can be determined by the Euler angles $(-\alpha_0, 0, 0.5\pi - \beta_0)$.
The target nutation contains two rotational motions, spinning and coning, as shown in Figure 1. Let the azimuth and elevation angles of the nutation axis be $\alpha_n$ and $\beta_n$, and let the angular velocities of coning and spinning be $\omega_n$ and $\omega_s$, respectively. According to the Rodrigues formula [25], the nutation rotation matrix $R(t)$ of the target at time t is given by Equation (5), in which $\hat{e}_1$ and $\hat{e}_2$ are antisymmetric matrices. The facet normal vectors at time t then follow from $R_{init}$ and $R(t)$, and the geometric projection area of the target at this time is obtained by accumulating the projected areas of the facets along the LOS.
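The facet accumulation described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that only facets whose outward normal has a positive component toward the sensor contribute (self-occlusion between facets is ignored), that the LOS unit vector points from the target toward the sensor, and it applies Rodrigues' formula to a single rotation about one axis rather than to the combined spinning and coning matrix R(t). The function names `rodrigues` and `projected_area` are hypothetical.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation by `angle` (rad) about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    # Antisymmetric (cross-product) matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def projected_area(normals, areas, n_los, rotation=np.eye(3)):
    """Sum a_i * max(0, (R n_i) . n_los) over all facets (assumed visibility rule)."""
    n_los = np.asarray(n_los, dtype=float)
    n_los = n_los / np.linalg.norm(n_los)
    rotated = np.asarray(normals, dtype=float) @ rotation.T  # rotate every facet normal
    cosines = rotated @ n_los
    return float(np.sum(np.asarray(areas) * np.clip(cosines, 0.0, None)))

# Toy target: a unit cube with six faces of area 1 and axis-aligned outward normals.
cube_normals = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                         [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
cube_areas = np.ones(6)

face_on = projected_area(cube_normals, cube_areas, [0, 0, 1])
tilted = projected_area(cube_normals, cube_areas, [0, 0, 1],
                        rotation=rodrigues([1, 0, 0], np.pi / 4))
```

Evaluating `projected_area` at successive rotation angles yields a projection area sequence; the cube is used here only because its projected areas are easy to check by hand.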

Infrared Radiation Sequence Simulation
The infrared radiation model and the attitude motion model of the spatial point target were analyzed in Sections 2.1 and 2.2, where the main factor affecting the target radiation sequence (the projection area) was discussed. Based on these models, a simulation experiment is performed on the infrared radiation sequence of the spatial point target in this section.
Our simulation uses elliptical ballistic theory to calculate the flight path of a spatial target. Assuming that the spatial target is only affected by the gravity of the Earth, the differential equation of the basic motion of the space object follows from the law of universal gravitation and Newton's second law, with µ = 3.986005 × 10^14 m^3/s^2 the gravitational constant of the Earth. According to theoretical mechanics, the flight trajectory of the space target lies in the ballistic plane determined by its velocity vector and the Earth's gravitational vector. According to the law of conservation of momentum and the law of universal gravitation, the equation of elliptical ballistic motion of the target can be derived, in which e is the eccentricity of the elliptical trajectory and P is the semi-latus rectum. We assume that the scanning frequency of the infrared sensor is 50 Hz, the aperture of the lens is 0.25 m, and the detection waveband is 8-10 µm [26]. The starting point of the free-flight segment is assumed to be (135° E, 52° N, 151 km), and the highest point of the flight path is 457.3 km above the ground. The unit vector of the LOS is set to n' = [0.59, 0.34, 0.73].
The simulation uses spatial point target data of four different shape types: cone, cone-cylinder, ball-base cone, and curved pieces. The shape, physical properties, micro-motion parameters, and sensor parameters of the spatial targets are shown in Table 1 [4,13,22]. To account for thermal noise, non-uniformity of the infrared sensor, and similar effects, additive white Gaussian noise is used in the infrared radiation simulation to describe the resulting data deviation and make the data more realistic [22]. The gray-scale sequences generated by these target shapes are shown in Figure 2.

Table 1. Micro-motion of the four target types, in the order listed above: spinning and coning; spinning and coning; tumbling; tumbling.

Classification of IR Radiation Intensity Sequence Based on IRRNN
To address the spatial point target shape classification problem studied in this paper, and in view of the characteristics of the target infrared radiation intensity time series samples, this paper proposes an Independent Random RNN algorithm structure. The main idea is to use the independent neuron structure to extend the sequence length that the RNN can handle, and to introduce a Random RNN (RRNN) algorithm that adds all historical network output values before the current time, together with a random weight matrix, to the input part of the network.

Structure of IndRNN
In this section, we introduce the structure of IndRNN. The main difference from traditional RNNs is the way the hidden layer is connected. Following the calculation formula of traditional RNNs, we describe the IndRNN as

$$h_t = f\left(W^{HI} x_t + w \odot h_{t-1} + B^{H}\right) \qquad (8)$$

The weight $w$ in Equation (8) is a vector of dimension H consisting of the diagonal elements of $W^{HH}$, and $B^{H}$ is the bias of the hidden layer. The symbol $\odot$ denotes the Hadamard product, i.e., element-wise multiplication. It can be seen from Equation (8) that each neuron in the hidden layer is independent of the other neurons in the layer. For the j-th neuron, the hidden state $h_{j,t}$ can be expressed as

$$h_{j,t} = f\left(W^{HI}_j x_t + w_j h_{j,t-1} + B^{H}_j\right) \qquad (9)$$

where $W^{HI}_j$ and $w_j$ are the j-th row of the weight matrix and the j-th element of the weight vector, respectively. Each hidden layer neuron only receives information from the input and from its own state at the previous time step, so each hidden layer neuron in the IndRNN processes a spatial-temporal pattern independently.
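The independence of the hidden neurons can be made concrete with a short NumPy sketch of one IndRNN time step. It is illustrative only: the layer sizes, the random initialization, and the names `indrnn_step`, `W_in`, `w_rec`, and `b` are assumptions, and ReLU is used as the activation.

```python
import numpy as np

def indrnn_step(x_t, h_prev, W_in, w_rec, b):
    """One IndRNN step: h_t = relu(W_in x_t + w_rec * h_prev + b)."""
    return np.maximum(0.0, W_in @ x_t + w_rec * h_prev + b)

rng = np.random.default_rng(0)
I, H = 3, 4                               # input and hidden sizes (illustrative)
W_in = rng.normal(size=(H, I))
w_rec = rng.uniform(0.0, 1.0, size=H)     # one recurrent weight per neuron
b = np.zeros(H)
x_t = rng.normal(size=I)
h_prev = rng.uniform(size=H)

h_t = indrnn_step(x_t, h_prev, W_in, w_rec, b)

# The element-wise product equals a recurrent matrix that is purely diagonal.
h_t_diag = np.maximum(0.0, W_in @ x_t + np.diag(w_rec) @ h_prev + b)

# Independence check: perturbing neuron 0's previous state cannot affect neurons 1..H-1.
h_prev_perturbed = h_prev.copy()
h_prev_perturbed[0] += 1.0
h_t_perturbed = indrnn_step(x_t, h_prev_perturbed, W_in, w_rec, b)
changed = ~np.isclose(h_t, h_t_perturbed)
```

In a traditional RNN the full matrix product would let the perturbation of one neuron's state reach every neuron at the next step; here it stays confined to the perturbed neuron.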

Analysis of IndRNN Structure
This section explains the gradient back-propagation of IndRNN and how it addresses the problem of gradient vanishing and explosion. In each layer, the gradient of the IndRNN can be calculated independently for each neuron because there is no interaction between the neurons in one hidden layer [19].
With the bias ignored, the output of the j-th neuron is $h_{j,t} = f(W^{HI}_j x_t + w_j h_{j,t-1})$. Assuming that the objective function to be minimized at time $t'$ is $V_j$, the gradient propagated back to time step t is

$$\frac{\partial V_j}{\partial h_{j,t}} = \frac{\partial V_j}{\partial h_{j,t'}} \prod_{k=t}^{t'-1} f'_{j,k+1}\, w_j = \frac{\partial V_j}{\partial h_{j,t'}}\, w_j^{\,t'-t} \prod_{k=t}^{t'-1} f'_{j,k+1} \qquad (10)$$

where $f'_{j,k+1}$ is the derivative of the element-wise activation function. It can be seen that the gradient only involves a power of the scalar $w_j$, which can be easily regulated, and the derivative of the activation function, which is usually bounded within a certain range. In contrast, the gradient of a traditional RNN is $\frac{\partial V}{\partial h_{t'}} \prod_{k=t}^{t'-1} \mathrm{diag}(f'(h_{k+1}))\, W^{T}$, where $\mathrm{diag}(f'(h_{k+1}))$ is the Jacobian matrix of the element-wise activation function. The gradient of IndRNN depends directly on the value of the recurrent weight, which changes only by a small amount at each update according to the learning rate. The gradient of a traditional RNN depends on a matrix product that is mainly determined by the eigenvalues, which can change sharply even when each matrix element changes only slightly [14]. Therefore, IndRNN is easier to train than traditional RNNs. To solve the gradient explosion and vanishing problem over time, we only need to regulate the term $w_j^{\,t'-t} \prod_{k=t}^{t'-1} f'_{j,k+1}$ to an appropriate range.
To maintain long-term memory in the network, the current state (time step t) must still be able to influence a future state (time step $t'$) after a large time interval, so the gradient at time $t'$ should also be propagated effectively back to time step t. By assuming a minimum effective gradient $\epsilon$, a range of recurrent weights of the IndRNN neurons that maintains long-term memory can be obtained. Specifically, to maintain memory over $t'-t$ time steps, Equation (10) gives

$$|w_j| \in \left[ \sqrt[t'-t]{\frac{\epsilon}{\prod_{k=t}^{t'-1} f'_{j,k+1}}},\ +\infty \right)$$

This constraint should be satisfied to avoid vanishing of the neuron's gradient. To avoid gradient explosion, the range needs to be further constrained to

$$|w_j| \in \left[ \sqrt[t'-t]{\frac{\epsilon}{\prod_{k=t}^{t'-1} f'_{j,k+1}}},\ \sqrt[t'-t]{\frac{\gamma}{\prod_{k=t}^{t'-1} f'_{j,k+1}}} \right]$$

where γ is the maximum gradient value without explosion. For commonly used activation functions such as ReLU and tanh, the derivatives are not greater than 1, i.e., $|f'_{j,k+1}| \le 1$. For ReLU in particular, the gradient is either 0 or 1. Considering that short-term memory is also important for network performance, the constraint on the recurrent weight range with the ReLU activation function can be relaxed to $|w_j| \in \left[0, \sqrt[t'-t]{\gamma}\right]$. When the recurrent weight is 0, the neuron uses only information from the current input and retains no memory of the past. In this way, different neurons can learn to keep memories of different lengths.
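The relaxed ReLU constraint on the recurrent weights lends itself to a small numerical check. The sketch below is an illustration under assumptions: the value of γ, the number of time steps, and the helper names `recurrent_weight_bound` and `clip_recurrent_weights` are chosen for the example, and the bound is applied to the magnitude of each weight. It computes the bound as the (t'-t)-th root of γ, clips a weight vector to it, and evaluates the gradient factor for the case where the ReLU derivative is 1 at every step.

```python
import numpy as np

def recurrent_weight_bound(gamma, steps):
    """Upper bound gamma**(1/steps) on |w_j| for ReLU over `steps` time steps."""
    return gamma ** (1.0 / steps)

def clip_recurrent_weights(w, gamma, steps):
    """Clip every recurrent weight into [-bound, bound]."""
    bound = recurrent_weight_bound(gamma, steps)
    return np.clip(w, -bound, bound)

steps = 1000   # t' - t, the span over which the gradient is propagated
gamma = 2.0    # assumed largest gradient magnitude regarded as non-exploding
bound = recurrent_weight_bound(gamma, steps)

w = np.array([0.0, 0.5, 0.999, 1.0, 1.01])
w_clipped = clip_recurrent_weights(w, gamma, steps)

# With ReLU active at every step (derivative 1), the gradient factor is w**steps.
unclipped_factor = 1.01 ** steps
factors = w_clipped ** steps
```

A weight only one percent above 1 already produces a very large factor over 1000 steps, while after clipping no factor exceeds γ. Small weights such as 0.5 give a vanishing factor, which corresponds to neurons that keep only short-term memory.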

Experiments to Process Long Sequences
Task description: The input consists of two sequences. The first is a sequence of numbers sampled uniformly from (0, 1). The second is a sequence of equal length in which exactly two entries are 1 and the rest are 0. The required output is the sum of the two numbers in the first sequence at the positions marked by the two 1s in the second sequence. This experiment tests whether the model has long-term memory capacity [15]. The experimental sequence lengths were 100, 1000, 2000, and 5000, with MSE as the objective function.
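The data for this task can be generated as in the following sketch. The batch layout, the function name `make_adding_batch`, and the way the two marked positions are drawn are assumptions made for illustration; note also that NumPy samples from the half-open interval [0, 1).

```python
import numpy as np

def make_adding_batch(batch_size, seq_len, rng):
    """Return inputs of shape (batch, seq_len, 2) and targets of shape (batch,)."""
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    for b in range(batch_size):
        # Choose two distinct positions to mark with 1.
        i, j = rng.choice(seq_len, size=2, replace=False)
        markers[b, i] = 1.0
        markers[b, j] = 1.0
    inputs = np.stack([values, markers], axis=-1)
    targets = np.sum(values * markers, axis=1)   # sum of the two marked values
    return inputs, targets

rng = np.random.default_rng(0)
inputs, targets = make_adding_batch(batch_size=8, seq_len=100, rng=rng)
```

Because the two marked positions can be arbitrarily far apart, a model has to carry the first marked value across up to `seq_len` steps, which is why longer sequences probe long-term memory.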
LSTM, a widely used improved RNN structure, is taken for comparison. In the experiment, the hidden part of both the LSTM and the IndRNN network models is a single layer containing 128 neurons. LSTM uses tanh as the activation function with an initial learning rate of 2 × 10^−3; IndRNN uses ReLU as the activation function with an initial learning rate of 2 × 10^−4. The experiment uses the mean square error (MSE) as the objective function and the Adam optimization method to update the network parameters during training. Both training data and testing data were randomly generated throughout the experiment.
The results are shown in Figure 3a-d. First, for short sequences (T = 100), both models perform well and converge to very small errors. When the sequence length is increased to 1000, the LSTM is no longer able to minimize the error, whereas the IndRNN model still converges very quickly. We also tested the IndRNN model on sequences of length 2000 and 5000. The results are shown in Figure 3c,d, and the IndRNN still converges well. This illustrates that IndRNN can use the ReLU activation function to effectively address the gradient explosion and vanishing problem over time, making training efficient and maintaining long-term memory.

Structure of RRNN
The RRNN algorithm adds the historical information before the current state to the input space through a random weight matrix, as shown in Figure 4. The input layer in the figure consists of two parts: the data input at time step t, and the weighted mapping of all output data before time t. The rest of the network structure is consistent with the traditional RNN model.
In the input layer of RRNN, the historical output information is transmitted as a storage unit, together with the input data at time step t, to the hidden layer, which enhances the memory of the historical information. In the resulting expression for the input of the hidden layer at time step t, β is the weight that determines the proportion of historical information in the input space: the larger its value, the larger the weight of the historical information and the smaller the proportion of the input information at the current time. β can be determined empirically. σ(·) is a saturated nonlinear function, as used by traditional RNNs, that prevents the model from degenerating into a simple linear model and bounds the range of values of $x_t$. The input is $x_t \in \mathbb{R}^{I \times 1}$, and $W_i \in \mathbb{R}^{I \times O}$ is the weight matrix that maps the historical output $y_i$ to the input layer.
Consider the historical output information $y_i$, i = 1, 2, ..., t. Since the output of the network is not very reliable in the first few states and its noise is high, it may cause deviations after mapping to the input space. Therefore, we consider using a randomly generated weight matrix $W_i$ to cancel the noise; that is, the elements of each column of $W_i$ follow the distribution N(0, 1).
The advantage of using the random weight matrix $W_i$ is that, in a high-dimensional space, random weighting of the historical information $y_i$ pushes the input $x_t$ at time step t toward different directions in space, so that the combined $x_t$ data are more separable. According to the pseudo-orthogonal property of high-dimensional spaces [27], when the number of rows of the random weight matrix $W_i$ is large, the column vectors of $W_i$ are approximately orthogonal. In this paper, the length of the input sequence is large enough to satisfy this high-dimensional requirement, so the column vectors of $W_i$ can be regarded as orthogonal. The network output $y_i$ in this paper is the classification result, and the orthogonal column vectors of $W_i$ weight the corresponding $y_i$, which makes the combined $x_t$ tend toward different directions and thereby facilitates the subsequent classification.
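The pseudo-orthogonality of random Gaussian columns can be checked numerically. The following sketch is only an illustration: the matrix sizes, the seed, and the helper name `max_abs_cosine` are assumptions and are not taken from the experiments in this paper. It measures the largest absolute cosine between any two distinct columns of a matrix with N(0, 1) entries, once with few rows and once with many rows.

```python
import numpy as np

def max_abs_cosine(W):
    """Largest |cosine| between any two distinct columns of W."""
    unit = W / np.linalg.norm(W, axis=0, keepdims=True)  # normalize each column
    gram = unit.T @ unit                                 # pairwise cosines
    np.fill_diagonal(gram, 0.0)                          # ignore self-similarity
    return float(np.max(np.abs(gram)))

rng = np.random.default_rng(0)
n_classes = 4   # number of columns, standing in for the output dimension O
low = max_abs_cosine(rng.standard_normal((8, n_classes)))
high = max_abs_cosine(rng.standard_normal((5000, n_classes)))
```

For a matrix with I rows the cosine between two independent columns has a standard deviation of roughly 1/sqrt(I), so `high` is expected to be close to 0, whereas `low` is typically much larger.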
Since the RRNN structure only modifies the input layer of the traditional RNN structure, the network structure of the other layers remains unchanged. In the resulting calculation model of the RRNN network, Equation (14), I, H, and O are the numbers of nodes in the input layer, the hidden layer, and the output layer, respectively; $a_t$ and $h_t$ represent the input and output of the hidden layer at time step t; and $b_t$ and $y_t$ represent the input and output of the output layer at time step t. $W^{HI}$, $W^{HH}$, and $W^{OH}$ are the weight matrices between the network layers; $B^H$ and $B^O$ are the bias parameters of the hidden layer and the output layer; $W_i$ is the random matrix; and $f_h(\cdot)$ and $f_o(\cdot)$ are the activation functions of the hidden layer and the output layer, respectively. Since the network structure is similar to that of traditional RNNs, RRNN training uses gradient descent with momentum.
We choose the random weight matrix to process the historical output information. The basis and advantages are:
(1) The historical output information $y_1, y_2, \ldots, y_{t-1}$ before time step t is fully used, and the randomly weighted history information $\sum_{i=1}^{t-1} W_i y_i$ is approximately uncorrelated with the input data $x_t$ at time t.
(2) The parameters in the randomly generated weight matrix W can reduce over-fitting, similar to adding random noise to the input data to improve the generalization of the classification network [28].
(3) The randomly generated weight matrix W does not need to be learned, which omits the complicated steps of calculating the gradient of W and back-propagating it. Compared with traditional RNNs, the training difficulty is not increased, and the classification performance is improved.
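A minimal sketch of how the randomly weighted history could enter the input layer is given below. The exact combination formula is not reproduced here, so the additive form tanh(x_t + β Σ W_i y_i) is an assumption made for illustration, as are the function name `rrnn_input`, the sizes, the value of β, and the one-hot stand-ins for past classification outputs.

```python
import numpy as np

def rrnn_input(x_t, past_outputs, random_mats, beta):
    """Combine x_t with randomly weighted past outputs (assumed additive form)."""
    history = np.zeros_like(x_t)
    for W_i, y_i in zip(random_mats, past_outputs):
        history += W_i @ y_i          # map each past output into the input space
    return np.tanh(x_t + beta * history)   # saturating function bounds the result

rng = np.random.default_rng(0)
I, O, t = 16, 4, 5                    # input size, output size, current time step
x_t = rng.normal(size=I)
# One-hot vectors standing in for the network outputs y_1 .. y_{t-1}.
past_outputs = [np.eye(O)[rng.integers(O)] for _ in range(t - 1)]
# Fixed random matrices W_i with N(0, 1) entries; they are never trained.
random_mats = [rng.standard_normal((I, O)) for _ in range(t - 1)]

x_aug = rrnn_input(x_t, past_outputs, random_mats, beta=0.1)
x_plain = rrnn_input(x_t, past_outputs, random_mats, beta=0.0)
```

Because the matrices are drawn once and kept fixed, no gradient with respect to them is needed, which matches point (3) above; setting β to 0 recovers the input of an ordinary RNN passed through the same nonlinearity.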

IRRNN Overall Structure and Algorithm
To match the characteristics of the sample dataset and the classification requirements of the targets, this paper proposes an IRRNN model. On the one hand, to ensure that the periodic features of the sample sequence are preserved and not destroyed by truncation, we use the IndRNN structure, which can process longer sequences than the traditional RNN structure. On the other hand, to improve the classification performance of the RNN model, we use the RRNN model to map historical output information to the input layer through a random weight matrix.
Spatial point targets in the exo-atmosphere generally exhibit micro-motions such as precession or tumbling, so the infrared radiation intensity sequences are periodic, and the different shapes and micro-motion features are fully embodied in the sequences, which makes them hard to extract with traditional feature extraction methods. Therefore, the RNN model is suitable for the classification of target infrared radiation intensity sequences. The length of each input sequence determines whether the feature information can be completely input to the neural network. IndRNN makes training simpler and more efficient because of its independent structure. At the same time, it addresses the problem of gradient explosion and vanishing and can accept longer input sequences. Therefore, we use the IndRNN structure so that the periodic characteristic information of the sequence is not lost and can be used for subsequent classification.
The infrared radiation intensity sequence of the spatial point target is highly correlated in time, and the information before time step t is an important reference for classification at the current time. Therefore, we combine the historical output information before time step t, through random weighting, with the input data at time t and feed the result to the hidden layer for further processing.
The structure of the IRRNN is shown in Figure 5. First, the historical output information is weighted by the random weight matrix W and combined with the input data of the current time to form the new input data, which enters the hidden layer. The neurons of the hidden layer are independent of each other, so training is more efficient and stable and can converge effectively when the input sequence is long. To reflect the structure of the IndRNN, the connection of the hidden layer is represented by the symbol of the Hadamard product, $\odot$, and ReLU is used as the activation function.
Since the IRRNN only modifies the structure of the input layer and the connection mode of the hidden layer of the traditional RNN, the rest of the network structure remains unchanged, so the calculation model of the IRRNN network can be obtained from Equations (8) and (14). In this model, $\odot$ denotes the Hadamard product, and the other parameters are the same as in Equation (14). $f_h(\cdot)$ and $f_o(\cdot)$ are the activation functions of the hidden layer and the output layer, respectively. Because of the IndRNN structure, $f_h(\cdot)$ can be an unsaturated nonlinear function, so we choose ReLU as the activation function, while $f_o(\cdot)$ remains Softmax.
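Putting the two ingredients together, a forward pass of an IRRNN-style network might look like the sketch below. It is a hedged illustration rather than the authors' model: the additive input mixing with tanh, the parameter names, the sizes, and the random initialization are all assumptions, while the ReLU hidden activation, the element-wise recurrent weight, and the Softmax output follow the description above.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def irrnn_forward(xs, W_in, w_rec, b_h, W_out, b_o, random_mats, beta):
    """Run an IRRNN-style forward pass; return per-step class probabilities."""
    h = np.zeros(w_rec.shape[0])
    history = np.zeros(xs.shape[1])        # running sum of W_i y_i over past steps
    ys = []
    for t, x_t in enumerate(xs):
        x_aug = np.tanh(x_t + beta * history)                # assumed input mixing
        h = np.maximum(0.0, W_in @ x_aug + w_rec * h + b_h)  # independent neurons, ReLU
        y = softmax(W_out @ h + b_o)                         # output layer, Softmax
        history += random_mats[t] @ y      # becomes part of the input at later steps
        ys.append(y)
    return np.array(ys)

rng = np.random.default_rng(0)
T, I, H, O = 20, 8, 16, 4                  # sequence length and layer sizes (illustrative)
xs = rng.normal(size=(T, I))
params = dict(
    W_in=rng.normal(scale=0.3, size=(H, I)),
    w_rec=rng.uniform(0.0, 1.0, size=H),   # one recurrent weight per hidden neuron
    b_h=np.zeros(H),
    W_out=rng.normal(scale=0.3, size=(O, H)),
    b_o=np.zeros(O),
    random_mats=rng.standard_normal((T, I, O)),  # fixed random W_i, never trained
    beta=0.05,
)
probs = irrnn_forward(xs, **params)
```

Only `W_in`, `w_rec`, `W_out`, and the biases would be updated during training; the random matrices stay fixed, so the sketch has the same number of trainable parameters as an IndRNN of the same size.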
Based on the above analysis of the IRRNN network structure, we use cross entropy as the loss function for classifying the infrared radiation intensity time series of spatial point targets, and we use gradient descent with momentum to update the network parameters during training. The specific training process of the IRRNN algorithm is shown in Algorithm 1.
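The two training ingredients named above, cross entropy and gradient descent with momentum, can be sketched in a few lines of Python. This is a generic illustration rather than the paper's Algorithm 1; the learning rate `lr`, the momentum coefficient `beta`, and the toy quadratic objective are hypothetical choices.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under a softmax output."""
    return -math.log(max(probs[label], 1e-12))

def momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    """Classical momentum: v <- beta * v - lr * g, then w <- w + v (in place)."""
    for i, g in enumerate(grads):
        velocity[i] = beta * velocity[i] - lr * g
        params[i] += velocity[i]
    return params, velocity

# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = [0.0], [0.0]
for _ in range(200):
    grad = [2.0 * (w[0] - 3.0)]
    momentum_step(w, grad, v, lr=0.05, beta=0.9)

uniform_loss = cross_entropy([0.25, 0.25, 0.25, 0.25], 2)
```

For a four-class problem, a uniform prediction gives a loss of ln 4, which is a convenient baseline when checking that training is making progress.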
For the targets studied in this paper, the periodicity and continuity of the target motion mean that the sample data both before and after the current time step t are important for the classification decision at the current time. Therefore, this paper proposes a bidirectional IRRNN model, which changes the original forward-only transmission of sample information into a bidirectional network structure with both forward and reverse transmission. The sample information before and after time step t can then be used in the decision on the current state.
The network structure of the bidirectional IRRNN (B-IRRNN) is shown in Figure 6. There are two hidden layers that are independent of each other, and the input data is processed simultaneously in the forward and backward directions. The output is the weighted sum of the results in the two directions. The model can be seen as a combination of two unidirectional RNNs in which the hidden layer transmission of one network is the reverse of the other, and the output is determined jointly by the two networks.
As shown in Figure 6, in the forward transmission layer, the hidden layer state $\overrightarrow{h}_t$ is recursively calculated from t = 1 to T, and the corresponding output is $\overrightarrow{y}_t$. In the backward transmission layer, the hidden layer state $\overleftarrow{h}_t$ is recursively calculated in reverse from t = T to 1, and the corresponding output is $\overleftarrow{y}_t$. The final output is the weighted sum of the two output values. Therefore, the output of the B-IRRNN model is:
$$y_t = \alpha_1 \overrightarrow{y}_t + \alpha_2 \overleftarrow{y}_t \tag{18}$$
where $\alpha_1$ and $\alpha_2$ are the weighting coefficients of the output, and $\alpha_1 + \alpha_2 = 1$ is required. Considering that the information before and after the current time is equally important in the infrared radiation intensity time series of the target, equal weighting is used here, so $\alpha_1 = \alpha_2 = 0.5$.
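The bidirectional combination can be sketched as follows, assuming each direction is available as a function that maps an input sequence to a list of per-step output vectors. The function names and the running-mean stand-in network are hypothetical and serve only to make the alignment of the two directions concrete.

```python
def bidirectional_outputs(xs, run_forward, run_backward, alpha1=0.5, alpha2=0.5):
    """Weighted sum of forward and backward outputs at every time step."""
    if abs(alpha1 + alpha2 - 1.0) > 1e-12:
        raise ValueError("alpha1 + alpha2 must equal 1")
    fwd = run_forward(xs)  # outputs for t = 1..T
    # The backward layer reads the sequence from t = T to 1; its outputs are
    # reversed again so that index t refers to the same time step in both lists.
    bwd = run_backward(xs[::-1])[::-1]
    return [
        [alpha1 * f + alpha2 * b for f, b in zip(y_f, y_b)]
        for y_f, y_b in zip(fwd, bwd)
    ]

def running_mean(xs):
    """Stand-in for a unidirectional network: emits the mean of the inputs seen so far."""
    total, outputs = 0.0, []
    for i, x in enumerate(xs, start=1):
        total += x
        outputs.append([total / i])
    return outputs

combined = bidirectional_outputs([1.0, 2.0, 3.0], running_mean, running_mean)
```

With the stand-in network, the forward outputs are 1.0, 1.5, 2.0 and the re-aligned backward outputs are 2.0, 2.5, 3.0, so each combined output mixes information from before and after the current step.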

Experiments and Discussion
The purpose of this section is to evaluate the performance of the proposed IRRNN algorithm through several sets of experiments. First, the classification performance of the IRRNN algorithm on the UCR data set is tested. Then, the classification performance of the IRRNN model and its extended form B-IRRNN on the infrared radiation intensity time series of spatial targets is discussed.
Because the space infrared sensor may be obscured and data may be missing, the target sample sequence is randomly acquired between 120 s and 300 s, and each intercepted sequence is 30 s long. The classification performance of the algorithms is observed at 8 s, 16 s, and 24 s from the start point of each sequence. The experiment is divided into three groups, which test the classification algorithms with beginning times $t_{beg}$ of 150, 200, and 250 s. The signal-to-noise ratio (SNR) of the acquired sample data is set to 20 dB. The simulated data comprise 2000 infrared radiation intensity sequences, 500 for each of the four target types. Samples are randomly assigned to the training, validation, and testing sets in the ratio 2:1:1, giving 1000 training samples and 500 samples each for validation and testing.
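The 2:1:1 random assignment described above can be sketched as follows. The placeholder sample identifiers and the fixed seed are hypothetical; the sketch shuffles the whole data set, as the text describes, and does not stratify by class.

```python
import random

def split_2_1_1(samples, seed=0):
    """Randomly split samples into training, validation, and testing sets (2:1:1)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_val = len(shuffled) // 4
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train, val, test_set

# 4 target classes with 500 placeholder sequences each, 2000 samples in total.
samples = [("seq_%d_%d" % (c, i), c) for c in range(4) for i in range(500)]
train, val, test_set = split_2_1_1(samples)
```

A class-stratified split would keep exactly 250 training samples per class; whether the original experiments stratified is not stated in the text.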
In this experiment, a separate IndRNN structure (2 layers) and a separate RRNN structure were added for comparison with the IRRNN structure and its bidirectional extension B-IRRNN. The purpose is to compare the effects of these two structures on classification performance. Traditional RNNs were used as a reference.
From the classification performance in the three groups of experiments shown in Tables 3-5, the following conclusions can be drawn:
(1) The classification accuracy of the traditional RNNs algorithm improves as the sequence length increases, because the RNN can store the information of previous states and accumulate history as the sequence grows, which improves the accuracy of time series classification. However, although traditional RNNs have a time-delay memory ability, the problem of degradation in parameter learning still exists. Only sequence information within a local time window can be learned, and the long-term dependence of the sequence cannot be learned.
(2) IndRNN and RRNN have more advanced structures than traditional RNNs, and their classification performance is greatly improved compared with RNNs. The independent structure of IndRNN solves the problem of gradient vanishing and explosion and can learn the long-term dependencies of sequences. In the RRNN structure, the historical output information is randomly weighted so that the new input data tends toward a form that is easier to classify.
(3) The IRRNN algorithm proposed in this paper classifies better than either the IndRNN algorithm or the RRNN algorithm alone. By combining the advantages of both, it significantly enhances classification performance.
(4) The B-IRRNN model obtained the best classification performance at all observation times, and its classification accuracy was higher than that of the unidirectional IRRNN model. Because the B-IRRNN model has both a forward-propagation and a backward-propagation hidden layer, it can use past and future sequence information simultaneously to support the classification decision at the current time, which gives it an advantage over the unidirectional IRRNN model in time series classification.
From the noise and sequence length experiments shown in Figure 7, the following conclusions can be drawn:
(1) As the SNR increases, the classification accuracy of every algorithm clearly improves, indicating that noise is an important factor affecting the classification of spatial point targets based on radiation intensity sequences. Therefore, the selected classification algorithm must be robust to noise. Compared with traditional RNNs and LSTM, the proposed IRRNN and B-IRRNN algorithms have obvious advantages: even at high noise levels they classify more stably, which demonstrates their robustness to noise.
(2) Comparing Figure 7a,b, it can be seen that the classification accuracy of the four algorithms increases steadily with the sequence length, because a longer sequence contains more periodic information about the target motion, which favors classification. Figure 7c clearly shows that once the sequence reaches a certain length, traditional RNNs and the LSTM can no longer classify the sequence effectively. This is because the problems of gradient vanishing and gradient explosion still exist, and their structure and activation functions prevent them from processing long sequences. Here the advantages of the IRRNN and B-IRRNN algorithms become apparent: their ability to process long sequences is strong, and their classification accuracy does not fluctuate greatly with the sequence length but remains at a relatively stable high level and improves as the sequence length increases.
In summary, the IRRNN algorithm can process long sequences due to the independent structure of its hidden layer, which is beneficial to capturing the long-term periodic features of spatial point targets. At the same time, the historical information is introduced into the input through random weighting, which effectively improves the comprehensive classification performance of the algorithm and enhances the generalization capabilities of the network. In addition, the independent structure of the IRRNN algorithm simplifies the network parameters and calculations, and the random weighting of historical information does not increase the learning complexity of the network. Therefore, the IRRNN algorithm can accomplish the classification easily and efficiently. For the infrared radiation intensity time series classification task of the spatial point target studied in this paper, IRRNN can process long time sequences, achieve stable classification accuracy, be robust to noise, and output classification results in real time, which is in accordance with the classification task requirements.

Conclusions
This paper proposes a time series classification model based on IRRNN. The infrared signature model of spatial point targets was constructed first, and samples of infrared radiation intensity sequences were obtained from it. Our model improves the ability to avoid gradient vanishing and explosion, to process long sequences, and to classify effectively. In addition, a bidirectional extension of IRRNN was developed to obtain better classification performance. Experiments show that our algorithm achieves higher classification accuracy than RNNs and LSTM under various sequence lengths and noise levels. The proposed IRRNN model can effectively solve the problem of infrared radiation intensity time series classification of spatial point targets.

Figure 1. Geometry of IR sensor and a target with nutation, where the IR detection coordinates are parallel to the reference coordinates.

i
is the cosine of the angle between the vector →
Appl. Sci. 2019, 9, x FOR PEER REVIEW

Figure 2. Infrared radiation intensity sequence of four shapes of spatial targets.

Figure 3. MSE of LSTM and IndRNN with different sequence lengths.

Figure 4. The structure of RRNN unfolded by time.
$\sigma(\cdot)$ is a saturated nonlinear function used by traditional RNNs to avoid degradation of the model into a simple linear model and to define the range of values of $x_t$.
matrix of historical information $y_i$ mapped to the input layer.


Figure 5. Structure of IRRNN model unfolded by time.

Figure 6. Structure of B-IRRNN model unfolded by time.
In the training process of B-IRRNN, the parameters of both the forward and backward layers are trained, and the memory required is about twice that of the unidirectional IRRNN network.

4.2.2. Effect of Noise and Sequence Length
This experiment tests the classification performance of four algorithms (RNNs, LSTM, IRRNN, and B-IRRNN) under different noise levels and sequence lengths. All algorithms are trained at the same signal-to-noise ratio, SNR = 20 dB, and the same sequence length, L = 400. The classification performance on the infrared radiation intensity time series of spatial point targets is then tested at different SNR levels (5, 10, 15, 20, 25, and 30 dB) and different input sequence lengths (200, 400, 600, 800, and 1000). Ten independent Monte Carlo simulation experiments were carried out, and the mean classification accuracy of each algorithm was taken as the final result. The results are shown in Figure 7.
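One common way to produce test sequences at a prescribed SNR is to add zero-mean white Gaussian noise scaled to the mean signal power. The sketch below assumes that noise model and that SNR definition, neither of which is confirmed by the text, and it uses a toy periodic signal in place of a simulated radiation intensity sequence.

```python
import math
import random

def add_awgn(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the mean signal power over noise power equals snr_db."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def empirical_snr_db(clean, noisy):
    """Measure the SNR actually realized in a noisy copy of a clean sequence."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

# Toy periodic sequence standing in for an intensity sequence (hypothetical values).
clean = [1.0 + 0.5 * math.sin(2.0 * math.pi * t / 50.0) for t in range(5000)]
noisy = add_awgn(clean, snr_db=20.0)
realized = empirical_snr_db(clean, noisy)
```

Repeating this with a different seed for each Monte Carlo run and averaging the resulting accuracies mirrors the evaluation protocol described above.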

Figure 7. Classification performance of four algorithms under different SNRs and sequence lengths.

Table 1. Simulation parameters for four classes of spatial targets.

Table 3. Classification accuracy of four algorithms when $t_{beg}$ = 150 s.

Table 4. Classification accuracy of four algorithms when $t_{beg}$ = 200 s.

Table 5. Classification accuracy of four algorithms when $t_{beg}$ = 250 s.