A machine learning and feature engineering approach for the prediction of the uncontrolled re-entry of space objects

The continuously growing number of objects orbiting the Earth is expected to be accompanied by an increasing frequency of objects re-entering the Earth's atmosphere. Many of these re-entries will be uncontrolled, making their prediction challenging and subject to several uncertainties. Traditionally, re-entry predictions are based on the propagation of the object's dynamics using state-of-the-art modelling techniques for the forces acting on the object. However, modelling errors, particularly those related to the prediction of atmospheric drag, may result in poor prediction accuracy. In this context, we explore the possibility of a paradigm shift, from a physics-based approach to a data-driven approach. To this aim, we present the development of a deep learning model for the re-entry prediction of uncontrolled objects in Low Earth Orbit (LEO). The model is based on a modified version of the Sequence-to-Sequence architecture and is trained on the average altitude profile as derived from a set of Two-Line Element (TLE) data of over 400 bodies. The novelty of the work lies in introducing into the deep learning model, alongside the average altitude, three new input features: a drag-like coefficient (B*), the average solar index, and the area-to-mass ratio of the object. The developed model is tested on a set of objects studied in the Inter-Agency Space Debris Coordination Committee (IADC) campaigns. The results show that the best performance is obtained on bodies characterised by the same drag-like coefficient and eccentricity distribution as the training set.


Introduction
From the launch of the first artificial satellite on the 4th of October 1957, the population of bodies orbiting the Earth has been continuously growing with an exponential trend [1]. Furthermore, in recent years, this growth has been exacerbated by the creation of large constellations for telecommunication, navigation, and global internet coverage. Post-mission disposal guidelines [2,3] have been introduced to preserve the Low Earth Orbit (LEO) region and to incentivise the early disposal of satellites, so that they do not further affect the already polluted orbital environment. Focusing on the re-entry problem, as of March 2022, a total of 26135 bodies have re-entered the atmosphere (www.space-track.org). According to Pardini et al. [5], since April 2013, only one out of five re-entries of intact objects was controlled, corresponding to a total mass of roughly 11000 metric tonnes, mainly related to rocket bodies and spacecraft. The continuously growing number of orbiting objects is therefore expected to be accompanied by an increase in the re-entries of both controlled and uncontrolled objects. However, predicting the re-entry epoch and location of an uncontrolled body remains critical, because the prediction is affected by several uncertainties, which translate into risks of casualties on the ground [5].
In the typical dynamics-based approach to the re-entry prediction problem, the trajectory of the spacecraft is propagated until it reaches the altitude at which break-up occurs. This approach consists of determining the initial position of the object and accurately modelling the forces acting on it in order to predict the evolution of its trajectory. However, in general, it is difficult to accurately model the forces acting on the spacecraft and to precisely know its physical characteristics and state. In fact, a relative uncertainty of ±20% is typically recommended for re-entry predictions [6,7]. Pardini et al. [5] give an in-depth review of the different sources of uncertainty in re-entry predictions, the most relevant being the modelling of the drag acceleration. Other forces affect the re-entry dynamics, such as solar radiation pressure and lunisolar attraction; however, in this work, we focus on the final stage of the re-entry of low-eccentricity orbits, where the dominant force is atmospheric drag and the main uncertainties are related to the atmospheric density. The density is modelled through empirical models that describe its variations on spatial and temporal scales [8][9][10][11][12]. These models are affected by two types of uncertainties [11]: the simplified physical modelling on which they are based, and the uncertainties and complexities in predicting the space weather, in particular the solar index. Another important variable for the prediction of low-eccentricity orbits is the ballistic coefficient of the object of interest. The ballistic coefficient depends on the object's shape, orientation, and attitude, which can be difficult to determine accurately for uncontrolled objects.
In this circumstance, it becomes interesting to adopt a paradigm shift, modelling the problem from a data-driven rather than a physical point of view. In this context, machine learning represents a new approach: instead of accurately modelling the space environment, its forces, and how the spacecraft interacts with them, the model autonomously builds its knowledge from data, dealing on its own with the complexities and uncertainties of the phenomenon. Jung et al. [13] proposed a machine learning approach to predict the re-entry epoch of an uncontrolled satellite based on Recurrent Neural Networks (RNNs). In their work, the RNN is trained on a set of Two-Line Element (TLE) data, using as the only feature the evolution of the average altitude of the object. TLE sets provide, among other information, the time evolution of the position of catalogued objects in orbit. However, as also addressed by Lidtke et al. [14], TLEs suffer from some inaccuracies that must be taken into account. Most notably, TLEs of an object are not homogeneous; they can be provided at irregular time steps and contain incorrect observations or observations belonging to other objects (outliers). In addition, TLEs do not directly provide information on the physical characteristics of the object (e.g., the ballistic coefficient); rather, they combine all the uncertainties and modelling errors in a single drag-like coefficient, referred to as B* [9,10]. Finally, TLEs do not contain accuracy information on the measurements they provide; that is, they do not provide covariance information. However, TLE sets are currently the only public source of data that can be used for re-entry predictions. Therefore, despite their limitations and inaccuracies, they are used in this work to assess the capabilities of a machine learning approach using multiple input features.
As shown in [15], initial covariance information can be estimated using pseudo-observations, and the accuracy of TLE data can be improved as shown in [16,17]. While in this work we focus on the application of a machine learning technique to raw TLE data, future work can benefit from the inclusion of such information to increase the accuracy of the re-entry predictions.
Starting from the work of Jung et al. [13], we expand the machine learning model by introducing additional features, namely the drag-like coefficient B*, the solar index, and the area-to-mass ratio of the object. The drag-like coefficient and the area-to-mass ratio allow the machine learning model to account for the physical characteristics of the objects, while the solar index gives a temporal dimension to the model, accounting for the different solar activity levels and how they affect the lifetime of an object. The developed model is trained on hundreds of TLE sets belonging to uncontrolled objects such as spent payloads, rocket bodies, and debris. Consequently, the same model can be tested on different objects, in order to obtain the output sequence and assess its accuracy.
The paper is structured as follows. Section 2 describes the deep learning model architecture adopted, Section 3 contains the procedure used for the pre-processing and filtering of the data required by the deep learning model, Section 4 contains an analysis of the characteristics of the input features of the model, and Section 5 shows the results obtained by the model on selected test cases, together with the optimisation procedure of the model's hyperparameters.

Deep learning model
From a machine learning perspective, the task of predicting the re-entry of an object is a time-series problem: the aim of the model is to predict the evolution of the trajectory from an initial set of conditions so that the re-entry epoch can be determined. In other words, it is a regression problem. Time series are discrete-time data and can be considered a particular case of sequential data, where each value is associated with a time stamp. Therefore, the order of the series is imposed by the time dimension and must be preserved. Let us denote an input sequence of length $T_x$ as [18]:

$$X = \left( x^{<1>}, x^{<2>}, \dots, x^{<T_x>} \right), \quad (1)$$

where the vector $x^{<t>} \in \mathbb{R}^p$ contains the input features at time step $t$. Furthermore, let us denote a target sequence of length $T_y$ as [18]:

$$Y = \left( y^{<1>}, y^{<2>}, \dots, y^{<T_y>} \right), \quad (2)$$

where $y^{<t>} \in \mathbb{R}$ is the target feature. We define the predicted sequence as [18]:

$$\hat{Y} = \left( \hat{y}^{<1>}, \hat{y}^{<2>}, \dots, \hat{y}^{<T_y>} \right). \quad (3)$$

The aim of the deep learning problem is to predict the target sequence by minimising a loss function, which in this work is the Mean Squared Error (MSE) [18]:

$$\mathcal{L}_{MSE} = \frac{1}{T_y} \sum_{t=1}^{T_y} \left( y^{<t>} - \hat{y}^{<t>} \right)^2. \quad (4)$$

The choice of the MSE is natural in this context given its similarity with the Euclidean distance. Indeed, the MSE can be intuitively considered as the squared distance between the true position and the predicted one, averaged over the number of observations in the output sequence.
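As an illustration, the sequence MSE of Eq. (4) can be computed with a few lines of NumPy; the residual-time values below are purely illustrative:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean Squared Error over an output sequence of length T_y."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative residual-time sequences [days] (hypothetical values)
y_true = np.array([3.0, 2.5, 2.0, 1.0, 0.0])
y_pred = np.array([3.1, 2.4, 2.1, 0.9, 0.1])
print(mse_loss(y_true, y_pred))  # ~0.01
```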

Sequence-to-Sequence architecture
In this work, we selected a Sequence-to-Sequence (Seq2Seq) [19] architecture to model the re-entry problem, due to its ability to deal with input and output sequences of different lengths. As shown in Fig. 1, a Seq2Seq architecture is composed of two neural networks, the encoder and the decoder, connected by the context vector. The encoder receives as input a sequence of arbitrary length, $T_x$, and generates a fixed-length vector, called the context vector, which summarises the information contained in the input sequence; the outputs of the encoder itself are simply discarded. At each time step, the cell produces the hidden state considering the input at the same time step and the hidden state at the previous time step [20]:

$$h^{<t>}_{Enc} = f\left( h^{<t-1>}_{Enc}, x^{<t>}; \theta \right), \quad (5)$$

where $h^{<t>}_{Enc}$ is the hidden state of the encoder at time $t$, $h^{<t-1>}_{Enc}$ is the hidden state at the previous time step, $x^{<t>}$ is the input vector at the current time, and $\theta$ is the weights vector. The function $f$ represents a neural network, which can be a simple Recurrent Neural Network (RNN) [21,22], a more complex Gated Recurrent Unit (GRU) [20,21,23], or a Long Short-Term Memory (LSTM) [20,21] network. At the final time step, $T_x$, the hidden state of the encoder defines the context vector:

$$c_{context} = h^{<T_x>}_{Enc}. \quad (6)$$

The decoder is initialised with the final state of the encoder, $h^{<t=0>}_{Dec} = c_{context}$. The aim of the decoder is to generate the output sequence based on this encoded vector. At each time step $t \in [1, T_y]$, given the hidden state and the output at the previous time step $t-1$, the decoder hidden state can be defined as follows [20]:

$$h^{<t>}_{Dec} = g\left( h^{<t-1>}_{Dec}, \hat{y}^{<t-1>}; \theta \right), \quad (7)$$

where again $g(\cdot)$ can be a simple RNN, a GRU, or an LSTM. Subsequently, it is possible to obtain the output $\hat{y}^{<t>}$ by passing the corresponding output of the decoder to an output layer. Note that the hidden state at time $t = T_y$ is simply discarded.
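The encoder-decoder information flow can be sketched in plain NumPy as follows; for brevity, a simple tanh RNN cell stands in for the GRU, and all dimensions, weights, and inputs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, Tx, Ty = 4, 8, 10, 5            # features, hidden size, sequence lengths

# Illustrative random weights for a simple tanh RNN cell (GRU stand-in)
Wx = rng.normal(0.0, 0.1, (k, p))     # input-to-hidden
Wh = rng.normal(0.0, 0.1, (k, k))     # hidden-to-hidden
Wy = rng.normal(0.0, 0.1, (1, k))     # output layer

def cell(h_prev, x):
    """h<t> = f(h<t-1>, x<t>; theta) for a simple RNN cell."""
    return np.tanh(Wx @ x + Wh @ h_prev)

# Encoder: consume the whole input sequence; its outputs are discarded
X = rng.normal(size=(Tx, p))
h = np.zeros(k)
for t in range(Tx):
    h = cell(h, X[t])
c_context = h                          # context vector = final encoder state

# Decoder: initialised with the context vector, generates Ty outputs
h_dec, y_prev, outputs = c_context, np.zeros(p), []
for t in range(Ty):
    h_dec = cell(h_dec, y_prev)
    y_hat = (Wy @ h_dec).item()        # scalar prediction from the output layer
    outputs.append(y_hat)
    y_prev = np.full(p, y_hat)         # feed the prediction back (inference mode)

print(len(outputs))                    # 5
```

The feedback of the previous prediction into the decoder is the open-loop (inference) mode discussed in the next paragraph.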
During training, the decoder receives its own previous hidden state and either the true output or the predicted output at the previous time step, to generate a new state and output sequentially. When only the true output is provided to the decoder, the training method is referred to as Teacher Forcing [21]: the predicted output of the model is substituted with the reference data, which is considered the ground truth. However, this method introduces a discrepancy between the training phase and the inference phase, in which the network is used in open loop; this can introduce errors in the prediction, because the network never learns from its own mistakes. To mitigate this problem, Bengio et al. [24] proposed an approach called Curriculum Learning. It consists of randomly selecting either the previous true output or the model estimate, with a probability that depends on the training epoch. Intuitively, at the initial epochs of training, the model should be trained with teacher forcing, since it is not yet well trained and errors are likely. At the final epochs, instead, the model is expected to be well trained, so that it is possible to use the model output, as during inference. Specifically, for every output $y^{<t>}$ of every batch at the $j$-th epoch, a coin is flipped to decide whether to feed the previous true output, with probability $\epsilon_j$, or the model estimate, with probability $(1 - \epsilon_j)$. Given that in this context the size of the dataset is quite small and the number of training epochs is large, the probability is set as a function of the training epoch, instead of the iteration [24], as:

$$\epsilon_j = k^{\,j}, \quad (8)$$

where $k < 1$ is a constant that depends on the desired speed of convergence, i.e., the decay rate of the scheduled sampling.
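A minimal sketch of the sampling step, assuming an exponential decay of the teacher-forcing probability with the epoch (the constant k and the epoch values below are illustrative):

```python
import random

def choose_decoder_input(y_true_prev, y_pred_prev, epoch, k=0.99, rng=random):
    """Coin flip of the scheduled-sampling step: with probability
    eps_j = k**epoch feed the ground truth (teacher forcing), otherwise
    feed the model's own previous estimate."""
    eps_j = k ** epoch
    return y_true_prev if rng.random() < eps_j else y_pred_prev

random.seed(42)
# Early in training (epoch 0): eps_0 = 1, the decoder always sees the truth
assert choose_decoder_input("truth", "model", epoch=0) == "truth"
# Late in training eps_j is small and the model estimate dominates
late = [choose_decoder_input("truth", "model", epoch=500) for _ in range(1000)]
print(late.count("model") / len(late))  # close to 1 - 0.99**500, i.e. ~0.99
```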
The representation of the Seq2Seq model during training is provided in Fig. 1, where $x^{<t>}$ denotes each time component of the input sequence, $\hat{y}^{<t>}$ the predicted output, and $y^{<t>}$ the ground truth.

Gated Recurrent Unit
In this work, both the encoder and the decoder are based on Gated Recurrent Units (GRUs) [23], given their capability of dealing with the vanishing gradient problem that characterises simple RNNs trained with long sequences. The GRU can be seen as a simplified version of the LSTM; it has the advantage of a reduced computational cost while providing comparable performance [25]. At each time step, the GRU computes:

$$z^{<t>} = \sigma\left( W_z x^{<t>} + V_z h^{<t-1>} + b_z \right), \quad (9)$$
$$r^{<t>} = \sigma\left( W_r x^{<t>} + V_r h^{<t-1>} + b_r \right), \quad (10)$$
$$\tilde{c}^{<t>} = \tanh\left( W_c x^{<t>} + V_c \left( r^{<t>} \odot h^{<t-1>} \right) + b_c \right), \quad (11)$$
$$c^{<t>} = z^{<t>} \odot c^{<t-1>} + \left( 1 - z^{<t>} \right) \odot \tilde{c}^{<t>}, \quad (12)$$

where $W_z, W_r, W_c \in \mathbb{R}^{k \times p}$ denote the input weight matrices; $V_z, V_r, V_c \in \mathbb{R}^{k \times k}$ are the hidden state weights; $b_z, b_r, b_c \in \mathbb{R}^k$ are the bias vectors; $\odot$ is the element-wise product; and $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid function and the hyperbolic tangent function, respectively. The variables $z^{<t>} \in \mathbb{R}^k$ and $r^{<t>} \in \mathbb{R}^k$ represent the update and reset gates, respectively, which help to tackle the vanishing gradient problem. Their aim is to control the quantity of information from past time steps that is carried forward. In particular, the update gate is responsible for updating the old state, by controlling the similarity between the new state and the past one. The reset gate is responsible for setting the amount of information that needs to be forgotten. For these reasons, the values of the gates are designed to be bounded between 0 and 1, by using the sigmoid function. The output hidden state is computed in two steps. First, the candidate memory cell $\tilde{c} \in \mathbb{R}^k$ is calculated according to Eq. (11). It takes as input the previous hidden state, multiplied by the reset gate, and the current input. Depending on the entries of the reset gate, two limit cases can be identified. When its values are close to unity, the information of the previous time step is carried forward in time, as in a simple RNN. Instead, when the values are close to 0, the information from the previous time step is ignored. After having derived the candidate memory cell, the memory cell $c \in \mathbb{R}^k$ can be computed according to Eq. (12).
The memory cell receives as input the current candidate and the previous memory cell, and it is governed by the update gate only. In this case, when the entries of the update gate are close to 0, the new memory cell is updated with the candidate, meaning that the information of the previous time step is dropped. Instead, when the values of the gate approach unity, the candidate is simply discarded and the memory cell maintains its state. Finally, the hidden state and the output of the GRU can be defined as, respectively:

$$h^{<t>} = c^{<t>}, \quad (13)$$
$$\hat{y}^{<t>} = h^{<t>}. \quad (14)$$

The computational flow diagram of a GRU is summarised in Fig. 2. The Seq2Seq model is built with TensorFlow [26], a widely used open-source library for machine and deep learning developed by Google. TensorFlow automatically derives the gradients with respect to the weights of the model using auto-differentiation, making the training of the model a straightforward task. The high-level interface to TensorFlow is provided by Keras [27], a Python library that allows building, training, and testing neural networks.
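For illustration, a single GRU step with the update and reset gates described above can be sketched in NumPy (random illustrative weights and sizes; a real model would use the trained TensorFlow/Keras layers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, V, b):
    """One GRU time step: update gate z, reset gate r, candidate, memory cell."""
    (Wz, Wr, Wc), (Vz, Vr, Vc), (bz, br, bc) = W, V, b
    z = sigmoid(Wz @ x + Vz @ h_prev + bz)              # update gate, in (0, 1)
    r = sigmoid(Wr @ x + Vr @ h_prev + br)              # reset gate, in (0, 1)
    c_tilde = np.tanh(Wc @ x + Vc @ (r * h_prev) + bc)  # candidate memory cell
    c = z * h_prev + (1.0 - z) * c_tilde                # memory cell = new hidden state
    return c, z, r

rng = np.random.default_rng(1)
p, k = 4, 8                                             # input and hidden sizes
W = [rng.normal(0.0, 0.1, (k, p)) for _ in range(3)]
V = [rng.normal(0.0, 0.1, (k, k)) for _ in range(3)]
b = [np.zeros(k) for _ in range(3)]

h, z, r = gru_step(rng.normal(size=p), np.zeros(k), W, V, b)
print(h.shape, bool(z.min() > 0.0 and z.max() < 1.0))   # (8,) True
```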

Data pre-processing
The raw data used to train and validate the deep learning model has been retrieved from Space Track (www.space-track.org). Specifically, all the decayed objects from the 1st of January 2000 to the 7th of October 2021, for which the Tracking and Impact Prediction (TIP) is available, have been considered. The TLE data is retrieved in the form of Orbital Mean-Elements Messages (OMM); once the raw TLE data is available, it is necessary to control the quality of the data and prune possible outliers. Following [14], the outliers can be found by focusing on the mean motion and the B* coefficient. The main steps of the pruning procedure are as follows:

1. Identify the presence of TLE corrections by defining a time threshold such that the previous observation is considered incorrect. A common time threshold is half an orbit.

2. Identify large time intervals between consecutive TLEs and, consequently, divide the entire set into different windows in order to ensure TLE consistency.

3. Find and filter out single TLEs with discordant values of the mean motion, based on a robust linear regression technique and on a sliding window of fixed length. In particular, the results obtained through regression are compared with the OMM following the sliding window, and an outlier is identified if a relative and an absolute tolerance are exceeded (optimal tuning parameters for the filter can be found in Table 3 of [14]).

4. Identify and filter out possible outliers in eccentricity and inclination, using a statistical approach. Specifically, the mean is computed within a sliding window and subtracted from the central orbital element of the interval, obtaining a time series of differences. Afterwards, using another sliding window that scans the aforementioned series of differences, the mean absolute deviation is computed. An outlier is removed if the orbital element presents a difference from the mean that is higher than a given threshold (optimal tuning parameters for the filter can be found in Tables 4 and 5 of [14]).

5. Filter out TLEs that present negative values of B*, because they can be associated with unmodelled dynamics or manoeuvres.
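As an illustration of the statistical filter in step 4, the sketch below flags points that deviate from a sliding-window estimate of the local level; the window size and tolerances are illustrative stand-ins, not the tuned values of [14], and a robust median replaces the mean for simplicity:

```python
import numpy as np

def window_outliers(series, half_win=4, n_sigma=5.0, abs_tol=1e-4):
    """Flag points deviating from the median of a sliding window.

    Window size and tolerances are illustrative; the tuned values for the
    mean motion, eccentricity and inclination filters are given in [14].
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    diffs = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_win), min(n, i + half_win + 1)
        window = np.delete(x[lo:hi], i - lo)      # neighbours, excluding x[i]
        diffs[i] = x[i] - np.median(window)
    # Robust scale of the difference series (MAD scaled to a sigma-equivalent)
    sigma = 1.4826 * np.median(np.abs(diffs - np.median(diffs)))
    return np.abs(diffs) > max(n_sigma * sigma, abs_tol)

ecc = np.full(30, 1.0e-3) + np.linspace(0.0, 1.0e-5, 30)  # smooth eccentricity
ecc[15] = 1.0e-2                                          # injected outlier
print(np.where(window_outliers(ecc))[0])                  # [15]
```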
For a complete description of the filters and their performance, as well as the optimal tuning of the filter parameters, the reader is referred to the work of Lidtke et al. [14]. It is worth noting that some outliers may not be identified by such filters; however, the percentage of outliers is considered to be sufficiently small to ensure the validity of the data. After the pruning process, the mean elements contained in the TLEs are converted to the corresponding osculating variables using the SGP4 propagator [10]. It is also important to consider other characteristics of the dataset. Specifically, we want to focus on re-entries that have a sufficient number of TLE points and whose re-entry prediction is sufficiently accurate. The minimum number of TLE points has been selected via a trade-off between the number of pruned TLEs and the accuracy of the fitting procedure (see Section 4.1) [29]. In addition, the orbit should not be too eccentric, as the re-entry mechanism for highly eccentric orbits changes significantly. The dataset has therefore been filtered on the number of available TLE points and on the orbit eccentricity. The described criteria make use of the average altitude, h, computed according to:

$$\bar{h} = a - R_\oplus, \quad (15)$$

where $a$ is the osculating semi-major axis and $R_\oplus$ the Earth radius. The result of this pruning and filtering procedure is a dataset with a total of 417 bodies, which is then split into 80% training and 20% validation (see Appendix A). Note that the objects in the validation set must resemble the characteristics of the training set. For example, the average, μ, and the relative standard deviation, σ, of the B* coefficient and the eccentricity for the training and validation sets should be comparable (Table 1).
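For reference, the average altitude computation and the 80/20 split can be sketched as follows (the random split is a simplification: in practice the validation set is chosen so that its B* and eccentricity distributions match the training set):

```python
import numpy as np

R_EARTH = 6378.135   # Earth radius [km], value used in the text

def average_altitude(a_km):
    """Average altitude from the osculating semi-major axis: h = a - R_earth."""
    return np.asarray(a_km, dtype=float) - R_EARTH

rng = np.random.default_rng(0)
n_bodies = 417                                # size of the filtered dataset
idx = rng.permutation(n_bodies)
n_train = int(0.8 * n_bodies)                 # 333 training objects
train_idx, val_idx = idx[:n_train], idx[n_train:]

print(float(average_altitude(6578.135)))      # ~200.0 km
print(len(train_idx), len(val_idx))           # 333 84
```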

Moreover, the re-entry epochs are distributed quite uniformly from roughly January 2005 to April 2021 for both sets; the different objects therefore experience different solar flux conditions and, consequently, different values of drag acceleration (Fig. 3). Finally, it should be highlighted that the majority of the bodies in the training and validation sets are characterised by an uncertainty in the re-entry epoch of less than 2 minutes. The objects for testing the prediction capabilities of the model are selected among the bodies analysed during the IADC campaigns, as listed in Table 2.

Features engineering
In this section, we outline the process used to select and pre-process the most important input features required to predict the re-entry epoch with a machine learning model. This is not a straightforward task, because it is necessary to find input variables that are correlated with the output of the deep learning model. Therefore, it is essential to analyse each feature from a physical point of view, so that this knowledge can then be applied to constructing the deep learning model, which will benefit from each input variable. This process is typically known as feature engineering. In a physical model, given the input variables, the output is obtained by means of physical relations; in machine learning, instead, the model is treated as a black box, which is autonomously capable of learning the input-output relations based on the true values and the input features. Therefore, each feature can be selected and generated based on the knowledge of the physical principles that govern the re-entry phenomenon, leaving the deep learning model to cope with the learning, and with the uncertainties, of the physical problem. By analysing each feature, the relations between the considered variables can be highlighted, so that we can make a more informed decision on whether a feature can be beneficial to the deep learning model [28,29].
In addition, the raw data provided by the TLEs requires a pre-processing and regularisation step in order to be used with an RNN. In fact, TLE data is characterised by a non-uniform generation frequency, which differs from object to object; it also lacks high accuracy and may present outliers. All these aspects must be taken into account before feeding the data to the RNN. The following sections analyse the contribution of the features considered in this work: the average spacecraft altitude, the ballistic coefficient and the B* coefficient, and the solar index.

Average altitude
The average altitude feature is of fundamental importance for deep-learning-driven re-entry prediction. In fact, it is related to the definition of the loss function used to train the RNN, Eq. (4). This means that it is used both as input and as output of the deep learning model: given a part of the average altitude profile as input, the remaining part is predicted. The pre-processing of the average altitude profile follows the work of Jung et al. [13]. In particular, each trajectory is generated by fitting the available TLEs below 240 km according to Eq. (16), where $t_{ref}$ is the re-entry time, and $a_1$, $a_2$, $a_3$, and $a_4$ are the parameters obtained through curve fitting with the Levenberg-Marquardt algorithm, available through the Python library SciPy [30]. The first coefficient, $a_1$, is equal to 80 km, which is the reference altitude at which TIP messages are provided [7]. This assumption prevents the fitting algorithm from changing the re-entry epoch of the TIP message and, therefore, ensures consistency with the TLE data. Subsequently, the epoch corresponding to an altitude of 200 km is set as the initial epoch, so that the time span from the starting altitude of 200 km to the re-entry at 80 km identifies the residual time. As each object will have a different residual time, it is convenient to interpolate the residual time as a function of the average altitude. To do so, we subdivide the considered altitude range (between 200 km and 80 km) into a uniform grid of 25 points. Consequently, each trajectory is identified by a series of observations $\{x_i, t_i\}$, with $i = 1, 2, \dots, 25$, where $x_i$ represents the average altitude and $t_i$ the relative epoch. In this way, it is possible to use only $t_i$ as the output of the deep learning model. Fig. 4 shows the result of this procedure for the object identified by NORAD ID 27392. The resulting trajectories for the training, validation, and test sets are represented in Fig. 5. It can be seen that there is a concentration of trajectories characterised by residual lifetimes between 0.5 days and 3 days. Meanwhile, only 3 trajectories with residual lifetimes higher than 6 days are present, and only two of them are used for training. This is important because this distribution incorporates the ballistic coefficient information and can be directly related to the prediction capability of the model. We also note that, in the test set, the leftmost trajectory has a residual time of roughly 14 minutes and is associated with the Molniya 3-39 spacecraft. In this case, the trajectory is highly elliptical and does not satisfy the selection criterion defined in Section 3; however, it has been included in the test set to assess the capabilities of the model with such trajectories. In a real-case scenario, where the entire trajectory is not known a priori, Eq. (16) can be used with $t_{ref}$ as the epoch of the last available TLE. However, the resulting curve will be different from the one obtained with all the available TLEs. We thus expect to observe some differences in performance between the training and testing sets. Further analysis should be carried out on this aspect, because there may be variations in the predicted trajectory that depend on the adopted procedure and on the number of available TLEs. However, this difference should be minimised when a high number of TLE points is available, which is the case, for example, for the objects analysed during IADC campaigns.
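The resampling of a fitted decay profile onto the uniform 25-point altitude grid can be sketched as follows; the synthetic profile h(t) below is purely illustrative and stands in for the curve fitted via Eq. (16):

```python
import numpy as np

# Synthetic decay profile h(t): altitude [km] vs residual time [days] (illustrative)
t = np.linspace(0.0, 3.0, 500)
h = 200.0 - 120.0 * (t / 3.0) ** 1.5

# Uniform grid of 25 altitudes between 200 km and 80 km
h_grid = np.linspace(200.0, 80.0, 25)

# np.interp requires increasing abscissae, so interpolate t as a function of h
t_grid = np.interp(h_grid[::-1], h[::-1], t[::-1])[::-1]

print(h_grid[0], t_grid[0])    # 200.0 0.0
print(h_grid[-1], t_grid[-1])  # 80.0 3.0
```

The pairs (h_grid[i], t_grid[i]) play the role of the observations {x_i, t_i} described above.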

Ballistic coefficient and B*
The ballistic coefficient, B, is of key importance for the re-entry of low-eccentricity orbits, because the dynamics are dominated by the atmospheric drag perturbation. The ballistic coefficient, defined as [31]

$$B = \frac{C_D A}{m}, \quad (17)$$

describes the aerodynamic interaction of the spacecraft with the atmosphere via its shape and cross-sectional area $A$, its mass $m$, and its drag coefficient $C_D$.
However, as we are using TLE data for the re-entry prediction, we use the drag-like coefficient, B*, as a proxy for the ballistic coefficient. In fact, B* is the coefficient introduced in the SGP4 propagator, which in turn is used for the generation of TLE data. The expression for B* is as follows [10]:

$$B^* = \frac{\rho_0 \, B}{2}, \quad (18)$$

where $\rho_0$ is the atmospheric density at the perigee of the orbit, assumed as $2.461 \times 10^{-5}$ kg/m²/ER, with ER = $R_\oplus$ = 6378.135 km the Earth radius. In general, the B* coefficient of the SGP4 formulation soaks up different modelling errors; although the ballistic coefficient can be retrieved from B*, the two are not identical. However, in this context, we assume B* can be used as a proxy for B, as we are considering objects with similar orbital parameters that are subject to similar modelling uncertainties. Similarly to the average altitude, it is essential to pre-process the B* data before feeding it to the RNN. The first step of the processing consists of defining a sliding window with a fixed size to create a series of average values, in which each term is given by the average of all the previous points. The second step consists in applying a piecewise-constant interpolation scheme, in which, between two consecutive values, the preceding one is kept constant. Subsequently, the interpolated curve is sampled at the epochs defined through the average altitude processing. An example of this procedure is represented in Fig. 6 for NORAD ID 27392.
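The two-step B* processing can be sketched as follows (illustrative epochs and B* values; the running average of all previous points and the previous-value hold follow the description above):

```python
import numpy as np

def preprocess_bstar(t_tle, bstar, t_grid):
    """Sketch of the B* pre-processing: (1) a running average in which each
    term averages all previous points, (2) previous-value (piecewise-constant)
    interpolation sampled at the epochs of the altitude grid."""
    bstar = np.asarray(bstar, dtype=float)
    running_avg = np.cumsum(bstar) / np.arange(1, len(bstar) + 1)
    idx = np.searchsorted(t_tle, t_grid, side="right") - 1  # last epoch <= t
    idx = np.clip(idx, 0, len(running_avg) - 1)
    return running_avg[idx]

t_tle = np.array([0.0, 1.0, 2.0, 3.0])                  # TLE epochs [days]
bstar = np.array([2.0e-4, 4.0e-4, 6.0e-4, 8.0e-4])      # illustrative B* values
t_grid = np.array([0.5, 1.5, 2.5, 3.5])                 # altitude-grid epochs
print(preprocess_bstar(t_tle, bstar, t_grid))           # 2e-4, 3e-4, 4e-4, 5e-4
```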
Given the features generated so far, it is possible to apply Principal Component Analysis (PCA) to observe whether the expected relations between B*, the residual time in days, and the average altitude exist. The results of the analysis are summarised in Fig. 7. Focusing on the loading plot and on PC1, the time-related feature has a strong positive loading, while the others have negative loadings. Indeed, the time feature can be seen as a synonym of the residual lifetime: as B* increases, the lifetime of an object decreases due to the growth of the drag acceleration. Furthermore, the altitude-time relation is associated with the altitude decrease as time proceeds.
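A PCA loading analysis of this kind can be sketched with NumPy on synthetic data that mimics the expected B*-lifetime relation (the data-generating law below is illustrative, not the real TLE data):

```python
import numpy as np

def pca_loadings(X):
    """PCA on standardised features; rows of the result are feature loadings,
    columns the principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Vt.T

rng = np.random.default_rng(0)
n = 200
bstar = rng.uniform(1.0e-5, 1.0e-3, n)                 # illustrative B* values
lifetime = 1.0 / bstar + rng.normal(0.0, 50.0, n)      # lifetime shrinks as B* grows
L = pca_loadings(np.column_stack([lifetime, bstar]))

# On PC1 the time-related feature and B* carry loadings of opposite sign
print(bool(L[0, 0] * L[1, 0] < 0))  # True
```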

Solar Index
The solar index, F10.7, is a parameter that describes the intensity of the solar radio emission at a wavelength of 10.7 cm, and it is important for predicting the variations of the atmospheric density [10]. TLE data do not include the atmospheric density; however, the B* coefficient also accounts for the solar index variations. To highlight this dependence, Fig. 8 represents the drag-like coefficient as a function of the trailing 81-day average of the solar index F10.7 for different bodies. It can be observed that, as F10.7 increases, B* grows with a non-linear behaviour, showing the existence of a relation between the two variables. This is also highlighted by the same colours of the B* groups for the same values of F10.7, and by the terms of the curve fit associated with F10.7². Therefore, the F10.7 index is introduced as a feature of the RNN, along with the area-to-mass ratio. This decision stems from the fact that the model must deal with objects with similar B* but completely different solar indexes. From an implementation point of view, it has been found that including only the F10.7 index corresponding to the starting epoch of the prediction led to the highest performance of the RNN; therefore, it is repeated in the input sequence as a constant value. The same procedure is applied to the area-to-mass ratio, which is also kept fixed.
Similarly to Sections 4.1 and 4.2, it is possible to explore the features generated so far in order to verify whether the expected relations can be found. To this aim, PCA can be applied considering the F10.7 index, B*, and the area-to-mass ratio; the results are shown in Fig. 9. From the loading plot, it can be seen that, along PC1, all the variables are characterised by positive loadings. In particular, the relation between A/m and B* follows from the definition of the drag-like coefficient. The relation between B* and the solar index is the same as discussed for Fig. 8, where a positive correlation is present. Moreover, the relation between the area-to-mass ratio and F10.7 can be explained through B*.

Data regularisation
The features generated so far include the average altitude profile, B*, F10.7, and the area-to-mass ratio. Due to their nature, the solar index and the time-related feature are normalised according to [18]:

$$\tilde{x} = \frac{x - x_{min}}{x_{max} - x_{min}}, \quad (19)$$

where $\tilde{x}$ represents the normalised feature and $x$ the related original input feature.
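Assuming a min-max scheme (the exact expression used is the one given in [18]), the normalisation can be sketched as:

```python
import numpy as np

def min_max_normalise(x):
    """Min-max normalisation of a feature to [0, 1] (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

f107 = np.array([70.0, 95.0, 120.0, 170.0])   # illustrative F10.7 values [sfu]
print(min_max_normalise(f107))                # 0, 0.25, 0.5, 1
```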

Results
The Seq2Seq architecture described in Section 2 is independently trained on four different cases, characterised by different starting altitudes, as shown in Table 3. In this way, the sensitivity of the prediction accuracy with respect to the starting altitude can be assessed.

In each of the cases in Table 3, the entire set of TLEs is split into two independent parts, the training and validation sets. Furthermore, both the inputs and the outputs of the two sets have the same lengths. Therefore, the model is trained to predict an output sequence of length $T_y$, given an input sequence of length $T_x$, and the same strategy is also applied for validation. This means that the losses, $L^{Val}_{MSE}$ and $L^{Train}_{MSE}$, are computed on the values associated with the same average altitudes; the only difference is in the objects analysed, which differ between the two sets. The input dataset is mathematically represented by a tensor of rank 3, which, in TensorFlow's notation, has dimension [Objects × Time × Features]. Therefore, according to the problem definition, the model is presented with only a part of the input trajectory of the objects, together with all the related input features. The length of the time dimension can be chosen according to the desired needs by varying the value of $T_x$, which is associated with the starting altitude of the prediction. Hence, the four different cases in Table 3 are identified by different lengths of the input sequence $T_x$. Regarding the output, the model has to predict the final part of the average altitude trajectory until the re-entry epoch. The sum of the input and output sequence lengths is $T_x + T_y = 25$.
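The slicing of the rank-3 dataset tensor into input and output sequences can be sketched as follows (random data with illustrative dimensions; here T_x = 10, so T_y = 15):

```python
import numpy as np

# Illustrative dataset tensor with TensorFlow-style layout [Objects x Time x Features]
n_obj, T, n_feat = 417, 25, 4
rng = np.random.default_rng(0)
data = rng.normal(size=(n_obj, T, n_feat))

Tx = 10                         # input length (case-dependent); Ty = 25 - Tx = 15
X_in = data[:, :Tx, :]          # partial trajectories plus all input features
y_out = data[:, Tx:, 0]         # target: the remaining part of the first feature

print(X_in.shape, y_out.shape)  # (417, 10, 4) (417, 15)
```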

Hyperparameters optimisation
The deep learning model described in Section 2 requires the definition of specific hyperparameters, variables on which the learning algorithm relies for its optimisation and prediction processes. In general, suitable hyperparameters may differ depending on the problem at hand; it is therefore necessary to perform an informed selection, usually achieved by optimising the hyperparameters to maximise the performance of the deep learning model. However, the hyperparameter optimisation process can be computationally intensive; therefore, for this work, the optimisation is performed only for Case A of Table 3. In fact, this case is characterised by the shortest input sequence and is therefore considered the most challenging of the four cases. The optimisation is designed using the HyperOpt [32] module, a Bayesian optimisation library, together with the Asynchronous Successive Halving (ASHA) algorithm [33], which combines random search with the early stopping of hyperparameter combinations that exhibit poor performance. These algorithms can be conveniently combined and parallelised using the Python package Ray Tune [34]. In this work, part of the hyperparameter space has been kept constant and another part has been optimised. Table 4 shows the fixed hyperparameters and their associated values. As shown in Eq. (4), the loss function is the Mean Squared Error; the selected optimiser is Adam [35], a commonly used algorithm for deep learning weight optimisation. It is a first-order gradient-based optimiser, based on adaptive estimates of first- and second-order moments (i.e., mean and variance). As specified in Table 4, the decay rate of the first-order moment, β 1 , and of the second-order moment, β 2 , have been set, together with the Clipnorm parameter, which forces a re-scaling of the gradient whenever its norm exceeds the specified value.
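ASHA itself is provided by Ray Tune; purely as a sketch of the underlying idea, a toy synchronous successive-halving loop over hypothetical hyperparameter configurations (the score function and candidate values are made up):

```python
def successive_halving(configs, evaluate, min_epochs=400, max_epochs=2100, reduction=2):
    """Toy synchronous successive halving: train all configs for a small
    budget, keep the best 1/reduction fraction, double the budget, repeat."""
    budget = min_epochs                      # 'grace period': minimum training before halving
    while len(configs) > 1 and budget <= max_epochs:
        scores = [(evaluate(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0])      # lower validation loss is better
        configs = [c for _, c in scores[: max(1, len(scores) // reduction)]]
        budget *= 2
    return configs[0]

# Hypothetical search: validation loss decreases with budget, with a per-config offset
candidates = [{"lr": lr} for lr in (1e-4, 1e-3, 2e-3, 5e-3, 1e-2, 5e-2)]
loss = lambda cfg, budget: abs(cfg["lr"] - 2e-3) + 1.0 / budget
best = successive_halving(candidates, loss)
```

ASHA additionally runs the halving asynchronously across parallel workers, which this synchronous toy version does not capture.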

Table 4. Fixed hyperparameters (loss: Mean Squared Error; optimiser: Adam).

The remaining hyperparameters, which are optimised, are summarised in Table 5, together with the relevant search spaces and sampling modes. The layer parameter is defined as the number of stacked GRU units of the encoder and decoder, respectively. This means that if the number of layers is 2, the model is composed of 4 units, of which 2 are associated with the encoder and the remaining 2 with the decoder. Specifically, the final hidden state of each layer of the encoder is passed to the corresponding layer of the decoder: for example, the first GRU unit of the decoder is initialised with the final hidden state of the first unit of the encoder.
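The per-layer hand-off of the encoder's final hidden states to the decoder can be sketched with a simplified tanh recurrent cell standing in for the full GRU (all sizes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, features = 8, 4

def make_cell(in_dim):
    # Simplified tanh recurrent cell standing in for a GRU unit
    return {"W": 0.1 * rng.normal(size=(hidden, in_dim)),
            "U": 0.1 * rng.normal(size=(hidden, hidden))}

def run_cell(cell, inputs, h0):
    """Run the cell over a sequence; return all hidden states and the final one."""
    h, states = h0, []
    for x in inputs:
        h = np.tanh(cell["W"] @ x + cell["U"] @ h)
        states.append(h)
    return np.array(states), h

# Two stacked layers in both encoder and decoder
encoder = [make_cell(features), make_cell(hidden)]
decoder = [make_cell(hidden), make_cell(hidden)]

seq_in = rng.normal(size=(5, features))          # T_x = 5 input steps
layer_in, final_states = seq_in, []
for cell in encoder:                             # stacked encoder layers
    layer_in, h_final = run_cell(cell, layer_in, np.zeros(hidden))
    final_states.append(h_final)

# Each decoder layer is initialised with the final hidden state of the
# corresponding encoder layer (first with first, second with second)
dec_seq = rng.normal(size=(20, hidden))          # placeholder decoder inputs, T_y = 20
for cell, h0 in zip(decoder, final_states):
    dec_seq, _ = run_cell(cell, dec_seq, h0)
```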

Table 5. Optimised hyperparameters, search intervals and sampling modes.

Finally, it is also necessary to specify the parameters of the ASHA algorithm, shown in Table 6 together with the selected values. Specifically, the total number of trials is the number of hyperparameter combinations the algorithm attempts before stopping, the reduction factor is the fraction of trials discarded after each run, and the grace period is the minimum number of epochs for which a trial must be trained before halving. These values have to be selected considering the computational resources of the hardware and the expected results; the most suitable parameters, found as a trade-off between results and computational time, are reported in Table 6. It has to be noted that a small grace period may bias ASHA towards solutions that converge rapidly but do not lead to the best performance; a good compromise was experimentally found at 400 epochs. Furthermore, the maximum number of training epochs is set to 2100 because the optimal value has been observed to be typically above 2000.

Table 6. Input parameters of ASHA.

Fig. 10 shows the validation losses as a function of the value of each hyperparameter. We can observe that the curves associated with lower losses are approximately distributed at low values of the batch size, between 20 and 40, and of the hidden state size, in particular between 50 and 150. Furthermore, these distributions give a qualitative indication of the importance of each parameter: an important parameter is one for which even a small change in value determines a large variation of the resulting loss. For example, looking at the learning rate in the parallel coordinates plot, a clear distinction between red and blue curves can be seen, where the lower rates lead to high validation losses.
Instead, values between 0.002 and 0.0075 seem to lead to better performance. In contrast, the decay rate of the Scheduled Sampling mechanism appears to have a negligible effect on the validation loss: it is not possible to identify a range of values that leads to higher performance. This is because the decay rate in Eq. (8) does not significantly affect convergence: the threshold for selecting the ground truth or the model output decreases quickly for all the values the optimiser selected, forcing the model to learn from its own predicted outputs. This may suggest that the model performs better when it has to deal with its own errors, in a regime closer to inference than to training. Finally, the optimised values of the hyperparameters are given in Table 7.
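The Scheduled Sampling mechanism discussed above can be sketched as follows; Eq. (8) is not reproduced here, so a simple exponential decay of the teacher-forcing probability is assumed:

```python
import random

def teacher_forcing_prob(epoch, decay_rate=0.97):
    """Probability of feeding the ground truth instead of the model's own
    output (assumed exponential decay; not the exact form of Eq. (8))."""
    return decay_rate ** epoch

def pick_decoder_input(ground_truth, model_output, epoch, rng=random.random):
    # Scheduled sampling: early in training mostly ground truth,
    # later mostly the model's own previous prediction
    return ground_truth if rng() < teacher_forcing_prob(epoch) else model_output

probs = [teacher_forcing_prob(e) for e in (0, 100, 500)]
```

For any decay rate below one the probability collapses towards zero within a few hundred epochs, which is consistent with the observation that the model is mostly trained on its own outputs.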

Dataset characterisation
To analyse and understand the performance of the model, the test objects are divided into categories according to their physical properties relative to the training set. We define two categories of objects as in Table 3 (Appendix A.3.1), in which the objects of the test set share similar characteristics with the training set. Fig. 11 represents the B* distributions of the objects in the test and training sets, where the two categories can be identified, as highlighted by the different colours of the NORAD IDs. Specifically, Category 1 objects (red labels) have a median B* coefficient inside the 0.25 and 0.75 quantiles of the training set distribution, while the median B* of objects in Category 2 (black labels) lies outside this range. Given the significant importance of B*, the deep learning model is expected to perform better on those objects whose distributions are similar to the training set. Another parameter to analyse is the eccentricity, whose distributions for the training and test sets are represented in Fig. 12. It can be seen that object 20813 is an anomaly with respect to all the other objects: it is characterised by a median eccentricity of roughly 0.7, which identifies a highly elliptical orbit, for which the nature of the re-entry is different. However, we have decided to keep this object to further test the capabilities and limitations of the developed deep learning model. To characterise the performance and compare the results, let the absolute error be ε abs = |t Predicted − t Actual |, and the relative error, ε rel , be the absolute error normalised by the residual time to re-entry, where t Predicted is the predicted re-entry epoch and t Actual is the assessed re-entry epoch.
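For illustration, the error computation can be sketched as follows; the normalisation of the relative error by the residual time to re-entry is an assumption, and all numeric values are made up:

```python
def reentry_errors(t_predicted, t_actual, t_prediction_start):
    """Absolute error between predicted and actual re-entry epochs (days), and
    relative error normalised by the residual time to re-entry (assumed definition)."""
    eps_abs = abs(t_predicted - t_actual)
    eps_rel = eps_abs / (t_actual - t_prediction_start)
    return eps_abs, eps_rel

# Illustrative values: prediction issued 10 days before an actual re-entry,
# with the predicted epoch off by roughly 2 minutes (0.0014 days)
eps_abs, eps_rel = reentry_errors(t_predicted=10.0014, t_actual=10.0,
                                  t_prediction_start=0.0)
```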

Testing
Starting from the objects of the first category, Table 8 reports the prediction errors for all the different starting altitudes defined in Table 3. Generally, the model successfully predicts the output trajectory and therefore the re-entry epoch. For example, object 39000 shows an absolute error of less than 2 minutes, roughly 24 hours before the re-entry, and this error stays confined below roughly 3 minutes for all the predictions. This behaviour is related to the similarity of the B* distribution of each object to that of the training set; indeed, the highest absolute error is 14 minutes, associated with object 40138. Moreover, the MSEs all lie between 10 −5 and 10 −7 , showing that similar B* distributions are associated with excellent performance. It is interesting to note that, comparing the different cases, a decrease in the MSE does not necessarily correspond to a lower absolute error, as for example for objects 11601 and 39000. This indicates that better results could also be obtained by selecting a different loss function.
Regarding the objects in the second category, the errors for all the bodies are listed in Table 9. The performance of the model is generally inferior with respect to the previous category, due to the low number of bodies characterised by B* distributions similar to that of the training set. This can be clearly observed for object 37872, which cannot be associated with any B* distribution in the sets: the resulting outputs are characterised by the highest MSEs, as the model tends to overestimate all the relative epochs, although the output still follows the shape of the altitude profile quite accurately. Furthermore, object 20813 is characterised by the highest relative errors and equally significant MSEs, indicating that the model is not capable of predicting the output trajectory and completely misses all the predicted epochs. Indeed, inspecting Fig. 11, 20813 is characterised by a large distribution of B* values, whose maximum is similar to that of 19045; yet the model follows the output profile reasonably well for 19045 in all the cases, with a maximum relative error of roughly 18 minutes and MSEs between 10 −5 and 10 −7 [days 2 ]. However, object 20813 also has a large eccentricity (see Fig. 12), which justifies the difficulties of the deep learning model in predicting its re-entry and indicates that, as expected, the contribution of the eccentricity cannot be ignored for orbits that are not quasi-circular.

Orbit prediction -Case A
In this section, we show the results for Case A in detail. Specifically, we present the evolution of the training and validation losses and the trajectory predictions of the developed deep learning model for selected objects in the test set; both categories of objects (Appendix A.3.1) are included. Fig. 13 shows the training and validation losses as a function of the training epoch, for a total of 2900 epochs. The rate of convergence of the validation loss is noticeably lower than that of the training loss; nonetheless, the final loss is below 1 × 10 −5 , which has been observed to produce accurate results. Fig. 14 shows the trajectory prediction for selected objects of Category 1 in the test set. The black curve represents the input average altitude (the initial 5 elements of the input sequence, T x ), the red line the predicted sequence, and the blue trajectory the true output (as obtained from TLE data).
In general, the model predicts the re-entry epoch with excellent results and successfully estimates the output trajectory of each object, with outputs that follow the true paths quite accurately. As also shown in Table 8, the highest MSE is associated with object 38086; however, even if the MSE appears relatively high, the absolute and relative errors on the re-entry epoch are small. Moreover, the absolute errors are all lower than approximately 13 minutes, except for object 40138, which is characterised by an error of 13.35 minutes. This can be explained by the higher B* distribution of 40138 with respect to the training set. Furthermore, considering objects 40138 and 10973, it appears that the model has correctly learned the relation between the solar index and the drag-like coefficient.
Concerning the objects with B * distributions different from the one of the training set (Category 2), the results are summarised in Fig. 15 and Fig. 16.
Generally, all errors are higher than in the previous case. This is related to the low number of objects with a B* distribution similar to that of the training set, which makes the model perform less accurately. However, in 4 out of 7 cases, the absolute errors are lower than approximately 40 minutes; it can therefore be concluded that, in this case, the model approximates the true output reasonably well. Nevertheless, object 20813 represents an exception in terms of errors, as can be seen in Fig. 17, where the output trajectory is completely distorted. Inspecting Fig. 11, the B* distribution of 20813 is characterised by a large range of values; however, its interquartile range is lower than that of 19045, and the model has proved to provide acceptable performance on the latter, with an output that accurately follows the true one. Therefore, the drag-like coefficient cannot be directly linked to the bad performance. Indeed, 20813 is characterised by high values of the eccentricity, as highlighted in Fig. 12. This makes the nature of the re-entry different, and the model cannot acquire this knowledge because it is not trained for it. Therefore, the prediction capabilities have to be related not only to B* but also to the eccentricity.

Conclusions and Discussion
This work presented the development of a deep learning model, based on a variant of the Seq2Seq architecture with curriculum learning, for the re-entry prediction of uncontrolled objects in Low Earth Orbit from TLE data. We thereby propose a paradigm shift from a physics-based to a data-driven approach. Based on the physical knowledge of the re-entry phenomenon, we introduced additional features alongside the average altitude profile, in an effort to better characterise and predict the re-entry epoch. Two additional key features have been identified: the drag-like coefficient, B*, and the solar index, F10.7. The drag-like coefficient is of key importance as it relates the re-entry time to the interaction of the object with the atmosphere; we presented a methodology to introduce this parameter as a feature of the deep learning model, via a stepwise moving window interpolation scheme. The solar index is equally relevant for re-entry prediction as it affects the density of the atmosphere and, with it, the re-entry time. Analysing this parameter highlighted the need for an additional feature relating the solar index to the physical properties of the object; therefore, the area-to-mass ratio was also added as a feature. Studying the behaviour of the deep learning model with respect to these features, we identified suitable ways to introduce them into the model. The developed deep learning model has been tested on a set of objects studied during the IADC campaigns. In accordance with Jung et al. [13], the B* distribution has been observed to significantly affect the performance of the model. However, the poor performance on object 20813 (Fig. 17) suggests that the eccentricity also has a relevant impact. Hence, future works may include the processing of highly eccentric orbits.
In addition, it has been found that a decrease in the selected loss function (MSE) did not necessarily correspond to a higher accuracy of the re-entry prediction. Consequently, future studies could also investigate the effect of different loss functions on the output of the deep learning model. Overall, the analyses showed promising results for re-entry prediction using machine learning; however, as can be expected, the prediction capabilities of the model strongly depend on the quality and quantity of the training data. It is therefore crucial, in order to extend the capabilities of this data-driven approach, to increase the size of the training data and, potentially, improve its quality by considering additional sources other than TLEs.

Conflicts of Interest:
The authors declare no conflict of interest.