A New Machine Learning Algorithm for Numerical Prediction of Near-Earth Environment Sensors along the Inland of East Antarctica

Accurate short-term, small-area meteorological forecasts are essential to ensure the safety of field operations and equipment in the Antarctic interior. This study proposes a deep learning-based multi-input neural network model to address this problem. The proposed model combines a stacked autoencoder (SAE) with a long short-term memory (LSTM) network: the SAE maximises feature extraction and removes redundancy from the target weather station's sensor data, and the LSTM network extracts temporal features from the sensor data. The prediction performance and generalisation capability of the model are evaluated at four observation sites at different latitudes in East Antarctica, including the highest point of the Antarctic ice sheet and the coastal region. Five deep learning networks are compared using five evaluation metrics, and the optimal combination of inputs is discussed. The results show that the proposed model outperforms the other models in prediction capability, providing a new method for short-term meteorological prediction in small inland Antarctic regions.


Introduction
Research and logistical activities in and around Antarctica depend heavily on reliable forecasts from environmental forecasting systems. Establishing an integrated multi-sensor environmental prediction system is an urgent task for scientific research activities and station construction, especially in areas where local weather forecasts are lacking. Prediction of wind speed and temperature is of particular interest. However, owing to the unique geography of East Antarctica, real-time forecasting for small areas is difficult. Existing meteorological stations and sensor data processing methods rely mostly on numerical models. The Australian Bureau of Meteorology uses the Australian Community Climate and Earth-System Simulator Global (ACCESS-G) numerical weather prediction (NWP) suite for Antarctic weather prediction. However, the validation work of Schroeter et al. 2019 [1] showed severe degradation of model performance at high southern latitudes. Moreover, running large numerical models places high demands on hardware because of the large data volumes and high time complexity involved [2]. Numerical calculations may therefore be unreliable in the field or in emergencies where rapid predictions are required. In addition, environmental prediction methods for the East Antarctic interior are rare in the literature, mainly because the region's high altitude and harsh environment make sensor data collection difficult. This study aims to develop a machine learning method for analysing and predicting data from the surface atmospheric weather sensor array in East Antarctica.
To develop new data-driven methods and essential techniques for data prediction, researchers in related fields have introduced machine learning into surface environment prediction and data analysis. These methods have been applied in many cases and have achieved their scientific goals. With the development of machine learning, image recognition techniques have been applied to atmospheric observations [3]. However, under Antarctica's harsh climatic conditions and communication constraints, low-power sensors that need little data transmission capacity are the preferred solution. In addition, low air visibility directly affects optical imaging. As a result, the image-based device described in [3] cannot work here.
Yeh et al. 2019 combined sensor fusion with neural network prediction to improve the maintenance efficiency of urban wind turbines in various environments [4]. Similar work has been carried out with the Spanish National Meteorological Service (AEMET), which established a network of meteorological stations on eight islands of the Canary Islands. Using data from these weather stations, Javier et al. 2019 proposed an innovative machine learning approach based on a data stream mining paradigm and used it to predict wind speed in the region [5]. In a wind signature recognition study, Leopoldo et al. 2017 applied genetic algorithms to in-flight airspeed sensor data to make predictions [6]. For wind prediction on ships, Mei et al. 2020 used pattern search algorithms to predict wind and current rates [7]. In emerging work on spatial feature extraction from multidimensional, multi-location time series, Bilgera et al. 2018 used a CNN-LSTM algorithm to estimate gas source locations [8]. Wind speed prediction plays a vital role in forecasting the meteorological environment of the Antarctic region, because high winds in the middle- and high-altitude areas of East Antarctica sometimes damage equipment.
Besides wind speed, temperature prediction is an important task. Surface temperature measurements in East Antarctica rely mainly on temperature sensors at weather stations. However, because the main research stations in the region are near the coastline, little meteorological data is available for the East Antarctic interior. Because of the particular geography of East Antarctica, most temperature prediction work and sensor data processing methods have focused on West Antarctica. Most machine learning work in West Antarctica uses single-input neural networks, which can suffer from problems such as vanishing gradients during training, making it difficult to extract data features during learning. For surface climate prediction in West Antarctica, David et al. 2005 used feed-forward neural networks to reconstruct annual mean climate conditions from ice core records, based on surface weather station and shallow snow core data [9]. This was the first approach to use neural networks to predict surface temperatures. To alleviate the shortage of Antarctic surface meteorological data and make better use of this valuable resource, David et al. 2004 used artificial neural network-based techniques to extend and fill gaps in selected automatic weather station records [10]. Machine learning is thus a practical and popular method for supplementing and predicting data in the Antarctic.
In 1997, Hochreiter and Schmidhuber proposed a recurrent neural network known as the Long Short-Term Memory (LSTM) network. The LSTM does not require especially complex hyperparameter tuning, and it can choose to remember or forget long-term information through a forget gate [11]. Many researchers have used LSTM to overcome vanishing gradients in applications such as electricity pricing, stock prices, robot control and disease prediction [12]. For time series prediction and analysis, this algorithm often outperforms traditional machine learning algorithms. Among deep learning methods, stacked autoencoders (SAEs) typically learn high-level features incrementally from low-level features by minimising the reconstruction error of each layer's input [13]. The technique is widely used in soft sensing. Variants have also been applied, such as the deep stacked isomorphic autoencoder (SIAE), which Yan et al. 2020 used to improve soft sensor prediction [14]. In addition, Yuan et al. 2020 proposed a deep correlation learning method (DRRL) based on stacked autoencoders to address feature representation for complex structured data [15]. Across these applications, the main advantage of SAE is its ability to capture hidden features. This is a clear advantage when the data volume is small and the features are not prominent. Because data are scarce in the Antarctic, feature acquisition is essential, and introducing SAE aids in learning and capturing features.
In the second part, the article presents measurement data from four stations with different meteorological conditions, from the East Antarctic coast to Dome A, the highest point, together with the types of sensor data. The third part focuses on the LSTM and Bi-LSTM neural networks and on the joint training and prediction method that combines SAE with LSTM. In the fourth section, the prediction performance of five neural networks is compared using error statistics; in the fifth section, the prediction performance and time cost of each model are discussed. The final section summarises the study's innovations and main work and provides an outlook for future work.

Areas of Study
Research in East Antarctica has focused on the edges of ice shelves, areas of exposed surface rock and the summits of ice domes, such as Dome A, the highest point in Antarctica, and regions such as Princess Elizabeth Land. Four unmanned observation sites at different altitudes and under different environmental conditions were selected to test the algorithm's stability. The meteorological stations were deployed along the inland route of the Chinese Antarctic Scientific Expedition (the PANDA route) and include sites with representative geographical characteristics. As shown in Figure 1, the four star-shaped markers on the Antarctic continent represent the locations of the four sensor sites in this study: 100 km from the coast, 300 km from the coast, the Princess Elizabeth Land area and the Dome A area at the highest point of Antarctica. The data were provided by the Chinese Academy of Meteorological Sciences. The detailed layout of the four monitoring stations is shown in Table 1.

Input Parameters and Correlation Analysis
Meteorological observations are needed to analyse how wind and temperature change in an area. The PANDA 300 weather station, located 300 km along the inland route, was installed on 13 December 2019, as shown in Figure 2. The observed variables include air temperature and humidity at 1 m, 2 m and 4 m; wind speed and direction at 1 m, 2 m and 4 m; and light irradiance, snow surface albedo, snow depth and snow quality. Because the 1 m sensors are affected by snow burial, ground disturbance and other factors during operation, and their records are missing over multiple periods, they are excluded from the correlation analysis; the light irradiance and snow accumulation data are not publicly available. Unlike snow depth and similar quantities, which take a long time to accumulate, temperature and wind change on short time scales, so only temperature, wind speed and wind direction are predicted. Changes in each parameter's value over 1 h, 2 h and longer periods are not considered. The correlation coefficient R_{x,y} between two input parameters can be expressed as:

$$R_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \tag{1}$$

Sensors 2021, 21, x FOR PEER REVIEW
In Equation (1), x and y represent two different environmental variables, cov(x, y) is the covariance between them, and σ_x and σ_y are the standard deviations of x and y.
The correlation diagram for the main nine parameters is shown in Figure 3. TP4 and TP2 represent the hourly mean temperatures at 4 m and 2 m, respectively (hourly means are calculated by averaging the values collected every ten minutes over one hour; the same applies to the means below); TPL4 and TPL2 represent the lowest hourly temperatures at 4 m and 2 m; HM4 and HM2 represent the hourly mean humidity at 4 m and 2 m; WS4 and WS2 represent the hourly mean wind speed at 4 m and 2 m; WD4 and WD2 represent the hourly mean wind direction at 4 m and 2 m; and WSM4 and WSM2 represent the maximum hourly wind speed at 4 m and 2 m. BTV and CELLT represent the battery voltage and the temperature of the controller under the snow. As the Antarctic climate is a non-linear process, parameters are selectively screened when choosing input features, and the correlation colour charts of the relevant parameters are therefore analysed. The closer the correlation coefficient in the colour chart is to 1, the more similar the trends of the two features. Such combinations of strongly correlated input variables are often chosen for multivariate input training of neural networks.
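As a minimal sketch of Equation (1) and the screening step, the Python/NumPy code below computes the Pearson correlation matrix of the hourly sensor variables and keeps the variables that are strongly correlated with a chosen target. The function names, the use of the absolute value |R| and the 0.5 default threshold are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def correlation_matrix(data):
    """Pearson R_xy = cov(x, y) / (sigma_x * sigma_y) for every pair of columns.

    data: array of shape (n_samples, n_variables), one column per sensor variable.
    """
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)
    cov = centred.T @ centred / (len(data) - 1)  # sample covariance matrix
    std = np.sqrt(np.diag(cov))                  # sample standard deviations
    return cov / np.outer(std, std)

def screen_features(data, names, target, threshold=0.5):
    """Return the variables whose |R| with `target` reaches a hypothetical threshold."""
    r = correlation_matrix(data)
    t = names.index(target)
    return [name for i, name in enumerate(names) if i != t and abs(r[t, i]) >= threshold]
```

With columns ordered as, for example, TP4, TP2 and HM4, `screen_features(data, ["TP4", "TP2", "HM4"], "TP4")` lists the candidate inputs for a 4 m temperature model; `np.corrcoef(data, rowvar=False)` returns the same matrix in a single call.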

Normalisation and Regularisation
Before model training, the dataset must be prepared. As most of the sensors were installed in December 2019, there is no need to fill large gaps in the data; only denoising and the filling of individual missing data points are required. The data were collected over four months and, apart from a few missing values, each weather station provided around 2900 samples. The input dataset for the neural network model was selected based on two considerations: first, the window for human activity on the Antarctic inland route was considered to be December to April; second, for training efficiency, model training had to be completed within 20 min. With these two considerations in mind, the January-April data were chosen. Part of the prediction phase consisted of an error analysis of the April data. Of the input data, 2175 samples (75%) were used as the training set and 725 samples (25%) as the test set. A separate validation set was considered unnecessary because the hyperparameters were not tuned against the test results and the test-set predictions showed no over- or under-fitting. The purpose of this experiment is to compare the prediction performance of the newly proposed machine learning algorithm with that of other machine learning algorithms on the same dataset, not to optimise any particular algorithm, so the validation set was omitted to save time. Normalisation scales the data into a small, specific interval, transforming them into dimensionless values so that indicators with different units or magnitudes can be compared and weighed.
Z-score standardisation is used: for each feature, the mean of all training samples (for the same feature) is subtracted from the value of a single sample, and the result is divided by the standard deviation of all training samples. After standardisation, the data of each feature are therefore centred around 0 with a variance of 1. This is calculated as follows:

$$X = \frac{x - \mu}{\sigma} \tag{2}$$

In Equation (2), x is the sample feature value, µ is the sample mean, σ is the standard deviation of the sample data, and X is the normalised feature value [16].
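The split and the standardisation of Equation (2) can be sketched together, because µ and σ must come from the training samples only. This is a minimal NumPy sketch that assumes a chronological split (the first 75% of hours for training); the function name is hypothetical.

```python
import numpy as np

def split_and_standardise(series, train_fraction=0.75):
    """Chronological train/test split, then z-score normalisation X = (x - mu) / sigma.

    series: array of shape (n_samples, n_features), ordered in time.
    mu and sigma are computed per feature on the training part only and are
    reused for the test part, so no test-period information leaks into training.
    """
    series = np.asarray(series, dtype=float)
    n_train = int(round(train_fraction * len(series)))
    train, test = series[:n_train], series[n_train:]
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid division by zero for constant features
    return (train - mu) / sigma, (test - mu) / sigma, mu, sigma
```

For about 2900 hourly samples per station this yields 2175 training and 725 test samples, matching the 75%/25% split described above; predictions are mapped back to physical units with `X * sigma + mu`.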
One goal of neural network training is to regularise the parameters while minimising the prediction error. Minimising the error brings the predicted values closer to the training data, while regularising the parameters prevents overfitting. Overfitting is mainly caused by having too many parameters and shows up as a test-set error much larger than the training-set error. Keeping the model "simple" keeps the training error low while giving the resulting parameters good generalisation performance. Training therefore usually favours small weights, because models with small parameter values are generally considered simpler, adapt better to different datasets and are less prone to overfitting. Besides the L1 regularisation method, the L2 regularisation method can be used to prevent overfitting. In this paper, L2 regularisation is used: the penalty term in Equation (3) limits the complexity of the trained model.
In Equation (3), J_0 is the original loss function, the second term is the L2 regularisation term, and α is the regularisation factor.
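A minimal sketch of an L2-regularised loss, assuming the common form J = J_0 + α Σ w² with J_0 taken as the mean squared error; both choices are assumptions, since some formulations use α/2 or normalise by the number of samples. The function names are hypothetical.

```python
import numpy as np

def l2_regularised_loss(y_true, y_pred, weights, alpha):
    """J = J0 + alpha * (sum of squared weights).

    J0 is assumed to be the mean squared error; `weights` is a list of weight
    arrays (biases are usually left out of the penalty).
    """
    j0 = np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2)
    penalty = sum(float(np.sum(w ** 2)) for w in weights)
    return j0 + alpha * penalty

def l2_penalty_gradient(w, alpha):
    """Gradient of alpha * sum(w**2): each update pulls w towards 0 (weight decay)."""
    return 2.0 * alpha * w
```

The gradient term shows why the penalty favours small weights: every update subtracts a step proportional to the current weight. In Keras, `tf.keras.regularizers.L2(l2=...)` adds a penalty of the same `l2 * sum(square(w))` form to a layer.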
The field data acquisition process is as follows. Each sensor is sampled at 10-min intervals, and each reading is stored in a Campbell data logger. The embedded program compares and averages the data over each hour. The returned values are all real measurements, but missing and incorrectly collected data are inevitable. The data must therefore be cleaned before being input to the neural network for training. This step, also known as pre-processing, is shown in Figure 4. The program smooths out noise in the original data and filters out irrelevant data. Missing values are handled by interpolation: an interpolation function is constructed that estimates each unknown value from the known points around the missing point.
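The two pre-processing operations above, hourly averaging of 10-min readings and interpolation of missing values, can be sketched as follows. The paper does not name its interpolation function, so linear interpolation is assumed here; the function names and the NaN convention for missing readings are also assumptions.

```python
import numpy as np

SAMPLES_PER_HOUR = 6  # one reading every 10 min

def hourly_means(readings):
    """Average consecutive 10-min readings into hourly values.

    Missing readings are NaN; an hour with no valid reading stays NaN.
    Trailing readings that do not fill a whole hour are dropped.
    """
    readings = np.asarray(readings, dtype=float)
    n_hours = len(readings) // SAMPLES_PER_HOUR
    blocks = readings[: n_hours * SAMPLES_PER_HOUR].reshape(n_hours, SAMPLES_PER_HOUR)
    counts = np.sum(~np.isnan(blocks), axis=1)  # valid readings per hour
    sums = np.nansum(blocks, axis=1)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def fill_gaps(values):
    """Fill NaN gaps by linear interpolation between the nearest valid neighbours.

    np.interp holds the first/last valid value constant beyond the ends of the record.
    """
    values = np.asarray(values, dtype=float).copy()
    missing = np.isnan(values)
    idx = np.arange(len(values))
    values[missing] = np.interp(idx[missing], idx[~missing], values[~missing])
    return values
```

Linear interpolation is a reasonable default for isolated gaps in hourly temperature; for long gaps or for wind direction (a circular quantity), a different treatment would be needed.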

Methodology
Since being proposed in the 1980s, RNNs have gradually gained widespread application in data modelling and prediction. However, several problems that emerged later made RNNs unsuitable for many conditions and situations. LSTM neural networks were then developed to solve the vanishing gradient problem, and several LSTM variants, such as Bi-LSTM and SAE-LSTM, have since been proposed to improve feature learning efficiency.

Long Short-Term Memory Prediction Model
The Long Short-Term Memory (LSTM) network [17] is used to predict and analyse time-series data. The weather station data obtained in this study, including temperature, humidity, barometric pressure and wind speed, are all time series. The three core gate structures of the LSTM determine how long- and short-term memory is achieved on top of an RNN. The form of the algorithm is shown in Figure 5, and the overall operation of the neural network is shown in Figure 6.
The forget gate decides which information to discard and is represented as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{4}$$

The input gate selects the information to be written into the cell. The input gate and the cell-state update are represented as follows:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{5}$$

The output gate is represented as follows:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \quad h_t = o_t \odot \tanh(c_t) \tag{6}$$

In Equation (4), f_t is the output of the forget gate, W_f is the weight matrix of the forget gate, b_f is its bias vector, h_{t-1} is the hidden state at the previous moment, x_t is the current input and σ is the sigmoid activation function; W_f · [h_{t-1}, x_t] is calculated as shown in Equation (7).
In Equation (5), h_{t-1} is the output at the previous moment, i_t is the output of the input gate, c_t is the activation state of the current cell, c_{t-1} is the cell state at the previous moment, W_i is the weight matrix of the input gate and W_c is the weight matrix of the candidate cell state; b_i is the bias vector of the input gate and b_c is the bias vector of the candidate cell state.
In Equation (6), o_t is the output-gate vector, h_t is the resulting output (the hidden state), W_o is the weight matrix of the output gate and b_o is the bias vector of the output gate.
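To make the gate computations concrete, the sketch below implements one LSTM time step in NumPy, assuming the standard formulation that the symbols of Equations (4)-(6) describe (sigmoid gates, tanh candidate state, weights multiplying the concatenation [h_{t-1}, x_t]). The helper names and the random initialisation are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step in the standard form of Equations (4)-(6).

    params maps "f", "i", "c", "o" to (W, b); each W has shape
    (n_hidden, n_hidden + n_inputs) and multiplies [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    f_t = sigmoid(W_f @ z + b_f)          # forget gate, Eq. (4)
    i_t = sigmoid(W_i @ z + b_i)          # input gate, Eq. (5)
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state, Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde    # cell state update, Eq. (5)
    o_t = sigmoid(W_o @ z + b_o)          # output gate, Eq. (6)
    h_t = o_t * np.tanh(c_t)              # hidden state / output, Eq. (6)
    return h_t, c_t

def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = np.random.default_rng(seed)
    shape = (n_hidden, n_hidden + n_inputs)
    return {g: (rng.normal(0.0, 0.1, shape), np.zeros(n_hidden)) for g in "fico"}

def run_lstm(sequence, params, n_hidden):
    """Feed a (T, n_inputs) sequence through the cell and return the final hidden state."""
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x_t in np.asarray(sequence, dtype=float):
        h, c = lstm_step(x_t, h, c, params)
    return h
```

Because the forget gate multiplies c_{t-1} instead of passing it through a squashing non-linearity, the cell state can carry information across many steps, which is how the LSTM mitigates vanishing gradients.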

Stacked Autoencoder
The autoencoder is an unsupervised neural network model that learns implicit features of the input data (encoding) and reconstructs the original input data from these learned features (decoding). Besides reducing feature dimensionality, the autoencoder can pass the features it learns to a supervised learning model, so that it acts as a feature extractor. There are many types of autoencoders, including stacked autoencoders, under-complete autoencoders, regularised autoencoders and so on. A basic AE (autoencoder) [19] has only one hidden layer, and its input and output layers have the same dimension.
As shown in Figure 7, the first layer is the input layer, the middle layer is the hidden layer and the last layer is the output layer. The AE network applies a non-linear transformation from one layer to the next through an activation function. The encoder converts the input into a more abstract feature vector, and the decoder reconstructs the information from this feature vector. AE execution therefore consists of two processes, encoding and decoding, described by Equations (8) and (9); the sigmoid function is used for both activation functions sf_1 and sf_2. The AE parameters are obtained by solving Equation (10).
An SAE is a stack of several AEs. After the first AE has been executed, the subsequent AEs are executed in sequence up to the Nth, whose output is the output of the SAE. The structure of the SAE is shown in Figure 8.
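The encode/decode cycle and the stacking can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: sigmoid encoder and decoder (sf_1 and sf_2), a mean squared reconstruction error as the training objective for Equation (10), plain full-batch gradient descent, and inputs scaled to [0, 1] so that a sigmoid decoder can reproduce them. Class and function names are hypothetical, and only the unsupervised pre-training is shown, not the supervised fine-tuning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """One-hidden-layer AE: H = sf1(X W1 + b1), X_hat = sf2(H W2 + b2), sf1 = sf2 = sigmoid."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def decode(self, H):
        return sigmoid(H @ self.W2 + self.b2)

    def loss(self, X):
        """Mean squared reconstruction error."""
        return float(np.mean((self.decode(self.encode(X)) - X) ** 2))

    def fit(self, X, epochs=200, lr=0.5):
        """Full-batch gradient descent on the mean squared reconstruction error."""
        X = np.asarray(X, dtype=float)
        for _ in range(epochs):
            H = self.encode(X)
            X_hat = self.decode(H)
            # backpropagate through the sigmoid decoder, then the sigmoid encoder
            d_out = 2.0 * (X_hat - X) / X.size * X_hat * (1.0 - X_hat)
            d_hid = (d_out @ self.W2.T) * H * (1.0 - H)
            self.W2 -= lr * (H.T @ d_out)
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * (X.T @ d_hid)
            self.b1 -= lr * d_hid.sum(axis=0)
        return self

def train_sae(X, layer_sizes, **fit_kwargs):
    """Greedy layer-wise pre-training: each AE learns from the previous AE's hidden output."""
    encoders, H = [], np.asarray(X, dtype=float)
    for k, n_hidden in enumerate(layer_sizes):
        ae = Autoencoder(H.shape[1], n_hidden, seed=k).fit(H, **fit_kwargs)
        encoders.append(ae)
        H = ae.encode(H)
    return encoders, H  # H is the feature matrix learned by the last AE
```

For example, `train_sae(X, [4, 2])` compresses six scaled sensor variables first to four and then to two features; the final `H` plays the role of the learned feature matrix of the last AE.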

Combination Algorithm for SAE and LSTM with Multiple Inputs
To allow the LSTM neural network to capture more features, this study combines SAE and LSTM, with the aim of avoiding gradient explosion while learning the features in the time series more fully. The algorithms are integrated in four steps:

1. The time series is partitioned into a training set, a test set and a prediction set, and the partitioned series are normalised.
2. The neural network parameters are set. Throughout the experiment, all neural networks run in the same hardware and software environment.
3. The SAE neural network is trained first to learn the multi-input network features thoroughly in its hidden layers, and the resulting SAE features are passed to the LSTM neural network. In this step, the SAE undergoes unsupervised pre-training followed by supervised fine-tuning. During pre-training, only the SAE model is trained: one pass through Equations (8)-(10) yields the output of one AE, and the output of each AE becomes the input of the next, so that the hidden-layer output of the second AE is obtained from the first, and so on. The number of training iterations is set so as to obtain the learned feature matrix of the last AE. The entire network is then fine-tuned by backpropagation under the constraint of Equation (11) to obtain improved weights, where N_l is the number of samples, A_t is the actual value and F_t is the predicted value [20].
4. The fourth step executes the LSTM algorithm. With the SAE trained, the features obtained from the SAE, combined with the multiple input variables, are fed into the LSTM neural network. f_t is obtained from Equations (4) and (7), and i_t from Equation (5); c_t is then obtained by substituting i_t and f_t into Equation (5), and o_t is obtained from Equation (6).
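Step 4 feeds the SAE features together with the raw input variables into the LSTM. A minimal sketch of how such inputs could be arranged into fixed-length sequences with a forecast horizon (for example 1 h or 8 h ahead, the horizons used in the experiments) is shown below; the function name, the 24-step default window and the direct (non-recursive) multi-step target are assumptions.

```python
import numpy as np

def make_lstm_inputs(sae_features, raw_inputs, target, window=24, horizon=1):
    """Build LSTM input sequences from SAE features plus raw sensor inputs.

    sae_features: (T, k) feature matrix from the last AE of the SAE.
    raw_inputs:   (T, m) selected sensor variables (e.g. TP4, WS4, HM4).
    target:       (T,) series to forecast, e.g. 4 m temperature or wind speed.
    window:       number of past hourly steps per sample (hypothetical default).
    horizon:      how many hours after the last input step the target lies.
    Returns X of shape (samples, window, k + m) and y of shape (samples,).
    """
    combined = np.hstack([np.asarray(sae_features, float), np.asarray(raw_inputs, float)])
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for start in range(len(combined) - window - horizon + 1):
        end = start + window                      # inputs cover steps start .. end-1
        X.append(combined[start:end])
        y.append(target[end - 1 + horizon])       # value `horizon` steps after the last input
    return np.array(X), np.array(y)
```

The resulting `X` has the (samples, time steps, features) layout that recurrent layers typically expect, so the same arrays can be used for the LSTM, Bi-LSTM and SAE-LSTM comparisons.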

Bi-Directional Long Short-Term Memory
LSTM predicts the next moment from information at past times. In some cases, however, the value of a time series at the current time is related not only to past moments but also to future ones. In a standard LSTM network, information flows in one direction, so all learned features come from the past. Bi-LSTM takes both historical and future information into account by connecting two LSTM networks built on the same principle: the forward LSTM has access to the past data of the input sequence, and the backward LSTM has access to its future data. This can be expressed by the following formula [21]. The hidden layer state h_t of Bi-LSTM at time t consists of the forward state h_tf and the backward state h_tb. The four matrices W_1, W_2, W_3 and W_4 are the weighting matrices for the hidden layers and the inputs; x_t is the input sequence at time t, and h_t is the output of the hidden layer at time t.
The algorithm structure is shown in Figure 9.

Figure 9. Bi-LSTM algorithm structure diagram.
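A minimal NumPy sketch of the bidirectional idea: one LSTM reads the sequence forward, a second, independent LSTM reads it backward, and the two hidden states for each time step are combined. Concatenation h_t = [h_tf ; h_tb] is assumed here as the combination rule; the weighting by W_1 to W_4 in [21] may differ. Function names and the stacked-gate weight layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(sequence, W, b, n_hidden):
    """Run a standard LSTM over `sequence` and return the hidden state at every step.

    W has shape (4 * n_hidden, n_hidden + n_inputs): forget, input, candidate
    and output weights stacked row-wise; b has shape (4 * n_hidden,).
    """
    h, c, states = np.zeros(n_hidden), np.zeros(n_hidden), []
    for x_t in sequence:
        f, i, g, o = np.split(W @ np.concatenate([h, x_t]) + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.array(states)

def bilstm(sequence, fwd, bwd, n_hidden):
    """Bi-LSTM states: h_t = [h_tf ; h_tb] for every time step t.

    fwd and bwd are (W, b) pairs for two independent LSTMs; the backward one
    reads the sequence in reverse, and its states are flipped back so that
    row t of the result pairs the forward and backward states for time t.
    """
    sequence = np.asarray(sequence, dtype=float)
    h_f = lstm_pass(sequence, fwd[0], fwd[1], n_hidden)
    h_b = lstm_pass(sequence[::-1], bwd[0], bwd[1], n_hidden)[::-1]
    return np.concatenate([h_f, h_b], axis=1)
```

At the last time step the backward half has seen only the final input, while the forward half has seen the whole sequence; this asymmetry is what lets Bi-LSTM use context from both directions.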

Experimental Design and Parameterisation
The preparatory stages of the experiment are setting up the environment and setting the parameters. The environment consists of a hardware environment and a software environment, and the parameters are those of the five neural networks. The experiments compare different prediction targets and different prediction steps. Table 2 lists the hardware and software environments; to ensure a fair comparison, identical hardware and software environments were used, and these settings were kept constant in every experiment. To evaluate prediction performance, the coefficient of determination (R²) [22], root mean square error (RMSE) [23], mean absolute error (MAE) [24], mean absolute percentage error (MAPE) [25] and Nash-Sutcliffe efficiency coefficient (NSE) [26] are used. These are given in Equations (13)-(17), respectively, where $u_i$ is the predicted value, $\bar{u}$ is the mean of the predicted values, $o_i$ is the observed value, and $\bar{o}$ is the mean of the observed values. R² lies between 0 and 1, where 0 means the model explains none of the variation and 1 means it explains the observed variation perfectly. MAE, the mean absolute error, is a commonly used error statistic. RMSE is the square root of the mean of the squared deviations between predicted and observed values, and so measures how far the predictions deviate from the observations. Because each deviation is squared, RMSE is sensitive to unusually large errors in a set of measurements and is therefore a good indicator of precision.
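The five indicators can be computed as in the sketch below, with observed values $o_i$ and predictions $u_i$. Since Equations (13)-(17) are not reproduced in this text, the forms used here are the common textbook definitions and should be read as assumptions: R² is taken as the squared Pearson correlation between observations and predictions, and MAPE skips hours whose observed value is zero, where the percentage error is undefined. That second choice matters for this data set, because the Kunlun wind speed series contains long runs of zero readings. The function name `evaluate` is illustrative.

```python
import numpy as np


def evaluate(observed, predicted):
    """Five error statistics for observed values o_i and predicted values u_i."""
    o = np.asarray(observed, dtype=float)
    u = np.asarray(predicted, dtype=float)
    err = u - o

    r2 = np.corrcoef(o, u)[0, 1] ** 2          # assumed: squared Pearson correlation
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))

    nonzero = o != 0                            # MAPE is undefined where o_i == 0
    mape = (100.0 * np.mean(np.abs(err[nonzero] / o[nonzero]))
            if nonzero.any() else float("nan"))

    nse = 1.0 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2)
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape, "NSE": nse}


# Usage: a prediction offset by +0.5 at every step
stats = evaluate([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5])
# R2 is 1 (up to rounding), RMSE = MAE = 0.5, NSE = 0.8
```

The usage line shows why the indicators are reported together: a constant offset leaves R² at 1, while NSE drops to 0.8 and the bias shows up directly in RMSE and MAE.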
RMSE can be used as a criterion for assessing the accuracy of the measurement process. MAPE, like MAE, measures how well a model predicts, but expresses the error as a percentage of the observed value. NSE takes values from negative infinity to 1: an NSE close to 1 indicates a good-quality, highly reliable model; an NSE close to 0 means that the simulated results are close to the mean level of the observations, i.e., the overall results are credible but the errors in the simulated process are large; and an NSE below 0 means that the model is not reasonable.

The aim of this research is to provide camp meteorological indicators, namely future wind and temperature conditions, for inland convoys. The Chinese inland convoy covers 75-120 km per day, so each overnight stop lies in the area around a weather station. Daily driving time is about 7 h, so it is essential to know the wind speed and temperature at the camping site 7 h ahead to ensure the safety and schedule of the convoy. Once the convoy has stopped, hourly forecasts are also needed to organise personnel operations after camping and the loading and unloading of goods. The investigation was therefore divided into 16 experiment groups, four for each station: 4 m temperature prediction for the next hour, 4 m temperature prediction for the next 8 h, 4 m wind speed prediction for the next hour, and 4 m wind speed prediction for the next 8 h.

Figure 10 compares the new model with four commonly used neural network models at hourly resolution; Figure 10a-d show the single-step temperature predictions for the four sites. "Real Data" denotes the observed environmental parameters, and the remaining lines are the predictions of the deep learning algorithms.
At Panda100, the worst-performing model is Bi LSTM, whose predictions are generally higher than the observed values. At Panda300, the BP neural network predicts peaks much smaller than the observed ones. At Taishan Station, the LSTM and Bi LSTM predictions are imprecise around 300 h and between 600 and 700 h, where the predicted amplitude of change is smaller than observed. At Kunlun Station, the ELM model produces peaks much higher than the observed values.

Figure 10e-h show the single-step wind speed predictions for the four stations. Unlike temperature, which varies continuously, wind speed varies stepwise from hour to hour, so for data of the same order of magnitude its series characteristics are more difficult to learn than those of temperature. During a 200 h period of the test set in which the observed wind speed at Kunlun Station dropped to zero, SAE LSTM was the only model whose predictions stayed close to the observed value; the errors of all other models exceeded 0.5 m/s. In both single-step wind speed and temperature prediction, SAE LSTM showed good stability.
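Each of the 16 experiment groups (four stations, two 4 m variables, two horizons) needs the hourly series turned into supervised samples. A common way to do this is a sliding window, sketched below. The look-back length, the labels `T4` and `WS4`, and the reading of the 8 h task as predicting the single value 8 h ahead (rather than the whole 8 h sequence) are assumptions made for illustration; the text does not specify them.

```python
import itertools
import numpy as np

# The 16 experiment groups: 4 stations x 2 target variables x 2 horizons.
# "T4" (4 m temperature) and "WS4" (4 m wind speed) are illustrative labels.
STATIONS = ("Panda100", "Panda300", "Taishan", "Kunlun")
TARGETS = ("T4", "WS4")
HORIZONS_H = (1, 8)
EXPERIMENTS = list(itertools.product(STATIONS, TARGETS, HORIZONS_H))


def make_windows(data, target_col, lookback, horizon):
    """Turn an hourly (T x F) array into supervised samples.

    X[i] holds `lookback` consecutive hours of all F inputs; y[i] is the target
    column `horizon` hours after the last hour in the window.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]               # treat a single series as one feature
    n = data.shape[0] - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this lookback and horizon")
    X = np.stack([data[i:i + lookback] for i in range(n)])
    y = data[lookback + horizon - 1:, target_col]
    return X, y


# Usage: 20 hours of one series, 3 h look-back, 8 h horizon
X, y = make_windows(np.arange(20.0), 0, lookback=3, horizon=8)
# X.shape == (10, 3, 1); y[0] == 10.0
```

Changing only `horizon` between 1 and 8 keeps the inputs identical across the single-step and multi-step experiments, so differences in accuracy can be attributed to the longer lead time rather than to different input windows.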

Experiment One: Multi-Parameter Predictions for the Next Hour
To quantify the error statistics of each model, Tables 4 and 5 compare the five statistical indicators for the two sets of single-step experiments. The best results are shown in bold yellow.
Table 4 shows that Bi LSTM has the lowest NSE in every experiment, which indicates that this model, or its parameter settings, is not suited to forecasting such time series. For the BP neural network's temperature predictions at Kunlun Station and Panda100 Station, the coefficient of determination is close to, or even exceeds, that of the SAE LSTM model, but its poor performance at the other sites shows that the algorithm requires manual tuning. Table 5 further demonstrates that SAE LSTM adapts better than the other models to wind speed data, which contain fewer features: SAE LSTM performs best at every site and for every indicator. In the wind speed forecast at Kunlun Station, the BPNN has a negative NSE, showing that it is not suitable for predicting this sequence. The statistical indicators also show that, apart from SAE LSTM, LSTM has the best prediction performance of the remaining four models. This reflects the advantage of long short-term memory, and shows that the network's prediction performance improves further when it is combined with SAE.

Figure 11 compares the predicted and observed values for a prediction step of 8 h. As the prediction step increases, the correlation within the data decreases, but the LSTM structure retains more feature information. Although the larger step makes wind speed features harder to capture, Figure 11 shows that the relative trends remain consistent, and no over-fitting or under-fitting is apparent. Figure 11 also reveals the same problems as Figure 10: in the multi-step wind speed prediction in particular, SAE LSTM and LSTM are the two best-performing models at every station except Panda300, where LSTM underfits.
Tables 6 and 7 list the error statistics of the wind speed and temperature predictions at the four stations with a prediction step of 8 h; the best results are shown in bold yellow text. As in the previous experiment, the BPNN's NSE for wind speed at Kunlun Station is below zero, indicating that the model again does not match the data. However, when single-step and multi-step prediction accuracy are compared only in terms of the coefficient of determination, the multi-step temperature prediction accuracy at Kunlun Station is higher.

Experiment Two: Multi-Parameter Predictions for the Next Eight Hours
In contrast, at the remaining three stations the single-step prediction accuracy is higher. In the multi-step experiments, BPNN predicts better than LSTM; its instability may be caused by unadjusted parameters. Nevertheless, SAE LSTM performs best at every site, regardless of the evaluation index.
The authors also note that, in the wind speed data sets used in the experiments, Kunlun Station recorded almost 20 days of zero wind speed (the starting threshold of the wind speed sensor is 0.2 m/s). These are genuine observations. Annual mean wind speeds in the area are low, the maximum wind speed does not exceed force 3, and calm conditions are common throughout the year. These data may appear suspicious, but Kunlun Station is located at Dome A, the highest point of the Antarctic ice sheet, where calm winds can persist for up to 2 months.
Model presentation and data experiments showed that the combined SAE LSTM model has the best predictive performance. This result can also be understood at a physical level. In the modelling theory of atmospheric physics, variations in wind and temperature are caused by local accumulation over time and by changes in the physical state of the spatial environment. Taking temperature as an example, the heat flow equation states that the rate of change of the internal and kinetic energy of a material volume element equals the sum of the heating rate from external sources and the rate of work done on that element by external forces, as shown in Equation (18). In this equation, $T$ and $p$ are temperature and air pressure, respectively; $Q$ is the heating rate per unit mass of air from external sources; $t$ is time; $c_v$ is the specific heat capacity of dry air at constant volume, 717 J·kg⁻¹·K⁻¹; $\alpha$ is the specific volume; and $\varepsilon$ is the viscous dissipation function, which represents the rate at which molecular viscosity dissipates kinetic energy. In Equation (18), air pressure is related to wind speed, and the main influencing factors are the rate of change of the specific volume with time and the heating rate of external sources. A difference in temperature is therefore expressed as the time integral of these specific heating terms. Among the non-adiabatic effects, certain factors significantly affect temperature because of spatio-temporal differences. These two aspects correspond to the two parts of the model: the strength of the LSTM neural network in processing time series allows it to capture time-integrated features and trends, while the SAE neural network extracts components that mainly contain the non-adiabatic effects and the residual spatial physical states that are not explicitly accounted for.
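Equation (18) is not reproduced here. For reference, a standard textbook form of the thermodynamic energy equation for an air parcel that uses the same symbols ($c_v$, $T$, $p$, $\alpha$, $Q$, $\varepsilon$, $t$) is given below, followed by its time integral over an interval $[t_0, t_1]$. It is offered as an assumption about the content of Equation (18), whose exact form in the original may differ; the integral form makes the time-accumulation argument of the paragraph above explicit.

```latex
\begin{aligned}
c_v \frac{dT}{dt} + p\,\frac{d\alpha}{dt} &= Q + \varepsilon \\
T(t_1) - T(t_0) &= \frac{1}{c_v}\int_{t_0}^{t_1}\left(Q + \varepsilon - p\,\frac{d\alpha}{dt}\right)dt
\end{aligned}
```

With $c_v$ treated as constant, a temperature change over an interval is the accumulated net heating divided by $c_v$, which is the kind of time-integrated quantity the text argues a recurrent network is suited to track.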
The combination of SAE and LSTM allows the algorithm to show good predictive performance when dealing with time series with time-varying and time-accumulative features.
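The stacked autoencoder half of SAE LSTM can be illustrated with the minimal NumPy sketch below, which trains autoencoder layers greedily, each on the codes produced by the previous layer. The tanh encoder, linear decoder, mean squared reconstruction loss, full-batch gradient descent, layer sizes and function names are all assumptions for illustration; the authors' SAE configuration is not given in this section. In the combined model, the final codes would then be windowed and passed to an LSTM.

```python
import numpy as np


def train_autoencoder(X, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one autoencoder layer (tanh encoder, linear decoder) by full-batch
    gradient descent on the mean squared reconstruction error.
    Returns the encoder weights and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encode
        err = H @ W2 + b2 - X              # reconstruction error
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Gradients of L = 0.5 * mean_i ||x_hat_i - x_i||^2
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dZ = (err @ W2.T) * (1.0 - H ** 2)  # back through the tanh encoder
        gW1 = X.T @ dZ / n
        gb1 = dZ.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1), losses


def stacked_encode(X, layer_sizes, **kwargs):
    """Greedy layer-wise SAE: train each layer on the codes of the previous layer."""
    codes, encoders = X, []
    for size in layer_sizes:
        (W, b), _ = train_autoencoder(codes, size, **kwargs)
        encoders.append((W, b))
        codes = np.tanh(codes @ W + b)
    return codes, encoders


# Usage: hypothetical 100 hours x 6 sensor channels compressed to 3 features
rng = np.random.default_rng(0)
sensors = rng.normal(size=(100, 6))
codes, encoders = stacked_encode(sensors, [4, 3], epochs=200)
# codes.shape == (100, 3)
```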

Discussion
Meteorological predictions for East Antarctica can provide strong support for scientific research activities. This study is the first to apply machine learning methods to an actual meteorological survey scheme for an inland Antarctic convoy route. The results of the five statistical indicators across the four groups of experiments reflect the new algorithm's stability and good prediction performance. Building on the proposed algorithm, we discuss the following points:
1. The experiments did not consider the time and space complexity of the algorithms in much detail, so the timeliness of the conclusions remains to be verified. This part therefore discusses the efficiency of the five algorithms in detail.

2. When comparing the observed and predicted values, further data visualisation is needed: namely, whether observed and predicted values remain close to a one-to-one correspondence in a linear fit, or in other words, whether there is over-fitting or under-fitting even when the error statistics are low. This part presents supplementary experiments based on the single-step wind speed and temperature predictions of Experiment One.

3. For multiple inputs, we choose input combinations whose Pearson correlation coefficient is close to 1. Introducing other inputs might improve prediction accuracy, but it would also increase running time, so efficiency must be considered; we therefore do not discuss input combinations at length here. All input combinations in this study consist of sequences whose Pearson correlation coefficient is greater than 0.9, which ensures that the features are consistent.
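The input-selection rule described in item 3 (keep candidate series whose Pearson correlation with the target exceeds 0.9) can be sketched as follows. The dictionary layout, the variable names and the function name `select_inputs` are hypothetical, and only positive correlations are kept, following the wording of the text; a variant using |r| would also admit strongly anti-correlated inputs.

```python
import numpy as np


def select_inputs(series, target, threshold=0.9):
    """Return candidate inputs whose Pearson r with the target exceeds `threshold`,
    sorted from strongest to weakest correlation."""
    y = np.asarray(series[target], dtype=float)
    chosen = []
    for name, values in series.items():
        if name == target:
            continue
        r = np.corrcoef(np.asarray(values, dtype=float), y)[0, 1]
        if r > threshold:
            chosen.append((name, float(r)))
    return sorted(chosen, key=lambda item: item[1], reverse=True)


# Usage with hypothetical hourly channels
hours = np.arange(48.0)
channels = {
    "T4": np.sin(hours / 8),            # target: 4 m temperature
    "T2": np.sin(hours / 8) + 0.05,     # a closely related channel
    "WS4": np.cos(hours / 3),           # an unrelated channel
}
print(select_inputs(channels, "T4"))
```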

4. The dataset used in the experiments spans four months. This period was chosen according to the requirements of Antarctic field operations, so that the dataset length meets the weather forecasting needs of explorers and expedition personnel. To further analyse the model's generalisability, annual data were later used as model input in a long-term training process, consisting of the 4 m wind speed and 4 m temperature from Kunlun Station for the whole of 2020. The annual data help explain the zero wind speed readings at Kunlun Station and further validate the new model's applicability. Figure 12 presents a statistical test of the model prediction results. The coefficient of determination (R-squared) and the slope of the fitted line in the prediction scatter plot are commonly used to judge prediction results: the closer both are to 1, the better the prediction. Figure 12a-t were obtained on this basis.
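The two scatter-plot criteria described in item 4 can be computed as below. The sketch assumes observations on the horizontal axis and predictions on the vertical axis and fits an ordinary least-squares line with `numpy.polyfit`; the function name `scatter_fit` is illustrative.

```python
import numpy as np


def scatter_fit(observed, predicted):
    """Least-squares line predicted ~ slope * observed + intercept, plus R^2 of the scatter."""
    o = np.asarray(observed, dtype=float)
    u = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(o, u, 1)      # degree-1 fit, highest power first
    r2 = np.corrcoef(o, u)[0, 1] ** 2
    return float(slope), float(intercept), float(r2)


# Usage: a model that damps every value by 10% and adds an offset
o = np.linspace(-10, 10, 21)
slope, intercept, r2 = scatter_fit(o, 0.9 * o + 1)
# slope is approximately 0.9, intercept approximately 1, r2 approximately 1
```

The example shows why the slope is worth reporting alongside R²: systematically damped peaks leave R² at 1 while pulling the slope below 1.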
According to Figure 12, in the fitting process at each site the slope of the fitted line of the ELM neural network is closest to 1 in most cases, followed by SAE LSTM. However, SAE LSTM is more stable: in the single-step temperature prediction at every site, the slopes of its fitted lines are all greater than 0.92. Combined with the coefficient of determination R², this shows that the SAE LSTM predictions are more stable, whereas ELM fluctuates and its predictions can be strongly biased. The other three models performed well in only one or two experiments. Figure 12 therefore reflects the actual forecasting performance more clearly, and further illustrates that combining SAE with LSTM makes the LSTM network more stable and its predictions more accurate. Figure 13 compares the training times of the models in the 4 m temperature prediction experiment at the Panda100 site.
Figure 13 shows that the size of the data sample affects the running time of every model. In terms of algorithmic complexity, the cost of a single forward/backward propagation through a neural network is proportional to its number of parameters; for a fixed network, one pass over the training data is therefore O(N) in Big O notation, where N is the size of the input data.
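To make the complexity argument concrete, the helper functions below count trainable parameters for the layer types compared here and express per-epoch work as a rough proxy that grows linearly with the number of samples N. The counts assume one bias vector per gate, as in Keras; PyTorch's `nn.LSTM` keeps two bias vectors per gate, which adds another 4 × hidden parameters. The proxy illustrates the O(N) scaling and is not a FLOP count; the function names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer (as in a BP network)."""
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    """Four gates, each with input weights, recurrent weights and one bias vector."""
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


def bilstm_params(n_in, n_hidden):
    """A Bi LSTM layer holds two independent LSTMs (forward and backward)."""
    return 2 * lstm_params(n_in, n_hidden)


def relative_epoch_cost(n_params, n_samples, seq_len=1):
    """Rough proxy: work per epoch grows with parameters x time steps x samples,
    so for a fixed network it is linear (O(N)) in the number of samples N."""
    return n_params * seq_len * n_samples


# Usage: 4 input channels, 32 hidden units
print(lstm_params(4, 32), bilstm_params(4, 32), dense_params(4, 32))  # 4736 9472 160
```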
The primary analysis method used in this experiment is to record the code runtime environment and the corresponding hardware and to use the actual runtime of each model as the evaluation metric. The SAE LSTM model runs longer than the other models, but it improves significantly on the prediction accuracy of LSTM. This gain in accuracy inevitably comes with greater complexity and longer training time; nevertheless, run times of less than 30 min are acceptable for action planning in the field. ELM consumes the shortest training and testing time of all the models, and its fitted slope is closest to 1 in the linear fitting process. Although it sometimes deviates strongly from the observations, the ELM neural network can therefore be considered when the shortest computation time is the priority.
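ELM's short training time follows from how it is trained, which the sketch below illustrates under stated assumptions: the hidden-layer weights are drawn at random and never updated, and the output weights are obtained in a single least-squares solve with the Moore-Penrose pseudo-inverse, so there is no iterative backpropagation. This is the standard ELM formulation; the tanh activation, hidden size and function names are assumptions, not the configuration used in the paper. Because the hidden layer is random, results also depend on the drawn weights, which is one possible contributor to run-to-run variability.

```python
import numpy as np


def elm_fit(X, y, n_hidden, seed=0):
    """Extreme learning machine: random, untrained hidden layer; output weights solved
    in one shot by least squares (Moore-Penrose pseudo-inverse)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never updated
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = np.tanh(X @ W + b)                         # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                   # least-squares output weights
    return W, b, beta


def elm_predict(X, model):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


# Usage: fit a synthetic target from 4 hypothetical input channels
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -1.0, 0.2, 0.0])
model = elm_fit(X, y, n_hidden=50)
pred = elm_predict(X, model)
```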
Finally, the authors present wind speed and direction data from Kunlun Station for the full year 2020, together with a wind speed prediction experiment that uses the full-year data as input.
The first consideration regarding the calm-wind data is the operating status of the sensor, which can be determined in two ways. The first is to check whether the angle of the wind vane changes on calm days: a calm does not mean a complete absence of wind, as vertical winds and winds below 0.3 m/s still occur. Figure 14 shows the wind speed and direction data for the whole year at Kunlun Station; the periods marked by blue double arrows are intermittent calm-wind windows. The second is to check whether readings resume: near the 7000 h position on the time axis, towards the end of 2020, the windy season began and the sensor again returned wind speed values. Taken together, these two points suggest that the zero wind speed data used in the earlier experiments were not caused by a damaged sensor. A span of 8763 h, from 1 January 2020 to 31 December 2020, was used in the later full-year analysis. Figure 15 shows a comparison between the test set and the input data: the WS4 (4 m wind speed) predictions of the SAE LSTM model follow the trend of the observed values more closely. To illustrate this further, Figure 16 compares four error statistics.
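The two sensor checks described above can be automated roughly as follows: find each run of hours in which wind speed stays below the sensor threshold, then record whether the vane direction still changed inside the run and whether readings at or above the threshold resumed after it. The 0.2 m/s default is the sensor starting threshold quoted earlier in the text; the function name, dictionary keys and the rule that any change in direction counts as vane movement are assumptions for illustration.

```python
import numpy as np


def check_calm_windows(speed, direction, threshold=0.2):
    """Scan hourly wind data for calm windows (speed below the sensor threshold) and
    report, for each window, whether the vane angle still changed and whether wind
    speed readings resumed afterwards."""
    speed = np.asarray(speed, dtype=float)
    direction = np.asarray(direction, dtype=float)
    calm = speed < threshold
    windows, i, n = [], 0, len(speed)
    while i < n:
        if not calm[i]:
            i += 1
            continue
        j = i
        while j < n and calm[j]:
            j += 1                                   # extend the calm run
        windows.append({
            "start_h": i,
            "hours": j - i,
            "vane_moved": bool(np.any(np.diff(direction[i:j]) != 0)),
            "speed_resumed": j < n,                  # a reading at or above threshold followed
        })
        i = j
    return windows


# Usage with a short hypothetical record (m/s, degrees)
windows = check_calm_windows([1, 0, 0, 0, 2, 0, 0], [10, 10, 20, 20, 30, 40, 40])
calm_days = sum(w["hours"] for w in windows) / 24
```

Applied to a full-year record, summing `hours` over the returned windows and dividing by 24 gives the total calm time in days, which can be compared with the roughly 20 days of zero wind speed noted for Kunlun Station.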

Conclusions
This study aims to create a new neural network model for short-term parameter prediction at Antarctic multi-sensor weather stations, to achieve accurate and efficient camp weather prediction and thereby ensure the safety of inland convoys when camping and operating along the route. The following conclusions were drawn from these experiments and discussions.
(1) When selecting input variables for the weather station in the algorithm training experiments, variables whose correlation is higher than 0.9 in the Pearson correlation diagram can be chosen as inputs. (2) For ambient weather prediction in the field, the SAE LSTM neural network model is superior to the other four neural network models in terms of the coefficient of determination and the slope of the fitted line. The network alleviates the gradient explosion problem and obtains more data-series characteristics through the SAE. However, the ELM neural network model had the shortest running time and the fitted slope closest to 1. (3) The model's performance varied little across sites in the multi-group experiments, so the model is feasible for meteorological prediction and environmental assessment for the East Antarctic inland convoy.
The Antarctic interior is the primary location for scientific observation and scientific experiments in Antarctica. Its harsh environment provides excellent conditions for certain experiments, but the uncertain climate adds new difficulties. Short-term prediction of near-Earth weather over a small area is therefore essential for field staff to carry out their investigations. In the past, when ground-based sensor networks and machine learning methods were lacking, erroneous forecasts often led to losses. The deep learning model based on weather station sensor data proposed in this paper provides a new prediction tool for inland camp weather forecasting. Compared with the commonly used BPNN and LSTM neural networks, the new algorithm alleviates the gradient explosion problem and extracts more sequence features via the SAE network.
If the scenario or data type changes, data cleansing and re-parameterisation should be carried out as necessary. Artificial intelligence brings shortcuts and new methods to research in related areas. In future research, a camp-wide digital twin network in the Antarctic could be considered, with equipment such as LoRa devices used to make extensive, fine-grained observations of the operational area. As the density of the sensor network increases, the computing power of the core hardware must be increased and the algorithms optimised accordingly. In future studies, sensor networks are expected to be set up at fixed Antarctic stations such as Zhongshan, Kunlun or Taishan, where a digital twin will provide crews with real-time information on the future operational status of the camps and on equipment observations.