A Long Short-Term Memory Network for Plasma Diagnosis from Langmuir Probe Data

Electrostatic probe diagnosis is the principal method of plasma diagnosis. However, traditional diagnostic theory is affected by many factors, and it is difficult to obtain accurate results. In this study, a long short-term memory (LSTM) approach is used for plasma probe diagnosis to derive the electron density (Ne) and electron temperature (Te) more accurately and quickly. The LSTM network takes the data collected by Langmuir probes as input, which removes the dependence of the diagnosis on the discharge device, so the method can be applied to a variety of discharge environments and even to space ionospheric diagnosis. In a high-vacuum gas discharge environment, a Langmuir probe is used to obtain current–voltage (I–V) characteristic curves under different Ne and Te. One part of the data is selected to train the network, the other part is used as a test set, and the parameters are adjusted to improve the network's predictions. Two indices, the mean squared error (MSE) and the mean absolute percentage error (MAPE), are used to quantify the prediction accuracy. The results show that using LSTM to diagnose plasma reduces the impact of probe surface contamination that afflicts traditional diagnosis methods, and that it can accurately diagnose underdense plasma. In addition, the Ne output by the LSTM is more accurate than the Te output.


Introduction
Plasma is a complex thermodynamic system composed of electrons, ions, and neutral particles, which widely exists in cosmic space. Plasma is a conductive fluid as a whole, showing electrical neutrality macroscopically, but under the action of an electromagnetic field, energy transmission can occur. The measurement of the plasma state has always been the focus of researchers. The state of plasma can be characterized by electron density (N e ), electron temperature (T e ), plasma space potential (V p ), and other parameters, among which the most crucial are N e and T e . N e describes the number of electrons per unit volume, while T e describes the kinetic energy possessed by electrons. Under thermal equilibrium conditions, T e is equal to the ion temperature (T i ). Most diagnostic methods for plasma are aimed at obtaining N e and T e . The diagnosis methods of plasma are divided into telemetry diagnosis and in situ diagnosis. Telemetry diagnosis includes microwave diagnosis and spectral diagnosis. Langmuir probe diagnosis is the most common in situ diagnosis technology, which has been widely used in laboratory and space plasma detection [1][2][3][4][5][6][7]. Compared with telemetry diagnosis, the Langmuir probe can obtain more reliable and accurate diagnosis results.
However, the traditional diagnostic method of Langmuir probes depends strongly on the acquisition of the current-voltage (I-V) characteristic curve. The degree to which the collected I-V characteristic curve deviates from its true shape directly affects the reliability of the diagnosed plasma parameters. The shape of the I-V characteristic curve is affected by many factors, such as the contaminated layer on the probe surface, the sheath and the Debye length of the plasma, and even the driving circuit, all of which introduce serious errors into the diagnosis results. In addition, the many "human factors" in the traditional diagnosis process increase the randomness of the results.
Since Langmuir probes were first used to diagnose plasma in 1938 [8,9], many researchers have observed a contaminated layer on the probe surface, which manifests mainly as hysteresis of the I-V characteristic curve [10]. Two different distorted I-V curves could be obtained by a Langmuir probe when the applied voltage was swept upward and then downward [2,[11][12][13][14]. In order to eliminate the influence of the contaminated layer on the data, Oyama [15] applied a glass-sealed Langmuir probe to ionosphere exploration. The traditional Langmuir probe was abandoned because it easily adsorbs water, nitrogen, and oxygen molecules, forming a contaminated layer on the probe surface. Before sounding rockets or satellites are launched, Langmuir probes that have been exposed to the atmosphere are easily contaminated. Subsequently, to avoid the influence of contaminants on the diagnostic results of Langmuir probes, Amatucci et al. [16] invented a spherical Langmuir probe that removes surface contaminants by internal heating; however, this structure cannot be applied to the cylindrical probe that is most widely used today. Szuszczewicz and Holmes [17] invented a pulsed Langmuir probe (PLP), which employs a discontinuous modulated sweep of pulses following a sawtooth envelope. They believe that this approach can obtain more accurate plasma parameters than heating or ion bombardment. These new Langmuir probes can reduce the influence of the contaminated layer to a certain extent. However, the contamination-suppressing structure increases the complexity of the probe and makes the design of the probe or driving circuit more complex. On the other hand, for the data collected by a contaminated Langmuir probe, Jiang et al. [10] proposed a new iterative algorithm based on an equivalent-circuit model, but the procedure is relatively tedious, and it is difficult to extend its application because it depends on the plasma characteristics.
In the plasma field, researchers have begun to use neural networks to realize intelligent machine diagnosis of plasma. Kawaguchi et al. [18] utilized machine learning to solve the electron Boltzmann equation and obtain the electron velocity distribution function (EVDF) in weakly ionized plasmas. Compared with plasma diagnosis, this network tends to solve a mathematical problem. Churchill et al. [19] utilized convolutional neural networks for tokamak disruption prediction, which is a popular problem in tokamak devices. Also in tokamaks, Guo et al. [20] trained a long short-term memory (LSTM) model on a large disruption warning database to predict disruptions. Strictly speaking, these two networks are aimed at disruption prediction rather than plasma diagnosis, although the tokamak is an application in the field of plasma. In the diagnosis of dusty plasma, Ding et al. [21,22] used a multilayer perceptron (MLP), took the air pressure and voltage or air pressure and current of the discharge device as its input, and trained the network to predict N e or T e . However, a network trained in this way depends on the characteristics of the device and is difficult to apply to other discharge devices or aerospace environments. There are many gas discharge schemes that can produce plasma, and what we want is a more universal method of plasma diagnosis. Thus, it is necessary to diagnose the plasma without the parameters of the discharge device, which requires the network to be constructed from the data collected by Langmuir probes. Such a method can be applied to space plasma diagnosis in the future.

Traditional Diagnostic Theory and Problems
A Langmuir probe is a small electrode inserted into a plasma. When a scan voltage is applied to it, with the change in voltage, the electrode will absorb electrons or ions, causing current flow and forming an I-V curve [23].
In a nondrifting, collisionless, and nonmagnetized plasma, the representative I-V characteristic curve of cylindrical probes is shown in Figure 1. The curve is divided into three regions: ion saturation, electron retardation, and electron saturation. The dividing points are the floating potential V f and the plasma potential V p . According to orbital-motion-limited (OML) theory [8], the electron current (I e ), ion current (I i ), and Langmuir probe current (I LP ) collected by cylindrical, planar, and spherical probes are as follows:

I e = C I e0 exp(e(V B − V p )/(k B T e )), V B < V p (1)

I e = C I e0 [1 + e(V B − V p )/(k B T e )]^β, V B ≥ V p (2)

I i = −C I i0 [1 − e(V B − V p )/(k B T i )]^β (3)

I e0 = N e Ae√(k B T e /(2πm e )), I i0 = N i Ae√(k B T i /(2πm i )) (4)

I LP = I e + I i (5)

where N i is the ion density, A is the surface area of the probe, e is the electron charge, k B is Boltzmann's constant, T i is the ion temperature, m e is the electron mass, m i is the ion mass, and V B is the voltage applied to the probe. When the probe is planar, cylindrical, or spherical, the corresponding β values are 0, 0.5, and 1, respectively. The critical algorithm in Langmuir probe data processing is the fitting and inversion of the I-V characteristic curve: determine V f and V p , obtain the ion saturation current and correct I e , fit the logarithm of the electron retardation curve to derive T e , and obtain N e from the value of I e at V p . The detailed process is as follows:

1. Determine V f and V p . V f is the point where the current of the I-V characteristic curve is 0; at this point, I e and I i are equal in magnitude and opposite in direction. V p is the potential of the plasma relative to the environment, which is the inflexion point of the I-V characteristic curve, that is, the dividing point between the electron retardation and electron saturation regions.

2. Obtain the saturated ion current at V B < V f − 4k B T e /e. Theoretically, the contribution of I e to I LP at this point is less than 1% and can be ignored. Then, I e is derived by subtracting the ion saturation current from I LP .

3. T e is derived by logarithmic fitting of a section of the electron retardation curve. It can be seen from Equation (1) that I e depends exponentially on V B in the electron retardation region. Taking the logarithm of Equation (1) and simplifying yields

ln(I e ) = (e/(k B T e ))(V B − V p ) + ln(C I e0 ) (6)

where C = 2/√π and I e0 = N e Ae√(k B T e /(2πm e )). We can obtain T e from the slope of Equation (6).

4. Derive N e from Equation (7). When V B = V p , I e = I e0 , so substituting the calculated T e gives

N e = I e0 /(Ae√(k B T e /(2πm e ))) (7)
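As a concrete illustration, the four steps above can be sketched in Python. This is a simplified, idealized implementation of the standard OML procedure, not the paper's exact algorithm: the function name, the crude plateau average used for the ion saturation current, and the fixed 2 V margin (used because T e is not yet known at step 2) are our own simplifications.

```python
import numpy as np

# Physical constants (SI)
e = 1.602e-19      # elementary charge, C
k_B = 1.381e-23    # Boltzmann constant, J/K
m_e = 9.109e-31    # electron mass, kg

def diagnose_iv(v, i, probe_area):
    """Idealized sketch of the four-step OML diagnosis described above.
    v: applied voltage (V), i: probe current (A), probe_area: m^2.
    Returns (T_e in K, N_e in m^-3)."""
    # Step 1: V_f is the zero-crossing of the current; V_p is the
    # inflexion point, taken here as the maximum of dI/dV.
    v_f = v[np.argmin(np.abs(i))]
    v_p = v[np.argmax(np.gradient(i, v))]

    # Step 2: estimate the ion saturation current from the deep ion
    # region (a crude plateau average, since T_e is not yet known),
    # then remove it to obtain the electron current.
    i_i_sat = np.mean(i[v < v_f - 2.0])
    i_e = i - i_i_sat

    # Step 3: T_e from the slope of ln(I_e) vs. V in the electron
    # retardation region (Eq. (6)): slope = e / (k_B * T_e).
    mask = (v > v_f) & (v < v_p) & (i_e > 0)
    slope = np.polyfit(v[mask], np.log(i_e[mask]), 1)[0]
    t_e = e / (k_B * slope)

    # Step 4: N_e from I_e at V_p via I_e0 = N_e*A*e*sqrt(k_B*T_e/(2*pi*m_e)).
    i_e0 = np.interp(v_p, v, i_e)
    n_e = i_e0 / (probe_area * e * np.sqrt(k_B * t_e / (2 * np.pi * m_e)))
    return t_e, n_e
```

On a clean, sharply saturating synthetic curve this recovers T e and N e to within a few percent; on real contaminated or underdense curves, it fails in exactly the ways discussed below.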
Unlike the representative I-V characteristic curve, the influence of the contaminated layer on the probe surface and the edge effect or sheath of the low-density plasma will make the collected I-V characteristic curve deviate from the standard form, resulting in the failure of the traditional diagnosis method. The influence of the two factors on the diagnosis method is discussed below.

Contaminated Layer on the Probe Surface
Previous studies have shown that the contaminated layer will change the uniform potential distribution on the probe surface and skew the collected data, resulting in the wrong plasma parameters [2,10,11,16]. In addition, the adsorption process of neutral gas molecules to the probe in the atmosphere can be completed in as short as 1 s. Therefore, the data collected by the probe are affected to varying degrees by the contaminated layer. When the I-V characteristic curve is collected by the probe with serious contamination, the upward curve and downward curve cannot coincide, as shown by the red line in Figure 2.  Figure 2 shows the comparison of the curves collected by the clean probe (P clean ) and the contaminated probe (P cont ) in the same ambient plasma. P cont is a probe that has been exposed to the atmosphere. P clean is a probe that has been bombarded by ions with a voltage of −200 V.
This set of data comes from our previous experiments and is collected by the B2912A precision Source/Measure Unit (SMU) of KEYSIGHT [24]. The source and measurement resolution of this SMU can be as low as 10 fA and 100 nV [24]. All experimental data in this study come from this SMU.
In fact, because the experimental device cannot always maintain a high vacuum, there are inevitably gas molecules and water molecules in the cabin. Therefore, before each experiment, we need to clean the probe; otherwise, the curve is as shown in Figure 2.
We use the clean data and contaminated data shown in Figure 2 for plasma diagnosis; the comparison of plasma parameters is shown in Table 1.
Table 1. Comparison of plasma parameters from P clean and P cont .
The upward and downward curves of P clean basically coincide, and the calculated N e and T e errors are within 2%. However, the errors of N e and T e obtained from the upward and downward curves of P cont are 11% and 80%, respectively. More importantly, in the same ambient plasma, the N e obtained by the clean probe and the contaminated probe can differ by as much as a factor of two, and the T e error also exceeds 15%.
When using the contaminated curve shown in Figure 2 for plasma diagnosis, it will face the phenomenon that the inflexion points of the upward curve and the downward curve are inconsistent. This situation is caused by the contaminants of the probe rather than the change in the plasma, which makes the plasma parameters obtained by diagnosis inevitably have errors.

Underdense Plasma Diagnosis
Due to the relatively small surface area of the cylindrical Langmuir probe, when the density of the ambient plasma is low, the probe collects fewer electrons and ions, and the probe current is very weak, resulting in a low signal-to-noise ratio (SNR) of the collected signal, which increases the difficulty of the data fitting process.
In addition, the plasma with lower density has a larger Debye length and sheath width. As the voltage increases, the electron sheath around the probe gradually expands. The OML theory assumes that all electrons entering the probe sheath will be absorbed by the probe [8]. Due to the growth of the sheath, more electrons enter the sheath and the current collected by the probe increases, and the I-V characteristic curve has no obvious saturation point; that is, it is difficult to obtain the correct inflexion point from dI LP /dV B data, as shown in Figure 3. This I-V characteristic curve comes from the plasma of N e = 2.38 × 10 11 m −3 and T e = 0.3 eV.
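For a sense of scale, the Debye length λ D = √(ε 0 k B T e /(N e e²)) can be evaluated directly. The short Python sketch below is our own illustration (not part of the experiment); it shows that the quoted underdense case has a Debye length of several millimetres, comparable to typical probe dimensions, which is why the sheath grows and the curve never saturates cleanly.

```python
import numpy as np

eps0 = 8.854e-12   # vacuum permittivity, F/m
e = 1.602e-19      # elementary charge, C

def debye_length(n_e, t_e_ev):
    """Debye length lambda_D = sqrt(eps0*k_B*T_e / (n_e*e^2)).
    With T_e given in eV, k_B*T_e = e*T_e_eV, so this reduces to
    sqrt(eps0 * T_e_eV / (n_e * e))."""
    return np.sqrt(eps0 * t_e_ev / (n_e * e))

# Underdense case of Figure 3: Ne = 2.38e11 m^-3, Te = 0.3 eV
# -> lambda_D on the order of millimetres, so the sheath around a
# thin cylindrical probe keeps expanding with voltage.
```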
For this I-V characteristic curve, a computer program cannot perform the diagnosis automatically; experienced researchers must manually select the inflexion point and the range of the electron retardation region before the diagnosis can proceed. However, this selection process is highly arbitrary, resulting in significant errors in the diagnosis results. Spatial inhomogeneities in the probe's contact potential with the plasma and in the collected current further distort the probe's I-V characteristic. To address the problems exposed by traditional diagnosis methods, we apply a machine learning method.
In this paper, we first select a suitable network structure, then obtain a large amount of data for model training, and preliminarily adjust the parameters of the model. Then, the model is trained iteratively, and the accuracy of the network prediction results is continuously tested. In order to make the network use the contaminated data to obtain relatively accurate results, it is also necessary to optimize the network parameters. Eventually, the network should have the following characteristics:

1. N e and T e can be obtained from a relatively rough I-V characteristic curve;
2. Plasma diagnosis can be realized using data collected by a Langmuir probe with a certain degree of contamination;
3. Low-temperature and low-density plasma diagnosis can be realized.

Principle of LSTM
It is a standard regression problem to predict N e and T e from the I-V data collected by Langmuir probes, and many neural networks can achieve this function. Ding et al. [21,22] built an MLP network and predicted N e or T e using the state parameters of the discharge device. The MLP is a lightweight neural network that is easy to use. However, the data collected by the probe may be affected by the coupling of multiple factors; for example, different curves are affected to varying degrees by the contaminated layer on the probe's surface, so the network should have the memory ability to correct the output value. The MLP is not suited to such tasks because of its internal structure (one-way propagation from the input layer through the hidden layers to the output layer). LSTM is a kind of recurrent neural network (RNN) with a special structure. Compared with the traditional RNN, the hidden-layer unit in LSTM is a linear self-recurrent memory block containing three gate structures, which allows the gradient to pass through a long sequence, alleviates the vanishing gradient problem, and overcomes the shortcomings of the RNN model [25]. The basic structural unit of the LSTM network is shown in Figure 4. A cell of the LSTM network is mainly composed of three parts: a forget gate, an input gate, and an output gate. The input data of the LSTM are x t , h t−1 , and C t−1 , and the output data are h t , y t , and C t , where h t = y t . The main function of the forget gate is to filter the data of the previous state, determining how much old state information is retained. It is calculated as

f t = σ(W f ·[h t−1 , x t ] + b f )

where f t denotes the forgetting threshold at time t, σ is the sigmoid activation function, W f is the weight, h t−1 is the output value at time t − 1, x t is the input value, and b f is the bias term.
The input gate records the information to be saved in the current state. It consists of two parts: a sigmoid layer that updates the value and a tanh layer that generates a candidate state value C̃ t . The outputs of the two layers are

i t = σ(W i ·[h t−1 , x t ] + b i ), C̃ t = tanh(W c ·[h t−1 , x t ] + b c )

where i t is the input threshold at time t, W i and W c are the weights, and b i and b c are bias terms. The state at time t is then updated as

C t = f t ⊙ C t−1 + i t ⊙ C̃ t

The main function of the output gate is to calculate the output value and produce the corresponding prediction:

o t = σ(W h ·[h t−1 , x t ] + b h ), h t = o t ⊙ tanh(C t )

where o t is the output threshold at time t, h t is the output value of the cell at time t, W h is the weight, and b h is the bias term.
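The gate computations can be checked with a minimal NumPy forward pass of a single LSTM cell. This is an illustrative sketch only: the weight layout acting on the concatenation [h t−1 , x t ] is one common convention, and all names are our own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_h, b_h):
    """One LSTM time step implementing the forget/input/output gates.
    Each weight matrix W_* acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: keep old state?
    i_t = sigmoid(W_i @ z + b_i)          # input gate: admit new info?
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate state value
    c_t = f_t * c_prev + i_t * c_tilde    # cell state update
    o_t = sigmoid(W_h @ z + b_h)          # output gate
    h_t = o_t * np.tanh(c_t)              # h_t = y_t
    return h_t, c_t
```

Because h t = o t · tanh(C t ) with o t ∈ (0, 1), the cell output is always bounded in (−1, 1), which is part of what stabilizes gradients over long sequences.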
In this paper, LSTM is selected to predict N e and T e . In order to better mine the deep internal relationships in the data, a two-layer LSTM network is used, but this easily causes over-fitting. To avoid over-fitting, a random deactivation (dropout) layer is added after each LSTM layer. Finally, a fully connected layer with a linear activation function is added so that the network output is consistent with the label value. The output of the network is compared with the actual value, and the Adam optimizer is used to update the parameters. The full flowchart of the LSTM model is illustrated in Figure 5.
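A minimal Keras sketch of this architecture might look as follows. It is an assumption-laden illustration, not the authors' exact code: the dropout rate is a placeholder of ours, while the layer sizes (200/200/50), the 21-point current input, and the 1 × 10⁻⁵ learning rate follow the values reported later in this paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(seq_len=21, n_l1=200, n_l2=200, n_f=50, lr=1e-5, drop=0.2):
    """Two LSTM layers, each followed by a dropout ("random deactivation")
    layer, then a linear fully connected head; trained with Adam on MSE."""
    model = keras.Sequential([
        layers.Input(shape=(seq_len, 1)),       # 21 current samples per curve
        layers.LSTM(n_l1, return_sequences=True),
        layers.Dropout(drop),
        layers.LSTM(n_l2),
        layers.Dropout(drop),
        layers.Dense(n_f, activation="linear"), # fully connected layer
        layers.Dense(1, activation="linear"),   # predicted Ne or Te
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="mse")
    return model
```

Separate models of this shape would be trained for N e and T e , matching the paper's observation that the two quantities are predicted independently.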

Evaluation Indicators
During model training, the accuracy is measured by the loss function. This model adopts the mean square error (MSE) loss function, which is often used in regression models to measure the loss between the predicted and actual values. The MSE loss is calculated as

MSE = (1/n) Σ i=1..n (y i ' − y i )²

where y i ' is the predicted value, y i is the actual value, and n is the number of samples. After training, in order to better evaluate the performance of the model, this paper mainly uses the mean absolute percentage error (MAPE) as the evaluation index, which can be written as

MAPE = (1/n) Σ i=1..n |y i ' − y i |/y i

To reflect the prediction performance on the verification set more intuitively during training, the average accuracy Acc of a batch of data can be calculated from MAPE as

Acc = 1 − MAPE
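The three indicators are straightforward to compute; a small Python sketch with our own helper names (MAPE expressed here as a fraction, so that Acc = 1 − MAPE) is:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error between predicted and actual values."""
    return np.mean((y_pred - y_true) ** 2)

def mape(y_pred, y_true):
    """Mean absolute percentage error, as a fraction (0.1 == 10%)."""
    return np.mean(np.abs((y_pred - y_true) / y_true))

def acc(y_pred, y_true):
    """Average accuracy of a batch: Acc = 1 - MAPE."""
    return 1.0 - mape(y_pred, y_true)
```

For example, predictions that are each 10% off their targets give MAPE = 0.1 and Acc = 0.9.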

Experimental Setup and Results
The experimental preparation, data acquisition, and analysis of prediction results are described in this section.

Experiment Setup and Steps
The experiment was carried out in a plasma vacuum chamber. The vacuum chamber is a stainless-steel container with a length of 1 m and a diameter of 0.8 m, which can maintain a vacuum of 10 −5 Pa. The plasma source adopts DC glow discharge. During discharge, argon is fed in to form an argon plasma environment with a gradient density distribution in the chamber. The experimental setup is shown in Figure 6. In order to obtain probe data over a large-scale continuous distribution of plasma density, the Langmuir probe is driven by a two-dimensional motor platform installed in the chamber for omnidirectional acquisition. The motor control system moves the probe in the X or Y direction and triggers the source measure unit to collect an I-V characteristic curve every 10 mm of travel.
Under the same discharge environment (the same filament current and pressure), the plasma density in the chamber remains roughly within the same order of magnitude but shows a different density distribution at different distances from the plasma source. When the plasma density in the chamber needs to be adjusted substantially, this is done by adjusting the current of the discharge filament.
In order to verify that the neural network can reduce the impact of probe surface contamination on the diagnosis results to a certain extent, without changing the discharge environment (pressure, filament current, etc.), we successively use the probe with the contaminated layer (P cont ) and clean probe (P clean ) to collect the I-V characteristic curve at the same position and use the traditional diagnosis method and LSTM network to diagnose the two groups of data to compare the diagnosis results. The specific steps of the comparative experiment are as follows:

1. Expose two Langmuir probes of identical material and specifications (P cont and P clean ) to the humid atmosphere for more than 24 h;
2. Install P cont and P clean on the two-dimensional platform in the vacuum chamber and record their distances from the central position;
3. Heat the filament and feed in argon until the discharge process reaches a steady state. Apply −200 V to P clean for ten minutes, removing the contaminated layer through heating and by attracting ions to bombard the probe surface;
4. Control the two-dimensional platform to move the two probes to the central position to collect the I-V characteristic curves, ensuring that the time interval between the two acquisitions is within 1 min;
5. Diagnose and analyze the two groups of collected data with the traditional diagnosis method and the LSTM network, respectively, and compare the results.

Data Preprocessing
In order to obtain data over a large distribution range of N e and T e , the Langmuir probe is driven by the two-dimensional motor platform to scan the chamber under filament discharge currents of 70 A, 80 A, and 85 A. Under each discharge condition, 2000 groups of I-V characteristic curves are collected. The scan voltage range of each curve is −10 to 10 V with a sampling interval of 0.1 V, so each group of data consists of the 201 current values collected by the probe at the corresponding voltages. The whole set of 201 current values is used for the traditional diagnosis, but only 21 values (at 1 V intervals) are input into the LSTM network for training or prediction. Note that only the current values are fed into the network, as the voltage values are fixed. When the probe runs near the bulkhead of the vacuum chamber, it enters the wall sheath, which distorts the collected data; therefore, the dataset is screened. The total number of groups available after filtering is 5186.
In addition, because the scales of the different data dimensions differ greatly, the data are normalized to eliminate the difference in parameter dimensions. The method adopted is max-min normalization, which linearly scales each group of data to [0,1], as shown in Equation (17):

x' = (x − min(x))/(max(x) − min(x)) (17)

where x' is the normalized value, and max(x) and min(x) are the maximum and minimum of the dimension in which the data are located, respectively. The dataset is divided into three parts, a training set, a verification set, and a test set, in the ratio 3:1:1. The training set and verification set are used to train and optimize the model and prevent overfitting, and the test set is used to test the generalization ability of the model.
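The preprocessing pipeline, down-sampling each 201-point curve to 21 points, applying Equation (17) per group, and splitting 3:1:1, can be sketched as follows (an illustrative implementation with our own function names):

```python
import numpy as np

def preprocess(curves):
    """Down-sample each 201-point curve (0.1 V grid) to 21 points
    (1 V grid) and apply per-group max-min normalization, Eq. (17)."""
    x = np.asarray(curves)[:, ::10]          # indices 0, 10, ..., 200
    x_min = x.min(axis=1, keepdims=True)
    x_max = x.max(axis=1, keepdims=True)
    return (x - x_min) / (x_max - x_min)

def split_3_1_1(x, y, seed=0):
    """Shuffle and split into training/verification/test sets at 3:1:1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n = len(x) // 5
    test, val, train = idx[:n], idx[n:2 * n], idx[2 * n:]
    return (x[train], y[train]), (x[val], y[val]), (x[test], y[test])
```

Normalizing per group means the network sees only the shape of each I-V curve, not its absolute current scale, which is consistent with the fixed-voltage, current-only input described above.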

Network Parameter Setting
There are many adjustable parameters in a neural network, such as the number of neurons in each layer and the learning rate (η). By adjusting these parameters, the prediction accuracy of the model can be effectively improved. In this paper, N e is mainly used to adjust the parameters and seek the optimal structure.
The number and proportion of neuron nodes in each LSTM layer and in the fully connected layer determine the structure of the network and its prediction results. To determine the optimal number and proportion of nodes, we proportionally extract 1000 groups of data from the filament currents of 70 A, 80 A, and 85 A for training iterations and test the accuracy of the network's prediction of N e , as shown in Figure 7. In Figure 7, n 0 is the node base; n l1 , n l2 , and n f are the numbers of nodes in the first LSTM layer, the second LSTM layer, and the fully connected layer, respectively; and the legend reads n 0 Rn l1 /n l2 /n f . For example, 20R10/5/1 denotes n 0 = 20, n l1 = 10 × n 0 = 200, n l2 = 5 × n 0 = 100, and n f = n 0 = 20. The networks of the different structures show great differences in the early stage. Among them, the network with the 50R4/4/1 structure achieves higher accuracy faster, so the final network structure is n l1 = 200, n l2 = 200, and n f = 50.
The learning rate is an essential parameter of the Adam optimizer, determining whether and when the objective function converges to the minimum. Too large a learning rate may cause the objective function to oscillate and fail to converge, while too small a learning rate slows convergence. Therefore, choosing an appropriate learning rate is also crucial to improving the model's accuracy. The learning rate is set to 0.0001, 0.00005, 0.00003, 0.00001, and 0.000005, and the corresponding results are shown in Table 2. When the learning rate is 0.00001, the model shows the highest prediction accuracy; when the learning rate increases or decreases from this value, the accuracy drops. The learning rate is therefore set to 0.00001.

Test Results and Analysis
The whole dataset includes 5186 groups of experimental data, in which N e ranges from 10 12 to 10 14 m −3 and T e ranges from 19,000 to 60,000 K. It is divided into 4150 training groups (of which 1/4 form the verification set) and 1036 test groups. The N e and T e distributions of the dataset are shown in Figure 8. When the filament current (I filament ) is 80 A and 85 A, the maximum N e is 1.75 × 10 14 and 3.0 × 10 14 m −3 , respectively, and T e is mostly concentrated at 19,000-26,000 K. When I filament is 70 A, the data distribution differs from the other cases for two main reasons. First, in order to make the T e distribution range more extensive, we adjust the voltage of the accelerating grid when I filament is 70 A, so that the electrons emitted by the filament have different kinetic energies and thus produce different T e ; compared with the case of higher plasma density, it is easier to adjust T e at low density. Second, when I filament is 70 A, the collected I-V characteristic curves hardly reach saturation, so the results derived by the traditional diagnosis method are not accurate enough. This is discussed in detail in the analysis of the T e prediction results.
The test set is extracted proportionally from the I filament = 70 A, 80 A, and 85 A data. The training and prediction results of N e are shown in Figure 9. The network's learning behavior for N e is shown in Figure 9a. As the number of iterations increases, the loss decreases rapidly and converges; only about 50 iterations are needed for the loss to drop to the order of 10 −4 , at which point the model achieves a good prediction effect. After each training iteration, the prediction ability on the verification set is tested. As shown in Figure 9b, Acc has a local maximum before the tenth iteration and gradually rises and stabilizes after the tenth iteration; the network reaches an accuracy of 95% after 50 iterations. After completing 100 iterations of training and verification, 1036 groups of data from the three conditions are used to test the model, as shown in Figure 9c. Compared with the low-density plasma (10 12 m −3 ), the network has smaller prediction errors for N e at high density. This is not entirely due to the network: because the current collected by the probe at low density (I filament = 70 A) is weak, the SNR of the collected signal is reduced, and the I-V characteristic curve is easily affected by the instrument. The resulting errors in the collected data are irregular, which makes it difficult for the network to converge when training on these data and to obtain accurate predictions. In addition, the data at low density are more vulnerable to sheath and edge charge effects.
The training and prediction results for T e are shown in Figure 10. Training the LSTM network on T e is more difficult than on N e . At 50 iterations, the loss value reaches 0.01; after that, the loss decreases very slowly, reaching 0.005 after about 500 iterations. For the verification set, as in the training process of N e , Acc reaches its maximum after experiencing a local maximum, at about 25 iterations. After that, Acc gradually decreases and stabilizes at 0.9 after about 200 iterations. Figure 10c shows that the adverse effects of instrument acquisition accuracy and of the sheath and edge effects on the data are more significant at low density. However, the main reason for the poor prediction accuracy in the underdense plasma environment is that the T e values calculated by the traditional diagnostic method are relatively inaccurate. As detailed in Section 2, obtaining T e requires logarithmic fitting of the electron retardation region of the I-V characteristic curve. Theoretically, I e in the electron retardation region increases exponentially with V B , so its logarithm is linear. However, in underdense plasma, the electron retardation region of the I-V characteristic curve is relatively wide and hardly reaches saturation; the raw I-V data in the retardation region are closer to linear than exponential (as shown in Figure 3). The retardation curve therefore presents a nonlinear shape after the logarithmic operation, and its slope changes significantly. Which section is selected for linear fitting then has a large influence on T e , so the actual-value curve of T e appears very unsmooth at low density.
Theoretically, because the probe is continuously collected in space, the value of T e will not change suddenly, but the limitation of traditional diagnosis methods determines that the calculation result of T e has a significant error. Therefore, it is difficult to use such data to train and predict the network.

Effect of Eliminating Contamination
We also test the prediction results of the trained network on data collected by the contaminated probe (P cont ). A total of seven groups of I-V characteristic curves collected by P cont and P clean are used as raw data to diagnose the plasma using the traditional diagnosis method and the LSTM network, respectively. The comparison results and errors of N e and T e are shown in Figure 11 and Table 3, respectively. As shown in Figure 11a, for N e , the results obtained by the traditional diagnosis method differ significantly depending on whether the data collected by P cont or P clean are used as the source data. Table 3 shows that the average absolute error of the seven groups of data reaches 40.33%. The reason is obvious: the ability of a probe with a contaminated layer to absorb electrons is greatly weakened, reducing the collected electron saturation current and hence the calculated N e . The diagnosis results obtained using the LSTM network with the data collected by P cont as input are shown in the curve marked "LSTM" in Figure 11. It can be seen that the machine learning method can still obtain relatively accurate results even when there is a certain contaminated layer on the probe surface: the average absolute error of N e is reduced from 40.33% to 10.69%. In other words, the LSTM plasma diagnosis network partially compensates for the contamination of the probe surface. Figure 11b shows the diagnostic results of T e . As with N e , the LSTM network has a certain compensation effect for the probe surface contamination, reducing the average absolute error from 14.81% to 5.05%.
Unlike N e , when the data collected by P cont are used as the source data, the T e values derived by the traditional diagnostic methods are larger than the actual value, because the contaminated layer may change the work function of the probe material itself. In addition, when using the traditional method for plasma diagnosis, we first calculate T e and then combine it with the electron saturation current to derive N e . From Equation (7), it can be seen that the underestimated N e is also partly due to the overestimated T e . Compared with N e , the LSTM results for T e are less consistent. For example, in groups 1, 4, and 7, the LSTM network diagnosis results are close to the actual value, but in group 6, although the error caused by the contaminated layer is partially compensated, a 14.54% error with respect to the actual value remains. The third group of data is unusual: the LSTM prediction does not reduce the error but changes it from 5.70% to −9.98%. Addressing this problem seems to require training the network with more T e data covering a wider range.
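Equation (7) is not reproduced in this excerpt; the sketch below assumes the standard planar-probe thermal-flux relation, I_es = e A N_e sqrt(k T_e / (2π m_e)), to illustrate why an overestimated T e depresses the derived N e (N_e scales as 1/sqrt(T_e)). The probe area and saturation current are hypothetical:

```python
import math

# Physical constants (SI)
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K (unused here; T_e given in eV)
M_E = 9.1093837015e-31       # electron mass, kg

def ne_from_saturation_current(i_es, area, te_ev):
    """Electron density from the electron saturation current, assuming the
    planar-probe thermal-flux form I_es = e*A*N_e*sqrt(k*T_e/(2*pi*m_e)).
    te_ev is the electron temperature in eV."""
    te_joule = te_ev * E_CHARGE
    flux_speed = math.sqrt(te_joule / (2.0 * math.pi * M_E))
    return i_es / (E_CHARGE * area * flux_speed)

# Hypothetical probe: 1 mm^2 collecting area, 10 uA electron saturation current
area = 1.0e-6
i_es = 10.0e-6

ne_true = ne_from_saturation_current(i_es, area, te_ev=2.0)
ne_with_overestimated_te = ne_from_saturation_current(i_es, area, te_ev=3.0)

# N_e ~ 1/sqrt(T_e): a 50% overestimate of T_e lowers the derived N_e by ~18%
print(ne_with_overestimated_te / ne_true)  # sqrt(2/3) ≈ 0.816
```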
Another advantage of plasma diagnosis using the LSTM network is that there is no coupling between the diagnoses of N e and T e ; that is, they are obtained independently. This avoids a significant error in T e propagating into the result for N e , which is common in traditional diagnosis methods.
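This decoupling can be illustrated with a minimal numpy sketch of an LSTM layer followed by two independent linear output heads. The layer sizes, random weights, and single-layer architecture here are illustrative assumptions, not the configuration reported in this paper; the point is only that the N e and T e predictions are read out separately, so neither feeds into the other:

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_forward(x_seq, W, U, b, h0, c0):
    """Single-layer LSTM forward pass over a sequence (gate order: i, f, g, o)."""
    h, c = h0, c0
    hidden = h0.shape[0]
    for x in x_seq:
        z = W @ x + U @ h + b
        i = 1.0 / (1.0 + np.exp(-z[:hidden]))          # input gate
        f = 1.0 / (1.0 + np.exp(-z[hidden:2*hidden]))  # forget gate
        g = np.tanh(z[2*hidden:3*hidden])              # candidate cell state
        o = 1.0 / (1.0 + np.exp(-z[3*hidden:]))        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h  # final hidden state summarizing the sweep

seq_len, in_dim, hidden = 20, 1, 8          # e.g., 20 sampled points per I-V sweep
x_seq = rng.normal(size=(seq_len, in_dim))  # stand-in for a normalized I-V curve
W = rng.normal(scale=0.1, size=(4*hidden, in_dim))
U = rng.normal(scale=0.1, size=(4*hidden, hidden))
b = np.zeros(4*hidden)

h = lstm_forward(x_seq, W, U, b, np.zeros(hidden), np.zeros(hidden))

# Two independent linear heads: an error in one output cannot propagate to the other
w_ne, w_te = rng.normal(size=hidden), rng.normal(size=hidden)
ne_pred = w_ne @ h
te_pred = w_te @ h
print(ne_pred, te_pred)
```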

Conclusions
In this paper, we use a machine learning method to train an LSTM network on Langmuir probe data to diagnose plasma. This network is independent of the gas discharge device and therefore has broader applicability. Compared with the traditional diagnosis method based on OML theory, the LSTM network needs only one-tenth of the input data points, significantly reduces the error caused by "human factors" in the traditional diagnosis method, and partially compensates for contamination of the probe surface. The training and test data of the network come from experiments in a plasma vacuum chamber. On the one hand, experiments covering an extensive density range show that, after about 50 to 200 iterations, the network achieves more than 95% prediction accuracy for N e and more than 90% prediction accuracy for T e . On the other hand, separately designed contrast experiments show that, compared with the traditional diagnosis method, the LSTM network reduces the electron density error caused by contamination from 40.33% to 10.69% and the electron temperature error from 14.81% to 5.05%. That is, the LSTM network can obtain relatively accurate results even from data collected by a probe whose surface is partially contaminated.
This network is lightweight and requires less input data than traditional diagnosis methods. In the future, it can be carried on board satellites for ionospheric plasma diagnosis, which would greatly reduce downlink data and improve the spatial resolution of ionospheric detection.
Although there is little research on applying machine learning methods to plasma diagnosis, machine learning is a very flexible tool and is suitable for the diagnosis of various plasmas. In our next study, we will further optimize the LSTM network proposed in this paper to improve its prediction accuracy, especially for T e . In addition, we also plan to apply machine learning methods to other plasmas, such as magnetron plasma or ionospheric plasma.

Data Availability Statement:
The data presented in this study are available from the corresponding author upon reasonable request.