A Terahertz Identification Method for Internal Interface Structures of Polymers Based on the Long Short-Term Memory Classification Network

Polymers are widely used in the power system as insulating materials and are essential to the power grid's security and stability. However, various insulation defects may occur in the polymer, which can lead to severe insulation accidents. Terahertz (THz) detection is a novel non-destructive testing (NDT) method that is able to detect the interface structures inside polymers. The large quantity of information in the THz waveform has potential for the identification of interface types, and the long short-term memory (LSTM) network is one of the most popular artificial intelligence methods for time-series data such as THz waveforms. In this paper, the LSTM classification network was used to identify the internal interfaces of the polymer from the THz pulses reflected by those interfaces. The experiment verified that it is feasible to identify and image the void interfaces and impurity interfaces in the polymer using the proposed method.


Introduction
As the scale of the power grid continues to grow, more and more insulation equipment is being used in the power grid [1]. However, during manufacturing, transportation, installation, and operation, many defects caused by human or environmental factors may occur in the insulating material [2-4]. These defects can cause continuous degradation of the insulation performance and eventually lead to insulation accidents, resulting in economic losses and human casualties [5,6]. Therefore, in practical applications, there is an urgent need for effective non-destructive testing (NDT) methods for internal defects.
The commonly used NDT methods have their limitations. For example, ultrasonic testing requires a coupling agent for contact detection and has high demands on the ability of operators [7]. Infrared detection can only be applied for detecting the defects that cause external temperature changes and is highly influenced by environmental conditions [8]. Microwave inspection is also a contacting method with very few feature parameters, and can only qualitatively determine the presence of defects [9]. X-ray is harmful to humans, and its strong penetration ability makes it easy to directly pass through the defect, resulting in missed detection [10].
Terahertz (THz) technology is an emerging NDT method in which electromagnetic waves in the THz band (0.1-10 THz) are used to achieve non-contact, safe, and efficient testing. It has been widely used in fields such as security screening [11-13], biomedicine [14-16], and aerospace [17,18]. Because THz waves can detect the internal interface structures of polymers, the technology has also been applied to the detection of power insulation defects [19-21]. However, most of the existing studies focus on detecting the location, size, and shape of the internal interface using macroscopic features of the THz waveform (such as peak value, time delay, and other parameters). Very few studies use the microscopic features of the pulses in THz waves to identify the type of the interface, which is also very meaningful for detection. The long short-term memory (LSTM) network is one of the most common artificial intelligence methods for processing time-series data [22,23]. It has been proven to be suitable for THz time-domain waveform data [24]. Therefore, the LSTM network has great potential for interface identification in polymers.
In this paper, an internal interface identification method is proposed for polymer detection, which is based on THz waves and LSTM classification networks. Three different artificial interface samples were made to collect THz waveform data to train the LSTM. Then, the trained LSTM was used to identify and image the internal interfaces in other samples to test its performance.
The rest of the paper is organized as follows: Section 2 presents the theoretical background of THz detection and LSTM networks. Section 3 describes the experimental system and samples. The experimental results and discussions are reported in Section 4. Section 5 is the conclusion.

Terahertz Detection Theory
Assuming that a THz wave is incident vertically from medium 1 to medium 2, the transmission coefficient and the reflection coefficient at the interface "1-2" can be expressed as [25]

$$t_{12}(\omega) = \frac{2 n_1(\omega)}{n_1(\omega) + n_2(\omega)} \tag{1}$$

$$r_{12}(\omega) = \frac{n_1(\omega) - n_2(\omega)}{n_1(\omega) + n_2(\omega)} \tag{2}$$

where $n_1(\omega)$ and $n_2(\omega)$ represent the complex refractive indices of medium 1 and medium 2, respectively.

Many insulating materials are polymers that are translucent to THz waves [26]. Thus, THz waves can propagate through insulating materials and be reflected by internal interfaces such as defect structures. Figure 1 shows this process. A THz pulse $E_0$ is incident into the sample. When it encounters an internal interface structure, a reflected pulse $E_{ri}$ is generated, which can be expressed as follows [27]:

$$E_{ri}(\omega) = E_0(\omega)\, p_a(\omega, d_a)\, t_{ab}(\omega)\, p_b(\omega, d_b)\, r_{bc}(\omega)\, p_b(\omega, d_b)\, t_{ba}(\omega)\, p_a(\omega, d_a) \tag{3}$$

where $\omega$ is the wave frequency. The subscripts a, b, and c represent the air, the sample material, and the internal interface, respectively. $d_a$ and $d_b$ are the propagating distances of the THz wave in the air and the sample material, respectively. The propagation coefficient $p$ represents the amplitude attenuation and phase change of the THz wave when propagating in the medium, which can be described as follows [27]:

$$p(\omega, d) = \exp\left(-\frac{i\, n(\omega)\, \omega\, d}{c}\right) \tag{4}$$

where $c$ is the vacuum speed of light (not the interface subscript c). According to Equations (1)-(4), the reflected wave $E_{ri}$ contains information about the properties of the interface structure. In turn, it is theoretically feasible to identify the interface type from the reflected wave.
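The reflection model described above can be checked numerically. The following is a minimal sketch, assuming the common normal-incidence convention t12 = 2·n1/(n1 + n2), r12 = (n1 − n2)/(n1 + n2) and a propagation coefficient of the form exp(−i·n·ω·d/c). The function names, the lossless air index of 1.0, the 10 mm air path, and the frequency-independent real PE index are illustrative assumptions, not values taken from the experiment.

```python
import cmath
import math

C = 299_792_458.0  # vacuum speed of light in m/s


def fresnel_normal(n1, n2):
    """Return (t12, r12) for normal incidence from medium 1 into medium 2."""
    t12 = 2 * n1 / (n1 + n2)
    r12 = (n1 - n2) / (n1 + n2)
    return t12, r12


def propagation(n, omega, d):
    """Propagation coefficient exp(-i * n * omega * d / c) over a distance d."""
    return cmath.exp(-1j * n * omega * d / C)


def reflected_transfer(n_a, n_b, n_c, omega, d_a, d_b):
    """Ratio E_ri / E_0 for the path air -> sample -> interface -> sample -> air."""
    t_ab, _ = fresnel_normal(n_a, n_b)
    t_ba, _ = fresnel_normal(n_b, n_a)
    _, r_bc = fresnel_normal(n_b, n_c)
    p_a = propagation(n_a, omega, d_a)
    p_b = propagation(n_b, omega, d_b)
    return p_a * t_ab * p_b * r_bc * p_b * t_ba * p_a


N_AIR = 1.0   # assumption: lossless air
N_PE = 1.53   # PE index used in the paper, treated here as real and constant

_, r_pa = fresnel_normal(N_PE, N_AIR)   # "polymer-air" interface
_, r_ap = fresnel_normal(N_AIR, N_PE)   # "air-polymer" interface
omega = 2 * math.pi * 1.0e12            # 1 THz, illustrative only
h_pa = reflected_transfer(N_AIR, N_PE, N_AIR, omega, d_a=0.01, d_b=0.004)
print(f"r_PA = {r_pa:+.4f}, r_AP = {r_ap:+.4f}, |E_ri/E_0| = {abs(h_pa):.4f}")
```

Under this convention the two surfaces of a void give reflection coefficients of equal magnitude and opposite sign, which is one example of interface-specific information carried by the reflected pulse.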

LSTM Theory
LSTM is a model structure of recurrent neural networks (RNN). In contrast with the conventional RNN, three gates are placed in each LSTM unit, including an input gate, an output gate, and a forget gate. Based on this special structure, LSTM effectively overcomes the vanishing gradient problem, which greatly limits classic RNNs [28]. A typical LSTM structure is shown in Figure 2.
As shown in Figure 2b, the forget gate output $f_t$ can be expressed as

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{5}$$

where $W_f$ represents the weight matrix of the forget gate, which is updated during training, $h_{t-1}$ is the output vector of the previous unit, $x_t$ is the input vector of the current unit, $b_f$ is the bias vector, which is also updated during training, and $\sigma$ represents the sigmoid function, which can be expressed as

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{6}$$

From Equation (6), it can be seen that each element of $f_t$ is between 0 and 1, which represents the degree of forgetting for the corresponding part of the information. Therefore, the forget gate determines which part of the information is kept and which is discarded.

The input gate determines which information will be used to update the cell state vector $C_t$. The input gate equations are:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{7}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{8}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{9}$$

where $W_i$ and $W_C$ are the weight matrices of the input gate and cell state, respectively, $b_i$ and $b_C$ are the biases of the input gate and cell state, respectively, $i_t$ is the output vector of the input gate, $\tilde{C}_t$ is the cell input activation vector, $C_t$ is the cell state vector, and $\odot$ represents the element-wise product.

Finally, the output gate determines what information will be output by the following equations:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{10}$$

$$h_t = o_t \odot \tanh(C_t) \tag{11}$$

where $o_t$ is the output vector of the output gate, $h_t$ is the output vector of the LSTM unit, and $W_o$ and $b_o$ are the weight matrix and bias of the output gate, respectively.
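The forget, input, and output gates described above can be sketched as a single LSTM time step in plain Python. This is an illustrative sketch of the standard LSTM cell, not the implementation used in the experiments: the concatenated input [h_{t-1}, x_t], the dictionary keys, and the helper names are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function; the result lies in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W and list vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new output vector h_t and cell state c_t."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]      # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]      # input gate
    c_cand = [math.tanh(a) for a in affine(params["W_C"], z, params["b_C"])]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical toy parameters: one hidden unit, one input feature, all zeros.
zero_params = {
    "W_f": [[0.0, 0.0]], "b_f": [0.0],
    "W_i": [[0.0, 0.0]], "b_i": [0.0],
    "W_C": [[0.0, 0.0]], "b_C": [0.0],
    "W_o": [[0.0, 0.0]], "b_o": [0.0],
}
h_t, c_t = lstm_step([0.3], [0.1], [2.0], zero_params)
```

With all-zero weights every gate outputs 0.5 and the candidate state is 0, so the step simply halves the previous cell state. This makes the role of the forget gate as a per-element retention factor easy to see.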

Experimental Samples
Various interfaces can be inside insulating polymers. Void interfaces are very common in power insulation equipment, which can lead to a severe degradation of insulation performance [29]. When internal defects have developed to a certain degree, partial breakdown may occur, which can result in carbonization channels [30,31]. In addition, metal interfaces can also occur in insulation equipment, such as the metal sheaths of high-voltage cables [32], as well as metal impurities that intrude into the material during manufacture and operation [33,34]. Therefore, in this paper, three artificial interface samples were made to simulate the internal interfaces of the polymers, including the void interface, the carbonized interface, and the metal interface. Polyethylene (PE) was used as the insulating polymer. The artificial interface samples used to collect training data for LSTM are shown in Figure 3.
As shown in Figure 3, the size of each PE plate was 100 mm × 100 mm × 4 mm. For the artificial void interface sample, a resin holder with a size of 100 mm × 100 mm × 2 mm, which had a void area of 80 mm × 80 mm in its center, was placed between the upper and lower PE plates. They were stacked on top of each other to simulate the upper surface ("polymer−air" interface (PA)) and the lower surface ("air−polymer" interface (AP)) of the void. To simulate metal and carbonized interfaces in the polymer, an aluminum film and a graphite film, respectively, were placed between two PE plates. Unlike the void interface, metal and carbonized interfaces cannot be penetrated by THz waves. Thus, these two artificial samples contain only the "polymer−metal" (PM) interface and the "polymer−carbonization" (PC) interface, respectively.
In addition, another PE sample was used in this paper to test the performance of the trained LSTM, and the size and shape of the sample are shown in Figure 4a. The sample was a PE block with five holes, which can simulate voids at different depths in the polymer. As shown in Figure 4b, by inserting aluminum rods and graphite rods in the holes, the metal interfaces and the carbonized interfaces in the polymer can also be simulated.

THz System
In this paper, QT-TO1000 from Quenda Technology Ltd., China, was used as the THz time domain spectroscopy system, which can emit and receive electromagnetic waves of 0.1-3.5 THz. Its maximum scanning area is up to 100 mm × 100 mm with a step length of 0.3 mm, and its scanning speed is up to 60 pixels per second. As shown in Figure 5, the whole system includes a THz probe, a motion system, and a PC. The THz probe is used to emit and receive the THz waves, and the motion platform is driven by three motors in X, Y, and Z directions, respectively. Both of them are connected to the PC, which controls the motion platform and collects the THz waves' data to realize the scanning of samples.

LSTM Training
First, waveform data were collected from artificial interface samples for the training of LSTM. The sample was placed on the motion platform. Before collecting data, the height of the platform was adjusted to make sure that the artificial interface was near the focus of the THz beam, in which case the reflected pulse amplitude would reach the maximum.

A 50 mm × 50 mm area in the center of each of the three artificial interface samples was scanned separately with a step length of 1 mm to collect the reflected waveform data of the artificial interface samples. Typical reflected pulse waveforms are shown in Figure 6.

After that, the pulse data of the artificial interfaces were extracted from the scanned waveforms. As shown in Figure 7, $t_1$ and $t_2$ are the start time and end time of the pulse, respectively, and $t_p$ is the time of the absolute maximum (positive or negative peak) of the pulse. The positions of $t_1$ and $t_2$ are determined from $t_p$ together with a random perturbation term $\Delta t$, which makes the training data more diverse and thus improves the generalization capability of the network.
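The pulse extraction step can be sketched as follows. This is a hedged illustration only: the window length of 300 samples matches the pulse length stated for the LSTM input, but the number of samples kept before the peak (`pre`), the integer shift range (`max_shift`), and the function name are hypothetical choices, since the exact window rule is not reproduced here.

```python
import random


def extract_pulse(waveform, length=300, pre=100, max_shift=0, rng=None):
    """Cut a fixed-length window around the absolute peak of a waveform.

    pre       -- hypothetical number of samples kept before the peak
    max_shift -- hypothetical bound of the random perturbation, in samples
    Returns (pulse, start_index, peak_index).
    """
    if len(waveform) < length:
        raise ValueError("waveform is shorter than the requested pulse length")
    if rng is None:
        rng = random.Random()
    # Index of the absolute maximum (positive or negative peak).
    peak = max(range(len(waveform)), key=lambda k: abs(waveform[k]))
    # Random perturbation of the window position; zero when max_shift is 0.
    shift = rng.randint(-max_shift, max_shift) if max_shift > 0 else 0
    start = peak - pre + shift
    # Keep the window inside the recorded waveform.
    start = max(0, min(start, len(waveform) - length))
    return waveform[start:start + length], start, peak


# Synthetic waveform with a single negative peak at sample 400.
demo = [0.0] * 1000
demo[400] = -5.0
pulse, start, peak = extract_pulse(demo)
```

Calling the function with `max_shift=0` corresponds to the unperturbed case, while a positive `max_shift` yields slightly different windows of the same pulse for training.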
Then, the pulse data sets of the artificial PA interface $P_{PA}(i)$ ($i = 1, 2, \ldots, 2500$), artificial AP interface $P_{AP}(i)$, artificial PM interface $P_{PM}(i)$, and artificial PC interface $P_{PC}(i)$ can be obtained.
The interfaces inside actual polymers are usually not exactly at the beam focus, so the amplitude of their reflected pulses will probably be lower. Therefore, the amplitude of the training data should also be randomly reduced to enhance the performance of the trained network. In this random reduction process, each pulse is scaled by a random reduction factor, $f_{AP}(i)$, $f_{PA}(i)$, $f_{PM}(i)$, or $f_{PC}(i)$, according to its interface type. As a pulse with too small an amplitude is hard to distinguish from the background noise, a threshold value $V_{th}$ should be set: every valid pulse must have an absolute maximum larger than $V_{th}$. In this paper, $V_{th} = 100$. As shown in Figure 6, different interfaces have different pulse amplitudes. Therefore, the random reduction factors should have different lower limits to ensure that all pulses in the training data are valid. Finally, the training set, made up of the four amplitude-reduced pulse data sets, can be obtained.

The training set was divided into two parts. One part, with 2250 pulses for each type of interface, was used to train the LSTM; the other part, the remaining 250 pulses for each type of interface, served as the validation set to test the accuracy of the trained LSTM. The LSTM network was established and trained on the training set. The input dimension of the LSTM was 300, which was the same as the length of the pulse data. The LSTM layer had a bi-directional structure with 100 hidden units. The maximum number of iterations was 1200, the learning rate was 0.01, and the gradient threshold was 1.
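The amplitude reduction and the training/validation split can be illustrated with a short sketch. The uniform distribution of the reduction factor, the small safety `margin`, and the function names are assumptions made for this example; the text only requires that each reduced pulse keeps an absolute maximum above the threshold of 100 and that 2250 pulses per interface type are used for training.

```python
import random

V_TH = 100.0  # validity threshold stated in the text


def peak_abs(pulse):
    """Absolute maximum of a pulse."""
    return max(abs(v) for v in pulse)


def reduce_amplitude(pulse, rng, v_th=V_TH, margin=1.001):
    """Scale a pulse by a random factor that keeps its peak above v_th.

    The lower limit of the factor depends on the pulse's own peak, so
    stronger pulses can be reduced more. Uniform sampling and the safety
    margin are assumptions of this sketch.
    """
    peak = peak_abs(pulse)
    if peak <= v_th:
        raise ValueError("pulse is not above the validity threshold")
    lower = min(1.0, margin * v_th / peak)
    factor = rng.uniform(lower, 1.0)
    return [factor * v for v in pulse], factor


def split_train_validation(pulses, n_train, rng):
    """Shuffle one class of pulses and split it into training and validation parts."""
    shuffled = list(pulses)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


rng = random.Random(0)
strong_pulse = [0.0, 500.0, -800.0, 200.0]   # synthetic values, not measured data
reduced, factor = reduce_amplitude(strong_pulse, rng)
train_ids, val_ids = split_train_validation(range(2500), 2250, rng)
```

Because the lower limit is computed per pulse, a class with larger reflected amplitudes automatically receives a wider range of reduction factors than a class with weaker pulses.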
The training results are shown in Figure 8. It can be seen that this LSTM has a good classification performance for the artificial interfaces.

3D Identification Imaging Test
The holes in the PE block can be imaged in 3D as follows. After the PE block sample was scanned, all the valid pulses in the scanned waveform data were extracted by the rule of Equation (13) with $\Delta t = 0$ and $V_{th} = 100$. Each pulse corresponds to a point on a certain interface. According to basic optics laws, the spatial coordinates $(x, y, z)$ of the point can be calculated from $j$ and $k$, the numbers of scanning steps in the X direction and the Y direction, respectively, the scanning step length $d$, the vacuum light speed $c_0$, and the refractive index of PE, $n_{PE}$. Depending on the composition and wave frequency, $n_{PE}$ is in the range of 1.51-1.54 [35]. In this paper, $n_{PE} = 1.53$. As the THz probe was above the sample, as shown in Figure 5, the z-axis coordinate of each point should be negative.

After that, the pulse data are classified by the trained LSTM, and each point is labeled with a certain interface class. Thus, 3D identification imaging can be realized.
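The location calculation can be sketched under a simple time-of-flight assumption. The lateral coordinates follow directly from the step counts and the step length; the depth formula below, which converts the round-trip delay between the pulse and the reflection from the sample's upper surface into a depth inside PE, is an assumption of this sketch, as are the parameter names and the picosecond/millimetre units.

```python
C0_MM_PER_PS = 0.299792458  # vacuum speed of light in mm/ps
N_PE = 1.53                 # refractive index of PE used in the paper


def point_location(j, k, t_pulse_ps, t_surface_ps, step_mm, n_pe=N_PE):
    """Return (x, y, z) in mm for a pulse found at scan position (j, k).

    t_pulse_ps   -- peak time of the interface pulse (hypothetical input)
    t_surface_ps -- peak time of the upper-surface reflection (hypothetical input)
    The factor 2 accounts for the round trip inside the polymer; z is
    negative because the probe is above the sample.
    """
    x = j * step_mm
    y = k * step_mm
    z = -C0_MM_PER_PS * (t_pulse_ps - t_surface_ps) / (2.0 * n_pe)
    return x, y, z


# Round-trip delay that a 2 mm deep interface would produce in PE.
delay_ps = 2.0 * N_PE * 2.0 / C0_MM_PER_PS
x, y, z = point_location(j=3, k=4, t_pulse_ps=delay_ps, t_surface_ps=0.0, step_mm=0.5)
```

Each extracted pulse then contributes one labeled point, so the classified points together form the 3D identification image.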
Because of attenuation during propagation in the material, deviation from the focus, and non-flat interface structures, the interfaces inside the PE block sample are more complicated and closer to an actual situation. Therefore, the PE block sample can be used to test the feasibility of 3D identification imaging with the trained LSTM under relatively realistic conditions. The 3D identification imaging results of the PE block sample with voids are shown in Figure 9.
As can be seen, the upper surfaces of the voids are identified better than the lower surfaces. Let I-V represent the five holes at depths from 2 mm to 10 mm, respectively. The numbers of points of each class in the result for the PE block sample with voids are listed in Table 1.
From Table 1, the PA interfaces are identified well at all depths. Compared with the PA interfaces, both the correct identification rates and the numbers of points of the AP interfaces are lower. However, for AP interfaces I-IV, AP points are still the most numerous class. For void V, the identification performance of the AP interface is poor. The reason could be that the interface was far out of focus, which caused strong attenuation of the THz waves. It can also be seen that the total number of points varied greatly with depth. Therefore, the depth could greatly influence the imaging performance of the voids.
The PE block samples with metal and carbonization were also imaged in the same way. The results are shown in Figure 10 and Tables 2 and 3.
As can be seen, compared with the voids, the depth does not affect the imaging performance much for the PM and PC interfaces. On the other hand, the PM and PC interfaces tend to be confused with each other, which differs from the training results. In particular, the PC interfaces are easily misidentified as PM interfaces. However, both the PM and the PC interfaces are unlikely to be identified as PA or AP interfaces. Therefore, the PM and PC interfaces can be combined into a single "polymer−impurity" (PI) class to solve the confusion problem. As both metal and carbonization are solid impurities, this method is still meaningful in practical applications. The imaging results with three classes are shown in Figure 11. After combination, the average correct identification rates for the PE block samples of metal and carbonization are up to 90.95% and 87.28%, respectively.
Figure 11. 3D identification imaging results with three classes of interface: (a) PE block sample of metal; (b) PE block sample of carbonization.
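The effect of combining the PM and PC classes on the correct identification rate can be illustrated with a small sketch. The point counts below are made-up numbers chosen only to show the calculation; they are not the values from Tables 2 and 3, and the function and mapping names are hypothetical.

```python
# Mapping from the four original classes to the three combined classes.
MERGE_IMPURITY = {"PA": "PA", "AP": "AP", "PM": "PI", "PC": "PI"}


def identification_rate(counts, true_label, merge=None):
    """Fraction of points whose predicted class matches the true class.

    counts     -- dict of predicted label -> number of points
    true_label -- label of the interface actually present
    merge      -- optional mapping applied to both predicted and true labels
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no points to evaluate")
    if merge is None:
        correct = counts.get(true_label, 0)
    else:
        target = merge[true_label]
        correct = sum(n for label, n in counts.items() if merge[label] == target)
    return correct / total


# Hypothetical prediction counts for a carbonized interface (not measured data).
example_counts = {"PA": 2, "AP": 3, "PM": 60, "PC": 35}
rate_four_class = identification_rate(example_counts, "PC")
rate_three_class = identification_rate(example_counts, "PC", merge=MERGE_IMPURITY)
```

In this made-up example most carbonized points are labeled PM, so the four-class rate is low, while the three-class rate counts both PM and PC predictions as correct impurity identifications.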

Conclusions
In this paper, a method based on THz waves and the LSTM classification network was proposed to identify the interface structures inside polymers. The LSTM first learns from the artificial interface samples; the trained LSTM is then used to identify the internal interfaces in other samples to achieve 3D identification imaging. The experimental results showed that the proposed method could identify the void interfaces well, while the metal and carbonization interfaces tended to be confused. By combining the metal and carbonization interfaces into a single impurity interface, satisfactory identification results were obtained.
