Cutting Pattern Identification for Coal Mining Shearer through Sound Signals Based on a Convolutional Neural Network

Recently, sound-based diagnosis systems have received much attention in many fields due to the advantages of their simple structure, non-contact measurement style, and low power dissipation. In order to improve the efficiency of coal production and the safety of the coal mining process, accurate information is always essential. The sound signal produced during the cutting process of a coal mining shearer contains much information useful for cutting pattern identification. In this paper, the original acoustic signal is first collected through an industrial microphone. To analyze the signal deeply, an adaptive Hilbert–Huang transform (HHT) was applied to decompose the sound into several intrinsic mode functions (IMFs) and subsequently acquire 1024 Hilbert marginal spectrum points. The 1024 time-frequency points were reorganized as a 32 × 32 feature map. Moreover, the LeNet-5 convolutional neural network (CNN), with three convolution layers and two sub-sampling layers, was used as the cutting pattern recognizer. A simulation example, with 10,000 training samples and 2000 testing samples, was conducted to prove the effectiveness of the proposed method. Finally, 1971 testing sound series were recognized accurately through the trained CNN, and the proposed method achieved an identification rate of 98.55%.


Introduction
In order to improve the utilization of coal resources, the shearer drum is controlled to move as close to the interface of the coal seam and rock as possible [1]. Since the 1960s, cutting pattern recognition, defined as determining whether the shearer drum is cutting a coal bed, a rock bed, or a coal bed mixed with gangue, has been widely researched in many coal-producing countries. Selection of the source signal is a key factor for the performance of a cutting pattern recognition system. Among available methods, γ-ray detection [2], infrared-ray detection [3], image identification [4], and vibration analysis [5] have mostly been researched in the past. However, none of them has been applied in practice on a large scale, due to their large size, contact-based measurement, and frequent maintenance requirements. Another important source signal, the cutting sound produced by the impact between the shearer drum and the coal-rock, has prompted wide attention recently. An acoustic-based system has obvious advantages due to its compact structure, non-contact measurement style, and convenient maintenance [6]. Therefore, it is widely applied in fault diagnosis [7,8], target detection [9], feature extraction [10], and so on.
Unfortunately, the original cutting signal acquired from the coal mining field is always nonlinear, nonstationary, and discontinuous. It is an exceedingly difficult problem to extract key information from such a signal. Thus, a powerful signal processing method is one of the keys to settling this tough matter. However, typical sound signal analysis approaches such as the short-time Fourier transform (STFT), wavelet transform (WT), and wavelet packet transform (WPT) are inappropriate for treating the cutting sound. Due to the strong nonlinearity and nonstationarity of the signal, the STFT is unable to play an effective role because of the Dirichlet condition and the Heisenberg uncertainty principle [11,12]. Similarly, the WT and WPT do not work on the intermittent cutting sound signal due to their fixed wavelet bases [13]. In 1998, an adaptive decomposition method, named the Hilbert-Huang transform (HHT), was proposed by Huang et al. at the National Aeronautics and Space Administration (NASA) [14]. The HHT is composed of empirical mode decomposition (EMD) and the Hilbert transform (HT) [15]. As its basis is adaptive, the HHT is not affected by the restrictions of previous approaches and has become an attractive tool in fault diagnosis [16], speech recognition [17], signal denoising [18], pattern recognition [19], forecasting [20], and so on.
After decomposing the original sound into a series of time-frequency characteristics, many researchers adopted feature extractors to reduce the dimension and eliminate redundant information. The effectiveness of feature extraction is a key factor in the success of the recognition process. The extractor should distinguish the characteristics of different classes and preserve identical features within the same class as much as possible [21,22]. On the other hand, appropriate features must first be determined as the basis of the classifier to improve the success rate. However, because the type, number, and weight of different features always need a priori knowledge, it is difficult to devise a common approach [23,24]. Moreover, designing a feature extractor structure that properly fits a specific problem is always a tedious and time-consuming task. An efficient alternative is to combine the extractor and classifier together, and the most representative method for this is deep learning. A deep learning approach describes information by simulating the human brain and is an important branch of machine learning. Typical algorithms, such as the convolutional neural network (CNN), recurrent neural network (RNN), deep belief network (DBN), and so on, contain multiple nonlinear hidden layers to conduct supervised or unsupervised feature extraction, pattern recognition, and classification [25]. A CNN is a type of feed-forward artificial neural network with shared weights and local connections [26]. As a kind of supervised algorithm, the CNN is widely applied in image detection, speech recognition, handwritten digit identification, and so on.
Inspired by the above background research, the authors of this paper aim to propose a cutting pattern recognition method for a coal mining shearer through the cutting sound. The sound is decomposed using the adaptive HHT and recognized by the deep learning CNN. The rest of this paper is organized as follows. Some related works are summarized according to recent literature in Section 2. In Section 3, the principle of HHT and the recognition process of CNN are presented. In Section 4, the combination of HHT and CNN is performed, and the cutting pattern recognition system based on HHT-CNN is elaborated. In Section 5, a simulation with 10,000 training samples and 2000 testing samples was performed to validate the effectiveness of the proposed method. Finally, some conclusions and future works are outlined in Section 6.

Literature Review
Recent research related to this paper can be divided into two aspects: the Hilbert-Huang transform and convolutional neural networks.

Hilbert-Huang Transform
In the HHT, the time series is decomposed in the time and frequency domains by integrating the EMD with the HT. Compared with FFT and WT analysis, in which a set of basis functions of constant amplitude is applied to describe each frequency component existing in the signal, the HHT scheme relies on the instantaneous frequency analysis that results from the HT of the signal [15]. Since it was established, the HHT has been widely used in fault diagnosis, speech recognition, signal denoising, pattern recognition, forecasting, and so on. In order to identify the flow regime in a gas-solid two-phase flow system, a new methodology using an artificial neural network (ANN) scheme and the HHT was proposed in Reference [27]. The electrostatic fluctuation signal was processed through the HHT to obtain the Hilbert marginal spectrum, and four characteristics extracted from the spectra were treated as the input of the ANN for recognition. In order to diagnose engine faults intelligently, Wang et al. proposed a comprehensive method based on the HHT and a support vector machine (SVM). Seven IMFs decomposed by EMD, the maximum value of the Hilbert marginal spectrum, and its corresponding frequency component were extracted as the features, and the accuracy of the system was more than 90% [28]. In Reference [19], the authors proposed an acoustic emission pattern recognition approach based on a smoothed presentation of the Hilbert spectrum to monitor the structural health of polymer composite materials. Although the HHT has been widely applied in various fields, some problems still exist, among which undesirable IMF components generated during the EMD process have received increasing attention. Undesirable IMF components, especially in the low-frequency region, contain redundant and contradictory IMFs, which decrease the recognition rate of subsequent analyses and increase the computational time [29]. In Reference [30], an improved HHT based on the correlation between the IMFs and the original signal was applied to analyze the vibration signal of a machine. An IMF confidence index was introduced to select proper components in Reference [31], and the effectiveness of the improvement was validated by the axle bearing fault diagnosis accuracy.

Convolutional Neural Network
A CNN is a kind of deep learning algorithm equipped with convolutional layers and inspired by the cat's visual cortex [26]. In the 1980s, Fukushima et al. proposed a new kind of neural network with multiple simple cells and complex cells, which is regarded as the embryonic form of the CNN [32]. Enlightened by sparse local connections, LeCun et al. designed and trained convolutional networks using back propagation and introduced the concept of weight sharing [33]. Several years later, a CNN in the real sense, named LeNet-5, was established with invariant characteristics under translation, scaling, and rotation operations [34]. As an automatic feature extractor and classifier, the CNN obtains the key information of a signal through convolution, local connection, weight sharing, and subsampling, and classifies the features into different patterns using a fully connected network. Recently, CNNs have been widely applied in handwritten digit identification, image detection, face detection, speech recognition, and so on. In Reference [35], a trainable feature extractor for handwritten digit recognition based on the LeNet-5 CNN was proposed, and a test was conducted on the Mixed National Institute of Standards and Technology (MNIST) database with an error rate of 0.54%. Niu et al. designed another hybrid CNN-SVM classifier for this task and achieved an error rate of 0.19% without rejection [36]. A novel face detection approach based on a convolutional neural architecture was designed for fast and robust face detection in Reference [37]. Chen et al. proposed a gearbox fault identification and classification method based on the FFT and a CNN. The vibration signal was first transformed using the FFT and then fed into a CNN for training. The identification accuracy indicated the dependability and applicability of the proposed scheme for diagnosing industrial reciprocating machinery [38]. In Reference [39], in order to reduce the error rate of speech recognition, a CNN was applied as the extractor and classifier. The log-energy computed directly from the mel-frequency spectral coefficients was used as the input of the network. Furthermore, the experiment, with a relative error reduction of about 6-10% compared with the Gaussian mixture model-hidden Markov model (GMM-HMM), proved the efficiency of the scheme. In Reference [40], the authors investigated CNNs for large-vocabulary distant speech recognition. The training data was collected from a single distant microphone and multiple distant microphones.

Discussion
Recently, many valuable HHT-based analysis methods and CNN recognition systems have been proposed and applied by researchers. These publications push forward the improvement of their fields greatly. However, some disadvantages still exist, listed as follows. First, the initial coefficients obtained from the HHT are too numerous and contain redundant information or undesired IMFs. Traditional recognition solutions cannot handle these problems appropriately. Therefore, various statistical characteristics, such as the energy, maximum, correlation, and so on, are extracted as features. However, different characteristics apply to different problems, which results in a certain blindness in selecting proper statistical features, since a strict selection mechanism is lacking. Second, CNN-based approaches have been successfully used in speech identification, but the speech is decomposed into mel-frequency spectral coefficients, which are not suitable for a machinery acoustic signal. Deep learning aimed at machinery sound has prompted few studies; how to process the initial sound signal, organize the input of the CNN, and design the structure of the CNN are still open questions.
In order to solve the above problems, a novel acoustic-based cutting pattern recognition method integrating the HHT and a CNN is introduced in this paper. An industrial microphone was installed to record the cutting sound signal. The initial signal was first decomposed using the HHT to obtain a Hilbert marginal spectrum. Then a CNN was used to extract key information and classify it into different cutting patterns. In order to prove the validity and superiority of the proposed scheme, some simulations and an industrial field application were organized and conducted.

Hilbert-Huang Transform
The HHT is composed of EMD and a Hilbert transform. The original signal is first decomposed using EMD to obtain a series of intrinsic mode functions (IMFs). Then the Hilbert transform is performed to obtain the time-frequency characteristics of the signal. According to Huang's theory, any one-dimensional time series can be decomposed into several IMFs and a remainder, where an arbitrary IMF must obey two constraints:
1) The numbers of extreme points and zero crossings must be equal, or differ by at most one.
2) The mean value of the upper and lower envelopes, constructed from the local maxima and minima, equals zero at any point.
According to the conditions of an IMF, the procedure of EMD is given as follows:
Step 1: For an arbitrary signal X(t), all extreme points are searched first. The upper and lower envelopes are constructed by connecting all the maximal points and all the minimal points with cubic splines, respectively. The average of the two envelopes is labelled $m_1$; then $m_1$ is subtracted from X(t) to obtain a remainder $h_1$:

$h_1(t) = X(t) - m_1(t)$

If $h_1$ obeys the two constraints of an IMF, then $h_1$, defined as $C_1$, is the first IMF of X(t). Otherwise, $h_1$ is regarded as X(t), and the above step is repeated. In general, $C_1$ contains the highest-frequency component of X(t).
Step 2: Extract $C_1$ from X(t); the remainder component $r_1$ can be described as:

$r_1(t) = X(t) - C_1(t)$

Then X(t) is replaced by $r_1$, and the above step is repeated until the N-th remainder $r_N$ is a monotonic function. Furthermore, $r_N$ can be expressed as:

$r_N(t) = r_{N-1}(t) - C_N(t)$

Step 3: X(t) can thus be decomposed into N IMFs and a remainder:

$X(t) = \sum_{n=1}^{N} C_n(t) + r_N(t)$

where $r_N$ is the residual and represents the average trend of X(t). Generally, X(t) is in the real domain, so the IMFs decomposed from the original signal are also real functions. The Hilbert transform is then conducted on each $C_n$:

$Y_n(t) = \frac{1}{\pi} P \int \frac{C_n(\tau)}{t - \tau} \, d\tau$

where P denotes the Cauchy principal value. The analytical signal of $C_n$ can be expressed as:

$Z_n(t) = C_n(t) + i Y_n(t) = a_n(t) e^{i \phi_n(t)}$

where $a_n(t)$ is the amplitude function and $\phi_n(t)$ is the phase function.
Then, the instantaneous frequency can be calculated as:

$\omega_n(t) = \frac{d\phi_n(t)}{dt}$

After ignoring the remainder term $r_N(t)$, X(t) can be given as:

$X(t) = \mathrm{Re} \sum_{n=1}^{N} a_n(t) e^{i \int \omega_n(t) \, dt}$

where $a_n(t)$ and $\phi_n(t)$ are functions of time. Furthermore, a three-dimensional time-frequency Hilbert spectrogram regarding time, frequency, and amplitude is established to describe the change of frequency and amplitude in the time domain. The Hilbert spectrum can be described as:

$H(\omega, t) = \mathrm{Re} \sum_{n=1}^{N} a_n(t) e^{i \int \omega_n(t) \, dt}$

Finally, the Hilbert marginal spectrum is defined as the integral of the Hilbert spectrum over time:

$h(\omega) = \int_0^T H(\omega, t) \, dt$

where the value of the Hilbert marginal spectrum is the total amplitude of each frequency over the whole time scale. It describes the variation tendency of amplitude with frequency across the whole frequency scale and also indicates whether a given frequency is contained in the signal. The flowchart of the HHT is given in Figure 1.
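The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the function names (`sift`, `emd`, `hilbert_marginal_spectrum`) are my own, the sifting uses a simple Cauchy-type stopping criterion with a fixed iteration cap, and boundary effects of the spline envelopes are ignored.

```python
import numpy as np
from scipy.signal import hilbert, argrelextrema
from scipy.interpolate import CubicSpline

def sift(x, max_iter=50, sd_tol=0.05):
    """Extract one IMF candidate by repeatedly subtracting the envelope mean."""
    h = x.copy()
    t = np.arange(len(x))
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = CubicSpline(maxima, h[maxima])(t)   # upper envelope
        lower = CubicSpline(minima, h[minima])(t)   # lower envelope
        m = (upper + lower) / 2.0                   # envelope mean m_1
        prev = h
        h = h - m                                   # h_1 = X(t) - m_1
        # Cauchy-type stopping criterion on the sifting step
        if np.sum(m ** 2) / (np.sum(prev ** 2) + 1e-12) < sd_tol:
            break
    return h

def emd(x, n_imfs=8):
    """Decompose x into IMFs and a residual (Steps 1-3 of the EMD procedure)."""
    imfs, r = [], x.astype(float).copy()
    for _ in range(n_imfs):
        if len(argrelextrema(r, np.greater)[0]) < 2:   # residual ~ monotonic
            break
        c = sift(r)
        imfs.append(c)
        r = r - c                                      # r_n = r_{n-1} - C_n
    return imfs, r

def hilbert_marginal_spectrum(imfs, fs, n_bins=1024):
    """Accumulate instantaneous amplitude over time into n_bins frequency bands."""
    spec = np.zeros(n_bins)
    for c in imfs:
        z = hilbert(c)                                  # analytic signal Z_n(t)
        amp = np.abs(z)                                 # a_n(t)
        phase = np.unwrap(np.angle(z))                  # phi_n(t)
        freq = np.diff(phase) * fs / (2.0 * np.pi)      # instantaneous frequency
        idx = np.clip((freq / (fs / 2.0) * n_bins).astype(int), 0, n_bins - 1)
        np.add.at(spec, idx, amp[:-1])                  # integrate over time
    return spec
```

For a cutting sound series sampled at 44.1 kHz, `n_bins=1024` yields the 1024-point marginal spectrum used later as the CNN input map.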

Convolutional Neural Network
A convolutional neural network (CNN) can be seen as a modification of a traditional neural network. The CNN introduces local connectivity and weight sharing in the hidden layers and adopts a special network structure consisting of convolution layers and sub-sampling layers.
For a two-dimensional input map of size m × n (horizontal and vertical), feature extraction and classification through the CNN are performed as shown in Figure 2. The early stage of a CNN consists of alternating convolution and sub-sampling operations, while the last few layers are fully connected one-dimensional layers.
Step 1: Convolution. The convolution layers are the core components of a CNN, characterized by local connectivity and weight sharing. The previous feature maps are convolved with trainable kernels and transformed by the activation function to generate convolution feature maps. Each feature map in a convolution layer combines characteristics of multiple input maps. In general, it is calculated as:

$x_j^l = \sigma\Big( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \Big)$

where σ(·) is a nonlinear sigmoid function, $M_j$ denotes a selection of input feature maps, l represents the l-th layer in the network, k denotes a convolution kernel of size s × s, and b is an additive bias. The convolution kernel can also be regarded as a filter. The size of the feature map in the convolution layer is

$\left( \frac{m - s}{f} + 1 \right) \times \left( \frac{n - s}{f} + 1 \right)$

where the size of the input map is m × n, and f × f is the convolution shift size.

Step 2: Sub-sampling. A sub-sampling operation is applied to a convolution layer to obtain its corresponding pooling layer. Generally, the number of feature maps in this layer is the same as that in the convolution layer, while each map is smaller. The aim of the sub-sampling operation is to reduce the resolution of the feature maps. The process is realized by applying a sampling function to the units in a local area whose size is determined by the sub-sampling size. Typically, the sampling function takes the sum of each p × p block in the previous map, so that the output feature map is p times smaller along both spatial dimensions. The sub-sampling operation can be presented as:

$x_j^l = \delta\big( x_j^{l-1} \big)$

where δ(·) denotes the sampling function. The size of the feature map in the sub-sampling layer is

$\left( \frac{m_1 - p}{g} + 1 \right) \times \left( \frac{n_1 - p}{g} + 1 \right)$

where the size of the input map is $m_1 \times n_1$, and g × g is the sub-sampling shift size. If p = g, then the size of the pooling layer is 1/p that of the convolution layer.
Step 3: Classification. After the 2-D map has been transformed by several convolution and sub-sampling operations, the remaining units are flattened and fed into a nonlinear classifier. Nodes of adjacent layers in the classifier are fully connected through the sigmoid function. Finally, a vector with t nodes is output by the CNN, where t represents the number of categories in the specific problem.
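The three steps above can be sketched with plain numpy. This is a minimal illustration under the assumptions stated in the text (sum-pooling with p = g, sigmoid activation); the function names are my own and this is not the authors' implementation.

```python
import numpy as np

def conv2d_valid(x, k, f=1):
    """'Valid' 2-D convolution of an m x n map with an s x s kernel and shift f.
    Output size: ((m - s)/f + 1) x ((n - s)/f + 1)."""
    s = k.shape[0]
    m, n = x.shape
    out = np.empty(((m - s) // f + 1, (n - s) // f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i * f:i * f + s, j * f:j * f + s] * k)
    return out

def sum_pool(x, p=2):
    """Sub-sampling: sum each non-overlapping p x p block (the case p = g),
    so the output map is p times smaller along both spatial dimensions."""
    m, n = x.shape
    return x[:m - m % p, :n - n % p].reshape(m // p, p, n // p, p).sum(axis=(1, 3))

def sigmoid(z):
    """Nonlinear activation used between layers in the text."""
    return 1.0 / (1.0 + np.exp(-z))

# A 32 x 32 input with a 5 x 5 kernel gives a 28 x 28 convolution map,
# and 2 x 2 sum-pooling reduces it to 14 x 14 (the sizes LeNet-5 uses later).
fmap = sigmoid(conv2d_valid(np.random.rand(32, 32), np.random.rand(5, 5)))
pooled = sum_pool(fmap)
```

These sizes follow directly from the two size formulas above with m = n = 32, s = 5, f = 1, and p = g = 2.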


The Proposed Method
To recognize the cutting pattern of the coal mining shearer accurately, the cutting acoustic signal is recorded and utilized in this paper. The original sound is first processed using the HHT. Instead of extracting statistical characteristics from the decomposed signal, such as the energy, maximum, correlation coefficient, and so on, the HHT coefficients are directly input into the deep learning CNN. The flow of the proposed HHT-CNN can be summarized as follows:
Step 1: Pretreatment. Acquire Q cutting sound samples, which can be classified into t cutting patterns. The samples are divided into $Q_1$ training samples, and the remaining $Q_2$ series are treated as testing samples.
Step 2: Decomposition. Decompose each sound sample into a series of IMFs using EMD. As the IMF quantity of different series may differ, the largest number is recorded as $T_{max}$. Zero vectors are appended at the low-frequency end if a sample's IMF number is smaller than $T_{max}$. Therefore, the q-th signal can be described as $X_q = [C_{q,1}, C_{q,2}, \ldots, C_{q,T_{max}}]$, where q = 1, 2, 3, . . ., Q.
Step 3: Hilbert-Huang transform. Perform a Hilbert transform on the IMFs to obtain the marginal spectrum of each signal. In the practical calculation, the marginal spectrum consists of a suite of discrete values. Therefore, the signal can be noted as $H_q = [h_1, h_2, h_3, \ldots, h_l]$, where $H_q$ is treated as the feature vector of the sample and l denotes the number of points of the Hilbert marginal spectrum.
Step 4: Reorganization. Each feature vector is reorganized into an m × n 2-D map, where l = m × n. The m × n feature map is regarded as the representation of the sound signal.
Step 5: CNN training and testing. Input the feature map of each training sample into the CNN, with the output being the corresponding cutting pattern. The size of the convolution kernel is s × s, and the sub-sampling size is p × p. Finally, organize the testing samples as input maps of the trained CNN to validate the recognition accuracy. The flowchart of the proposed method is shown in Figure 3.
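The zero-padding in Step 2 can be sketched as follows. This is a hypothetical helper of my own (`pad_imfs` is not from the paper): IMFs are assumed to be ordered from high to low frequency, so zero rows are appended last, i.e. at the low-frequency end, as the text specifies.

```python
import numpy as np

def pad_imfs(imf_list, t_max):
    """Step 2: pad each sample's IMF matrix with zero rows at the
    low-frequency end so every sample has exactly t_max rows."""
    padded = []
    for imfs in imf_list:
        imfs = np.atleast_2d(np.asarray(imfs, dtype=float))
        extra = t_max - imfs.shape[0]
        if extra > 0:
            imfs = np.vstack([imfs, np.zeros((extra, imfs.shape[1]))])
        padded.append(imfs)
    return np.stack(padded)   # shape: (Q, T_max, series length)
```

With this, every sample becomes a fixed-size $T_{max} \times l$ matrix regardless of how many IMFs its EMD produced.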

Simulation and Application
In order to prove the validity and superiority of the proposed scheme, some simulations and an industrial field application were organized and conducted. The sound signals of four cutting patterns were recorded. Then, the HHT and CNN were applied in order. Finally, some analysis was performed based on the simulation results.

Cutting Sound Acquisition
A full-size coal-rock cutting wall was built to simulate a real geological condition and obtain the cutting sound under different cutting patterns. All experiments were performed in the National Coal Mining Equipment Research and Experiment Center at China Coal Zhangjiakou Coal Mining Machinery Co., Ltd. (Zhangjiakou, China). The shearer model was an MG500/1130-WD; its cutting height range was 1.6 m to 3.3 m and its production capacity was 1600 tons per hour, as shown in Figure 4. The pulling speed of the shearer was 3 m/min, and the cutting wall consisted of three typical sections: a pure coal bed with a Protodyakonov hardness coefficient of f2 (P1), a pure coal bed with a hardness of f3 (P2), and a coal bed gripping a rock bed (P3). An industrial microphone was installed to record the cutting acoustic signal of P1, P2, and P3, as well as the empty-load working condition (P4). The sampling frequency of the cutting sound was 44.1 kHz, and the initial sound was saved as a .wav file. A 25-min sound recording was extracted as the sample signal for each cutting pattern. A total of 12,000 sample series, each with a duration of 0.5 s, were collected. Of these, 10,000 series were used as training samples and the remaining 2000 series were treated as testing samples. Four typical kinds of cutting sound signals are shown in Figure 5.


Sound Decomposition
The sound signal was then decomposed adaptively using the HHT, which comprises EMD and the Hilbert transform. EMD was first conducted on the signal, decomposing it into several IMFs and a residual; the EMD result of P1 is presented in Figure 6. A Hilbert transform was subsequently performed on the IMFs to obtain the Hilbert time-frequency spectrum, which is shown in Figure 7. The time-frequency spectrum in the figure contains both the time and frequency information of the cutting sound series, with the value at each normalized frequency described by the gradation of color. The Hilbert marginal spectrum, obtained by integrating the spectrum with respect to time, is shown in Figure 8 for the four typical cutting sound series. In this paper, the marginal spectrum was discretized into 1024 frequency bands. According to the Nyquist-Shannon sampling theorem, the sampling frequency should be at least twice the highest frequency in the signal. Here, the sampling frequency was 44.1 kHz, so the highest distinguishable frequency was 22.05 kHz, and the width of each frequency band was about 21.53 Hz. Finally, each signal series was decomposed into a vector with 1024 elements.
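The band-width figure follows directly from the Nyquist limit and the chosen discretization; the arithmetic is shown below.

```python
fs = 44_100            # sampling frequency in Hz
f_max = fs / 2         # Nyquist limit: highest distinguishable frequency, 22 050 Hz
n_bands = 1024         # discretization of the Hilbert marginal spectrum
band_width = f_max / n_bands
print(round(band_width, 2))   # 21.53 (Hz per band)
```

Each of the 1024 vector elements therefore summarizes the amplitude within one ~21.53 Hz slice of the 0-22.05 kHz range.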


CNN Training and Testing
As the input of the CNN is usually a 2-D map, the 1024 elements were reorganized as follows. The first 32 elements were regarded as the first row, elements 33 to 64 were treated as the second row, and the remaining elements were reorganized following the same rule. Finally, the 1024 Hilbert marginal spectrum points were transformed into a 32 × 32 feature map.
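This row-by-row reorganization is exactly a row-major reshape; a quick sketch with index values standing in for the spectrum points:

```python
import numpy as np

h = np.arange(1, 1025)      # stand-in for the 1024 marginal-spectrum points
fmap = h.reshape(32, 32)    # row-major: points 1-32 -> row 1, 33-64 -> row 2, ...
print(fmap.shape, fmap[0, 0], fmap[1, 0])   # (32, 32) 1 33
```

Since numpy's default order is row-major (C order), no explicit index bookkeeping is needed.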
In order to evaluate the feasibility of the proposed deep learning method, the initial CNN was trained using the 10,000 training samples.The LeNet-5 CNN was applied to recognize the cutting pattern of the shearer.The architecture of the LeNet-5 is shown in Figure 9.It could be seen that there were three convolution layers and two subsampling layers in the LeNet-5.The input map was a 32 × 32 map that was first transformed into six feature maps through a 5 × 5 convolution kernel and a 1 × 1 convolution shift step, so the size of map in the first convolution layer was 28 × 28.Then, the sub-sampling operation was conducted through a pooling size of 2 × 2. The map size in this ply was 14 × 14.The number of feature maps in the second convolution and sub-sampling layer was sixteen.The size of the convolution kernel and pooling block was the same as the previous layer.Then the sixteen 5 × 5 feature maps were convolved into 120 feature points.Finally, a fully connected classifier with 84 hidden nodes and 4 output nodes was designed to recognize the cutting pattern of the coal mining shearer.The output loss function in the LeNet-5 CNN was the maximum likelihood estimation criterion, which in this case was a minimum mean squared error (MSE) equivalent.Then, the deep learning network was trained through the 10,000 training sound signal series.The training procedure was stopped after 1500 epochs.As the input of the CNN was usually a 2-D map, the 1024 elements were reorganized as follows.The first 32 elements were regarded as the first row, elements from 33 to 64 were treated as the second row, and the remainders were also reorganized as the rule.Finally, the 1024 Hilbert marginal spectrum points were transformed into a 32 × 32 feature map.
After training the CNN, the remaining 2000 testing samples were applied to validate the accuracy of the trained network. The same decomposition and reorganization were conducted on the testing series, and the recognition result is presented in Figure 10. As the figure shows, 1971 samples were recognized exactly, so the identification accuracy of the proposed HHT-CNN cutting pattern recognition system was 98.55%. A total of 29 samples were misjudged during the testing process. Among these, four samples in P1 were sorted into P2 and one sample each into P3 and P4. Seven sound series in P2 were mistakenly classified into P1 and six into P3. Moreover, two acoustic samples in P3 were misclassified as P1 and six as P2. Only one series in P4 was misjudged. A deeper analysis of these results showed that the acoustic signals of cutting objects with similar characteristics deviated only slightly from one another, whereas those with evident differences could be distinguished precisely.
where σ(•) is a nonlinear sigmoid function, Mj denotes a selection of input feature maps, l represents the l-th layer of the network, k denotes a convolution kernel of size s × s, and b is an additive bias. The convolution kernel can also be regarded as a filter. For an input map of size m × n and a convolution shift size of f × f, the size of the feature map in the convolution layer is [(m − s)/f + 1] × [(n − s)/f + 1].
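A minimal single-channel sketch of this layer operation confirms the output-size formula; the kernel and bias values below are assumed for illustration, and the convolution is written in the cross-correlation form customary for CNNs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_layer(x, k, b=0.0, f=1):
    """Valid 2-D convolution with shift f, additive bias b, and sigmoid activation."""
    m, n = x.shape
    s = k.shape[0]
    out_h = (m - s) // f + 1          # [(m - s)/f + 1]
    out_w = (n - s) // f + 1          # [(n - s)/f + 1]
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * f:i * f + s, j * f:j * f + s]
            y[i, j] = sigmoid(np.sum(patch * k) + b)
    return y

x = np.random.rand(32, 32)            # input map, m = n = 32
k = np.random.rand(5, 5) * 0.1        # assumed 5 x 5 kernel
y = conv_layer(x, k)
print(y.shape)  # (28, 28), i.e. [(32 - 5)/1 + 1] in each dimension
```

With the paper's first-layer parameters (m = n = 32, s = 5, f = 1) this reproduces the 28 × 28 map size, and the sigmoid keeps every output strictly between 0 and 1.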

Figure 2. Structure of the convolutional neural network.

Figure 3. Flowchart of the proposed method.

Figure 4. (a) The arrangement of the experimental suite. (b) The experiment process.

Figure 5. (a) Cutting sound of the coal bed with a hardness of f2. (b) Cutting sound of the coal bed with a hardness of f3. (c) Cutting sound of the coal bed gripping gangue. (d) Cutting sound of the empty load.

Figure 7. Hilbert time-frequency spectra of the four different cutting signals.

Figure 8. Hilbert marginal spectra of the four kinds of cutting sound.

Figure 9. Structure of the proposed CNN.