Algorithms

7 February 2023

The Use of Correlation Features in the Problem of Speech Recognition

Data Analysis and Machine Learning Department, Financial University under the Government of the Russian Federation, pr-kt Leningradsky, 49/2, 125167 Moscow, Russia
This article belongs to the Special Issue Algorithms for Feature Selection

Abstract

The problem addressed in this article is improving the recognition of phraseological radio exchange messages, which are sometimes transmitted under conditions of increased workload for the pilot. High-quality recognition requires signal preprocessing methods. The article considers new data preprocessing algorithms for extracting features from a speech message. Two approaches are proposed: the first builds autocorrelation functions of messages based on the Fourier transform, and the second builds autocorrelation portraits of speech signals. The proposed approaches are quite simple to implement, although they require cyclic operators, since they work with pairs of samples from the original signal. The developed method was tested on the problem of recognizing phraseological radio exchange messages in Russian. The algorithm with preliminary feature extraction provides a gain of 1.7% in recognition accuracy. The use of convolutional neural networks provides a further increase in recognition efficiency: the gain from processing autocorrelation portraits is about 3–4%. Quantization is used to optimize the proposed models; the algorithm's performance increased by 2.8 times after quantization. It was also possible to increase recognition accuracy by 1–2% using digital signal processing algorithms. An important feature of the proposed algorithms is that they can be generalized to arbitrary data with temporal correlation. The speech message preprocessing algorithms discussed in this article are based on classical digital signal processing; the idea of constructing autocorrelation portraits from the time series of a signal is novel, while ensuring high recognition accuracy. However, the study also showed that all the algorithms under consideration perform quite poorly under strong noise.

1. Introduction

Currently, automatic and automated systems in transport are of particular relevance. On the one hand, there are already many algorithms for unmanned vehicles [1,2] and unmanned aerial vehicles [3,4,5]. On the other hand, modern deep learning natural language and multimodal models [6,7,8] make it possible to perform intelligent processing of audio data with sufficiently high accuracy. Many modern smartphones are equipped with voice assistants, but these devices only solve tasks that are not very sensitive to recognition errors. Moreover, errors in such compact devices arise, among other things, from the distillation and simplification of machine learning models [9,10]. Another problem with such recognizers is that they perform poorly in noisy conditions. At the same time, transport security responsibilities include not only the identification of dangerous items of baggage [11,12], but also the tasks of correct and timely air traffic control.
For high-quality and fast organization of air traffic, a specialized language of phraseological radio exchange is used. On the one hand, this simplifies the recognition task to a limited number of phrase patterns; on the other hand, it demands very high accuracy from any automatic implementation. Moreover, receiving and transmitting speech messages adds to the aircraft pilot's workload, and reception is carried out against aircraft and other noise. Therefore, it is necessary to develop high-quality speech message preprocessing algorithms for feature extraction in order to build automated systems for phraseological message recognition. Such a system should also be invariant to the gender and personal characteristics of the voice. Finally, efficient preprocessing and processing in real time is required.
Obviously, the use of passenger transport carries certain risks, and this also applies to aviation security. Among human factors, pilot fatigue and workload can be singled out. Thus, the article considers the applied task of reducing the aircraft pilot's workload through automated speech message recognition. Radio exchange is necessary for transmitting and receiving messages from air traffic control to ensure flight safety; phraseological radio exchange allows dispatchers and pilots to control the situation on the ground and in the air. However, radio traffic today is not absolutely stable. It is often subject to various kinds of interference, which can greatly distort messages. Sometimes this problem is addressed by duplicating information, but the time spent can affect the level of flight safety. For example, information about a flight level change or an alternate landing airport can be extremely important.
As for the noise-resistant transmission problem, it is possible to use adaptive filtering methods [13,14,15]. Simulation algorithms are necessary to analyze the quality of digital signal processing: they make it possible to impose noise on speech messages and to evaluate recognition efficiency for various noise and filtering parameters. Finally, both approaches based on mathematical models [16] and deep learning methods [17] can be used for analysis. The next section is devoted to a brief review of related works, which concludes with the use of digital signal preprocessing.

3. Materials and Methods

Twenty typical phraseological exchange phrases were selected from the aviation rules in Russian for further processing and recognition of speech commands. Ten speakers were involved, and every speaker recorded each phrase 10 times. Thus, the size of the initial sample was 2000 sound recordings.
Primary data processing, which consists in the formation of time sequences, was performed in the Audacity program [56]. Typical signals for the phrases "Pre-flight check of communication" and "Ready for take-off" are shown in Figure 1. From left to right are the recordings of three different speakers. Note that the Y-axis shows not the sound pressure but the voltage modulation of the voice after microphone recording; the X-axis is the time axis in seconds.
Figure 1. Time series of studied radio messages.
In Figure 1a, we see the time series obtained when the phrase "Pre-flight check of communication" is pronounced by different speakers. Evidently, the content of the speech message itself has a greater influence on the generated signal than the individual speaker. Similarly, Figure 1b shows representations of the phrase "Ready for take-off" by different speakers.
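For processing outside Audacity, each exported recording can be loaded as a time series. Below is a minimal Python sketch, assuming the recordings were exported as WAV files (the file name is a hypothetical placeholder):

```python
import numpy as np
from scipy.io import wavfile

fs, signal = wavfile.read("preflight_check_speaker01.wav")  # 48 kHz in this study
signal = signal.astype(np.float64)
if signal.ndim > 1:                  # mix down to mono if the export is stereo
    signal = signal.mean(axis=1)
t = np.arange(len(signal)) / fs      # time axis in seconds, as in Figure 1
```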
Figure 2 shows the spectra of the messages shown in Figure 1. In this case, the X-axis represents the frequency (f) in Hz, and the Y-axis represents the signal level (S) in dB.
Figure 2. The spectrum of the studied radio messages.
Note that, by analogy with Figure 1a, Figure 2a shows the spectra of the speech signal "Pre-flight check of communication" recorded by three different speakers, and Figure 2b shows the spectra of the speech message "Ready for take-off". It can also be noted that the influence of the content of the message on the generated spectrum is much greater than the influence of the speaker. The difference between male and female frequency ranges could be of interest; however, pilots in aviation are currently predominantly men.
In the Audacity program, the available frequency range is from 0 to 21.9 kHz. With 8-bit quantization, this range is divided into 256 levels at which the spectrum is analyzed. The advantage of switching to the spectral representation is that the processing becomes invariant to the signal duration; however, the spectrum depends strongly on the speaker, which can introduce recognition errors.
Note that 256 spectral characteristics per record increase the amount of data to almost 600,000 values. In this regard, it was decided to decimate the spectra based on the average value. Thus, compressed (averaged) features of the spectrum were additionally extracted for faster processing.
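As an illustration of this step, a minimal numpy sketch of spectrum extraction with average-based decimation is given below. The 256-level spectrum and the dB scale follow the text; the compression factor of 4 (256 down to 64 features) is an illustrative assumption, since the exact factor is not stated:

```python
import numpy as np

U0 = 1e-3  # reference voltage of 1 mV, as adopted below in Formula (1)

def averaged_spectrum_db(signal, n_bins=256, factor=4, eps=1e-12):
    """Magnitude spectrum reduced to n_bins levels in dB, then decimated
    by block averaging (the factor of 4 is an assumption for illustration)."""
    mag = np.abs(np.fft.rfft(signal))
    idx = np.linspace(0, len(mag) - 1, n_bins).astype(int)   # 256 spectral levels
    s_db = 10 * np.log10(mag[idx] / U0 + eps)                # dB relative to U0
    return s_db.reshape(-1, factor).mean(axis=1)             # average-based decimation
```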
Figure 3 shows the spectra of all studied phraseological radio exchange messages. Note that the graph has a logarithmic scale along the X-axis, where the frequency is plotted, while the signal level in dB is plotted on the Y-axis. For convenience of visualization, all records of all speakers were also averaged for each phrase.
Figure 3. Spectra of various messages after averaging.
The next preprocessing step is to convert the values from dB to V. The reference voltage was taken as U0 = 1 mV. Then the following relationship between the quantities can be used:
S [dB] = 10 log10(U/U0),   (1)
where S is the spectrum in dB, U is the spectrum in V, and U0 is the reference voltage.
Inverting Formula (1) yields the desired voltage:
U [V] = 10^(S/10 − 3).   (2)
Using expression (2) for each component of the spectrum, it is possible to obtain the amplitude spectrum of voltages. Similar spectra were plotted for these data in Figure 4; however, the Y-axis now also uses a logarithmic scale.
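A minimal Python sketch of this conversion, directly implementing Formulas (1) and (2):

```python
import numpy as np

def db_to_volts(s_db):
    """Formula (2): invert Formula (1) with reference voltage U0 = 1 mV."""
    return 10.0 ** (s_db / 10.0 - 3.0)   # U [V] = 10^(S/10 - 3)
```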
Figure 4. An example of spectra of absolute voltage values after averaging.
Figure 3 shows examples for all of the studied speech commands. Since averaging procedures have been carried out, the obtained spectra can be considered as references for each message. It should also be noted that all the graphs have a similar structure, while each message has its own peculiarities; our next task is to obtain a more distinctive representation of them. Analyzing the obtained curves, it can be noted that the averaging did not completely smooth the spectra. However, almost all the resulting graphs are similar to each other, with the exception of some artifacts, and it is in these artifacts that the main features are located.
Figure 4 shows examples for all phrases in the voltage spectrum; however, it is not much different from Figure 3, because the amplitude is displayed on a logarithmic scale. Therefore, a further search for preprocessing with a clear separation of classes is required. According to the curves in Figure 4, the shape is practically identical to that of the spectra in relative voltage values. The results of the spectrum calculations (2) can be used to estimate the energy spectrum of the signal based on the following expression:
G(F) = |S(F)|²,   (3)
where G(F) is the energy spectrum of the signal, |·| is the modulus operation, and S(F) is the amplitude spectrum of the signal.
Such a spectrum actually characterizes the energy or power of the signal, so we will use the unit W to evaluate it. In accordance with expression (3), it is easy to calculate the energy spectrum for all the studied speech messages. Obviously, squaring the values, although a nonlinear transformation, will not greatly affect the visual representation of the spectra, given the logarithmic representation. Figure 5 shows the results of such processing for each averaged phrase.
Figure 5. Examples of the radio exchange messages energy spectrum.
It can be seen that the visualization of the reference energy spectra for each speech command in Figure 5 also does not lead to their clear separation for classification. This is due to the very wide range of the logarithmic scales. However, the energy spectrum is needed to calculate the covariance function. Finally, the search for an autoregressive mathematical model to describe the data can be reduced to estimating the model parameters from the autocorrelation function. The following assumptions can be made for modeling:
1. Let the energy spectrum be symmetric with respect to the Y-axis, i.e., G(−F) = G(F). Therefore, the correlation function will also be symmetric (a stationary process property).
2. Let the signal that forms an arbitrary correlation function have a temporal representation that obeys the homogeneity property. This means that the calculation of the covariance function is invariant to the location of the signal; this simplification can be circumvented by using a windowed representation of the signal. In terms of the correlation function, the homogeneity property reads B(t2 − t1) = B(t4 − t3) = B(Δt12) if t2 − t1 = t4 − t3.
3. Let the covariance function be normalized after estimating the variance, i.e., let the correlation (normalized covariance) be constructed so that R(0) = B(0)/σ² = 1, where σ² is the variance of the studied speech message.
Taking into account the introduced modeling assumptions, the estimate of the correlation function can easily be obtained from the inverse Fourier transform of the energy spectrum (3) and found by the formula:
R(t) = (1/σ²) ∫_{−Fmax}^{+Fmax} G(F) e^(j2πFt) dF,   (4)
where σ² is the maximum value of the transform (the covariance of identical message points), which serves for normalization.
Calculations of all correlation functions (4) were performed in the Matlab environment using the built-in inverse fast Fourier transform function. It is important to note that the frequency step has now been replaced by a time-domain step.
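A minimal Python sketch of this chain, implementing the energy spectrum (3) and the normalized correlation function (4) via the inverse FFT (numpy is used here in place of the Matlab routines):

```python
import numpy as np

def correlation_from_spectrum(u_spectrum):
    """Normalized correlation function from a one-sided voltage amplitude spectrum,
    following Formulas (3) and (4)."""
    g = np.abs(u_spectrum) ** 2      # energy spectrum G(F) = |S(F)|^2, Formula (3)
    b = np.fft.irfft(g)              # inverse FFT: covariance estimate over time lags
    return b / b[0]                  # b[0] plays the role of sigma^2, so R(0) = 1
```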
Figure 6 shows all correlation functions that were calculated for 20 averaged studied voice messages.
Figure 6. Examples of radio exchange messages correlation functions.
It should be noted that the X-axis represents the discrete time interval (k), and the Y-axis shows the values of the normalized auto-covariance of the messages (R). The normalized covariance (correlation R) is dimensionless, and the interval k is equal to the sampling time step (the sampling frequency is 48 kHz).
Figure 6 shows all 20 reference correlation functions of the speech commands. All of them are distinguishable either in shape or in correlation values, which depend on the source voice message, so such signals can already be used for classification. On the curves in Figure 6, it is easy to notice both the decaying nature of the correlation in speech signals and some quasi-periodicity, which is present in speech and leads to several extrema on the graphs. The shapes are now much better distinguished visually, which can be exploited to train neural networks.
As another preprocessing method for feature extraction, let us consider the construction not of cross-correlation portraits but of autocorrelation portraits. To do this, it is necessary to import the original time series of a voice message. Then, knowing the total number of samples N and the desired sweep size M (N must be divisible by M), it is possible to obtain a correlation portrait using a sliding window:
R(i, k) = (1/(M·σ²(i))) Σ_{n=0}^{M−1−k} [S(i·M + n) − m(i)] × [S(i·M + n + k) − m(i)],   (5)
where σ²(i) is the signal variance within the local window, and m(i) is the signal mean within the same window.
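A minimal numpy sketch of portrait construction under the stated divisibility constraint is given below; the windowed, biased estimator used here is one plausible reading of Formula (5):

```python
import numpy as np

def autocorrelation_portrait(signal, m, eps=1e-12):
    """Sliding-window autocorrelation portrait, a sketch of Formula (5).
    Each row i is the normalized autocorrelation of one length-m window."""
    s = np.asarray(signal, dtype=np.float64)
    n = len(s)
    assert n % m == 0, "choose M so that N is divisible by M"
    rows = s.reshape(n // m, m)
    portrait = np.zeros_like(rows)
    for i, row in enumerate(rows):
        centered = row - row.mean()                   # subtract the local mean m(i)
        var = centered @ centered / m + eps           # local variance sigma^2(i)
        for k in range(m):
            prod = centered[: m - k] * centered[k:]   # lagged products inside the window
            portrait[i, k] = prod.sum() / (m * var)
    return portrait
```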
In this case, it is a good idea to vary the portrait size. Figure 7 shows representations of the voice message "Pre-flight check" with different portrait sizes.
Figure 7. Autocorrelation portraits for the phrase “Pre-flight check”.
It is easy to see that the width of the image reflects the correlation interval: if the brightness along the X-axis changes slowly, the width should be larger. The height of the portrait should be chosen so that the length of the discrete message signal yields an integer number of rows.
Figure 8 shows the autocorrelation portraits of two phrases for comparison.
Figure 8. Comparison of autocorrelation portraits.
Analysis of Figure 8 shows that very different images are generated for different messages. As a result, after building such portraits it is possible to use convolutional neural networks for recognition.
Figure 9 shows the final diagram of the preprocessing of a speech message to prepare for recognition.
Figure 9. Scheme of speech message preprocessing before recognition.
In the next section, we present the results of comparing the proposed feature extraction methods in combination with deep learning and machine learning methods.

4. Results

Before comparing the results of the different algorithms, it should be noted that the main task is the recognition of a speech message, i.e., classifying a message as one of the 20 studied types, so classification metrics are used for algorithm evaluation. Another important parameter is recognition speed, so this parameter is also measured; to be precise, the average time taken to recognize one message is measured.
Moreover, since machine learning methods were among the studied algorithms, the test data consisted of two recordings of each message by each speaker, i.e., the data were separated into training and testing sets in the classical proportion of 80% to 20%. Thus, the size of the test sample was 400 speech messages, among which there were 20 examples of each class.
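Assuming the recordings are indexed by phrase, speaker, and repetition, this split can be sketched as follows (a minimal illustration, not the authors' exact code):

```python
import numpy as np

n_phrases, n_speakers, n_reps = 20, 10, 10
idx = np.arange(n_phrases * n_speakers * n_reps).reshape(n_phrases, n_speakers, n_reps)
train_idx = idx[:, :, :8].ravel()   # 8 repetitions per speaker for training (1600 records)
test_idx = idx[:, :, 8:].ravel()    # 2 repetitions per speaker for testing (400 records)
```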
Given the class balance of the training and test data, accuracy was chosen as the main metric.
Table 2 shows the results of the efficiency evaluation using the proposed feature extraction methods and known algorithms. As for artificial neural network (ANN) algorithms, 1D and 2D convolutional networks were studied, as well as simple recurrent networks and long short-term memory (LSTM) networks.
Table 2. Recognition accuracy.
The highest accuracy is provided by the convolutional network processing the correlation portraits and by LSTM networks. This can be explained by long memory being an important characteristic of voice messages. However, it is also noticeable that preprocessing in the form of a correlation function increases the recognition quality.
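For reference, a minimal Keras sketch of a two-dimensional convolutional classifier over autocorrelation portraits is shown below. The article does not report the exact architecture, so the layer sizes, the 64 × 64 input, and the framework itself are illustrative assumptions:

```python
import tensorflow as tf

# Illustrative assumptions: 64 x 64 single-channel portraits, 20 classes.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(20, activation="softmax"),  # one output per message class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```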
Table 3 presents the performance results.
Table 3. Recognition time.
The analysis shows that simple algorithms work faster but lead to the greatest errors. The best model is the two-dimensional convolutional network, although it is almost eight times slower than a recurrent model. However, for clean data, LSTM offers the best trade-off between speed and accuracy. Quantizing the weights of the convolutional neural network made it possible to reduce the processing time of one message to 831.3 ms.
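The article does not name the quantization toolchain, so, purely as an illustrative assumption, the sketch below shows post-training weight quantization of the hypothetical Keras model above using TensorFlow Lite:

```python
import tensorflow as tf

# Post-training dynamic-range quantization of the (hypothetical) Keras model above.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize weights to 8 bits
quantized = converter.convert()
with open("portrait_cnn_quant.tflite", "wb") as f:
    f.write(quantized)
```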
Finally, we will study the efficiency of the proposed algorithms under the influence of white Gaussian noise. Let q be the ratio of signal variance to noise variance. Table 4 presents the recognition results under noise conditions.
Table 4. Noisy messages recognition.
From the results in Table 4, it is clear that the two-dimensional model is the most resistant to intense noise; however, at a noise level 10 times greater than the signal level, the recognition results drop significantly.
It is interesting that traditional machine learning algorithms work very badly when the data are noisy, whereas the simple recurrent structure is more resistant to noise. LSTM shows low resistance to noisy data, so long short-term memory networks are best used for processing clean data.
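For reproducibility, the noise imposition can be sketched as follows (a minimal numpy illustration following the definition of q above):

```python
import numpy as np

def add_white_noise(signal, q, rng=None):
    """Add white Gaussian noise so that q = var(signal) / var(noise)."""
    rng = rng or np.random.default_rng()
    noise_std = np.sqrt(np.var(signal) / q)
    return signal + rng.normal(0.0, noise_std, size=len(signal))

# q = 0.1 corresponds to a noise variance ten times the signal variance
# noisy = add_white_noise(signal, q=0.1)
```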

5. Conclusions

Algorithms for the preprocessing of speech messages were proposed to extract features in the problem of recognizing phraseological radio exchange messages. Preprocessing was based on spectral analysis and on obtaining correlation functions from the signal spectrum. Building an autocorrelation portrait of a speech message was also proposed. The data were then processed by convolutional neural networks, which made it possible to increase the recognition accuracy by about 2% compared with recurrent networks. However, in noisy conditions it is not possible to maintain maximum accuracy: performance approaches 50% at a signal-to-noise ratio of q = 0.1. Quantization also brings the speed of the two-dimensional convolutional network to the level of the recurrent network. The main results of the work are the collected data and the proposed methods for their preprocessing. Based on the recordings of speech messages and their spectra, correlation functions are obtained so that the classes become more separable. This is confirmed by the results of the study: the increase in accuracy after preprocessing reaches 1–2%. An additional improvement is provided by the use of autocorrelation portraits; in the future, an investigation into how the size of such portraits affects the accuracy is planned. On data without noise, LSTM networks are preferable, but they turned out to be unstable under the influence of noise. In this regard, plans for further research include signal filtering.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Parekh, D.; Poddar, N.; Rajpurkar, A.; Chahal, M.; Kumar, N.; Joshi, G.P.; Cho, W. A Review on Autonomous Vehicles: Progress, Methods and Challenges. Electronics 2022, 11, 2162. [Google Scholar] [CrossRef]
  2. Khanum, A.; Lee, C.-Y.; Yang, C.-S. Deep-Learning-Based Network for Lane Following in Autonomous Vehicles. Electronics 2022, 11, 3084. [Google Scholar] [CrossRef]
  3. Brunelli, M.; Ditta, C.C.; Postorino, M.N. A Framework to Develop Urban Aerial Networks by Using a Digital Twin Approach. Drones 2022, 6, 387. [Google Scholar] [CrossRef]
  4. Andriyanov, N.; Vasiliev, K. Using Local Objects to Improve Estimation of Mobile Object Coordinates and Smoothing Trajectory of Movement by Autoregression with Multiple Roots. Adv. Intell. Syst. Comput. 2020, 1038, 1014–1025. [Google Scholar] [CrossRef]
  5. Jarray, R.; Bouallègue, S.; Rezk, H.; Al-Dhaifallah, M. Parallel Multiobjective Multiverse Optimizer for Path Planning of Unmanned Aerial Vehicles in a Dynamic Environment with Moving Obstacles. Drones 2022, 6, 385. [Google Scholar] [CrossRef]
  6. Andriyanov, N.A. Combining Text and Image Analysis Methods for Solving Multimodal Classification Problems. Pattern Recognit. Image Anal. 2022, 32, 489–494. [Google Scholar] [CrossRef]
  7. Mukhamadiyev, A.; Khujayarov, I.; Djuraev, O.; Cho, J. Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language. Sensors 2022, 22, 3683. [Google Scholar] [CrossRef]
  8. Ramos-Pérez, E.; Alonso-González, P.J.; Núñez-Velázquez, J.J. Multi-Transformer: A New Neural Network-Based Architecture for Forecasting S&P Volatility. Mathematics 2021, 9, 1794. [Google Scholar] [CrossRef]
  9. Andriyanov, N.; Papakostas, G. Optimization and Benchmarking of Convolutional Networks with Quantization and OpenVINO in Baggage Image Recognition. In Proceedings of the 2022 VIII International Conference on Information Technology and Nanotechnology (ITNT), Samara, Russia, 23–27 May 2022; pp. 1–4. [Google Scholar] [CrossRef]
  10. Wu, X.; Jin, Y.; Wang, J.; Qian, Q.; Guo, Y. MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition. Algorithms 2022, 15, 160. [Google Scholar] [CrossRef]
  11. Andriyanov, N.; Dementiev, V.; Gladkikh, A. Analysis of the Pattern Recognition Efficiency on Non-Optical Images. In Proceedings of the 2021 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), Yekaterinburg, Russia, 13–14 May 2021; pp. 0319–0323. [Google Scholar] [CrossRef]
  12. Rizà Porta, R.; Sterchi, Y.; Schwaninger, A. How Realistic Is Threat Image Projection for X-ray Baggage Screening? Sensors 2022, 22, 2220. [Google Scholar] [CrossRef]
  13. Ribas, D.; Miguel, A.; Ortega, A.; Lleida, E. Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement. Appl. Sci. 2022, 12, 9000. [Google Scholar] [CrossRef]
  14. Antonetti, A.E.d.S.; Siqueira, L.T.D.; Gobbo, M.P.d.A.; Brasolotto, A.G.; Silverio, K.C.A. Relationship of Cepstral Peak Prominence-Smoothed and Long-Term Average Spectrum with Auditory–Perceptual Analysis. Appl. Sci. 2020, 10, 8598. [Google Scholar] [CrossRef]
  15. Andriyanov, N.; Andriyanov, D. Intelligent Processing of Voice Messages in Civil Aviation: Message Recognition and the Emotional State of the Speaker Analysis. In Proceedings of the 2021 International Siberian Conference on Control and Communications (SIBCON), Kazan, Russia, 13–15 May 2021; pp. 1–5. [Google Scholar] [CrossRef]
  16. Andriyanov, N.A. Recognition of radio exchange voice messages in aviation based on correlation analysis. Izv. Samara Sci. Cent. Russ. Acad. Sci. 2021, 23, 91–96. [Google Scholar] [CrossRef]
  17. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  18. Dhouib, A.; Othman, A.; El Ghoul, O.; Khribi, M.K.; Al Sinani, A. Arabic Automatic Speech Recognition: A Systematic Literature Review. Appl. Sci. 2022, 12, 8898. [Google Scholar] [CrossRef]
  19. Nallasamy, U.; Metze, F.; Schultz, T. Active Learning for Accent Adaptation in Automatic Speech Recognition. In Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), Miami, FL, USA, 2–5 December 2012; pp. 360–365. [Google Scholar]
  20. Wahyuni, E.S. Arabic Speech Recognition Using MFCC Feature Extraction and ANN Classification. In Proceedings of the 2017 2nd International Conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 1–2 November 2017; pp. 22–25. [Google Scholar]
  21. Trinh Van, L.; Dao Thi Le, T.; Le Xuan, T.; Castelli, E. Emotional Speech Recognition Using Deep Neural Networks. Sensors 2022, 22, 1414. [Google Scholar] [CrossRef]
  22. Satt, A.; Rozenberg, S.; Hoory, R. Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms. In Proceedings of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden, 20–24 August 2017; pp. 1089–1093. [Google Scholar]
  23. Aksyonov, K.; Antipin, D.; Afanaseva, T.; Kalinin, I.; Evdokimov, I.; Shevchuk, A.; Karavaev, A.; Chiryshev, U.; Talancev, E. Testing of the Speech Recognition Systems Using Russian Language Models. CEUR Workshop Proc. 2018, 2298, 1–7. [Google Scholar]
  24. Vazhenina, D.; Kipyatkova, I.; Markov, K.; Karpov, A. State-of-the-art speech recognition technologies for Russian language. HCCE’12. In Proceedings of the 2012 Joint International Conference on Human-Centered Computer Environments, Aizu-Wakamatsu, Japan, 8–13 March 2012; pp. 59–63. [Google Scholar] [CrossRef]
  25. Bagley, S.; Antonov, A.; Meshkov, B.; Sukhanov, A. Statistical Distribution of Words in a Russian Text Collection. In Proceedings of the Dialogue 2009, Bekasovo, Russia, 27–31 May 2009; pp. 13–18. [Google Scholar]
  26. Alqadasi, A.M.A.; Sunar, M.S.; Turaev, S.; Abdulghafor, R.; Hj Salam, M.S.; Alashbi, A.A.S.; Salem, A.A.; Ali, M.A.H. Rule-Based Embedded HMMs Phoneme Classification to Improve Qur’anic Recitation Recognition. Electronics 2023, 12, 176. [Google Scholar] [CrossRef]
  27. Oh, D.; Park, J.-S.; Kim, J.-H.; Jang, G.-J. Hierarchical Phoneme Classification for Improved Speech Recognition. Appl. Sci. 2021, 11, 428. [Google Scholar] [CrossRef]
  28. Liu, Z.; Huang, Z.; Wang, L.; Zhang, P. A Pronunciation Prior Assisted Vowel Reduction Detection Framework with Multi-Stream Attention Method. Appl. Sci. 2021, 11, 8321. [Google Scholar] [CrossRef]
  29. Jeon, S.; Kim, M.S. Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications. Sensors 2022, 22, 7738. [Google Scholar] [CrossRef] [PubMed]
  30. Vazhenina, D.; Markov, K. End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features. Electronics 2020, 9, 1157. [Google Scholar] [CrossRef]
  31. Pervaiz, A.; Hussain, F.; Israr, H.; Tahir, M.A.; Raja, F.R.; Baloch, N.K.; Ishmanov, F.; Zikria, Y.B. Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data. Sensors 2020, 20, 2326. [Google Scholar] [CrossRef] [PubMed]
  32. Andriyanov, N.A.; Andriyanov, D.A. The using of data augmentation in machine learning in image processing tasks in the face of data scarcity. J. Phys. Conf. Ser. 2020, 1661, 012018. [Google Scholar] [CrossRef]
  33. Box, G.; Jenkins, G.; Reinsel, G. Time Series Analysis; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2008; p. 755. [Google Scholar]
  34. Draper, N.R.; Smith, H. Applied Regression Analysis; Wiley: New York, NY, USA, 1966; p. 407. [Google Scholar]
  35. Zhihua, W.; Yongbo, Z.; Huimin, F. Autoregressive Prediction with Rolling Mechanism for Time Series Forecasting with Small Sample Size. Math. Probl. Eng. 2014, 2014, 572173. [Google Scholar]
  36. Orzechowski, A.; Bombol, M. Energy Security, Sustainable Development and the Green Bond Market. Energies 2022, 15, 6218. [Google Scholar] [CrossRef]
  37. Prajakta, S.K. Time series Forecasting using Holt-Winters Exponential Smoothing. Kanwal Rekhi Sch. Inf. Technol. J. 2004, 13, 1–13. [Google Scholar]
  38. Suyamto, D.; Prasetyo, L.; Setiawan, Y.; Wijaya, A.; Kustiyo, K.; Kartika, T.; Effendi, H.; Permatasari, P. Measuring Similarity of Deforestation Patterns in Time and Space across Differences in Resolution. Geomatics 2021, 1, 464–495. [Google Scholar] [CrossRef]
  39. Zulifqar, A. Forecasting Drought Using Multilayer Perceptron Artificial Neural Network Model. Adv. Meteorol. 2017, 2017, 5681308. [Google Scholar]
  40. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef]
  41. Andriyanov, N.A.; Dementiev, V.E.; Tashlinskii, A.G. Detection of objects in the images: From likelihood relationships towards scalable and efficient neural networks. Comput. Opt. 2022, 46, 139–159. [Google Scholar] [CrossRef]
  42. Dua, S.; Kumar, S.S.; Albagory, Y.; Ramalingam, R.; Dumka, A.; Singh, R.; Rashid, M.; Gehlot, A.; Alshamrani, S.S.; AlGhamdi, A.S. Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network. Appl. Sci. 2022, 12, 6223. [Google Scholar] [CrossRef]
  43. Salas-Páez, C.; Quintana-Romero, L.; Mendoza-González, M.A.; Álvarez-García, J. Analysis of Job Transitions in Mexico with Markov Chains in Discrete Time. Mathematics 2022, 10, 1693. [Google Scholar] [CrossRef]
  44. Yohannes, Y.; Webb, P. Classification and Regression Trees, CART: A User Manual for Identifying Indicators of Vulnerability to Famine and Chronic Food Insecurity; International Food Policy Research Institute: Washington, DC, USA, 1999; p. 59. [Google Scholar]
  45. Pehlivanoglu, I.V.; Atik, I. Time series forecasting via genetic algorithm for turkish air transport market. J. Aeronaut. Space Technol. 2016, 9, 23–33. [Google Scholar]
  46. Wenzel, F.; Galy-Fajou, T.; Deutsch, M.; Kloft, M. Bayesian Nonlinear Support Vector Machines for Big Data. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2017, Skopje, Macedonia, 18–22 September 2017, Proceedings, Part I; Springer: Cham, Switzerland, 2017; pp. 307–322. [Google Scholar]
  47. Kozionova, A.P.; Pyaita, A.L.; Mokhova, I.I.; Ivanov, Y.P. Algorithm based on the transfer function model and one-class classification for detecting the anomalous state of dams. Inf. Control. Syst. 2015, 6, 10–18. [Google Scholar]
  48. Timina, I.; Egov, E.; Yarushkina, N.; Kiselev, S. Identification anomalies the time series of metrics of project based on entropy measures. Interact. Syst. Probl. Hum. Comput. Interact. 2017, 1, 246–254. [Google Scholar]
  49. Woods, J.W.; Dravida, S.; Mediavilla, R. Image Estimation Using Doubly Stochastic Gaussian Random Field Models. Pattern Anal. Mach. Intell. 1987, 9, 245–253. [Google Scholar] [CrossRef]
  50. Danilov, A.N.; Andriyanov, N.A.; Azanov, P.T. Ensuring the effectiveness of the taxi order service by mathematical modeling and machine learning. J. Phys. Conf. Ser. 2018, 1096, 012188. [Google Scholar] [CrossRef]
  51. Andriyanov, N.; Dementiev, V.; Tashlinskiy, A. Development and Research of Intellectual Algorithms in Taxi Service Data Processing Based on Machine Learning and Modified K-means Method. In Intelligent Decision Technologies. Smart Innovation, Systems and Technologies; Springer: Singapore, 2022; Volume 309, pp. 183–192. [Google Scholar] [CrossRef]
  52. Armer, A.I. Modeling and Recognition of Speech Signals Against the Background of Intense Interference. Ph.D. Thesis, Ulyanovsk State Technical University, Ulyanovsk, Russia, 20 June 2006; pp. 1–190. [Google Scholar]
  53. Krasheninnikov, V.R.; Lebedeva, E.Y.; Kapyrin, V.K. Variation of the boundaries of speech commands to improve the recognition of speech commands by their cross-correlation portraits. In Proceedings of the Samara Scientific Center of the Russian Academy of Sciences, Samara, Russia, 20–21 November 2013; Volume 15, pp. 928–930. [Google Scholar]
  54. Ayvaz, U.; Guruler, H.; Khan, F.; Ahmed, N.; Whangbo, T.; Abdusalomov, A. Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning. Comput. Mater. Contin. 2022, 71, 5511–5521. [Google Scholar] [CrossRef]
  55. Khan, F.; Tarimer, I.; Alwageed, H.S.; Karadağ, B.C.; Fayaz, M.; Abdusalomov, A.B.; Cho, Y.-I. Effect of Feature Selection on the Accuracy of Music Popularity Classification Using Machine Learning Algorithms. Electronics 2022, 11, 3518. [Google Scholar] [CrossRef]
  56. Audacity. Available online: https://www.audacityteam.org/ (accessed on 11 January 2023).
