Influence of Environmental Noise on Quality Control of HVAC Devices Based on Convolutional Neural Network

Testing the quality of manufactured products based on their sound expression is becoming increasingly popular. To keep production costs low, the testing is performed at the end of the assembly line. Such measurements are affected considerably by factory noise, even though they are performed in anechoic chambers. Before a quality control algorithm based on a convolutional neural network is designed, the influence of factory noise on the success rate that the algorithm can potentially achieve is unknown. This contribution addresses that problem. The experiments were undertaken on a synthetic dataset of heat, ventilation, and air-conditioning devices. The results show that the classification accuracy of the decision-making algorithm declines more rapidly at high levels of environmental noise.


Introduction
The current trends in industry are defined by the main ideas of Industry 4.0, which focuses on the automation and digitisation of production processes at the maximal scale [1]. That also includes a kind of permanent quality control mechanism, which is placed at the end of the production process to ensure that no defective products are shipped to customers [2].
Two different methodologies are commonly used for solving quality control problems. In addition to the traditional method based on analytical calculations, there are those based on artificial intelligence [3]. With this emerging technology, it is possible to solve tasks that were thought unsolvable just a few years ago [4].
Manufactured goods can be tested for various physical features. One minor area of quality control involves checking the quality of a product's sound expression, and separate scientific disciplines are dedicated to such tasks. The field of acoustic event detection [5,6] aims to detect occurrences of certain events in an audio recording [7] and can be used in industrial applications [8]. In contrast, the area of industrial sound analysis deals with the automatic identification of faults in production machinery or manufactured products by analysing audio signals [9]. In general, testing products directly on a production line is very problematic, since the measurement is affected by the noisy environment of production plants. Anechoic chambers used in the prototype development phase meet the requirements for the complete filtering of environmental noise (EN). The chambers that are installed in production plants, where space is limited, are not as effective in filtering EN. Therefore, it is important to test how effective an anechoic chamber

Materials and Methods
To find the relationship between the volume of EN and the classification accuracy, we need to test the same products repeatedly while they are affected by a wide range of gradually changing background noise intensities. Maintaining a constant level of EN is not possible inside a production plant; ensuring low volumes of EN is especially problematic. For that reason, we decided to carry out the testing on synthetic data that imitate the sound expression of real products as closely as possible. After obtaining the pure expressions of the products, EN of various intensities is added to the signal.
Appl. Sci. 2021, 11, 7484
As the test product, we chose an automotive heat, ventilation, and air-conditioning (HVAC) device. These products are currently tested using analytical methods from the area of psychoacoustics [25], for example loudness, roughness and sharpness, where minimal and maximal acceptable values are defined for each method. If these limits are exceeded, the product is classified as unsuitable. In practice, this means that three methods are used at the same time, which is a time-consuming procedure. The objective is to replace this procedure with a single method based on an artificial neural network.
However, the research results are more general, and the findings are valid for any rotary device that works in an industrial environment.

Measurements of Heat, Ventilation, and Air-Conditioning Devices
The company ELCOM, a.s., Ostrava, Czech Republic [26] designs a wide range of testing platforms that are installed in customers' production plants, where many kinds of automotive components are produced. Among these many products are HVACs of various parameters and sizes. Each type is tested for acoustic features at the end of the production line inside a dedicated anechoic chamber, see Figure 1. We used their equipment to acquire a large number of HVAC sound recordings. The measurement was performed with a Brüel and Kjaer Type 4966-H-041 microphone, which is characterised by high measurement accuracy over a wide range of volumes (16.5-134 dB), frequencies (6.3-20,000 Hz) and temperatures (−20-150 °C).
After the measurement, the sound recordings were analysed to depict the pure nature of HVAC sound. The main analysis was undertaken with the use of the discrete Fourier transform:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} k n} \qquad (1)$$
which converts the input signal x from the time domain to the frequency domain X [27]. The length of both sequences is N and k = 0, 1, ..., N − 1.
Fourier's theorem states that any finite sound signal is composed of a set of single-frequency sound waves, namely the harmonic functions sine and cosine [28]. The Fourier transform is used to decompose a signal into its individual frequencies, as well as the amplitudes of those frequencies. The result of the Fourier operation is called the spectrum [29]. Figure 2 shows a recorded HVAC sample converted to a spectrum.
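As an illustration of how a spectrum reveals the frequencies and amplitudes contained in a signal, the following minimal sketch evaluates the discrete Fourier transform directly from its definition. The test signal (two sine waves at 5 Hz and 12 Hz), the sampling rate and all names are assumptions chosen for the example and are not taken from the measurements; in practice an FFT routine would be used instead of the direct sum.

```python
import cmath
import math


def dft(x):
    """Direct evaluation of the discrete Fourier transform of sequence x."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


# Hypothetical test signal: two sine waves sampled at 64 Hz for one second.
fs = 64
signal = [
    1.0 * math.sin(2 * math.pi * 5 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 12 * n / fs)
    for n in range(fs)
]

spectrum = dft(signal)

# Single-sided amplitude spectrum: bin k corresponds to k * fs / N hertz.
amplitudes = [2 * abs(c) / len(signal) for c in spectrum[: len(signal) // 2]]

# Indices of the two strongest spectral peaks.
peaks = sorted(range(len(amplitudes)), key=lambda k: amplitudes[k], reverse=True)[:2]
```

With one second of data the bin index equals the frequency in hertz, so the two strongest bins fall at the two frequencies that were put into the signal.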
After analysing thousands of available HVAC sound recordings, we can describe the pure nature of HVAC sound as a composition of elementary harmonic functions. Each of these elementary sine functions is defined as the n-th harmonic of the HVAC system's rotation frequency f and has its origin in a different component or sub-part of the HVAC [30,31]. Figure 2b shows four specific peaks that correspond to four sources of noise. These are the only four sources detected in the thousands of diagnosed sound recordings of the HVAC system: the shaft imbalance of the fan, the unequal electromagnetic field created by each coil of the electric motor, which causes a motor failure, defective bearings, and a defective fan blade. A detailed list of the established functions with their presumed sources, which correspond to the described failures and their frequencies, is shown in Table 1. The next step is to compare the amplitudes of the peaks with limit values, which shows whether a peak qualifies as a failure or not.

Frequency | Estimated Source
2f | Shaft imbalance.
8f | Electric motor failure. The number of coils that form the electromagnetic field is 8.
13f | Defective bearings. The number of balls in the bearing is 13.
41f | Ventilator fault. The number of fan blades is 41.
These signals form the basis for creating synthetic HVAC sound expressions. To generate such data, we need to define further features of these signals. Since the audibility of these signals depends on the factory background noise, we first need to measure the noise inside the chamber and then set the intensities of these signals relative to the noise.
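The composition of the four harmonic components can be sketched as a sum of sine waves at 2f, 8f, 13f and 41f. The sketch below assumes a constant rotation frequency and uses hypothetical amplitudes, sampling rate and duration; the function and variable names are illustrative only and do not reproduce the generator used for the datasets.

```python
import math

# Harmonic orders from Table 1 (multiples of the rotation frequency f).
HARMONIC_ORDERS = {"shaft": 2, "motor": 8, "bearings": 13, "fan": 41}


def synthesize(f_rot, amplitudes, fs, duration):
    """Sum of sine components at the harmonic orders of rotation frequency f_rot."""
    n_samples = round(fs * duration)
    return [
        sum(
            amplitudes[name] * math.sin(2 * math.pi * order * f_rot * n / fs)
            for name, order in HARMONIC_ORDERS.items()
        )
        for n in range(n_samples)
    ]


# Hypothetical values: 10 Hz rotation, unit amplitudes, 2 kHz sampling rate.
amps = {"shaft": 1.0, "motor": 1.0, "bearings": 1.0, "fan": 1.0}
samples = synthesize(f_rot=10.0, amplitudes=amps, fs=2000, duration=0.1)

# Frequency of each component in hertz for the assumed rotation frequency.
component_freqs = {name: order * 10.0 for name, order in HARMONIC_ORDERS.items()}
```

The sampling rate is chosen so that the highest component (41f = 410 Hz here) stays below the Nyquist frequency.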

Environmental Noise and Features of the Anechoic Chamber
The factory noise that penetrated the walls of the anechoic chamber was recorded inside the chamber. Apart from the exterior noise, other noise elements, such as the influence of the flowing blown air, are present during the testing of HVACs. These effects are simulated by mixing pink noise into the recording at the intensity that was observed in real testing. This complete noise is used for generating artificial sounds that simulate the testing of HVACs.
To better illustrate the EN that is present during in-service testing of HVAC systems, we performed the following experiment, which describes the properties of the acoustic chamber. The measurement was performed inside and outside the acoustic chamber, as shown in Figure 3, at a time when no product was being tested, so that we could verify only the properties of the acoustic chamber and of the EN, as shown in Figure 4. The damping rate of the chamber depends on the shape and material of its tiling [32,33].
Both acquired signals are converted from pressure amplitude to sound pressure level (SPL) on a decibel scale [34], which is customary in this case, and are shown in Figure 3.
The absorption level of anechoic chambers is usually portrayed in a one-third octave band spectrum [35,36]. One-third octave band levels of both signals were calculated for frequencies ranging from 25 Hz to 25 kHz. The damping level of the anechoic chamber D is then calculated as the sound level difference between the signal recorded outside, SPL_O, and inside, SPL_I, the acoustic chamber according to (2) [37]:
$$D = \mathrm{SPL}_O - \mathrm{SPL}_I \qquad (2)$$
As we can see from Figure 4, the chamber is most effective at damping signals with frequencies ranging from 300 Hz to 2 kHz. EN at frequencies outside this range therefore strongly influences the effectiveness of the anechoic chamber.
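The conversion to sound pressure level and the damping computed as a level difference between the outside and inside recordings can be sketched as follows. This is a simplified broadband version: it assumes pressure samples in pascals and the usual 20 µPa reference, and it does not perform the one-third octave band analysis described in the text. The example signals are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in air, 20 micropascals


def spl_db(pressure_samples):
    """Sound pressure level in dB from pressure samples given in pascals."""
    rms = math.sqrt(sum(p * p for p in pressure_samples) / len(pressure_samples))
    return 20 * math.log10(rms / P_REF)


def damping_db(outside, inside):
    """Damping as the difference between the outside and inside levels."""
    return spl_db(outside) - spl_db(inside)


# Hypothetical recordings: the inside signal has one tenth of the outside pressure.
outside = [0.2 * math.sin(2 * math.pi * n / 100) for n in range(1000)]
inside = [0.1 * p for p in outside]
d = damping_db(outside, inside)
```

A tenfold reduction in pressure corresponds to a damping of 20 dB; applying the same functions to band-filtered signals would give the damping per frequency band.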

Elements of Heat, Ventilation, and Air-Conditioning Device (HVAC) Expression
Here, we return to the definition of the elementary expressions of HVACs for generating artificial datasets. It is necessary to determine the amplitude of the components and, therefore, the data labelling. With the use of the acquired noise, we can define the amplitude level for all four elementary signals. For each component, we searched for the threshold amplitude at which the presence of the component is barely recognisable in the noise. These amplitude levels were obtained by an experiment in which we merged a component signal of variable amplitude with the noise. We then listened to the merged signal and tried to recognise the component signal. If the component signal was too strong, we reduced its amplitude; if we could not recognise the component signal, we increased its amplitude. We did this for all four component signals individually until we found the thresholds (see the nominal amplitude values of the signal components in Table 2). The listening experiment was performed using a KOSS UR29 headset, which provides high volume and is designed to block outside noise. The headset has an impedance of 100 ohms, a sensitivity of 101 dB/mW, and a frequency response range of 18 Hz-20 kHz. By setting the threshold for each possible product defect, we define the point at which the error is subjectively recognisable by a customer. The centre and the duration of the component signals were taken from real measurements of HVACs.
In the real world, some HVAC defects occur only in a narrow range of fan speeds. For quick detection of all defects across the complete working frequency range, the quality control sequence increases the product's rotation frequency during testing so that it follows a ramp. The ramp is shown in Figure 5.
All component parameters have now been defined, so we can start to generate the first dataset that will be fed to the neural network. Each sound signal is formed by the noise signal and all four components, where each component feature is randomised by increasing or decreasing its value by up to a given percentage of the feature's nominal value. A full list of component features is shown in Table 2.
Before feeding the signals into the network, we need to distinguish sound signals that represent working HVACs from those that represent defective ones. The purpose of testing HVACs is not to detect a structural flaw, but a defect caused by improper manufacturing. In real testing, an HVAC is categorised as defective if any error appears and can be subjectively noticed by the customer. While the testing relies completely on a customer's auditory perception, the previous quality control method was based on psychoacoustic tests. Overall, it is not possible to define a dividing parameter based on amplitude or other sound parameters. For this reason, we decided to use CNN algorithms, which do not require any set limits. We follow this logic and mark each signal as "NOK" if the amplitude of at least one component exceeds its nominal value after randomisation. Otherwise, the signal gets an "OK" tag.
A single dataset consists of 2000 samples evenly distributed across both classes. Each class set is divided into training and testing subsets with a ratio of 5:1. Therefore, each class is represented by 833 training and 167 testing samples.
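The labelling rule and the 5:1 split can be sketched as below. The nominal amplitudes, the 20 % randomisation range and all names are hypothetical placeholders (the actual values are those of Table 2). Note that independent uniform randomisation would not by itself give an even class balance, so the balancing step is assumed to be handled separately.

```python
import random

N_PER_CLASS = 1000            # 2000 samples distributed evenly across OK and NOK
TRAIN_PARTS, TEST_PARTS = 5, 1


def randomise(nominal, max_fraction, rng):
    """Shift a nominal value up or down by at most max_fraction of itself."""
    return nominal * (1 + rng.uniform(-max_fraction, max_fraction))


def label(amplitudes, nominal_amplitudes):
    """NOK if at least one component exceeds its nominal amplitude, else OK."""
    exceeded = any(a > n for a, n in zip(amplitudes, nominal_amplitudes))
    return "NOK" if exceeded else "OK"


rng = random.Random(0)
nominal = [1.0, 1.0, 1.0, 1.0]     # hypothetical nominal amplitudes
sample = [randomise(a, 0.2, rng) for a in nominal]
sample_label = label(sample, nominal)

# 5:1 split of one class into training and testing subsets.
n_train = round(N_PER_CLASS * TRAIN_PARTS / (TRAIN_PARTS + TEST_PARTS))
n_test = N_PER_CLASS - n_train
```

Rounding 1000 × 5/6 gives the 833 training and 167 testing samples per class stated above.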

Signal Transformation and Datasets Generation
State-of-the-art approaches convert an acoustic [38,39] or vibration [40] signal into a spectrogram before feeding it into the neural network. When designing a CNN model for the classification of manufactured products based on their sound expression, several approaches should be considered. One option is to use the sound recording directly as a vector input of the neural network.
However, the best results in the classification of sound recordings are currently achieved by networks that work with input data in the form of images, and a spectrogram represents the signal in the frequency domain in an image format. We therefore chose the other option and converted each sound recording into an image.
The highest success rate of the final classification is usually achieved by transforming the signal into a spectrogram by the short-term Fourier transform [41], Mel spectrogram [42] or wavelet spectrogram [43]. Because of the short takt time of the HVAC quality control process, we used the Gabor transform to create the spectrograms [44], as it is considerably faster than comparable algorithms. The created spectrograms were then scaled according to the Mel scale, which is a scale of pitches that listeners perceive as equally distant from one another [45].
Among the transformation parameters, the window length is especially important. With this parameter, we can increase the time resolution of the spectrogram while decreasing its frequency resolution, or vice versa. A high value of the window length highlights long-lasting sound expressions of the original sound signal in a spectrogram, while lower values tend to highlight quick impulses [46]. To avoid omitting any type of sound expression, we transform the signal into two different spectrograms. Figure 6 shows a spectrogram for the nominal values of EN. The images that are processed in the neural network contain only a spectrogram with no axes. Only two of the four aforementioned HVAC signal components (from Table 2) are barely visible, so we have outlined them using red circles. A complete list of transformation parameters is shown in Table 3.
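The trade-off controlled by the window length can be illustrated with a small short-time Fourier transform that uses a Gaussian window (the windowing used by the Gabor transform). The sketch is deliberately tiny and evaluates each frame by a direct sum; the window lengths, hop sizes, the Gaussian width and the test tone are assumptions for the example and are not the parameters of Table 3.

```python
import cmath
import math


def gaussian_window(length, sigma_fraction=0.15):
    """Gaussian window; sigma is a fraction of the window length (assumed value)."""
    centre = (length - 1) / 2
    sigma = sigma_fraction * length
    return [math.exp(-0.5 * ((n - centre) / sigma) ** 2) for n in range(length)]


def spectrogram(x, window_length, hop):
    """Magnitudes of a short-time Fourier transform with a Gaussian window."""
    window = gaussian_window(window_length)
    n_bins = window_length // 2 + 1
    frames = []
    for start in range(0, len(x) - window_length + 1, hop):
        seg = [x[start + n] * window[n] for n in range(window_length)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_length)
                    for n in range(window_length)))
            for k in range(n_bins)
        ])
    return frames


# Hypothetical 32 Hz tone sampled at 256 Hz for two seconds.
fs = 256
x = [math.sin(2 * math.pi * 32 * n / fs) for n in range(512)]

long_win = spectrogram(x, window_length=128, hop=64)   # fine frequency, coarse time
short_win = spectrogram(x, window_length=32, hop=16)   # coarse frequency, fine time
```

The long window yields few time frames with many narrow frequency bins (2 Hz apart), whereas the short window yields many frames with few wide bins (8 Hz apart), which is why two spectrograms with different window lengths capture complementary information.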
All created spectrograms are represented as images with a resolution of 512 × 512 px. Because the grey colour map was used for creating the spectrograms, the saved images have only a single channel. Greyscale images require only a quarter of the memory of colour images; thus, further processing is considerably faster [47], which is important for the application. The choice of greyscale representation was verified with basic experiments, which showed that colour spectrograms did not increase the classification accuracy, but rather decreased it.
As we mentioned in the introduction, this paper is devoted to finding and describing the influence of EN on the classification accuracy of the decision-making algorithm based on the CNN.
To fulfil the task, we simulated measurements inside superior anechoic chambers that damp more EN. This was done by generating another five datasets. Each dataset simulated a measurement inside an anechoic chamber that attenuated the EN by 2 dB more than the previous one. In the best chamber, the EN level was 10 dB lower than the nominal one. While the intensity of the generated noise decreased, the sound level of the component signals remained constant. Figure 7 shows the spectrograms of the signal with the minimal noise level, which is −10 dB relative to the nominal one. All four signal components of the HVAC are clearly visible in the pictures.
Similarly, we can simulate measurements inside inferior anechoic chambers that pass more EN into the chamber. We generated five more datasets; this time, however, the intensity of EN increased by 2 dB in every dataset, up to +10 dB in the last one.
As can be seen in Figure 8, the expressions of the HVAC components completely disappeared behind the EN, which was 10 dB louder than the nominal one.
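Changing the EN level by a given number of decibels while keeping the component signals constant can be sketched as follows. The sketch assumes that the offsets apply to the signal amplitude (sound pressure), so that a change of L dB corresponds to a gain of 10^(L/20); the function names are illustrative.

```python
def db_to_gain(level_db):
    """Amplitude gain corresponding to a level change in decibels."""
    return 10 ** (level_db / 20)


def mix(components, noise, noise_offset_db):
    """Add noise scaled by noise_offset_db to an unchanged component signal."""
    gain = db_to_gain(noise_offset_db)
    return [c + gain * n for c, n in zip(components, noise)]


# Noise offsets relative to the nominal level: -10 dB to +10 dB in 2 dB steps.
noise_offsets_db = list(range(-10, 11, 2))
gains = {offset: db_to_gain(offset) for offset in noise_offsets_db}
```

Under this assumption, the quietest dataset uses roughly 0.32 times the nominal noise amplitude and the loudest roughly 3.16 times.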

Simulations
After defining the format of images that will be entering the neural network, we needed to choose a CNN architecture that will be used in the following simulations. A CNN model usually consists of the following layer types.
The key components of a CNN are convolutional layers, which perform convolution operations on an input image using a set of kernels [48], where each kernel extracts different regional features from the input images [49]. The output matrices of the convolutional layers pass to the pooling layers, which are crucial for the CNN model. Pooling not only reduces the dimensions of the feature maps, which has a very positive effect on the subsequent computational demands, but also allows the following layers to capture a more abstract representation of the image [50]. The dropout layer improves the classification accuracy of the model by randomly disabling neurons during the training phase [51]. Perceptron layers reduce the dimensionality of the large feature space [52] into a vector whose size is equal to the number of available classes.
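The two core operations, a convolution with a kernel followed by pooling, can be sketched on a toy image. The 5 × 5 patch and the edge-detecting kernel are hypothetical; a real CNN learns its kernels during training and applies many of them in parallel.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation without padding, as used in CNN convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling; halves each spatial dimension."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Hypothetical 5 x 5 greyscale patch with a vertical edge, and an edge kernel.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

features = conv2d_valid(image, kernel)   # 4 x 4 feature map
pooled = max_pool2x2(features)           # 2 x 2 map after pooling
```

The feature map responds only where the edge is located, and pooling keeps that response while reducing the map from 4 × 4 to 2 × 2.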
The training was executed on a single NVIDIA GeForce RTX 2070 graphics card. The card memory was a limiting factor for the size of the CNN architecture used. Initial tests were performed to search for the optimal CNN architecture for our datasets. We found that the best model of a neural network contained four convolution layers interlaced with pooling layers. The signal was then flattened, and the dropout operation was applied to it. The end of the network consisted of a sequence of three perceptron layers. The complete architecture of the network used, along with a detailed description of the layers' parameters, is shown in Figure 9.
While searching for the optimal CNN architecture, we selected the standard VGG16 [53], which we modified for our dataset. Similar modifications of this neural network type were described in [54,55]. In this type of network, the number of kernels in the convolution layers grows with the increasing depth of the network, whereas the number of neurons in the perceptron layers decreases. We tested the optimisation of this specific structure by reducing the number of included layers as well as the size of these layers. However, no simpler architecture achieved at least the same quality as our selected model.
In future work, we will optimise the CNN model, for example by a genetic algorithm [56,57], to reduce its size and processing time while maintaining the high accuracy rates obtained by the currently proposed model. This optimisation is beyond the scope of this contribution.
During the training of the network, the global minimum of the loss function is sought by the RMSprop [58] optimisation algorithm with a learning rate of η = 0.0001. We experimented with other optimisation algorithms, such as Adam [59], AdaGrad [60], Adadelta [61] and AdaMax [59], with various values of the learning rate η, but none of these combinations proved to be better than the aforementioned one. The input images enter the network in batches of eight, which is the largest value permissible for our architecture and the available GPU, since we need to avoid an out-of-memory error.
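For readers unfamiliar with RMSprop, the following minimal sketch shows the standard update rule for a single scalar weight, using the learning rate η = 0.0001 quoted above. The decay rate `rho` and the stabilising constant `eps` are assumed common defaults, not values reported in the text, and the quadratic toy loss is purely illustrative.

```python
import math


def rmsprop_step(w, grad, cache, lr=1e-4, rho=0.9, eps=1e-7):
    """One RMSprop update for a single scalar weight.

    lr matches the learning rate quoted in the text; rho and eps are
    assumed defaults, not values reported by the authors.
    """
    cache = rho * cache + (1.0 - rho) * grad * grad    # running mean of grad^2
    w = w - lr * grad / (math.sqrt(cache) + eps)       # normalised gradient step
    return w, cache


def minimise_quadratic(w0=1.0, steps=1000):
    """Run RMSprop on the toy loss L(w) = w^2, whose gradient is 2w."""
    w, cache = w0, 0.0
    for _ in range(steps):
        w, cache = rmsprop_step(w, 2.0 * w, cache)
    return w


w_final = minimise_quadratic()
print(w_final)
```

Because each step is divided by the running root mean square of the gradient, the size of an update stays on the order of the learning rate regardless of the gradient's scale.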
The training phase consists of 50 epochs, but the results present the highest classification accuracy achieved across those epochs. A test dataset was used for verification.
The last layer of the CNN model used contains a single neuron with a sigmoid activation function [62], so the neuron's output y ranges from 0 to 1. The sample is classified as class C(y) according to (3).
The classification accuracy CA of the model on the whole dataset is evaluated by (4), where n_CoCl is the number of correctly classified samples from the dataset and n is the total number of dataset samples. A sample is correctly classified if its predicted class C(y) is equal to the ground-truth class that was assigned to the sample during generation.
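The decision rule and the accuracy measure can be illustrated with a small sketch. It assumes that (3) thresholds the sigmoid output at 0.5 and that (4) is the ratio n_CoCl / n expressed in percent, as in the result tables; the function names and the threshold value are illustrative assumptions, not taken from the original equations.

```python
def classify(y, threshold=0.5):
    """Map the sigmoid output y in [0, 1] to a class label 0 or 1.

    The 0.5 threshold is an assumption; the rule actually used is
    the one given by Equation (3).
    """
    return 1 if y >= threshold else 0


def classification_accuracy(outputs, ground_truth, threshold=0.5):
    """Return CA in percent, computed as 100 * n_CoCl / n."""
    if len(outputs) != len(ground_truth):
        raise ValueError("outputs and ground_truth must have equal length")
    if not outputs:
        raise ValueError("dataset must not be empty")
    # n_CoCl: samples whose predicted class equals the ground-truth class.
    n_cocl = sum(classify(y, threshold) == c for y, c in zip(outputs, ground_truth))
    return 100.0 * n_cocl / len(outputs)


# Three of these four made-up network outputs match their labels.
ca = classification_accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])
print(ca)
```

Under this definition, a network that always predicts the same class scores 50% on a dataset with equal numbers of both classes.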

Results and Discussion
The total number of generated datasets was 22 per spectrogram type, where the intensity of noise gradually grew from −10 dB to +10 dB in steps of 2 dB. To reduce statistical error in the classification accuracy of a trained neural network, the training was performed multiple times for all datasets. The first experiments were done on datasets with images of spectrogram 1. Table 4 lists the classification accuracies, in percentages, that the CNN reached when trained on a particular dataset with data in the form of spectrogram 1. The classification accuracy trend is plotted in Figure 10, which gives a clearer picture of the relationship between environmental noise and classification accuracy.
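To make the noise sweep concrete, the sketch below converts a relative level in dB to a linear factor and mixes a noise track into a signal at that level. It assumes that the dB values describe the noise amplitude relative to the nominal level (the 20·log10 convention); that convention and the function names are illustrative assumptions rather than the authors' generation procedure.

```python
def db_to_gain(level_db):
    """Convert a relative level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def mix_with_noise(signal, noise, noise_level_db):
    """Add environmental noise to a signal at a level relative to nominal.

    0 dB leaves the noise at its nominal amplitude; the amplitude
    convention (20*log10) is an assumption, not stated in the text.
    """
    gain = db_to_gain(noise_level_db)
    return [s + gain * n for s, n in zip(signal, noise)]


# Noise levels swept in the experiments: -10 dB to +10 dB in 2 dB steps.
NOISE_LEVELS_DB = list(range(-10, 11, 2))

for level in NOISE_LEVELS_DB:
    print(f"{level:+3d} dB -> noise amplitude x {db_to_gain(level):.3f}")
```

Under this convention, +6 dB roughly doubles the noise amplitude, while −6 dB roughly halves it.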
The same experiment was done on datasets with data in the form of spectrogram 2. The accuracy rates are presented in Table 5 and the graphical trend is shown in Figure 11.
The results describe the influence of environmental noise on the assembly line on the classification accuracy of a quality control algorithm based on a convolutional neural network. Both graphs show that decreasing the intensity of environmental noise from the nominal value leads to slight improvements in classification accuracy, and these results are very stable across multiple runs. On the other hand, increasing the noise level by the same amount has a greater impact on reducing accuracy rates.
For sound recordings transformed to spectrogram 1, which has the greater window length, the accuracy rate tends to decrease more or less linearly as the noise increases. In contrast, a neural network trained on data transformed to spectrogram 2 could process slightly noisy records, but from +6 dB above the nominal noise level it could not learn any credible decision-making criterion, thus confirming the results of the classical analytical method. Simply put, the noise in this kind of spectrogram tends to obscure the signal components more aggressively. For a direct comparison of both spectrogram types, see Figure 12.
Overall, the greater the noise level, the less stable the classification results were across multiple runs. With both spectrogram types, the network sometimes failed to learn anything from the data and returned just 50% classification accuracy on very noisy datasets. This happened quite often with data from spectrogram 2. On the other hand, with this type of spectrogram the network reached accuracy rates on less noisy data that were approximately 2% higher than with data transformed by spectrogram 1.

Conclusions
We have diagnosed thousands of HVAC system samples and identified distinct sources of failure. Thanks to this, we have been able to create synthetic recordings of specific failure sources which correspond to the measured harmonic frequencies. We have converted the sound recordings into images using spectrograms, which are more suitable for diagnostics using neural networks. We have generated a total of 11 datasets, each of which simulates a measurement in an anechoic chamber with a different level of EN damping.
From these findings, we can conclude that a CNN architecture that takes multiple complementary spectrograms of the same sound recording as simultaneous inputs should lead to an increased classification accuracy rate. Another improvement can be made by finding the ideal combination of spectrogram parameters and neural network architecture. This complex problem is far beyond the scope of this article. It will, however, be addressed in forthcoming research.
A further area of research is the search for a smaller and thus faster CNN model that is well suited to our task. The optimal architecture will be sought mainly by a genetic algorithm.
This contribution should be useful for companies that focus on the production and testing of rotary devices directly at the production line and want to upgrade their anechoic chamber to improve the current quality control process. Considering that the life cycle of anechoic chambers installed directly in production lines can be up to six years, future research will give us a large amount of data for consolidating the findings and for future improvements. It will also help manufacturers decide whether to purchase a better anechoic chamber, which would filter more EN, by providing an estimate of the improvement in accuracy rate that can be gained by the upgrade every time a new production line is planned for a new model of HVAC device.
The currently used classification algorithm can distinguish faulty devices; future work will also identify the cause of the failure by pointing out the source of the noise. Further research will focus on extending the described methodology to different types of devices using rotary mechanisms, either HVAC devices used outside the automotive industry, for example in marine and aircraft technology, or general devices such as home appliances.