Illegal Logging Detection Based on Acoustic Surveillance of Forest

Abstract: In this article, we present a framework for automatic detection of logging activity in forests using audio recordings. The framework was evaluated in terms of logging detection classification performance, and various widely used classification methods and algorithms were tested. Experimental setups using different signal-to-noise ratios (SNRs) were followed, and the best classification accuracy was achieved by the support vector machine algorithm. In addition, a postprocessing scheme on decision level was applied that provided an improvement in performance of more than 1%, mainly in cases of low SNR. Finally, we evaluated a late-stage fusion method, combining the postprocessed recognition results of the three top-performing classifiers, and the experimental results showed a further absolute improvement of approximately 2%, with logging sound recognition accuracy reaching 94.42% when the SNR was equal to 20 dB.


Introduction
Forests play a vital role in maintaining the earth's biodiversity and preserving the ecological balance. Forest cover across the globe is a key indicator of the overall health of the planet. Forests purify air, preserve watersheds, prevent erosion, improve water quality, and provide natural resources. In addition, forests help mitigate global warming by absorbing large amounts of carbon dioxide, the major greenhouse gas, and thus help protect the globe from climate change. Approximately 1.6 billion people across the globe rely on forest environments for their livelihoods, and approximately 60 million indigenous people rely on forests for their life and subsistence [1].
Many factors affect the existence and sustainability of forests. A main threat is illegal logging, which can cause unmanaged and irreparable deforestation. Illegal logging is also considered to be the greatest threat to biodiversity, since forests support almost 90% of terrestrial biodiversity [2]. Moreover, it threatens the sustainability of forest ecosystems and can result in extensive deforestation, which has a substantial negative effect on the atmosphere. The main consequences of illegal logging are flash floods, landslides, and drought, as well as climate change and global warming [2].
Illegal logging also results in losses of government revenue and may contribute to the rise of poverty [3]. Illegal logging activities affect forest-rich countries as well as the many countries that import and use wood-based products from wood-producing countries [4].
In many cases, the scale of illegal logging is impossible to calculate accurately, mainly due to the nature of the activity. Illegal forest activities across the globe are estimated to cost approximately USD 10-15 billion in annual government revenue [3,5]. Illegal trade irregularities were specified in the mid-1990s as accounting for almost 15% of global trade [6]. In addition, it has been pointed out that, in the most vulnerable forest regions, more than half of all logging activities have been performed illegally [7]. Despite the recent work of ecological initiatives and the formulation of various monitoring tools for export timber products, it is necessary, more than ever before, to employ systems for detecting illegal logging [8].
Many authorities in charge of forest management have taken actions for surveillance and information collection in forest environments, aimed at confronting illegal logging and deforestation. In general, surveillance is conducted mainly by ground-based methods, which use sensor-based monitoring approaches and exploit advances in existing technologies [2]. The ground-based methods include on-site monitoring by staff and patrols for the surveillance of the forest [9]. In addition, observation towers are often used by specialized personnel for visual detection of illegal activities and fires. However, these approaches are expensive, time-consuming, and, in most cases, resource intensive. Therefore, technology-based methods and solutions need to be exploited.
During the last decades, developments in remote sensing technologies, as well as advances in information and communication technologies (ICT), have enabled the use of automated or semi-automated surveillance solutions in broad areas such as forests. Technologies such as video surveillance, wireless surveillance systems, aerial photographs, satellite imagery, and communications are used. Satellite imagery is a costly solution for monitoring illegal activities in forest areas, such as illegal logging, trespassing, and deforestation, and these activities cannot always be detected in satellite photos. As an alternative, technological advances in wireless communications and the Internet of Things (IoT) allow various low-cost, low-power, small sensors to be employed for surveillance of large areas such as forests. Wireless sensor networks (WSNs) are a technology that uses standards such as WiFi, Bluetooth, ZigBee [1], or mobile broadband (3G/4G/5G) [10], and can be used widely for forest surveillance and management [8,11].
In this article, we introduce an acoustic surveillance-based methodology for detecting logging in a forest. The presented methodology is modular and, since it relies on audio evidence, it can be adapted to different forest characteristics and can operate equally well during day and night. The remainder of this article is structured as follows: in Section 2, related work and systems in the literature are presented; in Section 3, the framework for acoustic surveillance of forests for detection of illegal logging is described; in Section 4, the experimental study is described; in Section 5, the results of the study on audio-based logging identification are presented; and finally, in Section 6, the presented work is concluded.

Related Work
The detection of illegal logging in forests has attracted great interest in the research community, mainly due to its substantial effect on the environment, the economy, and society, and therefore many studies in the literature aim at automatically detecting logging in forests. A complete presentation of methods and works on environmental sound recognition can be found in [12], with many works and studies on illegal logging in various urban and forest environments in [13]. Most systems have relied on wireless sensor networks (WSNs) and have used sound sensors to detect the operation of chainsaws, as well as vibration sensors to specify the exact position where logging was taking place in a forest [14,15].
In [16], Ahmad and Singh presented a methodology for recognizing tree cutting in forests using acoustic properties that were based on the distance between parameters, together with Gaussian mixture models (GMM), principal component analysis (PCA), and k-means clustering. Their methodology achieved satisfactory performance, reporting an accuracy of up to 92% in dense forests and up to 76% in open forests.
In [17], the authors proposed a three-tier architecture for monitoring a forest. The architecture was aimed at continuously monitoring a forest area to recognize illegal logging by using chainsaw noise identification methods on wireless sensor networks. In addition to the detection of chainsaw noise, the authors presented methods for localizing the position of the chainsaw noise, which were based on the time difference of arrival (TDOA) paired with multilateration. Finally, the work used neural networks to efficiently identify the acoustic signals of chainsaws.
In [18], the authors presented a prototype system for detecting illegal logging, which was based on both vibration and sound sensors. Sound sensors were used to spot chainsaws, and vibration sensors were used to detect the falling of trees in forests. The Arduino Nano platform was used, and GSM modules provided information to the guard patrols in the forests. The study results indicated that a value of 63.4 dB for the chainsaws, as well as a threshold of 4400 for the vibration sensors, were suitable for detecting illegal logging.
In [15], the authors designed and introduced a system based on wireless sensor networks and various sensors to detect and recognize illegal cutting of trees. Sound and vibration sensors were employed in the nodes of the network. The Xbee Pro S2C module was used as the communication medium and the Arduino Nano was used for data processing. The system was tested in small forest and open area scenarios, and the findings showed that it was cost-efficient and had promising performance.
In [19], the authors presented a methodology for recognizing chainsaws and for specifying their position. The authors detected sound signals of chainsaws in soil and air, as well as the time difference of arrival of the two waves in the two media. The sound wave from chainsaws was detected via microphone and geophone sensors. The methodology relied on correlation to determine the time difference and to specify the distance between the sound source and a specific sensor, and also to specify the direction, mainly by performing microphone rotations. The system built on the authors' methodology was energy efficient, and the testing phase reported an accuracy of 95%.
The authors in [14] addressed logging detection and introduced a method that used vibration and sound sensors to detect illegal tree logging in mountains. In their work, they used a simple subtraction of two data values to obtain the differential signal strength as a feature of the vibration. The results of this experimental study indicated that the method could distinguish between vibrations of sawing wood and vibrations of human bodies. The results also showed a clear increase in performance with the authors' sound sensing designs, which used sound amplitude, and indicated better performance for detecting sounds made by sawing wood.
The authors in [20] presented a hierarchically structured wireless sensor network oriented to on-site signal processing approaches that used low-cost microcontrollers. The authors introduced two time-domain methods; the first relied on the autocorrelation function, while the second relied on TESPAR. The study results indicated that TESPAR was more sensitive to various weather effects and that it was possible to achieve real-time, on-site, high detection performance with time-domain, low-complexity signal processing, with an approximately 80% true positive rate (TPR) and an almost 0% false positive rate (FPR) for different forest characteristics. The proposed system was low in cost and in hardware requirements, and it could easily be used in collaborating networks of sensors in which the combination of data from different locations achieved quite good protection in large environments.
The authors in [21] introduced a system for sound detection of chainsaws based on the extraction of Haar-like features. The method aimed to analyze and classify signals from audio sources using frequency-domain feature extraction. More specifically, Haar-like features were extracted from the spectrogram. The method performed a two-stage thresholding approach to discriminate chainsaw from non-chainsaw sounds. The results of the study indicated that the method was very effective in recognizing chainsaw sounds and that it could effectively perform this discrimination in forests.

System Framework for Logging Detection using Acoustic Monitoring
The presented framework for acoustic monitoring of logging in forests is based on a WSN setup of acoustic monitoring stations installed in different locations in a forest. The number of monitoring stations, M, with station index 1 ≤ m ≤ M, can vary, with more stations resulting in higher spatial resolution in acoustic monitoring of a given area of a forest. Given that specific locations or areas in a forest are highly suspected of illegal logging, the forest authorities may select targeted locations to install the monitoring stations, thus minimizing the number of locations. The architecture for logging detection in forests using acoustic monitoring is illustrated in Figure 1.
As can be seen in Figure 1, the monitoring station has a microphone (which can be expanded to a microphone array), a solar panel for energy autonomy, and an antenna for wireless communication with a base station (server). The microphone captures sound events and the acquired audio samples are sent wirelessly to a server for further processing. Any logging sound, at a distance that can be heard, is captured by the microphone, together with additive forest sounds and environmental noise.
Regarding the wireless transmission of the acquired audio samples, several technologies can be used. More specifically, depending on the characteristics and parameters of a forest area, data transmission can be performed using Wi-Fi or Zigbee protocols, while in the case of dense vegetation, no line of sight, or long distances between the stations, a mobile broadband network can be used. As a baseline WSN, we consider M monitoring stations, with 1 ≤ m ≤ M, which transfer the acquired audio data together with any log events to a base server station for further processing.
Regarding the server side, the captured audio signal, which is wirelessly transmitted from monitoring station m, is preprocessed and parameterized before being analyzed by machine learning methods for classification to detect logging sounds. The detection is performed using pretrained acoustic models for logging and the classification is binary, i.e., detection of logging sounds or not. Once a logging activity is detected, an alarm is activated to inform forest authorities. This can be done either by direct connection to a forest management/monitoring system and activation of the corresponding alarm or by an automatic phone call or text message to patrolling units. The modular structure of the above architecture allows adaptation of any of its modules, according to the specific needs of a forest management body, without loss of the functionality of the other modules.
Appl. Sci. 2020, 10, 7379
The audio processing performed at the server station is based on short-time analysis of the acquired recording and decomposition of the signal into sequences of audio feature vectors. In more detail, let us denote as $x$ the incoming audio signal. Using a window $w$ of fixed length $\|w\|$, the audio signal is segmented into audio frames $x_i$, with $i = 1, 2, 3, \ldots$ and $x_i \in \mathbb{R}^{\|w\|}$, with the time step between consecutive frames typically being half of the frame length. Audio parameterization is then applied to each audio frame $x_i$, thus extracting a feature vector $v_i$, with $i = 1, 2, 3, \ldots$ and $v_i \in \mathbb{R}^{V}$, for each audio frame, consisting of $\|v_i\| = V$ parameters. The sequence of audio feature vectors $v_i$ is then processed by a machine learning classification model $G$ in order to assign a binary label, logging or not logging sound, to each of the feature vectors, i.e.,
$$l_i \leftarrow G(v_i)$$
where $l_i$, with $i = 1, 2, 3, \ldots$, is the assigned binary label. To improve the logging sound recognition accuracy, a postprocessing method $P$ can be applied on the recognized binary labels in a time window of $\pm k$ audio frames, i.e.,
$$l_i \leftarrow P(l_{i-k} : l_{i+k})$$
where $l_i$, with $i = 1, 2, 3, \ldots$, is the refined binary label after the postprocessing step. The postprocessing step uses the recognition results of the previous $k$ and next $k$ audio frames to refine the detected labels and is expected to improve recognition in the case of sporadic labeling errors, which might be caused by a burst of interference. The audio processing and logging sound classification steps are illustrated in Figure 2.

Experimental Setup
In this section, we present the audio dataset that was used in the experimental evaluation and we illustrate the audio features that were used for the parameterization of the acoustic recordings, as well as the machine learning algorithms that were used for binary classification of logging sound activity.

Evaluation Dataset
In the present evaluation, we employed audio recordings from eleven different kinds of chainsaws with a total duration of around 5 min. In addition to audio recordings of the wood logging activity, audio recordings of forest sounds and environmental background noise, such as rain, wind, the sound of leaves, and bird vocalizations, were also used. All audio data used in the present evaluation were collected from freely available online sound data repositories and were downsampled to 8 kHz with a resolution of 16 bits per sample. To evaluate the ability to detect wood logging in realistic conditions, the audio recordings of logging sounds were randomly mixed at various signal-to-noise ratios (SNRs) with the background noise recordings in the form of additive noise, as illustrated in Figure 1.
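The additive mixing at a prescribed SNR can be sketched as follows. This is a minimal illustration, not the procedure used in the evaluation: it assumes the SNR is defined over average signal power, that the noise is rescaled while the logging signal is left unchanged, and it uses a synthetic tone and Gaussian noise as stand-ins for real recordings. The function names `mix_at_snr` and `measure_snr_db` are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that signal + noise has the requested SNR (in dB).

    Returns the mixture and the scaled noise (hypothetical helper).
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    # SNR_dB = 10 * log10(P_signal / (g^2 * P_noise)), solved for the gain g
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled


def measure_snr_db(signal, noise):
    """SNR in dB computed from the two separate components."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Stand-in data (assumption): a 120 Hz tone as "logging" sound, white noise
# as "forest background", both 1 s long at the 8 kHz rate used in the text.
fs = 8000
rng = random.Random(0)
chainsaw = [math.sin(2 * math.pi * 120 * t / fs) for t in range(fs)]
background = [rng.gauss(0.0, 1.0) for _ in range(fs)]

mixture, scaled_noise = mix_at_snr(chainsaw, background, snr_db=-6)
print(round(measure_snr_db(chainsaw, scaled_noise), 2))
```

Repeating the call for each target level (for example −6, 0, 6, 12, 16, and 20 dB) would produce one noisy copy of the logging recordings per evaluated condition.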

Audio Pre-Processing and Feature Extraction
All evaluated audio signals were initially frame blocked by a sliding window of 20 milliseconds length with 10 milliseconds (50%) overlap between successive audio frames. Each audio frame was parameterized by temporal and frequency domain audio descriptors. Regarding the temporal audio descriptors, the zero-crossing rate, the frame intensity, and the root-mean-square energy of the frame were used. The frequency domain audio features were the first 12 Mel frequency cepstral coefficients (MFCCs), the harmonics-to-noise ratio by autocorrelation function, the voicing probability, and the dominant frequency. The dimensionality of the resulting feature vector was equal to 18, consisting of three temporal and 15 spectral audio descriptors. The above-mentioned audio features were calculated using the openSMILE audio processing software tool [22]. Dynamic range normalization was applied as a postprocessing step to all extracted features to equalize the range of their numerical values.
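To make the frame blocking concrete, the sketch below segments a signal into 20 ms frames with a 10 ms step (160 and 80 samples at 8 kHz) and computes two of the temporal descriptors, the zero-crossing rate and the root-mean-square energy. It is an illustration in plain Python and does not reproduce openSMILE; the exact descriptor definitions used by that tool may differ. Interpreting dynamic range normalization as per-feature min-max scaling to [0, 1] is an assumption, and all function names are hypothetical.

```python
import math


def frame_signal(x, fs=8000, frame_ms=20, hop_ms=10):
    """Split x into overlapping frames (20 ms long, 10 ms step by default)."""
    frame_len = fs * frame_ms // 1000  # 160 samples at 8 kHz
    hop = fs * hop_ms // 1000          # 80 samples at 8 kHz
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def min_max_normalize(column):
    """Scale one feature column to [0, 1] (assumed form of normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Stand-in input (assumption): a 1 s, 200 Hz tone sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]

frames = frame_signal(x, fs)
features = [(zero_crossing_rate(f), rms_energy(f)) for f in frames]
zcr_norm = min_max_normalize([zcr for zcr, _ in features])
print(len(frames), len(frames[0]))
```

In a full pipeline each frame would yield the 18-dimensional vector described above; only two of the three temporal descriptors are computed here to keep the sketch short.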

Classification Methods and Algorithms
In our study, various widely used and well-known machine learning methods for classification were used to train binary models for acoustic detection of logging activity. These machine learning algorithms were:
• the support vector machine (SVM) that used the sequential minimal optimization algorithm with a radial basis function kernel [23];
• the widely used three-layer multilayer perceptron (MLP) neural network with a neuron architecture of 18-10-1, in which all neurons were sigmoid and which was trained with 50,000 iterations [24];
• the pruned C4.5 decision tree (J48), set to three-fold for pruning the tree and seven-fold for growing the tree [25];
• the k-nearest neighbors classifier with linear search of the nearest neighbor and without weighting of the distance, referred to here as an instance-based classifier (IBk) [26];
• the Bayes network learning (BN) using a simple data-based estimator for finding the conditional probability table of the network and hill-climbing for searching network structures [27].
In the study, the Weka [27] software toolkit was employed for the implementation of the aforementioned machine learning algorithms. In all the evaluated algorithms, the free parameters that were not mentioned above were set to their default values.
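As an illustration of the simplest of the listed classifiers, the following sketch implements an unweighted k-nearest neighbors decision with a linear (brute-force) neighbor search. It is written in plain Python for readability and is not the Weka IBk implementation; the Euclidean distance, the toy two-dimensional feature vectors, and the function name `knn_predict` are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Unweighted k-NN with a linear scan over all training vectors."""
    # Linear search: compute the distance from the query to every instance.
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    # No distance weighting: each of the k neighbors casts one equal vote.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumption): label 1 = logging sound, label 0 = other sound.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = [0, 0, 1, 1]

print(knn_predict(train_x, train_y, (0.85, 0.85), k=3))
print(knn_predict(train_x, train_y, (0.0, 0.0), k=1))
```

With the 18-dimensional feature vectors of the previous subsection, the same function applies unchanged, since `math.dist` accepts points of any matching dimension.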

Results
The evaluation of the acoustic detection of logging activity presented in Section 3 was performed based on the experimental setup presented in Section 4. For all experiments, a common protocol was followed; in particular, the audio data were split using 10-fold cross-validation to prevent overlap between the training and the test data. The performance of the tested machine learning methods for binary classification, i.e., for detection of logging sound activity, was measured in terms of accuracy for different levels of SNR. The experimental results are depicted in Figure 3.
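The 10-fold protocol can be sketched as an index split in which every sample is used for testing exactly once and never appears in the training set of its own fold. This is a generic illustration; whether the original split was made at the level of frames or of whole recordings is not stated, so the sample-level shuffle below is an assumption, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold f takes every n_folds-th shuffled index, starting at position f.
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, test


def accuracy(predicted, reference):
    """Fraction of labels that match the reference."""
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)


splits = list(k_fold_indices(25, n_folds=10))
for train, test in splits:
    assert not set(train) & set(test)  # no overlap within a fold
print(len(splits), sorted(len(test) for _, test in splits))
```

The reported accuracy for one SNR level would then be the average of `accuracy` over the ten test folds, with a model retrained on each training partition.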

In Figure 3, we observe that the classification algorithm with the highest performance across all the evaluated SNR levels, from −6 dB to 20 dB, is the support vector machine algorithm. Specifically, the support vector machine reported a classification accuracy of 81.65% for an SNR of 0 dB, 84.32% for an SNR of 6 dB, 88.11% for an SNR of 12 dB, and 89.45% for an SNR of 16 dB, while for the least noisy condition, with SNR = 20 dB, the accuracy was equal to 91.07%, and it dropped to 77.04% when the noise increased to SNR = −6 dB. In general, the two discriminative algorithms, namely the SVM and the MLP neural network, achieved the highest classification accuracy for almost all levels of SNR. From the results, we see that the accuracy of the MLP was approximately 3% lower than that of the support vector machine, followed by the J48 (i.e., C4.5 decision tree), which had a classification accuracy of 80.75% and 86.02% for SNR levels of 0 dB and 20 dB, respectively. We observed that the IBk algorithm and the Bayes network algorithm did not achieve competitive performance.
On the basis of the results, it is worth noting that in very noisy conditions, such as when the SNR level is 0 dB or −6 dB, the C4.5 decision tree method performs well and is nearly as effective as the support vector machine method. This behavior is in agreement with [28], in which the J48 was also observed to have good performance. However, in the present evaluation, the support vector machine method outperformed all other evaluated machine learning methods regardless of the SNR level. This points out the advantage that support vector machines can offer in forest environments, where non-stationary interfering noises are widespread. In addition to forest sounds and noises, low signal-to-noise ratios are expected during audio acquisition when the sound source (the wood logging sounds in our case) is not close to the microphone sensors of the monitoring stations set up in the forest.
In the next step, we applied a postprocessing sliding window filter to the recognized labels of each frame in order to reduce or remove sporadic erroneous labeling of audio frames, caused, for example, by a momentary burst of interference, and thus improve the classification performance. More specifically, during the postprocessing step, we applied a decision-smoothing rule to each frame $v_i$, i.e., when the $k$ preceding and the $k$ successive audio frames were classified to one class (either wood logging sound or not), the current frame was also (re)labeled as of this sound class. The length, $L$, of the smoothing window was subject to investigation and, in the general case, was set equal to $L = 2k + 1$. The case $L = 1$ corresponded to the baseline setup, i.e., without any postprocessing of the classified labels. In Figure 4, the effect of the smoothing window on the wood logging sound classification performance for the best performing algorithm (i.e., the support vector machine) and for several SNR values is shown in percentages.
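The decision-smoothing rule described above can be sketched in a few lines. The rule itself follows the text; two details are assumptions of this sketch: the first and last k frames, which lack a full neighborhood, are left unchanged, and every decision is taken on the original labels rather than on labels already smoothed. The function name `smooth_labels` is hypothetical.

```python
def smooth_labels(labels, k=1):
    """Relabel frame i when its k preceding and k following frames all agree.

    The window length is L = 2 * k + 1; k = 0 (L = 1) leaves labels unchanged.
    """
    labels = list(labels)
    smoothed = list(labels)
    if k == 0:
        return smoothed
    for i in range(k, len(labels) - k):
        neighbors = labels[i - k:i] + labels[i + 1:i + k + 1]
        if all(n == neighbors[0] for n in neighbors):
            smoothed[i] = neighbors[0]
    return smoothed


# 1 = logging sound, 0 = other sound; frames 2 and 7 are isolated outliers.
raw = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(smooth_labels(raw, k=1))  # L = 3 -> [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Larger windows demand agreement among more neighbors, so they correct fewer frames near genuine class boundaries; this is one possible reason why a short window can work best in practice.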
Appl.Sci.2020, 10, x FOR PEER REVIEW 8 of 12 was also observed to have a good performance.However, regarding the present evaluation, the support vector machine method outperformed all other evaluated machine learning methods regardless of the SNR level.This points out the advantage that support vector machines can offer in forest environments, where the presence of non-stationary interfering noises from the forest environment are widespread.In addition to forest sounds and noises, low levels of signal-to-noise ratio are expected during the acquisition of the audio when the sound source (the wood logging sounds in our case) is not close to the microphone sensors of the monitoring stations set up in the forest.
In the next step, we applied a postprocessing sliding window filter to the recognized labels of each frame in order to reduce or remove sporadic erroneous labeling of audio frames, caused, for example, by a momentary burst of interference, and thus improve the classification performance. More specifically, during the postprocessing step, we applied a decision-smoothing rule to each frame, i.e., when the preceding and the successive audio frames were classified to one class (either wood logging sound or not), then the current frame was also (re)labeled as of this sound class. The length, L, of the smoothing window was subject to investigation and, in the general case, was set equal to L = 2 ⋅ k + 1, with k a non-negative integer. The case L = 1 corresponded to the baseline setup, i.e., without any postprocessing of the classified labels. In Figure 4, the effect of the smoothing window on the wood logging sound classification performance for the best performing algorithm (i.e., the support vector machines) and for several SNR values is shown in percentages.
As we can see in Figure 4, the impact of the postprocessing step is significant for all the signal-to-noise ratios and is especially helpful in very noisy environments, i.e., at low signal-to-noise ratio values. More specifically, a window length equal to three offers the best performance across all the evaluated signal-to-noise ratio values. With L = 3, the classification accuracy improved by almost 1% in absolute terms for all signal-to-noise ratio values. In the case of a noisy environment (i.e., for an SNR value of −6 dB), the performance improvement was up to 2% as compared with the case in which no postprocessing was applied (L = 1).
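The decision-smoothing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `smooth_labels` is hypothetical, the rule for windows longer than three (relabel only when all neighbors inside the window agree) is one possible reading of the text, and decisions are taken on the original labels rather than on already-smoothed ones.

```python
def smooth_labels(labels, window_length=3):
    """Relabel a frame when all its neighbours in the window share one class.

    `labels` is a sequence of per-frame class labels (e.g. 1 = logging,
    0 = not logging). `window_length` must be odd; 1 leaves labels unchanged.
    Hypothetical helper, not the authors' implementation.
    """
    if window_length < 1 or window_length % 2 == 0:
        raise ValueError("window_length must be a positive odd integer")
    labels = list(labels)
    half = window_length // 2          # frames considered on each side
    smoothed = list(labels)            # first/last `half` frames stay as-is
    for i in range(half, len(labels) - half):
        # Preceding and successive frames, excluding the current one.
        neighbours = labels[i - half:i] + labels[i + 1:i + half + 1]
        if neighbours and all(n == neighbours[0] for n in neighbours):
            smoothed[i] = neighbours[0]
    return smoothed
```

For example, with a window length of three, an isolated "not logging" frame between two "logging" frames is relabeled as "logging", while frames at a genuine class boundary are left untouched.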
Late-stage fusion of classifiers with postprocessing of the corresponding results was also evaluated. Specifically, there can be logging sound events that are correctly detected by one classifier but not by others. In such cases, the best performing support vector machine classifier may misrecognize a logging sound event that is correctly recognized by another classification algorithm, so the fusion of their recognition outputs can potentially improve performance. To evaluate this, we applied late fusion to the recognized and postprocessed (as described for Figure 4) outputs of the three top-performing classification methods, namely the support vector machines, the MLP neural network, and the C4.5 decision tree (J48). The late fusion logging sound recognition results, after postprocessing, are illustrated in Figure 5.
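As an illustration of decision-level fusion, the sketch below combines per-frame labels from several classifiers by majority vote. The text does not state which fusion rule was used, so majority voting is an assumption here, and the function name `late_fusion_majority` is hypothetical.

```python
from collections import Counter


def late_fusion_majority(*classifier_outputs):
    """Fuse per-frame labels from several classifiers by majority vote.

    Each argument is a sequence of postprocessed frame labels from one
    classifier. Majority voting is an assumed fusion rule for illustration.
    """
    if not classifier_outputs:
        raise ValueError("at least one classifier output is required")
    n_frames = len(classifier_outputs[0])
    if any(len(out) != n_frames for out in classifier_outputs):
        raise ValueError("all classifier outputs must have the same length")
    fused = []
    for frame_votes in zip(*classifier_outputs):
        # Most frequent label among the classifiers for this frame.
        fused.append(Counter(frame_votes).most_common(1)[0][0])
    return fused
```

With three classifiers and two classes, a vote can never tie, so each fused label is well defined. A frame that one classifier misses is still recovered whenever the other two agree on it.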
As can be seen in Figure 5, the late fusion of the postprocessed recognition outputs of the three classifiers resulted in further improvement of the logging sound detection accuracy. In particular, the accuracy for SNR equal to 20 dB increased by almost 2% to 94.42% as compared with the postprocessed accuracy results of SVM. For the noisy conditions of SNR equal to −6 dB and 0 dB, the improvement, when using late fusion of the postprocessed outputs of the three classifiers, was slightly higher than 2%, resulting in an accuracy equal to 81.88% and 85.03%, respectively. The improvement in classification accuracy from the late fusion of postprocessed results of the three classifiers indicates the complementary information carried by the outcomes of the different classification algorithms that were evaluated, despite the overall superior accuracy of the support vector machines.

Conclusions
In this article, a framework for automatic detection of logging activity in forests using audio recordings was presented. The framework uses monitoring stations installed in the forest to record audio with microphones; the acquired audio samples are then processed and automatically classified into logging or non-logging sounds. Five classification algorithms were tested, using well-known and widely used audio descriptors during the feature extraction step, with the evaluation focusing on the identification of chainsaw sounds during logging in forests. In the experimental study, the best performance was achieved by the support vector machine method. The experimental evaluation involved additive noise, and the framework was evaluated at different signal-to-noise ratio (SNR) values. The results demonstrated the robustness of the wood logging identifier in noisy environments, such as real forests. Furthermore, postprocessing on the decision level was applied per audio frame, providing an improvement in performance of more than 1%, mainly in cases of low SNR. In addition, we evaluated a late-stage fusion method, combining the recognition results of the three top-performing classifiers, and the experimental results showed a further improvement of approximately 2%, in absolute terms, with logging sound recognition accuracy reaching 94.42% when the SNR was 20 dB.
We believe that the presented framework can contribute an affordable solution to the development of forest monitoring systems that help to reduce illegal deforestation, protect biodiversity, and preserve the sustainability of the environment.

Figure 1. The overall block diagram of the concept of the logging detection system.

Figure 2. Block diagram of the audio processing and logging sound classification.

Figure 3. The accuracy (in percentages) of the acoustic wood logging classification for various signal-to-noise ratios and different classification algorithms.

Figure 4. The classification accuracy (in percentages) of the acoustic wood logging utilizing the postprocessing for the best performing support vector machine (SVM) classifier.

Figure 5. The classification accuracy (in percentages) of the acoustic wood logging using late fusion of the postprocessed outputs of the three top-performing classifiers (support vector machine (SVM), MLP neural network, and C4.5 decision tree (J48)).