In-Field Detection of American Foulbrood (AFB) by Electric Nose Using Classical Classification Techniques and Sequential Neural Networks

American foulbrood is a dangerous bee disease that attacks the sealed brood and quickly leads to the death of bee colonies, so efficient diagnosis is essential. Because rotting larvae produce specific odours, we investigated whether an electronic nose can distinguish colonies affected by American foulbrood from healthy ones. The experiment was conducted in an apiary with 18 bee colonies, 9 of which showed symptoms of the disease confirmed by laboratory diagnostics. Three units of the Beesensor V.2 device, based on an array of six semiconductor TGS gas sensors manufactured by Figaro, were tested. Each unit was tested in all bee colonies, sick and healthy. The measurement session per bee colony lasted 40 min and yielded results from four 10 min measurements. Each 10 min measurement consisted of a 5 min regeneration phase and a 5 min object-measurement phase. We used classical classification methods based on extracted representative data fragments: k-nearest neighbour, Naive Bayes, Support Vector Machine, discretized logistic regression, random forests, and a committee of classifiers. We also used a method based on the entire 600 s series, namely sequential neural networks. We considered six options for data preparation when transforming the data series into representative results; among others, we used single stabilised sensor readings as well as average values from stable areas. To verify the quality of the classical classifiers, we used the 25-fold train-and-test method. The effectiveness of the tested methods reached 75 per cent at best, with stable results between 65 and 70 per cent. To further confirm that the classes can be separated with an artificial nose, we used visualisations of the classes. The experiments show that the tested artificial nose has practical potential.
Our experiments show that training a sequential network on the full data sequence gives results comparable to the best classical methods based on discrete data samples. The results of the experiment also showed that the Beesensor V.2, along with properly selected classification techniques, can become a tool to facilitate rapid diagnosis of American foulbrood under field conditions.


Introduction
American foulbrood is a dangerous disease of the honeybee caused by the bacterium Paenibacillus larvae larvae (white). This bacterium produces highly resistant spores [1] and can therefore survive for decades [2]. The spores can be carried by bees with food and by the beekeeper on beekeeping equipment. In this way, the disease spreads rapidly first in the apiary from colony to colony and then from apiary to apiary.
The disease is widespread worldwide [3] and has been recorded in apiaries on five continents [4]. Monitoring studies of honey contamination with P.l. larvae spores conducted in Poland [5] have shown that there are regions where the risk of American foulbrood symptoms is high. An example is the Warmia and Mazury Province, where the presence of sporocarps was confirmed in almost 50% of the examined apiaries, 20% of which showed a high degree of contamination [5]. The results of this study translate into the current epizootic situation in the area (results of observations during field work by a co-author of the project, a veterinary surgeon specialising in bee diseases).
American foulbrood manifests itself in the dying and rotting of bee brood underneath the sealing caps (Figure 1). Consequently, this disease leads to the weakening of a strong colony and its death [6]. Beekeepers have difficulty perceiving and recognising the early symptoms of American foulbrood. As a result, they notice that something is wrong with their bee colonies only when it is too late for rescue. Confusing the symptoms of American foulbrood with symptoms of other diseases of sealed brood is also a problem. To confirm the presence of the disease in the apiary, laboratory diagnostics are necessary. This involves taking bee and brood samples for testing, protecting them properly, sending them to a laboratory, and waiting a long time for the results. Such tests are often expensive, especially when the apiary is large and the number of samples sent for laboratory tests is correspondingly high. Other tools are therefore being sought for rapid and cheap diagnosis of American foulbrood, which would alert the beekeeper to the problem at an early stage of the disease under field conditions. Our study took advantage of the fact that rotting brood produces a mixture of valeric, isocaproic, and caproic acids [7]. This causes a colony suffering from American foulbrood to emit a specific odour, which can even be detected by the human nose. It therefore made sense to use an electronic nose to detect American foulbrood. The effectiveness of electronic nose detection has been demonstrated scientifically many times. These devices have found applications in the diagnosis of diseases in plants [8], animals [9], and humans [10]. In the case of bee diseases, an array of six solid-state gas sensors can distinguish bee colonies heavily infected with the Varroa destructor parasite from healthy colonies [11] and can diagnose Varroa by examining brood samples infected with this mite [12].
The effectiveness in detecting American foulbrood in bee colonies by the multi-sensor device in preliminary studies was confirmed by a team of scientists from Australia [13].
This study tested the effectiveness of the Beesensor V.2 multi-sensor recorder in detecting American foulbrood in bee colonies, in order to answer the question of whether this electronic nose could become an effective tool in diagnosing American foulbrood.

Multi-Sensor Recorder Beesensor V.2
In the experiment, three units of Beesensor V.2 (BEECOM 1, BEECOM 2, and BEECOM 3) constructed at the Wrocław University of Technology in the Laboratory of Sensor Techniques and Indoor Air Quality Research were tested (Figure 2). Beesensor V.2 is an advanced multi-sensor signal recorder based on the semiconductor gas sensors TGS823, TGS826, TGS832, TGS2600, TGS2602, and TGS2603 from Figaro. The sensing layer of the Taguchi Gas Sensors (TGS) used in the experiment was SnO2. This metal oxide semiconductor (MOS) is more sensitive than others and can operate at temperatures as low as 300 degrees Celsius. The sensitivity and selectivity of TGS sensors are achieved by enriching SnO2 with various chemical elements, including noble metals (Pt, Pd, and Ru) or rare earth metals (Y). However, the exact composition and proportions of the added elements in individual TGS sensor models are a trade secret of the company. Each sensor reacts to the presence of different substances (Table 1). The device has two inputs (regeneration path and measuring path) and one gas outlet, all provided with quick plug connectors. Beesensor V.2 is built from several modules. The pneumatic module is responsible for gas sampling and sample preparation. The core of this module is a gas pump, which supplies gas samples to the sensor block through Teflon tubes. A carbon filter is placed in the reference path, and a dust filter made of mixed cellulose ester (MCE, 13 mm, 0.45 microns) is placed in the measurement path.
The user interface is the communication module, which is made up of the following elements:
- Main switch,
- LCD display,
- Alphanumeric keypad,
- Four rocker switches controlling GPS, USB, GSM, and sound,
- Optical controls (LEDs),
- Loudspeaker,
- USB sockets.
Furthermore, the device includes a microcomputer, microcontroller, GSM and GPS communication modules, and an internal clock. Beesensor V.2 is equipped with a standalone power module consisting of a battery with a capacity of 22 Ah and a voltage of 12 V, so it can be used independently of the access to main power.
Beesensor V.2 has software modules for data acquisition, processing, and transmission. Furthermore, this multi-sensor recorder is fully controllable and programmable. It can identify the class of tested samples by means of a built-in classifier, whose feature vector is constructed by selecting, for each sensor independently, a moment from the time series of measurement data. The classifier can be trained by pointing to files containing reference data for a given class from measurements previously performed under controlled conditions. Based on the indicated feature vector and the previously built set of benchmarks, the classifier calculates the class and reports it to the user.

Experimental Scheme
The study was conducted in September 2020 in an apiary located in the Warmińsko-Mazurskie province (Poland). This apiary consisted of 18 bee colonies (Apis mellifera carnica), of which 9 colonies showed symptoms of American foulbrood. Official laboratory tests on samples of the brood taken from the affected colonies confirmed infection with P.l. larvae. Bee colonies were housed in Warsaw beehives. The average strength of sick colonies was 7.3 combs, while healthy colonies occupied an average of 10.3 combs. The bee colonies were classified according to their health status. Thus, two classes were distinguished:
• Class 0: colonies suffering from American foulbrood (visible clinical symptoms; the disease was confirmed by laboratory diagnostics), 9 objects,
• Class 1: healthy colonies, 9 objects.
In the experiment, three twin units of Beesensor V.2, called BEECOM 1, BEECOM 2, and BEECOM 3, were used. Each unit was warmed up for at least 12 h before measurements were taken. Eighteen measurement sessions were performed. Each unit of the device measured a different colony in a single measurement session, and in subsequent sessions, it alternately measured a healthy and a sick colony (Figure 3). The probe tip was placed in the middle of the bee nest, between two brood combs. The object of measurement was a gas sample from this inter-comb space. The research lasted 3 days. The main assumption of the measurements was to examine each bee colony with each device under similar conditions: the same apiary, hive type, and management method. Unfortunately, in the fifth session, the BEECOM 3 device crashed, so it examined only 17 bee colonies: 9 sick and 8 healthy (Table 2).
The single measurement procedure was already implemented in each device. The device parameters for the measurements performed are shown in Table 3. A single measurement session per bee colony lasted 40 min and consisted of four individual measurements. For simplicity in data handling, the measurements were named "beex.y", where "bee" is the abbreviation for BEECOM, "x" is the number of the Beesensor V.2 unit, and "y" is the number of the measurement in the measurement session, with x ∈ {1, 2, 3} and y ∈ {0, 1, 2, 3}. Each measurement lasted 10 min, of which 5 min were the regeneration (cleaning) phase and 5 min were the object-measurement (scent saturation) phase (Table 4). During the regeneration phase, the devices sampled clean air outside the hive, and during the measurement phase, gas samples were collected from the centre of the bee nest, from between two brood patches. Measurements covering the first 20 min of the measurement session (y = 0 and y = 1) were excluded from the analysis, as they gave unstable readings.

Data Processing
One reading of the sensor is understood as the average voltage recorded by the sensor during 1 s. One measurement session for a bee colony yielded four raw measurement files. One file from a given device consists of 300 s of cleaning and 300 s of scent saturation. Both phases have their stabilisation intervals: in the cleaning phase, the minimum value was reached, and in the saturation phase, the maximum value was reached. An example of a reading in the cleaning and saturation phases can be seen in Figure 4.

The Experimental Part Design
In the field tests, we had access to nine bee colonies of each class. The differences in the number of objects per class across devices and measurements were due to some difficulties encountered during the tests. In class 0, we considered colonies with numbers: 1, 2, 3, 6, 13A, 13B, 19, 20, and 44. In class 1, we considered colonies with numbers: 5, 9, 10, 11, 11B, 12, 14, 15, and 16. The numbers of colonies used for each test are given in Table 5.

Device      Measurement   Class 0   Class 1
BEECOM 2    bee2.1        9         9
BEECOM 2    bee2.2        9         9
BEECOM 2    bee2.3        9         9
BEECOM 3    bee3.1        9         9
BEECOM 3    bee3.2        9         9
BEECOM 3    bee3.3        9         8

The starting point for investigating classification possibilities is to visualise the average readings of the individual TGS sensors. We visualised the data before and after the baseline correction. We used the Multiple Train and Test method (Monte Carlo Cross Validation technique [14,15]) to verify the quality of the classification models. We performed 25 tests, in which each split was applied to all classifiers simultaneously. To assess the quality of the results, we used balanced accuracy [16], which is the average of the classification accuracies of the individual classes. As an auxiliary parameter, we used the true-positive rate (TPR), i.e., the percentage of hits within a class.
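The two quality parameters named above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study (which relied on R packages); the function names and the example labels are hypothetical.

```python
def true_positive_rate(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    if total == 0:
        raise ValueError(f"class {cls!r} is absent from y_true")
    return hits / total


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class true-positive rates."""
    classes = sorted(set(y_true))
    return sum(true_positive_rate(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical test split: 4 sick colonies (class 0) and 3 healthy ones (class 1).
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0, 0]

tpr_sick = true_positive_rate(y_true, y_pred, 0)     # 3 of 4 sick colonies found
tpr_healthy = true_positive_rate(y_true, y_pred, 1)  # 1 of 3 healthy colonies found
score = balanced_accuracy(y_true, y_pred)            # mean of the two rates
```

Because each class contributes equally to the mean, a classifier that labels every colony as sick scores only 0.5 here, regardless of how many sick colonies the test split contains.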
In the experimental part, we test the effectiveness of classical classification techniques (applied on extracted data with a selected strategy) and a technique based on the analysis of data sequences using neural networks. Let us start with our approach in the context of the use of classical techniques.
All classification methods used in the study are sourced from the R language [25] packages. They were used with default settings. The classification committee, i.e., the com3 method, was designed based on the 1nn, svm_linear, and lg methods (see the scheme in Figure 8). The general scheme of the research carried out is shown in Figure 9.
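A committee of three classifiers can be illustrated with a simple majority vote. The sketch below is written in Python and assumes majority voting; the exact combination rule of com3 is defined in Figure 8 and is not reproduced here. The three member functions are hypothetical stand-ins, not the actual 1nn, svm_linear, and lg models.

```python
from collections import Counter


def committee_vote(predictions):
    """Return the label chosen by most committee members."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label


def committee_predict(members, sample):
    """Ask every member for a label and return the majority decision."""
    return committee_vote([member(sample) for member in members])


# Hypothetical stand-ins for trained models; each maps a feature vector to 0 or 1.
members = [
    lambda x: 0 if x[0] > 1.0 else 1,    # stand-in for 1nn
    lambda x: 0 if sum(x) > 2.5 else 1,  # stand-in for svm_linear
    lambda x: 0 if x[1] > 2.0 else 1,    # stand-in for lg
]

decision = committee_predict(members, [1.2, 1.0])  # votes are 0, 1, 1
```

With three members and two classes, a tie is impossible, which is one practical reason to use an odd number of classifiers.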

Variants of Data Preprocessing
Assuming j is the bee colony number, i the TGS sensor number, i ∈ {823, 826, 832, 2600, 2602, 2603}, and s the measurement second number, we defined the following variants of the test methodology.

VARIANT1:
In this variant, we applied the following transformation.

VARIANT2:
In this variant, we applied the following transformation, which we call the baseline correction.

VARIANT3:
This time we used maximum readings in the scent saturation phase.

VARIANT4:
In the fourth variant, we had a baseline correction based on subtracting from the maximum reading of the saturation phase the minimum reading of the cleaning phase.

VARIANT5:
In this variant, we combined the attributes from variants 1 and 2. The result was a system with twelve conditional attributes.

VARIANT6:
In the final variant, we combined the conditional attributes from variants 3 and 4. The result was an analogous extended system as in variant 5.
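Variants 3, 4, and 6 are described fully in words, so they can be sketched directly. The Python example below assumes that one measurement is stored as a dictionary mapping each sensor name to 600 one-second readings, with the first 300 s being the cleaning phase; this data layout and the synthetic readings are assumptions made for illustration only. Variants 1, 2, and 5 are not sketched, because their transformations are not recoverable from the text.

```python
SENSORS = ("TGS823", "TGS826", "TGS832", "TGS2600", "TGS2602", "TGS2603")
CLEANING_SECONDS = 300  # first 300 s: cleaning; remaining 300 s: scent saturation


def split_phases(series):
    """Split one 600-reading series into (cleaning, saturation) parts."""
    if len(series) != 2 * CLEANING_SECONDS:
        raise ValueError("expected 600 one-second readings")
    return series[:CLEANING_SECONDS], series[CLEANING_SECONDS:]


def variant3(measurement):
    """Maximum saturation-phase reading for each sensor."""
    return [max(split_phases(measurement[s])[1]) for s in SENSORS]


def variant4(measurement):
    """Saturation maximum minus cleaning minimum for each sensor."""
    features = []
    for s in SENSORS:
        cleaning, saturation = split_phases(measurement[s])
        features.append(max(saturation) - min(cleaning))
    return features


def variant6(measurement):
    """Attributes of variants 3 and 4 combined (twelve attributes)."""
    return variant3(measurement) + variant4(measurement)


# Synthetic readings: a falling cleaning phase followed by a rising saturation phase.
measurement = {
    name: [1.0 - 0.001 * t for t in range(300)]
    + [0.7 + 0.1 * k + 0.002 * t for t in range(300)]
    for k, name in enumerate(SENSORS)
}

v3 = variant3(measurement)
v4 = variant4(measurement)
v6 = variant6(measurement)
```

Subtracting the cleaning-phase minimum removes a per-measurement offset from each sensor, which is the intent of the baseline correction in variant 4.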

Results of the Experiments for Classical Techniques

Analysis of the Results for the 25 Times Train and Test Method
First, we used six baseline research methodologies-see Section 3.1.1-and 9 reference classification techniques.
To evaluate the models, we split the entire sample set 25 times into training and test systems, using a split ratio of 0.6. This solution was chosen because the tested decision system contains fewer than 20 samples and classical cross-validation cannot be applied. During testing, the selected classifiers were applied to the same data splits, and the results were averaged. To measure the quality of classification, we used the balanced accuracy together with the sensitivity and specificity of the tests, i.e., the percentage effectiveness in indicating the class of diseased samples and the class of healthy samples. An example of the detailed result for the best variant can be seen in Tables 6-11. It is worth noting that the third and fourth measurements from devices BEECOM 1, BEECOM 2, and BEECOM 3, marked as bee1.2, bee1.3, bee2.2, bee2.3, bee3.2, and bee3.3, were selected as reference measurements because they showed stabilised readings compared to the first and second measurements. Let us move on to a summary of the experimental part.
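The repeated train-and-test procedure can be sketched as follows. This Python example is an illustration under stated assumptions: the splits are drawn at random without stratification (the text does not say whether stratification was used), the seed is arbitrary, the 1-nearest-neighbour classifier is a hand-written stand-in for the R implementation, and the data are synthetic.

```python
import random


def monte_carlo_splits(n_samples, n_splits=25, train_ratio=0.6, seed=0):
    """Draw `n_splits` random train/test index partitions."""
    rng = random.Random(seed)
    n_train = round(train_ratio * n_samples)
    splits = []
    for _ in range(n_splits):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        splits.append((sorted(shuffled[:n_train]), sorted(shuffled[n_train:])))
    return splits


def balanced_accuracy(y_true, y_pred):
    """Mean per-class hit rate over the classes present in y_true."""
    rates = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        rates.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    return sum(rates) / len(rates)


def fit_1nn(train_x, train_y):
    """Return a predictor that copies the label of the nearest training sample."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict


def evaluate(classifiers, samples, labels, splits):
    """Average balanced accuracy of each classifier over the same splits."""
    totals = {name: 0.0 for name in classifiers}
    for train_idx, test_idx in splits:
        train_x = [samples[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        test_y = [labels[i] for i in test_idx]
        for name, fit in classifiers.items():
            predict = fit(train_x, train_y)
            predictions = [predict(samples[i]) for i in test_idx]
            totals[name] += balanced_accuracy(test_y, predictions)
    return {name: total / len(splits) for name, total in totals.items()}


# Synthetic, well-separated data: 9 objects of class 0 and 9 of class 1.
samples = [[1.0 + 0.01 * i] for i in range(9)] + [[2.0 + 0.01 * i] for i in range(9)]
labels = [0] * 9 + [1] * 9

splits = monte_carlo_splits(len(samples))
results = evaluate({"1nn": fit_1nn}, samples, labels, splits)
```

Generating the splits once and passing the same list to every classifier mirrors the statement that the selected classifiers were applied to the same data splits.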

Summary of Results
Tables 12-14 summarise the results for the tested classification techniques and the test methodologies v1 to v6. The results are given for three balanced-accuracy thresholds: 0.6, 0.65, and 0.7. The best result obtained was 0.748 for the svm_linear method and variant v4. Variant v4 won slightly, with variant v2 second best. The best classification method was the svm_linear technique, which performed stably in all tested variants. The second best was com3 (i.e., a committee of the three classifiers 1nn, svm_linear, and lg). Given these results and the small number of samples, we decided to perform an additional visualisation of the mean class values in the best variant so far, variant 4.

Description of the Input Data
The input data are a set of samples containing multivariate values of variables over time; for each sample, we assigned one of two decisions that classify this sample appropriately. Each sample contains 600 s of time-varying data and contains six attributes-values from the following sensors: TGS823, TGS826, TGS832, TGS2600, TGS2602, and TGS2603.
The input data are organised as follows: we had three sets of measurement data, bee1, bee2, and bee3, coming from different units of the device: BEECOM 1, BEECOM 2, and BEECOM 3, respectively. Each data set contains three consecutive data subsets from successive measurements that were collected at short intervals, as required by the measurement procedure. Because the level of odours needs time to stabilise, and because we wanted data disturbed as little as possible by the initial conditions, the first data subgroup, called "pack1", was rejected; the model training and prediction process used data from the three units of the device (bee1, bee2, and bee3) and from two measurements (measurement2 and measurement3). For each 600 s data sequence, we had one decision assigned, which clearly determined whether the sample belonged to a healthy bee colony or to a diseased one. The decision data came from experts.
To create a data model that is useful in the diagnosis or prognosis of American foulbrood, a sequential neural network was proposed, composed of many layers of appropriately selected neurones and processing units. For neural networks intended to analyse long data sequences, architectures are used that show high resistance to vanishing or exploding gradient problems. Examples of such networks are architectures that use LSTM or GRU cells. The presented architecture uses Gated Recurrent Units (GRU) because of their simpler structure compared to LSTM units, their similar performance, and their ability to learn long data sequences. The structure of the data used and the visualisation of our model can be seen in Figures 10 and 11.
The proposed neural network architecture includes five layers. The first two layers are GRU layers, with 150 units in the first layer and 50 in the second. Then, there is a dropout layer with a coefficient of 0.5, whose task is to ensure appropriate network regularisation and thus to minimise the possibility of overfitting the network. The next layer is a layer of 10 fully connected units, and the last is the decision layer containing two output neurones. Because the network is to be used for the classification of the input samples, the softmax activation function was used in the last layer, which provides an indication of the decision class with a certain probability.
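To make the recurrent part of such a network concrete, the following Python sketch implements one common formulation of a GRU update step and runs it over a short sequence, followed by a two-class softmax. It is a simplified illustration, not the model from the study: it uses a single small GRU layer with random untrained weights, omits the second GRU layer, the dropout, and the 10-unit dense layer, and all sizes and names are hypothetical. Gate conventions differ slightly between libraries; here the update gate z weights the candidate state.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Compute W x + U h + b for every hidden unit."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def gru_step(x, h, p):
    """One GRU update: gates z and r, candidate state, and new hidden state."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for the three gate blocks."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in "zrh":
        params["W" + gate] = mat(n_hidden, n_in)
        params["U" + gate] = mat(n_hidden, n_hidden)
        params["b" + gate] = [0.0] * n_hidden
    return params


def run_gru(sequence, params, n_hidden):
    """Feed the sequence step by step and return the final hidden state."""
    h = [0.0] * n_hidden
    for x in sequence:
        h = gru_step(x, h, params)
    return h


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
n_features, n_hidden = 6, 4  # six sensors; hidden size kept tiny for illustration
params = init_params(n_features, n_hidden, rng)
sequence = [[rng.random() for _ in range(n_features)] for _ in range(20)]

h_final = run_gru(sequence, params, n_hidden)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(2)]
probs = softmax([sum(w * h for w, h in zip(row, h_final)) for row in W_out])
```

The new hidden state is a gated blend of the previous state and the candidate state, which is what lets the unit carry information across many time steps without the gradient collapsing as quickly as in a plain recurrent unit.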

Results for Sequential Network
The input data were divided into training and test data in the proportions of 90% and 10%, respectively.
The training data included 600 s of six-element vectors in one training iteration (batch_size = 600, features = 6), and the number of training iterations was set to 200 (epochs). After training, the model was validated using the test data. The network has two outputs, each of which indicates membership of a given class (healthy or sick). For training the network, 117 records were used, while 11 were used for testing. In total, 421,200 values (117 records × 600 s × 6 sensors) were used for training the network and 39,600 values (11 × 600 × 6) for testing. The model showed an accuracy of 72.72% on the test samples; for details, see Table 15.

In this section, we illustrate the visual differences between classes 0 and 1. The visualisations presented support the claim that the artificial nose can distinguish between sick and healthy bee colonies. Visualisation is an important support because the number of samples we had did not allow us to verify this thesis statistically with certainty. We considered two options for visualisation. In the first option, see Equation (2), we presented the squared mean sensor readings of the decision classes. In the second option, see Equation (3), we used the logarithms of the mean readings and squared them.

In the first option, we used:

$$\left(\mathrm{average}\left(TGS_i^{\mathrm{class}\,j}\right)\right)^2 \quad (2)$$

In the second option, we used:

$$\left(\log\left(\mathrm{average}\left(TGS_i^{\mathrm{class}\,j}\right)\right)\right)^2 \quad (3)$$
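The two visualisation options can be sketched in Python as follows. The sketch assumes that each colony of a class is represented by one value per sensor and that "log" denotes the natural logarithm; both are assumptions, as are the function name and the example values.

```python
import math


def class_fingerprint(samples, use_log=False):
    """Per-sensor class mean, optionally log-transformed, then squared.

    `samples` holds one feature vector (one value per sensor) per colony
    of a single class. With use_log=True, all means must be positive.
    """
    n_sensors = len(samples[0])
    means = [sum(s[i] for s in samples) / len(samples) for i in range(n_sensors)]
    if use_log:
        means = [math.log(m) for m in means]  # natural logarithm (assumed)
    return [m ** 2 for m in means]


# Hypothetical class with two colonies and two sensors.
class_samples = [[1.0, 2.0], [3.0, 4.0]]

squared_means = class_fingerprint(class_samples)                    # first option
squared_log_means = class_fingerprint(class_samples, use_log=True)  # second option
```

Computing one such vector per class and per device gives the values that can then be plotted side by side to compare classes 0 and 1.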

Discussion
Electronic noses based on metal oxide semiconductors (MOS) have been used with good detection results for many human diseases, such as urinary tract infections [26], cancer [27], diabetes [28], and bowel diseases [29,30]. They also work well in detecting animal diseases. Fend et al. (2005) [31] successfully used an electronic nose to diagnose Mycobacterium bovis infection in badgers and cattle. Devices based on semiconductor sensors have also been successful in veterinary diagnostics of acute liver failure in rats [32], cutaneous myiasis in sheep [33], and white-nose syndrome in cave-dwelling bats [9].
Our team focused on the ability to detect the most dangerous honeybee diseases, such as varroosis and American foulbrood. The Figaro six-sensor TGS device we used has so far given satisfactory results in Varroa diagnostics, both in the laboratory [12] and under field conditions [11]. We also successfully detected P.l. larvae colonies on MYPGP media [34]. In this article, we presented the results of using a Beesensor V.2 device based on an array of the same six sensors to detect American foulbrood under field conditions in live bee colonies. Diagnostic studies of American foulbrood under field conditions have not been conducted by anyone until now, so this is completely innovative research. Moran et al. (2019) [13] carried out a preliminary identification of volatile compounds (VCs) by gas chromatography mass spectrometry (GC-MS). These compounds were to become volatile biomarkers of AFB and to be used in the future as indicators for a diagnostic electronic nose. However, further results of the study were not presented. It is also not known on which sensors the device was to be based.
The analysis of the graphs showing the time course of the individual readings has shown that the readings from the first two measurements are unstable in many cases. In particular, the first measurement, marked as 0, stands out from the next three measurements. This is because, at the beginning of sampling, the gas drawn from the bee colony was only beginning to saturate with odour. Only the third and fourth measurements were stable. Thus, we can conclude that to diagnose American foulbrood, Beesensor V.2 cannot work on a single colony for less than half an hour. It must be remembered that these are field conditions. The situation is different in the case of laboratory tests, where the array of the same sensors already gave stable readings in the first and only measurement (10 min of measurement session) [12,34,35].
Nine classification techniques were selected for data analysis. The best method was svm_linear. It gave the highest efficiency of class distinction for samples of BEECOM 1 (75%) (Table 6) and BEECOM 2 (74%) (Table 8). The support vector machine (SVM) is an efficient classifier, which is often used by other data scientists [36]. Good classification results were obtained with this classifier in earlier studies on Varroa diagnosis [11]. For BEECOM 3, the best results were obtained with the lg method, with a classification efficiency of 71% (Table 11). Considering all the analyses for the three units and all the third and fourth measurements, the classification committee based on the three techniques 1nn, lg, and svm_linear proves to be a good classification tool. The performance of the nine classifiers was tested on both raw data and data with baseline correction performed using different techniques. These approaches are described as variants 1-6 (Section 3.1.1). The best results were obtained for variants 4 and 2, i.e., with a baseline correction. Many researchers recommend baseline correction for sensor readings that may be affected by environmental factors [37-40]. The bee colony is an active organism rich in volatile compounds (VCs). Its smell can change particularly in late spring and summer, when the bees are working intensively. We deliberately conducted our experiment in the autumn months and on cold days (13 °C). Bee activity was low then, and there were no additional scent factors, either inside the bee nest (e.g., influx of fresh nectar and swarming) or outside the hive (flowering of intensely scented plants), that could significantly interfere with the sensor readings. Additionally, the timing of the experiment helped eliminate the possibility of triggering robbing in the apiary, which could contribute to the transmission of American foulbrood from sick to healthy colonies.
When testing different classification methods, we could not combine the results of the three units of the same device, so we had to treat the data from each unit individually. This is due to a phenomenon called sensor drift [41]. It is an imperfection of MOS sensors, whereby individual copies of the same sensor, used in the same conditions, do not give identical readings. The visualisation of the squared average readings according to variant 4 showed that the array images of the sensor readings obtained from the three units, BEECOM 1, BEECOM 2, and BEECOM 3, are different (Section 3.4). Therefore, each piece of semiconductor sensor-based equipment requires an individual approach to calibration, e.g., by using a special algorithm [42]. We proceeded differently in the case of the sequential neural network. Here, the results from the three pieces of equipment were finally treated collectively. However, in the learning phase, the classified objects were separated. The sequential neural network gave satisfactory classification results at the level of 73%. This result was fully consistent with the average classification result obtained by the best classifiers for the three Beesensor V.2 units. We must remember that in a bee colony, different volatile compounds (VCs) are released both by elements of the hive environment (honey, pollen, wax, and propolis) and by living organisms: bees and brood. Therefore, we are dealing with a gas that is a mixture of several dozen substances, and we conducted research on various bee diseases in field conditions. For this reason, the device could not be based on a single sensor. A matrix of six sensors was created, each of which reacts to different substances. We analysed the matrix image of the complex gas coming from the tested object. It was therefore concluded that the best method of device calibration would be the use of learning algorithms, so that the best classifiers for a particular device are trainable.
In conclusion, Beesensor V.2, along with appropriate classification techniques, showed great potential as a tool to distinguish bee colonies affected by American foulbrood from healthy ones. Thus, this electronic nose can become an effective tool for diagnosing this extremely dangerous disease in the apiary. Rapid detection of P.l. larvae infection in a single bee colony will allow the beekeeper to react immediately. This will help prevent further spread of the disease to other bee colonies and neighbouring apiaries and can save the apiary from destruction.

1. Beesensor V.2 distinguishes between bee colonies infected with American foulbrood and healthy bee colonies at a level of 73%.

2. During the field tests, the third and fourth of the four measurements prescribed by the measurement procedure implemented in the device proved to be the most stable, regardless of the device used.

3. As the third measurement was already stable, the time of measurement of a single bee colony with Beesensor V.2 could be shortened to 30 min.

4. A baseline correction was required to obtain optimal classification results. Both winning variants, v4 and v2, use it.

5. The svm classifier with a linear kernel (svm_linear method) proved to be the best tool for classification (among classical methods) in the context studied. The second most stable method was the classification committee based on the three techniques: 1nn, lg, and svm_linear.

6. The results of data analysis using sequential neural networks on the entire data series were found to be comparable with the results for the best classical methods, which are based on a baseline correction and an extracted discrete data sample.