A Novel Approach for Biofilm Detection Based on a Convolutional Neural Network

Rhinology studies the anatomy, physiology and diseases of the nasal region. One of the most modern techniques for diagnosing these diseases is nasal cytology, or rhinocytology, which involves analyzing the cells of the nasal mucosa under a microscope and searching for other elements, such as bacteria, that may indicate a pathology. During microscopic observation, bacteria can be detected in the form of biofilm, that is, a bacterial colony surrounded by a protective organic extracellular matrix made of polysaccharides. In nasal cytology, the presence of biofilm in microscopic samples denotes the presence of an infection. In this paper, we describe the design and testing of a diagnostic support tool for the automatic detection of biofilm, based on a convolutional neural network (CNN). To demonstrate the reliability of the system, alternative solutions based on isolation forest and deep random forest techniques were also tested; these rely on texture analysis, with Haralick feature extraction, and on the dominant color. The CNN-based biofilm detection system shows an accuracy of about 98%, an average accuracy of about 100% on the test set and about 99% on the validation set. Among the automatic image recognition technologies compared, the CNN-based system proved the most reliable in the specific context of this study. The developed system allows the specialist to obtain a rapid and accurate identification of the biofilm in the slide images.


Background
In recent years, artificial intelligence and in particular machine learning has played a fundamental role in the medical field, providing important support to doctors, especially for assisted diagnosis by means of computer aided diagnosis (CAD) systems.
CAD systems have recently become an integral part of clinical diagnosis processes and medical image evaluation. Unlike automatic diagnosis systems, CAD systems play only a supporting role: their performance is not expected to match that of specialized doctors, nor are they meant to replace them, as their role is purely complementary [1][2][3].
Numerous studies in the field of computer vision applied to medical and biomedical fields have demonstrated that additional CAD-based tools might support specialists in their tasks [20][21][22][23][24][25]. Modern technologies improve the acquisition, transmission and analysis of digital images. A growing benefit also comes from the possibility of sending clinical data useful for the diagnosis of pathologies, thanks to the spread of fast computer connections and of mobile phone networks that allow the exchange of large amounts of data [26,27].
The increasing usage of such systems in clinical contexts is due to the recent progress of digital imaging techniques, including the generation of quantifiable metrics, which are useful to enhance understanding of biological phenomena. Image processing and machine learning techniques are employed to find information, objects or features related to a particular image, supporting the specialist during clinical and diagnostic activities [28].
Visual interpretation of cellular features using microscopy also plays a fundamental role in cytology and histopathology diagnosis activities [29]. The human visual system is able to qualitatively detect and interpret visual patterns with great efficiency [30,31], and subjective evaluation is a reliable and accurate methodology, but it requires a great amount of time and effort: a quantitative approach through objective evaluation is therefore welcome.
Using mathematical models and metrics, biological entities and phenomena could be described in less fuzzy terms enabling a quantitative, unbiased, reproducible and large scale analysis [32,33].
These techniques are now also profitably used in rhinology [34,35]. Rhinology is a branch of otorhinolaryngology that deals with the anatomy, physiology, pathology and therapy of the nose and paranasal cavities.
One of the most common diagnostic techniques for identifying rhinological diseases is nasal cytology, that is, the study of nasal cellularity [36]. Rhinocytological analyses include a minimally invasive procedure that consists of scraping the mucous membrane of the nasal cavity. This procedure is simple and quick and does not require any type of anesthesia. The biological matter obtained, after preparation on a special slide, is then analyzed under a microscope; anomalies in the cellular distribution or the presence of unexpected elements can lead one to suspect a pathology, which makes this a useful diagnostic technique.
However, this diagnosis is very demanding because it consists of analyzing numerous images of the slide, classifying the cells present in the preparation, with the aim of tracking down "abnormal" cells that are indicative of particular conditions. Furthermore, this task is strongly influenced by the competence and attention of specialists.

Biofilm
For a long time in the history of microbiology, microorganisms were considered planktonic organisms, i.e., suspended and independent cells. Van Leeuwenhoek observed microbial communities by scraping the surface of teeth. It took many years of studies and advanced diagnoses to reach a deeper knowledge of these bacterial associations, called biofilms (Figure 1). The term biofilm derives from the Greek word "βίος", translated "that lives", as it represents living bacterial colonies, and from the English term "film", because, when observed under the microscope, it recalls the appearance of a film and, like a film, adheres to surfaces. Bacterial colonies make up 15% of a biofilm; the remaining 85% is an organic matrix that they produce and that surrounds them, whose skeleton is made up of exopolysaccharides (that is, extracellular polysaccharides), proteins and DNA. The amount of the latter varies depending on the organism and on the age of the biofilm. Furthermore, a biofilm intermittently releases numerous colonies of bacteria, which in turn can cause the recurrence and spread of the infection [37]. There are many reasons why bacteria tend to aggregate and produce biofilms, but they can be summarized in the concept of self-preservation. In fact, the bacteria contained in a biofilm are protected from antimicrobial agents such as antibiotics, disinfectants and detergents, since the action of these agents is greatly slowed by the need to penetrate the dense extracellular matrix in order to reach the infectious agents.
For this reason, only 10% of microorganisms are distributed in planktonic form, remaining vulnerable to attacks by phagocytes (cells that incorporate and digest other microorganisms) and antibiotics. According to the Centers for Disease Control and Prevention, approximately 65% of all human bacterial infections involve biofilm [38]. Pediatric studies have also shown that bacterial biofilms occupy about 95% of the nasopharynx of children with respiratory infections [39]. These studies were conducted by examining nasal mucosa samples from many patients with infectious rhinopathies, and in most cases, biofilm was found.
With reference to the specific topic we deal with in this study, it must be said that the nasal mucosa is an ideal environment for the formation of biofilm, since it is wrinkled, hydrophobic and contains an abundance of nutrients.
Recent studies concerning nasal cytology have described for the first time some "morphological-chromatic" aspects of the biofilm found in the nasal mucosa [38]. A crucial finding was the detection of spots with specific shades of cyan in smears of the nasal mucosa containing biofilm and stained with the May-Grünwald Giemsa (MGG) method. The cyan-biofilm association is confirmed by the systematic finding of numerous bacteria in these spots [38].
However, it should be noted that these spots, while remaining in the cyan spectrum, may have shades of variable color, due to the age of the biofilm: the more mature it is, the richer it is in the polysaccharide component and, consequently, the more intense its color [38].

Related Works
The identification and classification of these bacterial colonies are typically carried out by a specialist through observation, but automatic systems capable of replacing it are being studied in order to make significant improvements in the classification.
Biofilm is not only present in the nasal mucosa, but can easily be found in all humid environments. Water distribution systems are among the preferred environments for bacteria to form biofilms [40]. That study relied on an automatic recognition system based on the naive Bayes model. There are various types of bacterial colonies as well as biofilms; in "Deep learning approach to bacterial colony classification" [41], a system based on CNNs was developed that extracts descriptors from the images and then classifies the various bacterial colonies via support vector machine (SVM) or random forest algorithms. Subsequently, in [42], a classification system for bacterial colonies formed on sulfide minerals was developed, based entirely on deep neural networks.
Image recognition algorithms have also proved to be very useful in the search for pathogens capable of removing biofilm, with new biological techniques for the removal of bacterial colonies [43]. Several essential oils extracted from Mediterranean plants were also analyzed for their activity against the biofilm of a particular bacterium, Pseudomonas aeruginosa. By applying machine learning algorithms such as random forest and SVM, quantitative classification models of the activity-composition relationships were developed, which allowed research to be directed towards the chemical components of essential oils most involved in the inhibition of biofilm production.
In [44] a method is tested to measure the removal efficiency of biofilm from surfaces composed of biomaterials. Biofilm formation on biomaterial surfaces is a major health concern and significant research efforts are directed towards the production of biofilm resistant surfaces and the development of biofilm removal techniques. The authors perform a digital scanning segmentation carried out under a microscope on these materials, so as to be able to calculate their biofilm area and in order to study their development under certain conditions and specific treatments. Random forest algorithms capable of recognizing edges are used for segmentation.
One of the biofilms that most frequently infect humans is that of Salmonella. In [45], a machine learning method is designed to understand how Salmonella biofilm adapts and develops in the human intestine. The system is based on random forest and is able to recognize the presence of this specific bacterial formation.
The works indicated above are not directly comparable with ours, as they are based on images obtained using different staining techniques and do not apply machine learning techniques to cells, and in particular to the cells of the nasal mucosa.
Unfortunately, as far as we are aware, there are no other studies in the literature that deal with this topic. In particular, no studies are available that use slides stained with the MGG technique, which is the most used technique in rhinocytology. Other works refer to the identification of biofilms on materials other than nasal mucus and adopt Gram staining of the slides, which is very different from MGG.

A New Diagnostic Support
The goal of this study is to provide a diagnostic support tool in the field of rhinocytology, for the rapid and accurate detection of biofilm. The algorithms used here analyze the chromatic and morphological characteristics of the biofilm. In particular, a system for the detection of biofilm based on a CNN has been developed, as CNNs are among the best-performing and most reliable solutions for recognizing elements in an image. The CNN applies convolution and filtering operations to the images, with the aim of training the system to recognize the biofilm identified by the cyan-colored spots. At the same time, taking into consideration the color properties of the biofilm, a system was tested that directly analyzes the texture, through the extraction of Haralick features and of the dominant color. This system uses the isolation forest as its learning algorithm. Starting from an approach more closely linked to classical machine learning, such as the isolation forest, we then moved towards deep learning by also designing and testing a solution based on deep random forest, with the aim of showing the margin for improvement obtainable by adopting deep learning solutions. Finally, the three systems are compared, in order to show which of these technologies is the most effective for automatic biofilm recognition.

Materials and Methods
The cytological technique includes the following steps: withdrawal (sampling), processing (which includes fixation and staining) and observation under a microscope. Cytological sampling consists of the collection of superficial cells of the nasal mucosa, which can be performed with a sterile swab or by scraping with a small curette made of disposable plastic material (Nasal Scraping®, EP Medica, Lugo, Italy). The sampling must be performed in the central portion of the inferior turbinate, which contains the correct ratio between ciliated and muciparous cells.
For this study, 24 slide preparations were made at the Rhinology Clinic of the Department of Otolaryngology of the University of Foggia. After sampling, the cellular material was distributed on a microscope slide, fixed by air drying and stained according to the MGG method, in which three dyes are used: red-orange eosin, methylene blue, blue and blue II of gray-blue color. Using this method, all cellular components of the nasal mucosa, immune cells, bacteria, fungal spores and fungal hyphae are stained. This staining technique takes about 30 min. The standard clinical protocol regulates the compilation of the rhinocytogram by observing, for each slide, 50 fields under an optical microscope with a magnification that generally ranges from 400× to 1000×, in order to estimate the cell distribution and to identify abnormal cell elements or biofilms that are important for diagnosis.

Acquisition and Scanning of Slides
All the slides were observed using the Proway XSZPW208 T optical microscope with a 100× lens and a 1000× magnifying factor. The DCE-PW300 3MP digital camera was used to obtain the digital image fields. The images were saved as JPEG files with a size of 3264 × 1840 pixels. The digital acquisition of nasal smears can be affected by uneven illumination due to the irregular thickness of the smears, but since the procedure was performed manually, most of the images are perfectly in focus.

Dataset
The dataset is composed of 24 images corresponding to the digital scans of the slides prepared as explained in Section 2.1. A pre-processing phase was necessary to obtain, through techniques such as image segmentation and image augmentation, a dataset large enough to train the proposed models. In particular, the image segmentation method adopted is the tile-to-tile method [46], which consists of segmenting a digital image into regions of uniform size, called tiles.
From each image of the dataset, 32 tiles of 384 × 384 pixels are produced. Thus, from the twenty-four initial images, a total of 768 tiles are produced. By applying this technique, we obtain a larger dataset with a greater degree of detail, in order to optimize the analysis of the images. The 768 tiles obtained are portions of the original images, and not all of them contain biofilm spots. A total of 542 tiles contained only a grey background and were therefore discarded. Only the 226 tiles containing significant images were selected and labeled, obtaining 112 tiles containing biofilm, labeled "biofilm", and 114 tiles not containing biofilm, labeled "other".
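The tile counts above can be checked with a short sketch. It assumes, since the layout is not stated in the text, that tiles form a non-overlapping grid and that the incomplete strips at the right and bottom borders are dropped; the function name `tile_boxes` is ours. Under this assumption a 3264 × 1840 image yields 8 columns and 4 rows of 384 × 384 tiles.

```python
def tile_boxes(width, height, tile=384):
    """Return (left, top, right, bottom) boxes of full tiles, row by row."""
    # Assumption: incomplete tiles on the right/bottom border are dropped.
    cols, rows = width // tile, height // tile
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]

boxes = tile_boxes(3264, 1840)
tiles_per_image = len(boxes)        # 8 columns x 4 rows
total_tiles = 24 * tiles_per_image  # 24 slide images
kept = total_tiles - 542            # background-only tiles are discarded
print(tiles_per_image, total_tiles, kept)
```

Each box could then be passed to an image library's crop routine to cut the corresponding tile.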
Another fundamental requirement for correct training is the balancing of the dataset, i.e., the population of the positive class (number of "biofilm" tiles) compared to the population of the negative class (number of "other" tiles). Based on the data obtained after the image segmentation phase, the dataset is well balanced, as the two populations are almost equal.
The image augmentation technique expands the amount of data available by applying a series of random changes to the images, such as random rotation, random translation or elastic distortion. By applying the image augmentation technique to the original 226 tiles, we obtained a total of 4520 tiles, of which 2240 are labeled "biofilm" and 2280 are labeled "other".
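The following is a minimal sketch of the augmentation idea, not the pipeline used in this study. It makes several simplifying assumptions: rotations are restricted to multiples of 90 degrees, translations are cyclic shifts of a few pixels, elastic distortion is omitted, and 20 variants are generated per tile (a factor consistent with 226 → 4520, although the text does not say whether the originals are included). All function names are illustrative.

```python
import random

def rotate90(img, k):
    """Rotate a 2D list-of-rows image by k * 90 degrees clockwise."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def translate(img, dx, dy):
    """Cyclically shift the image by dx columns and dy rows."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

def augment(img, copies, rng):
    """Produce `copies` randomly rotated and shifted variants of one tile."""
    out = []
    for _ in range(copies):
        k = rng.randrange(4)        # random quarter-turn count
        dx = rng.randrange(-2, 3)   # random horizontal shift
        dy = rng.randrange(-2, 3)   # random vertical shift
        out.append(translate(rotate90(img, k), dx, dy))
    return out

rng = random.Random(0)
tile = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "tile"
variants = augment(tile, copies=20, rng=rng)
print(len(variants), 226 * 20, 112 * 20, 114 * 20)
```

Because the label of a tile does not change under these transformations, each variant inherits the label of the tile it was generated from.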

Designing a Working System
With the aim of designing a system that would support the specialist during the observation phase of the slides, we have defined two possible scenarios.
The first one takes advantage of the evolution of the smartphone technology and is based on the development of RhinoSmart [47], a multimedia system able to acquire an image from the digital microscope and to extract the cellular elements. During the preliminary technical trials, the images of the smears have been acquired with a Samsung Galaxy S6 Edge smartphone with a 16-megapixel digital rear camera, with a photo resolution of 5312 × 2988 pixels and an aperture of F/1.9. A specific smartphone adapter was also used as in Figure 2.
The main advantage of this framework is the possibility of sharing images obtained from the observed fields immediately, as they can be sent directly to a working server system that automatically processes them (https://rhinocyt.di.uniba.it/#/login). Then the algorithms presented in this paper can be directly implemented as a smartphone application or on the remote server. The second scenario allows the design of a fully-automated system, as it is becoming more important to increase the efficiency of lab operations by digitizing slide specimens, a practice known as whole slide imaging (WSI), which has many obvious productivity benefits. While fully motorized virtual slide scanners can streamline the WSI process, not every researcher has the budget to own one. As an alternative, you can combine a microscope with a motorized stage and a digital camera to perform cost-effective whole slide imaging.
The most immediate example is based on a system-on-chip (SoC) system commercially available, for example a Raspberry Pi or Nvidia Jetson. SoC systems have a relatively low cost and can be effectively customized implementing specific functionalities. In particular, to run a neural network model on an embedded system, the Raspberry Pi represents the best option currently available in terms of cost and offers reasonable performance for the execution of deep learning models, for example through the installation of OpenCV, TensorFlow, and Keras. A block diagram is shown in Figure 3. In this scenario, two main blocks are considered: an image acquisition one, that basically represents the motorized microscope, and an image processing one, based on the system on chip. The sample is the input, whilst the output is represented by the neural network output. The motorized microscope is schematized by four blocks:
• Microscope manager: a logical block that can be tuned to perform a complete scan. It is logically responsible for the acquisition, as it sets up the parameters for the motorized stage, starts the acquisition and notifies the frames availability to the system on chip at the end.
• The Microscope.
• Motorized stage: this block is responsible for the physical movement of the sample. The movement signal is sent by the microscope manager step by step. At the end of each step it sends a trigger signal to the digital camera to enable the frame capture.
• Digital camera: a sensor opportunely coupled with the other blocks that captures a frame upon request.
Finally, the system on chip is the hardware specifically devoted to host the neural network model. It receives the stack of images at the end of the acquisitions, processes them separately, and computes the output.

Figure 3. Block diagram of the second scenario described, which comprises a motorized microscope for the "image acquisition" and a system on chip board to perform "image processing".

The Convolutional Neural Network
The tiles obtained from the processing of the dataset are converted into grayscale with 256 shades and scaled down to 50 × 50 pixels, firstly because the information contained in the resulting tiles was sufficient to obtain excellent results, but also because the use of large color images would entail the need to set a large number of hyperparameters, making training considerably more complex. The implementation choices of our CNN experimentation are inspired by the layered organization of the LeNet-5 network [48], which presents convolution levels alternated with pooling levels, followed by a series of fully-connected levels.
In detail, the network levels are as follows:
− Input level: each neuron corresponds to a single pixel of the tile given as input; this level corresponds to a 50 × 50 two-dimensional matrix of neurons;
− Five convolutional levels: arranged in sequence, each followed by a pooling level. They are initially organized in increasing order of depth: precisely, 32, 64, 128 filters are applied, in order to increase the number of feature maps. Then there are convolutional levels arranged with a decreasing number of filters (128, 64, 32) to reduce the number of resulting feature maps. For each convolutional level, 5 × 5 filters with unitary stride and self-balanced zero padding are applied, and after obtaining the new feature map, the ReLU activation function is applied. The pooling levels serve to simplify the feature maps and decrease the computing resources needed. Again, they are made up of 5 × 5 filters, with a stride of five, and max pooling is applied. After the last level, flattening is applied, i.e., a function that transforms the various feature maps from 3D into 1D to allow connection with the next level;
− Fully-connected level: consisting of 1024 neurons. It is fully connected to the last convolutional level, that is, each neuron of the feature map is connected with all 1024 neurons of this level. For each neuron, the ReLU activation function is applied to the input. To prevent overfitting, dropout with a keep probability of 0.8 is applied during training: at each training step, each neuron has an 80% probability of being included and a 20% probability of being ignored, so that the network does not specialize too much on the input data;
− Output level: consisting of two neurons, representing the biofilm and other labels. It is fully connected to the previous level and a Softmax activation function is applied; this function yields a probability distribution over the two labels, and the most probable one is taken as the output.
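To make the layer description easier to follow, the sketch below traces the spatial size of the feature maps and counts the trainable parameters. It is only an illustration under stated assumptions, not the network code used in this study: we assume the five convolutional levels use 32, 64, 128, 64 and 32 filters, that the pooling levels are zero-padded like the convolutions (so that an output of size ceil(size / stride) is always defined), and that every level has bias terms. The function names are ours.

```python
import math

def same_out(size, stride):
    """Spatial output size of a zero-padded ('same') layer."""
    return math.ceil(size / stride)

def trace(input_size=50, filters=(32, 64, 128, 64, 32), kernel=5,
          pool_stride=5, dense_units=1024, classes=2):
    """Return per-level spatial sizes, flattened length and parameter count."""
    size, channels, params, sizes = input_size, 1, 0, []
    for f in filters:
        params += kernel * kernel * channels * f + f  # conv weights + biases
        size = same_out(size, 1)                      # unit stride keeps the size
        size = same_out(size, pool_stride)            # max pooling shrinks it
        channels = f
        sizes.append(size)
    flat = size * size * channels                     # length after flattening
    params += flat * dense_units + dense_units        # fully-connected level
    params += dense_units * classes + classes         # softmax output level
    return sizes, flat, params

sizes, flat, params = trace()
print(sizes, flat, params)
```

Under these assumptions the 50 × 50 input collapses to a single spatial position after the third pooling level, so the fully-connected level receives one value per filter of the last convolutional level.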
The development of the system, therefore, consists of three main phases:
1. Definition of the dataset, as described in Section 2.2; as explained later, validation was implemented through the k-fold cross validation technique. The tiles are converted to grayscale and resized for the reasons mentioned above;
2. The training phase, in which the images are passed through the network and its weights are updated step by step. The weights are updated through back propagation over 70 epochs, with the learning rate set to 0.001. The number of epochs was chosen after a careful analysis of the accuracy, validation and loss function graphs in relation to the number of epochs: the graphs show a substantial slowdown in convergence after the twenty-fifth epoch (step 1000), while at the seventieth epoch (corresponding to step 2750) the accuracy, validation and loss function values converge, see Figure 4;
3. The classification phase, in which the images are provided as input to the network with the updated weights, which returns a probability distribution over the classes to which they may belong, through the Softmax activation function.
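The classification step can be illustrated with a small sketch of the Softmax function followed by the choice of the most probable label. The raw output values and the order of the two labels are hypothetical; they are not taken from the trained network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ("biofilm", "other")  # assumed order of the two output neurons

def classify(logits):
    """Return the most probable label and the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

label, probs = classify([2.0, 0.5])  # hypothetical raw outputs
print(label, probs)
```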
In order to limit the problem of overfitting in the training phase, the k-fold cross validation technique was used, with the parameter k set to 10, as also suggested in [49]: it has been shown empirically that this value yields estimates of the test error rate that suffer neither from excessive bias nor from very high variance. At the end of this process, 10 CNNs with different weights are generated and then tested.
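The way a 10-fold scheme partitions the data can be sketched as follows. We assume here, for illustration only, that the folds are drawn from the 4520 augmented tiles after a random shuffle; the seed and the function name are arbitrary.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

splits = list(k_fold_indices(4520, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

One model is trained per split, which is why the procedure ends with ten networks that share the architecture but differ in their weights.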

Forest
At the same time as the development of the CNN described above, two alternative technologies based on the use of decision trees were implemented: isolation forest and deep random forest. Below we present the basic assumptions and related implementation details of isolation forest and deep random forest.

Texture Analysis
The concept of texture varies considerably with the context in which it is used, and many definitions have been proposed to describe it. In the recognition of the biofilm, the texture is considered as the geometric arrangement of the luminance levels (grayscale) of the pixels of the image. This study falls within the problem of texture classification, i.e., the automatic cataloging of images based on the class of texture they belong to. The only class present is the one with the "biofilm" label, since everything that does not belong to it does not need to be classified further and takes the "other" label.
The extraction of information from images, or more generally from an entity, implies the use of a mathematical model capable of describing this entity. The gray-level co-occurrence matrix (GLCM) model was used to describe the textures considered in this work. This choice derives from the analysis of a study on the digital analysis of biofilm images in humid environments and in food [50]. This mathematical model is designed for the processing of grayscale digital images.
Once the GLCM has been generated, it is possible to extract from it statistical descriptors derived from the properties of the textures. These descriptors were studied by Haralick. In total, Haralick drew up a list of 14 descriptors, but here only the first 13 are considered, since the calculation of the fourteenth one (i.e., the maximum correlation coefficient, which corresponds to the square root of the second largest eigenvalue of the GLCM) is affected by computational instability [51]. For a complete description of these metrics, the reader is referred to the original paper [52].
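As an illustration of how a GLCM and some of its descriptors are obtained, the sketch below builds a symmetric, normalized co-occurrence matrix for a single horizontal offset on a toy image with four gray levels and computes three of Haralick's descriptors. The offset, the toy image and the function names are choices made for this example only; the tiles in this study have 256 gray levels, and the full set of 13 descriptors is not reproduced here.

```python
def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i, j = img[y][x], img[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def haralick_subset(p):
    """Three of Haralick's descriptors computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "angular_second_moment": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "inverse_difference_moment": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }

# Toy 4x4 image with gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(image, levels=4)
features = haralick_subset(P)
print(features)
```

In practice the descriptors are usually averaged over several offsets (for example the four main directions) so that the features are less sensitive to the orientation of the texture.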

Color Analysis
The second analysis performed to extract information from images is aimed at obtaining color data. In this experimentation, this procedure uses the color components of two different color models: Hue Saturation Value (HSV) and CIE 1976 L*, a*, b* (LAB). The choice of the HSV and LAB color models is suggested by their ability to represent pigmentation linearly by means of the chromatic components, excluding brightness. The two models are not meant to compete with each other; we simply select the one that yields the better result. The analysis consists of extracting the dominant color from an image, that is, the color shared by the largest group of pixels. In this perspective, it should be noted that, in a real image, adjacent pixels rarely show exactly the same hue, since the color of each pixel represents the average of the colors present in a portion of the real scene. For this reason, as stated in [53][54][55], the search for the dominant color is carried out by means of the K-Means algorithm, which groups and classifies homogeneous elements [56]. In our specific case, we find the dominant color with K-Means by grouping the pixels into a number of clusters equal to the number of main colors present in the image, so that each cluster collects all the shades of a specific color. The centroid of each cluster identifies a main color. Finally, the centroid of the cluster containing the most elements is taken as the dominant color.
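The dominant-color search can be sketched with a small, dependency-free K-Means. This is an illustration under assumptions, not the code used in this study: pixels are generic 3-component tuples in some color model, the number of clusters is supplied by the caller (the text does not say how the number of main colors is determined), and a deterministic farthest-first initialization replaces the usual random one so that the example is reproducible. All names and pixel values are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two color tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centroids(points, k):
    """Farthest-first initialization: deterministic, spreads the seeds out."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=10):
    """Plain Lloyd iterations; returns final centroids and their clusters."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

def dominant_color(points, k):
    """Centroid and size of the most populated cluster."""
    centroids, clusters = kmeans(points, k)
    biggest = max(range(k), key=lambda i: len(clusters[i]))
    return centroids[biggest], len(clusters[biggest])

# Hypothetical pixels: six shades of one color, three of another, two grays.
pixels = [
    (180, 200, 220), (182, 198, 221), (178, 202, 219),
    (181, 201, 222), (179, 199, 218), (183, 200, 220),
    (120, 60, 140), (122, 58, 141), (118, 62, 139),
    (128, 128, 128), (130, 127, 129),
]
color, count = dominant_color(pixels, k=3)
print(color, count)
```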

Scaling and PCA
Normalizing the data is necessary for the classification to be as reliable as possible, because the Haralick features and the dominant color have domains of very different extent. For this purpose, a standardization technique is applied both to the features extracted from the images contained in the dataset and to the features extracted from the images to be classified. The robust scaler [57] was chosen because the data to be processed contain outliers: its statistics are computed from the median and the interquartile range, that is, the range of values that contains the "central" half of the observed values, so outliers have little influence on the scaling. Each feature is centered on its median and divided by its interquartile range, which brings the features to comparable scales.
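The effect of scaling by the interquartile range can be seen in a short sketch. It assumes the common definition of robust scaling, (x − median) / (Q3 − Q1), with quartiles obtained by linear interpolation; whether [57] uses exactly these settings is an assumption, and the feature values are hypothetical.

```python
from statistics import quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

feature = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical column with one outlier
scaled = robust_scale(feature)
print(scaled)
```

The median maps to 0 and the two quartiles to −0.5 and 0.5, while the outlier keeps a large scaled value: it does not distort the scaling of the other values, but it is not clipped either.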
In our work, principal component analysis (PCA) was used for data simplification. However, the use of PCA does not always bring benefits to data processing. This is because the PCA transformation may emphasize relationships between the variables that are not relevant to the task, while neglecting others that may play an important role in classification. For this reason, in this study the systems working on the features, isolation forest and deep random forest, have been tested with and without the PCA transformation.
Ultimately, four different configurations were obtained, depending on the type of color model used and the application of the PCA transformation:
1. HSV with PCA transformation
2. HSV without PCA transformation
3. LAB with PCA transformation
4. LAB without PCA transformation

Isolation Forest
The isolation forest classifier is based on the detection and isolation of "anomalies" in the dataset. The isolation forest is made up of decision trees, which first randomly select a feature and then choose a random partition value between the minimum and maximum values of the selected feature. In general, anomalous feature vectors are less frequent than regular ones and differ from them in the values of some features. For this reason, random partitioning isolates anomalies closer to the tree root, with fewer partitions needed. At the end of the search in the decision tree, a number called the isolation score is generated for each feature vector. If this score is close to 1, the vector is an anomaly; if the score is much smaller than 0.5, it indicates a normal observation; if the scores are all close to 0.5, the sample as a whole does not seem to contain clearly distinct anomalies. To avoid problems due to the randomness of the choice of values during isolation, the procedure is carried out several times on multiple decision trees, generating a forest. The isolation value is calculated for each tree and, after visiting all of them, an average of these values is calculated.
In the study presented here, the isolation forest searches for anomalies in the features extracted from the tiles to be classified; if any are found, the tiles are classified with the "other" label, since normal observations correspond to the tiles labeled "biofilm".
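The score thresholds mentioned above follow from the standard isolation forest formulation, in which the path length needed to isolate a vector is averaged over the trees and normalized by c(n), the average path length of an unsuccessful binary-search-tree lookup among n points: s = 2^(−E[h] / c(n)). The sketch below implements only this scoring step; the path lengths and the subsample size of 256 are hypothetical, and the tree construction itself is not shown.

```python
def c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # H(n - 1), computed exactly
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(path_lengths, n):
    """Average the per-tree path lengths, then map them into (0, 1]."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / c(n))

n = 256  # hypothetical number of samples used to build each tree
print(anomaly_score([1, 1, 2], n))     # isolated quickly: score above 0.5
print(anomaly_score([20, 22, 21], n))  # isolated late: score below 0.5
```

A vector isolated after an average of exactly c(n) splits scores 0.5, which is why scores near that value carry no clear indication either way.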
The system we designed has been tested according to the four configurations described above. During the training phase, cross validation was used, using 80% of the tiles to train the system and 20% for the test.

Deep Random Forest
In this experiment, the multi-grained cascade forest algorithm (gcForest) is used [58]. It is an ensemble method based on decision trees. Ensemble learning combines several learning methods from machine learning and statistics into a final model (the ensemble model) with greater predictive power than the individual starting models (the base learners). The algorithm generates a set of forests organized in a cascade structure. The number of cascade levels is determined automatically and adaptively from the available data, so that the complexity of the model is set without manual tuning, which allows gcForest to perform well.
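The adaptive choice of the number of cascade levels can be sketched as a simple stopping rule: a new level is added only while the validation accuracy of the whole cascade keeps improving. The sketch below is an assumption about the general mechanism, not the gcForest code used in this study; `evaluate_level` is a hypothetical callback standing in for "train one more level and measure validation accuracy", and the accuracies in the usage example are made up.

```python
def grow_cascade(evaluate_level, max_levels=20, tolerance=0.0):
    """Add cascade levels until validation accuracy stops improving.

    evaluate_level(level) is a hypothetical callback that trains level number
    `level` (0-based) and returns the validation accuracy of the whole cascade.
    Returns (number of levels kept, best validation accuracy).
    """
    best_accuracy = float("-inf")
    levels_kept = 0
    for level in range(max_levels):
        accuracy = evaluate_level(level)
        if accuracy <= best_accuracy + tolerance:
            break  # no significant gain: discard this level and stop growing
        best_accuracy = accuracy
        levels_kept = level + 1
    return levels_kept, best_accuracy


# Toy usage with made-up validation accuracies for successive levels.
simulated = [0.90, 0.93, 0.95, 0.95, 0.96]
levels, accuracy = grow_cascade(lambda level: simulated[level], max_levels=len(simulated))
print(levels, accuracy)
```

In this toy run the fourth level brings no gain over the third, so growth stops and three levels are kept.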
The system we designed was tested in the four configurations described above. Each configuration required approximately 35 min of training. During the training phase, cross validation was used, with 80% of the tiles used for training and 20% for testing. Figure 5 shows the results obtained for each configuration: the accuracy of the system and the precision, recall and f1-score of the predictions for each class on the test set. In addition, the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate, and the corresponding area under the curve (AUC) are reported.
A preliminary analysis based on accuracy shows that the system trained with the LAB model without the PCA transformation performs worst, while the other three configurations are equivalent. A more careful analysis considers the f1-score, which balances precision and recall. Specifically, this metric was evaluated for the biofilm class, because failing to recognize a tile containing biofilm, a false negative (FN), is more serious than labeling a tile without biofilm as biofilm, a false positive (FP). As a matter of fact, practical medical support should be fast and precise but, above all, ensure that the probability of a missed alarm is equal to zero: failing to recognize the biofilm in a patient suffering from a pathology is the more serious error. The system with the highest f1-score for the biofilm class during the training phase is the one trained with the LAB color model and the PCA transformation. This outcome is confirmed by the ROC curves, as this configuration produced a higher AUC value than the others.
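For reference, the per-class metrics used in this comparison can be computed as in the following minimal sketch. The AUC is obtained here through its pairwise-ranking interpretation (the probability that a randomly chosen biofilm tile receives a higher score than a randomly chosen other tile), which equals the area under the ROC curve. The function names, counts and scores are illustrative assumptions, not values from Figure 5.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and f1-score for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up counts for the biofilm class (1 = biofilm, 0 = other).
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)

# Made-up classifier scores for six tiles.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(precision, recall, f1, auc)
```

Because recall appears in the f1-score, a configuration that misses biofilm tiles is penalized even when its overall accuracy matches that of the others.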

Experimental Results
The same test procedure was carried out for all classifiers, using 100 different tiles, 50 containing biofilm and 50 containing other material. The test phase produced the confusion matrices reported in Appendix A, where the rows indicate the predictions (O = other, B = biofilm) and the columns the actual class.
Considering the following definitions:
• TP corresponds to the number of tiles correctly classified as biofilm;
• TN corresponds to the number of tiles correctly classified as other;
• FP corresponds to the number of other tiles labeled as biofilm;
• FN corresponds to the number of biofilm tiles labeled as other;
the accuracy, sensitivity and miss rate metrics were calculated.
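The three metrics follow directly from the four counts defined above: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and miss rate = FN / (TP + FN) = 1 - sensitivity. The following minimal sketch computes them; the function name and the counts in the usage example are illustrative (a 100-tile test set with 50 biofilm tiles) and are not copied from the tables.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and miss rate from confusion-matrix counts."""
    total = tp + tn + fp + fn
    biofilm_tiles = tp + fn                 # all tiles that truly contain biofilm
    if total == 0 or biofilm_tiles == 0:
        raise ValueError("counts must include at least one biofilm tile")
    accuracy = (tp + tn) / total
    sensitivity = tp / biofilm_tiles        # true positive rate
    miss_rate = fn / biofilm_tiles          # false negative rate = 1 - sensitivity
    return accuracy, sensitivity, miss_rate


# Illustrative counts: 50 biofilm tiles (1 missed) and 50 other tiles (2 mislabeled).
accuracy, sensitivity, miss_rate = confusion_metrics(tp=49, tn=48, fp=2, fn=1)
print(accuracy, sensitivity, miss_rate)
```

Two classifiers can share the same accuracy while differing in miss rate, which is why the miss rate is reported separately.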
Analyzing the isolation forest results reported in Table 1, it can be noted that the maximum accuracy is obtained using the HSV color model with the PCA transformation. This configuration, however, also shows the highest miss rate; that is, it produces the greatest number of FNs, which affects the quality of the analysis. For this reason, it is preferable to choose the best configuration based on the miss rate. In this sense, as can be seen from Table 1, the best configuration is the one with the HSV color model without the PCA transformation, which produced only one false negative.
As can be observed from Table 2, for the system based on deep random forest the best-performing configurations are those that use the HSV color space, regardless of whether the PCA transformation is applied, as they produce the lowest miss rate (only one false negative) with very high accuracy. This result seems to contrast with the one obtained through cross validation during the training phase. A possible interpretation is overfitting in the LAB model with PCA: it reported excellent results in the training phase but had the highest miss rate in the test phase, possibly because the system adapted excessively to the training data. Ultimately, analyzing Table 2 together with the ROC curves of the deep random forest configurations, the best configuration for this system is the one that uses the HSV color model and the PCA transformation, as it reported an AUC of 0.84 in the training phase and, in the test, produced only one FN with an accuracy of 97%.
Table 3, relating to the CNN-based system, shows an accuracy of 0.98: two tiles were misclassified as FPs, while all the other predictions are correct.
Comparing the two techniques, CNN and deep random forest, the CNN achieved slightly higher accuracy (98% against 97% for the best deep random forest configuration). On the other hand, CNNs need a large dataset to be trained correctly, as well as greater computing power. Furthermore, classification errors can be partly compensated by the fact that biofilm spots, acquired at 1000×, are usually larger than a single tile. For this reason, a spot not detected in one tile could still be detected in one of the adjacent tiles.
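The compensation effect of adjacent tiles can be made concrete with a small worked example. It rests on a strong simplifying assumption that is not made in the study: each tile overlapping a spot is missed independently with the same probability. Under that assumption, a spot covering k tiles is missed only if all k tiles are missed. The function names and numbers are illustrative.

```python
def spot_miss_probability(tile_miss_rate, tiles_covered):
    """Probability that every tile overlapping a spot is missed (independence assumed)."""
    if not 0.0 <= tile_miss_rate <= 1.0:
        raise ValueError("tile_miss_rate must lie in [0, 1]")
    if tiles_covered < 1:
        raise ValueError("a spot must cover at least one tile")
    return tile_miss_rate ** tiles_covered


def spot_detected(tile_predictions):
    """A spot counts as detected if at least one overlapping tile is labeled 'biofilm'."""
    return any(label == "biofilm" for label in tile_predictions)


# Illustrative values: a 2% per-tile miss rate and a spot spanning four tiles.
single_tile = spot_miss_probability(0.02, 1)
four_tiles = spot_miss_probability(0.02, 4)
print(single_tile, four_tiles)
print(spot_detected(["other", "biofilm", "other", "other"]))
```

In practice, predictions on neighboring tiles are likely correlated, so the real benefit is smaller than this idealized figure suggests.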

Conclusion and Future Work
The system presented herein has proved satisfactory and could be useful in cases where many patients need to be evaluated. One of the most important aspects of nasal cytology is that it offers the specialist the opportunity to make a correct differential diagnosis by means of a low-cost analysis, without having to send the patient to a laboratory for further testing. The main help this system provides is that the specialist can photograph as many microscopic fields as considered useful and have the presence of biofilm detected automatically.
We have seen how the presence of biofilm spots in rhino-cytological scans can be detected through different algorithms: isolation forest, deep random forest and CNN. Compared with the other techniques, the CNN requires greater attention from an implementation point of view, as regards its training phase and the overfitting problems it may run into. Despite this, it correctly classified all the biofilm tiles in the test set, producing no false negatives, unlike the other technologies; this system can, therefore, represent excellent support for rhino-cytological analyses in recognizing the biofilm, with reliable results in terms of accuracy and error rate.
The design choices concerning the CNN have given the model strength and reliability without reducing its simplicity and flexibility. The main obstacle was the lack of data; despite this, it was possible to train well-performing classifiers that satisfied the primary requirements for integration while also offering good precision scores.
Research is in progress to make the system capable of working with images of varying sizes, as it currently works on images of predefined sizes. The integration of this system with a medical rhinocytology system, used to assist the diagnosis of nasal diseases from cytological information, is currently underway.