Comparative Analysis of Rhino-Cytological Specimens with Image Analysis and Deep Learning Techniques

Cytological study of the nasal mucosa (also known as rhino-cytology) represents an important diagnostic aid that makes it possible to highlight the presence of some types of rhinitis through the analysis of cellular features visible under a microscope. Nowadays, the automated detection and classification of cells benefit from the capacity of deep learning techniques to process digital images of the cytological preparation. Even though the results of such automatic systems need to be validated by a specialized rhino-cytologist, this technology represents a valid support that aims at increasing the accuracy of the analysis while reducing the required time and effort. The quality of the rhino-cytological preparation, which is clearly important for the microscope observation phase, is also fundamental for the automatic classification process. In fact, the slide-preparing technique turns out to be a crucial factor among the multiple ones that may modify the morphological and chromatic characteristics of the cells. This paper aims to investigate the possible differences between direct smear (SM) and cytological centrifugation (CYT) slide-preparation techniques, in order to preserve image quality during the observation and cell classification phases in rhino-cytology. Firstly, a comparative study based on image analysis techniques has been put forward. The extraction of densitometric and morphometric features has made it possible to quantify and describe the spatial distribution of the cells in the field images observed under the microscope. Statistical analysis of the distribution of these features has been used to evaluate the degree of similarity between images acquired from SM and CYT slides.
The results prove an important difference in the observation process of the cells prepared with the above-mentioned techniques, with reference to cell density and spatial distribution: the analysis of CYT slides has been more difficult than that of the SM ones due to the spatial distribution of the cells, which results in a lower cell density than in the SM slides. As a marginal part of this study, a performance assessment of the computer-aided diagnosis (CAD) system called Rhino-cyt has also been carried out on both groups of image slide types.

In 2018, the authors carried out a project following a similar approach, concerning the nasal cytology domain. Thanks to an optical microscope equipped with a camera, it was possible to sample hundreds of cells from direct smear (SM) slides. Then, using a segmentation algorithm, single images of different cells were extracted and labeled according to their type. Subsequently, a CNN was trained with the resulting dataset. Interesting results were obtained in the experimental phase [11,12].
During a retrospective look at that study, a data quality issue emerged. To achieve better results and maximize consistency, general quality improvement is required. Better cytological slides will generate better cell images, which will, in turn, make it possible to improve the CNN performance. Thus, the use of cytological centrifugation (CYT) images has begun, CYT slides being considered important in terms of quality in medical practice. On the other hand, CYT processing has higher costs compared to SM, due to the time and instruments required.
Bartoli et al. compared the SM and CYT techniques using an optical microscope, applying both techniques to rhinological samples of a large population, heterogeneous in terms of sex, age, habits and illnesses [14]. After studying the distinctness of the cells from the rest of the cytoplasm, the recognition of the different cell types, and how easy it was to identify them, they asserted that the CYT technique is superior.
That study was based on clinical practice, and the values of the considered metrics are subjective since they are based on the visual expertise of the human observer. We will try to conduct a new quality analysis using quantitative metrics with a stronger connection to the image analysis domain.

Nasal Cytology
Nasal cytology is a simple, affordable, non-invasive and point-of-care method used to diagnose rhinitis and other nasal diseases. The rationale of this method lies in the composition of the nasal mucosa: in a condition of health, only four types of cell are present in it: ciliated, muciparous, metaplastic and ribbed cells; the presence and number of any other cell type can be a sign of an ongoing pathology [30].
The rhino-cytological exam is based on three main phases [31]:
1. Specimen sampling: Cells from the superficial layer of the nasal mucosa are withdrawn using techniques such as nasal biopsy, nasal flush, brushing, scraping or nasal swab. Each technique offers different pros and cons, e.g., preserving some particular features of the mucosa but, on the other hand, being more invasive or more painful [31]. For clinical purposes, a simple and minimally invasive technique like scraping turns out to be preferable among the previous ones.

2. Processing: Using the direct smear technique, the sample is spread out onto a microscope slide, then fixed by air drying and later stained with the May-Grunwald-Giemsa (MGG) method. This method, which appears to be the most widely used, makes it possible to stain all the cellular components present in the nasal mucosa, including immune system cells like neutrophils, eosinophils, lymphocytes and mast cells, and also foreign matter like bacteria and spores.
3. Observation: The slide containing the oil-immersed samples is observed under an optical microscope, typically with a magnification factor equal to 1000×. The diagnostic protocol suggests acquiring at least 50 cytological fields to collect a statistically valid sample, i.e., at least 200 cells overall. During the manual reading phases, the specialist compiles a specific rhino-cytological diagram, counting the number of each different type of cell found (Table 1) and also assessing the quality of the cytological preparation, gathering useful information for the final diagnosis.

Table 1. Cell types of the nasal mucosa, useful to diagnose rhinitis and nasal diseases.

Ciliated cell (Frequent): The most common cell among the nasal mucosa cells. It presents a polygonal shape characterized by three main regions: the apical portion containing the ciliary apparatus; the cell body containing most of the cytoplasm; and the caudal portion containing the basal region.
Muciparous cell (Frequent): A chalice-shaped unicellular gland with the nucleus in a basal position below the vacuoles, which gives it the typical shape.
Ribbed cell (Frequent): A column-shaped cell with a nucleus located toward the lower pole.
Basal cell (Frequent): A small cell with electrodense cytoplasm and a big, hyperchromatic nucleus. It adheres to the basal membrane.
Neutrophil (Normal): A granulocyte with a polylobate nucleus and finely colored grains.
Eosinophil (Normal): A granulocyte with an often bilobed nucleus and orange acidophile grains.
Lymphocyte (Rare): The kidney-shaped nucleus occupies most of the cell, surrounded by a thin light blue cytoplasmatic portion.
Mast cell (Rare): A granulocyte with an ovular nucleus, usually covered with purple basophil grains.

Cytocentrifugation
An alternative technique to SM is CYT. It consists of diluting the previously collected sample in phosphate-buffered saline or in dithiothreitol and, once the right cell concentration is established with a nephelometer (Figure 1), processing the samples with a cytological centrifuge [14] like that in Figure 2.

During the centrifugation process, the diluted sample is transferred to a microscope slide placed into a cytofunnel (Figure 3); the centrifugal force exerted by rotation, along with the hydraulic force, allows cells to settle in a single-celled layer with a concentric distribution in a restricted area of the slide. The cells are then stained with the MGG method. The staining process is not affected by centrifugation since it is executed after the cells are centrifuged; for this reason, the centrifugation process is agnostic to the staining and sampling techniques used.
Cytological centrifugation is considered the standard method to let cells settle on the microscope slide. Studies on cell integrity have proven centrifugation to be suitable for the majority of cases, although, compared to the SM technique, CYT usually yields a lower number of samples [32]. It is also possible that the forces exerted on cells during the centrifugation process alter their morphological features, affecting the overall quality of the preparation. An additional qualitative parameter is the density of cells: too high a number would imply thick, stratified layers, further complicating the observation phase due to overlapping cells [32]. Despite a more complex preparation process, the CYT technique seems to dramatically reduce the microscope reading time, cutting one of the most effortful activities for the specialist, while still being clinically equivalent to SM processing [14].

Cell Detection and Segmentation
Cell detection frequently represents a requirement in a CAD application for automated image analysis. The task is mainly carried out in the preliminary phases; indeed, after cells have been detected they can be used to extract densitometric, morphometric and structural features, valuable for subsequent analysis.
Difficulty in obtaining a robust and accurate result is mainly due to the complexity of the biological information coded into images [33]. Microscope images are often characterized by the presence of artifacts produced during the acquisition process (noise, chromatic aberration, blurring, or non-uniform illumination). Information quality can also be affected by some features of the cytological preparation, such as layer thickness, staining quality and the presence of agglomerations [34]. In addition, cells generally present poorly defined borders [33] and, in the case of agglomerations, partially overlap each other [35].
Based on the detail level required by the domain, it is possible to obtain the cell position with different options: without defining the borders (marker seeking), approximating the borders (cell detection) or delimiting the borders (cell segmentation) [35]. Marker seeking techniques are used to support tasks such as cell counting and tracking, or as preliminary steps of segmentation.
Among several segmentation techniques, thresholding is the simplest. Assuming that the distributions of the intensity values of cell and background pixels are sufficiently distinct, the image is converted to a binary one using a global or a local threshold, determined automatically or a priori [35]. The watershed transform is a widely used region accumulation method. It is based on iteratively adding connected pixels, starting from regional minima, in order to create labeled regions that identify the cells. Because of intensity variation between image background and foreground, over-segmentation problems can often arise. The most common solution consists in the introduction of markers indicating the objects of interest, which are also the only ones from which the algorithm will start processing [35]. Segmentation can also be interpreted as a classification problem that can be solved with supervised learning methods. In this case, whether a pixel belongs to a cell region or a background region is determined with a trained model like a neural network.
One of the main issues in determining the appropriate technique is the low level of standardization. In most cases, every proposed technique is evaluated on ad-hoc-built datasets with different metrics, making any qualitative comparison very hard to carry out [35].
In the CAD system developed in our previous work, called Rhinocyt, cell extraction is performed using Otsu thresholding, the Euclidean distance transform (EDT) and the watershed transform [11].
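To make the thresholding step concrete, the following is a minimal pure-NumPy re-implementation of Otsu's method (a sketch for illustration only, not the Rhinocyt code, which also applies the EDT and watershed stages):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the intensity threshold maximizing the between-class
    variance (Otsu's method); assumes 8-bit values in 0..255."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # class-0 probability up to t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_t = mu[-1]                          # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)       # undefined at omega in {0, 1}
    return int(np.argmax(sigma_b))

# toy bimodal "field": dark background (40) and bright cells (200)
img = np.concatenate([np.full(900, 40), np.full(100, 200)])
img = img.reshape(25, 40).astype(np.uint8)
t = otsu_threshold(img)
binary = img > t  # foreground mask used as input for marker extraction
```

On a clearly bimodal image like this toy example, the threshold falls between the two modes, so the foreground mask isolates exactly the bright pixels.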


Deep Learning
Deep learning (DL) is a branch of machine learning that consists of a set of techniques and algorithms inspired by human brain processes. One of the main features of DL-based models is their hierarchical organization in layers (for example in a neural network), which allows the use of different data representations with growing levels of abstraction for the training. Each layer processes input data with a non-linear transformation so that present patterns can be identified and extracted.
Since input data (images) are structurally complex, non-linear and highly dimensional (thousands/millions of pixels) [36], DL turns out to be particularly suitable for an application domain like biomedical imaging, due to its capability to automate feature extraction, thus reducing the effort spent on feature engineering activities.
A classic architecture with fully connected (FC) layers is not scalable: the high number of parameters to be learned increases the model capacity, the required dataset size and, consequently, the training and execution time. The main limitation of this kind of model, like the multi-layer perceptron, is the lack of translation invariance: a minimal shift in the position of a pattern in the input implies that the pattern-associated weights must be learned again at different positions [36]. An additional limitation is that the input topology is not exploited. Images present a strong local structure: pixels spatially distributed in a small area are highly correlated, and the layers of an FC network learn patterns at a global level, completely discarding the information associated with local structures [36]. In order to overcome such limitations, models with a specialized architecture are needed. The convolutional neural network (CNN) is one of the most-used architectures in tasks like cell detection, cell segmentation and cell classification, achieving good results [37].
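The weight-sharing argument above can be illustrated with a naive 2-D convolution: a single shared kernel responds to the same pattern wherever it appears, so nothing has to be re-learned per position (toy sketch, not a trained network):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D cross-correlation: one shared kernel is slid
    over the image, detecting a pattern at every position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

kernel = np.ones((2, 2))                         # toy "blob" detector
img_a = np.zeros((5, 5)); img_a[0:2, 0:2] = 1    # pattern at top-left
img_b = np.zeros((5, 5)); img_b[2:4, 2:4] = 1    # same pattern, shifted
ra = conv2d_valid(img_a, kernel)
rb = conv2d_valid(img_b, kernel)
# the peak response is identical in both maps, only translated
```

This is exactly the equivariance an FC layer lacks: with fully connected weights, the shifted pattern would activate a different set of parameters entirely.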
In order to limit overfitting, DL-based models require a training phase run on large datasets [24]. However, in a biomedical context, it is not always possible to easily acquire samples, and the labeling process must be carried out by specialized doctors, who are seldom available. Image augmentation techniques make it possible to increase the dataset size simply by applying to the initial images a set of geometric transformations like rotation, translation and mirroring, generating new artificial instances. After training on an augmented dataset, the model turns out to be improved in terms of generalization, being able to learn features invariant under such transformations [24]. Additional problems are linked to imbalanced datasets: majority classes representing the most common cells are often accompanied by numerous minority classes linked to pathological situations. A common solution to dataset imbalance is artificial re-sampling: under-sampling for the majority classes and over-sampling for the minority ones.
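The geometric transformations mentioned above (rotations and mirroring) can be sketched in a few lines of NumPy; this is a generic illustration, not the augmentation pipeline used in the cited works:

```python
import numpy as np

def augment(image):
    """Generate artificial variants of a cell image via the geometric
    transformations discussed above: 90-degree rotations and mirroring."""
    variants = []
    for k in range(4):                   # rotations by 0, 90, 180, 270 degrees
        rot = np.rot90(image, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))  # mirrored copy of each rotation
    return variants

cell = np.arange(16).reshape(4, 4)       # stand-in for a cell image
aug = augment(cell)                      # 8 variants per input image
```

Each labeled cell thus yields eight training instances (four rotations, each with a mirrored twin), all sharing the original label.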

Sampling
The nasal mucosa specimens used in this study were collected by the specialists through nasal scraping from 34 different patients at the Otolaryngology Department of the University of Bari. Fourteen specimens were processed with direct smear and 20 specimens were diluted with PapSpin Collection Fluid. Seven specimens each for SM and CYT were considered of good quality and suitable for experimentation by the specialists. Once the right cell concentration was established with a nephelometer (Figure 1), the specimens were processed with a cytocentrifuge (Figure 2). All samples were air-dried and stained with MGG stain.

Data Acquisition
All the slides were observed using the Proway XSZPW208 T optical microscope with a 100× lens, for an overall 1000× magnification factor. The DCE-PW300 3MP digital camera was used to acquire digital field images as 1024 × 768 px PNG files. The resolution and file format were chosen to keep the computational overhead reasonable while avoiding the noise artifacts introduced by lossy compression. As suggested by the protocol proposed by Heffler et al. [31], more than 50 fields were considered for each slide in order to acquire about 200 cells.
Two different datasets were created from the acquired images: the FIELDS dataset contains raw field images without annotations; the CELLS dataset contains images of isolated cells (extracted from FIELDS) annotated with their respective labels.

Quantitative Differential Analysis
Visual interpretation of cellular features using microscopy, i.e., image-based cytometry, plays a fundamental role in cytology and histopathology diagnosis activities [38]. The human visual system is able to qualitatively detect and interpret visual patterns with great efficiency [39,40]. A subjective evaluation is a reliable and accurate evaluation methodology but requires a great amount of time and effort, impractical in real-life large-scale applications and unachievable in automated ones. Moreover, subjective evaluation could be further complicated by reproduction devices, environmental conditions, inter-observer variance, the observer's visual abilities and cognitive biases. In order to remedy these issues, a quantitative approach through an objective evaluation should be used. Using mathematical models and metrics, biological entities and phenomena can be described in less fuzzy terms, enabling quantitative, unbiased, reproducible and large-scale analysis [41].
Using a digital image as the data source, almost all the biological information is implicitly encoded at different levels of abstraction. In order to describe the object of interest in a formal and quantitative way, it is necessary to execute a feature extraction process [38]. A feature is a numeric quantity used to describe an image (or a restricted patch) at a high level of abstraction [42]. Usually, visual features are associated with the essential constructs of visual language like points, lines, shapes, colors, textures and patterns. According to the terminology proposed by Rodenacker et al. [38], a feature is denoted as morphometric if it describes only spatial relations, or densitometric if it describes only pixel intensity levels, while a feature that combines both aspects is a structural feature.
A different categorization proposed by Kothari et al. is based on the data-information-knowledge pyramid model [34]. Pixel-level features are the lowest in the hierarchy because the computation is based on pixel intensity data and they convey the lowest biological interpretability value [34]. They are the easiest to extract and in some cases are satisfactory for content description. Object-level features convey a higher degree of biological information because they describe shapes, textures and spatial distribution of cellular structures and substructures. Since the extraction is carried out at object level, the entire process is highly dependent on the accuracy of detection or segmentation of the target objects, by which the robustness of consequent analysis is affected [34]. Semantic-level features are the highest in the hierarchy because they tend to reduce the semantic gap between pixel data and biological concepts directly interpretable by specialists. The features' values are based on classification or statistical rules computed on lower-level features [34].
The selection of the right set of features has a great impact on the analysis quality; as a rule of thumb, two main aspects should be considered:
• Interpretability: A feature should describe the encoded visual information in a way that is easily interpretable by the final observers. The description should be based as much as possible on human visual perception principles.
• Robustness: Visual information encoded in an image shows a high degree of variance. A robust feature should have properties like equivariance/invariance to illumination levels, noise, scale, orientation or translation.
Among the features known in the literature [38], those that best describe the visual content and discriminate the investigated processing techniques, i.e., SM and CYT, have been chosen for this study. The workflow depicted in Figure 4 describes two main types of analysis based on the region of interest (ROI) scale: global analysis, with the ROI defined on the entire field image, captures intercellular relations; local analysis, restricting the ROI to the area occupied by a single cell, captures intracellular features. The main goal of the quantitative differential analysis is to measure the differences between the empirical feature distributions of fields processed with SM relative to those processed with CYT.
To detect cellular structures, the algorithm implemented by the Rhinocyt extraction module [11] has been used. As discussed in Section 2.3, the over-segmentation problem has been solved by using the watershed transform augmented with additional markers extracted by means of the Otsu segmentation and the EDT.

Figure 4. General workflow used in the quantitative differential analysis. Steps reported within highlighted boxes are optional. A sampling of the images in the dataset is carried out as the first step. Next, the ROI is defined based on the type of analysis (global or local); then, depending on the specific selected feature, a preprocessing step and a segmentation step may follow. For each technique and each image, a feature is extracted. The result is an empirical distribution: statistical descriptors are computed to assess the similarities/differences between SM and CYT.

Density
Conceptually, field density is defined as the number of cytological specimens contained in the image field. The field density is based on a simple pixel counting process. The most important thing to notice is that field density takes into account not only the pixels associated with cells but also artifacts like dust, clots, scratches and stain spots (see Figure 5). The presence of artifacts in field images is marginal; thus, their quantitative contribution to the density measure is negligible. Moreover, field density tries to convey information associated with the specialist observation process, as the specialist's effort is directly proportional to the field density.
To extract the field density, the field RGB image has been processed by the filtering stage of the mean-shift algorithm. Using a weighted average of pixel neighbors (spatial window: 21 px; color window: 51 px), this step acts as a low-pass filter, dampening color shades and attenuating high-frequency components of pixel colors. The smoothed image has been converted to grayscale to discard any color information, binarized using Otsu's thresholding method and dilated by a morphological dilation operation with a disk structuring element of 5-px radius to fill small holes. The output binary image has then been used to compute a normalized histogram; the field density and, in a complementary way, the field sparsity, are defined as in Equation (1):

density(I) = H(I′, 1) / (N × M),  sparsity(I) = H(I′, 0) / (N × M) = 1 − density(I),  (1)

where I and I′ are the input and output images with size N × M, respectively, and H(I′, 0) and H(I′, 1) are the absolute frequencies of background and foreground pixels, respectively. Field density is a densitometric pixel-level feature because the computation is mainly based on an image histogram that does not encode any spatial information. In this way, the feature is orientation and translation invariant and possesses a certain degree of robustness to illumination variance, since a binarized image is used.
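Given the binary mask produced by the preprocessing above, field density reduces to a normalized pixel count; a minimal NumPy sketch (illustrative only, not the Rhinocyt implementation):

```python
import numpy as np

def field_density(binary):
    """Fraction of foreground pixels in a binarized field image,
    i.e., the normalized count of Equation (1); sparsity is the
    complement of density."""
    n_pixels = binary.size                        # N * M
    density = np.count_nonzero(binary) / n_pixels # foreground fraction
    sparsity = 1.0 - density                      # background fraction
    return density, sparsity

# toy binary field: the top-left quarter is foreground (25%)
mask = np.zeros((100, 100), dtype=bool)
mask[:50, :50] = True
d, s = field_density(mask)
```

Because only pixel counts are used, rotating or translating the mask leaves the result unchanged, which is the invariance property claimed above.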

Agglomeration
The agglomeration feature is defined to quantify the cellular clumping phenomenon due to the mechanical forces exerted during the centrifugation process. The cellular clumping tends to reduce intercellular distance, creating cellular agglomerates. In the worst-case scenario, cells are overlapped, meaning that field reading is more difficult [14] and automatic cell detection and extraction performance are degraded [35]. Cellular clumping is a generic phenomenon that could be addressed from multiple points of view; the definition proposed here is based on the construction of a cellular network that reflects the cellular spatial distribution patterns, highlighting the connectivity among the image regions [43].
Since feature extraction is based on the spatial position of cells in the field, agglomeration is an object-level feature. The Rhinocyt extraction module has been used for cell detection. After the cellular structures have been detected, the associated image regions are used to compute a set of centroids C = {c1, …, cn}; each centroid is in a one-to-one relation with a cell. From the centroid information, an unweighted undirected graph G(V, E), called a region adjacency graph (RAG), is built. Each vertex vk ∈ V is associated with a centroid ck ∈ C, and an edge e ∈ E is defined for each pair of adjacent regions (see Figure 6). Two regions Ri and Rj are considered adjacent if Ri ∪ Rj is a connected set under 8-connectivity. Using the RAG, the image field agglomeration α is defined as the ratio between the number of edges and the number of vertices:

α = |E| / |V|    (2)

From Equation (2), it is clear that α has a lower bound (0 ≤ α), attained when the RAG has only isolated nodes (i.e., E = ∅). On the other hand, there is no finite analytical upper bound (α < ∞).
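Assuming the reconstructed definition α = |E| / |V|, the RAG-based computation can be sketched directly on a labeled region image. The `agglomeration` helper below is illustrative, not the Rhinocyt implementation: it derives adjacency from touching labeled pixels rather than from an explicit centroid graph.

```python
import numpy as np

def agglomeration(labels):
    """Agglomeration of a field from its labeled cell regions.

    `labels` is an integer image: 0 is background, each positive value
    identifies one detected cell. Two regions are adjacent when they
    touch under 8-connectivity; alpha = |E| / |V|, so a field of only
    isolated cells gives alpha = 0.
    """
    h, w = labels.shape
    vertices = set(np.unique(labels).tolist()) - {0}
    edges = set()
    # Four offsets cover every 8-connected neighbor pair exactly once
    # (the graph is undirected).
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        a = labels[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
        b = labels[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
        mask = (a != b) & (a > 0) & (b > 0)
        for p, q in zip(a[mask].tolist(), b[mask].tolist()):
            edges.add((min(p, q), max(p, q)))
    return len(edges) / len(vertices) if vertices else 0.0
```

For example, a field with two touching regions and one isolated cell has |V| = 3 and |E| = 1, hence α = 1/3, matching the interpretation that α reflects the proportion of agglomerated cells.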
Particular attention must be paid to feature interpretation: two fields with different numbers of cells could have a similar agglomeration value if they have, in proportion, the same number of agglomerated cells. Agglomeration is a morphometric feature since the computation is mainly based on the RAG which, differently from the original approach [43], encodes only spatial information without any chromatic content. Moreover, the feature is computed on a grayscale version of the input image and is therefore robust to color variation.


Rhinocyt System Evaluation
The same methodology used in the work of Dimauro et al. has been chosen to preserve analysis backward compatibility and validity [11]. The evaluation of the Rhinocyt system has been carried out separately for extraction and classification modules. Common evaluation metrics found in machine learning literature have been employed, namely, precision, recall, F-measure and confusion matrix. In both cases, the evaluation has been carried out as a supervised task requiring a manual labeling process on the selected acquired slides.

Extraction Process Evaluation
The extraction process of cells from fields can be seen as an object detection task: extracted image regions are detected cells, while discarded regions are unrecognized cells. The 85 most-populated CYT fields have been sampled from multiple slides in order to evaluate the extraction module. A total of 449 "objects" have been automatically extracted from these 85 fields by the extraction module. The extracted objects have been analyzed and verified manually by a specialist considering: true positives, extracted objects that were real cells; false positives, extracted objects that were not real cells; false negatives, discarded objects that were real cells; and true negatives, discarded objects that were not real cells. The last category is the most difficult to define, since the "opposite of a cell" could be any object, such as a smudge, a clot or an exploded cell; therefore, only visually identifiable "objects" lying farther than an empirical threshold of 100 px from each other have been counted as true negatives.
The specialist validation was conducted using the following principles:
• If the cell area is occluded more than 75%, the cell is not taken into account.
• If the cell area is cut by the field image borders, the cell is not taken into account.
• If a single cell is detected in multiple regions for more than 50% of the cell area, it is considered a true positive. The same cell images are then manually merged to be counted only once. An example of this phenomenon, denoted the "splitting problem", is depicted in Figure 7.
A further analysis of the extraction module algorithm explains the causes behind the splitting problem. To detect the watershed transform markers, the extraction algorithm uses a maximum filter with a local window to compute local maxima. The window size is a fixed predefined parameter that does not adapt well to different cell sizes. If the cell area is much greater than the local window, multiple markers will be defined, producing an over-segmentation of the cell. The splitting problem mainly occurs in the case of ciliated cells, which have an elongated morphology. From a clinical perspective, the number of ciliated cells does not influence the diagnosis, as they are normally present throughout the nasal mucosa, so this problem is not considered crucial.

Classification Evaluation
The cell classification experimented with here is fully based on the cell classification module as designed and trained in [11], where the experimentation took into consideration many stained slides from different patients. The Rhinocyt classification module is based on a convolutional neural network (CNN) pre-trained on the CELLS-SM dataset [11,12]. The model architecture is a vanilla CNN: an input layer of 50 px × 50 px full-color images, three blocks of convolutional and pooling layers (with ReLU activation functions) for feature extraction, and a fully connected layer followed by a softmax output layer for multiclass classification. Each input image is classified into a prefixed set of seven classes (see Table 2 for details). The class "other" was defined for extracted objects not recognized as cells. The epithelial class merges two different cytotypes: metaplastic and ciliated cells.
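Since the excerpt specifies only the block count and input size, the spatial bookkeeping of such an architecture can be traced with a small helper. The 3 × 3 kernels and 2 × 2 pooling below are assumptions for illustration, not the paper's actual hyperparameters.

```python
def feature_map_sizes(input_size=50, blocks=3, kernel=3, pool=2):
    """Trace the spatial size of feature maps through conv+pool blocks.

    Assumes 'valid' convolutions (no padding) and non-overlapping max
    pooling, which are common defaults for small vanilla CNNs.
    """
    sizes = [input_size]
    s = input_size
    for _ in range(blocks):
        s = s - kernel + 1   # a valid convolution shrinks by kernel - 1
        s = s // pool        # pooling divides by the pool size
        sizes.append(s)
    return sizes
```

Under these assumptions, a 50-px input shrinks block by block (`feature_map_sizes()` returns `[50, 24, 11, 4]`), leaving a small map to flatten into the fully connected layer before the softmax output.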

Density and Agglomeration
For both processing techniques, 324 image fields were sampled from the FIELDS dataset. The diagnostic protocol designed and used by the Italian Academy of Nasal Cytology provides for the acquisition of 50 fields from SM slides; therefore, an initial 350 fields (50 fields × 7 slides) were acquired. Of these 350 SM fields, 26 contained an excessive quantity of cell clusters that were difficult to separate even during direct observation (and thus useless also for doctors); accordingly, the CYT fields were also reduced to 324 by randomly excluding 26 of them. For each image, the density and agglomeration features were extracted, obtaining two empirical distributions from which multiple descriptive statistics were computed; these are reported in Table 3.


As can be seen in Figures 8-11, these distributions have a positive skewness towards lower values. In Figure 8, it is visible how the FIELDS-CYT distribution is affected by the presence of outliers.
To increase analysis robustness, as suggested by Leys et al. [44], the median was chosen as the centrality index, and the median absolute deviation (MAD) and interquartile range (IQR) as dispersion indices. Figure 10 reports the density distributions obtained for both the CYT and SM techniques. The CYT distribution has a power-law-like shape; in fact, 75% of the fields have a density lower than 10%, whereas the SM density values are more uniformly distributed around the median, between 20% and 40% density. Such evidence is supported by the Q3, MAD and IQR values reported in Table 3. Figure 12 relates the average density and average sparsity for both the SM and CYT techniques. The lower average density of 2.97% for CYT, compared with 29.9% for SM, proves that the cytocentrifugation process leads to a cellular population loss one order of magnitude greater than the direct smear technique. However, as can be seen in Figure 8, the presence of outliers in the CYT distribution does not exclude the chance of high-density fields.
Special attention is needed to interpret the results obtained for the agglomeration feature. As already described in Section 3.2, agglomeration is an object-level feature; hence, it is necessary to execute a cell detection process, implying that errors arising at this stage affect the robustness of the extracted feature. For the reasons explained in Section 3.2, the "splitting problem" has slightly altered the agglomeration values, since a single cell (mostly ciliated) can be detected as multiple adjacent regions and counted multiple times, forming a "fake" agglomerate.
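The robust indices used above can be computed in a few lines; this is a generic sketch, not tied to the values in Table 3.

```python
import numpy as np

def robust_stats(x):
    """Median (centrality), MAD and IQR (dispersion) of a sample.

    These indices resist the skewness and outliers observed in the
    CYT field distributions, unlike the mean and standard deviation.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))    # median absolute deviation
    q1, q3 = np.percentile(x, [25, 75])
    return med, mad, q3 - q1            # IQR = Q3 - Q1
```

For example, on the sample [1, 2, 3, 4, 100] the median, MAD and IQR are 3, 1 and 2: a single extreme value barely moves them, whereas it would shift the mean dramatically.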
In Figures 9 and 11, it is noticeable how different the distributions are: the SM distribution follows a Gaussian bell-like shape, while the CYT distribution has a strong visible peak at α = 0. This peak is mainly due to generally low density values rather than to the separation effectiveness of the cytocentrifugation process. Figure 13 reports some examples of SM and CYT fields.

Correlation
Intuitively, agglomeration could be confused with density. Even though both features describe the cell distribution, the former encodes the cellular spatial relations, whereas the latter does not encode any. Despite this, the two features are conceptually tightly coupled; in fact, higher density values are evidence of a reduction of intercellular space, leading to a higher likelihood that two or more cells are agglomerated. To assess the statistical correlation between density and agglomeration, Spearman's rank correlation coefficient was computed on the SM and CYT distributions, since they are skewed and affected by outliers. For the SM technique, a positive correlation has been observed (ρs = 0.68, n = 324, p < 0.05); the same applies to the CYT technique (ρs = 0.72, n = 324, p < 0.05).
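As a sketch of this test, `scipy.stats.spearmanr` returns the rank correlation coefficient and its p-value. The per-field values below are hypothetical, for illustration only, not the paper's measurements.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-field feature values: density and agglomeration
# tend to increase together, as observed for the SM and CYT slides.
density = np.array([0.02, 0.05, 0.08, 0.15, 0.22, 0.30, 0.35])
agglom = np.array([0.00, 0.00, 0.10, 0.20, 0.40, 0.35, 0.60])

# Spearman's coefficient works on ranks, so it tolerates skewed,
# outlier-prone samples better than Pearson's coefficient.
rho, p = spearmanr(density, agglom)
```

Since the second sample increases almost monotonically with the first, rho comes out close to 1 with a small p-value, mirroring the positive correlations reported above.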


System Performance
Two different performance assessments were carried out on the Rhino-cyt system: the first for the extraction module and the second for the classification module.

Extraction
Overall, 85 CYT fields were presented as input to the extraction module; from each execution, a confusion matrix was computed. The resulting matrices were summed into a single "global" confusion matrix, reported in Table 4. Since, in this operative context, the primary goal is to quantify the identified cells, the recall metric was used. The presence of false positives has limited significance because, in the classification stage, all falsely extracted cells should be classified with the "other" label; hence, the precision metric was not computed. The recall value is equal to 52.3%. This value is lower than the 97.7% recall reported in the work of Dimauro et al. [11], computed on 1639 objects extracted from 50 SM fields. This low recall must be interpreted considering the low density and low agglomeration values of the CYT fields.
In contrast to what was supposed in Hypothesis 1, the extraction module performance on CYT fields has deteriorated. The causes mainly lie in two factors:
• Low field density: 75% of the fields have a density value of less than 10%, implying that each field contains an average of 2-3 cells.
• Outliers: the CYT distribution is affected by outlier samples (Figure 8) that have a density value of over 35% and agglomeration values different from 0; these two quantities suggest the presence of fields with large agglomerates. These agglomerates are detected by the extraction module as a single region and discarded, since the region size is greater than a fixed empirical threshold; all the cells contained in the agglomerates are thus unavoidably lost.
These two factors counterbalance the low true positive count with a high false negative count.
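Once the per-field confusion matrices are summed, the metric bookkeeping described above reduces to a few lines. The counts in the usage example are hypothetical, not the values of Table 4.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F-measure from aggregated detection counts.

    Recall = TP / (TP + FN) quantifies how many real cells were found,
    which is the primary goal in this operative context.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Hypothetical aggregated counts, for illustration only.
precision, recall, f = detection_metrics(tp=50, fp=10, fn=30)
```

A high false negative count drags recall down even when precision stays high, which is exactly the pattern produced by the two factors listed above.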

Classification
To evaluate the classification module, 1990 cells were sampled from the CELLS-CYT dataset and presented as input to the classifier. The prediction errors are reported as a multiclass confusion matrix in Table 5. From Table 6, it is noticeable that the classifier was not able to correctly predict the samples belonging to the mast cell, lymphocyte and muciparous classes. As reported in our previous paper [11], the model was trained with a dataset lacking data for those classes; thus, the result for these classes could be attributable to underfitting. The misclassification of the eosinophil class is interesting: 65% of eosinophils are wrongly classified as neutrophils. The most plausible reason is that the two cytotypes are morphologically similar and stained similarly. From Table 5, it is clearly visible that the most prominent class of the dataset, namely epithelial cells, is affected by severe errors: 49% of them are misclassified as neutrophils. Different results were obtained in the work of Dimauro et al. on 1006 cell images extracted from slides processed with SM [11].

Conclusions
Through the quantitative differential analysis presented in this paper, it was possible to verify how the cells' spatial distribution differs according to the processing technique used. Based on the obtained outcomes, cytocentrifugation seems to be discouraged as an alternative to the "direct smear" processing technique. The biggest impact of cytocentrifugation appears to be the lower density and loss of cellular population: to analyze the same number of cells, the specialist must acquire many more fields than the minimum of 50 suggested by the protocol.
Furthermore, in addition to a more elaborate slide-preparation procedure, the high cost of the cytocentrifuge and the better performance of the Rhinocyt CAD on SM rather than CYT slides, there is an increased effort for field acquisition by the specialist during the clinical examination.
It seems that the same CAD system cannot be used without ad hoc modifications. Since the evidence suggests that it is impossible to use a single model trained on a single dataset without performance degradation, re-training the classifier neural network on the FIELDS-CYT dataset, or on a new dataset merging FIELDS-CYT and FIELDS-SM, could help to improve classifier performance. It has to be considered that, if the patients of this study differ from those used for model training, any inter-patient variation between the training and testing data may have negatively influenced the predictions on the CYT slides.
An in-depth local cytometric analysis for each cytotype could offer new insights into the differences between the techniques. High-density agglomerated fields involve a partial occlusion of cellular structures and less defined cellular edges, factors that negatively impact the performance of the extraction module. Cellular agglomerations are mitigated mainly by the low density of the fields, rather than by the separation forces exerted by cytocentrifugation.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.