3.2. Quantitative Differential Analysis
Visual interpretation of cellular features using microscopy, i.e., image-based cytometry, plays a fundamental role in cytology and histopathology diagnosis activities [38]. The human visual system is able to qualitatively detect and interpret visual patterns with great efficiency [39,40]. Subjective evaluation is a reliable and accurate methodology, but it requires a great amount of time and effort, which makes it impractical in real-life large-scale applications and unachievable in automated ones. Moreover, subjective evaluation can be further complicated by reproduction devices, environmental conditions, inter-observer variance, and the observer's visual abilities and cognitive biases. To remedy these issues, a quantitative approach based on objective evaluation should be used. Using mathematical models and metrics, biological entities and phenomena can be described in less fuzzy terms, enabling quantitative, unbiased, reproducible and large-scale analysis [41].
Using a digital image as the data source, almost all the biological information is implicitly encoded at different levels of abstraction. To describe the object of interest in a formal and quantitative way, it is necessary to execute a feature extraction process [38]. A feature is a numeric quantity used to describe an image (or a restricted patch of it) at a high level of abstraction [42]. Usually, visual features are associated with the essential constructs of visual language, such as points, lines, shapes, colors, textures and patterns. According to the terminology proposed by Rodenacker et al. [38], a feature is denoted as morphometric if it describes only spatial relations and densitometric if it describes only pixel intensity levels, while a feature that combines both aspects is a structural feature.
A different categorization, proposed by Kothari et al., is based on the data–information–knowledge pyramid model [34]. Pixel-level features are the lowest in the hierarchy because their computation is based on pixel intensity data, and they convey the lowest biological interpretability value [34]. They are the easiest to extract and in some cases are satisfactory for content description. Object-level features convey a higher degree of biological information because they describe shapes, textures and the spatial distribution of cellular structures and substructures. Since the extraction is carried out at the object level, the entire process is highly dependent on the accuracy of detection or segmentation of the target objects, which in turn affects the robustness of any subsequent analysis [34]. Semantic-level features are the highest in the hierarchy because they tend to reduce the semantic gap between pixel data and biological concepts directly interpretable by specialists. Their values are based on classification or statistical rules computed on lower-level features [34].
The selection of the right set of features has a great impact on the analysis quality; as a rule of thumb, two main aspects should be considered:
Interpretability: A feature should describe the encoded visual information in a way that is easily interpretable by the final observers. The description should be based as much as possible on human visual perception principles.
Robustness: Visual information encoded in an image shows a high degree of variance. A robust feature should have properties like equivariance/invariance to illumination levels, noise, scale, orientation or translation.
Among the features known in the literature [38], those that best describe the visual content and discriminate between the investigated processing techniques, i.e., SM and CYT, have been chosen for this study. The workflow depicted in Figure 4 describes two main analysis types based on the region of interest (ROI) scale: global analysis, in which the ROI is the entire field image and intercellular relations are captured, and local analysis, in which the ROI is restricted to the area occupied by a single cell and intracellular features are captured. The main goal of quantitative differential analysis is to measure the differences between the empirical feature distributions of fields processed with SM and those processed with CYT.
To detect cellular structures, the algorithm implemented by the Rhinocyt extraction module [11] has been used. As discussed in Section 2.3, the over-segmentation problem has been solved by using the watershed transform augmented with additional markers extracted using Otsu segmentation and the EDT.
3.2.1. Density
Conceptually, field density is defined as the number of cytological specimens contained in the image field. Its computation is based on a simple pixel counting process. It is important to notice that field density takes into account not only the pixels associated with cells but also artifacts like dust, clots, scratches and stain spots (see Figure 5). The presence of artifacts in field images is marginal; thus, their quantitative contribution to the density measure is negligible. Moreover, field density tries to convey the information associated with the specialist observation process, as the specialist effort is directly proportional to the field density.
To extract the field density, the field RGB image has been processed with the filtering stage of the mean-shift algorithm. Using a weighted average of pixel neighbors (spatial window: 21 px; color window: 51 px), this step acts as a low-pass filter, dampening color shades and attenuating the high-frequency components of pixel colors. The smoothed image has been converted to grayscale to discard any color information, binarized using Otsu's thresholding method and dilated with a disk structuring element of 5-px radius to fill small holes. The resulting binary image has then been used to compute a normalized histogram; the field density and, in a complementary way, the field sparsity are defined as in Equation (1).
d(I′) = H(I′, 1) / (N × M),  s(I′) = H(I′, 0) / (N × M) = 1 − d(I′),  (1)
where I and I′ are the input and output images with size N × M, respectively, and H(I′,0) and H(I′,1) are the absolute frequencies of background and foreground pixels, respectively.
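A minimal sketch of this pipeline with OpenCV and NumPy is given below; the inverted Otsu threshold assumes bright-field images in which specimens are darker than the background, and the function name field_density is illustrative, not the Rhinocyt implementation:

```python
import cv2
import numpy as np

def field_density(field_bgr):
    """Compute field density and sparsity from a BGR field image."""
    # Mean-shift filtering stage: edge-preserving low-pass smoothing
    # (spatial window 21 px, color window 51 px, as in the text).
    smoothed = cv2.pyrMeanShiftFiltering(field_bgr, sp=21, sr=51)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
    # Otsu binarization; THRESH_BINARY_INV assumes specimens darker
    # than the bright-field background.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Dilation with a disk structuring element of 5-px radius fills small holes.
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11))
    binary = cv2.dilate(binary, disk)
    # Equation (1): density = foreground fraction, sparsity = its complement.
    density = np.count_nonzero(binary) / binary.size
    return density, 1.0 - density
```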
Field density is a densitometric pixel-level feature because the computation is mainly based on an image histogram that does not encode any spatial information. In this way, the feature is orientation and translation invariant and possesses a certain degree of robustness to illumination variance since a binarized image is used.
3.2.2. Agglomeration
The agglomeration feature is defined to quantify the cellular clumping phenomenon due to the mechanical forces exerted during the centrifugation process. Cellular clumping tends to reduce intercellular distance, creating cellular agglomerates. In the worst-case scenario, cells overlap, making field reading more difficult [14] and degrading automatic cell detection and extraction performance [35]. Cellular clumping is a generic phenomenon that could be addressed from multiple points of view; the definition proposed here is based on the construction of a cellular network that reflects the cellular spatial distribution patterns, highlighting the connectivity among the image regions [43].
Since feature extraction is based on the spatial position of cells in the field, agglomeration is an object-level feature. The Rhinocyt extraction module has been used for cell detection. As the first step, after the cellular structures have been detected, the associated image regions are used to compute a set of centroids C = {c1, …, cn}; each centroid is in a one-to-one relation with a cell. From the centroid information, an unweighted undirected graph G(V, E), called a region adjacency graph (RAG), is built. Each vertex vk ∈ V is associated with a centroid ck ∈ C, and an edge e ∈ E is defined for each pair of adjacent regions (see Figure 6). Two regions Ri and Rj are considered to be adjacent if Ri ∪ Rj is a connected set, using 8-connectivity. Using the RAG, the image field agglomeration α is defined in Equation (2).
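A plausible form of Equation (2), assuming that α is the number of RAG edges normalized by the number of detected cells (an assumption consistent with the bounds discussed next, though not necessarily the original formulation), is:

α = |E| / |V|.  (2)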
From Equation (2), it is clear that α attains its lower bound (α = 0) when the RAG has only isolated nodes (i.e., E = ∅). On the other hand, there is no clear analytical upper bound (α < ∞).
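A minimal sketch of this computation, assuming a labeled cell mask and the scikit-image RAG implementation (the per-cell normalization reflects the assumption above):

```python
from skimage.graph import RAG

def field_agglomeration(label_image):
    """Agglomeration from a labeled field image (one label per detected cell)."""
    # Region adjacency graph: one node per labeled region, one edge per pair
    # of regions whose pixels touch under 8-connectivity (connectivity=2).
    rag = RAG(label_image, connectivity=2)
    if 0 in rag:
        rag.remove_node(0)  # drop the background label, if present
    if rag.number_of_nodes() == 0:
        return 0.0
    # Assumed Equation (2): edges per cell; 0 when all cells are isolated.
    return rag.number_of_edges() / rag.number_of_nodes()
```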
Particular attention must be paid to feature interpretation: two fields with different numbers of cells could have a similar agglomeration value if they have, in proportion, the same number of agglomerated cells. Agglomeration is a morphometric feature since the computation is mainly based on the RAG that, unlike the original approach [43], encodes only spatial information without considering any pixel intensity information. Moreover, the feature is computed on a grayscale version of the input image and so is robust to color variation.
3.3. Rhinocyt System Evaluation
The same methodology used in the work of Dimauro et al. has been chosen to preserve analysis backward compatibility and validity [11]. The evaluation of the Rhinocyt system has been carried out separately for the extraction and classification modules. Common evaluation metrics found in the machine learning literature have been employed, namely, precision, recall, F-measure and confusion matrix. In both cases, the evaluation has been carried out as a supervised task requiring a manual labeling process on the selected acquired slides.
3.3.1. Extraction Process Evaluation
The extraction of cells from fields can be seen as an object detection task: extracted image regions correspond to detected cells, while discarded regions correspond to candidates not recognized as cells. To evaluate the extraction module, the 85 most-populated CYT fields have been sampled from multiple slides. A total of 449 “objects” have been automatically extracted from these 85 fields by the extraction module. The extracted objects have been analyzed and verified manually by a specialist considering: true positives, extracted objects that were real cells; false positives, extracted objects that were not real cells; false negatives, discarded objects that were real cells; and true negatives, discarded objects that were not real cells. The last category is the most difficult to define, since the “opposite of a cell” could be any object, like a smudge, a clot or an exploded cell; therefore, in the true negative count, only visually identifiable “objects” with a distance from each other greater than an empirical threshold of 100 px have been considered.
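For reference, the metrics named above follow their standard machine learning definitions in terms of these counts:

precision = TP / (TP + FP),  recall = TP / (TP + FN),  F-measure = 2 × precision × recall / (precision + recall).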
The specialist validation was conducted using the following principles:
If the cell area is occluded by more than 75%, the cell is not taken into account.
If the cell area is cut by field image borders, it is not taken into account.
If a single cell is detected in multiple regions covering more than 50% of the cell area, then it is considered a true positive. The same cell images are then manually merged so that the cell is counted only once. An example of this phenomenon, denoted as the “Splitting Problem”, is depicted in Figure 7.
A further analysis of the extraction module algorithm explains the causes behind the splitting problem presented in the previous sections. To detect the watershed transform markers, the extraction algorithm uses a maximum filter with a local window to compute local maxima. The window size is a fixed predefined parameter that does not adapt well to different cell sizes. If the cell area is much greater than the local window, multiple markers will be defined, producing an over-segmentation of the cell. The splitting problem mainly occurs with ciliated cells, which have an elongated morphology. From a clinical perspective, the number of ciliated cells does not influence the diagnosis, as they are normally present throughout the nasal mucosa, so this problem is not considered crucial.
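The marker-detection step described above can be sketched as follows with scikit-image and SciPy; this is an illustrative reconstruction rather than the Rhinocyt code, and the 41-px window is an arbitrary example of the fixed parameter in question:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def detect_cells(binary_mask, window=41):
    """Marker-based watershed over a binary cell mask."""
    # EDT: each foreground pixel gets its distance to the background,
    # so peaks sit near region centers and serve as watershed markers.
    distance = ndi.distance_transform_edt(binary_mask)
    # Fixed-size maximum-filter window: a cell much larger than the window
    # yields several local maxima, hence several markers and an
    # over-segmented ("split") cell.
    coords = peak_local_max(distance, footprint=np.ones((window, window)),
                            labels=binary_mask.astype(int))
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return watershed(-distance, markers, mask=binary_mask)
```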
3.3.2. Classification Evaluation
The cell classification experimented with here is fully based on the cell classification module as designed and trained in [11], where the experimentation was carried out on many stained slides from different patients. The Rhinocyt classification module is based on a convolutional neural network (CNN) pre-trained on the CELLS-SM dataset [11,12]. The model architecture is a vanilla CNN: an input layer of 50 px × 50 px full-color images, three blocks of convolutional and pooling layers (with ReLU activation function) for feature extraction, and a fully connected layer followed by a Softmax output layer for multiclass classification. Each input image is classified into a predefined set of seven classes (see Table 2 for details). The class “other” was defined for extracted objects not recognized as cells. The epithelial class merges two different cytotypes: metaplastic and ciliated.
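A minimal Keras sketch of such an architecture is given below; the filter counts and dense-layer width are illustrative assumptions, since the text specifies only the input size, the number of blocks and the output layer:

```python
from tensorflow.keras import layers, models

def build_classifier(num_classes=7):
    # Three convolution + pooling blocks with ReLU, then a fully connected
    # head with a Softmax output, mirroring the architecture described in
    # the text. Filter counts (32/64/128) and the dense width (128) are
    # illustrative assumptions, not the published values.
    model = models.Sequential([
        layers.Input(shape=(50, 50, 3)),  # 50 px x 50 px full-color input
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # 7-way output
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```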