Using Super-Resolution for Enhancing Visual Perception and Segmentation Performance in Veterinary Cytology

The primary objective of this research was to enhance the quality of semantic segmentation in cytology images by incorporating super-resolution (SR) architectures. An additional contribution was the development of a novel dataset aimed at improving imaging quality in the presence of inaccurate focus. Our experimental results demonstrate that the integration of SR techniques into the segmentation pipeline can lead to a significant improvement of up to 25% in the mean average precision (mAP) metric. These findings suggest that leveraging SR architectures holds great promise for advancing the state-of-the-art in cytology image analysis.


Introduction, Motivation
In recent years, deep learning-based solutions have emerged as a prominent topic in the field of Information Technology. Novel approaches are being developed and implemented daily to optimize, enhance, and facilitate various aspects of our lives. This growing trend is also evident in the medical domain, including veterinary medicine. It is important to note that deploying models for healthcare applications entails significant responsibility and necessitates rigorous testing and monitoring to mitigate any risks associated with artificial intelligence (AI) predictions.
Our research team is dedicated to developing solutions that assist veterinarians in making faster and more accurate diagnoses for their animal patients. Our prior work has focused on age classification [8], object segmentation [5] and detection [4] in the context of cytology imaging for canines. In this study, we continue our investigation into AI applications in the veterinary field, specifically exploring the combination of Super Resolution (SR) and Semantic Segmentation techniques. By building upon previous research, we aim to further advance the state of the art in veterinary image analysis and improve diagnostic outcomes.

State of the art, Reason for conducting the research
The acquisition of cytology images is a multifaceted process that involves the preparation of tissue samples using staining methods such as Diff-Quik, followed by the selection of suitable areas by a veterinary expert and image capture via a microscope-mounted camera. This study aims to address the challenges associated with obtaining images of inadequate focus or suboptimal quality for examination purposes.
Research Questions:
1. Can deep learning-based architectures enhance the quality and resolution of cytology images, thereby facilitating improved image quality assessments?
2. To what extent can such enhancements aid pathologists in diagnosing challenging or average-quality cases?
3. Does the improvement of image quality augment the performance of semantic segmentation architectures in detecting carcinogenic cells within preparations?
4. How can a balance be struck between the varying perceptions of machine learning models, metrics, and human evaluators in determining image quality to achieve consistent and reliable results?
Dual Super-Resolution Learning for Semantic Segmentation [11] In the presented research, the authors proposed a two-way framework aimed at enhancing segmentation accuracy without incurring additional computational expenses. Prior investigations have demonstrated that diminishing the resolution of images leads to a decrease in segmentation quality. To address the high training costs associated with larger input sizes, the authors propose integrating super-resolution techniques into semantic segmentation tasks.
The devised architecture can perform segmentation and super resolution in an end-to-end training paradigm. This approach has been shown to achieve higher mean Intersection over Union (IoU) metric values while simultaneously requiring less computational power.
How Effective Is Super-Resolution to Improve Dense Labelling of Coarse Resolution Imagery? [9] The findings presented in this paper demonstrate that employing super-resolution (SR) techniques effectively enhances semantic segmentation performance. While the proposed approach surpasses conventional interpolation methods, it does not exceed the performance of the original high-resolution data. The developed pipeline features a straightforward design, consisting of two independent modules. The segmentation network receives input from the super-resolved data.
For the selected datasets, it was observed that as the level of degradation increased, the improvement in the Intersection over Union (IoU) metric became more substantial, due to the implementation of the super-resolution module.
Simultaneous Super-Resolution and Segmentation Using a Generative Adversarial Network: Application to Neonatal Brain MRI [10] Super resolution is frequently used in the preprocessing of neonatal brain MRI data, owing to the lower resolution of images acquired during examinations. Typically, this process involves a simple upscaling of the image followed by the application of segmentation models.
In this study, the authors introduce a unified solution that leverages a generative adversarial network (GAN). Experimental results demonstrate improved segmentation performance using the proposed method. Furthermore, the authors highlight potential future directions and applications within the field of medical image processing.

Contribution, new algorithm, constructed system
The primary contribution of this study is the incorporation of a Super Resolution module into the machine learning pipeline, with the aim of enhancing the accuracy of segmentation models (Figure 1). This potential application for improving image quality emerged as a result of various distortions that may occur during the acquisition of cytological preparation images (Table 1). Our research is specifically focused on addressing poor sharpness distortions.
The objective of this study is to develop a machine learning model capable of enhancing the quality of images affected by improper focus settings on the microscope's adjustment knob. To evaluate the effectiveness of this approach, the enhanced images will be compared to properly created images. The development of a dedicated dataset is a prerequisite for this evaluation [2]. In the context of veterinary examinations, an animal patient undergoes evaluation when visible skin alterations are observed. A tissue sample is subsequently obtained and examined under a microscope, during which an image is generated. In instances where the image quality is suboptimal or the microscope lens focus is improperly set, the decision block (a binary image classifier) routes the image to the Super Resolution model. Following this enhancement process, the segmentation model identifies objects within the image, and a diagnosis is proposed.
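The control flow of the described pipeline can be sketched as follows. This is a minimal illustration of the routing logic only; all function names and model objects are hypothetical placeholders, not the system's actual implementation.

```python
# Sketch of the proposed pipeline (Figure 1): decision block -> optional SR -> segmentation.
# All callables here are hypothetical stand-ins, not the authors' models.

def process_sample(image, is_sharp, super_resolve, segment):
    """Route an acquired image through the pipeline.

    is_sharp:      the decision block, a binary image classifier -> bool
    super_resolve: SR model applied only to out-of-focus images
    segment:       semantic segmentation model proposing a diagnosis
    """
    if not is_sharp(image):          # focus set improperly -> enhance first
        image = super_resolve(image)
    return segment(image)

# Toy stand-ins to illustrate the control flow only:
blurry = {"sharp": False, "pixels": [0, 1]}
result = process_sample(
    blurry,
    is_sharp=lambda img: img["sharp"],
    super_resolve=lambda img: {**img, "enhanced": True},
    segment=lambda img: {"masks": [], "enhanced": img.get("enhanced", False)},
)
print(result["enhanced"])  # True: the blurry image went through the SR branch
```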

Novel dataset
The majority of datasets for Super Resolution (SR) tasks are artificially generated, employing image downscaling and interpolation techniques. However, the nature of the distortion we aim to address is distinct from these methods. Recognizing this discrepancy led to the development of an experimental dataset [2] in collaboration with a veterinary expert, which is elaborated upon in the subsequent chapter.
The following algorithm was proposed for the acquisition of samples: 1. Identify the diagnostic region within the cytological preparation.
2. Adjust the microscope lens focus to obtain a high-quality, sharp image.
3. Intentionally alter the microscope's adjustment knob to degrade the image quality and sharpness, thereby simulating the real-world distortion.
This approach enabled the generation of a dataset that more accurately represents the specific type of distortion we aim to mitigate, providing a more suitable foundation for model training and evaluation as in Figure 2.
Following this procedure, we collected 1192 high-resolution (2592 × 1944) samples together with their corresponding distorted versions.

Proposed new super resolution metric
In this study, an additional metric based on frequency analysis is employed as a novel approach to assess segmentation performance. This method involves grouping the energy computed using a 2-dimensional Discrete Fourier Transform (DFT-2D) by frequency. This operation enables the observation of the total energy within each frequency range, providing insights into how machine learning models affect high frequencies in the image, which contribute to visual sharpness. The procedure for this approach is as follows (Figure 6):
1. Open an image in YCbCr mode and use only the luminance channel.
2. Compute the 2-dimensional discrete Fourier Transform and shift the zero-frequency component to the center of the spectrum.
3. Calculate the absolute sum of all magnitudes for a chosen set of ring-shaped masks and display the results in a bar plot.
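The steps above can be sketched in numpy. The number and radii of the ring masks are illustrative assumptions, since the exact bands are defined in Figure 6 rather than in the text; the input is assumed to already be the luminance channel as a 2-D array.

```python
# Sketch of the frequency-based sharpness metric: group DFT-2D magnitude
# energy into concentric frequency rings. Ring count/radii are assumptions.
import numpy as np

def ring_spectrum(luma, n_rings=4):
    """Return total spectral energy per frequency ring for a luminance image."""
    f = np.fft.fftshift(np.fft.fft2(luma))      # step 2: DFT-2D + zero-freq shift
    mag = np.abs(f)
    h, w = luma.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - cy, xx - cx)              # distance from the DC component
    r_max = r.max()
    energies = []
    for i in range(n_rings):                    # step 3: ring-shaped masks
        lo, hi = i * r_max / n_rings, (i + 1) * r_max / n_rings
        mask = (r >= lo) & (r < hi) if i < n_rings - 1 else (r >= lo)
        energies.append(mag[mask].sum())
    return energies

# A blurred image should hold less energy in the outer (high-frequency) ring:
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, 1, 1)
           + np.roll(sharp, (1, 1), (0, 1))) / 4.0   # crude 2x2 box blur
assert ring_spectrum(blurred)[-1] < ring_spectrum(sharp)[-1]
```

In a bar plot of these energies, a sharpening model should visibly raise the outer bars relative to a blurred input.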

Research formula for specific medical use case with unknown degradation
During the course of the research, addressing the super-resolution (SR) task when the nature of degradation is unknown was a significant concern. The following strategies were assessed:
1. Use pre-trained models on various datasets.
2. Investigate known degradations, such as bicubic interpolation, on our medical dataset.
3. Develop a dedicated super-resolution dataset exhibiting the same degradation intended to be mitigated.
The most favorable results were achieved using the third approach; however, it is crucial to consider feedback from domain experts in the medical field. It was discovered that applying super-resolution to images introduced artifacts that would not typically be present in cytology images. Veterinary specialists tended to favor lower-quality images over sharper ones, as the artifacts introduced by SR models hindered the diagnostic process. One of the key conclusions drawn is that a sharper image does not necessarily equate to a superior model.

Experiments and results
This chapter provides a detailed look at the experiments conducted during this research study. Each subchapter explains the different stages of the study in a more accessible way, while still maintaining scientific accuracy.

Comparison of possible distortions in cytology imaging
Table 1 below presents a list of potential distortions that may occur during image creation by veterinary experts. These hypothetical scenarios may require the application of Super Resolution models as a preprocessing step to restore the images to their desired quality. The bad sharpness case arises when the microscope screw is set inaccurately or the focus is set on the background of the image.
This section also presents a visual analysis of various classical image upsampling techniques applied with a scale factor of 2. The objective is to evaluate the performance of each method in terms of image quality, preservation of structural details, and overall effectiveness in enhancing the resolution of the original image (Figure 7).
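For reference, two of the textbook upsampling methods in such a comparison can be implemented in a few lines of plain numpy. This is a generic sketch of nearest-neighbour and (separable) bilinear interpolation at scale factor 2, not the exact routines used in the study.

```python
# Textbook upsampling at scale factor 2, in plain numpy (illustrative only).
import numpy as np

def upsample_nearest(img, scale=2):
    # Each pixel is repeated in a scale x scale block: blocky but edge-preserving.
    return np.kron(img, np.ones((scale, scale), dtype=img.dtype))

def upsample_bilinear(img, scale=2):
    # Separable linear interpolation: interpolate rows first, then columns.
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * scale)
    xs = np.linspace(0, w - 1, w * scale)
    rows = np.stack([np.interp(xs, np.arange(w), img[i]) for i in range(h)])
    return np.stack([np.interp(ys, np.arange(h), rows[:, j])
                     for j in range(rows.shape[1])], axis=1)

img = np.array([[0.0, 1.0], [1.0, 0.0]])
print(upsample_nearest(img).shape)   # (4, 4): hard blocks of 0s and 1s
print(upsample_bilinear(img).shape)  # (4, 4): smooth gradients between pixels
```

Bicubic interpolation, also compared in the figure, fits cubic polynomials over a 4x4 neighbourhood and typically gives smoother results than either method above.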

Pretrained segmentation model approach
This section contains the results of running inference with pretrained Super Resolution models and measuring their impact on the semantic segmentation task on our cytology dataset.
In this comparative analysis, three state-of-the-art image upsampling architectures were selected for evaluation: SwinIR, BSRGAN and RealSRGAN.The goal was to assess each model's performance in terms of image quality and impact on segmentation metrics.
For segmentation evaluation, a deep learning model based on the Cascade Mask R-CNN [3] architecture was selected. A ResNeSt101 backbone [12], which employs skip connections (i.e., input values bypass the current layer without any modifications and are then summed with the modified input), was used for feature extraction. The model was initialized with weights pre-trained on the MS COCO dataset [6] and subsequently fine-tuned on cytology images.
The results presented in Table 2 reveal an expected trend. As the bicubic interpolation scaling factor increases, which corresponds to a greater loss of information, both segmentation and super-resolution metrics are negatively affected. A higher scaling factor leads to increased confusion between objects. For instance, with bicubic interpolation using a scaling factor of 5, almost no cells are accurately recognized for the two cancer types, as illustrated in Figure 8. Figure 9 presents the relationship between the segmentation Average Precision (segmAP) and Peak Signal-to-Noise Ratio (PSNR) metrics. A decreasing trend in the ratio between these two metrics is observed when the scaling factor increases. This indicates that, in some cases, a linear correlation exists between the performance metrics of these two distinct computer vision tasks. As the scaling factor increases, the quality of the image and the segmentation performance both tend to degrade.
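The PSNR metric used throughout these comparisons follows its standard definition, which can be sketched in numpy as below (the peak value of 255 assumes 8-bit images; this is a generic implementation, not the study's evaluation code).

```python
# Standard PSNR definition: 10 * log10(peak^2 / MSE), in decibels.
import numpy as np

def psnr(reference, distorted, peak=255.0):
    mse = np.mean((reference.astype(np.float64)
                   - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images have infinite PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100.0)
print(round(psnr(ref, ref + 5.0), 2))  # MSE = 25 -> 10*log10(65025/25) ≈ 34.15 dB
```

Higher PSNR means the distorted image is closer to the reference, which is why a falling PSNR under larger scaling factors tracks the falling segmentation precision.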

Impact of the pre-trained Super Resolution models on segmentation inference
In this phase, a comparative analysis is conducted to evaluate the performance of the segmentation model on the original dataset as opposed to bicubic interpolation with scaling factors of 2 and 5. The primary objective is to investigate whether employing pre-trained models can enhance the accuracy of the segmentation process. The application of various super-resolution (SR) architectures to the original data did not yield any improvements in segmentation quality. Nevertheless, the minimal loss in mean average precision (mAP) indicates that the model is proficient in identifying cancer cells that have undergone enhancement through the SR process (Table 3).
When employing image enhancement techniques for damaged data with decimation and bicubic interpolation, the results, as presented in Table 6 and Table 7, are found to be worse. This suggests that the utilization of SR introduces additional noise to the data, consequently leading to poorer performance by the segmentation model. This outcome was anticipated, given that the model was pre-trained on original data and was subsequently required to handle processed data during the inference phase.
The findings indicate that the naive application of SR models to images does not yield improvements in segmentation quality. However, it does enhance the perceptual quality of the image, as depicted in Figure 3. In certain instances, it also results in an increase in SR metrics, as demonstrated in Table 3. The obtained results were subsequently reviewed in consultation with a veterinary expert. It was determined that, although the images appeared sharper following the application of the super-resolution model, the presence of certain artifacts rendered them less reliable than the original images. The enhanced image quality did not contribute to improved diagnostic accuracy, as these artifacts introduced elements that would not typically be found in cytology images.
In this experiment, we trained and tested the segmentation model on data processed in various ways. As demonstrated in Table 4, we downsampled the images using decimation and bicubic interpolation and then upsampled them using super resolution architectures. While this research may not hold practical significance from a medical standpoint, as manipulating original data is generally discouraged, it does reveal that the optimal results for cancer cell recognition are obtained when utilizing undistorted, original data.

Dedicated data set experiments for super resolution
Ultimately, the experiments were carried out on a dedicated dataset, with the SwinIR architecture selected for training [7].
During the exploration of the dataset, the data distribution was analyzed. The histograms presented in Figure 11 showcase the PSNR values for both bicubic interpolation and our dedicated dataset, in comparison to high-resolution original data.
The spectrum of our data set is wider, and it contains images that would be considered of good quality in terms of pixel loss. In contrast, bicubic interpolation exhibits less diversity, limiting its applicability to the restoration of specific distortion types.
The dedicated dataset encompasses various forms of degradation that are likely to be encountered in cytology images.
The initial experiment exhibited a substantial improvement in the PSNR metric upon training the SwinIR model on our dataset, as displayed in Table 8. This improvement is also evident in the inferred images after training, presented in Figure 4.
The transformer model underwent training for approximately 1,000 epochs, utilizing four NVIDIA V-100 GPUs from the ACK Cyfronet Prometheus supercomputer [1]. Default parameters tailored for the classical super-resolution task were employed. The upsampling factor was set to 2. Experiments involving training on our data set demonstrated a notable improvement when compared to the bicubic data set, which is commonly utilized in super-resolution tasks.

The second experiment focused on examining the influence of the selected spectrum. We investigated whether a narrow or wide spectrum of our data set would yield superior results. Figure 10 illustrates the three distinct segments of the dataset that were employed in this experiment.
The experiment reveals that training on the widest spectrum (as illustrated in Figure 10) leads to the most favorable results for both narrow and wide spectrum test data sets, as presented in Table 9. This finding suggests that the model effectively learns to reconstruct images when the training data encompasses a diverse range of distortions with varying types and levels (Figure 5). The results depicted in the subsequent table exhibit promising outcomes. In certain methodologies, the results show improvements when compared to the low-quality dataset. For instance, with the BSRGAN architecture, employing a 4x upsampling technique followed by subsampling to the required resolution, the improvement reaches up to 25% when compared to the results that would have been obtained using a low-quality dataset. As anticipated, the most optimal results are achieved when training the segmentation on high-quality images, where the segm_mAP_75 increase is up to 64%.
An intriguing outcome of these experiments is that applying super-resolution as a first step, prior to subsampling to the desired annotation size, yields superior results compared to the inverse sequence of operations. This is primarily because subsampling as a preliminary step entails an excessive loss of information; this lost detail cannot be recovered during the subsequent upsampling, leading to compromised segmentation results. Hence, the order of these processing steps plays a critical role in maximizing data fidelity and the overall accuracy of the segmentation.
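The irreversible loss caused by subsampling first can be demonstrated with a toy aliasing example in numpy. This is a generic illustration of the information-loss argument, not the paper's experiment: decimating the finest representable pattern collapses it to a constant image that no subsequent upsampling can restore.

```python
# Toy demonstration: decimation destroys high-frequency detail irreversibly.
import numpy as np

checker = np.indices((8, 8)).sum(axis=0) % 2   # finest pattern an image can hold
decimated = checker[::2, ::2]                   # keep every second pixel

# Every surviving sample falls on the same phase of the checkerboard, so the
# pattern collapses to a constant image; upsampling cannot bring the detail back.
print(decimated.min() == decimated.max())       # True: all detail is gone
```

Applying super-resolution before subsampling avoids feeding the model an input whose detail has already been destroyed in this way.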

Conclusion
The presented research, which encompasses two fields of Computer Vision, Super Resolution and Semantic Segmentation, underscores the possibility of enhancing the quality of medical images for their interpretation and analysis. The primary challenges identified during the investigation include a scarcity of data and differing perceptions between programmers and medical experts.
The first challenge stems from the nature of the formulated problem. Restoring an image from an unknown degradation is a daunting task, particularly in medical imaging. Consequently, a unique dataset was created for this study, enabling improvements in both super-resolution metrics and human perception. The second challenge arises from the specific use case provided; veterinary experts analyze medical images differently from individuals unfamiliar with cytology. The realization that sharper, high-resolution images are not always preferable for diagnosis, due to artifacts observable after applying AI models, was not immediately evident.
The primary achievements of this research include formulating a potential procedure for addressing unknown degradation, such as incorrectly set sharpness on a microscope. The steps taken during the experimental phase could potentially be applied to other domain-specific use cases. Another valuable aspect is the identification and analysis of possible distortions in medical imaging. We facilitated a better understanding of the problem's nature and its uniqueness compared to standard Super Resolution tasks.
Finally, as expected, the substantial increase in the PSNR measure during SwinIR architecture training (Table 8) and the visual perception improvement shown in Figure 4 are noteworthy. The remarkable improvement of up to 25% in certain experimental scenarios, with respect to the segm_mAP metric, is also worth mentioning. This underscores the potential of the applied methods in enhancing the performance of image segmentation tasks in medical imaging.

Figure 1. Proposed working system scheme

Figure 2. Examples of high and low sharpness images from the data set

Table 1 describes the distortions: when the bulb is not turned on or the room where the image is created is dark, the resulting image may suffer from low contrast and poor illumination; closed aperture (responsible for the amount of the light that comes to a focus in the image plane); closed condensor (an improperly adjusted condenser, which is responsible for providing evenly distributed illumination); dark outside (the image was not directly at the lens, leading to dark edges); bad sharpness.

Figure 3. Example results of chosen architectures (cropped part of the image)

Figure 4. Comparison of results for training on dedicated and bicubic data sets

Figure 5. Example result for training on a wide spectrum

Figure 8. Number of true-positive cell detections for different scaling factors

Figure 9. Correlation between segmAP and PSNR metrics for different scaling factors

Figure 11. Comparison of histograms for data sets

Table 1. Potential Distortions in Veterinary Image Creation

Table 2. Comparison of super resolution and segmentation metrics using a pretrained segmentation model for inference

Table 3. Results for inference on the test set for the original data set

Table 4. Training semantic segmentation on interpolated and super resolution data

Table 5. End-to-end pipeline tests (BSRGAN)

Table 6. Results for inference on the test set for bicubic interpolation with factor 2 (columns: segm_mAP, avg_precision, avg_recall, PSNR, SSIM, LPIPS)

Table 7. Results for inference on the test set for bicubic interpolation with factor 5

Table 8. Dedicated and bicubic results