A Comprehensive Analysis of Deep Neural-Based Cerebral Microbleeds Detection System

Machine learning-based systems are gaining interest in the field of medicine, mostly in medical imaging and diagnosis. In this paper, we address the problem of automatic cerebral microbleeds (CMB) detection in magnetic resonance images. The task is challenging because a true CMB is difficult to distinguish from its mimics; however, if successfully solved, it would streamline radiologists' work. To deal with this complex three-dimensional problem, we propose a machine learning approach based on a 2D Faster R-CNN network. We aimed to achieve a reliable system, i.e., one with balanced sensitivity and precision. Therefore, we have researched and analysed, among others, the impact of the way the training data are provided to the system, their pre-processing, the choice of model and its structure, and the ways of regularisation. Furthermore, we carefully analysed the network predictions and proposed an algorithm for their post-processing. The proposed approach achieved high precision (89.74%), sensitivity (92.62%), and F1 score (90.84%). The paper presents the main challenges connected with automatic cerebral microbleeds detection, their in-depth analysis and the developed system. The conducted research may significantly contribute to automatic medical diagnosis.


Introduction
The number of successful applications of machine learning algorithms is constantly growing. Unlike classic approaches, deep neural networks (DNNs) are naturally predisposed to efficiently handle vast amounts of data. They successfully cope with inaccurate or noisy data, different sizes and orientations of objects, as well as varying lighting conditions. Moreover, if these algorithms are properly selected and trained, they have a high capacity to generalise the acquired knowledge. The latter is extremely important in practical applications where we have to struggle with a variety of cases, small yet significant differences between classes, a large diversity of objects within a class, or an insufficient amount of appropriately labelled, often unbalanced, data.
This paper addresses the very important problem of cerebral microbleeds (CMB) detection in MR images. Cerebral microbleeds are small, oval, hypointense areas visible on T2*-weighted or susceptibility-weighted (SW) imaging [1,2]. They can be seen in the images due to changes in local magnetic susceptibility caused by pathologic iron accumulation, most often in perivascular macrophages, due to vasculopathy. A single microbleed is mostly from 2 to 5 mm, or even up to 10 mm, in diameter [3]. However, size is not a differentiation criterion, as it can be deceptively increased by blooming artifacts. MR images give a detailed three-dimensional view of organs and can be effectively used to detect and analyse abnormalities in them. Nonetheless, automated detection and classification of brain lesions, in particular CMBs, in 3D MR images is a challenging task due to their wide distribution within the brain, their small size compared with the whole image, and the similarity between different lesions and lesion mimics.
The paper is organised as follows: further, in this section, we present the medical aspect of cerebral microbleeds as well as related works regarding CMB detection and challenges in this field; in Section 2, we introduce the reader to the case study, our approach including algorithms and data handling, while in Section 3, we describe conducted experiments and deliver the results. Finally, in Section 4, we discuss obtained results and conclude them in Section 5.
According to [7], around 5% of the healthy population has microbleeds, but their higher occurrence may be connected with several medical conditions. The presence of CMBs is strongly correlated with cognitive dysfunction [8]. Moreover, it increases the risk of stroke recurrence [9]. However, CMBs can also be found in healthy elderly people with unknown clinical implications [9].
The most commonly used method for the detection of CMBs is Magnetic Resonance Imaging (MRI) [10]. This method creates diagnostic images without ionising radiation, relying on the natural magnetic properties of tissues. For the detection of CMBs specifically, MRI neuroimaging includes the T2* sequence or susceptibility-weighted imaging (SWI) [1,2]. Cerebral microbleeds appear on MR images as spherical signal loss (a hypointense focal area) due to the paramagnetic properties of hemosiderin. CMBs are hemosiderin deposits contained in macrophages; their high iron concentration makes them appear hypointense, because the paramagnetic properties of hemosiderin cause signal loss through susceptibility effects [11,12]. The detection of CMBs is increasing with the growing use of MRI in diagnostics; CMBs are often found incidentally alongside other pathologies.
Detection of all present cerebral microbleeds in MRI is crucial for proper diagnosis and treatment, as it is a common abnormality connected with different diseases. Despite increasing detection, there is still a lack of clear guidance and quick detection of CMB. The process of manual inspection and detection of microbleeds is very laborious and time-consuming. Automating the whole process of CMBs detection would make radiologists' work easier and faster.

Problem Statement and Related Works
The problem of CMB detection has been considered in a number of publications in recent years. Based on their analysis, several important challenges and conclusions regarding data, approaches and algorithms can be indicated.
Despite the great success of ML-based systems in the medical field, which outperform classic methods, there are many problems related to the use of these algorithms. Among others: an insufficient number of publicly available, labelled datasets; varying quality and resolution of data; uneven class balance within the datasets; still poor ability to generalise results in some cases; and inconsistent evaluation of results, hindering their analysis [13-20].
In this paper, we discuss some of these problems and try to find solutions to them in the case of CMB detection. Our extensive analysis and experiments were conducted to propose a way of synthesising a suitable DNN-based system for reliable CMB detection, one which shows high performance and generalisation ability.
The problem of data shortage is common in the analysis of medical data. In radiology, class annotations alone are hardly enough for most prediction tasks. CMB similarly requires manually annotated bounding boxes or segmentation masks, which have to be done by medical experts. Such a precise manual data annotation is not only expensive and time-consuming, but also requires data anonymisation, and still, the CMBs are labelled only by a single point. Mostly, new datasets are created for specific research carried out by research teams. They are not later published due to complicated privacy regulations.
Moreover, existing datasets are prepared by different groups with different measurement equipment and medical procedures, and also for different purposes (e.g., strictly for the needs of physicians, data analysts or ML specialists). For instance, there are differences in the labelling methodology and the examination parameters. Not only does the specification of MRI machines depend on their manufacturer, but the technician also sets the examination parameters depending on the case. Differences in patient origin are also crucial, as anatomical structure varies between populations. Therefore, although datasets may seem similar, especially to non-specialists in the medical field, it is essential to have scans as diversified as possible in order to design a data-driven decision-support system able to operate efficiently in very different conditions.
In the case of automatic detection systems, it is fairly easy to mistake microbleeds for other objects, mainly because of their small size compared to the whole image, their similarity to the background, and lesion mimics (see Figure 1). For instance, an oval cross-section through a vessel or a calcification is very similar to a CMB. The differences between microbleeds and other objects can be observed when assessing the whole MRI volume rather than a single slice.
Sometimes, it is difficult to objectively compare research results because, as mentioned earlier, there is a lack of objective benchmark databases. Besides, different metrics are used to evaluate the systems. For example, sensitivity is sometimes the only metric reported. However, it is relatively easy to obtain high sensitivity scores at the price of a large number of false positives. To avoid this, other metrics should also be provided, for instance, precision or FPavg (average number of false positives per subject). Of course, the goal is to have as high a sensitivity as possible with a low number of false positives.
With the development of ML methods, traditional methods of image processing and analysis have largely been replaced by tools based on deep neural networks.
Generally, ML-based solutions for object detection tasks may be divided into two groups: one-stage and two-stage detectors. In one-stage detectors, detecting an object and assigning it to a predefined class are done at the same time, while in the two-stage approach, these two sub-tasks are carried out separately: regions of interest (RoI) are produced first and then classified.
The most popular representative of the one-stage approach is the family of YOLO (You Only Look Once) networks; the most recent ones are YOLOv4 [21], scaled YOLO [22] and YOLOv5 [23]. Although such approaches are much faster than two-stage ones, they produce a larger number of false positives and perform significantly worse at detecting small objects. This problem is clearly visible in [24]: the YOLO detector produces dozens of false-positive CMBs for one subject; hence, another stage is needed to reduce them.
Although new and better architectures are emerging, such as EfficientDet [25] and Vision Transformer [26], the above-mentioned issues related to the one-stage approach have not been diminished.
The most popular architecture from the family of two-stage detectors is R-CNN [27,28] and its successors. The idea was to define region proposals using selective search [29], scale them to a fixed size, pass them to a CNN for feature extraction and, finally, assign them the proper category using a linear SVM classifier. The biggest issue in this approach was the detection speed. Although computational capabilities continue to grow, more efficient algorithms are more usable in everyday life. This led to an improvement called Fast R-CNN [30], which combined R-CNN with the Spatial Pyramid Pooling Network (SPPNet) [31] and did not require a fixed size of the region proposals passed to the CNN.
Another solution proposed to speed up the computation was Faster R-CNN [32]. The novelty was in generating regions of interest with the Region Proposal Network (RPN). Another interesting proposal was Feature Pyramid Networks (FPN) [33], which enable the use of the whole CNN, instead of just its top layer, for the detection task. That made it possible to achieve significantly better results. These days, the mentioned architecture is often used in object detection problems with different backbone variants.
Most of the proposed methods were based on a two-stage approach [24,36,37,40]. The first stage aimed to detect CMB candidates and was implemented in different ways, not always using neural networks; for example, the authors of [37,40] used fast radial symmetry transform (FRST). As a result, at this stage, it was possible to detect CMBs with high sensitivity, but the price for that is an enormous number of false-positives, which should be reduced in the next stage.
The challenge in 2D cerebral microbleeds detection is that CMBs are mistaken for objects, like vessels, which look similar in two-dimensional space. The features that effectively distinguish CMBs from CMB mimics become apparent when analysing the sequence of adjacent slices and different types of images from the SWI sequence. Although cerebral microbleeds are best visible in SWI, other image types can also be used to detect CMBs. While most authors [34-37,40] used only SWI, others also used Phase [24], GRE [41,42], or QSM [43]. The results reported in these papers and a comparison with our approach can be found in Section 4.
In this paper, we present the results of our efforts put into the synthesis of a cerebral microbleeds detection system. We aimed to achieve a reliable system, i.e., one characterised by both high sensitivity and precision. Therefore, we have researched and analysed, among others, the impact of the way the training data are provided to the system, their resolution, the way of input images pre-processing, the choice of model and its structure, and also the ways of regularisation. Finally, we proposed a new algorithm for the system's predictions post-processing, which enabled us to partially take into account the three-dimensional nature of the analysed problem, despite using a 2D detector.
The results of the most interesting experiments are presented in Tables 3-7. The system with the most suitable structure was compared with the results reported by other research groups (see Table 8). Its performance was also tested on a separate dataset, completely different from the data used to train, validate and test the system (see Table 7).

Materials and Methods
Although the most valuable feature of ML-based systems is their ability to efficiently extract knowledge directly from data, to make the system effective and reliable, it is crucial to provide a sufficient number of representative and well-pre-processed data selections, suitable model and accompanying learning algorithms, and finally, draw appropriate conclusions from the achieved results. In Figure 2, a pipeline illustrating the steps of the synthesis of the proposed system is shown. In the following section, they are described in detail.

Datasets
During the research, we took advantage of the cerebral microbleeds dataset collected and prepared by Medical Imaging LABoratory (MILAB) at Yonsei University and Gachon University Gil Medical Center [24]. The dataset, along with the ground-truth labels, was prepared by expert neuroradiologists using the pre-processed SWI, Phase and Magnitude images following the gold standard labelling. The details of the data annotation procedure can be found in [44].
The dataset consists of two types of MRI images:
• High in-plane resolution (HR_data): 0.5 × 0.5 mm²;
• Low in-plane resolution (LR_data): 0.8 × 0.8 mm².
The exact parameters describing the images within the dataset are gathered in Table 1. For each subject, there are three types of sequences (SWI, Phase and Magnitude), as well as corresponding labels containing the slice number and the coordinates of each microbleed. Although microbleeds are usually visible on more than one slice, the labels do not always cover all slices in which a given microbleed is visible.
To test the generalisation abilities of our system, we also used another dataset [36]. This dataset was used only for testing purposes (see Table 1). Its in-plane resolution, 0.45 × 0.45 mm², is similar to that of HR_data. The dataset was built for the work in [36] and consists of 320 subjects, but only 20 of them are publicly available. Nevertheless, such a batch of data, collected under different conditions from the data used for training and evaluation of the proposed system and used only for testing, gives higher confidence in the evaluation of the obtained results.

Figure 2. The pipeline of the proposed cerebral microbleeds detection system. The input dataset undergoes pre-processing including padding, resizing, normalisation, slice concatenation and labelling correction. Next, it goes through a deep neural network model, and all the predictions are checked in the post-processing stage. At the output, we get a bounding box with a predicted microbleed and a confidence score, supported by the specific metrics.

Data Pre-Processing
It is well known that proper data pre-processing has an important influence on the capability to properly train a model. In this case, the pre-processing stage involved a few steps.
We first selected the appropriate size of the input images and resized them accordingly. Based on a number of experiments, we decided to use images of size 512 × 512 (or 288 × 288 in the case of LR_data). The content of medical data (e.g., image shape, size, colour, contrast) is of great importance for analysis and should therefore be modified very carefully, and only if needed. In particular, the aspect ratio of the images should not be changed, because that might deform the lesions. We therefore first pad all images to a square, so that any further resizing does not deform the lesions. The influence of image size on the final results was also investigated (see Section 3.3).
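The padding step can be illustrated with a minimal sketch in plain Python. The function names, the symmetric placement of the original image and the fill value of 0 are assumptions made for illustration; the text above only states that images are padded to a square before resizing.

```python
def pad_to_square(image, fill=0):
    """Pad a 2D image (list of rows) to a square canvas.

    The original image is centred (an assumption of this sketch).
    Returns the padded image and the (dx, dy) offset by which every
    annotation coordinate has to be shifted.
    """
    height, width = len(image), len(image[0])
    side = max(height, width)
    dy = (side - height) // 2  # rows added above the image
    dx = (side - width) // 2   # columns added to the left of the image
    padded = [[fill] * side for _ in range(side)]
    for y, row in enumerate(image):
        padded[y + dy][dx:dx + width] = row
    return padded, (dx, dy)


def rescale_point(x, y, offset, old_side, new_side):
    """Map an annotated point into the padded and uniformly resized image."""
    dx, dy = offset
    scale = new_side / old_side  # same factor on both axes, so no deformation
    return (x + dx) * scale, (y + dy) * scale
```

Because the canvas is square before resizing, a single scale factor applies to both axes, which is what keeps the lesion shape intact.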
Next, the data were standardised: the image mean was subtracted from each pixel value and the result was divided by the image standard deviation.
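A minimal sketch of this per-image standardisation is given below. The function name, the use of the population standard deviation and the handling of constant images are assumptions, since the text does not specify them.

```python
import statistics


def standardise(image):
    """Subtract the image mean and divide by the image standard deviation."""
    pixels = [value for row in image for value in row]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)  # population std; an assumption
    if std == 0:
        # A constant image carries no contrast; return zeros (assumption).
        return [[0.0 for _ in row] for row in image]
    return [[(value - mean) / std for value in row] for row in image]
```

After this step each image has zero mean and unit variance, which makes intensities comparable across scans acquired with different settings.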
As mentioned earlier, microbleeds may be easily mistaken for vessels or other objects visible in the image. To distinguish a CMB from its mimics, the analysis of a few adjacent slices is essential. This is possible with a 3D model; however, in our case, we use a 2D model instead. To provide information from the adjacent slices, many configurations of input images were tested. Finally, we took advantage of the 3 DNN input channels that are usually used in computer vision applications for the red, green and blue channels. Since MRI slices are single-channel images, we decided to use each of the channels as a separate input, which lets us feed multiple images to the network at once.
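One of the tested configurations, in which the previous, current and next SWI slices occupy the three channels, can be sketched as follows. The function name and the treatment of the first and last slice (repeating the central slice) are assumptions made for this illustration.

```python
def stack_adjacent_slices(volume, k):
    """Build a 3-channel input [k-1, k, k+1] from a list of 2D slices.

    At the first or last slice the missing neighbour is replaced by the
    central slice itself (an assumption of this sketch).
    """
    last = len(volume) - 1
    previous_slice = volume[max(k - 1, 0)]
    next_slice = volume[min(k + 1, last)]
    return [previous_slice, volume[k], next_slice]
```

The result has the same channel layout as an RGB image, so a detector pretrained on colour images can consume it without architectural changes.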
Another, as it turned out, important study involved the modification of the original labels in the dataset that we used. Although microbleeds are small (up to 10 mm in diameter), they are usually visible on more than one slice. We carefully analysed the images slice by slice and noticed that their annotations are not always fully consistent.
In most cases, a microbleed was labelled in only one slice, more precisely the one in which it was most visible. In cases where CMBs were relatively big and clear, they were labelled on a few successive slices. This inspired us to slightly change the way the annotations were provided. We created two new datasets to obtain consistent labelling throughout the dataset. This was done by a machine learning specialist with prior consultation and approval from a radiologist. In the first one, HR_data_reduced, we removed part of the labels so that there was only one annotation per microbleed. In the second one, HR_data_extended, we added labels so that each CMB was labelled in each slice in which it was visible.
Furthermore, in the original database, the microbleeds were marked as single points indicating their location; we replaced these points with 20 × 20 bounding boxes centred on the given point.
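The conversion from a point label to a bounding box might look like the sketch below. The function name, the (x_min, y_min, x_max, y_max) box convention, pixel units and the optional clipping to the image border are assumptions that the text does not confirm.

```python
def point_to_box(x, y, size=20, image_side=None):
    """Return a size x size box (x_min, y_min, x_max, y_max) centred on (x, y).

    If image_side is given, the box is clipped to the image
    (an assumption of this sketch).
    """
    half = size / 2
    box = [x - half, y - half, x + half, y + half]
    if image_side is not None:
        box = [min(max(value, 0), image_side) for value in box]
    return tuple(box)
```

For example, `point_to_box(100, 50)` gives a box spanning 90 to 110 horizontally and 40 to 60 vertically.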

Model
Drawing on the experience of other authors, confirmed by our preliminary research, and given the poor performance of one-stage detectors in small object detection, we decided to use a two-stage detector. We chose the Faster R-CNN structure with the ResNet50 architecture as the feature extraction backbone, since it is widely recognised as one of the most effective structures in numerous studies, including the medical applications considered in this paper. Although two-stage detectors are more computationally demanding, they handle small object detection more effectively and produce fewer false positives, which is crucial in cerebral microbleeds detection. The scheme of Faster R-CNN is illustrated in Figure 2.
To improve the results and make them more reliable, we applied several regularisation techniques. To enlarge the training set, we applied data augmentation. As far as medical data are concerned, we should be very careful with image modifications, because some relevant data may be lost or some artifacts added. The images from the training set were randomly flipped, with a 50% chance of a horizontal or vertical flip. In addition, they were randomly rotated by between 0° and 90°, with a 30% chance. To facilitate and accelerate the training, we used transfer learning and adopted the network weights from ResNet-50-FPN pretrained on the COCO dataset.
The network was trained with Smooth L1 Loss for bounding box regression and Cross Entropy Loss for classification. It is worth remembering, however, that in our case there was only one object class. As an optimiser, we employed Stochastic Gradient Descent (SGD) with momentum, with a learning rate of 0.005 and momentum of 0.9. The weight decay was set to 0.0005 and the batch size to 2. In addition, we applied the StepLR learning rate scheduler with a step size of 4 and gamma of 0.9, which means that every 4th epoch the learning rate is multiplied by 0.9 to prevent overfitting. Based on observation, the confidence score threshold was set to 70%. Nevertheless, we also further investigated the appropriate threshold value.
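To make the effect of the scheduler concrete, the learning rate at a given epoch can be computed in closed form. The sketch below is plain Python rather than the training code itself; the function name is hypothetical and epochs are assumed to be counted from 0.

```python
def step_lr(epoch, base_lr=0.005, step_size=4, gamma=0.9):
    """Learning rate after `epoch` completed epochs under a step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    for epoch in (0, 4, 8, 12):
        print(epoch, round(step_lr(epoch), 6))
```

In PyTorch, the corresponding configuration would be expressed with `torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)` and `torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.9)`. The decay is gentle: under these settings the rate falls only from 0.005 to 0.0045 after four epochs.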
The networks were trained using the PyTorch library. All tests were performed on a computing unit equipped with: GeForce GTX 2080 Ti GPU with 8 GB memory and 32 GB RAM.

Predictions Post-Processing
To deal well with the three-dimensional problem by applying two-dimensional DNN, we proposed to apply an extra stage for post-processing of the given system's predictions. The idea is illustrated in the flowchart presented in Figure 3. The post-processing consists of two phases: verification of ground truth CMB detection and verification of false positives.
Of course, the main goal is to detect all the microbleeds within the analysed images. However, what matters is finding a CMB in any slice in which it is visible, not necessarily the one in which it was labelled. Therefore, we check whether the ground truth CMB was detected in the adjacent slices and add such detections to the True Positive Candidates. To verify this, we use an IoU (Intersection over Union) threshold of 40%, i.e., the area of overlap between the predicted and ground truth bounding boxes must be at least 40% of the area of their union. Finally, we eliminate all the duplicates.
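For reference, the IoU of two axis-aligned boxes can be computed as in the following sketch. The (x_min, y_min, x_max, y_max) box convention and the function name are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0
```

As an illustration, two 20 × 20 boxes shifted horizontally by 10 pixels share half of each box yet have an IoU of only 1/3, which falls below the 40% threshold.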
On the other hand, even if a network prediction appears to be a false positive, it is crucial to check whether the mistake arises from the way the data were labelled. In the second phase, we check whether any of the false positives covers a ground truth CMB from the adjacent slices. If so, we no longer treat it as a false-positive prediction.
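The two verification phases can be summarised in a simplified sketch. It is not the authors' implementation: the data layout (lists of `(slice_index, box)` pairs), the function names and the neighbourhood of one slice on each side are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_with_adjacent_slices(predictions, ground_truth,
                                  iou_threshold=0.4, max_slice_gap=1):
    """Count TP, FP and FN while tolerating detections in adjacent slices.

    predictions and ground_truth are lists of (slice_index, box) pairs.
    """
    def matches(prediction, truth):
        (p_slice, p_box), (g_slice, g_box) = prediction, truth
        close = abs(p_slice - g_slice) <= max_slice_gap
        return close and iou(p_box, g_box) >= iou_threshold

    # Phase 1: a ground truth CMB counts once, however many slices it was found in.
    detected = {
        index for index, truth in enumerate(ground_truth)
        if any(matches(p, truth) for p in predictions)
    }
    # Phase 2: a prediction is a false positive only if it matches no ground truth.
    false_positives = [
        p for p in predictions
        if not any(matches(p, truth) for truth in ground_truth)
    ]
    tp = len(detected)
    return tp, len(false_positives), len(ground_truth) - tp
```

Using a set of ground truth indices in the first phase removes duplicates automatically, so a microbleed detected in two neighbouring slices is still counted as a single true positive.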

System Evaluation
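The selected metrics, i.e., sensitivity, precision, F1 score, FPavg and average precision, allow for a comprehensive assessment of the achieved results. The metrics are calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (1)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (3)$$
$$FP_{avg} = \frac{FP}{n} \qquad (4)$$
$$AP = \int_0^1 p(r)\, dr \qquad (5)$$

where:
• TP (true positive): the number of actual CMBs that were detected;
• FP (false positive): the number of predicted CMBs that were not marked as CMB in the ground truth;
• FN (false negative): the number of actual CMBs that were not detected;
• n: the number of subjects (patients) in the test set;
• r: recall (sensitivity);
• p(r): precision as a function of recall.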
Sensitivity (recall) (1) shows how the system deals with ground truth CMB detection. A high score means that almost all ground truth CMBs were detected. Precision (2) represents how accurate the predictions are; a high score means that the system generates a small number of false positives. The F1 score (3) helps to check whether there is a balance between sensitivity and precision. FPavg (4) shows the average number of false alarms per subject, while average precision (5), AP@0.5, represents the area under the precision-recall (sensitivity) curve at an IoU of 0.5. We used k-fold cross-validation with 5 folds. The exact number of subjects and microbleeds in each fold is presented in Table 2.

Table 2. Folds used in model evaluation.
Fold | Subjects: Test | Subjects: Val | Subjects: Train | Microbleeds: Test | Microbleeds: Val | Microbleeds: Train
1 | 14 | 14 | 44 | 20 | 22 | 116
2 | 14 | 14 | 44 | 22 | 36 | 100
3 | 14 | 14 | 44 | 36 | 35 | 87
4 | 14 | 14 | 44 | 35 | 27 | 96
5 | 16 | 14 | 42 | 45 | 20 | 93
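The count-based metrics used in this section (sensitivity, precision, F1 score and FPavg) can be computed with a few lines of Python. The function name and the counts in the usage example are hypothetical and are not results from this study.

```python
def detection_metrics(tp, fp, fn, n_subjects):
    """Sensitivity, precision, F1 score and FPavg from raw detection counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "fp_avg": fp / n_subjects,  # false positives per subject
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(detection_metrics(tp=90, fp=10, fn=10, n_subjects=20))
```

Reporting FPavg alongside sensitivity makes it visible when a high detection rate is bought with many false alarms per patient.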

Case Study Results
To effectively select the system parameters and comprehensively evaluate the system, we conducted a series of experiments. To increase the objectivity of the results, the study was performed using cross-validation. Each experiment was repeated ten times (twice per fold), and the presented results are the averages over these runs.

Input Configuration
To deal with a three-dimensional problem using a two-dimensional model, we need to organise the model input so that the spatial dependence between successive slices of the MRI image sequence is taken into account as much as possible. For this purpose, separate input channels of the DNN were used and consecutive images from the sequence were fed to the network. There are several ways in which a sequence of images can be delivered to the network input: as a single image, as a sequence of consecutive images, as a weighted average of consecutive images, etc. Similar research reported by other authors suggested merging different sequences, like Phase or Magnitude; however, our research found that relying solely on SWI images yields the best results. Inputs with more channels were also analysed; however, there was no improvement in performance, while the computation time increased significantly. The analysed ways of structuring the network inputs are gathered in Table 3.

Table 3. Experiment results considering the type of data concatenation. The first three columns present the type of concatenation (which image was put into which channel) and the rest are the results for each case.

where k stands for each image with an annotated CMB. For the sake of simplicity, the index k representing the consecutive number of a slice is omitted in the notation and in Table 3. Please note that in the case of the other input configurations, the sensitivity is higher, but the number of false predictions is significantly higher as well. The latter may be because information from neighbouring images is not provided; therefore, CMBs can be easily mistaken for, e.g., an oval cross-section through a vessel.
The results clearly present that information from CMB's surroundings is necessary to distinguish an actual CMB from its mimics. Applying information from adjacent slices significantly increases the precision. Although sensitivity drops, the delivered predictions are more accurate. Differences between the second and third cases are very slight as these two cases are pretty similar. Nevertheless, a bigger emphasis on image surrounding is crucial in terms of generated number of false positives.
It is also visible that providing only the SWI image (without a Phase image) gives better results in terms of F1 score.
The main goal of this experiment was to increase the precision and therefore lower the false positive ratio. We decided to choose the second variant, in which we added the SWI images preceding and following the main one, as it had the highest precision (80.21%), the lowest FPavg (0.58) and the highest F1 score (82.29%).

Selection of Data Annotation Type
As mentioned in Section 2.2, we prepared two versions of dataset annotations, HR_data_reduced (one annotation per microbleed) and HR_data_extended (each CMB labelled in each slice in which it is visible). We checked how these influenced the results.
Raising the number of labels not only did not improve the sensitivity, but also increased the number of false positives. However, reducing the number of labels resulted in lowering the false positive ratio (FPavg = 0.64), while keeping the sensitivity at a high level (88.22%) at the same time. Therefore, we decided to use this kind of annotation in our further investigations. See Table 4 for more detailed results.

Input Image Size
MR images are relatively small compared with those used in other computer vision problems. Usually, images are resized to smaller dimensions to lower the computation cost.
In our research, we decided to enlarge our images. There were two main reasons. The first one was the size of a CMB. As presented in Figure 1, they are really small objects, and enlarging the image also makes the objects more visible. The second one was that there are not many images in this case, so a slight extension of the training time is acceptable.
As presented in Table 5, the largest image size, 1500 × 1500, achieved the best results, as expected. In this experiment, our main goal was to obtain the best sensitivity (92.62%), because enlarging the image was supposed to increase the number of true positive detections. However, increasing the size of the images leads to a longer computation time; thus, 1500 × 1500 seems to be a good compromise.

Confidence Score Threshold Selection
Although the F1 score provides a fairly objective assessment, in practical solutions, keeping the appropriate balance between sensitivity and precision is important. To achieve this, we analysed the relationships between these metrics.
Confidence score shows how reliable the prediction from the network is with a value between 0 and 1, where a high value indicates a strong likelihood of a detected object to be an actual CMB. It is crucial to select an appropriate confidence score threshold that will reduce the number of predictions to only reliable ones (with high confidence scores).
As visible in Figure 4, all the metrics meet at one point for a threshold value of 80%, where sensitivity equals 81.12% and precision equals 79.13%. Our main goal was to achieve a high precision value with as high a sensitivity as possible; therefore, we decided to select a threshold value of 70%. In that case, sensitivity equals 90.18%, but precision equals 72.97%. This experiment was conducted for an image size of 1024 × 1024 on the HR_data dataset, using input configuration no. 2. It should be noted that, depending on one's priorities, threshold values within the range of 70-90% will still be a suitable choice.
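Applying the confidence score threshold amounts to a simple filter over the network output. In the sketch below, the prediction format (dictionaries with a `score` key), the function name and the inclusive comparison are assumptions made for illustration.

```python
def filter_by_confidence(predictions, threshold=0.7):
    """Keep only predictions whose confidence score reaches the threshold."""
    return [p for p in predictions if p["score"] >= threshold]


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only.
    predictions = [
        {"box": (90, 40, 110, 60), "score": 0.95},
        {"box": (10, 10, 30, 30), "score": 0.72},
        {"box": (200, 200, 220, 220), "score": 0.41},
    ]
    for threshold in (0.7, 0.8, 0.9):
        print(threshold, len(filter_by_confidence(predictions, threshold)))
```

Raising the threshold discards less certain predictions, which typically trades sensitivity for precision, in line with the behaviour described above.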

Predictions Post-Processing
As mentioned in Section 2.4, to improve and make the results more reliable, we introduced an algorithm for predictions post-processing. In Table 6, we gathered results showing a comparison of the metrics with and without the post-processing stage employed. It is clear that most metrics are significantly better in the case of an extra analysis taking into account the adjacent slices. Especially noteworthy is an impressive rise in precision.
In Figure 5, we present examples of how the proposed algorithm works. In the first case, the same microbleed was found in two adjacent slices (see case (a) in Figure 5). Even if there was only a single label in one slice, we should not treat the other prediction as a false positive, since it is actually a true positive. Therefore, in the verification of false positives, we check whether the prediction matches a ground truth CMB from the adjacent slices. If it does, we mark the prediction as correct. Only if we do not find any ground truth CMB matching a prediction do we add it to the False Positives.
Another case is when a ground truth CMB was not detected in the slice in which it was labelled (see case (b) in Figure 5), but was detected on the next slice. Such a detection is added to the True Positive candidates. After inspection, if it is not a duplicate, it is marked as a True Positive, as the microbleed was actually detected in the adjacent slice.
This approach is preferable because it lets us evaluate the system in terms of the whole MR image, not only a single slice.

Subsets
It is well known that a well-prepared training dataset is crucial for obtaining satisfactory results. In medical data analysis, we very often have to struggle with highly unbalanced training sets. The reason is the shortage of data describing lesions, especially in their early stage. In the case of cerebral microbleeds, images containing microbleeds represent just a small fraction of all images. Besides, the number of microbleeds in the MR image has a significant impact on the learning process of the neural diagnostic system, as well as on its further ability to generalise the acquired knowledge to similar cases.
In the opinion of radiologists, confirmed by our experiments, the crucial information in cerebral microbleeds detection is their number, not necessarily their size or placement in the brain. Therefore, we analysed the effect of training set selection on performance. The idea was to select the datasets so as to ensure their representativeness, i.e., to include the various possible cases of the number of microbleeds per patient.
In particular, the 72 patients were divided into the following sub-groups:
• Patients with 1 CMB;
• Patients with between 2 and 5 CMBs;
• Patients with over 5 CMBs.
As a result, we obtained three groups containing 38, 30 and 4 patients, respectively. Next, we prepared the training, validation and test sets so that each of them contained patients from every sub-group. So as not to exclude any subject from the test set, we performed 4-fold cross-validation (with folds different from the original ones).
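One possible way to build such folds is sketched below. This is only an illustration under stated assumptions: the round-robin assignment, the fixed random seed, the patient identifiers and the function names are hypothetical, and the per-patient CMB counts in the usage example are synthetic values chosen only to reproduce the group sizes 38, 30 and 4.

```python
import random
from collections import defaultdict


def cmb_group(n_cmb):
    """Map a patient's CMB count to one of the three sub-groups."""
    if n_cmb < 1:
        raise ValueError("every patient is expected to have at least 1 CMB")
    if n_cmb == 1:
        return "1"
    if n_cmb <= 5:
        return "2-5"
    return ">5"


def stratified_folds(cmb_counts, n_folds=4, seed=0):
    """Split patients into folds so each fold draws from every sub-group.

    cmb_counts: dict mapping patient id -> number of CMBs (hypothetical input).
    """
    groups = defaultdict(list)
    for pid, n in sorted(cmb_counts.items()):
        groups[cmb_group(n)].append(pid)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    folds = [[] for _ in range(n_folds)]
    for name in sorted(groups):
        members = groups[name]
        rng.shuffle(members)
        # Deal the patients of one sub-group out to the folds in turn.
        for k, pid in enumerate(members):
            folds[k % n_folds].append(pid)
    return folds


# Synthetic counts that only mirror the reported group sizes (38, 30, 4).
counts = {f"p{i:03d}": 1 for i in range(38)}
counts.update({f"p{i:03d}": 3 for i in range(38, 68)})
counts.update({f"p{i:03d}": 8 for i in range(68, 72)})
folds = stratified_folds(counts, n_folds=4)
```

Because the smallest sub-group has only 4 patients, each of the 4 folds receives exactly one of them, which illustrates how limited the representation of that sub-group is.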
The test results presented in Table 7 show a significant rise in sensitivity; on the other hand, precision dropped. Training on the original folds gave more balanced results than training on the prepared subset folds. Adding subjects with a clearly higher number of CMBs increases the ability to detect microbleeds but entails a rise in false-positive predictions. This is probably due to data imbalance. Ensuring a similar number of subjects in each group could significantly improve the performance.
It is worth noting that the results obtained on test_data are only slightly worse than those obtained on HR_data_reduced. This is probably due to the similar resolution and type of labelling. Nevertheless, it is a great success that the system performs so well on a completely different database.
However, the sensitivity obtained on LR_data is markedly worse. Naturally, detecting a small object in images with a much lower resolution is hard. This was also observed during the experiment described in Section 3.3, where the results were much worse for an image size of 256 × 256. Moreover, there is also a labelling factor: the labels in LR_data were not unified as those in HR_data_reduced were. Interestingly, the precision for LR_data using subsets is higher than for HR_data_reduced or test_data. We assume that this might be connected with the system's lower ability to detect not only CMBs but also their mimics. Nevertheless, we decided to keep our final results tested on the traditional folds, as these results are more balanced and comparable to other research.

Discussion
The research and analysis presented in Section 3 allowed us to synthesise the final structure and parameters of the neural system supporting the detection of microbleeds. As the most suitable, we selected the three-channel input configuration no. 2 (see Section 3.1) and the reduced form of labelling (see Section 3.2). As the model, we chose the Faster R-CNN structure with the ResNet50 architecture as the feature extraction backbone. We trained the model on images rescaled to 1500 × 1500 (see Section 3.3) and, finally, applied the prediction post-processing (see Section 3.5).
The final results are compared with state-of-the-art results in Table 8. Where the F1 score was not reported in the papers we compare against, we calculated it from the reported sensitivity and precision.
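For reference, the F1 score is the harmonic mean of precision and sensitivity, so it can be recovered from these two values alone. The short sketch below shows the calculation; the function name and the input values are illustrative and are not taken from Table 8.

```python
def f1_from_precision_sensitivity(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (both given in [0, 1])."""
    if precision + sensitivity == 0:
        return 0.0  # convention when neither metric is above zero
    return 2 * precision * sensitivity / (precision + sensitivity)


# Illustrative values only, not results reported in any paper.
f1_example = f1_from_precision_sensitivity(0.80, 0.90)
```

Note that applying this formula to precision and sensitivity averaged over folds need not reproduce the average of the per-fold F1 scores, so small differences between the two ways of reporting are to be expected.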
The proposed approach outperformed state-of-the-art results in terms of precision and false-positive ratio (FPavg). Moreover, with a precision (89.74%) at least ten percentage points higher than that reported by other researchers, we also managed to obtain relatively high sensitivity (92.62%). Additionally, the F1 score, which is an essential measure of the quality of the system's performance, is the highest among the compared systems. It surpasses the next-best system by more than 5%. Our system also reached a high AP@0.5 level (88.16%). An example of how the system detects a microbleed is illustrated in Figure 6. The red boxes indicate ground truth CMBs, while the green ones represent system predictions. Although cerebral microbleeds are small lesions, the detector manages to find even barely visible ones. It is also apparent that false-positive predictions are very similar to ground truth CMBs (see Figure 6d, for example).

Conclusions
To conclude, the main goal of our research was to develop a system that allows efficient and reliable detection of microbleeds. To achieve this, we analysed the influence of many important issues on the system performance. The analysis allowed us to draw many interesting conclusions and finally to implement the system accordingly.
In particular, we pointed out a number of pre- and post-processing techniques that increase the ability to detect CMBs and distinguish them from their mimics.
Enlarging the images improved the network's ability to detect CMBs, while providing information from the adjacent slices through careful structuring of the input enabled a significant reduction of the false-positive rate. We also confirmed that appropriate unification of the method of labelling the lesions is crucial in terms of final results.
As a result, we achieved high levels of both sensitivity and precision, confirmed by a high F1 score and a low number of false positives. As shown in the comparison, our system performs very well against other such systems. Joint analysis of the reported metrics is important and allows for proper evaluation of the system and its comparison to other ones. It should be emphasised that this would not have been possible without the close cooperation of machine learning specialists and radiologists.
Three-dimensional approaches seem to fit this problem naturally, as the data is also three-dimensional, and with the increasing availability of powerful GPUs it is becoming possible to analyse volumetric medical data efficiently using 3D deep learning. However, issues such as the limited availability of data, the curse of dimensionality and the related high computational cost, and difficulties in analysing and interpreting the achieved results remain a challenge. At this stage of research and application, using 2D approaches seems more practical and effective.
Although the model that we used is not a state-of-the-art solution, it was carefully chosen considering its ability to detect small objects despite the longer computational time. We also selected appropriate hyper-parameters as well as image augmentation methods. Moreover, we have tested the impact of training set selection. We confirmed a significant impact of proper data selection, its diversity, representativeness and balance.
Finally, we proposed a novel prediction post-processing algorithm to appropriately evaluate the model. This has enabled the transition from a two-dimensional to a three-dimensional space of consideration. It made it possible to avoid counting as false positives those predictions that are in fact CMBs. Moreover, it allowed the detection of cerebral microbleeds not only on the slices where they were labelled but also on the adjacent ones.
In our current research, we are focused on extending the functionality of the system to diagnose Small Vessels Disease (SVD), of which cerebral microbleeds are one of the symptoms. This requires the preparation of a larger, more balanced and more precisely labelled patient dataset, which we are already involved in.

Data Availability Statement: Data used in the study was publicly available on 1 April 2021. HR_data and LR_data at https://github.com/Yonsei-MILab/Cerebral-Microbleeds-Detection and test_data at http://www.cse.cuhk.edu.hk/~qdou/cmb-3dcnn/cmb-3dcnn.html.