Expanded Brain CT Dataset for the Development of AI Systems for Intracranial Hemorrhage Detection and Classification

Khoruzhaya, Anna N.; Bobrovskaya, Tatiana M.; Kozlov, Dmitriy V.; Kuligovskiy, Dmitriy; Novik, Vladimir P.; Arzamasov, Kirill M.; Kremneva, Elena I.

doi:10.3390/data9020030

Open Access Data Descriptor

¹ Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow, Russian Federation, Petrovka Street, 24, Building 1, 127051 Moscow, Russia

² Research Center of Neurology, Volokolamskoe hw. 80, 125367 Moscow, Russia

* Author to whom correspondence should be addressed.

Data 2024, 9(2), 30; https://doi.org/10.3390/data9020030

Submission received: 22 October 2023 / Revised: 10 January 2024 / Accepted: 26 January 2024 / Published: 6 February 2024

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

Abstract

Intracranial hemorrhage (ICH) is a life-threatening condition that often leads to disability. Timely, high-quality diagnosis strongly affects the course and outcome of the disease. The gold standard for detecting ICH is computed tomography. This method requires the prompt involvement of highly qualified personnel, which is not always possible, for example, during a staff shortage or under increased workload. In such situations every minute counts, and time can be lost. A possible solution is a set of diagnostic tools, including artificial intelligence, that help identify patients with ICH in a timely manner so that prompt, high-quality medical care can be provided. However, the main obstacle to the development of artificial intelligence is the lack of high-quality datasets for training and testing. In this paper, we present a dataset of 800 brain CT scans, each consisting of multiple series of DICOM images, with and without signs of ICH and enriched with clinical and technical parameters, together with the methodology of its generation, which relies on natural language processing tools. The dataset is publicly available, which should encourage competition in the development of artificial intelligence systems and contribute to their advancement and quality improvement.

Dataset: https://mosmed.ai/en/datasets/rasshirenniinaborkompyuternihtomogrammgm/.

Dataset License: CC-BY-NC-ND

Keywords:

computed tomography; intracranial hemorrhage; artificial intelligence; training dataset

1. Summary

Intracranial hemorrhage (ICH) of any genesis is a potentially life-threatening condition with a variety of causes. ICH can be classified as primary (80–85%) or secondary (15–20%), and as traumatic or nontraumatic. According to the Global Burden of Disease, Injuries, and Risk Factors Study (GBD) [1], in 2019, nontraumatic ICH resulting from aneurysm rupture, vascular malformation, or hemorrhagic transformation of ischemic stroke accounted for 37.6% (4.59 million people) of all stroke cases (12.2 million), with a prevalence of 350 per 100,000 people, and caused 3.3 million deaths, which is half of all stroke deaths in 2019 (6.55 million). Traumatic ICH also contributes significantly to mortality. Every year, about 69 million people experience a traumatic brain injury (TBI) [2], of whom about 5.48 million suffer from severe TBI with ICH [3]. Mortality in this group of patients is up to 90%. At the same time, between 1,730,000 and 1,965,000 lives could be saved with timely and professional assistance [3].

The number of hospitalizations primarily for nontraumatic ICH has increased significantly over the past two decades due to the aging population, more frequent use of blood thinners, and/or lack of blood pressure control; hypertension is a major risk factor for hemorrhage [1]. Computed tomography (CT) is considered the gold standard for diagnosing ICH upon admission to the hospital, as it is the imaging method most sensitive to hemorrhage and the fastest [4]. However, the widespread use of CT, especially in emergency medical care, significantly increases the workload of radiologists and can increase the number of diagnostic errors and missed diagnoses [5], as well as lead to emotional burnout among staff [6,7]. At the same time, over the past five years, considerable evidence has emerged that artificial intelligence (AI) is able to reduce the workload of radiologists [8,9,10].

In recent years, there has been a huge leap in the development of AI, and it is now used in a variety of fields, including medicine. Due to widespread digitalization and the transition from analogue to digital media, diagnostic radiology has turned out to be one of the most promising areas for introducing AI in the healthcare sector: hundreds of computed tomography (CT) and magnetic resonance imaging (MRI) scans, X-rays, and other studies are carried out every day. Such a huge amount of data supports the development of AI systems, their growing complexity, and quality improvement [11]. A good example is the experiment on the use of innovative computer vision technologies for the analysis of medical images and their further application in the Moscow healthcare system [12].

For 3 years, as part of the experiment, AI-based diagnostic services have been successfully implemented into the Moscow healthcare system for 27 modalities and target pathologies. However, such success would not be possible without high-quality datasets, because they are the basis of each algorithm. Even the most complex and advanced AI will not be able to perform well if it is not trained on a high-quality dataset, the generation of which is not an easy task for a team of specialists from different fields. Incorrectly labeled datasets can lead not only to reduced quality of the AI model, but also to incorrect assessment of diagnostic accuracy criteria if they are used for testing [13].

In our research, we focused on generating a dataset for training AI systems to identify signs of ICH, since their underdiagnosis can lead to high disability and even death of the patient [1]. Often, under a high workload of medical personnel, as happened during the COVID-19 pandemic, AI systems can play an important role in the promptness of identifying patients requiring immediate medical attention [14]. Currently, there are already datasets for detection of the signs of ICH [15,16]. One of the biggest datasets consists of 874,025 CT studies with the presence and absence of signs of various types of ICH [16]. However, datasets, as a rule, either contain incomplete diagnostic studies or have labels not corresponding to the types of hemorrhages by localization, which are important training parameters for AI in this category and an important diagnostic task for a radiologist. Also, they most often do not have an indication of the technical parameters of CT studies, which can significantly affect the AI quality, as well as some other clinical signs (such as the multiplicity of hemorrhages or their secondary genesis, for example, hemorrhage in tumor), the labeling of which can also make the service more accurate.

In addition, before creating a methodology for generating datasets [17], we developed a natural language processing (NLP) tool aimed at converting text (for example, radiology reports) into interpretable datasets for intracranial hemorrhage to allow significant simplification of the process of data collection and improvement in their quality [18].

Thus, our goal was to generate a dataset of brain CT scans with and without signs of intracranial hemorrhage, supplemented with clinical and technical parameters for training artificial intelligence systems.

2. Data Description

The dataset consisted of 800 studies in DICOM format and a table with labeling. In addition, it was supplemented with information on the types of hemorrhages, associated pathologies, and technical characteristics, as well as text reports of a radiologist. A class distribution is presented in Table 1.

In addition, the dataset was supplemented with patient age and technical parameters for each study: Slice Thickness, kVp (kiloVolt Peak), X-ray Tube Current (mA), Convolution Kernel, Manufacturer (Table 2). A more detailed table with technical parameters is provided in the Supplementary Materials (Table S1). Please note that the class distribution may differ quantitatively from the 800 declared CT scans, as it depends on the number/type of series.

The literature offers limited evidence on the effect of specific physical parameters on AI performance. However, available sources report that, for example, different convolution kernels used in chest CT scans may cause AI to underestimate cardiovascular risk [19].

Taking into account these technical parameters of each image, in our opinion, can help to solve the problem of data distribution shift (aka multidomain shift) [20]. This problem is that data about a particular organ (in our case, the brain) collected at different scanning parameters on different equipment in different clinical settings may not be analyzed correctly, especially by models that have been trained under empirical risk minimization (ERM) [21], because ERM assumes that training and testing data are collected in the same domain (institution) or similar domains (similarly configured diagnostic equipment). To date, there are approaches (e.g., physics-based data augmentation, PBDA) available to cope with the challenge of generalizing to new datasets that may be acquired with acquisition protocols different from the training set. But these approaches are still not able to provide robust AI performance in terms of, for example, reducing false positive responses when diagnosing pathologies on lung CT scans [22]. In other words, real CT scans with real physical parameters are needed for AI pretraining when deployed in clinical practice on different equipment.
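One way the technical parameters could be used against distribution shift is to treat each acquisition setting as a domain and hold one domain out for testing. The following is a minimal sketch under stated assumptions: the field names mirror the labels in Table 2, but the rows, the `study_id` values, and both helper functions are hypothetical and not part of the published dataset tooling.

```python
from collections import defaultdict

# Hypothetical labeling rows; values are invented for illustration only.
studies = [
    {"study_id": "s1", "manufacturer": "Siemens", "kernel": "H31s"},
    {"study_id": "s2", "manufacturer": "Siemens", "kernel": "H70h"},
    {"study_id": "s3", "manufacturer": "Toshiba", "kernel": "FC26"},
    {"study_id": "s4", "manufacturer": "Philips", "kernel": "UB"},
]


def group_by_domain(rows, keys=("manufacturer",)):
    """Group study ids by the chosen technical parameters (the 'domain')."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["study_id"])
    return dict(groups)


def leave_one_domain_out(rows, held_out, keys=("manufacturer",)):
    """Split study ids so that one domain is used only for testing."""
    train, test = [], []
    for row in rows:
        domain = tuple(row[k] for k in keys)
        (test if domain == held_out else train).append(row["study_id"])
    return train, test


domains = group_by_domain(studies)
train_ids, test_ids = leave_one_domain_out(studies, held_out=("Siemens",))
```

Passing `keys=("manufacturer", "kernel")` would define finer domains; which parameters matter most for a given model is an empirical question.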

Below are examples of studies containing signs of epidural (Figure 1a), subarachnoid (Figure 1b), subdural (Figure 1c), and intracerebral (Figure 1d) hemorrhage and multiple hemorrhages (Figure 1e), as well as studies with signs of a skull fracture (Figure 1f), combined pathologies (Figure 1g), and blood breakthrough into the cerebrospinal fluid spaces (Figure 1h).

3. Methods

3.1. Data Collection and Verification Process

The data were obtained from the Unified Radiological Information Service of the Uniform Medical Information Analytical System (URIS UMIAS). CT studies were performed from 5 May 2020 to 1 August 2023.

The study was conducted in accordance with the Declaration of Helsinki and approved by the Independent Ethics Committee of MRO RORR (protocol code 2/2020, the date of approval: 20 February 2020; Clinical trial: NCT04489992). The data collection process is shown in the flowchart (Figure 2).

At the first stage, 4,369,511 text reports of CT studies were uploaded. From these, brain CTs were selected, which totaled 256,120. The raw data contained missing values, outliers, and duplicates, which were excluded. For this reason, the number of studies whose data were included in the sample was 230,682. The inclusion criteria for the sample were completed text boxes for the description of CT scans, no abnormal values for patient age, and no duplicate information. All studies were carried out between 00:00:00:00 1 January 2020 and 00:00:00:00 31 May 2023. The minimum age of patients was 18 years and the maximum was 99 years.
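The cleaning step described above (dropping reports with empty text, abnormal age values, and duplicates) can be sketched as follows. This is an illustration only: the record fields, the age bounds (taken from the reported minimum and maximum), and the duplicate key are assumptions, since the exact rules are not given in the text.

```python
def clean_reports(reports, min_age=18, max_age=99):
    """Keep reports with non-empty text, a plausible age, and no earlier duplicate."""
    seen = set()
    kept = []
    for rep in reports:
        text = (rep.get("description") or "").strip()
        age = rep.get("age")
        if not text:
            continue  # empty description field
        if age is None or not (min_age <= age <= max_age):
            continue  # missing or abnormal age
        key = (rep.get("patient_id"), rep.get("study_date"), text)
        if key in seen:
            continue  # duplicate record (assumed definition of a duplicate)
        seen.add(key)
        kept.append(rep)
    return kept


# Hypothetical records for illustration.
raw_reports = [
    {"patient_id": "p1", "study_date": "2021-03-01", "age": 54, "description": "Subdural hematoma on the right."},
    {"patient_id": "p1", "study_date": "2021-03-01", "age": 54, "description": "Subdural hematoma on the right."},
    {"patient_id": "p2", "study_date": "2021-03-02", "age": 61, "description": "   "},
    {"patient_id": "p3", "study_date": "2021-03-03", "age": 130, "description": "Normal."},
    {"patient_id": "p4", "study_date": "2021-03-04", "age": None, "description": "Normal."},
    {"patient_id": "p5", "study_date": "2021-03-05", "age": 18, "description": "No acute changes."},
]
cleaned = clean_reports(raw_reports)
```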

The task considered, assessing the presence of intracranial hemorrhage (regardless of localization) from the text protocols of brain CT, was a binary classification task: hemorrhage present/not present. Data analysis and preprocessing were performed using NLTK (Natural Language Toolkit, version 3.6.5), a library for symbolic and statistical processing of natural language, and scikit-learn, a machine learning library containing tools for classification tasks. The libraries used, as well as the subsequent algorithm, are written in the Python programming language. This formed the basis for the creation of a natural language processing tool [23].

As an initial sample for machine learning, studies containing keywords relevant to intracranial hemorrhage in Russian in the description and conclusion were selected. The keywords were generated based on the expert opinion of a specialist radiologist with more than 3 years of experience in this field.

A keyword combined with a negation (stop word or stop phrase) means the absence of the pathology sought. For this reason, a list of 63 stop phrases was also compiled, whose presence in the diagnostic protocols implied the absence of any intracranial hemorrhage in the study. The next step involved a repeated automatic selection of studies with keywords, resulting in a total of 84,180 studies.
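The keyword and stop-phrase selection can be illustrated with a small rule-based filter. This is a sketch, not the authors' tool: the real lists are in Russian and contain 63 stop phrases, whereas the English `KEYWORDS` and `STOP_PHRASES` below are hypothetical placeholders, and the rule "any stop phrase means no hemorrhage" is one straightforward reading of the description.

```python
# Hypothetical English placeholders for the Russian keyword and stop-phrase lists.
KEYWORDS = ("hemorrhage", "hematoma")
STOP_PHRASES = (
    "no signs of hemorrhage",
    "no evidence of hematoma",
    "hemorrhage was not detected",
)


def mentions_ich(report_text):
    """Return True if the report contains a keyword and no stop phrase."""
    text = " ".join(report_text.lower().split())  # normalize case and whitespace
    has_keyword = any(kw in text for kw in KEYWORDS)
    has_stop_phrase = any(phrase in text for phrase in STOP_PHRASES)
    return has_keyword and not has_stop_phrase


positive = mentions_ich("Subdural hematoma on the right.")
negated = mentions_ich("No signs of   hemorrhage.")
unrelated = mentions_ich("Normal brain CT.")
```

A report that negates one finding and affirms another would be rejected by this rule, which is one reason the selected reports were subsequently verified by radiologists.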

From this number, 10,000 diagnostic protocols of brain CT scans were randomly selected.

Then 10,000 CT reports were verified, and 5000 with a presence of a description of ICH in the report and 5000 without ICH were selected out of them. Verification at this stage was carried out by radiologists who analyzed text reports.

Criteria for classification:

With pathology: a presence of ICH description in the text report
Without pathology: an absence of ICH description in the text report.

A total of 400 studies with signs of ICH and 400 studies without signs of ICH were randomly selected and verified out of 10,000 studies.

Verification of the final dataset was carried out by peer review. Two radiologists with more than 5 years of experience independently analyzed the studies. In case of disagreement, an expert specializing in this field with more than 10 years of experience was involved. If the third expert had difficulty with the identification, follow-up CT scans of the patient were analyzed (if available).
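The verification workflow amounts to a simple decision rule, sketched below. The function name, argument names, and label strings are hypothetical; `None` stands for "no reliable decision", which corresponds to a study that could not be included.

```python
def resolve_label(reader_a, reader_b, expert=None, follow_up=None):
    """Combine two independent reads, an optional expert read, and follow-up evidence.

    Each argument is a label such as "ICH" or "no ICH"; None means the
    opinion is unavailable or the reader could not decide.
    """
    if reader_a == reader_b:
        return reader_a  # the two radiologists agree
    if expert is not None:
        return expert  # disagreement settled by the senior expert
    return follow_up  # expert undecided: fall back to follow-up CT, if any


agreed = resolve_label("ICH", "ICH")
by_expert = resolve_label("ICH", "no ICH", expert="no ICH")
by_follow_up = resolve_label("ICH", "no ICH", follow_up="ICH")
unresolved = resolve_label("ICH", "no ICH")
```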

3.1.1. Classification Criteria

With pathology:
- Intracerebral hemorrhage (ICH): a CT study reveals a hyperdense area (from +45–50 to 90 HU) of any location and shape, heterogeneous or homogeneous inside the brain tissue, which can rupture into the intraventricular spaces (intraventricular hemorrhage);
- Subarachnoid hemorrhage (SAH): in CT scans, hyperdense areas (from +45–50 to 90 HU) of various shapes and sizes are found in the subarachnoid spaces and cisterns of the brain;
- Subdural hemorrhage (SDH): on CT images, crescent-shaped hyperdense areas (from +45–50 to 90 HU), homogeneous or heterogeneous in structure, are detected in the subdural space, or crescent-shaped isodense ones (~35–40 HU), corresponding to the subacute stage of SDH;
- Epidural hemorrhage (EDH): a CT study reveals biconvex (lenticular) hyperdense areas (from +45–50 to 90 HU) in the epidural space, often heterogeneous in structure [24].
Without pathology: hyperdense areas (from +45–50 to 90 HU) are not detected either inside the brain tissues or in the meningeal spaces on CT scans.
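To make the density range in these criteria concrete, the sketch below converts stored pixel values to Hounsfield units with the standard linear DICOM rescale (HU = slope × value + intercept) and marks voxels in the 45–90 HU window. It is a crude illustration, not a hemorrhage detector: calcifications and partial-volume voxels near bone fall in the same range, and the criteria above also depend on shape and location. The default slope and intercept and the lower bound of 45 HU are assumptions.

```python
def to_hu(raw_slice, slope=1.0, intercept=-1024.0):
    """Apply the linear DICOM rescale to a 2D list of stored pixel values."""
    return [[slope * v + intercept for v in row] for row in raw_slice]


def hyperdense_mask(hu_slice, low=45.0, high=90.0):
    """Mark voxels whose density lies inside the hyperdense window."""
    return [[low <= v <= high for v in row] for row in hu_slice]


# Toy 2x3 slice of stored values (invented for illustration).
raw = [[1024, 1060, 1090],
       [1084, 2024, 24]]
hu = to_hu(raw)
mask = hyperdense_mask(hu)
n_hyperdense = sum(sum(row) for row in mask)
```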

3.1.2. Inclusion Criteria

- Patient’s age is at least 18 years;
- Availability of a radiology report of the study with conclusion in the information system;
- Presence of the desired sign of pathology in the images (for selecting studies of the “presence of pathology” class).

3.1.3. Noninclusion Criteria

- Patient’s age is less than 18 years;
- Absence of a radiology report of the study with conclusion in the information system;
- Inability to reliably determine (by verification) whether the changes on CT scans are signs of hemorrhage.

3.1.4. Exclusion Criteria

- Presence of image artifacts that could potentially complicate the AI service operation (dynamic artifacts, bone and/or metal artifacts, artifacts from detector malfunction);
- For studies using contrast enhancement: the absence of a series of CT images from the native phase in the information system.
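The inclusion, noninclusion, and exclusion criteria of Sections 3.1.2–3.1.4 can be combined into a single eligibility check. The sketch below is an assumed encoding: the `Study` record and its boolean fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    age: int
    has_report: bool          # radiology report with conclusion is available
    verified: bool            # verification reached a reliable decision
    has_artifacts: bool       # dynamic, bone/metal, or detector artifacts
    contrast_enhanced: bool
    has_native_series: bool   # native (non-contrast) series is available


def is_eligible(study):
    """Apply the noninclusion criteria first, then the exclusion criteria."""
    if study.age < 18 or not study.has_report or not study.verified:
        return False
    if study.has_artifacts:
        return False
    if study.contrast_enhanced and not study.has_native_series:
        return False
    return True


ok = is_eligible(Study(57, True, True, False, False, True))
minor = is_eligible(Study(17, True, True, False, False, True))
no_native = is_eligible(Study(57, True, True, False, True, False))
```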

3.2. Population Parameters and Anonymization

The final dataset included studies of 338 women and 459 men in DICOM format (no gender data were available for 3 people). The minimum age of participants was 19 years, the maximum was 100, and the median age was equal to 57.

Data anonymization was carried out by removing an extended list of tags containing personal information [25] using a special software module. In addition, pseudonymization was performed by replacing the unique study identifier.
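The two steps, tag removal and pseudonymization, can be sketched on a plain dictionary that stands in for a DICOM header. The keys are standard DICOM attribute keywords, but the short `PERSONAL_TAGS` set is only an illustrative subset of the extended list referenced above, and the salted hash is an assumed way of replacing the study identifier (the resulting token is not a valid DICOM UID).

```python
import hashlib

# Illustrative subset only; the actual extended tag list is not reproduced here.
PERSONAL_TAGS = {"PatientName", "PatientID", "PatientBirthDate", "InstitutionName"}


def pseudonymize_uid(uid, salt):
    """Replace an identifier with an opaque, repeatable token (assumed scheme)."""
    digest = hashlib.sha256((salt + uid).encode("utf-8")).hexdigest()
    return "anon-" + digest[:16]


def anonymize_header(header, salt):
    """Drop personal tags and replace the study identifier."""
    cleaned = {k: v for k, v in header.items() if k not in PERSONAL_TAGS}
    if "StudyInstanceUID" in cleaned:
        cleaned["StudyInstanceUID"] = pseudonymize_uid(cleaned["StudyInstanceUID"], salt)
    return cleaned


# Hypothetical header values.
header = {
    "PatientName": "DOE^JOHN",
    "PatientID": "12345",
    "StudyInstanceUID": "1.2.3.4.5",
    "KVP": 120,
    "SliceThickness": 1.0,
}
anon = anonymize_header(header, salt="secret-salt")
```

Technical parameters such as `KVP` and `SliceThickness` are left untouched, since they are part of the dataset labeling.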

To fully anonymize the images (defacing), an algorithm was developed that adds artifacts to the image of the front part of the skull. The artifacts cover both soft tissue and some parts of bone structures that are not critical for the clinical purposes of this dataset. A directory containing a brain study (a series of DICOM files) is fed to the algorithm as input, and a study with artifacts is obtained as output.

The algorithm was implemented in Python using libraries for working with images and data arrays (numpy, pydicom, skimage). The algorithm processes a study slice by slice in the axial projection. Each study is read from the bottom of the head. Artifacts are added to 60% of the total number of slices. This approach removes the facial structures by which a person’s identity can be recognized (lips, nose, eyes, and their relationship) without affecting the brain structures.
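Selecting the slices to process can be sketched as follows, assuming each slice comes with its z coordinate and that larger z values lie closer to the top of the head. The function name and the rounding rule (round up) are assumptions; integer arithmetic is used to avoid floating-point surprises.

```python
def slices_to_deface(z_positions, percent=60):
    """Return indices of the lowest `percent` of slices, ordered bottom to top."""
    order = sorted(range(len(z_positions)), key=lambda i: z_positions[i])
    n = (len(order) * percent + 99) // 100  # ceiling of percent * count / 100
    return order[:n]


# Five slices stored in arbitrary order; z coordinates are invented.
z = [30.0, 10.0, 50.0, 20.0, 40.0]
selected = slices_to_deface(z)
```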

Let us illustrate the algorithm’s operation using a single slice (Figure 3). To apply artifacts, the points corresponding to the outer edges of bone structures are found. Because bone structures have a higher X-ray density, we binarize the image at a threshold of 490 HU, setting pixels above the threshold to 1, while the remaining structures become background (i.e., 0). Next, we fill in the binary area using the “Convex hull” method. The borders of the resulting area correspond to the edges of the bone structures in the image.
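The binarization and convex hull filling can be sketched in dependency-free Python. This is not the authors' implementation (which relies on numpy and skimage): plain nested lists stand in for image arrays, the hull is built with Andrew's monotone chain method as a substitute technique, and all function names are hypothetical.

```python
def bone_points(hu_slice, threshold=490):
    """Coordinates (x, y) of pixels denser than the bone threshold."""
    return [(x, y) for y, row in enumerate(hu_slice)
            for x, v in enumerate(row) if v > threshold]


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def filled_hull_mask(hu_slice, threshold=490):
    """Binary mask (1 inside or on the hull of bone pixels, 0 elsewhere)."""
    hull = convex_hull(bone_points(hu_slice, threshold))
    height, width = len(hu_slice), len(hu_slice[0])
    if len(hull) < 3:
        return [[0] * width for _ in range(height)]
    n = len(hull)
    return [[1 if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0
                      for i in range(n)) else 0
             for x in range(width)] for y in range(height)]


# Toy 5x5 slice: a ring of "bone" (1000 HU) around one soft-tissue pixel (40 HU).
toy = [[-1000] * 5 for _ in range(5)]
for yy in range(1, 4):
    for xx in range(1, 4):
        toy[yy][xx] = 1000
toy[2][2] = 40
mask = filled_hull_mask(toy)
```

In the toy slice the soft-tissue pixel at the center is below the threshold but ends up inside the mask, which is the purpose of the filling step.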

A centroid is calculated for each area. Next, we draw a straight line from the centroid to the front of the image and find the point where the area ends and the background begins. We save this point. We draw nine lines in different directions with a given step. In total, we obtain nine points located at the edges of the bone structures. An ellipse is constructed at each point and filled with random values. These ellipses are the artifacts that perform the anonymization.
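The ray casting and ellipse drawing can be sketched as below. Several details are assumptions, because the text does not fix them: the front of the head is taken to be the top of the image, the nine lines are spread symmetrically around that direction with a 10-degree step, and the ellipse radii and value range are free parameters. The mask must contain at least one nonzero pixel.

```python
import math
import random


def centroid(mask):
    """Mean (x, y) of all nonzero pixels of a binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def edge_point(mask, start, angle_deg):
    """Walk from `start` until leaving the mask; return the last pixel inside.

    Angle 0 points to the top of the image (assumed to be the front).
    """
    a = math.radians(angle_deg)
    dx, dy = math.sin(a), -math.cos(a)
    x, y = start
    last = (int(round(x)), int(round(y)))
    height, width = len(mask), len(mask[0])
    while True:
        x, y = x + dx, y + dy
        xi, yi = int(round(x)), int(round(y))
        if not (0 <= xi < width and 0 <= yi < height) or not mask[yi][xi]:
            return last
        last = (xi, yi)


def front_edge_points(mask, n_lines=9, step_deg=10):
    """Edge points along `n_lines` rays fanned around the front direction."""
    c = centroid(mask)
    half = (n_lines - 1) / 2
    return [edge_point(mask, c, (i - half) * step_deg) for i in range(n_lines)]


def draw_ellipse(image, center, rx, ry, rng, low=-1000, high=1000):
    """Overwrite pixels inside an axis-aligned ellipse with random values."""
    cx, cy = center
    for y in range(max(0, cy - ry), min(len(image), cy + ry + 1)):
        for x in range(max(0, cx - rx), min(len(image[0]), cx + rx + 1)):
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1:
                image[y][x] = rng.randint(low, high)


# Toy example: a filled 5x5 square inside a 7x7 image.
mask = [[1 if 1 <= x <= 5 and 1 <= y <= 5 else 0 for x in range(7)] for y in range(7)]
points = front_edge_points(mask)
image = [[0] * 7 for _ in range(7)]
draw_ellipse(image, points[4], rx=1, ry=1, rng=random.Random(0), low=500, high=600)
```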

Thus, the algorithm runs through each slice, applying artifacts to the edges of bone structures using the approach described above. The algorithm has a number of limitations. In particular, when the patient’s head is turned substantially to the left or right, the artifacts cover a side of the head instead of the front of the skull. This limitation is overcome manually by shifting the direction of the lines toward the head rotation.

Two experts independently assessed how effectively the ellipsoidal artifacts covered the soft tissues and bone structures of the facial skull for anonymization. In addition, the proportion of slices to be processed (60%) was determined by expert assessment.

4. Conclusions

In addition to the diagnostic images themselves, the inclusion of other associated information in the dataset for training and testing of the AI may reduce the number of technological defects and increase its clinical relevance. This accompanying information can be a diagnostic description, a report, information about the CT scanner, a scan protocol, a unique identification number assigned to each specific study, the age and gender of the patient, the date of the diagnostic study, the medical institution where the study was performed, and additional information (localization of intracranial hemorrhage, skull bone fractures, test results, etc.). For both training and testing, depending on the clinical problem to be solved, either the complete dataset or a part of it (image and age/image only, description, localization, etc.) can be used. A benchmark dataset with a large amount of additional information about each study can be effectively applied in monitoring AI-based diagnostic services in operation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/data9020030/s1, Table S1: Class distribution of technical parameters in the dataset.

Author Contributions

Conceptualization, E.I.K. and A.N.K.; methodology, K.M.A.; software, D.V.K.; validation, A.N.K.; investigation, A.N.K. and D.V.K.; data curation, D.K. and V.P.N.; writing—original draft preparation, A.N.K. and T.M.B.; writing—review and editing, K.M.A. and E.I.K.; visualization, A.N.K. and T.M.B.; supervision, K.M.A.; project administration, E.I.K.; funding acquisition, E.I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This publication was supported by the Russian Science Foundation Grant No. 22-25-20231, https://rscf.ru/project/22-25-20231/ (accessed on 5 February 2024).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: https://mosmed.ai/en/datasets/rasshirenniinaborkompyuternihtomogrammgm/ (accessed on 5 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yu, N.; Yu, H.; Li, H.; Ma, N.; Hu, C.; Wang, J. A Robust Deep Learning Segmentation Method for Hematoma Volumetric Detection in Intracerebral Hemorrhage. Stroke 2022, 53, 167–176.
Dyer, T.; Chawda, S.; Alkilani, R.; Morgan, T.N.; Hughes, M.; Rasalingham, S. Validation of an artificial intelligence solution for acute triage and rule-out normal of non-contrast CT head scans. Neuroradiology 2022, 64, 735–743.
Schmitt, N.; Mokli, Y.; Weyland, C.S.; Gerry, S.; Herweh, C.; Ringleb, P.A.; Nagel, S. Automated detection and segmentation of intracranial hemorrhage suspect hyperdensities in non-contrast-enhanced CT scans of acute stroke patients. Eur. Radiol. 2022, 32, 2246–2254.
Daugaard Jørgensen, M.; Antulov, R.; Hess, S.; Lysdahlgaard, S. Convolutional neural network performance compared to radiologists in detecting intracranial hemorrhage from brain computed tomography: A systematic review and meta-analysis. Eur. J. Radiol. 2022, 146, 110073.
Zhou, Q.; Zhu, W.; Li, F.; Yuan, M.; Zheng, L.; Liu, X. Transfer Learning of the ResNet-18 and DenseNet-121 Model Used to Diagnose Intracranial Hemorrhage in CT Scanning. Curr. Pharm. Des. 2022, 28, 287–295.
Kau, T.; Ziurlys, M.; Taschwer, M.; Kloss-Brandstätter, A.; Grabner, G.; Deutschmann, H. FDA-approved deep learning software application versus radiologists with different levels of expertise: Detection of intracranial hemorrhage in a retrospective single-center study. Neuroradiology 2022, 64, 981–990.
Alfaer, N.M.; Aljohani, H.M.; Abdel-Khalek, S.; Alghamdi, A.S.; Mansour, R.F. Fusion-Based Deep Learning with Nature-Inspired Algorithm for Intracerebral Haemorrhage Diagnosis. J. Healthc. Eng. 2022, 2022, 4409336.
Hopkins, B.S.; Murthy, N.K.; Texakalidis, P.; Karras, C.L.; Mansell, M.; Jahromi, B.S.; Potts, M.B.; Dahdaleh, N.S. Mass Deployment of Deep Neural Network: Real-Time Proof of Concept with Screening of Intracranial Hemorrhage Using an Open Data Set. Neurosurgery 2022, 90, 383–389.
Alis, D.; Alis, C.; Yergin, M.; Topel, C.; Asmakutlu, O.; Bagcilar, O.; Senli, Y.D.; Ustundag, A.; Salt, V.; Dogan, S.N.; et al. A joint convolutional-recurrent neural network with an attention mechanism for detecting intracranial hemorrhage on noncontrast head CT. Sci. Rep. 2022, 12, 2084.
Abrigo, J.M.; Ko, K.L.; Chen, Q.; Lai, B.M.H.; Cheung, T.C.Y.; Chu, W.C.W.; Yu, S.C.H. Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: Diagnostic accuracy in Hong Kong. Hong Kong Med. J. 2023, 29, 112–120.
Dash, S.; Shakyawar, S.; Sharma, M.; Kaushik, S. Big data in healthcare: Management, analysis and future prospects. J. Big Data 2019, 6, 1–25.
Vasiliev, Yu.A.; et al. Computer Vision in Diagnostic Radiology: The First Stage of the Moscow Experiment: Monograph, 2nd ed., revised and expanded; Publishing Solutions: Moscow, Russia, 2023; 376p.
Aggarwal, R.; Sounderajah, V.; Martin, G.; Ting, D.S.W.; Karthikesalingam, A.; King, D.; Ashrafian, H.; Darzi, A. Diagnostic accuracy of deep learning in medical imaging: A systematic review and meta-analysis. NPJ Digit. Med. 2021, 4, 65.
Morozov, S.P.; Gavrilov, A.V.; Arkhipov, I.V. The influence of artificial intelligence technologies on a duration of the provision of a computed tomography report of patients with COVID-19 in inpatient healthcare. Prev. Med. 2022, 25, 14–22.
Chilamkurthy, S.; Ghosh, R.; Tanamala, S.; Biviji, M.; Campeau, N.G.; Venugopal, V.K.; Mahajan, V.; Rao, P.; Warier, P. Deep learning algorithms for detection of critical findings in head CT scans: A retrospective study. Lancet 2018, 392, 2388–2396.
Flanders, A.E.; Prevedello, L.M.; Shih, G.; Halabi, S.S.; Kalpathy-Cramer, J.; Ball, R.; Mongan, J.T.; Stein, A.; Kitamura, F.C.; Lungren, M.P.; et al. Construction of a Machine Learning Dataset through Collaboration: The RSNA 2019 Brain CT Hemorrhage Challenge. Radiol. Artif. Intell. 2020, 2, e190211.
Pavlov, N.A.; Andreychenko, A.E.; Vladzymyrskyy, A.V.; Revazyan, A.A.; Kirpichev, Y.S.; Morozov, S.P. Reference medical datasets (MosMedData) for independent external evaluation of algorithms based on artificial intelligence in diagnostics. Digit. Diagn. 2021, 2, 49–66.
Khoruzhaya, A.N.; Kozlov, D.V.; Arzamasov, K.M.; Kremneva, E.I. Text analysis of radiologists’ reports with signs of intracranial hemorrhage on brain CT using a decision tree algorithm. Mod. Technol. Med. 2022, 14, 34–42.
Lin, Y.; Lin, G.; Peng, M.T.; Kuo, C.T.; Wan, Y.L.; Cherng, W.J. The Role of Artificial Intelligence in Coronary Calcium Scoring in Standard Cardiac Computed Tomography and Chest Computed Tomography with Different Reconstruction Kernels. J. Thorac. Imaging 2023, 10-1097.
Ye, Q.; Gao, Y.; Ding, W.; Niu, Z.; Wang, C.; Jiang, Y.; Wang, M.; Fang, E.F.; Menpes-Smith, W.; Xia, J.; et al. Robust weakly supervised learning for COVID-19 recognition using multi-center CT images. Appl. Soft Comput. 2022, 116, 108291.
Vapnik, V. Principles of risk minimization for learning theory. Adv. Neural Inf. Process. Syst. 1991, 4, 831–838.
Omigbodun, A.O.; Noo, F.; McNitt-Gray, M.; Hsu, W.; Hsieh, S.S. The effects of physics-based data augmentation on the generalizability of deep neural networks: Demonstration on nodule false-positive reduction. Med. Phys. 2019, 46, 4563–4574.
Certificate of State Registration of Computer Programme No. 2022681196.
Brust, J.C. Current Diagnosis and Treatment in Neurology; McGraw-Hill Medical: New York, NY, USA, 2006; pp. 386–420.
Aryanto, K.Y.E.; Oudkerk, M.; van Ooijen, P.M.A. Free DICOM de-identification tools in clinical research: Functioning and safety of patient privacy. Eur. Radiol. 2015, 25, 3685–3695.

Figure 1. Examples of brain CT scans with labels. (a) Brain CT scan with signs of epidural hemorrhage in the right hemisphere. (b) Brain CT scan with signs of subarachnoid hemorrhage in both hemispheres. (c) Brain CT scan with signs of subdural hemorrhage in the right hemisphere. (d) Brain CT scan with signs of intracerebral hemorrhage in the right hemisphere. (e) Brain CT scan with signs of multiple hemorrhages (in this case subdural, subarachnoid, intracerebral) in both hemispheres. (f) Brain CT scan (bone kernel) with signs of frontal bone fracture. (g) Brain CT scan with signs of combined pathology (in this case, the tumor was complicated by hemorrhage). (h) Brain CT scan with signs of blood breakthrough into the liquor spaces, indicated by intraventricular hemorrhage.

Figure 2. Data collection flowchart. CT—computed tomography, ICH—intracranial hemorrhage, NLP—natural language processing.

Figure 3. Stages of the defacing algorithm: (a)—obtaining an axial slice, (b)—drawing a line to the facial part of the slice from the centroid, obtaining a point for constructing an artefact (ellipse), (c)—constructing an ellipse of random size and filling it with specified values, (d)—repeating operation (b,c) for each slice (a), obtaining artefacts across the entire facial part of the skull, (e)—3D reconstruction of the CT study after running the algorithm for 60% of the slices in axial view.

Table 1. Label names of clinical parameters and class distribution in the dataset.

Label Name	Number of Classes	Class Names	Class Distribution
Signs of ICH ¹	2	Presence/absence	400/400
Signs of epidural hemorrhage	2	Presence/absence	100/700
Signs of subarachnoid hemorrhage	2	Presence/absence	112/688
Signs of subdural hemorrhage	2	Presence/absence	155/645
Signs of intracerebral hemorrhage	2	Presence/absence	191/609
Signs of multiple hemorrhages	2	one/two or more	265/535
Signs of a skull fracture	2	Presence/absence	124/676
Signs of combined pathologies	2	Presence/absence	23/777
Signs of a break in the cerebrospinal fluid spaces	2	Presence/absence	89/711
Radiology text report	Not applicable	Presence/absence	Not applicable

¹ ICH—intracranial hemorrhage.

Table 2. Label names of technical parameters in the dataset.

Label Name	Number of Classes	Class Names
Convolution Kernel	28	B, BONE, BONEPLUS, D, FC08, FC09, FC21, FC23, FC26, FC30, FC35, FC62, FC64, FC68, FC81, H31s, H70h, J30s, J37s, J40s, J45s, J80s, SOFT, STANDARD, Sharp, UB, YA, YB
Slice Thickness	11	0.500, 0.625, 0.750, 0.900, 1.000, 1.250, 1.500, 2.000, 2.500, 3.000, 5.000
Manufacturer	5	GE Medical Systems, Mobius Imaging, Philips, Siemens, Toshiba
kiloVolt Peak	6	80, 100, 120, 130, 135, 140
X-ray Tube Current (mA)	216	25–450

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
