Dataset of Two-Dimensional Gel Electrophoresis Images of Acute Myeloid Leukemia Patients before and after Induction Therapy

Abstract: Acute myeloid leukemia (AML) is a malignant disorder of the hematopoietic stem and progenitor cells, which results in the build-up of immature blasts in the bone marrow and eventually in the peripheral blood of affected patients. Accurately assessing a patient's prognosis is very important for clinical management of the disease, which is why clinicians take several prognostic factors into account when deciding the course of treatment, such as age, performance status at diagnosis, platelet count, serum creatinine, and albumin. However, proteomic changes related to treatment response in this patient group have not been widely explored. Here, we make available a set of 22 two-dimensional gel electrophoresis (2DGE) images obtained from the peripheral blood samples of 11 patients with AML, taken at the time of diagnosis and after induction therapy (approximately 21–28 days after starting treatment). The same set of 2DGE images is also made available after a preprocessing stage (an additional 22 pre-processed 2DGE images), which was performed using algorithms developed in Python in order to improve the visualization of characteristic spots and facilitate proteomic analysis of this type of image.
Dataset: The dataset will be published as a supplement to this paper, so this field will be filled by the editors of the journal.
investigation, L.F.R.; resources, E.C.; data curation, S.R.; writing (original draft preparation), J.E.U., S.R. and E.D.-T.; writing (review and editing), J.E.U., S.R., J.P.-A., M.C.T.-M. and E.D.-T.; visualization, J.E.U., S.R. and E.D.-T.; supervision, S.R., M.C.T.-M. and E.D.-T.


Summary
According to the Global Cancer Observatory (Globocan 2018), each year, 437,033 patients worldwide are diagnosed with some type of leukemia, and 309,006 people die from this disease. Acute myeloid leukemia (AML) is a type of leukemia that mainly occurs in older adults; 42% of Americans diagnosed with AML are over 65 years of age, and the diagnosis is rarely made before 40 years of age, although cases have progressively increased over time [1]. AML is the result of an accumulation of acquired genetic alterations in the DNA of hematopoietic progenitor cells, and accurately assessing a patient's prognosis is very important for clinical management of the disease. The patient's cytogenetic profile is currently the strongest prognostic factor. For example, a complex karyotype, monosomy 5 or 7, t(6;9), inv(3), or 11q changes other than t(9;11) have all been associated with a significantly lower response to treatment and lower overall survival [2]. Genetic studies are clearly valuable; however, because thousands of proteins mediate cellular function, a prognostic model based on genetics alone is incomplete.
Data 2021, 6, 20
The images of this dataset were obtained by two-dimensional gel electrophoresis (2DGE), a technique that separates proteins according to their isoelectric point and molecular weight [3], followed by protein staining and image capture. Often, 2DGE images include anomalies [4,5] such as vertical lines, horizontal lines, diffuse points, and noise, among others, which make it difficult to identify spots that contain valuable information. Therefore, a preprocessing stage is often necessary in order to discriminate stains and noise from real protein spots [6]. Omitting this stage can affect the interpretation of the data, as noise could be identified as false protein spots [7]. Image preprocessing is responsible for reducing or correcting these irregularities in 2DGE images. The authors have implemented an approach that integrates the techniques of image normalization, noise reduction by nonlinear techniques, and background correction [4,8], sequentially applying the following structure: adaptive piecewise histogram equalization for image normalization, a geometric nonlinear diffusion filter (GNDF) for filtering, and multilevel thresholding for background correction, obtaining favorable results [9].
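To illustrate what noise reduction by nonlinear diffusion does to a gel image, the following is a minimal sketch of the classic Perona-Malik scheme in NumPy. It is a stand-in chosen for illustration only and is not the geometric nonlinear diffusion filter (GNDF) of [9]; the function name `perona_malik` and the parameters `n_iter`, `kappa`, and `step` are assumptions introduced here, not values taken from the paper.

```python
import numpy as np

def perona_malik(image, n_iter=20, kappa=0.1, step=0.2):
    """Perona-Malik nonlinear diffusion (illustrative stand-in, NOT the GNDF of [9]).

    image  : 2-D array of intensities (assumed scaled to roughly [0, 1])
    n_iter : number of diffusion iterations (hypothetical default)
    kappa  : contrast threshold; differences well above it are treated as edges
    step   : time step; must be <= 0.25 for stability with four neighbours
    """
    u = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        # Replicating the border makes the difference across the boundary zero.
        padded = np.pad(u, 1, mode="edge")
        d_north = padded[:-2, 1:-1] - u
        d_south = padded[2:, 1:-1] - u
        d_west = padded[1:-1, :-2] - u
        d_east = padded[1:-1, 2:] - u
        # Conductance is close to 1 in flat regions and close to 0 across
        # strong edges, so noise is smoothed while spot borders are kept.
        flux = sum(np.exp(-(d / kappa) ** 2) * d
                   for d in (d_north, d_south, d_west, d_east))
        u += step * flux
    return u

# Synthetic example: one bright square "spot" on a dark, noisy background.
rng = np.random.default_rng(0)
clean = np.zeros((40, 40))
clean[15:25, 15:25] = 1.0
noisy = clean + rng.normal(0.0, 0.05, clean.shape)
smoothed = perona_malik(noisy)
```

In this toy case the background fluctuations shrink while the spot keeps its intensity, because the large intensity jump at the spot border drives the conductance towards zero. On real gels, `kappa` would have to be tuned to the contrast between spots and background.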

Data Description
The database consists of a set of 22 2DGE images obtained from the peripheral blood samples of 11 patients with acute myeloid leukemia. Of these, 11 images correspond to samples taken at the time of diagnosis, and the other 11 correspond to samples taken from the same patients after induction therapy (approximately 21-28 days after starting treatment). Images named with the suffix BEFORE are 2DGE images of samples taken at the time of diagnosis (before treatment), while images named with the suffix AFTER are 2DGE images of samples taken after treatment. The same 22 images are also provided after preprocessing; these files carry the prefix PREPROC. Each image in the database is in tagged image file format (TIFF) with a resolution of 300 dots per inch (DPI). In total, the database, which can be found in the Supplementary Materials, contains 44 images (22 raw 2DGE images and 22 pre-processed 2DGE images). The characteristics of each image are summarized in Table 1.
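The naming convention described above can be used to sort the files programmatically. The sketch below relies only on what the text states (the PREPROC prefix and the BEFORE/AFTER suffixes); the function name `describe_image`, the example file names, the underscore separator, and the `.tif` extension are hypothetical and should be adapted to the actual files in the Supplementary Materials.

```python
from pathlib import Path

def describe_image(filename):
    """Classify a dataset file from its PREPROC prefix and BEFORE/AFTER suffix."""
    stem = Path(filename).stem.upper()  # drop the extension, ignore case
    if stem.endswith("BEFORE"):
        timepoint = "before"            # sample taken at diagnosis
    elif stem.endswith("AFTER"):
        timepoint = "after"             # sample taken after induction therapy
    else:
        raise ValueError(f"no BEFORE/AFTER suffix in {filename!r}")
    return {"preprocessed": stem.startswith("PREPROC"), "timepoint": timepoint}

# Hypothetical file names, used only to exercise the function.
raw = describe_image("P01_BEFORE.tif")
pre = describe_image("PREPROC_P01_AFTER.tif")

# Expected size of the full dataset: patients x time points x versions.
expected_total = 11 * 2 * 2
```

Grouping the results by time point and by preprocessing flag should reproduce the counts given in the text: 22 raw images, 22 pre-processed images, and 44 in total.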

Patients
Peripheral blood was obtained from 11 newly diagnosed patients with de novo AML at Hospital Manuel Uribe Angel in Colombia. Two blood samples were taken from each patient: one at the time of diagnosis (before the start of chemotherapy) and one after completion of the first round of induction therapy, which was typically 2-3 weeks after induction or when neutrophil and platelet recovery was achieved. Relevant clinical information of the patients involved in this study is summarized in Table 2.
Notes to Table 2:
1 According to the French-American-British (FAB) classification system.
2 (i): initial blast count, before induction therapy.
3 (f): final blast count, after induction therapy.
4 CR: complete remission, defined as <5% blasts in bone marrow after induction therapy.
5 CCR: complex chromosome rearrangement.
6 PR: partial response, defined as 5-20% blasts in bone marrow after induction therapy.
7 Resistance to therapy, defined by >20% blasts in bone marrow after induction therapy.
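The response categories defined in the notes to Table 2 amount to a simple threshold rule on the final bone marrow blast percentage. The following sketch encodes only those three thresholds; the function name `classify_response` and the returned labels are illustrative assumptions, and the rule is not a substitute for the full clinical response criteria.

```python
def classify_response(final_blast_percent):
    """Map the final bone marrow blast percentage to the categories of Table 2."""
    if not 0 <= final_blast_percent <= 100:
        raise ValueError("blast percentage must lie between 0 and 100")
    if final_blast_percent < 5:
        return "CR"         # complete remission: <5% blasts
    if final_blast_percent <= 20:
        return "PR"         # partial response: 5-20% blasts
    return "Resistant"      # resistance to therapy: >20% blasts

# Illustrative values, not patient data from the study.
examples = {value: classify_response(value) for value in (3, 5, 20, 35)}
```

Such a rule can be used to label each BEFORE/AFTER image pair with the treatment outcome of the corresponding patient before comparing spot patterns between groups.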

Two-Dimensional Gel Electrophoresis
Proteins (50 µg) were loaded by passive rehydration onto 7 cm ZOOM® immobilized pH gradient (IPG) strips, pH 3-10 NL (ThermoFisher Scientific, Waltham, MA, USA), at room temperature. Isoelectric focusing was carried out using the following voltage ramp: 100 V for 1 h, 150 V for 1 h, 200 V for 5 min, 450 V for 5 min, 600 V for 5 min, 750 V for 5 min, 950 V for 5 min, 1200 V for 10 min, 1400 V for 10 min, 1600 V for 10 min, and 2000 V for 45 min. The IPG strips were then reduced with 100 mM dithiothreitol (DTT) and alkylated with 2.5% iodoacetamide, according to the manufacturer's recommended protocol. After this, the IPG strips were loaded onto NuPAGE™ Novex™ 4-12% Bis-Tris protein gels (1.5 mm thick; ThermoFisher Scientific) for SDS-PAGE and run at 200 V for 45 min. After electrophoresis, the gels were stained with SYPRO® Ruby (Invitrogen™, ThermoFisher Scientific), and the gel images were acquired using the ChemiDoc™ MP System (Bio-Rad).
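As a quick check of the focusing program, the total run time and the accumulated volt-hours can be computed from the steps listed above. This is a worked arithmetic sketch that assumes each step is held at its stated voltage for its full duration (step-and-hold); if the power supply ramps linearly between steps, the true volt-hours will differ. The names `RAMP`, `total_minutes`, and `volt_hours` are introduced here for illustration.

```python
# (voltage in V, duration in min) for each isoelectric focusing step in the text
RAMP = [
    (100, 60), (150, 60),
    (200, 5), (450, 5), (600, 5), (750, 5), (950, 5),
    (1200, 10), (1400, 10), (1600, 10),
    (2000, 45),
]

# Total focusing time in minutes.
total_minutes = sum(minutes for _, minutes in RAMP)

# Volt-hours, assuming constant voltage within each step.
volt_hours = sum(volts * minutes / 60 for volts, minutes in RAMP)

print(f"{total_minutes} min, {volt_hours:.1f} Vh")  # 220 min, 2695.8 Vh
```

Under this assumption the program runs for 3 h 40 min and delivers roughly 2700 Vh, more than half of which comes from the final 2000 V step.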

Image Pre-Processing
This step was performed to mitigate anomalies introduced during image acquisition and to improve spot detection. The approach proposed in [9] was applied, which integrates image normalization, noise reduction, and background correction by means of adaptive piecewise histogram equalization, a geometric nonlinear diffusion filter (GNDF), and multilevel thresholding, respectively. The algorithm was implemented in Python, an open-source programming language with a large number of freely available libraries and extensive online support, which speeds up the development of multi-stage pipelines and helps to obtain consistent, reliable results that can be integrated with other tools.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available in the Supplementary Material.