HyCervix: In Vivo Hyperspectral Cervix Dataset for Non-Invasive Detection of Precancerous and Cancerous Lesions
Abstract
1. Summary
2. Data Description
3. Methods
3.1. Ethics Approval
3.2. HS Colposcope System
3.3. Data Acquisition Methodology
3.4. Study Population
3.5. Annotation of the HS Images
3.6. HS Data Calibration
4. User Notes
Machine Learning Guidelines and Benchmark Protocol
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| HS | Hyperspectral |
| HSI | Hyperspectral Imaging |
| CIN | Cervical Intraepithelial Neoplasia |
| HPV | Human Papillomavirus |
| IC | Invasive Carcinoma |
| HSIL | High-Grade Squamous Intraepithelial Lesion |
| LSIL | Low-Grade Squamous Intraepithelial Lesion |
| GT | Ground Truth |
| GUI | Graphical User Interface |
References
- Wentzensen, N.; Walker, J.; Smith, K.; Gold, M.A.; Zuna, R.; Massad, L.S.; Liu, A.; Silver, M.I.; Dunn, S.T.; Schiffman, M. A Prospective Study of Risk-Based Colposcopy Demonstrates Improved Detection of Cervical Precancers. Am. J. Obstet. Gynecol. 2018, 218, 604.e1–604.e8.
- Lycke, K.D.; Kalpathy-Cramer, J.; Jeronimo, J.; de Sanjose, S.; Egemen, D.; del Pino, M.; Marcus, J.; Schiffman, M.; Hammer, A. Agreement on Lesion Presence and Location at Colposcopy. J. Low. Genit. Tract Dis. 2024, 28, 37–42.
- Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global Cancer Statistics 2022: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2024, 74, 229–263.
- Bray, F.; Carstensen, B.; Møller, H.; Zappa, M.; Žakelj, M.P.; Lawrence, G.; Hakama, M.; Weiderpass, E. Incidence Trends of Adenocarcinoma of the Cervix in 13 European Countries. Cancer Epidemiol. Biomark. Prev. 2005, 14, 2191–2199.
- Bray, F.; Loos, A.H.; McCarron, P.; Weiderpass, E.; Arbyn, M.; Møller, H.; Hakama, M.; Parkin, D.M. Trends in Cervical Squamous Cell Carcinoma Incidence in 13 European Countries: Changing Risk and the Effects of Screening. Cancer Epidemiol. Biomark. Prev. 2005, 14, 677–686.
- Utada, M.; Chernyavskiy, P.; Lee, W.J.; Franceschi, S.; Sauvaget, C.; Berrington de Gonzalez, A.; Withrow, D.R. Increasing Risk of Uterine Cervical Cancer among Young Japanese Women: Comparison of Incidence Trends in Japan, South Korea and Japanese-Americans between 1985 and 2012. Int. J. Cancer 2019, 144, 2144–2152.
- Walboomers, J.M.; Jacobs, M.V.; Manos, M.M.; Bosch, F.X.; Kummer, J.A.; Shah, K.V.; Snijders, P.J.F.; Peto, J.; Meijer, C.J.L.M.; Muñoz, N. Human Papillomavirus Is a Necessary Cause of Invasive Cervical Cancer Worldwide. J. Pathol. 1999, 189, 12–19.
- Ostör, A.G. Natural History of Cervical Intraepithelial Neoplasia: A Critical Review. Int. J. Gynecol. Pathol. 1993, 12, 186.
- Yu, Y.; Ma, J.; Zhao, W.; Li, Z.; Ding, S. MSCI: A Multistate Dataset for Colposcopy Image Classification of Cervical Cancer Screening. Int. J. Med. Inform. 2021, 146, 104352.
- Ben, O.; Jones, J.L.; Kumar, H.; Risdal, M.; Rao, M.; Sherman, V. Intel & MobileODT Cervical Cancer Screening. Kaggle Competition. 2017. Available online: https://www.kaggle.com/competitions/intel-mobileodt-cervical-cancer-screening (accessed on 17 January 2026).
- Jurjuţ, O.; Weiss, M.; Daniel, Y.; Matovina, S.; Neis, F.; Rall, K.; Schöpp, K.; Henes, M.; Linzenbold, W.; Brucker, S.Y.; et al. Detection of Cervical Intraepithelial Neoplasia Using Hyperspectral Tissue Signatures. IEEE J. Transl. Eng. Health Med. 2025, 13, 532–539.
- Schimunek, L.; Schöpp, K.; Wagner, M.; Brucker, S.Y.; Andress, J.; Weiss, M. Hyperspectral Imaging as a New Diagnostic Tool for Cervical Intraepithelial Neoplasia. Arch. Gynecol. Obstet. 2023, 308, 1525–1530.
- Vega, C.; Medina, N.; Leon, R.; Fabelo, H.; Martín, A.; Callico, G.M. HyCervix Dataset. Zenodo 2026.
- Vega, C.; Medina, N.; Quintana-Quintana, L.; Leon, R.; Fabelo, H.; Rial, J.; Martín, A.; Callico, G.M. Feasibility Study of Hyperspectral Colposcopy as a Novel Tool for Detecting Precancerous Cervical Lesions. Sci. Rep. 2025, 15, 820.
- Nayar, R.; Wilbur, D.C. The Bethesda System for Reporting Cervical Cytology: Definitions, Criteria, and Explanatory Notes; Springer: Berlin/Heidelberg, Germany, 2015.
- Vega, C.; Medina, N.; Leon, R.; Fabelo, H.; Martín, A.; Callico, G. In-Vivo Detection of Cervical Cancer Lesions Using Hyperspectral Colposcopy. Prepr. Res. Sq. 2026.




| File Name | Included Elements | Description |
|---|---|---|
| cube.hdr | Set of variables | Header file in ENVI format containing the metadata needed to interpret the HS cube, including the number of bands and the wavelengths. |
| cube.dat | HS reflectance cube | HS reflectance data stored as a flat-binary raster. |
| Patient_XX_GT.mat | Set of variables | Structure containing all the GT annotations for the patient: the patient_class label, the masks of the main areas, and the pixel-level annotations, organized by numeric code and color. |
| Patient_XX_Diagnostic.txt | Diagnostic | Patient diagnostic class. |
| Patient_XX_RGB.png | RGB image | Synthetic RGB image generated from the HS cube. |
| Patient_XX_GT.png | GT image | RGB image containing the pixel-level annotations coded by color. |
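
Because cube.hdr follows the ENVI convention of plain-text `key = value` pairs, the header can be inspected before cube.dat is read. The following is a minimal sketch of such a parser, not the authors' loader. The sample header text and all of its values are hypothetical, the fields actually present in the dataset's cube.hdr may differ, and the byte-size table covers only a subset of the ENVI data type codes.

```python
import re

# Subset of ENVI "data type" codes mapped to bytes per sample.
ENVI_BYTES = {1: 1, 2: 2, 3: 4, 4: 4, 5: 8, 12: 2}


def parse_envi_header(text):
    """Parse 'key = value' pairs of an ENVI header.

    Values wrapped in {...} may span several lines and are returned
    as lists of strings; all other values are returned as strings.
    """
    fields = {}
    # Try a brace-delimited value first, otherwise take the rest of the line.
    pattern = re.compile(r"^\s*([^=\n]+?)\s*=\s*(\{.*?\}|[^\n]*)", re.M | re.S)
    for key, value in pattern.findall(text):
        value = value.strip()
        if value.startswith("{"):
            fields[key.lower()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.lower()] = value
    return fields


# Hypothetical header content, used only to exercise the parser.
sample = """ENVI
samples = 4
lines = 3
bands = 2
header offset = 0
data type = 4
interleave = bsq
byte order = 0
wavelength = {470.0,
 900.0}
"""

hdr = parse_envi_header(sample)
wavelengths = [float(w) for w in hdr["wavelength"]]
n_values = int(hdr["samples"]) * int(hdr["lines"]) * int(hdr["bands"])
# Size the flat-binary raster should have if the header describes it correctly.
expected_bytes = int(hdr["header offset"]) + n_values * ENVI_BYTES[int(hdr["data type"])]
```

Comparing `expected_bytes` with the size of cube.dat on disk is a cheap sanity check that the header and the raster belong together.
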

| Label ID | Label Name | Label Subclass | # Images | # Labelled Pixels | RGB Code |
|---|---|---|---|---|---|
| 0 | Not Labelled | - | - | - | [0,0,0] |
| 100 | Normal (HPV Infected) | Ectocervix | 9 | 8,671,216 | [0,0,255] |
| 101 | | Endocervix | | 1,003,041 | [0,255,0] |
| 102 | | Outlier | | 179,502 | [255,255,255] |
| 103 | Normal (Gold Standard) | Ectocervix | 19 | 4,493,133 | [0,0,255] |
| 104 | | Endocervix | | 368,314 | [0,255,0] |
| 105 | | Outlier | | 103,244 | [255,255,255] |
| 200 | CIN1 | - | 26 | 19,082 | [255,0,0] |
| 201 | CIN2 | - | 13 | 10,790 | [255,0,0] |
| 202 | CIN3 | - | 18 | 69,179 | [255,0,0] |
| 300 | Invasive Carcinoma | - | 4 | 87,869 | [255,0,255] |

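
One practical consequence of the label table is that several label IDs share an RGB code, so a color in Patient_XX_GT.png does not by itself identify a label ID. The sketch below encodes the table as a dictionary and lists the shared colors; the variable and function names are illustrative and are not part of the dataset.

```python
from collections import defaultdict

# Label ID -> (label name, subclass, RGB code), transcribed from the label table.
LABELS = {
    0: ("Not Labelled", None, (0, 0, 0)),
    100: ("Normal (HPV Infected)", "Ectocervix", (0, 0, 255)),
    101: ("Normal (HPV Infected)", "Endocervix", (0, 255, 0)),
    102: ("Normal (HPV Infected)", "Outlier", (255, 255, 255)),
    103: ("Normal (Gold Standard)", "Ectocervix", (0, 0, 255)),
    104: ("Normal (Gold Standard)", "Endocervix", (0, 255, 0)),
    105: ("Normal (Gold Standard)", "Outlier", (255, 255, 255)),
    200: ("CIN1", None, (255, 0, 0)),
    201: ("CIN2", None, (255, 0, 0)),
    202: ("CIN3", None, (255, 0, 0)),
    300: ("Invasive Carcinoma", None, (255, 0, 255)),
}


def ids_by_color(labels):
    """Group label IDs by their RGB code."""
    groups = defaultdict(list)
    for label_id, (_name, _subclass, rgb) in sorted(labels.items()):
        groups[rgb].append(label_id)
    return dict(groups)


# Colors that map to more than one label ID.
ambiguous = {rgb: ids for rgb, ids in ids_by_color(LABELS).items() if len(ids) > 1}
```

Under this reading of the table, resolving a red pixel to CIN1, CIN2, or CIN3 requires the numeric codes or the patient-level class rather than the color image alone.
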
| Feature Name | Type | Description |
|---|---|---|
| AnnonID | String | Anonymous patient identifier. |
| Group | String | Dataset partition to which the patient was assigned for comparing supervised classification approaches (see Section 4). |
| Age | Integer | Age of the patient when the HS image was captured. |
| Parity | Integer | Number of pregnancies (0: No pregnancies). |
| Smoker | Boolean | Whether the patient smokes (Yes; No). |
| Menopause | Boolean | Menopausal status of the patient (Yes; No). |
| Contraceptive | String | Type of birth control method used (Barrier; Copper IUD; Hormonal IUD; Anovulators; Hormone implant; Tubal ligation; Partner’s vasectomy; None). |
| Age at First Intercourse | Integer | Age at first sexual intercourse. |
| Number of Sexual Partners | Integer | Number of sexual partners. |
| Previous Conization | Boolean | The individual has received a previous conization (Yes; No). |
| Reason of Study | String | Reason for undergoing the exam. |
| HPV Test | String | HPV test result (HPV 16; HPV 16 and Others; HPV 18; HPV Negative; HPV Others). |
| Cytology | String | Cytological examination result (HSIL; LSIL; Normal; Invasive Carcinoma). |
| Colposcopy Result | String | Colposcopy examination result (Invasive Carcinoma; Grade 2; Grade 1; Normal). |
| Transfer Area | String | Cervical transformation zone type (Zone type 1; Zone type 2; Zone type 3). |
| Biopsy Result | String | Pathological result of the biopsy sample extracted (Normal; CIN 1; CIN 2; CIN 3; Invasive Carcinoma). |
| Biopsy Location | String | Cervical clock-face notation of the biopsy sample location. |
| Definitive Diagnostic | String | Definitive diagnosis given by the gynecologist based on colposcopy, HPV test, and cytology results. |
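
The declared types in the metadata table suggest a simple normalization step when the records are loaded. The sketch below is one possible way to do this and rests on several assumptions: that each record arrives as a dictionary of strings keyed by the feature names above, that Boolean fields are written as Yes/No, and that missing values appear as `NA` or as empty strings. The storage format of the metadata is not stated here, and the sample record is invented.

```python
# Feature names taken from the metadata table; types as declared there.
INT_FIELDS = {"Age", "Parity", "Age at First Intercourse", "Number of Sexual Partners"}
BOOL_FIELDS = {"Smoker", "Menopause", "Previous Conization"}


def normalize_record(raw):
    """Convert raw string fields to typed values; 'NA' or empty becomes None."""
    record = {}
    for key, value in raw.items():
        value = value.strip()
        if value in ("", "NA"):
            record[key] = None
        elif key in INT_FIELDS:
            record[key] = int(value)
        elif key in BOOL_FIELDS:
            record[key] = value.lower() == "yes"
        else:
            record[key] = value
    return record


# Invented example record, for illustration only.
example = normalize_record({
    "AnnonID": "P00",
    "Age": "35",
    "Smoker": "No",
    "Menopause": "NA",
    "HPV Test": "HPV Negative",
})
```
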

| System | Component | Manufacturer | Model | Key Parameter |
|---|---|---|---|---|
| Colposcope | Colposcope Model | OPTOMIC ESPAÑA, S.A., Colmenar Viejo, Spain | OP-C5 | |
| | Binocular | | | Inclined 45° |
| | Eyepiece | | | Wide field |
| | Objective | | | f = 300 mm; 5-step Galilei magnification changer (0.4×, 0.6×, 1×, 1.6×, 2.5×) |
| | Power supply unit 1 | | Fibrolux LED HP | 100–240 V AC, 50/60 Hz |
| | Power supply unit 2 | | Fibrolux 150 | 100–240 V AC, 50/60 Hz |
| | LED Light | | | Green or amber filter |
| | Halogen Lamp | OSRAM GmbH, Munich, Germany | 64634 HLX | 150 W |
| HSI System | HS Camera | IMEC, Leuven, Belgium | SNAPSCAN VNIR | Technology: Snapscan |
| | | | | Spectral range: 470 to 900 nm |
| | | | | No. of bands: 158 |
| | | | | Spectral resolution: 2.86 nm |
| | | | | FWHM: 10–15 nm |
| | Sensor | ams OSRAM AG, Munich, Germany | ams CMV2000 | Technology: CMOS |
| | | | | Pixel pitch: 5.5 µm |
| | | | | Spatial size: 1000 × 900 pixels |

| Feature | Category/Range | Total N | Total % | Normal N | Normal % | LSIL (CIN 1) N | LSIL (CIN 1) % | HSIL (CIN 2–3) N | HSIL (CIN 2–3) % | Cancer N | Cancer % | Chi² p-Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | 25–28 | 5 | 6 | 4 | 80 | 1 | 20 | 0 | 0 | 0 | 0 | 0.214 |
| | 29–42 | 45 | 58 | 21 | 47 | 10 | 22 | 10 | 22 | 4 | 9 | |
| | 43–56 | 20 | 26 | 7 | 35 | 3 | 15 | 9 | 45 | 1 | 5 | |
| | 57–67 | 5 | 6 | 1 | 20 | 1 | 20 | 2 | 40 | 1 | 20 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Parity | 0 | 31 | 40 | 15 | 48 | 7 | 23 | 6 | 19 | 3 | 10 | 0.283 |
| | 1 | 23 | 30 | 11 | 48 | 3 | 13 | 8 | 35 | 1 | 4 | |
| | >2 | 21 | 27 | 7 | 33 | 5 | 24 | 7 | 33 | 2 | 10 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Smoker | No | 53 | 69 | 28 | 53 | 7 | 13 | 13 | 25 | 5 | 9 | 0.059 |
| | Yes | 21 | 27 | 5 | 24 | 7 | 33 | 8 | 38 | 1 | 5 | |
| | NA | 3 | 4 | 2 | 67 | 1 | 33 | 0 | 0 | 0 | 0 | |
| Contraceptive | Barrier | 14 | 18 | 5 | 36 | 4 | 29 | 5 | 36 | 0 | 0 | 0.141 |
| | Copper IUD | 4 | 5 | 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| | Hormonal IUD | 5 | 6 | 3 | 60 | 1 | 20 | 1 | 20 | 0 | 0 | |
| | Anovulators | 19 | 25 | 8 | 42 | 5 | 26 | 3 | 16 | 3 | 16 | |
| | Coitus Interruptus | 1 | 1 | 1 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| | Tubal ligation | 3 | 4 | 0 | 0 | 2 | 67 | 1 | 33 | 0 | 0 | |
| | No | 26 | 34 | 11 | 42 | 2 | 8 | 11 | 42 | 2 | 8 | |
| | Partner’s vasectomy | 3 | 4 | 1 | 33 | 1 | 33 | 0 | 0 | 1 | 33 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Age at First Intercourse | ≤15 | 15 | 19 | 4 | 27 | 3 | 20 | 6 | 40 | 2 | 13 | 0.092 |
| | 16–18 | 53 | 69 | 24 | 45 | 11 | 21 | 14 | 26 | 4 | 8 | |
| | >18 | 7 | 9 | 5 | 71 | 1 | 14 | 1 | 14 | 0 | 0 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Number of Sexual Partners | <5 | 32 | 42 | 15 | 47 | 7 | 22 | 7 | 22 | 3 | 9 | 0.214 |
| | 6–10 | 21 | 27 | 11 | 52 | 5 | 24 | 5 | 24 | 0 | 0 | |
| | >10 | 22 | 29 | 7 | 32 | 3 | 14 | 9 | 41 | 3 | 14 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Menopause | No | 66 | 86 | 30 | 45 | 14 | 21 | 17 | 26 | 5 | 8 | 0.688 |
| | Yes | 8 | 10 | 3 | 38 | 1 | 12 | 3 | 38 | 1 | 12 | |
| | NA | 3 | 4 | 2 | 67 | 0 | 0 | 1 | 33 | 0 | 0 | |
| Transfer Area | Zone type 1 | 34 | 44 | 19 | 56 | 6 | 18 | 8 | 24 | 1 | 3 | 0.043 |
| | Zone type 2 | 27 | 35 | 7 | 26 | 8 | 30 | 11 | 41 | 1 | 4 | |
| | Zone type 3 | 14 | 18 | 7 | 50 | 1 | 7 | 2 | 14 | 4 | 29 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Previous Conization | No | 63 | 82 | 24 | 38 | 12 | 19 | 21 | 33 | 6 | 10 | 0.018 |
| | Yes | 12 | 16 | 9 | 75 | 3 | 25 | 0 | 0 | 0 | 0 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| HPV Test | HPV 16 | 12 | 16 | 3 | 25 | 1 | 8 | 4 | 33 | 4 | 33 | <0.001 |
| | HPV 16 and Others | 5 | 6 | 0 | 0 | 0 | 0 | 5 | 100 | 0 | 0 | |
| | HPV 18 | 3 | 4 | 2 | 67 | 1 | 33 | 0 | 0 | 0 | 0 | |
| | HPV Negative | 26 | 34 | 22 | 85 | 3 | 12 | 1 | 4 | 0 | 0 | |
| | HPV Others | 29 | 38 | 6 | 21 | 10 | 34 | 11 | 38 | 2 | 7 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Cytology | HSIL | 28 | 36 | 2 | 7 | 2 | 7 | 20 | 71 | 4 | 14 | <0.001 |
| | LSIL | 13 | 17 | 3 | 23 | 10 | 77 | 0 | 0 | 0 | 0 | |
| | Normal | 31 | 40 | 27 | 87 | 3 | 10 | 1 | 3 | 0 | 0 | |
| | Possible IC | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 100 | |
| | NA | 4 | 5 | 3 | 75 | 0 | 0 | 0 | 0 | 1 | 25 | |
| Colposcopy Result | Invasive Carcinoma | 6 | 8 | 0 | 0 | 1 | 17 | 0 | 0 | 5 | 83 | <0.001 |
| | Grade 2 | 15 | 19 | 2 | 13 | 2 | 13 | 11 | 73 | 0 | 0 | |
| | Grade 1 | 14 | 18 | 2 | 14 | 6 | 43 | 5 | 36 | 1 | 7 | |
| | Normal | 40 | 52 | 29 | 72 | 6 | 15 | 5 | 12 | 0 | 0 | |
| | NA | 2 | 3 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | |

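
The percentages in the demographic table follow two conventions: the Total % column is relative to the whole cohort, while the per-diagnosis percentages are relative to the row total. The short sketch below checks this reading against the "Smoker: No" row. The function name is illustrative, and the cohort size of 77 is inferred here by summing the Total N column for one feature (53 + 21 + 3) rather than quoted from the text.

```python
def row_percentages(counts):
    """Percentage of each diagnosis within one table row, rounded to integers."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts transcribed from the "Smoker: No" row of the demographic table.
smoker_no = {"Normal": 28, "LSIL (CIN 1)": 7, "HSIL (CIN 2–3)": 13, "Cancer": 5}
within_row = row_percentages(smoker_no)

# Inferred cohort size: sum of the Total N column for the Smoker feature.
COHORT_SIZE = 53 + 21 + 3
share_of_cohort = round(100 * sum(smoker_no.values()) / COHORT_SIZE)
```
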
| Training Class | Number of Pixels | Percentage of Pixels | Number of Patients | Percentage of Patients | Label IDs Included |
|---|---|---|---|---|---|
| Normal (Gold Standard) | 4,964,691 | 96.4% | 26 | 38% | 103, 104, 105 |
| LSIL (CIN1) | 19,082 | 0.4% | 15 | 22% | 200 |
| HSIL (CIN2–3) | 79,969 | 1.6% | 21 | 31% | 201, 202 |
| Invasive Carcinoma | 87,869 | 1.7% | 6 | 9% | 300 |
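
The training classes are unions of label IDs, and their pixel counts can be recovered from the per-label counts in the label table. The sketch below makes that mapping explicit and recomputes the class shares, which also shows how strongly the pixel distribution is skewed toward the Normal class. Names such as `LABEL_TO_CLASS` are illustrative, not part of the dataset.

```python
# Labelled pixels per label ID, transcribed from the label table.
PIXELS_PER_LABEL = {
    103: 4_493_133, 104: 368_314, 105: 103_244,
    200: 19_082, 201: 10_790, 202: 69_179, 300: 87_869,
}

# Training class -> label IDs, transcribed from the training class table.
TRAINING_CLASSES = {
    "Normal (Gold Standard)": (103, 104, 105),
    "LSIL (CIN1)": (200,),
    "HSIL (CIN2–3)": (201, 202),
    "Invasive Carcinoma": (300,),
}

# Inverse lookup: label ID -> training class. IDs 100-102 are not mapped.
LABEL_TO_CLASS = {
    label_id: name for name, ids in TRAINING_CLASSES.items() for label_id in ids
}

class_pixels = {
    name: sum(PIXELS_PER_LABEL[i] for i in ids) for name, ids in TRAINING_CLASSES.items()
}
total_pixels = sum(class_pixels.values())
class_share = {name: round(100 * n / total_pixels, 1) for name, n in class_pixels.items()}
```

With these counts, any pixel-wise classifier trained without class weighting or resampling would see the Normal class in roughly 96 of every 100 training samples.
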

| Metric | Normal (Healthy) | HSIL + IC |
|---|---|---|
| F1-score | 0.85 ± 0.11 | 0.62 ± 0.42 |
| Precision | 0.91 ± 0.14 | 0.69 ± 0.46 |
| Recall | 0.83 ± 0.14 | 0.60 ± 0.39 |
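
The benchmark metrics are reported as mean ± standard deviation per class. The sketch below shows one way such figures can be produced: precision, recall, and F1 are computed per partition for a chosen positive class and then aggregated. The toy labels are invented, and both the unit of aggregation (partition, fold, or patient) and the use of the sample standard deviation are assumptions, since neither is specified in the table.

```python
from statistics import mean, stdev


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class; undefined ratios are set to 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy partitions: 0 = Normal, 1 = HSIL + IC.
partitions = [
    ([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]),
    ([0, 1, 1, 1], [0, 1, 1, 0]),
]

scores = [precision_recall_f1(t, p, positive=1) for t, p in partitions]
f1_values = [s[2] for s in scores]
summary = f"{mean(f1_values):.2f} ± {stdev(f1_values):.2f}"
```

A large standard deviation relative to the mean, as in the HSIL + IC column, indicates that performance varies widely between the units being averaged.
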

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Vega, C.; Medina, N.; Leon, R.; Fabelo, H.; Martín, A.; Callico, G.M. HyCervix: In Vivo Hyperspectral Cervix Dataset for Non-Invasive Detection of Precancerous and Cancerous Lesions. Data 2026, 11, 62. https://doi.org/10.3390/data11030062

