Data Descriptor

MCR-SL: A Multimodal, Context-Rich Skin Lesion Dataset for Skin Cancer Diagnosis

by Maria Castro-Fernandez 1,*, Thomas Roger Schopf 2, Irene Castaño-Gonzalez 3, Belinda Roque-Quintana 3, Herbert Kirchesch 4, Samuel Ortega 1,5,6, Himar Fabelo 1,7,8, Fred Godtliebsen 6, Conceição Granja 2 and Gustavo M. Callico 1,*
1 Research Institute for Applied Microelectronics (IUMA), Universidad de Las Palmas de Gran Canaria, 35001 Las Palmas de Gran Canaria, Spain
2 Norwegian Center for E-Health Research, University Hospital of North-Norway, 9038 Tromsø, Norway
3 Department of Dermatology, Hospital Universitario de Gran Canaria Dr. Negrín, Barranco de la Ballena s/n, 35010 Las Palmas de Gran Canaria, Spain
4 Dermatology Private Office, 51147 Cologne, Germany
5 Norwegian Institute of Food, Fisheries and Aquaculture Research (Nofima), 9291 Tromsø, Norway
6 Department of Mathematics and Statistics, UiT The Arctic University of Norway, 9037 Tromsø, Norway
7 Fundación Canaria Instituto de Investigación Sanitaria de Canarias (FIISC), 35019 Las Palmas de Gran Canaria, Spain
8 Research Unit, Hospital Universitario de Gran Canaria Dr. Negrín, 35019 Las Palmas de Gran Canaria, Spain
* Authors to whom correspondence should be addressed.
Data 2025, 10(10), 166; https://doi.org/10.3390/data10100166
Submission received: 4 September 2025 / Revised: 10 October 2025 / Accepted: 15 October 2025 / Published: 18 October 2025

Abstract

Well-annotated datasets are fundamental for developing robust artificial intelligence models, particularly in medical fields. Many existing skin lesion datasets have limitations in image diversity (including only clinical or dermoscopic images) or metadata, which hinder their utility for mimicking real-world clinical practice. This work introduces the MCR-SL dataset, a new, meticulously curated dataset that addresses these limitations. The MCR-SL dataset was collected from 60 subjects at the University Hospital of North Norway and comprises 779 clinical images and 1352 dermoscopic images of 240 unique lesions. The lesion types included are nevus, seborrheic keratosis, basal cell carcinoma, actinic keratosis, atypical nevus, melanoma, squamous cell carcinoma, angioma, and dermatofibroma. Labels were established by combining the consensus of a panel of four dermatologists with histopathology reports for the 29 excised lesions, with the latter serving as the gold standard. The resulting dataset provides a comprehensive resource with clinical and dermoscopic images and rich clinical context, ensuring a high level of clinical relevance and surpassing many existing resources in this respect. The MCR-SL dataset provides a holistic and reliable foundation for validating artificial intelligence models, enabling a more nuanced and clinically relevant approach to automated skin lesion diagnosis that mirrors real-world clinical practice.
Dataset: The data presented in this study are openly available in Zenodo at https://zenodo.org/records/17306338 (accessed on 16 October 2025).
Dataset License: CC-BY

1. Summary

Skin cancer is one of the most prevalent types of cancer worldwide, and its incidence is expected to continue growing in the fair-skinned population through 2050 [1]. It is diagnosed by dermatologists, who evaluate the appearance of the lesion together with other factors, such as the subject’s risk factors (e.g., family history of skin cancer) or any associated symptoms (e.g., itching, bleeding, or pain). Two image modalities are typically used to depict skin lesions for diagnosis: clinical and dermoscopic images. The former shows the lesion as it appears to the naked eye, while the latter is captured using a dermoscope, which illuminates the skin with polarized or non-polarized light, removes surface reflections, and magnifies the lesion. An example of each modality is shown in Figure 1.
Artificial Intelligence (AI) models have been applied to skin lesion classification for some years. The initial breakthrough came with Convolutional Neural Networks (CNNs), which demonstrated the ability to classify skin lesions as well as or better than human dermatologists on carefully curated datasets [2,3,4,5]. Commonly used architectures included Inception (v3 and v4), ResNet50, EfficientNet, and ensembles based on these models. However, when these unimodal models were tested with real-world, lower-quality images (such as those collected with a smartphone), their performance often dropped significantly [6]. To tackle these robustness and generalizability challenges, two areas of research emerged: (1) developing more ‘dermatologist-like’ and multimodal approaches, which involved using attention mechanisms for explainability and combining image data with crucial clinical metadata about the patient and lesion, mirroring a human’s holistic assessment [7,8,9]; and (2) applying entirely new, more powerful architectures like the Vision Transformer (ViT), which leverage global context and attention mechanisms to process images, thereby promising greater robustness and accuracy than traditional CNNs [10,11].
The development of these models has been greatly promoted by the existence of public datasets such as HAM10000, BCN20000, PH2, and PAD-UFES-20 [12,13,14,15]. However, the ability of these models to accurately classify skin lesions and assist in clinical diagnosis depends directly on the richness and quality of the data used for their training and validation. While several skin lesion datasets are publicly available, many present limitations in terms of image diversity, detailed metadata, or the methodology for establishing ground truth labels [15,16].
To address these limitations, this work introduces a new multimodal dataset of skin lesions, collected and curated to provide a comprehensive resource for the scientific community. The dataset includes clinical and dermoscopic images, as well as tabular metadata about the subjects, lesions, and diagnoses, covering clinical data, skin cancer risk factors, lesion characteristics (e.g., lesion diameter or body location), and diagnostic information. The Multimodal, Context-Rich Skin Lesion dataset (MCR-SL) is the result of a data acquisition campaign carried out at the University Hospital of North Norway (UNN) and was initially created to serve as a controlled test dataset for the AI models developed within the European project WARIFA (Watching the Risk Factors, Grant Agreement: 101017385) [17]. The project targeted automatic skin cancer prevention and detection based on smartphone applications, which necessitated a dataset that reflects the challenging, non-ideal conditions inherent in data captured by the general public. While even curated datasets show significant variability in lighting and focus [18], real-life collected data presents even deeper variability in lighting, motion blur, and lack of focus. Therefore, the MCR-SL dataset was specifically curated to capture this diversity, making it an ideal resource for testing robust models intended to be used in challenging scenarios.
The MCR-SL dataset comprises 2131 images documenting 240 skin lesions from 60 subjects. It includes a combination of 779 clinical images and 1352 dermoscopic images, covering the following diagnostic categories: nevus (NEV), seborrheic keratosis (SK), basal cell carcinoma (BCC), actinic keratosis (AK), atypical nevus (ATY), melanoma (MEL), squamous cell carcinoma (SCC), angioma (ANG), dermatofibroma (DF), and unknown (UNK). A central feature of this dataset is its approach to ground truth labeling. The diagnosis for each lesion was established in two ways: first, a panel of four dermatologists diagnosed every lesion; then, for those lesions that had been excised, the histopathology results served as the gold standard. A unified diagnosis combines both, prioritizing histopathology when available. In addition to the images, the dataset includes extensive anonymized metadata: 9 attributes for lesions, 22 for subjects, and 16 attributes in total for diagnoses (encompassing dermatological, histopathological, and unified diagnoses). All data underwent a thorough curation process to ensure integrity and consistency, which included image standardization and the removal of all subject-identifying information to maintain privacy.
The MCR-SL dataset distinguishes itself from existing resources by combining key strengths often found in isolation in other datasets, as shown in Table 1 (Note that in this table and the following ones “#” stands for “number of”). For instance, while datasets like PAD-UFES-20 [15] include detailed subject metadata, they consist solely of clinical images. Conversely, popular dermoscopic-only datasets such as PH2 [14] offer detailed dermoscopic criteria of the images, but lack the crucial context provided by clinical images and extensive metadata. The well-known HAM10000 dataset [12] provides a large number of images and robust ground truth, but its metadata is often limited to basic subject demographics, and its image modalities can vary across different challenges. The MCR-SL dataset combines both clinical and dermoscopic images with extensive subject and lesion metadata, which provides the critical context typically available to a clinician. This holistic structure aims to mirror real-world clinical practice, enabling a more nuanced approach to lesion diagnosis where experts consider a subject’s complete history, individual characteristics, and the nuanced details of the lesion itself. Furthermore, for those lesions that were excised (around 12% of the lesions in the dataset), our ground truth labeling combines the consensus of an expert panel of dermatologists for all the lesions with histopathology reports (when available), providing a robust and reliable label for each lesion.
The single-center, Scandinavian origin of the dataset is a limitation to generalizability, as well as its limited size. However, its design incorporates several features to facilitate future expansion through multi-center or international collaboration. The relational structure was chosen to ensure that new data entries can be seamlessly integrated. By using unique identifiers and standardized tables, the addition of new subjects, lesions, and images is straightforward. Furthermore, the modular design accommodates the inclusion of new data collection points without requiring a redesign of the core database schema. For instance, a new table for data collection locations could be added to account for other clinics or hospitals. This forward-looking approach allows the dataset to grow over time and provides a flexible framework for potential collaborative, multi-center studies.
Nowadays, the field is heading towards the usage of multimodal models, like the one proposed by Yan et al. [19], but the modalities under investigation are expanding rapidly. A variety of advanced non-invasive imaging and spectral modalities are currently being researched to improve skin cancer detection, sometimes in combination. These techniques include Reflectance Confocal Microscopy (RCM), Optical Coherence Tomography (OCT), Laser Speckle Contrast Imaging (LSCI), Photoacoustic Imaging (PAI), and Raman Spectroscopy (RS). Among these, the development and application of Multi- and Hyperspectral Imaging (MHSI) is a deeply researched area, with a strong body of work focusing on its unique ability to capture rich spectral signatures (providing molecular and chemical information) over a wide spatial area without the high cost or complexity associated with modalities like OCT [20,21,22,23]. The growing trend emphasizes the complementary nature of these data sources, as evidenced by studies combining dermoscopic images with RCM [24], a “four-modal device” comprising OCT, photoacoustic tomography, ultrasound, and Raman spectroscopy developed for in vivo skin lesion assessment [25], or the use of LSCI, hyperspectral, and photoacoustic imaging for functional and molecular 3D mapping of tumors [26]. Furthermore, the integration of structural and spectral data, such as OCT with Raman Spectroscopy [27], highlights the shift toward models that can interpret a deep, feature-rich portrait of the lesion. This trajectory suggests that future multimodal models will increasingly incorporate rich spectral data, such as MHSI, alongside traditional clinical, dermoscopic, and patient metadata, to offer a truly holistic and non-invasive diagnostic assessment.

2. Data Description

The MCR-SL dataset documents 240 unique skin lesions collected from 60 subjects. It consists of 779 clinical images and 1352 dermoscopic images. Each lesion has a diagnosis from a panel of dermatologists, and for the 29 lesions that were excised, a histopathological diagnosis is also included. The dataset encompasses various types of skin lesions, including NEV, SK, BCC, AK, ATY, MEL, SCC, ANG, DF, and UNK.
Table 2 summarizes the distribution of lesion types, detailing the number of lesions and subjects associated with each specific type. The percentages for subjects and lesions are calculated with respect to the total number of subjects (60) and lesions (240) in the dataset, respectively. Note that the percentages for lesions sum to 100%, but the percentages for subjects do not, as some subjects present with multiple skin lesion types. For example, a subject with both nevi and seborrheic keratoses is counted in both categories, which is why the sum of subjects across categories can exceed the total number of unique subjects (60). How the lesions were diagnosed is explained in Section 3.4.
Beyond the formal lesion and histopathological diagnoses, the cohort’s phenotypical characteristics were recorded. Notably, the Fitzpatrick Skin Type (FST) was not formally assessed during the consultation. However, the collected metadata includes the subject’s skin reaction to sun exposure. As confirmed by our clinical experts, the levels recorded in this variable (specifically ‘red and pain,’ ‘red,’ and ‘tanning without reddening’) align directly with the core criteria used to determine FST I, FST II, and FST III, respectively. Considering the Scandinavian setting, where FST I–III are most common, this field can serve as a clinically acceptable surrogate for FST.
To better illustrate these relationships, the associations of subject and lesion characteristics with lesion malignancy are explored in Table 3 and Table 4, respectively. In both tables, missing values were handled to enable the analysis: categorical gaps were treated as “unknown”, while numerical gaps were imputed with the mean of the attribute. Chi-squared test p-values are included to indicate the strength of association between each attribute and lesion malignancy. In these tables, NM and M stand for Non-malignant and Malignant, respectively.
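To make this preprocessing reproducible, the following is a minimal sketch of the missing-value handling and association test described above, using pandas and SciPy. The file names and the columns sex and malignant are illustrative assumptions, not the dataset’s actual identifiers:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Load the metadata tables (file names are illustrative, not the released ones).
subjects = pd.read_excel("subjects.xlsx")
lesions = pd.read_excel("lesions.xlsx")

# Categorical gaps become an explicit "unknown" level; numerical gaps are
# imputed with the attribute mean, mirroring the procedure described above.
cat_cols = subjects.select_dtypes(include="object").columns
num_cols = subjects.select_dtypes(include="number").columns
subjects[cat_cols] = subjects[cat_cols].fillna("unknown")
subjects[num_cols] = subjects[num_cols].fillna(subjects[num_cols].mean())

# Chi-squared association between one categorical attribute and malignancy.
# "sex" and "malignant" are hypothetical column names used for illustration.
df = lesions.merge(subjects, on="subject_id")
contingency = pd.crosstab(df["sex"], df["malignant"])
chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```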
Regarding the images, they are stored in separate folders for clinical and dermoscopic image modalities. Both types of images were cropped to standardized sizes, which are detailed in the Methods section. It is important to note that many images of the same lesion are near duplicates, captured with slight variations in lighting, focus, or rotation.
The images are provided in PNG (.png) format and utilize the sRGB color space. Each image is accompanied by extensive metadata detailing the lesions and subjects’ characteristics. The metadata is organized into multiple tables (provided as spreadsheets) designed to function as a relational database. The attributes and structure of these tables are further explained in Section 2.1.

2.1. Dataset Structure

The dataset is composed of both images and contextual data tables, which together provide a comprehensive record of skin lesions. The images are organized into two separate folders based on modality: dermoscopic and clinical. The contextual data is stored in several tables that contain clinical information about each lesion and the subjects. Each of these elements is an entity in our dataset, with the Lesion entity serving as the central element that connects all other data. This structure is further detailed in the Entity-Relationship Diagram shown in Figure 2.
In this diagram, each rectangle represents an entity of our database, and each diamond represents the relationship between the two entities it connects. The numbers indicate the cardinality of the relationship, which specifies how many instances of one entity can be associated with instances of another. For instance, the relationship “subject-lesion (1,1):(1,M)” in the diagram shows that a single Subject can have multiple Lesions (1 to M), but each Lesion belongs to a single Subject. In contrast, the relationship “lesion-unified (1,1):(1,1)” indicates that each Lesion is linked to a single Unified diagnosis, and each Unified diagnosis corresponds to a single Lesion. Also, the relationship “histopath-unified (0,1):(1,1)” between the Histopathology diagnosis and Unified diagnosis entities shows that each Histopathology diagnosis is associated with a single Unified diagnosis; however, a Unified diagnosis may not be linked to any Histopathology diagnosis. Each entity and its relationships are further explained below.
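As an illustration of how this relational structure can be traversed programmatically, the sketch below joins the metadata tables on their shared identifiers using pandas. The spreadsheet file names are assumptions; the identifier columns follow the entity descriptions given in the following subsections:

```python
import pandas as pd

# Illustrative file names; the released spreadsheets follow the same entities.
subjects = pd.read_excel("subjects.xlsx")          # one row per subject
lesions = pd.read_excel("lesions.xlsx")            # keyed by lesion_id, holds subject_id
images = pd.read_excel("images.xlsx")              # keyed by image_id, holds lesion_id
unified = pd.read_excel("unified_diagnosis.xlsx")  # one row per lesion

# Traverse the (1,1):(1,M) relationships with left joins, producing one row
# per image with the full subject, lesion, and diagnosis context attached.
full = (
    images
    .merge(lesions, on="lesion_id", how="left")
    .merge(subjects, on="subject_id", how="left")
    .merge(unified, on="lesion_id", how="left")
)
print(full.head())
```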

2.1.1. Lesion Entity

The Lesion entity serves as the central entity of the dataset. Each entry is uniquely identified by a lesion_id, which is tied to a specific subject. This entity contains a unified diagnosis for each lesion and is associated with multiple images (both clinical and dermoscopic). Additional attributes, such as the lesion’s diameter (diameter) or the referring physician’s diagnosis (referral_diagnosis), are also included. All attributes of the Lesion entity are described in detail in Table 5.
Note that a few lesions were included even if they had only one image modality, as they were accompanied by other lesions from the same subject for which both modalities were available. This allowed for a more complete dataset and a holistic analysis of each subject.

2.1.2. Subject Entity

The Subjects table contains extensive, anonymized clinical data and risk factors, obtained through a questionnaire filled in by the subject. Each entry is uniquely identified by its subject_id. The specific attributes of this table are detailed in Table 6.

2.1.3. Image Entities

The Images entity stores metadata for all the acquired images and serves as the key link connecting the image files to the rich contextual information on lesions and subjects stored in the dataset. Each entry is uniquely identified by its image_id and is linked to a specific lesion via the lesion_id. It also includes the modality attribute, which specifies the type of image (clinical or dermoscopic). The attributes of the Images entity are detailed in Table 7.

2.1.4. Diagnostic Entities: Dermatology, Histopathology, and Unified Diagnosis

The dataset contains three distinct types of diagnosis:
(1) Dermatology Diagnosis: A diagnosis provided by a panel of dermatologists assigned to each lesion.
(2) Histopathology Diagnosis: A diagnosis derived from histopathology reports, available for a subset of 29 excised lesions (out of 240). This report also contains tumor thickness information when applicable.
(3) Unified Diagnosis: The definitive label for this dataset, derived by synthesizing the dermatology and histopathology diagnoses. The methodology for generating this label is detailed in the Methods section.
The attributes of the Dermatology, Histopathology, and Unified diagnosis entities are detailed in Table 8, Table 9, and Table 10, respectively. The Dermatology entity contains individual diagnoses from each expert, whereas the Unified entity holds the definitive final diagnosis for each lesion. With both expert and histopathology diagnoses available, two analyses can be performed: first, to calculate the diagnostic accuracy of the dermatologists for the 29 excised lesions; and second, to analyze the interobserver variability among the experts. The diagnostic accuracy derived from this subset of histologically proven lesions will be used to extrapolate the dermatologists’ expected performance on the larger set of non-confirmed lesions.
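A minimal sketch of both analyses is given below, assuming long-format tables with columns lesion_id, expert_id, and diagnosis; the file names are illustrative, and the actual column names in the released tables may differ:

```python
import pandas as pd
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

derm = pd.read_excel("dermatology_diagnosis.xlsx")      # lesion_id, expert_id, diagnosis
histo = pd.read_excel("histopathology_diagnosis.xlsx")  # lesion_id, diagnosis

# (1) Per-expert accuracy on the histologically confirmed lesions.
merged = derm.merge(histo, on="lesion_id", suffixes=("_expert", "_histo"))
merged["correct"] = merged["diagnosis_expert"] == merged["diagnosis_histo"]
print(merged.groupby("expert_id")["correct"].mean())

# (2) Pairwise interobserver agreement (Cohen's kappa) across all lesions.
votes = derm.pivot(index="lesion_id", columns="expert_id", values="diagnosis")
for a, b in combinations(votes.columns, 2):
    print(a, b, cohen_kappa_score(votes[a], votes[b]))
```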

3. Methods

3.1. Ethics Declaration

This study was conducted in accordance with the Declaration of Helsinki. The dataset was obtained in partnership with the dermatology and plastic surgery departments at UNN. The data collection campaign received approval from the Regional Committee for Medical and Health Research Ethics (North) (Ref.: 392439).

3.2. Participants and Selection Criteria

Eligibility criteria included subjects with skin lesions belonging to one of the following types (previously introduced in Table 2): NEV, ATY, SK, AK, ANG, DF, BCC, SCC, or MEL. These skin lesion types were selected to aid the development of AI-based algorithms that learn the key differences among benign, malignant, and premalignant lesions. To enable comparisons between lesions, at least two lesions (of any type) per subject were captured. This protocol allows future users to explore and account for intra-subject variability whenever two lesions of the same type are available for a given subject. The importance of capturing this variability is underscored by studies such as Rotemberg et al. [28], whose work demonstrated that accounting for this variability during evaluation improves model performance. The MCR-SL dataset’s design, which provides the necessary Subject IDs and Lesion IDs for creating subject-disjoint validation splits, directly enables realistic model evaluation without introducing patient-level data leakage. Given resource and time constraints during the collection phase, this strategy was also an efficient way to expand the dataset size while maintaining high data quality.
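For illustration, a subject-disjoint split can be derived directly from the provided identifiers; the sketch below uses scikit-learn’s GroupKFold, with illustrative file and column names:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

lesions = pd.read_excel("lesions.xlsx")  # must contain lesion_id and subject_id

# Grouping by subject_id guarantees that all lesions of a subject land in the
# same fold, so no subject appears in both training and validation data.
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(
        gkf.split(lesions, groups=lesions["subject_id"])):
    train_subjects = set(lesions.iloc[train_idx]["subject_id"])
    val_subjects = set(lesions.iloc[val_idx]["subject_id"])
    assert train_subjects.isdisjoint(val_subjects)  # subject-disjoint by design
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val lesions")
```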
Note that originally, only melanoma, BCC, or SCC were considered as eligible skin cancer lesions, but the inclusion criteria were extended during the data collection to decrease the probability of missing a relevant lesion for the study. The methodology for consolidating these diagnoses and establishing ground truth is detailed in Section 3.4.
Participants include patients and volunteers from the dermatology and plastic surgery departments at UNN. In this context, patients are individuals who sought medical care, while volunteers are individuals recruited specifically for the study who did not seek treatment. Both departments were asked to collaborate to increase the potential number of subjects and lesions in the final dataset, finding patients with at least one lesion fitting the eligibility criteria.
The recruitment process differed between the departments. In dermatology, dermatologists referred every subject after screening them. In plastic surgery, patients were scheduled to participate 45 min before their surgery. This approach prioritized subject convenience, as other methods would have required additional hospital appointments.
Additionally, a few subjects volunteered to participate to increase the number of benign lesions collected during the campaign. Images of as many lesions as possible were taken from all participants, including the lesions for which the subjects were referred and any incidental findings.

3.3. Data Acquisition Workflow

Data collection and diagnosis were performed as three distinct steps. First, a questionnaire was used to gather information about the subject’s demographic profile (e.g., age, sex) and skin cancer risk factors, while images of their lesions were acquired. Then, the histopathology reports for the excised lesions were collected through the plastic surgery department personnel. Finally, once the image collection ended, a panel of dermatologists was asked to diagnose one image per lesion. The image provided for each lesion was a randomly selected dermoscopic image, except for a few cases where none was available, in which case a clinical image was used instead. The workflow is illustrated in Figure 3. The consolidation of the diagnoses from dermatology and histopathology is explained in Section 3.4.
Image and subject metadata were collected on-site at UNN. Data collection was carried out by a researcher familiar with the appearance of clinical and dermoscopic images, though without formal training in their acquisition. Room illumination corresponded to a standard clinical setup. Clinical images were generally collected as close-up views of the lesions, although in a few cases focusing issues prevented the acquisition of optimal images. The equipment used for image collection consisted of a Xiaomi Redmi 9A smartphone (Xiaomi Corp., Beijing, China), equipped with a 13 MP rear camera (f/2.2 aperture, 28 mm focal length), and a DermLite HÜD 2 portable dermatoscope (3Gen Inc., San Juan Capistrano, CA, USA) providing ×10 magnification under polarized light. The dermoscope, with an outer diameter of approximately 59.2 mm and a lens diameter of 12.5 mm, was attached directly to the smartphone for dermoscopic acquisitions. Clinical images were captured at close range using ambient room illumination without flash.
The equipment was selected based on the specific goals of the WARIFA project. This decision was driven by two primary rationales: (1) alignment with deployment reality and (2) robustness testing. We determined that a mid-class smartphone reflects the typical consumer device and image quality that the deployed AI models will encounter. We acknowledge that higher-end imaging setups could have provided better image quality; however, our choice prioritized accessibility and realism over optimal resolution. The variability, resolution limits, and subtle artifacts introduced by a consumer-grade device (unlike those from a highly controlled clinical camera) create a realistic stress test necessary to evaluate the robustness and generalizability of AI models.
Assuming two lesions per subject, the total time required for carrying out the whole process was around 15–20 min. The steps followed are outlined below:
  • Informed consent: When the subject arrives in the room, they are informed about the study. Then, the subject is given the informed consent form to read and sign if they are willing to participate in the study (Figure 3a). Estimated time: 5 min.
  • Clinical data collection: If the informed consent form is signed, the subject is asked to fill out a questionnaire in situ, so the data collector can clarify any questions the subject may have if needed (Figure 3b). Estimated time: 10 min.
  • Clinical and dermoscopic image acquisition: A smartphone-based digital camera is used by the data collector for capturing the images with and without the dermoscope attached to the device (Figure 3c). Estimated time: 30 s per lesion.
  • Diameter measurement of the skin lesion: The lesion is measured by the data collector with a caliper gauge (Figure 3d). Estimated time: 20 s per lesion.
  • Data Storage: All acquired data are verified and stored in a secure, encrypted storage system (Figure 3e). Estimated time: 5 min per lesion.
Our strategy involved capturing multiple images of the same lesion (near duplicates). Even in curated datasets, there is significant variability in lighting and focus [18]. Our images are meant to reflect the variability caused by real-world data imperfections, which we assume introduce even deeper variability in lighting, motion blur, and lack of focus. To capture these differences, a mixed approach was used:
  • Real-World Baseline: Images were first acquired using the default automatic settings for all parameters. This captures the natural, heterogeneous noise expected from the average user of the WARIFA application.
  • User Manipulation Scenario: Subsequent images of the same lesion were taken by deliberately adjusting settings such as brightness/exposure and focus. Crucially, this adjustment was performed using the typical user interface (e.g., tap-to-focus or brightness sliders) without setting specific technical values for ISO or exposure time.
This combined approach allows the MCR-SL dataset to simultaneously model two critical challenges: the unpredictable variability of automatic settings and the effects of non-expert user adjustments. Rather than introducing an undesirable bias, this variability is a key feature that allows researchers to test and evaluate whether machine learning models are truly robust against changes in a user’s environment, device settings, and active manipulation. This strengthens model generalizability for clinical application, which is the primary goal of this dataset.
Regarding the subjects, note that they were guided through the questionnaire, but their answers were entirely their own. Participation was encouraged, but not mandatory, which resulted in missing values for some cases. The questionnaire was written in Norwegian to make it easier for subjects to understand; an English version is available in the Supplementary Material.
Special consideration was given to two subjects who were underage at the time of data collection. In these cases, the consent form was signed by their legal guardians. Furthermore, questions asking about characteristics at 18 years old were answered with their current status. Consequently, the answers for the current number of moles and the number of moles at 18 years old are the same for these subjects.
As mentioned, the diagnoses of the excised lesions were obtained from their histopathology reports (Figure 3f). To do that, all the recent reports for a given patient were collected, sorted, and translated into English to extract the relevant information: procedure, diagnosis, and tumor thickness (when applicable).
Once the data collection was completed, the dermatology diagnoses of the acquired images were performed remotely by a panel of four dermatologists (Figure 3g) who had high or very high levels of expertise (self-reported). They used a customized software interface developed for this purpose (Figure 4).
Using this software interface, the dermatologists could examine the image shown and provide a diagnosis for the lesion, alongside other variables. For each image, the dermatologists were asked to specify their certainty of diagnosis on a scale of 0%, 25%, 50%, 75%, and 100% (a 100% rating meant they were completely sure about their answer, while a 0% rating meant the opposite). They also provided a rating of image quality on a scale from 1 to 10, with 1 being the lowest and 10 the highest quality, and the software automatically recorded the time spent on each image. The dermatologists were encouraged to add comments about the lesion and were given the option to specify a lesion type if it was not available in the predefined class options. Finally, if they were unsure about their primary diagnosis, they were asked to provide a second diagnostic option.

3.4. Diagnosis Consolidation and Ground Truth Determination

The lesions were diagnosed in two ways: by dermatologists and, when possible, by histopathology.
First, the software interface and corresponding images were sent to a panel of four expert dermatologists, who independently provided a diagnosis, among other variables. Each expert diagnosed a single image per lesion, and the dermatology diagnosis was determined by taking the most frequent diagnosis from the four experts (i.e., majority voting). If no two experts provided the same diagnosis, the lesion was labeled as UNK. This happened with six lesions (around 2.5% of the total).
A tie-breaker criterion was implemented for the cases where two labels were proposed (Figure 5), based on the collective accuracy of the experts who proposed each label. This occurred for 26 out of 240 lesions (around 10.8% of the total). The experts were ranked according to their individual accuracy against the histopathology diagnoses available within the same dataset. For each tied diagnosis, the average accuracy of the proposing experts was calculated, and the diagnosis associated with the pair of experts with the highest average accuracy was selected as the final dermatology diagnosis. This approach ensured that the decision reflected the combined expertise of the agreeing parties rather than relying on a single individual. Specific accuracy values are reported in Figure 5, and additional details can be found in the dataset.
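For clarity, the consensus logic (majority vote, UNK fallback, and accuracy-based tie-breaker) can be sketched as follows; the expert identifiers and accuracy values in the example are illustrative, not the values reported in Figure 5:

```python
from collections import Counter

def dermatology_label(votes, expert_accuracy):
    """votes: {expert_id: diagnosis}; expert_accuracy: {expert_id: accuracy
    against histopathology}. Both dictionaries are illustrative placeholders."""
    counts = Counter(votes.values()).most_common()
    best_count = counts[0][1]
    tied = [dx for dx, n in counts if n == best_count]
    if best_count == 1:   # all four experts disagree -> unknown
        return "UNK"
    if len(tied) == 1:    # clear majority
        return tied[0]
    # Tie-breaker: select the diagnosis whose proposing experts have the
    # highest average accuracy against the histopathology diagnoses.
    def avg_accuracy(dx):
        experts = [e for e, d in votes.items() if d == dx]
        return sum(expert_accuracy[e] for e in experts) / len(experts)
    return max(tied, key=avg_accuracy)

# Example tie: two experts propose NEV, two propose ATY.
votes = {"E001": "NEV", "E002": "NEV", "E003": "ATY", "E004": "ATY"}
accuracy = {"E001": 0.62, "E002": 0.55, "E003": 0.71, "E004": 0.66}  # illustrative
print(dermatology_label(votes, accuracy))  # -> ATY (higher-average-accuracy pair)
```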
Histopathology is widely considered the definitive diagnosis because it provides a final, conclusive diagnosis based on the microscopic examination of tissue, rather than on visual patterns alone. However, it is well-documented that it is susceptible to interobserver variability [29]. Similarly, clinical and dermoscopic diagnoses are also subject to variability, as shown by studies on expert consensus [4], but a diagnosis reached by a consensus of expert dermatologists can still serve as a highly reliable benchmark [30].
The reality of public machine learning datasets is that the primary diagnostic label often comes from different sources (histopathology, single-image consensus, follow-up, or confirmation by in vivo confocal microscopy). That is the case for both BCN20000 and HAM10000, the latter with about 50% of lesions confirmed by histopathology [12,13]. The PAD-UFES-20 dataset relies primarily on histopathology. However, in the case of ISIC2020 [28], also widely known and used, the percentage drops to 14% for the training set and 7% for the test set [31].
In our case, we established a single, definitive ground truth for each lesion by combining the diagnosis from histopathology and dermatology (multi-expert panel consensus), which minimizes the risk of individual error. To create the unified diagnosis for our dataset, we adopted a hierarchical approach to diagnosis. For all lesions that were excised, the histopathology diagnosis was prioritized and used as the definitive gold standard (Figure 6). For the remaining lesions, which were not excised, the consensus diagnosis from the expert panel served as the ground truth. This systematic integration of expert consensus and histopathology provides a more robust and less ambiguous label for each image in the dataset than a diagnosis from a single expert.
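This hierarchy reduces to a simple rule, sketched below for clarity:

```python
def unified_diagnosis(histopathology, dermatology):
    """Histopathology, when available, overrides the panel consensus."""
    return histopathology if histopathology is not None else dermatology

print(unified_diagnosis("BCC", "NEV"))  # excised lesion -> histopathology wins
print(unified_diagnosis(None, "NEV"))   # non-excised lesion -> panel consensus
```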
Even though our diagnosis label approach is consistent with community standards, we acknowledge that one limitation of the MCR-SL dataset is the low number of confirmed cases by histopathology (29 out of 240, around 12%), which means that the majority of the labels rely on the panel consensus accuracy. Older studies have demonstrated that single dermatologists typically achieve high accuracy (e.g., >70% nearly two decades ago [32]), and modern research confirms that a majority vote consensus achieves a balanced improvement in both sensitivity and specificity [33], compared to individual experts’ performance. For these reasons, we believe that the multi-expert consensus, documented with the diagnosis-related variables, provides a reliable clinical ground truth relevant to testing diagnostic models.

3.5. Data Curation and Validation

To ensure the integrity and consistency of the dataset, several steps were taken during data collection and post-processing (Figure 7).
First, to protect subject privacy and comply with ethical guidelines, all identifying information, such as names, birth dates, and exact data collection dates, was removed. A relational-like database was established, allowing unique identifiers to be assigned to each subject, lesion, and image immediately after collection. These codes serve to identify and link data records while ensuring anonymity. Any free-text fields containing potentially identifying information were also removed to complete the anonymization process.
In the following subsections, the data curation process for the images, metadata, and, specifically, the experts’ feedback is presented.

3.5.1. Image Standardization and Curation

To homogenize the image data, most camera settings were standardized. While automatic white balancing and exposure were used by default to capture the first image of all the lesions, another image was also captured with manually adjusted brightness for comparison (a near duplicate of the first one). This process ensured consistent image capture across the dataset.
Following data collection, a visual inspection and curation process was performed. Images of poor, unrecoverable quality (e.g., highly unfocused or improperly framed images) were removed. However, we intentionally retained images of moderate to lower quality (e.g., slight blurriness or sub-optimal lighting) to allow for comparisons between higher and lower quality images and to study their effect on other variables, such as diagnosis accuracy.
Note that since no formal, automated quality analysis was performed, we do not provide a precise image count for a predefined “low quality” threshold, nor its exact numerical distribution across lesion types. Defining such a threshold remains interesting future work. However, we also believe that the most scientifically robust measure is the one provided by clinical experts themselves. We have provided the necessary data for researchers to conduct an inter-observer analysis on diagnosis-related variables, including the dermatologists’ image quality ratings and diagnostic certainty scores. This complete quality metadata also enables users to conduct such a quantitative analysis, which remains a potential research line for users of the MCR-SL dataset.
After the visual inspection, the class imbalance remains significant and represents an undeniable limitation for standard skin lesion image classification tasks. However, we believe the decision to retain all images, even those with limited representation, is essential because the dataset’s value is derived from its multimodality and rich contextual metadata, not solely from the size of the lesion classes. This is particularly true for rare, high-priority lesions like melanoma. The low count for melanoma directly reflects its low natural prevalence compared to benign lesions (e.g., nevi); retaining these highly valuable, confirmed cases is paramount, and they should not be discarded by any means.
All images, even those belonging to underrepresented types, provide crucial knowledge connected to the other variables. These images carry detailed information related to the experts’ diagnoses (including image rating, diagnosis certainty, and time spent diagnosing that particular image). Thus, even though we considered discarding these images, we concluded that the nuances and knowledge to be extracted from the interconnection between the various variables justify their inclusion in the dataset.
Ultimately, we believe providing the complete, unaltered dataset empowers future users. Researchers can choose to ignore or exclude these minority classes to suit their specific benchmarking needs. Conversely, removing the images would permanently eliminate the valuable multimodal context associated with them, hindering broader research applications.
The remaining images were then manually cropped to contain only the region of interest, excluding artifacts such as the frame of the dermoscopic lens or stray hairs (Figure 8). While this cropping does not necessarily reflect routine clinical practice, it is a standard step performed to ensure a clear judgment and help the AI model focus on the lesion instead of irrelevant artifacts [29,34]. To further standardize the image data, dermoscopic images were consistently cropped to a size of 1750 × 1750 pixels. This dimension was chosen as the maximum squared size that could be obtained while eliminating the dermoscope’s frame. Clinical images were cropped to 512 × 512, 1024 × 1024, or 1750 × 1750 pixels, depending on the lesion size.
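As a simplified illustration of this standardization step, the sketch below center-crops an image to one of the target sizes using Pillow; in the actual curation, crops were placed manually around the lesion rather than at the image center:

```python
from PIL import Image

def center_crop(path, size):
    """Center-crop to a size x size square; a simplification of the manual,
    lesion-centered cropping described above (assumes the source image is
    larger than the target size)."""
    img = Image.open(path)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

# Dermoscopic images were standardized to 1750 x 1750 pixels.
center_crop("example_dermoscopic.png", 1750).save("cropped.png")
```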

3.5.2. Metadata Validation and Consolidation

In parallel with image curation, the metadata was reviewed for consistency and completeness. Missing values were not imputed or removed from the dataset, leaving the choice of how to handle them to end users of the dataset. All categorical data, such as lesion locations and diagnoses, were standardized to a predefined list. For instance, we collapsed the original body locations into a more compact list of categories but maintained both fields to provide flexibility for future researchers. Data types were also validated to ensure, for example, that age was stored as a numerical value. Beyond these standard measures, the expert ratings on image quality, collected for a fraction of the dataset (29 out of 240 lesions), serve as a valuable, human-centric validation of the dataset’s visual integrity.

3.5.3. Experts’ Feedback

The MCR-SL dataset offers comprehensive raw and joined diagnostic data for every lesion, which is essential for evaluating model robustness against uncertain, realistic clinical inputs. This extensive metadata includes the individual diagnoses proposed by experts, their certainty levels, and the time spent diagnosing. By providing the raw individual diagnoses of all dermatologists who labeled the images, the dataset enables researchers to explore diagnostic uncertainty via expert votes and certainty scores, or to apply alternative consensus criteria by leveraging the raw voting data in place of the established tiebreaker. Furthermore, the dataset ensures complete transparency regarding the ground truth by clearly distinguishing between lesions confirmed by histopathology (the highest confidence level) and those confirmed by the dermatology diagnosis (multi-expert consensus). This transparent stratification allows researchers to utilize the data selectively, for example, by using only the cases confirmed by histopathology for critical validation tasks.
However, all image quality ratings from expert E002 were lost due to an unrecoverable technical error. Given the nature of this complete data loss, imputation was not feasible and was thus not attempted. This loss primarily serves as a limitation for future analyses of inter-observer variability for image quality ratings. Future users planning this specific analysis must note that the comparison pool for image quality is limited to the three remaining experts. Crucially, all other diagnostic metadata (e.g., final diagnosis, certainty score) provided by expert E002, and all the data related to the other experts, were unaffected by this loss and are fully retained in the dataset for all other forms of analysis.
Notably, the remaining experts’ responses varied concerning image quality: those who generally gave lower ratings to the images also tended to achieve lower accuracy in their diagnosis. This suggests that some experts may be less comfortable diagnosing when image quality is limited. However, further studies are needed to fully analyze this inter-observer variability and the impact of image quality on diagnostic confidence.

Supplementary Materials

The questionnaire used for the data collection can be downloaded at: https://www.mdpi.com/article/10.3390/data10100166/s1.

Author Contributions

Conceptualization, M.C.-F.; Methodology, M.C.-F., T.R.S., H.K., B.R.-Q. and I.C.-G.; Software, M.C.-F. and S.O.; Validation, M.C.-F., T.R.S., H.K., B.R.-Q. and I.C.-G.; Formal Analysis, M.C.-F.; Investigation, M.C.-F., T.R.S., H.K., B.R.-Q. and I.C.-G.; Resources, H.F. and G.M.C.; Data Curation, M.C.-F.; Writing—Original Draft Preparation, M.C.-F.; Writing—Review and Editing, T.R.S., H.K., B.R.-Q., I.C.-G., H.F., S.O., F.G. and G.M.C.; Visualization, M.C.-F.; Supervision, H.F. and G.M.C.; Project Administration, M.C.-F. and C.G.; Funding Acquisition, F.G., G.M.C., T.R.S. and C.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was completed while Maria Castro-Fernandez was a beneficiary of a predoctoral fellowship from the 2022 Ph.D. Training Program for Research Staff of the University of Las Palmas de Gran Canaria (ULPGC). The data collection was performed as part of the tasks in the Watching the Risk Factors (WARIFA) project. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017385. The labeling software used is an adaptation of the original version created during a research project supported by the IKT+ initiative, funded by the Research Council of Norway (grant no. 332901).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Regional Committee for Medical and Health Research Ethics (North) (Ref.: 392439) at UNN Hospital.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are publicly available at https://doi.org/10.5281/zenodo.17306338 (Uploaded on 10 October 2025).

Acknowledgments

The authors would like to thank the Dermatology and Plastic Surgery Departments at UNN Hospital for their invaluable assistance in facilitating subject recruitment and data collection. Authors also thank Pablo Hernández Morera for his valuable comments, which have helped to improve some of the descriptions presented in this manuscript. During the preparation of this manuscript, the authors used Gemini 2.5 Flash to polish a human-written text. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
CNN: Convolutional Neural Network
ViT: Vision Transformer
MCR-SL: Multimodal, Context-Rich Skin Lesion
UNN: University Hospital of North Norway
WARIFA: Watching the Risk Factors
NEV: Nevus
SK: Seborrheic Keratosis
BCC: Basal Cell Carcinoma
AK: Actinic Keratosis
ATY: Atypical Nevus
MEL: Melanoma
SCC: Squamous Cell Carcinoma
ANG: Angioma
DF: Dermatofibroma
UNK: Unknown
NM: Non-malignant
M: Malignant

References

  1. Wang, R.; Chen, Y.; Shao, X.; Chen, T.; Zhong, J.; Ou, Y.; Chen, J. Burden of Skin Cancer in Older Adults from 1990 to 2021 and Modelled Projection to 2050. JAMA Dermatol. 2025, 161, 715. [Google Scholar] [CrossRef]
  2. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks. Nature 2017, 542, 115–118, Erratum in Nature 2017, 546, 686. https://doi.org/10.1038/nature22985. [Google Scholar] [CrossRef]
  3. Brinker, T.J.; Hekler, A.; Enk, A.H.; Klode, J.; Hauschild, A.; Berking, C.; Schilling, B.; Haferkamp, S.; Schadendorf, D.; Holland-Letz, T.; et al. Deep Learning Outperformed 136 of 157 Dermatologists in a Head-to-Head Dermoscopic Melanoma Image Classification Task. Eur. J. Cancer 2019, 113, 47–54. [Google Scholar] [CrossRef] [PubMed]
  4. Haenssle, H.A.; Fink, C.; Schneiderbauer, R.; Toberer, F.; Buhl, T.; Blum, A.; Kalloo, A.; Ben Hadj Hassen, A.; Thomas, L.; Enk, A.; et al. Man against Machine: Diagnostic Performance of a Deep Learning Convolutional Neural Network for Dermoscopic Melanoma Recognition in Comparison to 58 Dermatologists. Ann. Oncol. 2018, 29, 1836–1842. [Google Scholar] [CrossRef] [PubMed]
  5. Ha, Q.; Liu, B.; Liu, F. Identifying Melanoma Images Using EfficientNet Ensemble: Winning Solution to the SIIM-ISIC Melanoma Classification Challenge. arXiv 2020, arXiv:2010.05351. [Google Scholar]
  6. Dascalu, A.; Walker, B.N.; Oron, Y.; David, E.O. Non-Melanoma Skin Cancer Diagnosis: A Comparison between Dermoscopic and Smartphone Images by Unified Visual and Sonification Deep Learning Algorithms. J. Cancer Res. Clin. Oncol. 2021, 148, 2497–2505. [Google Scholar] [CrossRef] [PubMed]
  7. Pacheco, A.G.C.; Krohling, R.A. The Impact of Patient Clinical Information on Automated Skin Cancer Detection. Comput. Biol. Med. 2020, 116, 103545. [Google Scholar] [CrossRef] [PubMed]
  8. Pacheco, A.G.C.; Krohling, R.A. An Attention-Based Mechanism to Combine Images and Metadata in Deep Learning Models Applied to Skin Cancer Classification. IEEE J. Biomed. Health Inf. 2021, 25, 3554–3563. [Google Scholar] [CrossRef]
  9. Castro-Fernandez, M.; Hernandez, A.; Fabelo, H.; Balea-Fernandez, F.J.; Ortega, S.; Callico, G.M. Towards Skin Cancer Self-Monitoring through an Optimized MobileNet with Coordinate Attention. In Proceedings of the 2022 25th Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August–2 September 2022; IEEE: New York, NY, USA, 2022; pp. 607–614. [Google Scholar]
  10. Nie, Y.; Sommella, P.; Carratù, M.; O’Nils, M.; Lundgren, J. A Deep CNN Transformer Hybrid Model for Skin Lesion Classification of Dermoscopic Images Using Focal Loss. Diagnostics 2022, 13, 72. [Google Scholar] [CrossRef] [PubMed]
  11. Gallazzi, M.; Biavaschi, S.; Bulgheroni, A.; Gatti, T.M.; Corchs, S.; Gallo, I. A Large Dataset to Enhance Skin Cancer Classification with Transformer-Based Deep Neural Networks. IEEE Access 2024, 12, 109544–109559. [Google Scholar] [CrossRef]
  12. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 Dataset, a Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions. Sci. Data 2018, 5, 180161. [Google Scholar] [CrossRef]
  13. Combalia, M.; Codella, N.C.F.; Rotemberg, V.; Helba, B.; Vilaplana, V.; Reiter, O.; Carrera, C.; Barreiro, A.; Halpern, A.C.; Puig, S.; et al. BCN20000: Dermoscopic Lesions in the Wild. arXiv 2019, arXiv:1908.02288. [Google Scholar] [CrossRef]
  14. Mendonca, T.; Ferreira, P.M.; Marques, J.S.; Marcal, A.R.S.; Rozeira, J. PH2—A Dermoscopic Image Database for Research and Benchmarking. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; IEEE: Osaka, Japan, 2013; pp. 5437–5440. [Google Scholar]
  15. Pacheco, A.G.C.; Lima, G.R.; Salomão, A.S.; Krohling, B.; Biral, I.P.; de Angelo, G.G.; Alves, F.C.R., Jr.; Esgario, J.G.M.; Simora, A.C.; Castro, P.B.C.; et al. PAD-UFES-20: A Skin Lesion Dataset Composed of Patient Data and Clinical Images Collected from Smartphones. Data Brief 2020, 32, 106221. [Google Scholar] [CrossRef]
  16. Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.E.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.; et al. Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC). arXiv 2019, arXiv:1902.03368. [Google Scholar] [CrossRef]
  17. Watching the Risk Factors: Artificial Intelligence and the Prevention of Chronic Conditions|WARIFA Project|Fact Sheet|H2020|CORDIS|European Commission. Available online: https://cordis.europa.eu/project/id/101017385/es (accessed on 27 October 2021).
  18. Petrie, T.C.; Larson, C.; Heath, M.; Samatham, R.; Davis, A.; Berry, E.G.; Leachman, S.A. Quantifying Acceptable Artefact Ranges for Dermatologic Classification Algorithms. Ski. Health Dis. 2021, 1, e19. [Google Scholar] [CrossRef]
  19. Yan, S.; Yu, Z.; Primiero, C.; Vico-Alonso, C.; Wang, Z.; Yang, L.; Tschandl, P.; Hu, M.; Ju, L.; Tan, G.; et al. A Multimodal Vision Foundation Model for Clinical Dermatology. Nat. Med. 2025, 31, 2691–2702. [Google Scholar] [CrossRef]
  20. Johansen, T.H.; Møllersen, K.; Ortega, S.; Fabelo, H.; Garcia, A.; Callico, G.M.; Godtliebsen, F. Recent Advances in Hyperspectral Imaging for Melanoma Detection. WIREs Comput. Stat. 2020, 12, e1465. [Google Scholar] [CrossRef]
  21. Leon, R.; Martinez-Vega, B.; Fabelo, H.; Ortega, S.; Melian, V.; Castaño, I.; Carretero, G.; Almeida, P.; Garcia, A.; Quevedo, E.; et al. Non-Invasive Skin Cancer Diagnosis Using Hyperspectral Imaging for In-Situ Clinical Support. J. Clin. Med. 2020, 9, 1662. [Google Scholar] [CrossRef]
  22. Aloupogianni, E.; Ishikawa, M.; Ichimura, T.; Sasaki, A.; Kobayashi, N.; Obi, T. Design of a Hyper-Spectral Imaging System for Gross Pathology of Pigmented Skin Lesions. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021; IEEE: New York, NY, USA, 2021; pp. 3605–3608. [Google Scholar]
  23. Hetz, M.J.; Garcia, C.N.; Haggenmüller, S.; Brinker, T.J. Advancing Dermatological Diagnosis: Development of a Hyperspectral Dermatoscope for Enhanced Skin Imaging. arXiv 2024, arXiv:2403.00612. [Google Scholar] [CrossRef]
  24. De Pascalis, A.; Perrot, J.L.; Tognetti, L.; Rubegni, P.; Cinotti, E. Review of Dermoscopy and Reflectance Confocal Microscopy Features of the Mucosal Melanoma. Diagnostics 2021, 11, 91. [Google Scholar] [CrossRef]
  25. Roth, B.; Kukk, A.F.; Wu, D.; Panzer, R.; Emmert, S. Four-Modal Device Comprising Optical Coherence Tomography, Photoacoustic Tomography, Ultrasound, and Raman Spectroscopy Developed for in Vivo Skin Lesion Assessment. Biomed. Opt. Express 2025, 16, 1792–1806. [Google Scholar] [CrossRef]
  26. Stridh, M.; Dahlstrand, U.; Naumovska, M.; Engelsberg, K.; Gesslein, B.; Sheikh, R.; Merdasa, A.; Malmsjö, M. Functional and Molecular 3D Mapping of Angiosarcoma Tumor Using Non-Invasive Laser Speckle, Hyperspectral, and Photoacoustic Imaging. Orbit 2024, 43, 453–463. [Google Scholar] [CrossRef]
  27. Wu, D.; Fedorov Kukk, A.; Panzer, R.; Emmert, S.; Roth, B. In Vivo Differentiation of Cutaneous Melanoma from Benign Nevi with Dual–Modal System of Optical Coherence Tomography and Raman Spectroscopy. J. Biophotonics 2025, 18, e70040. [Google Scholar] [CrossRef] [PubMed]
  28. Rotemberg, V.; Kurtansky, N.; Betz-Stablein, B.; Caffery, L.; Chousakos, E.; Codella, N.; Combalia, M.; Dusza, S.; Guitera, P.; Gutman, D.; et al. A Patient-Centric Dataset of Images and Metadata for Identifying Melanomas Using Clinical Context. Sci. Data 2021, 8, 34. [Google Scholar] [CrossRef] [PubMed]
  29. Daneshjou, R.; Barata, C.; Betz-Stablein, B.; Celebi, M.E.; Codella, N.; Combalia, M.; Guitera, P.; Gutman, D.; Halpern, A.; Helba, B.; et al. Checklist for Evaluation of Image-Based Artificial Intelligence Reports in Dermatology: CLEAR Derm Consensus Guidelines from the International Skin Imaging Collaboration Artificial Intelligence Working Group. JAMA Dermatol. 2022, 158, 90–96. [Google Scholar] [CrossRef] [PubMed]
  30. Bourkas, A.N.; Barone, N.; Bourkas, M.E.C.; Mannarino, M.; Fraser, R.D.J.; Lorincz, A.; Wang, S.C.; Ramirez-Garcialuna, J.L. Diagnostic Reliability in Teledermatology: A Systematic Review and a Meta-Analysis. BMJ Open 2023, 13, e068207. [Google Scholar] [CrossRef] [PubMed]
  31. ISIC Archive. ISIC 2020: Training Data. Available online: https://gallery.isic-archive.com/#!/topWithHeader/onlyHeaderTop/gallery?filter=%5B%22collections%7C70%22%5D (accessed on 2 October 2025).
  32. Tran, H.; Chen, K.; Lim, A.C.; Jabbour, J.; Shumack, S. Assessing Diagnostic Skill in Dermatology: A Comparison between General Practitioners and Dermatologists. Australas. J. Dermatol. 2005, 46, 230–234. [Google Scholar] [CrossRef] [PubMed]
  33. Nervil, G.G.; Ternov, N.K.; Lorentzen, H.; Kromann, C.; Ingvar, Å.; Nielsen, K.; Tolsgaard, M.; Vestergaard, T.; Hölmich, L.R. Teledermoscopic Triage of Melanoma-Suspicious Skin Lesions Is Safe: A Retrospective Comparative Diagnostic Accuracy Study with Multiple Assessors. J. Telemed. Telecare 2025, 31, 1296–1307. [Google Scholar] [CrossRef]
  34. Barata, C.; Celebi, M.E.; Marques, J.S. A Survey of Feature Extraction in Dermoscopy Image Analysis of Skin Cancer. IEEE J. Biomed. Health Inf. 2019, 23, 1096–1109. [Google Scholar] [CrossRef]
Figure 1. (a) A clinical and (b) a dermoscopic image of the same lesion (L0098).
Figure 2. Entity-Relationship Diagram: This diagram visually represents the relationships between the dataset’s core entities. Attributes are not included in the figure for the sake of simplicity.
Figure 3. Workflow of the data acquisition process. (a) The subject signs the informed consent form; (b) Clinical data are collected via a questionnaire completed by the subject; (c) Clinical and dermoscopic images are acquired using a smartphone and a portable dermoscope; (d) The diameter of the skin lesion is measured with a caliper; (e) After the subject interview, data are verified and transferred from the smartphone to be stored in a secure, encrypted system; (f) Histopathology diagnoses are obtained for the excised lesions; (g) Dermatology diagnoses are collected from a panel of four expert dermatologists (E1–E4). Note that the dataset labels (final diagnoses) are derived from the combination of both diagnoses (f,g).
Figure 4. View of the software interface, showing a list of potential diagnoses together with fields for rating the certainty of the diagnosis and the image quality, and for adding a comment about the image or the lesion.
Figure 5. Tie-breaker criterion for the dermatology diagnosis, explained with an example.
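To make the panel-aggregation step concrete, the sketch below shows one way such a consensus could be computed. It is illustrative only: it assumes, hypothetically, that a simple majority vote is taken over the four experts' diagnoses and that ties are broken by the experts' stated certainty; the actual criterion is the one depicted in Figure 5.

```python
from collections import Counter

# Hypothetical illustration of aggregating the four experts' diagnoses
# for one lesion. Each entry is (diagnosis, certainty in {0,25,50,75,100}).
panel = [("NEV", 75), ("NEV", 50), ("SK", 100), ("SK", 75)]

def consensus(panel):
    votes = Counter(dx for dx, _ in panel)
    top_count = max(votes.values())
    tied = [dx for dx, n in votes.items() if n == top_count]
    if len(tied) == 1:
        return tied[0]
    # Assumed tie-breaker (for illustration only): prefer the tied
    # diagnosis with the highest summed certainty across its proposers.
    return max(tied, key=lambda dx: sum(c for d, c in panel if d == dx))

print(consensus(panel))  # SK: tied 2-2 with NEV, but higher total certainty
```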
Figure 6. Unified diagnosis: if the lesion has a histopathology diagnosis, that diagnosis is the gold standard; otherwise, the diagnosis given by the panel of dermatologists is used.
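The rule in Figure 6 reduces to a single conditional. A minimal sketch, using the field names from Table 10 (the function itself is illustrative, not part of the dataset tooling):

```python
def unified_diagnosis(histopathology_diagnosis, dermatology_diagnosis):
    """Final label per Figure 6: histopathology, when available, is the
    gold standard; otherwise the dermatology panel's diagnosis is used."""
    if histopathology_diagnosis is not None:
        return histopathology_diagnosis
    return dermatology_diagnosis

assert unified_diagnosis("MEL", "ATY") == "MEL"  # excised lesion
assert unified_diagnosis(None, "SK") == "SK"     # panel consensus only
```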
Figure 7. Data curation and validation workflow. The flowchart details the four main stages of data preparation. Data collection involves acquiring images and associated clinical metadata. The data then undergoes Anonymization, where identifying information is removed (✕) and replaced with unique subject and lesion IDs (✓). The subsequent stage is Data Curation, which comprises two parallel processes: Image Curation & Standardization and Metadata Validation. The former involves rejecting unsuitable images (thumbs down symbol) and standardizing accepted images by cropping them (indicated by the red frame) to isolate the lesion. Simultaneously, metadata validation focuses on consistency checks, standardizing categorical data, and handling missing values in the clinical data. Finally, all refined data feeds into Diagnosis Consolidation for the final expert assessment.
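As an illustration of the metadata-validation stage in Figure 7, the sketch below runs a consistency check, standardizes a categorical column, and makes missing values explicit. The file name is an assumption; the column names follow Tables 3 and 5.

```python
import pandas as pd

# Illustrative sketch of the metadata-validation step in Figure 7.
# The file name "lesions.csv" is an assumption, not the dataset's layout.
lesions = pd.read_csv("lesions.csv")

# Consistency check: recorded diameters must be positive where present.
bad = lesions[lesions["diameter"].notna() & (lesions["diameter"] <= 0)]
assert bad.empty, f"{len(bad)} lesions have non-positive diameters"

# Standardize categorical values, then fill missing values explicitly.
lesions["location_group"] = lesions["location_group"].str.strip().str.capitalize()
lesions["location_group"] = lesions["location_group"].fillna("unknown")
```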
Figure 8. An example of before (a) and after (b) cropping one of the collected images. The red frame indicates the cropping area.
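The standardization step in Figure 8 is a plain bounding-box crop. A minimal sketch with Pillow, where the file name and the box coordinates are placeholders rather than the actual values used:

```python
from PIL import Image

# Sketch of the crop in Figure 8. The file name and the
# (left, upper, right, lower) box are placeholders, not real values.
image = Image.open("L0098_clinical.jpg")
cropped = image.crop((420, 310, 1180, 1070))  # the red frame in Figure 8a
cropped.save("L0098_clinical_cropped.jpg")
```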
Table 1. Comparison of characteristics for some public skin lesion datasets and the MCR-SL dataset.
| Dataset Name | # Images | Classes Included | Image Modality | Gold Standard | Fields with IDs | Subject's Data | Lesion Data | Diagnosis Variables |
|---|---|---|---|---|---|---|---|---|
| PH2 | 200 | NEV, MEL, ATY | Dermoscopic | Mixed (Histology, Expert Consensus) | Image | - | Dermoscopic criteria | - |
| HAM10000 | 10,015 | NEV, MEL, BCC, SK, AK, ANG, DF | Dermoscopic | Mixed (Histology, Follow-up, Confocal, Expert Consensus) | Lesion, Image | Age, sex | Body location | Verification type (dx_type) |
| BCN20000 | 19,424 | NEV, MEL, BCC, SCC, SK, AK, ANG, DF, other | Dermoscopic | Mixed (Histology, Expert Consensus) | Subject, Lesion, Image | Age, sex | Body location | - |
| PAD-UFES-20 | 2298 | NEV, MEL, BCC, SCC, SK, AK | Clinical | Mixed (100% Biopsy for cancers; Expert Consensus for others) | Subject, Lesion, Image | Age, sex, skin cancer risk factors, others | Body location, lesion diameter, others | - |
| MCR-SL | 779; 1352 | NEV, SK, BCC, AK, ATY, MEL, ANG, DF, UNK | Clinical, Dermoscopic | Mixed (Histology, Expert Consensus) | Subject, Lesion, Image | Age, sex, skin cancer risk factors, others | Body location, lesion diameter, others | Certainty, image quality, time |
Table 2. Lesion distribution by unified and histopathological diagnoses. The numbers in parentheses represent the proportion of subjects and lesions out of the total subjects and lesions in the dataset.
| Lesion Type | Malignancy | Histopathology: Subjects | Histopathology: Lesions | Dermatologists: Subjects | Dermatologists: Lesions |
|---|---|---|---|---|---|
| BCC | Malignant | 18 (30.0%) | 20 (8.3%) | 18 (30.0%) | 26 (10.8%) |
| MEL | Malignant | 3 (5.0%) | 3 (1.3%) | 7 (11.7%) | 8 (3.3%) |
| SCC | Malignant | 0 (0.0%) | 0 (0.0%) | 5 (8.3%) | 5 (2.1%) |
| NEV | Non-Malignant | 3 (5.0%) | 3 (1.3%) | 37 (61.7%) | 85 (35.4%) |
| SK | Non-Malignant | 1 (1.7%) | 1 (0.4%) | 34 (56.6%) | 84 (35.0%) |
| AK | Non-Malignant | 0 (0.0%) | 0 (0.0%) | 10 (16.7%) | 12 (5.0%) |
| ATY | Non-Malignant | 2 (3.3%) | 2 (0.8%) | 6 (10.0%) | 7 (2.9%) |
| ANG | Non-Malignant | 0 (0.0%) | 0 (0.0%) | 2 (3.3%) | 4 (1.7%) |
| DF | Non-Malignant | 0 (0.0%) | 0 (0.0%) | 2 (3.3%) | 2 (0.8%) |
| UNK | - | 0 (0.0%) | 0 (0.0%) | 6 (10.0%) | 7 (2.9%) |
| Total | | 27 (45.0%) | 29 (12.1%) | 60 (100.0%) | 240 (100.0%) |
Table 3. Distribution of demographic and clinical attributes in the dataset (related to Lesions), including counts and percentages for each category, stratified by lesion malignancy.
| Attribute [Non-Missing/Total] | Values | # | % | # M | % M | # UNK | % UNK | p-Value |
|---|---|---|---|---|---|---|---|---|
| Diameter (mm) [238/240] | 1.235–12.083 | 199 | 83% | 23 | 12% | 5 | 3% | 0.0030 |
| | 12.083–22.867 | 34 | 14% | 17 | 50% | 0 | 0% | |
| | 22.867–33.65 | 2 | 1% | 0 | 0% | 0 | 0% | |
| | 33.65–44.433 | 1 | 0% | 0 | 0% | 0 | 0% | |
| | 55.217–66.0 | 1 | 0% | 1 | 100% | 0 | 0% | |
| Location group [232/240] | Back | 99 | 41% | 11 | 11% | 3 | 3% | 0.0005 |
| | Arms | 45 | 19% | 2 | 4% | 1 | 2% | |
| | Face | 41 | 17% | 16 | 39% | 1 | 2% | |
| | Torso | 31 | 13% | 10 | 32% | 0 | 0% | |
| | Legs | 12 | 5% | 2 | 17% | 1 | 8% | |
| | Head | 4 | 2% | 1 | 25% | 0 | 0% | |
| | unknown | 8 | 3% | 0 | 0% | 0 | 0% | |
| Lesion status when captured [240/240] | Lesion | 235 | 98% | 38 | 16% | 6 | 3% | 0.0047 |
| | Biopsied lesion | 5 | 2% | 4 | 80% | 0 | 0% | |
| Referral diagnosis [240/240] | Voluntary sample | 197 | 82% | 16 | 8% | 6 | 3% | 0.0000 |
| | BCC | 25 | 10% | 22 | 100% | 0 | 0% | |
| | SK | 7 | 3% | 0 | 0% | 0 | 0% | |
| | MEL | 5 | 2% | 3 | 60% | 0 | 0% | |
| | NEV | 5 | 2% | 0 | 0% | 0 | 0% | |
| | Morbus Bowen carcinoma | 1 | 0% | 1 | 100% | 0 | 0% | |
| Malignancy | Non-malignant | 192 | 80% | | | | | |
| | Malignant | 42 | 18% | | | | | |
| | unknown | 6 | 2% | | | | | |

M: malignant lesions; UNK: lesions of unknown malignancy. The % M and % UNK values are within-row shares.
Table 4. Distribution of demographic and clinical attributes in the dataset (related to Subjects), including counts and percentages for each category, stratified by lesion malignancy.
| Attribute [Non-Missing/Total] | Values | # | % | # M | % M | # NM | % NM | p-Value |
|---|---|---|---|---|---|---|---|---|
| Age [59/60] | 14.9–40.7 | 8 | 13% | 0 | 0% | 8 | 100% | 0.582 |
| | 40.7–66.3 | 23 | 38% | 13 | 57% | 10 | 43% | |
| | 66.3–92.0 | 29 | 48% | 19 | 66% | 10 | 34% | |
| Sex [60/60] | Female | 33 | 55% | 12 | 36% | 21 | 64% | 0.008 |
| | Male | 27 | 45% | 20 | 74% | 7 | 26% | |
| Height (cm) [59/60] | 145.9–162.3 | 14 | 23% | 4 | 29% | 10 | 71% | 0.053 |
| | 162.3–178.7 | 27 | 45% | 17 | 63% | 10 | 37% | |
| | 178.7–195.0 | 19 | 32% | 11 | 58% | 8 | 42% | |
| Weight (kg) [59/60] | 38.9–66.0 | 19 | 32% | 6 | 32% | 13 | 68% | 0.496 |
| | 66.0–93.0 | 32 | 53% | 20 | 62% | 13 | 41% | |
| | 93.0–120.0 | 9 | 15% | 6 | 67% | 4 | 44% | |
| Natural hair color (≤18 years old) [60/60] | Brown | 25 | 42% | 12 | 48% | 13 | 52% | 0.382 |
| | Fair blonde | 19 | 32% | 10 | 53% | 9 | 47% | |
| | Dark brown, black | 12 | 20% | 9 | 75% | 3 | 25% | |
| | Red or auburn | 3 | 5% | 1 | 33% | 2 | 67% | |
| | Blonde | 1 | 2% | 0 | 0% | 1 | 100% | |
| Skin reaction to sun exposure [60/60] | Red | 29 | 48% | 16 | 55% | 13 | 45% | 0.844 |
| | Brown without 1st becoming red | 22 | 37% | 12 | 55% | 10 | 45% | |
| | Red with pain | 9 | 15% | 4 | 44% | 5 | 56% | |
| Number of moles (≤18 years old) [53/60] | Few | 21 | 35% | 14 | 67% | 7 | 33% | 0.065 |
| | Some | 18 | 30% | 5 | 28% | 13 | 72% | |
| | Many | 14 | 23% | 8 | 57% | 6 | 43% | |
| | Unknown | 7 | 12% | 5 | 71% | 2 | 29% | |
| Moles > 5 mm [55/60] | Yes | 30 | 50% | 14 | 47% | 16 | 53% | 0.361 |
| | No | 25 | 42% | 16 | 64% | 9 | 36% | |
| | Unknown | 5 | 8% | 2 | 40% | 3 | 60% | |
| Moles > 20 cm [60/60] | No | 60 | 100% | 32 | 53% | 28 | 47% | 1.000 |
| Number of moles (now) [53/60] | Some | 24 | 40% | 9 | 38% | 15 | 62% | 0.133 |
| | Few | 22 | 37% | 15 | 68% | 7 | 32% | |
| | Many | 7 | 12% | 3 | 43% | 4 | 57% | |
| | Unknown | 7 | 12% | 5 | 71% | 2 | 29% | |
| Number of severe sunburns [52/60] | 0 | 28 | 47% | 14 | 50% | 14 | 50% | 0.617 |
| | 1–2 | 13 | 22% | 7 | 54% | 6 | 46% | |
| | 3–5 | 8 | 13% | 3 | 38% | 5 | 62% | |
| | >5 | 3 | 5% | 2 | 67% | 1 | 33% | |
| | Unknown | 8 | 13% | 6 | 75% | 2 | 25% | |
| Sunbed use [58/60] | No | 54 | 90% | 29 | 54% | 25 | 46% | 0.218 |
| | Yes | 4 | 7% | 1 | 25% | 3 | 75% | |
| | Unknown | 2 | 3% | 2 | 100% | 0 | 0% | |
| History of cancer [60/60] | No | 39 | 65% | 17 | 44% | 22 | 56% | 0.073 |
| | Yes | 21 | 35% | 15 | 71% | 6 | 29% | |
| History of skin cancer [56/60] | No | 41 | 68% | 19 | 46% | 22 | 54% | 0.102 |
| | Yes | 15 | 25% | 9 | 60% | 6 | 40% | |
| | Unknown | 4 | 7% | 4 | 100% | 0 | 0% | |
| History of skin cancer (close relatives) [60/60] | No | 50 | 83% | 25 | 50% | 25 | 50% | 0.418 |
| | Yes | 10 | 17% | 7 | 70% | 3 | 30% | |
| Organ transplant [59/60] | No | 57 | 95% | 30 | 53% | 27 | 47% | 0.234 |
| | Yes | 2 | 3% | 2 | 100% | 0 | 0% | |
| | Unknown | 1 | 2% | 0 | 0% | 1 | 100% | |
| Immunosuppression [59/60] | No | 54 | 90% | 30 | 56% | 24 | 44% | 0.448 |
| | Yes | 5 | 8% | 2 | 40% | 3 | 60% | |
| | Unknown | 1 | 2% | 0 | 0% | 1 | 100% | |
| Patients derived from [60/60] | Plastic surgery | 35 | 58% | 20 | 57% | 15 | 43% | 0.040 |
| | Dermatology | 17 | 28% | 11 | 65% | 6 | 35% | |
| | Volunteer | 8 | 13% | 1 | 12% | 7 | 88% | |
| Subjects with known malignant lesions | Yes | 32 | 53% | | | | | |
| | No | 28 | 47% | | | | | |

M: subjects with at least one known malignant lesion; NM: subjects without a known malignant lesion. The % M and % NM values are within-row shares.
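The per-attribute p-values in Tables 3 and 4 test the association between each attribute and malignancy. The test itself is not restated here, so the sketch below assumes a chi-squared test of independence for illustration; with the Sex counts from Table 4 it yields a p-value of about 0.008, consistent with the table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Assumed chi-squared test of independence (for illustration only).
# Counts are the Sex rows of Table 4:
# rows: Female, Male; columns: malignant, non-malignant.
table = np.array([[12, 21],
                  [20,  7]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # p = 0.008, matching Table 4
```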
Table 5. Attributes of the Lesion entity.
| Attribute | Data Type | Description |
|---|---|---|
| lesion_id | string | A unique identifier for the lesion. |
| referral_diagnosis | text | The initial diagnosis provided during the subject's referral. |
| lesion_status_when_captured | categorical | The status of the lesion at the time of imaging. |
| location | categorical | The anatomical location of the lesion on the subject's body. |
| location_group | categorical | A broader classification of the lesion's location. |
| diameter | numerical | The measured diameter of the lesion in millimeters. |
| malignancy | categorical | The malignancy status of the lesion (i.e., malignant, non-malignant). |
| lesion_diagnosis | text | The unified diagnosis assigned to the lesion. |
| diagnosis_image_id | string | The unique identifier of the specific image used by the dermatologists to make their diagnoses. |
Table 6. Attributes of the Subject entity.
| Attribute | Data Type | Description |
|---|---|---|
| subject_id | string | A unique identifier for the subject. |
| derived_from | categorical | The hospital department from which the subject was referred. |
| year_of_birth | integer | The subject's year of birth. |
| age | integer | The subject's age. |
| sex | categorical | The subject's sex. |
| height | numerical | Subject height in centimeters. |
| weight | numerical | Subject weight in kilograms. |
| natural_hair_color | categorical | The subject's natural hair color at 18 years old. |
| skin_reaction_to_sun | categorical | How the subject's skin reacts to sun exposure without sun protection. |
| number_of_moles | integer | The total number of moles on the subject at 18 years old. |
| moles_bigger_5mm | integer | Current number of moles larger than 5 mm. |
| moles_bigger_20cm | integer | Current number of moles larger than 20 cm. |
| moles_body | integer | Current number of moles on the body. |
| sunburn_number | integer | The number of severe sunburns the subject has experienced. |
| sunburn_age | text | The age at which the subject experienced severe sunburns. |
| sunburn_number_group | categorical | A categorized group for the number of sunburns. |
| sunbed | boolean | Whether the subject has used a sunbed. |
| h_cancer | boolean | History of hereditary cancer. |
| h_skin_cancer | boolean | History of hereditary skin cancer. |
| h_skin_cancer_relatives | boolean | History of skin cancer in close relatives. |
| organ_transplant | boolean | Whether the subject has had an organ transplant. |
| immunosuppresion | boolean | Whether the subject is on immunosuppressive medication. |
Table 7. Attributes of the Image entity.
| Attribute | Data Type | Description |
|---|---|---|
| image_id | string | A unique identifier for each image. |
| lesion_id | string | A unique identifier for the lesion depicted in the image. |
| modality | categorical | The modality of the image (clinical or dermoscopic). |
Table 8. Attributes of the Dermatology diagnosis entity.
| Attribute | Data Type | Description |
|---|---|---|
| diagnosis_id | string | A unique identifier for each diagnosis. |
| lesion_id | string | The identifier of the lesion the diagnosis refers to. |
| image_id | string | The identifier of the image that was diagnosed. |
| expert_id | string | The identifier of the dermatologist who provided the diagnosis. |
| diagnosis | string | The primary diagnosis provided by the expert (e.g., NEV, MEL). |
| 2nd_option | string | An optional second choice or differential diagnosis. |
| certainty | categorical | The expert's confidence in the diagnosis; possible values are 0%, 25%, 50%, 75%, and 100%. |
| image_rating | integer | The expert's rating of the image quality, ranging from 1 to 10. |
| time | datetime | The time taken by the expert to provide the diagnosis. |
Table 9. Attributes of the Histopathology diagnosis entity.
| Attribute | Data Type | Description |
|---|---|---|
| diagnosis_id | string | A unique identifier for each histopathology diagnosis. |
| lesion_id | string | The identifier of the lesion the diagnosis refers to. |
| procedure | string | The type of procedure described in the report (e.g., biopsy, excision). |
| tumor_thickness | float | The Breslow thickness of the tumor, if applicable. |
| diagnosis | string | The final diagnosis from the histopathology report (e.g., NEV, MEL). |
Table 10. Attributes of the Unified diagnosis entity.
| Attribute | Data Type | Description |
|---|---|---|
| diagnosis_id | string | A unique identifier for the unified diagnosis. |
| lesion_id | string | The identifier of the lesion the diagnosis refers to. |
| dermatology_diagnosis | string | The final diagnosis selected by the dermatology experts. |
| histopathology_diagnosis | string | The diagnosis from the histopathology report, used as the ground truth when available. |
| diagnosis_id_histopath | string | The unique identifier of the histopathological diagnosis of the lesion. |
| unified_diagnosis | string | The final ground truth diagnosis for the lesion. |
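Tables 5–10 define the join keys, so assembling a per-image analysis table is a matter of merging the entities on lesion_id (and subject_id where needed). A minimal sketch, assuming the entities ship as CSV files with these hypothetical names:

```python
import pandas as pd

# Illustrative assembly of a per-image analysis table from the entities in
# Tables 5-10. The CSV file names are assumptions; the join key, lesion_id,
# is the identifier defined in the tables.
images = pd.read_csv("images.csv")              # Table 7
lesions = pd.read_csv("lesions.csv")            # Table 5
unified = pd.read_csv("unified_diagnoses.csv")  # Table 10

df = (
    images
    .merge(lesions, on="lesion_id", how="left")
    .merge(unified[["lesion_id", "unified_diagnosis"]], on="lesion_id", how="left")
)
print(df[["image_id", "modality", "unified_diagnosis"]].head())
```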