1. Introduction
Skin cancer remains one of the most prevalent malignancies worldwide, with both melanoma and keratinocyte skin cancers showing a sustained increase, particularly among fair-skinned populations [1,2]. Early detection is essential, as timely diagnosis is strongly associated with improved treatment outcomes and prognosis, especially in melanoma [1,2].
Dermoscopy, a noninvasive imaging technique, has been extensively used to enhance the visual assessment of skin lesions and support clinical decision-making [3]. The availability of large, publicly accessible dermoscopic image datasets has further accelerated research in automated skin lesion analysis [4,5], enabling the development of data-driven models with high predictive performance [6,7]. Convolutional neural networks (CNNs) play a central role by leveraging the spatial structure of dermoscopic images for predictive tasks [8]. As a result, most recent advances in automated skin lesion assessment have focused on image-based deep learning (DL) pipelines, often relying almost exclusively on dermoscopic images [9].
However, clinical decision-making in dermatology does not rely exclusively on images. In addition to dermoscopic patterns, clinicians consider patient-level clinical metadata, such as age, sex, and anatomical location, as well as lesion-level descriptors derived from image analysis, including asymmetry, border irregularity, color distribution, and texture-related characteristics [10]. While clinical metadata typically give rise to low-dimensional tabular representations, lesion-level descriptors extracted from dermoscopic images are often summarized through high-dimensional handcrafted or statistically derived feature sets [11,12]. Together, these complementary sources of tabular information capture clinically meaningful aspects of skin lesions. Despite this clinical relevance, such tabular information remains underutilized in modern DL-based dermatological pipelines [9,13], mainly because of the challenges that tabular dermatological data pose for DL models: heterogeneity, absence of spatial structure, and low dimensionality. As a result, standard DL architectures—originally designed to model spatial or temporal correlations—often struggle to learn effectively from tabular representations [14,15,16].
Recent advances in medical image analysis have also emphasized robustness and adaptability to heterogeneous and ambiguous data sources. Beyond purely discriminative CNN pipelines, generative and multimodal learning frameworks have been proposed to better handle uncertainty and complementary information across modalities [17]. These developments highlight the importance of adaptive representation learning for structured and heterogeneous biomedical data. In this context, transforming tabular clinical features into spatially organized representations can be viewed as an alternative strategy to bridge tabular data with convolutional architectures while preserving clinically meaningful relationships.
Motivated by this need, recent work has explored transforming tabular data into image-like representations, thereby enabling the application of CNN-based models to non-visual data [18,19,20,21,22,23]. These approaches aim to impose a spatial structure on tabular features, allowing convolutional architectures to exploit local patterns and relationships. However, such methods have rarely been investigated in the context of dermatological data [9], where heterogeneous clinical metadata and lesion-level descriptors are routinely available and clinically meaningful.
Clinical metadata are frequently combined with dermoscopic images in AI-based dermatology studies using multimodal learning strategies such as late fusion, feature concatenation, or attention-based modules [7,13,24,25]. However, in most existing approaches, tabular information is processed separately from image data, typically using multilayer perceptrons or other dedicated tabular-processing modules, without an explicit spatial organization of structured variables [24,26,27]. This limitation reflects a fundamental representation challenge: although tabular clinical features encode clinically meaningful relationships, their non-spatial format prevents convolutional architectures from exploiting the spatial dependencies they are designed to capture. Tabular-to-image transformation addresses this gap by projecting structured clinical data into a domain more suitable for convolutional processing. While this paradigm has gained increasing attention in other biomedical domains, including omics, radiomics, and electronic health records [20,22,28], its application to dermatological tabular data remains largely unexplored.
Within this line of research, the Low Mixed-Image Generator for Tabular Data (LM-IGTD) framework [29] introduced a permutation-based strategy to map numerical and categorical features onto two-dimensional images according to their statistical relationships. In addition, LM-IGTD incorporates a noise-based feature augmentation mechanism designed to improve representations in low-dimensional settings, making it particularly suitable for the clinical data scenarios commonly encountered in biomedical applications.
To the best of our knowledge, the application of tabular-to-image transformation frameworks to dermatological data has not yet been investigated. In this context, the present work is conceived as a proof-of-concept (PoC) study exploring the feasibility of LM-IGTD-based tabular-to-image representations for dermatological clinical data. Specifically, this study considers different sources of tabular information derived from dermoscopic datasets, including patient-level clinical metadata, lesion-level statistical features extracted from images, and their joint representation, in order to assess their suitability for CNN-based melanoma classification.
The remainder of this paper is organized as follows. Section 2 presents the publicly available PH2 and Derm7pt datasets, which provide dermoscopic images together with clinical metadata, as well as the feature extraction process and the LM-IGTD transformation framework. Experimental results are reported in Section 3, followed by a discussion in Section 4 and concluding remarks in Section 5.
2. Materials and Methods
This section describes the datasets, feature extraction procedures, and tabular-to-image transformation framework employed in this study. An overview of the complete methodology is presented in Figure 1, summarizing the main processing stages of the proposed pipeline, from tabular data construction to CNN-based melanoma classification.
First, patient-level clinical metadata and lesion-level statistical features are extracted from the PH2 and Derm7pt datasets and represented as tabular data. These tabular representations are subsequently analyzed either independently or jointly, depending on the experimental setting. Next, the tabular representations are transformed into two-dimensional grayscale images using the LM-IGTD framework. This process incorporates feature ranking, pixel arrangement optimization, and noise-based feature augmentation when required. Finally, the generated images are used as input to CNN-based models for melanoma classification.
2.1. Dataset Description
This study uses two publicly available dermoscopy datasets, PH2 [30] and Derm7pt [31], both of which provide dermoscopic images together with clinical metadata.
The PH2 dataset consists of 200 dermoscopic images acquired at the Dermatology Service of Hospital Pedro Hispano (Matosinhos, Portugal), including 160 benign lesions—80 common nevi and 80 atypical nevi—and 40 melanomas [30]. In this study, common and atypical nevi are grouped into a single not melanoma class, while all melanoma cases are assigned to the melanoma class, resulting in a binary classification scenario. PH2 provides metadata associated with dermoscopic criteria, including lesion colors, asymmetry, pigment network, dots and globules, streaks, regression areas, and blue-whitish veil.
The Derm7pt dataset contains over 2000 dermoscopic and macroscopic images, accompanied by metadata derived from the seven-point checklist and additional clinical information [31]. This study focuses only on the 1011 dermoscopic images available in the dataset. Lesions are grouped into a binary classification setting: melanoma-related categories are merged into a single melanoma class, while the remaining lesion types are assigned to the not melanoma class, resulting in 252 melanoma and 759 not melanoma samples. The associated metadata include dermoscopic criteria (pigment network, streaks, pigmentation, regression structures, dots and globules, blue-whitish veil, and vascular structures), as well as clinical and lesion-related features such as diagnostic difficulty, elevation, anatomical location, sex, and management.
Table 1 summarizes the main characteristics of the tabular metadata used in this study, which consist of low-dimensional, clinically interpretable features.
2.2. Image Feature Extraction
To obtain clinically and statistically meaningful descriptors, lesion-level statistical features were extracted from dermoscopic images. These features aim to capture complementary aspects of lesion morphology, color distribution, and texture patterns that are routinely assessed by dermatologists during visual examination [32]. Following established practice in dermoscopic image analysis, we considered geometric features, color features, and a diverse set of local and global texture descriptors [33,34].
Geometric features were derived using the ABCD rule [35], which encodes asymmetry, border irregularity, color variation, and lesion diameter—criteria closely related to melanoma risk assessment. Color information was characterized using multiple color spaces, including RGB, HSV, CIE L*a*b*, CIE L*u*v*, and YCrCb. For each color channel, first-order statistical moments were computed to describe the distribution and variability of pigmentation within the lesion.
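As an illustration of the first-order color moments mentioned above, the following sketch computes the mean, standard deviation, and skewness per channel. It is a simplified example on a hypothetical toy RGB patch; the study additionally uses HSV, CIE L*a*b*, CIE L*u*v*, and YCrCb channels, which are omitted here for brevity.

```python
import numpy as np

def color_moments(channel):
    # First-order statistical moments of one color channel:
    # mean, standard deviation, and skewness of pixel intensities.
    mu = channel.mean()
    sigma = channel.std()
    skew = ((channel - mu) ** 3).mean() / (sigma ** 3 + 1e-12)
    return [mu, sigma, skew]

# Toy 8x8 RGB patch standing in for a segmented lesion region.
rng = np.random.default_rng(0)
patch = rng.random((8, 8, 3))

# Three moments per channel -> 9 color descriptors for this patch.
color_features = [m for c in range(3) for m in color_moments(patch[..., c])]
```

In the full pipeline, the same moments would be computed on every channel of every color space, yielding one block of the tabular feature vector per lesion.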
Texture features were extracted to characterize spatial variations and structural patterns within the lesion region, using a broad range of statistical and signal-processing approaches. These included first- and second-order statistics, run-length and size-zone matrices, wavelet-based representations, and local pattern descriptors [36]. These methods quantify the heterogeneity, regularity, and spatial organization of pixel intensities, properties shown to be informative for melanoma detection.
Table 2 provides an overview of the extracted feature categories, the corresponding techniques, and their dimensionality, together with representative studies illustrating the use of these descriptors in dermoscopic image analysis and melanoma detection.
Overall, the resulting feature set constitutes a high-dimensional tabular representation capturing complementary low-level characteristics of skin lesions. This representation is suitable for tabular-to-image transformation using the LM-IGTD framework, which supports both high- and low-dimensional heterogeneous data. A more detailed description of the feature extraction procedures and associated descriptors can be found in our previous work [46].
2.3. Tabular-to-Image Transformation Using LM-IGTD
LM-IGTD [29] is a tabular-to-image transformation framework that converts tabular data into two-dimensional grayscale image representations. It builds upon the original IGTD method [22] and extends it to the low-dimensional and mixed-type datasets frequently encountered in clinical and biomedical domains.
IGTD permutes features by assigning them to pixel positions so that statistically similar features are placed close to each other. LM-IGTD further enhances this idea by incorporating additional mechanisms, including type-aware similarity measures for mixed data and noise-based feature augmentation. These extensions address challenges related to heterogeneous feature types and the limited dimensionality of clinical datasets.
A central limitation when applying tabular-to-image transformations to clinical metadata lies in the low dimensionality of many real-world datasets. When the number of available features is small, the resulting image representations have low resolution, which limits the ability of CNN-based models to exploit spatial patterns [14,61]. LM-IGTD explicitly addresses this issue by incorporating a stochastic noise-based feature augmentation mechanism, which increases the dimensionality of the tabular input prior to image generation while preserving its underlying statistical structure.
The augmentation mechanism generates a feature matrix $\tilde{X} \in \mathbb{R}^{n \times (d + m')}$, where $n$ represents the number of samples, $d$ is the number of original features, and $m'$ denotes the total number of noisy features added. For each continuous feature $x_{ij}$ (where $i$ denotes the sample index and $j$ the feature index), $m$ noisy features $\tilde{x}_{ij}^{(m)}$ are generated through the Gaussian noise mechanism:

$$\tilde{x}_{ij}^{(m)} = x_{ij} + \alpha\,\varepsilon_{ij}^{(m)}, \qquad \varepsilon_{ij}^{(m)} \sim \mathcal{N}(0, \sigma_j^2),$$

where $\tilde{x}_{ij}^{(m)}$ is the $m$-th noisy feature created for sample $i$ and feature $j$, and $\varepsilon_{ij}^{(m)}$ represents Gaussian noise sampled from a normal distribution with zero mean and variance $\sigma_j^2$, corresponding to the empirical variance of the original feature $j$. The scaling factor $\alpha$ controls the noise power and determines the magnitude of the perturbation applied to $x_{ij}$.
Similarly, for categorical features, the $m$-th noisy feature $\tilde{x}_{ij}^{(m)}$ for a specific sample $i$ is defined via a swap-noise mechanism:

$$\tilde{x}_{ij}^{(m)} = \begin{cases} x_{ij}, & \text{with probability } 1 - p,\\ x_{kj}, & \text{with probability } p, \end{cases}$$

where $x_{ij}$ is the original category of sample $i$ in feature $j$, and $x_{kj}$ is a value randomly selected from a different sample $k$ ($k \neq i$) within the same column $j$. This swap mechanism preserves the original category frequencies. The swap probability $p$ is controlled by the same noise power $\alpha$.
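The two noise mechanisms can be sketched as follows. This is an illustrative reimplementation, not the reference LM-IGTD code; the column names, the noise-power parameter `alpha`, and the swap probability `p` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def gaussian_noisy_feature(x_col, alpha):
    # Continuous feature: add zero-mean Gaussian noise whose variance is the
    # empirical variance of the column, scaled by the noise power alpha.
    sigma = np.sqrt(x_col.var())
    return x_col + alpha * rng.normal(0.0, sigma, size=x_col.shape)

def swap_noisy_feature(x_col, p):
    # Categorical feature: with probability p, replace an entry with the value
    # of a randomly chosen *different* sample from the same column, so
    # replacements are always drawn from the observed categories.
    out = x_col.copy()
    for i in range(len(out)):
        if rng.random() < p:
            k = rng.choice([j for j in range(len(out)) if j != i])
            out[i] = x_col[k]
    return out

# Toy metadata columns (hypothetical values).
age = np.array([35.0, 52.0, 47.0, 61.0, 29.0])
location = np.array(["back", "arm", "back", "leg", "arm"])

noisy_age = gaussian_noisy_feature(age, alpha=0.1)
noisy_location = swap_noisy_feature(location, p=0.1)
```

Repeating these generators $m$ times per column yields the noisy feature block that is appended to the original matrix.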
LM-IGTD supports both Homogeneous Noise Generation (HoNG) and Heterogeneous Noise Generation (HeNG). Let $d$ denote the number of original features and $m$ the augmentation factor (the number of noisy features generated per original feature). In HoNG, the noise generator produces exactly $d \cdot m$ noisy features, which are directly concatenated to the original feature matrix. In contrast, HeNG introduces variability in the augmented representation. A candidate pool of noisy features (up to $d \cdot m$) is first generated. Subsequently, the number of noisy features is randomly sampled from a set of feasible sizes not exceeding $d \cdot m$, and only the corresponding subset is concatenated to the original feature matrix. In our experimental evaluation, different augmentation levels were explored for low-dimensional clinical metadata by generating $m \in \{3, 5, 7\}$ noisy features per original feature, in order to control the effective dimensionality of the resulting tabular representations.
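The difference between the two concatenation strategies can be sketched as follows (illustrative only; `make_noisy_pool` is a hypothetical stand-in for LM-IGTD's per-feature noise generation):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_noisy_pool(X, m):
    # Hypothetical noise generator: produce m perturbed copies of each of the
    # d original columns, i.e. a pool of d * m candidate noisy features.
    return np.hstack([X + rng.normal(0.0, 0.1, X.shape) for _ in range(m)])

X = rng.random((6, 4))          # n = 6 samples, d = 4 original features
d, m = X.shape[1], 3

# HoNG: concatenate exactly d * m noisy features to the original matrix.
hong = np.hstack([X, make_noisy_pool(X, m)])

# HeNG: generate a pool of up to d * m noisy features, then concatenate only
# a randomly sized subset of it.
pool = make_noisy_pool(X, m)
n_keep = rng.integers(1, d * m + 1)
heng = np.hstack([X, pool[:, rng.choice(d * m, n_keep, replace=False)]])
```

HoNG thus yields a fixed augmented width, while HeNG's width varies from run to run within the same upper bound.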
To identify the optimal noisy feature set among $N$ stochastically generated candidate datasets, denoted as $\{\tilde{X}^{(1)}, \ldots, \tilde{X}^{(N)}\}$ (where each $\tilde{X}^{(c)}$ represents a candidate augmented matrix obtained through stochastic noise generation), we implemented the unsupervised selection mechanism detailed in [29]. The selected optimal candidate is subsequently denoted as $\tilde{X}^{*}$ and used for image generation. This procedure ensures that the noisy features preserve the intrinsic structure of the original clinical dataset through the following steps:
Mixed-type metric computation: We first computed the pairwise dissimilarities between samples for each candidate dataset $\tilde{X}^{(c)}$. We utilized the Gower distance [62], as it ensures consistent handling of the mixed numerical and categorical features present in the clinical metadata.
Spectral structure analysis: To capture the underlying data manifold, we applied spectral clustering [63]. Specifically, we constructed an affinity matrix $S$, where each element $S_{ij}$ measures the similarity between samples $i$ and $j$ from the candidate dataset. The affinity was computed using a Gaussian kernel $S_{ij} = \exp\!\left(-d_{ij}^2 / (2\sigma^2)\right)$, where $d_{ij}$ denotes the Euclidean distance between samples $i$ and $j$, and $\sigma$ is the kernel bandwidth controlling the neighborhood scale. Let $D$ denote the diagonal degree matrix with entries $D_{ii} = \sum_{j} S_{ij}$, and let $I$ be the identity matrix. The normalized Laplacian is then defined as $L = I - D^{-1/2} S D^{-1/2}$. The eigenvectors associated with the $k$ smallest eigenvalues of $L$ form a low-dimensional embedding where $k$-means is applied to identify clusters.
Selection criterion based on cluster validity: The quality of the resulting partitions was quantified using the Silhouette Coefficient (SC) [64]. To ensure that the augmentation does not distort the original data patterns, the number of clusters $k$ was fixed based on the original data $X$ (selecting the $k$ that maximized SC for $X$). This fixed $k$ was then used to evaluate all candidates $\tilde{X}^{(c)}$, and the dataset $\tilde{X}^{*}$ with the highest SC was selected for image generation.
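The candidate-selection loop can be illustrated with a self-contained sketch. Assumptions for brevity: Euclidean distances in place of the Gower distance, a deterministic two-means clustering in place of full spectral clustering, toy candidate matrices, and $k = 2$ fixed in advance.

```python
import numpy as np

def two_means(X):
    # Deterministic 2-means: seed one center at sample 0 and the other at the
    # sample farthest from it, then iterate assignment/update steps.
    centers = X[[0, int(np.argmax(np.linalg.norm(X - X[0], axis=1)))]]
    for _ in range(20):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
    return labels

def silhouette(X, labels):
    # Mean silhouette coefficient over all samples, for k = 2 clusters.
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    n, scores = len(X), []
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same].mean()                 # mean intra-cluster distance
        b = D[i, labels != labels[i]].mean()  # mean distance to other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0, 0.05, (4, 2)), rng.normal(5, 0.05, (4, 2))])
candidates = [blobs, rng.normal(2.5, 2.0, blobs.shape)]  # structured vs. unstructured

# Score every candidate with the fixed k and keep the one with the highest SC.
best = max(range(len(candidates)),
           key=lambda c: silhouette(candidates[c], two_means(candidates[c])))
```

The structured candidate preserves clear cluster structure and therefore wins the silhouette comparison, mirroring how the selection step favors augmented matrices that retain the original data's geometry.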
In the present PoC study, noise-based feature augmentation is applied exclusively to low-dimensional clinical metadata. In contrast, statistical features extracted from dermoscopic images are inherently high-dimensional and naturally yield image representations with an adequate number of pixels. Consequently, these are transformed directly without the introduction of additional synthetic features.
Once the augmented tabular representation is obtained (when applicable), the image generation process proceeds as follows. In LM-IGTD, each tabular sample is mapped to a grayscale image in which each pixel corresponds to a single input feature. The pixel intensity represents the normalized feature value for that sample. The spatial arrangement of features within the image is determined through an optimization process that seeks to preserve intrinsic relationships among features.
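This per-sample rendering step can be sketched as follows. The feature-to-pixel assignment `perm` would come from LM-IGTD's optimization; here it is a fixed toy permutation, and min-max scaling is shown per sample for brevity (in practice each feature would typically be normalized across the dataset).

```python
import numpy as np

def sample_to_image(sample, perm, shape):
    # Normalize feature values to [0, 1] grayscale intensities, then place
    # each feature at its assigned pixel and reshape into a 2D image.
    lo, hi = sample.min(), sample.max()
    intensities = (sample - lo) / (hi - lo + 1e-12)
    img = np.zeros(int(np.prod(shape)))
    img[perm] = intensities
    return img.reshape(shape)

features = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.3, 5.8, 9.7])
perm = np.array([4, 0, 8, 2, 6, 1, 5, 3, 7])  # toy 9-feature -> 3x3 assignment
image = sample_to_image(features, perm, (3, 3))
```

Each row of the tabular dataset thus becomes one grayscale image with one pixel per feature.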
To this end, LM-IGTD computes two ranking matrices. First, a feature ranking matrix ($R_{\mathrm{feat}}$) is constructed by calculating pairwise dissimilarities between features. Spearman [65] correlation ($\rho$) is used for numerical–numerical pairs, point-biserial [66] correlation ($r_{pb}$) for numerical–binary pairs, and Phik [67] correlation ($\phi_K$) for pairs involving categorical features. All pairwise correlation coefficients $C$ (whether $\rho$, $r_{pb}$, or $\phi_K$) are uniformly converted to dissimilarities via $1 - |C|$ and subsequently ranked to yield $R_{\mathrm{feat}}$.
In parallel, a pixel ranking matrix ($R_{\mathrm{pix}}$) is defined based on the pairwise distances between pixel locations on the two-dimensional image grid, computed using distance measures (Euclidean for numerical data or Gower [68] for mixed types) and then ranked.
The optimization procedure iteratively permutes the feature assignment to minimize the structural difference between the feature and pixel ranking matrices. Formally, the algorithm seeks to minimize the squared error function $E$:

$$E = \sum_{i=1}^{p} \sum_{j=1}^{p} \left( R_{\mathrm{feat}}(i, j) - R_{\mathrm{pix}}(i, j) \right)^2,$$

where $p$ denotes the total number of features. Here, $R_{\mathrm{feat}}(i, j)$ represents the rank of the dissimilarity between the $i$-th and $j$-th features, while $R_{\mathrm{pix}}(i, j)$ denotes the rank of the spatial distance between the pixel locations assigned to those features. This alignment minimizes the discrepancy between statistical and spatial structures, ensuring that correlated features are mapped to neighboring pixels.
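A compact sketch of this rank-alignment idea follows. It is an illustrative reimplementation with toy data, absolute Pearson correlations as a stand-in for the mixed-type measures, and a simple greedy pairwise-swap search rather than the exact LM-IGTD optimizer.

```python
import numpy as np

def rank_upper(D):
    # Rank the upper-triangular dissimilarities and mirror them back into a
    # symmetric rank matrix (diagonal stays zero).
    iu = np.triu_indices_from(D, k=1)
    R = np.zeros_like(D)
    R[iu] = D[iu].argsort().argsort().astype(float)
    return R + R.T

rng = np.random.default_rng(0)
X = rng.random((30, 9))                  # 30 samples, 9 features -> 3x3 image
C = np.corrcoef(X.T)
R_feat = rank_upper(1.0 - np.abs(C))     # feature ranking matrix

coords = np.array([(r, c) for r in range(3) for c in range(3)], float)
D_pix = np.linalg.norm(coords[:, None] - coords[None], axis=-1)
R_pix = rank_upper(D_pix)                # pixel ranking matrix

def error(perm):
    # Squared rank discrepancy under the assignment feature f -> pixel perm[f].
    return float(((R_feat - R_pix[np.ix_(perm, perm)]) ** 2).sum())

perm = np.arange(9)
E0 = error(perm)
improved = True
while improved:                          # greedy pairwise swaps until no gain
    improved = False
    for i in range(9):
        for j in range(i + 1, 9):
            cand = perm.copy()
            cand[[i, j]] = cand[[j, i]]
            if error(cand) < error(perm):
                perm, improved = cand, True
E1 = error(perm)
```

Because every accepted swap strictly decreases $E$, the loop terminates at a local optimum where statistically similar features sit on nearby pixels.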
3. Results
This section presents the experimental results organized by data modality (clinical metadata, statistical features, and their fusion). For each modality, we compare the performance of traditional machine learning (ML) models trained directly on tabular data with CNN-based models trained on LM-IGTD-generated image representations. Results are reported for both the PH2 and Derm7pt datasets.
3.1. Experimental Setting
For each dataset, samples were randomly split into training (80%) and test (20%) sets, with 15% of the training data further reserved for validation and used for hyperparameter tuning and early stopping. All experiments were repeated five times using different random seeds for the training–test split, and results are reported as averages across runs. Given the class imbalance present in both datasets, random undersampling was applied to the training data only, prior to model fitting. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity [69].
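The partitioning protocol above can be sketched as follows (a generic illustration with synthetic labels; the split fractions follow the text, and the undersampling routine is a simple random majority-class reduction, which is an assumption about the exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
y = (rng.random(n) < 0.2).astype(int)     # imbalanced binary labels

# 80/20 train-test split, then 15% of the training set reserved for validation.
idx = rng.permutation(n)
n_train = int(0.8 * n)
train, test = idx[:n_train], idx[n_train:]
n_val = int(0.15 * n_train)
val, train = train[:n_val], train[n_val:]

# Random undersampling on the training set only: keep all minority samples
# and an equally sized random subset of the majority class.
pos, neg = train[y[train] == 1], train[y[train] == 0]
minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
majority = rng.choice(majority, size=len(minority), replace=False)
balanced_train = np.concatenate([minority, majority])
```

Note that the validation and test sets are left untouched, so evaluation reflects the original class distribution.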
For tabular baselines, the evaluated ML models included Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Least Absolute Shrinkage and Selection Operator (LASSO) [69,70]. Hyperparameter tuning for these models was performed using 5-fold cross-validation on the training set. The hyperparameter ranges are summarized in Table 3.
The LM-IGTD augmentation factor, $m$ (the number of synthetic noise features generated per original feature), was explored within the search space $m \in \{3, 5, 7\}$ as part of an ablation study. This range was defined based on prior LM-IGTD-based literature [29,71] and to systematically evaluate the trade-off between increasing spatial resolution and preserving the relative contribution of the original clinical features.
The CNN architecture and training process were optimized for each dataset and modality using Bayesian optimization [72] combined with HyperBand [73]. This strategy explores the search space by combining probabilistic sampling with multi-fidelity scheduling. Hyperparameters such as convolutional filters, kernel and pooling sizes, dense units, dropout rates, learning rate, optimizer, and image resolution were tuned (see Table 3). Each configuration underwent 50 trials to select the best model based on validation loss. Final architectures and hyperparameters for all datasets and modalities are listed in the Supplementary Material (Table S1).
3.2. Results Using Clinical Metadata
This subsection reports the results obtained using the clinical metadata features described in Section 2.1 (Table 1). Due to the low dimensionality of metadata in both datasets, LM-IGTD was combined with noise-based feature augmentation to generate image representations suitable for CNN-based learning. Both HoNG and HeNG noise generation strategies were evaluated using different augmentation levels, considering the addition of 3, 5, and 7 synthetic features per original feature.
Table 4 summarizes the classification performance obtained on the PH2 and Derm7pt datasets using clinical metadata.
Results on the PH2 dataset show increased variability across noise configurations, which can be attributed to the smaller dataset size and the limited number of available metadata features. In this setting, CNN-based models display a noticeable trade-off between sensitivity and specificity, especially for HeNG, which often favors higher specificity at the expense of sensitivity. Nevertheless, the overall performance trends remain coherent across runs, indicating that LM-IGTD can be applied to low-dimensional clinical metadata even in small-sample scenarios.
For the Derm7pt dataset, CNNs trained on LM-IGTD image representations show stable performance across configurations. Although AUC values remain below those of tabular baselines, models with HeNG noise augmentation are more consistent than those without. This suggests that controlled augmentation helps mitigate limitations associated with low-dimensional metadata.
Overall, these results support the feasibility of transforming low-dimensional dermatological metadata into image representations using LM-IGTD, enabling the use of CNN-based models in settings where traditional tabular learning approaches remain dominant.
3.3. Results Using Statistical Features
This subsection reports the results obtained using statistical features extracted from dermoscopic images. In contrast to clinical metadata, these features form a high-dimensional tabular representation composed of numerous handcrafted descriptors capturing geometric, color, and texture-related characteristics of skin lesions. As the original feature dimensionality is sufficient to generate image representations with adequate spatial resolution, LM-IGTD was applied without noise-based feature augmentation.
Table 5 summarizes the classification performance obtained on the PH2 and Derm7pt datasets using statistical features. For both datasets, traditional ML models trained directly on tabular data provide strong baseline performance, particularly in terms of sensitivity. In contrast, CNN-based models trained on LM-IGTD-generated image representations consistently achieve higher specificity. Although AUC values are comparable across approaches, the two modeling paradigms exhibit a systematic trade-off. Tabular models tend to achieve higher sensitivity, whereas CNN models favor higher specificity.
On the PH2 dataset, CNN-based models achieve comparable or improved performance in terms of AUC and specificity relative to tabular baselines, despite the limited sample size. Notably, the increase in specificity is more pronounced than in Derm7pt, indicating that LM-IGTD-based representations may be particularly effective in small-sample settings when the original feature space is high-dimensional.
For the Derm7pt dataset, CNN-based models trained on LM-IGTD-generated image representations exhibit an increase in specificity compared to tabular baselines, while traditional ML models achieve higher sensitivity. This behavior suggests that the spatial encoding induced by LM-IGTD leads CNNs to learn more conservative decision boundaries, favoring the reduction of false positive predictions when learning from high-dimensional statistical descriptors. Overall, these results demonstrate that LM-IGTD can effectively transform high-dimensional dermatological statistical features into image representations suitable for CNN-based learning. This supports its applicability beyond low-dimensional clinical metadata and motivates its use in subsequent multimodal fusion experiments.
3.4. Results Using Fused Clinical Metadata and Statistical Features
This subsection reports the results obtained using the fusion of clinical metadata and statistical features. Feature fusion was performed by concatenating clinical metadata and statistical features at the tabular level, followed by either direct learning using traditional ML models or transformation into image representations using LM-IGTD for CNN-based classification. Given the high dimensionality of the fused feature space, noise-based feature augmentation was not applied.
Table 6 summarizes the classification performance obtained on the PH2 and Derm7pt datasets using fused features. For both datasets, traditional ML models trained on the fused tabular representation provide strong baseline performance, achieving improved or comparable results relative to models trained on individual feature modalities.
On the Derm7pt dataset, tabular fusion leads to improved performance compared to using statistical features alone, while achieving performance comparable to that obtained using clinical metadata, particularly for LASSO-based models. CNN-based models trained on fused image representations achieve stable but lower performance, indicating that, in large datasets with rich tabular information, direct learning on fused tabular features remains more effective than image-based representations in this setting.
In contrast, results on the PH2 dataset highlight the benefit of feature fusion in small-sample settings. Tabular models trained on fused features achieve the highest AUC and sensitivity among all evaluated configurations. CNN-based models trained on fused image representations exhibit higher specificity but increased variability in sensitivity, reflecting the combined effect of limited sample size and high feature dimensionality.
Overall, these results indicate that feature fusion enhances classification performance when complementary sources of tabular information are combined. While traditional ML models remain strong baselines for fused tabular data, LM-IGTD-based image representations provide an alternative representation for integrating heterogeneous dermatological features within an image-based learning framework.
Figure 2 provides an overview of the comparative performance across data modalities, including clinical metadata, statistical features, and their fusion. For each modality, the figure reports the AUC achieved by the best-performing tabular ML models and the best-performing LM-IGTD-based CNN models for the Derm7pt and PH2 datasets. Overall, the figure illustrates how classification performance varies across data modalities and modeling approaches. While traditional tabular models generally achieve strong AUC values, LM-IGTD-based CNN models provide competitive results in several configurations, highlighting their feasibility as an alternative representation strategy depending on the feature dimensionality and dataset size.
To assess statistical significance, we conducted two-sided Wilcoxon signed-rank tests [74] comparing the best LM-IGTD CNN models with the strongest tabular ML baselines across datasets and modalities. Tests were performed for AUC, sensitivity, and specificity. All resulting p-values were above the significance threshold, indicating no statistically significant differences. To complement hypothesis testing, we estimated Cohen's $d$ effect sizes [75] and 95% confidence intervals for the mean paired differences using a Student's $t$-distribution [76]. The results suggest modest, dataset-dependent trends that mainly reflect a sensitivity–specificity trade-off rather than a consistent advantage of either approach. Detailed values are reported in Supplementary Material Table S2.
Furthermore, to complement the scalar performance metrics, we included Receiver Operating Characteristic (ROC) curve visualizations (see Figure 3), which compare the best CNN-based LM-IGTD models with the strongest tabular ML baselines across datasets and modalities. The figure is organized into six panels: the top row corresponds to the PH2 dataset (metadata, statistical features, and fusion), and the bottom row to the Derm7pt dataset. This layout enables a direct visual comparison of discriminative behavior across modalities and datasets.
For the PH2 dataset, the ROC curves reveal similar discriminative behavior across modalities, with neither approach consistently outperforming the other. In some modalities—particularly statistical features—both curves overlap substantially, while in the fusion setting the curves intersect, reflecting different sensitivity–specificity trade-offs rather than a consistent dominance of either approach. For the Derm7pt dataset, tabular ML models tend to achieve higher true-positive rates at low false-positive rates, especially in the metadata and statistical feature modalities. Nonetheless, LM-IGTD–CNN models remain competitive and converge toward similar performance levels at higher thresholds.
Overall, the ROC analysis confirms that LM-IGTD preserves the discriminative structure of the original tabular data and does not introduce systematic performance degradation, supporting its feasibility as an alternative representation for CNN-based learning.
3.5. Visual Examples of LM-IGTD Image Representations
To enhance interpretability, Figure 4 presents LM-IGTD-generated images for not melanoma and melanoma cases across both datasets and modalities. Although these images do not correspond to natural visual structures, distinct spatial organization patterns can be observed between classes.
For clinical metadata (panels a and d), the representations exhibit structured block-like patterns resulting from the deterministic mapping of heterogeneous features. Differences between not melanoma and melanoma cases manifest as variations in the location and intensity of activated regions, reflecting underlying feature differences. For statistical features (panels b and e), which involve higher-dimensional descriptors, the generated images display more granular and texture-like patterns. Melanoma cases tend to exhibit more heterogeneous intensity distributions compared to not melanoma cases, suggesting richer inter-feature interactions in the encoded space.
In the fusion representations (panels c and f), characteristics from both modalities are jointly embedded. The resulting spatial patterns combine block-like and fine-grained textures, indicating that LM-IGTD preserves complementary information when integrating clinical metadata and statistical features. Overall, these qualitative examples illustrate how LM-IGTD transforms tabular data into spatially organized representations in which class-dependent differences remain observable, supporting convolutional processing.
4. Discussion
This work presents a PoC study investigating the feasibility of representing clinically meaningful dermatological tabular data as two-dimensional image-like structures using the LM-IGTD framework, and subsequently applying CNNs for melanoma classification. Rather than aiming to outperform established image-based or tabular learning frameworks, the primary objective of this study was to assess whether tabular-to-image transformations constitute a valid and coherent representation strategy for heterogeneous dermatological data sources.
The experimental results demonstrate that LM-IGTD can effectively encode both low-dimensional clinical metadata and high-dimensional statistical features into image representations suitable for CNN-based learning. Across both the Derm7pt and PH2 datasets, CNN models trained on LM-IGTD-generated images exhibited stable and consistent performance trends, supporting the hypothesis that tabular dermatological data can be mapped to a spatial domain while preserving discriminative information. This observation is particularly relevant in dermatology, where tabular clinical descriptors and derived lesion features are commonly used alongside visual assessment.
For clinical metadata, which typically consist of a limited number of heterogeneous features, the incorporation of noise-based feature augmentation proved useful in stabilizing the generated image representations and facilitating CNN learning. In particular, heterogeneous noise generation led to more consistent performance in the larger Derm7pt dataset, suggesting that controlled stochastic augmentation can mitigate the limitations imposed by low feature dimensionality. These findings align with the design principles of LM-IGTD and support its applicability to clinical metadata commonly encountered in dermatological practice.
While noise-based augmentation is useful for low-dimensional metadata, it may introduce spurious patterns or artificial correlations if synthetic features are not carefully controlled. However, the augmentation strategy implemented in LM-IGTD is not based on arbitrary noise injection. Instead, it is type-aware and structure-preserving: Gaussian perturbations are applied to numerical features, while swap-based noise is used for categorical attributes, ensuring that the statistical nature of each feature is maintained. Importantly, this process is fully unsupervised and does not rely on class labels. Furthermore, augmented candidate datasets are not accepted indiscriminately. Multiple stochastic candidates are generated and evaluated using a structure-preserving selection pipeline based on cluster validity criteria (Silhouette Coefficient). Only the candidate that best preserves the intrinsic geometry of the original data is selected for image generation. This step mitigates the risk of retaining destructive or misleading synthetic patterns. Finally, the ablation study exploring augmentation factors (3, 5, and 7 synthetic features per feature) empirically demonstrates that excessive augmentation leads to performance degradation rather than artificial improvement. This observed signal dilution effect further supports that LM-IGTD does not benefit from uncontrolled dimensional inflation, reinforcing the controlled nature of the augmentation mechanism.
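The type-aware augmentation and structure-preserving selection described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the use of KMeans to obtain unsupervised cluster labels for the Silhouette Coefficient, and all parameter values are our own choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def augment_candidate(X, is_categorical, factor=3, sigma=0.1):
    """Add `factor` synthetic columns per original feature, keeping each
    feature's statistical type: Gaussian perturbation for numerical
    columns, swap-based noise for categorical ones."""
    new_cols = []
    for j in range(X.shape[1]):
        col = X[:, j]
        for _ in range(factor):
            if is_categorical[j]:
                syn = col.copy()
                # swap-based noise: shuffle a small fraction of entries
                idx = rng.choice(len(col), size=max(2, len(col) // 10),
                                 replace=False)
                syn[idx] = rng.permutation(syn[idx])
            else:
                syn = col + rng.normal(0.0, sigma * col.std(), size=len(col))
            new_cols.append(syn)
    return np.column_stack([X] + new_cols)

def select_best_candidate(X, is_categorical, n_candidates=5, n_clusters=2):
    """Generate several stochastic candidates and keep the one whose
    unsupervised cluster geometry (Silhouette Coefficient) is best;
    class labels are never consulted, so the step stays unsupervised."""
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        cand = augment_candidate(X, is_categorical)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(cand)
        score = silhouette_score(cand, labels)
        if score > best_score:
            best, best_score = cand, score
    return best
```

Because candidates are scored and filtered rather than accepted blindly, destructive synthetic patterns are discarded before any image is generated.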
In contrast, statistical features extracted from dermoscopic images are inherently high-dimensional and naturally support the construction of image representations without additional augmentation. In this context, LM-IGTD-based CNN models achieved performance levels comparable to traditional tabular baselines, while exhibiting a systematic increase in specificity in several experimental configurations. This behavior suggests that the spatial organization induced by LM-IGTD enables CNNs to learn more conservative decision boundaries, potentially reducing false positive predictions.
Consistent with prior work [
29,
71], traditional ML models trained directly on tabular data remain strong baselines across all evaluated scenarios, particularly for low-dimensional clinical metadata. Importantly, the goal of this study is not to replace such models, but to explore an alternative representation paradigm that enables the use of DL architectures in contexts dominated by tabular data. That LM-IGTD-based CNN models achieve competitive and coherent performance despite the additional representation step is therefore an important observation: it demonstrates that the tabular-to-image transformation preserves relevant information and supports stable learning behavior.
From a methodological perspective, an important question concerns the mechanism through which LM-IGTD enables CNN-based learning on tabular data. Its effectiveness can be interpreted in terms of inductive bias alignment. CNNs exploit local spatial correlations through weight sharing and localized receptive fields. By reorganizing tabular features according to their statistical relationships and placing correlated features in spatial proximity, LM-IGTD converts abstract feature dependencies into local neighborhoods. This spatial encoding allows convolutional filters to model inter-feature interactions that would otherwise require high-capacity fully connected layers, introducing a form of structural regularization that constrains the hypothesis space. In low-dimensional settings, the feature-based augmentation further enriches the representation space, mitigating sparsity and supporting more stable learning behavior. Thus, LM-IGTD does not simply reshape tabular data; it embeds statistical dependencies and controlled variations into spatial structures, enabling CNN architectures to leverage their inherent inductive biases in a coherent manner.
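The inductive-bias argument above can be made concrete with a minimal sketch that orders features so that statistically related ones become spatial neighbors and then reshapes each row into a 2D grid. This is a simplified stand-in for the LM-IGTD assignment step, not the actual algorithm; the hierarchical-clustering ordering on 1 − |corr| and the zero-padding are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

def tabular_to_image(X, side):
    """Place correlated features in spatial proximity: cluster features
    on the distance 1 - |corr|, order them by the dendrogram leaves, and
    reshape each sample into a side x side single-channel image."""
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(dist, 0.0)
    order = leaves_list(linkage(squareform(dist, checks=False),
                                method="average"))
    Xo = X[:, order]
    n_samples, n_feats = Xo.shape
    pad = side * side - n_feats          # zero-pad to fill the grid
    Xo = np.pad(Xo, ((0, 0), (0, pad)))
    return Xo.reshape(n_samples, side, side)
```

After this mapping, a small convolutional filter sliding over the grid sees groups of correlated features inside one receptive field, which is precisely the locality assumption CNNs exploit.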
An additional contribution of this work lies in the exploration of feature-level fusion through tabular-to-image transformation. By concatenating clinical metadata and statistical features prior to image generation, this study introduces a unified representation in which heterogeneous sources of information are jointly embedded within a single spatial structure. This approach can be interpreted as an implicit fusion mechanism operating entirely within the tabular domain, allowing CNNs to model interactions between clinical and lesion-level features without requiring raw image data at inference time. The fusion experiments indicate that this unified encoding is feasible and yields coherent performance trends across datasets, even though direct tabular learning remains highly effective.
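The feature-level fusion step reduces, in the tabular domain, to column-wise concatenation of the two modalities before image generation. A minimal sketch follows; the per-modality z-scoring is our addition for illustration and is not claimed to match the authors' preprocessing.

```python
import numpy as np

def fuse_tabular(metadata, stat_features):
    """Feature-level fusion in the tabular domain: standardize each
    modality separately, then concatenate column-wise so that a single
    LM-IGTD image is generated from the joint feature table."""
    def zscore(A):
        return (A - A.mean(axis=0)) / (A.std(axis=0) + 1e-8)
    return np.hstack([zscore(metadata), zscore(stat_features)])
```

Because fusion happens before the tabular-to-image transformation, the downstream CNN can model cross-modality interactions without any dedicated fusion head and without raw dermoscopic images at inference time.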
Beyond the specific CNN architectures evaluated in this study, an important implication of the proposed tabular-to-image representation is that it enables the use of a broader class of DL models originally developed for visual data. Once tabular clinical information is encoded as a structured image, more advanced architectures such as residual [
77] and densely connected networks [
78], as well as more recent attention-based and transformer-based models [
79,
80], can be directly applied without requiring fundamental changes to the input representation. From a PoC perspective, the competitive performance obtained with relatively simple CNN architectures suggests that further gains may be achieved using more expressive models capable of capturing long-range spatial dependencies and higher-order feature interactions [
81].
Another promising direction involves integrating LM-IGTD-generated images with raw dermoscopic images in multimodal learning frameworks. In this setting, tabular-derived representations could act as a complementary visual modality encoding clinical metadata and lesion descriptors. This opens the possibility of jointly modeling tabular clinical information and visual lesion characteristics using unified image-based architectures, potentially enhancing robustness and interpretability in dermatological decision-support systems. While the present study focuses on establishing feasibility, the proposed framework naturally lends itself to multimodal extensions that combine generated and acquired images.
Although this study focuses on melanoma classification, the LM-IGTD framework is inherently domain-agnostic and can be adapted to other medical tasks involving heterogeneous tabular data. Previous validations of the framework have demonstrated its applicability in distinct clinical domains, including diabetes, hepatitis [
29], and cardiovascular risk prediction [
71], confirming its ability to preserve feature correlations across diverse biomedical contexts. In broader medical imaging scenarios, this representation strategy could support multimodal learning settings. For example, in breast cancer analysis, tabular clinical features (e.g., demographic factors or molecular markers) are often analyzed alongside mammography [
82] or histopathology images [
83]. Transforming such tabular descriptors into spatially organized representations may facilitate their integration within convolutional pipelines using unified image-based architectures. Empirical evaluation in these domains remains future work, but the methodological principles of LM-IGTD are directly transferable.
The present study is subject to certain constraints that are inherent to its PoC nature. The experimental evaluation was conducted on two publicly available datasets, with relatively limited sample sizes—particularly in the case of PH2—and no external validation cohort was considered. In addition, the CNN architectures explored were kept relatively simple, as the emphasis of this work was placed on assessing the feasibility of the proposed representation rather than on architectural optimization. Finally, the analysis was limited to binary melanoma classification using tabular features derived from dermoscopic datasets. These aspects define the scope of the present study and naturally motivate future research involving larger and more diverse cohorts, external validation, more advanced network architectures, and extended clinical scenarios.
Overall, this PoC study demonstrates that LM-IGTD-based tabular-to-image representations provide a feasible framework for encoding heterogeneous dermatological data. By enabling CNN-based learning on clinical metadata, statistical features, and their fusion, this work supports future research exploring more advanced architectures and multimodal strategies for dermatological decision support.
5. Conclusions
This work presents a PoC study evaluating the feasibility of representing heterogeneous dermatological tabular data as two-dimensional image-like structures using the LM-IGTD framework. The study assesses whether such tabular-to-image transformations provide a stable representation for clinically relevant dermatological data.
The experimental results show that LM-IGTD can encode both low-dimensional clinical metadata and high-dimensional statistical features into structured image representations while preserving discriminative information. CNN models trained on these representations exhibited consistent performance across datasets and modalities, indicating stable learning behavior. Although traditional ML models trained directly on tabular data remain strong baselines, the competitive performance of LM-IGTD-based CNNs supports the feasibility of the proposed representation strategy.
Beyond individual data modalities, the results demonstrate that feature-level fusion through tabular-to-image transformation is feasible. This enables the joint representation of clinical metadata and statistical features within a unified spatial structure, supporting future multimodal dermatological AI systems. By converting non-visual clinical data into a compatible spatial format, the framework facilitates integration with dermoscopic images in unified deep learning architectures. This reduces the need for complex fusion mechanisms and allows models to jointly learn from visual lesion characteristics and patient-level metadata.
Overall, this PoC study demonstrates that LM-IGTD-based tabular-to-image representations provide a feasible approach for encoding heterogeneous dermatological tabular data. By offering a unified image-based representation, this work supports future research on multimodal strategies combining tabular-derived images with medical imaging data, with potential benefits for dermatological decision-support systems.