1. Introduction
Understanding drug response is essential to the study of cancer treatment because it bears directly on the efficacy of personalized therapy. Evaluating drug responses allows clinicians to identify which treatment regimens are likely to be most advantageous for a particular patient and thereby to tailor drug selection, dosages, and combination regimens to enhance that patient’s therapeutic response [1, 2]. Studies on drug responses have elucidated the mechanisms underlying treatment resistance in cancer, thus providing a scientific basis for the development of novel anti-resistance therapies and advancing the implementation of precision medicine in oncology [3].
Although tremendous achievements have been made in research on drug responses, much remains unknown, such as the interindividual differences leading to varied therapeutic outcomes [4, 5, 6]. Additionally, the processing and interpretation of large-scale multi-omics data (e.g., genomics, transcriptomics, and proteomics) to extract clinically relevant information remain major challenges [
7]. To overcome these difficulties, recent studies have focused on integrating large-scale datasets to characterize cell heterogeneity and drug responses. For example, Hao et al. assembled an extensive single-cell dataset by integrating data from all publicly accessible single-cell resources, aiming to develop a foundation model, scFoundation, designed to decode the “language” of cells. The decoder of scFoundation generated gene-level contextual embeddings and demonstrated strong performance across various tasks, including read-depth enhancement, drug response and sensitivity prediction at the single-cell level, perturbation modeling, and cell type annotation [
8]. While foundation models such as scFoundation offer computational insights into cellular behavior, experimental studies aim to directly associate single-cell transcriptomes with drug responses. By integrating single-cell transcriptomics with high-throughput drug screening in high-grade serous ovarian cancer, Dini et al. applied a 96-plex scRNA-seq approach to analyze the responses of 36,000 cells to 45 drugs, representing 13 mechanisms of action [
9].
Predicting changes in gene expression levels upon drug perturbation is important for revealing the biological mechanisms underlying different responses to drugs, including their sensitivity, resistance, and side-effects [
10,
11,
12]. In contrast to static transcriptome profiling, which reflects only the final state of drug response, predictive models help in understanding the dynamic regulation leading to the establishment of drug response phenotypes. By predicting changes in specific pathways and genes after treatment with drugs, early-stage modifications that impact the fate of various therapeutic responses can be identified. Accordingly, accurately predicting drug outcomes has the capacity to improve knowledge of heterogeneity in drug responses and promote the development of precision oncology.
Deep learning based on neural network architectures can combine multiple omics datasets and process vast and complex data to discover detailed patterns and associations that have been difficult to find with ordinary approaches [
13,
14,
15]. Accordingly, deep learning is capable of dissecting drug responses, particularly in predicting outputs from drug response models. Moreover, the use of these methods provides a more comprehensive understanding of drug responses. Through many studies, deep learning has made significant contributions to the large-scale data mining of drug responses. For example, Zheng et al. developed a computational framework, UNAGI, to model the temporal dynamics of cellular behavior during complex disease progression. This tool enables a detailed analysis of gene regulatory networks and cellular transitions, offering insights into potential therapeutic targets and drug candidates for idiopathic pulmonary fibrosis. By simulating compound responses and disease trajectories, UNAGI facilitates drug repositioning and mechanistic investigation [
16].
Lung cancer causes approximately 20% of all cancer deaths and continues to be the main cause of death from cancer worldwide [
17,
18]; its advanced heterogeneity has led to many problems that have prevented the identification of effective therapeutic options. Lung adenocarcinoma (LUAD) represents the most prevalent histological subtype of lung cancer and continues to impose a substantial global health burden. Despite advances in targeted therapy and immunotherapy, recent clinical studies have highlighted the persistent heterogeneity in treatment response and prognosis among patients. For instance, contemporary investigations evaluating pembrolizumab-based combination strategies have demonstrated variable therapeutic benefits depending on molecular and clinical characteristics, underscoring the complexity of precision treatment for LUAD [
19]. In parallel, emerging biomarker studies have explored novel molecular indicators for prognosis and immunotherapy responses, yet robust and universally applicable predictive markers remain limited [
20]. These findings collectively emphasize that the therapeutic response in lung cancer patients is highly individualized and difficult to predict using conventional clinical parameters alone. The reliable prediction of patient-specific responses is therefore crucial for guiding individualized treatments. In addition, compared with most other cancers, lung cancer is covered by a relatively large number of public databases that contain detailed gene expression profiles as well as annotated drug responses, which makes it well suited to exploration with deep learning models.
In this study, we present a Generative Adversarial Network (GAN) framework, GRIP-Lung (Generative Model of Response to Drug-Induced Perturbation in Lung Cancer), for predicting drug responses in lung cancer, optimized through a systematic evaluation of various generator and discriminator architectures. By applying this model to lung cancer cell gene expression data, we predict drug response profiles following treatment. The model accurately captures gene expression changes induced by drug perturbation and demonstrates good robustness in inferring biological functions at the cellular level. Furthermore, the model is flexible enough to dissect drug-induced perturbations across gene expression in bulk patient RNA-sequencing datasets. This capability enables the identification of key genes that are crucial for drug responses in lung cancer, providing valuable insights into the molecular mechanisms underlying treatment efficacy. Building on this capability, we define six drug response states using ssGSEA to analyze key biomarker transcriptomes for each post-treatment state. We then present a framework that can differentiate among a wide range of drug responses based on RNA-seq data. Integrating GAN-based drug response prediction with gene expression information could help improve personalized cancer therapy and inform treatment strategies for lung cancer. The package and source code are available on GitHub at
https://github.com/XieHB-lab/GRIP-Lung (accessed on 1 April 2026).
3. Discussion
In this study, we constructed a GAN-based drug response prediction model and established a comprehensive set of evaluation metrics to test it. The results indicate that the drug response prediction models we developed have high prediction accuracy and strong stability. These results offer insight into the drug response patterns of lung cancer cells and support a new approach for identifying important genes and pathways related to drug response.
In addition to providing accurate overall predictions, the model provides biologically plausible evidence of the potential mechanisms governing the responses to different drugs. For instance, for lung cancer cell lines treated with erlotinib, our prediction model indicated that the genes affected by treatment were involved mainly in cell-cycle regulation. This is consistent with other studies reporting that the stimulation of such pathways in response to erlotinib inhibits cell proliferation and survival.
This case-specific interpretability suggests that GRIP-Lung can be used for more than prediction; it can also serve as an inference tool for studying drug mechanisms, and it may thereby support clinicians’ judgment regarding therapy selection for lung cancer patients. The GRIP-Lung generator employs a residual network as its main feature extractor, allowing it to perform well in predicting effects on gene expression when tested using synthetic and real-world compound data. Integrating cell line- and drug-specific information into the initial representations learned by the network allows it to better capture sample-specific variance in biological responses. Traditional modeling approaches treat the expression of each gene in isolation from that of other genes, whereas our approach explores their interconnectedness. The architecture includes an embedded auxiliary condition, which makes the network adaptable and interpretable. The ResNet-based encoder can effectively model complex nonlinear dependencies between genes, which is convenient for generating heterogeneous omics datasets, largely without a priori domain-specific knowledge about particular gene interactions or their combinations. Residual connections mitigate the vanishing gradient problem, so the deep network can be trained more smoothly. Together, these properties enable GRIP-Lung to capture complex transcriptional perturbation patterns. In combination with a lightweight MLP discriminator that distinguishes real from generated gene expression profiles, GRIP-Lung achieves stable adversarial training and better prediction. GRIP-Lung is therefore a strong candidate tool for precision medicine, where the inference of transcriptional responses should remain highly accurate across different cell and drug conditions.
On the basis of the ssGSEA framework, drug-induced transcriptional changes can be summarized into six key cellular states, allowing the functional comparison of heterogeneous samples while moving beyond a binary classification of effective versus ineffective. Nevertheless, these states do not cover all of cancer biology, as important processes such as metabolic reprogramming and epithelial–mesenchymal transition (EMT) are not included. In addition, several of these states contain overlapping pathways, which introduces uncertainty when categorizing the data obtained from the network analysis process.
GRIP-Lung also provides a transcriptome-driven framework for computational drug repositioning by modeling drug-induced state transitions. By predicting post-treatment gene expression profiles from baseline states and quantifying functional program shifts using GSEA, the model enables the prioritization of compounds that drive tumor cells toward favorable biological states, such as enhanced programmed cell death or suppressed resistance pathways. Unlike approaches based primarily on chemical similarity or predefined targets, this strategy evaluates drugs according to their systems-level regulatory impact on cancer-associated gene programs. Although the present implementation is restricted to a limited number of lung cancer cell lines and compounds, the framework is readily extensible to larger pharmacogenomic datasets and broader drug libraries. At the current stage, the framework supports transcriptome-driven computational drug repositioning by prioritizing candidate compounds on the basis of predicted state transitions. It should be regarded as a hypothesis-generating tool that guides subsequent experimental and clinical validation rather than as definitive clinical repurposing evidence.
Although favorable performance was achieved during the evaluation of this model, several limitations still exist. First, the present model is insufficient for considering the importance of the tumor microenvironment in influencing drug responses, such as those involving infiltrating immune cells and their intercellular interactions, which can strongly impact the effectiveness of therapies. Importantly, GRIP-Lung was trained exclusively on cancer cell line RNA-seq data that lacked a complete TME context. Although moderate concordance was observed between the predicted transcriptional changes and patient-derived bulk RNA-seq profiles, bulk tumor samples inherently contain both malignant and microenvironmental components. Therefore, this concordance likely reflects the ability of the model to capture tumor cell-intrinsic transcriptional responses rather than TME-mediated effects. TME-associated processes, such as immune infiltration, stromal signaling, or hypoxia-driven transcriptional adaptation, are not explicitly modeled in the current framework and represent important limitations. In addition, drug dosages and duration of exposure were not incorporated into the model. These pharmacological parameters can significantly influence transcriptional outcomes and may contribute to variability in clinical drug response.
Another limitation concerns generalization across drugs and cellular contexts. GRIP-Lung relies on learned drug embeddings derived from compounds included in the training set; therefore, it cannot directly predict responses to entirely unseen drugs without retraining. Although the model may extrapolate to transcriptionally similar lung cancer cell lines within the training distribution, its performance in more divergent contexts remains uncertain. Future work could incorporate structure-based drug representations, such as SMILES-derived embeddings, to improve the generalizability to novel compounds, requiring expanded and more diverse training data. Furthermore, autophagy and senescence play dual roles in cancer. Protective autophagy may increase tumor survival under therapeutic stress, and the SASP can promote tumor progression. Thus, the interpretation of these states in our model should be understood within the framework of predicted transcriptional programs, rather than as definitive biological outcomes.
To overcome the aforementioned drawbacks, in the near future, a multi-modal model could be explored to utilize single-cell transcriptome information, spatial transcriptomics, and even multi-omics analysis to integrate more detailed intratumor heterogeneity and a comprehensive description of the TME. Moreover, widening the data scope, broadening the training datasets, and improving biological interpretation will facilitate the development of these types of models as robust solutions for creating powerful AI algorithms in precision oncology.
4. Materials and Methods
4.1. Data Collection and Preprocessing
Gene expression profiles of lung cancer cell lines, both untreated and treated with various anticancer drugs, were obtained from the publicly available Gene Expression Omnibus (GEO) database. Additional data were obtained from the LINCS L1000 dataset, which similarly includes the transcriptional profiles of untreated and drug-treated cells. Each profile contains gene expression data following treatment with drugs such as topotecan and erlotinib, among others. These data served as the input and output pairs for training the GAN model. The full list of drugs, corresponding GEO accession numbers, and associated cell lines is provided in Supplementary Table S4.
Gene expression data were obtained as processed files (series matrix or normalized expression values, when available). TPM values were transformed as log2(TPM + 1) prior to normalization. The baseline gene expression profiles correspond to pre-treatment expression measurements for lung cancer cell lines under untreated control conditions, as provided in the original datasets. Only lung cancer cell lines with both untreated (baseline) samples and corresponding drug-treated samples were retained for analysis. All expression matrices were arranged such that each row represented a gene and each column represented a sample. For each dataset, z score normalization was applied independently to the input (pre-treatment) and output (post-treatment) expression matrices, across samples for each gene. The normalization parameters were computed within each cross-validation training fold and then applied to the corresponding validation and test sets to avoid data leakage. The normalized data were subsequently converted into PyTorch tensors (v1.12.1; Meta AI, Menlo Park, CA, USA) for model training.
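The fold-wise normalization described above can be illustrated with a minimal NumPy sketch. The function names, the toy TPM matrix, and the 8/2 train/held-out split are hypothetical and chosen only for illustration; the sketch assumes the genes-by-samples layout stated in the text and is not the authors' implementation.

```python
import numpy as np

def log2_tpm(tpm):
    """Apply the log2(TPM + 1) transform."""
    return np.log2(np.asarray(tpm, dtype=float) + 1.0)

def fit_zscore(train):
    """Per-gene mean and standard deviation from the training fold only.

    `train` has shape (n_genes, n_samples), matching the layout in the text.
    """
    mean = train.mean(axis=1, keepdims=True)
    std = train.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # guard against constant genes
    return mean, std

def apply_zscore(matrix, mean, std):
    """Normalize any fold with parameters fitted on the training fold."""
    return (matrix - mean) / std

# Toy data: 5 genes x 10 samples (hypothetical values).
rng = np.random.default_rng(0)
tpm = rng.gamma(shape=2.0, scale=50.0, size=(5, 10))
logged = log2_tpm(tpm)

# Hypothetical split: first 8 samples for training, last 2 held out.
train, held_out = logged[:, :8], logged[:, 8:]
mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
held_out_z = apply_zscore(held_out, mean, std)  # no statistics taken from held-out data
```

Fitting the mean and standard deviation on the training fold alone is what prevents information from the validation and test samples from leaking into training.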
4.2. Model Architecture of GRIP-Lung
We propose a GAN-based approach for generating drug-treated expression profiles. The GAN framework comprises two main components: a generator and a discriminator. The generator maps untreated expression profiles to drug-treated expression profiles and is trained to minimize reconstruction error, with mean squared error (MSE) as the loss function. The generator output is passed to the discriminator, which judges whether a given profile originates from the distribution of real drug-treated expression profiles; this judgment provides feedback for adjusting the generator to further reduce its error. The input and output file formats used in this model are publicly available in our GitHub repository to facilitate reproducibility.
4.3. Generator and Discriminator Combinations Tested
The model was built upon the GAN framework, comprising multiple generator and discriminator architectures. Three generator variants, namely, a conditional autoencoder (cAE), a transformer, and a residual network, and two discriminator variants, namely, a 1D convolutional neural network (1D-CNN) and a lightweight multilayer perceptron (MLP), were designed, resulting in six possible GAN configurations. These configurations are summarized in Table 2. The configuration that achieved the best performance was selected as the final model.
Among the six evaluated architectures (cAE–MLP, cAE–1D-CNN, Transformer–MLP, Transformer–1D-CNN, Residual–MLP, and Residual–1D-CNN), Residual–MLP demonstrated the best performance. Therefore, we provide a detailed description of the Residual generator and MLP discriminator below. The configurations of the other architectures are reported in Supplementary Note S1.
4.4. Embedding of Cell Lines and Drugs
To integrate contextual biological information into the model, we applied an embedding strategy for both cell lines and drug identifiers. Each embedding vector was trained to capture latent characteristics of the corresponding entity and was projected to match the dimensionality of the input gene expression profile. The cell line and drug embeddings were then added element-wise to the normalized input expression vector, enabling the generator to learn drug- and cell-specific perturbation patterns during training. This element-wise integration avoided increasing the input dimensionality, as would occur with simple concatenation, and ensured efficient conditioning while maintaining the interpretability of gene-level features. Each embedding vector had a dimensionality of 128, a choice that provided sufficient capacity to capture latent biological and pharmacological features while maintaining a balanced contribution to the hidden representation of the generator.
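The additive conditioning described in this section might look roughly like the following PyTorch sketch. The class name `ConditionedInput`, the use of one linear projection per embedding, and the toy sizes are assumptions made for illustration; only the embedding dimensionality of 128 and the element-wise addition come from the text.

```python
import torch
from torch import nn

class ConditionedInput(nn.Module):
    """Adds projected cell-line and drug embeddings to the expression vector."""

    def __init__(self, n_cell_lines, n_drugs, n_genes, emb_dim=128):
        super().__init__()
        self.cell_emb = nn.Embedding(n_cell_lines, emb_dim)
        self.drug_emb = nn.Embedding(n_drugs, emb_dim)
        # Assumed: a linear projection maps each embedding to gene dimensionality.
        self.cell_proj = nn.Linear(emb_dim, n_genes)
        self.drug_proj = nn.Linear(emb_dim, n_genes)

    def forward(self, x, cell_idx, drug_idx):
        cell = self.cell_proj(self.cell_emb(cell_idx))
        drug = self.drug_proj(self.drug_emb(drug_idx))
        # Element-wise addition keeps the input dimensionality unchanged.
        return x + cell + drug

# Toy usage with hypothetical sizes: 3 cell lines, 4 drugs, 20 genes.
torch.manual_seed(0)
layer = ConditionedInput(n_cell_lines=3, n_drugs=4, n_genes=20)
x = torch.randn(5, 20)
cell_idx = torch.tensor([0, 1, 2, 0, 1])
drug_idx = torch.tensor([0, 1, 2, 0, 3])
out = layer(x, cell_idx, drug_idx)
```

Because the conditioning is added rather than concatenated, each output position still corresponds to one gene, which is the interpretability property the text refers to.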
4.5. Residual Network Generator
The generator was designed as a residual conditional network, where condition information is incorporated by embedding layers to predict drug-induced transcriptional perturbations. The input to the generator consists of the baseline gene expression profile x along with the identifiers of the cell line and the administered drug. The cell lines and drug identities are first embedded into low-dimensional vectors, which are concatenated and passed through a modulation network to generate two conditioning parameters, γ and β. These parameters are applied to the hidden features via feature-wise linear modulation (FiLM), enabling the network to adaptively adjust its feature activations according to the specific cell–drug combination.
A residual block is then applied to facilitate gradient propagation and stabilize training. The generator outputs a residual vector Δx, representing the predicted transcriptional shift induced by drug treatment, which is added to the original baseline profile to produce the final predicted expression:
$$\hat{x} = x + \Delta x$$
This residual formulation emphasizes learning relative rather than absolute differences, which leads to more stable convergence and a more interpretable biological space. In addition, the generator is trained using a hybrid loss that combines an adversarial term with a regression constraint to produce both realistic gene-level prediction distributions and high numerical accuracy.
Specifically, the generator projects the input gene expression vector to a hidden dimension of 256 using a fully connected layer. The subsequent residual block consists of two fully connected layers, each with 256 hidden units and ReLU activation functions. The FiLM-style modulation parameters, γ and β, are applied after this projection to condition the hidden features on the cell line and drug embeddings. Finally, a linear layer maps the processed hidden representation back to the original gene dimension, and the resulting residual vector is added to the baseline expression to form the predicted transcriptional response. This detailed architecture allows the network to perform deep feature transformations while preserving direct gene-level correspondence, facilitating stable training and accurate perturbation prediction.
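A minimal PyTorch sketch of a generator with this shape is given below. The hidden size of 256, the two-layer ReLU residual block, the FiLM modulation after the input projection, and the final residual addition follow the text; the class name, the single linear modulation network, the exact placement of the skip connection, and the toy sizes are assumptions, so this should be read as an illustration rather than the released implementation.

```python
import torch
from torch import nn

class ResidualFiLMGenerator(nn.Module):
    def __init__(self, n_genes, n_cell_lines, n_drugs, emb_dim=128, hidden=256):
        super().__init__()
        self.cell_emb = nn.Embedding(n_cell_lines, emb_dim)
        self.drug_emb = nn.Embedding(n_drugs, emb_dim)
        # Assumed modulation network: concatenated embeddings -> (gamma, beta).
        self.film = nn.Linear(2 * emb_dim, 2 * hidden)
        self.project = nn.Linear(n_genes, hidden)
        self.block = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.out = nn.Linear(hidden, n_genes)

    def forward(self, x, cell_idx, drug_idx):
        cond = torch.cat([self.cell_emb(cell_idx), self.drug_emb(drug_idx)], dim=-1)
        gamma, beta = self.film(cond).chunk(2, dim=-1)
        h = self.project(x)
        h = gamma * h + beta       # FiLM applied after the input projection
        h = h + self.block(h)      # residual block with a skip connection
        delta_x = self.out(h)      # predicted transcriptional shift
        return x + delta_x         # predicted post-treatment expression

# Toy usage with hypothetical sizes.
torch.manual_seed(0)
gen = ResidualFiLMGenerator(n_genes=20, n_cell_lines=3, n_drugs=4)
x = torch.randn(5, 20)
cell_idx = torch.tensor([0, 1, 2, 0, 1])
drug_idx = torch.tensor([0, 1, 2, 3, 0])
x_hat = gen(x, cell_idx, drug_idx)
```

One consequence of the residual formulation is that a generator whose output layer is zero returns the baseline profile unchanged, so training only has to learn the drug-induced shift.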
4.6. Lightweight MLP Discriminator
The discriminator was formulated as a lightweight multilayer perceptron that classifies gene expression profiles as real or generated under a given condition. Each input consisted of a gene expression vector together with the embeddings of the corresponding cell line and drug.
The network consists of three fully connected layers with LeakyReLU activation functions and one output neuron, yielding a scalar value that indicates whether the input corresponds to experimentally measured drug-treated cell expression or was synthesized by our generative model. This simple design keeps the number of parameters low, thereby reducing the risk of overfitting and stabilizing training. Despite its simplicity, the discriminator identifies correlations in drug-treated cell expression and provides robust adversarial signals to the generator.
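A discriminator of this kind could be sketched in PyTorch as follows. The three fully connected layers, LeakyReLU activations, and single output neuron follow the text; the class name, the hidden widths, the LeakyReLU slope of 0.2, and the choice to concatenate the embeddings with the expression vector are assumptions for illustration.

```python
import torch
from torch import nn

class MLPDiscriminator(nn.Module):
    def __init__(self, n_genes, n_cell_lines, n_drugs, emb_dim=128, hidden=256):
        super().__init__()
        self.cell_emb = nn.Embedding(n_cell_lines, emb_dim)
        self.drug_emb = nn.Embedding(n_drugs, emb_dim)
        # Three fully connected layers; the last one has a single output neuron.
        self.net = nn.Sequential(
            nn.Linear(n_genes + 2 * emb_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden // 2), nn.LeakyReLU(0.2),
            nn.Linear(hidden // 2, 1),
        )

    def forward(self, x, cell_idx, drug_idx):
        # Assumed: conditioning by concatenating embeddings with the profile.
        h = torch.cat([x, self.cell_emb(cell_idx), self.drug_emb(drug_idx)], dim=-1)
        return self.net(h).squeeze(-1)  # raw logit: higher means "more likely real"

# Toy usage with hypothetical sizes.
torch.manual_seed(0)
disc = MLPDiscriminator(n_genes=20, n_cell_lines=3, n_drugs=4)
profiles = torch.randn(5, 20)
logits = disc(profiles, torch.tensor([0, 1, 2, 0, 1]), torch.tensor([0, 1, 2, 3, 0]))
```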
4.7. Adversarial Training Strategy
The conditional GAN was trained using an adversarial process in which the aim of the generator is to make the predicted post-treatment transcriptional profiles minimally different from the real ones, and the task of the discriminator is to maximize the discrepancy between them. Additionally, the generator employs an adversarial strategy coupled with a regression constraint on the noise term, ensuring that the output is not only plausible but also numerically close to the expected values. The specific definitions of the loss function terms are shown below:
where λ is a weighting hyperparameter that balances the adversarial and regression objectives.
The adversarial term encourages the generator to produce realistic drug response distributions, whereas the MSE term constrains the output to remain numerically consistent with ground-truth post-treatment gene expression.
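For orientation, one common way to write such a pair of objectives is given below. This is a sketch under assumptions, not necessarily the exact formulation used for GRIP-Lung: it assumes a binary cross-entropy (non-saturating) adversarial loss and assumes that λ multiplies the MSE term. Here x is the baseline profile, y the observed post-treatment profile, c the cell line and drug condition, and G and D the generator and discriminator; all of this notation is introduced for illustration.

```latex
\begin{aligned}
\mathcal{L}_D &= -\,\mathbb{E}_{(y,c)}\big[\log D(y \mid c)\big]
                 - \mathbb{E}_{(x,c)}\big[\log\big(1 - D(G(x \mid c) \mid c)\big)\big], \\
\mathcal{L}_{\mathrm{adv}} &= -\,\mathbb{E}_{(x,c)}\big[\log D(G(x \mid c) \mid c)\big], \\
\mathcal{L}_{\mathrm{MSE}} &= \mathbb{E}_{(x,y,c)}\big[\lVert G(x \mid c) - y \rVert_2^2\big], \\
\mathcal{L}_G &= \mathcal{L}_{\mathrm{adv}} + \lambda\,\mathcal{L}_{\mathrm{MSE}}.
\end{aligned}
```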
4.8. Training Procedure
The model was trained using the Adam optimizer with a learning rate of 0.001. During each epoch, the discriminator and generator were alternately updated: the discriminator was first optimized to distinguish real from generated samples, and the generator was then optimized to both fool the discriminator and minimize reconstruction error. The MSE between the predicted and actual expression profiles was used to monitor the regression accuracy of the generator output.
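The alternating update can be sketched in PyTorch as follows. Only the Adam optimizer, the learning rate of 0.001, the update order, and the MSE monitoring come from the text. The small unconditional networks stand in for the real generator and discriminator, and the binary cross-entropy adversarial loss, the value of `lam`, and the toy data are assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_genes, batch = 16, 8

# Small stand-in networks (hypothetical; conditioning omitted for brevity).
G = nn.Sequential(nn.Linear(n_genes, 32), nn.ReLU(), nn.Linear(32, n_genes))
D = nn.Sequential(nn.Linear(n_genes, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()
lam = 10.0  # hypothetical weight on the regression term

def train_step(x_pre, y_post):
    real = torch.ones(x_pre.size(0), 1)
    fake = torch.zeros(x_pre.size(0), 1)

    # 1) Discriminator update: separate real from generated profiles.
    y_hat = G(x_pre).detach()  # detach so this step does not update G
    loss_d = bce(D(y_post), real) + bce(D(y_hat), fake)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator update: fool the discriminator and stay close to the target.
    y_hat = G(x_pre)
    reg = mse(y_hat, y_post)
    loss_g = bce(D(y_hat), real) + lam * reg
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item(), reg.item()

# Toy paired data: the "treated" profile is a shifted copy of the baseline.
x_pre = torch.randn(batch, n_genes)
y_post = x_pre + 0.5
history = [train_step(x_pre, y_post) for _ in range(200)]
```

The third value returned by `train_step` is the MSE that would be tracked to monitor regression accuracy.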
4.9. Extension to Patient-Derived Transcriptomic Data
To evaluate the translational potential of our model, we applied a trained generator that was originally developed on lung cancer cell line data to bulk RNA-seq data derived from lung cancer patients. A publicly available dataset (GSE165019) was used as an external source for patient transcriptome profiles. Four samples whose clinical annotations included erlotinib treatment information were selected for analysis.
Patient gene expression data were preprocessed to match the input format of the generator, including gene filtering and normalization procedures identical to those applied to the training data. The untreated expression profile of the patient was mapped to the most similar cell line in the training dataset on the basis of expression similarity, and the corresponding cell embedding was used for prediction, while the drug embeddings were maintained as in the original model. To evaluate the robustness of the model, we compared the predicted expression profiles of post-treatment patients with their corresponding actual profiles. Expression correlation metrics were used to evaluate the consistency between the predicted responses and observed patterns.
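The mapping from a patient profile to its nearest training cell line can be sketched as below. The text does not state which similarity measure was used, so Pearson correlation is an assumption here, and the function name, cell line labels, and data are hypothetical.

```python
import numpy as np

def nearest_cell_line(patient, baselines):
    """Return the cell line whose baseline profile correlates best with the patient.

    `patient` is a 1D expression vector; `baselines` maps a cell line name to
    a 1D baseline vector over the same genes in the same order.
    """
    names = list(baselines)
    # Pearson correlation is an assumed similarity measure.
    corrs = [np.corrcoef(patient, baselines[name])[0, 1] for name in names]
    best = int(np.argmax(corrs))
    return names[best], float(corrs[best])

# Toy data: three hypothetical cell lines over 50 genes.
rng = np.random.default_rng(0)
baselines = {f"cell_line_{k}": rng.normal(size=50) for k in "ABC"}
patient = baselines["cell_line_B"] + 0.1 * rng.normal(size=50)

name, r = nearest_cell_line(patient, baselines)
```

The index of the selected cell line would then be used to look up the cell embedding for prediction, and the same correlation measure can serve to compare predicted and observed post-treatment profiles.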
4.10. ssGSEA-Based Approach for Identifying Drug Post-Treatment States
To study the biological effects of the post-treatment predicted expression profile, we constructed an encompassing set of representative biomarkers that capture various major cellular states we tested, such as programmed cell death (apoptosis, autophagy, and ferroptosis), senescence, cell-cycle arrest, drug resistance, immune escape, and malignant progression. The upregulated expression of all selected biomarkers was used as an indicator of the aforementioned cellular states. Notably, certain biological processes, such as autophagy and senescence, are highly context-dependent. In the present study, the upregulation of the expression of biomarkers was operationally interpreted as reflecting treatment-induced stress and growth suppression in vitro, rather than long-term tumor-promoting effects such as senescence-associated secretory phenotype (SASP)-mediated progression. We used the single-sample gene set enrichment analysis (ssGSEA) scoring method to classify predicted expression profiles on the basis of each biomarker signature to determine whether treatment-induced expression patterns, such as gene programs for programmed cell death, cell-cycle arrest, and senescence, represented an effective therapeutic response or states associated with ineffective therapy, such as drug resistance, immune escape, and tumor progression. The details of the gene sets representing different post-treatment response states are provided in
Supplementary Table S5. We evaluated the performance of ssGSEA using the logGI50 from the NCI-60 cell line panel as a measure of drug sensitivity and calculated effective scores as the mean z scores across the effective gene sets [34]. Ineffective scores were computed in the same way from the ineffective gene sets. For every sample, the resistance index (RI) was calculated as the difference between the two scores, indicating which group of programs was relatively more activated. Samples with an RI greater than zero were regarded as ineffective, and samples with an RI equal to or less than zero were regarded as effective; the two classes thus represent opposite directions of transcriptional program activity following drug treatment.
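The scoring rule can be illustrated with a short NumPy sketch. It assumes that ssGSEA scores per gene set have already been computed with an external tool, that z scores are taken across samples, and that RI is the ineffective score minus the effective score, a direction inferred from the rule that RI greater than zero means ineffective. The gene set keys and the toy scores are hypothetical.

```python
import numpy as np

EFFECTIVE = ["programmed_cell_death", "cell_cycle_arrest", "senescence"]
INEFFECTIVE = ["drug_resistance", "immune_escape", "malignant_progression"]

def resistance_index(scores):
    """Compute RI and a label per sample from precomputed ssGSEA scores.

    `scores` maps a gene set name to a 1D array of scores across samples.
    """
    z = {}
    for name, values in scores.items():
        values = np.asarray(values, dtype=float)
        sd = values.std()
        z[name] = (values - values.mean()) / (sd if sd > 0 else 1.0)
    effective = np.mean([z[name] for name in EFFECTIVE], axis=0)
    ineffective = np.mean([z[name] for name in INEFFECTIVE], axis=0)
    ri = ineffective - effective  # assumed direction: RI > 0 means ineffective
    labels = np.where(ri > 0, "ineffective", "effective")
    return ri, labels

# Toy ssGSEA scores for three samples (hypothetical values).
scores = {
    "programmed_cell_death": [0.9, 0.6, 0.1],
    "cell_cycle_arrest": [0.8, 0.6, 0.2],
    "senescence": [0.7, 0.55, 0.3],
    "drug_resistance": [0.1, 0.4, 0.9],
    "immune_escape": [0.2, 0.4, 0.8],
    "malignant_progression": [0.3, 0.45, 0.7],
}
ri, labels = resistance_index(scores)
```

In this toy example the first two samples score higher on the effective programs and the third on the ineffective programs, so only the third receives a positive RI.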