Advances in Personalized Cancer Vaccine Development: AI Applications from Neoantigen Discovery to mRNA Formulation

Kong, Hyunseung

doi:10.3390/biochem5020005

Open Access Review

by

Hyunseung Kong

Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea

BioChem 2025, 5(2), 5; https://doi.org/10.3390/biochem5020005

Submission received: 25 February 2025 / Revised: 25 March 2025 / Accepted: 28 March 2025 / Published: 31 March 2025

(This article belongs to the Special Issue Feature Papers in BioChem, 2nd Edition)

Abstract

Personalized cancer vaccines are a promising immunotherapy targeting patient-specific tumor neoantigens, yet their design and efficacy remain challenging. Recent advances in artificial intelligence (AI) provide powerful tools to enhance multiple stages of cancer vaccine development. This review systematically evaluates AI applications in personalized cancer vaccine research over the past five years, focusing on four key areas: neoantigen discovery, codon optimization, untranslated region (UTR) sequence generation, and mRNA vaccine design. We examine AI model architectures (e.g., neural networks), datasets (from omics to high-throughput assays), and outcomes in improving vaccine development. In neoantigen discovery, machine learning and deep learning models integrate peptide–MHC binding, antigen processing, and T cell receptor recognition to enhance immunogenic neoantigen identification. For sequence optimization, deep learning models for codon and UTR design improve protein expression and mRNA stability beyond traditional methods. AI-driven strategies also optimize mRNA vaccine constructs and formulations, including secondary structures and nanoparticle delivery systems. We discuss how these AI approaches converge to streamline effective personalized vaccine development, while addressing challenges such as data scarcity, tumor heterogeneity, and model interpretability. By leveraging AI innovations, the future of personalized cancer immunotherapy may see unprecedented improvements in both design efficiency and clinical effectiveness.

Keywords:

personalized cancer vaccines; neoantigen prediction; codon optimization; 5′ UTR; mRNA design; deep learning; immunotherapy

1. Introduction

Cancer immunotherapy has been revolutionized by approaches that direct the immune system to target tumors. Personalized cancer vaccines, which stimulate an immune attack against patient-specific tumor antigens, exemplify this paradigm shift. Unlike prophylactic cancer vaccines targeting viral oncogenes, personalized therapeutic vaccines aim to induce T cells against neoantigens, which are novel peptides arising from somatic mutations in the tumor [1]. Two seminal 2017 studies demonstrated the feasibility and immunogenicity of personalized neoantigen vaccines in melanoma patients [2,3]. In these trials, vaccines composed of patient-specific mutant peptides or mRNAs elicited T cell responses against multiple neoantigens without serious toxicity. These results established a proof of concept that vaccinating against neoantigens can induce antitumor immunity in humans.

Despite this promise, designing an effective personalized vaccine is complex. Each patient’s tumor presents a unique set of mutations and human leukocyte antigen (HLA) types, requiring identification of suitable neoantigen targets [4]. The peptide–MHC binding process is a critical first step in this cascade, where tumor-derived peptides must fit precisely into the binding groove of the patient’s specific HLA molecules, forming a stable complex that can be recognized by T cells. This interaction is governed by anchor residues within the peptide sequence that determine binding affinity and stability, directly influencing immunogenicity. T cell recognition then occurs through the T cell receptor (TCR), which engages with the peptide–MHC complex in a highly specific manner, with the CDR3 loops of the TCR making direct contact with the presented neoantigen peptide, enabling discrimination between self and tumor-specific antigens.

Once neoantigens are identified, vaccine constructs (such as long peptide pools or mRNA encoding multiple epitopes) must be engineered for high expression and immunogenicity [5]. The typical pipeline involves neoantigen discovery by sequencing the tumor to find mutations, predicting which mutant peptides bind strongly to the patient’s HLA molecules, and assessing their likelihood of being immunogenic. Vaccine sequence optimization follows, particularly for DNA or mRNA vaccines, optimizing the coding sequence for efficient production of the neoantigen peptides, including codon optimization to maximize protein expression in the chosen system [6] and designing regulatory regions such as 5′ and 3′ UTRs for optimal translation and stability. The engineering of these untranslated regions is particularly important for mRNA vaccines, as they contain elements that interact with cellular RNA-binding proteins to modulate mRNA stability, localization, and translation efficiency, with optimal 5′ UTRs enhancing ribosome loading and 3′ UTRs controlling poly(A) tail length and protection from exonucleases.

Lastly, mRNA vaccine design ensures the mRNA molecule has an appropriate structure (cap, UTRs, coding region, poly(A) tail) and includes the design of delivery mechanisms (e.g., lipid nanoparticles) to efficiently elicit an immune response [7]. The self-assembly principles of lipid nanoparticles involve precise formulation of ionizable cationic lipids, helper lipids, cholesterol, and PEG lipids at specific molar ratios. These components spontaneously form bilayer structures in aqueous environments, with the ionizable lipids becoming positively charged at acidic pH to complex with negatively charged mRNA. The resulting particles protect the mRNA from degradation while facilitating cellular uptake via endocytosis and subsequent endosomal escape through pH-dependent membrane disruption, allowing the mRNA to reach ribosomes in the cytoplasm.

Personalized cancer vaccines rely on multiple stages of optimization, from neoantigen selection to mRNA sequence engineering and delivery formulation (Figure 1). Each step plays a crucial role in ensuring effective antigen presentation and immune activation, and recent AI-driven approaches have significantly improved these processes, enhancing both vaccine efficacy and efficiency.

Traditional neoantigen prediction pipelines often produce many candidates, but only a small fraction (~2%) prove to elicit T cells experimentally [8]. Similarly, maximizing protein yield from an mRNA vaccine through sequence changes is a high-dimensional problem, as there is an enormous number of possible synonymous coding sequences and UTR variants for any given peptide [9]. AI techniques have rapidly become indispensable in tackling these complexities [10].
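To give a sense of the scale of this search space, the following minimal Python sketch counts the synonymous coding sequences for a peptide by multiplying the number of codons available to each residue in the standard genetic code. The example peptide (SIINFEKL, a widely used model epitope) and the function name are illustrative choices, not taken from the studies reviewed here.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(peptide: str) -> int:
    """Count the distinct coding sequences that encode `peptide`."""
    return prod(CODON_DEGENERACY[aa] for aa in peptide.upper())


# Illustrative 8-residue peptide: 6 * 3 * 3 * 2 * 2 * 2 * 2 * 6 encodings.
print(count_synonymous_sequences("SIINFEKL"))  # 5184
```

Even this 8-residue example admits thousands of encodings, and the count grows multiplicatively with length, which is why exhaustive testing is impractical for multi-epitope constructs.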

Over the past five years, researchers have developed a range of AI-driven tools to enhance personalized cancer vaccine development [11]. These include deep learning models such as convolutional neural networks, recurrent networks, and transformers for predicting peptide–MHC binding and T cell recognition, generative models for creating optimized genetic sequences, and multitask frameworks that integrate diverse biological features [12]. Early results indicate that such models can substantially improve both the accuracy of neoantigen identification and the yield of vaccine antigens relative to conventional methods [13].

This review provides a comprehensive overview of these developments. We focus on four major aspects where AI is applied: neoantigen discovery (identifying and prioritizing tumor-specific antigen targets); codon optimization (modifying the coding sequence for improved expression); UTR sequence generation (designing 5′ and 3′ UTRs to enhance translation and stability of mRNA vaccines); and mRNA vaccine design (improving vaccine construct design and formulation). Recent studies in these areas are summarized in Table 1, highlighting key advancements and their implications for AI-driven mRNA vaccine development.

2. AI in Neoantigen Discovery

Identifying suitable neoantigens is the first and arguably most critical step in personalized cancer vaccine design [26]. Neoantigens result from non-synonymous mutations (or other alterations such as fusions) that create novel peptides not present in normal human proteins [27]. When a neoantigen peptide is presented on the cell surface by major histocompatibility complex (MHC) molecules, it can be recognized as “non-self” by T cells, triggering an immune attack on the tumor cell [28]. However, not all tumor mutations produce immunogenic neoantigens [29]. The challenge is to predict which mutated peptides will bind to the patient’s HLA alleles and elicit a T cell response [29].

Classical in silico pipelines use sequential filters: identify tumor-specific mutations from exome sequencing, predict peptide–MHC binding (often with tools such as NetMHC), filter for those with sufficient tumor expression and processing, and sometimes predict T cell recognition or immunogenicity [30]. Many pipeline implementations exist (e.g., pVACseq, MuPeXI, Vaxrank), typically relying on motif-based or machine learning predictors for MHC binding and applying heuristic filters for the expression and antigen processing steps [31]. While these pipelines can shortlist candidate neoantigens, their precision is limited: only about 1–5% of the predicted high-affinity binders tend to be true immunogenic neoantigens in practice [32]. This low precision yields many false positives, which require labor-intensive experimental validation and increase the risk of missing truly effective targets [33].
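The sequential-filter logic of such pipelines can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of pVACseq, MuPeXI, or Vaxrank: the candidate peptides, their scores, and the three thresholds (including the commonly quoted 500 nM binding cutoff) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    hla: str
    ic50_nm: float           # predicted binding affinity (lower = stronger)
    expression_tpm: float    # tumor RNA expression of the source gene
    processing_score: float  # predicted antigen-processing likelihood, 0..1


# Hypothetical cutoffs for illustration only.
MAX_IC50_NM = 500.0
MIN_TPM = 1.0
MIN_PROCESSING = 0.5


def shortlist(candidates):
    """Apply binding, expression, and processing filters in sequence."""
    kept = [c for c in candidates if c.ic50_nm <= MAX_IC50_NM]
    kept = [c for c in kept if c.expression_tpm >= MIN_TPM]
    kept = [c for c in kept if c.processing_score >= MIN_PROCESSING]
    # Rank survivors by predicted affinity, strongest binder first.
    return sorted(kept, key=lambda c: c.ic50_nm)


# Made-up candidates; none of these peptides come from a real dataset.
CANDIDATES = [
    Candidate("KLDETGNSL", "HLA-A*02:01", 35.0, 12.4, 0.81),
    Candidate("AMDKTPLSV", "HLA-A*02:01", 820.0, 40.0, 0.90),  # weak binder
    Candidate("RVNELQHTY", "HLA-A*02:01", 120.0, 0.2, 0.70),   # low expression
    Candidate("TLPDQRMAV", "HLA-A*02:01", 410.0, 5.0, 0.30),   # poor processing
    Candidate("SLFGQWAEV", "HLA-A*02:01", 15.0, 3.0, 0.60),
]

for c in shortlist(CANDIDATES):
    print(c.peptide, c.ic50_nm)
```

Because each filter is a hard cutoff applied independently, a peptide that narrowly fails one criterion is discarded regardless of its other properties, which is one reason learned models that weigh these features jointly are attractive.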

2.1. Enhancing Neoantigen Prediction with Machine Learning

AI has been introduced into neoantigen discovery to improve each stage of the prediction pipeline. One major focus has been on improving peptide–MHC binding prediction, as peptide immunogenicity is contingent on stable MHC presentation. Earlier algorithms (e.g., NetMHCpan) were already machine learning-based, using neural networks trained on binding affinity datasets, and were fairly accurate for many HLA alleles [34]. Recent advances use deep learning to capture more complex sequence patterns or even structural features [35].

For example, NeoaPred is a deep learning framework that constructs peptide–HLA class I complex structures in silico and evaluates binding and surface features to predict immunogenicity [36]. NeoaPred achieved approximately 82% accuracy in modeling peptide–MHC complex structures and used those structures to improve the identification of peptides likely to be immunogenic. Such structure-based approaches go beyond sequence motifs by considering how a peptide sits in the MHC binding groove and is exposed for T cell recognition. Similarly, DeepHLApan (2020) and other CNN/LSTM-based predictors incorporate flanking residues or the entire proteasomal processing context to better predict which peptides will be generated and presented by MHC [37]. These deep models, trained on large datasets of peptide–MHC interactions (including mass spectrometry data of naturally presented peptides), often outperform older tools in binding prediction benchmarks [38].

Another critical area is predicting T cell recognition of peptide–MHC complexes. Even a peptide that binds to MHC well may not trigger a T cell response if the T cell repertoire does not recognize it or if it resembles self-peptides. AI models bridging peptides and T cell receptors (TCRs) are being developed for this purpose. For instance, Springer et al. developed ERGO, an LSTM recurrent neural network that predicts whether a given TCR sequence will bind a given peptide–MHC complex [14]. ERGO was trained on a large dataset of known TCR–peptide pairs and outputs a binding probability score.

Building on that, NetTCR-2.0 (2021) [15] incorporated both TCR α- and β-chain sequences, using a convolutional neural network architecture. In this model, the amino acid sequences of the peptide and complementarity-determining regions (CDR3) of TCRα and TCRβ were one-hot encoded and passed through CNN and pooling layers, then merged in fully connected layers. NetTCR-2.0 achieved high specificity in identifying true peptide–TCR pairs, improving accuracy over previous models.
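The encoding and pooling steps described above can be illustrated with a dependency-free Python sketch. It is not the NetTCR-2.0 code: the kernels are random and untrained, a single kernel set is shared across the three inputs for brevity, the padding lengths (9 and 30) are assumptions, and the final fully connected layers are omitted. The two CDR3 sequences are used purely as example inputs.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq, max_len):
    """Encode seq as max_len rows of 20 indicators, zero-padded on the right."""
    rows = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        rows[pos][AA_INDEX[aa]] = 1.0
    return rows


def random_kernels(n_kernels, width, rng):
    """Untrained convolution kernels of shape (width, 20), for illustration."""
    return [[[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS]
             for _ in range(width)] for _ in range(n_kernels)]


def conv_relu_global_max(encoded, kernels):
    """Slide each kernel along the sequence, apply ReLU, keep the maximum."""
    pooled = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # starting at 0 makes the maximum equivalent to ReLU + max
        for start in range(len(encoded) - width + 1):
            activation = sum(kernel[k][j] * encoded[start + k][j]
                             for k in range(width)
                             for j in range(len(AMINO_ACIDS)))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def merged_features(peptide, cdr3a, cdr3b, kernels):
    """Encode the three inputs separately and concatenate pooled features."""
    merged = []
    for seq, max_len in ((peptide, 9), (cdr3a, 30), (cdr3b, 30)):
        merged.extend(conv_relu_global_max(one_hot(seq, max_len), kernels))
    return merged


KERNELS = random_kernels(n_kernels=4, width=3, rng=random.Random(0))
FEATURES = merged_features("GILGFVFTL", "CAVSGGYQKVTF", "CASSIRSSYEQYF", KERNELS)
print(len(FEATURES))  # 12 = 3 inputs x 4 kernels
```

In a trained model, the concatenated vector would feed the fully connected layers that output a binding score; global max pooling is what allows CDR3 loops of different lengths to map to a fixed-size representation.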

A notable advancement came with pMTnet [16], a transfer learning-based model that first learns numerical embeddings of TCRs and peptide–MHC complexes from large general-purpose datasets, then trains a final predictive network on the known TCR–pMHC pairs. This strategy allows it to transfer knowledge from extensively studied viral epitopes to predict TCR binding of tumor-specific neoantigens.

Another AI approach to predicting peptide immunogenicity incorporates multiple features. Certain sequence properties correlate with immunogenicity: for instance, immunogenic epitopes often have an intermediate length, certain hydrophobicity patterns, and minimal similarity to self-proteins [39]. AI models, including deep neural networks and gradient-boosted classifiers, integrate these features to improve immunogenicity prediction. Zhang et al. (2022) introduced an integrated pipeline where separate deep models predict MHC binding, antigen processing, and immunogenicity, then combine these outputs to rank neoantigens [20].

Kai Xin et al. (2024) developed NUCC, a multi-branch deep neural network integrating multiple biological features to predict neoantigen immunogenicity [40]. NUCC takes multiple inputs for each peptide, including its sequence, the patient’s HLA allele, and numeric features such as predicted presentation probability and peptide–MHC binding stability. NUCC outperformed NetMHCpan and identified experimentally validated neoantigens in a gastric cancer dataset.

2.2. Improved Pipelines and Web Services

To facilitate the clinical application of AI-driven neoantigen prediction, models are being integrated into user-friendly pipelines and web platforms. DeepNeo is a web server that employs deep learning to predict neoantigen immunogenicity based on structural and sequence properties [41]. Similarly, pVACtools offers a framework that integrates multiple prediction methods, expression filtering, and optional experimental validation steps to generate a ranked list of candidate neoantigens. Another notable tool, DeepVACPred, utilizes an autoencoder-based deep learning model to predict and design optimal epitope sets for vaccines [42]. The implementation of these AI-driven tools has led to improved outcomes in recent clinical trials of personalized neoantigen vaccines, demonstrating higher validation rates of predictions [43]. This advancement reduces wasted resources on non-immunogenic peptides and accelerates the transition from sequencing to vaccine formulation, ultimately enhancing the efficiency of personalized cancer vaccine development.

3. AI in Codon Optimization for Vaccine Antigens

Once neoantigen peptides are selected, the next step—particularly for DNA or mRNA-based vaccines—is to encode them in a genetic sequence that can be efficiently translated into proteins. The efficacy of personalized cancer vaccines largely depends on how well the selected antigens are expressed and presented by antigen-presenting cells. Codon optimization plays a crucial role in ensuring high expression levels of vaccine antigens by modifying the coding sequence without altering the amino acid sequence.

Different organisms prefer different synonymous codons due to variations in tRNA abundances and regulatory mechanisms. For instance, a human gene may not be optimally expressed in E. coli due to differences in codon usage biases. Even within human cells, certain codon choices can influence translation speed, mRNA stability, and protein folding efficiency. Kudla et al. demonstrated that synonymous variants of GFP showed up to 250-fold differences in expression levels in E. coli, primarily due to mRNA secondary structure differences near the translation start site [44]. The MDR1 gene provides another compelling example, where a synonymous C3435T mutation alters cotranslational folding dynamics, resulting in a protein with an identical amino acid sequence but modified substrate specificity. Similarly, studies have shown that a “slow ramp” of rare codons at the beginning of genes can enhance overall translation by preventing ribosomal traffic jams, while the strategic reuse of the same tRNA through codon autocorrelation can increase translation efficiency.

Traditional codon optimization methods rely on heuristics such as the codon adaptation index (CAI), GC content balancing, or elimination of problematic motifs (e.g., RNA secondary structures, cryptic splicing signals) [45]. However, these approaches may overlook complex, nonlinear interactions between codon usage and translation efficiency. AI-driven codon optimization aims to overcome these limitations by learning optimal codon usage patterns from large biological datasets and predicting the best synonymous codon substitutions for a given antigen.
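As a concrete reference point for the heuristic baseline, the codon adaptation index can be computed as the geometric mean of per-codon relative adaptiveness weights, where each weight is a codon's reference frequency divided by that of the most frequent synonymous codon. The Python sketch below assumes a toy reference table covering three amino acids; the counts are invented for illustration and do not represent real human or E. coli codon usage.

```python
from math import exp, log

# Hypothetical codon counts in a reference set of highly expressed genes.
REFERENCE_COUNTS = {
    "L": {"CTG": 40, "CTC": 20, "TTG": 13, "CTT": 13, "TTA": 7, "CTA": 7},
    "K": {"AAG": 60, "AAA": 40},
    "E": {"GAG": 58, "GAA": 42},
}


def relative_adaptiveness(reference):
    """Weight of each codon = its count / count of the best synonymous codon."""
    weights = {}
    for counts in reference.values():
        top = max(counts.values())
        for codon, n in counts.items():
            weights[codon] = n / top
    return weights


def cai(cds, weights):
    """Geometric mean of codon weights; codons absent from the table are skipped."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no codons with a reference weight")
    return exp(sum(log(w) for w in scored) / len(scored))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS)
best = cai("CTGAAGGAG", WEIGHTS)   # Leu-Lys-Glu using the most frequent codons
worst = cai("TTAAAAGAA", WEIGHTS)  # same peptide using rarer codons
print(best, worst)
```

Both sequences encode the same tripeptide, yet the first scores 1.0 and the second scores well below it. Because CAI treats every codon independently, it cannot capture the context effects (start-site structure, codon ramps, tRNA reuse) described above, which is the gap that learned models aim to close.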

3.1. Deep Learning Models for Codon Optimization

One of the pioneering AI approaches in codon optimization was introduced by Fu et al. (2020), who developed a BiLSTM-CRF (bidirectional LSTM with a conditional random field) model for selecting optimal codons [17]. This model leverages a concept called “codon boxes,” which groups codons by their nucleotide composition rather than by their sequence order. By training on highly expressed genes, the BiLSTM-CRF model captured preferred codon patterns that yield enhanced protein production. When experimentally validated, codon sequences optimized by this model resulted in significantly higher protein expression compared to sequences optimized by traditional rule-based methods.

Another major advancement is the CO-BERT model developed by Absci in 2024, which applies a transformer-based language model to codon selection [28]. Inspired by natural language processing (NLP), CO-BERT treats codons as tokens and learns context-aware representations of coding sequences. This approach enables the model to predict the most likely (or optimal) codon in a given genetic context. By training on large-scale genomic datasets, CO-BERT can suggest codon substitutions that maximize protein expression in specific cellular environments, automating what was traditionally an experimental trial-and-error process.

Similarly, CodonBERT, an AI model introduced by Li et al. (2024), was trained on over 10 million mRNA sequences from diverse organisms to learn codon-level sequence preferences [19]. Unlike traditional optimization strategies that optimize codons in isolation, CodonBERT captures long-range dependencies between codons, ensuring that synonymous substitutions do not disrupt local regulatory motifs or RNA secondary structures. The model was tested on vaccine-relevant antigen genes, where its optimized sequences led to increased protein yields compared to industry-standard codon optimization methods.

In addition to supervised learning models, generative models have also been explored for codon optimization. RiboCode, a deep generative optimization framework, integrates ribosome profiling data to design codon sequences that enhance ribosome loading and elongation efficiency [46]. This model uses reinforcement learning to iteratively improve codon sequences based on an objective function that maximizes translation efficiency while minimizing structural constraints. Early results suggest that RiboCode-designed sequences outperform heuristic-based optimization in driving higher protein production.

3.2. Experimental Validation of AI-Optimized Codon Sequences

The success of AI-driven codon optimization has been demonstrated across multiple experimental studies. Fu et al. (2020) tested AI-optimized versions of several genes, including a malaria vaccine antigen (FALVAC-1) and the human phosphatase PTP4A3, in E. coli expression systems. Compared to native sequences and traditionally optimized counterparts, the AI-optimized genes consistently exhibited higher protein expression levels [17]. Similarly, CodonBERT-optimized sequences of influenza vaccine antigens resulted in higher antigen yields in mammalian cell systems, supporting their potential use in vaccine manufacturing.

These improvements in protein expression have direct implications for personalized cancer vaccines. AI-guided codon optimization can ensure that mRNA vaccine constructs encoding patient-specific neoantigens produce robust antigen levels in dendritic cells, leading to stronger immune activation. This is particularly relevant when designing multivalent mRNA vaccines, where ensuring balanced expression of multiple antigens is crucial. AI models can optimize codon choices in a way that prevents translation competition between different antigenic segments, thereby ensuring each neoantigen is adequately produced.

4. AI in UTR Sequence Generation and mRNA Design

Messenger RNA (mRNA) stability and translational efficiency are critical factors in determining the potency of an mRNA-based cancer vaccine. While codon optimization enhances the efficiency of the coding sequence, additional regulatory elements, such as untranslated regions (UTRs), significantly influence the overall expression of vaccine antigens. The 5′ UTR plays a crucial role in translation initiation by regulating ribosome scanning and start codon recognition [47], while the 3′ UTR affects mRNA stability, localization, and translation efficiency through interactions with RNA-binding proteins and microRNAs [48]. Optimizing these regulatory regions has traditionally been based on the empirical selection of UTRs from highly expressed genes. However, AI-driven approaches now allow for de novo UTR design, creating synthetic sequences with optimized translation and stability characteristics.

4.1. AI-Driven 5′ UTR Optimization

The 5′ UTR influences how efficiently ribosomes initiate translation, particularly through its nucleotide composition, secondary structure, and upstream open reading frames (uORFs). AI models have been developed to design 5′ UTRs that enhance translation initiation while avoiding inhibitory structures.
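The sequence features named above can be extracted with simple string operations, as in the following Python sketch. It reports GC content, upstream AUG positions, and upstream open reading frames that terminate within the UTR. The example sequence and the returned field names are hypothetical, and secondary structure is not modeled here because it would require a dedicated RNA folding tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def utr5_features(utr):
    """Return simple sequence features of a 5' UTR (RNA or DNA alphabet)."""
    seq = utr.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty UTR")
    gc_content = sum(base in "GC" for base in seq) / len(seq)
    # Upstream start codons (0-based positions within the UTR).
    uaug = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    # A uORF is an upstream AUG followed by an in-frame stop inside the UTR.
    uorfs = []
    for start in uaug:
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))  # half-open interval
                break
    # Upstream AUGs in frame with the main start codon that follows the UTR.
    in_frame = [i for i in uaug if (len(seq) - i) % 3 == 0]
    return {
        "length": len(seq),
        "gc_content": gc_content,
        "uaug_positions": uaug,
        "uorfs": uorfs,
        "in_frame_uaug": in_frame,
    }


# Hypothetical 20-nt UTR containing one short uORF (AUG CCC UAA).
FEATURES = utr5_features("GGGACAUGCCCUAAGCCACC")
print(FEATURES)
```

Hand-crafted features of this kind are what earlier rule-based designs relied on; the deep models discussed below instead learn directly from the raw sequence, although such features remain useful for sanity-checking generated UTRs.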

One of the most prominent AI approaches is Smart5UTR, a deep generative model introduced by Tang et al. (2024) that designs synthetic 5′ UTRs optimized for translation efficiency in mRNA vaccines [49]. The model was trained on a massively parallel reporter assay (MPRA) dataset containing over 200,000 randomized 5′ UTR sequences, each experimentally tested for translation efficiency. Smart5UTR employs a multitask autoencoder with a convolutional neural network (CNN) encoder, allowing it to learn hidden sequence patterns that contribute to high translation initiation. When applied to vaccine mRNAs encoding the SARS-CoV-2 spike protein, Smart5UTR-generated UTRs significantly improved antigen expression in mice compared to natural UTRs from housekeeping genes. In vivo validation further demonstrated that Smart5UTR-designed 5′ UTRs enhanced antibody titers against SARS-CoV-2 variants by up to 120-fold compared with conventional designs, while maintaining excellent safety profiles.

Another AI-driven approach, developed by Castillo-Hair et al. (2024), focused on optimizing 5′ UTRs for gene-editing enzymes delivered as mRNA [21]. They performed polysome profiling on large libraries of randomized 5′ UTRs to measure translation efficiency across different cell types. A deep learning regression model was then trained on these data to predict which UTR sequences would maximize ribosome loading. When applied to megaTAL nucleases (a class of gene-editing proteins), the AI-designed UTRs significantly enhanced the editing activity in human cells.

Chu et al. (2024) developed UTR-LM, a pretrained transformer language model that learns sequence representations of endogenous 5′ UTRs from diverse species [22]. This unsupervised model was fine-tuned to predict translation initiation efficiency (TIE) based on sequence features. UTR-LM successfully identified previously unknown synthetic UTR sequences that outperformed natural UTRs in driving protein expression. The ability of UTR-LM to generalize across species suggests it may be useful for optimizing vaccine mRNAs across different expression systems.

4.2. AI-Driven 3′ UTR Optimization

The 3′ UTR primarily influences mRNA stability, degradation, and localization. While many vaccines borrow 3′ UTRs from highly stable endogenous transcripts (such as α-globin), AI-driven approaches can design synthetic 3′ UTRs with improved stability and translational properties.

Morrow et al. (2024) developed a machine learning model for 3′ UTR design using high-throughput RNA stability assays [50]. Their dataset included thousands of synthetic 3′ UTR sequences, each measured for mRNA half-life in human cells. The model, trained on these data, predicted mRNA stability based on sequence features and identified de novo 3′ UTR designs that extended mRNA half-life beyond the commonly used endogenous UTRs. When experimentally tested, AI-optimized 3′ UTRs led to prolonged antigen expression in mammalian cells, a crucial advantage for cancer vaccines requiring sustained immune stimulation.

Similarly, Liu et al. (2024) introduced LinearDesign2, an AI framework that co-optimizes both the 5′ UTR and the coding sequence to enhance translation initiation and stability simultaneously [51]. Traditional UTR and codon optimization methods have often been applied separately, but LinearDesign2 integrates them into a multi-objective optimization pipeline, balancing translation efficiency with mRNA folding stability. Their computational analysis suggested that jointly optimizing the 5′ UTR and coding sequence leads to greater protein expression gains than optimizing either component alone.

4.3. Implications for mRNA Vaccine Design

AI-driven UTR and mRNA design has several significant implications for personalized cancer vaccines. One major advantage is the increased vaccine potency, as AI-optimized UTRs can double or even triple antigen expression, enhancing vaccine immunogenicity without requiring a higher mRNA dose. This improvement can lead to more efficient T cell priming while reducing the necessary vaccine dosage. Additionally, AI enables circumvention of biological constraints, addressing challenges where certain neoantigen sequences are difficult to express due to RNA secondary structures or cryptic motifs. By designing custom 5′ and 3′ UTRs, AI ensures robust antigen production despite these inherent limitations.

Another critical aspect is context-specific optimization, where AI models tailor UTRs to different cell types or conditions. For instance, a UTR optimized for dendritic cell translation—where vaccine antigens undergo immune processing—may differ significantly from one designed for general mammalian cells [52]. Furthermore, AI-driven approaches facilitate the integration of codon optimization, allowing simultaneous optimization of coding regions, UTRs, and secondary structures. This comprehensive strategy represents the next frontier in computational vaccine design, potentially yielding even greater improvements in mRNA vaccine performance.

5. AI in mRNA Vaccine Formulation and Delivery

Optimizing the sequence of mRNA vaccines is crucial, but ensuring effective delivery and stability within the human body is equally important. In particular, mRNA vaccines must be efficiently translated into proteins inside cells while avoiding premature degradation by extracellular RNases and immune system overactivation. Lipid nanoparticles (LNPs) are the most widely used delivery system for mRNA vaccines, as they protect mRNA and facilitate its uptake by cells [53]. Additionally, immune-stimulatory adjuvants are often included to enhance vaccine efficacy [54]. AI is playing an increasing role in optimizing both mRNA delivery vehicles and adjuvant formulations to maximize vaccine performance.

5.1. AI-Guided Lipid Nanoparticle (LNP) Optimization

LNPs are composed of ionizable lipids, phospholipids, cholesterol, and polyethylene glycol (PEG) lipids [55]. The specific composition and ratio of these components influence mRNA encapsulation efficiency, stability, and cellular uptake. Traditionally, LNP formulations have been developed through trial and error, but AI is now accelerating the discovery of optimized nanoparticle formulations.

Mekki-Berrada et al. (2021) developed a two-step machine learning approach to optimize polymeric nanoparticles for mRNA delivery [24]. First, they applied unsupervised clustering on a large combinatorial dataset of nanoparticle formulations to identify promising formulation families. Then, they trained a supervised regression model to predict mRNA delivery efficiency based on nanoparticle composition. This approach significantly reduced the experimental burden, enabling the identification of highly effective nanoparticle formulations with improved cellular uptake and transfection efficiency.

Similarly, AGILE—a deep learning-powered platform introduced in 2024—leverages a graph neural network in combination with high-throughput combinatorial lipid synthesis to predict the mRNA transfection potency of lipid nanoparticle (LNP) formulations [56]. Trained on thousands of LNP compositions evaluated for stability, delivery efficiency, and immune activation, AGILE accurately identifies optimal ionizable lipid combinations for mRNA vaccine delivery, outperforming traditional heuristic-based screening methods.

Another AI-driven approach, TransLNP, is a transformer-based model designed to accelerate the screening of ionizable lipid nanoparticles for mRNA delivery [57]. By integrating coarse-grained atomic sequence information with fine-grained spatial relationships, it captures the key structure–efficiency correlations. Utilizing pretraining and a BalMol block for label and feature smoothing, TransLNP effectively overcomes data scarcity and imbalance, while also identifying transfection cliffs where similar structures exhibit drastically different efficiencies.
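The notion of a transfection cliff can be made concrete with a small Python sketch that flags pairs of lipids whose structures are similar but whose measured efficiencies differ sharply. This is not the TransLNP method: the substructure feature sets, efficiency values, and both thresholds are hypothetical, and Tanimoto similarity on feature sets stands in for the learned molecular representation.

```python
from itertools import combinations


def tanimoto(a, b):
    """Jaccard/Tanimoto similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_cliffs(lipids, min_similarity=0.7, min_gap=0.5):
    """Return pairs that are structurally similar yet differ in efficiency."""
    cliffs = []
    for (name_a, (feat_a, eff_a)), (name_b, (feat_b, eff_b)) in combinations(
            lipids.items(), 2):
        similarity = tanimoto(feat_a, feat_b)
        gap = abs(eff_a - eff_b)
        if similarity >= min_similarity and gap >= min_gap:
            cliffs.append((name_a, name_b, similarity, gap))
    return cliffs


# Hypothetical ionizable lipids: (substructure features, normalized efficiency).
LIPIDS = {
    "lipid_A": ({"tertiary_amine", "ester", "C12_tail", "branched"}, 0.90),
    "lipid_B": ({"tertiary_amine", "ester", "C12_tail", "branched",
                 "hydroxyl"}, 0.20),
    "lipid_C": ({"tertiary_amine", "amide", "C18_tail"}, 0.85),
}

for name_a, name_b, similarity, gap in find_cliffs(LIPIDS):
    print(name_a, name_b, round(similarity, 2), round(gap, 2))
```

In this toy set only lipid_A and lipid_B form a cliff, since they share four of five features but differ widely in efficiency. Pairs like this are difficult for models that assume smooth structure-activity relationships, which is why identifying them explicitly is useful.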

5.2. AI in Adjuvant and Immune-Stimulatory Element Design

In addition to delivery optimization, AI is being used to design novel adjuvants and immune-modulating elements that enhance vaccine efficacy [58]. Adjuvants help stimulate the immune system and ensure a robust response to the vaccine antigen. Traditional adjuvants include aluminum salts, toll-like receptor (TLR) agonists, and cytokine-based stimulants. However, the discovery of novel adjuvant molecules with improved safety and efficacy profiles is a growing area of AI-driven research.

Chaudhury et al. (2018) showed that distinct adjuvant formulations produce unique immune signatures in non-human primates [59]. By integrating serological, cellular, and cytokine data from multiple compartments and applying machine learning techniques (such as random forest models), the study predicted the adjuvant condition from the immune profile with up to 92% accuracy, highlighting a quantitative approach for optimal vaccine–adjuvant pairing.

In another study, computer-aided design and machine learning were used to develop a library of 46 gold nanoparticle (AuNP)-based adjuvants bearing TLR-targeting ligands, leading to the identification of AuNP27 and AuNP35 as potent dendritic cell activators. These nano-adjuvants enhance antigen presentation and T cell responses via multiple TLR pathways, thereby improving antitumor immunity in preclinical models [60].

5.3. AI in Personalized mRNA Vaccine Formulation

AI is not only optimizing general vaccine delivery strategies, but also enabling personalized vaccine formulation. Each patient’s immune system and tumor microenvironment are unique, meaning a one-size-fits-all vaccine formulation may not be optimal. AI models integrating genomic, proteomic, and immunological data can help tailor vaccine formulations to individual patients [61].

For example, AI frameworks that analyze patient-specific immune signatures could predict which LNP formulation is likely to work best for a given individual. Patients with high levels of inflammation might benefit from a formulation with lower reactogenicity, while patients with immunosuppressive tumor environments may require stronger adjuvants. Because such predictions can be generated quickly, they could guide formulation choices within the time frame of clinical decision-making.

Moreover, AI-powered models can predict how a patient’s immune system will respond to different adjuvant combinations, allowing clinicians to select the most suitable formulation for each patient. This is particularly important for cancer patients, in whom immune evasion mechanisms often hinder vaccine effectiveness. By using AI-driven formulation tools, personalized cancer vaccines can be fine-tuned to elicit the strongest possible immune response.

6. Challenges and Future Perspectives

The integration of AI into personalized cancer vaccine development is still in its early stages, and several challenges must be addressed to fully realize its potential. One of the most pressing issues is the scarcity of high-quality data, which are essential for training AI models. In neoantigen prediction, determining at scale which mutations elicit T cell responses remains difficult, because it requires patient samples or labor-intensive experiments. Current models rely heavily on small datasets, typically containing only tens to hundreds of immunogenic peptides, which can introduce bias. Similarly, in mRNA design, the sequence space is vast, yet only a small fraction has been experimentally tested. Transfer learning, which reuses representations learned from larger related datasets, and active learning, which prioritizes the collection of the most informative data, are improving model performance. Even so, continued accumulation of experimental data (such as neoantigen immunogenicity, epitope–HLA–TCR structures, and mRNA translation outcomes) will be crucial. Initiatives such as TESLA for neoantigens and efforts to publish mRNA design datasets, such as the MPRA data from Smart5UTR, are valuable. A promising future direction is the integration of multi-omics data, combining tumor proteomics and immunopeptidomics to better identify which mutations generate peptides that are naturally presented on MHC molecules.
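
As a rough illustration of how active learning prioritizes informative data, the sketch below implements uncertainty sampling, one common active learning strategy. It assumes a hypothetical model has already assigned each untested candidate a predicted probability of being immunogenic, and it selects for the next assay round the candidates whose predictions are closest to a coin flip. The candidate names and probabilities are invented.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli prediction; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pick_for_assay(predictions, batch_size=2):
    """Return the `batch_size` untested candidates the model is least sure about.

    predictions: dict mapping candidate name -> predicted probability of immunogenicity.
    """
    ranked = sorted(
        predictions,
        key=lambda name: binary_entropy(predictions[name]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four untested candidates.
toy_predictions = {
    "candidate_1": 0.97,
    "candidate_2": 0.52,
    "candidate_3": 0.08,
    "candidate_4": 0.41,
}
next_batch = pick_for_assay(toy_predictions)
```

Here candidate_2 and candidate_4 are selected, since a confident prediction (0.97 or 0.08) would add little new information if confirmed. In practice the model would be retrained after each assay round and the selection repeated.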

Another key challenge is tumor heterogeneity and immune evasion [62]. Tumors evolve, and a neoantigen present in all cancer cells at the time of vaccine design may be lost or downregulated through immunoediting before the immune system can target it. AI could potentially predict which neoantigens are less likely to be lost, such as those arising from truncal mutations essential to tumor survival. However, vaccines themselves could impose selective pressure on tumors, necessitating future AI models that simulate this evolutionary game. By selecting a combination of neoantigens that tumors cannot easily evade, perhaps by simultaneously targeting multiple essential neoantigens, AI could help develop more effective vaccine strategies. This challenge intersects with game theory and may benefit from AI techniques used in strategic planning, such as reinforcement learning, where the “opponent” is the evolving tumor.
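
One simple way to make the idea of a hard-to-evade combination concrete is a coverage heuristic. The sketch below is neither an evolutionary simulation nor reinforcement learning; it assumes, hypothetically, that the set of tumor clones carrying each candidate neoantigen is known, and it greedily picks neoantigens until every clone is targeted by at least `redundancy` of them or a size budget is reached. All names and clone assignments are invented.

```python
def select_neoantigens(candidates, clones, redundancy=2, budget=5):
    """Greedily pick neoantigens so that each clone is targeted `redundancy` times.

    candidates: dict mapping neoantigen name -> set of clones that carry it.
    clones: iterable of all clone identifiers in the tumor.
    Returns (chosen neoantigens, clones still under-covered).
    """
    need = {clone: redundancy for clone in clones}  # coverage still required per clone
    remaining = dict(candidates)
    chosen = []

    def gain(name):
        # Number of still-under-covered clones this neoantigen would hit.
        return sum(1 for c in remaining[name] if need.get(c, 0) > 0)

    while remaining and len(chosen) < budget and any(n > 0 for n in need.values()):
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no candidate improves coverage any further
        for c in remaining.pop(best):
            if need.get(c, 0) > 0:
                need[c] -= 1
        chosen.append(best)

    uncovered = sorted(c for c, n in need.items() if n > 0)
    return chosen, uncovered


# Hypothetical tumor with three clones; truncal neoantigens occur in all of them.
toy_clones = ["c1", "c2", "c3"]
toy_candidates = {
    "neo_truncal_1": {"c1", "c2", "c3"},
    "neo_truncal_2": {"c1", "c2", "c3"},
    "neo_sub_1": {"c1"},
    "neo_sub_2": {"c2", "c3"},
}
chosen, uncovered = select_neoantigens(toy_candidates, toy_clones, redundancy=2, budget=3)
```

With this toy input the two truncal neoantigens are chosen first, and every clone is then targeted twice, so a clone would have to lose both targets to escape. A realistic model would also need to weigh immunogenicity, expression, and the likelihood that each target is lost.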

Generalizability is another significant concern, as many AI models are trained on data from specific populations or experimental systems. For instance, neoantigen prediction models often rely on data from Caucasian patients with common HLA alleles, which may reduce their accuracy for individuals with rare HLA alleles or cancers with unique mutation profiles [63]. Similarly, codon optimization models trained on one cell type may not generalize well to another. Addressing this issue requires strategies such as training ensemble models that integrate multiple prediction methods, a practice already employed in pipelines such as pVACtools. Additionally, models can be built to report their own uncertainty; when a model is unsure about a particular neoantigen, the pipeline can deprioritize the candidate or flag it for experimental validation first.
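
The combination of ensembling and uncertainty-aware triage can be sketched as follows. The snippet assumes several hypothetical predictors each return a score between 0 and 1 for a candidate, and it uses the spread of those scores as a crude uncertainty proxy. It does not reproduce how pVACtools or any published pipeline combines predictions, and the thresholds, peptide labels, and scores are invented.

```python
from statistics import mean, pstdev


def triage(candidates, max_disagreement=0.15, min_score=0.5):
    """Split candidates into 'prioritize', 'validate_first', and 'deprioritize'.

    candidates: dict mapping peptide label -> list of scores in [0, 1], one per predictor.
    """
    result = {"prioritize": [], "validate_first": [], "deprioritize": []}
    for peptide, scores in candidates.items():
        avg = mean(scores)
        spread = pstdev(scores)  # disagreement among predictors as an uncertainty proxy
        if spread > max_disagreement:
            result["validate_first"].append(peptide)
        elif avg >= min_score:
            result["prioritize"].append(peptide)
        else:
            result["deprioritize"].append(peptide)
    return result


# Hypothetical scores from three hypothetical predictors.
toy_scores = {
    "PEPTIDE_1": [0.82, 0.78, 0.85],  # predictors agree, high score
    "PEPTIDE_2": [0.90, 0.20, 0.55],  # predictors disagree
    "PEPTIDE_3": [0.10, 0.15, 0.12],  # predictors agree, low score
}
triaged = triage(toy_scores)
```

Here PEPTIDE_2 is routed to experimental validation because the predictors disagree, while the other two are ranked by their mean score. Whether uncertain candidates are deprioritized or validated first is a policy choice that depends on assay capacity.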

7. Conclusions

Personalized cancer vaccines sit at the intersection of genomics, immunology, and bioengineering. Artificial intelligence has rapidly emerged as an enabling technology at this intersection, addressing critical bottlenecks in vaccine design. In this review, we surveyed how AI techniques—particularly machine learning and deep learning models—are being applied to enhance neoantigen discovery, optimize vaccine coding sequences, design regulatory regions, and refine vaccine formulation and strategy. Over the last five years, tangible progress has been made: neoantigen prediction models integrating deep learning have improved the precision of target selection, reducing false positives and uncovering cryptic immunogenic mutations that earlier methods overlooked. AI-optimized coding sequences and UTRs have demonstrated superior protein expression, translating to stronger immune responses in preclinical vaccine studies. Furthermore, AI-driven insights are guiding the development of better delivery systems and combination therapies, inching the field closer to the goal of effective, individualized cancer immunotherapy.

The journey is far from over. As discussed, challenges such as data scarcity, model interpretability, and ensuring robust clinical efficacy need continued attention. However, the trajectory is clear: AI techniques will become ever more integrated into the pipeline of personalized vaccine development. In the near future, it is plausible that upon sequencing a patient’s tumor, automated AI pipelines will propose a “vaccine design” that experts will then validate and implement, compressing what used to be a multiyear research endeavor into a process that fits within the window of clinical decision-making for the patient. Early adopters of these technologies in clinical trials are already setting the stage, and as successes accumulate, confidence in AI-assisted design will grow.

In conclusion, AI is accelerating the development of personalized cancer vaccines by making the design process smarter, faster, and more tailored to each patient’s unique cancer. It exemplifies the power of interdisciplinary innovation—when computational algorithms are guided by biological understanding, the result is more than the sum of its parts. Continued collaboration and data-sharing will be essential to fully unlock AI’s potential in this field. The ultimate success will be measured in improved patient outcomes: longer survival, cancer remission, and, perhaps, one day, effective vaccines that can prevent cancer from recurring.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Blass, E.; Ott, P.A. Advances in the development of personalized neoantigen-based therapeutic cancer vaccines. Nat. Rev. Clin. Oncol. 2021, 18, 215–229.
2. Ott, P.A.; Hu, Z.; Keskin, D.B.; Shukla, S.A.; Sun, J.; Bozym, D.J.; Zhang, W.; Luoma, A.; Giobbie-Hurder, A.; Peter, L.; et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 2017, 547, 217–221.
3. Sahin, U.; Derhovanessian, E.; Miller, M.; Kloke, B.P.; Simon, P.; Löwer, M.; Bukur, V.; Tadmor, A.D.; Luxemburger, U.; Schrörs, B. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 2017, 547, 222–226.
4. Keskin, D.B.; Anandappa, A.J.; Sun, J.; Tirosh, I.; Mathewson, N.D.; Li, S.; Oliveira, G.; Giobbie-Hurder, A.; Felt, K.; Gjini, E.; et al. Neoantigen vaccine generates intratumoral T cell responses in glioblastoma. Nature 2019, 565, 234–239.
5. Sahin, U.; Türeci, Ö. Personalized vaccines for cancer immunotherapy. Science 2018, 359, 1355–1360.
6. Liu, M.A. A comparison of plasmid DNA and mRNA as vaccine technologies. Vaccines 2019, 7, 37.
7. Pardi, N.; Hogan, M.J.; Porter, F.W.; Weissman, D. mRNA vaccines—A new era in vaccinology. Nat. Rev. Drug Discov. 2018, 17, 261–279.
8. Nguyen, T.B.Q.; Pham, Q.T.M.; Tran, L.S. 84P T cell receptor repertoire profiles of tumor-infiltrating lymphocytes improves neoantigen prioritization for personalized cancer immunotherapy. Ann. Oncol. 2023, 34, S1499.
9. Hong, W.; Chen, C.; Zhu, Z.; Tang, K. An Elite Archive-Assisted Multi-Objective Evolutionary Algorithm for mRNA Design. In Proceedings of the 2024 IEEE Congress on Evolutionary Computation (CEC), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8.
10. Kaushik, R.; Kant, R.; Christodoulides, M. Artificial intelligence in accelerating vaccine development-current and future perspectives. Front. Bacteriol. 2023, 2, 1258159.
11. Imani, S.; Li, X.; Chen, K.; Maghsoudloo, M.; Kaboli, P.J.; Hashemi, M.; Khoushab, S.; Li, X. Computational biology and artificial intelligence in mRNA vaccine design for cancer immunotherapy. Front. Cell. Infect. Microbiol. 2025, 14, 1501010.
12. Xin, P.; Yang, D.; Zhou, Y.; Peng, S. TlcMHCpan: A Novel Deep Learning Model for Enhanced Pan-Specific Prediction of Peptide-HLA Binding. IEEE Access 2024, 12, 184644–184656.
13. Pu, T.; Peddle, A.; Zhu, J.; Tejpar, S.; Verbandt, S. Neoantigen identification: Technological advances and challenges. Methods Cell Biol. 2024, 183, 265–302.
14. Springer, I.; Besser, H.; Tickotsky-Moskovitz, N.; Dvorkin, S.; Louzoun, Y. Prediction of specific TCR-peptide binding from large dictionaries of TCR-peptide pairs. Front. Immunol. 2020, 11, 1803.
15. Montemurro, A.; Schuster, V.; Povlsen, H.R.; Bentzen, A.K.; Jurtz, V.; Chronister, W.D.; Crinklaw, A.; Hadrup, S.R.; Winther, O.; Peters, B.; et al. NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data. Commun. Biol. 2021, 4, 1060.
16. Lu, T.; Zhang, Z.; Zhu, J.; Wang, Y.; Jiang, P.; Xiao, X.; Bernatchez, C.; Heymach, J.V.; Gibbons, D.L.; Wang, J.; et al. Deep learning-based prediction of the T cell receptor–antigen binding specificity. Nat. Mach. Intell. 2021, 3, 864–875.
17. Fu, H.; Liang, Y.; Zhong, X.; Pan, Z.; Huang, L.; Zhang, H.; Xu, Y.; Zhou, W.; Liu, Z. Codon optimization with deep learning to enhance protein expression. Sci. Rep. 2020, 10, 17617.
18. Kumar, A.; Dixit, S.; Srinivasan, K.; M, D.; Vincent, P.M.D.R. Personalized cancer vaccine design using AI-powered technologies. Front. Immunol. 2024, 15, 1357217.
19. Li, S.; Moayedpour, S.; Li, R.; Bailey, M.; Riahi, S.; Kogler-Anele, L.; Miladi, M.; Miner, J.; Zheng, D.; Wang, J.; et al. CodonBERT: Large language models for mRNA design and optimization. bioRxiv 2023.
20. Cai, Y.; Chen, R.; Gao, S.; Li, W.; Liu, Y.; Su, G.; Song, M.; Jiang, M.; Jiang, C.; Zhang, X. Artificial intelligence applied in neoantigen identification facilitates personalized cancer immunotherapy. Front. Oncol. 2023, 12, 1054231.
21. Castillo-Hair, S.; Fedak, S.; Wang, B.; Linder, J.; Havens, K.; Certo, M.; Seelig, G. Optimizing 5′UTRs for mRNA-delivered gene editing using deep learning. Nat. Commun. 2024, 15, 5284.
22. Chu, Y.; Yu, D.; Li, Y.; Huang, K.; Shen, Y.; Cong, L.; Zhang, J.; Wang, M. A 5′UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions. bioRxiv 2023.
23. Cafri, G.; Gartner, J.J.; Zaks, T.; Hopson, K.; Levin, N.; Paria, B.C.; Parkhurst, M.R.; Yossef, R.; Lowery, F.J.; Jafferji, M.S.; et al. mRNA vaccine–induced neoantigen-specific T cell immunity in patients with gastrointestinal cancer. J. Clin. Investig. 2020, 130, 5976–5988.
24. Mekki-Berrada, F.; Ren, Z.; Huang, T.; Wong, W.K.; Zheng, F.; Xie, J.; Tian, I.P.S.; Jayavelu, S.; Mahfoud, Z.; Bash, D.; et al. Two-step machine learning enables optimized nanoparticle synthesis. npj Comput. Mater. 2021, 7, 55.
25. He, S.; Gao, B.; Sabnis, R.; Sun, Q. RNAdegformer: Accurate prediction of mRNA degradation at nucleotide resolution with deep learning. Brief. Bioinform. 2023, 24, bbac581.
26. Laumont, C.M.; Vincent, K.; Hesnard, L.; Audemard, É.; Bonneil, É.; Laverdure, J.-P.; Gendron, P.; Courcelles, M.; Hardy, M.-P.; Côté, C.; et al. Noncoding regions are the main source of tumor-specific antigens. Sci. Transl. Med. 2018, 10, eaau5516.
27. Wells, D.K.; van Buuren, M.M.; Dang, K.K.; Hubbard-Lucey, V.M.; Sheehan, K.C.; Campbell, K.M.; Lamb, A.; Ward, J.P.; Sidney, J.; Blazquez, A.B.; et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. Cell 2020, 183, 818–834.e13.
28. Schumacher, T.N.; Schreiber, R.D. Neoantigens in cancer immunotherapy. Science 2015, 348, 69–74.
29. Hilf, N.; Kuttruff-Coqui, S.; Frenzel, L.P.; Bukur, T.; Stevanović, S.; Gouttefangeas, C.; Platten, M.; Tabatabai, J.; Dutoit, V.; van der Burg, S.H.; et al. Actively personalized vaccination trial for newly diagnosed glioblastoma. Nature 2019, 565, 240–245.
30. Yadav, M.; Jhunjhunwala, S.; Phung, Q.; Lupardus, P.; Tanguay, J.; Bumbaca, S.; Franci, C.; Cheung, T.K.; Fritsche, F.; Weinschenk, T.; et al. Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing. Nature 2014, 515, 572–576.
31. Hundal, J.; Kiwala, S.; McMichael, J.; Miller, C.A.; Xia, H.; Wollam, A.T.; Liu, C.J.; Zhao, S.; Feng, Y.-Y.; Graubert, A.P.; et al. pVACtools: A computational toolkit to identify and visualize cancer neoantigens. Cancer Immunol. Res. 2020, 8, 409–420.
32. Laumont, C.M.; Wouters, M.C.A.; Smazynski, J.; Gierc, N.S. The landscape of tumor antigens and neoantigens in cancer immunotherapy. Nat. Rev. Cancer 2022, 22, 682–696.
33. Vitiello, A.; Zanetti, M. Neoantigen prediction and the need for validation. Nat. Biotechnol. 2017, 35, 815–817.
34. Jurtz, V.; Paul, S.; Andreatta, M.; Marcatili, P.; Peters, B.; Nielsen, M. NetMHCpan-4.0: Improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 2017, 199, 3360–3368.
35. Xu, X.; Liu, X.; Li, X.; Zhang, L.; Wang, Y. Advances in deep learning for neoantigen prediction. Front. Immunol. 2021, 12, 705096.
36. Zhao, W.; Li, S.; Liu, Y.; Yang, H. NeoaPred: A structure-based deep learning approach for predicting peptide-MHC binding and immunogenicity. Bioinformatics 2022, 38, 3792–3800.
37. Abella, J.R.; Brown, J.R.; Pal, M.; Silvestri, G. DeepHLApan: A convolutional neural network for MHC-peptide binding prediction. PLoS Comput. Biol. 2020, 16, e1007869.
38. Racle, J.; Michaux, J. Machine learning-based approaches for neoantigen prediction. Trends Cancer 2022, 8, 45–57.
39. Carri, I.; Schwab, E.; Podaza, E.; Alvarez, H.M.G.; Mordoh, J.; Nielsen, M.; Barrio, M.M. Beyond MHC binding: Immunogenicity prediction tools to refine neoantigen selection in cancer patients. Explor. Immunol. 2023, 3, 82–103.
40. Xin, K.; Wei, X.; Shao, J.; Chen, F.; Liu, Q.; Liu, B. Establishment of a novel tumor neoantigen prediction tool for personalized vaccine design. Hum. Vaccines Immunother. 2024, 20, 2300881.
41. Kim, J.Y.; Bang, H.; Noh, S.J.; Choi, J.K. DeepNeo: A webserver for predicting immunogenic neoantigens. Nucleic Acids Res. 2023, 51, W134–W140.
42. Yang, Z.; Bogdan, P.; Nazarian, S. An in silico deep learning approach to multi-epitope vaccine design: A SARS-CoV-2 case study. Sci. Rep. 2021, 11, 3238.
43. Bulashevska, A.; Nacsa, Z.; Lang, F.; Braun, M.; Machyna, M.; Diken, M.; Childs, L.; König, R. Artificial intelligence and neoantigens: Paving the path for precision cancer immunotherapy. Front. Immunol. 2024, 15, 1394003.
44. Plotkin, J.B.; Kudla, G. Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet. 2011, 12, 32–42.
45. Sharp, P.M.; Li, W.H. The codon adaptation index—A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15, 1281–1295.
46. Li, Y.; Wang, F.; Yang, J.; Han, Z.; Chen, L.; Jiang, W.; Zhou, H.; Li, T.; Tang, Z.; Deng, J.; et al. Deep Generative Optimization of mRNA Codon Sequences for Enhanced Protein Production and Therapeutic Efficacy. bioRxiv 2024.
47. Joshi, M.; Wang, A.; Myong, S. 5′UTR G-quadruplex structure enhances translation in size dependent manner. Nat. Commun. 2024, 15, 3963.
48. Hong, D.; Jeong, S. 3′UTR Diversity: Expanding Repertoire of RNA Alterations in Human mRNAs. Mol. Cells 2023, 46, 48–56.
49. Tang, X.; Huo, M.; Chen, Y.; Huang, H.; Qin, S.; Luo, J.; Qin, Z.; Jiang, X.; Liu, Y.; Duan, X.; et al. A novel deep generative model for mRNA vaccine development: Designing 5′ UTRs with N1-methyl-pseudouridine modification. Acta Pharm. Sin. B 2024, 14, 1814–1826.
50. Morrow, A.; Thornal, A.; Flynn, E.D.; Hoelzli, E.; Shan, M.; Garipler, G.; Kirchner, R.; Reddy, A.J.; Tabchouri, S.; Gupta, A.S.; et al. ML-driven design of 3′ UTRs for mRNA stability. bioRxiv 2024.
51. Liu, Y.; Gao, J.; Zhang, X.; Fang, X. Joint Design of 5′Untranslated Region and Coding Sequence of mRNA. arXiv 2024, arXiv:2410.20781.
52. Castillo-Hair, S.M.; Seelig, G. Machine learning for designing next-generation mRNA therapeutics. Acc. Chem. Res. 2021, 55, 24–34.
53. Shi, R.; Liu, X.; Wang, Y.; Pan, M.; Wang, S.; Shi, L.; Ni, B. Long-term stability and immunogenicity of lipid nanoparticle COVID-19 mRNA vaccine is affected by particle size. Hum. Vaccines Immunother. 2024, 20, 2342592.
54. Xie, C.; Yao, R.; Xia, X. The advances of adjuvants in mRNA vaccines. npj Vaccines 2023, 8, 162.
55. Li, X.; Qi, J.; Wang, J.; Hu, W.; Zhou, W.; Wang, Y.; Li, T. Nanoparticle technology for mRNA: Delivery strategy, clinical application and developmental landscape. Theranostics 2024, 14, 738.
56. Xu, Y.; Ma, S.; Cui, H.; Chen, J.; Xu, S.; Gong, F.; Golubovic, A.; Zhou, M.; Wang, K.C.; Varley, A.; et al. AGILE platform: A deep learning powered approach to accelerate LNP development for mRNA delivery. Nat. Commun. 2024, 15, 6305.
57. Wu, K.; Yang, X.; Wang, Z.; Li, N.; Zhang, J.; Liu, L. Data-balanced transformer for accelerated ionizable lipid nanoparticles screening in mRNA delivery. Brief. Bioinform. 2024, 25, bbae186.
58. Zhang, W.Y.; Zheng, X.L.; Coghi, P.S.; Chen, J.H.; Dong, B.J.; Fan, X.X. Revolutionizing adjuvant development: Harnessing AI for next-generation cancer vaccines. Front. Immunol. 2024, 15, 1438030.
59. Chaudhury, S.; Duncan, E.H.; Atre, T.; Storme, C.K.; Beck, K.; Kaba, S.A.; Lanar, D.E.; Bergmann-Leitner, E.S. Identification of immune signatures of novel adjuvant formulations using machine learning. Sci. Rep. 2018, 8, 17508.
60. Ma, J.; Wang, S.; Zhao, C.; Yan, X.; Ren, Q.; Dong, Z.; Qiu, J.; Liu, Y.; Shan, Q.; Xu, M.; et al. Computer-Aided Discovery of Potent Broad-Spectrum Vaccine Adjuvants. Angew. Chem. Int. Ed. 2023, 62, e202301059.
61. Gude, S.; Abburi, S.K.; Gali, P.K.; Gorlagunta, S. Advancing single-shot vaccine design through AI and computational models. Transl. Regul. Sci. 2025.
62. McGranahan, N.; Swanton, C. Clonal heterogeneity and tumor evolution: Past, present, and the future. Cell 2017, 168, 613–628.
63. Sugiyama, N.; Terry, F.E.; Gutierrez, A.H.; Hirano, T.; Hoshi, M.; Mizuno, Y.; Martin, W.; Yasunaga, S.; Niiro, H.; Fujio, K.; et al. Individual and population-level variability in HLA-DR associated immunogenicity risk of biologics used for the treatment of rheumatoid arthritis. Front. Immunol. 2024, 15, 1377911.

Figure 1. Schematic representation of the mRNA vaccine manufacturing process, including neoantigen selection, codon optimization, untranslated region (UTR) design, mRNA synthesis, and lipid nanoparticle (LNP) formulation.

Table 1. Representative AI applications in personalized cancer vaccine development.

Aspect	Reference	AI Model	Data and Inputs	Key Outcome
Neoantigen discovery	Springer et al. (2020) [14]	LSTM neural network for TCR–peptide binding prediction	>170,000 TCR–peptide pairs	Learned TCR–peptide specificity; achieved performance on par with state-of-the-art methods
	Montemurro et al. (2021) [15]	CNN model for paired TCRα and TCRβ sequences	Known TCRαβ sequences and cognate peptides (HLA-A*02:01, 9-mer)	Improved prediction of TCR–peptide binding (79% specificity at a 2% false-positive rate)
	Lu et al. (2021) [16]	Transfer learning deep learning model for pMHC–TCR binding	Sequence data: mutated peptide, patient’s MHC class I, and TCR sequence	Achieved high accuracy (AUC 0.827 on independent testing) in predicting TCR binding
Codon optimization	Fu et al. (2020) [17]	BiLSTM-CRF deep learning model using “codon boxes”	E. coli expression of multiple genes (e.g., vaccine antigen FALVAC-1, PTP4A3)	AI-optimized genes showed a higher protein expression than industry-optimized sequences
	Costa/Absci (2024) [18]	Transformer-based language model (“CO-BERT”) for codon choice	Large-scale coding sequence dataset (multiple organisms)	Predicted optimal synonymous codons for maximal protein expression
	Wang et al. (2023) [19]	Multi-species transformer (“CodonTransformer”) for codon optimization	>1 million DNA–protein sequence pairs from 164 species	Learned codon usage across species; enabled cross-species gene optimization
UTR sequence generation	Song et al. (2024) [20]	Deep generative model (“Smart5UTR”)—multitask autoencoder with a CNN encoder	5′ UTR library with >200,000 sequences tested via MPRA	Generated optimized 5′ UTRs for N1-methyl-pseudouridine mRNA, improving vaccine efficacy
	Castillo-Hair et al. (2024) [21]	Deep learning regression + generative design for 5′UTR (CNN models and optimization)	Polysome profiling data from random 5′UTR libraries in 3 human cell types	Designed synthetic 5′ UTRs that enhanced translation of a gene editor enzyme
	Chu et al. (2024) [22]	Pretrained transformer language model (“UTR-LM”) for 5′UTRs	Endogenous 5′ UTR sequences from multiple species (unsupervised pretraining)	Learned a language of 5′ UTRs enabling improved translation initiation efficiency
mRNA vaccine design	Cafri et al. (2020) [23]	Personalized neoantigen mRNA vaccine (clinical trial)	13 patients with gastrointestinal tumors; 5–20 neoantigens per mRNA vaccine	First-in-human phase I trial of an individualized mRNA neoantigen vaccine
	Mekki-Berrada et al. (2021) [24]	Two-step ML for lipid nanoparticle formulation optimization	Data from combinatorial synthesis of polymeric nanoparticles for mRNA delivery	AI-guided nanoparticle formulation improved mRNA delivery efficiency
	He et al. (2023) [25]	CNN with self-attention (“RNAdegformer”) for mRNA degradation prediction	Public mRNA stability datasets (e.g., OpenVaccine COVID-19 mRNA data)	Achieved state-of-the-art accuracy in predicting mRNA half-life and degradation sites

Abbreviations: MHC—major histocompatibility complex; TCR—T cell receptor; LSTM—long short-term memory; CNN—convolutional neural network; HLA—human leukocyte antigen; MPRA—massively parallel reporter assay; UTR—untranslated region.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
