1. Introduction
Early-stage ship design is a critical phase that sets the foundation for a vessel’s performance, cost, and feasibility. Decisions made at the conceptual design stage have outsized impacts; the hull form alone influences a substantial share of a ship’s total cost [
1]. Traditionally, naval architects rely on iterative, human-driven processes (the design spiral) to balance requirements such as speed, capacity, stability, and efficiency. These conventional hull form generation methods typically use mathematical or parametric models defined by a handful of geometric parameters (block coefficient, midship section coefficient, B-spline control points, etc.). Such approaches allow systematic variation of hull shapes and have been widely employed in academic studies and optimization frameworks [
2]. However, a well-known limitation is that these geometric parameters do not directly correspond to performance objectives. Designers often specify high-level performance targets (such as required speed, cargo capacity, or fuel efficiency) during early conceptual design stages, yet the parametric hull models require guessing shape coefficients that only indirectly achieve those targets. This disconnect makes it challenging to explore the design space in a performance-driven way during conceptual design. As a result, the traditional process can be laborious and may miss non-intuitive but high-potential hull configurations.
Generative modeling offers a novel approach to address these challenges, leveraging machine learning to automatically learn shape–performance relationships from data [
3]. Instead of manually tweaking hull parameters, a generative AI model can synthesize new hull designs that meet desired criteria, effectively serving as a creative assistant in early-stage design. Recent advances in deep learning, especially generative adversarial networks (GANs), variational autoencoders (VAEs), and denoising diffusion probabilistic models (DDPMs), have demonstrated the ability to produce realistic and high-quality designs in other engineering domains such as aerofoil shapes conditioned on aerodynamic targets [
3,
4]. Applying these models to ship hulls is particularly promising because it could automate the exploration of a vast design space while ensuring feasibility and adherence to objectives. Preliminary studies indicate that generative models can significantly broaden the range of hull form options considered, including unconventional shapes that a designer might not have conceived manually [
5]. Moreover, by conditioning on performance metrics, generative models enable a direct inverse design capability.
Despite this potential, hull form generative modeling presents unique challenges. A ship hull is a complex 3D geometry embodying multiple design aspects (hydrostatics, hydrodynamics, stability, etc.), and capturing it faithfully in a learning model is essential. Conventional generative approaches typically represent the hull in a single modality (for instance, a parameter vector or a surface mesh), which may not capture all relevant features of the shape. Additionally, ensuring that generated hulls are feasible (fairness, manufacturability, etc.) is difficult, as purely data-driven models might produce unrealistic forms if they lack awareness of naval architectural principles. These challenges motivate a representation-aware learning approach; by training on multimodal 3D geometric descriptors of the hull simultaneously, the model can be imbued with a richer, more structured understanding of hull geometry. This work emphasizes joint training on multiple representations: 3D point clouds of the hull surface, together with waterline and buttock curves (characterizing the cross-sectional shape distribution). By fusing these descriptors into a common latent space, the generative model can learn a holistic representation of hull form. The aim is to enable conditional generation of hulls that not only meet high-level design targets but also inherently respect geometric constraints and hydrodynamic considerations. This representation-aware generative modeling approach has the potential to overcome the limits of parametric models and produce creative yet feasible hull designs, accelerating the early-stage design process.
The main contributions of this work can be summarized as follows:
A representation-aware multimodal learning framework for ship hull geometry that jointly utilizes point clouds, waterline splines, and buttock curves.
A shared latent representation that enforces cross-modal geometric consistency between these complementary descriptors.
A conditional latent diffusion model enabling controllable generation of hull geometries based on high-level design parameters.
A surface reconstruction pipeline capable of generating coherent hull meshes from multimodal geometric representations.
Together, these contributions provide a scalable, data-driven approach for exploring hull design spaces during early-stage ship design.
2. Background
2.1. Parametric and Optimization-Based Hull Form Generation
Parametric hull form models have long underpinned early-stage design and optimization. Classic mathematical hull forms (such as the Wigley hull and its variants) and spline-based surface representations allow designers to describe a family of hull shapes using a few parameters. For example, Zhou et al. [
6] used NURBS surfaces to create a flexible parametric design framework for ship hulls, and earlier works like Abt et al. [
7] and Zhang et al. [
8] demonstrated how adjusting parametric coefficients can systematically generate hull variants for exploration. These parametric models are invaluable for conducting design of experiments and sensitivity studies, since they ensure that every combination of parameters yields a plausible hull geometry. However, their expressiveness is limited by the chosen parametrization; they excel at interpolating within the defined shape family but may struggle to represent innovative hull forms outside the parent hull’s family.
Building atop parametric models, a vast body of research applied numerical optimization algorithms to improve hull performance. Early works include multi-objective genetic algorithms and direct search methods used to minimize drag or improve seakeeping by varying hull parameters. Han et al. [
9] and Feng et al. [
10] optimized a containership’s hull parameters to minimize resistance in calm water and waves, demonstrating the benefits of automated search on top of a baseline parametric model. Such CFD-driven optimization can yield refined shapes; Feng et al. [
10] achieved notable resistance reductions but at high computational cost due to the required simulations. Likewise, Tahara et al. [
11] integrated a CFD solver with a multi-objective evolutionary algorithm to balance drag and other objectives. These efforts show that given a parametric design space, conventional optimizers can find better hulls than manual tuning.
Researchers have also explored specialized parametric methodologies for particular hull features. For instance, parametric bulbous bow design has been tackled by optimizing curve-based descriptors of the bow region, resulting in reduced resistance for specific operating conditions. Knight et al. [
12] applied particle swarm optimization to a planing craft hull, incorporating strategies to maintain solution diversity. Their follow-up work introduced a niching mechanism to ensure a spread of optimal designs, addressing the tendency of single-objective optimizations to converge to one solution [
13]. These studies highlight a recurring challenge in parametric optimization: the need to balance multiple objectives and generate a suite of viable designs rather than a single optimum. Traditional optimizers require many evaluations and can become trapped in local optima, and ensuring diversity often relies on ad hoc methods.
Parametric approaches provide a structured way to generate hull forms and, when coupled with optimization, can improve designs against specific criteria. However, they are constrained by the predetermined shape basis and heavy reliance on expert intuition. As noted by Brown & Salcedo [
14], naval design is inherently multi-objective and interdisciplinary, which makes it difficult for any low-dimensional parametric model to capture all relevant design trade-offs. This has paved the way for data-driven techniques that learn more complex shape relationships from examples, as discussed next.
2.2. Deep Generative Models in Marine Design
Inspired by successes in image and aerospace shape generation, researchers have recently begun applying deep generative models to ship design. GANs are a popular choice for producing new geometry examples. Yonekura et al. [
2] proposed a conditional GAN for ship hulls that takes target performance metrics as inputs (specifically design speed, displacement/tonnage, and drag coefficient) and outputs a hull form geometry. This approach essentially learns an inverse design mapping from performance to shape. Using a Wasserstein GAN with gradient penalty, trained on a dataset derived from a generalized Wigley hull, they demonstrated the ability to generate hulls that achieve specified drag and tonnage within small error bounds. Notably, their GAN-based model required no explicit geometric parameters as input; in contrast to parametric models, the generator network directly produced the hull’s lines plan. This work demonstrated the feasibility of performance-conditioned hull generation, addressing a gap left by conventional methods: exploring hull geometry driven purely by performance goals.
Another GAN-based framework, ShipHullGAN by Khan et al. [
15], aimed to expand the diversity of generated hull forms beyond a single ship type. ShipHullGAN is a deep convolutional GAN trained on a remarkably large and diverse dataset of 52,591 hull designs spanning multiple vessel classes (container ships, tankers, bulk carriers, etc.). By learning from such a broad distribution, the model can generate hull forms that are not confined to one class, addressing the “conservatism” of earlier parametric models that typically handle only a specific ship type. A key innovation was the use of a unified shape representation for all training hulls. Each hull was converted into a standardized 3D shape tensor (based on low-order geometric moment invariants) so that geometrically and physically disparate designs could be learned jointly. This shape-signature tensor enabled the GAN not only to produce plausible new hulls, but also to incorporate physics-informed features (the geometric moments correlate with volume, centroid, etc.) during generation. The results showed that ShipHullGAN can generate a rich variety of hull forms, including both traditional-looking designs and novel configurations, while maintaining geometric validity. This underscores the promise of GANs for expanding the creative scope in ship design, especially when trained on large datasets covering the full range of historical designs.
DDPMs have also emerged as a cutting-edge generative approach, offering stability advantages and high-quality outputs in many fields. Bagazinski and Ahmed [
16] introduced ShipGen, a diffusion model for early-stage hull design that operates on a vector of hull form parameters (principal dimensions, fullness coefficients, etc.). In ShipGen, random noise is iteratively refined into a realistic hull parameter set by the diffusion process, which is trained on a large dataset (the Ship-D repository of 30k hulls [
17]). By incorporating a classifier-guidance mechanism during generation, ShipGen can bias the output toward desired performance outcomes. The results were striking; the diffusion model’s samples covered a large fraction of the original data distribution, a marked improvement in design space coverage compared to random sampling of parameters. Moreover, the generated hulls on average had lower wave drag and significantly higher displacement than the dataset mean (which raises the question of validation and verification), indicating the model was not merely reproducing existing designs but extrapolating toward higher-performing hull forms. This “generate and optimize” capability is particularly noteworthy; the generative model inherently seeks better designs without a separate optimization loop. Bagazinski & Ahmed [
18] also developed a conditional diffusion model (C-SHIPGEN) that allows designers to input high-level constraints (desired length, beam, draft, etc.) and then generates hulls satisfying those constraints. By integrating a physics-based resistance estimator into the sampling process (analogous to adding domain knowledge), the conditional diffusion model achieved around 25–30% lower resistance in its generated hulls compared to a baseline optimized design, all while meeting the specified dimensional constraints. These advances position diffusion models as a powerful tool in marine design as they provide probabilistic, controllable generation and can seamlessly handle multi-objective goals via guidance.
In addition to deep neural generative models, simpler generative techniques have been explored as benchmarks. Thakur et al. [
5] utilized a Gaussian Mixture Model (GMM) as a generative engine for hull design, using the same Ship-D dataset of 30,000 hulls. A GMM statistically estimates the probability distribution of hull form parameters; sampling from it yields new design points that mimic the real data distribution. Thakur et al. reported that the GMM approach could efficiently produce innovative hull designs that venture into sparse regions of the dataset, covering extremes that random or grid sampling would likely miss. While a GMM lacks the complexity to capture subtle shape relationships compared to GANs or diffusion models, it offers mathematical transparency and very fast sampling. Such approaches can be useful for generating large pools of candidate designs for preliminary evaluation.
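As a rough illustration of this idea, sampling new design points from a fitted mixture reduces to choosing a component and drawing from its Gaussian. The sketch below uses a hand-specified mixture over a generic hull-parameter vector; the function name and array shapes are illustrative assumptions, not code from [5], where the mixture would instead be fitted to the Ship-D parameter vectors (e.g., via expectation-maximization):

```python
import numpy as np

def sample_gmm(weights, means, covs, n, seed=None):
    """Draw n hull-parameter vectors from a Gaussian mixture.

    weights: (K,) mixture weights summing to 1; means: (K, d);
    covs: (K, d, d). Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(weights), size=n, p=weights)  # pick components
    return np.stack([rng.multivariate_normal(means[c], covs[c])
                     for c in comps])
```

Because sampling is a single component draw plus one Gaussian draw, generating large candidate pools is essentially free compared with GAN or diffusion inference.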
The marine design community has begun embracing deep generative models (GANs, VAEs, DDPMs) alongside probabilistic models (GMMs) to automate hull form creation. Each comes with its own trade-offs; GANs and VAEs learn an explicit latent representation of hull geometry, which can be useful for interpolation and clustering of designs, whereas diffusion models excel at unbiased exploration of complex distributions. Crucially, all these methods highlight a trend towards data-driven design synthesis, moving beyond the limitations of human intuition and low-dimensional parametric spaces.
2.3. Geometric Representation Learning in 3D Engineering Design
A core challenge in applying generative models to hull forms is how to represent the 3D geometry in a way that is learnable by machine learning algorithms. Unlike images or aerofoils, a ship hull lacks a simple parametrization (it can be described by offsets, sectional area curves, surface patches, point clouds, etc.). Recent work in engineering design underscores the importance of finding compact, informative representations for complex shapes. The availability of comprehensive datasets such as Ship-D [
17] (which provides multiple representations of each hull, including parameter vectors, 3D meshes, point clouds, and even 2D projections, along with computed performance metrics) has been a catalyst in this area. With such multimodal data, one can train models to understand correspondences between different shape descriptions and performance attributes.
Dimensionality reduction techniques are often a first step in representation learning for hulls. Reducing a hull’s shape to a few latent variables yields a design space in which exploration and optimization are more tractable and generative models can operate efficiently. Principal Component Analysis (PCA) is a classical approach. Yu & Wang [
19] applied PCA to a database of existing hull forms to derive a set of orthogonal shape basis vectors, and then used a neural network to map those PCA scores to predictions of resistance. This approach treated the PCA coefficients as reduced parameters of the hull, enabling rapid evaluation of thousands of designs in the low-dimensional PCA space. While PCA is restricted to linear dimensionality reduction, more powerful nonlinear techniques have been explored. Demo et al. [
20] introduced a self-learning mesh morphing and parameter space reduction method for hull shape optimization. They combined a parametric model with mesh morphing, then applied proper orthogonal decomposition to reduce the shape degrees of freedom, effectively learning a small set of modes that capture most of the hull variation. This reduced-order model, when coupled with optimization algorithms, significantly accelerated the search for optimal shapes.
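The PCA-style reduction described above can be sketched in a few lines: a shape basis is derived from flattened hull offset vectors via SVD, and each hull is summarized by its scores on the top modes. The helper below is a generic sketch under assumed data shapes, not the cited implementations:

```python
import numpy as np

def hull_shape_basis(offsets, k=5):
    """Derive a k-mode PCA shape basis from a hull database.

    offsets: (n_hulls, n_features) matrix, one flattened offset vector
    per hull. Returns the mean shape, the top-k orthonormal modes, and
    each hull's scores in the reduced space.
    """
    mean = offsets.mean(axis=0)
    centered = offsets - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:k]                  # (k, n_features) shape basis
    scores = centered @ modes.T     # (n_hulls, k) reduced parameters
    return mean, modes, scores
```

A hull is then approximated as `mean + scores @ modes`, and new variants can be generated by perturbing the k scores instead of the full offset table.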
Deep learning methods take representation learning further by training neural networks to encode and decode shapes. An autoencoder, for instance, can compress a hull surface (represented as a point cloud or voxel grid) into a latent vector, and then reconstruct the surface from that vector. Such learned latent spaces often capture non-linear shape features more expressively than PCA. Wang et al. [
21] demonstrated a 3D hull form encoding using a deep neural network, where hull geometries were embedded in a latent space that was then used for efficient hydrodynamic optimization. By operating in the learned feature space, their method could perform gradient-based optimization of hull forms with respect to drag and stability criteria, navigating design changes that were hard to realize in a purely parametric space. The success of this approach indicates that a well-trained encoder can act as a geometric surrogate, translating between the high-dimensional shape domain and a manageable set of latent variables that still govern the essential shape characteristics.
Multi-modal representation learning is particularly relevant to hull design because no single representation is able to capture all aspects of a hull. Researchers have begun to merge different sources of geometric information to improve model fidelity. One noteworthy concept is the augmented shape signature. Masood et al. [
22] proposed using augmented shape signature vectors (SSVs) that concatenate geometric descriptors with physics-related features for each design. In a comparative study of generative vs. non-generative models for hydrofoil design, they found that providing the learning model with a richer descriptor (including integral geometry properties and key performance parameters) led to more valid and diverse designs. In essence, by informing the latent representation with physical context (such as coefficients related to lift or drag alongside geometry), the model could more reliably generate shapes that met design criteria without producing non-physical outliers. A similar philosophy was employed by Khan et al. [
15] in ShipHullGAN, where the Shape-Signature Tensor encodes each hull’s geometry in terms of its low-order moments. These moments embed basic naval architectural knowledge (volume, centre of buoyancy, etc.) into the representation itself. The result was a latent space where geometric validity and even coarse performance trends were implicitly learned, aiding the generative model in producing practical designs.
It is worth noting that combining multiple geometric representations can dramatically increase the available information for learning. For example, a hull’s waterline curve (or a set of sectional curves at various drafts) captures important aspects of its shape that a simple length–beam–draft vector does not. By jointly training on surface points and curves, a model can infer correlations between these views, leading to a more robust latent representation. This approach aligns with trends in 3D shape learning where multi-view or multi-modal autoencoders have shown improved performance in capturing complex shapes (such as using both images and point clouds to learn a single object latent space). Although specific applications to ship hulls are still emerging, the groundwork in related fields suggests that representation-aware learning will enhance generative design. Overall, by leveraging techniques from dimensionality reduction, deep autoencoding, and multi-modal data integration, researchers are building the representational foundations needed for advanced hull generators. The stage is set to move from single-modality models to those that understand a hull form in a more comprehensive, human-like way through multiple complementary descriptors.
3. Methodology
Figure 1 presents the overall methodology flow chart outlining the stages of the proposed generative hull form design framework. The proposed pipeline comprises four principal components: dataset generation via structured and unstructured geometric preprocessing, multimodal latent representation learning through a conditional autoencoding architecture, a generative hull synthesis model based on latent diffusion, and evaluation procedures for assessing geometric fidelity and generative diversity. Each of these stages is detailed in the following subsections.
3.1. Dataset Generation
To train the models, a synthetic hull form dataset covering a broad design space (different ship types and sizes) was constructed. Starting from 24 diverse parent hulls, as seen in
Figure 2, 625 systematic variations were generated per parent by perturbing principal dimensions (length, beam, draft) and fullness coefficients (e.g., block and prismatic coefficients). The approach is similar to the creation of the Ship-D dataset (30,000 hulls from 12 parents) described by Thakur et al. [
5], but at a smaller scale. To exploit inherent port–starboard symmetry, all hulls are reduced to their half-hull geometry by discarding one side of the centreplane, following symmetry plane clipping procedures implemented via trimesh. This not only reduces data dimensionality but also eliminates redundant shape features. In total, the dataset comprises 15,000 hulls, providing a broad coverage of hull form geometry and parameters within the chosen design space.
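As a minimal sketch of this preprocessing, the helper below clips a hull to one symmetric half and scales it into the unit cube. It operates on raw point samples rather than the trimesh mesh clipping used in the pipeline, and the keep-the-positive-y convention and function name are assumptions for illustration:

```python
import numpy as np

def preprocess_hull(points, L, B, D):
    """Clip to the half-hull (keep y >= 0) and scale into the unit cube.

    points: (N, 3) array of hull surface samples, with x along length,
    y athwartships, and z vertical. L, B, D are length overall, breadth,
    and depth. Hypothetical helper, not the authors' code.
    """
    half = points[points[:, 1] >= 0.0]          # port-starboard symmetry
    scale = np.array([L, B, D], dtype=float)    # per-axis normalization
    return half / scale
```

Note that after dividing by the full breadth B, the half-hull occupies only y in [0, 0.5]; this is consistent across the dataset and is what matters for training stability.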
All hull representations were spatially normalized by scaling each instance to fit within a unit cube. This was achieved by dividing the vertex coordinates and spline descriptors by the respective hull’s length overall (L), breadth (B), and depth (D). Such normalization ensures geometric consistency across the dataset and stabilizes the subsequent training of neural architectures. For each hull in the dataset, three complementary geometric representations were precomputed:
Point Cloud (PC): A uniformly sampled 3D point cloud of the hull surface, capturing the global geometric topology.
Waterline Splines: A series of 2D splines representing hull cross-sections at fixed vertical intervals along the draft, encoding longitudinal shape transitions.
Buttock Splines: Vertical splines taken along constant longitudinal slices, encoding sectional curvature and hull fairness features along the beamwise direction.
These modalities capture complementary spatial features: the point cloud represents full surface geometry; the waterlines emphasize longitudinal distribution and fullness; and the buttocks capture curvature and fairness.
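A simplified version of the structured waterline descriptor can be sketched as follows: the function bins a normalized half-hull point cloud by draft and resamples each slice at fixed longitudinal stations, yielding a fixed-size grid of half-breadths. The real pipeline presumably intersects the surface with horizontal planes, so this nearest-band approximation, along with the grid sizes and function name, is an assumption:

```python
import numpy as np

def waterline_grid(points, n_wl=8, n_sta=32, tol=0.02):
    """Build an (n_wl, n_sta) grid of half-breadths y(x) at fixed drafts.

    Assumes points are normalized to the unit cube (x: length,
    y: half-breadth, z: draft). Rows without enough support are left
    at zero (cf. the validity masks mentioned in Section 4.1).
    """
    stations = np.linspace(0.0, 1.0, n_sta)
    grid = np.zeros((n_wl, n_sta))
    for i, z in enumerate(np.linspace(0.0, 1.0, n_wl)):
        band = points[np.abs(points[:, 2] - z) < tol]  # thin draft band
        if len(band) < 2:
            continue
        order = np.argsort(band[:, 0])                 # sort along length
        grid[i] = np.interp(stations, band[order, 0], band[order, 1])
    return grid
```

The buttock grid is obtained analogously by slicing along constant x and recording z as a function of y, so both structured modalities share the same fixed grid layout expected by the convolutional encoders.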
3.2. Multimodal Latent Representation Learning
To encode hull geometry into a compact, informative, and cross-representational format, a conditional multimodal autoencoder was developed. This model learns to map each of the three geometric modalities (point cloud, waterline grid, and buttock grid) into a shared latent space, as seen in
Figure 3, while ensuring that each modality can be accurately reconstructed from the same latent code.
Each modality is processed by a dedicated encoder tailored to its geometric structure. The unstructured point cloud is passed through a permutation-invariant network that aggregates local features into a global shape descriptor. The waterline and buttock sections are encoded using convolutional architectures that exploit the structured grid layout to learn spatial features over longitudinal and vertical slices.
In addition to the geometric data, the autoencoder is explicitly conditioned on a set of hull design parameters. These include normalized values of principal dimensions and form coefficients, representing global design intent. The conditioning vector is incorporated into each encoder branch, enabling the model to associate geometric variation with design specifications.
Latent representations from each encoder are fused through a lightweight aggregation module to produce a unified latent code. The decoder, conditioned on the same design vector, reconstructs all three modalities simultaneously from this latent embedding. Joint reconstruction from a single latent source enforces cross-modal consistency and ensures that the latent space captures an interpretable and coherent description of the hull form.
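A minimal PyTorch sketch of such an architecture is given below. The layer widths, the PointNet-style max-pooled point encoder, the shared grid CNN, and the single-linear decoders are assumptions for illustration rather than the authors' exact design:

```python
import torch
import torch.nn as nn

class MultimodalHullAE(nn.Module):
    """Sketch of a conditional multimodal hull autoencoder (assumed sizes)."""

    def __init__(self, cond_dim=12, latent_dim=128, grid_shape=(8, 32)):
        super().__init__()
        h, w = grid_shape
        # permutation-invariant point encoder: per-point MLP + max pool
        self.pc_enc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                    nn.Linear(64, 128), nn.ReLU())
        # shared CNN encoder for the structured waterline/buttock grids
        self.grid_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten())
        grid_feat = 32 * (h // 2) * (w // 2)
        # fuse the three modality features with the design condition
        self.fuse = nn.Linear(128 + 2 * grid_feat + cond_dim, latent_dim)
        # decoders reconstruct all modalities from the shared latent
        self.pc_dec = nn.Linear(latent_dim + cond_dim, 1024 * 3)
        self.wl_dec = nn.Linear(latent_dim + cond_dim, h * w)
        self.bt_dec = nn.Linear(latent_dim + cond_dim, h * w)
        self.grid_shape = grid_shape

    def forward(self, pc, wl, bt, cond):
        f_pc = self.pc_enc(pc).max(dim=1).values   # (B, 128) global feature
        f_wl = self.grid_enc(wl.unsqueeze(1))
        f_bt = self.grid_enc(bt.unsqueeze(1))
        z = self.fuse(torch.cat([f_pc, f_wl, f_bt, cond], dim=-1))
        zc = torch.cat([z, cond], dim=-1)          # condition the decoder too
        h, w = self.grid_shape
        return (self.pc_dec(zc).view(-1, 1024, 3),
                self.wl_dec(zc).view(-1, h, w),
                self.bt_dec(zc).view(-1, h, w), z)
```

Because all three decoders read the same latent code z, any information needed to reconstruct one modality must also be consistent with the others, which is the mechanism behind the cross-modal consistency claimed above.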
The model is trained end-to-end using modality-specific reconstruction losses, with adjustments to account for structured grid validity and representation-specific scales. Details of auxiliary objectives and weighting schemes are omitted here for brevity. Training was performed over an extended schedule with adaptive optimization and mixed precision acceleration. Throughout training, visual inspections and reconstruction diagnostics confirmed that the model consistently recovered high-fidelity geometric outputs across all representations.
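The reconstruction objectives can be sketched as a symmetric Chamfer distance on the point cloud plus a validity-masked error on the structured grids (consistent with the loss components named in Section 4.1; the weighting scheme used in the paper is not reproduced here):

```python
import torch

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a, b of shape (B, N, 3)."""
    d = torch.cdist(a, b)                       # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def masked_grid_mse(pred, target, mask):
    """MSE over valid grid cells only (mask = 1 where geometry exists)."""
    se = (pred - target) ** 2 * mask
    return se.sum() / mask.sum().clamp(min=1)
```

The mask term is what prevents the spline decoders from being penalized (or rewarded) for cells outside the hull envelope, which is how spurious geometry beyond the region of support is suppressed.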
3.3. Conditional Generative Hull Model
While the multimodal autoencoder provides a compact and physically structured latent representation of hull geometry, it is not inherently generative. To enable controlled synthesis of novel hull forms, a conditional DDPM was trained directly in the learned latent space of the autoencoder. The DDPM gradually learns to reverse a noise process to generate latent codes corresponding to plausible hull designs. This class of generative models has been shown to be particularly stable and effective in capturing multi-modal data distributions.
The generative process is conditioned using a 12-dimensional design vector representing a set of global hull parameters commonly used in preliminary ship design. These parameters provide high-level geometric constraints that guide the generative model during the sampling of the latent design space. While such global parameters cannot fully describe the detailed geometry of a ship hull, they provide an interpretable and physically meaningful conditioning mechanism that allows the model to produce hull forms consistent with fundamental naval architectural characteristics. These parameters act as conditioning variables rather than a complete geometric parametrization of the hull surface.
In this study, the parameter set was intentionally limited to a small number of widely used design variables in order to demonstrate the feasibility of the proposed representation-aware generative framework. The aim is therefore not to fully parametrize hull geometry, but rather to guide the generative model toward realistic regions of the design space. Future work will explore the integration of more detailed geometric descriptors and configuration-specific parameters, enabling finer control over local hull features such as bulbous bow variations and sectional geometry.
The 12 components of the conditioning vector are as follows:
L: Length
B: Breadth
D: Depth
T: Draft
CP: Prismatic coefficient
CM: Midship coefficient
CWP: Waterplane coefficient
LBP: Length between perpendiculars
LCF: Longitudinal centre of floatation
LCB: Longitudinal centre of buoyancy
KB: Vertical centre of buoyancy above the keel
Bulbous bow: Binary indicator of bulbous bow presence
These descriptors are computed for each training sample and normalized to zero mean and unit variance. During training, the DDPM is conditioned on these vectors, enabling it to generate latent codes tailored to specific hull configurations. The model employs a UNet-based architecture with sinusoidal timestep embeddings, trained for 1000 epochs with a batch size of 32. During inference, a desired condition vector is provided, and the DDPM generates a latent code, which is decoded by the trained autoencoder to produce point cloud and spline representations of a full hull form.
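The training objective of such a latent diffusion model can be sketched as follows, assuming a standard noise-prediction formulation with a cosine schedule; the `model` callable stands in for the conditional UNet, and all names and shapes are illustrative assumptions rather than the authors' implementation:

```python
import math
import torch
import torch.nn.functional as F

def cosine_alphas_cumprod(T, s=0.008):
    """Cumulative alpha-bar values for a cosine noise schedule
    (Nichol & Dhariwal); T is the number of diffusion steps."""
    t = torch.linspace(0, T, T + 1) / T
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return (f[1:] / f[0]).clamp(1e-5, 0.9999)

def ddpm_step_loss(model, z0, cond, alphas_cumprod):
    """One conditional denoising training step in the latent space.

    z0: clean autoencoder latents, cond: 12-dim design vectors.
    model(z_t, t, cond) is assumed to predict the injected noise.
    """
    B = z0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,))
    a_bar = alphas_cumprod[t].view(B, *([1] * (z0.dim() - 1)))
    noise = torch.randn_like(z0)
    z_t = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * noise  # forward process
    return F.mse_loss(model(z_t, t, cond), noise)         # denoising MSE
```

At inference, the same condition vector guides each reverse step, so the sampled latent (and hence the decoded hull) is steered toward the requested principal dimensions and form coefficients.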
4. Results
The intermediate representations produced by the model (point clouds and spline descriptors) are primarily used within the reconstruction pipeline to generate the hull surface; this section therefore focuses on the final reconstructed hull surface. The results demonstrate that the proposed representation-aware generative framework can successfully produce realistic and geometrically coherent ship hull forms in an early-stage design context, consistent with the objectives outlined in this study. By jointly learning from point cloud representations of the hull surface and structured waterline and buttock spline representations, the autoencoder captures both local surface geometry and global hull structure within a shared latent space.
All experiments were conducted using a GPU-accelerated deep learning framework on a single NVIDIA RTX 4070 Ti Super GPU with 32 GB of RAM. The dataset generation pipeline required approximately 12 h to process the complete set of hull geometries and extract the multimodal representations, including curvature-adaptive point clouds, waterline splines, and buttock splines. Training of the multimodal autoencoder required approximately 16 h to converge under the specified training configuration, while training of the conditional latent diffusion model required approximately 12 h. Once training is complete, inference is computationally efficient, with hull geometries generated within seconds. These computational requirements demonstrate that the proposed framework is compatible with practical early-stage ship design workflows, where rapid exploration of candidate hull forms is desirable. The reported training times correspond to the full dataset and training schedule used in this study and may vary depending on hardware configuration and dataset size.
After training, the conditional latent diffusion model is able to generate hull forms that satisfy specified principal dimensions with a high success rate. Qualitatively, the generated hulls appear smooth (more so than the training data) and realistic after reconstruction, capturing the typical features of feasible ship designs from the training data while also presenting new combinations of shape characteristics. Because the model operates in a learned shape space, even complex hull surface details are preserved, unlike simpler parametric models that might oversimplify geometry. The inclusion of form coefficients in the conditioning proves useful in guiding the fullness of the generated hull. These trends align with naval architectural intuition, indicating the model has learned meaningful relationships between the input parameters and the hull geometry.
4.1. Multimodal Autoencoder Performance
Figure 4 shows the training loss curve of the multimodal autoencoder over the course of the 195 training epochs. The curve exhibits a smooth and monotonic decline, indicating stable convergence without signs of overfitting or divergence. The gradual flattening of the curve after approximately 50 epochs suggests that the model reaches a saturation point in its reconstruction capability, where further improvement becomes marginal. This behavior is consistent with typical learning dynamics in point-based and grid-based geometric encoders, where initial gains stem from coarse structural learning, followed by refinement of local detail. The absence of sudden spikes or oscillations further confirms the numerical stability of the training process, attributed in part to effective data normalization and the use of a well-balanced loss function comprising Chamfer Distance and masked grid errors. Overall, the loss trend validates the efficacy of the network architecture and training configuration for capturing and reconstructing diverse hull geometries.
Reconstruction results indicate that each geometric modality is recovered with high fidelity.
Figure 5 illustrates representative point cloud, waterline, and buttock curve reconstructions produced by the autoencoder. The reconstructed point clouds preserve the overall hull geometry while retaining fine-scale surface features, particularly in regions such as the bow, bilge, and stern. The reconstructed waterlines maintain smooth longitudinal variation and preserve key hull characteristics such as beam progression, flare behavior, and fullness distribution along the draft. No oscillatory artifacts or discontinuities are observed, indicating that the spline decoders successfully learn fair hull geometry. The buttock curve reconstructions retain correct keel rise, bilge curvature, and longitudinal fairness. The predicted validity masks accurately delineate regions of geometric support, ensuring that reconstructed buttocks do not introduce spurious geometry outside the hull envelope. An important observation is that the reconstructed point clouds, waterlines, and buttocks describe compatible hull geometries. This cross-modality consistency confirms that the shared latent representation encodes a unified geometric description rather than independent modality-specific features, as intended in the representation-aware training strategy.
4.2. Multimodal DDPM Performance
Figure 6 presents the training and validation loss curves for the conditional latent diffusion model, expressed in terms of mean squared error (MSE) on noise (epsilon) prediction. Both curves exhibit a steady and consistent decrease over the full course of 1000 training epochs, confirming the model’s ability to learn the reverse denoising process effectively. The close alignment between training and validation losses throughout training indicates strong generalization and the absence of overfitting. The steep initial decline suggests rapid learning of coarse generative structure within the latent space, while the more gradual slope in later epochs reflects the fine-tuning of residual noise components. The final validation loss converges to approximately 0.17, demonstrating that the model accurately approximates the injected noise under the adopted prediction objective. These trends validate the suitability of the latent diffusion architecture and the stability of the cosine noise schedule used during training.
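As a rough illustration of the objective being minimized above, the sketch below pairs a cosine cumulative-signal schedule with an epsilon-prediction MSE in latent space. Here `eps_model` is a stand-in for the actual latent denoising network, and all names and the schedule offset `s` are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def cosine_alpha_bar(t, T, s=0.008):
    # Cumulative signal fraction alpha_bar(t) for a cosine noise schedule:
    # equals 1 at t = 0 and decays smoothly toward 0 as t approaches T.
    f = lambda u: np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)

def diffusion_loss(z0, eps, t, T, eps_model):
    # Epsilon-prediction MSE: corrupt the clean latent z0 to step t,
    # then score how well the model recovers the injected noise eps.
    ab = cosine_alpha_bar(t, T)
    zt = np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * eps   # forward diffusion sample
    return np.mean((eps_model(zt, t) - eps) ** 2)
```

A model that perfectly recovers the injected noise drives this loss to zero, so a validation plateau around 0.17, as reported above, reflects the residual unpredictability of the noise rather than a training failure.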
The generated bulk carrier shown in
Figure 7 and
Table 1 exhibits a smooth and realistic geometry characteristic of full-form cargo vessels. The model successfully captures the volumetric fullness typical of bulk carriers, with a broad midbody, flat bottom sections, and gently sloped entrance angles. Waterline contours display minimal flare, while buttock lines are largely vertical throughout the midship region, yielding a robust and boxy cross-sectional profile. The surface reconstruction is watertight and captures the principal geometric characteristics of the generated hull forms. The overall output demonstrates a high degree of plausibility, underscoring the model’s proficiency in generating stable and operationally credible cargo hulls.
The hull form of the generated tanker shown in
Figure 8 and
Table 2 is notably smooth, symmetric, and highly realistic in appearance. The generated hull features a long parallel midbody, bulbous fullness at the fore and aft, and a broad transom, all hallmarks of conventional tanker design. Waterline and buttock curves exhibit gradual transitions and a balanced displacement distribution, without discontinuities or distortions. The point cloud supports fine detailing along the bilge radius and stern curvature, leading to a high-fidelity surface mesh. This result highlights the model’s capability to internalize and reproduce large-volume hull forms with convincing geometric consistency.
The generated ferry shown in
Figure 9 and
Table 3 exhibits several defining characteristics typical of short-route passenger or vehicle ferries, such as a broad beam, flat-bottomed sections, and a near-vertical stern profile. The model is able to capture the general proportions and layout expected for this vessel class; however, minor geometric deformations are visible, particularly along the stern curvature and upper hull line. These deviations may result from limited representation of ferry-type hulls within the training dataset, leading to less precise generalization. Despite these irregularities, the surface reconstruction remains continuous and watertight, and the spline contours display adequate fairness. This example demonstrates the model’s ability to approximate the functional geometry of ferries, while also highlighting the need for broader dataset diversity to enhance fidelity in less common vessel types.
The generated DTMB 5415-like hull shown in
Figure 10 and
Table 4 exhibits the general proportions and slenderness characteristic of high-speed naval vessels, including a fine entry, moderate beam-to-draft ratio, and a pronounced tapering stern. While the overall hull form reflects the expected hydrodynamic features of the parent hull, the model shows limited accuracy in reproducing the distinct bulbous bow geometry. This is attributed to the relative scarcity of DTMB-type hulls in the training dataset, which limits the model’s exposure to such specialized naval configurations. Nonetheless, the spline sections remain smooth and coherent, and the surface reconstruction yields a watertight mesh. This behavior highlights the importance of dataset diversity in generative modeling and suggests that the inclusion of additional hull forms with comparable bow characteristics could further improve the fidelity of generated results.
A key strength of the approach is the ability to explore the design space under constraints. Conditioning on global geometric and hydrostatic parameters led to stable and repeatable generation behavior. Hulls generated with similar conditioning vectors exhibit similar principal dimensions and overall form characteristics, while variations in conditioning variables produce corresponding and interpretable changes in hull geometry. This confirms that the model learned meaningful relationships between high-level design parameters and detailed geometric outcomes, rather than treating conditioning inputs as auxiliary noise.
By conditioning on global geometric and hydrostatic parameters, naval architects are able to fix some parameters and vary others to see a range of hull options. For instance, holding L, B, and T constant but sweeping through form coefficient values yields a family of hull shapes from slender to voluminous, all within the same principal dimensions. This can aid in investigating the impact of hull fullness on performance metrics like resistance or stability. Moreover, because the diffusion model inherently learned from examples with viable performance, the generated hulls are likely to be high-performing or easily tunable. Previous research has noted that generative models can find shape variants with significantly reduced drag or improved displacement efficiency [
18]. The conditional model extends this capability by ensuring those performance-improved variants also meet designers’ predefined size constraints. In effect, it automates the generation of optimized hull candidates for given design specifications, which can then be evaluated or fine-tuned with high-fidelity tools as needed.
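The sweep described above can be sketched as the construction of a family of conditioning vectors in which the principal dimensions are held fixed while a fullness coefficient varies. The `[L, B, T, C_B]` layout, the min-max normalization, and the dataset bounds used here are all illustrative assumptions; the paper's actual conditioning vector and preprocessing may differ.

```python
import numpy as np

def make_condition(L, B, T, cb, bounds):
    # Min-max normalize each design parameter to [0, 1] using assumed
    # dataset bounds; layout [L, B, T, C_B] is hypothetical.
    raw = np.array([L, B, T, cb])
    lo, hi = bounds
    return (raw - lo) / (hi - lo)

def fullness_sweep(L, B, T, cb_values, bounds):
    # Fix principal dimensions, sweep the fullness coefficient:
    # one conditioning vector per C_B value.
    return np.stack([make_condition(L, B, T, cb, bounds) for cb in cb_values])
```

Each row of the resulting array would then be passed to the conditional sampler to generate one hull of the family, so a single sweep yields directly comparable designs differing only in fullness.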
5. Discussion
The results demonstrate that the proposed representation-aware generative framework is capable of producing realistic and geometrically coherent ship hull forms in an early-stage design context. By integrating point clouds with structured waterline and buttock spline representations, the autoencoder learns a more complete and robust description of hull geometry than any single representation could provide in isolation. Each modality captures complementary aspects of the hull. Point clouds are effective at representing local surface detail, while waterline and buttock splines encode global hull structure and longitudinal–transverse consistency in a form that closely aligns with traditional naval architectural practice.
While the reconstructed hull surfaces capture the primary geometric characteristics of the generated designs, the resulting geometries should be interpreted as conceptual outputs intended for early-stage design exploration. As with many data-driven geometry generation approaches, additional fairing or smoothing may be required before the hulls can be used directly for high-fidelity hydrodynamic simulations or detailed naval architectural analysis. In practical ship design workflows, such geometries would typically undergo further refinement and validation before downstream performance calculations are performed.
In practice, the multimodal formulation was found to improve robustness during both reconstruction and generative sampling. Even when the point cloud reconstruction exhibited minor local inaccuracies, the waterline and buttock splines preserved the overall hull envelope and global proportions. Conversely, when spline reconstructions were locally sparse or incomplete, the point cloud retained detailed surface information. This redundancy constrains the shared latent space to encode geometry that is simultaneously consistent across structured and unstructured representations, reducing the likelihood of latent codes that decode to implausible or distorted hull forms. This property is particularly beneficial when a diffusion model is trained in the learned latent space, as it limits the generation of pathological latent samples that would otherwise produce noisy or physically unrealistic geometry.
The multimodal geometric approach adopted here contrasts with purely parametric or coefficient-based generative models, which are efficient at exploring variations in principal dimensions but often struggle to capture subtle geometric features such as local curvature transitions, bilge shape, or bow refinement. By directly operating on geometric representations, these nuanced features are inherently embedded in the learned latent space rather than approximated indirectly through a small set of parameters.
A key practical advantage of the framework is the ability to condition hull generation on global geometric and hydrostatic parameters, such as length, breadth, depth, and form coefficients. Conditioning ensures that generated hulls are not merely random samples from the learned distribution, but are instead aligned with explicit design intent. This allows naval architects to request hulls of approximately a given size and type and obtain geometries that respect those constraints. Earlier generative studies based on GANs or VAEs have often relied on weak or absent conditioning, leading to outputs that require significant manual filtering. In contrast, conditioning directly embeds design requirements into the generative process.
From a generative modeling perspective, the diffusion-based approach employed on top of the learned latent space offers several advantages over GAN-based alternatives. Diffusion models are generally more stable to train and less prone to mode collapse, which is particularly important when modeling high-dimensional geometric spaces. Although diffusion sampling incurs higher computational cost than direct GAN sampling, generation times on the order of seconds per hull remain acceptable for conceptual design exploration. Moreover, diffusion models naturally accommodate conditioning by incorporating conditioning vectors at each denoising step, which proved effective in guiding hull generation toward desired geometric characteristics.
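The idea of incorporating the conditioning vector at each denoising step, as noted above, can be sketched as a deterministic (DDIM-style, eta = 0) reverse loop in which the noise predictor receives the condition at every step. Everything here is a placeholder: `eps_model`, the schedule array, and the deterministic update are illustrative assumptions rather than the paper's sampler.

```python
import numpy as np

def conditional_denoise(z_T, cond, eps_model, alpha_bars):
    # Reverse diffusion over a precomputed alpha_bar schedule; the condition
    # vector `cond` is passed to the noise predictor at every step.
    z = z_T
    T = len(alpha_bars)
    for t in range(T - 1, 0, -1):
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = eps_model(z, t, cond)                      # conditioning at each step
        z0_hat = (z - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)  # predicted clean latent
        z = np.sqrt(ab_prev) * z0_hat + np.sqrt(1.0 - ab_prev) * eps  # deterministic update
    return z
```

Because the condition re-enters at every step, the sample is steered toward the requested design characteristics throughout the trajectory rather than only at initialization, which is one way to read the guidance behavior reported above.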
While autoencoders alone provide a compact latent representation, they are not inherently generative. By augmenting the autoencoder with a diffusion model operating in latent space, the framework transitions from a compression mechanism to a fully generative system capable of producing novel hull forms. Compared to simpler probabilistic models, such as Gaussian mixture models operating in parameter space, the proposed approach offers greater expressive power to capture nonlinear relationships between design variables and geometry. Importantly, any latent sampled by the diffusion model decodes to a valid geometric representation in terms of continuity and coherence, even though functional validity (e.g., resistance or stability) is not explicitly enforced at this stage.
Limitations
Despite the promising results, several limitations were identified in the current methodology. First, the diversity and representativeness of the training dataset impose inherent constraints on generalization. The dataset is derived from systematic variations of a finite set of parent hull geometries, which introduces the possibility that certain geometric characteristics of the parent forms may influence the learned design space. Hull types that are under-represented, such as high-speed naval forms or specialized ferries, exhibit reduced fidelity in the generated outputs, particularly in localized geometric features such as bulbous bows or superstructure transitions. Second, the current pipeline is designed exclusively for monohull geometries and does not accommodate multihull configurations such as catamarans or trimarans, which require fundamentally different topological handling and connectivity assumptions. Third, while the multimodal representation (spline grids and point clouds) enables rich geometric encoding, it assumes hulls to be symmetric and watertight, which may not hold in certain asymmetric or damaged scenarios. Additionally, the conditioning mechanism is limited to a fixed set of scalar design parameters and does not yet incorporate operational or performance data such as resistance estimates or seakeeping metrics, which could further refine generative control. Finally, reconstruction quality remains sensitive to the resolution of the spline grid and sampling density, with overly coarse configurations leading to loss of geometric detail. These limitations motivate future extensions in dataset scope, topology-aware representations, and performance-integrated conditioning frameworks.
Overall, the multimodal, representation-aware approach aligns with the broader trend of using generative artificial intelligence as a design assistant rather than a replacement for human expertise. The intent of the framework is to support rapid exploration of the design space, enabling naval architects to generate, visualise, and assess multiple candidate hull forms in a fraction of the time required by manual modeling. Within this scope, the results demonstrate that the proposed method provides a meaningful enhancement to the early-stage ship design process, offering both geometric fidelity and controllable diversity while remaining compatible with established naval architectural workflows.
6. Conclusions
This paper builds directly upon the review by Htein et al. [
1], which identified generalizability, conditional controllability, and physics awareness as key unmet needs in data-driven ship hull generation. Motivated by these gaps, the present work proposed and evaluated a representation-aware generative modeling framework that explicitly integrates multiple geometric descriptions of the hull form. By jointly learning from point clouds, waterline splines, and buttock splines, the framework provides a more holistic and structured understanding of hull geometry than single-representation approaches.
The core contribution of this study is the integration of a multimodal autoencoder with a conditional diffusion model operating in the learned shared latent space. Unlike earlier generative approaches such as ShipGen [
16] or ShipHullGAN [
15], which rely on a single representation, the proposed method embeds several complementary views of the hull simultaneously. As demonstrated by the consistent reconstructions across modalities (
Figure 5), the learned latent space encodes both detailed surface geometry and global hull structure in a unified form, improving the fidelity and robustness of generated designs [
1].
The results show that point-based and spline-based geometric representations can be encoded into a compact latent vector without significant loss of information. The autoencoder provides a deterministic mapping between latent space and full three-dimensional hull geometry, ensuring that any generated latent sample can be decoded into a coherent hull form. This property is evidenced by the strong agreement between reconstructed point clouds and reconstructed spline representations (
Figure 5), which jointly describe consistent hull envelopes and sectional geometry.
Decoding the reconstructed point clouds into surface meshes further confirms the geometric coherence of the learned representation. As shown in
Figure 7 and
Figure 8, the generated meshes are smooth and watertight, despite surface connectivity not being explicitly enforced during training. This demonstrates that the latent space preserves sufficient geometric structure to support continuous surface reconstruction, an important requirement for downstream naval architectural analysis.
By adapting denoising diffusion models to operate within this multimodal latent space and conditioning generation on naval architectural metadata, the framework enables controlled synthesis of new hull forms aligned with specified design intent. The diffusion model exhibits stable generation behavior and strong coverage of the learned design space, avoiding the mode collapse issues often encountered in GAN-based approaches. Conditioning ensures that generated hulls respect principal dimensions and form characteristics, addressing a key limitation of earlier unconditioned generative models [
1].
From a design perspective, the proposed pipeline enables rapid generation of families of hull form alternatives for a given set of requirements. This capability has clear implications for early-stage ship design, where designers must explore a broad solution space under limited information and tight timelines. By automating the synthesis of geometrically plausible hulls, the framework can significantly shorten design iteration cycles and support creative exploration beyond what is feasible through manual manipulation of traditional hull lines.
Generally, this work demonstrates how representation-aware learning can bridge the gap between data-driven generative models and established naval architectural practice. Rather than treating hull generation as a purely parametric or optimization-driven task, the approach embraces data-driven creativity while remaining grounded in physically meaningful geometric representations. In doing so, it addresses several of the future research directions identified in the earlier review [
1].
Building on this foundation, future research can explore several avenues:
Incorporating performance objectives: integrating a resistance or seakeeping estimator into the training loop, to directly bias the generator toward high-performance designs (similar to what was attempted in ShipGen [
16], but perhaps with more conservative targets to ensure feasibility).
Higher-resolution representations: using patch-based or multi-scale decoders to capture more detailed geometry, which could allow the inclusion of features like keels, rudders, or internal structure.
Interactive design tools: developing a user interface on top of this model where a designer can intuitively adjust parameters and see the hull shape update in real-time, possibly with the ability to “morph” between generated options.
Generalization to other vessel types: training on more diverse datasets to handle multihulls, planing hulls, or submarines, and examining how well the latent space can generalize to radically different forms.
Hybrid approaches: combining generative models with traditional CAD modeling. For instance, using a generated hull as a starting point that can be further refined in a CAD software by a naval architect, blending AI suggestions with human expertise.
The encouraging results of this study suggest that representation-aware generative modeling is a promising pathway for marine design. As data availability grows and computational methods advance, it is anticipated that such tools will become integral in the design process, enabling rapid prototyping of concepts, informed decision-making through data, and ultimately more efficient and imaginative ship designs. This aligns with the broader trend of AI-driven methodologies enhancing the efficiency and innovation of ship design practices. By automating the generation of viable hull forms, naval architects can refocus efforts on higher-level design decisions and performance evaluations, thereby streamlining the journey from concept to viable ship. The integration of domain knowledge with generative AI, as exemplified by this work, paves the way for a new era of computer-aided ship design that is fast, intelligent, and reliable.
7. Use of AI Tools
The authors used ChatGPT 5.1 (OpenAI) solely for language editing and phrasing improvements during the preparation of this manuscript. The tool was employed to assist in refining English grammar, sentence structure, and readability. The scientific content, methodology, data analysis, and conclusions were developed entirely by the authors. All outputs generated by the software were critically reviewed and verified by the authors.