Article

Product Image Generation Method Based on Morphological Optimization and Image Style Transfer

1 School of Architecture and Art Design, Lanzhou University of Technology, Lanzhou 730050, China
2 School of Creative Design, Guilin Institute of Information Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(13), 7330; https://doi.org/10.3390/app15137330
Submission received: 12 May 2025 / Revised: 19 June 2025 / Accepted: 26 June 2025 / Published: 30 June 2025

Abstract

In order to improve the controllability and esthetics of product image generation, from the perspective of design, this study proposes a product image generation method based on morphological optimization, esthetic evaluation, and style transfer. Firstly, based on computational esthetics and principles of visual perception, an esthetic comprehensive evaluation model is constructed and used as the fitness function. The genetic algorithm is employed to build a product morphological optimization design system, obtaining product form schemes with higher esthetic quality. Then, an automobile front-end image dataset is constructed, and a generative adversarial network model is trained. Using the aforementioned product form scheme as the content image and selecting automobile front-end images from the market as the target style image, the content features and style features are extracted by the encoder and input into the generator to generate style-transferred images. The discriminator is utilized for judgment, and through iterative optimization, product image schemes that meet the target style are obtained. Experimental results demonstrate that the model generates product images with good effects, showcasing the feasibility of the method and providing robust technical support for intelligent product image design.

1. Introduction

In recent years, Artificial Intelligence-Generated Content (AIGC) technology has undergone explosive growth. The launch of popular AIGC tools, such as Sora, Midjourney, and Adobe Firefly, has sparked a significant creative revolution across diverse fields, including design, art, education, and entertainment. This marks the transition of AI from the 1.0 era to the 2.0 era, fundamentally altering the future trajectory of the design field [1]. AIGC leverages technologies such as large-scale data learning, pre-trained models, and generative algorithms to autonomously create innovative and unique image concepts, thereby significantly boosting the efficiency and creative capabilities of design. It exhibits extensive potential applications and significant importance in the realm of product design. Although AIGC can automatically produce refined images with specific visual semantics from content images and a handful of textual cues, it encounters difficulties in organizing textual cues and detecting subtle changes in textual semantics, so that small changes in a prompt can yield vastly different image content. In contrast, image style transfer (IST) employs content images as references for content and style images as references for style. IST demonstrates greater predictability in generating images and fulfills the basic requirements for enhancing the esthetic appeal of existing images.
Style transfer techniques are primarily grounded in neural networks and deep learning algorithms. During the style transfer process, the style representation encapsulates features such as texture and color within an image, whereas the content representation primarily concerns itself with the form and structure of objects. By extracting and comparing these two representations, the style of one image can be seamlessly transferred to the content of another. Traditional research in the field of style transfer has centered around knowledge representation and extraction, aiming to minimize computational complexity and memory usage while achieving versatile and customizable IST.
In the realm of product form design, generating a multitude of creative and innovative form proposals is of paramount importance. Consequently, there exists significant demand for innovation in content imagery. The product form optimization process involves converging multiple design solutions into an optimal final design. This begins with thorough analysis of user needs to establish clear, specific, and actionable evaluation criteria. These criteria are then integrated as constraints to guide, screen, and optimize design solutions, ensuring alignment with design objectives. Throughout this iterative process, promising solutions are retained and progressively refined until convergence is achieved. The evolutionary nature of this process—where solutions undergo selection and improvement—parallels biological evolution. Inspired by this principle, we employ a genetic algorithm (GA) for product form optimization. Within this GA framework, the esthetic index serves as the fitness function to evaluate and select solution candidates. This methodology ultimately facilitates optimization and innovation in product form design. To fulfill the demands of product form design and bolster the controllability and esthetic appeal of content imagery, this study introduces a method for generating product images through morphological optimization, esthetic evaluation, and style transfer.

2. Literature Review

2.1. Intelligent Design of Product Form

The integration of evolutionary algorithms into product form design for the generation of innovative solutions has emerged as a focal point in product design research. Frequently utilized evolutionary algorithms encompass genetic algorithms [2], Strength Pareto Evolutionary Algorithm 2 (SPEA2) [3], particle swarm optimization [4], and artificial fish swarm algorithms [5]. Presently, fitness evaluation in evolutionary design systems primarily relies on methods such as manual evaluation and imagery evaluation models. However, the inherent subjectivity of manual evaluation is apparent, as individual preferences exert a substantial influence on design outcomes. Prolonged and repetitive assessments can induce user fatigue, ultimately undermining the system’s practicality and precision. Conversely, imagery evaluation models, which are grounded in quantitative data of morphological elements and cognitive imagery information measurements, offer advantages such as swift evaluation speeds, high efficiency, and the elimination of manual intervention. Traditional product imagery evaluation models typically establish a mapping model with “design features” as input and “sensory imagery” as output. Research on design features primarily focuses on the interconnected nodes of key curves in products, modeling units, or functional components. Sensory imagery extraction commonly utilizes introspective analysis methods drawn from psychology, typically obtained through questionnaire surveys. Common techniques employed for constructing product imagery prediction models encompass neural networks, support vector machines, and fuzzy theory. The aforementioned evaluation methods primarily focus on assessing product imagery, whereas utilizing esthetic evaluation as a fitness function facilitates the system’s selection of products with superior esthetic quality, thereby better aligning with individuals’ esthetic expectations and demands. Nevertheless, additional research on this subject remains imperative.

2.2. Aesthetic Evaluation of Product Form

As living standards improve, individuals increasingly pursue not only material enjoyment but also a higher quality of life. Products with esthetic attributes resonate deeply with consumers’ inner emotions, fulfilling their emotional desires. The esthetic appeal of product design has the power to influence people’s thoughts, moods, and emotional development. Consequently, excellent product design should also embody an esthetically pleasing appearance. The evaluation of product esthetics centers on users’ cognitive activities, involving comparisons, assessments, and judgments of product appearance and esthetic evaluation based on their individual esthetic interests, concepts, and preferences [6]. These evaluation results serve as a crucial basis for purchase decision-making. Traditional studies of esthetics have primarily focused on philosophical discussions, viewing esthetics as implicit knowledge that is difficult to articulate and considering it subjective, arbitrary, and ambiguous. However, with the emergence of modern esthetics and advancements in science and technology, computational esthetics has rapidly emerged as a research focal point in the fields of digital industrial design and art design. Computational esthetics objectively and quantitatively constructs the relationship between the formal characteristics of objects and esthetic experience, transforming esthetics from implicit to explicit knowledge, thereby offering a novel methodology and approach for esthetics research.
In recent years, significant advancements have been made in the research on esthetic evaluation of product forms, particularly in the realms of esthetic knowledge representation, application patterns, experimental methodologies, and modeling techniques. Zhang et al. [7] formulated three esthetic index formulas for balance, contrast, and harmony according to esthetic principles. Utilizing linear regression, they established an esthetic evaluation model tailored for interfaces. Meanwhile, Lo [8] constructed formulas for product form esthetic indexes such as symmetry, simplicity, and cohesion. Deng et al. [9] adopted factor analysis to extract six primary esthetic indexes from interfaces, using the variance contribution rates of the indexes as weights to construct a comprehensive esthetic evaluation model. Lugo et al. [10] analyzed curve organizational characteristics and, guided by Gestalt principles, established six esthetic index formulas utilizing vector products: proximity, continuity, closure, symmetry, directionality, and similarity. Valencia et al. [11] took a water bottle as an example and, based on Gestalt principles, proposed calculation methods for three indexes: symmetry, similarity, and continuity. They also explored the esthetic utility of these indicators through discrete-choice experiments in virtual reality. Ngo et al. [12] simplified the graphic elements in the interface to rectangles and constructed formulas for 13 esthetic indexes, applying computational esthetics methods to quantify the esthetic appeal of image elements. Zhou et al. [13] formulated 15 esthetic index formulas based on esthetic principles and Gestalt principles and established a comprehensive esthetic evaluation model for product form using principal component analysis. These aforementioned studies have provided valuable insights into the research on esthetic evaluation of products. However, they primarily concentrate on the representation of relatively straightforward indexes, resulting in an esthetic index system that lacks comprehensiveness. For some of these esthetic indexes with complex connotations, due to their characteristics of polysemy, fuzziness, and uncertainty, it is difficult to represent them accurately. Current research in this area is still inadequate. Therefore, it is necessary to conduct in-depth research on esthetic measure indexes with complex connotations.

2.3. Image Style Transfer

IST entails reimagining the content of a specified image within the stylistic confines of another, marking it as a focal point of research in artificial intelligence and computer vision. A myriad of IST techniques are in widespread use both domestically and internationally; they can be categorized into traditional IST and neural network-based IST, distinguished by their implementation methodologies and theoretical frameworks. Traditional methods emphasize manual feature extraction and mathematical optimization, exemplified by stroke-based rendering (SBR) [14], image analogy [15], image filtering [16], and IST based on the idea of texture synthesis [17]. While these traditional methods are comparatively straightforward and intelligible, they face constraints in terms of controllability, efficacy, and speed. As deep learning has evolved and been applied, neural network-based IST [18,19] has emerged as a pivotal direction in style transfer research. This technology facilitates the direct creation of realistic artistic styles, enhancing production efficiency without requiring creators to possess specific skills or experience. Presently, numerous scholars are leveraging image encoding for multi-threaded complex tasks in intelligent generation endeavors, including image generation (such as handwritten digits, faces, and indoor scenes [20]), video generation [21], and text-to-image synthesis [22]. Chen et al. [23] successfully applied style transfer algorithms to the development and design of lacquerware creative products, achieving this through a process involving the collection of lacquerware-related techniques, simulation calculations, demo testing, and the utilization of graphic software for secondary overlay effects. Liu et al. [24] incorporated two generative adversarial network (GAN) modules for the design and generation of chair images: the first GAN generated chair images, while the second enhanced their resolution. However, owing to the diverse viewpoints of the collected chair images, the resultant concept images lacked intricate details. Deng et al. [25] proposed a method for swiftly generating product renderings using StyleGAN in conjunction with sketching techniques, leveraging image deformation to create realistic product renderings at various levels, grounded on product sketches. Dai et al. [26], on the other hand, utilized a GAN to extract features from existing product images and subsequently generated new design schemes based on these extracted features. By further processing the sketches in reverse, they could transform them into high-quality color design schemes.
In recent years, diffusion models have garnered considerable interest from both academic and industrial communities within the generative imaging domain owing to their distinctive forward and reverse stochastic diffusion mechanisms. The Stable Diffusion (SD) framework, introduced by Stability AI [27], has facilitated the democratization of image synthesis technology through its exceptional output fidelity and operational accessibility.
The superior generative performance of diffusion models arises from two key attributes: (1) their training paradigm eliminates adversarial optimization, thereby mitigating training instability and complexity; (2) inherent support for multimodal inputs enables precise semantic alignment between conditional data and generated outputs.
Historically, generative adversarial networks (GANs) dominated computer vision research through their capacity to synthesize photorealistic images. Fundamentally, GANs leverage adversarial training—where generator–discriminator dynamics drive the learning of high-fidelity samples with rich detail. Their architectural flexibility facilitates the integration of diverse network structures and conditioning mechanisms for controlled synthesis.
However, GANs exhibit critical limitations: (i) simultaneous optimization of dual networks induces training instability and convergence challenges; (ii) dependence on random noise sampling inherently constrains output novelty to variations within the training data distribution, limiting generative innovation.
The image authenticity and detail generation capabilities of diffusion models surpass those of generative adversarial network (GAN) models. Consequently, the focus of generated image detection research is progressively shifting towards diffusion models. Compared to GANs, diffusion models demonstrate substantial advancements in generation quality and controllability. Specifically, the underlying principle of diffusion models involves a forward noising process followed by a reverse denoising process. Through this mechanism, neural networks learn to synthesize images by iteratively denoising initially random noise inputs. During reverse reconstruction, the network acquires representations of the global structure and fine-grained details of images, preserving output diversity.
Furthermore, diffusion models offer enhanced controllability over generated content. This is particularly evident in their integration with cross-modal techniques, such as CLIP, enabling significant breakthroughs in text-to-image synthesis and overcoming limitations inherent in prior generative approaches.
However, as diffusion models have only recently gained prominence in image generation, associated detection methodologies remain underdeveloped. Key limitations include the following: Firstly, the iterative nature of the diffusion process demands considerable computational resources, resulting in comparatively slower generation speeds, though subsequent model iterations have shown improvements. Secondly, current diffusion models exhibit content generation constraints. For instance, the Stable Diffusion (SD) model, while a leading example in AI-generated art, is primarily effective for specific portrait domains or artistic styles. Generating content in broader or novel domains typically necessitates model retraining or fine-tuning.
When it comes to generating realistic product renderings from contour images, the full potential of style transfer technology has yet to be fully harnessed. Presently, the majority of researchers primarily concentrate on mere alterations in image color and texture through the application of style transfer technology, while overlooking the significance of form modification. Nevertheless, in particular scenarios, designers aspire not only for the model to facilitate style transformations between contour images and actual product images, but also for it to incorporate suitable image deformations, thereby enabling the creation of product design renderings at various levels using design contour images.
To tackle the aforementioned issues, this paper introduces an intelligent image design methodology grounded in esthetic evaluation, contour optimization, and style transfer, all from the vantage point of product form design. The primary contributions of this paper can be summarized as follows:
1. We construct a model aimed at optimizing product forms. By utilizing Bézier curves to delineate the contours of the products, we employ an esthetic comprehensive evaluation model as the fitness function. Subsequently, genetic algorithms are leveraged to refine the product forms, ultimately yielding esthetically superior solutions.
2. We employ genetic algorithms for morphological optimization. Initially, an initial population is established by encoding key morphological parameters of the target product. Through iterative application of stochastic genetic operators (selection, crossover, and mutation), successive generations are generated. An esthetic index serves as the objective fitness function, quantifying individual fitness values within each population. This cyclic process of fitness evaluation, operator application, and population renewal continues until convergence criteria are satisfied. Systematic evaluation of algorithmic outputs enables identification of Pareto-optimal solutions, ultimately achieving computational optimization and innovative morphological generation.
3. We construct a GAN model for IST. Utilizing web crawling techniques, we compile an image dataset comprising car front faces to train the GAN model. The product form solutions serve as content images, while car front-face images from the market are chosen as the target style images. The encoder is responsible for extracting both content and style features. These features are then fed into the generator to produce style-transferred images. Ultimately, the discriminator assesses whether the generated image's style aligns with the target image. Through iterative optimization, we obtain product form solutions that are consistent with the desired target style.
The remainder of this paper is structured as follows. Section 3 presents the product form optimization method, which is grounded in genetic algorithms and esthetic evaluation. Section 4 delves into the product form IST method, specifically utilizing a GAN. Section 5 centers on the design of automobile front-face images as a case study of the proposed method. Lastly, Section 6 outlines the conclusions of this study and suggests potential avenues for future research.

3. Optimization Design Model of Product Form Based on Genetic Algorithm and Esthetic Evaluation

3.1. A Morphological Optimization Design Model Based on the Genetic Algorithm

The process of product form design involves the concrete realization of stages such as user demand exploration and scheme presentation by designers. From the perspective of knowledge flow theory, it constitutes a micro-process entailing the deconstruction and reconstruction of knowledge information. The standard genetic algorithm is goal-oriented and constructs fitness functions to evaluate the population. Through processes such as initial population evaluation and genetic operations, it selects the individuals with the best fitness values, which are deemed the optimal solutions to the problem [28]. The single-parent genetic algorithm represents a special instance of the standard genetic algorithm, wherein genetic operations are initially performed on a single individual, namely the target product sample to be optimized. In this study, an esthetic comprehensive evaluation model is employed to compute the fitness of individuals within the population. The flowchart illustrating the morphological optimization design process is depicted in Figure 1.
Gene encoding involves analyzing the styling elements and key control points of superior product samples and subsequently decomposing them into morphological components. The key morphological components of the product specimen delineate the target region. The coordinates of these key control points represent the product’s form and offer a quantitative description of the product sample.
The design parameter set for the product sample is defined as $C = \{c_1, c_2, \ldots, c_g\}$, where each design parameter $c_g$ takes values in the range $[V_g, U_g]$ ($U_g$ denoting the maximum value and $V_g$ the minimum value of the design parameter). Subsequently, binary encoding is employed to encode the product sample.
In order to realize the optimization of the genetic algorithm, this study uses a binary coding scheme to discretize continuous parameters within predefined boundaries, mainly based on the following reasons:
(1) Compatibility: Binary encoding, as a classical implementation of the genetic algorithm, can directly apply standard crossover and mutation operators;
(2) Scalability: The coordinate values of each control point are mapped from bounded continuous intervals (e.g., $X \in [X_{\min}, X_{\max}]$, $Y \in [Y_{\min}, Y_{\max}]$) to binary segments, which supports high-precision parameter optimization;
(3) Efficiency advantage: Binary coding can effectively balance the scale of the bounded discrete search space and computational efficiency in morphological optimization.
Based on this coding scheme, the initial population is first generated in the discrete parameter space within the defined boundaries by random sampling, and then a new design scheme is generated by iterative optimization in the bounded discrete search space through genetic operations such as selection, crossover, and mutation.
System parameter initialization involves the setting of parameters for morphological optimization design, including the initial population size, number of iterations, crossover probability, and mutation probability. The initial population pertains to the quantity of product samples during the first iteration. For morphological optimization problems, the size of the initial population primarily hinges on the complexity of the morphological design features or parameter sets. Typically, the population size falls within a range of 10 to 160. The determination of the number of iterations is generally based on the convergence curve obtained through numerous experimental trials. The crossover probability and mutation probability are often set according to the diversity of product solutions. A higher crossover probability augments the algorithm’s capability to search for novel design solutions, whereas a lower mutation probability facilitates the inheritance and reproduction of advantageous genes. Typically, in the initial stages of evolution, a higher mutation probability and a lower crossover probability are employed, whereas in the later stages, the crossover probability is increased and the mutation probability is decreased.
The fitness function constitutes the cornerstone of genetic operations, derived from an esthetic comprehensive evaluation model. In the context of esthetic product form design, the esthetic evaluation of individuals serves as the benchmark for assessing fitness. The ultimate objective of morphological esthetic design is to attain the optimal integration of design elements. For a single-objective function, the challenge lies in maximizing the esthetic evaluation of individual morphological solutions.
Single-parent mutation involves taking the target product samples that require optimization as initial individuals, with their respective genes being encoded. Mutation operations are then conducted to generate novel individuals. Subsequently, the initial individuals are merged with the newly generated ones to constitute the initial population for subsequent genetic operations.
Genetic operations primarily encompass selection, crossover, and mutation. Selection entails choosing relatively superior individuals from the initial population and transmitting them to the subsequent offspring population. Crossover consists of randomly pairing individuals within the population and exchanging the values of gene positions based on the crossover probability $P_c$. Mutation, on the other hand, involves altering the values of certain gene positions to their allelic counterparts within the population, contingent upon the mutation probability $P_m$. In the context of binary genes, this implies flipping the value of gene loci from 1 to 0 or vice versa.
After numerous generations of evolution, offspring populations are derived, and subsequent to gene decoding, the resultant outcomes are presented.
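To make the loop above concrete, the following Python sketch illustrates single-parent initialization, binary encoding, and the three genetic operators. It is a minimal illustration under our own assumptions (16-bit genes, truncation selection, and a caller-supplied fitness function standing in for the esthetic evaluation model), not the authors' exact implementation:

```python
import random

BITS = 16  # assumed gene length per design parameter

def encode(value, lo, hi):
    """Map a continuous design parameter onto a fixed-length binary gene."""
    ratio = (value - lo) / (hi - lo)
    return format(int(ratio * (2**BITS - 1)), f"0{BITS}b")

def decode(bits, lo, hi):
    return lo + int(bits, 2) / (2**BITS - 1) * (hi - lo)

def crossover(a, b, pc):
    """Single-point crossover applied with probability pc."""
    if random.random() < pc:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a, b

def mutate(chrom, pm):
    """Flip each binary locus with probability pm."""
    return "".join("10"[int(c)] if random.random() < pm else c for c in chrom)

def evolve(seed, fitness, pop_size=20, pc=0.1, pm=0.1, gens=50):
    """Single-parent GA: the target product sample seeds the population via mutation."""
    pop = [seed] + [mutate(seed, 0.2) for _ in range(pop_size - 1)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            c1, c2 = crossover(a, b, pc)
            children += [mutate(c1, pm), mutate(c2, pm)]
        pop = parents + children[: pop_size - len(parents)]
    return max(pop, key=fitness)
```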

3.2. A Comprehensive Esthetic Evaluation Model for Product Form

3.2.1. Construction Method of Esthetic Index Formula

Previous research on esthetic evaluation of product forms constructed formulas for esthetic indexes from the perspective of vector graphics. To enhance the automation of the system, this paper draws upon six formulas developed by Ngo [12], namely balance, symmetry, proportion, rhythm, order, and unity, as well as six formulas developed by Zhou [13], namely regularity, parallelism, continuity, simplification, similarity, and proportional similarity, and reconstructs these twelve esthetic index formulas from an image perspective, denoted as $D_1$–$D_{12}$, respectively. Additionally, for some esthetic indexes with complex connotations, combining the advantages of morphological visual cognitive processing in Gestalt psychology, four new formulas are proposed, covering stability, hierarchy, contrast, and complexity, denoted as $D_{13}$–$D_{16}$, respectively. The detailed meanings of the 16 esthetic indicators are given in Table 1. The esthetic index formulas express the relationship between product form and esthetic index at a quantitative level, making implicit esthetic cognitive knowledge explicit.
The procedures and methodologies for developing the image esthetic index formulas are outlined as follows:
Step 1: Analysis of Aesthetic Attribute Connotation and Psychological Effects. This involves examining the connotations of the esthetic index, establishing classification benchmarks, categorizing and hierarchizing the connotations of the esthetic index, and analyzing the visual structural relationships and psychological effects of product esthetic principles and Gestalt principles.
Step 2: Definition of Morphological Constraints. Based on taxonomic thinking, morphologically constrained conditions that influence esthetic psychological changes are defined in a categorized and hierarchical manner, and the relationships among these constraints are analyzed.
Step 3: Image Contour Extraction. Obtain a binary image and a binary data matrix of a certain product through image contour extraction technology.
Step 4: Connected Component Labeling. By taking the pixel points in the binary image as the basic units, scanning and analyzing the binary image can yield the division of connected components within the binary image, as well as information such as the location and number of these connected components.
Step 5: Extraction of Morphological Control Parameters. Using the information of connected components in the image to represent the contour line information of the product’s form, various morphological control parameters are extracted, such as area, curvature, centroid, key points, angles, orientation, etc.
Step 6: Structured Quantitative Description. Based on generative esthetic rules and utilizing the connected component labeling algorithm, a structured quantitative description of morphological organizational relationships is conducted in a categorized and hierarchical manner, constructing a formula for evaluating the esthetic index of the image.
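As an illustration of Steps 3–5, the sketch below uses OpenCV (our assumed toolchain; the paper does not name its implementation) to obtain a binary contour image and per-component parameters such as area, bounding box, and centroid:

```python
import cv2

# Step 3: binary contour image of the product view (filename is illustrative)
img = cv2.imread("product_view.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)

# Step 4: connected component labeling of the binary image
n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(edges)

# Step 5: morphological control parameters per component (label 0 = background)
for i in range(1, n_labels):
    x, y, w, h, area = stats[i]
    cx, cy = centroids[i]
    # area, bounding box, and centroid feed the esthetic index formulas below
```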

3.2.2. Construction of the Esthetic Index Formula

The definition of the image coordinate system entails constructing a coordinate system tailored to a specific view of the product for the purpose of calculating the esthetic index. The origin $(X_0, Y_0)$ of this coordinate system is positioned at the geometric center of the product's external contour. The precise coordinates of the origin are derived through analysis and measurement of the product's form and structure. Specifically, the coordinates of the origin are as follows:
$$X_0 = \frac{P_{i,\min} + P_{i,\max}}{2}, \qquad Y_0 = \frac{P_{\min,j} + P_{\max,j}}{2}$$
In these equations, $P_{i,\min}$ and $P_{i,\max}$ signify the minimum and maximum pixel values in the horizontal direction, respectively, aligning with the leftmost and rightmost extremities of the product's external contour. Similarly, $P_{\min,j}$ and $P_{\max,j}$ denote the minimum and maximum pixel values in the vertical direction, corresponding to the uppermost and lowermost points of the product's external contour.

3.2.3. Stability

Stability refers to the capacity to assess the visual constancy of morphological elements within the contour line. Three factors primarily influence stability. Firstly, the height of the center of gravity: a lower height indicates greater stability, whereas a higher height signifies reduced stability. Secondly, the extent of deviation between the center of gravity and the axis of symmetry: a greater deviation leads to reduced stability and enhanced dynamics. Thirdly, the supporting area: a larger area provides greater stability, whereas a smaller area results in reduced stability. The formula utilized for calculating stability is outlined below:
$$D_{13} = \frac{D_{13y} + D_{13x} + D_{13b}}{3}$$
$$D_{13y} = 1 - \frac{2y_c}{h_c}, \qquad D_{13x} = 1 - \frac{2x_c}{b_c}, \qquad D_{13b} = \frac{x_b}{b_c}$$
In these equations, $D_{13y}$, $D_{13x}$, and $D_{13b}$ denote the respective influences of height, horizontal displacement, and supporting area on stability. The variables $b_c$ and $h_c$ represent the width and height of the product's contour line, respectively. The variable $x_b$ denotes the width of the supporting area, while $x_c$ and $y_c$ represent the x and y coordinates of the product's center of gravity.
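A direct transcription of the stability index might look as follows, assuming the center of gravity $(x_c, y_c)$ is measured as a horizontal offset from the symmetry axis and a height above the base (our reading of the coordinate convention):

```python
def stability(xc, yc, bc, hc, xb):
    """D13: mean of the height, offset, and support-area terms defined above."""
    d13y = 1 - 2 * yc / hc   # lower center of gravity -> more stable
    d13x = 1 - 2 * xc / bc   # smaller deviation from the symmetry axis -> more stable
    d13b = xb / bc           # wider supporting area -> more stable
    return (d13y + d13x + d13b) / 3
```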

3.2.4. Hierarchy

A clear hierarchy and a rich sense of layering are crucial principles in esthetics. Clear hierarchy typically refers to the distinctive size variations among design elements, reflecting varying levels of importance and thereby creating distinct layers. A rich hierarchy, on the other hand, implies that the design elements exhibit overall variety and diversity, thus preventing a monotonous appearance. Factors that influence hierarchy encompass the size of distances between different levels and within each level, as well as the total number of levels. All lines are arranged in descending order of length, denoted as $S_n = \{S_1, S_2, \ldots, S_n\}$. When the distance between $S_i$ and $S_{i+1}$ exceeds 10% of the length of $S_1$, all lines from $S_1$ to $S_i$ are deemed to constitute a single level. Following the same logic, the remaining levels are identified. The total count of these levels is denoted as $m$, and the formula for computing the hierarchy is provided below:
$$D_{14} = 1 - \frac{1}{2m} - \frac{1}{2}\sum_{i=1}^{m}\frac{S_i^F - S_i^L}{S_1}$$
In the equation, $S_i^F$ and $S_i^L$, respectively, indicate the lengths of the first and last lines at the $i$-th level.
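The level-partition rule lends itself to a short routine. Note that the closed form of $D_{14}$ above is our reconstruction of a typographically damaged equation, so the final line of the sketch should be read as indicative:

```python
def hierarchy(lengths):
    """D14 sketch: split sorted line lengths into levels at gaps > 10% of S1."""
    S = sorted(lengths, reverse=True)
    levels, start = [], 0
    for i in range(len(S) - 1):
        if S[i] - S[i + 1] > 0.10 * S[0]:
            levels.append((S[start], S[i]))   # (first, last) length of a level
            start = i + 1
    levels.append((S[start], S[-1]))
    m = len(levels)
    spread = sum(first - last for first, last in levels) / S[0]
    return 1 - 1 / (2 * m) - spread / 2       # reconstructed reading of the D14 formula
```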

3.2.5. Contrast

Contrast pertains to the variations in the form and size of design elements, typically manifested through differences in line length. Based on the hierarchical division method, all lines are categorized into $m$ levels. First, the ratio of the average line length in level $i+1$ to the average line length in level $i$ is computed. These ratios are then aggregated and their mean is taken. The corresponding formula is outlined below.
$$D_{15} = \frac{1}{2m(m-1)}\sum_{i=1}^{m-1}\frac{\frac{1}{k}\sum_{k}S_{i+1}^{k}}{\frac{1}{j}\sum_{j}S_{i}^{j}}$$
In the equation, $S_i^j$ represents the length of the $j$-th line of the $i$-th level; $S_{i+1}^k$ represents the length of the $k$-th line of the $(i+1)$-th level; $\frac{1}{j}\sum_j S_i^j$ represents the average length of the lines in the $i$-th level; and $\frac{1}{k}\sum_k S_{i+1}^k$ represents the average length of the lines in the $(i+1)$-th level.

3.2.6. Complexity

The complexity of a contour line is primarily associated with the extent of its curvature. As the length of the line segment between any two points on the contour line increases, so does the complexity of the enclosed area. Consequently, the relative increment in the contour’s perimeter compared to the perimeter of its convex hull serves as a metric to quantify the contour line’s complexity. The relevant formula is as follows:
$$D_{16} = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \frac{S_i^T}{S_i^L}\right)$$
In the equation, $S_i^L$ and $S_i^T$ denote the contour perimeter of the $i$-th curve and the perimeter of its convex hull, respectively.
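With OpenCV contours, $D_{16}$ reduces to a perimeter ratio against the convex hull. The sketch below assumes each styling curve is available as a closed OpenCV contour:

```python
import cv2
import numpy as np

def complexity(contours):
    """D16: mean relative excess of contour perimeter over convex-hull perimeter."""
    vals = []
    for c in contours:
        s_l = cv2.arcLength(c, True)                  # contour perimeter S_i^L
        s_t = cv2.arcLength(cv2.convexHull(c), True)  # hull perimeter S_i^T
        vals.append(1 - s_t / s_l)
    return float(np.mean(vals))
```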

3.2.7. Esthetic Evaluation Model of Product Form Based on Entropy Weighting Method

The entropy weight method is a widely utilized objective weighting technique that addresses the subjectivity and unpredictability inherent in evaluation methodologies such as questionnaire surveys and expert weighting methods. Consequently, it enhances the objectivity and reliability of esthetic evaluation outcomes. This approach computes the information entropy value to quantify the dispersion degree of each esthetic index, thus facilitating the determination of the weights assigned to each esthetic index. The detailed steps involved are outlined below.
Assuming that the number of form samples is denoted by $n$, and the number of esthetic indexes is represented by $m$, the matrix encapsulating each esthetic index for each sample can be expressed as $X = (x_{ij})_{n \times m}$. To facilitate further analysis, normalization of this matrix is necessary:
$$x_{ij}' = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}}$$
The entropy values associated with each esthetic index are presented as follows.
$$e_j = -\frac{1}{\ln n}\sum_{i=1}^{n} x_{ij}' \ln x_{ij}'$$
The utility values pertaining to each esthetic index are outlined below.
$$d_j = 1 - e_j$$
The weights of each esthetic index are
$$W_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$
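These four steps amount to a few lines of NumPy. The sketch below assumes a score matrix with one row per form sample and one column per esthetic index:

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method for an (n samples) x (m indexes) score matrix."""
    n = X.shape[0]
    P = X / X.sum(axis=0, keepdims=True)                       # normalize each index column
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    e = -plogp.sum(axis=0) / np.log(n)                         # entropy value e_j per index
    d = 1 - e                                                  # utility value d_j
    return d / d.sum()                                         # weights W_j

# Usage: W = entropy_weights(scores); F = scores @ W   # comprehensive value per scheme
```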

4. IST Model Based on GAN

4.1. Architecture of GAN

GANs, introduced by Goodfellow et al. in 2014 [29], are rooted in the core concept of “adversarial learning”. This framework comprises two primary players: a generator G and a discriminator D. The generator’s objective is to mimic the distribution of real data as closely as possible, whereas the discriminator’s aim is to accurately differentiate between genuine data x and the synthetic data G ( z ) produced by the generator. To accomplish their respective objectives, both the generator and the discriminator engage in continuous training, enhancing their generative and discriminative capabilities through adversarial learning. The ultimate goal of this training procedure is to attain a Nash equilibrium between the two entities.
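For reference, the adversarial game can be written as the standard minimax objective from Goodfellow et al. [29], restated here for clarity:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$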

4.2. Building a Style Transfer Product Image Generation Model

The car front-end generation task presented in this study entails generating car front-end images that are coherent with both a given reference image and a form scheme diagram of the car front end. This task necessitates meeting two primary criteria: (1) authenticity, which ensures that the generated car front-end images possess a visually realistic appearance; (2) consistency, which necessitates that the form and intricate details of the generated car front end correspond precisely with the descriptions outlined in the form scheme diagram. Authenticity guarantees the practical relevance of the generated images to users, whereas consistency ensures that the users’ requirements for the form of the generated images are fulfilled, striving for a close alignment with the visual contour specifications delineated in the form scheme diagram. The methodology introduced in this study employs authenticity discriminators and conditional skip connection (CSC) modules to fulfill these two criteria.
As illustrated in Figure 2, the proposed framework comprises two modules: the generator module and the authenticity discriminator. The generator module encompasses an encoder and a decoder. By initiating with a real car front-end image, the encoder is employed to extract multi-scale features of the corresponding form scheme diagram. Subsequently, the high-level style features of the reference image are extracted by inputting the reference car front-end image. The decoder integrates the encoding results from both encoders to synthesize the desired car front-end image. The authenticity discriminator evaluates whether the input car front-end image is generated by the generator or is authentic, from a holistic perspective, thereby guiding the generator to produce realistic car front-end images. The training process is governed by five loss functions: image reconstruction loss, perceptual loss, style loss, edge loss, and authenticity discriminator loss.

4.2.1. Generator Module

In the generator module of this study, a network architecture featuring an encoder and a decoder is employed to generate images of car front ends. The detailed structure of the generator is illustrated in Figure 3.
In the real image encoder $E_r$, the input reference image undergoes a series of convolutional layers for feature down-sampling, ultimately being compressed into a high-level image feature with 512 channels and dimensions of 1 × 1 in height and width. Similarly, the form design scheme graph encoder $E_c$ extracts multi-scale features of the form design scheme through multiple convolutional layers. In the decoder, the high-level image features extracted by $E_r$ are continuously integrated with the multi-scale form design scheme features extracted by the contour graph encoder, ultimately generating a car front image through iterative up-sampling processes.
Compared to synthesizing random content, generating car front images guided by form design schemes poses additional challenges for the generator. As the depth of the encoder network increases, the fine-grained local form information tends to diminish, resulting in blurry or distorted outputs. In extreme scenarios, the generated content may even completely disregard the detailed information from the sketch, such as the morphology of the car front, leading to a mismatch between the synthesized car front and the user's intended form design scheme. To ensure consistency between the synthesized car front image and the morphological scheme diagram, and to enhance the propagation of morphological scheme features within the network, this study introduces the CSC module. The morphological scheme diagram serves as a constraint in each decoding process, thereby enhancing the edge and detail information of the morphological scheme. This effectively aids the generator in retaining the desired form information and achieving close alignment between the synthesized car front and the morphological scheme style. The CSC module integrates the strengths of residual networks and U-Net, incorporating both short and long skip connections. The short skip connections mitigate the issue of gradient vanishing, enabling deeper networks and improved model performance. The long skip connections originating from the encoder $E_r$ strengthen the feature propagation between corresponding network layers, enhancing the reusability of morphological scheme features at various scales and effectively maintaining the consistency of edges and details.
As depicted in Figure 4, upon inputting the morphological scheme diagram into the contour map encoder, the resultant features are skip-connected to the decoder. The decoder employs convolutional layers to bolster its capacity for capturing information while augmenting the dimensionality of the features. Upon completion of the i-th convolutional layer, the output decoder features serve as the new input features, which are subsequently fed into the CSC module. Within the CSC module, the decoder features and encoder features are initially concatenated along the channel dimension. This concatenation facilitates the integration of information from disparate modalities, leading to a more seamless and precise feature fusion. Subsequently, the concatenated features undergo three successive convolutional layers with dimensions of 1 × 1, 3 × 3, and 1 × 1, respectively. These layers are designed to derive new features that match the dimensionality of the decoder features. Ultimately, the newly acquired features are merged with the decoder features through residual connections, and the resultant features are outputted.
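The fusion step described above can be sketched as a small PyTorch module. Channel counts and activation choices here are our assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class CSCBlock(nn.Module):
    """Conditional skip connection: concatenate decoder and encoder features,
    fuse through 1x1 -> 3x3 -> 1x1 convolutions, and add back residually."""
    def __init__(self, dec_ch, enc_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(dec_ch + enc_ch, dec_ch, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dec_ch, dec_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dec_ch, dec_ch, 1),
        )

    def forward(self, dec_feat, enc_feat):
        x = torch.cat([dec_feat, enc_feat], dim=1)  # channel-wise concatenation
        return dec_feat + self.fuse(x)              # short (residual) skip connection
```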

4.2.2. Discriminator Module

The objective of this study necessitates fulfilling the criterion of authenticity. Guided by the aforementioned observations, an authenticity discriminator has been devised. This authenticity discriminator is tasked with evaluating the visual fidelity of the generated images, ensuring that each one closely mimics a real image. The detailed architecture, depicted in Figure 5, encompasses multiple convolutional layers. When fed with either real or generated car fronts as input, the discriminator module’s objective function aims to produce an authenticity score.
The discriminator is composed of multiple convolutional layers. Conditioned on the car style image $Y_4$ and the car front-face shape optimization figure $Y_3$, the real car front face $Y_1$ or the generated car front face $Y_2$ is used as input, and the objective function of the output authenticity score is as follows:
$$L_{ZS} = \mathbb{E}_{Y_1}\left[\log D_{ZS}(Y_1)\right] + \mathbb{E}_{Y_4, Y_3}\left[\log\left(1 - D_{ZS}(Y_2)\right)\right]$$

4.2.3. Loss Function

Optimizing the loss function to enhance the authenticity of the discriminator results in a more realistic appearance of the generated car front face. In the context of synthesizing car front faces, achieving maximum consistency between the generated faces and real car front faces is highly desirable. One common reconstruction loss is the Mean Absolute Error (MAE, the $L_1$ loss), which measures the sum of absolute differences between the target values and the estimated values; another is the Mean Squared Error (MSE, the $L_2$ loss), which computes the sum of squared differences between the target values and the estimated values. According to the literature, combining such reconstruction losses with the GAN objective, rather than optimizing the GAN alone, can mitigate the occurrence of blurry generated images. To ensure fidelity to the contour image, previous studies have employed the $L_1$ loss to reduce blurriness in the generated results and have introduced perceptual loss and style loss. Consequently, this research adopts the $L_1$ loss to calculate the image reconstruction loss between the generated car front face $Y_2$ and the real car front face $Y_1$:
$$L_c = \left\|Y_1 - Y_2\right\|_1$$
In the equation, $\|\cdot\|_1$ denotes the $L_1$ norm.
We introduce the concepts of perceptual loss and style loss by leveraging the pre-trained VGG16 network to compute the activation feature maps for both the generated and real car front faces. Subsequently, we calculate the $L_1$ distance between these feature maps. The perceptual loss is expressed as follows:
$$L_{GZ} = \sum_i \left\|\varphi_i(Y_1) - \varphi_i(Y_2)\right\|_1$$
In the provided expression, $\varphi_i$ signifies the feature map of the $i$-th layer within the pre-trained VGG16 model. By optimizing the perceptual loss function, the generated images of car fronts can maintain consistency with real car front images in the feature space. Furthermore, the style loss aids in generating car front images that align with real car front images in terms of color, texture, and other stylistic features. Specifically, the style loss computes the statistical discrepancy between the activation feature maps:
$$L_{FG} = \sum_j \left\|G_j^{\varphi}(Y_1) - G_j^{\varphi}(Y_2)\right\|_1$$
In the formulation, $G_j^{\varphi}$ denotes the Gram matrix constructed from the activation map $\varphi_j$.
To maintain consistency between the generated image and the contour map, an edge loss is incorporated to minimize the discrepancy in contour information between the generated car front-face image and the authentic real-world car front-face image.
$$L_{BY} = \left\|\mathrm{HED}(Y_1) - \mathrm{HED}(Y_2)\right\|_1$$
In the formulation, $\mathrm{HED}(\cdot)$ represents the trained contour map extraction network. The overall training objective combines the five losses:
$$L = L_{ZS} + \lambda_1 L_C + \lambda_2 L_{GZ} + \lambda_3 L_{FG} + \lambda_4 L_{BY}$$
In the given equation, $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ denote the respective weights assigned to the various losses.
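Putting the five terms together, a schematic PyTorch composition could look like the following. Here, `vgg_feats` and `hed` stand for assumed, externally provided VGG16 feature and HED edge extractors, and the default weights are the values reported in Section 5.2.1:

```python
import torch
import torch.nn.functional as F

def total_loss(d_real, d_fake, y1, y2, vgg_feats, hed,
               lambdas=(1.0, 100.0, 0.5, 1000.0)):
    """Sketch of L = L_ZS + l1*L_C + l2*L_GZ + l3*L_FG + l4*L_BY."""
    l1, l2, l3, l4 = lambdas
    eps = 1e-8
    # Authenticity (adversarial) term; in practice split between G and D updates
    l_zs = torch.mean(torch.log(d_real + eps) + torch.log(1 - d_fake + eps))
    l_c = F.l1_loss(y1, y2)                              # reconstruction (MAE)
    f1, f2 = vgg_feats(y1), vgg_feats(y2)                # lists of feature maps
    l_gz = sum(F.l1_loss(a, b) for a, b in zip(f1, f2))  # perceptual loss
    gram = lambda f: torch.einsum("bchw,bdhw->bcd", f, f) / (f.shape[2] * f.shape[3])
    l_fg = sum(F.l1_loss(gram(a), gram(b)) for a, b in zip(f1, f2))  # style loss
    l_by = F.l1_loss(hed(y1), hed(y2))                   # edge (contour) loss
    return l_zs + l1 * l_c + l2 * l_gz + l3 * l_fg + l4 * l_by
```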

5. Case Study

This study centers on the design of automotive front-end facades, elucidating the particular application of form optimization and style transfer techniques in the generation of product imagery. The front face of a car was selected as a case study due to its complex morphological features. First, from the perspective of morphological complexity, the front-face area of the car contains multiple functional components, such as light clusters, grilles, and bumpers, and strict geometric coordination between components must be maintained. In addition, the automotive industry places a high value on esthetic appeal and brand identity, making it a relevant application for style transformation and form optimization. As shown in Figure 6, the front-face modeling involves the collaborative optimization of 60 key control points; this high-dimensional design space optimization problem can fully verify the proposed algorithm's ability to handle complex forms and its effectiveness in dealing with complex design elements.

5.1. Experimental Design for Car Front-Face Form Optimization

5.1.1. Car Front-Face Form Design Parameters

In the exterior styling design of automotive front-end facades, windows, side mirrors, and wheels function as adaptable genetic elements with minimal style constraints, maintaining a consistent form. Conversely, the upper and lower headlights, along with the upper and lower grilles, serve as personalized morphological genes in the automotive front-end design and constitute the focal points of the design. Morphological optimization design is conducted on six components, encompassing the upper and lower headlights and grilles. Considering the symmetrical nature of automotive front-end facades, it suffices to initialize one side and subsequently achieve comprehensive optimization via symmetrical reconstruction. By profiling the automotive front-end facade, the coordinates of key points for the six components are measured utilizing Python.
In this study, the design of the front face of the car was optimized by the genetic algorithm. The target area of the sample contains the following key morphological feature components: 14 key points of the upper and lower headlights, 16 key points of the upper and lower grilles, and a total of 60 control points and 120 coordinate values. According to the maximum and minimum values of the key points of each morphological component in the existing car front-face sample, the upper and lower bounds of the morphological optimized search space are determined, and the initial morphological structure is generated by Bezier curve fitting control points.
The key control points are illustrated in Figure 6.
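For reference, a Bézier curve can be evaluated directly from its control points. The sketch below is generic and not tied to the paper's specific parameterization:

```python
import numpy as np
from math import comb

def bezier(points, n=100):
    """Evaluate a degree-k Bezier curve from (k+1, 2) control-point coordinates."""
    pts = np.asarray(points, dtype=float)
    k = len(pts) - 1
    t = np.linspace(0.0, 1.0, n)[:, None]
    basis = [comb(k, i) * t**i * (1 - t)**(k - i) for i in range(k + 1)]
    return sum(b * p for b, p in zip(basis, pts))  # (n, 2) sampled curve

# e.g. one headlight edge fitted through 7 of the 60 key control points
```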

5.1.2. Optimizing the Front Fascia of the Automobile

During the initialization phase of optimizing the automobile’s front fascia, four curves—specifically, the upper and lower headlight curves, as well as the upper and lower air intake grille curves—were identified as pivotal features. To accurately depict and manipulate these selected elements, the Bezier curve fitting technique was adopted. The parameter coordinates of each curve served as representative sample data for the form of the automobile’s front fascia. For optimization, a genetic algorithm was utilized, with the crossover probability and mutation probability set, along with the specification of the genetic generation. For instance, a crossover probability of 0.1, a mutation probability of 0.1, and a genetic generation count of 50 were employed, yielding 20 samples of optimized front fascia forms, as illustrated in Figure 7.

5.1.3. Calculating the Esthetic Index for the Front Fascia of Automobiles

Utilizing the aforementioned method, the contour lines of the sample front fascia were established as the design benchmark. The target elements for calculating the esthetic index encompassed the two upper headlights, two lower headlights, the upper air intake grille, and the lower air intake grille. The detailed numerical results for the 16 esthetic indexes are presented in Table 2.

5.1.4. Weight Analysis Based on Entropy Method

By employing Equations (10)–(13) and consulting Table 2, the weights for each esthetic index can be determined. Subsequently, the relational expression for the comprehensive evaluation model of automobile front fascia esthetics can be derived.
$$\begin{aligned} F ={}& 0.044D_1 + 0.032D_2 + 0.031D_3 + 0.017D_4 + 0.093D_5 + 0.038D_6 + 0.127D_7 + 0.038D_8 \\ {}&+ 0.013D_9 + 0.043D_{10} + 0.044D_{11} + 0.020D_{12} + 0.252D_{13} + 0.053D_{14} + 0.104D_{15} + 0.053D_{16} \end{aligned}$$
In the equation, F denotes the comprehensive esthetic evaluation value. Based on Equation (20), the esthetic evaluation values and rankings of 20 schemes have been computed and are presented in Table 3. Among these, the optimal solution is depicted in Figure 8.

5.2. Automotive Front-Face Style Transfer Experiment Design

5.2.1. Experimental Setup and Dataset

This experiment was conducted on the Linux platform, utilizing an SSH remote connection to access a cloud computing server. The server’s configuration comprises a GPU processor, namely the NVIDIA GeForce RTX 3080 with 10 GB of VRAM, a CPU processor, specifically the Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz, and a 50 GB SSD. The deep learning framework employed was PyTorch 1.11.0, running on Python 3.8 (Ubuntu 20.04) within a Cuda 11.3 environment. Furthermore, commonly utilized third-party Python libraries, including NumPy 1.24.3, OpenCV 4.5.2, Matplotlib 3.8.0 and Tqdm 4.65.0, were installed.
The experimental dataset comprises 2891 original images of car front faces, sourced from "The Car Connection" and "Autohome" websites. The contour dataset of car front faces was generated using the Canny edge detection algorithm. These images possess a resolution of 512 × 384 pixels. For the training phase, the model employs the Adam optimizer, configured with a learning rate of 0.0002, to optimize its parameters. The loss function incorporates weight parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$, which are assigned values of 1.0, 100.0, 0.5, and 1000.0, respectively. The batch size is set to 4, and the model undergoes training for 800 epochs.
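The contour branch of such a paired dataset can be reproduced along these lines; the directory paths and Canny thresholds are illustrative assumptions:

```python
import os
import cv2

SRC, DST = "car_fronts/", "car_contours/"   # hypothetical directories
os.makedirs(DST, exist_ok=True)
for name in os.listdir(SRC):
    img = cv2.resize(cv2.imread(os.path.join(SRC, name)), (512, 384))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)       # contour half of a (photo, contour) pair
    cv2.imwrite(os.path.join(DST, name), edges)
```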

5.2.2. Product Image Generation

In this process, the optimized car front-face solution depicted in Figure 7 serves as the input content image. The target style image is chosen as the 2022 Mercedes-Benz C-Class model’s car front face. The encoder is utilized to extract both content and style features from these images. Subsequently, these extracted features are input into the generator to produce style-transferred images, which aim to blend the content of the input image with the style of the target image. Ultimately, the discriminator assesses whether the style of the generated image matches the target image style. Through iterative optimization, a product form scheme that conforms to the desired target style is ultimately obtained.
The experiment with the GAN entails the generation of car front faces through adversarial learning between the generator and the discriminator, utilizing a predefined set of test samples. The training process, depicted in Figure 9, showcases the progress at various epochs: 100, 200, 300, 400, 500, 600, 700, and 800.
The figure illustrates the progression of generating car front-face images during the training process. Initially, at the early stages, the generated images exhibit rough outlines of car contours but still deviate significantly from real images. After 200 epochs, the car front-face images begin to exhibit noticeable features, such as the overall form. By 500 epochs, the generated images show more distinct features, including headlights and grille structures, though further refinement of details is required. At 700 epochs, the primary design of the car front face has undergone significant enhancement, with improved details compared to earlier stages. Ultimately, after 800 epochs of training, the generated images achieve a more pronounced and realistic representation of car front-face features.
After training the network for 800 epochs, the definitive experimental result is presented in Figure 10. It is visually apparent that the generated scheme falls slightly short of achieving the authentic effect. While the generated scheme is relatively clear and displays distinct characteristics, capable of serving as a guide for the design of visual representations, it still exhibits notable disparities when compared to the work of professional designers, particularly in the realm of visual details. However, it remains advantageous in aiding designers to swiftly procure a car front-face effect scheme during the initial phases of the design process.

5.2.3. Comparative Assessment of Multi-Style Transfer

To validate the effectiveness and generalization capability of the proposed algorithm, we performed a multi-style transfer comparative experiment. Adhering to the principle of style diversity, two target styles exhibiting distinct visual characteristics were selected for systematic comparative analysis of style transfer outcomes.
Analysis of the multi-style comparative experiments reveals that the generated images demonstrate competent preservation of structural details across components including windows, tires, body panels, and roof profiles (Figure 11). Notably, the roof geometry exhibits significant dependency on the target style image, indicating that the final product imagery is predominantly governed by the input stylistic reference.
Specifically, Group 1 outputs maintain well-defined contours of upper/lower headlamps and intake grilles, yielding front fascia renderings with discernible brand characteristics. Conversely, Group 2 results exhibit blurred contours in these critical components, compromising front-end feature recognition. This divergence illustrates that stylistic complexity directly influences contour fidelity, where angular design languages present greater structural retention challenges than curvilinear motifs.
Consequently, when benchmarked against professional renderings, the generated images manifest three principal limitations: (1) incomplete style transfer, (2) suboptimal headlamp representation, and (3) deficient color accuracy. Additional shortcomings include diminished contour feature salience and localized artifacting. Nevertheless, the methodology successfully generates automotive product imagery that satisfies fundamental rendering requirements, achieving enhanced clarity and structural congruence with reference inputs. This approach significantly streamlines design workflows by accelerating rendering screening efficiency, reducing redundant design iterations, and enabling rapid product visualization—thereby validating the model’s technical feasibility.

6. Discussion

To better satisfy users’ esthetic demands and enhance the controllability and esthetic appeal of product image generation, this study is grounded in computational esthetics and visual cognition principles. It employs a product form optimization design methodology that integrates Bezier curves with genetic algorithms to establish a form design model. The esthetic evaluation model is constructed utilizing the esthetic index and entropy weight method. With the optimized design scheme as input, this study utilizes GAN techniques for conducting style transfer experiments, with a particular focus on the car front face as the subject of investigation. The primary contributions and accomplishments of this study are outlined as follows:
(1)
By integrating Bezier curves with genetic algorithms, the parameters of key points in car front-face form samples are extracted, and the esthetic index serves as the fitness function for assessing the outcomes of genetic operations. Bezier curves, owing to their distinctive form, clearly capture curvature changes, while genetic algorithms offer robust parameter optimization for extracting the most representative features from intricate data. This methodology spares designers considerable time on extensive model training; by starting directly from form parameters, it minimizes interference from human factors and keeps the evaluation objective. It thereby offers designers abundant product form designs and supports the rapid development of diverse form solutions (an illustrative sketch of this optimization loop follows the list).
(2)
Product form design is conducted on the basis of the esthetic index calculation. The esthetic index formula determines the index values of the experimental samples, while the entropy weight method determines the weights of the indices, providing an objective basis for the esthetic evaluation of product form (the entropy weighting step is included in the sketch following this list). In contrast to the manual evaluation traditionally used in intelligent product form design, this methodology selects solutions through computational esthetics, yielding superior objectivity and reliability.
(3)
We apply a GAN to generate realistic images of car front-face outlines. A dataset, comprising paired outline images and their corresponding real images, is created to facilitate the network’s learning and mapping processes in complex feature spaces. The GAN meticulously examines both the input form scheme image and the real image, identifying structural proportions and intricate details, thereby enabling it to train effectively and generate realistic images of car front faces. In comparison to previous product outline optimization designs, this methodology attains more realistic image effects that align more closely with actual requirements. Furthermore, when compared to traditional style transfer methods, this approach offers superior controllability over the generated content images.
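The sketch below illustrates, under stated assumptions, how the pieces in contributions (1) and (2) fit together: entropy weights are derived from a sample-by-index matrix such as Table 2, the weighted sum yields a comprehensive evaluation value as in Table 3, and that value acts as the genetic-algorithm fitness over Bezier control-point genes. The helper aesthetic_indices() is a hypothetical stand-in for the sixteen index formulas, included only so the sketch runs.

```python
# Entropy-weighted esthetic fitness driving a simple genetic algorithm over
# Bezier control-point genes. A sketch, not the paper's exact implementation.
import numpy as np

def entropy_weights(X):
    """X: (n_samples, n_indices) matrix of esthetic index values (cf. Table 2)."""
    P = X / X.sum(axis=0)                              # column-wise proportions
    P = np.clip(P, 1e-12, None)                        # avoid log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(len(X))  # entropy per index
    d = 1.0 - e                                        # degree of divergence
    return d / d.sum()                                 # normalized weights

def aesthetic_indices(ctrl_pts):
    """Hypothetical stand-in for the sixteen index formulas (D1-D16):
    returns dummy geometry statistics purely so the sketch is runnable."""
    seg = np.linalg.norm(np.diff(ctrl_pts, axis=0), axis=1)
    stats = np.array([seg.mean(), seg.std(), seg.max(), seg.min()])
    return np.resize(stats / (stats.max() + 1e-12), 16)

def fitness(genes, weights):
    ctrl_pts = genes.reshape(-1, 2)                    # Bezier control points (x, y)
    return float(weights @ aesthetic_indices(ctrl_pts))  # comprehensive value (cf. Table 3)

def evolve(pop, weights, generations=200, mut_sigma=0.02, rng=np.random.default_rng(0)):
    for _ in range(generations):
        scores = np.array([fitness(g, weights) for g in pop])
        parents = pop[np.argsort(scores)[-len(pop) // 2:]]         # truncation selection
        children = []
        for _ in range(len(pop) - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, a.size)                          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child += rng.normal(0, mut_sigma, child.shape)         # Gaussian mutation
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(g, weights) for g in pop])]      # best form scheme

weights = entropy_weights(np.random.rand(20, 16))      # stand-in for the Table 2 matrix
best = evolve(np.random.rand(40, 12), weights)         # 40 candidates, 6 control points each
```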
This experiment also has certain limitations. The generated content partially loses information from the edge contour image, particularly the outer contour form of the car front face, so the generated front face does not fully align with the input contour image and its overall proportions become inconsistent. Clearer and more contour-accurate images were not produced; the reasons can be analyzed from several perspectives:
(1)
Dataset: The training of a GAN requires a substantial amount of high-quality data, and the limited experimental data is a primary influencing factor; inconsistencies may also exist within the dataset. In addition, the distribution of car front-face data is intricate, and training the GAN demands careful parameter tuning; inadequate parameter settings can readily impair model performance.
(2)
Design: This study primarily focuses on preliminary exploration of car front-face contour design. In future endeavors, a more thorough analysis and evaluation of contour images of diverse product types can be conducted to observe the performance effects across various product forms. Additionally, strategies can be explored to enhance the method’s universality.
(3)
Image Generation Quality: This method can provide relatively clear and high-resolution images, yet numerous challenges remain in achieving optimal image quality. For instance, the generated images may lack realism, exhibit compatibility issues with complex contours, and omit color details. Consequently, further in-depth research is imperative in the future to ensure that the generated car appearances closely resemble reality and to produce high-precision designs that meet designers’ expectations.
(4)
Algorithm Application: This research explores only the application effects of a GAN. Various deep learning network models are now emerging, the processing capabilities of hardware devices are continually improving, and research achievements in deep learning are increasingly plentiful and strongly drive the solution of practical problems. In the future, more efficient algorithms can be introduced to investigate product form design issues, and they could be integrated with AIGC technology to tackle more intricate design challenges.

7. Conclusions

In this study, a morphological scheme diagram and a reference image of a car's front end were used as inputs, with dual encoders extracting style and content features. During decoding, a conditional skip connection was incorporated to constrain the output to the morphological scheme diagram, enhancing edge and detail information and ensuring the fidelity of the generated images to the original scheme. A realism discriminator further improved the quality of the generated images, helping designers select effect images more efficiently, reducing unnecessary steps in the design process, and enabling the rapid completion of product design representations. Experimental results demonstrate that the proposed model significantly improves the quality of image generation. However, the generated effect images still exhibit issues such as a lack of vibrant color and mismatched contour lines. Future research should prioritize models that can recover additional texture details.
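A schematic sketch of this dual-encoder arrangement with a conditional skip connection (cf. the CSC module in Figure 4) may clarify the data flow; the channel widths, depths, and fusion point are assumptions for illustration and do not reproduce the paper's exact architecture.

```python
# Schematic dual-encoder generator with a conditional skip connection (CSC)
# fusing contour (content) features into the decoder via a 1x1 convolution.
import torch
import torch.nn as nn

class CSC(nn.Module):
    """Conditional skip connection: fuse decoder-bound features with content features."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)  # the 1x1-Conv

    def forward(self, feat, content_feat):
        return self.fuse(torch.cat([feat, content_feat], dim=1))

class DualEncoderGenerator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        def enc():  # two stride-2 conv stages; depth assumed for illustration
            return nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 4, 2, 1), nn.ReLU())
        self.content_enc = enc()                     # edge/contour scheme image
        self.style_enc = enc()                       # target style reference
        self.csc = CSC(ch)
        self.dec = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh())

    def forward(self, content, style):
        c = self.content_enc(content)
        s = self.style_enc(style)
        fused = self.csc(s, c)                       # constrain style features by the contour
        return self.dec(fused)
```

In this simplified form the contour features enter the decoder at a single resolution; in practice such skip connections are typically inserted at several scales to preserve both coarse proportions and fine edges.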
As noted in the Discussion, the generated content partially loses information from the edge contour map, such as the outer contour shape of the car front face, so the generated front face does not exactly match the input contour map and its overall proportions are inconsistent. The most likely causes are the limited and possibly inconsistent training data available to the GAN and the complex parameter tuning that its training requires given the intricate distribution of car front-face data.
This study combines a genetic algorithm with style transfer: esthetic indicators serve as the fitness function to pre-optimize the product's morphological parameters and generate content images of high esthetic quality. Through the GAN, the optimized morphological design scheme is integrated with the target style image to generate a car front-face rendering with both structural rationality and artistic quality. However, the method suffers from high computational complexity, long image processing times, and insufficient detail fidelity, and the generated images may lose contour accuracy (such as blurred headlights) or show color deviation when the style is complex. Genetic algorithms suit a wide range of optimization problems and explore the design space efficiently through parameter coding and fitness functions; they provide controllable contour schemes at low computational cost but lack visual expressiveness [30]. The style transfer GAN model generates results directly from content and style images, with a simple pipeline and high computational efficiency; however, it cannot actively optimize the morphological structure, since it only transfers textures and color styles and cannot correct inherent defects in the content images. Integrating the two methods therefore makes them complementary: genetic algorithms provide morphological design solutions, style transfer enhances visual effects, and together they advance intelligent design.
Finally, diffusion-based generative frameworks have demonstrated exceptional efficacy in synthesizing high-fidelity imagery conditioned on diverse inputs. Subsequent research will establish a hybrid diffusion architecture to generate product morphological solutions with sub-pixel geometric precision and physically plausible textures within categorical constraints, overcoming the current limitations in color fidelity and fine-detail delineation. After validation on high-complexity automotive fascia designs, the method should be extended to domains such as furniture systems and consumer electronics to rigorously evaluate its cross-domain generalization capacity.

Author Contributions

A.Z. and X.W. were involved in conceptualization, investigation, methodology, validation, visualization, and writing—original draft; Y.H. and W.W. helped in data curation and writing—review and editing; S.Z. contributed to formal analysis; J.O. helped in project administration and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China “Research on Product Morphology Dynamics Model for Industrial Design” (Grant No: 52165033). This project is led by Professor Jianning Su from Lanzhou University of Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available on the websites https://www.thecarconnection.com/ (accessed on 10 May 2025), and https://www.autohome.com.cn (accessed on 10 May 2025). The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Chen, J.; Shao, Z.; Zheng, X.; Zhang, K.; Yin, J. Integrating aesthetics and efficiency: AI-driven diffusion models for visually pleasing interior design generation. Sci. Rep. 2024, 14, 3496. [Google Scholar] [CrossRef] [PubMed]
  2. Sohail, A. Genetic algorithms in the fields of artificial intelligence and data sciences. Ann. Data Sci. 2023, 10, 1007–1018. [Google Scholar] [CrossRef]
  3. Wang, Z.; Liu, W.; Yang, M.; Han, D. A multi-objective evolutionary algorithm model for product form design based on improved SPEA2. Appl. Sci. 2019, 9, 2944. [Google Scholar] [CrossRef]
  4. Wang, T.; Zhou, M. Product Development and Evolution Innovation Redesign Method Based on Particle Swarm Optimization. In Advances in Industrial Design: Proceedings of the AHFE 2021 Virtual Conferences on Design for Inclusion, Affective and Pleasurable Design, Interdisciplinary Practice in Industrial Design, Kansei Engineering, and Human Factors for Apparel and Textile Engineering, 25–29 July 2021; Springer: Cham, Switzerland, 2021; pp. 1081–1093. [Google Scholar]
  5. Zhou, A.; Ouyang, J.; Su, J.; Zhang, S.; Yan, S. Multimodal optimisation design of product forms based on aesthetic evaluation. Int. J. Arts Technol. 2020, 12, 128–154. [Google Scholar] [CrossRef]
  6. Świątek, A.H.; Szcześniak, M.; Stempień, M.; Wojtkowiak, K.; Chmiel, M. The mediating effect of the need for cognition between aesthetic experiences and aesthetic competence in art. Sci. Rep. 2024, 14, 3408. [Google Scholar] [CrossRef] [PubMed]
  7. Zhang, J.; Yu, J.; Zhang, K.; Zheng, X.S.; Zhang, J. Computational aesthetic evaluation of logos. ACM Trans. Appl. Percept. (TAP) 2017, 14, 1–21. [Google Scholar] [CrossRef]
  8. Lo, C.H. Application of aesthetic principles to the study of consumer preference models for vase forms. Appl. Sci. 2018, 8, 1199. [Google Scholar] [CrossRef]
  9. Deng, L.; Wang, G. Quantitative Evaluation of Visual Aesthetics of Human-Machine Interaction Interface Layout. Comput. Intell. Neurosci. 2020, 2020, 9815937. [Google Scholar] [CrossRef] [PubMed]
  10. Lugo, J.E.; Schmiedeler, J.P.; Batill, S.M.; Carlson, L. Quantification of classical gestalt principles in two-dimensional product representations. J. Mech. Des. 2015, 137, 094502. [Google Scholar] [CrossRef]
  11. Valencia-Romero, A.; Lugo, J.E. An immersive virtual discrete choice experiment for elicitation of product aesthetics using Gestalt principles. Des. Sci. 2017, 3, e11. [Google Scholar] [CrossRef]
  12. Ngo, D.C.L.; Teo, L.S.; Byrne, J.G. Modelling interface aesthetics. Inf. Sci. 2003, 152, 25–46. [Google Scholar] [CrossRef]
  13. Zhou, A.; Ma, J.; Zhang, S.; Ouyang, J. Optimal Design of Product Form for Aesthetics and Ergonomics. Comput.-Aided Des. Appl. 2023, 20, 1–27. [Google Scholar] [CrossRef]
  14. Haeberli, P. Paint by numbers: Abstract image representations. In Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques, Dallas, TX, USA, 6–10 August 1990; pp. 207–214. [Google Scholar]
  15. Hertzmann, A.; Jacobs, C.E.; Oliver, N.; Curless, B.; Salesin, D.H. Image analogies. In Seminal Graphics Papers: Pushing the Boundaries; ACM: New York, NY, USA, 2023; Volume 2, pp. 557–570. [Google Scholar]
  16. Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the International Conference on Computer Vision, Bombay, India, 7 January 1998. [Google Scholar]
  17. Efros, A.A. Image quilting for texture synthesis and transfer. In Proceedings of the Computer Graphics, Hong Kong, China, 6 July 2001. [Google Scholar]
  18. Gatys, L.A.; Ecker, A.S.; Bethge, M. A Neural Algorithm of Artistic Style. arXiv 2015, arXiv:1508.06576. [Google Scholar] [CrossRef]
  19. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image Style Transfer Using Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  20. Sheng, J.; Hu, G.; Li, Y. Emotional Rendering of 3D Indoor Scene with Chinese Elements. J. Front. Comput. Sci. Technol. 2024, 18, 465–476. [Google Scholar]
  21. Skorokhodov, I.; Tulyakov, S.; Elhoseiny, M. StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  22. Wang, C.; Duan, Z.; Liu, B.; Zou, X.; Chen, C.; Jia, K.; Huang, J. PAI-Diffusion: Constructing and serving a family of open Chinese diffusion models for text-to-image synthesis on the cloud. arXiv 2023, arXiv:2309.05534. [Google Scholar]
  23. Chen, J.; Xu, G. The Application of Style Transfer Algorithm in the Design of Lacquer Art and Cultural Creation. J. Art Design 2020, 3, 82–85. [Google Scholar]
  24. Liu, Z.; Gao, F.; Wang, Y. A generative adversarial network for AI-aided chair design. In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 486–490. [Google Scholar]
  25. Deng, Z.; Lyu, J.; Liu, X.; Hou, Y.; Wang, S. StyleGAN-based Sketch Generation Method for Product Design Renderings. Packag. Eng. 2023, 44, 188–195. [Google Scholar]
  26. Dai, Y.; Li, Y.; Liu, L.J. New Product Design with Automatic Scheme Generation. Sens. Imaging Int. J. 2019, 20, 29. [Google Scholar] [CrossRef]
  27. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10674–10685. [Google Scholar]
  28. Hollstien, R.B. Artificial Genetic Adaptation in Computer Control Systems; University of Michigan: Ann Arbor, MI, USA, 1971. [Google Scholar]
  29. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
  30. Guoqiang, C.; Zhengyi, S.; Li, S.; Mengfan, Z.; Tong, L. Intelligent Cockpit Perceptual Image Prediction Based on BP Neural Network Optimization Genetic Algorithm. Automot. Eng. 2023, 45, 1479–1488. [Google Scholar]
Figure 1. Process of morphological optimization design.
Figure 2. Technical route of image style transfer model.
Figure 3. Generator architecture.
Figure 4. Structure of the CSC module. (1×1-Conv denotes a convolutional layer with kernel size 1 × 1.)
Figure 5. Discriminator structure.
Figure 6. Key control points.
Figure 7. Samples of optimized front fascia forms.
Figure 8. Optimal scheme of car front face.
Figure 9. Generation of car front faces.
Figure 10. Definitive design scheme.
Figure 11. Style experiment contrast: (a) style image for Group 1, (b) generated image for Group 1, (c) style image for Group 2, (d) generated image for Group 2.
Table 1. Explanation of the meaning of esthetic indicators.

| Index ID | Esthetic Index | Explanation of Meaning |
|---|---|---|
| D1 | Balance | Calculates the difference between the total weight of the elements on either side of the horizontal and vertical axes of symmetry. |
| D2 | Symmetry | Calculates the degree of symmetry between interface elements in the vertical, horizontal, and diagonal directions. |
| D3 | Proportionality | Calculates and compares the similarity between the scale values of interface elements and layouts. |
| D4 | Rhythmicity | Rhythm is the dynamic generated when elements appear continuously according to certain rules, expressed through the arrangement order, size ratio, quantity distribution, and form changes of elements. |
| D5 | Sequentiality | Order quantifies whether the arrangement of elements conforms to the natural reading order of the human eye (top to bottom, left to right, and large to small). |
| D6 | Integrity | Wholeness measures how compact the layout of the elements is; the higher the wholeness, the lower the morphological complexity, the easier the form is to identify, and the more harmonious it is. |
| D7 | Regularity | The degree of alignment of the elements, including left, right, top, and bottom alignment and the horizontal and vertical alignment of the centroid (the center of the smallest circumscribed rectangle). |
| D8 | Common Directionality | Describes the degree of parallelism of the contour lines. |
| D9 | Continuity | Measures the degree to which the morphological elements within the contour line are visually grouped by continuity. |
| D10 | Simplification | Measures the degree of visual simplification of the morphological elements within the contour line. |
| D11 | Similarity | Measures the degree of visual similarity of the morphological elements within the contour line. |
| D12 | Proportional Similarity | Measures the similarity of the aspect ratios of elements; the higher the similarity, the more harmonious the form. |
| D13 | Stability | Measures the ability of a morphological element within the contour line to maintain its visual stability. |
| D14 | Hierarchy | Affected by, first, the distance between levels and the distance within a level, and second, the number of levels. |
| D15 | Contrast | The difference in form and size of design elements, usually represented by changes in line length. |
| D16 | Complexity | Describes the complexity of the contour as the relative increment of the contour's perimeter over the perimeter of its convex hull. |
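As a concrete example of how such an indicator can be computed, the sketch below implements the complexity measure D16 as the relative increment of a contour's perimeter over its convex hull's perimeter. The thresholding step and input format are assumptions, since the table specifies only the perimeter-versus-convex-hull idea.

```python
# Illustrative computation of the complexity indicator D16: relative increment
# of a contour's perimeter over the perimeter of its convex hull.
import cv2

def contour_complexity(gray_image):
    """gray_image: uint8 grayscale image with the form on a plain background."""
    _, binary = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea)       # largest outline in the image
    perimeter = cv2.arcLength(contour, True)           # closed-contour perimeter
    hull_perimeter = cv2.arcLength(cv2.convexHull(contour), True)
    return (perimeter - hull_perimeter) / hull_perimeter  # 0 for convex shapes, grows with detail
```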
Table 2. Beauty index values for 20 samples after optimization of the front fascia target elements.

| Number | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | D16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.891 | 0.943 | 0.870 | 0.421 | 0.875 | 0.547 | 0.083 | 0.981 | 0.993 | 0.400 | 0.180 | 0.515 | 0.592 | 0.764 | 0.734 | 0.626 |
| 2 | 0.961 | 0.989 | 0.851 | 0.445 | 0.500 | 0.467 | 0.191 | 0.975 | 0.992 | 0.588 | 0.108 | 0.542 | 0.573 | 0.854 | 0.556 | 0.659 |
| 3 | 0.939 | 0.944 | 0.834 | 0.425 | 0.750 | 0.456 | 0.094 | 0.974 | 0.993 | 0.333 | 0.111 | 0.493 | 0.584 | 0.743 | 0.717 | 0.668 |
| 4 | 0.914 | 0.755 | 0.861 | 0.341 | 1.000 | 0.475 | 0.256 | 0.975 | 0.992 | 0.650 | 0.162 | 0.585 | 0.581 | 0.832 | 0.509 | 0.616 |
| 5 | 0.911 | 0.906 | 0.795 | 0.457 | 0.500 | 0.521 | 0.375 | 0.981 | 0.993 | 0.533 | 0.207 | 0.567 | 0.599 | 0.828 | 0.516 | 0.733 |
| 6 | 0.923 | 0.679 | 0.857 | 0.429 | 0.625 | 0.481 | 0.352 | 0.976 | 0.993 | 0.682 | 0.294 | 0.577 | 0.581 | 0.722 | 0.730 | 0.644 |
| 7 | 0.924 | 0.370 | 0.846 | 0.445 | 0.500 | 0.585 | 0.231 | 0.978 | 0.993 | 0.600 | 0.216 | 0.565 | 0.914 | 0.753 | 0.717 | 0.599 |
| 8 | 0.935 | 0.549 | 0.846 | 0.411 | 0.750 | 0.544 | 0.103 | 0.981 | 0.993 | 0.529 | 0.189 | 0.527 | 0.879 | 0.769 | 0.738 | 0.561 |
| 9 | 0.886 | 0.787 | 0.851 | 0.416 | 1.000 | 0.322 | 0.054 | 0.979 | 0.993 | 0.357 | 0.157 | 0.567 | 0.595 | 0.844 | 0.516 | 0.644 |
| 10 | 0.939 | 1.104 | 0.858 | 0.432 | 0.750 | 0.418 | 0.053 | 0.980 | 0.993 | 0.421 | 0.174 | 0.504 | 0.589 | 0.765 | 0.721 | 0.608 |
| 11 | 0.936 | 1.118 | 0.814 | 0.431 | 0.750 | 0.402 | 0.115 | 0.980 | 0.993 | 0.417 | 0.053 | 0.525 | 0.590 | 0.766 | 0.720 | 0.738 |
| 12 | 0.899 | 1.168 | 0.874 | 0.420 | 0.750 | 0.401 | 0.074 | 0.979 | 0.992 | 0.588 | 0.213 | 0.530 | 0.580 | 0.841 | 0.530 | 0.663 |
| 13 | 0.882 | 1.078 | 0.844 | 0.445 | 0.625 | 0.507 | 0.102 | 0.976 | 0.993 | 0.563 | 0.210 | 0.532 | 0.590 | 0.834 | 0.529 | 0.658 |
| 14 | 0.916 | 0.409 | 0.823 | 0.487 | 0.500 | 0.536 | 0.075 | 0.981 | 0.993 | 0.400 | 0.170 | 0.564 | 0.580 | 0.738 | 0.733 | 0.712 |
| 15 | 0.926 | 0.816 | 0.846 | 0.439 | 0.750 | 0.311 | 0.075 | 0.977 | 0.985 | 0.200 | 0.040 | 0.507 | 0.589 | 0.749 | 0.729 | 0.813 |
| 16 | 0.901 | 0.100 | 0.814 | 0.417 | 0.625 | 0.502 | 0.054 | 0.980 | 0.993 | 0.429 | 0.133 | 0.568 | 0.590 | 0.843 | 0.511 | 0.685 |
| 17 | 0.915 | 1.415 | 0.878 | 0.444 | 0.750 | 0.548 | 0.086 | 0.977 | 0.993 | 0.563 | 0.148 | 0.513 | 0.882 | 0.778 | 0.721 | 0.662 |
| 18 | 0.929 | 0.722 | 0.842 | 0.412 | 0.750 | 0.602 | 0.063 | 0.979 | 0.993 | 0.333 | 0.115 | 0.472 | 0.589 | 0.754 | 0.738 | 0.720 |
| 19 | 0.906 | 0.922 | 0.865 | 0.429 | 1.000 | 0.458 | 0.045 | 0.978 | 0.992 | 0.636 | 0.207 | 0.557 | 0.600 | 0.838 | 0.520 | 0.558 |
| 20 | 0.909 | 0.305 | 0.813 | 0.444 | 0.500 | 0.385 | 0.037 | 0.977 | 0.993 | 0.200 | 0.035 | 0.407 | 0.591 | 0.828 | 0.522 | 0.832 |
Table 3. Comprehensive evaluation values of 20 samples for front-face form optimization of automobiles.

| Sample ID | Evaluation Value | Ranking | Sample ID | Evaluation Value | Ranking |
|---|---|---|---|---|---|
| 1 | 0.600 | 6 | 11 | 0.592 | 9 |
| 2 | 0.568 | 16 | 12 | 0.580 | 11 |
| 3 | 0.578 | 12 | 13 | 0.573 | 15 |
| 4 | 0.612 | 5 | 14 | 0.548 | 18 |
| 5 | 0.595 | 7 | 15 | 0.568 | 16 |
| 6 | 0.614 | 4 | 16 | 0.526 | 19 |
| 7 | 0.656 | 2 | 17 | 0.685 | 1 |
| 8 | 0.655 | 3 | 18 | 0.578 | 12 |
| 9 | 0.574 | 14 | 19 | 0.595 | 7 |
| 10 | 0.584 | 10 | 20 | 0.507 | 20 |
