A Study of Sustainable Product Design Evaluation Based on the Analytic Hierarchy Process and Deep Residual Networks

Abstract: Traditional product design evaluation processes are resource-intensive and time-consuming, resulting in unsustainably higher costs and longer lead times. Therefore, sustainable product design evaluation has become an increasingly crucial aspect of product design, focusing on creating a high-efficiency, high-reliability, and low-carbon-emission approach. In this study, we proposed an integrated approach that combines manual design evaluation based on the analytic hierarchy process (AHP) with an automatic design evaluation based on a ResNet-50 network in order to develop a sustainable design evaluation method. First, the evaluation level and indicators for the shape design of a tail-light were defined using the AHP. We followed this by establishing a determination matrix and weight coefficients for the design indicators to create a manual design evaluation model. Second, tail-light shape image datasets were manually annotated based on the evaluation indicators, and design datasets were constructed. The ResNet-50 algorithm was introduced to train the datasets, and the automatic evaluation model for product design was constructed through training and tuning. Finally, we validated the feasibility and effectiveness of the product design evaluation method, which was based on AHP and ResNet-50, by comparing the results obtained using both manual design and automatic design evaluations. The results showed that the proposed sustainable product design evaluation model provides an efficient and reliable method for evaluating product design, improves the decision-making process, and empowers the design and development process. The model enhances resource efficiency and economic


Introduction
Sustainability evaluation, also known as sustainability assessment, can be broadly defined as the process that directs planning and decision making towards sustainable development [1,2]. Sustainability evaluation involves assessing the potential environmental, social, and economic impacts of proposed activities and projects in order to ensure that they are sustainable and contribute to the overall wellbeing of people and the planet. In recent years, sustainability evaluation has gained increasing attention from the scientific research community [3-6]. The existing sustainability evaluation literature predominantly focuses on efficiency improvements [7-9], with the objective of reducing waste and decreasing energy consumption. Sustainability evaluation often includes reporting on the environmental impact, economic benefits, and social benefits of an activity [1,10]. For example, sustainability evaluation may consider the economic costs and benefits of different options, the social implications of a project, and its impact on the environment. As such, sustainability evaluation can serve as a tool for identifying trade-offs among various sustainability goals and developing strategies for mitigating negative impacts.
In the field of product design, sustainable design evaluation is an essential stage in improving the efficiency of the entire product development process and is widely regarded as one of the most important practices in achieving sustainability [11,12]. The traditional design evaluation process often requires significant resource allocation and tends to be time-consuming, resulting in increased costs and longer lead times for product development. In contrast, sustainable design evaluation can expedite decision making, accelerate time to market, reduce labor costs, and improve resource efficiency. These benefits contribute to mitigating the negative environmental impact of products while maximizing their economic and social advantages. As such, sustainable design evaluation has attracted widespread attention in the field of product design and is seen as an important step towards creating a more sustainable and responsible approach to product development.
Design evaluation involves a systematic assessment of design quality [13-15], functionality [16,17], and usability [18,19]. For example, Isler et al. [16] evaluated a sudden brake warning system (SBWS) using a video-based simulation dual task, revealing that the SBWS was effective and beneficial in enabling smoother deceleration and reducing fuel consumption. Lai et al. [17] conducted objective experiments and subjective evaluations through an on-road test to evaluate the effect of tinted helmet visors on visibility. They proposed an optimal tinted helmet visor design for motorcycle riders that effectively avoids a reduction in visibility performance. Azham et al. [20] evaluated the usability of mobile phone app designs and proposed a usability metric framework that establishes links between usability goals, such as simplicity, accuracy, and safety. To achieve these evaluations, specific criteria and standards were established to measure the design's compliance and effectiveness in achieving desired objectives. These criteria and standards can vary depending on the product's purpose, target audience, and industry standards. Design evaluation can be performed using various methods, including manual evaluation by human evaluators, automatic evaluations using computer-based algorithms, or a combination of both.
Manual design evaluation is conducted by human evaluators, who use their experience, skills, and expertise to assess a design. Evaluation criteria are generally based on various factors, including aesthetics, functionality, user experience, and sustainability. Manual design evaluation relies on the personal preferences, expertise, and knowledge of the human evaluators and can be particularly useful when evaluating designs that involve subjective criteria, such as aesthetics or user experience. Several studies have been conducted regarding manual design evaluation. For example, Ariffe et al. [21] presented a methodology for evaluating and selecting the most appropriate design concepts at the conceptual design stage based on the analytical hierarchy process (AHP). Kapkın et al. [22] evaluated product form design based on a method of thinking aloud and subjective rating, and they suggested that the physical quality of the form and the perceived meaning followed patterns akin to Fechner's law of perception. Luo et al. [23] conducted a preliminary study of perceptual matching in order to evaluate beverage bottle design based on a classification task and a semantic differential (SD) experiment. Sonderegger and Sauer [24] investigated the role of non-visual aesthetics in consumer product evaluations based on a subjective rating with a Likert scale. Lin et al. [25] evaluated headphone design based on Kansei engineering and proposed headphone design strategies that conformed to consumer perception and preference.
Automatic design evaluation refers to the use of computer-based algorithms and software tools to evaluate a design based on predefined criteria and standards. These algorithms can analyze various aspects of a design, such as its functionality, manufacturability, and cost-effectiveness, to provide quantitative feedback on the design's performance. With the development of artificial intelligence technologies such as big data and machine learning algorithms, automatic design evaluation has started to empower the product design and evaluation process. It can provide a fast, objective, and comprehensive evaluation of a design's performance, while also improving the quality and efficiency of the design process. Several researchers have studied automatic design evaluation. For example, Wu [26] proposed a method for evaluating the design of product shapes using a multilayer perceptron genetic algorithm in a neural network (GA-MLP-NN). Hao et al. [27] proposed a novel quality-guided deep neural network and weighting scheme to achieve a multiobjective evaluation and decision across multiple industrial products and their design processes.
A combination of manual and automatic evaluation methods has also been used. In such instances, human evaluators performed the initial evaluation, and the results were then verified and validated using computer-based algorithms. This approach combined the benefits of both methods, providing a comprehensive evaluation that was objective while also capturing the human evaluators' expertise. Xing et al. [28] constructed a dataset using graphical user interface (GUI) images with the ratio of likes/views as the ground-truth annotation, and proposed a GUI aesthetic evaluation model based on a squeeze-and-excitation VGG19 network architecture. Zhang et al. [29] constructed a self-adaptive deep aesthetic model of Chinese ink paintings using a subject query mechanism. The results demonstrated that the proposed aesthetic model had a higher aesthetic evaluation performance. Although several studies have been conducted on the combination of manual and automatic evaluation methods, few studies have been carried out in the field of sustainable product design. The relevant studies on design evaluation are presented in Table 1.
This study aimed to develop a sustainable product design evaluation method that integrates an AHP-based manual design evaluation method with an automatic design evaluation method using deep residual network algorithms (i.e., ResNet). By considering multiple factors and criteria, this sustainable design evaluation method could improve the efficiency of the design process while also ensuring that the product met necessary requirements.

Theoretical Background

The Analytic Hierarchy Process
The analytic hierarchy process (AHP) is a decision-making tool developed by Saaty [30]. It is a mathematical model that organizes and prioritizes complex decision-making processes by comparing the relative attributes and importance of two items. AHP is based on the concept that decision making involves several criteria, each with different levels of importance, and that these criteria can be organized into a hierarchical structure [31,32]. This hierarchy consists of a goal, which is broken down into a series of criteria. These are then further broken down into sub-criteria, and so on. In this study, following the AHP, decision-makers were asked to evaluate each criterion and sub-criterion in relation to each other using a pairwise comparison method. The results of these comparisons were then used to calculate the weights for each criterion, reflecting their relative importance in achieving the overall goal. The AHP method involved five primary steps:

1. Define the decision problem: The first step was to define the decision problem and identify relevant, important, and measurable criteria.

2. Create a hierarchy: The second step was to develop a hierarchy of the decision problem. This hierarchy consisted of the goal, criteria, and indicators. The goal was the overall objective, the criteria were the factors used to evaluate the indicators, and the indicators were the options evaluated.

3. Importance pairwise comparison: This involved comparing each decision criterion with every other criterion and each indicator with every other indicator in terms of their importance. The comparisons were made using a scale of 1-9, where 1 indicated that the two elements were equally important, and 9 indicated one element was significantly more important than the other. Each of the comparison matrices followed the form of Equation (1):

$$D = (d_{ij})_{n \times n} = \begin{pmatrix} 1 & d_{12} & \cdots & d_{1n} \\ 1/d_{12} & 1 & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1/d_{1n} & 1/d_{2n} & \cdots & 1 \end{pmatrix} \tag{1}$$

In this matrix, $d_{ij}$ refers to the pairwise comparison rating for attributes $i$ and $j$.

4. Calculate the weights: The weights of the criteria and indicators were calculated using a mathematical formula that considered the pairwise comparisons. To ensure that the pairwise comparisons were consistent, a consistency ratio was calculated. If the consistency ratio was greater than 0.1, then the pairwise comparisons required review and revision. Once the weights were calculated, the indicators could then be evaluated using the decision criteria. The indicators were scored based on how well they met the decision criteria.

5. Calculate the overall rankings: We calculated the overall rankings of the indicators by multiplying the indicator performance scores by the corresponding criteria weights and summing the results for each indicator.
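The pairwise comparison in step 3 can be illustrated with a minimal Python sketch. It builds a reciprocal comparison matrix from the upper-triangle judgments only, so that each $d_{ji}$ is the reciprocal of $d_{ij}$ and the diagonal is 1. The function name `build_pairwise_matrix` and the three example judgments are hypothetical and are not taken from the study.

```python
from fractions import Fraction


def build_pairwise_matrix(n, upper):
    """Build an n x n reciprocal comparison matrix.

    `upper` maps (i, j) with i < j to a judgment on the 1-9 scale (or a
    reciprocal such as Fraction(1, 3)) stating how much more important
    element i is than element j.
    """
    matrix = [[Fraction(1)] * n for _ in range(n)]  # diagonal entries are 1
    for i in range(n):
        for j in range(i + 1, n):
            value = Fraction(upper[(i, j)])
            if not (Fraction(1, 9) <= value <= 9):
                raise ValueError("judgment outside the 1/9 to 9 scale")
            matrix[i][j] = value
            matrix[j][i] = 1 / value  # reciprocal entry
    return matrix


# Hypothetical judgments for three elements (illustration only).
judgments = {(0, 1): 3, (0, 2): 5, (1, 2): 2}
D = build_pairwise_matrix(3, judgments)
```

Exact fractions are used here only to keep the reciprocal property free of rounding; floating-point values would serve equally well for the later weight calculation.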
The AHP method has been widely used in various fields, including design, engineering, environmental sciences, and management, due to its flexibility in handling complex decision-making tasks [33-36]. For example, Ginting et al. [37] used the AHP method, value engineering, and a brainstorming model to design a standard static bicycle for post-stroke patients. Zhou et al. [38] employed the AHP method to evaluate the landslide susceptibility of a construction site for photovoltaic power generation facilities. Uzoka and Ijatuyi [39] utilized the AHP approach to investigate library management, specifically focusing on the acquisition process. They considered various criteria such as costs, resource availability, and the required number of book copies. Dožić et al. [40] structured and rated the criteria regarding the airline choices of freight forwarders using the AHP method. Veisi et al. [41] assessed the opinions of elite farmers, agricultural water specialists, extension experts, and agricultural non-governmental organizations on three irrigation systems using the AHP method based on nine indicators to select the best irrigation system.

Residual Neural Network
Deep residual networks (ResNet) are a type of deep neural network architecture proposed by He et al. [42] in 2015. By incorporating identity mappings into the deep convolutional neural network, ResNet addressed the challenges caused by deep network architectures, such as vanishing gradients, exploding gradients, and network degradation [42,43]. ResNet increased the number of information transmission paths, extended the network depth from dozens to thousands of layers, and significantly improved accuracy.
Figure 1 shows the structure of the residual learning module of ResNet, comprising a parametric network layer mapped to H, with x as input and H(x) as output [44]. In contrast to traditional deep convolutional neural networks that directly learn the mapping H, ResNet learns the residual F(x) through multiple parameterized layers and adds the input x back through a shortcut, resulting in H(x) = F(x) + x. The layers therefore only need to fit the residual F(x) = H(x) − x, which is easier and more effective than fitting H directly. ResNet has demonstrated outstanding performance in various fields, such as image classification [45,46], facial detection [47], and semantic segmentation [48]. Additionally, it has proven to be a highly effective image feature extractor [49,50]. In the field of product design evaluation, ResNet is starting to be applied to enhance traditional design evaluation methods. For example, Wu et al. [51] developed an evaluation prediction model for the products of an industrial design competition based on a deep convolutional neural network (DCNN) and introduced the SEFL-ResNet (squeeze-and-excitation Focal-ResNet) model to accurately predict the evaluation of product design awards. Wang et al. [52] trained the side profiles of lifting machinery using ResNet and presented an artificial intelligence decision-making model for lifting machinery.
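The relation H(x) = F(x) + x can be illustrated with a small framework-free Python sketch. It is a toy illustration of the shortcut addition on plain lists, not the network used in this study; the helper names `residual_block`, `zero_residual`, and `small_correction` are hypothetical.

```python
def residual_block(x, residual_fn):
    """Return H(x) = F(x) + x for a vector x given as a list of floats."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same shape to be added")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    # If the learned residual is zero, the block reduces to the identity.
    return [0.0 for _ in x]


def small_correction(x):
    # A stand-in for a learned residual: a small adjustment to the input.
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 0.5]
identity_out = residual_block(x, zero_residual)
corrected_out = residual_block(x, small_correction)
```

With a zero residual the block passes its input through unchanged, which is one way to see why adding more residual blocks need not degrade a network: each block only has to learn a correction to what it receives.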
Various types of ResNet architectures have been introduced, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. Among these, ResNet-50 is a variant that has been broadly applied in many computer vision applications [53-56]. The ResNet-50 network consists of six stages, as depicted in Figure 2 [57], comprising the input stage (stage 1), four residual stages (stages 2-5), and the output stage (stage 6). It consists of a total of 49 convolutional layers and 1 fully connected layer, with each layer utilizing batch normalization and a ReLU activation function to enhance the model's fitting ability. ResNet-50 has demonstrated outstanding performance in numerous image recognition tasks, aesthetic evaluations, and assessment predictions. For example, Dewi et al. [58] combined ResNet-50 with spatial pyramid pooling (SPP) to identify musical instruments that were identical in size, form, and sound. The authors then proposed a ResNet-50 SPP model with strong identification performance. Fulton et al. [59] used ResNet-50 to predict the presence and severity of clinical dementia ratings (CDRs) from MRIs (multiclass classification) and demonstrated that a ResNet-50 network model could help to identify Alzheimer's disease patients automatically, prior to the provider review. Wang et al. [60] proposed a multidimensional aesthetic evaluation method of mobile game images based on a ResNet-50 algorithm and verified the effectiveness of the proposed evaluation algorithm through comparative experiments. The high recognition accuracy and real-time performance of ResNet-50 made it a suitable choice for product design evaluation. Furthermore, considering the equipment applied in this study, ResNet-50 was selected as a tool for product design evaluation.
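The count of 49 convolutional layers plus 1 fully connected layer can be checked with simple arithmetic. The sketch below assumes the block counts of the standard ResNet-50 configuration from He et al. (3, 4, 6, and 3 residual blocks in stages 2-5, three convolutions per block), which are not listed in the text above; convolutions on shortcut paths are conventionally not counted.

```python
# Assumed standard ResNet-50 configuration (not stated in the text above).
STEM_CONVS = 1                   # the single convolution in the input stage
BLOCKS_PER_STAGE = [3, 4, 6, 3]  # residual blocks in stages 2-5
CONVS_PER_BLOCK = 3              # convolutions on the main path of each block
FC_LAYERS = 1                    # the fully connected output layer

conv_layers = STEM_CONVS + CONVS_PER_BLOCK * sum(BLOCKS_PER_STAGE)
total_layers = conv_layers + FC_LAYERS
```

Under these assumptions the totals agree with the figures quoted above, which is also where the "50" in the name comes from.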

Methods
This study aimed to combine a manual design evaluation method based on the analytic hierarchy process (AHP) with automatic design evaluation based on deep residual network algorithms (i.e., ResNet) to develop a sustainable evaluation method for product design. As the design of a tail-light profile plays an essential role in the overall aesthetics and functionality of a vehicle and is becoming an increasingly important aspect for vehicle design evaluation and decision making, tail-light design was used as a case study for product design evaluation.

Experiment 1: Manual Design Evaluation Based on the AHP Method
This section outlines the development of a manual design evaluation model based on the AHP method for tail-light design evaluation.

Establishment of Design Evaluation Indicators
Luo et al. [61,62] demonstrated that safety and aesthetics are the two primary criteria for tail-light shape design. In terms of the safety criterion, their study revealed that a longer tail-light shape is safer in comparison to a shorter tail-light shape [61]. Furthermore, an enclosed-shaped tail-light was found to significantly enhance vehicle conspicuity when compared to an open-contour-shaped tail-light. The visibility of the tail-light shape has a direct impact on driving safety [63,64], making it a crucial safety criterion.
Regarding the aesthetic criteria, it was found that consumers exhibit a preference for tail-light shapes with contour features or medium length. Consequently, tail-light shapes with contour features and medium length can be considered as important aesthetic criteria [61]. Complexity has been identified as a strong predictor of aesthetic judgment [65-67]. Previous studies have consistently demonstrated that as stimulus complexity increases, there is an initial rise in preference until reaching an optimal point, beyond which preference declines in an inverted-U pattern [68,69]. Maintaining a moderate level of complexity can effectively capture observers' attention [69] and enhance visual aesthetic preferences [70,71]. Therefore, complexity can be regarded as one of the indicators for tail-light shape design.
The word "streamline" refers to the incorporation of sleek and aerodynamic characteristics into the shape of tail-lights. This design approach aims to reduce air resistance and minimize drag, resulting in improved aerodynamic efficiency for vehicles [72]. By creating a streamlined profile, tail-lights contribute to the smooth flow of air around a vehicle, reducing turbulence and optimizing performance. This not only enhances fuel efficiency but also improves handling and stability at higher speeds [73]. The integration of streamline principles into tail-light design showcases the commitment of automotive manufacturers to achieving greater efficiency and enhancing the overall performance of their vehicles. Moreover, streamlined tail-light shape design often contributes to the overall aesthetic appeal of a vehicle, creating a more modern and stylish look [74].
Harmony refers to the overall coherence and balance achieved in the tail-light shape. According to Van Poolen [75], "delightful harmony" is the central norm that characterizes the aesthetic dimension of engineering design. Pirsig [76] defines aesthetically good design as inducing a holistic, delightful harmony between human and machine. As our sense of beauty corresponds to our innate sense of harmony [77], achieving harmonious tail-light shape design is crucial for the overall appearance of a vehicle. Such a design enhances the vehicle's aesthetic appeal, imparting elegance and a sense of completeness. When the tail-light shape complements other elements of the vehicle, it creates visual consistency, resulting in a smoother and more appealing exterior.
Exquisiteness emphasizes the intricate and refined aspects of the tail-light shape. It is often associated with elegance, refinement, and meticulous attention to detail, particularly in the realm of product design. Exquisiteness plays a significant role in the overall quality and appeal of a product, creating a pleasing and impressive appearance for users or customers [78]. For instance, Hou et al. [79] evaluated smartphone designs based on dimensions such as exquisiteness, balance, color, and reliability, constructing a Kansei image rating system to assess smartphone appearance. Cheng et al. [80] focused their efforts on the design themes of "exquisite" and "intimacy," showcasing a successful design practice for product form innovation. In the context of tail-light shape, an exquisite design serves as a testament to the exceptional level of precision and attention to detail incorporated into the vehicle's production. Exquisiteness elevates the tail-light from being a mere functional component to becoming a work of art that adds a touch of luxury and refinement to the overall appearance of the vehicle.
The concept of trendiness captures the incorporation of modern and fashionable elements into tail-light shape design. According to Hsiao et al. [81], product semantics can be categorized into four fundamental dimensions: trendiness, emotion, complexity, and potency. These dimensions play crucial roles in affecting judgments of product shapes and contribute to the development of emotionally appealing product designs. Hung et al. [82] investigated the impact of trendiness, complexity, and emotion on judgments of product novelty and aesthetic preference. Bloch [83] highlighted the influence of prevailing styles and fashion on consumer preferences for product design, emphasizing the significance of trendiness in consumer perceptions of product appearance [82,84]. In tail-light shape design, trendiness entails incorporating contemporary and stylish elements that reflect current fashion and design trends. By embracing trendiness, tail-light designs can appeal to consumers who appreciate modern aesthetics, enhancing overall vehicle appeal and desirability.
The tail-light shape design criteria, safety and aesthetics, and their corresponding indicators were defined and are summarized in Table 2. The safety criterion (S level) had three indicators, which were the length of the shape (S1), the closure of the shape (S2), and the visibility of the shape (S3). The aesthetic criterion (A level) had seven indicators, which were the length of the shape (A1), the filling degree of the pattern (A2), streamlining (A3), complexity (A4), harmony (A5), exquisiteness (A6), and trendiness (A7). These design criteria and indicators for tail-light shape design were agreed upon and confirmed based on expert evaluations performed by seven vehicle designers with at least five years of design experience.

| Criteria | Indicator | Description |
| --- | --- | --- |
| Safety (S) | The length of shape (S1) | Evaluates whether the length of the tail-light shape is long |
| Safety (S) | The closure of shape (S2) | Evaluates whether the outline of the tail-light is closed |
| Safety (S) | The visibility of shape (S3) | Evaluates whether the overall shape (size, shape, etc.) of the tail-light can attract the attention of passengers |
| Aesthetics (A) | The length of shape (A1) | Evaluates whether the length of the tail-light shape is medium |
| Aesthetics (A) | Filling degree of pattern (A2) | Evaluates whether the tail-light shape pattern has low filling degree (low filling, such as hollow; high filling, such as solid; medium filling, such as textured) |
| Aesthetics (A) | Streamline (A3) | Evaluates whether the tail-light shape is streamlined |
| Aesthetics (A) | Complexity (A4) | Evaluates whether the complexity of the tail-light shape is moderate |
| Aesthetics (A) | Harmony (A5) | Evaluates whether the shape of the tail-light is harmonious |
| Aesthetics (A) | Exquisiteness (A6) | Evaluates whether the shape of the tail-light is exquisite |
| Aesthetics (A) | Trendiness (A7) | Evaluates whether the shape of the tail-light meets the aesthetic development trend of the times |

Pairwise Comparison
Based on the AHP method, the first steps to achieve a more objective and accurate calculation of weights among the different indicators were to conduct pairwise comparisons and create determination matrices for each level. If we consider the upper-level criteria as $D$ and the lower-level sub-criteria as $d_1, d_2, \ldots, d_n$, it is possible to determine the weights indicating the relative importance of elements $d_i$ and $d_j$ to the criteria $D$. The typical approach to measuring the relative importance of indicators involves assigning numerical values and their reciprocals on a scale of 1-9 [32]. Table 3 shows the meaning of the scale for the determination matrix used to evaluate the design of the tail-light shape.
The determination matrix of the design level was defined by Equation (2):

$$D = (d_{ij})_{n \times n}, \quad d_{ij} > 0, \quad d_{ii} = 1, \quad d_{ji} = \frac{1}{d_{ij}} \tag{2}$$

where $d_{ij}$ is the scale value expressing the importance of element $d_i$ relative to element $d_j$.

A group of nine car users with driver's licenses and over five years of experience in automotive and industrial design were invited to evaluate the tail-light's shape design using the pairwise comparison. The following scores represent one participant's ratings for the determination matrices at each level. More comparison scores are shown in Table A1.

Calculation of Weight Coefficients
Once the determination matrix was constructed, the next step was to calculate the importance weights of the indicators. Common methods for weight calculation include the geometric mean, arithmetic mean, eigenvector, and least-squares methods [85]. For this paper, the weight vector for each determination matrix was calculated using the geometric mean method [86] in accordance with the process outlined below:

1. Calculate the product $D_i$ of the scale values in each row of the determination matrix $D$, as shown in Equation (3):

$$D_i = \prod_{j=1}^{n} d_{ij}, \quad i = 1, 2, \ldots, n \tag{3}$$

where $n$ represents the number of indicators and $d_{ij}$ represents the element in the $i$-th row and $j$-th column of the determination matrix $D$.

2. Calculate the geometric mean of the indicators in each row, as shown in Equation (4):

$$\bar{w}_i = \sqrt[n]{D_i} \tag{4}$$

3. Normalize the result to obtain the weights, as shown in Equation (5):

$$w_i = \frac{\bar{w}_i}{\sum_{j=1}^{n} \bar{w}_j} \tag{5}$$

4. Perform a consistency assessment using Equations (6) and (7):

$$\lambda_{\max} = \frac{1}{n} \sum_{i=1}^{n} \frac{(Dw)_i}{w_i} \tag{6}$$

$$CI = \frac{\lambda_{\max} - n}{n - 1} \tag{7}$$

where $\lambda_{\max}$ is the largest eigenvalue, $CI$ is the consistency index, $(Dw)_i$ is the $i$-th component of the vector $Dw$, and $n$ is the order of the determination matrix. The random index (RI) was introduced to measure the magnitude of the consistency index (CI) [87]. Further calculations were performed to obtain the consistency ratio (CR) as CR = CI/RI; generally, the smaller the CR value, the better the consistency of the determination matrix. A CR value of less than 0.1 indicates that the determination matrix passes the consistency test, while a CR value greater than 0.1 indicates that the determination matrix needs to be adjusted and reanalyzed.
Based on this process, the weight coefficient $w_i$, maximum eigenvalue $\lambda_{\max}$, CI, and CR in criteria layers S and A were calculated. The CR values of the obtained results were all less than 0.1, and all values passed the consistency test. All the data for each determination matrix are shown in Appendix A (Tables A2-A4).
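The geometric mean procedure and the consistency check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the example matrix is hypothetical, and the random index values are the commonly tabulated ones attributed to Saaty, which may differ slightly from the table used in [87].

```python
import math

# Commonly tabulated random index (RI) values by matrix order; an assumption,
# since the RI table used in the study is not reproduced in the text.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Geometric mean method: row products, n-th roots, normalization."""
    n = len(matrix)
    row_products = [math.prod(row) for row in matrix]
    geo_means = [p ** (1.0 / n) for p in row_products]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency(matrix, weights):
    """Return (lambda_max, CI, CR) for a matrix and its weight vector."""
    n = len(matrix)
    dw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(dw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return lambda_max, ci, cr


# Hypothetical 3 x 3 determination matrix (illustration only).
D = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
w = ahp_weights(D)
lambda_max, ci, cr = consistency(D, w)
passes = cr < 0.1  # the acceptance threshold stated in the text
```

For this example matrix the weights decrease from the first to the third element and the consistency ratio falls well below the 0.1 threshold, so the matrix would be accepted without revision.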

Ranking of Indicators
The weight coefficients of each indicator in the aforementioned determination matrices were analyzed and are presented in Table 4. Based on the importance weighting coefficients, the indicators could be ranked as follows: in the V level, safety was ranked higher than aesthetics. In the S level, visibility was ranked higher than the length and the closure of the outline. In the A level, harmony was ranked higher than exquisiteness, trendiness, being streamlined, filling degree, complexity, and length. These rankings could provide valuable insights that allow designers to evaluate and make decisions regarding product design.

Calculation of Manual Design Evaluation
Once the indicators and their weights were obtained, manual scoring was conducted based on the indicators. Specifically, we used a 5-point Likert scale to allow evaluators to indicate their attitudes rapidly [88]. After evaluating each indicator, its score was multiplied by its corresponding weight coefficient and then the values were summed to calculate the total score of the solution, as shown in Equations (8)-(10).
where $R_V$ is the total evaluation score, $R_S$ is the safety score, $R_A$ is the aesthetic score, $R_i$ is the mean value of evaluation score, and $w_i$ is the weight value of the indicator $R_i$.
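The weighted scoring described above can be sketched as follows. Because the exact form of the three equations is not reproduced here, the sketch assumes a two-level aggregation: indicator scores are weighted within the safety and aesthetic groups, and the two group scores are then combined with criterion-level weights. All weights and scores below are hypothetical and are not the values reported in Table 4.

```python
def weighted_score(scores, weights):
    """Sum of score * weight over matching indicator names."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same indicators")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical local weights (each group sums to 1) and mean Likert scores.
safety_weights = {"S1": 0.2, "S2": 0.2, "S3": 0.6}
safety_scores = {"S1": 4.0, "S2": 3.0, "S3": 5.0}

aesthetic_weights = {"A1": 0.05, "A2": 0.10, "A3": 0.15, "A4": 0.10,
                     "A5": 0.25, "A6": 0.20, "A7": 0.15}
aesthetic_scores = {"A1": 3.0, "A2": 4.0, "A3": 4.0, "A4": 3.0,
                    "A5": 5.0, "A6": 4.0, "A7": 4.0}

criteria_weights = {"S": 0.6, "A": 0.4}  # hypothetical V-level weights

r_s = weighted_score(safety_scores, safety_weights)        # safety score
r_a = weighted_score(aesthetic_scores, aesthetic_weights)  # aesthetic score
r_v = weighted_score({"S": r_s, "A": r_a}, criteria_weights)  # total score
```

Because every weight group sums to 1, each resulting score stays on the same 1-5 scale as the Likert ratings, which keeps manual and automatic scores directly comparable.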

Experiment 2: Automatic Design Evaluation Based on ResNet
This section describes how deep residual networks were combined with manual design evaluation to automate the evaluation process and compensate for the low efficiency of manual design evaluation.

Datasets
To construct an automatic evaluation model for vehicle tail-light shapes, a design dataset of tail-light shapes was first defined. The dataset consists of two parts: a tail-light shape image dataset and its evaluation scores. The tail-light shape image dataset was the same as that used in the study by Lin et al. [89], which had a total of 1577 images, as shown in Figure 3. To comprehensively evaluate the tail-light shape design, the labels of the tail-light shape image dataset were defined using the 10 indicators obtained in Section 3.1.1. As there were no specific or standard datasets related to the evaluation of vehicular tail-light shape design, manual labeling was used to perform scoring. Five design experts were invited to participate in the evaluation of the vehicle tail-light shapes. Before the design evaluation, the researcher introduced tail-light safety and aesthetic design knowledge obtained from the research of Luo et al. [61,62] to the experts, which were two of the scoring criteria. Each participant rated all the evaluation indicators independently based on a 5-point Likert scale (5 points for totally agree, 4 points for agree, 3 points for neutral, 2 points for disagree, and 1 point for totally disagree). The specific evaluation indicators are shown in Table 5.
Finally, the average scores given by the five automotive design experts were calculated and used as the score labels for each evaluation indicator. This led to the further development of a design dataset for tail-light shape images and their corresponding score labels.
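The label construction for a single image can be sketched as a per-indicator mean over the five experts' Likert ratings. The indicator codes follow Table 2; the ratings and the function name `average_labels` are hypothetical.

```python
from statistics import mean

INDICATORS = ["S1", "S2", "S3", "A1", "A2", "A3", "A4", "A5", "A6", "A7"]


def average_labels(expert_ratings):
    """Average 1-5 Likert ratings per indicator across experts.

    `expert_ratings` is a list with one dict per expert, mapping each
    indicator code to that expert's rating for a single image.
    """
    for ratings in expert_ratings:
        for name in INDICATORS:
            if not 1 <= ratings[name] <= 5:
                raise ValueError(f"rating for {name} is outside 1-5")
    return {name: mean(float(r[name]) for r in expert_ratings)
            for name in INDICATORS}


# Hypothetical ratings from five experts for one tail-light image.
ratings = [
    dict(zip(INDICATORS, [4, 3, 5, 2, 3, 4, 3, 4, 4, 5])),
    dict(zip(INDICATORS, [5, 3, 4, 3, 3, 4, 3, 5, 4, 4])),
    dict(zip(INDICATORS, [4, 4, 5, 2, 2, 5, 4, 4, 3, 4])),
    dict(zip(INDICATORS, [3, 3, 4, 3, 3, 4, 3, 4, 4, 5])),
    dict(zip(INDICATORS, [4, 2, 5, 2, 4, 4, 3, 4, 4, 4])),
]
labels = average_labels(ratings)
```

Averaging turns the discrete ratings into continuous targets, so each image carries ten real-valued score labels rather than ten class labels.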
The design dataset of tail-light shapes consisted of 1577 images corresponding to 10 evaluation indicators. In this experiment, 90% of the dataset was designated as the training set, whereas the remaining 10% was used as the validation set.
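A 90/10 split of the 1577 images can be sketched as below. The text does not state how the split was drawn, so the random shuffle, the fixed seed, and the rounding down of the training-set size are assumptions made for illustration.

```python
import random


def split_dataset(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of `items` and split it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_fraction)  # rounds down
    return shuffled[:n_train], shuffled[n_train:]


image_ids = list(range(1577))  # stand-ins for the tail-light image files
train_ids, val_ids = split_dataset(image_ids)
```

With rounding down, this yields 1419 training images and 158 validation images.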

No. | Evaluation Indicator | Rating
The length of the tail-light shape is long.
The outline of the tail-light shape is closed.
The length of the tail-light shape is medium.
The filling degree of the tail-light shape design pattern is low (low degree-contour; high degree-solid; medium degree-texture [55,56]).
The tail-light shape is streamlined.
The complexity of the tail-light shape is moderate.
The tail-light shape is harmonious.
The tail-light shape is exquisite.
The tail-light shape is trendy.

Apparatus
The experiment used the Python language and the PyTorch deep learning framework. The experimental environment was a computer running the 64-bit Windows 10 operating system with a 2.8 GHz Intel i7 CPU and an Nvidia RTX 2070 graphics card. The software configuration included Python 3 with Anaconda and PyTorch 1.0.

Model Training
This study used the ResNet-50 framework to construct an evaluation model for the design of the vehicle tail-light shape. As shown in Figure 4, the model construction process included data pretraining, transfer learning, fine-tuning on the target data, and model performance evaluation. Before being input into ResNet-50, the tail-light images were uniformly scaled to 224 × 224 pixels. The input layer received the preprocessed shape images and their evaluation labels. The input images first underwent a 2D convolution, which changed the number of channels to 64; the convolutional kernel size was 7 × 7, the stride was 2, and the padding was 3. This convolution layer was followed by a batch normalization layer and a non-linear ReLU activation function, and then by max pooling for down-sampling. During training, the pooling layers changed the size of the feature maps: the kernel size of the first pooling layer was 3 × 3 with a stride of 2, and the kernel sizes of the other pooling layers were 2 × 2 with a stride of 2. The batch normalization layer accelerated training and improved the results, thereby improving the training of the deep residual network.
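The effect of the first convolution and the first pooling layer on the feature-map size can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1, for input size n, kernel size k, stride s, and padding p. The sketch below is illustrative only: the helper name `conv_output_size` is ours, and the pooling padding of 1 is an assumption (a common ResNet default), since the text gives only the pooling kernel size and stride.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


# First convolution: 7 x 7 kernel, stride 2, padding 3 (values from the text).
after_conv = conv_output_size(224, kernel=7, stride=2, padding=3)

# First pooling layer: 3 x 3 kernel, stride 2 (from the text); padding=1 is an assumption.
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)

print(after_conv, after_pool)
```

Under these assumptions, a 224 × 224 input is reduced to 112 × 112 by the convolution and to 56 × 56 by the pooling layer before it enters the residual stages.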
After the first convolutional layer, the data passed through four residual stages in sequence, each of which contained multiple repeated residual modules. Each residual module included a main path and a shortcut. The main path had three convolutional layers that extracted tail-light image features and evaluation information. The shortcut had one convolutional layer that controlled the number of features and doubled the convolutional kernel to speed up residual propagation. The feature dimensions and layer numbers produced by the main path and the shortcut were the same. The two outputs were added together at the end of the residual module, thereby fusing the deep and shallow features of the tail-light shape image and evaluation data to obtain more tail-light shape image feature information.
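The superposition at the end of a residual module can be written compactly. The notation here is ours, not the paper's: $x$ is the module input, $F(x)$ is the output of the three-layer main path, and $S(x)$ is the output of the shortcut convolution. In the standard ResNet formulation, which we assume applies here, a ReLU follows the addition:

$$y = \mathrm{ReLU}\big(F(x) + S(x)\big)$$

The addition is defined only when $F(x)$ and $S(x)$ have identical shapes, which is why the shortcut convolution has to match the feature dimensions produced by the main path.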
After the residual stages, the model continued through a final stage consisting of an average pooling layer and a fully connected layer. This stage integrated the convolutional features obtained earlier and converted them into the 10-dimensional output of the fully connected layer, which was then used for tail-light shape design evaluation and prediction. The mean square error (MSE) loss function was used to determine the degree of convergence between the predicted and true values. ReLU was used as the activation function for each convolutional and fully connected layer.
In this experiment, the batch size was set to 16, the initial learning rate was 0.0001, and the number of training epochs was set to 20 based on the size of the dataset. Stochastic gradient descent was used for training, and the Adam optimizer was used to learn the parameters and optimize the network. The optimizer had a momentum parameter of 0.1 and a weight decay parameter of 0.0001. The MSE function was used as the optimization loss function; it averages the squared differences between the predicted values and the true values, as shown in the following formula:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ refers to the number of pictures, $y_i$ refers to the true value, and $\hat{y}_i$ refers to the predicted value. As training continued, a smaller loss value indicated a more accurate model, that is, a higher accuracy when predicting the evaluation of tail-light shape images. Figure 5 shows the architecture of the evaluation model.
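As a small worked example of the loss, the MSE is the mean of the squared differences between true and predicted scores. The sketch below is a plain-Python illustration, not the training code. Because each image has several indicator scores, it averages over every image and every indicator; that reduction is an assumption, as the text defines the average only over the number of pictures. The function name `mse` and the toy values are ours.

```python
def mse(y_true, y_pred):
    """Mean squared error over all images and all indicators."""
    squared = [
        (t - p) ** 2
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    ]
    return sum(squared) / len(squared)


# Two hypothetical images with two indicator scores each.
y_true = [[4.0, 3.0], [2.0, 5.0]]
y_pred = [[3.5, 3.0], [2.0, 4.0]]
loss = mse(y_true, y_pred)
print(loss)
```

Here the squared differences are 0.25, 0, 0, and 1, so the loss is 1.25 / 4 = 0.3125.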

Results
The experimental datasets were split into training and validation sets using the 10-fold cross-validation method, where 90% of the data was used for training and 10% for validation. The model's performance was evaluated by adjusting the parameters and conducting multiple iterative tests. The MSE was used as the evaluation metric, with a smaller MSE indicating a smaller prediction error and better model performance. During the experiment, it was observed that the model started to converge after 20 training epochs, as shown in Figure 6, where the horizontal axis represents the training epochs and the vertical axis represents the loss.
Additionally, the same dataset was used to train the VGG-16 and AlexNet deep learning frameworks for comparison. Figure 6 illustrates the MSE loss curves for the three models. At the optimal training epoch, the MSE values on the validation set were 0.484 (ResNet-50), 0.5058 (VGG-16), and 0.5079 (AlexNet). Taken together, the MSE loss curves and the MSE values showed that the prediction accuracy of ResNet-50 for tail-light shape design evaluation was superior to that of the other two deep neural networks. These results suggested that using a deep residual network to predict automotive tail-light shape design evaluation was feasible.
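The 10-fold partition can be sketched as follows. This is a minimal stdlib illustration rather than the authors' procedure: the function name `k_fold_indices` is ours, the folds are contiguous blocks, and shuffling is omitted for brevity (in practice the indices would normally be shuffled first).

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional item.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, val
        start += size


folds = list(k_fold_indices(1577, k=10))
print([len(val) for _, val in folds])
```

For 1577 images, seven folds hold 158 validation images and three hold 157, so each fold trains on roughly 90% of the data and validates on the remaining 10%.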

Comparison and Verification of Experimental Results
To validate the effectiveness and feasibility of the tail-light shape design evaluation model based on the ResNet-50 network, we compared the scores obtained from the automatic design evaluation with those obtained from the manual design evaluation. Seven tail-light shape images were randomly selected from the dataset as validation samples, and both manual and automatic design evaluations were performed on them to compare the scores.
In the manual design evaluation process, we invited 32 participants, including 10 vehicle designers and 22 consumers, to rate the 10 indicators of tail-light shape design. Each participant independently rated all evaluation indicators on a 5-point Likert scale (5 points for "very consistent", 4 points for "consistent", 3 points for "neutral", 2 points for "inconsistent", and 1 point for "very inconsistent"). After all participants had completed their assessments, we calculated the total score for each tail-light shape design by weighting the 10 indicators according to Equations (8)-(10). The average score of each tail-light shape design was then calculated as the final score of the manual evaluation. The manual design evaluation score was used as the expected output value (ideal value) of the evaluation in this study.
In the automatic design evaluation process, we input the seven selected tail-light shape images into the constructed ResNet-50 model. Through the semantic decoding of the tail-light shape images, we obtained evaluation scores ranging from 1 to 5 for the corresponding 10 indicators. Based on Equations (8)-(10), we calculated the total score for each tail-light shape image by weighting the scores of the 10 indicators. The automatic design evaluation score was used as the training output value (measurement value) of the evaluation model in this study.
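The weighting step shared by the manual and automatic evaluations can be sketched as a weighted sum. Equations (8)-(10) are defined earlier in the paper and are not reproduced here, so this sketch assumes they reduce to a single set of global indicator weights that sum to 1. The weights and scores below are placeholders, not the values from Table 4 or Table 6, and the function name `weighted_total` is ours.

```python
def weighted_total(scores, weights):
    """Weighted sum of indicator scores; the weights are expected to sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical global weights for the 10 indicators (placeholders only).
weights = [0.15, 0.10, 0.05, 0.10, 0.10, 0.10, 0.15, 0.10, 0.10, 0.05]
# Hypothetical indicator scores for one tail-light image, each between 1 and 5.
scores = [4.2, 3.8, 4.0, 3.6, 4.4, 3.0, 4.1, 3.9, 4.3, 3.7]

total = weighted_total(scores, weights)
print(round(total, 3))
```

Because the weights sum to 1, the total score stays on the same 1 to 5 scale as the individual indicators, which keeps the manual and automatic totals directly comparable.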
In practical applications of theoretical prediction models, reasonable errors are generally allowed. Since the relative error reflects the credibility of a measurement [45,46], we used it to compare the results of the automatic evaluation with those of the manual evaluation in order to verify the effectiveness of the automatic evaluation model. The equation for the relative error was as follows:

$$E_r = \frac{\left| x - u \right|}{u} \times 100\%$$

where $E_r$ is the relative error, $x$ is the automatic design evaluation score, and $u$ is the manual design evaluation score.
The evaluation scores and relative errors obtained from both the automatic and manual design evaluations are presented in Table 6. This study set the allowable error range at 3% based on studies conducted by Yang et al. [90] and Zhao [91]. The results demonstrated close agreement between the automatic and manual design evaluation scores, with a maximum relative error of 2.611%. This indicated that the automatic evaluation model had a good generalization ability and could be effectively used to evaluate tail-light shapes automatically. The close similarity between the automatic and manual evaluation scores, as shown in Table 6, supported the conclusion that an evaluation model based on ResNet-50 could be applied to the automatic evaluation of tail-light shape design. The constructed evaluation model, which integrated the AHP with ResNet-50, could achieve a faster evaluation speed, thereby improving efficiency and reliability when evaluating and making decisions about tail-light shape designs during vehicle design and development.
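The acceptance check against the 3% threshold can be illustrated as follows. This is a sketch under stated assumptions: the relative error is taken as the absolute difference between the automatic score and the manual score, divided by the manual score, and the score pairs are hypothetical rather than the values in Table 6. The names `relative_error` and `ALLOWED_ERROR` are ours.

```python
def relative_error(automatic, manual):
    """Relative error, in percent, of the automatic score against the manual score."""
    return abs(automatic - manual) / manual * 100.0


ALLOWED_ERROR = 3.0  # percent; the allowable range stated in the text

# Hypothetical (automatic, manual) score pairs; not the values from Table 6.
pairs = [(3.95, 4.00), (3.60, 3.52), (4.10, 4.10)]
errors = [relative_error(x, u) for x, u in pairs]
within = all(e <= ALLOWED_ERROR for e in errors)

print([round(e, 3) for e in errors], within)
```

Taking the absolute value makes over- and under-estimates count equally, which matches reporting a single maximum relative error against a symmetric tolerance.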

Discussion and Conclusions
This study aimed to develop a sustainable design evaluation method for product design that considered multiple factors and criteria by conducting integrated research on a manual design evaluation based on the AHP and an automatic design evaluation based on a ResNet-50 network.
In this study, tail-light shape design was used as a case study for product design evaluation. First, the evaluation level and the indicators of tail-light shape design were defined based on the AHP, after which a determination matrix was constructed and the weight coefficients for the design indicators were calculated in order to establish the manual design evaluation model. Second, the tail-light shape image dataset was manually annotated based on the evaluation indicators to create a relevant dataset. The ResNet-50 algorithm was then introduced to train on the dataset, transforming the design evaluation problem into an image feature learning problem, and the automatic product design evaluation model was constructed through modeling and training. Finally, the feasibility and effectiveness of the sustainable evaluation method of tail-light shape design based on the AHP and ResNet-50 were validated by comparing the results of manual and automatic design evaluations.
The integrated sustainable product design evaluation model, which incorporates the AHP and ResNet-50, offers a sustainable, efficient, and reliable method for evaluating product design. This approach is particularly significant in the context of vehicle design, on which no related studies have been conducted. By combining the advantages of manual design evaluation, such as flexibility, human judgment, and a holistic view, with the benefits of automatic design evaluation, including speed, consistency, and cost-effectiveness, the constructed evaluation model offers a comprehensive solution. It significantly shortens the product design decision-making process, leading to a faster time to market, reduced labor costs, and more efficient resource utilization. Furthermore, by integrating sustainability considerations into the initial design phase, the model can help to avoid the costly environmental and social impacts associated with unsustainable practices, such as pollution or workforce exploitation, that may arise later in the product lifecycle.
The sustainable product design evaluation model has broad applicability, ranging from vehicle design in the transportation field to consumer product design in the electronics industry. Policymakers at national and local levels, in collaboration with sustainability advocates, can leverage this approach to reduce the carbon footprint across various industries. Moreover, the methods presented in this paper, which initially focus on tail-light shape design, can be extended to other product design and development processes. This flexibility allows decision making to be streamlined in diverse product domains.
Overall, this integrated approach offers a pathway to improved production efficiency, reduced waste generation, and a minimized carbon footprint. It holds significant promise for creating sustainable and environmentally conscious products.
This study has some limitations that should be noted. First, because the approach integrates the AHP method with ResNet-50, the robustness of the dataset derived from the AHP method and the training performance of ResNet-50 are crucial factors in the final evaluation accuracy and effectiveness. However, due to time and budget constraints, the number of experts involved in scoring and labeling the tail-light shape design dataset was small. To enhance dataset robustness, future studies should engage more scoring experts in the manual rating tasks. Second, this study represents a preliminary exploration and application of automatic tail-light shape design evaluation, and a basic ResNet-50 model was used for training to achieve the automatic evaluation function. In the future, more advanced algorithms should be developed and optimized to train a tail-light shape design evaluation model with higher prediction accuracy. Moreover, more empirical comparative studies of tail-light shape design practice based on the proposed evaluation model and other models could be conducted to identify opportunities for improvement. Examining the scalability and adaptability of the approach to different contexts and regions could also provide valuable insights. Furthermore, exploring potential synergies with related research areas and investigating interdisciplinary collaborations could broaden the scope and impact of this study.

, and $d_{ii} = 1$.
If all the entries of the determination matrix $D$ satisfy $d_{ij} \cdot d_{jk} = d_{ik}$, then $D$ is a consistent matrix.
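The consistency condition can be checked directly on a small pairwise comparison matrix. The sketch below is illustrative only: the function names, the tolerance, and the two example matrices are assumptions and do not come from the paper's appendix tables. It also tests the reciprocal property of the scale, in which each entry is the inverse of its mirrored entry and the diagonal entries equal 1.

```python
from itertools import product


def is_reciprocal(matrix, tol=1e-9):
    """Check d_ij * d_ji = 1 for all i, j (for positive entries this implies d_ii = 1)."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] * matrix[j][i] - 1.0) <= tol
        for i, j in product(range(n), repeat=2)
    )


def is_consistent(matrix, tol=1e-9):
    """Check d_ij * d_jk = d_ik for all i, j, k."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] * matrix[j][k] - matrix[i][k]) <= tol
        for i, j, k in product(range(n), repeat=3)
    )


# Perfectly consistent example: entries are ratios of the weights 4 : 2 : 1.
consistent = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
# Typical human judgments: reciprocal, but 3 * 3 differs from 5.
judged = [[1.0, 3.0, 5.0], [1 / 3, 1.0, 3.0], [1 / 5, 1 / 3, 1.0]]

print(is_reciprocal(consistent), is_consistent(consistent))
print(is_reciprocal(judged), is_consistent(judged))
```

Matrices built from expert judgments are rarely exactly consistent, which is why AHP practice usually tests how close a matrix is to consistency rather than requiring the strict equality.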


Figure 5. Architecture diagram of tail-light shape design evaluation model based on ResNet-50.

Figure 6. (a) MSE loss curve of tail-light shape design evaluation based on ResNet-50. (b) MSE loss curve of tail-light shape design evaluation based on VGG-16. (c) MSE loss curve of tail-light shape design evaluation based on AlexNet.

Figure 7 depicts a comparative line chart of the automatic and manual design evaluation scores. The horizontal axis represents the sample number of the test, and the vertical axis represents the user rating. The blue solid line shows the automatic design rating (training output value), whereas the red dashed line shows the manual design rating (expected output value). The trends of the two curves were almost identical.
Sustainability 2023, 15

Figure 7. Comparative line chart of automatic design evaluation scores and manual design evaluation scores.

Table 1. The relevant studies on design evaluation.

Table 2. Tail-light shape design criteria and their indicators.

Table 3. Scale for the determination matrix of tail-light design evaluation indicators.

| Relative Importance Value ($d_i/d_j$) | Scale Meaning |
|---|---|
| 1 | Indicator $d_i$ is as important as indicator $d_j$ |
| 3 | Indicator $d_i$ is more important than indicator $d_j$ |
| 5 | Indicator $d_i$ is much more important than indicator $d_j$ |
| 7 | Indicator $d_i$ is strongly important compared to indicator $d_j$ |
| 9 | Indicator $d_i$ is absolutely important compared to indicator $d_j$ |
| 2, 4, 6, 8 | Determine the importance level based on adjacent scale values |
| Reciprocal | If the ratio of the importance of $d_i$ to $d_j$ is $d_{ij}$, then the ratio of the importance of $d_j$ to $d_i$ is $d_{ji} = 1/d_{ij}$ |

Table 4. Weight coefficient of each indicator in each layer.

Table 5. The specific evaluation indicators for rating.

Table 6. Relative error results.

Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

Table A1. Comparison matrices for rating by each user.

Table A2. Weight coefficient of determination matrix V.

Table A3. Weight coefficient of determination matrix S.

Table A4. Weight coefficient of determination matrix A.