Virtual Reality-Based Stimuli for Immersive Car Clinics: A Performance Evaluation Model

Abstract: This study proposes a model to evaluate the performance of virtual reality-based stimuli for immersive car clinics. The model considered Attribute Importance, Stimuli Efficacy and Stimuli Cost factors and the method was divided into three stages: we defined the importance of fourteen attributes relevant to a car clinic based on the perceptions of Marketing and Design experts; then we defined the efficacy of five virtual stimuli based on the perceptions of Product Development and Virtual Reality experts; and we used a cost factor to calculate the efficiency of the five virtual stimuli in relation to the physical. The Marketing and Design experts identified a new attribute, Scope; eleven of the fifteen attributes were rated as Important or Very Important, while four were removed from the model due to being considered irrelevant. According to our performance evaluation model, virtual stimuli have the same efficacy as physical stimuli. However, when cost is considered, virtual stimuli outperform physical stimuli, particularly virtual stimuli with glasses. We conclude that virtual stimuli have the potential to reduce the cost and time required to develop new stimuli in car clinics, but with concerns related to hardware, software, and other definitions.


Introduction
Original equipment manufacturers conduct surveys on customers' preferences in order to obtain insight into their propensity for future purchases. These surveys, called car clinics, adopt standard market research methods to systematically measure the subjective opinion of selected participants on new vehicles [1] and provide in-depth feedback and insights from customers to Marketing, Engineering, and Design areas [2].
A vehicle's size and design are significant features guiding customer product desires and decisions about their acquisition. Thus, car clinics are typically performed through static full-scale physical models of the final product as a Visual stimulus. A survey includes a variety of stimuli, with the new vehicle being compared to four to eight competing models. All stimuli are normally arranged side by side in a secure showroom, prohibiting unauthorized persons from accessing the stimuli. Potential and recent segment clients are prompted to evaluate the stimuli by providing impressions and ratings on style and design. To avoid research bias, respondents are not given information about the brand, vehicle names, or models.
Running a car clinic is a challenging operation since the production of such stimuli requires specialized manufacturing methods and a significant amount of effort from skilled workers. These operations are expensive and need careful planning throughout the automotive product development process. The logistical effort is time-consuming and costly, since a highly confidential prototype must be manufactured and transported to the test setting. Competitor vehicles, as well as large test facilities with professional lighting, must be leased. Everything must meet the high safety requirements of the manufacturers in order to safeguard the prototypes on display [3,4].
Meanwhile, the automotive industry is progressively adopting virtual reality (VR), which is a "computer-generated digital environment that can be experienced and interacted with as if it were real" [5]. The applications include vehicle design [6,7], assembly training [8], planning and implementing new workstations [9], and logistics optimization [10], as well as automotive market research, with conventional surveys shifting from a physical to a virtual format [1]. However, the automotive industry is predominantly using virtual reality with customers after product conception and does not generate input data for product creation [10].
Virtual reality has the potential to enhance car clinics by lowering costs, improving interaction flexibility, bringing customers nearer, and avoiding the shipping of physical stimuli [3]. In terms of cost savings, because the time required to develop a virtual prototype is minimal and virtual prototypes can be generated earlier than physical stimuli [11], development schedules can be improved and manufacturers can receive early feedback on different design options before a real prototype is even produced [4], thus improving the overall review process.
The use of virtual reality-based stimuli increases the flexibility of interactions by enabling trials that would be impossible to conduct in a physical study, such as experiments in different scenarios [12] and the examination of vehicle variations (different colors, wheel rims, interior layouts, and functionalities) within a single stimulus and clinic session.
Customers' proximity is also increased by virtual car clinics since they no longer have to travel to attend the surveys [8]. As a result, data collection might be greatly simplified, with car clinics taking place simultaneously in multiple locations and having significantly fewer requirements in terms of test facility space and setup work [4]. Furthermore, unlike confidential physical stimuli, which are difficult to handle and transport from a manufacturing unit to a display room [8,11], virtual reality-based stimuli do not need relocation or storage space. Virtual stimuli also eliminate the risk of customers damaging physical prototypes during car clinics, which is a common occurrence that jeopardizes the survey.
Despite all of its potential benefits, virtual reality technology still has limitations [1,13]. One key issue with the use of virtual reality in automotive market research, for example, is graphic quality, since the clinic's goal is to evaluate consumer style acceptability using stimuli as visually realistic as a production vehicle [3,14,15]. Color and Texture issues may affect customers' perceptions [8], so the restricted Field of View (FOV) and resolution of existing VR devices are critical features that limit the user's sense of presence and may have an impact on the performance of the car clinic. Reflecting surfaces in VR must be given special attention in order to be realistic; otherwise, there will be inconsistencies that do not provide the visual experience of a vehicle in front of the user and may also alter the results of the car clinic [3,4]. Another issue is a lack of Depth Perception, which is most noticeable at close distances, with users complaining about the difficulty of one eye to focus as well as the other [16].
A critical challenge is to provide Interaction and Manipulation with the product. Customers must be able to interact with virtual stimuli in the same manner that they would with physical stimuli in order to be engaged in an immersive survey. Aside from the visual sense, a lack of haptic feedback might have a detrimental influence on the virtual analysis [3,4,15,16]. Other senses, such as auditory feedback and, in rare cases, olfactory simulations, are needed for assessing elements such as air quality inside the vehicle [3].
Furthermore, most virtual reality equipment restricts customer movement since few devices provide a lightweight wireless function that enables users to wander freely as they would in a physical survey [17]. Other challenges include motion tracking [3,15,16], lack of VR knowledge [11], high software and hardware costs [11,17], the need for a dedicated facility to set the VR equipment sensors [17], intuitiveness [18], cybersickness [19], and cybersecurity vulnerability [17].
As a result, although virtual reality may already be employed in car clinics by the automotive industry, the benefits and constraints of the technology must be properly accounted for, leaving an open research question: how well do virtual stimuli perform versus physical stimuli? Ref. [20] found a high correlation between physical and virtual prototypes results; however, the research was carried out in 2015. Since then, virtual reality devices have improved considerably, with the release of technologically mature headsets in 2016 representing a "very big breakthrough" for VR applications [11,14,21]. As a result, there is still a knowledge gap on this topic.
The worldwide virtual reality market is projected to increase from USD 6.30 billion in 2021 to USD 84.09 billion by 2028 [22]. According to other reports, the surge might be considerably bigger, considering that the COVID-19 epidemic boosted the usage of virtual reality [23]. Given the pressure to improve automotive product definition accuracy and reduce time to market, practitioners and researchers may benefit from a better understanding of the performance of virtual reality-based stimuli as they develop innovative immersive car clinics to reduce costs and shorten the cycle time to design new products. Thus, this study aims to propose a model to evaluate the efficiency of virtual reality-based stimuli for immersive car clinics.
This paper is organized as follows. Section 1 presents related works; Section 2 describes materials and methods adopted; Section 3 presents the evaluation model and analyses the observed results. Section 4 presents our final considerations and further research recommended.

Materials and Methods
Our performance evaluation model considers the important attributes required in the car clinic to be delivered by the stimuli and weights them according to their level of importance. The model uses different virtual stimuli based on the feasibility and knowledge of auto industry applicability and in line with the car clinic objectives. Every stimulus has its performance measured against each attribute, and the stimuli costs are used to compare their efficiency. Figure 1 is a schematic of the proposed performance evaluation model. The performance evaluation model multiplies the Stimulus performance factor and Attribute Importance and divides them by Stimulus Cost to identify the cost-efficiency of each stimulus. The comparison is performed by Equations (1) and (2), where Attribute Importance is the level of importance of each attribute for a car clinic, Stimuli Efficacy is the efficacy of the stimuli to deliver the related attribute and Stimuli Cost is the cost for the stimuli construction.
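The calculation described above can be illustrated with a minimal Python sketch. It is not a reproduction of Equations (1) and (2): it assumes that the per-attribute products of Attribute Importance and Stimuli Efficacy are aggregated by a plain sum before being divided by Stimuli Cost, and the function name, attribute weights, efficacy scores, and cost factors are all hypothetical.

```python
def stimulus_efficiency(importance, efficacy, cost_factor):
    """Cost-efficiency of one stimulus.

    importance:  dict attribute -> Attribute Importance weight
    efficacy:    dict attribute -> Stimuli Efficacy of this stimulus
    cost_factor: Stimuli Cost factor (must be positive)

    Assumption: attribute products are aggregated by a plain sum.
    """
    if cost_factor <= 0:
        raise ValueError("cost_factor must be positive")
    weighted = sum(importance[a] * efficacy[a] for a in importance)
    return weighted / cost_factor


# Hypothetical illustration values, not data from the study.
importance = {"Visual Quality": 5, "Comfort": 4}
physical = stimulus_efficiency(
    importance, {"Visual Quality": 1.0, "Comfort": 1.0}, cost_factor=10.0
)
virtual = stimulus_efficiency(
    importance, {"Visual Quality": 1.0, "Comfort": 0.5}, cost_factor=1.0
)
```

In this made-up case the virtual stimulus delivers less on Comfort but is ten times cheaper, so its efficiency is higher than that of the physical stimulus.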

Attribute Identification
Sixteen attributes can impact a virtual car clinic [3]. From these attributes, we disregarded two: the cost attribute, because in this model cost is considered a process input and its importance is measured directly through the stimulus cost factor; and the transportation avoidance attribute, because it is part of the clinic cost, which was not considered in the performance evaluation model. The fourteen attributes considered in this paper are described as follows:

1. Interaction and Manipulation: Customers' capacity to interact with stimuli, such as walking around the vehicle, opening the side door/liftgate, sitting inside the vehicle, and so on.
2. Visual-Spatial: The capacity of a customer to perceive the stimuli on a 1:1 scale.
3. Visual Quality: Stimulus visual similarity with the final vehicle shape and design.
4. Intuitiveness: How intuitively the customer can interact with the stimulus or use VR equipment during the clinic.
5. Data Security: Data and information security before, during, and after the clinic, such as the potential for non-authorized personnel to take photographs of the stimulus, gain access to the stimulus, and so on.
6. Comfort: Physical sensations of the customer during the interview (e.g., nausea from looking too long at the stimulus or screen) and difficulty carrying or manipulating clinic-required equipment (e.g., heavy virtual reality vests, uncomfortable headsets).
7. Depth Perception: The ability to perceive the stimulus's three-dimensional volume and spatial layout.
8. Haptic: Perception of being able to grasp or touch surfaces or objects in the stimulus. For example, touching the steering column, reaching operational switches/buttons, and so on.
9. Motion: The perception of customer movement in relation to the stimulus while one of them is moving. For example, perception of customer movement while walking around the stimulus, and so forth.
10. Movement: Customers' perception of their position varying in response to the stimulus or a part of it.
11. Color and Texture: The Color and Texture of the stimulus are similar to those of a real car.
12. Sound: The audible feedback when moving, knocking on, or otherwise handling the stimulus. For instance, a door closing sound, a switch "click" sound when triggered, and so on.
13. Flexibility: The possibility of researching the stimulus with various series, content, colors and textures, and so on.
14. Location: The clinic's proximity to the interviewee's house (to avoid interviewee travel, etc.).

Stimuli Definition
For our performance evaluation model, we proposed six types of automotive stimuli. These types were defined from the authors' knowledge of virtual models in the automotive industry, their feasibility of application in a car clinic, and their different coverage of the attributes identified in the previous section. Since the performance evaluation model aims to identify the most efficient stimulus, we considered models delivering different performance levels for the attributes. Table 1 describes the six stimuli hardware concepts. The baseline is the stimulus most commonly used by the automotive industry today, a hard Physical stimulus on a 1:1 scale; the other five stimuli use VR to some degree.

Attribute Importance
We selected 30 experts from Marketing, Design and similar areas with more than two years of experience in car clinics (Marketing and Design Group), through professional social networks (e.g., LinkedIn and bebee), distribution lists of research companies (e.g., SurveyMonkey), professional publications and recommendations from work peers. The research adopted a mixed methods approach, integrating both qualitative and quantitative methods [24] to define the attributes' importance, combining a deeper understanding of the attributes' importance with statistical techniques applied to the results of interviews conducted with experts.
The experts were individually interviewed. The interview covered their personal and professional profiles, Likert scale questions on the importance level of the 14 car clinic attributes, and an open question to identify any other relevant attributes not previously identified in the literature. It also included Likert scale questions on the experts' perceptions of how well the six stimuli defined in Table 1 deliver the 14 attributes classified in the previous step, and an open field to capture interactions, explanations of assessments, and any other relevant information that could contribute to the research.
The data analysis used a mixed approach: qualitative information was collected to determine whether any further attributes should be included in the investigation, and quantitative data were examined using statistical methods to determine the level of importance of each attribute.
The Likert scale used to identify Attribute Importance was converted into a numeric scale from 1 to 5, on which 3 corresponds to Moderate, 4 to Important, and 5 to Very Important.
Statistical tools were used to analyze the attributes' importance. Because the data come from a Likert procedure, the median and coefficient of variation were the main statistics used in this study. We decided to test the equality of medians rather than the equality of means because medians are less sensitive to the presence of outliers than means. Since the data are continuous and non-normally distributed and the study is based on two or more median comparisons, the hypothesis test used was Mood's Median Test. A cluster analysis was performed to confirm whether the attributes could be grouped at the same level of importance. A boxplot was used to present the comparison between the result of each stimulus and the Physical model for all attributes. An outlier is any observation that lies at least one and a half times the interquartile range from the edge of the box.
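The descriptive statistics named above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the ratings are hypothetical, the quartiles use the "inclusive" interpolation of `statistics.quantiles` (Minitab may use a different quartile convention), the coefficient of variation uses the sample standard deviation, and points are flagged only when they fall strictly beyond the conventional 1.5 x IQR fences.

```python
import statistics


def describe(scores):
    """Median, coefficient of variation, IQR and boxplot outliers."""
    mean = statistics.mean(scores)
    cv = statistics.stdev(scores) / mean  # sample standard deviation / mean
    # Quartile convention is an assumption; other tools may interpolate differently.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [s for s in scores if s < low_fence or s > high_fence]
    return {
        "median": statistics.median(scores),
        "cv": cv,
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical ratings of one attribute on the 1-5 importance scale.
ratings = [5, 5, 5, 4, 5, 5, 4, 5, 2, 5]
summary = describe(ratings)
```

For these made-up ratings the single score of 2 is the only value outside the fences, while the median stays at the top of the scale.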

Stimuli Efficacy
We selected 53 experts from the automotive product development area who apply VR, have knowledge of comparative analyses of virtual and physical tools, or have been involved in at least one comparative survey of virtual and physical tools for stimuli evaluation (Product Development and VR Group).
We adopted a quantitative approach to define the efficacy of five virtual stimuli concerning the attributes [24], applying statistical techniques to the results of questionnaires answered by experts. The Product Development and VR Group defined the stimuli's performance through an individual electronic questionnaire sent to the participants. The questionnaire comprised questions on their personal and professional profiles and Likert scale questions on their perceptions of the efficacy level of five VR stimuli in delivering the attributes, compared to a Physical stimulus. Stimuli are defined in Table 1 and the attribute definition was based on the Marketing and Design Group's selection in the previous step.
The Likert answers collected from the Product Development and VR Group questionnaires were converted into a numerical scale from 2 to 0.1, where 2 is Much Better, 1.5 is Better, 1 is Same, 0.5 is Worse, and 0.1 is Much Worse than the Physical stimulus. We calculated the median for comparison purposes. Next, a series of hypothesis tests using the median and p-value was performed to identify whether any virtual reality stimuli performed similarly to one another or to the Physical stimulus. The p-value approach and hypothesis definition were similar to those used for the Attribute Importance, with a significance level of 0.10.
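The conversion described above can be sketched as a simple lookup followed by a median. The numeric scale is the one stated in the text; the function name and the sample answers are hypothetical.

```python
import statistics

# Scale stated in the text: efficacy relative to the Physical stimulus.
EFFICACY_SCALE = {
    "Much Better": 2.0,
    "Better": 1.5,
    "Same": 1.0,
    "Worse": 0.5,
    "Much Worse": 0.1,
}


def median_efficacy(answers):
    """Convert Likert answers to numbers and return their median."""
    return statistics.median([EFFICACY_SCALE[a] for a in answers])


# Hypothetical answers for one stimulus and one attribute.
answers = ["Same", "Better", "Same", "Worse", "Much Better"]
score = median_efficacy(answers)
```

A median of 1 on this scale means that the experts, taken together, rate the virtual stimulus the same as the Physical stimulus for that attribute.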

Stimuli Cost
The proposed performance evaluation model used data from two Brazilian companies with a tradition of producing physical stimuli for car clinics, show car events or market research. The cost data were based on the production of two hard stimuli with an accurate external and internal design concept and limited functionality. The physical hardware cost of the Hybrid reality stimulus was based on the purchase of a similar stimulus by a Brazilian original equipment manufacturer in the first quarter of 2021; it is composed of seat buck structures built from standard aluminum profiles that simulate the front and rear seat rows and the instrument panel.
The virtual stimulus cost was based on the hours spent to build a virtual model with the same Computer-aided Design (CAD) data used in the base model stimulus. These hours were then converted into cost using the average cost per hour of three traditional service providers for virtual model construction.
This study considered only the stimulus construction cost. Indirect costs of performing automotive marketing research, such as venue rental, stimulus transportation, and so forth, were not considered in this analysis. The hardware required for the Hybrid stimulus was also considered, using data from a supplier purchase. The Stimuli Cost factor was defined as a direct comparison between the six stimuli: every stimulus cost was divided by the lowest stimulus cost, providing a scale of value.
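The Stimuli Cost factor can be sketched as a normalization by the cheapest stimulus. The function name and cost figures below are hypothetical and are not the costs collected in the study.

```python
def cost_factors(costs):
    """Divide every stimulus cost by the lowest stimulus cost."""
    lowest = min(costs.values())
    if lowest <= 0:
        raise ValueError("costs must be positive")
    return {name: cost / lowest for name, cost in costs.items()}


# Hypothetical construction costs in arbitrary currency units.
costs = {"Physical": 200_000.0, "Hybrid": 50_000.0, "Visual": 10_000.0}
factors = cost_factors(costs)
```

The cheapest stimulus always receives a factor of 1, and every other factor states how many times more expensive that stimulus is.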

Results and Discussion
The combination of each Attribute Importance and stimuli performance, as well as each Stimuli Cost, allowed us to measure stimuli efficiency. In this section we present the outcomes of the interviews and questionnaires, their data statistical analysis, and the values of the Attribute Importance and Stimuli Efficacy as well as a discussion of these results. We also discuss the costs of stimuli construction as well as the measurement of each stimulus's efficiency.

Hypothesis Definition
Beyond comparing the performance of the VR stimuli against a Physical stimulus, we also want to compare the virtual reality stimuli among themselves and to compare the importance of the attributes. The p-value is the probability, under the assumed probability distribution, that a statistical measure such as the median would be at least as extreme as the observed result. Even though the industry typically uses a significance level of 0.05 for hypothesis acceptance, some authors have raised questions about the real error implied by this threshold [25]. This study therefore used a significance level of 0.10. Because a higher significance level makes it easier to reject the hypothesis of equal medians, it is more stringent than the industry standard when concluding that stimuli perform alike.
The hypothesis analyses were performed with the Minitab software. The hypotheses were defined as follows:

Hypothesis 0 (H0). All medians are equal.

Hypothesis 1 (H1). At least one median is different.

Thus, whenever the p-value is greater than the significance level of 0.10, one fails to reject the null hypothesis, and the differences between the medians are not statistically significant.
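The decision rule can be illustrated with a small, self-contained sketch of Mood's median test. It is not the Minitab procedure used in the study: it is restricted to two groups (one degree of freedom, so the chi-square p-value reduces to a complementary error function), counts values equal to the pooled median as "not above", and applies no continuity correction. The function names and sample data are hypothetical.

```python
import math
import statistics

ALPHA = 0.10  # significance level stated in the text


def moods_median_test_two_groups(x, y):
    """Mood's median test for two samples. Returns (chi2, p_value)."""
    grand_median = statistics.median(list(x) + list(y))
    a = sum(v > grand_median for v in x)  # x above the pooled median
    b = len(x) - a                        # x at or below it
    c = sum(v > grand_median for v in y)  # y above the pooled median
    d = len(y) - c                        # y at or below it
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (e.g. every value ties with the median).
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def medians_differ(x, y, alpha=ALPHA):
    """Reject H0 (equal medians) when the p-value does not exceed alpha."""
    _, p_value = moods_median_test_two_groups(x, y)
    return p_value <= alpha


# Hypothetical samples: clearly separated groups versus identical groups.
chi2_sep, p_sep = moods_median_test_two_groups([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
same_groups_differ = medians_differ([1, 2, 3, 4], [1, 2, 3, 4])
```

With more than two groups the same 2 x k table of counts is used, but the p-value requires the chi-square distribution with k - 1 degrees of freedom, which a statistics package would normally supply.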

Attribute Importance Results
The experts' professional experience varied from two to 31 years in the Marketing or Design field, and most had more than 15 years of experience. All had experience in automotive market research, with the majority having worked on more than ten research projects. 80% had participated in more than five clinics and 37% in more than 20 clinics (Figure 2). All the experts demonstrated extensive knowledge of research with Physical stimuli and some knowledge of VR technology, but limited experience with virtual reality applications using haptic sensors or acoustics. 87% of the interviewees had used VR in some departments of their companies, but only 17% confirmed some use of VR in car clinics. Three experts explained that virtual reality utilization is still in the trial phase. Only two described the use of a virtual application in automotive research, in which a pre-established 2D animation stimulus video was shown on a big screen while the moderator collected information from the marketing research participant, with no direct interaction between the stimuli and the interviewee.
The answers pointed out that research scope restriction is an important attribute not identified previously in the literature. The Scope attribute is the ability of the moderator to conduct the marketing research so that participants focus on the research objective and do not lose it during the research. The experts mentioned that the stimulus and the technology used to build it could influence this attribute. Examples of loss of focus were that a low-quality stimulus might cause parallel conversations among the participants, or that participants not used to VR might become distracted by the new technology to which they are exposed. The new Scope attribute increased the number of attributes being investigated to fifteen.
Table 2 shows the descriptive statistics of the Marketing and Design Group interviews, sorted from the largest to the smallest mean. First, we highlight that the Visual-Spatial, Data Security, Visual Quality and Depth Perception attributes presented a high mean and median with a low coefficient of variation, which may indicate unanimity among the interviewees in perceiving these attributes as very important. For instance, Visual-Spatial and Data Security have an interquartile range of zero, and Data Security has a very high kurtosis of 10.46, indicating sharply peaked data. On the other hand, the Motion, Location and Sound distributions are right skewed and presented the smallest means and medians with the highest coefficients of variation. This may indicate that they are not considered important for car clinics, although the experts were not unanimous on this point.
The experts' perceptions corroborate the literature review, since all the attributes identified in the literature had a median classification from Moderate to Very Important (Figure 3). The Scope attribute, revealed by the interviews, scored a Very Important classification, indicating that this attribute should be considered along with those raised by the state of the art. Six attributes have their median classified as Very Important: Visual-Spatial, Data Security, Visual Quality, Depth Perception, Interaction and Manipulation, and Scope. Five attributes had a median classified as Important: Movement, Comfort, Color and Texture, Flexibility and Intuitiveness.
Concerning the six attributes classified as Very Important, three relate directly to the visual performance of the stimuli: Visual-Spatial, Depth Perception, and Visual Quality. They also have a low coefficient of variation compared with the other attributes, demonstrating a consistent perception among the experts interviewed. The experts perceive visual impact as important in purchasing choice, which corroborates automotive industry research. Vehicle styling/exterior is among the top six features of purchasing choices and its importance grew between 2013 and 2015 [26]. Potential automotive buyers in the United Kingdom identified Style/Appearance as the sixth most important factor in a purchasing decision [27]. Exterior styling was identified as one of the most important factors in the consumer's choice of vehicle after the evaluation of over 20,000 rating values in 2007 vehicle quality survey data provided by J.D. Power and Associates [28]. Lastly, another study also demonstrated that perceived interior and exterior design is the most salient dimension of quality driving consumer choice [29].
The Data Security attribute, perceived as Very Important, attracted several comments on the importance of keeping the clinic confidential, since the timing of product exposure can be crucial at a product launch. Product development is treated as a secret commodity in the automotive industry and is recognized as a differential among competitors. The industry uses all means necessary to prevent undesired leaks of new product information, enforcing highly restricted physical and virtual access to new product development departments. Prototype camouflage is standard industry practice where vehicles cannot be contained in a private area.
The new Scope attribute, found during the interview process, was classified as Very Important but had the highest coefficient of variation among the Very Important attributes, demonstrating some spread among the experts' perceptions. The experts highlighted the possibility of the stimulus influencing whether participants stay focused on clinic expectations. Regarding virtual reality utilization, there is a concern about participants not being familiar with VR, in which case the technology itself might become the focus of participants and draw attention away from the clinic scope.
Four attributes had a score lower than Important. The Location, Sound, and Motion attributes have the lowest classification, with a median score of three (Moderate), and the Haptic attribute has a score of three and a half (between Moderate and Important). These four attributes also had a high coefficient of variation compared with the other attributes.
Diverging from studies that argued for the importance of travel reduction through the use of VR [8], the Location attribute received a low score, which is explained by the experts' experience with participants traveling, even from abroad, to attend the car clinics. The cost related to travel is not significant compared to the total cost of the clinic and was not considered. The spread is explained by the fact that some experts took into consideration the effort of organizing travel or its related expenses, while others did not. Some experts had experienced participants being tired after long journeys, arriving too close to the clinic time, or having problems with hotels during their stays. The interviewees mentioned that finding participants with the required profile can be challenging, so participant travel has much lower importance in the clinic process.
Although some studies suggest that hearing is the second most important human sense, right behind sight [30], the Sound attribute received a low score, which may be explained by the fact that in a design clinic the auditory sense is far less important than the visual or haptic sense. As one of the five human sensory systems, Sound is an attribute that may improve VR participant immersion, for example, in manufacturing feasibility or machinery movement studies [16]. However, it is a low-impact attribute for design definition, and a Physical stimulus has limits in delivering the same level of performance as a final vehicle. For example, the experts commented that the door closing sound is much louder in the Physical stimulus than in a final vehicle. While some experts believe that sound has low importance in this type of research, others classify it as of medium importance, stating that it could be one of the senses that improve the interviewee experience and bring it closer to the final vehicle performance.
We also applied a cluster analysis to the attributes to identify their importance. The attributes were grouped into four to five clusters, with the Intuitiveness, Haptic, Motion and Sound attributes grouped in one cluster and Location as an isolated attribute. The Haptic, Motion, Sound and Location attributes have low importance for the car clinic, since they have a median below the Important classification and a high coefficient of variation, and they were discarded from the model analysis.
In addition, a k-means clustering analysis of the other eleven attributes (Important to Very Important), with the number of clusters chosen by the silhouette method, grouped them into two clusters. The first cluster comprises the Visual-Spatial, Data Security, Visual Quality, Depth Perception, Scope, Interaction and Manipulation, and Movement attributes, while the second comprises the Comfort, Color and Texture, Flexibility, and Intuitiveness attributes. Another clustering analysis of the eleven attributes using the PAM (partitioning around medoids) method with two clusters presented similar results to the k-means method, except for the Movement attribute, which was grouped with the Comfort, Color and Texture, Flexibility, and Intuitiveness attributes.
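The k-means and silhouette steps can be illustrated with a dependency-free sketch. Several assumptions apply: the features used for clustering in the study are not stated in this section, so a single hypothetical score per attribute is clustered; the initial centroids are spread evenly between the minimum and maximum for reproducibility, whereas libraries usually seed randomly; and the PAM comparison is not shown. In practice the silhouette score would be compared across several candidate numbers of clusters.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on 1-D data with deterministic initial centroids."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    # Assumption: evenly spaced starting centroids instead of random seeding.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [
            abs(v - w)
            for j, (w, m) in enumerate(zip(values, labels))
            if m == lab and j != i
        ]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(abs(v - w) for w, m in zip(values, labels) if m == c)
            / labels.count(c)
            for c in clusters
            if c != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical importance scores, not the values reported in Table 2.
scores = [4.9, 4.8, 4.8, 4.7, 4.0, 3.9, 3.9, 3.8]
labels = kmeans_1d(scores, k=2)
quality = silhouette(scores, labels)
```

A silhouette value close to 1 indicates compact, well-separated clusters, which is the criterion used to prefer one number of clusters over another.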
Based on the clusterization, median, median confidence interval, boxplot interpretation (

Stimuli Efficacy-Marketing and Design Group
Cambridge (2021) defines efficacy as the ability to produce the intended results or a method of achieving something. The output of this process can therefore be interpreted as the efficacy of the stimulus. The concept of Importance-Performance Analysis was introduced by Ref. [31], in which an importance is defined for each attribute, while the performance of each attribute is defined by how it is perceived against customer expectations. Based on the predetermined attributes, two dimensions are classified: (I) the importance of each attribute and (II) judgments of its performance [32]. The delivery of the desired clinic information, or output of this process, has two combined factors that must be considered. The first is the ability of every stimulus to deliver the expected attribute, and the second is the level of importance of each attribute for the desired information outcome.
At this step, 26 of the 30 Marketing and Design experts rated their perception of the efficacy of six stimuli in delivering the 15 attributes. Even though these data were not used to build the performance evaluation model, they shed some light on the automotive industry. Their Likert answers were converted into a numerical scale (5-Excellent, 4-Good, 3-Medium, 2-Bad, and 1-Very Bad), and the median was used to compare the experts' perception of stimulus performance (Figure 5). The experts perceive that the Physical stimulus has the best overall performance among all the stimuli. The Hybrid stimulus has the best performance among the VR stimuli, followed by virtual reality with haptic sensors (VR + Gl and VR + V), Vis + Ac, and last, the Visual stimulus. Even though the Hybrid stimulus has the best performance among the virtual stimuli, the other virtual stimuli perform better in the Security, Sound, and Location attributes. The Location and Security scores relate to the need to transport physical properties in the Hybrid stimulus, which the experts identified as having some level of difficulty or risk. In the Sound attribute, the Hybrid stimulus performs worse only than VR + Ac, the only stimulus identified as having a performance similar to that of the Physical stimulus.
VR + Gl and VR + V have very similar performances, with the sensory vest scoring better in the Motion attribute. A similar situation can be seen between Visual and Vis + Ac, except for the Sound attribute. All VR stimuli performed better than the Physical stimulus in the Location and Interactions attributes. The experts' perception of the Physical stimulus in these attributes was that it is very difficult to transport, due to its security requirements, size, and weight (more than 1 ton), and that it offers very limited flexibility, since it is mainly built in only one configuration and provides no possibility of changing color, texture, or content. Concerning the Data Security attribute, the experts' perception is that virtual stimuli have an advantage over the physical ones mainly during transportation, where considerable effort is required to prevent unauthorized people from seeing the physical stimuli. Access control by the research staff was also raised as being more efficient in virtual reality.
The Location attribute performance was also affected by the ease of transporting virtual stimuli compared with physical stimuli. In the Flexibility attribute, VR performed better than the physical stimulus, since the electronic stimuli can be easily programmed to present different vehicle options. In contrast, the physical property, composed of a single stimulus, requires other means of analyzing different property compositions, such as 2D illustrations or small samples.
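The Likert-to-median conversion described above can be sketched as follows. This is a minimal illustration only: the numerical scale is the one stated in the text, while the expert answers and the helper name `median_score` are hypothetical and are not study data.

```python
from statistics import median

# Numerical scale stated in the text: 5-Excellent ... 1-Very Bad.
LIKERT_SCALE = {"Excellent": 5, "Good": 4, "Medium": 3, "Bad": 2, "Very Bad": 1}


def median_score(answers):
    """Convert Likert labels to numbers and return their median."""
    return median(LIKERT_SCALE[a] for a in answers)


# Hypothetical answers from five experts for one attribute (not study data).
answers_by_stimulus = {
    "Physical": ["Excellent", "Good", "Excellent", "Good", "Excellent"],
    "Hybrid": ["Good", "Good", "Medium", "Excellent", "Good"],
    "Visual": ["Medium", "Bad", "Medium", "Good", "Medium"],
}

# One median per stimulus, which is the quantity compared across stimuli.
medians = {name: median_score(ans) for name, ans in answers_by_stimulus.items()}
print(medians)
```

The median is used rather than the mean because Likert answers are ordered categories, so the middle answer is more meaningful than an arithmetic average.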

Stimuli Efficacy-Product Development and VR Group
Questionnaires were administered to 53 experts whose experience ranged from 2 to 30 years in product development or VR research. The average field experience was 15 years, and 49 of the experts worked directly with OEMs. Forty-seven percent of the experts had experience in research comparing virtual with physical environments, and 96% currently work in a company that adopts virtual reality (Figure 6). The majority of the experts classified themselves as having medium to high experience in Hybrid Reality and VR Visual, but medium to low experience in virtual reality with acoustic or haptic features (Vis + Ac, Vis + Gl, and Vis + V). Since the data are continuous and non-normally distributed and the study is based on comparisons of two or more medians, Mood's median test was used for hypothesis testing. Medians and distribution plots were compared for every stimulus across the different attributes. This study focused on the eleven attributes that Ref. [3] found to be strongly influential on the model. Table 3 shows the descriptive statistics of the Product Development and VR Group interviews. Location is the only attribute that was evaluated as somewhat better or much better than the physical prototype by three quarters of the experts (for all stimuli except the Hybrid one). Data Security and Flexibility were also considered similar to or better than the physical prototype for all virtual stimuli by three quarters of the experts (including the Hybrid prototype). Sixty-four percent of the experts considered Location in the virtual stimuli much better than in the physical prototype (high kurtosis number). On the other hand, Data Security and Sound were evaluated as somewhat worse or much worse than the physical prototype for all virtual stimuli by more than half of the experts, except for Data Security with the Hybrid stimulus and Sound with the Vis + Ac stimulus.
Three quarters of the experts are also concerned about the Interaction and Manipulation attribute of the Visual and Vis + Ac stimuli.
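To make the hypothesis test named above concrete, the following is a minimal pure-Python sketch of Mood's median test. It is an illustration under stated assumptions, not the authors' analysis script: values equal to the grand median are counted as "not above" (one common convention), no continuity correction is applied, and the function names `chi2_sf` and `moods_median_test` as well as the ratings are hypothetical.

```python
import math
from statistics import median


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r < df/2 of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def moods_median_test(*groups):
    """Return (chi-square statistic, p-value) for the hypothesis of equal medians."""
    grand_median = median(v for g in groups for v in g)
    # Assumption: ties with the grand median are counted as "not above".
    above = [sum(v > grand_median for v in g) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    n = sum(len(g) for g in groups)
    stat = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        if row_total == 0:
            continue  # empty row: contributes nothing to the statistic
        for g, observed in zip(groups, row):
            expected = row_total * len(g) / n
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(groups) - 1)


# Hypothetical ratings for two stimuli on one attribute (not study data).
visual = [3, 3, 4, 2, 3, 4, 3, 2]
vis_ac = [3, 4, 3, 2, 3, 3, 4, 2]
stat, p_value = moods_median_test(visual, vis_ac)
print(round(stat, 3), round(p_value, 3))
```

A p-value above the chosen threshold means the hypothesis of equal medians is not rejected. In practice a statistics library is preferable; SciPy, for example, provides `scipy.stats.median_test` and `scipy.stats.levene`.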

Stimuli Efficacy Results
Physical and Hybrid stimuli have the same median for the Interaction and Manipulation attribute, followed by the Vis + Gl and Vis + V stimuli, which share a lower median, and finally the Visual and Vis + Ac stimuli, which share the lowest median (Figure 7).
The Visual and Vis + Ac stimuli can be treated as statistically equivalent, given their Mood's p-value of 0.658 and Levene's p-value of 0.736, both far above the 0.10 significance level used in this study, so the hypothesis of similarity is not rejected. The Vis + Gl and Vis + V stimuli present a Mood's p-value of 0.844 and a Levene's p-value of 0.472, also very high. Experts perceive that the Hybrid stimulus, which partially uses physical properties, will perform very similarly to the Physical stimulus. The Interaction and Manipulation attribute is also affected by the haptic hardware: VR stimuli with haptic sensors (Vis + Gl and Vis + V) will perform similarly to each other and better than the Visual and Vis + Ac stimuli, which are predominantly based on visual and acoustic sensors. These results indicate that the Hybrid stimulus has a performance similar to that of the Physical stimulus. The Visual and Vis + Ac stimuli have a similar performance with a score of 0.5, while Vis + Gl and Vis + V also have a similar performance, but with a score of 0.8. Figure 8 demonstrates that all the stimuli have the same median for the Visual-Spatial attribute. A statistical analysis of all the virtual stimuli found a Mood's p-value of 0.680 and a Levene's p-value higher than 0.328, demonstrating similarity. This indicates that, based on the experts' perception, all the stimuli would have a similar performance in this attribute. The median for the Visual Quality attribute is statistically similar for all the virtual stimuli, with a Mood's p-value of 0.911 and a Levene's p-value higher than 0.343 (Figure 9). Hence, all the stimuli have the same performance, delivering the same level of Visual Quality as the Physical stimulus.
Regarding the Data Security attribute, Figure 11 shows statistically similar medians for all VR stimuli, with a Mood's p-value of 0.883 and a Levene's p-value of 0.618, and shows that the virtual stimuli perform better than the Physical stimulus in this attribute.
The better performance of the virtual stimuli in relation to the physical may be related to the difficulty of handling the Physical stimulus throughout the car clinic process, from construction and transportation up to the interviews. The experts' perception is that digital data are easier to control.
Figure 11. Security attribute. The black circle represents the median, the plus sign within a circle represents the mean, and an asterisk (*) represents an outlier. Identical outliers are represented by the corresponding number of asterisks, symmetrically offset in the graph.
The Comfort attribute (the customer feeling comfortable during the research) did not present statistically similar medians across all virtual reality stimuli, with a p-value of 0.017 (Figure 12). When the Hybrid stimulus is excluded, the remaining VR stimuli have statistically similar medians, with a Mood's p-value of 0.935 and a Levene's p-value of 0.510. The Hybrid stimulus has a median at the same level as the Physical stimulus, and every other virtual stimulus performs worse than the Hybrid stimulus.
The experts perceive that the Hybrid stimulus, having a partially physical property, can provide more comfort for the interviewee during the clinic. This is probably related to the perception that cybersickness in the virtual world is less severe when the stimuli provide some level of physical contact, such as that provided by the Hybrid stimulus.
Regarding the Depth Perception attribute, Figure 13 presents a Mood's p-value of 0.224 and a Levene's p-value of 0.132. A statistical analysis of the Hybrid, Vis + Gl, and Vis + V stimuli together provided a Mood's p-value of 0.499, demonstrating some statistical similarity. Further analysis identifies groups of stimuli with higher p-values: Visual and Vis + Ac together have a Mood's p-value of 0.697 and a Levene's p-value of 0.765, while Vis + Gl and Vis + V have Mood's and Levene's p-values of 1.000. Although some groups have higher Mood's and Levene's p-values than others, all the stimuli are considered equivalent in performance on this attribute based on the defined p-value threshold of 0.1.
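The decision rule applied to the Depth Perception values above can be sketched as a short check. The p-values are those reported in the text; the rule that both the Mood's and the Levene's p-value must exceed the 0.1 threshold follows the wording of this section, and the function name `similar` is a hypothetical label.

```python
ALPHA = 0.10  # p-value threshold stated in the text


def similar(mood_p, levene_p, alpha=ALPHA):
    """Treat stimuli as equivalent when neither test rejects similarity."""
    return mood_p > alpha and levene_p > alpha


# (Mood's p-value, Levene's p-value) reported for Depth Perception.
depth_perception = {
    "all stimuli": (0.224, 0.132),
    "Visual, Vis + Ac": (0.697, 0.765),
    "Vis + Gl, Vis + V": (1.000, 1.000),
}

verdicts = {group: similar(*p) for group, p in depth_perception.items()}
print(verdicts)
```

Every grouping passes the check, which is why the stimuli are assumed to perform equivalently on this attribute even though some subgroups show much higher p-values than the full set.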
Experts perceive that this attribute is positively impacted by haptic sensors or some physical contact (Hybrid) during the clinic. In addition, the auditory sense does not provide significant benefits over the Visual-only virtual stimulus. At the same time, the incorporation of gloves and vests with sensors presents a higher spread, demonstrating that the experts' perception is that this attribute may be influenced not only by visual characteristics.
Regarding the Movement attribute, Figure 14 shows a spread of classifications across the different virtual reality stimuli and presents statistically similar medians, with a Mood's p-value of 0.423 and a Levene's p-value higher than 0.306. Further analysis also identifies that the Visual, Vis + Ac, Vis + Gl, and Vis + V stimuli are statistically similar in Position Perception, with a Mood's p-value of 0.885 and a Levene's p-value higher than 0.786.
Based on expert perceptions, all the virtual stimuli performed similarly except for the Hybrid stimulus, whose performance is slightly better than that of the others and quite similar to that of the Physical stimulus. Thus, the visual sensor is predominant over the haptic and acoustic sensors for this attribute, so adding these sensors makes almost no difference to the interviewee's perception of their position relative to the stimuli. Since all the stimuli compared provided Mood's and Levene's p-values higher than the threshold of 0.1, all the stimuli are assumed to have the same performance in this model.
Figure 15 shows that the median of the Color and Texture attribute is statistically similar for all VR stimuli, with a Mood's p-value of 0.884 and a Levene's p-value higher than 0.659, which demonstrates that all the virtual stimuli have a performance similar to that of the physical stimuli. The expert perception of similarity between physical and virtual performance may be related to the limitations of physical properties concerning Color and Texture. These stimuli are built with Color and Texture similar to those of the vehicle, but in most cases these features are achieved by applying a film that simulates them, which is commonly more fragile than production parts and can be damaged if not correctly handled during the clinic. At the same time, virtual stimuli currently deliver very precise Color and Texture. It is important to note that the experts have limited knowledge of haptic sensitivity for attributes such as product texture.
Regarding the Flexibility attribute, Figure 16 demonstrates that the virtual stimuli have a higher median than the Physical stimulus, and an analysis of their p-values shows that they are statistically similar, with a Mood's p-value of 1.000 and a Levene's p-value higher than 0.712.
The experts' interpretation is that all the virtual stimuli perform very similarly among themselves and better than the Physical stimulus in this attribute. This perception is based on the possibility of quickly changing color, textures, content, and so forth in the virtual stimuli. A Physical stimulus is mainly built in a single version, and all the differentiation performed in the clinic is based on additional 2D images or small samples, which requires the interviewee to make some interpretations of the final product.
Finally, the Scope attribute has a Mood's p-value of 0.757 and a Levene's p-value higher than 0.033 for all virtual reality stimuli (Figure 17), which did not demonstrate similarity among all of them. Further investigation found a Mood's p-value of 0.996 and a Levene's p-value higher than 0.213 when grouping Visual, Vis + Ac, Vis + Gl, and Vis + V. Thus, except for the Hybrid stimulus, all the other stimuli performed equivalently.

Stimuli Efficacy-Final Considerations
The statistical analyses allowed us to compare the performance of every attribute against each stimulus. The Mood's and Levene's p-values confirm performance similarity based on the median, except in two cases. The attributes with inconclusive statistical data are Intuitiveness, with the Mood's p-value identifying the Hybrid stimulus as not statistically similar (even though it has the same median as the other VR stimuli), and Scope, which, despite a Mood's p-value of 0.757 for all the VR stimuli, obtained a low Levene's p-value when the Hybrid stimulus was grouped with the others. Since these attributes did not show statistical similarity, we leave the Hybrid stimulus with no performance definition for them. Figure 18 shows the values for each stimulus and the sum of the attribute values at the bottom. Virtual stimuli perform better than the physical in the Security and Flexibility attributes. On the other hand, all the virtual stimuli except for Hybrid performed worse than the Physical stimulus in the Interaction and Manipulation and Comfort attributes. The other attributes had similar performance between the Physical stimulus and the virtual stimuli. Overall, all the virtual stimuli had a very similar performance to the Physical stimulus. In the attributes whose data could not be statistically compared, which were left blank in the matrix, the Hybrid stimulus performed better than the Physical stimulus. This indicates that even though its performance is not statistically equal to that of the other virtual stimuli in the Intuitiveness and Scope attributes, it should not perform worse than them.
VR stimuli with haptic features such as gloves and the sensory vest have similar performances for all attributes. The same occurs with Visual and Vis + Ac. Based on the experts' perception, there is little need for acoustic features to be added on top of the Visual Virtual hardware. A similar conclusion applies to the haptic sensors, where haptic gloves can be preferred over the sensor vest. These additional features require higher investment and more programming hours and bring no significant performance improvement in terms of the major clinic attributes.
A similar analysis was performed on the assessment made by the Marketing and Design Group (Figure 19). Even though these data were not used to build the performance evaluation model, they provide further clarity about how the groups' different levels of VR knowledge relate to their perceptions. To obtain a comparison similar to that performed with the Product Development and VR Group data, a series of hypothesis comparisons using the median and p-value was performed with the Marketing and Design Group data to identify whether any virtual reality stimuli had a similar performance among themselves or with the Physical stimulus. The Physical stimulus median was used as a baseline, and the other stimulus scores were compared with it by taking the difference between the stimulus being assessed and the Physical stimulus and dividing the value by two, to obtain the same scale used by the Product Development and VR Group (Much Better, Better, Same, Worse, and Much Worse).
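The rescaling described above can be written as a one-line formula. The sketch below assumes that the five labels map to the integers 2 to -2, which the text does not state explicitly; the medians and the names `relative_score` and `RELATIVE_LABELS` are hypothetical.

```python
# Assumed numeric coding of the relative scale (not given in the text).
RELATIVE_LABELS = {2: "Much Better", 1: "Better", 0: "Same", -1: "Worse", -2: "Much Worse"}


def relative_score(stimulus_median, physical_median):
    """Difference to the Physical baseline, halved to map a 1-5 scale onto -2..2."""
    return (stimulus_median - physical_median) / 2


# Hypothetical medians on the 1-5 Likert scale (not study data).
physical = 5
examples = {"Hybrid": 5, "Vis + Gl": 3, "Visual": 1}

for name, value in examples.items():
    score = relative_score(value, physical)
    # Half-step scores such as 0.5 fall between two labels.
    print(name, score, RELATIVE_LABELS.get(score, "between categories"))
```

Because two medians on a 1 to 5 scale differ by at most 4, halving the difference always yields a value between -2 and 2.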
The same p-value approach and hypothesis definition used for the Product Development and VR Group were applied in this efficacy study, with a significance level of 0.10 for all evaluations.
Several similarities can be found between the two groups' perceptions. Virtual stimuli perform better than the physical in the Security and Flexibility attributes. The Comfort attribute in the virtual stimuli had one of the worst performances when compared with the Physical stimulus. VR stimuli with haptic features such as gloves and the sensory vest have similar performances for all attributes. The same occurs with Visual and Vis + Ac. On the other hand, the Marketing and Design Group classified Intuitiveness in the virtual stimuli as worse than in the Physical stimulus, which was not identified in the Product Development and VR Group.
Overall, all the virtual stimuli had a very similar performance, but all performed worse than the Physical stimulus. The Hybrid stimulus performance in the Depth Perception and Color and Texture attributes, as well as the Vis + Ac performance in the Interaction and Manipulation attribute, could not be statistically compared with the other stimuli and attributes and are left blank in the matrix. Something similar happened with the Depth Perception attribute, where no statistical comparison could be identified between the stimuli.
When the perceptions of the two groups are compared, the Marketing and Design Group rates the performance of virtual stimuli lower than the Product Development and VR Group does. The level of expertise in VR is one aspect to note, with the Product Development and VR Group having greater experience in this technology than the Marketing and Design Group. This may help to explain the disparities in the perception of VR performance.

Stimuli Cost Factor
The proposed performance evaluation model used cost data from companies that traditionally produce physical stimuli for car clinics, show car events, or market research, and from companies with expertise in developing virtual models.
The lowest cost was converted to a base factor, and the costs of the other stimuli were expressed relative to it, providing a value scale. Table 4 provides the six Stimuli Cost factors.
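One way to read the cost normalization above is sketched below, under the assumption that the cheapest stimulus receives a factor of 1 and every other cost is divided by it. The monetary values and the function name `cost_factors` are hypothetical and do not reproduce Table 4.

```python
def cost_factors(costs):
    """Express each construction cost relative to the lowest cost in the set."""
    lowest = min(costs.values())
    return {name: cost / lowest for name, cost in costs.items()}


# Hypothetical construction costs in arbitrary currency units (not Table 4).
costs = {
    "Visual": 10_000,
    "Vis + Gl": 12_000,
    "Hybrid": 150_000,
    "Physical": 1_200_000,
}

factors = cost_factors(costs)
print(factors)
```

With this convention the factors are unitless, so the comparison does not depend on the currency or year in which the costs were collected.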

Performance Evaluation Model
Efficiency can be defined as doing things right; that is, based on the available resources, something is performed in the most suitable way [33]. Efficiency in a process is commonly measured as the useful output divided by the process input. In the case of the stimuli, the output is the delivery of the desired clinic information, while the input can be defined as the resources provided to build the stimulus (Figure 20).
Four attributes (Haptic, Location, Motion, and Sound) have an evaluation lower than four (Important) and a high standard deviation. Because of their low importance and wide spread, these attributes are not relevant to this study and were not considered in the performance evaluation model. Since the VR Vis + Ac stimulus differs from the VR Visual stimulus only in the acoustic factor, and the auditory sense was removed from the attributes due to its low importance, this stimulus was also removed from the performance evaluation model. We also removed the stimulus with the sensor vest (Vis + V) from the model because its performance is the same as that of the stimulus with sensor gloves (Vis + Gl), which is cheaper to construct.
The performance evaluation model is divided into three parts. The first part (Figure 21) identifies the efficiency of each attribute for its related stimulus. Efficiency factors are reported, and the best results per stimulus are highlighted in blue. Blank cells correspond to attribute and stimulus combinations that presented no comparable statistical data. The second part calculates the overall stimulus efficiency, considering its performance against all the attributes. It shows the overall Stimulus Efficiency factor and compares this factor among all the stimuli, using a percentage comparison of the efficiency in two ways. The first percentage is related to the best stimulus performance: it divides the stimulus efficiency by the highest stimulus efficiency of the group. It uses color coding to group the stimulus performance into three classifications (1-performance degradation of no more than 10% relative to the best stimulus; 2-performance degradation between 10% and 20%; and 3-performance degradation higher than 20%).
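The comparison against the best stimulus and the three-level classification can be sketched as follows. The efficiency values are hypothetical, and how exactly 10% and 20% degradation are assigned is an assumption (here the boundary value falls in the better class), since the text does not specify it.

```python
def degradation_class(efficiency, best_efficiency):
    """Return 1, 2, or 3 according to the loss relative to the best stimulus."""
    degradation = 1 - efficiency / best_efficiency
    if degradation <= 0.10:
        return 1  # within 10% of the best stimulus
    if degradation <= 0.20:
        return 2  # between 10% and 20% below the best stimulus
    return 3  # more than 20% below the best stimulus


# Hypothetical overall efficiency factors (not the values of the model figures).
efficiency = {"Visual": 10.0, "Vis + Gl": 9.5, "Hybrid": 8.5, "Physical": 0.1}

best = max(efficiency.values())
percent_of_best = {name: 100 * value / best for name, value in efficiency.items()}
classes = {name: degradation_class(value, best) for name, value in efficiency.items()}
print(percent_of_best)
print(classes)
```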
The second comparison is related to the Physical stimulus performance, calculating the percentage of each stimulus against the Physical stimulus performance. This percentage compares only the rated attributes, so the Intuitiveness and Scope attributes are removed from the divisor when calculating the Hybrid stimulus. The third part is composed of a bar chart (Figure 22) and a spider chart (Figure 23), providing a visual summary of the performance of each stimulus and attribute.
The performance evaluation model identified that VR Visual had the best score in the analysis, but VR Vis + Gl had a very close performance. VR Visual also achieved the best score in ten of the eleven attributes investigated. Its performance is directly related to the stimulus construction cost, since the efficacy of all the stimuli was very similar and efficiency is measured as the efficacy factor divided by the cost factor. Figure 24 summarizes the results of our performance evaluation model. Among the VR stimuli, Hybrid performed the worst. The Physical stimulus has the lowest efficiency of all. The Stimulus Cost is the most important factor influencing the Physical stimulus, since its cost factor is more than 100 times that of the other stimuli. The cost of the stimulus also has an impact on VR Hybrid performance, since the physical buck is expensive in comparison with the other VR stimuli. Because the physical buck in the VR Hybrid may be reused, this analysis may change somewhat in the long term for applications with many clinics using the same physical buck. When selecting a stimulus for a future car clinic, the analysis should evaluate the clinic's main focus as well as the efficacy and efficiency of each stimulus, since each stimulus carries a different set of risks and opportunities.
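The relation between efficacy, cost, and the comparison with the Physical stimulus can be illustrated with a small worked sketch. It follows the definition given above (efficiency as the efficacy factor divided by the cost factor) and restricts the comparison to the attributes rated for the stimulus being assessed. All numbers, the three-attribute subset, and the function name `percent_of_physical` are hypothetical, and the efficacy values are taken as given without modeling how they were derived.

```python
def percent_of_physical(stimulus, physical, stimulus_cost, physical_cost):
    """Efficiency of a stimulus as a percentage of the Physical stimulus.

    Attributes with no rating for the stimulus (None) are dropped from both
    sides, so the divisor only contains attributes that could be compared.
    """
    rated = [attr for attr, value in stimulus.items() if value is not None]
    stimulus_efficiency = sum(stimulus[a] for a in rated) / stimulus_cost
    physical_efficiency = sum(physical[a] for a in rated) / physical_cost
    return 100 * stimulus_efficiency / physical_efficiency


# Hypothetical efficacy factors per attribute (not the values of the model).
physical = {"Flexibility": 1.0, "Comfort": 1.0, "Scope": 1.0}
hybrid = {"Flexibility": 1.5, "Comfort": 1.0, "Scope": None}  # Scope not rated

# Hypothetical cost factors: the Physical stimulus is far more expensive.
result = percent_of_physical(hybrid, physical, stimulus_cost=10, physical_cost=100)
print(round(result, 1))
```

In this made-up case the efficacy totals are close (2.5 versus 2.0 over the two rated attributes), so almost all of the difference in efficiency comes from the cost factor, which mirrors the argument made in the text.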

Conclusions
The automotive business has been under pressure to improve its time to market with assertive product definition. The use of VR in market research may be an opportunity to reduce cost and time in the automotive business.
The Marketing and Design experts identified a new attribute, Clinic Scope, and classified the degrees of importance of the fifteen attributes; eleven were rated between Important and Very Important, and four were discarded from the model due to their low importance factor. Location, Data Security, and the possibility of increased flexibility of the models are the main advantages of using virtual stimuli in car design clinics. On the other hand, the Data Security and Sound attributes were listed by the experts as the main disadvantages of using virtual models in car clinics.
The results of the performance evaluation model indicated that there are virtual stimuli with an efficacy similar to that of the physical stimuli. However, when the cost factor is considered, the efficiency of virtual stimuli is higher than that of the physical stimuli, especially the virtual stimulus with glasses, which had the best performance. On this basis, we conclude that virtual stimuli can be used in car clinics, providing cost and time reduction in the development of new vehicles; however, hardware, software, and other definitions must be considered. Our performance evaluation model has limitations, such as a wide dispersion of attribute values and stimuli evaluations; limited sample sizes; participants' low experience with VR stimuli with tactile and acoustic sensors; group evaluations based on perception and not experience; cost factors that covered only stimuli construction and not clinic execution (no transport of the stimuli, no clinic location renting, no travel costs, etc.); and the technological evolution of hardware and software, among others. The proposed performance model can be used for other stimuli not considered in this study as long as efficacy data are collected with an approach similar to that used in this study. Further research based on experimentation should be performed to improve the correlation of VR with physical stimuli in automotive marketing research, because the current results generated hypotheses for further investigation with quantitative data. Other use cases may be compared with our proposed model to mitigate its disadvantages. Future investigations may consider sample size, full clinic costs, the technological evolution of VR hardware and software, and expert experimentation, among others.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are fully available in the Supplementary Material.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: