Observing Pictures and Videos of Creative Products: An Eye Tracking Study

Abstract: The paper offers insights into people’s exploration of creative products shown on a computer screen within the overall task of capturing artifacts’ original features and functions. In particular, the study presented here analyzes the effects of different forms of representations, i.e., static pictures and videos. While the relevance of changing stimuli’s forms of representation is acknowledged in both engineering design and human-computer interaction, scarce attention has been paid to this issue hitherto when creative products are in play. Six creative products have been presented to twenty-eight subjects through either pictures or videos in an Eye-Tracking-supported experiment. The results show that major attention is paid by people to original product features and functional elements when products are displayed by means of videos. This aspect is of paramount importance, as original shapes, parts, or characteristics of creative products might be inconsistent with people’s habits and cast doubts about their rationale and utility. In this sense, videos seemingly emphasize said original elements and likely lead to their explanation/resolution. Overall, the outcomes of the study strengthen the need to match appropriate forms of representation with different design stages in light of the needs for designs’ evaluation and testing user experience.


Introduction
In the design field, a traditional strategy to reduce the risk of failure consists in having designed products evaluated, especially if they have innovative features and functions or if they create and fulfill new user needs. "Evaluation" is here intended in a broad sense and includes all categories of conscious and unconscious reactions that people (other than designers) have when faced with a design. The importance of evaluating a product from the early design phases onward is stressed by many scholars, e.g., [1][2][3][4]. In particular, Arrighi et al. [5] hypothesize an improvement of products with a consequent saving of resources if final users are involved from the beginning of the design process. A collaborative relationship between designers and users is therefore crucial for the success of a product, and the scholars developed a tool to collect data in a faster and more flexible way.
Users have been routinely involved in the design process to carry out evaluations, but this has not covered the early design stages regularly; prototypes close to their final design have been mainly dealt with in the past. With the advent of computers, the Internet, e-commerce, and social media, the variety of representation forms of designs has increased, and the number of potential evaluators has grown dramatically. The use of physical prototypes to test the effectiveness of a product has advantages in terms of evaluations' reliability, but presents many disadvantages that cannot be overlooked, especially in terms of resources and time. Mengoni et al. [6] report that the effort in virtual prototyping should be less than 40% compared to physical prototyping in consideration of all digital steps of the settings. Dong and Liu [7] state that physical and virtual models should be combined and integrated for an ideal presentation of a design concept. Users' impressions and experience of a product are certainly influenced by the level of human-product interaction, which changes significantly if products are represented by visual and non-physical stimuli, rather than physical ones [8]. When only virtual prototypes are involved, Chen et al. [9] show that a product is evaluated better if it is placed in the (virtual) context where users can interact with it through multiple senses and have the impression of an experience closer to reality.
Designers supposedly choose products' forms of representation with a different degree of abstraction depending on the design strategy and on the product development stage. As a result, just a conceptual representation of the product can be leveraged in the very early design stages [10]. Therefore, while the possible forms of product representation are manifold (see Table 1), the acknowledged design paradigm that follows has not been overcome despite the diffusion of computers and IT. The outputs of early design phases are vague [11,12], the degree to which products are defined is low, and the information necessary to evaluate the quality of such outputs is missing. At the same time, feedback coming from the people meant to evaluate products is unreliable and user experience studies are unfeasible. In such uncertain conditions, early design stages are attributed major responsibility for the successful completion of the product development process. Therefore, there is little chance to get valuable information when it would be the most needed.
It follows that it is imperative to find a trade-off between the detail level and the interactivity of design forms, the availability of design information and the reliability of evaluations, which is the research context of the present study. While the objectives will be better clarified in Section 2, the starting point of the research is to understand the peculiarities of different forms of representation when these are employed for evaluation purposes. At a first stage, it is of special interest to understand if a correlation exists between the abstraction degree of products' representations and the depth of their related evaluations. Therefore, the authors collected a sample of scientific contributions in which products have been evaluated. Table 1 shows in detail how the contributions have been organized relating the forms of representation used by scholars (rows) with the kind of evaluations that have been made (columns).
The forms of representation have been arranged with an increasing order of design detail starting from "text", when users were only provided with a description or written details of the object. Next to the text are "images", which can be further differentiated into sketches, photos, and photorealistic images. Images with text integration have been classified as "text + images". Other representation forms are "videos", which are not particularly widespread as a whole, while "Virtual prototypes", "Virtual Reality" (VR), "Virtual/Mixed Reality" represent products as if they were in their final stages of design, with a high degree of detail but with a lower level of interaction than "physical prototypes" and "end-use products". Consequently, the latter are mainly involved at the end of the whole design process.
The authors further classified the representation forms into three categories of stimuli: static, dynamic, and physical ones, as apparent from Table 1. The classes are considered self-explanatory and intuitive, so that no clarifications for the rationale behind this categorization are given. In particular, in the very early stages of design, it is necessary to use static and dynamic stimuli because physical objects are not available. Likewise, it is supposed that more insightful evaluations and interactive experiences take place as the design process progresses. It is worth noting that the matching of representation forms and evaluation scopes does not give rise to any strong relations between the former and the latter, as the table is not populated by references on the diagonal or in specific quadrants only. However, it has emerged that there is no balance between the use of static and dynamic stimuli: Table 1 clearly shows scholars' preference for static stimuli over dynamic ones, whatever the evaluation objectives. As aforementioned, images are the most leveraged form of representation; indeed, they are used for each kind of evaluation examined (particularly, to evaluate user experience, attractiveness, and value perception).
As already stated, forms of representation substantially affect user-product interaction and consequently product evaluation [7,8]. However, there is a lack of studies focusing on this aspect; actually, most scholars adopt forms of representation based on availability or convenience, as their focus is steadily on users' evaluations and feedback, rather than on the inputs leveraged. Even in contributions that use multiple forms of representation for the same product to compare the outputs of different evaluations [1,2,13], scholars' aims have not targeted users' observation strategies during the interaction with stimuli.
It can also be remarked that the study of interactions with stimuli can nowadays benefit from systems and technologies that allow scholars to analyze people's behavior insightfully. Such systems refer particularly to tools for biometric measures [83], such as Eye-Tracking (ET), and behavioral studies [84], such as facial expression recognition. All these systems, which have made inroads in design and human-computer interaction, are capable of capturing facets of people's interactions in terms of spontaneous and uncontrolled actions, reactions and physiological changes.

Objectives and Originality of the Study
Given the lack of studies on people's interaction with creative products, this paper is intended as a starting point in the study of the visual behavior of potential users when they are presented with different forms of representation. In particular, the overall scope is to understand if there are tangible differences between two or more forms of representation. Images and videos are chosen as stimuli for a first investigation for the reasons that follow.

• They have different levels of dynamics based on the classification presented in Table 1, i.e., pictures are static, while videos are dynamic stimuli.
• It is possible to use consistent stimuli for the product sets under analysis; otherwise said, the possible bias due to the use of different products can be overcome due to the diffused presence of pictures and videos depicting the same product.
• Both forms of representation can be employed in ET studies alternatively and supported by the same hardware and software. ET is clearly essential to capture data on people's visual behavior objectively.
An exhaustive description of the experimental applications of ET in design is provided by [83]. When product evaluations are considered, pictures still represent the predominant form of representation in design-related experiments, while the opportunity to employ videos is substantially neglected. Therefore, the use of ET in the study of products displayed through videos is an additional element of originality of the present paper.
Unlike in design and engineering, contributions that use videos as stimuli in ET studies are widespread in medicine, education, and social sciences. Some examples are given below.
Pusiol et al. [85] studied the possibility to characterize and diagnose different developmental disorders using the participants' gaze patterns. Different types of ET were employed by Kok and Jarodzka [86], who reviewed the pros and cons of using these tools to analyze the appropriateness of videos for learning purposes in the medical field. Another study [87] focused on the frontiers of ET in the study of visual expertise in the medical field. Another example of the learning process and skill acquisition is given by [88]; the scholars used a video to assess awareness of high-stress situations in case of an aircraft mission simulator. Videos with captions are used by [89] to understand the participant's attention allocation in the pre-learning of unknown words of a foreign language. A similar study was carried out by [90], even though infant sensitivity to visual language was taken into account here. In [91], a mute video of a project team meeting was shown to external observers, and it was possible to measure their level of engagement during the observation of non-verbal interaction of the team members through a remote ET.
These studies overall indicate that videos analyzed through ET are useful to investigate the understanding and the visual behavior of people, especially when they perceive something unfamiliar, unusual, or new.

Materials and Methods
In line with the objectives of the paper, the authors carried out an experiment to gather data on people's visual behavior while they observe creative products presented through pictures and videos. Six different creative products were shown to two groups of participants (Group_A and Group_B, 14 participants each) in the mentioned forms of representation. More precisely, the products that were shown as pictures to Group_A were presented as videos to Group_B and vice versa. In this way, each participant observed 3 products in the form of pictures and 3 products in the form of videos. In more detail, videos and pictures were presented on a 23-inch LCD monitor while a remote ET device (Tobii X2-60) recorded participants' visual behavior. The ET acquired data on the screen coordinates on which participants focused their attention while carrying out the same task, i.e., observing the pictures/videos with the aim of understanding the products' original characteristics, advantages, and disadvantages.

Participants
A sample of 28 adult participants (14 females) was involved in this research. They were recruited during initiatives/events held in the timeframe September-December 2019. Participation in the experiment was voluntary. No written consent form was filled in, as participants were not asked for any sensitive or demographic data, which were considered of minor importance at the present exploratory stage of research. However, given the circumstances in which participants were recruited, a vast majority of them were in the age range 18-30 at the time of the experiment.

Stimuli
Six different creative products (see Figure 1), in which all the authors recognized original characteristics, have been exploited in the experiment. Creativity was meant as a fundamental feature of stimuli because of the reasons that follow.

• Creative designs are supposed to attract greater interest than commonplace products and capture users' attention.
• A task was designed to keep participants' attention focused on products and their elements (see Section 3.3) and this required the use of products that are not supposedly everyday objects.
• As stated in the Introduction section, creative designs are featured by a major need for being evaluated, as they introduce novel elements or functions that deviate from people's habits.
The creative products were selected based on the availability of videos on YouTube that illustrate these products' functioning. In addition, the authors selected products that, despite being considered creative and non-commonplace, could be used or benefitted from in everyday situations. This measure is meant to minimize the potential effects of people's age, experience, and technical background.
The videos were first cut in order to obtain eight-second-long clips and edited to remove text messages. The employed pictures coincide with the first frame of the corresponding video and they are shown in Figure 1. Thanks to this measure, leveraged pictures and videos include a comparable amount of information about the shown products; the major difference is the fact that depicted objects, components, and/or people move in videos.
From the pictures shown in Figure 1, it is possible to notice the original characteristics of the corresponding products; the authors deemed that their original characteristics could be understood or inferred, although with a different level of difficulty.
The products shown in Figure 1 present the following creative characteristics:
• Equilibrium: it is a colander integrated into a bowl. The water is kept inside the bowl; the removal of the water takes place without any need for disassembling the object while the fall of the washed food is prevented.
• Hood: it is a kitchen hood integrated in the hob so that the user can easily arrange the hob in the middle of the kitchen room.
• Tire: it is an airless tire that keeps being usable in case of an accidental puncture.

Procedure
Two different sequences of alternated pictures and videos (see Figure 2) were arranged to be shown to Group_A and Group_B, respectively. Consequently, if a specific product was depicted as a picture in the first sequence, the corresponding video was shown in the second sequence and vice versa. The stimuli have been sorted in increasing order of presumable difficulty of understanding. The choice of making participants observe a subset of pictures and videos, instead of, e.g., pictures only or videos only, was aimed at minimizing the dependence of results on people's typical eye scan paths, which are known to be largely idiosyncratic, e.g., [92]. This led, therefore, to the design of an experiment in which the condition related to the form of representation, i.e., picture or video, has been randomized between subjects.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 25
At the beginning of each test, in order to allow the ET technology to acquire reliable data on the visual behavior of each participant, a calibration process was performed. Then, the authors read participants the following test instructions. "You will see pictures or watch videos showing products for a few seconds. We ask you to observe them in silence. At the end of each display, the screen will turn black. As soon as the screen turns black, describe the product, its advantages/disadvantages and the characteristics that make it unusual. At the end of your explanation, nod, look at the center of the screen and we will move forward. So, to summarize, you have to describe the product, its advantages/disadvantages and the characteristics that make it unusual." In this way, the authors introduced a measure to make the participants' observations comparable, as they were assigned the same task. In addition, the presence of a quiz-like task was supposed to motivate participants' persistence in observing products and particularly those features relevant for the understanding of creative aspects.
All the pictures were shown for eight seconds, so that the exposure time of videos and pictures was consistent. Whenever a video or picture disappeared, a black screen was shown to allow participants to describe the product they had just seen or watched, for which unlimited time was allowed. The first product shown was actually a trial picture aimed at verifying the participants' understanding of the procedure.
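The counterbalanced design described in this section can be illustrated with a minimal sketch. The product names are those used in the paper, but their order in the list and the choice of which group starts with a picture are assumptions made only for illustration; the function name `build_sequences` is hypothetical.

```python
from typing import Dict, List, Tuple

# Product names as used in the paper. The order below is only a placeholder
# for the "increasing presumable difficulty" ordering, which is an assumption.
PRODUCTS: List[str] = ["Equilibrium", "Flame", "Stairs", "OnPot", "Hood", "Tire"]


def build_sequences(products: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Return two complementary sequences of (product, form) pairs.

    Forms alternate within each sequence, and a product shown as a picture
    to one group is shown as a video to the other group.
    """
    forms = ("picture", "video")
    # Assumption: Group_A starts with a picture, Group_B with a video.
    seq_a = [(p, forms[i % 2]) for i, p in enumerate(products)]
    seq_b = [(p, forms[(i + 1) % 2]) for i, p in enumerate(products)]
    return {"Group_A": seq_a, "Group_B": seq_b}


sequences = build_sequences(PRODUCTS)
for group, seq in sequences.items():
    print(group, seq)
```

With six products, each group sees three pictures and three videos, and every product is observed in both forms across the two groups.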

Areas of Interest and Eye-Tracking Data
As mentioned in the previous sections, the visual behavior of participants was acquired through an ET technology. However, in order to compare the two forms of representation in terms of quantitative data, it is necessary to resort to Areas of Interest (AOIs). The AOIs are portions of the pictures (frames when it comes to videos) on which specific ET data can be collected. In other words, the creation of an AOI enables the collection of specific ET data ascribable to that portion of the picture/frame. In particular, in this paper, the Total Visit Duration (TVD) on the AOIs has been taken as a reference of AOI's attraction and devoted attention. The TVD is the amount of time that one spends gazing at a specific AOI; it considers the time of both fixations and saccades located in the same AOI.
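As a rough illustration of the TVD definition above, the following sketch approximates the TVD on a polygonal AOI by counting the gaze samples that fall inside it. This is a simplification and not the computation performed by the ET software: it assumes evenly spaced samples at a nominal 60 Hz and treats every sample inside the AOI (fixation or saccade) as visit time. All names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Assumption: gaze samples arrive at a nominal 60 Hz, evenly spaced.
SAMPLE_INTERVAL_S = 1.0 / 60.0


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def total_visit_duration(
    gaze: Sequence[Point],
    aoi: Sequence[Point],
    dt: float = SAMPLE_INTERVAL_S,
) -> float:
    """Approximate TVD (seconds) as the number of in-AOI samples times dt."""
    return dt * sum(1 for p in gaze if point_in_polygon(p, aoi))


# Illustrative data: a square AOI and a synthetic gaze trace.
square_aoi = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
trace = [(5.0, 5.0)] * 60 + [(20.0, 5.0)] * 30
print(round(total_visit_duration(trace, square_aoi), 3))
```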
Since the AOIs are created to map the TVD on specific areas of pictures/frames, these areas include pictures/frames' characteristics useful for the scope of the study. Clearly, the AOIs are static across the product exposure when the products are shown as pictures. On the other hand, since the spatial position of the specific products' features could change over the course of videos, dynamic AOIs that follow these features' changes were created.
In more detail, Figure 3 shows the static AOIs created on the products' pictures. AOIs with the same name corresponding to the same products' features were created on products' videos. Here, the dynamic changes of AOIs are based on interpolation of polygons across a number of frames in which the authors have adapted AOIs' shapes. A full explanation of the interpolation mechanism is available in the user's manual of the software Tobii Pro Studio used in the present experiment (see "Dynamic AOIs", p. 79). Figure 4 shows, as an illustrative example, the different polygons depicting AOIs created on the OnPot's video in different timeframes indicated in the bottom right of the corresponding frames, which represent the boundary conditions for the creation of dynamic AOIs. Illustrative dynamic AOIs can be viewed in the video OnPot.avi, provided as supplementary material.
A list of all AOIs created, along with the reason for considering them, is illustrated in Table 2. Eventually, 23 different AOIs were created across the six creative products. Clearly, the size of each AOI and its time of exposure could have an influence on the TVD. Indeed, due to the dynamic nature of the AOIs created on the videos, it is possible that AOIs' size changes or that they disappear for limited video scenes. These sources of variability will be discussed in light of the results that emerged.
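The idea of a dynamic AOI defined by polygons at a few keyframes can be sketched as follows. The sketch assumes plain linear interpolation of corresponding vertices between the two nearest keyframes; the actual mechanism of the software is described in its manual and may differ. The function name `interpolate_aoi` and the keyframe format are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def interpolate_aoi(keyframes: Dict[float, Polygon], t: float) -> Optional[Polygon]:
    """Return the AOI polygon at time t (seconds), or None if undefined.

    Assumption: vertices are interpolated linearly between the two
    keyframes surrounding t, and all keyframes share the vertex count.
    """
    times = sorted(keyframes)
    if not times or t < times[0] or t > times[-1]:
        return None  # AOI not defined at t (e.g., feature not visible)
    idx = bisect_right(times, t)
    if idx == len(times):
        return list(keyframes[times[-1]])  # t coincides with the last keyframe
    t0, t1 = times[idx - 1], times[idx]
    p0, p1 = keyframes[t0], keyframes[t1]
    if len(p0) != len(p1):
        raise ValueError("keyframe polygons must have the same number of vertices")
    w = (t - t0) / (t1 - t0)
    return [
        (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
        for (x0, y0), (x1, y1) in zip(p0, p1)
    ]


# Illustrative keyframes: a triangle that shifts 10 px to the right in 2 s.
keys = {
    0.0: [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)],
    2.0: [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0)],
}
print(interpolate_aoi(keys, 1.0))
```

Returning `None` outside the keyframe range mirrors the remark that an AOI may disappear for limited video scenes, which in turn limits the time available for accumulating TVD on it.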

Data Elaboration
TVD data have been summarized in Table 3. In the table, the form of representation (picture/video) of the AOIs analyzed is presented in the fourth column. It is possible to notice the alternation of pictures and videos of the same AOI. The TVD sum is the amount of time (in seconds) spent on the specific AOI by the 14 participants. The TVD Average is the average time (in seconds) spent by each participant on the specific AOI, while the TVD SD is the resulting standard deviation.
In order to compare the two forms of representation, the variable TVD Diff% has been created and it is shown in the penultimate column of Table 3. The TVD Diff% is a variable considered within each AOI and it has been calculated as in Equations (1) and (2):

$$\text{TVD Diff\%}_{\text{video}} = \frac{\text{TVD Average}_{\text{video}} - \text{TVD Average}_{\text{picture}}}{\text{TVD Average}_{\text{picture}}} \times 100 \quad (1)$$

$$\text{TVD Diff\%}_{\text{picture}} = \frac{\text{TVD Average}_{\text{picture}} - \text{TVD Average}_{\text{video}}}{\text{TVD Average}_{\text{video}}} \times 100 \quad (2)$$

From the above equations, it can be noticed that the variable TVD Diff% indicates the extent to which the considered form of representation has extended the TVD of the same AOI. For instance, a high value of TVD Diff% for an AOI referred to a video, e.g., +453% for the AOI Water in the product Equilibrium, means that the element depicted in that AOI received more attention in the presence of a dynamic stimulus (5.53 times in the illustrative example). On the other hand, as for the same product, the picture as form of representation was able to drive the attention to the AOI Hand for a much longer time (TVD Diff% = +591%).
Table 3 includes indications on the significance of the increase of the TVD when pictures or videos are displayed. Clearly, the stars, which depict the level of significance of the increases according to a common rule of thumb, can be present in just one of the two rows standing for the same AOI of the same product. For the sake of clarity, significant TVD decreases are not indicated, but they can be inferred for videos (pictures) when significant TVD increases emerge for pictures (videos).
With the aim of shedding light on the major TVD differences inferable from Table 3 (|TVD Diff%| > 100), a graphical representation of these differences is presented in Figures 5-10. In these figures, the distribution frequencies of the TVD on specific AOIs are presented through violin plots. Blue violins refer to pictures, whereas red violins denote videos. The width of each violin represents the proportion of the TVD located there. Each of these figures considers a specific product.
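The relationship between TVD Diff% and the ratio of the two TVD values (e.g., +453% corresponding to 5.53 times) can be checked with a small sketch. It assumes that TVD Diff% is the relative difference between the TVD of the considered form and that of the other form, expressed as a percentage, which is consistent with the worked example in the text; the input values below are made up for illustration and are not taken from Table 3.

```python
def tvd_diff_percent(tvd_considered: float, tvd_other: float) -> float:
    """Relative TVD difference (%) of the considered form vs. the other form.

    Assumption: the other form's TVD is the baseline of the ratio.
    """
    if tvd_other <= 0:
        raise ValueError("baseline TVD must be positive")
    return (tvd_considered - tvd_other) / tvd_other * 100.0


def ratio_from_diff(diff_percent: float) -> float:
    """Convert a TVD Diff% value back into a 'times as long' ratio."""
    return 1.0 + diff_percent / 100.0


# Hypothetical averages (seconds), chosen so that the video TVD is 5.53 times
# the picture TVD; they are NOT values from Table 3.
picture_avg, video_avg = 0.20, 1.106
print(round(tvd_diff_percent(video_avg, picture_avg)))    # video vs. picture
print(round(tvd_diff_percent(picture_avg, video_avg)))    # picture vs. video
print(ratio_from_diff(453.0))
```

Note that the two directions are not symmetric: a large increase for one form corresponds to a decrease bounded by −100% for the other, which is why the Hand AOI shows −86% for the video and +591% for the picture.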
When observing Equilibrium through videos, people tend to pay significantly more attention on Fruit (+277%), Hinge-1 (+447%), and Water (+453%), while the Hand of the user tends to be more unnoticed (−86%), as it can be easily inferred through Figure 5. A clear shift of attention can be perceived between Flame and Handle in Figure 6. Indeed, people that observed the product Flame through a video tended to focus on the Flame AOI (+253%), while people that observed the same product through picture tended to focus more on the Handle (+109%)-both differences are statistically significant.  . Violin plots highlighting the most significant differences on TVD Diff% between forms of representation for the product Flame.
In Figure 7, it is possible to notice that the participants that observed the product Stairs in form of picture tended not to pay much attention on Frame-2. Figure 6. Violin plots highlighting the most significant differences on TVD Diff% between forms of representation for the product Flame. Figure 6. Violin plots highlighting the most significant differences on TVD Diff% between forms of representation for the product Flame.
In Figure 7, it is possible to notice that the participants that observed the product Stairs in form of picture tended not to pay much attention on Frame-2.   As for the product Hood, from Figure 9, it is apparent that the Hood AOI tended to be observed overall less on the Hood's picture. In addition, the AOI Steam-1 tended to be observed more through the product's video.  . Violin plots highlighting the most significant differences on TVD Diff% between forms of representation for the product OnPot.
As for the product Hood, from Figure 9, it is apparent that the Hood AOI tended to be observed overall less on the Hood's picture. In addition, the AOI Steam-1 tended to be observed more through the product's video. Figure 9. Violin plots highlighting the most significant differences on TVD Diff% between forms of representation for the product Hood.
The video of the Tire generally tended to focus the attention on the AOIs Nail-2, Rim, and especially on Tire more than when the corresponding picture was shown. Appl. Sci. 2020, 10, x FOR PEER REVIEW 17 of 25 Figure 10. Violin plots highlighting the most significant differences on TVD Diff% between forms of representation for the product Tire.

Discussions and Limitations
The paper presents a pioneer study of the comparison of people's visual behavior when they are administered with static (pictures) and dynamic stimuli (videos) depicting creative products. Beyond its objectives, as already highlighted in Section 2, original elements of the paper include the use of ET in analyzing videos in the design field, and, more in general, the possibility to exploit dynamic AOIs provided by some ET software applications. The critical discussion of the main outcomes follows.
Videos, as a form of representation, overall tend to focus participants' visual attention on the products' parts that the authors considered more original and helpful to explain the products' novel elements. It is worth underlining that the function of videos in product representations might be a specific and not generalizable characteristic for product design and evaluation. Indeed, although videos' capability of driving attention towards moving elements can be intuitively supposed, the effectiveness of dynamic elements in terms of willingly directing people's attention has been When observing Equilibrium through videos, people tend to pay significantly more attention on Fruit (+277%), Hinge-1 (+447%), and Water (+453%), while the Hand of the user tends to be more unnoticed (−86%), as it can be easily inferred through Figure 5.
A clear shift of attention can be perceived between Flame and Handle in Figure 6. Indeed, participants who observed the product Flame through a video tended to focus on the Flame AOI (+253%), while participants who observed the same product through a picture tended to focus more on the Handle (+109%); both differences are statistically significant.
In Figure 7, it is possible to notice that the participants who observed the product Stairs in the form of a picture tended not to pay much attention to Frame-2.
A clear and significant difference between the two forms of representation emerged for the AOI Cap of the product OnPot. Indeed, Figure 8 shows that the video of the OnPot tended to direct more attention to the Cap than the corresponding picture did.
As for the product Hood, it is apparent from Figure 9 that the Hood AOI tended to be observed less overall in the Hood's picture. In addition, the AOI Steam-1 tended to be observed more in the product's video.
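The percentage differences quoted for the AOIs can be illustrated with a short calculation. The following is a minimal sketch, assuming that TVD Diff% is the relative change of the mean TVD on an AOI from the picture group to the video group. That definition, the function names, and all numeric values are hypothetical and are not taken from the study's data. The permutation test is a generic stand-in for a significance check and is not claimed to be the test used by the authors.

```python
import random
from statistics import mean


def tvd_diff_percent(picture_tvd, video_tvd):
    """Relative change of mean TVD from picture to video, in percent.

    Assumed definition (not confirmed by the source text).
    """
    base = mean(picture_tvd)
    if base == 0:
        raise ValueError("mean picture TVD is zero; relative change undefined")
    return 100.0 * (mean(video_tvd) - base) / base


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up TVD values in seconds for one AOI (illustration only).
picture = [0.2, 0.4, 0.3, 0.1]
video = [1.0, 1.4, 0.9, 1.1]

diff = tvd_diff_percent(picture, video)
p_value = permutation_p_value(picture, video)
print(f"TVD Diff%: {diff:+.0f}%")  # about +340% for these made-up values
print(f"permutation p-value: {p_value:.3f}")
```

A positive value indicates that the AOI was observed longer in the video, a negative one that it was observed longer in the picture, which matches the sign convention of the figures reported in this section.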

Discussions and Limitations
The paper presents a pioneering comparison of people's visual behavior when they are presented with static (pictures) and dynamic (videos) stimuli depicting creative products. Beyond its objectives, as already highlighted in Section 2, original elements of the paper include the use of ET in analyzing videos in the design field and, more generally, the possibility of exploiting the dynamic AOIs provided by some ET software applications. The critical discussion of the main outcomes follows.
Videos, as a form of representation, overall tend to focus participants' visual attention on the product parts that the authors considered more original and more helpful in explaining the products' novel elements. It is worth underlining that this function of videos might be specific to product design and evaluation and not generalizable. Indeed, although videos' capability of driving attention towards moving elements can be intuitively supposed, the effectiveness of dynamic elements in deliberately directing people's attention has been investigated in different fields of research with divergent results, e.g., [93,94].
In contrast, pictures have been shown to disperse attention towards elements of little relevance for understanding the products' functioning and originality, such as objects in the background. In this respect, the difference is even more marked for the products Flame and OnPot. Here, indeed, the video led all participants to increase their attention to the AOIs Flame and Cap for the products Flame and OnPot, respectively. These two AOIs have been identified by the authors as the original characteristic of the product (Flame) and the object that undergoes the main function of the product OnPot (Cap). A similar result can be observed for the AOIs Fruit and Water in the product Equilibrium. These AOIs, which represent the objects that undergo the main function, were observed longer in the videos.
A further finding that emerged from the data analysis is the better capability of videos to focus people's attention on products' essential/original characteristics even if these are small. In this respect, the AOIs Nail-2 and Hinge-1 of the products Tire and Equilibrium, respectively, can be taken into consideration. Moreover, for the product Tire, it is noteworthy that the AOI Nail-2 captured attention despite a shorter exposure time. Indeed, this AOI was shown statically for 8 s in the picture, while in the video it was visible for about 4 s only. Nevertheless, Nail-2 was observed more in the video than in the corresponding picture.
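The Nail-2 observation can be made more explicit by relating the TVD to the time during which the AOI was actually on screen. The sketch below is only an illustration under stated assumptions: the 8 s and 4 s visibility times come from the text, whereas the `dwell_share` helper and the TVD values are hypothetical and were chosen solely to show the arithmetic.

```python
def dwell_share(tvd_s, visible_s):
    """Fraction of an AOI's on-screen time that was spent looking at it."""
    if visible_s <= 0:
        raise ValueError("visible time must be positive")
    return tvd_s / visible_s


# Visibility of the AOI Nail-2, as stated in the text.
PICTURE_VISIBLE_S = 8.0  # shown statically for the whole exposure
VIDEO_VISIBLE_S = 4.0    # "about 4 s" in the video

# Hypothetical mean TVD values in seconds (not measured data).
picture_tvd_s = 0.4
video_tvd_s = 0.6

picture_share = dwell_share(picture_tvd_s, PICTURE_VISIBLE_S)
video_share = dwell_share(video_tvd_s, VIDEO_VISIBLE_S)

print(f"picture: {picture_share:.2%} of visible time")
print(f"video:   {video_share:.2%} of visible time")
print(f"video/picture ratio: {video_share / picture_share:.1f}")
```

With such a normalization, an AOI that gathers a higher absolute TVD in half the visible time shows an even larger gap in relative terms, which is why the difference in exposure is listed among the limitations below.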
For the product Hood, the video led participants to pay particular attention to Steam-1 compared with the same AOI shown in the picture. This may be because the movement of a hardly visible element like steam can be noticed and observed better in a video than in a picture or, at least, because perceiving steam in a picture is more challenging.
In the videos of Equilibrium and Flame, a lower tendency of the participants to observe the AOIs Hand (Equilibrium) and Handle (Flame) can be noticed. This highlights the videos' capability to drive participants' attention to specific product features rather than to the user.
The above observations and discussions are based on the data analysis carried out in the experiment described in this paper. It is useful to remark that the data collected relate to participants' eight-second exposure to each stimulus. Analyzing the dynamics of visual behavior could lead to interesting considerations regarding the participants' attention over the exposure time. On the one hand, the quantitative analyses clearly indicate that videos tend to keep participants concentrated on the characteristic features of the product. On the other hand, the relationship between the time spent on specific AOIs and the understanding of the original characteristics has not been investigated. The use of videos seemingly diminishes mismatches between designers' intention and observers' interpretation, and favors the understanding of products, as reported by a study conducted by the authors in parallel to the present one, which partially shares materials and methods [95]. However, long-lasting attention to certain features is not necessarily motivated by the need for product comprehension and evaluation. Consequently, a shorter exposure time could lead to different results when comparing forms of representation. The dynamic analysis of visual behavior could be carried out using the methods proposed in [96], which, however, also underlines the difficulties to be faced when accounting for the time dimension in ET studies. Such an analysis would make it possible to study the speed at which videos and pictures tend to drive participants towards focusing on specific AOIs. However, understanding how this affects the comprehension and evaluation of the product requires future tests in which the exposure time to the stimuli is varied.
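To give a concrete idea of what a dynamic analysis could measure, the sketch below computes two simple time-resolved quantities from a list of fixations: the time to the first fixation on an AOI and the dwell time accumulated up to a given instant. It is a minimal illustration and not the method of [96]; the `Fixation` record, the function names, and the sample data are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fixation:
    """One fixation; hypothetical record layout, not an ET vendor format."""
    start_s: float       # onset relative to stimulus onset, in seconds
    duration_s: float    # fixation duration, in seconds
    aoi: Optional[str]   # AOI hit by the fixation, or None if outside all AOIs


def time_to_first_fixation(fixations: Iterable[Fixation], aoi: str) -> Optional[float]:
    """Onset of the earliest fixation on `aoi`, or None if it was never fixated."""
    onsets = [f.start_s for f in fixations if f.aoi == aoi]
    return min(onsets) if onsets else None


def cumulative_dwell(fixations: Iterable[Fixation], aoi: str, until_s: float) -> float:
    """Dwell time on `aoi` accumulated from stimulus onset up to `until_s`."""
    total = 0.0
    for f in fixations:
        if f.aoi != aoi:
            continue
        end = min(f.start_s + f.duration_s, until_s)
        total += max(0.0, end - f.start_s)  # ignore fixations starting after until_s
    return total


# Made-up fixation sequence for one participant (illustration only).
sample = [
    Fixation(0.0, 0.5, None),
    Fixation(0.6, 0.4, "Handle"),
    Fixation(1.2, 0.8, "Flame"),
    Fixation(2.5, 1.0, "Flame"),
]

ttff_flame = time_to_first_fixation(sample, "Flame")
dwell_flame_3s = cumulative_dwell(sample, "Flame", until_s=3.0)
print(ttff_flame, round(dwell_flame_3s, 3))
```

Comparing such quantities between the picture and video groups at several cut-off times would indicate how quickly each form of representation draws attention to an AOI, which the total TVD over the eight-second exposure cannot reveal.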
Beyond the lack of clear indications about the dynamics and sequences of observation, which requires additional scrutiny, the study is affected by some limitations. First, as mentioned above, the duration of the presence of AOIs and their size vary between pictures and videos. This aspect has been overlooked so far, and the study of its potential moderating effect is the object of planned future studies. The outcomes of the study are presumably also affected by the following choices and conditions, whose impact could be beneficially analyzed.

• The number of products used is limited, and the sample of participants is not representative of any population. Despite the latter, many significant differences emerged between the attention elicited by pictures and by videos.
• The choice of products along with the corresponding videos was largely arbitrary. The level of creativity and technological sophistication can differ across the chosen products and, although the products refer to supposedly common contexts (home, kitchen, means of transportation), participants' familiarity with them can differ. With respect to the unusualness of the products chosen, the selection can be considered successful, as no participant spontaneously stated that they were overall familiar with the depictions.
• Pictures were taken from the first frame of the corresponding videos, but this frame did not necessarily coincide with the most explicative frame of the same videos. As the latter criterion could have introduced additional arbitrariness, the former was chosen.
• The background of participants is known only in a subset of cases (engineering, industrial design), and its effect is worth taking into account in future studies along with other demographic data.
• The duration of the exposure to videos and corresponding pictures was arbitrary, although consistent.
• The task participants had to carry out is not a standard one. Results would likely have been different if participants had been left free to observe products and videos. The relationship between TVD on AOIs and the understanding of products' original elements could be beneficially analyzed.
• The sequences were standardized for the sake of convenience, but they could be randomized in future studies.
• The TVD was chosen here as a measure of the attention and interest aroused by AOIs, but other ET variables are commonly used in design studies to represent gaze and observation phenomena, see [83].
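The randomization of sequences mentioned among the limitations could be implemented in several ways. The sketch below shows one simple option: a reproducible, per-participant shuffle of the six products. The seeding scheme, the function name, and the constant `BASE_SEED` are assumptions made for illustration and do not describe the protocol used in the experiment.

```python
import random

# The six products used in the study.
PRODUCTS = ["Equilibrium", "Flame", "Stairs", "OnPot", "Hood", "Tire"]

BASE_SEED = 2020  # hypothetical study-level seed


def presentation_order(participant_id: int, base_seed: int = BASE_SEED) -> list:
    """Return a reproducible random product order for one participant."""
    # A string seed derived from the participant ID gives each participant
    # an independent but repeatable random stream.
    rng = random.Random(f"{base_seed}-{participant_id}")
    order = PRODUCTS.copy()
    rng.shuffle(order)
    return order


# Example: orders for the 28 participants of the study.
orders = {pid: presentation_order(pid) for pid in range(1, 29)}
for pid in (1, 2, 3):
    print(pid, orders[pid])
```

Because each order is derived from the participant identifier, the sequence shown to any participant can be regenerated later when the ET recordings are analyzed.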

Conclusions and Outlook
Despite the limitations discussed in Section 5 and the number of potentially influential conditions, the authors deem that the results are sufficiently strong to outline the peculiarities of using videos in design evaluations. In this respect, a clear issue lies in the impossibility of creating videos or dynamic representations until an intermediate phase of the design process has been reached, i.e., until a first layout of the new product is available in a CAD system. The large and significant differences in the product elements observed show that forms of representation are not neutral when it comes to reactions to depictions of designs, and of creative products in particular. Therefore, within design studies of emotions, user experience, or human-computer interaction, forms of representation are to be chosen carefully, and multiple forms could be beneficially combined to obtain reliable feedback from evaluators.
From a methodological point of view, the approach followed in the present study can be considered an initial benchmark for understanding the different effects of forms of representation on product evaluation. However, while the detected differences concern visual behavior, other aspects can be of interest and are worth investigating. In this respect, the evaluation of additional criteria might benefit from different techniques, including participants' questionnaires, interviews, reports, and surveys, which are often combined with ET data [83]. In addition, biometric and behavioral measures other than ET are nowadays available to investigate emotions. In this context, the aforementioned facial expression recognition systems are clearly an increasingly viable option within design [97,98] and product evaluation [99], while the possibility of combining them with ET is under investigation [100].
The integration of the ET-based investigation with other techniques represents a trigger for future work, along with the determination of the effects of the potentially influential circumstances underlined in Section 5.
Finally, the authors are available to share the materials used in the experiments, ET data, and some information about participants' understanding of creative product features.

Funding: This research received no external funding. The study is conducted in the frame of the project EYE-TRACK funded by the Free University of Bozen-Bolzano with the call CRC2017.