Applied Sciences
  • Article
  • Open Access

13 February 2023

Low-Level Video Features as Predictors of Consumer Engagement in Multimedia Advertisement

1 User-Adapted Communication and Ambient Intelligence Lab, Faculty of Electrical Engineering, University of Ljubljana, SI 1000 Ljubljana, Slovenia
2 Nielsen Lab d.o.o., SI 1000 Ljubljana, Slovenia
3 Scientific Research Centre, ZRC SAZU, SI 1000 Ljubljana, Slovenia
* Author to whom correspondence should be addressed.
This article belongs to the Section Computing and Artificial Intelligence

Abstract

The article addresses the modeling of consumer engagement in video advertising based on automatically derived low-level video features. The focus is on a young consumer group (18–24 years old) that uses ad-supported online streaming more than any other group. The reference ground truth for consumer engagement was collected in an online crowdsourcing study (N = 150 participants) using the User Engagement Scale-Short Form (UES-SF). Several aspects of consumer engagement were modeled: focused attention, aesthetic appeal, perceived usability, and reward. The contribution of low-level video features was assessed using both linear and nonlinear models. The best predictions were obtained for the UES-SF dimension Aesthetic Appeal (R² = 0.35) using a nonlinear model. Overall, the results show that several video features are statistically significant in predicting consumer engagement with an ad. We identified linear relations with Lighting Key and quadratic relations with Color Variance and Motion features (p < 0.02). However, their explained variance is relatively low (up to 25%).

1. Introduction

The rapid and sustained development of new content consumption models based on personalized media access and streaming is having a significant impact on the established advertising landscape [1]. Audience measurement providers are struggling to keep pace with these advances and to provide advertisers with improved advertising effectiveness metrics. The latter requires robust metrics to demonstrate the impact of advertising campaigns on potential consumers.
Exposure to ads can be quantified in the context of consumer engagement (UE). To this end, brands use innovative tactics to create memorable impressions in order to increase consumer awareness and engagement with ads. Higher engagement leads to more ad exposure, improves brand memorability, and positively affects consumer purchase intentions [2,3,4]. However, exposure to ads and its effects are complex and depend on several factors. Subjective and objective factors that affect ad exposure include the circumstances in which a particular advertisement is consumed (context, denoted by c), subjective conditions related to a consumer’s personality and/or temporal mood (person, denoted by p), and the characteristics of the advertising video materials (video, denoted by v). Engagement measured using the UES-SF instrument is denoted by UE, and the measurement noise of this measurement by ε:
f(c, p, v) = UE + ε    (1)
A thorough model of ad engagement necessarily considers all three of the above-mentioned factor groups. The present research focuses on the impact of video material characteristics on user engagement. We present a novel approach to modeling the relation between consumer ad engagement and visual video features derived from ads. The advantage of the proposed approach is that it is unobtrusive, scalable, and can be automated. To our knowledge, most existing approaches estimate consumer engagement based on user and content metadata (such as the number of views, video resolution, and genre), ignoring the potential effects that visual video features might have on consumer engagement. In addition, many existing approaches lack a referential ground truth for user-perceived engagement. We hypothesize that the low-level video features of an ad might capture some of its visual effects on consumers that cannot be modeled from the metadata itself.
The article has two objectives: (a) to identify relevant low-level video features that can be automatically derived from a video ad, and (b) to test the feasibility of a model built on these features by evaluating their predictive power for assessing consumer engagement. The reference ground truth for consumer engagement was obtained using the User Engagement Scale-Short Form (UES-SF), a standardized psychometric instrument that has been shown to capture several aspects of engagement (focused attention, aesthetic appeal, perceived usability, and reward) and to provide reliable results [5]. The ground truth was collected in an online crowdsourcing study (N = 150 participants) that measured participants’ engagement with video ads using the UES-SF. The target audience of the present study was young adults, the largest segment of digital natives who engage in online multimedia, especially ad-supported video streaming. The focus was on short-term exposure to in-video ads, which are common in ad-supported video streaming [6].
The remainder of this paper is organized as follows: Section 2 briefly introduces related work and discusses the concepts of ad exposure, consumer engagement, and relevant low-level video features. Section 3 presents the materials and methods used in the observational study, gathering the ground truth of perceived consumer engagement, and constructing a model based on video features. Section 4 presents the results of the study and the evaluation of low-level video features for assessing consumer ad engagement scores. In Section 5, we discuss our findings, shortcomings of the presented research, and future work.

3. Materials and Methods

3.1. Video Ads

The study was conducted using video ads. The selection criteria were developed in collaboration with Nielsen Company’s media and marketing experts, taking into account the target audience and the different contexts of potential consumer engagement. The focus was on short-term ad exposure, which is common in ad-supported video streaming [6].
The exclusion criteria for the video ads were: (a) content that depicted or promoted smoking, alcohol, or sex; (b) ads that did not promote an actual product but conveyed a social message (HIV testing, no drinking and driving) or promoted a service (e.g., dry cleaning, real estate companies, etc.); (c) low-resolution videos or very old video ads (produced before 1980); or (d) ads that promoted baby- or child-specific products because they were not relevant to the target audience. The inclusion criteria for the video ads were as follows: (a) language (only ads in English), (b) length (between 15 and 75 s in duration), and (c) accessibility (available online).
Based on these criteria, 30 video ads were selected from YouTube. A complete list of videos with access links is provided in the referenced dataset. These video ads were later used to derive low-level video features relevant to modeling of consumer ad engagement.

3.2. Low-Level Video Feature Selection

Based on the selection of video descriptors described in Section 2, a set of low-level video features was obtained from the 30 video ads used in the observational crowdsourcing study. These video features include Lighting Key, Color Variance, Mean Motion Activity, Standard Deviation of Motion Activity, and Average Shot Length.
For the Lighting Key and Color Variance features, the values for each video frame were determined first. The average of the values assigned to the frames was calculated to obtain a single scalar value for each video. Similarly, optical flow vectors were calculated for each successive pair of video frames. The measurement of motion activity with respect to the two frames was calculated using the mean and standard deviation of the absolute values of the flow vectors. The two corresponding scalar motion-related video features associated with the entire video were then calculated as the average over the corresponding video frames.
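The aggregation described above can be sketched in Python. This is a minimal sketch under assumptions: the per-frame Lighting Key/Color Variance values and the absolute optical-flow magnitudes (e.g., from OpenCV's `calcOpticalFlowFarneback`) are taken as precomputed inputs; only the averaging into video-level scalars is shown.

```python
import numpy as np

def video_level(per_frame_values):
    """Collapse a per-frame feature (e.g., Lighting Key or Color Variance
    computed for every frame) into a single scalar by averaging."""
    return float(np.mean(per_frame_values))

def motion_features(flow_magnitudes):
    """Aggregate absolute optical-flow values into the two motion features.

    flow_magnitudes: one 2-D array per consecutive frame pair, holding the
    absolute values of the flow vectors (assumed precomputed, e.g., with
    OpenCV's calcOpticalFlowFarneback).
    """
    pair_means = [m.mean() for m in flow_magnitudes]  # motion activity per pair
    pair_stds = [m.std() for m in flow_magnitudes]    # motion variability per pair
    # Video-level features: average the per-pair statistics over the video.
    return float(np.mean(pair_means)), float(np.mean(pair_stds))
```

The same `video_level` averaging applies to any per-frame descriptor, which keeps the pipeline uniform across features.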
The dynamics of the video production were also considered. The Average Shot Length represents the average scene length between two video edits made during production. Scene transition estimates were obtained using the PySceneDetect [32] software package with its content-based detector. This method detects jump cuts by seeking differences between consecutive video frames that exceed a certain threshold. The threshold was experimentally set to t = 10 on an 8-bit integer scale. The entire process operated in the HSV color space, using the average value of the three color channels. The obtained scene transitions were thoroughly verified, and the values were adjusted in the rare cases where the automatically determined timestamps did not match the actual shot boundaries.
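A minimal sketch of this step: computing the Average Shot Length from detected cut positions. The PySceneDetect call is shown as a comment (the file name is hypothetical); only the frame-count arithmetic is exercised here.

```python
from statistics import mean

def average_shot_length(cut_frames, total_frames):
    """Average shot length in frames, given the frame indices of the
    detected cuts and the total number of frames in the video."""
    boundaries = [0] + sorted(cut_frames) + [total_frames]
    # Shot lengths are the gaps between consecutive cut boundaries.
    shot_lengths = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return mean(shot_lengths)

# Cut detection itself (hypothetical file name), using PySceneDetect's
# content-based detector with the threshold reported in the article:
#   from scenedetect import detect, ContentDetector
#   scenes = detect("ad.mp4", ContentDetector(threshold=10))
#   cuts = [end.get_frames() for _, end in scenes[:-1]]
```

A single-shot video (no cuts) correctly degenerates to one shot spanning the whole video, which explains the extreme feature values reported later for one-shot ads.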

3.3. Referential Ground Truth for Perceived Ad Engagement

The referential ground truth for modeling consumers’ perceived ad engagement was collected using the User Engagement Scale-Short Form (UES-SF), which has been shown to provide reliable results [5]. UES-SF is a 12-item psychometric instrument that represents the cognitive, behavioral, and social aspects of engagement in four core components (subscales): Focused Attention, Aesthetic Appeal, Perceived Usability, and Reward. It was developed as a short and quick alternative to the 31-item UES and is particularly suitable for the online collection of participants’ responses [5].
The UES-SF items are rated on a five-point rating scale. The wording of the item questions can be adapted to the specific use case. For example, instructions to participants can be worded as follows, “The following questions aim to rate your experience viewing the advertisement. For each question, please use the following scale to indicate which most closely applies to you.” An example of a question item from the Focused Attention component: “I was absorbed in this advertisement”.
Scores can be provided for the individual components (FA, PU, AE, and RW) or for the entire UES-SF. For each participant, the score for each component was calculated as the mean of the ratings of the items in that component. The total UES-SF score per participant was calculated by adding the mean scores of the components. A detailed description of the UES components and items, as well as the scoring procedure, can be found in [5]. The instrument allows the question wording to be modified to fit the evaluated content type. The exact questions used in this research are part of the referenced, published dataset.
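The scoring procedure described above can be sketched as follows. The grouping of the 12 items into four consecutive blocks of three is an assumption made here for illustration; the official item order and scoring are given in [5].

```python
def ues_sf_scores(item_ratings):
    """Score one participant's 12 UES-SF item ratings (1-5).

    The ordering of items into four components of three items each
    (FA, PU, AE, RW) is assumed here for illustration; see [5] for the
    official item assignment.
    """
    assert len(item_ratings) == 12
    scores = {}
    for i, name in enumerate(("FA", "PU", "AE", "RW")):
        items = item_ratings[3 * i:3 * i + 3]
        scores[name] = sum(items) / 3              # component mean
    # Total UES-SF score: the sum of the four component means.
    scores["UES"] = sum(scores[n] for n in ("FA", "PU", "AE", "RW"))
    return scores
```

Per-video ground-truth values are then obtained by averaging these participant-level scores over the raters assigned to each video.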

3.4. Observational Study

An online observational study was conducted using the Clickworker crowdsourcing platform (https://www.clickworker.com/, accessed on 20 November 2022) to collect data on consumer engagement with ads. The observational study is part of a broader investigation of consumer behavior related to in-video advertising. It covers multiple aspects of ad exposure, including engagement, attitude, and purchase intent, with psychometric measures of participants’ mood, affective state, preferences, and demographics. Consequently, the number of materials used had to be minimized to accommodate the scope of the study. In this article, the focus is on ad engagement.
The target population was young adults (18–24 years old), native English speakers from the USA, who engage in online multimedia more than any other age group, particularly ad-supported video streaming. Of the respondents, 54% were female and 46% male; 61% were white Caucasian, 8% Hispanic, 16% African American, 2% Native American, 10% Asian, and 3% of other ethnicity. In addition, 70% lived in metropolitan and 30% in rural areas. The highest educational degree achieved was primary for 2%, secondary for 50%, vocational for 7%, bachelor’s for 35%, and master’s for 6% of the respondents. Furthermore, 47% of the respondents were students, 23% self-employed, 19% regularly employed, and 11% unemployed. The reference ground truth for consumer engagement with ads was obtained using the UES-SF and includes several aspects of engagement: focused attention, aesthetic appeal, perceived usability, and reward [5]. Each participant rated only one video ad, randomly assigned to them. The number of raters per individual video was balanced (N_rpv = 5). Responses were collected from a total of N = 150 participants.
To control for the technology-related effects of multimedia exposure (e.g., screen size and technology-related usage behavior), only participants who used a personal computer were included in the study. This restriction also reduced the influence of the ambient distractions present in mobile media consumption. Participants were rewarded EUR 2.50 each for participating in the study. First, informed consent was obtained from the participants, informing them of the purpose of the study. They were then given a brief description of the purpose and duration of the study (average duration of five minutes). The participants were informed that no time restrictions were imposed on them and were instructed to set an appropriate volume on their computer to fully experience the video ad. After viewing the ad, each participant was asked to rate their engagement using the UES-SF. A content-related verification question was included to ensure that the participant had indeed consumed the media content.

3.5. Data Processing

Video features were computed according to the definitions above using custom routines in Matlab R2021a. Statistical analysis and modeling were performed in Python 3.8.13 using the libraries pydove 0.3.5, seaborn 0.11.2, and numpy 1.21.5. All major scripts, together with the corresponding dataset, are available on GitHub: https://github.com/ub-lucami/llvfeatues, accessed on 21 November 2022.

4. Results

The following sections report the results of the observational study described in Section 3 and of the modeling of consumers’ ad engagement.

4.1. Descriptive Statistics for Video Features and Engagement Scores

4.1.1. Video Features

The basic statistics for the video features are shown in Figure 1. The figure includes the minimum, maximum, mean, and standard deviation of the corresponding video features for each ad. Each set is complemented with a randomized bar graph of raw consecutive feature values and histogram plots thereof.
Figure 1. Basic statistics for low-level video features derived from the ads. The table columns represent the minimum, maximum, mean, and standard deviation for each video feature, respectively. The last two columns show the randomized bar graph of raw consecutive feature values for each ad and their histograms.
Lighting Key has a nearly uniform distribution around an average value of 0.117. Color Variance has much higher values, on the order of 10⁶. The values of Mean Motion Activity and Standard Deviation of Motion Activity are low, on the order of 10⁻³. The Average Shot Length of the videos is in the range of 40–80 frames in most cases, with some single-shot videos pushing the average shot duration towards 850 frames. The histogram distributions of Color Variance, Mean Motion Activity, Standard Deviation of Motion Activity, and Average Shot Length are uneven, skewed to the right, with a pronounced gap towards the maximum values.

4.1.2. Consumer Engagement Scores

The consumer engagement scores were collected in the observational study using UES-SF. The scores were calculated for the 30 videos and all four UES-SF dimensions (see Section 3). The results for 150 participants are shown in Figure 2.
Figure 2. Basic statistics for ground truth data collected using UES-SF. The table columns represent the minimum, maximum, mean, and standard deviation for each dimension. The last two columns present a randomized bar graph of raw consecutive values of the engagement dimensions for each video rating and their histograms.
The hierarchical reliability coefficient ω of the overall model is ω = 0.81. The measurement models of all four dimensions of the UES-SF are tau-equivalent [33], and the pertinent reliability coefficient is the tau-equivalent ρ_T. For comparability, we also report the commonly used reliability coefficient Cronbach’s α; see Table 3. The reliability of the dimensions Aesthetic Appeal (AE) and Reward (RW) is very high, and the estimated reliability of the dimensions Focused Attention (FA) and Perceived Usability (PU) is moderately high. The obtained values are comparable to the reliability measures reported in the original article presenting the UES-SF [5].
Table 3. Reliability coefficients Cronbach α and Tau-equivalent ρ T for each UES-SF dimension.
Figure 2 shows the statistics for the UES-SF scores. The scores for the UES-SF dimensions are given on a scale from 1 to 5. The minimum, maximum, mean, and standard deviation are summarized in the first four columns. The randomized bar graphs in the fifth column represent the individual dimension values of engagement for each consecutive observation. The last column contains the histograms. The values of Focused Attention (FA) are distributed over the entire score range in a near-normal shape. Perceived Usability (PU) scores show a high number of lowest ratings (1), with the other values distributed uniformly. The distributions of Aesthetic Appeal (AE) and Reward (RW) are dominated by high values between 3 and 5. The overall engagement score (UES) approximates a normal distribution with a mean value of 3 and a relatively small deviation.

4.2. Modeling Consumer Ad Engagement

Based on the UES-SF scores and video analysis, we conducted a thorough review of the contributions of low-level video features to each UES-SF dimension. At first, we tried to identify possible linear associations between the video features and UES-SF scores.

4.2.1. Identification of Linear Associations of Video Features and Engagement

The significance of video features in terms of linear association with a given UES-SF dimension was tested using the Kruskal–Wallis ANOVA test (the dependent-variable normality assumptions of ANOVA were not met; all Shapiro–Wilk test p-values were < 0.05). Levels were determined for each video feature individually. The results are shown in Table 4. There were only three positive feature identifications. Regarding the multiple-testing problem [34], the expected number of false positives is 25 × 0.05 = 1.25, that is, one or possibly two false positives.
Table 4. Reported p-values of Kruskal–Wallis test with randomized effects, where the independent variables are the video features, and the dependent variables are the individual dimensions of the UES-SF scale.
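A per-feature test of this kind can be sketched as follows, assuming the engagement scores have already been grouped by (binned) feature level; `scipy.stats.kruskal` implements the Kruskal–Wallis H-test.

```python
from scipy.stats import kruskal

def feature_significance(scores_by_level, alpha=0.05):
    """Kruskal-Wallis H-test across groups of engagement scores, one group
    per level of a binned video feature. Returns (H, p, significant)."""
    h_stat, p_value = kruskal(*scores_by_level)
    return h_stat, p_value, p_value < alpha

# With 25 feature/dimension tests at alpha = 0.05, the expected number of
# false positives is 25 * 0.05 = 1.25, i.e., one or two spurious hits.
```

The binning of continuous feature values into levels is an assumption of this sketch; the article determines the levels per feature individually.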

4.2.2. Identification of Model Shapes Associating Video Features and Engagement

A series of scatter plots comparing each selected video feature with each UES-SF dimension was examined both visually and statistically to determine the nature of the relationships between them. Visual inspection revealed potential linear relationships between some low-level visual features and the corresponding dimensions of engagement. For example, the relationship between the video feature Lighting Key and the UES-SF dimension Reward (RW) appears linear, as shown in Figure 3a.
Figure 3. Examples of (a) scatter plot and linear model plot of a decreasing linear relation between Lighting Key and Reward (RW); (b) scatter plot and a quadratic model plot of relation between Color Variance and overall User Engagement Score (UES), indicating a U-shaped relation; (c) scatter and quadratic plot of the relation between Standard Deviation of Motion and Perceived Usability (PU) dimension. Please note that a small jitter was added to improve visualisation of overlapped values.
Several cases indicate that there may be more complex dependencies in addition to a simple linear relationship. Considering the U-shaped (or possibly inverted U-shaped) relationship between a perceived response and its stimulus, which is often found in studies on human behavior [35,36], nonlinear modeling was also considered. At this stage of the study, the goal was to identify the nature of the relationship rather than to define an optimal model. A quadratic model was applied as the simplest nonlinear model capable of capturing non-monotonic relations. An example of a possible U-shaped relationship between Color Variance and the overall User Engagement Score (UES) is shown in Figure 3b. Figure 3c represents a U-shaped relationship between the Standard Deviation of Motion and the Perceived Usability (PU) dimension.

4.3. Contributions of Video Features to UES-SF

Based on these results, several regression analyses were performed to determine the nature of the relationship between all possible pairs of video features and consumer engagement ratings. For each combination, the p-value of the null hypothesis H0: R² = 0 was calculated, where R² represents the coefficient of determination of the analyzed model. The assumed risk level was set at α = 0.05.
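One such regression analysis can be sketched as follows: fit a linear or quadratic model with numpy, compute R², and test H0: R² = 0 with the standard overall F-test. This is a generic sketch, not the exact routine used in the article (which relied on the pydove/numpy stack).

```python
import numpy as np
from scipy.stats import f as f_dist

def fit_and_test(x, y, degree):
    """Fit a polynomial of the given degree, compute R-squared, and test
    H0: R^2 = 0 with the overall F-test. Returns (coeffs, r2, p_value)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n, k = len(y), degree                 # k regressors (excluding intercept)
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    p_value = f_dist.sf(f_stat, k, n - k - 1)
    return coeffs, r2, p_value
```

Setting `degree=1` gives the linear model and `degree=2` the quadratic model used throughout Section 4.3.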
The confidence interval of the fitted curve (confidence curve), obtained using a standard bootstrapping procedure [37], was used to test the stability of the obtained models. The mean curves and confidence curves shown in Figure 3 indicate stable modeling, and no instability was observed in any of the 25 modeled cases.
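The stability check can be sketched as a percentile bootstrap of the fitted curve: resample (x, y) pairs with replacement, refit the polynomial, and take pointwise percentiles of the predicted curves. This is a generic sketch of the idea, assuming a polynomial model; the article used a standard bootstrapping procedure [37].

```python
import numpy as np

def bootstrap_confidence_curves(x, y, degree=2, n_boot=1000, seed=0):
    """Percentile bootstrap band (2.5/97.5) for a polynomial fit on a grid
    spanning the observed x range."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    grid = np.linspace(x.min(), x.max(), 50)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), len(x))   # resample (x, y) pairs
        coeffs = np.polyfit(x[idx], y[idx], degree)
        curves[b] = np.polyval(coeffs, grid)
    lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
    return grid, lower, upper
```

A narrow band that tracks the mean curve across resamples is the kind of evidence of stability referred to above; a band that flips shape between resamples would indicate an unstable model.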
The contributions of individual video features are discussed in the following subsections.

4.3.1. Lighting Key

Lighting Key is a video feature that represents the relationship between light and shadow in the video. The regression analyses of this feature for each of the engagement dimensions are shown in Table 5. Significant relations were found for the linear models of the dimensions Aesthetic Appeal (AE) and Reward (RW), as well as for the overall User Engagement Score (UES). The statistical evaluation also confirms a quadratic relation for the dimensions Aesthetic Appeal (AE) and Reward (RW); although the R² values slightly favor the quadratic models, visual inspection shows little improvement when quadratic terms are added.
Table 5. Lighting Key in relation to user engagement scale (short form) for the linear and quadratic model.

4.3.2. Color Variance

The video feature Color Variance shows the variance of colors within a video, and its relationship with consumer engagement is shown in Table 6. We confirmed a quadratic relation of the feature in most dimensions, as well as in the overall UES score. The strongest confirmed relation was identified for Focused Attention (FA) and Aesthetic Appeal (AE), followed by Reward (RW). No relation between this feature and Perceived Usability (PU) was observed. In all confirmed cases, a U-shaped relation was identified, indicating that either high or low values of Color Variance increase the overall engagement score and the scores in the reported dimensions of FA, AE, and RW.
Table 6. Color Variance relation to user engagement scale (short form) for linear and quadratic model.

4.3.3. Average Shot Length

A characteristic feature of video editing is the average duration of shots, i.e., continuous frame sequences between two cuts. The duration of the shots is related to the dynamics of the video shooting and cutting. The statistics of this feature are extremely uneven in the case of our video selection because a small number of videos were shot using the one-shot cinema technique and therefore consist of a single long shot, which results in extreme values of the feature. The regression and scatter plots were therefore not optimally suited for visual inspection. The regression analyses of this feature for each of the engagement dimensions are shown in Table 7. No significant relation between Average Shot Length alone and any of the engagement dimensions could be identified from the visualisations, which matches the high reported p-values.
Table 7. Shot Length relation to user engagement scale (short form) for linear and quadratic model.

4.3.4. Motion Mean and Motion Standard Deviation

The degree of motion differs significantly between certain video categories, such as action movies, sports, relaxing movies, and dramas. As explained in Section 2.2.2, we used two different features related to the average motion activity and its variability. The linear and quadratic model fits of the engagement dimensions to mean motion computed using optical flow are summarized in Table 8. Table 9 refers to the linear and quadratic models of the standard deviation of motion. The p-values of the quadratic models of both the mean and the standard deviation of motion were below α = 0.05 for the sub-dimension Perceived Usability (PU). As an example, Figure 3c demonstrates a U-shaped relation between the Standard Deviation of Motion and the Perceived Usability (PU) dimension. The same behaviour was identified for Motion Mean. This behaviour indicates optimum values of motion that lead to the best Perceived Usability (PU) sub-score, implying that very slow or very fast motion in advertising has a negative impact on this dimension of engagement.
Table 8. Mean motion in relation to user engagement scale (short form) for a linear and quadratic regression model.
Table 9. Standard deviation of motion in relation to user engagement scale (short form) for a linear and quadratic regression model.

4.3.5. Feature Contributions and Model Shapes

Overall, a series of 25 combinations of video features and engagement dimensions, including the total score, was tested. Each of the five low-level video features was tested against the four engagement dimensions and the total engagement score (see Table 5, Table 6, Table 7, Table 8 and Table 9), using both linear and quadratic models. Based on the statistical significance of the coefficients of determination R² (null hypothesis H0: R² = 0), the low-level video features were identified as significant in two cases for the linear models and in six cases for the quadratic models.
As indicated in Section 4.2, the polynomial models were restricted to quadratic order. Only the models that were significant in terms of the coefficient of determination were considered, despite the fact that also some other plots exhibit convincing model shapes.
The observed shapes are presented in Table 10. To summarize the relationships between the video features and the predicted engagement dimensions: the relations of the Color Variance video feature with the engagement sub-dimensions Focused Attention (FA), Aesthetic Appeal (AE), and Reward (RW), as well as with the overall User Engagement Score (UES), are U-shaped. The same holds for the relations of the video features Motion Mean and Motion Standard Deviation with Perceived Usability (PU), and for the video feature Shot Length with Aesthetic Appeal (AE). Apart from the decreasing linear relations of Lighting Key, we identified no case of a prevailing linear model shape. Typical representations of these shapes are shown in Figure 3. The interpretation of these observations remains tentative because very little theoretical knowledge is available from existing studies.
Table 10. Model shapes of four O’Brien UES user engagement scale dimensions estimated from 5 low-level ad video features. The foreseen possible model shapes are labelled with symbols ∪ representing U-shape, ∩ indicating upside-down U shape, ↗ showing increasing and ↘ for decreasing behaviour.
In general, the tests yielded low R² values, suggesting that the models explain little of the variability in the UES-SF dimensions. These results were expected and can be seen visually in the widely dispersed score values shown in Figure 3 and in the other scatter plots under evaluation. Further discussion of the unexplained variability in the models is provided in Section 4.4.
Regarding the multiple-testing problem [34], the expected number of false positives is 25 × 0.05 = 1.25, which is one or possibly two false positives.

4.4. Unexplained Variability of Consumer Ad Engagement

Based on the R² values, the models presented explain only a small portion of the variability (up to 10%) in the consumer ad engagement scores collected with the UES-SF. Figure 3 shows the scatter plots with vertical clusters bound to the feature values of each video and plotted on the x-axis.
This unexplained variability is not uncommon, especially in human psychology studies [14]. Moreover, this was expected in the present study because of the nature and complexity of the engagement construct and the lack of experimental data from related work. Consequently, the proposed models based on low-level video features alone cannot achieve a higher value of explained variance (R²).
To put the reported R² values into context, the question is how the contribution of the evaluated models compares with the maximum achievable value R²_m for a given feature. To determine the maximum achievable R²_m for an ideal model based on video features, the ad engagement scores were evaluated using categorical regression on the ad videos themselves. Categorical data from the 30 videos were quantified by assigning numerical values to their respective categories so that the model yields the best possible linear regression fit. The mean value of consumer ad engagement was used as a separate transformation of the categorical video value for each engagement dimension.
An example of a scatter plot and the corresponding linear regression plot for overall User Engagement Score (UES) is shown in Figure 4. A high level of variability in consumer engagement can be observed for each video.
Figure 4. Scatter plot and corresponding linear regression plot for the optimal video feature against the overall User Engagement Score (UES). Note that the x-axis (abscissa) relates to categorical variable of Video ID; the assigned numerical values are equal to corresponding mean engagement scores.
A summary of the maximum achievable R² values for the categorical regression based on the video advertisements themselves can be found in Table 11, where the data column contains the estimated R²_m values of the ideal model. The relatively low values agree with our initial assumption (see Equation (1)) that factors other than the content (video) itself affect consumer engagement.
Table 11. Maximum R² values for categorical regression, quantified by assigning idealized numerical values for each engagement dimension.
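Because the categorical predictor is quantified by the per-video mean score, the resulting R²_m reduces to the between-video share of the total sum of squares. A sketch of this ceiling computation, under that interpretation of the procedure described above:

```python
import numpy as np

def max_achievable_r2(scores_by_video):
    """Ceiling R^2 for any model that uses only the video identity:
    encode each video by its raters' mean score and regress individual
    scores on that value (equivalently, the between-video sum of squares
    over the total sum of squares)."""
    all_scores = np.concatenate([np.asarray(s, float) for s in scores_by_video])
    grand_mean = all_scores.mean()
    ss_between = sum(len(s) * (np.mean(s) - grand_mean) ** 2
                     for s in scores_by_video)
    ss_total = np.sum((all_scores - grand_mean) ** 2)
    return float(ss_between / ss_total)
```

The gap between this ceiling and 1.0 is the rater-level variability that no video-based feature can explain, which is the point made by Equation (1).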

5. Discussion

The article presents a novel approach to modeling the influence of automatically derived low-level video features on short-term ad engagement. Most existing approaches for modeling consumer ad engagement are based on video metadata and use high-level content features (e.g., genre, director, cast, tags, text ratings, and similar metadata). Such metadata are often scarce and may not capture the impact of visual video features on consumers. The implicit metadata available through the computational analysis of low-level video features might better capture some of the ad’s visual effects on consumers. Moreover, in most cases, the existing approaches lack a referential ground truth for consumer-perceived engagement.
To this end, UES-SF scores on the four dimensions of consumer engagement were collected to provide the ground truth. Low-level video features were then derived and used to model the UES-SF scores. The results show that several features can be used to estimate consumer engagement, but each explains only a small portion of the variability for the respective UES-SF dimensions.
For most of the low-level video features studied, we can identify their relationships with the four dimensions of UES-SF. In the existing literature, we found neither linear nor more complex models of user engagement scores based on low-level video descriptors. However, we found some evidence of U-shaped model behavior related to human behaviour [35,36]. Indeed, linear models explained a statistically significant portion of the variability in only a few cases in our study. Given the tentative U-shaped model options, we opted for quadratic models and identified several significant model relations.
The upper limit of the variability explained by the video contribution alone is low for any of the models, ranging from 25% to 50% (see Table 11). This behavior, also indicated by the relatively small R 2 values, was expected, as the personality and context factors influencing user engagement were not part of this study. Consumer ad engagement is a complex construct that cannot be captured solely by the visual features of an ad; it is influenced by many factors, including context as well as consumer perceptions and attitudes such as personality, brand likability, familiarity, and recall; see Equation (1).
There are several limitations to the presented study. First, as a general limitation, the study addresses only short-term engagement and does not address possible changes or patterns of ad engagement over time. Second, the study participants represent a specific group of consumers within a narrow age range and in a specific location (the United States). This group may have specific consumer behavior and usage patterns, as well as social and/or cultural characteristics that cannot be generalized to other consumer groups or locations. In addition, only personal computer devices were used in this study, while usage patterns and consumer behavior in general may be different on mobile devices such as tablets and smartphones. Another potential limitation is the choice of UES-SF to collect ground truth data. UES-SF may be too general to provide more detailed insights into consumer engagement, such as brand awareness and recall, which are important factors in advertising. This potential problem needs to be explored in the future, as the current state of the art does not provide comparable psychometric instruments for measuring consumer engagement in in-video ads.

6. Conclusions

The presented approach is robust and scalable to other domains where video features could be used to model engagement. It is also novel compared to the existing literature. While most existing approaches model consumer engagement based on some descriptive metadata (such as views, like/dislike ratio, time spent watching, sentiment, etc.), the presented approach automatically models consumer engagement from the derived low-level video features of an ad. Moreover, the proposed estimation of ad engagement is based on the ground truth of perceived engagement measured by the established psychometric instrument, the UES-SF. To this end, the presented estimates of engagement and video feature importances are perceptually grounded.
The presented observational study is part of a broader and more complex investigation of consumer behavior related to in-video advertising. Overall, the results presented are encouraging. However, further studies with a larger number of materials and participants will be conducted to further evaluate the presented findings.
In the future, we will also investigate how varying user-related factors (e.g., mood, fatigue) and non-varying factors (e.g., personality) can improve ad engagement estimates. To this end, our future work will take a more holistic approach, incorporating context and content metadata to build a more complete model of user engagement.

Author Contributions

Conceptualization, E.A.O. and A.K.; Data curation, E.A.O. and U.B.; Formal analysis, G.S. and A.K.; Funding acquisition, A.K.; Investigation, G.S. and U.B.; Methodology, G.S., A.K. and U.B.; Project administration, A.K.; Resources, E.A.O. and U.B.; Software, G.S., A.K. and U.B.; Supervision, A.K.; Validation, G.S., A.K. and U.B.; Visualization, U.B.; Writing—original draft, U.B.; Writing—review and editing, E.A.O., G.S. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the projects MMEE—Multimedia Exposure Estimation and P2-0246 ICT4QoL—Information and Communications Technologies for Quality of Life.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board. All procedures performed were in accordance with the ethical standards of the institutional research committee.

Data Availability Statement

A collection of scripts and the user engagement data used in this research is available at https://doi.org/10.5281/zenodo.7341316, accessed on 20 November 2022. For copyright reasons, the video advertisements are available only in the form of YouTube links. Readers may refer to the description files on GitHub for more information.

Acknowledgments

The authors would like to thank The Nielsen Company (US), LLC, https://www.nielsen.com/, accessed on 20 November 2022, for marketing expert cooperation and valuable opinions regarding the target audience and the different contexts of potential consumer engagement.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UES    User Engagement Scale
UES-SF    User Engagement Scale-Short Form [5]
FA    Focused Attention
AE    Aesthetic Appeal
PU    Perceived Usability
RW    Reward

References

  1. Danaher, P.J. Advertising Effectiveness and Media Exposure. In Handbook of Marketing Decision Models; Wierenga, B., van der Lans, R., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 463–481. [Google Scholar] [CrossRef]
  2. Calder, B.J.; Isaac, M.S.; Malthouse, E.C. How to Capture Consumer Experiences: A Context-Specific Approach To Measuring Engagement. J. Advert. Res. 2016, 56, 39–52. [Google Scholar] [CrossRef]
  3. Dessart, L.; Veloutsou, C.; Morgan-Thomas, A. Capturing consumer engagement: Duality, dimensionality and measurement. J. Mark. Manag. 2016, 32, 399–426. [Google Scholar] [CrossRef]
  4. Araujo, T.; Copulsky, J.R.; Hayes, J.L.; Kim, S.J.; Srivastava, J. From Purchasing Exposure to Fostering Engagement: Brand–Consumer Experiences in the Emerging Computational Advertising Landscape. J. Advert. 2020, 49, 428–445. [Google Scholar] [CrossRef]
  5. O’Brien, H.L.; Cairns, P.; Hall, M. A practical approach to measuring user engagement with the refined user engagement scale (UES) and new UES short form. Int. J. Hum.-Comput. Stud. 2018, 112, 28–39. [Google Scholar] [CrossRef]
  6. Che, X.; Ip, B.; Lin, L. A Survey of Current YouTube Video Characteristics. IEEE Multimed. 2015, 22, 56–63. [Google Scholar] [CrossRef]
  7. Nijholt, A.; Vinciarelli, A. Measuring engagement: Affective and social cues in interactive media. In Proceedings of the 8th International Conference on Methods and Techniques in Behavioral Research, Measuring Behavior, Utrecht, The Netherlands, 28–31 August 2012; Volume 63, pp. 72–74. [Google Scholar]
  8. Hollebeek, L.; Glynn, M.; Brodie, R. Consumer Brand Engagement in Social Media: Conceptualization, Scale Development and Validation. J. Interact. Mark. 2014, 28, 149–165. [Google Scholar] [CrossRef]
  9. Shen, W.; Bai, H.; Ball, L.J.; Yuan, Y.; Wang, M. What makes creative advertisements memorable? The role of insight. Psychol. Res. 2020, 85, 2538–2552. [Google Scholar] [CrossRef]
  10. Niederdeppe, J. Meeting the Challenge of Measuring Communication Exposure in the Digital Age. Commun. Methods Meas. 2016, 10, 170–172. [Google Scholar] [CrossRef]
  11. De Vreese, C.H.; Neijens, P. Measuring Media Exposure in a Changing Communications Environment. Commun. Methods Meas. 2016, 10, 69–80. [Google Scholar] [CrossRef]
  12. Gambetti, R.C.; Graffigna, G.; Biraghi, S. The Grounded Theory Approach to Consumer-brand Engagement: The Practitioner’s Standpoint. Int. J. Mark. Res. 2012, 54, 659–687. [Google Scholar] [CrossRef]
  13. Anderson, A.; Hsiao, T.; Metsis, V. Classification of Emotional Arousal During Multimedia Exposure. In Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments, Rhodes, Greece, 21–23 June 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 181–184. [Google Scholar] [CrossRef]
  14. Hoyt, W.T. Rater bias in psychological research: When is it a problem and what can we do about it? Psychol. Methods 2000, 5, 64. [Google Scholar] [CrossRef] [PubMed]
  15. Chaturvedi, I.; Thapa, K.; Cavallari, S.; Cambria, E.; Welsch, R.E. Predicting video engagement using heterogeneous DeepWalk. Neurocomputing 2021, 465, 228–237. [Google Scholar] [CrossRef]
  16. Bulathwela, S.; Pérez-Ortiz, M.; Lipani, A.; Yilmaz, E.; Shawe-Taylor, J. Predicting engagement in video lectures. arXiv 2020, arXiv:2006.00592. [Google Scholar]
  17. Stappen, L.; Baird, A.; Lienhart, M.; Bätz, A.; Schuller, B. An Estimation of Online Video User Engagement From Features of Time- and Value-Continuous, Dimensional Emotions. Front. Comput. Sci. 2022, 4, 773154. [Google Scholar] [CrossRef]
  18. Wu, S.; Rizoiu, M.A.; Xie, L. Beyond Views: Measuring and Predicting Engagement in Online Videos. In Proceedings of the International AAAI Conference on Web and Social Media, Palo Alto, CA, USA, 25–28 June 2018; Volume 12. [Google Scholar]
  19. Lops, P.; de Gemmis, M.; Semeraro, G. Content-based Recommender Systems: State of the Art and Trends. In Recommender Systems Handbook; Springer: New York, NY, USA, 2010; pp. 73–105. [Google Scholar] [CrossRef]
  20. ISO/IEC TR 15938; International Organization for Standardization: Geneva, Switzerland, 2002.
  21. Martinez, J.M.; Koenen, R.; Pereira, F. MPEG-7: The generic multimedia content description standard, part 1. IEEE Multimed. 2002, 9, 78–87. [Google Scholar] [CrossRef]
  22. Baştan, M.; Çam, H.; Güdükbay, U.; Ulusoy, Ö. BilVideo-7: An MPEG-7-Compatible Video Indexing and Retrieval System. IEEE Multimed. 2009, 17, 62–73. [Google Scholar] [CrossRef]
  23. Eidenberger, H. Statistical analysis of content-based MPEG-7 descriptors for image retrieval. Multimed. Syst. 2004, 10, 84–97. [Google Scholar] [CrossRef]
  24. Zettl, H. Essentials of Applied Media Aesthetics. In Media Computing; Springer: New York, NY, USA, 2002; pp. 11–38. [Google Scholar] [CrossRef]
  25. Ricci, F.; Rokach, L.; Shapira, B. Introduction to Recommender Systems Handbook. In Recommender Systems Handbook; Springer: New York, NY, USA, 2010; pp. 1–35. [Google Scholar] [CrossRef]
  26. Deldjoo, Y.; Elahi, M.; Cremonesi, P.; Garzotto, F.; Piazzolla, P.; Quadrana, M. Content-Based Video Recommendation System Based on Stylistic Visual Features. J. Data Semant. 2016, 5, 99–113. [Google Scholar] [CrossRef]
  27. Deldjoo, Y.; Elahi, M.; Quadrana, M.; Cremonesi, P. Using visual features based on MPEG-7 and deep learning for movie recommendation. Int. J. Multimed. Inf. Retr. 2018, 7, 207–219. [Google Scholar] [CrossRef]
  28. Rasheed, Z.; Sheikh, Y.; Shah, M. On the use of computable features for film classification. IEEE Trans. Circuits Syst. Video Technol. 2005, 15, 52–64. [Google Scholar] [CrossRef]
  29. Kobla, V.; Doermann, D.; Faloutsos, C. Video Trails: Representing and Visualizing Structure in Video Sequences. In Proceedings of the Fifth ACM International Conference on Multimedia—MULTIMEDIA ’97, Seattle, WA, USA, 9–13 November 1997. [Google Scholar] [CrossRef]
  30. Horn, B.K.; Schunck, B.G. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef]
  31. Vasconcelos, N.; Lippman, A. Statistical models of video structure for content analysis and characterization. IEEE Trans. Image Process. 2000, 9, 3–19. [Google Scholar] [CrossRef] [PubMed]
  32. Home-PySceneDetect. Available online: https://pyscenedetect.readthedocs.io/en/latest/ (accessed on 18 March 2022).
  33. Cho, E. Making Reliability Reliable: A Systematic Approach to Reliability Coefficients. Organ. Res. Methods 2016, 19, 651–682. [Google Scholar] [CrossRef]
  34. Miller, R.G. Simultaneous Statistical Inference; Springer: New York, NY, USA; Heidelberger/Berlin, Germany, 1981. [Google Scholar]
  35. Northoff, G.; Tumati, S. Average is good, extremes are bad – Nonlinear inverted U-shaped relationship between neural mechanisms and functionality of mental features. Neurosci. Biobehav. Rev. 2019, 104, 11–25. [Google Scholar] [CrossRef]
  36. Van Steenbergen, H.; Band, G.P.H.; Hommel, B. Does conflict help or hurt cognitive control? Initial evidence for an inverted U-shape relationship between perceived task difficulty and conflict adaptation. Front. Psychol. 2015, 6, 00974. [Google Scholar] [CrossRef]
  37. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap. Monogr. Stat. Appl. Probab. 1993, 57, 1–436. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
