Deep Learning for EEG-Based Preference Classification in Neuromarketing

Featured Application: This article presents an application of deep learning in preference detection performed using EEG-based BCI.
Abstract: Traditional marketing methodologies (e.g., television commercials and newspaper advertisements) may be unsuccessful at selling products because they do not robustly stimulate consumers to purchase a particular product. Such conventional marketing methods attempt to determine the attitude of consumers toward a product, which may not represent their real behavior at the point of purchase. Marketers are therefore likely to misunderstand consumer behavior, because the predicted attitude does not always reflect real purchasing behavior. This research study was aimed at bridging the gap between traditional market research, which relies on explicit consumer responses, and neuromarketing research, which reflects implicit consumer responses. The EEG-based preference recognition in neuromarketing was extensively reviewed. Another gap in neuromarketing research is the lack of extensive data-mining approaches for the prediction and classification of consumer preferences. Therefore, in this work, a deep-learning approach is adopted to detect consumer preferences by using EEG signals from the DEAP dataset and considering the power spectral density and valence features. The results demonstrated that, although the proposed deep-learning approach exhibits a higher accuracy, recall, and precision compared with the k-nearest neighbor and support vector machine algorithms, random forest reaches similar results to deep learning on the same dataset. 2367 unique features illustrating the EEG activity in each trial. We used different evaluation measures (accuracy, recall, and precision) and various validation methods (holdout, LOOCV, and k-fold cross validation) to test the classifiers' performance.
We built four different classifiers, namely the DNN, RF, SVM, and KNN classifiers, which achieved an accuracy of 94%, 92%, 62%, and 88%, respectively. The results demonstrate that, although the proposed DNN exhibits a higher accuracy, recall, and precision compared with the KNN and SVM, RF reaches similar results to DNN on the same dataset. Future research directions will involve exploring the DNNs in the context of transfer learning for preference detection.


Introduction
Neuromarketing is an emerging field that links the cognitive and affective sides of the consumer behavior by using neuroscience. Since its origin in 2002, this field has rapidly achieved credibility among the advertising and marketing specialists, and many such specialists are adopting neuromarketing strategies [1].
Neuromarketing can assist marketers in understanding how a consumer's brain evaluates the diverse brands and recognizing the factors that affect the consumers' choices when purchasing products. Neuromarketing research has demonstrated that people do not always recognize what happens in their unconscious brains. Furthermore, it has been demonstrated that people are not always explicit in their preferences or intentions [2].
The use of traditional marketing tools, such as interviews and questionnaires, to assess consumer preferences, needs, and buying intentions can lead to biased or incorrect conclusions [3,4]. Similarly, an oral expression of preferences can produce conscious or unconscious biases. It is difficult to extract the consumer preferences directly through choices, owing to high product costs, ethical considerations, or the product not having been invented at the time of evaluation [3]. These elements highlight a contradiction between the users' opinions during the usability assessments and their actual opinions, feelings, and senses regarding the use of a product [4].
Therefore, neuromarketing requires more effective methodological alternatives to evaluate the consumer behavior. Novel neuroimaging procedures provide an effective approach to study consumer behavior. Such methods ultimately help marketers examine the consumers' brains to obtain valuable insights into the subconscious processes underlying successful or failed marketing messages. This information is obtained by eliminating the primary problem in traditional advertising research, namely the need to trust people, whether they are consumers or workers, to report accurately on how the consumers are influenced by a specific part of an advertisement [1].
Brain-computer interfaces (BCIs) are promising neuroimaging tools in neuromarketing. This technology allows the users to communicate effectively with computer systems. A BCI does not require the use of any external devices or muscle interference to produce commands [5]. Furthermore, a BCI employs voluntarily generated user brain activity to control a system through signals, which provides the ability to communicate or interact with the nearby environment. Electroencephalography (EEG) is one of the main instruments used to examine brain activity. The EEG technique is the only practical, versatile, affordable, portable, non-invasive BCI to perform repetitive, real-time analysis of brain interactions in high temporal resolution [5][6][7].
Therefore, in the present research study, EEG was adopted as the input brain signal for a BCI system. Using classification algorithms, BCIs can be used as neural measures to distinguish the preference patterns from brain signals and translate them into actions to promote a product. In addition, a deep neural network (DNN) was implemented and its performance was examined on a benchmark dataset for the preference classification.
The main objective of this research was to deeply investigate EEG-based preference recognition in neuromarketing to enhance the accuracy of classification prediction by comparing the performance of deep-learning with other conventional classification algorithms, such as support vector machine (SVM), random forest (RF), and k-nearest neighbors (KNN).

Background
This section provides a review of the main concepts used in this research: neuromarketing, BCI, and EEG.

Neuromarketing
Traditional marketing approaches include surveys, interviews, questionnaires, and focus groups, in which consumers openly and consciously report their experiences and opinions. However, such traditional approaches cannot evaluate the unconscious side of the consumer behavior. Neuroscience has the potential to discern the unconscious motivations that influence the act of making choices. It has been reported that approximately 90% of data are processed subconsciously in the human mind [8]. In the field of neuromarketing and consumer neuroscience, the evaluation of the subconscious activities exposes the true preferences of consumers more accurately than traditional marketing research does. Furthermore, neuromarketing can reveal information regarding the consumer preferences/ratings that cannot be accurately determined through the traditional methods. This is because subconscious opinions play a key role in consumer decision-making. Traditional market research approaches fail to assess the subconscious activities in the consumer brain, which leads to a discrepancy between the results of the traditional market research and the real behavior of the consumers at the points of purchase [8].
The term "neuromarketing" is derived by combining the prefix "neuro" and the term "marketing", indicating the integration of two study areas: neuroscience and marketing [1]. Neuroscience is a field that examines the facets of the brain at the biological level and from a psychological perspective [2]. In addition, neuroscience has significantly enlightened the field of marketing, and the interaction between these fields assists in intuiting the consumer behavior [8].
The term "neuromarketing" began to emerge organically around 2002. At that time, a few corporations, such as Brighthouse and SalesBrain, began offering neuromarketing studies and consultations, motivating the application of technology and knowledge from cognitive neuroscience to the field of marketing. Neuromarketing values the study of the consumer behavior from a psychological perspective [1]. Recently, several high-profile companies have begun exploiting neuromarketing approaches to assess the advertisements before introducing products to consumers [9]. This neuromarketing approach has gradually gained favor with brand executives in major corporations, such as Coca-Cola and Campbell's [10].
Neuromarketing researchers aim to use neuroscientific procedures to examine consumer behavior (i.e., requests, needs, and preferences) when shoppers purchase goods. This aim is the researchers' primary motivation for examining the consumers' sensorimotor, mental, and affective feedback for products and advertisements through various modalities [9]. There are several neuromarketing modalities besides BCI, such as eye-tracking, galvanic skin response, skin conductance, facial coding, and facial electromyography. Each modality records a different measure [11]. Eye tracking is used to determine eye locations and eye movement to grasp the consumer's attention and natural responses to marketing stimuli. Galvanic skin response measures the skin moisture activity, which is related to the emotional state. Electromyography is used to evaluate the physiological features of facial muscles. Facial coding measures emotional states through facial expressions.

BCIs
BCIs are some of the most promising neuroimaging technologies in the neuromarketing domain. This technology helps facilitate effective communication between the users and computer systems. BCIs do not require any nerves, muscles, or movement interferences to issue a command [5] and employ the voluntarily-generated user brain activity to control a system through signals to communicate or interact with the nearby environment. Such environments can include wheelchairs, artificial arms/hands, and entertainment applications that involve skillful visualization, digital painting, and game playing [6].
BCI systems have contributed to numerous fields, including manufacturing, education, marketing, smart transportation, biomedical engineering, clinical neurology, and neuroscience [5,12]. A BCI system includes an input (i.e., the user's mental activity), an output (i.e., states or commands), a decoder component between the input and output, and a protocol that regulates the onset, offset, and timing of the action [6]. BCI research is expected to lead to an approach in which brain signals are used to help people interact with their surroundings [6].
In a BCI, the brain signals require processing in non-clinical situations, which corresponds to a new challenge in computational neuroscience research. Currently, most of the application-oriented BCI research is focused on giving users, not only disabled people, the ability to control systems or sensors in various environments [6].
Several neuroimaging techniques can serve as the input to a non-invasive BCI, such as EEG, fNIRS, fMRI, PET, MEG, SST, and TMS. EEG has better temporal resolution than fNIRS, which is a relatively new neuroimaging technique in neuromarketing research; fNIRS research is still undergoing substantial validation [13,14]. EEG is most commonly used in neuromarketing research due to its advantages, which are detailed in the next subsection.

EEG
The EEG is a widely used tool that examines the brain activity. The electrical activity is recorded on the scalp by evaluating the voltage variations from neurons firing in the brain. These electrical activities are logged over a period of time using several electrodes positioned on the scalp directly above the cortex. The electrodes are connected in a hat-like device [5,7]. The EEG has the following key benefits: it is non-invasive, portable, cost effective, and relatively simple to use, and it has an exceptional temporal resolution (up to milliseconds). However, the signal-to-noise ratio and spatial resolution are restricted compared with those of other techniques. Nevertheless, EEG is considered to be the only practical, non-invasive BCI input to realize a repetitive, real-time brain interactive analysis [5][6][7]. Therefore, EEG was selected as the input brain signal for the BCI in this research.
The international 10-20 system is a method used to name electrodes based on their location on the scalp. The name refers to the inter-electrode spacing: the distance between adjacent electrodes is either 10% or 20% of the total front-to-back or right-to-left distance across the scalp, as depicted in Figure 1. The 10-20 standard has been frequently used across diverse EEG systems to increase the dependability of the signals and improve the signal-to-noise ratio [5,7].

EEG Signals
The brain produces abundant neural activity, which can be captured as EEG signals for the BCI. These neural activities consist of two types: (1) rhythms; and (2) transient activities. The EEG activity can be further categorized on the basis of these types of activities [6,7].

Rhythms:
Rhythms, neural oscillations, or brainwaves are repetitive forms of neural activity. The rhythms are measures of collective synaptic, neuronal, and axonal activities of the neuronal sets. The EEG activity is characterized by separating the frequencies into bands, denoted as delta, theta, alpha, beta, gamma, and mu rhythms. Table 1 presents the details of the EEG rhythms, ranges of frequency, amplitude, and shape, as well as the brain regions in which these activities are the most common along with the events usually associated with the type of band [6,7].
These frequency bands have been linked to affective reactions. The theta band in the front-center of the brain reflects the emotional processing when a consumer looks at a product. The alpha band on the prefrontal cortex differentiates between the positive and negative emotional valences. The beta band is correlated with the alterations during affective arousal. Finally, the gamma band is largely associated with the arousal effects [15].
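The separation of EEG activity into frequency bands can be illustrated with a short sketch. The band edges below are commonly quoted approximate values and are an assumption here (Table 1 of the article is the authoritative source); the sampling rate, the synthetic signal, and the function names `band_powers` and `dominant_band` are hypothetical and chosen only for illustration.

```python
import cmath
import math

# Assumed band edges in Hz (approximate, hypothetical; see Table 1 for the article's ranges).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(signal, fs):
    """Sum the periodogram power of one EEG channel inside each band."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    powers = {name: 0.0 for name in BANDS}
    # Plain DFT over the positive-frequency bins (O(n^2), fine for a short demo).
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2 / n
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


def dominant_band(signal, fs):
    """Return the name of the band that carries the most power."""
    powers = band_powers(signal, fs)
    return max(powers, key=powers.get)


# Synthetic 1-second signal sampled at 128 Hz: a strong 10 Hz component
# (alpha range) plus a weaker 20 Hz component (beta range).
fs = 128
demo = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(fs)
]
print(dominant_band(demo, fs))
```

In practice a dedicated power spectral density estimator would replace the hand-written DFT; the sketch only shows how band power follows from the spectrum.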

Transient activities
The transient activities or field potentials replicate the action potentials of certain neurons in a manner similar to spikes. These spikes can be recognized by their position, frequency, amplitude, shape, recurrence, and operational properties. The event-related potentials (ERPs) and event-related spectral perturbations (ERSPs) are common types of transient activities [3,16].
ERP is the most common spike and arises as a reaction to a specific event or stimulus. These spikes have extremely small amplitudes. Consequently, the EEG samples must be averaged over many iterations to uncover the ERPs and eliminate noise fluctuations [16]. Table 2 presents the common ERPs used in the neuromarketing research. ERSPs compute the reaction to a stimulus over a period of time and are similar to the ERPs. However, the ERSPs split the EEG signals into the diverse frequency bands to test whether a variation exists in the power of a specified frequency band over time [3].

Table 2. Common ERP components used in neuromarketing studies.

P300
P300 is a positive potential that arises approximately 300 ms after the onset of a stimulus as a result of internal decision-making [16]. It shows the activity of attention in working memory as well as the adjustment to responses [17].

N200
N200 is a negative potential related to unfamiliarity, and reaches its peak between 200 and 350 ms after the onset of a stimulus [16].

LPC
The late positive component is a positive potential that arises approximately 400-800 ms after the onset of a stimulus associated with explicit recognition memory [18].

LPP
The late positive potential indicates enabled attention to emotional stimuli of either a positive or negative valence. This can be translated as neutral, pleasant, or unpleasant contextual stimuli.

N400
N400 is a negative potential related to oddness experiments such as brand names with products [10].

FN400
FN400 is located in the front-center of the brain, is related to familiarity, and arises approximately 300-500 ms after the onset of a stimulus. FN400 has a more negative potential for new words than for similar and familiar words [18].

FRN
Feedback-related negativity is a front-central negative potential related to a subject's choices. It arises in response to passively observed products 200-300 ms after the presentation of unfavorable versus favorable products [3].

PSW
Positive slow waves are correlated with sustained attention to visual emotional stimuli and can be detected long after the appearance of an emotional stimulus [10].
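Because ERP components such as those in Table 2 have very small amplitudes, they are uncovered by averaging many stimulus-locked trials. The following minimal sketch illustrates that averaging step on synthetic data. The trial generator, the peak position, the amplitude, and the noise level are all hypothetical values, not taken from any cited study.

```python
import random


def average_erp(trials):
    """Average time-locked trials sample by sample to estimate the ERP."""
    n_trials = len(trials)
    n_samples = len(trials[0])
    return [sum(trial[i] for trial in trials) / n_trials for i in range(n_samples)]


def make_trial(rng, n_samples=100, peak_index=30, peak_amplitude=2.0, noise_sd=3.0):
    """Hypothetical single trial: a small triangular peak buried in Gaussian noise."""
    trial = []
    for i in range(n_samples):
        evoked = max(0.0, peak_amplitude * (1 - abs(i - peak_index) / 10))
        trial.append(evoked + rng.gauss(0.0, noise_sd))
    return trial


rng = random.Random(0)  # fixed seed so the demo is repeatable
trials = [make_trial(rng) for _ in range(900)]

erp = average_erp(trials)
peak_at = max(range(len(erp)), key=lambda i: erp[i])
print("estimated peak sample:", peak_at)
```

Averaging N trials shrinks the standard deviation of independent noise by a factor of sqrt(N), so with 900 trials the noise in this sketch drops from 3.0 to roughly 0.1 and the evoked peak becomes visible.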

Literature Review
This section details EEG-based preference recognition, specifically, the neural correlation of the preference, predictive features of the preference, and preference classification algorithms.
Preference can be defined as a human attitude toward a collection of entities that can be mirrored in an explicit decision-making procedure. This aspect can also be an evaluative judgment in the sense of liking or disliking an object [19]. The possibility of measuring the conscious and unconscious brain activity when assessing advertisements, by examining the consumer's processing of the advertisement message, cognitive workload, and emotional state, cannot be disregarded. The idea of a 'buy button' in the brain may be exaggerated; however, the research efforts to utilize the neural measures in monitoring the consumer thought processes are not trivial [20]. Understanding the neural process behind the preference, feelings, and decision-making can enhance the prediction of the user preferences and choices, and neuromarketing provides a precise objective determination of the implicit preferences of the consumers [21].
Several studies [10,22-24] have shown that the EEG can be used to determine the consumer preferences. To better utilize the EEG in consumer neuroscience research, the psychological processes underlying the consumer preferences must be understood.
In the following subsection, we describe the neural correlations of the EEG-based preference. Next, we classify the relevant studies into: (1) predictive features; and (2) classification algorithms of the preference recognition. Finally, we explain how the preferences can be detected using BCI.

Neural Correlations of the Preference
This subsection explains the neural elements correlated with the preferences. Certain areas of the brain are responsible for various cognitive and mental functions. To determine the positions of EEG electrodes, the underlying brain regions that are responsible for preference processing must be understood. Studies have demonstrated that the preference is linked to the frontal brain regions, specifically, the medial prefrontal cortex, nucleus accumbens [19,25], and medial orbitofrontal cortex [19,26].
Knutson et al. [25] linked the choice prediction of the products to the nucleus accumbens. When a consumer views the product, a higher activation of this region indicates a higher probability of the consumer purchasing that product. Furthermore, Kirk et al. [26] demonstrated the relationship between the contextual preference and the medial orbitofrontal cortex; a higher activation in this region is related to a higher level of preference.
Recording the neural activity correlated with a certain function requires placing the electrodes directly above the corresponding brain area. Figure 2 shows the main electrode positions and the associated neural activity according to the 10-20 system [27]. Although many researchers [24][25][26] have shown that the medial-frontal cortex is responsible for the preference function, no consensus exists on which electrodes should be used within the same brain area. Table 3 summarizes the electrode positions used in the preference recognition research.
Vecchiato et al. [28] found that asymmetrical increments of the theta and alpha bands are linked to watching pleasant (unpleasant) advertisements, as noted in the left (right) brain areas at electrodes Fp1 (Fp2), AF7 (AF8), F7 (F8), and F1 (F2). The spectral power for alpha bands increases noticeably for liked advertisements at electrode F1 and for the disliked advertisements at electrodes AF8 and AF4. In the theta band, increased activity occurs at electrodes F2, AF8, and F3 for the disliked advertisements and at Fp1 for the liked advertisements.
Touchette et al. [29] found that the frontal asymmetry in the alpha band is linked to the consumers' unconscious reactions to the product attractiveness at electrodes F3 and F4. Vecchiato et al. [28] found that the asymmetrical frontal activity is statistically significantly positive in the alpha and theta bands between F1 and F2. In addition, this activity is significantly negative in the theta band between Fp2 and Fp1, AF8 and AF7, and F8 and F7.

Table 3. Common rhythms/ERP and electrode positions used for the preference detection in neuromarketing.

EEG Indices
Based on our literature review, we identified four autonomic EEG indices that have been utilized for evaluating people's reactions to marketing stimuli: (1) the approach-withdrawal (AW) motivation index; (2) effort index; (3) choice index; and (4) valence. Such indices assist marketers in understanding customer responses to products [30,31].

AW Index
The AW index is also known as the frontal alpha asymmetry, which indicates motivation, desire, or approach avoidance. The frontal asymmetry theory, which was initiated in 1985, states that the frontal regions of the left and right hemispheres are responsible for positive feelings (approach motivation) and negative feelings (withdrawal motivation) [29], respectively. This index can be defined as the difference between the two hemispheres in the prefrontal alpha band, that is, the relative engagement of the frontal left hemisphere compared with the right one. Positive AW values correspond to positive motivation (approach behaviors), expressed in terms of the higher activation of the left frontal cortex. In contrast, negative AW values correspond to negative motivation (avoidance behaviors), expressed in terms of the higher activation of the right frontal cortex [29][30][31][32].
Numerous researchers have demonstrated the reliability and dependability of the frontal alpha asymmetry as an effective marker in the emotion and neuromarketing research [29][30][31][32][33][34]. Touchette [29] calculated the frontal alpha asymmetry scores by considering the difference between the right and left power spectral densities divided by their sum, as obtained using electrodes F4 and F3.
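The asymmetry score described above, the difference between the right and left alpha power divided by their sum, can be sketched as follows. The function name, argument names, and input values are hypothetical; the interpretation in the comments assumes the usual reading that alpha power is inversely related to cortical activation, which is an assumption of this sketch and is not stated in this section.

```python
def frontal_alpha_asymmetry(alpha_power_f4, alpha_power_f3):
    """Return (right - left) / (right + left) for alpha-band power.

    F4 is the right-frontal electrode and F3 the left-frontal electrode.
    Under the assumption that more alpha power means less cortical activation,
    a positive score points to relatively greater left-frontal activation.
    """
    total = alpha_power_f4 + alpha_power_f3
    if total <= 0:
        raise ValueError("alpha power values must be positive")
    return (alpha_power_f4 - alpha_power_f3) / total


# Hypothetical alpha-band power values (arbitrary units).
score = frontal_alpha_asymmetry(alpha_power_f4=6.0, alpha_power_f3=4.0)
print(round(score, 3))
```

Dividing by the sum normalizes the score to the range from -1 to 1, which makes values comparable across participants whose overall alpha power differs.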

Effort Index
The effort index is defined as the frontal theta activity in the prefrontal cortex. A higher theta power in the frontal region has been linked to higher levels of task difficulty and complexity. This index acts as a sign of cognitive processing that results from mental fatigue [33], and it has been investigated extensively in neuromarketing research [3,24,28,33,35]. This factor demonstrates the importance of positive and negative emotional processing for the creation of the steady memory traces during advertising [30].

Choice Index
The choice index is defined in terms of the frontal asymmetric gamma and beta oscillations, which are mostly linked to the real decision-making stage. It is also the element most closely related to willingness-to-pay responses, especially in the gamma band, for evaluating consumer preference and choice. Higher values of the gamma and beta bands indicate a stronger activation of the left prefrontal region, and lower values are linked to a relatively stronger activation of the right region [32]. Ramsoy et al. calculated the choice index for each band individually (gamma and beta) using electrodes AF3 and AF4 according to Equation (2):

Valence
Frontal asymmetry has been linked to preferences expressed as valence (i.e., the direction of a customer's emotional state). Left and right frontal activation is related to positive and negative valence, respectively. Numerous studies have supported the hypothesis that the frontal EEG asymmetry is an indicator of valence [34].

Predictive Features for the Preferences
This section reports on the studies that focused on the predictive features of neuroscience methods that can aid marketers in forecasting consumer preferences, as described in Table 4. Most of these studies employed variants of standard regression analyses in their prediction models. We classified these predictive features based on the EEG signal types: (1) rhythms; and (2) transient activities.

Year | Ref. | Stimulus | Signal | Features | Modalities
- | [11] | Watching TV ads | EEG | Alpha (for emotional state) | EMG, BCI and GSR
2010 | [36] | Viewing brands and images | ERP | N200 and P300 | BCI
2010 | [37] | Watching TV ads | EEG | Alpha band frontal asymmetry | BCI
2010 | [35] | Watching TV ads | EEG | Theta and gamma | Heart rate, BCI and GSR
2011 | [28] | Watching TV ads | EEG | Asymmetrical increase in theta and alpha in PSD | BCI
2012 | [38] | Viewing brand names | ERP | N400 | BCI and EOG for eye movement
2012 | [18] | Viewing products and prices | ERP | FN400, LPC, and P200 | BCI
2013 | [39] | Viewing products | EEG | Alpha, beta, theta, gamma, and delta | BCI, Eye tracking
2014 | [40] | Viewing products | ERP | P300 | BCI
2014 | [41] | Watching TV ads | EEG | Theta and alpha | Heart rate, BCI and GSR
2015 | [3] | Viewing products | ERSP and ERP | Theta, N200, and FRN | BCI
2015 | [24] | Watching ads (movie trailers) | EEG (64 electrodes) | Beta and gamma oscillations | BCI and EOG for eye movement (2 electrodes)
2016 | [20] | Watching TV ads | dense-array EEG | Three epochs: 200-350, 350-500, and 500-800 | BCI
2016 | [42] | Viewing brand names | ERP | LPP | BCI
2017 | [43] | Viewing product images | ERP | N200, LPP, and PSW | BCI
2017 | [30] | Watching ad videos | EEG | Theta and alpha | Heart rate, BCI and GSR
2017 | [22] | Viewing product images | EEG | Delta, theta, alpha, beta, and gamma | BCI
2017 | [29] | Viewing product images | EEG | Alpha | BCI
2018 | [44] | Viewing ads of food products | EEG | Delta, theta, alpha, beta, and gamma | BCI
2018 | [32] | Viewing products and prices | EEG | Theta | BCI
2018 | [31] | Tasting drinks | EEG | Alpha | BCI
2018 | [33] | Viewing and touching products | EEG | Alpha and theta | BCI
2019 | [27] | Viewing product images | ERSP, ERP | Theta, beta and N200 | BCI
2019 | [45] | Viewing tourism images, videos and words | EEG | Delta, theta, alpha, beta and gamma | BCI and GSR

Rhythms as Features
The beta and gamma oscillations from consumers who watched movie trailers were utilized to predict the box office sales and recall [24]. These factors were also used as an indicator of the willingness-to-pay to evaluate consumer preference and choice [32].
The alpha oscillations were used to compute the neural likeness and forecast recall and ticket sales. High-frequency EEG components were connected to both the individual preference (beta wave) and population preference (gamma wave) [24].
In addition, alpha frontal asymmetry was linked to the consumers' unconscious reactions to the product attractiveness [29]. Similarly, Modica et al. [33] linked the higher alpha frequencies to comfort food as well as foreign food products. Moreover, awarded campaigns (i.e., the campaigns that received prizes) in anti-smoking public service announcements were linked to higher alpha frequencies [30].
Lower theta frequencies were associated with the negative results toward choosing products [3]. Moreover, these frequencies have been linked to effective anti-smoking public service announcements [30] and foreign products compared with local products [33].

Transient Activities as Features
In the cognitive processes associated with preferences, several research studies considered the ERP components N400, N200, and P300, each of which can be described as follows.

N400
Some researchers [10] found that the N400 component can reflect familiarity in forecasting hits in brand extension. A powerful association with well-known brand names was replicated in the case of larger N400 amplitudes, foreseeing greater consumer preference. Another brand extension study [38] reported that the N400 component is associated with the unconscious conceptual categorization of products and brands, albeit not with conscious assessments.

N200
Some other researchers [36] observed that the N200 amplitude exposed a relationship between the emotional state and brand extension categories. This relationship appeared only with negative emotions and moderate brand extensions. A second study [43] suggested that N200 could indicate the product preferences, as determined by spontaneous procedures, whereas the LPP and PSW could indicate the product preferences, as determined by the conscious cognitive procedures. In a third study, the Cerebro system [27] combined the N200 mean, N200 minima, and ERSP to rank products according to customer preferences. Similarly, in a fourth study [3], the researchers used two methodologies, ERSP and ERP, to forecast the product preferences by examining theta brainwaves, N200, and FRN.

P300
The consumer preferences for the expanded brand labels were clarified using greater P300 amplitudes [17]. The authors of [40] used P300 as a measure of the consumer preferences for certain product features.
Other researchers [18,39] have considered factors that influence the purchasing decisions. The authors of [18] investigated the roles of mathematical ability, gender, pricing, and discount promotions in the process of consumer purchasing using the active BCI. The authors correlated the 'buy' decisions with ERP components, such as P200 and P300. To understand the product preferences, the authors also evaluated the relative importance (mutual information) of the diverse product (i.e., cracker) characteristics involved in the decision-making process by evaluating the cognitive processing by using the EEG alpha, beta, and theta brainwaves [39]. The researchers used eye tracking for choosing the preferred product. Michael et al. [45] used the same approach (EEG with eye tracking) to investigate the emotional reactions of tourism preferences by using different stimuli (words, images, and video). The authors observed that the images had higher affective responses than those of words in travel decision-making driven by the unconscious preference.
The authors of [22] built a predictive model for consumer product choice from the EEG data. The researchers studied the roles of gender and age in the process of consumer preferences in terms of liking/disliking by using a passive BCI. Another research study [20] involved the use of an inductive research method to evaluate three successful and three unsuccessful advertisements by using dense-array EEG data. The results suggest that statistically significant ERP differences existed between the successful and unsuccessful advertisements.

Preference Classification Algorithms
Although considerable progress has been made in connecting the brain activities to the user choice, indications that neural assessment could genuinely be beneficial for forecasting the success of marketing activities remain limited [24]. The neural assessments can significantly increase the predictive power above and beyond that of the traditional assessments. Because the neural assessments are better predictors than self-reported assessments, the capability of neuroscience methods to forecast the preferences in real-world situations has incredible consequences for marketers. The first study to address this was published in 2007, and it was concluded that the pre-decisional activation in the related brain areas could be used to forecast the consequent choices [46]. Since then, many neuromarketing studies have published similar conclusions.
In recent years, it has become common practice to use multivariate methods, such as pattern classification, to predict choices. For example, a classification approach can be used to predict the out-of-sample choices from "non-choice" neural responses to different products. The resulting models, which are founded on basic neuroscience methods, are more reliable for predicting new states and settings than traditional market methods, such as focus groups and questionnaires. Moreover, these methods are more likely to be scalable, providing marketers with a deeper understanding of consumers and crucial economic outcomes [46].
Preference modeling using data-mining approaches can be classified into three general signal domains: time, frequency, and a combination of time and frequency. Time-based preference modeling exploits the ERPs, as discussed in Section 3.2.2. Frequency-based modeling relies on features obtained from power spectrum analyses that yield the delta, theta, alpha, beta, and gamma frequency bands, as explained in Section 3.2.1. In addition, different frequency-based feature extraction methods can be used; for instance, common spatial patterns (CSPs) and spectral filters were used in the preference classification for music with an SVM, and accuracies of 74.77% and 68.22%, respectively, were obtained [47]. When the fast Fourier transform (FFT) was used as the feature extraction method, the SVM obtained an accuracy of 82.14% [48]. In another study, the researchers used the FFT with radial SVMs for the preference classification and obtained an accuracy of 75.44% [49]. The last preference model combines time and frequency by analyzing the power spectrum at time intervals that cover the entire duration of the post-stimulus interval. Several traditional data-mining algorithms have been applied to classify preferences, and different time-frequency (TF) analysis approaches have been considered [15,23,50] to detect the user preferences for music. The use of KNN led to accuracies of 86.5% and 83.34% with two TF approaches, namely the Hilbert-Huang spectrum (HHS) and the spectrogram, respectively [15]. In their extended study, Hadjidimitriou and Hadjileontiadis [51], using familiar music data, obtained a considerably higher accuracy of 91.0%. Another work performed music preference classification by using a TF approach, namely the discrete Fourier transform, with a KNN and achieved an accuracy of 97.99%. The researchers achieved a similar accuracy of 97.39% when using quadratic discriminant analysis (QDA) [52].
Most researchers applied variations of standard regression analysis to their prediction models. However, numerous techniques and methods have been developed to process EEG signals and determine the preference state of consumers by using classification algorithms. A review of some experimental neuromarketing articles and a comparison of their computational approaches is presented in Table 5.

Table 5. Computational approaches for assessing the customer preferences.
Regression/Classification
Univariate linear regression analysis 2 [3,24]
Linear regression (Lasso, Ridge) 1 [27]

Some preference studies involved the use of more than two classification algorithms to discover well-matched classifiers for a given feature set [12]. Chew et al. [54] measured the user preferences for the aesthetics of virtual 3D shapes by using EEG. The researchers used the frequency bands as features to classify EEG into two classes, liked and disliked, by using the KNN and SVM and achieved classification accuracies of 80% and 75%, respectively. However, these results cannot be considered reliable because the authors used an extremely small dataset of five subjects. In their extended study [55,56], the authors increased the number of subjects to 16, but better results were not obtained. Hakim et al. [44] achieved an accuracy of 68.5% by using the SVM to predict the most and least favored products by combining EEG measures with questionnaire measures.
Classifier combinations such as boosting, voting, or stacking can be used to join numerous classifiers, by merging their outputs and/or training them to complement each other and improve their performance [57]. The selection of the classification algorithms in a BCI system is mostly based on both the form of the acquired mental signals and the context in which the application is expected to be used. However, LDA and SVM are the most commonly applied classification algorithms and have been used in more than half of the EEG-based BCI articles.
Another categorization of the classification algorithms was based on the survey research [57], which considered the BCI and machine learning literature from 2007 to 2017. The recently designed classification algorithms were divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning, and deep learning. Adaptive classifiers are classifiers whose parameters, such as the feature weights, are gradually re-assessed and revised over time as new EEG data are presented. Matrix and tensor (multi-way array) classifiers avoid the use of filters and feature selection and map the data directly onto a certain space with appropriate measures. The transfer learning approach aims to improve the performance of a classifier trained on one domain by using the information acquired while learning another domain or task.
Recently, deep learning has been employed in EEG-based preference recognition. DNNs are collections of artificial neurons organized in layers to approximate a nonlinear decision boundary. The most popular type of DNN used for BCIs is the multi-layer perceptron (MLP), which normally consists of only one or two hidden layers. Other neural network types, such as Gaussian classifier neural networks or learning vector quantization neural networks, have been explored less frequently [57]. Furthermore, Teo et al. [55,56] proposed deep learning approaches for preference recognition by using 3D rotating objects. The results indicate that the deep network could obtain a higher accuracy than other machine learning classifiers, such as the SVM, RF, and KNN algorithms. In their extended research, Teo et al. [23] improved the accuracy to 79.76% by using a deep network with dropout and with rectified linear units and tanh as activation functions. Table 6 presents some neuromarketing studies that used different classification algorithms to obtain the most accurate results in predicting the consumer preferences. The review highlights the need to use more features and hybrid classifiers to improve the prediction accuracy [22,44].

Preference Detection Using a BCI
This section explains the design process of a neuromarketing experiment to predict consumer preferences and choices. First, a BCI device is placed on the head of a consumer. Next, the consumer is asked to look at the products. During the recording phase, the EEG data are recorded while the consumer views each product. After viewing each product, the user is asked to rate his or her preference toward the product on a five- or nine-point scale of subjective rank. After all products have been displayed, the subjective ranks are manually labeled as liked or disliked classes. Next, the EEG signals undergo signal preprocessing and feature extraction. The classification module is developed based on the ground truth provided by the consumer's selection (subjective ranks). Figure 3 presents a proposed BCI system for preference detection composed of three main modules: signal preprocessing, feature extraction and selection, and classification.

Proposed System for the EEG-based Preference Detection
The performance of EEG recognition systems depends on the selection of a feature extraction technique and a classification algorithm. In our study, we investigated the possibility of detecting two preference states, namely pleasant and unpleasant, by using EEG and classification algorithms. To this end, we performed a rigorous offline analysis to investigate computational intelligence for preference detection and classification. We applied deep learning classification to the DEAP dataset to explore how intelligent computational methods, in the form of classification algorithms, can effectively mirror the preference states of the subjects. Furthermore, we compared our classification performance with those of the KNN and RF classifiers. We built our model in the open source programming language Python and used the Scikit-Learn toolbox for machine learning, along with SciPy for EEG filtering and preprocessing, MNE for EEG-specific signal processing, and the Keras library for deep learning.
In this section, we discuss our methodology along with some implementation details of the proposed system for EEG-based preference detection. We begin with describing the benchmark dataset and ground truth of the preference labeling. Next, we explain the feature extraction. Finally, we illustrate the DNN classification model.

Dataset Description
DEAP [58] is a benchmark EEG database developed for affective analysis. The DEAP database was built at Queen Mary University of London, and it has been used in several research studies for preference detection [59,60]. Table 7 summarizes some characteristics of the DEAP dataset.

Stimuli: Visual- and audio-based stimuli (1-min music videos)
Participants: aged 19 to 37 years
Trials: 1280 trials (40 trials for each subject)
EEG device: 32 EEG channels of the Biosemi ActiveTwo system. The EEG data stream was collected using 32 Ag/AgCl electrodes, which were arranged in accordance with the 10-20 international system.
Experimental protocol: Each participant watched and rated his or her emotional responses to 40 music videos on scales of arousal, valence, and dominance using self-assessment manikins (SAM). Participants also reported their liking of and familiarity with the videos.
DEAP database: Different datasets that are publicly available in the DEAP database include recorded signal data, frontal face videos for a subset of participants, stimuli-volunteers' self-reported data, and subjects' self-assessments.

Preference Modeling and Ground Truth
To set the true preferences (ground truth table), we used the DEAP self-assessment reports to identify the preference states using the nine-point scale for the valence dimension. In this study, we considered the valence dimension as a preference indicator to align with the target preference states: pleasant and unpleasant. Moreover, we considered EEG trials that had at least two different valence levels: low and high. The valence levels are classified as follows: (1) low valence if the valence rating is between 1 and 5; and (2) high valence if the valence rating is between 6 and 9. A low or high valence is an indicator of an unpleasant or pleasant preference state, respectively.
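The labeling rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `valence_to_preference` is hypothetical, and because DEAP ratings may be non-integer, the sketch assumes that any rating strictly above 5 counts as high valence, which the text does not specify.

```python
import numpy as np

def valence_to_preference(valence_ratings, threshold=5.0):
    """Map 1-9 valence ratings to binary labels.

    0 = unpleasant (low valence), 1 = pleasant (high valence).
    Assumption: ratings strictly above `threshold` are treated as high.
    """
    ratings = np.asarray(valence_ratings, dtype=float)
    if np.any((ratings < 1) | (ratings > 9)):
        raise ValueError("valence ratings are expected on a 1-9 scale")
    return (ratings > threshold).astype(int)

# Illustrative ratings only (not taken from the dataset).
labels = valence_to_preference([1.0, 5.0, 5.5, 6.0, 9.0])
print(labels)  # [0 0 1 1 1]
```

A different handling of ratings between 5 and 6 (for example, discarding them) would change the class balance, so the threshold should be treated as a design choice.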

Data Pre-Processing
We used the preprocessed EEG dataset from the DEAP database, in which the original recordings sampled at 512 Hz were down-sampled to 128 Hz and bandpass filtered from 4.0 to 45.0 Hz, and the EOG artifacts were removed from the signals using a blind source separation method, namely independent component analysis (ICA). The data were averaged and segmented into 60-s trials. Then, we applied a channel selection step as a dimensionality reduction technique. The aim of this step was to reduce the number of features and/or channels by selecting a subset that excludes very high-dimensional and noisy data. Ideally, the features that are meaningful or useful in the classification stage are identified and selected, while others, including outliers and artifacts, are omitted. Moreover, this step reduces the computational cost of the subsequent steps. Therefore, we kept only the channels of interest (Fz, AF4, AF3, F4, and F3).
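The channel selection step can be sketched as follows, assuming the trials are held in a NumPy array of shape (trials, channels, samples). The channel list `ALL_CHANNELS` is a hypothetical, shortened ordering used only for illustration; the real ordering must be taken from the dataset documentation. The data here are random placeholders, not DEAP recordings.

```python
import numpy as np

# Hypothetical channel order for illustration only.
ALL_CHANNELS = ["Fp1", "AF3", "F3", "F7", "Fz", "F4", "AF4", "Fp2"]
# Channels of interest named in the text.
FRONTAL_CHANNELS = ["Fz", "AF4", "AF3", "F4", "F3"]

def select_channels(trials, channel_names, wanted):
    """Keep only the `wanted` channels of an array shaped
    (n_trials, n_channels, n_samples), in the order given by `wanted`."""
    index = {name: i for i, name in enumerate(channel_names)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"channels not found: {missing}")
    picks = [index[name] for name in wanted]
    return trials[:, picks, :]

rng = np.random.default_rng(0)
# 40 placeholder trials of 60 s at 128 Hz.
trials = rng.standard_normal((40, len(ALL_CHANNELS), 60 * 128))
frontal = select_channels(trials, ALL_CHANNELS, FRONTAL_CHANNELS)
print(frontal.shape)  # (40, 5, 7680)
```

Selecting by name rather than by hard-coded index keeps the step robust if the channel ordering differs between files.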

Feature Extraction
Feature extraction plays a crucial role in building EEG-based BCI applications. We extracted the EEG frequency bands by estimating the power spectral density (PSD) with the Welch method. Subsequently, we used the resulting frequency bands to calculate the valence as a preference indicator. Figure 4 presents the block diagram of feature extraction.

PSD
The PSD is one of the most popular frequency-domain feature extraction methods in neuromarketing research. Research studies [11,37,39] have demonstrated that the PSD obtained from the EEG signals works well for determining consumer preferences. The PSD method converts the data from the time domain to the frequency domain. This conversion is based on the FFT, which computes the discrete Fourier transform and its inverse.
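As a minimal sketch of PSD-based band features, the snippet below estimates the PSD of one channel with `scipy.signal.welch` and sums it over frequency bands. The band edges in `BANDS`, the 2-s segment length, and the synthetic 10 Hz test signal are assumptions made for illustration; the text does not state the exact settings used in the study.

```python
import numpy as np
from scipy.signal import welch

# Assumed band edges in Hz (illustrative; not specified in the text).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs=128, bands=BANDS):
    """Estimate the power in each band from a Welch PSD of a 1-D signal."""
    freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)  # 2-s segments (assumed)
    df = freqs[1] - freqs[0]  # frequency resolution in Hz
    powers = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)
        # Rectangle-rule integral of the PSD over the band.
        powers[name] = float(psd[mask].sum() * df)
    return powers

fs = 128
t = np.arange(0, 60, 1 / fs)  # one 60-s trial
rng = np.random.default_rng(0)
# Synthetic signal: a 10 Hz oscillation plus weak noise.
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
powers = band_powers(signal, fs=fs)
print(max(powers, key=powers.get))  # alpha
```

Applying `band_powers` to every selected channel of every trial would give one band-power vector per channel, which can then be concatenated into a feature vector.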

Valence
The valence was selected as the measure of preference in this study. Strong valence is reflected in the activation of frontal EEG asymmetry [34]. In the DEAP dataset [58], there was a high correlation between valence and the EEG frequency bands, as shown in Figure 5. An increase in valence led to an increase in alpha power, which is consistent with the results of a similar study [34]. We did not use the liking rating in the DEAP dataset because the data owners [58] reported conflicting findings between the activation in left alpha power and liking. We applied different valence equations and investigated their relationship with the DEAP self-assessment valence measurement. For the valence calculation, we used the extracted alpha and beta band powers from the DEAP data and considered only the following electrodes: Fz, AF3, F3, AF4, and F4. Finally, we computed the values of the valence by using four different equations (Equations (3)-(6)), which have been explained in detail in a previous paper [34] by one of the authors of this paper.

DNN Classification
Deep learning has proven to be an effective tool for making EEG signals meaningful because of its ability to learn feature representations from the raw data. DNNs are models consisting of stacked layers of "neurons" in which each layer applies a linear transformation to its input. The result of each layer's transformation is then passed through a nonlinear activation function. The parameters of these transformations are learned by minimizing a cost function [61]. A feed-forward DNN operates in one direction, from the input neurons through the hidden ones (if available) to the output neurons. Assuming that the window length in samples is $s$, the input of the DNN for the EEG signals consists of a multidimensional array $X_i \in \mathbb{R}^{e \times s}$ that contains the $s$ samples of a window for all $e$ electrodes. The fully connected layer, which is the most common type of layer used in building a DNN, consists of fully connected neurons. The input of every neuron consists of the activations of all neurons in the previous layer [61].
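The forward pass of such a fully connected network can be illustrated with a short NumPy sketch. It assumes ReLU activations in the hidden layers and a softmax output, and it uses small hypothetical sizes (`e = 5` electrodes, `s = 16` samples, hidden layers of 32 and 16 units) with random weights; it shows the layer-by-layer computation only, not a trained model.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layers(sizes, rng):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    return [
        (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, layers):
    """Linear transformation plus ReLU in each hidden layer, softmax at the output."""
    a = x
    for w, b in layers[:-1]:
        a = relu(a @ w + b)
    w, b = layers[-1]
    return softmax(a @ w + b)

rng = np.random.default_rng(0)
e, s = 5, 16                                 # hypothetical electrodes and window length
windows = rng.standard_normal((4, e, s))     # four windows X_i of shape (e, s)
x = windows.reshape(len(windows), -1)        # flatten each window to e * s inputs
layers = init_layers([e * s, 32, 16, 2], rng)
probs = forward(x, layers)
print(probs.shape)  # (4, 2)
```

Each row of `probs` sums to one and can be read as the predicted probabilities of the two classes.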
Our study aimed to detect two preference states in the EEG data. Therefore, we employed intelligent classification algorithms that could effectively mirror the preferences of the subjects. We proposed a DNN classifier and compared its performance with those of KNN and RF classifiers.
The block diagram of the proposed DNN classifier is shown in Figure 6. First, the extracted features are normalized using minimum-maximum normalization (Equation (7)) and then fed into the DNN classifier.
$$x_{\text{scaled}} = \frac{x - \min}{\max - \min} \qquad (7)$$

In this study, the considered DNN architecture is a fully connected feed-forward neural network with three hidden layers, which contain units with rectified linear activation functions (ReLU). The output is obtained from a soft-max layer. The input layer consists of 2367 units, and each hidden layer is narrower than its predecessor, with roughly 60% to 76% of its units. In particular, the first, second, and third hidden layers involve 1800, 1300, and 800 units, respectively. The output layer has two units, one for each target preference state. To train the DNN classifier, we used the Adam optimizer with three objective loss functions: binary cross-entropy, categorical cross-entropy, and hinge loss. For the learning rate, we considered reasonable defaults and followed the established best practices: the initial learning rate was 0.001. Then, we linearly reduced the rate with each epoch such that the learning rate for the last epoch was 0.0001. We set the dropout for the input and hidden layers as 0.1 and 0.05, respectively. The stopping criterion of the network training was determined according to the model performance on a testing set. If the network started to over-fit, the network training was stopped. This stopping criterion helps reduce the possibility of over-fitting. The network was tested on a test set, which contained approximately 20% of the data samples in the dataset.
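Two of the preprocessing and training details above, the minimum-maximum normalization of Equation (7) and the linear learning-rate decay from 0.001 to 0.0001, can be sketched as follows. The helper names, the choice to compute the minimum and maximum on the training split only, and the 10-epoch schedule are assumptions for illustration; the text does not state them.

```python
import numpy as np

def min_max_scale(train, test):
    """Apply Equation (7) per feature column.

    Assumption: min and max come from the training split only and are
    reused for the test split.
    """
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (train - lo) / span, (test - lo) / span

def linear_lr(epoch, n_epochs, start=1e-3, end=1e-4):
    """Learning rate for a 0-based epoch, decaying linearly from start to end."""
    if n_epochs < 2:
        return start
    return start + (end - start) * epoch / (n_epochs - 1)

# Toy feature matrices (illustrative values only).
train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test = np.array([[2.5, 25.0]])
train_scaled, test_scaled = min_max_scale(train, test)
print(test_scaled)  # [[0.25 0.75]]

# Hypothetical 10-epoch schedule: first rate 0.001, last rate 0.0001.
rates = [linear_lr(k, 10) for k in range(10)]
```

In a Keras workflow, a function like `linear_lr` could be passed to a learning-rate scheduler callback, although the exact mechanism used in the study is not described.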

Results and Discussion
We predicted the preference states (pleasant or unpleasant) using different classification algorithms: DNN, RF, KNN, and SVM. We used different evaluation measures: accuracy, recall, and precision. The accuracy was calculated as the average of the binary measurements in which the score of every class was weighted by its frequency in the real data. Precision is the proportion of pleasant preference predictions that were actually correct. Recall is the proportion of actual pleasant preferences that were successfully predicted. To evaluate the performance of the classification algorithms, we used different validation methods: holdout (train/test splitting), k-fold cross validation, and leave-one-out cross validation (LOOCV). Table 8 presents the accuracy results of DNN, RF, KNN, and SVM for each validation method. With LOOCV, RF reached the best accuracy at 90%, whereas with the holdout validation method, DNN reached 93%, a result similar to that of RF. With the k-fold validation method, KNN achieved the best accuracy results at 90% and 91% when k was set to 10 and 20, respectively. Because the best accuracy results were achieved using the holdout validation, this method was chosen as the base validation for comparison and for tuning the DNN hyperparameter (loss function). The proposed DNN model was compared with three conventional classification algorithms for EEG signals: SVM, RF, and KNN. Table 9 presents the accuracy, recall, and precision results of RF, KNN, and DNN using three different loss functions in the DNN: the categorical cross-entropy function, binary cross-entropy function, and hinge function. The KNN classifier achieved its best accuracy of 88% when k was set to 1. Although the RF achieved a high accuracy of 92%, the DNN with the hinge loss function reached the highest accuracy of 94% among the compared classification algorithms.
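The precision and recall definitions above can be made concrete with a small pure-Python sketch that treats "pleasant" (label 1) as the positive class. The function name and the toy labels are illustrative and are not taken from the study's results; the accuracy here is the plain fraction of correct predictions rather than the class-weighted average described in the text.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall with `positive` as the pleasant class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # pleasant predicted, pleasant in reality
        elif pred == positive:
            fp += 1          # pleasant predicted, unpleasant in reality
        elif truth == positive:
            fn += 1          # pleasant missed
        else:
            tn += 1          # unpleasant correctly predicted
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy labels: 4 pleasant and 6 unpleasant trials (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

Here three of the four pleasant predictions are correct (precision 0.75) and three of the four truly pleasant trials are found (recall 0.75).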
To check that the DNN does not have an over-fitting problem, we present the loss per epoch for each loss function. The average loss per epoch of the DNN with the categorical cross-entropy, binary cross-entropy, and hinge functions reached a value of 0.28, 0.24, and 0.23, respectively, as shown in Figure 7.

Figure 7. Loss per epoch on the training and validation sets in DNN using different loss functions: (a) categorical cross-entropy (average loss = 0.28); (b) binary cross-entropy (average loss = 0.24); and (c) hinge loss (average loss = 0.23).

Conclusions
This study proposed a DNN model to detect preferences from EEG signals by using the pre-processed DEAP dataset. Two types of features were extracted from the EEG: the PSD and valence. This resulted in a group of 2367 unique features describing the EEG activity in each trial. We used different evaluation measures (accuracy, recall, and precision) and various validation methods (holdout, LOOCV, and k-fold cross validation) to test the classifiers' performance. We built four different classifiers, namely the DNN, RF, SVM, and KNN classifiers, which achieved an accuracy of 94%, 92%, 62%, and 88%, respectively. The results demonstrate that, although the proposed DNN exhibits a higher accuracy, recall, and precision compared with the KNN and SVM, RF reaches similar results to DNN on the same dataset. Future research directions will involve exploring the DNNs in the context of transfer learning for preference detection.
Author Contributions: M.A. conceived, designed, and performed the experiment; analyzed and interpreted the data; and drafted the manuscript. A.A.-N. co-supervised the analysis, reviewed the manuscript, and contributed to the discussion. M.Y. supervised this study. All authors have read and approved the submitted version of the manuscript.