The Cognition of Audience to Artistic Style Transfer

Artificial Intelligence (AI) is becoming more popular in various fields, including the area of art creation. Advances in AI technology bring new opportunities and challenges in the creation, experience, and appreciation of art. Neural style transfer (NST) realizes the intelligent conversion of any artistic style using neural networks. However, artistic style is the product of cognition that proceeds from visual perception to feeling. The purpose of this paper is to study the factors affecting audience cognitive differences and preferences regarding artistic style transfer. Those factors are discussed to investigate the application of the AI generator model in art creation. Therefore, based on the artist’s encoding attributes (color, stroke, texture) and the audience’s decoding cognitive levels (technical, semantic, effectiveness), this study proposed a framework to evaluate artistic style transfer from the perspective of cognition. Thirty-one subjects with a background in art, aesthetics, and design were recruited to participate in the experiment. The experiment consisted of four style groups: Fauvism, Expressionism, Cubism, and Renaissance. According to the findings of this study, participants can still recognize different artistic styles after they have been transferred by neural networks. Besides, the features of texture and stroke have more impact on the perception of fitness than color. The audience may prefer the samples with high cognition at the semantic and effectiveness levels. These results indicate that, even when AI automates routine work, the audience's cognition of artistic style can still be kept and transferred.


Introduction
In recent years, significant progress has been made in Artificial Intelligence (AI) research, including in the area of art creation. The continuous development of technology has promoted increasing interaction between art and AI. The neural network, a popular machine learning (ML) method, can also be applied to artistic stylization, which is called neural style transfer (NST). Computer science researchers have proposed many methods to obtain a better conversion effect. However, when NST methods are adopted to transfer artistic style, evaluating their results is so complex that employing only the quantitative methods commonly used in the computer graphics community is not enough [1,2]. So far, there have been few evaluations of machine-learned artistic style in terms of human cognitive factors. The artist encodes the artwork from inner expression to outer form, while the audience decodes it from external recognition to inner feeling. Even at the intersection of AI and art, human cognition can still feed back into the optimization direction and application mechanism of the AI generator model in the field of art.
This study is intended to provide researchers who focus on AI applications in art with a framework for obtaining the art cognitive effects of machine-generated results. As shown in Figure 1, this study can be divided into three sections. In Section 1, a literature review was used to explore the communication matrix for evaluating artistic style transfer. In Section 2, nine experts with artistic and/or aesthetic backgrounds were invited to select painting samples according to cognitive attributes and to choose a suitable method for transferring abstract art style. In Section 3, the data collected from the grouping experiments on four art schools were used for analysis and discussion. Finally, the discussion and conclusion of this study are given.

Application of NST into Art
Neural Style Transfer (NST) is a type of algorithm that uses deep neural networks to render a content image in different styles. In the early research stage of Non-Photorealistic Rendering (NPR), many stylization algorithms were designed to automatically turn photos into synthetic artworks by simulating how artists create art [3,4]. However, the limitation of these patch-based methods is that they adopt only low-level image features and often fail to capture image structures effectively [5]. Gatys et al. [6] were the first to apply Convolutional Neural Networks (CNNs) to transfer famous painting styles onto photographs. Since the algorithm of Gatys et al. does not have any explicit restrictions on the type of style images and does not need ground truth results for training, it breaks the constraints of previous approaches. It thus ushered in the new field called Neural Style Transfer (NST). The method put forward by Gatys et al., which is based on Gram-matrix-matching style representations, requires a slow iterative optimization process, which is computationally expensive. Still, it is usually regarded as the gold-standard method in the NST community [5,7].
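To make the Gram-matrix style representation concrete, the following minimal Python sketch computes the Gram matrix of one layer's feature map and the per-layer style loss in the form used by Gatys et al. [6]. The function names and the tiny hand-written feature lists are illustrative assumptions; a real implementation would operate on CNN activations rather than on Python lists.

```python
def gram_matrix(features):
    """Gram matrix of a feature map.

    features: list of C channels, each a list of M flattened activations
    (hypothetical toy input standing in for one CNN layer).
    """
    c = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(c)]
        for i in range(c)
    ]


def style_layer_loss(generated, style):
    """Squared Gram-matrix difference, scaled by 1 / (4 * C^2 * M^2)."""
    c = len(generated)
    m = len(generated[0])
    g = gram_matrix(generated)
    a = gram_matrix(style)
    squared_diff = sum(
        (g[i][j] - a[i][j]) ** 2 for i in range(c) for j in range(c)
    )
    return squared_diff / (4.0 * c ** 2 * m ** 2)


# Toy example: two channels, two spatial positions.
style_features = [[1.0, 2.0], [3.0, 4.0]]
# Same activations with the spatial positions swapped.
shuffled_features = [[2.0, 1.0], [4.0, 3.0]]

print(gram_matrix(style_features))
print(style_layer_loss(shuffled_features, style_features))
```

In this toy case the loss is zero even though the spatial positions were swapped, which illustrates why the Gram matrix captures texture-like statistics rather than the spatial layout of the image.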
In the following years, a series of optimization methods for NST were proposed, which fall into paired and non-paired methods. Specifically, the former transfer the style of one image to another, while the latter learn the style of multiple images in a dataset and then transfer it. As the experiment in this study provides each painting and its corresponding conversion as a paired stimulus, a paired method is more suitable. In addition to Gatys' original algorithm, this type of method includes several representative algorithms, such as AdaIN [7] and WCT [8], which improve speed, as well as MST [9] and SEMST [10], which focus on matching semantic patterns between content images and style images. Inspired by instance normalization, Huang and Belongie [7] proposed adaptive instance normalization (AdaIN), which aligns the feature statistics of the content image with those of the style image and was the first to enable arbitrary style transfer in real time. Similarly to Huang and Belongie's method, but replacing the AdaIN layer between the encoder and the decoder, Li et al. [8] embedded a pair of feature transforms, whitening and coloring (WCT), into an image reconstruction network. However, these algorithms based on feed-forward networks fail to consider feature details and local structure and suffer from wash-out artifacts [5,9]. To match the content structure, Zhang et al. [9] first introduced a multimodal style transfer method called MST. They considered the semantic content structure and its matching with style patterns by using K-means to split the style image into style patterns and combining them with the content image via graph cut. Recently, Chen [10] analyzed the shortcomings of MST, such as the inability to consider the structural information of the content image and the loss of style characteristics due to the high-dimensional, low-resolution feature space and the characteristics of the graph cut method. In further research, he proposed a structure-emphasized multimodal style transfer (SEMST) that can flexibly match the content clusters and the style clusters even if the number of clusters differs. Although there are many other papers and studies in the NST field, the purpose of this study is to explore the differences in audience cognition of the results of artistic style transfer, rather than to review all methods.
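As an illustration of the feature-statistics idea behind AdaIN [7], the sketch below shifts and scales each content channel so that its mean and standard deviation match those of the corresponding style channel. It is a minimal pure-Python sketch under stated assumptions: the function names, the small epsilon added to the variance, and the toy one-channel inputs are illustrative and are not taken from the implementation used in this study.

```python
import math


def channel_stats(channel, eps=1e-5):
    """Mean and standard deviation of one flattened channel.

    eps is an assumed small constant that avoids division by zero.
    """
    n = len(channel)
    mean = sum(channel) / n
    variance = sum((v - mean) ** 2 for v in channel) / n
    return mean, math.sqrt(variance + eps)


def adain(content, style, eps=1e-5):
    """Align per-channel statistics of content features with style features.

    content, style: lists of channels, each a list of flattened activations.
    """
    output = []
    for content_channel, style_channel in zip(content, style):
        c_mean, c_std = channel_stats(content_channel, eps)
        s_mean, s_std = channel_stats(style_channel, eps)
        # Normalize the content channel, then rescale to the style statistics.
        output.append(
            [s_std * (v - c_mean) / c_std + s_mean for v in content_channel]
        )
    return output


# Toy one-channel example (hypothetical values).
content_features = [[0.0, 2.0, 4.0]]
style_features = [[10.0, 20.0, 30.0]]
stylized = adain(content_features, style_features)
print(stylized)
```

After the transform, the content channel keeps its relative ordering of activations but carries the mean and spread of the style channel, which is what allows a single feed-forward pass to handle arbitrary styles.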
NST algorithms can be compared with both qualitative and quantitative methods. The main quantitative evaluation metrics focus on transfer speed and loss comparison [5,9]. For qualitative evaluation, the observation of style features and user studies based on voting are usually used. In the experiment of Li et al. [8], 80 subjects were invited to vote for their favorite result among the methods. That study showed that WCT received more votes than other methods such as AdaIN and Gatys et al. Yet, in the qualitative assessment of Zhang et al. [9], MST obtained the highest percentage of votes, while Gatys et al. ranked second, above AdaIN and WCT. Due to the varied choices of samples and users, the results of the qualitative methods also differ [8,9]. The evaluation of responses to artistic imagery is a complex process that includes both the perception of shape and the evaluation of aesthetics and other properties of images [1]. To obtain accurate and stable results from the qualitative assessment, a reasonable evaluation mechanism needs to be constructed.
The field of NST has received increasing attention from academics and industry in recent years, resulting in some related industrial applications, such as the websites Ostagram [11] and Deep Dream Generator [12], the mobile application Prisma [13], and even film production [14]. People can use those platforms to create some interesting images for pleasure or spawning creativity. Machine-based intelligence learning from human intelligence will also be inspired by human insight. Therefore, the evaluation results from audience cognition can provide the algorithm with some suggestions for further optimization and the application regarding the intersection with art.

Attributes of Artistic Style
In the visual arts, style is a distinctive manner that permits the grouping of works into related categories [15], and it refers to similar critical features for recognition, such as characteristic subject matter or materials, distinctive ways of drawing or applying paint, preferences for specific color combinations, brushstrokes, distortion, and exaggeration. These pieces of visual information can direct the viewer's attention and affect the viewer's perception [16,17]. The perception of visual art is a complex process performed by the brain to perceive the features of different elements in the painting [18,19]. The various painting attributes, such as colors, shapes, and boundaries, are selectively redistributed to the brain for processing [20]. There are three critical features in the receptive field: position, shape, and specificity, similar to modern art, with its accent on simplification [21].
Artistic visual styles of the early 20th century such as Fauvism, Expressionism, and Cubism share the commonality of abstraction and expression. Meanwhile, they have distinctive attributes that support the grouping of artworks into related art movements [22][23][24][25]. Both Fauvism and Expressionism use pure colors and subject distortion. However, unlike the thick coating method or squeezing the paint directly from the tube in Fauvism, Expressionism is characterized by broad brush strokes to exaggerate artists' inner emotions and feelings [23,26]. Cubism is known for its reduction of subjects into geometric shapes to produce a more three-dimensional perspective. It is a calm and reflective experimental art that balances representation and abstraction and abandons colors in favor of an almost monochrome palette [22]. Unlike the modernist styles, realism such as that of the Renaissance involves several techniques that make the subjects and backgrounds look as they would in real life. As Renaissance artists began using scientific perspective and new ways of painting light, their paintings became more meaningful and realistic [27].

Communication between Audience and Artwork
Art is a medium that provides an understanding of visual communication [28]. As a communication channel, artwork can connect the artist and audience through the creator's encoding and the audience's decoding [28][29][30]. The interplay between the internal (cognitive) representation and the external (physical) representation is a fascinating problem in cognitive psychology, art, science, and philosophy [18]. Each of us approaches an art object with a significantly different perspective because of our unique personal social experiences [18,31]. Whoever can feel the beauty of colors and forms has understood non-objective painting [32]. The aesthetic experience involves the processing of stylistic information closely related to the artist's movements [31,33], and simultaneously performing those movements can enhance the preference for paintings of the corresponding style [34]. In abstract paintings, the ideas, emotions, and visual sensations are expressed solely through lines, shapes, colors, and textures that have no symbolic significance [35]. According to people's feelings, color may be warm or cold, and cheerful or somber [20]. Audiences can perceive the painter's actions by merely observing the brushstrokes of the painting [36]. Texture includes all painting areas enriched by combining lines, shapes, tones, and colors [16,35]. Therefore, from the perspective of cognition, we can learn more about the process of creation [37].
In the symbol communication mode, cognition can be divided into three levels: technical, semantic, and effectiveness [38]. To evaluate an artwork, it is necessary to find out the cognitive factors affecting it. Lin et al. [39] proposed a communication matrix that combined cognition and communication theory and integrated them into three dimensions for evaluating artwork. There are three stages for artists to express their thoughts through artworks: inspiration, ideation, and implementation. For audiences, the three stages of experience are aesthetic, meaning, and emotional. Shusterman [40] and Bergeron and Lopes [41] suggest that an aesthetic experience features an evaluative dimension involving semantic and affective aspects that confer aesthetic quality on the experience. However, the literature offers little discussion of how to evaluate the results of artistic style transfer by computer algorithms in terms of aesthetic cognition. In the creative process of AI-Art, the artists choose AI algorithms according to their intentions for creating the artwork, and audience acceptance is a critical defining step in deciding whether it is "art" [42]. Studying the factors that affect artistic cognition from the perspective of the audience can help build a bridge between artists and the audience [43,44].

Research Purpose
The perception of artistic style involves the complex processing of visual information. The audience is a vital participant in art, especially when AI technology is involved in the art field. Artistic style transfer methods using neural networks have improved the efficiency and effectiveness of Non-Photorealistic Rendering. To further optimize industry applications, the audience's perception of the style transfer results should be considered. Technology comes from humanity, so the integration of Hi-tech and Hi-touch is necessary for art creation and product design, especially in the 21st-century digital technology world [42].
In this study, the style transfer results of four painting schools were evaluated from the perspective of audience cognition. Through four cognitive experiments, covering three abstract schools of modernism in the early 20th century (Fauvism, Expressionism, Cubism) and Realism (Renaissance), this study aims to explore whether the audience can still distinguish the different art schools and keep the cognition at the technical, semantic, and effectiveness levels. The main factors affecting the audience's cognition of artistic style are discussed to understand the difference before and after style transfer. The results of this study can be used as optimization suggestions to bring the performance of NST methods closer to human cognition.

Pre-Experiment
To select the appropriate algorithm for this experiment, the method of Gatys et al. [6], AdaIN [7], WCT [8], MST [9], and SEMST [10] were used to transfer the styles of four samples to the same content image. The seminal work of Gatys et al. [6] synthesized stylization through an iterative optimization scheme. AdaIN [7] and WCT [8] are methods that use feed-forward networks to greatly enhance efficiency. MST [9] and SEMST [10] addressed the artifact problem of the above methods by matching the semantic patterns in style and content images via K-means. It should be mentioned that the code employed here comes from GitHub, either provided by the authors or written according to the papers. The links are shown in Table 1.

Table 1. Methods and GitHub links.

Methods | GitHub links
Gatys et al. [6] | https://github.com/ProGamerGov/neural-style-pt (accessed on 1 June 2020)
AdaIN [7] | https://github.com/naoto0804/pytorch-AdaIN (accessed on 1 June 2020)
WCT [8] | https://github.com/sunshineatnoon/PytorchWCT (accessed on 1 June 2020)
MST [9] | https://github.com/irasin/Pytorch_MST (accessed on 1 June 2020)
SEMST [10] | https://github.com/irasin/Structure-emphasized-Multimodal-Style-Transfer (accessed on 10 July 2020)

The five methods were adopted with their default parameters to create stylized images by learning the features of four schools' representative paintings. A photograph of Guangong in opera was used as the content image for its rich semantics, involving character, decorative patterns, and background. Then, nine experts with artistic and/or aesthetic backgrounds were invited to vote for the best method through perception, and the best method for each art school was marked with a blue line in Figure 2. The feedback from the experts was that the results for Fauvism and Expressionism kept the balance between style and content when the method of Gatys et al. was used. Although the use of WCT in the former two schools produced serious distortions and lacked content structure, its result for Cubism still retained the feeling of geometric fragments. For the Renaissance style, SEMST is more semantic and more suitable for transferring realistic style. However, the sense of abstract style was lost in the other three modernist schools. Based on the purpose of this research, the method of Gatys et al. was selected to produce all the stimuli for the experiments.

Figure 2. Style transfer results of Gatys et al. [6], AdaIN [7], WCT [8], MST [9], and SEMST [10]. The most suitable method in each school was selected by experts and marked by a blue line.

Stimuli
Based on previous studies [29,30,39,43,44], an artwork can be evaluated at the technical, semantic, and effectiveness levels. The technical level focuses on the visual elements. The semantic level means letting the audience accurately understand the meaning of the artwork through his/her realization. The effectiveness level concerns the inner feeling of the audience regarding the emotional expression of the artist. Color, stroke, and texture, as important factors of painting style, can also be evaluated at these three levels. Therefore, the matrix for evaluating the cognitive effect of artistic style transfer through machine learning was reconstructed, as shown in Figure 3. The nine experts mentioned above went on to pick paintings with typical color, stroke, and texture from wikiart.org and then chose one sample matching each cognitive attribute in the evaluation matrix, namely Color-technical, Color-semantic, Color-effectiveness, Stroke-technical, Stroke-semantic, Stroke-effectiveness, Texture-technical, Texture-semantic, and Texture-effectiveness. After obtaining nine paintings for each school, the experts further described the characteristics of each work according to its representative attribute, and these descriptions were used as the questionnaire items. For example, according to the feature of the F1 sample in the attribute it represents, the Color-technical item in the Fauvism group was described as "complementary," as shown in Table 2. Theoretically, the F1 sample should get the highest score on this item. Besides, a Mona Lisa portrait was inserted into each group as the reference sample, and its score should be higher in the Renaissance group. In the Fauvism, Expressionism, and Cubism groups, the Mona Lisa should show significant cognitive differences from the other samples due to its different period and style.
The content image, a photograph of an opera character called Guangong, was combined with the style of the paintings using the method of Gatys et al., as shown in Table 2. Before transfer, all of the style and content images were resized to a height of 600 px. Furthermore, the default parameters suggested by the authors and the pre-trained VGG19 network were adopted. In the layer options, style reconstructions used the conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 layers, and content reconstructions used the conv4_2 layer. The process was initialized with the content image instead of the random noise used in the paper, to exclude random results. For the style setting, the ratio content_weight/style_weight was 1 × 10^−3. Experiments conducted by Gatys et al. [6] show that this ratio can better balance content and style. Besides, a style scale of 1.0 was used to keep the proportional transfer of texture. The number of iterations was kept at the default of 1000 because no obvious change was observed with more iterations.
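The settings described above can be summarized in a small configuration sketch. It only records the values stated in the text and shows how the content/style ratio enters a weighted total loss. The class name, field names, and the absolute weight values (1.0 and 1000.0) are assumptions chosen so that their ratio is 1 × 10^−3; they are not the actual arguments of the implementation used in this study, and weighting the style layers equally is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferSettings:
    """Hypothetical container for the settings reported in the text."""
    image_height_px: int = 600
    style_layers: tuple = ("conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1")
    content_layers: tuple = ("conv4_2",)
    content_weight: float = 1.0      # illustrative value
    style_weight: float = 1000.0     # illustrative value; ratio is 1e-3
    style_scale: float = 1.0
    num_iterations: int = 1000
    init: str = "content"            # start from the content image, not noise


def total_loss(content_loss, style_layer_losses, settings):
    """Weighted sum of content loss and equally weighted style-layer losses."""
    mean_style_loss = sum(style_layer_losses) / len(style_layer_losses)
    return (settings.content_weight * content_loss
            + settings.style_weight * mean_style_loss)


settings = TransferSettings()
print(settings.content_weight / settings.style_weight)
# Toy loss values for one optimization step (hypothetical numbers).
print(total_loss(2.0, [0.1, 0.1, 0.1, 0.1, 0.1], settings))
```

A smaller content/style ratio pushes the optimization toward the style image, while a larger one preserves more of the content structure.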

Participants
All of the study participants were artists, professors, or Ph.D. holders with an art and/or design background. A total of 31 subjects (17 male and 14 female) participated in the experiment. The mean age of the male participants was 48.29 (SD = 9.45) years. The mean age of the female participants was 42.71 (SD = 10.91) years.

Experimental Procedure
This study consisted of four groups of experiments. The stimuli of each group contained nine samples representing nine cognitive attributes of one art school, as well as the corresponding results combining with their artistic style and content images by using the method of Gatys et al. The famous painting Mona Lisa was inserted into each group as the reference sample to find whether there is a difference in its cognition from others.
The researcher first introduced the process to the subjects. Then, starting from Fauvism, there were ten sets of stimuli, including nine pairs of paintings belonging to Fauvism and one pair for the Mona Lisa. Each painting and its transfer result were projected onto the screen simultaneously. Subjects had 2-3 min to complete the evaluation of each set of samples. For every group, the experiment lasted less than 1 h, and the groups were run at least one week apart to avoid learning effects, so the whole experiment took four weeks.
The subjects were informed of the experimental task and then asked to rate each pair of samples according to their subjective opinion. The dependent variables were the participants' perceptions as recorded in the questionnaire. The first part measured the degree of fitness between the original painting and the transfer result in terms of the style attributes (color, stroke, and texture) and overall fitness, with scores ranging from 0 to 100%; the higher the score, the better the fit. The second part explored the degree of fitness on the nine cognitive attributes in each pair of stimuli. The attribute categories were "Color-technical", "Color-semantic", "Color-effectiveness", "Stroke-technical", "Stroke-semantic", "Stroke-effectiveness", "Texture-technical", "Texture-semantic", and "Texture-effectiveness." The item description of each attribute in the questionnaire was provided by the nine experts mentioned above, based on the feeling conveyed by each sample. A 5-point Likert scale was used to score the fitness of the nine cognitive attributes of each pair, from 1 ("Very unfit") to 5 ("Very fit"). Finally, the subjects were asked to choose a favorite sample and a least favorite sample in each group.

Statistical Analysis
Firstly, one-way ANOVA was adopted to test whether the differences in style attributes and cognitive attributes among the ten pairs of paintings in each group were significant (the significance level was set at 0.05). For items reaching the significance level, Duncan's multiple range test was used to determine whether there were significant differences among the ten means. In addition, the Pearson correlation coefficient was employed to investigate the potential correlation between overall fitness and the three style factors: color, stroke, and texture.
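For readers less familiar with the test, the following minimal sketch shows how the F statistic of a one-way ANOVA is obtained from the between-group and within-group sums of squares. The function name and the toy scores are hypothetical and are not data from this study; the sketch does not compute the p-value or the post hoc comparison, for which a statistics package would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fitness scores for three stimulus pairs.
scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f_stat, df_b, df_w)
```

The resulting F value would then be compared with the critical value of the F distribution for the given degrees of freedom at the 0.05 level.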

Results of Descriptive Statistics and ANOVA Analysis
According to the results of the analysis of variance on the first part of the questionnaire, as shown in Table 2, after the same subjects viewed the four groups of paintings, they displayed significant differences in their feelings for the style attributes (i.e., "color fitness," "stroke fitness," "texture fitness," and "overall fitness"). In Fauvism, Expressionism, and Cubism, the Mona Lisa sample stood alone in the lowest score group, whereas in Renaissance it was classified into the same group as the other samples. The result indicates that subjects could still distinguish the different schools of painting even after style transfer. Moreover, consistent with the literature, which notes that methods based on iterative optimization and feed-forward networks fail to consider feature details and content structure [5,9], the algorithm of Gatys et al. is more suitable for transferring abstract styles than realistic styles.
It is worth noting that the samples which belong to stroke and texture attributes in Fauvism, Expressionism, and Cubism got a higher score in their fitness evaluation. However, among the Renaissance stimuli, the samples with color characteristics scored higher than others. The relevant scores have been marked with red lines in Table 3.
As for the evaluation of cognitive levels in part two, samples with certain cognitive features could still be perceived by the subjects after transfer, as shown in Table 4. The scores belonging to the style attributes of color, stroke, and texture are marked by red lines, and the gray background emphasizes the scores of samples on their corresponding attributes. Furthermore, the highest score on the corresponding attribute is marked by the blue dashed line; these scores are mainly distributed in the stroke and texture attributes in Fauvism, Expressionism, and Cubism. However, in the Renaissance group, which lacked stroke and texture features, the participants had a stronger perception of color. The reference sample (Mona Lisa) obtained the lowest scores in all cognitive attributes of Fauvism, Expressionism, and Cubism. However, in the style to which it belongs, the Mona Lisa ranked higher on various indicators of Renaissance, further showing that the subjects can distinguish the different schools of painting even after style transfer.

Pearson Correlation Coefficient
A strong positive association between overall fitness and the three style attributes (color, stroke, and texture) was established using the Pearson correlation coefficient, as shown in Table 5. On the one hand, this means that color, stroke, and texture can be used as factors to evaluate artistic style transfer algorithms. On the other hand, stroke and texture have a higher correlation with overall fitness than color. Hence, compared with color, the simulation of stroke and texture plays a more critical role in artistic style transfer and deserves more attention.
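The correlation used here can be sketched in a few lines of Python. The function below computes the Pearson correlation coefficient between two score lists; the function name and the example ratings are hypothetical and are not the data behind Table 5.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical fitness ratings (0-100) for five stimulus pairs.
overall_fitness = [80.0, 65.0, 90.0, 70.0, 55.0]
stroke_fitness = [78.0, 60.0, 92.0, 72.0, 50.0]
color_fitness = [70.0, 68.0, 75.0, 80.0, 60.0]

print(pearson_r(overall_fitness, stroke_fitness))
print(pearson_r(overall_fitness, color_fitness))
```

Comparing the coefficients obtained for each style attribute against overall fitness is what allows the attributes to be ranked by how closely they track the overall impression.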

The Favorite and Least Favorite Paintings
To further determine the critical factors that affect participants' preferences, subjects were invited to choose their favorite and least favorite samples from each group. Table 6 shows the proportion of each painting being selected as a favorite. The stimuli ranking at the top of the preference vote had higher overall fitness scores. Besides, their scores on the corresponding cognitive attributes were the highest. As discussed above, users have a stronger perception of the strokes and textures of abstract styles. The samples that accurately transfer the cognitive features of strokes and textures are more welcome. However, for the realistic style, which has no evident stroke and texture features, the transfer results of Renaissance received more attention in terms of color. Figure 4 lists the favorite sample of each group. Subjects can feel the corresponding cognitive attribute of P9, P8, P5, and P3: Cute Pattern, Wantonly, Bright & Fast, and Lively.
The proportion of each painting being selected as the least favorite is shown in Table 7. Subjects voted P10 as their least favorite sample in all groups except Renaissance. In the Renaissance group, there was no significant difference in cognition because the Mona Lisa portrait belonged to the same style as the other samples. This further shows that the subjects can distinguish different artistic styles even when they are reproduced by machine learning. In the Fauvism, Expressionism, and Cubism groups, the image with the lowest score was also the subjects' least favorite sample, which shows a positive correlation between preference and cognitive fitness. From Figure 5, it can be seen that the two least favorite samples also lost the details of the content seriously. The more details are lost, the fewer subjects like the sample. The wash-out artifact problem in the feed-forward networks-based method [5,9] causes the poor transfer result for the realistic style.

Conclusions
The experimental results showed that subjects could still distinguish the different art schools and keep the cognition at the technical, semantic, and effectiveness levels. The cognitive attributes have a strong relationship with the overall feeling and the audience's preference. This supports the use of the matrix based on style attributes and cognitive levels for the subjective evaluation of NST methods. Among the three style attributes, the fitness of stroke and texture has a higher degree of correlation with overall fitness than that of color. However, for the Renaissance, a style without obvious stroke and texture, the audience turned their attention to the feeling of color and the painting's content. Therefore, which style attribute should be strengthened during NST encoding depends on the features of the style sample. Besides, samples whose semantic or effectiveness level was accurately perceived usually received a higher preference, so these two cognitive levels are the key to improving the audience's preference. The correlation between style features and their corresponding cognitive effects should be further discussed.
This study presented a framework that can serve as fundamental research for the subjective evaluation of AI algorithms in the art field. However, this study still had several limitations. Firstly, as the classification of painting styles is complex and subjective, there are some uncertainties in the choice of painting samples and the corresponding evaluation criteria. Therefore, in future studies, the two samples with the highest scores in each group will be selected for cognitive evaluation by a large number of subjects to further elaborate on how the results could be applied to a group of users without a professional background. Secondly, given the rapid development of this field, the methods compared in this study can hardly cover the newest technologies. The algorithms selected in this study only involve the typical paired methods and do not include the non-paired methods, which represent features learned from datasets.