
A Text Mining Approach for Sustainable Performance in the Film Industry

Graduate School of Information, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul 03722, Korea
Division of Computer Science and Engineering, Sunmoon University, 70, Sunmoon-ro221beon-gil, Tangjeong-myeon, Asan-si, Chungcheongnam-do 31460, Korea
Author to whom correspondence should be addressed.
Sustainability 2019, 11(11), 3207;
Submission received: 8 April 2019 / Revised: 4 June 2019 / Accepted: 4 June 2019 / Published: 9 June 2019


Many previous studies have shown that the volume or valence of electronic word of mouth (eWOM) has a sustainable and significant impact on box office performance. Traditional studies used quantitative data, such as ratings, to measure eWOM. However, recent studies analyzed unstructured data, such as comments, through web-based text analysis. Based on recent research trends, we analyzed not only quantitative data, like ratings, but also text data, like reviews, and we performed a sentiment analysis using a text mining technique. Studies have also examined the effect of cultural differences on the decision-making processes of individuals and organizations. We applied Hofstede’s cultural theory to eWOM and analyzed the moderating effect of cultural differences on eWOM influence. We selected 338 films released between 2006 and 2015 from the BoxOfficeMojo database. We collected ratings and reviews, box office revenues, and other basic information from the Internet Movie Database (IMDb). We also analyzed the effects of cultural differences, such as power distance, individualism, uncertainty avoidance, and masculinity, on box office performance. We found that user comments have a greater impact on film sales than user ratings, and movie stars and co-production contribute to box office success. We also conclude that cultural and geographical differences moderate the sentiment elasticity of eWOM.

1. Introduction

Today’s international economic circumstances are undergoing vast changes due to the uncertainty of the global economy and the growing tendency toward protectionism in each country. In response, the governments of Korea and other countries are making efforts to transform this crisis into opportunities by building a local economic community and establishing a favored network with other major countries [1]. In this process, companies need to understand the cultures of their trading partners, in addition to the policies of each country, for a smooth entry into foreign markets. At the same time, each country is expanding electronic commerce systems, using big data analysis to pioneer international markets and improve sustainable revenue generation [2]. Therefore, a systematic study of the behavior of online users is required for companies to successfully utilize information technology in all international transactions. Over the past several decades, a number of prior studies in various disciplines, including international economics and the digital marketing environment, have analyzed the impact of online user behavior or cultural differences on business performance [3]. However, it is difficult to study the behavior of online users and cultural differences at the same time. The purpose of this study is to examine the effect of cultural differences among online users so as to promote sustainable performance. Specifically, we examined the effects of electronic word of mouth (eWOM) and cultural differences with regard to box office performance, which is a typical form of cultural output.
The concept of sustainable performance first emerged in the field of business administration, and is now widely used across all areas of society. From a business perspective, sustainable performance is a combination of sustainable development and corporate social responsibility (CSR) [1]. As the meaning is derived from various fields, it has come to incorporate both sound environmental and economic performance as well as social responsibility [3]. We have added to the definition of sustainable performance in this study. First, sustainable performance is a strategy that both enhances shareholder value and adds social and environmental value for external stakeholders. Second, it includes efforts by all stakeholders to improve the quality of life of workers, their families, and the larger communities in which they live; this leads to sustainable economic development for these communities. Third, it is a collective action in which the economy, environment, and society are together responsible for sustainable development.
There are two major streams of research on the relationship between classical WOM and culture. One focuses on how cultural differences affect user participation in reviews. This stream of research examines cultural differences as antecedents of WOM senders and argues that cultural differences affect the extent to which customers search for information. In contrast, the other stream studies how cultural differences affect WOM, examining cultural differences as antecedents of WOM receivers. For example, a prior study found that “received WOM has a stronger effect on the evaluation of customers in high-uncertainty-avoidance than in low-uncertainty-avoidance cultures” based on the four cultural dimensions of Hofstede and Hofstede [4]. However, no study has examined how culture affects the relationship between eWOM and customer preferences in the online world.
We divided the data analysis into two processes in this study. We verified the previous studies in Phase 1, and we compared the sentiment analysis with text mining and cultural differences in Phase 2. Phase 1 focuses on the comparison (a two-by-two comparison with users and critics) of the effects of reviews and ratings on box office revenue. In Phase 2, the reviews alone were subject to the same comparison, and the significance of these moderating effects and Hofstede’s cultural grades were compared for eight countries and two regions based on the nationalities of multiple users and professional critics who left reviews.
This study proceeded as follows. First, we generated hypotheses based on the previous studies of eWOM and box office revenue and combined Hofstede’s cultural theory with eWOM research. Then, we proposed the research methodologies. Lastly, we analyzed the data and discussed the results for sustainable box office performance. From the results of the study, we identified possible implications for the film industry.

2. Background and Hypothesis Development

2.1. Prior Studies on Electronic Word of Mouth

The term “word of mouth” has been widely studied in the field of economics since it was first introduced in Fortune Magazine in 1954. WOM is more effective than other marketing tools, such as advertising, because of information reliability and faithfulness. WOM is more effective than traditional marketing tools in acquiring new customers, and WOM is significant when consumers are deciding whether to purchase unknown products or services [5].
Recent research has been particularly focused on eWOM in online communities. Many studies have provided an overview of online feedback mechanisms and pointed out that eWOM can be stored and accessed at the same time; they argue that eWOM has anonymity, without the constraints of time and space, due to the nature of the Internet [6,7]. Other studies have investigated the self-selection effect of online product reviews. In particular, they analyzed online reviews and found that, over time, user ratings were affected by initial user reviews, and that self-selection effects led to bias [8].
Previous studies on eWOM have primarily targeted specialized sites featuring films, restaurants, hotels, and software programs, or specific products on online shopping sites. The studies targeting films can be divided into two categories: Studies on user reviews [9,10] and studies on third-party reviews (TPRs), such as critics’ reviews [11,12]. eWOM studies on specialized sites have focused on various products and services, including TV shows [13], restaurants [14], and hotels [15]. These studies used quantitative indicators such as revenue, sales rank, and stock returns as response variables to evaluate the effect of eWOM, as well as some quantified qualitative variables such as helpfulness, based on surveys. For explanatory variables, volume or ratings were commonly used. However, text mining has been recently adopted to analyze user sentiment [16]. Table 1 shows prior studies on electronic word of mouth.

2.2. User Reviews and Sustainable Box Office Revenue

We analyzed the relationship between the valence of eWOM and box office revenue. Valence determines whether WOM reviews are positive or negative, and in the case of films, is generally measured by quantified user ratings. Some earlier studies have argued that the valence of eWOM, or user ratings, has a positive effect on sustainable box office revenue [22]. On the other hand, other studies that used simultaneous equation panel data analysis described no significant correlations between them [23]. To validate the relationship between the valence of eWOM and box office revenue, we proposed the following hypothesis.
Hypothesis 1a.
User Ratings, a Form of eWOM, Have a Positive Impact on Box Office Revenue.
We analyzed the effect of text-based user ratings and reviews, as well as users’ sentimental reaction to films. It used to be technically impossible to perform sentiment analysis and process users’ opinions, but advances in unstructured data analysis have facilitated sentiment analysis using text mining. A prior study analyzed 12,000 film reviews using manual coding and human judgment, and they described it as “an extremely tedious task” [13]. However, recent years have seen an increase in studies that analyzed user ratings using text mining [24]. For instance, a prior study combined natural language-processing techniques and statistical learning methods [25]. They studied the methods to predict the return on the investment as an indicator of a film’s sustainable performance by analyzing film scripts.
Many studies have concluded that reviews of eWOM have a positive effect on box office revenue. Some asserted that “customers read review text rather than relying simply on summary statistics” [8], while others claimed that “the textual content of product reviews is an important determinant of consumers’ choices” [17]. The following hypothesis was generated to analyze users’ positive or negative reactions based on texts, as well as the effect of such reactions on box office revenue.
Hypothesis 1b.
User Reviews, a Form of eWOM, Have a Positive Impact on Box Office Revenue.

2.3. Third-Party Product Reviews and Box Office Revenue

Studies on TPRs in the film industry have largely focused on whether film critics’ ratings influence or predict box office performance [26]. Some have argued that critics’ reviews are correlated with late and cumulative box office receipts rather than early box office receipts, and that film critics’ ratings and reviews are, therefore, predictors rather than influencers [12]. Another study analyzed the relationship between critics’ reviews and the box office performance of 200 films and concluded that critics’ reviews are both influencers and predictors of box office revenue. The authors also argued that negative reviews have a greater impact on box office performance than positive reviews [27]. A prior study analyzed 80 films, 1040 reviews by critics, and 55,156 user reviews and concluded that reviews by professional critics improved forecasting accuracy for sustainable film performance. Another prior study analyzed professional and amateur critics’ ratings for 246 films in six major genre categories and argued that high film revenues at the beginning of a release increased subsequent film ratings [21]. Moreover, another prior study analyzed 177 films produced from 1995 to 2007 at 21 studios owned by seven companies listed on the New York Stock Exchange. They argued, by analyzing the relationship between 542 professional critics’ reviews from Metacritic and daily stock returns, that the valence of TPRs had a significant impact on stock returns [11]. We derived the following hypothesis to test the conclusions of previous studies that critic ratings affect box office performance.
Hypothesis 2a.
Critics’ Ratings, a Form of eWOM, Have a Positive Impact on Box Office Revenue.
As text mining techniques have evolved, research has increasingly explored the impact of critics’ reviews, expressed in text as well as ratings, on the film industry. One study analyzed 12,136 WOM reviews by critics and weekly box office performances for 40 films and showed that WOM information from critics had significant explanatory power for both aggregate and weekly box office revenue. The authors explained that this explanatory power was based on the valence of WOM, such as the percentages of positive and negative reviews, rather than the volume of WOM [13]. However, few studies have analyzed the positive and negative aspects of critics’ reviews to understand the predictors or influencers of box office performance. Therefore, we established the following hypothesis to analyze the impact of sentiments in critics’ reviews on box office performance.
Hypothesis 2b.
Critics’ Reviews, a Form of eWOM, Have a Positive Impact on Box Office Revenue.

2.4. Prior Studies on Cultural Elements

One of the traditional research questions for eWOM is “How does eWOM differ cross-culturally?”, and previous studies have primarily asked this question with regard to how culture affects consumers’ decision-making processes [28,29]. However, little research has been done on how consumers accept eWOM in a given cultural environment, especially in the film industry. Thus, we needed to examine the applicability of Hofstede’s cultural dimension theory, which is a widely used cultural theory. Hofstede analyzed the attitudes and values of 117,000 IBM employees in 40 countries and summarized the cultural differences between countries in four dimensions: Power distance, individualism versus collectivism, uncertainty avoidance, and masculinity versus femininity [30]. He then set up four levels of value in 76 countries through six inter-country studies from 1990 to 2002 [31]. In later studies, he extended the existing four dimensions to six by adding: Long-term orientation and indulgence [32]. Despite many studies applying this to eWOM, we could not find any studies that used Hofstede’s cultural dimension theory to study the impact of eWOM specifically in the film industry.
A filmgoer reads other users’ reviews to decide whether to watch a film (e.g., whether the film fits one’s taste or whether it is fun), a process we can interpret as uncertainty avoidance. A filmgoer also chooses to watch a film based on the views of film critics, who are authorities on the film industry. We can interpret this process as one of submission to authority and define the degree of submission to authority as power distance. On the other hand, an audience’s evaluation of a film may be consistent or varied depending on the culture. We can interpret the diversity of film evaluations in terms of collectivism versus individualism. Thus, Hofstede’s cultural dimensions can be used as factors in consumers’ decisions and sustainable performance in the film industry.
We considered the need to examine whether the effect of eWOM differs between regions, and we set a hypothesis with two geographical regions as moderating variables: Western and Asian. This regional distinction is based on the marketing theory that Asian consumers rely heavily on others’ opinions due to their cultural background, while Western consumers make independent purchase decisions [33]. They argued that there is a self-construal difference between Asian and American cultures: American consumers are independent, whereas Asian consumers are interdependent in their self-construal. This concept has been widely adopted in many cross-cultural studies to describe the difference in psychology between Asian and Western consumers [34,35]. Based on Hofstede’s cultural grades, detailed in Table 2, we derived the following hypotheses, in which geographical regions act as moderating variables.
Hypothesis 3a.
The valence from user reviews to box office revenue has a stronger effect for Asian users than Western users.
Hypothesis 3b.
The valence from user reviews to box office revenue has an obvious distinction according to country.
Hypothesis 4a.
The valence from critics’ reviews to box office revenue has a stronger effect for Asian critics than Western critics.
Hypothesis 4b.
The valence from critics’ reviews to box office revenue has an obvious distinction according to country.

3. Methodology

3.1. Research Design

We examined the effect of eWOM, user ratings, and user reviews on box office revenue for Hypotheses 1 and 2. In this process, we also investigated the effect of movie stars, coproduction methods, and other control variables on box office revenue. In addition, we tested Hypotheses 3 and 4 to examine whether the regional difference in eWOM, a moderating variable, affected film sales. The research model is diagrammed in Figure 1.

3.2. Sample Selection

Users can leave eWOM on online platforms for a wide range of products and services, such as books, digital cameras, films (for example, on Yahoo! Movie), videos, and restaurants. People tend to rely on the opinions of consumers who have had experience with the goods, especially for experience-related products such as films [14]. The types of cultural content that are exchanged between countries include, but are not limited to, publications, films, music, broadcasts, advertisements, and games. We chose the film industry to identify the relationship between eWOM and sustainable revenue for the following reasons. First, the film industry, unlike other content industries, sells tickets at the same price within a country. This characteristic allows us to rule out the possibility that price mediates consumers’ purchasing and, thus, enables us to more clearly understand the relationship between determinants and box office revenue in the film industry [23]. Second, because box office revenue is open to the public, unlike revenue for other content products such as books and CDs, it is possible to obtain correct data for the film industry, and we can thus minimize measurement error. Third, the film industry is characterized as a high-risk, high-return, one-source, and multi-use sector. While a few films make significant profits, most films are unprofitable [13]. Therefore, it is important to find the influencers and predictors of sustainable box office success.
To select the sample, we extracted box office results by year for films released between 2006 and 2015 on Box Office Mojo and confirmed 338 box office films released in 93 countries, each with revenue exceeding 200 million US dollars. Our sample was limited to the top-eight countries (the United States, the United Kingdom, France, Germany, South Korea, Japan, China, and Hong Kong), which account for 62.8% of total film sales, or 157 billion US dollars, and their users’ and critics’ ratings and reviews.
Archived eWOM data were collected from the Internet Movie Database (IMDb), which includes general film information and users’ and critics’ reviews. Specifically, we collected general film information from the film’s main page on IMDb, for example, production years, Motion Picture Association of America (MPAA) ratings, genre, directors, actors, language, release dates, and runtime; from there we also collected eWOM-related data such as number of users who left ratings, average user ratings, number of critics who left ratings, and average critics’ ratings. In addition, we collected revenue by country, weekly rank, budget, and production years from the box office pages. We collected users’ reviews, ratings, IDs, countries, and review dates from user review pages, and we gathered critic information such as media affiliations, names, scores, and reviews from the critic review pages of IMDb, which are organized by film. Next, we categorized each film’s box office revenue and user reviews by the reviewers’ country. Two regional categories were created, Asian and Western, and the top-eight countries were considered. The box office revenue for the two regions was compiled for each film. The same two regional categories were used for user reviews. Figures 2, 3, and 4 show the locations on the IMDb pages from which we collected data related to box office revenue as well as users’ and critics’ ratings and reviews, respectively.
We crawled data using InfoFinder v2.0 from Wisenut, Inc., and saved the files in XML format. Then, we converted the files to JSON and csv formats for further analysis and performed taxonomy configuration and parsing with SAS Enterprise Content Categorization (consisting of Content Categorization Studio and Information Retrieval Studio). In addition, we implemented the structural equation model for hypothesis testing and determined moderating effects through SAS eMiner and AMOS.
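The conversion step described above can be sketched roughly as follows. This is not the authors' pipeline (which relied on InfoFinder and SAS tools); it is a minimal standard-library illustration, and the XML layout and field names (`review`, `film`, `country`, `rating`, `text`) are assumptions, since the crawler's output schema is not given.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real crawler output schema is not described in the paper.
SAMPLE_XML = """
<reviews>
  <review><film>Film A</film><country>KR</country><rating>8</rating><text>Great fun</text></review>
  <review><film>Film A</film><country>US</country><rating>6</rating><text>Too long</text></review>
</reviews>
"""

FIELDS = ["film", "country", "rating", "text"]


def xml_to_records(xml_text):
    """Turn each <review> element into a flat dict of its child fields."""
    root = ET.fromstring(xml_text.strip())
    return [
        {name: (node.findtext(name) or "").strip() for name in FIELDS}
        for node in root.iter("review")
    ]


def records_to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def records_to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = xml_to_records(SAMPLE_XML)
```

Keeping every field as a string at this stage defers type decisions (for example, whether ratings are integers) to the later analysis step.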

3.3. Measurement

The following operational definitions for response variables, explanatory variables, moderating variables, and control variables were used, as shown in Table 3.
We used box office revenue (L_RVi,r) as a response variable, as well as four explanatory variables: User rating of eWOM (URTi,r), critic rating of eWOM (CRTi,r), and their net eWOM sentiments (L_USTi,r and L_CSTi,r). The continuous data, including box office revenue and net user sentiment of eWOM, were converted into natural logarithms because their values were not normally distributed, showing wide variation and large skewness. We set regions as moderating variables. We categorized the 93 countries where hit films (films for which box office revenue exceeded 200 million US dollars for the 10 years since 2006) were released into two groups: Asian (RGOr1 = 1), consisting of 32 countries, and Western (RGOr1 = 0), consisting of 61 countries. These two groups were used as dummy variables. The moderating effect of the region was also measured for each of the top-eight countries. This study only adopted data for these eight major countries as the analytic source data for the simultaneous test of the two moderating effects.
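As a minimal sketch of the variable preparation described above, the snippet below builds the region dummy and the log-transformed revenue for one made-up observation. The grouping of the eight sampled countries into Asian and Western sets follows the text; the function names and the sample values are illustrative assumptions. The paper does not say how zero or negative net sentiment values were handled before taking logarithms, so this sketch simply rejects them.

```python
import math

# Country codes as used in the paper (GR denotes Germany there).
ASIAN = {"KR", "JP", "CN", "HK"}
WESTERN = {"US", "UK", "FR", "GR"}


def region_dummy(country):
    """Return RGO: 1 for the Asian group, 0 for the Western group."""
    if country in ASIAN:
        return 1
    if country in WESTERN:
        return 0
    raise ValueError(f"unknown country code: {country}")


def log_transform(value):
    """Natural log of a strictly positive value (e.g. revenue in US dollars)."""
    if value <= 0:
        raise ValueError("log transform needs a positive value")
    return math.log(value)


# Made-up observation for illustration only.
row = {"country": "KR", "revenue": 25_000_000.0}
prepared = {
    "RGO": region_dummy(row["country"]),
    "L_RV": log_transform(row["revenue"]),
}
```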
We extracted data for the variables that have been verified as factors of box office revenue in earlier studies for control variables. Numerous studies have validated whether the appearance of film stars positively affects box office revenue [13,20,36]. To quantify the effect of the appearance of stars in films, we used the Delphi technique. We extracted a list of the main actors and actresses in each film from IMDb and identified cast members who had won Oscar awards for Best Actor or Actress between 1986 and 2015 as stars (STRi).
In addition, we defined an instance of coproduction when either of the following conditions was met, based on information from IMDb: Foreign capital was invested in a film, or a film was coproduced with a foreign director or a film’s major cast included a foreign star (COPi). In addition, we added as control variables film genre (GRi), MPAA rating (MPAi), and the number of screens (L_SCRi,r) [10,13,20]. Film genre and MPAA rating were set as dummy variables, and number of screens was converted into a natural logarithm. Therefore, we have 2096 types of raw observations, which include 317,720 user reviews, 114,927,728 user ratings, and 30,420 critics’ reviews and ratings.
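The two film-level control dummies defined above (STRi and COPi) can be sketched as simple rules. The `Film` record, the `OSCAR_WINS` lookup, and the performer names are hypothetical placeholders; only the 1986 to 2015 award window and the three coproduction conditions come from the text.

```python
from dataclasses import dataclass


@dataclass
class Film:
    title: str
    cast: list
    foreign_capital: bool = False
    foreign_director: bool = False
    foreign_star_in_cast: bool = False


# Hypothetical lookup: performer name -> years of Best Actor/Actress wins.
OSCAR_WINS = {"Performer X": [1994], "Performer Y": [1980]}


def star_dummy(film, wins=OSCAR_WINS, first_year=1986, last_year=2015):
    """STR = 1 if any cast member won within the award window, else 0."""
    return int(any(
        first_year <= year <= last_year
        for name in film.cast
        for year in wins.get(name, [])
    ))


def coproduction_dummy(film):
    """COP = 1 if at least one coproduction condition holds, else 0."""
    return int(
        film.foreign_capital
        or film.foreign_director
        or film.foreign_star_in_cast
    )


film_a = Film("Film A", ["Performer X", "Performer Z"], foreign_director=True)
film_b = Film("Film B", ["Performer Y"])
```

Here `film_b` gets STR = 0 because its only award falls outside the window.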

4. Data Analysis and Result

4.1. Data Analysis

The basic method we used to test the hypotheses was sentiment analysis through text mining. We also quantified each variable and examined the moderating effect of each through structural equations. A structural equation model built on regression equations is an effective way to test not only the basic hypotheses but also the moderating effects. One limitation is that this method uses a simplistic model, although structural equation modeling is well established and partly compensates for such limitations. The econometric model, as shown in Formulas (1) and (2), was derived from the research hypotheses. In Formula (1), i indicates an individual film (i = 1, 2, 3, …, 100), and r1 represents the region, which is either the Asian region (r1 = 1) or the Western region (r1 = 0). In Formula (2), i still indicates an individual film (i = 1, 2, 3, …, 100), but r2 represents one of the top-eight countries, namely, the United States (US), the United Kingdom (UK), France (FR), Germany (GR), Korea (KR), Japan (JP), China (CN), and Hong Kong (HK). It is assumed that the residual value εi,r has a normal distribution.
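$$
\begin{aligned}
L\_RV_{i,r_1} ={}& b_0 + b_1\,URT_{i,r_1} + b_2\,CRT_{i,r_1} + b_3\,L\_UST_{i,r_1} + b_4\,L\_CST_{i,r_1} + b_5\,RGO_{r_1} \\
&+ b_6\,L\_UST_{i,r_1} \times RGO_{r_1} + b_7\,L\_CST_{i,r_1} \times RGO_{r_1} \\
&+ b_8\,STR_i + b_9\,COP_i + b_{10}\,L\_SCR_{i,r_1} + b_{11}\,GR_i + b_{12}\,MPA_i + \varepsilon_{i,r_1}
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
L\_RV_{i,r_2} ={}& b_0 + b_1\,URT_{i,r_2} + b_2\,CRT_{i,r_2} + b_3\,L\_UST_{i,r_2} + b_4\,L\_CST_{i,r_2} + b_5\,RGO_{r_2} \\
&+ b_6\,L\_UST_{i,r_2} \times RGO_{r_2} + b_7\,L\_CST_{i,r_2} \times RGO_{r_2} \\
&+ b_8\,STR_i + b_9\,COP_i + b_{10}\,L\_SCR_{i,r_2} + b_{11}\,GR_i + b_{12}\,MPA_i + \varepsilon_{i,r_2}
\end{aligned}
\tag{2}
$$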
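To make the role of the interaction terms in Formula (1) concrete, the sketch below evaluates the right-hand side for one film in each region. It is illustrative only: the coefficient values are made up, the genre and MPAA dummies are omitted for brevity, and `design_row` and `predict` are hypothetical helper names rather than anything from the paper.

```python
def design_row(urt, crt, l_ust, l_cst, rgo, star, cop, l_scr):
    """Regressors of Formula (1) in order b0..b10 (genre and MPAA dummies omitted)."""
    return [
        1.0,          # intercept (b0)
        urt,          # user rating (b1)
        crt,          # critic rating (b2)
        l_ust,        # log user sentiment (b3)
        l_cst,        # log critic sentiment (b4)
        rgo,          # region dummy (b5)
        l_ust * rgo,  # user sentiment x region (b6)
        l_cst * rgo,  # critic sentiment x region (b7)
        star,         # star dummy (b8)
        cop,          # coproduction dummy (b9)
        l_scr,        # log number of screens (b10)
    ]


def predict(coefs, row):
    """Linear predictor: the sum of coefficient times regressor."""
    if len(coefs) != len(row):
        raise ValueError("coefficient and regressor counts differ")
    return sum(b * x for b, x in zip(coefs, row))


# Made-up coefficients, chosen only to show how the region terms enter.
coefs = [1.0, 0.1, 0.0, 0.5, 0.2, 0.3, 0.0, 0.4, 0.0, 0.0, 0.0]

western = predict(coefs, design_row(7, 6, 2, 1, 0, 1, 0, 8))
asian = predict(coefs, design_row(7, 6, 2, 1, 1, 1, 0, 8))
gap = asian - western  # equals b5 + b6 * L_UST + b7 * L_CST for this film
```

With identical film characteristics, the predicted gap between the two regions comes entirely from the region dummy and its two interaction terms.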
The descriptive statistics from the sample data are shown in Table 4.

4.2. Phase 1: Ratings vs. Reviews

We performed standardized structural estimation using partial least squares regression. The results, shown in Table 5, verify the results from prior studies, and we used these results in Phase 1 of our research, which focused on comparing the effects of ratings and reviews on box office revenue. The comparison was a two-by-two comparison with users and critics. In line with earlier studies on eWOM [13,20,23], it was found that the eWOM user ratings contributed positively to box office performance (H1a is supported). On the other hand, our results, unlike those of previous studies, showed that critics’ ratings do not have a significant effect on international box office revenue (H2a is rejected). We consider this finding a reflection of the trend in which critics rate a film according to a comprehensive assessment of its various elements, including cinematic quality and artistry, not just the film’s popularity. Based on the results for both ratings, we can say that ratings do not have a significant effect on sustainable box office revenue. We concluded that this finding was due to the endogenous nature of online reviews. In other words, online reviews no longer have a convincing effect on a consumer’s purchase decision. However, our user sentiment analysis, based on text mining, revealed that positive user reviews improved sustainable box office revenue (H1b and H2b are supported), which indicated that reviews were a more significant index for predicting sustainable box office revenue compared to ratings. We perceived that our lexicon and coding method for distinguishing between positive and negative terms (in reviews) largely reflected the rating standard for a film and its popularity.
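The paper does not publish its lexicon or coding rules, so the following is only a minimal sketch of how a lexicon-based net sentiment score might be computed for a review. The word lists are tiny made-up examples, and defining net sentiment as the count of positive terms minus the count of negative terms is an assumption, not the authors' stated formula.

```python
import re

# Tiny illustrative lexicons; the study's actual lexicon is not given.
POSITIVE = {"great", "fun", "brilliant", "moving"}
NEGATIVE = {"boring", "dull", "long", "weak"}


def net_sentiment(text):
    """Count positive terms minus negative terms in a review."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


score = net_sentiment("Great cast, brilliant score, but a dull ending")
```

A simple count like this ignores negation ("not great") and intensity, which a production lexicon and taxonomy would need to handle.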

4.3. Phase 2: Comparing the Moderating Effects of Regions and Countries

From the moderating effect of the region (applying r1), as shown in Table 6, we concluded that box office performance was the same when USTi,r1 was within the Asian region (r1 = 1) and when it was in the Western region (r1 = 0; H3a is rejected). In contrast, CSTi,r1 had a positive effect on film revenue in Asian countries but not in Western countries, as indicated by the positive coefficient values for critic sentiment * region (H4a is supported). Based on the results for Hypothesis 3a, we cannot say that Asian users, including those from China, Japan, Hong Kong, and Korea, are more affected by eWOM than Western users. We attribute this finding to users in China and Hong Kong being less interdependent than Asian users are generally expected to be and, thus, less affected by user reviews when making their own decisions. This parallels the index of “uncertainty avoidance” in Hofstede’s classification. In other words, while the “uncertainty avoidance” of Asian countries is generally higher than that of Western countries, China and Hong Kong show considerably lower indicators than other Asian countries as well as Western countries. As a result, we concluded that the cultural peculiarities of China and Hong Kong have lowered the Asian indexes on average, offsetting the gap with Western countries. However, the results for the differences between Asian and Western critics’ reviews (H4a) are significant. The fact that the effect of critics’ reviews is more significant for Asian users is consistent with the index of “power distance” in Hofstede’s classification, which means that the effect of critics’ reviews could be compared with the tendency to lean psychologically on a stronger object and, furthermore, with the propensity to conform more to a hierarchy (power distance).
We used a three-by-two table to visualize the interaction effect of user and critic reviews and region (categorical moderator: Asian = 1, Western = 0). Figure 5 describes the relationship between users’ and critics’ positive and negative reviews and film sales, which indicated the sentiment elasticity of eWOM. From the critics’ sentiment, we found that Asian reviews had high sentiment elasticity, whereas Western reviews had low sentiment elasticity.
In addition, we conducted a moderating test with eight countries, applying r2 to identify countries’ response order from users’ and critics’ reviews and to compare it with Hofstede’s classification. In this case, eight moderating dummy variables were considered too many for an unconstrained model test. Therefore, we performed an ordinary least squares regression estimation using SAS 9.4, and the results are described in Table 7. The results for Models 1 and 2 revealed the direct effects of eWOM and region on box office revenue, and the results for Model 3 revealed the interaction effects of eWOM and region on box office revenue. Our discussion in this section focuses on the results of Model 3. The results of the moderating test showed that the moderator dummy variables were not significant for sustainable film performance, except for the United States, which served as the reference category. In addition, the interactions between L_USTi,r2 and r2s (log value of user sentiment × each country), and between L_CSTi,r2 and r2s (log value of critics’ sentiment × each country) were all significant (H3b and H4b were supported, respectively).
From the significant interaction model, we saw that the valence of users’ or critics’ reviews to box office revenue had an obvious distinction according to country. Moreover, the interaction coefficient values represented the slope between film sales and the user or critic sentiment; the greater the slope of the line, the stronger the effect of the interactions. From L_USTi,r2*r2s, we saw that users in Korea and Japan followed other users’ reviews (higher coefficient value) and had higher uncertainty avoidance (from Hofstede’s classification). Conversely, users in Hong Kong and China tended not to take direction from other users and had lower uncertainty avoidance than we expected. The values of the slope were ordered as follows: Korea and Japan > France and Germany > the United States and the United Kingdom > China and Hong Kong.
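The slope interpretation above follows the standard reading of interaction terms in a linear model. As a sketch, assuming RGO in Formula (2) is a 0/1 indicator for a given country with the United States as the baseline (an assumption about the coding, not a derivation given in the paper), differentiating Formula (2) with respect to each sentiment variable gives:

```latex
\frac{\partial\, L\_RV_{i,r_2}}{\partial\, L\_UST_{i,r_2}} = b_3 + b_6\, RGO_{r_2},
\qquad
\frac{\partial\, L\_RV_{i,r_2}}{\partial\, L\_CST_{i,r_2}} = b_4 + b_7\, RGO_{r_2}
```

Under this reading, the baseline country has slopes b3 and b4, and each other country's slope shifts by its own interaction coefficient, so a larger interaction coefficient means box office revenue in that country responds more strongly to review sentiment.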
From the interaction results of L_CSTi,r2*r2s, we found that the value of power distance (from Hofstede’s classification) paralleled our results: China > France > Hong Kong > Korea and Japan > the United States, the United Kingdom, and Germany. The effect of critic reviews differed from that of user reviews. China had a lower coefficient for user sentiment but a high coefficient for critic sentiment, in line with the value and meaning of power distance (from Hofstede’s classification). We have doubts about the extremely distant coefficient value between user and critic sentiment for China. This result means that Chinese people do not accept the opinions of other users but absolutely depend on the official opinions of experts and institutions. We attributed this result to their longstanding communist culture.
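As a rough check on the orderings above, the Model 3 interaction coefficients from Table 7 can be sorted and compared with the Hofstede scores listed in Table 2. The sketch below is illustrative only: the helper names `ranked` and `spearman` are our own, and the rank correlation uses just the four countries whose scores appear in Table 2, so it should not be read as a statistical test.

```python
# Interaction coefficients copied from Table 7 (Model 3).
user_slope = {"CN": 0.199, "JP": 0.543, "HK": 0.213, "KR": 0.572,
              "UK": 0.311, "FR": 0.431, "GR": 0.393, "US": 0.276}
critic_slope = {"CN": 0.619, "JP": 0.468, "HK": 0.547, "KR": 0.471,
                "UK": 0.254, "FR": 0.592, "GR": 0.281, "US": 0.243}

# Hofstede scores for the four countries listed in Table 2.
uncertainty_avoidance = {"US": 46, "UK": 35, "KR": 85, "HK": 29}
power_distance = {"US": 40, "UK": 35, "KR": 60, "HK": 68}

def ranked(scores):
    """Country codes sorted from the largest to the smallest score."""
    return sorted(scores, key=scores.get, reverse=True)

def spearman(a, b):
    """Spearman rank correlation over the keys shared by a and b (no ties)."""
    keys = [k for k in a if k in b]
    rank_a = {k: i for i, k in enumerate(sorted(keys, key=a.get))}
    rank_b = {k: i for i, k in enumerate(sorted(keys, key=b.get))}
    n = len(keys)
    d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in keys)
    return 1 - 6 * d2 / (n * (n * n - 1))

user_order = ranked(user_slope)      # KR, JP, FR, GR, UK, US, HK, CN
critic_order = ranked(critic_slope)  # CN, FR, HK, KR, JP, GR, UK, US

rho_user = spearman(user_slope, uncertainty_avoidance)
rho_critic = spearman(critic_slope, power_distance)
```

Both sorted lists reproduce the orderings stated in the text. With only four overlapping countries, the two rank correlations are a descriptive summary at best.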
The results of the analysis of control variables and their effects on film sales indicated that films that are coproduced by or that include stars from other countries have higher film sales. In addition, consistent with previous research, we found that the total number of screens was a significant determinant of good performance in film sales, while genre and film ratings were irrelevant to film sales.

4.4. Miscellaneous Concerns: Standard Deviation of User Sentiment and Individualism

A further point that emerged from our analysis concerns individualism (versus collectivism), a dimension of Hofstede’s cultural classification that we did not use in our hypotheses but that, like uncertainty avoidance, appeared to be reflected in users’ and critics’ reviews. We interpreted collectivism in terms of the diversity of film selection, following the dictionary definition of the term as a moral stance, political philosophy, ideology, or social outlook that emphasizes the group and its interests. Accordingly, we related individualism (versus collectivism) to the number of films reviewed in each country rather than to the volume of positive reviews for a film. As shown in Table 8, the standard deviation of user reviews was strongly correlated with the value of individualism. In China and Korea, which have low individualism (high collectivism), the range of film selection (the number of films that users commented on) was very narrow, whereas the United States and the United Kingdom had a wide range of film selection. In other words, the standard deviation and the value of individualism follow the same order: the US and the UK > France > Germany > Japan > Hong Kong > Korea and China. The diversity of film choice by country therefore follows a trend similar to that of individualism (versus collectivism), one of Hofstede’s cultural dimensions.

5. Discussion

5.1. Theoretical and Practical Implications

The study’s theoretical contributions to the literature are as follows. First, to our knowledge, this is among the first studies to analyze the effect of cultural differences on the relationship between eWOM and customer preferences. Until now, studies on cultural differences have mainly addressed information-seeking behaviors [37]. However, it is also important to look at how the social influence of eWOM changes with cultural differences, that is, how people who receive information evaluate and accept that information, or eWOM. Some studies have argued that the social structure of digital networks plays a critical role in the spread of a viral message [38], and others have used the geographical region of eWOM as a control variable [15], but only a few have analyzed whether cultural differences change the size of the eWOM effect. In this study, we found that critics’ eWOM affected Asian users more strongly than Western users (H4a). Second, we examined whether the effect of eWOM differs between regions by setting hypotheses with two geographical regions, Western and Asian, as moderating variables. For user reviews, we found that Asian users were not more affected by eWOM than Western users (H3a was rejected). This is because, in the second moderating test by country, the user-review coefficients for China and Hong Kong were lower than we expected, a result that coincides with Hofstede’s classification. In other words, cultural and geographical differences change the sentiment elasticity of eWOM.
Third, we performed sentiment analysis using text mining. Previous studies paid attention to eWOM as a factor of film industry performance, but they used only quantitative data, such as numeric user ratings [39]. In contrast, we conducted not only a quantitative analysis based on user ratings but also a sentiment analysis using user reviews. To this end, we collected 114,927,728 user ratings and 317,720 eWOM reviews for 338 box office films. Finally, our research suggests a direction for big data analytics with respect to text mining and sentiment analysis. Recent studies have suggested a multidimensional approach to integrating big data into business intelligence [7], and based on this academic trend, we presented a methodology to analyze both structured and unstructured data.
Our conclusions provide several practical implications for sustainable performance in the film industry. First, film producers and distributors should make a variety of efforts to encourage users to post positive reviews on film portals and databases. To this end, they need to actively circulate film trailers on film portal sites and databases around the world, and they should enhance their analyses of users’ eWOM to create appropriate marketing strategies. Second, eWOM-based marketing will be more useful for attracting Asian users, while content-based marketing is recommended for Western users. In particular, movie marketers need to pay close attention to Chinese consumers, for whom eWOM from critics is more influential than eWOM from general users. For Western consumers, on the other hand, marketers need to develop strategies that promote the artistry and content of films, for example through awards. Aggressive promotion at international film festivals and film markets is highly recommended. Third, the selection of actors or actresses should be carefully planned in the preproduction stages. In particular, films for export to Asian countries must cast stars who are preferred in those target markets. A notable example that illustrates the importance of this strategy is the success of Hollywood superhero movies based on Marvel comics, which have sold at high rates and led to sustainable revenue increases.
Fourth, coproduction between nations needs to increase for ease of funding and technology exchange. Furthermore, the results of this study show that coproduction is a good strategy for ensuring that films will be exempt from non-tariff barriers and the quota system. This is also shown in the success stories of co-produced movies such as “Late Autumn” (2010) (66.20 million yuan), which cast the Chinese actress Tang Wei as the main actress, and “A Wedding Invitation” (2013) (190.11 million yuan), a Korean and Chinese joint venture.

5.2. Limitations and Future Research

First, a critical limitation of this study is in the data collection process. We collected data for the top-100 box office films released since 2005. However, it is necessary to collect data on more films and more user reviews to test more diverse and complex questions. It is also necessary to ensure the accuracy of the database and to collect data from various film websites. Second, although sentiment analysis using text mining was carried out, the text analysis was limited to positive and negative criteria. In the future, we need to conduct a more effective text analysis by including other factors such as a film’s quality and audiences’ level of enjoyment. Third, our analysis centered on eWOM, among the various factors of the sustainable performance of the film industry, and we excluded factors related to marketing costs, such as advertising costs, which need to be considered as a major explanatory variable [11]. Fourth, our analysis did not reflect changes in performance during a film’s running period in foreign markets. Although the running period is relatively short, changes during this period can be an important factor when researching marketing timing. Finally, our research is limited to a comparative study of only eight main countries. Thus, in future research, we should survey and discuss more countries that have varying business environments and cultures.

Author Contributions

Conceptualization, H.H.; Data curation, J.K.; Formal analysis, H.H.; Investigation, J.K.; Methodology, H.H.; Project administration, H.H.; Resources, J.K.; Software, J.K.; Supervision, H.H.; Validation, J.K.; Visualization, J.K.; Writing—original draft, H.H.; Writing—review & editing, J.K.


Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. Minton, E.; Lee, C.; Orth, U.; Kim, C.-H.; Kahle, L. Sustainable marketing and social media: A cross-country analysis of motives for sustainable behaviors. J. Advert. 2012, 41, 69–84.
  2. Saura, J.; Reyes-Menendez, A.; Alvarez-Alonso, C. Do online comments affect environmental management? Identifying factors related to environmental management and sustainability of hotels. Sustainability 2018, 10, 3016.
  3. Saura, J.R.; Palos-Sánchez, P.; Cerdá Suárez, L.M. Understanding the digital marketing environment with KPIs and web analytics. Future Internet 2017, 9, 76.
  4. Schumann, J.H.; Wangenheim, F.; Stringfellow, A.; Yang, Z.; Blazevic, V.; Praxmarer, S.; Shainesh, G.; Komor, M.; Shannon, R.M.; Jiménez, F.R. Cross-Cultural Differences in the Effect of Received Word-of-Mouth Referral in Relational Service Exchange. J. Int. Mark. 2010, 18, 62–80.
  5. Trusov, M.; Bucklin, R.E.; Pauwels, K. Effects of Word-of-Mouth Versus Traditional Marketing: Findings from an Internet Social Networking Site. J. Mark. 2009, 73, 90–102.
  6. Dellarocas, C. The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms. Manag. Sci. 2003, 49, 1407–1424.
  7. Anastasiei, B.; Dospinescu, N. Electronic Word-of-Mouth for Online Retailers: Predictors of Volume and Valence. Sustainability 2019, 11, 814.
  8. Chevalier, J.A.; Mayzlin, D. The effect of word of mouth on sales: Online book reviews. J. Mark. Res. 2006, 43, 345–354.
  9. Li, X.; Hitt, L.M. Self-Selection and Information Role of Online Product Reviews. Inf. Syst. Res. 2008, 19, 456–474.
  10. Dellarocas, C.; Gao, G.; Narayan, R. Are Consumers More Likely to Contribute Online Reviews for Hit or Niche Products? J. Manag. Inf. Syst. 2010, 27, 127–157.
  11. Chen, Y.; Liu, Y.; Zhang, J. When Do Third-Party Product Reviews Affect Firm Value and What Can Firms Do? The Case of Media Critics and Professional Movie Reviews. J. Mark. 2012, 76, 116–134.
  12. Eliashberg, J.; Shugan, S.M. Film critics: Influencers or predictors? J. Mark. 1997, 61, 68.
  13. Liu, Y. Word of Mouth for Movies: Its Dynamics and Impact on Box Office Revenue. J. Mark. 2006, 70, 74–89.
  14. Godes, D.; Mayzlin, D. Using Online Conversations to Study Word-of-Mouth Communication. Mark. Sci. 2004, 23, 545–560.
  15. Sridhar, S.; Srinivasan, R. Social Influence Effects in Online Product Ratings. J. Mark. 2012, 76, 70–88.
  16. Saura, J.R.; Palos-Sanchez, P.; Grilo, A. Detecting indicators for startup business success: Sentiment analysis using text data mining. Sustainability 2019, 11, 917.
  17. Archak, N.; Ghose, A.; Ipeirotis, P.G. Deriving the Pricing Power of Product Features by Mining Consumer Reviews. Manag. Sci. 2011, 57, 1485–1509.
  18. Cao, Q.; Duan, W.; Gan, Q. Exploring determinants of voting for the “helpfulness” of online user reviews: A text mining approach. Decis. Support Syst. 2011, 50, 511–521.
  19. Chintagunta, P.K.; Gopinath, S.; Venkataraman, S. The Effects of Online User Reviews on Movie Box Office Performance: Accounting for Sequential Rollout and Aggregation Across Local Markets. Mark. Sci. 2010, 29, 944–957.
  20. Dellarocas, C.; Zhang, X.; Awad, N.F. Exploring the value of online product reviews in forecasting sales: The case of motion pictures. J. Int. Mark. 2007, 21, 23–45.
  21. Moon, S.; Bergey, P.K.; Iacobucci, D. Dynamic Effects Among Movie Ratings, Movie Revenues, and Viewer Satisfaction. J. Mark. 2010, 74, 108–121.
  22. Sawhney, M.S.; Eliashberg, J. A parsimonious model for forecasting gross box-office revenues of motion pictures. Mark. Sci. 1996, 15, 113–131.
  23. Duan, W.; Gu, B.; Whinston, A.B. Do online reviews matter?—An empirical investigation of panel data. Decis. Support Syst. 2008, 45, 1007–1016.
  24. Marine-Roig, E. Measuring destination image through travel reviews in search engines. Sustainability 2017, 9, 1425.
  25. Eliashberg, J.; Hui, S.K.; Zhang, Z.J. From Story Line to Box Office: A New Approach for Green-Lighting Movie Scripts. Manag. Sci. 2007, 53, 881–893.
  26. Basuroy, S.; Desai, K.K.; Talukdar, D. An empirical investigation of signaling in the motion picture industry. J. Mark. Res. 2006, 43, 287–295.
  27. Basuroy, S.; Chatterjee, S.; Ravid, S.A. How Critical Are Critical Reviews? The Box Office Effects of Film Critics, Star Power, and Budgets. J. Mark. 2003, 67, 103–117.
  28. King, R.A.; Racherla, P.; Bush, V.D. What We Know and Don’t Know About Online Word-of-Mouth: A Review and Synthesis of the Literature. J. Int. Mark. 2014, 28, 167–183.
  29. Maichum, K.; Parichatnon, S.; Peng, K.-C. Application of the extended theory of planned behavior model to investigate purchase intention of green products among Thai consumers. Sustainability 2016, 8, 1077.
  30. Hofstede, G. Culture’s Consequences: International Differences in Work-Related Values; Sage Publications Inc.: Thousand Oaks, CA, USA, 1984; Volume 5.
  31. Hofstede, G.H.; Hofstede, G. Culture’s Consequences: Comparing Values, Behaviors, Institutions and Organizations across Nations; Sage Publications Inc.: Thousand Oaks, CA, USA, 2001.
  32. Hofstede, G.; Minkov, M. Long-versus short-term orientation: New perspectives. Asia Pac. Bus. Rev. 2010, 16, 493–504.
  33. Markus, H.R.; Kitayama, S. Culture and the self: Implications for cognition, emotion, and motivation. Psychol. Rev. 1991, 98, 224.
  34. Berry, J.W. Cross-Cultural Psychology: Research and Applications; Cambridge University Press: Cambridge, UK, 2002.
  35. Samovar, L.; Porter, R.; McDaniel, E. Communication between Cultures; Cengage Learning: Boston, MA, USA, 2009.
  36. Elberse, A. The Power of Stars: Do Star Actors Drive the Success of Movies? J. Mark. 2007, 71, 102–120.
  37. Fong, J.; Burton, S. A cross-cultural comparison of electronic word-of-mouth and country-of-origin effects. J. Bus. Res. 2008, 61, 233–242.
  38. Bampo, M.; Ewing, M.T.; Mather, D.R.; Stewart, D.; Wallace, M. The effects of the social structure of digital networks on viral marketing performance. Inf. Syst. Res. 2008, 19, 273–290.
  39. Forman, C.; Ghose, A.; Wiesenfeld, B. Examining the relationship between reviews and sales: The role of reviewer identity disclosure in electronic markets. Inf. Syst. Res. 2008, 19, 291–313.
Figure 1. Research Design.
Figure 2. Data Collection from Box Office Mojo.
Figure 3. Crawling from IMDb (User and Critic Rating).
Figure 4. Crawling from IMDb (User and Critic Review).
Figure 5. Sentiment elasticity from users’ and critics’ review.
Table 1. Prior Studies on Electronic Word of Mouth.
| Authors | Response Variable | Explanatory Variables | | Industry (Source) | Sample |
|---|---|---|---|---|---|
| Archak, et al. [17] | sales rank | O | O | e-commerce ( | ,897 reviews for 41 digital cameras and 6786 reviews for 19 camcorders |
| Cao, et al. [18] | Helpfulness | | O | Portal (CNET | reviews for 87 software programs |
| Chen, Liu, and Zhang [11] | abnormal stock returns | O | | film (Metacritic) | 1275 third-party product reviews (TPR) for movies produced 7 companies |
| Chintagunta, et al. [19] | box office sales | O | O | film (Yahoo! Movies) | 148 movies and 3766 reviews |
| Dellarocas, Gao, and Narayan [10] | volume of user review, box office revenues | O | | film (Yahoo! Movies, BoxOfficeMojo) | 63,889 reviews for 104 movies in 2002 and 95,443 reviews for 143 movies |
| Dellarocas, et al. [20] | box office revenues, volume of user reviews | O | | film (Yahoo! Movies, BoxOfficeMojo) | critic grade, user grade and user reviews for 71 movies |
| Moon, et al. [21] | box office revenue, satisfaction | O | | Film (Rotten Tomatoes, Yahoo! Movies) | professional critics’ and amateurs’ ratings for 246 movies |
Table 2. Hofstede’s Cultural Grades.
| Nations | Power Distance | Individualism | Uncertainty Avoidance | Masculinity |
|---|---|---|---|---|
| United States | 40 | 91 | 46 | 62 |
| United Kingdom | 35 | 89 | 35 | 66 |
| South Korea | 60 | 18 | 85 | 39 |
| Hong Kong | 68 | 25 | 29 | 27 |
Table 3. Operational Definitions of Variables.
| Category | Variable | Definition |
|---|---|---|
| Response Variables | L_RVi,r | Natural log of regional box office revenue for the movie i (US dollars) |
| Explanatory Variables | URTi,r | Average scale of regional user rating for the movie i (10 point scale) |
| | CRTi,r | Average scale of professional critics’ rating for the movie i (10 point scale) |
| | UPOSi,r | Number of regional users’ positive terms for the movie i |
| | UNEGi,r | Number of regional users’ negative terms for the movie i |
| | L_USTi,r | Natural log of users’ net sentiment (UPOSi,r − UNEGi,r) |
| | CPOSi,r | Number of regional professional critics’ positive terms for the movie i |
| | CNEGi,r | Number of regional professional critics’ negative terms for the movie i |
| | L_CSTi,r | Natural log of professional critics’ net sentiment (CPOSi,r − CNEGi,r) |
| Moderating Variables | RGOr1 | Dummy variable for the region r1 where a comment was written (1 for Asian and 0 for Western) |
| | RGOr2 | Eight dummy variables for the region r2 where a comment was written (US, UK, FR, GR, KR, JP, CN, HK) |
| Control Variables | STRi | Dummy variable for whether there are stars in the movie i |
| | COPi | Dummy variable for whether the movie i is co-produced |
| | L_SCRi,r | Natural log of the total number of regional screens in opening weekend of the movie i |
| | GRi | Movie i’s genre using seven dummy variables (Action, Comedy, Drama, Science Fiction, Thriller, Kids, Romance) |
| | MPAi | Movie i’s MPAA ratings using five dummy variables (G, PG, PG13, R, and NR) |
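To make the sentiment variables in Table 3 concrete, the following minimal sketch counts positive and negative terms in a set of reviews and takes the natural log of the net sentiment. Several things here are assumptions rather than the procedure used in this study: net sentiment is taken to be the difference between the positive and negative counts, the two word lists are tiny placeholders for a real sentiment dictionary, the function names are hypothetical, and a non-positive net sentiment is returned as None because its logarithm is undefined.

```python
import math
import re

# Tiny illustrative lexicons (assumption); a real analysis would use a
# full sentiment dictionary.
POSITIVE = {"great", "excellent", "moving", "fun"}
NEGATIVE = {"boring", "dull", "awful", "weak"}

def term_counts(reviews):
    """Count positive and negative terms (UPOS, UNEG) over all reviews."""
    pos = neg = 0
    for text in reviews:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in POSITIVE:
                pos += 1
            elif token in NEGATIVE:
                neg += 1
    return pos, neg

def log_net_sentiment(pos, neg):
    """Natural log of (pos - neg); None when the net is not positive."""
    net = pos - neg
    return math.log(net) if net > 0 else None

# Made-up reviews for illustration only.
reviews = [
    "Great cast and a moving story, excellent fun.",
    "A bit dull in the middle but great ending.",
]
upos, uneg = term_counts(reviews)       # 5 positive terms, 1 negative term
l_ust = log_net_sentiment(upos, uneg)   # ln(5 - 1) = ln(4)
```

How films with zero or negative net sentiment are handled is a design choice this sketch leaves open.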
Table 4. Descriptive Statistics.
| Variables | N | Mean | Std. Dev. | Min | Max |
Notes: Total number of user ratings analyzed = 114,927,728; total number of user WOM messages analyzed = 317,720; total number of critics’ ratings analyzed = 15,210; total number of critics’ WOM messages analyzed = 15,210.
Table 5. Standardized Structural Estimates of the Model.
| H1a | URTi,r to RVi,r | 0.345 | 6.039 | ** | Accept | 0.446 |
| H1b | USTi,r to RVi,r | 0.402 | 7.431 | ** | Accept | 0.624 |
| H2a | CRTi,r to RVi,r | 0.040 | 0.419 | n.s. | Reject | 0.021 |
| H2b | CSTi,r to RVi,r | 0.247 | 4.177 | ** | Accept | 0.374 |
Model fitness: χ2/df = 1.074 (p < 0.001), GFI = 0.858, AGFI = 0.832, TLI = 0.927, CFI = 0.935
** p < 0.05, *** p < 0.01.
Table 6. Moderating effect of two regions (applied r1).
| Hypotheses & Paths | Unstandardized Coefficient (Asian Group) | Unstandardized Coefficient (Western Group) | χ2 (Unconstrained Model) | χ2 (Constrained Model) | Δχ2 | Test Results |
|---|---|---|---|---|---|---|
| H3a (USTi,r1 to RVi,r1) | 0.568 | 0.583 | 1644.540 | 1644.523 | 0.017 | Reject |
| H4a (CSTi,r1 to RVi,r1) | 0.848 | 0.351 | 1245.470 | 1207.414 | 38.056 ** | Accept |
** p < 0.05, *** p < 0.01.
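The Δχ2 column in Table 6 can be reproduced from the two χ2 values in each row. The sketch below assumes one degree of freedom for the difference (a single path constrained to be equal across the two groups), which Table 6 does not state. The function name is hypothetical, and 3.841 is the standard χ2 critical value for one degree of freedom at the 0.05 level.

```python
import math

def chi_square_difference(unconstrained, constrained, critical=3.841):
    """Chi-square difference test, assuming 1 degree of freedom.

    Returns (delta, p_value, significant_at_0_05).
    """
    delta = abs(unconstrained - constrained)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(delta / 2.0))
    return delta, p_value, delta > critical

# Chi-square values copied from Table 6.
h3a = chi_square_difference(1644.540, 1644.523)  # user sentiment path
h4a = chi_square_difference(1245.470, 1207.414)  # critic sentiment path
```

Under this assumption the difference for H3a falls well below the critical value and the difference for H4a far exceeds it, matching the Reject and Accept results in the table.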
Table 7. Moderating effects on eight countries (applied r2).
| Variable | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| URTi,r2 | 0.342 ** | 0.343 ** | 0.345 ** |
| CRTi,r2 | 0.006 | 0.006 | 0.005 |
| L_USTi,r2 | 0.401 ** | 0.403 ** | 0.402 ** |
| L_CSTi,r2 | 0.243 ** | 0.242 ** | 0.245 ** |
| CNr2 | | 0.022 | 0.022 |
| JPr2 | | 0.065 | 0.061 |
| H.K.r2 | | 0.018 | 0.019 |
| KRr2 | | 0.048 | 0.045 |
| U.K.r2 | | 0.066 | 0.067 |
| FRr2 | | 0.072 | 0.074 |
| GRr2 | | 0.049 | 0.051 |
| U.S.r2 | | 0.143 * | 0.146 * |
| L_UST*CN | | | 0.199 ** |
| L_UST*JP | | | 0.543 ** |
| L_UST*H.K. | | | 0.213 ** |
| L_UST*KR | | | 0.572 ** |
| L_UST*U.K. | | | 0.311 ** |
| L_UST*FR | | | 0.431 ** |
| L_UST*GR | | | 0.393 ** |
| L_UST*U.S. | | | 0.276 ** |
| L_CST*CN | | | 0.619 ** |
| L_CST*JP | | | 0.468 ** |
| L_CST*H.K. | | | 0.547 ** |
| L_CST*KR | | | 0.471 ** |
| L_CST*U.K. | | | 0.254 ** |
| L_CST*FR | | | 0.592 ** |
| L_CST*GR | | | 0.281 ** |
| L_CST*U.S. | | | 0.243 ** |
| STR | 0.042 ** | 0.044 ** | 0.040 ** |
| COP | 0.058 *** | 0.058 *** | 0.056 *** |
| L_SCR | 0.046 *** | 0.047 *** | 0.044 *** |
| GR-SF | −0.042 | −0.045 | −0.043 |
| GR-KD | −0.065 | −0.065 | −0.063 |
| GR-DRM | 0.206 | 0.208 | 0.204 |
| GR-CMD | 0.265 | 0.266 | 0.266 |
| GR-RMC | 0.303 | 0.305 | 0.302 |
| GR-AT | 0.439 | 0.439 | 0.438 |
| MPA-PG | 0.047 | 0.048 | 0.045 |
| MPA-PG13 | −0.623 | −0.625 | −0.622 |
| MPA-R | −0.147 | −0.149 | −0.146 |
| MPA-NR | −0.028 | −0.032 | −0.027 |
| R2 | 0.628 | 0.629 | 0.631 |
| F-value | 78.29 *** | 78.82 *** | 79.27 *** |
** p < 0.05, *** p < 0.01.
Table 8. Moderating effects of Region Two (Applied r2).
| Variable | Standard Deviation | Value of Individualism vs. Collectivism |

Hwangbo, H.; Kim, J. A Text Mining Approach for Sustainable Performance in the Film Industry. Sustainability 2019, 11, 3207.
