Analyzing Restaurant Customers’ Evolution of Dining Patterns and Satisfaction during COVID-19 for Sustainable Business Insights

Abstract: Observing and interpreting restaurant customers’ evolution of dining patterns and satisfaction during COVID-19 is of critical importance in terms of developing sustainable business insights. This study describes and analyzes customers’ dining behavior before and after the pandemic outbreak by means of statistically aggregating and empirically correlating 651,703 restaurant user-generated contents posted by diners during 2019–2020. Twenty review topics, mostly food, were identified by latent Dirichlet allocation, whereas analysis of variance and rating-review regression were performed to explore whether and why customers became less satisfied. Results suggest that customers have been paying fewer visits to restaurants since the outbreak, assigning lower ratings, and showing limited evidence of spending more. Interestingly, queuing, the most annoying factor for restaurant customers during normal periods, turns out to receive far fewer complaints during COVID-19. This study contributes by discovering business knowledge in the context of COVID-19 based on big data that features accessibility, relevance, volume, and information richness, which is transferable to future studies and can benefit additional populations and businesses. Meanwhile, this study also provides practical suggestions to managers regarding the framework of self-evaluation, business mode, and operational optimization.


Introduction
The COVID-19 pandemic has fundamentally changed the world in all walks of life [1][2][3][4]. The restaurant industry has been among the industries hardest hit by the coronavirus, with daily restaurant demand falling by 0.06% for every 1% increase in daily new COVID-19 cases [5]. This has posed both a threat and an opportunity to the sustainability of the restaurant industry. On the threat side, the reduced number of dining-out customers undoubtedly harms restaurants' revenue, whereas the precautionary measures taken by restaurants may also compromise customer satisfaction [6]. However, on the opportunity side, because customers' need for tasty food, a cozy environment, and premium service never actually disappears, restaurants have to be fully prepared for the resumption of customer demand in the long run [7] while sparing no effort to better understand and serve their customers in the short run. It is therefore meaningful to observe and interpret restaurant customers' evolution of dining patterns and satisfaction during COVID-19 to come up with sustainable business insights.
Researchers began studying the impact of COVID-19 on consumers' dining behavior shortly after the outbreak and have proposed preliminary findings. For example, in cyberspace, Mayasari et al. [8] reported that people were submitting fewer Google queries about "restaurant" but searching more about "delivery" and "take-away". In the real world, Yang et al. [5] discovered that both daily new COVID-19 cases and stay-at-home orders had a negative impact on restaurant consumption, especially for full-service establishments. Meanwhile, Kim et al. [9] observed a sharp increase in daily food-delivery orders as the daily reported COVID-19 cases and deaths grew.
Unfortunately, existing studies have been limited by data accessibility, data relevance, data volume, and information richness such that it could be problematic to transfer the methodology or knowledge from extant studies to future ones and to benefit additional population or business. For example, although foot traffic and card transaction data [5] are extremely reliable, they are not unconditionally open to the research community. Google query data [8], though fully accessible, can only measure customer purchase intention instead of actual purchase. Likewise, Kim and Lee's [10] discovery that customers prefer private dining spaces during COVID-19 was based on questionnaires in a virtual setting. To achieve stronger relevance, Chen et al. [11] obtained user posts on the internet written by customers who had actually visited the restaurants, but the relatively small dataset might weaken the generalizability of related findings. Kim et al.'s [9] large dataset of delivery sales during COVID-19 revealed the hierarchical effect of the pandemic on food-delivery orders in terms of restaurant type, but no mediating effect was addressed because the dataset contains no detailed description regarding how a meal was consumed or whether the meal was satisfactory.
Indeed, obtaining restaurant dining data during COVID-19 with authenticity, great volume, and rich information is a tough task, but it is not without solution. Consumers are increasingly willing to share their experiences and points of view about restaurants on the internet, collectively referred to as user-generated contents (UGCs). UGC contains important information regarding whether a customer is satisfied with a service as well as the underlying reason [12,13]. For example, a five-star rating surely indicates satisfaction, whereas the accompanying review "the steak is incomparable" provides the explanation. Under this rationale, Jia [14,15] demonstrated the feasibility of quantifying the causation between restaurant rating and review to discover the knowledge hidden within.
Based on the necessity of observing and interpreting restaurant customers' evolution of dining patterns and satisfaction during COVID-19 as well as the availability of online UGCs and analyzing tools, this study aims at describing and understanding customers' dining behavior before and after the pandemic outbreak by means of statistically aggregating and empirically correlating restaurant UGCs posted by diners during 2019-2020. More specifically, two key research questions are examined: RQ1: Did customers visit restaurants less frequently because of COVID-19? RQ2: Were customers less satisfied with restaurants because of COVID-19? If so, what were the reasons?
The remainder of this paper is thus organized as follows. The literature review section provides a brief overview of UGC analysis techniques and concerns. The data and method section describes in detail the source of UGC, the processing of review data, and the modeling approach. The results section reports the statistical and empirical findings with the discussion section responding with explanations. The final section concludes the paper with future study suggestions.
The contribution of this study is two-fold. On one hand, this study reveals the change of dining patterns and satisfaction of restaurant customers due to COVID-19 and its underlying mechanism. It complements existing meal service literature with rich facts and findings uncovered during COVID-19, a crisis situation. On the other hand, this study provides instant and practical suggestions to restaurant managers, helping them reflect on their pain and gain during the pandemic and getting them better prepared for the upcoming resumption of customer dining-out demand.

UGC Analysis Techniques
Online rating and review are the two most typical forms of UGC [16]. Rating, usually based on an interval scale, quantifies whether and to what extent a consumer is satisfied with a service [17]. Review is rather a textual comment, describing the service experience of the consumer in a qualitative style [18] that can explain why the consumer is happy or not. Rating and review together construct a complete description of customer satisfaction [19].
A prerequisite for analyzing online reviews is transforming the natural language within into structured data. Some studies use manual coding [20][21][22] or manually create a dictionary for subsequent automatic coding [23][24][25], both involving substantial human labor. Alternatively, the advancement in text-mining technology has increased the chances of automatically discovering the major topics in the reviews [26]. Compared with manual topic modeling, automatic topic modeling is not only faster when processing larger data sets, but also involves less human intervention, resulting in more objective results. Latent Dirichlet allocation (LDA), a state-of-the-art topic modeling tool, is well suited to performing the topic identification task automatically [27][28][29], such that a qualitative review becomes quantifiable by calculating the weight of the topics within [30,31].

Concerns Over Fake UGC
UGC analysis is not without limitations. Perhaps the biggest concern that disturbs researchers is UGC fraud, or fake UGC, which is generated by individual fraudsters or professionally organized review campaigners for "face lift" or other purposes [32]. Fake UGCs may undermine the credibility of a UGC study [33]. Fortunately, UGC websites are now capable of automatically detecting fake UGCs thanks to advanced filtering technology [34]. Besides, extant UGC studies have agreed that fake UGCs have limited impact on the detection of major topics and satisfaction factors from UGC [19,35,36].

Data Source
This study obtained UGC data from Dianping.com (accessed on 19 April 2021), a Chinese online review community website from which a number of earlier studies gathered data [12][13][14][15]. Furthermore, this study focused on the 100 restaurants in Shanghai, China, with the most UGC posted up to 1 January 2021, as this helps eliminate the effect of regional differences and achieves the best research cost efficiency [12][13][14]. One restaurant was removed from the top 100 list because it opened in 2020, which meant missing data for 2019. For the remaining 99 restaurants, the 651,703 UGCs posted during 2019-2020 were obtained. The sampling period was chosen as such because 1 January 2020 was a key time point for Chinese citizens regarding COVID-19, around which people began to learn about the pandemic, then considered an epidemic, and the government started to enact stay-at-home decrees. Therefore, the first sample year 2019 was regarded as not influenced by COVID-19, whereas the second sample year 2020 was treated as influenced by COVID-19.

Ethical Considerations
This study was carefully performed to preserve the rights of the UGC website, the restaurants, and the UGC posters. Using online UGC data for academic research objectives has been an appropriate and usual practice. Even so, the data acquisition procedure was cautiously conducted to minimize the network burden on the website server. In the meantime, the brands of the restaurants were masked in order not to favor one restaurant over another, as this study intends to uncover knowledge from the collective ensemble. Finally, the identities of the studied UGC posters were anonymized.

Data Processing
A typical UGC contains four pieces of information, namely: date, price per customer, rating, and review. The date of UGC helps identify whether the dining happened in 2019 or 2020. It is assumed that the difference between the date of the restaurant visit and the date of UGC posting is negligible. Price per customer is calculated by dividing a restaurant bill by the number of diners. Rating is an interval scale assigned by the UGC poster, with 5.0 indicating extremely satisfied and 0.5 indicating totally unsatisfied. The above three pieces of information are all structured data that can be directly incorporated into statistical and empirical analysis.
Reviews, or textual comments, require additional data processing to convert into structured data. The reviews, mostly written in Chinese, were first broken into sentences and further into words using Jieba [37], a Chinese language processing tool. Before counting the frequency of each word in all reviews, words that carry little meaning were removed, such as "I" and "today". Words that had a frequency of less than 3258 were also removed because they appeared on average in less than 0.5% of the reviews, which means that they were of limited importance. The remaining high-frequency meaningful words were grouped into clusters, or topics, in order to reduce the dimension of the analysis. Latent Dirichlet allocation (LDA) [38,39] was applied to handle the clustering task such that words of similar meanings or high correlations were aggregated to form topics. Finally, weight of topic (WOT) was calculated per Equation (1) such that a review is quantified by the number of topics within and the relative frequency of each topic.
$$\mathrm{WOT}_{ij} = \frac{\mathrm{TF}_{ij}}{1 + \sum_{k=1}^{20} \mathrm{TF}_{kj}} \qquad (1)$$
where $\mathrm{WOT}_{ij}$ is the weight of topic $i$ in review $j$ (abbreviated as $\mathrm{WOT}_i$ hereafter), and $\mathrm{TF}_{ij}$ is the frequency of topic $i$ in review $j$. A "1" was added to the denominator in case there was a review that did not mention any topic. The data and variables of this study are thus summarized in Table 1.
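The processing steps above can be illustrated with a minimal sketch. It assumes the reviews are already tokenized (for example by Jieba), that each retained word is mapped to exactly one topic, and that WOT follows the verbal description of Equation (1), i.e., topic frequency divided by total topic frequency plus one. The function names, the toy reviews, and the word-to-topic map are hypothetical and are not taken from the study.

```python
from collections import Counter

def filter_vocabulary(tokenized_reviews, min_count):
    """Keep words whose total frequency across all reviews is at least min_count."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    return {word for word, n in counts.items() if n >= min_count}

def weight_of_topics(review_tokens, word_to_topic, n_topics):
    """Return [WOT_1j, ..., WOT_nj] for one review j.

    TF_ij counts the tokens of the review that belong to topic i; the extra 1
    in the denominator keeps the weights defined when no topic is mentioned.
    """
    tf = [0] * n_topics
    for word in review_tokens:
        topic = word_to_topic.get(word)
        if topic is not None:
            tf[topic] += 1
    denominator = sum(tf) + 1
    return [count / denominator for count in tf]

# The cutoff reported in the text: 0.5% of 651,703 reviews, rounded down.
paper_cutoff = int(0.005 * 651_703)  # 3258

# Hypothetical toy data: three pre-tokenized reviews and a hand-made topic map.
reviews = [
    ["queue", "long", "steak", "steak"],
    ["steak", "service", "queue"],
    ["today"],
]
vocabulary = filter_vocabulary(reviews, min_count=2)
word_to_topic = {"queue": 0, "steak": 1}  # topic 0: queuing, topic 1: food (assumed)
wot = [weight_of_topics(r, word_to_topic, n_topics=2) for r in reviews]
```

In this toy case the third review mentions no topic word, so all of its weights are zero rather than undefined, which is the situation the added "1" is meant to handle.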

Model
Analysis of variance (ANOVA) was applied to statistically explore whether customers paid fewer visits to the restaurants, whether price per customer declined, and whether the customers were less satisfied after the COVID-19 outbreak. To quantify the causation between rating and review, Equation (2) was proposed.
$$\mathrm{Rating}_j = \alpha_0 + \sum_i \alpha_i \mathrm{WOT}_{ij} + \beta_0 \mathrm{Year}_j + \sum_i \beta_i \mathrm{Year}_j \times \mathrm{WOT}_{ij} + \varepsilon_j \qquad (2)$$
where $\mathrm{Rating}_j$ is the rating assigned by the UGC poster along with review $j$; $\mathrm{Year}_j$ is the year when review $j$ was posted; $\alpha_0$, $\alpha_i$, $\beta_0$, and $\beta_i$ are the coefficients to be estimated; and $\varepsilon_j$ is the error term. A positive $\alpha_i$ suggests the users are relatively satisfied with topic $i$ [40], such as food or service. For similar reasons, a negative $\beta_i$ indicates the users become less satisfied with topic $i$ after the COVID-19 outbreak.
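As an illustration of how a model of this form can be estimated, the sketch below builds the regressors described in the text (an intercept, the topic weights, Year, and the Year-by-topic cross terms) and fits them by ordinary least squares. It assumes Year is coded as a dummy (0 for 2019, 1 for 2020) and that one topic is dropped to limit collinearity; both choices, together with all function names, coefficients, and data, are illustrative assumptions rather than the study's actual implementation.

```python
def design_row(wot, year_dummy, dropped_topic=0):
    """Regressors of one review: [1, WOT_i..., Year, Year*WOT_i...], skipping one topic."""
    kept = [w for i, w in enumerate(wot) if i != dropped_topic]
    return [1.0] + kept + [float(year_dummy)] + [year_dummy * w for w in kept]

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free toy data with three topics (topic 1 is dropped).
# Coefficient order: alpha_0, alpha_2, alpha_3, beta_0, beta_2, beta_3.
true_coef = [4.4, -0.7, 0.4, -0.2, 0.35, 0.1]
wots = [[0.2, 0.0, 0.0], [0.1, 0.5, 0.0], [0.1, 0.0, 0.5], [0.2, 0.25, 0.25]]
rows = [design_row(w, year) for year in (0, 1) for w in wots]
ratings = [sum(c * x for c, x in zip(true_coef, row)) for row in rows]
estimates = ols(rows, ratings)
```

Because the toy ratings are generated without noise, the estimates reproduce the assumed coefficients up to rounding error. With real reviews, a statistics package would normally be preferred because it also reports standard errors and the adjusted R-squared.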

Descriptive Statistics
The descriptive statistics of this study are shown in Table 2. Panel 1 aggregates the data on the restaurant level. On average, 42% of the visits happened in 2020, indicating a decline in annual visits from 2019. The price ranges from 30.23 RMB (a beverage store) to 422.54 RMB (a Japanese restaurant). The standard deviation of price per customer is also as large as 64.67 RMB, compared with the average value of 127.33 RMB. This suggests the necessity of addressing the hierarchical effect of restaurant price level. The mean rating is 4.34 with a small standard deviation of 0.17. This means the customers are generally satisfied with the studied restaurants, echoing the fact that these restaurants have more UGC posted than other restaurants. Panel 2 reports the statistics on the UGC level, which are largely in agreement with Panel 1. The standard deviation of rating is as high as 0.83, making it possible and meaningful to explain the variation of customer satisfaction with review information.

Review Topic Identification
From the 651,703 reviews, high-frequency meaningful words were first extracted by a word counting program, and then clustered by the unsupervised LDA. Altogether, 20 topics were identified, as enumerated in Table 3. Table 3 has three panels. Panel 1 contains topics that provide information on things before a meal. Users wrote in reviews about friends or the occasion of a meal, about making an appointment or otherwise having to wait for a seat, and about the environment of a restaurant. It is noted here that the naming of each topic based on the word cluster, as well as the grouping of topics, was manually conducted, which is a common practice [12,13,15]. Panel 2 contains topics that provide information on the meal itself. As many as 13 food topics were identified, ranging from specific food to food styles, suggesting users' primary focus at a restaurant is food. Panel 3 contains topics that provide information on things after a meal, including services (this actually happens during the meal, but is put here for categorizing purposes), discount, and cost performance.
The above topics construct a general view of what a diner cares about concerning a restaurant. Considering that the identified topics include similar item pairs, such as appointment and queuing as well as discount and cost performance, and that there are a great number of food topics, the current result is sufficient for subsequent analysis without further increasing the number of topics.
Based on the identified topics, WOT was calculated for each review. Table 4 reports the statistics of WOT on the UGC level. The mean values of the majority of $\mathrm{WOT}_i$ are around 0.050, a value suggesting a generally even distribution of topics in reviews, although some topics are more frequently mentioned than others, e.g., $\mathrm{WOT}_{20} > \mathrm{WOT}_{14}$ (p < 0.001).
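A short derivation may help explain why 0.050 is the natural benchmark for 20 topics. It assumes, following the verbal description of Equation (1), that each weight equals the topic frequency divided by one plus the total topic frequency of the review; the symbol $S_j$ is introduced here only for illustration.

```latex
\[
S_j = \sum_{i=1}^{20} \mathrm{TF}_{ij}, \qquad
\sum_{i=1}^{20} \mathrm{WOT}_{ij} = \frac{S_j}{1 + S_j} < 1
\quad\Longrightarrow\quad
\frac{1}{20} \sum_{i=1}^{20} \mathrm{WOT}_{ij} < \frac{1}{20} = 0.050 .
\]
```

Under this assumption, the weights of a review never sum to exactly one, and the average weight per topic approaches 0.050 only for reviews that mention many topic words.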

Analysis of Variance
ANOVA was performed to test whether customers paid fewer visits to the restaurants, whether price per customer declined, and whether the customers were less satisfied after the COVID-19 outbreak. Table 5 reports that yearly review posting fell from 3830 to 2753 per restaurant after the COVID-19 outbreak (p < 0.001). This suggests that customers' visits to restaurants may also have declined by about 28% due to COVID-19. Likewise, customer satisfaction significantly dropped from 4.39 to 4.27 (p < 0.001). Unlike N or Rating, Price changed only slightly, increasing by 2.3 RMB (p < 0.05), a difference that is statistically significant but not economically meaningful.
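For readers less familiar with the test, the following sketch computes the F statistic of a one-way ANOVA for two groups and reproduces the arithmetic behind the 28% decline. The rating samples are hypothetical and are not the study's data; the function name is likewise illustrative, and the p-value step (which requires the F distribution) is omitted.

```python
def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by group membership (between-group sum of squares).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation left inside each group (within-group sum of squares).
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical ratings for illustration only.
ratings_2019 = [4.5, 4.0, 5.0, 4.5]
ratings_2020 = [4.0, 3.5, 4.5, 4.0]
f_stat = one_way_anova_f([ratings_2019, ratings_2020])

# Arithmetic behind the reported change in reviews per restaurant.
decline = (3830 - 2753) / 3830        # relative drop from 2019 to 2020
share_2020 = 2753 / (3830 + 2753)     # share of 2019-2020 reviews posted in 2020
```

The two derived figures round to 0.28 and 0.42, which agrees with the 28% decline reported here and the 42% share of 2020 visits reported in the descriptive statistics.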

Rating-Review Regression
Multiple linear regression was conducted to quantify the causation between rating and review, with ordinary least squares applied to estimate the coefficients in Equation (2). Table 6 summarizes the regression results. It is noted that although a constant was added to the denominator of Equation (1) so that Equation (2) does not suffer from perfect collinearity through $\sum_i \mathrm{WOT}_i$, $\mathrm{WOT}_1$ (i.e., Topic 1. Friend) was excluded from the regression to further reduce the risk of multicollinearity [40,41].
Note to Table 6: standard errors in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001.

Model 1 included only $\mathrm{WOT}_i$ without incorporating the effect of Year or the control of Price. The result suggests that in the sample window of 2019-2020, customers were generally more satisfied with some topics and less satisfied with others. More specifically, customers were more satisfied with cost performance ($\alpha_{20}$ = 0.405, p < 0.001) as well as some food types such as beef hotpot ($\alpha_{8}$ = 0.654, p < 0.001) and dessert ($\alpha_{17}$ = 0.432, p < 0.001). However, customers were quite unsatisfied about queuing ($\alpha_{3}$ = −0.699, p < 0.001). In addition, lower ratings are associated with environment ($\alpha_{4}$ = −0.152, p < 0.001), services ($\alpha_{18}$ = −0.165, p < 0.001), and discount ($\alpha_{19}$ = −0.167, p < 0.001). Model 2 controlled Price in an attempt to address the hierarchical effect of restaurant price level, but no significant difference was found from Model 1 in terms of estimated coefficients or adjusted $R^2$.
Model 3 incorporated Year and its cross terms with $\mathrm{WOT}_i$. Year has a negative impact on Rating ($\beta_{0}$ = −0.224, p < 0.001), which is consistent with the ANOVA result in Table 5 regarding both sign and magnitude. It is therefore further confirmed that customers became less satisfied with restaurants after the COVID-19 outbreak. Meanwhile, the regression results for the cross terms indicate that the reduction of satisfaction is not evenly distributed among topics. On the one side are topics that have not offset the negative impact of Year, such as services ($\beta_{18}$ = 0.025, p > 0.05) and discount ($\beta_{19}$ = −0.059, p > 0.05), meaning the customers have become less satisfied partially because of these topics. On the other side are topics that not only have significant and positive coefficients, but also have coefficients that totally offset the negative impact of Year, such as queuing ($\beta_{3}$ = 0.357, p < 0.001), environment ($\beta_{4}$ = 0.380, p < 0.001), and western cuisine ($\beta_{16}$ = 0.336, p < 0.001). This sends a positive signal to restaurant owners, which will be further explored in the discussion section. Last but not least, the hierarchical effect of restaurant price level was again not observed in Model 4.

Discussion
Identifying major topics from online reviews using unsupervised machine learning such as LDA has been an effective approach to probe customer interest, as has been demonstrated in this study. However, a more careful examination of the identified food topics (see Table 3, Panel 2) shows that some restaurants are more emphasized whereas others are weakened. On the one extreme are several topics addressing similar concepts, such as spicy chicken with chicken, beef hotpot with bullfrog hotpot, and Cantonese cuisine with Hong Kong cuisine. On the other extreme are topics each incorporating multiple concepts, such as coffee embedded in crab, and all western restaurants, be it German or Spanish, combined in Western cuisine. Such heterogeneity is a natural consequence of the uneven distribution of the types of the sampled restaurants as well as the unsupervised rationale of the chosen machine learning protocol. Nevertheless, Table 3, with the three panels together, has provided a complete framework to analyze a restaurant from customers' perspective. Moreover, the cross-sectional WOT in Table 4 demonstrates customers' relative interest in some topics over others. Restaurant managers are therefore advised to self-evaluate following the above framework as well as track the change of customer interest in order to optimize resource allocation in operational improvement.
COVID-19 has indeed introduced negative impact on the restaurant industry, with the studied restaurants collectively being no exception. Customers have been paying fewer visits to restaurants since the outbreak, assigning lower ratings, and showing limited evidence of spending more. The stay-at-home orders, together with fear of infection, have surely reduced customers' motivation of dining out. Meanwhile, the precautionary measures by restaurants may also compromise customer satisfaction, with some buffet restaurants temporarily switching to ordering mode, resulting in a freefall in dining experience. Given the systematic problem of COVID-19, restaurant managers are advised to hold a rational perspective on the temporary situation and not to expect avoiding loss in the dine-in business. They are instead encouraged to look beyond dine-in, the primary source of income during normal periods, and seek alternative revenue stream such as takeaway and delivery.
Besides challenges, COVID-19 has also brought opportunities to restaurants. Queuing, the most annoying factor for restaurant customers during normal periods, turns out to receive far fewer complaints during COVID-19. Of course, that is the result of fewer customers dining out. But more importantly, both the cross-sectional data and the before-and-after comparison indicate that customers are extremely sensitive to queuing. Managers are thus strongly advised to optimize waiting time as a preparation for the upcoming resumption by analyzing customers' temporal pattern of visiting as well as designing coupons that can shift price-sensitive customers away from rush hours.

Conclusions
Observing and interpreting restaurant customers' evolution of dining patterns and satisfaction during COVID-19 is of critical importance in terms of developing sustainable business insights. This study has described and analyzed customers' dining behavior before and after the pandemic outbreak by means of statistically aggregating and empirically correlating 651,703 restaurant UGCs posted by diners during 2019-2020. Twenty review topics, mostly food, have been identified by LDA, and ANOVA and rating-review regression have been performed to explore whether and why customers became less satisfied. Results suggest that customers have been paying fewer visits to restaurants since the outbreak, assigning lower ratings, and showing limited evidence of spending more. Interestingly, queuing, the most annoying factor for restaurant customers during normal periods, turns out to receive much less complaint during COVID-19.
This study contributes by discovering business knowledge in the context of COVID-19, based on big data that features accessibility, relevance, volume, and information richness, which is transferable to future studies and can benefit additional population and business. To be specific, although this study obtained restaurant UGC from Shanghai written in Chinese during COVID-19, the methodology is by no means limited to specific type of service, region, language, or time period. Therefore, this study is of interest to other national audiences at any time and about any industry. Meanwhile, this study has also provided practical suggestions to managers regarding the framework of self-evaluation, business mode, and operational optimization. They are encouraged to self-evaluate under the established framework to optimize resource allocation in operational improvement, look beyond dine-in and seek alternative revenue stream such as takeaway and delivery, and reduce waiting time as a preparation for the upcoming resumption by analyzing customers' temporal pattern of visiting as well as designing coupons that can reposition price-sensitive customers from rush hours.
One of the limitations of this study is sample bias, because this study only considers restaurant customers who have chosen to post online UGC. In other words, the sampled customers may not completely represent those who have not posted UGC. Hence, future studies are encouraged to incorporate additional data sources such as foot traffic and card transaction data so as to come up with a more reliable measure of restaurant visits, and on-site questionnaires to obtain offline UGC. Another limitation of this study is the granularity of the discovered topics, with all the reviews characterized by only 20 topics. Although the discovered topics have provided an overview of customer experience, they are not elaborate enough to quantify the subtle differences among individual customers who had written seemingly similar reviews but assigned diverse ratings. The relatively small adjusted $R^2$ suggests that the majority of the variation of customer satisfaction remains unexplained. Therefore, future studies may address this issue by improving text mining and empirical modeling.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The author declares no conflict of interest.