Social Media Data Analytics to Enhance Sustainable Communications between Public Users and Providers in Weather Forecast Service Industry

Abstract: The weather forecast service industry needs to understand customers' opinions of the weather forecast to enhance sustainable communication between forecast providers and recipients, which is influenced by inherent uncertainty in the forecast itself and by cultural factors. This study aims to investigate the potential for using social media data to analyze users' opinions of wrong weather forecasts. Twitter data from Korea in 2014 are analyzed using textual analysis and association rule mining to extract meaningful emotions or behaviors of weather forecast users. The results of the textual analysis show that the frequency of negative opinions is considerably high compared to positive opinions. More than half of the tweets mention precipitation forecasts among the meteorological phenomena, implying that most Koreans are sensitive to rain events. Moreover, association rules extracted from the negative tweets reveal a pattern of user criticism according to the seasons and the types of forecast errors, such as a "false alarm" or "miss" error. This study shows that social media data can provide valuable information on the actual opinions of forecast users in almost real time, enabling weather forecast providers to communicate effectively with the public.


Introduction
Understanding customers' perceptions of products and services is one of the most important tasks in the marketing sector. This is because users' perceptions affect their purchasing patterns and can ultimately be linked to company performance indicators such as sales [1][2][3][4]. Therefore, enterprises use data on user perception to grasp customer needs and to establish and revise sales strategies. Furthermore, users' satisfaction data can be utilized for planning better services and budget allocation in the public service sector [5]. The importance of information related to users' perception also applies to the meteorological community that provides weather forecasting services. In most countries around the world, weather forecasts are delivered to the public daily through a variety of media. While these forecasts are beneficial to the public, meteorological forecasts are not always recognized as useful and valuable information. This is partly due to unavoidable uncertainties in weather forecasts [6], and partly because forecast users are unable to respond effectively to forecasts or warnings owing to communication errors [7].
Many studies have focused not only on improving the accuracy of weather forecasts, but also on enhancing communication between forecast providers and users through a better understanding of users' perceptions. This study uses social media platforms to investigate whether they can provide a more in-depth picture of public attitudes towards weather forecasts from the Korean Meteorological Administration (KMA). To do this, Twitter data were analyzed in two steps: a basic sentiment analysis and association rule mining. First, a macroscale text analysis of day-to-day tweets was performed to obtain a daily time series of public sentiments. Then, a microscopic analysis of the peaks in the negative sentiment frequencies was performed to determine specific cases with which to explain the mood of the users. Second, attempts were made to develop comprehensive correlations between Twitter users' angry sentiments and the characteristics of weather forecast errors by utilizing association rule mining.

Data Collection
Public tweets containing the keyword "Korea Meteorological Administration (KMA)" in South Korea were collected from 1 January to 31 December 2014. Our tweet data consist of 15,783 posts; of these, 2921 containing sentiments, comments, or opinions were selected to form the data set used in our analysis. Figure 1 shows the percentages of the selected tweets posted in each month. A little more than 3% of the tweets were posted in May, in contrast to about 20% in July and August. The frequencies of the tweets related to the KMA and the number of rain events followed a very similar pattern, with a positive Pearson correlation coefficient of r = 0.85 (p-value = 0.001). It is not surprising that a relatively high percentage of people tweeted about the weather in periods with more precipitation, as this is one of the most common adverse weather phenomena in Korea. In fact, according to a survey conducted in 2011, 55.5% of the general public identified precipitation as the forecast element of the KMA of greatest interest [43]. This suggests that Korean weather agencies must prioritize forecasts related to meteorological phenomena that occur in summer to boost positive public opinions of the weather forecast service.
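The monthly correlation reported above (r = 0.85) is a standard Pearson coefficient between two twelve-value series. A minimal stdlib-only sketch is shown below; the monthly counts are illustrative placeholders shaped like the pattern described in the text (a summer peak), not the study's actual data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly counts (Jan-Dec): KMA-related tweets and rain events.
tweets = [110, 95, 130, 150, 90, 240, 580, 600, 310, 200, 180, 160]
rain_days = [5, 4, 6, 7, 5, 10, 16, 15, 9, 6, 6, 5]

print(round(pearson_r(tweets, rain_days), 2))
```

With real monthly tweet and rain-event counts in place of the placeholders, the same call reproduces the kind of strong positive association reported in the paper.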

Analysis Method
The selected tweets were classified into categories such as season, sentiment, meteorological phenomenon, type of forecast error, and purpose of forecast usage. The set of hierarchical labels for the categories depicted in Figure 2 was attached to each of the 2921 tweets based on their content. One tweet can be coded with multiple labels according to its content regarding the weather forecast.
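The multi-label coding described above can be sketched as a small validation routine: each tweet maps to a set of labels drawn from the five categories. The category and label names below are illustrative stand-ins, not the study's actual codebook.

```python
# Hypothetical label hierarchy; the study's real codebook is in Figure 2.
CATEGORIES = {
    "season": {"spring", "summer", "autumn", "winter"},
    "sentiment": {"positive", "negative", "neutral"},
    "phenomenon": {"rain", "downpour", "heat", "cold", "snow"},
    "error_type": {"false_alarm", "miss"},
    "application": {"umbrella", "outing", "clothing"},
}

def code_tweet(labels):
    """Validate a set of labels against the hierarchy and group them by category."""
    coded = {cat: set() for cat in CATEGORIES}
    for label in labels:
        for cat, allowed in CATEGORIES.items():
            if label in allowed:
                coded[cat].add(label)
                break
        else:
            raise ValueError(f"unknown label: {label}")
    return coded

coded = code_tweet({"summer", "negative", "rain", "false_alarm", "umbrella"})
print(sorted(coded["phenomenon"]))
```

One tweet may carry several labels per category (e.g., both "rain" and "heat"), which is why the frequency sums discussed later need not match the tweet counts.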

Figure 2. Hierarchy of labels for five categories, i.e., season, sentiment, weather phenomena, type of forecast error, and application of forecast. One tweet can be coded with multiple labels according to its content regarding the weather forecast.

Table 1 shows that about 75% of the aggregate sentiments for weather forecasts were negative. More than half of the negative tweets mentioned "rain" or "downpour", indicating that Koreans are sensitive to precipitation events. Just under 4% of the sentiments for forecasts were positive. A small percentage (22%), most of which corresponded to weather information seeking or sharing between users, was neutral. The substantial percentage of negative sentiments does not correspond with the findings of Anderson [44], who showed that extremely satisfied and dissatisfied customers are more likely to express their experiences relative to customers with moderate levels of contentment. Users are inclined to share positive experiences so that they gain social or self-approval for their purchases [45]. However, the fact that weather forecast services in Korea are free could potentially explain the overwhelming majority of negative tweets, because Koreans have no incentive to express their satisfaction with the service. As Skowronski & Carlston [46] showed, negative comments have a greater impact than positive ones in terms of making impressions. Therefore, weather agencies need to be more concerned about reducing negative tweets to improve the public impression of their services.
In accordance with these requirements, this study focuses on obtaining a better understanding of the public's negative perceptions.
We conducted textual analysis of the negative comments posted on Twitter in two ways. First, the four days with the most frequent negative comments were selected, and the contents of the negative tweets were examined in depth. For the selected days, the public responses in the tweets were analyzed by comparing the weather forecasts with the actual weather phenomena over time. Second, a data mining technique was adopted to demonstrate the possibility of extracting beneficial information on the characteristics of public opinion of the weather forecast. This study uses association rule mining as an effective tool to generate a set of rules from a given data pattern.
Association rule mining is a method for extracting causal relationships or frequent patterns between item sets in large databases [47]. A typical application is market basket analysis, which investigates whether two or more products are likely to be purchased together, and whether the purchase of one product increases the likelihood of the other being purchased. The output of market basket analysis, represented by a set of association rules, is used to help determine appropriate product marketing strategies. An association rule is denoted "A→B," where A and B are two independent items, referred to, respectively, as the left-hand side (antecedent) and right-hand side (consequent). The arrow means "is related to." The three measures used to estimate the strength of association implied by a rule are "support," "confidence," and "lift." An association may be thought of as more frequent and interesting as the values of "support" and "confidence" approach 1. A "lift" greater than 1 suggests that the rule has considerable usefulness or strength [48]. This study adopted the Apriori algorithm [49] implemented in an R package to obtain a useful set of association rules with large lifts.

Table 2 shows the frequency of negative sentiments, the frequency of comments on meteorological phenomena, and the types of forecast error for the four days with the highest frequency of negative opinions. Forecast errors fall into two types, "False alarm" and "Miss". A "False alarm" is an occasion when the meteorological agency wrongly announces that a particular weather phenomenon will happen, and a "Miss" is the opposite case. The sum of the frequencies of the four meteorological phenomena is not equal to the frequency of all the negative comments (233) because some of the tweets contain only negative comments without referring to specific weather phenomena. For the same reason, the sum of the frequencies for the two error types is less than the total frequency of the negative comments.
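The three measures can be computed directly from a set of transactions, where each transaction is the label set of one coded tweet. The study used the Apriori implementation in an R package; the following is an equivalent stdlib-only Python sketch on a hypothetical toy data set.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent) = support(A ∪ B) / support(A)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's baseline support; > 1 suggests a useful rule."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Illustrative coded tweets (not the study's data).
coded_tweets = [
    {"summer", "rain", "false_alarm", "censure"},
    {"summer", "rain", "false_alarm", "censure"},
    {"summer", "rain", "false_alarm"},
    {"autumn", "rain", "miss", "censure"},
    {"winter", "cold"},
]
A, B = {"summer", "rain", "false_alarm"}, {"censure"}
print(support(coded_tweets, A | B), confidence(coded_tweets, A, B), lift(coded_tweets, A, B))
```

On this toy set the rule {summer, rain, false_alarm}→{censure} has support 0.4, confidence about 0.67, and lift above 1, i.e., censure is more likely than its baseline rate when the antecedent holds.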

Textual Analysis of Negative Opinions about the Weather Forecast Errors
The negative comment rate was significantly high on all four days, and people were dissatisfied mostly with forecast errors related to rain. Most postings related to Case A in Table 2 referred to "False alarms" regarding the precipitation forecast, and many of the complaints also mentioned heat. Maximum rainfall of 100 mm was forecast the day before the event, while actual precipitation of 53 mm and only 9.5 mm was recorded from midnight to 3:00 a.m. and from 9:00 p.m. to 11:00 p.m., respectively, in Seoul, leaving some people disappointed with the erroneous precipitation forecast. On that day, the actual maximum temperature was 34 °C and the humidity was 54%. The discomfort index (DI) calculated from the temperature and humidity was 83.35, a "very high" level of discomfort according to KMA standards.
Case B is similar to Case A in that it also had a high frequency of comments about "False alarms" regarding the precipitation forecast. Rain was predicted owing to the indirect effects of a typhoon. In Seoul, 13 mm of rainfall was recorded, but the precipitation was concentrated at dawn and in the nighttime, not during the day. Therefore, it is possible that people felt there was almost no rain despite the heavy rainfall expected due to the typhoon. This led to strong discontent over the "False alarm".
In Case C, unlike the other cases, there was not much of a difference between the frequencies of negative opinions about "False alarms" (19) and "Misses" (12). The forecast for July 18th predicted rain in Seoul, but only 1.5 mm of rainfall was recorded because the rain actually stopped after dawn. Meanwhile, there was heavy rainfall (28.7 mm) from 3:00 p.m. in Busan despite a forecast indicating no rain. Hence, it was assumed that the negative tweets about the "False alarm" were posted from Seoul, while users near Busan might have posted about the "Miss" error.
Finally, Case D indicated an overwhelming mention of "Misses" in relation to precipitation. No rain was forecast on September 12th, but there was a sudden occurrence of thunderstorms and lightning showers, with more than 50 mm of rain, in the northern part of Seoul; this led to several people being stranded because of floods near rivers. At that time, the KMA received much criticism over the unexpected event.
These four cases indicate the advantages and potential uses of weather-related social media data. It was observed that people tweeted about real-world meteorological phenomena. The veracity of the content of the tweets suggests that forecast providers can use real-time analysis of social media data to understand the weather phenomena people are currently experiencing. In addition, as shown in Cases A, B, and C, there may be a difference between the accuracy of the forecast calculated by meteorological agencies and the actual experience of the public, depending on the time, region, and amount of precipitation. Therefore, the results indicate a need for forecast accuracy measures evaluated from a user's point of view. For instance, the accuracy of daily precipitation forecasts could be calculated by dividing a day into two periods, a busy period that includes traditional working hours (e.g., 08:00-20:00) and a less busy period (e.g., 21:00-07:00), rather than using the usual period of the entire day (00:00-23:00).
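The time-split scoring idea above can be sketched as follows. The hour boundaries and the simple hit/agree scoring are assumptions for illustration; each entry pairs an hour with boolean (rain forecast, rain observed) values, roughly mimicking the Case B pattern where rain was forecast all day but fell only overnight.

```python
def accuracy(pairs):
    """Fraction of (forecast, observed) pairs that agree."""
    return sum(f == o for f, o in pairs) / len(pairs)

def split_accuracy(hourly, busy=range(8, 21)):
    """Score a busy period (default 08:00-20:00) and the remaining hours separately."""
    busy_pairs = [(f, o) for h, f, o in hourly if h in busy]
    off_pairs = [(f, o) for h, f, o in hourly if h not in busy]
    return accuracy(busy_pairs), accuracy(off_pairs)

# Illustrative day: rain forecast for all 24 hours, observed only at night.
day = [(h, True, h < 5 or h > 21) for h in range(24)]
busy_acc, off_acc = split_accuracy(day)
print(busy_acc, off_acc)
```

A whole-day score would partially credit this forecast, while the busy-period score of 0.0 matches how daytime users actually experienced it.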

Interpretation of Association Rules
This study seeks to deduce the reasons for negative opinions, which have a significant impact on user attitudes toward certain products or services [50]. As with the customer segmentation method in online marketing, it is more cost-effective to focus on the most dissatisfied users than to deal with all situations involving negative comments. Therefore, this study divided users who posted negative comments into normal and critical groups and focused on analyzing the most dissatisfied, critical user group as the target segment.
A dataset for mining association rules was prepared from the 2177 tweets associated with negative labels. The negative opinions were classified into two levels, i.e., normal and critical, according to the severity of the negative comments. Among the negative tweets, 1430 comments were assigned to the "normal" category, which encompasses inconveniences caused by incorrect forecasts. The remaining 747 comments were labeled "censure" and classified in the "critical" category, which included posts expressing strong anger toward, slander of, or blame of the KMA. An R package was used to generate a set of association rules with the consequent "censure," for which the support, confidence, and lift were greater than 0.1, 0.5, and 1.4, respectively. In Figure 4, which presents information on each association rule, words other than "censure" are items corresponding to the antecedent of an association rule. An association rule is depicted as a circle whose size is proportional to the confidence value of the corresponding rule. Arrows from an item point toward a circle, and arrows from a circle point to "censure." The color of a circle corresponds to the lift value of its rule: the darker the color, the greater the lift value. For example, the circle marked "A" in Figure 4 represents the association rule {Summer, Rain, FA (False Alarm), Heat}→{Censure}. It can be seen that its lift value is large because the circle is darker than the others.

Figure 4. Diagram of the set of association rules with "censure" consequents. The diagram is generated using an R package; the support, confidence, and lift values of the rules are greater than 0.1, 0.5, and 1.4, respectively. The size of a circle, which corresponds to an association rule, represents the confidence value. The color of a circle expresses the lift value of its rule; a darker color indicates a larger lift value.
The typical circles marked "A" to "H" in Figure 4 were transformed into the equivalent association rules of the related items that led to the users' accusations, as shown in Table 3. It should be noted that all the rules, except rule G, include precipitation as a factor leading to criticism. This indicates that Koreans are highly sensitive to incorrect rainfall forecasts. Most Koreans dislike getting wet. In addition, since more than 40% of Koreans are likely to use public transportation owing to the high cost of parking and traffic jams in cities, they pay keen attention to the rainfall forecast to decide whether to take umbrellas or raincoats before going out.
Table 3. Association rules obtained from the circles marked "A" to "H" in Figure 4. The rules are listed in order of their lift values.

(Columns of Table 3: Item, Association Rule, Support, Confidence, Lift.)
Comparing the Rules with Each Case
Whereas in spring and summer, as seen in Rules A, C, and D, criticism of the "False alarm" error in the precipitation forecast was high, in autumn there was also strong concern about the "Miss" error. Since the frequency of complaints caused by errors in the rain forecast was relatively low during winter, when it usually snows rather than rains, no related rule was generated. The reason there are more severe complaints about "False alarm" than "Miss" errors in spring lies in the seasonal characteristics of Korea. Some regions frequently suffer crop damage and water shortages around spring, as the decrease in precipitation that begins in winter continues into spring, as shown in Table 3. In addition, as the particulate matter concentration is highest in spring, a strong social atmosphere of waiting for rain to wash it away makes the absence of expected rain more disappointing than the arrival of unexpected rain. In the hot summer, as many Koreans likewise look forward to a cool rain to bring the temperature down, the accusation against "False alarm" errors is stronger than against "Miss" errors, as in spring. An unusual point, however, is that in summer, as in Rules A and D, "False alarm" precipitation forecasts are often criticized together with the terms "heat" and "umbrella". Here are some examples of actual tweets related to this.
Rule A:
• Hey, KMA... When is the shower coming? It's hot enough.
• The forecast said it would rain heavily, but sweat is pouring like rain. I was tricked again by the Korea Meteorological Administration.
Rule D:
• The weather service gave me muscle in my arm. What about my umbrella? It is only sunny.
• I even brought an umbrella and I'm wearing boots, believing it would rain. Instead of rain, the sun is sizzling... Weather agency, are you kidding?
Rule A can be interpreted to mean that complaints were numerous enough to blame the KMA when rain did not fall and temperatures were high, despite forecasts of summer rainfall. This is similar to Case A in Figure 3; about 50% of the tweets complaining about the situation related to this rule were written on 25 July, when Case A occurred. This shows that Case A can be generalized as Rule A, which implies that a "False alarm" error tends to cause greater user dissatisfaction under summer weather conditions in Korea, in which temperature and humidity are simultaneously high. Rule D is similar to Rule A in that it describes a "False alarm" error in summer, except that the complaint is about carrying umbrellas or raincoats in anticipation of predicted rain that never falls. Rule D implies that it is very annoying to carry an unused umbrella on a hot summer day.
Rules B, E, F, and H describe public frustration with the "Miss" error in precipitation forecasts, reflecting Koreans' reluctance to get caught in the rain. In other words, the four rules capture the annoying situations of rain exposure caused by, in turn, unexpected rain in autumn, the lack of a prepared umbrella, a wrongly predicted time of precipitation, and the absence of a torrential rain forecast. Specific examples of the complaints in these cases are as follows.
Rule B: • It's a lot of rain for the fall. The KMA said it's 0.1 mm per hour, but it doesn't look like this.
Rule E: • Hey, KMA guys. You said the weather would be mild and nice today, didn't you? I got wet without an umbrella. If I catch a cold, I'll sue you.
Rule F: • It's raining. I didn't bring an umbrella with me because it said it would rain around 9 p.m. So, I got caught in the rain on my way home from work. Is the weather agency's supercomputer really worth 10 billion won? The salary of its employees is our tax.
Rule H: • The rain is so severe that it is almost invisible, like a typhoon, but did you say it would just rain once or twice?
Rule B indicates that the "Miss" error in rain forecasts is more upsetting in the fall; this contrasts with Rule C, which emphasizes the negative effect of a "False alarm" error in the spring. This is because, although there are many days in spring with bad air quality, including days with high particulate matter and pollen levels, the most favorable weather for going out occurs in the fall, so people then perceive rainfall as a phenomenon causing considerable inconvenience. Rule E, the opposite of Rule D, indicates complaints in which rainfall occurred but people went out without umbrellas because no rain was forecast. Rule F reflects the fact that although weather agencies may consider a daily rain forecast correct, the public may consider it a "Miss" error if they are caught in the rain because the predicted timing was wrong. This suggests that it is desirable to consider forecast accuracy for each forecast time zone, in addition to the traditional verification method in which the accuracy of the 24-hour forecast is evaluated on a daily basis. Rule H represents a situation where people caught in heavy rainfall consider even a correct rain forecast false if the severity of the predicted rainfall differs from what occurs. This case shows that, while an accuracy-based evaluation method classifies a forecast that predicts a small amount of rainfall as a "Hit," actual users of the forecast consider it a "Miss." Accordingly, to communicate more effectively with their users, meteorological agencies should consider the deviation between forecasted and actual precipitation, as well as the extent of accuracy, as forecast verification measures.
Rule G indicates that the long-term forecasts released in the fall are associated with a high level of dissatisfaction owing to the failure to predict frequent severely cold conditions in the coming winter. Of 76 tweets related to the cold in winter, 60 were determined to correspond to Rule G, despite the precise 24 h forecast of the cold as follows.
Rule G:
• "The Korea Meteorological Administration said this winter was supposed to be warm, but it has been cold since early December! The weather forecast from the KMA is wrong again!"
In the case of the above comment, the 24-hour weather forecast accurately predicted the cold weather and even issued a cold wave watch, but the user disregarded the accuracy of that forecast and blamed the long-term prediction of an overall mild winter. The reason is that the cold arrived earlier than anticipated, before people were fully prepared, owing to the erroneous long-term forecast. Rule G reflects that, in relation to cold weather, users are more sensitive to the long-term forecasts announced before winter than to the 24-hour forecasts, underscoring the importance of precise long-term forecasting of the cold for the upcoming winter.
Considering the association rules shown in Figure 4 and Table 3, four ways to reduce users' negative opinions of the weather forecast service in Korea were established. First, the "False alarm" rate in precipitation forecasts should be reduced in both spring and summer. Second, rain forecasts should be made more accurately in fall to reduce the "Miss" rate. Third, in winter, the accuracy of long-term forecasts for cold, rather than precipitation, should be improved. Finally, efforts should also be made to better predict the frequency and amount of rainfall, not only whether any will occur at all. The last two require additional cost and time for technological development, but the first two can be implemented immediately, because weather forecasters can adjust the rates of "False alarms" and "Misses" by tuning the threshold of precipitation forecasts according to the season.

Discussion
Prior to this study, most survey-based research could capture only superficial or partial user opinions about the weather forecast service, owing to limitations in the scope and depth of the questions. Thus, a new method is needed to identify forecast users' attitudes and behaviors in more detail, replacing the question-based survey approach. This study investigated how social media data could be used as a new method to explore more specifically and realistically how public users perceive the weather forecast service. We conducted textual analysis and association rule mining on Twitter data created in Korea during 2014 to derive useful lessons for enhancing the satisfaction of Korean weather forecast users.
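Association rule mining of the kind applied here searches tagged tweets for itemsets whose co-occurrence exceeds support and confidence thresholds. The following minimal sketch mines simple one-to-one rules over hypothetical tweet tags (season, error type, emotion); the transactions and thresholds are illustrative and are not the study's actual data or parameters.

```python
# Minimal sketch of association-rule mining over hypothetical tweet tags.
from itertools import combinations

transactions = [
    {"summer", "false_alarm", "anger"},
    {"summer", "false_alarm", "anger"},
    {"fall", "miss", "anger"},
    {"fall", "miss", "disappointment"},
    {"spring", "false_alarm", "anger"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(min_support=0.2, min_confidence=0.6):
    """Enumerate {a} -> {b} rules meeting the support/confidence thresholds."""
    items = set().union(*transactions)
    out = []
    for a, b in combinations(sorted(items), 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs})
            if s >= min_support:
                conf = s / support({lhs})
                if conf >= min_confidence:
                    out.append((lhs, rhs, round(s, 2), round(conf, 2)))
    return out

for lhs, rhs, s, c in rules():
    print(f"{{{lhs}}} -> {{{rhs}}}  support={s}, confidence={c}")
```

In this toy run, rules such as {false_alarm} -> {anger} surface with high confidence, which is the kind of seasonal error-emotion pattern the study's Rules A-H formalize at scale.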
The results of the textual analysis showed that the proportion of negative opinions was considerably high (75%) compared to positive ones, because the motivation for users to deliberately write positive comments about a free service, such as the weather forecast in Korea, is lower than for paid goods. Thus, a higher share of negative opinions than for other products does not necessarily indicate low forecast service quality, and content analysis of social media for free goods such as public services needs to be studied in more depth. Analyzing the daily frequency of negative tweets, we found that negative comments were concentrated in July and August. Most of the negative comments written on the four days with the largest numbers of daily negative tweets were about precipitation forecast errors. Complaints about "False alarm" and "Miss" errors were clearly distinguished from each other depending on the weather conditions. If forecasted rain did not come during the hot and humid summer season, the discomfort index rose, making accusations of "False alarm" errors more apparent than in other seasons. On the other hand, there was much criticism regarding "Miss" errors when unpredicted downpours caused major or minor damage owing to the failure to prepare for them. Since these tweets are written almost in real time and reflect the actual weather conditions experienced by users, they can serve as data that enhance the reliability of forecast evaluation beyond measurements obtained at observation points installed in only a limited number of places.
The association rule analysis identified several major factors behind the phenomena in which the degree of user complaint, among the negative comments, was very high. With the exception of winter, discontent caused by errors in precipitation forecasts was the most common pattern, varying from season to season. Blame for "False alarm" errors was mainly expressed in spring and summer, whereas criticism of "Miss" errors occurred significantly in autumn. In addition, we identified blame patterns in which forecasts that mispredicted the time or amount of precipitation were regarded as at fault, even though conventional verification criteria would classify forecasts that correctly predicted only the occurrence of precipitation as a "Hit," despite their miscalculating its time or amount. In winter, there was a high level of dissatisfaction with forecast errors related to cold weather rather than with precipitation forecasts. In particular, even when severe cold was predicted accurately in the day-ahead forecast, users were highly dissatisfied with the long-term forecast of overall mild winter temperatures. Therefore, users do not form their satisfaction with the forecast service by evaluating and averaging all kinds of forecasts as a whole. Instead, they tend to turn their image of the entire service negative if even one forecast causes inconvenience or damage to their life or schedule.

Conclusions
From the Korea-specific cultural characteristics discovered in this study, we can infer the following points for the KMA to consider in order to enhance sustainable relationships between public users and the meteorological community.

• It is crucial to consider the behaviors of Korean people and improve the perceived accuracy of precipitation forecasts. It is particularly necessary to make even minor corrections to precipitation forecasts for each period to reduce the frequency of "False alarm" errors in spring and summer, and to prevent "Miss" errors in fall (see Rules A-D).

• In winter, the temperature forecast is more important than the precipitation forecast. The technical aspects of the long-term forecast of winter cold, which is announced in late fall (the preparation time for winter), need improvement, because this forecast has a greater impact on public impressions than the 24-h forecasts (see Rule G).
Until recently, the KMA interacted with the public only in relation to the accuracy of its precipitation predictions, creating a large gap between the forecast providers and the public. The differences in the points of view held by these groups must be reduced through the introduction of a customized forecast evaluation model that considers accurate predictions of the time, location, and quantity of rainfall. In other words, there is a need to switch from the conventional forecasting system, which focuses on the extent of accuracy, to forecast production, provision, and evaluation from a user perspective that can reflect patterns of urban life involving complicated and diverse needs. Of course, social media analysis cannot be used to assess the emotions of all users. However, it can provide valuable information that traditional survey methods cannot, owing to the large amount of data collected via widespread real-time feedback. The generalization problem caused by potential sampling biases can be mitigated if data collected from more than one social media platform are analyzed. Enhanced science communication based on the information provided by social media analyses can help forecast agencies improve the existing forecast system, enabling them ultimately to produce more valuable weather forecast services that provide high satisfaction to the public.